Comprehensive Geographic Information Systems (ISBN 9780128046609)

English, 1,488 pages, 2017

Table of contents:
Volume 1: GIS METHODS AND TECHNIQUES
EDITOR IN CHIEF
VOLUME EDITORS
CONTRIBUTORS TO VOLUME 1
CONTENTS OF VOLUME 1
CONTENTS OF ALL VOLUMES
PREFACE
PERMISSION ACKNOWLEDGMENTS
1.01 The Future Development of GISystems, GIScience, and GIServices
1.02 Geocomputation: Data, Methods, and Applications in a New Era
1.03 Big Geodata
1.04 Current Themes in Volunteered Geographic Information
1.05 Open Data and Open Source GIS
1.06 GIS Databases and NoSQL Databases
1.07 Geospatial Semantics
1.08 Geocoding and Reverse Geocoding
1.09 Metadata and Spatial Data Infrastructure
1.10 Spatial Analysis Methods
1.11 Big Data Analytic Frameworks for GIS (Amazon EC2, Hadoop, Spark)
1.12 Network Analysis
1.13 Analysis and Modeling of Movement
1.14 Spatial Metrics: The Static and Dynamic Perspectives
1.15 Multicriteria Analysis
1.16 Agent-Based Modeling
1.17 Spatial Optimization for Sustainable Land Use Planning
1.18 Geostatistical Approach to Spatial Data Transformation
1.19 Spatial and Spatiotemporal Data Mining
1.20 Space-Time GIS and Its Evolution
1.21 Time Geography
1.22 Spatial Data Uncertainty
1.23 Cyberinfrastructure and High-Performance Computing
1.24 Augmented Reality and GIS
1.25 GIS and Serious Games
1.26 Mobile GIS and Location-Based Services
1.27 Societal Impacts and Ethics of GIS
1.28 Geoprivacy
1.29 Defining Public Participation GIS
1.30 User-Centered Design for Geoinformation Technologies
1.31 GIS Project Management
Volume 2: GIS APPLICATIONS FOR ENVIRONMENT AND RESOURCES
EDITOR IN CHIEF
VOLUME EDITORS
CONTRIBUTORS TO VOLUME 2
CONTENTS OF VOLUME 2
CONTENTS OF ALL VOLUMES
PREFACE
PERMISSION ACKNOWLEDGMENTS
2.01 GIS for Mapping Vegetation
2.02 GIS for Paleo-limnological Studies
2.03 GIS and Soil
2.04 GIS for Hydrology
2.05 GIS Applications in Geomorphology
2.06 GIS for Glaciers and Glacial Landforms
2.07 GIS and Remote Sensing Applications in Wetland Mapping and Monitoring
2.08 GIS for Natural Resources (Mineral, Energy, and Water)
2.09 GIS for Urban Energy Analysis
2.10 GIS in Climatology and Meteorology
2.11 GIS and Coastal Vulnerability to Climate Change
2.12 Assessment of GIS-Based Machine Learning Algorithms for Spatial Modeling of Landslide Susceptibility: Case Study in Iran
2.13 Data Integration and Web Mapping for Extreme Heat Event Preparedness
2.14 GIS Technologies for Sustainable Aquaculture
2.15 An Integrated Approach to Promote Precision Farming as a Measure Toward Reduced-Input Agriculture in Northern Greece Using a Spatial Decision Support System
2.16 GIS and Placemaking Using Social Media Data
2.17 GIS and Scenario Analysis: Tools for Better Urban Planning
2.18 Transit GIS
2.19 Modeling Land-Use Change in Complex Urban Environments
2.20 Application of GIS-Based Models for Land-Use Planning in China
2.21 GIS Graph Tool for Modeling: Urban–Rural Relationships
Volume 3: GIS APPLICATIONS FOR SOCIO-ECONOMICS AND HUMANITY
EDITOR IN CHIEF
VOLUME EDITORS
CONTRIBUTORS TO VOLUME 3
CONTENTS OF VOLUME 3
CONTENTS OF ALL VOLUMES
PREFACE
PERMISSION ACKNOWLEDGMENTS
3.01 GIS and Spatial Statistics/Econometrics: An Overview
3.02 Estimating Supply Elasticities for Residential Real Estate in the United Kingdom
3.03 Forced Displacement and Local Development in Colombia: Spatial Econometrics Analyses
3.04 Searching for Local Economic Development and Innovation: A Review of Mapping Methodologies to Support Policymaking
3.05 An Agent-Based Model of Global Carbon Mitigation Through Bilateral Negotiation Under Economic Constraints: The Key Role of Stakeholders’ Feedback and Facilitated Focus Groups and Meetings in the Development of Behavioral Models of Decision-Making
3.06 GIS-Based Approach to Analyze the Spatial Opportunities for Knowledge-Intensive Businesses
3.07 GIS for History: An Overview
3.08 PastPlace Historical Gazetteer
3.09 Collaborative Historical Information Analysis
3.10 A Review on the Current Progress in Chinese Historical GIS Research
3.11 GIS in Linguistic Research
3.12 GIS in Comparative-Historical Linguistics Research: Tai Languages
3.13 Spatial Dimensions of American Politics
3.14 GIS-Enabled Mapping of Electoral Landscape of Support for Political Parties in Australia
3.15 A Global Administrative Solution to Title and Tenure Insecurity: The Implementation of a Global Title and Rights Registry
3.16 Revamping Urban Immovable Property Tax System by Using GIS and MIS: A Case Study of Reforming Urban Taxation Systems Using Spatial Tools and Technology
3.17 Urban Dynamics and GIScience
3.18 Sensing and Modeling Human Behavior Using Social Media and Mobile Data
3.19 GIS-Based Social Spatial Behavior Studies: A Case Study in Nanjing University Utilizing Mobile Data
3.20 The Study of the Effects of Built Form on Pedestrian Activities: A GIS-Based Integrated Approach
3.21 The Fusion of GIS and Building Information Modeling for Big Data Analytics in Managing Development Sites
3.22 Smarter Than Smart Cities: GIS and Spatial Analysis for Socio-Economic Applications That Recover Humanistic Media and Visualization
3.23 Comparing Global Spatial Data on Deforestation for Institutional Analysis in Africa
3.24 Constructing a Map of Physiological Equivalent Temperature by Spatial Analysis Techniques
3.25 GIS-Based Accessibility Analysis of Health-Care Facilities: A Case Study in Hong Kong
3.26 From Base Map to Inductive Mapping: Three Cases of GIS Implementation in Cities of Karnataka, India
3.27 Using GIS to Understand Schools and Neighborhoods
INDEX
AUTHOR INDEX


COMPREHENSIVE GEOGRAPHIC INFORMATION SYSTEMS


EDITOR IN CHIEF

Bo Huang The Chinese University of Hong Kong, Hong Kong

VOLUME 1

GIS METHODS AND TECHNIQUES

VOLUME EDITORS

Thomas J. Cova The University of Utah, Salt Lake City, UT, United States

Ming-Hsiang Tsou San Diego State University, San Diego, CA, United States

AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO

Elsevier
Radarweg 29, PO Box 211, 1000 AE Amsterdam, Netherlands
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK
225 Wyman Street, Waltham, MA 02451, USA

Copyright © 2018 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notice
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary. Practitioners and researchers may always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

ISBN 978-0-12-804660-9

For information on all publications visit our website at http://store.elsevier.com

Publisher: Oliver Walter
Acquisition Editor: Priscilla Braglia
Content Project Manager: Laura Escalante Santos
Associate Content Project Managers: Paula Davies and Katie Finn
Cover Designer: Mark Rogers

Printed and bound in the United States

EDITOR IN CHIEF

Bo Huang

Dr. Bo Huang is a professor in the Department of Geography and Resource Management, The Chinese University of Hong Kong, where he is also the Associate Director of the Institute of Space and Earth Information Science (ISEIS). Prior to this, he held faculty positions at the University of Calgary, Canada, and the National University of Singapore. He has a background and experience in diverse disciplines, including urban planning, computer science, Geographic Information Systems (GIS), and remote sensing. His research interests cover most aspects of GIScience, specifically the design and development of models and algorithms in spatial/spatiotemporal statistics, remote sensing image fusion, and multiobjective spatial optimization, and their applications in environmental monitoring and sustainable land use and transportation planning. The Geographically and Temporally Weighted Regression (GTWR) model (available via his ResearchGate page), which he developed in 2010, is now widely used in areas including economics, environment, geography, and urban planning. Dr. Huang serves as the Asia-Pacific Editor of the International Journal of Geographical Information Science (Taylor & Francis), the Executive Editor of Annals of GIS (Taylor & Francis), and the Chief Scientist of the Joint Laboratory of Smart Cities (Beijing). He was awarded a Chang Jiang Chair Professorship in 2016 by the Ministry of Education of PR China.


VOLUME EDITORS

Georg Bareth

Georg Bareth studied Physical Geography at the University of Stuttgart and graduated in 1995. From 1996 to 1999 he received a PhD scholarship from the German Research Foundation (DFG) and worked on his thesis “Emissions of Greenhouse Gases from Agriculture – Regional Presentation and Estimation for a Dairy Farm Region by Using GIS” at the University of Hohenheim. He habilitated in Agroinformatics in 2004 and has since held a professorship for Geoinformatics at the University of Cologne.

Kai Cao

Kai Cao is a lecturer in the Department of Geography at the National University of Singapore (NUS), an affiliated researcher in the Institute of Real Estate Studies, a research associate in the Center for Family and Population Research, and a member of the steering committee of the Next Age Institute at NUS. He serves on the Board Committee and as chair of the Newsletter Committee of the International Association of Chinese Professionals in Geographic Information Science, and was a member of the National Geographic's Committee for Science and Exploration for one year. He obtained his BSc degree in Geography (Cartography and Geographic Information Science) and MPhil degree in Geography (Remote Sensing and Geographic Information Science) from Nanjing University in China, and his PhD degree in Geography from The Chinese University of Hong Kong. Prior to joining the Department of Geography at NUS, he worked in the Center for Geographic Analysis at Harvard University, the Department of Geography at the University of Illinois at Urbana–Champaign, and the World History Center at the University of Pittsburgh. He was also a visiting research scholar in the Department of Human Geography and Spatial Planning at Utrecht University in 2009, and a visiting scholar in the Center for Spatial Studies and Department of Geography at the University of California, Santa Barbara (UCSB) in 2012. Dr. Kai Cao specializes in GIScience, spatial simulation and optimization, urban analytics, and spatially integrated social science. He has published numerous internationally refereed journal articles, book chapters, and conference papers in his field, and was a guest editor of a special issue of the International Journal of Geographical Information Science on the topic of “Cyberinfrastructure, GIS and Spatial Optimization,” together with Dr. Wenwen Li from Arizona State University and Prof. Richard Church from UCSB.

Tom Cova

Tom Cova is a professor of Geography at the University of Utah and director of the Center for Natural and Technological Hazards. He received a BS in Computer Science from the University of Oregon and an MA and PhD in Geography from the University of California, Santa Barbara where he was an Eisenhower Fellow. Professor Cova’s research and teaching interests are environmental hazards, emergency management, transportation, and geographic information science (GIScience). His initial focus was regional evacuation modeling and analysis, but this has since been expanded to include emergency preparedness, public warnings, and protective actions. He has published in many leading GIS, hazards, and transportation journals including the International Journal of Geographical Information Science (IJGIS), Transactions in GIS, Computers, Environment and Urban Systems, Transportation Research A and C, Natural Hazards, Geographical Analysis, Natural Hazards Review, and Environment and Planning A. His 2005 paper in Natural Hazards Review resulted in new standards in the United States for transportation egress in fire-prone regions (National Fire Protection Association 1141). Concepts drawn from his 2003 paper on lane-based evacuation routing in Transportation Research A: Policy and Practice have been used in evacuation planning and management worldwide, most notably in the 2012 Waldo Canyon Fire evacuation in Colorado Springs. Professor Cova was a coinvestigator on the National Center for Remote Sensing in Transportation (NCRST) Hazards Consortium in 2001–04. Since then most of the support for his research has been provided by the National Science Foundation on projects ranging from evacuation versus shelter-in-place in wildfires to the analytical derivation of warning trigger points. He chaired the GIS Specialty Group for the Association of American Geographers in 2007–08 and the Hazards, Risks and Disasters Specialty Group in 2011–12. In 2008 he served as program chair for the International Conference on Geographical Information Science (GIScience, 2008) in Park City, Utah. He was a mentor and advisor for the National Science Foundation project “Enabling the Next Generation of Hazards Researchers” and is a recipient of the Excellence in Mentoring Award from the College of Social & Behavioral Science at the University of Utah.

Elisabete A. Silva

Elisabete Silva, BA, MA (Lisbon), PhD (Massachusetts), MRTPI, is a University Senior Lecturer (Associate Professor) in Spatial Planning and a Fellow and Director of Studies of Robinson College, University of Cambridge, UK. Dr. Silva has a research track record of 25 years in both the public and private sectors. Her research interests are centered on the application of new technologies to spatial planning, in particular city and metropolitan dynamic modeling through time. The main subject areas include land use change; transportation, spatial plans, and policy; the use of Geographic Information Systems (GIS) and spatial analysis; and new technologies/models in planning (i.e., cellular automata (CA) and agent-based modeling (ABM)). She is the coauthor of the Ashgate book A Planners' Encounter With Complexity (2010) and The Routledge Handbook of Planning Research Methods (2014).

Chunqiao Song

Chunqiao Song received his BS degree from Wuhan University in 2008 and his MS degree from the Chinese Academy of Sciences in 2011, both in geographic information science. He received his PhD degree in geography from The Chinese University of Hong Kong in 2014. He is currently working as a researcher at the University of California, Los Angeles. His research focuses on developing applications of remote sensing and geographic information techniques for large-scale environmental monitoring and process modeling. It aims to advance the scientific, theoretical, and methodological aspects of geoinformatics techniques for understanding how key environmental elements (e.g., water, ice, and ecosystems) respond to a changing climate and human intervention in High Mountain Asia and worldwide. His current work includes (1) developing high-resolution satellite-based lake hydrographic datasets that are available at the global scale, and (2) understanding lake water storage dynamics and the associated hydrological and cryospheric processes on the Tibetan Plateau (Earth's “Third Pole”) and in other high mountain regions. He is the author of more than 50 primary research articles, reviews, and book chapters in the hydrological, remote sensing, ecological, and environmental fields.


Yan Song

Yan Song is a full professor in the Department of City and Regional Planning and director of the Program on Chinese Cities at the University of North Carolina at Chapel Hill. Dr. Song's research interests include low-carbon and green cities, plan evaluation, land use development and regulations, spatial analysis of urban spatial structure and urban form, land use and transportation integration, and how to support research in these fields by using planning support systems such as GIS, big data, and other computer-aided planning methods and tools.

Ming-Hsiang Tsou

Ming-Hsiang (Ming) Tsou is a professor in the Department of Geography, San Diego State University (SDSU), and the founding director of the Center for Human Dynamics in the Mobile Age (HDMA) (http://humandynamics.sdsu.edu/). He received a BS (1991) from the National Taiwan University, an MA (1996) from the State University of New York at Buffalo, and a PhD (2001) from the University of Colorado at Boulder, all in Geography. His research interests are in human dynamics, social media, big data, visualization and cartography, Web GIS, high-performance computing (HPC), mobile GIS, and K-12 GIS education. He is a coauthor of Internet GIS, a scholarly book published in 2003 by Wiley, and has served on the editorial boards of the Annals of GIS (2008–), Cartography and Geographic Information Science (2013–), and The Professional Geographer (2011–). Tsou was the chair of the Cartography Specialty Group (2007–08) and the chair of the Cyberinfrastructure Specialty Group (2012–13) in the Association of American Geographers (AAG), and the cochair of the NASA Earth Science Enterprise Data System Working Group (ESEDWG) Standard Process Group (SPG) from 2004 to 2007. He has served on two US National Academy of Sciences committees: “Research Priorities for the USGS Center of Excellence for Geospatial Information Science” (2006–07) and “Geotargeted Alerts and Warnings: A Workshop on Current Knowledge and Research Gaps” (2012–13). In 2010, Tsou was awarded a $1.3 million research grant funded by the National Science Foundation and served as the principal investigator (PI) of the “Mapping Ideas from Cyberspace to Realspace” (http://mappingideas.sdsu.edu/) research project (2010–14). This NSF-CDI project integrated GIS, computational linguistics, web search engines, and social media APIs to track and analyze publicly accessible websites and social media (tweets) for visualizing and analyzing the diffusion of information and ideas in cyberspace. In Spring 2014, Tsou established the HDMA Center, a transdisciplinary research area of excellence at San Diego State University that integrates research from GIScience, public health, social science, sociology, and communication. In Fall 2014, Tsou received an NSF Interdisciplinary Behavioral and Social Science Research (IBSS) award for “Spatiotemporal Modeling of Human Dynamics Across Social Media and Social Networks” (Award #1416509, $999,887, 2014–18, http://socialmedia.sdsu.edu/). This large interdisciplinary research project studies human dynamics across social media and social networks, focusing on information diffusion modeling over time and space and on the connection between online activities and real-world human behaviors (including disaster evacuation, vaccine exemption, etc.). Tsou is also involved with several GIS education projects for K-12 and higher education. He has served on the AP GIS&T course advisory board at the AAG, as a senior researcher in the National GeoTech Center, and as the Geospatial Technology Coordinator in the California Geographic Alliance to promote GIS education in universities, community colleges, and high schools. Tsou has conducted annual professional GIS training workshops for GIS teachers at San Diego State University over the last 10 years (http://geoinfo.sdsu.edu/hightech/).


CONTRIBUTORS TO VOLUME 1

Jochen Albrecht, Hunter College, City University of New York, New York, NY, United States
Reem Y Ali, University of Minnesota Twin Cities, Minneapolis, MN, United States
Li An, San Diego State University, San Diego, CA, United States
Marc P Armstrong, The University of Iowa, Iowa City, IA, United States
Hyowon Ban, California State University, Long Beach, CA, United States
Saad Saleem Bhatti, University of Cambridge, Cambridge, United Kingdom
Kai Cao, National University of Singapore, Singapore
Jeremy W Crampton, University of Kentucky, Lexington, KY, United States
Andrew Crooks, George Mason University, Fairfax, VA, United States
Kevin M Curtin, George Mason University, Fairfax, VA, United States
Jie Dai, San Diego State University, San Diego, CA, United States; and University of California, Santa Barbara, CA, United States
Emre Eftelioglu, University of Minnesota Twin Cities, Minneapolis, MN, United States
Rob Feick, University of Waterloo, Waterloo, ON, Canada
Colin J Ferster, University of Victoria, Victoria, BC, Canada
Shaun Fontanella, Ohio State University, Columbus, OH, United States
Sven Fuhrmann, George Mason University, Fairfax, VA, United States
Song Gao, University of California, Santa Barbara, CA, United States
Rina Ghose, University of Wisconsin-Milwaukee, Milwaukee, WI, United States
Michael F Goodchild, University of California, Santa Barbara, CA, United States
Jacob Hartz, Rochester Institute of Technology, Rochester, NY, United States
Nick Hedley, Simon Fraser University, Burnaby, BC, Canada
Alison Heppenstall, University of Leeds, Leeds, United Kingdom
Paul Holloway, University of York, York, United Kingdom
Yingjie Hu, University of Tennessee, Knoxville, TN, United States
Miaoqing Huang, University of Arkansas, Fayetteville, AR, United States
Eric M Huntley, University of Kentucky, Lexington, KY, United States
Zhe Jiang, University of Alabama, Tuscaloosa, AL, United States
Emily C Kaufman, University of Kentucky, Lexington, KY, United States
Angelina Konovitz-Davern, Rochester Institute of Technology, Rochester, NY, United States
Dapeng Li, Michigan State University, East Lansing, MI, United States
Linna Li, California State University, Long Beach, CA, United States
Yan Li, University of Minnesota Twin Cities, Minneapolis, MN, United States
Gengchen Mai, University of California, Santa Barbara, CA, United States
Jacek Malczewski, Western University, London, ON, Canada
Nick Malleson, University of Leeds, Leeds, United Kingdom
Ashely Miller, Rochester Institute of Technology, Rochester, NY, United States
Jennifer A Miller, University of Texas at Austin, Austin, TX, United States
Atsushi Nara, San Diego State University, San Diego, CA, United States
Trisalyn Nelson, Arizona State University, Tempe, AZ, United States
José Pedro Reis, University of Cambridge, Cambridge, United Kingdom
Colin Robertson, Wilfrid Laurier University, Waterloo, ON, Canada
David Schwartz, Rochester Institute of Technology, Rochester, NY, United States
Dara E Seidl, San Diego State University, San Diego, CA, United States
Shashi Shekhar, University of Minnesota Twin Cities, Minneapolis, MN, United States
Xuan Shi, University of Arkansas, Fayetteville, AR, United States
Lena Siedentopp, United Nations University Institute for Environment and Human Security, Bonn, Germany
Elisabete A Silva, University of Cambridge, Cambridge, United Kingdom
Scott Simmons, Open Geospatial Consortium, Fort Collins, CO, United States
Joerg Szarzynski, United Nations University Institute for Environment and Human Security, Bonn, Germany
Zhenyu Tan, Wuhan University, Wuhan, China
Xun Tang, University of Minnesota Twin Cities, Minneapolis, MN, United States
Brian Tomaszewski, Rochester Institute of Technology, Rochester, NY, United States
Ming-Hsiang Tsou, San Diego State University, San Diego, CA, United States
Fahui Wang, Louisiana State University, Baton Rouge, LA, United States
Suzanne P Wechsler, California State University, Long Beach, CA, United States
David W S Wong, George Mason University, Fairfax, VA, United States
Ningchuan Xiao, Ohio State University, Columbus, OH, United States
Bo Xu, California State University, San Bernardino, CA, United States
Chen Xu, University of Wyoming, Laramie, WY, United States
Xinyue Ye, Kent State University, Kent, OH, United States
Eun-Hye Yoo, University at Buffalo, Buffalo, NY, United States
Peng Yue, Wuhan University, Wuhan, China

CONTENTS OF VOLUME 1

Editor in Chief
Volume Editors
Contributors to Volume 1
Contents of All Volumes
Preface

New Perspectives on GIS (Multidisciplinary)
1.01 The Future Development of GISystems, GIScience, and GIServices (Ming-Hsiang Tsou)
1.02 Geocomputation: Data, Methods, and Applications in a New Era (Shaun Fontanella and Ningchuan Xiao)

Data Management
1.03 Big Geodata (Michael F Goodchild)
1.04 Current Themes in Volunteered Geographic Information (Colin J Ferster, Trisalyn Nelson, Colin Robertson, and Rob Feick)
1.05 Open Data and Open Source GIS (Xinyue Ye)
1.06 GIS Databases and NoSQL Databases (Peng Yue and Zhenyu Tan)
1.07 Geospatial Semantics (Yingjie Hu)
1.08 Geocoding and Reverse Geocoding (Dapeng Li)
1.09 Metadata and Spatial Data Infrastructure (Scott Simmons)

Spatial Analysis and Modeling
1.10 Spatial Analysis Methods (David W S Wong and Fahui Wang)
1.11 Big Data Analytic Frameworks for GIS (Amazon EC2, Hadoop, Spark) (Chen Xu)
1.12 Network Analysis (Kevin M Curtin)
1.13 Analysis and Modeling of Movement (Paul Holloway and Jennifer A Miller)
1.14 Spatial Metrics: The Static and Dynamic Perspectives (Saad Saleem Bhatti, José Pedro Reis, and Elisabete A Silva)
1.15 Multicriteria Analysis (Jacek Malczewski)
1.16 Agent-Based Modeling (Andrew Crooks, Alison Heppenstall, and Nick Malleson)
1.17 Spatial Optimization for Sustainable Land Use Planning (Kai Cao)
1.18 Geostatistical Approach to Spatial Data Transformation (Eun-Hye Yoo)

Space-Time GIS
1.19 Spatial and Spatiotemporal Data Mining (Shashi Shekhar, Yan Li, Reem Y Ali, Emre Eftelioglu, Xun Tang, and Zhe Jiang)
1.20 Space-Time GIS and Its Evolution (Atsushi Nara)
1.21 Time Geography (Jie Dai and Li An)

Spatial Data Quality
1.22 Spatial Data Uncertainty (Linna Li, Hyowon Ban, Suzanne P Wechsler, and Bo Xu)

Cyberinfrastructure and GIS
1.23 Cyberinfrastructure and High-Performance Computing (Xuan Shi and Miaoqing Huang)

Virtual GIS
1.24 Augmented Reality and GIS (Nick Hedley)
1.25 GIS and Serious Games (Brian Tomaszewski, Angelina Konovitz-Davern, David Schwartz, Joerg Szarzynski, Lena Siedentopp, Ashely Miller, and Jacob Hartz)

Mobile GIS
1.26 Mobile GIS and Location-Based Services (Song Gao and Gengchen Mai)

Public GIS
1.27 Societal Impacts and Ethics of GIS (Jeremy W Crampton, Eric M Huntley, and Emily C Kaufman)
1.28 Geoprivacy (Marc P Armstrong, Ming-Hsiang Tsou, and Dara E Seidl)
1.29 Defining Public Participation GIS (Rina Ghose)

GIS Design and Project Management
1.30 User-Centered Design for Geoinformation Technologies (Sven Fuhrmann)
1.31 GIS Project Management (Jochen Albrecht)

CONTENTS OF ALL VOLUMES

VOLUME 1: GIS METHODS AND TECHNIQUES
(Chapters 1.01–1.31, as listed in Contents of Volume 1 above.)

VOLUME 2: GIS APPLICATIONS FOR ENVIRONMENT AND RESOURCES

GIS for Biophysical Environment
2.01 GIS for Mapping Vegetation (Georg Bareth and Guido Waldhoff)
2.02 GIS for Paleo-limnological Studies (Yongwei Sheng, Austin Madson, and Chunqiao Song)
2.03 GIS and Soil (Federica Lucà, Gabriele Buttafuoco, and Oreste Terranova)
2.04 GIS for Hydrology (Wolfgang Korres and Karl Schneider)
2.05 GIS Applications in Geomorphology (Jan-Christoph Otto, Günther Prasicek, Jan Blöthe, and Lothar Schrott)
2.06 GIS for Glaciers and Glacial Landforms (Tobias Bolch and David Loibl)
2.07 GIS and Remote Sensing Applications in Wetland Mapping and Monitoring (Qiusheng Wu)

GIS for Resources
2.08 GIS for Natural Resources (Mineral, Energy, and Water) (Wendy Zhou, Matthew D Minnick, and Celena Cui)

GIS for Energy
2.09 GIS for Urban Energy Analysis (Chaosu Li)

GIS and Climate Change
2.10 GIS in Climatology and Meteorology (Jürgen Böhner and Benjamin Bechtel)
2.11 GIS and Coastal Vulnerability to Climate Change (Sierra Woodruff, Kristen A Vitro, and Todd K BenDor)

GIS for Disaster Management
2.12 Assessment of GIS-Based Machine Learning Algorithms for Spatial Modeling of Landslide Susceptibility: Case Study in Iran (Alireza Motevalli, Hamid Reza Pourghasemi, and Mohsen Zabihi)
2.13 Data Integration and Web Mapping for Extreme Heat Event Preparedness (Bev Wilson)

GIS for Agriculture and Aquaculture
2.14 GIS Technologies for Sustainable Aquaculture (Lynne Falconer, Trevor Telfer, Kim Long Pham, and Lindsay Ross)
2.15 An Integrated Approach to Promote Precision Farming as a Measure Toward Reduced-Input Agriculture in Northern Greece Using a Spatial Decision Support System (Thomas K Alexandridis, Agamemnon Andrianopoulos, George Galanis, Eleni Kalopesa, Agathoklis Dimitrakos, Fotios Katsogiannos, and George Zalidis)

GIS for Land Use and Transportation Planning
2.16 GIS and Placemaking Using Social Media Data (Yan Chen)
2.17 GIS and Scenario Analysis: Tools for Better Urban Planning (Arnab Chakraborty and Andrew McMillan)
2.18 Transit GIS (Qisheng Pan, Ming Zhang, Zhengdong Huang, and Xuejun Liu)
2.19 Modeling Land-Use Change in Complex Urban Environments (Brian Deal, Haozhi Pan, and Youshan Zhuang)
2.20 Application of GIS-Based Models for Land-Use Planning in China (Huang Xianjin, Li Huan, He Jinliao, and Zong Yueguang)
2.21 GIS Graph Tool for Modeling: Urban–Rural Relationships (Paulo Morgado, Patrícia Abrantes, and Eduardo Gomes)

VOLUME 3: GIS APPLICATIONS FOR SOCIO-ECONOMICS AND HUMANITY

GIS for Economics
3.01 GIS and Spatial Statistics/Econometrics: An Overview (Daniel A Griffith and Yongwan Chun)
3.02 Estimating Supply Elasticities for Residential Real Estate in the United Kingdom (Thies Lindenthal)
3.03 Forced Displacement and Local Development in Colombia: Spatial Econometrics Analyses (Néstor Garza and Sandra Rodriguez)
3.04 Searching for Local Economic Development and Innovation: A Review of Mapping Methodologies to Support Policymaking (Alexander Kleibrink and Juan Mateos)
3.05 An Agent-Based Model of Global Carbon Mitigation Through Bilateral Negotiation Under Economic Constraints: The Key Role of Stakeholders' Feedback and Facilitated Focus Groups and Meetings in the Development of Behavioral Models of Decision-Making (Douglas Crawford-Brown, Helin Liu, and Elisabete A Silva)

GIS for Business and Management
3.06 GIS-Based Approach to Analyze the Spatial Opportunities for Knowledge-Intensive Businesses (Mei Lin Yeo, Saad Saleem Bhatti, and Elisabete A Silva)

GIS for History
3.07 GIS for History: An Overview (N Jiang and D Hu)
3.08 PastPlace Historical Gazetteer (Humphrey Southall, Michael Stoner, and Paula Aucott)
3.09 Collaborative Historical Information Analysis (Patrick Manning, Pieter François, Daniel Hoyer, and Vladimir Zadorozhny)
3.10 A Review on the Current Progress in Chinese Historical GIS Research (Peiyao Zhang, Ning Bao, and Kai Cao)

GIS for Linguistics
3.11 GIS in Linguistic Research (Jay Lee, Jiajun Qiao, and Dong Han)
3.12 GIS in Comparative-Historical Linguistics Research: Tai Languages (Wei Luo, John Hartmann, Fahui Wang, Huang Pingwen, Vinya Sysamouth, Jinfeng Li, and Xuezhi Cang)

GIS for Politics
3.13 Spatial Dimensions of American Politics (Iris Hui and Wendy K Tam Cho)
3.14 GIS-Enabled Mapping of Electoral Landscape of Support for Political Parties in Australia (Robert J Stimson, Prem Chhetri, and Tung-Kai Shyy)

GIS for Law and Regulations
3.15 A Global Administrative Solution to Title and Tenure Insecurity: The Implementation of a Global Title and Rights Registry (C Kat Grimsley)
3.16 Revamping Urban Immovable Property Tax System by Using GIS and MIS: A Case Study of Reforming Urban Taxation Systems Using Spatial Tools and Technology (Nasir Javed, Ehsan Saqib, Abdul Razaq, and Urooj Saeed)

GIS for Human Behavior
3.17 Urban Dynamics and GIScience (Chenghu Zhou, Tao Pei, Jun Xu, Ting Ma, Zide Fan, and Jianghao Wang)
3.18 Sensing and Modeling Human Behavior Using Social Media and Mobile Data (Abhinav Mehrotra and Mirco Musolesi)
3.19 GIS-Based Social Spatial Behavior Studies: A Case Study in Nanjing University Utilizing Mobile Data (Bo Wang, Feng Zhen, Xiao Qin, Shoujia Zhu, Yupei Jiang, and Yang Cao)
3.20 The Study of the Effects of Built Form on Pedestrian Activities: A GIS-Based Integrated Approach (Ye Zhang, Ying Jin, Koen Steemers, and Kai Cao)
3.21 The Fusion of GIS and Building Information Modeling for Big Data Analytics in Managing Development Sites (Weisheng Lu, Yi Peng, Fan Xue, Ke Chen, Yuhan Niu, and Xi Chen)

GIS for Evidence-Based Policy Making
3.22 Smarter Than Smart Cities: GIS and Spatial Analysis for Socio-Economic Applications That Recover Humanistic Media and Visualization (Annette M Kim)
3.23 Comparing Global Spatial Data on Deforestation for Institutional Analysis in Africa (Aiora Zabala)
3.24 Constructing a Map of Physiological Equivalent Temperature by Spatial Analysis Techniques (Poh-Chin Lai, Pui-Yun Paulina Wong, Wei Cheng, Thuan-Quoc Thach, Crystal Choi, Man Sing Wong, Alexander Krämer, and Chit-Ming Wong)
3.25 GIS-Based Accessibility Analysis of Health-Care Facilities: A Case Study in Hong Kong (Wenting Zhang, Kai Cao, Shaobo Liu, and Bo Huang)
3.26 From Base Map to Inductive Mapping: Three Cases of GIS Implementation in Cities of Karnataka, India (Christine Richter)
3.27 Using GIS to Understand Schools and Neighborhoods (Linda Loubert)

Index

PREFACE

Since its inception in the 1960s, the Geographic Information System (GIS) has undergone tremendous development, rendering it a technology widely used for geospatial data management and analysis. The past several decades have also witnessed increasing applications of GIS in a plethora of areas, including environment, energy, resources, economics, planning, transportation, logistics, business, and the humanities. The rapid development of GIS is partly due to advances in computational technologies and the increasing availability of geospatial data such as satellite imagery and GPS traces. Along with the technological development of GIS, its underlying theory has progressed significantly, especially on data representation, data analysis, and uncertainty. As a result, the theory, technology, and application of GIS have made great strides, making this the right time to summarize such developments comprehensively. Comprehensive Geographic Information Systems (CGIS) is the outcome.

CGIS provides an in-depth, state-of-the-art review of GIS, with an emphasis on basic theories, systematic methods, current technologies, and applications in many different areas, covering not only the physical environment but also socioeconomics. Organized into three volumes (GIS methods and techniques; GIS applications for environment and resources; and GIS applications for socioeconomics and humanity), the book comprises 79 chapters, providing comprehensive coverage of various aspects of GIS. In particular, a rich set of applications in socioeconomics and humanity is presented. Authored and peer-reviewed by recognized scholars in the area of GIS, each chapter provides an overview of the topic, the methods used, and case studies.

The first volume covers a wide spectrum of topics related to GIS methods and techniques, ranging from data management and analysis to various new types of GIS, e.g., virtual GIS and mobile GIS. While fundamental topics such as data management, data analysis, and data quality are included, the latest developments in space–time GIS, cyber GIS, virtual GIS, mobile GIS, and public GIS are also covered, along with new perspectives on GIS and geocomputation. The further development of GIS is driven by the demands of applications, and various new data may be required; big data has emerged as an opportunity to fuel GIS development. Mike Goodchild provides an overview of such data, followed by a chapter on volunteered geographic information, an important part of big geodata. Open data is closely related to big data but is not the same thing: open data is publicly accessible data. Spatial analysis is indispensable for a GIS; after an overview of spatial analysis methods, big data analytics, spatial metrics, spatial optimization, and other relevant topics are included. Space and time are interrelated, and their integration has long been an active research area in GIS; this section covers space–time data mining, space–time GIS, and time geography. Drawing on developments in computer science and engineering, GIS has evolved to become more powerful through integration with virtual reality and wireless technologies. This volume provides new insights into different designs of GIS catering to the widespread needs of applications, and will be of great interest not just to GIS researchers but also to computer scientists and engineers.

Environment and resources are fundamental to human society. The second volume focuses on GIS applications in these areas. GIS has been widely used in areas related to natural environments, so applications covering vegetation, soil, hydrology, geomorphology, wetlands, glaciers and glacial landforms, and paleolimnology are included. Resources and energy are closely related to the environment, and applications in these aspects are covered as well. Climate change represents a challenge to sustainable human development, partly because it is increasing the odds of more extreme weather events. GIS has been capitalized on to address the related issues, from climatology and meteorology to disaster management and vulnerability analysis. Parallel to applications for the natural environment, resources, energy, and climate, GIS has also been applied to human production activities, such as agriculture and aquaculture, which are also covered in this volume, along with the built environment and its associated topics such as placemaking, public transit, and land use modeling and planning.

Parallel to the second volume, the third volume covers applications of GIS in socioeconomics and the humanities. Such applications are comparatively fewer than those in environment and resources; however, with the increasing availability of data that capture human activities, more applications have emerged in areas including economics, business management, history, linguistics, politics, law, human behavior, and policy making. Starting from Dan Griffith's overview of GIS and spatial statistics/econometrics, GIS applications in real estate, local economic development, and carbon mitigation are covered. Innovation drives economic growth in today's knowledge-based economy; its relationship with GIS is covered in both the economics and business management sections. GIS applications in history, linguistics, politics, and law are also included. Human behavior has been given renewed emphasis due to the advent of social media and other types of big data; the first chapter in that section provides an overview of urban dynamics and geographic information science, and several chapters are devoted to this topic. Finding evidence to support socioeconomic policy making is a highly important contribution that GIS can make, and several chapters of this volume are devoted to it.

This book could not have been completed without the help and advice of many people. First, I would like to acknowledge the enthusiastic support of an outstanding editorial team: Thomas Cova and Ming-Hsiang Tsou (Volume 1); Yan Song, Georg Bareth, and Chunqiao Song (Volume 2); and Kai Cao and Elisabete Silva (Volume 3). From the initial discussions of the structure of the book, through the selection of authors for chapters in different volumes, to the encouragement of authors and review of chapters, they made significant contributions at each stage, and I am very grateful for their invaluable input and hard work. I would also like to express my sincere gratitude to the production team at Elsevier (Priscilla, Paula, Katie, and in particular Laura) for their many efforts, perseverance, and skillful management of every aspect of this project. Last and certainly not least, I am hugely indebted to all of our authors; we have been extraordinarily fortunate in attracting individuals from all over the world who took time from their busy schedules to prepare this set of contributions. Finally, my special thanks go to my wife Rongrong and our daughter Kate for their love, help, and understanding; without their endless support, this book would never have been completed.

Bo Huang, Editor in Chief

PERMISSION ACKNOWLEDGMENTS

The following material is reproduced with kind permission of Taylor & Francis (www.taylorandfrancisgroup.com):

Figures 6, 7, and 12: Spatial Analysis Methods
Figures 2, 3, 4, 5, and 6 and Tables 2 and 3: GIS for Linguistic Research
Figure 1: GIS and Scenario Analysis: Tools for Better Urban Planning
Figure 4: GIS Applications in Geomorphology
Figures 8 and 18: GIS for Glaciers and Glacial Landforms
Tables 1 and 2: Spatial Metrics: The Static and Dynamic Perspectives
Figures 6, 7, and 8: Urban Dynamics and GIScience
Table 2: Using GIS to Understand Schools and Neighborhoods

1.01 The Future Development of GISystems, GIScience, and GIServices

Ming-Hsiang Tsou, San Diego State University, San Diego, CA, United States
© 2018 Elsevier Inc. All rights reserved.

1.01.1 Introduction
1.01.2 The Future Development of GISystems
1.01.3 The Future Development of GIServices
1.01.4 The Future Development of GIScience
1.01.5 The Future Societal Impacts of GIS Development
References

1.01.1 Introduction

In the last decade, innovative computing technologies and new software applications have transformed GIS from centralized, function-oriented Geographic Information Systems (GISystems) into distributed, user-centered Geospatial Information Services (GIServices). Many new web services, open data, big data, geospatial cyberinfrastructure, mobile apps, and web map application programming interfaces (APIs) have become essential components within the GIS ecosystem. The fundamental knowledge of Geographic Information Science (GIScience) is also changing dramatically. GIS databases have shifted from relational databases to NoSQL databases. Data collection methods have changed from paper-based digitization procedures to GPS tracking, volunteered geographic information (VGI), and crowdsourcing. GIS software is transforming from desktop standalone programs to mobile app design and Cloud-based web services. This article introduces some prominent future development directions for three aspects of GIS: GISystems, GIServices, and GIScience. Before describing these future technological advances in detail, it is important to define the three aspects of GIS and their associated contents:

- GIS is the abbreviation for geographic information systems, geospatial information services, or geographic information science. It is a multifaceted research and technology domain and a generalized concept for describing geospatial technologies, applications, and knowledge.
- Geographic Information Systems (GISystems) focus on the development of computing software/hardware for conducting mapping and spatial analysis functions. Run-time performance, system architecture, information process flow, geocoding, user interface design, and database management are several key issues for the development of GISystems.
- Geospatial Information Services (GIServices) represent the service perspective of GIS, that is, delivering geospatial information, mapping services, and spatial analysis tools to end users over the Internet or mobile devices. Usability and User Experience (UX) are essential components for evaluating the effectiveness of GIServices.
- Geographic Information Science (GIScience) is “the development and use of theories, methods, technology, and data for understanding geographic processes, relationships, and patterns” (Mark, 2003; UCGIS, 2016 [2002], p. 1). GIScience is question-driven and follows scientific methods (questions, hypothesis, testing, analysis, and falsification).
- Geospatial cyberinfrastructure is the combination of distributed high-performance geospatial computing resources, comprehensive geospatial data coverages, wireless mobile networks, real-time geotagged information, geoprocessing web services, and geographic knowledge. Its goal is to facilitate the advancement of GIScience research, geospatial information services, and GIS applications (modified from Zhang and Tsou, 2009).

The main driving force of future GIS development will be the advancement of geospatial cyberinfrastructure, which can enable fast and robust GISystems, provide smart and intelligent GIServices, and transform GIScience from a specialized scientific discipline into an important research domain bridging data science, computer science, and geography. The following sections provide some prominent predictions about the future development of GISystems, GIServices, and GIScience.

1.01.2 The Future Development of GISystems

There are four unstoppable trends in the future development of GISystems: (1) Web-based and Cloud-based GIS; (2) personalized data collection methods via mobile apps, drones, digital cameras, and portable LIDAR devices; (3) high-performance computing (HPC) and dynamic data storage services; and (4) lightweight and responsive mapping APIs with lightweight geodata exchange formats. In the future, traditional desktop GIS software (such as ArcGIS, ArcGIS Pro, QGIS, gvSIG, uDIG, and MapInfo) probably will be used only by a small group of GIS professionals (20%) who need to handle sensitive or protected geospatial data within local and secured workstations. Most GIS users (80%) will utilize Web GIS and Cloud computing frameworks, such as ArcGIS online, Google Maps, Google Earth, MapBox, and CartoDB toolboxes, to conduct GIS tasks and spatial analysis functions.
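As a small, hedged illustration of how the Web GIS platforms mentioned above typically organize map imagery, the Python sketch below computes standard Web Mercator (XYZ, “slippy map”) tile indices for a longitude/latitude pair; the tile URL template is a placeholder rather than a real service endpoint, and the sample coordinates are approximate.

import math

def lonlat_to_tile(lon, lat, zoom):
    """Convert a WGS84 lon/lat pair to XYZ tile indices (Web Mercator tiling scheme)."""
    n = 2 ** zoom  # number of tiles per axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

# Which tile covers downtown San Diego at zoom level 12? (coordinates approximate)
x, y = lonlat_to_tile(-117.1611, 32.7157, 12)
print(f"https://tiles.example.com/12/{x}/{y}.png")  # placeholder tile URL template

Because every zoom level doubles the number of tiles per axis, this simple addressing scheme is what lets cloud-hosted map services cache and serve imagery to millions of users without computing maps on the fly.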


Mobile GIS apps will be the main personalized data collection tool for future GIS applications. Several popular tools, such as ESRI Survey123, ESRI Collector, and GIS Cloud, can enable GIS data collection via mobile phones and combine photos, videos, surveys, and GPS coordinates into online databases directly. GIS data collected via mobile devices will be uploaded or synced via wireless communication to Cloud-based databases or storage services. Other personalized geospatial data collection devices, such as Unmanned Aircraft Systems (UAS, or drones), digital cameras, portable 3D LIDAR scanning systems, and mapping vehicles (such as Google Street View cars), will be integrated into Web GIS or Cloud GIS platforms seamlessly to provide high-resolution aerial photos, street views, or digital elevation models for various GIS applications.

Many GIS operations are computationally intensive and require huge amounts of memory or data storage space. The recent development of big data and HPC frameworks, such as Hadoop, Apache Spark, and MapReduce, can be applied in future GIS data models and databases. These big data computing frameworks will enhance the performance of GIS operations significantly. However, the main challenge will be how to make GIS data and spatial analysis operations suitable for parallel execution and how to set up the cluster-computing frameworks for various GIS applications. Another promising direction is to utilize graphics processing units (GPUs) for the intensive 3D or animation display of GIS applications.

The future development of GIS software programs will also become more lightweight and customizable for different applications. Some new web mapping service APIs and libraries, such as Leaflet, MapBox, CartoDB, and ArcGIS Online, can provide dynamic mapping or spatial query functions for lightweight web apps or mobile apps (Tsou et al., 2015). GIS is no longer a large standalone system equipped with hundreds of functions inside a box but rather a customizable service framework, which can provide fast and simple GIS functions and services to end users (Tsou, 2011). Along with the development of lightweight mapping functions (such as Leaflet), lightweight data exchange formats (such as GeoJSON) will become very popular in Web GIS applications. GeoJSON is a JSON (JavaScript Object Notation)-based geospatial data-interchange format for web apps or mobile apps. It is a text-based data format utilizing JSON, decimal coordinate systems, and a predefined projection framework. Software developers can easily develop dynamic and responsive Web GIS by using lightweight mapping APIs and GeoJSON.

In summary, GISystems have evolved from mainframe computers (in the 1970s and 1980s) to desktop GIS (in the 1990s), to Web GIS (in the 2000s), and to mobile apps (in the 2010s). The performance and functionality of GISystems have improved significantly to meet the needs of various GIS users and applications. In the near future, every single GISystem can be linked and integrated into a global geospatial cyberinfrastructure (with hundreds of thousands of GIS nodes across the whole world) (Tsou and Buttenfield, 2002). These dynamic GIS nodes can provide personalized and customizable GIServices for various users. The next section provides some good examples of future GIServices.
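To make the GeoJSON exchange format described above concrete, here is a minimal sketch using only Python's standard json module; the feature's attribute names and coordinates are invented for illustration.

import json

# A minimal GeoJSON FeatureCollection holding one point feature.
# GeoJSON coordinates are decimal degrees in [longitude, latitude] order (WGS84).
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [-117.1611, 32.7157]},
            "properties": {"name": "San Diego", "category": "city"},  # illustrative attributes
        }
    ],
}

geojson_text = json.dumps(feature_collection)
print(geojson_text)  # this text can be handed directly to web mapping APIs such as Leaflet

Because the format is plain text, such documents can be served by any web server and parsed on the client without GIS-specific libraries, which is exactly what makes it attractive for lightweight web and mobile apps.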

1.01.3 The Future Development of GIServices

GIServices are essential in our daily life. This section focuses on four types of important GIServices and discusses their future development: navigation services, web mapping services, spatial query and analysis services, and location-based services (LBS). Navigation services are probably the most popular and heavily used GIServices in both mobile apps and web apps today. Popular navigation service platforms include Google Maps, Apple maps, HERE, Navigator for ArcGIS, MapQuest, and Bing maps. Uber app is another good example of navigation services for both drivers and passengers. Navigation services required comprehensive base road maps with detailed points of interests (POIs), real-time road condition, and traffic updates. One major application of navigation services in the future will be the development of self-driving cars (autonomous car). Self-driving cars will require a seamless integration between the navigation services and the sensor data (cameras, LIDAR, etc.) collected in real time on each vehicle. The future development of navigation services will need to integrate with all traffic cameras, weather stations, and the sensors collected from nearby vehicles. Hundreds of nearby autonomous cars will create a “mesh network” dynamically, and each nearby self-driving car can provide and relay traffic data via wireless communication to each other. The mesh network can provide real-time traffic and road condition updates automatically. All nearby autonomous cars can cooperate in the distribution of traffic data and navigation services together. Recently, web mapping services have been applied in various mobile apps and GIS applications. For example, Pokémon GO utilized popular Google Mapping services to create a virtual world for users to catch monsters, eggs, and treasures. Zilliow and Foursquare used Google Mapping services to provide locational information and maps for their customers. Several prominent web mapping service developers, such as MapBox and CartoDB, have developed interactive, responsive, and fast mapping services to different GIS applications. One challenge of web mapping services is to provide effective map display on multiple devices using the same map contents. For example, users will need to display a campus map on his/her smart watches (320  320), mobile phones (750  1334), high-resolution computer screen (3840  1600), and smart 8K UHD TV (7680  4320) simultaneously. Advanced map generalization and intelligent cartographic mapping principles will be developed to transform web maps into responsive display for fitting different devices and screen resolutions. Web mapping services will provide both 2D and 3D display functions for next generation of web map applications for virtual reality (VR) and augmented reality (AR) applications. In terms of spatial query and analysis services, one future application for utilizing these services will be the development of Smart Cities and Smart Transportation Systems (Smart Traffic Controls). For example, a visitor will be able to use his/her smart phone to query the best walking/running route nearby the hotel and to avoid unsafe areas and heavy traffic zones in real time. Car drivers will get advice and warning about possible traffic jams nearby and provide alternative routing options (which is already available in Google Maps now). One major future application of spatial analysis services could come from a virtual personal assistant in


Such an assistant could recommend shopping, eating, driving and parking, movie watching, dating, and exercising choices near the user's location. Some advanced spatial analysis functions, such as clustered dots and hot spot analysis, can be applied to crowd management for music concerts, conference meetings, and popular events.

LBS focus on the collection of consumer information based on the location of users and the nearby environment. LBS can include or combine with navigation services, web mapping services, and spatial analysis services. However, LBS focus only on nearby information or POIs, rather than providing information far away from the users. Currently, the outdoor locations of users can be determined by GPS signals, Wi-Fi signatures, and cellular tower signal triangulation. One current technological challenge of LBS is how to provide a better and more accurate indoor positioning system (IPS). Possible technological frameworks for IPS include Wi-Fi access point signal triangulation, magnetic positioning, iBeacon, RFID tags, etc. However, most IPSs require the setup of indoor environment labels in advance or the 3D scanning of each room before positioning can begin. Some potential LBS applications for IPS are hospital patient room arrangement, conference exhibit halls, and popular event promotions.
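To make the "nearby only" character of LBS concrete, here is a hedged Python sketch that filters a list of points of interest by great-circle distance from a user's position. The POI list, the coordinates, and the 1 km radius are all hypothetical; a production LBS would run such a query inside a spatial database rather than in application code.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))  # mean Earth radius of ~6371 km

# Hypothetical POIs; a real service would query a spatial database.
pois = [
    {"name": "Cafe A", "lat": 32.7765, "lon": -117.0700},
    {"name": "Museum B", "lat": 32.7311, "lon": -117.1495},
]

# User position from GPS, Wi-Fi signatures, or cell tower triangulation.
user_lat, user_lon = 32.7757, -117.0719

# Keep only POIs within 1 km, sorted nearest first.
nearby = sorted(
    (p for p in pois
     if haversine_km(user_lat, user_lon, p["lat"], p["lon"]) <= 1.0),
    key=lambda p: haversine_km(user_lat, user_lon, p["lat"], p["lon"]),
)
print([p["name"] for p in nearby])
```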

1.01.4

The Future Development of GIScience

The knowledge domain of GIScience will change dramatically in the next decade, driven by new types of GIServices and new designs of GISystems. Four research topics in GIScience are highlighted in this section as representative trends: machine learning methods, crowdsourced data, new data models for big data, and human dynamics.

Machine learning methods are derived from the development of artificial intelligence (AI) and statistical models. The GIScience community has developed a few applications utilizing AI and expert systems before (Openshaw and Openshaw, 1997). However, due to the lack of programming skills and suitable HPC frameworks, very few GIS researchers have developed fully functional AI or expert systems for GIS applications. Some cartographers have developed limited expert systems for intelligent mapping, text labeling, and symbolization functions. The recent development of geospatial cyberinfrastructure and easy-to-learn programming languages, such as Python and R, has enabled GIScientists to utilize powerful machine learning methods to develop intelligent web mapping and spatial analysis functions. Several machine learning methods (such as K-means, logistic regression, decision trees, deep learning, principal component analysis (PCA), support vector machines (SVM), and Naïve Bayes) can be applied to GIS data classification, map symbolization, spatial analysis, spatial pattern detection, and geovisualization. For example, dasymetric mapping methods can be improved by using SVM or Naïve Bayes to estimate population density based on different types of land use and land cover. Geographically weighted regression (GWR) models can adopt PCA to better explain multiple variables' contributions to the targeted data layer.

Crowdsourcing and citizen science have become major data input methods in GIScience. VGI is one popular type of crowdsourced data; some VGI applications include OpenStreetMap, Waze, and iNaturalist. Other crowdsourced data input methods, which are not VGI, include geotagged social media data, wearable sensor data for mHealth, and GPS tracking data from bikes or taxis. One major challenge of crowdsourced data is how to assess the credibility and accuracy of the collected data. Since crowdsourced data contain many errors, it is extremely important to develop effective data filtering, data cleaning, and data validation procedures. Sampling problems and user biases are other major concerns: for example, social media users (such as those on Twitter and Instagram) are mostly under age 35 and live in urban areas, and most volunteers working on OpenStreetMap are white males with full-time jobs.

Traditional GIS data models include the vector-based object data model and the raster-based field data model. However, very few geodatabases can provide effective space–time relationships for advanced spatiotemporal analysis. Along with new types of big data collection (such as social media and crowdsourced data), many traditional GIS data models are no longer suitable for big geodata. NoSQL databases (such as MongoDB) and new space–time data models will become more popular in the future, and researchers can utilize these new data models to build more effective and customizable geospatial data analytics.
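As one hedged illustration of applying a machine learning method to geospatial data, the following Python sketch uses scikit-learn's K-means to group point locations into spatial clusters whose centers could be mapped as activity hot spots. The coordinates are randomly generated stand-ins for real geotagged data, and over larger extents the points should be projected to a planar coordinate system before clustering.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic geotagged points as (longitude, latitude) pairs around two
# hypothetical activity centers.
rng = np.random.default_rng(0)
downtown = rng.normal([-117.16, 32.72], 0.01, size=(100, 2))
campus = rng.normal([-117.07, 32.78], 0.01, size=(100, 2))
points = np.vstack([downtown, campus])

# K-means partitions the points into k clusters; the cluster centers
# approximate the hot-spot locations.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.cluster_centers_)  # approximate hot-spot coordinates
print(kmeans.labels_[:5])       # cluster assignments of the first points
```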
Another emerging research topic in GIScience is human dynamics, which can be defined as a transdisciplinary research field focusing on the understanding of dynamic patterns, relationships, narratives, changes, and transitions of human activities, behaviors, and communications. Many scientific research projects (in the fields of public health, GIScience, civil engineering, and computer science) are trying to study human dynamics and human behaviors. One main goal of these projects is to develop effective intervention methods to modify or change human behaviors in order to resolve public health problems (such as obesity, disease outbreaks, and smoking) or transportation problems (traffic jams and vehicle incidents). Several innovative data collection methods can be applied to study human dynamics. For example, researchers can use computer vision algorithms to analyze Google Street View images and estimate built environment indices and neighborhood social status. CCTV footage in urban areas, combined with street traffic cameras, can be used to analyze the usage of bike lanes and biking behaviors in different communities and neighborhoods. The frequency of geotagged social media check-ins can be used to estimate dynamic changes in population density to support disaster evacuation decision support systems.

1.01.5

The Future Societal Impacts of GIS Development

This article has highlighted several prominent applications and topics in the future development of GISystems, GIServices, and GIScience. Many GIS researchers may believe that the advancement of future GIS applications can provide better information services


for the general public and improve the quality of life for everyone. However, the spatial disparity of geospatial technology and the potential digital discrimination between rural and urban areas could trigger serious social problems and social unrest in the future. Since the development of geospatial cyberinfrastructure is expensive and uneven, there are huge gaps in cyberinfrastructure between rural and urban areas, between developed and developing countries, and between the rich and the poor. For example, major cities in the United States have the most up-to-date high-resolution aerial photos, while some African regions have only low-resolution satellite images taken 10 years ago. Google Street View is updated frequently in New York and San Francisco, but many small US cities have no Google Street View coverage at all. The spatial disparity of geospatial infrastructure can create "digital discrimination" against people who live in low-income and rural areas. As future GIServices such as self-driving cars and smart transportation systems develop, people who live in rural and low-income areas will not be able to access these advanced GIServices. The advancement of geospatial technology will exacerbate the digital discrimination and the digital divide between urban and rural areas: the rich get richer and the poor get poorer. To prevent these potential social problems and unrest, local and federal governments need to make significant investments in geospatial cyberinfrastructure in rural and low-income areas to reduce the disparities of GIServices across regions. Hopefully, everyone can then enjoy the progress of GISystems and GIServices without worrying about potential social unrest in the future.

References

Mark, D.M., 2003. Geographic information science: Defining the field. Foundations of Geographic Information Science 1, 3–18.
Openshaw, S., Openshaw, C., 1997. Artificial intelligence in geography. Wiley, Chichester.
Tsou, M.H., 2011. Revisiting web cartography in the United States: The rise of user-centered design. Cartography and Geographic Information Science 38 (3), 249–256.
Tsou, M.H., Buttenfield, B.P., 2002. A dynamic architecture for distributing geographic information services. Transactions in GIS 6 (4), 355–381.
Tsou, M.H., Jung, C.T., Allen, C., Yang, J.A., Gawron, J.M., Spitzberg, B.H., Han, S., 2015. Social media analytics and research test-bed (SMART dashboard). In: Proceedings of the 2015 International Conference on Social Media & Society. ACM, New York, p. 2.
University Consortium for Geographic Information Science (UCGIS) (2016 [2002]). UCGIS bylaws (revised in 2016). http://www.ucgis.org/assets/docs/ucgis_bylaws_march2016.pdf (accessed 6 March 2017).
Zhang, T., Tsou, M.H., 2009. Developing a grid-enabled spatial Web portal for Internet GIServices and geospatial cyberinfrastructure. International Journal of Geographical Information Science 23 (5), 605–630.

1.02

Geocomputation: Data, Methods, and Applications in a New Era

Shaun Fontanella and Ningchuan Xiao, Ohio State University, Columbus, OH, United States
© 2018 Elsevier Inc. All rights reserved.

1.02.1 Introduction
1.02.2 Early Stage of Geocomputation and GIS
1.02.3 Computing on the World Wide Web
1.02.3.1 Spatial Databases
1.02.3.2 Spatial Data as Web Services
1.02.3.3 New Data Formats
1.02.3.4 New Challenges of the Web
1.02.4 The Move to the Cloud
1.02.4.1 Host Virtualization
1.02.4.1.1 Why virtualization
1.02.4.1.2 Implementing host virtualization
1.02.4.2 Containerization
1.02.4.3 Application Hosting
1.02.4.4 Cloud Computation
1.02.4.5 Software as a Service
1.02.5 Computational Methods
1.02.5.1 Visualization
1.02.5.2 Machine Learning
1.02.5.3 Spatial Optimization
1.02.5.4 Spatial Simulation
1.02.6 Visualizing Twitter Data
1.02.7 Conclusion
References

1.02.1

Introduction

Geographers embraced computing technology in their research and applications in the very early years of computers (Chrisman, 2006). Ideas about using computational methods to solve geographic problems started to emerge in the 1960s (Chorley and Haggett, 1967). This trend gave rise to geographic information systems (GIS), which quickly became the dominant terminology in almost every field that involves spatial data. However, many quantitative geographers appeared to refuse to equate GIS with the computational needs and applications of geography. As a consequence, 1996 saw the first GeoComputation conference, held by the Department of Geography at the University of Leeds. The organizers of this conference described geocomputation as a new paradigm and declared the "dawning of a new era of geocomputation" (Openshaw and Abrahart, 1996). Themes of the first GeoComputation conference included high-performance computing, artificial intelligence, and GIS, but the extent and scope of the conference series, along with the research field, quickly changed to embrace broader technological advances and application domains (Gahegan, 1999).

Recent years have seen a clear trend in computing technology that has started to shape the field of geocomputation. Many traditional geocomputational tasks such as mapping and spatial analysis have moved off the desktop and onto Web-based GIS platforms. Entire computational workflows, including data collection, analysis, and mapping, can be done on hardware-agnostic webpages. In addition to these new geocomputational capabilities, data-driven geography (Miller and Goodchild, 2015) has replaced the data paucity of early geocomputation and GIS. To be sure, traditional desktop platforms will still have their place, but for those with limited resources and technical skills, the Web offers a powerful geocomputational platform at costs significantly lower than in the past.

In this new era of Web GIS, location-aware mobile devices, inexpensive cloud computing, and widespread broadband connectivity tend to dominate the conversation. However, in the background, the Web connects all of these technologies together. Without the Web, many of these technologies would be stranded on islands of spatial data computing as they were in the previous era, unable to communicate because of incompatible programs and protocols. The modern Web has connected the many islands of computing together and become the home of much of the data and processing power.

This article will discuss significant steps in the progress of computing technology and then present a case study that embodies many of the modern aspects of geocomputation. In the next section, we discuss computation in GIS by providing a narrative of data gathering and management techniques. In section "Computing on the World Wide Web", cloud infrastructure will be explained. We reinforce the idea that the cloud is what makes many of the new computing capabilities possible, as it provides the facilities to build and host the Web. Cloud infrastructure has removed much of the friction from the process of deploying GIS applications.


It has lowered the barriers to entry to the Web and allowed individuals and small companies access to what was once prohibitively expensive technology. In section "The Move to the Cloud", we continue the discussion of cloud computing with a focus on the enabling techniques. In section "Computational Methods", we overview some of the computational methods that can be used to provide behind-the-scenes analysis of spatial data. In the final part of this article, we present a case study that employs many of the themes described above; the case study application lives in the cloud, where it actively collects data for updated analysis and visualization.

It must be pointed out that in this article we often use the terms geocomputation and GIS in an almost interchangeable fashion when discussing computational issues in handling spatial data. We recognize that GIS is a much broader term that involves issues beyond computation. But we also recognize that it is difficult to clearly delineate the strictly computational aspects from what we would normally call GIS, even though the term geocomputation did not become part of the literature until the late 1990s. In the text that follows, when the term GIS is used, we intend to focus on the computational aspect of GIS.

1.02.2

Early Stage of Geocomputation and GIS

For those new to geocomputation and GIS, it is easy to think that the current environment, where computing resources and spatial data are abundant, has always been the norm. This is not so. The explosion of available data and geocomputation resources is a phenomenon as recent as the last decade. There were two previous eras of geocomputation and GIS that were much different, and it is important to have some historical perspective to give context to the current environment. As this collection has an interdisciplinary scope, a review of the geocomputation landscape will be useful to readers coming to this work from other disciplines. We roughly identify three eras of developments in geocomputation and GIS.

In the first era, computing resources and spatial data were scarce. In the very beginning of the discipline, computation on spatial data was done on large and highly expensive mainframe computers. It was mostly governments and universities who could afford these mainframe computers, and so they were among the few who could produce and analyze digital spatial data. Indeed, the term GIS was coined in the 1960s by Roger Tomlinson's group, who were working on the Canada Geographic Information System (Tomlinson, 1998). The Canada GIS was a large government program that used expensive computer equipment to solve a spatial problem that could not feasibly be completed through manual analysis by humans using paper maps. It was only at that scale that computing on spatial data made economic sense. In this early era, computers were as large as whole rooms. Programs and data were stored and entered into a computer through punch cards and magnetic tape. Most output went to a printer, not to a screen, which was an expensive and optional add-on. The fundamental ideas of geocomputation also saw their roots in this early era, but resources were so scarce that little progress could be made.

Much of the early data were digitized from existing paper maps, which could be stitched together to form larger data sets. Aerial imagery taken with film cameras was also a source of spatial data (Campbell and Wynne, 2011); photos could be digitized for analysis using scanning equipment (Faust, 1998). Aerial imagery produced from photographs gave way to remote sensing when electronic sensors began to take images without film and outside the range of the visible spectrum. Just like spatial data analysis, data collection required expensive equipment and labor. Availability of spatial data and computing resources was very limited for most researchers in this first era.

In the second era of geocomputation and GIS, Moore's Law (Moore, 1965) eventually made computing cheap enough to enable GIS on desktop computers. New commercial as well as free software became available for mapping and analysis (Hart and Dolbear, 2013; Goran, 1998). This move from the mainframe to the desktop was a significant shift in GIS. The drop in price was significant enough that both public and private sectors started building capacities to collect and process geographic data sets. However, during this second period, many of these computers were still islands of computing. If they were attached to networks, those networks often used LAN (local area network) protocols like IPX/SPX to share resources such as file servers and printers on small isolated networks. These protocols could not communicate with the larger Internet. If data were to be shared, they often had to be put on physical media and transferred.
The data island effect could (and still does) happen for reasons other than technical ones. GIS interoperability can also be restricted for a number of institutional reasons. For instance, in the United States there has been little coordination or standardization among mapping agencies. Maps created at the county level often do not align their data or share standards with adjacent counties, and this happens again at the state level. Even when the same software and file formats are used, researchers trying to create regional maps have been faced with collecting data from multiple entities. Despite powerful computers and fast networks, computing spatial data can still be impeded by institutional barriers.

GPS (global positioning system) started to be used in this second period of geocomputation and GIS development. GPS has become an important tool because it makes spatial data easier and cheaper to collect. GPS accelerated the collection of spatial data but was limited in its early applications. GPS was originally restricted to military use. When GPS was made available to the public, the signal was intentionally degraded to decrease accuracy to 50 m. It wasn't until the year 2000 that the signal degradation was turned off and GPS receivers could achieve good accuracy without assistance from ground stations. GPS greatly increased the speed of accurate data collection. However, in the early period of GPS use, it could still be an expensive, technical task to take a unit into the field, collect data, and upload those data to specialized and often expensive mapping software for analysis and mapping. It wasn't until later, when GPS points could be automatically uploaded and mapped through Web connectivity, that GPS truly exploded as a collection device.


The Internet slowly started to extend into the islands of computing, first by phone lines and later through broadband. The Internet's communication protocol, TCP/IP, weeded out competing standards, and eventually most computers were connected to the Internet. At this point, though, data were still mostly static files transferred through traditional Internet protocols like file transfer protocol (FTP) for larger files and email for smaller files. More complex data containers like spatial databases existed but were often secured behind firewalls, limiting their usefulness.

During these early periods, computation of spatial data was conducted using a variety of static data files. Some of the most common early forms of static data, such as ASCII text files, would be recognized today. These were efficient means of sharing tabular data that associated attributes with geographic entities, and they worked well for describing where points were located. However, as better collection methods produced more complex features and geographies, ASCII files started to show their limitations. For discrete vector data, combining binary and text formats was a more efficient way of storing data. Formats like the shapefile (ESRI, 1998) were able to store complex geometries in compressed binary form while still linking to text files for attribute data.

Static data files have served the purpose of spatial data computation well, but they have many limitations, which have become more obvious as networks have tied together millions of computers. One problem with static files is that the entire file has to be downloaded and parsed even if just one row of information is needed. Static files may be very large; some data from the US Census are multiple gigabytes in size. Getting these files may not be a problem on a fast network with a powerful computer, but on a mobile network or a resource-limited device, it may take a long time to download and parse the data. The growing size of static files poses problems for many commercial off-the-shelf programs. Most productivity applications like Microsoft Excel are not designed for large data sets; they have built-in limits to the amount of data they can handle, trading capacity for performance and simplicity. Large static files may require special programs or programming scripts for computation, or for extracting more manageable data sets that can be manipulated with productivity software.

Another limitation of static files is that they have problems holding multiple types of data efficiently; they are usually designed around one particular type of data. Researchers can now collect much more data from a diverse landscape of sources, and one of the increasingly important trends is to gather data from individual silos and combine them for new analysis. Static files are poorly suited to complex queries across multiple tables of data. These types of research are better suited to databases with the ability to index data in multiple dimensions. Static files still have a place for publishing many data types and they will be around for a long time, but they are increasingly sidelined by databases and Web services. Particularly in Web mapping applications, sending all possible data to a client and having the client computer sift through them is a problematic model on bandwidth-limited mobile devices.
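To make the cost of large static files concrete, the following Python sketch (using pandas) extracts one county's rows from a hypothetical multi-gigabyte census CSV by streaming it in chunks. The file name and column name are assumptions; the point is that the whole file must still be read even though only a sliver of it is needed.

```python
import pandas as pd

# Stream the hypothetical extract 100,000 rows at a time instead of
# loading the entire static file into memory at once.
chunks = pd.read_csv("census_blocks.csv", chunksize=100_000)

# Keep only the rows for one county, discarding each chunk afterward.
county = pd.concat(
    chunk[chunk["county_fips"] == "39049"] for chunk in chunks
)
print(len(county), "rows extracted")
```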
Spatial databases are becoming more popular because they now have the utility and security of Web technologies, which we discuss below.

1.02.3

Computing on the World Wide Web

The Web has been one of the greatest influences on geocomputation and GIS, and its effects can be observed in multiple ways. To begin with, it has become a platform for collecting, analyzing, and visualizing GIS data. Online mapping platforms from Carto, Mapbox, Tableau, and Esri allow users to present spatial data on platform-agnostic Web interfaces from any computer while the computational tasks are performed on the server side, behind the scenes. In addition, the Web has standardized communication protocols and formats so that data can flow easily between systems. This important bridging function will be explained in greater detail below.

It is important to make the distinction between the Web and the Internet, as to most people they are the same. Speaking simplistically, the Web is an application that runs through the communication pipe provided by the Internet. It may surprise some that the first four nodes of the ARPANET, the precursor to the Internet, were connected in the summer of 1969 (Lukasik, 2011; Fidler and Currie, 2015). The Web was not developed by Tim Berners-Lee until the early 1990s. By this time email, file transfer, newsgroups, and other networked programs were well established on the Internet. At that time, too, there were other competing systems for sharing documents, such as Gopher and the wide area information server (WAIS). Berners-Lee credits the flexibility, open standards, and decentralized nature of the Web for making it the eventual winner for document sharing (Berners-Lee et al., 2000). These attributes also made it well suited for spatial data.

Tim Berners-Lee's description of one of the first real successful uses of the Web is a great example of the transition in data access created by the Web, a transition echoed with data in the GIS world. At CERN, there was frequent staff turnover because research projects were often temporary. Printed phone books could not keep up with the constant change and were inevitably inaccurate. An electronic, up-to-date version of the phone book could be accessed through the mainframe computer. However, accessing the mainframe required credentials, and even when logged in, sessions would time out to free up the limited number of licenses. Researchers had to log in every time they wanted to check a phone number. A webpage was created that allowed read-only access to the phone number database from any computer on the network running the Web software (Berners-Lee et al., 2000). Requests were instantaneous and stateless, so they didn't tie up connections. They also didn't require authentication, as the phone book was public data. Many more hosts could be served with the same resources. This was a great example of the Web providing access to public information that had been limited through authentication and commercial licensing.

Three components are most crucial in making the Web work: a set of communication protocols and file standards (Fu and Sun, 2010) that govern how data are transmitted. The first component is the hypertext transfer protocol (HTTP).


HTTP transfers data between client and server using an IP address and a port at that address. Ports differentiate applications on the same host: if a server is thought of as a warehouse, the IP address is the street address and the port is the dock door. Applications have officially designated ports maintained by the Internet Assigned Numbers Authority so that applications know the default port to communicate on. For instance, email servers use port 25 to send mail, and PostgreSQL servers use port 5432. (The game DOOM is assigned port 666.) These ports are open on local networks, where access to a network can be physically controlled, but it is unwise to open many application ports to the wider Internet for security reasons. Ports that serve out data are therefore blocked by firewalls where the local network interfaces with the Internet. This is where the Web serves a valuable function.

One of the important capabilities of HTTP is the bridging function it performs for many applications. Web servers run on port 80 for HTTP and port 443 for encrypted traffic over HTTPS. Since these ports are already open for webpages, Web servers can act as proxies for resources secured behind a firewall. This is commonly the case for GIS servers: they take requests on port 80, contact a database such as a PostgreSQL server on port 5432 to request data, and return the results on port 80 to the client. Web servers allow protected resources on a network to be shared more securely with the Internet, one of the most important functions they serve.

The second component of the Web, hypertext markup language (HTML), is the base file format for sending webpages. HTML contains the content, such as the stock quotes and recipe ingredients, that is on a webpage. The presentation of the content can be enhanced with cascading style sheets (CSS), Javascript, and other browser extensions like Flash or Silverlight, but the base container is HTML. HTML is an evolving standard; it is expanding to embrace capabilities that were once provided by add-ons like Flash or Silverlight, and as it evolves, it continues to bring more sophisticated capabilities to the platform-agnostic Web.

The final important part of the Web is the address system that allows resources to be accessed on the Web. Uniform resource locators (URLs) are the Web addresses that describe where a resource is located. The first part of a URL is the hostname of the computer that has the desired data; this part builds upon the already established domain name system that allows Internet-connected computers to find each other. The second part of the URL describes the resource requested from the server. This part can be a file name, a search query, or data being returned to the server, and it is one of the things that makes the Web so powerful: as long as the server getting the request knows how to parse the URL, it can contain a wide array of text. This gives developers great latitude when programming applications for Web servers.

The Web has come a long way since its inception. Beautiful, well-designed webpages have replaced earlier, much cruder ones. But the facade of sophisticated webpages hides even more important changes to the way webpages communicate data. Modern webpages rely on data being passed between client and server silently in the background, often automatically and without any interaction from the user.
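The following Python sketch (using the requests library) shows how the URL parts described above come together in a request to a Web service. The hostname, path, and query parameter names are hypothetical; what matters is the pattern of hostname plus resource plus query string that the server parses on arrival.

```python
import requests

# Hypothetical endpoint: the scheme implies port 443, and the hostname is
# resolved through DNS.
url = "https://gis.example.org/api/sidewalks"

# Query parameters become the final part of the URL; the server decides
# how to parse them (these names are illustrative only).
params = {"condition": "poor", "limit": 100, "format": "geojson"}

response = requests.get(url, params=params, timeout=10)

# The full URL sent, e.g.:
# https://gis.example.org/api/sidewalks?condition=poor&limit=100&format=geojson
print(response.url)

features = response.json()  # parse the JSON body into Python objects
```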
Predictive typing that guesses what a user is searching for on Google and prepopulates the search box is an example of data communication that is transparent to the webpage user. In the case of maps, the page needs data to update the map when a user interacts with it by panning, zooming, or switching layers on and off. The collection of protocols and technologies that communicate in the background to update the map seamlessly is part of what is known as Web 2.0 (Goodchild, 2007). Web 2.0 includes many themes of two-way communication between actors instead of one-way consumption from few servers to many consumers (O'Reilly, 2007), but this work will limit itself to a handful of protocols that make Web 2.0 work for Web GIS.

"Slippy" maps are a great example of Web 2.0 in the realm of geocomputation and GIS. Maps on the Web existed before Web 2.0, but they were much more difficult to use: panning or zooming required reloading the entire webpage with the new map extent and zoom, and of course new ads. Web 2.0 makes maps much more usable than these early versions. It transmits data in the background and only updates the region of the webpage that holds the map, a function provided mostly through techniques like asynchronous Javascript and XML (AJAX). These behind-the-scenes protocols have made data communication agnostic to the various operating systems, databases, and applications that share data all over the world. This standardization is helping provide access to many more data sets that were previously available only through tedious manual collection methods. It is one of the primary reasons that the problem of Big Data (Miller and Goodchild, 2015) exists: the variety and volume of data that can now be collected are exceeding our ability to process and store them. All of these available data have pushed researchers to use more sophisticated storage methods than traditional static files. Just as computer hardware has decreased in price, software, much of it through open-source development, has become cheaper while at the same time becoming more sophisticated. In geocomputation and GIS, this has been the case with spatial databases. Spatial databases are the best method to maximize the research potential of the growing tide of collected data, and they are discussed in the next section.

1.02.3.1

Spatial Databases

Spatial databases address many of the limitations of static data files. Spatial databases can contain large amounts of data in multiple tables with linking mechanisms that maintain data integrity. They can enforce restrictions on data entry to limit the collection of inconsistent data. As they grow, they can span multiple physical machines as well as maintain copies of themselves for redundancy. Spatial databases can also track changes to a set of spatial data and record which users are making edits, for approval and auditing.

It is important to make the distinction between typical databases and spatial databases. Spatial databases are standard databases that have been extended to accept spatial data types and queries. Spatial data types store feature geometry that describes shape and location; the geometry of spatial features is compressed and stored in a binary field along with the attribute data that describe the feature. In addition, the database application code is extended so that typical queries using alphanumeric characters and logical operators can also take advantage of location, proximity, and topology.


For instance, in a typical customer database a company might query for all customers whose last name begins with a certain letter. In a spatial database, they can query for all customers within proximity of a particular store or find clusters of customers. These types of spatial transactions are not available in typical databases.

Spatial databases existed long before the Web, but they were often hidden behind firewalls or authentication and were unavailable to most users. Because spatial databases are usually hosted on powerful servers and are always connected to the Internet, they may become targets for hackers. Even with user authentication and read-only access, malicious users may attempt to gain unauthorized access to the database by exploiting bugs in the database or operating system code. That was the case when SQL Slammer used a buffer overrun exploit to take control of over 75,000 SQL servers and brought the Internet to a crawl (Microsoft, 2003).

Traditional software vendors like Microsoft, Oracle, and IBM all have database offerings that can be used for geocomputation and GIS applications. Database software is often very expensive to implement, in terms of both the cost of the software and the labor to deploy it. GIS software has benefited greatly from open-source software, and spatial databases are no exception. For researchers prototyping applications on little or no budget, open-source software offers several free and robust spatial database alternatives. The most popular open-source spatial database is PostgreSQL, extended with PostGIS.

An emerging trend in databases is the more flexible "NoSQL" database. Also called "document databases," these databases store data in a format much like a JSON file. They do not have the schemas that enforce data integrity in typical databases; schemaless databases are less structurally stringent and can hold dissimilar data, but queries must then be designed to deal with that dissimilarity. The most popular schemaless open-source spatial database is MongoDB, which has a growing set of spatial capabilities.

Spatial databases have many advantages over static files, but they too come with disadvantages. Database servers require significant hardware resources to host large data sets. The servers have to fit into existing networks with security policies and firewalls. In most cases, users have to be authenticated, which requires maintaining user accounts and permissions.
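As a hedged sketch of the spatial queries discussed above, the following Python snippet sends a PostGIS ST_DWithin query through the psycopg2 driver to find all customers within 2 km of a store. The connection settings, table, and column names are hypothetical; ST_DWithin on geography values filters by a radius given in meters.

```python
import psycopg2

# Placeholder connection details; real credentials belong in secure
# configuration, not in source code.
conn = psycopg2.connect(host="localhost", dbname="gisdb",
                        user="gis_reader", password="secret")

# Hypothetical schema: a customers table with a 'geom' point column in
# WGS 84 (SRID 4326). Casting to geography makes 2000 a distance in meters.
sql = """
    SELECT name
    FROM customers
    WHERE ST_DWithin(
        geom::geography,
        ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography,
        2000
    );
"""
with conn, conn.cursor() as cur:
    cur.execute(sql, (-117.0719, 32.7757))  # store longitude, latitude
    for (name,) in cur.fetchall():
        print(name)
```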

1.02.3.2

Spatial Data as Web Services

The place where the Web and GIS really connected is the joining of spatial data and Web services. This function is provided by GIS servers that take spatial data from static files, file geodatabases, and enterprise databases and serve them out using HTTP. These endpoints are called Web services. They can serve geographic data as text that describes features, as rendered map tiles, or as raster coverages. In addition to read-only publishing, Web services can also allow editing of data from Web-connected devices. These Web services have greatly expanded the amount of data available for geocomputation. In addition to providing data for maps, they also allow users to access data programmatically through code: software such as Python or R can pull data from any Web service and perform analysis on it. The Web has essentially connected the entire Internet to the command line.

Data transmitted by Web services may take several forms, as spatial data can represent the same set of features in multiple ways. For instance, a feature representing air quality can be represented by GeoJSON text with latitude and longitude values, or by a set of JPEG images stitched together to make a map. A browser can consume the spatial data in any of these formats and make a map that looks the same; the method of transfer is transparent to the user.

An example that demonstrates the various ways that data may be served can be seen in a simple map of a city's sidewalks. Individual sections of sidewalk may exist in a spatial database with attributes that track each section's condition, ADA compliance, or inspection date. In a large city, there will be hundreds of thousands of these segments. When exposing these data through a Web service, the segments can be transmitted as text describing each of the hundreds of thousands of sidewalk segments, or as map tiles rendered from the spatial data in the database. From a performance perspective, it is much quicker to send the tiles; however, the ability to get feature attributes may be lost. If the segments are sent as text, care must be taken to filter the number of features transmitted, or the map will become useless as it hangs trying to download too many records. Multiple factors must be considered in determining how spatial data are transmitted: a GIS must balance detail, function, and performance. Too much data and a GIS will be too slow to be usable; too little data and a map is incomplete or lacks context.

The format of spatial data provided by a service depends on the type of Web service it is. Three common types are Web map services (WMS), Web feature services (WFS), and Web coverage services (WCS). Most people are familiar with WMS, which transmit tiles rendered from spatial data to a client. If these tiles are prerendered, they can be cached on the server to cut down the time needed to render the images. It may take significant resources to render and store a cached tile service, but there is a significant performance increase for services with many features. Google Maps and OpenStreetMap are examples of this approach: the slippy map that looks like a solid canvas is actually many tiles stitched together. Web feature services (WFS) return text data that describe features; geometry and attributes are transmitted in GML, JSON, or another format, and the browser renders the text data into features and draws a map.
The attribute data are often used to symbolize each feature or create hover effects (Fig. 1). Web services do not have to implement a full GIS server to serve geographic data. Web services can serve spatial data in text form with simple geographic information encoded, such as latitude and longitude. For instance, Web services are often used as a front end for scientific instruments such as air quality meters or weather stations, which provide real-time spatial data through a Web service. Many equipment manufacturers have switched from proprietary software and interfaces to standards-based interfaces using Web services and formats like XML and JSON. These interfaces can then be accessed from any computer connected to the Internet using a browser, or programmatically through languages like Python or R.
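The bike-share feed in Fig. 1 illustrates this pattern: a plain text response with simple encoded geography. A minimal Python sketch for consuming such a feed might look like the following; the JSON mirrors the structure shown in the figure, and in practice the text would arrive from an HTTP request rather than a string literal.

```python
import json

# A response shaped like the station feed in Fig. 1.
raw = """{"executionTime": "2017-01-18 06:19:03 AM",
          "stationBeanList": [{"id": 1,
            "stationName": "Bicentennial Park",
            "availableDocks": 9, "totalDocks": 19,
            "latitude": 39.955864, "longitude": -83.003106,
            "statusValue": "In Service", "availableBikes": 10}]}"""

feed = json.loads(raw)
for station in feed["stationBeanList"]:
    # Each record carries latitude/longitude plus attributes that could
    # symbolize a map marker.
    print(station["stationName"],
          station["latitude"], station["longitude"],
          "{}/{} bikes".format(station["availableBikes"],
                               station["totalDocks"]))
```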


Fig. 1 Spatial data translated by a server into tiles and JSON. [Figure: a GIS server reads multiple spatial data formats (a PostgreSQL server, a shapefile) and serves them through ports 80/443 as a Web map service (rendered tiles) and a Web feature service (GeoJSON text, shown in the figure as a bike-share station feed with station name, dock counts, coordinates, and status).]

In addition to simply serving data, some Web services can perform geoprocessing tasks (OGC, 2016). A geoprocessing Web service can take input from a webpage form or URL request, calculate results using spatial data, and return data to be visualized on a map. For instance, entering a value in a form or clicking on a map can calculate a drive-time radius around a selected point. These geoprocessing tasks were once only available through desktop software; they can now be done over the Web in a browser.

Even if a website does not have a formal Web service, researchers can still collect data from regular webpages using a technique called scraping. Scraping uses software to visit a website and parse through the HTML code that makes up the page, looking for particular data. For instance, data on gas prices can be scraped from crowd-sourced gas price sites like gasbuddy.com. These scraped data can be stored in a database for further analysis.

Data accessed through any of these sources may be immediately plotted on a map for display, or captured and saved. Saved data can be analyzed for temporal patterns that may not be immediately apparent through simple display. They can also be correlated with other collected data; for example, air quality may be correlated with traffic data, precipitation, or electricity use by factories. Each of these data sources is published by a different entity, but combining them may reveal new patterns.
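A hedged sketch of such a scraping workflow in Python, using the requests and BeautifulSoup libraries, is shown below. The URL and the assumed HTML structure (a table of station prices) are purely illustrative, since every site's markup differs and must be inspected before writing selectors.

```python
import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# Hypothetical crowd-sourced price page; both the URL and the table
# structure are assumptions for illustration.
html = requests.get("https://prices.example.org/stations", timeout=10).text
soup = BeautifulSoup(html, "html.parser")

records = []
for row in soup.select("table.prices tr")[1:]:  # skip the header row
    cells = [c.get_text(strip=True) for c in row.find_all("td")]
    if len(cells) >= 3:
        station, address, price = cells[:3]
        records.append({"station": station, "address": address,
                        "price": float(price.lstrip("$"))})

# The scraped records could now be geocoded and stored in a database
# for temporal analysis.
print(records[:5])
```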

1.02.3.3

New Data Formats

The advances in data sharing made on the Web have led to new file formats for transferring data. Unlike many file types of the past, these formats are generally based on open standards. For instance, geography markup language (GML) is used to transfer data from Open Geospatial Consortium (OGC) Web feature services.

Another newer file format is Keyhole Markup Language (KML). KML files are a form of extensible markup language (XML). KML can store spatial data as well as information used for visualization: KML files can record the location and orientation of a viewer of the spatial data, which mapping software uses to define how the viewer sees the data. KML is often associated with Google, as it bought the company that created the standard, and KML is used in several Google products including their mapping API and Google Earth. However, KML has become an open standard and was accepted by the OGC for official standardization in 2008 (OGC, 2008).

XML-based files are useful for small collections of features. They can be opened and read with any text editor, and file contents can be easily comprehended. A drawback of KML files is that they often contain a significant amount of redundant data. In XML files, data are arranged in hierarchical trees that help describe the data. This is not a problem if the data set is small; however, if thousands of features are in a KML file, a large amount of redundant data has to be transmitted and parsed. XML files are being used less frequently as other formats have become more popular.

The inefficiencies of XML led to a search for more efficient transfer formats. One format that became popular was Javascript object notation (JSON). JSON is more efficient to transfer than KML; it stores data as sets of key-value pairs. One of the early sites to implement JSON and popularize its use was Twitter, and the large number of sites integrating Twitter data exposed many programmers to JSON and spread its use.
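Returning to KML's verbosity, here is a short Python sketch that builds a single KML placemark with the standard library's ElementTree. Note how many nested elements wrap one coordinate pair; this is the hierarchical redundancy that multiplies across thousands of features.

```python
import xml.etree.ElementTree as ET

# Build a minimal KML document containing one placemark.
kml = ET.Element("kml", xmlns="http://www.opengis.net/kml/2.2")
doc = ET.SubElement(kml, "Document")
placemark = ET.SubElement(doc, "Placemark")
ET.SubElement(placemark, "name").text = "Example POI"
point = ET.SubElement(placemark, "Point")
# KML coordinates are longitude,latitude[,altitude].
ET.SubElement(point, "coordinates").text = "-83.003106,39.955864,0"

print(ET.tostring(kml, encoding="unicode"))
```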


JSON is useful for transmitting tweets and sports scores, but it originally did not have a formal format for spatial data. To address this, an extension of JSON called GeoJSON was created (Butler et al., 2016). GeoJSON was explicitly designed for spatial data and has become one of the most popular formats for transmitting feature data. One limitation of GeoJSON is that it does not store topology. To address this issue and provide a more compact format, TopoJSON, an extension of GeoJSON, was created (Bostock and Metcalf, 2016).

1.02.3.4

New Challenges of the Web

The standardization of data transfer that the Web enabled has made much more data available to researchers. In addition to traditional providers of geographic data, many instruments and devices now share data. This is not a new problem for GIS or, going farther back, for general cartography. Arthur Robinson wrote about cartography in 1952, at the very beginning of digital computers: "The ability to gather and reproduce data has far outstripped our ability to present it" (Robinson, 1952). That was before GPS was available on billions of smart devices. The explosion of data is not a new problem, but it is getting closer to the end user, who in the past was not involved in the data process. It used to be that the cartographer filtered and generalized data to create a paper map. When desktop GIS came along, the cartographic process of filtering and generalizing data became a dynamic process for the GIS operator. For a GIS to be usable, data are often filtered and symbolized at different scales, but this process was still hidden from the end user; the varying symbologies that make GIS analysis work well on the desktop are often invisible in the maps that are the output of a GIS. With the Web platform, the end user is ever closer to the data: switching layers and basemaps, searching, filtering, and performing analysis.

Free data published on the Web are a boon for researchers, but they also come with risks. There is no guarantee that the data will be available online at all times. Sites may go down for maintenance or upgrades, and site upgrades may change Web service URL syntax or paths to data. These changes may break a data collection process or a map based on published data. Researchers must think about the long-term goals of their data collection and research; if they feel the data sources may not be stable, it may be necessary to extract and self-host the data.

With all the available data just a few lines of code away, there is a temptation for the Web GIS builder to put all the available data into a GIS application and let users find what they want. This type of application has come to be known as "kitchen sink" GIS. These applications are often confusing and laden with domain-specific terms and acronyms. Strategies are emerging to produce more curated GIS applications.

As described above, data may be published in multiple formats. In some cases the researcher can collect whichever format is desired. In many cases, though, data are published in a format that is inconvenient or unusable for the researcher. In these cases, data must be downloaded and transformed into a usable format, which can take significant technical skill.

1.02.4

The Move to the Cloud

Research using the Web services described above has been aided by the move of computing infrastructure into the cloud. The term cloud infrastructure has become a buzzword of late. In general, it means purchasing computing resources as services from centralized providers connected to the Internet instead of as physical entities. These services may be hosted by an off-premises commercial provider, or they can be localized data centers within a large organization. They can even be a combination of the two, with some data and services on premises and older data in "cold storage" offsite. With cloud infrastructure, customers do not have to consider any of the complex details of building advanced networks; they just rent capacity.

Amazon was the first major provider of cloud infrastructure. Amazon began providing cloud services in earnest when it started renting out excess capacity installed for its online store (Wingfield, 2016). Other large providers soon followed, and cloud infrastructure is now available from many providers including Microsoft, Google, Digital Ocean, and Rackspace. Infrastructure has now been commoditized to such a degree that entire clusters of servers can be purchased and deployed in hours with nothing more than a credit card. As Moore's Law and competition take hold of the market, infrastructure services are getting even cheaper. These new resources, available outside the restrictions and limitations of institutional network policies, offer many opportunities for researchers to experiment, to build small prototypes that demonstrate feasibility, and to conduct resource-intensive research on a temporary basis without having to buy permanent hardware that will soon be obsolete.

Infrastructure services exist at different levels. Perhaps the most common example is putting files into Dropbox, Google Drive, or Microsoft OneDrive and having them available through a browser on any computer or mobile device. In the past, it was common for these files to be stored on a file server on a school or work network and be unavailable off the network. This is another example of an old technology being made more useful and flexible through Web services. File storage is just a small part of the stack of available services. At the base of the stack is "bare metal" hosting, where hardware or virtual hardware is provided for a customer. Moving up the stack, applications like databases and Web servers can be rented that share space on common hosts. At the top are services like computation and file storage. We discuss some of these in more depth below.

1.02.4.1

Host Virtualization

Host virtualization refers to running operating systems on a virtual set of hardware instead of physical hardware (Fig. 2). Each virtual machine (VM) thinks that it is running on physical hardware.


Fig. 2 Several virtual machines on a physical host. [Figure: rack-mounted physical hardware is mapped through virtualization to multiple virtual machines sharing one physical host.]

Multiple VMs, each with its own configuration, applications, and security, can share a single set of hardware. Each VM has a complete installation of files just like a physical machine and can be addressed from the network as an individual server. However, these VMs all share the same physical host machine.

1.02.4.1.1

Why virtualization

In the past, data centers were often full of servers that each ran a single application needing its own hardware because it conflicted with other software. These servers used only a small fraction of the machine's resources when active and idled the rest of the day. The hardware for these servers was expensive, and when many of them were put in one room, they took a large amount of energy to run and cool. Today, data centers full of physical hardware are being condensed onto powerful VM host servers that can contain hundreds of separate VMs. The increasing capacity provided through Moore's Law has meant that hardware has vastly outpaced the needs of most users and applications; server hardware is now powerful enough to host multiple operating systems without performance degradation.

There are many advantages to host virtualization. A VM exists as one large file: all of its operating system files, applications, and data are contained in that one file. Because they are one file, VMs can be backed up or moved to different hardware. In addition, a differencing file can be used to snapshot the state of a VM before any significant change, such as a software installation or configuration modification, is made. If the change encounters problems, it is easy to roll back the system state by reverting to the snapshot.

Another advantage of virtualization is the ability to create VMs from images that already have particular configurations of software installed. Base VM configurations can be stored in libraries so that new machines can be provisioned by simply copying a VM image from a library and configuring it, avoiding the time and expertise needed to complete a fresh install of all the software. These libraries can be privately hosted or public; both Amazon and Microsoft keep libraries of VM images.

VMs have virtual hardware that can be adjusted, prioritized, and shared. Small test VM servers can share limited resources, while important production VMs can be provisioned more processors and memory. Provisioning may be done on a temporary basis: if a research project is going to run resource-intensive activities, it can be scheduled for off-peak hours when the other VMs on a physical host will have little activity. In the commercial world, seasonal companies like tax preparers or holiday merchants may pay for more virtual capacity during their peak season and throttle back during the off season.

Prototyping is another useful application of virtualization. VMs can be prototyped on a client and then moved to a server environment with more resources when the VM is ready. This allows flexibility of configuration and testing while off a production network. VMs may be used on fast networks disconnected from the Internet or on low-bandwidth connections and then migrated to high-speed networks. For instance, one workflow might have a researcher build a virtual machine while disconnected from the Internet on a long airplane flight. After arrival at a remote disconnected site, the researcher may use local wireless to share and collect data with other researchers. During remote data collection, the VM can be temporarily moved to a high-speed Internet connection to sync data with a home institution's computers. When researchers return to their institution, they may choose to migrate the VM to robust hardware to conduct resource-intensive data processing.
The entire VM, with all of its data and configuration, can be moved to a server infrastructure by moving one file.

Virtualization has environmental advantages as well. Since network connectivity is the only requirement for management, data centers full of VM hosts can be located with their environmental footprint in mind. Locating where renewable energy is abundant decreases a data center's carbon footprint. A huge part of a data center's energy consumption often goes to cooling the many servers it contains; if a data center can be located in a cool climate or where ample chilled water is available, the footprint can be further reduced. Repurposing power-intensive industrial sites located next to hydroelectric power, such as aluminum smelting plants, is an ideal case (Wilhem, 2015).


For those with a historical perspective on computer operating systems, the interoperability of OSs under virtualization may raise some eyebrows. There has been a long-standing animosity between vendors of computer operating systems, particularly between Microsoft and the open-source community. This has recently changed, as Microsoft has joined the Linux Foundation (Bright, 2016), supports open-source operating systems on its cloud platform Azure, and has recently released a version of Microsoft SQL Server for Linux. While Linux has yet to take over the desktop market, its server versions are increasingly popular as free, stable, and high-performing operating systems. One further note about operating systems and virtualization: while other OSs can be virtualized on Apple hardware, it is a violation of the license agreement to virtualize Apple operating systems on non-Apple hardware, so this case is not discussed below.

1.02.4.1.2 Implementing host virtualization

For the researcher who wants to use virtualization, the three most common choices are Microsoft Hyper-V, VMware, and Oracle VirtualBox. Microsoft Hyper-V is built into most modern versions of Windows Server and has recently been added to the Windows 10 desktop operating system, making it ideal for prototyping VMs on clients and migrating them to servers. Hyper-V supports virtualization of both Linux and Windows operating systems. The host software only runs on Microsoft Windows computers, so while VMs can be moved and shared among Windows hosts, they cannot be moved to other operating systems. Hyper-V is often used because it is the default on Windows and is a key technology that is strongly supported and actively developed by Microsoft. VMware is the most fully featured virtualization software and the most expensive. It is targeted toward large enterprise installations, although a limited version can be used for free. Large institutions that host many servers or run their own on-premises cloud often use VMware to manage their virtualization infrastructure. VMware can host VMs on Windows, Linux, and Apple operating systems. Because VMware is a licensed product, it may restrict the sharing of VMs between researchers at different institutions. VirtualBox is a very popular choice because it is free and its host software runs on Windows, Linux, and Apple operating systems. This gives it the greatest flexibility for sharing VMs between researchers. VirtualBox can host Windows and Linux guest operating systems, and early versions of the Docker toolchain relied on it to provide the Linux environment required by the Docker containerization technology, discussed below, on Windows and Apple hosts.

1.02.4.2 Containerization

Containerization is an increasingly popular and even denser form of virtualization. As described above, host virtualization condenses an entire operating system down to one file. With full virtualization, all the files for the entire operating system exist within each VM: if a host server has 10 VMs, it holds 11 copies (including its own) of all of the operating system files. A container behaves like an OS separate from the host OS, but it holds only the files that are unique to it. Containerization removes the redundant files, maintaining only the files necessary to preserve the container's state. This sharing of base files means that containers take even fewer resources to host. Also, library images of containers with stock configurations are much smaller and can be shared more efficiently. Containerization allows researchers to spin up many containers to keep applications logically separate without having to worry about idling hardware.

1.02.4.3 Application Hosting

Cloud infrastructure can be moved further up the stack from whole machines to applications and services. For instance, database servers require a significant amount of technical expertise to install and maintain. While researchers may know how to interact with a database server programmatically, installation and configuration are often difficult tasks. Databases on large networks have to conform to institutional policies and sit behind institutional firewalls. This is not the case with cloud services. Researchers can provision a database server with a credit card; the database server is built and maintained by a cloud host. Communications with the database can be encrypted, for example through use of a virtual private network, to maintain security during transmission of data.
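As a concrete illustration, the short Python sketch below connects to a cloud-hosted PostgreSQL server over an encrypted connection; the hostname, database name, and credentials are hypothetical placeholders, not a real service.

# A minimal sketch, assuming a cloud-hosted PostgreSQL instance;
# the hostname, database, and credentials are hypothetical placeholders.
import psycopg2

conn = psycopg2.connect(
    host="research-db.example.cloud",  # endpoint supplied by the cloud host
    dbname="fieldwork",
    user="researcher",
    password="secret",
    sslmode="require",  # encrypt data in transit
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone())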

1.02.4.4 Cloud Computation

Cloud computation is another category of cloud service. Significant but temporary computation capacity can be rented to process large data sets. As with the database example above, the infrastructure is hidden and researchers use only what is needed. Prices may vary based on the time of day, so further savings are possible with the correct timing. It is important to note that most of these services have free tiers. For researchers this is important, as it can allow them to complete a limited proof of concept in order to apply for funding to build a full-scale application. If successful, an application or project can be scaled up when funding is available. This fast prototyping in the cloud allows researchers to iterate through multiple configurations that might otherwise be slowed by institutional friction on a home network.

1.02.4.5 Software as a Service

Software as a service (SaaS) is another category of cloud infrastructure. One SaaS category that has significantly advanced geocomputation is software version control. Software version control allows multiple programmers to work on the same software by managing the changes made to the programming code, so that multiple programmers can collaborate on the same software without conflict. Version control systems may also implement project management features such as task lists, documentation, and timelines. Many version control systems allow developers to share their software with the rest of the world, and collaborative version control has greatly advanced the progress of research by allowing researchers to build on each other's work. The most popular software version control website is GitHub, which allows users to publish unlimited projects if they are open to the public; private repositories are also available for a fee. From a GIS perspective, cloud-based version control plays an important role in developing new capabilities. Many of the libraries used for Web GIS are open source and hosted on GitHub. Leaflet.js, one of the most popular open-source mapping libraries on the Web, is hosted on GitHub, as is Esri Leaflet, the library that allows Leaflet maps to communicate with Esri's widely used proprietary servers. One final example of SaaS that is particularly useful in a geocomputation and GIS context is Python Anywhere (Anywhere, 2016). Python is an open-source programming language that is particularly popular in spatial research. Python can be used to collect data automatically on a set schedule from Web services and deposit them in a database. Data collection using Python requires few resources but must be done from a stable, always-online host, which is where Python Anywhere can be useful to researchers. Python Anywhere offers a programming and application hosting environment in the cloud that can be reached from any Web browser on any platform. The service has a free tier and very inexpensive paid plans. Python Anywhere can be used to build simple GIS data collection applications or to prototype applications as a proof of concept. In a classroom context, instructors can get students programming on the Web without setting up complicated infrastructure on site. Students can retain their work after classes end or use it across multiple classes.
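The kind of lightweight scheduled collector just described can be sketched in a few lines of Python; the API URL and JSON field names below are hypothetical, and a simple sleep loop stands in for the scheduler a hosting service would provide.

# A minimal sketch of scheduled data collection; the URL and JSON fields are
# hypothetical, and a sleep loop stands in for a production scheduler.
import sqlite3
import time

import requests

conn = sqlite3.connect("observations.db")
conn.execute("CREATE TABLE IF NOT EXISTS obs (ts TEXT, lat REAL, lon REAL, value REAL)")

while True:
    record = requests.get("https://api.example.org/latest").json()  # hypothetical web service
    conn.execute(
        "INSERT INTO obs VALUES (?, ?, ?, ?)",
        (record["time"], record["lat"], record["lon"], record["value"]),
    )
    conn.commit()
    time.sleep(3600)  # collect once an hour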

1.02.5 Computational Methods

The previous sections described the evolution of computation and its move onto the Web. These resources have been linked to a tremendous amount of data, and new questions are emerging as researchers take advantage of, and cope with, these new opportunities. With the advances in spatiotemporal data and computing technology, an important direction in handling such data is to discover useful and nontrivial patterns using various computational methods. This is a vast area that encompasses a diverse methodological spectrum, and it is a daunting task even to try to categorize the methods. Here, we summarize these computational methods into roughly four groups: visualization, machine learning, spatial optimization, and simulation. We note that such a categorization is far from ideal; however, it allows us to effectively identify salient tasks in geocomputation that have emerged in recent years.

1.02.5.1 Visualization

An important task in geocomputation, or any area that involves the use of data, is to know your data. This need led to the emergence of the research field of exploratory data analysis, or EDA, in statistics. EDA stresses the importance of seeing the data in order to understand and detect patterns, instead of just analyzing the data (Tukey, 1977). Shneiderman (1996) summarized a three-step approach to EDA: "Overview first, zoom and filter, and then details-on-demand." EDA not only requires computational and visualization tools (as in software packages) to process the data, but also relies on principles and methods that guide the development of those tools. The extension of EDA that deals with spatial and temporal data has roots in cartography and has developed into a fully fledged interdisciplinary field (Andrienko and Andrienko, 2006). In any spatial data set, we call the indications of space and time the references, and the measures at these references a set of characteristics. The goal of spatial and temporal exploratory data analysis is to use graphics to address various tasks (Bertin, 1967). These tasks ask questions that can be as simple as "what is the air quality in Columbus, Ohio today?" or as complicated as "which areas in Columbus, Ohio have the worst air quality and also the highest density of minority ethnic groups?" While answering these questions requires intimate knowledge of the data, it is also important to develop tools that provide interactivity for users to explore the data, by means that can be as simple as looking up and comparing the data values of different references, or as complex as finding patterns and associations between data values and their referenced spatial and/or temporal entities. Much of this can only be achieved in an interactive environment where the user can choose the scope and extent of the data to be displayed and further examined. While tremendous progress has been made in spatial exploratory data analysis over the last two decades (Andrienko and Andrienko, 1999), recent years have seen a rapid change in the computational community in which data can be closely bound to graphic elements of a visualization, especially for data visualization on Web-based platforms. Techniques enabling this trend include D3 (Bostock, 2016) and Leaflet (Agafonkin, 2016), both JavaScript libraries. D3 utilizes the vector-based image format called scalable vector graphics (SVG), which supports detailed description of graphic elements (such as shapes, text, color, and other graphic effects) for different visualizations. A key feature of D3 is the binding of data to various graphic elements. For example, we can bind a one-dimensional array to circle elements so that the size of the array determines the number of circles, and the value of each array element determines the size of its corresponding circle. In addition to binding the data array to circles, D3 can also bind the range of the data in the array to the Y-axis and the number of elements in the array to the X-axis, which effectively constructs a bar chart. This kind of data binding can be used between multidimensional data and other graphic elements. Beyond data binding, D3 can also be used for subsetting (filtering) data, which allows the user to change the focus of data exploration.


Maps can be made in various ways today. Packages such as D3 can support mapping as a general graphic representation method. However, we specifically note that Leaflet provides a wide range of tools for mapping data from different formats and can be used to integrate a custom map with data from many sources. Leaflet enables basic interactive functions such as zooming and panning. It can also allow the user to filter, query, and highlight features on the map. More importantly, these interactive functions can be linked to D3 visualizations by matching the unique identifiers of the features on the map with the data items bound to D3 graphic elements.
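Although Leaflet itself is a JavaScript library, the same pattern can be driven from Python via folium, a wrapper that generates Leaflet maps; the sketch below assumes a hypothetical GeoJSON file of census tracts.

# A minimal sketch using folium, a Python wrapper that generates Leaflet maps;
# the file "tracts.geojson" is a hypothetical vector layer.
import folium

m = folium.Map(location=[39.96, -82.99], zoom_start=11)  # centered on Columbus, Ohio
folium.GeoJson("tracts.geojson", name="tracts").add_to(m)  # overlay the vector layer
folium.LayerControl().add_to(m)  # let the user toggle layers
m.save("map.html")  # open the result in any web browser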

1.02.5.2 Machine Learning

The roots of machine learning date back to as early as the 1950s, when Alan Turing asked the fundamental question "can machines think?" Turing (1950) envisioned a computer program that simulates a child who learns while developing the brain. The start of machine learning followed in the footsteps of artificial intelligence in logical inference. In recent years, however, machine learning has shifted more toward the computational aspect of artificial intelligence, inspired by advances in algorithms and software in statistics, ecology, and other disciplines (Mitchell, 1997). The essential task of machine learning is to classify input data into categories that can be learned from. In general, there are two main camps of machine learning: supervised and unsupervised. In supervised learning, the user must prepare a data set that includes the desired output to train the algorithm. For example, to develop a supervised machine learning algorithm that assigns remote sensing image pixels to different land use types, the training data must include the land use type of each pixel, which allows the algorithm to learn the rules that can be used to classify new pixels. Supervised machine learning methods include various parametric and nonparametric statistical models, many neural networks, decision trees, and support vector machines. Unsupervised learning methods do not rely on training data sets; instead, the learning algorithm is designed to reveal hidden structure in the data. Some neural networks and most clustering methods belong to this category. A widely used machine learning method for spatial data is the k-means clustering method (Lloyd, 1982), which can be used to search for clusters of points in an area. This is an iterative method that starts from a random set of k locations, each serving as the center of a cluster. Points are assigned to their closest center to form k clusters. In the next step, the points assigned to each cluster are used to compute a new center. If the new centers are significantly different from the previous k centers, the new centers are adopted and the process repeats until no significant changes can be made. At the end, the method yields a partition of the points into k clusters. While there are obvious applications of the k-means method to two-dimensional data, a k-means method applied to one-dimensional data is similar to the widely used Jenks classification method in choropleth mapping. Supervised machine learning methods work in a different way. A neural network, for example, utilizes a series of weights that convert a set of inputs to outputs; by comparing with the known results, an algorithm adjusts the weights to minimize the error between the model output and the known output. A support vector machine, on the other hand, utilizes an optimization method to determine the line (or hyperplane, for multidimensional data) that best separates the input data into two classes. Some of these machine learning methods have been used to process data from nontraditional sources. For example, support vector machines have been used to geocode tweets, identifying location expressions in the text and finding the best match between locations retrieved from the text and place names in a gazetteer (Zhang and Gelernter, 2014).
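The iterative procedure just described can be written compactly in Python with NumPy; the sketch below is a bare-bones version of Lloyd's algorithm and, for brevity, does not handle the edge case of a cluster losing all of its points.

# A minimal NumPy sketch of Lloyd's k-means algorithm as described above;
# points is an (n, 2) array of coordinates (empty clusters are not handled).
import numpy as np

def kmeans(points, k, max_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]  # random initial centers
    for _ in range(max_iter):
        # assign each point to its nearest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute each center from the points assigned to it
        new_centers = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        if np.linalg.norm(new_centers - centers) < tol:  # stop when centers stabilize
            return labels, new_centers
        centers = new_centers
    return labels, centers

points = np.random.default_rng(1).random((200, 2))
labels, centers = kmeans(points, k=3)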

1.02.5.3 Spatial Optimization

Spatial optimization refers to finding an optimal set of spatial configurations such that a goal or objective is reached while constraints are satisfied. There are various applications of spatial optimization. For example, one may wish to find a subset of land parcels in an area, under a budget constraint, such that the total benefit of the selected parcels is maximized. In general, spatial optimization problems can be categorized into two major groups (Xiao, 2008). A selection problem aims to find a subset of spatial units, and spatial constraints may be imposed on the selected units; for example, some problems may only allow contiguous spatial units to be selected. The second type of spatial optimization problem is the partitioning problem, which aims to separate the spatial units into a set of regions. For example, political redistricting requires an area to be partitioned into a number of contiguous districts. Solving spatial optimization problems generally requires a tremendous amount of computing resources because many of these problems are NP-hard, meaning there may not exist a solution method that can find the optimal solution in a reasonable amount of time. For this reason, researchers have developed a special type of solution approach called heuristics: methods that quickly find good, but not necessarily optimal, solutions to a problem. The computational efficiency of heuristic methods is key (Xiao, 2016). Traditional heuristic methods are typically designed to solve one type of optimization problem. For example, the p-median problem is a representative optimization problem in location-allocation analysis, where the goal is to locate facilities or services on p nodes of a network so that the total distance from each node to its nearest facility or service node is minimized (Hakimi, 1964). A commonly used heuristic for the p-median problem is the vertex exchange algorithm (Teitz and Bart, 1968). This algorithm starts with p randomly selected nodes from the network and keeps switching these nodes with unselected ones until such exchanges can no longer improve the solution.
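A bare-bones version of the vertex exchange idea can be sketched as follows; D is assumed to be a precomputed matrix of distances between all pairs of nodes, and the code simply keeps any swap that lowers the total distance.

# A minimal sketch of a Teitz-and-Bart-style vertex exchange heuristic for the
# p-median problem; D is an n x n matrix of internode distances.
import numpy as np

def total_distance(D, selected):
    # each node is served by its nearest selected facility
    return D[:, selected].min(axis=1).sum()

def vertex_exchange(D, p, seed=0):
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    selected = list(rng.choice(n, p, replace=False))  # random initial facilities
    best = total_distance(D, selected)
    improved = True
    while improved:
        improved = False
        for i in range(p):
            for candidate in range(n):
                if candidate in selected:
                    continue
                trial = selected[:i] + [candidate] + selected[i + 1:]
                cost = total_distance(D, trial)
                if cost < best:  # keep the swap only if it improves the solution
                    selected, best, improved = trial, cost, True
    return selected, best

nodes = np.random.default_rng(1).random((50, 2))
D = np.linalg.norm(nodes[:, None] - nodes[None, :], axis=2)
facilities, cost = vertex_exchange(D, p=4)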


In contrast to traditional methods, metaheuristic methods aim to provide a solution framework for a wide range of problems. The growing list of metaheuristic methods includes genetic algorithms (Holland, 1975), tabu search (Glover et al., 1997), and simulated annealing (Kirkpatrick et al., 1983). Notably, these methods are often inspired by natural processes, and they can be used to solve many different kinds of problems. To use a genetic algorithm (GA) to solve the p-median problem, for example, one could use an array of integers to represent the indices of the nodes selected for locating the facilities. The GA starts with a set of arrays, each containing randomly selected indices; this is called the population of solutions. The GA evaluates each individual in the population by assigning it a fitness value; for the p-median problem, a high fitness value corresponds to a small total distance. Individuals with high fitness values have a high chance of being selected by the GA to participate in an operation called crossover, where the two selected individuals mix their contents to create two new individuals. New individuals with fitness values better than those of the current individuals are inserted into the population. A mutation operation may also be used to randomly change the content of an individual. By repeatedly applying these selection, crossover, and mutation operations, the population evolves toward a state where individuals with better fitness values (and thus better solutions) are obtained.
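A genetic algorithm along the lines just described might look like the sketch below, which reuses D and total_distance() from the vertex exchange sketch above; the population size, mutation rate, and repair scheme are illustrative choices, not prescriptions.

# A minimal sketch of a genetic algorithm for the p-median problem; it reuses
# D and total_distance() from the vertex exchange sketch above.
import numpy as np

def ga_pmedian(D, p, pop_size=30, generations=500, seed=0):
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    # each individual is an array of p distinct node indices
    pop = [rng.choice(n, p, replace=False) for _ in range(pop_size)]
    for _ in range(generations):
        costs = np.array([total_distance(D, ind) for ind in pop])
        fitness = 1.0 / costs  # a small total distance means high fitness
        probs = fitness / fitness.sum()
        # selection: fitter individuals are more likely to become parents
        pa, pb = (pop[i] for i in rng.choice(pop_size, 2, p=probs))
        # crossover: the child draws p distinct indices from the parents' union
        child = rng.choice(np.unique(np.concatenate([pa, pb])), p, replace=False)
        # mutation: occasionally swap one index for a random node
        if rng.random() < 0.1:
            child[rng.integers(p)] = rng.integers(n)
            child = np.unique(child)
            while len(child) < p:  # repair duplicates introduced by mutation
                child = np.unique(np.append(child, rng.integers(n)))
        # replacement: the child displaces the worst individual if it is better
        worst = int(costs.argmax())
        if total_distance(D, child) < costs[worst]:
            pop[worst] = child
    return min(pop, key=lambda ind: total_distance(D, ind))

best = ga_pmedian(D, p=4)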

1.02.5.4 Spatial Simulation

Models used to solve optimization problems are often called normative models because they prescribe what should be done. But results coming from such models may not work as prescribed, because situations in the real world may not exactly match what is formulated in the models (Hopkins et al., 1981). To better understand the behavior of a system, it is therefore necessary to identify the processes or mechanisms in the system, which may require various types of simulation models. Two types of simulation approach have been widely adopted in geocomputation research. The first approach is derived from John Conway's Game of Life (Gardner, 1970), where a set of simple transition rules governs the change of state of each location (a cell) based on the states of that location's neighbors. This type of simulation model is called a cellular automaton, because each cell automatically changes its state when the transition rules are met. Researchers have extended such models to simulate the dynamics of spatial systems such as urban sprawl and land use and land cover change (Clarke and Gaydos, 1998). A second type of spatial simulation approach explores a more explicit representation of the processes linked to spatial systems, where the important players in the system must be identified and, more importantly, the interactions between these players must be understood and simulated. These are called agent-based models (Epstein and Axtell, 1996), where each agent is a player in the system and agents proactively seek to maximize their own benefits. A particular application of the agent-based modeling approach is found in the land use and land cover change literature (Parker et al., 2003), where different players such as land owners and other stakeholders can be identified and act through interactions such as land use planning, land markets, and development. The relatively straightforward modeling concept has made agent-based modeling highly popular across a wide range of disciplinary boundaries. More importantly, implementing such a model has become increasingly intuitive. Programming platforms such as NetLogo (Wilensky, 1999) and MASON (Luke et al., 2005) not only support the necessary coding environment but also provide comprehensive visualization tools to help present simulation results. These tools have played an important role in making agent-based modeling accessible to researchers and professionals without much training in programming.
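The Game of Life mentioned above takes only a few lines of Python; the sketch below uses a toroidal (wrap-around) grid for simplicity.

# A minimal NumPy sketch of Conway's Game of Life; the grid wraps at its edges.
import numpy as np

def step(grid):
    # count each cell's eight neighbors by summing shifted copies of the grid
    neighbors = sum(np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0))
    # transition rules: a live cell survives with 2 or 3 live neighbors;
    # a dead cell becomes alive with exactly 3 live neighbors
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(int)

grid = (np.random.default_rng(0).random((50, 50)) < 0.2).astype(int)
for _ in range(100):
    grid = step(grid)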

1.02.6 Visualizing Twitter Data

Twitter data represent a significant challenge to geocomputation. By sending messages of no more than 140 characters, millions of Twitter users around the world are constantly connected with each other, posting new tweets or retweeting tweets from other users. The volume of Twitter data is growing rapidly. Among all the tweets, a subset also contains locational information that can be used to help identify where they were sent from. Though these geocoded tweets account for only a small portion of all tweets, they can be used to help understand the dynamics of a region or the world. Many researchers have also started to analyze tweets in order to categorize them in terms of the emotion of the sender and the kinds of topics conveyed (Mitchell et al., 2013; Steiger et al., 2015). This is a bona fide big spatiotemporal data set. Here we concentrate on the tweets around the Central Ohio area where the city of Columbus is located. Geocoded tweets from this region were collected for the period of May to July 2015 using the public Twitter API, which provided a small sample of about 1% of all tweets. In this period, more than 200,000 tweets were collected and stored in a PostgreSQL database, and a spatiotemporal index was developed for efficient data retrieval. Each tweet stored in the database was assigned a set of tags indicating the theme of the tweet. Seven themes were identified for this region: food, business, move (mobility), sports/fitness, current events, LGBT, and local interests. A set of keywords was used to identify each theme. For example, a tweet is tagged as "Food" if it contains words such as pizza, barbecue, and brewing. A tweet is also tagged as happy, unhappy, or neutral by calculating the average happiness index of the words in the tweet (Dodds and Danforth, 2009). Fig. 3 shows a Web-based graphical user interface of a prototype geocomputational approach to visualizing the tweets collected for the central Ohio region. The left side of the screen visualizes the temporal aspects of the tweets. A user can define a slice of time for visualization using the sliding bar underneath the plot. Each curve in the plot shows the number of tweets in each category, and each curve can be turned on and off. On the very right side of the screen is a small window that can be used to show random individual tweets in the selected category within the timeframe defined in the left window.
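A keyword tagger of the kind described above can be sketched in a few lines of Python; the keyword sets below are illustrative stand-ins, not the study's actual lexicon.

# A minimal sketch of keyword-based theme tagging; the keyword sets are
# illustrative, not the actual lexicon used in the study.
THEMES = {
    "Food": {"pizza", "barbecue", "brewing"},
    "Sports/Fitness": {"game", "gym", "marathon"},
}

def tag_tweet(text):
    words = set(text.lower().split())
    return [theme for theme, keywords in THEMES.items() if words & keywords]

print(tag_tweet("Great pizza before the marathon"))  # ['Food', 'Sports/Fitness']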

Fig. 3 A geocomputational approach to visualizing Twitter data.

These renderings of the Twitter data are retrieved from the database through an AJAX framework. The request made in the Web browser is routed through a Web server that is connected to the database. When the user clicks on the Map button in the left window, the Web browser gathers the information from the screen, including the start and end times specified by the user. A request is then sent to the server, which fires up server-side programs to (1) compute the heat map of all the tweets in the region in the specified timeframe and (2) run a kernel density estimation (Wand and Jones, 1995; Winter and Yin, 2010) for each of the tweet categories. The results of these computational tasks are then sent back to the browser as JSON data that are subsequently used to create Leaflet layers for the heat map and the kernel curves. The computation of the kernel density estimation is a nontrivial task because a total of 10 kernels must be calculated for each request. To make the interface effective, only the 50% kernel is shown for each category, meaning that 50% of the tweets in each category were sent within the enclosed curves on the map. The user can create up to 20 sets of these heat maps and kernels; the forward and backward buttons allow the user to loop through these sets.
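Server side, the kernel density step might be sketched as below with SciPy, evaluating the estimate on a grid so the result can be serialized as JSON for the Leaflet front end; the coordinates are randomly generated stand-ins for one category of tweets.

# A minimal sketch of server-side kernel density estimation with SciPy; the
# coordinates are random stand-ins for the tweets of one category.
import json
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
lon = rng.normal(-83.0, 0.1, 500)   # illustrative longitudes near Columbus
lat = rng.normal(39.96, 0.1, 500)   # illustrative latitudes

kde = gaussian_kde(np.vstack([lon, lat]))
gx, gy = np.mgrid[lon.min():lon.max():100j, lat.min():lat.max():100j]
density = kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)

payload = json.dumps({"density": density.tolist()})  # returned to the browser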

1.02.7 Conclusion

For much of the history of geocomputation and GIS, we witnessed incremental improvements in geoprocessing capacity. The same workflows were maintained as geographic data were created and analyzed in isolated computing units and GIS software installations, and data were only shared through extracts and static data files. Recently, though, the modern Web has standardized data transfer between hosts and facilitated much easier data sharing. This ability to share data came along just as GPS, broadband, and wireless connectivity linked many islands of computing and mobile devices. The physical implementation of the Web has been hastened by cloud infrastructure. The cloud has decreased the costs to develop and host applications on the Web, and it has granted resource-constrained parties access to powerful Web geocomputation tools. These new data and capabilities offer researchers many new opportunities for investigation but come with challenges all their own.

References

Agafonkin, V., 2016. Leaflet.js JavaScript library. https://www.leafletjs.com (accessed 2/6/2017).
Andrienko, G.L., Andrienko, N.V., 1999. Interactive maps for visual data exploration. International Journal of Geographical Information Science 13 (4), 355–374.
Andrienko, N., Andrienko, G., 2006. Exploratory analysis of spatial and temporal data: A systematic approach. Springer Science & Business Media, Heidelberg.
Anywhere, P., 2016. Python Anywhere website. https://www.pythonanywhere.com (accessed 2/6/2017).
Berners-Lee, T., Fischetti, M., 2000. Weaving the Web: The original design and ultimate destiny of the World Wide Web by its inventor. HarperInformation, New York.
Bertin, J., 1967. Semiology of graphics: Diagrams, networks, maps. University of Wisconsin Press, Madison.
Bostock, M., 2016. Data-driven documents. https://www.d3js.org (accessed 2/6/2017).
Bostock, M., Metcalf, C., 2016. The TopoJSON format specification. https://github.com/topojson/topojson-specification/blob/master/README.md (accessed 2/6/2017).
Bright, P., 2016. Microsoft, yes, Microsoft, joins the Linux Foundation. Ars Technica, November 16. https://arstechnica.com/information-technology/2016/11/microsoft-yes-microsoft-joins-the-linux-foundation/ (accessed 2/6/2017).
Butler, H., Daly, M., Doyle, A., Gillies, S., Hagen, S., Schaub, T., 2016. The GeoJSON format. Technical report, Internet Engineering Task Force. https://tools.ietf.org/html/rfc7946 (accessed 2/6/2017).
Campbell, J.B., Wynne, R.H., 2011. Introduction to remote sensing. Guilford Press, New York.
Chorley, R.J., Haggett, P., 1967. Models in geography. Methuen, London.
Chrisman, N., 2006. Charting the unknown: How computer mapping at Harvard became GIS. ESRI Press, Redlands.
Clarke, K.C., Gaydos, L.J., 1998. Loose-coupling a cellular automaton model and GIS: Long-term urban growth prediction for San Francisco and Washington/Baltimore. International Journal of Geographical Information Science 12 (7), 699–714.
Dodds, P., Danforth, C., 2009. Measuring the happiness of large-scale written expression: Songs, blogs, and presidents. Journal of Happiness Studies 11, 441–456.
Epstein, J.M., Axtell, R., 1996. Growing artificial societies: Social science from the bottom up. Brookings Institution Press, Washington, DC.
ESRI, 1998. ESRI shapefile technical description. Environmental Systems Research Institute, Redlands.
Faust, N., 1998. Chapter 5: Raster based GIS. In: Foresman, T.W. (Ed.), The history of geographic information systems: Perspectives from the pioneers. Prentice Hall, Oxford.
Fidler, B., Currie, M., 2015. The production and interpretation of Arpanet maps. IEEE Annals of the History of Computing 37 (1), 44–55.
Fu, P., Sun, J., 2010. Web GIS: Principles and applications. ESRI Press, Redlands.
Gahegan, M., 1999. What is geocomputation? Transactions in GIS 3, 203–206.
Gardner, M., 1970. Mathematical games: The fantastic combinations of John Conway's new solitaire game "life". Scientific American 223 (4), 120–123.
Glover, F., Laguna, M., 1997. Tabu search. Kluwer Academic Publishers, Boston.
Goodchild, M.F., 2007. In the world of Web 2.0. International Journal 2 (2), 27–29.
Goran, W., 1998. Chapter 12: GIS technology takes root in the Department of Defense. In: Foresman, T.W. (Ed.), The history of geographic information systems: Perspectives from the pioneers. Prentice Hall, Oxford.
Hakimi, S.L., 1964. Optimum locations of switching centers and the absolute centers and medians of a graph. Operations Research 12 (3), 450–459.
Hart, G., Dolbear, C., 2013. Linked data: A geographic perspective. CRC Press, Boca Raton.
Holland, J.H., 1975. Adaptation in natural and artificial systems: An introductory analysis with application to biology, control, and artificial intelligence. University of Michigan Press, Ann Arbor.
Hopkins, L.D., Brill, E.D., Wong, B.D., et al., 1981. Generating alternative solutions for dynamic programming models of water resources problems. University of Illinois Water Resources Center, Urbana.
Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P., 1983. Optimization by simulated annealing. Science 220 (4598), 671–680.
Lloyd, S., 1982. Least squares quantization in PCM. IEEE Transactions on Information Theory 28 (2), 129–137.
Lukasik, S.J., 2011. Why the Arpanet was built. IEEE Annals of the History of Computing 33 (3), 4–20.
Luke, S., Cioffi-Revilla, C., Panait, L., Sullivan, K., Balan, G., 2005. MASON: A multi-agent simulation environment. Simulation: Transactions of the Society for Modeling and Simulation International 82, 517–527.
Microsoft, 2003. Microsoft security bulletin MS02-039 (critical). https://technet.microsoft.com/library/security/ms02-039 (accessed 2/7/2017).
Miller, H.J., Goodchild, M.F., 2015. Data-driven geography. GeoJournal 80 (4), 449–461.
Mitchell, T.M., 1997. Artificial neural networks. In: Machine learning. McGraw-Hill, New York, pp. 81–127.
Mitchell, L., Frank, M.R., Harris, K.D., Dodds, P.S., Danforth, C.M., 2013. The geography of happiness: Connecting Twitter sentiment and expression, demographics, and objective characteristics of place. PLoS One 8, e64417.
Moore, G., 1965. Cramming more components onto integrated circuits. Electronics 38 (8), 83–84.
OGC, 2008. OGC Keyhole Markup Language. http://www.opengeospatial.org/standards/kml (accessed 2/7/2017).
OGC, 2016. Web Service Common standard. http://www.opengeospatial.org/standards/common (accessed 2/7/2017).
Openshaw, S., Abrahart, R.J., 1996. GeoComputation. In: Abrahart, R.J. (Ed.), Proceedings of the First International Conference on GeoComputation. University of Leeds, Leeds, pp. 665–666.
O'Reilly, T., 2007. What is Web 2.0: Design patterns and business models for the next generation of software. Communications & Strategies 65 (1), 17–38.
Parker, D.C., Manson, S.M., Janssen, M.A., Hoffmann, M.J., Deadman, P., 2003. Multi-agent systems for the simulation of land-use and land-cover change: A review. Annals of the Association of American Geographers 93 (2), 314–337.
Robinson, A., 1952. The look of maps. University of Wisconsin Press, Madison.
Shneiderman, B., 1996. The eyes have it: A task by data type taxonomy for information visualizations. In: Proceedings of the IEEE Symposium on Visual Languages, Boulder, Colorado. IEEE, pp. 336–343.
Steiger, E., de Albuquerque, J.P., Zipf, A., 2015. An advanced systematic literature review on spatiotemporal analyses of Twitter data. Transactions in GIS 19, 809–834.
Teitz, M.B., Bart, P., 1968. Heuristic methods for estimating the generalized vertex median of a weighted graph. Operations Research 16 (5), 955–961.
Tomlinson, R., 1998. Chapter 2: The Canada Geographic Information System. In: Foresman, T.W. (Ed.), The history of geographic information systems: Perspectives from the pioneers. Prentice Hall, Oxford.
Tukey, J.W., 1977. Exploratory data analysis. Addison-Wesley, Reading.
Turing, A.M., 1950. Computing machinery and intelligence. Mind 59 (236), 433–460.
Wand, M.P., Jones, M.C., 1995. Kernel smoothing. Chapman & Hall, London.
Wilensky, U., 1999. NetLogo. http://ccl.northwestern.edu/netlogo/ (accessed 2/7/2017).
Wilhem, S., 2015. Power shift: Data centers to replace aluminum industry as largest energy consumers in Washington state. http://www.bizjournals.com/seattle/blog/techflash/2015/11/power-shift-data-centers-to-replace-aluminum.html (accessed 2/7/2017).
Wingfield, N., 2016. Amazon's cloud business lifts its profit to a record. New York Times, April 28.
Winter, S., Yin, Z., 2010. Directed movements in probabilistic time geography. International Journal of Geographical Information Science 24 (9), 1349–1365.
Xiao, N., 2008. A unified conceptual framework for geographical optimization using evolutionary algorithms. Annals of the Association of American Geographers 98 (4), 795–817.
Xiao, N., 2016. GIS algorithms. Sage, London and Thousand Oaks.
Zhang, W., Gelernter, J., 2014. Geocoding location expressions in Twitter messages: A preference learning method. Journal of Spatial Information Science 2014 (9), 37–70.

1.03 Big Geodata

Michael F Goodchild, University of California, Santa Barbara, CA, United States © 2018 Elsevier Inc. All rights reserved.

1.03.1 Definitions
1.03.1.1 Geodata
1.03.1.2 Big Data
1.03.1.3 Big Geodata
1.03.2 Related Concepts
1.03.2.1 Data-Driven Science
1.03.2.2 Real-Time Analytics
1.03.2.3 The Changing Nature of Science
1.03.2.4 Open Data and Open Software
1.03.3 Disruptions
1.03.3.1 Publication
1.03.3.2 Production of Geodata
1.03.3.3 New Questions
1.03.3.4 Consumerization
1.03.3.5 Spatial Prediction
1.03.4 The Technology of Big Geodata
1.03.4.1 High-Performance Computing
1.03.4.2 Synthesis
1.03.5 Conclusion
References

1.03.1 Definitions

1.03.1.1 Geodata


Geodata are normally defined as data about the surface and near-surface of the Earth. More precisely, geodata are observations about what is present at some location. Since the number of possible locations is infinite, geodata are often observed or captured in the form of aggregated or summary observations about areas (e.g., states, forest stands), lines (e.g., rivers, highways), or volumes (e.g., oil reservoirs, buildings); or geodata may be sampled at selected locations. A host of types of geodata exist, ranging from data about such physical variables as ground elevation or surface temperature, to data about the numbers of inhabitants in an area, or their average income. Geodata may be structured in a range of common formats, and are conveniently handled in readily available software. Synonyms for geodata include geospatial data, geospatial information, and geographic information. Spatial data is normally assumed to be a superset that includes data about phenomena embedded in other spaces besides geographic space.

1.03.1.2 Big Data

Big Data is a term of comparatively recent coinage, and has been the focus of a remarkable outpouring of energy and innovation in the past decade. The most obvious meaning of the term relates to data volume, and to the very rapid expansion of data storage and processing capacity in recent years. Whereas a gigabyte (roughly 10^9 bytes or 8 × 10^9 bits) might well have stretched the capacity of most computers in the 1970s, today the average laptop has approaching a gigabyte of random-access memory and a terabyte (roughly 10^12 bytes) of hard-drive storage. It has always been possible to imagine a quantity of data larger than a given device can handle, and thus one convenient definition of Big Data is a volume of data larger than can readily be handled by a specified device or class of devices. For example, the volume of geodata collected by the Landsat series of satellites at their inception in the early 1970s was well beyond the processing capacity of the computers at the time. Already there are research projects that must deal with petabytes (roughly 10^15 bytes) of data; we are living, we are told, in the midst of an "exaflood" of data (one exabyte is roughly 10^18 bytes); and 10^24 has already been given an internationally agreed prefix ("yotta"). But although it is important, volume is not the only distinguishing characteristic of Big Data, which is why the term is capitalized here, to distinguish it from run-of-the-mill voluminous data. Networks of sensors, the Internet of Things, and increasingly sophisticated data collection, transmission, and aggregation systems have created a new and abundant supply of dynamic data in close-to-real time. The average citizen now expects near-instantaneous delivery of information on such topics as traffic congestion, international news, and sports results. Thus "velocity" is often cited as a second defining characteristic of Big Data. Finally, the last three decades have seen a steady transformation from a world dominated by single, authoritative sources of information to a proliferation of multiple and often contentious or conflicting sources. To cite a typical geodata example, a search for information about the elevation of a given point used to produce a single result, with the authority behind it of a national mapping agency.


Now, however, multiple technologies for measuring elevation, including surveys using GPS (the Global Positioning System), measurements by hikers, traditional maps, LiDAR (Light Detection and Ranging), and SRTM (the Shuttle Radar Topography Mission), create a plethora (a "variety") of answers with a range of values. How to choose among them, and whether an improved estimate can be obtained by combining them, for example, by averaging, is one of the new concerns raised by the advent of Big Data. Thus Big Data is often defined by the "Three Vs": volume, velocity, and variety. A fourth "V" is often suggested, to capture the various forms of uncertainty associated with Big Data, especially Big Data that come from nonauthoritative sources that have not been subjected to quality control. This fourth V might stand for validity or veracity, but unfortunately validity or veracity is what Big Data often lack, rather than a distinguishing property. Uncertainty is an especially important issue for geodata, and the subject of a large and growing literature (e.g., Zhang and Goodchild, 2002). Position is an essential element of geodata, and is measured using one of a range of techniques, each of which introduces its own level of uncertainty. For example, the GPS receiver in an average smartphone produces latitude and longitude with an error that is commonly in the 10 m range, but may be as high as 100 m if measurement is impacted by tall buildings, tree canopies, and many other factors. The local properties recorded in geodata (commonly termed the attributes) are also often subject to errors of numerous kinds, and when what is recorded is a class (of vegetation cover, e.g., or land use) the definition of the class will include uncertainty, such that two observers cannot be guaranteed to record the same class at a given point. In summary, geodata can never be perfect, never the truth.
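One simple way to combine such conflicting estimates, sketched below in Python under the assumption that each source reports a standard error, is an inverse-variance weighted average; the figures are illustrative, not real measurements.

# A minimal sketch of combining conflicting elevation estimates with an
# inverse-variance weighted average; all figures are illustrative.
estimates = [       # (elevation in meters, standard error in meters)
    (212.0, 5.0),   # e.g., SRTM
    (208.5, 0.5),   # e.g., LiDAR
    (215.0, 10.0),  # e.g., a handheld GPS receiver
]
weights = [1 / se ** 2 for _, se in estimates]
combined = sum(w * z for (z, _), w in zip(estimates, weights)) / sum(weights)
print(round(combined, 2))  # dominated by the most precise source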

1.03.1.3 Big Geodata

The concept of Big Geodata is comparatively recent, and deals with the well-defined and important subset of Big Data that are geodata. As the Landsat example cited above illustrates, volume is no stranger to geodata, and today our ability to collect and acquire geodata vastly exceeds our ability to store or process them. But the traditional process of acquiring geodata, through surveying, photogrammetry, or satellite-based remote sensing, has been slow and painstaking. Thus velocity in acquisition is a much more recent concern, with impacts that are disruptive. Similarly, variety is novel, given the past reliance on single, authoritative sources, and thus also disruptive. The nature of these disruptive impacts is discussed at length later. If volume is no stranger to geodata, how have the problems of excessive volume been addressed in the past? Several long-accepted techniques have been used, allowing researchers and others to deal with what would otherwise have been impossible volumes of data. Geographers have long practiced the use of regions, dividing the world into a number of areas that can reasonably be assumed to be uniform with respect to one or more properties. For example, the Midwest is part of the central United States, with large areas devoted to raising corn and soybeans. Of course it is not uniform, and significant variation exists within it, but it is nevertheless useful as a way of simplifying what might otherwise be overwhelming detail. Similarly, geographic data are often abstracted or generalized, omitting detail in the interests of reducing volume, such as detail below a specified spatial resolution. Geographic data may also be sampled, on the principle that the phenomena in the gaps between the samples are likely to be similar to those at the sampled points. Spatial interpolation makes use of a range of techniques to estimate values of properties such as elevation or atmospheric temperature between sampled points. In addition, researchers and others have frequently addressed the volume problem using techniques that are generally termed divide-and-conquer. Landsat data, for example, are acquired and stored as a series of approximately 50,000 scenes, each covering an area of about 100 km by 100 km, and when combined providing complete coverage of the Earth's 500,000,000 km². For the Thematic Mapper sensor each scene contains about 3000 by 3000 cells, each roughly 30 m by 30 m. In order not to overload storage capacity, much research using Landsat proceeds one scene at a time. The weakness of this approach stems from the inability to identify and examine patterns that extend from one scene to its neighbors, but the benefit lies in the ability to process Landsat with modest computing facilities. In summary, standard techniques widely practiced across the sciences have long made it possible to process Big Geodata using conventional means. It follows that the kinds of novel, unconventional computing described later open numerous opportunities for new discoveries. This entry is organized as follows. The next section discusses concepts that are related to Big Geodata, and broader trends that are impacting science and society. This is followed by a section on the disruptive impacts of Big Geodata, and by another on the technical advances associated with Big Geodata. The final section addresses the research issues that Big Geodata raise, and the prospects for their resolution in the near future.

1.03.2 Related Concepts

Big Geodata, with their characteristics of volume, velocity, and variety, have appeared at a time of major disruption in both science and society. Some of the more important and relevant of these are discussed in the following subsections.

1.03.2.1 Data-Driven Science

The volumes of data now being captured in digital form from sensors, satellites, social media, and many other sources have led to the suggestion that we are on the verge of a new era of data-driven science. Instead of the often ponderous process of theory-building and theory-testing, we should rely on analytic tools to discover patterns and correlations, and thus to make discoveries.


This concept has often been termed the Fourth Paradigm (Hey et al., 2009), emphasizing a progression in the history of science from empirical observation to theory to simulation, and now elevating data to the primary role. In the world of data-driven science there would be no need for theory, as methods of machine learning and artificial intelligence would be sufficient to search for and discover all important patterns. Miller and Goodchild (2014) discuss the prospects for a data-driven geography based on geodata. The notion of automated pattern detection predates Big Data by several decades. Dobson (1983) was arguing for an automated geography in the 1980s, and techniques from artificial intelligence such as artificial neural nets (e.g., Schalkoff, 1997) and self-organizing maps (Agarwal and Skupin, 2008) are widely used. But while such techniques are elegant ways of finding patterns, in the absence of theory they provide no basis on which to interpret those patterns, and no basis for assuming that the patterns discovered from one sample of data or one geographic area can be generalized to other samples or areas. Similarly, they may be successful at predicting certain events, but they provide no reason to expect that they will always be equally successful. Moreover, the underlying mantra of data-driven science, that we should "let the data speak for themselves," assumes that the data are perfectly representative of reality, and thus that what is discovered about the data is also being discovered about reality. But measurements are always subject to measurement error. The positions that are an essential element of geodata are measurements, and moreover geodata are universally subject to uncertainties of many additional kinds, as noted earlier. Thus if we "let the geodata speak for themselves" we are never at the same time letting geography speak for itself. The patterns and correlations discovered in this way may be true of the real world, but they may also be spurious and uninteresting artifacts of the data. Moreover, the data may miss aspects of the real world that are essential to understanding. For example, if the data were acquired by satellite imaging with a spatial resolution of 100 m, any important features or patterns with spatial resolution much finer than 100 m will be effectively invisible, and undiscoverable from the data.

1.03.2.2 Real-Time Analytics

With increasing volumes of data available in near-real time, Big Geodata appear to offer the possibility of a new kind of activity that is very different from the somewhat protracted and even leisurely traditions of geodata analysis. Rather than spend as much as 2 years gathering data, conducting analyses, writing, and finally publishing the results, it is now possible to imagine geodata being continuously monitored, providing early warning of such events as disease outbreaks or earthquakes, and making discoveries in minutes that might previously have taken months. Moreover, the Internet provides the means for almost instant dissemination of results. Yet while early warning can clearly be extremely valuable, the broader implications of velocity for science are more challenging. Science has always given highest value to knowledge that is true everywhere and at all times (nomothetic science). Thus there would be little value given to the discovery of some principle that was true only at certain times, in certain places (idiographic science). Such discoveries might be described by many scientists using such pejorative terms as "journalism" or "mere description." But while this attitude might be prevalent in the physical sciences and perhaps even in the environmental sciences, the situation in the social sciences is more nuanced. In recent decades the concept of "place-based analysis" has received significant attention, in the form of techniques such as local indicators of spatial association (LISA; Anselin, 1995) and geographically weighted regression (GWR; Fotheringham et al., 2002). Such techniques are driven by the notion that while a single mathematical principle may be (more or less) true everywhere, the parameters of the principle (e.g., the constants in a linear regression) may vary from place to place.
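In practice, place-based measures of this kind are readily available in open-source Python libraries; the sketch below computes Local Moran's I (a LISA statistic) with the PySAL packages libpysal and esda, assuming a hypothetical polygon file with a numeric column named "rate".

# A minimal sketch of a local indicator of spatial association (LISA) using
# the PySAL ecosystem; "tracts.geojson" and its "rate" column are hypothetical.
import geopandas as gpd
from esda.moran import Moran_Local
from libpysal.weights import Queen

gdf = gpd.read_file("tracts.geojson")
w = Queen.from_dataframe(gdf)           # polygons sharing a boundary are neighbors
w.transform = "r"                       # row-standardize the weights
lisa = Moran_Local(gdf["rate"], w)
gdf["quadrant"] = lisa.q                # 1=HH, 2=LH, 3=LL, 4=HL
gdf["significant"] = lisa.p_sim < 0.05  # pseudo p-values from permutations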

1.03.2.3 The Changing Nature of Science

Early science was dominated by the lone investigator, the researcher who "stood on the shoulders of giants," in the words of Isaac Newton, to create new knowledge through empirical investigation or theoretical reasoning. Science was organized into disciplines, with the underlying assumption that the giants in any area were members of a researcher's own discipline, or one closely related to it. That model worked well for the likes of Newton, Darwin, or Einstein, as long as there were major advances to be made by solving comparatively simple problems. Today, however, there is a growing sense that the simple problems have been solved, and that future progress in science must involve researchers from many disciplines, working in teams, and untangling the many aspects of complex systems. Science is becoming multidisciplinary, with teams that integrate the contributions of many individual investigators. One consequence of this emerging pattern of science is that no one individual member of a team is able to know and understand all aspects of the study. Yet scientific methodology, which emerged in the days of Newton, Darwin, and Einstein, made every investigator responsible for all aspects of his or her work. For Darwin, for example, it was essential that virtually all of the observations that led him to develop the theory of natural selection were made personally by him. To be sure, the prior work of others, including von Humboldt and Wallace, was influential, but in no sense did Darwin have to put his trust in data collected by others. This new world of collaborative research and the sharing of data is challenging, and threatens to disrupt the very foundations of the scientific method. Metadata, and documentation generally, may be held up to be the answer, but however complete they may be, metadata are never a perfect substitute for personal engagement in data acquisition. Science is also becoming more computational, and the computer has become an indispensable part of every project. Much of the software used in a project was likely written by someone who was not one of the team of investigators, and the data may have been acquired from a source outside the team, without complete documentation of the data's provenance.


The effect is that science conducted in this environment of multiple investigators, acquired data, and acquired software may not be fully replicable. Thus the changing nature of science, and especially science based on Big Geodata, may no longer adhere to the long-established principles of the scientific method.

1.03.2.4 Open Data and Open Software

In a small group, collaborating scientists will be able to build a level of trust in each other’s work and expertise. Ideally each member of the group should be able to question openly the work of the others, so that in effect the principle of individual responsibility for science is transferred into a principle of group responsibility. But open sharing of data and software, often between individuals who will never come into direct or even electronic contact, makes this principle much less tenable. The open-data movement, which advocates publication and widespread dissemination of data, cannot possibly establish the same level of trust that is achievable in a small group of colocated collaborators. In principle, open data should be accompanied by complete documentation of provenance and quality, but this is rarely the case. Moreover the level of detail required for adequate documentation of provenance expands along with the bounds of dissemination: for example, the data-collection practices of one discipline may need much more detailed explanation if the data are to be shared with members of another discipline. If an information community is defined as a community that shares terminology and practices, then sharing of data and software across information communities clearly requires greater documentation and care than sharing within an information community. Especially problematic is the case where data or software originates in the commercial sector, which may well be concerned about its proprietary interests and less willing to share details of the data’s provenance or the software’s detailed algorithms.

1.03.3 Disruptions

It is clear from the discussion thus far that Big Geodata is capable of being disruptive or transformative, that is, of changing traditional practices. This section examines some of those disruptions in greater detail.

1.03.3.1 Publication

In the traditions of science, the results of a study are distilled into one or more papers or books, which are then reviewed by peers, edited, published, distributed, and made available in libraries. There has always been pressure to speed the process, but because so many stages involve humans it has been difficult to reduce the time of publication to much less than a year. Yet the “velocity” aspect of Big Geodata and the near-instantaneous communication offered by the Internet are having disruptive effects on the publication process. Moreover the “volume” aspect is removing many of the constraints on the amount of information that can be published. Once papers, books, maps, and atlases had been published and printed, their contents necessarily remained constant (though they might be subjected to later publications of errata, and a single copy might be modified by personal annotation). In this sense there was a clear end to a specific process of scientific investigation or data compilation. On the Internet, however, velocity has come to mean the demise of that clear end, as results can often be disseminated in draft form prior to full review, and modified later to reflect improvements in the author’s thinking or new results. Online maps can be modified instantaneously as new geodata become available. Today we are surprised when a restaurant found using an online service turns out to have been closed, whereas a generation ago few maps bothered to show information that was likely subject to change. Variety is also having a disruptive impact on publication. The processes of editing and peer review were largely successful at ensuring that published information was correct. Today, however, there are few if any checks on the validity of information that is published through social networks, blogs, Wikis, and other Internet-era media. Finally the advent of Big Data implies the removal of effective constraints on the volume of information that can be published. Traditional publication involved an intensive process of distilling, given the cost of the process and the limited capacity of books and papers. Books were effectively limited to a few hundred pages, and papers to a few thousand words, making it inconvenient to publish extensive raw data, complex software, or the detailed results of analysis. Today investigators are expected to make their data and software available through the Internet, so that replication or re-analysis by others becomes much more feasible, subject of course to issues of confidentiality and intellectual property. In short, the arrival of Big Data has dramatically disrupted the publication process.

1.03.3.2 Production of Geodata

The traditional processes of acquiring, compiling, publishing, and disseminating geodata were expensive, slow, and often manual. There was a high fixed cost of entry into the production process, with the result that only well-financed government agencies and large corporations could be producers. Maps and other products were designed to serve as many purposes as possible, for as long as possible, in order to offset the high costs. Thus the phenomena that were captured in geodata tended to be those that were comparatively static, and broadly useful. Beginning in the early 1990s, with the advent of mapping tools on personal computers, the costs of entry into the process of geodata production began to fall, eventually to close to zero.
possible to make maps of almost anything, for purposes that were often very specific. Maps could be centered on the individual, rather than on an abstract system of tiles. Maps could be oriented to show the view from close to the ground, rather than from vertically above. Maps could be used to represent data that were valid only at the time of collection, rather than valid for an extended period into the future. Moreover maps could be made by anyone equipped with a personal computer, some software, and data gathered personally through the use of GPS and a variety of techniques typified by the camera flown on a personal drone, or downloaded from numerous Web portals. Turner (2006) has termed this neogeography, a new and disruptive world in which the individual citizen is empowered both to collect and to use geodata, and in which the old distinctions between the expert professional and the amateur are no longer as significant. Neogeography is a highly disruptive impact of the advent of Big Geodata, with its volume, velocity, and variety. It calls into question the value of long-established practices in the production of geodata, and the authority of mapping agencies and corporations that has for generations been the source of trust in published data. Geodata can now be crowdsourced (Sui et al., 2012), providing a very competitive product that benefits from the efforts of individual citizens rather than those of a small number of professional experts.

1.03.3.3 New Questions

Big Geodata differs from its precursors in many significant ways. First, it offers the possibility of finer spatial resolution. Widely available GPS tools can determine location to 10 m with the simplest and cheapest devices, and to decimeters with more advanced versions. Satellite imagery now offers spatial resolutions of well under 1 m, giving far more detail than the comparatively coarse resolutions of the past. When rules protecting confidentiality allow, cities can be studied and simulated at the level of the individual rather than in the aggregate, and new questions can be asked about human movement patterns and interactions. For example, the tracking of taxis and cell phones is being used to reach new understandings of the social structure of cities, and to build new models of the transmission of disease or the problems of evacuation during disasters. Fine-resolution imagery can be used to monitor crops around the world, and to observe the destructive impacts of earthquakes and hurricanes.

Similar disruptions to past practice can be attributed to improved resolution in time, and to the benefits of near-real-time data. The average citizen now has access to current traffic congestion, and to tools that allow routes around congestion to be found. Weather maps of near-real-time data allow us to watch the development of storms and monitor the distribution of rainfall. All of these examples illustrate how improving temporal resolution is changing the way we do things, and vastly improving the information to which we have access.

In some instances improved spatial and temporal resolution allows us to make better decisions and to correct the mistakes of the past that may be attributable to the coarse nature of the data then available. But the truly interesting impacts are on the new questions that improved data, and the technology of Big Data, allow us to ask and investigate. Here, however, it is necessary to confront the essential conservatism of science, or what Kuhn (1970) has called “normal science.” In Kuhn’s analysis, science continues along a largely predictable path until it is transformed in some way. Such transformations may be triggered by new data, new tools, new theories, or new concepts. Often such new ideas come from outside a given discipline, and must therefore fight an uphill battle against the established norms of the discipline.

1.03.3.4 Consumerization

As noted earlier, a central tenet of neogeography is that the relationship between amateur and professional expert is becoming blurred. The average citizen is now empowered both to consume and to produce geodata. Yet the tools and technology of geodata are not designed for this emerging world. Locations are defined using coordinate systems, such as latitude and longitude, that are not part of the everyday life of the citizen. Instead, people tend to learn about and describe the geographic world in terms of named places, which may range from entire continents (“Asia”) to rooms in one’s home (“the kitchen”). Such places often lack the precise boundaries of officially recognized entities, such as cities or counties, and are culturally, linguistically, or context specific. Thus “Los Angeles” has a different meaning for an Angeleno, a New Yorker, or a resident of China; and “The English Channel” and “La Manche” are used to refer to the same feature by the British and the French, respectively. Traditionally it was necessary to limit and standardize the names of places through the work of national toponymic committees. In the world of Big Geodata, however, there is ample potential to capture and represent the vernacular names of places, and their associations, and to provide interfaces to geotechnology that accommodate the ways people work with geography. Thus the interface to Google Maps, for example, includes a wide range of names that lack official recognition and would not have appeared on traditional maps. Researchers have developed techniques for identifying references to places in text, and for linking such references to maps (see, e.g., Li et al., 2013; Jones et al., 2008).
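As a simple illustration of the kind of technique involved, the following Python sketch scans text for known place names using a gazetteer lookup and links them to coordinates. This is a minimal sketch only: the systems cited above handle ambiguity, context, and vague regions far more carefully, and the gazetteer entries here are invented for illustration.

# A minimal gazetteer-lookup sketch of geoparsing: finding place-name
# references in free text and linking them to coordinates. Entries are
# illustrative only; real systems must resolve ambiguous names.
import re

GAZETTEER = {
    "Los Angeles": (34.05, -118.24),    # (lat, lon), WGS84
    "the English Channel": (50.2, -0.5),
    "La Manche": (50.2, -0.5),          # French name, same feature
}

def geoparse(text):
    """Return (name, (lat, lon)) for each gazetteer name found in text."""
    hits = []
    for name, coords in GAZETTEER.items():
        if re.search(re.escape(name), text, flags=re.IGNORECASE):
            hits.append((name, coords))
    return hits

print(geoparse("We sailed across the English Channel toward Calais."))
# [('the English Channel', (50.2, -0.5))]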

1.03.3.5 Spatial Prediction

Much of the enthusiasm for Big Data emerged in the world of commerce, where vast new sources of data began to provide a basis for useful prediction. For example, in a celebrated paper O’Leary was able to predict the winner of the Eurovision Song Contest from Big Data. Other work has been more skeptical, especially about the generalizability of such predictions, and about the general lack of attention to uncertainty, but the potential benefits of such predictions in the commercial world are undoubtedly huge.


As noted earlier, prediction is not generally a highly valued activity in science. Moreover the very basis of the term “prediction” implies a concern with time, not space. Thus the advent of Big Data has the potential to change the balance between discovery and prediction in science, raising the latter from a somewhat peripheral to a central role.

Traditionally, prediction has meant estimation of what will occur when. Spatial prediction might similarly be defined as estimation of what will occur where, and perhaps also when. In line with scientific norms, however, both forms of prediction have been given little attention. Some exceptions can be found in the literature of geographic information systems (GIS). For example, the geologist Bonham-Carter was concerned with predicting where gold deposits were likely to be found in Canada based on layers of geologic data (Bonham-Carter, 1991). Lloyd and Greatbatch (2009) were able to unravel the clues spread throughout the novels of P.G. Wodehouse to predict the geographic location of the imaginary Blandings Castle. Common practical applications of spatial prediction include estimates of real-estate value based on the characteristics of a house and its neighborhood.

The second V, velocity, is likely to make spatial prediction much more valuable by opening the possibility of early warning in near-real time. That, together with the volume of data now available and the variety of its sources, suggests that it would be worth developing more extensive and sophisticated tools for spatial prediction in this era of Big Geodata.
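To illustrate what spatial prediction can look like in its simplest form, the sketch below estimates a value at an unsampled location by inverse distance weighting (IDW), one of the most elementary spatial prediction techniques. It is offered as illustration only, not as any of the cited methods, and the real-estate values are invented.

# A minimal sketch of spatial prediction by inverse distance weighting:
# estimate a value at an unsampled location from nearby observations.
import math

def idw_predict(samples, x, y, power=2.0):
    """samples: list of (x, y, value); returns the IDW estimate at (x, y)."""
    num, den = 0.0, 0.0
    for sx, sy, v in samples:
        d = math.hypot(sx - x, sy - y)
        if d == 0.0:
            return v                  # exact hit on a sample point
        w = 1.0 / d ** power          # closer observations weigh more
        num += w * v
        den += w
    return num / den

# e.g., house prices (in $1000s) observed at three nearby locations
obs = [(0.0, 0.0, 350.0), (1.0, 0.0, 420.0), (0.0, 1.0, 390.0)]
print(round(idw_predict(obs, 0.4, 0.4), 1))  # estimate at (0.4, 0.4)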

1.03.4 The Technology of Big Geodata

As noted earlier, part of the drive toward Big Data is technological. Big volume requires high performance, and the use of the world’s most powerful computers. But velocity and variety also create technological challenges, as discussed later.

1.03.4.1 High-Performance Computing

We now have the ability to collect, process, store, and disseminate unprecedented quantities of data: storage for petabytes, and access to supercomputers that operate at petaflop rates (1 petaflop is roughly 10^15 floating-point operations per second). Even the cheapest personal computers now employ multiple processors, while the number of CPUs (central processing units) and GPUs (graphics processing units) in the largest supercomputers is increasingly in the thousands or even millions. Several papers have discussed the potential of such massive computation for the geographical sciences. Operations that were impossible or would have taken prohibitively long on earlier devices are now within the capabilities of today’s supercomputers. These tend to be problems at fine spatial resolution and covering large areas, with algorithms that are not readily amenable to divide-and-conquer.

For example, the US Geological Survey faced a significant computational problem in its National Map project when dealing with the rasterized versions of its 1:24,000-scale topographic maps. These maps use a projection with several distinct zones, such that pairs of maps covering adjacent areas in different zones do not fit cleanly along their common border. The problem does not arise in the case of vector data, so vector features such as roads and coastlines continue as expected across zone boundaries, but early versions of the National Map showed unacceptable “rips” in the raster data. Reprojection of the raster data on the fly was computationally intensive, but a solution was found (“pRasterBlaster”; Finn et al., 2012) in high-performance computing through parallelization of the reprojection algorithm.

The CyberGIS project headed by the University of Illinois at Urbana-Champaign, now in its fifth and final year of funding from the National Science Foundation, has systematically explored the application of high-performance computing and research collaboration to problems in the geographical sciences. As such, it represents a major investment in the technology that is making it possible to handle Big Geodata, and to explore the new questions that Big Geodata allows researchers to investigate. The possibilities of applying high-performance computing to geospatial problems have long intrigued researchers (Healey, 1998), but have remained tantalizingly out of reach for a variety of reasons. Today, however, it seems that the age of high-performance GIS has finally arrived.
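The pRasterBlaster design itself is described in the cited paper; the sketch below illustrates only the general idea of parallelizing a reprojection by splitting cell coordinates into chunks and transforming them on separate processes. It assumes the open source pyproj package, and the coordinate systems and grid size are arbitrary examples, not those of the National Map.

# A generic sketch of parallel coordinate reprojection: split a grid of
# cell centers into chunks and reproject the chunks on worker processes.
from multiprocessing import Pool
from pyproj import Transformer

def reproject_chunk(chunk):
    """Reproject a list of (lon, lat) pairs to Web Mercator."""
    t = Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)
    return [t.transform(lon, lat) for lon, lat in chunk]

if __name__ == "__main__":
    # Build a toy global grid of cell-center coordinates.
    grid = [(lon / 10.0, lat / 10.0) for lon in range(-1800, 1800)
            for lat in range(-890, 890, 10)]
    n = 8                                     # number of worker processes
    chunks = [grid[i::n] for i in range(n)]   # simple round-robin split
    with Pool(n) as pool:
        parts = pool.map(reproject_chunk, chunks)
    print(sum(len(p) for p in parts), "cells reprojected")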

1.03.4.2 Synthesis

The transition from a world of single, authoritative sources to one of multiple sources of often unknown provenance has drawn attention to the general lack of tools for the synthesis of Big Geodata. In the past, the multiple sources of data that might contribute to a geographic fact, such as the elevation of Mount Everest, were largely compiled and synthesized by experts. Thus the estimate of 8848 m is the result of a long series of measurements of increasing accuracy, extending over more than a century, from early triangulation from the plains of India to the latest techniques of photogrammetry, radar interferometry, and satellite-based positioning. The trust we place in that estimate derives from the authority of the experts and their mapping agencies.

Things could not be more different in the current neogeographic world. A search of the Web would produce perhaps thousands of pieces of information that might bear on a given fact; and the highly paid experts who in the past would have synthesized those pieces no longer exist in sufficient numbers to conduct the painstaking synthesis of every needed fact. Instead we are forced to rely on automation, in the form of techniques of fusion that create estimates from raw inputs. Although numerous techniques have been described, they are not yet available at a level that rivals our techniques of analysis. In short, the geospatial world remains locked in a paradigm of analysis of single authoritative sources, rather than one of synthesis of an abundance of sources of highly variable quality. To cite one very simple example, everyone knows how to compute a mean, and many could estimate the uncertainty of the mean under common assumptions, but far fewer are trained in how to produce an optimum estimate from a number of sources of varying reliability, or how to estimate the uncertainty in the result.
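Under standard assumptions (independent, unbiased sources with known standard errors), the optimum combination alluded to in that closing example is the classical inverse-variance weighted mean, which also yields the uncertainty of the fused result. A minimal sketch follows; the elevation figures and standard errors are invented for illustration.

# Fusing several estimates of one quantity, each with its own
# reliability, into a combined estimate with a stated uncertainty.
import math

def fuse(estimates):
    """estimates: list of (value, standard_error); returns (mean, se)."""
    weights = [1.0 / se ** 2 for _, se in estimates]   # w_i = 1 / sigma_i^2
    total = sum(weights)
    mean = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return mean, math.sqrt(1.0 / total)                # se of fused estimate

# three hypothetical measurements of a summit elevation (m)
sources = [(8848.0, 3.0), (8850.0, 1.0), (8844.0, 6.0)]
mean, se = fuse(sources)
print(f"fused estimate: {mean:.1f} +/- {se:.1f} m")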

1.03.5 Conclusion

In recent years Big Data has captured the imagination of many researchers, and there has been very rapid growth in the demand for data scientists. Big Geodata is a well-defined subset of Big Data, sharing many of its concerns and priorities. It is also extreme in some respects: in the importance of uncertainty, and the impossibility that any geodata can represent the truth; and in the efforts that have been expended in the geospatial community over the past quarter century on topics such as metadata, provenance, data sharing, interoperability, archiving, and other concerns of data science.

There can be no doubt that Big Geodata will continue to grow in significance. Volume, velocity, and variety will continue to increase, taking advantage of broader developments in information technology and in the fundamental technologies of geodata. Spatial prediction offers some very significant practical applications. Progress will continue to be made in data integration and synthesis, and some of those problems will be solved. On the other hand, if one defines Big Geodata as having volume, velocity, or variety beyond our current ability to handle, then Big Geodata will always remain just beyond our reach, and a major invitation to cutting-edge research.

References

Agarwal, P., Skupin, A. (Eds.), 2008. Self-organizing maps: Applications in geographic information science. Wiley, Chichester, UK.
Anselin, L., 1995. Local indicators of spatial association–LISA. Geographical Analysis 27 (2), 93–115.
Bonham-Carter, G.F., 1991. Integration of geoscientific data using GIS. In: Maguire, D.J., Goodchild, M.F., Rhind, D.W. (Eds.), Geographical information systems: Principles and applications, vol. 2. Longman, Harlow, UK, pp. 171–184.
Dobson, J.E., 1983. Automated geography. The Professional Geographer 35 (2), 135–143.
Finn, M.P., Liu, Y., Mattli, D.P., Guan, Q., Yamamoto, K.H., Shook, E., Behzad, B., 2012. pRasterBlaster: High-performance small-scale raster map projection transformation using the Extreme Science and Engineering Discovery Environment. In: The XXII Congress of the International Society for Photogrammetry and Remote Sensing, Melbourne, Australia.
Fotheringham, A.S., Charlton, M., Brunsdon, C., 2002. Geographically weighted regression: The analysis of spatially varying relationships. Wiley, Hoboken, NJ.
Healey, R.G., 1998. Parallel processing algorithms for GIS. Taylor and Francis, Bristol, PA.
Hey, A.J.G., Tansley, S., Tolle, K.M., 2009. The fourth paradigm: Data-intensive scientific discovery. Microsoft Research, Redmond, WA.
Jones, C.B., Purves, R.S., Clough, P.D., Joho, H., 2008. Modelling vague places with knowledge from the Web. International Journal of Geographical Information Science 22 (10), 1045–1065.
Kuhn, T.S., 1970. The structure of scientific revolutions. University of Chicago Press, Chicago.
Li, L., Goodchild, M.F., Xu, B., 2013. Spatial, temporal, and socioeconomic patterns in the use of Twitter and Flickr. Cartography and Geographic Information Science 40 (2), 61–77.
Lloyd, D.A., Greatbatch, I.D., 2009. The search for Blandings. Journal of Maps 5 (1), 126–133.
Miller, H.J., Goodchild, M.F., 2014. Data-driven geography. GeoJournal 80 (4), 449–461.
Schalkoff, R.J., 1997. Artificial neural networks. McGraw-Hill, New York.
Sui, D.Z., Elwood, S., Goodchild, M.F. (Eds.), 2012. Crowdsourcing geographic knowledge: Volunteered geographic information (VGI) in theory and practice. Springer, New York.
Turner, A., 2006. Introduction to neogeography. O’Reilly, Sebastopol, CA.
Zhang, J.X., Goodchild, M.F., 2002. Uncertainty in geographical information. Taylor and Francis, New York.

1.04 Current Themes in Volunteered Geographic Information

Colin J Ferster, University of Victoria, Victoria, BC, Canada Trisalyn Nelson, Arizona State University, Tempe, AZ, United States Colin Robertson, Wilfrid Laurier University, Waterloo, ON, Canada Rob Feick, University of Waterloo, Waterloo, ON, Canada © 2018 Elsevier Inc. All rights reserved.

1.04.1 Introduction
1.04.2 Themes
1.04.2.1 Process of Generating VGI
1.04.2.1.1 History
1.04.2.1.2 Types of VGI
1.04.2.1.3 Motivation
1.04.2.1.4 Equity in VGI
1.04.2.2 Products
1.04.2.2.1 Data quality
1.04.2.2.2 Data ownership and open data
1.04.2.2.3 VGI in cities and governance
1.04.3 Examples
1.04.3.1 RinkWatch
1.04.3.2 The ForestFuelsApp
1.04.3.3 BikeMaps.org
1.04.4 Summary and Conclusions
References

1.04.1 Introduction

Data collection can be expensive, and practical constraints sometimes limit the ability to map in detail or to keep maps up to date. Take the example of collecting data about the distributions of bird species. Traditional surveys can provide detailed data on bird locations, but given the limited number of trained people on research teams, the large number of species, and the wide migratory distributions, it is impossible to obtain a detailed and comprehensive record of bird locations over a large area and a long period of time using standard approaches. Roads are another good example of phenomena that are difficult to map because they are always changing. Imagine trying to keep road maps up to date in a city that is developing quickly: maintaining road datasets requires constant updating by city technical staff, and by the time the map is completed it is out of date. Furthermore, the data may not be available for the public to access, to use as they wish, or to edit and extend based on first-hand knowledge.

With the growing popularity of mobile devices equipped with location sensors, there has been increasing demand for geographic data in applications, and the possibility has emerged for individuals to use these tools to gather data. Issues of data timeliness, limitations in spatial and temporal resolution, restrictions on the use of official data, and the difficulty of capturing large spatial extents have led to growing interest in having individuals be part of networks for collecting data. When linked to a map, citizen science or crowdsourced data are called volunteered geographic information (VGI). VGI can be defined as individuals using the Web to create, assemble, and disseminate geographic information (Goodchild, 2007).

VGI’s popularity is also supported by major advances in the way people create, view, and use maps, which have made new forms of map data collection feasible. Improvements in mobile computing, in particular personal communication devices (e.g., smartphones), have made digital maps accessible to an unprecedented number of people. Smartphones are increasingly common, and many are equipped with global positioning system (GPS) receivers to measure location, in addition to tools for capturing images (camera), recording text, sound, and other input (touch screen and microphone), and sharing over networks (data connectivity). Social networks are also important in that they connect many people to facilitate information sharing. It is now common for measurements of location to be attached to messages and images shared over social networks (e.g., geotagged Tweets or images on Flickr) (Robertson and Feick, 2015), or for scientists to engage untrained volunteers in collecting and analyzing data (citizen science) (Haklay, 2013). Many aspects of scientific inquiry are geographic in nature, and therefore relate to VGI.

Never before have so many advanced mapping tools been available to so many people, even while the mapping task itself is often hidden, creating a growing cadre of “accidental geographers” (Unwin, 2005). Geospatial tools have embedded geography into people’s lives. While generating geographic information may be a passive activity (e.g., a city routing app tracks your progress while you drive through traffic to calculate the fastest route), there are also intentional efforts to collect data (Harvey, 2013). Scientists increasingly turn to citizen engagement to enhance data collection and outreach efforts in their communities (Dickinson and Crain, 2014). Also, the collection of technologies and methods known
as the “geoweb” creates opportunities for people to rapidly collaborate on geographic issues of interest. Because these tools are available to so many people, there are opportunities for many to contribute, and VGI has several possible advantages. First, data collection can be more spatially and temporally extensive than traditional approaches coordinated through a central organization (Sui et al., 2013). Second, social media tools can engage audiences that are interested, experienced, skilled, and invested in a topic to contribute observations, interpret results, and share findings (Cooper and Lewenstein, 2016); VGI can therefore represent unique perspectives, experiences, and knowledge. Funding limitations for traditional government mapping initiatives may lead organizations to look outward for help keeping data up to date, or motivate people to collect their own data to meet a need. VGI may cover topics that are difficult, or impossible, to address using traditional approaches (Nelson et al., 2015). And VGI can be more rapidly responsive and adaptive to local needs than centralized efforts (Goodchild and Glennon, 2010).

Despite the strengths of VGI, research has focused on data collection at the expense of information use and decision making, and many concerns remain in these areas. An obvious first concern relates to the quality of data contributed by volunteers, as often no credentials or training are required to participate in a project (Burgess et al., 2016). Another concern is that VGI may represent an incomplete segment of society (Romanillos et al., 2015). Barriers may prevent everyone from being represented; for example, many people do not have access to smartphones, computers, and networks, owing to available infrastructure and personal cost, the ability to use the tools effectively, free time to participate, and motivation to take part (Sanchez and Brenman, 2013). Addressing these concerns will increase the ability to use VGI to inform science and management, lead to new discoveries, and represent a wider range of experiences than traditional approaches.

The purpose of this article is to introduce and discuss major themes in VGI, and to present three recent case studies in which geographers used Internet and smartphone tools to engage wide audiences through VGI. Our discussion of the themes of VGI is divided into two sections: the process of generating VGI and the resultant data products. Related to the process of generating VGI, we first discuss definitions, history, types, the motivations of both participants and project organizers, and potential barriers to participating in VGI projects. Related to VGI products, we discuss data quality, data ownership, and use in governance, in particular for cities. In the next section, we present three recent VGI case studies that demonstrate considerations for applying theory to real-life VGI projects. In the final section, we relate the projects back to the themes discussed in the article and summarize the main points.

1.04.2 Themes

1.04.2.1 Process of Generating VGI

1.04.2.1.1 History

It has long been a goal of some geographers to include a wide variety of people and interests in maps (Miller, 2006). Local people have local knowledge, are geographically close to a phenomenon, and may dedicate effort to topics and outcomes in which they are directly invested, perspectives that may be missed by more centralized approaches (Feick and Roche, 2013). Devices like smartphones and personal computers are equipped with input tools for text, sound, images, location, and movement. Networks seamlessly connect individual devices, online tools, and extensive “cloud-based” storage. In the process of using these tools, masses of data are generated, much of them geographic (Sui et al., 2013). These tools have presented new opportunities for collecting and sharing geographic information.

Our understanding of the term VGI has evolved to embrace both a spectrum of types of geographic data that citizens create and share, and a growing range of technologies and social practices that enable these data and information resources to be created (Elwood et al., 2012). When viewed as geographic data, VGI can be seen to include both citizens’ objective recording of their environments (e.g., local stream temperature readings, counts of amphibians) and more subjective information that relates to their perceptions and opinions of places and features (e.g., georeferenced narratives). VGI can reference locations either explicitly (e.g., geographic coordinates) or implicitly (e.g., vernacular regions such as “downtown”). The data that individuals collect can vary substantially in format, and include data that correspond with traditional GIS data, such as the point, line, and polygon features in OpenStreetMap that are characterized with descriptive text tags (Haklay, 2010). Other types of VGI, such as geotagged photographs (e.g., Flickr), text messages (e.g., Twitter), and videos (e.g., YouTube), are products of communication, and present new opportunities to document and analyze human and natural phenomena (Shelton et al., 2015; Quesnot and Roche, 2014).
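To make the contrast concrete, the sketch below shows what an explicitly and an implicitly georeferenced VGI record might look like. The field names are illustrative rather than any formal schema, and the tags only loosely follow OpenStreetMap conventions.

# Two illustrative VGI records: one with an explicit georeference
# (coordinates plus OpenStreetMap-style descriptive tags) and one with
# an implicit georeference (a vernacular place name, no crisp boundary).
explicit_vgi = {
    "geometry": {"type": "Point", "coordinates": [-123.12, 49.28]},  # lon, lat
    "tags": {"amenity": "drinking_water", "wheelchair": "yes"},
    "contributor": "user_4821",
}

implicit_vgi = {
    "text": "Huge crowds downtown for the parade this afternoon",
    "place_reference": "downtown",   # vernacular region, context-specific
    "timestamp": "2017-07-01T14:03:00Z",
}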

1.04.2.1.2 Types of VGI

Diversity can also be seen in the processes that underlie how VGI is created and used. Stefanidis et al. (2013), for example, make a useful distinction between VGI that individuals create deliberately and data that are generated passively as a byproduct of other activities. Active VGI is characteristic of citizen science activities, as well as of more routine municipal issue reporting (e.g., graffiti, potholes), where citizens deliberately collect information for a set problem or interest (e.g., bird sightings), usually across a predefined set of variables (e.g., species, sex, time observed). In this way, active VGI projects seek to enlist citizens in helping experts to collect, curate, and share information that can be used to monitor environmental conditions and/or address applied research questions. In contrast, ambient and passively generated VGI are typically created without user intervention, as an outcome of another process or activity. Considerable attention has been directed at examining how communication data, such as Twitter and microblog texts, geotagged photographs, and videos, can be used to infer new insights about human behavior, movement, and perceptions (Li and Goodchild, 2014; Shelton et al., 2015; Robertson and Feick, 2015).


Even within these bounds, the diversity of VGI data and authoring processes introduces interesting challenges and opportunities for further research. In terms of challenges, VGI producers often have different reasons for creating data and engaging in a VGI project. Some individuals, for example, may be interested in enhancing their personal reputation and status within a community, while others may have more altruistic motivations (Coleman et al., 2009). Since individuals’ motivations and expertise differ, the quality of data contributions can vary substantially from person to person (Foody et al., 2013; Li and Goodchild, 2014; Devillers et al., 2010). Data coverage can be uneven, as more people are willing to collect data about features and places that are popular and accessible than about those seen as less interesting or more difficult to capture. Notwithstanding challenges of this nature, the use of VGI offers several key advantages for advancing citizen science. The three VGI examples in section “Examples” illustrate several of these challenges and opportunities in more detail.

1.04.2.1.3 Motivation

VGI created through active and purposeful engagement depends on the participatory process. Much has been learned about processes of participation through the evolution of public participation geographic information systems (PPGIS) and of VGI, both of which have roots in participatory planning. Arnstein’s (1969) ladder of participation has been used to link degrees of citizen participation to issues of power and control. While PPGIS projects were firmly rooted in urban and regional planning, VGI has a wider scope in terms of the types of projects and forms of participation enabled by more recent advances in geotechnologies (Goodchild, 2007).

The reasons why people participate in VGI projects are intimately tied to the project’s objectives, and many motivations often coexist within a single project and even within a single individual (Coleman, 2009). Understanding participant motivations is critical for project designers who want to build tools that cater to them. Coleman (2009) characterized user motivations in VGI, drawing from experiences in the open source software community, listing categories that include altruism, professional or personal interest, intellectual stimulation, personal investment, social reward, enhanced personal reputation, an outlet for creativity and self-expression, and pride of place, as well as negative motivations of mischief, hidden agendas, or even criminal intent. These motivations are tied to the nature of the information contributed, and may be a predictor of data quality. Note also that motivations are not static, and can and will change throughout the life of a project (Rotman et al., 2012). For example, eBird, a popular citizen science project in ornithology, changed slogans from “Birding for a Cause,” which aimed to engage the altruistic motivations of volunteers, to “Birding in the 21st Century,” with a focus on providing digital tools for birders to become better at their hobby. This change was associated with large increases in the number of contributions and improvements in the quality of contributed data (Cooper and Lewenstein, 2016). In projects that solicit citizen reporting of plants and animals, many people may be motivated to contribute data simply because they understand how better data may lead to improved research, decision making, and/or conservation efforts. Researchers can therefore encourage submissions by using the data for research, publishing papers, presenting at scientific conferences, and developing knowledge mobilization activities that translate the findings back to the participant community. User motivations also relate to mechanisms that can be built into a VGI project, such as data standards and sampling designs (Goodchild, 2009).

A second and much less theorized aspect of VGI relates to the motivations of project designers or researchers. For citizen science, the most widely expressed motivation is expanding sampling effort through the use of citizen data collectors (Dickinson et al., 2010). However, research objectives may extend beyond data collection, for example, testing web-based participation tools (Nuojua, 2010) or evaluating spatial cognition tasks, in which case researcher and participant motivations may not align explicitly, and strategies such as gamification might be employed to target a general class of participants.
Deeper participation in citizen science takes an approach “higher on the participatory ladder,” whereby participants are engaged in defining project objectives and how and what data get collected (Haklay, 2013). Participatory action research, which has a long history of linking researcher and “practitioner” interests in research projects (Argyris et al., 2002), may have much to inform citizen science and VGI more broadly in this regard. While participatory action research is rooted in social research, citizen science and VGI encompass both social and natural science research questions, often at scales not possible within the participatory action research model (Cooper et al., 2007). The continuum of control for VGI defines power structures that influence how actors in a project relate to each other. The interlinkages between user motivations, researcher and/or designer objectives, data quality characteristics, and project design choices ultimately determine the character of a VGI project and the information it produces.

In the case studies investigated here, we see several motivations at play for participants. In the case of RinkWatch (see section “RinkWatch”), research objectives were both educational and outreach oriented (linking climate change to meaningful cultural ecosystem services) and to provide data for research relating temperature variability to outdoor skating. Qualitative interviews with RinkWatch participants revealed that many were motivated by an interest in outdoor skating itself. Outdoor skating, and in particular rink-making, is an activity individuals typically engage in on their own, and RinkWatch served as an online community. In response to this realization, researchers implemented several bulletin boards to cater to these users, giving them a forum to exchange ideas and tips related to rink-making and stories about outdoor skating.

1.04.2.1.4 Equity in VGI

We have addressed the benefits of VGI as a novel data source that can satisfy unmet social or scientific needs, and as a vehicle for citizens to create and share information that is interesting or important to them. However, these benefits (and any costs) are not distributed equally. Individuals, social groups, and geographic areas differ in terms of access to the technical resources that enable VGI production and use (e.g., Internet connectivity, open spatial data), as well as in the broader social and economic factors (e.g., financial resources, digital literacy, sociopolitical environments, legal structures) that condition how and whether digital data and tools are used (Sui et al., 2013). These inequities in access to digital tools, and in the ability to use them effectively, have been described as a digital divide.


Access to the Internet is an easily understood prerequisite for generating VGI. A large proportion of the world’s population has limited or no Internet access, because infrastructure is lacking or because costs exceed what many people can afford (International Telecommunications Union (ITU), 2016). In North America, computer and smartphone ownership have increased dramatically, even among many lower-income groups (Sanchez and Brenman, 2013). However, other nontechnical factors affect how and whether individuals can engage in VGI and citizen science projects. For example, many people do not have the free time needed to create VGI, particularly if they work long hours at multiple jobs or care for young or elderly family members (Wiggins, 2013). Similarly, disadvantaged groups may encounter social and educational barriers that limit their capacity to organize data collection projects and how effectively they can interact with governments through online tools (Sanchez and Brenman, 2013). While participation in VGI projects may not be possible for everyone, there is some hope that the outcomes (e.g., better data for planning) may benefit broader groups of people beyond those who directly participated.

It is important to note that even within advantaged groups that have the capacity to contribute to crowdsourcing and VGI projects, rates of participation differ, leading to participation inequality. For example, in Wikipedia a small subset of participants is responsible for generating the vast majority of the content, while many people make few contributions or only consume content (Bruns et al., 2013; Quattrone et al., 2015). The most active contributors to OpenStreetMap have been predominantly young, male, and educated, and have focused their efforts on mapping urban centers (Stephens, 2013; Quattrone et al., 2015). Different types of projects can attract contributions from different groups; for example, females made most of the contributions to the citizen science project EyeWire, a puzzle-like game to map neurons (Kim et al., 2014). Questions have arisen about the consequences of groups with similar demographic profiles, and likely common experiences and world views, generating crowdsourced products that are increasingly ubiquitous in use (Lam et al., 2011). Haklay (2016) emphasized that “[w]hen using and analysing crowdsourced information, consider the implications of participation inequality on the data and take them into account in the analysis.”

1.04.2.2 Products

1.04.2.2.1 Data quality

Generally, definitions of data quality relate to the fitness of the data for a given use (Chrisman, 1984), and VGI has opened discussion about more complex dimensions of data quality. The standards developed through the International Standards Organization (ISO), and specifically Technical Committee 211, provide a good starting point for examining spatial data quality (ISO 19157, 2013). These cooperatively developed standards cover data quality elements such as lineage (the history of data generation and processing), positional accuracy, attribute accuracy, temporal consistency, and completeness, among others.

Applying spatial data quality standards to VGI can be challenging. Unlike spatial data created by experts in government, private, or nongovernment organizations, VGI is often authored by many dispersed contributors who differ in expertise, interests, and methods for creating and documenting data (Poore and Wolf, 2013). As a result, data quality can vary from contributor to contributor within a single data set. Critiques of volunteered data quality have focused on concerns over spatial and attribute accuracy. In particular, inexperience with scientific protocols, the use of low-cost consumer devices (e.g., smartphone GPS sensors) rather than dedicated instruments, and the differing motivations of volunteers have been cited as sources of bias in volunteered science (Show, 2015). These views are challenged on the basis that professional scientists also demonstrate nonobjectivity in research design and implementation, that narrow views of data quality (as simple adherence to scientific protocols) are limiting, and that differing approaches are enriching for the greater goal of discovery (Newman et al., 2015). Certainly, demonstrating adherence to measurement protocols can add authority to arguments backed by volunteered data (Ottinger, 2009). Technology can also be used to support curation of data, ensuring that unusual measurements are flagged for review and that areas needing more detailed observations are identified (Ferster and Coops, 2014). Similarly to measures of lineage, the dispersed contributors of VGI require strategies such as training for new participants and filtering and reviewing unusual values to ensure the logical consistency of submitted data (Sullivan et al., 2014).

Another challenge for VGI is data completeness. VGI is heterogeneous by nature, and the density of data contributions varies according to where volunteers choose to concentrate their efforts. However, VGI also presents opportunities to collect data that are spatially and temporally extensive. In some cases, having more data that are extensive but uncertain may be more valuable than having a few very accurate data, or possibly no data at all, from official sources (Hochachka et al., 2012; Goodchild and Glennon, 2010). In general, VGI projects and citizen science initiatives will likely continue to operate with fewer protocols to control data quality than traditional scientific measurements. However, the volume of data opens up potential, and future research should consider how to implement confirmatory approaches that allow consistent data to be highlighted when repeated reports indicate similar patterns. Emerging considerations for data quality relate to the extent of, and opportunity for, contributions by a diversity of people, as the value of data can be enriched when diverse contributors have the chance to make unique contributions and to follow up on insights and chance discoveries (Lukyanenko et al., 2016).
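As a minimal illustration of the automated curation mentioned above, the following sketch flags contributed readings whose modified z-score (a robust, median-based outlier measure) is large, so they can be queued for human review rather than rejected outright. The threshold, readings, and function name are invented for illustration; real projects use more elaborate filters.

# Flag contributed measurements that deviate strongly from the rest.
from statistics import median

def flag_for_review(values, threshold=3.5):
    """Return indices of values with a large modified (median/MAD) z-score."""
    med = median(values)
    mad = median(abs(v - med) for v in values)   # median absolute deviation
    if mad == 0:
        return []
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]

stream_temps_c = [8.1, 7.9, 8.4, 8.0, 31.5, 8.2, 7.8]  # one suspect reading
print(flag_for_review(stream_temps_c))  # -> [4]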
The key example in citizen science highlighted by Lukyanenko et al. (2016) was the discovery of a rare type of astronomical object (“Hanny’s Voorwerp”) by Hanny van Arkel, a schoolteacher in the Netherlands, in the Galaxy Zoo project (Raddick et al., 2010). The task being performed was a rather mechanical manual classification of shapes within telescope images. Opportunities were provided to ask follow-up questions in an Internet forum attended by experts, leading to a major outcome for science and for the individual involved.

The connection between participating citizens and the representativeness of data is a growing area of interest for VGI. For example, in cycling research there is concern about how technology-based VGI may exclude participants and generate biased data (Romanillos et al., 2015). Equity has been discussed in terms of access to forms of active transportation and the associated health benefits for different social groups (Lee et al., 2016). There is a strong appetite among city planners for cycling route data, and novel and
crowdsourced origins are being considered. For example, Strava is a cycling smartphone game in which participants can track, compare, and compete on their rides (Strava, 2017). The data on Strava represent actual cycle trips, but there may be a bias toward different types of routes compared to recreational or utility cyclists (e.g., competitive riders may seek out hilly routes in rural areas for training) (Griffin and Jiao, 2015). In other areas, such as urban centers, there may be less difference between Strava users and other types of cyclists (Jestico et al., 2016). Given appropriate modeling constraints, these novel data sources can complement traditional data sources by offering covariates for making estimates over larger areas (Jestico et al., 2016). Interpretation within context is important to ensure the equitable allocation of public resources for developing active transportation facilities (Le Dantec et al., 2015).
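The covariate idea can be illustrated with a deliberately simplified sketch: fit a linear relationship between crowdsourced and official counts at sites where both exist, then estimate ridership where only the crowdsourced signal is available. The published analyses (e.g., Jestico et al., 2016) use fuller statistical models; all numbers below are invented.

# A simplified calibration of crowdsourced counts against official counts.
import numpy as np

strava = np.array([12, 30, 45, 80, 150])        # app-recorded trips/day
official = np.array([90, 210, 310, 560, 1010])  # manual counts, same sites

slope, intercept = np.polyfit(strava, official, 1)  # least-squares line
print(f"ridership ~ {slope:.1f} * strava + {intercept:.1f}")

new_site_strava = 60  # a site with app data but no manual count
print("estimated total riders:", round(slope * new_site_strava + intercept))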

1.04.2.2.2 Data ownership and open data

Data and algorithms are increasingly important in society, and the two are intimately linked: the performance of algorithms is often tied to the size and quality of the training data used to develop them. The dominant mode of big data ownership has normalized the practice of individuals trading personal information for services. In social media, for example, each exchange implies a trade of a service (e.g., posting a message to friends) for a piece of data (e.g., access to and ownership of the digital representation of that message). This trade positions corporations in opposition to individual users, in what many consider to be an imbalanced relationship (Smith et al., 2012; Kitchin, 2014).

In citizen science, data ownership can take a variety of forms. One model mirrors the arrangement just described, in which participants have little or no ownership of or control over the data they create. In academic projects this model is often not tenable, because research ethics boards require that participants be able to withdraw from projects and have their data removed from the larger database. In some cases, full data access is granted on an individual and aggregate basis. For many users, access to raw data is less relevant than access to information products that relate to their interests and motivations. In citizen science with higher levels of citizen participation, participants can have direct roles in the management of, and access to, data. Only in this case, where researchers and participants comanage data access policies, can the project be considered an open source citizen science project.

Yet there are several important barriers to “opening” citizen science data. First, many projects are designed to target a specific research or societal issue, and public access could have negative consequences. For example, reporting observations of an endangered species could incite others to seek it out, causing damage to habitat and possibly conflicting with the conservation aims of the wider project. Similarly, health-related projects are particularly susceptible to the ethical issues associated with open data (Goranson et al., 2013). Privacy concerns are also an issue, whereby home-based observations and usernames can be linked to other information (e.g., social media profiles) and risk harm to participants. Participation in determining data access policies is one way around this, whereby potential risks are discussed early and on an ongoing basis among researchers and participants (Haklay, 2013). A joint researcher–participant oversight committee is one tool projects can use to realize this level of participatory design in citizen science.

1.04.2.2.3 VGI in cities and governance

OpenStreetMap (OSM) is perhaps the most established VGI project, and its focus on roads gives it a strong emphasis on cities (Haklay and Weber, 2008). OSM is a community of people who map and update worldwide data on roads, which are notoriously difficult to keep current through traditional mapping workflows. Through time, OSM has grown to include other infrastructure and services. The strength of OSM is the huge number of contributors: Neis et al. (2011) estimated that in 2011 there were over 0.5 million contributors, and that their number grew by 150 people each day. There is huge power in collectively harnessing local knowledge, especially for something as dynamic as roads. OSM is a great example of a key strength of VGI: by compiling bits of data from a massive number of individuals, a new type of information is generated.

As is typical of VGI projects, a key discussion of OSM has been around the quality of data (Ward et al., 2005; Jackson et al., 2013). The quality of road data is difficult to assess given the lack of independent data with which to “truth” maps. Comparing OSM data to official data is helpful for assessing congruence and variability, but it is not possible to know which is right. Additional quality discussions have emphasized assessment of the amount of OSM data, which varies spatially and is influenced by access to technology and the skills of the population (Neis et al., 2011). The longevity and number of applications that use OSM indicate that, even with concerns about quality, VGI datasets can be the best available, and are growing from fringe data into mainstream sources.

The ubiquity of smartphones has created new opportunities for VGI projects embedded in cities. Mobility- and transportation-focused VGI initiatives are becoming particularly prevalent, as personal GPS devices track where people move with unprecedented resolution (Misra et al., 2014; Griffin and Jiao, 2015). Cycling research is at the forefront of discussions about the validity of VGI as a source of data, given the plethora of fitness apps for tracking where people ride (Krykewycz et al., 2011). Personal fitness apps (e.g., Strava) are used by cyclists to track where they ride, cycling distance, and speed. With a gaming element, Strava encourages use by allowing people to compete with themselves and friends on distance and speed, or to undertake challenges such as riding the elevation gain of Everest in a given time period. Perhaps unintentionally, Strava users are contributing to a massive global dataset on where people ride (Fig. 1). Strava Metro is now pursuing a new business model in which it curates and sells the data to cities and researchers interested in ridership data. Derivative products that map ridership and visualize cyclist flow through a city are now possible. The primary criticism of using Strava is that the data are primarily collected by recreational bike riders and biased toward young men (Jestico et al., 2016). However, recent research also indicates that in midsized North American cities the ridership patterns of Strava riders are similar to overall ridership patterns (Jestico et al., 2016). While there is much to be done to understand the appropriate use of fitness app data as a source of VGI for urban planning and population health research, interest in the data will continue given the proliferation of personal apps that leverage the GPS capabilities of phones (Romanillos et al., 2016).

Fig. 1 Strava global heatmap. Brighter blues indicate higher rates of Strava application use to record cycle trips. Source: Strava.com – Global Heatmap. http://labs.strava.com/heatmap/.

Emergency response and disaster management have been another focus of VGI research. There are very compelling examples of the benefit of using VGI when responding to floods (Bruns et al., 2011) and wildfires (Goodchild and Glennon, 2010). Particularly in developing countries where authoritative data may be limited or absent, as in the 2010 Haiti earthquake, volunteer data can be the only available source of information for evacuation, rescue, and recovery (Meier, 2012). Beyond overcoming a paucity of data, VGI can have the advantage of real-time updating. Due to the potential for a large number of data contributors, as well as the lack of a requirement to verify data before publishing, VGI can be updated much more quickly than official sources. During a disaster, this leads to both benefits and challenges (Roche et al., 2013). The benefit of knowing quickly what is happening and how change is occurring can be diminished if rumors or false data are provided; ultimately, unverified data will contain inaccuracies. To optimize the utility of VGI, data need to be compiled and accessible, which does happen when a skilled developer quickly sets up a website in response to an issue. One of the main strengths of VGI, its responsiveness and flexibility in providing information at the times and places where it is needed most, makes it a key tool for disaster response and management (Meier, 2012; Zook et al., 2010).

Another aspect of emergency response where VGI is having an impact is preparedness. As an example, BikeMaps.org (see section “BikeMaps.org”) initially launched in a city prone to earthquakes. A group of emergency responders quickly reached out to see if the technology could be broadened to support bike-based evacuation during an earthquake. Though an interesting potential application of VGI, the idea was set aside because cell phone service could not be guaranteed during an earthquake. One challenge that may be met by VGI in the future is motivating community action for disaster mitigation and preparedness. Volunteers in the ForestFuelsApp project (see section “The ForestFuelsApp”) showed very low levels of knowledge of, and action on, existing (non-VGI) wildfire mitigation programs, even though they were a highly engaged and knowledgeable audience. Salience and motivation for tasks related to wildfire preparedness are often highest immediately following a large fire event, while engaging communities at other times can be challenging (Monroe et al., 2006). Many recent wildfire apps are designed to disseminate information about active wildfires, while only a few provide information about mitigation or preparedness (Kulemeka, 2015). More broadly, there is a gap between the many natural disaster-related VGI efforts directed at response and the few directed at mitigation or preparedness (Horita et al., 2013; Klonner et al., 2016). Among traditional public outreach efforts for wildfire mitigation, interactive and participatory approaches have been the most effective, but have reached limited audiences (Toman, 2006). If VGI can be used to reach larger audiences, there is potential to use it as a tool to increase disaster mitigation and preparedness.

1.04.3 Examples

1.04.3.1 RinkWatch

RinkWatch is a citizen science project that engages citizens in climate change research through the reporting of ice skating conditions on outdoor community and backyard rinks. The website RinkWatch.org launched in 2012 with a simple web map interface and user management system that allowed people to register with an email address, identify the location of their rink, and then update continuously throughout the winter which days they could skate (or not) as temperatures changed (Fig. 2).

Fig. 2 RinkWatch “coldmap” showing percentage of skateable reports submitted by participants and rink information for a selected rink. Source: RinkWatch – http://www.rinkwatch.org/.

RinkWatch helps people make the link between climate change and impacts on their daily life. Despite increasing availability of information and growing public awareness of climate change, it remains difficult to get the general public to take actions that enhance their ability to adapt to its potential impacts (Burch, 2010). The link between climate change and the feasibility of outdoor skating was made by Damyanov et al. (2012) through a modeling study that analyzed changes in weather station data and forecasted future change to the outdoor skating season in Canada. Given the cultural importance of outdoor skating to many northern communities, RinkWatch was formed to examine climate change impacts on daily life, and the cultural connection to climate change, through citizen science. The RinkWatch project has three interrelated objectives: to better understand how temperature changes are impacting people’s ability to engage in outdoor skating, to engage and inspire interest in climate change and climate change research in a meaningful way, and to provide a testing bed for developing the “science of citizen science” (Robertson et al., 2015).

The response to the RinkWatch project was immediate and widespread, with hundreds of participants signing up in the first weeks, and over 500 in its first season. Now approaching its fifth year of operation, the project has over 1900 registered rinks and over 30,000 skating reports. Partly responsible for this successful recruitment was widespread media interest in Canada and the northeast United States, providing over 100 media opportunities to help publicize the project. We have since been able to analyze the data in relation to local temperature records and couple relationships between skateability and temperature to climate model scenarios (Robertson et al., 2015). While these projections are derived from relationships learned from only two seasons of data, they enable the translation of climate model projections from units of temperature (e.g., degrees Celsius of change) to units of days suitable for outdoor skating (see Brammer et al., 2014 for additional empirical work), a potentially more personally relevant metric to inspire changes that will reduce personal carbon footprints (Whitmarsh et al., 2011).

Since the original launch of the website, the original web map interface has been replaced with a more comprehensive web mapping framework including spatial data visualizations (e.g., “cold” maps and point clustering; see Fig. 2), full open-access data export for individuals and for all data, time series graphs, and some expert analyses of the data in relation to local temperature data. We have also added user-engagement tools through user forums, and the ability to post and share photos of rinks. We have collaborated with the sustainability arm of the National Hockey League (NHL Green) on communicating and promoting the project, and sponsored student research projects into outdoor skating and climate change. RinkWatch has been successful in generating attention and interest in the topic of outdoor skating and climate change.
As well, we have been able to leverage the data for research, showing regional variation in the skating-temperature relationship and how it may change with projected changes in climate (Robertson et al., 2015). Data quality has been assessed qualitatively by comparing plots of skateability against temperatures recorded at local weather stations, finding generally consistent patterns and evidence to support −5°C as a skating threshold (rinkwatch.org). Since we expect some variability in within-city temperatures due to microclimates (e.g., shading), the spatial variability of observations within cities is an area of current investigation. Handling variable submission rates among users has led to the development of new methods for dealing with messy, heterogeneous observations (Lawrence et al., 2015). In future seasons we plan to investigate the notion of climate fatigue and how winter temperature variability affects outdoor activities (e.g., rebuilding rinks). Challenges have also been managerial in nature: keeping the website current and updated with new content and features, finding resources to support software development (RinkWatch has never been supported by research funding), and maintaining engagement with users through social media and online forums.
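To make the quality check and the climate translation described above concrete, the following is a minimal Python sketch. The data values and column names are hypothetical, the +2°C warming is an arbitrary illustration, and −5°C is only the approximate threshold reported on rinkwatch.org; this is a sketch of the shape of the computation, not the project's actual analysis code.

```python
# A minimal sketch (not the RinkWatch codebase) of comparing reported
# "skateability" against temperatures from a nearby weather station,
# then translating a hypothetical warming into lost skateable days.
import pandas as pd

# Hypothetical input: one row per rink-day report, joined to station data.
reports = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-10", "2015-01-11", "2015-01-12", "2015-02-01"]),
    "skateable": [1, 1, 0, 1],                 # 1 = volunteers reported skating was possible
    "mean_temp_c": [-9.5, -6.1, -1.2, -12.0],  # daily mean at nearest station
})

THRESHOLD_C = -5.0  # approximate skating threshold reported on rinkwatch.org

# Fraction of skateable reports on either side of the threshold; a clean
# separation supports the threshold as a rough rule of thumb.
below = reports[reports.mean_temp_c <= THRESHOLD_C]["skateable"].mean()
above = reports[reports.mean_temp_c > THRESHOLD_C]["skateable"].mean()
print(f"P(skateable | T <= {THRESHOLD_C} degC) ~ {below:.2f}")
print(f"P(skateable | T >  {THRESHOLD_C} degC) ~ {above:.2f}")

# Translating a climate projection into "days suitable for skating":
# count days per season at or below the threshold, before and after a
# hypothetical +2 degC winter warming.
season = pd.Series([-8.0, -6.5, -5.5, -4.0, -2.0, -10.0])  # daily means, one season
print("skateable days now:      ", (season <= THRESHOLD_C).sum())
print("skateable days at +2 degC:", (season + 2.0 <= THRESHOLD_C).sum())
```

The published RinkWatch analyses relate report sequences to station records and climate model output with considerably more care (Robertson et al., 2015); the sketch only shows why a temperature threshold makes the translation to skateable days straightforward.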

1.04.3.2 The ForestFuelsApp

The ForestFuelsApp was a regional citizen science project to collect data about the fuel available to burn in wildfires in the Wildland-Urban Interface (WUI), where human development meets natural areas. Populations are expanding in the WUI (Radeloff et al., 2005), and when wildfire occurs, there can be devastating human impacts (e.g., stress, injury, loss of life, and loss of homes and other infrastructure) (Gray et al., 2015). The app was tested in Kelowna, BC, where rural and scenic ideals often lead people to live in places where wildfire may cause harm. For example, the 2003 Okanagan Mountain Park Fire destroyed 239 homes and forced 27,000 people to evacuate (City of Kelowna, 2016). Kelowna, BC is located in the Very Dry variant of the Ponderosa Pine Biogeoclimatic Zone, where "[frequent low intensity] fires have played an important role in the ecology" (Hope et al., 1991). With increased development, suppression of low-intensity fires has resulted in open stands characterized by Ponderosa pine (Pinus ponderosa Dougl. ex Laws.) being succeeded by more closed stands of Lodgepole pine (Pinus contorta var. latifolia Engelm. ex S. Wats.) with abundant ground and ladder fuels and closed canopies. In these stands, wildfires can burn at high intensity and spread rapidly (Hope et al., 1991).

The aim of the ForestFuelsApp was to make tools for assessing forest fuel loading accessible to a broader population, collect consistent data, and increase awareness of WUI wildfire issues. The approach was inspired by ocular assessment methods, which provide a rapid and accessible way to make a general assessment of forest conditions (e.g., distinguishing open and closed canopies) by comparing field conditions with reference photographs (Keane, 2013). Forms from provincial protocols (Morrow et al., 2008) were coded with reference photographs and illustrations developed by the research team (Fig. 3). Location was measured using the device GPS. Photographs were acquired using the device camera and accelerometer to ensure consistent framing and leveling. The device compass and accelerometer were used to measure slope and aspect. When the app was first opened, a brief tutorial was presented with illustrations and text describing wildfire fuel assessments and use of the app. Eighteen volunteers were recruited from the community through media coverage, classified advertisements, posters, and contact with local hiking clubs and neighborhood associations. Questionnaires were administered before and after use of the application. Smartphones (Apple iPhone 4) with the ForestFuelsApp loaded were provided to volunteers for testing. Volunteers were accompanied by a member of the research team and instructed to collect measurements at locations of their choice in the general vicinity of the University of British Columbia Okanagan campus, to simulate a volunteered and opportunistic dataset collected over the Internet. Observational notes were collected. Finally, the locations chosen by volunteers were revisited by the research team to collect reference measurements for comparison.

Fig. 3 An example of estimating the crown closure of conifer trees using (A) reference photographs of different stand conditions and coding from official protocols, (B and C) instructions and illustrations, and (D) capturing reference imagery using the accelerometer to ensure consistent acquisition angles.

Many of the volunteers were recruited through hiking or neighborhood groups or had professional experience and interest in WUI fuel conditions (50%). While the interest from professional foresters was higher than anticipated, this provided an opportunity both to solicit professional feedback on the application and to compare differences between people with and without professional experience in wildfire and forestry topics. Both groups expressed similar motivations for volunteering in the project (the most common reason was related to values: wanting to help solve a community problem), while people with professional forestry experience more frequently expressed career motivations (e.g., learning new tools that may be useful for their job). One tension was that some professional foresters expressed concerns about nonprofessionals coming to incorrect conclusions or setting unrealistic expectations for treatments that exceeded available resources. This concern was not shared by the participants without professional experience, who generally indicated that they were more interested in helping with the tedious task of collecting data for their community than in setting priorities for stand treatments. Some of the volunteers indicated that they would have found more physically demanding tasks, such as covering greater distances and submitting more data, more rewarding. The most frequently cited reward for participation was related to understanding, both of wildfires and of technology. People over the median age more frequently reported that they learned a new skill related to either the use of smartphones or wildfire management (Ferster et al., 2013).

For many of the fuel components, the volunteered measurements were consistent with the reference measurements. Measurements of slope and aspect made by people without professional experience were less accurate than those made by people with professional forestry experience, likely due to less practice with a compass and inclinometer; this would be expected to improve over time. Observations of height to live crown were more consistent when made by people without professional forestry experience: people with professional forestry experience were observed to have differing working definitions of this attribute based on a range of experiences, while those without working experience followed the instructions more closely. People with no previous professional experience with wildfires collected data that covered a greater spatial extent and a wider range of conditions, while people with professional fire experience identified high-priority locations near buildings with higher fuel loads. For model building, the two sets of measurements were complementary (Ferster and Coops, 2014).

The ForestFuelsApp followed a very traditional volunteering approach, requiring high levels of engagement and effort from volunteers. One participant stated, "tools are needed for people living in the [WUI], including communication, steps, and actions. I could see this being useful for work parties in the community." As a result, the number of volunteers was limited, but those who participated were highly dedicated.
While a broader audience might have been reached using less intensive activities, highly engaged participants could also have been given more demanding tasks. The implementation did not fully utilize the potential for social connectivity; for example, volunteers could not see the data collected by other volunteers to find out where more measurements were needed, and there were no opportunities to interact with other volunteers through social models (e.g., using social media to promote the project, connect participants, analyze data, and discuss and share results). Concerns about liability and community conflict restricted further growth. For example, there were concerns about people using the application to document fuel threats on private land where landowners lacked resources to perform treatments and might suffer financial liabilities, leading to community conflict. However, initial outcomes were promising, with useful data collected, positive responses from participants, and expression of collective goals among people with different experiences in the community.
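For readers curious how the sensor-based measurements described above can work, here is a minimal sketch, not the ForestFuelsApp source, of deriving slope and aspect from smartphone sensor readings. The sign conventions and the assumption that the surveyor points the device downslope are simplifications introduced for illustration.

```python
# A minimal sketch of slope and aspect from smartphone sensors: the
# accelerometer senses the gravity vector (device tilt) and the compass
# supplies the heading the device is pointed toward.
import math

def slope_from_accelerometer(ax: float, ay: float, az: float) -> float:
    """Tilt of the device from horizontal, in degrees.

    (ax, ay, az) is the gravity vector in device coordinates (m/s^2);
    with the device lying flat, az ~ 9.81 and the tilt is ~0 degrees.
    """
    g = math.sqrt(ax * ax + ay * ay + az * az)
    return math.degrees(math.acos(max(-1.0, min(1.0, az / g))))

def aspect_from_compass(heading_deg: float) -> float:
    """Aspect as a compass bearing (0-360 degrees, 0 = north).

    Assumes the surveyor points the device downslope, so the heading
    (corrected for magnetic declination elsewhere) is the aspect.
    """
    return heading_deg % 360.0

# Example: device tilted roughly 30 degrees while pointed southeast.
print(round(slope_from_accelerometer(0.0, 4.9, 8.5), 1), "degrees slope")
print(aspect_from_compass(135.0), "degrees aspect")
```

A production app would also need to filter sensor noise and correct compass headings for local magnetic declination before treating them as aspect.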

1.04.3.3 BikeMaps.org

BikeMaps.org is a global VGI tool that is filling the gap in available data on cycling safety. It is estimated that only 30%–40% of cycling-related incidents are recorded in official databases. Official databases are typically generated from police and insurance reports and primarily represent bike incidents that involve motor vehicles. However, in a study of injured adult cyclists treated in emergency departments, only 34% of incidents were collisions with motor vehicles, and another 14% resulted from avoiding a motor vehicle (Teschke et al., 2014). When a bike collision occurs with infrastructure, another bike, or a pedestrian, there is typically no mechanism for reporting. Another gap in cycling data is the lack of near miss reporting. Near miss events are critical for safety management in general (Gnoni et al., 2013) and have the potential to provide early warning of high-risk areas. Compared with the number of human errors or near miss incidents, a crash is a relatively rare event; collecting sufficiently large near-miss databases can thus enable earlier detection of problematic areas (Reason, 1991) and support robust statistical analysis. Near-miss information also provides a critical link to overcoming deterrents to ridership. Concerns about safety are a primary barrier to new ridership (Winters et al., 2012), and with many cities setting goals to increase ridership, understanding real and perceived safety concerns is critical. In cycling, near misses can have significant psychological impacts that deter ridership (Aldred, 2016).

Through BikeMaps.org citizens can report cycling crashes, near misses, hazards, and thefts. BikeMaps.org includes a web map, smartphone apps, and visualization tools (Fig. 4) (Nelson et al., 2015). Citizens identify an incident location by clicking a "submit new point" button and adding the location on the map where the incident occurred. Details of collisions and near misses are reported via a digital form with pull-down options. All reports are anonymous. The attributes captured through the pull-down menus are designed to enable research on determinants of cycling injury (Teschke et al., 2012). There are three categories of attributes: incident details, conditions, and personal details, with a balance of required and optional questions to manage citizen mapper burden. BikeMaps.org is also supported by apps for both Android and iPhone devices. In addition to allowing mobile mapping, the apps provide feedback to users through push notifications that alert cyclists to new mapping in their area. The website also includes a visualization page where a spatial extent can be selected and temporal trends in crashes and near misses summarized by day of the week and hour. The visualizations are dynamic, enabling queries of data trends with the click of a mouse.

Fig. 4 BikeMaps.org visualization tools. The visualizations are dynamic, adjusting with display extents as well as selection of incident types and time periods. Source: BikeMaps.org – https://bikemaps.org/vis/.

Launched in the fall of 2014, BikeMaps.org has over 3200 locations mapped in 35 countries (Fig. 5). The global response to the website is an indication that BikeMaps.org is filling an important gap in data available to study cycling safety. Sixty percent of locations are mapped in Victoria and Vancouver, Canada, where outreach efforts were initially focused. In most other locations, uptake has been more organic, resulting from social and earned media.

Fig. 5 Global reports of BikeMaps.org incidents. Source: BikeMaps.org – https://bikemaps.org/.

In Victoria and Vancouver, we have undertaken a range of outreach activities. The biggest gains in data are typically associated with print media (Fig. 6). While social media, Twitter especially (Fig. 7), have been important for broader communication of our message, visits to the website and data submissions are highest when local newspapers feature a story on BikeMaps.org. Guerrilla marketing strategies have also proven effective (Fig. 8). In one campaign, we delivered 500 branded water bottles to parked bikes around the city. When cyclists returned to their bikes, they found a note encouraging them to contribute to safer cycling by mapping their experiences.

Fig. 6 Local print media covering BikeMaps.org is often associated with increased use. Source: Paterson, T. (2016) 'BikeMaps charts course across the country', Saanich News, 19 May, p. 1.

Fig. 7 BikeMaps.org Twitter feed. Source: BikeMaps.org Twitter – https://twitter.com/bikemapsteam?lang=en.

Fig. 8 Examples of BikeMaps.org guerrilla marketing include distributing branded saddle covers, water bottles, and other goods.

A strength of the BikeMaps.org VGI project is the use of the data for community engagement, research, and policy decisions. With expertise in spatial analysis, the BikeMaps.org team is able to create map products from the data, such as maps of cycling safety hot spots, and these have been invaluable for ongoing engagement of users. Maps are also a great way to generate earned media, as they tell a story of broad interest. As the dataset grows, we are also using BikeMaps.org data for peer-reviewed publications on cycling safety, bringing credibility to the project. In areas where a substantial number of incidents have been reported, city planners have requested data and used them for planning. For example, in 2015 BikeMaps.org data were used in the City of Victoria's bicycle network planning (Fig. 9).
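As an illustration of the temporal summaries the visualization page provides, here is a minimal Python sketch; the column names and values are hypothetical rather than the actual BikeMaps.org schema, and the bounding-box filter stands in for the interactive extent selection.

```python
# A minimal sketch of summarizing incident reports within a chosen extent
# by day of week and hour, in the spirit of the BikeMaps.org visualizations.
import pandas as pd

reports = pd.DataFrame({
    "lat": [48.43, 48.45, 49.26],
    "lon": [-123.36, -123.35, -123.11],
    "kind": ["near_miss", "collision", "near_miss"],
    "when": pd.to_datetime(["2015-05-04 08:10", "2015-05-05 17:40",
                            "2015-05-06 08:55"]),
})

# Spatial extent selection as a simple bounding-box filter (Victoria area).
bbox = (48.3, -123.5, 48.6, -123.2)  # (min_lat, min_lon, max_lat, max_lon)
in_extent = reports[
    reports.lat.between(bbox[0], bbox[2]) & reports.lon.between(bbox[1], bbox[3])
]

# Temporal trends: counts by day of week and by hour, per incident type.
by_day = in_extent.groupby([in_extent["when"].dt.day_name(), "kind"]).size()
by_hour = in_extent.groupby([in_extent["when"].dt.hour, "kind"]).size()
print(by_day, by_hour, sep="\n")
```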

1.04.4 Summary and Conclusions

VGI is a new source of data that is changing what we can study and how we explore our world. Growth in VGI is fueled by technologies such as GPS, GIS, and the ability to quickly build and share maps over the Internet. Digital maps are everywhere, and mobile GPS technology has been made mainstream through smartphones. As such, a huge proportion of individuals carry out day-to-day tasks with a device that is perfect for VGI collection: a smartphone. The power of VGI is that it leverages the fact that each of us has knowledge or can make observations. When we combine individuals' knowledge or experience within a coherent data structure, the whole becomes more than the sum of the parts. The types of VGI that are generated are diverse, from simple actions such as georeferencing other shared media (e.g., a tweet or a photograph) to intentional efforts to engage a wide audience in generating information. The motivations of both project organizers and participants in VGI projects are wide-ranging. It is informative to consider motivations when evaluating project popularity and outcomes for individuals, management, and science.

Issues of data quality and representation bias seem to be the primary criticisms of VGI. However, even with limitations, VGI often represents the best available data. While we can and should design technology and methods that optimize the collection of high-quality and consistent data, VGI projects will benefit from research that develops approaches to working with uncertain data. Solutions to uncertain data may take several forms. One solution could be to develop tools that enable tracking of confirmatory VGI, emphasizing patterns that are consistent. A second is to develop statistical approaches to integrating or conflating VGI with traditional data sets; for example, Jestico et al. (2016) leverage the spatial and temporal extents of Strava data by integrating them with official counts that have complete attribution. Finally, awareness of barriers to VGI use by groups of people can lead to greater inclusion or alternate strategies to solicit input.

RinkWatch links climate and the culturally important activity of outdoor ice skating to engage interest in and awareness of climate change. This approach attracted extensive media attention and garnered many contributions through submitted data, discussion forums, and data visualizations. The submitted reports are useful for exploring regional variation in climate and linking climate change models to the "skateability" of outdoor rinks, a metric that is relatable for many people. The ForestFuelsApp showed that people who do not traditionally take part in forestry data collection can assist with data collection tasks, that there are people in the community who are motivated to assist, and that these people reported enjoyment and learning from some of the tasks. However, compared with the other projects (RinkWatch and BikeMaps.org), which required lower entry levels of effort, a relatively small audience was engaged. More dynamic and less tedious forms of engagement may reach larger audiences, while deeper forms of engagement may still have a role for certain types of tasks. A flagship success of BikeMaps.org is that within 1 year of project launch the data were used to support planning of cycling infrastructure in Victoria, Canada.
The project goal was to overcome the lack of available cycling safety data, and when data began being requested for decision making, the BikeMaps.org project had begun to achieve its mission. Essential to success were the quality of the VGI technology and the careful development of the attributes associated with the data. As well, promotion efforts were substantial during the first year, generating a quantity of data sufficient to demonstrate the utility of the site. The overarching reason for success is that the VGI generated by citizen cyclists fills a specific data niche, making it a valuable resource for planning and research. As an author team we have run a variety of VGI projects, and a key lesson learned is that VGI projects require maintenance. Both the technology and the promotion of VGI tools require ongoing support. It can be costly to keep apps and websites maintained, and knowledge of the technology may need to transfer from one technician to another over time. As well, it is rare that a VGI project becomes self-promoting. Rather, continued use of a VGI tool typically requires ongoing outreach to the user community.


Fig. 9 An example of BikeMaps.org data presented for use in bicycle network planning in Vancouver, British Columbia, Canada. Source: BikeMaps.org – https://bikemaps.org/blog/post/10th-avenue-corridor-vancouver-bc-cycling-safety-trouble-spots.


Gamification of tools and generating products from the data will help. However, a plan is required to ensure that the investment in VGI technology has long-term use and benefits. In the case of BikeMaps.org, the project began as a research project but is now evolving into a team with both a research arm and a nonprofit outreach arm.
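To make the conflation strategy discussed above concrete, below is a minimal sketch, with illustrative numbers rather than real counts, of calibrating crowdsourced ridership against official counts in the spirit of Jestico et al. (2016):

```python
# A minimal sketch of conflating crowdsourced and official cycling counts:
# fit a calibration where both exist, then estimate official-scale counts
# where only crowdsourced data are available. Values are illustrative.
import numpy as np

# Paired observations at locations with both data sources.
crowd = np.array([12, 40, 55, 90, 130], dtype=float)          # app-derived counts
official = np.array([150, 480, 700, 1100, 1600], dtype=float)  # manual/loop counts

slope, intercept = np.polyfit(crowd, official, deg=1)  # simple linear calibration
print(f"official ~ {slope:.1f} * crowd + {intercept:.1f}")

# Apply the calibration where only crowdsourced counts exist.
crowd_only = np.array([25.0, 70.0])
print("estimated official-scale counts:", slope * crowd_only + intercept)
```

Jestico et al. (2016) of course fit a more careful model; the point is that even a simple calibration can transfer the attribution of official counts to the much broader spatial and temporal coverage of crowdsourced data.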

References

Aldred, R., 2016. Cycling near misses: Their frequency, impact, and prevention. Transportation Research Part A: Policy and Practice 90, 69–83. http://dx.doi.org/10.1016/j.tra.2016.04.016. Argyris, C., Schön, D.A., 2002. Participatory Action Research and Action Science Compared. American Behavioral Scientist 32 (5), 612–623. Arnstein, S.R., 1969. A Ladder of Citizen Participation. Journal of the American Institute of Planners 35 (4), 216–224. http://dx.doi.org/10.1080/01944366908977225. Brammer, J.R., Samson, J., Humphries, M.M., 2014. Declining availability of outdoor skating in Canada. Nature Climate Change 5 (1), 2–4. http://dx.doi.org/10.1038/nclimate2465. Bruns, A., Burgess, J., Crawford, K., Shaw, F., 2011. #qldfloods and @QPSMedia: Crisis Communication on Twitter in the 2011 South East Queensland Floods. ARC Centre of Excellence for Creative Industries and Innovation, Brisbane. http://dx.doi.org/10.1007/978-3-642-39527-7_16. Bruns, A., Highfield, T., Burgess, J., 2013. The Arab Spring and social media audiences: English and Arabic Twitter users and their networks. In: McCaughey, M., Ebooks Corporation (Eds.), Cyberactivism on the participatory web. Routledge, New York, pp. 86–116. Burch, S., 2010. Transforming barriers into enablers of action on climate change: Insights from three municipal case studies in British Columbia, Canada. Global Environmental Change 20 (2), 287–297. http://dx.doi.org/10.1016/j.gloenvcha.2009.11.009. Burgess, H.K., DeBey, L.B., Froehlich, H.E., Schmidt, N., Theobald, E.J., Ettinger, A.K., HilleRisLambers, J., Tewksbury, J., Parrish, J.K., 2016. The science of citizen science: Exploring barriers to use as a primary research tool. Biological Conservation 208, 113–120. http://dx.doi.org/10.1016/j.biocon.2016.05.014. Chrisman, N.R., 1984. Part 2: issues and problems relating to cartographic data use, exchange and transfer: the role of quality information in the long-term functioning of a geographic information system. Cartographica: The International Journal for Geographic Information and Geovisualization 21 (2-3), 79–88. Coleman, D.J., Georgiadou, Y., Labonte, J., 2009. Volunteered Geographic Information: The Nature and Motivation of Produsers. International Journal of Spatial Data Infrastructures Research 4 (4), 332–358. http://dx.doi.org/10.2902/1725-0463.2009.04.art16. Cooper, C.B., Lewenstein, B.V., 2016. Two meanings of citizen science. In: Cavalier, D., Kennedy, E.B. (Eds.), The rightful place of science: citizen science. Consortium for Science, Policy, & Outcomes, Tempe, Arizona, pp. 51–62. Cooper, C.B., Dickinson, J., Phillips, T., Bonney, R., 2007. Citizen science as a tool for conservation in residential ecosystems. Ecology and Society 12 (2), 11. Damyanov, N.N., Damon Matthews, H., Mysak, L.A., 2012. Observed decreases in the Canadian outdoor skating season due to recent winter warming. Environmental Research Letters 7, 014028. http://dx.doi.org/10.1088/1748-9326/7/1/014028. Devillers, R., Stein, A., Bédard, Y., Chrisman, N., Fisher, P., Shi, W., 2010. Thirty Years of Research on Spatial Data Quality: Achievements, Failures, and Opportunities. Transactions in GIS 14 (4), 387–400. http://dx.doi.org/10.1111/j.1467-9671.2010.01212.x. Dickinson, J., Crain, R., 2014. Socially Networked Citizen Science and the Crowd-Sourcing of Pro-Environmental Collective Actions. In: Agarwal, N., Lim, M., Wigand, R.T. (Eds.), Online Collective Action, 1st edn. Springer-Verlag Wien, Vienna, pp. 133–152.
http://dx.doi.org/10.1007/978-3-7091-1340-0 (Lecture Notes in Social Networks). Dickinson, J.L., Zuckerberg, B., Bonter, D.N., 2010. Citizen Science as an Ecological Research Tool: Challenges and Benefits. Annual Review of Ecology, Evolution, and Systematics 41 (1), 149–172. http://dx.doi.org/10.1146/annurev-ecolsys-102209-144636. Elwood, S., Goodchild, M.F., Sui, D., 2012. Researching volunteered geographic information: spatial data, geographic research, and new social practice. Annals of the Association of American Geographers 102 (3), 571–590. Feick, R., Roche, S., 2013. Understanding the value of VGI. In: Sui, D.Z., Elwood, S., Goodchild, M.F. (Eds.), Crowdsourcing geographic knowledge: volunteered geographic information (VGI) in theory and practice, 2012th edn. Springer, Dordrecht, Netherlands, pp. 15–29. Ferster, C.J., Coops, N.C., 2014. Assessing the quality of forest fuel loading data collected using public participation methods and smartphones. International Journal of Wildland Fire 23 (4), 585–590. Ferster, C.J., Coops, N.C., Harshaw, H.W., Kozak, R.A., Meitner, M.J., 2013. An exploratory assessment of a smartphone application for public participation in forest fuels measurement in the wildland-urban interface. Forests 4 (4), 1199–1219. Foody, G.M., See, L., Fritz, S., Van der Velde, M., Perger, C., Schill, C., Boyd, D.S., 2013. Assessing the Accuracy of Volunteered Geographic Information arising from Multiple Contributors to an Internet Based Collaborative Project. Transactions in GIS 17 (6), 847–860. http://dx.doi.org/10.1111/tgis.12033. Gnoni, M.G., Andriulo, S., Maggio, G., Nardone, P., 2013. “Lean occupational” safety: an application for a near-miss management system design. Safety Science 53, 96–104. Goodchild, M., 2009. NeoGeography and the nature of geographic expertise. Journal of Location Based Services 3 (2), 82–96. http://dx.doi.org/10.1080/17489720902950374. Goodchild, M.F., 2007. Citizens as sensors: the world of volunteered geography. GeoJournal 69 (4), 211–221. http://dx.doi.org/10.1007/s10708-007-9111-y. Goodchild, M.F., Glennon, J.A., 2010. Crowdsourcing geographic information for disaster response: a research frontier. International Journal of Digital Earth 3 (3), 231–241. Goranson, C., Thihalolipavan, S., Di Tada, N., 2013. VGI and Public Health: Possibilities and Pitfalls. In: Sui, D., Elwood, S., Goodchild, M. (Eds.), Crowdsourcing Geographic Knowledge, 1st edn. Springer Netherlands, Dordrecht, pp. 329–340. http://dx.doi.org/10.1007/978-94-007-4587-2_18. Gray, R.W., Oswald, B., Kobziar, L., Stewart, P., Seijo, F., 2015. Reduce wildfire risks or we’ll continue to pay more for fire disasters, Position statement of the Association for Fire Ecology, International Association of Wildland Fire, and The Nature Conservancy. Eugene, OR. Available at: http://fireecology.org/Reduce-Wildfire-Risks-or-Well-Pay-More-forFire-Disasters (Accessed April 3, 2017). Griffin, G.P., Jiao, J., 2015. Where does bicycling for health happen? Analysing volunteered geographic information through place and plexus. Journal of Transport and Health 2 (2), 238–247. http://dx.doi.org/10.1016/j.jth.2014.12.001. Haklay, M., 2010. How good is volunteered geographical information? A comparative study of OpenStreetMap and ordnance survey datasets. Environment and Planning B: Planning and Design 37 (4), 682–703. http://dx.doi.org/10.1068/b35097. Haklay, M., 2013. Citizen science and volunteered geographic information: overview and typology of participation. 
In: Sui, D.Z., Elwood, S., Goodchild, M.F. (Eds.), Crowdsourcing geographic knowledge: volunteered geographic information (VGI) in theory and practice, 2012th edn. Springer, Dordrecht, Netherlands, pp. 105–122. Haklay, M., 2016. Why is participation inequality important? In: Capineri, C., et al. (Eds.), European handbook of crowdsourced geographic information. Ubiquity Press, London, pp. 35–44. Haklay, M., Weber, P., 2008. OpenStreetMap: user-generated street maps. IEEE Pervasive Computing 7 (4), 12–18. Harvey, F., 2013. To volunteer or to contribute locational information? Towards truth in labeling for crowdsourced geographic information. In: Sui, D., Elwood, S., Goodchild, M. (Eds.), Crowdsourcing geographic knowledge: volunteered geographic information (VGI) in theory and practice. Springer, Dordrecht, Netherlands, pp. 31–42. Hochachka, W.M., Fink, D., Hutchinson, R.A., Sheldon, D., Wong, W., Kelling, S., 2012. Data-intensive science applied to broad-scale citizen science. Trends in Ecology & Evolution 27 (2), 130–137. Hope, G.D., Lloyd, D.A., Mitchell, W.R., Erickson, W.R., Harper, W.L., Wikeem, B.M., 1991. Ponderosa Pine Zone. In: Meidinger, D., Pojar, J. (Eds.), Ecosystems of British Columbia, 1st edn. Research Branch, BC Ministry of Forests, Victoria, British Columbia, pp. 139–151.


Horita, F., Degrossi, L., Assis, L., Zipf, A., Porto de Albuquerque, J., 2013. The use of volunteered geographic information and crowdsourcing in disaster management: a systematic literature review. In: Proceedings of the 19th Americas Conference on Information Systems. AIS, Chicago, pp. 1–10. Available at: http://aisel.aisnet.org/cgi/viewcontent.cgi?article=1591&context=amcis2013. International Telecommunications Union (ITU), 2016. ITU ICT Facts and Figures 2016. ITU, Geneva, Switzerland. Available at: http://www.itu.int/en/ITU-D/Statistics/Documents/facts/ICTFactsFigures2016.pdf (Accessed: 3 April 2017). ISO 19157, 2013. Retrieved October 28, 2016, from http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=32575. Jackson, S.P., Mullen, W., Agouris, P., et al., 2013. Assessing completeness and spatial error of features in volunteered geographic information. ISPRS International Journal of Geo-Information 2 (2), 507–530. Jestico, B., Nelson, T., Winters, M., 2016. Mapping ridership using crowdsourced cycling data. Journal of Transport Geography 52, 90–97. Keane, R.E., 2013. Describing wildland surface fuel loading for fire management: a review of approaches, methods and systems. International Journal of Wildland Fire 22 (1), 51. Kim, J.S., et al., 2014. Space–time wiring specificity supports direction selectivity in the retina. Nature 509 (7500), 331–336. Kitchin, R., 2014. The data revolution: Big data, open data, data infrastructures and their consequences, 1st edn. SAGE Publications, Croydon, England. Klonner, C., Marx, S., Tomás, U., Porto de Albuquerque, J., Höfle, B., 2016. Volunteered geographic information in natural hazard analysis: a systematic literature review of current approaches with a focus on preparedness and mitigation. ISPRS International Journal of Geo-Information 5 (7), 103. Krykewycz, G., Pollard, C., Canzoneri, N., He, E., 2011. Web-based "crowdsourcing" approach to improve areawide "bikeability" scoring. Transportation Research Record: Journal of the Transportation Research Board 2245, 1–7. Kulemeka, O., 2015. A review of wildland fire smartphone applications. International Journal of Emergency Services 4 (2), 258–270. Lam, S., Uduwage, A., Dong, Z., et al., 2011. WP: clubhouse?: An exploration of Wikipedia's gender imbalance. In: Proceedings of the 7th International Symposium on Wikis and Open Collaboration. ACM Press, New York, pp. 1–10. Available at: http://dl.acm.org/citation.cfm?id=2038560. Lawrence, H., Robertson, C., Feick, R., Nelson, T., 2015. Identifying Optimal Study Areas and Spatial Aggregation Units for Point-Based VGI from Multiple Sources. In: Harvey, F., Leung, Y. (Eds.), Advances in Geographic Information Science, 1st edn. Springer International Publishing, Cham, Switzerland, pp. 65–84. http://dx.doi.org/10.1007/978-3-319-19950-4_5. Le Dantec, C.A., Asad, M., Misra, A., Watkins, K., 2015. Planning with crowdsourced data: rhetoric and representation in transportation planning. In: Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing. ACM Press, New York, pp. 1717–1727. Available at: http://dl.acm.org/citation.cfm?doid=2675133.2675212. Lee, R.J., Sener, I.N., Jones-Meyer, S.N., 2016. A review of equity in active transportation. In: Transportation Research Board 95th Annual Meeting Compendium of Papers, no. 16-1835. Available at: http://amonline.trb.org/16-1835-1.2982186?qr=1. Li, L., Goodchild, M.F., 2014. Spatiotemporal Footprints in Social Networks.
In: Alhajj, R., Rokne, J. (Eds.), Encyclopedia of Social Network Analysis and Mining, 1st edn. Springer New York, New York, NY, pp. 1990–1996. http://dx.doi.org/10.1007/978-1-4614-6170-8_322. Lukyanenko, R., Parsons, J., Wiersma, Y.F., 2016. Emerging problems of data quality in citizen science. Conservation Biology 30 (3), 447–449. Meier, P., 2012. Crisis mapping in action: how open source software and global volunteer networks are changing the world, one map at a time. Journal of Map & Geography Libraries 8 (2), 89–100. Miller, C.C., 2006. A beast in the field: the Google Maps mashup as GIS/2. Cartographica: The International Journal for Geographic Information and Geovisualization 41 (3), 187–199. Misra, A., Gooze, A., Watkins, K., Asad, M., Le Dantec, C., 2014. Crowdsourcing and its application to transportation data collection and management. Transportation Research Record: Journal of the Transportation Research Board 2414, 1–8. Monroe, M.C., Pennisi, L., McCaffrey, S., Mileti, D., 2006. Social science to improve fuels management: a synthesis of research relevant to communicating with homeowners about fuels management. In: General Technical Report NC-267. USDA FS, St. Paul, Minnesota. Available at: http://www.nrs.fs.fed.us/pubs/gtr/gtr_nc267.pdf. Morrow, B., Johnston, K., Davies, J., 2008. Rating Interface Wildfire Threats in British Columbia. Report to BC Ministry of Forests and Range Protection Branch, Victoria, British Columbia. Neis, P., Zielstra, D., Zipf, A., 2011. The street network evolution of crowdsourced maps: OpenStreetMap in Germany 2007–2011. Future Internet 4 (1), 1–21. Nelson, T.A., Denouden, T., Jestico, B., Laberee, K., Winters, M., 2015. BikeMaps.org: a global tool for collision and near miss mapping. Frontiers in Public Health 3, 53. Newman, G., Roetman, P., Vogel, J., Brocklehurst, M., Cappadonna, J., Cooper, C., Goebel, C., Haklay, M., Kyba, C., Piera, J., Ponti, M., Sforzi, A., Shirk, J., 2015. Letter in response to 'Rise of the Citizen Scientist'. European Citizen Science Association, Berlin, Germany. Available at: https://ecsa.citizen-science.net/sites/default/files/cs_associations_response_to_nature_editorial.pdf (Accessed: 3 April 2017). Nuojua, J., 2010. WebMapMedia: A map-based Web application for facilitating participation in spatial planning. Multimedia Systems 16 (1), 3–21. http://dx.doi.org/10.1007/s00530-009-0175-z. Ottinger, G., 2009. Buckets of resistance: standards and the effectiveness of citizen science. Science, Technology & Human Values 35 (2), 244–270. Poore, B.S., Wolf, E.B., 2013. Metadata squared: enhancing its usability for volunteered geographic information and the GeoWeb. In: Sui, D.Z., Elwood, S., Goodchild, M.F. (Eds.), Crowdsourcing geographic knowledge: volunteered geographic information (VGI) in theory and practice. Springer, Dordrecht, Netherlands, pp. 43–64. Quattrone, G., Capra, L., De Meo, P., 2015. There's no such thing as the perfect map. In: Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing. ACM Press, New York, pp. 1021–1032. Available at: http://dl.acm.org/citation.cfm?doid=2675133.2675235. Quesnot, T., Roche, S., 2014. Measure of Landmark Semantic Salience through Geosocial Data Streams. ISPRS International Journal of Geo-Information 4 (1), 1–31. http://dx.doi.org/10.3390/ijgi4010001. Raddick, M.J., Bracey, G., Gay, P.L., et al., 2010. Galaxy zoo: exploring the motivations of citizen science volunteers. Astronomy Education Review 9 (1), 010103.
Radeloff, V.C., Hammer, R.B., Stewart, S.I., Fried, J.S., Holcomb, S.S., McKeefry, J.F., 2005. The Wildland–Urban Interface in the United States. Ecological Applications 15 (3), 799–805. http://dx.doi.org/10.1890/04-1413. Reason, J., 1991. Too little and too late: a commentary on accident and incident reporting systems. In: van der Schaaf, T., Lucas, D., Hale, A. (Eds.), Near miss reporting as a safety tool. Butterworth-Heinemann, Oxford, pp. 9–26. Robertson, C., Feick, R., 2015. Bumps and bruises in the digital skins of cities: unevenly distributed user-generated content across US urban areas. Cartography and Geographic Information Science 43 (4), 283–300. http://dx.doi.org/10.1080/15230406.2015.1088801. Robertson, C., McLeman, R., Lawrence, H., 2015. Winters too warm to skate? Citizen-science reported variability in availability of outdoor skating in Canada. Canadian Geographer 59 (4), 383–390. http://dx.doi.org/10.1111/cag.12225. Roche, S., Propeck-Zimmermann, E., Mericskay, B., 2013. GeoWeb and crisis management: issues and perspectives of volunteered geographic information. GeoJournal 78 (1), 21–40. Romanillos, G., Zaltz Austwick, M., Ettema, D., De Kruijf, J., 2016. Big data and cycling. Transport Reviews 36 (1), 114–133. http://dx.doi.org/10.1080/01441647.2015.1084067. Rotman, D., Preece, J., Hammock, J., Procita, K., Hansen, D., Parr, C., Lewis, D., Jacobs, D., 2012. Dynamic Changes in Motivation in Collaborative Citizen-Science Projects. In: Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work - CSCW '12. ACM, Seattle, WA, pp. 217–226. http://dx.doi.org/10.1145/2145204.2145238. Sanchez, T.W., Brenman, M., 2013. Public participation, social equity, and technology in urban governance. In: Silva, C.N. (Ed.), Citizen e-participation in urban governance. IGI Global, Hershey, PA, pp. 35–48.


Shelton, T., Poorthuis, A., Zook, M., 2015. Social media and the city: Rethinking urban socio-spatial inequality using user-generated geographic information. Landscape and Urban Planning 142, 198–211. http://dx.doi.org/10.1016/j.landurbplan.2015.02.020. Show, H., 2015. Rise of the citizen scientist [online]. Nature 524 (7565), 265. Smith, M., Szongott, C., Henne, B., von Voigt, G., 2012. Big data privacy issues in public social media. In: 2012 6th IEEE International Conference on Digital Ecosystems and Technologies (DEST), pp. 1–6. http://dx.doi.org/10.1109/DEST.2012.6227909. Stefanidis, A., Crooks, A., Radzikowski, J., 2013. Harvesting ambient geospatial information from social media feeds. GeoJournal 78 (2), 319–338. http://dx.doi.org/10.1007/s10708-011-9438-2. Stephens, M., 2013. Gender and the GeoWeb: divisions in the production of user-generated cartographic information. GeoJournal 78 (6), 981–996. Strava, 2017. How it works: Frequently asked questions. Available at: https://www.strava.com/how-it-works (Accessed: 3 April 2017). Sui, D., Goodchild, M., Elwood, S., 2013. Volunteered geographic information, the exaflood, and the growing digital divide. In: Sui, D.Z., Elwood, S., Goodchild, M.F. (Eds.), Crowdsourcing geographic knowledge: volunteered geographic information (VGI) in theory and practice. Springer, Dordrecht, Netherlands, pp. 1–12. Sullivan, B.L., Aycrigg, J.L., Barry, J.H., Bonney, R.E., Bruns, N., Cooper, C.B., Damoulas, T., Dhondt, A.A., Dietterich, T., Farnsworth, A., Fink, D., Fitzpatrick, J.W., Fredericks, T., Gerbracht, J., Gomes, C., Hochachka, W.M., Iliff, M.J., Lagoze, C., La Sorte, F.A., Merrifield, M., Morris, W., Phillips, T.B., Reynolds, M., Rodewald, A.D., Rosenberg, K.V., Trautmann, N.M., Wiggins, A., Winkler, D.W., Wong, W.-K., Wood, C.L., Yu, J., Kelling, S., 2014. The eBird enterprise: An integrated approach to development and application of citizen science. Biological Conservation 169, 31–40. http://dx.doi.org/10.1016/j.biocon.2013.11.003. Teschke, K., Harris, M., Reynolds, C., et al., 2012. Route infrastructure and the risk of injuries to bicyclists: a case-crossover study. American Journal of Public Health 102 (12), 2336–2343. Teschke, K., Frendo, T., Shen, H., et al., 2014. Bicycling crash circumstances vary by route type: a cross-sectional analysis. BMC Public Health 14 (1), 1205. Toman, E., Shindler, B., Brunson, M., 2006. Fire and Fuel Management Communication Strategies: Citizen Evaluations of Agency Outreach Activities. Society & Natural Resources 19 (4), 321–336. http://dx.doi.org/10.1080/08941920500519206. Unwin, D.J., 2005. Fiddling on a different planet? Geoforum 36 (6), 681–684. Ward, M., Nuckols, J., Giglierano, J., et al., 2005. Positional accuracy of two methods of geocoding. Epidemiology 16 (4), 542–547. Whitmarsh, L., 2011. Scepticism and uncertainty about climate change: Dimensions, determinants and change over time. Global Environmental Change 21 (2), 690–700. http://dx.doi.org/10.1016/j.gloenvcha.2011.01.016. Wiggins, A., 2013. Free as in puppies: compensating for ICT constraints in citizen science. In: Proceedings of the 2013 Conference on Computer Supported Cooperative Work and Social Computing. ACM Press, San Antonio, TX, pp. 1469–1480. Available at: http://dl.acm.org/citation.cfm?id=2441942. Winters, M., Babul, S., Becker, H.J., et al., 2012. Safe cycling: how do risk perceptions compare with observed risk? Canadian Journal of Public Health 103 (9), S42–S47. Zook, M., Graham, M., Shelton, T., Gorman, S., 2010.
Volunteered Geographic Information and Crowdsourcing Disaster Relief: A Case Study of the Haitian Earthquake. World Medical & Health Policy 2 (2), 6–32. http://dx.doi.org/10.2202/1948-4682.1069.

1.05 Open Data and Open Source GIS

Xinyue Ye, Kent State University, Kent, OH, United States © 2018 Elsevier Inc. All rights reserved.

1.05.1 Introduction
1.05.2 Open Data
1.05.3 Open Source GIS
1.05.4 Practicing Open Source GIS
1.05.5 Summary
Acknowledgments
References

1.05.1 Introduction

With the growing capability of recording individuals' digital footprints and the emerging open culture, open big data are proliferating everywhere (Batty, 2012). Geospatial data are an important component of the open data unfolding right in front of our eyes (Warf and Arias, 2008). Geographic information system (GIS) research is shifting toward analyzing ever-increasing amounts of large-scale, diverse data in an interdisciplinary, collaborative, and timely manner. Goodchild (2013) proposed the crowd, social, and geographic approaches to assess big data quality. Sui (2014) argued that open GIS should involve eight dimensions related to data, software, hardware, standards, research, publication, funding, and education, facilitated by web-based tools and the growing influence of the open culture. The key pillars of open GIS have always been, and will continue to be, open source, open data, open modeling, open collaboration, and open publishing for future GIS research and applications (Rey, 2014). Sui (2014) noted that "the big data torrent will eventually be more powerful if they can be made to conform to open standards such as those developed by OGC over the years". This article places greater emphasis on open data and open source GIS.

According to Sui (2014), open GIS offers four exciting opportunities for participation and collaboration among both GIS experts and volunteers: (1) technology-driven opportunities for addressing big data challenges; (2) application-led opportunities for improving decisions across all levels; (3) curiosity-inspired, crowd-powered opportunities for developing citizen science; and (4) education-focused opportunities for realizing a spatial university. To realize science's powerful capacity for self-correction, it is critical to be able to reproduce the outcomes of scientific research (Ye et al., 2015). However, if data and code are not transparent, reproducibility breaks down because of bottlenecks or restrictions imposed by copyright, patents, or other mechanisms of control. Open data and open source GIS aim to make GIS research open to everyone. In other words, data and code should be made legally open and accessible to both professional and nonprofessional communities (Stodden, 2009). The growing and evident interdisciplinary efforts dedicated to open data and open source GIS represent a transformative trend shaped by increased scholarly collaboration and the sharing of research methods (Sui, 2014). Instead of following the traditional proprietary approach, open data and open source GIS can deliver a large number of benefits to both citizens and businesses across the globe, supporting the success of GIS research, education, and applications.

The rest of this article is organized as follows. Elements of the emerging open data are described in section "Open Data". Section "Open Source GIS" discusses how open source GIS plays a pivotal role in research and education. Section "Practicing Open Source GIS" demonstrates the use of open source GIS for regional economic analysis. The article ends, in section "Summary", with a summary and conclusions on the open GIS paradigm, oriented toward the goals of reproducibility and reuse.

1.05.2 Open Data

Knowledge is ultimately derived from data. The term "open data" was coined in 1995 to deal with the disclosure of geophysical and environmental data, realizing the idea of the common good applied to knowledge (Chignard, 2013). Open data is gaining popularity with the launch of open government data initiatives. The volume of open space–time data in various disciplines and domains has increased dramatically due to the growing sophistication and ubiquity of information and communication technology (Jiang, 2011). Open data can be used and reused at no cost and without restriction from mechanisms of control such as copyright and patents (Auer et al., 2007). The intention of the open data movement is to make publicly acquired data available for direct manipulation such as cross tabulation, visualization, and mapping (Gurstein, 2011). It is clear that a space–time perspective in using such data has become increasingly relevant to our understanding of socioeconomic and environmental dynamics in a collaborative and transdisciplinary manner. As noted by Rey (2014), "Open data constitutes available, intelligible, accessible, and usable data. For science's error–correction mechanisms to kick in, data underlying research projects must be made accessible to the wider research community". Characterized by the ever-growing volume, variety, and velocity of ubiquitous geospatial information, big spatial data in changing environmental, urban, and regional contexts demand innovative thinking that can capture the rich information of patterns and processes and provide spatial strategies for sustainable development. Meanwhile, the volume of data created by an ever-increasing number of geospatial sensor platforms, such as remote sensing and social sensing (including citizen sensors), collecting data at ever-increasing spatial, spectral, temporal, and radiometric resolutions currently exceeds petabytes per year and is only expected to grow. Data come from various sources, types, organizations (governments, military, NGOs, etc.), and purposes. Recent developments in information technology commonly referred to as big data, along with the related fields of data science and analytics, are needed to process, examine, and realize the value of the overwhelming amount of open geospatial data. The research agenda has been substantially redefined in light of open data, which have shifted the focus of GIS research toward the dynamic, spatial, and temporal interdependence of human–environment issues across multiple scales.

Metadata is information about data. The use of metadata enhances the opportunities for semantic interoperability involving open data, lowering the cost of access, manipulation, and sharing across data boundaries (Nogueras-Iso et al., 2004). As declared by the FGDC (2017), "Geospatial metadata describes maps, Geographic Information Systems (GIS) files, imagery, and other location-based data resources. The FGDC is tasked by Executive Order 12906 to enable access (see GeoPlatform.gov) to National Spatial Data Infrastructure (NSDI) resources and by OMB Circular A-16 and the A-16 Supplemental Guidance to support the creation, management, and maintenance of the metadata required to fuel data discovery and access". Twelve critical factors regarding the publication and use of open data were identified across city-level, regional, and transnational cases (Susha et al., 2015): (1) a national guide on legal intellectual property right issues; (2) a clear data publishing process; (3) addressing of societal issues and publishing of related data; (4) interest for users; (5) where to publish datasets; (6) a virtual competence center for technical help; (7) a strategy for maintaining published datasets; (8) allowing citizens to post, rate, and work with datasets and web services; (9) a clear user interface; (10) standards for data, metadata, licenses, URIs, and exchange protocols; (11) integrated metadata schemas and federated controlled vocabularies; and (12) application programming interfaces for open data provision. As Maguire and Longley (2005, p. 3) noted, "geoportals are World Wide Web gateways that organize content and services such as directories, search tools, community information, support resources, data and applications". The Geospatial One-Stop, sponsored by the US Federal Government, emerged as an easier, faster, and less expensive gateway for searching relevant geographic information (Yang et al., 2007). As Goodchild et al. (2007, p. 250) pointed out, "humans have always exchanged geographic information, but the practice has grown exponentially in recent years with the popularization of the Internet and the Web, and with the growth of geographic information technologies. The arguments for sharing include scale economies in production and the desire to avoid duplication.
The history of sharing can be viewed in a three-phase conceptual framework, from an early disorganized phase, through one centered on national governments as the primary suppliers of geographic information, to the contemporary somewhat chaotic network of producers and consumers". Many governments, including those of the US, the UK, and Canada, have been developing programs to make open data available via websites for public consumption. These datasets typically contain records with spatial properties, democratizing public sector data and driving innovation (Arribas-Bel, 2014). Data.gov, a website launched in 2009, aims to improve access to the repository of federal government information and to ensure greater accountability and transparency, offering over 194,708 datasets from over 50 US government agencies (Hendler et al., 2012; Data.Gov, 2017). In line with the spirit of crowdsourcing and citizen science, the open data movement holds that data should be legally and technically open to the scientific community, industry, and the public to use and republish. In other words, data should be provided in open machine-readable formats and be readily locatable, along with the relevant metadata for evaluating the reliability and quality of the data, to promote increased data use and facilitate credibility determination. To advocate both transparency and innovation, open government data initiatives have been implemented in many countries, from the local to the global level, with attention to accessibility, persistent identification, and long-term availability. These initiatives encourage peer production, interactivity, and user-generated innovation, which have stimulated the sharing and distribution of information across communities and disciplines.

The underlying philosophy of open government, or the theory of open source governance, is that interested citizens can access the documents of the government to facilitate effective public oversight and enable direct involvement in the legislative process, promoting openness, participation, and efficiency in government (Janssen et al., 2012). Civic hacking is the use of government data to make governments more accountable by solving civic problems, undertaken by those who care about their communities. For instance, Code for America is a nonprofit organization founded in 2009 to address the growing cross-sector gap in the effective use of technology and design for good (Wadhwa, 2011); some civic hackers are employed by Code for America. National Day of Civic Hacking is a nationwide day of action on which developers come together to coordinate civic tech events dedicated to civic hacking (Johnson and Robinson, 2014). Transparency and participation through data integration and dissemination across domains and boundaries will facilitate collaboration among researchers, private sectors, and citizens leveraging their skills to help society. Gurstein (2011) also examined the impact of such open data initiatives on the poor and marginalized and called for ensuring wide opportunity for effective data use in the context of the digital divide.
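As a concrete illustration of machine-readable access to open data, below is a minimal sketch that queries the CKAN action API behind catalogs such as Data.gov. The search term is arbitrary, the snippet assumes network access, and the field names follow CKAN's documented package_search response layout; a real application would add error handling and paging.

```python
# A minimal sketch of programmatic open-data discovery via the CKAN
# action API (the software behind catalogs such as Data.gov).
import json
import urllib.request

url = ("https://catalog.data.gov/api/3/action/package_search"
       "?q=bicycle+counts&rows=5")
with urllib.request.urlopen(url) as resp:
    payload = json.load(resp)

for ds in payload["result"]["results"]:
    # Each dataset lists downloadable resources with a format and a URL;
    # the attached metadata is what supports credibility determination.
    formats = {r.get("format", "?") for r in ds.get("resources", [])}
    print(ds["title"], "-", ", ".join(sorted(formats)))
```

The point of the sketch is that a well-run open data portal exposes not just files but searchable, machine-readable metadata, which is exactly what factors (10) through (12) above call for.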
Papers with publicly available datasets usually have a higher citation rate and visibility than similar studies without available data, whether through a direct data link or indirectly through cross-promotion (Piwowar and Vision, 2013). However, the open data movement often faces economic, legal, organizational, political, social, and technical challenges at both individual and institutional levels. Many researchers are still reluctant to share the original data used in their research in support of the open access initiative, fearing the loss of credit, future publications, and competitive advantage, as well as the time required to document data and deal with questions from users (Fernandez, 2010; Sui, 2014). Their motivations for publishing datasets, and their intentions in doing so, remain uncertain. Arguably, different person-based policies among stakeholders with various backgrounds and interests need to be developed to encourage sharing behavior in collaboration. Organizational support plays a substantial role in promoting researchers' intentions to share datasets by addressing the heterogeneity of collaborators and the complexity of the data sharing process.

1.05.3 Open Source GIS

The Free and Open Source Software for Geospatial Conference has played a pivotal role in promoting open science in software development. Open source GIS is gaining market share in academia, business, and public administration. This recognition has come at a time when open source programming and scripting languages such as Python and R are making major inroads in geospatial data production, analysis, and mapping (Ye and Rey, 2013). As a consequence, open source software development has been a crucial element of the GIS community's engagement with open GIS, and it is the most well-developed aspect of open GIS (Rey, 2009). The availability and widespread use of code and tools that support more robust data analysis will play a critical role in the adoption of new perspectives and ideas across the spatial sciences. Openness to scrutiny and challenge, realized through the release of source code, underlies the open source GIS movement and has subsequently influenced software functionality and support (Neteler and Mitasova, 2008). Users have the freedom to access, modify, and distribute the source code under licensing agreements such as MPL, MIT, Apache, GPL, and BSD. Making source code both legally and technically open is the very first step toward its promotion as a public good (Rey and Ye, 2010). In particular, scientists benefit from open source code, which reduces code duplication and frees up developer time to enhance the respective applications (Rey, 2009). Bonaccorsi and Rossi (2003) argued that "when programmers are allowed to work freely on the source code of a program, this will inevitably be improved because collaboration helps to correct errors and enables adaptation to different needs and hardware platforms". The credibility of research findings tends to be higher for papers with available code, and third-party researchers may be more likely to adopt such papers as the foundation of additional research. In addition, code repository platforms such as GitHub and Bitbucket are making this open source tide stronger. Links from code to a paper may increase how often the paper is found, through accelerated awareness of its methods and findings.

The dramatic improvement in computer technology and the availability of large volumes of geographically referenced data have enabled spatial analytical tools to move from the fringes to central positions in methodological domains. By and large, however, many existing advanced spatial analysis methods have not been developed in an open source context. The open source and free approach offers unprecedented opportunities and an effective path for developing software packages by attracting both users and developers. Instead of reinventing the wheel, we can study how a program works, adapt it, and redistribute copies, including modifications, of a number of popular alternatives. Anselin (2010) emphasized the role of the open source software movement in stimulating new development, transcending disciplinary boundaries, and broadening the community of developers and adopters. With accelerated development cycles, open source tools give GIS users more flexibility to meet community needs that are bound only by our imaginations, in line with more efficient and effective scientific progress. New theories and novel practices can thus be developed beyond narrowly defined disciplinary boundaries (Sui, 2014).
Regarding open source efforts in spatial analysis, Arribas-Bel (2014) argued that "the traditional creativity that applied researchers (geographers, economists, etc.) have developed to measure and quantify urban phenomena in contexts where data were scarce is being given a whole new field of action". Sui (2014) also noted that a hybrid model integrating the open/free paradigm with proprietary practices (copyright, patents, and other intellectual property arrangements) would be the most realistic and promising route for moving GIS forward. Open source GIS can facilitate interdisciplinary research owing to "the collaborative norms involving positive spillover effects in building a community of scholars" (Rey, 2009; Ye et al., 2014). During the past several decades, burgeoning efforts have been devoted to the development and implementation of spatial statistical analysis packages, which continue to be an active area of research (Rey and Anselin, 2006; Anselin, 2010). The history of the open source movement is much shorter, but its impact on the GIS world is impressive (Rey, 2009). As Rey (2009) commented, "a tenet of the free software (open source) movement is that because source code is fundamental to the development of the field of computer science, having freely available source code is a necessity for the innovation and progress of the field". The development of open source packages has accelerated; however, many duplications of, and gaps in, methodological development have also been observed. Open source toolkit development is community-based, with developers as well as casual and expert users located everywhere. Through online source code repositories and mailing lists, users and developers can communicate virtually to review existing code and develop new methods. However, Tsou and Smith (2011, p. 2) argued that "open source software is not well adopted in GIS education due to the lack of user-friendly guidance and the full integration of GIS learning resources". Representative open source desktop GIS software packages include KOSMO, gvSIG, uDig, Quantum GIS (QGIS), and the Geographic Resource Analysis Support System (GRASS). KOSMO was implemented in the Java programming language based on the OpenJUMP platform and free code libraries. Developed by the European GIS community and offering a multilingual user interface, gvSIG is known for its user-friendly interface and its ability to access a wide range of vector and raster formats. Built upon IBM's Eclipse platform, uDig (user-friendly desktop Internet GIS) is an open source (EPL and BSD) desktop application framework. QGIS integrates with other open source GIS packages such as PostGIS, GRASS GIS, and MapServer, with plugins written in Python or C++. As a founding member of the Open Source Geospatial Foundation (OSGeo), GRASS offers comprehensive GIS functions for data management, image processing, cartography, spatial modeling, and visualization.
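To ground the discussion, here is a minimal sketch of the scripting side of the open source stack that complements these desktop packages; the file name is a placeholder, and GeoPandas is one of several open source Python libraries (built on GDAL/OGR, Shapely, and Fiona) that could be used.

```python
# A minimal sketch of an open source geospatial workflow in Python:
# read, reproject, inspect, and map a vector dataset.
import geopandas as gpd

regions = gpd.read_file("regions.shp")   # placeholder path; any OGR-readable source
regions = regions.to_crs(epsg=4326)      # reproject to geographic coordinates (WGS84)
print(regions.head())                    # inspect the attribute table
regions.plot()                           # render a quick map (requires matplotlib)
```

Because every step is scripted against open libraries, the workflow itself can be shared and rerun by others, which is the reproducibility argument made throughout this article.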

1.05.4 Practicing Open Source GIS

The study of economic inequality and convergence continues to attract enormous attention, forming a dynamic academic landscape in which an interdisciplinary literature has evolved (Ye and Rey, 2013). This interest has been reflected in the analysis


of spatial patterns of economic convergence and the temporal dynamics of geographical inequality. However, the literatures on process analysis and form analysis have remained largely separate, because most methods are standalone and their code is not shared. At the same time, the increasing availability of open space–time data has outpaced the development of space–time analytical techniques across the social sciences. The methodological integration of space and time calls for open data and open source methods, which will help narrow the gap between growth theories and their empirical testing. While the substantive focus of this case study is open source computing of regional income dynamics, the issues examined are relevant to the development of a wide class of methods based on open science. This section suggests some novel exploratory approaches for comparing the spatial pattern and temporal trend of regional development, and identifies and illustrates the cross-fertilization between domain science and open source computing. There is a growing list of papers using the local indicator of spatial association (LISA) to measure the spatial structure of socioeconomic patterns, for two reasons: the availability of open socioeconomic data across administrative units and the implementation of LISA indicators in software packages (Anselin, 1995). However, the release of spatial panel data calls for the extension of static spatial methods into a dynamic context. A LISA time path can be used to measure how economies covary over space and time in a regional context, giving insight into the debate about cooperative versus competitive regional growth. The challenge is that most domain users cannot handle the LISA time path because they lack programming skills. Once new space–time indicators are developed, an extensive set of inferential approaches is needed to evaluate their sampling distributions for comparative analysis between two regional systems. A tortuous LISA time path indicates that the focal economy and its average neighbor have unstable convergence/divergence rates, while frequent crossings suggest mixed convergence and divergence trends over time. With growing awareness of the potential importance of the spatial dimension of economic structure, these space–time constructs can be implemented in empirical specifications to test for the existence of poverty traps, convergence clubs, and spatial regimes. Two interesting questions arise from this analysis (the value hereafter refers to the tortuosity indicator or the crossing ratio):

1. Is a LISA time path statistically more or less tortuous than what is expected if the path is randomly organized?
2. Is a LISA time path's crossing ratio statistically larger or smaller than what is expected if the path is randomly organized?

Both the LISA time path and the various simulation procedures can be built by modifying code from the open source packages STARS and PySAL (Rey, 2009, 2014). Inference would be straightforward if LISA coordinates were independent of each other; however, an individual region's economic growth at one time point relates to its own history and to its neighbors' temporal economic dynamics. An alternative to the aforementioned methods is to employ a Monte Carlo simulation approach and thus circumvent the assumption of independence that causes inferential problems. The presence of space–time effects needs to be considered when examining the distributional properties of a LISA time path indicator.
Three sets of permutation approaches are suggested to test the independence of space, time, and trend through a Python implementation. The spatial independence test answers the following question: can the observed value (or the difference between two observed values) be used to reject the null hypothesis of spatial randomness? The temporal independence test answers the analogous question for temporal randomness, and the trend independence test for trend randomness. Figs. 1–3 illustrate the three independence tests; in each figure, the top-left view is the LISA time path of region i from Time 1 to Time 10. In Fig. 1, through random permutations of the spatial coordinates of all regions, the other three graphs show three different LISA time paths built from different groups of points (LISA coordinates). In Fig. 2, through random relabeling of the time stamps, the other three graphs show different LISA time paths based on the same group of points. In Fig. 3, by normalizing the path segments to follow a normal distribution and recombining them in random order, the other three graphs show different LISA time paths that retain the trends of the original path. We test LISA time path significance, using tortuosity and the crossing ratio, on provincial GDP data for the 31 provinces of mainland China from 1998 to 2008. This work uses the k-nearest-neighbors method to construct the spatial weights matrix, with a default k of 4; the related code is adapted from PySAL (http://pysal.readthedocs.io/en/latest/users/tutorials/weights.html). For each time point t, we compute the observations o_t and their spatial lag values l_t and standardize both as z-scores. We then construct a dictionary D keyed by the time point t, each entry holding a two-row matrix transposed from o_t and l_t, and build the observed LISA path from the values extracted from D. A LISA time path P_i for a given province i consists of several spatial coordinates with their temporal labels, represented as

$P_i = \{(x_k, y_k, t_k)\}, \quad 0 < k < n + 1$

The first two elements x_k and y_k in the tuple are the attribute value and its spatial lag value, respectively, of province i, while t_k is the kth year in which the value is measured. Finally, we obtain a set of paths S_p for all provinces. For each path P_i in the path set S_p, two indices enter the following simulation procedures: tortuosity and the crossing ratio. Tortuosity is defined as

$T_p = \dfrac{\mathrm{distance}((x_1, y_1), (x_n, y_n))}{\mathrm{len}(P)}$

where (x_1, y_1) and (x_n, y_n) are the head and tail spatial coordinates of the LISA path P, and len(P) measures the total length of all the segments of the path. The value of tortuosity lies in the range [0, 1], with values near 0 representing a higher degree of path tortuosity and 1 representing a completely straight path stretching in a stable direction.
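The construction just described can be sketched in a few lines of Python. In the sketch below, the income array Y, the centroid array coords, and all function names are illustrative assumptions; the k-nearest-neighbor lag is built with SciPy rather than with PySAL's weights utilities, so this is a minimal stand-in for the published STARS/PySAL code, not a reproduction of it.

```python
# Minimal sketch of a LISA time path and its tortuosity, assuming Y is an
# (n_regions x n_years) array of incomes and coords an (n_regions x 2) array
# of region centroids; PySAL's KNN weights would replace knn_lag in practice.
import numpy as np
from scipy.spatial import cKDTree

def knn_lag(values, coords, k=4):
    """Row-standardized spatial lag: mean of each region's k nearest neighbors."""
    _, idx = cKDTree(coords).query(coords, k=k + 1)  # k+1: the nearest hit is self
    return values[idx[:, 1:]].mean(axis=1)

def zscore(v):
    return (v - v.mean()) / v.std()

def lisa_time_path(i, Y, coords, k=4):
    """P_i = {(x_k, y_k, t_k)}: standardized value and spatial lag per year."""
    path = []
    for t in range(Y.shape[1]):
        x, y = zscore(Y[:, t]), zscore(knn_lag(Y[:, t], coords, k))
        path.append((x[i], y[i], t))
    return path

def tortuosity(path):
    """T_p = distance(head, tail) / len(P); 1 = straight, near 0 = tortuous."""
    pts = np.array([(x, y) for x, y, _ in path])
    total = np.linalg.norm(np.diff(pts, axis=0), axis=1).sum()
    return float(np.linalg.norm(pts[-1] - pts[0]) / total) if total > 0 else 1.0
```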

Fig. 1 Spatial independence test.

Fig. 2 Temporal independence test.

The crossing ratio is defined as

$c_p = \dfrac{2 \cdot ic_p}{n_p^2 - n_p}$

where ic_p represents the self-intersection count of the LISA path p and n_p represents the number of points in p. The self-intersection count is calculated by checking whether segments of the path intersect one another. The value of the crossing ratio also lies in the range [0, 1], with 0 representing no crossing, which indicates that the path is highly stable, and 1 representing the other extreme, in which all the segments of the path intersect one another, indicating highly unstable evolution of the path over time.
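A brute-force implementation of the crossing ratio only needs a standard segment-intersection test over non-adjacent segment pairs. The sketch below assumes a path in the (x_k, y_k, t_k) form used above; the helper names are invented for this example.

```python
# Sketch of the crossing ratio c_p = 2*ic_p / (n_p^2 - n_p) for a LISA time
# path; a brute-force O(n^2) check is fine for short paths.
def _ccw(a, b, c):
    """True if points a, b, c are in counterclockwise order."""
    return (c[1] - a[1]) * (b[0] - a[0]) > (b[1] - a[1]) * (c[0] - a[0])

def _cross(p1, p2, p3, p4):
    """True if segment p1-p2 properly intersects segment p3-p4."""
    return _ccw(p1, p3, p4) != _ccw(p2, p3, p4) and _ccw(p1, p2, p3) != _ccw(p1, p2, p4)

def crossing_ratio(path):
    pts = [(x, y) for x, y, _ in path]
    n = len(pts)
    # adjacent segments share an endpoint, so only non-adjacent pairs are tested
    ic = sum(_cross(pts[i], pts[i + 1], pts[j], pts[j + 1])
             for i in range(n - 1) for j in range(i + 2, n - 1))
    return 2.0 * ic / (n * n - n)
```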


Fig. 3 Trend independence test.

We permute the original data 999 times using one of the following three modes: spatial independence, temporal independence, and trend independence.

(i) Spatial independence test. The spatial independence test acts on all the provinces. For each time point t, the spatial coordinates of all the regions are randomly permuted. This process rearranges the LISA and lag values among the provinces, and a new LISA time path is constructed for the given region.
(ii) Temporal independence test. The temporal independence test uses the data solely from the observed LISA path. The test randomly permutes the temporal order of the spatial coordinates of the given region to form a new LISA time path.
(iii) Trend independence test. Like the temporal independence test, the trend independence test takes only the data from the observed LISA path. The test starts by breaking the observed LISA path into a set of vectors. These vectors are then normalized, preserving their directions, to follow a normal distribution centered at zero with unit standard deviation, forming a trend list. The new LISA time path is generated by randomly picking a starting coordinate and then using the trend list to construct the whole path.

The values calculated in the simulation process are then ordered, and a pseudo significance level is computed: the empirically generated 999 values are sorted, and the pseudo significance level is the share of the empirical values that are higher than the actual value. The results show that the spatial and temporal independence tests yield stronger significance levels than the trend independence test. The ranks of tortuosity and the crossing ratio within each test are only weakly correlated with each other, but each is relatively stable across the three tests. Under the hypothesis of spatial or temporal independence, all provinces are significantly more tortuous than expected, except three (Jiangxi, Jilin, and Shanghai). However, none of the provinces shows significant crossing frequencies; in other words, there are no obvious mixed convergence and divergence trends over time.
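The pseudo-significance computation itself is only a few lines once a permutation mode and a path statistic are chosen. The sketch below shows the temporal mode only and assumes a path in the (x, y, t) form and a statistic such as the tortuosity or crossing-ratio helpers sketched earlier; the (r + 1)/(m + 1) form of the pseudo p-value is one common convention.

```python
# Sketch: temporal-independence permutation test for one region's path.
import numpy as np

def permute_time(path, rng):
    """Shuffle the temporal order of the path's coordinates (mode ii)."""
    pts = [(x, y) for x, y, _ in path]
    order = rng.permutation(len(pts))
    return [(pts[j][0], pts[j][1], t) for t, j in enumerate(order)]

def pseudo_p(path, stat, n_perm=999, seed=0):
    """Share of permuted statistics at least as extreme as the observed one,
    e.g., pseudo_p(path, tortuosity) with the helpers sketched earlier."""
    rng = np.random.default_rng(seed)
    observed = stat(path)
    sims = [stat(permute_time(path, rng)) for _ in range(n_perm)]
    extreme = sum(s >= observed for s in sims)
    return (extreme + 1) / (n_perm + 1)
```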

1.05.5 Summary

A spatial turn has been noted in many socioeconomic theories across a vast field encompassing both social and physical phenomena (Krugman, 1999; Goodchild et al., 2000; Batty, 2012). The fast growth of socioeconomic dynamics analysis is increasingly attributed to the availability of space–time datasets (Rey and Ye, 2010; Ye et al., 2016). Rigorous space–time analysis and modeling open up a rich empirical context for scientific research and policy interventions. To help scholars and stakeholders deal with these challenges and issues, methodologies and best-practice guidelines are needed at the international as well as the national and local levels. Making data and source code available for both replication and continuing research will have far-reaching and broad impacts in both the GIS and domain communities, highlighting the growing recognition of the role of geography in interdisciplinary research (Karnatak et al., 2012). Such improved discoverability benefits both investigators and the science community as a whole. Open source GIS research has received increased attention and will lead to revolutionary advances in individual and collective decision-making processes (Sui et al., 2012). The goal of this article is to make a modest effort to synthesize an agenda around open GIS and, hopefully, to stimulate further discussions that promote open GIS as the driving force guiding the development of


GIS at a finer scale (Sui, 2014). To gain momentum under the general umbrella of big data and new data, GIS should fully embrace the vision of an open data and open source GIS paradigm to enhance government efficiency and improve citizen services (Wright, 2012). Although there are academic, legal, social/political, and environmental impediments to this practice, open GIS will provide numerous technology-driven, application-led, science-inspired, and education-focused opportunities (Sui, 2014). Methods developed in the mainstream spatial science disciplines need to progress with more attention paid to the potential reuse of data and code. Although a growing list of research papers has highlighted the increasing awareness of spatiotemporal thinking and action, the gap between a small group of method developers and the wider body of users has been widening. Hence, a crucial step is to develop the dialog between computational scientists and domain users, seeking cross-fertilization between these two fast-growing communities. As Rey (2009) suggested, "increased adoption of open source practices in spatial analysis can enhance the development of the next generation of tools and the wider practice of scientific research and education". Methods built in open source environments are easily extensible and customizable; hence, open source projects can promote collaboration among researchers who want to improve current functions or add extensions to address specific research questions.

Acknowledgments

This research was funded by the National Science Foundation (1416509, 1535031, 1637242).

References

Anselin, L., 1995. Local indicators of spatial association – LISA. Geographical Analysis 27 (2), 93–115.
Anselin, L., 2010. Thirty years of spatial econometrics. Papers in Regional Science 89 (1), 3–25.
Arribas-Bel, D., 2014. Accidental, open and everywhere: Emerging data sources for the understanding of cities. Applied Geography 49, 45–53.
Auer, S.R., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z., 2007. DBpedia: A nucleus for a web of open data. In: The Semantic Web (Lecture Notes in Computer Science, vol. 4825), p. 722. doi:10.1007/978-3-540-76298-0_52.
Batty, M., 2012. Smart cities, big data. Environment and Planning B 39 (2), 191.
Bonaccorsi, A., Rossi, C., 2003. Why open source software can succeed. Research Policy 32 (7), 1243–1258.
Chignard, S., 2013. A brief history of open data. ParisTech Review, 29 March.
Data.gov, 2017. http://www.data.gov/ (accessed 20 February 2017).
Fernandez, R., 2010. Barriers to open science: From big business to Watson and Crick. http://opensource.com/business/10/8/barriers-open-science-big-business-watson-and-crick (accessed 16 April 2017).
FGDC, 2017. https://www.fgdc.gov/metadata (accessed 20 February 2017).
Goodchild, M.F., 2013. The quality of big (geo)data. Dialogues in Human Geography 3 (3), 280–284.
Goodchild, M.F., Anselin, L., Appelbaum, R.P., Harthorn, B.H., 2000. Toward spatially integrated social science. International Regional Science Review 23 (2), 139–159.
Goodchild, M.F., Fu, P., Rich, P., 2007. Sharing geographic information: An assessment of the Geospatial One-Stop. Annals of the Association of American Geographers 97 (2), 250–266.
Gurstein, M.B., 2011. Open data: Empowering the empowered or effective data use for everyone? First Monday 16 (2) (accessed 16 April 2017).
Hendler, J., Holm, J., Musialek, C., Thomas, G., 2012. US government linked open data: Semantic.data.gov. IEEE Intelligent Systems 27 (3), 25–31.
Janssen, M., Charalabidis, Y., Zuiderwijk, A., 2012. Benefits, adoption barriers and myths of open data and open government. Information Systems Management 29 (4), 258–268.
Jiang, B., 2011. Making GIScience research more open access. International Journal of Geographical Information Science 25 (8), 1217–1220.
Johnson, P., Robinson, P., 2014. Civic hackathons: Innovation, procurement, or civic engagement? Review of Policy Research 31 (4), 349–357.
Karnatak, H., Shukla, R., Sharma, V., Murthy, Y., Bhanumurthy, V., 2012. Spatial mashup technology and real time data integration in geo-web application using open source GIS – A case study for disaster management. Geocarto International 27 (6), 499–514.
Krugman, P., 1999. The role of geography in development. International Regional Science Review 22 (2), 142–161.
Maguire, D.J., Longley, P.A., 2005. The emergence of geoportals and their role in spatial data infrastructures. Computers, Environment and Urban Systems 29 (1), 3–14.
Neteler, M., Mitasova, H., 2008. Open source GIS: A GRASS GIS approach, 3rd edn. Springer, Berlin.
Nogueras-Iso, J., Zarazaga-Soria, F.J., Lacasta, J., Béjar, R., Muro-Medrano, P.R., 2004. Metadata standard interoperability: Application in the geographic information domain. Computers, Environment and Urban Systems 28 (6), 611–634.
Piwowar, H.A., Vision, T.J., 2013. Data reuse and the open data citation advantage. PeerJ 1 (175), 1–25. doi:10.7717/peerj.175.
Rey, S., 2009. Show me the code: Spatial analysis and open source. Journal of Geographical Systems 11 (2), 191–207.
Rey, S.J., 2014. Open regional science. The Annals of Regional Science 52 (3), 825–837.
Rey, S.J., Anselin, L., 2006. Recent advances in software for spatial analysis in the social sciences. Geographical Analysis 38 (1), 1–4.
Rey, S., Ye, X., 2010. Comparative spatial dynamics of regional systems. In: Páez, A., Le Gallo, J., Buliung, R., Dall'Erba, S. (Eds.), Progress in Spatial Analysis: Theory, Computation, and Thematic Applications. Springer, New York, pp. 441–464.
Stodden, V., 2009. The legal framework for reproducible research in the sciences: Licensing and copyright. IEEE Computing in Science and Engineering 11 (1), 35–40.
Sui, D., 2014. Opportunities and impediments for open GIS. Transactions in GIS 18 (1), 1–24.
Sui, D., Elwood, S., Goodchild, M., 2012. Crowdsourcing geographic knowledge: Volunteered geographic information in theory and practice. Springer, Berlin.
Susha, I., Zuiderwijk, A., Charalabidis, Y., Parycek, P., Janssen, M., 2015. Critical factors for open data publication and use: A comparison of city-level, regional, and transnational cases. JeDEM – eJournal of eDemocracy and Open Government 7 (2), 94–115.
Tsou, M.H., Smith, J., 2011. Free and open source software for GIS education. White paper, National Geospatial Technology Center of Excellence (GeoTech Center, http://www.geotechcenter.org/).
Wadhwa, V., 2011. Code for America: An elegant solution for government IT problems. The Washington Post.
Warf, B., Arias, S. (Eds.), 2008. The spatial turn: Interdisciplinary perspectives. Routledge, London.
Wright, D., 2012. Big data, GIS, and the academic community. http://blogs.esri.com/esri/esri-insider/2012/10/03/big-data-gis-and-the-academic-community/#more1311 (accessed 15 April 2017).


Yang, P., Evans, J., Cole, M., Marley, S., Alameh, N., Bambacus, M., 2007. The emerging concepts and applications of the spatial web portal. Photogrammetric Engineering & Remote Sensing 73 (6), 691–698.
Ye, X., Rey, S.J., 2013. A framework for exploratory space–time analysis of economic data. Annals of Regional Science 50 (1), 315–339.
Ye, X., She, B., Wu, L., Zhu, X., Cheng, Y., 2014. An open source toolkit for identifying comparative space–time research questions. Chinese Geographical Science 24 (3), 348–361.
Ye, X., She, B., Zhao, H., Zhou, X., 2016. A taxonomic analysis of perspectives in generating space–time research questions in environmental sciences. International Journal of Applied Geospatial Research. doi:10.4018/IJAGR.2016040104.
Ye, X., Yu, J., Wu, L., Li, S., Li, J., 2015. Open source point process modeling of earthquake. In: Geo-informatics in Resource Management and Sustainable Ecosystem. Springer, Berlin and Heidelberg, pp. 548–557.

1.06 GIS Databases and NoSQL Databases

Peng Yue and Zhenyu Tan, Wuhan University, Wuhan, China © 2018 Elsevier Inc. All rights reserved.

1.06.1 Introduction
1.06.2 Spatial Databases
1.06.2.1 Spatial Data and Databases
1.06.2.2 Data Models
1.06.2.2.1 Relational model
1.06.2.2.2 Object-relational model
1.06.2.2.3 Object model
1.06.2.3 Spatial Indices
1.06.2.3.1 BSP index
1.06.2.3.2 K-D-B tree index
1.06.2.3.3 R-tree and R+ tree index
1.06.2.3.4 GeoHash
1.06.2.4 Typical Spatial Databases
1.06.2.4.1 Hybrid model
1.06.2.4.2 Relational databases
1.06.2.4.3 Object-relational databases
1.06.2.4.4 Object-oriented databases
1.06.3 NoSQL Databases
1.06.3.1 Big Data and NoSQL
1.06.3.1.1 What is NoSQL?
1.06.3.1.2 Why NoSQL?
1.06.3.2 CAP Theorem
1.06.3.3 Typical NoSQL Databases
1.06.3.3.1 Wide-column stores
1.06.3.3.2 Key-value stores
1.06.3.3.3 Document databases
1.06.3.3.4 Graph stores
1.06.3.4 NoSQL Databases in GIS
1.06.3.5 Advanced Database Technologies
1.06.3.5.1 Distributed file systems
1.06.3.5.2 In-memory databases
1.06.3.5.3 Array databases
1.06.4 Retrospect and Prospect
1.06.4.1 Review of Spatial Database Development
1.06.4.2 Current Situation and Problems
1.06.4.2.1 Distributed geospatial big data structure model
1.06.4.2.2 Distributed geospatial big data storage and management
1.06.4.2.3 Indices for geospatial big data
1.06.5 Conclusion
References

1.06.1 Introduction

The 21st century has been an era of information explosion. Data, the carrier of information, have reached unprecedented volumes. Spatial data, also known as geospatial data, are information about geospatial entities or geographic phenomena on our planet or in outer space that can be represented by collections of geometric and alphanumeric values in geographic information systems (GIS); they are used in many fields and contribute to the quality of human life and the environment (Heywood and Cornelius, 2010; Shekhar and Xiong, 2007). Where data exist, so too does data storage. Data storage is one of the most significant issues in fundamental information technology (IT) infrastructure and in GIS. From local files to database management, the storage strategy for spatial data has evolved with the pace of the IT industry yet maintains its own particularities, which means that off-the-shelf databases from the IT infrastructure cannot be directly employed for spatial data storage. This article explains some fundamental concepts of spatial databases and database models for GIS and introduces several types of databases that are employed for geospatial data, including traditional data storage solutions and the most popular NoSQL


approaches for big data. An overview of the history and development of spatial databases is also presented to better explain the relevant technologies. Database technologies arose from the need to handle large sets of data. Wikipedia defines a database as "an organized collection of data." Usually, a database is stored as a file or collection of files on magnetic tape or disk, optical disk, or some other computer storage device, in a format that can be retrieved and manipulated easily. Database management systems (DBMSs), which are built on top of databases, are elaborately designed computer software applications that offer a convenient way for users to interactively manipulate and manage data in databases. The main functions of DBMSs can be summarized as follows:

(A) Database creation: DBMSs provide a data definition language (DDL) to conveniently define data objects in a database, along with approaches to create a database and load data into it.
(B) Data manipulation: DBMSs provide a data manipulation language (DML) to create, update, insert, and delete (CUID) data in a database and a query language to retrieve data from the database. In addition, most DBMSs offer a set of functions to process and analyze data in databases.
(C) Database management: DBMSs perform unified management and control during database creation, operation, and maintenance to ensure the availability, consistency, integrity, and security of a database.
(D) Database maintenance: DBMSs provide a series of maintenance functions, such as data dumps, data recovery, performance monitoring, and database reorganization, to maintain the functions of a system.
(E) Data communications: DBMSs provide interfaces to communicate with other applications and interoperability with other databases. In addition, some systems enable conversions among different data formats.

Generally speaking, three revolutions in database technologies have occurred (Harrison, 2015) (Fig. 1). The first was driven by the emergence of electronic computers and marked the beginning of the digital storage era. The second was the rise of relational database management systems (RDBMSs) and their prosperity. The third was the explosion of NoSQL databases driven by modern Web applications. This article mainly focuses on relational and NoSQL databases and their applications to spatial data in GIS.

1.06.2 Spatial Databases

1.06.2.1 Spatial Data and Databases

Spatial databases, as the name implies, are databases that are optimized to store and query spatial data. In geographic information science, spatial data can be classified into two major categories: vector and raster (Heywood and Cornelius, 2010). Vector data model spatial entities with geometries such as points, lines, and polygons, and with the topologies among them. For instance, a river can be regarded as a line, and a lake can be treated as a polygon. Raster data represent geographical phenomena with a grid of multidimensional discrete values, such as remote sensing images, scanned topographic maps, and digital elevation model (DEM) data. In the traditional GIS context, spatial data often refers to spatial vector data; the earlier stage of spatial database studies mainly focused on how to put vector data into databases, while raster data were still stored as files. Spatial data consist of plain attributes, locations, times, and topology information. Their variable length and unstructured nature make directly handling these data with mainstream databases difficult. Additional important features of spatial data include large data volumes, various heterogeneous data formats, and complex data query processes. These features pose challenges for database technologies. The design and implementation of a spatial database must meet the following requirements:

(1) The database can be employed for data storage and management.
(2) The database should natively support spatial data types in its data model.
(3) The database should offer a query language to perform spatial queries.
(4) The database should provide spatial indices to accelerate spatial queries.

Database models, spatial queries, and indices are three important issues that must be considered in order to provide a satisfactory spatial database service. Spatial queries performed on spatial objects mainly include location-based queries, spatial-relationship-based queries, and attribute-based queries; the first two are the typical spatial queries. Basic spatial query types are summarized in Table 1. Database models and spatial indices are covered in subsequent sections.

Fig. 1 Timeline of database technology development.


Table 1 Spatial query types (the diagram column of the original table is omitted)

Query type | Example
Surface–surface query | Determine all the adjacent regions of a given administrative district
Line–line query | Determine all the branches of a given river
Point–point query | Determine the closest gas station to where I stand
Line–surface query | Determine all the cities and towns that a given railway passes through
Point–line query | Count the street lamps along a given road
Point–surface query | Determine all the schools in a given city
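As a toy illustration of the point–surface query in Table 1 ("determine all the schools in a given city"), the sketch below uses the shapely library; the geometries and names are invented for the example.

```python
# Toy point-surface query with shapely: which schools fall inside the city?
from shapely.geometry import Point, Polygon

city = Polygon([(0, 0), (4, 0), (4, 3), (0, 3)])  # city boundary
schools = {"North": Point(1, 1), "East": Point(5, 2), "South": Point(3, 2.5)}

inside = [name for name, pt in schools.items() if city.contains(pt)]
print(inside)  # ['North', 'South']
```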

1.06.2.2 Data Models

Data models are abstract descriptions of specific things in the real world. These models can be used to express and organize data in computer systems, and various operations can be performed on them to manipulate the data (Date, 2003). Objects cannot be turned directly into data that a computer can handle unless they are abstracted and transformed through a data model. A well-designed data model captures the real world fairly authentically and can be easily understood by people and implemented on a computer. Theoretically, data models are collections of strictly defined concepts that describe the static and dynamic characteristics and the integrity constraints of a system. Accordingly, a data model comprises data structures, data operations, and integrity constraints on data. Data structures describe the objects and their relationships within a database. Data operations are sets of functions that can be applied to the instances of objects in a database. Integrity constraints define the constraints and dependency rules of data and their relationships to guarantee the integrity and consistency of the data in the database. Data models can be classified as conceptual models, logical models, and physical models according to their intended uses (Date, 2003). Conceptual models, also called information models, model real-world information from the perspective of users and are mainly used during the database design stage. Logical models organize data from the perspective of computer systems and are mainly used during the DBMS implementation stage. Hierarchical models and network models were the widely employed logical models in the first-generation databases: hierarchical models are based on a tree structure, and network models on a graph structure. Relational models, proposed by E.F. Codd in 1970, laid the foundation of the second-generation databases: relational databases. Since then, RDBMSs have predominated in the database market. However, object-oriented (OO) models and object-relational models have emerged with the increasing popularity of OO programming, and NoSQL databases prevailed during the 2000s alongside various other data models. Physical models are data abstractions at the lowest level and describe the physical storage mode and access method inside the computer system. Correspondingly, the process of entering relevant spatial information from the real world into a spatial database involves three main procedures (Fig. 2): from real-world information to a conceptual model, from a conceptual model to a logical model, and from a logical model to a physical model. In conceptual models, spatial data can be classified as object models for discrete features and field models for continuous features (Rigaux et al., 2001; Shekhar and Chawla, 2003).

Fig. 2 Process of entering spatial information into database storage.

Discrete features such as wells, roads, and lakes are independently existing spatial objects, which are usually expressed as vector data. Continuous features describe nondiscrete phenomena, such as temperature, rainfall distributions, or vegetation, that cover a specific area and are usually expressed as raster data. Logical models are designed by considering the particularities of spatial data and by referring to mainstream database models. Physical models permanently retain these spatial data in database files. Finally, index and storage optimization technologies can be leveraged to improve the performance of spatial queries. The data models employed in spatial databases are greatly influenced by mainstream database technologies in the IT industry, which are adopted and improved to fit spatial data. In the following sections, relational models, object-relational models, and object models are briefly introduced.

1.06.2.2.1 Relational model

Relational database systems (RDBSs), which are based on relational models, have long dominated the business market and play a significant role in the modern information society. Relational models are built on algebraic sets with rigorous mathematical foundations and formal analysis capabilities. Here, the fundamentals of RDBSs are briefly reviewed from a user's perspective. Relational models consist of three main parts: relational data structures, relational operations, and integrity constraints (Date, 2003; Kroenke and Auer, 2010).

1.06.2.2.1.1 Relational data structures
Relational models consist of a set of relations, each of which is represented as a normalized two-dimensional data table. Some basic terms are introduced in Table 2. In relational models, entities and the relationships among entities are both represented as relations. Relations must satisfy certain conditions, first and foremost that each attribute is inseparable; thus, a table cannot contain another subtable in RDBSs.

Table 2 Basic terms of relational data structures

Term | Explanation
Relation | A relation usually corresponds to a table in relational models.
Tuple | A row or record of a table is called a tuple.
Attribute | A column of a table is an attribute, each of which has a name for identification.
Key | A key consists of one or more attributes that identify a tuple.
Domain | The value range of an attribute is called a domain.
Schema | A relation schema is the definition of a relation, typically represented as relation name (attribute 1, attribute 2, ..., attribute n).

1.06.2.2.1.2 Relational operations
Relational models show their powerful capabilities through flexible manipulation and query operations. Query operations are a significant part of relational operations, among which selection, projection, union, set difference, and Cartesian product are the five most fundamental; other operations can be derived from them. Users can easily perform create, retrieve, update, and delete (CRUD) operations on the data with standard structured query language (SQL).
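As a minimal illustration of CRUD through standard SQL, the sketch below uses Python's built-in sqlite3 module; the table and the sample values are invented.

```python
# CRUD against a relational table through standard SQL (sqlite3, in memory).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, pop INTEGER)")
conn.execute("INSERT INTO city (name, pop) VALUES (?, ?)", ("Wuhan", 10_607_700))    # create
rows = conn.execute("SELECT name FROM city WHERE pop > ?", (1_000_000,)).fetchall()  # retrieve
conn.execute("UPDATE city SET pop = pop + 1 WHERE name = ?", ("Wuhan",))             # update
conn.execute("DELETE FROM city WHERE name = ?", ("Wuhan",))                          # delete
conn.commit()
```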


1.06.2.2.1.3 Integrity constraints
The integrity constraints of the relational model stipulate specific constraints on the relations of entities. At any time, the relations in a database must satisfy the predefined semantic constraints to ensure consistency and integrity. Three types of constraints are present in relational models: entity integrity, referential integrity, and user-defined integrity. Entity integrity specifies that the key of a relation cannot be null. Referential integrity, shown schematically as follows, specifies that the values of a foreign key Fr in a relation R must match the values of the key Ks in a relation S if the foreign key Fr corresponds to the key Ks:

R(Kr, ..., Fr, ...) → S(Ks, ...)

Every RDBMS must support entity integrity and referential integrity. Moreover, additional constraints attached for stricter integrity levels constitute user-defined integrity, which differs across applications. How can suitable relational database schemas be built for a specific business requirement? The normalization theory of relational databases provides clear guidelines. The basic idea of normalization is to eliminate the attribute dependencies of schemas step by step and finally remove all redundancies in relations (Shan and Shixuan, 2014). Normal forms can be distinguished as the first normal form (1NF), the second normal form (2NF), and the third normal form (3NF), according to the dependencies among attributes in a schema. 1NF states the most basic requirement for a relation: each attribute in the table is atomic and inseparable. The following example shows a database schema designed for a college student management system: S-D-C (id, name, dept, dean, course, grade). The attributes in this schema stand, in turn, for the student identifier, name, department, dean of the department, the student's selected courses, and the corresponding grades. Evidently, relation S-D-C satisfies the criteria of 1NF, but it turns out to be very problematic in practice, exhibiting data redundancy and update, insertion, and deletion anomalies, because of the functional dependencies in this schema. Usually, a lower normal form can be transformed into a higher one by schema decomposition, a process called normalization. Before digging into the normalization process, several concepts of functional dependency must be clarified. A candidate key of a relation is defined as one or more attributes that can uniquely determine all the other attributes in the relation. If there is more than one candidate key, any one of them can be designated the primary key. If an attribute is included in any candidate key, it is called a prime attribute; if not, it is called a nonprime attribute. Loosely speaking, if a nonprime attribute cannot be uniquely determined by any proper subset of a candidate key, the dependency between the key and that attribute is called a full functional dependency; otherwise, it is a partial functional dependency. If the key uniquely determines a nonprime attribute r1, and r1 in turn uniquely determines another nonprime attribute r2, the dependency of r2 on the key is called a transitive functional dependency. The following illustrates a normalization process using these concepts. The normalization of relation S-D-C from 1NF to 3NF is shown in Fig. 3. First, the candidate key must be identified; evidently, the key of relation S-D-C is (id, course). Second, all the dependencies must be identified. Full functional dependencies are rendered with blue arrowed lines in the figure, partial functional dependencies in green, and transitive functional dependencies in purple. Third, by removing all the partial functional dependencies in S-D-C, the relation is decomposed into two relations: C (id, course, grade) and S-D (id, name, dept, dean).

Fig. 3 The normalization of relations.

Relations C and S-D fulfill the criteria of 2NF. Hence, if a relation is in 1NF and every nonprime attribute is fully functionally dependent on every candidate key, then it is in 2NF. Next, if the transitive functional dependencies are removed from a 2NF relation, it rises to 3NF. Similarly, relation S-D can be further decomposed into S (id, name, dept) and D (dept, dean), after which relations S, D, and C fulfill 3NF. Most of the aforementioned problems are solved in 3NF. Beyond that, higher normal forms have been proposed to eliminate further varieties of dependency and provide more suitable relation schemas. However, a higher normal form is not always better; normalization must be balanced against efficiency in database design. In addition, the database transaction is another important concept that must be mentioned here. A transaction is a sequence of operations on the data in a database that is performed as a logical unit (Kroenke and Auer, 2010). Four properties are attached to transactions, namely the atomicity, consistency, isolation, and durability (ACID) properties. Atomicity means that a transaction is an indivisible unit: either all the operations in a transaction are fully performed or none of them is. Consistency means that when a transaction completes, all the data in the database are in a consistent state. Isolation means that concurrent transactions are isolated from each other, as if they were performed serially. Durability means that all modifications to the data take permanent effect once a transaction completes. RDBSs commonly support the ACID properties to ensure data integrity and consistency.
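The 3NF decomposition and the transaction concept can be made concrete with sqlite3; in the sketch below, the sample rows are invented, and the REFERENCES clauses mirror the referential-integrity rule discussed earlier.

```python
# The S/D/C tables of the 3NF decomposition, plus one atomic transaction.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE D (dept TEXT PRIMARY KEY, dean TEXT);
CREATE TABLE S (id INTEGER PRIMARY KEY, name TEXT,
                dept TEXT REFERENCES D(dept));      -- referential integrity
CREATE TABLE C (id INTEGER REFERENCES S(id),
                course TEXT, grade REAL,
                PRIMARY KEY (id, course));          -- entity integrity (non-null key)
""")
with conn:  # one transaction: all three inserts commit together or roll back
    conn.execute("INSERT INTO D VALUES ('GIS', 'Dr. Lee')")
    conn.execute("INSERT INTO S VALUES (1, 'Ann', 'GIS')")
    conn.execute("INSERT INTO C VALUES (1, 'Databases', 90)")
```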

1.06.2.2.2 Object-relational model

RDBMS technology has achieved great success in the commercial domain; hundreds of thousands of commercial applications run on various RDBMSs. However, new demands have arisen from domains with complex data types, such as long raw text in computer-assisted typesetting systems, images in the healthcare field, and spatiotemporal data in GIS. The design goals of traditional RDBSs came from business transaction processing, so it is difficult for them to meet the needs of these new database applications. More specifically, RDBSs provide only a limited set of primitive data types and do not support user-defined data types, operations, or functions; they cannot express complex data objects in a simple and clear manner. Exploration and innovation around the integration of database technologies and OO technologies began in the 1990s. Object-relational database systems (ORDBSs), as the name suggests, arose from the combination of relational models and OO thinking. This object-relational approach both inherits existing RDBS technologies and provides support for object data management. Overall, most ORDBSs are implemented mainly within relational models and merely add partial support for simple object types. In 1999, SQL3 (also known as SQL99) was published to provide OO extensions to SQL so that objects can be stored in relational table rows (Melton, 2002). One of the most significant extensions of SQL3 is the introduction of new OO data types, namely the row type and the abstract data type (ADT). A row type is a collection of attribute definitions, an instance of which can be regarded as a tuple in a relational table. An ADT enables users to create a customized data type that consists of attribute and method definitions and supports type inheritance. In an ORDBS, a type possesses the characteristics of a class, which greatly enhances the capability of traditional RDBSs to store and manipulate object data. Additionally, the SQL/object language binding standard is intended to embed SQL in the Java language and to let OO programming languages interact with SQL in a native and direct manner. Later, commercial RDBMSs gradually evolved into object-relational database management systems (ORDBMSs) by adding support for customized data types and functions. However, the SQL3 standard lags behind the actual implementations of most ORDBMSs, so different products have their own terminology and language syntax and offer different levels of support for OO models.

1.06.2.2.3 Object model

OO technology is both a programming paradigm and a cognitive method. Since the middle and late 1980s, people have attempted to combine OO thinking with DBMS technologies. Object-oriented database systems (OODBSs) can be regarded as persistent and sharable object repositories with querying and management capabilities. The design and implementation of OODBSs are based on OO models. An OO model describes the logical organizations, relationships, and constraints among entities in the real world from an OO perspective. Some key concepts that form the foundation of OO models are similar to those in OO programming; the following gives a brief review, and Table 3 provides a comparison of OO models and relational models.

Table 3 Comparison of OO models and relational models

Item | Relational model | OO model
Fundamental data structure | Two-dimensional table | Class
Data identifier | Key | OID
Static state | Attribute | Attribute
Dynamic behavior | Relational operation | Method
Abstract data structure | Schema | Class
Encapsulation | No | Yes
Relation of data items | Key and foreign key, referential integrity | Inheritance, combination
Schema evolution capability | Weak | Strong


(A) Object: An object consists of a series of attributes and methods. Attributes are a collection of variables that represent the composition, characteristics, and states of an object. Methods define operations on the object that describe its behavioral competencies.
(B) Object identifier (OID): Each object in an OODB is identified by a unique, constant OID. An OID is persistent and exists with a specific object until the object is deleted.
(C) Encapsulation: Each object is the encapsulation of its states (attributes) and behaviors (methods). Encapsulation segregates the external interfaces from the internal implementation. Users can manipulate objects by sending messages to exposed interfaces without considering how the program works inside.
(D) Class: Objects that share the same attributes and methods form a class. Classes are abstractions of objects, and objects are instances of a class. In OODBs, a class is the "type" and an object is the "value." Classes can be nested within other classes, so the attributes of a class can be of any defined class or a primitive type.
(E) Inheritance: New data types can be derived from existing types by inheritance and own the attributes and methods of both the inherited types and their newly added members. Usually, the inherited class is called a superclass, and the derived class is called a subclass. The inheritance mechanism increases reusability and provides a simple and clear way to model the real world.
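These concepts map directly onto ordinary OO code. The toy Python classes below are purely illustrative: the hand-rolled OID counter merely stands in for the identifier management that a real OODBMS performs internally.

```python
# Toy mapping of OODB concepts onto Python classes: OID, encapsulation,
# class/instance, and inheritance (superclass -> subclass).
import itertools

_oids = itertools.count(1)

class SpatialObject:                    # class: abstraction of objects
    def __init__(self, name):
        self.oid = next(_oids)          # unique, constant object identifier
        self._name = name               # encapsulated state (attribute)

    def describe(self):                 # exposed interface (method)
        return f"{type(self).__name__} #{self.oid}: {self._name}"

class Well(SpatialObject):              # subclass inherits from superclass
    def __init__(self, name, depth_m):
        super().__init__(name)
        self.depth_m = depth_m

print(Well("W-7", depth_m=120.0).describe())  # "Well #1: W-7"
```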

1.06.2.3 Spatial Indices

Spatial indices are elaborately designed data structures that impose certain orders and arrangements on spatial objects, according to their locations, shapes, attributes, or relationships to one another, so that desired information can be extracted from databases easily and quickly. Querying with spatial indices can coarsely filter out data that do not meet the query criteria, greatly reducing computing time without losing accuracy and locating the target objects more quickly; spatial indices are therefore a perennial focus of study. Designing a well-performing spatial index requires partitioning the spatial data, eliminating overlap so as to obtain a unique index value, and expressing the spatial data and their relationships well. Scholars have presented many index methods, which fall mainly into four categories: structures based on binary trees, structures based on B-trees, structures based on hashing, and structures based on space-filling curves. Among these, tree-based index structures are the most commonly employed in mainstream spatial databases. Some typically used spatial indices are briefly introduced here.

1.06.2.3.1 BSP index

BSP (Paterson and Yao, 1990), short for binary space partitioning, is a binary tree that recursively partitions the target space into two parts (Fig. 4). BSP fits the distribution of spatial data in databases well but hampers spatial operations because of its great depth.

1.06.2.3.2 K-D-B tree index

K-D-B trees (Robinson, 1981), known as K-dimensional B-trees, extend BSP to multidimensional space. K-D-B trees enable the user to dynamically query points in multidimensional space and make it convenient to add and remove spatial points. Their disadvantage is poor support for querying objects with spatial extent, such as lines and surfaces. This problem can be addressed by mapping the multidimensional extended objects to points and then performing the query; however, this solution creates other issues.
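K-D-B trees are disk-resident structures, but the multidimensional point queries they support can be sketched with an in-memory k-d tree from SciPy; the points and the query location below are invented.

```python
# In-memory k-d tree as a stand-in for the K-D-B idea: dynamic point queries
# in multidimensional space.
import numpy as np
from scipy.spatial import cKDTree

pts = np.random.default_rng(1).uniform(0, 100, size=(1000, 2))  # random 2-D points
tree = cKDTree(pts)

dists, idxs = tree.query([50.0, 50.0], k=3)          # three nearest points
nearby = tree.query_ball_point([50.0, 50.0], r=5.0)  # all points within radius 5
```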

Fig. 4 BSP index.

1.06.2.3.3 R-tree and R+ tree index

R-trees (Guttman, 1984) group nearby spatial objects and contain them within their minimum bounding rectangles (MBRs) (Fig. 5). At the leaf level, each MBR contains a single object, whereas at higher levels an MBR contains an aggregation of nearby objects. The principles for constructing an R-tree are as follows: (1) the overlap among MBRs must be as small as possible; (2) each MBR must contain as many spatial objects as possible; and (3) MBRs can be nested to contain smaller MBRs. A spatial query first judges which MBRs fall within the query window and then judges which objects within those MBRs satisfy the remaining query conditions. In practice, real-world spatial objects vary widely, and no surefire method ensures that MBRs do not overlap if every object must be contained within a single MBR. R+ trees (Sellis et al., 1987) have been presented as an improved R-tree that eliminates overlap among node MBRs by allowing one object to be contained within several MBRs simultaneously. This ensures quick querying with R and R+ trees; however, the efficiency of insertion, deletion, and querying is not easy to balance.
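The filter step of such a query can be sketched with the rtree Python package (a wrapper around libspatialindex); the package choice and the sample MBRs are assumptions of this example.

```python
# MBR filtering with an R-tree: a coarse first stage before exact geometry tests.
from rtree import index

idx = index.Index()
idx.insert(1, (0.0, 0.0, 2.0, 2.0))   # MBR of, say, a lake
idx.insert(2, (5.0, 5.0, 6.0, 8.0))   # MBR of, say, a road corridor

# Stage 1: which MBRs intersect the query window?
candidates = list(idx.intersection((1.0, 1.0, 5.5, 5.5)))  # -> [1, 2]
# Stage 2 (not shown): test the candidates' exact geometries.
```

The two-stage pattern is the point of the structure: the cheap rectangle test discards most objects so that the expensive exact geometry test runs on only a few candidates.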

1.06.2.3.4 GeoHash

GeoHash is an encoding algorithm that expresses location information as a sortable, comparable string (Moussalli et al., 2015). The method was originally designed to denote points on a map with uniform resource locators (URLs). These points are usually recorded as latitude and longitude, so the problem is to encode the latitude and longitude as a clean, succinct string. A GeoHash code can be obtained with the following steps. First, the latitude range [−90, 90] is divided evenly into two subintervals, [−90, 0] and [0, 90]. If the latitude of the given location falls in the left interval, the code is 0; otherwise, the code is 1. This process is performed recursively to obtain the binary sequence of the latitude at a specific accuracy, and likewise for the longitude over [−180, 180]. Next, the latitude and longitude sequences are merged into a single sequence by interleaving their bits, taking bits alternately from the longitude and latitude sequences, starting with the longitude. Finally, base-32 encoding transforms the binary sequence into a string (Fig. 6). Although a GeoHash is generated from a point's coordinates, the code in fact stands for a region: any position within that region shares the same GeoHash code. GeoHash forms a hierarchical data structure by subdividing the region into a set of grid cells; the deeper the subdivision, the more precisely the location is denoted. A rough location range can be obtained by checking the prefix of a GeoHash code, and regions are closer if their codes share a longer prefix, so a time-saving method when performing a distance query is to compare GeoHash prefixes. GeoHash indices are frequently used for spatial data in NoSQL databases. However, some problems do exist with GeoHash encoding, such as edge cases in which points are spatially close but have different hashes.
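The bisect-interleave-encode procedure can be written out directly. The sketch below follows the common GeoHash convention (first interleaved bit from the longitude, base-32 alphabet without a, i, l, and o); the function name is invented.

```python
# Plain-Python GeoHash encoder: bisect lat/lon ranges, interleave the bits
# (longitude first), and map each 5-bit group to the base-32 alphabet.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=9):
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    bits, use_lon = [], True
    while len(bits) < precision * 5:
        rng, val = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        bits.append(1 if val >= mid else 0)
        rng[0 if val >= mid else 1] = mid   # shrink to the chosen half
        use_lon = not use_lon
    return "".join(BASE32[int("".join(map(str, bits[i:i + 5])), 2)]
                   for i in range(0, len(bits), 5))

# Nearby points share prefixes: a shorter hash is a prefix of a longer one.
assert geohash(30.6, 114.3, 5) == geohash(30.6, 114.3, 9)[:5]
```

The final assertion demonstrates the prefix property that makes GeoHash codes useful as sortable index keys.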

1.06.2.4 Typical Spatial Databases

The unstructured nature of spatial data and the diversity of data types prevent us from directly applying existing nonspatial databases to spatial data management. On the one hand, the data models employed in current generic databases cannot represent complex spatial relationships among various spatial entities. On the other hand, spatial attributes of geographic entities, such as locations and shapes, tend to have variable length, so such data are not well suited to mainstream relational databases. Consequently, several solutions that extend and improve current database technologies have been developed to meet the needs of spatial data management.

1.06.2.4.1 Hybrid model

The hybrid model, also called the loosely coupled architecture, was the first model developed for spatial data management. The basic idea of the hybrid model is to employ two subsystems to store and manage spatial geometric attributes and nonspatial attributes: nonspatial attribute records are stored in an RDBMS, while geometric data are stored in a specially designed spatial database, usually binary data files (Rigaux et al., 2001). Shared identifiers of spatial entities link the two subsystems together (Fig. 7).
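A minimal sketch of this loose coupling is shown below: attributes live in a relational table and geometries in a separate binary file, joined only by a shared feature ID. The file name, table, and coordinates are invented, and pickle merely stands in for a purpose-built geometry file format.

```python
# Hybrid model in miniature: two subsystems linked by a shared feature ID.
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE river_attrs (fid INTEGER PRIMARY KEY, name TEXT, len_km REAL)")
conn.execute("INSERT INTO river_attrs VALUES (1, 'Han River', 1532.0)")

geoms = {1: [(114.20, 30.60), (114.30, 30.58), (114.35, 30.55)]}  # polyline vertices
with open("rivers.geom", "wb") as f:        # the 'spatial database' subsystem
    pickle.dump(geoms, f)

# A query touches both subsystems and joins on fid -- the fragile linkage.
fid, = conn.execute("SELECT fid FROM river_attrs WHERE name='Han River'").fetchone()
with open("rivers.geom", "rb") as f:
    shape = pickle.load(f)[fid]
```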

Fig. 5 R+ tree index.


Fig. 6 Calculation of GeoHash.

Fig. 7 Hybrid model for spatial data.

The hybrid model takes full advantage of RDBMSs for attribute data management and enhances the efficiency of building large spatial databases. Operations relevant to geometry data manipulation and management are performed by the GIS software. This approach, which combines local files and RDBMSs, is still quite common in current GIS software and is extensively used for small-scale data storage and management. However, this model has some obvious drawbacks. On the one hand, the differences between the two subsystems, each with its own query rules, make query optimization difficult. On the other hand, the model is liable to break data integrity because of the fragile linkage between the two systems; for example, a spatial entity may exist in the spatial database while its attribute data have been deleted from the RDBMS. The ARC/INFO coverage data model (Morehouse, 1985, 1989) is an early hybrid model for spatial vector data, proposed by the Environmental Systems Research Institute (ESRI) and employed in its early GIS product ARC/INFO. The "ARC" stands for the locations of spatial features, and the "INFO" stands for the relational tables containing the corresponding attribute data. The coverage model thus comprises structure and storage definitions of both the locational and the thematic attributes of geographic features. A feature in a coverage is uniquely identified by an automatically generated identifier, and the spatial and attribute data of the feature are linked by this number. The coverage model puts particular emphasis on the topologies of geometries. Each feature in a coverage is composed of one or more strictly defined geometry elements called feature classes, such as nodes, arcs, polygons, and annotations, and each type of feature class follows specific rules to ensure the correctness of the topologies. Hence, coverage data work very well for edit operations that need to handle topological relationships. In terms of physical storage, coverage data are stored on a computer as a directory containing collections of feature classes in binary format, with another directory containing the INFO table files that store the attribute data of the feature classes. The ESRI shapefile (ESRI, 1998) is another typical example of the hybrid model and is still commonly used in the GIS domain for spatial vector data. This format was defined and developed by ESRI and was originally introduced in its ArcView GIS software. A shapefile is a collection of files with different extensions, of which three are mandatory: a main file (.shp), an index file (.shx), and a dBASE table (.dbf). The geometrical shapes, which describe point, line, and polygon vector features, are stored in the main file and are written and read according to specified binary sequences. This binary main file consists of a single fixed-length header in


front and one or more variable-length records of geographic entities behind it. Each record includes a header and contents that describe a geometric shape as a list of its vertices. One shapefile can store only one type of geometry, so more than one shapefile is usually required to save all the feature layers of a geographical map. The index file, which corresponds to the sequence of the main file, stores the offset of each record from the beginning of the main file. The attributes of the geographic entity records are stored in a dBASE table. dBASE, a widely used RDBS on early microcomputers, is famous for its simplicity and ease of use; today, the major legacy of dBASE is its .dbf file format for data storage, which is still used in many other applications. Beyond these, additional files in a shapefile store other information, such as the coordinate system and projection, spatial and attribute indices, and geospatial metadata.
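The fixed 100-byte header of the main file can be decoded with Python's struct module. The sketch below follows the byte offsets of the published ESRI shapefile specification (big-endian file code and length; little-endian version, shape type, and bounding box), with error handling omitted.

```python
# Read the 100-byte fixed header of a shapefile main (.shp) file.
import struct

def read_shp_header(path):
    with open(path, "rb") as f:
        h = f.read(100)
    return {
        "file_code": struct.unpack(">i", h[0:4])[0],         # always 9994
        "file_bytes": struct.unpack(">i", h[24:28])[0] * 2,  # length in 16-bit words
        "version": struct.unpack("<i", h[28:32])[0],         # always 1000
        "shape_type": struct.unpack("<i", h[32:36])[0],      # 1=point, 3=polyline, 5=polygon
        "bbox": struct.unpack("<4d", h[36:68]),              # xmin, ymin, xmax, ymax
    }
```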

1.06.2.4.2 Relational databases

Complete RDBMS solutions for spatial data refer to management patterns in which geometry and attribute data are both stored in, and accessed via, the same relational database. In general, relational spatial databases can be implemented in two ways (Fig. 8):

(1) Relational models: Various spatial features are decomposed into several types of basic spatial entities, and corresponding tables are designed to hold the spatial data according to the relational model and the characteristics of the spatial entities. The disadvantage is that the querying process is time consuming and inefficient because it involves multiple table join operations.
(2) Binary large objects (BLOBs): Geometry data can be saved in a BLOB field of a relational table because of the wide support for BLOBs in current relational databases. The disadvantage is the low reading efficiency for binary data, especially for nested objects.

Roughly speaking, the ArcGIS personal geodatabase and ArcSDE (SDE, short for spatial database engine) are typical RDBMS solutions for geospatial data. In this section, ArcGIS is used as an example to examine some implementation details of the relational approach for geospatial data. First, it is necessary to clarify the geodatabase model. In essence, the geodatabase is an OO conceptual data model designed and implemented by ESRI for its proprietary ArcGIS software series (Zeiler, 1999). The geodatabase defines a uniform geospatial data description model and supports several types of datasets, such as feature classes of vector data, raster datasets, topologies of spatial entities, and attribute tables. A geodatabase instance stores the data schemas, the rules for each type of geographic dataset, and the actual spatial and attribute data. Currently, three types of geodatabases with different foundations exist: file geodatabases, personal geodatabases, and ArcSDE geodatabases. A file geodatabase holds geospatial data in the local file system, represented as a directory that contains several data files. A personal geodatabase employs a Microsoft Access database to store and manage geospatial data, serving as a completely relational approach. An ArcSDE geodatabase stores geospatial data in a third-party RDBS by leveraging a middleware component called ArcSDE; this approach can be considered either purely relational or an object-relational hybrid, depending on whether the employed RDBS supports spatial object types. An instance of a personal geodatabase is a Microsoft Access database consisting of several system tables and user data. The system tables remain in the database and store the personal geodatabase schemas, which include the definitions, integrity rules, and behaviors of the geographic datasets. In addition, the spatial data, including vector features and raster datasets, are also stored in tables. Specifically, an Access table can hold a vector feature collection, in which each row is a feature record and the columns contain the geometrical shape and attributes of the entity. The geometrical shape is stored as a BLOB in the relational table. A raster dataset, because of its large size, is partitioned into several chunks, also called blocks, which are then stored in individual rows of a relational table together with its metadata. Fig. 9 shows a vector of continents and a raster world image in ArcMap; these datasets are stored in the same personal geodatabase, and Fig. 10 shows how the data are organized in the Access database.
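The BLOB approach can be sketched with sqlite3 plus shapely's well-known binary (WKB) serializers; the table layout below is invented and far simpler than a real geodatabase schema.

```python
# Geometry as a BLOB column: serialize to WKB on write, parse on read.
import sqlite3
from shapely import wkb
from shapely.geometry import Point

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE poi (fid INTEGER PRIMARY KEY, name TEXT, geom BLOB)")
conn.execute("INSERT INTO poi VALUES (?, ?, ?)",
             (1, "station", wkb.dumps(Point(114.3, 30.6))))

blob = conn.execute("SELECT geom FROM poi WHERE fid = 1").fetchone()[0]
pt = wkb.loads(blob)       # deserialization is the costly step noted above
print(pt.x, pt.y)          # 114.3 30.6
```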
A personal geodatabase is a small and flexible approach for geospatial data, but some limitations exist. Because personal geodatabases employ MS Access databases to store all information, they cannot run on non-Windows platforms. Additionally, the maximum data size cannot exceed 2 GB in Access, and Access does not support multiuser editing or version management. Overall, the personal geodatabase is particularly suitable for personal, small- to medium-size data storage and management. The ArcSDE geodatabase (Robert, 2001), another member of the ArcGIS geodatabase family, is designed for enterprise data management and GIS applications. ArcSDE, acting as a gateway, connects GIS software and applications with common enterprise RDBMSs and thereby fits spatial data storage and management into a mainstream IT infrastructure. This method shields the diversity and complexity of the underlying RDBMSs, enables the same spatial functionality on different RDBMSs, and offers a uniform

Fig. 8 Spatial entities in a relational database.


Fig. 9 Vector and raster spatial data in ArcMap.

Fig. 10 Vector and raster spatial data in Access database.

interface to upper GIS applications. Consequently, GIS applications built on the ArcSDE API can be ported to any RDBMS supported by ArcSDE with little or no modification. Currently, the databases supported by ArcSDE include Oracle, Microsoft SQL Server, IBM DB2, IBM Informix, and PostgreSQL. Building on these powerful commercial databases, ArcSDE provides numerous features to meet the needs of modern GIS applications, including massive data storage, multiuser editing, long transactions, and version management. How does ArcSDE map all types of spatial data into relational tables? ArcSDE organizes vector features in feature classes, collections of features of the same type. A feature is abstracted as a geometrical shape with attributes that represent a real-world spatial object. A feature class is stored in one or more tables, and each table row stands for one feature. The geometrical shape and its


attributes constitute the columns of the table. Logically, ArcSDE stores geometrical shapes as coordinates and curves. A point is recorded as a single (x, y) coordinate in two-dimensional cases, (x, y, z) in three-dimensional cases, or (x, y, z, m) with an additional measurement value. A line is recorded as a series of ordered coordinates and curves, and a surface as a series of lines or curves whose start and end points meet. Every geometrical shape has relevant verification rules to ensure its integrity. Under the hood, the physical storage employs only the standard data types provided by the host RDBMS. More specifically, if a spatial data type is built into the RDBMS, then geometry and nonspatial attributes are wrapped into the corresponding data type for storage in the database. Otherwise, standard binary or BLOB types are used to store geometrical shapes. ArcSDE adopts different storage strategies for different RDBSs to maintain a transparent persistence layer. When a feature class is stored with ArcSDE, several tables are conventionally created in the host RDBS, usually consisting of business tables, feature tables, spatial index tables, and delta tables. Business tables record the logical structure of the feature class. Feature tables store the actual spatial data. Spatial index tables contain the spatial indices of the corresponding feature class. Delta tables include addition and deletion tables to enable multiversion management and keep track of modifications to the data. The tables can be joined with keys and foreign keys to represent a complete feature class. ArcSDE not only provides full support for vector data but also enables the storage and management of raster data (ESRI, 2005). When raster data are imported into a DBMS with ArcSDE, the data are converted into SDE's internal raster format. Next, the original data are partitioned into several blocks (or tiles) according to a user-specified size, and spatial indices are built for later spatial queries. Then, the data are resampled to create pyramids at different levels for querying and display. Finally, the partitioned blocks are delivered to the host DBMS and stored in tables as BLOBs (Fig. 11). As with vector data, numerous tables are involved in this process. A business table with a raster column is created to identify the raster dataset, and the raster column information is maintained there. For each raster column record in the business table, ArcSDE creates the following four tables to store the actual raster data: a raster metadata table, a raster band metadata table, a raster band auxiliary table, and a data block table. The two metadata tables store the meta-information on the raster and its bands, such as raster dimensions and pixel depth. The auxiliary table stores additional information on each raster band, such as a color map or statistics. The block table is where the actual data are stored.

1.06.2.4.3 Object-relational databases

ORDBSs for GIS offer dedicated modules for spatial data definition and manipulation, which facilitate the direct storage and management of unstructured data. Most commonly, database vendors extend traditional relational databases with additional spatial object types and related functions to improve their capabilities for geometry data. This approach solves the management issues for variable-length records of geometry data, and experiments have shown that it is more efficient and effective than the aforementioned BLOB approach. ORDBMSs are the mainstream database solutions for modern spatial data management. However, spatial object types cannot be customized in this approach, so some practical restrictions remain. As mentioned earlier, some enterprise databases can already handle geospatial information with embedded spatial object types. On the one hand, traditional GIS applications can benefit from the advantages of current mainstream database technologies in the IT industry. On the other hand, common business applications can introduce location intelligence to provide better services. Object-relational database technology acts as a glue that combines traditional GIS with mainstream IT, greatly reducing their isolation. Oracle Spatial and PostGIS are both well-established ORDBMS extensions for geospatial data. Oracle Spatial is introduced in this section to illustrate the object-relational approach for geospatial data.


Fig. 11 Process of putting rasters into RDBS.


Oracle Spatial is an object-relational component for Oracle Database that provides a SQL schema and functions to facilitate the storage, retrieval, and management of geospatial data (Kothuri et al., 2008). The basic components are utility tools, data models, spatial query engines, and some advanced functional components. Utility tools are a set of assistive tools for data format transformation, data encoding and decoding, data import and export, etc. Data models define the logical data structure, which consists of elements, geometries, and layers. A layer is a collection of geometries with the same attribute set. A geometry that represents a spatial feature consists of one or more elements. An element is the basic unit of the geometry, including points, line strings, and polygons. Internally, a geometry is stored in a native spatial object type called SDO_GEOMETRY, which is the core of Oracle Spatial. Each table that stores spatial vector data must contain at least one SDO_GEOMETRY column. Oracle Spatial defines the object type SDO_GEOMETRY as follows:

CREATE TYPE SDO_GEOMETRY AS OBJECT (
  SDO_GTYPE     NUMBER,
  SDO_SRID      NUMBER,
  SDO_POINT     SDO_POINT_TYPE,
  SDO_ELEM_INFO MDSYS.SDO_ELEM_INFO_ARRAY,
  SDO_ORDINATES MDSYS.SDO_ORDINATE_ARRAY);
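As a hedged illustration (the table name and SRID are assumptions), the line string LINESTRING (0 0, 10 10, 20 20) could be stored in this type as follows; the attributes used here are explained in the next paragraph.

CREATE TABLE roads (
  id    NUMBER PRIMARY KEY,
  shape SDO_GEOMETRY
);

INSERT INTO roads VALUES (
  1,
  SDO_GEOMETRY(
    2002,                          -- SDO_GTYPE: 2D (2...) line string (...02)
    4326,                          -- SDO_SRID: WGS 84 coordinate reference system
    NULL,                          -- SDO_POINT: unused, since this is not a point
    SDO_ELEM_INFO_ARRAY(1, 2, 1),  -- one element starting at ordinate 1,
                                   -- etype 2 (line string), straight segments
    SDO_ORDINATE_ARRAY(0,0, 10,10, 20,20)
  )
);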

SDO_GTYPE indicates the shape of the current geometry. SDO_SRID identifies the coordinate reference system associated with the geometry. The remaining three attributes hold the coordinates of the elements of the current geometry. If the geometry is a point, then the SDO_POINT attribute is used. Otherwise, the SDO_ORDINATES attribute stores the coordinates of all the elements of the geometry, and SDO_ELEM_INFO specifies where each new element starts in the SDO_ORDINATES coordinate series, how the coordinates are connected (straight lines or arcs) in a specific element, and what type the specific element is (point, line, or polygon). The definition of SDO_GEOMETRY shows that an SDO_GEOMETRY object mainly comprises a series of coordinate values. When the other attributes are specified, these coordinates can be restored to the basic elements, namely, the geometrical shapes. Additionally, SDO_GEOMETRY conforms to the Open Geospatial Consortium's (OGC) Simple Features Specification and the International Organization for Standardization's (ISO) SQL/MM specification for spatial data. The Geometry Engine is a core component of Oracle Spatial that provides operators, functions, and procedures to query, analyze, and manipulate geometries. The Index Engine, similar to other index technologies, is used to accelerate queries. Oracle Spatial adopts R-trees as its spatial index method, and index information is stored in the spatial index table. In addition to vector data, Oracle Spatial supports other kinds of spatial data storage, such as topologies, networks, and raster data, and provides full support for coordinate reference systems. Oracle GeoRaster, an extension for raster data, can store, index, query, analyze, and deliver remote sensing images and their metadata. GeoRaster also adopts an object-relational approach to store and manage raster data by using the SDO_GEORASTER and SDO_RASTER types (Kothuri et al., 2008). SDO_GEORASTER is mainly used to store the raster metadata and spatial extent, and SDO_RASTER is used to store the actual data. Conceptually, an SDO_GEORASTER object can be considered an N-dimensional array of cells. Users can query and process raster data with these built-in objects. Fig. 12 illustrates how raster data are actually stored in GeoRaster. When importing a remote sensing image into a database, an SDO_GEORASTER object is initialized and stored in a GeoRaster table, which contains at least one column of SDO_GEORASTER data. The raster data are then subdivided into multiple blocks and stored in a raster data table (RDT). RDTs are tables of SDO_RASTER data that store the block information and the subdivided BLOBs of the image.

Fig. 12 Storage diagram of GeoRaster.

1.06.2.4.4 Object-oriented databases

The concept of the OODBS dates back to the 1980s. Some commercial OODBSs have been available since the 1990s, but most of them have not received much attention in the market. On the one hand, OODBSs are not as mature as RDBSs, and the performance of emerging OODBSs often lags behind that of full-fledged RDBSs, which also offer advanced features that OODBSs lack, such as failure recovery, statistics reports, and online analytical processing (OLAP). On the other hand, the standard drafted by the Object Data Management Group (ODMG) for object data definition and querying has not gained acceptance as wide as that of SQL for RDBMSs. OODBS products have risen and fallen over time, but research and development on OODBSs continues. OODBSs can work cooperatively with the current OO programming paradigm without tedious object-relational mapping (ORM) and can handle large entities with complex data types. Some OODBSs even provide full support for ACID transactions and good solutions for schema evolution. Proprietary products, such as Objectivity/DB, the Versant Object Database (VOD), and VelocityDB, provide full commercial technical support, while open-source products such as the Zope Object Database (ZODB) and db4o allow free use and exploratory studies. Db4o had one of the most vibrant open-source communities and achieved some popularity and application in specific areas until it was purchased by the Actian Corporation, which later ceased development. Nevertheless, db4o, as an excellent object database, is still chosen here to introduce the details of OODBMSs.

Db4o, which is characterized by its high performance, compactness, simplicity, and zero administration (Paterson et al., 2006), has gained recognition in both academia and industry. Compared with the ORM approach, db4o, a pure OODBS, is much faster according to official benchmark testing. Furthermore, indices on specific attributes typically speed up queries at the cost of slight delays during insertion. Its native support for the storage of Java and .NET objects accelerates the application development cycle and enables cross-platform capability. The lightweight and compact implementation allows the system to be employed as a stand-alone database in client/server mode or embedded in clients or other application components, similar to SQLite. Operations on object data, including insert, delete, query, and update, require only a few lines of code. From the perspective of programmers, storing data in db4o is similar to writing objects into files with serialization. The difference is that the user cannot perform queries on a serialization file, whereas db4o supports this. db4o provides three types of queries: query by example (QBE), simple object data access (SODA), and native queries (NQ). Users can perform queries in an OO manner. Additionally, db4o provides some level of schema evolution: changes in the interfaces that a class implements and changes in its attributes can be handled silently by db4o, and the evolution of class inheritance is solved by utilizing a type-less transfer database. Another convenience of db4o is that it requires zero administration and less configuration compared with traditional RDBMSs. Theoretically, object databases are the most suitable databases for spatial data, especially vector data, even general object databases without any additional spatial functionality.
Spatial geometries can be directly stored in the database, like any other ordinary objects. Spatial queries can be performed with customized functions, such as native queries in db4o. We attempted to save two-dimensional geographical data into db4o according to the OGC Simple Features Specification, and it turned out that db4o is easy to use for storing and querying spatial objects. List 1 provides a snippet of Java code on how to store spatial entities into db4o and perform

List 1 Java code for storing and querying spatial geometry in db4o

// open a db4o database; if it does not exist, create one
ObjectContainer db = Db4oEmbedded.openFile(Db4oEmbedded.newConfiguration(), "road.yap");
// create a geometry by specifying the coordinates directly
Coordinate[] coordinates = new Coordinate[]{new Coordinate(0, 0),
    new Coordinate(10, 10), new Coordinate(20, 20)};
LineString roadLine = new GeometryFactory().createLineString(coordinates);
try {
  db.store(roadLine); // store the object into the database
  // perform a Native Query: select all the lines in the database
  List<LineString> lines = db.query(new Predicate<LineString>() {
    public boolean match(LineString line) {
      return line.getCoordinate() != null;
    }
  });
  // print the query result in OGC WKT format; output: LINESTRING (0 0, 10 10, 20 20)
  if (lines != null) {
    System.out.println(lines.get(0).toText());
  }
} finally {
  db.close(); // close the database connection
}


spatial queries. Raster data can also be wrapped into objects and entered into a database. However, performance and formal theory remain the Achilles' heel of object databases, and relevant explorations and practices in this field are still forthcoming.

1.06.3 NoSQL Databases

With the rise of Web 2.0 and online social activities since the early 2000s, traditional RDBMSs have proven unsuitable for handling large volumes of data in large-scale, high-concurrency Web environments, such as the data generated by social network sites (SNSs) or indexed by search engines. Additionally, the mobile Internet and the Internet of Things (IoT) have gradually emerged and flourished, constantly producing new data; both pose new challenges to RDBMSs. Under these circumstances, NoSQL databases came into being, and a variety of excellent nonrelational database products now exist.

1.06.3.1 Big Data and NoSQL

1.06.3.1.1 What is NoSQL?

Although RDBMSs predominated in the database market for a long time and achieved great success in the business domain, Google, the largest online search service provider in the world, found it hard to store and process massive-volume, fast-velocity data with RDBMSs. From 2003 to 2006, Google published three key papers that successively revealed the details of its solutions for big data: an extensible distributed file system (DFS) called the Google file system (GFS), a distributed parallel processing framework called MapReduce, and a high-performance, scalable, distributed nonrelational database system for structured data called BigTable. These three technologies had a great influence on the subsequent development of big data technologies; the popular Hadoop platform was developed based on these ideas. BigTable, a first-generation NoSQL database system, opened the door for the further development of NoSQL databases. E-commerce sites such as Amazon and SNSs such as Facebook all faced challenges when scaling out their database infrastructure. In 2008, Amazon unveiled its new storage platform, Dynamo, within its public cloud computing platform, the Elastic Compute Cloud (EC2). Afterwards, various nonrelational databases emerged and flourished because of the strong drive for big data management. These products vary in their data models and have been specialized for their own areas. The rise of NoSQL databases was the third revolution in database technologies. Fig. 13 provides an overview of NoSQL database families: some popular NoSQL database products are roughly grouped according to their data storage models, and some databases are designed with multiple models. The details of the NoSQL data models are covered in the following parts. The arrows in the figure stand for the derivation of NoSQL products. For example, Apache Hadoop is the open-source implementation of GFS and MapReduce, and Apache Cassandra was designed and implemented with reference to Google BigTable's data storage and Amazon Dynamo's distributed architecture. Generally speaking, these increasingly used nonrelational databases for big data and real-time Web applications are called NoSQL databases. At first, NoSQL was read as "non-SQL" or "nonrelational" to indicate databases that are not relational and do not support SQL queries. Now, NoSQL is mostly read as "Not Only SQL" in the community, which implies that some NoSQL databases may provide SQL-like query languages (Harrison, 2015). NoSQL databases encompass a variety of database technologies that have been developed to serve modern Web applications.

1.06.3.1.2 Why NoSQL?

NoSQL databases are usually schema-less, have little support for transactions, and offer no guarantee of ACID properties compared with RDBSs. In exchange, NoSQL databases can provide more flexibility and scalability. Some benefits offered by NoSQL databases are explained below based on their characteristics.

Fig. 13 NoSQL database families.

Flexible data models: RDBMSs require an exact schema definition before the user enters data into a database. When the business requirements of the application change, the database schema must be changed accordingly by adding or modifying fields in the RDBMS. This process can be lengthy and may require long downtime. NoSQL databases are exempt from this restriction because nearly all of them are schema-less. Without a predefined schema, NoSQL can accommodate additional data types, including structured, semistructured, and even unstructured data.

Easily extensible: Relational databases are usually hosted on a single server because of the performance of table joins and transactions, and sophisticated additional measures must be taken to make databases distributed on different servers work as if they were on the same server. NoSQL databases were created for distributed data storage with so-called auto-sharding: they can easily expand their capacity by simply adding commodity servers, without manual handling of data balancing, replication, node failure, etc. All sharding-related work is performed automatically by the database. This greatly frees developers to focus on the business logic and content of their applications rather than on infrastructure administration tasks.

High availability: As mentioned before, RDBMSs have little support for distributed storage and replication. Most NoSQL databases, however, perform automated replication for failover. Distributed servers share different degrees of redundancy, and some sophisticated NoSQL databases even support self-healing and recovery, so regional failures do not cause trouble. In short, the native support for replication and failover ensures the high availability of NoSQL databases.

High performance: The big data era requires dynamic real-time data reading and writing with high concurrency. Tests have indicated that NoSQL databases offer higher read and write performance. On the one hand, NoSQL databases have more flexible data models without many consistency checks. On the other hand, most NoSQL databases provide an excellent integrated caching mechanism, which retains as much frequently used data as possible in system memory. All these characteristics increase the performance of NoSQL databases in response to high-volume data.

NoSQL databases also show other advantages in several cases. For instance, NoSQL databases have satisfactory support for OO programming. Many different types of NoSQL databases exist, and users can choose the best fit according to their requirements. Additionally, most NoSQL databases are free and open-source, and users can customize and extend their functions at low cost if necessary. NoSQL databases exhibit a bevy of outstanding qualities compared with RDBMSs, but some weaknesses remain: (1) NoSQL databases lack uniform query standards, so users must learn each product's own query language; (2) the features of NoSQL products are not as rich as those of RDBMSs, for example, most products have little support for transactions and statistical data reports; and (3) NoSQL databases are still in the early phases of development, and some may not be sufficiently mature.
All in all, NoSQL databases are a new database technology that has emerged in modern Web applications to deal with the challenges of big data.

1.06.3.2 CAP Theorem

In 2000, computer scientist Eric Brewer presented the famous CAP theorem at the Symposium on Principles of Distributed Computing. The CAP theorem states that three properties, namely consistency, availability, and partition tolerance, cannot all be ensured simultaneously in a distributed computer system (Han et al., 2011). Consistency means that all replicas of the data on different server nodes hold the same values at the same time. Availability means that the cluster can still respond to requests even if node failures occur. Partition tolerance means that the cluster can still work even if a network partition interrupts communication between nodes. The CAP theorem provides suitable guidance for the design of distributed computer systems. Most NoSQL databases are distributed systems, so their design and development must follow the CAP theorem. According to the theorem, an existing distributed database can fully guarantee two of the properties and at most partially support the third. For instance, systems with the consistency and availability (CA) properties are usually RDBSs; systems with the consistency and partition tolerance (CP) properties include BigTable, HBase, MongoDB, Redis, and Membase; and systems with the availability and partition tolerance (AP) properties include Voldemort, Tokyo Cabinet, CouchDB, and Riak. High availability and consistency are the goals when designing distributed systems, but partition tolerance is unavoidable. In fact, systems with CA properties only retain the consistency and availability of each subsystem, not of the whole after network partitioning. Systems with CP properties, a relatively small fraction, do not have strong support for availability. Systems with AP properties form the great majority and achieve partial consistency at different levels according to different application scenarios. In practice, several levels of consistency are frequently mentioned, such as strict consistency, weak consistency, and eventual consistency. Strict consistency guarantees that all read operations access the most recently updated data value. Eventual consistency means that, if no new updates to a data item occur, all accesses will eventually return the last updated value after a certain period; existing NoSQL databases often implement eventual consistency, with the disadvantage that the system may return any value before eventual consistency is reached. Weak consistency does not guarantee that read operations return the most recently updated data value. Fig. 14 shows the relationships of the CAP properties and the different levels of consistency.

1.06.3.3 Typical NoSQL Databases

A number of data models have been generated for NoSQL databases from the earlier stages of development to the present. NoSQL databases can be roughly classified according to their data storage models as follows: wide-column stores, key-value stores, document


Fig. 14 CAP theorem.

Table 4 Comparisons of NoSQL databases

Storage model      | Representative products                              | Characteristics                                                                                           | Application scenarios
Wide-column stores | Google BigTable, Apache HBase, Cassandra, Hypertable | Rapid data aggregation, scalable, versioning, locking, Web accessible, schema-less, distributed           | Data mining, horizontal scaling, version control, etc.
Key-value stores   | Amazon Dynamo, Riak, Redis, Oracle Berkeley DB       | Simple, replication, versioning, locking, transactions, sorting, Web accessible, schema-less, distributed | In-memory caches, website analytics, event logs, e-commerce, etc.
Document databases | MongoDB, CouchDB, Couchbase                          | Stores and retrieves unstructured documents, map/reduce, Web accessible, schema-less, distributed         | Content management systems, blogs, event logs, etc.
Graph stores       | Neo4j, Titan, OrientDB                               | Associative datasets, OO-friendly, rigid schemas                                                          | Relationship representation, network data, etc.

databases, graph stores, etc. NoSQL databases based on new and different data models, as well as multimodel databases, are still being produced. Before the details of each category are described, Table 4 provides an overview of these models.

1.06.3.3.1 Wide-column stores

Wide-column stores take tables as the primary data model but do not support relational operations on tables. Google's BigTable is a typical example of a wide-column store, and a series of products have been derived from it, such as Apache Cassandra, HBase, Accumulo, and Hypertable. Although the implementations of wide-column stores differ somewhat, these databases still share many features. As an example, some details of the wide-column store in Google's BigTable are covered in the following.

Google's BigTable is a distributed storage system for structured data that is specifically designed to store and process very large amounts of data across thousands of commodity servers (Chang et al., 2008). Many projects at Google employ BigTable as a data storage platform, such as Google App Engine, Google Search, and Google Earth. Tables are leveraged as the data model to organize data in BigTable and have columns and rows, similar to relational tables. Each table includes one or more column families that consist of one or more columns. The column families are named and specified in the table's definition. Column names are dynamic, and new columns can be created dynamically during data insertion. Therefore, data in BigTable are sparse, and empty columns do not occupy any storage. Data for a specific column family are stored together on disk, which explains the designation "wide-column store." Each row in a table has a sortable key and logically contains the corresponding data in the column families. Functionally, the row is designed as the unit for load balancing and the column as the unit for access control and resource accounting. The intersection of a row and a column is called a cell, and each cell is referenced by a row key, a column name, and a timestamp, represented as a triplet (row : string, column : string, time : int64). The introduction of a timestamp enables BigTable to keep multiple versions of a data element in a cell. Google employs BigTable for web pages in WebTable: the URL of the website is the row key, various aspects of the web pages are the columns, and the contents of the web pages are kept in the "contents:" column (Fig. 15).

BigTable follows a master–slave architecture in a distributed environment. Basically, BigTable comprises three major components: a library that is linked into every client, one master server, and many tablet servers. Tablet servers can be dynamically added to or removed from a cluster according to changes in the workload. The master server is responsible for partitioning tables into tablets, distributing the tablets to the tablet servers, monitoring the status of the tablet servers, maintaining the load balance across tablet servers, collecting garbage in GFS, handling schema changes, etc. The distribution information of the tablets is stored in metadata files, and each tablet server can locate the tablets that correspond to each table. GFS is leveraged for the storage of all the log files

Fig. 15 Example of WebTable with Google's BigTable.

and data files, and the data files are internally saved in the Google Sorted Strings Table (SSTable) format. A highly available and persistent distributed lock service called Chubby is implemented to ensure data consistency. The main characteristics of BigTable can be summarized as follows: (1) the database is particularly well suited for storing and processing large volumes of data at the petabyte scale or higher; (2) BigTable is very efficient at parallel data processing in a distributed environment; (3) the architecture can be easily and dynamically scaled inward and outward; (4) inexpensive commodity servers can adequately support BigTable's functioning; and (5) the system is better suited to reading data than to writing data. Wide-column databases can be employed for spatial data, especially for large volumes of data when high availability is needed (Amirian et al., 2013). For instance, the back-end servers of taxi-hailing apps need to track the real-time locations of many cars simultaneously. The incoming data from the cars can be stored in wide-column databases, and the apps can obtain higher response speeds. There have been many attempts to store spatial data in wide-column databases, and some have achieved good results.
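As a hedged sketch of this pattern, the following Java snippet uses the Apache HBase client API (HBase being the open-source BigTable derivative named above) to record car positions in a column family; the table layout and row-key scheme are assumptions for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class CarTracker {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("car_locations"))) {

            // Row key: car id plus reversed timestamp, so recent fixes sort first.
            String rowKey = "car-0042#" + (Long.MAX_VALUE - System.currentTimeMillis());
            Put put = new Put(Bytes.toBytes(rowKey));
            // Cells are addressed by (row, "loc:lon"/"loc:lat", timestamp),
            // mirroring BigTable's (row, column, time) triplet.
            put.addColumn(Bytes.toBytes("loc"), Bytes.toBytes("lon"), Bytes.toBytes("108.95"));
            put.addColumn(Bytes.toBytes("loc"), Bytes.toBytes("lat"), Bytes.toBytes("34.26"));
            table.put(put);

            // Read the freshly written position back.
            Result r = table.get(new Get(Bytes.toBytes(rowKey)));
            System.out.println(Bytes.toString(
                    r.getValue(Bytes.toBytes("loc"), Bytes.toBytes("lon"))));
        }
    }
}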

1.06.3.3.2 Key-value stores

Key-value stores hold data in a map (or dictionary) structure, represented as key-value pairs in a data collection. Clients can insert and request the values that correspond to keys. A key is unique in a database, and the corresponding value can theoretically be of any data type. In practice, however, most key-value stores impose some restrictions on the data types of values to support indexing and conflict resolution. Key-value stores interact particularly well with programming languages, because most modern languages support JavaScript Object Notation (JSON) objects, which are typical collections of key-value pairs. A JSON file or object can easily be modified programmatically and then injected into the database; conversely, key-value pairs in the database can easily be queried and extracted into JSON objects. Key-value stores flourished after Amazon's Dynamo paper was published. Dynamo-inspired databases such as Riak and Voldemort are replicas of Dynamo's design, while others, such as Cassandra, blend features of Dynamo with other NoSQL models. Some key-value databases have little in common with the Dynamo model, such as Redis and the Oracle NoSQL Database. Details of the key-value store in Dynamo are explored in the following.

Dynamo employs a peer-to-peer (P2P) architecture to organize and distribute data nodes (DeCandia et al., 2007), which differs from the master–slave pattern of Google's BigTable. This decentralized architecture can work with minimal administration. The purpose of Dynamo is to achieve a distributed storage system with high availability and high scalability. To this end, Dynamo combines several well-known techniques, including consistent hashing, tunable consistency, and data versioning. Consistent hashing solves the problem of distributing and managing storage and routing nodes in a dynamically changing network. Tunable consistency enables the user to configure the level of system consistency. Data versioning supports multiple versions of data in the database with a vector clock algorithm, and version conflicts are handled by the upper application. Additional techniques are used to handle node failures. Dynamo exposes two simple interfaces, get() and put(), to read and write data. When a client sends a request to a node (the coordinator) to handle data reading or writing, the coordinator selects several available nodes from the preference list, which contains the nodes responsible for storing a particular key. The coordinator then forwards the request to these nodes, waits for responses, reconciles the returned results, and sends the final result to the client (Fig. 16).

The main characteristics of Dynamo include the following: (1) it employs a key-value store model, and the size of a value cannot exceed 1 MB; (2) Dynamo is a decentralized distributed system, in which all nodes are equally important and loads are well balanced; (3) it offers simple interfaces, namely, get and put operations over the HTTP protocol; (4) multiple nodes and distributed data replicas ensure high availability; (5) Dynamo is easy to scale out by adding commodity servers; (6) it enables quick responses and large-scale concurrency; and (7) Dynamo supports high fault tolerance, and clients can also handle conflicts when data inconsistency occurs.
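To illustrate the consistent hashing technique mentioned above, here is a minimal, hedged Java sketch of a hash ring: nodes are placed on a ring of hash values, and each key lives on the first node clockwise from its hash, so adding or removing a node moves only that node's share of the keys. Real systems such as Dynamo add virtual nodes and replication on top of this idea.

import java.util.SortedMap;
import java.util.TreeMap;

// Minimal consistent-hash ring: a key maps to the first node whose
// position on the ring is >= the key's hash, wrapping around at the end.
public class HashRing {
    private final SortedMap<Integer, String> ring = new TreeMap<>();

    public void addNode(String node) {
        ring.put(hash(node), node); // production rings add many virtual nodes per server
    }

    public void removeNode(String node) {
        ring.remove(hash(node)); // only keys owned by this node migrate elsewhere
    }

    public String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no nodes");
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    private static int hash(String s) {
        // A toy hash for the sketch; real rings use MD5 or similar for even spread.
        return Integer.remainderUnsigned(s.hashCode() * 0x9E3779B9, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing();
        ring.addNode("node-a");
        ring.addNode("node-b");
        ring.addNode("node-c");
        System.out.println("car-0042 -> " + ring.nodeFor("car-0042"));
    }
}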
Key-value databases can be employed to store spatial data, yet they are not always the most appropriate choice. Because key-value stores cannot easily and directly express complex spatial data, such as polylines and polygons, or their spatial relationships, additional

Fig. 16 Put/Get process of Dynamo.

spatial indices must be created to perform spatial queries. Since key-value stores are perfect for inserting and querying massive volumes of data with unique keys, applications that involve only simple spatial data types but large volumes, such as location information represented as points, should consider key-value databases.
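One common way to give point data a key-value-friendly key is geohashing, which interleaves the bits of longitude and latitude into a base-32 string so that nearby points usually share a key prefix. The following Java sketch is a minimal encoder of the standard geohash algorithm (no library assumed); the coordinates in main are arbitrary.

public class Geohash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Encode (lat, lon) into a geohash of the given length by repeatedly
    // bisecting the longitude and latitude ranges and interleaving the bits.
    public static String encode(double lat, double lon, int length) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder hash = new StringBuilder();
        boolean evenBit = true; // a geohash starts with a longitude bit
        int bit = 0, ch = 0;
        while (hash.length() < length) {
            if (evenBit) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; minLon = mid; }
                else            { ch <<= 1;           maxLon = mid; }
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; minLat = mid; }
                else            { ch <<= 1;           maxLat = mid; }
            }
            evenBit = !evenBit;
            if (++bit == 5) { // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(ch));
                bit = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        // Nearby points share the leading characters of their geohash,
        // so a key-range scan approximates a spatial proximity query.
        System.out.println(Geohash.encode(34.26, 108.95, 9));
    }
}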

1.06.3.3.3 Document databases

Document databases store data as structured documents, in most cases in Extensible Markup Language (XML) or JSON formats. To some extent, document databases are a further development of key-value stores. On the one hand, document databases employ similar key-value storage and permit nested key-value pairs, enabling higher query performance than plain key-value stores. On the other hand, document databases can be regarded as an improvement on schema-less key-value stores because they adopt a self-describing document format. Thus, these databases can perform more validation than key-value stores while enduring fewer restrictions from rigid schemas than RDBMSs. Typical document databases include Apache CouchDB, MongoDB, and Couchbase, whose implementations differ in how data are organized at different levels. Currently, MongoDB is the most popular document database, and the following illustrates some of its technical details.

MongoDB is a document-based, collection-oriented, schema-free, open-source distributed database system implemented in the C++ programming language (Chodorow, 2013). Data in MongoDB are grouped into collections, and each collection has a name and contains one or more documents. Collections and documents are to document databases what tables and records are to RDBSs. In MongoDB, documents are represented in a JSON-like format called Binary JSON (BSON) and are internally stored as a series of key-value pairs. Functionally, MongoDB is the most feature-rich of the NoSQL databases and the most similar to RDBMSs. MongoDB offers a powerful query language whose grammar resembles an OO query language and can achieve most of the functionality of queries over single relational tables, including support for the creation of indices over the data. In addition, MongoDB is ACID compliant at the document level. A typical MongoDB cluster comprises shard servers, configuration servers, and route servers (Fig. 17). Shard servers take charge of actual data storage; configuration servers are responsible for recording the metadata of a sharded cluster; route servers are the unified

Fig. 17 Architecture of MongoDB.


access points for routing queries and returning results. Auto-sharding enables horizontal scaling outward by adding commodity servers or running on cloud infrastructure. Replica sets act as replication facilities and provide automatic failover and data redundancy, ensuring high availability. The main characteristics of MongoDB can be summarized as follows: (1) it provides high-performance data persistence; (2) it supports a rich query language; (3) MongoDB supports automatic replication and failure recovery; (4) it is easy to scale outward through auto-sharding; and (5) MongoDB supports multiple storage engines and accepts third-party storage engines. The usage scenarios of MongoDB include (1) dynamic websites with real-time data insertion, updating, and queries; (2) a persistent cache layer in an information infrastructure; (3) storage for large-volume but low-value data; (4) scenarios requiring high scalability; and (5) storage for objects and JSON data. Document databases are more efficient for spatial data than key-value databases. On the one hand, documents can express more complex data types and relationships. On the other hand, an additional schema can be predefined for spatial data, and spatial queries can then be performed on the spatial data inside the document database. For example, OGC Geography Markup Language (GML) or GeoJSON, general specifications elaborately designed for spatial data in XML and JSON formats, respectively, can be employed to store spatial data in a document database. The related document search methods and spatial indices can then work together to perform various queries over the spatial data within the document database.

1.06.3.3.4 Graph stores

Graph stores, which are based on graph theory, represent data as nodes, relationships, and properties. In graph theory, a graph is an ordered collection of vertices (or nodes) connected by edges, in which the vertices hold the properties of entities and the edges express the relationships between entities. Fig. 18 illustrates a graph model with a directed graph. As in social networks, the individual people themselves are not what matters most; the relationships between people are what make social networks meaningful. Although RDBMSs can represent relationships between entities via foreign keys and table joins, performance issues arise when handling very large graphs. Graph databases were developed to address this problem. Typical products include Neo4j, Titan, OrientDB, and AllegroGraph. Details are introduced through Neo4j.

Neo4j is a lightweight, high-performance, open-source graph database mainly implemented in the Java programming language. It can be easily embedded into Java-based applications or run as a stand-alone data server (Goel, 2015). Neo4j stores directed property graphs: graph entities carry a number of properties, and each property is stored as a key-value pair. Neo4j supports multinode storage, multiversion consistency, and ACID-compliant transactions. Neo4j adopts a property graph model as its data structure, which is also employed by many other graph databases. A property graph includes a set of vertices and edges. Each vertex has a unique identifier, a collection of outgoing and incoming edges, and a collection of properties defined by key-value pairs. Each edge also has a unique identifier, an outgoing and an incoming vertex, a label that denotes the type of relationship between its two vertices, and a collection of properties defined by key-value pairs. In addition, Neo4j provides a declarative query language for graphs called Cypher. With Cypher, clients do not need to write any traversal code for the graph structure and can perform efficient ad hoc queries over the data in the graph database. Thus, Cypher is to graph models what SQL is to relational models. The main characteristics of Neo4j can be summarized as follows: (1) flexible schemas permit the user to add or remove properties on the fly; (2) it fully supports ACID transactions; (3) Neo4j offers elastic scalability via a replication protocol; and (4) it provides a powerful graph query language. Neo4j is suitable for scenarios that focus mostly on relationships, such as social networks, traffic networks, and recommendation engines. Spatial data can be modeled as graphs and then stored in graph databases. Topological relationships can also be easily expressed in graph databases, which makes them well suited for linear networks such as roads or railways and for routing and navigation applications (Amirian et al., 2013). However, data queries tend to be time consuming when the searched node is buried deep within the graph. Usually, the performance of data loading and querying is not as good as in the other three categories.

Fig. 18 Graph model.


1.06.3.4 NoSQL Databases in GIS

Nowadays, huge volumes of multitemporal, multispectral, and multiresolution remote sensing images are continuously generated and transferred downward, thanks to numerous performance improvements in remote sensing and observation hardware. Information from personal navigation and the Internet of Vehicles has also accumulated constantly with the popularity of the mobile Internet and the promotion of navigation satellites. Sensor technology has been widely used in urban construction under "Smart City" and "Digital Earth" strategic projects, producing tremendous amounts of data. Clearly, traditional GIS faces greater challenges and opportunities because of the emergence of big data. Scholars, scientists, and technical practitioners have attempted to leverage state-of-the-art big data technologies to solve issues in geospatial data storage, management, and analysis. One important aspect is exploring the feasibility and capability of storing geospatial big data in NoSQL databases. Some NoSQL technologies and products used to manage geospatial data are introduced below.

MongoDB provides native support for geospatial information through spatial indices and built-in spatial queries. Geospatial vector data can be stored in the coordinate pairs or GeoJSON formats. Coordinate pairs is a legacy format in MongoDB that stores point coordinates on a planar reference system (e.g., [x, y]). GeoJSON, derived from JSON, is a standard interchange format that encodes various geographic data structures. A GeoJSON object can denote a geometry, a feature, or a collection of features. A geometry stands for a geometrical shape, and a feature contains a geometry object and a series of relevant properties. All geometries specified in GeoJSON are supported by MongoDB; List 2 provides an example of a GeoJSON object. Two types of coordinate systems are available for geospatial data in MongoDB: spherical and flat. A spherical system specifies coordinates with longitude and latitude over an Earth-like sphere, and a flat system represents coordinates on a Euclidean plane. MongoDB offers the 2dsphere index for spherical systems and the 2d index for flat systems to perform queries accurately. Some basic geospatial query operators, such as inclusion, intersection, and proximity, are available in MongoDB in addition to key-value-based queries. Clients can easily perform location-based queries with these built-in operators, which are significant in enterprise applications with location intelligence.

Here is an example of how to put a shapefile into MongoDB. One simple way is to convert the data into GeoJSON with existing GIS tools such as the Geospatial Data Abstraction Library (GDAL/OGR) and then insert the whole GeoJSON file as a document with the MongoDB data import tool. Another feasible solution, shown in Fig. 19, is to read each feature class from the shapefile, convert every simple feature into a GeoJSON string, take each GeoJSON string as a document, and insert it into a MongoDB collection. The whole process must be programmed manually, but it is more granular (see the sketch below). Afterward, we can create an index on the geometry field and perform spatial queries.
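A minimal, hedged Java sketch of that manual route follows; readFeaturesAsGeoJson() is a hypothetical helper (e.g., wrapping GDAL/OGR or GeoTools), while the MongoDB Java driver calls are standard API.

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Indexes;

import org.bson.Document;

import java.util.List;

public class ShapefileToMongo {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> countries =
                    client.getDatabase("gis").getCollection("countries");

            // Hypothetical helper: reads the shapefile and returns one GeoJSON
            // Feature string per simple feature (e.g., via GDAL/OGR bindings).
            List<String> features = readFeaturesAsGeoJson("continents.shp");
            for (String geojson : features) {
                countries.insertOne(Document.parse(geojson)); // one feature = one document
            }

            // A 2dsphere index on the geometry field enables spherical spatial queries.
            countries.createIndex(Indexes.geo2dsphere("geometry"));
        }
    }

    private static List<String> readFeaturesAsGeoJson(String path) {
        throw new UnsupportedOperationException("wrap GDAL/OGR or GeoTools here");
    }
}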
List 3 gives an example of how to find the spatial entities within the circle centered on [108, 34] with a radius of 2000 m. Neo4j loads, stores, and queries geospatial data through a library called Neo4j Spatial, which maps geospatial data into a graph model, with objects and relationships becoming nodes and edges, respectively. Neo4j Spatial utilities can import data from existing ESRI shapefiles and OpenStreetMap (OSM) files. The simplest generic approach for geospatial data comprising many geometries, like shapefiles, is to store each geometry as either well-known text (WKT) or well-known binary (WKB) in a single property of a single node. Geospatial data in Neo4j are represented as layers, each of which consists of a collection of geometries and the indices for querying them. Every geometry instance exists as one node with numerous properties (Neo Technology, 2013). Neo4j Spatial is compatible with the OGC Simple Features Specification and supports all the common geometries. A shapefile containing the continents of the world, shown in Fig. 20, is stored in Neo4j, and the data are logically organized in Neo4j as in Fig. 21.

List 2 Example of a GeoJSON object

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Point",
        "coordinates": [116.38916015624999, 39.926588421909436]
      }
    }
    ...

Fig. 19 Process of importing shapefiles into MongoDB.

List 3 MongoDB's spatial query

db.countries.find({ geometry: { $geoWithin: { $center: [ [108, 34], 2000 ] } } })

Fig. 20 A shapefile containing continents of the world.

Currently, R-tree indices are built on the geometries for fast querying, and customized indices can be added if necessary. The Java Topology Suite (JTS), an open-source Java library for 2D spatial geometry models and geometrical operations, is embedded into Neo4j Spatial to support basic spatial queries, such as contain, cover, cross, and within. Additionally, Neo4j Spatial can be integrated with the GeoTools library and GeoTools-enabled applications such as GeoServer and uDig to perform complex spatial analysis. Although Neo4j is capable of spatial analysis with JTS and is beneficial when queries can be expressed as traversals over local regions, it suffers from low performance when importing large datasets (Baas, 2012).


Fig. 21 The stored graph in Neo4j Spatial.

Some other spatial extensions exist in addition to the above NoSQL spatial databases, such as GeoCouch for Couchbase and Apache CouchDB, and GeoModel for Google App Engine (Pourabbas, 2014). These extensions provide different levels of support for spatial data, mostly for vector data. However, their spatial querying and analysis are relatively weak compared with the full-fledged object-relational approach to spatial databases. Nevertheless, research on NoSQL spatial databases is still progressing, and many additional spatial functionalities can be expected to find their way into NoSQL databases. Generally speaking, NoSQL, known for its high performance and easy scalability, reached a climax in the big data era. Some commercial companies have begun to employ NoSQL databases to manage their location data, such as MongoDB at Foursquare, Cassandra at SimpleGeo, and BigTable in Google Earth. The common ground of these commercial applications is that huge amounts of data must be handled and horizontal scalability is very important. Do we really need a NoSQL database to hold spatial data? What type of NoSQL database should we use? The answers vary. NoSQL is not a one-size-fits-all approach, and every NoSQL database has its own specialties for certain areas.

1.06.3.5 Advanced Database Technologies

Following the third database revolution, various database products have been designed and implemented for different application scenarios. DFSs have been developed to provide a uniform distributed data management platform to meet the need for distributed data storage. In-memory databases have been created to overcome the limitations of disk I/O and accelerate data input and output. Array databases have been developed for sensor, image, simulation, and statistical data, among others, because of the volume of multidimensional array data in scientific research. This section introduces some other promising database technologies that have emerged during the third database revolution.

1.06.3.5.1 Distributed file systems

Traditionally, computers store and manage data via file systems, which specify the conventions for naming files and determine where files are placed on the physical media for access. At present, the most widely used media are still magnetic disks. Data have expanded exponentially since the beginning of the Information Age, far beyond the advances in storage technologies, so simply adding disks to increase storage capacity is problematic. Thus, DFSs have been developed to store and manage large data volumes in a distributed environment. A DFS spreads a file system's nodes over any number of locations to form a network of file systems. Each node communicates and transfers data throughout the network. Although the data stored on each node are managed by the local file system, all the nodes coordinate as a whole. Users manipulate data in a DFS as in a local file system, independent of the node on which the data are stored or from which they are retrieved. A typical DFS has the following advantages: (1) its distributed nature enables it to handle high volumes with high scalability, reliably handling data on the scale of petabytes; (2) cheap commodity servers can be clustered to distribute the data, a low-cost solution for commercial applications; (3) data on different nodes are processed in parallel, which makes DFSs more efficient; and (4) higher reliability is ensured by multiple data replicas and the use of failover. GFS inspired the design and implementation of modern DFSs, and DFSs established later borrowed ideas from it (Ghemawat et al., 2003). The Hadoop distributed file system (HDFS), an open-source implementation of GFS, led to a boom in handling big data with DFSs in recent years, and many related projects have been derived from HDFS to handle data storage, management, processing, and analysis (White, 2012). Here, the HDFS platform is briefly introduced.

Fig. 22 HDFS architecture.

HDFS was designed on the following principles: (1) to store very large datasets, from several hundred megabytes to terabytes or even petabytes; (2) to enable streaming data access. HDFS is tailored for write-once, read-many scenarios. Most, if not all, of a dataset in HDFS may be involved in an analysis, so high throughput of data access is more important than low latency; and (3) to run on cheap commodity servers, whose capacity can be expanded by adding machines; even if hardware failures occur, fault-tolerant strategies guarantee the high availability of the data. Internally, files in HDFS are partitioned into blocks, and each block is stored on distributed nodes according to its key-value mapping. HDFS adopts a master–slave architecture involving the NameNode and DataNode roles (Fig. 22). Only one NameNode exists; it is responsible for managing the mapping of data blocks and the namespace in HDFS and for handling read and write requests from clients. The namespace is stored in a file called the FsImage, which retains all the information of the hierarchical tree structure and the metadata of all files. Additionally, a transaction log file called the EditLog is kept on the NameNode to record every change to the file system. DataNodes provide the actual storage of data blocks and their read and write operations. A DataNode proves its availability by sending heartbeat messages to the NameNode; a DataNode that falls out of touch is marked as dead, after which the relevant replicas come into play to keep the entire system working. When a file on a client is about to be written into HDFS, the HDFS client first caches the data in a temporary local file; when the cache accumulates more than one block size, a request is sent to the NameNode. The NameNode inserts the file name into the file system tree and returns the specific DataNodes and locations for the data block. Say HDFS is configured with a replication factor of three: the client receives a list of DataNodes from the NameNode and flushes the data block to the first DataNode on the list. The first DataNode receives small portions of the data block in sequence and writes them to its local disk while transferring each received portion to the second DataNode on the list; the second DataNode then receives and transfers in turn, forming a pipeline of streaming data. The reading procedure is simple compared with the writing procedure: a client sends a read request to the NameNode, which looks up the file system's namespace and returns the locations of the data blocks, and the client retrieves the data blocks from the returned locations. HDFS and related big data technologies are increasingly common in modern commercial applications, but HDFS is not suitable for every case. First, HDFS does not permit data access with low latency; it achieves high data throughput at the expense of latency. Second, large quantities of small files are problematic: the metadata of all files are stored in the memory of the NameNode, so the memory cannot accommodate excessively large numbers of small files.
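A hedged sketch of the client side of this interaction follows, using the standard Hadoop FileSystem API; the cluster address and paths are hypothetical, and the block placement and pipelining described above happen behind these calls.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.nio.charset.StandardCharsets;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // hypothetical cluster address

        try (FileSystem fs = FileSystem.get(conf)) {
            // Write: the client streams blocks through the DataNode pipeline;
            // the NameNode only hands out block locations.
            Path path = new Path("/gis/continents.geojson");
            try (FSDataOutputStream out = fs.create(path)) {
                out.write("{\"type\":\"FeatureCollection\",\"features\":[]}"
                        .getBytes(StandardCharsets.UTF_8));
            }

            // Read: ask the NameNode for block locations, then pull from DataNodes.
            try (FSDataInputStream in = fs.open(path)) {
                byte[] buf = new byte[128];
                int n = in.read(buf);
                System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8));
            }
        }
    }
}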

1.06.3.5.2 In-memory databases

A major aspect of improving database performance is optimizing the read/write speed of magnetic disks. Moore's law has accelerated the performance of CPUs and memory, but disk performance still lags behind. Because of this mismatch between the performance of CPUs and disks, in-memory databases, which keep data directly in main memory, have been developed. Compared with disk, the read/write speed of memory is greater by several orders of magnitude, so storing all the data in memory can improve performance significantly. In-memory databases abandon traditional methods of disk data management and redesign the system architecture on the assumption that all the data are stored in memory. Accordingly, this design also alters and improves caching mechanisms, parallel operation, and other aspects. The most significant characteristic of in-memory databases is that the actively operating component of the database constantly resides in main memory; namely, all currently active transactions interact only with the memory copy.


Of course, this approach requires large amounts of memory, so not all data can reside in memory at all times and under all circumstances, and in-memory databases still have to handle disk input and output. Keeping data in memory also means that the data disappear if a power failure or similar fault occurs. In-memory databases adopt the following strategies to address this problem: (1) replicating data to other nodes in clustered or distributed environments; (2) periodically writing complete database images (also called snapshots) and checkpoints to disk files; and (3) writing database operation logs to an append-only disk file. Even with these strategies, the data persistence of in-memory databases cannot be completely guaranteed. Redis is an open-source in-memory database that also employs key-value stores (Harrison, 2015). Redis maps keys to typed values and mainly supports strings and collections of strings, including hashes, lists, and sets. By default, Redis holds all data in memory, but it can also be configured to store data in a virtual memory system. Redis employs a master–slave architecture, and the processes involved in a running instance are as follows (Fig. 23): (1) applications perform queries to obtain values by primary key directly from memory; (2) key-value pairs may be held in virtual memory and swapped in and out of main memory; (3) database images are periodically written to disk as backups; (4) operations on data are written to an append-only file for data recovery; and (5) data from any Redis server can be replicated to any number of slaves. Other databases, such as Oracle Database and Microsoft SQL Server, are also trying to take advantage of in-memory storage to optimize their performance. In practical applications, in-memory databases usually act as a caching layer rather than the persistence layer, because they cannot ensure the complete durability of data; the underlying persistence layer may employ another database to hold data on disk. Hence, the combination of an in-memory database and a traditional disk-based database can work much more efficiently.
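As a concrete illustration of the key-value operations just described, the following is a minimal sketch using the redis-py client; the host, key names, and values are hypothetical placeholders.

# A minimal sketch of Redis key-value usage with the redis-py client.
# Host, key names, and values are hypothetical placeholders.
import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

# A plain string key-value pair with an expiry, used here like a cache entry.
r.set('tile:z12:x654:y321', 'rendered tile payload', ex=3600)

# A hash groups related fields under one key (e.g., feature attributes).
r.hset('feature:42', mapping={'name': 'Main St.', 'type': 'road', 'lanes': '2'})

# Lists and sets are supported natively as well.
r.lpush('recent_queries', 'parks near downtown')
r.sadd('layers', 'roads', 'parcels', 'hydrology')

print(r.get('tile:z12:x654:y321'))
print(r.hgetall('feature:42'))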

1.06.3.5.3 Array databases

In computer science, an array is a data structure that stores and operates on multidimensional discrete data, namely, array data. Two perspectives can explain the concept of arrays. From the perspective of function mapping, an array A can be considered a function f(a): D → V that maps an index domain D to a value domain V. In this regard, arrays provide a convenient and efficient approach to obtain values from indices. From the perspective of set theory, an array is a set of homogeneous elements that are ordered in a discretized space (Baumann, 2013). Each element in this space is called a cell, and each cell contains a value. Coordinates are vectors used to identify the particular position of a cell in this space. The length of a coordinate is called the dimension. Each dimension has an axis that indicates its direction and scale. X and Y axes are usually present in a 2D space, where cells are referred to as pixels, while 3D spaces have an additional Z axis, and cells are called voxels (Fig. 24). In the real world, array data are often represented as images, multimedia, simulations, or statistical data, which appear in domains such as Earth, space, and life science. Some well-established data formats, such as HDF and NetCDF, provide an effective and friendly interface to access multidimensional array data, although entering array data into databases is not common from a management perspective. Array data have traditionally been stored in databases as BLOBs, that is, as opaque binary files. Elaborate queries cannot be performed on BLOBs, let alone in situ array processing. Multidimensional array DBMSs have recently arisen to offer flexible, scalable storage and retrieval of array data.
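The two views of an array, as an index-to-value mapping and as a set of cells in a discretized space, can be made concrete with a short sketch. The following uses NumPy; the shape and values are hypothetical placeholders.

# A minimal sketch of the array model described above: an array as a
# mapping f(a): D -> V from an index domain D (cell coordinates) to a
# value domain V (cell values). Shape and values are placeholders.
import numpy as np

D = (4, 3, 2)                          # index domain: a 4 x 3 x 2 grid of voxels
A = np.arange(np.prod(D)).reshape(D)   # value domain: integers in this example

coord = (2, 1, 0)                      # a coordinate vector; its length (3)
print(A[coord])                        # is the dimension of the array

# The set-theory view: the array as a set of (coordinate, value) cells.
cells = {idx: A[idx] for idx in np.ndindex(*D)}
print(len(cells), 'homogeneous cells')  # 24 cells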

Fig. 23 Redis in-memory database.

Fig. 24 Concepts in an array model.

One of the most distinguishing features of multidimensional array DBMSs is that these systems manipulate data based on the semantics of arrays, which means that users can directly perform various operations on selected cell values in the database. Unlike RDBMSs, which have a well-established relational algebra and internationally accepted SQL, the data models employed by array databases vary between vendors. Despite these differences, most of these systems provide an SQL-like query language and a number of operations on arrays. Users can manipulate array data with declarative query languages without worrying about the processes under the hood. Nowadays, array databases are gradually being employed in application domains such as Earth, space, and life science, where they show great advantages in handling multidimensional array data. Some excellent products have been developed, such as Rasdaman and SciDB, and have drawn a great deal of attention in academia and industry. Here, we glance through some basics of array database products by examining Rasdaman. Rasdaman, one of the pioneers among array database systems, was created in 1989 to provide a flexible, high-performance, and scalable DBMS for array data applications (Baumann et al., 1997). It follows a classical client/server architecture, with queries processed on the server side. Under the hood, a base RDBMS is employed to support BLOBs for raster data storage. A file-based storage engine is alternatively implemented in the newer versions of Rasdaman and has proven to be more efficient than BLOB storage in RDBMSs. The actual data are entered into the native file storage system, with metadata kept in SQLite, a light and handy embedded relational database. The Rasdaman server acts as middleware and maps array semantics onto relational table semantics. In terms of client interfaces, Rasdaman offers an SQL-like language called Rasdaman Query Language (rasql) to manipulate raster data. Rasql queries are parsed and executed by Rasdaman servers, which retrieve data from the base RDBMS. Additionally, Rasdaman offers a Web application called Petascope (Aiordachioaie and Baumann, 2010), which implements some of the OGC Web Service interfaces for online array data sharing and processing, and a Semantic Coordinate Reference System Resolver called Secore, which adds the semantic information of coordinate reference systems to array dimensions. Secore is very useful when querying spatial data in Rasdaman. The architecture of Rasdaman is shown in Fig. 25.

Fig. 25 Architecture of Rasdaman.

Fig. 26 Management of remote sensing images in Rasdaman. (The cube's axes are latitude, longitude, and time; each cell holds NIR, R, G, and B bands; data are accessed through rasql and WCS/WCPS.)

Generally, Rasdaman is an all-purpose multidimensional array database suitable for all types of array data, and it offers broad support for geospatial data. Geographic coordinate systems, projected coordinate systems, and temporal coordinate systems can be easily conveyed in Rasdaman via Secore based on the OGC standards. Plenty of work on managing geospatial raster data in Rasdaman databases has been performed to ensure high efficiency for array data, powerful in situ array data processing, and complete support for spatial reference systems. Related projects, such as EarthServer and PublicaMundi, are good examples of its application in geospatial fields. Remote sensing images are an important geospatial data source and are usually represented as multidimensional arrays, so these images can take advantage of array database technology. Considering latitude, longitude, and the time series (φ, λ, t) as three dimensions, we have entered bulks of Landsat remote sensing images into Rasdaman, as shown in Fig. 26. The WCSTImport tools offered by Rasdaman for data import are highly customizable. Users can insert various multidimensional data into Rasdaman and publish the data as Web Coverage Services (WCS) for public sharing with little extra programming. After loading data into Rasdaman, queries based on spatial and time ranges can be easily performed, which satisfies most needs of remote sensing image management. In situ processing can be directly performed on these images with rasql to derive new information. List 4 shows how to calculate the normalized difference vegetation index (NDVI) of an image with rasql. A benchmark test has shown that array databases exhibit many advantages over traditional RDBMSs for geospatial raster data. In addition, both raster data and geoprocessing functions can be shared and accessed online with Petascope. Among Web services, WCS offers a standard method to access multidimensional coverage data over the Internet. In addition, the Web Coverage Processing Service (WCPS) defines a language for retrieving and processing multidimensional coverage data, which can be easily integrated into WCSs to perform online coverage processing. We designed an approach that combines a Web Processing Service (WPS) and WCPS based on Rasdaman and Petascope and adds WCPS processing capabilities to geoprocessing workflows. This approach is more flexible when dealing with complex coverage data processing.
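To give a flavor of such online coverage access, the following is a minimal sketch of a WCPS query sent to a Petascope endpoint with the Python requests library. The endpoint URL, coverage name, and axis names (Lat, Long, ansi) are hypothetical assumptions about how a Landsat cube might have been imported, not a description of a real deployment.

# A minimal sketch of running a WCPS query against a (hypothetical)
# rasdaman/Petascope endpoint via the WCS ProcessCoverages request.
import requests

endpoint = 'http://gis.example.org/rasdaman/ows'   # placeholder endpoint

# Trim the cube to a lat/lon window and slice it at one date,
# encoding the result as TIFF. Coverage and axis names are assumed.
wcps = ('for c in (LandsatCube) '
        'return encode(c[Lat(30.0:31.0), Long(110.0:112.0), '
        'ansi("2016-07-01")], "tiff")')

resp = requests.get(endpoint, params={
    'service': 'WCS',
    'version': '2.0.1',
    'request': 'ProcessCoverages',
    'query': wcps,
})
resp.raise_for_status()
with open('landsat_20160701.tif', 'wb') as f:
    f.write(resp.content)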

1.06.4 Retrospect and Prospect

1.06.4.1 Review of Spatial Database Development

From the late 1950s to the mid-1960s, geospatial data were mainly stored as local files in various formats. Most GIS platforms had their own data format and provided support for other popular formats. During the 1970s, early research on spatial databases went hand in hand with work on computer-aided mapping, and attempts were made to manage the basic spatial geometries of points, lines, and polygons in databases: points can be organized as structured data, and lines and polygons can be converted into collections of points. However, spatial databases were only in their infancy during this period; they were inefficient and lacked support for topology. The hybrid approach, with geometries in a file and attributes in an RDBMS, achieved great success and was widely employed. The ESRI coverage and shapefile formats are typical examples of the hybrid approach. The coverage data model defines various kinds of feature classes to represent spatial features, and the topological relations of features can be explicitly expressed, which benefits editing operations related to spatial topologies.

List 4 NDVI calculation with rasql
select encode(((m).2 - (m).3) / ((m).2 + (m).3), "tiff") from GF2 as m


Shapefile stores spatial features as simple feature classes, such as points, lines, and polygons. It cannot hold topological relations, but the simplicity of its data structure makes it well suited to quick visualization and data exchange, and to this day shapefiles remain among the most widely used data formats in GIS. From the late 1980s to the early 1990s, some RDBMSs began to support BLOBs to hold variable-length binary data such as images, audio, and video. Geospatial data, which are typically unstructured, variable-length data, could certainly utilize BLOBs in full-fledged RDBMS solutions. During this period, both vector and raster data could be entered into RDBMSs, and applications built through the secondary development of some GIS platforms were used to perform advanced data processing and sophisticated spatial analysis. Spatial databases experienced another great technology leap during the mid-to-late 1990s: ESRI designed and implemented a groundbreaking product called ArcSDE by partnering with Oracle and other leading companies in database technologies. ArcSDE is still built on RDBMSs but shields the differences among the underlying database systems, providing a unified interface and enabling the powerful spatial analysis of traditional GIS platforms. Later, some database vendors developed object-relational models to hold spatial entities as object types, and object-relational databases became one of the most popular approaches for spatial data. Since the early 2000s, NoSQL databases have emerged to meet the challenges of big data. Work on NoSQL databases for GIS is still in progress, and some NoSQL products have already been adapted for spatial data. The following section focuses on current problems and some considerations regarding the development of distributed database management.

1.06.4.2 Current Situation and Problems

The development of sensor Web technology has led to significant improvements in the spatial and temporal resolution of data. These higher-quality data place enormous pressure on current data storage and processing solutions. The distributed storage and management of geospatial data are fundamental to distributed processing, maintenance, and sharing, and they are an inevitable trend in the future development of spatial databases. The major issues for distributed spatial databases include distributed spatial data models, distributed spatial indices, efficient spatial queries, and highly concurrent access and control.

1.06.4.2.1 Distributed geospatial big data structure model

Traditional GIS technologies, which are built on static data models and rigid processing patterns, lack real-time and dynamic data representations and cannot properly support the management of dynamic, multidimensional, multisource spatial data or methods for spatiotemporal simulations. With the growth of big geospatial data, traditional RDBMSs such as Oracle and SQL Server can only meet the demands of structured data and provide little support for unstructured data. NoSQL databases employ various nonrelational data models to organize volumes of data. These data models tend to be schema-less, and data are usually represented as collections of key-value pairs. For instance, Google BigTable can be treated as a sparse, distributed, multidimensional, ordered key-value mapping structure, in which keys comprise a row key, a column key, and a timestamp. Dynamo employs a distributed hashing storage architecture to store scattered key-value pairs in a large-scale distributed storage system. New data models should be designed and implemented to accommodate distributed storage and to address the flexibility and scalability issues of geospatial big data.
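The BigTable-style keying scheme mentioned above can be sketched in a few lines. The following toy example models a sparse, ordered map from (row key, column key, timestamp) to values; the key names, the GeoHash-like prefix, and the values are hypothetical placeholders.

# A minimal sketch of a BigTable-style data model: a sparse, ordered
# map from (row key, column key, timestamp) to values. All keys and
# values here are hypothetical placeholders.
import time

table = {}  # (row_key, column_key, timestamp) -> value

def put(row, column, value, ts=None):
    table[(row, column, ts if ts is not None else time.time())] = value

def scan(row_prefix):
    # A range scan over the sorted row space, as BigTable permits.
    for key in sorted(table):
        if key[0].startswith(row_prefix):
            yield key, table[key]

# Prefixing row keys with a GeoHash-like cell id keeps spatially close
# features close together in the sorted key space (a common trick).
put('wx4g0:feature/17', 'attr:name', 'East Lake')
put('wx4g0:feature/17', 'geom:wkt', 'POINT (114.4 30.5)')

for key, value in scan('wx4g0'):
    print(key, value)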

1.06.4.2.2 Distributed geospatial big data storage and management

Current approaches to the storage and management of spatial data, whether spatial extensions of general RDBMSs, such as Oracle Spatial, or software middleware built on RDBMSs to provide a unified spatial data access interface, such as ArcSDE (such middleware is known as a spatial database engine, or SDE), rely on traditional RDBMSs. RDBMSs have played a significant role in traditional GIS domains but now encounter problems in effectively and efficiently storing and processing geospatial big data. Emerging distributed database technologies can handle volumes of data in a distributed Web environment, and nowadays NoSQL databases are guiding the development of distributed storage technologies. These databases break away from the uniformity of relational databases and strict ACID guarantees and have developed various data models and storage strategies, each with its own applicable scenarios. Google, Amazon, Facebook, Oracle, and others are major enablers of big data technologies in industry. For instance, Google employs GFS for unstructured data and BigTable for semistructured and structured data; the distributed NoSQL approach has already been applied in several projects at Google and has demonstrated its feasibility and satisfactory performance. In the academic world, scholars have explored the possibility of storing and managing volumes of spatial data in an elastic cloud computing environment. For example, some scholars established DFSs with clusters to achieve the hierarchical, distributed organization and management of global remote sensing images. Others have attempted to store and index spatial images and vector features with existing NoSQL databases, such as Apache HBase and MongoDB. Predictably, the NoSQL approach for distributed spatiotemporal databases should progress rapidly in the coming years.

1.06.4.2.3 Indices for geospatial big data

Spatial queries rely on spatial indices, spatial query optimization, and spatial join algorithms. Efficient spatial indices are one of the greatest challenges for distributed geospatial databases. Much research has produced local, centralized spatial indices, which have been widely used. Global spatial indices must determine to which local storage nodes a request should be sent when performing a global spatial query. Existing indices for distributed databases often adopt a hybrid structure of multilevel spatial indices. For instance, the spatial indices in MongoDB are mixtures of GeoHash and B-trees: GeoHash is used to establish spatial grids that cover the smallest spatial entity, and the B-tree index is built on the GeoHash code to accelerate global queries.
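As an illustration of this GeoHash-plus-B-tree combination, the following is a minimal sketch using MongoDB's legacy 2d index through pymongo; the database, collection, field names, and coordinates are hypothetical placeholders.

# A minimal sketch of geohash-based spatial indexing in MongoDB via
# pymongo. Database, collection, and coordinates are placeholders.
from pymongo import MongoClient, GEO2D

db = MongoClient('mongodb://localhost:27017')['gisdb']
places = db['places']

# MongoDB's 2d index geohashes [longitude, latitude] pairs and stores
# the codes in its B-tree index structure.
places.create_index([('loc', GEO2D)])

places.insert_many([
    {'name': 'City Park', 'loc': [114.30, 30.59]},
    {'name': 'Ferry Pier', 'loc': [114.28, 30.55]},
])

# A proximity query resolved through the geohash/B-tree index.
for doc in places.find({'loc': {'$near': [114.29, 30.57]}}).limit(2):
    print(doc['name'])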


Currently, the spatial indices in MongoDB only support two-dimensional space, and edge problems remain unavoidable in the GeoHash approach. Some scholars have proposed solutions that employ R-tree indices. Overall, spatial indices for distributed spatial databases are still in the exploration stage, and no mature system for distributed, parallel, multisource spatial databases exists. The current problems of distributed spatiotemporal databases include the following. (1) The data types relevant to spatial data include traditional static data and volumes of dynamic streaming data, which differ in data models, formats, encodings, etc. Traditional geospatial data structure models cannot accommodate distributed storage and management, while key-value-based data models have satisfactory simplicity and scalability but lack support for the multidimensional characteristics of geospatial data. It is necessary to search for a comparatively universal data structure model for big geospatial data. (2) The current approaches for big geospatial data mainly focus on data management and emphasize efficient storage and quick queries. These approaches do not consider the demand for effective data processing and analysis, such as high-throughput data I/O, high-speed data acquisition, and parallel data processing. (3) Current research achievements on spatial indices cannot be directly applied to distributed spatial databases. The management of dynamic streaming data requires spatial indices that can be built in real time, distributed through extensions, and elastically scaled.

1.06.5 Conclusion

This article presents an overview of the development of traditional spatial databases and the general trend toward the NoSQL movement. First, some basic concepts and related theories of databases are reviewed, the most common data models and spatial indices for spatial databases are introduced, and four typical implementations of spatial databases are exhibited. Second, the concepts of NoSQL and the CAP theorem are explained, four typical NoSQL databases are introduced, and several solutions employing NoSQL technologies in GIS are presented and analyzed. Third, some advanced database technologies are briefly introduced. Finally, possible solutions for spatiotemporal big data are suggested. Traditional RDBMSs and NoSQL databases have been created and have developed side by side over the years, and additional databases with different data models should become available for spatial data; for instance, object databases for spatial vectors, array databases for raster data, and graph databases for network data are all promising fields in spatial data management. Choosing a database according to the data's characteristics and the application scenario is very important. Additionally, spatial technologies are gradually being incorporated into mainstream IT technologies; GISs should take full advantage of IT technologies and dig deep to discover the value of spatial data to better serve human beings and society. Lastly, the design and improvement of distributed storage systems and NoSQL databases should greatly advance in the following years, as geospatial big data require distributed, parallel data storage and management systems.

References

Aiordachioaie, A., Baumann, P., 2010. PetaScope: An open-source implementation of the OGC WCS geo service standards suite. In: Gertz, M., Ludäscher, B. (Eds.), Scientific and Statistical Database Management: 22nd International Conference, SSDBM 2010, Heidelberg, Germany, June 30–July 2. Springer, Berlin/Heidelberg.
Amirian, P., Winstanley, A.C., Basiri, A., 2013. NoSQL storage and management of geospatial data with emphasis on serving geospatial data using standard geospatial web services.
Baas, B., 2012. NoSQL spatial: Neo4j versus PostGIS. Delft University of Technology, Delft.
Baumann, P., 2013. Query language guide. Available: http://www.rasdaman.org/browser/manuals_and_examples/manuals/doc-guides/ql-guide.pdf (accessed 09.20.2016).
Baumann, P., Furtado, P., Ritsch, R., Widmann, N., 1997. The RasDaMan approach to multidimensional database management. In: Proceedings of the 1997 ACM Symposium on Applied Computing. ACM, San Jose, CA.
Chang, F., Dean, J., Ghemawat, S., Hsieh, W.C., Wallach, D.A., Burrows, M., Chandra, T., Fikes, A., Gruber, R.E., 2008. BigTable: A distributed storage system for structured data. ACM Transactions on Computer Systems 26, 1–26.
Chodorow, K., 2013. MongoDB: The definitive guide. O'Reilly Media, Newton, MA.
Date, C.J., 2003. An introduction to database systems. Addison-Wesley, Reading, MA.
Decandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P., Vogels, W., 2007. Dynamo: Amazon's highly available key-value store. SIGOPS Operating Systems Review 41, 205–220.
ESRI, 1998. ESRI Shapefile technical description: An ESRI White Paper. ESRI, Redlands, CA.
ESRI, 2005. Raster data in ArcSDE 9.1: An ESRI White Paper. ESRI, Redlands, CA.
Ghemawat, S., Gobioff, H., Leung, S.-T., 2003. The Google file system. SIGOPS Operating Systems Review 37, 29–43.
Goel, A., 2015. Neo4j cookbook. Packt Publishing, Birmingham.
Guttman, A., 1984. R-trees: A dynamic index structure for spatial searching. ACM, New York.
Han, J., Haihong, E., Le, G., Du, J., 2011. Survey on NoSQL database. In: 2011 6th International Conference on Pervasive Computing and Applications (ICPCA). IEEE, Port Elizabeth, South Africa, pp. 363–366.
Harrison, G., 2015. Next generation databases: NoSQL and big data. Apress, New York.
Heywood, I., Cornelius, S., 2010. An introduction to geographical information systems. Pearson, Harlow.
Kothuri, R., Godfrind, A., Beinat, E., 2008. Pro Oracle Spatial for Oracle Database 11g. Dreamtech Press, New Delhi.
Kroenke, D.M., Auer, D.J., 2010. Database concepts. Prentice Hall, Upper Saddle River, NJ.
Melton, J., 2002. Advanced SQL: 1999: Understanding object-relational and other advanced features. Morgan Kaufmann, Boston, MA.
Morehouse, S., 1985. ARC/INFO: A geo-relational model for spatial information. In: Proceedings of the 7th International Symposium on Computer Assisted Cartography, Washington, DC, pp. 388–398.
Morehouse, S., 1989. The architecture of ARC/INFO. In: Proceedings of the International Conference on Computer Assisted Cartography (Auto-Carto 9), Baltimore, MD, pp. 266–277.


Moussalli, R., Srivatsa, M., Asaad, S., 2015. Fast and flexible conversion of GeoHash codes to and from latitude/longitude coordinates. In: IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, Vancouver, Canada, pp. 179–186.
Neo Technology, 2013. Neo4j Spatial v0.12-neo4j-2.0.0-SNAPSHOT. Available: http://neo4j-contrib.github.io/spatial/ (accessed 09.20.2016).
Paterson, M.S., Yao, F.F., 1990. Efficient binary space partitions for hidden-surface removal and solid modeling. Discrete & Computational Geometry 5, 485–503.
Paterson, J., Edlich, S., Horning, H., Horning, R., 2006. The definitive guide to db4o. Apress, Berkeley, CA.
Pourabbas, E., 2014. Geographical information systems: Trends and technologies. CRC Press, Boca Raton, FL.
Rigaux, P., Scholl, M., Voisard, A., 2001. Spatial databases: With application to GIS. Morgan Kaufmann, San Francisco, CA.
Robert, W., 2001. Understanding ArcSDE: GIS by ESRI. ESRI, Redlands, CA.
Robinson, J.T., 1981. The K-D-B-tree: A search structure for large multidimensional dynamic indexes. In: Proceedings of the 1981 ACM SIGMOD International Conference on Management of Data. ACM, Ann Arbor, MI, pp. 10–18.
Sellis, T., Roussopoulos, N., Faloutsos, C., 1987. The R+-tree: A dynamic index for multi-dimensional objects.
Shan, W., Shixuan, S., 2014. Database system introduction. Higher Education Press, Beijing.
Shekhar, S., Chawla, S., 2003. Spatial databases: A tour. Prentice Hall, Upper Saddle River, NJ.
Shekhar, S., Xiong, H., 2007. Encyclopedia of GIS. Springer, New York.
White, T., 2012. Hadoop: The definitive guide. O'Reilly Media, Newton, MA.
Yue, P., Jiang, L., 2014. BigGIS: How big data can shape next-generation GIS. In: Third International Conference on Agro-geoinformatics (Agro-geoinformatics 2014). IEEE, Beijing, China, pp. 1–6.
Zeiler, M., 1999. Modeling our world: The ESRI guide to geodatabase design. ESRI, Redlands, CA.

1.07 Geospatial Semantics

Yingjie Hu, University of Tennessee, Knoxville, TN, United States © 2018 Elsevier Inc. All rights reserved.

1.07.1 Introduction
1.07.2 Six Major Research Areas
1.07.2.1 Semantic Interoperability and Ontologies
1.07.2.2 Digital Gazetteers
1.07.2.3 Geographic Information Retrieval
1.07.2.4 Geospatial Semantic Web and Linked Data
1.07.2.5 Place Semantics
1.07.2.6 Cognitive Geographic Concepts and Qualitative Reasoning
1.07.3 Summary and Outlook
References

1.07.1 Introduction

The term semantics refers to the meaning of expressions in a language and stands in contrast with the term syntactics. For example, the two expressions “I love GIS” and “I ❤ GIS” have clearly different syntactics; however, they have close, if not the same, semantics. The term geospatial semantics adds the adjective geospatial in front of semantics, and this addition both restricts and extends the initial applicable area of semantics. On the one hand, geospatial semantics focuses on expressions that have a connection with geography rather than general expressions; on the other hand, geospatial semantics enables studies not only on linguistic expressions but also on the meaning of geographic places, geospatial data, and the GeoWeb. While geospatial semantics is a recognized subfield of Geographic Information Science (GIScience) (Agarwal, 2005; Mark et al., 2000), it also involves a variety of related research areas. Kuhn (2005) defines geospatial semantics as “understanding GIS contents, and capturing this understanding in formal theories.” This definition can be divided into two parts: understanding and formalization. The understanding part triggers the question: Who is supposed to understand the geographic information systems (GIS) content, people or machines? When the answer is “people,” research in geospatial semantics involves human cognition of geographic concepts and spatial relations (Egenhofer and Mark, 1995; Golledge, 2002; Smith and Mark, 2001), whereas when the answer is “machines,” it can involve research on the semantic interoperability of distributed systems, digital gazetteers, and geographic information retrieval (GIR) (Bishr, 1998; Fonseca et al., 2002; Goodchild and Hill, 2008; Harvey et al., 1999; Jones and Purves, 2008). The second part of the definition proposes to capture this understanding through formal theories. Ontologies, as formal specifications of concepts and relations, have been widely studied and applied in geospatial semantics (Couclelis, 2010; Frank, 2001; Pundt and Bishr, 2002; Visser et al., 2002), and formal logics, such as first-order logic (Russell et al., 2003) and description logics (Hitzler et al., 2009), are often employed to define the concepts and axioms in an ontology. While Kuhn’s definition includes these two parts, research in geospatial semantics is not required to have both: one study can focus on understanding, while another examines formalization. Advances in computer and information technologies, especially the Web, have greatly facilitated geospatial semantic research. After the Semantic Web was initially proposed by Berners-Lee et al. (2001), Egenhofer (2002) envisioned a geospatial Semantic Web that is able to understand the semantics of users’ geospatial requests and automatically obtain relevant results. The development of Linked Data (Bizer et al., 2009a) as well as the resulting Linked Open Data (LOD) cloud (Heath and Bizer, 2011) has fostered geospatial semantic studies on organizing, publishing, retrieving, and reusing geospatial data as structured Linked Data (Janowicz et al., 2013). Meanwhile, there is a rapid increase in the volume of unstructured natural language text on the Web, such as social media posts, blogs, and Wikipedia entries. While often subjective, textual data reveal the understanding and perceptions of people toward natural and social environments.
Existing studies have demonstrated the use of unstructured text data in extracting place semantics and understanding the spatiotemporal interaction patterns between people and places (Adams and McKenzie, 2013; Ballatore and Adams, 2015; Hu et al., 2015c). More novel research topics based on big text data will become possible, given the fast development of natural language processing (NLP) and text mining techniques. Geospatial semantics is a broad field that adopts a unique research perspective on geospatial problems. To some extent, geospatial semantics can be compared with geospatial statistics: both can be applied to various problems across domains, and both have their own unique sets of methods (e.g., ontological modeling and NLP for geospatial semantics). In recent years, a large body of research on geospatial semantics has been conducted, with results published in journals or presented at conferences, such as the Conference on Spatial Information Theory, the International Conference on GIScience, the International Conference on GeoSpatial Semantics, and many others. This article systematically reviews and summarizes these existing efforts. The objective is to delineate a road map that provides an overview of six major research areas in geospatial semantics.


1.07.2 Six Major Research Areas

1.07.2.1 Semantic Interoperability and Ontologies

Semantic interoperability was driven by the componentization of GIS. While GIS were traditionally used locally, geospatial functions and data were increasingly encapsulated into services and shared on the Web (Kuhn, 2005). As a result, it became necessary to formally define the semantics of the distributed Web services, so that they can automatically interact with each other and be dynamically integrated. Semantic interoperability is also critical for Spatial Data Infrastructures (SDIs) that provide access to a wealth of distributed geospatial data sources and services which can be combined for various queries and tasks (Alameh, 2003; Lemmens et al., 2006; Lutz et al., 2007, 2009). While semantic interoperability can refer to any matchmaking process (e.g., matching a Web document to a user’s query), this section focuses on enabling the integration among distributed geospatial data and services. A major approach for enabling semantic interoperability is developing ontologies. While studied in the field of philosophy as the nature of being, ontologies in geospatial semantics are closer to those in computer science and bioinformatics, which serve the function of formalizing the meaning of concepts in a machine-understandable manner (Bittner et al., 2005; Couclelis, 2009; Gruber, 1993; Guarino, 1998; Stevens et al., 2000). From a data structure perspective, an ontology can be considered a graph with concepts as nodes and relations as edges. Fig. 1A shows a fragment taken from an example ontology developed in the GeoLink project supported by the US National Science Foundation (NSF) (Krisnadhi et al., 2015b). The concepts (e.g., Cruise and Vessel) and the relations (e.g., isUndertakenBy) in this example ontology have been labeled using terms in the corresponding domain (oceanography in this case). Ontologies are often embedded into GIS and Web services as an additional component to enable semantic interoperability (Frank, 1997). Examples include the Ontology-Driven GIS proposed by Fonseca and Egenhofer (1999), as well as other ontology-based approaches developed by Hakimpour and Timpf (2001), Fonseca et al. (2000), and Fallahi et al. (2008). Kuhn (2003) proposed semantic reference systems, an ontology-based system analogous to the existing spatial and temporal reference systems widely used in GIS to facilitate semantic interoperability. Ontologies must be developed before they can be used in a GIS. Three types of ontologies can be identified from the literature: top-level ontology, domain ontology, and ontology design pattern (ODP). A top-level ontology contains general terms (e.g., isPartOf, endurant, and perdurant) that can be used across domains, while domain ontologies formalize the concepts for a specific discipline (Ashburner et al., 2000; Guarino, 1997; Rogers and Rector, 1996). The ontologies used in GIScience are generally considered as domain ontologies and are often called geographic ontologies or geo-ontologies (Fonseca et al., 2006; Tomai and Kavouras, 2004a). ODPs are developed based on applications. Instead of seeking agreements within or across domains, they capture the common needs shared by multiple applications (Gangemi, 2005; Gangemi and Presutti, 2009). The process of developing ontologies is called ontology engineering. Three types of approaches are often used for ontology engineering, which are top-down, bottom-up, and hybrid approaches. 
Top-down approaches rely on knowledge engineers and domain experts to define and formalize the ontological concepts and relations (Brodaric, 2004; Gates et al., 2007; Schuurman and Leszczynski, 2006; Shankar et al., 2007; Wang et al., 2007). Bottom-up approaches employ data mining methods to extract concepts and relations from structured databases or unstructured natural language text (Baglioni et al., 2007; Buitelaar et al., 2005; Maedche and Staab, 2004; Sen, 2007; Shamsfard and Barforoush, 2004). Hybrid approaches integrate the previous two and combine expert knowledge with the results of data mining processes (Buitelaar et al., 2004; Hu and Janowicz, 2016; Prieto-Díaz, 2003). One challenge in ontology engineering is to define the semantics of the primitive terms (the atomic concepts that cannot be further divided) in an explicit and unambiguous manner.

Fig. 1 (A) A fragment of an ontology from the NSF GeoLink project (Krisnadhi et al., 2015b); (B) a fragment of a possible gazetteer.


To address this challenge, researchers have proposed grounding the primitive terms in the environment and the observation process (Janowicz, 2012; Mallenby, 2007; Scheider et al., 2009; Schuurman, 2005; Third et al., 2007). Ontologies can not only be encoded using formal logics but can also be implemented as simple structured vocabularies; a data standard is a simple ontology (Bittner et al., 2005). So far, many geo-ontologies have been developed. For example, Grenon and Smith (2004) developed SNAP and SPAN, two general geo-ontologies for modeling continuants and occurrents, respectively. Worboys and Hornsby (2004) proposed the geospatial event model, which extends the traditional object model with events to capture dynamic geospatial processes. There are also geo-ontologies for ecosystems (Sorokine et al., 2004), hydrology (Feng et al., 2004), and Earth science (Raskin and Pan, 2005). In addition, multilayer ontologies, which distinguish the entities in the physical world from their representations in human cognition, have been developed for spatiotemporal databases (Frank, 2001, 2003) and geographic information (Couclelis, 2010). Recent years have witnessed the rapid development of ODPs in the geospatial domain, such as the ODPs for semantic sensor networks (Compton et al., 2012), semantic trajectories (Hu et al., 2013), barrier dynamics (White and Stewart, 2015), cartographic scale (Carral et al., 2013), surface water features (Sinha et al., 2014), oceanographic cruises (Krisnadhi et al., 2015a), and space–time prisms (Keßler and Farmer, 2015). With many ontologies developed by different researchers and communities, it is often necessary to align these ontologies to support data integration, a process known as ontology alignment (Cruz et al., 2004; Hess et al., 2007). Based on the alignment direction, we can identify centralized and peer-to-peer alignments (Sunna and Cruz, 2007): the former aligns multiple ontologies to a standard ontology, while the latter establishes links between two peer ontologies. The alignment methods can be classified into element-level, structure-level, and hybrid methods (Shvaiko and Euzenat, 2005). Element-level methods focus on the individual concepts and relations in an ontology and compare the similarities of the label strings as well as their dictionary definitions (e.g., based on WordNet (Miller, 1995)) (Lin and Sandkuhl, 2008). Structure-level methods examine not only the terms themselves but also their neighboring concepts in the ontology (Sunna and Cruz, 2007). Hybrid approaches combine the element- and structure-level methods; examples include the Alignment API (Euzenat, 2004), SAMBO (Lambrix and Tan, 2006), RiMOM (Li et al., 2009), and Falcon-AO (Hu and Qu, 2008). There are also studies that align ontologies based on the instances inside the ontology concepts (Brauner et al., 2007; Navarrete and Blat, 2007). As conflicts can arise during the alignment process, some research has incorporated human experts into the alignment process, such as COMA++ (Aumueller et al., 2005) and AgreementMaker (Cruz et al., 2007). While a lot of research has been conducted on ontologies, it is worth noting that developing an ontology is only one approach to realizing semantic interoperability. In fact, some researchers have criticized the use of ontologies to address semantic issues, arguing that ontologies as a priori agreements cannot capture the meaning of concepts that change dynamically (Di Donato, 2010; Gärdenfors, 2004).
New approaches for semantic interoperability may also be possible and will need further investigation.
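As a concrete illustration of the graph view of ontologies introduced at the beginning of this section, the following minimal sketch represents a small fragment, echoing Fig. 1A, as a directed labeled graph with the Python networkx library. Only Cruise, Vessel, and isUndertakenBy come from the GeoLink fragment; the other nodes and relations are hypothetical fillers.

# A minimal sketch of "ontology as a graph": concepts as nodes,
# relations as labeled edges. Cruise/Vessel/isUndertakenBy echo the
# GeoLink fragment in Fig. 1A; the rest are hypothetical fillers.
import networkx as nx

onto = nx.DiGraph()
onto.add_edge('Cruise', 'Vessel', relation='isUndertakenBy')
onto.add_edge('Cruise', 'Port', relation='startsAt')           # hypothetical
onto.add_edge('Vessel', 'Organization', relation='isOwnedBy')  # hypothetical

# Traversing the graph answers simple structural questions, e.g.,
# which concepts a Cruise is directly related to, and how.
for _, target, data in onto.out_edges('Cruise', data=True):
    print(f"Cruise --{data['relation']}--> {target}")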

1.07.2.2 Digital Gazetteers

Digital gazetteers are structured dictionaries of named places. The place entries within a digital gazetteer are often organized into a graph, with nodes representing places and edges capturing their relations (e.g., Los Angeles is part of California). In fact, digital gazetteers can be considered a special type of ontology. Fig. 1B shows a fragment of a possible gazetteer whose organizational structure shares similarity with the ontology fragment in Fig. 1A. The reason that digital gazetteers are discussed in a separate section is their vital importance in GIScience (Goodchild and Hill, 2008). There exist many applications of digital gazetteers, including geocoding, navigation, and GIR (Alani et al., 2001; Rice et al., 2012; Schlieder et al., 2001). Three core components are usually contained in a digital gazetteer: place names (N), place types (T), and spatial footprints (F) (Hill, 2000). These three components enable three common operations: spatial lookup (N → F), type lookup (N → T), and reverse lookup (F (+T) → N) (Janowicz and Keßler, 2008). As people frequently use place names rather than numeric coordinates to refer to places, digital gazetteers fill the critical gap between informal human discourse and formal geographic representations. From the perspective of geospatial semantics, digital gazetteers help machines understand the geographic meaning (e.g., the spatial footprint) of a textual place name as well as the relations among places. Examples of digital gazetteers include GeoNames, the Getty Thesaurus of Geographic Names, the GEOnet Names Server, the Geographic Names Information System (GNIS), and the Alexandria Digital Library (ADL) Gazetteer. One important topic in gazetteer research is enriching existing gazetteers with local or vernacular place entries. Gazetteers are traditionally developed and maintained by naming authorities (e.g., the Board on Geographic Names in the United States) and often do not contain the local place names used in everyday conversations (Davies et al., 2009; Hollenstein and Purves, 2010). For example, the entry San Francisco Chinatown shown in Fig. 1B may not be included in a traditional gazetteer. Yet, such vernacular places are important for some GIS applications (e.g., finding hotels in San Francisco Chinatown for tourists). Since these places often do not have clearly defined boundaries, research has been conducted on representing their vague spatial footprints. For example, Burrough and Frank (1996) used a fuzzy-set-based approach to extract the intermediate boundaries of vague places. Montello et al. (2003b) asked human participants to draw the boundary of downtown Santa Barbara and found a common core area agreed upon by the participants. Jones et al. (2008a) proposed a computational approach that employs a Web search engine to harvest the geographic entities (e.g., hotels) associated with a vague place name and then uses kernel density estimation (KDE) to represent the vague boundary. Geotagged photos (e.g., Flickr photos) provide natural links between textual tags (which often contain vernacular place names) and geographic locations and have been utilized by many researchers to model vague places, such as Grothe and Schaab (2009), Keßler et al. (2009b), Intagorn and Lerman (2011), Li and Goodchild (2012), and Gao et al. (2017).
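The following is a minimal sketch of the KDE idea just described: estimating the vague footprint of a place from georeferenced points (e.g., geotagged photo locations harvested for a vernacular place name), using NumPy and SciPy. The coordinates are synthetic, hypothetical placeholders.

# A minimal sketch of estimating a vague place footprint with kernel
# density estimation. The (lon, lat) points are synthetic placeholders
# standing in for, e.g., geotagged photo locations.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)
lon = rng.normal(-122.406, 0.004, 500)
lat = rng.normal(37.794, 0.003, 500)

kde = gaussian_kde(np.vstack([lon, lat]))

# Evaluate the density on a grid; a contour of this surface (e.g., the
# level enclosing most of the mass) can serve as the vague boundary.
gx, gy = np.mgrid[lon.min():lon.max():100j, lat.min():lat.max():100j]
density = kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)
print('peak density:', density.max())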


The methods used for modeling vague boundaries include KDE, characteristic shapes (Duckham et al., 2008), and the “egg-yolk” representation (Cohn and Gotts, 1996). A recent work by Chen and Shaw (2016) proposed a weighted KDE approach that assigns different weights to Flickr photo locations according to how representative they are of the vague place. In addition, there is research that focuses on assigning place types to place instances rather than generating geometric representations of spatial footprints. For example, Uryupina (2003) developed a bootstrapping approach that can automatically classify places into predefined types (e.g., city and river) and achieved a high precision of about 85%. It is worth noting that the inclusion of user-generated content (e.g., geotagged photos) raises issues of data quality and credibility. User reputations (Bishr and Kuhn, 2007), topological relations (e.g., islands should be surrounded by water) (Keßler et al., 2009a), and other methods could be used to address these issues. Another topic in digital gazetteer research is aligning and conflating multiple gazetteers. Digital gazetteers from independent sources may have different geographic coverages, different spatial footprints (e.g., the same place may be represented as a point or as a polygon in different gazetteers), different place types, and different attributes. While these complementary sources can be combined into a richer data resource, the differences also present challenges for gazetteer conflation. Based on the conflation target, we can identify schema- and instance-level conflation. Schema-level conflation aligns the place types of one gazetteer with those of another and can be considered a special type of ontology alignment. Naturally, the ontology alignment methods, such as those based on the similarities of labels, definitions, and structures, can be employed to align the place types in different gazetteers (Rodríguez and Egenhofer, 2003). There are also methods that leverage the spatial distribution patterns of the place instances belonging to a place type to align place types, such as the work of Navarrete and Blat (2007) and Zhu et al. (2016). Instance-level conflation aims at merging the specific place entries in different gazetteers. There exist a variety of methods for measuring the similarities of spatial footprints (geometries) (Goodchild and Hunter, 1997; Li and Goodchild, 2011), place types (ontologies) (Rodríguez et al., 1999), and place names (strings) (Sankoff and Kruskal, 1983). These similarity metrics are sometimes combined into workflows to conflate place entries, such as the work of Samal et al. (2004), Sehgal et al. (2006), and Hastings (2008). There are other research topics in digital gazetteers. One is to equip digital gazetteers with the capability of reasoning. A digital gazetteer is typically implemented as a plain place dictionary without the ability to infer additional information from the existing content. Janowicz and Keßler (2008) and Keßler et al. (2009a) consider the place types in digital gazetteers as ontologies and use logics to formally define and encode reasoning rules. They also designed prototypical Web and programming interfaces that can support subsumption (i.e., identifying the subconcepts of a broader concept) and similarity-based reasoning. With the rapid advances of Semantic Web technologies (see more in the Geospatial Semantic Web and Linked Data section), new data models and computational methods can enhance the query-answering capability of existing gazetteers.
GeoNames is among the early pioneers that employed Semantic Web technologies (Bizer et al., 2008). Another research topic in digital gazetteers focuses on the temporal dimension of places. These studies are often conducted in the context of historical gazetteers, in which place names, boundaries, or place types change over the years (Martins et al., 2008; Southall et al., 2011). A unique approach is presented by Mostern and Johnson (2008), who use events instead of locations as the fundamental unit for building a historical digital gazetteer.

1.07.2.3 Geographic Information Retrieval

GIR is about retrieving relevant geographic information based on user queries (Jones and Purves, 2008). More generally, GIR can refer to retrieving geographic information from any type of data source, such as a structured database. However, studies on GIR usually focus on retrieving geographic information from unstructured data (Larson, 1996), especially from natural language text on the Web (McCurley, 2001; Purves et al., 2007). It is estimated that 13%–15% of Web queries contain place names (Jones et al., 2008b; Sanderson and Kohler, 2004). While GIR is traditionally considered an extension of information retrieval (Baeza-Yates and Ribeiro-Neto, 1999), it has also received a lot of attention from the GIScience community, as demonstrated by the GIR workshop series that began in 2004 (Jones and Purves, 2014) as well as the specialist meeting on Spatial Search held in Santa Barbara in 2015 (Ballatore et al., 2015). It is not difficult to see the inherent connection between GIR and geospatial semantics: in order to retrieve relevant results, it is critical to understand the meaning of both the user queries and the candidate results (Janowicz et al., 2011). One important topic in GIR is place name disambiguation (also called toponym disambiguation), which aims at identifying the actual geographic place a place name refers to. Different places can have the same name (e.g., there are more than 43 populated places in the United States named “Washington”), and one place can have several different names (e.g., California is also called “The Golden State”). Therefore, how can we identify the correct geographic place when a place name shows up in a query or in a Web document? A general strategy for place name disambiguation is to measure the similarity between the current context of the place name (i.e., the surrounding words) and the likely context of each possible candidate place. The likely context of the candidate places (e.g., the persons, organizations, or other places that are likely to be associated with the place) can be extracted from external gazetteers or knowledge bases, such as WordNet (Buscaldi and Rosso, 2008a) and DBpedia (Hu et al., 2014). These likely contexts can also be learned from data sources, such as Wikipedia, in a data-driven fashion (Cucerzan, 2007; Overell and Rüger, 2008). Similarity metrics, such as conceptual density (Agirre and Rigau, 1996) and cosine similarity (Bunescu and Pasca, 2006), can then be employed to quantify the similarity between the surrounding context of the place name and the likely context of each candidate place. While some similarity metrics are based on words and entities, others are based on the geographic distance or overlap between the locations found in the surrounding context and the location of each candidate place (Buscaldi and Rosso, 2008b; Leidner, 2008; Smith and Crane, 2001). More recently, topic modeling techniques, such as Latent Dirichlet Allocation (LDA), have also been introduced to address the problem of place name disambiguation. For example, Ju et al. (2016) proposed an approach that divides the surface of the Earth using a triangular mesh (see Fig. 2 for an illustration).


Fig. 2 A triangular mesh for computing the thematic topics on the surface of the Earth.

Their approach then maps georeferenced Wikipedia texts into these triangles and uses LDA to model the topics of each triangle. These topics are then compared with the target place name for disambiguation. The problem of place name disambiguation is also closely related to geoparsing (Gelernter and Balaji, 2013; Gelernter and Mushegian, 2011; Moncla et al., 2014; Vasardani et al., 2013; Wallgrün et al., 2014), which involves first detecting the occurrences of place names in natural language text and then disambiguating them. Another topic in GIR is ranking candidates based on the input query. Such a task often boils down to computing a matching score between the input query and each candidate result; once the matching scores are computed, the candidates can be ranked. Many queries in GIR can be characterized using the format “<theme> <spatial relationship> <location>” (Jones and Purves, 2008), such as the query “natural disasters in California.” Accordingly, the matching scores can be calculated based on these three components (Mata, 2007). To quantify the similarity between the thematic concepts in the query and those in the candidates, domain ontologies are often employed to identify the shortest distance between two concepts in an ontology (e.g., “natural disaster” and “earthquake”) (Jones et al., 2001). The thematic similarity can also be quantified through keyword expansion, which enriches the input queries, which are typically short, with thematically related terms. External knowledge bases, such as WordNet (Buscaldi et al., 2005; Hu et al., 2015b; Stokes et al., 2008), or data mining approaches, such as latent semantic analysis (Li et al., 2014), can be employed for keyword expansion. Locations can be extracted from Web documents through geoparsing and place name disambiguation and can then be used to spatially index the Web documents (Amitay et al., 2004; Silva et al., 2006; Wang et al., 2005). The extracted locations provide the basis for computing the spatial relationships between the input query and the possible candidates. Spatial similarity metrics, such as those based on minimum bounding boxes or convex hulls (Frontiera et al., 2008), have been used to quantify the spatial relations. In addition to the geometry-based approaches, digital gazetteers, which contain information about places and their relationships, are also used to quantify the similarity of places based on their proximity in the gazetteer graph (Jones et al., 2002; Keßler et al., 2009a). With the similarity scores based on theme, spatial relationship, and location, a final matching score can be computed by combining these scores. To evaluate the performance of different GIR methods, projects such as GeoCLEF (Gey et al., 2005) and the Toponym Resolution Markup Language (Leidner, 2006) establish ground truth by providing standard annotated corpora. The frequently used evaluation metrics include precision, recall, and F1 score (Manning et al., 2008), which are defined as follows:

$$\text{Precision} = \frac{|\text{Retrieved Relevant Results}|}{|\text{All Retrieved Results}|} \tag{1}$$

$$\text{Recall} = \frac{|\text{Retrieved Relevant Results}|}{|\text{All Relevant Results}|} \tag{2}$$

$$\text{F1 score} = 2 \cdot \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3}$$
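As a minimal worked example of Eqs. (1)–(3), the following Python snippet evaluates a hypothetical retrieval run against hypothetical ground truth document ids.

# A minimal worked example of Eqs. (1)-(3) with hypothetical ids.
relevant = {'d1', 'd3', 'd5', 'd8'}    # all relevant results (ground truth)
retrieved = {'d1', 'd2', 'd3', 'd9'}   # all retrieved results

hits = retrieved & relevant            # retrieved relevant results
precision = len(hits) / len(retrieved)              # Eq. (1): 2/4 = 0.5
recall = len(hits) / len(relevant)                  # Eq. (2): 2/4 = 0.5
f1 = 2 * precision * recall / (precision + recall)  # Eq. (3): 0.5

print(precision, recall, f1)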

In addition to retrieving geographic information from the Web, GIR is applied in many other contexts. One major application domain is SDIs. As SDIs usually manage large amounts of data and metadata, it is important to use an effective search method that allows potential data users to quickly find the data they need (Janowicz et al., 2008). Ontologies are often used to associate metadata with external concepts and terminologies to facilitate data discovery. Application examples include the Geosciences Network (Bowers et al., 2004), Linked Environments for Atmospheric Discovery (Droegemeier et al., 2005), and the Virtual Solar Terrestrial Observatory (Fox et al., 2009). Another type of GIR application focuses on modeling the task of the user. Instead of issuing a short keyword query, the user may have a more complicated task.


Modeling the task of the user can help retrieve the information that fits the user's needs. For example, Wiegand and García (2007) proposed to model professional tasks (e.g., emergency response) as ontologies and identify the information that can be paired with the tasks. Hu et al. (2016) modeled the spatiotemporal properties of the daily tasks of an individual (e.g., traveling to the workplace) and identified the geographic information that can help complete the task using information value theory. There are also GIR studies that focus on the design of effective query interfaces (Jones et al., 2002; Purves et al., 2007) as well as the searching and indexing of remote sensing images (Shyu et al., 2007).

1.07.2.4 Geospatial Semantic Web and Linked Data

The Semantic Web was originally proposed by Berners-Lee et al. (2001). It was a vision in which the Web is populated with structured and semantically annotated data that can be consumed not only by human users but also by machines. The Semantic Web can be considered an enhancement of the existing document-based Web, on which the unstructured, natural language contents of Web pages are difficult for machines to understand. For example, a computer program may find it difficult to know that the current Web page is about Washington, DC, and that this page links to another page describing the United States because Washington, DC, is the capital of the United States. The vision of the Semantic Web was quickly embraced by the GIScience community (Egenhofer, 2002) and has influenced the community's thinking on data organization, sharing, reuse, and the answering of complex spatial queries (Hart and Dolbear, 2013; Kuhn et al., 2014). The term geospatial Semantic Web has been frequently used in studies that focus on the geospatial part of the Semantic Web (Bishr, 2006; Fonseca, 2008; Yue, 2013). Realizing the vision of the Semantic Web requires the current Web to be populated with structured and semantically annotated data, and Linked Data has been proposed by the World Wide Web Consortium (W3C) as general guidance for publishing such data (Bizer et al., 2009a; Heath and Bizer, 2011). The term Linked Data has a twofold meaning, and the two senses are often used interchangeably. On the one hand, it refers to four principles for publishing well-structured data, such as using Uniform Resource Identifiers and providing data descriptions readable by both humans and machines. On the other hand, it refers to the data that have been published following these four principles. Since the data to be published can come from a variety of domains with diverse attributes, the Resource Description Framework (RDF) has been employed for organizing and publishing Linked Data. RDF represents data as a subject, a predicate, and an object, and the three together are called a triple (Brickley and Guha, 2000; Hitzler et al., 2009). Different data formats can be used to implement RDF, such as XML, Turtle, RDFa, and JSON-LD (Adams, 2016). From 2007 to 2014, more than 570 data sets (with billions of RDF triples) were published on the current Web (Hu and Janowicz, 2016), forming the LOD cloud, which can be considered a prototypical realization of the Semantic Web vision. Some of these Linked Data sets focus on geographic content, including GeoNames, Linked Geo Data (which contains OpenStreetMap data) (Auer et al., 2009b), and the ADL Gazetteer (see Fig. 3). Other Linked Data sets provide more general content but also contain a significant amount of geographic data, such as DBpedia, the Semantic Web version of Wikipedia (Auer et al., 2007; Bizer et al., 2009b; Lehmann et al., 2015). Some Linked Data sets are contributed by authoritative agencies, such as the US Geological Survey (Usery and Varanka, 2012) and the UK Ordnance Survey (Goodwin et al., 2008). Two broad topics are investigated in the geospatial Semantic Web and Linked Data: (1) how to effectively annotate and publish geospatial content, and (2) how to retrieve the data to answer complex questions. For the first topic, geo-ontologies and ODPs, such as the semantic sensor network ODP (Compton et al., 2012) and the cartographic scale ODP (Carral et al., 2013) discussed previously, have been developed to formalize the semantics of the data.
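A minimal sketch of the subject–predicate–object triple model described above, using the Python rdflib library, is given below; the namespace and resources are hypothetical placeholders rather than identifiers from an actual Linked Data set.

# A minimal sketch of the RDF triple model with rdflib. The namespace
# and resources are hypothetical placeholders.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS

EX = Namespace('http://example.org/geo/')
g = Graph()

# Each statement is a (subject, predicate, object) triple.
g.add((EX.Washington_DC, RDF.type, EX.City))
g.add((EX.Washington_DC, RDFS.label, Literal('Washington, DC')))
g.add((EX.Washington_DC, EX.isCapitalOf, EX.United_States))

print(g.serialize(format='turtle'))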
Software tools, such as Triplify (Auer et al., 2009a), CSV2RDF (Ermilov et al., 2013), and TripleGeo (Patroumpas et al., 2014), have been developed to extract data from traditional data sources (e.g., relational databases) and convert them into RDF. Linked Data servers (also called triplestores), such as Virtuoso (Erling and Mikhailov, 2010) and GraphDB (Bellini et al., 2015), can publish billions of triples and provide some support for geospatial data. Despite these developments, open questions remain, such as how to convert raster data into Linked Data and how to represent the temporal validity of geospatial data triples (Janowicz et al., 2012; Kuhn et al., 2014).

Fig. 3 The Linked Open Data cloud as of 30 Aug. 2014 (http://lod-cloud.net/) and several major geographic data sets.

For retrieving data from the Semantic Web, the SPARQL language (Pérez et al., 2006) is the standard RDF query language recommended by the W3C. Its syntax is similar to that of the Structured Query Language (SQL) widely used in relational databases. However, SPARQL does not directly support queries based on spatial relations (e.g., within and intersects). To address this issue, GeoSPARQL was proposed as an extension of SPARQL and has been endorsed by the Open Geospatial Consortium (Battle and Kolas, 2011). Triplestores, such as Parliament (Battle and Kolas, 2012) and Oracle Spatial and Graph (Perry et al., 2015), have implemented GeoSPARQL to support geospatial queries (a small query sketch is given at the end of this section). In addition to GeoSPARQL, there are other research efforts on supporting spatial and temporal queries, such as the work of Perry et al. (2007) and Gutierrez et al. (2005).

Many other applications make use of the technologies and data models from the geospatial Semantic Web. One type of application is SDIs, which can be considered local geospatial Semantic Webs. The Linked Data principles have been applied to interlinking the metadata and services hosted by an SDI and have facilitated search and resource discovery (Athanasis et al., 2009; Janowicz et al., 2010; Zhang et al., 2010; Zhao et al., 2015). Another type of application uses Linked Data sets as external knowledge bases to support named entity recognition and disambiguation. Examples include DBpedia Spotlight (Mendes et al., 2011) and OpenCalais (Gangemi, 2013), which can identify and extract geographic places and other types of entities (e.g., persons, companies, and universities) from unstructured texts with high accuracy and computational efficiency. There are also Linked Data-driven portals that enable users to interactively explore Linked (Geo) Data by following the links between entities. One example is Spatial@LinkedScience, which hosts bibliographic Linked Data for researchers, papers, and organizations in major GIScience conferences (Keßler et al., 2012). Another example is the portal developed by Mai et al. (2016), which enables users to explore Geoscience and Oceanography Linked Data from tabular, graph, and map views.
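As an illustration of the retrieval workflow described above, the sketch below sends a SPARQL query from Python using the SPARQLWrapper library against DBpedia's public endpoint. The endpoint URL, query, and properties are assumptions for demonstration only; a GeoSPARQL-enabled triplestore would additionally accept spatial filter functions such as geof:sfWithin, which this public endpoint is not assumed to support.

```python
# A minimal sketch of querying Linked Data with SPARQL from Python, using
# the SPARQLWrapper library against DBpedia's public endpoint (an assumed,
# illustrative setup). The query lists a few countries and their capitals.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo:  <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?country ?capitalName WHERE {
        ?country dbo:capital ?capital .
        ?capital rdfs:label  ?capitalName .
        FILTER (lang(?capitalName) = "en")
    }
    LIMIT 5
""")
sparql.setReturnFormat(JSON)

# Each binding maps variable names to {"type": ..., "value": ...} dicts.
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["country"]["value"], "->", row["capitalName"]["value"])
```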

1.07.2.5 Place Semantics

Place is an important concept in GIScience (Winter et al., 2009; Goodchild, 2011). The notion of place is closely associated with human experience and can be differentiated from space (Couclelis, 1992; Fisher and Unwin, 2005; Tuan, 1977); accordingly, not every space can be considered a place. Place plays an indispensable role in human communication, and place names are frequently used in daily conversations (Cresswell, 2014; Winter and Freksa, 2012). Research on place semantics focuses on understanding the meaning of places through human descriptions and human–place interactions. Traditionally, interviews were employed to elicit people's opinions about places (Cresswell, 1996; Montello et al., 2003b). While such approaches shed valuable light on human experience, they are labor-intensive and therefore may not be suitable for studies that involve many places over a multiyear timespan. The emergence of the Web, especially Web 2.0, brings a large volume of data, much of it contributed by general users (Goodchild, 2007; Stoeckl et al., 2007). Place descriptions and human–place interactions are often contained in these Web data, which therefore offer great opportunities for studying places (Purves et al., 2011; Winter et al., 2009). By harvesting place-related data and designing automatic algorithms, we can perform place studies that scale up. This section focuses on such data-driven and computational approaches toward place semantics.

Two types of place-related data can be found on the Web. The first type contains only textual descriptions about places. Examples include place descriptions on city websites (Tomai and Kavouras, 2004b), travel blogs, and Wikipedia articles. This type of data does not contain explicit geographic coordinates, and geoparsing is necessary to extract and locate places (Vasardani et al., 2013). Wikipedia, in particular, is a valuable resource that provides descriptions of a large number of cities and towns around the world. The second type of data contains associations between descriptions and geographic coordinates. Examples include the various location-based social media data, such as geotagged tweets and Flickr photos. With the given link between descriptions and locations, the descriptive texts can be aggregated to the corresponding locations for place studies. Usually, location-based social media data also record the time when a user interacts with a place (e.g., by checking in) and therefore can be used to study human–place interactions. In addition to the wide availability of place-related data, the fast development of NLP techniques and standard tools has significantly boosted research in place semantics. For example, Manning et al. (2014) developed the Stanford CoreNLP toolkit, which requires little programming background from its users. The text mining package in R (the tm package) also lowers the barrier to performing text analysis, especially for researchers who are already familiar with R (Feinerer and Hornik, 2012; Meyer et al., 2008).

Place semantics can be studied from thematic, spatial, and temporal perspectives. The thematic perspective examines human experiences of places through natural language descriptions. In many cases, the thematic topics that people discuss about or at a place are related to the activities the place affords, reflecting Gibson's affordance theory (Gibson, 1982). These thematic topics can be revealed through simple approaches, such as word clouds. Fig. 4A and B shows the top 20 most frequent words from the reviews of two places (stop words, such as "the," "of," and "in," are removed); without any further information, one can easily tell the general place type and the supported activities (a minimal word-counting sketch follows the figure). More advanced NLP methods, such as latent Dirichlet allocation (LDA) (Blei et al., 2003), have also been used in recent years to model thematic topics based on place-related data. For example, Adams et al. (2015) and Adams (2015) proposed LDA-based approaches for extracting thematic topics for places, quantifying place similarity thematically, and searching places based on topics (e.g., finding places associated with the topic beach). Emotions, as part of human experience, can also be extracted through sentiment analysis to understand places thematically (Ballatore and Adams, 2015).

Fig. 4 Word clouds constructed based on the reviews of two places.
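As a hedged sketch of the word-frequency analysis behind word clouds such as Fig. 4, the Python snippet below counts the top 20 non-stop words in a set of place reviews; the tiny review list and stop-word set are placeholder assumptions standing in for real review data.

```python
# A minimal sketch of word-frequency analysis for place reviews: lowercase
# the text, drop stop words, and count the most frequent terms. The reviews
# and the stop-word list are illustrative placeholders, not real data.
import re
from collections import Counter

reviews = [
    "Great coffee and friendly staff, the best espresso in town.",
    "The pastries are fresh and the coffee is excellent.",
]
stop_words = {"the", "of", "in", "and", "a", "is", "are"}

words = []
for review in reviews:
    # Keep alphabetic tokens only and filter out stop words.
    words.extend(w for w in re.findall(r"[a-z]+", review.lower())
                 if w not in stop_words)

# The most common terms hint at the place type and supported activities.
for word, count in Counter(words).most_common(20):
    print(word, count)
```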


Space is another perspective for studying places. This perspective often focuses on representing the vague boundaries of places, a topic that has been discussed in the previous section, Digital Gazetteers. There is also research on spatially representing places based on the surrounding landmarks; examples include the works of Kim et al. (2016) and Zhou et al. (2016). These studies do not attempt to construct geometries (e.g., polygons) or density surfaces to represent the spatial footprints of places; instead, they formalize the spatial relations between the target place and nearby landmarks into a graph representation.

The third perspective on place semantics takes a temporal view. This perspective examines the times when people are more (or less) likely to interact with a place (Ye et al., 2011). For example, McKenzie et al. (2015b) examined the check-in data of Foursquare users at different types of Points of Interest and used the temporal check-in patterns to characterize places. Such temporal patterns also reflect human activities at the corresponding places (e.g., restaurants typically receive a high number of check-ins during lunch and dinner hours) and can be applied to other tasks, such as reverse geocoding (McKenzie and Janowicz, 2015); a toy sketch of such an hourly signature follows at the end of this section.

In addition to these three unit perspectives, place semantics can be studied by combining two or all three of them. For example, by combining space and time, we can examine the spatiotemporal dynamics of places, such as how the vague boundaries of places change over the years. By combining space and theme, we can develop topic maps that visualize the major topics associated with different locations (Kennedy et al., 2007; Rattenbury and Naaman, 2009). By combining time and theme, we can explore the evolution of topics over time, such as the emergence of new topics and the disappearance of old ones. There are also studies that combine space, time, and thematic topics to obtain a more comprehensive understanding of places and the associated events (Hu et al., 2015a; Wang and Stewart, 2015). With the availability of new place-related data sources, such as Yik Yak (McKenzie et al., 2015a), machine learning methods, such as deep learning (LeCun et al., 2015), and data mining tools, such as TensorFlow (Abadi et al., 2015), much more research on place semantics remains to be conducted.
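To illustrate the temporal perspective, the sketch below bins (hypothetical) check-in timestamps into an hourly signature of the kind used to characterize place types; the sample timestamps are labeled placeholders, not real Foursquare data.

```python
# A minimal sketch of a temporal "check-in signature": bin check-in times
# by hour of day and normalize into a 24-bin probability vector. The
# timestamps are illustrative placeholders, not real data.
from collections import Counter
from datetime import datetime

checkins = [
    "2015-03-02 12:15", "2015-03-02 12:40", "2015-03-03 13:05",
    "2015-03-03 19:30", "2015-03-04 20:10", "2015-03-05 12:22",
]

hours = [datetime.strptime(t, "%Y-%m-%d %H:%M").hour for t in checkins]
counts = Counter(hours)
total = sum(counts.values())

# Peaks around lunch and dinner hours would suggest a restaurant.
signature = [counts.get(h, 0) / total for h in range(24)]
print(signature)
```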

1.07.2.6 Cognitive Geographic Concepts and Qualitative Reasoning

Cognitive geographic concepts generally refer to the informal geographic knowledge that people acquire and accumulate through their interactions with the surrounding environment (Golledge and Spector, 1978). Such informal knowledge was termed naïve geography by Egenhofer and Mark (1995) and can be differentiated from the formal geographic knowledge that requires intentional learning and systematic training (Golledge, 2002). The training requirements of formal geographic knowledge can be seen in the specialized concepts and terminologies, such as projected coordinate systems, raster, vector, and map algebra, discussed in many GIS textbooks (Bolstad, 2005; Clarke, 1997; Longley et al., 2001). Since not every GIS user has received formal training, understanding how nonexperts conceptualize geographic concepts can facilitate the design of GIS (Smith and Mark, 1998). Given the focus of this article on geospatial semantics, this section concentrates on people's informal understanding of geographic concepts and spatial relations. It is worth noting, however, that the content discussed here is only part of the field of cognitive geography (Montello, 2009; Montello et al., 2003a), which involves many other topics, such as geovisualization (MacEachren and Kraak, 2001).

Geographic concepts and spatial relations are two types of informal geographic knowledge that people develop in everyday life. Studies on the former often examine the typical examples that people associate with the corresponding geographic concepts. For example, Smith and Mark (2001) found that nonexpert individuals usually think of entities in the physical environment (e.g., mountains and rivers) when asked to give examples of geographic features or objects, whereas they are more likely to answer with social or built features (e.g., roads and cities) when asked for things that could be portrayed on a map. Such typical examples can be explained by the prototype theory of Rosch (1973) and Rosch and Lloyd (1978) in psychology, in which some members are better examples of a category than others; a robin, for instance, is generally considered a better example of the category bird than a penguin. There is value in understanding the typical examples of geographic concepts. For instance, it can help increase the precision of GIR by identifying the default geographic entities that are more likely to match the search terms of a user. Besides, it has been found that different communities, especially communities speaking different languages, may establish their own conceptual systems (Mark and Turk, 2003). Understanding these differences in the conceptualization of geographic concepts can help develop GIS that better fit local needs (Smith and Mark, 2001).

Spatial relations are another type of informal geographic knowledge that we acquire by interacting with the environment. Fig. 5 provides an example illustrating the spatial relations that a person may develop for different places near the campus of the University of California, Santa Barbara. Such spatial relations are qualitative: we may know the general locations and directions of these places but not the exact distances between them (e.g., the distance between Costco and Camino Cinemas in meters is unknown to the person). Yet, these informal spatial relations are useful and sufficient for many of our daily tasks, such as wayfinding and route description (Brosset et al., 2007; Klippel and Winter, 2005; Klippel et al., 2005; Montello, 1998).
Fig. 5 Qualitative relations a person may develop for the places around the University of California Santa Barbara.

In addition, these relations are convenient to acquire, since we do not always carry a ruler to measure the exact distances and angles between objects. These informal spatial relations also provide an abstraction away from quantitative details and are not restricted to a set of specific values: the relation A is to the west of B, for example, can represent an infinite number of configurations of A and B, as long as they satisfy this relative spatial constraint (Freksa, 1991; Gelsey and McDermott, 1990).

People's informal conceptualizations of geographic concepts and spatial relations can be formally and computationally modeled to support qualitative reasoning. The term qualitative reasoning should be differentiated from the term qualitativeness, which may imply descriptive rather than analytical methods (Egenhofer and Mark, 1995). Spatial calculi can be employed to encode spatial relations (Renz and Nebel, 2007), such as mereotopology (Clarke, 1981), the 9-intersection relations (Egenhofer, 1991; Egenhofer and Franzosa, 1991), the double-cross calculus (Freksa, 1992), the region connection calculus (Randell et al., 1992), the flip-flop calculus (Ligozat, 1993), and the cardinal direction calculus (Frank, 1996). In addition to spatial relations, temporal relations, such as the relative relations between events, can also be formally represented using, for example, the interval algebra proposed by Allen (1983).

Algorithmically, informal knowledge can be modeled as a graph whose nodes represent geographic concepts (and their typical instances) and whose edges represent their spatial relations. If we restrict the nodes to place instances, we can derive a place graph (a toy sketch follows at the end of this section). This graph representation is fundamentally different from the Cartesian coordinates and geometric rules widely adopted in existing GIS and could become the foundation of place-based GIS (Goodchild, 2011). Platial operations, as counterparts of spatial operations (Gao et al., 2013), could then be developed by reusing and extending existing graph-based algorithms as well as designing new ones. Constructing such a geographic knowledge graph can be challenging, since different individuals often conceptualize places and spatial relations differently. However, this challenge also brings the opportunity to design more personalized GIS that support the tasks of individuals (Abdalla and Frank, 2011; Abdalla et al., 2013). Qualitative reasoning and place-based GIS should not be seen as replacements for quantitative reasoning and geometry-based GIS (Egenhofer and Mark, 1995); instead, they complement existing methods and systems and should be used when the application context is appropriate.
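The snippet below sketches such a place graph with the networkx library, encoding a few qualitative relations of the kind shown in Fig. 5. The place names and the specific relations are illustrative assumptions; a real system would derive them from elicited or mined place descriptions.

```python
# A minimal sketch of a place graph: nodes are place instances and directed,
# labeled edges are qualitative spatial relations. The places and relations
# below are illustrative assumptions, not data from the article.
import networkx as nx

G = nx.DiGraph()
G.add_edge("Costco", "UCSB campus", relation="north_of")
G.add_edge("Camino Cinemas", "UCSB campus", relation="east_of")
G.add_edge("Goleta Beach", "UCSB campus", relation="adjacent_to")

# A toy "platial" query: list every place recorded as related to the campus.
for place, _, data in G.in_edges("UCSB campus", data=True):
    print(f"{place} is {data['relation'].replace('_', ' ')} UCSB campus")
```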

1.07.3 Summary and Outlook

This article has provided a synthetic review of the research related to geospatial semantics. Specifically, six major areas were identified and discussed: semantic interoperability, digital gazetteers, GIR, the geospatial Semantic Web, place semantics, and cognitive geographic concepts. For organizational purposes, each research area was discussed in a separate section. However, these areas are interconnected and can be involved simultaneously in a single study. For example, research on the geospatial Semantic Web usually also employs ontologies to facilitate semantic interoperability. Similarly, digital gazetteers are widely used in GIR and place semantics to extract and disambiguate place names. In addition, understanding naïve geographic concepts is important for developing ontologies that can be agreed on by multiple communities.

The six major research areas share a common core, namely understanding the meaning of geographic information. Such an understanding brings its own value to geospatial research and applications. For example, constructing ontologies, gazetteers, and Linked Data can help machines process geographic information more effectively and extract knowledge more efficiently. Examining place semantics can help people quickly grasp the meaning of places from large amounts of text. Modeling cognitive geographic concepts and designing intelligent GIR algorithms can help machines understand how people think, thus facilitating the interaction between GIS and their users. In sum, geospatial semantics offers a unique semantic perspective toward advancing GIScience research.

Other topics in geospatial semantics can be investigated in the near future. For example, the studies reviewed in this article focus on English data, yet a large amount of geographic information is collected and recorded in other languages. Research efforts, such as those of Ouksel and Sheth (1999), Mark and Turk (2003), Nowak et al. (2005), and Mata-Rivera et al. (2015), are important for understanding and processing multilingual and multicultural data sets. In addition, there can be more applications of high-performance computing (HPC) in geospatial semantics: existing HPC frameworks (Yang et al., 2011; Wang, 2010) can be integrated with semantic methods to process the large volumes of geospatial and textual data. Given the richness of the existing literature, this article cannot cover all related studies; hopefully, however, it can serve as an entry point for exploring the world of geospatial semantics.


References

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., 2015. TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org.
Abdalla, A., Frank, A.U., 2011. Personal geographic information management. In: Proceedings of the Workshop on Cognitive Engineering for Mobile GIS, Belfast.
Abdalla, A., Weiser, P., Frank, A.U., 2013. Design principles for spatio-temporally enabled PIM tools: A qualitative analysis of trip planning. In: Vandenbroucke, D., Bucher, B., Crompvoets, J. (Eds.), Geographic information science at the heart of Europe. Springer International Publishing, Dordrecht.
Adams, B., 2015. Finding similar places using the observation-to-generalization place model. Journal of Geographical Systems 17, 137–156.
Adams, B., 2016. Wahi, a discrete global grid gazetteer built using linked open data. International Journal of Digital Earth 1, 1–14.
Adams, B., McKenzie, G., 2013. Inferring thematic places from spatially referenced natural language descriptions. In: Sui, D.Z., Elwood, S., Goodchild, M.F. (Eds.), Crowdsourcing geographic knowledge: Volunteered Geographic Information (VGI) in theory and practice. Springer, Dordrecht.
Adams, B., McKenzie, G., Gahegan, M., 2015. Frankenplace: Interactive thematic mapping for ad hoc exploratory search. In: Proceedings of the 24th International Conference on World Wide Web, pp. 12–22. ACM, New York.
Agarwal, P., 2005. Ontological considerations in GIScience. International Journal of Geographical Information Science 19, 501–536.
Agirre, E., Rigau, G., 1996. Word sense disambiguation using conceptual density. In: Proceedings of the 16th Conference on Computational Linguistics, Vol. 1. Association for Computational Linguistics, Stroudsburg, pp. 16–22.
Alameh, N., 2003. Chaining geographic information web services. IEEE Internet Computing 7, 22–29.
Alani, H., Jones, C.B., Tudhope, D., 2001. Voronoi-based region approximation for geographical information retrieval with gazetteers. International Journal of Geographical Information Science 15, 287–306.
Allen, J.F., 1983. Maintaining knowledge about temporal intervals. Communications of the ACM 26, 832–843.
Amitay, E., Har’el, N., Sivan, R., Soffer, A., 2004. Web-a-where: Geotagging web content. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 273–280. ACM, New York.
Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., 2000. Gene ontology: Tool for the unification of biology. Nature Genetics 25, 25–29.
Athanasis, N., Kalabokidis, K., Vaitis, M., Soulakellis, N., 2009. Towards a semantics-based approach in the development of geographic portals. Computers & Geosciences 35, 301–308.
Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z., 2007. DBpedia: A nucleus for a web of open data. In: The semantic web. Springer, Berlin.
Auer, S., Dietzold, S., Lehmann, J., Hellmann, S., Aumueller, D., 2009a. Triplify: Light-weight linked data publication from relational databases. In: Proceedings of the 18th International Conference on World Wide Web. ACM, Madrid, pp. 621–630.
Auer, S., Lehmann, J., Hellmann, S., 2009b. LinkedGeoData: Adding a spatial dimension to the web of data. In: International Semantic Web Conference. Springer, Heidelberg, pp. 731–746.
Aumueller, D., Do, H.-H., Massmann, S., Rahm, E., 2005. Schema and ontology matching with COMA++. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. ACM, Baltimore, pp. 906–908.
Baeza-Yates, R., Ribeiro-Neto, B., 1999. Modern information retrieval. ACM Press, New York.
Baglioni, M., Masserotti, M.V., Renso, C., Spinsanti, L., 2007. Building geospatial ontologies from geographical databases. In: International Conference on GeoSpatial Semantics, pp. 195–209. Springer, Heidelberg.
Ballatore, A., Adams, B., 2015. Extracting place emotions from travel blogs. In: Proceedings of AGILE, Lisbon, pp. 1–5.
Ballatore, A., Hegarty, M., Kuhn, W., Parsons, E., 2015. Spatial search. Final report.
Battle, R., Kolas, D., 2011. GeoSPARQL: Enabling a geospatial semantic web. Semantic Web Journal 3, 355–370.
Battle, R., Kolas, D., 2012. Enabling the geospatial semantic web with Parliament and GeoSPARQL. Semantic Web 3, 355–370.
Bellini, P., Nesi, P., Pantaleo, G., 2015. Benchmarking RDF stores for smart city services. In: 2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity), pp. 46–49. IEEE, New York.
Berners-Lee, T., Hendler, J., Lassila, O., 2001. The semantic web. Scientific American 284, 28–37.
Bishr, M., Kuhn, W., 2007. Geospatial information bottom-up: A matter of trust and semantics. In: Fabrikant, S.I., Wachowicz, M. (Eds.), The European information society: Leading the way with geo-information. Springer, Berlin.
Bishr, Y., 1998. Overcoming the semantic and other barriers to GIS interoperability. International Journal of Geographical Information Science 12, 299–314.
Bishr, Y., 2006. Geospatial semantic web. In: Rana, S., Sharma, J. (Eds.), Frontiers of geographic information technology. Springer, Heidelberg.
Bittner, T., Donnelly, M., Winter, S., 2005. Ontology and semantic interoperability. In: Prosperi, D., Zlatanova, S. (Eds.), Large-scale 3D data integration: Challenges and opportunities. CRC Press, Boca Raton, pp. 139–160.
Bizer, C., Heath, T., Berners-Lee, T., 2009a. Linked data: The story so far. International Journal on Semantic Web and Information Systems (IJSWIS) 5 (3), 1–22.
Bizer, C., Heath, T., Idehen, K., Berners-Lee, T., 2008. Linked data on the web (LDOW2008). In: Proceedings of the 17th International Conference on World Wide Web. ACM, New York, pp. 1265–1266.
Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., Hellmann, S., 2009b. DBpedia: A crystallization point for the Web of Data. Web Semantics: Science, Services and Agents on the World Wide Web 7, 154–165.
Blei, D.M., Ng, A.Y., Jordan, M.I., 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993–1022.
Bolstad, P., 2005. GIS fundamentals: A first text on Geographic Information Systems. Eider Press, White Bear Lake, MN.
Bowers, S., Lin, K., Ludascher, B., 2004. On integrating scientific resources through semantic registration. In: Proceedings of the 16th International Conference on Scientific and Statistical Database Management, pp. 349–352. IEEE, New York.
Brauner, D.F., Casanova, M.A., Milidiú, R.L., 2007. Towards gazetteer integration through an instance-based thesauri mapping approach. In: Advances in Geoinformatics. Springer, Berlin.
Brickley, D., Guha, R.V., 2000. Resource Description Framework (RDF) Schema Specification 1.0: W3C Candidate Recommendation, 27 March 2000.
Brodaric, B., 2004. The design of GSC FieldLog: Ontology-based software for computer aided geological field mapping. Computers & Geosciences 30, 5–20.
Brosset, D., Claramunt, C., Saux, E., 2007. A location and action-based model for route descriptions. In: International Conference on GeoSpatial Semantics. Springer, Heidelberg, pp. 146–159.
Buitelaar, P., Cimiano, P., Magnini, B., 2005. Ontology learning from text: An overview. Ontology Learning from Text: Methods, Evaluation and Applications 123, 3–12.
Buitelaar, P., Olejnik, D., Sintek, M., 2004. A Protégé plug-in for ontology extraction from text based on linguistic analysis. In: European Semantic Web Symposium, pp. 31–44. Springer, Heraklion.
Bunescu, R.C., Pasca, M., 2006. Using encyclopedic knowledge for named entity disambiguation. EACL 6, 9–16.
Burrough, P.A., Frank, A., 1996. Geographic objects with indeterminate boundaries. CRC Press, Boca Raton, FL.
Buscaldi, D., Rosso, P., 2008a. A conceptual density-based approach for the disambiguation of toponyms. International Journal of Geographical Information Science 22, 301–313.
Buscaldi, D., Rosso, P., 2008b. Map-based vs. knowledge-based toponym disambiguation. In: Proceedings of the 2nd International Workshop on Geographic Information Retrieval, pp. 19–22. ACM, New York.


Buscaldi, D., Rosso, P., Arnal, E.S., 2005. Using the WordNet ontology in the GeoCLEF geographical information retrieval task. In: Workshop of the Cross-Language Evaluation Forum for European Languages, pp. 939–946. Springer, Berlin.
Carral, D., Scheider, S., Janowicz, K., Vardeman, C., Krisnadhi, A.A., Hitzler, P., 2013. An ontology design pattern for cartographic map scaling. In: Extended Semantic Web Conference, pp. 76–93. Springer, Berlin.
Chen, J., Shaw, S.-L., 2016. Representing the spatial extent of places based on Flickr photos with a representativeness-weighted kernel density estimation. In: International Conference on Geographic Information Science, pp. 130–144. Springer, Berlin.
Clarke, B.L., 1981. A calculus of individuals based on connection. Notre Dame Journal of Formal Logic 22, 204–219.
Clarke, K.C., 1997. Getting started with geographic information systems. Prentice Hall, Upper Saddle River, NJ.
Cohn, A.G., Gotts, N.M., 1996. The ‘egg-yolk’ representation of regions with indeterminate boundaries. Geographic Objects with Indeterminate Boundaries 2, 171–187.
Compton, M., Barnaghi, P., Bermudez, L., García-Castro, R., Corcho, O., Cox, S., Graybeal, J., Hauswirth, M., Henson, C., Herzog, A., 2012. The SSN ontology of the W3C semantic sensor network incubator group. Web Semantics: Science, Services and Agents on the World Wide Web 17, 25–32.
Couclelis, H., 1992. Location, place, region, and space. Geography’s Inner Worlds 2, 215–233.
Couclelis, H., 2009. Ontology, epistemology, teleology: Triangulating geographic information science. In: Navratil, G. (Ed.), Research trends in geographic information science. Springer, Berlin.
Couclelis, H., 2010. Ontologies of geographic information. International Journal of Geographical Information Science 24, 1785–1809.
Cresswell, T., 1996. In place/out of place: Geography, ideology, and transgression. University of Minnesota Press, Minneapolis.
Cresswell, T., 2014. Place: An introduction. John Wiley & Sons, New York, NY.
Cruz, I.F., Sunna, W., Chaudhry, A., 2004. Semi-automatic ontology alignment for geospatial data integration. In: International Conference on Geographic Information Science. Springer, Heidelberg, pp. 51–66.
Cruz, I.F., Sunna, W., Makar, N., Bathala, S., 2007. A visual tool for ontology alignment to enable geospatial interoperability. Journal of Visual Languages & Computing 18, 230–254.
Cucerzan, S., 2007. Large-scale named entity disambiguation based on Wikipedia data. In: EMNLP-CoNLL, Prague, pp. 708–716.
Davies, C., Holt, I., Green, J., Harding, J., Diamond, L., 2009. User needs and implications for modelling vague named places. Spatial Cognition & Computation 9, 174–194.
Di Donato, P., 2010. Geospatial semantics: A critical review. In: International Conference on Computational Science and Its Applications. Springer, Heidelberg, pp. 528–544.
Droegemeier, K.K., Chandrasekar, V., Clark, R., Gannon, D., Graves, S., Joseph, E., Ramamurthy, M., Wilhelmson, R., Brewster, K., Domenico, B., 2005. Linked environments for atmospheric discovery (LEAD): Architecture, technology roadmap and deployment strategy. In: 21st Conference on Interactive Information Processing Systems for Meteorology, Oceanography, and Hydrology, San Diego, CA.
Duckham, M., Kulik, L., Worboys, M., Galton, A., 2008. Efficient generation of simple polygons for characterizing the shape of a set of points in the plane. Pattern Recognition 41, 3224–3236.
Egenhofer, M.J., 1991. Reasoning about binary topological relations. In: Symposium on Spatial Databases. Springer, Berlin and Heidelberg, pp. 141–160.
Egenhofer, M.J., 2002. Toward the semantic geospatial web. In: Proceedings of the 10th ACM International Symposium on Advances in Geographic Information Systems. ACM, New York, pp. 1–4.
Egenhofer, M.J., Franzosa, R.D., 1991. Point-set topological spatial relations. International Journal of Geographical Information Systems 5, 161–174.
Egenhofer, M.J., Mark, D.M., 1995. Naive geography. In: International Conference on Spatial Information Theory. Springer, Berlin, pp. 1–15.
Erling, O., Mikhailov, I., 2010. Virtuoso: RDF support in a native RDBMS. In: Virgilio, R.D., Giunchiglia, F., Tanca, L. (Eds.), Semantic web information management. Springer, Heidelberg.
Ermilov, I., Auer, S., Stadler, C., 2013. CSV2RDF: User-driven CSV to RDF mass conversion framework. In: Proceedings of the ISEM, Graz.
Euzenat, J., 2004. An API for ontology alignment. In: International Semantic Web Conference. Springer, Heidelberg, pp. 698–712.
Fallahi, G.R., Frank, A.U., Mesgari, M.S., Rajabifard, A., 2008. An ontological structure for semantic interoperability of GIS and environmental modeling. International Journal of Applied Earth Observation and Geoinformation 10, 342–357.
Feinerer, I., Hornik, K., 2012. tm: Text mining package. R package version 0.5-7.1.
Feng, C.-C., Bittner, T., Flewelling, D.M., 2004. Modeling surface hydrology concepts with endurance and perdurance. In: International Conference on Geographic Information Science. Springer, Heidelberg, pp. 67–80.
Fisher, P., Unwin, D., 2005. Re-presenting geographical information systems. In: Unwin, D., Fisher, P. (Eds.), Re-presenting GIS. Wiley, London, pp. 1–17.
Fonseca, F., 2008. Geospatial semantic web. In: Encyclopedia of GIS. Springer, Berlin.
Fonseca, F., Câmara, G., Miguel Monteiro, A., 2006. A framework for measuring the interoperability of geo-ontologies. Spatial Cognition and Computation 6, 309–331.
Fonseca, F.T., Egenhofer, M.J., 1999. Ontology-driven geographic information systems. In: Proceedings of the 7th ACM International Symposium on Advances in Geographic Information Systems. ACM, Kansas City, pp. 14–19.
Fonseca, F.T., Egenhofer, M.J., Agouris, P., Câmara, G., 2002. Using ontologies for integrated geographic information systems. Transactions in GIS 6, 231–257.
Fonseca, F.T., Egenhofer, M.J., Davis, C., Borges, K.A., 2000. Ontologies and knowledge sharing in urban GIS. Computers, Environment and Urban Systems 24, 251–272.
Fox, P., McGuinness, D.L., Cinquini, L., West, P., Garcia, J., Benedict, J.L., Middleton, D., 2009. Ontology-supported scientific data frameworks: The virtual solar-terrestrial observatory experience. Computers & Geosciences 35, 724–738.
Frank, A.U., 1996. Qualitative spatial reasoning: Cardinal directions as an example. International Journal of Geographical Information Science 10, 269–290.
Frank, A.U., 1997. Spatial ontology: A geographical information point of view. In: Stock, O. (Ed.), Spatial and temporal reasoning. Kluwer, Dordrecht.
Frank, A.U., 2001. Tiers of ontology and consistency constraints in geographical information systems. International Journal of Geographical Information Science 15, 667–678.
Frank, A.U., 2003. Ontology for spatio-temporal databases. In: Spatio-temporal databases. Springer, Berlin.
Freksa, C., 1991. Qualitative spatial reasoning. In: Mark, D.M., Frank, A.U. (Eds.), Cognitive and linguistic aspects of geographic space. Kluwer Academic Press, Dordrecht, pp. 361–372.
Freksa, C., 1992. Using orientation information for qualitative spatial reasoning. In: Theories and methods of spatio-temporal reasoning in geographic space. Springer, Berlin, pp. 162–178.
Frontiera, P., Larson, R., Radke, J., 2008. A comparison of geometric approaches to assessing spatial similarity for GIR. International Journal of Geographical Information Science 22, 337–360.
Gangemi, A., 2005. Ontology design patterns for semantic web content. In: International Semantic Web Conference, pp. 262–276. Springer, Heidelberg.
Gangemi, A., 2013. A comparison of knowledge extraction tools for the semantic web. In: Extended Semantic Web Conference, pp. 351–366. Springer, Heidelberg.
Gangemi, A., Presutti, V., 2009. Ontology design patterns. In: Staab, S., Studer, R. (Eds.), Handbook on ontologies. Springer, Berlin, pp. 221–243.
Gao, S., Janowicz, K., McKenzie, G., Li, L., 2013. Towards platial joins and buffers in place-based GIS. In: COMP@SIGSPATIAL, pp. 42–49. ACM, New York.
Gao, S., Li, L., Li, W., Janowicz, K., Zhang, Y., 2017. Constructing gazetteers from volunteered big geo-data based on Hadoop. Computers, Environment and Urban Systems 61, 172–186.
Gärdenfors, P., 2004. How to make the semantic web more semantic. In: Varzi, A.C., Vieu, L. (Eds.), Formal ontology in information systems. IOS Press, Amsterdam, pp. 19–36.
Gates, A.Q., Keller, G.R., Salayandia, L., Da Silva, P.P., Salcedo, F., 2007. The gravity data ontology: Laying the foundation for workflow-driven ontologies. In: International Conference on GeoSpatial Semantics, pp. 278–287. Springer, Berlin.
Gelernter, J., Balaji, S., 2013. An algorithm for local geoparsing of microtext. GeoInformatica 17, 635–667.
Gelernter, J., Mushegian, N., 2011. Geo-parsing messages from microtext. Transactions in GIS 15, 753–773.


Gelsey, A., McDermott, D., 1990. Spatial reasoning about mechanisms. In: Chen, S. (Ed.), Advances in spatial reasoning. Ablex Publishing Corporation, Norwood, NJ.
Gey, F., Larson, R., Sanderson, M., Joho, H., Clough, P., Petras, V., 2005. GeoCLEF: The CLEF 2005 cross-language geographic information retrieval track overview. In: Workshop of the Cross-Language Evaluation Forum for European Languages. Springer, Berlin and Heidelberg, pp. 908–919.
Gibson, E.J., 1982. The concept of affordances in development: The renascence of functionalism. In: Collings, W.A. (Ed.), The concept of development: The Minnesota symposia on child psychology. Lawrence Erlbaum, Hillsdale, NJ, pp. 55–81.
Golledge, R.G., 2002. The nature of geographic knowledge. Annals of the Association of American Geographers 92, 1–14.
Golledge, R.G., Spector, A.N., 1978. Comprehending the urban environment: Theory and practice. Geographical Analysis 10, 403–426.
Goodchild, M.F., 2007. Citizens as sensors: The world of volunteered geography. GeoJournal 69, 211–221.
Goodchild, M.F., 2011. Formalizing place in geographic information systems. In: Burton, L., Kemp, S., Leung, M.C., Matthews, S., Takeuchi, D. (Eds.), Communities, neighborhoods, and health. Springer, New York.
Goodchild, M.F., Hill, L.L., 2008. Introduction to digital gazetteer research. International Journal of Geographical Information Science 22, 1039–1044.
Goodchild, M.F., Hunter, G.J., 1997. A simple positional accuracy measure for linear features. International Journal of Geographical Information Science 11, 299–306.
Goodwin, J., Dolbear, C., Hart, G., 2008. Geographical linked data: The administrative geography of Great Britain on the semantic web. Transactions in GIS 12, 19–30.
Grenon, P., Smith, B., 2004. SNAP and SPAN: A prolegomena to geodynamic ontology. Spatial Cognition & Computation 4 (1), 69–104.
Grothe, C., Schaab, J., 2009. Automated footprint generation from geotags with kernel density estimation and support vector machines. Spatial Cognition & Computation 9, 195–211.
Gruber, T.R., 1993. A translation approach to portable ontology specifications. Knowledge Acquisition 5, 199–220.
Guarino, N., 1997. Some organizing principles for a unified top-level ontology. In: AAAI Spring Symposium on Ontological Engineering, pp. 57–63. AAAI Press, Menlo Park.
Guarino, N., 1998. Formal ontology and information systems. In: Proceedings of FOIS, pp. 3–15. IOS Press, Amsterdam.
Gutierrez, C., Hurtado, C., Vaisman, A., 2005. Temporal RDF. In: European Semantic Web Conference, pp. 93–107. Springer, Berlin.
Hakimpour, F., Timpf, S., 2001. Using ontologies for resolution of semantic heterogeneity in GIS. In: Proceedings of the 4th AGILE Conference on Geographic Information Science, Brno, pp. 385–395.
Hart, G., Dolbear, C., 2013. Linked data: A geographic perspective. CRC Press, Boca Raton.
Harvey, F., Kuhn, W., Pundt, H., Bishr, Y., Riedemann, C., 1999. Semantic interoperability: A central issue for sharing geographic information. The Annals of Regional Science 33, 213–232.
Hastings, J., 2008. Automated conflation of digital gazetteer data. International Journal of Geographical Information Science 22, 1109–1127.
Heath, T., Bizer, C., 2011. Linked data: Evolving the web into a global data space. Synthesis Lectures on the Semantic Web: Theory and Technology 1, 1–136.
Hess, G.N., Iochpe, C., Ferrara, A., Castano, S., 2007. Towards effective geographic ontology matching. In: International Conference on GeoSpatial Semantics. Springer, Berlin and Heidelberg, pp. 51–65.
Hill, L.L., 2000. Core elements of digital gazetteers: Placenames, categories, and footprints. In: International Conference on Theory and Practice of Digital Libraries, pp. 280–290. Springer, Berlin.
Hitzler, P., Krötzsch, M., Rudolph, S., 2009. Foundations of semantic web technologies. CRC Press, Boca Raton.
Hollenstein, L., Purves, R., 2010. Exploring place through user-generated content: Using Flickr tags to describe city cores. Journal of Spatial Information Science 2010, 21–48.
Hu, W., Qu, Y., 2008. Falcon-AO: A practical ontology matching system. Web Semantics: Science, Services and Agents on the World Wide Web 6, 237–239.
Hu, Y., Gao, S., Janowicz, K., Yu, B., Li, W., Prasad, S., 2015a. Extracting and understanding urban areas of interest using geotagged photos. Computers, Environment and Urban Systems 54, 240–254.
Hu, Y., Janowicz, K., 2016. Enriching top-down geo-ontologies using bottom-up knowledge mined from linked data. In: Advancing Geographic Information Science: The Past and Next Twenty Years, pp. 183–198. GSDI Association Press.
Hu, Y., Janowicz, K., Carral, D., Scheider, S., Kuhn, W., Berg-Cross, G., Hitzler, P., Dean, M., Kolas, D., 2013. A geo-ontology design pattern for semantic trajectories. In: International Conference on Spatial Information Theory, pp. 438–456. Springer, Berlin.
Hu, Y., Janowicz, K., Chen, Y., 2016. Task-oriented information value measurement based on space-time prisms. International Journal of Geographical Information Science 30, 1228–1249.
Hu, Y., Janowicz, K., Prasad, S., 2014. Improving Wikipedia-based place name disambiguation in short texts using structured data from DBpedia. In: Proceedings of the 8th Workshop on Geographic Information Retrieval, p. 8. ACM, New York.
Hu, Y., Janowicz, K., Prasad, S., Gao, S., 2015b. Metadata topic harmonization and semantic search for linked-data-driven geoportals: A case study using ArcGIS Online. Transactions in GIS 19, 398–416.
Hu, Y., McKenzie, G., Janowicz, K., Gao, S., 2015c. Mining human-place interaction patterns from location-based social networks to enrich place categorization systems. In: Proceedings of the Workshop on Cognitive Engineering for Spatial Information Processes at COSIT 2015, Santa Fe.
Intagorn, S., Lerman, K., 2011. Learning boundaries of vague places from noisy annotations. In: Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 425–428. ACM, New York.
Janowicz, K., 2012. Observation-driven geo-ontology engineering. Transactions in GIS 16, 351–374.
Janowicz, K., Keßler, C., 2008. The role of ontology in improving gazetteer interaction. International Journal of Geographical Information Science 22, 1129–1157.
Janowicz, K., Raubal, M., Kuhn, W., 2011. The semantics of similarity in geographic information retrieval. Journal of Spatial Information Science 2011, 29–57.
Janowicz, K., Schade, S., Bröring, A., Keßler, C., Maué, P., Stasch, C., 2010. Semantic enablement for spatial data infrastructures. Transactions in GIS 14, 111–129.
Janowicz, K., Scheider, S., Adams, B., 2013. A geo-semantics flyby. In: Reasoning web: Semantic technologies for intelligent data access. Springer, Berlin and Heidelberg.
Janowicz, K., Scheider, S., Pehle, T., Hart, G., 2012. Geospatial semantics and linked spatiotemporal data: Past, present, and future. Semantic Web 3, 321–332.
Janowicz, K., Wilkes, M., Lutz, M., 2008. Similarity-based information retrieval and its role within spatial data infrastructures. In: Cova, T.J., Miller, J.J., Beard, K., Frank, A.U., Goodchild, M.F. (Eds.), International Conference on Geographic Information Science. Springer, Berlin, pp. 151–167.
Jones, C.B., Alani, H., Tudhope, D., 2001. Geographical information retrieval with ontologies of place. In: International Conference on Spatial Information Theory, pp. 322–335. Springer, Berlin.
Jones, C.B., Purves, R., Ruas, A., Sanderson, M., Sester, M., Van Kreveld, M., Weibel, R., 2002. Spatial information retrieval and geographical ontologies: An overview of the SPIRIT project. In: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 387–388. ACM, New York.
Jones, C.B., Purves, R.S., 2008. Geographical information retrieval. International Journal of Geographical Information Science 22, 219–228.
Jones, C.B., Purves, R.S., 2014. GIR’13 workshop report: 7th ACM SIGSPATIAL workshop on geographic information retrieval, Orlando, Florida, USA, 5th November 2013. SIGSPATIAL Special 6, 14.
Jones, C.B., Purves, R.S., Clough, P.D., Joho, H., 2008a. Modelling vague places with knowledge from the Web. International Journal of Geographical Information Science 22, 1045–1065.
Jones, R., Zhang, W.V., Rey, B., Jhala, P., Stipp, E., 2008b. Geographic intention and modification in web search. International Journal of Geographical Information Science 22, 229–246.
Ju, Y., Adams, B., Janowicz, K., Hu, Y., Yan, B., McKenzie, G., 2016. Things and strings: Improving place name disambiguation from short texts by combining entity co-occurrence with topic modeling. In: 20th International Conference on Knowledge Engineering and Knowledge Management, 19–23 November 2016, Bologna, Italy.
Kennedy, L., Naaman, M., Ahern, S., Nair, R., Rattenbury, T., 2007. How Flickr helps us make sense of the world: Context and content in community-contributed media collections. In: Proceedings of the 15th ACM International Conference on Multimedia. ACM, New York, pp. 631–640.


Keßler, C., Farmer, C.J., 2015. Querying and integrating spatial–temporal information on the Web of Data via time geography. Web Semantics: Science, Services and Agents on the World Wide Web 35, 25–34.
Keßler, C., Janowicz, K., Bishr, M., 2009a. An agenda for the next generation gazetteer: Geographic information contribution and retrieval. In: Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 91–100. ACM, New York.
Keßler, C., Janowicz, K., Kauppinen, T., 2012. Spatial@LinkedScience: Exploring the research field of GIScience with linked data. In: International Conference on Geographic Information Science. Springer, Berlin, pp. 102–115.
Keßler, C., Maué, P., Heuer, J.T., Bartoschek, T., 2009b. Bottom-up gazetteers: Learning from the implicit semantics of geotags. In: International Conference on GeoSpatial Semantics, pp. 83–102. Springer, Berlin.
Kim, J., Vasardani, M., Winter, S., 2016. Similarity matching for integrating spatial information extracted from place descriptions. International Journal of Geographical Information Science 31, 1–25.
Klippel, A., Tappe, H., Kulik, L., Lee, P.U., 2005. Wayfinding choremes: A language for modeling conceptual route knowledge. Journal of Visual Languages & Computing 16, 311–329.
Klippel, A., Winter, S., 2005. Structural salience of landmarks for route directions. In: International Conference on Spatial Information Theory. Springer, Berlin and Heidelberg, pp. 347–362.
Krisnadhi, A., Hu, Y., Janowicz, K., Hitzler, P., Arko, R., Carbotte, S., Chandler, C., Cheatham, M., Fils, D., Finin, T., 2015a. The GeoLink framework for pattern-based linked data integration. In: Proceedings of the ISWC 2015 Posters & Demonstrations Track, 14th International Semantic Web Conference, Bethlehem.
Krisnadhi, A., Hu, Y., Janowicz, K., Hitzler, P., Arko, R., Carbotte, S., Chandler, C., Cheatham, M., Fils, D., Finin, T., 2015b. The GeoLink modular oceanography ontology. In: International Semantic Web Conference, pp. 301–309. Springer, Berlin.
Kuhn, W., 2003. Semantic reference systems. International Journal of Geographical Information Science 17, 405–409.
Kuhn, W., 2005. Geospatial semantics: Why, of what, and how? Journal on Data Semantics III, LNCS 3534, 1–24.
Kuhn, W., Kauppinen, T., Janowicz, K., 2014. Linked data: A paradigm shift for geographic information science. In: International Conference on Geographic Information Science, pp. 173–186. Springer, Berlin.
Lambrix, P., Tan, H., 2006. SAMBO: A system for aligning and merging biomedical ontologies. Web Semantics: Science, Services and Agents on the World Wide Web 4, 196–206.
Larson, R.R., 1996. Geographic information retrieval and spatial browsing. In: Geographic Information Systems and Libraries: Patrons, Maps, and Spatial Information [papers presented at the 1995 Clinic on Library Applications of Data Processing, April 10–12, 1995]. Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign.
LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521, 436–444.
Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., Van Kleef, P., Auer, S., 2015. DBpedia: A large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web 6, 167–195.
Leidner, J.L., 2006. An evaluation dataset for the toponym resolution task. Computers, Environment and Urban Systems 30, 400–417.
Leidner, J.L., 2008. Toponym resolution in text: Annotation, evaluation and applications of spatial grounding of place names. Universal-Publishers, Boca Raton, FL.
Lemmens, R., Wytzisk, A., By, D., Granell, C., Gould, M., Van Oosterom, P., 2006. Integrating semantic and syntactic descriptions to chain geographic services. IEEE Internet Computing 10, 42–52.
Li, J., Tang, J., Li, Y., Luo, Q., 2009. RiMOM: A dynamic multistrategy ontology alignment framework. IEEE Transactions on Knowledge and Data Engineering 21, 1218–1232.
Li, L., Goodchild, M.F., 2011. An optimisation model for linear feature matching in geographical data conflation. International Journal of Image and Data Fusion 2, 309–328.
Li, L., Goodchild, M.F., 2012. Constructing places from spatial footprints. In: Proceedings of the 1st ACM SIGSPATIAL International Workshop on Crowdsourced and Volunteered Geographic Information, pp. 15–21. ACM, New York.
Li, W., Goodchild, M.F., Raskin, R., 2014. Towards geospatial semantic search: Exploiting latent semantic relations in geospatial data. International Journal of Digital Earth 7, 17–37.
Ligozat, G.F., 1993. Qualitative triangulation for spatial reasoning. In: European Conference on Spatial Information Theory. Springer, Elba Island, pp. 54–68.
Lin, F., Sandkuhl, K., 2008. A survey of exploiting WordNet in ontology matching. In: IFIP International Conference on Artificial Intelligence in Theory and Practice, pp. 341–350. Springer, Berlin.
Longley, P.A., Goodchild, M.F., Maguire, D.J., Rhind, D.W., 2001. Geographic information systems and science. John Wiley & Sons Ltd, Abingdon.
Lutz, M., Lucchi, R., Friis-Christensen, A., Ostländer, N., 2007. A rule-based description framework for the composition of geographic information services. In: International Conference on GeoSpatial Semantics. Springer, New York, pp. 114–127.
Lutz, M., Sprado, J., Klien, E., Schubert, C., Christ, I., 2009. Overcoming semantic heterogeneity in spatial data infrastructures. Computers & Geosciences 35, 739–752.
MacEachren, A.M., Kraak, M.-J., 2001. Research challenges in geovisualization. Cartography and Geographic Information Science 28, 3–12.
Maedche, A., Staab, S., 2004. Ontology learning. In: Handbook on ontologies. Springer, Berlin.
Mai, G., Janowicz, K., Hu, Y., McKenzie, G., 2016. A Linked Data driven visual interface for the multi-perspective exploration of data across repositories. In: Proceedings of the 2nd International Workshop on Visualization and Interaction for Ontologies and Linked Data, pp. 89–97. CEUR, Aachen.
Mallenby, D., 2007. Grounding a geographic ontology on geographic data. In: AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning, pp. 101–106. AAAI Press, Menlo Park.
Manning, C.D., Raghavan, P., Schütze, H., 2008. Introduction to information retrieval. Cambridge University Press, Cambridge.
Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J.R., Bethard, S., McClosky, D., 2014. The Stanford CoreNLP natural language processing toolkit. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55–60.
Mark, D., Egenhofer, M., Hirtle, S., Smith, B., 2000. UCGIS emerging research theme: Ontological foundations for geographic information science. In: McMaster, R., Usery, L. (Eds.), Research challenges in geographic information science. John Wiley and Sons, New York.
Mark, D.M., Turk, A.G., 2003. Landscape categories in Yindjibarndi: Ontology, environment, and language. In: International Conference on Spatial Information Theory. Springer, Heidelberg, pp. 28–45.
Martins, B., Manguinhas, H., Borbinha, J., 2008. Extracting and exploring the geo-temporal semantics of textual resources. In: IEEE International Conference on Semantic Computing, pp. 1–9. IEEE, New York.
Mata-Rivera, F., Torres-Ruiz, M., Guzmán, G., Moreno-Ibarra, M., Quintero, R., 2015. A collaborative learning approach for geographic information retrieval based on social networks. Computers in Human Behavior 51, 829–842.
Mata, F., 2007. Geographic information retrieval by topological, geographical, and conceptual matching. In: International Conference on GeoSpatial Semantics. Springer, Heidelberg, pp. 98–113.
McCurley, K.S., 2001. Geospatial mapping and navigation of the web. In: Proceedings of the 10th International Conference on World Wide Web. ACM, New York, pp. 221–229.
McKenzie, G., Adams, B., Janowicz, K., 2015a. Of oxen and birds: Is Yik Yak a useful new data source in the geosocial zoo or just another Twitter? In: Proceedings of the 8th ACM SIGSPATIAL International Workshop on Location-Based Social Networks, p. 4. ACM, New York.
McKenzie, G., Janowicz, K., 2015. Where is also about time: A location-distortion model to improve reverse geocoding using behavior-driven temporal semantic signatures. Computers, Environment and Urban Systems 54, 1–13.
McKenzie, G., Janowicz, K., Gao, S., Yang, J.-A., Hu, Y., 2015b. POI pulse: A multi-granular, semantic signature-based information observatory for the interactive visualization of big geosocial data. Cartographica: The International Journal for Geographic Information and Geovisualization 50, 71–85.


Mendes, P.N., Jakob, M., García-Silva, A., Bizer, C., 2011. DBpedia Spotlight: Shedding light on the web of documents. In: Proceedings of the 7th International Conference on Semantic Systems, pp. 1–8. ACM, New York.
Meyer, D., Hornik, K., Feinerer, I., 2008. Text mining infrastructure in R. Journal of Statistical Software 25, 1–54.
Miller, G.A., 1995. WordNet: A lexical database for English. Communications of the ACM 38, 39–41.
Moncla, L., Renteria-Agualimpia, W., Nogueras-Iso, J., Gaio, M., 2014. Geocoding for texts with fine-grain toponyms: An experiment on a geoparsed hiking descriptions corpus. In: Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 183–192. ACM, New York.
Montello, D.R., 1998. A new framework for understanding the acquisition of spatial knowledge in large-scale environments. In: Egenhofer, M.J. (Ed.), Spatial and temporal reasoning in geographic information systems. Oxford University Press, New York, pp. 143–154.
Montello, D.R., 2009. Cognitive research in GIScience: Recent achievements and future prospects. Geography Compass 3, 1824–1840.
Montello, D.R., Fabrikant, S.I., Ruocco, M., Middleton, R.S., 2003a. Testing the first law of cognitive geography on point-display spatializations. In: International Conference on Spatial Information Theory. Springer, Berlin and Heidelberg, pp. 316–331.
Montello, D.R., Goodchild, M.F., Gottsegen, J., Fohl, P., 2003b. Where’s downtown? Behavioral methods for determining referents of vague spatial queries. Spatial Cognition & Computation 3, 185–204.
Mostern, R., Johnson, I., 2008. From named place to naming event: Creating gazetteers for history. International Journal of Geographical Information Science 22, 1091–1108.
Navarrete, T., Blat, J., 2007. An algorithm for merging geographic datasets based on the spatial distributions of their values. In: International Conference on GeoSpatial Semantics. Springer, Berlin and Heidelberg, pp. 66–81.
Nowak, J., Nogueras-Iso, J., Peedell, S., 2005. Issues of multilinguality in creating a European SDI: The perspective for spatial data interoperability. In: 11th EC-GI & GIS Workshop, ESDI: Setting the Framework, Alghero, Sardinia.
Ouksel, A.M., Sheth, A., 1999. Semantic interoperability in global information systems. ACM SIGMOD Record 28, 5–12.
Overell, S., Rüger, S., 2008. Using co-occurrence models for placename disambiguation. International Journal of Geographical Information Science 22, 265–287.
Patroumpas, K., Alexakis, M., Giannopoulos, G., Athanasiou, S., 2014. TripleGeo: An ETL tool for transforming geospatial data into RDF triples. In: Proceedings of the Workshops of the EDBT/ICDT 2014 Joint Conference, Athens, Greece, March 28, pp. 275–278.
Pérez, J., Arenas, M., Gutierrez, C., 2006. Semantics and complexity of SPARQL. In: International Semantic Web Conference. Springer, Heidelberg, pp. 30–43.
Perry, M., Estrada, A., Das, S., Banerjee, J., 2015. Developing GeoSPARQL applications with Oracle Spatial and Graph. In: 1st Joint International Workshop on Semantic Sensor Networks and Terra Cognita. CEUR, Bethlehem, PA.
Perry, M., Sheth, A.P., Hakimpour, F., Jain, P., 2007. Supporting complex thematic, spatial and temporal queries over semantic web data. In: International Conference on GeoSpatial Semantics. Springer, Heidelberg, pp. 228–246.
Prieto-Díaz, R., 2003. A faceted approach to building ontologies. In: IEEE International Conference on Information Reuse and Integration (IRI 2003), pp. 458–465. IEEE, New York.
Pundt, H., Bishr, Y., 2002. Domain ontologies for data sharing: An example from environmental monitoring using field GIS. Computers & Geosciences 28, 95–102.
Purves, R., Edwardes, A., Wood, J., 2011. Describing place through user generated content. First Monday 16.
Purves, R.S., Clough, P., Jones, C.B., Arampatzis, A., Bucher, B., Finch, D., Fu, G., Joho, H., Syed, A.K., Vaid, S., 2007. The design and implementation of SPIRIT: A spatially aware search engine for information retrieval on the Internet. International Journal of Geographical Information Science 21, 717–745.
Randell, D.A., Cui, Z., Cohn, A.G., 1992. A spatial logic based on regions and connection. In: Proceedings of KR’92: Principles of Knowledge Representation and Reasoning. Morgan Kaufmann, San Mateo, pp. 165–176.
Raskin, R.G., Pan, M.J., 2005. Knowledge representation in the semantic web for Earth and environmental terminology (SWEET). Computers & Geosciences 31, 1119–1125.
Rattenbury, T., Naaman, M., 2009. Methods for extracting place semantics from Flickr tags. ACM Transactions on the Web (TWEB) 3, 1.
Renz, J., Nebel, B., 2007. Qualitative spatial reasoning using constraint calculi. In: Handbook of spatial logics. Springer, Heidelberg.
Rice, M.T., Aburizaiza, A.O., Jacobson, R.D., Shore, B.M., Paez, F.I., 2012. Supporting accessibility for blind and vision-impaired people with a localized gazetteer and open source geotechnology. Transactions in GIS 16, 177–190.
Rodríguez, M.A., Egenhofer, M.J., 2003. Determining semantic similarity among entity classes from different ontologies. IEEE Transactions on Knowledge and Data Engineering 15, 442–456.
Rodríguez, M.A., Egenhofer, M.J., Rugg, R.D., 1999. Assessing semantic similarities among geospatial feature class definitions. In: Interoperating geographic information systems. Springer, Berlin.
Rogers, J., Rector, A., 1996. The GALEN ontology. In: Medical Informatics Europe (MIE 96). IOS Press, Copenhagen, pp. 174–178.
Rosch, E., Lloyd, B.B., 1978. Cognition and categorization. Lawrence Erlbaum Associates, Hillsdale, NJ.
Rosch, E.H., 1973. Natural categories. Cognitive Psychology 4, 328–350.
Russell, S.J., Norvig, P., Canny, J.F., Malik, J.M., Edwards, D.D., 2003. Artificial intelligence: A modern approach. Prentice Hall, Upper Saddle River.
Samal, A., Seth, S., Cueto, K., 2004. A feature-based approach to conflation of geospatial sources. International Journal of Geographical Information Science 18, 459–489.
Sanderson, M., Kohler, J., 2004. Analyzing geographic queries. In: Proceedings of the SIGIR Workshop on Geographic Information Retrieval, pp. 8–10.
Sankoff, D., Kruskal, J.B. (Eds.), 1983. Time warps, string edits, and macromolecules: The theory and practice of sequence comparison. Addison-Wesley, Reading.
Scheider, S., Janowicz, K., Kuhn, W., 2009. Grounding geographic categories in the meaningful environment. In: International Conference on Spatial Information Theory. Springer, Heidelberg, pp. 69–87.
Schlieder, C., Vögele, T., Visser, U., 2001. Qualitative spatial representation for information retrieval by gazetteers. In: International Conference on Spatial Information Theory, pp. 336–351. Springer, Berlin.
Schuurman, N., 2005. Object definition in GIS. In: Re-presenting GIS, p. 27. John Wiley & Sons, Hoboken.
Schuurman, N., Leszczynski, A., 2006. Ontology-based metadata. Transactions in GIS 10, 709–726.
Sehgal, V., Getoor, L., Viechnicki, P.D., 2006. Entity resolution in geospatial data integration. In: Proceedings of the 14th Annual ACM International Symposium on Advances in Geographic Information Systems. ACM, New York, pp. 83–90.
Sen, S., 2007. Two types of hierarchies in geospatial ontologies. In: International Conference on GeoSpatial Semantics. Springer, Mexico City, pp. 1–19.
Shamsfard, M., Barforoush, A.A., 2004. Learning ontologies from natural language texts. International Journal of Human-Computer Studies 60, 17–63.
Shankar, M., Sorokine, A., Bhaduri, B., Resseguie, D., Shekhar, S., Yoo, J.S., 2007. Spatio-temporal conceptual schema development for wide-area sensor networks. In: International Conference on GeoSpatial Semantics. Springer, Mexico City, pp. 160–176.
Shvaiko, P., Euzenat, J., 2005. A survey of schema-based matching approaches. In: Spaccapietra, S. (Ed.), Journal on data semantics IV: Lecture Notes in Computer Science, vol. 3730. Springer, Berlin and Heidelberg.
Shyu, C.-R., Klaric, M., Scott, G.J., Barb, A.S., Davis, C.H., Palaniappan, K., 2007. GeoIRIS: Geospatial information retrieval and indexing system: Content mining, semantics modeling, and complex queries. IEEE Transactions on Geoscience and Remote Sensing 45, 839–852.
Silva, M.J., Martins, B., Chaves, M., Afonso, A.P., Cardoso, N., 2006. Adding geographic scopes to web resources. Computers, Environment and Urban Systems 30, 378–399.
Sinha, G., Mark, D., Kolas, D., Varanka, D., Romero, B.E., Feng, C.-C., Usery, E.L., Liebermann, J., Sorokine, A., 2014. An ontology design pattern for surface water features. In: International Conference on Geographic Information Science. Springer, Heidelberg, pp. 187–203.
Smith, B., Mark, D.M., 1998. Ontology and geographic kinds. In: Proceedings, International Symposium on Spatial Data Handling, Vancouver, Canada.
Smith, B., Mark, D.M., 2001. Geographical categories: An ontological investigation. International Journal of Geographical Information Science 15, 591–612.

94

Geospatial Semantics

Smith, D.A., Crane, G., 2001. Disambiguating geographic names in a historical digital library. In: International Conference on Theory and Practice of Digital Libraries. Springer, Heidelberg, pp. 127–136. Sorokine, A., Sorokine, R., Bittner, T. and Renschler, C. (2004). Ontological investigation of ecosystem hierarchies and formal theory for multiscale ecosystem classifications. In: Proceedings of GIScience’04. Citeseer. Southall, H., Mostern, R., Berman, M.L., 2011. On historical gazetteers. International Journal of Humanities and Arts Computing 5, 127–145. Stevens, R., Goble, C.A., Bechhofer, S., 2000. Ontology-based knowledge representation for bioinformatics. Briefings in bioinformatics 1, 398–414. Stoeckl, R., Rohrmeier, P. and Hess, T. (2007). Motivations to produce user generated content: Differences between webloggers and videobloggers. In: BLED 2007 Proceedings, p. 30, AIS Electronic Library (AISeL). Stokes, N., Li, Y., Moffat, A., Rong, J.W., 2008. An empirical study of the effects of NLP components on Geographic IR performance. International Journal of Geographical Information Science 22, 247–264. Sunna, W., Cruz, I.F., 2007. Structure-based methods to enhance geospatial ontology alignment. In: International Conference on GeoSpatial Sematics. Springer, Heidelberg, pp. 82–97. Third, A., Bennett, B., Mallenby, D., 2007. Architecture for a grounded ontology of geographic information. In: International Conference on GeoSpatial Sematics. Springer, Berlin and Heidelberg, pp. 36–50. Tomai, E., Kavouras, M., 2004a. From “onto-geonoesis” to “onto-genesis”: The design of geographic ontologies. Geoinformatica 8, 285–302. Tomai, E. and Kavouras, M. (2004b). “Where the city sits?” revealing geospatial semantics in text descriptions. In: 7th AGILE Conference on Geographic Information Science, pp. 189–194. Heraklion. Tuan, Y.-F., 1977. Space and place: The perspective of experience. University of Minnesota Press, Minneapolis. Uryupina, O., 2003. Semi-supervised learning of geographical gazetteers from the internet. In: Proceedings of the HLT-NAACL 2003 Workshop on Analysis of Geographic ReferencesAssociation for Computational Linguistics, Morristown, NJ, pp. 18–25. Vol. 1. Usery, E.L., Varanka, D., 2012. Design and development of linked data from the national map. Semantic Web 3, 371–384. Vasardani, M., Winter, S., Richter, K.-F., 2013. Locating place names from place descriptions. International Journal of Geographical Information Science 27, 2509–2532. Visser, U., Stuckenschmidt, H., Schuster, G., Vögele, T., 2002. Ontologies for geographic information processing. Computers & Geosciences 28, 103–117. Wallgrün, J.O., Hardisty, F., Maceachren, A.M., Karimzadeh, M., Ju, Y., Pezanowski, S., 2014. Construction and first analysis of a corpus for the evaluation and training of microblog/twitter geoparsers. In: Proceedings of the 8th Workshop on Geographic Information RetrievalACM, Dallas, p. 44. Wang, C., Xie, X., Wang, L., Lu, Y., Ma, W.-Y., 2005. Detecting geographic locations from web resources. In: Proceedings of the 2005 Workshop on Geographic Information RetrievalACM, New York, pp. 17–24. Wang, S., 2010. A CyberGIS framework for the synthesis of cyberinfrastructure, GIS, and spatial analysis. Annals of the Association of American Geographers 100, 535–557. Wang, W., Stewart, K., 2015. Spatiotemporal and semantic information extraction from web news reports about natural hazards. Computers, Environment and Urban Systems 50, 30–40. Wang, Y., Gong, J., Wu, X., 2007. 
Geospatial semantic interoperability based on ontology. Geo-spatial Information Science 10, 204–207. White, E., Stewart, K., 2015. Barrier dynamics for GIS: A design pattern for geospatial barriers. International Journal of Geographical Information Science 29, 1007–1022. Wiegand, N., García, C., 2007. A taskdbased ontology approach to automate geospatial data retrieval. Transactions in GIS 11, 355–376. Winter, S., Freksa, C., 2012. Approaching the notion of place by contrast. Journal of Spatial Information Science 2012, 31–50. Winter, S., Kuhn, W., Krüger, A., 2009. Guest editorial: Does place have a place in geographic information science? Spatial Cognition and Computation 9, 171–173. Worboys, M. and Hornsby, K. (2004). From objects to events: GEM, the geospatial event model. In: International Conference on Geographic Information Science, pp. 327–343. Berlin: Springer. Yang, C., Goodchild, M., Huang, Q., Nebert, D., Raskin, R., Xu, Y., Bambacus, M., Fay, D., 2011. Spatial cloud computing: How can the geospatial sciences use and help shape cloud computing? International Journal of Digital Earth 4, 305–329. Ye, M., Shou, D., Lee, W.-C., Yin, P. and Janowicz, K. (2011). On the semantic annotation of places in location-based social networks. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 520–528. New York: ACM. Yue, P., 2013. Geospatial semantic web. In: Yue, P. (Ed.), Semantic web-based intelligent geospatial web services. Springer, New York. Zhang, C., Zhao, T., Li, W., Osleeb, J.P., 2010. Towards logic-based geospatial feature discovery and integration using web feature service and geospatial semantic web. International Journal of Geographical Information Science 24, 903–923. Zhao, T., Zhang, C., Anselin, L., Li, W., Chen, K., 2015. A parallel approach for improving Geo-SPARQL query performance. International Journal of Digital Earth 8, 383–402. Zhou, S., Winter, S., Vasardani, M., Zhou, S., 2016. Place descriptions by landmarks. Journal of Spatial Science 1–21. Zhu, R., Hu, Y., Janowicz, K., Mckenzie, G., 2016. Spatial signatures for geographic feature types: Examining gazetteer ontologies using spatial statistics. Transactions in GIS 20 (3), 333–355.

1.08 Geocoding and Reverse Geocoding

Dapeng Li, Michigan State University, East Lansing, MI, United States © 2018 Elsevier Inc. All rights reserved.

1.08.1 Introduction
1.08.2 Principles and Methods of Geocoding/Reverse Geocoding
1.08.2.1 Principles
1.08.2.2 Methods
1.08.2.3 Online Geocoding/Reverse Geocoding
1.08.2.4 Geocoding/Reverse Geocoding Quality
1.08.3 Geocoding/Reverse Geocoding Applications
1.08.4 Location Privacy in Geocoding/Reverse Geocoding
1.08.5 Recent Trends and Challenges
1.08.5.1 Accessibility
1.08.5.2 The Temporal Dimension
1.08.5.3 Indoor Positioning
1.08.5.4 Privacy in the Mobile Age
1.08.6 Conclusion
References

1.08.1 Introduction

Location plays a significant role in geography and geographic information systems (GIS). Georeferencing, defined as the general process of relating information to a geographic location, is an important concept in GIS (Hill, 2009). Geocoding, an important type of georeferencing, usually refers to relating street addresses to geographic coordinates (Goldberg, 2011); a broader definition of geocoding is not limited to addresses but includes various kinds of geographic features. Geocoding dates back to the early days of digital mapping in the 1960s, when the US Census Bureau used digital map databases to match addresses from transportation or health surveys to their corresponding census tracts or blocks (Cooke, 1998). With the rapid development of computer technologies (especially mobile computing) in the past few years, geocoding and reverse geocoding have become ubiquitous in our daily lives. For example, a geocoding operation is performed when a user enters a street address to locate it in Google Maps, while a reverse geocoding process occurs when the user searches for features near a specific geographic location. Moreover, reverse geocoding is an indispensable functionality in various location-based services (LBS) applications.

Address geocoding has been the primary focus of study in this field for decades because residential addresses serve as an important spatial attribute in survey and historical record data in a variety of fields such as public health and transportation. These addresses need to be transformed into geographic locations before researchers, practitioners, or stakeholders can perform spatial analysis over the data for decision-making or policy making. This article covers the theoretical underpinnings, primary methods, various applications, and latest progress of address geocoding so that readers can develop a better understanding of the technique.

Geocoding/reverse geocoding can be broadly categorized into two groups: conventional and online. Conventional geocoding/reverse geocoding is usually conducted by GIS professionals using existing GIS software, and users have more control over the reference data and the geocoding/reverse geocoding method. Online geocoding/reverse geocoding services are mostly published by commercial companies as web services: users send requests in a predetermined format and receive results from these services. Compared with conventional offline geocoding/reverse geocoding, online services can be readily integrated into modern systems developed on different platforms or in different programming languages. This platform- and language-independent character has made online geocoding/reverse geocoding services very popular in modern information systems, and they are also covered in this article.

The remainder of this article is organized as follows. Section "Principles and Methods of Geocoding/Reverse Geocoding" presents the principles and methods of geocoding/reverse geocoding and the metrics used to evaluate its quality. Section "Geocoding/Reverse Geocoding Applications" introduces the applications of geocoding/reverse geocoding.
Privacy issues in geocoding/reverse geocoding are covered in section “Location Privacy in Geocoding/Reverse Geocoding.” The challenges in geocoding/reverse geocoding are given in section “Recent Trends and Challenges.” Finally, section “Conclusion” concludes with a summary of this work.


1.08.2 Principles and Methods of Geocoding/Reverse Geocoding

1.08.2.1 Principles

With substantial demand from a wide range of applications, geocoding/reverse geocoding has become a fundamental module of most GIS software packages. This section examines the principles and methods of geocoding/reverse geocoding.

First and foremost, we need to understand why geocoding/reverse geocoding matters in various applications before we dig into the details. Survey data are widely used as input for analysis in many disciplines, for example, public health, geography, and sociology. In many cases, address data are also collected from the subjects, which enables researchers to ask and answer questions from a spatial perspective. However, geocoding must be used to transform the addresses into geographic locations before we can perform spatial analysis to examine the spatial patterns of the phenomena. Equally important is the use of geocoding to associate collected data with other data (e.g., demographic data, socioeconomic status (SES) data, and environmental data) compiled by federal and local government agencies, organizations, and commercial companies, and thereby reveal underlying patterns for policy making (Rushton et al., 2006). For example, the decennial census conducted by the US Census Bureau has been an important source of demographic and SES data at different spatial levels (e.g., census blocks, block groups, and census tracts). The zonal system used by the US Census Bureau is characterized by a population-based subdivision of the United States, and the Bureau itself sends surveys to households and performs geocoding to link the data with the corresponding census zones. Geocoding thus provides an avenue for researchers to link their data to a wide range of other data so that they can reveal new patterns or discover new knowledge. Fig. 1 demonstrates how geocoding can be used to aggregate address data at different spatial units and link them with other existing data. Note that users can employ GIS software packages to spatially join the collected data to other data conveniently once they have obtained the geographic locations of the address data through geocoding.

Fig. 1 Using geocoding to link input data to other data at different spatial scales.
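As an illustration of this linkage step, the following minimal sketch joins geocoded survey points to census tracts and aggregates them at the tract level. It is illustrative rather than a prescribed workflow: the file names and the GEOID column are hypothetical, and it assumes a recent version of the open-source geopandas library (0.10 or later, where the join keyword is "predicate").

import geopandas as gpd

# Geocoded survey records (point geometries) and census tracts (polygons);
# both layers must share a coordinate reference system before joining.
points = gpd.read_file("geocoded_survey_points.shp")
tracts = gpd.read_file("census_tracts.shp").to_crs(points.crs)

# Attach the tract identifier to every point that falls within a tract.
joined = gpd.sjoin(points, tracts[["GEOID", "geometry"]],
                   how="left", predicate="within")

# Aggregate the survey records to the tract level so they can be linked
# with demographic or SES data published at the same spatial scale.
counts = joined.groupby("GEOID").size().rename("n_records")
print(counts.head())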


Since address data are the primary input for geocoding, we need to develop a good understanding of address data before we can perform geocoding/reverse geocoding procedures effectively. An address is a piece of text that describes and identifies the location of a specific residence. Different countries may have different postal systems, but a residential address is usually composed of a series of components, for example, house number, street name/type, city name, state/province name, and zip code. Duplicates may exist for a single component of an address within a postal system, but the whole combination should always be unique. In most cases, a postal system has a set of standards for addresses. For example, the United States Postal Service (USPS) has postal addressing standards that describe both standardized address formats and contents. It is important that address data adhere to these standards during data collection so that they can be readily processed during geocoding. Many web services can verify, validate, and standardize addresses, and these address standardization tools have been widely used in software systems to improve the accuracy of address data. In addition to residential addresses, post-office box (POB) addresses are also widely used. Since the location of a POB usually differs from the location of the physical residence, the use of POB addresses poses challenges to geocoding and subsequent analysis (Hurley et al., 2003). Geocoding practitioners therefore need to take address type into account during the geocoding procedure.

A geocoding procedure is usually composed of several components: the input data, reference datasets, a processing method, and output (Goldberg et al., 2007; Karimi et al., 2004; Levine and Kim, 1998), as shown in Fig. 2. As mentioned earlier, the input data usually come from surveys or other sources and are characterized by records with address information. The reference datasets can take a variety of forms, for example, address point data, road data, and parcel data. A processing algorithm uses the reference datasets to match an input address record to a specific geographic location (latitude/longitude). The reference data should maintain a certain level of accuracy; otherwise, the locations produced by geocoding will be inaccurate. After the geocoding procedure, each address record that can be geocoded is associated with a geographic location.

Fig. 2 can also be used to describe the building blocks of a reverse geocoding procedure. The input data for reverse geocoding are a set of geographic coordinates, which can be acquired through stand-alone Global Positioning System (GPS) units or GPS receivers installed on mobile devices. A processing algorithm looks up the geographic features in the reference datasets to obtain a feature or features that are close to the input point and satisfy a set of constraints from the user. Reverse geocoding can thus be considered a spatial query process that retrieves the features around a given input point from the reference datasets and returns the results to the user. The features in reverse geocoding are not limited to addresses and can include a variety of feature types; they are usually termed points of interest (POIs). The output of a reverse geocoding procedure is a feature or a set of features that satisfy certain constraints (e.g., distance and feature type) specified by the user.
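Viewed as a spatial query, reverse geocoding can be sketched in a few lines. The following toy example, with hypothetical POI records and a brute-force nearest-neighbor search, returns the closest POI that satisfies a distance and feature-type constraint; production systems would use a spatial index rather than a linear scan.

import math

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in meters between two WGS84 coordinates.
    r = 6371000.0
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def reverse_geocode(lat, lon, pois, max_dist_m=500.0, feature_type=None):
    # Spatial query: apply the type constraint, then the distance
    # constraint, and return the nearest remaining POI (or None).
    candidates = [(haversine_m(lat, lon, p["lat"], p["lon"]), p)
                  for p in pois
                  if feature_type is None or p["type"] == feature_type]
    candidates = [c for c in candidates if c[0] <= max_dist_m]
    return min(candidates, key=lambda c: c[0], default=None)

pois = [  # hypothetical POI reference data
    {"name": "Cafe A", "type": "restaurant", "lat": 40.7608, "lon": -111.8910},
    {"name": "Museum B", "type": "museum", "lat": 40.7645, "lon": -111.8883},
]
print(reverse_geocode(40.7612, -111.8905, pois, feature_type="restaurant"))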

1.08.2.2 Methods

The methods for geocoding/reverse geocoding rely on the reference datasets, so users first need to develop a good understanding of the various reference datasets. Reference data can use different data models, for example, point, polyline, and polygon. Fig. 3 gives examples of commonly used reference data: address point, parcel, road, and other spatial unit (e.g., county) data. The reference data should be standardized and appropriately indexed before they are used for geocoding; reference data for reverse geocoding should be spatially indexed to speed up spatial queries.

Fig. 4 illustrates the general workflow of a geocoding procedure, which includes tokenization, standardization, and address matching (Goldberg et al., 2007). In the context of geocoding, tokenization refers to splitting a whole address into its components. The standardization step then transforms each component into a standardized form so that it can be used for address matching. In the address matching step, the components of an input address record are used to match the address to a record in the reference datasets, and users can set relevant constraints for this step. A scoring algorithm is usually used to assign a score to a matched record. For example, if all the components of an input address are matched with a record in the reference dataset and an accurate location can be determined, the geocoded record receives a high score; if only some of the components can be matched and the returned geographic point can only be approximately derived from the reference datasets, the match receives a low score. Finally, if the matching is successful, a geocoded location is returned as the result of the geocoding procedure.

Fig. 2 Components of the geocoding procedure.


Fig. 3 Examples of various reference data.

Fig. 4 The general workflow of a processing algorithm.
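The tokenize–standardize–match workflow in Fig. 4 can be sketched as a toy pipeline. Everything here is illustrative and assumed: the regular-expression tokenizer, the tiny suffix dictionary, and the reference records with address ranges. Real geocoders use far richer address parsers and scored, fuzzy matching.

import re

# Assumed suffix dictionary mapping variants to USPS-style abbreviations.
SUFFIXES = {"AVENUE": "AVE", "AV": "AVE", "AVEN": "AVE",
            "STREET": "ST", "STR": "ST", "ROAD": "RD"}

def tokenize(address):
    # Split a one-line address into house number, street name, and suffix.
    tokens = re.split(r"[\s,]+", address.strip().upper())
    return {"number": int(tokens[0]),
            "street": " ".join(tokens[1:-1]),
            "suffix": tokens[-1]}

def standardize(parts):
    # Replace a suffix variant with its standard abbreviation.
    parts["suffix"] = SUFFIXES.get(parts["suffix"], parts["suffix"])
    return parts

def match(parts, reference):
    # Exact match on street and suffix, range check on the house number.
    for rec in reference:
        if (rec["street"], rec["suffix"]) == (parts["street"], parts["suffix"]) \
                and rec["from_hn"] <= parts["number"] <= rec["to_hn"]:
            return rec
    return None  # a scoring algorithm would rank partial matches instead

reference = [{"street": "MAIN", "suffix": "ST", "from_hn": 100, "to_hn": 198}]
print(match(standardize(tokenize("132 Main Street")), reference))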

Address point data use points to represent address locations, and each record includes address information that can be matched against the input data during geocoding. The point feature associated with each record can be derived from parcel data. Furthermore, high-resolution remote sensing imagery has also been used to improve the spatial accuracy of the point locations; for example, a point can be positioned based on the footprint of the residence. Each record in a reference address point dataset contains the different components (e.g., house number, prefix direction, street name, street type, city, state, and zip code) of an address. During the geocoding process, the processing algorithm first splits an input address record into the corresponding components and then compares them with the reference data to match the address to a point location. Note that there can be many variations in the components of an address. For example, according to the USPS, the standard suffix abbreviation for "AVENUE" is "AVE," but commonly used abbreviations also include "AV" and "AVEN," to name a few. Moreover, input address records can contain misspellings introduced during data collection. These issues should be accounted for in the processing algorithm; techniques such as Soundex (a phonetic index system that indexes information based on word sounds) can be used to deal with them (Zandbergen, 2008). It is also worth mentioning that compiling and maintaining large address point datasets is expensive. Countries such as Australia, the United Kingdom, and Canada have national address point datasets; in the United States, although many counties and states have built address point datasets, a national address point database does not exist at this moment.
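To make the phonetic-matching idea concrete, here is a minimal implementation of the classic American Soundex code. It is a sketch for illustration, not the exact variant any particular geocoder uses.

def soundex(word):
    # Keep the first letter; map the rest to digit classes; collapse
    # adjacent duplicates; drop vowels, H, W, and Y; pad to four characters.
    groups = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3",
              "L": "4", "MN": "5", "R": "6"}
    def digit(c):
        for letters, d in groups.items():
            if c in letters:
                return d
        return ""  # vowels and H, W, Y carry no digit
    word = word.upper()
    code, prev = word[0], digit(word[0])
    for c in word[1:]:
        d = digit(c)
        if d and d != prev:
            code += d
        if c not in "HW":  # H and W do not separate duplicate codes
            prev = d
    return (code + "000")[:4]

# Misspelled street names can still match: all three hash to "A150".
print(soundex("AVENUE"), soundex("AVENU"), soundex("AVVENUE"))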


Parcel data have also been widely used as reference datasets in geocoding, and Fig. 3B gives an example of the layout of parcel data. Similarly, each record in the parcel data has address and geometry information, and the matching process is done in the same manner. Note that while fine-grain address point data can link an address to a building structure, parcel data can at most link an address to a parcel or the centroid of a parcel. The compilation and maintenance of large parcel datasets is also costly. In the United States, parcel data are usually managed by local government agencies, which limits the use of parcel data in nationwide geocoding practices. It should also be noted that since the centroid of a parcel is usually returned as the result for an input address, parcel size affects positional accuracy.

The use of road data in geocoding has enjoyed great popularity in the past few decades. One important reason is the availability of national road datasets from the US Census Bureau's Topologically Integrated Geographic Encoding and Referencing (TIGER) products. The road datasets from the TIGER/Line products have been widely used in geocoding practices: users can join a road dataset to an address range dataset so that each road segment carries address range information. Owing to the growing customer need for geocoding, the US Census Bureau added Address Range Feature shapefiles (ADDRFEAT) to the TIGER/Line products beginning in 2011. While the TIGER road datasets can also be used for other purposes (e.g., transportation studies), the address range feature data are specially designed for address geocoding. Fig. 5 shows the address range feature data from the 2015 TIGER/Line products for Salt Lake City, Utah. Some road segments are not included in this dataset because no addresses (residences) are associated with them. Each record includes a series of attributes that are important for geocoding, and Table 1 lists the key ones.

Given road or address range feature data from the TIGER/Line products, input addresses can be geocoded using an address interpolation algorithm. Fig. 6 demonstrates address interpolation in street geocoding. Each road segment in the reference datasets includes the address range on the left and right side of the road, and the house number from the input address record is used to interpolate the address location. It is worth mentioning that the address range stored for each side of a road segment can contain more addresses than the true range; for example, the true address range on the left side of the road in Fig. 6 is 101–107, while the address range in the reference data is 101–109. The offsets along the road segment and perpendicular to the road segment are usually given in the reference datasets.

Fig. 5 Address range feature data in Salt Lake City.

Table 1 Key record attributes in the address range feature dataset

Field name   Description
FULLNAME     The full name of the road segment
LFROMHN      The beginning address number on the left side of a road segment
LTOHN        The ending address number on the left side of a road segment
RFROMHN      The beginning address number on the right side of a road segment
RTOHN        The ending address number on the right side of a road segment
ZIPL         The zip code on the left side of a road segment
ZIPR         The zip code on the right side of a road segment


Fig. 6 Address interpolation in street geocoding.

Fig. 7 A geocoded address using the address range feature data in ArcGIS.

The offsets are used to compute the location of the geocoded point. It should also be noted that a road segment is directional in geocoding: the start and end node information is used to determine the left and right sides of the road. In the United States, the national TIGER/Line datasets are available to the public at no cost, which has made street geocoding a widely used method in geocoding practices.

Fig. 7 gives an example of using address range feature data for geocoding. The input address is "332 S 1400 E, SALT LAKE CITY, UT, 84112"; the reference data are the address range feature data shown in Fig. 5, and geocoding was performed in ArcGIS 10.1. The start point of the input road segment is located at the northernmost node and the end point at the southernmost node. The address ranges on the left and right sides of the road are 301–399 and 300–398, respectively. The house number 332 from the input address, the address range data, and the road geometry were used to compute the location of the address. The structure location of the input address is also shown in Fig. 7, and there is a displacement between the interpolated point from geocoding and the true structure location. This displacement represents the positional accuracy, which is discussed in more detail in section "Geocoding/Reverse Geocoding Quality."

In addition to the aforementioned methods, areal unit features (e.g., zip code, city, county, and state boundaries) in different zoning systems can also be used for geocoding (Goldberg and Cockburn, 2012). Similar to parcel geocoding, when areal unit features are used as the reference data, the returned results are the centroids of these features. Note that the centroid computed for an irregular areal unit can fall within another areal unit, which can cause significant errors and affect the results (Goldberg and Cockburn, 2012). Areal unit features have also been widely used in reverse geocoding, a process also termed spatial join in GIS. For example, Nguyen et al. (2016) used census tract boundary data to match geotagged tweets to the census tracts they fall within, so that demographic and SES data could be linked to study the social environment at the neighborhood level.
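Returning to the worked example in Fig. 7, a minimal sketch of the interpolation arithmetic is shown below. The straight-segment geometry and the 5 m side offset are assumptions for illustration; production geocoders interpolate along full polyline geometries.

import math

def interpolate_address(house, from_hn, to_hn, start, end, side_offset=0.0):
    # Place the house number proportionally within the segment's address
    # range, then interpolate that fraction along the segment geometry.
    frac = (house - from_hn) / float(to_hn - from_hn)
    x = start[0] + frac * (end[0] - start[0])
    y = start[1] + frac * (end[1] - start[1])
    # Shift the point perpendicular to the travel direction (to the right
    # here) so it falls on the addressed side of the centerline.
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    if length and side_offset:
        x += side_offset * dy / length
        y -= side_offset * dx / length
    return x, y

# House 332 in the right-side range 300-398 sits at fraction
# (332 - 300) / (398 - 300) = 0.327 along the (southward) segment.
print(interpolate_address(332, 300, 398, (0.0, 0.0), (0.0, -150.0),
                          side_offset=5.0))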


The geocoding procedure has evolved over time with improvements in reference datasets and processing algorithms. For example, the quality of reference data has improved significantly from the early Dual Independent Map Encoding data to the widely used TIGER road data and the recent TIGER address range feature data. A study by Zandbergen et al. (2011) examined the positional accuracy of the TIGER 2000 and 2009 data and found that the TIGER 2009 data are consistently more accurate. It should also be noted that the appropriate resolution of the reference data depends on the context and on the methods to be used in the subsequent analysis. At the same time, a significant amount of research has gone into new processing algorithms that further improve the geocoding procedure; for example, many composite methods have been proposed to ensure that more address records can be successfully matched.

1.08.2.3 Online Geocoding/Reverse Geocoding

Online geocoding/reverse geocoding refers to web services that provide geocoding/reverse geocoding to users, and such services have become very popular in the past few years (Kounadi et al., 2013; Roongpiboonsopit and Karimi, 2010a). Compared with conventional geocoding/reverse geocoding, online services have several advantages. First, users do not need to compile, process, or manage reference datasets. Second, online services provide easy-to-use, language- and platform-independent representational state transfer (REST) application programming interfaces (APIs) through which users can conveniently perform geocoding operations. For example, most online geocoding/reverse geocoding services use Extensible Markup Language (XML) or JavaScript Object Notation (JSON) as output formats, which makes it straightforward to integrate these services into different computer systems. However, online services also come with disadvantages. First, users cannot manage or control the reference data, the constraints, or the parameters for geocoding/reverse geocoding, which makes it difficult to evaluate the quality of the results. Second, most providers limit the number of queries users can make free of charge, and users need to buy special licenses before they can make large batches of requests. Another constraint is that users need internet access to use these services.

Online reverse geocoding services play a significant role in LBS. In a typical LBS application, a user retrieves POIs close to his or her current location for a certain purpose; for example, users of the mobile application Yelp in the United States can search for different types of nearby restaurants through its reverse geocoding services. Table 2 lists several popular free reverse geocoding services and their count limits. Note that it is very expensive to compile, update, and maintain the data behind these services; most companies offer free online reverse geocoding to basic users with a daily count limit and commercial services to advanced users at a certain price. Since the count limit usually applies per device, the free services suffice for most client-side applications, and only server-side applications that make large numbers of reverse geocoding requests need special licenses.
With the growing demand for online geocoding/reverse geocoding services and the rapid development of computing technologies, more and more federal and state government agencies have started to build and publish their own online services. For example, the US Census Bureau has published an online geocoding/reverse geocoding service for public use. In addition to standardized REST APIs, the US Census Bureau provides a user-friendly online graphical user interface that lets users upload address data and perform batch processing. Another advancement of the US Census online services is that they let users choose the reference data for geocoding/reverse geocoding: either the most current TIGER/Line data or the 2010 TIGER/Line data. This gives users more control over the reference data.

Another example of online geocoding/reverse geocoding services developed by government agencies comes from the Automated Geographic Reference Center (AGRC), the GIS department of the State of Utah. AGRC has published online geocoding/reverse geocoding services to satisfy various data needs in Utah. One novel feature of AGRC's services is that they let users choose a spatial reference for geocoding/reverse geocoding; for example, a user can choose either a geographic coordinate system (e.g., World Geodetic System (WGS) 1984) or a projected coordinate system (e.g., Web Mercator). A number of other features (e.g., allowing different geometry types for spatial queries) are still under development. Moreover, AGRC provides examples of using these services in different programming languages and publishes the source code on GitHub. In summary, these endeavors by federal, state, and local government agencies will accelerate the development of online geocoding/reverse geocoding services and play a significant role in data sharing in the era of open science.
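As a concrete illustration of such a REST interface, the sketch below queries the US Census Bureau geocoder for a single address. The endpoint and parameter names follow the service's published interface at the time of writing, but they should be checked against the current documentation before use; the address is the example used earlier in this article.

import requests

# One-line address lookup against the US Census Bureau geocoder. The
# "benchmark" parameter selects the reference-data snapshot, which is how
# the service exposes user control over the reference data.
url = "https://geocoding.geo.census.gov/geocoder/locations/onelineaddress"
params = {
    "address": "332 S 1400 E, Salt Lake City, UT 84112",
    "benchmark": "Public_AR_Current",  # or a dated benchmark for older data
    "format": "json",
}
resp = requests.get(url, params=params, timeout=30)
matches = resp.json()["result"]["addressMatches"]
if matches:
    coords = matches[0]["coordinates"]
    print(coords["x"], coords["y"])  # longitude, latitude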

Table 2 Examples of free online reverse geocoding services

Service name   Company     Limitations         Output format
GeoNames       Marc Wick   2000 requests/h     XML/JSON
Google Maps    Google      2500 requests/day   XML/JSON
MapQuest       AOL         5000 requests/day   XML/JSON
Nominatim      Nominatim   1 request/s         XML/JSON
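For comparison, the following sketch reverse geocodes a coordinate with the Nominatim service listed in Table 2. The request format follows Nominatim's public API as documented at the time of writing; its usage policy requires an identifying User-Agent header (the one below is a placeholder) and no more than one request per second.

import requests

# Reverse geocode a coordinate against Nominatim (OpenStreetMap data).
url = "https://nominatim.openstreetmap.org/reverse"
params = {"lat": 40.7608, "lon": -111.8910, "format": "jsonv2"}
headers = {"User-Agent": "example-gis-client/0.1 (contact@example.org)"}
resp = requests.get(url, params=params, headers=headers, timeout=30)
print(resp.json().get("display_name"))  # nearest address description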


1.08.2.4 Geocoding/Reverse Geocoding Quality

Since geocoding usually serves as one step that provides location data for subsequent analysis and computation, its quality has significant impacts on subsequent procedures and the final results (Zandbergen et al., 2012). We therefore need a good understanding of geocoding quality before we can analyze the data and interpret the results correctly. Popular metrics for evaluating geocoding quality include positional accuracy, match rate, and repeatability (Zandbergen, 2009b). Online reverse geocoding is widely used in LBS applications, and positional accuracy is an important metric for evaluating reverse geocoding quality as well.

Positional accuracy is an issue that can hardly be overemphasized in geocoding and reverse geocoding. Positional errors can propagate to subsequent spatial analysis and influence the results directly, and a significant body of research has examined error propagation in various applications (Jacquez, 2012; Karimi et al., 2004; Zandbergen et al., 2012). With the popularity of mobile computing in the past few years, positional accuracy in geocoding/reverse geocoding has become even more significant. Fig. 8 illustrates the locations and displacements relevant to the positional accuracy of geocoding/reverse geocoding, and Table 3 gives the detailed descriptions. In a geocoding scenario, the user obtains a location point p2 for an input address whose true location is p3. The distance d2 between p2 and p3 represents the positional accuracy of the geocoding procedure. If address point data are used as the reference data, d2 is the displacement between the address point and its true location; if street centerline data are used, the accuracy of the road geometry and the offsets along and perpendicular to the road segment both contribute to d2. The true location p3 can be derived from GPS units, large-scale parcel maps, or georeferenced high-resolution remote sensing imagery to compute d2 and evaluate positional accuracy in geocoding practices (Bonner et al., 2003; Cayo and Talbot, 2003; Curtis et al., 2006; Roongpiboonsopit and Karimi, 2010a; Strickland et al., 2007; Zandbergen and Green, 2007; Zandbergen et al., 2011).

With the advent of GPS-enabled mobile devices, mobile computing has enjoyed great popularity, and online reverse geocoding services are widely used in LBS applications. In high-density urban areas, the GPS signal may be degraded by obstructions such as buildings, which makes the location derived from a GPS-enabled mobile device inaccurate; moreover, locations derived from Wi-Fi and cellular positioning have large errors and cannot satisfy the accuracy needed by many LBS applications (Zandbergen, 2009a). A typical LBS application can be used to demonstrate the positional accuracy of online reverse geocoding services and mobile devices.

Fig. 8 Illustration of positional accuracy in geocoding and reverse geocoding.

Table 3 Positional accuracy of geocoding/reverse geocoding

Parameter   Description
p0          The true location of the user
p1          The input query point for reverse geocoding
p2          The point derived from geocoding/reverse geocoding
p3          The true location of the target structure (centroid of the structure footprint)
d0          The displacement between the input point and its true location
d1          The distance between the input point and the point derived from reverse geocoding
d2          The displacement between the derived point and the true location of the target address/POI
d3          The distance between the input point for reverse geocoding and the true location of the target address/POI
d4          The distance between the true location of the input point and the true location of the target address/POI


As shown in Fig. 8, in the context of an LBS application, p0 denotes the true location of the user, p1 is the input query point derived from a mobile device, p2 is the point returned by a reverse geocoding service, and p3 is the true location of the POI. Theoretically, the true distance d4 between the user and the POI is what an LBS application should use; however, since the displacement d0 caused by the mobile device and the displacement d2 from the reverse geocoding service are usually unavoidable, the distance d1 is what the user actually obtains. One example of d0 is the displacement caused by GPS receivers: the user gets his or her location p1 from the GPS receiver, while the true location is p0. Most modern mobile operating systems provide APIs to derive geographic locations, and locations derived from different sources (e.g., GPS receivers, Wi-Fi access points, and cell towers) vary in positional accuracy (Zandbergen, 2009a). A study by Zandbergen and Barbeau (2011) examined the accuracy of assisted GPS (A-GPS) on mobile phones under different conditions (static and dynamic outdoor tests and a static indoor test) and found that A-GPS is accurate enough to satisfy the needs of most LBS applications. However, it may still be challenging to use locations derived from mobile devices in some special LBS applications (e.g., locating a specific POI in urban areas with high POI density). Developers should therefore take positional accuracy into account when they design and implement reverse geocoding services and LBS applications.

Match rate, also termed completeness, is defined as the percentage of data records that are successfully matched (Zandbergen, 2008). It has become routine to report the match rate of the geocoding procedure in most studies. Note that match rate by itself cannot represent the quality of a geocoding procedure: a geocoding procedure based on address point data can have a lower match rate than one based on road centerline data, yet the geocoded points derived from the former are usually more accurate in terms of positional accuracy. Although there is no consensus on the minimum match rate for each specific application, many studies have addressed this topic; for example, Ratcliffe (2004) used Monte Carlo simulation to replicate a declining match rate and estimated the minimum reliable match rate for crime mapping at 85%.

Repeatability refers to how sensitive the results are to variation in the different components of geocoding (e.g., reference datasets and matching algorithm) (Zandbergen, 2008). Whitsel et al. (2006) used addresses in 49 US states to evaluate the geocoding services of four vendors and found significant differences in match rate, positional accuracy, and concordance between established and derived census tracts. These metrics can also be used to evaluate online geocoding services (Roongpiboonsopit and Karimi, 2010a,b). When evaluating geocoding quality, we should take all of these metrics into account; otherwise, the conclusions could be biased.
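A minimal evaluation sketch along these lines is shown below: it computes the match rate of a batch of geocoded records and summarizes the displacement d2 against independently surveyed ground-truth points. The records, coordinates, and the choice of the median as the summary statistic are all illustrative assumptions.

import math

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in meters between two WGS84 coordinates.
    r = 6371000.0
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Geocoder output (None coordinates = unmatched) and ground-truth points.
records = [{"id": 1, "lat": 40.7611, "lon": -111.8906},
           {"id": 2, "lat": None, "lon": None},
           {"id": 3, "lat": 40.7652, "lon": -111.8879}]
truth = {1: (40.7608, -111.8910), 2: (40.7650, -111.8880),
         3: (40.7650, -111.8880)}

matched = [r for r in records if r["lat"] is not None]
match_rate = len(matched) / len(records)        # completeness
errors = sorted(haversine_m(r["lat"], r["lon"], *truth[r["id"]])
                for r in matched)               # per-record displacement d2
median_error = errors[len(errors) // 2]
print(f"match rate {match_rate:.0%}, median error {median_error:.1f} m")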

1.08.3 Geocoding/Reverse Geocoding Applications

Geocoding and reverse geocoding have been widely used in a variety of fields. This section gives a brief review of their applications: the applications of geocoding in health, crime, and traffic accident studies are covered, while the applications of reverse geocoding concentrate on LBS.

Geocoding has been used extensively in health studies, for example, in measures of accessibility, disease cluster analysis, and exposure analysis. Accessibility studies in public health measure people's ability to access a wide range of resources that affect their health and well-being, for example, food outlets (Barnes et al., 2015; Vandevijvere et al., 2016), health care facilities (Luo and Wang, 2003; Wan et al., 2012), and tobacco stores (Cantrell et al., 2016); geocoding is used to locate both facilities and residences in such studies. In disease cluster analysis, geocoding is employed to transform patients' addresses into geographic coordinates so that spatial analysis can be performed to detect clusters (Glatman-Freedman et al., 2016). Exposure analysis examines the impacts of pollution on people's health and well-being, and geocoding is used to locate the residences of the subjects so that they can be linked to environmental pollution data (Bellander et al., 2001; Ganguly et al., 2015). Another important application of geocoding is to transform addresses into geographic points in crime studies, for example, crime mapping and analysis (Andresen et al., 2016; Ratcliffe, 2002, 2004) and residency restrictions for sex offenders (Rydberg et al., 2016; Zandbergen and Hart, 2009). Moreover, geocoding has been widely used to georeference police crash records in order to analyze traffic accidents and identify the road links that are more likely to produce crashes (Erdogan et al., 2008).

Reverse geocoding has been used in a myriad of LBS applications, usually in an online setting. Online reverse geocoding services provide addresses or various POIs to users; they usually offer user-friendly interfaces through which users can make queries with different spatial or aspatial constraints, and they can be readily integrated into computer systems on different platforms. Besides satisfying users' information needs in LBS applications, reverse geocoding has also been employed to build semantics for people's GPS trajectories (Cao et al., 2010; Lv et al., 2016), which can help us develop a better understanding of people's exposure to social, cultural, and physical environments.

1.08.4 Location Privacy in Geocoding/Reverse Geocoding

Since geocoding/reverse geocoding deals with sensitive location data, a great deal of attention is being paid to privacy issues in geocoding/reverse geocoding applications (Armstrong et al., 1999; Cassa et al., 2006; Curtis et al., 2006; Kounadi et al., 2013; Kounadi and Leitner, 2014; Tompson et al., 2015; VanWey et al., 2005).


For example, in the United States, many public health researchers and practitioners deal with sensitive individual data in their work. Although the Health Insurance Portability and Accountability Act issued by the US Department of Health and Human Services in 1996 contains regulations to protect individually identifiable health information, geolocation privacy issues still exist because of the difficulty of quantifying disclosure risk. As mentioned earlier, it is common practice to aggregate collected data at a specific spatial level so that other datasets can be linked for further analysis. Privacy issues arise when large-scale maps are published to present the results to the public, because detailed map data can easily be reverse geocoded to derive individual locations and information. The practice of using published maps and/or other related data to recover the identity of an individual and infer individual information is termed reengineering (Curtis et al., 2006). Armstrong and Ruggles (2005) gave two examples of possible privacy violations in geocoding/reverse geocoding applications: one is reengineering individual-level information from dot maps; the other is recovering user locations in LBS applications. They also point out that notions of privacy change as technology develops. For example, widely available high-resolution remote sensing imagery can be used not only to construct more accurate structure-level reference data that improve geocoding quality but also to recover individual-level information. The development of technology thus provides an avenue to improve geocoding/reverse geocoding quality while posing new challenges for privacy protection.

To reduce disclosure risk, a substantial body of research has been conducted on preserving individual confidentiality in geocoding applications. One simple method is to aggregate individual information to geographic areas (e.g., census tracts, zip code zones, and counties) so that individual information is protected. Note that if aggregated data are used in subsequent analysis, the details of spatial patterns can be lost, which makes it difficult to detect clusters (Armstrong et al., 1999); furthermore, data aggregation can lead to the modifiable areal unit problem (Openshaw, 1984). Armstrong et al. (1999) listed alternative methods (e.g., individual affine transformations, random perturbation, aggregation, neighbor information, and contextual information) that can mask individual information while preserving valuable information, and this line of research has attracted significant attention in the past few years (Wieland et al., 2008; Zhang et al., 2015). A recent summary of popular geographic masking methods is given by Kounadi and Leitner (2015). According to Armstrong et al. (1999), the methodology used to evaluate the effectiveness of geographic masking methods covers three aspects: (1) preserving information, (2) preserving links to other spatial data, and (3) preserving confidentiality. The information to be preserved can take many forms (e.g., pairwise relations, event–geography relations, clusters, trends, and anisotropies), and different characteristics are used to store it (e.g., distance, orientation, and directionality) (Armstrong et al., 1999).
Thus, when evaluating the capability of a geographic mask to preserve information, we need to take the application context into account (i.e., what information we should prioritize preserving). For example, if we are studying the spatial pattern of disease incidence, the priority will be placed on preserving the spatial pattern; if we are studying the effects of environmental exposure on people's health, we may need to preserve the distance between residences and pollution sources.

With the popularity of LBS applications, location privacy has also attracted a significant amount of attention. In LBS applications, users provide their locations as the input for services to search and retrieve relevant POIs. This process poses significant disclosure risk because an attacker who retrieves the location may identify the user (Wernke et al., 2014). Many countermeasures have been put forward to protect location privacy: regulatory strategies, privacy policies, anonymity, and obfuscation (Duckham and Kulik, 2006). Among these, k-anonymity is a fundamental method that has been widely used (Gedik and Liu, 2008; Krumm, 2009). The concept of k-anonymity was originally proposed to protect privacy in data released by a data holder: a release is k-anonymous if each person's information is indistinguishable from that of at least k − 1 other persons in the same release (Sweeney, 2002). Location k-anonymity is achieved when the location information of a mobile user cannot be distinguished from the location information of at least k − 1 other mobile users (Gruteser and Grunwald, 2003), and it has been used extensively as a metric for confidentiality preservation (Tribby et al., 2015; Zhang et al., 2015). In short, privacy issues must be taken into account whenever individual-level location data are used in research.
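As one concrete example of the masking methods mentioned above, the sketch below applies a simple "donut" random perturbation: each point is displaced in a random direction by a distance between a minimum radius (to protect the residence) and a maximum radius (to limit information loss). The radii and the flat-earth meters-to-degrees conversion are illustrative simplifications.

import math
import random

def donut_mask(lat, lon, r_min_m, r_max_m):
    # Displace the point by a random bearing and a random distance drawn
    # from [r_min_m, r_max_m]; the inner radius guarantees a minimum
    # displacement, the outer radius bounds the damage to spatial pattern.
    theta = random.uniform(0.0, 2.0 * math.pi)
    dist = random.uniform(r_min_m, r_max_m)
    dlat = dist * math.cos(theta) / 111320.0  # meters per degree of latitude
    dlon = dist * math.sin(theta) / (111320.0 * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon

# Mask a hypothetical residence before mapping or data release.
print(donut_mask(40.7608, -111.8910, r_min_m=100.0, r_max_m=500.0))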

1.08.5 Recent Trends and Challenges

1.08.5.1 Accessibility

Euclidean distance has been widely used in geocoding/reverse geocoding applications; for example, the Euclidean distance between the geocoded point and the true address location is computed to evaluate the positional accuracy of a geocoding procedure. Note that the application context matters when evaluating the impact of positional accuracy on final results. For studies that employ spatial analysis methods based on Euclidean distance, the positional error of the geocoded locations influences the results directly, while for studies that use methods based on other types of distances, positional accuracy may not directly reflect variations in the results. Euclidean distance can be an effective measure in environmental exposure studies but can cause large errors in accessibility studies. Fig. 9 lists different measures of accessibility that differ in accuracy. Euclidean distance is a very coarse measure of accessibility because people must travel within the road network to access resources. Thus, most health studies that measure people's access to health-related resources use network-based distance or travel time as the measure of accessibility (Delamater, 2013; Wan et al., 2012). However, most such studies do not consider traffic congestion or the travel delays caused by traffic lights.

Fig. 9 Accuracy of different measures of accessibility.

A more accurate travel time can be estimated using traffic simulation, which can take these aspects into account with relevant travel demand and traffic light data. Finally, with the rapid development of big data technologies and intelligent transport systems (ITS), travel time can be derived from real traffic data, which will provide a more accurate and detailed measure of accessibility for various applications.

In addition to geocoding applications, the development of accessibility measures also has implications for many reverse geocoding applications. One basic need in many LBS applications is to locate and access a nearby POI. Although a few navigational LBS applications (e.g., Google Maps) provide travel times based on real traffic data, the majority of reverse geocoding services and LBS applications still use Euclidean or network distance. It is expected that big data and ITS can be leveraged to enrich reverse geocoding services and resolve this issue.
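The gap between Euclidean and network distance is easy to see on a toy network. The sketch below, using the open-source networkx library with made-up node coordinates and edge lengths, shows the straight-line distance understating the shortest network path.

import math
import networkx as nx

# A toy road network: planar coordinates in meters, edges weighted by length.
G = nx.Graph()
coords = {"home": (0.0, 0.0), "corner": (0.0, 800.0), "clinic": (600.0, 800.0)}
G.add_edge("home", "corner", length=800.0)
G.add_edge("corner", "clinic", length=600.0)

# Straight-line distance ignores the street layout...
euclid = math.hypot(coords["clinic"][0] - coords["home"][0],
                    coords["clinic"][1] - coords["home"][1])
# ...while the network distance follows the streets through the corner.
network = nx.shortest_path_length(G, "home", "clinic", weight="length")
print(f"Euclidean {euclid:.0f} m vs. network {network:.0f} m")  # 1000 vs. 1400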

1.08.5.2 The Temporal Dimension

Traditional research on geocoding/reverse geocoding has primarily focused on the spatial dimension (e.g., spatial accuracy), and the temporal dimension still remains underresearched. The temporal dimension in this context can have several aspects. First, note that the reference data used in geocoding/reverse geocoding are not static and commercial data companies or government agencies will add new records or update existing records during their maintenance. When users perform geocoding/reverse geocoding operations, they are using a snapshot of the reference data. While users can choose to use a specific reference dataset in convention geocoding, they have little control over the reference data when using commercial online geocoding services. Most commercial companies that provide online geocoding/reverse geocoding services take it for granted that users will need the most current reference data for geocoding and have neglected the situations in which the users need to use historical reference data to link to other historical datasets. As mentioned earlier, some online services like the US Census geocoding/reverse geocoding services have started to provide different referent datasets for the users, and this issue concerning the temporal aspect of the reference data in these online services could be solved with the development of technology and the increasing awareness of the importance of this problem. The implications of using reference datasets compiled in different time periods on academic research lie in that they differ in data quality, and this can influence subsequent analysis and the final results (Zandbergen et al., 2011). More research should be conducted to more thoroughly examine the impacts of data quality on spatial analysis and result interpretation. Another aspect of the temporal dimension concerns the temporal attributes of various POIs used in reverse geocoding for LBS applications. For example, the hours of operation for the POIs can be valuable for a myriad of LBS applications. A great many commercial reverse geocoding services have included hours of operation of the POIs in the databases. For example, Google maintains its own POI database and provides reverse geocoding services to the users through the Google Places APIs. The users can get the detailed information such as hours of operation about a specific POI, which provides great convenience to the users when they look for certain POIs. However, note that the hours of operation information for a specific could be subject to change due to holidays or other reasons, which indicates that it will be more valuable if these reverse geocoding service providers can connect with the business owners and maintain this information in a dynamic manner. With the rapid development of the internet of things, this could be achieved to further increase the value of various reverse geocoding services. The temporal dimension of the POIs has also attracted significant attention in academia. From a supply and demand perspective, the hours of operation of POIs can reflect the supply side and play a significant role in measuring people’s access to relevant resources. Studies on food access have incorporated time into accessibility measurements (Chen and Clark, 2013, 2016), which provides an avenue to discover new scientific findings. Moreover, recent studies have also proposed new methods to enrich reverse geocoding services. 
For example, user check-in data have been used to construct temporal signatures and further enrich the temporal information of the POIs (McKenzie and Janowicz, 2015; McKenzie et al., 2015). These data-driven endeavors will further improve the quality of reverse geocoding services and have great potential for designing LBS applications that better satisfy users' needs (Majid et al., 2013).
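As a concrete illustration of how the temporal dimension changes what a reverse geocoding query should return, the minimal sketch below (Python) filters candidate POIs by hours of operation at query time; the POI records and their schema are invented for illustration, as real services expose far richer opening-hours data.

    from datetime import datetime

    # Hypothetical POI records; hours are (open_hour, close_hour) per weekday
    # (0 = Monday). Real POI databases use much richer opening-hours schemas.
    pois = [
        {"name": "Grocery A", "hours": {d: (8, 22) for d in range(7)}},
        {"name": "Clinic B", "hours": {d: (9, 17) for d in range(5)}},  # closed weekends
    ]

    def open_at(poi, when):
        # Return True if the POI is open at the given datetime.
        window = poi["hours"].get(when.weekday())
        return window is not None and window[0] <= when.hour < window[1]

    query_time = datetime(2017, 1, 7, 10, 30)  # a Saturday morning
    print([p["name"] for p in pois if open_at(p, query_time)])  # ['Grocery A']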

1.08.5.3 Indoor Positioning

With the rapid development of mobile computing, the past few years have witnessed the great popularity of indoor positioning, and studies have shown that technologies such as Bluetooth and Wi-Fi can be readily used for indoor positioning (He and Chan, 2016; Li et al., 2015; Liu et al., 2007). The advent of indoor positioning poses many new challenges to geocoding/reverse geocoding practitioners. Mapping companies have incorporated indoor maps into their mapping practice such that users can locate various POIs more conveniently. Fig. 10 gives an example of indoor mapping in Google Map for the Hollywood & Highland building in Los Angeles, United States. Note that the physical address for the whole building is “6801 Hollywood Blvd, Los Angeles, CA 90028,” and many POIs are located inside the building. In this case, indoor positioning can be integrated with mobile mapping to better facilitate pedestrian navigation in indoor environments, and relevant geocoding/reverse geocoding services need to be developed such that users can readily locate relevant POIs in indoor environments. Indoor positioning has brought about many new indoor LBS applications, and indoor location-based retailing is one of the novel applications (Thamm et al., 2016). Small-sized beacons can be conveniently attached to different products, and when a consumer approaches an item, the smartphone can sense the beacon, and relevant descriptions of the item can be retrieved and displayed to the user. In this regard, relevant geocoding/reverse geocoding services can also be developed to satisfy these new needs in indoor environments.
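The retail scenario above can be sketched with the standard log-distance path-loss model, which converts a beacon's received signal strength into an approximate range; the transmit power, path-loss exponent, and readings in the Python sketch below are illustrative assumptions and would need per-beacon calibration in practice.

    def beacon_distance_m(rssi_dbm, tx_power_dbm=-59, path_loss_exponent=2.0):
        # Log-distance path-loss model: tx_power_dbm is the calibrated RSSI at
        # 1 m from the beacon; an exponent of ~2.0 approximates free space.
        return 10 ** ((tx_power_dbm - rssi_dbm) / (10 * path_loss_exponent))

    # A smartphone hears several product beacons; show the item it is closest to.
    readings = {"beacon-shoes": -48, "beacon-jackets": -71, "beacon-hats": -80}
    nearest = min(readings, key=lambda b: beacon_distance_m(readings[b]))
    print(nearest, round(beacon_distance_m(readings[nearest]), 2), "m away")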

1.08.5.4 Privacy in the Mobile Age

Mobile computing has become so pervasive that more and more mobile devices are used in our daily lives, and various mobile applications on smartphones or smart watches can easily record people's locations and trajectories, which poses significant risks to people's privacy. For example, a great many studies have shown that private information such as identity and home and workplace locations can be inferred from GPS trajectory data by using other ancillary data and advanced data mining methods (Krumm, 2007; Zheng, 2015). Recent studies reveal that even the trajectory data from geotagged social media posts can be used to infer various private data (Kim et al., 2016; Luo et al., 2016). These new challenges have attracted significant attention, and a myriad of studies have been conducted to develop relevant countermeasures to protect privacy in trajectory data (Fanaeepour et al., 2015; Seidl et al., 2016).

Fig. 10 An example of indoor mapping in Google Map.

Furthermore, the rapid development of big data and new positioning technologies (e.g., indoor positioning) will raise many new privacy issues, and more research should be conducted to thoroughly examine these issues and protect user privacy in the mobile age.
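One family of countermeasures discussed in this literature is geographic masking. The sketch below (Python) implements a simple "donut" mask that displaces a point by a random bearing and a bounded random distance; the radii are illustrative assumptions, and production maskings must be tuned to the reidentification risk of the data.

    import math
    import random

    def donut_mask(lat, lon, r_min_m=100.0, r_max_m=500.0):
        # Displace a point by a random bearing and a random distance between
        # r_min_m and r_max_m, so the true location is hidden but the masked
        # point stays within a known distance band.
        bearing = random.uniform(0, 2 * math.pi)
        distance = random.uniform(r_min_m, r_max_m)
        dlat = (distance * math.cos(bearing)) / 111320.0  # meters per degree latitude
        dlon = (distance * math.sin(bearing)) / (111320.0 * math.cos(math.radians(lat)))
        return lat + dlat, lon + dlon

    print(donut_mask(40.7128, -74.0060))  # a masked version of a point in New York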

1.08.6 Conclusion

This work summarizes the principles, methods, and applications of geocoding/reverse geocoding and introduces the metrics for evaluating the quality of geocoding/reverse geocoding procedures. Furthermore, the pros and cons of online geocoding/reverse geocoding services are covered. The privacy issues in geocoding/reverse geocoding are introduced, and relevant countermeasures for privacy protection are discussed. Finally, the recent trends and challenges in geocoding/reverse geocoding are included to shed light on potential future research directions in this field. In summary, it is necessary for geocoding/reverse geocoding users and researchers to develop a good understanding of the theoretical underpinnings of these techniques before they can use them correctly and effectively and contribute to their development.

References

Andresen, M.A., Linning, S.J., Malleson, N., 2016. Crime at places and spatial concentrations: Exploring the spatial stability of property crime in Vancouver BC, 2003–2013. Journal of Quantitative Criminology 1–21. http://dx.doi.org/10.1007/s10940-016-9295-8.
Armstrong, M.P., Ruggles, A.J., 2005. Geographic information technologies and personal privacy. Cartographica: The International Journal for Geographic Information and Geovisualization 40 (4), 63–73. http://dx.doi.org/10.3138/RU65-81R3-0W75-8V21.
Armstrong, M.P., Rushton, G., Zimmerman, D.L., 1999. Geographically masking health data to preserve confidentiality. Statistics in Medicine 18 (5), 497–525.
Barnes, T.L., Bell, B.A., Freedman, D.A., Colabianchi, N., Liese, A.D., 2015. Do people really know what food retailers exist in their neighborhood? Examining GIS-based and perceived presence of retail food outlets in an eight-county region of South Carolina. Spatial and Spatio-Temporal Epidemiology 13, 31–40. http://dx.doi.org/10.1016/j.sste.2015.04.004.
Bellander, T., Berglind, N., Gustavsson, P., Jonson, T., Nyberg, F., Pershagen, G., Järup, L., 2001. Using geographic information systems to assess individual historical exposure to air pollution from traffic and house heating in Stockholm. Environmental Health Perspectives 109 (6), 633–639.
Bonner, M.R., Han, D., Nie, J., Rogerson, P., Vena, J.E., Freudenheim, J.L., 2003. Positional accuracy of geocoded addresses in epidemiologic research. Epidemiology 14 (4), 408–412. http://dx.doi.org/10.1097/01.EDE.0000073121.63254.c5.
Cantrell, J., Pearson, J.L., Anesetti-Rothermel, A., Xiao, H., Kirchner, T.R., Vallone, D., 2016. Tobacco retail outlet density and young adult tobacco initiation. Nicotine & Tobacco Research 18 (2), 130–137. http://dx.doi.org/10.1093/ntr/ntv036.
Cao, X., Cong, G., Jensen, C.S., 2010. Mining significant semantic locations from GPS data. Proceedings of the VLDB Endowment 3 (1–2), 1009–1020. http://dx.doi.org/10.14778/1920841.1920968.
Cassa, C.A., Grannis, S.J., Overhage, J.M., Mandl, K.D., 2006. A context-sensitive approach to anonymizing spatial surveillance data: Impact on outbreak detection. Journal of the American Medical Informatics Association 13 (2), 160–165. http://dx.doi.org/10.1197/jamia.M1920.
Cayo, M.R., Talbot, T.O., 2003. Positional error in automated geocoding of residential addresses. International Journal of Health Geographics 2 (1), 1–12. http://dx.doi.org/10.1186/1476-072x-2-10.
Chen, X., Clark, J., 2013. Interactive three-dimensional geovisualization of space-time access to food. Applied Geography 43, 81–86. http://dx.doi.org/10.1016/j.apgeog.2013.05.012.
Chen, X., Clark, J., 2016. Measuring space-time access to food retailers: A case of temporal access disparity in Franklin County, Ohio. The Professional Geographer 68 (2), 175–188. http://dx.doi.org/10.1080/00330124.2015.1032876.
Cooke, D.F., 1998. Topology and TIGER: The Census Bureau's contribution. In: Foresman, T.W. (Ed.), The history of geographic information systems: Perspectives from the pioneers. Prentice Hall, Upper Saddle River, NJ.
Curtis, A.J., Mills, J.W., Leitner, M., 2006. Spatial confidentiality and GIS: Re-engineering mortality locations from published maps about Hurricane Katrina. International Journal of Health Geographics 5 (1), 1–12. http://dx.doi.org/10.1186/1476-072x-5-44.
Delamater, P.L., 2013. Spatial accessibility in suboptimally configured health care systems: A modified two-step floating catchment area (M2SFCA) metric. Health & Place 24, 30–43. http://dx.doi.org/10.1016/j.healthplace.2013.07.012.
Duckham, M., Kulik, L., 2006. Location privacy and location-aware computing. In: Drummond, J. (Ed.), Dynamic and mobile GIS: Investigating change in space and time. CRC Press, Boca Raton, FL, pp. 34–51.
Erdogan, S., Yilmaz, I., Baybura, T., Gullu, M., 2008. Geographical information systems aided traffic accident analysis system case study: City of Afyonkarahisar. Accident Analysis & Prevention 40 (1), 174–181. http://dx.doi.org/10.1016/j.aap.2007.05.004.
Fanaeepour, M., Kulik, L., Tanin, E., Rubinstein, B.I., 2015. The CASE histogram: Privacy-aware processing of trajectory data using aggregates. GeoInformatica 19 (4), 747–798. http://dx.doi.org/10.1007/s10707-015-0228-8.
Ganguly, R., Batterman, S., Isakov, V., Snyder, M., Breen, M., Brakefield-Caldwell, W., 2015. Effect of geocoding errors on traffic-related air pollutant exposure and concentration estimates. Journal of Exposure Science and Environmental Epidemiology 25 (5), 490–498. http://dx.doi.org/10.1038/jes.2015.1.
Gedik, B., Liu, L., 2008. Protecting location privacy with personalized k-anonymity: Architecture and algorithms. IEEE Transactions on Mobile Computing 7 (1), 1–18. http://dx.doi.org/10.1109/TMC.2007.1062.
Glatman-Freedman, A., Kaufman, Z., Kopel, E., Bassal, R., Taran, D., Valinsky, L., Shohat, T., 2016. Near real-time space-time cluster analysis for detection of enteric disease outbreaks in a community setting. Journal of Infection 73 (2), 99–106. http://dx.doi.org/10.1016/j.jinf.2016.04.038.
Goldberg, D.W., 2011. Advances in geocoding research and practice. Transactions in GIS 15 (6), 727–733. http://dx.doi.org/10.1111/j.1467-9671.2011.01298.x.
Goldberg, D.W., Cockburn, M.G., 2012. The effect of administrative boundaries and geocoding error on cancer rates in California. Spatial and Spatio-temporal Epidemiology 3 (1), 39–54. http://dx.doi.org/10.1016/j.sste.2012.02.005.
Goldberg, D.W., Wilson, J.P., Knoblock, C.A., 2007. From text to geographic coordinates: The current state of geocoding. URISA Journal 19 (1), 33–46.
Gruteser, M., Grunwald, D., 2003. Anonymous usage of location-based services through spatial and temporal cloaking. In: Proceedings of the 1st International Conference on Mobile Systems, Applications and Services, San Francisco, California.
He, S., Chan, S.H.G., 2016. Wi-Fi fingerprint-based indoor positioning: Recent advances and comparisons. IEEE Communications Surveys & Tutorials 18 (1), 466–490. http://dx.doi.org/10.1109/COMST.2015.2464084.

Hill, L.L., 2009. Georeferencing: The geographic associations of information. The MIT Press, Cambridge, MA.
Hurley, S.E., Saunders, T.M., Nivas, R., Hertz, A., Reynolds, P., 2003. Post office box addresses: A challenge for geographic information system-based studies. Epidemiology 14 (4), 386–391. http://dx.doi.org/10.1097/01.ede.0000073161.66729.89.
Jacquez, G.M., 2012. A research agenda: Does geocoding positional error matter in health GIS studies? Spatial and Spatio-temporal Epidemiology 3 (1), 7–16. http://dx.doi.org/10.1016/j.sste.2012.02.002.
Karimi, H.A., Durcik, M., Rasdorf, W., 2004. Evaluation of uncertainties associated with geocoding techniques. Computer-Aided Civil and Infrastructure Engineering 19 (3), 170–185. http://dx.doi.org/10.1111/j.1467-8667.2004.00346.x.
Kim, M.G., Kang, Y.O., Lee, J.Y., Koh, J.H., 2016. Inferring tweet location inference for twitter mining. Spatial Information Research 1–15. http://dx.doi.org/10.1007/s41324-016-0041-y.
Kounadi, O., Leitner, M., 2014. Why does geoprivacy matter? The scientific publication of confidential data presented on maps. Journal of Empirical Research on Human Research Ethics 9 (4), 34–45. http://dx.doi.org/10.1177/1556264614544103.
Kounadi, O., Leitner, M., 2015. Spatial information divergence: Using global and local indices to compare geographical masks applied to crime data. Transactions in GIS 19 (5), 737–757. http://dx.doi.org/10.1111/tgis.12125.
Kounadi, O., Lampoltshammer, T.J., Leitner, M., Heistracher, T., 2013. Accuracy and privacy aspects in free online reverse geocoding services. Cartography and Geographic Information Science 40 (2), 140–153. http://dx.doi.org/10.1080/15230406.2013.777138.
Krumm, J., 2007. Inference attacks on location tracks. In: LaMarca, A., Langheinrich, M., Truong, K.N. (Eds.), Pervasive Computing: 5th International Conference, PERVASIVE 2007, Toronto, Canada, May 13–16, 2007, Proceedings. Springer, Berlin, Heidelberg, pp. 127–143.
Krumm, J., 2009. A survey of computational location privacy. Personal and Ubiquitous Computing 13 (6), 391–399. http://dx.doi.org/10.1007/s00779-008-0212-5.
Levine, N., Kim, K.E., 1998. The location of motor vehicle crashes in Honolulu: A methodology for geocoding intersections. Computers, Environment and Urban Systems 22 (6), 557–576. http://dx.doi.org/10.1016/S0198-9715(98)00034-9.
Li, X., Wang, J., Liu, C., 2015. A bluetooth/PDR integration algorithm for an indoor positioning system. Sensors 15 (10), 24862.
Liu, H., Darabi, H., Banerjee, P., Liu, J., 2007. Survey of wireless indoor positioning techniques and systems. IEEE Transactions on Systems, Man, and Cybernetics Part C: Applications and Reviews 37 (6), 1067–1080. http://dx.doi.org/10.1109/TSMCC.2007.905750.
Luo, W., Wang, F., 2003. Measures of spatial accessibility to health care in a GIS environment: Synthesis and a case study in the Chicago region. Environment and Planning B: Planning and Design 30 (6), 865–884.
Luo, F., Cao, G., Mulligan, K., Li, X., 2016. Explore spatiotemporal and demographic characteristics of human mobility via Twitter: A case study of Chicago. Applied Geography 70, 11–25. http://dx.doi.org/10.1016/j.apgeog.2016.03.001.
Lv, M., Chen, L., Xu, Z., Li, Y., Chen, G., 2016. The discovery of personally semantic places based on trajectory data mining. Neurocomputing 173 (Part 3), 1142–1153. http://dx.doi.org/10.1016/j.neucom.2015.08.071.
Majid, A., Chen, L., Chen, G., Mirza, H.T., Hussain, I., Woodward, J., 2013. A context-aware personalized travel recommendation system based on geotagged social media data mining. International Journal of Geographical Information Science 27 (4), 662–684. http://dx.doi.org/10.1080/13658816.2012.696649.
McKenzie, G., Janowicz, K., 2015. Where is also about time: A location-distortion model to improve reverse geocoding using behavior-driven temporal semantic signatures. Computers, Environment and Urban Systems 54, 1–13. http://dx.doi.org/10.1016/j.compenvurbsys.2015.05.003.
McKenzie, G., Janowicz, K., Gao, S., Gong, L., 2015. How where is when? On the regional variability and resolution of geosocial temporal signatures for points of interest. Computers, Environment and Urban Systems 54, 336–346. http://dx.doi.org/10.1016/j.compenvurbsys.2015.10.002.
Nguyen, Q.C., Kath, S., Meng, H.-W., Li, D., Smith, K.R., VanDerslice, J.A., Li, F., 2016. Leveraging geotagged Twitter data to examine neighborhood happiness, diet, and physical activity. Applied Geography 73, 77–88. http://dx.doi.org/10.1016/j.apgeog.2016.06.003.
Openshaw, S., 1984. The modifiable areal unit problem. Geo Books, Norwich, UK.
Ratcliffe, J.H., 2002. Damned if you don't, damned if you do: Crime mapping and its implications in the real world. Policing and Society 12 (3), 211–225. http://dx.doi.org/10.1080/10439460290018463.
Ratcliffe, J.H., 2004. Geocoding crime and a first estimate of a minimum acceptable hit rate. International Journal of Geographical Information Science 18 (1), 61–72. http://dx.doi.org/10.1080/13658810310001596076.
Roongpiboonsopit, D., Karimi, H.A., 2010a. Comparative evaluation and analysis of online geocoding services. International Journal of Geographical Information Science 24 (7), 1081–1100. http://dx.doi.org/10.1080/13658810903289478.
Roongpiboonsopit, D., Karimi, H.A., 2010b. Quality assessment of online street and rooftop geocoding services. Cartography and Geographic Information Science 37 (4), 301–318. http://dx.doi.org/10.1559/152304010793454318.
Rushton, G., Armstrong, M.P., Gittler, J., Greene, B.R., Pavlik, C.E., West, M.M., Zimmerman, D.L., 2006. Geocoding in cancer research: A review. American Journal of Preventive Medicine 30 (2), S16–S24. http://dx.doi.org/10.1016/j.amepre.2005.09.011.
Rydberg, J., Grommon, E., Huebner, B.M., Pleggenkuhle, B., 2016. Examining the correlates of sex offender residence restriction violation rates. Journal of Quantitative Criminology 1–23. http://dx.doi.org/10.1007/s10940-016-9303-z.
Seidl, D.E., Jankowski, P., Tsou, M.-H., 2016. Privacy and spatial pattern preservation in masked GPS trajectory data. International Journal of Geographical Information Science 30 (4), 785–800. http://dx.doi.org/10.1080/13658816.2015.1101767.
Strickland, M.J., Siffel, C., Gardner, B.R., Berzen, A.K., Correa, A., 2007. Quantifying geocode location error using GIS methods. Environmental Health 6 (1), 1–8. http://dx.doi.org/10.1186/1476-069x-6-10.
Sweeney, L., 2002. Achieving k-anonymity privacy protection using generalization and suppression. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10 (5), 571–588. http://dx.doi.org/10.1142/S021848850200165X.
Thamm, A., Anke, J., Haugk, S., Radic, D., 2016. Towards the omni-channel: Beacon-based services in retail. In: Abramowicz, W., Alt, R., Franczyk, B. (Eds.), Business Information Systems: 19th International Conference, BIS 2016, Leipzig, Germany, July 6–8, 2016, Proceedings. Springer International Publishing, Cham, pp. 181–192.
Tompson, L., Johnson, S., Ashby, M., Perkins, C., Edwards, P., 2015. UK open source crime data: Accuracy and possibilities for research. Cartography and Geographic Information Science 42 (2), 97–111. http://dx.doi.org/10.1080/15230406.2014.972456.
Tribby, C.P., Miller, H.J., Brown, B.B., Werner, C.M., Smith, K.R., 2015. Assessing built environment walkability using activity-space summary measures. Journal of Transport and Land Use 9 (1), 187–207. http://dx.doi.org/10.5198/jtlu.2015.625.
Vandevijvere, S., Sushil, Z., Exeter, D.J., Swinburn, B., 2016. Obesogenic retail food environments around New Zealand schools: A national study. American Journal of Preventive Medicine 51 (3), e57–e66. http://dx.doi.org/10.1016/j.amepre.2016.03.013.
VanWey, L.K., Rindfuss, R.R., Gutmann, M.P., Entwisle, B., Balk, D.L., 2005. Confidentiality and spatially explicit data: Concerns and challenges. Proceedings of the National Academy of Sciences 102 (43), 15337–15342. http://dx.doi.org/10.1073/pnas.0507804102.
Wan, N., Zou, B., Sternberg, T., 2012. A three-step floating catchment area method for analyzing spatial access to health services. International Journal of Geographical Information Science 26 (6), 1073–1089. http://dx.doi.org/10.1080/13658816.2011.624987.
Wernke, M., Skvortsov, P., Dürr, F., Rothermel, K., 2014. A classification of location privacy attacks and approaches. Personal and Ubiquitous Computing 18 (1), 163–175. http://dx.doi.org/10.1007/s00779-012-0633-z.
Whitsel, E.A., Quibrera, P.M., Smith, R.L., Catellier, D.J., Liao, D., Henley, A.C., Heiss, G., 2006. Accuracy of commercial geocoding: Assessment and implications. Epidemiologic Perspectives & Innovations 3 (1), 1–12. http://dx.doi.org/10.1186/1742-5573-3-8.


Wieland, S.C., Cassa, C.A., Mandl, K.D., Berger, B., 2008. Revealing the spatial distribution of a disease while preserving privacy. Proceedings of the National Academy of Sciences 105 (46), 17608–17613. http://dx.doi.org/10.1073/pnas.0801021105.
Zandbergen, P.A., 2008. A comparison of address point, parcel and street geocoding techniques. Computers, Environment and Urban Systems 32 (3), 214–232. http://dx.doi.org/10.1016/j.compenvurbsys.2007.11.006.
Zandbergen, P.A., 2009a. Accuracy of iPhone locations: A comparison of assisted GPS, WiFi and cellular positioning. Transactions in GIS 13, 5–25. http://dx.doi.org/10.1111/j.1467-9671.2009.01152.x.
Zandbergen, P.A., 2009b. Geocoding quality and implications for spatial analysis. Geography Compass 3 (2), 647–680. http://dx.doi.org/10.1111/j.1749-8198.2008.00205.x.
Zandbergen, P.A., Barbeau, S.J., 2011. Positional accuracy of assisted GPS data from high-sensitivity GPS-enabled mobile phones. Journal of Navigation 64 (3), 381–399.
Zandbergen, P.A., Green, J.W., 2007. Error and bias in determining exposure potential of children at school locations using proximity-based GIS techniques. Environmental Health Perspectives 115 (9), 1363–1370.
Zandbergen, P.A., Hart, T.C., 2009. Geocoding accuracy considerations in determining residency restrictions for sex offenders. Criminal Justice Policy Review 20 (1), 62–90.
Zandbergen, P.A., Ignizio, D.A., Lenzer, K.E., 2011. Positional accuracy of TIGER 2000 and 2009 road networks. Transactions in GIS 15 (4), 495–519. http://dx.doi.org/10.1111/j.1467-9671.2011.01277.x.
Zandbergen, P.A., Hart, T.C., Lenzer, K.E., Camponovo, M.E., 2012. Error propagation models to examine the effects of geocoding quality on spatial analysis of individual-level datasets. Spatial and Spatio-temporal Epidemiology 3 (1), 69–82. http://dx.doi.org/10.1016/j.sste.2012.02.007.
Zhang, S., Freundschuh, S.M., Lenzer, K., Zandbergen, P.A., 2015. The location swapping method for geomasking. Cartography and Geographic Information Science 1–13. http://dx.doi.org/10.1080/15230406.2015.1095655.
Zheng, Y., 2015. Trajectory data mining: An overview. ACM Transactions on Intelligent Systems and Technology 6 (3), 1–41. http://dx.doi.org/10.1145/2743025.

1.09 Metadata and Spatial Data Infrastructure

Scott Simmons, Open Geospatial Consortium, Fort Collins, CO, United States
© 2018 Elsevier Inc. All rights reserved.

1.09.1 Definitions
1.09.1.1 Metadata
1.09.1.2 Spatial Data Infrastructure
1.09.1.3 The Core Relationship Between Metadata and SDIs
1.09.2 Concepts in Summary
1.09.2.1 Metadata
1.09.2.1.1 History of development
1.09.2.1.2 Early standards
1.09.2.1.3 Voluntary consensus standards
1.09.2.1.4 Best practices for creation of metadata
1.09.2.2 SDIs
1.09.2.2.1 History of implementation
1.09.2.2.2 Policies and the legal environment
1.09.2.2.3 Design considerations
1.09.2.2.4 SDI architecture
1.09.2.2.5 Deployment tools for SDIs
1.09.3 Putting it All Together
1.09.3.1 Data Repositories: Build for Success
1.09.3.2 Fit-for-Purpose Considerations
1.09.4 Emerging Influences
1.09.4.1 Linked Data and the Semantic Web
1.09.4.2 Crowd-Sourcing and Volunteered Geographic Information
1.09.5 Summary
References
Further Reading
Relevant Websites

1.09.1 Definitions

1.09.1.1 Metadata


Metadata is information that describes the contents and origin of data. More specifically, in a geospatial context, International Organization for Standardization Technical Committee 211 (Geographic Information) (ISO/TC 211) defines metadata for geographic information as providing information about the identification, the extent, the quality, the spatial and temporal aspects, the content, the spatial reference, the portrayal, distribution, and other properties of digital geographic data and services (ISO/TC 211 Terminology website). Metadata not only describes the contents, quality, usage rights, and source of the data; properly built metadata also allows for discoverability and assessment of the usefulness of the data it describes.

1.09.1.2 Spatial Data Infrastructure

A spatial data infrastructure (SDI) is a specific configuration of a data infrastructure to manage and deliver geospatial data. Included in an SDI are not just the hardware, networking, and software required to physically manage the data, but also the appropriate policies, standards, and personnel to operate the infrastructure. The United Nations Committee of Experts on Global Geospatial Information Management (UN-GGIM) states that in developing an SDI, the most important element lies not at the technical level, but rather in establishing effective coordination and infrastructure management in order to realise the full potential of geospatial information and the underlying technology, and to make it accessible to and effectively used by a broad range of users (UN-GGIM SDI website). SDIs often federate data from multiple sources, either physically in a central repository or virtually through a central catalog with links to source information maintained and served by the owner of the data. SDIs can exist at various levels of scale and complexity: single organizations can maintain Enterprise SDIs; governments can maintain regional, state, or agency SDIs; and nations can hold National SDIs (generally abbreviated as NSDI). While similar components exist at all levels, the remainder of this article focuses on NSDIs, noting any deviations for smaller-scale efforts.

1.09.1.3 The Core Relationship Between Metadata and SDIs

Metadata must be present for an SDI to exist. Metadata provides the foundational information to define a searchable catalog of data in the SDI. Further, metadata contains the information necessary to apply policies to the data: for instance, some data may be visible only to accredited users of the SDI and hidden from guest users, or processing services may be run only on data within a specified positional accuracy range. Conversely, an SDI drives the requirement for high-quality and complete metadata. Many organizations have gathered and organized data to serve specific purposes, often stored in file systems and databases with only a few people knowing what data can be found where. These organizations soon realize that metadata must be attached to the data before it can form the foundation of a working SDI. To be clear: a sensible directory structure and a README file do not alone comprise an SDI!
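To make the role of metadata-driven policy concrete, here is a minimal sketch (Python); the catalog records, field names, and thresholds are invented for illustration and are not drawn from any particular SDI.

    # Hypothetical catalog records carrying the metadata fields the policies need.
    records = [
        {"title": "Hydrography", "restricted": False, "accuracy_m": 5},
        {"title": "Utility lines", "restricted": True, "accuracy_m": 1},
        {"title": "Legacy parcels", "restricted": False, "accuracy_m": 50},
    ]

    def visible(record, accredited):
        # Policy 1: restricted layers are hidden from guest users.
        return accredited or not record["restricted"]

    def processable(record, max_accuracy_m=10):
        # Policy 2: processing services accept only data within an accuracy range.
        return record["accuracy_m"] <= max_accuracy_m

    print([r["title"] for r in records if visible(r, accredited=False)])
    print([r["title"] for r in records if processable(r)])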

1.09.2 Concepts in Summary

1.09.2.1 Metadata

1.09.2.1.1 History of development

Metadata has been around about as long as there have been maps. Even some of the very earliest maps carried attribution of who created the map, and certainly by Babylonian times, maps included titles and often other information indicating the intended use of the map, as shown in Fig. 1 (public domain image). The first metadata that generally resembles modern metadata in content and purpose is associated with early map libraries; for example, the Spanish military established a War Depository (Depósito de la Guerra) in 1810 to collect and store historical maps and other documents to preserve experience from past wars and explorations. (This collection is now preserved at the Instituto de Historia y Cultura Militar in Madrid, Spain.) The libraries developed consistent nomenclature to describe the extent covered by a map, its source and date, and often the purpose for which the map was to be used. This metadata existed in the form of map catalogs to aid in the identification and retrieval of the physical maps. A key difference between these catalogs and modern metadata is that the metadata record for a map was not attached to the map itself, so the metadata could not be found without access to the catalog. Fig. 2 highlights a modern map library index from the Perry-Castañeda Library at the University of Texas (PCL Map website).

Fig. 1 Babylonian Map of the World, circa 5th century BC. Source: https://en.wikipedia.org/wiki/Early_world_maps#/media/File:Baylonianmaps.JPG.

Fig. 2 Online map library index from the Perry-Castañeda Library at the University of Texas at Austin. Source: http://www.lib.utexas.edu/maps/.

Perhaps some of the earliest attached metadata is found in aerial photography. From almost the earliest use of aerial film cameras, each aerial photograph negative included in its border information regarding the lens manufacturer/serial number/focal length, the date of the photo, the flying height, and other information, as shown in Fig. 3 (source: USGS). Metadata was now transportable with the data.

It is important to recognize that much of the discussion of the history of geospatial metadata to this point has focused on metadata for maps. Maps certainly should have associated metadata, but the true power of metadata is best realized when all of the data layers that comprise the map have their own metadata. Even hand-drawn maps usually contain layers of information (e.g., coastline, rivers, roads, terrain, etc.). When maps began to be printed with offset presses, each color had its own sheet for the printing press, and often each color represented just one or a few layers of information. Mapping organizations managed these plates as parts of a whole map and only updated the plates with changes to create a new edition of a map; see Fig. 4 (source: USGS). Thus, managed metadata for layers of information was born.

Metadata did not truly emerge as the cornerstone of sound geospatial data management practices until the digital age. As geographic information systems (GIS) emerged, data could be created and managed as discrete types and layers of content. Early GIS software, such as MOSS (1977) (Reed et al., 1978) and GRASS (1982) (Westervelt, 2004), included the ability to save additional information about data layers and search across these metadata. Commercial software vendors also included such a capability, for instance ArcINFO (1982) and Deltamap (later GenaMap) (1986). The metadata stored along with data from a specific software platform in a proprietary or system-specific format was accessible by other users of the same software. Users also recognized the need to share data between software platforms, and many data translation tools also move accompanying metadata to the output format. However, much could be lost in translation: not just portions of the metadata, but also content, precision, and attribution of the data itself. Thus, the geospatial community recognized the need to develop open and standardized encoding formats to facilitate data sharing and a common set of terms for metadata, giving rise to geospatial standards.

1.09.2.1.2 Early standards

1.09.2.1.2.1 Data interchange format (DIF)
One of the pioneers in standardizing metadata was the US National Aeronautics and Space Administration (NASA). At a workshop in 1987, DIF was proposed and then completed later that same year. DIF was formally adopted by NASA in 1988 and continues to be used, with the latest version published in 2010 as DIF 10.


Fig. 3 Aerial photo from the USGS National Aerial Photography Program (NAPP) showing imagery metadata captured in the collar of the photo. Source: https://earthexplorer.usgs.gov/.

Fig. 4 Copper plate used to print paper maps. Note that only a subset of the map information (represented by a single color on that map) is stored on this copper plate. Source: https://www.usgs.gov/media/images/engraved-copper-map-plate.

DIF is a relatively simple metadata format containing 34 fields, of which only eight are mandatory. While the standard was created for the Earth Science community and certainly has a strong remote-sensing heritage, it is not restricted to use for remotely sensed data. After publication of the ISO 19115 Metadata Standard (see below), DIF was modified to be fully compatible with the ISO Standard (NASA DIF website).
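A DIF record can be checked for completeness with a few lines of code. The sketch below (Python) tests a record, held as a simple dictionary, against an assumed subset of mandatory DIF field names; consult the NASA DIF documentation for the authoritative list.

    # Assumed mandatory DIF fields, for illustration only; the authoritative
    # list is defined in NASA's DIF writer's guide.
    MANDATORY = {"Entry_ID", "Entry_Title", "Parameters", "ISO_Topic_Category",
                 "Data_Center", "Summary", "Metadata_Name", "Metadata_Version"}

    def missing_mandatory(dif_record):
        # Report which mandatory fields are absent from a DIF record.
        return sorted(MANDATORY - set(dif_record))

    record = {"Entry_ID": "EX-0001", "Entry_Title": "Example dataset",
              "Summary": "A sketch, not a real DIF document."}
    print(missing_mandatory(record))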


1.09.2.1.2.2 Content standard for digital geospatial metadata (CSDGM)
In 1990, the United States government established the Federal Geographic Data Committee (FGDC) (OMB, 1990) with the objective to promote the coordinated development, use, sharing, and dissemination of surveying, mapping, and related spatial data. The FGDC developed the CSDGM (better known as the "FGDC Metadata Standard") in 1994 and updated the standard to its final form in 1998 (FGDC, 1998). This standard is a comprehensive repository for metadata with a large number of fields but a clear organizational structure. FGDC Metadata Standard documents are normally stored not only as plain or formatted text, but also as XML or in other encodings. Vast quantities of geospatial data created in the United States by government, commercial, and nonprofit entities include FGDC Metadata. Most GIS software packages read and write FGDC Metadata, and validators are common. It is important to note that all United States Federal agencies have been directed to work with voluntary consensus standards for geospatial practices, and thus the FGDC endorsed the use of the ISO 19xxx Metadata Standards in 2010 (see below for more information on these standards).

1.09.2.1.2.3 Other national standards
Akin to the FGDC Metadata Standard, numerous national mapping agencies and cooperative organizations developed their own geospatial metadata standards. As these entities increasingly distribute and publish their data holdings, legacy content will include metadata in the respective formats (noting that most of these entities have moved to adoption of the ISO 19xxx series of standards or profiles of those standards). In Australia and New Zealand, the Australia New Zealand Land Information Council (ANZLIC) developed a metadata standard in 1996 and now publishes a profile of ISO 19115:2005 (2007). The United Kingdom also developed a profile of ISO 19115:2003, along with content from the UK e-government metadata standard (e-GMS), in 2004, known as the GEo-spatial Metadata INteroperability Initiative (UK GEMINI). This standard has been augmented with recommendations from the Infrastructure for Spatial Information in Europe (INSPIRE) program and was last updated in 2012 (an update is intended in 2017) (AGI, 2012).

1.09.2.1.3 Voluntary consensus standards

1.09.2.1.3.1 Overview
In 1994, two geospatial standards organizations were founded: the OpenGIS Consortium (now known as the Open Geospatial Consortium or OGC) and ISO/TC 211. The OGC was established by a group of government agencies, companies, and universities to promote geospatial data interoperability through standards. ISO also recognized the need to develop and maintain internationally recognized standards for the geospatial practice and thus created TC 211. The two groups have always had significant overlap in membership and active participants, and thus numerous standards are jointly developed or passed from one organization to the other for endorsement and publication. Fundamental to the standards published by OGC and ISO/TC 211 are the metadata standards, which are further described below.

1.09.2.1.3.2 ISO 19115
The ISO 19115 Standard for geospatial metadata was initially published as a single volume as ISO 19115:2003. Per the Standard abstract, ISO 19115:2003 defines the schema required for describing geographic information and services. It provides information about the identification, the extent, the quality, the spatial and temporal schema, spatial reference, and distribution of digital geographic data (ISO, 2003). ISO 19115:2003 is still widely used and forms the basis for numerous national standards as described earlier in this article. ISO 19115 has since been revised and published in three volumes (parts):

- ISO 19115-1:2014: this part describes the fundamental metadata schema and is an update of ISO 19115:2003 (ISO, 2014). Note that ISO 19115-1:2014 added no new mandatory elements to the previous version to preserve a maximum degree of backward compatibility; thus, the XML encoding standards designed to work with ISO 19115:2003 will also generate valid ISO 19115-1:2014 content.
- ISO 19115-2:2009: this part extends ISO 19115:2003 to describe metadata for imagery and gridded data, including sensor/measuring information, processing of the acquired data, and derivation of information from the raw data (ISO, 2009).
- ISO/TS 19115-3:2016: this part provides XML schema, Schematron rules, and an Extensible Stylesheet Language Transformation (XSLT) to implement ISO 19115-1, ISO 19115-2, and ISO/TS 19139 in XML (ISO, 2016).

1.09.2.1.3.3 ISO/TS 19139
ISO/TS 19139:2007 is an XML encoding for ISO 19115:2003. ISO/TS 19139 has a second part, ISO/TS 19139-2:2012, an XML encoding for ISO 19115-2:2009. There is current work in ISO/TC 211 to update the ISO 19139 standards to provide encodings for the latest ISO 19115 standards.

1.09.2.1.3.4 Using ISO metadata standards
So how does one use the ISO metadata standards? ISO 19115 is encoding-agnostic, so data publishers can choose the format of their preference for sharing the metadata. However, the best practice is to author content in a common format that can be easily parsed, so most software packages tend to save the metadata in XML (per ISO 19139; see Fig. 5 for an example (US Census)) or text format. Conventions and best practices for population of metadata are provided later in this article.
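Because ISO 19139 is plain XML, any XML library can read it. The sketch below (Python, standard library only) extracts a dataset title from an ISO 19139 file; the file name is a placeholder, while the namespace URIs and element path follow the published 19139 schema.

    import xml.etree.ElementTree as ET

    NS = {"gmd": "http://www.isotc211.org/2005/gmd",
          "gco": "http://www.isotc211.org/2005/gco"}

    def dataset_title(path):
        # Walk the ISO 19139 structure down to the citation title.
        root = ET.parse(path).getroot()
        el = root.find("gmd:identificationInfo/gmd:MD_DataIdentification/"
                       "gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString", NS)
        return None if el is None else el.text

    print(dataset_title("metadata.xml"))  # assumes a local ISO 19139 document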

Fig. 5 Example of a portion of ISO 19139-compliant metadata. Source: U.S. Census.

1.09.2.1.3.5 Dublin Core
The Dublin Core Metadata Initiative (DCMI) maintains a metadata model that takes advantage of modern linked data capabilities. The Dublin Core comprises a very simple minimum set of 15 elements and is extensible through the use of "application profiles," where specialized vocabularies are appended to the minimum elements to satisfy the requirements of a specific discipline (such as geospatial). The DCMI Abstract Model (DCMI website) provides syntax for implementing Dublin Core in the World Wide Web Consortium (W3C) Resource Description Framework (RDF). The use of RDF allows the great flexibility inherent in linked data, but can also result in complexity unless properly constrained. Numerous bodies have developed crosswalks from other metadata standards into Dublin Core, including ISO 19115. The use of Dublin Core (and RDF principles in general) is particularly beneficial for web-accessible geospatial resources and for flexible discovery of data in catalogs (OGC, 2009; Lopez-Pellicer et al., 2010). The use of linked data concepts with well-defined geospatial vocabularies extends the concept of a one-to-one relationship between a metadata record and a specific dataset to allow for discovery of related data (which might not even be geospatial in nature). In fact, linked data concepts and the use of the semantic web may radically redefine what is even meant by the term "metadata"; refer to the "Emerging Influences" section at the end of this article.
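For illustration, the sketch below (Python, using the rdflib package) builds a tiny Dublin Core description of a dataset as RDF; the dataset URI and literal values are hypothetical, and a real geospatial application profile would append further vocabularies.

    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DCTERMS

    g = Graph()
    dataset = URIRef("http://example.org/data/roads-2016")  # hypothetical identifier

    # A minimal Dublin Core description; real profiles add geospatial terms.
    g.add((dataset, DCTERMS.title, Literal("Road centerlines, 2016")))
    g.add((dataset, DCTERMS.creator, Literal("Example Mapping Agency")))
    g.add((dataset, DCTERMS.spatial, Literal("County X")))
    g.add((dataset, DCTERMS.modified, Literal("2016-11-01")))

    print(g.serialize(format="turtle"))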

1.09.2.1.4 Best practices for creation of metadata

It should be clear to the reader that the ISO 19115/19139 series of standards are the most commonly used and prescribed metadata standards in the geospatial community, regardless of whether the metadata is encoded in text, XML, or as a Dublin Core application profile. With the presumption that metadata is to be encoded to the ISO 19115 schema (or a profile thereof), there are several best practices that should be considered in authoring such metadata. There are three considerations for authoring useful metadata: the metadata must be Correct, Complete, and Relevant.

- Correct: every entry must be accurate and adequately descriptive. Further, the metadata document as a whole must validate against the standard to which it is written. Recall that metadata is key to data discovery and cataloging, so user expectations are that data will reflect the metadata entries used in finding those data.
- Complete: users of geospatial data never complain about too much metadata, although authors might find creating such detailed metadata content onerous. However, data publishing organizations typically have a minimum required metadata content standard (often as a profile of ISO 19115), and ISO 19115 itself prescribes a certain amount of mandatory content to ensure useful metadata is created to facilitate data interoperability.
- Relevant: much of the content in a metadata standard is tightly constrained against specific options. For example, the geometry of a feature dataset is limited to a few geometric primitive types. Likewise, the bounding box of a dataset is measurable. But some metadata fields are more subjective in content, such as keywords, and thus require careful consideration in populating to ensure that the field information is relevant to the community(ies) of interest. Such considerations are not always obvious; for example, a database of electric power transmission line towers is critically important to aeronautical navigation and thus should include a keyword to indicate that the database includes "vertical obstructions."

Authors should always use terms in metadata with consistent meaning. ISO/TC 211 publishes a Multi-Lingual Glossary of Terms (ISO/TC 211 Terminology website) that is widely used in metadata and provides normative definitions for ISO/TC 211 and OGC standards.

1.09.2.1.4.1 Tools to assist in authoring and validation
Given the complexity of the prevailing metadata standards, most geospatial software vendors have created tools to facilitate creation and maintenance of metadata. Such tools can automatically populate some portion of the metadata (such as file name, last update date, field names for attributes, etc.). By combining this automated population of content with example information and/or guidance for populating the metadata, these tools greatly reduce the time to create useful metadata. Fig. 6 shows an example of the INSPIRE Metadata Editor from the European Commission, an online tool for the authoring and validation of metadata (INSPIRE Metadata Editor website). There are also open source metadata software tools that provide a user-friendly interface for editing metadata, such as tkme (tkme website), GeoNetwork opensource (GeoNetwork website), and the Geoportal XML Editor (Geoportal XML Editor website). Some of these tools even validate the metadata to ensure that fields include the proper types and that "mandatory" content (as specified for a particular user or project) is complete.

1.09.2.1.4.2 Automated population
While metadata has a clear value for the geospatial industry, the standards are daunting in scope, and most organizations and data creators struggle to create complete and accurate metadata. Some would argue that the cost–benefit ratio of time-to-value is difficult to justify. Automation of metadata population is possible for a significant portion of important content, and automation capabilities continue to improve. Automation generally takes three forms:

Fig. 6 The online INSPIRE metadata editor and validator. Source: http://inspire-geoportal.ec.europa.eu/editor/.


1. Default values: organizations can establish default values for publication details, contact information, etc. Additionally, a template can be used to copy common content between metadata sets where the described data have many commonalities.
2. Extraction of information from content: many metadata field values can be directly obtained from the data. For instance, the geometry of a feature class, the bounding box of the data, and the type (e.g., binary, text, integer, etc.) of a field are all easily obtainable without manual intervention (illustrated in the sketch following the lists below).
3. History of processing: for those datasets which were derived from other data, geospatial processing software packages often have mechanisms to store the sequence of processing that arrived at the final data. Such history can range from a simple reference to an imagery source from which feature data were digitized, to the parameters used in a geostatistical interpretation of raw data to create an analytical product.

1.09.2.1.4.3 Sub-dataset level metadata
Many datasets are derived from multiple sources or rely upon multiple methods for compilation. In such cases, metadata may be required at a finer level than just descriptive of the dataset as a whole. In the case of feature data, there are three general additional levels of detail at which metadata is commonly recorded (and of course, combinations of these detailed levels may exist in a single dataset) (Danko, 2012).

1. Feature-type-level metadata: a group of features within a larger database, such as roads vs. property boundaries in a cadastral dataset. The roads may have been collected by vehicle-mounted survey, while property boundaries were digitized from paper maps. Each feature type can contain additional metadata to describe its source and accuracy. Feature-type-level metadata is typically recorded as additional metadata records associated with the dataset or as subsets of metadata in a master metadata structure.
2. Feature-level metadata: each feature in a dataset can have unique metadata. Feature-level metadata is quite common in defense mapping (NGA, 1997) or in other domains where features are collected from image mosaics. If a feature dataset was digitized from a mosaic of several images, the source image for features will vary across the dataset and is often recorded. Feature-level metadata is generally stored as attributes for the feature.
3. Feature-attribute-level metadata: as each feature can have varying sources, so can specific attributes for a single feature, and attribute-level sources may vary across the dataset. For example, the name of a road could be derived from a legacy road dataset for some features and from a visual inspection for other features. Feature-attribute-level metadata is typically stored as additional attributes for a feature.
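As an illustration of the extraction form of automation described above, the sketch below (Python, using the Fiona package) pulls the automatically derivable fields from a vector dataset; the input file name is a placeholder.

    import fiona  # assumes the Fiona package and a local dataset are available

    def auto_metadata(path):
        # Extract metadata fields that need no manual input: geometry type,
        # attribute schema, bounding box, CRS, and feature count.
        with fiona.open(path) as src:
            return {
                "geometry_type": src.schema["geometry"],
                "fields": dict(src.schema["properties"]),  # attribute names and types
                "bounding_box": src.bounds,                # (minx, miny, maxx, maxy)
                "crs": src.crs,
                "feature_count": len(src),
            }

    print(auto_metadata("roads.shp"))  # hypothetical input dataset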

1.09.2.2 SDIs

1.09.2.2.1 History of implementation

The growth in digital geospatial data in the 1980s posed a management concern for several national governments, prompting investigations into the ideal fashion in which to collect, store, and disseminate geospatial data. While most stakeholders were organizations within each government, public access to content was also an early item of interest. Perhaps the first formalized effort at establishing an NSDI came from Australia, with the establishment in 1986 of the Australian Land Information Council to coordinate cross-government data sharing and use. New Zealand joined the effort in 1991, leading to the renaming of the Council to the name in use today: ANZLIC (Masser, 1999). By 1997, ANZLIC recognized that the elements of an NSDI were mostly in place, although not necessarily centrally coordinated (ANZLIC, 1996). Similar to Australia, the US federal government realized the requirement for central management of an NSDI. The FGDC was established in 1990 (see the section above on the CSDGM) and, in 1994, Executive Order 12906, "Coordinating Geographic Data Acquisition and Access: The National Spatial Data Infrastructure," was signed, signaling the Executive Branch's lead in establishing the United States NSDI (OMB, 1994). A number of other nations initiated SDI efforts at about the same time, including Qatar and Portugal in 1990 and the Netherlands in 1992 (Masser, 1999). SDIs and related institutions also participate in an organization for cooperation, networking, and education: the Global Spatial Data Infrastructure Association (GSDI). GSDI organizes regular conferences and develops publications of key importance to the industry. More information can be found on the GSDI website.

1.09.2.2.2 Policies and the legal environment

Most SDIs, particularly NSDIs, are accompanied and governed by policies impacting not only the design of the SDI, but also its operation and data dissemination rules. Coordination of the various participants in an SDI is generally the most challenging aspect of establishing and maintaining it, and thus policies often focus on directing or facilitating such coordination. Policy may carry the force of law or may be a condition for participation in the SDI. Rather than restating policy content in this article, the reader is directed to the UN-GGIM collection of policy documents and supporting information (UN-GGIM website). As soon as multiple datasets are integrated, there is a strong probability that the usage rights and restrictions of the datasets will be in conflict. This conflict impacts not only the availability of data but also the recovery of the cost of providing the data. Some nations (such as the United States) have laws that strongly encourage public use of publicly funded data, thus eliminating many concerns with usage rights. Other nations or transnational bodies (such as the European Union) must deal with diverse laws impacting their collections of data (van Loenen and Kok, 2004).


Further, security restrictions on data often vary over time. Some readers may recall that the United States Army Corps of Engineers dataset of dam locations in the United States was freely available for many years until shortly after the 11 September 2001 terrorist attacks; some years after the attacks, the data again became public and are now available as the National Inventory of Dams. A final consideration, one of increasing visibility to the general public, is privacy. Some geospatial datasets (such as cadastral land ownership records) contain specific information on citizens that may not be releasable in certain legal jurisdictions. With the increasing derivation of geospatial data from unstructured content (such as social media feeds), privacy concerns are certainly at the forefront of legal considerations, but are also quite complex to address.

1.09.2.2.3 Design considerations

The key design parameters for any SDI are quite simple in concept: make the data discoverable and obtainable. But it will come as no surprise to anyone that the devil is in the details. Below are some generalizations that apply to all SDIs; the details of how each element of an SDI is constructed and administered come later in this article.

- Content: an SDI is not going to be built around a single geospatial dataset. At a minimum, multiple datasets will be included in the SDI, and commonly these datasets will be provided by multiple owning organizations. Further, these datasets are likely to be stored in multiple formats and/or database server architectures.
- Metadata: without adequate metadata, the SDI content cannot be described, and when not described, the data are not discoverable. However, metadata may come in different forms for different data and will generally not be consistent in completeness. As an SDI matures, rules for metadata authoring can be enforced to improve consistency in description.
- Discovery services: working with the assumptions that the content is stored in multiple formats/databases and that the metadata are not equivalent in detail across the content, overlying discovery services must be able to "walk" across the metadata to identify underlying content and be able to describe how the content may be accessed.
- Access: different types of data have preferred mechanisms for access. For instance, raster or gridded content is most easily served as raster graphics to a client. Feature content may be needed for use in reference layers as rendered maps (thus served as raster graphics) or as actual features accessible as vector graphics. Often a simple download of clipped raw data is required. When multiple datasets are requested to be accessed together, options for service to a client software or bundling for download quickly become limited.
- Management: common goals and rules for operation ensure a sustainable SDI. Management need not be centralized in the hands of a single organization; management can also be distributed across federated bodies with some degree of cooperative planning, usually in a committee (such as the FGDC in the United States).

1.09.2.2.4 SDI architecture

The architecture for an SDI can be generalized as comprising the following elements:

- content databases;
- a discovery (catalog) service;
- processing services; and
- dissemination services.

Attached to the architecture are the tools necessary to author the content and the clients to consume the discovered data or dissemination services. Many architectural models to implement such infrastructure are possible; see examples in Goodchild et al. (2007), Ferreira et al. (2015), and Bernard et al. (2009). However, the overall picture can be greatly summarized as in Fig. 7. Note that the SDI interface to the clients supports discovery and delivery of data. However, the data may already be available to the SDI as a service that can be discovered and directly served to the client without any processing or compilation for delivery through a dissemination service. The SDI architecture operates in a networked environment. As NSDIs are built for government and public sharing of data, they are connected to the internet. This means that connection from the NSDI interfaces to the source data is often via the internet, and consumption of data by end users occurs through web services or file downloads. Secured or classified SDIs that serve defense or other restricted user communities rely upon intranets, but use the same software and processes as publicly accessible SDIs. Each of the four main elements of an SDI is further discussed below.

1.09.2.2.4.1 Content databases
Content is provided to the SDI in a variety of ways. Raw data organized in file systems can be adequately cataloged for discovery and retrieval. Dedicated spatial databases, such as Oracle Spatial, Esri's geodatabases (e.g., ArcSDE), or the PostGIS extension to PostgreSQL, store content in tables in relational databases. Many relational databases even include native spatial capabilities for storing geometry in the OGC Simple Features standard (OGC, 2011) and some query capabilities; for instance, Microsoft SQL Server and MySQL include such features in their core functionality. Nonrelational (or NoSQL) systems are also increasingly geospatially aware. Popular file or document-oriented storage systems include MongoDB, MarkLogic, and databases leveraging Apache Hadoop. Graph databases with well-tested spatial capabilities include AllegroGraph and Neo4j. These content stores may be accessed directly by the discovery, processing, and dissemination services of the SDI. However, in more complex or distributed SDIs, content data storage systems are generally managed internally by the owning organization and only exposed to SDI consumers via intermediate services. The OGC Web Services, for example, Web Map Service (WMS) (OGC WMS website), Web Map Tile Service (WMTS) (OGC WMTS website), Web Feature Service (WFS) (OGC WFS website), and Web Coverage Service (WCS) (OGC WCS website), are commonly used in government-administered SDIs.
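As a sketch of direct access to such a store, the following Python fragment queries a hypothetical PostGIS table for features inside a bounding box; the connection parameters, table, and column names are invented, while ST_Intersects, ST_MakeEnvelope, and ST_AsGeoJSON are standard PostGIS functions.

    import psycopg2  # assumes a PostGIS-enabled PostgreSQL database is reachable

    conn = psycopg2.connect(dbname="sdi", user="reader", host="localhost")
    cur = conn.cursor()

    # Ask PostGIS for features intersecting a WGS84 (EPSG:4326) bounding box,
    # with geometries returned as GeoJSON for easy consumption by web clients.
    cur.execute("""
        SELECT name, ST_AsGeoJSON(geom)
        FROM roads
        WHERE ST_Intersects(geom, ST_MakeEnvelope(%s, %s, %s, %s, 4326))
    """, (-88.0, 14.0, -87.0, 15.0))

    for name, geojson in cur.fetchall():
        print(name, geojson[:60], "...")
    conn.close()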

Fig. 7 Simplified architecture for an SDI, showing clients connecting through the SDI interface to a discovery service, processing services, and dissemination services over content databases (SQL, NoSQL, file system, WMS, WFS). The gray lines indicate queries against a Discovery Service, the black lines indicate the flow of data.

Proprietary web or network services from the major geospatial software providers can offer advantages for ease of administration and still offer further exposure via standard web services. SDIs commonly provide a "bridge" between the end user and the data services described in the previous paragraph. The user identifies some data of interest that is part of the SDI and can directly access the data service from their client device. In many cases, such a model is an optimal use of resources by the community participating in the SDI.

1.09.2.2.4.2 Discovery services
Data are worthless if not discoverable. Likewise, brilliant collaboration in cooperating on data collection and maintenance through an SDI agreement still provides little value to the end users if there are no cataloging and discovery services built into the framework. Key to discovery is relevant metadata (as described in the first half of this article). SDIs provide value just by including a catalog that identifies the content by useful keywords and which provides information on how to access or request the data. But the real power of discovery services is providing an assistance system to help users find the content they need and consume that content in their own clients or work environments. Fig. 8 shows an example of the Geoportal from the Group on Earth Observations (GEO website); in this example, the user has requested OGC WMS 1.1.1 services over Honduras. SDI discovery services still often rely upon custom-programmed or database-driven query interfaces. There are also standards to facilitate discovery, such as the OGC Catalogue Service (CSW 2.0.2 and 3.0) (OGC Catalogue Services website). Such standards work against traditional metadata catalogs or data registries, such as ebRIM; an excellent example of the latter is provided by OSGeo (OSGeo ebRIM website). An interesting extension to discovery is the ability to store a reference to the identified data, as well as any downstream processing, through a context document. The OGC has a standard for an OGC Web Services Context Document that can be used to repeat retrievals or share web-accessible data and processing steps with others without having to bundle all of the source content (OGC OWS Context website).
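As a sketch of machine-driven discovery, the fragment below (Python, using the OWSLib package) queries a CSW 2.0.2 endpoint for records mentioning a keyword and prints the access links recorded in their metadata; the endpoint URL is a placeholder.

    from owslib.csw import CatalogueServiceWeb  # assumes the OWSLib package
    from owslib.fes import PropertyIsLike

    csw = CatalogueServiceWeb("https://example.org/csw")  # placeholder endpoint

    # Find records whose text mentions "flood" and list how each can be accessed.
    query = PropertyIsLike("csw:AnyText", "%flood%")
    csw.getrecords2(constraints=[query], maxrecords=10)
    for record in csw.records.values():
        print(record.title, "->", [ref.get("url") for ref in record.references])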

120

Fig. 8

Metadata and Spatial Data Infrastructure

An example query of the GEOSS Portal as an access point to the GEOSS SDI. Source: http://www.geoportal.org/.

Registry website)), a processing service can be used to re-project data on the fly from multiple sources into a single delivery in a single CRS. Other processing workflows can include the following: deriving new information from calculations against one or more source datasets; thinning of data or reducing resolution; l cartographic rendering of raw content; and l format conversion from the source to a delivery type. l l
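As a concrete illustration of the discovery step described above, the following minimal sketch queries a CSW catalog endpoint using the open-source OWSLib Python library. This is not a prescribed SDI implementation: the endpoint URL and search term are placeholders, and the exact calls may vary with the OWSLib version.

from owslib.csw import CatalogueServiceWeb
from owslib.fes import PropertyIsLike

# Connect to a CSW discovery endpoint (placeholder URL).
csw = CatalogueServiceWeb("https://example.org/geoportal/csw")

# Search the catalog's full-text index for records about land cover.
query = PropertyIsLike("csw:AnyText", "%land cover%")
csw.getrecords2(constraints=[query], maxrecords=10)

# Each returned record carries the metadata needed to evaluate and access the data.
for record in csw.records.values():
    print(record.identifier, record.title)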

The processed data are now ready for dissemination.

1.09.2.2.4.4 Dissemination services

The most important part of an SDI is the part that delivers content to the end user. "Dissemination" in this article has a broad definition: it is the delivery of information from an SDI to a consumer, be that consumer another machine, a skilled end user of geospatial data, or a simple graphic rendering. Early SDIs (and in fact many cataloged datasets to this day) delivered content as an FTP or HTTP download of one or more files to a user after the content was identified. The simplest method of such delivery is copying the source content to the user with no further manipulation. Of course, processing services can provide the ability to geographically clip the information, transform its CRS, or change its format, but the end result is still mostly the same: a file is now available for further use.

Intermediate between the file services described above and the web services described next is direct access to an SDI database. Many geospatial databases allow networked direct connection. However, such connection is typically limited to intra-SDI delivery of data and is not exposed to end users, as there can be significant security and performance issues if end users directly interface with the databases. In many cases, a processing service may sit on top of a database and deliver processed information or statistics to the end user, often to assist the user in assessing whether the information is relevant to their needs.

With the advent of web services for geospatial information, dissemination of data can be more dynamic and require less bandwidth and no client storage. As described earlier for the content databases, web services are available to provide rendered (but queryable) representations of one or more content sources as WMS or WMTS. Data may also be provided by a mechanism where they can be consumed and further utilized as if they were sitting on the client machine, in the form of WFS (for vector (feature) data) or WCS (for coverage data, including multidimensional gridded data). WCS offers some unique capabilities in directly querying the content such that subsets of information are retrieved, as shown in Fig. 9 (OGC).

Fig. 9 Example WCS data subsetting: a subset may trim or slice the coverage. Source: https://portal.opengeospatial.org/files/09-110r4.
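To make the web-service delivery path concrete, the sketch below requests a rendered map from a WMS endpoint with the open-source OWSLib Python library; the endpoint URL, layer name, and bounding box are placeholders, and this is a minimal illustration rather than any portal's official client.

from owslib.wms import WebMapService

# Connect to a WMS dissemination endpoint (placeholder URL).
wms = WebMapService("https://example.org/ows/wms", version="1.1.1")

# Request one layer rendered to PNG, clipped to a bounding box in a chosen CRS;
# the server-side processing services handle clipping, reprojection, and rendering.
response = wms.getmap(layers=["landcover"],
                      srs="EPSG:4326",
                      bbox=(-90.5, 29.5, -89.5, 30.5),
                      size=(600, 600),
                      format="image/png")

with open("map.png", "wb") as f:
    f.write(response.read())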

For further examples of modern SDI interfaces and their dissemination mechanisms, see:

- United States GeoPlatform (GeoPlatform website);
- European Commission Geoportal (EC Geoportal website); and
- Bhuvan, the Indian Geo-Platform of ISRO (Bhuvan website).

1.09.2.2.5 Deployment tools for SDIs

While NSDIs are certainly complex in operation and configuration, smaller SDIs can now readily be established with deployment tools from commercial and open sources. These tools can be used to develop components that are part of a larger SDI or to build an entire SDI from scratch. But the user should be aware that the more complex the SDI, the less of its configuration can be automated. Examples include Hexagon Geospatial's Geospatial SDI, Geoportal Server, GeoNode, easySDI, and others.

1.09.3 Putting it All Together

It should be clear from this article so far that metadata and SDIs are intrinsically linked. An SDI cannot function without metadata, and metadata provides critical value in the proper use of any geospatial content. One could argue that any collection of geospatial data that is discoverable and deliverable is an SDI. The remainder of this section discusses some considerations for appropriate use of metadata and its influence on the design of SDIs.

1.09.3.1 Data Repositories: Build for Success

All geospatial data need metadata to be most useful. If data are to be included in an SDI, the metadata requirements should be defined in the SDI operational procedures. However, many originators of geospatial content cannot necessarily forecast the eventual lifecycle of the data or where it will end up being most used. All data should have sufficient metadata to place the information on a map (the CRS), to describe its content, and to identify the creator or publisher. An organization should agree upon a consistent and manageable metadata content standard for all data that the organization will create and share. Consistency is key to ensuring that the collective whole of the data is discoverable and used in a fair and appropriate manner.

1.09.3.2 Fit-for-Purpose Considerations

How much metadata is too much? The answer depends on the perspectives of both the author and the user of the metadata. Developing metadata is time-consuming (despite some of the automation noted earlier) and not particularly exciting for most data managers or creators. The tendency for most authors is to deliver as little as possible, under the two following general cases:

1. metadata content may be programmatically or contractually defined such that data are not accepted without adequate metadata (there are financial or professional consequences to consider); or
2. the data creator wants the data to be discoverable and is willing to create sufficient metadata content to ensure that the data can be found.

Understand that much of the value of the data is in how it is eventually used. Data creators should be willing to invest in metadata and make its creation part of the overall data authoring workflow. Further, there are numerous GIS tools available to end users and several operating paradigms with respect to the use of local vs. networked or web data stores. Use the SDI (even if that SDI is for your single data layer) to offer as many dissemination options as possible. Some consumers want raw content to process further, and others want an attractive map. Offer the user these possibilities using adopted or de facto standards wherever possible.

1.09.4 Emerging Influences

1.09.4.1 Linked Data and the Semantic Web

As described earlier in this article, the DCMI facilitates the use of linked data concepts for improving web accessibility of metadata. The W3C has published a standard for a Data Catalog Vocabulary (DCAT) to assist with interoperability between data catalogs that have been published on the web. Such concepts alleviate some of the problems with inconsistent metadata models and enable consistent discovery of information across multiple datasets created in different user expertise domains. The geospatial community in the European Commission has extended DCAT to create GeoDCAT-AP. This application profile of DCAT provides an RDF syntax binding for geospatial metadata, specifically the metadata elements specified in the ISO 19115:2003 standard and the INSPIRE Directive (European Union, 2007). GeoDCAT continues to evolve and will likely become central to standards activities in the geospatial and web communities.

With the capabilities of a semantic web (W3C Semantic Web website), wherein all data are published to the web and linked with appropriate semantic mediation through vocabularies and rules, the current concept of metadata may become completely irrelevant. Because metadata merely describes data, all of the descriptive elements are themselves a collection of data. Might users one day search for geospatial data using common language terms, with the semantic relationships between those terms and the highly specialized data negotiating the identification and retrieval of the most relevant data?
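To illustrate what such linked-data metadata looks like in practice, the following minimal sketch builds a one-dataset DCAT description with the open-source rdflib Python library. All URIs are placeholders, and a full GeoDCAT-AP record would carry additional geospatial elements beyond what is shown.

from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

# Describe one dataset with a few core DCAT properties (placeholder URIs).
g = Graph()
dataset = URIRef("https://example.org/data/landcover-2017")
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Land cover, 2017")))
g.add((dataset, DCTERMS.description, Literal("Example land cover dataset.")))
g.add((dataset, DCAT.distribution, URIRef("https://example.org/wms/landcover")))

# Serialize to Turtle so the record can be harvested or linked on the web.
print(g.serialize(format="turtle"))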

1.09.4.2 Crowd-Sourcing and Volunteered Geographic Information

For much of its history, geospatial data has been created by expert practitioners to satisfy specific requirements. Now, however, massive quantities of geospatial data are being collected from the "crowd," either as information derived from nongeospatial content such as social media or as volunteered geographic information provided purposely by volunteer effort. There are three general types of information being collected from the crowd, each with its own implications for assigning metadata and potentially including the data in SDIs.

Crowd-sourced nongeospatial data: these data come from unstructured content that has been processed to include a geospatial reference. In some cases, the position where the content was created is recorded (such as a georeferenced tweet). In other cases, the applicable geographic reference is obtained by analyzing the content: a discussion of the Eiffel Tower implies a location in Paris, France. But the Eiffel Tower with the big cowboy hat on top is in Paris, Texas, USA, so the reference is subject to uncertainty. Metadata assignable to the data obtained from these types of content is generally limited to the source of the information and the method for assigning location.

Directed volunteered geographic information: these data are obtained by volunteers who have been directed to capture specific information and to adhere to a consistent collection and description methodology. Many citizen science initiatives rely upon this model (COBWEB website). The managers of the collection expect a certain level of competence in the volunteers and generally include collection of metadata as part of the overall data collection regimen.

Nondirected volunteered geographic information: these data are collected by a community of interested individuals who want to collectively develop one or more datasets. An excellent example is OpenStreetMap (OpenStreetMap website). Because all of the content is created by volunteers and there are seldom strict rules as to how each piece is described, metadata is generally informal and inconsistent. Automated routines can be run on the data to obtain additional metadata and sometimes to assign a degree of confidence in the accuracy of the content, but there is no contractually or legally obligated rule set for creation of the metadata. In this case, the creators of the data may not understand or be concerned with the importance of metadata.

Data managers and SDI administrators must be careful when working with crowd- or volunteer-sourced data and include metadata that as clearly as possible identifies the source of the content, the nature of its collection, and any pertinent information that describes the inherent reliability and accuracy of the data.

1.09.5 Summary

Metadata and SDI concepts have been purposely woven together in this article for two reasons:

1. metadata are key enabling characteristics of geospatial data that underpin SDIs; and
2. SDIs provide a relevant example of the benefit and design principles of metadata.

Metadata ranges in complexity from very simple title and legend information on a map to a sophisticated database describing the creator, content, provenance, and processing history of data. As metadata provides the description of the associated data, it is the basis by which data can be searched and retrieved. Thus, with the advent of digital geospatial data (and large volumes of such data), governments and data-holding organizations recognized the need to standardize the content of metadata to facilitate searches.


Most publishers and repositories of digital geospatial data now adhere to, or are in the process of moving to, the ISO metadata standards. These ISO standards are comprehensive in content and thus are frequently profiled to a subset of information by organizations. Because these standards are developed and maintained in a voluntary, consensus-based, and international process, they have stability and support from across the community.

Dataset-level metadata is a minimum requirement for effective discovery and consumption of geospatial data. However, more detailed metadata at the feature type, or even feature or pixel, level can better describe the provenance and suitability of data for use. With such additional detail comes the additional burden of populating metadata. Numerous tools and automation techniques are available to facilitate the process of creating metadata at any level from the dataset to the attribute.

A collection of geospatial data, described by adequate metadata, can be managed and distributed in an SDI. The basic SDI contains four fundamental architectural elements:

1. content databases;
2. a discovery service;
3. processing services; and
4. dissemination services.

The purpose of the SDI is to consistently manage a collection of geospatial data that may or may not originate in the organization hosting the SDI. These data are managed for interoperable use among stakeholders, often including the public. A key advantage of an SDI is that users have a single point of entry into a catalog of diverse geospatial content. The best SDIs allow users both to download content as needed and to link to content via web services for ingestion in their client applications.

Two evolving trends are likely to significantly impact the concepts long held as best practices by the geospatial community for metadata and SDIs. Linked data techniques can enable better connectivity between traditional geospatial data and other information available on the web, but will blur the line between data and metadata such that the two may one day be indistinguishable. The traditional architecture of the SDI may become less well defined, and discovery services may be inherent in the rules linking content. The growth of crowd-sourced data means that SDIs may have to adapt to ingest such information, despite having very little oversight of or insight into the quality and intent of the data. Metadata is only variably available for crowd-sourced content and is seldom consistent in format. Perhaps linked data techniques will be the best mechanism for describing and finding data from the crowd and allowing those data to be inserted into SDIs.

References

AGI, 2012. UK GEMINI, specification for discovery metadata for geospatial data resources, v. 2.2. Association for Geographic Information, London.
ANZLIC, 1996. Spatial data infrastructure for Australia and New Zealand – Discussion paper. Australian Government, Canberra.
ANZLIC, 2007. ANZLIC metadata profile: An Australian/New Zealand profile of AS/NZS ISO 19115:2005, geographic information – Metadata (implemented using ISO/TS 19139:2007, geographic information – Metadata – XML schema implementation). Australian Government, Canberra.
Bernard, L., Kanellopoulos, I., Annoni, A., Smits, P., 2009. The European geoportal – One step towards the establishment of a European spatial data infrastructure. Computers, Environment, and Urban Systems 29, 15–31.
Danko, D.M., 2012. Geospatial metadata. In: Kresse, W., Danko, D.M. (Eds.), Springer handbook of geographic information. Springer-Verlag, Berlin, pp. 359–393.
European Union, 2007. Directive 2007/2/EC of the European Parliament and of the Council of 14 March 2007 establishing an infrastructure for spatial information in the European Community (INSPIRE). European Union, Strasbourg.
Ferreira, K., de Queiroz, G., Vinhas, G., et al., 2015. Towards a spatial data infrastructure for big spatiotemporal data sets. In: Gherardi, D.F., e Cruz de Aragao, L.E. (Eds.), Proceedings from XVII Simpósio Brasileiro de Sensoriamento Remoto. Simpósio Brasileiro de Sensoriamento Remoto, Brasilia, pp. 7588–7594.
FGDC, 1998. FGDC-STD-001-1998: Content standard for digital geospatial metadata (revised June 1998). Federal Geographic Data Committee, Washington, DC.
Goodchild, M.F., Fu, P., Rich, P., 2007. Sharing geographic information: An assessment of the geospatial one-stop. Annals of the Association of American Geographers 97, 250–266.
ISO, 2003. ISO 19115:2003, geographic information – Metadata. International Organization for Standardization, Geneva.
ISO, 2009. ISO 19115-2:2009, geographic information – Metadata – Part 2: Extensions for imagery and gridded data. International Organization for Standardization, Geneva.
ISO, 2014. ISO 19115-1:2014, geographic information – Metadata – Part 1: Fundamentals. International Organization for Standardization, Geneva.
ISO, 2016. ISO/TS 19115-3:2016, geographic information – Metadata – Part 3: XML schema implementation for fundamental concepts. International Organization for Standardization, Geneva.
Lopez-Pellicer, F.J., Florczyk, A.J., Nogueras-Iso, J., Muro-Medrano, P.R., Zarazaga-Soria, F.J., 2010. Exposing CSW catalogues as linked data. In: Painho, M., Santos, M.Y., Pundt, H. (Eds.), Geospatial thinking. Springer-Verlag, Berlin, pp. 183–200.
Masser, I., 1999. All shapes and sizes: The first generation of national spatial data infrastructures. International Journal of Geographical Information Science 13, 67–84.
NGA, 1997. National system of geospatial intelligence (NSG) geospatial metadata deskside reference (NGMDSR) version 1. National Geospatial-Intelligence Agency, Washington, DC.
OGC, 2009. OGC catalogue services – OWL application profile of CSW, OGC discussion paper OGC 09-010. Open Geospatial Consortium, Wayland.
OGC, 2011. OpenGIS® implementation standard for geographic information – Simple feature access – Part 1: Common architecture; OGC 06-103r4. Open Geospatial Consortium, Wayland.
OMB, 1990. Coordination of surveying, mapping, and related spatial data activities, circular A-16 revised. Office of Management and Budget, Washington, DC.
OMB, 1994. Coordinating geographic data acquisition and access: The national spatial data infrastructure. Office of Management and Budget, Washington, DC.
Reed, C., Hammill, J., Gropper, J., Salmen, L., 1978. Logical capabilities of the USFWS GIS. U.S. Fish and Wildlife Service, Washington, DC.
van Loenen, C., Kok, B., 2004. Spatial data infrastructures legal and economic issues. In: van Loenen, C., Kok, B. (Eds.), Spatial data infrastructure and policy development in Europe and the United States. DUP Science/Delft University Press, Delft, pp. 1–14.
Westervelt, J., 2004. GRASS roots. In: Proceedings of the FOSS/GRASS Users Conference, Bangkok, Thailand. Osaka City University, Osaka. http://gisws.media.osaka-cu.ac.jp/grass04/papers.php.


Further Reading

GSDI, 2012. Spatial data infrastructure cookbook 2012 update. Accessed 20 January 2017. http://gsdiassociation.org/images/publications/cookbooks/SDI_Cookbook_from_Wiki_2012_update.pdf

Relevant Websites

http://bhuvan.nrsc.gov.in/bhuvan_links.php – Bhuvan website.
https://cobwebproject.eu/ – COBWEB website.
http://dublincore.org/documents/abstract-model/ – DCMI website.
http://inspire-geoportal.ec.europa.eu/ – EC Geoportal website.
https://www.epsg-registry.org/ – EPSG Registry website.
http://www.earthobservations.org/index.php – GEO website.
http://geonetwork-opensource.org/ – GeoNetwork website.
https://www.geoplatform.gov/ – GeoPlatform website.
https://github.com/Esri/geoportal-server/wiki/Geoportal-XML-Editor – Geoportal XML Editor website.
http://gsdiassociation.org/ – GSDI website.
http://inspire-geoportal.ec.europa.eu/editor/ – INSPIRE Metadata Editor website.
http://www.isotc211.org/Terminology.htm – ISO/TC 211 Terminology website.
http://gcmd.nasa.gov/add/difguide – NASA DIF website.
http://www.opengeospatial.org/standards/cat – OGC Catalogue Services website.
http://www.opengeospatial.org/standards/owc – OGC OWS Context website.
http://www.opengeospatial.org/standards/wcs – OGC WCS website.
http://www.opengeospatial.org/standards/wfs – OGC WFS website.
http://www.opengeospatial.org/standards/wms – OGC WMS website.
http://www.opengeospatial.org/standards/wmts – OGC WMTS website.
https://www.openstreetmap.org/ – OpenStreetMap website.
https://wiki.osgeo.org/wiki/EbRIM – OSGeo ebRIM website.
http://www.lib.utexas.edu/maps/historical/history_americas.html – PCL Map website.
https://geology.usgs.gov/tools/metadata/tools/doc/tkme.html – tkme website.
http://ggim.un.org/sdi.html – UN-GGIM SDI website.
https://www.w3.org/standards/semanticweb/ – W3C Semantic Web website.

1.10 Spatial Analysis Methods

David WS Wong, George Mason University, Fairfax, VA, United States
Fahui Wang, Louisiana State University, Baton Rouge, LA, United States
© 2018 Elsevier Inc. All rights reserved.

1.10.1 Descriptive Measures of Point Features
1.10.1.1 Mean Center, Median Center, and Central Feature
1.10.1.2 Standard Distance and Standard Deviational Ellipse
1.10.1.3 An Illustrative Example for Centrographic Measures
1.10.2 Inferential Measures of One Type of Points
1.10.2.1 Quadrat Analysis
1.10.2.2 Ordered Neighbor Statistics
1.10.2.3 Ripley's K-Function
1.10.2.4 An Illustrative Example for Quadrat Analysis, Ordered Neighbor Statistics, and K-Function
1.10.3 Colocation Analysis of Two Types of Points
1.10.3.1 Cross K-Function Analysis
1.10.3.2 Global Colocation Quotient
1.10.3.3 Local Indicator of Colocation Quotients
1.10.3.4 An Illustrative Example for Colocation Analysis of Two Types of Points
1.10.4 Area-Based Analysis of Spatial Autocorrelation
1.10.4.1 Defining Spatial Weights
1.10.4.2 Join-Count Statistics for Spatial Autocorrelation Analysis of Binary Variables
1.10.4.3 Global Spatial Autocorrelation Measures
1.10.4.4 Local Spatial Autocorrelation Measures
1.10.4.5 An Illustrative Example for Area-Based Spatial Autocorrelation Measures
1.10.5 Regionalization Methods
1.10.5.1 Earlier GIS-Based Regionalization Methods
1.10.5.2 REDCAP Method
1.10.5.3 MLR Method
1.10.5.4 An Illustrative Example for REDCAP and MLR Methods
1.10.6 Conclusion
References
Further Reading

Spatial analysis is one of the major functions in GIS (systems) and is also a major topic in GIScience. However, the term "spatial analysis" was used by geographers before the birth of GIS. In the volume edited by Berry and Marble (1968), some basic concepts of geographical modeling were included, but spatial statistical topics accounted for the majority of the articles. In addition, a section was dedicated to regionalization, a topic that will be addressed here. Although recent GIScience literature and GIS software may separate "statistics" from "analysis," as some analyses in GIS are not statistical in nature, the term "spatial analysis" originally was predominantly statistical. Most existing spatial analytical–statistical functions were not incorporated into GIS until the 1990s (Ding and Fotheringham, 1992; Lee and Wong, 2001). Still, many spatial analytical–statistical functions are not available in standard commercial GIS packages.

To provide an efficient overview of the topic of "spatial analysis," we adopt the following approach. Geographical features are typically categorized geometrically as points, lines, and polygons in the plane. In many cases, linear features of the same type are connected and therefore may together be treated as a network (examples of nonnetwork linear features include fault lines, mountain ranges, and coastlines). As this volume also includes an article on network analysis, this article will not discuss analysis specific to linear features (lines) and networks. Only analysis and statistical methods for point and polygon features will be discussed, although some methods are also applicable to linear features (e.g., the spatial autocorrelation statistics discussed in this article can be used to analyze network features; see Black, 1992).

The remainder of the article is structured as follows: section "Descriptive Measures of Point Features" discusses measures that describe the spatial distribution of point features. Section "Inferential Measures of One Type of Points" covers methods that characterize spatial patterns of one type of points with statistical significance. Section "Colocation Analysis of Two Types of Points" examines colocation of two different types of points. Section "Area-Based Analysis of Spatial Autocorrelation" discusses methods for analyzing polygon features; these methods are specifically formulated to evaluate the magnitude of spatial autocorrelation. The final section is about methods that group areal units together to form regions, that is, regionalization.


1.10.1 Descriptive Measures of Point Features

As fundamental as the mean and standard deviation in basic statistics, centrographic measures describe the geographic distribution of point features in terms of central tendency and spatial dispersion. Central tendency measures include the mean center, median center, and central feature, and spatial dispersion measures include the standard distance and the standard deviational ellipse. For example, the historical trend of the mean center of population in the United States (https://www.census.gov/geo/reference/centersofpop/animatedmean2010.html) shows that population in the western United States has grown faster than in the rest of the country and, to a lesser degree, that growth in the south has outpaced that in the north. Based on the ellipses of places named by an ethnic group in different eras, we can examine the possible trend of the group's settlement history (Wang et al., 2014).

1.10.1.1 Mean Center, Median Center, and Central Feature

Similar to the arithmetic mean in regular statistics that represents the average value of observations, the mean center may be interpreted as the average location of a set of points. If one wishes to assess the mean center of lines or areas, they may be represented by the coordinates of the center points of the lines or the centroids of the areas (Mitchell, 2005, p. 33). As a location can be represented by its x and y coordinates, the mean center's coordinates $(\bar{X}, \bar{Y})$ are the average x coordinate and average y coordinate of all points $(X_i, Y_i)$ for $i = 1, 2, \ldots, n$, such that

    \bar{X} = \sum_i X_i / n, \qquad \bar{Y} = \sum_i Y_i / n.

If the points are weighted by a variable (e.g., population), denoted by $w_i$, the weighted mean center has coordinates

    \bar{X} = \sum_i (w_i X_i) \Big/ \sum_i w_i, \qquad \bar{Y} = \sum_i (w_i Y_i) \Big/ \sum_i w_i.
Two other measures of central tendency are used less often: median center is the location that has the shortest total distance to all points, and central feature is the point feature (among all input points) that has the shortest total distance to all other points. The median center or central feature can be viewed as the most accessible location to the point set as the total distance from all data points is the minimum. The difference is that the central feature refers to an existing point feature and the median center does not. With this restriction, the total distance from the central feature to all other point features is always greater than or equal to the total distance from the median center to all point features. Computing the central feature is straightforward: select the point with the shortest total distance from the others. The median center is approximated by an iterative algorithm (e.g., Kuhn and Kuenne, 1962). Beginning with the mean center, a candidate median center is found and then refined until it approximately represents the location with the minimum total distance to all features (Wong and Lee, 2005, Chapter 5).
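A minimal Python sketch of these measures follows, assuming NumPy; the median center is approximated with a Weiszfeld-style iteration starting from the mean center, consistent with the iterative refinement described above, and is an illustration rather than any package's official implementation.

import numpy as np

def mean_center(xy, w=None):
    """Weighted (or unweighted) mean center of an (n, 2) coordinate array."""
    xy = np.asarray(xy, dtype=float)
    w = np.ones(len(xy)) if w is None else np.asarray(w, dtype=float)
    return (xy * w[:, None]).sum(axis=0) / w.sum()

def median_center(xy, tol=1e-6, max_iter=1000):
    """Iteratively refine a candidate location toward the minimum total distance."""
    xy = np.asarray(xy, dtype=float)
    center = mean_center(xy)
    for _ in range(max_iter):
        d = np.maximum(np.linalg.norm(xy - center, axis=1), 1e-12)
        new_center = (xy / d[:, None]).sum(axis=0) / (1.0 / d).sum()
        if np.linalg.norm(new_center - center) < tol:
            break
        center = new_center
    return center

def central_feature(xy):
    """Index of the input point with the smallest total distance to all others."""
    xy = np.asarray(xy, dtype=float)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    return int(d.sum(axis=1).argmin())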

1.10.1.2 Standard Distance and Standard Deviational Ellipse

Similar to the standard deviation in regular statistics that captures the degree of total variation of observations from the mean, the standard distance is the average distance between the points and their mean center:

    SD = \sqrt{ \sum_i (X_i - \bar{X})^2 / n + \sum_i (Y_i - \bar{Y})^2 / n }.

For the weighted standard distance,

    SD_w = \sqrt{ \sum_i w_i (X_i - \bar{X})^2 \Big/ \sum_i w_i + \sum_i w_i (Y_i - \bar{Y})^2 \Big/ \sum_i w_i }.

The standard distance can be represented graphically as a circle around the mean center with a radius equal to the standard distance. The larger the standard distance, the more dispersed the points are from the mean center. Therefore, the standard distance reflects the compactness of the distribution of a set of points. The ellipse further assesses the distribution of the set of points by identifying the orientation of the points and the average deviations of point locations from the mean center along the long axis and the short axis. The orientation of the long axis (i.e., the new y-axis) is determined by rotating the axes from 0 degrees to the angle at which the sum of the squared distances between the points and the axis is minimal; the short axis (i.e., the new x-axis) is then perpendicular to the derived long axis. For most of the features to occur within the ellipse, the lengths of the two axes are usually twice the standard deviation along each axis, with the coordinates measured along the rotated axes:

    SD_x = \sqrt{ \sum_i (X_i - \bar{X})^2 / n }, \qquad SD_y = \sqrt{ \sum_i (Y_i - \bar{Y})^2 / n }.

Therefore, it is also termed the standard deviational ellipse.
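A minimal NumPy sketch of these dispersion measures follows; the ellipse orientation and axis lengths are obtained here from the eigenvectors of the coordinate covariance matrix, which is an equivalent route to rotating the axes as described above, not the textbook's exact computational recipe.

import numpy as np

def standard_distance(xy):
    """Root of the average squared deviation from the mean center."""
    xy = np.asarray(xy, dtype=float)
    c = xy.mean(axis=0)
    return np.sqrt(((xy - c) ** 2).sum(axis=1).mean())

def standard_deviational_ellipse(xy):
    """Return (SD along long axis, SD along short axis, long-axis angle in degrees)."""
    xy = np.asarray(xy, dtype=float)
    centered = xy - xy.mean(axis=0)
    cov = centered.T @ centered / len(xy)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    sd_short, sd_long = np.sqrt(eigvals)
    vx, vy = eigvecs[:, 1]                   # eigenvector of the long axis
    angle = np.degrees(np.arctan2(vy, vx))
    return sd_long, sd_short, angle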

Fig. 1 Centrographic measures for motorcycle thefts in a city (legend: central feature, median center, mean center, motorcycle theft locations, standard deviational ellipse, standard distance circle, and district boundaries).

In the graphic representation (Fig. 1), the ellipse indicates that the set of points is more dispersed along the long axis than along the short axis from the mean center.

1.10.1.3 An Illustrative Example for Centrographic Measures

The example is based on motorcycle theft incidents in a study reported in Wang et al. (2017). The data may be accessed at http://ga.lsu.edu/blog/fahui/; sample data and programs used for all illustrative examples in this article are provided via this link. As shown in Fig. 1, a total of 7777 motorcycle thefts are scattered across the study area, and most are concentrated in the central city in the south central area. As the three measures of central tendency are near each other, a close-up inset shows their relative locations. Understandably, the median center and the central feature are very close, since both are intended to determine the locations with the shortest total distance from all crime incidents. The mean center is northwest of them. Most of the crimes fall inside the standard distance circle. The standard deviational ellipse shows a northwest–southeast orientation, consistent with the shape of the study area. The five centrographic measures provide basic descriptions of the geographic distribution of one type of crime at one time. One may need to compare these statistics with the patterns in other time periods or with other types of crimes in order to learn whether the pattern for motorcycle thefts has shifted over time or is more compact or dispersed than the distributions of other types of crimes.

1.10.2 Inferential Measures of One Type of Points

The previous section discusses how location information of points and their associated attribute values, if available, can be summarized using centrographic measures. Those summary measures (where the points are generally located, to what extent they are dispersed, and whether they exhibit a directional bias) are good for describing point distributions but cannot indicate whether the points are clustered, dispersed, or random. In this section, we will review methods to characterize point patterns. The next section will discuss methods to determine whether different types of points are located close together (colocated) statistically.

Two general approaches have been proposed to analyze point patterns: density-based versus distance-based (Boots and Getis, 1988). The concern of the density-based approach is whether the observed points exhibit a density pattern (number of points per unit area) that is different from that of a random pattern. Quadrat analysis utilizes this approach. Ripley's K-function also considers point density, but depicts changes in point density at different geographical scales (spatial lags). The distance-based approach considers whether distances between points are larger (dispersed) or smaller (clustered) than those of a random pattern. Ordered neighbor statistics represent this approach. Both approaches compare measures from the observed patterns with measures derived from a random pattern (or a predefined pattern) to render a decision.

1.10.2.1 Quadrat Analysis

In quadrat analysis, areas are sampled and their point density levels are measured from these areal samples. These point density levels are compared with density levels derived from a random point pattern in order to conclude whether the observed pattern is different from a random pattern. Alternatively, if a known process, such as a clustering process, is used to generate a point pattern, then the point density levels of the observed pattern can be compared with those of this clustered pattern to determine whether the observed pattern is different from a clustered pattern. The general idea is to compare the point density levels of observed points with those of a known or expected pattern.

In spatial sampling, locations or points are often randomly selected as samples (Wong and Lee, 2005, Chapter 1). In quadrat analysis, quadrats, which are areal sampling units, can be randomly placed over the study region to derive point density levels of the observed pattern. But quadrats can also be laid out to cover the entire study region. Quadrats may be of various sizes and shapes, although some general guidelines should be followed (discussed later). They can be squares, rectangles, circles, hexagons, or theoretically any shape, even irregular polygons, as long as the same shape is used throughout the entire study. However, squares and rectangles can cover the entire study area without gaps or overlaps, but they are less compact than circles, which cannot provide a complete coverage of the study area without overlapping quadrats. On the other hand, a hexagon is the most compact shape that can cover the entire study area without overlap. Nevertheless, using hexagons is often not supported in software, with a few exceptions (e.g., Wong and Lee, 2005, Chapter 6).

Although there is no strict rule in determining the number of quadrats and their sizes, the literature provides some guidance (Greig-Smith, 1952; Taylor, 1977; Griffith and Amrheim, 1991). In short, the optimal quadrat size is approximately 2A/r, where A is the area of the study area and r is the number of points. If square quadrats are used, then the number of quadrats N is approximately r/2. While strictly following these suggestions may not be possible in all cases, they are nonetheless useful guidelines to consider when determining the number of quadrats. Whether a complete coverage or random placement of quadrats should be used is also subjective to a certain degree. Even for complete coverage, the quadrats are still samples, as they can be placed in many different ways by shifting their positions. Regardless of which sampling scheme is used, the following discussion about comparing the observed and expected (including random) patterns is applicable.

After quadrats are placed over the study area, the next step is to tabulate the frequency distribution of the numbers of quadrats with different points in them. A quadrat can have 0, 1, 2, ..., r points in it, where r is the total number of points in the study area. If one quadrat has r points (i.e., all other quadrats have 0 points), then the observed points are very likely clustered. But if most quadrats have similar numbers of points, then the pattern is likely dispersed or random. To determine whether the observed pattern is different from a random pattern, this observed quadrat frequency distribution should be compared with the one derived from a random pattern.
Therefore, the actual implementation may take the following steps to tabulate the frequency distribution of the observed pattern:

1. determine the number and size of quadrats, and the placement of the quadrats (complete coverage vs. random placement);
2. overlay the quadrats onto the study region; and
3. tabulate the frequency distribution in the following manner:
   - number of quadrats with no points
   - number of quadrats with 1 point
   - number of quadrats with 2 points
   - ... until the quadrat with the largest number of points is included.

Thus, each element x in the set of the numbers of points in a quadrat {0, 1, 2, ..., r} corresponds to n(x), the number of quadrats with x points. The results are presented in Table 1. As labeled in Table 1, the quadrat frequency counts can be regarded as the observed counts. Again, if the point pattern is highly clustered, then one or more quadrats should have large numbers of points. If the points are relatively evenly spread, then most quadrats should have similar numbers of points. This frequency distribution should be compared with the frequency distribution generated by a known or random process, and that distribution can be recorded as the expected frequency counts, e(x), in the third column of Table 1.

Table 1 Tabulation of the frequency of quadrats with different numbers of points in a quadrat

Number of points in a quadrat, x    Observed frequency count, n(x)    Expected frequency count, e(x)
0                                   n(0)                              ...
1                                   n(1)                              ...
2                                   n(2)                              ...
...                                 ...                               ...
r                                   n(r)                              ...

If we want to test whether the observed pattern is different from that generated by a random pattern, the random pattern is treated as the expected one. To generate a random pattern, we can use a Poisson process to generate a distribution. To do so, the mean λ needs to be determined. In the context of quadrat analysis, λ is the average number of points in a quadrat, which can be estimated by r/N. With λ, we can derive the probability of having x points in a quadrat, that is, p(x), where x = 0, 1, 2, ..., r, using the formula

    p(x) = \frac{e^{-\lambda} \lambda^x}{x!}.

The probabilities for the range of x values are computed. Given that N is the total number of quadrats, p(x) × N provides the expected number of quadrats with x points to fill the third column in Table 1. With the observed and expected frequencies, we can compare the two distributions using, for instance, the chi-square statistic. An alternative is to compare the observed and expected distributions based upon their probabilities. Using the Poisson formula, the probabilities of the expected distribution are readily available. The observed probabilities can be derived from n(x)/N. Then we can compare the observed and expected probabilities using, for instance, the Kolmogorov–Smirnov (K-S) test. Based upon the results of the test, one may conclude whether the observed pattern is different from the random pattern. If one wants to test whether the observed pattern is different from that generated by a known process, then in the above steps the known process can be used instead of the Poisson process to generate the expected distribution.
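The tabulation and K-S comparison translate directly into code. The sketch below (assuming NumPy; the quadrat counts are taken as given, e.g., from a GIS overlay) computes the K-S statistic D against the Poisson expectation and compares it with the 1.36/sqrt(n) critical value used later in the illustrative example.

import numpy as np
from math import exp, factorial

def quadrat_ks(counts):
    """Compare observed quadrat counts against a Poisson expectation.
    `counts` holds the number of points found in each quadrat."""
    counts = np.asarray(counts)
    n_quadrats = len(counts)
    lam = counts.mean()                                # lambda = r / N
    xs = np.arange(counts.max() + 1)
    obs_p = np.array([(counts == x).mean() for x in xs])
    exp_p = np.array([exp(-lam) * lam ** int(x) / factorial(int(x)) for x in xs])
    d_stat = np.abs(np.cumsum(obs_p) - np.cumsum(exp_p)).max()
    d_crit = 1.36 / np.sqrt(n_quadrats)                # critical D at alpha = 0.05
    return d_stat, d_crit, d_stat > d_crit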

1.10.2.2 Ordered Neighbor Statistics

Quadrat analysis has several limitations. As it adopts an areal sampling approach, the point density of sampled areas (quadrats) fails to consider the spatial relationship of points within each sampling unit and between units. For example, point patterns such as those illustrated in Fig. 2A and B will not be distinguishable by quadrat analysis, as they have the same point density level, but the spatial relationships between points are different in the two patterns: points are closer together in Fig. 2B than in Fig. 2A. Therefore, spatial statistical measures such as ordered neighbor statistics that rely on distances between points are needed.

Fig. 2 (A) Local clusters with regional dispersion; (B) local and regional clustering. Adapted from Wong, D. W. S. and Lee, J. (2005). Statistical analysis and modeling of geographic information. Wiley & Sons.

Ordered neighbor statistics refers to a set of statistics that evaluates whether neighboring points of different order exhibit a pattern different from a random pattern. The idea can be illustrated by the first order or nearest neighbor (NN) statistic. This statistic compares the averaged NN distance of the observed pattern (r_obs) to that of a random pattern, which is often labeled the expected pattern (r_exp). If the observed NN distance is significantly smaller than that of the random pattern, then the observed pattern is likely a clustered pattern, and vice versa. To determine r_obs, let d_i denote the distance between point i and its NN. For example, in Fig. 3, d_1 is the distance between point 1 and its NN, which is point 2. Similarly, d_2 is the distance between point 2 and its NN, which in this case is also point 1. This principle can be extended to determine d_i, where i = 1, 2, 3, ..., 6 in Fig. 3. Note that the reciprocal NN relationship is not always true. For instance, the NN of point 4 is point 5, but the NN of point 5 is point 6, not point 4.

Fig. 3 A hypothetical set of points (numbered 1 to 6) to determine ordered neighbors.

Then r_obs is the average of the d_i values, that is, \sum_i d_i / r, where r is the number of points. The observed NN distance is then compared with the expected NN distance, r_exp. The latter is derived analytically for a random pattern such that

    r_{exp} = 0.5 \sqrt{A/r},

where A again is the area of the study area. The ratio between them, R = r_obs / r_exp, should be around 1 if the observed pattern is close to a random pattern. The ratio theoretically can be as small as 0, when all points are on top of each other, and as large as 2.149, when the points are maximally dispersed, forming a triangular lattice or a hexagonal landscape as depicted in Central Place Theory (Christaller, 1933).

Most likely, r_obs and r_exp are different numerically, and testing for their statistical difference is needed. Given that the standard error (SE_r) of the NN statistic is 0.26136 \sqrt{A/r^2}, the standardized z-score (Z_r) comparing the observed and expected NN distances is (r_obs - r_exp)/SE_r. Using the traditional standard in testing differences, the null hypothesis, that the observed and expected NN distances are not significantly different, can be rejected if Z_r is smaller than -1.96 or larger than 1.96 using an α of 0.05. Alternatively, one may choose the one-tail test to determine whether the z-score is negative, indicating a significant difference from a dispersed pattern, or positive, indicating a significant difference from a clustered pattern.

The NN statistic relies entirely on NN distances, ignoring the relationships of higher ordered neighbors. Using the NN statistic, the spatial patterns in Fig. 2A and B are still not distinguishable, as points in Fig. 2A form small local clusters with regional dispersion, while Fig. 2B exhibits a clustering of local clusters; NNs are relatively close together in both cases. However, when both nearest (first ordered) and second or higher ordered neighbors are considered, Fig. 2A exhibits local clustering by the NN statistic but regional dispersion by the higher ordered neighbor statistic, while Fig. 2B exhibits clustering at both local and regional scales. In other words, considering higher ordered neighbors can reveal the spatial scale of the clustering pattern.

Extensions of the NN statistic to higher ordered statistics are quite straightforward. Instead of using the NN distance to determine d_i, the higher ordered statistics use the corresponding higher ordered neighbor distances to determine d_i. For instance, for the second ordered neighbor statistic, d_i is the distance between point i and its second ordered neighbor. To derive the expected distance and its respective standard error for the kth ordered neighbor, the following formulae can be used:

    r_{exp}(k) = P_1(k) \sqrt{A/r}

    SE_r(k) = \frac{P_2(k)}{\sqrt{r^2/A}}

where P_1 and P_2 are two parameters with applicable values in Table 2. Note that when k = 1, the ordered statistic is the same as the NN statistic. Parameter values for orders higher than the 6th are available analytically.

Table 2 Parameters for higher ordered neighbor statistics

Order of neighbors (k)    P1(k)     P2(k)
1                         0.5000    0.2613
2                         0.7500    0.2722
3                         0.9375    0.2757
4                         1.0937    0.2775
5                         1.2305    0.2784
6                         1.3535    0.2789
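The first and higher ordered neighbor tests can be sketched as follows, assuming NumPy, planar coordinates, no boundary correction, and the P1/P2 values of Table 2; this is an illustration, not a reference implementation.

import numpy as np

# P1(k) and P2(k) from Table 2 (k = 1 is the nearest neighbor case).
P1 = {1: 0.5000, 2: 0.7500, 3: 0.9375, 4: 1.0937, 5: 1.2305, 6: 1.3535}
P2 = {1: 0.2613, 2: 0.2722, 3: 0.2757, 4: 0.2775, 5: 0.2784, 6: 0.2789}

def ordered_neighbor_test(xy, area, k=1):
    """z-score comparing the observed kth ordered neighbor distance with the
    expectation under complete spatial randomness."""
    xy = np.asarray(xy, dtype=float)
    r = len(xy)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbor
    d.sort(axis=1)                       # column k-1 holds the kth ordered neighbor
    r_obs = d[:, k - 1].mean()
    r_exp = P1[k] * np.sqrt(area / r)
    se = P2[k] / np.sqrt(r ** 2 / area)
    return r_obs, r_exp, (r_obs - r_exp) / se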


Fig. 2B also partly illustrates the frame-dependent nature of spatial analytical techniques. Points across all units together in Fig. 2B exhibit mild clustering. If the boundaries of the entire region are "zoomed out," the points would look more clustered; zooming in closer to the set of points may not reveal clustering. Thus, the analysis needs to account for the boundary effect in analyzing point patterns. One possible approach is to adjust the expected NN distance and the standard error, as they are derived based on the assumption of complete spatial randomness over an infinite surface, ignoring the fact that boundaries are always imposed onto the points. Donnelly's correction (Donnelly, 1978) suggests that

    r_{exp} = 0.5 \sqrt{\frac{A}{r}} + \left( 0.0514 + \frac{0.041}{\sqrt{r}} \right) \frac{B}{r}

    SE_r^2 = 0.0683 \frac{A}{r^2} + 0.037 B \sqrt{\frac{A}{r^5}}

where B is the perimeter of the study area.

1.10.2.3 Ripley's K-Function

Both the NN statistic and quadrat analysis assume that the spatial process creating the point pattern is homogeneous across the study area, ignoring the possibility that the process may operate at different spatial scales or that multiple spatial processes operate simultaneously at different scales. In other words, the question is whether the point distribution exhibits different magnitudes of clustering (or dispersion) at different spatial scales. Ripley's K-function may offer insights toward this question (Ripley, 1976, 1977). In essence, the K-function depicts point density levels at different spatial scales. Conceptually, the K-function is constructed by counting the number of points that fall within different distance rings, h (the spatial lag, which reflects the scale), around each point. Highly clustered patterns will have high density levels for small spatial lags (h). If the points are relatively even or dispersed throughout the region, then the density level will be similar across different spatial lags. Therefore, the K-function is typically depicted by point density levels on the y-axis and different spatial lags on the x-axis. Formally, the number of points that fall within the spatial lag h of a point is defined as

    n(h) = \sum_i \sum_j I_h(d_{ij}), \quad i \neq j,

where I_h(d_{ij}) is an indicator (binary) function such that it will be 1 if d_{ij} < h, and 0 otherwise. Therefore, it is essentially a function counting the number of points within the lag h of point i. However, to make this function "scale-independent," it can be standardized by the overall point density of the region, r^2/A, to derive the K-function, where r is the total number of points. Thus

    K(h) = \frac{A}{r^2} \sum_i \sum_j I_h(d_{ij}).

The K-function may be modified by a weight w_i, indicating the proportion of the neighborhood of point i that is within the study area, to account for the boundary effect. The K-function may be regarded as the observed point density levels at different spatial lags. This observed distribution should be compared with an expected distribution, which is often a random pattern. A reasonable density estimate for a random pattern is πh^2. The comparison of the K-function with πh^2 yields the difference function L(h), which is written as

    L(h) = \sqrt{\frac{K(h)}{\pi}} - h.

High (and positive) values in the L-function indicate clustering at the corresponding lags, while low (and negative) values reflect dispersion. Therefore, plotting the L-function against h can show the different degrees of clustering or dispersion over different spatial lags. The L-function of a random point pattern should be close to zero at all spatial lags.
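A compact sketch of the K- and L-functions follows, assuming NumPy, Euclidean distances, and no edge correction (the w_i boundary weights mentioned above are omitted for brevity).

import numpy as np

def l_function(xy, area, lags):
    """L(h) = sqrt(K(h)/pi) - h for each spatial lag h in `lags`."""
    xy = np.asarray(xy, dtype=float)
    r = len(xy)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # exclude i == j pairs from the count
    k = np.array([(d < h).sum() for h in lags], dtype=float) * area / r ** 2
    return np.sqrt(k / np.pi) - np.asarray(lags, dtype=float)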

1.10.2.4 An Illustrative Example for Quadrat Analysis, Ordered Neighbor Statistics, and K-Function

The data used in this example are a subset of those used in the previous example. A total of 219 points within the relatively small southeastern district are selected for illustration. As stated previously, the desirable number of quadrats N should be approximately r/2. Therefore, 100 quadrats are generated to cover the entire study area (Fig. 4). Note that there is no reason to place quadrats in areas outside the study region; therefore, the quadrats are not in a rectangular grid layout but cover an irregular region.

Fig. 4 Study region for quadrat analysis overlaid with 100 square quadrats (map layers: quadrats, auto-theft locations, and the district boundary).

The next step is to create a table similar to Table 1, showing the observed and expected frequency distributions. Table 3 is the resultant table, with the observed frequency tabulation from the observed pattern in the second column (i.e., the number of quadrats with a given number of points). Quite a few quadrats (45) have no points, and one quadrat has 28 points. The observed distribution needs to be compared with the expected (random) pattern using the K-S statistic. To do so, the observed frequency distribution (i.e., the counts) is converted into probabilities (column 3). The expected (random) distribution is generated using a Poisson distribution with lambda (λ, the mean) = 2.19 (number of points/number of quadrats) and is reported in column 4. Cumulative probabilities of the observed and expected distributions are derived (columns 6 and 7), and they are compared (column 8). The maximum absolute difference, which is 0.3381 (first row in column 8), is the K-S statistic D. Using α = 0.05, the critical value of D is 1.36/√n, which is 0.14. Therefore, the observed D is larger than the critical value and the null hypothesis is rejected: there is a 95% chance that the observed pattern is different from a random pattern.

Table 3 Tabulating observed and expected frequency counts for quadrats and deriving probabilities and the K-S statistic. Columns: (1) points in a quadrat; (2) observed count; (3) observed probability; (4) expected probability; (5) expected count; (6) cumulative observed probability; (7) cumulative expected probability; (8) absolute difference in probabilities.

(1)     (2)    (3)     (4)     (5)    (6)     (7)      (8)
0       45     0.45    0.11    11     0.45    0.1119   0.3381
1       22     0.22    0.25    25     0.67    0.3570   0.3130
2       12     0.12    0.27    27     0.79    0.6254   0.1646
3       8      0.08    0.20    20     0.87    0.8213   0.0487
4       2      0.02    0.11    11     0.89    0.9286   0.0386
5       1      0.01    0.05    5      0.90    0.9756   0.0756
6       1      0.01    0.02    2      0.91    0.9927   0.0827
7       1      0.01    0.01    1      0.92    0.9981   0.0781
10      2      0.02    0.00    0      0.94    0.9982   0.0582
11      1      0.01    0.00    0      0.95    0.9982   0.0482
12      1      0.01    0.00    0      0.96    0.9982   0.0382
13      1      0.01    0.00    0      0.97    0.9982   0.0282
15      1      0.01    0.00    0      0.98    0.9982   0.0182
24      1      0.01    0.00    0      0.99    0.9982   0.0082
28      1      0.01    0.00    0      1.00    0.9982   0.0018
Total   100                    102
Lambda (λ) = 2.19

Using the K-S statistic is just one of several ways to compare the observed with the expected (random) distribution. One alternative is to use the chi-square statistic to compare the observed and expected frequencies (columns 2 and 5). Another comparison method is a variance–mean ratio test. The random pattern can be generated by a Poisson process, and a distinctive characteristic of the Poisson distribution is that its mean equals its variance. Therefore, if the mean and variance of the observed distribution are very similar (i.e., having a variance–mean ratio close to 1), then the observed pattern may well have been generated by a Poisson process. For details of these tests, readers can refer to Wong and Lee (2005).


The second part of this example implements the ordered neighbor statistics. Using ArcGIS, the nearest neighboring point (NEAR_FID) and the associated distance (NEAR_DIST) for each point (IN_FID) are derived, and results for some of the points are reported in Table 4. Given these distances, r_obs, the averaged NN distance, is 119.55 m. Given an area of 31,519,189 m², r_exp = 189.69 and SE_r = 6.70. Then, comparing the observed and expected NN distances, the z-value is -10.47. Therefore, the pattern is very likely clustered. Even using Donnelly's correction, the z-value is still -9.36, failing to alter the initial conclusion.

Table 4 Partial listing of the table providing the nearest neighbor distances

Objectid    IN_FID    NEAR_FID    NEAR_DIST
0           0         110         44.0858
0           1         38          20.4155
0           2         14          529.4193
0           3         56          76.2802
0           4         53          27.7706
0           5         77          434.1159
0           6         114         75.5106
0           7         59          556.8240
...         ...       ...         ...

Table 5 is similar to the NN distance table in Table 4, but includes the second NNs and associated distances determined by GIS. Practically, higher-ordered neighbors can be identified using a similar procedure, and therefore higher-ordered neighbor statistics can be obtained.

Table 5 Partial listing of the table providing nearest neighbor and second ordered neighbor distances

Objectid    IN_FID    NEAR_FID    NEAR_DIST
0           0         110         44.0858
0           0         82          68.2103
0           1         38          20.4155
0           1         122         114.6369
0           2         169         619.6956
0           2         14          529.4193
0           3         56          76.2802
0           3         61          213.5704
...         ...       ...         ...

The third part of the example illustrates the K-function. The L-function, the difference between the observed K-function and the K-function of a random distribution, is plotted against different spatial lags in Fig. 5. The plot also includes the expected L-function, which is horizontal at 0, with a 95% confidence bound around it. The observed L-function lies well above the expected one and the bounds, and therefore the observed pattern is clearly clustered. However, the degree of clustering varies at different spatial lags. At the local scale, the clustering is mild, but the degree of clustering increases as the lag increases. Over longer lags, the degree of clustering diminishes slightly, but is still significantly different from a random pattern. Note that the plot was produced with CrimeStat (Levine, 2002).

Fig. 5 Observed L(h) function, expected L (= 0), and the upper and lower bounds of the expected L(h), plotted against spatial lag (h).

1.10.3 Colocation Analysis of Two Types of Points

Colocation analysis examines the extent to which two types of objects are within the vicinity of each other. Colocation analysis methods can be grouped into area-based and point-based approaches. Area-based colocation analysis methods such as the join-count statistic will be covered in the next section; this section focuses on colocation analysis of points.

1.10.3.1 Cross K-Function Analysis

For point-based colocation analysis, the cross K (or bivariate K) function, an extension of Ripley's K-function (as discussed earlier) to two populations, is frequently used to investigate the relationship between two spatial processes under the assumption of independence. It measures the ratio of the observed overall density of type B points within a specified distance of a type A point over what we would expect by chance (Cressie, 1991). A significance test is carried out via a series of Monte Carlo simulations to examine whether the two types of points are clustered or dispersed. One may use network distance instead of Euclidean distance to reduce bias for socioeconomic activities in urban areas (Okabe and Yamada, 2001), since such events (e.g., crimes) usually occur (or are possibly even geocoded) along the existing street network.

Similar to the output of Ripley's K-function, results for cross K-function analysis are presented as a graph illustrating the change of the cross K value (y-axis) as distance (x-axis) increases. Based on the graph, one can detect at what distance range the two types are spatially correlated (i.e., colocated) or independent from each other. For example, statistically significant colocation exists at a certain distance range if the cross K value at that distance falls above an upper envelope of a given level of statistical significance; significant dispersion of the two point sets is suggested if it lies below the lower envelope line; and the spatial distribution of type A points has no impact on that of type B points if it falls within the upper and lower envelopes. Note that the cross K-function is sensitive to the direction of the colocation association. In other words, the cross K-function for A→B, which stands for the extent to which type A points are attracted to type B points, differs from that for B→A, which indicates the extent to which type B points are drawn to type A points.
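A minimal sketch of the cross K-function with a Monte Carlo envelope follows, assuming NumPy and no edge correction; the randomization here simply redraws type B locations uniformly inside the data's bounding box, which is only one of several possible null models for the independence assumption.

import numpy as np

def cross_k(xy_a, xy_b, area, lags):
    """Density of type B points within lag h of type A points, scaled by the
    overall density of B (no edge correction)."""
    d = np.linalg.norm(np.asarray(xy_a, float)[:, None, :] -
                       np.asarray(xy_b, float)[None, :, :], axis=2)
    n_a, n_b = d.shape
    return np.array([(d < h).sum() for h in lags], dtype=float) * area / (n_a * n_b)

def cross_k_envelope(xy_a, xy_b, area, lags, sims=99, seed=0):
    """Lower and upper simulation envelopes under spatial independence."""
    rng = np.random.default_rng(seed)
    pts = np.vstack([xy_a, xy_b])
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    sim = np.array([cross_k(xy_a, rng.uniform(lo, hi, size=np.shape(xy_b)),
                            area, lags) for _ in range(sims)])
    return sim.min(axis=0), sim.max(axis=0)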

1.10.3.2 Global Colocation Quotient

As Leslie and Kronenfeld (2011) point out, the cross K-function of two types of point features may be significantly biased by the distribution of the underlying population if the two sets collectively exhibit a significant spatial clustering pattern. They proposed a colocation quotient (CLQ) measure, based on the concept of the location quotient commonly used in economic geography, to mitigate the problem. Different from the cross K-function, the global CLQ is defined on the basis of NNs rather than actual metric distance in order to account for the aforementioned effect of the joint population distribution. The global CLQ, which compares the observed and expected numbers of type B points in proximity to type A points, is formulated as

$$CLQ_{A \to B} = \frac{N_{A \to B}/N_A}{N_B/(N-1)} \qquad (1)$$

where N represents the total number of points, N_A denotes the number of type A points, N_B the number of type B points, and N_{A→B} the number of type A points that have type B points as their NNs. The numerator calculates the observed proportion of type A points that have type B points as their NN, while the denominator estimates the expected proportion by chance. Given that a point itself is not an eligible candidate for its own NN, N − 1 rather than N is used in the measurement of the expected proportion. Like the cross K-function, the global CLQ takes the direction of the colocation relationship into account. In practice, a type A point may have multiple NNs within a bandwidth; under this circumstance, the global CLQ treats each NN of the point equally in the calculation of N_{A→B}, which is formulated as

$$N_{A \to B} = \sum_{i=1}^{N_A} \sum_{j=1}^{nn_i} \frac{f_{ij}}{nn_i} \qquad (2)$$


where i denotes each type A point, nn_i represents the number of NNs of point i, j indexes each of point i's NNs, and f_ij is a binary variable indicating whether or not point j is of the type B under investigation (1 indicates yes and 0 otherwise). The global CLQ is proven to have an expected value of one when all points are randomly relabeled, given the frequency distribution of each point set (Leslie and Kronenfeld, 2011). Consequently, a CLQ_{A→B} larger than one indicates that type A points tend to be spatially colocated with B points, and a larger CLQ value indicates a stronger colocation association. On the contrary, a CLQ_{A→B} less than one suggests that type A points tend to be isolated from B points. A Monte Carlo simulation-based statistical test is used to determine whether such spatial colocation or isolation behavior is significantly nonrandom. The Monte Carlo simulation reassigns the category of each point at random, subject to the frequency distribution of each point category. A distribution of CLQ values is obtained by repeating this process many times; the observed CLQ is then compared with this randomization distribution to derive a test statistic and significance level.
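A minimal sketch of Eq. (1) follows, using each point's single NN (a bandwidth of 1; the multi-NN averaging of Eq. (2) is omitted). The function name and inputs are assumptions for illustration.

```python
import numpy as np

def global_clq(points, labels, a="A", b="B"):
    """Global colocation quotient CLQ(A -> B), Eq. (1), based on each
    point's single nearest neighbor."""
    labels = np.asarray(labels)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)   # a point is not its own NN candidate
    nn = d.argmin(axis=1)         # index of each point's nearest neighbor
    is_a = labels == a
    n, n_a, n_b = len(points), is_a.sum(), (labels == b).sum()
    n_a_to_b = (labels[nn[is_a]] == b).sum()  # A points whose NN is type B
    return (n_a_to_b / n_a) / (n_b / (n - 1))
```

The Monte Carlo test then recomputes the CLQ over many random relabelings of `labels` (holding the category counts fixed) and reads the significance off that sample distribution.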

1.10.3.3 Local Indicator of Colocation Quotients

More recently, Cromley et al. (2014) developed a local version of the CLQ, the local indicator of colocation quotients (LCLQ), in order to reveal the spatial variability of the correlation between two point sets. The LCLQ is formulated as

$$LCLQ_{A_i \to B} = \frac{N_{A_i \to B}}{N_B/(N-1)} \qquad (3)$$

$$N_{A_i \to B} = \sum_{j=1\,(j \ne i)}^{N} \left( \frac{w_{ij}\, f_{ij}}{\sum_{j=1\,(j \ne i)}^{N} w_{ij}} \right) \qquad (4)$$

$$w_{ij} = \exp\left( -0.5\, \frac{d_{ij}^2}{d_{ib}^2} \right) \qquad (5)$$

where the LCLQ for point A_i relative to type B points has an expression similar to the global CLQ in Eq. (1), N_{A_i→B} denotes the weighted average number of type B points that are the NNs of point A_i, f_ij again is a binary variable indicating whether point j is a marked B point (1 for yes and 0 otherwise), w_ij denotes the weight of point j, indicating the importance of point j to the ith A point, d_ij is the distance between the ith A point and point j, and d_ib denotes the bandwidth distance around the ith A point. Eq. (5) defines the Gaussian kernel density function used to assign geographic weights to each neighbor of point A_i: the farther a neighbor is from point A_i, the less important it is to point A_i. One may also consider other density functions to define a point's weight, such as the box kernel density function, which treats each point within the prescribed bandwidth distance from a marked A point identically as one, regardless of distance. Recently, Wang et al. (2017) used the Monte Carlo simulation technique to develop a statistical significance test for the LCLQ. Following the "restricted random labeling approach" (Leslie and Kronenfeld, 2011), one simulation trial randomly reassigns the categories of points while keeping the frequency distribution of each category fixed; in other words, the simulation process at point i relabels the other points in a random fashion until all points are recategorized, and the number of elements in each point set remains unchanged after the simulation. This trial is repeated a predetermined number of times, and a sample distribution of LCLQs is obtained at each point of interest (say, point i) by recalculating the LCLQ for each trial. The significance of the calculated LCLQ at point i is then determined by comparing it with the sample distribution generated from simulation.
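A sketch of Eqs. (3) to (5) for a single point follows, assuming a k-nearest-neighbor bandwidth with Gaussian kernel weights; names and defaults are illustrative.

```python
import numpy as np

def local_clq(points, labels, i, k=10, b="B"):
    """LCLQ at point i (assumed to be a type A point) per Eqs. (3)-(5):
    Gaussian kernel weights over the k NNs, with the bandwidth d_ib set
    adaptively to the distance of the kth neighbor."""
    labels = np.asarray(labels)
    d = np.linalg.norm(points - points[i], axis=1)
    d[i] = np.inf                          # exclude the point itself
    nbr = np.argsort(d)[:k]                # the k nearest neighbors of i
    d_ib = d[nbr[-1]]                      # adaptive bandwidth
    w = np.exp(-0.5 * d[nbr] ** 2 / d_ib ** 2)   # Eq. (5)
    f = (labels[nbr] == b).astype(float)         # indicator f_ij
    n_ai_b = (w * f).sum() / w.sum()             # Eq. (4)
    n, n_b = len(points), (labels == b).sum()
    return n_ai_b / (n_b / (n - 1))              # Eq. (3)
```

The restricted random labeling test simply calls this function repeatedly on randomly relabeled copies of `labels` and compares the observed value with the resulting distribution.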

1.10.3.4 An Illustrative Example for Colocation Analysis of Two Types of Points

Based on the same data set used in Wang et al. (2017), Fig. 6A and B shows the cross K-function between motorcycle theft incidents and entertainment establishments, based on Euclidean distance and network distance, respectively. For implementation, we used the Ripley's cross K-function provided in GeodaNET (https://geodacenter.asu.edu/downloads/software/gnet) to derive both figures. Fig. 6A shows that the colocation between motorcycle thefts and entertainment establishments is significant at straight-line distances within 33 km, but not beyond. When network distance is used, as shown in Fig. 6B, the observed cross K value is significantly higher than the upper envelope curve across all distance ranges (from 0 to 54 km). We used the tool developed by Leslie and Kronenfeld (2011) (http://seg.gmu.edu/clq) to compute the global CLQ. Running the Monte Carlo simulation 1000 times, the global CLQ was 0.496 with a bandwidth of 1 (i.e., the NN), and increased to 0.681 with a bandwidth of 10 (i.e., 10 NNs). In both cases, motorcycle thefts were significantly (at the 0.001 level) isolated from entertainment establishments in the study area. This contradicts the finding provided by the cross K-function analysis; that is, the finding from the cross K-function is likely attributable to the significant clustering pattern of the joint population, and can be biased or even erroneous. A program written in C#, available at the same link as the sample data, was developed to implement the LCLQ and the corresponding statistical test. Here, only the bandwidth of 10 NNs around each examined point is used, for a smoother and more readable pattern. Compared to a fixed bandwidth based on metric distance, this bandwidth strategy ensures that exactly the same number of points is involved in the estimation of the LCLQ at each marked A point, and thus returns more robust and reliable results.

Fig. 6  Cross K-function analysis of motorcycle thefts and entertainment establishments (x-axis: distance in meters; y-axis: cross K value): (A) Euclidean-distance-based; (B) network-distance-based. Wang, F., Hu, Y., Wang, S., Li, X., 2017. Local indicator of colocation quotient with a statistical significance test: examining spatial association of crime and facilities. Professional Geographer 69, 22–31.

In practice, the order of neighbors (e.g., the first NN and the second NN) may be determined by different distance measures, such as Euclidean and network distances. Euclidean distance is computationally simple, but network distance is more meaningful when activities (e.g., in suburban and rural areas) are highly constrained by the network structure. Using the same sample dataset, Fig. 7A and B illustrates the LCLQ results between motorcycle thefts and entertainment establishments, based on Euclidean distance and network distance, respectively. The LCLQ results based on the two distance metrics are largely consistent with each other, and reveal that significant colocations are mostly observed in the southern part of the study area. The two maps indicate that the global CLQ result, a significant dispersion pattern between the two, does not hold across the study region, and that spatial heterogeneity of colocation is evident. There are also some noticeable discrepancies between the results based on the two distance metrics. For example, in the northwest corner of the study area (Fig. 7A), most of the motorcycle theft incidents were significantly colocated with entertainment establishments (LCLQ > 1), while a few incidents in the central area were significantly dispersed when Euclidean distance was used. When network distance was used (Fig. 7B), the dispersion extended to most of the crime incidents, with only a few exceptions on the edge. Such a difference may be explained by the road network pattern in that area.

Fig. 7  LCLQs between motorcycle thefts and entertainment establishments: (A) Euclidean-distance-based; (B) network-distance-based. Wang, F., Hu, Y., Wang, S., Li, X., 2017. Local indicator of colocation quotient with a statistical significance test: examining spatial association of crime and facilities. Professional Geographer 69, 22–31.


Specifically, only a few roads connect this area with others and the road density there is very low; the distance between an incident and its 10 NNs therefore becomes much longer when measured by network distance than by Euclidean distance, which returns lower LCLQ values but higher confidence. This suggests that the network-distance-based LCLQ measure, in general, reports lower but more significant LCLQ values than the Euclidean distance approach, especially in rural areas where the road network is sparse and simple.

1.10.4 Area-Based Analysis of Spatial Autocorrelation

This section focuses on the analysis of area (polygon) features, although some of the analytical methods are also applicable to point features. More often than the analysis of point features, area-based analysis focuses on the relationships of attribute values over space and the magnitude of spatial autocorrelation or association, that is, how similar or dissimilar values are across space. Different regions are often assessed for whether they carry similar population compositions. Some of these studies are framed within the context of segregation, evaluating how different population groups dominate specific regions; the degree of dominance may be reflected by the proportion of a (minority) group. To summarize the uneven distributions of two population groups, a popular measure is the dissimilarity index D, defined as

$$D = 0.5 \sum_{i=1}^{n} \left| \frac{a_i}{A} - \frac{b_i}{B} \right|$$

where n is the number of areal units in the study area, a_i and b_i are the population counts of the two population groups in areal unit i, and A and B are the total population counts of the two groups in the entire region. The index ranges from 0, indicating no segregation, to 1, reflecting perfect segregation. However, the index is aspatial, as its formulation does not capture any information about the spatial relationships between areal units. Thus, if we swap the populations a_3 and b_3 with a_5 and b_5, even if the proportions of the two groups in the two units are dramatically different, D remains unchanged.
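For concreteness, a minimal sketch of D follows; the inputs are two vectors of group counts by areal unit, and the function name is illustrative.

```python
import numpy as np

def dissimilarity_index(a, b):
    """Dissimilarity index D = 0.5 * sum_i |a_i/A - b_i/B|,
    ranging from 0 (no segregation) to 1 (perfect segregation)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return 0.5 * np.abs(a / a.sum() - b / b.sum()).sum()

# D is aspatial: permuting the areal units leaves it unchanged.
print(dissimilarity_index([10, 90, 50], [80, 20, 50]))
```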

1.10.4.1 Defining Spatial Weights

Many spatial statistical measures, including spatial autocorrelation statistics, need to specify the spatial relationship between objects. Such a relationship is often captured by a spatial weights matrix. In the simplest form of the matrix for n areal units, each element c_ij is binary, with 0 indicating that units i and j are not adjacent, and 1 otherwise. This binary matrix C is also labeled the adjacency matrix or connectivity matrix in transportation studies or network analysis. It is a square symmetric matrix of dimension n × n. Summing the values (1s) across columns (j) for each row (i), that is, Σ_j c_ij, gives the number of neighbors of the respective unit i. If the nonzero cells of row i are replaced by w_ij = 1/Σ_j c_ij, or 1 over the row sum, we have a stochastic or row-standardized matrix W, as each original binary cell value is standardized by the corresponding row sum. If the spatial configuration has a relatively large number of areal units, the matrix will likely have a large proportion of zero cells, as most units have just a few neighbors. To reduce the dimension of the matrix and store the adjacency information more efficiently, a sparse matrix may be used: each record corresponds to an areal unit i, and only the IDs of the units that are neighbors of i are recorded. While the information captured by this sparse matrix is identical to the n × n binary adjacency matrix, its dimension is much reduced, to n × k, where k is the maximum number of neighbors across all areal units, with k ≪ n. Apparently, adjacency is one of many possible spatial relationships. Another typical description of the relationship is the actual distance between areal units (d_ij). Distances in the matrix D can be modified to form weights. For instance, to depict the general spatial principle that the magnitude of a relationship declines with increasing distance, the matrix values may take on the inverse of distance. The inverse distance may further be modified with an exponential or power function, depicting a distance decay effect at an increasing rate with increasing distance, as typically adopted in spatial interaction models (Fotheringham and O'Kelly, 1989). More detailed discussion of the formulation of different spatial weights matrices can be found in Wong and Lee (2005).
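The two representations described above reduce to a few lines of code; a sketch follows, with illustrative names and a toy four-unit chain as input.

```python
import numpy as np

def row_standardize(c):
    """Turn a binary adjacency matrix C into the stochastic
    (row-standardized) matrix W: w_ij = c_ij / sum_j c_ij."""
    c = np.asarray(c, float)
    rs = c.sum(axis=1, keepdims=True)
    return np.divide(c, rs, out=np.zeros_like(c), where=rs > 0)

def to_sparse_list(c):
    """Store adjacency compactly: one list of neighbor IDs per unit."""
    return {i: np.flatnonzero(row).tolist() for i, row in enumerate(c)}

# Four units in a row: 0-1-2-3
C = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
W = row_standardize(C)
print(to_sparse_list(C))   # {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```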

1.10.4.2 Join-Count Statistics for Spatial Autocorrelation Analysis of Binary Variables

Different measures of spatial autocorrelation are needed for attribute variables at different measurement scales. The join-count statistic is an area-based measure of spatial autocorrelation for binary data (Dacey, 1965). Each areal unit has an attribute of 0 or 1. The statistics are derived by counting the numbers of neighboring units with (0,0), (0,1), and (1,1) values, that is, by join-counting neighboring units. These counts can also be regarded as the numbers of shared boundaries (joins) between neighboring units carrying similar [(0,0) and (1,1)] or dissimilar [(0,1)] values. If the values distributed across the study area have relatively strong positive spatial autocorrelation, the counts for (0,0) and (1,1) should be relatively high and the count for (0,1) relatively low, reflecting that neighbors with similar characteristics dominate. Negative spatial autocorrelation will have a relatively large (0,1) count, indicating dissimilarity between neighboring units.


Formally, let x_i = 1 if unit i carries 1, and 0 otherwise, and let w_ij be the spatial weight, which may adopt the binary or stochastic specification. Then the observed join-count statistics for the three types of neighbors are

$$O(1,1) = \frac{1}{2} \sum_i \sum_j w_{ij}\, x_i x_j$$

$$O(0,0) = \frac{1}{2} \sum_i \sum_j w_{ij}\, (1 - x_i)(1 - x_j)$$

$$O(0,1) = \frac{1}{2} \sum_i \sum_j w_{ij}\, (x_i - x_j)^2$$

These formulae are primarily for programming purposes when the spatial system is too large to be enumerated manually; otherwise, the three types of neighbors can be counted by hand. To determine the expected counts, that is, the counts when there is no significant spatial autocorrelation of any type, we have to make an assumption about the sampling framework. If samples are assumed to be drawn from a normal distribution, the sampling may be regarded as free sampling, normality sampling, or sampling with replacement. Alternately, if the sample is fixed but is drawn from or arranged in different ways, it is nonfree sampling, randomization, or sampling without replacement. Expected values of the join-count statistics depend on the sampling assumption. For normality sampling, if p is the probability that an areal unit carries 1, the expected counts for the three types of neighbors are

$$E(1,1) = \tfrac{1}{2} W p^2, \qquad E(0,0) = \tfrac{1}{2} W q^2, \qquad E(0,1) = W p q$$

where q is the probability that an areal unit carries 0, or (1 − p), which can be estimated by n_0/n, where n_0 is the number of units with 0 and n is the total number of areal units, and W is the sum of the values in the spatial weights matrix, which can be the binary or the row-standardized one. If a binary matrix is used, the variances of the three corresponding statistics are

$$\sigma^2(1,1) = p^2 J + p^3 K - p^4 (J + K)$$

$$\sigma^2(0,0) = q^2 J + q^3 K - q^4 (J + K)$$

$$\sigma^2(0,1) = 2pqJ + pqK - 4p^2 q^2 (J + K)$$

where J is the total number of shared boundaries, or $0.5 \sum_i \sum_j c_{ij}$; L_i is the number of neighbors of areal unit i, or $\sum_j c_{ij}$; and $K = \sum_i L_i (L_i - 1)$. If a row-standardized weights matrix is used, the three variance formulae differ from those above; their specifications are available in Cliff and Ord (1981) or Wong and Lee (2005). Under the randomization sampling assumption, the expected join counts are

$$E(1,1) = \frac{1}{2} W \frac{n_1 (n_1 - 1)}{n (n - 1)}, \qquad E(0,0) = \frac{1}{2} W \frac{n_0 (n_0 - 1)}{n (n - 1)}, \qquad E(0,1) = W \frac{n_1 (n - n_1)}{n (n - 1)}$$

The standard deviations (or variances) under the randomization assumption are rather complicated and therefore not described here; interested readers may refer to Cliff and Ord (1981) and Wong and Lee (2005). The z-score of each statistic can be determined as (O − E)/σ. Note that both O(0,0) and O(1,1) indicate positive spatial autocorrelation; the null hypotheses for both O(0,0) and O(1,1) have to be rejected in order to conclude that the observed pattern has significant positive spatial autocorrelation. The z-score for O(0,1) indicates whether significant negative spatial autocorrelation exists in the observed pattern.
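The observed counts and the normality-sampling expectations translate directly into matrix operations; a sketch, assuming a binary 0/1 attribute vector and a weights matrix as inputs (illustrative names).

```python
import numpy as np

def join_counts(x, w):
    """Observed join-count statistics O(1,1), O(0,0), O(0,1) for a
    binary attribute x and spatial weights matrix w."""
    x = np.asarray(x, float)
    o11 = 0.5 * (w * np.outer(x, x)).sum()
    o00 = 0.5 * (w * np.outer(1 - x, 1 - x)).sum()
    o01 = 0.5 * (w * (x[:, None] - x[None, :]) ** 2).sum()
    return o11, o00, o01

def expected_normality(w, p):
    """Expected counts under free (normality) sampling, where p is the
    probability that a unit carries 1 and q = 1 - p."""
    W, q = w.sum(), 1 - p
    return 0.5 * W * p ** 2, 0.5 * W * q ** 2, W * p * q
```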

1.10.4.3 Global Spatial Autocorrelation Measures

Characterizing regions by binary variables is not common. More commonly, regions are characterized by interval–ratio variables such as rates (e.g., crime or disease), socioeconomic indicators (e.g., income, house value), or physical attributes (e.g., annual rainfall, average summer temperature). Clearly, many of these variables are related to spatial processes, and thus their values should be similar over space. Various spatial autocorrelation or association measures have been developed to evaluate the similarity levels of attribute values over space.


Measures that summarize the similarity level over the entire study area (with subregions inside) are known as global measures. Measures that assign a value to each areal unit to reflect how that local unit is related to neighboring units are known as local measures, and will be examined in the next subsection. The most popular global spatial autocorrelation measure is Moran's I. It is essentially a spatial version of the Pearson correlation coefficient, but applied to one variable. Formally, it is defined as

$$I = \frac{n}{W} \cdot \frac{\sum_i \sum_j w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}$$

where x_i is the variable value in unit i and $\bar{x}$ is the mean of the variable over the study area. While the denominator may be treated as a variance-based standardization term, the numerator is based upon covariance (mean deviations of the values in i and j), conditioned by the spatial weight w_ij. Therefore, the covariance is counted only if w_ij is not zero. In the binary weighting scheme, the numerator is a univariate spatial covariance. When the values of both neighboring units are above or below the mean, the product contributing to the numerator will be positive, indicating positive spatial autocorrelation. If one neighbor is above and the other below the mean, the contribution to the numerator will be negative, indicating negative spatial autocorrelation. Moran's I has a range of (−1, 1). When there is no spatial autocorrelation, the expected value of Moran's I is $E_I = -1/(n-1)$, where n is the number of areal units. Note that E_I is always negative, however close it may be to zero when n is very large; when n is small, E_I can be quite negative. Under the normality sampling assumption, the variance of Moran's I is

$$\sigma^2(I) = \frac{n^2 S_1 - n S_2 + 3W^2}{W^2 (n^2 - 1)}$$

Under the randomization assumption, the variance is

$$\sigma^2(I) = \frac{n \left[ (n^2 - 3n + 3) S_1 - n S_2 + 3W^2 \right] - k \left[ (n^2 - n) S_1 - 2n S_2 + 6W^2 \right]}{(n-1)(n-2)(n-3)\, W^2}$$

where k is the kurtosis of the variable without the "−3" adjustment, and

$$S_1 = \frac{1}{2} \sum_i \sum_j (w_{ij} + w_{ji})^2, \qquad S_2 = \sum_i \left( \sum_j w_{ij} + \sum_j w_{ji} \right)^2$$

Another global measure of spatial autocorrelation is the Geary ratio. Whereas Moran's I is based upon comparing attribute values with the mean (mean deviations in the numerator), the Geary ratio directly compares values in neighboring units. Formally,

$$C = \frac{(n-1) \sum_i \sum_j w_{ij} (x_i - x_j)^2}{2W \sum_i (x_i - \bar{x})^2}$$

Because the Geary ratio directly compares neighboring values in the numerator, if neighboring values are similar, the numerator will be small; the minimum is zero when all units are equal. When neighboring values are very different, the numerator will be relatively large. After the sum of squared differences in the numerator is standardized by the denominator, the maximum value of the Geary ratio is 2. As the Geary ratio is not as popular as Moran's I, partly due to its distributional properties, its variances are not presented here; interested readers can refer to the relevant literature (Geary, 1954). Another popular global measure of spatial autocorrelation or association is the Getis-G statistic. Formally,

$$G(d) = \frac{\sum_i \sum_j w_{ij}(d)\, x_i x_j}{\sum_i \sum_j x_i x_j}$$

where w_ij(d) is a binary function carrying a value of 0 or 1: if unit j is within distance d of i, then w_ij(d) = 1, implying that unit j is within the neighborhood of i, and 0 otherwise. Thus, the G-statistic depends on the size of the neighborhood definition adopted. If unit j is within the neighborhood of unit i, the product of their values (x_i and x_j) is included in the numerator. The denominator includes the cross-products of all possible pairs of values regardless of location. Therefore, the G-statistic is essentially a ratio of the association of values within the neighborhood (as defined by d) to the association of values over the entire region. From a spatial perspective, if values within the neighborhood are relatively large, the (sum of) products will be relatively large; if the neighborhood values are relatively small, the products will be small. Thus, if large values are clustered, the G-statistic should be relatively large, indicating the presence of hot spots (high values next to high values). On the other hand, if neighboring values are small, the G-statistic should be small, reflecting the presence of cold spots (low–low values). Note that the size of d plays an important role in evaluating the presence of spatial clusters, and high and low values of the G-statistic do not correspond to positive and negative spatial autocorrelation, respectively, as in Moran's I and the Geary ratio.


The expected value of the G-statistic is E(G) = W/[n(n − 1)], where W reflects the value of d adopted in defining the neighborhood size. The variance of the G-statistic is σ²(G) = E(G²) − [E(G)]². While E(G) is quite simple to compute, E(G²) is rather complicated; interested readers should refer to Getis and Ord (1992). Again, the standardized score of the G-statistic ranges over positive and negative values, but these should not be interpreted as positive and negative spatial autocorrelation, respectively; rather, they correspond to the presence of high–high and low–low clusters. Spatial autocorrelation statistics such as Moran's I and the Geary ratio cannot distinguish low-value from high-value clusters, as both are formed by similar neighboring values. The global spatial autocorrelation–association statistics discussed earlier are applicable to one variable. They may be extended to examine the spatial correlation pattern between two variables (e.g., Lee, 2001); for example, instead of correlating a variable with its own spatial lags, a bivariate Moran's I analyzes the correlation between a variable and the spatial lags of another variable. However, such an approach has not been used often, partly due to the difficulty of interpreting the result. In addition, a major reason to assess the level of spatial autocorrelation is to determine whether the basic independence assumption of classical statistics is violated; therefore, evaluating the univariate case is often more applicable.
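All three global measures reduce to a few matrix operations. A sketch follows, assuming a weights matrix with a zero diagonal; for G, the i = j pairs are excluded in both numerator and denominator, following Getis and Ord (1992).

```python
import numpy as np

def morans_i(x, w):
    """Global Moran's I: (n/W) * sum w_ij z_i z_j / sum z_i^2."""
    z = x - x.mean()
    return len(x) / w.sum() * (w * np.outer(z, z)).sum() / (z ** 2).sum()

def geary_c(x, w):
    """Geary ratio: (n-1) * sum w_ij (x_i - x_j)^2 / (2W sum z_i^2)."""
    z = x - x.mean()
    num = (w * (x[:, None] - x[None, :]) ** 2).sum()
    return (len(x) - 1) * num / (2 * w.sum() * (z ** 2).sum())

def getis_g(x, w_d):
    """Global G with a binary distance-band matrix w_d (zero diagonal)."""
    xx = np.outer(x, x)
    np.fill_diagonal(xx, 0)           # exclude i = j pairs
    return (w_d * xx).sum() / xx.sum()
```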

1.10.4.4 Local Spatial Autocorrelation Measures

Moran's I, the Geary ratio, and the G-statistic are all regarded as global measures, as they provide one summary value for the entire study area, ignoring the spatial variability of the measured phenomenon within the region. To depict the variability within the study region, measures reflecting local situations are needed. A series of local measures of spatial autocorrelation–association have been introduced, mostly derived by spatially decomposing the global measures into local units. The Local Indicators of Spatial Association (LISA) include the local versions of both Moran's I and the Geary ratio. Formally, local Moran is defined as

$$I_i = z_i \sum_j w_{ij} z_j$$

where z_i is the standardized score of x_i, that is, $(x_i - \bar{x})/\sigma$. Thus, for each unit i, a fraction (w_ij) is pulled from the mean deviation of each neighboring value (z_j); the sum of these fractions of mean deviations from neighbors is multiplied by the mean deviation in i. If both neighboring values are above or below the mean, I_i will be positive, indicating positive spatial autocorrelation. If one of the neighboring values is above the mean and the other below, I_i will be negative, indicating negative spatial autocorrelation. The expected value of I_i is $E(I_i) = -w_{i.}/(n-1)$, where w_i. is the sum of the ith row of the W matrix. The variance of I_i is extremely complicated and is described in Anselin (1995). Local Geary is defined as

$$c_i = \sum_j w_{ij} (z_i - z_j)^2$$

which is literally the sum of weighted squared differences of mean deviations. Similar to the global Geary ratio, which compares neighboring values directly, the local version compares the neighboring mean deviations directly (z_i and z_j), weighted by the spatial weights (w_ij). If neighboring values are similar, local Geary will be small, and large if neighboring values are dissimilar. Similarly, the global G-statistic also has a local version:

$$G_i(d) = \frac{\sum_j w_{ij}(d)\, x_j}{\sum_j x_j}$$

The denominator is essentially the sum of values in all units except the one in unit i, and the numerator is the sum of values in the neighborhood of i. The expected value of the local G-statistic is $w_{i.}/(n-1)$, where $w_{i.}$ is $\sum_j w_{ij}(d)$. The variance is $E(G_i^2) - [E(G_i)]^2$; again, $E(G_i^2)$ is too complicated to be included here (refer to Getis and Ord, 1992). In interpreting the standardized local G-statistic, one should be cautious: strong negative standardized scores indicate low–low clusters, strong positive scores refer to high–high clusters, and values close to zero may reflect random patterns or clusters of moderate values. When high and moderate values are in proximity, the scores will be moderately positive; when low and moderate values are close by, the scores will be moderately negative.
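A sketch of the two local measures used in the example below, assuming a weights matrix (or binary distance-band matrix) with a zero diagonal; names are illustrative.

```python
import numpy as np

def local_moran(x, w):
    """Local Moran I_i = z_i * sum_j w_ij z_j, with z standardized."""
    z = (x - x.mean()) / x.std()
    return z * (w @ z)              # one I_i value per areal unit

def local_g(x, w_d):
    """Local G_i(d): sum of x_j within the distance band of unit i
    over the sum of all x_j excluding unit i itself."""
    num = w_d @ x                   # neighborhood sums (diagonal is zero)
    den = x.sum() - x               # all values except the one at unit i
    return num / den
```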

1.10.4.5 An Illustrative Example for Area-Based Spatial Autocorrelation Measures

The example is based on the dataset used in a study reported in Wang et al. (2012), which can be accessed via the same link as the datasets for the other illustrative examples in this article. The example uses only the 68 "analysis areas" that intersect the City of Chicago. The analysis areas were constructed from zip code areas by regionalization so that each had a minimum number of breast cancer cases (≥ 15) to derive a reliable estimate of the late-stage cancer rate (as shown in Fig. 8A). To implement the join-count statistics, the mean late-stage cancer rate, 0.3, was used to categorize areas into low and high (as shown in Fig. 8B). There were 33 (n_0) areas with low rates (≤ 0.3) and 35 (n_1) with high rates. Treating low as "0" and high as "1," and using queen's contiguity to define polygon adjacency, O(0,0), O(1,1), and O(0,1) are 44, 47, and 78, respectively. Given the values of n_0 and n_1 reported above, and the total number of shared boundaries (169) or the sum of the weights matrix (338), the respective expected values can be derived (and rounded to) 39, 44, and 86. Both O(0,0) and O(1,1) are smaller than their expected values, and O(0,1) is larger than expected; the spatial distribution thus seems to exhibit negative spatial autocorrelation.

Fig. 8  (A) Late-stage breast cancer rates in Chicago; (B) areas classified into low (≤ 0.3) and high (> 0.3).

However, these differences need to be tested for statistical significance. The parameters required to compute the variances of the join-count statistics, J and K, are 169 and 1502, respectively, and the respective variances are 118.78, 132.30, and 42.58 under the normality sampling assumption. The respective z-scores of the three join-count statistics are then 0.45, 0.26, and −1.23, failing to substantiate the presence of significant spatial autocorrelation of any type. We then examine the global spatial autocorrelation measures. Again using queen's contiguity, the observed Moran's I for late-stage rates was 0.23, while the expected value was −0.01. The z-score of Moran's I was 3.23 with a probability (p-value) of 0.01; the observed distribution therefore has a low to mild but significant positive spatial autocorrelation. However, the G-statistic was 0.0715, not very different from the expected value of 0.0707; the z-score was 0.48 with a p-value of 0.63, indicating that the distribution has no strong spatial association pattern and that the pattern is not statistically significant. In this case, Moran's I and the G-statistic yielded different conclusions, as they detect slightly different phenomena. Finally, for local measures of spatial autocorrelation, local Moran values were computed, with their z-scores shown in Fig. 9A. Large positive z-scores indicate similar (either low–low or high–high) values in the neighborhoods, and large negative z-scores reflect dissimilar values. The high positive z-scores in the west-central area of Chicago reflect high–high clusters, while the relatively high positive z-scores along the narrow strip in the northeast and north reflect low–low clusters. On the other hand, the negative z-scores in the northwest reflect high–low clusters (Fig. 9B). Local G-statistics were also computed for the late-stage breast cancer rates, with results shown in Fig. 10. The high-rate cluster in the west-central part of the city is similar to that identified by local Moran, but with the additional information of different levels of statistical significance. Clusters in the north and northeast have moderately negative z-scores, indicating that moderately low values are next to low values, but no low–low clusters are found. Therefore, local Moran and local G provide similar information, but offer subtly different results.

Fig. 9  (A) z-scores of local Moran for late-stage breast cancer rates in Chicago; (B) identified clusters.

Fig. 10  z-scores of local G-statistic for late-stage breast cancer rates in Chicago.

1.10.5 Regionalization Methods

Regionalization is the process of grouping a large number of small areal units into fewer regions, and has a long tradition in geography (Cliff et al., 1975). Similar to cluster analysis, which clusters observations according to the similarity of their attributes, regionalization groups areas that are similar in attributes and also spatially adjacent to each other; it is therefore also referred to as "spatial clustering" (Mu et al., 2015). There are various reasons or purposes for regionalization. Regionalization yields areas of a larger population size to protect data privacy and provide more reliable rate estimates. Constructing new geographic areas allows the analysis to be conducted at multiple geographic levels, and thus to test for the modifiable areal unit problem (Fotheringham and Wong, 1991). This issue is also tied to the uncertain geographic context problem (e.g., in multilevel modeling), referring to the sensitivity of research findings to different delineations of contextual units (Kwan, 2012). Furthermore, since similar areas are merged to form new regions, spatial autocorrelation is less of a concern in the new regions (Mu and Wang, 2008).


The advancement of GIS technology has revitalized this line of work and expanded its applications.

1.10.5.1 Earlier GIS-Based Regionalization Methods

Some earlier GIS-based regionalization methods came from public health applications. For instance, Black et al. (1996) developed the ISD method (after the Information & Statistics Division of the Health Service in Scotland, where it was devised) to group a large number of census enumeration districts (EDs) in the United Kingdom into larger analysis units of approximately equal population size. Lam and Liu (1996) used the spatial order method to generate a national rural sampling frame for HIV/AIDS research, in which rural counties with insufficient HIV cases were merged to form larger sample areas. Both approaches emphasize spatial proximity, but neither considers within-area attribute homogeneity. Haining et al. (1994) attempted to consolidate many EDs in the Sheffield Health Authority Metropolitan District in the United Kingdom into a manageable number of regions for health service delivery (commonly referred to as the "Sheffield method"). The Sheffield method starts by merging adjacent EDs sharing similar deprivation index scores (i.e., complying with within-area attribute homogeneity), and then uses several subjective rules and local knowledge to adjust the regions for spatial compactness (i.e., accounting for spatial proximity). The method attempts to balance the two criteria of attribute homogeneity and spatial proximity, a major challenge in regionalization analysis. Among others, AZP (Openshaw, 1977; Openshaw and Rao, 1995; Cockings and Martin, 2005; Grady and Enander, 2009) and MaxP (Duque et al., 2012) are two popular regionalization methods that have been automated in a GIS environment. The AZP method starts with an initial random regionalization and then iteratively refines the solution by reassigning objects to neighboring regions to improve an objective function value (e.g., maximum homogeneity within derived regions); the regionalization result therefore varies depending on the initial randomization state. MaxP groups a set of geographic areas into the maximum number of homogeneous regions such that the value of a spatially extensive regional attribute is above a predefined threshold. Neither method guarantees that the newly formed areas have populations above a threshold, an important property in many health studies. The following discusses in more detail two recent methods that account for this constraint: REDCAP (Guo and Wang, 2011; Wang et al., 2012) and the mixed-level regionalization (MLR) method (Mu et al., 2015).

1.10.5.2 REDCAP Method

REDCAP refers to a family of methods, termed regionalization with dynamically constrained agglomerative clustering and partitioning. REDCAP extends the single-linkage (SLK), average-linkage (ALK), complete-linkage (CLK), and Ward hierarchical clustering methods to enforce the spatial contiguity of clusters and obtain a set of regions while explicitly optimizing an overall homogeneity measure (Guo, 2008). In essence, the goal of REDCAP is to construct a set of homogeneous regions by aggregating contiguous small areas of similar attribute values (e.g., socioeconomic structure). The homogeneity measure is the total sum of squared deviations (SSD) (Everitt, 2002), written as

$$SSD = \sum_{r=1}^{k} \sum_{i=1}^{n_r} \sum_{j=1}^{d} (x_{ij} - \bar{x}_j)^2$$

where k is the number of regions, n_r is the number of small areas in region r, d is the number of variables considered, x_ij is a variable value, and $\bar{x}_j$ is the regional mean for variable j. Each input variable should be normalized, and a weight can be assigned to each variable. As illustrated in Fig. 11, REDCAP is composed of two steps: (1) contiguity-constrained hierarchical clustering and (2) top-down tree partitioning. The shade of each polygon represents its attribute value, and similar shades represent similar values. Two polygons are considered contiguous in space if they share a segment of boundary (i.e., rook contiguity). In the first step, as shown in Fig. 11A, REDCAP constructs a hierarchy of spatially contiguous clusters based on attribute similarity under the contiguity constraint: the two adjacent and most similar areas are grouped to form the first cluster; two adjacent and most similar clusters are grouped to form a higher-level cluster; and the process continues until the whole study area is one cluster. A clustering tree is generated to fully represent the cluster hierarchy (i.e., each cluster at any level is a subtree in the map). In the second step, as shown in Fig. 11B, REDCAP partitions the tree into two regions by removing the best edge (i.e., edge 11–15 in Fig. 11B), the one that optimizes the homogeneity measure (SSD) defined above. In other words, the two regions are created such that the total within-region homogeneity is maximized. The partitioning continues until the desired number of regions is reached. Note that the first step (contiguity-constrained clustering) is a bottom-up process, which builds a hierarchy of spatially contiguous clusters but does not directly optimize the objective function; the second step (tree partitioning) is a top-down approach that directly optimizes the objective function. REDCAP is similar to the SKATER method (Assunção et al., 2006) in terms of the two-step framework, but significantly outperforms the latter according to criteria such as total heterogeneity, region size balance, internal variation, and preservation of data distribution (Guo, 2008). The above process is further modified to accommodate one more constraint, a minimum regional size (Wang et al., 2012), which can be a threshold regional population and/or number of incidents (whichever is the denominator in a rate estimate). This constraint is enforced in the second step, tree partitioning: for each potential cut, if it cannot produce two regions that both satisfy the constraints, the cut is not considered a candidate cut. The best of all candidate cuts is then chosen to partition a tree into two regions. If there is no candidate cut (i.e., no cut can produce regions that satisfy the constraints), the region will not be partitioned further; if none of the current regions can be cut, the regionalization process stops.

Fig. 11  An example illustrating REDCAP: (A) a spatially contiguous tree is built with a hierarchical clustering method; (B) the tree is partitioned by removing the edge that optimizes the SSD measure. Wang, F., Guo, D., McLafferty, S., 2012. Constructing geographic areas for cancer data analysis: a case study on late-stage breast cancer risk in Illinois. Applied Geography 35, 1–11.
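REDCAP's objective boils down to the evaluation step that each candidate cut must minimize; a minimal sketch of that step follows (the tree building and edge enumeration are omitted, and names are illustrative).

```python
import numpy as np

def total_ssd(x, region_of):
    """Total sum of squared deviations (SSD) of a partition.

    x         : (n, d) array of (normalized) attribute values
    region_of : length-n array assigning each small area to a region."""
    ssd = 0.0
    for r in np.unique(region_of):
        xr = x[region_of == r]
        ssd += ((xr - xr.mean(axis=0)) ** 2).sum()
    return ssd

# A cut that splits similar areas apart raises SSD; among all candidate
# cuts satisfying the minimum-size constraint, REDCAP keeps the one
# yielding the lowest total SSD.
```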

1.10.5.3 MLR Method

The MLR method is a hybrid model built upon two earlier methods: the Peano curve algorithm (Bartholdi and Platzman, 1988; Mandloi, 2009) and modified scale-space clustering (MSSC) (Mu and Wang, 2008). The former is utilized to achieve spatial compactness, and the latter addresses attribute homogeneity. The spatial order method uses space-filling curves to determine the nearness, or spatial order, of areas. The first such algorithm was developed by the Italian mathematician Peano (1890), and the curve is thus also referred to as the Peano curve. Space-filling curves traverse space in a continuous and recursive manner to visit all areas, and assign a spatial order (from 0 to 1) to each area based on its relative position in two-dimensional space. As shown in Fig. 12, the spatial order of each area is calculated and labeled (on the left); the centroids of these areas in the 2D space are then mapped onto the 1D line underneath, and the Peano curve connects the centroids from the lowest spatial order to the highest (on the right).

Fig. 12  An example of assigning spatial order values to areas. Mu, L., Wang, F., Chen, V.W., Wu, X., 2015. A place-oriented, mixed-level regionalization method for constructing geographic areas in health data dissemination and analysis. Annals of the Association of American Geographers 105, 48–66.


Based on their spatial order values, the areas can be grouped using a regular cluster analysis method. For instance, the "Spatial Sort Method" (with the option PEANO) available in ArcGIS is based on one of the algorithms developed by Bartholdi and Platzman (1988). In general, areas that are close together have similar spatial order values, and areas that are far apart have dissimilar spatial order values. The method provides a first-cut measure of closeness (Lam and Liu, 1996; Wang and O'Brien, 2005). The spatial order method considers only spatial proximity, not within-area attribute homogeneity; the MSSC method accounts for the latter while also considering spatial contiguity. The method developed by Mu and Wang (2008) is modified and improved from early scale-space clustering methods such as the melting algorithm (Wong, 1993; Ciucu et al., 2003) and the blurring algorithm (Leung et al., 2000; Wang, 2005). The idea is to melt each spatial object (area) into its most similar neighbor, where similarity is measured by the attribute distance between two neighboring areas. An object i has t attributes standardized as (x_i1, ..., x_it), and its adjacent objects j have attributes standardized likewise as (x_j1, ..., x_jt). The attribute distance between i and j is defined as

$$D_{ij} = \sum_t (x_{it} - x_{jt})^2$$

Among i's neighboring objects (l = 1, 2, ..., m), its most similar neighbor k is the one with the shortest attribute distance from it, termed the minimum-distance criterion, such that

$$D_{ik} = \min_l \{ D_{il} \}$$

An object with the highest attribute value among its surrounding objects serves as a local nucleus, and the procedure searches outward for its most similar neighboring object for grouping. A branch of the outward search continues until it reaches an object with the lowest attribute value among its surrounding objects. The grouping process is much like our cognitive process of viewing a picture: the pattern or structure is captured by the brightest pixels, and the surrounding darker pixels serve as the background. In practice, one follows the direction from a local minimum to a local maximum and groups all the objects along the way. By merging surrounding areas (up to local minima) into the local maxima, a region is simplified into fewer areas while the structure is preserved. The MLR method integrates the aforementioned spatial order method and the MSSC method by explicitly balancing spatial compactness and attribute homogeneity in the spatial clustering process. Specifically, the spatial order method yields a spatial order o_si for each area i, and the MSSC defines an attributive order o_ai based on the attribute-driven melting process. After both the spatial order o_si and the attributive order o_ai are normalized, the MLR balances the two factors by their corresponding weights w_s and w_a to define an integrated clustering order

$$o_i = w_s \cdot o_{si} + w_a \cdot o_{ai}$$

where o_i is the integrated clustering order value for unit i; w_s is the weighting factor of the spatial consideration, with w_s > 0; o_si is the normalized spatial order from the Peano curve algorithm; w_a is the weighting factor of the attributive consideration; and o_ai is the normalized attributive order based on MSSC, subject to w_s + w_a = 1. In practice, the value of w_s is larger than 0, and preferably larger than 50%, to emphasize entity connectivity. An additional constraint, such as a threshold population for derived regions, can easily be enforced in the clustering process. The MLR method is implemented in Python as a convenient ArcGIS toolkit (Mu and Wang, 2015). As the name suggests, the MLR method is developed for decomposing areas of large population (to gain more spatial variability) and merging areas of small population (to mask data privacy) in order to obtain regions of comparable population. For instance, for rural counties with small populations, it is desirable to group counties to form regions of similar size; for urban counties with large populations, it is necessary to segment each county into multiple regions, also of similar size, each composed of lower-level areas (e.g., census tracts). The resulting regions are mixed-level, with some tract regions (subcounty level), some single-county regions, and some multicounty regions (Mu et al., 2015). Very often the task is to cluster same-level small areas into larger areas of comparable population; as in our example below, the MLR method then collapses to a simple single-level regionalization.
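The integration step itself is a one-line weighted sum once both orders are normalized; a sketch follows, assuming non-constant order vectors as inputs (names are illustrative, not the MLR toolkit's API).

```python
import numpy as np

def integrated_order(spatial_order, attribute_order, ws=0.75):
    """Integrated clustering order o_i = ws * os_i + wa * oa_i,
    with both orders rescaled to [0, 1] and ws + wa = 1."""
    os_ = np.asarray(spatial_order, float)
    oa = np.asarray(attribute_order, float)
    norm = lambda v: (v - v.min()) / (v.max() - v.min())  # assumes v not constant
    return ws * norm(os_) + (1 - ws) * norm(oa)
```

Sorting units by this integrated order and cutting the sorted list whenever the accumulated population passes the threshold is then one simple way to enforce the minimum-population constraint.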

1.10.5.4 An Illustrative Example for REDCAP and MLR Methods

The example is based on the study reported in Wang and Robert (2015). Both the REDCAP and MLR methods are applied for comparison. The sample data and the MLR program can be downloaded via the same link. See Wang (2015, pp. 201–215) for detailed illustration of implementation.

Table 6  Comparing regions derived by REDCAP and MLR

                                        REDCAP-derived regions (n = 50)    MLR-derived regions (n = 57)
Compactness (isoperimeter quotient)
  Range (min–max)                       0.1694–0.7299                      0.1305–0.7898
  Mean                                  0.4178                             0.4417
Coefficient of variation for CDI
  Range (min–max)                       0–17.1574                          0–17.1574
  Mean                                  1.0455                             1.8215

The example examines the relationship between the concentrated disadvantage index (CDI) and homicide rates in New Orleans, Louisiana. The CDI is also used to define attribute similarity between areas. The homicide rate is measured as homicides per 100,000 population. The analysis is initially conducted at the census tract level; however, many census tracts have small populations and thus do not have a sufficient base population to compute reliable homicide rates. We therefore use the REDCAP and MLR methods to construct new areas with a threshold population of 3500. In implementing the MLR, we set w_s = 0.75. The 175 census tracts are grouped into 50 regions by the REDCAP method and 57 regions by the MLR. Two simple properties, spatial compactness and attribute homogeneity of the derived regions, are used to assess the regionalization results. The isoperimeter quotient, a simple spatial compactness index, is defined as the ratio of a region's actual area to the area of a circle with the same perimeter as the region; a higher isoperimeter quotient indicates a more compact region in shape. Attribute homogeneity can be measured as the coefficient of variation (i.e., standard deviation divided by mean) of the attribute variable (here, CDI) across the census tracts within a region; a lower coefficient of variation implies a higher level of homogeneity in a constructed region. As reported in Table 6, the REDCAP-derived regions are more homogeneous but less compact than the MLR-derived regions. Recall that the REDCAP method explicitly strives to maximize the overall homogeneity measure in its algorithm and thus understandably yields more homogeneous regions. The MLR method balances the tradeoff between spatial compactness and attribute homogeneity by user-defined weights; our experiment assigns a much larger weight to spatial compactness (75%) than to attribute homogeneity (25%), and therefore the MLR does a better job of achieving compact regions. When different weights are used, the MLR results will conceivably change. In addition, the larger number of regions from the MLR (57) than from REDCAP (50) can be considered more desirable in most studies, as it preserves better spatial resolution in the aggregation process.
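Both assessment measures are one-liners; a sketch follows, with the example worked for a square (names are illustrative).

```python
import numpy as np

def isoperimeter_quotient(area, perimeter):
    """Compactness: region area over the area of a circle with the same
    perimeter, i.e., 4 * pi * area / perimeter**2 (equals 1 for a circle)."""
    return 4.0 * np.pi * area / perimeter ** 2

def coefficient_of_variation(values):
    """Homogeneity proxy: standard deviation / mean (lower = more homogeneous)."""
    v = np.asarray(values, float)
    return v.std() / v.mean()

print(isoperimeter_quotient(100.0, 40.0))  # a 10 x 10 square -> pi/4 ~ 0.785
```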

1.10.6 Conclusion

This article examines techniques for analyzing the spatial patterns of point and area features. The first three sections cover topics related to point features, and the next two sections introduce topics on area features. Section "Descriptive Measures of Point Features" provides descriptive measures of point features in terms of central tendency and spatial dispersion: central tendency measures include the mean center, median center, and central feature, and spatial dispersion measures include the standard distance and deviational ellipse. Section "Inferential Measures of One Type of Points" discusses inferential measures for one type of points, such as quadrat analysis, ordered neighbor statistics, and Ripley's K-function, to assess whether an observed pattern is random, dispersed, or clustered. Section "Collocation Analysis of Two Types of Points" examines the colocation of two types of points by the cross K-function and the CLQ in order to measure the extent to which they are within the vicinity of each other. Section "Area-Based Analysis of Spatial Autocorrelation" analyzes the spatial autocorrelation of areal features by popular indices such as the join-count statistic, Moran's I, the Geary ratio, and the Getis-G statistic. Section "Regionalization Methods" introduces several GIS-automated regionalization methods, such as REDCAP and MLR, that aggregate small and similar (and thus spatially autocorrelated) areal units into larger regions. All sections include illustrative examples to explain the implementation process, with sample data and programs available for download.

References

Anselin, L., 1995. Local indicators of spatial association – LISA. Geographical Analysis 27, 93–116.
Assunção, R.M., Neves, M.C., Câmara, G., Freitas, C.D.C., 2006. Efficient regionalization techniques for socio-economic geographical units using minimum spanning trees. International Journal of Geographical Information Science 20, 797–811.
Bartholdi III, J.J., Platzman, L.K., 1988. Heuristics based on spacefilling curves for combinatorial problems in Euclidean space. Management Science 34, 291–305.
Berry, B.J.L., Marble, D.F. (Eds.), 1968. Spatial analysis: a reader in statistical geography. Prentice Hall, Englewood Cliffs, NJ.
Black, W.R., 1992. Network autocorrelation in transport network and flow systems. Geographical Analysis 24, 207–222.
Black, R.J., Sharp, L., Urquhart, J.D., 1996. Analysing the spatial distribution of disease using a method of constructing geographical areas of approximately equal population size. In: Alexander, P.E., Boyle, P. (Eds.), Methods for investigating localized clustering of disease. International Agency for Research on Cancer, Lyon, pp. 28–39.
Boots, B.N., Getis, A., 1988. Point pattern analysis. Sage, Newbury Park, CA.
Christaller, W., 1933. Central places in Southern Germany. Translated by C.W. Baskin (1966, in English). Prentice Hall, Englewood Cliffs, NJ.


Ciucu, M., Heas, P., Datcu, M., Tilton, J.C., 2003. Scale space exploration for mining image information content. In: Zaiane, O.R., Simoff, S., Djeraba, C. (Eds.), Mining multimedia and complex data. Springer, Berlin/New York, pp. 118–133.
Cliff, A.D., Ord, J.K., 1981. Spatial processes: models and applications. Pion, London.
Cliff, A., Haggett, P., Ord, J., Bassett, K., Davis, R., 1975. Elements of spatial structure. Cambridge University Press, Cambridge.
Cockings, S., Martin, D., 2005. Zone design for environment and health studies using pre-aggregated data. Social Science & Medicine 60, 2729–2742.
Cressie, N.A.C., 1991. Statistics for spatial data. Wiley, New York.
Cromley, R.G., Hanink, D.M., Bentley, G.C., 2014. Geographically weighted colocation quotients: specification and application. The Professional Geographer 66, 138–148.
Dacey, M.F., 1965. A review of measures of contiguity for two and K-color maps. In: Berry, B.J.L., Marble, D.F. (Eds.), Spatial analysis: a reader in statistical geography. Prentice-Hall, Englewood Cliffs, NJ, pp. 479–495.
Ding, Y., Fotheringham, A.S., 1992. The integration of spatial analysis and GIS. Computers, Environment and Urban Systems 16, 3–19.
Donnelly, K.P., 1978. Simulations to determine the variance and edge effect of total nearest neighbor distance. In: Hodder, I. (Ed.), Simulation methods in archaeology. Cambridge University Press, Cambridge, pp. 91–95.
Duque, J.C., Anselin, L., Rey, S.J., 2012. The Max-P-regions problem. Journal of Regional Science 52, 397–419.
Everitt, B.S., 2002. The Cambridge dictionary of statistics. Cambridge University Press, Cambridge.
Fotheringham, A.S., O'Kelly, M.E., 1989. Spatial interaction models: formulations and applications. Kluwer Academic, Amsterdam.
Fotheringham, A.S., Wong, D.W.S., 1991. The modifiable areal unit problem in multivariate statistical analysis. Environment and Planning A 23, 1025–1044.
Geary, R., 1954. The contiguity ratio and statistical mapping. The Incorporated Statistician 5, 115–145.
Getis, A., Ord, J.K., 1992. The analysis of spatial association by use of distance statistics. Geographical Analysis 24, 189–207.
Grady, S.C., Enander, H., 2009. Geographic analysis of low birthweight and infant mortality in Michigan using automated zoning methodology. International Journal of Health Geographics 8 (10).
Greig-Smith, P., 1952. The use of random and contiguous quadrats in the study of the structure of plant communities. Annals of Botany 16, 312.
Griffith, D.A., Amrhein, C.G., 1991. Statistical analysis for geographers. Prentice Hall, Englewood Cliffs, NJ.
Guo, D., 2008. Regionalization with dynamically constrained agglomerative clustering and partitioning (REDCAP). International Journal of Geographical Information Science 22, 801–823.
Guo, D., Wang, H., 2011. Automatic region building for spatial analysis. Transactions in GIS 15 (s1), 29–45.
Haining, R., Wise, S., Blake, M., 1994. Constructing regions for small area analysis: material deprivation and colorectal cancer. Journal of Public Health Medicine 16, 429–438.
Kuhn, H.W., Kuenne, R.E., 1962. An efficient algorithm for the numerical solution of the generalized Weber problem in spatial economics. Journal of Regional Science 4, 21–33.
Kwan, M.-P., 2012. The uncertain geographic context problem. Annals of the Association of American Geographers 102, 958–968.
Lam, N.S.-N., Liu, K., 1996. Use of space-filling curves in generating a national rural sampling frame for HIV-AIDS research. The Professional Geographer 48, 321–332.
Lee, S., 2001. Developing a bivariate spatial association measure: an integration of Pearson's r and Moran's I. Journal of Geographical Systems 3, 369–385.
Lee, J., Wong, D.W.S., 2001. Spatial analysis with ArcView. Wiley, Hoboken, NJ.
Leslie, T.F., Kronenfeld, B.J., 2011. The colocation quotient: a new measure of spatial association between categorical subsets of points. Geographical Analysis 43, 306–326.
Leung, Y., Zhang, J.S., Xu, Z.B., 2000. Clustering by scale-space filtering. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 1396–1410.
Levine, N., 2002. CrimeStat: a spatial statistics program for the analysis of crime incident locations (v 2.0). Ned Levine & Associates/The National Institute of Justice, Houston, TX/Washington, DC. Available at http://www.nij.gov/topics/technology/maps/pages/crimestat.aspx (last accessed January 19, 2017).
Mandloi, D., 2009. Partitioning tools. Environmental Systems Research Institute, Inc. (ESRI), Redlands, CA.
Mitchell, A., 2005. The ESRI guide to GIS analysis, Volume 2: spatial measurements & statistics. ESRI Press, Redlands, CA.
Mu, L., Wang, F., 2008. A scale-space clustering method: mitigating the effect of scale in the analysis of zone-based data. Annals of the Association of American Geographers 98, 85–101.
Mu, L., Wang, F., 2015. Appendix 9B: a toolkit of the mixed-level regionalization method. In: Wang, F. (Ed.), Quantitative methods and socioeconomic applications in GIS, 2nd edn. CRC Press, Boca Raton, FL, pp. 213–215.
Mu, L., Wang, F., Chen, V.W., Wu, X., 2015. A place-oriented, mixed-level regionalization method for constructing geographic areas in health data dissemination and analysis. Annals of the Association of American Geographers 105, 48–66.
Okabe, A., Yamada, I., 2001. The K-function method on a network and its computational implementation. Geographical Analysis 33 (3), 271–290.
Openshaw, S., 1977. A geographical solution to scale and aggregation problems in region-building, partitioning, and spatial modelling. Transactions of the Institute of British Geographers 2, 459–472.
Openshaw, S., Rao, L., 1995. Algorithms for reengineering 1991 census geography. Environment & Planning A 27 (3), 425–446.
Peano, G., 1890. Sur une courbe, qui remplit toute une aire plane. Mathematische Annalen 36.
Ripley, B.D., 1976. The second-order analysis of stationary point processes. Journal of Applied Probability 13, 255–266.
Ripley, B.D., 1977. Modelling spatial patterns. Journal of the Royal Statistical Society, Series B (Methodological) 39, 172–212.
Taylor, P.J., 1977. Quantitative methods in geography: an introduction to spatial analysis. Waveland Press, Prospect Heights, IL.
Wang, F., 2005. Job access and homicide patterns in Chicago: an analysis at multiple geographic levels based on scale-space theory. Journal of Quantitative Criminology 21, 195–217.
Wang, F., 2015. Quantitative methods and socioeconomic applications in GIS, 2nd edn. CRC Press, Boca Raton, FL.
Wang, F., O'Brien, V., 2005. Constructing geographic areas for analysis of homicide in small populations: testing the herding-culture-of-honor proposition. In: Wang, F. (Ed.), GIS and crime analysis. Idea Group Publishing, Hershey, PA, pp. 83–100.
Wang, F., Robert, L.K., 2015. Constructing geographic areas by REDCAP and MLR for analysis of homicide rates: a case study of New Orleans, Louisiana. Papers in Applied Geography 1, 295–306.
Wang, F., Guo, D., McLafferty, S., 2012. Constructing geographic areas for cancer data analysis: a case study on late-stage breast cancer risk in Illinois. Applied Geography 35, 1–11.
Wang, F., Zhang, L., Zhang, G., Zhang, H., 2014. Mapping and spatial analysis of multiethnic toponyms in Yunnan, China. Cartography and Geographic Information Science 41, 86–99.
Wang, F., Hu, Y., Wang, S., Li, X., 2017. Local indicator of colocation quotient with a statistical significance test: examining spatial association of crime and facilities. Professional Geographer 69, 22–31.
Wong, Y.-F., 1993. Clustering data by melting. Neural Computation 5, 89–104.
Wong, D.W.S., Lee, J., 2005. Statistical analysis and modeling of geographic information. Wiley, Hoboken, NJ.

Further Reading Fotheringham, A.S., Brunsdon, C., Charlton, M., 2002. Geographically weighted regression: the analysis of spatially varying relationships. Wiley, Hoboken, NJ.

1.11 Big Data Analytic Frameworks for GIS (Amazon EC2, Hadoop, Spark)

Chen Xu, University of Wyoming, Laramie, WY, United States © 2018 Elsevier Inc. All rights reserved.

1.11.1 Cloud Computing
1.11.1.1 Cloud Computing Architecture
1.11.1.2 Amazon EC2
1.11.1.3 Amazon EC2 Big Data Analytic Platform Example
1.11.2 Hadoop/Spark Platform
1.11.2.1 Hadoop
1.11.2.2 Spark
1.11.2.3 Hadoop-/Spark-Based GIS
1.11.2.4 Hadoop-/Spark-Based GIS Examples
1.11.2.5 The Future of GIS on Big Data Analytic Frameworks
References

1.11.1 Cloud Computing

Cloud computing effectively addresses the challenge of provisioning the abundant computing resources demanded by computing- or data-intensive geographic information processing tasks. Many urgent problems that humans face today, and the exploration of their solutions, demand elastic provisioning of computing resources in terms of hardware, software, data resources, etc. Cloud computing provides one of the advanced computing infrastructures; it targets processing large amounts of data in a time-efficient way by delivering on-demand computing resources as services. Cloud computing, according to the National Institute of Standards and Technology (NIST, 2011) of the U.S. Department of Commerce, is "a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction." NIST further defined the essential characteristics of cloud computing as on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service:

➢ With on-demand self-service, consumers of cloud computing control the provision and utilization of computing resources.
➢ Broad network access provides the basic support, as computing is now delivered as a service.
➢ Through resource pooling, cloud computing providers dynamically allocate available computing resources in the pool to consumers as they request them.
➢ Rapid elasticity empowers consumers, as computing resources are in a perpetual cycle of provisioning and release, by supporting their computations with seemingly unlimited capability.
➢ With measured service, these capabilities are metered as the amount of services utilized by consumers; the utilization of cloud computing is thus transparent to both service providers and consumers.

Broadly speaking, cloud computing is the sum of hardware virtualization, service-oriented computing, and utility computing (Georgakopoulos and Papazoglou, 2008; Armbrust et al., 2010). It realizes a model for providing ubiquitous network access to computing resources that can be conveniently configured. The process is on demand and allows services to be acquired and released with minimal interaction between service providers and service receivers. Mell and Grance (2009) define three service models (software as a service, platform as a service, and infrastructure as a service) and identify four deployment models (private cloud, community cloud, public cloud, and hybrid cloud).

1.11.1.1 Cloud Computing Architecture

Because different cloud computing providers have designed a variety of platforms, numerous cloud computing architectures have been implemented. Meanwhile, unique demands from different applications also influence the actual architecture of cloud computing platforms. In order to accommodate these variations and create a standard, NIST has specified the vendor-independent conceptual architecture shown in Fig. 1 (Liu et al., 2011). The cloud computing ecosystem illustrated in Fig. 1 comprises several key roles in delivering computing services to cloud consumers. While the cloud provider is the leading role in the cloud computing business, the cloud auditor, cloud broker, and cloud carrier are indispensable supporting roles. Usually, cloud consumers interact with cloud providers. For example, in a task to compute the median center of individual Twitter users' activities, researchers requested computing resources from Amazon Elastic Compute Cloud (EC2), a major cloud computing provider in the field, to support analyzing millions of users simultaneously. Amazon EC2 is one of the major offerings of the Infrastructure as a Service (IaaS) type of cloud computing. Other than IaaS, there are Platform as a Service (PaaS) and Software as a Service (SaaS) cloud computing models.

[Fig. 1 NIST conceptual reference cloud computing model: the cloud consumer interacts with the cloud provider, whose service orchestration stacks a service layer (SaaS, PaaS, IaaS), a resource abstraction and control layer, and a physical resource layer (hardware, facility), supported by cloud service management (business support, provisioning/configuration, portability/interoperability) and by security and privacy functions; the cloud auditor (security audit, privacy impact audit, performance audit), the cloud broker (service intermediation, service aggregation, service arbitrage), and the cloud carrier complete the ecosystem. Source: https://bigdatawg.nist.gov/_uploadfiles/M0008_v1_7256814129.pdf.]

Google Analytics and Microsoft Azure (with Visual Studio) are two examples of PaaS. Many service providers exist in the SaaS market; in the GIS domain, ESRI's ArcGIS Online provides mapping services to GIS users. From SaaS to IaaS, cloud computing users gain increasing flexibility in configuring computing resources and computing processes. By analogy with conventional desktop computer systems, SaaS is the scenario in which a user runs only preinstalled software; PaaS allows a user to develop new tools within a provided development environment; and IaaS hands the user a bare-bones computer. At the same time, access to services differs by deployment type. Public clouds allow open access and use by the general public; community clouds grant access and use to users within an organization or a community of organizations with common interests; private clouds exclude users outside the host organizations from the deployed cloud computing platforms; and when heterogeneous deployments coexist in a single cloud computing platform, it is called a hybrid cloud.

1.11.1.2 Amazon EC2

Amazon is a pioneer in providing public cloud services to the general public. Within the Amazon Web Services (AWS) architecture, six layers of functionality work together to deliver a variety of computing services to users: the physical infrastructure layer, the infrastructure building blocks layer, the platform building blocks layer, the cross-service layer, the tool and API layer, and the application layer.

Cloud computing services are achieved by virtualizing physical computers. Physical computers are maintained in data centers and form the physical infrastructure layer. As services are delivered through networks, geographic distance remains a factor that affects service quality. In order to minimize latency, Amazon maintains data centers in nine geographic regions around the world. For example, AWS GovCloud (United States) ensures that sensitive data and computing stay inside the US region. Availability zones isolate multiple zones within a region from influencing each other, and AWS edge locations bring data and computing closer to cloud users.

The infrastructure building blocks layer provides virtualized computing resources to users, including computing units, data storage, etc. For example, Amazon EC2 is the venue for computing and processing resources; Amazon S3 and Elastic Block Storage (EBS) are the venues for data storage services; and Amazon VPC provides network communication services. The platform building blocks layer provides several PaaS offerings; for example, CloudFront provides content delivery services at 21 edge locations across the world. The cross-service layer comprises cloud computing management services. The tool and API layer enables access to the underlying AWS. Cloud applications in the application layer serve end users with functions built on the functionalities offered by AWS.

Users access cloud services through the AWS Management Console or the Amazon EC2 Amazon Machine Image (AMI) tools. A user can launch an instance from one of Amazon's predefined AMIs and then deploy applications to the functioning virtual machine. Yang and Huang (2013) described in detail the steps to configure and launch Amazon EC2 instances.

GIS is facing the big data challenge. The volume and velocity challenges of big data can be met by relying on cloud computing technologies. Based on the data volume, users can request suitable computing resources. Computing performance is monitored by the system and can trigger autoscaling to maintain the performance level; once a spike in computing demand has passed, resources can be scaled back to minimize waste. Given the characteristics of cloud computing, such as on-demand provisioning and rapid elasticity, geospatial analytics conducted using cloud computing services can be cost-effective compared with implementations on desktop computers.
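For readers who prefer to script this workflow, the sketch below shows how computing capacity might be requested programmatically with boto3, the AWS SDK for Python. It is a minimal illustration that assumes configured AWS credentials; the AMI ID, region, and instance type are placeholders rather than values from the text.

import boto3

# Request up to four virtual machines from EC2; the instance type is sized to the task.
ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # hypothetical AMI ID
    InstanceType="m5.xlarge",
    MinCount=1,                        # launch at least one instance
    MaxCount=4,                        # and up to four, if capacity allows
)
for instance in response["Instances"]:
    print(instance["InstanceId"], instance["State"]["Name"])

Releasing the instances once the demand spike has passed (ec2.terminate_instances) is what keeps this model cost-effective relative to owning hardware.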

1.11.1.3 Amazon EC2 Big Data Analytic Platform Example

Yang and Huang (2013) gave an example of using the Amazon EC2 service to archive a massive volume of climate simulation data and to support web-based data analytics. The climate simulation model generated about 2.5 terabytes of data for a 10-year simulation. The first challenge in the project was to store the data effectively. Whereas conventionally the project would have had to acquire a large amount of physical storage, it chose the Amazon EBS service instead. Besides the on-demand nature of the Amazon service reducing the time and cost of acquiring and configuring storage, EBS also ensured that data could be restored if a crash occurred. The visual analytic functions were provided through a spatial web portal. To maintain the usability of the visual tool with physical computers, the number of servers would have to meet peak computing demand, which would inevitably waste computing resources during normal periods. Deploying the spatial web portal to Amazon EC2 leveraged the platform's load balancing and autoscaling capabilities; portal performance was thus maintained with maximum cost-effectiveness.
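The storage pattern described above can be sketched with boto3 as well: provision an EBS volume on demand and snapshot it so the archive can be restored after a crash. This is a minimal illustration, not the project's actual code; the credentials, availability zone, and sizes are assumed placeholders.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Grow storage on demand instead of purchasing physical disks up front.
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=500,             # GiB; a hypothetical slice of the 2.5 TB archive
    VolumeType="gp3",
)

# Periodic snapshots make the archive restorable if a crash happens.
snapshot = ec2.create_snapshot(
    VolumeId=volume["VolumeId"],
    Description="backup of climate simulation archive",
)
print(volume["VolumeId"], snapshot["SnapshotId"])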

1.11.2 Hadoop/Spark Platform

Cloud computing platforms, whether from commercial suppliers such as Amazon EC2 or from open source communities such as OpenStack, make big spatial data analytics achievable regardless of the size of the data. However, leveraging the core advantages of cloud computing technologies, such as elasticity and load balancing, is not an easy task for general users who lack insight into the underlying technologies. In other words, the effective implementation of cloud computing demands sophisticated configuration of parallelization, fault tolerance, data distribution, load balancing, etc. The Apache Hadoop project offers a platform that takes care of challenges such as parallelization, fault tolerance, data distribution, and load balancing. Built upon Hadoop, Apache Spark provides better computing performance.

1.11.2.1 Hadoop

[Fig. 2 The Apache Hadoop ecosystem: HDFS and MapReduce at the core, with YARN and related projects such as HBase, Hive, Pig, Spark, Storm, Flink, Giraph, Zookeeper, Cassandra, and MongoDB.]

Hadoop has two core components (Fig. 2): MapReduce and the Hadoop Distributed File System (HDFS). MapReduce provides a mechanism to divide a computing task into multiple identical subtasks and then converge the subresults into a single final result for the original task. HDFS is a Java-based file system that automates, for example, file distribution, scalability, and fault tolerance. Another important component of the Hadoop ecosystem is HBase, an open source, distributed, column-oriented database sitting on top of HDFS. HBase provides Bigtable-like capabilities (https://cloud.google.com/bigtable/) and can efficiently access large-scale sparse data in a fault-tolerant manner. Tables in HBase serve as inputs and outputs for MapReduce jobs; HBase is accessed through a Java API.

Hadoop thus provides a system that can effectively and efficiently process unstructured data such as georeferenced social media data. A Hadoop system can be deployed to a cloud computing platform so as to harness potentially limitless computing resources, and Hadoop automates the management of those resources once the system has been deployed. Hadoop is reliable, scalable, and efficient, and it enables distributed processing of large datasets in the MapReduce style. However, the way Hadoop manages computing processes leaves ample room for performance improvement. Hadoop is a disk-based framework: it writes all intermediate data to hard disks. The intermediate data between Map tasks and Reduce tasks can be of huge volume, and the writing to and reading from disk greatly restrict processing speed. Advances in computer engineering have made it possible to equip machines with enough random-access memory to conduct an entire data-processing task within memory; a newer technology leverages this advancement to enable in-memory computing.
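The divide-and-converge mechanism is easiest to see in miniature. The following pure-Python sketch imitates the three MapReduce phases (map, shuffle, reduce) that Hadoop runs in parallel across a cluster, here counting hypothetical georeferenced records per one-degree grid cell; none of this is Hadoop's own API.

import math
from collections import defaultdict

records = [(40.71, -74.00), (40.73, -73.99), (34.05, -118.24)]  # hypothetical (lat, lon) points

def map_phase(record):
    # Map: emit a (grid_cell, 1) pair for each record.
    lat, lon = record
    return ((math.floor(lat), math.floor(lon)), 1)

def shuffle(pairs):
    # Shuffle: group intermediate values by key (Hadoop does this between phases).
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reduce: converge the subresults for one key into a final count.
    return key, sum(values)

intermediate = [map_phase(r) for r in records]        # runs in parallel on a cluster
results = [reduce_phase(k, v) for k, v in shuffle(intermediate).items()]
print(results)  # [((40, -74), 2), ((34, -119), 1)]

On an actual cluster, each map and reduce call would run as a distributed task, and the intermediate pairs would be written to and read from disk, which is exactly the overhead discussed above.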

[Fig. 3 The Apache Spark ecosystem: applications (Java, Scala, Python, ...) use Spark SQL, Spark Streaming, MLlib, and GraphX, which run on Spark Core over YARN and data sources (HDFS, HBase, ...).]

1.11.2.2 Spark

Apache Spark (Fig. 3) is an in-memory computing framework for processing large-scale data. Spark leverages large amounts of memory through a structure called the Resilient Distributed Dataset (RDD). An RDD stores data transparently in memory and can persist it to disk when necessary. Compared with Hadoop, Spark achieves significant performance improvement by eliminating frequent reading from and writing to disk. Another advantage of Spark is that the computing resources assigned to a task are maintained until the task completes, whereas in Hadoop resources are frequently provisioned and released even within the same task. In spatial data analytics where a chain of multiple analyses is required, Spark therefore has an obvious advantage over Hadoop, and for spatial analyses that involve iteration, Spark clearly outperforms Hadoop. A further strength of Spark is that the framework includes support for stream processing, machine learning, and graph processing, which makes it an ideal platform for real-time spatial analytics. In addition, the framework has interfaces for multiple programming languages such as Python and R.
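The caching behavior is visible in a few lines of PySpark. The sketch below, a minimal example assuming a local Spark installation and hypothetical coordinates, caches an RDD of point locations in memory and reuses it across two passes of a chained analysis, the situation in which Spark's advantage over disk-based MapReduce is most pronounced.

from pyspark import SparkContext

sc = SparkContext("local[*]", "rdd-sketch")

# Hypothetical (x, y) event locations, kept in memory after first use.
points = sc.parallelize([(1.0, 2.0), (3.5, 0.5), (2.0, 4.0), (0.5, 1.5)]).cache()

# Pass 1: mean center of the points.
n = points.count()
sx, sy = points.reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]))
center = (sx / n, sy / n)

# Pass 2 reuses the cached RDD with no disk round-trip: farthest point from the center.
farthest = points.max(key=lambda p: (p[0] - center[0]) ** 2 + (p[1] - center[1]) ** 2)
print(center, farthest)
sc.stop()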

1.11.2.3 Hadoop-/Spark-Based GIS

GIS comes from an era of computing resource scarcity, and current GIS inherits many computational design features from that time. GIS algorithms have been designed for optimal internal processing of datasets that can be handled efficiently by a single computer. With complex spatial analytics, whose computing resource demands are disproportionately larger than available computer capacity, single-computer GIS is paralyzed. Big data platforms lift the limits on the computing resources that can be applied to data processing; for example, an entire big dataset can be loaded into RAM (Zaharia et al., 2012).

Broadly speaking, recent spatial technology advancements have made two significant changes in geospatial data production. The first is that high-fidelity remote sensing technologies and a new generation of in-situ sensor web technologies have driven an explosion of data about the Earth's surface. These types of data, which were not available 2–3 years ago, immediately expose the inability of traditional GIS to deal with big spatial data. The second change is the availability of massive amounts and types of real-time spatial data about human society. The analysis of these new types of data can potentially support applications such as smart cities.

From high-performance computing to cloud computing, cyberinfrastructure-based systems divide and conquer geospatial big data challenges. Spatial datasets larger than the capacity of a conventional standalone computer were divided and distributed to a cluster of networked computers running in parallel. However, with the exponential growth of data, this delivering-data-to-code approach becomes increasingly inefficient; the high cost and high latency of data transmission become the new bottleneck. Apache Hadoop and Spark create a new computational paradigm, deploying code to data, as a solution that removes this bottleneck. On the new big data platforms, data locality becomes crucial to achieving higher computational performance; in other words, data storage and organization can significantly influence computing efficiency, and the data storage and organization mechanism is pivotal in the effort to accelerate computing performance.

It is still a major challenge to migrate geospatial analytics to big data platforms such as Hadoop and Spark. The nonlinear characteristics of many geospatial analyses render the conventional approach of scaling up computing resources in proportion to data or computing demands effective only to a degree. In GIS applications, the trend of data accumulation outpacing hardware evolution is likely to continue in the big data era, so solutions that maximize the performance of GIS software on limited hardware resources become paramount. Many GIS software designs have their roots in a time when computers had single cores and tiny amounts of RAM, so GIS operations run in sequential processing mode. For a nontrivial geospatial analysis task that involves a variety of loosely coupled processes, sequential implementation becomes extremely inefficient. As the minimum computing time is determined by the most time-consuming process in an analytics task, allocating computing resources in proportion to the complexity of each process is also relevant when scheduling parallelized geospatial analytics.

1.11.2.4 Hadoop-/Spark-Based GIS Examples

GIS on the Hadoop or Spark platform is a relatively new development area. Many GIS applications over Hadoop and HBase have been designed to provide convenient and efficient query processing. SpatialHadoop (Eldawy and Mokbel, 2014) extends Hadoop and consists of four layers: language, storage, MapReduce, and operations. The language layer supports an SQL-like language to simplify spatial data queries. The storage layer employs a two-level index to organize data globally and locally. The MapReduce layer allows Hadoop programs to exploit the index structure. The operations layer provides a series of spatial operations such as range query, k-nearest-neighbor (KNN) query, and spatial join. GIS applications on Spark are fewer than on Hadoop, and Spark has not been fully investigated for GIS applications. You et al. (2015) conducted a geospatial analysis on Spark, using spatial join query processing as an example and comparing the performance of the Spark and Impala frameworks. Xie et al. (2014) compared the performance of query methods using R-trees and quadtrees on a Spark framework.
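To make the operation types concrete, the sketch below expresses a range (bounding-box) query and a brute-force KNN query as plain PySpark transformations. This is a naive illustration, not SpatialHadoop's API; systems such as SpatialHadoop avoid scanning every record by pruning with the spatial indexes described above.

from pyspark import SparkContext

sc = SparkContext("local[*]", "spatial-queries")
pts = sc.parallelize([("a", 1.0, 1.0), ("b", 2.0, 3.0), ("c", 5.0, 4.0)])  # (id, x, y)

# Range query: keep records inside a bounding box.
xmin, ymin, xmax, ymax = 0.0, 0.0, 3.0, 3.5
in_box = pts.filter(lambda r: xmin <= r[1] <= xmax and ymin <= r[2] <= ymax).collect()

# KNN query: the k records nearest a query point, ranked by squared distance.
qx, qy, k = 2.0, 2.0, 2
knn = pts.takeOrdered(k, key=lambda r: (r[1] - qx) ** 2 + (r[2] - qy) ** 2)

print(in_box)  # [('a', 1.0, 1.0), ('b', 2.0, 3.0)]
print(knn)     # [('b', 2.0, 3.0), ('a', 1.0, 1.0)]
sc.stop()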

1.11.2.5 The Future of GIS on Big Data Analytic Frameworks

Today, more and more of the data collected are geospatial. Traditional sequential computation is increasingly inefficient in the face of this data tsunami, and parallel and distributed computing are gradually becoming the standard framework for studies driven by massive geospatial datasets. This solution is effective partly because cloud computing service providers such as Amazon EC2 make procuring massive amounts of computing resources physically achievable and economically affordable, and partly because open source computing frameworks such as Apache Hadoop and Spark are better at scaling computing tasks. The move of geospatial computing toward big data analytic frameworks is at an early stage, but many computationally intensive tasks can potentially benefit from the new technologies. Geospatial applications driven by massive, noisy geospatial data demand means for dealing with the uncertainties innate to the methodology. Decision-making under uncertainty is less deterministic and more probabilistic; to explore all potential resolutions as comprehensively as possible, multiple analyses have to be conducted simultaneously. Monte Carlo and Bayesian approaches provide the theoretical foundation for this challenge, but practical computational solutions have only recently become reliably feasible. As GIS technologies move forward, new approaches will have to be developed for integrating new data sources into analysis. For example, the Internet of Things and sensor networks will generate huge amounts of data about every facet of daily life. Effective and efficient data assimilation will be achievable only with the support of suitable computing technologies such as the big data analytic frameworks.
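As a small illustration of that pattern, the sketch below runs many Monte Carlo realizations of an uncertain spatial estimate simultaneously, using Python's multiprocessing as a stand-in for a cluster framework; all numbers are hypothetical.

import random
from multiprocessing import Pool

def one_realization(seed):
    # One draw: perturb 1,000 hypothetical x-coordinates by positional
    # uncertainty (standard deviation 0.5) and return the statistic of interest.
    rng = random.Random(seed)
    xs = [10.0 + rng.gauss(0, 0.5) for _ in range(1000)]
    return sum(xs) / len(xs)

if __name__ == "__main__":
    with Pool() as pool:
        estimates = sorted(pool.map(one_realization, range(500)))  # 500 simultaneous draws
    # Empirical 95% interval of the estimate under positional uncertainty.
    print(estimates[12], estimates[487])

On a big data framework, pool.map would be replaced by a distributed map over the cluster, allowing far more realizations over far larger datasets.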

References

Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I., Zaharia, M., 2010. A view of cloud computing. Communications of the ACM 53 (4), 50–58. http://dx.doi.org/10.1145/1721654.1721672.
Eldawy, A., Mokbel, M., 2014. SpatialHadoop: towards flexible and scalable spatial processing using MapReduce. In: Proceedings of the 2014 SIGMOD PhD Symposium. ACM, New York, pp. 46–50.
Georgakopoulos, D., Papazoglou, M.P., 2008. Service-Oriented Computing. The MIT Press. ISBN 9780262072960.
Liu, F., Tong, J., Mao, J., Bohn, R., Messina, J., Badger, L., Leaf, D., 2011. NIST Cloud Computing Reference Architecture. National Institute of Standards and Technology, Special Publication 500-292. https://bigdatawg.nist.gov/_uploadfiles/M0008_v1_7256814129.pdf (accessed 23 April 2017).
Mell, P., Grance, T., 2009. The NIST definition of cloud computing. National Institute of Standards and Technology 53 (6), 50.
NIST, 2011. The NIST definition of cloud computing. http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf (accessed 23 April 2017).
Xie, X., Xiong, Z., Hu, X., et al., 2014. On massive spatial data retrieval based on Spark. In: International Conference on Web-Age Information Management. Springer International Publishing.
Yang, C., Huang, Q., 2013. Spatial Cloud Computing: A Practical Approach. CRC Press.
You, S., Zhang, J., Gruenwald, L., 2015. Large-scale spatial join query processing in Cloud. In: 2015 31st IEEE International Conference on Data Engineering Workshops (ICDEW), Seoul, Korea (South), 13–17 April 2015. IEEE.
Zaharia, M., Chowdhury, M., Das, T., et al., 2012. Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation. USENIX Association.

1.12 Network Analysis

Kevin M Curtin, George Mason University, Fairfax, VA, United States © 2018 Elsevier Inc. All rights reserved.

1.12.1 Introduction
1.12.2 The History of Network Analysis in GIS
1.12.3 The Network as a Spatial Domain
1.12.4 The Science of Network Analysis
1.12.4.1 Graph Theory
1.12.4.2 Topology
1.12.4.3 Linear Referencing
1.12.5 Network Models in GIS
1.12.6 Methods of Network Analysis in Geography
1.12.6.1 Analysis With Linear Referencing
1.12.6.2 Network Descriptions and Measures
1.12.6.3 Routing
1.12.6.4 Network Spatial Statistics
1.12.7 The Future of Network Analysis in Geography
1.12.8 Conclusions
References

1.12.1 Introduction

Network analysis holds a prominent place in geographic information systems (GIS). This reflects the earliest GIS data models that were implemented, network analysis's broad usefulness across domain areas, and its potential for future development. The fact that network analysis was a significant component of scientific geography during the quantitative revolution in geography in the 1950s led to its rapid introduction into GIS as analytical tools were added to the growing suite of functionality in GIS software. Given the extraordinarily broad set of application areas that can take advantage of network structure, the use of networks in GIS has grown, and networks are likely to remain a dominant spatial platform for analysis in the foreseeable future. While the networks analyzed in geography can be of great variety (road networks, river networks, utility networks, and more), it is the similarity in structure, rather than the diversity in application, that truly provides the value for research and practice. Networks represent a fundamental spatial domain on which many phenomena can be located and over which many activities move.

In order to comprehensively review the subject of network analysis in GIS, the history of network analysis in GIS will be explored first. The scientific underpinnings of network analysis as it is implemented in GIS will then be discussed, including graph theory, topology, and the means of spatially referencing to networks. This will be followed by a review of how network models are instantiated in GIS, and the broad set of analytical methods associated with network analysis will be outlined. In conclusion, an examination of the potential for network analysis moving forward will be presented.

1.12.2 The History of Network Analysis in GIS

The use of network representations in geographic practice extends back well before the advent of modern GIS, and even beyond the earliest noncomputerized instantiations of GIS (mylar overlays for map algebraic analysis). Descriptions of the earliest practice of network analysis in geography can be found in other sources (Curtin, 2017); suffice it to say that network elements have long been crucial in cartography, navigation, and transport. As the quantitative science of geography developed, network analytic components were often included, among them the theoretical models of Von Thunen and Christaller and the work of the sequence of urban sociologists from the University of Chicago.

As the Quantitative Revolution in Geography (1950s and 1960s) progressed, network analysis became a focus of research in and of itself. Hierarchical networks of places and networks of adjacent places were posited as the space over which people, products, or ideas move in diffusion modeling. Networks were the foundational space over which interactions took place when developing interaction (or gravity) models. The nature of the network structures themselves, and how they evolve in different geographic and economic contexts, became the fundamental focus of some human geographic research (Haggett and Chorley, 1970).

Given the tight integration of network analysis with quantitative geography over such a long period of time, it is not surprising that network analytic techniques would evolve with the growth of GIS. That said, as is true of many subdisciplines within geography, network analysis has been truly revolutionized in the context of GIS. The tight coupling of GIS and network analysis is a consequence of the prevalence of interest in the subject combined with the necessity of using linear features as elements in the earliest GIS data models.


The earliest computer-based GIS focused on the automation of cartographic tasks, including the depiction of networks (e.g., road, rail, and river networks on maps). Dot matrix printers would place successive symbols of varying types in sequence to represent these features; as plotting hardware improved, continuous lines could be drawn with pen plotters. A watershed moment occurred in the evolution of GIS generally, and of network analysis in GIS more specifically, when the US Bureau of the Census began developing its GIS data model to support the field cartographic needs of census takers and to digitally define the boundaries of census polygons (tracts, blocks, etc.). The most common reference features used by field employees to navigate to unreported dwellings were network features such as roads and railroads. For that reason, a program to digitize a comprehensive set of network features (first in urban areas, but eventually throughout the country) was undertaken by the federal government in support of the decennial census. Although the intended purpose of these data was not to perform network analysis in GIS as it is understood today, the outcome of the program was the availability of a massive set of digital data, in a format ready to import into GIS, that could be used for network analysis at the users' discretion. At a time when very little digital spatial data existed, this was an extraordinarily fortuitous happenstance for those interested in network analysis in GIS.

In relatively short order, some of the analytical functions that had been developed outside GIS began to be integrated as functions within GIS software. Although a comprehensive discussion of methods for network analysis in GIS appears below, broadly speaking, the evolution of network analysis in GIS began with straightforward routing functions, expanded to network facility location models, grew to address highly combinatorially complex routing problems (through heuristic methods), and led even to entire GIS software packages designed for network analysis, often in the context of transportation applications. All of this development, however, required that the network be well defined and captured digitally in the GIS in such a way that procedures can act upon it. The following sections discuss the nature of the network space in GIS and the models for representing that space digitally.

1.12.3 The Network as a Spatial Domain

A key to understanding the extent to which network analysis is integrated in GIS is realizing that network functions are not merely a subset of tools that have been developed in the context of GIS. Rather, networks are one of several fundamental spatial domains on which human and physical geographic processes occur. There are relatively few fundamental spatial domains, although each can be applied in many different contexts. Consider that the spatial domain with which all geographers are familiar is the globe itself. Most commonly, the domain is an ellipsoid that approximates the shape of the Earth or some portion of the Earth's surface. Locations on that domain are, almost universally, specified with a geographic coordinate system with coordinates given by latitude, longitude, and elevation. Of importance here is that any location on that ellipsoid can be specified and can be the location of some element of a geographic process. For example, a hurricane can exist at any location, and its track can move across the spatial domain in any direction. Of course, as all trained geographers know, it is common to transfer the locations on the ellipsoidal spatial domain to a planar spatial domain. The process of projecting to a plane permits the use of a planar coordinate system and the geometric operations developed for that domain. Most importantly for this discussion, though, the plane still represents a domain where locations can be specified anywhere: the plane is a continuous spatial surface, and geographic processes can operate at any place on that surface. In contrast, a third fundamental spatial domain consists of sets of points. There are many applications where the only appropriate locations are one or more of a distinct set of points. For example, many of the location–allocation models that have been developed require a set of points that are potential facility locations; no location outside that set may be chosen to site a facility.

The network is another of these fundamental spatial domains, or models of space, and it has proven to be an accurate and extraordinarily useful representation for a wide range of geographic processes. The network is, with very few exceptions, described as a connected set of edges and vertices. Very broadly speaking, edges represent connections between places, and vertices are the places they connect. Using roads as an example, edges can serve as bidirectional roads (single centerline representation), as directions of traffic flow (dual carriageway representations), or even as lanes of traffic. Vertices can represent the intersections of roads, the places where turns or stops are permitted, or even the cities that a road enters, depending on the scale of the geographic phenomena under study. Perhaps most importantly, this set of edges and vertices represents the entirety of the locations that are valid for a particular process; consider, for example, traffic accidents. For all intents and purposes, traffic accidents occur only on the road network, and for analytical purposes, specifying a location for a traffic accident that is not on the road network is nonsensical. More importantly, it has been shown that treating network phenomena analytically as if they existed on a continuous surface can lead to misleading results (Yamada and Thill, 2004).
Despite the dominance of the linear edge/point vertex network model, there are network analytic techniques in GIS designed to work on raster datasets rather than explicit sets of edges and vertices. In these cases, however, the raster structure is implicitly converted to a regular network in which the centers of the raster cells become the vertices, and edges are constructed between neighboring cells to support movement or interaction between them. If the cell values contain some measure of terrain impedance (e.g., difficulty in traversing the cell or change in elevation), then each edge can be assigned the average of the values in the cells it connects. These implicit networks are functionally the same as explicit networks: the same network analytic functions can be performed on any network of edges and vertices, whether or not those elements are derived from raster cells.
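A minimal sketch of that conversion, assuming a small hypothetical impedance grid, follows; cell centers become vertices, and 4-neighbor edges carry the average impedance of the two cells they connect.

impedance = [
    [1.0, 2.0, 9.0],
    [1.5, 2.5, 3.0],
]  # hypothetical per-cell traversal costs
rows, cols = len(impedance), len(impedance[0])

edges = {}  # (cell_a, cell_b) -> edge weight
for r in range(rows):
    for c in range(cols):
        for dr, dc in ((0, 1), (1, 0)):  # right and down neighbors only (no duplicates)
            nr, nc = r + dr, c + dc
            if nr < rows and nc < cols:
                # Edge weight: average impedance of the two cells it connects.
                edges[((r, c), (nr, nc))] = (impedance[r][c] + impedance[nr][nc]) / 2.0

for (a, b), w in edges.items():
    print(a, "<->", b, "weight", w)

Any network analytic function that accepts edges and vertices can then be run on this implicit network exactly as on an explicit one.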

1.12.4 The Science of Network Analysis

For virtually all sophisticated GIS functionality, there is some basis in the science of geography or in some related or supporting discipline. For network analysis in particular, several subdisciplines of mathematics provide a theoretical foundation that complements the growing body of pure geographic theory. Those subdisciplines, graph theory and topology, are used to construct network representations, to understand relationships among network elements, to prove properties of networks and spaces, and to implement network analytic functions. They are complemented by a geographic process, linear referencing, designed to support the location of events in the well-defined network spatial domain. Each of these is discussed in turn.

1.12.4.1 Graph Theory

In mathematical terms, a graph is synonymous with a network: a graph is a connected set of edges and vertices. Studies in graph theory examine such things as how to construct or draw graphs (useful for cartographic representations), how to describe and store graphs digitally (critical for the data models discussed below), how to color graphs (again critical for mapping applications), and how to route across graphs, a broadly useful body of knowledge. What graph theory generally does not consider are the specific spatial structures of real-world features. That is, a road, in graph theory, is simply an edge that connects two vertices; its precise length, the curves it takes, and the changes in elevation are of no concern. It is the connectivity that provides the raw data for graph theoretical investigation.

1.12.4.2 Topology

Topology is a related subdiscipline of mathematics that is concerned with the nature of spaces; for the discussion here, the space of interest is network space. Although the mathematics of topology can consider spaces of great complexity and high dimensionality, geographic cases are frequently concerned with only two- or three-dimensional space. The topological properties of graphs are those that are not altered by elastic deformations (such as stretching or twisting). Therefore, properties such as connectivity and adjacency are topological properties of networks, and they remain constant even if the network is deformed by some process in a GISystem (such as projecting or rubber sheeting). The permanence of these properties allows them to serve as a basis for describing, measuring, and analyzing networks (Curtin, 2009).

The most basic topological properties lead to descriptive measures of networks, essentially allowing the most basic comparative analysis to occur. Such properties of networks as the numbers of edges and vertices, or the degree of the vertices (the number of edges connecting to a vertex), can be used to compare networks to one another based on measures of network size and connectivity. A complementary way to describe networks is to categorize them into idealized types based on the topological relationships of the features or the regular pattern of features in the network. One of the most common examples is the Manhattan network type, which is characterized by edges crossing at right angles to create rectangular "blocks." Other idealized types include hub-and-spoke networks and tree networks. A special case of the tree network is the "bus" network type, where a single main line has branches extending from either side; bus networks are frequently used in utility applications. Other types often used in more theoretical network analysis work include line and ring networks and complete networks (where every vertex connects to every other vertex with a unique edge). Additional measures of connectivity based on topological properties are discussed in the section on methods for network analysis below.

1.12.4.3 Linear Referencing

As with all spatial domains, a mechanism must be created for specifying locations across the space in order to support analytical operations. Traditional geographic space employs the latitude and longitude coordinate system, and projected continuous spaces have a multitude of planar coordinate systems that can be chosen to suit the purpose of the spatial analysis to be undertaken (UTM, State Plane, and many others). Each of these reference systems has a well-defined origin and a means of specifying offsets from that origin, whether those offsets are angular measures or distance measures in any number of units.

A comparable reference system can be defined for a network. Such a system is termed a linear referencing system (LRS), and using such a system to specify location in network space is simply termed linear referencing. In an LRS, rather than using a single origin from which offsets are measured, many origin points can be defined on the network, and a location is the distance along some network feature (or set of features combined into a route). The mile markers along US highways (Federal Highway Administration, 2001; Federal Transit Administration, 2003) are evidence of an LRS familiar to many travelers. The origins of the routes are (generally) the southern and western limits of the state through which the interstate runs, increasing to the north and east, respectively. A mile marker indicates the number of miles in true travel distance from the origin of the route to that marker. It is often far more useful to specify a location by referencing a mile marker (or similar linear reference) than by specifying a UTM coordinate, for example. Consider that if a traffic accident or other emergency occurs on the highway, the mile marker location indicates the true distance between emergency response crews and the emergency location (easily calculated by subtracting mile marker values) and indicates the direction the response crew should travel to reach it.

Although transportation applications are used in this discussion, the method of linear referencing can be applied to any network phenomena. In fact, it is commonly used in hydrologic applications (e.g., to locate flow sensors in rivers) and in utility applications where gauges, junctions, or other significant equipment must be located along the distribution network. It has been applied for mapping accidents, traffic stops, or other incident locations, displaying traffic counts along streets, maintaining the location of fleet vehicles, and performing asset management functions such as recording pavement conditions or the locations of street signs, bridges, exits, and many other traffic-related objects (Federal Highway Administration, 2001).

Although there are a number of theoretical linear referencing data models, many of which were developed under the auspices of the National Cooperative Highway Research Program project 20–27 (Vonderohe et al., 1997; Vonderohe et al., 1998; Koncz and Adams, 2002), a more practice-minded GIS user can follow a process for implementing linear referencing (Curtin et al., 2007); see Fig. 1. Such a process includes the identification of an application to which linear referencing is pertinent, the concomitant network representation to be employed, and the topological rules that must be followed. Given that different applications can require networks with fundamentally different structures (e.g., road networks are very different from river networks), a clear definition of the network structure as it applies to the application is critical from the start of the linear referencing process. The second step is the determination of the route structure, or the underlying datum, to which events will eventually be linearly referenced. The choice of routes is an often overlooked, yet critical, element in any linear referencing process: the routes are the fundamental spatial features along which locations are determined.

Identify Application Areas

Choose Network Representations

Determine Route Structure

UberFeatures For Linear Events

Determine Topology Rules

Linear Referencing Process

Analyze: Intersection of Events Accidents on Road Types Breaks on Pipes Buffer? Query Historical Analysis

Create Events Data Collection

Display Event Data Cartographic Output

Representation For Point Events

Determine Measures Feature Length Based on GIS Layer Datum Direction

Maintain Create Alternative Representations Share Data Transform Concatenate

Fig. 1 An iterative 7-step linear referencing process. Adapted from Curtin KM, Arifin RR, and Nicoara G (2007) A Comprehensive Process for Linear Referencing, URISA Journal 19(2): 41–50.

Network Analysis

157

Routes do not frequently correspond to individual features in the underlying spatial database; rather, they are collections of connected spatial features that, in combination, comprise a more global network entity. Expanding on the interstate highway example from above, a typical route might be built from all the segments that make up Interstate 95 in the state of Virginia.

The third step of this process concerns the definition of the measures along those routes. The measures can be derived from the cartographic representations in the GIS, from higher-quality measurements taken in the field, or can represent the percentage of the distance along the route. In fact, any measurement system (impedance, cost, distance, risk) can be used to associate measures with locations on the network. This flexibility is one of the most powerful elements of linear referencing, broadening the range of possible applications and analytical techniques far beyond those available with traditional referencing systems.

The fourth step in linear referencing is to determine how events will be defined, captured, and maintained on the network. Event data are point or linear features that occur along the network: the mile markers, the exits, the construction zones, the traffic accidents, or any other phenomenon that happens at some point along the network. The fifth step is concerned with the cartographic representation of linearly referenced events, and the sixth step outlines ways of analyzing those events once they have been referenced; these analytical techniques are described in more detail below. The final step is to update and maintain the measures, the routes themselves, and the linearly referenced events, to ensure that applications use the best possible information in their analyses and that the decisions taken during the process will hold for future analyses.

Although the theory and practice of linear referencing has long been known to be an essential functionality in GIS software (Nyerges, 1990), and while many GIS software packages contain applications and tools that allow users to implement linear referencing (Goodman, 2001), these tools are not used as frequently as one might expect given the prevalence of network datasets in GIS practice.
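The core computation of linear referencing, recovering planar coordinates from a route and a measure, is compact enough to sketch. The example below walks a hypothetical polyline route and interpolates the point at a given measure, much as a GIS would when placing an event along a route.

import math

route = [(0.0, 0.0), (3.0, 4.0), (9.0, 4.0)]  # hypothetical polyline vertices

def locate_event(route, measure):
    # Walk the route's segments and interpolate the point `measure` units along it.
    remaining = measure
    for (x1, y1), (x2, y2) in zip(route, route[1:]):
        seg_len = math.hypot(x2 - x1, y2 - y1)
        if remaining <= seg_len:
            t = remaining / seg_len
            return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
        remaining -= seg_len
    return route[-1]  # measure beyond the route's end: clamp to the terminus

print(locate_event(route, 7.0))  # 2 units along the second segment -> (5.0, 4.0)

The same walk run in reverse (point to measure) supports the event-table queries described above, and the measure values could just as well be cost or impedance rather than distance.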

1.12.5 Network Models in GIS

Network modeling encompasses a wide range of procedures, techniques, and methods for the collection and storage of phenomena that can be modeled as connected sets of edges and vertices. There are a variety of network models in GIS, differentiated primarily by the topological relationships they maintain. Network models can act as the basis for location through the process of linear referencing. Although the graph theoretic definition of a network, a connected set of edges and vertices, applies to network models as they exist in GISystems, the details of implementing networks have changed over time.

The earliest computer systems for automated cartography employed the "spaghetti" data model for geographic objects, including the elements of networks. This data model did not preserve topological properties among the parts of the network but simply recorded each feature individually with an identifier and a list of coordinates defining its shape. Although this data model is simple to understand and efficient for display, it is essentially useless for network analysis. The spaghetti data model persisted in Computer Aided Design software for several decades but was abandoned (at least for a time) by most GISystems in favor of topologically integrated network models (Curtin, 2009).

Although the idea of using topological information to store features developed independently in several places, it was the acceptance of a topological data model by the US Census Bureau that indirectly led the way for significant advances in network analysis. Although generating maps for field work and for locating addresses to which questionnaires could be mailed was important, the Census Bureau was primarily interested in a means of generating well-defined polygons to represent the Census Tracts, Census Blocks, Counties, and other enumeration areas with which even casual users of GIS are now familiar. To meet this need, researchers at the Census Bureau developed a topological data model named the Dual Incidence Matrix Encoding (DIME) model (Cooke, 1998). "Dual incidence" refers to the storage of topological information both among nodes (which nodes are incident, creating lines) and among lines (which lines are incident, creating polygons). The DIME databases evolved into the Topologically Integrated Geographic Encoding and Referencing (TIGER) files, and this data model for a time became the standard for vector representations in GIS; a similar data model was employed for the digital line graph series of products from the United States Geological Survey.

The introduction of topological incidence into network GISystem data structures had a profound influence on the ability to conduct network analysis, since the graph theoretic methods developed over decades could now be employed. The vector data model has since dominated the application of Geographic Information Science, and variants of this model are still in wide use today (Curtin, 2007). However, because this data model was designed to support a set of hierarchical polygons, it constrained network analysis techniques. More specifically, in order to maintain complete polygonal coverage, the data model had to enforce planarity. Planar networks are those that have no edges intersecting without a vertex at that location. In contrast, in real-world transportation networks it is common to have edges that cross in planar space without any connection between them. These include overpass–underpass crossings, bridge crossings, and even at-grade railroad and road crossings; no connectivity exists between the edges at the coordinates of the crossing. This limitation has necessitated the development of network-specific data models in GIS. These models variously allow for nonplanar intersections, for modeling flow across the network, and for identifying specialized features such as junctions or turn elements that restrict movement across the network. It has also been found that networks can be stored more efficiently in structures that differ from standard relational database structures; over time, therefore, network data models have come to rely on structures such as the forward star data structure (Evans and Minieka, 1992).


There are several consequences of planar enforcement for network analysis. First, geographic features (such as roads) are divided into separate database objects (records) at every intersection. That is, a road that is perceived and used as a single object in the transportation network must be represented as a series of records in the geographic database. This repetition increases the database size many times over and exposes the database to error when multiple features are assigned attribute values. Second, the planar network data model does not easily support the creation of bridge, overpass, or tunnel features: places where network features cross without an intersection being present. These limitations have necessitated the development of nonplanar data models designed specifically for network routing applications (Fohl et al., 1996). As network models have become further divorced from the TIGER data model over time, modelers have employed efficient storage structures such as the forward star representation (Evans and Minieka, 1992); a minimal sketch appears below. The process of linear referencing described above can be used to remove the requirement of a highly segmented linear network: if entire network elements are treated as routes, regardless of the number of database features of which they are composed, functionality can be designed to operate on the network in a way that is more faithful to actual network behavior.
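The following sketch of the forward star structure, for a hypothetical four-vertex directed network, shows why it is efficient: all edges are sorted by origin vertex, and a pointer array records where each vertex's outgoing edges begin, so a vertex's neighbors occupy one contiguous slice.

edges = [(0, 1, 2.0), (0, 2, 5.0), (1, 2, 1.0), (2, 3, 4.0)]  # (from, to, cost)
num_vertices = 4

edges.sort(key=lambda e: e[0])          # group edges by origin vertex
to_vertex = [e[1] for e in edges]
cost = [e[2] for e in edges]

# first_out[v] indexes v's first outgoing edge; first_out[v + 1] ends the slice.
first_out = [0] * (num_vertices + 1)
for origin, _, _ in edges:
    first_out[origin + 1] += 1
for v in range(num_vertices):
    first_out[v + 1] += first_out[v]    # prefix sums turn counts into offsets

def out_edges(v):
    # Yield (neighbor, cost) pairs for vertex v in O(out-degree) time.
    for i in range(first_out[v], first_out[v + 1]):
        yield to_vertex[i], cost[i]

print(list(out_edges(0)))  # [(1, 2.0), (2, 5.0)]

Shortest-path routines spend most of their time scanning out-edges, which is why this layout outperforms a general relational table for network analysis.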

1.12.6 Methods of Network Analysis in Geography

With the full integration of network models into GIS, network analysis in geography has advanced rapidly. These advances have taken the form of networks as the foundation for location referencing systems, networks as the basis for advanced location modeling, and networks as the spatial domain for new spatial statistical techniques.

1.12.6.1 Analysis With Linear Referencing

The ultimate goal of implementing linear referencing (or any other process) in a GIS is to increase the ability to perform a diverse set of analyses. With routes and event data in hand, analysis can be performed on the event data through overlays, intersections, and other techniques that are part of the linear referencing capabilities of most GIS software. Additionally, linear referencing allows an entirely new set of database queries, distinct from queries based on the underlying network; for example, the storage of data as event tables enables historical queries if events are date-stamped. However, while significant analytic capability is added through the linear referencing process, other traditional GIS analytic capabilities are lost. One example is the loss of traditional road network functions, such as shortest path determination or routing, due to the loss of nonplanar topology that results when the segments comprising the routes must be merged (Curtin et al., 2007).

1.12.6.2 Network Descriptions and Measures

In network modeling, as in many scientific endeavors, the first concern is to define and describe the basic elements to be analyzed. Simple descriptions of graphs include the number of edges and vertices; the maximum, minimum, or total length of the edges; or the diameter of the graph (the length of the longest minimal cost path between vertices). As a measure of the complexity of a graph, the number of fundamental cycles (cycles that do not contain other cycles) can be computed from the number of edges, vertices, and subgraphs (disconnected parts of the graph). Measures of centrality within a graph include the maximum, minimum, or average degree of the vertices; the closeness of nodes to other nodes as measured by the distances between them; the betweenness of a node, which is a measure of the number of paths between nodes that pass through it; and eigenvector centrality, which measures centrality via a neighborhood mechanism.

A series of more complex indices have been developed to specify the level of connectedness of network elements (Kansky, 1963). These indices are more complex in that they compare two or more measures of networks, and they can loosely be grouped into measures of pure graph properties and measures of applied networks. Since topological properties are of primary concern, most of the indices of pure graph properties are measures of connectivity. Examples include:

➢ The Alpha Index: a measure of connectivity that compares the number of fundamental cycles in a graph to the maximum possible number of fundamental cycles. Since more cycles indicate greater connectivity in the graph, the larger the Alpha Index, the more connected the graph. The possible values of the Alpha Index range from 0 for graphs with no cycles (such as trees) to 1 for completely connected networks.
➢ The Beta Index: a measure of connectivity that compares the number of edges to the number of vertices in a graph. When comparing two graphs with the same number of vertices, if there are more edges in one of the graphs then that graph must be more thoroughly connected.
➢ The Gamma Index: a measure of connectivity that compares the number of edges in a graph to the maximum possible number of edges in a graph. Like the Alpha Index, the Gamma Index may have a value between 0 (a graph with no edges) and 1 (a completely connected graph).

The measures just described can be used on any graph simply by counting the number of edges, vertices, and subgraphs. With applied networks comes additional attribute information, such as the length of the edges or the area in which the graph exists. This information allows for another set of indices to be defined:


➢ The Eta Index: the average length per link. The Eta Index provides a comparative measure of the distance between vertices or the cost of traveling between nodes.
➢ The Pi Index: the relationship between the diameter of the graph and the total length of the edges in the graph. This index measures the shape of the graph, where a low value indicates a graph dominated by its diameter, with relatively little development off of this axis. A high value of the Pi Index indicates a greater level of development (more and longer edges) away from the diameter.
➢ The Detour Index: a comparison of the Euclidean distance between two points on the network and the actual travel distance between them. The closer the actual distance is to the Euclidean distance, the more efficiently the network overcomes distance.
➢ Network Density: a measure comparing the length of network edges to the areal extent of the network. The greater the length of edges per unit area, the more developed the network.

These measures and indices begin to describe the properties of networks and provide a means for making basic comparisons among different networks or among different versions of the same network over time. Although this is not the appropriate venue to review them, there are many more advanced graph theoretic techniques for describing networks, for proving their properties, and for determining paths across them (Wilson, 1996).
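Because these indices reduce to arithmetic on counts, lengths, and areas, they are straightforward to compute. The sketch below assumes the standard planar-graph forms of the Kansky indices (valid for graphs with at least three vertices); the function names are ours:

# Kansky-style connectivity and applied-network indices (a sketch assuming
# the standard planar-graph formulas, valid for graphs with v >= 3 vertices).

def alpha(e, v, p=1):
    """Fundamental cycles (e - v + p) over the planar maximum (2v - 5)."""
    return (e - v + p) / (2 * v - 5)

def beta(e, v):
    """Edges per vertex."""
    return e / v

def gamma(e, v):
    """Edges over the planar maximum number of edges, 3(v - 2)."""
    return e / (3 * (v - 2))

def eta(total_length, e):
    """Average length per link."""
    return total_length / e

def pi_index(total_length, diameter):
    """Total edge length relative to the graph's diameter."""
    return total_length / diameter

def detour_index(euclidean_dist, network_dist):
    """Closer to 1 means the network overcomes distance more efficiently."""
    return euclidean_dist / network_dist

def network_density(total_length, area):
    """Edge length per unit area of the network's extent."""
    return total_length / area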

1.12.6.3 Routing

Routing is the act of selecting a course of travel. This process is undertaken by nearly every active person every day. The route from home to school or work is chosen by commuters. The selection of stops one will make for shopping and other commercial activities, and the paths between those stops, is a routing activity. Package delivery services plan routes for their trucks in such a way that packages are delivered within specified time windows. School buses are assigned to routes that will pick up and deliver children in an efficient manner. Less tangible objects such as telephone calls or data packets are routed across information networks. Routing is the most fundamental logistical operation for virtually all transportation and communications applications.

Generally, a routing procedure is based on an objective, or goal, for the route and a set of constraints regarding the route's properties. By far the most common objective for routing problems is to minimize cost. Cost can be measured in many different ways but is frequently defined as some function of distance, time, or difficulty in traversing the network. Thus, the problem of locating the least cost path (or shortest path) between two points across a network is the most common routing problem. It is also a problem for which there are several extremely efficient algorithms that can determine the optimal solution. The most widely cited algorithm that solves the least cost path problem on directed graphs with nonnegative weights was developed by Dijkstra (1959), and an even more efficient version of this algorithm, the two-tree algorithm, exists (Dantzig, 1960).

The shortest path problem is just one of a class of related routing problems that can be described as network design problems. Network design problems require that some combination of the elements of a network (edges and vertices) be chosen in order to provide a route (or routes) through the network. This group includes the minimal spanning tree problem, the Steiner tree problem, the traveling salesman problem, and the vehicle routing problem, among many others (Magnanti and Wong, 1984). With increasing flexibility in network models, GIS are beginning to move beyond solutions to simple routing problems in order to tackle the more difficult problems of network design (Ahuja et al., 1993). Currently there are implementations of heuristics for problems such as the traveling salesman problem, the maximal covering location problem, and the P-median problem. These combinatorially complex network location problems provide a challenge to GIScientists and will require the integration of both GIS and optimization techniques (Curtin et al., 2005).

In the case of network flow problems (such as the flow along a river, through a pipeline, or along a highway), the network must be able to support the concepts of capacity and flow direction. Capacity is generally implemented as an attribute value associated with features. The concept of flow direction is more complex in that, although a direction for an edge in the network can be assigned with an attribute value, flow direction is more frequently a function of the location of, and topological connection to, sources and destinations. Although flow direction can be determined automatically for some network structures (trees are particularly suited for this), other networks (in particular road networks) will have indeterminate flow unless there are large numbers of sources and sinks.
When flow direction is maintained, GIS can be programmed to solve problems such as tracing upstream or downstream, or to determine the maximal flow through the network.
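For reference, a compact version of Dijkstra's (1959) algorithm is sketched below over a plain adjacency dictionary (a forward star structure would serve equally well); it returns least-cost distances from a source to all reachable nodes, assuming nonnegative edge costs:

# Dijkstra's (1959) label-setting shortest path algorithm (a compact sketch).
import heapq

def dijkstra(graph, source):
    """graph: {node: [(neighbor, nonnegative_cost), ...]}.
    Returns least-cost distances from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                           # stale entry; u already settled
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist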

1.12.6.4 Network Spatial Statistics

In what may be one of the most dramatic developments in network analysis in GIS, a distinct subdiscipline within spatial statistics has developed over the past two decades: network spatial statistics. This branch of spatial statistics permits the justifiable analysis of network-based events (Okabe et al., 2006; Okabe et al., 1995). Examples of such phenomena include traffic accidents along roads, pipe breaks along water mains, or pollution samples along river networks. These statistics recognize that some phenomena can only occur on networks; therefore, the spatial components of the statistics must correctly model network space. This family of statistics includes methods for measuring network density, determining whether significant clusters of activities occur on networks, and spatial interpolation across networks, among others. This is a vibrant field within GIS and is likely to produce additional methods well into the foreseeable future.

1.12.7 The Future of Network Analysis in Geography

There is every reason to believe that future advances in network analysis will continue to reinforce the importance of this subdiscipline within the science of geography. One can see the increasing interest in network analysis across a broad range of application areas by examining the current literature. For example, a rapidly growing interest in social network analysis has led to contributions including additional network measures of closeness (average distance between nodes), betweenness (number of paths that go through nodes), and centrality (position within network neighborhoods), all of which are fundamentally geographic concepts. Increased interest is being seen in the development of techniques for network-based spatiotemporal analysis, with the recognition that patterns that suggest an underlying process in both time and space are different from those that appear in one dimension alone (Eckley and Curtin, 2013).

It is likely that past research on static measures of networks and deterministic analysis on networks will be complemented by more research into dynamic movement through and across networks. Although network flow models have existed in the literature for decades, additional means of describing, visualizing, and measuring flow may come from work regarding the simulation of movement across networks. Advances in tracking objects as they move through space will likely support this area of research. There are also avenues for innovation in network analysis related to behavioral geography and the movement of teams across networks where they are seeking some group goal. As these problems become more computationally complex, advances in computing power (e.g., massively parallel computing, cloud computing) may allow researchers to pursue problem instances that were not previously tractable.

Although there has been a persistent presence of networks and network analysis in GIS throughout its development, an examination of network analysis more generally demonstrates that only a small fraction of the network analytic techniques that have been developed have, as of this writing, been integrated into off-the-shelf GIS software. That is, there are many network analytic procedures that could be integrated into GIS relatively easily, but for one reason or another, they have not. These include methods such as those addressing minimum spanning trees, k-shortest paths, network transportation (supply/demand) problems, the maximal flow algorithm, and others.

1.12.8 Conclusions

In summary, network analysis in GIS is a major research area, with a strong theoretical history, a solid scientific foundation, and a wide range of methodological approaches. It is an area with broad interest across subdisciplines within geography and beyond. Perhaps most importantly, it is a vibrant and dynamic research area with both growth and refinement of existing methods and the development of new methods over time. It is likely to remain a significant component of the larger area of geographic analysis for the foreseeable future.

References

Ahuja, R.K., Magnanti, T.L., Orlin, J.B., 1993. Network flows: Theory, algorithms, and applications, 1. Prentice Hall, Upper Saddle River.
Cooke, D.F., 1998. Topology and TIGER: The census bureau's contribution. In: Foresman, T.W. (Ed.), The history of geographic information systems. Prentice Hall, Upper Saddle River, NJ.
Curtin, K.M., 2007. Network analysis in geographic information science: Review, assessment, and projections. Cartography and Geographic Information Science 34 (4), 103–111.
Curtin, K.M., 2009. Network modeling. In: Karimi, H. (Ed.), Handbook of research on geoinformatics. IGI Global, Hershey.
Curtin, K.M., 2017. Network analysis. In: Richardson, D., Castree, N., Goodchild, M.E., Kobayashi, A., Liu, W., Marston, R.A. (Eds.), The international encyclopedia of geography. Wiley, Chichester, p. 8.
Curtin, K.M., Nicoara, G., Arifin, R.R., 2007. A comprehensive process for linear referencing. URISA Journal 19 (2), 41–50.
Curtin, K.M., Qiu, F., Hayslett-McCall, K., Bray, T.M., 2005. Integrating GIS and maximal covering models to determine optimal police patrol areas. In: Geographic information systems and crime analysis. IDEA Group Publishing, Hershey, pp. 214–235.
Dantzig, G.B., 1960. On the shortest route through a network. Management Science 6, 187–190.
Dijkstra, E.W., 1959. A note on two problems in connexion with graphs. Numerische Mathematik 1, 269–271.
Eckley, D., Curtin, K., 2013. Evaluating the spatiotemporal clustering of traffic incidents. Computers, Environment and Urban Systems 37 (1), 70–81. https://doi.org/10.1016/j.compenvurbsys.2012.06.004.
Evans, J., Minieka, E., 1992. Optimization algorithms for networks and graphs, 2. Marcel Dekker, New York.
Federal Highway Administration, 2001. Implementation of GIS based highway safety analysis: Bridging the gap (No. FHWA-RD-01-039). Department of Transportation, U.S.
Federal Transit Administration, 2003. Best practices for using geographic data in transit: A location referencing guidebook (No. FTA-NJ-26-7044-2003.1). Department of Transportation, U.S.
Fohl, P., Curtin, K.M., Goodchild, M.F., Church, R.L., 1996. A non-planar, lane-based navigable data model for ITS. In: Kraak, M.J., Molenaar, M. (Eds.), International symposium on spatial data handling, vol. 1. International Geographical Union, Delft, 7B17–7B29.
Goodman, J.E., 2001. Maps in the fast lane - Linear referencing and dynamic segmentation, vol. 2004. Directions Magazine.
Haggett, P., Chorley, R.J., 1970. Network analysis in geography. St. Martin's Press, New York.
Kansky, K., 1963. Structure of transportation networks: Relationships between network geography and regional characteristics (No. 84). University of Chicago, Chicago, IL.
Koncz, N.A., Adams, T.M., 2002. A data model for multi-dimensional transportation applications. International Journal of Geographical Information Science 16 (6), 551–569.
Magnanti, T.L., Wong, R.T., 1984. Network design and transportation planning: Models and algorithms. Transportation Science 18 (1), 1–55.
Nyerges, T.L., 1990. Locational referencing and highway segmentation in a geographic information system. ITE Journal, March, 27–31.
Okabe, A., Okunuki, K., Shiode, S., 2006. SANET: A toolbox for spatial analysis on a network. Geographical Analysis 38, 57–66.


Okabe, A., Yomono, H., Kitamura, M., 1995. Statistical analysis of the distribution of points on a network. Geographical Analysis 27 (2), 152–175.
Vonderohe, A., Adams, T., Chou, C., Bacon, M., Sun, F., Smith, R.L., 1998. Development of system and application architectures for geographic information systems in transportation (Research Results Digest No. 221). In: National Cooperative Highway Research Program. Transportation Research Board.
Vonderohe, A., Chou, C., Sun, F., Adams, T., 1997. A generic data model for linear referencing systems (Research Results Digest No. 218). In: National Cooperative Highway Research Program. Transportation Research Board.
Wilson, R., 1996. Introduction to graph theory. Longman, Essex.
Yamada, I., Thill, J.-C., 2004. Comparison of planar and network K-functions in traffic accident analysis. Journal of Transport Geography 12, 149–158.

1.13 Analysis and Modeling of Movement

Paul Holloway, University of York, York, United Kingdom
Jennifer A Miller, University of Texas at Austin, Austin, TX, United States
© 2018 Elsevier Inc. All rights reserved.

1.13.1 Introduction
1.13.2 Understanding Movement: The Moving Object
1.13.2.1 Aggregation of Objects
1.13.2.2 Lagrangian Versus Eulerian
1.13.2.3 Movement Sampling
1.13.2.3.1 Continuous sampling
1.13.2.3.2 Noncontinuous sampling
1.13.2.3.3 Sampling lifetimes
1.13.2.3.4 Idiosyncrasies of the sample
1.13.3 Understanding Movement: The Movement Space
1.13.3.1 Structure of Movement Spaces
1.13.3.1.1 Discrete space
1.13.3.1.2 Continuous space
1.13.3.1.3 Alternative space
1.13.3.2 Coupling Movement and Space
1.13.3.2.1 Discretizing movement
1.13.3.2.2 Semantic level
1.13.3.2.3 Feature generation
1.13.3.2.4 Temporally explicit environmental data
1.13.3.2.5 Interactions
1.13.3.2.6 Privacy of moving objects
1.13.4 Modeling Movement: Analytical Models
1.13.4.1 Time Geography
1.13.4.2 Movement Parameters
1.13.4.3 Similarity and Clustering
1.13.5 Modeling Movement: Simulation Models
1.13.5.1 Spread Models
1.13.5.2 Random Walk Models
1.13.5.3 Agent-Based Models
1.13.6 Modeling Movement: Visual Analytics
1.13.6.1 Direct Depiction
1.13.6.2 Aggregation
1.13.6.3 Visualization Challenges
1.13.6.3.1 Conceptual spatial models
1.13.6.3.2 Context
1.13.6.3.3 Animation
1.13.7 Conclusion
References

1.13.1 Introduction

Movement is ubiquitous across most facets of life and is important for identifying the key processes and patterns within ecology, epidemiology, transportation, geography, sociology, and many other disciplines. Movement occurs in both space and time, and it is the coupling of these measurement frames into spatiotemporal geographic information that can improve our understanding of many of the geographic processes we study. Furthermore, recent advances in the technologies responsible for recording the locations of moving objects have resulted in an unprecedented amount of spatiotemporal geographic information, such that we are now beginning to observe near-continuous collection of data and ancillary information associated with them. The quality and availability of these spatiotemporal data are allowing researchers to ask increasingly unique and novel questions. For example, the collection of Global Positioning System (GPS) data on the migration of Burchell's zebra in Namibia and Botswana recently revealed a previously unknown multicountry route that has been recorded as the longest journey undertaken by an African land mammal (Naidoo et al., 2016). Similarly, an example of continuous data collection is the tracking of vehicles recorded from road sensors, taxi networks, and mobile phones, with this information used to inform people of "real-time" traffic patterns (Tao et al., 2012).

It is this concept of continuous movement of objects that requires a new (or improved) approach to recording, analyzing, and modeling movement.

Time is an implicit component of most geographic questions. At its fundamental underpinnings, geographic data link a location (x, y) with an attribute (z), as well as the time at which these were recorded (t). However, many of the processes and patterns geographers study are somewhat static and slow to change. Geographic Information Systems (GIS) research has a long history of addressing how to incorporate temporal information. The concept of time geography (Hägerstrand, 1970) has been a regular focus of GIS research (Miller, 1991; Peuquet, 1994; Kwan, 2004; Neutens et al., 2011), and considerable effort has been directed toward implementing a temporal GIS approach (Langran, 1989; Yuan, 1996; Christakos et al., 2012); however, the concept of time has not been fully resolved in GIScience (Kucera, 1992; Goodchild, 2013). The cartographic communication paradigm within GIS has resulted in geographic phenomena traditionally being represented as a series of "snapshot" views (Laube, 2014). For example, many geographic features (e.g., forests, land cover, population, continental plates) are presented and viewed on a yearly, decadal, centennial, or millennial time-frame, despite the fact that they are continually moving and changing. This relatively static representation has limited the development of a unified set of tools and approaches to study movement in GIS (Gudmundsson et al., 2012; Laube, 2014).

Frameworks for studying movement within a GIS context are beginning to emerge that unify the methodological and conceptual challenges associated with the various subdisciplines that address movement (Dodge, 2016). The research continuum proposed by Dodge (2016) apportions movement research into two interconnected strands: research aimed toward (1) understanding movement processes, and (2) modeling and predicting movement phenomena (Fig. 1). The inclusion of several subdisciplines facilitates the adoption of established theories and techniques in movement research, permitting studies to be conducted with more rigor and reliability. The use of a unified framework for studying movement will be important as data collection technologies continue to advance and the amount of spatiotemporal geographic data with explicit movement connotations increases. The research areas within the continuum have large overlap in the concepts and methodologies employed to either understand or model movement. Geographic Information Systems and Science are therefore well placed to address the challenges associated with studying movement data.

The analysis and modeling of movement is a burgeoning research area within GIS; workshops on the analysis of movement data have been held at recent GIScience conferences (2010 Zurich, 2014 Vienna, 2016 Montreal), and special issues on the topic have recently been published in Computers, Environment, and Urban Systems (Purves et al., 2014) and the International Journal of Geographic Information Science (Dodge et al., 2016). Many of the current research trajectories in GIS align nicely with those faced by individuals studying movement (e.g., visualization, big geodata).
Moreover, interdisciplinary work between GIScientists and domain experts is increasingly being recognized (Demsar et al., 2015), and GIS has an important and central role in such collaborations. We discuss the analysis and modeling of movement from two perspectives: (1) understanding movement, and (2) modeling movement. In understanding movement, we focus on how movement data are defined for use within a GIS and on the new challenges associated with applying geographic context to movement. A number of different movement models have been implemented, with wide applications from understanding to predicting movement phenomena, ranging from the descriptive to the quantitative. As such, we discuss models of movement from three perspectives: analytical, simulation, and visual.

[Fig. 1 is a schematic linking movement observations and movement and context, through computational movement analysis, to two interconnected strands: understanding movement processes (quantifying movement, visualization) and modeling movement (modeling movement processes, simulation of movement, prediction, validation).]

Fig. 1 Movement research continuum. Modified from Dodge, S. (2016). From observation to prediction: the trajectory of movement research in GIScience. In Onsrud, H. and Kuhn, W., (Eds.) Advancing geographic information science: the past and next twenty years. Chapter 9. pp. 123–136. Needham, MA: GSDI Association Press.

1.13.2 Understanding Movement: The Moving Object

We are now observing smaller and longer-lived GPS devices, along with a plethora of other tracking technologies including very high frequency radio transmitters, radio-frequency identification, surveillance cameras, and the Global System for Mobile Communications, which can provide unparalleled data on the space, time, and in some cases activity of moving objects. The structure of these data (x, y, z, and t) fundamentally remains, meaning data on moving objects can be considered synonymous with geographic data. Some of the fundamental problems associated with defining geographic data for a GIS therefore have not changed, such as the choice of conceptual spatial model to implement. However, novel methodological challenges do exist that need explicit consideration when defining and quantifying movement.

1.13.2.1 Aggregation of Objects

Laube (2017) has outlined a "spectrum of semantic levels" in order to clarify the type of movement being utilized across domains. This spectrum ranges from the instantaneous level (represented as fixes and vectors), to interval aggregations (represented as moves and segments), and finally to global aggregations (represented as polygons and field-based surfaces), and is expanded upon in Table 1. The aggregated levels of movement are further confounded when the object being analyzed is itself aggregated. Patterns at one level of organization can often be understood as the aggregates of smaller units (Levin, 1992), so it is important to define explicitly what is moving. For example, questions related to the dispersal of animals are population-level processes, yet the factors responsible for animal movement operate on an individual scale (Jønsson et al., 2016). The aggregation of individual units into a moving object is not without its uncertainties, and how these aggregated objects are built can have significant consequences. For example, Christensen et al. (2013) found that the evacuation time for a four-story building significantly increased, by upward of 15 min, when a homogenous crowd of 100 people was compared with a heterogeneous crowd that included individuals with various disabilities. Research should continue to be directed toward identifying the concomitant effect that aggregating objects and data have on resultant movement models, and how issues may propagate as this occurs.

Alternatively, movement can be deconstructed into the parts that make up the whole. Chavoshi et al. (2015) used reflective markers and infrared cameras within the MotionCapture system to record the position of dancers' heads, toes, hands, and feet. The body parts were studied as the moving objects, and similarity in movement trajectories was assessed for individuals, providing an improved understanding of how dancers move. This deconstruction can be extended across an array of sports and has wide applications in health rehabilitation. Similarly, Demsar and Çöltekin (2014) quantified hand–eye coordination by studying eye and (computer) mouse movement and used this to illustrate a relatively inexpensive method to capture eye-tracking data. Moreover, the movement of individual components is an active research area in fields such as gaming, where users routinely expect movements to mimic real life (Cass, 2002). The use of these deconstructed movement objects in GIS research has increased in recent years and could become one of the most active movement analysis frontiers.

1.13.2.2 Lagrangian Versus Eulerian

The moving object is also studied from two distinctive viewpoints: Lagrangian or Eulerian. The Lagrangian approach involves the use of discrete steps and segments, making it highly useful for analyzing the detailed movements of objects (Fig. 2A). Conversely, the Eulerian approach describes the expected pattern of space use by an object or aggregation of objects in relation to a known (and often fixed) location in space (Fig. 2B). Consequently, differences exist in the methods used to collect data. Lagrangian movement is perhaps best represented by technologies such as GPS, which track an individual object through space and time. Eulerian movement better captures space use and is representative of movement such as traffic in an "average speed" zone, where the cameras are fixed and the movement (and speed) of individual cars is tracked between cameras.

Table 1    Semantic levels of movement as described by Laube (2017)

Semantic level     Description
Fix                A time-stamped location representing x, y, z, and t data
Vector             The dynamic representation of the fix. This movement aggregation has a direction (azimuth) and a speed and requires two or more fixes to quantify
Move               The connection between two successive movement fixes or vectors
Segment            Movement over more than two successive fixes or vectors. Segmentation is often based on delineation according to quantitative rules (including season, parameters, mode, geographic context)
Area               An areal combination of a collection of movement data. Examples include the minimum bounding rectangle (the smallest rectangle that can encompass all fixes and vectors) or the minimum convex polygon (the smallest polygon that can encompass all fixes and vectors)
Density            Global aggregation that represents the density of movement in a continuous format. Examples include the use of kernel-density estimators
Density and time   Density estimations can be extended, with probability of space use based on speed, direction, and locational position. This is achieved through the prism of probabilistic time geography and the random walk model
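To make the lower semantic levels in Table 1 concrete, the following sketch (illustrative names; fixes assumed to be time-ordered tuples in projected coordinates) derives vector-level movement, i.e., speed and azimuth, from successive fixes:

# Deriving 'vector'-level movement (speed and azimuth) from time-ordered
# fixes, following the semantic levels of Table 1 (illustrative names).
from math import atan2, degrees, hypot

def fixes_to_vectors(fixes):
    """fixes: [(x, y, t), ...] in projected coordinates, t in seconds."""
    vectors = []
    for (x1, y1, t1), (x2, y2, t2) in zip(fixes, fixes[1:]):
        dist = hypot(x2 - x1, y2 - y1)
        azimuth = degrees(atan2(x2 - x1, y2 - y1)) % 360   # 0 = north, clockwise
        speed = dist / (t2 - t1) if t2 > t1 else 0.0
        vectors.append((speed, azimuth))
    return vectors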


Fig. 2 Examples of (A) Lagrangian and (B) Eulerian approaches to movement, represented using (C) vector and (D) Voronoi polygon conceptualizations.


1.13.2.3 Movement Sampling

Geographic data (and any subsequent inferences we make from them) are highly dependent on the sampling regime used. Movement is in most cases a continuous process, and even when objects are stationary, they still have geographic information (x, y, z, and t) associated with them. Therefore, moving objects are subject to a number of different sampling schemes that, once implemented, have important consequences for the structure of the sampled movement data. The two predominant ways that moving objects have been sampled are continuous and noncontinuous regimes (Laube, 2017).

1.13.2.3.1 Continuous sampling

Continuous sampling collects data on the location of the moving object at regular intervals that are representative of “continuous movement.” While movement data is increasingly being collected at finer resolutions (e.g., subsecond), continuous time is still an artifact of the scale used. The intervals (resolution) at which data are collected are subject to the researcher’s interpretation of what constitutes continuous. While the sample that is collected depicts a continually moving object, due to the nature of the resolution used, this sampling is more representative of a “regular” abstraction than a “continuous” one. Continuous (or regular) sampling can also result in very large and noisy datasets. For example, movement of an individual over the course of a day could provide detailed data on their movements at home, at work, and in-between; however, due to the time spent at each location, most of these data would be redundant for a transport planner working on improving a city’s commuting routes. Therefore, noncontinuous sampling aims to collect less overall but more targeted movement data.

1.13.2.3.2 Noncontinuous sampling

Noncontinuous sampling has been achieved by using either irregular or event sampling (Laube, 2017). Irregular sampling targets fine-grained bursts of data collection interspersed with long periods of inactivity, providing a highly detailed account of movement for a specific time period while also preserving the life of the data collection technology (Laube, 2017). Ideally, this type of sampling should be targeted to an event or behavior; otherwise the stochasticity in data collection may skew the movements that have been recorded, making such a dataset difficult to interpret and analyze. Some technologies are now being fitted with sensors that remain inactive until they are in proximity with another moving object (e.g., a distance threshold; Drewe et al., 2012) or until certain biophysical readings are recorded (e.g., heart rate; Wikelski et al., 2003). This has allowed for data collection on movements of an object that are periodic in nature, such as interactions with other individuals or behaviors such as hunting or sprinting. However, despite the potential to provide a stratified sampling regime, these technologies are not without their own uncertainties that need to be considered. For example, Drewe et al. (2012) identified that the distances within which sensors located each other increased as the age of the technology increased. Regardless, the potential for targeted, event-driven sampling of movement data should provide significant advancements in the interpretation of moving objects and could be extended to address novel sampling issues associated with moving objects.

1.13.2.3.3 Sampling lifetimes

"Lifetime tracks" have emerged due to advancements in technologies whereby solar panels can keep batteries charged indefinitely for some animal GPS trackers, therefore recording movement across a life span (Kays et al., 2015). We are thus beginning to observe data collected across longer temporal extents, and how these lifetime tracks are deconstructed, and whether they can validate "shorter" samples or segments of movement, will be emerging research frontiers in the coming years. For species that hereditarily receive information on movement patterns, is a lifetime track just another sample in the broader scheme of life? A historic migration pattern of Burchell's zebras from the Okavango Delta to Makgadikgadi National Park in Botswana was recorded in 2008–09 for the first time since anecdotal accounts by explorers in the early 20th century, due to the removal of a veterinary fence that had blocked the route since 1968 (Bartlam-Brooks et al., 2011).

Alternatively, the increasing availability of navigation technologies is changing how people move over a relatively short time-frame. The use of vehicle navigation systems has increased dependency on such technologies as well as reduced the capacity of users to form cognitive maps (Burnett and Lee, 2005). Furthermore, tracking technologies have led to significant cultural and societal shifts. For example, younger generations of Inuit (who until recently had no word in their language for "lost") are relying more heavily on GPS-tracking devices and have neglected learning traditional skills, such as tracking (Robbins, 2010). How generational knowledge (instinctive or forgotten) is incorporated within movement analysis could be an important research frontier, and how GIS is changing movement behaviors also warrants investigation.

1.13.2.3.4 Idiosyncrasies of the sample

Finally, a good sample should be representative of the population, meaning that how moving objects are selected should also be considered. Does bias exist in the behavior of animals that are tagged with radio collars? Are there differences in the individuals who agree to terms and conditions that allow mobile providers to track them? Do the driving routes of people with newer vehicles with in-built tracking technologies differ from those of people with older vehicles without the technology? Idiosyncratic preferences of a few individuals have been found to greatly influence the inferences made from a sample of moving objects (Lindberg and Walker, 2007), so this is another important sampling issue that should be considered.

1.13.3 Understanding Movement: The Movement Space

In their seminal paper, Nathan et al. (2008a) introduced movement ecology as a research paradigm and observed that the "environment" is one of the four driving forces behind movement processes. Barriers and corridors to organism movement (e.g., rivers, forests, roads) all have a driving influence on where objects can move, while goal-oriented movement toward resources (e.g., food, shelter) is undeniably interlinked with the environment, as the location of such features determines the end location of movement steps. Furthermore, the location of other moving agents causes responses entrenched with movement (e.g., attraction, avoidance). This concept of the environment influencing movement can easily be expanded beyond organisms and is arguably applicable to all moving entities. It is therefore imperative that movement be analyzed within the geography through which these patterns and processes operate, and GIS is well positioned to aid in the development of context-aware environments through which movement occurs.

1.13.3.1 Structure of Movement Spaces

1.13.3.1.1 Discrete space

For movement represented in discrete space, such as movement tracked through the neighborhoods of cell phone towers, the geometry is an important construct that has wide-ranging implications for the resulting movement patterns. Holland et al. (2007) compared cell-to-cell movement of individuals in a virtual landscape across a number of different geometries and identified that the choice of landscape structure can systematically bias the direction and distance of movement. A key finding of their study was that environments constructed from regular geometries had the potential to significantly bias movement, as the regular grids restrict the direction of travel taken at each step resulting in a dependent relationship between movement and the landscape structure. Conversely, the use of an irregular grid (e.g., Voronoi cells) resulted in a uniform circular distribution of movement patterns and subsequently less directional bias. This led Holland et al. (2007) to recommend the creation of irregular-gridded landscapes to better control such biases introduced by the “unnatural” gridded representation of the world. A number of different landscape models exist (Lindenmayer et al., 2007), and the research should continue to focus on how these models influence the movement processes observed.

1.13.3.1.2 Continuous space

For moving objects, such as animals or people tracked by GPS, the environment is best represented as a continuous Euclidean space (Laube, 2017). The construction of the underlying continuous geographic space also has important consequences for the analysis of movement data. Any continuous structure is dependent on the scale used to generate it. For continuously moving objects, tracking technologies have to "fix" their location in both space and time. For example, GPS fixes are recorded to a minimum of 0.001 degrees of longitude and latitude, meaning the precision, resolution, and sampling rate of the moving object can greatly influence the location and values of movement parameters.


Fig. 3 Diagram of how GPS resolution and temporal resolution can influence movement parameters. (A) The actual movement path (black line), with its position at each sample time (1 s) shown by white circles on a grid of 0.001 arc degrees. GPS fixes (black circles) are snapped to the grid at (B) 1 s temporal resolution and (C) 10 s temporal resolution. The spatial resolution of the grid is increased to 0.002 arc degrees, and the same temporal resolutions are shown in (D) 1 s and (E) 10 s. Modified from Ryan, P.G., Petersen, S.L., Peters, G., and Gremillet, D. (2004). GPS tracking a marine predator: the effects of precision, resolution and sampling rate on foraging tracks of African penguins. Marine Biology 145(2), 215–223.

Sampling at a finer temporal resolution (Fig. 3B) overestimates the distance travelled by the individual, while sampling at a coarser temporal resolution underestimates it (Fig. 3C). These effects are amplified when the spatial resolution of the coordinate grid is doubled but still representative of continuous space (Fig. 3D–E). The need to effectively control the spatial precision of GPS-recorded coordinates is well documented (Frair et al., 2010); however, the spatial resolution of the underlying coordinate system is seldom considered for its interactions with temporal resolution (although see Ryan et al., 2004; Laube and Purves, 2011).
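The interaction between spatial and temporal resolution illustrated in Fig. 3 is easy to reproduce computationally. The toy sketch below (our own illustration, not a reconstruction of Ryan et al.'s analysis) snaps a track to a coordinate grid, thins it to a coarser sampling interval, and compares apparent path lengths:

# Toy demonstration of how grid resolution and sampling interval change the
# apparent length of a track (cf. Fig. 3; illustrative only).
from math import hypot

def snap(track, resolution):
    """Snap (x, y) fixes to a coordinate grid of the given resolution."""
    return [(round(x / resolution) * resolution,
             round(y / resolution) * resolution) for x, y in track]

def thin(track, every):
    """Keep every n-th fix, coarsening the temporal resolution."""
    return track[::every]

def path_length(track):
    return sum(hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(track, track[1:]))

# e.g., compare path_length(snap(track, 0.001)) against
# path_length(thin(snap(track, 0.002), 10)) for a track sampled at 1 s.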

1.13.3.1.3 Alternative space

O’Sullivan (2001) introduced graph-cellular automata as an alternative movement space, whereby a proximal conceptualization of space defines neighborhoods with both a notion of site and situation (Couclelis, 1997). The proximal concept does well to encompass both absolute and relational conceptualizations of space (O’Sullivan, 2001) and has become a common implementation within studies implementing a cellular approach to urban growth (Badariotti et al., 2007; Barreira-González and Barros, 2016). This method should be applicable to certain simulated aggregation models which study moving objects and is a possible avenue for future movement research.

1.13.3.2 Coupling Movement and Space

1.13.3.2.1 Discretizing movement

Environments are regularly conceptualized using the raster viewpoint, in part due to the plethora of environmental datasets that are generated in such a format (e.g., remotely sensed imagery, digital elevation models). The rasterized structure of gridded environments contradicts a vector conceptualization of continuous movement, yet such data models are regularly coupled in order to apply geographic context. Certain models subsequently require the discretization of continuous movement into a gridded structure. For example, Cunze et al. (2013) discretized the trajectories of animal home range movements to predict the dispersal (via animal transportation) of European flora alongside a gridded representation of habitat suitability. Such discretization can result in an overestimation of movement, particularly if the resolution used to generate the environmental grid is substantially coarser than the movement distances being undertaken by the object (Bocedi et al., 2012). Furthermore, how movement is considered (center to center, area to area) can have asymptotic properties for discretized movement that is modeled over multiple time steps (Chipperfield et al., 2011).

1.13.3.2.2 Semantic level

The choice of semantic level of observation can alter the inferences made. The relatively simple question of “what is accessible?” provides an interesting and nuanced appreciation for how we define the environment relative to the moving object. The geographic area deemed accessible to moving objects constructed either as “fixes” or “moves” changes substantially depending on the aggregation of movement data used. Fig. 4 illustrates the area of a gridded landscape that was accessible to simulations of brown hyena movement trajectories originating from one grid cell over a 1-year period. The area deemed accessible to the “moves” is substantially larger than that accessible to the “fixes,” and this difference continued to increase as the number of individuals increased.

1.13.3.2.3 Feature generation

As the semantic level of movement advances beyond traditional "static" representations of space and time, we also need to advance how we construct the underlying geography. Subsequently, an ontological discussion of how we define the environment is needed. This is particularly pertinent when we consider that environmental variables are increasingly used as covariates in statistical models that infer movement–environment relationships. The step-selection function (SSF) is a powerful spatial statistic that uses movement data to analyze patterns based on the underlying environment, considering the landscape variables associated with the observed movement and statistically comparing them to alternatively generated movements that an object theoretically could have taken (Fortin et al., 2005). Covariates have been implemented as fixes (e.g., landscape cover at the end of a step; Bjørneraas et al., 2011), as moves (proportion of landscape cover along a step; Coulon et al., 2008), and as segments (mean distance from segment to environmental feature; Pereboom et al., 2008), meaning that how the environmental variables are represented in the model can have large implications for any movement–environment inferences made. Furthermore, many objects move in response to linear features (e.g., roads or fence lines); however, the digital representation of these landscape features contains no surface (e.g., they are depicted as polylines), meaning the chance of a move or fix corresponding exactly with such a feature is highly unlikely (Thurfjell et al., 2014). "Fuzzy" landscape structures are needed to better deal with these new conceptual issues arising from differences in the semantic level of movement.

Similarly, how environmental variables are generated can have important consequences for other modeling decisions. In a study investigating the impact of different user-decisions on SSF results, such as the methods of generating alternative movement steps, Holloway and Miller (2014) found that user-decisions caused differences in the results when movement observations were treated either as fixes or moves for oilbirds in Venezuela. For example, the method used to select a step-length for the alternative movement steps resulted in large and significant coefficient differences when the landscape variable represented the land cover at the end of the movement step (fix), while no significant differences were observed in the coefficients when distance to a linear feature was averaged across the entirety of the movement step (move). The construction of the movement space and the digital representation of environmental variables unquestionably influence multiple facets of the analysis and modeling of movement.
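A minimal sketch of the step-generation stage of an SSF may help fix ideas; drawing step lengths and turning angles from empirical distributions is one of the user-decisions discussed above, and all names and distributions here are illustrative assumptions:

# Generating alternative (available) steps for a step-selection function,
# one of the user-decisions examined by Holloway and Miller (2014).
# The use of empirical distributions here is an illustrative assumption.
import random
from math import cos, sin

def alternative_steps(x, y, heading, observed_lengths, observed_turns, n=10):
    """Draw n candidate end points from empirical step-length and
    turning-angle distributions (angles in radians)."""
    steps = []
    for _ in range(n):
        length = random.choice(observed_lengths)   # resample observed lengths
        bearing = heading + random.choice(observed_turns)
        steps.append((x + length * sin(bearing), y + length * cos(bearing)))
    return steps

# Each candidate is then attributed with covariates (as a fix, move, or
# segment) and compared to the observed step, e.g., in a conditional logit.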

[Fig. 4 plots accessible area (km²; 0–250,000) against the number of individuals (1–25), with separate series for fixes and moves.]

Fig. 4 Area accessible to brown hyenas originating from one grid cell when movement was simulated (using a biased correlated random walk) over a 1-year period at a 1 km resolution.

1.13.3.2.4 Temporally explicit environmental data

With the rapid increase in fine-grained movement data, researchers are experiencing a dichotomy between the spatial and temporal resolution of the movement data and that of the corresponding environmental data utilized in research. This discrepancy has led researchers to question the resources spent on developing highly sophisticated models to investigate the importance of the environment on movement when models are built using fine-grained movement observations and coarse-grained, static (or permanent) representations of the underlying geography (Hebblewhite and Haydon, 2010). This static viewpoint of geographic data has in part tempered the development of movement analysis, with environmental variables not regularly generated at the corresponding temporal intervals needed to perform rigorous analysis. The use of temporally explicit environmental variables that better match the resolution of the moving object will subsequently improve the analysis and prediction of movement. The Environmental-Data Automated Track Annotation (Env-DATA) system was developed to automate the generation of environmental attributes for animal movement trajectories with ambient atmospheric observations and underlying landscape information (Dodge et al., 2013). This system links with the animal movement data repository Movebank (https://www.movebank.org/) and annotates each movement observation with interpolated (openly available) environmental variables in both space and time. This system does well to address the issue outlined by Hebblewhite and Haydon (2010) and provides the "closest" or "most representative" spatiotemporal environmental variable associated with the movement observations.

1.13.3.2.5 Interactions

Interactions between and among moving objects are an important component of the movement space that needs consideration. Detailed accounts of individual mobility based on travel diaries have been used to model the potential for social interaction as an important metric for defining urban amenities, and modifications in the environment, such as lengthening work–home commuting distances, have been shown to decrease potential social interaction (Farber and Paez, 2009). Moreover, movement spaces are shifting. The widespread uptake of social media has provided a new space in which interactions can occur. Passive interactions among individuals maintain mutual awareness, which can influence movement decisions such as migration and integration (Komito, 2011), which in turn influence the geography of an area. The spaces and movements created from social media are fundamentally different from the structures currently implemented in a GIS.

Measuring interactions with movement data for individuals has traditionally focused on static representations of interaction (spatial overlap). However, spatial and temporal overlap in the locations of moving objects needs to be considered (dynamic interactions), and recent research in this area has addressed the ability of different metrics to measure actual dynamic interaction phenomena where the movement of two or more objects is not independent (Long and Nelson, 2013b). Various measures of "interaction" have been developed, but most of them fail to reliably quantify whether individuals are in fact interacting or whether they are just in spatiotemporal proximity (Long et al., 2014a; Miller, 2012, 2015).
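As a simple illustration of why spatiotemporal proximity alone is a weak test of interaction, the sketch below flags pairs of fixes from two trajectories that are close in both space and time; genuine dynamic interaction metrics must additionally test whether the movements are dependent (thresholds and names are illustrative):

# Flagging spatiotemporal proximity between two trajectories (a sketch;
# brute force for clarity). True dynamic interaction metrics must also
# test whether the two movements are dependent, not merely coincident.
from math import hypot

def proximity_events(traj_a, traj_b, dist_thresh, time_thresh):
    """trajectories: [(x, y, t), ...]; returns pairs of fixes that are
    close in both space and time (candidate interactions only)."""
    events = []
    for xa, ya, ta in traj_a:
        for xb, yb, tb in traj_b:
            if (abs(ta - tb) <= time_thresh and
                    hypot(xa - xb, ya - yb) <= dist_thresh):
                events.append(((xa, ya, ta), (xb, yb, tb)))
    return events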

1.13.3.2.6 Privacy of moving objects

The application of geographic context becomes difficult when the privacy of the movement data is considered. Obfuscation is the process of degrading the quality of the location information in order to provide some level of privacy to the data (Duckham and Kulik, 2005). This is an important concept, as the users of many mobile applications do not want their exact location tracked. Similarly, publicizing the real-time locations of wildlife can have severe conservation and poaching consequences. Such concerns have been addressed in a movement context by creating fuzzy boundaries around where an object is located, allowing trajectories to be represented by spatial data quality measures such as inaccuracy, imprecision, and vagueness (Laube, 2017). A large body of literature within the GIS community exists on privacy and location (Duckham et al., 2007), and Hwang et al. (2014) provide a detailed account of the novel issues associated with the obfuscation of trajectories. Privacy issues in geographic data are well established (e.g., the aggregation of household data to the census tract), so GIS and geography should be well placed to inform such "fuzzy" contextualization.

1.13.4 Modeling Movement: Analytical Models

A number of different models have been implemented in order to study movement. In a recent review, Long and Nelson (2013a) outline seven main classes of methods used to analyze movement, encapsulating the fundamental ways that movement data have been quantitatively analyzed within a GIS context to date: (1) Time Geography, (2) Path Descriptors, (3) Path Similarity Indices, (4) Pattern and Cluster Methods, (5) Individual Group Dynamics, (6) Spatial Fields Methods, and (7) Spatial Range Methods. Notably less research has been directed toward implementing qualitative analytical models within a GIS context to explore movement; techniques for handling and analyzing qualitative activity space data have been developed, but they are better represented in discussions of visualization (Kwan, 2007, 2008; Mennis et al., 2013).

1.13.4.1 Time Geography

When discussing the analytical models used to investigate spatiotemporal movement data, it is useful to begin with time geography. Time geography (originally presented by Torsten Hägerstrand) conceptualizes the constraints and trade-offs of human movement within both spatial and temporal dimensions. Hägerstrand (1970) posited that there were three types of constraint on movement: those enforced by physiological necessities (capability constraints, e.g., the need to sleep), those enforced by the need to be at a certain location for a specified duration (coupling constraints), and those enforced by cultural, societal, or institutional rules or customs (authority constraints, e.g., school is only accessible during school hours). The advancements in data collection technologies and the availability of movement data have seen time geography experience a recent renaissance (Miller, 2005b), with Miller (2005a) proposing a measurement theory that has provided time geography with an analytical framework in which to make statements about error and uncertainty.

A time geography framework uses a 3D movement space with two spatial axes and a third orthogonal axis representing time (Fig. 5). The terminology for this 3D space has converged on "space-time cube" (STC), and it is a predominant method through which movement data are analyzed and visualized. Anchor points denote the beginning and end of fixed activities (e.g., work), and movement is represented as a path along the three axes between activities (Fig. 5A). The most common representation of space and time in an STC is the space-time prism (Fig. 5C), which quantifies the possible movement between anchor points based on the three constraints. The space-time prism can then be used to calculate the potential path area (PPA), which is a spatial representation of accessibility along the spatial axes. Future movement can be predicted using the space-time cone (Fig. 5D), which represents all future movement possibilities within the environment. The theory and concepts outlined in time geography are being used across various GIS and movement fields, including wildlife ecology (Long and Nelson, 2012), transportation (Neutens et al., 2011), tourism (Hall and Page, 2014), gender (Kwan, 1999), and medical geography (Schærström, 1996).
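The PPA admits a compact membership test: a location is reachable between two anchors if a detour through it fits within the available time budget at maximum speed. The sketch below uses planar distances (a network-constrained prism would substitute shortest-path costs; names are illustrative):

# Membership test for the potential path area (PPA) between two anchors
# (a planar-distance sketch; a network-constrained prism would substitute
# shortest-path costs for the straight-line distances).
from math import hypot

def in_ppa(point, anchor1, anchor2, t1, t2, vmax):
    """anchors: (x, y); t1, t2: departure and arrival times; vmax: max speed.
    The point is reachable if a detour through it fits the time budget."""
    d1 = hypot(point[0] - anchor1[0], point[1] - anchor1[1])
    d2 = hypot(point[0] - anchor2[0], point[1] - anchor2[1])
    return (d1 + d2) / vmax <= (t2 - t1)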

Fig. 5 The space-time cube (STC), with (A) the space-time path representing movement, (B) the space-time path within geographic context, (C) the space-time prism and potential path area (PPA), and (D) the space-time cone representing future movement constraints.

1.13.4.2 Movement Parameters

Movement parameters are used to infer information about the object's internal state, subsequent movement behaviors, and the motion capacity of individuals. Movement parameters can be categorized into three main groups: primitive parameters, primary derivatives, and secondary derivatives (Dodge et al., 2008). Primitive spatial parameters consist of the position of the moving object and the time interval at which locations are obtained. Primary derivatives are generated from primitive parameters (e.g., distance and direction), while secondary derivatives are higher order parameters of movement (e.g., acceleration and sinuosity). These parameters can be further organized into spatial, temporal, and spatiotemporal dimensions. Parameters have been used to quantify movement across a range of applications, either as the end-product model or as inputs into more sophisticated models.

At their core, movement parameters are geographic attributes and are subject to the same scale uncertainties as more traditional geographic features such as slope (Anselin, 1989). In an extensive study on the effect of temporal resolution on movement parameters for cows in Australia, Laube and Purves (2011) found a steady decrease in the median and overall variance of speed as the temporal resolution coarsened, as well as lower minimum values at the finer temporal scales. Subsequently, the generation of movement parameters is subject to many of the conceptual uncertainties identified when defining and contextualizing moving objects, warranting explicit consideration early in the analytical framework. However, as will be discussed in the coming section, a large volume of work also exists on fitting simulation models to movement parameters, as well as the explicit use of parameters within spatial models. This provides an alternative conceptualization of where movement parameters belong in analyzing movement data and is the reason behind Long and Nelson's (2013a) inclusion of these metrics as a stand-alone quantitative analytical model.
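The parameter hierarchy can be illustrated with a short sketch that derives a primary parameter (step length) and two secondary derivatives (turning angle and sinuosity) from primitive (x, y, t) fixes; names are our own:

# Primary and secondary movement parameters from (x, y, t) fixes (a sketch).
from math import atan2, hypot, pi

def step_lengths(fixes):
    """Primary derivative: distance covered by each step."""
    return [hypot(x2 - x1, y2 - y1)
            for (x1, y1, _), (x2, y2, _) in zip(fixes, fixes[1:])]

def turning_angles(fixes):
    """Secondary derivative: change in heading between successive steps."""
    headings = [atan2(y2 - y1, x2 - x1)
                for (x1, y1, _), (x2, y2, _) in zip(fixes, fixes[1:])]
    return [(h2 - h1 + pi) % (2 * pi) - pi          # wrap to (-pi, pi]
            for h1, h2 in zip(headings, headings[1:])]

def sinuosity(fixes):
    """Secondary derivative: path length over straight-line displacement."""
    straight = hypot(fixes[-1][0] - fixes[0][0], fixes[-1][1] - fixes[0][1])
    return sum(step_lengths(fixes)) / straight if straight else float("inf")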

1.13.4.3 Similarity and Clustering

Clustering partitions data based on a set of unifying features, such that groups with similar attributes are created, while similarity indices provide a measure of "distance" between trajectories. A number of techniques for measuring clustering and similarity have been proposed within spatial and nonspatial domains (Witten and Frank, 2005; Duran and Odell, 2013). The basic similarity methods that have been proposed in movement analysis measure the spatial distance between movement observations and have been used to indicate interactions or dependent movement. The simplest metric measures the Euclidean distance between two objects. Advancing from this, the Hausdorff distance metric identifies the largest distance between two points across two segments, or in other words, the greatest distance from a location in trajectory a to the closest point in trajectory b. These metrics do not consider the temporal sequence of trajectory data, which has led to the use of the Fréchet distance metric, which measures the maximum distance between two (temporally) coinciding fixes.

Approaches have also involved the segmentation of a trajectory based on generated movement parameters. Trajectories of analogous speeds and sinuosity could represent the same movement process, and so it stands to reason that these paths be recorded as such.

Fig. 6 Examples of movement paths for individual runners, categorized by the speed of individual miles (F = fast, M = medium, S = slow; e = elevation, b = barrier). Context identifies the geographic features associated with trajectories c and d.

For example, two movement paths could be identical in length and spatial location but differ in terms of various parameters. Fig. 6A–B illustrates the speeds of two runners' five-mile movement paths. The shape, length, and direction of the two paths appear similar, but when segmented based on speed we have two different representations. Dodge et al. (2012) implemented this concept in a movement analysis context by creating a string of symbols representing different movement parameters, successfully representing North Atlantic hurricane trajectories and GPS tracks of couriers in London. In the example of our runners (Fig. 6A–B), this would equate to MMMMM and MMSFF. The use of nonspatial sequential clustering is an active research area in computer science, where methods such as decision trees, eigenvectors, and genetic algorithms are used, particularly in domains such as linguistics (Witten and Frank, 2005), meaning that collaboration could provide new insights into classifying trajectories. The methods outlined thus far all fail to incorporate context. Trajectories may appear similar when considered outside of geographic context, but differ when geography is applied (Demsar et al., 2015). Fig. 6C–D illustrates two identical movement trajectories for our runner, each with the segmentation MMSFF. Unless context is taken into account, the metrics outlined thus far will fail to accurately categorize these trajectories, grouping them together. However, in trajectory c the acceleration and deceleration rates correspond with the slopes the runner is traversing, while in trajectory d the changes in rates are due to barriers, such as having to slow to cross roads. This deceleration could be represented in the segmentation as e (elevation) and b (barrier), meaning the two sequences are now presented as MMSeFF and MMSbFF. Such context could help fitness trackers that record this information to select ideal routes for individuals training for specific events (e.g., hill running would be more advantageous than city running when training for cross-country races). Context has also been applied within such similarity and clustering techniques by using effective distances (e.g., least-cost path measurements) for hurricane trajectories (Buchin et al., 2012).
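One simple, generic way to compare such symbol strings (not necessarily the measure used by Dodge et al., 2012) is the Levenshtein edit distance, sketched below in Python:

def edit_distance(s, t):
    """Levenshtein distance between two movement-symbol strings."""
    m, n = len(s), len(t)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return prev[n]

print(edit_distance("MMSeFF", "MMSbFF"))  # 1
print(edit_distance("MMMMM", "MMSFF"))    # 3

Under this measure the two context-annotated runner trajectories are nearly identical (distance 1), while the two runs in Fig. 6A–B, though spatially similar, differ in three of five symbols.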

1.13.5 Modeling Movement: Simulation Models

Simulation is an important tool for testing scientific hypotheses by comparing observed movement to a null model for validation purposes (Dodge, 2016), as well as for visualizing data and generating a range of outcomes from analytical models. However, simulation is also itself an analytical tool and has been used to provide insights into how system-level properties emerge from the adaptive behavior of individuals, as well as how the system affects individuals (Grimm et al., 2006). Simulations are implementations of the underlying conceptual and mathematical models outlined in the previous section and are highly useful in systems where replication is not possible. For example, evacuation scenarios cannot be replicated 1000 times in reality to identify the most suitable location for emergency exits, making simulation a necessary and highly important modeling stage. Different spatial simulation models have been developed to understand and predict movement, comprising two distinct conceptualizations: spread models and random walks. In spread models, movement is considered a diffusive behavior through a neighborhood (often in a gridded representation), while random walk models simulate individual movement paths based on the location of previous steps, using empirical or theoretical distributions from which parameters such as step length and relative angle are drawn.

1.13.5.1 Spread Models

Discrete spatial simulation models (such as models of the movement of individuals between cell-phone tower zones) represent movement using a discrete (often cellular or lattice) movement space. Each cell (or observation) has a state, which changes iteratively in time, often in accordance with a set of predefined rules. Models using this structure perform well for simulating an object that spreads (or aggregates) throughout space, such as forest encroachment into grasslands or disease transmission throughout a population. Simulation of this movement is commonly implemented through a cellular automaton (CA) approach, with information and agents diffused throughout the landscape. Movement rules can vary, with different CA frameworks proposed for different applications. A simple, yet effective, model to simulate spread over time is the "majority rule," which specifies that the state of a cell in the next time step is dependent on the majority value in the neighborhood (e.g., Merow et al., 2011). More sophisticated CA models apply transition rules, with a hypothetical example provided in the pseudo-code below and in Fig. 7:

- if cell i = k, and
- if cell i = k for time -3 (the three previous iterations), and
- if neighboring cell j != k,
- then at time +1: j = k.
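A minimal Python/NumPy sketch of this hypothetical rule follows; the function name is ours, and np.roll wraps at the grid edges (a toroidal simplification):

import numpy as np

def step(history, k=1):
    """One iteration of the hypothetical transition rule above.
    history: list of 2D integer grids; assumes at least four entries."""
    cur = history[-1]
    # Cells that have held state k for the current and three previous iterations.
    stable = np.all(np.stack(history[-4:]) == k, axis=0)
    # Mark cells with at least one stable rook's-case (von Neumann) neighbor.
    neighbor_stable = np.zeros_like(stable)
    for axis, shift in [(0, 1), (0, -1), (1, 1), (1, -1)]:
        neighbor_stable |= np.roll(stable, shift, axis=axis)
    # Cells currently != k that neighbor a stable cell take state k at time +1.
    nxt = cur.copy()
    nxt[(cur != k) & neighbor_stable] = k
    return nxt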

These transition rules can vary and can encompass a wide variety of rules and assumptions. These models all produce spatial patterns by scaling up local processes, so an important and common feature of these methods is that they are based on an assumption that movement is dependent on the immediate neighborhood. This concept is rooted in Tobler's (1970, p. 236) First Law of Geography, which states that "everything is related to everything else, but near things are more related than distant things." Adjacency can be represented in numerous ways, with the most common methods defining the neighborhood as the four immediate neighbors (von Neumann or rook's case; Fig. 8A) or the eight surrounding neighbors (Moore or queen's case; Fig. 8B). Alternatively, the neighborhood can be defined as a specified distance buffer of various shapes that is representative of the movement capacities of the object (Fig. 8C–D). The choice of neighborhood definition can strongly influence the outcomes of these spatial simulations. For example, Holloway et al. (2016) compared 20 simulations of dispersal using various definitions of neighborhoods to investigate how the geographic ranges of British breeding birds would respond to climate change. They found that the choice of neighborhood in dispersal models strongly influenced the accuracy of the results, with a simple delineation of rook's or queen's connectivity resulting in an areal difference of over 2000 km² when all the species were considered together.
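These common neighborhood definitions can be generated as cell offsets; the sketch below is our own helper, with "circle" approximating a circular buffer in cell units:

def neighborhood(radius=1, kind="moore"):
    """Cell offsets for common neighborhood definitions: 'rook' (von Neumann),
    'moore' (queen's case), or a circular buffer of the given radius (cells)."""
    offs = [(dr, dc) for dr in range(-radius, radius + 1)
                     for dc in range(-radius, radius + 1) if (dr, dc) != (0, 0)]
    if kind == "rook":
        return [(dr, dc) for dr, dc in offs if abs(dr) + abs(dc) <= radius]
    if kind == "circle":
        return [(dr, dc) for dr, dc in offs if dr * dr + dc * dc <= radius * radius]
    return offs  # Moore: the full square

With radius=1, "rook" yields the four von Neumann offsets and "moore" the eight surrounding cells; larger radii produce buffer-style neighborhoods.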


Fig. 7 Illustration of the transition rules in a hypothetical cellular automata model.

Fig. 8 Neighborhood space defined as (A) four immediate neighbors, (B) eight immediate neighbors, (C) circular distance buffer, and (D) square distance buffer.

Simulation models are increasingly using more complex or probabilistic representations of neighborhoods to simulate discrete movement. Kernels extend these neighborhood definitions of accessibility by calculating a probability density function (PDF) describing the number of moving objects as a function of distance from the source, with the advantage that they can incorporate long-distance movement events. The possibility of movement occurring beyond an immediate neighborhood is founded on Reid's paradox, which acknowledges long-distance dispersal events in spite of the relatively short seed dispersal distances that have been recorded (Clark et al., 1998). This concept has been widely utilized in ecological research and is thought to play an important role in determining broad-scale processes of population spread (Nathan et al., 2008b). Several gridded models now incorporate a measure of long-distance movement (Engler and Guisan, 2009; Smolik et al., 2010; Merow et al., 2011). The use of kernels in a discrete movement space does well to capture movement throughout a neighborhood, with distances representative of the chosen PDF; however, it does introduce a methodological artifact. The continuous nature of a probabilistic kernel is predicated on the fact that, at any given point in a continuous sample space, a value can be interpreted based on the relative likelihood that the random variable would equal that point. Incorporating a continuous PDF into a discrete space results in a "blocky" representation that is dependent on the spatial resolution. Although artifacts of scale are noted throughout this article, the effects of discretizing probabilities have received little attention in spatial simulation. As noted, cellular modeling imposes limitations in representing movement and designing realistic spaces for agent interactions (Torrens, 2012), and subsequently such models are not always suited to decision making (Parker et al., 2003).
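The discretization artifact can be seen directly by evaluating a continuous kernel at cell centers. The sketch below uses a negative exponential kernel, one common dispersal kernel form; the grid sizes, kernel choice, and parameter values are illustrative assumptions:

import numpy as np

def discretized_kernel(nrows, ncols, source, cell_size, mean_dist):
    """Evaluate a negative-exponential dispersal kernel at cell centers.
    Coarser cell_size values yield the 'blocky' artifact discussed above."""
    r, c = np.indices((nrows, ncols))
    d = np.hypot(r - source[0], c - source[1]) * cell_size  # distance (m)
    k = np.exp(-d / mean_dist)
    return k / k.sum()  # normalize so probabilities over the grid sum to 1

coarse = discretized_kernel(11, 11, (5, 5), cell_size=100.0, mean_dist=150.0)
fine = discretized_kernel(110, 110, (55, 55), cell_size=10.0, mean_dist=150.0)

Comparing the coarse and fine grids shows the same kernel taking a visibly blocky form at the coarser resolution.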

1.13.5.2 Random Walk Models

The random walk framework is a stochastic representation of an individual movement trajectory. This method has been widely used in disciplines such as movement ecology (Codling et al., 2008) and the social sciences (Torrens, 2012), but also in disciplines where the simulation is not inherently spatial, such as finance (e.g., stock price fluctuations; Fama, 1995). The random walk concept is built on understanding the incremental fixes and moves in order to simulate how a trajectory will develop over time (O'Sullivan and Perry, 2013). Several extensive reviews have been written on random walk models across the disciplines; for a more detailed insight into how random walks can be used within movement analysis, we suggest Morales et al. (2004), Benhamou (2007), and Codling et al. (2008) within a movement ecology context, and Torrens (2012) for a social geography context. O'Sullivan and Perry (2013) provide a readable account of random walks from a GIS perspective.

Simple random walks constitute a movement path whereby the next turn angle is generated randomly from all possible options. Within a discrete movement space, an object moves through a lattice, selecting a random step from the predefined neighborhood, while within a continuous environment a step is selected randomly from the (0–360 degree) distribution of turn angles. The displacements observed through such models (particularly in discrete spaces) are oftentimes relatively small, due to the limited number of possible steps (dependent on the neighborhood) and the absence of directional bias. In fact, simple random walks have quantifiable properties related to displacement and distance, rooted in both mathematics and the structure of the movement space (O'Sullivan and Perry, 2013). Subsequently, the conceptual challenges outlined throughout this article associated with contextualizing movement throughout different spatial constructs are still pertinent.

Random walks can also have variable step lengths. Where steps have a finite variance, once a sufficient length of time has passed, the location coordinate of an object on any axis converges on a Gaussian distribution (Codling et al., 2008). In addressing this, super-diffusive walks, such as the Lévy walk, have been proposed, which advance simple random walks by assuming a fat-tailed distribution of step lengths (Benhamou, 2007). These models replicate well the large "jumps" or "steps" observed in real movement data; however, debate exists as to how applicable they are in ecology, with arguments in support (Viswanathan et al., 1996, 2011) and in caution of wide acceptance (Benhamou, 2007; James et al., 2011).

Within complex systems, a simple random walk is too simplistic to represent the actual trajectories of moving objects. Exceptions are found in the form of molecules and gas particles, which do move randomly through their environment, but overall these models are too simplistic to represent realistic movement patterns for the majority of objects. Building on this framework, a large number of simulation models are specified using correlation and/or bias, which describe movement in terms of a number of parameters, specifically directional persistence (correlation) and a consistent pull toward a given target (bias) (Codling et al., 2008). Simple correlated random walks have directional persistence, meaning each step tends to have the same direction as the previous one (Codling et al., 2008).
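A minimal correlated random walk in Python: turn angles are drawn from a von Mises distribution whose concentration parameter kappa controls directional persistence (kappa near 0 recovers a simple random walk); the exponential step lengths are an illustrative choice, as are all names:

import math
import random

def correlated_random_walk(n_steps, mean_step, kappa=4.0, seed=42):
    """Simulate a correlated random walk: each turn angle deviates from the
    previous heading by a von Mises draw centered on 0 (persistence)."""
    rng = random.Random(seed)
    x, y, heading = 0.0, 0.0, rng.uniform(0, 2 * math.pi)
    path = [(x, y)]
    for _ in range(n_steps):
        heading += rng.vonmisesvariate(0.0, kappa)  # small deviation from last direction
        step = rng.expovariate(1.0 / mean_step)     # step length with finite variance
        x += step * math.cos(heading)
        y += step * math.sin(heading)
        path.append((x, y))
    return path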
Such simulations can encompass rules such as self-avoidance, which prevents objects from returning to a location for a specified time, or reinforcement, which increases the probability of a step that has already been made (O'Sullivan and Perry, 2013). One issue with all of these models is that they lack context, and so a measure of bias is needed to give an environmental awareness of the movement space. Bias in random walks can consist of movement along environmental gradients (e.g., chemical attraction; van Haastert and Postma, 2007) or toward specific environments (e.g., displacement toward favorable habitat; Frair et al., 2008), and random walks have been extended to account for many different factors. For example, South (1999) developed a red squirrel home range model, where individuals moved in response to a built memory map of food resources, or in response to the distribution of nest sites, that was characteristic of observed movement patterns.

In simulations that involve internal states, subjective choices are required to parameterize models. Certain parameters, such as behavioral states, preference for habitat bias, or the attraction or avoidance of agents, can be difficult to quantify, calibrate, or justify, and have subsequently been labeled soft factors (Bonabeau, 2002). The quantitative outcomes of models often influence management decisions; however, due to the subjective nature of parameter estimation, it has been suggested that the outcomes of such simulation models should be interpreted at a purely qualitative level (Bonabeau, 2002). Error analysis should therefore be conducted to estimate how parameter states influence the modeled outcomes. The issue of uncertainty and sensitivity is a longstanding focus of GIS (Crosetto and Tarantola, 2001), and considerable research has been directed toward estimating and visualizing uncertainty in movement models (e.g., Laube and Purves, 2011). In reality, outputs from models will range from qualitative to quantitative, making interpretation dependent on the situation of the model, the data used to generate the simulations, and the overall purpose of the model.

Due to the subjective nature of parameter estimation, simulations based on quantitative analysis of real movement observations have been proposed to strengthen such models (Dodge, 2016). In simulating pedestrian movements, Torrens (2012) developed a model whereby agents made use of detailed geographic information that informed dynamic behaviors, allowing movement processes and any aggregate interpretations (e.g., movement of the crowd) to be construed as naturally arising from agent–environment interactions. This model was proposed in response to a reliance on abstract, physics-driven models that (largely) ignore real-world observations and theory about the processes that drive pedestrian movement. Similarly, Ahearn et al. (2017) used a chi-square distribution derived from the relationship between an agent's GPS observations and information about the underlying environment to determine the next probable move in a simulation of tigers in Thailand. The use of real movement observations in the simulation process allows for more robust analysis than models fit solely on subjective decisions.

Probabilistic time geography derives a probability surface that considers constraints on movement, based on random walks within a cellular automaton environment (Winter, 2009). Downs (2010) extended this by developing the time-geographic density estimator, which creates a probability surface for a moving object that is confined to the PPA instead of the commonly used Gaussian distribution. This creates a density surface representative of the probability of an object visiting a location, given its constraints. The generation of probability surfaces within the PPA has also been extended to a network space for vehicles (Downs and Horner, 2012), providing geographic context to such models. Winter and Yin (2010) advanced this work by using biased random walks, in the form of Brownian motion (movement based on a continuous-time stochastic process; Mörters and Peres, 2010), to simulate goal-oriented behavior of moving objects. A large number of random walk simulations provided the frequency of visits to locations, which were then normalized into a probability surface. Such random walk models have been further advanced within the probabilistic time geography concept; for example, Prager and Wiegand (2014) used a biased random walk to model the probability density of Flickr users in New York City. Song and Miller (2014) created a density estimator using simulated correlated random walks and truncated Brownian bridge movement to compare the distribution of visit probabilities in a planar space. Such simulation methods improved interpretations of accessibility within the space-time prism and controlled for the locational uncertainty of mobile objects.

Random walks have also been used to validate observed movement data. Laube and Purves (2006) used Monte Carlo simulations to generate synthetic lifelines of motion patterns created by a group of moving point objects and constrained these simulations by the statistical properties of real observational data. They then compared the simulated random walks with observational data to assess the significance of the patterns, and once patterns of interest were identified, they returned to the observed data to investigate whether these patterns had domain (movement ecology and political science) relevance. Similarly, Miller (2015) used a null model approach and compared a set of interaction statistics for pairs of brown hyenas with those generated by a correlated random walk. The use of a random walk model within the null model framework allowed for a more rigorous analysis of how well the interaction statistics performed. Likewise, Long et al. (2014b) used random walk theory to investigate the usefulness of the skew-normal distribution for modeling kinetic-based movement probabilities under a probabilistic time geography approach, simulating walks for comparison with the proposed models.
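In the spirit of these approaches (and only as a toy stand-in for a true space-time prism), the sketch below derives a visit-probability surface by normalizing visit counts from many lattice random walks; the walk is simply clipped to the study area rather than constrained by a PPA, and all names are ours:

import numpy as np

def visit_probability(n_walks, n_steps, size, start, rng=None):
    """Monte Carlo visit frequencies for a lattice random walk,
    normalized into a probability surface."""
    rng = rng or np.random.default_rng(0)
    counts = np.zeros((size, size))
    moves = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # rook's-case neighborhood
    for _ in range(n_walks):
        r, c = start
        for _ in range(n_steps):
            dr, dc = moves[rng.integers(4)]
            r = min(max(r + dr, 0), size - 1)   # clip walk to the study area
            c = min(max(c + dc, 0), size - 1)
            counts[r, c] += 1
    return counts / counts.sum()

surface = visit_probability(n_walks=5000, n_steps=50, size=21, start=(10, 10))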

1.13.5.3 Agent-Based Models

Despite accurately replicating many ecological, geographic, and urban systems, cellular models impose limitations in representing movement and designing realistic spaces for agent interactions (Torrens, 2012). However, they are increasingly being combined with other methods of simulating movement in geographic systems, such as random walks. One method through which aggregation and random walks have been coupled is agent-based modeling (ABM; Parker et al., 2003). ABM relies on the idea that the emergent characteristics of systems are understood by examining subcomponent relationships (Parker et al., 2003). ABM supports a wide range of applications by capturing the structural and functional complexity of ecological and social systems (Grimm and Railsback, 2005). Agent-based models have therefore been used to study a wide range of complex systems across disciplines, with Tang and Bennett (2010) identifying that the current ABM paradigm has developed from individual-based models in ecology, multiagent systems in computer science, computational process models in cognitive science, and ABM in the social sciences. Agents can be modeled either as the cells themselves or within these cellular models (O'Sullivan and Torrens, 2001), and this bottom-up simulation modeling compiles information about entities at a lower level of the system (often the agents), formulates theories about their behavior, and then implements these theories in a computer simulation before observing the emergence of system-level properties (Grimm et al., 2005). Agents are the central components, and they can be defined by several characteristics (Parker et al., 2003):

- They are autonomous
- They share an environment through agent interaction and communication
- They make decisions that are influenced by the environment

Being autonomous means that agents have control over their internal state and their actions in order to achieve certain goals. This implies flexibility in their behavior or, at the very least, requires strategies that allow them to react to changes in the environment. The use of ABM to understand movement has been steadily increasing in recent years due to a shift toward the Lagrangian approach. Nathan et al. (2008a) identified that animal movement can be defined by the internal state, motion capacities, navigation capacities, and external factors, and it can be argued that these factors extend to all active moving objects. ABM can be used to account for all four of these areas, which has been a key factor in the resurgence of ABM in recent years (Tang and Bennett, 2010). Table 2 identifies how these four areas can be addressed using ABM.

Table 2 How the four movement characteristics of movement ecology can be modeled in agent-based models

Internal state: This is the object's goal and is represented using state variables. These states can change throughout the simulation and have been used to represent perception, memory, learning, and decision making (e.g., Van Moorter et al., 2009).
Motion capacity: Objects use different movement modes, and this can be influenced by both internal state and external factors. Random walks on the landscape are an often-used technique in ABM, and these can be programmed to accommodate different movement modes based on internal or external influences (Tang and Bennett, 2010).
Navigation capacity: Navigation in real life is based on cognitive and environmental cues. ABM can model both, so it can be used to distinguish which has more of an impact on navigation.
External factors: The cellular model identified in the ABM framework by Parker et al. (2003) begins to illustrate the power of linking landscapes with ABMs. These landscapes influence how and where movement occurs, with feedback from the environment further influencing movement.
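A toy agent exhibiting the characteristics outlined above: it is autonomous (it chooses its own moves), it shares an environment (a resource grid), and it makes environment-influenced decisions driven by an internal state. All names and rules are hypothetical illustrations, not a published model:

import random

class Agent:
    def __init__(self, pos, energy=10.0):
        self.pos = pos          # (row, col) on the resource grid
        self.energy = energy    # internal state

    def step(self, resources, rng):
        r, c = self.pos
        rows, cols = len(resources), len(resources[0])
        # Candidate moves: Moore neighborhood (including staying put), clipped to the grid.
        nbrs = [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if 0 <= r + dr < rows and 0 <= c + dc < cols]
        if self.energy < 5.0:
            # Environment-influenced decision: head for the richest nearby cell.
            self.pos = max(nbrs, key=lambda p: resources[p[0]][p[1]])
        else:
            self.pos = rng.choice(nbrs)  # otherwise wander
        self.energy += resources[self.pos[0]][self.pos[1]] - 1.0  # forage minus movement cost

rng = random.Random(0)
grid = [[rng.random() for _ in range(5)] for _ in range(5)]
agent = Agent(pos=(2, 2))
for _ in range(10):
    agent.step(grid, rng)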

1.13.6 Modeling Movement: Visual Analytics

Visualization is essential for analyzing geographic phenomena and is an important tool in the cognitive processing of movement data, providing the link between humans and computers. Visualization allows the identification of patterns that were previously not evident across multiple spatial and temporal scales, supports hypothesis generation, and provides a data validation procedure, as errors can be quickly identified (Demsar et al., 2015). Dodge (2016) considers visualization a vital component of the movement research continuum due to its ability to communicate the outcomes of complex computational analysis, models, simulations, and predictions in meaningful and interpretable ways. Visualization within a movement context has relied heavily on cartography, meaning geography and GIScience will be key in the development of robust and informed geovisualization techniques. In general, the visualization of movement phenomena has been undertaken either as direct depiction of movement or based on aggregated units (Andrienko et al., 2008).

1.13.6.1 Direct Depiction

Direct depiction represents the traditional methodology used in visualization, whereby each point in space and time of the moving object is portrayed, such that the analyst can interact with the raw data (Andrienko et al., 2008). This is perhaps the closest representation of traditional cartographic methods, whereby points and polylines are depicted on a map representing possible geographic phenomena. While an effective starting point for visual exploration of moving objects, given the unprecedented amounts of movement data now being collected, displays are becoming cluttered with excessive numbers of data points and objects. This has led Andrienko and Andrienko (2012) to distinguish between visualization of full trajectories and of lower levels of interest such as moves and segments. Despite this, the deconstruction of movement trajectories is not without its own uncertainties (as discussed throughout this article) and, furthermore, does not necessarily result in a decluttered depiction of movement. Another key issue with direct depiction is the portrayal of both space and time. Time geography's STC is one of the most frequently used visual representations of movement data within a GIS context (Andrienko and Andrienko, 2012). The STC is an effective method to visualize movement data on three axes (Fig. 5A) and surpasses most other forms of direct depiction. Kwan (2008) used the STC concept to visualize the movements of a Muslim woman after September 11, 2001, coupling space-time paths with a conceptualization of the fear associated with neighborhoods. Such methods do well to use current visual analytics within a GIS to explore and project how geography is perceived by individuals. However, questions pertaining to cartographic design, operational environments, 3D depth, and data quantity and complexity still remain (Kveladze et al., 2013). Visualization of movement can also be effective in two dimensions. Mennis et al. (2013) incorporated text coding (a common feature of qualitative research) along with a novel set of cartographic symbolization strategies to represent a variety of characteristics associated with activity space locations and paths. This representation takes place in a 2D movement space, where colors and symbols are used to illustrate user attitudes toward space, integrated with a user-operated text box to portray narrative. They implemented this representation of space for urban youth and developed the Qualitative Activity space data Viewer (QAV), which utilizes location, path, and subject features.
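An STC-style direct depiction can be sketched in a few lines of Python with matplotlib (assuming it is installed); the fixes are hypothetical:

import matplotlib.pyplot as plt

# Hypothetical fixes: (x, y, t); time is drawn on the vertical axis of the STC.
xs = [0, 1, 2, 2, 3]
ys = [0, 0, 1, 2, 2]
ts = [0, 1, 2, 3, 4]

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot(xs, ys, ts, marker="o")       # the space-time path
ax.plot(xs, ys, zs=0, color="gray")   # its 2D footprint on the map plane
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("time")
plt.show()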

1.13.6.2 Aggregation

Aggregation involves methods for summarizing movement data prior to graphical depiction and visualization (Andrienko et al., 2008). Aggregation reduces the size of the dataset, which addresses the issues of clutter and occlusion in visualization; however, it increases abstraction and generalization within the analysis. Andrienko and Andrienko (2010) outline the main approaches for aggregating movement data for visualization purposes. The first method aggregates data into "events" that have similar attributes in time, space, or activity. Temporal aggregation creates temporal histograms in which bars correspond to specified time intervals, while spatial aggregation divides the environment into compartments such as a lattice grid. Attribute aggregation involves similar categorization and can be based on attributes such as speed or mode of movement. An important caveat of aggregation is that movement is viewed as a set of independent events (Andrienko and Andrienko, 2010), even though movement observations are not independent of one another. The second method aggregates data into spatially continuous density fields. Field-based representations of probabilistic time geography have been visualized using the STC. For example, Demsar and Virrantaus (2010) advanced direct depiction within an STC by visualizing the 3D space-time density of vessel trajectories in the Gulf of Finland. Similarly, Nakaya and Yano (2010) visualized space-time clusters of crime density using kernel density estimation and denoted hypothetical movement using arrows. The final method aggregates data that have specified origins and destinations, characterized by the number of aggregated units as well as primary derivatives (such as duration and length). Results of this aggregation have been visualized as a transition matrix, whereby rows and columns represent the origins and destinations and the attributes are depicted through color (e.g., Guo et al., 2006). Bak et al. (2015) developed flower charts to visualize the spatiotemporal distribution of stop events for public transport in Helsinki. However, portraying direction can be a challenge for objects that change direction frequently, more so than for objects with directional persistence.
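A minimal sketch of this third approach, aggregating hypothetical origin–destination pairs into a transition matrix:

from collections import Counter

# Hypothetical trips as (origin_zone, destination_zone) pairs.
trips = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "A"), ("A", "C")]

zones = sorted({z for od in trips for z in od})
flows = Counter(trips)

# Transition matrix: rows are origins, columns are destinations.
matrix = [[flows.get((o, d), 0) for d in zones] for o in zones]
print("   " + "  ".join(zones))
for o, row in zip(zones, matrix):
    print(o, row)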

1.13.6.3 Visualization Challenges

1.13.6.3.1 Conceptual spatial models

While continuous fields of density estimation have been portrayed, these have primarily been based on representing discrete objects. The paucity of work dealing with visualization of moving objects depicted as fields is well-noted (Andrienko and Andrienko, 2012); however, the movement of such continuous fields is an important challenge facing GIS research in the coming years. Many geographic phenomena such as ocean currents, atmospheric gases, and land covers continuously move and change locations. The raster conceptualization of space is widely used to depict such features, but the dearth of visualization techniques beyond time-series of static snapshots is troublesome.

1.13.6.3.2 Context

Another notable absentee from the above visualization techniques is geographic context. While many of the examples provided utilize some cartographic representation of the underlying environment (e.g., an STC overlain on a map; Fig. 5B), many of the current techniques used in the analysis and modeling of movement data do not incorporate context. The temporal matching of environmental datasets with movement data is an issue with many of the cartographic examples provided, due to the static nature of the underlying representation of space. Xavier and Dodge (2014) developed a visualization tool that illustrated multidimensional environmental context along the movement trajectories of nine Galapagos albatrosses. This visualization allows users to move, resize, and color all elements of the underlying geography and provides an interactive timeline that highlights the values of a number of temporally explicit environmental variables. Another visualization technique with potential use in simulation models was developed by Bouvier and Oates (2008) to provide environmental context: as an agent interacts with an environmental feature, it becomes stained (depicted as a specific color), allowing the user to observe which agents have interacted with specific features. Despite these advances, applying context remains a challenge for visualization techniques.

1.13.6.3.3 Animation

The animation of moving objects is seldom considered when visualizing in GIS. This is particularly pertinent for individual-based models, whereby the movement of agents or objects is used to simulate realistic movements. Animations are often good enough for entertainment or scientific research (Torrens, 2012), but in order to fully understand the processes that drive movement patterns, objects need to be considered in their fullest representation and not as abstracted entities. Generating plausible and authentically behaving synthetic agents is an important component of social science, engineering, planning, and policy scenarios (Torrens, 2014). Furthermore, with the expectations of the end users of many simulations rising due to advancements in gaming and computer-generated imagery, GIS needs to keep pace with such developments or risk losing ground to fields with more realistic graphics.

1.13.7 Conclusion

The analysis and modeling of movement is a burgeoning area of GIS research. Recent advances in the technologies used to record the movements of objects have resulted in an unprecedented amount of spatiotemporal geographic data at increasingly fine spatial and temporal resolutions, and new methods to understand and model movement processes are needed. Novel methodological challenges need to be considered when defining and quantifying movement, such as the aggregation of movement data to different levels, the differences between Lagrangian and Eulerian approaches to studying movement, and the sampling regimes used to collect data. Movement seldom occurs irrespective of the environment, and so the movement space is an important construct to consider. It is imperative that movement be analyzed within the geographic space through which these patterns and processes operate. The structure of the movement space and how movement data and the environment interact are important issues. Of particular relevance to current GIS research are questions related to how continuous movement is discretized and how environmental features are generated.

A number of different models have been implemented in order to research movement. Models are simplifications of reality and have wide applications, from understanding to predicting movement. We discussed three concepts of models: analytical, simulation, and visual. A number of analytical methods have been used in movement analysis, including time geography, movement parameters, and similarity and clustering. These models encapsulate the fundamental ways by which movement data have been quantitatively analyzed within a GIS to date. Studying the geographic context of movement within these models is an ongoing challenge. Simulation is an important tool for testing scientific hypotheses by comparing observed movement to a null model for validation purposes, as well as for visualizing data and predicting the outcomes of analytical models. Simulation has also been used to provide insights into how system-level properties emerge from the adaptive behavior of individuals, as well as how the system affects individuals. Different spatial simulation models have been developed to understand and predict movement, with two distinct conceptualizations: spread models and random walks. Spread models treat movement as a diffusive behavior through a neighborhood (often in a gridded representation), while random walk models simulate individual movement paths based on the location of previous steps. Visualization, and in particular visual analytics, is an active analytical framework within GIS in its own right, and it is an important independent component for analyzing and modeling movement. Visualization has predominantly been categorized as either direct depiction of movement data or aggregated representation of movement. Such models are important for validation, as well as for communicating results. The visualization of moving fields, the application of geographic context, and improved animation will be important research frontiers of movement analysis in GIS in the coming years.

References

Ahearn, S.C., Dodge, S., Simcharoen, A., Xavier, G., Smith, J.L., 2017. A context-sensitive correlated random walk: a new simulation model for movement. International Journal of Geographical Information Science 31 (5), 1–17.
Andrienko, G., Andrienko, N., 2010. A general framework for using aggregation in visual exploration of movement data. The Cartographic Journal 47 (1), 22–40.
Andrienko, N., Andrienko, G., 2012. Visual analytics of movement: an overview of methods, tools and procedures. Information Visualization 12 (1), 3–24.
Andrienko, G., Andrienko, N., Dykes, J., Fabrikant, S.I., Wachowicz, M., 2008. Geovisualization of dynamics, movement and change: key issues and developing approaches in visualization research. Information Visualization 7 (3–4), 173.
Anselin, L., 1989. What is special about spatial data? Alternative perspectives on spatial data analysis. National Center for Geographic Information and Analysis, Santa Barbara, CA, pp. 63–77.
Badariotti, D., Banos, A., Moreno, D., 2007. Influence of network metrics in urban simulation: introducing accessibility in graph-cellular automata. In: 15th European Colloquium on Theoretical and Quantitative Geography, p. 4.
Bak, P., Ship, H., Yaeli, A., et al., 2015. Visual analytics for movement behavior in traffic and transportation. IBM Journal of Research and Development 59 (2/3), 10–11.
Barreira-González, P., Barros, J., 2016. Configuring the neighborhood effect in irregular cellular automata based models. International Journal of Geographical Information Science 31 (3), 1–20.
Bartlam-Brooks, H.L.A., Bonyongo, M.C., Harris, S., 2011. Will reconnecting ecosystems allow long-distance mammal migrations to resume? A case study of a zebra Equus burchelli migration in Botswana. Oryx 45 (2), 210–216.
Benhamou, S., 2007. How many animals really do the Lévy walk? Ecology 88 (8), 1962–1969.
Bjørneraas, K., Solberg, E.J., Herfindal, I., et al., 2011. Moose Alces alces habitat use at multiple temporal scales in a human-altered landscape. Wildlife Biology 17 (1), 44–54.
Bocedi, G., Pe'er, G., Heikkinen, R.K., Matsinos, Y., Travis, J.M., 2012. Projecting species' range expansion dynamics: sources of systematic biases when scaling up patterns and processes. Methods in Ecology and Evolution 3 (6), 1008–1018.
Bonabeau, E., 2002. Agent-based modeling: methods and techniques for simulating human systems. Proceedings of the National Academy of Sciences of the United States of America 99 (3), 7280–7287.
Bouvier, D.J., Oates, B., 2008. Evacuation traces mini challenge award: innovative trace visualization staining for information discovery. In: Visual Analytics Science and Technology, 2008 (VAST '08), IEEE Symposium. IEEE, pp. 219–220.
Buchin, M., Dodge, S., Speckmann, B., 2012. Context-aware similarity of trajectories. Lecture Notes in Computer Science 7478, 43–56.
Burnett, G.E., Lee, K., 2005. The effect of vehicle navigation systems on the formation of cognitive maps. In: International Conference of Traffic and Transport Psychology.
Cass, S., 2002. Mind games: to beat the competition, video games are getting smarter. IEEE Spectrum 39 (12), 40–44.
Chavoshi, S.H., De Baets, B., Neutens, T., De Tré, G., Van de Weghe, N., 2015. Exploring dance movement using sequence alignment methods. PLoS ONE 10 (7), e0132452.
Chipperfield, J.D., Holland, E.P., Dytham, C., Thomas, C.D., Hovestadt, T., 2011. On the approximation of continuous dispersal kernels in discrete-space models. Methods in Ecology and Evolution 2, 668–681.
Christakos, G., Bogaert, P., Serre, M., 2012. Temporal GIS: advanced functions for field-based applications. Springer Science and Business Media, Berlin.
Christensen, K.M., Sharifi, M.S., Chen, A., 2013. Considering individuals with disabilities in a building evacuation: an agent-based simulation study. In: Proceedings of the 92nd Annual Meeting of the Transportation Research Board, Washington, DC.
Clark, J.S., Fastie, C., Hurtt, G., et al., 1998. Reid's paradox of rapid plant migration: dispersal theory and interpretation of paleoecological records. BioScience 48 (1), 13–24.
Codling, E.A., Plank, M.J., Benhamou, S., 2008. Random walk models in biology. Journal of the Royal Society Interface 5 (25), 813–834.
Couclelis, H., 1997. From cellular automata to urban models: new principles for model development and implementation. Environment and Planning B: Planning and Design 24 (2), 165–174.
Coulon, A., Morellet, N., Goulard, M., et al., 2008. Inferring the effects of landscape structure on roe deer (Capreolus capreolus) movements using a step selection function. Landscape Ecology 23 (5), 603–614.
Crosetto, M., Tarantola, S., 2001. Uncertainty and sensitivity analysis: tools for GIS-based model implementation. International Journal of Geographical Information Science 15 (5), 415–437.
Cunze, S., Heydel, F., Tackenberg, O., 2013. Are plant species able to keep pace with the rapidly changing climate? PLoS ONE 8 (7), e67909.
Demsar, U., Çöltekin, A., 2014. Quantifying the interactions between eye and mouse movements on spatial visual interfaces through trajectory visualisations. In: Workshop on Analysis of Movement Data at GIScience, Vienna, 23–26 September 2014.
Demsar, U., Buchin, K., Cagnacci, F., et al., 2015. Analysis and visualization of movement: an interdisciplinary review. Movement Ecology 3 (1), 1.
Demsar, U., Virrantaus, K., 2010. Space–time density of trajectories: exploring spatio-temporal patterns in movement data. International Journal of Geographical Information Science 24 (10), 1527–1542.
Dodge, S., 2016. From observation to prediction: the trajectory of movement research in GIScience. In: Onsrud, H., Kuhn, W. (Eds.), Advancing geographic information science: the past and next twenty years. GSDI Association Press, Needham, MA, pp. 123–136 (Chapter 9).
Dodge, S., et al., 2016. Analysis of movement data. International Journal of Geographical Information Science 30, 825–834.


Dodge, S., Bohrer, G., Weinzierl, R., et al., 2013. The environmental-data automated track annotation (Env-DATA) system: linking animal tracks with environmental data. Movement Ecology 1 (1), 1.
Dodge, S., Laube, P., Weibel, R., 2012. Movement similarity assessment using symbolic representation of trajectories. International Journal of Geographical Information Science 26 (9), 1563–1588.
Dodge, S., Weibel, R., Lautenschütz, A.K., 2008. Towards a taxonomy of movement patterns. Information Visualization 7, 240–252.
Downs, J.A., 2010. Time-geographic density estimation for moving point objects. Lecture Notes in Computer Science 6292, 16–26.
Downs, J.A., Horner, M.W., 2012. Probabilistic potential path trees for visualizing and analyzing vehicle tracking data. Journal of Transport Geography 23, 72–80.
Drewe, J.A., Weber, N., Carter, S.P., et al., 2012. Performance of proximity loggers in recording intra- and inter-species interactions: a laboratory and field-based validation study. PLoS ONE 7 (6), e39068.
Duckham, M., Kulik, L., 2005. A formal model of obfuscation and negotiation for location privacy. In: International Conference on Pervasive Computing. Springer, Berlin/Heidelberg, pp. 152–170.
Duckham, M., Mokbel, M., Nittel, S., 2007. Special issue on privacy aware and location-based mobile services. Journal of Location Based Services 1 (3), 161–164.
Duran, B.S., Odell, P.L., 2013. Cluster analysis: a survey, vol. 100. Springer Science and Business Media, Berlin.
Engler, R., Guisan, A., 2009. MIGCLIM: predicting plant distribution and dispersal in a changing climate. Diversity and Distributions 15 (4), 590–601.
Fama, E.F., 1995. Random walks in stock market prices. Financial Analysts Journal 51 (1), 75–80.
Farber, S., Paez, A., 2009. My car, my friends, and me: a preliminary analysis of automobility and social activity participation. Journal of Transport Geography 17, 216–225.
Fortin, D., Beyer, H.L., Boyce, M.S., et al., 2005. Wolves influence elk movements: behavior shapes a trophic cascade in Yellowstone National Park. Ecology 86 (5), 1320–1330.
Frair, J.L., Merrill, E.H., Beyer, H.L., Morales, J.M., 2008. Thresholds in landscape connectivity and mortality risks in response to growing road networks. Journal of Applied Ecology 45 (5), 1504–1513.
Frair, J.L., Fieberg, J., Hebblewhite, M., et al., 2010. Resolving issues of imprecise and habitat-biased locations in ecological analyses using GPS telemetry data. Philosophical Transactions of the Royal Society of London B: Biological Sciences 365 (1550), 2187–2200.
Goodchild, M.F., 2013. Prospects for a space-time GIS. Annals of the Association of American Geographers 103 (5), 1072–1077.
Grimm, V., Railsback, S.F., 2005. Individual-based modeling and ecology. Princeton University Press, Princeton, NJ.
Grimm, V., Revilla, E., Berger, U., et al., 2005. Pattern-oriented modeling of agent-based complex systems: lessons from ecology. Science 310 (5750), 987–991.
Grimm, V., Berger, U., Bastiansen, F., et al., 2006. A standard protocol for describing individual-based and agent-based models. Ecological Modelling 198 (1), 115–126.
Gudmundsson, J., Laube, P., Wolle, T., 2012. Computational movement analysis. In: Springer handbook of geographic information. Springer, Berlin/Heidelberg, pp. 423–438.
Guo, D., Chen, J., MacEachren, A.M., Liao, K., 2006. A visualization system for space-time and multivariate patterns (vis-stamp). IEEE Transactions on Visualization and Computer Graphics 12 (6), 1461–1474.
Hägerstrand, T., 1970. What about people in regional science? Papers of the Regional Science Association 24, 7–21.
Hall, C.M., Page, S.J., 2014. The geography of tourism and recreation: environment, place and space. Routledge, Abingdon.
Hebblewhite, M., Haydon, D.T., 2010. Distinguishing technology from biology: a critical review of the use of GPS telemetry data in ecology. Philosophical Transactions of the Royal Society of London B: Biological Sciences 365 (1550), 2303–2312.
Holland, E.P., Aegerter, J.N., Dytham, C., Smith, G.C., 2007. Landscape as a model: the importance of geometry. PLoS Computational Biology 3 (10), e200.
Holloway, P., Miller, J., 2014. Uncertainty analysis of step-selection functions: the effect of model parameters on inferences about the relationship between animal movement and the environment. Lecture Notes in Computer Science 8728, 48–63.
Holloway, P., Miller, J.A., Gillings, S., 2016. Incorporating movement in species distribution modelling: how do simulations of dispersal affect the accuracy and uncertainty of projections? International Journal of Geographical Information Science 30 (10), 2050–2074.
Hwang, R.H., Hsueh, Y.L., Chung, H.W., 2014. A novel time-obfuscated algorithm for trajectory privacy protection. IEEE Transactions on Services Computing 7 (2), 126–139.
James, A., Plank, M.J., Edwards, A.M., 2011. Assessing Lévy walks as models of animal foraging. Journal of the Royal Society Interface 8 (62), 1233–1247.
Jønsson, K.A., Tøttrup, A.P., Borregaard, M.K., et al., 2016. Tracking animal dispersal: from individual movement to community assembly and global range dynamics. Trends in Ecology and Evolution 31 (3), 204–214.
Kays, R., Crofoot, M.C., Jetz, W., Wikelski, M., 2015. Terrestrial animal tracking as an eye on life and planet. Science 348 (6240), aaa2478.
Komito, L., 2011. Social media and migration: virtual community 2.0. Journal of the American Society for Information Science and Technology 62 (6), 1075–1086.
Kucera, G., 1992. Time in geographic information systems. CRC Press, Boca Raton, FL.
Kveladze, I., Kraak, M.J., van Elzakker, C.P., 2013. A methodological framework for researching the usability of the space-time cube. The Cartographic Journal 50 (3), 201–210.
Kwan, M.P., 1999. Gender and individual access to urban opportunities: a study using space–time measures. The Professional Geographer 51 (2), 210–227.
Kwan, M.P., 2004. GIS methods in time-geographic research: geocomputation and geovisualization of human activity patterns. Geografiska Annaler: Series B, Human Geography 86 (4), 267–280.
Kwan, M.P., 2007. Affecting geospatial technologies: toward a feminist politics of emotion. The Professional Geographer 59 (1), 22–34.
Kwan, M.P., 2008. From oral histories to visual narratives: re-presenting the post-September 11 experiences of the Muslim women in the USA. Social and Cultural Geography 9 (6), 653–669.
Langran, G., 1989. A review of temporal database research and its use in GIS applications. International Journal of Geographical Information Systems 3 (3), 215–232.
Laube, P., 2014. Computational movement analysis. Springer, New York.
Laube, P., 2017. Representation, trajectories. In: Richardson, D., Castree, N., Goodchild, M.F., Kobayashi, A., Liu, W., Marston, R.A. (Eds.), The international encyclopaedia of geography: people, the earth, environment, and technology. John Wiley and Sons, Ltd, New York City, NY.
Laube, P., Purves, R.S., 2006. An approach to evaluating motion pattern detection techniques in spatio-temporal data. Computers, Environment and Urban Systems 30 (3), 347–374.
Laube, P., Purves, R.S., 2011. How fast is a cow? Cross-scale analysis of movement data. Transactions in GIS 15 (3), 401–418.
Levin, S.A., 1992. The problem of pattern and scale in ecology: the Robert H. MacArthur award lecture. Ecology 73 (6), 1943–1967.
Lindberg, M.S., Walker, J., 2007. Satellite telemetry in avian research and management: sample size considerations. The Journal of Wildlife Management 71 (3), 1002–1009.
Lindenmayer, D.B., Fischer, J., Hobbs, R., 2007. The need for pluralism in landscape models: a reply to Dunn and Majer. Oikos 116 (8), 1419–1421.
Long, J.A., Nelson, T.A., 2012. Time geography and wildlife home range delineation. The Journal of Wildlife Management 76 (2), 407–413.
Long, J.A., Nelson, T.A., 2013a. A review of quantitative methods for movement data. International Journal of Geographical Information Science 27 (2), 292–318.
Long, J.A., Nelson, T.A., 2013b. Measuring dynamic interaction in movement data. Transactions in GIS 17 (1), 62–77.
Long, J.A., Nelson, T.A., Webb, S.L., Gee, K.L., 2014a. A critical examination of indices of dynamic interaction for wildlife telemetry studies. Journal of Animal Ecology 83 (5), 1216–1233.
Long, J.A., Nelson, T.A., Nathoo, F.S., 2014b. Toward a kinetic-based probabilistic time geography. International Journal of Geographical Information Science 28 (5), 855–874.
Mennis, J., Mason, M.J., Cao, Y., 2013. Qualitative GIS and the visualization of narrative activity space data. International Journal of Geographical Information Science 27 (2), 267–291.
Merow, C., LaFleur, N., Silander Jr., J.A., Wilson, A.M., Rubega, M., 2011. Developing dynamic mechanistic species distribution models: predicting bird-mediated spread of invasive plants across Northeastern North America. The American Naturalist 178 (1), 30–43.
Miller, H.J., 1991. Modelling accessibility using space-time prism concepts within geographical information systems. International Journal of Geographical Information Systems 5 (3), 287–301.


Miller, H.J., 2005a. A measurement theory for time geography. Geographical Analysis 37 (1), 17–45.
Miller, H.J., 2005b. What about people in geographic information science? In: Re-presenting Geographic Information Systems. John Wiley, Hoboken, NJ, pp. 215–242.
Miller, J.A., 2012. Using spatially explicit simulated data to analyze animal interactions: a case study with brown hyenas in northern Botswana. Transactions in GIS 16 (3), 271–291.
Miller, J.A., 2015. Towards a better understanding of dynamic interaction metrics for wildlife: a null model approach. Transactions in GIS 19 (3), 342–361.
Morales, J.M., Haydon, D.T., Frair, J., Holsinger, K.E., Fryxell, J.M., 2004. Extracting more out of relocation data: building movement models as mixtures of random walks. Ecology 85 (9), 2436–2445.
Mörters, P., Peres, Y., 2010. Brownian motion, vol. 30. Cambridge University Press, Cambridge.
Naidoo, R., Chase, M.J., Beytell, P., Du Preez, P., 2016. A newly discovered wildlife migration in Namibia and Botswana is the longest in Africa. Oryx 50 (1), 138–146.
Nakaya, T., Yano, K., 2010. Visualising crime clusters in a space-time cube: an exploratory data-analysis approach using space-time kernel density estimation and scan statistics. Transactions in GIS 14 (3), 223–239.
Nathan, R., Getz, W.M., Revilla, E., et al., 2008a. A movement ecology paradigm for unifying organismal movement research. Proceedings of the National Academy of Sciences of the United States of America 105 (49), 19052–19059.
Nathan, R., Schurr, F.M., Spiegel, O., et al., 2008b. Mechanisms of long-distance seed dispersal. Trends in Ecology and Evolution 23 (11), 638–647.
Neutens, T., Schwanen, T., Witlox, F., 2011. The prism of everyday life: towards a new research agenda for time geography. Transport Reviews 31 (1), 25–47.
O'Sullivan, D., 2001. Graph-cellular automata: a generalised discrete urban and regional model. Environment and Planning B: Planning and Design 28 (5), 687–705.
O'Sullivan, D., Perry, G.L.W., 2013. Spatial simulation: exploring pattern and process. Wiley-Blackwell, Oxford.
O'Sullivan, D., Torrens, P.M., 2001. Cellular models of urban systems. In: Theory and practical issues on cellular automata. Springer, London, pp. 108–116.
Parker, D.C., Manson, S.M., Janssen, M.A., et al., 2003. Multi-agent systems for the simulation of land-use and land-cover change: a review. Annals of the Association of American Geographers 93 (2), 314–337.
Pereboom, V., Mergey, M., Villerette, N., et al., 2008. Movement patterns, habitat selection, and corridor use of a typical woodland-dweller species, the European pine marten (Martes martes), in a fragmented landscape. Canadian Journal of Zoology 86 (9), 983–991.
Peuquet, D.J., 1994. It's about time: a conceptual framework for the representation of temporal dynamics in geographic information systems. Annals of the Association of American Geographers 84 (3), 441–461.
Prager, S.D., Wiegand, R.P., 2014. Modeling use of space from social media data using a biased random walker. Transactions in GIS 18 (6), 817–833.
Purves, R.S., Laube, P., Buchin, M., Speckmann, B., 2014. Moving beyond the point: an agenda for research in movement analysis with real data. Computers, Environment and Urban Systems 47, 1–4.
Robbins, J., 2010. GPS navigation... but what is it doing to us? In: 2010 IEEE International Symposium on Technology and Society. IEEE, pp. 309–318.
Ryan, P.G., Petersen, S.L., Peters, G., Gremillet, D., 2004. GPS tracking a marine predator: the effects of precision, resolution and sampling rate on foraging tracks of African penguins. Marine Biology 145 (2), 215–223.
Schærström, A., 1996. Pathogenic paths? A time geographical approach in medical geography, vol. 125. Lund University Press, Lund.
Smolik, M.G., Dullinger, S., Essl, F., et al., 2010. Integrating species distribution models and interacting particle systems to predict the spread of an invasive alien plant. Journal of Biogeography 37 (3), 411–422.
Song, Y., Miller, H.J., 2014. Simulating visit probability distributions within planar space-time prisms. International Journal of Geographical Information Science 28 (1), 104–125.
South, A., 1999. Extrapolating from individual movement behavior to population spacing patterns in a ranging mammal. Ecological Modelling 117 (2), 343–360.
Tang, W., Bennett, D.A., 2010. Agent-based modeling of animal movement: a review. Geography Compass 4 (7), 682–700.
Tao, S., Manolopoulos, V., Rodriguez, S., Rusu, A., 2012. Real-time urban traffic state estimations with A-GPS mobile phones as probes. Journal of Transportation Technologies 2, 22–31.
Thurfjell, H., Ciuti, S., Boyce, M.S., 2014. Applications of step-selection functions in ecology and conservation. Movement Ecology 2 (4).
Tobler, W., 1970. A computer movie simulating urban growth in the Detroit region. Economic Geography 46 (2), 234–240.
Torrens, P.M., 2012. Moving agent pedestrians through space and time. Annals of the Association of American Geographers 102 (1), 35–66.
Torrens, P.M., 2014. High-fidelity behaviors for model people on model streetscapes. Annals of GIS 20 (3), 139–157.
Van Haastert, P.J., Postma, M., 2007. Biased random walk by stochastic fluctuations of chemoattractant-receptor interactions at the lower limit of detection. Biophysical Journal 93 (5), 1787–1796.
Van Moorter, B., Visscher, D., Benhamou, S., et al., 2009. Memory keeps you at home: a mechanistic model for home range emergence. Oikos 118 (5), 641–652.
Viswanathan, G.M., Afanasyev, V., Buldyrev, S.V., et al., 1996. Lévy flight search patterns of wandering albatrosses. Nature 381 (6581), 413–415.
Viswanathan, G.M., Da Luz, M.G., Raposo, E.P., Stanley, H.E., 2011. The physics of foraging: an introduction to random searches and biological encounters. Cambridge University Press, Cambridge.
Wikelski, M., Tarlow, E.M., Raim, A., et al., 2003. Avian metabolism: costs of migration in free-flying songbirds. Nature 423 (6941), 704.
Winter, S., 2009. Towards a probabilistic time geography. In: Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS '09). ACM Press, New York, NY, pp. 528–531.
Winter, S., Yin, Z.-C., 2010. Directed movements in probabilistic time geography. International Journal of Geographical Information Science 24 (9), 1349–1365.
Witten, I.H., Frank, E., 2005. Data mining: practical machine learning tools and techniques. Morgan Kaufmann, Burlington, MA.
Xavier, G., Dodge, S., 2014. An exploratory visualization tool for mapping the relationships between animal movement and the environment. In: Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Interacting with Maps. ACM, New York, NY, pp. 36–42.
Yuan, M., 1996. Temporal GIS and spatio-temporal modeling. In: Proceedings of Third International Conference/Workshop on Integrating GIS and Environment Modeling, Santa Fe, NM.

1.14 Spatial Metrics: The Static and Dynamic Perspectives

Saad Saleem Bhatti, José Pedro Reis, and Elisabete A Silva, University of Cambridge, Cambridge, United Kingdom
© 2018 Elsevier Inc. All rights reserved.

1.14.1 Background
1.14.2 Static Metrics: Potentials, Gaps, and Challenges
1.14.2.1 Spatial Metrics in the Literature
1.14.2.2 Usefulness and Potential of Static Metrics
1.14.2.3 Gaps and Challenges
1.14.3 Dynamic Metrics
1.14.3.1 SRS Data and Spatiotemporal Metrics
1.14.3.2 The Space–Time Dynamics
1.14.4 Future Directions in Spatial Metrics: From Static to Dynamic?
1.14.5 Appendix
References

1.14.1 Background

Spatial metrics are quantitative measures for evaluating the physical characteristics of geographical features, such as urban settlements and structures (Reis, 2015; Reis et al., 2015). Like other quantitative methods of spatial analysis, spatial metrics have garnered growing international research interest during the past two decades (Clifton et al., 2008; Liu and Yang, 2015; Schwarz, 2010). This resurgence of quantitative urban studies is attributed to several factors: growing concerns about sustainable development and the negative environmental consequences of urban sprawl (Huang et al., 2007; Kaza, 2013; Lord et al., 2015; Tian et al., 2014; Torrens, 2008), increasing emphasis on evidence-based policy formulation (Geertman, 2015), developments in geographical information systems and information technologies, and the rising quality and availability of spatially referenced data, notably with recent advancements in remote sensing-based methods (Clifton et al., 2008; Herold et al., 2003a; Huang et al., 2007).

Spatial metrics have been used for a wide range of purposes. These include, but are not limited to, characterizing urban patterns to support planning policy, comparing physical patterns of different cities or regions, and understanding the spatiotemporal patterns of urban development (Herold et al., 2003a). Spatial metrics applied to digital geographical information systems (GIS) layers have increasingly been used together with other methods, such as remote sensing and image processing techniques embedded in urban growth models (Herold et al., 2003a; Kaza, 2013; Schneider and Woodcock, 2008; Silva et al., 2008). By offering an improved description of heterogeneous urban areas, spatial metrics can be useful tools for the development of spatial plans (Marshall and Gong, 2009), for instance, by local authorities. Coupled with dynamic spatial urban models, such as agent-based models (ABMs) and cellular automaton (CA) models, metrics capture spatial and temporal variability and can assist in testing different planning scenarios and in exploring the subsequent impacts of different policy decisions (Herold et al., 2003a). Furthermore, spatial metrics and urban models can make an important contribution to linking urban land use patterns and spatial structure with functionality and economic processes (Parker et al., 2001).

Although a wide range of spatial metrics can be found in the literature, only a handful of studies have managed to cross disciplinary boundaries and carry out truly broad, multidisciplinary research using spatial metrics to study urban patterns (Reis et al., 2015). Studies specifically focusing on the use of metrics are still quite rare, and even the few that review and compare different spatial metrics are normally restricted to a specific area of knowledge or geographical location (Banzhaf et al., 2009; Lauf et al., 2012; Schwarz, 2010). Moreover, most reviews and empirical applications focus on static metrics, making the study of dynamic metrics a very promising but still quite undeveloped field of research. An important consideration in dynamic metrics is "time" along with the usual spatial dimension. Although a significant amount of research has been done to develop, test, and apply a variety of static spatial metrics, little work has been carried out toward true integration of time and aspatial data in spatial metrics.
Efforts have been made to incorporate the temporal dimension into spatial metrics (Fichera et al., 2012; Guo and Di, 2015; Kong et al., 2012; Ramachandra et al., 2012); however, these studies to a great extent relied on processing static metrics at multiple time intervals to model the temporal dynamics. Since the dimensions of space and time are addressed separately in such methods, the resultant approaches are quite complex and disintegrated. This technique of blending the spatial and temporal dimensions has also been applied to satellite remote sensing (SRS) data (Kong et al., 2012; Pham et al., 2011), and efforts have been made to combine remote sensing-based data and spatial metrics in different research areas, especially urban growth modeling (Ramachandra et al., 2012; Herold et al., 2005). The rise in the spatial and temporal resolutions of SRS data over recent years demands faster, dynamic spatiotemporal modeling techniques capable of capturing both the spatial and temporal dimensions in a seamless way; dynamic spatial metrics seem to have an obvious edge over hybrid and disintegrated methods of space–time modeling (Singh et al., 2016; Valero et al., 2016). However, given the myriad of variables and the variety of spatial and aspatial data involved in the modeling process, such integration is quite challenging and has been demonstrated in only a handful of research studies, which are also study-area specific and hardly replicable to other geographic areas.


This article reviews the use of metrics for spatial analyses, focusing on the strengths, weaknesses, and challenges associated with the application of existing static and dynamic spatial metrics. By identifying the gaps in existing methods of data processing and metrics, this review aims to direct future research toward the development of robust dynamic metrics that can truly integrate the dimensions of both space and time together with aspatial data. In this article, dynamic metrics are spatial metrics that exhibit at least one of the following characteristics: they include, in an explicit form, the measurement of both temporal and spatial variations; they are capable of mapping the metric's path through a set of time frames or spatial scales; they measure "dynamic spatiality" (vertical and horizontal integration); and/or they include feedback loops, self-organization, and/or learning. All other spatial metrics not exhibiting these characteristics will be referred to as static metrics, or simply spatial metrics, in this article.

1.14.2 Static Metrics: Potentials, Gaps, and Challenges

Building on a comprehensive review of spatial metrics for growth and shrinkage (Reis, 2015; Reis et al., 2014, 2015), this section presents an overview of the state of research on spatial metrics, identifying their potentials and gaps, and discussing some of the main challenges in developing spatial metrics and applying them empirically. It focuses on metrics that characterize urban patterns and structures in a particular time period, defined here as "static metrics," as opposed to the specific characteristics of "dynamic metrics" defined previously.

1.14.2.1 Spatial Metrics in the Literature

Despite being commonly used for spatiotemporal analyses of urban areas, static metrics are not a function of time (time is not embedded within the metric). Temporal analysis with static metrics is normally done by comparing the results of the same metric for different time periods, or temporal snapshots of the spatial data.

According to Reis et al. (2015), spatial metrics can be divided into three main groups based on the area of knowledge and methodological approach to urban form in which the metrics were developed: landscape metrics, geospatial metrics, and spatial statistics. Their review identified a total of 162 metrics (41 landscape metrics, 110 geospatial metrics, and 11 spatial statistics), summarized in Table 1.

Landscape metrics have been used since the 1980s by landscape ecologists, primarily concerned with environmental protection and resource conservation and aiming to quantify the shape and pattern of vegetation (Herold et al., 2005; McGarigal and Marks, 1995). They have, however, also been increasingly used to study urban patterns. Several authors have highlighted the usefulness of landscape metrics to represent spatial urban characteristics (Aguilera et al., 2011; Herold et al., 2005; Liu and Yang, 2015; Schwarz, 2010) and to link economic processes to land use patterns (Parker et al., 2001; Liu et al., 2016), as well as their combination with urban growth models. Spatial metrics adapted from landscape ecology traditionally rely on data derived from aerial photography and, more recently, from satellite remote sensing or local sensor data. Usually, they use "patches" (polygons with homogeneous characteristics of a specific landscape property) as the basic unit of analysis (Herold et al., 2005).

Geospatial metrics have mostly been developed by urban planners and geographers to measure specific urban spatial patterns. These metrics are quite diverse in both the complexity of their calculation methods and the specific features of the urban built environment they measure. Geospatial metrics differ from landscape metrics in that they are developed for particular case studies, while the latter are normally developed by a set of researchers in one particular subject and subsequently transferred to multiple case studies and/or software (in a "top down" manner). For this reason, geospatial-based approaches tend to be more customized to the characteristics and types of data of particular case studies and, therefore, are less robust (Reis, 2015).

Spatial statistics are metrics based on statistical tools that are used to characterize the distribution of events across space, focusing on the nature of spatial data (Getis et al., 2004). These metrics are often used in combination with regression and spatial econometric models, but they are also used to characterize particular spatial patterns of urban settlements, such as diversity, fragmentation, inequality, or spatial autocorrelation (Haining, 2015).

It is important to note that these three groups (landscape metrics, geospatial metrics, and spatial statistics) are intended only as a general framework for the metrics' disciplinary background and broad calculation methods, not as a universal classification or typology of metrics.
Moreover, other types of metrics would have been considered if this literature review were extended to include, among others, different methods of spatial data analysis in the spatial econometrics literature (see Getis et al., 2004 and O'Sullivan and Unwin, 2014), complex methods of land classification used in remote sensing (Yang, 2011), or metrics focusing on transport and accessibility (Cerdá, 2009; Curtis and Scheurer, 2010).
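To make the patch-based logic of landscape metrics concrete, the following is a minimal Python sketch that delineates patches of a focal class via connected-component labeling and derives one simple metric, patch density, from them. The raster, class coding, and cell size are illustrative assumptions, not data from any study cited here.

import numpy as np
from scipy import ndimage

# Categorical land cover raster (0 = non-urban, 1 = urban); illustrative only
landcover = np.array([[1, 1, 0, 0, 0],
                      [1, 0, 0, 1, 1],
                      [0, 0, 0, 1, 0],
                      [0, 1, 0, 0, 0],
                      [1, 1, 0, 0, 1]])

cell_size_m = 30.0  # assumed cell resolution in meters

# Delineate "patches" of the focal class as 4-connected components
urban = (landcover == 1)
labels, n_patches = ndimage.label(urban)

# Patch density: number of patches per 100 ha of total landscape area
landscape_area_ha = landcover.size * (cell_size_m ** 2) / 10_000.0
pd_per_100ha = n_patches / landscape_area_ha * 100.0

print(f"{n_patches} patches, PD = {pd_per_100ha:.2f} per 100 ha")

Edge density or mean patch size could be derived from the same labeled array; in practice, dedicated tools such as FRAGSTATS (mentioned later in this article) implement the full metric suites.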

1.14.2.2 Usefulness and Potential of Static Metrics

Spatial metrics have been widely applied in empirical studies of urban patterns, from different disciplinary and methodological backgrounds and with diversified research goals. Landscape metrics are supported by a strong and well-documented body of research for quantifying urban patterns, especially patterns of growth and urbanization that focus on changes to the spatial configuration (e.g., fragmentation, shape irregularity, porosity) and composition/heterogeneity (e.g., diversity, land use mix) of urban areas.

Table 1 A summary of landscape metrics, geospatial metrics, and spatial statistics

Landscape metrics: area-weighted mean landscape expansion index; area-weighted mean shape index; area-weighted mean patch fractal dimension; change in density of urban land; change in size of urban area; compactness index of the largest patch; compactness index; contagion index; contrasting edge proportion; contrasting edge ratio; diversity index; edge density; edge-to-interior ratio; fractal dimension; interspersion and juxtaposition index; landscape expansion index; landscape shape index; largest patch index; length of common edge; mean dispersion; mean landscape expansion index; mean nearest neighbor distance; mean nearest neighbor distance standard deviation; mean patch size; mean perimeter-area ratio; mean radius of gyration; mean shape index; number of patches; patch cohesion index; patch density; patch richness; patch size coefficient of variation; patch size standard deviation; percentage of like adjacencies; Shannon's diversity index; Shannon's evenness index; shape index; Simpson's diversity index; Simpson's evenness index; square pixel; urban area.

Geospatial metrics: A-ratio; area index; area neighborhood green; average degree; average household size; axial ringiness; Batty's entropy; betweenness centrality; blocks; blocks perimeter; B-ratio; bus distance; centrality; centrality index; centralization index; characteristic path length; choice; Clark's density gradient; closeness centrality; cluster index; clustering; clustering coefficient; coefficient of variation; commercial distance; commercial pedestrian accessibility; community node inaccessibility; compactness; concentration; connectivity; continuity; control; core-dominated nuclearity; degree of isolation; degree of sealing; delta index; dendritic street pattern; density gradient by ordinary least squares regression; dispersion index; distance to CBD (I); distance to CBD (II); distance to primary school; distance to roads; distance to shopping; efficiency; external connectivity; floor space; fractal dimension; fraction impervious surface; grid axiality; gross leapfrog index; H indicator; highway strip index; Hrel indicator; index of clustering; index of remoteness; information centrality; integration; intelligibility; internal (street) connectivity; job density; land consumption index (I); land consumption index (II); land use diversity; land use diversity index; leapfrog index; length of cul-de-sacs; lot size; mean axial lines length; mean contour polycentricity; mean depth; median contour polycentricity; median distance to schools; mix; mix actual; mix zoned; mixed uses; neighborhood recreation area; net leapfrog index; nuclearity; number of axial lines; number of nodes; orientation index; park distance; peak ratio; peripheral density; population density; proximity (different land use); proximity (same land use); ratio of density of people; ratio of open space; real relative asymmetry; relative entropy; residential density; residential development in existing urbanized area; residential vacancy; road network density; segregated land use; shape index; share of demolition; share of renovated houses; single family dwelling density; spatial isolation index; straightness centrality; synergy; total greenery; transit pedestrian access; urban density index; urban land use change; weighted average proximity.

Spatial statistics: Geary coefficient; Getis-Ord Gi; Getis-Ord Gi*; Gini coefficient; local Moran (Ii); location quotient; locational Gini coefficient; Moran's I; number of fragments; spatial index.

Source: Reis, J.P., Silva, E.A., and Pinho, P. (2015). Spatial metrics to study urban patterns in growing and shrinking cities. Urban Geography 37(2), 246–271.


A large array of empirical applications of landscape metrics can be found for many cities around the world (Aguilera et al., 2011; Marraccini et al., 2015; Salvati and Carlucci, 2015; Schneider and Woodcock, 2008; Schwarz, 2010; Wu et al., 2011). Since the methods of calculation are quite similar (notwithstanding the limitations set by the quality and spatial aggregation of data, and by issues of subjective selection), these metrics can be particularly useful for comparisons between cities and for the interpretation and validation of urban models (Herold et al., 2003a; Schwarz, 2010).

Geospatial metrics are by far the widest and most diverse of the three types of spatial metrics discussed here. Most of these metrics have been specifically developed for urban studies focusing on spatial patterns such as density, fragmentation/compactness, centrality, land use mix, connectivity, or accessibility. They have also been used to compare different cities or areas within cities (intra-urban analysis), often addressing very specific characteristics of urban structure and sometimes integrating the spatial elements with economic and demographic variables (population, employment, etc.). A considerable number of the geospatial metrics listed in Table 1 are based on spatial network analysis theory. Network analysis represents cities as networks in which identifiable urban elements (e.g., settlements, locations, and intersections) are regarded as nodes in a planar graph, and the connections between pairs of nodes (e.g., roads and transport lines) are represented as edges. A set of topological centrality measures (e.g., connectivity, integration, and intelligibility) can be extracted from the graph in order to quantify the relative accessibility of each space/object in the system (Hillier, 2007; Porta et al., 2006a,b; Volchenkov and Blanchard, 2008).

Spatial statistics, developed mainly by geographers and spatial econometricians, encompass a number of methods of spatial data analysis (Getis et al., 2004). Some of these are used to characterize patterns such as the evenness/inequality of the distribution of an attribute, or the presence of a spatial structure in the distribution of attributes in a particular area, often related to patterns of spatial clustering or dispersion. The latter relates to spatial autocorrelation, that is, the idea that data from nearby locations are more likely to be similar than data from more distant locations (Haining et al., 2010; O'Sullivan and Unwin, 2014). Spatial autocorrelation metrics (e.g., Moran's coefficient and the local Moran's coefficient) are useful for measuring spatial clustering, for instance, to examine whether areas with high levels of a particular phenomenon (e.g., urban density, land use type, activity) are evenly (or randomly) distributed across the urban area or clustered in space (Anselin, 1995; O'Sullivan and Unwin, 2014). A considerable number of metrics from all three groups (123 metrics) have been used to study the spatial patterns of urban growth, and they are particularly common in the quantitative assessment of urban sprawl.
Urban sprawl is probably the most studied spatial phenomenon in planning and geography, with a consistent body of literature proposing several quantitative methods and metrics addressing a variety of spatial dimensions (Ewing et al., 2002; Galster et al., 2001; Sarzynski et al., 2014; Torrens, 2008). Despite the ambiguity and lack of consensus around its definition, some agreement exists on recognizing extensive urbanization, low density, single use, fragmentation/scatter, or poor accessibility as some of the main spatial characteristics of sprawl. The patterns of sprawl have been studied using a variety of landscape metrics, with a particular focus on the shape irregularity and fragmentation of urban areas (Huang et al., 2007; Schneider and Woodcock, 2008). The importance of urban sprawl in the literature is perhaps most evident in the geospatial metrics: a good part of these metrics was actually developed with the particular purpose of studying urban sprawl, often focusing on spatial patterns of fragmentation, density, land use diversity, centrality, and accessibility (Frenkel and Ashkenazi, 2008; Knaap et al., 2007; Sarzynski et al., 2014; Torrens, 2008). Sprawl patterns have also been studied with spatial statistics, notably Moran's I (Torrens, 2008; Tsai, 2005).

The use of static spatial metrics is also found in other research areas not particularly related to urban growth. For example, Vaughan (2007) used spatial network analysis metrics (Space Syntax methods) to connect urban physical structures with patterns of poverty and social segregation. Other authors have studied the relationship between spatial patterns quantified by metrics and the socioeconomic and demographic characteristics of urban areas, such as purchasing power, changes in gross domestic product, education levels, or household structures (Huang et al., 2007; Liu et al., 2016; Schwarz, 2010). McCarty and Kaza (2015) used spatial metrics to evaluate the relationship between the spatial patterns of American cities and air quality. Some metrics have also been used to analyze patterns of urban shrinkage, such as small-scale fragmentation, perforation, residential vacancy, and urban decay (Bontje, 2005; Deng and Ma, 2015; Kabisch et al., 2006).
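As a hedged illustration of how such a clustering test is commonly run in Python, the sketch below computes global Moran's I for a synthetic attribute on a regular lattice using the PySAL packages libpysal and esda. The data, grid size, and weights choice are placeholder assumptions, not a replication of any cited study, and API details may vary across PySAL releases.

import numpy as np
from libpysal.weights import lat2W
from esda.moran import Moran

rng = np.random.default_rng(42)

# Placeholder attribute: e.g., dwelling density on a 10 x 10 grid of tracts
y = rng.random(100)

# Rook-contiguity spatial weights for the 10 x 10 lattice
w = lat2W(10, 10, rook=True)

# Global Moran's I with a permutation-based pseudo p-value
mi = Moran(y, w, permutations=999)
print(f"Moran's I = {mi.I:.3f}, pseudo p-value = {mi.p_sim:.3f}")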

1.14.2.3 Gaps and Challenges

Notwithstanding the prolific literature on metrics examining urban spatial patterns from distinct backgrounds, some important gaps still exist in terms of encompassing dynamic urban processes. Landscape metrics are often criticized for relying too heavily on ecology principles and for not being the most adequate tools to study some specific urban processes, in particular those occurring at smaller spatial scales, or processes involving population movements, socioeconomic variables, or governance structures (Herold et al., 2005; Schneider and Woodcock, 2008; Schwarz, 2010). Geospatial metrics usually focus on spatial patterns, or use datasets specific to a particular case study, and are therefore not robust enough to transfer and adapt to different geographic contexts or to support general conclusions.

The scarcity of metrics focusing on patterns of urban shrinkage is another important gap in the literature (Reis et al., 2015). Strong and enduring population decline, migration across different tiers of cities (in some cases from small to mid-size cities), and inter-metropolitan movements of residents are all related to changing demographic, cultural, and socioeconomic trends. Urban shrinkage is an increasing reality in many regions of the world and has emerged as one of the most important research topics in urban and regional planning, particularly in Europe and North America (Pallagst, 2010; Wiechmann and Bontje, 2015).


These processes are quite relevant at an intra-urban scale and, therefore, growing and shrinking areas can be found in most large cities regardless of overall metropolitan population trends. Although some metrics have been developed and used to study urban decline (e.g., residential vacancy and share of demolition), there is a clear lack of quantitative research on urban shrinkage, particularly in terms of developing metrics specifically aimed at assessing its spatial patterns.

An important challenge for the development and application of spatial metrics relates to gaps in knowledge about the exact nature of the spatial patterns to be quantified. The first step toward the development of metrics, therefore, is to identify the particular spatial features of an urban area that should be quantified. For example, the concept of "fragmentation" often addressed by landscape and geospatial metrics is not always clear, and it is sometimes conflated with "shape irregularity" (Reis, 2015). Moreover, as mentioned above, the specific spatial features of urban sprawl are still not clear in the literature despite the high number of studies addressing this issue. More notably, the literature on spatial patterns associated with urban shrinkage is still much undeveloped. For instance, although "perforation" is commonly considered a spatial outcome of shrinkage, Reis (2015) found at least five different definitions of perforation in the literature, indicating a clear gap in standardizing the definitions of shrinkage-related terminology.

In addition to these challenges, scalability effects, types of data units and aggregation, and issues of subjective selection are some of the intrinsic limitations of empirical applications of spatial metrics. Spatial metrics use data variables that are often aggregated into arbitrary units; since the boundaries of the data units affect the results of spatial analysis, the outcomes of an application of metrics are influenced by the quality of the available data and the spatial scale used (Martínez et al., 2007, 2009). Certainly, no application of spatial metrics can be considered completely objective, as the results are always influenced by the quality and level of aggregation of data and by the criteria for the definition of spatial units or the extent of the study area. These aspects depend on subjective choices made by the analyst and on the specific characteristics and research objectives of different case studies; they therefore limit the generalization of findings from a particular case study and/or the extent to which the metrics can be replicated in other case studies.

Several authors argue that more effort is needed toward the development of new spatial metrics to achieve robust measures for assessing urban patterns (Aguilera et al., 2011; Huang et al., 2007, 2009; Liu and Yang, 2015). This is particularly imperative for the study of urban shrinkage and for comparative analyses of different cities. Furthermore, combining spatial metrics, particularly those focusing on the physical configuration of spatial structures, with demographic and socioeconomic variables would be a leap toward forming more robust mixed indicators, and would to some extent overcome the challenges of using diverse datasets and spatial aggregations.

1.14.3 Dynamic Metrics

Spatial metrics have played a significant role in a variety of research areas, from landscape to urban land use to environment (Aguilera et al., 2011; Lausch et al., 2015; Poodat et al., 2015). As discussed above, a substantial amount of literature exists on developing and testing spatial metrics that deal with one-time assessment of a phenomenon in a given area (Brown and Reed, 2012; Frank et al., 2012; Plexida et al., 2014; Syrbe and Walz, 2012; Uuemaa et al., 2013). However, since "time" is an important dimension that must be part of the metrics (the "objects" comprising a "space" are normally not stationary), several researchers have also attempted to incorporate the temporal dimension into spatial metrics (Fichera et al., 2012; Guo and Di, 2015; Kong et al., 2012; Ramachandra et al., 2012). The majority of these studies are limited to the development and use of simple (two-dimensional) spatial metrics at multiple intervals in time. This method can be called a "process" or a "model" that manages to capture the spatial variability of a phenomenon or object over a period of time, where metrics are used within such models to assess the object or phenomenon under observation (Fotheringham and Wegener, 1999). Although this approach captures both the spatial and temporal dimensions of the factor(s) under observation, it brings complexity to the process, as the assessments of the two dimensions are carried out by different, disintegrated methods.

Another important consideration in dealing with spatiotemporal data is storage. A database usually stores the locations of objects in a spatial reference system along with the associated time stamps. Spatiotemporal data can be integrated within spatial data warehouses and trajectory data warehouses that provide storage and retrieval support for spatial, spatiotemporal, and moving-object database indexes (Bédard and Han, 2009; Pelekis et al., 2008). Some common methods used for spatiotemporal data exploration include space–time cubes, spatiotemporal clustering, spatiotemporal associations, sequence mining, mining collective patterns, and visualization and visual analytics (Miller, 2015). Although the sequence of objects' locations over time cannot be captured directly, since locations are stored at discrete moments in time, it can be approximated in several ways, such as through linear interpolation of the recorded temporal sequence (Andrienko et al., 2008; Ratti et al., 2006). Carefully selected spatial and temporal scales and an appropriately designed spatiotemporal database are thus imperative for dynamic spatial analyses.
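A minimal numpy sketch of that interpolation step, assuming a hypothetical sequence of timestamped position fixes, could look as follows:

import numpy as np

# Recorded fixes: timestamps (s) and x/y coordinates; values are hypothetical
t_obs = np.array([0.0, 60.0, 180.0, 240.0])
x_obs = np.array([0.0, 50.0, 90.0, 140.0])
y_obs = np.array([0.0, 10.0, 60.0, 80.0])

# Resample the trajectory to a regular 30-second time step by linear
# interpolation between consecutive recorded positions
t_new = np.arange(t_obs[0], t_obs[-1] + 1, 30.0)
x_new = np.interp(t_new, t_obs, x_obs)
y_new = np.interp(t_new, t_obs, y_obs)

for t, x, y in zip(t_new, x_new, y_new):
    print(f"t={t:5.0f}s  position=({x:6.1f}, {y:6.1f})")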

1.14.3.1 SRS Data and Spatiotemporal Metrics

SRS is undoubtedly a valuable source of spatial data. A number of spatial metrics have been successfully applied to remotely sensed earth observation data (Kong et al., 2012; Pham et al., 2011), and remote sensing-based methods to examine the temporal variability of different phenomena have also been developed and employed by a number of researchers.


In a study conducted by Herold et al. (2003b), spatial metrics and textures were used to describe the spatial characteristics of land cover objects derived from aerial photographs. The gray-level co-occurrence matrix method was used to describe image texture based on continuous gray-level pixel values (seven texture parameters), whereas the computation of spatial metrics (22 metrics) was based on a categorical, patch-based representation of the landscape within land use regions (nine land use classes). The spatial metrics proved useful in providing important information to characterize different urban land uses, as well as to assess their spatial variability.

Another study attempted to combine remote sensing and spatial metrics to improve urban growth modeling (Herold et al., 2005). A theoretical framework was established first, which was then illustrated by application to an urban area of Santa Barbara, California, using IKONOS satellite imagery. Four spatial metrics (contagion, fractal dimension, patch density, and nearest neighbor standard deviation) were applied to nine types of land uses. The authors found that the spatial metrics were simple statistical measurements, and that their careful interpretation was more important than their mere application to the remotely sensed data. Although the metrics employed in this study were directly adopted from landscape ecology and were not specifically tailored to urban space, the authors concluded that spatial metrics definitely play an important role in urban dynamics research.

In another study, the spatiotemporal patterns of urbanization in Kathmandu Valley, Nepal, were examined using remote sensing data and spatial metrics (Thapa and Murayama, 2009). Remotely sensed data covering a 33-year time series were used, and eight spatial metrics were computed for each land use map. The metrics were calculated with the FRAGSTATS software, while Erdas Imagine and ArcGIS were used for image processing and analysis. Although the presented approach catered to the temporal dimension and spatial variability, there was no true integration of time, spatial metrics, and remote sensing data (the spatial metrics were computed independently of time).

Another study examined seven spatial metrics calculated for 77 metropolitan areas in Asia, the United States, Europe, Latin America, and Australia (Huang et al., 2007). Four land use classes were derived by supervised classification of satellite images, while the spatial metrics, comprising a series of quantitative indices, were computed separately. An interesting aspect of this research was the exploration of differences in urban form by comparing socioeconomic indicators of urban development; the correlation between the spatial metrics and socioeconomic factors was examined. However, these processes were carried out independently of each other, and the temporal dimension was not considered.

"Temporal metrics" is another term used by some researchers to indicate the inclusion of time in standard metrics. In a study conducted by Borak et al. (2010), several temporal change metrics were examined to assess land cover change in sub-Saharan Africa. Using remote sensing data, difference images (T2−T1) were used to define five temporal metrics: annual maximum, annual minimum, annual range, annual mean, and temporal vector. An important finding of this study was the role of "spatial scale" in determining the types of processes that coarser or finer spatial resolution data can examine. The authors also studied the relationships between fine-scale change and coarse spatial resolution metrics, and concluded from stepwise regression that multivariate combinations of temporal metrics performed better than univariate linear or quadratic models. However, this study did not examine any aspatial characteristics influencing land cover change, and it fell short of truly coupling the temporal and spatial metrics.

The role of satellite remote sensing data in spatial research is increasing day by day, thanks to the enhanced spatial, spectral, radiometric, and temporal resolutions that modern satellites now offer. However, transforming this wealth of data, now received in substantial volumes, into useful information is becoming a challenge. Appropriate tools and methods for handling, analyzing, modeling, and interpreting multiscale and multitemporal data are lacking. Current spatial modeling methods can handle the aspatial and temporal dimensions of the data to some extent; however, the spatial, aspatial, and temporal dimensions are treated separately, reducing modeling efficiency. Although some efforts have been made in this regard, very few have dealt with the problem through metrics.
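As a hedged sketch of this family of temporal change metrics, the following Python fragment computes per-pixel annual maximum, minimum, range, and mean from synthetic image stacks and differences them between two years. The arrays are invented stand-ins, and the fragment follows the spirit of the definitions reported above rather than Borak et al.'s actual implementation.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic time series: 12 intra-annual observations x 100 x 100 pixels,
# one stack per year (stand-ins for, e.g., monthly NDVI composites)
year1 = rng.random((12, 100, 100))
year2 = rng.random((12, 100, 100))

def temporal_metrics(stack):
    """Per-pixel annual metrics computed along the time axis."""
    return {
        "maximum": stack.max(axis=0),
        "minimum": stack.min(axis=0),
        "range": stack.max(axis=0) - stack.min(axis=0),
        "mean": stack.mean(axis=0),
    }

m1, m2 = temporal_metrics(year1), temporal_metrics(year2)

# Difference images D = Metric(year 2) - Metric(year 1) for each metric
for name in m1:
    d = m2[name] - m1[name]
    print(f"annual {name}: mean change = {d.mean():+.4f}")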

1.14.3.2 The Space–Time Dynamics

As an effort toward developing new metrics to analyze space–time remotely sensed datasets, the concept of the three-dimensional "blob" was introduced by Parrott et al. (2008). Successive time-series spatial raster images are stacked in a space–time cube, where a spatial pixel becomes a three-dimensional "voxel" exhibiting two dimensions in space and one in time (Fig. 1); the depth of a voxel equals the sampling time interval. Metrics such as density, number of blobs, blob shape complexity, fractal dimension, contagion, and spatiotemporal complexity were developed and analyzed through application to ecological data. The authors note, however, that the proposed metrics apply (1) only to a limited range of spatial scales; (2) only to univariate data; and (3) only to categorical or classified data (not to raw/unclassified satellite images). Nevertheless, the proposed concept presents an opportunity for developing new methods to better understand and characterize spatiotemporal dynamics.

Agent-based modeling (ABM) is another area gaining popularity, the key difference from other forms of modeling being its ability to focus on individuals and their behaviors (Crooks, 2015). The majority of geographical ABMs are constrained to modeling in two rather than three dimensions because of the considerable complexity in model building and simulation (Torrens, 2015). This limitation is quite common in ABMs dealing with GIS and spatial data models (Abdul-Rahman and Pilouk, 2008; Zlatanova et al., 2002). Attempts have been made at implementing three-dimensional automata-based models; CAs in particular are popular in this regard (Silva and Clarke, 2002, 2005; Silva and Wu, 2014). Some applications are seen in the areas of landscape (Dibble and Feldman, 2004), urban dynamics (Semboloni et al., 2004), and geo-visualization (Luke et al., 2005), although the third dimension in these instances has not been substantively integrated into the model behavior. Some advancement in developing three-dimensional geographical ABMs and CAs has been realized in the field of computer gaming, showcasing agents as characters (Nareyek, 2001) or as dynamic variables in virtual environments (Parker and O'Brien, 2009).

Fig. 1 Format of spatiotemporal data used for the 3D metrics presented in this article: (A) an example of an ecological mosaic composed of categorical raster data; shades of gray correspond to different categories or patch types; (B) a pixel in raster data becomes a 3D voxel in space–time, having dimensions equal to the spatial (Δx, Δy) and temporal (Δt) sampling resolutions; neighboring voxels are used to calculate adjacencies for metrics such as contagion (the von Neumann neighborhood is shown here); (C) a stack of spatial mosaics taken at successive points in time can be used to generate a 3D data matrix in which spatial patches take on an additional temporal dimension to become "blobs" composed of 3D voxels. Reproduced from Parrott, L., Proulx, R., and Thibert-Plante, X. (2008). Three-dimensional metrics for the analysis of spatiotemporal data in ecology. Ecological Informatics 3(6), 343–353, with permission. Available at: http://www.sciencedirect.com/science/article/pii/S157495410800037X [Accessed October 28, 2015].
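To illustrate the voxel construction, the sketch below stacks a few synthetic binary raster snapshots into a space–time cube and labels its 3D "blobs" using a von Neumann (6-connected) neighborhood. The data, threshold, and cube dimensions are assumptions for demonstration only.

import numpy as np
from scipy import ndimage

rng = np.random.default_rng(7)

# Stack of T categorical raster snapshots (here binary presence/absence)
# into a space-time cube of shape (T, ny, nx): each cell is a voxel
snapshots = [(rng.random((20, 20)) > 0.6).astype(int) for _ in range(5)]
cube = np.stack(snapshots, axis=0)

# 6-connected (von Neumann) 3D neighborhood: two spatial axes, one temporal
structure = ndimage.generate_binary_structure(rank=3, connectivity=1)

# A "blob" is a connected region of occupied voxels in space-time
labels, n_blobs = ndimage.label(cube, structure=structure)
sizes = ndimage.sum(cube, labels, index=range(1, n_blobs + 1))

print(f"{n_blobs} space-time blobs; largest spans {int(sizes.max())} voxels")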

A variety of game development toolkits have therefore seen some use in the construction of ABM-based programs (Barnes et al., 2009; Jacobson and Hwang, 2002; Zyda, 2005).

PySAL, a library of spatial analysis tools written in the open-source programming language Python, provides some functions related to space–time analysis (Rey and Anselin, 2007). The spatial dynamics module of this library implements Markov chain-based methods that can be used to model the dynamics of a variety of spatial processes. In addition to extending the local indicators of spatial association Markov chain to a space–time dynamic context (Rey and Janikas, 2006; Rey et al., 2011), this module also includes some tests for space–time interaction (Jacquez, 1996; Knox, 1964). Extending the PySAL library to deal with dynamic spatial metrics could therefore be an interesting advancement.

Another approach to dealing with the complexity of dynamic processes involves optimizing both the hardware and software sides of computing. In a recent study, a hybrid parallel geospatial CA model was developed for examining urban growth, in which a heterogeneous computer architecture combining central processing units and graphics processing units was used to reduce the overall computing time (Guan et al., 2015). This approach presents a scalable solution for handling complex simulation scenarios by processing large amounts of data through multiple processing units. Such an approach could also be extended to handle complex space–time dynamics, which could lead to the development of hybrid (software–hardware) dynamic metrics.

Some other methods, such as hidden Markov models (HMMs) and computational fluid dynamics (CFD), have also been used to address the problem of incorporating the temporal aspect into spatial dynamics (Bui et al., 2001; Holsclaw et al., 2015; Ma et al., 2002; Parra et al., 2010). However, the majority of this research has been done on the development of models and approaches rather than dynamic metrics.
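As a hedged sketch of this Markov-chain machinery, the snippet below fits a discrete Markov chain to synthetic per-region class label sequences using giddy, the package that now hosts PySAL's spatial dynamics functionality. The labels are invented, and attribute names may vary between PySAL releases.

import numpy as np
from giddy.markov import Markov

rng = np.random.default_rng(1)

# Synthetic class labels: 50 spatial units observed at 10 time points,
# each coded into one of three classes (e.g., density terciles)
labels = rng.integers(low=0, high=3, size=(50, 10))

# Estimate the transition probability matrix from observed transitions
m = Markov(labels)
print("Transition probability matrix:")
print(np.round(m.p, 3))
print("Steady-state distribution:", np.round(m.steady_state, 3))

giddy also provides a Spatial_Markov variant that conditions transition probabilities on the values of neighboring units, a natural step toward the space–time dynamics discussed here.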


One of the tools capable of handling dynamic raster models is "BioClass," a software package focused on spatiotemporal multicriteria classification, developed by Eco-Consult (Eco-Consult, 2010). It presents several GIS tools designed for a variety of purposes, such as solving multiple-criteria classification and optimization problems, combining fuzzy logic and level set methods for classification, and multiobjective decision making for real-world problems. Some of the functions that point toward the possibility of developing dynamic metrics are raster dynamic models, dynamic fuzzy regions and objects, and spatiotemporal interpolation. Although a complete and in-depth description of the methods used in the software is not available, it appears that the underlying equations are modified on the fly, in a dynamic way, so as to incorporate user input. However, the way the temporal dimension is handled in the actual process is not clear. Although aspatial data can be used in this software, just as in some other spatial simulation models, evidence of its coupling with the spatial data to form a new data structure is lacking.

The HMM, owing to its capability to simulate time-series observations, has been employed in a variety of studies, such as spatial population genetics, object tracking and prediction in complex spatial layouts, disease mapping, spatial modeling of species abundance, and crop classification (Bui et al., 2001; François et al., 2006; Green and Richardson, 2002; Prates et al., 2015; Siachalou et al., 2015). The use of Markov models for examining urban growth and land use dynamics has also been successful (Ahmed and Ahmed, 2012; Singh et al., 2015; Zhang et al., 2011); however, the majority of these studies rely on the development and/or implementation of a "dynamic model" rather than "dynamic metrics." For instance, one study used an HMM to develop a classification model based on time-series analysis of multispectral satellite images (Siachalou et al., 2014). Although the overall approach appears promising for monitoring temporal changes in water bodies, the implementation of the HMM was limited to image classification. A similar application of the HMM was found in a study related to crop classification (Siachalou et al., 2015).

CFD is another three-dimensional modeling approach, which primarily solves and analyzes problems relating to fluid flows. Since it can handle temporal dynamics together with the spatial domain, it has also been applied to other relevant research areas, such as the modeling and design of ventilation systems in the agricultural industry, urban river modeling, urban canopy parameterization, and urban weather research and forecasting (Baklanov et al., 2009; Chen et al., 2011; Ma et al., 2002; Norton et al., 2007). CFD appears interesting due to its versatility and can be adopted in studies dealing with spatiotemporal dynamics. However, both HMM and CFD consider each time step as a separate layer and manipulate the layers for spatiotemporal analysis. Although these approaches are useful for dealing with spatiotemporal satellite remote sensing data, they are not fully optimized. The use of dynamic metrics can significantly increase data handling, processing, and analysis efficiency, as it allows for the integration of both spatial and temporal dynamics in a single set of equations.
However, in order to be truly beneficial, these metrics should be able to integrate with existing models. Sample application areas and references for some of the metrics/models/methods capable of dealing with the temporal dimension together with the spatial one are shown in Table 2, and their details are summarized in the Appendix.

1.14.4 Future Directions in Spatial Metrics: From Static to Dynamic?

While the concept of a metric has itself been highly debated (with recurrent confusion about what is meant by variables, metrics, indexes, etc.), this article focuses on the necessary discussion of commonly used static and dynamic metrics (assuming that alternative concepts will be discussed in other parts of this book). The past 100 years have seen an increasing use of metrics to extract meaningful information. For a long time, the analogue processes of data retrieval (i.e., traditional mapping using paper) and the associated theory and data analysis were static, producing maps containing information suited to static metric production. Modern-day digital data collection and processing (from satellites, local sensors, and other modern data acquisition tools) has, however, rendered most traditional analogue data collection/analysis and map production methods obsolete, thereby questioning the applicability of existing static metrics. In an era when enormous amounts of digital data are produced every minute, at multiple scales, from a variety of sensors, and for diverse purposes, the harvesting, integration, and mining of all this information raise serious concerns. Some say that we live in a data-rich environment; nevertheless, we now face the problem of receiving data faster than the available theory, methods, and metrics can support its timely processing and analysis.

The review of static and dynamic metrics points clearly toward the gaps in truly integrating the dimension of "time" and scalability into spatial metrics. Moreover, the role of aspatial data in affecting, or being affected in, space and time cannot be neglected, and it should therefore also be properly addressed in dynamic metrics. The development of dynamic metrics is thus imperative for attaining higher data processing and analysis efficiency by exploiting the true potential of the high temporal resolution data that remote sensing and other local sensors produce. Dynamic metrics can simplify the computational process by reducing the steps needed for data processing and analysis (space and time are normally handled separately in conventional metrics). Moreover, incorporating the aspatial dimension into spatiotemporal dynamic metrics will further boost their utility. This, however, again points to the need to elucidate the characteristics of dynamic metrics. As stated at the beginning of this article, dynamic metrics are spatial metrics that exhibit at least one of these characteristics: (1) they include, in an explicit form, the measurement of both temporal and spatial variations; (2) they are capable of mapping the metric's path through a set of time frames or spatial scales; (3) they measure "dynamic spatiality" (vertical and horizontal integration); and/or (4) they include feedback loops, self-organization, and/or learning.

The new approaches to data harvesting and processing are intertwined with the need to clarify other associated concepts, such as validity, randomness, uncertainty, calibration, and scalability. The scalability of data is clearly an important aspect that needs to be addressed when applying spatial metrics (both static and dynamic). Lower spatial resolution of the data limits the application of metrics at detailed local scales. Such limitations also arise when aggregating different types of data: some metrics require particular types of units (e.g., raster cells, contiguous polygons, segments) in their calculation methods, and the type, number, and size of the mapping units also influence the results of some metrics, which creates potential biases.

Table 2 Sample application areas of some of the metrics/models/methods capable of handling both the spatial and temporal dimensions

Metric/model/method: Model for temporal observations; hidden Markov model (HMM)
Sample application and reference: HMM approach for crop classification linked to time-series remote sensing data (Siachalou et al., 2015)

Metric/model/method: Hidden Markov model (HMM); optimality criterion to determine the sequence of hidden states
Sample application and reference: HMM approach to model wetland dynamics using temporal remote sensing data (Siachalou et al., 2014)

Metric/model/method: Change metrics; multitemporal change vector
Sample application and reference: Temporal metrics for land cover change detection (Borak et al., 2010)

Metric/model/method: PD (patch density); ED (edge density); LPI (largest patch index); ENNMN (Euclidean nearest neighbor distance mean); AWMPFD (area-weighted mean patch fractal dimension); COHESION; CONTAG (contagion); SHDI (Shannon's diversity index)
Sample application and reference: Examining spatiotemporal urbanization using satellite remote sensing data (Thapa and Murayama, 2009)

Metric/model/method: Contagion; spatiotemporal complexity
Sample application and reference: Analysis of spatiotemporal data in ecology using "blobs" (Parrott et al., 2008)

Metric/model/method: AWMSI (area-weighted mean shape index); AWMPFD (area-weighted mean patch fractal dimension); centrality; CI (compactness index); CILP (compactness index of the largest patch); ROS (ratio of open space); density
Sample application and reference: Application of spatial metrics and remote sensing to compare and analyze urban form (Huang et al., 2007)

Metric/model/method: Patch size standard deviation; contagion index; patch density; edge density; area-weighted mean patch fractal dimension
Sample application and reference: Analysis and modeling of urban land use change using satellite remote sensing data (Herold et al., 2005)

Metric/model/method: PLAND (percentage of landscape); PD (patch density); AREA_MN (mean patch size); AREA_SD (area standard deviation); ED (edge density); LPI (largest patch index); ENN_MN (Euclidean mean nearest neighbor distance); ENN_SD (Euclidean nearest neighbor distance standard deviation); FRAC_AM (area-weighted mean patch fractal dimension); FRAC_SD (fractal dimension standard deviation); COHESION; CONTAG (contagion)
Sample application and reference: Mapping land use through satellite remote sensing data (Herold et al., 2003b)

Source: Reis, J.P., Silva, E.A., and Pinho, P. (2015). Spatial metrics to study urban patterns in growing and shrinking cities. Urban Geography 37(2), 246–271.

In addition, recent advancements in methods of data collection (for example, big data from sensors collecting vast amounts of data in real time, sometimes interacting with mobile telecommunication devices) open up a whole new set of research problems and opportunities. These will probably bring about new possibilities for the development of other types of metrics (perhaps more complex and detailed) but will also present new challenges for their development and application.

All in all, we are now witnessing a data science revolution. The data used these days are fundamentally different from what was employed in the past: they are digital, highly disaggregated, and live, and the data models and methods applied are dynamic, going beyond physical representations of the world to include the socioeconomic and behavioral attributes of highly complex societies and individuals. In the middle of these two dimensions of DATA and MODELS, we have the METRICS, and metric development seems to be lagging behind, seriously impacting the type of work we do and the results we obtain.

1.14.5 Appendix

Summary of some of the spatial metrics/models/methods capable of handling the temporal dimension. Each entry gives the metric/model/method, its measurement/description, and a sample application and reference.

Model for temporal observations; hidden Markov model (HMM). Sample application: HMM approach for crop classification linked to time-series remote sensing data (Siachalou et al., 2015).
The sequence of observations consists of a set of remote sensing measurements $O = \{O_{t_1}, \ldots, O_{t_n}\}$, where $t = \{t_1, \ldots, t_n\}$ are the acquisition dates of the images. The hidden states correspond to the different phenological states $S = \{S_1, \ldots, S_m\}$ of each crop type, and $Q = \{q_{t_1}, \ldots, q_{t_n}\}$ is the fixed sequence of hidden states.
(1) The state transition probability matrix $A$, where $a_{i,j}(t) = P[q_t = S_j \mid q_{t-1} = S_i]$ denotes the transition probability from state $S_i$ to state $S_j$ at time $t$. The emission probability matrix $B$ defines the probability that $O_t$ is emitted by state $S_j$, i.e., $b_j(O_t) = P[O_t \mid q_t = S_j]$. In order to estimate the symbol probability distributions $B$, a multivariate Gaussian distribution is assumed for the observed spectral data. The mean vector $\mu_i$ and the covariance matrix $\Sigma_i$ are calculated from the training data for each crop type, for each state, and for every image:
$$b_i(O_t) = \frac{1}{\sqrt{(2\pi)^d \,\lvert \Sigma_i \rvert}} \exp\!\left(-\tfrac{1}{2}\,(O_t - \mu_i)^{T} \Sigma_i^{-1} (O_t - \mu_i)\right)$$
(2) The initial probability $\pi_i$ is the probability of being in state $S_i$ at time $t_1$, i.e., $\pi_i = P[q_{t_1} = S_i]$. The parameters $A$, $B$, and $\pi$ are estimated from the training data. Once the parameters of each model $\lambda$ have been estimated, and given the sequence of observations $O$ for each pixel, the probability that the sequence $O$ was generated by model $\lambda$ is
$$p\big(q_1 = S_i, \ldots, q_t = S_j,\; O_1, \ldots, O_t \mid \lambda\big) = \pi_{q_1} \prod_{l=2}^{t} a_{q_{l-1}, q_l}(\lambda) \prod_{l=1}^{t} b_{q_l}(O_l)$$

Hidden Markov model (HMM); optimality criterion to determine the sequence of hidden states. Sample application: HMM approach to model wetland dynamics using temporal remote sensing data (Siachalou et al., 2014).
Basic elements of the HMM:
1. the transition matrix $A$, where each entry describes the transition probability from state $S_i$ to state $S_j$, i.e., $a_{ij} = P[q_t = S_j \mid q_{t-1} = S_i]$;
2. the emission matrix $B$, which involves the probability that $O_t$ is emitted by state $S_j$, i.e., $b_j(O_t) = P[O_t \mid q_t = S_j]$;
3. the initial probability $\pi_i$, which is the probability of being in state $S_i$ at time $t_1$, i.e., $\pi_i = P[q_{t_1} = S_i]$.
In order to infer the sequence of "hidden" states, an optimality criterion is needed: the state sequence $\hat{S} = \{s_1, s_2, \ldots, s_n\}$ is sought that gives the highest score in
$$p(O_1, O_2, \ldots, O_n \mid \hat{S}) = \prod_{t=1}^{n} P(S_t \mid S_{t-1}) \cdot P(O_t \mid S_t)$$
where $P(S_t \mid S_{t-1})$ represents the transition probability from state $S_{t-1}$ to state $S_t$, and $P(O_t \mid S_t)$ the probability of emitting observation $O_t$ while in state $S_t$.

Change metrics; multitemporal change vector. Sample application: Temporal metrics for land cover change detection (Borak et al., 2010).
$$D = \mathrm{Metric}(\text{year 2}) - \mathrm{Metric}(\text{year 1})$$
i.e., the difference in annual mean, annual maximum, annual minimum, and annual range.
$$c(i) = p(i, y) - p(i, z)$$
where $c(i)$ is the change vector for pixel $i$ between the years $y$ and $z$, and $p(i, y)$ is the multitemporal vector for pixel $i$ and year $y$:
$$p(i, y) = \big[\, I(t_1),\; I(t_2),\; \ldots,\; I(t_n) \,\big]^{T}$$
where $I$ are the values of the indicator under consideration for pixel $i$ at the time periods $t_1$ to $t_n$, $n$ being the number of time dimensions.
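To make the HMM likelihood above concrete, the following is a minimal numpy sketch of the forward algorithm for a small discrete-emission HMM. The two-state model, its matrices, and the observation coding are invented for illustration; the cited studies instead assume multivariate Gaussian emissions estimated from training data.

import numpy as np

# Illustrative two-state HMM; states could stand for phenological stages
A = np.array([[0.8, 0.2],        # transition probabilities a_ij
              [0.3, 0.7]])
B = np.array([[0.6, 0.3, 0.1],   # emission probabilities b_j(o) over three
              [0.1, 0.4, 0.5]])  # discretized observation symbols
pi = np.array([0.5, 0.5])        # initial state probabilities

obs = [0, 1, 2, 2, 1]            # observed symbol sequence, one per image date

# Forward algorithm: alpha[i] accumulates P(O_1..O_t, q_t = S_i | lambda)
alpha = pi * B[:, obs[0]]
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]

print(f"P(O | lambda) = {alpha.sum():.6f}")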

PD (patch density); ED (edge density); LPI (largest patch index); ENNMN (Euclidean nearest neighbor distance mean); AWMPFD (area-weighted mean patch fractal dimension); COHESION; CONTAG (contagion); SHDI (Shannon's diversity index). Sample application: Examining spatiotemporal urbanization using satellite remote sensing data (Thapa and Murayama, 2009).
PD (fragmentation; no./100 ha): the number of patches of a specific land cover class divided by total landscape area.
ED (fragmentation; meters/ha): the sum of the lengths of all edge segments involving a specific class, divided by the total landscape area and multiplied by 10,000.
LPI (dominance; percent): the area of the largest patch of the corresponding class divided by the total area covered by that class, multiplied by 100.
ENNMN (isolation/proximity; meters): the mean distance of all patches of a land use to the nearest neighboring patch of that land use, based on the shortest edge-to-edge distance from cell center to cell center.
AWMPFD (fragmentation and complexity; dimensionless; range 1–2): describes the complexity and fragmentation of a patch by a perimeter–area ratio. Lower values indicate a compact patch form; if the patches are more complex and fragmented, the perimeter increases, yielding higher values.
COHESION (physical connectedness; dimensionless; range 0–100): approaches 0 as the proportion of the landscape composed of the focal class decreases and becomes increasingly subdivided and less physically connected.
CONTAG (fragmentation and degree of aggregation; dimensionless; range 1–100): describes the fragmentation of a landscape by the random and conditional probabilities that a pixel of one patch class is adjacent to another patch class; it measures the extent to which landscapes are aggregated or clumped.
SHDI (patch diversity; information): quantifies the diversity of the landscape based on two components: the number of different patch types and the proportional area distribution among patch types.

Contagion; spatiotemporal complexity. Sample application: Analysis of spatiotemporal data in ecology using "blobs" (Parrott et al., 2008).
Contagion: the formula of contagion is extended to three dimensions in order to measure the space–time dispersion of blob types; the calculation is based on voxel (not blob) adjacencies. For $b$ blob types,
$$RC = 1 - \frac{EE}{EE_{\max}}, \qquad EE_{\max} = b\,\ln(b), \qquad EE = -\sum_{i=1}^{b} \sum_{j=1}^{b} p_{ij} \ln p_{ij}, \qquad p_{ij} = \frac{n_{ij}}{n_i}$$
where $RC$ is the contagion, $n_{ij}$ is the number of adjacencies between voxels of blob type $j$ and voxels of blob type $i$, and $n_i$ is the number of voxels of type $i$.
Spatiotemporal complexity (STC) is calculated by examining the contents of successively offset 3D windows (of dimension $n \times n \times n$, where $n$ is an arbitrary length considerably smaller than the data cube dimensions) in the space–time cube. For each possible placement of the 3D window in the cube, the number of voxels occupied by blob type $i$ is counted, and
$$STC = \frac{-\sum_{k} p_k \ln p_k}{\ln(n^3 + 1)}, \qquad 0 < STC < 1$$
where $p_k$ is the relative frequency of window content $M_k$.

AWMSI – Area-weighted mean shape index. $AWMSI = \sum_{i=1}^{N} \left[ \frac{p_i}{4\sqrt{s_i}} \cdot \frac{s_i}{\sum_{i=1}^{N} s_i} \right]$, where $s_i$ and $p_i$ are the area and perimeter of patch $i$, and $N$ is the total number of patches.

AWMPFD – Area-weighted mean patch fractal dimension. $AWMPFD = \sum_{i=1}^{N} \left[ \frac{2 \ln(0.25\, p_i)}{\ln s_i} \cdot \frac{s_i}{\sum_{i=1}^{N} s_i} \right]$, where $s_i$ and $p_i$ are the area and perimeter of patch $i$, and $N$ is the total number of patches.

Centrality. $\mathrm{Centrality} = \frac{\sum_{i=1}^{N-1} D_i / (N-1)}{R} = \frac{\sum_{i=1}^{N-1} D_i / (N-1)}{\sqrt{S/\pi}}$, where $D_i$ is the distance of the centroid of patch $i$ to the centroid of the largest patch, $N$ is the total number of patches, $R$ is the radius of a circle with area $S$, and $S$ is the summarization area of all patches.

CI – Compactness index. $CI = \frac{\sum_{i=1}^{N} p'_i / p_i}{N^2} = \frac{\sum_{i=1}^{N} 2\pi\sqrt{s_i/\pi} \,/\, p_i}{N^2}$, where $s_i$ and $p_i$ are the area and perimeter of patch $i$, $p'_i$ is the perimeter of a circle with the area of $s_i$, and $N$ is the total number of patches.

CILP – Compactness index of the largest patch. $CILP = \frac{2\pi\sqrt{s/\pi}}{p}$, where $s$ and $p$ are the area and perimeter of the largest patch.

ROS – Ratio of open space. $ROS = \frac{S'}{S} \times 100\%$, where $S'$ is the summarization area of all "holes" inside the extracted urban area and $S$ is the summarization area of all the patches.

Density. $\mathrm{Density} = T/S$, where $T$ is the city's total population and $S$ is the summarization area of all the patches.

Also applied in the same study, without formulas given here: patch size standard deviation, contagion index, patch density, edge density, and area-weighted mean patch fractal dimension. Sample application for this group of metrics: application of spatial metrics and remote sensing to compare and analyze urban form (Huang et al., 2007).

PLAND – Percentage of landscape. Percent, 0 < PLAND ≤ 100. PLAND equals the sum of the areas (m²) of a specific land cover class divided by total landscape area, multiplied by 100. Sample applications for this and the following metrics: mapping land use through satellite remote sensing data (Herold et al., 2003b) and analysis and modeling of urban land use change using satellite remote sensing data (Herold et al., 2005).

PD – Patch density. No./100 ha, PD > 0, no limit. PD equals the number of patches of a specific land cover class divided by total landscape area.

AREA_MN – Mean patch size. Hectares, AREA_MN ≥ 0, no limit. AREA_MN equals the average size of the patches of a land cover class.

AREA_SD – Area standard deviation. Hectares, AREA_SD ≥ 0, no limit. AREA_SD equals the standard deviation in size of the patches of a land cover class.

ED – Edge density. Meters per hectare, ED ≥ 0, no limit. ED equals the sum of the lengths (m) of all edge segments involving a specific class, divided by the total landscape area (m²), multiplied by 10,000 (to convert to hectares).

LPI – Largest patch index. Percent, 0 < LPI ≤ 100. LPI equals the area (m²) of the largest patch of the corresponding class divided by total area covered by that class (m²), multiplied by 100 (to convert to a percentage).

ENN_MN – Euclidean nearest neighbor distance mean. Meters, ENN_MN > 0, no limit. ENN_MN equals the mean distance (m) over all patches of a class to the nearest neighboring patch, based on shortest edge-to-edge distance from cell center to cell center.

ENN_SD – Euclidean nearest neighbor distance standard deviation. Meters, ENN_SD > 0, no limit. ENN_SD equals the standard deviation in Euclidean mean nearest neighbor distance of a land cover class.

FRAC_AM – Area-weighted mean fractal dimension. None, 1 ≤ FRAC_AM ≤ 2. Area-weighted mean value of the fractal dimension values of all patches of a land cover class; the fractal dimension of a patch equals two times the logarithm of patch perimeter (m) divided by the logarithm of patch area (m²); the perimeter is adjusted to correct for the raster bias in perimeter.

FRAC_SD – Fractal dimension standard deviation. None, FRAC_SD > 0, no limit. FRAC_SD equals the standard deviation in fractal dimension of a land cover class.

COHESION. Percent, 0 < COHESION < 100. Cohesion is proportional to the area-weighted mean perimeter-area ratio divided by the area-weighted mean patch shape index (i.e., standardized perimeter-area ratio).

CONTAG – Contagion. Percent, 0 < CONTAG ≤ 100.

References

Abdul-Rahman, A. and Pilouk, M. (2008). Spatial Data Modelling for 3D GIS. Springer Science & Business Media. Available at: https://books.google.com/books?hl=en&lr=&id=X6a2f75ky3gC&pgis=1.
Aguilera, F., Valenzuela, L.M. and Botequilha-Leitão, A. (2011). Landscape metrics in the analysis of urban land use patterns: A case study in a Spanish metropolitan area. Landscape and Urban Planning 99(3–4), 226–238. Available at: http://www.sciencedirect.com/science/article/pii/S0169204610002823.
Ahmed, B. and Ahmed, R. (2012). Modeling urban land cover growth dynamics using multi-temporal satellite images: A case study of Dhaka, Bangladesh. ISPRS International Journal of Geo-Information 1(3), 3–31. Available at: http://www.mdpi.com/2220-9964/1/1/3/htm.
Andrienko, N. et al. (2008). Basic concepts of movement data. In: Giannotti, F. and Pedreschi, D. (eds.) Mobility, Data Mining and Privacy, pp. 15–38. Berlin, Heidelberg: Springer. Available at: http://dx.doi.org/10.1007/978-3-540-75177-9_2.
Anselin, L. (1995). Local indicators of spatial association – LISA. Geographical Analysis 27(2), 93–115. Available at: http://doi.wiley.com/10.1111/j.1538-4632.1995.tb00338.x.
Baklanov, A. et al. (eds.) (2009). Meteorological and Air Quality Models for Urban Areas. Berlin, Heidelberg: Springer. Available at: http://www.springerlink.com/index/10.1007/978-3-642-00298-4.
Banzhaf, E., Grescho, V. and Kindler, A. (2009). Monitoring urban to peri-urban development with integrated remote sensing and GIS information: A Leipzig, Germany case study. International Journal of Remote Sensing 30(7), 1675–1696. Available at: http://dx.doi.org/10.1080/01431160802642297.
Barnes, T., Encarnação, L.M. and Shaw, C.D. (2009). Serious games. IEEE Computer Graphics and Applications 2, 18–19.
Bédard, Y. and Han, J. (2009). Fundamentals of spatial data warehousing for geographic knowledge discovery. In: Miller, H.J. and Han, J. (eds.) Geographic Data Mining and Knowledge Discovery, pp. 45–68. Boca Raton, FL: CRC Press.
Bontje, M. (2005). Facing the challenge of shrinking cities in East Germany: The case of Leipzig. GeoJournal 61(1), 13–21. Available at: http://link.springer.com/10.1007/s10708-005-0843-2.
Borak, J.S., Lambin, E.F. and Strahler, A.H. (2010). The use of temporal metrics for land cover change detection at coarse spatial scales. International Journal of Remote Sensing 21(6–7), 1415–1432. Available at: http://www.tandfonline.com/doi/abs/10.1080/014311600210245.
Brown, G.G. and Reed, P. (2012). Social landscape metrics: Measures for understanding place values from Public Participation Geographic Information Systems (PPGIS). Landscape Research 37(1), 73–90. Available at: http://www.tandfonline.com/doi/abs/10.1080/01426397.2011.591487.
Bui, H.H., Venkatesh, S. and West, G. (2001). Tracking and surveillance in wide-area spatial environments using the abstract hidden Markov model. International Journal of Pattern Recognition and Artificial Intelligence 15(01), 177–196. Available at: http://www.worldscientific.com/doi/abs/10.1142/S0218001401000782.
Curtis, C. and Scheurer, J. (2010). Planning for sustainable accessibility: Developing tools to aid discussion and decision-making. Progress in Planning 74(2), 53–106.
Cerdá, A. (2009). Accessibility: A performance measure for land-use and transportation planning in the Montréal Metropolitan Region. Supervised Research Project Report, School of Urban Planning, McGill University, Montreal, Canada.
Chen, F. et al. (2011). The integrated WRF/urban modelling system: Development, evaluation, and applications to urban environmental problems. International Journal of Climatology 31(2), 273–288. Available at: http://doi.wiley.com/10.1002/joc.2158.
Clifton, K. et al. (2008). Quantitative analysis of urban form: A multidisciplinary review. Journal of Urbanism: International Research on Placemaking and Urban Sustainability 1(1), 17–45.
Crooks, A. (2015). Agent-based modeling and geographical information systems. In: Geocomputation. London: Sage Publications.
Deng, C. and Ma, J. (2015). Viewing urban decay from the sky: A multi-scale analysis of residential vacancy in a shrinking U.S. city. Landscape and Urban Planning 141, 88–99.
Dibble, C. and Feldman, P.G. (2004). The GeoGraph 3D computational laboratory: Network and terrain landscapes for RePast. Journal of Artificial Societies and Social Simulation 7(1). Available at: http://jasss.soc.surrey.ac.uk/7/1/7.html.
Eco-Consult (2010). BioClass & OptimClass. Available at: http://eco-con.net/dss.htm.
Ewing, R., Pendall, R. and Chen, D. (2002). Measuring Sprawl and Its Impact: The Character and Consequences of Metropolitan Expansion. Washington, DC: Smart Growth America.
Fichera, C.R., Modica, G. and Pollino, M. (2012). Land cover classification and change-detection analysis using multi-temporal remote sensed imagery and landscape metrics. European Journal of Remote Sensing 45(1), 1–18.
Fotheringham, S. and Wegener, M. (1999). Spatial Models and GIS: New and Potential Models. London: CRC Press.
François, O., Ancelet, S. and Guillot, G. (2006). Bayesian clustering using hidden Markov random fields in spatial population genetics. Genetics 174(2), 805–816. Available at: http://www.genetics.org/content/174/2/805.abstract.
Frank, S. et al. (2012). A contribution towards a transfer of the ecosystem service concept to landscape planning using landscape metrics. Ecological Indicators 21, 30–38. Available at: http://www.sciencedirect.com/science/article/pii/S1470160X11001087.
Frenkel, A. and Ashkenazi, M. (2008). Measuring urban sprawl: How can we deal with it? Environment and Planning B: Planning and Design 35(1), 56–79. Available at: http://epb.sagepub.com/lookup/doi/10.1068/b32155.
Galster, G. et al. (2001). Wrestling sprawl to the ground: Defining and measuring an elusive concept. Housing Policy Debate 12(4), 681–717.
Geertman, S. (2015). Planning support systems (PSS) as research instruments. In: Silva, E.A. et al. (eds.) The Routledge Handbook of Planning Research Methods, pp. 322–334. New York: Routledge.
Getis, A., Lacambra, J. and Zoller, H. (eds.) (2004). Spatial Econometrics and Spatial Statistics. London: Macmillan.
Green, P.J. and Richardson, S. (2002). Hidden Markov models and disease mapping. Journal of the American Statistical Association 97(460), 1055–1070. Available at: http://amstat.tandfonline.com/doi/abs/10.1198/016214502388618870.
Guan, Q. et al. (2015). A hybrid parallel cellular automata model for urban growth simulation over GPU/CPU heterogeneous architectures. International Journal of Geographical Information Science 30(3), 494–514. Available at: http://www.tandfonline.com/doi/full/10.1080/13658816.2015.1039538.
Guo, L. and Di, L. (2015). Spatio-temporal evolutional characteristics of landscape patterns in the Loess Plateau in China – A landscape metrics-based assessment. In: 2015 Fourth International Conference on Agro-Geoinformatics, pp. 153–157. IEEE. Available at: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=7248094.
Haining, R. (2015). Thinking spatially, thinking statistically. In: Silva, E.A. et al. (eds.) The Routledge Handbook of Planning Research Methods, pp. 255–267. New York: Routledge.
Haining, R.P., Kerry, R. and Oliver, M.A. (2010). Geography, spatial data analysis, and geostatistics: An overview. Geographical Analysis 42(1), 7–31. Available at: http://doi.wiley.com/10.1111/j.1538-4632.2009.00780.x.
Herold, M., Couclelis, H. and Clarke, K.C. (2005). The role of spatial metrics in the analysis and modeling of urban land use change. Computers, Environment and Urban Systems 29(4), 369–399. Available at: http://www.sciencedirect.com/science/article/pii/S0198971503001145.
Herold, M., Goldstein, N.C. and Clarke, K.C. (2003a). The spatiotemporal form of urban growth: Measurement, analysis and modeling. Remote Sensing of Environment 86(3), 286–302. Available at: http://www.sciencedirect.com/science/article/pii/S0034425703000750.
Herold, M., Liu, X. and Clarke, K.C. (2003b). Spatial metrics and image texture for mapping urban land use. Photogrammetric Engineering & Remote Sensing 69(9), 991–1001. Available at: http://www.ingentaconnect.com/content/asprs/pers/2003/00000069/00000009/art00005.
Hillier, B. (2007). Space Is the Machine: A Configurational Theory of Architecture. London.


Holsclaw, T. et al. (2015). A Bayesian hidden Markov model of daily precipitation over South and East Asia. Journal of Hydrometeorology. Available at: http://journals.ametsoc.org/doi/abs/10.1175/JHM-D-14-0142.1.
Huang, J., Lu, X.X. and Sellers, J.M. (2007). A global comparative analysis of urban form: Applying spatial metrics and remote sensing. Landscape and Urban Planning 82(4), 184–197. Available at: http://www.sciencedirect.com/science/article/pii/S0169204607000588.
Huang, S.-L., Wang, S.-H. and Budd, W.W. (2009). Sprawl in Taipei's peri-urban zone: Responses to spatial planning and implications for adapting global environmental change. Landscape and Urban Planning 90(1–2), 20–32. Available at: http://linkinghub.elsevier.com/retrieve/pii/S0169204608001709.
Jacobson, J. and Hwang, Z. (2002). Unreal Tournament for immersive interactive theater. Communications of the ACM 45(1), 39–42.
Jacquez, G.M. (1996). A k nearest neighbour test for space-time interaction. Statistics in Medicine 15(18), 1935–1949.
Kabisch, S., Haase, A. and Haase, D. (2006). Beyond growth – urban development in shrinking cities as a challenge for modeling approaches. Burlington, USA: International Environmental Modelling and Software Society.
Kaza, N. (2013). The changing urban landscape of the continental United States. Landscape and Urban Planning 110, 74–86.
Knaap, G.-J., Song, Y. and Nedovic-Budic, Z. (2007). Measuring patterns of urban development: New intelligence for the war on sprawl. Local Environment: The International Journal of Justice and Sustainability 12(3), 239–257.
Knox, E.G. (1964). The detection of space-time interactions. Journal of the Royal Statistical Society, Series C (Applied Statistics) 13(1), 25–30. Available at: http://www.jstor.org/stable/2985220.
Kong, F. et al. (2012). Simulating urban growth processes incorporating a potential model with spatial metrics. Ecological Indicators 20, 82–91. Available at: http://www.sciencedirect.com/science/article/pii/S1470160X12000490.
Lauf, S. et al. (2012). Simulating demography and housing demand in an urban region under scenarios of growth and shrinkage. Environment and Planning B: Planning and Design 39(2), 229–246. Available at: http://epb.sagepub.com/lookup/doi/10.1068/b36046t.
Lausch, A. et al. (2015). Understanding and quantifying landscape structure – A review on relevant process characteristics, data models and landscape metrics. Ecological Modelling 295, 31–41. Available at: http://www.sciencedirect.com/science/article/pii/S0304380014003974.
Liu, T. and Yang, X. (2015). Monitoring land changes in an urban area using satellite imagery, GIS and landscape metrics. Applied Geography 56, 42–54. Available at: http://www.sciencedirect.com/science/article/pii/S0143622814002306.
Liu, Y. et al. (2016). Socioeconomic drivers of forest loss and fragmentation: A comparison between different land use planning schemes and policy implications. Land Use Policy 54, 58–68.
Lord, S. et al. (2015). Growth modelling and the management of urban sprawl: Questioning the performance of sustainable planning policies. Planning Theory & Practice 16(3), 385–406.
Luke, S. et al. (2005). MASON: A multiagent simulation environment. Simulation 81(7), 517–527. Available at: http://sim.sagepub.com/content/81/7/517.abstract.
Ma, L. et al. (2002). Computational fluid dynamics and the physical modelling of an upland urban river. Geomorphology 44(3–4), 375–391. Available at: http://www.sciencedirect.com/science/article/pii/S0169555X01001842.
Marraccini, E. et al. (2015). Common features and different trajectories of land cover changes in six Western Mediterranean urban regions. Applied Geography 62, 347–356.
Marshall, S. and Gong, Y. (2009). WP4 Deliverable Report: Urban Pattern Specification. London.
Martínez, L., Viegas, J. and Silva, E. (2007). Zoning decisions in transport planning and their impact on the precision of results. Transportation Research Record: Journal of the Transportation Research Board 1994, 58–65.
Martínez, L.M., Viegas, J.M. and Silva, E.A. (2009). A traffic analysis zone definition: A new methodology and algorithm. Transportation 36(5), 581–599. Available at: http://link.springer.com/10.1007/s11116-009-9214-z.
McCarty, J. and Kaza, N. (2015). Urban form and air quality in the United States. Landscape and Urban Planning 139, 168–179.
McGarigal, K. and Marks, B.J. (1995). FRAGSTATS: Spatial Pattern Analysis Program for Quantifying Landscape Structure. Portland, US.
Miller, H.J. (2015). Spatio-temporal knowledge discovery. In: Brunsdon, C. and Singleton, A. (eds.) Geocomputation. London: Sage Publications.
Nareyek, A. (2001). Review: Intelligent agents for computer games. In: Marsland, T. and Frank, I. (eds.) Computers and Games, Lecture Notes in Computer Science, pp. 414–422. Berlin, Heidelberg: Springer. Available at: http://dx.doi.org/10.1007/3-540-45579-5_28.
Norton, T. et al. (2007). Applications of computational fluid dynamics (CFD) in the modelling and design of ventilation systems in the agricultural industry: A review. Bioresource Technology 98(12), 2386–2414. Available at: http://www.sciencedirect.com/science/article/pii/S0960852406006092.
O'Sullivan, D. and Unwin, D. (2014). Geographic Information Analysis. Hoboken, NJ: John Wiley & Sons.
Pallagst, K. (2010). Viewpoint: The planning research agenda: Shrinking cities – a challenge for planning cultures. Town Planning Review 81(5), i–vi.
Parker, D.C., Evans, T.P. and Meretsky, V. (2001). Measuring emergent properties of agent-based land-use/land-cover models using spatial metrics. In: Seventh Annual Conference of the International Society for Computational Economics. New Haven, CT: Yale University.
Parker, E.G. and O'Brien, J.F. (2009). Real-time deformation and fracture in a game environment. In: Proceedings of the 2009 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp. 156–166. New York: Association for Computing Machinery (ACM).
Parra, M.A. et al. (2010). A methodology to urban air quality assessment during large time periods of winter using computational fluid dynamic models. Atmospheric Environment 44(17), 2089–2097. Available at: http://www.sciencedirect.com/science/article/pii/S1352231010002049.
Parrott, L., Proulx, R. and Thibert-Plante, X. (2008). Three-dimensional metrics for the analysis of spatiotemporal data in ecology. Ecological Informatics 3(6), 343–353. Available at: http://www.sciencedirect.com/science/article/pii/S157495410800037X.
Pelekis, N. et al. (2008). Towards trajectory data warehouses. In: Giannotti, F. and Pedreschi, D. (eds.) Mobility, Data Mining and Privacy, pp. 189–211. Berlin, Heidelberg: Springer. Available at: http://www.springerlink.com/index/10.1007/978-3-540-75177-9.
Pham, H.M., Yamaguchi, Y. and Bui, T.Q. (2011). A case study on the relation between city planning and urban growth using remote sensing and spatial metrics. Landscape and Urban Planning 100(3), 223–230. Available at: http://www.sciencedirect.com/science/article/pii/S016920461100017X.
Plexida, S.G. et al. (2014). Selecting landscape metrics as indicators of spatial heterogeneity – A comparison among Greek landscapes. International Journal of Applied Earth Observation and Geoinformation 26, 26–35. Available at: http://www.sciencedirect.com/science/article/pii/S0303243413000500.
Poodat, F. et al. (2015). Prioritizing urban habitats for connectivity conservation: Integrating centrality and ecological metrics. Environmental Management 56(3), 664–674. Available at: http://www.ncbi.nlm.nih.gov/pubmed/25924790.
Porta, S., Crucitti, P. and Latora, V. (2006a). The network analysis of urban streets: A dual approach. Physica A: Statistical Mechanics and its Applications 369(2), 853–866.
Porta, S., Crucitti, P. and Latora, V. (2006b). The network analysis of urban streets: A primal approach. Environment and Planning B: Planning and Design 33(5), 705–725. Available at: http://epb.sagepub.com/lookup/doi/10.1068/b32045.
Prates, M.O. et al. (2015). Transformed Gaussian Markov random fields and spatial modeling of species abundance. Spatial Statistics. Available at: http://www.sciencedirect.com/science/article/pii/S2211675315000676.
Ratti, C. et al. (2006). Mobile landscapes: Using location data from cell phones for urban analysis. Environment and Planning B: Planning and Design 33(5), 727–748. Available at: http://epb.sagepub.com/content/33/5/727.abstract.
Reis, J.P. (2015). The Spatial Patterns of Growing and Shrinking Cities: An Emphasis on Metrics. Cambridge, UK: University of Cambridge.
Reis, J.P., Silva, E.A. and Pinho, P. (2014). Measuring space: A review of spatial metrics for urban growth and shrinkage. In: Silva, E.A. et al. (eds.) The Routledge Handbook of Planning Research Methods, pp. 279–292. New York: Routledge.
Reis, J.P., Silva, E.A. and Pinho, P. (2015). Spatial metrics to study urban patterns in growing and shrinking cities. Urban Geography 37(2), 246–271.


Rey, S.J. and Anselin, L. (2007). PySAL: A Python library of spatial analytical methods. The Review of Regional Studies 37(1), 5–27. Available at: http://journal.srsa.org/ojs/index.php/RRS/article/view/134.
Rey, S.J. and Janikas, M.V. (2006). STARS: Space-Time Analysis of Regional Systems. Geographical Analysis 38(1), 67–86. Available at: http://doi.wiley.com/10.1111/j.0016-7363.2005.00675.x.
Rey, S.J., Mack, E.A. and Koschinsky, J. (2011). Exploratory space–time analysis of burglary patterns. Journal of Quantitative Criminology 28(3), 509–531. Available at: http://link.springer.com/10.1007/s10940-011-9151-9.
Salvati, L. and Carlucci, M. (2015). Land-use structure, urban growth, and periurban landscape: A multivariate classification of the European cities. Environment and Planning B: Planning and Design 42(5), 801–829. Available at: http://epb.sagepub.com/lookup/doi/10.1068/b120059p.
Sarzynski, A., Galster, G. and Stack, L. (2014). Evolving United States metropolitan land use patterns. Urban Geography 35(1), 25–47.
Schneider, A. and Woodcock, C.E. (2008). Compact, dispersed, fragmented, extensive? A comparison of urban growth in twenty-five global cities using remotely sensed data, pattern metrics and census information. Urban Studies 45(3), 659–692. Available at: http://usj.sagepub.com/content/45/3/659.short.
Schwarz, N. (2010). Urban form revisited – selecting indicators for characterising European cities. Landscape and Urban Planning 96(1), 29–47.
Semboloni, F. et al. (2004). CityDev, an interactive multi-agents urban model on the web. Computers, Environment and Urban Systems 28(1–2), 45–64. Available at: http://www.sciencedirect.com/science/article/pii/S0198971502000479.
Siachalou, S., Doxani, G. and Tsakiri-Strati, M. (2014). Time-series analysis of high temporal remote sensing data to model wetland dynamics: A hidden Markov model approach. In: SENTINEL-2 for Science Workshop, Water: Inland, Coastal and Wetlands. Available at: http://www.researchgate.net/publication/267863683_TIMESERIES_ANALYSIS_OF_HIGH_TEMPORAL_REMOTE_SENSING_DATA_TO_MODEL_WETLAND_DYNAMICS_A_HIDDEN_MARKOV_MODEL_APPROACH.
Siachalou, S., Mallinis, G. and Tsakiri-Strati, M. (2015). A hidden Markov models approach for crop classification: Linking crop phenology to time series of multi-sensor remote sensing data. Remote Sensing 7(4), 3633–3650. Available at: http://www.mdpi.com/2072-4292/7/4/3633/htm.
Silva, E.A., Ahern, J. and Wileden, J. (2008). Strategies for landscape ecology: An application using cellular automata models. Progress in Planning 70(4), 133–177. Available at: http://www.sciencedirect.com/science/article/pii/S0305900608000743.
Silva, E.A. and Clarke, K. (2002). Calibration of the SLEUTH urban growth model for Lisbon and Porto, Portugal. Computers, Environment and Urban Systems 26(6), 525–552. Available at: http://www.sciencedirect.com/science/article/pii/S019897150100014X.
Silva, E.A. and Clarke, K.C. (2005). Complexity, emergence and cellular urban models: Lessons learned from applying SLEUTH to two Portuguese metropolitan areas. European Planning Studies 13(1), 93–115. Available at: http://www.tandfonline.com/doi/abs/10.1080/0965431042000312424.
Silva, E.A. and Wu, N. (2014). DG-ABC: An integrated multi-agent and cellular automata urban growth model. In: Pinto, N.N. et al. (eds.) Technologies for Urban and Spatial Planning: Virtual Cities and Territories, pp. 57–92. IGI Global. Available at: http://www.igi-global.com/chapter/dg-abc/104211.
Singh, S.K. et al. (2016). Landscape transform and spatial metrics for mapping spatiotemporal land cover dynamics using Earth Observation data-sets. Geocarto International, 1–15. Available at: http://www.tandfonline.com/doi/abs/10.1080/10106049.2015.1130084.
Singh, S.K. et al. (2015). Predicting spatial and decadal LULC changes through cellular automata Markov chain models using earth observation datasets and geo-information. Environmental Processes 2(1), 61–78. Available at: http://link.springer.com/10.1007/s40710-015-0062-x.
Syrbe, R.-U. and Walz, U. (2012). Spatial indicators for the assessment of ecosystem services: Providing, benefiting and connecting areas and landscape metrics. Ecological Indicators 21, 80–88. Available at: http://www.sciencedirect.com/science/article/pii/S1470160X12000593.
Thapa, R.B. and Murayama, Y. (2009). Examining spatiotemporal urbanization patterns in Kathmandu Valley, Nepal: Remote sensing and spatial metrics approaches. Remote Sensing 1(3), 534–556. Available at: http://www.mdpi.com/2072-4292/1/3/534.
Tian, G., Ewing, R. and Greene, W. (2014). Desire for smart growth: A survey of residential preferences in the Salt Lake Region of Utah. Housing Policy Debate 25(3), 446–462.
Torrens, P.M. (2008). A toolkit for measuring sprawl. Applied Spatial Analysis and Policy 1(1), 5–36. Available at: http://link.springer.com/10.1007/s12061-008-9000-x.
Torrens, P.M. (2015). Geographical agents in three dimensions. In: Geocomputation. London: Sage Publications.
Tsai, Y.-H. (2005). Quantifying urban form: Compactness versus "sprawl." Urban Studies 42(1), 141–161. Available at: http://usj.sagepub.com/cgi/doi/10.1080/0042098042000309748.
Ramachandra, T.V., Aithal, B.H. and Sanna, D.D. (2012). Insights to urban dynamics through landscape spatial pattern analysis. International Journal of Applied Earth Observation and Geoinformation 18, 329–343. Available at: http://www.sciencedirect.com/science/article/pii/S0303243412000499.
Uuemaa, E., Mander, Ü. and Marja, R. (2013). Trends in the use of landscape spatial metrics as landscape indicators: A review. Ecological Indicators 28, 100–106. Available at: http://www.sciencedirect.com/science/article/pii/S1470160X1200283X.
Valero, S. et al. (2016). Production of a dynamic cropland mask by processing remote sensing image series at high temporal and spatial resolutions. Remote Sensing 8(1), 55. Available at: http://www.mdpi.com/2072-4292/8/1/55/htm.
Vaughan, L. (2007). The spatial form of poverty in Charles Booth's London. In: Vaughan, L. (ed.) Progress in Planning: The Spatial Syntax of Urban Segregation, pp. 205–294. UK: Elsevier.
Volchenkov, D. and Blanchard, P. (2008). Scaling and universality in city space syntax: Between Zipf and Matthew. Physica A: Statistical Mechanics and its Applications 387(10), 2353–2364.
Wiechmann, T. and Bontje, M. (2015). Responding to tough times: Policy and planning strategies in shrinking cities. European Planning Studies 23(1), 1–11.
Wu, J. et al. (2011). Quantifying spatiotemporal patterns of urbanization: The case of the two fastest growing metropolitan regions in the United States. Ecological Complexity 8(1), 1–8.
Yang, X. (2011). Urban Remote Sensing: Monitoring, Synthesis and Modeling in the Urban Environment. Oxford, UK: John Wiley & Sons.
Zhang, Q. et al. (2011). Simulation and analysis of urban growth scenarios for the Greater Shanghai Area, China. Computers, Environment and Urban Systems 35(2), 126–139. Available at: http://www.sciencedirect.com/science/article/pii/S0198971510001134.
Zlatanova, S., Abdul-Rahman, A. and Pilouk, M. (2002). Trends in 3D GIS development. Journal of Geospatial Engineering 4(2), 71–80.
Zyda, M. (2005). From visual simulation to virtual reality to games. Computer 38(9), 25–32.

1.15 Multicriteria Analysis

Jacek Malczewski, Western University, London, ON, Canada © 2018 Elsevier Inc. All rights reserved.

1.15.1 Integrating GIS and Multicriteria Analysis
1.15.2 Components of MCA
1.15.2.1 Decision-Making Agents
1.15.2.2 Decision Alternatives
1.15.2.3 Evaluation Criteria
1.15.3 Fundamental concepts of GIS-MCA
1.15.3.1 Value Scaling
1.15.3.1.1 Global value scaling
1.15.3.1.2 Local value scaling
1.15.3.2 Criteria Weights
1.15.3.2.1 Ranking and rating
1.15.3.2.2 Pairwise comparison
1.15.3.2.3 Spatially explicit criteria weighting
1.15.3.3 Combination Rules
1.15.3.3.1 Compensatory and noncompensatory methods
1.15.3.3.2 Multiattribute and multiobjective methods
1.15.4 GIS-MCA Methods
1.15.4.1 Conventional GIS-MCA
1.15.4.1.1 Value function-based methods
1.15.4.1.2 Outranking relation-based methods
1.15.4.2 Spatially Explicit MCA
1.15.4.2.1 Spatial WLC
1.15.4.2.2 Spatial IP methods
1.15.4.2.3 Geosimulation and MCA
1.15.4.3 Spatial Multiobjective Optimization
1.15.4.3.1 Conventional multiobjective optimization methods
1.15.4.4 Heuristics and Metaheuristics
1.15.4.4.1 Basic heuristics
1.15.4.4.2 Metaheuristics
1.15.4.4.3 Geosimulation and multiobjective optimization
1.15.4.5 Dealing with uncertainties
1.15.5 Conclusion
References
Further Reading

1.15.1 Integrating GIS and Multicriteria Analysis

The primary motivation for integrating multicriteria analysis (MCA) into GIS comes from the need to expand the decision support capabilities of GIS (Sugumaran and DeGroote, 2011; Malczewski and Rinner, 2015; Ferretti and Montibeller, 2016). GIS has been designed as a general-purpose system with theories of spatial representation and computing in mind (Goodchild and Haining, 2004; O'Sullivan and Unwin, 2010), and with strong assumptions about instrumental rationality as a basis for decision-making procedures (Alexander, 2000). As a consequence, GIS technology is not well suited for acquiring, storing, processing, analyzing, and visualizing data and information critical for decision making, such as value judgments, preferences, priorities, opinions, and attitudes. One way of alleviating this difficulty is to integrate MCA methods into the suite of GIS procedures. While GIS can provide tools for handling disagreements over facts by providing more and better information, MCA methods can help in diminishing disagreements over values among conflicting interest parties (Feick and Hall, 1999; Jankowski and Nyerges, 2001; Sugumaran and DeGroote, 2011).

An integrated GIS and MCA system is often referred to as a multicriteria spatial decision support system (MC-SDSS) (Sugumaran and DeGroote, 2011; Malczewski and Rinner, 2015). There are three types of approaches for developing MC-SDSS: loose coupling, tight coupling, and full integration (Nyerges, 1992; Jun, 2000; Hamilton et al., 2016). In the loose-coupling approach, the two systems (GIS and multicriteria modeling) exchange files such that one system uses data from the other system as its input data. A tight-coupling strategy is based on a single data or model manager and a common user interface; thus, the two systems share not only the communication files but also a common user interface. A more complete integration can be achieved by creating


user-specified routines using generic programming languages. The routines can then be added to the existing set of commands or routines of the GIS package. This coupling strategy is referred to as the full-integration approach. MCA procedures have been integrated into several GIS software packages (see Malczewski and Rinner, 2015). The IDRISI multi-criteria evaluation (MCE) module provides a comprehensive set of MCA functionalities (Eastman et al., 1993). SMCE (spatial multiple criteria evaluation) is an open-source toolkit available in the Integrated Land and Water Information System (ILWIS) (Sharifi and van Herwijnen, 2002). GRASS GIS provides another open-source implementation of MCA (Massei et al., 2014). Yatsalo et al. (2015) developed a standalone GIS-MCA system: Decision Evaluation in Complex Risk Network System (DECERNS). A number of GIS-MCA approaches have been implemented in the ArcGIS/ArcView environment (e.g., Boroushaki and Malczewski, 2008; Herzig, 2008; Chen et al., 2010a,b; Maliszewski and Horner, 2010; Reynolds and Hessburg, 2014; Ozturk and Batuk, 2011). Over the last two decades, a considerable effort has been made to develop web-based MC-SDSS (e.g., Andrienko and Andrienko, 2001; Zhu et al., 2001; Jankowski et al., 2008; Simão et al., 2009; Hamilton et al., 2016). The Spatial Decision Support Knowledge Portal provides a large collection of multicriteria decision support methods, tools, and case studies (see Li et al., 2012).
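To make the loose-coupling strategy concrete, the following is a minimal Python sketch (not drawn from any of the systems cited above) in which two hypothetical criterion rasters exported from a GIS package, "slope.tif" and "dist_roads.tif", are read with the third-party rasterio library, combined outside the GIS, and written back as a file the GIS can display. The file names and weights are illustrative assumptions.

import numpy as np
import rasterio  # third-party library for reading/writing GIS raster files

# Criterion layers exported from the GIS package (hypothetical file names).
with rasterio.open("slope.tif") as src:
    slope = src.read(1).astype(float)
    profile = src.profile          # georeferencing metadata, reused on output
with rasterio.open("dist_roads.tif") as src:
    dist = src.read(1).astype(float)

# The multicriteria model runs outside the GIS (loose coupling): a simple
# weighted combination of min-max standardized layers (weights are assumed).
std = lambda a: (a - a.min()) / (a.max() - a.min())
score = 0.6 * (1 - std(slope)) + 0.4 * (1 - std(dist))  # both criteria minimized

profile.update(dtype="float64")
with rasterio.open("suitability.tif", "w", **profile) as dst:
    dst.write(score, 1)  # result file handed back to the GIS for display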

1.15.2 Components of MCA

A multicriteria decision problem involves a set of alternatives that are evaluated on the basis of conflicting and incommensurate criteria according to the decision maker's (or decision-making agent's) preferences. There are three generic elements of any multicriteria decision problem: decision maker(s), alternatives, and criteria. They can be organized in a tabular form or a decision matrix (Table 1). The rows of the matrix represent the alternatives (e.g., geographic entities). Each alternative is described by its locational data and attribute data or evaluation criteria. Each criterion (attribute) accounts for a column in the decision matrix. Formally, matrix A is an (m × n) decision matrix in which element aik indicates the performance of alternative Ai when it is evaluated in terms of criterion Ck (i = 1, 2, …, m, and k = 1, 2, …, n). The location of the ith alternative is defined implicitly or explicitly. For conventional MCA, the location of a decision alternative is given implicitly (see "Conventional GIS-MCA" section). In the case of spatially explicit MCA (see "Spatially Explicit MCA" and "Spatial Multiobjective Optimization" sections), the location of the ith alternative is defined by the (xi, yi) coordinates. It is also assumed that the decision-making agent's preferences with respect to the evaluation criteria are defined in terms of criteria weights (denoted wk, for k = 1, 2, …, n). Typically, the agent is presumed to have spatially homogeneous preferences; accordingly, a single weight, wk, is assigned to the kth criterion. For spatially explicit MCA, the value of a criterion weight may vary from one location to another; consequently, the criterion weight, wik, depends on the location of the ith alternative defined in terms of the (xi, yi) coordinates. The sets of data/information contained in the decision matrix are processed using a decision (combination) rule that combines the geographic data (criterion values assigned to alternatives) and the decision-making agent's preferences (e.g., criteria weights) to associate an overall value with each alternative. The overall values are then used for ordering the set of alternatives under consideration. The concept of the decision matrix can also be used to organize the input data for group decision making. Given decision-making agents DMg (g = 1, 2, …, z), the input data consist of a series of decision matrices, each representing the gth agent. To choose a consensus or compromise alternative, the agents have to specify their own preferences, and the individual preferences are then combined by means of a group choice function. The group choice problem involves collective choice rules that produce group preferences from individual orderings.
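As an illustration of how a decision matrix of this kind might be represented computationally, the following Python sketch (with invented values) stores the criterion scores aik, the alternatives' coordinates, and the global weights wk as NumPy arrays, and applies a simple weighted sum as a stand-in for the combination rules discussed later in this chapter.

import numpy as np

# Rows = alternatives A1..Am, columns = criteria C1..Cn (values a_ik); invented.
A = np.array([[12.0, 0.45, 3.1],
              [ 8.0, 0.80, 2.4],
              [15.0, 0.30, 4.0],
              [10.0, 0.60, 2.9]])
xy = np.array([[0.0, 1.0], [2.0, 0.5], [1.5, 2.0], [0.5, 0.5]])  # (x_i, y_i)
w = np.array([0.5, 0.3, 0.2])  # global criteria weights, summing to 1

# A combination rule maps criterion values and weights to one overall value
# per alternative; a weighted sum is used here purely for illustration (raw
# scores would first be standardized in practice; see "Value Scaling").
overall = A @ w
ranking = np.argsort(-overall)  # indices of alternatives, best to worst
print(ranking)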

1.15.2.1 Decision-Making Agents

Broadly speaking, a decision maker (or decision-making agent) is an entity involved in the decision-making procedure. It can be an individual (e.g., searching for a house or an apartment), a group of individuals (e.g., selecting a suitable site for housing development), or an organization (e.g., allocating resources for housing development). While conventional decision analysis focuses on the human decision maker, recent approaches to computer-based modeling provide a broader description of the decision maker to include the concept of a decision-making agent (Parker et al., 2003; Sengupta and Bennett, 2003). An agent is a computer program characterized by such properties as autonomy (i.e., the capability of taking independent action), reactivity (i.e., the capability of sensing and reacting to its environment and other agents), and rationality (i.e., the capability of acting rationally to solve a problem at hand) (Sengupta and Bennett, 2003; O'Sullivan and Unwin, 2010).

Table 1 Decision matrix

Alternative, Ai | Spatial coordinates Xi, Yi | Criterion/attribute C1, C2, C3, …, Cn
A1 | x1, y1 | a11, a12, a13, …, a1n
A2 | x2, y2 | a21, a22, a23, …, a2n
A3 | x3, y3 | a31, a32, a33, …, a3n
… | … | …
Am | xm, ym | am1, am2, am3, …, amn
Global weight, wk | (local weight, wik) | w1, w2, w3, …, wn


Further, humanistic characteristics such as preferences, beliefs, and opinions can be part of agent behavior. These characteristics make it possible to represent human decision makers as agents acting in a simulated real-world environment (Bone et al., 2011).

1.15.2.2 Decision Alternatives

A decision alternative can be defined as an alternative course of action. A geographic decision alternative consists of at least two elements: action (what to do?) and location (where to do it?) (Malczewski, 1999; Chakhar and Mousseau, 2008). The spatial components of a decision alternative can be specified explicitly or implicitly (Malczewski, 2006a). Examples of explicitly spatial alternatives include alternative sites for locating facilities, alternative location-allocation patterns, and alternative patterns of land use. In many decision situations, the spatial component of a decision alternative is not explicitly present. However, there may be spatial implications associated with implementing an alternative; in such a case, the alternative is referred to as an implicitly spatial alternative (van Herwijnen and Rietveld, 1999). Spatially distributed impacts can emerge, for example, through the implementation of a particular solution to minimize flood risks in which favorable impacts are produced at one location while negative consequences result at another (Tkach and Simonovic, 1997). An alternative can be defined by a decision variable (e.g., a 0–1 variable can be associated with alternative sites for locating facilities). Constraints represent restrictions imposed on the decision variables (alternatives). They dichotomize a set of decision alternatives into two categories: acceptable (feasible) and unacceptable (infeasible). From the GIS perspective, the constraints eliminate geographic objects characterized by certain attributes and/or certain values of attributes from consideration. An alternative is feasible if it satisfies all constraints; otherwise, it is referred to as an infeasible alternative. The concept of Boolean (or logical) constraints is the most often used approach for identifying the set of feasible alternatives in GIS-MCA procedures (Malczewski, 2006a). The set of feasible alternatives can be subdivided into two categories: dominated and nondominated. This distinction is based on the Pareto optimality or efficiency principle (Cohon, 1978; Huang et al., 2008). According to the principle, if an alternative A1 is at least as desirable as alternative A2 on all criteria and more desirable on at least one criterion, then alternative A2 is dominated by A1. This implies that for a nondominated solution, an increase in the value of one of the criteria under consideration is not possible without some decrease in the value of at least one other criterion. The nondominated alternative is also referred to as the efficient or noninferior alternative.
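The Pareto screening described above can be sketched in a few lines of Python. The function below is a naive O(m²) dominance test, under the assumption that all criteria have been oriented for maximization (minimized criteria would be negated beforehand); the sample values are invented.

import numpy as np

def nondominated(A):
    # Return a boolean mask marking the nondominated rows of the decision
    # matrix A; every column is assumed to be a criterion to be maximized.
    m = A.shape[0]
    keep = np.ones(m, dtype=bool)
    for i in range(m):
        for j in range(m):
            # A_i dominates A_j: at least as good on all criteria and
            # strictly better on at least one criterion.
            if i != j and np.all(A[i] >= A[j]) and np.any(A[i] > A[j]):
                keep[j] = False
    return keep

A = np.array([[3.0, 5.0], [4.0, 4.0], [2.0, 4.0], [4.0, 5.0]])
print(nondominated(A))  # [False False False  True]: only the last is efficient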

1.15.2.3 Evaluation Criteria

Both an individual criterion and a set of criteria should possess certain properties to adequately evaluate decision alternatives (Keeney, 1992). Each criterion must be comprehensive and measurable. A set of criteria should be complete (it should cover all aspects of a decision problem), operational (the criteria can be meaningfully used in the analysis), decomposable (the set of criteria can be broken into parts to simplify the process), nonredundant (to avoid the problem of double counting), and minimal (the number of criteria should be kept as small as possible). A criterion is a generic term including both the concept of objective and that of attribute (Malczewski, 1999). An objective is a statement about the desired state of a system (e.g., a location-allocation system, land-use pattern, or transportation system). It indicates the directions of improvement of one or more attributes. The statement about desired directions of improvement can be interpreted as either "the more of the attribute, the better" or "the less of the attribute, the better"; this implies maximization or minimization of an objective function. Thus, the concept of an objective is made operational by assigning to each objective at least one attribute, which directly or indirectly measures the level of achievement of the objective. An attribute can be described as a property of an element of a real-world geographic system. More specifically, an attribute is a measurable quantity or quality of a geographic entity or a relationship between geographic entities. For example, the objective of maximizing physical accessibility to public service facilities can be operationalized by attributes such as the total traveling distance, time, cost, or any other measure of spatial proximity. The relationships between objectives and attributes have a hierarchical structure. The most general objectives are at the highest level; they may be defined in terms of more specific objectives, which are defined at lower levels. At the lowest level of the hierarchy are attributes, which are quantifiable indicators of the extent to which associated objectives are realized (Saaty, 1980). The concept of a hierarchical structure of criteria underlies the value-focused approach for structuring multicriteria decision problems (Keeney, 1992). This approach uses the values (evaluation criteria) as the fundamental element of the decision analysis and involves specifying criteria to evaluate a set of alternatives. Fig. 1 shows an example of the hierarchical structure of the main elements of a decision problem. The top level of the hierarchical structure is the ultimate goal (or overall objective) of the decision at hand (e.g., to identify the best spatial pattern of land uses, to select the best site for a public service facility, or to find the shortest transportation route). The hierarchy then descends from the general to the more specific until a level of attributes is reached; this is the level against which the decision alternatives at the lowest level of the hierarchy are evaluated. Typically, the hierarchical structure consists of four levels: goal, objectives, attributes, and alternatives. However, a variety of elements relevant to a particular decision situation and different combinations of these elements can be used to represent decision problems (Saaty, 1980; Malczewski and Rinner, 2015). Examples of hierarchical structuring of spatial decision problems in GIS-MCA are given in Bojórquez-Tapia et al. (2001), Sharifi et al.
(2004), Rinner and Taranu (2006), and Demetriou et al. (2012).

Fig. 1 Hierarchical structure of a decision problem; aik is the value of the kth attribute (criterion) associated with the ith alternative (k = 1, 2, 3, and i = 1, 2, 3, 4, 5).

1.15.3 Fundamental concepts of GIS-MCA

The procedures for tackling spatial multicriteria problems involve three rudimentary concepts: value scaling (or standardization), criterion weighting, and a combination (decision) rule (Malczewski and Rinner, 2015).

1.15.3.1 Value Scaling

1.15.3.1.1 Global value scaling

A method for transforming raw criterion scores to comparable units is referred to as the value scaling or standardization procedure. The notion of a value function is the underlying concept of any standardization method. A value function is a mathematical representation of human judgment (Keeney, 1992). It relates possible decision outcomes (criterion or attribute values) to a scale that reflects the decision-making agent's preferences with respect to different levels of criterion values. There are a number of methods for standardizing raw data (Hwang and Yoon, 1981; Voogd, 1983). The score range procedure is the most often used GIS-based method for value scaling (Malczewski, 2006a). The procedure transforms the raw criterion scores, a1k, a2k, …, amk, into standardized values, v(aik), as follows:

$v(a_{ik}) = \frac{\max_i\{a_{ik}\} - a_{ik}}{r_k}$, for the kth criterion to be minimized;  (1)

$v(a_{ik}) = \frac{a_{ik} - \min_i\{a_{ik}\}}{r_k}$, for the kth criterion to be maximized;  (2)

where aik is the score for the ith alternative with respect to the kth criterion (i = 1, 2, …, m; k = 1, 2, …, n); $\min_i\{a_{ik}\}$ and $\max_i\{a_{ik}\}$ are the minimum and maximum criterion values for the kth criterion, respectively; and

$r_k = \max_i\{a_{ik}\} - \min_i\{a_{ik}\}$  (3)

is the range of the kth criterion. The standardized score values, v(aik), range from 0 to 1; 0 is the value of the least-desirable outcome and 1 is the most-desirable score. Since the range is defined for the whole study area, the rk value is referred to as the global range (Malczewski, 2011); consequently, v(aik) is the global value function. Jiang and Eastman (2000) suggest that the concept of fuzzy set membership provides a basis for developing a generalized value scaling approach in GIS-MCA. The concept has been implemented in the fuzzy module of IDRISI (Eastman, 1997). The fuzzy procedure assigns a value to a decision alternative (pixel) based on its membership in a fuzzy set. It involves specifying a fuzzy set membership function, which can take the form of a linear or nonlinear value function. In real-world applications of GIS-MCA, a nonlinear value function can be approximated by a piecewise linear form (Pereira and Duckstein, 1993; Eastman, 1997). Examples of implementation of the value function concept in GIS are given in Ozturk and Batuk (2011), Chen and Paydar (2012), Demetriou et al. (2012), and Yatsalo et al. (2015).
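A minimal Python sketch of the score range procedure (Eqs. 1–3); the criterion values below are invented for illustration.

import numpy as np

def global_value(a, minimize=False):
    # Score range (min-max) standardization of one criterion column a over
    # the whole study area (Eqs. 1-3); returns values in [0, 1].
    r = a.max() - a.min()                       # global range r_k (Eq. 3)
    return (a.max() - a) / r if minimize else (a - a.min()) / r

cost = np.array([120.0, 80.0, 200.0, 150.0])    # criterion to be minimized
print(global_value(cost, minimize=True))        # 1.0 for the cheapest alternative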

1.15.3.1.2 Local value scaling

Conventional MCA methods assume spatial homogeneity of preferences with respect to criterion values. A local form of value function can be developed to take spatially varying preferences into account (Malczewski, 2011). The spatial variation of the value function can be operationalized by the concept of the local range:

$r_k^q = \max_{i \in q}\{a_{ik}^q\} - \min_{i \in q}\{a_{ik}^q\}$,  (4)

where $\min_{i \in q}\{a_{ik}^q\}$ and $\max_{i \in q}\{a_{ik}^q\}$ are the minimum and maximum values of the kth criterion in the qth subset (q = 1, 2, …, h) of the locations i = 1, 2, …, m (m > h), respectively. A number of methods for defining the subset of locations i ∈ q can be used; for example, the moving window and the p-nearest neighbor procedures can be used to define a set of overlapping neighborhoods, or the study area can be subdivided into discrete units (neighborhoods, zones, or regions) (Lloyd, 2010; O'Sullivan and Unwin, 2010). Given the definition of the local range, the local value function v(aqik) converts different levels of the kth attribute associated with the ith alternative located in the qth neighborhood. Specifically, the local form of the global value function (see Eqs. 1 and 2) can be defined as follows:

$v(a_{ik}^q) = \frac{\max_{i \in q}\{a_{ik}^q\} - a_{ik}^q}{r_k^q}$, for the kth criterion to be minimized; and  (5)

$v(a_{ik}^q) = \frac{a_{ik}^q - \min_{i \in q}\{a_{ik}^q\}}{r_k^q}$, for the kth criterion to be maximized,  (6)

where $\min_{i \in q}\{a_{ik}^q\}$ and $\max_{i \in q}\{a_{ik}^q\}$ are the minimum and maximum criterion values for the kth criterion in the qth neighborhood, respectively, and $r_k^q$ is the local range (Eq. 4). The standardized values v(aqik) range from 0 to 1, with 0 being the value of the least-desirable outcome and 1 the value assigned to the most-desirable alternative in the qth neighborhood (Malczewski, 2011; Carter and Rinner, 2014; Şalap-Ayça and Jankowski, 2016).

1.15.3.2 Criteria Weights

A weight assigned to the kth criterion is a value that indicates its importance relative to the other criteria under consideration. The criteria weights, w1, w2, …, wk, …, wn, are typically assumed to meet the following conditions: $0 \leq w_k \leq 1$ and $\sum_{k=1}^{n} w_k = 1$. The weights must be ratio scaled (Hobbs and Meier, 2000). They should represent the trade-off between two criteria: how large a gain on one of them would be required to compensate for a loss on the other? Assigning weights to evaluation criteria must account for the changes in the ranges of criterion values (see "Value Scaling" section) and the different degrees of importance attached to those ranges. Since the meaning of criteria weights depends on the multicriteria decision rule (see "Combination Rules" section), the weights may have different interpretations for different MCA methods and decision contexts (Lai and Hopkins, 1989; Choo et al., 1999; Belton and Stewart, 2002). Many methods have been suggested for assessing criteria weights (Hwang and Yoon, 1981; Stillwell et al., 1981; Choo et al., 1999; Hobbs and Meier, 2000). From the perspective of GIS-MCA, the methods can be classified into two groups: global (conventional) procedures and local (or spatially explicit) methods. The global techniques are based on the assumption of spatial homogeneity of preferences; consequently, they assign a single weight to each criterion. A vast majority of GIS-MCA applications have used one of three global weighting methods: ranking, rating, and pairwise comparison (Malczewski, 2006a). To take into account spatial heterogeneity of preferences, spatially explicit criterion weighting methods such as the proximity-adjusted criteria weights and range-based local weighting have been proposed (Malczewski, 2011; Ligmann-Zielinska and Jankowski, 2012). A critical discussion of the criteria weighting methods in GIS-MCA can be found in Malczewski and Rinner (2015).

1.15.3.2.1 Ranking and rating

A simple method for estimating the criteria weights is to rank the criteria in the order of the decision-making agent's preference (Stillwell et al., 1981). First, a straight ranking is established (the most important = 1, the second most important = 2, etc.). Once the ranking is established for a set of criteria, the ranks are converted to weights; in the rank sum method, for example, the weight of the kth criterion is its inverted rank, n − rk + 1, divided by the sum of the inverted ranks of all criteria. The rating methods require the decision-making agent to estimate weights on the basis of a predetermined scale, for example, a scale of 0 to 100. Given the scale, a score of 100 is assigned to the most important criterion. Proportionately smaller scores are then given to criteria lower in the order, and the procedure is continued until a score is assigned to the least important criterion. Finally, the weights are normalized by dividing each score by the sum total. The ranking and rating methods have been used in a number of GIS-MCA applications (see Malczewski and Rinner, 2015). Jankowski et al. (2008) demonstrated the use of those methods for estimating criteria weights in the ArcGIS-based Choice Modeler system. Ozturk and Batuk (2011) implemented the ranking and rating methods in their ArcGIS-MCA system. The methods are also available in the ILWIS-SMCE module (Sharifi et al., 2004).
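A brief Python sketch of the two procedures; the rank sum variant of the ranking method (Stillwell et al., 1981) is used here, and the example ranks and ratings are invented.

import numpy as np

def rank_sum_weights(ranks):
    # Rank sum weights from straight ranks (1 = most important):
    # w_k = (n - r_k + 1) / sum_j (n - r_j + 1).
    ranks = np.asarray(ranks, dtype=float)
    n = ranks.size
    inv = n - ranks + 1
    return inv / inv.sum()

def rating_weights(scores):
    # Normalize ratings given on a predetermined 0-100 scale so they sum to 1.
    scores = np.asarray(scores, dtype=float)
    return scores / scores.sum()

print(rank_sum_weights([1, 2, 3]))    # [0.5  0.333... 0.166...]
print(rating_weights([100, 60, 40]))  # [0.5  0.3  0.2]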

1.15.3.2.2 Pairwise comparison

The pairwise comparison method (Saaty, 1980) is the most often used procedure for estimating criteria weights in GIS-MCA applications (Malczewski, 2006a). The method employs an underlying scale with values from 1 to 9 to rate the preferences with respect to a pair of criteria. The pairwise comparisons are organized into a matrix $C = [c_{kp}]_{n \times n}$, where $c_{kp}$ is the pairwise comparison rating for the kth and pth criteria. The matrix C is reciprocal, that is, $c_{pk} = c_{kp}^{-1}$, and all its diagonal elements are unity, that is, $c_{kp} = 1$ for $k = p$. Once the pairwise comparison matrix is created, a vector of criteria weights, $w = [w_1, w_2, \ldots, w_n]$, can be obtained as the unique solution to $Cw = \lambda_{max} w$, where $\lambda_{max}$ is the largest eigenvalue of C. Saaty (1980) provides several methods for approximating the values of criteria weights. One of the most often used is the procedure of averaging over normalized columns. First, the entries in the matrix C are normalized: $c^{*}_{kp} = c_{kp} / \sum_{k=1}^{n} c_{kp}$; and then the weights are computed as follows: $w_k = \sum_{p=1}^{n} c^{*}_{kp} / n$, for k = 1, 2, …, n.


The method allows for some degree of inconsistency in the pairwise comparisons (Saaty, 1980). A measure of inconsistency is based on the observation that $\lambda_{max} > n$ for positive, reciprocal matrices, and $\lambda_{max} = n$ if C is a consistent matrix. The consistency ratio (CR) can be defined as follows: $CR = (\lambda_{max} - n)/(RI \times (n - 1))$, where RI is the random index, which is the consistency index of a randomly generated pairwise comparison matrix. It can be shown that RI depends on the number of criteria being compared; for example, for n = 2, 3, 4, 5, 6, 7, and 8, RI = 0.00, 0.52, 0.89, 1.11, 1.25, 1.35, and 1.40, respectively (Saaty, 1980). A consistency ratio CR < 0.10 indicates a reasonable level of consistency in the pairwise comparisons; if, however, CR ≥ 0.10, then the value of the ratio is indicative of inconsistent judgments. In such cases, one should reconsider and revise the original values in the pairwise comparison matrix C. The pairwise comparison method is a part of the multicriteria decision support modules in IDRISI (Eastman et al., 1993), ILWIS-SMCE (Sharifi et al., 2004), CommonGIS (Rinner and Taranu, 2006), and GRASS (Massei et al., 2014). The method has also been implemented in the ArcGIS/ArcView environment (Zhu et al., 2001; Boroushaki and Malczewski, 2008; Ozturk and Batuk, 2011; Chen et al., 2013).
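The following Python sketch approximates the weights by averaging over normalized columns and computes the consistency ratio as defined above. The comparison matrix is an invented example on Saaty's 1–9 scale, and $\lambda_{max}$ is estimated from the approximate weight vector rather than by a full eigendecomposition.

import numpy as np

# Random index RI for n = 2..8, as given in the text (Saaty, 1980).
RI = {2: 0.00, 3: 0.52, 4: 0.89, 5: 1.11, 6: 1.25, 7: 1.35, 8: 1.40}

def ahp_weights(C):
    # Approximate criteria weights by averaging over normalized columns and
    # return the consistency ratio CR = (lmax - n) / (RI * (n - 1)).
    C = np.asarray(C, dtype=float)
    n = C.shape[0]
    w = (C / C.sum(axis=0)).mean(axis=1)   # column-normalize, then row-average
    lmax = (C @ w / w).mean()              # estimate of the largest eigenvalue
    cr = (lmax - n) / (RI[n] * (n - 1))
    return w, cr

# Reciprocal pairwise comparison matrix (illustrative values).
C = [[1,   3,   5],
     [1/3, 1,   3],
     [1/5, 1/3, 1]]
w, cr = ahp_weights(C)
print(w.round(3), round(cr, 3))   # CR < 0.10 here, so the judgments pass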

1.15.3.2.3 Spatially explicit criteria weighting

The local range-based method (Malczewski, 2011) and the proximity-adjusted approach (Ligmann-Zielinska and Jankowski, 2012) provide examples of procedures for spatially explicit criteria weighting. The critical aspect of criteria weighting methods is that the weight, $w_k$, depends on the range of the criterion values, $r_k$. This implies that a criterion weight is intricately associated with the corresponding value function, $v(a_{ik})$. Consequently, a meaningful estimate of a weight requires that at least the upper and lower limits of the value function (and its measurement unit) have been specified (Hwang and Yoon, 1981; Malczewski, 2000). The relationship is encapsulated in the range-sensitive principle (Keeney, 1992; Fischer, 1995). The principle suggests that, other things being equal, the greater the range of values for the kth criterion, the greater the weight, $w_k$, that should be assigned to that criterion (Fischer, 1995). Given the definition of the qth neighborhood (see "Outranking relation-based methods" section), the local criterion weight, $w_k^q$, for the kth criterion can be defined as a function of the global weight, $w_k$, the global range, $r_k$, and the local range, $r_k^q$. Specifically,

$$w_k^q = b_k^q \Big/ \sum_{k=1}^{n} b_k^q, \qquad 0 \le w_k^q \le 1 \ \text{ and } \ \sum_{k=1}^{n} w_k^q = 1, \tag{7}$$

where $b_k^q = w_k r_k^q / r_k$. Since the spatial variability of the local weight, $w_k^q$, is a function of the local criterion range, $r_k^q$, the value of a local weight depends on the neighborhood scheme used for subdividing a study area into neighborhoods (zones or regions). The method has been used as an element of GIS-MCA in Malczewski (2011), Carter and Rinner (2014), Malczewski and Liu (2014), and Şalap-Ayça and Jankowski (2016). The proximity-adjusted criterion weighting is based on the idea of adjusting preferences according to the spatial relationship between alternatives, or between an alternative and some reference locations (Rinner and Heppleston, 2006). Ligmann-Zielinska and Jankowski (2012) operationalized the concept of proximity-adjusted criterion weights by introducing a reference or benchmark location. They suggest that the weights should reflect both the relative importance of the criterion and the spatial position of a decision alternative with respect to a reference location. The relative importance is assessed in terms of the global criterion weight; that is, the same value of $w_k$ is assigned to each decision alternative evaluated with respect to the kth criterion. The location effect is measured by means of a distance decay function; the closer a given alternative is situated to a reference location, the higher the value of the criterion weight should be.
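
To make Eq. (7) concrete, here is a minimal Python sketch of the local range-based weighting; the function name and the global/local ranges are hypothetical, and benefit criteria are assumed.

```python
import numpy as np

def local_weights(global_weights, global_ranges, local_ranges):
    """Eq. (7): rescale global weights by the local-to-global range ratio
    (b_k = w_k * r_k^q / r_k) and renormalize so the local weights sum
    to one in each neighborhood."""
    b = (np.asarray(global_weights) * np.asarray(local_ranges)
         / np.asarray(global_ranges))
    return b / b.sum()

# Hypothetical example: three criteria; in neighborhood q the second
# criterion varies over only 20% of its global range, so its local
# weight shrinks relative to its global weight.
w_global = [0.5, 0.3, 0.2]
r_global = [100.0, 50.0, 10.0]
r_local_q = [80.0, 10.0, 9.0]
print(local_weights(w_global, r_global, r_local_q).round(3))
```
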

1.15.3.3 Combination Rules

At the most fundamental level, a decision rule is a procedure for evaluating (and ordering) a set of decision alternatives (Hwang and Yoon, 1981). Decision rules are also referred to as rules of combination. A GIS-MCA combination rule integrates the data and information about alternatives (criterion maps) and the decision-making agent's preferences (criterion weights) into an overall assessment of the alternatives. There are a number of classifications of decision rules (Malczewski and Rinner, 2005). Here we focus on the following dichotomous classifications: compensatory versus noncompensatory, and multiattribute versus multiobjective.

1.15.3.3.1 Compensatory and noncompensatory methods

The distinction between compensatory and noncompensatory decision rules is based on the trade-offs between evaluation criteria: the former take into account the trade-offs between criteria, while the latter disregard the value of trade-offs. The compensatory methods allow a low value on one criterion to be offset by a high value on another. The weighted linear combination (WLC) model provides an example of a compensatory method in GIS-MCA (see "WLC and related methods" section). The noncompensatory decision rules are conceptualized in GIS-MCA using Boolean overlay operations in the form of the conjunctive and disjunctive screening methods (Malczewski, 1999). Under conjunctive screening, an alternative is accepted if it meets specified standards or thresholds for all evaluation criteria. Disjunctive screening accepts an alternative that scores sufficiently high on at least one of the criteria under consideration (Carver, 1991; Eastman et al., 1993).
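
A minimal sketch of the two screening rules, assuming standardized benefit scores and analyst-chosen thresholds (all values here are hypothetical):

```python
import numpy as np

# Hypothetical criterion scores for four alternatives (rows) on three
# criteria (columns), all benefit criteria, with minimum thresholds.
scores = np.array([[0.8, 0.6, 0.9],
                   [0.4, 0.9, 0.7],
                   [0.9, 0.2, 0.5],
                   [0.3, 0.3, 0.4]])
thresholds = np.array([0.5, 0.5, 0.5])

conjunctive = (scores >= thresholds).all(axis=1)  # pass on ALL criteria
disjunctive = (scores >= thresholds).any(axis=1)  # pass on AT LEAST ONE
print("conjunctive:", conjunctive)   # only the first alternative survives
print("disjunctive:", disjunctive)   # only the last alternative is rejected
```
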

1.15.3.3.2 Multiattribute and multiobjective methods

Multicriteria decision rules can be broadly categorized into two groups: multiattribute decision analysis and multiobjective decision analysis methods (Hwang and Yoon, 1981). Multiattribute decision problems involve a predetermined, limited number of decision alternatives. The alternatives are given explicitly rather than defined implicitly, as in the case of multiobjective decision analysis. Solving a multiattribute problem is an outcome-oriented evaluation and choice process, whereas the multiobjective approach is a process-oriented design and search. Unlike multiattribute approaches, the multiobjective methods make a distinction between the concepts of decision variables and decision criteria. These two elements are related to one another by a set of objective functions. The multiattribute and multiobjective decision methods are sometimes referred to as discrete and continuous decision problems, respectively. It is important to indicate, however, that multiobjective decision problems can be defined in terms of a set of continuous and/or discrete decision variables (Zarghami and Szidarovszky, 2011). A good illustration of the distinction between multiattribute and multiobjective (and the discrete and continuous decision) problems is provided by the site selection and site search problems (Cova and Church, 2000). The aim of site selection analysis is to identify the best site for some activity given a set of potential (feasible) sites. In this type of analysis, all the characteristics (such as location, size, and relevant attributes) of the candidate sites are known. The problem is to rate (rank) the alternative sites based on their characteristics so that the best site (or a set of sites) can be identified. If a set of candidate sites is not predetermined, then the problem is referred to as a site search analysis. The characteristics of the sites (i.e., their boundaries) have to be defined by solving the problem; that is, by identifying the boundary of the best site(s). The site selection problem is typically tackled in the GIS environment using multiattribute methods, including WLC, the analytic hierarchy process (AHP), ideal point (IP) methods, and outranking methods (see "Conventional GIS-MCA" and "Spatially Explicit MCA" sections). The site search problem is usually formulated as a spatial multiobjective optimization problem and solved using methods of mathematical programming and heuristic/metaheuristic algorithms (see "Heuristics and Metaheuristics" section).

1.15.4 GIS-MCA Methods

The methods of GIS-MCA can be classified into three groups: the conventional GIS-MCA for spatial decision making, spatially explicit MCA, and spatial multiobjective optimization (Malczewski and Rinner, 2015).

1.15.4.1 Conventional GIS-MCA

The conventional GIS-MCA approaches are merely adaptations of existing MCA methods for analyzing spatial decision problems. They usually involve spatial variability only implicitly, by defining evaluation criteria based on the concept of spatial relations such as proximity, adjacency, and contiguity (van Herwijnen and Rietveld, 1999; Ligmann-Zielinska and Jankowski, 2012). The conventional approaches also assume spatial homogeneity of the decision-making agent's preferences within a given study area (Malczewski, 2011). A number of conventional MCA methods have been adapted for use in a GIS environment. The most popular GIS-MCA methods include the WLC and related procedures (Eastman et al., 1993; Malczewski, 2000), IP methods (Pereira and Duckstein, 1993; Tkach and Simonovic, 1997), the analytic hierarchy/network process (AH/NP) (Banai, 1993; Marinoni, 2004), and ELECTRE (ELimination Et Choix TRaduisant la REalité) and PROMETHEE (Preference Ranking Organization METHod for Enrichment Evaluations) (Joerin et al., 2001; Martin et al., 2003). These methods can be grouped into two distinctive categories: the value function-based methods (WLC, IP, and AH/NP) and the outranking relation-based methods (ELECTRE and PROMETHEE).

1.15.4.1.1 Value function-based methods

1.15.4.1.1.1 WLC and related methods

The WLC is the most often used GIS-MCA method (see Malczewski, 2006a). The WLC model consists of two components: criterion weights, $w_k$, and value functions, $v(a_{ik})$ (see "Value Scaling" and "Criteria Weights" sections). It is a map combination procedure that associates with the ith decision alternative (location) a set of criteria weights, $w_1, w_2, \ldots, w_n$, and combines the weights with the criterion (attribute) values, $a_{i1}, a_{i2}, \ldots, a_{in}$ ($i = 1, 2, \ldots, m$), as follows:

$$V(A_i) = \sum_{k=1}^{n} w_k v(a_{ik}), \tag{8}$$
where $V(A_i)$ is the overall value of the ith alternative and $v(a_{ik})$ is the value of the ith alternative with respect to the kth attribute. The alternative characterized by the highest value of $V(A_i)$ is the most preferred one. The most often used approaches for assessing the criteria weights and value functions in GIS-WLC are the pairwise comparison procedure ("Pairwise comparison" section) and the criterion range standardization method (Eqs. 1 and 2) (see Malczewski, 2006a). The WLC model is based on the assumptions of linearity (i.e., the desirability of an additional unit of an attribute is constant for any level of that attribute) and additivity (i.e., the attributes under consideration are mutually preference independent of each other). It should be emphasized that one cannot meaningfully assess the value of a criterion weight without identifying the value function (the range of criterion values) (Keeney, 1992; Hobbs and Meier, 2000). The ratio of two criteria weights, $w_1$ and $w_2$, should be inversely proportional to the rate at which one is willing to trade the criteria off. This principle should be applied irrespective of the method used for assessing the criteria weights.
The assumptions behind WLC are often very difficult to apply in real-world situations (Malczewski, 2000). There is, however, evidence to show that the WLC method yields "close approximations to very much more complicated nonlinear forms, while remaining far easier to use and understand" (Hwang and Yoon, 1981, p. 103; Stewart, 1996). Hobbs (1980), Lai and Hopkins (1989), and Malczewski (2000) provide critical discussions of the use of GIS-WLC. One of the main advantages of WLC is that the method can easily be implemented within the GIS environment using map algebra operations. The method is also intuitively appealing to those involved in decision-making/evaluation procedures. Consequently, GIS-WLC has been used for analyzing decision and management situations in a variety of application domains (e.g., Eastman et al., 1993; Jankowski, 1995; Geneletti, 2005; Gorsevski et al., 2013). Several GISystems (such as IDRISI, ILWIS, and CommonGIS) feature decision support modules performing the WLC procedure (Eastman, 1997; Rinner and Malczewski, 2002; Sharifi et al., 2004).

1.15.4.1.1.1.1 The ordered weighted averaging

The ordered weighted averaging (OWA) is a generalization and extension of the WLC model (Yager, 1988; Jiang and Eastman, 2000). For a given set of criteria (attribute) maps, OWA is a map combination procedure that associates with each location two types of weights: a set of criteria weights, $w_1, w_2, \ldots, w_n$, and a set of order weights $u_1, u_2, \ldots, u_n$ ($0 \le u_k \le 1$, and $\sum_{k=1}^{n} u_k = 1$). The OWA model has a form similar to WLC (Eq. 8). Specifically, it combines the reordered standardized criterion values and the modified (reordered) criteria weights with the order weights. The generality of OWA is related to its capability to implement a wide range of combination operators by selecting an appropriate set of order weights (Yager, 1988). The family of OWA operators includes the most often used GIS-based map combination procedures: the conventional WLC and the Boolean overlay operations, such as intersection (AND) and union (OR) (Jiang and Eastman, 2000). A set of equal order weights ($n^{-1}, n^{-1}, \ldots, n^{-1}$) results in the WLC scores; the order weights $(1.0, 0.0, \ldots, 0.0)$ assign a weight of 1.0 to the highest (best) criterion value for each location, resulting in an OR-type combination; the order weights $(0.0, \ldots, 0.0, 1.0)$ assign a weight of 1.0 to the lowest (worst) values, resulting in the AND combination (Jiang and Eastman, 2000; Malczewski et al., 2003). The AND and OR operators represent the extreme cases of OWA and correspond to the MIN and MAX operators, respectively (Jiang and Eastman, 2000; Malczewski, 2006b; Bell et al., 2007). A key issue associated with using OWA is the method for determining the order weights. In GIS-OWA applications, the weights are often defined "intuitively" based on the degree of ORness and trade-off (Eastman, 1997; Jiang and Eastman, 2000; Rinner and Malczewski, 2002). The maximum entropy method has been used in Malczewski et al. (2003) and Makropoulos and Butler (2006). This method determines the optimal values of the order weights by maximizing a measure of entropy (dispersion of the order weights), subject to a specified degree of ORness. Yager's (1996) approach for defining order weights with linguistic quantifiers has been implemented in GIS-MCA by Malczewski (2006b) (see also Boroushaki and Malczewski, 2008; Chen and Paydar, 2012).
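
The following sketch shows Yager's order-weight mechanism in its simplest form (criterion weights are omitted for brevity; the GIS-OWA implementations cited here also fold in reordered criteria weights). The values are hypothetical:

```python
import numpy as np

def owa(values, order_weights):
    """Yager's OWA: sort the standardized values from best to worst and
    combine them with the order weights (which sum to one)."""
    v_sorted = np.sort(np.asarray(values, dtype=float))[::-1]
    return float(v_sorted @ np.asarray(order_weights))

v = [0.9, 0.4, 0.7]                # standardized values at one location
print(owa(v, [1/3, 1/3, 1/3]))     # equal order weights -> averaging (WLC-type)
print(owa(v, [1.0, 0.0, 0.0]))     # OR (MAX) combination -> 0.9
print(owa(v, [0.0, 0.0, 1.0]))     # AND (MIN) combination -> 0.4
```

Equal order weights reproduce an averaging (WLC-type) combination, while the two one-hot order-weight vectors reproduce the OR (MAX) and AND (MIN) extremes described above.
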
The OWA concept has been extended to GIS applications by Eastman (1997) as part of the decision support module in IDRISI. The development of the IDRISI-OWA module has stimulated the implementation of OWA in the ArcView/ArcGIS environment (Malczewski et al., 2003; Makropoulos and Butler, 2006; Boroushaki and Malczewski, 2008; Ozturk and Batuk, 2011; Eldrandaly, 2013; Massei et al., 2014). In addition, an effort has been made to implement GIS-OWA as a web-enabled system using CommonGIS (Rinner and Malczewski, 2002; Malczewski and Rinner, 2005). The OWA procedure has also been integrated into location-based services (Rinner and Raubal, 2004) and personalized route planning (Nadi and Delavar, 2011). Boroushaki and Malczewski (2010) implemented the concept of fuzzy majority in ArcGIS as a MultiCriteria Group Analyst extension. The procedure applies a quantifier-guided OWA operator for generating the solution maps according to the individual preferences, and then the fuzzy majority approach is employed for aggregating the individual preferences (see also Meng and Malczewski, 2010).

1.15.4.1.1.2 Analytic hierarchy/network process

The methods of the analytic hierarchy/network process (AH/NP) are based on three principles: decomposition, comparative judgment, and synthesis of priorities (Saaty, 1980, 1996). The decomposition principle requires that a decision problem be decomposed into a hierarchy/network that captures the essential elements of the problem. The principle of comparative judgment requires assessments (pairwise comparisons) of the elements of the hierarchical or network structure. The pairwise comparison is the basic measurement mode employed in the AH/NP procedure (see "Pairwise comparison" section). The synthesis principle takes each of the derived ratio-scale priorities for the elements of the decision problem and constructs a composite set of priorities for the decision alternatives. One of the underlying assumptions of the AHP is that the elements of a decision problem are independent. This is a rather strong assumption, especially in the context of spatial decision problems, which typically involve a complex pattern of interactions and dependencies among elements of the decision situation. Saaty (1996) proposed a method, the analytic network process (ANP), for tackling the dependence problem. The ANP method is an extension and generalization of AHP. The AH/NP model is a form of WLC (see "WLC and related methods" section). If the hierarchical structure (see Fig. 1) consists of three levels (the goal, attributes, and alternatives), then the AH/NP method is equivalent to WLC with criteria weights defined by the pairwise comparison method. The major advantage of using AH/NP rather than WLC is that the former provides a tool for focusing the decision maker's attention on developing a formal structure that captures all the important elements of a decision situation. There have been a number of approaches for integrating GIS and AH/NP (see Malczewski, 2006a). They can be classified into two groups. The first group includes GIS-AH/NP systems having the capabilities of estimating criteria weights based on the comparative judgment principle (Eastman et al., 1993; Jun, 2000; Ozturk and Batuk, 2011).
However, these systems do not have the capability of representing the decision problem using the decomposition principle or of calculating the overall evaluation score according to the AH/NP synthesis-of-priorities model. They typically use some form of additive weighted model for calculating the overall evaluation scores. The IDRISI multicriteria evaluation module provides an example of this category of GIS-AHP (Eastman et al., 1993). The second group of GIS-AH/NP includes systems that are based on all three principles: decomposition, comparative judgment, and synthesis of priorities. CommonGIS (Rinner and Taranu, 2006), ILWIS-SMCE (Sharifi et al., 2004), and the Ecosystem Management Decision Support System/Criterium DecisionPlus (Reynolds and Hessburg, 2014) exemplify this type of GIS-AH/NP. The AH/NP methods have also been implemented in the ArcGIS/ArcView environment (e.g., Banai, 1993; Zhu et al., 2001; Boroushaki and Malczewski, 2008; Ozturk and Batuk, 2011; Eldrandaly, 2013). GIS-AH/NP has proved to be an effective approach for a wide variety of decision and management situations and in a broad range of application domains (e.g., Nekhay et al., 2009; Estoque, 2012; Ferretti and Pomarico, 2013). It has successfully been used for tackling spatial decision problems in individual and group/participatory decision-making settings. There are essentially two approaches for group decision making with AHP/ANP: the consensus approach involves debating the individual judgments and voting to identify a compromise solution, while the aggregation approach involves synthesizing each individual's judgments and combining the resulting priorities. The consensus approach is based on the premise that a group of individuals can generate a single hierarchical structure for a decision problem. In the aggregation approach, each individual generates his or her own hierarchy (or subhierarchy) of the decision problem's elements, and then the geometric mean method is used for aggregating the individual preferences (e.g., Nekhay et al., 2009). Despite the widespread use of AHP/ANP, the method has not been without criticism (Belton and Gear, 1983). The criticisms include the ambiguity in the meaning of the relative importance of one element of the decision hierarchy when it is compared to another element, the number of comparisons required for large-size problems, and the use of the 1-to-9 scale. Some decision analysts argue that the type of questions asked during the process of pairwise comparisons is meaningless. In addition, it is argued that for a large decision problem, there are too many pairwise comparisons that must be performed. Another criticism is related to the so-called "rank reversal" problem (Belton and Gear, 1983). Specifically, the AHP analysis may indicate that alternative A1 is preferred to alternative A2 when alternative A3 is not being considered; but when alternative A3 is included as an option, the analysis may indicate that alternative A2 is preferred to alternative A1.

1.15.4.1.1.3 The IP methods

The IP methods evaluate a set of decision alternatives on the basis of their separations from some ideal (reference) point. A reference point represents a hypothetical alternative. It can be any significant target or goal against which the decision alternatives are evaluated. This hypothetical alternative is often defined in terms of the positive ideal (utopia) point or the negative ideal (or anti-ideal) point (Zeleny, 1982).
A separation of the ith decision alternative from a reference point is defined by means of a distance metric, $L_p(A_i)$, where $p$ is a power parameter ranging from 1 to $\infty$. If the $p$ parameter is set at 1, then the rectangular distance (or Manhattan metric) is used for measuring the separation between the ith alternative and the reference point; for $p = 2$ the straight-line (Euclidean) distance is calculated; if $p = \infty$, then the minimum of the maximum weighted deviation is sought. The $p$ parameter is a "balancing factor" between the two extreme cases of $p = 1$ and $p = \infty$. The overall value of $L_1(A_i)$ is a solution that minimizes the total regret; that is, the weighted sum of the regrets associated with all the criteria. The deviation from the ideal can be interpreted as a measure of disutility of the ith alternative. The $L_1(A_i)$ value is a complement of the overall value (utility) of a given alternative; that is, $L_1(A_i) = 1 - V(A_i)$. For $p = 1$, all weighted deviations are assumed to compensate each other perfectly; that is, a decrease of one unit in a given criterion value can be compensated by an equivalent increase in any other criterion (Zeleny, 1982). For $p = 2$, each deviation is accounted for in direct proportion to its size. This implies a partial compensation between criteria. As $p$ approaches infinity, the alternative with the largest deviation completely dominates the distance measure, resulting in a mini-max, noncompensatory decision rule. One can design a number of decision rules using the family of separation measures, $L_p(A_i)$. The most popular GIS-based methods are the IP approach (Carver, 1991; Pereira and Duckstein, 1993) and the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) (Malczewski, 1996; Chen et al., 2001). The IP approach rates the decision alternatives under consideration according to their multidimensional distance to the IP using the distance metric, $L_p(A_i)$. Two forms of the model can be defined: the IP model, $L_p^+(A_i)$, and the negative ideal model, $L_p^-(A_i)$. The two models can result in different orderings of alternatives. To avoid this "ambiguity," the TOPSIS method combines $L_p^-(A_i)$ and $L_p^+(A_i)$ into a composite measure. It defines the best alternative as the one that is simultaneously closest to the ideal alternative and furthest away from the negative IP. The IP models have a prominent place in the area of GIS-MCA applications (Malczewski, 2006a). In particular, this type of GIS-MCA has made a significant contribution to the development of GIS-based land-use suitability analysis (Carver, 1991; Eastman et al., 1993; Pereira and Duckstein, 1993; Elaalem et al., 2011). GIS-IP methods, enhanced by a voting procedure (such as the Borda count method), have also proved to be effective approaches for tackling spatial decision problems in group, participatory, and collaborative settings (Malczewski, 1996; Chen et al., 2001; Jankowski et al., 2008). Although the IP approaches have predominantly been used as normative modeling tools, Pereira and Duckstein (1993) have demonstrated their utility as predictive models. They have also shown the capabilities of the GIS-IP approach for performing sensitivity analysis based on the $p$ parameter.
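
A minimal sketch of the separation measures and the TOPSIS closeness index under one common formulation (weighted deviations raised to the power p); the value matrix and weights are hypothetical, and benefit criteria in [0, 1] are assumed:

```python
import numpy as np

def separation(values, weights, ref, p=2):
    """Weighted L_p separation of each alternative from a reference point."""
    d = np.abs(np.asarray(values) - ref)
    if np.isinf(p):
        return (np.asarray(weights) * d).max(axis=1)   # mini-max rule
    return ((np.asarray(weights) * d) ** p).sum(axis=1) ** (1 / p)

# Hypothetical standardized value matrix (three alternatives, three criteria).
v = np.array([[0.9, 0.4, 0.7],
              [0.5, 0.8, 0.6],
              [0.2, 0.9, 1.0]])
w = np.array([0.5, 0.3, 0.2])
ideal, anti_ideal = v.max(axis=0), v.min(axis=0)

s_plus = separation(v, w, ideal)          # distance to the ideal point
s_minus = separation(v, w, anti_ideal)    # distance to the negative ideal
closeness = s_minus / (s_plus + s_minus)  # TOPSIS relative closeness
print("best by TOPSIS:", int(np.argmax(closeness)))
```
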

1.15.4.1.2 Outranking relation-based methods

The underlying assumption of outranking methods is that the decision-making agent's preference structure can be represented by outranking relations, which are defined for each pair of alternatives $A_i$ and $A_j$. The ith alternative outranks the jth alternative if there is enough evidence to declare that $A_i$ is at least as good as $A_j$ on the majority of the criteria, while there is no essential evidence to show that the statement is false with respect to the remaining criteria. The pairwise comparison procedure involves determining the extent to which criterion scores and associated weights confirm or contradict the pairwise relationships between alternatives. The procedure typically uses concordance and discordance measures (Voogd, 1983). The former are based on the concordance set; that is, the subset of all criteria for which the ith alternative is not worse than the competing alternative, j. The latter measures are based on the discordance set; that is, the subset of all criteria for which alternative i is worse than the competing alternative, j. There is a wide variety of formulas available to calculate the overall score for each alternative on the basis of the two indicators. The most popular outranking methods are ELECTRE (Roy, 1968) and PROMETHEE (Brans et al., 1984). These two outranking methods have also been the ones most often integrated into GIS (e.g., Joerin et al., 2001; Massei et al., 2014). Carver (1991), Joerin et al. (2001), Chakhar and Mousseau (2008), and Louvart et al. (2015) provide examples of integrating ELECTRE into GIS. Studies by Joerin et al. (2001) and Chakhar and Mousseau (2008) are of particular significance for the GIS-ELECTRE methods because they address the problem of defining spatial decision alternatives in the context of outranking analysis. Examples of the GIS-PROMETHEE approaches are given in Martin et al. (2003), Marinoni (2005), and Yatsalo et al. (2015). Gilliams et al. (2005) developed an SDSS called AFFOREST that integrates GIS capabilities and MCA methods, including ELECTRE and PROMETHEE. One of the conclusions of the study was that GIS-PROMETHEE performs slightly better than GIS-ELECTRE with respect to such considerations as user friendliness, simplicity of the model strategy, variation of the solution, and implementation. The two outranking methods, ELECTRE and PROMETHEE, have also been integrated into GIS to support group decision making. These methods can be used according to two schemes: either a consensus on the preference structure of the decision makers is achieved first and the group preferences are then used within the conventional outranking methods, or the individual decision makers solve the problem separately and the individual solutions are then aggregated. The former approach has often been applied by integrating GIS and ELECTRE (e.g., Joerin et al., 2001), while the latter has been more popular in applications based on integrating GIS and PROMETHEE (e.g., Martin et al., 2003). Since the outranking relations are based on a voting analogy, the ELECTRE methods can be used without reference to the challenging analysis of trade-offs between criteria required by the value function approaches. The methods are mainly concerned with the use of ordinal scale measurements and are based on the principles of noncompensatory preference structures. Although the concepts underlying the outranking methods are intuitively appealing (Gilliams et al., 2005), they are "difficult to verify empirically as models of human preferences" (Stewart, 1992, p. 583).
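
As a rough ELECTRE-I-style sketch of the concordance and discordance measures (the data and the 0.7/0.3 thresholds are hypothetical and analyst-chosen; actual GIS-ELECTRE implementations differ in detail):

```python
import numpy as np

# Hypothetical standardized values in [0, 1] for three alternatives.
v = np.array([[0.9, 0.4, 0.7],
              [0.5, 0.8, 0.6],
              [0.2, 0.9, 1.0]])
w = np.array([0.5, 0.3, 0.2])  # criterion weights summing to one

def concordance(i, j):
    """Total weight of the criteria on which A_i is at least as good as A_j
    (weights sum to one, so no further normalization is needed)."""
    return w[v[i] >= v[j]].sum()

def discordance(i, j):
    """Largest amount by which A_i is worse than A_j (values are in [0, 1])."""
    return np.clip(v[j] - v[i], 0.0, None).max()

i, j = 0, 1
print("concordance:", concordance(i, j), "discordance:", discordance(i, j))
print("A0 outranks A1:",
      concordance(i, j) >= 0.7 and discordance(i, j) <= 0.3)
```
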
Arguably, the major limitation of the GIS-based outranking methods is the large number of pairwise comparisons of alternatives required with respect to each evaluation criterion. There have been several attempts to overcome this limitation of the outranking methods (e.g., Joerin et al., 2001; Marinoni, 2005; Chakhar and Mousseau, 2008). The proposed approaches are based on the concept of spatial aggregation of the basic geographic units of criterion maps to reduce the number of decision alternatives.

1.15.4.2 Spatially Explicit MCA

One can use four tests proposed by Goodchild and Janelle (2004) for making a distinction between the conventional SMCA models and spatially explicit MCA methods. First, the invariance test considers an MCA model spatially explicit if its decision outcomes (rankings or orderings of decision alternatives) vary under relocation of the feasible alternatives. Second, the representation test requires that decision alternatives are spatially defined. Third, the formulation test declares an MCA model spatially explicit if it contains spatial concepts such as location, distance, contiguity, connectivity, adjacency, or direction. Fourth, according to the outcome test, the spatial form of the outputs generated by a spatially explicit MCA model is different from the spatial form of its inputs. Several GIS-MCA approaches that conform to the four tests have been suggested (Tkach and Simonovic, 1997; van Herwijnen and Rietveld, 1999; Feick and Hall, 2004; Makropoulos and Butler, 2006; Chakhar and Mousseau, 2008; Malczewski, 2011; Ligmann-Zielinska and Jankowski, 2012). These approaches are based on the assertion that spatial decision problems require distinct modeling frameworks. Also, it is argued that spatial decision problems cannot be effectively tackled with the same methods as nonspatial problems. Consequently, the spatially explicit MCA methods go beyond the mere adaptation of the conventional methods. They explicitly incorporate the properties of spatial data into the MCA procedures, and/or the components of MCA are made spatially explicit.

1.15.4.2.1 Spatial WLC

A crucial aspect of the global WLC model (see "WLC and related methods" section) is that the criterion weight is intricately associated with the value function (Malczewski, 2011; Carter and Rinner, 2014). The interrelated concepts of the criterion range (value function) and criterion weight provide the foundation for developing the local form of WLC. The relationship is captured by the range-sensitive principle (see "Local value scaling" section). Given the definitions of the local weight (see "Spatially explicit criteria weighting" section) and the local value function (see "Local value scaling" section), the local form of WLC can be defined as follows:

$$V(A_i^q) = \sum_{k=1}^{n} w_k^q v(a_{ik}^q), \tag{9}$$

where $V(A_i^q)$ is the overall value of the ith alternative estimated locally (in the qth neighborhood), $v(a_{ik}^q)$ is the value of the kth criterion measured by means of the local value function in the qth neighborhood (see Eqs. 5 and 6), and $w_k^q$ is the local criterion weight (see Eq. 7). The decision alternative with the highest value of $V(A_i^q)$ is the most preferred alternative in the qth neighborhood. An alternative approach for developing the spatially explicit WLC model is to use the concept of proximity-adjusted preferences (see "Spatially explicit criteria weighting" section). Specifically, the proximity-adjusted WLC model is based on the idea of adjusting preferences according to the spatial relationship between alternatives, or between an alternative and some reference locations (Ligmann-Zielinska and Jankowski, 2012). The model has the form of the conventional WLC (Eq. 8) with the proximity-adjusted criteria weights. Ligmann-Zielinska and Jankowski (2012) have analyzed a decision situation to demonstrate and evaluate the proximity-adjusted WLC model. They found that the results of proximity-adjusted WLC were considerably different from those obtained using the conventional models. The study suggests that the proximity-adjusted preferences (weights) have a nonlinear effect on the ranking of decision alternatives. The spatially explicit MCA approach has been extended to the OWA model (see "The ordered weighted averaging" section). Makropoulos and Butler (2006) have developed a spatially explicit GIS-OWA model, called spatial OWA (SOWA), by making one of the OWA parameters spatially variable. SOWA operationalizes spatially heterogeneous preferences using the concept of spatially variable order weights. This implies that different types of OWA can be applied to different locations depending on their characteristics (Makropoulos and Butler, 2006). Malczewski and Liu (2014) have proposed a local version of OWA. Like local WLC, the local form of OWA is based on the range-sensitive principle (see "Spatially explicit criteria weighting" section). The spatial WLC models open up new opportunities for SMCA. The results of conventional approaches are unmappable, with the exception of the overall scores. The results of local WLC modeling can be mapped and further examined with GIS. The local form of WLC offers a tool for place-based analysis. The elements of the local model provide detailed information about spatial patterns of decision alternatives. Specifically, an analysis of the spatial distribution of local criteria weights and local value functions can provide new insights into the nature of spatial patterns. The spatially distributed parameters of local WLC can be used for examining the local preferences with respect to the relative importance of evaluation criteria as well as the local trade-offs between criteria. The outcomes of local WLC are potentially useful in targeting priority areas and informing local planning and policy development.
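
A compact sketch of Eq. (9) for a single neighborhood q, assuming benefit criteria, a min-max local value function (in the spirit of Eqs. 5 and 6), and the Eq. (7) local weights; all data are hypothetical:

```python
import numpy as np

def local_wlc(values_q, w_global, r_global):
    """Eq. (9) for one neighborhood q: rescale values over the local range,
    derive local weights from Eq. (7), and combine them."""
    v = np.asarray(values_q, dtype=float)
    lo, hi = v.min(axis=0), v.max(axis=0)
    r_local = hi - lo
    v_local = (v - lo) / np.where(r_local > 0, r_local, 1.0)  # local value function
    b = np.asarray(w_global) * r_local / np.asarray(r_global)
    w_local = b / b.sum()                                     # Eq. (7) weights
    return v_local @ w_local

# Hypothetical raw criterion values for the alternatives in neighborhood q.
vq = np.array([[40.0, 3.0], [80.0, 4.0], [60.0, 8.0]])
print(local_wlc(vq, w_global=[0.6, 0.4], r_global=[100.0, 10.0]).round(3))
```
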

1.15.4.2.2 Spatial IP methods

Tkach and Simonovic (1997) advanced the conventional GIS-IP method (see "The IP methods" section) by considering spatially explicit decision alternatives. Unlike the conventional GIS-IP model, which determines a single overall score (separation from the IP) for each alternative (location), the spatially explicit model identifies a distance metric for each location and for each alternative. It determines the best decision alternative for each location. Using the spatially explicit GIS-IP model, Tkach and Simonovic (1997) demonstrated that the spatial policy (strategic decision) alternative identified by the conventional GIS-IP method may not necessarily be the best for all locations in a study region. They have shown that the best strategies vary from one location to another. The study also demonstrated that the choice of the best alternative is sensitive to the criteria weights. Tkach and Simonovic (1997) used a set of global criteria weights; that is, a single weight was assigned to a particular criterion. Koo and O'Connell (2006) extended the conventional GIS-IP approach by making the criteria weights spatially variable. They used the spatial or site-specific weights as a component of a spatially explicit IP model for evaluating land-use scenarios. While Koo and O'Connell (2006) focused on spatially variable criteria weights, Şalap-Ayça and Jankowski (2016) advanced the spatially explicit IP method by developing the local form of GIS-IP based on the range-sensitivity principle (see "Spatially explicit criteria weighting" section).

1.15.4.2.3 Geosimulation and MCA

Geosimulation methods, such as cellular automata (CA) and multiagent systems (MAS), have recently emerged as a platform for integrating MCA into group (social or collective) decision making (Malczewski and Rinner, 2015). Unlike the conventional GIS-MCA methods for group decision making, the simulation-based GIS-MCA approaches are spatially explicit, in that the outcome of the decision process depends on the spatial arrangement of decision alternatives (e.g., alternative patterns of land use). The principal purpose of using MCA in geosimulation modeling is to define the rules of behavior for decision-making agents. The MCA methods (or multicriteria decision rules) are used for describing and understanding the decision-making process and its consequences through a simulation model. The central issue in integrating MCA into geosimulation methods is the procedure for estimating the criteria weights. The pairwise comparison procedure ("Pairwise comparison" section) has been the most often used approach for obtaining the weights (e.g., Wu, 1998; Li and Liu, 2007; Lai et al., 2013). A number of studies have demonstrated the usefulness of integrating MCA into geosimulation procedures (e.g., Ligtenberg et al., 2001; Ligmann-Zielinska, 2009; Sabri et al., 2012). Ligmann-Zielinska (2009) provides an example of using the IP method (see "The IP methods" section) within a multiagent modeling approach for simulating land-use patterns. The AH/NP methods (see "Analytic hierarchy/network process" section) have been integrated into MAS (see Sabri et al., 2012; Arsanjani et al., 2013). The synergistic effects of integrating MCA into spatial simulation methods can be enhanced by combining CA and multiagent modeling (e.g., Ligtenberg et al., 2001; Li and Liu, 2007; Ligmann-Zielinska, 2009; Sabri et al., 2012). Ligtenberg et al. (2001) provide an example of an approach integrating CA, MAS, and GIS-MCA methods for group (collective) decision making.

208

Multicriteria Analysis

multiagent technology. An integrated GIS-MCA and MAS approach can be used for exploring complex large-scale (global) spatial structures that emerge from local decision-making processes. However, global patterns are unlikely to result from local decision-making processes alone (Ligtenberg et al., 2004). The bottom-up approach to spatial modeling limits the capability of multiagent simulation methods as a tool for analyzing complex spatial decision problems. This drawback can be addressed by integrating the large-scale (bottom-up) geosimulation methods and the top-down multiobjective optimization procedures (see "Geosimulation and multiobjective optimization" section).

1.15.4.3 Spatial Multiobjective Optimization

Spatial multiobjective (multicriteria) optimization methods have been specifically designed for modeling spatial systems and solving spatial problems. Common to all optimization models, including spatial multiobjective optimization, is a quantity (or quantities) to be minimized or maximized. The quantity is termed the objective or criterion function. In addition, optimization problems typically have a set of constraints imposed on the decision variables. The constraints define the set of feasible solutions. A solution to an optimization problem determines the values of the decision variables subject to a set of constraints. In the most general terms, a multiobjective optimization model can be written as follows:

$$\text{minimize or maximize } F(\mathbf{x}) = \{f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_n(\mathbf{x})\}, \tag{10}$$

$$\text{subject to: } \mathbf{x} \in X, \tag{11}$$

where $F(\mathbf{x})$ is the n-dimensional objective function; $f_k(\mathbf{x})$ is an objective (criterion) function ($k = 1, 2, \ldots, n$); $X$ is the set of feasible alternatives; and $\mathbf{x} = (x_1, x_2, \ldots, x_m)$ is a vector of decision variables, with $x_i \ge 0$ for $i = 1, 2, \ldots, m$. In spatial multiobjective optimization problems, there is at least one set of spatially explicit decision variables, which define the decision/management alternatives. The variables have a geographic meaning such as location, distance, direction, connectivity, contiguity, shape, and compactness of an area. It is important to make a distinction between spatial multiobjective optimization problems (or models) and the methods (algorithms) for solving those problems. The problems can be classified according to the type of decision variables into two main categories: discrete and continuous (Cohon, 1978; Zarghami and Szidarovszky, 2011). A discrete variable is limited to a fixed or countable set of values, while a continuous variable can take on any value in a specified interval. An optimization model is an integer model if any one of its decision variables is discrete. If all variables are discrete, the model is a pure integer one; otherwise, it is a mixed-integer model. Combinatorial optimization is a type of discrete modeling. This type of multiobjective optimization problem is the most often used approach for modeling spatial systems (Malczewski, 2006a). Many spatial problems, such as location-allocation, traveling salesman, vehicle routing, and scheduling problems, fall into this category of spatial optimization. If the values of all decision variables are continuous, the problem is called a continuous optimization problem. The transportation problem is the best-known example of this type of spatial optimization. Many spatial multiobjective models involve both discrete and continuous decision variables. Such optimization problems are referred to as the mixed type. The plant location problem is typically modeled as a mixed optimization problem. The solution(s) to spatial multiobjective problems can be generated using a wide range of methods (Duh and Brown, 2005; Malczewski, 2006a). The methods can be classified into two categories: exact (deterministic) methods and approximate (stochastic) methods. The former are based on the theories of mathematical programming (Cohon, 1978). For example, the simplex method in linear programming is an exact approach. However, the exact methods are inefficient in solving complex (and computationally intensive) spatial multiobjective optimization problems. To overcome the limitations of exact methods, a number of heuristics (and metaheuristics) have been proposed (Xiao et al., 2002; Duh and Brown, 2005; Zhang et al., 2015).

1.15.4.3.1 Conventional multiobjective optimization methods

The conventional optimization approaches in GIS-MCA can be classified into two groups: methods for generating nondominated solutions (the weighting and constraint methods), and methods for identifying compromise solutions, including the distance-based methods and interactive methods.

1.15.4.3.1.1 Methods for generating nondominated solutions

A vector of decision variables $\mathbf{x}^*$ is said to be nondominated or Pareto optimal if there exists no other feasible vector $\mathbf{x}$ such that $f_k(\mathbf{x}) \ge f_k(\mathbf{x}^*)$ for all $k = 1, 2, \ldots, n$ and $f_k(\mathbf{x}) > f_k(\mathbf{x}^*)$ for at least one $k$. This implies that $\mathbf{x}^*$ is nondominated if there is no feasible vector that would improve some objective without causing a simultaneous deterioration of at least one other objective. Several techniques for generating nondominated solutions are available (Cohon, 1978; Zarghami and Szidarovszky, 2011). A common feature of these techniques is that the multiobjective problem is first transformed into a scalar problem and then solved as a single-objective optimization problem. The basic difference among the methods lies in how they make the transformation from a multi- to a single-objective model (Cohon, 1978). The weighting and constraint methods are the most often used procedures for tackling spatial multiobjective problems (Church et al., 1992; Maliszewski et al., 2012). Church et al. (1992) integrated the weighting method into a raster-based GIS for generating and exploring spatial decision (location) alternatives. Farhan and Murray (2008) integrated the weighting method with GIS to analyze the trade-off between conflicting objectives. Maliszewski and Horner (2010) used standard mathematical programming software and ArcGIS to solve a spatial
multiobjective problem. Herzig (2008) developed LUMASS (Land Use Management Support System), which integrates ArcMap and an open-source mixed-integer linear programming system for tackling land-use allocation problems. The system offers the weighting and constraint techniques for generating the set of efficient solutions. One important advantage of the weighting and constraint procedures is that they reduce the multiobjective optimization problem to a scalar-valued function. This means that the vast body of algorithms, software, and experience that exists for single-objective optimization models can be directly applied to multiobjective problems. This is of major importance considering the extent to which single-objective optimization has influenced the development of spatial analysis methods (Killen, 1983).

1.15.4.3.1.2 Methods for identifying compromise solutions

In the absence of any preference regarding the objectives, all nondominated solutions are assumed to be equivalent or indifferent. However, multiobjective decision problems often require that a single nondominated alternative (or a subset of nondominated alternatives) be selected from the set of Pareto optimal solutions. The most often used optimization approaches in GIS-MCA for identifying compromise solutions are the distance-based methods, such as compromise programming, goal programming, and reference point methods (Malczewski and Rinner, 2015). The distance metric-based multiobjective methods aim at minimizing a function of the distance (separation) between the desired (usually unachievable) and achieved solutions (Zarghami and Szidarovszky, 2011). The basic approach of this type of multiobjective method is to establish a specific (desired) value for each of the objectives, and then seek a solution that minimizes the (weighted) sum of separations between the achieved (nondominated) alternatives and the specified values of the objectives. The desired solution (target values) can be defined as a set of goals (in goal programming), an IP (in compromise programming), or some reference point (in the reference point method). November et al. (1996), Alçada-Almeida et al. (2009), and Coutinho-Rodrigues et al. (2012) provide examples of integrating GIS and goal programming procedures. There have been several applications of GIS-based compromise programming methods for solving spatial multiobjective optimization problems (e.g., Church et al., 1992; Huang et al., 2008; Li and Leung, 2011). Antoine et al. (1997), Agrell et al. (2004), and Zeng et al. (2007) have integrated reference point-based systems into GIS for solving spatial multiobjective problems. The distance metric-based methods have also been used as elements of interactive multiobjective optimization methods. Examples of integrating GIS and interactive goal programming approaches are given in Coutinho-Rodrigues et al. (1997) and Alçada-Almeida et al. (2009). The major advantage of the distance-based methods, such as the goal programming approaches, is their computational efficiency. While dealing with multiobjective optimization problems, goal programming allows the analyst to stay within an efficient linear programming computational environment. There are, however, some conceptual and technical difficulties with using goal programming methods for tackling spatial multiobjective optimization problems. The standard goal programming methods require the user to specify fairly detailed a priori information about his or her aspiration levels and the importance of goals.
Another weakness of weighted goal programming is its poor control over the interactive process in the case of discrete optimization problems. An advantage of the compromise programming approach is that the set of preferred compromise solutions can be ordered between the extreme criterion outcomes, and consequently an implicit trade-off between criteria can be performed. A disadvantage of this approach is that the selection of the "best" alternative within the set of compromise alternatives must be made on the basis of further insight into the compromise set of nondominated alternatives. This can be achieved by using the compromise programming approach as a component of an interactive procedure.
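
To illustrate the weighting method over a finite set of alternatives, the sketch below scalarizes two maximized objectives with several weight vectors; note that this recovers only the supported nondominated solutions (those on the convex hull of the outcome set). All data are hypothetical:

```python
import numpy as np

def weighting_method(objective_values, weight_vectors):
    """Weighting method over a finite feasible set: each weight vector turns
    the multiobjective problem into a scalar one; collecting the optima
    traces out (a subset of) the nondominated solutions."""
    f = np.asarray(objective_values, dtype=float)
    return {int(np.argmax(f @ np.asarray(lam))) for lam in weight_vectors}

# Hypothetical outcomes (f1, f2) of five discrete alternatives, both maximized.
f = np.array([[1.0, 9.0], [4.0, 7.0], [6.0, 6.0], [8.0, 2.0], [3.0, 3.0]])
lams = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1)]
print("solutions found:", sorted(weighting_method(f, lams)))
# Alternative 1 at (4, 7) is nondominated but unsupported (it lies below the
# convex hull of outcomes), so no weight vector can select it.
```
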

1.15.4.4 Heuristics and Metaheuristics

The complexity of many spatial optimization problems makes it difficult, or even impossible, to search every candidate solution using the conventional multiobjective optimization methods (see "Conventional multiobjective optimization methods" section). Such complex problems are often tackled by heuristic algorithms. This class of methods is based on a trial-and-error approach. The algorithms aim at finding a good solution in an acceptable timescale by iteratively trying to improve a candidate solution with regard to a given measure of quality. However, the methods do not guarantee that an optimal solution is ever found. Heuristic algorithms include a group of methods referred to as metaheuristics (Talbi, 2009). Metaheuristics are relatively sophisticated heuristic methods. They are sometimes referred to as advanced or modern heuristics, as opposed to the traditional, basic heuristic methods. Accordingly, the methods can be classified into basic heuristics and metaheuristics.

1.15.4.4.1 Basic heuristics

GIS-based applications of basic heuristics tend to be problem specific. The most popular area of application is land-use suitability analysis in the raster GIS environment. One can distinguish two categories of basic heuristics for land-use suitability analysis: site suitability heuristics and site location heuristics (Malczewski and Rinner, 2015). The heuristics for site suitability analysis involve allocating competing land uses to parcels of land based on the land-use suitability scores (e.g., Eastman et al., 1995). This type of heuristic does not explicitly consider the spatial properties of areas (regions or patches) of land uses. Eastman et al. (1995) proposed a heuristic method for land-use suitability analysis, which has been implemented in the IDRISI Multi-Objective Land Allocation (MOLA) module. The basic principle underlying MOLA is a reclassification of ranked suitability maps with subsequent conflict resolution between competing land uses allocated to a parcel of land (raster cell). Brookes (1997), Cromley and Hanink (1999), and Church et al. (2003) provide critical discussions of MOLA.
One of the main drawbacks of MOLA is that the procedure may not generate the best spatial pattern of land uses according to spatial criteria, such as contiguity and compactness. This shortcoming stimulated research on developing site location heuristics. The site location heuristics are search procedures concerned with both land suitability and spatial objectives. In this case, an optimal patch or region contains cells having the highest suitability scores, and it is also characterized by some desirable spatial properties such as shape, orientation, compactness, and contiguity (see Brookes, 1997; Church et al., 2003). Brookes (1997) proposed the parameterized region-growing (PRG) heuristic for locating sites with particular spatial characteristics. An alternative to the PRG method is the patch growing process (Church et al., 2003). The procedure starts with a seed raster cell and then sequentially adds contiguous raster cells until a prespecified area of the patch is achieved. Vanegas et al. (2011) proposed a modified version of the PRG procedure, called the heuristic for multiple criteria site location. In addition to the land suitability raster-based heuristics, a few other methods, such as greedy algorithms (Davis et al., 2006; Fischer and Church, 2005), HERO heuristic optimization (Kangas et al., 2008), and Lagrangian relaxation heuristics (Zografos and Androutsopoulos, 2008), have been employed for solving spatial multiobjective optimization problems within the GIS environment. In contrast to the site suitability/location heuristics, these methods are more applicable to vector-data-based spatial problems.
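
A greedy sketch loosely inspired by the patch-growing idea (not the exact algorithm of Church et al., 2003): starting from a seed cell, the patch repeatedly annexes the most suitable adjacent cell, so contiguity holds by construction. The suitability surface is synthetic:

```python
import numpy as np

def grow_patch(suitability, seed, target_cells):
    """Grow a contiguous patch from a seed cell by repeatedly adding the
    most suitable raster cell adjacent to the current patch."""
    rows, cols = suitability.shape
    patch = {seed}
    while len(patch) < target_cells:
        frontier = set()
        for (r, c) in patch:
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                n = (r + dr, c + dc)
                if 0 <= n[0] < rows and 0 <= n[1] < cols and n not in patch:
                    frontier.add(n)
        if not frontier:          # patch has filled the raster
            break
        patch.add(max(frontier, key=lambda cell: suitability[cell]))
    return patch

rng = np.random.default_rng(0)    # hypothetical suitability surface
surface = rng.random((20, 20))
print(sorted(grow_patch(surface, seed=(10, 10), target_cells=6)))
```
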

1.15.4.4.2 Metaheuristics

There is a wide range of metaheuristics available for tackling multiobjective optimization problems (Talbi, 2009). Some of the metaheuristics have been integrated with GIS. Evolutionary algorithms are the most popular GIS-based implementations. They include metaheuristics such as genetic algorithms, evolution strategies, and evolutionary programming methods. Other metaheuristic procedures that are considered to be a part of the family of evolutionary algorithms are simulated annealing and tabu search (TS) (see Xiao et al., 2007; Talbi, 2009). Another group of metaheuristics for solving spatial multiobjective problems includes the swarm intelligence methods. The ant colony and particle swarm optimization (PSO) procedures are the two best-known swarm intelligence metaheuristics (Li et al., 2009, 2011b).

1.15.4.4.2.1 Evolutionary algorithms

Evolutionary algorithms are metaheuristic methods inspired by the biological principles of natural selection and survival of the fittest. They operate on a population of individuals (potential solutions) instead of single solutions. In this way, the metaheuristic search is performed in a parallel manner. By applying the principle of survival of the fittest, the algorithms produce a set of improved solutions at each generation (iteration). The procedure is terminated when some condition is satisfied.

1.15.4.4.2.1.1 Genetic algorithms

The genetic algorithms are by far the most popular methods for solving spatial multiobjective optimization problems using GIS (e.g., Brookes, 1997; Bennett et al., 1999; Stewart et al., 2004; Aerts et al., 2005; Xiao et al., 2007; Cao et al., 2011). The basic feature of genetic algorithms is a multidirectional and global search, maintaining a population of potential solutions from generation to generation. The population-based approach is especially useful for exploring the set of Pareto solutions. Many variants of the multiobjective genetic algorithm have been suggested, with different schemes for chromosome representation, fitness function, selection, crossover, and mutation. One of the most popular genetic procedures is the Non-dominated Sorting Genetic Algorithm II (NSGA-II) (Deb, 2001). It is also the most often used procedure in GIS-based applications of genetic algorithms (e.g., Cao et al., 2011; Fotakis et al., 2012). Cao et al. (2011) have proposed a spatial NSGA-II for multiobjective optimization of land use. This method modifies the conventional NSGA-II by introducing spatial components into the crossover and mutation operators. One of the most remarkable features of GIS-based approaches to multiobjective analysis using genetic algorithms has been the wide range of application domains. The NSGA-II procedure provides a good example of the diversity of applications (see Cao et al., 2011). One disadvantage of genetic algorithms (including NSGA-II) is that the abstract genetic algorithm framework may be difficult to implement efficiently and effectively in the context of spatial multiobjective problems (O'Sullivan and Unwin, 2010).

1.15.4.4.2.1.2 Simulated annealing

The simulated annealing (SA) approach mimics the process of arranging atoms when a material is heated and then slowly cooled (Kirkpatrick et al., 1983). The SA algorithm reproduces the annealing process to solve an optimization problem (Talbi, 2009). One can distinguish two approaches for tackling GIS-based multiobjective optimization problems with SA. First, the different evaluation criteria (objectives) are combined into a single-objective cost function, and the problem is then solved using the single-objective SA procedure (e.g., Aerts and Heuvelink, 2002; Sharma and Lees, 2004; Santé-Riveira et al., 2008). Second, the conventional SA algorithm is modified for multiobjective problems to search for Pareto-optimal solutions (e.g., Duh and Brown, 2007). Duh and Brown (2007) have incorporated auxiliary knowledge about the structure of spatial patterns to improve the performance of Pareto simulated annealing in solving multiobjective spatial allocation problems. An advantage of using SA for tackling multiobjective optimization problems is that the algorithm is easy to implement. It is also a robust, flexible, and versatile tool for solving different types of complex spatial problems (Duh and Brown, 2007).
Sharma and Lees (2004) compare SA and MOLA (see "Basic heuristics" section) for a land-use allocation problem. Overall, SA provides a superior solution. In addition, the quality of the final land allocation can be assessed easily by comparing the cost functions of the initial and final land-use allocations.
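
A bare-bones sketch of the first (single-objective) SA strategy described above: improvements are always accepted, deteriorations are accepted with probability exp(-delta/T), and the temperature T is cooled geometrically. The cost function here is a toy stand-in for a combined land-allocation objective:

```python
import math
import random

def anneal(initial, cost, neighbor, t0=1.0, cooling=0.995, steps=5000):
    """Single-objective simulated annealing with a geometric cooling schedule."""
    random.seed(0)                       # reproducible toy run
    current = best = initial
    t = t0
    for _ in range(steps):
        candidate = neighbor(current)
        delta = cost(candidate) - cost(current)
        if delta <= 0 or random.random() < math.exp(-delta / t):
            current = candidate          # accept improvement, or worse w.p. exp(-delta/T)
            if cost(current) < cost(best):
                best = current
        t *= cooling                     # cool the temperature
    return best

# Toy stand-in for a scalarized cost: minimize |x - 42| over the integers.
print(anneal(0, cost=lambda x: abs(x - 42),
             neighbor=lambda x: x + random.choice((-1, 1))))
```
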


1.15.4.4.2.1.3 Tabu search

The development of the TS method was motivated by the mechanics of human memory (Glover, 1989). Several studies have demonstrated that the GIS-based TS method generates quality solutions to spatial multiobjective optimization problems (e.g., Bettinger et al., 1997; Indriasari et al., 2010). Lourenço et al. (2001) and Bong and Wang (2004) developed hybrid metaheuristics that combine two metaheuristic methods, including TS. The results of their applications provided evidence of the high efficiency and effectiveness of hybrid metaheuristics in solving spatial multiobjective problems. Indriasari et al. (2010) compared the performance of three metaheuristics: SA (see "Simulated annealing" section), GA (see "Genetic algorithms" section), and TS. Although SA generated a solution that was significantly better than the existing location pattern, the SA algorithm was inferior to TS in terms of solution quality and computation time. The GA and SA procedures were comparable in computation time, but the former was better in solution quality.

1.15.4.4.2.2 Swarm intelligence

Swarm intelligence optimization methods are inspired by social behaviors in flocks of birds, schools of fish, herds of buffalo, colonies of ants, and so forth. A colony or swarm is a self-organized multiagent system in which the individuals (agents) cooperate to accomplish complex tasks. This cooperation is distributed among the entire population, without any centralized control. The global pattern of agents emerges from local interactions between agents, which occur through direct (agent-to-agent) or indirect (via the environment) communication. Each agent follows a set of rules influenced by locally available information. Ant colony and PSO methods are the most successful swarm intelligence-inspired optimization algorithms (Talbi, 2009).

1.15.4.4.2.2.1 Ant colony optimization

Ant colony optimization (ACO) is based on an imitation of the behavior of ants in their search for food (Dorigo et al., 1991). Although it seems that each ant in a colony has its own agenda, the colony (population) behaves as a self-organizing system. The main idea behind the ACO metaheuristics is to use repeated and often recurrent simulations of a set of software agents (mobile agents inspired by real ant behavior) to generate solutions to a given problem. At each iteration, the agents collect relevant information, which is used in subsequent iterations to direct their search for the best solution. Li and associates (2009) modified the conventional ACO algorithm to make it capable of addressing spatial multiobjective optimization problems in a GIS raster environment (see also Li et al., 2011a,b). They used the concept of utility for aggregating objective functions and introduced a direction function as a tool for exploring the search space to increase the efficiency of the ACO algorithm (Li et al., 2009). The conventional ACO method has also been advanced by adopting the strategies of neighborhood pheromone diffusion, tabu table adjusting, and multiscale optimization (Li et al., 2009). These advances of GIS-based ACO have been incorporated into an integrated system called the geographical simulation and optimization system, or GeoSOS (Li et al., 2011a). The system has been successfully applied for tackling a variety of spatial optimization problems (Li et al., 2011a,c).
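
A deliberately bare-bones ACO loop (not the algorithm of Li et al.) showing the pheromone evaporate-and-deposit cycle that drives the colony toward high-quality options; all parameters and data are hypothetical:

```python
import numpy as np

def aco_select(n_options, quality, ants=10, iters=50, rho=0.1, seed=0):
    """Ants pick options with probability proportional to pheromone; the
    pheromone evaporates at rate rho, and each ant deposits pheromone in
    proportion to the quality of its chosen option."""
    rng = np.random.default_rng(seed)
    tau = np.ones(n_options)                 # initial pheromone trail
    for _ in range(iters):
        choices = rng.choice(n_options, size=ants, p=tau / tau.sum())
        tau *= (1 - rho)                     # evaporation
        for c in choices:
            tau[c] += quality[c]             # quality-proportional deposit
    return int(tau.argmax())

# Hypothetical qualities of four candidate sites; the colony converges
# on the highest-quality option.
quality = np.array([0.2, 0.5, 0.9, 0.4])
print("selected option:", aco_select(4, quality))
```
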
1.15.4.4.2.2.2 Particle swarm optimization
PSO is inspired by the social behavior in flocks of birds, schools of fish, and swarms of insects such as termites, bees, and wasps (Kennedy and Eberhart, 1995). In this context, an individual (e.g., a bird or fish) is referred to as a particle (or an agent). Each individual in a swarm behaves according to a combination of its own intelligence and the collective intelligence of the population. In PSO, individual particles of a swarm represent potential solutions (e.g., spatial patterns of land uses). The position of each particle is adjusted according to its velocity (i.e., rate of change) and the differences between its current position, the best position it has found so far, and the best position found by its neighbors. As the model is iterated, the swarm focuses more and more on an area of the search space containing high-quality solutions. Ma et al. (2011) and Masoomi et al. (2013) provide notable applications of GIS-based PSO to land-use allocation problems.

The integration of swarm intelligence optimization methods into GIS offers enhanced spatial analytical capabilities built on the concept of a simple self-organized system of agents cooperating to solve a problem. Ma et al. (2011) indicate that an attractive feature of PSO is its simplicity and flexibility: unlike the genetic algorithm, which involves complex coding, a swarm intelligence optimization algorithm can perform all of its operations using a few parameters. Zeng et al. (2007) compare the performance of ACO with other heuristics such as simulated annealing and the genetic algorithm, and conclude that the three methods produce similar final outputs. However, one can argue that the performance of conventional swarm intelligence algorithms can be significantly improved when they are modified to take into account specific features of spatial multiobjective decision problems (e.g., Ma et al., 2011; Liu et al., 2012a,b). Indeed, the results of comparative studies by Liu and associates (2012a,b) show that a modified ACO is considerably more efficient than the genetic algorithm ("Genetic algorithms" section), simulated annealing ("Simulated annealing" section), and the MOLA method ("Conventional multiobjective optimization methods" section) in solving a zoning problem. Other computational experiments and comparisons of different algorithms reveal that an ACO procedure designed for solving a site selection problem shows stable performance under different parameter settings and outperforms the conventional genetic algorithm (Liu et al., 2006).
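The velocity update described verbally above is captured by the canonical global-best PSO equations (Kennedy and Eberhart, 1995). The sketch below uses a continuous encoding in which each dimension might, for instance, represent the development intensity of a zone; this encoding and all parameter values are illustrative assumptions, not the procedures of Ma et al. (2011) or Masoomi et al. (2013).

import random

def pso(dim, fitness, n_particles=30, n_iter=200,
        w=0.7, c1=1.5, c2=1.5, lo=0.0, hi=1.0):
    # Each particle's velocity is pulled toward its own best position (pbest)
    # and the swarm's best position (gbest); fitness is maximized.
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Inertia plus cognitive (pbest) and social (gbest) pulls.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit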

1.15.4.4.3 Geosimulation and multiobjective optimization

Over the last decade or so, there has been considerable interest in integrating multiobjective optimization methods with geosimulation (e.g., Ward et al., 2003; Trunfio, 2006; Castella et al., 2007; Bone and Dragicevic, 2009; Ligmann-Zielinska and Jankowski, 2010; Chen et al., 2010a; Bone et al., 2011; Li et al., 2011c; Fotakis and Sidiropoulos, 2012; Feng and Liu, 2013). The motivation behind the integrated approach is that one can achieve a synergistic effect by combining the two modeling frameworks (Ligmann-Zielinska and Jankowski, 2010; Bone et al., 2011). This has been demonstrated by several studies combining CA and multiobjective optimization methods (e.g., Ward et al., 2003; Fotakis and Sidiropoulos, 2012), and MAS and multiobjective optimization models (e.g., Castella et al., 2007; Bone and Dragicevic, 2009). For example, the conventional multiobjective optimization (mathematical programming) methods have been integrated with CA (Ward et al., 2003) and agent-based modeling (Castella et al., 2007; Chen et al., 2010a; Ligmann-Zielinska and Jankowski, 2010). In the geosimulation-based multiobjective approaches, the spatial optimization problems are often tackled by metaheuristics (see "Metaheuristics" section). Bone and Dragicevic (2009) developed a model in which agents representing individual stakeholders have their actions evaluated by algorithms based on reinforcement learning (see also Bone et al., 2011). Li and associates (2011a) integrated CA and ACO procedures (see "Ant colony optimization" section) to solve complex path optimization problems. The CA approach has been coupled with a simulated annealing procedure (see "Simulated annealing" section) for modeling urban land-use changes (Feng and Liu, 2013). Fotakis and Sidiropoulos (2012) proposed a CA-based spatial optimization model using NSGA-II (see "Genetic algorithms" section). Chen et al. (2010a,b) demonstrated the usefulness of MAS for tackling a land allocation optimization problem. The results of their computational experiments show that the simulation-based optimization procedure generates solutions (land allocation patterns) similar to those obtained with exact mathematical programming methods.

The geosimulation-based multiobjective optimization methods provide a significant contribution to applied GIS-MCA (Ligmann-Zielinska and Jankowski, 2010). While the multiobjective optimization procedure generates a set of nondominated solutions and allows for analyzing the trade-offs between conflicting objectives, geosimulation provides an effective tool for exploring a variety of decision-making scenarios and facilitating the process of identifying a compromise solution. The two modeling paradigms complement each other, and this complementarity is the primary source of synergy between the two methods. The synergistic effects manifest themselves in mutually reinforced conclusions that can be derived from geosimulation and multiobjective optimization analysis. The normative results (recommendations) of multiobjective optimization can be strengthened by a complementary multiagent, process-oriented modeling of the decision-making process.

As with any heuristic method, the geosimulation-based multiobjective optimization approach is not without its problems. Although there is some evidence to show that the methods generate good approximations of the exact solutions to complex spatial problems (e.g., Chen et al., 2010a,b), the approach does not guarantee more accurate decision making. The geosimulation technologies can also be criticized for their "black box" style of spatial analysis (O'Sullivan and Unwin, 2010).
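The generic coupling pattern underlying these studies places a bottom-up simulation inside the evaluation step of a top-down optimizer. The sketch below illustrates the idea with a deliberately simple random-search optimizer maintaining a crude Pareto archive; propose(), simulate(), and the objective functions are user-supplied placeholders, and the sketch does not reproduce any of the cited systems.

def dominates(a, b):
    # a dominates b if it is at least as good on every objective and strictly
    # better on at least one (maximization assumed).
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def evaluate_with_geosimulation(candidate, simulate, objectives):
    # Run the bottom-up simulation (e.g., a CA or agent-based land-use model)
    # for one candidate decision and score the simulated outcome against each
    # objective function.
    outcome = simulate(candidate)
    return [f(outcome) for f in objectives]

def random_search(propose, simulate, objectives, n_iter=100):
    # Toy top-down optimizer around the simulation loop: keep an archive of
    # the nondominated (candidate, scores) pairs found so far.
    archive = []
    for _ in range(n_iter):
        cand = propose()
        scores = evaluate_with_geosimulation(cand, simulate, objectives)
        # Drop archive members dominated by the new candidate...
        archive = [(c, s) for (c, s) in archive if not dominates(scores, s)]
        # ...and keep the candidate unless something already dominates it.
        if not any(dominates(s, scores) for (_, s) in archive):
            archive.append((cand, scores))
    return archive

In practice the random proposal step would be replaced by one of the metaheuristics discussed above (e.g., NSGA-II or ACO), but the division of labor is the same: the optimizer proposes decision alternatives and the geosimulation evaluates their consequences.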

1.15.4.5 Dealing with uncertainties

There are two approaches for handling uncertainties in GIS-MCA: direct and indirect methods (Malczewski and Rinner, 2015). The former incorporates uncertainty into the decision rules directly. Specifically, deterministic methods such as WLC, AHP/ANP, IP, compromise programming, and goal programming can be extended to take into account uncertainties involved in the decision-making process. Two types of uncertainty may be present in a decision situation: uncertainty associated with fuzziness (imprecision) and uncertainty associated with limited information. Depending on the type of uncertainty involved, multicriteria decision problems under uncertainty can be further subdivided into fuzzy (e.g., Banai, 1993; Jiang and Eastman, 2000; Makropoulos et al., 2003) and probabilistic (or stochastic) decision-making problems (e.g., Kangas et al., 2005; Marinoni, 2005; Prato, 2008).

Sensitivity analysis is an alternative method of incorporating uncertainties into MCA. It is concerned with the way uncertainties in a set of input data affect the multicriteria decision model output. Sensitivity and uncertainty analysis can be considered integral parts of broadly defined sensitivity analysis (Saltelli, 2000; Crosetto and Tarantola, 2001). Specifically, sensitivity analysis in MCA is a set of methods for assessing uncertainty in the multicriteria model output and the importance of the model input factors (such as the criterion values and weights). It aims at apportioning the uncertainty in the output among the different sources of uncertainty associated with the input factors (Saltelli, 2000; Crosetto and Tarantola, 2001). Gómez-Delgado and Tarantola (2006), Chen et al. (2013), Feizizadeh et al. (2014), and Ligmann-Zielinska and Jankowski (2014) provide examples of approaches for sensitivity and uncertainty analysis in GIS-MCA.
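As a concrete illustration of weight sensitivity analysis, the sketch below perturbs the criterion weights of a weighted linear combination (WLC) model by Monte Carlo sampling and records how often the top-ranked alternative changes. This is a simplified, aspatial illustration in the spirit of the approaches cited above, not a reimplementation of any of them; the +/-0.05 perturbation range is an arbitrary assumption.

import random

def wlc(weights, criteria):
    # WLC score for each alternative; criteria is a list of rows, one row of
    # standardized criterion values per alternative (e.g., per map location).
    return [sum(w * x for w, x in zip(weights, row)) for row in criteria]

def top_alternative(weights, criteria):
    scores = wlc(weights, criteria)
    return scores.index(max(scores))

def weight_sensitivity(weights, criteria, delta=0.05, n_draws=500):
    # Perturb each weight within +/- delta, renormalize to sum to 1, and
    # count how often the top-ranked alternative changes. A high frequency
    # flags a recommendation that is unstable with respect to the weights.
    base_best = top_alternative(weights, criteria)
    changes = 0
    for _ in range(n_draws):
        perturbed = [max(0.0, w + random.uniform(-delta, delta)) for w in weights]
        total = sum(perturbed)
        perturbed = [w / total for w in perturbed]
        if top_alternative(perturbed, criteria) != base_best:
            changes += 1
    return changes / n_draws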

1.15.5 Conclusion

Since the beginning of the 1990s, there has been rapidly growing interest in theoretical and applied research on integrating GIS and MCA (Malczewski and Rinner, 2015). The main rationale for integrating these two distinctive sets of methods and tools has come from the need to expand the decision support capabilities of GIS (Sugumaran and DeGroote, 2011; Ferretti and Montibeller, 2016). The major advantage of incorporating MCA into GIS is that value judgments can be introduced into GIS-based decision-making procedures. MCA provides tools for guiding the decision maker (agent) through the critical process of clarifying evaluation criteria (attributes and/or objectives) and of defining values that are relevant to the decision situation. It can help users to understand the results of GIS-MCA by providing insights into the trade-offs among policy objectives, which can then be used in a systematic and defensible way to develop policy recommendations (Nyerges and Jankowski, 2010). In group decision making, MCA can improve communication and understanding of a decision problem among participants and facilitate ways of building consensus and reaching policy compromises. Consequently, GIS-MCA methods can contribute to improving collaborative decision-making procedures by providing flexible problem-solving approaches in which those involved in collaborative tasks can explore, understand, and redefine a decision problem.


Recent developments in spatial analysis show that GIS-MCA can make substantial contributions to the advancement of geosimulation methods (Li et al., 2011a; Lai et al., 2013). These contributions come from the complementarity and synergy of GIS-MCA and geosimulation modeling. While MCA offers a set of tools for defining the behavior of decision-making agents in geosimulation approaches, the geosimulation methods provide a platform that allows the spatial aspects of MCA to be considered explicitly. This, in turn, has inspired research on integrating spatial simulation and optimization methods. The integrative modeling framework combines the bottom-up approach of geosimulation methods and the top-down multiobjective optimization procedures as a tool for analyzing complex spatial decision problems (Bone et al., 2011).

Another significant contribution of GIS-MCA to spatial analysis is the development of spatially explicit multicriteria models. A number of approaches have been proposed for developing spatially explicit methods using the concepts of GIS-MCA. The local forms of the conventional models are of particular significance because they have been developed based on well-established MCA concepts (Malczewski, 2011).

Notwithstanding the remarkable growth of research on GIS-MCA, the research has tended to concentrate on the technical issues of integrating MCA into GIS. As a consequence, our understanding of the benefits of such integration is limited by the scarcity of research on empirical substantiation of the use of GIS-MCA methods and on their theoretical foundations and operational validity. Some MCA procedures lack a proper scientific foundation, and some methods involve a set of stringent assumptions that are difficult to validate in real-world situations. These problems have, to a large extent, been ignored by the GIS-MCA community.

References

Aerts, J.C.J.H., Heuvelink, G.B.M., 2002. Using simulated annealing for resource allocation. International Journal of Geographical Information Science 16, 571–587.
Aerts, J.C.J.H., van Herwijnen, M., Janssen, R., Stewart, T.J., 2005. Evaluating spatial design techniques for solving land-use allocation problems. Journal of Environmental Planning and Management 48, 121–142.
Agrell, P.J., Stam, A., Fischer, G.W., 2004. Interactive multiobjective agro-ecological land use planning: The Bungoma region in Kenya. European Journal of Operational Research 158, 194–217.
Alçada-Almeida, L., Tralhão, L., Santos, L., Coutinho-Rodrigues, J., 2009. A multiobjective approach to locate emergency shelters and identify evacuation routes in urban areas. Geographical Analysis 41, 9–29.
Alexander, E.R., 2000. Rationality revisited: Planning paradigms in a post-postmodernist perspective. Journal of Planning Education and Research 19, 242–256.
Andrienko, N., Andrienko, G., 2001. Intelligent support for geographic data analysis and decision making in the web. Journal of Geographic Information and Decision Analysis 5, 115–128.
Antoine, J., Fischer, G., Makowski, M., 1997. Multiple criteria land use analysis. Applied Mathematics and Computation 83, 195–215.
Arsanjani, J.J., Helbich, M., Vaz, E., 2013. Spatiotemporal simulation of urban growth patterns using agent-based modeling: The case of Tehran. Cities 32, 33–42.
Banai, R., 1993. Fuzziness in geographic information systems: Contributions from the analytic hierarchy process. International Journal of Geographical Information Systems 7, 315–329.
Bell, N., Schuurman, N., Hayes, M.V., 2007. Using GIS-based methods of multicriteria analysis to construct socio-economic deprivation indices. International Journal of Health Geographics 6, 17.
Belton, V., Gear, T., 1983. On a shortcoming of Saaty's method of analytic hierarchies. Omega 11, 228–230.
Belton, V., Stewart, T.J., 2002. Multiple criteria decision analysis: An integrated approach. Kluwer Academic Publishers, Boston.
Bennett, D.A., Wade, G.A., Armstrong, M.P., 1999. Exploring the solution space of semi-structured geographical problems using genetic algorithms. Transactions in GIS 3, 51–71.
Bettinger, P., Sessions, J., Boston, K., 1997. Using tabu search to schedule timber harvests subject to spatial wildlife goals for big game. Ecological Modelling 94, 111–123.
Bojórquez-Tapia, L.A., Diaz-Mondragon, S., Ezcurra, E., 2001. GIS-based approach for participatory decision making and land suitability assessment. International Journal of Geographical Information Science 15, 129–151.
Bone, C., Dragicevic, S., 2009. GIS and intelligent agents for multiobjective natural resource allocation: A reinforcement learning approach. Transactions in GIS 13, 253–272.
Bone, C., Dragicevic, S., White, R., 2011. Modeling-in-the-middle: Bridging the gap between agent-based modeling and multi-objective decision making for land use change. International Journal of Geographical Information Science 25, 717–737.
Bong, C.W., Wang, Y.C., 2004. A multiobjective hybrid metaheuristic approach for GIS-based spatial zoning model. Journal of Mathematical Modelling and Algorithms 3, 245–261.
Boroushaki, S., Malczewski, J., 2008. Implementing an extension of the analytical hierarchy process using ordered weighted averaging operators with fuzzy quantifiers in ArcGIS. Computers and Geosciences 34, 399–410.
Boroushaki, S., Malczewski, J., 2010. Using the fuzzy majority approach for GIS-based multicriteria group decision-making. Computers and Geosciences 36, 302–312.
Brans, J., Mareschal, B., Vincke, P., 1984. PROMETHEE: A new family of outranking methods in multicriteria analysis. In: Brans, J. (Ed.), Proceedings of International Federation of Operational Research Societies (IFORS) on Operational Research, August 6–10, 1984, Washington, DC. Université Libre de Bruxelles, Brussels, Belgium, pp. 477–490.
Brookes, C.J., 1997. A genetic algorithm for locating optimal sites on raster suitability maps. Transactions in GIS 2, 91–107.
Cao, K., Batty, M., Huang, B., Liu, Y., Yu, L., Chen, J., 2011. Spatial multi-objective land use optimization: Extensions to the nondominated sorting genetic algorithm-II. International Journal of Geographical Information Science 25, 1949–1969.
Carter, B., Rinner, C., 2014. Locally weighted linear combination in a vector geographic information system. Journal of Geographical Systems 16, 343–361.
Carver, S.J., 1991. Integrating multi-criteria evaluation with geographical information systems. International Journal of Geographical Information Systems 5, 321–339.
Castella, J.C., Kam, S.P., Quang, D.D., Verburg, P.H., Hoanh, C.T., 2007. Combining top-down and bottom-up modelling approaches of land use/cover change to support public policies: Application to sustainable management of natural resources in northern Vietnam. Land Use Policy 24, 531–545.
Chakhar, S., Mousseau, V., 2008. GIS-based multicriteria spatial modeling generic framework. International Journal of Geographical Information Science 22 (11–12), 1159–1196.
Chen, K., Blong, R., Jacobson, C., 2001. MCE-RISK: Integrating multicriteria evaluation and GIS for risk decision-making in natural hazards. Environmental Modelling and Software 16, 387–397.
Chen, Y., Paydar, Z., 2012. Evaluation of potential irrigation expansion using a spatial fuzzy multi-criteria decision framework. Environmental Modelling and Software 38, 147–157.
Chen, Y., Khan, S., Paydar, Z., 2010a. To retire or expand? A fuzzy GIS-based spatial multicriteria evaluation framework for irrigated agriculture. Irrigation and Drainage 59, 174–188.
Chen, Y., Li, X., Liu, X., Liu, Y., 2010b. An agent-based model for optimal land allocation (AgentLA) with a contiguity constraint. International Journal of Geographical Information Science 24, 1269–1288.
Chen, Y., Yu, J., Khan, S., 2013. The spatial framework for weight sensitivity analysis in AHP based multi-criteria decision making. Environmental Modelling and Software 48, 129–140.
Choo, E.U., Schoner, B., Wedley, W.C., 1999. Interpretation of criteria weights in multicriteria decision making. Computers and Industrial Engineering 37, 527–541.


Church, R.L., Loban, S.R., Lombard, K., 1992. An interface for exploring spatial alternatives for a corridor location problem. Computers and Geosciences 18, 1095–1105.
Church, R.L., Gerrard, R.A., Gilpin, M., Stine, P., 2003. Constructing cell-based habitat patches useful in conservation planning. Annals of the Association of American Geographers 93, 814–827.
Cohon, J.L., 1978. Multiobjective programming and planning. Academic Press, London.
Coutinho-Rodrigues, J., Clímaco, J., Current, J., Ratick, S., 1997. An interactive spatial decision support system for multiobjective HAZMAT location-routing problems. Transportation Research Record 1602, 101–109.
Coutinho-Rodrigues, J., Tralhão, L., Alçada-Almeida, L., 2012. Solving a location-routing problem with a multiobjective approach: The design of urban evacuation plans. Journal of Transport Geography 22, 206–218.
Cova, T.J., Church, R.L., 2000. Exploratory spatial optimization and site search: Neighborhood operator approach. Computers, Environment and Urban Systems 21, 401–419.
Cromley, R.G., Hanink, D.M., 1999. Coupling land use allocation models with raster GIS. Journal of Geographical Systems 1, 137–153.
Crosetto, M., Tarantola, S., 2001. Uncertainty and sensitivity analysis: Tools for GIS-based model implementation. International Journal of Geographical Information Science 15, 415–437.
Davis, F.W., Costello, C., Stoms, D., 2006. Efficient conservation in a utility-maximization framework. Ecology and Society 11, 33.
Deb, K., 2001. Multi-objective optimization using evolutionary algorithms. John Wiley and Sons, Chichester, England.
Demetriou, D., Stillwell, J., See, L., 2012. An integrated planning and decision support system (IPDSS) for land consolidation: Theoretical framework and application of the land redistribution modules. Environment and Planning B: Planning and Design 39, 609–662.
Dorigo, M., Colorni, A., Maniezzo, V., 1991. Positive feedback as a search strategy. Technical Report. Politecnico di Milano, Milan, pp. 91–106.
Duh, J.D., Brown, D.G., 2005. Generating prescribed patterns in landscape models. In: Maguire, D.J., Goodchild, M.F., Batty, M. (Eds.), GIS, spatial analysis and modeling. ESRI Press, Redlands, CA, pp. 423–444.
Duh, J.D., Brown, D.G., 2007. Knowledge-informed Pareto simulated annealing for multiobjective spatial allocation. Computers, Environment and Urban Systems 31, 253–281.
Eastman, J.R., 1997. IDRISI for Windows, Version 2.0: Tutorial exercises. Clark University, Worcester.
Eastman, J.R., Kyem, P.A.K., Toledano, J., Jin, W., 1993. GIS and decision making. UNITAR, Geneva.
Eastman, J.R., Jin, W.G., Kyem, P., Toledano, J., 1995. Raster procedures for multi-criteria/multi-objective decisions. Photogrammetric Engineering and Remote Sensing 61, 539–547.
Elaalem, M., Comber, A., Fisher, P., 2011. A comparison of fuzzy AHP and ideal point methods for evaluating land suitability. Transactions in GIS 15, 329–346.
Eldrandaly, K.A., 2013. Exploring multi-criteria decision strategies in GIS with linguistic quantifiers: An extension of the analytical network process using ordered weighted averaging operators. International Journal of Geographical Information Science 27, 2455–2482.
Estoque, R.C., 2012. Analytic hierarchy process in geospatial analysis. In: Murayama, Y. (Ed.), Progress in geospatial analysis. Springer, Tokyo, pp. 157–182.
Farhan, B., Murray, A.T., 2008. Siting park-and-ride facilities using a multi-objective spatial optimization model. Computers and Operations Research 35, 445–456.
Feick, R.D., Hall, G.B., 1999. Consensus-building in a multi-participant Spatial Decision Support System. URISA Journal 11, 17–23.
Feick, R.D., Hall, B.G., 2004. A method for examining the spatial dimension of multi-criteria weight sensitivity. International Journal of Geographical Information Science 18, 815–840.
Feizizadeh, B., Jankowski, P., Blaschke, T., 2014. A GIS based spatially-explicit sensitivity and uncertainty analysis approach for multi-criteria decision analysis. Computers and Geosciences 64, 81–95.
Feng, Y., Liu, Y., 2013. A heuristic cellular automata approach for modelling urban land-use change based on simulated annealing. International Journal of Geographical Information Science 27, 449–466.
Ferretti, V., Montibeller, G., 2016. Key challenges and meta-choices in designing and applying multi-criteria spatial decision support systems. Decision Support Systems 84, 41–52.
Ferretti, V., Pomarico, S., 2013. Ecological land suitability analysis through spatial indicators: An application of the analytic network process technique and ordered weighted average approach. Ecological Indicators 34, 507–519.
Fischer, G.W., 1995. Range sensitivity of attribute weights in multiattribute value models. Organizational Behavior and Human Decision Processes 62, 252–266.
Fischer, D.T., Church, R.L., 2005. The SITES reserve selection system: A critical review. Environmental Modeling and Assessment 10, 215–228.
Fotakis, D., Sidiropoulos, E., 2012. A new multi-objective self-organizing optimization algorithm (MOSOA) for spatial optimization problems. Applied Mathematics and Computation 218, 5168–5180.
Fotakis, D.G., Sidiropoulos, E., Myronidis, D., Ioannou, K., 2012. Spatial genetic algorithm for multi-objective forest planning. Forest Policy and Economics 21, 12–19.
Geneletti, D., 2005. Practice report: Multicriteria analysis to compare the impact of alternative road corridors: A case study in northern Italy. Impact Assessment and Project Appraisal 23 (2), 135–146.
Gilliams, S., Raymaekers, D., Muys, B., van Orshoven, J., 2005. Comparing multiple criteria decision methods to extend a geographical information system on afforestation. Computers and Electronics in Agriculture 49, 142–158.
Glover, F., 1989. Tabu Search – Part I. ORSA Journal on Computing 1, 190–206.
Gómez-Delgado, M., Tarantola, S., 2006. Global sensitivity analysis, GIS and multi-criteria evaluation for a sustainable planning of a hazardous waste disposal site in Spain. International Journal of Geographical Information Science 20, 449–466.
Goodchild, M.F., Haining, R.P., 2004. GIS and spatial data analysis: Converging perspectives. Papers in Regional Science 83, 363–385.
Goodchild, M.F., Janelle, D.G., 2004. Thinking spatially in the social sciences. In: Goodchild, M.F., Janelle, D.G. (Eds.), Spatially integrated social science. Oxford University Press, New York, pp. 3–22.
Gorsevski, P.V., Cathcart, S.C., Mirzaei, G., Jamali, M.M., Ye, X., Gomezdelcampo, E., 2013. A group-based spatial decision support system for wind farm site selection in northwest Ohio. Energy Policy 55, 374–385.
Hamilton, M.C., Nedza, J.A., Doody, P., Bates, M.E., Bauer, N.L., Voyadgis, D.E., Fox-Lent, C., 2016. Web-based geospatial multiple criteria decision analysis using open software and standards. International Journal of Geographical Information Science 30, 1667–1686.
Herzig, A., 2008. A GIS-based module for the multiobjective optimization of areal resource allocation. In: Bernard, L., Friis-Christensen, A., Pundt, H., Compte, I. (Eds.), Proceedings of the 11th AGILE International Conference on Geographic Information Science. University of Girona, Spain, pp. 1–17.
Hobbs, B.F., 1980. A comparison of weighting methods in power plant siting. Decision Sciences 11, 725–737.
Hobbs, B.F., Meier, P., 2000. Energy decisions and the environment: A guide to the use of multicriteria methods. Kluwer Academic Publishers, Boston.
Huang, B., Fery, P., Xue, L., Wang, Y., 2008. Seeking the Pareto front for multiobjective spatial optimization problems. International Journal of Geographical Information Science 22, 507–526.
Hwang, C.L., Yoon, K., 1981. Multiple attribute decision making: Methods and applications. Springer-Verlag, Berlin.
Indriasari, V., Mahmud, A.R., Ahmad, N., Shariff, A.R.M., 2010. Maximal service area problem for optimal siting of emergency facilities. International Journal of Geographical Information Science 24, 213–230.
Jankowski, P., 1995. Integrating geographical information systems and multiple criteria decision making methods. International Journal of Geographical Information Systems 9, 251–273.
Jankowski, P., Nyerges, T., 2001. Geographic information systems for group decision making: Towards a participatory geographic information science. Taylor and Francis, London.
Jankowski, P., Ligmann-Zielinska, A., Swobodzinski, M., 2008. Choice modeler: A web-based spatial multiple criteria evaluation tool. Transactions in GIS 12, 541–561.
Jiang, H., Eastman, J.R., 2000. Application of fuzzy measures in multi-criteria evaluation in GIS. International Journal of Geographical Information Systems 14, 173–184.


Joerin, F., Theriault, M., Musy, A., 2001. Using GIS and outranking multi-criteria analysis for land-use suitability assessment. International Journal of Geographical Information Science 15, 153–174.
Jun, C., 2000. Design of an intelligent geographic information system for multi-criteria site analysis. URISA Journal 12, 5–17.
Kangas, J., Store, R., Kangas, A., 2005. Socioecological landscape planning approach and multicriteria acceptability analysis in multiple-purpose forest management. Forest Policy and Economics 7, 603–614.
Kangas, A., Kangas, J., Kurttila, M., 2008. Decision support for forest management. Springer, Berlin.
Keeney, R.L., 1992. Value-focused thinking: A path to creative decision making. Harvard University Press, Cambridge, MA.
Kennedy, J., Eberhart, R., 1995. Particle swarm optimization. In: Proceedings of the 4th IEEE International Conference on Neural Networks, Nov. 27–Dec. 01, 1995, Perth, WA, pp. 1942–1948.
Killen, J., 1983. Mathematical programming methods for geographers and planners. Croom Helm, London.
Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P., 1983. Optimization by simulated annealing. Science 220, 671–680.
Koo, B.K., O'Connell, P.E., 2006. An integrated modelling and multicriteria analysis approach to managing nitrate diffuse pollution: 1. Framework and methodology. Science of the Total Environment 259, 1–16.
Lai, S.K., Hopkins, L.D., 1989. The meanings of trade-offs in multiattribute evaluation methods: A comparison. Environment and Planning B 16, 155–170.
Lai, T., Dragicevic, S., Schmidt, M., 2013. Integration of multicriteria evaluation and cellular automata methods for landslide simulation modelling. Geomatics, Natural Hazards and Risk 4, 355–375.
Li, R., Leung, Y., 2011. Multi-objective route planning for dangerous goods using compromise programming. Journal of Geographical Systems 13, 249–271.
Li, X., Liu, X., 2007. Defining agents' behaviors to simulate complex residential development using multicriteria evaluation. Journal of Environmental Management 85, 1063–1075.
Li, X., He, J.Q., Liu, X.P., 2009. Ant intelligence for solving optimal path-covering problems with multi-objectives. International Journal of Geographical Information Science 23, 839–857.
Li, X., Chen, Y.M., Liu, X.P., Li, D., He, J.Q., 2011a. Concepts, methodologies, and tools of an integrated geographical simulation and optimization system. International Journal of Geographical Information Science 25, 633–655.
Li, X., Lao, C., Liu, X., Chen, Y., 2011b. Coupling urban cellular automata with ant colony optimization for zoning protected natural areas under a changing landscape. International Journal of Geographical Information Science 25, 575–593.
Li, X., Shi, X., He, J., Liu, X., 2011c. Coupling simulation and optimization to solve planning problems in a fast-developing area. Annals of the Association of American Geographers 101, 1032–1048.
Li, N., Raskin, R., Goodchild, M., Janowicz, K., 2012. An ontology-driven framework and web portal for spatial decision support. Transactions in GIS 16, 313–329.
Ligmann-Zielinska, A., 2009. The impact of risk-taking attitudes on a land use pattern: An agent based model of residential development. Journal of Land Use Science 4, 215–232.
Ligmann-Zielinska, A., Jankowski, P., 2010. Exploring normative scenarios of land use development decisions with an agent-based simulation laboratory. Computers, Environment and Urban Systems 34, 409–423.
Ligmann-Zielinska, A., Jankowski, P., 2012. Impact of proximity-adjusted preferences on rank-order stability in geographical multicriteria decision analysis. Journal of Geographical Systems 14, 167–187.
Ligmann-Zielinska, A., Jankowski, P., 2014. Spatially-explicit integrated uncertainty and sensitivity analysis of criteria weights in multi-criteria land suitability evaluation. Environmental Modelling and Software 57, 235–247.
Ligtenberg, A., Bregt, A.K., van Lammeren, R., 2001. Multi-actor-based land use modelling: Spatial planning using agents. Landscape and Urban Planning 56, 21–33.
Ligtenberg, A., Wachowicz, M., Bregt, A.K., Beulens, A., Kettenis, D.L., 2004. A design and application of a multi-agent system for simulation of multi-actor spatial planning. Journal of Environmental Management 72, 43–55.
Liu, N., Huang, B., Chandramouli, M., 2006. Optimal siting of fire stations using GIS and ANT algorithm. Journal of Computing in Civil Engineering 20, 361–369.
Liu, X., Lao, C., Li, X., Liu, Y., Chen, Y., 2012a. An integrated approach of remote sensing, GIS and swarm intelligence for zoning protected ecological areas. Landscape Ecology 27, 447–463.
Liu, X., Li, X., Shi, X., Huang, K., Liu, Y., 2012b. A multi-type ant colony optimization (MACO) method for optimal land use allocation in large areas. International Journal of Geographical Information Science 26, 1325–1343.
Lloyd, C.D., 2010. Local models for spatial analysis. CRC Press, Boca Raton.
Lourenço, H.R., Paixão, J.P., Portugal, R., 2001. Multiobjective metaheuristics for the bus driver scheduling problem. Transportation Science 35, 331–343.
Louvart, L., Meyer, P., Olteanu, A.-L., 2015. MODEL: A multicriteria ordinal evaluation tool for GIS. International Journal of Geographical Information Science 29, 1910–1931.
Ma, S., He, J., Liu, F., Yu, Y., 2011. Land-use spatial optimization based on PSO algorithm. Geo-spatial Information Science 14, 54–61.
Makropoulos, C.K., Butler, D., 2006. Spatial ordered weighted averaging: Incorporating spatially variable attitude towards risk in spatial multi-criteria decision-making. Environmental Modelling and Software 21, 69–84.
Makropoulos, C.K., Butler, D., Maksimovic, C., 2003. Fuzzy logic spatial decision support system for urban water management. Journal of Water Resources Planning and Management 129, 69–77.
Malczewski, J., 1996. A GIS-based approach to multiple criteria group decision-making. International Journal of Geographical Information Systems 10, 955–971.
Malczewski, J., 1999. GIS and multicriteria decision analysis. Wiley, New York.
Malczewski, J., 2000. On the use of weighted linear combination method in GIS: Common and best practice approaches. Transactions in GIS 4, 5–22.
Malczewski, J., 2006a. GIS-based multicriteria decision analysis: A survey of the literature. International Journal of Geographical Information Science 20, 703–726.
Malczewski, J., 2006b. Ordered weighted averaging with fuzzy quantifiers: GIS-based multicriteria evaluation for land-use suitability analysis. International Journal of Applied Earth Observation and Geoinformation 8, 270–277.
Malczewski, J., 2011. Local weighted linear combination. Transactions in GIS 15, 439–455.
Malczewski, J., Liu, X., 2014. Local ordered weighted averaging in GIS-based multicriteria analysis. Annals of GIS 20, 117–129.
Malczewski, J., Rinner, C., 2005. Exploring multicriteria decision strategies in GIS with linguistic quantifiers: A case study of residential quality evaluation. Journal of Geographical Systems 7, 249–268.
Malczewski, J., Rinner, C., 2015. Multi-criteria decision analysis in geographic information science. Springer, Berlin.
Malczewski, J., Chapman, T., Flegel, C., Walters, D., Shrubsole, D., Healy, M.A., 2003. GIS-multicriteria evaluation with Ordered Weighted Averaging (OWA): Developing management strategies for rehabilitation and enhancement projects in the Cedar Creek watershed, Ontario, Canada. Environment and Planning A 35, 1769–1784.
Maliszewski, P.J., Horner, M.W., 2010. A spatial modeling framework for siting critical supply infrastructures. Professional Geographer 62, 426–441.
Maliszewski, P.J., Kuby, M.J., Horner, M.W., 2012. A comparison of multi-objective spatial dispersion models for managing critical assets in urban areas. Computers, Environment and Urban Systems 36, 331–341.
Marinoni, O., 2004. Implementation of the analytical hierarchy process with VBA in ArcGIS. Computers and Geosciences 30, 637–646.
Marinoni, O., 2005. A stochastic spatial decision support system based on PROMETHEE. International Journal of Geographical Information Science 19, 51–68.
Martin, N.J., St Onge, B., Waaub, J.P., 2003. An integrated decision aid system for the development of Saint Charles River alluvial plain, Quebec, Canada. International Journal of Environment and Pollution 12, 264–279.
Masoomi, Z., Sadi Mesgari, M., Hamrah, M., 2013. Allocation of urban land uses by multi-objective particle swarm optimization algorithm. International Journal of Geographical Information Science 27, 542–566.


Massei, G., Rocchi, L., Paolotti, L., Greco, S., Boggia, A., 2014. Decision support systems for environmental management: A case study on wastewater from agriculture. Journal of Environmental Management 146, 491–504.
Meng, Y., Malczewski, J., 2010. Web-PPGIS usability and public engagement: A case study in Canmore, Alberta. Journal of URISA 22, 55–64.
Nadi, S., Delavar, M.R., 2011. Multi-criteria, personalized route planning using quantifier-guided ordered weighted averaging operators. International Journal of Applied Earth Observation and Geoinformation 13, 322–335.
Nekhay, O., Arriaza, M., Boerboom, L., 2009. Evaluation of soil erosion risk using analytic network process and GIS: A case study from Spanish mountain olive plantations. Journal of Environmental Management 90, 3091–3104.
November, S.M., Cromley, R.G., Cromley, E.K., 1996. Multi-objective analysis of school district regionalization alternatives in Connecticut. Professional Geographer 48, 1–14.
Nyerges, T.L., 1992. Coupling GIS and spatial analytical models. In: Breshanan, P., Corwin, E., Cowen, D. (Eds.), Proceedings of the 5th International Symposium on Spatial Data Handling, Aug. 3–7, 1992, Charleston, SC, pp. 534–543.
Nyerges, T.L., Jankowski, P., 2010. Regional and urban GIS: A decision support approach. Guilford, New York.
O'Sullivan, D., Unwin, D.J., 2010. Geographic information analysis. John Wiley and Sons, Hoboken, NJ.
Ozturk, D., Batuk, F., 2011. Implementation of GIS-based multicriteria decision analysis with VB in ArcGIS. International Journal of Information Technology and Decision Making 10, 1023–1042.
Parker, D.C., Manson, S.M., Janssen, M.A., Hoffmann, M., Deadman, P., 2003. Multi-agent systems for the simulation of land-use and land-cover change: A review. Annals of the Association of American Geographers 93, 314–337.
Pereira, J.M.C., Duckstein, L., 1993. A multiple criteria decision-making approach to GIS-based land suitability evaluation. International Journal of Geographical Information Systems 7, 407–424.
Prato, T., 2008. Stochastic multiple attribute evaluation of land use policies. Ecological Modelling 219, 115–124.
Reynolds, K.M., Hessburg, P.F., 2014. An overview of the ecosystem management decision-support system. In: Reynolds, K.M., Hessburg, P.F., Bourgeron, P.S. (Eds.), Making transparent environmental management decisions. Springer, Berlin, pp. 3–22.
Rinner, C., Heppleston, A., 2006. The spatial dimensions of multi-criteria evaluation: Case study of a home buyer's spatial decision support system. In: Raubal, M., Miller, H.J., Frank, A.U., Goodchild, M.F. (Eds.), Proceedings of the 4th International Conference, GIScience 2006, Münster, Germany, 20–23 September. Lecture Notes in Computer Science, vol. 4197. Springer, Berlin, pp. 338–352.
Rinner, C., Malczewski, J., 2002. Web-enabled spatial decision analysis using ordered weighted averaging (OWA). Journal of Geographical Systems 4, 385–403.
Rinner, C., Raubal, M., 2004. Personalized multi-criteria decision strategies in location-based decision support. Journal of Geographic Information Science 10, 149–156.
Rinner, C., Taranu, J.P., 2006. Map-based exploratory evaluation of non-medical determinants of population health. Transactions in GIS 10, 633–649.
Roy, B., 1968. Classement et choix en présence de points de vue multiples (la méthode ELECTRE). RIRO 2, 57–75.
Saaty, T.L., 1980. The analytic hierarchy process. McGraw-Hill, New York.
Saaty, T.L., 1996. Decision making with dependence and feedback: The analytic network process. RWS Publications, Pittsburgh.
Sabri, S., Ludin, A.N.M.M., Ho, C.S., 2012. Conceptual design for an integrated geosimulation and analytic network process (ANP) in gentrification appraisal. Applied Spatial Analysis and Policy 5, 253–271.
Şalap-Ayça, S., Jankowski, P., 2016. Integrating local multicriteria evaluation with spatially explicit uncertainty-sensitivity analysis. Spatial Cognition and Computation 16, 106–132.
Saltelli, A., 2000. What is sensitivity analysis? In: Saltelli, A., Chan, K., Scott, M. (Eds.), Sensitivity analysis. Wiley, New York, pp. 3–12.
Santé-Riveira, I., Boullón-Magán, M., Crecente-Maseda, R., Miranda-Barrós, D., 2008. Algorithm based on simulated annealing for land-use allocation. Computers and Geosciences 34, 259–268.
Sengupta, R.R., Bennett, D.A., 2003. Agent-based modelling environment for spatial decision support. International Journal of Geographical Information Science 17, 157–180.
Sharifi, A., van Herwijnen, M., 2002. Spatial decision systems. International Institute for Geoinformation Science and Earth Observation, Enschede, Netherlands.
Sharifi, M.A., Boerboom, L., Shamsudin, K., 2004. Application of multiple criteria decision analysis in the Klang Valley integrated land use and transportation study. Malaysian Journal of Town Planning 2, 40–55.
Sharma, S.K., Lees, B.G., 2004. A comparison of simulated annealing and GIS based MOLA for solving the problem of multi-objective land use assessment and allocation. In: Proceedings of the 17th International Conference on Multiple Criteria Decision Analysis, Whistler, Canada.
Simão, A., Densham, P.J., Haklay, M.M., 2009. Web-based GIS for collaborative planning and public participation: An application to the strategic planning of wind farm sites. Journal of Environmental Management 90, 2027–2040.
Stewart, T.J., 1992. A critical survey on the status of multiple criteria decision making: Theory and practice. OMEGA 20, 569–586.
Stewart, T.J., 1996. Robustness of additive value function methods in MCDM. Journal of Multi-Criteria Decision Analysis 5, 301–309.
Stewart, T.J., Janssen, R., van Herwijnen, M., 2004. A genetic algorithm approach to multiobjective land use planning. Computers and Operations Research 31, 2293–2313.
Stillwell, W.G., Seaver, D.A., Edwards, W., 1981. A comparison of weight approximation techniques in multiattribute utility decision making. Organizational Behavior and Human Performance 28, 62–77.
Sugumaran, R., DeGroote, J., 2011. Spatial decision support systems: Principles and practices. CRC Press, Boca Raton, FL.
Talbi, E.G., 2009. Metaheuristics: From design to implementation. Wiley, Hoboken, NJ.
Tkach, R., Simonovic, S., 1997. A new approach to multicriteria decision making in water resources. Journal of Geographical Information and Decision Analysis 1, 25–43.
Trunfio, G.A., 2006. Exploiting spatio-temporal data for the multiobjective optimization of cellular automata models. In: Corchado, E., Yin, H., Botti, V., Fyfe, C. (Eds.), Intelligent data engineering and automated learning – IDEAL 2006, Burgos, Spain, 20–23 September. Lecture Notes in Computer Science, no. 4224. Springer, Heidelberg, pp. 81–89.
van Herwijnen, M., Rietveld, P., 1999. Spatial dimensions in multicriteria analysis. In: Thill, J.C. (Ed.), Spatial multicriteria decision making and analysis: A geographic information sciences approach. Ashgate, London, pp. 77–99.
Vanegas, P., Cattrysse, D., Van Orshoven, J., 2011. A multiple criteria heuristic solution method for locating near to optimal contiguous and compact sites in raster maps. In: Murgante, B., Borruso, G., Lapucci, A. (Eds.), Geocomputation, sustainability and environmental planning: Studies in computational intelligence. Springer-Verlag, Berlin, pp. 35–56.
Voogd, H., 1983. Multicriteria evaluation for urban and regional planning. Pion Limited, London.
Ward, D.P., Murray, A.T., Phinn, S.R., 2003. Integrating spatial optimization and cellular automata for evaluating urban change. The Annals of Regional Science 37, 131–148.
Wu, F., 1998. SimLand: A prototype to simulate land conversion through the integrated GIS and CA with AHP-derived transition rules. International Journal of Geographical Information Science 12, 63–82.
Xiao, N., Bennett, D.A., Armstrong, M.P., 2002. Using evolutionary algorithms to generate alternatives for multiobjective site-search problems. Environment and Planning A 34, 639–656.
Xiao, N., Bennett, D.A., Armstrong, M.P., 2007. Interactive evolutionary approaches to multiobjective spatial decision making: A synthetic review. Computers, Environment and Urban Systems 31, 232–252.
Yager, R.R., 1988. On ordered weighted averaging aggregation operators in multicriteria decision making. IEEE Transactions on Systems, Man, and Cybernetics 18, 183–190.
Yager, R.R., 1996. Quantifier guided aggregation using OWA operators. International Journal of Intelligent Systems 11, 49–73.
Yatsalo, B., Didenko, V., Gritsyuk, S., Sullivan, T., 2015. Decerns: A framework for multi-criteria decision analysis. International Journal of Computational Intelligence Systems 8, 467–489.
Zarghami, M., Szidarovszky, F., 2011. Multicriteria analysis applications to water and environment management. Springer-Verlag, Berlin.
Zeleny, M., 1982. Multiple criteria decision making. McGraw Hill, New York.


Zeng, H., Pukkala, T., Peltola, H., Kellomäki, S., 2007. Application of ant colony optimization for the risk management of wind damage in forest planning. Silva Fennica 41, 315–332.
Zhang, T., Hua, G., Ligmann-Zielinska, A., 2015. Visually-driven parallel solving of multi-objective land-use allocation problems: A case study in Chelan, Washington. Earth Science Informatics 8, 809–825.
Zhu, X., McCosker, J., Dale, A.P., Bischof, R.J., 2001. Web-based decision support for regional vegetation management. Computers, Environment and Urban Systems 25, 605–627.
Zografos, K.G., Androutsopoulos, K.N., 2008. A decision support system for integrated hazardous materials routing and emergency response decisions. Transportation Research Part C: Emerging Technologies 16, 684–703.

Further Reading

Carver, S., 1999. Developing web-based GIS/MCE: Improving access to data and spatial decision support tools. In: Thill, J.C. (Ed.), Spatial multicriteria decision-making and analysis. Ashgate, Aldershot, pp. 49–75.
Eastman, R., 1999. Multi-criteria evaluation and GIS. In: Longley, P.A., Goodchild, M.F., Maguire, D.J., Rhind, D.W. (Eds.), Geographical information systems. Wiley, New York, pp. 493–502.
Greene, R., Devillers, R., Luther, J.E., Eddy, B.G., 2011. GIS-based multiple-criteria decision analysis. Geography Compass 5, 412–432.
Jankowski, P., Andrienko, N., Andrienko, G., 2001. Map-centred exploratory approach to multiple criteria spatial decision making. International Journal of Geographical Information Science 15, 101–127.

1.16 Agent-Based Modeling

Andrew Crooks, George Mason University, Fairfax, VA, United States
Alison Heppenstall and Nick Malleson, University of Leeds, Leeds, United Kingdom
© 2018 Elsevier Inc. All rights reserved.

1.16.1 Introduction
1.16.2 The Rise of the (Automated) Machines
1.16.3 What Is ABM?
1.16.3.1 Making Agents More Human
1.16.4 Steps to Building an Agent-Based Model
1.16.4.1 Preparation and Design
1.16.4.2 Model Implementation
1.16.4.3 Evaluating a Model
1.16.4.3.1 Verification
1.16.4.3.2 Calibration
1.16.4.3.3 Validation
1.16.4.3.4 Difficulties in evaluating spatial models
1.16.5 Integrating GIS and Space Into Agent-Based Models
1.16.5.1 Coupling and Embedding GIS and Agent-Based Models
1.16.5.2 ABM Toolkits
1.16.5.3 Example Applications
1.16.6 Challenges and Opportunities
1.16.7 Conclusion
References
Further Reading
Relevant Websites

Glossary

Agent-based modeling A computer simulation comprising multiple types of heterogeneous agents, which are autonomous decision-making entities (e.g., cars). Agents are given specific rules for interaction with other agents and/or other entities within a system. Through such rules and interactions, more aggregate patterns emerge (e.g., traffic jams).
Big data Big data not only refers to datasets that are unusually large in volume, but to those that also exhibit other properties distinguishing them from "traditional" datasets. These include velocity (data that are generated rapidly and might only be relevant for a short amount of time), variety (data that are stored in various formats and hold diverse pieces of information), and veracity (there are uncertainties around bias, noise, and the level of representation) (Croitoru et al., 2014). Examples range from established datasets such as national censuses to mobile phone data or social media data.
Bottom-up modeling The notion that changes at the global or macro-level are driven by interactions that occur at the lowest micro-level. For example, in Schelling's (1971) classic segregation example, an individual's preference for who they live nearby drives the patterns of segregation that occur at a neighborhood level.
Bounded rationality In decision making, individuals or agents do not have universal knowledge. Their decisions are limited by the information that they have, which is often specific to their context, the time they have to make their decisions, and their cognitive abilities (Simon, 1955).
Calibration The process of adjusting model parameters so that the behavior of the model closely matches some observed (often historical) conditions (O'Sullivan, 2004). This is also known as "fitting" the model to some observed data (Gilbert and Terna, 2000).
Complexity Complexity arises when a small number of rules or laws, applied at a local level and among many entities, are capable of generating complex global phenomena: collective behaviors, extensive spatial patterns, hierarchies, and so on are manifested in such a way that the actions of the parts do not simply sum to the activity of the whole.
Emergence The product of interactions between individuals, for example, an exchange of information. These interactions produce new patterns, processes, or information. What is central to the idea of emergence is that what emerges cannot be predicted. As Aristotle noted, "the sum is greater than the parts".
Validation Validation is the process that follows calibration, and its main purpose is to demonstrate that the model is sufficiently accurate given the context of the system that it is attempting to simulate. This is often achieved by applying the model to unseen data to test that it can produce the correct output.

1.16.1 Introduction

Understanding individual behavior patterns, their causes, and their consequences has been an issue that has taxed geographers for the last 50 years. Early approaches such as Alonso's (1964) bid rent model and Hagerstrand's (1967) diffusion model shared one central intellectual caveat: in order to say something useful about social systems, analysis had to take place at the aggregate level (Heppenstall et al., 2012). Viewing the world in this manner meant that geographical systems were distilled down into homogeneous units, making it virtually impossible to say anything meaningful about their inner workings or microdynamics (Batty, 2008).

New forms of "big" data and simulation methods are beginning to give greater insight into the complexity inherent within geographical systems and to reveal the role of the individual. This understanding is being reflected in the way that geographical systems are currently being conceptualized. Instead of large, aggregate models that contain equations applied to homogeneous groups, recent thinking emphasizes the individual, in particular their networks and interactions, as one of the most important factors shaping geographical systems (Batty, 2013). For O'Sullivan et al. (2012), these interactions and decisions can potentially be seen as the drivers of social systems. If we can piece together knowledge about who is making these decisions and what influences them, we can advance on the previous 50 years of geographical modeling by being able both to pose and to answer important questions about the causes and consequences of individual behavior patterns. Leveraging this information about individuals and their interactions will hopefully give researchers a clearer understanding of the role that complexity plays in shaping geographical systems. The definition of a complex system as encompassing "heterogeneous subsystems or autonomous entities, which often feature nonlinear relationships and multiple interactions" (An, 2012) is closely aligned with the description that we would assign to geographical systems. Complexity theory is still in its early days, with advancements required in the "ontological and epistemological representations" of complexity (An, 2012; Grimm et al., 2005; Manson and O'Sullivan, 2006; Parker et al., 2003). The merger of concepts from complexity, agent-based modeling (ABM), and big data has the potential to create a new, deeper understanding of the mechanisms powering geographical systems. To do this we need tools that can exploit these new forms of data to create detailed simulations of the main components and drivers of geographical systems. Perhaps most importantly, these methods need to be able to simulate behaviors and interactions at the individual level.

An individual-based method that has seen a rapid uptake by researchers across the social and geographical sciences in the past 20 years is ABM (Macal, 2016). ABM advocates the creation of individuals with their own attributes and behaviors. The emphasis within these models on the individual makes ABM a natural framework to apply within social and geographical systems. Its popularity has been cemented by increases in computer processing power and data storage, along with developments in computer-programming languages and easily accessible frameworks that enable the rapid development of models with minimal programming experience.
ABM has now reached a point of acceptance as a research tool across the geographical and social sciences, with numerous journal articles and books dedicated to applications and developments (see, e.g., Batty, 2005; Benenson and Torrens, 2004; Gimblett, 2002; Heppenstall et al., 2012; Railsback and Grimm, 2011). Furthermore, the recent emergence of big data is beginning to have a large impact on ABM (Heppenstall et al., 2016). The proliferation of novel, high-resolution, individual-level data sources provides ABM with an opportunity to address some of its more critical methodological issues, namely the construction of accurate individual-level behavioral rule sets and robust model calibration and validation. Whether this opportunity is taken up remains to be seen. What is clear is that researchers now have the data and tools at their disposal to examine geographical systems in unprecedented individual-level detail, thus creating new knowledge and understanding about how these systems evolved and what the consequences of future individual behaviors are likely to be.

This article presents an overview of how ABM is being utilized for the simulation of geographical systems. We have attempted to be as comprehensive as possible given the space available, but will inevitably have omitted important aspects. We have attempted to find a balance by including what we consider to be the key elements of ABM with respect to geographical systems and providing extensive references for interested readers to follow up on any of our discussion. We begin in section "The Rise of the (Automated) Machines" by providing a brief overview of the development of agent-based models, before moving on to defining agents in section "What Is ABM?" and discussing how they can be developed to incorporate human behavior. Section "Steps to Building an Agent-Based Model" walks through the process of designing and building agent-based models, including steps for verification, calibration, and validation. For geographers, the influence of space on agents is one of the key factors that we need to account for; rationale and guidance on integrating agent-based models with geographical information systems (GIS) are given in section "Integrating GIS and Space Into Agent-Based Models", along with a range of applications where agent-based models have been applied. Finally, we discuss the main challenges for ABM in section "Challenges and Opportunities" before offering concluding thoughts in section "Conclusion".

1.16.2 The Rise of the (Automated) Machines

ABM owes much of its unique character to ideas and concepts borrowed from other disciplines. For example, one of the key strengths of ABM is the ability of individual agents to be autonomous, that is, to have full control over their future decisions. This part of an ABM's DNA owes much to early work on digital computers, such as Lovelace's (1843) and Turing's (1936) work on the computability of mathematics and von Neumann's (1951) early work on computer design. As Torrens (2000) points out, the intelligence that is endowed on agents, such as the ability to reason, can also be seen as drawing on ideas from cybernetics (Wiener, 1948) and intelligent machines (Turing, 1950).


Agent-like models first began to appear in the academic literature in the 1960s and 1970s. Prior to this, the most common approaches to simulating and gaining new insight into geographical systems were through more established mathematical and statistical models such as system dynamics, spatial interaction (e.g., Fotheringham and O'Kelly, 1989), or diffusion models (Hagerstrand, 1967). In spatial interaction models, for example, large diverse groups of people were treated as one homogeneous (aggregate) group and all given the same behavior and movements (Wilson, 1974). The main criticism leveled at these approaches was that the rich variability inherent within many data forms was lost through the process of aggregation. While these approaches can and have been used to successfully predict macro-level patterns of behavior (e.g., Batty, 1976; Birkin et al., 1996), they struggle to give any insight into why these behaviors appeared and how they might manifest themselves in the future. Perhaps most importantly for geographers wishing to understand the impact of individual behavior, these methods are incapable of simulating individuals, their interactions, and the resulting consequences.

The first automata models that focused on modeling geographical phenomena came from the area of cellular automata (CA). The basic features of a CA have been well documented within the literature (see Wolfram, 2002 for a thorough overview). Essentially, a CA is a discrete dynamic system whose behavior is specified in terms of local relationships, with space represented as a grid (i.e., a lattice) containing cells of equal size. The cell state (e.g., a value of either 0 or 1) is determined by the state of its neighbors, by a set of local rules, and by the cell itself (see Benenson and Torrens, 2004; Iltanen, 2012; Wolfram, 2002), as illustrated in Fig. 1. From a geographical perspective, the first notable uses of CA were for modeling urban growth and land use (e.g., Nakajima, 1977; Tobler, 1979). Due to its relative ease of implementation, CA modeling remains a popular choice for simulating large-scale urban phenomena such as urban sprawl (Al-Ahmadi et al., 2009; Clarke et al., 1997; White and Engelen, 1993).

Related to CA is microsimulation. With its origins in economics (Orcutt, 1957), this approach has been widely used to study the impact of social, administrative, or environmental changes on populations. The methodology is commonly used to create small-area microdata such as individual-level synthetic populations. Fig. 2 shows how individual-level data, such as the Sample of Anonymised Records (a sample of individual-level data from the UK Census) for a given spatial area, are used to create a representative synthetic population of that area.

Fig. 1 Diagrammatic representation of 2D CA and the most commonly used neighborhoods: (1) von Neumann 1-neighborhood, (2) Moore 1-neighborhood, and extended (3) von Neumann and (4) Moore neighborhoods.
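To make the notion of a local rule concrete, the following is a minimal sketch (our illustration, not code from the sources cited above) of a binary 2D CA on a wrapped lattice using the Moore 1-neighborhood; the particular rule shown (birth when exactly three neighbors are live, survival on two or three) is Conway's well-known Game of Life:

import numpy as np

def step(grid):
    """One synchronous update of a binary CA on a 2D lattice: each cell's
    next state depends only on itself and its Moore 1-neighborhood."""
    # Count live neighbors by summing the eight shifted copies of the grid;
    # np.roll wraps at the edges, giving a toroidal (edge-free) lattice.
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Local rule (here, Conway's Game of Life): birth on 3, survival on 2 or 3.
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(int)

rng = np.random.default_rng(42)
grid = rng.integers(0, 2, size=(20, 20))  # random initial cell states (0 or 1)
for _ in range(10):                       # run ten generations
    grid = step(grid)
print(grid.sum(), "live cells after 10 steps")

Swapping the single return line for a different rule (e.g., a land-use transition rule) is all that is needed to repurpose the lattice machinery, which is one reason CA models are so straightforward to implement.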

Fig. 2 Schematic outlining the basic process of creating a synthetic population using microsimulation: a population reconstruction model (PRM) combines the SAR (non-spatial, individual level) with census statistics (spatial, aggregate level) to produce a synthetic population (spatial, individual level).


Transition probabilities, such as the likelihood of an individual giving birth, are then applied to each individual unit to generate future scenarios (see Harland et al., 2012 for a discussion of the different methods used to achieve this). As with CA, microsimulation operates at the level of the individual to simulate the global consequences of local interactions while allowing the characteristics of each individual to be tracked over time. Microsimulation has been widely taken up in geography with applications within transport (e.g., MATSim (Horni et al., 2016) and TRANSIMS (Nagel et al., 1997)), population dynamics (e.g., SimBritain, Ballas et al., 2005), and health (e.g., Smith et al., 2011). Wu and Birkin (2012) provide a detailed overview of the diversity of spatial applications of microsimulation (along with noting how such an approach could be leveraged within agent-based models).

While both CA and microsimulation allow the simulation of individuals, they contain several drawbacks. CA models tend to be spatially constrained to a grid and, although each cell can be in a different state, all cells are driven by identical transition rules. This makes it impossible to capture a range of unique individual behavioral traits. Microsimulation only allows the modeling of one-direction interactions: the impact of the policy on the individuals. With microsimulation there is no behavioral modeling capability (as everything happens through transition matrices) and, perhaps most importantly, individuals do not interact with each other (Gilbert and Troitzsch, 2005).

The ability to create individuals with multiple attributes and behaviors who can move freely within a space and interact with other individuals is critical if we are to create tools that allow new insight into the dynamics and processes of geographical systems and to understand the impact of individual patterns of behavior. ABM is a methodology that inherently possesses these key elements. It arrived at a time when computing power and storage were rapidly increasing, along with new developments in object-orientated computer-programming languages that were well suited to the development of agent-based models. It is at this point in the 1990s that ABM trickled into the social sciences, most notably through the work of Epstein and Axtell (1996), who demonstrated how agent-based models could be extended from modeling people to growing entire artificial societies, an area that they termed Generative Social Science. Because ABM generates emergent phenomena from the bottom up, it raises the issue of what constitutes an explanation of such a phenomenon. According to Epstein and Axtell (1996):

[ABM] may change the way we think about explanation in the social sciences. What constitutes an explanation of an observed social phenomenon? Perhaps one day people will interpret the question 'Can you explain it?' as asking 'Can you grow it?'

It is this thinking that has percolated into the ABM community, leading to new ways of examining geographical systems, "not from a traditional modeling perspective but from the perspective of redefining the scientific process entirely" (Bonabeau, 2002). Rather than favoring deduction (testing of assumptions and their consequences) or induction (the development of theories on the basis of observations), there is a third way (Axelrod, 1997): the researcher starts with a set of assumptions, but then employs experimental methods to generate data that can be analyzed inductively (Gilbert and Troitzsch, 2005).

1.16.3 What Is ABM?

As discussed in section "Introduction", the development and acceptance of ABM is the result of several different strands that include the development of automata approaches (e.g., CA) and object-orientated programming languages such as C++. These key elements coevolved with increases in processing power and data storage and the proliferation of large amounts of individual-based data, which have shaped ABM even further. In this section, we examine the components identified within the common definitions of ABM to see if they hold with current ABM design and practice. A particular focus is how these definitions fit with ABMs that are routinely applied to geographical systems; do our commonly held definitions need to shift to ensure that we are able to "facilitate the exploration of system processes at the level of their constituent elements" (Crooks and Heppenstall, 2012)?

Despite the rapid proliferation of ABM, there is surprisingly no universal definition of an agent-based model. While definitions often have several commonalities, normally related to their function (see Table 1 for a description of these), the sheer diversity of applications makes anything more than a loose definition of ABM seemingly impossible (see Macal, 2016). Far from being problematic, this is reflective of the broad framework which has promoted the vastly different applications that can be designed and built (as will be highlighted in section "Integrating GIS and Space Into Agent-Based Models").

One of the most "popular" and early definitions of ABM hails from a computer science perspective. Wooldridge (1997) defined an agent as "an encapsulated computer system that is situated in some environment and that is capable of flexible, autonomous action in that environment in order to meet its design objectives". While easily accessible, Wooldridge's definition is broad in scope, prompting Jennings (2000) to give further clarification, emphasizing that agents should have a specific goal and be embedded within an environment from which they receive inputs. In addition, they should have control (autonomy) over their internal state and behavior that allows them to be reactive and flexible in pursuit of fulfilling their objectives. These two definitions together clearly articulate a view of ABM from a technical perspective, with the emphasis on the DNA of the agents. Other definitions have emphasized the value of agent-based models in providing new knowledge and understanding about the dynamics of the real world through their ability to evolve and interact, allowing unanticipated actions and behaviors to emerge from a computer simulation (Crooks and Heppenstall, 2012). Here, concepts such as emergence and bottom-up modeling dominate the descriptions of the agents.

Table 1 Core components of an agent. Drawn from Wooldridge and Jennings (1995), Epstein (1999), Bonabeau (2002), Macal and North (2005), Crooks et al. (2008), Crooks and Heppenstall (2012), and Torrens (2012)

Autonomy: there is no central control over agents; they are essentially masters of their own fate.
Heterogeneity: each agent can potentially have its own unique attributes and rules.
Explicit space: agents "exist" within some form of space; this can be a physical environment or something more abstract (e.g., a social network).
Interactions: agents can interact with other agents, exchanging information.
Bounded rationality: agents do not have universal knowledge, only that specific to their context.
Reactivity: each agent should be able to proactively respond to changes in its environment.

For Bonabeau (2002), it is the ability of ABM to handle emergent phenomena that sets it apart from other approaches and allows agents to provide a natural description of the system. While including the more technical attributes of agents, these definitions also emphasize the heterogeneity of agents, with individual variations being accounted for by random influences (Helbing and Balietti, 2011). A slightly different perspective is taken by researchers whose ABMs are embedded within spatial systems. Here the emphasis is placed on the ability of agents to move and interact with each other and the environment at the micro (neighborhood) level, producing emergent behavior that can only be revealed by viewing the system from a higher geographical (macro) scale (Crooks et al., 2008; Malleson et al., 2010). For spatially explicit applications, the influence of space is equally important to agent movement and interactions (e.g., Parker et al., 2003; Pires and Crooks, 2017).

Fig. 3 diagrammatically represents some of these core "components." Here, two heterogeneous agents are situated in their own distinct spaces within an artificial world. These spaces could be specific geographical areas which exert their own environmental influences on the agents, for example, the availability of a particular service. Through direct interaction, the agents exchange information, which can lead to the emergence of new knowledge or ideas. This newly "emerged" knowledge may result in the agent reacting and pursuing a new form of behavior/decision making to reach its goal.

To give a simple example of how these characteristics come together within an ABM and how micro-level agent choices lead to macro-level patterns emerging, consider the simple segregation model of Schelling (1971), one of the earliest ABMs. Within this model there are two types of agents (green and yellow), as shown in Fig. 4. Each agent is autonomous and possesses a desire to live in a neighborhood (defined by its eight surrounding cells) with a percentage of neighbors who are identical to itself. In this example, the agent preference for similar neighbors was set at 30% (i.e., at least 30% of an agent's neighbors must be of the same type for the agent to be satisfied). Initially the agents are randomly distributed throughout the environment. Agents move (i.e., take an action) if they are in a situation where their neighborhood preferences are not met (i.e., the condition to move). Over time, agents move to empty areas (the black cells) and segregated neighborhoods emerge at the aggregate level. While it is unsurprising that the level of segregation increases with individual preferences for similar neighbors, it is surprising that communities still self-segregate even when the preference for similar neighbors is relatively low (e.g., 30%). This is illustrated in Fig. 5. The utility of such a model is that it demonstrates that aggregate patterns cannot always be easily discerned from the behaviors of the individual agents: the sum is greater than the parts.

Fig. 3 Schematic illustrating some of the main components of an agent: heterogeneity, interaction, behaviour, emergence, and goal.

Fig. 4 Progression of segregation from the initial state (time T = 0) through T = 9: agents want to live in a neighborhood where 30% of their neighbors are the same color.

Fig. 5 Examples of how changing agents' neighborhood preference (30%, 40%, 50%, and 60%) leads to different patterns of segregation emerging.
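The following is a minimal sketch of this dynamic (our own illustrative reimplementation, not Schelling's original formulation in code): two agent types on a wrapped grid, each satisfied when at least 30% of its occupied neighboring cells contain the same type, with dissatisfied agents relocating to randomly chosen empty cells:

import random

SIZE, THRESHOLD = 20, 0.30  # grid size and preference for similar neighbors
random.seed(1)
# 0 = empty; 1 and 2 are the two agent types (about 20% of cells left empty).
grid = {(x, y): random.choice([0, 1, 1, 1, 1, 2, 2, 2, 2, 0])
        for x in range(SIZE) for y in range(SIZE)}

def satisfied(cell):
    """True if at least THRESHOLD of the occupied neighboring cells
    (Moore 1-neighborhood, wrapping at the edges) match this agent."""
    (x, y), kind = cell, grid[cell]
    neighbors = [grid[((x + dx) % SIZE, (y + dy) % SIZE)]
                 for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    occupied = [n for n in neighbors if n != 0]
    return not occupied or sum(n == kind for n in occupied) / len(occupied) >= THRESHOLD

for sweep in range(50):  # iterate until no agent wants to move (or 50 sweeps)
    movers = [c for c in grid if grid[c] != 0 and not satisfied(c)]
    if not movers:
        break
    for cell in movers:  # each dissatisfied agent moves to a random empty cell
        target = random.choice([c for c in grid if grid[c] == 0])
        grid[target], grid[cell] = grid[cell], 0
print("settled after", sweep, "sweeps")

Running the sketch reproduces the behavior described above: even with the mild 30% preference, the grid settles into strongly clustered, segregated regions.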

While this is a simple example, such mild tastes and preferences for certain neighborhood types have been shown to cause "real" world segregation to emerge (see Benenson et al., 2002). This example also serves to show the importance of being able to capture and represent emergent phenomena in geographical systems. Without the ability to model this property through individuals and bottom-up modeling, we cannot fully understand the dynamics and processes within our systems of interest.

Finally, while there is broad consensus on the core characteristics that an agent should contain, the application area and the research agenda of the developer exert the greatest influence over the overall form of the agent, and thus its definition. For example, Heppenstall et al. (2005) linked an agent-based model to established geographical (spatial interaction) and artificial intelligence (AI) inspired (genetic algorithm) models to handle influences of space and to optimize agent behavior. The agents within this application could equally be described as being hybrid, spatially explicit, or optimizing, highlighting how easily other definitions can be assigned to agent-based models. With the increase in big data and the new insights into individual behavior that they offer, researchers will be able to pose increasingly complex questions about how social systems work and how they will evolve in the future (as will be discussed in section "Challenges and Opportunities"). It is likely that the agent-based models that they build to answer these questions will absorb the characteristics of these new forms of data, thereby generating new essential components within an agent design and definition.

1.16.3.1 Making Agents More Human

While we have provided an overview of the core components of an agent and how the ABM paradigm aligns with simulating individuals in geographical systems, how do we capture and embed the behavior that makes us unique into these models? Humans possess such diverse personality traits, varying levels of knowledge, experience, desires, and emotions (a common set of emotions being interest, joy, happiness, sadness, anger, disgust, and fear (Izard, 2007; Bonabeau, 2002)) that attempting to simulate even a small aspect of this seems a foolish endeavor. Fortunately for the modeler, it is rare that we wish to simulate the full spectrum of human behavior.


Instead, we are often interested in one or two clearly defined aspects of behavior that we believe have a strong influence on the system under investigation. Therefore, in order to simulate human behavior, researchers embed behavioral frameworks into their ABMs (see Kennedy (2012) and Balke and Gilbert (2014) for an overview). These frameworks can be broadly categorized into two areas: mathematical and cognitive. Examples from each category are presented below to demonstrate their utility.

Mathematical approaches range from the use of random number generators to select between predefined choices (e.g., Gode and Sunder, 1993) to the use of threshold-based rules and complex probabilistic rule sets. Due to its ease of implementation, one of the most commonly used approaches is that of threshold-based rules. Threshold rules come into operation when a preset value is exceeded. Depending on the value, this will result in a behavior from a predefined set. For example:

IF <hunger> is below <threshold 1> THEN agent-dies
IF <hunger> is above <threshold 2> THEN address another goal
IF <hunger> is between <threshold 1> and <threshold 2> THEN search-for-food

Adapted from Kennedy (2012).
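In code, such a rule set reduces to a handful of conditionals over an agent's state. The following minimal sketch (our illustration, with hypothetical threshold values on a 0-100 scale) mirrors the rules above:

STARVE, SATED = 10, 70  # hypothetical thresholds on a 0-100 <hunger> scale

def act(agent):
    """Select a behavior from a predefined set using threshold rules,
    mirroring the three IF statements above."""
    if agent["hunger"] < STARVE:
        return "agent-dies"
    if agent["hunger"] > SATED:
        return "address-another-goal"
    return "search-for-food"  # hunger lies between the two thresholds

print(act({"hunger": 40}))  # -> search-for-food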

Examples of the use of this approach for geographically explicit models can be found in Heppenstall et al. (2005) and Crooks (2010). While such rule sets are able to broadly simulate behavior, one of the main criticisms leveled at this approach is that it cannot easily account for the multiple components of human behavior; such rules should therefore only be used when behavior can be well specified.

An alternative approach that can readily handle more complex behavior comes from the category of cognitive frameworks. Perhaps the most popular architecture used is the Beliefs–Desires–Intentions (BDI) modeling framework (Bratman et al., 1988; Rao and Georgeff, 1991). This architecture has been used in several areas, including air traffic management systems (Rao and Georgeff, 1995), simulations of geopolitical conflicts (Taylor et al., 2004), land-use planning (Caillou et al., 2015), and frameworks for models of crime reduction (Brantingham et al., 2005a,b). Despite its uptake, the BDI framework has been criticized by some as too restrictive and by others as overly complicated (Rao and Georgeff, 1995). Fundamentally, the framework assumes rational decision making, which is difficult to justify because people rarely meet the requirements of rational choice models (Axelrod, 1997). Brailsford and Schmidt (2003) see the restriction of the framework to cognitive processes as a limitation: the BDI framework cannot integrate physical, emotional, or social processes or the interactions between them. Balzer (2000) also notes that the core elements are difficult to observe directly: observation can only be achieved in a laboratory setting, which is unlikely to relate to real situations.

A cognitive framework that can overcome these limitations is the PECS framework (physical conditions, emotional states, cognitive capabilities, and social status). Proposed by Schmidt (2000) and Urban (2000), this architecture states that human behavior can be modeled by taking into account physical conditions, emotional states, cognitive capabilities, and social status. The framework is modular, in the sense that it allows the modeler to separate the components that control each aspect of the agents' behavior (Martínez-Miranda and Aldea, 2005). To illustrate the PECS framework's features, an example proposed by Urban (2000) is adapted here. Consider a person in a shop who is contemplating purchasing some goods. They might experience physical needs (such as hunger), emotional states (such as surprise at the available goods), cognition (such as information about current prices), and social status (which will, e.g., affect how the agent reacts to the shop assistant). Schmidt (2000) and Urban (2000) argue that every aspect of human behavior can be modeled using these components although, depending on the application, it might not be necessary to incorporate all of them (Schmidt, 2002). Recent examples of successful integration of this approach into geographically explicit agent-based models include those of Malleson et al. (2010) and Pires and Crooks (2017); a minimal sketch of the idea is given below.

While these frameworks are representative of the two broad approaches that modelers use for simulating behavior in agents, the number of alternative architectures is rapidly increasing (e.g., Gigerenzer and Goldstein, 1996). However, regardless of the complexity of these frameworks, they are equally reliant on access to detailed individual-level data and rigorous calibration and validation for their results to be valid.
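As noted above, the following sketch (our own illustration, loosely following Urban's (2000) shopping example rather than any published code) shows how the four PECS modules might be kept separate within an agent, with behavior selected by consulting each in turn:

class PECSAgent:
    """Toy shopper whose behavior draws on four separable PECS modules."""

    def __init__(self):
        self.physical = {"hunger": 0.8}          # physical conditions
        self.emotional = {"surprise": 0.2}       # emotional states
        self.cognitive = {"known_price": 3.50}   # cognitive capabilities
        self.social = {"status": "regular"}      # social status

    def decide(self, shelf_price):
        # Each module contributes to (or vetoes) the purchase decision; a full
        # PECS model would also give each module its own internal dynamics.
        if self.physical["hunger"] < 0.5:
            return "leave shop"                  # no physical need to buy
        if shelf_price > self.cognitive["known_price"] * 1.5:
            self.emotional["surprise"] = 1.0     # cognition updates emotion
            if self.social["status"] == "regular":
                return "complain to shop assistant"
            return "leave shop"
        return "buy goods"

print(PECSAgent().decide(shelf_price=4.00))  # -> buy goods

The modularity is the point: the physical, emotional, cognitive, and social components can each be refined, or omitted, without rewriting the others (Schmidt, 2002).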
How we calibrate and validate these models, as well as an overview of the data that we can use, is presented in the following sections.

1.16.4 Steps to Building an Agent-Based Model

… although ABM is technically simple, it is also conceptually deep. This unusual combination often leads to improper use …

Bonabeau (2002).

ABM is an extremely powerful methodology. As shown in earlier sections, and as will be demonstrated in those that follow, it is technically possible to incorporate a vast array of complicated individual behavioral frameworks, voluminous interactions, complex psychology, intricate (spatial) environments, and many other advanced modeling features that are typically extremely difficult to account for when using aggregate modeling methods. However, this flexibility is tempered by the risk of creating overcomplicated models. Models that have been poorly designed or inadequately validated have the potential to be no easier to understand than the underlying system that they are attempting to simulate (Crooks et al., 2008).


To mitigate these risks, the ABM community has developed a suite of approaches that can inform the design, implementation, and testing of models to make them more robust. These include innovative means of validating models (e.g., pattern-oriented modeling; Grimm et al., 2005); standard approaches to the design and documentation of models, for example, the Overview, Design concepts, and Details (ODD) protocol (Grimm et al., 2006, 2010), which has recently been extended to incorporate more options for describing human decision making (ODD+D; Müller et al., 2013); the application of computationally efficient methods to explore large parameter spaces (e.g., genetic algorithms; Malleson et al., 2009); and many others. This section will discuss some of these methods in order to illustrate best practice in building reliable agent-based models.

Fig. 6 presents an overview of the typical model design process. It is important to note that this is not the only process that a modeler might use in order to create their model. As with many features of ABM, there are a multitude of ways to approach the task of building a model. For more details, the interested reader should refer to any of the excellent ABM textbooks that are available, such as Wilensky and Rand (2015) or Railsback and Grimm (2011); for steps in modeling within the social sciences more generally, see Gilbert and Troitzsch (2005). The remainder of this section will discuss each step in more detail: formulating a research question and designing an appropriate model (section "Preparation and Design"); implementing the model using a chosen tool or suite of tools (section "Model Implementation"); evaluating the robustness and accuracy of the model (section "Evaluating a Model"); and finally the challenges that spatial models pose and how they can be overcome.

1.16.4.1 Preparation and Design

The first step in the modeling process is to define the research question (i.e., what element of the real world are you interested in?). What, specifically, will be the aim and purpose of the model? At this stage, it is extremely important to decide whether ABM is a suitable approach. Although ABMs are potentially extremely powerful, they are also much more complicated to create than aggregate modeling methods, such as regression. This is partly due to the immaturity of the available tools (software packages such as SPSS allow the user to run a multitude of mathematical models at the push of a button; equivalents do not exist for ABM) but also because the models themselves are considerably more complicated.

Fig. 6 An overview of the typical modeling process: design and preparation (preparing to model, model design), building the model (model implementation), model evaluation (verification, calibration, validation), and running the model and understanding insights (explanation, prediction).


It is therefore important to critically assess the target system under study, understand which factors are the most important drivers of the system, and therefore decide whether the advantages offered by ABM outweigh the additional difficulties. As an example, consider a model of air moving over the wing of an aircraft. The system consists of individual entities (air particles) and its aggregate properties (i.e., air motion, pressures, etc.) are dependent on the interactions between discrete particles. In this sense, it appears to be an ideal candidate for an ABM. However, in this case the behavior of the particles is effectively homogeneous and the system can therefore be modeled adequately with aggregate equations; there is no benefit to modeling the individual particles. That said, there are many systems in which the complex interactions that govern the real-world behavior make analytical solutions impractical or impossible (Bonabeau, 2002).

If the use of ABM is considered to be an appropriate approach, then the first design decision regards the level of abstraction of the model, which depends on its purpose. Take the example of a map: the level of detail portrayed is dependent on the map's purpose. For instance, a map intended for driving is generally less detailed than one for hiking, as the purpose is different (Miller and Page, 2007). This is the same for modeling, and there are typically two categories that a model will fall under: predictive or explanatory. With predictive modeling, the aim is to simulate a system with a degree of realism such that the results can be used empirically. A predictive agent-based model of a social phenomenon (such as crime, traffic, protests, etc.) might, for example, include a realistic representation of the underlying environment that allows the model to make predictions about the future state of the real world directly. Explanatory modeling, on the other hand, is typically concerned with refining the theoretical explanations of a phenomenon. Explanatory models are often greater abstractions away from the real world than their predictive counterparts. The Schelling (1971) segregation model outlined in section "What Is ABM?" is an excellent example of an explanatory model: it attempts to better understand a phenomenon but does not directly provide guidance about where or how the real system can be manipulated to reduce segregation.

Having broadly "prepared" for modeling by deciding on the approximate level of abstraction, the process in Fig. 6 suggests that the next stage is to design the model. Wilensky and Rand (2015) describe this as "top-down" modeling: the designer comprehensively plans the characteristics and behaviors of the agents, designs the environment, and defines the possible interactions. However, an equally appropriate approach might be to begin implementing the model immediately, such that the design and implementation coevolve. In practice, most modelers will use some combination of approaches; for example, a broad design with specifics resolved during implementation.

Before discussing the implementation specifically, it is worth highlighting two important developments. The first is the ODD protocol (Grimm et al., 2006, 2010). The protocol formalizes the approach to documenting models with the aim of making it easier for others to understand and reproduce them. It is organized around the three main components that need to be documented: Overview, Design concepts, and Details. Modelers should be aware of the protocol so that they can choose whether to make use of some or all of it when documenting their own models. The second development concerns the level of complexity of a model. It was highlighted earlier that a poorly designed agent-based model will be no easier to understand than the target system (i.e., the real world) that it is attempting to model. Although most people agree that model complexity is not appropriate unless it is justified by the target system, opinions as to how to reach the "appropriate" level of complexity are polarized. The "Keep It Simple, Stupid" (KISS) argument (e.g., Axelrod, 1997) posits that models should be as simple as possible initially, with additional complexity added only if the model is unable to appropriately represent the system in its simplest form. The "Keep It Descriptive, Stupid" (KIDS) approach (e.g., Edmonds and Moss, 2004), on the other hand, is to start with a model that reflects the evidence and knowledge about the target system, however complex that makes the model, and then iteratively remove features that appear to be unnecessary. Recent KIDS work has also explored the stages of this abstraction process in detail (Lafuerza et al., 2016).

1.16.4.2 Model Implementation

As noted earlier, implementing an agent-based model is usually considerably more involved than alternative, traditional modeling approaches. This is partly because there are no standard models that can be used "off the shelf": every ABM is different and, given the complexity of the underlying systems that agent-based models attempt to simulate, a generic model would not work. However, in the last decade a number of toolkits have emerged that substantially reduce the difficulty of implementing models (these will be discussed in detail in section "ABM Toolkits"). That said, using an existing toolkit is not essential, and there can be significant advantages to building a model from the ground up using conventional (object-oriented) programming techniques if a modeler has sufficient programming knowledge and experience. The advantage of using existing tools, rather than starting from scratch, is that the common elements that models require, such as graphics libraries, common algorithms, data input/output procedures, and analysis tools, need not be re-implemented. Section "ABM Toolkits" reviews a number of toolkits that have been designed to support the implementation of agent-based models. Although most share similar features, some are easier to use (particularly for developers without any programming experience) and others are potentially more powerful (guidelines for assessing such frameworks are discussed in section "ABM Toolkits"). For example, Repast Simphony includes a High Performance Computing extension (see Collier and North, 2013) that allows models to be distributed over grids of connected computers, but this advanced feature can only be leveraged by using the C++ language. As part of the design process, it will be worthwhile to consider how complex and computationally expensive the final model will be, and whether the additional difficulty in learning a more advanced tool will be offset by the advantages in performance and flexibility.

1.16.4.3 Evaluating a Model

As illustrated in Fig. 6, it is very unlikely that the model-building process ends once the model has been implemented. It is much more likely that insights gained during the process of evaluating a complete model will cause the designer to repeatedly return to the initial model design to add refinements that require subsequent implementation. Far from being a burden, this process actually highlights one of the significant advantages of ABM over other methodologies. As the model is developed, the researcher has an opportunity to test their understanding of how the target system works. An ABM will rarely behave exactly as expected, so the model itself and our understanding of the phenomenon that it is modeling have a chance to coevolve. It is nevertheless vital to properly evaluate models because, as the famous quote from Box (1979) suggests: "all models are wrong but some are useful." If models are to be useful as tools to explore the real world, then they need to replicate the behavior of their target system to a certain level of accuracy. The degree to which models are able to do this is often termed their validity, that is, the extent to which the model is able to represent the system it is attempting to simulate (Casti, 1997). While perfection is neither possible nor desirable (a perfect model ceases to be a model at all), it is important to understand the levels of uncertainty in a simulation. However, robust evaluation is a step that is often overlooked or given minimal attention. There is no formal methodology for evaluating agent-based models, but the community has largely settled on a standard process that consists of verification, calibration, and validation (although naturally the terms differ across authors).

1.16.4.3.1 Verification

Broadly, verification refers to the process of ensuring that the model implementation corresponds to the model design. In other words, has the model been programmed correctly? This can be thought of as "internal validity" (Axelrod, 1997). One way to increase the reliability of the model is to make use of techniques that computer programmers have developed to support the writing of accurate code and to test for errors (see Balci, 1996). Unit testing, for example, is a common approach whereby every part of the codebase (the individual "units") is tested. A more rigorous, although probably more time-consuming, approach is docking (Axtell et al., 1996). Docking refers to the process of creating a second model separately from the first in order to ascertain whether similar results can be replicated from both. Ideally, the two models are created in different languages and by different developers.
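For instance, a minimal unit-test sketch (our illustration, assuming a hypothetical satisfied rule such as the Schelling neighborhood rule discussed earlier) checks that an agent rule behaves as designed at its boundary conditions:

import unittest

def satisfied(similar, total, threshold=0.30):
    """Design: an agent is satisfied when at least `threshold` of its
    occupied neighboring cells contain agents of the same type."""
    if total == 0:
        return True  # design choice: isolated agents are satisfied
    return similar / total >= threshold

class TestSegregationRule(unittest.TestCase):
    def test_boundary(self):
        # Exactly 30% similar should satisfy the agent (>=, not >).
        self.assertTrue(satisfied(similar=3, total=10))

    def test_below_threshold(self):
        self.assertFalse(satisfied(similar=2, total=10))

    def test_no_neighbors(self):
        self.assertTrue(satisfied(similar=0, total=0))

if __name__ == "__main__":
    unittest.main()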

1.16.4.3.2 Calibration

Having verified the model implementation (i.e., determined that the implementation accurately captures the design), it is possible to calibrate the model. In effect, calibration is the process of adjusting model parameters so that the behavior of the model closely matches some observed (often historical) conditions (O'Sullivan, 2004). This is also known as "fitting" the model to some observed data (Gilbert and Terna, 2000). Calibration is necessary because the theory and empirical evidence that were used to create the general structure of the model are usually insufficiently precise to allow for detailed parameterization. For a hypothetical example, consider an agent rule that determines the conditions under which an agent who is taking part in a marathon will give up and drop out of the race. Theory might suggest that the probability of dropping out (Pdrop out) is influenced in part by levels of energy and in part by psychological motivation:

Pdrop out = x1 · energy + x2 · motivation

While that theory might be reliable, it is insufficiently precise to tell us how important each of those factors is in the overall decision. Hence calibration is required to find the most appropriate values for x1 and x2. In other words, the rules that drive the behavior of the agents might be correct, but we might not have enough information to implement them precisely.

Calibration is performed by comparing the behavior of the agent-based model to some data that describe the real system. Typically, the approaches are either quantitative or qualitative. Quantitative calibration involves computing the difference between the simulation outputs and real data and calculating a "fitness" value (i.e., a number that represents the similarity between the two datasets); a minimal sketch is given below. If comparable data are not directly available, it is common to use "stylized facts" (e.g., Heine et al., 2005). As an example, rather than comparing the routes taken by agents in a transport model to real data, one could assess the success of the model on how well it simulated the overall commute time (a very simplified representation of the route taken). Alternatively, qualitative calibration, also known as "face calibration" or "face validity" (Gravetter and Forzano, 2011), consists of exploring the model results and using human intuition to assess their similarity to the real world. With face calibration of spatial models, GIS and spatial data visualization are important components, as the success of the calibration might rely on the quality of the maps used to compare the two datasets. These difficulties are discussed in more detail later.
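Continuing the marathon example, the sketch below (our illustration, using made-up "observed" drop-out probabilities) calibrates x1 and x2 by brute-force grid search against a sum-of-squared-errors fitness; smarter search schemes, such as the genetic algorithms mentioned earlier, scale far better for large parameter spaces:

import itertools

# Hypothetical observed drop-out probabilities for runners with known
# (energy, motivation) states; real calibration would use historical data.
observed = [((0.2, 0.9), 0.35), ((0.8, 0.4), 0.30), ((0.1, 0.1), 0.85)]

def p_drop_out(energy, motivation, x1, x2):
    # The theorized rule, clamped to a valid probability.
    return max(0.0, min(1.0, x1 * energy + x2 * motivation))

def fitness(x1, x2):
    """Sum of squared errors between model and observation (lower is better)."""
    return sum((p_drop_out(e, m, x1, x2) - p) ** 2 for (e, m), p in observed)

# Exhaustive search over a coarse grid of candidate (x1, x2) pairs.
candidates = [i / 10 for i in range(-10, 11)]
best = min(itertools.product(candidates, candidates), key=lambda xs: fitness(*xs))
print("best (x1, x2):", best, "fitness:", round(fitness(*best), 4))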

1.16.4.3.3 Validation

Calibrating a model to real-world data runs the risk of overfitting the model to the data. A model that has been overfitted will not generalize well to contexts other than the one it was calibrated on. Therefore, validation completes the process of evaluating a model by testing it on some new data. The aim of validation is to demonstrate that the model is sufficiently accurate given the context of the system that it is attempting to simulate. If the model is able to reliably replicate real-world conditions that it has not been "fitted" to (i.e., calibrated on), then we can be reasonably confident that it can be applied to new scenarios and used to explore potential futures. Typically the methods used to assess the success of validation (i.e., model fitness) are the same as those used during the calibration stages. For interested readers, Axtell and Epstein (1994) offer guidelines for validating agent-based models depending on their purpose.


The authors categorize these purposes on a scale ranging from models that portray a caricature of the agents' behavior to those that attain quantitative agreement with both emergent macro-structures and individual agents' micro-behavior. An ongoing problem, however, is how to obtain high-resolution data to support validation.

1.16.4.3.4 Difficulties in evaluating spatial models

It is important to stress that assessing how well an agent-based model represents the underlying system is usually extremely challenging. One of the reasons that it can be so difficult is that, to properly evaluate a model, it must be examined at multiple hierarchical levels. For example, in the segregation example presented in section "What Is ABM?", to properly evaluate the reliability of the model in comparison with real-world data, it would be necessary to look at more than just the overall degree of segregation. One might examine the decision process of the agents before and after moving, or compare the actual moves themselves to real data. Fortunately, frameworks are emerging to support the complicated process of validating complex models by examining model behavior at numerous hierarchical scales. Pattern-oriented modeling (POM; Grimm et al., 2005) is arguably the most well-known and widely cited example. POM advocates the comparison of numerous patterns produced by models with their counterparts in real (nonsimulated) data. Commonly in ABM, this involves analyzing model outcomes at multiple spatial and temporal scales.

To further complicate matters, it can be very difficult to quantify the difference between two spatial patterns (i.e., modeled and observed), even if data are available. Face validation, as discussed earlier, is a common means of comparing spatial patterns, but one that is subjective (one person might see similar patterns where another would not). Instead, spatial statistics can be used to provide quantitative assessments of the similarity between simulated and observed data. For example, the Nearest Neighbor Index (Clark and Evans, 1954) and Ripley's K (Ripley, 1977) both quantify the degree of clustering in point datasets (for a comprehensive review of such statistics, the interested reader is directed to O'Sullivan and Unwin (2010)). Statistics that are not inherently spatial can also be applied if the data are aggregated to some boundary areas (e.g., administrative areas or a regular grid). Statistics such as R2 and the standardized root mean square error (SRMSE) have been found to give reliable error assessments for spatial models (Knudsen and Fotheringham, 1986). A further advantage of performing validation using statistics, beyond the quantitative rigor introduced, is that algorithms can be used to automatically parameterize models without the need for human intervention. Routines such as simulated annealing and genetic algorithms have been used to successfully parameterize complex agent-based models (e.g., Malleson et al., 2009).

The remaining hurdle to overcome in the evaluation of spatial agent-based models is that of data. As already noted, data are required at different scales in order to properly evaluate a complex model (Grimm et al., 2005). It is not sufficient to evaluate the model outcomes at an aggregate level in isolation; rather, it would be preferable to have individual-level data that can be used to evaluate the behavior of individual agents as well (thus being confident that we are not only capturing the emergent macro-structures but also the individual agents' micro-behavior). Traditionally it has been very hard to find good-quality, high-resolution data for these purposes. However, the emergence of "big" data and the associated "datafication" of previously undocumented aspects of peoples' everyday lives (e.g., moods, thoughts, activities, feelings) through sources like social media (among others) have led to the proliferation of sensitive individual-level data (Mayer-Schönberger and Cukier, 2013) that have the potential to transform the quality of agent-based models. Section "Challenges and Opportunities" will discuss this emerging trend and the opportunities it offers agent-based modelers in more detail.
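As a concrete illustration of the quantitative comparison described above, the following minimal sketch (our illustration) computes the SRMSE between observed and simulated counts aggregated to a regular grid:

import numpy as np

def srmse(observed, simulated):
    """Standardized root mean square error between two aggregated spatial
    datasets (e.g., counts per grid cell): the RMSE divided by the mean of
    the observed data, so 0 indicates a perfect match."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    rmse = np.sqrt(np.mean((observed - simulated) ** 2))
    return rmse / observed.mean()

obs = np.array([[10, 4], [0, 6]])  # e.g., observed counts in four zones
sim = np.array([[8, 5], [1, 6]])   # model output aggregated the same way
print(round(srmse(obs, sim), 3))   # -> 0.245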

1.16.5 Integrating GIS and Space Into Agent-Based Models

Consideration of space is often integral to the success of agent-based models. For example, in the Schelling model presented in section "What Is ABM?", an agent's decision making is directly impacted by its neighbors: if the agent is dissatisfied with its current location (based on the mix of neighbors), it can move to an empty cell, thus directly impacting the landscape. Considering how agents can react to each other and change their environment, we would argue that ABM has great relevance to many geographical problems (as discussed in sections "The Rise of the (Automated) Machines" and "What Is ABM?"), and there has been growing interest in the integration of GIS and ABM (e.g., Benenson and Torrens, 2004; Gimblett, 2002; Heppenstall et al., 2012). This interest arises because such integration allows agent-based models to be related to actual geographical locations, thereby explicitly incorporating space into the model design. Furthermore, it allows modelers to think about how objects or agents and their aggregations interact and change in space and time (Batty, 2005). For GIS users, it provides the ability to model the emergence of phenomena through individual interactions of features within a GIS over time and space. This last point is a movement away from the traditional focus of GIS, where more attention was paid to spatial representation, often ignoring temporal representations (Peuquet, 2002). Moreover, from the perspective of understanding geographical systems, this linkage is highly appealing in the sense that while GIS provides us with the ability to monitor the world, it provides no mechanism to discover new decision-making frameworks, such as why people have moved to new areas (Robinson et al., 2007). Through the integration of GIS and ABM we can capture both.

The simplest way to visualize the integration of geographical data and ABM is by taking the view most GISs take of the world, as shown in Fig. 7. The complexity of the world is abstracted away and represented as a series of layers (such as the physical environment and the built environment). These layers form the environment of the artificial world that the agents inhabit; they can act as boundaries for our simulations, while fixed layers such as roads provide a means for agents to move from A to B, and houses provide them with a place to live. Aggregate spatial data also allow for model validation (as discussed in section "Evaluating a Model"): for example, are the land-use patterns we see emerging from a model of urban growth matching those of reality? If they do, it provides us with an independent test of the micro-level processes encoded within the model.


Fig. 7 Abstracting from the "real" world into a series of layers (physical environment layers, built environment layers, and agents with their behavioral rules) to be used as the artificial world upon which to base the agent-based model.

1.16.5.1 Coupling and Embedding GIS and Agent-Based Models

The question faced by many modelers is how to go about integrating geographical data into models, as many traditional GIS platforms are not capable of representing continuous data over time and space. This has led to modelers either linking (coupling) GIS and ABM or embedding GIS into the ABM or vice versa (see Crooks and Castle, 2012 for a detailed discussion).

Coupling can be broadly defined as the linkage of two stand-alone systems by data transfer. Westervelt (2002) identifies three coupling approaches (loose, moderate, and tight) with respect to GIS and ABM. Loose coupling involves the asynchronous operation of functions within each system, with data exchanged between systems in the form of files. For example, the GIS might be used to prepare inputs, which are then passed to the modeling system, where after execution the results of the model are returned to the GIS for display and analysis (e.g., Crooks, 2010); a minimal sketch of this pattern is given at the end of this section. This approach requires the GIS and modeling system to understand the same data format (e.g., ESRI shapefiles). At the other extreme is tight coupling, which can be characterized by the simultaneous operation of systems allowing direct intersystem communication during program execution (e.g., Leavesley et al., 1996; Benenson et al., 2005). For example, standards such as Microsoft's COM and .NET allow a single script to invoke commands from both systems (Ungerer and Goodchild, 2002). In the middle is moderate coupling, which essentially encapsulates techniques between loose and tight coupling; for example, remote procedure calls and shared database access between the GIS and modeling system allow indirect communication between the systems (e.g., Harper et al., 2002). For the pros and cons of the different coupling approaches, the reader is referred to the review by Westervelt (2002).

Traditionally, coupling has often been the preferred approach for linking GIS and modeling systems. However, this has tended to result in very specialized and isolated solutions, which have prevented the standardization of general and generic linkage. An alternative to coupling is to embed or integrate the required functionality of either the GIS or the modeling system within the dominant system using its underlying programming language (Maguire, 2005). The final system is referred to as either GIS-centric or modeling-centric, depending on which system is dominant.


In both instances, the GIS tools or modeling capabilities can be executed by calling functions from the dominant system, usually through a graphical user interface (GUI). Compared to coupling, an embedded or integrated system will appear seamless to the user (Maguire, 1995).

Interest in modeling-centric systems has increased considerably over recent years, predominantly due to the development of modeling toolkits with scripting capabilities that do not require advanced computer-programming skills (Castle and Crooks, 2006; Gilbert and Bankes, 2002). Often the modeling toolkit can access GIS functions, such as data management and visualization capabilities, from a GIS software library. For example, the MASON (Multi Agent Simulation Of Neighbourhood) toolkit (see section "ABM Toolkits") exploits functions from GeoTools (a Java GIS software library) for importing and exporting data, the Java Topology Suite (JTS) for data manipulation, and its own GUI for visualization. The toolkit itself maintains the agents and environment (i.e., their attributes), using identity relationships for communication between the different systems. Functions available from GIS software libraries reduce the development time of a model and are likely to be more efficient, because they have been developed over many years with attention to efficiency. Additionally, the use of standard GIS tools for spatial analysis improves the functional transparency of a model, as it makes use of well-known and understood algorithms (Castle and Crooks, 2006).

Conversely, the GIS-centric approach is an attractive alternative, not least because the large user base of some GIS expands the potential user base for the final model. Analogous to the modeling-centric approach, GIS-centric integration can be carried out using software libraries of modeling functions accessed through the GIS interface. There are many examples of modeling systems integrated within commercial GIS, including the Consequences Assessment Tool Set (Kaul et al., 2004), designed for emergency response planning; the Hazard Prediction and Assessment Capability (DTRA, 2001) system, for predicting the effect of hazardous material releases into the atmosphere; and the NatureServe Vista (2016) system, for land-use and conservation planners. There are, however, few GIS-centric implementations from an ABM perspective; one such example is Agent Analyst for ArcGIS (see Johnston, 2013).
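To make the loose-coupling pattern concrete, the following minimal sketch (our illustration; it assumes the geopandas Python library for shapefile input/output, a hypothetical roads.shp prepared in the GIS, and a stand-in run_model function in place of a real ABM) shows the file-based exchange described above:

import geopandas as gpd  # a common Python library for reading/writing shapefiles

def run_model(roads):
    """Stand-in for the agent-based model: here it merely attaches a
    made-up 'footfall' score to each road segment."""
    roads = roads.copy()
    roads["footfall"] = range(len(roads))  # hypothetical model output
    return roads

# 1. Input prepared in the GIS and exported as a shapefile (hypothetical path).
roads = gpd.read_file("roads.shp")
# 2. The model runs as a separate, asynchronous step; the two systems
#    communicate only through files in a shared format.
results = run_model(roads)
# 3. Results are written back for display and analysis in the GIS.
results.to_file("roads_with_footfall.shp")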

1.16.5.2 ABM Toolkits

Traditionally, building an agent-based model required developing it from scratch. However, over the years several ABM toolkits have been developed that ease the development of agent-based models. Specifically, it has been argued that toolkits reduce the burden modelers face in programming parts of a simulation that are not content-specific (e.g., the GUI, data import–export, visualization/display of the model). Toolkits can also increase the reliability and efficiency of the model, because complex parts have often been created and optimized by professional developers as standardized modeling functions (Castle and Crooks, 2006). However, there are limitations to using toolkits to develop agent-based models. For example: in some instances a substantial amount of effort is required to understand how to design and implement a model in a given toolkit; the programming code of demonstration models or models produced by other researchers can be difficult to understand, poorly documented, or hard to apply to another purpose; a modeler will have to learn, or already have an understanding of, the programming language required to use the toolkit (e.g., Java, C); and finally, the desired/required functionality may not be present, although additional tools might be available from the user community or from other software libraries. Benenson et al. (2005) also note that toolkit users are accompanied by the fear of discovering that a particular function cannot be used, will conflict, or is incompatible with another part of the model late in the development process.

In this section we present an overview of a selection of open-source ABM toolkits that have the capability to process spatial (GIS) data, as shown in Table 2. We list each project's website so that readers can find out more about the current state of development of each toolkit. Our rationale for choosing only open-source toolkits is that their source code is published and made available to the public, enabling anyone to copy, modify, and redistribute the system without paying royalties or fees. A key advantage of open-source toolkits relates to the transparency of their inner workings. The user can explore the source code, permitting the modification, extension, and correction of the system if necessary, thus relaxing the fear of the unknown. This is particularly useful for verifying and validating a model (see Crooks et al., 2008). Readers interested in other platforms or guidelines for choosing an ABM toolkit are referred to reviews by Castle and Crooks (2006), Kravari and Bassiliades (2015), Nikolai and Madey (2009), Railsback et al. (2006), and Robertson (2005), and for a comprehensive list of other ABM toolkits to Wikipedia (https://en.wikipedia.org/wiki/Comparison_of_agent-based_modeling_software).

Swarm could be classed as the original ABM toolkit, designed specifically for the development of simulations of complex adaptive systems (Swarm, 2016). Inspired by artificial life, Swarm was designed to study biological systems, attempting to infer mechanisms observable in biological phenomena (Minar et al., 1996). In addition to modeling biological systems such as fish (e.g., Railsback and Harvey, 2002), Swarm has been used to develop models for anthropology, computer science, ecological, economic, geographical, and political science purposes (e.g., Deadman and Schlager, 2002; Johnson, 2002; Lim et al., 2002). Useful examples of spatially explicit models include the simulation of pedestrians in urban centers (Haklay et al., 2001) and the examination of crowd congestion at London's Notting Hill Carnival (Batty et al., 2003).

MASON is developed by the Evolutionary Computation Laboratory (ECLab) and the Centre for Social Complexity at George Mason University (see Luke et al., 2005). Core functionality includes dynamic charting (e.g., histograms, line graphs) of model output during a simulation. It has strong support for GIS through GeoMASON (Sullivan et al., 2010), which allows GIS vector and raster data to be imported/exported. MASON has a comprehensive set of technical documents and well-commented Javadocs. MASON's how-to documentation, demonstration models, and several publications detailing the implementation and/or application of MASON are available for a prospective modeler to evaluate the system further (see MASON, 2016).

Table 2 A selection of open-source ABM toolkits for creating geographically explicit models

Swarm
Developers: Santa Fe Institute/SWARM Development Group, USA
Date of inception: 1996
Website: http://www.swarm.org/
Implementation language(s): Objective-C/Java
Required programming experience: Strong
Integrated GIS functionality: Yes (e.g., the Kenge GIS library for raster data; see Box, 2001)
Integrated charting/graphing/statistics: Yes (e.g., R- and S-plus statistical packages)
Availability of demonstration models: Yes
Tutorials/how-to documentation: http://www.swarm.org/wiki/Swarm_main_page
Additional information: Minar et al. (1996)

MASON
Developers: Evolutionary Computation Laboratory and Center for Social Complexity, George Mason University, USA
Date of inception: 2003
Website: http://cs.gmu.edu/~eclab/projects/mason
Implementation language(s): Java
Required programming experience: Strong
Integrated GIS functionality: Yes
Integrated charting/graphing/statistics: Yes (e.g., wrappers for JFreeChart)
Availability of demonstration models: Yes
Tutorials/how-to documentation: http://cs.gmu.edu/~eclab/projects/mason/docs/; also Luke, 2015
Additional information: D-MASON: www.dmason.org/; GeoMASON: http://cs.gmu.edu/~eclab/projects/mason/extensions/geomason/

Repast
Developers: University of Chicago, Department of Social Science Research Computing, and Argonne National Laboratory, USA
Date of inception: 2000
Website: https://repast.github.io/
Implementation language(s): Java, Microsoft.Net, Python, Groovy, ReLogo
Required programming experience: Medium to strong
Integrated GIS functionality: Yes
Integrated charting/graphing/statistics: Yes
Availability of demonstration models: Yes
Tutorials/how-to documentation: https://repast.github.io/docs.html
Additional information: Useful weblog: http://crimesim.blogspot.com/; Agent Analyst: http://resources.arcgis.com/en/help/agent-analyst/

NetLogo
Developers: Centre for Connected Learning and Computer-Based Modelling, Northwestern University, USA
Date of inception: 1999
Website: https://ccl.northwestern.edu/netlogo/
Implementation language(s): Proprietary scripting
Required programming experience: Basic
Integrated GIS functionality: Yes
Integrated charting/graphing/statistics: Yes
Availability of demonstration models: Yes
Tutorials/how-to documentation: https://ccl.northwestern.edu/netlogo/resources.shtml; also Wilensky and Rand, 2015
Additional information: NetLogo-R extension: http://r-ext.sourceforge.net/; selection of GIS examples: http://www.gisagents.org/search/label/NetLogo

GAMA
Developers: UMMISCO, France
Date of inception: 2007
Website: http://gama-platform.org/
Implementation language(s): Proprietary scripting: GAMA Modeling Language
Required programming experience: Basic to medium
Integrated GIS functionality: Yes
Integrated charting/graphing/statistics: Yes
Availability of demonstration models: Yes
Tutorials/how-to documentation: http://gama-platform.org/tutorials
Additional information: GAMA GitHub page: https://github.com/gama-platform

Adapted and extended from Parker, D. C., Berger, T. and Manson, S. M. (2001). Proceedings of an International Workshop on Agent-Based Models of Land-Use and Land-Cover Change, Irvine, CA; Castle, C. J. E. and Crooks, A. T. (2006). Principles and concepts of agent-based modelling for developing geospatial simulations. Centre for Advanced Spatial Analysis, University College London, Working Paper 110, London.

Examples of spatially explicit models utilizing MASON's GIS functionality are shown in Fig. 8. Spatial applications of MASON include exploring disease spread (Crooks and Hailegiorgis, 2014), evacuation (Wise, 2014), conflict between herdsmen and farmers in East Africa (Kennedy et al., 2010), and movement across national borders (Łatek et al., 2012), to name but a few. More recently, a distributed version of MASON (D-MASON; Cordasco et al., 2013) was created, which allows MASON models to be run over cluster and cloud-computing architectures.

Repast (Recursive Porous Agent Simulation Toolkit) was originally developed at the University of Chicago and is currently maintained by Argonne National Laboratory. Earlier incarnations of Repast catered for the implementation of models in three programming languages: Python (RepastPy), Java (RepastJ), and Microsoft.Net (Repast.Net) (see Collier and North, 2004; North et al., 2006; Vos and North, 2004 for more details, and for a review of their GIS functionality see Crooks, 2007). These earlier versions have been superseded by Repast Simphony (North et al., 2013), which provides all the core functionality of previous versions but allows models to be developed in several ways, including ReLogo (a dialect of Logo; Ozik et al., 2013), point-and-click statecharts (Ozik et al., 2015), Groovy, or Java.

The Repast development team have provided a series of articles regarding Repast Simphony. The architecture and core functionality are introduced by North et al. (2005a), and the development environment is discussed by Howe et al. (2006). The storage, display, and behavior/interaction of agents, as well as features for data analysis (i.e., via the integration of the R statistics package) and presentation of models within Repast Simphony, are outlined by North et al. (2005b). In relation to the integration of GIS functionality, the reader is referred to the tutorials by Malleson (2012), which demonstrate how to create a virtual city via the importation of shapefiles, create agents, and then move the agents around a road network (this tutorial was used for the creation of Fig. 9A).


Fig. 8 A selection of MASON spatial models. (A) Agents (red) exiting a building based on raster data and the resulting trails (yellow). (B) An urban growth model where red areas represent new developments. (C) A Schelling-type model using census areas in Washington, DC as its spatial environment. (D) Agents (red circles) moving along sidewalks (gray lines).

the importation of shapefiles, the creation of agents, and the movement of agents around a road network (this tutorial was used for the creation of Fig. 9A). Furthermore, within Repast Simphony it is possible to embed spatially explicit agent-based models directly into a 3D GIS display. For this, Repast Simphony provides methods to visualize agent-based models directly in NASA's virtual globe, World Wind. This interactive 3D GIS display allows one to visualize agents alongside satellite imagery, elevated terrain, and other scientific datasets, as shown in Fig. 9B. Repast Simphony also supports the importation of NetLogo models into the Repast framework via ReLogo (Ozik et al., 2013; North et al., 2013). Such functionality aims to allow for the rapid prototyping of agent-based models: first building simple agent-based models in NetLogo and, once satisfied with their basic functionality, migrating and extending them in Repast Simphony (for a comparison of NetLogo and ReLogo, see Lytinen and Railsback, 2012).


Fig. 9 Examples of vector agent-based models in Repast Simphony. (A) Agents (red stars) moving along sidewalks (gray lines). (B) An agent-based model overlaid on NASA World Wind. Source: Repast. (2016). Recursive porous agent simulation toolkit. Available at http://repast.sourceforge.net/ (accessed on 7 Oct. 2016).


Repast Simphony also supports high-performance distributed computing via Repast for High Performance Computing (Repast HPC; see Collier and North, 2013) and has an extension called Agent Analyst that allows users to create, edit, and run Repast models from within ArcGIS (see Johnston, 2013). Useful examples of spatially explicit models created using Repast include studies of segregation and residential and firm location (Crooks, 2006), residential dynamics (Jackson et al., 2008), urban regeneration (Jordan et al., 2014), crime (Malleson et al., 2010), land-use change (Deadman et al., 2004), pedestrian evacuation (Castle, 2007), and disaster management (Mysore et al., 2006).

NetLogo was developed at the Centre for Connected Learning and Computer-Based Modeling at Northwestern University and uses a dialect of the Logo language to create agent-based models. NetLogo has been used to develop applications in disciplines varying from biology and physics to the social sciences (see Wilensky and Rand, 2015). It has extensive how-to documentation/tutorials and demonstration models, which are available from its website, and its functionality can be extended through application programming interfaces (APIs). For example, NetLogo has an R extension allowing for greater statistical analysis of model structure and dynamics (Thiele and Grimm, 2010). Another interesting feature of NetLogo is that it allows for multilevel modeling (via LevelSpace) by connecting several models together. For example, given one model of population growth, another of food production, and another of weather, one can explore how these three systems affect each other (e.g., how bad weather impacts food production and how this in turn affects population growth). Moreover, models developed in NetLogo can also be run on the web via NetLogo Web. NetLogo is simple to use, and it is possible to import both raster (in the form of .asc files) and vector data (shapefiles). This ability opens up a range of possibilities for the easy creation of spatial agent-based models, as shown in Fig. 10; for example, for studying basic concepts of water flow and surface erosion, as shown in Fig. 10A: a raster file of surface elevation is loaded into a NetLogo model, where the agents follow the surface to lower elevations.
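The logic of that rainfall example is compact enough to sketch outside NetLogo as well. Below is a minimal Python illustration (not NetLogo code; the random elevation grid is a hypothetical stand-in for an imported .asc raster) of agents repeatedly stepping to the lowest neighboring cell of a digital elevation model:

```python
# Sketch of raster-based gradient following: "raindrop" agents move to
# their lowest 8-connected neighbor each tick. The random DEM is a
# hypothetical stand-in for a real .asc elevation file.
import numpy as np

rng = np.random.default_rng(42)
dem = rng.random((50, 50))                                    # toy elevation grid
agents = [tuple(rng.integers(0, 50, 2)) for _ in range(100)]  # raindrop positions

def step(pos):
    """Move an agent to its lowest neighboring cell (steepest descent)."""
    r, c = pos
    neighbors = [(r + dr, c + dc)
                 for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                 if 0 <= r + dr < dem.shape[0] and 0 <= c + dc < dem.shape[1]]
    return min(neighbors, key=lambda rc: dem[rc])  # includes pos itself, so
                                                   # agents rest at local minima

for _ in range(25):                                # run the flow for 25 ticks
    agents = [step(a) for a in agents]
```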

Fig. 10 A selection of geographically explicit agent-based models utilizing NetLogo. (A) Rainfall, where rain (blue) falls and flows to a lower elevation based on a digital elevation model, captured in 2D and 3D. (B) Agents (white) moving along sidewalks (orange). (C) A Schelling-type model using census areas in Washington, DC as its spatial environment. (D) Commuting along a road network.

Such functionality potentially lowers the barrier to linking agent-based models and GIS for non-expert programmers. For example, the gradient example presented earlier could be used to model processes that rely on cost surfaces, such as those used in pedestrian movement models (see Crooks et al., 2015a). NetLogo models can also be viewed in a 3D environment, and 3D surfaces can be incorporated into such models, as shown at the bottom of Fig. 10A. Vector data can also be imported and used as the foundation of models: polylines can act as sidewalks or roads for agents to navigate urban environments (Fig. 10B and D), and polygons can act as cells for a segregation type of model (Fig. 10C). Useful examples of spatially explicit models created using NetLogo include the study of gentrification (Torrens and Nara, 2007), residential housing demand (Fontaine and Rounsevell, 2009), and the reimplementation of Axtell et al.'s (2002) artificial Anasazi model by Janssen (2009).

GAMA (GIS Agent-based Modeling Architecture) is the only platform reviewed here that was specifically designed for the development of spatially explicit models ranging from hundreds to millions of agents (Taillandier et al., 2016). It was developed by several teams under the direction of UMMISCO (Unit for Mathematical and Computer Modelling of Complex Systems), France (Amouroux et al., 2007). Its intention was, and still is, to allow for the simple creation of models (akin to NetLogo) but with the ability to carry out experiments and simulations as in MASON and Repast (Grignard et al., 2013). At its time of inception, it was noted that existing toolkits had several weaknesses with respect to creating geographically explicit models, which GAMA aimed to overcome: for example, the complex programming demanded of new modelers using Swarm, the lack of GIS support in MASON at the time, and the difficulty of building complex models in NetLogo (Amouroux et al., 2007). GAMA has its own domain-specific language, GAML (GAma Modeling Language), which is implemented in Java, and its application is based on the rich client platform (RCP) architecture provided by Eclipse. It allows for the importation of shapefiles and OpenStreetMap data along with grid and 3D files (e.g., the .3ds file format), which enables the creation of multilevel, highly visual, and geographically explicit models in 2D and 3D, as shown in Fig. 11. Similar to other platforms, it has built-in charting and statistical functions, including the fuzzy kappa map-comparison statistic (van Vliet et al., 2013), and an integrated BDI cognitive architecture that modelers can use if they wish (Caillou et al., 2015). Spatially explicit applications utilizing GAMA include farming (Taillandier et al., 2012), 2D and 3D pedestrian evacuation (Anh et al., 2011; Macatulad and Blanco, 2014), traffic simulation (Taillandier, 2014), urban growth (Taillandier et al., 2016), land-use planning (Caillou et al., 2015), and river sedimentation (Grignard et al., 2015).
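Across all of these platforms the vector pattern is the same: imported GIS features become the agents' environment. The sketch below is a hedged Python illustration of polylines acting as paths (assuming the geopandas and shapely libraries and a hypothetical roads.shp file; it is not taken from any platform's API), in the spirit of the sidewalk and road-network models of Fig. 10B and D:

```python
# Sketch of the vector-data-as-environment pattern: agents advance along
# an imported polyline using shapely's linear referencing. "roads.shp"
# is a hypothetical shapefile; assumes geopandas/shapely are installed.
import geopandas as gpd

roads = gpd.read_file("roads.shp")   # hypothetical road/sidewalk shapefile
path = roads.geometry.iloc[0]        # one polyline for agents to walk along

class Pedestrian:
    def __init__(self, speed):
        self.speed = speed           # map units travelled per tick
        self.travelled = 0.0

    def step(self):
        """Advance along the polyline and return the new (x, y) position."""
        self.travelled = min(self.travelled + self.speed, path.length)
        point = path.interpolate(self.travelled)  # point at distance along line
        return point.x, point.y

agents = [Pedestrian(speed=1.5) for _ in range(10)]
for _ in range(100):
    positions = [a.step() for a in agents]
```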

1.16.5.3 Example Applications

Agent-based models have been developed to study a wide range of phenomena from a number of academic disciplines (Macal, 2016). These range from archeological reconstruction of ancient civilizations (Axtell et al., 2002); understanding theories of political identity and stability (Cioffi-Revilla and Rouleau, 2010); exploring the processes that lead to state formation (Cederman, 2001); biological models of infectious diseases (Eidelson and Lustick, 2004); growth of bacteria colonies (Krzysztof et al., 2005); stock trading (Carrella, 2014); labor market dynamics (Guerrero and Axtell, 2011); tax compliance (Bloomquist, 2011); housing markets (Geanakoplos et al., 2012); and voting behaviors in elections (Laver and Sergenti, 2012), to name but a few. From a geographical systems perspective, agent-based models have been applied to a wide range of phenomena, some of which have been highlighted earlier; these range from the micromovement of pedestrians over seconds and hours (e.g., Torrens, 2014) to the rise of city systems over centuries (e.g., Pumain, 2012), and nearly everything in between. It is impossible to cover them all in detail, but Table 3 lists representative examples of agent-based models that use GIS to explore problems related to geographical systems. As can be seen, one question that preoccupies researchers is "how to select the most representative scale for an application in terms of agents (entities), and spatial and temporal scales?" Fortunately, the underlying rationale for using ABMs is the notion of complexity, which focuses on a "bottom-up" approach to modeling geographical systems and provides a ready solution, with the emphasis on representing the smallest individual unit of interest. The examples in Table 3 present a range of agent representations, from individuals and households to entire cities or institutions. The choice of agents, spatial scale, and temporal scale depends on the problem being investigated (Heppenstall et al., 2016). Furthermore, in section "Making Agents More Human" we noted that one of the hallmarks of ABM is its ability to capture and model human behavior; we therefore classified how the selected models represent behavior in two ways: either through a mathematical approach or through the use of cognitive frameworks.

1.16.6 Challenges and Opportunities

The greatest strength of ABM is its ability to model complex social phenomena. The focus on the agent as the fundamental driver of system-wide behavior allows researchers to take important steps toward understanding the consequences that individual behavior, and the interactions between individuals, have on geographical systems. Section "Example Applications" highlighted the sheer diversity of applications in the literature that seek to replicate the main processes and drivers of geographical systems from the micro to the macro scale. While it is clear that significant progress has been made in simulating systems from the bottom up, several key challenges remain that researchers need to address. Within this section, we outline these challenges and the opportunities that meeting them will create.

ABM advocates an understanding of social phenomena through simulation at the individual level. By creating heterogeneous individuals who can interact with other individuals and the environment, we can track the emergence of new patterns or trends.

Fig. 11 GAMA platform: (A and B) advanced visualization of simulations; (C) built-in charting and statistical functions; (D) user interface. Source: GAMA. (2016). GAMA modeling and simulation development environment. Available at http://gama-platform.org/ (accessed on 20 Oct. 2016).

Table 3 A selection of the studies utilizing GIS and ABM

Author | Application | Entity | Behavior | Spatial scale | Temporal scale
Batty et al. (2003) | Public event | Individuals | Mathematical | Neighborhood | Seconds
Torrens and McDaniel (2013) | Riots | Individuals | Mathematical | Neighborhood | Seconds
Crooks et al. (2015a) | Indoor movement | Individuals | Mathematical | Indoor scene | Seconds
Crooks and Hailegiorgis (2014) | Disease propagation | Individuals | Mathematical | City | Minutes
Eubank et al. (2004) | Disease propagation and urban traffic | Individuals | Mathematical | City | Seconds
Malleson et al. (2013) | Crime | Individuals | Cognitive framework | Neighborhood | Minutes
Groff (2007) | Crime | Individuals | Mathematical | City | Hours
Manley et al. (2014) | Traffic | Individuals | Mathematical | City center | Seconds
Dawson et al. (2011) | Flooding | Individuals | Mathematical | Town | Minutes
Heppenstall et al. (2006) | Retail | Individuals | Mathematical | City | Days
Benenson et al. (2002) | Residential location | Individuals | Mathematical | Neighborhood | Years
Augustijn-Beckers et al. (2011) | Informal settlement growth | Households | Mathematical | Neighborhood | Days
Jordan et al. (2014) | Regeneration | Households | Mathematical | Neighborhood | Years
Haase et al. (2010) | Urban shrinkage | Households | Mathematical | City | Years
Xie and Fan (2014) | Urban growth | Institutions and developers | Mathematical | Region | Years
Pumain (2012) | City systems | City | Mathematical | Countries and continents | Years

Adapted from Heppenstall, A., Malleson, N. and Crooks, A. T. (2016). "Space, the final frontier": How good are agent-based models at simulating individuals and space in cities? Systems 4(1): 9. Available at http://www.mdpi.com/2079-8954/4/1/9/html.

While this is a tantalizing prospect for researchers, it is accompanied by a new set of problems. Let us consider how we might, for example, simulate the behavior of individuals evacuating a building in an emergency. How the individuals would react in this situation is highly dependent on their individual attributes. Their behavior is a product of those attributes and, in this case, explicitly influenced by their local environment. While we are approaching the point where we can obtain and build this information into our agents, we need a corresponding amount of data to calibrate and validate the behavior. As Heppenstall et al. (2016) note, it is ironic that the disaggregation of data down to the individual level, intended to give better representation through heterogeneity, has made it near impossible (at present) to rigorously calibrate and validate our models. However, the recent proliferation of "big" data offers a potential resolution to this problem in the near future. While the abundance of data will contribute to solving this issue, how we extract value and make sense of these new forms of data presents a considerable challenge. These issues are revisited below.

O'Sullivan et al. (2012) speculate that social systems are potentially the product of thousands of individuals' decisions; it therefore follows that researchers need to include behavioral frameworks (see section "Making Agents More Human") that can manage more complex behaviors to capture these decisions. Many of the behaviors in the examples in section "Example Applications" operate through rule-based systems that are more closely related to mathematics than psychology. While this is entirely appropriate for some applications, human behavior cannot be distilled down into formulaic rules. Human decisions are made on the basis of incomplete or assumed knowledge, and they can be both spontaneous and irrational. To grasp some of this complexity, there needs to be a more explicit link between ABM and behavioral frameworks. Frameworks such as BDI (Bratman et al., 1988) are a popular choice among modelers (e.g., Müller, 1998; Rao and Georgeff, 1995; Taylor et al., 2004), but they are limited by assumptions of rational decision making that can be difficult to justify, as people rarely meet the requirements of rational choice models (Axelrod, 1997). The work of Malleson et al. (2012) and Pires and Crooks (2016) shows how alternative frameworks such as PECS can be successfully integrated with agent-based models.

While we can begin to create more complex agent-based models, quantitative geography has not yet focused on the development of methods for measuring and analyzing individual units as part of a massively interactive, dynamic, and nonlinear system (Batty and Torrens, 2005; Torrens, 2010). This has precipitated the criticism of agent-based models as "toy models," that is, that there is an absence of robust quantitative schemes to allow ABM to be held to account against real-world systems. This is an area where "big" data sources could offer new avenues for multiscale calibration and validation of agent-based models. Putting aside the difficulties associated with quantifying the similarity between spatial patterns (as discussed in section "Evaluating a Model"), the aggregate analysis of model results is often relatively unproblematic. But for patterns at higher spatiotemporal resolutions, assessing the reliability of model results can be extremely difficult when appropriate data are less forthcoming.
For example, when modeling pedestrian movement, it would be desirable to compare the movements of individual simulated agents to those of individual real people, rather than just the aggregate model outcomes (e.g., crowd densities, flow directions).
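To make that contrast concrete, the following is a hedged Python sketch (all data are synthetic stand-ins, not drawn from any of the studies cited here) in which the aggregate comparison reduces to a single error metric, while the individual-level comparison presupposes per-agent trajectories that real systems rarely supply:

```python
# Sketch contrasting aggregate vs. individual-level model evaluation.
# All arrays below are synthetic stand-ins for model output and data.
import numpy as np

rng = np.random.default_rng(1)
observed_density = rng.poisson(5, (20, 20))                 # counts per grid cell
simulated_density = observed_density + rng.integers(-2, 3, (20, 20))

# Aggregate comparison: trivial once both patterns are gridded.
rmse = np.sqrt(np.mean((simulated_density - observed_density) ** 2))

# Individual comparison: needs a per-agent trajectory (T steps x 2 coords)
# for both the simulation and a real person; such data rarely exist.
real_track = np.cumsum(rng.normal(size=(50, 2)), axis=0)
sim_track = np.cumsum(rng.normal(size=(50, 2)), axis=0)
mean_displacement_error = np.linalg.norm(real_track - sim_track, axis=1).mean()

print(f"aggregate RMSE: {rmse:.2f}, trajectory error: {mean_displacement_error:.2f}")
```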


This is where the emergence of "big" data, and associated developments around "smart cities," are particularly relevant. Data about individuals and the environment are being created at an unprecedented rate. Sources such as mobile phone call data records (Diao et al., 2016), public transport smart cards (Batty et al., 2013), vehicle traffic counters (Bond and Kanaan, 2015), the use of loyalty cards or credit cards, and social media contributions such as Twitter or Foursquare (Croitoru et al., 2013; Malleson and Andresen, 2015) can potentially reveal a wealth of information about individual behavior or actions. Additionally, unlike traditional sources that are largely static and often somewhat out of date (the most recent UK and US censuses were conducted in 2011 and 2010, respectively), "big" data are often generated in real time. Rather than being calibrated using historical data and then used to make future predictions in the absence of any new information, agent-based models could be calibrated in real time as new data emerge, reducing the uncertainty in model predictions. This provides a substantial opportunity for ABM as a means of producing short-term, high-quality, local analyses and forecasts that can inform agile and responsive policy-making. This is particularly relevant for models that have been coupled to GIS, as these are often concerned with estimating future real-system states rather than exploring theory in hypothetical contexts. However, there has been relatively little work toward sound methodologies that support the incorporation of data into agent-based models dynamically. Methods for dynamic data assimilation that are used regularly in fields such as meteorology (Kalnay, 2003) have only been attempted for the most rudimentary of agent-based models (Ward et al., 2016; see the sketch at the end of this section).

Another significant challenge that must be overcome before big data can be used to inform agent-based models relates to bias. Considering social media, for example, not only do a small number of individuals often contribute substantially more than all others, but the presence of the "digital divide" (Yu, 2006) means that some groups are much less likely to contribute to these emerging digital data streams than others. Models that are predicated on "big" data, therefore, might disregard the groups of people who by choice or circumstance do not leave a digital footprint. Any insight from these sources must therefore be taken in the context of the inherent biases, and steps should be taken to understand the resulting inaccuracies even if nothing can be done to completely remove them.

There are also related ethical implications, with a major concern being that the data might be predicated on relatively weak consent. For example, an individual might sign a contract or tick a box that gives permission for the (re)use of their data, but it is not always clear how informed the person actually is. In most cases it is extremely unlikely that someone is fully aware of the terms that they are agreeing to; the Apple iTunes UK Terms and Conditions, for example, contained almost 7000 words (as of 13 Sep. 2016). This lack of informed consent poses some fundamental challenges to traditional ethical frameworks, which usually rely on explicit (often documented) consent. Even if consent is properly informed, some individuals might still be unaware of the size and thoroughness of the digital footprint that they are creating. Fortunately, research institutions typically adhere to comprehensive ethical standards and frameworks.
It is therefore up to these institutions to demonstrate that such data can be stored securely, treated sensitively and ethically, and used to produce outcomes that are ultimately for the social good.
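Returning to the data-assimilation point above, the following is a minimal Python sketch of one standard approach, a bootstrap particle filter, wrapped around a toy model. It is offered only as an illustration of the general idea; model_step, the observation stream, and all numbers are hypothetical stand-ins, not the method of Ward et al. (2016):

```python
# Hedged sketch of sequential data assimilation (bootstrap particle
# filter) around a toy agent-based model. Everything here is a
# hypothetical stand-in for a real ABM and a real data stream.
import numpy as np

rng = np.random.default_rng(0)
N = 500                                   # ensemble size (model copies)
particles = rng.normal(100.0, 10.0, N)    # e.g., crowd size in each copy

def model_step(state):
    """One tick of the (toy) ABM: drift plus stochastic agent behavior."""
    return state + rng.normal(1.0, 2.0, state.shape)

def assimilate(particles, observation, obs_noise=5.0):
    """Reweight model copies by their fit to the new observation, then
    resample so well-fitting copies survive (bootstrap particle filter)."""
    weights = np.exp(-0.5 * ((particles - observation) / obs_noise) ** 2)
    weights /= weights.sum()
    return rng.choice(particles, size=len(particles), p=weights)

for observation in [105.0, 112.0, 118.0]:        # real-time sensor readings
    particles = model_step(particles)            # predict with the ABM
    particles = assimilate(particles, observation)  # correct with the data
    print(f"estimate: {particles.mean():.1f}")
```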

1.16.7 Conclusion

ABM is rapidly establishing itself as the de facto tool for simulating the processes and drivers of geographical systems. It offers the tantalizing possibility of creating new insights and knowledge about how geographical systems have evolved to their current state and how they might develop in the future. A large part of ABM's popularity is down to the natural metaphor that agents offer for modeling individuals. This chimes with current geographical thinking as embodied by Batty (2013), who describes cities as the product of hundreds of individual interactions occurring on dynamic networks, and O'Sullivan et al. (2012), who hypothesized that social systems are the product of potentially infinite numbers of human decisions. While many of the applications within this article have focused on human-to-human agent interactions, there is an increasing focus on understanding the outcomes of human behavior and environmental interactions. Section "Example Applications" provided examples of coupled human and natural systems (CHANS) that highlight the importance and benefit of simulating the human link to land use.

What is noticeable from recent applications of ABM is the increase in complexity (richness and detail) of the agents, a factor made possible through new data sources and increased computational power. This increase in detail at the individual level is the beginning of a step change in social simulation modeling, with researchers being able to create realistic systems driven by the individual. While there has always been "resistance" to the notion that social scientists should search for some "atomic element or unit" of representation that characterizes the geography of a place (Batty, 2012), the shift from aggregate to individual places agents as a clear contender to fulfill the role of "atom" in social simulation modeling.

However, there are a number of methodological challenges that need to be addressed if ABM is to be taken up as the "atomic unit" in social simulation and be recognized as a powerful tool for policy modeling in key societal issues. Evaluation of ABMs has been identified by numerous commentators as one of the most critical issues yet to be resolved (Angus and Hassani-Mahmooei, 2015; Axelrod, 1997; Crooks et al., 2008; Lee et al., 2015; Ngo and See, 2012; Takadama et al., 2008). While developments such as the ODD protocol (Grimm et al., 2006) and the empirical grounding of ABM mechanisms and agent attributes (Robinson et al., 2007; Smajgl et al., 2011; Windrum et al., 2007) have improved transparency and replicability in models, the massive diversity in outputs produced by micro-level interactions presents a significant challenge in distilling the most relevant and interesting results from a nearly endless sea of output data. ABM outputs demand a comprehensive exploration of model behavior and model output that is just not realized at present (Angus and Hassani-Mahmooei, 2015), and this presents a significant challenge for the researcher. As a result, agent-based models remain poorly calibrated and validated, with researchers not fully grasping the importance of the uncertainty associated with their models (Crooks et al., 2015b; Edmonds and Moss, 2004; Moss, 2008; Takadama et al., 2008). Sensitivity tests are largely absent from ABM, and outputs are never associated with a confidence level.


If these models are to achieve the credibility associated with global change models in the natural sciences, and thus be taken up in policy decision making, rigorous work is urgently needed in this area. Big data will only increase in volume, variety, and veracity in the future; harnessing usable information and embedding it within model simulations to aid policy implementation is key, a factor that has been recognized by national governments (Yiu, 2012). However, the appeal of big data presents as many risks as opportunities for researchers; for example, caution needs to be exercised to ensure that research is driven by the problem (and the variables of interest) rather than by data availability. While big data presents obvious opportunities for increasing the number and types of applications in ABM, there is also the real possibility that extant ABM deficiencies will become more acute as more people begin to (mis)use big data in such models. Key work is needed now to rigorously test big-data-led models to ensure that those that make it into the decision-making arena are credible. This will ensure that the marriage of big data and ABM presents a unique opportunity to address important societal issues.

While these challenges are substantial, not addressing them will lead to ABM eventually being discredited as a tool that can deliver real value both in the academic arena and in the public and private sectors. Meeting these challenges will unlock the potential of ABM, allowing new knowledge and discoveries to be made in geographical systems that can be translated into delivering solutions for a wide range of societal problems.

References

Al-Ahmadi, K., See, L., Heppenstall, A., Hogg, J., 2009. Calibration of a fuzzy cellular automata model of urban dynamics in Saudi Arabia. Ecological Complexity 6 (2), 80–101.
Alonso, W., 1964. Location and land use: Toward a general theory of land rent. Harvard University Press, Cambridge, MA.
Amouroux, E., Chu, T.Q., Boucher, A., Drogoul, A., 2007. GAMA: An environment for implementing and running spatially explicit multi-agent simulations. In: Ghose, A., Governatori, G., Sadananda, R. (Eds.), Pacific Rim International Conference on Multi-Agents (PRIMA 2007). Springer, Bangkok, Thailand, pp. 359–371.
An, L., 2012. Modeling human decisions in coupled human and natural systems: Review of agent-based models. Ecological Modelling 229, 25–36.
Angus, S.D., Hassani-Mahmooei, B., 2015. "Anarchy" reigns: A quantitative analysis of agent-based modelling publication practices in JASSS, 2001–2012. Journal of Artificial Societies and Social Simulation 18 (4), 16. Available at http://jasss.soc.surrey.ac.uk/18/4/16.html.
Anh, N.T.N., Daniel, Z.J., Du, N.H., Drogoul, A., An, V.D., 2011. A hybrid macro–micro pedestrian evacuation model to speed up simulation in road networks. In: Dechesne, F., Hattori, H., ter Mors, A., Such, J.M., Weyns, D., Dignum, F. (Eds.), International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2011). Springer, Taipei, Taiwan, pp. 371–383.
Augustijn-Beckers, E., Flacke, J., Retsios, B., 2011. Simulating informal settlement growth in Dar es Salaam, Tanzania: An agent-based housing model. Computers, Environment and Urban Systems 35 (2), 93–103.
Axelrod, R., 1997. Advancing the art of simulation in the social sciences. In: Conte, R., Hegselmann, R., Terno, P. (Eds.), Simulating social phenomena. Springer, Berlin, Germany, pp. 21–40.
Axtell, R., Epstein, J.M., 1994. Agent-based modelling: Understanding our creations. The Bulletin of the Santa Fe Institute Winter 9, 28–32.
Axtell, R., Axelrod, R., Epstein, J.M., Cohen, D., 1996. Aligning simulation models: A case study and results. Computational and Mathematical Organization Theory 1 (2), 123–141.
Axtell, R., Epstein, J.M., Dean, J.S., Gumerman, G.J., Swedlund, A.C., Harburger, J., Chakravarty, S., Hammond, R., Parker, J., Parker, M., 2002. Population growth and collapse in a multiagent model of the Kayenta Anasazi in Long House Valley. Proceedings of the National Academy of Sciences 99 (3), 7275–7279.
Balci, O., 1996. Verification, validation, and testing. In: Banks, J. (Ed.), Handbook of simulation: Principles, methodology, advances, applications, and practice. Wiley, New York, pp. 335–393.
Balke, T., Gilbert, N., 2014. How do agents make decisions? A survey. Journal of Artificial Societies and Social Simulation 17 (4), 13. Available at http://jasss.soc.surrey.ac.uk/17/4/13.html.
Ballas, D., Clarke, G., Dorling, D., Eyre, H., Thomas, B., Rossiter, D., 2005. SimBritain: A spatial microsimulation approach to population dynamics. Population, Space and Place 11 (1), 3–34.
Balzer, W., 2000. SMASS: A sequential multi-agent system for social simulation. In: Suleiman, R., Troitzsch, K.G., Gilbert, N. (Eds.), Tools and techniques for social science simulation. Physica-Verlag, Heidelberg, pp. 65–82.
Batty, M., 1976. Urban modelling: Algorithms, calibrations, predictions. Cambridge University Press, Cambridge, UK.
Batty, M., 2005. Cities and complexity: Understanding cities with cellular automata, agent-based models, and fractals. MIT Press, Cambridge, MA.
Batty, M., 2008. Fifty years of urban modelling: Macro-statics to micro-dynamics. In: Albeverio, S., Andrey, D., Giordano, P., Vancheri, A. (Eds.), The dynamics of complex urban systems: An interdisciplinary approach. Springer, New York, pp. 1–20.
Batty, M., 2012. A generic framework for computational spatial modelling. In: Heppenstall, A., Crooks, A.T., See, L.M., Batty, M. (Eds.), Agent-based models of geographical systems. Springer, New York, pp. 19–50.
Batty, M., 2013. The new science of cities. MIT Press, Cambridge, MA.
Batty, M., Torrens, P.M., 2005. Modelling and prediction in a complex world. Futures 37 (7), 745–766.
Batty, M., Desyllas, J., Duxbury, E., 2003. Safety in numbers? Modelling crowds and designing control for the Notting Hill Carnival. Urban Studies 40 (8), 1573–1590.
Batty, M., Manley, E., Milton, R., Reades, J., 2013. Smart London. In: Bell, S., Paskins, J. (Eds.), Imagining the future city: London 2062. Ubiquity, London, pp. 31–40.
Benenson, I., Torrens, P.M., 2004. Geosimulation: Automata-based modelling of urban phenomena. Wiley, London.
Benenson, I., Omer, I., Hatna, E., 2002. Entity-based modelling of urban residential dynamics: The case of Yaffo, Tel Aviv. Environment and Planning B: Planning and Design 29 (4), 491–512.
Benenson, I., Aronovich, S., Noam, S., 2005. Let's talk objects: Generic methodology for urban high-resolution simulation. Computers, Environment and Urban Systems 29 (4), 425–453.
Birkin, M., Clarke, G.P., Wilson, A.G., 1996. Intelligent GIS: Location decisions and strategic planning. GeoInformation Group, Cambridge, UK.
Bloomquist, K., 2011. Tax compliance as an evolutionary coordination game: An agent-based approach. Public Finance Review 39 (1), 25–49.
Bonabeau, E., 2002. Agent-based modelling: Methods and techniques for simulating human systems. Proceedings of the National Academy of Sciences of the United States of America 99 (3), 7280–7287.
Bond, R., Kanaan, A., 2015. MassDOT real time traffic management system. In: Geertman, S., Ferreira, J., Goodspeed, R., Stillwell, J. (Eds.), Planning support systems and smart cities. Springer, New York, pp. 471–488.
Box, G.E.P., 1979. Robustness in the strategy of scientific model building. In: Launer, R.L., Wilkinson, G.N. (Eds.), Robustness in statistics. Academic Press, New York, pp. 201–236.
Box, P., 2001. Kenge GIS-CA class template for Swarm. Natural Resources and Environmental Issues 8 (1), 31–35. Available at http://digitalcommons.usu.edu/nrei/vol8/iss1/6.


Brailsford, S., Schmidt, B., 2003. Towards incorporating human behaviour in models of health care systems: An approach using discrete event simulation. European Journal of Operational Research 150 (1), 19–31.
Brantingham, P., Glasser, U., Kinney, B., Singh, K., Vajihollahi, M., 2005a. A computational model for simulating spatial aspects of crime in urban environments. In: 2005 IEEE International Conference on Systems, Man and Cybernetics, vol. 4, pp. 3667–3674. Waikoloa, HI.
Brantingham, P.L., Glasser, U., Kinney, B., Singh, K., Vajihollahi, M., 2005b. Modeling urban crime patterns: Viewing multi-agent systems as abstract state machines. In: Beauquier, D., Börger, E., Slissenko, A. (Eds.), Proceedings of the 12th International Workshop on Abstract State Machines, pp. 101–117. Paris, France.
Bratman, M.E., Israel, D.J., Pollack, M.E., 1988. Plans and resource-bounded practical reasoning. Computational Intelligence 4 (3), 349–355.
Caillou, P., Gaudou, B., Grignard, A., Truong, C.Q., Taillandier, P., 2015. A simple-to-use BDI architecture for agent-based modeling and simulation. In: European Social Simulation Association (ESSA 2015) Conference. Groningen, The Netherlands.
Carrella, E., 2014. Zero-knowledge traders. Journal of Artificial Societies and Social Simulation 17 (3), 4. Available at http://jasss.soc.surrey.ac.uk/17/3/4.html.
Casti, J.L., 1997. Would-be worlds: How simulation is changing the frontiers of science. Wiley, New York.
Castle, C.J.E., 2007. Agent-based modelling of pedestrian evacuation: A study of London's King's Cross Underground Station. PhD Thesis, University College London, London.
Castle, C.J.E., Crooks, A.T., 2006. Principles and concepts of agent-based modelling for developing geospatial simulations. Centre for Advanced Spatial Analysis, University College London, Working Paper 110, London.
Cederman, L.E., 2001. Agent-based modelling in political science. The Political Methodologist 10 (1), 16–22.
Cioffi-Revilla, C., Rouleau, M., 2010. MASON RebeLand: An agent-based model of politics, environment, and insurgency. International Studies Review 12 (1), 31–46.
Clark, P.J., Evans, F.C., 1954. Distance to nearest neighbor as a measure of spatial relationships in populations. Ecology 35 (4), 445–453.
Clarke, K.C., Hoppen, S., Gaydos, L.J., 1997. A self-modifying cellular automaton model of historical urbanization in the San Francisco Bay area. Environment and Planning B: Planning and Design 24 (2), 247–261.
Collier, N.T., North, M.J., 2004. Repast for Python scripting. In: Macal, C.M., Sallach, D., North, M.J. (Eds.), Proceedings of the Agent 2004 Conference on Social Dynamics: Interaction, Reflexivity and Emergence, pp. 231–237. Chicago, IL.
Collier, N., North, M., 2013. Parallel agent-based simulation with Repast for High Performance Computing. Simulation 89 (10), 1215–1235.
Cordasco, G., De Chiara, R., Mancuso, A., Mazzeo, D., Scarano, V., Spagnuolo, C., 2013. Bringing together efficiency and effectiveness in distributed simulations: The experience with D-MASON. Simulation 89 (10), 1236–1253.
Croitoru, A., Crooks, A.T., Radzikowski, J., Stefanidis, A., 2013. GeoSocial gauge: A system prototype for knowledge discovery from geosocial media. International Journal of Geographical Information Science 27 (12), 2483–2508.
Croitoru, A., Crooks, A.T., Radzikowski, J., Stefanidis, A., Vatsavai, R.R., Wayant, N., 2014. Geoinformatics and social media: A new big data challenge. In: Karimi, H.A. (Ed.), Big data techniques and technologies in geoinformatics. CRC Press, Boca Raton, FL, pp. 207–232.
Crooks, A.T., 2006. Exploring cities using agent-based models and GIS. In: Sallach, D., Macal, C.M., North, M.J. (Eds.), Proceedings of the Agent 2006 Conference on Social Agents: Results and Prospects. University of Chicago and Argonne National Laboratory, Chicago, IL.
Crooks, A.T., 2007. The Repast simulation/modelling system for geospatial simulation. Centre for Advanced Spatial Analysis, University College London, Working Paper 123, London.
Crooks, A.T., 2010. Constructing and implementing an agent-based model of residential segregation through vector GIS. International Journal of GIS 24 (5), 661–675.
Crooks, A.T., Castle, C., 2012. The integration of agent-based modelling and geographical information for geospatial simulation. In: Heppenstall, A., Crooks, A.T., See, L.M., Batty, M. (Eds.), Agent-based models of geographical systems. Springer, New York, pp. 219–252.
Crooks, A.T., Hailegiorgis, A.B., 2014. An agent-based modeling approach applied to the spread of cholera. Environmental Modelling and Software 62, 164–177.
Crooks, A.T., Heppenstall, A., 2012. Introduction to agent-based modelling. In: Heppenstall, A., Crooks, A.T., See, L.M., Batty, M. (Eds.), Agent-based models of geographical systems. Springer, New York, pp. 85–108.
Crooks, A.T., Castle, C.J.E., Batty, M., 2008. Key challenges in agent-based modelling for geo-spatial simulation. Computers, Environment and Urban Systems 32 (6), 417–430.
Crooks, A.T., Croitoru, A., Lu, X., Wise, S., Irvine, J.M., Stefanidis, A., 2015a. Walk this way: Improving pedestrian agent-based models through scene activity analysis. International Journal of Geographical Information Science 4 (3), 1627–1656.
Crooks, A.T., Pfoser, D., Jenkins, A., Croitoru, A., Stefanidis, A., Smith, D.A., Karagiorgou, S., Efentakis, A., Lamprianidis, G., 2015b. Crowdsourcing urban form and function. International Journal of Geographical Information Science 29 (5), 720–741.
Dawson, R.J., Peppe, R., Wang, M., 2011. An agent-based model for risk-based flood incident management. Natural Hazards 59 (1), 167–189.
Deadman, P.J., Schlager, E., 2002. Models of individual decision making in agent-based simulation of common-pool-resource management institutions. In: Gimblett, H.R. (Ed.), Integrating geographic information systems and agent-based modelling techniques for simulating social and ecological processes. Oxford University Press, Oxford, UK, pp. 137–169.
Deadman, P.J., Robinson, D.T., Moran, E., Brondizio, E., 2004. Effects of colonist household structure on land use change in the Amazon rainforest: An agent based simulation approach. Environment and Planning B: Planning and Design 31 (5), 693–709.
Diao, M., Zhu, Y., Ferreira, J., Ratti, C., 2016. Inferring individual daily activities from mobile phone traces: A Boston example. Environment and Planning B: Planning and Design 43 (5), 920–940.
DTRA, 2001. The HPAC user's guide, Version 4.0.3. Prepared for Defense Threat Reduction Agency, Contract DSWA01-98-C-0110, Science Applications International Corporation, McLean, VA.
Edmonds, B., Moss, S., 2004. From KISS to KIDS: An 'anti-simplistic' modelling approach. In: Davidsson, P., Logan, B., Takadama, K. (Eds.), International Workshop on Multi-Agent Systems and Agent-Based Simulation. Springer, New York, pp. 130–144.
Eidelson, B.M., Lustick, I., 2004. VIR-POX: An agent-based analysis of smallpox preparedness and response policy. Journal of Artificial Societies and Social Simulation 7 (3), 6. Available at http://jasss.soc.surrey.ac.uk/7/3/6.html.
Epstein, J.M., 1999. Agent-based computational models and generative social science. Complexity 4 (5), 41–60.
Epstein, J.M., Axtell, R., 1996. Growing artificial societies: Social science from the bottom up. MIT Press, Cambridge, MA.
Eubank, S., Guclu, H., Kumar, A.V.S., Marathe, M.V., Srinivasan, A., Toroczkai, Z., Wang, N., 2004. Modelling disease outbreaks in realistic urban social networks. Nature 429, 180–184.
Fontaine, C.M., Rounsevell, M., 2009. An agent-based approach to model future residential pressure on a regional landscape. Landscape Ecology 24 (9), 1237–1254.
Fotheringham, A.S., O'Kelly, M.E., 1989. Spatial interaction models: Formulations and applications. Springer, New York.
Geanakoplos, J., Axtell, R., Farmer, D., Howitt, P., Conlee, B., Goldstein, J., Hendrey, M., Palmer, N., Yang, C., 2012. Getting at systemic risk via an agent-based model of the housing market. American Economic Review 102 (3), 53–58.
Gigerenzer, G., Goldstein, D.G., 1996. Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review 103 (4), 650–669.
Gilbert, N., Bankes, S., 2002. Platforms and methods for agent-based modelling. Proceedings of the National Academy of Sciences 99 (3), 7197–7198.
Gilbert, N., Terna, P., 2000. How to build and use agent-based models in social science. Mind & Society 1 (1), 57–72.
Gilbert, N., Troitzsch, K.G., 2005. Simulation for the social scientist, 2nd edn. Open University Press, Milton Keynes.
Gimblett, H.R. (Ed.), 2002. Integrating geographic information systems and agent-based modelling techniques for simulating social and ecological processes. Oxford University Press, Oxford, UK.
Gode, D.K., Sunder, S., 1993. Allocative efficiency of markets with zero-intelligence traders: Market as a partial substitute for individual rationality. The Journal of Political Economy 101, 119–137.


Gravetter, F.J., Forzano, L.A.B., 2011. Research methods for the behavioral sciences, 4th edn. Wadsworth Publishing, Belmont, CA.
Grignard, A., Taillandier, P., Gaudou, B., Vo, D.A., Huynh, N.Q., Drogoul, A., 2013. GAMA 1.6: Advancing the art of complex agent-based modeling and simulation. In: Guido, G., Elkind, E., Savarimuthu, B.T.R., Dignum, F., Purvis, M.K. (Eds.), International Conference on Principles and Practice of Multi-agent Systems. Springer, Dunedin, New Zealand, pp. 117–131.
Grignard, A., Fantino, G., Lauer, J.W., Verpeaux, A., Drogoul, A., 2015. Agent-based visualization: A simulation tool for the analysis of river morphosedimentary adjustments. In: Gaudou, B., Sichman, J.S. (Eds.), International Workshop on Multi-Agent Systems and Agent-Based Simulation (MABS 2015). Springer, Istanbul, Turkey, pp. 109–120.
Grimm, V., Revilla, E., Berger, U., Jeltsch, F., Mooij, W.M., Railsback, S.F., Thulke, H., Weiner, J., Wiegand, T., DeAngelis, D.L., 2005. Pattern-oriented modeling of agent-based complex systems: Lessons from ecology. Science 310, 987–991.
Grimm, V., Berger, U., Bastiansen, F., Eliassen, S., Ginot, V., Giske, J., Goss-Custard, J., Grand, T., Heinz, S., Huse, G., Huth, A., Jepsen, J., Jorgensen, C., Mooij, W., Muller, B., Pe'er, G., Piou, C., Railsback, S., Robbins, A., Robbins, M., Rossmanith, E., Ruger, N., Strand, E., Souissi, S., Stillman, R., Vabo, R., Visser, U., Deangelis, D., 2006. A standard protocol for describing individual-based and agent-based models. Ecological Modelling 198 (1–2), 115–126.
Grimm, V., Berger, U., DeAngelis, D.L., Polhill, G.J., Giske, J., Railsback, S.F., 2010. The ODD protocol for describing individual-based and agent-based models: A first update. Ecological Modelling 221 (23), 2760–2768.
Groff, E.R., 2007. Simulation for theory testing and experimentation: An example using routine activity theory and street robbery. Journal of Quantitative Criminology 23 (2), 75–103.
Guerrero, O.A., Axtell, R., 2011. Using agentization for exploring firm and labor dynamics. In: Osinga, S., Hofstede, G., Verwaart, T. (Eds.), Emergent results of artificial economics. Springer, New York, pp. 139–150.
Haase, D., Lautenbach, S., Seppelt, R., 2010. Modeling and simulating residential mobility in a shrinking city using an agent-based approach. Environmental Modelling & Software 25 (10), 1225–1240.
Hagerstrand, T., 1967. Innovation diffusion as a spatial process. University of Chicago Press, Chicago, IL.
Haklay, M., O'Sullivan, D., Thurstain-Goodwin, M., Schelhorn, T., 2001. "So go downtown": Simulating pedestrian movement in town centres. Environment and Planning B: Planning and Design 28 (3), 343–359.
Harland, K., Heppenstall, A.J., Smith, D., Birkin, M., 2012. Creating realistic synthetic populations at varying spatial scales: A comparative critique of population synthesis techniques. Journal of Artificial Societies and Social Simulation 15 (1), 1. Available at http://jasss.soc.surrey.ac.uk/15/1/1.html.
Harper, S.J., Westervelt, J.D., Trame, A., 2002. Management application of an agent-based model: Control of cowbirds at the landscape scale. In: Gimblett, H.R. (Ed.), Integrating geographic information systems and agent-based modelling techniques for simulating social and ecological processes. Oxford University Press, Oxford, UK, pp. 105–123.
Heine, B.O., Meyer, M., Strangfeld, O., 2005. Stylised facts and the contribution of simulation to the economic analysis of budgeting. Journal of Artificial Societies and Social Simulation 8 (4), 4. Available at http://jasss.soc.surrey.ac.uk/8/4/4.html.
Helbing, D., Balietti, S., 2011. How to do agent-based simulations in the future: From modeling social mechanisms to emergent phenomena and interactive systems design. Santa Fe Institute, Working Paper 11-06-024, Santa Fe, NM.
Heppenstall, A.J., Evans, A.J., Birkin, M.H., 2005. A hybrid multi-agent/spatial interaction model system for petrol price setting. Transactions in GIS 9 (1), 35–51.
Heppenstall, A.J., Evans, A.J., Birkin, M.H., 2006. Using hybrid agent-based systems to model spatially-influenced retail markets. Journal of Artificial Societies and Social Simulation 9 (3), 2. Available at http://jasss.soc.surrey.ac.uk/9/3/2.html.
Heppenstall, A.J., Crooks, A.T., Batty, M., See, L.M. (Eds.), 2012. Agent-based models of geographical systems. Springer, New York.
Heppenstall, A., Malleson, N., Crooks, A.T., 2016. "Space, the final frontier": How good are agent-based models at simulating individuals and space in cities? Systems 4 (1), 9. Available at http://www.mdpi.com/2079-8954/4/1/9/html.
Horni, A., Nagel, K., Axhausen, K.W. (Eds.), 2016. The multi-agent transport simulation MATSim. Ubiquity, London.
Howe, T.R., Collier, N.T., North, M.J., Parker, M.T., Vos, J.R., 2006. Containing agents: Contexts, projections, and agents. In: Sallach, D., Macal, C.M., North, M.J. (Eds.), Proceedings of the Agent 2006 Conference on Social Agents: Results and Prospects. University of Chicago and Argonne National Laboratory, Chicago, IL.
Iltanen, S., 2012. Cellular automata in urban spatial modelling. In: Heppenstall, A.J., Crooks, A.T., See, L.M., Batty, M. (Eds.), Agent-based models of geographical systems. Springer, New York, pp. 69–84.
Izard, C.E., 2007. Basic emotions, natural kinds, emotion schemas, and a new paradigm. Perspectives on Psychological Science 2 (3), 260–280.
Jackson, J., Forest, B., Sengupta, R., 2008. Agent-based simulation of urban residential dynamics and land rent change in a gentrifying area of Boston. Transactions in GIS 12 (4), 475–491.
Janssen, M.A., 2009. Understanding Artificial Anasazi. Journal of Artificial Societies and Social Simulation 12 (4), 13. Available at http://jasss.soc.surrey.ac.uk/12/4/13.html.
Jennings, N.R., 2000. On agent-based software engineering. Artificial Intelligence 117 (2), 277–296.
Johnson, P.E., 2002. Agent-based modeling: What I learned from the artificial stock market. Social Science Computer Review 20 (2), 174–186.
Johnston, K.M. (Ed.), 2013. Agent Analyst: Agent-based modeling in ArcGIS. ESRI Press, Redlands, CA.
Jordan, R., Birkin, M., Evans, A., 2014. An agent-based model of residential mobility: Assessing the impacts of urban regeneration policy in the EASEL district. Computers, Environment and Urban Systems 48, 49–63.
Kalnay, E., 2003. Atmospheric modeling, data assimilation and predictability. Cambridge University Press, Cambridge, UK.
Kaul, D., Bruno, J., Roberts, J., 2004. CATS user's manual (No. SAIC-00/1010). Science Applications International Corporation, San Diego, CA.
Kennedy, W., 2012. Modelling human behaviour in agent-based models. In: Heppenstall, A., Crooks, A.T., See, L.M., Batty, M. (Eds.), Agent-based models of geographical systems. Springer, New York, pp. 167–180.
Kennedy, W.B., Hailegiorgis, A.B., Rouleau, M., Bassett, J.K., Coletti, M., Balan, G.C., Gulden, T., 2010. An agent-based model of conflict in East Africa and the effect of watering holes. In: Behavior Representation in Modeling and Simulation (BRiMS) Conference, pp. 274–281. Charleston, SC.
Knudsen, D.C., Fotheringham, A.S., 1986. Matrix comparison, goodness-of-fit, and spatial interaction modeling. International Regional Science Review 10 (2), 127–147.
Kravari, K., Bassiliades, N., 2015. A survey of agent platforms. Journal of Artificial Societies and Social Simulation 18 (1), 11. Available at http://jasss.soc.surrey.ac.uk/18/1/11.html.
Krzysztof, K., Dzwinel, W., Yuen, D.A., 2005. Nonlinear development of bacterial colony modelled with cellular automata and agent objects. International Journal of Modern Physics C: Computational Physics and Physical Computation 14 (10), 1385–1404.
Lafuerza, L.F., Dyson, L., Edmonds, B., McKane, A.J., 2016. Staged models for interdisciplinary research. PLoS One 11 (6), e0157261. http://dx.doi.org/10.1371/journal.pone.0157261.
Łatek, M.M., Mussavi Rizi, S.M., Crooks, A.T., Fraser, M., 2012. Social simulations for border security. In: Workshop on Innovation in Border Control 2012, co-located with the European Intelligence and Security Informatics Conference (EISIC 2012), pp. 340–345. Odense, Denmark.
Laver, M., Sergenti, E., 2012. Party competition: An agent-based model. Princeton University Press, Princeton, NJ.
Leavesley, G.H., Markstrom, S.L., Brewer, M.S., Viger, R.J., 1996. The modular modeling system (MMS): The physical process modeling component of a database-centered decision support system for water and power management. In: Chow, W., Brocksen, R.W., Wisniewski, J. (Eds.), Clean water: Factors that influence its availability, quality and its use. Springer, New York, pp. 303–311.
Lee, J.S., Filatova, T., Ligmann-Zielinska, A., Hassani-Mahmooei, B., Stonedahl, F., Lorscheid, I., Voinov, A., Polhill, G., Sun, Z., Parker, D.C., 2015. The complexities of agent-based modeling output analysis. Journal of Artificial Societies and Social Simulation 18 (4), 4. Available at http://jasss.soc.surrey.ac.uk/18/4/4.html.


Lim, K., Deadman, P., Moran, E., Brondizio, E., McCracken, S., 2002. Agent-based simulations of household decision making and land use change near Altamira, Brazil. In: Gimblett, H.R. (Ed.), Integrating geographic information systems and agent-based modelling techniques for simulating social and ecological processes. Oxford University Press, Oxford, UK, pp. 277–310.
Lovelace, A., 1843. Notes on L. Menabrea's Sketch of the analytical engine invented by Charles Babbage, Esq. Taylor's Scientific Memoirs 3, 666–731.
Luke, S., 2015. Multiagent simulation and the MASON library. George Mason University, Fairfax, VA. Available at http://cs.gmu.edu/eclab/projects/mason/manual.pdf.
Luke, S., Cioffi-Revilla, C., Panait, L., Sullivan, K., Balan, G., 2005. MASON: A multi-agent simulation environment. Simulation 81 (7), 517–527.
Lytinen, S.L., Railsback, S.F., 2012. The evolution of agent-based simulation platforms: A review of NetLogo 5.0 and ReLogo. In: Bichler, R.M., Blachfellner, S., Hofkirchner, W. (Eds.), Proceedings of the Fourth International Symposium on Agent-Based Modeling and Simulation. Vienna, Austria.
Macal, C.M., 2016. Everything you need to know about agent-based modelling and simulation. Journal of Simulation 10 (2), 144–156.
Macal, C.M., North, M.J., 2005. Tutorial on agent-based modelling and simulation. In: Euhl, M.E., Steiger, N.M., Armstrong, F.B., Joines, J.A. (Eds.), Proceedings of the 2005 Winter Simulation Conference, pp. 2–15. Orlando, FL.
Macatulad, E.G., Blanco, A.C., 2014. 3DGIS-based multi-agent geosimulation and visualization of building evacuation using GAMA platform. The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences 40 (2), 87–91.
Maguire, D.J., 1995. Implementing spatial analysis and GIS applications for business and service planning. In: Longley, P.A., Clarke, G. (Eds.), GIS for business and service planning. GeoInformation International, Cambridge, UK, pp. 171–191.
Maguire, D.J., 2005. Towards a GIS platform for spatial analysis and modelling. In: Maguire, D.J., Batty, M., Goodchild, M.F. (Eds.), GIS, spatial analysis and modelling. ESRI Press, Redlands, CA, pp. 19–39.
Malleson, N., 2012. RepastCity: A demo virtual city. Available at https://github.com/nickmalleson/repastcity (accessed on 11 Nov. 2016).
Malleson, N., Andresen, M.A., 2015. The impact of using social media data in crime rate calculations: Shifting hot spots and changing spatial patterns. Cartography and Geographic Information Science 42 (2), 112–121.
Malleson, N., Evans, A., Jenkins, T., 2009. An agent-based model of burglary. Environment and Planning B: Planning and Design 36 (6), 1103–1123.
Malleson, N., Heppenstall, A., See, L., 2010. Crime reduction through simulation: An agent-based model of burglary. Computers, Environment and Urban Systems 34 (3), 236–250.
Malleson, N., See, L., Evans, A., Heppenstall, A., 2012. Implementing comprehensive offender behaviour in a realistic agent-based model of burglary. Simulation 88 (1), 50–71.
Malleson, N., Heppenstall, A., See, L., Evans, A., 2013. Using an agent-based crime simulation to predict the effects of urban regeneration on individual household burglary risk. Environment and Planning B: Planning and Design 40 (3), 405–426.
Manley, E., Cheng, T., Penn, A., Emmonds, A., 2014. A framework for simulating large-scale complex urban traffic dynamics through hybrid agent-based modelling. Computers, Environment and Urban Systems 44, 27–36.
Manson, S., O'Sullivan, D., 2006. Complexity theory in the study of space and place. Environment and Planning A 38 (4), 677–692.
Martínez-Miranda, J., Aldea, A., 2005. Emotions in human and artificial intelligence. Computers in Human Behavior 21 (2), 323–341.
MASON, 2016. Multi Agent Simulation Of Neighbourhood. Available at http://cs.gmu.edu/eclab/projects/mason/ (accessed on 12 Jul. 2016).
Mayer-Schönberger, V., Cukier, K., 2013. Big data: A revolution that will transform how we live, work and think. John Murray, London.
Miller, J.H., Page, S.E., 2007. Complex adaptive systems. Princeton University Press, Princeton, NJ.
Minar, N., Burkhart, R., Langton, C., Askenazi, M., 1996. The Swarm simulation system: A toolkit for building multi-agent simulations. Santa Fe Institute, SFI Working Paper 1996-06-042, Santa Fe, NM. Available at http://www.santafe.edu/media/workingpapers/96-06-042.pdf.
Moss, S., 2008. Alternative approaches to the empirical validation of agent-based models. Journal of Artificial Societies and Social Simulation 11 (1), 5. Available at http://jasss.soc.surrey.ac.uk/11/1/5.html.
Müller, J.P., 1998. Architectures and applications of intelligent agents: A survey. The Knowledge Engineering Review 13 (4), 252–280.
Müller, B., Bohn, F., Dreßler, G., Groeneveld, J., Klassert, C., Martin, R., Schlüter, M., Schulze, J., Weise, H., Schwarz, N., 2013. Describing human decisions in agent-based models: ODD + D, an extension of the ODD protocol. Environmental Modelling & Software 48, 37–48.
Mysore, V., Narzisi, G., Mishra, B., 2006. Agent modelling of a sarin attack in Manhattan. In: Jennings, N.R., Tambe, M., Ishida, T., Ramchurn, S.D. (Eds.), First International Workshop on Agent Technology for Disaster Management. Future University, Hakodate, Japan.
Nagel, K., Stretz, P., Pieck, M., Donnelly, R., Barrett, C.L., 1997. TRANSIMS traffic flow characteristics. Available at arXiv:adap-org/9710003v1.
Nakajima, T., 1977. Application de la théorie de l'automate à la simulation de l'évolution de l'espace urbain. In: Congrès sur la Méthodologie de l'Aménagement et du Développement. Association Canadienne-Française pour l'Avancement des Sciences et Comité de Coordination des Centres de Recherches en Aménagement, Développement et Planification (CRADEP), Montreal, Canada, pp. 154–160.
NatureServe Vista, 2016. Decision support for better planning. Available at http://www.natureserve.org/conservation-tools/natureserve-vista (accessed on 12 Apr. 2016).
Ngo, T.N., See, L.M., 2012. Calibration and validation of agent-based models of land cover change. In: Heppenstall, A., Crooks, A.T., See, L.M., Batty, M. (Eds.), Agent-based models of geographical systems. Springer, New York, pp. 181–198.
Nikolai, C., Madey, G., 2009. Tools of the trade: A survey of various agent based modeling platforms. Journal of Artificial Societies and Social Simulation 12 (2), 2. Available at http://jasss.soc.surrey.ac.uk/12/2/2.html.
North, M.J., Howe, T.R., Collier, N.T., Vos, J.R., 2005a. The Repast Simphony development environment. In: Macal, C.M., North, M.J. (Eds.), Proceedings of the Agent 2005 Conference on Generative Social Processes, Models, and Mechanisms. Chicago, IL.
North, M.J., Howe, T.R., Collier, N.T., Vos, J.R., 2005b. The Repast Simphony runtime system. In: Macal, C.M., North, M.J., Sallach, D. (Eds.), Proceedings of the Agent 2005 Conference on Generative Social Processes, Models, and Mechanisms. Chicago, IL.
North, M.J., Collier, N.T., Vos, J.R., 2006. Experiences creating three implementations of the Repast agent modelling toolkit. ACM Transactions on Modelling and Computer Simulation 16 (1), 1–25.
North, M.J., Collier, N.T., Ozik, J., Tatara, E.R., Macal, C.M., Bragen, M., Sydelko, P., 2013. Complex adaptive systems modeling with Repast Simphony. Complex Adaptive Systems Modeling 1 (1), 3. http://dx.doi.org/10.1186/2194-3206-1-3.
O'Sullivan, D., 2004. Complexity science and human geography. Transactions of the Institute of British Geographers 29 (3), 282–295.
O'Sullivan, D., Unwin, D., 2010. Geographic information analysis, 2nd edn. Wiley, Hoboken, NJ.
O'Sullivan, D., Millington, J., Perry, G., Wainwright, J., 2012. Agent-based models: Because they're worth it? In: Heppenstall, A.J., Crooks, A.T., Batty, M., See, L.M. (Eds.), Agent-based models of geographical systems. Springer, New York.
Orcutt, G.H., 1957. A new type of socio-economic system. The Review of Economics and Statistics 39 (2), 116–123.
Ozik, J., Collier, N.T., Murphy, J.T., North, M.J., 2013. The ReLogo agent-based modeling language. In: Pasupathy, R., Kim, S.-H., Tolk, A., Hill, R., Kuhl, M.E. (Eds.), Proceedings of the 2013 Winter Simulation Conference, pp. 1560–1568. Washington, DC.
Ozik, J., Collier, N., Combs, T., Macal, C.M., North, M., 2015. Repast Simphony statecharts. Journal of Artificial Societies & Social Simulation 18 (3), 11. Available at http://jasss.soc.surrey.ac.uk/18/3/11.html.
Parker, D.C., Manson, S.M., Janssen, M.A., Hoffmann, M.J., Deadman, P., 2003. Multi-agent systems for the simulation of land-use and land-cover change: A review. Annals of the Association of American Geographers 93 (2), 314–337.
Peuquet, D., 2002. Representations of space and time. Guilford Press, New York.
Pires, B., Crooks, A.T., 2016. The geography of conflict diamonds: The case of Sierra Leone. In: Xu, K.S., Reitter, D., Lee, D., Osgood, N. (Eds.), Proceedings of the 2016 International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction and Behavior Representation in Modeling and Simulation. Springer, Washington, DC, pp. 335–345.

242

Agent-Based Modeling

Pires, B., Crooks, A.T., 2017. Modeling the emergence of riots: A geosimulation approach. Computers, Environment and Urban Systems 61, 66–80. Pumain, D., 2012. Multi-agent system modelling for urban systems: The series of SIMPOP models. In: Heppenstall, A.J., Crooks, A.T., See, L.M., Batty, M. (Eds.), Agent-based models of geographical systems. Springer, New York, pp. 721–738. Railsback, S.F., Grimm, V., 2011. Agent-based and individual-based modeling: A practical introduction. Princeton University Press, Princeton, NJ. Railsback, S.F., Harvey, B.C., 2002. Analysis of habitat selection rules using an individual-based model. Ecology 83 (7), 1817–1830. Railsback, S.F., Lytinen, S.L., Jackson, S.K., 2006. Agent-based simulation platforms: Review and development recommendations. Simulation 82 (9), 609–623. Rao, A.S., Georgeff, M.P., 1991. Modeling rational agents within a BDI-architecture. In: Allen, J., Fikes, R., Sandewall, E. (Eds.)Proceedings of the Second International Conference on Principles of Knowledge Representation and Reasoning, pp. 473–484. San Mateo, CA. Rao, A.S., Georgeff, M.P., 1995. BDI agents: From theory to practice. In: Gasser, L., Lesser, V. (Eds.)Proceedings of the First International Conference on Multiagent Systems, pp. 312–319. San Francisco, CA. Ripley, B.D., 1977. Modelling spatial patterns. Journal of the Royal Statistical Society: Series B 39 (2), 172–212. Robertson, D.A., 2005. Agent-based modeling toolkits. The Academy of Management Learning and Education 4 (4), 525–527. Robinson, D.T., Brown, D., Parker, D.C., Schreinemachers, P., Janssen, M.A., Huigen, M., Wittmer, H., Gotts, N., Promburom, P., Irwin, E., Berger, T., Gatzweiler, F., Barnaud, C., 2007. Comparison of empirical methods for building agent-based models in land use science. Journal of Land Use Science 2 (1), 31–55. Schelling, T.C., 1971. Dynamic models of segregation. Journal of Mathematical Sociology 1 (1), 143–186. Schmidt, B., 2000. The modelling of human behaviour. SCS-Europe BVBA, Ghent, Belgium. Schmidt, B., 2002. The modelling of human behaviour: The PECS reference model. In: Proceedings 14th European Simulation Symposium. Dresden, Germany. Simon, H.A., 1955. A behavioral model of rational choice. The Quarterly Journal of Economics 69 (1), 99–118. Smajgl, A., Brown, D.G., Valbuena, D., Huigen, M.G.A., 2011. Empirical characterisation of agent behaviours in socio-ecological systems. Environmental Modelling & Software 26 (7), 837–844. Smith, D.M., Pearce, J., Harland, K., 2011. Can a deterministic spatial microsimulation model provide reliable small-area estimates of health behaviours? An example of smoking prevalence in New Zealand. Health & Place 17 (2), 618–624. Sullivan K, Coletti M, and Luke S (2010) GeoMason: GeoSpatial support for MASON. Department of Computer Science, George Mason University, Technical Report Series, Fairfax, VA. Swarm (2016) Swarm: A platform for agent-based models. Available at http://www.swarm.org/ (accessed on 7 Sep. 2016). Taillandier, P., 2014. Traffic Simulation with the GAMA Platform. In: Eighth International Workshop on Agents in Traffic and Transportation, International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2014). Paris, France. Available at https://hal.archives-ouvertes.fr/hal-01055567. Taillandier, P., Therond, O., Gaudou, B., 2012. A new BDI agent architecture based on the belief theory. Application to the modelling of cropping plan decision-making. In: Seppelt, R., Voinov, A.A., Lange, S., Bankamp, D. 
(Eds.), 2012 International Congress on Environmental Modelling and Software, International Environmental Modelling and Software Society. Leipzig, Germany, pp. 2471–2479. Taillandier, P., Banos, A., Drogoul, A., Gaudou, B., Marilleau, N., Truong, Q.C., 2016. Simulating urban growth with raster and vector models: A case study for the city of Can Tho, Vietnam. In: Osman, N., Sierra, C. (Eds.)International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2016). Springer, Singapore, pp. 154–171. Takadama, K., Kawai, T., Koyama, Y., 2008. Micro- and macro-level validation in agent-based simulation: Reproduction of human-like behaviours and thinking in a sequential bargaining game. Journal of Artificial Societies and Social Simulation 11 (2), 9. Available at http://jasss.soc.surrey.ac.uk/11/2/9.html. Taylor, G., Frederiksen, R., Vane, R.R., Waltz, E., 2004. Agent-based simulation of geo-political conflict. In: Hill, R., Jacobstein, N. (Eds.)The Sixteenth Annual Conference on Innovative Applications of Artificial Intelligence. AAAI Press, San Jose, CA, pp. 884–891. Thiele, J.C., Grimm, V., 2010. NetLogo meets R: Linking agent-based models with a toolbox for their analysis. Environmental Modelling & Software 25 (8), 972–974. Tobler, W.R., 1979. Cellular geography. In: Gale, S., Olsson, G. (Eds.), Philosophy in geography. Reidel, Dordrecht, The Netherlands, pp. 379–386. Torrens PM (2000) How cellular model of urban systems work. Centre for Advanced Spatial Analysis, University College London, Working paper 28, London, UK. Torrens, P.M., 2010. Agent-based modeling and the spatial sciences. Geography Compass 4 (5), 428–448. Torrens, P.M., 2012. Moving agent-pedestrians through space and time. Annals of the Association of American Geographers 102 (1), 35–66. Torrens, P.M., 2014. High-fidelity behaviors for model people on model streetscapes. Annals of GIS 20 (3), 139–157. Torrens, P.M., McDaniel, A.W., 2013. Modeling geographic behavior in riotous crowds. Annals of the Association of American Geographers 103 (1), 20–46. Torrens, P.M., Nara, A., 2007. Modelling gentrification dynamics: A hybrid approach. Computers, Environment and Urban Systems 31 (3), 337–361. Turing, A., 1936. On computable numbers with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society 42 (2), 230–265. Turing, A.M., 1950. Computing machinery and intelligence. Mind 59 (236), 433–460. Ungerer, M.J., Goodchild, M.F., 2002. Integrating spatial data analysis and GIS: A new implementation using component object model (COM). International Journal of Geographic Information Science 16 (1), 41–53. Urban, C., 2000. PECS: A reference model for the simulation of multi-agent systems. In: Suleiman, R., Troitzsch, K.G., Gilbert, N. (Eds.), Tools and techniques for social science simulation. Springer, New York, pp. 83–114. van Vliet, J., Hagen-Zanker, A., Hurkens, J., van Delden, H., 2013. A fuzzy set approach to assess the predictive accuracy of land use simulations. Ecological Modelling 261–262, 32–42. von Neumann, J., 1951. The general and logical theory of automata. In: Jeffress, L.A. (Ed.), Cerebral mechanisms in behavior: The Hixon symposium. Wiley, New York, pp. 1–41. Vos, J.R., North, M.J., 2004. Repast.NET. In: Macal, C.M., Sallach, D., North, M.J. (Eds.)Proceedings of the Agent 2004 Conference on Social Dynamics: Interaction, Reflexivity and Emergence, pp. 239–254. Chicago, IL. Ward, J.A., Evans, A.J., Malleson, N., 2016. Dynamic calibration of agent-based models using data assimilation. 
Open Science 3 (4), 150703. Weiner, N., 1948. Cybernetics. MIT Press, Cambridge, MA. Westervelt, J.D., 2002. Geographic information systems and agent-based modelling. In: Gimblett, H.R. (Ed.), Integrating geographic information systems and agent-based modelling techniques for simulating social and ecological processes. Oxford University Press, Oxford, UK, pp. 83–104. White, R., Engelen, G., 1993. Cellular automata and fractal urban form: A cellular modelling approach to the evolution of urban land use patterns. Environment and Planning A 25 (8), 1175–1199. Wilensky, U., Rand, W., 2015. An introduction to agent-based modeling: Modeling natural, social, and engineered complex systems with NetLogo. MIT Press, Cambridge, MA. Wilson, A.G., 1974. Urban and regional models in geography and planning. Wiley, Chichester, UK. Windrum, P., Fagiolo, G., Moneta, A., 2007. Empirical validation of agent-based models: Alternatives and prospects. Journal of Artificial Societies and Social Simulation 10 (2), 8. Available at http://jasss.soc.surrey.ac.uk/10/2/8.html. Wise S (2014) Using social media content to inform agent-based models for humanitarian crisis response. PhD Dissertation, George Mason University, Fairfax, VA. Wolfram, S., 2002. A knew kind of science. Wolfram Media, Champaign, IL. Wooldridge, M., 1997. Agent-based software engineering. IEE Proceedings-Software Engineering 144 (1), 26–37. Wooldridge, M., Jennings, N.R., 1995. Intelligent agents: Theory and practice. Knowledge Engineering Review 10 (2), 115–152. Wu, B.M., Birkin, M., 2012. Agent-based extensions to a spatial microsimulation model of demographic change. In: Heppenstall, A.J., Crooks, A.T., See, L.M., Batty, M. (Eds.), Agent-based models of geographical systems. Springer, New York, pp. 347–360.

Agent-Based Modeling

243

Xie, Y., Fan, S., 2014. Multi-city sustainable regional urban growth simulationdMSRUGS: A case study along the mid-section of silk road of China. Stochastic Environmental Research and Risk Assessment 28 (4), 829–841. Yiu C (2012) The big data opportunity: Making government faster, smarter and more personal, policy exchange, London, England. Available at https://policyexchange.org.uk/ publication/the-big-data-opportunity-making-government-faster-smarter-and-more-personal/. Yu, L., 2006. Understanding information inequality: Making sense of the literature of the information and digital divides. Journal of Librarianship and Information Science 38 (4), 229–252.

Further Reading Batty, M., 2005b. Cities and complexity: Understanding cities with cellular automata, agent-based models, and fractals. MIT Press, Cambridge, MA. Batty, M., 2013b. The new science of cities. MIT Press, Cambridge, MA. Benenson, I., Torrens, P.M., 2004b. Geosimulation: Automata-based modelling of urban phenomena. Wiley, London, UK. Epstein, J.M., Axtell, R., 1996b. Growing artificial societies: Social science from the bottom up. MIT Press, Cambridge, MA. GAMA (2016) GAMA modeling and simulation development environment. Available at http://gama-platform.org/ (accessed on 20 Oct. 2016). Gilbert, N., Troitzsch, K.G., 2005b. Simulation for the social scientist. In: 2nd edn. Open University Press, Milton Keynes. Gimblett, H.R., 2002b. Integrating geographic information systems and agent-based modelling techniques for simulating social and ecological processes. In: Oxford University Press, Oxford, UK. Heppenstall, A.J., Crooks, A.T., Batty, M., See, L.M. (Eds.), 2012b. Agent-based models of geographical systems. Springer, New York. Kohler, T.A., Gumerman, G.J. (Eds.), 2000. Dynamics in human and primate societies: Agent-based modeling of social and spatial processes. Oxford University Press, Oxford, UK. Maguire, D.J., Batty, M., Goodchild, M.F. (Eds.), 2005. GIS, spatial analysis and modelling. ESRI Press, Redlands, CA. O’Sullivan, D., Perry, G.L., 2013. Spatial simulation: Exploring pattern and process. Wiley, Chichester, UK. Parker, D.C., Berger, T., Manson, S.M., 2001. In: Proceedings of an International Workshop on Agent-Based Models of Land-Use and Land-Cover Change. Irvine, CA. Railsback, S.F., Grimm, V., 2011b. Agent-based and individual-based modeling: A practical introduction. Princeton University Press, Princeton, NJ. Repast. (2016). Recursive porous agent simulation toolkit. Available at http://repast.sourceforge.net/ (accessed on 7 Oct. 2016). Sanders, L., Pumain, D., Mathian, H., Guérin-Pace, F., Bura, S., 1997. SIMPOP: A multiagent system for the study of urbanism. Environment and Planning B: Planning and Design 24 (2), 287–305. Simon, H.A., 1996. The sciences of the artificial. In: 3rd edn. MIT Press, Cambridge, MA. Wilensky, U., Rand, W., 2015b. An introduction to agent-based modeling: Modeling natural, social, and engineered complex systems with NetLogo. MIT Press, Cambridge, MA.

Relevant Websites http://www.complexcity.info/dA Science of Cities. http://www.bartlett.ucl.ac.uk/casa/dCentre for Advanced Spatial Analysis. http://www.geosimulation.org/dGeosimulation. http://www.gisagents.org/dGIS and Agent-based modelling. http://cs.gmu.edu/eclab/projects/mason/dMASON Simulation Library. https://ccl.northwestern.edu/netlogo/dNetLogo, a programmable modelling environment. https://www.openabm.org/dOpen Agent-based modelling Consortium.

1.17 Spatial Optimization for Sustainable Land Use Planning

Kai Cao, National University of Singapore, Singapore

© 2018 Elsevier Inc. All rights reserved.

1.17.1 Spatial Optimization
1.17.2 Sustainable Land Use Planning
1.17.3 Sustainable Land Use Optimization Problems
1.17.3.1 Objectives and Constraints
1.17.3.1.1 Spatial compactness
1.17.3.1.2 Spatial compatibility
1.17.3.1.3 Spatial contiguity
1.17.3.2 Formulation
1.17.3.2.1 Single objective–based multiobjective optimization
1.17.3.2.2 Pareto front–based multiobjective optimization
1.17.3.3 Solutions
1.17.3.4 Discussion
1.17.3.5 Conclusion
References

1.17.1 Spatial Optimization

Spatial optimization has been a popular and challenging topic for decades, especially in the fields of geography, regional science, economics, operational research, environmental studies, and engineering. Church (2016) noted that spatial optimization involves identifying how land use and other activities should be arranged and organized spatially, and includes many location, layout, network, and districting problems that involve design, operations, and planning. The field of spatial optimization is firmly rooted in the work of pioneers many years ago (Church, 2016; Euler, 1741; Hotelling, 1929; Steiner et al., 1876), but the term "spatial optimization" itself is relatively new, emerging during the late 1960s within the context of regional science (Mathieson, 1969). The field has developed rapidly in the past decades along with the advancement of computer science. ReVelle and Swain (1970) studied the well-known central facilities location problem (the P-median problem) in the 1970s, and Leung (1984) studied robust programming for spatial optimization in his dissertation research. Kuby (1989) developed a spatial optimization model to find the optimal spatial arrangement of towns and markets. O'Kelly and Miller (1994) reviewed the hub network design problem. Hof and Bevers (1998) studied spatial optimization problems for managed ecosystems. Cova and Church (2000a) developed a neighborhood operator approach to solve a site selection spatial optimization problem. Lemberg and Church (2000) studied the reassignment of school boundaries in Broward County, Florida between 1990 and 1998. Liu et al. (2006) used the ANT algorithm to find optimal sites for fire stations in Singapore. Ligmann-Zielinska et al. (2008) utilized spatial optimization to solve multiobjective land use allocation problems. Tong and Murray (2012) reviewed spatial optimization problems in geography. Cao and Ye (2013) employed a parallel genetic algorithm to solve a land use optimization problem. In addition, Cao et al. (2014) proposed a Pareto front–based optimization model to calibrate a cellular automata model for the study of rural–urban land conversion in New Castle, Delaware. Li et al. (2014) defined the p-compact-regions problem in their research. Xue et al. (2015) proposed an ant colony optimization model to solve waste collection problems in Singapore. Many other spatial optimization–related studies have been successfully conducted in the past decades. The booming achievements in this field can be seen in Fig. 1 (based on Scopus as of the end of 2016, checked on Feb. 18, 2017): as of the end of 2016, a total of 27,689 publications could be found using the keyword "spatial optimization" on Scopus, and Fig. 1 demonstrates the rapidly rising number of spatial optimization–related studies since 1961. In addition, Fig. 2 shows that most of the publications are from the United States and China, followed by Germany, the United Kingdom, and France. Some of these publications may fall beyond our definition of spatial optimization, but these two figures still reflect the relative distribution of spatial optimization–related studies across countries and years (spatially and temporally). Generally, spatial optimization problems comprise three components: objective(s), constraint(s), and the decisions to be made (Tong and Murray, 2012).
The objective(s) refer to the goal(s) of the problem, for example, maximizing service coverage, minimizing cost, or maximizing ecological benefit. There could be a single objective or multiple objectives, depending on the context. Constraints refer to the conditions that must be satisfied, such as the budget limit for constructing a new fire station, how many more fire stations can be constructed, or the locations that need to be covered by a certain kind of service. The decisions to be made refer to the core research questions, such as where to locate a new fire station or how to allocate different land use resources to different locations; these decisions need to be interpreted into the corresponding objectives and constraints, as illustrated in the formulation sketched below. To tackle spatial optimization problems, many different kinds of models have been developed, ranging from linear programming (LP) models to heuristic models, and from weighted-sum models to Pareto front–based models, depending on the context of the project.
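To make these three components concrete, consider the classic P-median problem of ReVelle and Swain (1970) mentioned above, written here in its standard textbook form (the notation is generic rather than drawn from this article): $a_i$ is the demand at location $i$, $d_{ij}$ the distance from $i$ to candidate site $j$, $x_{ij} = 1$ if demand $i$ is assigned to a facility at $j$, and $y_j = 1$ if a facility is opened at site $j$.

$$
\begin{aligned}
\min \quad & \sum_{i} \sum_{j} a_i \, d_{ij} \, x_{ij} \\
\text{s.t.} \quad & \sum_{j} x_{ij} = 1 \quad \forall i \\
& x_{ij} \le y_j \quad \forall i, j \\
& \sum_{j} y_j = p \\
& x_{ij}, \, y_j \in \{0, 1\}
\end{aligned}
$$

The objective minimizes total weighted travel distance; the constraints ensure that every demand point is served, that service comes only from opened facilities, and that exactly $p$ facilities are sited; and the binary variables are the decisions to be made.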

Fig. 1 Rising number of "spatial optimization" related studies on Scopus since 1961 (annual publication counts, 1961–2016). Source: Scopus.

Fig. 2 Top ten countries in terms of "spatial optimization" related studies on Scopus (in descending order: United States, China, Germany, United Kingdom, France, Canada, Italy, Japan, Australia, Spain). Source: Scopus.

Spatial optimization problems are clearly too broad to be addressed all at once; therefore, this article focuses only on studies of spatial optimization for sustainable land use planning. In the following section (section "Sustainable Land Use Planning"), more detailed concepts of sustainable land use planning are first reviewed and summarized, followed by the definition and formulation of the objectives and constraints of a generic sustainable land use optimization problem derived from spatial optimization and sustainable land use planning. After a review of the approaches for solving sustainable land use optimization problems, the article closes with a discussion of challenges and opportunities and with conclusions.

1.17.2 Sustainable Land Use Planning

A comprehensive definition of sustainability was developed and brought forward by the World Commission on Environment and Development (WCED) through the publication of Our Common Future in 1987. Sustainable development can hence be defined as "development that meets the needs of the present without compromising the ability of future generations to meet their own needs" (WCED, 1987). There have also been many other definitions, for example, Meadows et al. (1972) and Munasinghe (1993). By and large, a general consensus has been reached on the three aspects involved in sustainable development: economy, environment, and society (Fig. 3).

Economic aspect: An economically sustainable system should be capable of producing goods and services on a continuing basis, building enough public facilities for society, sustaining welfare, managing finances properly, and so on, so as to alleviate the damage resulting from environmental deterioration. In short, an economically sustainable system not only promotes social equity but also minimizes the impact of pollution on the environment.

Fig. 3 Three aspects of sustainability (economy, ecology, and society surrounding sustainable development).

Environmental aspect: An environmentally sustainable system should be able to avoid overexploitation of renewable resources. Such a system conserves nonrenewable resources through limited usage, or depletes them only to the extent proportional to the alternatives available. Ideally, this aspect should also include biodiversity, other vital ecosystem functions, and atmosphere, water, and land resources.

Social aspect: A socially sustainable system refers to a framework that consciously takes into consideration the well-being of people and communities. Noticeably, the implied objectives are multidimensional, raising various issues, including judgments on sustainable activities and how to achieve balance among the related factors or objectives.

The term sustainable land use planning denotes two major domains: land use planning and sustainable development. Land use planning deals with the active planning of future land use by people to support their own needs. The land use types to be allotted can be diverse, such as industrial sites, residential zones, scenic landscapes, and green spaces. Land use planning generally involves assessment and decision making about where and how much to allocate for these specific land use activities. Based on FAO guidelines, land use planning can be understood as "the systematic assessment of land and water potential, alternatives for land use and economic and social conditions in order to select and adopt the best land-use options" (FAO, 1993), which can somewhat reflect the components of sustainable development and is commonly employed by various governments to manage land use activities and mitigate potential conflicts among different concerns. Of late, sustainability and sustainable development have been hot research topics. Sustainable development, thus defined, is an important goal for land use planning, which yields the term sustainable land use planning. This term has to be based on the meaning and descriptions of present practices in land use planning, as well as on the notion of sustainable development. Sustainable land use planning embraces several aspects, as given in Fig. 4. In this context, sustainable land use planning can be defined as instruments to set land use policies, to implement these policies for the right location of the various land uses, and to improve the spatial and physical conditions for an optimal use and protection of the natural resources in the long term, while meeting the needs and aspirations of the present generations (Van Lier et al., 1994). Briefly stated, the essential notion behind sustainable land use planning pertains to the location (where) and proportion (how much) of the allocation of specific land use activities. Importantly, this has to rest on a judicious trade-off rooted in the three focal elements of sustainability: economy, environment, and society.

Fig. 4 Contents of sustainable land use planning (Van Lier et al., 1994): setting policies for land use; planning for various land uses and their locations; and plans to improve spatial conditions, for the optimal use and protection of natural resources in the long term while meeting the needs and aspirations of the present generation.


In addition to these three general goals of sustainable development in land use planning, some detailed and practical aspects should also be worked out during the sustainable land use planning process. Evidently, such an overall goal can be understood from several perspectives, and the literature on sustainability and spatial planning has brought up different sets of detailed objectives from different perspectives (Næss, 2001; OECD, 1994; UN, 1998).

1.17.3 Sustainable Land Use Optimization Problems

Mathematically and theoretically, sustainable land use planning can be understood as a kind of spatial optimization problem with the previously mentioned objectives and logic behind it, which leads to the term sustainable land use optimization. Sustainable land use optimization, as a kind of spatial multiobjective optimization, can serve as an effective land use planning support tool to assist the planning or decision-making process, once the comprehensive sustainability objectives of land use planning have been combined and effective and efficient optimization approaches implemented. There are two key steps in generating optimal scenarios for sustainable land use optimization: the first is the selection and formulation of the objectives and constraints for land use sustainability; the second crucial step is the construction of effective and efficient approaches for solving the resulting spatial multiobjective optimization problem.

1.17.3.1 Objectives and Constraints

With regard to the multiobjective problem, land use planning for sustainable development involves the same objectives discussed earlier under sustainability: economy, society, and environment. Several research projects pertaining to the definition of these objectives have been carried out in the past. Leccese and McCormick (2000), in Charter of the New Urbanism, described an agenda for sustainable land use planning emphasizing infill development, environmental protection, spatial compactness, etc. The primary targets of Balling et al. (1999) were to minimize traffic congestion, control air pollution, promote affordable housing, maximize economic development, and conserve historical and cultural sites. The research of Ligmann-Zielinska et al. (2008) focused on infill development, compatibility of neighboring land uses, and defensible redevelopment. Cao et al. (2012) employed GDP, land use conversion, ecological and geological suitability, accessibility, Not-In-My-Back-Yard (NIMBY) influence, and spatial compactness and compatibility in a multiobjective land use optimization study. Many other meaningful sets of objectives or constraints have been considered by different scholars and researchers; by and large, discussions of objectives representing sustainability invariably seem to consider economic benefit as one of the foremost goals, with social and environmental aspects also considered very important driving forces of sustainable development. In addition, in view of the spatial nature of such problems, compactness, compatibility, and contiguity are also frequently employed in existing studies as objectives or constraints.

1.17.3.1.1 Spatial compactness

Compact land use is favored in various kinds of planning, for example, reserve design and forest management, and promoting compactness has been a commonly recognized goal of sustainable land use planning. Urban sprawl has been a widespread problem for over 50 years, one that tremendously affects urban development. Environmentally, there are two main concerns: the extent of landscape consumption and the pollution caused (Anderson et al., 1996; Guiliano and Narayan, 2003; Williams, 1999). In addition, urban sprawl may lead to the destruction of the natural habitat of endangered species, inefficient consumption of land resources, air pollution, and inefficient energy consumption. Urban sprawl also evidently affects sustainability in terms of social equity, as noted by Arbury (2005). Compactness should therefore be one of the essential objectives in planning a city moving toward sustainability. Lock (1995) perceived a compact city as one that makes the fullest use of land resources, and other studies of compact cities, for example, Burton (2000) and Thomas and Cousins (1996), have reached the consensus that compact land use is essential for sustainability. Many questions still exist, however, especially concerning the quantitative evaluation of land use compactness. A variety of measures have been developed to quantify spatial compactness, such as the nonlinear integer program-neighbor method, minimization of the number of clusters per land use type, the shape index (Aerts and Heuvelink, 2002; Ligmann-Zielinska et al., 2008), and spatial autocorrelation (Cliff and Ord, 1973; Kurttila et al., 2002; Wardoyo and Jordan, 1996). However, there is still no universal or best measure to define and quantify spatial compactness for land use optimization, and most studies are based on direct implementations of, or variations on, the previously mentioned models. Aerts et al. (2003a), Stewart et al. (2004), and Janssen et al. (2008) employed the shape index to pursue compactness of land use scenarios; the effect is reasonably good, but it can only be applied effectively to a relatively small area (in terms of the number of units). Aerts et al. (2003a) compared different models on a hypothetical area (an 8 by 8 grid), and the efficiency of the nonlinear integer program-neighbor method was found to be the best. It is still hard to say which measure is best, and the selection of a compactness measure still needs to be made on a case-by-case basis; a minimal neighbor-based measure is sketched below for illustration.
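As a concrete illustration, the following minimal Python sketch computes a simple neighbor-based compactness score for a raster land use map: the fraction of four-neighbor cell pairs that share the same land use type, so that higher values indicate more compact (less fragmented) allocations. This is only one of the many possible measures discussed above; the function name and example grids are illustrative, not drawn from the literature cited here.

```python
import numpy as np

def neighbor_compactness(grid: np.ndarray) -> float:
    """Fraction of four-neighbor cell pairs sharing the same land use type.

    grid: 2-D integer array of land use type codes.
    Returns a value in [0, 1]; 1.0 means every neighboring pair matches.
    """
    same_vertical = grid[1:, :] == grid[:-1, :]    # north-south neighbor pairs
    same_horizontal = grid[:, 1:] == grid[:, :-1]  # east-west neighbor pairs
    matches = same_vertical.sum() + same_horizontal.sum()
    pairs = same_vertical.size + same_horizontal.size
    return matches / pairs

# A compact 4x4 allocation versus a fully fragmented one (types coded 1 and 2)
compact = np.array([[1, 1, 2, 2],
                    [1, 1, 2, 2],
                    [1, 1, 2, 2],
                    [1, 1, 2, 2]])
fragmented = np.array([[1, 2, 1, 2],
                       [2, 1, 2, 1],
                       [1, 2, 1, 2],
                       [2, 1, 2, 1]])
print(neighbor_compactness(compact))     # 0.8333...
print(neighbor_compactness(fragmented))  # 0.0
```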

1.17.3.1.2 Spatial compatibility

Compatibility of land use scenarios is concerned with the relationships among various land use types and the overall layout. It has been effectively utilized in many sustainable land use optimization studies (Cao et al., 2011, 2012; Chandramouli et al., 2009; Ligmann-Zielinska et al., 2008).


Fig. 5 Compatibility (Cao et al., 2012). The matrix K gives the compatibility between each pair of land use types (1–5):

Land use type    1     2     3     4     5
1                1.0   1.0   1.0   0.9   0.8
2                1.0   1.0   0.5   0.2   0.0
3                1.0   0.5   1.0   0.9   1.0
4                0.9   0.2   0.9   1.0   0.1
5                0.8   0.0   1.0   0.1   1.0

As the compatibility between neighbors differs across land use types, it is essential for land use planning to allocate compatible land uses together to establish harmony across the complete area. Under this consideration, the compatibility of neighbors should be an objective of sustainable land use planning; the details are illustrated in Fig. 5. With regard to setting the compatibility values, it is feasible to obtain values from professionals or experts (e.g., through the Delphi method). However, this may involve significant subjectivity and uncertainty: even for the same expert, it is hard to assess the relationship of every pair of land use types without bias. The analytic hierarchy process and pairwise comparison can be employed to mitigate this uncertainty to some extent (Cao et al., 2012; David, 1988; Saaty, 2008). A minimal sketch of how such a matrix can be turned into a compatibility score is given below.
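To make the objective concrete, the following sketch scores a raster land use map against the compatibility matrix of Fig. 5, averaging the compatibility values over all four-neighbor pairs. This is an assumed, simplified version of the objective written for illustration; actual studies such as Cao et al. (2012) use their own formulations.

```python
import numpy as np

# Compatibility matrix from Fig. 5 (rows/columns are land use types 1-5)
K = np.array([[1.0, 1.0, 1.0, 0.9, 0.8],
              [1.0, 1.0, 0.5, 0.2, 0.0],
              [1.0, 0.5, 1.0, 0.9, 1.0],
              [0.9, 0.2, 0.9, 1.0, 0.1],
              [0.8, 0.0, 1.0, 0.1, 1.0]])

def compatibility_score(grid: np.ndarray, K: np.ndarray) -> float:
    """Mean compatibility over all four-neighbor pairs; higher is better.

    grid: 2-D array of land use type codes 1..5 (shifted to 0-based indices).
    """
    g = grid - 1  # 0-based indices into K
    vertical = K[g[1:, :], g[:-1, :]]    # north-south neighbor pairs
    horizontal = K[g[:, 1:], g[:, :-1]]  # east-west neighbor pairs
    return (vertical.sum() + horizontal.sum()) / (vertical.size + horizontal.size)

grid = np.array([[1, 1, 3, 3],
                 [1, 2, 3, 5],
                 [4, 2, 2, 5]])
print(round(compatibility_score(grid, K), 3))
```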

1.17.3.1.3 Spatial contiguity

Another important spatial property of land use planning is contiguity. It is mostly treated as a constraint of the sustainable land use optimization problem and has been successfully integrated into many existing studies (Aerts et al., 2003a; Cao and Ye, 2013; Cao et al., 2012; Cova and Church, 2000b; Ligmann-Zielinska et al., 2008; Shirabe, 2005; Williams, 2002). Contiguity can be defined differently for vector and raster datasets. Most vector-based contiguity models are based on graph theory: Cova and Church (2000b) considered a shortest-path contiguity constraint to solve a single-site selection problem, and Cao and Ye (2013) employed a spatial neighboring relationship table (rook or queen) to control contiguity during the optimization process for a land use optimization problem. Raster-based contiguity models, on the other hand, are mostly based on four-neighbor or eight-neighbor relationships on the grid. Cao et al. (2012) directly employed raster-based contiguity in a sustainable land use optimization problem, where it played an important role in generating reasonable optimal land use planning scenarios in the case study. The complexity differs between vector and raster representations: generally, the raster representation is more straightforward than the vector representation in terms of integrating spatial contiguity into the optimization model, but the vector representation is more esthetically pleasing and closer to practical land use planning cases (Cao and Ye, 2013; Cao et al., 2011). A minimal raster-based contiguity check is sketched below.
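For illustration, the following sketch checks whether all cells of a given land use type form a single four-neighbor (rook) connected patch on a raster grid, using a simple breadth-first flood fill. This is an assumed generic implementation, not the specific constraint formulation of any of the studies cited above.

```python
from collections import deque
import numpy as np

def is_contiguous(grid: np.ndarray, land_use: int) -> bool:
    """True if all cells of `land_use` form one four-neighbor connected patch."""
    cells = list(zip(*np.where(grid == land_use)))
    if not cells:
        return True  # no cells of this type: vacuously contiguous
    target, seen = set(cells), {cells[0]}
    queue = deque([cells[0]])
    while queue:  # breadth-first flood fill from an arbitrary seed cell
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (nr, nc) in target and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen) == len(target)

grid = np.array([[2, 2, 1],
                 [1, 2, 1],
                 [1, 2, 2]])
print(is_contiguous(grid, 2))  # True: the 2s form one rook-connected patch
print(is_contiguous(grid, 1))  # False: the 1s form two separate patches
```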

1.17.3.2 Formulation

In a multiobjective optimization problem, the objectives of sustainable land use optimization usually conflict and cannot all be satisfied simultaneously. Solutions are based on the construction of an evaluation function for the multiobjective problem. There are generally two categories of approaches to formulating multiobjective optimization problems. The first is to combine all the objectives into a single objective by some feasible method, or to choose only the main objective and convert the other objectives into constraints. The second involves determining the entire Pareto front, or a representative set of all the nondominated solutions. More details are introduced in the following sections.

1.17.3.2.1 Single objective–based multiobjective optimization

Single objective–based multiobjective optimization is very similar to generic single-objective optimization. Earlier research involving multiobjective optimization quite frequently used this approach when dealing with multiple objectives, since the straightforward methods and models for combining the multiple objectives make it easy to understand and operate. Some of the methods are listed here:

1. Constraints method: The principle of this method is to choose one main objective from all the objectives and then define all the other objectives as constraints. It is comparatively easy to transform multiobjective optimization problems into single-objective optimization problems with constraints.

2. Stratified sequential method: This method, which is a little more complicated than the constraints method, takes advantage of subjective weights for all the objectives and then satisfies the objectives one by one following the sequence of the weights (from large to small). To some extent, this method can solve multiobjective optimization problems effectively; however, problems arise during the weight-setting procedure. Weights set using subjective opinions may not reflect the real or actual weights, and it is possible for the user to miss some of the optimal solutions through this process.


3. Efficiency coefficient method: The key notion behind this method is to normalize the objective functions and take the sum of all the objective functions as the final objective. This makes it more feasible to reflect the weights the users want to set; however, the normalization process and the minimization and maximization of each objective are also hard to decide upon without systematic analysis of the objectives.

4. Linear weighted-sum method: This method involves the creation of an evaluation function as the weighted sum of the objectives. Despite its simplicity of operation, the subjective setting of weights and the nonnormalization of the objectives can introduce erroneous solutions into the final optimum.

5. Min-max/max-min method: The principle of the min-max method is to find the best strategy based on the pessimistic solutions, while that of the max-min method is to find the worst strategy based on the optimistic solutions.

6. Multiply and divide method: The target of this method is to construct an evaluation function by multiplying and dividing all the objectives; it is easy to implement but has many limitations.

7. Goal programming: This is a commonly used method for solving multiobjective optimization problems; the formula is as follows:

$$\sum_{q=1}^{Q} \left( \frac{f_q(x) - I_q}{l_q - I_q} \right)^{r} \qquad (1)$$

In this formula, $q$ is the index of each objective, $f_q(x)$ is the evaluation of each objective, $I_q$ is the ideal value of each objective when optimized on its own, $l_q$ is the worst value of each objective, and $r$ is the power. If $r = 1$, the formula can be considered the sum of the normalized objectives with equal weights; if $r = 2$, it can be understood as the squared Euclidean distance to the ideal point, very similar to the ideal point method without weight setting; if $r = 3, \ldots, N$, it is the $r$th power of the corresponding higher-order distance to the ideal point, and sometimes a higher power can yield better results. The normalization in this method gives equal preference to each objective and ultimately yields a balanced optimal result. However, similar to the ideal point method, defining the reference point is a problem for the user. A numerical sketch of this evaluation is given after this list.

Of course, there are also other methods for integrating multiple objectives into one, such as compromise programming or the weighted sum of squares. All of them are similar to one another, and in fact sufficient systematic experimentation with different weights can identify all nondominated solutions to a convex multiobjective optimization problem (Miettinen, 1999). However, this does not hold within the context of nonconvex multiobjective optimization problems; such problems can be solved by Pareto front–based multiobjective optimization models or by nonlinear combinations of single objective–based multiobjective optimization models.
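As a numerical illustration of Eq. (1), the sketch below evaluates the goal programming score of a candidate solution from its objective values, ideal values, and worst values. All numbers here are made up for illustration.

```python
def goal_programming_score(f, ideal, worst, r=2):
    """Evaluate Eq. (1): sum of normalized objective deviations to the power r.

    f:     objective values f_q(x) of the candidate solution
    ideal: ideal value I_q of each objective when optimized on its own
    worst: worst value l_q of each objective
    Lower scores are better (0 means the ideal point is reached).
    """
    return sum(((fq - iq) / (lq - iq)) ** r
               for fq, iq, lq in zip(f, ideal, worst))

# Three minimized objectives, e.g., cost, incompatibility, fragmentation
ideal = [100.0, 0.0, 0.2]
worst = [500.0, 1.0, 1.0]
candidate = [220.0, 0.3, 0.4]
print(goal_programming_score(candidate, ideal, worst, r=2))  # 0.2425
```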

1.17.3.2.2 Pareto front–based multiobjective optimization

Hitherto, most multiobjective optimization methods have been based on the principle of constructing a single objective from all the objectives. Another general approach involves determining an entire set of Pareto-front solutions, that is, solutions that are not dominated by any other solution. By this method, optimal results can be obtained under different preferences over the objectives, moving from one solution to another along the Pareto front; however, this involves compromising on some objective(s) to achieve a certain amount of gain in some other objective(s).

1. Pareto ranking: Pareto ranking, first proposed by Goldberg (1989), is the essential part of this optimization approach. According to Goldberg's ranking (Fig. 6), the nondominated solutions are assigned the best rank of one, after which they are removed from consideration; the newly nondominated solutions are assigned rank two, and the iteration continues until all solutions are ranked. A minimal implementation of this ranking procedure is sketched after this list.

Fig. 6 Pareto ranking paradigm (solutions plotted against Objective-1 and Objective-2, labeled with Goldberg ranks 1, 2, and 3).


2. Diversity mechanism: While Pareto ranking selection can guarantee the selection of solutions by rank number, it is not sufficient to ensure a reasonable distribution of the solutions. Normally, a diversity mechanism is integrated into the ranking process to distribute the solutions evenly; niche-based sharing, cell-based density, and crowding distance are some approaches to accomplish this.
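The following sketch implements Goldberg-style Pareto ranking for a minimization problem. It is a generic implementation written for this article; the function and variable names are illustrative.

```python
def dominates(a, b):
    """True if solution a dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def goldberg_ranking(solutions):
    """Assign Goldberg Pareto ranks: rank 1 = nondominated, then peel and repeat."""
    ranks = {}
    remaining = set(range(len(solutions)))
    rank = 1
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(solutions[j], solutions[i])
                            for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return [ranks[i] for i in range(len(solutions))]

# Two minimized objectives, e.g., cost and environmental impact
solutions = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (4.0, 4.0)]
print(goldberg_ranking(solutions))  # [1, 1, 1, 2, 3]
```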

1.17.3.3 Solutions

As stated previously, the formulation of sustainable land use optimization problems in different contexts is complex and very challenging, especially when spatial objectives or constraints are considered; solving these problems can be even more challenging. In the past, various kinds of land use optimization problems were solved by exact methods, including LP and integer programming approaches. Undeniably, even for multiobjective problems, trade-offs can be obtained using exact methods after the objectives are combined into a single one (e.g., via weighted sums). Many studies have effectively integrated exact methods with geographic information systems (GIS) to undertake spatial land use planning (Aerts et al., 2003a; Arthur and Nalle, 1997; Chuvieco, 1993; Gabriel et al., 2006; Stewart, 1991, 1993; Zimmermann, 1978). Nevertheless, due to the complexity of the spatial objectives and constraints involved, many of these cases cannot be solved effectively and efficiently by exact methods, but can be solved by heuristic methods, for example, simulated annealing or genetic algorithms (Cao et al., 2012; Haque and Asami, 2011). However, quantifying the relative weights of different objectives is always a challenging task for planners or decision makers, and, as mentioned previously, the linear combination of different objectives into a single objective may miss some nonconvex solutions, which can be addressed by Pareto front–based multiobjective optimization methods. Pareto optimality (Pareto, 1971) can be used to avoid setting weights for the different objectives and to retrieve the nonconvex solutions. The primary characteristic of the Pareto set is its independence of different objectives, and the approach is very popular owing to its ability to solve multiobjective spatial optimization problems, including sustainable land use optimization problems (Balling et al., 1999; Cao et al., 2011; Chandramouli et al., 2009; Duh and Brown, 2007; Huang et al., 2013). However, the planning process is also very complicated due to the presence of a large number of variables and exponentially increasing complexity, which makes the problem-solving process very slow even if only a few objectives are considered. Due to this complexity, most existing Pareto front–based optimization studies depend on heuristic methods: Duh and Brown (2007) utilized simulated annealing to perform Pareto front–based spatial optimization, and Cao et al. (2011) employed NSGA-II to generate Pareto front–based optimal land use planning scenarios in Tongzhou, China. To summarize, sustainable land use optimization problems can be solved using both exact and heuristic methods; both have their pros and cons, and further context is required to determine which method to employ. Generally, exact methods such as LP and integer programming can guarantee optimal solutions and can be run using a number of existing commercial software packages, such as LINDO and CPLEX; however, in land use optimization problems the models would need to be simplified and may not represent reality well.
On the other hand, heuristic methods have the advantage of effectively handling larger numbers of variables (e.g., spatial and nonspatial objectives, land parcels or cells), and of solving sustainable land use optimization problems efficiently by generating optimal or near-optimal solutions. Such methods have been widely employed in different studies of land use optimization problems, including simulated annealing (Aerts and Heuvelink, 2002; Aerts et al., 2003b), evolutionary and genetic algorithms (Cao and Ye, 2013; Cao et al., 2011, 2012; Dai and Ratick, 2014; Haque and Asami, 2011; Janssen et al., 2008; Matthews et al., 1999; Stewart and Janssen, 2014; Stewart et al., 2004), and particle swarm optimization algorithms (Masoomi et al., 2013). However, how dependable these solutions are still needs more discussion. A minimal simulated annealing sketch combining the measures introduced earlier is given below.
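To illustrate the heuristic route, the sketch below runs a bare-bones simulated annealing loop that mutates one raster cell at a time and accepts worse solutions with a temperature-dependent probability. The objective simply reuses the neighbor-based compactness measure from the earlier sketch; the whole setup is an assumed toy example, not a reproduction of any study cited above.

```python
import math
import random
import numpy as np

def objective(g):
    # Fraction of identical four-neighbor pairs (the same compactness
    # measure as in the earlier sketch); higher means more compact.
    same = (g[1:, :] == g[:-1, :]).sum() + (g[:, 1:] == g[:, :-1]).sum()
    return same / (g[1:, :].size + g[:, 1:].size)

def anneal(grid, n_types=5, steps=20000, t0=1.0, cooling=0.9995, seed=42):
    """Bare-bones simulated annealing maximizing objective(grid)."""
    rng = random.Random(seed)
    current = grid.copy()
    score = objective(current)
    best, best_score = current.copy(), score
    t = t0
    for _ in range(steps):
        r, c = rng.randrange(grid.shape[0]), rng.randrange(grid.shape[1])
        old = current[r, c]
        current[r, c] = rng.randint(1, n_types)  # mutate a single cell
        new_score = objective(current)
        # Always accept improvements; accept worse moves with probability
        # exp(delta / t), which shrinks as the temperature t cools.
        if new_score >= score or rng.random() < math.exp((new_score - score) / t):
            score = new_score
            if score > best_score:
                best, best_score = current.copy(), score
        else:
            current[r, c] = old  # reject: undo the mutation
        t *= cooling
    return best, best_score

start = np.random.randint(1, 6, size=(10, 10))  # random initial land use map
best_map, best_value = anneal(start)
print(best_value)
```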

1.17.3.4 Discussion

As described previously, sustainable land use planning problems can be addressed by spatial optimization models and can be further defined as sustainable land use optimization problems. As a multifaceted process, land use planning refers not only to mathematical land use optimization, which can assess a variety of criteria and generate optimal or near-optimal land use scenarios, but also to how to integrate this information into the planning or decision-making process, for example, the collection of different objectives from different stakeholders or interactive planning support. Clearly, further efforts on the development and integration of GIS and planning support systems will be one of the future directions of sustainable land use optimization studies (Cao et al., 2011, 2012).

Second, interpreting these sustainability objectives within the context of different research areas and demands, along with the quantitative modeling of these objectives, is also crucial. The challenging task is to develop effective quantitative evaluation models to explore and reflect the systematic correlations among the different variables involved, for example, locations, land use types, and fitness values. This remains a significant challenge in the study of sustainable land use optimization and definitely deserves more attention in the future.

Third, as a kind of multiobjective optimization problem, it is critical to find a scientific way to integrate these objectives and constraints in order to formulate an effective multiobjective sustainable land use optimization model. In this article, we have reviewed two primary categories of models, single objective–based models and Pareto front–based models, both of which have


pros and cons given the different contexts of different sustainable land use optimization problems. More efforts are still expected to better meet the demands of sustainable land use optimization problems represented by vectors or rasters, and to take into account the characteristics of the different solution approaches as well as the planning support process.

In addition to the selection and development of multiobjective land use optimization models, the approaches for generating land use planning solutions or scenarios from single objective–based or Pareto front–based models represent another critical issue, one that determines the effectiveness and overall efficiency of sustainable land use optimization problem solving. This is of particular importance when the problem is complicated (e.g., nonlinear in nature or involving a large number of spatial variables). As mentioned previously, a variety of efforts have been made, but with the emergence of high-performance computing and cloud computing, much more effort needs to be, or could be, put into this direction, which would also provide an opportunity to better tackle other computationally intensive spatial optimization problems (Cao and Ye, 2013).

Last but not least, another aspect that has not yet attracted, but deserves, much attention and effort is the temporal dimension of sustainable land use optimization problems. Sustainable land use planning refers not only to where and how much to allocate to the different land use types, but also to when (Shirabe, 2004). There have been few studies on this aspect, and the topic needs much more attention in the future. With the consideration of both space and time, sustainable land use optimization problems could be addressed much better, even though this is very challenging in terms of model formulation and optimization problem solving.

1.17.3.5 Conclusion

The past decades have witnessed extremely impressive economic growth and urbanization around the world. This rapid urban development has also brought numerous environmental and social issues, including deterioration of the physical and built environment, increased traffic congestion, and social inequality, which has generated a significant need for proactive and sustainable land use planning to steer new development toward a sustainable society. Starting from the concept of spatial optimization, this article has introduced and explained the term sustainable land use planning, and has reviewed spatial optimization approaches for solving sustainable land use planning problems, including the objectives and constraints, the formulation, and the solution of sustainable land use optimization problems. In addition, this article has discussed the challenges and opportunities of sustainable land use optimization, which can guide future research directions in this field. We look forward to more efforts and studies on spatial optimization and sustainable land use planning, and hope this article can help scholars and practitioners make better plans and decisions toward sustainable land use planning.

References

Aerts, J.C.J.H., Heuvelink, G.B.M., 2002. Using simulated annealing for resource allocation. International Journal of Geographical Information Science 16 (6), 571–587.
Aerts, J.C.J.H., Eisinger, E., Heuvelink, G.B.M., Stewart, T.J., 2003a. Using linear integer programming for multi-site land use allocation. Geographical Analysis 35, 148–169.
Aerts, J.C.J.H., Herwijnen, M.v., Stewart, T.J., 2003b. Using simulated annealing and spatial goal programming for solving a multi-site land use allocation problem. Lecture Notes in Computer Science 2632, 448–463.
Anderson, W., Kanaroglou, P., Miller, E., 1996. Urban form, energy and the environment: A review of issues, evidence and policy. Urban Studies 33, 7–35.
Arbury, J., 2005. From urban sprawl to compact city: An analysis of urban growth management in Auckland. University of Auckland, Auckland.
Arthur, J.L., Nalle, D.J., 1997. Clarification on the use of linear programming and GIS for land-use modelling. International Journal of Geographical Information Science 11 (4), 397–402.
Balling, R.J., Brown, M.R., Day, K., 1999. Multiobjective urban planning using genetic algorithm. Journal of Urban Planning and Development 125 (2), 86–99.
Burton, E., 2000. The compact city: Just or just compact? A preliminary analysis. Urban Studies 37 (11), 1969–2007.
Cao, K., Ye, X., 2013. Coarse-grained parallel genetic algorithm applied to a vector based land use allocation optimization problem: The case study of Tongzhou Newtown, Beijing, China. Stochastic Environmental Research and Risk Assessment 27 (5), 1133–1142.
Cao, K., Batty, M., Huang, B., Liu, Y., Yu, L., Chen, J., 2011. Spatial multi-objective land use optimization: Extensions to the non-dominated sorting genetic algorithm-II. International Journal of Geographical Information Science 25 (12), 1949–1969.
Cao, K., Huang, B., Wang, S., Lin, H., 2012. Sustainable land use optimization using boundary-based fast genetic algorithm. Computers, Environment and Urban Systems 36 (3), 257–269.
Cao, K., Huang, B., Li, M., Li, W., 2014. Calibrating a cellular automata model for understanding rural–urban land conversion: A Pareto front-based multi-objective optimization approach. International Journal of Geographical Information Science 28 (5), 1028–1046.
Chandramouli, M., Huang, B., Xue, L., 2009. Spatial change optimization: Integrating GA with visualization for 3D scenario generation. Photogrammetric Engineering and Remote Sensing 75 (8), 1015–1023.
Church, R.L., 2016. Tobler's law and spatial optimization: Why Bakersfield? International Regional Science Review. doi:10.1177/0160017616650612.
Chuvieco, E., 1993. Integration of linear programming and GIS for land-use modelling. International Journal of Geographical Information Science 7 (1), 71–83.
Cliff, A.D., Ord, J.K., 1973. Spatial autocorrelation. Pion, London.
Cova, T.J., Church, R.L., 2000a. Exploratory spatial optimization in site search: A neighborhood operator approach. Computers, Environment and Urban Systems 24 (5), 401–419.
Cova, T.J., Church, R.L., 2000b. Contiguity constraints for single-region site search problems. Geographical Analysis 32 (4), 306–329.
Dai, W., Ratick, S.J., 2014. Integrating a raster geographical information system with multi-objective land allocation optimization for conservation reserve design. Transactions in GIS 18 (6), 936–949.
David, H.A., 1988. The method of paired comparisons. Oxford University Press, New York.
Duh, J.-D., Brown, D.G., 2007. Knowledge-informed Pareto simulated annealing for multi-objective spatial allocation. Computers, Environment and Urban Systems 31 (3), 253–281.


Euler, L., 1741. Solutio problematis ad geometriam situs pertinentis. Commentarii Academiae Scientiarum Petropolitanae 8, 128–140.
FAO, 1993. Guidelines for land-use planning, vol. 1. Food and Agriculture Organization of the United Nations.
Gabriel, S.A., Faria, J.A., Moglen, G.E., 2006. A multiobjective optimization approach to smart growth in land development. Socio-Economic Planning Sciences 40 (3), 212–248.
Goldberg, D.E., 1989. Genetic algorithms in search, optimization, and machine learning. Addison-Wesley Longman, Boston, MA.
Guiliano, G., Narayan, D., 2003. Another look at travel patterns and urban form: The US and Great Britain. Urban Studies 40 (11), 2295–2312.
Haque, A., Asami, Y., 2011. Optimizing urban land-use allocation: Case study of Dhanmondi Residential Area, Dhaka, Bangladesh. Environment and Planning B: Planning and Design 38 (3), 388–410.
Hof, J.G., Bevers, M., 1998. Spatial optimization for managed ecosystems. Columbia University Press, New York.
Hotelling, H., 1929. Stability in competition. The Economic Journal 39 (153), 41–57.
Huang, K., Liu, X., Li, X., Liang, J., He, S., 2013. An improved artificial immune system for seeking the Pareto front of land-use allocation problem in large areas. International Journal of Geographical Information Science 27 (5), 922–946.
Janssen, R., Herwijnen, M.v., Stewart, T.J., Aerts, J.C.J.H., 2008. Multiobjective decision support for land-use planning. Environment and Planning B: Planning and Design 35 (4), 740–756.
Kuby, M., 1989. A location–allocation model of Lösch's central place theory: Testing on a uniform lattice network. Geographical Analysis 21 (4), 316–337.
Kurttila, M., Pukkala, T., Loikkanen, J., 2002. The performance of alternative spatial objective types in forest planning calculations: A case for flying squirrel and moose. Forest Ecology and Management 166, 245–260.
Leccese, M., McCormick, K., 2000. Charter of the new urbanism. McGraw-Hill, New York.
Lemberg, D.S., Church, R.L., 2000. The school boundary stability problem over time. Socio-Economic Planning Sciences 34 (3), 159–176.
Leung, Y., 1984. Robust programming for spatial optimization problems. Chinese University of Hong Kong, Hong Kong.
Li, W., Church, R.L., Goodchild, M.F., 2014. The p-compact-regions problem. Geographical Analysis 46 (3), 250–273.
Ligmann-Zielinska, A., Church, R.L., Jankowski, P., 2008. Spatial optimization as a generative technique for sustainable multiobjective land-use allocation. International Journal of Geographical Information Science 22 (6), 601–622.
Liu, N., Huang, B., Chandramouli, M., 2006. Optimal siting of fire stations using GIS and ANT algorithm. Journal of Computing in Civil Engineering 20 (5), 361–369.
Lock, D., 1995. Room for more within city limits? Town and Country Planning 64 (7), 173–176.
Masoomi, Z., Mesgari, M.S., Hamrah, M., 2013. Allocation of urban land uses by multi-objective particle swarm optimization algorithm. International Journal of Geographical Information Science 27 (3), 542–566.
Mathieson, R., 1969. The Soviet contribution to regional science: A review article. Journal of Regional Science 9 (1), 125–140.
Matthews, K.B., Sibbald, A.R., Craw, S., 1999. Implementation of a spatial decision support system for rural land use planning: Integrating geographic information system and environmental models with search and optimisation algorithms. Computers and Electronics in Agriculture 23, 9–26.
Meadows, D.H., Meadows, D.L., Randers, J., Behrens, W.W., 1972. The limits to growth. Universe Books, New York.
Miettinen, K.M., 1999. Nonlinear multiobjective optimization. Kluwer Academic Publishers, Norwell, MA.
Munasinghe, M., 1993. Environmental economics and sustainable development. World Bank, Washington, DC.
Næss, P., 2001. Urban planning and sustainable development. European Planning Studies 9 (4), 503–524.
OECD, 1994. Strategies for a sustainable pattern of development of urban regions in Europe. Background study for the European Council's 10th Conference for Ministers Responsible for Regional Planning (CEMAT), Oslo, Norway.
O'Kelly, M.E., Miller, H.J., 1994. The hub network design problem: A review and synthesis. Journal of Transport Geography 2 (1), 31–40.
Pareto, V., 1971. Manual of political economy [1906]. A.M. Kelley, New York.
ReVelle, C.S., Swain, R.W., 1970. Central facilities location. Geographical Analysis 2 (1), 30–42.
Saaty, T.L., 2008. Relative measurement and its generalization in decision making: Why pairwise comparisons are central in mathematics for the measurement of intangible factors: The analytic hierarchy/network process. RACSAM 102 (2), 251–318.
Shirabe, T., 2004. Towards a temporal extension of spatial allocation modeling. Paper presented at the International Conference on Geographic Information Science (GIScience 2006). Springer, Berlin.
Shirabe, T., 2005. A model of contiguity for spatial unit allocation. Geographical Analysis 37 (1), 2–16.
Steiner, J., Geiser, C.F., Schröter, H.E., 1876. Die Theorie der Kegelschnitte, gestützt auf projectivische Eigenschaften, bearb. von Heinrich Schröter, vol. 2. B.G. Teubner.
Stewart, T.J., 1991. A multi-criteria decision support system for R&D project selection. The Journal of the Operational Research Society 42 (1), 17–26.
Stewart, T.J., 1993. Use of piecewise linear value functions in interactive multicriteria decision support: A Monte Carlo study. Management Science 39 (11), 1369–1381.
Stewart, T.J., Janssen, R., 2014. A multiobjective GIS-based land use planning algorithm. Computers, Environment and Urban Systems 46, 25–34.
Stewart, T.J., Janssen, R., VanHerwijnen, M., 2004. A genetic algorithm approach to multiobjective land use planning. Computers & Operations Research 31, 2293–2313.
Thomas, L., Cousins, W., 1996. The compact city: A successful, desirable and achievable urban form? In: Jenks, M., Burton, E., Williams, K. (Eds.), The compact city: A sustainable urban form? E & FN Spon, London, pp. 53–65.
Tong, D., Murray, A.T., 2012. Spatial optimization in geography. Annals of the Association of American Geographers 102 (6), 1290–1309.
UN, 1998. Major trends characterizing human settlements development in the ECE region. United Nations, New York/Geneva.
Van Lier, H.N., Jaarsma, C.F., Jurgens, C.R., de Buck, A.J., 1994. Sustainable land use planning. Elsevier, Amsterdam.
Wardoyo, W., Jordan, G.A., 1996. Measuring and assessing management of forested landscapes. Forestry Chronicle 72, 639–645.
WCED, 1987. Our common future. Oxford University Press, Oxford.
Williams, K., 1999. Urban intensification policies in England: Problems and contradictions. Land Use Policy 16 (3), 167–178.
Williams, J.C., 2002. A zero-one programming model for contiguous land acquisition. Geographical Analysis 34 (4), 330–349.
Xue, W., Cao, K., Li, W., 2015. Municipal solid waste collection optimization in Singapore. Applied Geography 62, 182–190.
Zimmermann, H.J., 1978. Fuzzy programming and linear programming with several objective functions. Fuzzy Sets and Systems 1, 45–55.

1.18 Geostatistical Approach to Spatial Data Transformation

Eun-Hye Yoo, University at Buffalo, Buffalo, NY, United States

© 2018 Elsevier Inc. All rights reserved.

1.18.1 Introduction
1.18.2 Geostatistical Modeling
1.18.3 Geostatistical Solutions to COSP
1.18.3.1 Point Kriging
1.18.3.2 Block Kriging
1.18.3.3 Area-to-Point Kriging
1.18.4 Case Study
1.18.4.1 Reference Point Values and Sample Data
1.18.4.2 Kriging Predictions
1.18.4.3 Evaluation of Support Differences in COSP
1.18.5 Discussion and Conclusions
References

1.18.1 Introduction

Spatial data are not necessarily observed in the form required by the phenomenon under study. For example, we are often interested in continuous air pollutant concentrations over a study area, but we only have access to data observed at sparse monitoring sites. To resolve such issues, we may assume that the available spatial data are the result of a transformation of what we call the original spatial process, which varies continuously in space. This assumption enables us to transform the available data (i.e., point observations) into another form (i.e., a continuous surface) and, further, to conduct subsequent analysis and modeling (Arbia, 1989). Spatial data transformation from points to areas, areas to points, or lines to points is a fundamental function in geographic information system (GIS) technology and has been widely used in spatial analysis and modeling. However, the issue of the domain or footprint over which each piece of data is defined, referred to as its support, has often been ignored, despite its substantial consequences for subsequent statistical analyses and GIS modeling. The problems or errors associated with spatial data transformation (more specifically, when analysts want to make inferences about target regions for which data are needed by using observed source data) are almost inevitable in most scientific disciplines dealing with spatial data. These problems have been studied under different terms across disciplines; for example, the modifiable areal unit problem (MAUP) in geography is associated with areal data, when one aims to understand a variable's distribution at a new level of spatial aggregation or to relate it to another variable that is already available at that level (Openshaw and Taylor, 1981). The smoothing effects associated with aggregation or averaging may alter the spatial autocorrelation of the spatial units and subsequently affect the distribution. In mining applications, for instance, the estimation of block averages of metal concentrations at different spatial scales might be sought using point sample data collected over the study region (Isaaks and Srivastava, 1989). Similarly, we might have a very low-resolution global climate model for weather forecasting and need to enhance the resolution of prediction for local weather forecasting. The spatial misalignment of ecological (aggregate) data and inference at the individual level has also attracted attention in political science and many health science applications (Robinson, 1950; Banerjee et al., 2004; Wakefield and Morris, 2011). Assuming that the observed data and the target prediction or transformation are modeled through a spatial process, these problems can be captured under the umbrella term change of support problems (COSP), and some solutions have been suggested (e.g., Gotway and Young, 2002). Table 1 summarizes commonly encountered examples of COSPs in GIS, with point and areal data/values as the source data and targets of inference, respectively.

Table 1  Examples of COSPs

Source   Target   Potential solutions
Points   Points   Spatial interpolation [inverse distance weighting (IDW), spline, point kriging]
Points   Areas    Block kriging, spatial smoothing
Areas    Points   Area-to-point (a2p) kriging
Areas    Areas    Areal interpolation

Modified from Arbia, G. (1989). Statistical effects of spatial data transformations: A proposed general framework. In Goodchild, M. & Gopal, S. (eds.) Accuracy of spatial databases, pp. 249–259. London: Taylor & Francis; and Gotway, C. and Young, L. (2002). Combining incompatible spatial data. Journal of the American Statistical Association 97(458), 632–648.


While a vast amount of literature exists both inside and outside geography, a practical textbook that presents the concept of COSP with examples is rare. In this article, we aim to review geostatistical solutions for COSPs, including the MAUP as a special case. We present a brief introduction to linear geostatistics and a geostatistical solution for a COSP that often arises in health science applications, including spatial and environmental epidemiology. As source data, we consider air pollution data that are observed at points in space (i.e., monitoring stations) and over areal units (e.g., regular grids with varying cell sizes, or postcodes), but that need to be transformed to infer the spatial distribution of air quality concentrations at various spatial resolutions. Furthermore, using simulated data in a case study, we discuss practical issues that arise during spatial data transformation, including changes in the spatial dependence and in the descriptive statistics (i.e., mean and standard deviation) of the attributes of source data and target predictions.

1.18.2 Geostatistical Modeling

Geostatistics is a branch of applied statistics that concentrates on describing spatial patterns and estimating values at unsampled locations (Liebhold et al., 1993). Geostatistics quantifies and models the spatial and temporal variability of the variable of interest, assuming that samples close together are, on average, more similar in value than samples farther apart. This characteristic enables geostatistics to provide a comprehensive framework for integrating data of different supports and reliability into location-dependent models of spatial uncertainty. Its uses in both remote sensing and GIS have been documented in the literature (e.g., Burrough and McDonnell, 1998).

We start with a stationary Gaussian process specification for $z(\mathbf{s})$, which denotes a point support value at a location with coordinate vector $\mathbf{s} = (x, y)$ within a study area $A$. Let $z(\mathbf{s}')$ represent the value of the same variable at some distance $\mathbf{h}$ away; i.e., $\mathbf{s}' = \mathbf{s} + \mathbf{h}$. We assume that the point values $z(\mathbf{s})$ and $z(\mathbf{s}')$ are realizations of the random variables (RVs) $Z(\mathbf{s})$ and $Z(\mathbf{s}')$, which model uncertainty about the unknown point values. We further assume that the set of all point RVs in the study area constitutes a random function. Under the decision of stationarity, we may assume that the property of interest varies in a similar way across the study area and treat each observation as a realization of the corresponding RV (Lloyd, 2010). More specifically, second-order stationarity requires both the mean and the covariance to be constant across the study area, summarized as follows:

$$E\{Z(\mathbf{s})\} = m, \quad \forall\, \mathbf{s} \in A \tag{1}$$

$$E\{[Z(\mathbf{s}) - m][Z(\mathbf{s}') - m]\} = C(\mathbf{h}), \quad \forall\, \mathbf{h} \tag{2}$$

The mean does not depend on the location $\mathbf{s} \in A$; under the weaker assumption of intrinsic stationarity, only the variance of the increments is required to be finite. The intrinsic stationarity assumption implies that a constant mean $m$ exists within the study area, but its value is unknown, and the variogram $\gamma(\mathbf{h})$ is sufficient to characterize the RV of interest. The variogram is used more frequently than the covariance function, although the covariance function is also commonly encountered in the geostatistical literature. The covariogram $C(\mathbf{h})$ summarizes the similarity of values of pairs that are $\mathbf{h}$ units of distance apart, whereas the variogram summarizes the dissimilarity of values as a function of separation distance. Typically, when the variogram values are plotted for all appropriate $\mathbf{h}$ values, they are small for low values of $\mathbf{h}$ (reflecting the smaller dissimilarity of values a short distance apart), increase with increasing distance, and then usually level off or become constant after some distance, as shown in Fig. 1. The covariance function (covariogram) is closely linked to the variogram, in that the variogram can be written as a function of the covariance function and the variance: $\gamma(\mathbf{h}) = C(\mathbf{0}) - C(\mathbf{h})$.

In the application of geostatistics, much effort and time are spent on analyzing and modeling the spatial structure of the process underlying the observed data; that is, estimating the empirical variogram $\hat{\gamma}(\mathbf{h})$ or covariance model $\hat{C}(\mathbf{h})$. The variogram is estimated by calculating the squared differences between all pairs of point observations separated by a given lag and taking half their average. Consider $n_c$ pairs of observations placed a distance $\mathbf{h}$ apart. The empirical variogram can be estimated from the paired observations $[z(\mathbf{s}_k), z(\mathbf{s}_k + \mathbf{h})],\ k = 1, \ldots, n_c$, as

$$\hat{\gamma}(\mathbf{h}) = \frac{1}{2 n_c} \sum_{k=1}^{n_c} \left[ z(\mathbf{s}_k) - z(\mathbf{s}_k + \mathbf{h}) \right]^2 \tag{3}$$

at each lag $\mathbf{h}$.

Fig. 1 Modeling spatial correlation: the covariogram $C(\mathbf{h})$ decays from the variance $\sigma^2$ with increasing lag $\mathbf{h}$, whereas the variogram $\gamma(\mathbf{h})$ increases with $\mathbf{h}$.
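As a concrete illustration of Eq. (3), the following minimal Python/NumPy sketch computes the classical empirical variogram using distance-band lags; the function name, lag tolerance, and synthetic data are illustrative assumptions, not part of the original text.

```python
import numpy as np

def empirical_variogram(coords, values, lags, tol):
    """Classical estimator of Eq. (3): for each lag h, half the average
    squared difference over all pairs separated by roughly h."""
    # pairwise separation distances between all observation points
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq_diff = (values[:, None] - values[None, :]) ** 2
    gamma = []
    for h in lags:
        # upper triangle only, so each pair is counted once
        mask = np.triu((d > h - tol) & (d <= h + tol), k=1)
        n_c = mask.sum()
        gamma.append(sq_diff[mask].sum() / (2 * n_c) if n_c else np.nan)
    return np.array(gamma)

# illustrative usage with synthetic observations
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10_000, size=(100, 2))  # 100 points in a 10-km square
values = rng.normal(5.0, 1.0, size=100)
print(empirical_variogram(coords, values, lags=np.arange(500, 5001, 500), tol=250.0))
```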


An empirical variogram that quantifies the spatial correlation between observations is fitted with a mathematical function (model), which can then be used to estimate values at unsampled points. This estimation procedure is called kriging, named after the South African gold mining engineer D. G. Krige, whose goal was to estimate the most likely distribution of gold based on samples from a few boreholes. Matheron (1973) expanded Krige's empirical experiments and put them into the theoretical framework of geostatistics, as have others, including Wold (1939), Wiener (1949), Journel and Huijbregts (1978), Cressie (1993), and Chilès and Delfiner (1999). Kriging provides a solution to a fundamental problem faced by environmental scientists: predicting values from sparse sample data based on a stochastic model of spatial variation (Oliver, 2010). Kriging, a method of optimal prediction or estimation in geographical space, is often known as the best linear unbiased predictor. Kriging makes full use of the observed data, with optimal weights based on the spatial correlation quantified via a covariance function or variogram. If the data are spatially independent, little or no gain will be obtained by using geostatistical models and tools for prediction (Cressie, 1993). Kriging provides not only predictions but also the mean-squared prediction error (MSPE), which can be regarded as a measure of the uncertainty associated with a prediction. Kriging can be used for point or block supports of various sizes, depending on the aims of the prediction, and thus provides a general solution to COSP.

1.18.3 Geostatistical Solutions to COSP

As briefly defined earlier, the term support in geostatistics means the geometrical size, shape, and spatial orientation of the domain over which a datum is defined. Point sample data consist of $n$ realizations of the spatial process, denoted as $\mathbf{z}_s = [z(\mathbf{s}_1), \ldots, z(\mathbf{s}_n)]^T$. On the other hand, we assume that $K$ areal data $\mathbf{z}_B = [z(B_1), z(B_2), \ldots, z(B_K)]^T$ are collected over areal supports (block units), where the $k$th areal datum is defined as an areal or block average by

$$Z(B_k) = \frac{1}{|B_k|} \int_{\mathbf{s} \in B_k} Z(\mathbf{s})\, d\mathbf{s} \approx \frac{1}{|B_k|} \sum_{p=1}^{n_k} Z(\mathbf{s}_p) \tag{4}$$

where $|B_k|$ denotes the volume of the $k$th areal support $B_k \subset A$, which is discretized into $n_k$ points.

Assuming that the mean and covariance of the spatial process of interest $Z(\mathbf{s})$ are known as $m_Z$ and $\mathrm{Cov}(Z(\mathbf{s}), Z(\mathbf{s}')) = C(\mathbf{s}, \mathbf{s}')$, respectively, we can make inference on a set of areal RVs $Z(B_k)$ based on the definition of areal data in Eq. (4). That is, the expected value of the areal RVs can be written as the mean of $Z(\mathbf{s})$:

$$E\{Z(B_k)\} = E\left\{\frac{1}{|B_k|} \int_{\mathbf{s} \in B_k} Z(\mathbf{s})\, d\mathbf{s}\right\} = m_Z \cdot E\left\{\frac{1}{|B_k|} \int_{\mathbf{s} \in B_k} d\mathbf{s}\right\} = m_Z \tag{5}$$

The covariance between a pair of blocks can be written as

$$\bar{C}(B_k, B_{k'}) = \mathrm{Cov}\{Z(B_k), Z(B_{k'})\} = \frac{1}{|B_k|} \frac{1}{|B_{k'}|} \int_{\mathbf{s} \in B_k} \int_{\mathbf{s}' \in B_{k'}} C(\mathbf{s}, \mathbf{s}')\, d\mathbf{s}\, d\mathbf{s}' \tag{6}$$

Similarly, the covariance between an areal RV $Z(B_k)$ and a point RV $Z(\mathbf{s})$ may be written as

$$\bar{C}(B_k, \mathbf{s}) = \mathrm{Cov}\{Z(B_k), Z(\mathbf{s})\} = \frac{1}{|B_k|} \int_{\mathbf{s}' \in B_k} C(\mathbf{s}', \mathbf{s})\, d\mathbf{s}' \tag{7}$$

1.18.3.1 Point Kriging

Different kriging variants are applicable depending on the assumptions and the amount of knowledge about the underlying process; simple kriging (SK) can be used when the mean is known and the process is locally stationary. Consider a RV $Z(\mathbf{s})$ measured at $n$ sampling points $\mathbf{s}_i,\ i = 1, \ldots, n$, with known constant mean $m_Z$. We want to use this information to estimate its value at a target point $\mathbf{s}_0$ with the same support as the data via point-to-point (p2p) SK as

$$\hat{Z}(\mathbf{s}_0) = \sum_{i=1}^{n} \lambda(\mathbf{s}_i)\left[ z(\mathbf{s}_i) - m_Z \right] + m_Z \tag{8}$$

where $\hat{Z}(\mathbf{s}_0)$ denotes the predicted value at $\mathbf{s}_0$ and $\lambda(\mathbf{s}_i)$ is the optimal kriging weight applied to the $i$th point sample datum $z(\mathbf{s}_i)$. The kriging weights $\lambda(\mathbf{s}_i),\ i = 1, \ldots, n$ are obtained by solving the following SK system:

$$\sum_{j=1}^{n} C(\mathbf{s}_i, \mathbf{s}_j)\, \lambda(\mathbf{s}_j) = C(\mathbf{s}_0, \mathbf{s}_i), \quad \text{for } i = 1, \ldots, n \tag{9}$$

In matrix form, this SK system can be written as $\mathbf{K}_{ss} \boldsymbol{\lambda}_s = \mathbf{K}_{0s}$


where $\mathbf{K}_{ss}$ is an $(n \times n)$ matrix of p2p data covariances whose $(i, j)$th element is $C(\mathbf{s}_i, \mathbf{s}_j)$, and the vector of optimal weights is denoted as $\boldsymbol{\lambda}_s = [\lambda(\mathbf{s}_i),\ i = 1, \ldots, n]^T$. The $(n \times 1)$ vector of data-to-target-point covariances $\mathbf{K}_{0s} = [C(\mathbf{s}_0, \mathbf{s}_i),\ i = 1, \ldots, n]$ consists of the cross-covariances between the target prediction point and the data $\mathbf{s}_i,\ i = 1, \ldots, n$. Using the estimated kriging weights, the MSPE of $Z$ at $\mathbf{s}_0$ is obtained as a function of the variance $C(\mathbf{0})$ and the cross-covariances between the target prediction location $\mathbf{s}_0$ and the data locations $\mathbf{s}_i,\ i = 1, \ldots, n$, weighted by the SK weights:

$$\sigma^2(\mathbf{s}_0) = E\left[\hat{Z}(\mathbf{s}_0) - Z(\mathbf{s}_0)\right]^2 = C(\mathbf{0}) - \sum_{i=1}^{n} \lambda(\mathbf{s}_i)\, C(\mathbf{s}_i, \mathbf{s}_0) = C(\mathbf{0}) - \mathbf{K}_{0s}^T \mathbf{K}_{ss}^{-1} \mathbf{K}_{0s} = C(\mathbf{0}) - \boldsymbol{\lambda}_s^T \mathbf{K}_{0s} \tag{10}$$

As an exact interpolator, the p2p kriged value at a sampling site $\mathbf{s}_i,\ i = 1, \ldots, n$ is identical to the observed value $z(\mathbf{s}_i)$, and the estimation variance there is zero.
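The p2p SK predictor and its MSPE in Eqs. (8)–(10) amount to a single linear solve. The following is a minimal NumPy sketch, assuming for concreteness an isotropic exponential covariance similar to the one used later in the case study; all function and variable names are ours, not from the original text.

```python
import numpy as np

def cov_exp(d, sill=1.0, practical_range=6000.0):
    """Illustrative isotropic exponential covariance as a function of distance."""
    return sill * np.exp(-3.0 * d / practical_range)

def simple_kriging(coords, z, s0, mean, cov=cov_exp):
    """Point-to-point simple kriging at target point s0 (Eqs. 8-10)."""
    d_data = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    K_ss = cov(d_data)                               # (n x n) data covariances
    K_0s = cov(np.linalg.norm(coords - s0, axis=1))  # data-to-target covariances
    lam = np.linalg.solve(K_ss, K_0s)                # SK weights, Eq. (9)
    z_hat = mean + lam @ (z - mean)                  # SK prediction, Eq. (8)
    mspe = cov(0.0) - lam @ K_0s                     # kriging variance, Eq. (10)
    return z_hat, mspe
```

Passing one of the data locations as s0 returns the observed value there with zero MSPE, which illustrates the exact-interpolation property noted above.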

1.18.3.2 Block Kriging

A block kriging predictor can be used when the spatial inference problem concerns the value of a target block support given point sample data. The prediction of a block datum at a target block $B_0$, denoted as $\hat{Z}(B_0)$, can be obtained as a weighted combination of the point samples $[z(\mathbf{s}_i),\ i = 1, \ldots, n]$ via point-to-block (p2b) SK as $\hat{Z}(B_0) = \sum_{i=1}^{n} \lambda(\mathbf{s}_i)\, z(\mathbf{s}_i)$. The optimal kriging weights $[\lambda_i = \lambda(\mathbf{s}_i),\ i = 1, \ldots, n]$ assigned to the $n$ point samples are obtained by solving the following p2b kriging system:

$$\sum_{j=1}^{n} C(\mathbf{s}_i, \mathbf{s}_j)\, \lambda(\mathbf{s}_j) = \bar{C}(B_0, \mathbf{s}_i), \quad \text{for } i = 1, \ldots, n \tag{11}$$

where the block-to-point covariance $\bar{C}(B_0, \mathbf{s}_i)$ is calculated as an areal average of the p2p covariances $C(\mathbf{s}, \mathbf{s}_i),\ \mathbf{s} \in B_0$, following Eq. (7). Using matrix notation, the p2b kriging system in Eq. (11) can be written as $\mathbf{K}_{ss} \boldsymbol{\lambda}_s = \mathbf{K}_{Bs}$, where $\mathbf{K}_{ss}$ is an $(n \times n)$ matrix of p2p data covariances whose $(i, j)$th element is $C(\mathbf{s}_i, \mathbf{s}_j)$, and the vector of optimal weights is denoted as $\boldsymbol{\lambda}_s = [\lambda(\mathbf{s}_i),\ i = 1, \ldots, n]^T$. The $(n \times 1)$ vector of p2b covariances $\mathbf{K}_{Bs} = [\bar{C}(B_0, \mathbf{s}_i),\ i = 1, \ldots, n]$ consists of the block-averaged cross-covariances between the block $B_0$ and the point data $\mathbf{s}_i,\ i = 1, \ldots, n$. Similar to the p2p kriging MSPE, the uncertainty associated with the target block estimate is quantified by the p2b MSPE, which is written as

$$\sigma^2(B_0) = \bar{C}(B_0) - \mathbf{K}_{Bs}^T \mathbf{K}_{ss}^{-1} \mathbf{K}_{Bs}$$

The only difference between this and the p2p kriging MSPE in Eq. (10) is that the vector of data-to-target-point covariances $\mathbf{K}_{0s}$ is replaced by the vector of data-to-target-block covariances $\mathbf{K}_{Bs}$.
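Relative to the p2p sketch above, the only new ingredient in Eq. (11) and the p2b MSPE is the block-averaged covariance, approximated here by discretizing the target block into points as in Eq. (4). A minimal sketch under the same illustrative assumptions (the SK mean-residual form is retained):

```python
import numpy as np

def cov_exp(d, sill=1.0, practical_range=6000.0):
    """Same illustrative exponential covariance as in the p2p sketch."""
    return sill * np.exp(-3.0 * d / practical_range)

def block_kriging(coords, z, block_pts, mean, cov=cov_exp):
    """Point-to-block simple kriging (Eq. 11): block_pts is an (n_k x 2)
    discretization of the target block B0."""
    d_data = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    K_ss = cov(d_data)                     # (n x n) p2p data covariances
    d_blk = np.linalg.norm(block_pts[:, None, :] - coords[None, :, :], axis=-1)
    K_Bs = cov(d_blk).mean(axis=0)         # block-averaged covariances C-bar(B0, s_i)
    lam = np.linalg.solve(K_ss, K_Bs)      # p2b SK weights
    z_hat = mean + lam @ (z - mean)        # predicted block average
    d_bb = np.linalg.norm(block_pts[:, None, :] - block_pts[None, :, :], axis=-1)
    mspe = cov(d_bb).mean() - lam @ K_Bs   # p2b MSPE with discretized C-bar(B0)
    return z_hat, mspe
```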

1.18.3.3 Area-to-Point Kriging

Some data are measured or reported over areal units or blocks, such as sociodemographic indices over census tracts or surface reflectance values of pixels. Suppose that the $k$th areal datum $B_k,\ k = 1, \ldots, K$ is defined as an areal average of point values [see the definition of an areal datum in Eq. (4)]. The problem of estimating point values from areal data, or from data obtained at multiple spatial resolutions, corresponds to the goal of areal interpolation in geography. The methodology of areal interpolation is rooted in both cartographic and statistical methods (Haining, 2003), as the need to transfer attribute values recorded on one spatial framework to another is frequently encountered in spatial research. One of the critical issues in areal interpolation is the volume-preserving (pycnophylactic) or coherence property (Tobler, 1979; Lam, 1983), which requires that the known volume associated with the source zone data be reproduced on the new target zones. Kyriakidis (2004) showed analytically that the geostatistical solution, called area-to-point (a2p) kriging, is linked to a choropleth map, and Yoo et al. (2010) demonstrated the similarity between a2p kriging and Tobler's pycnophylactic interpolation in a case study. Similar to p2p kriging, the a2p kriging prediction at a target point $\mathbf{s}_0$ is obtained as a weighted linear combination of the $K$ areal data $[z(B_k),\ k = 1, \ldots, K]^T$ as follows:

$$\hat{Z}(\mathbf{s}_0) = \sum_{k=1}^{K} \lambda(B_k)\, z(B_k)$$


The optimal a2p kriging weights for the $K$ areal data, $\boldsymbol{\lambda}_k = [\lambda(B_k),\ k = 1, \ldots, K]$, are obtained by solving the following kriging system [see Kyriakidis (2004) for further details]:

$$\sum_{k'=1}^{K} \bar{C}(B_k, B_{k'})\, \lambda(B_{k'}) = \bar{C}(B_k, \mathbf{s}_0), \quad \text{for } k = 1, \ldots, K \tag{12}$$

where the block-to-block covariance $\bar{C}(B_k, B_{k'})$ for any pair of block RVs $Z(B_k)$ and $Z(B_{k'})$ is calculated as an areal average of point covariance values using Eq. (6). Similarly, the a2p covariance $\bar{C}(B_k, \mathbf{s}_0)$ can be derived from the definition in Eq. (7). The a2p kriging system in Eq. (12) can be summarized using the matrix notation $\mathbf{K}_{BB} \boldsymbol{\lambda}_k = \mathbf{K}_{Bs}$, where $\mathbf{K}_{BB}$ denotes a $(K \times K)$ matrix of area-to-area covariances and $\mathbf{K}_{Bs}$ is a $(K \times 1)$ vector of a2p covariances. The vector of optimal kriging weights assigned to the $K$ areal data $[z(B_k),\ k = 1, \ldots, K]^T$ is denoted as $\boldsymbol{\lambda}_k = [\lambda(B_k),\ k = 1, \ldots, K]^T$. The uncertainty associated with the a2p prediction at $\mathbf{s}_0$ is quantified by the corresponding MSPE, which is written as

$$\sigma^2(\mathbf{s}_0) = C(\mathbf{0}) - \mathbf{K}_{Bs}^T \mathbf{K}_{BB}^{-1} \mathbf{K}_{Bs} = C(\mathbf{0}) - \boldsymbol{\lambda}_k^T \mathbf{K}_{Bs}$$

The variance of the spatial process $C(\mathbf{0})$ plays an important role, as in the p2p kriging variance, but the proximity of the target prediction point $\mathbf{s}_0$ to the block data $B_k,\ k = 1, \ldots, K$ also affects the MSPE through the kriging weights.
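a2p kriging (Eq. 12) follows the same pattern, with both the left-hand matrix K_BB and the right-hand vector K_Bs built from covariances averaged over discretized areal supports. A minimal sketch under the same illustrative assumptions, with each area passed as an array of discretization points:

```python
import numpy as np

def cov_exp(d, sill=1.0, practical_range=6000.0):
    """Same illustrative exponential covariance as in the earlier sketches."""
    return sill * np.exp(-3.0 * d / practical_range)

def a2p_kriging(areas, z_B, s0, mean, cov=cov_exp):
    """Area-to-point simple kriging: areas is a list of (n_k x 2) arrays of
    discretization points; solves K_BB lambda = K_Bs (Eq. 12)."""
    K = len(areas)
    K_BB = np.empty((K, K))
    K_Bs = np.empty(K)
    for i, Bi in enumerate(areas):
        K_Bs[i] = cov(np.linalg.norm(Bi - s0, axis=1)).mean()  # C-bar(B_i, s0)
        for j, Bj in enumerate(areas):
            d = np.linalg.norm(Bi[:, None, :] - Bj[None, :, :], axis=-1)
            K_BB[i, j] = cov(d).mean()                         # C-bar(B_i, B_j)
    lam = np.linalg.solve(K_BB, K_Bs)
    z_hat = mean + lam @ (z_B - mean)   # point prediction from areal data
    mspe = cov(0.0) - lam @ K_Bs        # a2p MSPE
    return z_hat, mspe
```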

1.18.4 Case Study

We considered a case study with the primary goal of characterizing the air-quality surface at multiple spatial scales. Time-series designs in ecological studies often call for air-quality estimates at the postcode level, whereas a cohort design in individual-level studies typically requires air pollution exposure at individual residences. The primary data sources, however, are either point observations from a sparse air-quality monitoring network or numerical air quality model outputs on a regular grid. The gap between the data available at the source zones and the attributes desired at the target zones calls for spatial data transformation, and we may apply a geostatistical solution for COSP to undertake these tasks. To focus on the effects of support differences in spatial predictions while controlling for sampling effects and the nonnormality of air pollutants, we used a realistically simulated data set. Specifically, we generated a surface of fine particulate matter (PM2.5) in New York City using geostatistical simulation and used it as the reference values throughout the case study. We randomly selected 29 points and extracted the simulated values as point sample data, which mimic PM2.5 measurements collected at a monitoring network in the city. We also generated areal data by calculating the areal averages of the reference values over 25 irregularly shaped areal units. These two data sets with different supports, shown in Fig. 2, are subsequently used as source data for predictions. We used the free gstat package for the simulation and for finding geostatistical solutions to the spatial transformations.

1.18.4.1 Reference Point Values and Sample Data

We generated reference point values using an unconditional simulation algorithm with a constant mean ($m_Z = 5$) and an isotropic spatial covariance model. The covariance model takes the form of an exponential function with two parameters, a practical range of 6000 m and a total variance of 1.1 units, which is written as

$$C(\mathbf{h}) = 0.1 + \exp\left(-\frac{3\,|\mathbf{h}|}{6000}\right) \tag{13}$$

The simulated reference point support values, which are based on the covariance model in Eq. (13) over a $(108 \times 80)$ regular grid with a cell size of $500 \times 500\ \mathrm{m}^2$, are a realization of a Gaussian random field and represent a daily average of PM2.5 concentrations. In practice, the true air pollutant surface is unknown; instead, only a subset of one realization is available as sample points. We obtained point sample data by extracting the reference point values at 29 randomly selected point locations to mimic a realistic situation, and then generated areal sample data by superimposing 25 irregularly shaped areal units over the reference point values and calculating the areal average within each unit. The summary statistics of the point reference values, point sample data, and areal data are given in Table 2. The means of the point sample data (5.16) and areal data (5.24) are close to that of the reference values (5.22), but the minimum and maximum values are less extreme than those of the reference values. We attribute the smaller range of the sample data to sampling and averaging effects, respectively, which is also confirmed by the smaller standard deviations of the two data sets (0.86 and 0.61). The spatial distribution of the reference values and the two sample data sets is illustrated in Fig. 2. The reference values and areal data in Fig. 2(A) and (C) are presented as continuous surfaces, whereas the point sample data in Fig. 2(B) are represented by point symbols whose colors represent the attribute values over the five borough boundaries of New York City. The areal data consisting of 25 irregularly shaped areal units are shown in Fig. 2(C), whose attribute values are represented by various colors.
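The reference surface in this case study was generated with unconditional Gaussian simulation in gstat. An equivalent minimal sketch via Cholesky factorization of the covariance matrix implied by Eq. (13) is shown below for a reduced grid (a dense factorization of the full 108 × 80 grid would be impractical). Treating the 0.1 term as a nugget that contributes only at zero separation is our assumption about the intended model.

```python
import numpy as np

def exp_cov(d, nugget=0.1, psill=1.0, practical_range=6000.0):
    """Covariance implied by Eq. (13): exponential decay plus a nugget at d = 0."""
    return psill * np.exp(-3.0 * d / practical_range) + nugget * (d == 0)

# reduced grid (assumption: 20 x 15 cells of 500 m, not the full 108 x 80 grid)
x, y = np.meshgrid(np.arange(20) * 500.0, np.arange(15) * 500.0)
pts = np.column_stack([x.ravel(), y.ravel()])
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
C = exp_cov(d)                                        # C(0) = 1.1, the total variance

# one realization of a Gaussian random field with constant mean m_Z = 5
rng = np.random.default_rng(42)
L = np.linalg.cholesky(C + 1e-10 * np.eye(len(pts)))  # small jitter for stability
field = 5.0 + L @ rng.standard_normal(len(pts))
print(field.reshape(15, 20).round(2))
```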


Fig. 2 Study area and data: (A) reference values, (B) point sample data over the five boroughs of New York City, and (C) areal average of data. (Point symbols in (B) are classed from 3.476 to 6.962.)

Table 2  Descriptive statistics of reference values and two sets of sample data

              Minimum  First quartile  Median  Mean  Third quartile  Maximum  Standard deviation
Reference     2.32     4.53            5.21    5.22  5.87            8.26     0.97
Point sample  3.48     4.73            5.16    5.16  5.61            6.96     0.86
Areal data    4.15     4.87            5.16    5.24  5.68            6.47     0.61

1.18.4.2 Kriging Predictions

Point sample data on air pollution (more specifically, PM2.5) are known at the 29 sample locations, but they do not tell us what is happening at individual residences or workplaces. Similarly, areal averages of air pollutant concentrations are often sought so that they can be integrated with socioeconomic or health data obtained at aggregate levels. To provide information at intermediate point locations or as neighborhood averages, a spatial data transformation, namely kriging prediction, was undertaken. We illustrated the effects of target supports on the prediction by conducting both punctual and block kriging using the point sample data in Fig. 2(B). We conducted SK with the known mean $m_Z = 5$ and the exponential covariance function [see Eq. (13)]. The point sample data are used for both punctual kriging (a grid with a cell size of 500 m) and block kriging (a grid with a cell size of 5000 m). The resulting prediction maps are shown in Fig. 3(A) and (C), and the corresponding MSPEs in Fig. 3(B) and (D). The descriptive statistics of both types of kriging predictions are given in Table 3 as p2p and p2b, respectively. Similarly, we examined the effects of source zones (supports) on point predictions. We conducted a2p kriging and areal centroid-to-point (ac2p) kriging from the areal average data in Fig. 2(C). For illustration purposes, we designed two cases where predictions are obtained at point supports (a grid with a cell size of $500 \times 500\ \mathrm{m}^2$) using areal data with and without explicit consideration of the areal supports. In the former case, the geometry of each areal datum in Fig. 2(C) is explicitly taken into account in the point predictions, whereas in the latter case, the supports of the areal data are ignored and simplified as the geometric centroids of the areal units.

Fig. 3 Kriging from the point source: (A) p2p prediction, (B) p2p MSPE, (C) p2b prediction, and (D) p2b MSPE.

Table 3  Descriptive statistics of kriging predictions

Kriging  Minimum  First quartile  Median  Mean  Third quartile  Maximum  Standard deviation
p2p      3.75     4.75            5.11    5.09  5.43            6.79     0.47
p2b      4.08     4.77            5.11    5.09  5.41            6.26     0.43
a2p      3.87     4.74            5.18    5.22  5.65            6.81     0.63
ac2p     3.84     4.88            5.28    5.29  5.67            7.42     0.60

That is, the size, shape, and orientation of the areal data are explicitly accounted for in a2p kriging, whereas ac2p kriging approximates the areal data as point support data. The prediction results and the corresponding MSPEs are illustrated in Fig. 4(A) and (B), and the descriptive statistics are summarized in Table 3 as a2p and ac2p, respectively. A comparison of the summary statistics of the four types of kriging predictions indicates that the smoothing effects are most obvious in the p2b kriging predictions. This is also shown by the standard deviations of the punctual and block kriging predictions (0.47 and 0.43), which are about half that of the reference point values (0.97), as well as by the larger minimum value of the p2b kriging predictions (4.08). In terms of spatial patterns, both p2p and p2b kriging reproduced the overall spatial patterns of the point reference values except in two regions: the high values at the top of the study area and the low values on the southeast side of the study area. Visual inspection of the two maps of p2p and p2b kriging predictions does not reveal substantial differences. Meanwhile, the kriging predictions based on areal data reproduced the two regions of high values (marked in blue) relatively well, although a region on the northwest side of the study area with low values (shown in red) was ill represented. In particular, the ac2p prediction in this region differs from the pattern of the reference point values, and its high uncertainty is shown in Fig. 4(D).

Fig. 4 Kriging from the areal data source: (A) a2p prediction, (B) a2p MSPE, (C) ac2p prediction, and (D) ac2p MSPE.

1.18.4.3 Evaluation of Support Differences in COSP

We evaluated the effects of support differences in COSP based on two criteria: the reproduction of spatial structure and the volume-preserving (coherence) property. We assessed the quality of the target predictions in terms of their capability to reproduce the spatial structure of the theoretical model. More specifically, we quantified the differences between the empirical variograms derived from the kriging predictions and the theoretical model. Regarding the coherence property, we quantified the differences between the original data and the summary of the predictions at the source supports, either point sample data locations or averages over areal units. The spatial structure of the two punctual kriging results (p2p and a2p) was compared to the variogram model of the point reference values. We calculated the empirical variograms of both the reference values and the point sample data to ensure that the realization and the sample data represent the underlying process of interest. The results are summarized in Fig. 5(A) and (B). The spatial structure of the reference values is close to the theoretical model, with an estimated range of 6830 m and a total sill estimate of 1.17. The fitted variogram of the sample data approximates the range of the model variogram, although the range (5007 m) and sill (0.81) estimates of the empirical variogram of the point sample data are less than those of the theoretical variogram model. These gaps between the empirical variogram parameters of the reference point values and the point sample data are relatively small, but it is clear that the estimated parameters of the point sample data deviate more from those of the theoretical model than the parameters of the reference values do, and this uncertainty is propagated onto the subsequent kriging predictions. More specifically, the differences between the empirical variograms of the kriging predictions (p2p and a2p) and the theoretical models are more substantial, as shown in Fig. 5(C) and (D). The sill parameter estimates from p2p (0.31) and a2p (0.58) kriging are less than the theoretical sill of 1.1, due to the smoothing effects of kriging but also to the sampling effects on the point and areal data. The lower sill estimate from the p2p kriging predictions relative to the a2p predictions is due in particular to the sampling effect: the variability and spatial configuration of the point sample data versus the variability and spatial resolution of the areal data. In terms of the range parameters of the p2p and a2p predictions, both p2p (10,299 m) and a2p kriging (14,796 m) overestimated the range (6000 m) of the theoretical variogram.

Fig. 5 Variogram reproduction: (A) reference values, (B) point sample data, (C) p2p predictions, and (D) a2p predictions. Each panel plots the semivariogram against distance (m); the empirical variogram is denoted by dot symbols, and the fitted and theoretical (model) variograms are denoted by blue and red lines, respectively.

Our case study demonstrated the underestimation of variability, characterized by the sill parameter, and the overestimation of spatial contiguity, characterized by the range parameter, although the relative difference between p2p and a2p is affected by the randomness of the realization of the reference point values and by the sampling scheme of the point/areal data at each spatial scale. We also assessed the volume-preserving property of the kriging predictions by comparing the estimated values to the corresponding sample data values. Punctual kriging is an exact interpolator regardless of the variogram model used and the distribution of the data, so when a p2p target prediction location coincides with a point sample location, the predicted value should be the same as the sample value. In our experiment, the prediction grid did not exactly overlap the point sample data, although the differences were minimal. The predicted values at the 29 point sample locations were compared to the sample values, and the mean absolute difference for p2p and p2b was calculated as 0.13 and 0.36, respectively. The gap between the p2b kriging predictions and the point sample data is larger than for p2p kriging due to the averaging effect of block kriging. In terms of kriging from areal data, we compared the areal data with the average values of the a2p and ac2p predictions within each areal unit. The comparison results were summarized by calculating the mean absolute error, which was 0.03 and 0.26 for a2p and ac2p kriging, respectively. The larger error of ac2p kriging indicates that the error arising from ignoring the support of the source data and treating areal data as point support data is not trivial. The averaging effects of p2b kriging increased the error by 2.77% over p2p kriging, whereas ignoring the areal supports in ac2p kriging increased the corresponding error to 8.67% over that of a2p kriging. These absolute numerical differences may change as a different realization of the reference values or an alternative sampling scheme is used, but our case study illustrates how much error can occur by ignoring the support of the source data in spatial data transformation. To facilitate the comparisons, we present the differences between the kriging predictions and the corresponding sample data in Fig. 6.

Fig. 6 Reproduction of source data: (A) p2p and (B) p2b kriging predictions plotted against the point sample data, and (C) a2p and (D) ac2p areal averages plotted against the areal data.

1.18.5 Discussion and Conclusions

In this article, we reviewed geostatistical solutions to COSP and demonstrated how the support of the source data and the support of the target prediction each affect inference on the underlying process. In a case study, we used realistically simulated data, which enabled us to evaluate the effects of source and target supports on spatial data transformation. The effects of target support on prediction quality were demonstrated using the same point sample data and variogram model via a comparison between punctual (p2p) and block (p2b) kriging predictions. As illustrated in Fig. 3(A) and (C), block kriging levels out the higher and lower values more than punctual kriging and consequently yields a smoother surface. The overall patterns are similar, and the differences are minimal where the predicted values are close to the mean, but the differences increase where the punctual kriging prediction is near either the maximum (6.84) or the minimum (3.04) value. The effects of target support are also highlighted in the maps of MSPE in Fig. 3(B) and (D), as the largest MSPE of p2b kriging is half that of p2p, and the localized patterns of the p2p prediction error disappear in the MSPE of p2b. This implies that the uncertainty associated with prediction is lower in block kriging than in punctual kriging. The comparison between the p2p and a2p kriging predictions was designed to assess the effects of source supports on point prediction. Point prediction from aggregate data is commonly undertaken to compensate for the lack of data at a finer scale or to combine data sets collected over incompatible zones or areal units. The quality of point prediction, therefore, has important impacts on subsequent analyses and modeling. The quality of a p2p kriging prediction is determined by the number of point samples and their spatial coverage (i.e., whether they are clustered or regularly distributed) with respect to the spatial patterns of the underlying process, whereas the spatial resolution of the areal data is the major factor determining the quality of point predictions in a2p kriging. In the current case study, the sizes of the sample data sets (25 areal units and 29 point sample locations) are comparable, although


the prediction results shown in Figs. 3(A) and 4(A) are quite contrasting. The two regions with high values in the point reference values in Fig. 2 (colored blue) are missed in the p2p kriging predictions, but they are reproduced in the a2p kriging predictions. The uncertainty associated with p2p kriging is also higher than that of a2p kriging. Quite substantial differences are shown in the maps of MSPEs: the MSPE of p2p increases as the prediction location gets farther from the source data (sample point locations), whereas the MSPE of a2p shows a regular pattern in which the prediction uncertainty is minimal at the center of each areal datum. Furthermore, the differences between source supports in point prediction were clearly demonstrated in the reproduction of spatial structure (i.e., the variogram). The empirical variograms of both kriging predictions in Fig. 5(C) and (D) underestimated the sill value of the theoretical model, whereas the p2p kriging prediction approximated the proposed range parameter better than the a2p kriging prediction due to the averaging effects of the areal data. Finally, we assessed the prediction error incurred by ignoring nonpoint supports and simplifying areal supports to their central points by comparing a2p and ac2p kriging. The differences between the two maps in Fig. 4(A) and (C) demonstrate the prediction error caused by ignoring the size, shape, and orientation of the areal data. The support of the areal data in Fig. 2(C) is not necessarily uniform, and the geometric centroid of each areal unit is hardly representative of its areal support. Consequently, the ac2p kriging prediction in Fig. 4(C) failed to capture the spatial patterns of the reference point values, as opposed to the a2p prediction, which is also evidenced in the evaluation of the coherence property summarized in Fig. 6(C) and (D).

We examined the impacts of target support, source support, and the misspecification of source support on spatial data transformation. It is expected that spatial data transformation alters the distributional characteristics of the outputs, but it is still unknown how much of the change is due to the scale of analysis, the quality of the sample data and their spatial configuration, and the support of the outputs. Using the simulated values as a realization of the underlying process and deriving both the point sample and areal data from the reference values, we were able to directly assess the error associated with the individual sources of problems. However, we acknowledge that our findings may be affected by the randomness of the realization of the reference point values and by the sampling scheme used to obtain the sample data. More important, we demonstrated the effects of smoothing on the variability and spatial continuity of the outputs of spatial data transformation, but further investigation is needed to improve our understanding of these changes. For example, it may be worth investigating whether there is a systematic pattern in the reduced attribute variability with respect to the number of point samples, or a relationship between smoothing effects and the scale of the areal data. We performed our case study using the R software package gstat, but we concluded that the computational burden is too high for larger amounts of data. Future work, therefore, should involve developing a new tool for spatial data transformation that is open source and capable of handling big data.

When the goal of spatial data transformation is to predict values at a finer spatial scale than that of the available data with higher accuracy, we often combine data from multiple sources with different resolutions and quality. The computational tool used in the current study is limited to single variables and does not account for differing data quality. The availability of spatiotemporal data continues to increase, and there is a need for downscaling methods that can handle such data.

References

Arbia, G., 1989. Statistical effects of spatial data transformations: A proposed general framework. In: Goodchild, M., Gopal, S. (Eds.), Accuracy of spatial databases. Taylor & Francis, London, pp. 249–259.
Banerjee, S., Carlin, B.P., Gelfand, A.E., 2004. Hierarchical modeling and analysis for spatial data. Chapman and Hall/CRC Press, Boca Raton, FL.
Burrough, P.A., McDonnell, R.A., 1998. Principles of geographical information systems. Oxford University Press, Oxford.
Chilès, J.P., Delfiner, P., 1999. Geostatistics: Modeling spatial uncertainty. John Wiley, New York.
Cressie, N., 1993. Statistics for spatial data. John Wiley & Sons, New York.
Gotway, C., Young, L., 2002. Combining incompatible spatial data. Journal of the American Statistical Association 97 (458), 632–648.
Haining, R., 2003. Spatial data analysis: Theory and practice. Cambridge University Press, New York.
Isaaks, E., Srivastava, R., 1989. An introduction to applied geostatistics. Oxford University Press, New York.
Journel, A.G., Huijbregts, C.J., 1978. Mining geostatistics. Academic Press, New York.
Kyriakidis, P.C., 2004. A geostatistical framework for the area-to-point spatial interpolation. Geographical Analysis 36 (3), 41–50.
Lam, N.S.-N., 1983. Spatial interpolation methods: A review. The American Cartographer 10 (2), 129–149.
Liebhold, A.M., Rossi, R.E., Kemp, W.P., 1993. Geostatistics and geographic information systems in applied insect ecology. Annual Review of Entomology 38 (1), 303–327.
Lloyd, C.D., 2010. Local models for spatial analysis. CRC Press, Boca Raton, FL.
Matheron, G., 1973. The intrinsic random functions and their applications. Advances in Applied Probability 5, 439–468.
Oliver, M.A., 2010. The variogram and kriging. In: Handbook of applied spatial analysis. Springer, Berlin, pp. 319–352.
Openshaw, S., Taylor, P.J., 1981. The modifiable areal unit problem. In: Wrigley, N., Bennett, R.J. (Eds.), Quantitative geography: A British view. Routledge & Kegan Paul, London, pp. 60–69.
Robinson, W., 1950. Ecological correlations and the behavior of individuals. American Sociological Review 15, 351–357.
Tobler, W.R., 1979. Smooth pycnophylactic interpolation for geographical regions. Journal of the American Statistical Association 74 (367), 519–530.
Wakefield, J.C., Morris, S.E., 2011. The Bayesian modeling of disease risk in relation to a point source. Journal of the American Statistical Association 96 (453), 77–91.
Wiener, N., 1949. Extrapolation, interpolation, and smoothing of stationary time series, vol. 2. MIT Press, Cambridge, MA.
Wold, H., 1939. A study in the analysis of stationary time series. Almquist and Wiksell, Stockholm.
Yoo, E.-H., Kyriakidis, P.C., Tobler, W., 2010. Reconstructing population density surfaces from areal data: A comparison of Tobler's pycnophylactic interpolation method and area-to-point kriging. Geographical Analysis 42 (1), 78–98.

1.19 Spatial and Spatiotemporal Data Mining

Shashi Shekhar, Yan Li, Reem Y. Ali, Emre Eftelioglu, and Xun Tang, University of Minnesota Twin Cities, Minneapolis, MN, United States
Zhe Jiang, University of Alabama, Tuscaloosa, AL, United States

© 2018 Elsevier Inc. All rights reserved.

1.19.1 Introduction
1.19.1.1 Societal Importance
1.19.1.2 Challenges
1.19.1.3 Related Work
1.19.1.4 Scope
1.19.1.5 Organization
1.19.2 Input: Spatial and Spatiotemporal Data
1.19.2.1 Types of Spatial and Spatiotemporal Data
1.19.2.2 Data Attributes and Relationships
1.19.3 Statistical Foundations
1.19.3.1 Spatial Statistics for Different Types of Spatial Data
1.19.3.2 Spatiotemporal Statistics
1.19.4 Output Pattern Families
1.19.4.1 Spatial and Spatiotemporal Anomaly Detection
1.19.4.1.1 Problem definition
1.19.4.1.2 Statistical foundation
1.19.4.1.3 Spatial anomaly detection approaches
1.19.4.1.4 Spatiotemporal anomaly detection approaches
1.19.4.2 Spatial and Spatiotemporal Associations, Tele-Connections
1.19.4.2.1 Problem definition
1.19.4.2.2 Statistical foundation
1.19.4.2.3 Spatial association detection approaches
1.19.4.2.4 Spatiotemporal association, tele-connection detection approaches
1.19.4.3 Spatial and Spatiotemporal Prediction
1.19.4.3.1 Problem definition
1.19.4.3.2 Statistical foundations
1.19.4.3.3 Spatial prediction approaches
1.19.4.3.4 Spatiotemporal prediction approaches
1.19.4.4 Spatial and Spatiotemporal Partitioning (Clustering) and Summarization
1.19.4.4.1 Problem definition
1.19.4.4.2 Spatial partitioning and summarization approaches
1.19.4.4.3 Spatiotemporal partitioning and summarization approaches
1.19.4.5 Spatial and Spatiotemporal Hotspot Detection
1.19.4.5.1 Problem definition
1.19.4.5.2 Statistical foundation
1.19.4.5.3 Spatial hotspot detection approaches
1.19.4.5.4 Spatiotemporal hotspot detection approaches
1.19.4.6 Spatial and Spatiotemporal Change
1.19.4.6.1 Problem definition
1.19.4.6.2 Spatial and spatiotemporal change detection approaches
1.19.5 Research Trends and Future Research Needs
1.19.6 Conclusions
References

1.19.1 Introduction

The significant growth of spatial and spatiotemporal (SST) data collection as well as the emergence of new technologies has heightened the need for automated discovery of spatiotemporal knowledge. SST data mining studies the process of discovering interesting and previously unknown, but potentially useful, patterns from large SST databases. The complexity of SST data and implicit relationships limits the usefulness of conventional data mining techniques for extracting SST patterns.


Fig. 1 The process of spatial and spatiotemporal data mining: input SST data undergo preprocessing and exploratory space–time analysis; SST data mining algorithms, built on an SST statistical foundation and computational techniques, produce output SST patterns, which are then postprocessed.

As shown in Fig. 1, the process of SST data mining starts with preprocessing of the input data. Typically, this step corrects noise, errors, and missing data. Exploratory space–time analysis is also conducted in this step to understand the underlying spatiotemporal distribution. Then, an appropriate algorithm is selected to run on the preprocessed data and produce output patterns. Common output pattern families include SST anomalies, associations and tele-couplings, predictive models, partitions and summarizations, hotspots, and change patterns. SST data mining algorithms often have statistical foundations and integrate scalable computational techniques. Output patterns are postprocessed and then interpreted by domain scientists to find novel insights and to refine the data mining algorithms when needed.

1.19.1.1 Societal Importance

SST data mining techniques are crucial to organizations that make decisions based on large SST datasets. Many government and private agencies are likely beneficiaries of SST data mining, including NASA, the National Geospatial-Intelligence Agency (Krugman, 1997), the National Cancer Institute (Albert and McShane, 1995), the US Department of Transportation (Shekhar et al., 1993), and the National Institute of Justice (Eck et al., 2005). These organizations are spread across many application domains, including public health, public safety, transportation, environmental science and management, economics, climatology, public policy, earth science, market research and analytics, public utilities, and logistics. In ecology and environmental management (Haining, 1993; Isaaks et al., 1989; Roddick and Spiliopoulou, 1999; Scally, 2006), researchers need tools to classify remote sensing images to map forest coverage. In public safety (Leipnik and Albert, 2003), crime analysts are interested in discovering hotspot patterns from crime event maps so as to allocate police resources effectively. In transportation (Lang, 1999), researchers analyze historical taxi GPS trajectories to recommend fast routes. Epidemiologists (Elliot et al., 2000) use spatiotemporal data mining techniques to detect disease outbreaks. The interdisciplinary nature of SST data mining means that its techniques must be developed with awareness of the underlying physics or theories of the application domains. For example, climate science studies find that observable predictors for climate phenomena discovered by data science techniques can be misleading if they do not take into account climate models, locations, and seasons. In this case, statistical significance testing is critically important in order to further validate or discard relationships mined from data.

1.19.1.2 Challenges

In addition to interdisciplinary challenges, SST data mining also poses statistical and computational challenges. Extracting interesting and useful patterns from SST datasets is more difficult than extracting corresponding patterns from traditional numeric and categorical data due to the complexity of spatiotemporal data types and relationships, including (1) the spatial relationships among the variables, that is, observations that are not independent and identically distributed (i.i.d.); (2) the spatial structure of errors; and (3) nonlinear interactions in feature space. According to Tobler’s (Tobler, 1970) First Law of Geography, “Everything is related to everything else, but near things are more related than distant things.” For example, people with similar characteristics, occupation, and background tend to cluster together in the same neighborhoods. In spatial statistics, such spatial dependence is

266

Spatial and Spatiotemporal Data Mining

called spatial autocorrelation. Ignoring autocorrelation and assuming an identical and independent distribution (i.i.d.) when analyzing data with SST characteristics may produce hypotheses or models that are inaccurate or inconsistent with the dataset (Shekhar et al., 2003b). In addition to spatial dependence at nearby locations, phenomena of spatiotemporal tele-coupling also indicate long-range spatial dependence, such as El Niño and La Niña effects, in the climate system. Another challenge comes from the fact that spatiotemporal datasets are embedded in continuous space and time, and thus many classical data mining techniques that assume discrete data (e.g., transactions in association rule mining) may not be effective. A third challenge is the spatial heterogeneity and temporal nonstationarity, that is, spatiotemporal temporal data samples do not follow an identical distribution across the entire space–time envelope. Instead, different geographical regions and temporal periods may have distinct distributions. The Modifiable Areal Unit Problem (MAUP) or multi-scale effect is another challenge since the results of spatial analysis depend on the choice of appropriate spatial and temporal scales. Finally, flow and movement and a Lagrangian framework of reference in spatiotemporal networks pose challenges (e.g., directionality, anisotropy, etc.). One way to deal with implicit spatiotemporal relationships is to materialize the relationships into traditional data input columns and apply classical data mining techniques (Agrawal et al., 1993, Agrawal and Srikant, 1994; Bamnett and Lewis, 1994; Jain and Dubes, 1988; Quinlan, 1993). However, the materialization process can result in information loss (Shekhar et al., 2003b). The spatial and temporal vagueness which naturally exists in data and relationships usually creates further modeling and processing difficulty in SST data mining. A more preferable way to capture implicit SST relationships is to develop statistics and techniques to incorporate spatial and temporal information into the data mining process.

1.19.1.3 Related Work

Surveys of SST data mining can be categorized into two groups: those without a focus on statistical foundations and those with such a focus. Among the surveys without a focus on statistical foundations, Koperski et al. (1996) and Ester et al. (1997) reviewed spatial data mining from a spatial database approach; Roddick and Spiliopoulou (1999) provided a bibliography for spatial, temporal, and spatiotemporal data mining; and Miller and Han (2009) covered a list of recent SST data mining topics but without a systematic view of statistical foundations. Among the surveys covering statistical foundations, Shekhar et al. (2003b) reviewed several spatial pattern families focusing on spatial data's unique characteristics; Kisilevich et al. (2009) reviewed spatiotemporal clustering research; Aggarwal (2013) has a chapter summarizing SST outlier detection techniques; Zhou et al. (2014) reviewed SST change detection research from an interdisciplinary viewpoint; Cheng et al. (2014) reviewed state-of-the-art spatiotemporal data mining research, including spatiotemporal autocorrelation, space–time forecasting and prediction, space–time clustering, and space–time visualization; Shekhar et al. (2011) provided a review of spatial data mining research; and Shekhar et al. (2015) reviewed spatiotemporal data mining research with a discussion of statistical foundations.

1.19.1.4 Scope

Our aim is to contribute to SST data mining research by (1) providing a taxonomy of SST data types; (2) providing a systematic overview of SST statistics organized by data types; and (3) surveying common computational techniques for major SST pattern families, including SST anomalies, coupling and tele-coupling, prediction models, partitioning and summarization, hotspot and change patterns.

1.19.1.5 Organization

This article starts with the characteristics of the data inputs of SST data mining (section “Input: Spatial and Spatiotemporal Data”) and an overview of its statistical foundation (section “Statistical Foundations”). It then describes in detail six main output patterns of SST data mining (section “Output Pattern Families”). The article concludes with an examination of research needs and future directions in sections “Research Trends and Future Research Needs” and “Conclusions.”

1.19.2 Input: Spatial and Spatiotemporal Data

The input data are one of the important factors shaping SST data mining. In this section, a taxonomy of different SST data types is introduced, followed by a summary of the unique attributes and relationships of different data. The goal is to provide a systematic overview of data in SST data mining tasks.

1.19.2.1 Types of Spatial and Spatiotemporal Data

The complexity of its input data makes SST data mining unique compared with classical data mining. The cause of this complexity is the conflict between discrete representations and continuous space and time. There are three models of spatial data, namely the object model (Fig. 2), the field model (Fig. 3), and the spatial network model (Fig. 4) (Shekhar and Chawla, 2003). The object model uses points, lines, polygons, and collections of them to describe the world. Objects in this model are distinctly identified according to the application context, and each is associated with some nonspatial attributes.

Fig. 2 Object model: points, lines, and polygons in an x–y coordinate space.

Fig. 3 Raster model: a grid of pixels organized by rows and columns.

Fig. 4 Spatial network model: vertices and edges in an x–y coordinate space.

The field model represents spatial data as a function from a spatial framework, which is a partitioning of space, to nonspatial attributes. For example, a grid consisting of the areas covered by the pixels of a remote sensing image is mapped to the pixel values in the image. Contrary to the object model, the field model is more suitable for depicting continuous phenomena. According to the scope of the operations working on field model data, the operations are categorized into four groups: local, focal, zonal, and global. Local operations derive the output at a given location based solely on the input at that location. Focal operations' output at a location depends on the input in a small neighborhood of that location as well. Zonal and global operations work on inputs from a predefined zone or on global inputs, respectively (a short code sketch of a local and a focal operation is given below). The spatial network model relies on a graph to represent the relationships between spatial


elements using vertices and edges. In addition to the binary relationships between vertices represented by edges in a graph, a spatial network can include more complex relationships, for example, different kinds of turns at a road intersection. Based on how temporal information is modeled in addition to spatial data, spatiotemporal data can be categorized into three types: the temporal snapshot model, the temporal change model, and the event or process model (Li et al., 2008; Yuan, 1996, 1999). In the temporal snapshot model, spatial layers of the same theme are time-stamped. Based on the data model in each snapshot layer, snapshots can also represent trajectories of raster data, points, lines, and polygons, as well as networks such as time-expanded graphs (TEGs) and time-aggregated graphs (TAGs) (George and Shekhar, 2008). A remote sensing imagery time series is a typical example of the temporal snapshot model for raster data (Fig. 5). Unlike the temporal snapshot model, the temporal change model represents spatiotemporal data with a spatial layer at a given start time together with the incremental changes occurring afterward (Fig. 6 represents the motion directions and speeds of two objects). For example, it can represent relative motion attributes, such as the speed, acceleration, and direction of spatial points (Gelfand et al., 2010; Laube and Imfeld, 2002), as well as rotation and deformation of lines and polygons. The event and process model describes things that happen or last in time (as shown in Fig. 7). Processes represent ongoing homogeneous situations involving one or more types of things. The homogeneity of processes indicates that if a process lasts over an interval of time, then it also lasts over all subintervals of that interval, so the properties of a process are subject to change over time. Events refer to the culmination of processes and are associated with precise temporal boundaries; they are used to represent things that have already happened (Allen, 1984; Campelo and Bennett, 2013; Shekhar and Xiong, 2007; Worboys, 2005).
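As a concrete illustration of the local and focal raster operations defined earlier (zonal and global operations follow the same pattern with larger input scopes), here is a minimal NumPy sketch; the array contents and the 3 × 3 window are illustrative assumptions.

```python
import numpy as np

def local_rescale(raster, factor=2.0):
    """Local operation: the output at each pixel depends only on that pixel."""
    return raster * factor

def focal_mean(raster):
    """Focal operation: the output at each pixel is the mean of its 3 x 3
    neighborhood (edges handled by replicating the border values)."""
    padded = np.pad(raster, 1, mode="edge")
    out = np.empty_like(raster, dtype=float)
    for i in range(raster.shape[0]):
        for j in range(raster.shape[1]):
            out[i, j] = padded[i:i + 3, j:j + 3].mean()
    return out

raster = np.arange(16, dtype=float).reshape(4, 4)
print(local_rescale(raster))
print(focal_mean(raster))
```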

Fig. 5 Temporal snapshot model.

Fig. 6 Temporal change model.

Fig. 7 Event and process model.

Table 1 Nonspatiotemporal and spatial relationships

Attributes          Relationships
Nonspatiotemporal   Arithmetic: sum, subtraction, larger than, etc.
                    Ordering: before, after, etc.
                    Set: union, subclass of, instance of, etc.
Spatial             Set: union, intersection, membership, etc.
                    Topological: meet, within, overlap, etc.
                    Metric: distance, area, perimeter, etc.
                    Directional: above, northeastern, etc.
                    Others: visibility, etc.

1.19.2.2 Data Attributes and Relationships

Data attributes for SST data can be categorized into three distinct types: nonspatiotemporal attributes, spatial attributes, and temporal attributes. Nonspatiotemporal attributes are the same as the attributes of the input data of classical data mining; they characterize noncontextual features of objects, such as the name, population, and unemployment rate of a city. Spatial attributes define the location (e.g., longitude, latitude, and elevation) of spatial points, the spatial extent (e.g., area, perimeter), and the shape of spatial polylines and polygons in a spatial reference frame. Temporal attributes include the timestamp of a spatial layer (e.g., a raster layer or a set of points) as well as the duration of a process.

Relationships on nonspatiotemporal attributes are made explicit through arithmetic, ordering, and set relations. In contrast, relationships among spatial objects are often implicit, such as intersection, within, and below. The relationships on nonspatiotemporal and spatial attributes are summarized in Table 1. Spatial relationships are categorized into five groups: set based, topological, metric, directional, and other relationships. Treating spatial locations as set elements, set relationships can be extended to apply to spatial attributes. Topological relationships concern the relationships between spatial objects that are preserved under continuous deformations. Metric relationships are defined between spatial elements in a metric space, where the notion of distance is well defined. Directional relationships are further divided into two types, absolute and object-relative. For example, north and south are absolute directional relationships defined with respect to a global reference system, while object-relative directions, such as left and right, are defined using the orientation of a given object. There are also other relationships, such as visibility, which identifies whether a spatial element is visible to an observer from an observation point. Different spatiotemporal relationships have been proposed for different spatiotemporal data types. For temporal snapshots of object model data, spatiotemporal predicates (Erwig and Schneider, 2002) have been proposed for topological space, while trajectory distances (e.g., the Hausdorff distance (Chen et al., 2011)) have been proposed for metric space. Changes across snapshots of the field model at local, focal, and zonal levels (Zhou et al., 2014) and cubic map algebra (Mennis et al., 2005) are studied on raster snapshots. For temporal snapshots of network models, studies have examined predecessors and successors on a Lagrangian path (Yang et al., 2014), temporal centrality (Habiba et al., 2007), and network flow. Spatiotemporal coupling (i.e., within geographic and temporal proximity) (Celik et al., 2006; Huang et al., 2008; Mohan et al., 2012) and spatiotemporal covariance structure (Gelfand et al., 2010) are good examples of relationships for the event and process model.
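As a brief illustration, implicit spatial relationships of the kind listed in Table 1 can be computed on demand from geometry rather than stored explicitly. The following sketch uses the Shapely library with made-up coordinates:

```python
from shapely.geometry import Point, Polygon

# Toy city block and two facilities (hypothetical coordinates).
block = Polygon([(0, 0), (4, 0), (4, 3), (0, 3)])
school = Point(1, 1)
hospital = Point(6, 2)

# Implicit relationships are computed on demand rather than stored:
print(school.within(block))        # topological: True
print(block.touches(hospital))     # topological: False
print(school.distance(hospital))   # metric: Euclidean distance
print(block.area, block.length)    # metric: area and perimeter
```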

1.19.3 Statistical Foundations

This section provides a taxonomy of common statistical concepts for different SST data types. Spatial statistics is a branch of statistics specialized to model events and phenomena that evolve within a spatial framework. SST statistics are distinct from classical statistics due to the unique characteristics of space and time. One important property of spatial data is spatial dependence, a property so fundamental that geographers have elevated it to the status of the First Law of Geography: "Everything is related to everything else, but nearby things are more related than distant things" (Tobler, 1979). Spatial dependence is measured using spatial autocorrelation. Temporal dependence is also a topic studied in SST statistics. Other important properties include spatial heterogeneity and temporal nonstationarity, as well as the multiple scale effect. With spatial heterogeneity in the data, the distribution of spatial events and the interactions between them may vary drastically by location. Temporal nonstationarity extends spatial heterogeneity into the temporal dimension.
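As an illustration of measuring spatial dependence, the following sketch computes the global Moran's I for attribute values over a small set of locations; the weight matrix and values are assumed toy data, not from the original text:

```python
import numpy as np

def morans_i(values, w):
    """Global Moran's I for values at n locations with an n x n spatial
    weight matrix w (w[i, j] > 0 when i and j are neighbors)."""
    z = values - values.mean()
    n = len(values)
    num = n * (w * np.outer(z, z)).sum()
    den = w.sum() * (z ** 2).sum()
    return num / den

# Toy example: 4 locations on a line, adjacent locations as neighbors.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 2.0, 2.5, 4.0])
print(morans_i(x, w))  # > 0 suggests positive spatial autocorrelation
```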

1.19.3.1 Spatial Statistics for Different Types of Spatial Data

A variety of data models exist to represent spatial events and processes. Accordingly, spatial statistics can be categorized by their underlying spatial data type: geostatistics for point-referenced data, lattice statistics for areal data, spatial point processes for spatial point patterns, and spatial network statistics for network data.


Geostatistics: Geostatistics is concerned with point-referenced data, which consist of a set of points with fixed locations and associated attribute values. The goal of geostatistics is to model the distribution of the attribute values and make predictions at unsampled locations. Spatial point-referenced data have three characteristics studied by geostatistics (Cressie, 2015): spatial continuity (i.e., dependence across locations), weak stationarity (i.e., some statistical properties do not change with location), and isotropy (i.e., uniformity in all directions). Under the assumption of weak or intrinsic stationarity, spatial dependence at various distances can be captured by a covariance function or a semivariogram (Banerjee et al., 2014). Geostatistics provides a set of statistical tools, such as Kriging (Banerjee et al., 2014), for interpolating attributes at unsampled locations. However, real-world spatial data often show inherent variation in measurements of a relationship over space, due to the influence of spatial context on the nature of spatial relationships. For example, human behavior can vary intrinsically over space (e.g., differing cultures), and different jurisdictions tend to produce different laws (e.g., speed limit differences between Minnesota and Wisconsin). This effect is called spatial heterogeneity or nonstationarity. Special models (e.g., local space–time Kriging (Gething et al., 2007)) can be used to reflect the varying relationships at different locations by assigning higher weights to neighboring points, reducing the effect of heterogeneity.

Lattice statistics: Lattice statistics are statistics for spatial data in the field (or areal) model. Here, a lattice refers to a countable collection of regular or irregular cells in a spatial framework. The range of spatial dependency among cells is reflected by a neighborhood relationship, which can be represented by a contiguity matrix called a W-matrix. A spatial neighborhood relationship can be defined based on spatial adjacency (e.g., rook or queen neighborhoods) or Euclidean distance, or, in more general models, cliques and hypergraphs (Warrender and Augusteijn, 1999). Based on a W-matrix, spatial autocorrelation statistics can be defined to measure the correlation of a nonspatial attribute across neighboring locations. Common spatial autocorrelation statistics include Moran's I, Getis–Ord Gi*, Geary's C, and the Gamma index G (Cressie, 2015), as well as their local versions, called local indicators of spatial association (LISAs) (Anselin, 1995). Several spatial statistical models, including the spatial autoregressive regression (SAR) model, the conditional autoregressive (CAR) model, Markov random fields (MRF), and other Bayesian hierarchical models (Banerjee et al., 2014), can be used to model lattice data. Another important issue is the modifiable areal unit problem (MAUP, also called the multiscale effect): the results of the same analysis method change across aggregation scales. For example, analysis using data aggregated by states will differ from analysis using data at the individual household level.

Spatial point process statistics: Unlike geostatistics, spatial point process statistics are not concerned with attribute values but with the locations of points; the distribution of the points themselves is the main focus. Locations of a set of points can be generated under different statistical assumptions (e.g., random, clustered).
The most common model assumed for a spatial point process is the homogeneous Poisson process, also known as complete spatial randomness (CSR) (Gelfand et al., 2010). Under CSR, the total number of points follows a Poisson distribution, and each point is identically and independently distributed within a predefined spatial domain. A variant of CSR is the binomial point process, whose only difference is a fixed total number of points. In many application domains, CSR or the binomial point process is not an appropriate assumption, since points may exhibit spatial autocorrelation or inhibition. In such cases, other specialized models should be applied to better approximate the true distribution. For spatial inhibition, the Poisson hard-core process is widely used to generate a distribution that enforces mutual repulsion among points. For spatial autocorrelation, the Matern cluster process can be chosen to reflect clustering characteristics; similar cluster processes include the Poisson cluster process, the Cox cluster process, and the Neyman–Scott process. One of the best-known applications of spatial point processes is spatial scan statistics (Kulldorff, 1997) in hotspot detection, where chance hotspots arising from randomness are removed through a statistical significance test under a null hypothesis based on CSR. Similarly, Ripley's K function, which estimates the overall clustering degree of a point distribution, also uses CSR as the base model in its null hypothesis (Marcon and Puech, 2009; Ripley, 1976). Although CSR remains the most popular model for random point distributions, other models (e.g., the Poisson hard-core process and the Matern cluster process) are being adopted in data mining problems (e.g., colocation and segregation) as more robust and plausible assumptions.

Spatial network statistics: Most spatial statistics focus on Euclidean space; spatial statistics on network spaces are much less studied. Network spaces, for example, river networks and street networks, are important in environmental science and public safety applications. However, they pose unique challenges, including directionality and anisotropy of spatial dependency, connectivity, and high computational cost. Statistical properties of random fields on a network are summarized in Guyon (1995). Several spatial statistics, such as spatial autocorrelation, the K-function, and Kriging, have been generalized to spatial networks (Okabe and Sugihara, 2012), while little research has been done on spatiotemporal statistics in network spaces.
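As a concrete illustration of the point-process models above, the following sketch simulates CSR (a homogeneous Poisson process) over a rectangular domain; the intensity and domain size are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_csr(intensity, width, height):
    """Homogeneous Poisson process (CSR): the number of points is Poisson
    with mean intensity * area; locations are i.i.d. uniform."""
    n = rng.poisson(intensity * width * height)
    xs = rng.uniform(0, width, n)
    ys = rng.uniform(0, height, n)
    return np.column_stack([xs, ys])

points = simulate_csr(intensity=0.01, width=100, height=100)
# A binomial point process differs only in fixing n instead of drawing it.
```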

1.19.3.2 Spatiotemporal Statistics

Spatiotemporal statistics merge spatial statistics and temporal statistics (Cressie and Wikle, 2015; Gelfand et al., 2010). Like spatial data, temporal data exhibit intrinsic properties such as autocorrelation and heterogeneity. By incorporating the additional dimension of time, a majority of the models in spatial statistics can be transformed into spatiotemporal statistics. This section summarizes common statistics for different spatiotemporal data types, including spatial time series, spatiotemporal point processes, and time series of lattice (areal) data.

Spatial time series: Spatial statistics for point-referenced data have been generalized to spatiotemporal data (Kyriakidis and Journel, 1999). Examples include spatiotemporal stationarity, spatiotemporal covariance, spatiotemporal variograms, and spatiotemporal Kriging (Cressie and Wikle, 2015; Gelfand et al., 2010). There are also temporal autocorrelation and tele-coupling (high correlation across spatial time series at a long distance). Methods to model spatiotemporal processes include physics-inspired models (e.g., stochastic differential equations) (Cressie and Wikle, 2015; Gelfand et al., 2010) and hierarchical dynamic spatiotemporal models (DSMs) (e.g., Kalman filtering) for data assimilation (Cressie and Wikle, 2015; Gelfand et al., 2010).

Spatiotemporal point process: A spatiotemporal point process generalizes the spatial point process by incorporating time. As with spatial point processes, there are spatiotemporal Poisson, Cox, and cluster processes, along with corresponding statistical tests, including a spatiotemporal K function and spatiotemporal scan statistics (Cressie and Wikle, 2015; Gelfand et al., 2010).

Time series of lattice (areal) data: Analogous to lattice statistics, there are spatial and temporal autocorrelation, the spatiotemporal autoregressive regression (STAR) model (Cressie and Wikle, 2015; Gelfand et al., 2010), and Bayesian hierarchical models (Banerjee et al., 2014). Other spatiotemporal statistics include empirical orthogonal function (EOF) analysis (principal component analysis in geophysics), canonical correlation analysis (CCA), and DSMs (Kalman filters) for data assimilation (Cressie and Wikle, 2015; Gelfand et al., 2010).

1.19.4 Output Pattern Families

1.19.4.1 Spatial and Spatiotemporal Anomaly Detection

Detecting SST anomalies is useful in many applications including transportation, ecology, homeland security, public health, climatology, and location-based services. For example, spatiotemporal anomaly detection can be used to detect anomalous traffic patterns from sensor observations on a highway road network. In this section, we review techniques for SST anomaly detection.

1.19.4.1.1 Problem definition

In data mining, an anomaly (outlier) has been informally defined as an item, event, or observation that does not conform to an expected pattern or is inconsistent with the remainder of the dataset (Barnett and Lewis, 1994; Chandola et al., 2009). By contrast, a spatial anomaly (Shekhar and Chawla, 2003) is a spatially referenced object whose nonspatial attribute values differ significantly from those of its neighborhood. The detection of anomalies can lead to the discovery of useful knowledge and has a number of practical applications. For instance, a road intersection where vehicle speed during rush hour is much higher than at the surrounding intersections is a spatial anomaly with respect to the nonspatial attribute vehicle speed. A spatiotemporal anomaly generalizes the spatial anomaly by substituting a spatiotemporal neighborhood for the spatial neighborhood. Fig. 8 compares global anomalies with spatial anomalies. Point G in Fig. 8(A) is a global anomaly, whereas point S in Fig. 8(A) is not, according to the histogram in Fig. 8(B). However, compared with its spatial neighbors P and Q, point S is significantly different and is thus a spatial anomaly.

1.19.4.1.2 Statistical foundation

The spatial statistics for spatial anomaly detection include two kinds of bipartite multidimensional tests: graphical tests, including variogram clouds (Haslett et al., 1991) and Moran scatterplots (Anselin, 1995), and quantitative tests, including scatterplots (Anselin, 1994) and neighborhood spatial statistics (Shekhar and Chawla, 2003). A variogram is defined as the variance of the difference between field values at two locations. Fig. 9(A) shows a variogram cloud of neighboring pairs.

Fig. 8 Global anomalies (e.g., point G) versus spatial anomalies (e.g., point S).

Fig. 9 Spatial statistics for spatial anomaly detection (input data are the same as in Fig. 8A): (A) variogram cloud (square root of absolute difference of attribute values vs. pairwise distance); (B) Moran scatterplot (weighted neighbor Z score vs. Z score of attribute values); (C) scatterplot (average attribute value over the neighborhood vs. attribute value); (D) spatial statistic Zs(x) test.

In Fig. 9(A), the two pairs (P, S) and (Q, S) on the left-hand side lie above the main group of pairs and are possibly related to spatial anomalies. The Moran scatterplot is inspired by Moran's I index: its x-axis is the Z score of the attribute value and its y-axis is the weighted average of the neighbors' Z scores, so the slope of the regression line is Moran's I. As shown in Fig. 9(B), the upper-left and lower-right quadrants indicate spatial neighbors with dissimilar values: low values surrounded by high-value neighbors (e.g., points P and Q) and high values surrounded by low values (e.g., point S). The scatterplot approach (Fig. 9(C)) fits a regression line between attribute values and the neighborhood average of attribute values and computes the residuals of the points from the regression line. Spatial anomalies, for example, point S, are detected when they have high absolute residuals. Finally, as shown in Fig. 9(D), the spatial statistic S(x) measures the extent of discontinuity between the nonspatial attribute value at location x and the average attribute value of x's neighbors. A Z-test on S(x) can be used to detect spatial anomalies (e.g., point S); median-based and iterative Z-tests have also been studied as alternatives. As long as spatiotemporal neighborhoods are well defined, the spatial statistics for spatial anomaly detection are also applicable to spatiotemporal anomalies.
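A minimal sketch of the neighborhood statistic S(x) and its Z-test follows, using a toy one-dimensional arrangement of locations; the values, neighbor lists, and threshold are illustrative assumptions, not from the original text:

```python
import numpy as np

def spatial_outliers(values, neighbors, theta=1.5):
    """Neighborhood-based statistic S(x) = f(x) - mean of f over x's
    neighbors; flags locations whose standardized S(x) exceeds theta."""
    s = np.array([values[i] - values[nbrs].mean()
                  for i, nbrs in enumerate(neighbors)])
    z = (s - s.mean()) / s.std()
    return np.where(np.abs(z) > theta)[0]

# Toy arrangement: each location's neighbors are the adjacent indices.
vals = np.array([5.0, 5.2, 4.9, 9.5, 5.1, 4.8])
nbrs = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
print(spatial_outliers(vals, nbrs))  # index 3 stands out from its neighbors
```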

1.19.4.1.3 Spatial anomaly detection approaches

Visualization approaches use plots to reveal outlying objects; common methods include the variogram cloud and the Moran scatterplot introduced earlier. Quantitative approaches can be classified further into distribution-based and distance-based methods. In distribution-based methods, the data are modeled with a distribution function (e.g., a normal or Poisson distribution), and objects are characterized as outliers depending on the statistical hypothesis that underlies the model. Distance-based approaches are commonly used when dealing with points, lines, polygons, locations, and spatial neighborhoods. A spatial statistic is computed as the difference between the nonspatial attribute at the current location and the neighborhood aggregate (Shekhar et al., 2003a). Spatial neighborhoods can be identified by distances on spatial attributes (e.g., K nearest neighbors) or by graph connectivity (e.g., locations on road networks). This research has been extended in a number of ways to allow for multiple nonspatial attributes (Chen et al., 2008), average and median attribute values (Lu et al., 2003a), weighted spatial outliers (Kou et al., 2006), categorical spatial outliers (Liu et al., 2014), local spatial outliers (Schubert et al., 2014), and fast detection (Wu et al., 2010).

1.19.4.1.4 Spatiotemporal anomaly detection approaches

The reasoning behind spatiotemporal outlier detection is that such outliers reflect a "discontinuity" in nonspatiotemporal attributes within a spatiotemporal neighborhood. Approaches can be summarized according to the input data type.

Outliers in spatial time series: For spatial time series (on point-referenced data, raster data, and graph data), basic spatial outlier detection methods, such as visualization-based and neighborhood-based approaches, can be generalized given a definition of spatiotemporal neighborhoods. After a spatiotemporal neighborhood is defined, the difference between the nonspatial attribute at the current location and the neighborhood aggregate is computed (Chen et al., 2008; Lu et al., 2003b; McGuire et al., 2014; Shekhar et al., 2003a).

Flow anomalies: Given a set of observations across multiple locations on a spatial network flow, flow anomaly discovery aims to identify dominant time intervals in which the fraction of time instants with significantly mismatched sensor readings exceeds a given percentage threshold. Flow anomaly discovery can be viewed as detecting discontinuities or inconsistencies of a nonspatiotemporal attribute within a neighborhood defined by the flow between nodes, where the discontinuities persist over a period of time. A time-scalable technique called SWEET (Smart Window Enumeration and Evaluation of persistent Thresholds) (Elfeky et al., 2006; Franke and Gertz, 2008; Kang et al., 2008) exploits several algebraic properties of the flow anomaly problem to discover these patterns efficiently. To account for flow anomalies across multiple locations, recent work (Kang et al., 2009) defines a tele-connected flow anomaly pattern and proposes a RAD (Relationship Analysis of Dynamic neighborhoods) technique to identify this pattern efficiently.

Anomalous moving object trajectories: Detecting spatiotemporal outliers in moving object trajectories is challenging due to the high dimensionality of trajectories and their dynamic nature. A context-aware stochastic model has been proposed to detect anomalous movement patterns in indoor device trajectories (Liu et al., 2012). A spatial deviation (distance)-based method has been proposed for anomaly monitoring over moving object trajectory streams (Bu et al., 2009); here, anomalies are defined as rare patterns with large spatial deviations from normal trajectories within a certain temporal interval. A supervised approach called Motion-Alert has also been proposed to detect anomalies among massive moving objects (Li et al., 2006); it first extracts motif features from the trajectories, then clusters the features and fits a supervised model to classify whether a trajectory is anomalous. Other techniques detect anomalous driving patterns from taxi GPS trajectories (Chen et al., 2011; Ge et al., 2011; Zhang et al., 2011).

1.19.4.2 Spatial and Spatiotemporal Associations, Tele-Connections

SST coupling patterns represent spatiotemporal object types whose instances often occur in close geographic and temporal proximity. Discovering patterns of SST coupling and tele-coupling is important in applications related to ecology, environmental science, public safety, and climate science. For example, identifying spatiotemporal cascade patterns in crime event datasets can help police departments understand crime generators in a city and thus take effective measures to reduce crime. In this section, we review techniques for identifying SST associations as well as tele-connections. The section starts with the basic spatial association (or colocation) pattern and moves on to spatiotemporal associations (i.e., spatiotemporal cooccurrence, cascade, and sequential patterns) as well as spatiotemporal tele-connections.

1.19.4.2.1 Problem definition

Spatial association, also known as the spatial colocation pattern (Huang et al., 2004), represents subsets of spatial event types whose instances are often located in close geographic proximity. Real-world examples include symbiotic species, for example, the Nile crocodile and Egyptian plover in ecology. Similarly, spatiotemporal association patterns represent spatiotemporal object types whose instances often occur in close geographic and temporal proximity. Fig. 10 shows an example of spatial colocation patterns. The dataset contains instances of several Boolean spatial features, each represented by a distinct shape. A careful review reveals two colocation patterns, {'+', '×'} and {'o', '*'}, whose features tend to be located together. Mining SST association patterns is challenging for the following reasons. First, in association rule learning for nonspatial data, features are considered associated if they appear in the same transaction (e.g., goods on the same shopping receipt), but in continuous space and time, transactions are implicit. Second, because continuous space and time are modeled with discrete transactions, features or relationships between features may be double-counted across association patterns. Third, the number of candidate patterns is exponential, and a trade-off must be made between the statistical rigor of the output patterns and computational efficiency.


Fig. 10 Illustration of point spatial colocation patterns. Shapes represent different spatial feature types. Spatial features in sets {'+', '×'} and {'o', '*'} tend to be located together.

1.19.4.2.2 Statistical foundation

The underlying statistic for SST coupling patterns is the cross-K function, which generalizes the basic Ripley’s K function (introduced in section “Statistical Foundations”) for multiple event types.

1.19.4.2.3 Spatial association detection approaches

Mining spatial colocation patterns can be done via two categories of approaches: those that use spatial statistics and those that use association rule mining primitives. Spatial statistics-based approaches use measures such as the cross-K function with Monte Carlo simulation (Cressie, 2015), the mean nearest-neighbor distance, and spatial regression models (Chou, 1997). However, these approaches are computationally expensive due to the exponential number of candidate patterns. In contrast, association rule-based approaches focus on creating transactions over space so that an Apriori-like algorithm can be used. Within this category, there are transaction-based and distance-based approaches. A transaction-based approach defines transactions over space (e.g., around instances of a reference feature) and then applies an Apriori-like algorithm (Koperski and Han, 1995). However, in spatial colocation pattern mining, transactions are often not explicit, and force-fitting the notion of a transaction onto a continuous spatial framework loses implicit spatial relationships across transaction boundaries, as illustrated in Fig. 11. As shown in Fig. 11(A), the dataset contains three feature types, A, B, and C, each with two instances; the neighbor relationships between instances are shown as edges, and colocations (A, B) and (B, C) may be considered frequent in this example. Fig. 11(B) shows transactions created by choosing C as the reference feature; because colocation (A, B) does not involve the reference feature, it will not be found. Fig. 11(C) shows two possible partitions of the dataset of Fig. 11(A), along with the supports for colocation (A, B); the support measure is order sensitive and may also miss the colocation (A, B). A distance-based approach defines a distance-based pattern called k-neighboring class sets (Morimoto, 2001) or uses an event-centric model (Huang et al., 2004) based on the participation index, which is an upper bound of the cross-K function statistic and has an antimonotone property; a minimal sketch of this measure follows Fig. 11. More recently, approaches have been proposed to identify colocations for extended spatial objects (Xiong et al., 2004) and rare events (Huang et al., 2006a), regional colocation patterns (Ding et al., 2011; Wang et al., 2013) (i.e., patterns significant only in a subregion), statistically significant colocations (Barua and Sander, 2014), and fast algorithms (Yoo and Shekhar, 2006).

Fig. 11 Example illustrating different approaches to discovering colocation patterns: (A) example dataset, in which neighboring instances of different features are connected; (B) reference feature-centric model (reference feature = C; transactions {{B1}, {B2}}; support(A, B) = ∅); (C) data partition approach, in which support(A, B) is 2 or 1 depending on the partition, so the support measure is ill-defined and order sensitive; (D) event-centric model.
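The following sketch computes the participation index of a size-2 colocation pattern under the event-centric model, assuming a Euclidean neighbor relation with distance threshold d; the instance coordinates are made up for illustration:

```python
import numpy as np

def participation_index(inst_a, inst_b, d):
    """Participation index of the size-2 colocation {A, B}: the minimum,
    over both features, of the fraction of that feature's instances that
    have a neighbor of the other feature within distance d."""
    diff = inst_a[:, None, :] - inst_b[None, :, :]
    dist = np.linalg.norm(diff, axis=2)     # pairwise A-B distances
    pr_a = (dist.min(axis=1) <= d).mean()   # fraction of A near some B
    pr_b = (dist.min(axis=0) <= d).mean()   # fraction of B near some A
    return min(pr_a, pr_b)

# Toy instances of two feature types (assumed coordinates).
a = np.array([[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]])
b = np.array([[0.5, 0.5], [9.0, 9.0]])
print(participation_index(a, b, d=1.5))
```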

1.19.4.2.4 Spatiotemporal association, tele-connection detection approaches

Spatiotemporal coupling patterns can be categorized by whether the object types are temporally ordered: mixed-drove spatiotemporal cooccurrences (Celik et al., 2008) capture unordered patterns, spatiotemporal cascades (Mohan et al., 2012) capture partially ordered patterns, and spatiotemporal sequential patterns (Huang et al., 2008) capture totally ordered patterns. A spatiotemporal tele-connection (Zhang et al., 2003a) is a pattern of significantly positive or negative temporal correlation between spatial time series at a great distance. The following paragraphs categorize common computational approaches for discovering spatiotemporal couplings by input data type.

Mixed-drove spatiotemporal cooccurrence patterns (MDCOPs) represent subsets of two or more object types whose instances are often located in spatial and temporal proximity. Discovering MDCOPs is potentially useful for identifying tactics in battlefields and games, understanding predator–prey interactions, and transportation (road and network) planning (Güting and Schneider, 2005; Koubarakis et al., 2003). However, mining MDCOPs is computationally very expensive because the interest measures are computationally complex, the datasets are large due to the archival history, and the set of candidate patterns is exponential in the number of object types. Recent work has produced a monotonic composite interest measure for discovering MDCOPs, and novel MDCOP mining algorithms are presented in Celik et al. (2006, 2008). A filter-and-refine approach has also been proposed to identify spatiotemporal cooccurrences among extended spatial objects (Pillai et al., 2013).

A spatiotemporal sequential pattern is a sequence of spatiotemporal event types of the form f1 → f2 → … → fk. It represents a "chain reaction" from event type f1 to event type f2 and onward until event type fk. A spatiotemporal sequential pattern differs from a colocation pattern in that its event types are totally ordered. Such patterns are important in applications such as epidemiology, where disease transmission may follow paths between several species through spatial contact. Mining spatiotemporal sequential patterns is challenging due to the lack of statistically meaningful measures as well as the high computational cost. A sequence index measure, which can be interpreted via K-function statistics, was proposed by Huang et al. (2008) and Huang et al. (2006b), together with computationally efficient algorithms. Other work has investigated spatiotemporal sequential patterns in data other than spatiotemporal events, such as moving object trajectories (Cao et al., 2005; Li et al., 2013; Verhein, 2009).

Cascading spatiotemporal patterns: Partially ordered subsets of event types whose instances are located together and occur in stages are called cascading spatiotemporal patterns (CSTPs). In the public safety domain, events such as bar closings and football games are considered crime generators, and preliminary analysis has revealed that football games and bar closing events do indeed generate CSTPs. CSTP discovery can play an important role in disaster planning, climate change science (Frelich and Reich, 2010; Mahoney et al., 2003) (e.g., understanding the effects of climate change and global warming), and public health (e.g., tracking the emergence, spread, and re-emergence of multiple infectious diseases (Morens et al., 2004)).
A statistically meaningful metric has been proposed to quantify the interestingness of CSTPs, along with computational pruning strategies that make the pattern discovery process more efficient (Huang et al., 2008; Mohan et al., 2012).

Spatial time series and tele-connections: Given a collection of spatial time series at different locations, tele-connection discovery aims to identify pairs of spatial time series whose correlation is above a given threshold. Tele-connection patterns are important for understanding oscillations in climate science. Computational challenges arise from the length of the time series and the large number of candidate pairs. An efficient index structure called a cone-tree, together with a filter-and-refine approach (Zhang et al., 2003a,b), has been proposed; it exploits the spatial autocorrelation of nearby spatial time series to filter out redundant pairwise correlation computations. Another challenge is spurious "high correlation" pairs of locations that arise by chance. Recently, statistical significance tests have been proposed to identify statistically significant tele-connection patterns, called dipoles, from climate data (Kawale et al., 2012). The approach uses a "wild bootstrap" to capture the spatiotemporal dependencies, accounting for the spatial autocorrelation, seasonality, and trend in the time series over a period of time.
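A naive baseline for tele-connection discovery, sketched below, simply computes all pairwise correlations and keeps distant, highly correlated pairs; the cone-tree filter-and-refine approach cited above exists precisely to prune this O(n²) computation. The thresholds and data layout are illustrative assumptions:

```python
import numpy as np

def teleconnections(series, coords, min_corr=0.9, min_dist=50.0):
    """Report location pairs whose time series correlation exceeds
    min_corr while being at least min_dist apart.
    series: n_locations x n_timesteps; coords: n_locations x 2."""
    corr = np.corrcoef(series)   # all pairwise correlations
    pairs = []
    n = len(series)
    for i in range(n):
        for j in range(i + 1, n):
            far = np.linalg.norm(coords[i] - coords[j]) >= min_dist
            if far and abs(corr[i, j]) >= min_corr:
                pairs.append((i, j, corr[i, j]))
    return pairs
```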

1.19.4.3 Spatial and Spatiotemporal Prediction

SST prediction is widely used to classify remote sensing images into dynamic land cover maps and detect changes. Climate scientists use collections of climate variables to project future trends in global or regional temperature. In this section, we review the definition of SST prediction, followed by computational approaches organized according to their input data types.

1.19.4.3.1 Problem definition

Given SST data items with a set of explanatory variables (also called explanatory attributes or features) and a dependent variable (also called the target variable), the SST prediction problem aims to learn a model that can predict the dependent variable from the explanatory variables. When the dependent variable is discrete, the problem is called SST classification; when it is continuous, the problem is SST regression. An example of an SST classification problem is remote sensing image classification over temporal snapshots (Almeida et al., 2007), where the explanatory variables are spectral bands or channels (e.g., blue, green, red, infrared, and thermal), and the dependent variable is a thematic class such as forest, urban, water, or agriculture. Examples of SST regression include yearly crop yield prediction (Little et al., 2008) and daily temperature prediction at different locations. Regression can also be used for inverse estimation: given an observed value of y, determine the corresponding x value.


Fig. 12 The choice of objective function and spatial accuracy (A = actual nest in pixel; P = predicted nest in pixel).

The unique challenges of SST prediction come from the special characteristics of SST data, including spatial and temporal autocorrelation, spatial heterogeneity and temporal nonstationarity, and the multiscale effect. These characteristics violate the common assumption of traditional prediction techniques that samples are independent and identically distributed (i.i.d.). Simply applying traditional prediction techniques without incorporating these characteristics may produce hypotheses or models that are inaccurate or inconsistent with the dataset. A subtler but equally important issue is the choice of objective function for measuring classification accuracy. For a two-class problem, the standard measure is the percentage of correctly classified objects. However, this measure may not be the most suitable in a spatial context, because spatial accuracy (how far the predictions are from the actuals) is important in applications such as ecology due to the effects of discretizing a continuous wetland into discrete pixels, as shown in Fig. 12. Fig. 12(A) shows the actual locations of nests, and Fig. 12(B) shows the pixels containing actual nests. Note the loss of information when continuous space is discretized into pixels: many nest locations barely fall within the pixels labeled "A" and are quite close to blank pixels, which represent "no-nest." Now consider the two predictions shown in Fig. 12(C) and (D). Domain scientists prefer prediction (D) over (C), as the predicted nest locations are, on average, closer to actual nest locations. The classification accuracy measure cannot distinguish between (C) and (D); a measure of spatial accuracy is needed to capture this preference.

1.19.4.3.2 Statistical foundations

SST prediction techniques build on SST statistics, including spatial and temporal autocorrelation, spatial heterogeneity, temporal nonstationarity, and the MAUP (see section "Statistical Foundations").

1.19.4.3.3 Spatial prediction approaches

Several previous studies (Jhung and Swain, 1996; Solberg et al., 1996) have shown that modeling spatial dependency (often called context) during classification or regression improves overall accuracy. Spatial context can be defined by the relationships between spatially adjacent units in a small neighborhood. Three supervised learning techniques that model spatial dependency are (1) MRF-based classifiers, (2) the SAR model, and (3) spatial decision trees.

MRF-based Bayesian classifiers: Maximum likelihood classification (MLC) is one of the most widely used parametric supervised classification techniques in remote sensing (Hixson et al., 1980; Strahler, 1980). However, MLC is a per-pixel classifier that assumes i.i.d. samples; ignoring spatial autocorrelation results in salt-and-pepper noise in the classified images. One solution is to use MRF-based Bayesian classifiers (Li, 2009), which model spatial context via the a priori term in Bayes' rule, using a set of random variables whose interdependency is represented by an undirected graph (i.e., a symmetric neighborhood matrix).

SAR models: The SAR model is one of the most commonly used autoregressive models for spatial regression (Shekhar and Xiong, 2007). In the SAR model, the spatial dependencies of the error term, or of the dependent variable, are modeled directly in the regression equation using a contiguity matrix (Anselin, 2013); an example contiguity matrix is shown in Fig. 13. The SAR model can be written as

Y = ρWY + Xβ + ε,

where Y is the observation (dependent variable), X the independent variables, ρ the spatial autoregressive parameter reflecting the strength of the spatial dependencies between elements of the dependent variable (via the logistic function for binary dependent variables), W the contiguity matrix, β the regression coefficients, and ε the unobservable error term. When ρ = 0, the model reduces to ordinary linear regression.
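The SAR equation can be made concrete by simulating from it: Y = ρWY + Xβ + ε implies Y = (I − ρW)⁻¹(Xβ + ε), so one realization can be drawn as in the sketch below. The data are toy values; the W used is the row-normalized 2 × 2 grid matrix of Fig. 13:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_sar(w, x, beta, rho, noise=0.1):
    """Draw one realization of Y = rho*W*Y + X*beta + eps, i.e.,
    Y = (I - rho*W)^(-1) (X*beta + eps)."""
    n = w.shape[0]
    eps = rng.normal(0, noise, n)
    return np.linalg.solve(np.eye(n) - rho * w, x @ beta + eps)

# Row-normalized four-neighborhood contiguity matrix for cells A-D.
w = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.5, 0.0, 0.0, 0.5],
              [0.5, 0.0, 0.0, 0.5],
              [0.0, 0.5, 0.5, 0.0]])
x = rng.normal(size=(4, 2))
y = simulate_sar(w, x, beta=np.array([1.0, -2.0]), rho=0.6)
```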

Fig. 13 A spatial framework and its four-neighborhood contiguity matrix: (A) spatial framework (cells A–D); (B) neighbor relationship (binary matrix); (C) contiguity matrix (row normalized).

Spatial decision trees: Decision tree classifiers have been widely used for image classification (e.g., remote sensing image classification), but they assume an i.i.d. distribution and produce significant salt-and-pepper noise. Several works propose spatial feature selection heuristics, such as spatial entropy or spatial information gain, to select tree node tests that favor spatial autocorrelation structure when splitting training samples (Li and Claramunt, 2006; Stojanova et al., 2013). More recently, a focal-test-based spatial decision tree has been proposed (Jiang et al., 2013, 2015), in which the tree traversal direction of a sample is based on both local and focal (neighborhood) information, with focal information measured by local spatial autocorrelation statistics in a tree node test.

One limitation of the SAR model is that it does not account for the spatial heterogeneity that is natural in geographic spaces: the coefficients β and the model errors are assumed to be uniform throughout the entire space. One method that accounts for spatial variation in model parameters and errors is geographically weighted regression (GWR) (Fotheringham et al., 2003). The regression equation of GWR is y = xβ(s) + ε(s), where β(s) and ε(s) are the spatially varying parameters and errors, respectively. GWR has the same structure as standard linear regression, except that the parameters vary over space; it assumes that samples at nearby locations have a higher influence on the parameter estimates at the current location.
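A minimal sketch of the GWR estimate at a single location follows, using a Gaussian kernel for the spatial weights; the bandwidth and data are illustrative, and production GWR implementations also handle bandwidth selection and diagnostics:

```python
import numpy as np

def gwr_fit(x, y, coords, s, bandwidth):
    """Estimate beta(s) at location s by weighted least squares,
    beta = (X'WX)^(-1) X'Wy, with Gaussian kernel weights so that
    nearby samples receive higher weight."""
    d = np.linalg.norm(coords - s, axis=1)
    w = np.exp(-(d ** 2) / (2 * bandwidth ** 2))
    xw = x * w[:, None]                      # W X
    return np.linalg.solve(xw.T @ x, xw.T @ y)

# Toy data: intercept + one covariate at four locations.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
x = np.hstack([np.ones((4, 1)), coords[:, :1]])
y = np.array([1.0, 2.0, 1.5, 10.0])
beta_at_origin = gwr_fit(x, y, coords, s=np.array([0.0, 0.0]), bandwidth=1.0)
```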

1.19.4.3.4 Spatiotemporal prediction approaches

STAR: The STAR model extends SAR by explicitly modeling the temporal and spatiotemporal dependencies across variables at different locations. More details can be found in Cressie (2015).

Spatiotemporal Kriging: Kriging (Cressie, 2015) is a geostatistical technique for making predictions at locations where observations are unknown, based on locations where observations are known; in other words, Kriging is a spatial "interpolation" model. Spatial dependency is captured by the spatial covariance matrix, which can be estimated through spatial variograms. Spatiotemporal Kriging (Cressie and Wikle, 2015) generalizes spatial Kriging with a spatiotemporal covariance matrix and variograms, and can be used to make predictions from incomplete and noisy spatiotemporal data.

Hierarchical dynamic spatiotemporal models: Hierarchical DSMs (Cressie and Wikle, 2015), as the name suggests, model spatiotemporal processes dynamically within a Bayesian hierarchical framework. At the top is a data model, which represents the conditional dependency of (actual or potential) observations on the underlying hidden process through latent variables. In the middle is a process model, which captures the spatiotemporal dependency of the underlying process. At the bottom is a parameter model, which captures the prior distributions of the model parameters. DSMs have been widely used in climate and environmental science, for example, to simulate population growth or atmospheric and oceanic processes. For model inference, the Kalman filter can be used under the assumption of linear Gaussian models.
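For intuition, the following sketch implements one predict-update cycle of a scalar Kalman filter, the standard inference tool for linear Gaussian DSMs; all parameter values here are illustrative assumptions:

```python
import numpy as np

def kalman_step(mu, p, z, a=1.0, q=0.01, h=1.0, r=0.25):
    """One predict-update cycle: the state evolves as x_t = a*x_{t-1} +
    noise(q) (process model); the observation is z_t = h*x_t + noise(r)
    (data model). mu, p are the current state mean and variance."""
    # Predict with the process model.
    mu_pred = a * mu
    p_pred = a * p * a + q
    # Update with the new observation z.
    k = p_pred * h / (h * p_pred * h + r)    # Kalman gain
    mu_new = mu_pred + k * (z - h * mu_pred)
    p_new = (1 - k * h) * p_pred
    return mu_new, p_new

mu, p = 0.0, 1.0
for z in [0.9, 1.1, 1.0]:                    # toy observation stream
    mu, p = kalman_step(mu, p, z)
```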

1.19.4.4 Spatial and Spatiotemporal Partitioning (Clustering) and Summarization

SST partitioning and summarization are important in many societal applications, such as public safety, public health, and environmental science. For example, partitioning and summarizing crime data, which is spatial and temporal in nature, helps law enforcement agencies find trends of crimes and effectively deploy their police resources (Levine, 2013).

1.19.4.4.1 Problem definition

SST partitioning or SST clustering is the process of grouping similar spatial or spatiotemporal data items and, thus, partitioning the underlying space and time (Kisilevich et al., 2009). It is important to note that SST partitioning or clustering is closely related to, but not the same as, hotspot detection. Hotspots can be considered as special clusters such that events or activities inside a cluster have much higher intensity than outside. SST summarization aims to provide a compact representation of spatiotemporal data. For example, traffic accident events on a road network can be summarized into several main routes that cover most of the accidents. SST summarization is often done after or together with spatiotemporal partitioning so that objects in each partition can be summarized by aggregated statistics or representative objects.


1.19.4.4.2 Spatial partitioning and summarization approaches

The data mining and machine learning literature has explored a large number of partitioning algorithms, which can be classified into three groups.

Hierarchical clustering methods start with all patterns in a single cluster and successively split or merge clusters until a stopping criterion is met. The result is a tree of clusters called a dendrogram, which can be cut at different levels to yield the desired clusters. Well-known hierarchical clustering algorithms include balanced iterative reducing and clustering using hierarchies (BIRCH), clustering using interconnectivity (Chameleon), clustering using representatives (CURE), and robust clustering using links (ROCK) (Cervone et al., 2008; Karypis et al., 1999; Zhang et al., 1996).

Global partitioning-based clustering algorithms start from an initial partition of the data and iteratively reallocate data points among clusters until a stopping criterion is met. These methods tend to find clusters of spherical shape. K-Means and K-Medoids are commonly used partitional algorithms, and squared error is the most frequently used criterion function. More recent algorithms in this category include partitioning around medoids (PAM), clustering large applications (CLARA), clustering large applications based on randomized search (CLARANS), and expectation-maximization (EM) (Hu et al., 2008; Ng and Han, 2002).

Density-based clustering algorithms find clusters based on the density of data points in a region, treating clusters as dense regions of objects in the data space. Density-based algorithms include density-based spatial clustering of applications with noise (DBSCAN), ordering points to identify the clustering structure (OPTICS), and density-based clustering (DECODE) (Lai and Nguyen, 2004; Ma and Zhang, 2004; Neill and Moore, 2004; Pei et al., 2009).

Spatial summarization often follows spatial partitioning; it is more difficult than nonspatial summarization because of the nonnumeric nature of spatial data. The objective of spatial summarization varies with the clustering method used. For example, centroids or medoids may be the summaries produced by K-Means or K-Medoids, and K main routes can be identified as a summary of activities on a spatial network by the K-Main-Routes algorithm (Oliver et al., 2014b; Zhou et al., 2014).
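For example, a density-based spatial partitioning with DBSCAN takes only a few lines using scikit-learn; the synthetic events below are illustrative:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(2)
# Two dense event clusters plus uniform background noise (toy data).
pts = np.vstack([rng.normal([2, 2], 0.2, (50, 2)),
                 rng.normal([8, 8], 0.2, (50, 2)),
                 rng.uniform(0, 10, (20, 2))])
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(pts)
# labels >= 0 are cluster ids; -1 marks noise points.
```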

1.19.4.4.3 Spatiotemporal partitioning and summarization approaches

Based on the input data type, common spatiotemporal partitioning and summarization approaches are as follows.

Spatiotemporal event partitioning: Some of the clustering algorithms mentioned above for two-dimensional space can be generalized readily to spatiotemporal settings. For example, ST-DBSCAN (Birant and Kut, 2007) is a spatiotemporal extension of DBSCAN (Ankerst et al., 1999; Ester et al., 1996), and ST-GRID (Wang et al., 2006) splits space and time into three-dimensional cells and merges dense cells into clusters.

Spatial time series partitioning: Spatial time series partitioning aims to divide space into regions such that the correlation or similarity between time series within the same region is maximized. Global partitioning-based clustering algorithms, such as K-Means, K-Medoids, and EM, can be applied, as can hierarchical approaches. However, due to the high dimensionality of spatial time series, density-based approaches are often ineffective. When computing similarity between spatial time series, a filter-and-refine approach (Zhang et al., 2003a) can be used to avoid redundant computation.

Trajectory data partitioning: Trajectory data partitioning aims to group trajectories according to their similarity. Algorithms are of two types: density based and frequency based. Density-based approaches (Lee et al., 2007) first break trajectories into small segments and then apply DBSCAN-like clustering to connect dense areas of segments. Frequency-based approaches (Lee et al., 2009) use association rule mining algorithms (Agrawal et al., 1994) to identify subsections of trajectories with high frequency (also called high "support").

Data summarization aims to find a compact representation of a dataset (Chandola and Kumar, 2007, pp. 355–378) and can be conducted on classical, spatial, and spatiotemporal data. It is important for data compression as well as for making pattern analysis more convenient. For spatial time series data, summarization can be done by removing the spatial and temporal redundancy that results from autocorrelation; a family of such algorithms has been used to summarize traffic data streams (Pan et al., 2010). Similarly, the centroids from K-Means can be used to summarize spatial time series. For trajectory data, especially spatial network trajectories, summarization is more challenging due to the huge cost of similarity computation. A recent approach summarizes network trajectories into K primary corridors (Evans et al., 2012, 2013) and proposes efficient algorithms to reduce the cost of network trajectory distance computation.

1.19.4.5 Spatial and Spatiotemporal Hotspot Detection

The problem of detecting SST hotspots has received considerable attention and has been widely studied and applied. For example, in epidemiology, finding disease hotspots allows officials to detect an epidemic and allocate resources to limit its spread (Kulldorff, 2015).

1.19.4.5.1 Problem definition

Given a set of SST objects (e.g., activity locations and times) in a study area, SST hotspots are regions, together with time intervals, in which the number of objects is anomalously or unexpectedly high. SST hotspots are a special kind of cluster whose interior has significantly higher intensity than its exterior.


The challenge of SST hotspot detection is brought about by the uncertainty of the number, location, size, and shape of hotspots in a dataset. Additionally, “false” hotspots that aggregate events only by chance should often be avoided, since these false hotspots impede proper response by authorities and waste police resources. Thus, it is often important to test the statistical significance of candidate spatial or spatiotemporal hotspots.

1.19.4.5.2 Statistical foundation

Spatial scan statistics (Kulldorff, 1997) are used to detect statistically significant hotspots in spatial datasets. A circle is used to scan the space for candidate hotspots, and hypothesis testing is performed. The null hypothesis states that the activity points are distributed randomly according to a homogeneous (i.e., constant-intensity) Poisson process over the geographic space. The alternative hypothesis states that the inside of the circle has a higher intensity of activities than the outside. A test statistic called the log likelihood ratio is computed for each candidate hotspot (or circle), and the candidate with the highest likelihood ratio can be evaluated using a significance value (i.e., a p-value). Spatiotemporal scan statistics extend spatial scan statistics by considering an additional temporal dimension and changing the scanning window to a cylinder. In addition, LISAs, including local Moran's I, Geary's C, and the Getis–Ord Gi and Gi* functions (Anselin, 1995), are also used to detect hotspots. In contrast to global spatial autocorrelation statistics, these functions are computed within the neighborhood of each location. For example, the local Moran's I statistic is given as

$$I_i = \frac{x_i - \bar{X}}{S_i^2} \sum_{j=1,\, j \neq i}^{n} w_{i,j}\left(x_j - \bar{X}\right), \qquad S_i^2 = \frac{\sum_{j=1,\, j \neq i}^{n} \left(x_j - \bar{X}\right)^2}{n - 1},$$

where x_i is the attribute of object i, X̄ is the mean of the corresponding attribute, w_{i,j} is the spatial weight between objects i and j, and n is the total number of objects. A positive local Moran's I indicates that an object has neighbors with similarly high or low attribute values, so it may be part of a cluster; a negative value indicates that an object has neighbors with dissimilar values, so it may be an outlier. In either case, the p-value for the object must be small enough for the cluster or outlier to be considered statistically significant.

1.19.4.5.3 Spatial hotspot detection approaches

Generally, there are two categories of algorithms for spatial hotspot detection: clustering-based approaches and spatial scan statistics-based approaches.

1.19.4.5.3.1 Clustering-based approaches
Clustering methods can be used to identify candidate areas for further evaluation as spatiotemporal hotspots. These methods include global partitioning-based, density-based, and hierarchical clustering (see section "Spatial and Spatiotemporal Partitioning (Clustering) and Summarization"). They can serve as a preprocessing step to generate candidate hotspot areas, after which statistical tools can be used to test significance. CrimeStat, a software package for the spatial analysis of crime locations, incorporates several clustering methods to determine crime hotspots in a study area, including a K-Means tool, nearest neighbor hierarchical (NNH) clustering (Jain et al., 1999), a risk-adjusted NNH (RANNH) tool, a STAC Hot Spot Area tool (Levine, 2013), and a LISA tool (Anselin, 1995) for evaluating potential hotspot areas. Beyond point process data, hotspot detection from trajectories has also been studied, to detect network paths with a high density (Lee et al., 2007) or frequency (Lee et al., 2009) of trajectories.

1.19.4.5.3.2 Spatial (spatiotemporal) scan statistics-based approaches
The spatial scan statistic extends the original scan statistic, which focuses on one-dimensional point processes, to multidimensional point processes. It also allows the scanning window to vary and the baseline process to be any inhomogeneous Poisson or Bernoulli process (Kulldorff, 1997). As originally proposed, the spatial scan statistic finds circular hotspots in two-dimensional Euclidean space with a statistical significance test. It exhaustively enumerates all circles defined by an activity point as the center and another activity point on the circumference. For each circle, an interest measure, the log likelihood ratio, is computed to indicate how likely the circle is to be statistically significant. The final step evaluates the p-value of each interest measure using Monte Carlo simulation, because the distribution of the log likelihood ratio is not known beforehand. The p-value indicates the confidence that the circle is a true hotspot, eliminating hotspots generated only by chance. The null hypothesis of the Monte Carlo simulation is that the data points follow CSR; in other words, the baseline process is a homogeneous Poisson process. A minimal sketch of this basic circular scan is given at the end of this subsection.
Since the introduction of the spatial scan statistic, significant research has targeted hotspots of different shapes, with each algorithm focusing on reducing the computational complexity so that it remains scalable for large datasets. Neill and Moore (2004) propose an algorithm for rectangular hotspot detection; exploiting properties of the interest measure (the log likelihood ratio) and the rectangular shape, it applies a divide-and-conquer method that speeds up the computation significantly, and a heuristic can accelerate detection even further if some error is tolerated. Because hotspots may not always have solid centers, ring-shaped hotspot detection has been proposed to find hotspots formed by two concentric circles (Eftelioglu et al., 2014).

Fig. 14 Ring-shaped hotspot detection.

Typically, criminals do not commit crimes close to their homes, but they also do not commit crimes too far from home because of the transportation costs (e.g., time and money). For example, as shown in Fig. 14, the detected ring-shaped hotspot has a log likelihood ratio of 284.51 and a p-value of 0.001. In contrast, the most significant result returned by SaTScan is a circle covering the whole ring area, with a log likelihood ratio of 211.15 and a p-value of 0.001. Eftelioglu et al. (2014) claim that by delineating the inner circle of a ring-shaped hotspot, the range to search for the possible source of events (e.g., the home of a serial-crime suspect) can be further narrowed. By defining the scanning window specifically in a spatial network, several network-space hotspot detection algorithms have been proposed. For example, Eftelioglu et al. (2016) introduce an algorithm for ring-shaped hotspot detection in a spatial network that defines the center of a ring-shaped hotspot as the shortest path between two points in the network.

1.19.4.5.4 Spatiotemporal hotspot detection approaches

Spatiotemporal hotspot detection can be seen as a special case of purely spatial hotspot detection obtained by adding time as a third dimension. Two types of spatiotemporal hotspots of particular importance are "persistent" and "emerging" spatiotemporal hotspots. A "persistent" spatiotemporal hotspot is a region where the rate of observations is consistently high over time. Persistent hotspot detection therefore assumes that the risk of a hotspot (i.e., an outbreak) is constant over time, and it searches over space and time by simply totaling the number of observations in each time interval. An "emerging" spatiotemporal hotspot is a region where the rate of observations is monotonically increasing over time (Chang et al., 2005; Tango et al., 2011). This kind of spatiotemporal hotspot occurs when an emerging outbreak causes a sudden increase in the number of observations. Such phenomena can be observed in epidemiology, where the number of disease cases increases suddenly at the start of an outbreak. Tools for detecting emerging spatiotemporal hotspots use spatial scan statistics with a change in expectation over time (Neill et al., 2005). Many of the clustering-based and spatial scan statistics-based methods mentioned above, although originally designed for two-dimensional Euclidean space and mostly used for purely spatial data, can be used to identify spatiotemporal candidate hotspots by treating the temporal part of the data as a third dimension. For example, DBSCAN can cluster SST data using the density of the data as its measure, and SaTScan, which uses a cylindrical window in three dimensions (with time as the third dimension) instead of the circular window used for spatial hotspots in two dimensions (Kulldorff, 2015), can be used as a tool to detect persistent spatiotemporal hotspots.
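As a rough illustration of the "time as a third dimension" idea, the sketch below rescales timestamps so that one temporal neighborhood unit counts as much as one spatial unit and then runs ordinary DBSCAN from scikit-learn in the resulting three-dimensional space. ST-DBSCAN (Birant and Kut, 2007) treats the spatial and temporal thresholds separately, so this rescaling is only an approximation, and all parameter values here are assumptions.

import numpy as np
from sklearn.cluster import DBSCAN

def st_candidate_hotspots(xy, t, eps_space=500.0, eps_time=7.0,
                          min_samples=10):
    """Find candidate spatiotemporal hotspots by treating time as a third
    dimension: timestamps are rescaled so that eps_time temporal units count
    as much as eps_space spatial units, then plain DBSCAN is run in 3D.

    xy: (n, 2) array of coordinates; t: (n,) array of timestamps.
    Returns cluster labels, with -1 marking noise points."""
    coords = np.column_stack([xy, np.asarray(t, float) * (eps_space / eps_time)])
    return DBSCAN(eps=eps_space, min_samples=min_samples).fit_predict(coords)

Clusters returned this way are only candidate hotspots; as discussed above, a statistical significance test would still be needed to rule out clusters generated by chance.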

1.19.4.6 Spatial and Spatiotemporal Change

1.19.4.6.1 Problem definition

Although the single term "change" is used to name the spatial and spatiotemporal change patterns in different applications, the underlying phenomena may differ significantly. This section briefly summarizes the main ways a change may be defined in spatiotemporal data (Zhou et al., 2014):

Change in a Statistical Parameter: In this case, the data are assumed to follow a certain distribution, and change is defined as a shift in this statistical distribution. For example, in statistical quality control, a change in the mean or variance of the sensor readings is used to detect a fault.

Change in Actual Value: Here, change is modeled as the difference between a data value and its spatial or temporal neighborhood. For example, in a one-dimensional continuous function, the magnitude of change can be characterized by the derivative function, while on a two-dimensional surface it can be characterized by the gradient magnitude (a minimal sketch follows this list).


Change in Models Fitted to Data: This type of change is identified when a number of function models are fitted to the data and one or more of the models exhibit a change (e.g., a discontinuity between consecutive linear functions) (Chandola et al., 2010).
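A minimal sketch of the "change in actual value" characterization mentioned in the list above, computing the gradient magnitude of a two-dimensional raster field with NumPy (the function name is illustrative):

import numpy as np

def change_in_actual_value(field):
    """Magnitude of 'change in actual value' on a 2D raster surface,
    characterized by the gradient magnitude at each cell."""
    gy, gx = np.gradient(np.asarray(field, dtype=float))  # per-axis gradients
    return np.hypot(gx, gy)                               # elementwise magnitude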

1.19.4.6.2 Spatial and spatiotemporal change detection approaches

Spatial footprints can be classified as raster or vector footprints. Vector footprints are further classified into four categories: point(s), line(s), polygon(s), and network footprint patterns. Raster footprints are classified based on the scale of the pattern, namely local, focal, or zonal patterns. This classification describes the scale of the change operation of a given phenomenon in the spatial raster field (Worboys and Duckham, 2004). Local patterns are patterns in which change at a given location depends only on attributes at that location. Focal patterns are patterns in which change at a location depends on attributes at that location and in its assumed neighborhood. Zonal patterns define change using an aggregation of location values in a region.

Spatiotemporal Change Patterns with Raster-Based Spatial Footprint: This includes patterns of spatial changes between snapshots. In remote sensing, detecting changes between satellite images can help identify land cover change due to human activity, natural disasters, or climate change (Bujor et al., 2004; Kosugi et al., 2004; Martino et al., 2007). Given two geographically aligned raster images, the aim is to find a collection of pixels that exhibit significant change between the two images (Radke et al., 2005). This pattern is classified as a local change between snapshots, since the change at a given pixel is assumed to be independent of changes at other pixels. Alternative definitions have assumed that a change at a pixel also depends on its neighborhood (Thoma and Bierling, 1989). For example, the pixel values in each block may be assumed to follow a Gaussian distribution (Aach and Kaup, 1995). We refer to this type of change footprint pattern as a focal spatial change between snapshots. Researchers in remote sensing and image processing have also tried to apply image change detection to objects instead of pixels (Chen et al., 2012; Desclée et al., 2006; Im et al., 2008), yielding zonal spatial change patterns between snapshots. A well-known technique for detecting a local change footprint is simple differencing. The technique starts by calculating the differences between corresponding pixel intensities in the two images; a change at a pixel is flagged if the difference exceeds a certain threshold. Alternative approaches have also been proposed to discover focal change footprints between images. For example, the block-based density ratio test detects change based on a group of pixels, known as a block (Aach et al., 1993; Rignot and van Zyl, 1993). Object-based approaches in remote sensing (Im et al., 2008; Im and Jensen, 2005) employ image segmentation techniques to partition temporal snapshots of images into homogeneous objects (Douglas and Peucker, 1973) and then classify object pairs in the two temporal snapshots into no-change or change classes.

Spatiotemporal Change Patterns with Vector-Based Spatial Footprint: This includes the spatiotemporal volume change footprint pattern, which represents a change process occurring in a spatial region (a polygon) during a time interval. For example, a disease outbreak event can be defined as an increase in disease reports in a certain region during a certain time window up to the current time. Change patterns known to have a spatiotemporal volume footprint include the spatiotemporal scan statistic (Kulldorff, 2001; Kulldorff et al., 1998), a generalization of the spatial scan statistic, and the emerging spatiotemporal clusters defined by Neill et al. (2005).
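The simple differencing technique and a focal variant can be sketched in a few lines of Python. The block-based density ratio test cited above applies a statistical likelihood ratio over each block; the mean-threshold rule below is only a crude stand-in for it, and the threshold and block size are assumptions:

import numpy as np

def local_change(img1, img2, threshold):
    """Simple differencing: flag each pixel whose absolute intensity
    difference between two aligned snapshots exceeds a threshold."""
    return np.abs(img2.astype(float) - img1.astype(float)) > threshold

def focal_change(img1, img2, threshold, block=8):
    """A crude focal variant: flag block-by-block groups of pixels whose
    mean absolute difference exceeds the threshold."""
    diff = np.abs(img2.astype(float) - img1.astype(float))
    flags = np.zeros(diff.shape, dtype=bool)
    h, w = diff.shape
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            if diff[i:i + block, j:j + block].mean() > threshold:
                flags[i:i + block, j:j + block] = True
    return flags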

1.19.5 Research Trends and Future Research Needs

Most current research in SST data mining uses Euclidean space, which often assumes an isotropic property and symmetric neighborhoods. However, in many real-world applications the underlying space is a network space, such as a river or road network (Isaak et al., 2014; Oliver et al., 2010; Oliver et al., 2014a). One of the main challenges in SST network data mining is to account for the network structure of the dataset. For example, in anomaly detection, spatial techniques that do not consider the spatial network structure may be unable to model graph properties such as one-way streets, connectivity, and left turns. The network structure often violates the isotropic property and the symmetry of neighborhoods and instead requires asymmetric neighborhoods and directionality in neighborhood relationships (e.g., network flow direction). Recently, cutting-edge research has been conducted in spatial network statistics and data mining (Okabe and Sugihara, 2012). For example, several spatial network statistical methods have been developed, such as the network K function and network spatial autocorrelation. Several spatial analysis methods have also been generalized to network space, including network point cluster analysis and the clumping method, network point density estimation, network spatial interpolation (Kriging), and the network Huff model. Because spatial network space is distinct from Euclidean space, these statistics and analyses often rely on advanced spatial network computational techniques (Okabe and Sugihara, 2012).

We believe more spatiotemporal data mining research is still needed in network space. First, although several spatial statistics and data mining techniques have been generalized to network space, few spatiotemporal network statistics and data mining methods have been developed, and the vast majority of research is still set in Euclidean space. Future research is needed to develop more spatial network statistics, such as spatial network scan statistics, a spatial network random field model, and spatiotemporal autoregressive models for networks. Furthermore, phenomena observed on spatiotemporal networks need to be interpreted in an appropriate frame of reference to prevent a mismatch between the nature of the observed phenomena and the mining algorithm. For instance, moving objects on a spatiotemporal network need to be studied from a traveler's perspective, that is, the Lagrangian frame of reference (Gunturi et al., 2015; Gunturi and Shekhar, 2014), instead of a snapshot view. This is because a traveler moving along a chosen path in a spatiotemporal network experiences a road segment (and its properties, such as fuel efficiency and travel time) at the time s/he arrives at that segment, which may be distinct from the departure time at the start of the journey.
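A one-function sketch of the Lagrangian frame of reference described above. The helper edge_travel_time is hypothetical, standing in for a lookup into a time-dependent edge-cost table; the point is simply that each segment is evaluated at the traveler's arrival time rather than at the departure time:

def lagrangian_arrival_time(path, depart, edge_travel_time):
    """Arrival time along a path in a time-dependent network, evaluated in
    the Lagrangian frame of reference: each segment's travel time is looked
    up at the moment the traveler actually reaches it, not at departure.

    path: ordered sequence of edge identifiers.
    edge_travel_time(edge, t): travel time of `edge` when entered at time t.
    """
    t = depart
    for edge in path:
        t = t + edge_travel_time(edge, t)  # cost depends on arrival time at edge
    return t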


These unique requirements (nonisotropy and the Lagrangian frame of reference) call for novel spatiotemporal statistical foundations (Isaak et al., 2014) as well as new computational approaches for spatiotemporal network data mining.

Another future research need is to develop spatiotemporal graph (STG) big data platforms. Current relevant big data platforms for SST data mining include ESRI GIS Tools for Hadoop (ESRI, 2013), Hadoop GIS (Aji et al., 2013), and so on. These provide distributed systems for geometric data (e.g., lines, points, and polygons), including geometric indexing and partitioning methods such as the R-tree, R+-tree, and quadtree. Recently, SpatialHadoop has been developed (Eldawy and Mokbel, 2015); it embeds geometric notions in its language, visualization, storage, MapReduce, and operations layers. However, STGs violate a core assumption of current spatial big data platforms, namely that geometric concepts are adequate for conveniently representing STG analytics operations and for partitioning data for load balancing. STGs also violate a core assumption underlying graph analytics software (e.g., Giraph (Avery, 2011), GraphLab (Low et al., 2014), and Pregel (Malewicz et al., 2010)), namely that traditional location-unaware graphs are adequate for the same purposes. Therefore, novel STG big data platforms are needed. Several challenges must be addressed: for example, STG big data requires novel distributed file systems (DFS) to partition the graph, and a novel programming model is still needed to support abstract data types and fundamental STG operations.

1.19.6 Conclusions

SST data mining has broad application domains including ecology and environmental management, public safety, transportation, earth science, epidemiology, and climatology. This article provides an overview of recent advances in SST data mining. It reviews common SST data mining techniques organized by major SST pattern families: SST anomaly, SST association and tele-coupling, SST prediction, SST partitioning and summarization, SST hotspots, and SST change detection. New trends and research needs are also summarized, including SST data mining in the network space and SST big data platform development.

References

Aach, T., Kaup, A., 1995. Bayesian algorithms for adaptive change detection in image sequences using Markov random fields. Signal Processing: Image Communication 7 (2), 147–160.
Aach, T., Kaup, A., Mester, R., 1993. Statistical model-based change detection in moving video. Signal Processing 31 (2), 165–180.
Aggarwal CC (2013) Outlier analysis. Berlin: Springer. http://www.springer.com/us/book/9781461463955 (accessed 28 November 2016).
Agrawal R, Imielinski T, and Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, pp. 207–216. New York: ACM.
Agrawal R and Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, Vol. 1215, pp. 487–499.
Aji, A., Wang, F., Vo, H., Lee, R., Liu, Q., Zhang, X., Saltz, J., 2013. Hadoop GIS: A high performance spatial data warehousing system over MapReduce. Proceedings of the VLDB Endowment 6 (11), 1009–1020.
Albert PS and McShane LM (1995) A generalized estimating equations approach for spatially correlated binary data: Applications to the analysis of neuroimaging data. Biometrics, 627–638.
Allen, J.F., 1984. Towards a general theory of action and time. Artificial Intelligence 23 (2), 123–154.
Almeida CM, Souza IM, Alves CD, Pinho CMD, Pereira MN, and Feitosa RQ (2007) Multilevel object-oriented classification of quickbird images for urban population estimates. In: Proceedings of the 15th Annual ACM International Symposium on Advances in Geographic Information Systems, pp. 12:1–12:8. New York: ACM.
Ankerst M, Breunig MM, Kriegel H-P, and Sander J (1999) OPTICS: Ordering points to identify the clustering structure. In: Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, pp. 49–60. New York: ACM.
Anselin, L., 1994. Exploratory spatial data analysis and geographic information systems. New Tools for Spatial Analysis 17, 45–54.
Anselin, L., 1995. Local indicators of spatial association – LISA. Geographical Analysis 27 (2), 93–115.
Anselin L (2013) Spatial econometrics: Methods and models. Springer Science & Business Media.
Avery C (2011) Giraph: Large-scale graph processing infrastructure on Hadoop. In: Proceedings of the Hadoop Summit, Santa Clara, 11.
Banerjee S, Carlin BP, and Gelfand AE (2014) Hierarchical modeling and analysis for spatial data. CRC Press. https://books.google.com/books?id=WVHRBQAAQBAJ (accessed 28 November 2016).
Barnett V and Lewis T (1994) Outliers in statistical data. Wiley.
Barua, S., Sander, J., 2014. Mining statistically significant co-location and segregation patterns. IEEE Transactions on Knowledge and Data Engineering 26 (5), 1185–1199.
Birant, D., Kut, A., 2007. ST-DBSCAN: An algorithm for clustering spatial–temporal data. Data & Knowledge Engineering 60 (1), 208–221.
Bu Y, Chen L, Fu AW-C, and Liu D (2009) Efficient anomaly monitoring over moving object trajectory streams. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 159–168. New York: ACM.
Bujor, F., Trouve, E., Valet, L., Nicolas, J.M., Rudant, J.P., 2004. Application of log-cumulants to the detection of spatiotemporal discontinuities in multitemporal SAR images. IEEE Transactions on Geoscience and Remote Sensing 42 (10), 2073–2084.
Campelo CEC and Bennett B (2013) Representing and reasoning about changing spatial extensions of geographic features. In: Tenbrink, T., Stell, J., Galton, A. and Wood, Z. (eds.) Spatial information theory, pp. 33–52. Springer International Publishing.
Cao H, Mamoulis N, and Cheung DW (2005) Mining frequent spatio-temporal sequential patterns. In: Fifth IEEE International Conference on Data Mining (ICDM'05), p. 8.
Celik, M., Shekhar, S., Rogers, J.P., Shine, J.A., 2008. Mixed-drove spatiotemporal co-occurrence pattern mining. IEEE Transactions on Knowledge and Data Engineering 20 (10), 1322–1335.
Celik M, Shekhar S, Rogers JP, Shine JA, and Yoo JS (2006) Mixed-drove spatio-temporal co-occurrence pattern mining: A summary of results. In: Sixth International Conference on Data Mining (ICDM'06), pp. 119–128.
Cervone G, Franzese P, Ezber Y, and Boybeyi Z (2008) Risk assessment of atmospheric hazard releases using K-means clustering. In: 2008 IEEE International Conference on Data Mining Workshops, pp. 342–348.


Chandola, V., Banerjee, A., Kumar, V., 2009. Anomaly detection: A survey. ACM Computing Surveys 41 (3), 15:1–15:58.
Chandola V, Hui D, Gu L, Bhaduri B, and Vatsavai RR (2010) Using time series segmentation for deriving vegetation phenology indices from MODIS NDVI data. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 202–208.
Chandola, V., Kumar, V., 2007. Summarization – Compressing data into an informative representation. Knowledge and Information Systems 12 (3), 355–378.
Chang W, Zeng D, and Chen H (2005) Prospective spatio-temporal data analysis for security informatics. In: Proceedings of the 2005 IEEE Intelligent Transportation Systems Conference, pp. 1120–1124.
Chen, C., Zhang, D., Castro, P.S., Li, N., Sun, L., Li, S., 2011. Real-time detection of anomalous taxi trajectories from GPS traces. In: Puiatti, A., Gu, T. (Eds.), Mobile and ubiquitous systems: Computing, networking, and services. Springer, Berlin and Heidelberg, pp. 63–74.
Chen, D., Lu, C.-T., Kou, Y., Chen, F., 2008. On detecting spatial outliers. Geoinformatica 12 (4), 455–475.
Chen, G., Hay, G.J., Carvalho, L.M.T., Wulder, M.A., 2012. Object-based change detection. International Journal of Remote Sensing 33 (14), 4434–4457.
Cheng, D.T., Haworth, J., Anbaroglu, B., Tanaksaranond, G., Wang, J., 2014. Spatiotemporal data mining. In: Fischer, M.M., Nijkamp, P. (Eds.), Handbook of regional science. Springer, Berlin and Heidelberg, pp. 1173–1193.
Chou Y-H (1997) Exploring spatial analysis in geographic information systems.
Cressie N (2015) Statistics for spatial data. Wiley.
Cressie N and Wikle CK (2015) Statistics for spatio-temporal data. Wiley. https://books.google.com/books?id=4L_dCgAAQBAJ (accessed 27 October 2016).
Desclée, B., Bogaert, P., Defourny, P., 2006. Forest change detection by statistical object-based method. Remote Sensing of Environment 102 (1–2), 1–11.
Ding, W., Eick, C.F., Yuan, X., Wang, J., Nicot, J.-P., 2011. A framework for regional association rule mining and scoping in spatial datasets. GeoInformatica 15 (1), 1–28.
Douglas, D.H., Peucker, T.K., 1973. Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica: The International Journal for Geographic Information and Geovisualization 10 (2), 112–122.
Eck J, Chainey S, Cameron J, and Wilson R (2005) Mapping crime: Understanding hotspots. http://discovery.ucl.ac.uk/11291/1/11291.pdf (accessed 28 November 2016).
Eftelioglu E, Li Y, Tang X, Shekhar S, Kang JM, and Farah C (2016) Mining network hotspots with holes: A summary of results. In: Miller, J.A., O'Sullivan, D. and Wiegand, N. (eds.) Geographic information science, pp. 51–67. Springer International Publishing.
Eftelioglu E, Shekhar S, Oliver D, Zhou X, Evans MR, Xie Y, Kang JM, et al. (2014) Ring-shaped hotspot detection: A summary of results. In: 2014 IEEE International Conference on Data Mining, pp. 815–820.
Eldawy A and Mokbel MF (2015) SpatialHadoop: A MapReduce framework for spatial data. In: 2015 IEEE 31st International Conference on Data Engineering, pp. 1352–1363.
Elfeky MG, Aref WG, and Elmagarmid AK (2006) Stagger: Periodicity mining of data streams using expanding sliding windows. In: Sixth International Conference on Data Mining (ICDM'06), pp. 188–199. IEEE.
Elliot P, Wakefield JC, Best NG, Briggs DJ, et al. (2000) Spatial epidemiology: Methods and applications. Oxford University Press. http://www.cabdirect.org/abstracts/20023007010.html (accessed 28 November 2016).
Erwig, M., Schneider, M., 2002. Spatio-temporal predicates. IEEE Transactions on Knowledge and Data Engineering 14 (4), 881–901.
ESRI (2013) Breathe life into big data. http://www.esri.com/esri-news/arcnews/summer13articles/breathe-life-into-big-data (accessed 29 November 2016).
Ester, M., Kriegel, H.-P., Sander, J., 1997. Spatial data mining: A database approach. In: Advances in spatial databases. Springer, Berlin and Heidelberg, pp. 47–66.
Ester, M., Kriegel, H.-P., Sander, J., Xu, X., et al., 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD'96, pp. 226–231.
Evans MR, Oliver D, Shekhar S, and Harvey F (2012) Summarizing trajectories into K-primary corridors: A summary of results. In: Proceedings of the 20th International Conference on Advances in Geographic Information Systems, pp. 454–457. New York: ACM.
Evans MR, Oliver D, Shekhar S, and Harvey F (2013) Fast and exact network trajectory similarity computation: A case-study on bicycle corridor planning. In: Proceedings of the 2nd ACM SIGKDD International Workshop on Urban Computing, pp. 9:1–9:8. New York: ACM.
Fotheringham AS, Brunsdon C, and Charlton M (2003) Geographically weighted regression: The analysis of spatially varying relationships. Wiley.
Franke C and Gertz M (2008) Detection and exploration of outlier regions in sensor data streams. In: 2008 IEEE International Conference on Data Mining Workshops, pp. 375–384. IEEE.
Frelich, L.E., Reich, P.B., 2010. Will environmental changes reinforce the impact of global warming on the prairie–forest border of central North America? Frontiers in Ecology and the Environment 8 (7), 371–378.
Ge Y, Xiong H, Liu C, and Zhou ZH (2011) A taxi driving fraud detection system. In: 2011 IEEE 11th International Conference on Data Mining, pp. 181–190.
Gelfand AE, Diggle P, Guttorp P, and Fuentes M (2010) Handbook of spatial statistics. CRC Press.
George, B., Shekhar, S., 2008. Time-aggregated graphs for modeling spatio-temporal networks. In: Spaccapietra, S., Pan, J.Z., Thiran, P., Halpin, T., Staab, S., Svatek, V., Shvaiko, P., et al. (Eds.), Journal on data semantics XI. Springer, Berlin and Heidelberg, pp. 191–212.
Gething, P.W., Atkinson, P.M., Noor, A.M., Gikandi, P.W., Hay, S.I., Nixon, M.S., 2007. A local space–time kriging approach applied to a national outpatient malaria data set. Computers & Geosciences 33 (10), 1337–1350.
Gunturi VMV and Shekhar S (2014) Lagrangian Xgraphs: A logical data-model for spatio-temporal network data: A summary. In: Advances in conceptual modeling, pp. 201–211. Springer International Publishing.
Gunturi, V.M.V., Shekhar, S., Yang, K., 2015. A critical-time-point approach to all-departure-time Lagrangian shortest paths. IEEE Transactions on Knowledge and Data Engineering 27 (10), 2591–2603.
Güting RH and Schneider M (2005) Moving objects databases. Elsevier.
Guyon X (1995) Random fields on a network: Modeling, statistics, and applications. Springer Science & Business Media. https://books.google.com/books?id=EYO0MNrIT8YC (accessed 28 November 2016).
Habiba H, Tantipathananandh C, and Berger-Wolf T (2007) Betweenness centrality measure in dynamic networks. Chicago: Department of Computer Science, University of Illinois at Chicago. http://dimacs.rutgers.edu/TechnicalReports/abstracts/2007/2007-19.html (accessed 20 October 2016).
Haining R (1993) Spatial data analysis in the social and environmental sciences. Cambridge University Press. https://books.google.com/books?id=FFIsxD1rdrIC (accessed 28 November 2016).
Haslett, J., Bradley, R., Craig, P., Unwin, A., Wills, G., 1991. Dynamic graphics for exploring spatial data with application to locating global and local anomalies. The American Statistician 45 (3), 234–242.
Hixson M, Scholz D, Fuhs N, and Akiyama T (1980) Evaluation of several schemes for classification of remotely sensed data. Photogrammetric Engineering and Remote Sensing. https://ntrs.nasa.gov/search.jsp?R=19810031937 (accessed 28 November 2016).
Hu, T., Xiong, H., Gong, X., Sung, S.Y., 2008. ANEMI: An adaptive neighborhood expectation-maximization algorithm with spatial augmented initialization. In: Advances in knowledge discovery and data mining. Springer, Berlin and Heidelberg, pp. 160–171.
Huang, Y., Pei, J., Xiong, H., 2006a. Mining co-location patterns with rare events from spatial data sets. GeoInformatica 10 (3), 239–260.
Huang, Y., Shekhar, S., Xiong, H., 2004. Discovering colocation patterns from spatial data sets: A general approach. IEEE Transactions on Knowledge and Data Engineering 16 (12), 1472–1485.


Huang Y, Zhang L, and Zhang P (2006) Finding sequential patterns from massive number of spatio-temporal events. In: Proceedings of the 2006 SIAM International Conference on Data Mining, pp. 634–638. Society for Industrial and Applied Mathematics.
Huang, Y., Zhang, L., Zhang, P., 2008. A framework for mining sequential patterns from spatio-temporal event data sets. IEEE Transactions on Knowledge and Data Engineering 20 (4), 433–448.
Im, J., Jensen, J.R., 2005. A change detection model based on neighborhood correlation image analysis and decision tree classification. Remote Sensing of Environment 99 (3), 326–340.
Im, J., Jensen, J.R., Tullis, J.A., 2008. Object-based change detection using correlation image analysis and image segmentation. International Journal of Remote Sensing 29 (2), 399–423.
Isaak, D.J., Peterson, E.E., Ver Hoef, J.M., Wenger, S.J., Falke, J.A., Torgersen, C.E., Sowder, C., et al., 2014. Applications of spatial statistical network models to stream data. Wiley Interdisciplinary Reviews: Water 1 (3), 277–294.
Isaaks EH and Srivastava RM (1989) Applied geostatistics. Oxford University Press.
Jain AK and Dubes RC (1988) Algorithms for clustering data. Prentice-Hall. http://dl.acm.org/citation.cfm?id=SERIES10022.42779 (accessed 28 November 2016).
Jain, A.K., Murty, M.N., Flynn, P.J., 1999. Data clustering: A review. ACM Computing Surveys 31 (3), 264–323.
Jhung, Y., Swain, P.H., 1996. Bayesian contextual classification based on modified M-estimates and Markov random fields. IEEE Transactions on Geoscience and Remote Sensing 34 (1), 67–75.
Jiang Z, Shekhar S, Zhou X, Knight J, and Corcoran J (2013) Focal-test-based spatial decision tree learning: A summary of results. In: 2013 IEEE 13th International Conference on Data Mining, pp. 320–329.
Jiang, Z., Shekhar, S., Zhou, X., Knight, J., Corcoran, J., 2015. Focal-test-based spatial decision tree learning. IEEE Transactions on Knowledge and Data Engineering 27 (6), 1547–1559.
Kang, J.M., Shekhar, S., Henjum, M., Novak, P.J., Arnold, W.A., 2009. Discovering teleconnected flow anomalies: A relationship analysis of dynamic neighborhoods (RAD) approach. In: Mamoulis, N., Seidl, T., Pedersen, T.B., Torp, K., Assent, I. (Eds.), Advances in spatial and temporal databases. Springer, Berlin and Heidelberg, pp. 44–61.
Kang JM, Shekhar S, Wennen C, and Novak P (2008) Discovering flow anomalies: A SWEET approach. In: 2008 Eighth IEEE International Conference on Data Mining, pp. 851–856.
Karypis, G., Han, E.-H., Kumar, V., 1999. Chameleon: Hierarchical clustering using dynamic modeling. Computer 32 (8), 68–75.
Kawale J, Chatterjee S, Ormsby D, Steinhaeuser K, Liess S, and Kumar V (2012) Testing the significance of spatio-temporal teleconnection patterns. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 642–650. New York: ACM.
Kisilevich S, Mansmann F, Nanni M, and Rinzivillo S (2009) Spatio-temporal clustering. Springer. http://link.springer.com/10.1007/978-0-387-09823-4_44 (accessed 28 November 2016).
Koperski K, Adhikary J, and Han J (1996) Spatial data mining: Progress and challenges survey paper. In: Proceedings of the ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, Montreal, Canada, pp. 1–10. Citeseer.
Koperski K and Han J (1995) Discovery of spatial association rules in geographic information databases. In: International Symposium on Spatial Databases, pp. 47–66. Springer.
Kosugi, Y., Sakamoto, M., Fukunishi, M., Lu, W., Doihara, T., Kakumoto, S., 2004. Urban change detection related to earthquakes using an adaptive nonlinear mapping of high-resolution images. IEEE Geoscience and Remote Sensing Letters 1 (3), 152–156.
Kou Y, Lu C-T, and Chen D (2006) Spatial weighted outlier detection. In: Proceedings of the SIAM International Conference on Data Mining (SDM'06), pp. 614–618.
Koubarakis M, Sellis T, Frank AU, Grumbach S, Güting RH, Jensen CS, Lorentzos N, et al. (2003) Spatio-temporal databases: The CHOROCHRONOS approach. Springer.
Krugman PR (1997) Development, geography, and economic theory, Vol. 6. MIT Press. https://books.google.com/books?id=Pm_oAg_1UxIC (accessed 28 November 2016).
Kulldorff, M., 1997. A spatial scan statistic. Communications in Statistics – Theory and Methods 26 (6), 1481–1496.
Kulldorff, M., 2001. Prospective time periodic geographical disease surveillance using a scan statistic. Journal of the Royal Statistical Society: Series A (Statistics in Society) 164 (1), 61–72.
Kulldorff M (2015) SaTScan user guide. http://www.satscan.org (accessed 29 November 2016).
Kulldorff, M., Athas, W.F., Feurer, E.J., Miller, B.A., Key, C.R., 1998. Evaluating cluster alarms: A space-time scan statistic and brain cancer in Los Alamos, New Mexico. American Journal of Public Health 88 (9), 1377–1380.
Kyriakidis, P.C., Journel, A.G., 1999. Geostatistical space–time models: A review. Mathematical Geology 31 (6), 651–684.
Lai C and Nguyen NT (2004) Predicting density-based spatial clusters over time. In: Fourth IEEE International Conference on Data Mining (ICDM'04), pp. 443–446.
Lang L (1999) Transportation GIS, Vol. 118. ESRI Press, Redlands.
Laube, P., Imfeld, S., 2002. Analyzing relative motion within groups of trackable moving point objects. In: Egenhofer, M.J., Mark, D.M. (Eds.), Geographic information science. Springer, Berlin and Heidelberg, pp. 132–144.
Lee, A.J.T., Chen, Y.-A., Ip, W.-C., 2009. Mining frequent trajectory patterns in spatial–temporal databases. Information Sciences 179 (13), 2218–2231.
Lee J-G, Han J, and Whang K-Y (2007) Trajectory clustering: A partition-and-group framework. In: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, pp. 593–604. New York: ACM.
Leipnik MR and Albert DP (2003) GIS in law enforcement: Implementation issues and case studies. CRC Press. https://books.google.com/books?id=EyyDRiVZKRcC (accessed 28 November 2016).
Levine N (2013) CrimeStat IV: A spatial statistics program for the analysis of crime incident locations. Forensic Magazine, 15 August. https://www.forensicmag.com/news/2013/08/crimestat-iv-spatial-statistics-program-analysis-crime-incident (accessed 28 November 2016).
Li SZ (2009) Markov random field modeling in image analysis. Springer Science & Business Media.
Li, X., Claramunt, C., 2006. A spatial entropy-based decision tree for classification of geographical information. Transactions in GIS 10 (3), 451–467.
Li, X., Han, J., Kim, S., 2006. Motion-alert: Automatic anomaly detection in massive moving objects. In: Mehrotra, S., Zeng, D.D., Chen, H., Thuraisingham, B., Wang, F.-Y. (Eds.), Intelligence and security informatics. Springer, Berlin and Heidelberg, pp. 166–177.
Li Y, Bailey J, Kulik L, and Pei J (2013) Mining probabilistic frequent spatio-temporal sequential patterns with gap constraints from uncertain databases. In: 2013 IEEE 13th International Conference on Data Mining, pp. 448–457.
Li Z, Chen J, and Baltsavias E (2008) Advances in photogrammetry, remote sensing and spatial information sciences: 2008 ISPRS congress book. CRC Press.
Little B, Schucking M, Gartrell B, Chen B, Ross K, and McKellip R (2008) High granularity remote sensing and crop production over space and time: NDVI over the growing season and prediction of cotton yields at the farm field level in Texas. In: 2008 IEEE International Conference on Data Mining Workshops, pp. 426–435.
Liu C, Xiong H, Ge Y, Geng W, and Perkins M (2012) A stochastic model for context-aware anomaly detection in indoor location traces. In: 2012 IEEE 12th International Conference on Data Mining, pp. 449–458.
Liu, X., Chen, F., Lu, C.-T., 2014. On detecting spatial categorical outliers. GeoInformatica 18 (3), 501–536.


Low Y, Gonzalez JE, Kyrola A, Bickson D, Guestrin CE, and Hellerstein J (2014) GraphLab: A new framework for parallel machine learning. arXiv:1408.2041 [cs]. http://arxiv.org/abs/1408.2041 (accessed 29 November 2016).
Lu C-T, Chen D, and Kou Y (2003a) Detecting spatial outliers with multiple attributes. In: 15th IEEE International Conference on Tools with Artificial Intelligence, pp. 122–128.
Lu C-T, Chen D, and Kou Y (2003b) Algorithms for spatial outlier detection. In: Third IEEE International Conference on Data Mining (ICDM 2003), pp. 597–600. IEEE.
Ma D and Zhang A (2004) An adaptive density-based clustering algorithm for spatial database with noise. In: Fourth IEEE International Conference on Data Mining (ICDM'04), pp. 467–470.
Mahoney, J.R., Asrar, G., Leinen, M.S., Andrews, J., Glackin, M., Groat, C., Hobenstein, W., et al., 2003. Strategic plan for the US climate change science program. Climate Change Science Program Office, Washington, DC.
Malewicz G, Austern MH, Bik AJ, Dehnert JC, Horn I, Leiser N, and Czajkowski G (2010) Pregel: A system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pp. 135–146. New York: ACM.
Marcon E and Puech F (2009) Generalizing Ripley's K function to inhomogeneous populations. 1 April. https://halshs.archives-ouvertes.fr/halshs-00372631/document (accessed 28 November 2016).
Martino, G.D., Iodice, A., Riccio, D., Ruello, G., 2007. A novel approach for disaster monitoring: Fractal models and tools. IEEE Transactions on Geoscience and Remote Sensing 45 (6), 1559–1570.
McGuire, M.P., Janeja, V.P., Gangopadhyay, A., 2014. Mining trajectories of moving dynamic spatio-temporal regions in sensor datasets. Data Mining and Knowledge Discovery 28 (4), 961–1003.
Mennis, J., Viger, R., Tomlin, C.D., 2005. Cubic map algebra functions for spatio-temporal analysis. Cartography and Geographic Information Science 32 (1), 17–32.
Miller, H.J., Han, J., 2009. Geographic data mining and knowledge discovery. CRC Press, New York.
Mohan, P., Shekhar, S., Shine, J.A., Rogers, J.P., 2012. Cascading spatio-temporal pattern discovery. IEEE Transactions on Knowledge and Data Engineering 24 (11), 1977–1992.
Morens, D.M., Folkers, G.K., Fauci, A.S., 2004. The challenge of emerging and re-emerging infectious diseases. Nature 430 (6996), 242–249.
Morimoto, Y., 2001. Mining frequent neighboring class sets in spatial databases. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 353–358. ACM.
Neill DB and Moore AW (2004) Rapid detection of significant spatial clusters. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 256–265. New York: ACM.
Neill DB, Moore AW, Sabhnani M, and Daniel K (2005) Detection of emerging space-time clusters. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp. 218–227. New York: ACM.
Ng, R.T., Han, J., 2002. CLARANS: A method for clustering objects for spatial data mining. IEEE Transactions on Knowledge and Data Engineering 14 (5), 1003–1016.
Okabe A and Sugihara K (2012) Spatial analysis along networks: Statistical and computational methods. Wiley. https://books.google.com/books?id=RdjpFnAoIvMC (accessed 28 November 2016).
Oliver D, Bannur A, Kang JM, Shekhar S, and Bousselaire R (2010) A K-main routes approach to spatial network activity summarization: A summary of results. In: 2010 IEEE International Conference on Data Mining Workshops, pp. 265–272.
Oliver, D., Shekhar, S., Kang, J.M., Laubscher, R., Carlan, V., Bannur, A., 2014a. A K-main routes approach to spatial network activity summarization. IEEE Transactions on Knowledge and Data Engineering 26 (6), 1464–1478.
Oliver D, Shekhar S, Zhou X, Eftelioglu E, Evans MR, Zhuang Q, Kang JM, et al. (2014b) Significant route discovery: A summary of results. In: International Conference on Geographic Information Science, pp. 284–300. Springer International Publishing.
Openshaw S (1984) The modifiable areal unit problem. Geo Abstracts, University of East Anglia.
Pan B, Demiryurek U, Banaei-Kashani F, and Shahabi C (2010) Spatiotemporal summarization of traffic data streams. In: Proceedings of the ACM SIGSPATIAL International Workshop on GeoStreaming, pp. 4–10. New York: ACM.
Pei T, Jasra A, Hand DJ, Zhu A-X, and Zhou C (2009) DECODE: A new method for discovering clusters of different densities in spatial data. Data Mining and Knowledge Discovery 18 (3), 337.
Pillai KG, Angryk RA, and Aydin B (2013) A filter-and-refine approach to mine spatiotemporal co-occurrences. In: Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 104–113. New York: ACM.
Quinlan JR (1993) C4.5: Programs for machine learning. Morgan Kaufmann.
Radke, R.J., Andra, S., Al-Kofahi, O., Roysam, B., 2005. Image change detection algorithms: A systematic survey. IEEE Transactions on Image Processing 14 (3), 294–307.
Rignot, E.J.M., van Zyl, J.J., 1993. Change detection techniques for ERS-1 SAR data. IEEE Transactions on Geoscience and Remote Sensing 31 (4), 896–906.
Ripley BD (1976) The second-order analysis of stationary point processes. Journal of Applied Probability, 255–266.
Roddick, J.F., Spiliopoulou, M., 1999. A bibliography of temporal, spatial and spatio-temporal data mining research. ACM SIGKDD Explorations Newsletter 1 (1), 34–38.
Scally R (2006) GIS for environmental management. ESRI Press. http://agris.fao.org/agris-search/search.do?recordID=US201300114558 (accessed 28 November 2016).
Schubert, E., Zimek, A., Kriegel, H.-P., 2014. Local outlier detection reconsidered: A generalized view on locality with applications to spatial, video, and network outlier detection. Data Mining and Knowledge Discovery 28 (1), 190–237.
Shekhar, S., Chawla, S., 2003. Spatial databases: A tour, 1. Prentice Hall, Upper Saddle River, NJ.
Shekhar, S., Evans, M.R., Kang, J.M., Mohan, P., 2011. Identifying patterns in spatial information: A survey of methods. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 1 (3), 193–214.
Shekhar, S., Jiang, Z., Ali, R.Y., Eftelioglu, E., Tang, X., Gunturi, V.M.V., Zhou, X., 2015. Spatiotemporal data mining: A computational perspective. ISPRS International Journal of Geo-Information 4 (4), 2306–2338.
Shekhar, S., Lu, C.-T., Zhang, P., 2003. A unified approach to detecting spatial outliers. GeoInformatica 7 (2), 139–166.
Shekhar S and Xiong H (2007) Encyclopedia of GIS. Springer Science & Business Media.
Shekhar, S., Yang, T.A., Hancock, P.A., 1993. An intelligent vehicle highway information management system. Computer-Aided Civil and Infrastructure Engineering 8 (3), 175–198.
Shekhar S, Zhang P, Huang Y, and Vatsavai RR (2003) Trends in spatial data mining. In: Data mining: Next generation challenges and future directions, pp. 357–380.
Solberg, A.H.S., Taxt, T., Jain, A.K., 1996. A Markov random field model for classification of multisource satellite imagery. IEEE Transactions on Geoscience and Remote Sensing 34 (1), 100–113.
Stojanova, D., Ceci, M., Appice, A., Malerba, D., Dzeroski, S., 2013. Dealing with spatial autocorrelation when learning predictive clustering trees. Ecological Informatics 13, 22–39.
Strahler, A.H., 1980. The use of prior probabilities in maximum likelihood classification of remotely sensed data. Remote Sensing of Environment 10 (2), 135–163.
Tango, T., Takahashi, K., Kohriyama, K., 2011. A space–time scan statistic for detecting emerging outbreaks. Biometrics 67 (1), 106–115.
Thoma, R., Bierling, M., 1989. Motion compensating interpolation considering covered and uncovered background. Signal Processing: Image Communication 1 (2), 191–212.
Tobler, W.R., 1970. A computer movie simulating urban growth in the Detroit region. Economic Geography 46 (Suppl 1), 234–240.
Tobler WR (1979) Cellular geography. In: Gale, S. and Olsson, G. (Eds.) Philosophy in geography, pp. 379–386. Netherlands: Springer.
Verhein F (2009) Mining complex spatio-temporal sequence patterns. In: Proceedings of the 2009 SIAM International Conference on Data Mining, pp. 605–616. Society for Industrial and Applied Mathematics.
Wang, M., Wang, A., Li, A., 2006. Mining spatial-temporal clusters from geo-databases. In: Advanced data mining and applications. Springer, Berlin and Heidelberg, pp. 263–270.


Wang, S., Huang, Y., Wang, X.S., 2013. Regional co-locations of arbitrary shapes. In: Advances in spatial and temporal databases. Springer, Berlin and Heidelberg, pp. 19–37.
Warrender, C.E., Augusteijn, M.F., 1999. Fusion of image classifications using Bayesian techniques with Markov random fields. International Journal of Remote Sensing 20 (10), 1987–2002.
Worboys, M., 2005. Event-oriented approaches to geographic phenomena. International Journal of Geographical Information Science 19 (1), 1–28.
Worboys MF and Duckham M (2004) GIS: A computing perspective (2nd edn.). CRC Press.
Wu, M., Jermaine, C., Ranka, S., Song, X., Gums, J., 2010. A model-agnostic framework for fast spatial anomaly detection. ACM Transactions on Knowledge Discovery from Data 4 (4), 20:1–20:30.
Xiong H, Shekhar S, Huang Y, Kumar V, Ma X, and Yoc J (2004) A framework for discovering co-location patterns in data sets with extended spatial objects. In: Proceedings of the 2004 SIAM International Conference on Data Mining, pp. 78–89. Society for Industrial and Applied Mathematics.
Yang, K., Evans, M.R., Gunturi, V.M.V., Kang, J.M., Shekhar, S., 2014. Lagrangian approaches to storage of spatio-temporal network datasets. IEEE Transactions on Knowledge and Data Engineering 26 (9), 2222–2236.
Yoo, J.S., Shekhar, S., 2006. A joinless approach for mining spatial colocation patterns. IEEE Transactions on Knowledge and Data Engineering 18 (10), 1323–1337.
Yuan M (1996) Temporal GIS and spatio-temporal modeling. In: Proceedings of the Third International Conference/Workshop on Integrating GIS and Environmental Modeling, Santa Fe, NM. http://loi.sscc.ru/gis/data_model/may.html (accessed 20 October 2016).
Yuan, M., 1999. Use of a three-domain representation to enhance GIS support for complex spatiotemporal queries. Transactions in GIS 3 (2), 137–159.
Zhang D, Li N, Zhou Z-H, Chen C, Sun L, and Li S (2011) iBAT: Detecting anomalous taxi trajectories from GPS traces. In: Proceedings of the 13th International Conference on Ubiquitous Computing, pp. 99–108. New York: ACM.
Zhang, P., Huang, Y., Shekhar, S., Kumar, V., 2003. Correlation analysis of spatial time series datasets: A filter-and-refine approach. In: Whang, K.-Y., Jeon, J., Shim, K., Srivastava, J. (Eds.), Advances in knowledge discovery and data mining. Springer, Berlin and Heidelberg, pp. 532–544.
Zhang, P., Huang, Y., Shekhar, S., Kumar, V., 2003. Exploiting spatial autocorrelation to efficiently process correlation-based similarity queries. In: Advances in spatial and temporal databases. Springer, Berlin and Heidelberg, pp. 449–468.
Zhang T, Ramakrishnan R, and Livny M (1996) BIRCH: An efficient data clustering method for very large databases. In: Jagadish, H.V. and Mumick, I.S. (eds.) Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Quebec, Canada, June 4–6, 1996, pp. 103–114. ACM Press.
Zhou, X., Shekhar, S., Ali, R.Y., 2014. Spatiotemporal change footprint pattern discovery: An inter-disciplinary survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4 (1), 1–23.

1.20 Space-Time GIS and Its Evolution

Atsushi Nara, San Diego State University, San Diego, CA, United States © 2018 Elsevier Inc. All rights reserved.

1.20.1 Introduction
1.20.2 Background of Space-Time GIS
1.20.3 Dynamic Topic Modeling
1.20.3.1 Methods
1.20.3.2 Results of DTM
1.20.4 Reviews on ST-GIS Topics and Trends
1.20.4.1 Conceptualization and Representation
1.20.4.2 Methodology
1.20.4.2.1 Data analysis and visualization
1.20.4.2.2 Modeling
1.20.4.3 Applications
1.20.4.3.1 Physical, environmental, and climate geography
1.20.4.3.2 Urban/Regional dynamics
1.20.4.3.3 Risk
1.20.4.3.4 Mobility/Accessibility
1.20.4.3.5 Health
1.20.5 Discussion
Acknowledgment
References

1.20.1 Introduction

Human activities and surrounding environments change dynamically over space and time. Geographers have long studied this change by describing patterns and processes across spatial and temporal scales, such as human migration, transportation, environmental change, and socio-economic dynamics. In the last two decades, the geographic information system (GIS) and geographic information science (GISci) research communities have witnessed a growing interest in studying spatiotemporal data. These data are available largely through advancements in remote sensing and location-aware technologies and can help advance geographic knowledge over time. By incorporating the temporal dimension into the conventional GIS framework, researchers in many fields have contributed to the development of Space-Time GIS (ST-GIS). Recent efforts have provided valuable reviews, insights, and prospects into the development of ST-GIS (Goodchild, 2013; Yuan, 2016), space-time analysis (An et al., 2015), quantitative methodologies for movement data (Long and Nelson, 2013), and big spatiotemporal data management and cloud computing (Yang et al., 2016). This chapter reviews research trends in ST-GIS by conducting a trend analysis based on dynamic topic modeling (DTM), a text mining method designed to analyze the evolution of topics in the research literature. The chapter is organized as follows: section "Background of Space-Time GIS" introduces the background of ST-GIS, section "Dynamic Topic Modeling" presents the method and results of DTM, section "Reviews on ST-GIS Topics and Trends" reviews the literature based on the topics identified in section "Dynamic Topic Modeling", and section "Discussion" discusses the results and implications.

1.20.2 Background of Space-Time GIS

Space-Time GIS (ST-GIS) has evolved from static GIS by integrating the temporal dimension. The fundamental components of a conventional GIS are data management, analysis, and visualization, which process, investigate, and present geographic information for understanding spatial relationships, patterns, processes, and trends, ultimately supporting decision making for solving real-world problems. Integrating time into GIS can enhance its capability to understand changes in geographic information in terms of morphology (or geometry), topology, and attributes, and their patterns, processes, and trends. The integration is, however, a challenging task due to the complexity associated with spatiotemporal data handling. Earlier works, labeled temporal GIS or TGIS, emerged in the early 1990s, and extensive discussions focused primarily on conceptual frameworks that led to the development of various spatiotemporal data models. In one of the earliest attempts in TGIS research, Langran (1989, 1993) reviewed temporal research in the information processing literature, discussed how to represent time in relational data models, examined the trade-off between the efficiency of storage and the complexity of temporal information, and identified problems and issues in adapting those data models to GIS requirements. Langran and Chrisman (1988) defined critical components of cartographic time in relation to cartographic space, compared temporal and spatial topologies, and proposed three conceptualizations of TGIS: (1) time-slice snapshots, (2) base map with overlays, and (3) the space-time composite.


Following these works, fruitful debates have helped enrich the ST-GIS literature with regard to conceptualization and representation. Peuquet (1994) proposed a spatiotemporal representation framework incorporating concepts from perceptual psychology, artificial intelligence, and related fields. The approach attempts to represent stored spatiotemporal data in a way that conforms both to human conceptualizations of the world in space-time and to geographic theory, as well as to technical demands for accuracy and flexibility in computer-based analysis and visual presentation (Peuquet, 1994). Worboys (1994a,b) introduced an object-oriented framework to handle temporal information. He constructed a conceptual model that brings together the purely spatial and the purely temporal by forming an aggregated object, called a spatiotemporal bitemporal object or ST-simplex, consisting of a member of the spatial hierarchy and a bitemporal reference (Worboys, 1994a,b). He also defined the ST-complex, an object structure representing a bitemporally referenced spatial configuration. Furthermore, spatiotemporal query operations such as equality, subset, topological, projection, b-product, set-theoretic, and selection operations are built upon ST-complexes. Yuan (1996, 1999) designed a semantic-temporal-spatial three-domain representation approach. The framework consists of semantic objects, temporal objects, spatial objects, and domain links, and it can represent reality from location-centered, entity-centered, and time-centered perspectives with six basic types of change in geographic information: attribute changes, static spatial distribution, static spatial changes, dynamic spatial changes, mutation of a process, and movement of an entity (Yuan, 1996). Pelekis et al. (2004) provided a thorough review of the historical development of spatiotemporal database models along with definitions of spatiotemporal data modeling. Their review presents a detailed taxonomy in terms of temporal semantics, spatial semantics, spatiotemporal semantics, and query capabilities for various models, including the snapshot model, space-time composite model, simple time-stamping model, event-oriented model, three-domain model, history graph model, spatiotemporal entity-relationship model, object-relationship model, spatiotemporal object-oriented data model, spatiotemporal unified modeling language, and moving object data model.

Despite the rich literature of ST-GIS in early conceptual and data model developments, the lack of an adequate computational environment (e.g., cost, storage, performance, and data scarcity) limited the effective implementation of those conceptualizations and representations in an ST-GIS platform. Recent technological advancements in computer software/hardware, databases, remote sensing, location-aware technologies, mobile devices, and parallel/distributed computing now allow very large spatiotemporal datasets to be collected, improve and extend conventional spatial databases to handle and query spatiotemporal information, and help realize the ST-GIS platforms and applications presented in the following sections.

1.20.3 Dynamic Topic Modeling

To examine research trends in ST-GIS, the author has reviewed the existing literature and conducted a trend analysis by employing dynamic topic modeling (DTM). This trend analysis consisted of three procedures: (1) literature data retrieval, (2) data preprocessing, and (3) DTM. The result of the analysis identified 17 dynamic topics classified into three general ST-GIS categories: (1) framework, concept, and representation; (2) methodology; and (3) application. This analysis also revealed how each topic has evolved over time.

1.20.3.1 Methods

The existing literature on ST-GIS was retrieved by performing a literature search using Web of Science, an online scientific citation indexing service. The service is based on the Web of Science Core Collection, a comprehensive interdisciplinary bibliographic database with article references from journals, books, and proceedings across science and technology, the arts and humanities, and the social sciences. To retrieve the ST-GIS literature, the following search keyword was used:

Search Keyword (ST-GIS) = ("Geographic Information System" OR (GIS)) AND ("space time" OR spatiotemporal OR "spatio-temporal")

This keyword search returns a list of publications that used both the "GIS" term and the "space time" term in their title, keywords, and/or abstract. Note that two or more words enclosed in double quotation marks are treated as a single term, and a dash between two words does not change the search outcome. The second step was preprocessing the data for DTM. First, non-English articles were excluded from the search results. Second, all texts including the title, keywords, and abstract of each article were concatenated into a single document, and each word was standardized by converting it to lower case. Third, four text mining preprocessing techniques were applied to each lower-cased document in order to extract meaningful research topics with DTM: (1) tokenization, (2) filtering stop words, (3) lemmatization, and (4) part-of-speech (POS) tagging. Tokenization broke each document's text into words, phrases, symbols, and other meaningful elements, called tokens, by identifying, for example, punctuation marks, contractions, and abbreviations. The next process was filtering stop words, commonly used words that provide little analytical value. Stop words used in this study include common research words and symbols (e.g., "approach," "article," "paper," "research," "study," and "result") and common ST-GIS words (e.g., "geographic," "gis," "space," "spacetime," "space-time," "spatial," "spatio," "spatiotemporal," "temporal," and "time"). Third, lemmatization is a text normalization technique that maps different inflected forms of a word to one common root form, or lemma. For example, "systems" becomes "system" and "changes" becomes "change". After lemmatization, stop-word filtering was conducted again to yield a list of lemmatized tokens for each document.

Space-Time GIS and Its Evolution

289

preprocessing, POS tagging was applied to select words by their grammatical category. To arrive at meaningful research topics, this study only extracted nouns and used them as input data for DTM. The last process of the trend analysis in this study was DTM. DTM is extended from conventional topic modeling, which is an unsupervised machine learning method for finding patterns of words and discovering underlying themes in document collections using hierarchical probabilistic models (Deerwester et al., 1990; Blei et al., 2003; Steyvers and Griffiths, 2007). Although the underlying statistical assumptions of a conventional topic model, such as Latent Dirichlet Allocation, are based on a static process where the temporal order of documents is not considered, DTM takes the temporal order into account to reflect an evolving set of topics by dividing the documents collection using time slices (Blei and Lafferty, 2006). Thus, DTM models the documents of each time slice with a K-component topic model, where the topics associated with time slice t evolve from the topics associated with time slice t  1 (Blei and Lafferty, 2006). This study employed a DTM method, developed by Greene and Cross (2016), which is built on two layers of nonnegative matrix factorization (NMF) linking together topics identified in snapshots of text sources appearing over time. NMF is an unsupervised approach for reducing the dimensionality of nonnegative matrices that can be applied to textual data to reveal topical structures by measuring how important a word is to a document in a collection of texts based on a log-based term-frequencyinverse document frequency weighting factor to the data (Greene and Cross, 2016). In the first layer, the modeling process involves identifying hidden topics in each time slice or time window. The process first divides the document collection into s disjoint time windows {T1, T2, ., Ts} and then applies NMF to assess the coherence of topic models in each time window. This yields a set of s successive window topic models {M1, M2, ., Ms}, each containing ki window topics (Greene and Cross, 2016). In the second layer, window topics in each time slice identified in the first layer are combined to generate dynamic topics, k0 , that span multiple time windows. To identify dynamic topics k0 , NMF is applied on topic documentations, which is a new condensed representation of the original text sources constructed using the assumption that topics that span windows, but share a common theme, will have similar topic documents (Greene and Cross, 2016). In order to select a suitable model parameter k and k0 , this study used Topic Coherence via Word2Vec (TC-W2V) (O’Callaghan et al., 2015) that can evaluate different topic models by measuring topic coherence. In this study, the analytical process was coded in Python 2.7 using Python modules including NLTK, gensim, and dynamic-nmf and executed on Linux CentOS (v.7.2).
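For illustration, the four preprocessing steps can be sketched with NLTK (one of the modules the study lists); the stop-word set below is an abbreviated stand-in for the full list above, and the one-time corpus downloads are noted in a comment.

```python
# A minimal sketch of the preprocessing pipeline: tokenization,
# stop-word filtering, lemmatization, and POS tagging with NLTK.
# One-time setup: nltk.download('punkt'), nltk.download('stopwords'),
# nltk.download('wordnet'), nltk.download('averaged_perceptron_tagger').
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

CUSTOM_STOPWORDS = {"approach", "article", "paper", "research", "study",
                    "result", "geographic", "gis", "space", "spacetime",
                    "space-time", "spatial", "spatio", "spatiotemporal",
                    "temporal", "time"}

def preprocess(document):
    """Return the lemmatized nouns of one concatenated article document."""
    lemmatizer = WordNetLemmatizer()
    stops = set(stopwords.words("english")) | CUSTOM_STOPWORDS
    # (1) Tokenize the lower-cased text into word tokens.
    tokens = nltk.word_tokenize(document.lower())
    # (2) First stop-word pass; keep alphabetic tokens only.
    tokens = [t for t in tokens if t.isalpha() and t not in stops]
    # (3) Lemmatize (e.g., "systems" -> "system"), then filter again.
    lemmas = [lemmatizer.lemmatize(t) for t in tokens]
    lemmas = [l for l in lemmas if l not in stops]
    # (4) POS-tag and keep only nouns (Penn Treebank tags starting "NN").
    return [w for w, tag in nltk.pos_tag(lemmas) if tag.startswith("NN")]

print(preprocess("Spatiotemporal GIS systems support analyses of urban changes."))
```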

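The two-layer structure itself can also be sketched compactly. The fragment below substitutes scikit-learn's NMF for the dynamic-nmf package, uses tiny invented document windows, and picks illustrative values for k and k′ rather than the TC-W2V-selected ones; it assumes scikit-learn 1.0 or later for get_feature_names_out.

```python
# A compact two-layer NMF sketch in the spirit of Greene and Cross (2016):
# layer 1 finds window topics per time window; layer 2 factorizes the
# stacked window-topic vectors into dynamic topics.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

def window_topics(docs, k):
    """Layer 1: factorize one window's TF-IDF matrix into k topics."""
    vec = TfidfVectorizer()
    X = vec.fit_transform(docs)                      # documents x terms
    model = NMF(n_components=k, init="nndsvd", random_state=0).fit(X)
    terms = vec.get_feature_names_out()
    # Each row of components_ weighs every vocabulary term for one topic.
    return [dict(zip(terms, row)) for row in model.components_]

windows = [  # invented toy corpora, one list of documents per time window
    ["gis database temporal query language", "spatiotemporal database model design"],
    ["activity travel accessibility network", "gis network accessibility measure"],
]
k, k_prime = 2, 2
topics = []
for docs in windows:
    topics.extend(window_topics(docs, k))

# Layer 2: re-embed all window topics in a shared vocabulary and factorize.
vocab = sorted({t for topic in topics for t in topic})
B = np.array([[topic.get(t, 0.0) for t in vocab] for topic in topics])
dynamic = NMF(n_components=k_prime, init="nndsvd", random_state=0).fit(B)
for i, row in enumerate(dynamic.components_):
    print("Dynamic topic %d:" % (i + 1),
          [vocab[j] for j in row.argsort()[::-1][:3]])
```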
1.20.3.2 Results of DTM

The literature search returned 1279 ST-GIS publications as of 3 September 2016. To compare the trend in the literature between ST-GIS and general space-time research, a second literature search was conducted via the Web of Science using the following search keyword:

Search Keyword 2 (General ST Research) = ("space time" OR spatiotemporal OR "spatio-temporal")

The search for general ST research returned 94,428 articles. Fig. 1 shows the number of publications in English for ST-GIS and general ST research by year. Published general ST research appeared from the early 1990s, and there is a remarkable shift from 1990 to 1991, when the number of publications increased from 249 to 854. This change coincides with technological advancements in the early 1990s, including the first website on the World Wide Web in August 1991 (Berners-Lee et al., 1992). Along with the trend in general ST research, ST-GIS publications have appeared since the early 1990s, and the number of publications is continuously increasing.

Fig. 1 The number of publications related to general spatio-temporal research and ST-GIS by year, with ST-GIS frequencies plotted on the secondary axis (as of 3 September 2016).

Table 1 Source titles of ST-GIS publications

Rank  Source title                                                            Freq.  Percent.
1     International Journal of Geographical Information Science               62     4.85
2     Applied Geography                                                       36     2.81
3     Lecture Notes in Computer Science                                       24     1.88
4     Computers Environment and Urban Systems                                 20     1.56
5     Environmental Monitoring and Assessment                                 19     1.49
6     Environmental Earth Sciences                                            18     1.41
7     Journal of Transport Geography                                          18     1.41
8     Annals of the Association of American Geographers                       17     1.33
9     Environmental Modelling & Software                                      17     1.33
10    PLoS One                                                                17     1.33
11    GeoInformatica                                                          16     1.25
12    Landscape and Urban Planning                                            16     1.25
13    Transactions in GIS                                                     16     1.25
14    International Journal of Health Geographics                             14     1.09
15    Journal of Geographical Sciences                                        14     1.09
16    Computers & Geosciences                                                 13     1.02
17    International Journal of Applied Earth Observation and Geoinformation   13     1.02
18    ISPRS International Journal of Geo-Information                          13     1.02
19    Natural Hazards                                                         12     0.94
20    Chinese Geographical Science                                            11     0.86

Table 1 lists the top 20 source titles of ST-GIS publications. Popular outlets for ST-GIS research include major GIS and geography journals such as the International Journal of Geographical Information Science, Applied Geography, Computers Environment and Urban Systems, Journal of Transport Geography, and Annals of the Association of American Geographers.

Fig. 2 shows the number of ST-GIS publications by year from 1992 to 2016, where dots represent the number of publications per year. To conduct an analysis based on DTM, the time-stamped text data acquired from each article's title, keywords, and abstract were divided into time windows. This study used variable time windows based on the second-order differences of the number of publications by year in order to capture significant trend shifts. The second-order differences are drawn as a solid line against the secondary y-axis in Fig. 2. The author used 5 as a threshold for the second-order difference and divided the text data into five time windows {T1: 1992–2002, T2: 2003–2008, T3: 2009–2010, T4: 2011–2014, and T5: 2015–2016}. Note that T1 was not divided at 1997, even though the second-order difference at 1997 is greater than the threshold, because a division at 1997 would create a time window with too few publications (i.e., 23 publications from 1992 to 1996) to extract substantial window topics by DTM.

A total of 17 dynamic window topic models was generated as the result of DTM by applying NMF to the text data for each time window. Table 2 lists the top 10 terms in each window topic model, and Table 3 summarizes the manually categorized ST-GIS topics. The three general ST-GIS categories identified are (1) conceptualization, representation, and framework; (2) methodology; and (3) application. Topics in methodology and application were further categorized into subtopics based on the type of methodology and application, respectively.
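For illustration, the window-splitting heuristic amounts to a few lines of arithmetic; the yearly counts below are invented, not the actual Web of Science series.

```python
# Second-order differences of yearly publication counts, used to flag
# candidate years at which to start a new time window.
years = list(range(2000, 2010))
counts = [10, 12, 13, 20, 22, 23, 24, 33, 35, 36]   # invented series

first = [b - a for a, b in zip(counts, counts[1:])]
second = [b - a for a, b in zip(first, first[1:])]

THRESHOLD = 5  # the study's cutoff for a significant trend shift
breaks = [years[i + 2] for i, d in enumerate(second) if abs(d) > THRESHOLD]
print(breaks)  # candidate window boundaries, e.g., [2003, 2007, 2008]
```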

Fig. 2 The number of ST-GIS publications by year (as of 3 September 2016), with first- and second-order differences plotted against the secondary axis and the five time windows T1–T5 marked.

Table 2 Top 10 terms in each window topic model (terms listed in rank order)

D01: System, Information, Database, Management, Technology, Application, Representation, Support, Environment, Decision
D02: Data, Analysis, Visualization, Map, Application, Tool, Survey, Database, Software, Structure
D03: Land, Change, Cover, Area, Image, Classification, Remote, Period, Management, Policy
D04: Model, Simulation, Process, Network, Automaton, Framework, Development, Scenario, State, Prediction
D05: Water, Quality, Groundwater, Management, Surface, Resource, Variation, Source, River, Concentration
D06: Soil, Erosion, Loss, Factor, Vegetation, Variability, Value, Area, Climate, Scale
D07: Landscape, Pattern, Process, Change, Analysis, City, Management, Scale, Population, Level
D08: Risk, Case, Factor, Cluster, Health, Control, Assessment, Disaster, Conclusion, County
D09: Expansion, Growth, City, China, Development, Urbanization, Land, Region, Automaton, Policy
D10: Disease, Conclusion, Surveillance, Control, Cluster, Health, Dengue, Analysis, Hotspot, Factor
D11: Fire, Event, Risk, Analysis, Vegetation, Area, Age, Simulation, Number, Decision
D12: Activity, Framework, Geography, Interaction, Mobility, Measure, Representation, Pattern, Case, Location
D13: Air, Pollution, Exposure, Concentration, Regression, Health, Level, Effect, Source, Population
D14: Area, Distribution, Analysis, China, Region, Population, Case, Pattern, Period, Variation
D15: Method, Visualization, Movement, Integration, Technique, Analysis, Theory, Process, Work, Pattern
D16: Accessibility, Service, Network, Access, Measure, Transportation, Health, Location, Road, Traffic
D17: Change, Climate, Temperature, Vegetation, Precipitation, Index, River, Event, Surface, Period

Table 3 Summary of window topics

Conceptualization/Representation: D01, D12
Methodology – Data analysis/Visualization: D02, D15
Methodology – Modeling: D04
Application – Physical/Environmental/Climate: D03, D05, D06, D07, D13, D17
Application – Urban/Regional: D03, D07, D09, D14
Application – Risk: D08, D11
Application – Mobility/Accessibility: D12, D16
Application – Health: D08, D10, D13

Subtopics for methodology are data analysis/visualization and modeling, whereas those for application include physical/environmental/climate geography, urban/regional dynamics, risk, mobility/accessibility, and health. Finally, 83 dynamic window topics were identified by applying NMF to the 17 dynamic window topic models (D01–D17). Table 4 presents the top 10 terms in each dynamic window topic associated with a window topic model at a given time window, describing how each topic dynamically evolves over time. In Table 4, window topics are grouped by time window (T1–T5). Note that multiple window topics in a single time window can be associated with a single dynamic topic (e.g., T2-WT1 and T2-WT2 in D04).

It is important to note one limitation of this DTM analysis: the research topics and trends were extracted strictly from the ST-GIS literature, in which publications must include both a "GIS" term and a "space-time" term in their title, keywords, and/or abstract; therefore, the result of the DTM analysis does not include topics from related literatures that do not refer to "GIS." For example, much of the data mining and machine learning literature may not use "GIS" despite being highly related to spatiotemporal analysis. In addition, only single nouns were considered in the analysis, which excluded emerging topics consisting of two or more words, such as "big data," "social media data," "human dynamics," and "high performance computing."

1.20.4 Reviews on ST-GIS Topics and Trends

1.20.4.1 Conceptualization and Representation

Since the early 1990s, GIS researchers have discussed fundamental ST-GIS topics in order to incorporate temporal information and analysis functions into GIS technology (Yuan, 2016). The DTM results identify two dynamic topics related to the conceptualization and representation of ST-GIS: a general topic covering data representation, information systems, and database management (Table 4, D01), and a specific topic focusing on human mobility, activity, and interaction (Table 4, D12).

Table 4 Dynamic topics associated with each time window and each dynamic window topic model (terms listed in rank order)

D01
Overall: System, Information, Database, Management, Technology, Application, Representation, Support, Environment, Decision
T1-WT1: Database, Application, Management, Language, Support, Query, System, Data, Information, Development
T1-WT2: System, Information, Technology, Distribution, Rate, Problem, Analysis, Area, Simulation, Decision
T2: System, Database, Information, Application, Representation, Management, Environment, Event, Object, Extension
T3: System, Information, Management, Database, Evolution, Design, Condition, Process, Region, Site
T4: System, Information, Analysis, Decision, Tool, Technology, Earth, Field, Power, Visualization
T5: System, Management, Process, Network, Information, Environment, Support, Technology, Analysis, Monitoring

D02
Overall: Data, Analysis, Visualization, Map, Application, Tool, Survey, Database, Software, Structure
T1: Data, Survey, Analysis, Area, Map, Population, Activity, Tool, Component, Pattern
T2: Data, Visualization, Exploration, Analysis, Technique, Method, Tool, Measure, Value, Attribute
T3: Data, Analysis, Map, Software, Structure, Implementation, Tool, Field, Application, Development
T4: Data, Analysis, Visualization, Method, Event, Mining, Collection, Dimension, Application, Database
T5: Data, Analysis, Visualization, Service, Map, Datasets, Application, Method, Representation, Environment

D03
Overall: Land, Change, Cover, Area, Image, Classification, Remote, Period, Management, Policy
T2: Land, Cover, Change, Area, Construction, Expansion, China, Management, Structure, Process
T3: Land, Change, Cover, Surface, Remote, Impact, Image, Policy, Region, Period
T4: Land, Cover, Change, Image, Classification, Force, Area, Basin, River, Remote
T5: Land, Change, Cover, Area, Classification, Image, Urbanization, Period, Development, Detection

D04
Overall: Model, Simulation, Process, Network, Automaton, Framework, Development, Scenario, State, Prediction
T1: Model, Representation, Development, Value, Data, Location, Part, Support, Information, Science
T2-WT1: Model, Effect, Data, Simulation, Relation, Process, Term, Modeling, System, Performance
T2-WT2: Language, Error, Modeling, Knowledge, Dimension, Simulation, Tool, Event, Technique, Function
T3: Model, Network, State, Simulation, Scenario, Type, Process, Population, Software, Range
T4: Model, Simulation, Component, Framework, Representation, Regression, Map, Constraint, Performance, Language
T5: Model, Simulation, Automaton, Framework, Growth, Prediction, Regression, Modeling, Function, State

D05
Overall: Water, Quality, Groundwater, Management, Surface, Resource, Variation, Source, River, Concentration
T1: Water, Landscape, Management, Land, Scale, Quality, Distribution, Population, Level, Development
T2: Water, Groundwater, Quality, Agriculture, Analysis, Strategy, Variation, Management, Climate, Layer
T3: Water, Concentration, Rate, Management, Assessment, Hypothesis, Conclusion, Site, Distribution, Impact
T4-WT1: Groundwater, Aquifer, Quality, Level, Area, Depth, Region, Analysis, Trend, Rainfall
T4-WT2: Water, Quality, Irrigation, Pollution, Supply, Source, Runoff, Management, Cause, Body
T5: Water, Quality, River, Surface, Variation, Parameter, Environment, India, Sea, Resource

D06
Overall: Soil, Erosion, Loss, Factor, Vegetation, Variability, Value, Area, Climate, Scale
T2: Soil, Geostatistics, Variability, Trend, Zone, Quality, Distribution, Survey, Scale, Precipitation
T3: Soil, Loss, Value, Factor, Classification, Vegetation, Variability, Correlation, Climate, Development
T4: Soil, Erosion, Loss, Area, Sample, Vegetation, Concentration, Factor, Content, Influence
T5: Soil, Erosion, Loss, River, Rate, Index, Area, Zone, Process, Transport

D07
Overall: Landscape, Pattern, Process, Change, Analysis, City, Management, Scale, Population, Level
T2: Landscape, Pattern, Indicator, Heterogeneity, Ecosystem, Analysis, Population, Process, Cover, Impact
T3: Landscape, Pattern, City, Change, Analysis, Index, Process, Image, Dimension, Growth
T4: Landscape, Change, Fragmentation, Habitat, Grassland, Pattern, Conservation, Process, Patch, Condition
T5: Landscape, Effect, Site, Pattern, Change, Influence, Construction, Importance, Intensity, Role

D08
Overall: Risk, Case, Factor, Cluster, Health, Control, Assessment, Disaster, Conclusion, County
T2: Risk, Case, Factor, Control, Association, County, Program, Variation, Conclusion, Relationship
T3: Risk, Cluster, Case, Analysis, Health, Pattern, Factor, Conclusion, Control, Location
T4: Risk, Disaster, Population, Area, Health, Assessment, Part, Hazard, Vulnerability, Factor
T5: Risk, Map, Exposure, Assessment, Control, Tool, Stage, Case, Factor, Management

D09
Overall: Expansion, Growth, City, China, Development, Urbanization, Land, Region, Automaton, Policy
T2: Growth, City, Development, Expansion, China, Image, Theory, Policy, Area, Show
T4-WT1: Growth, Development, Automaton, City, Simulation, Transition, Area, Building, Analysis, Process
T4-WT2: Expansion, China, Land, Settlement, City, Urbanization, Factor, Region, Force, Cropland
T5-WT1: Expansion, Growth, City, China, Land, Development, km, Urbanization, District, Decade
T5-WT2: Energy, City, District, Density, Analysis, System, Scale, Series, Technology, Optimization

D10
Overall: Disease, Conclusion, Surveillance, Control, Cluster, Health, Dengue, Analysis, Hotspot, Factor
T2: Disease, Surveillance, Conclusion, Health, Service, Development, System, County, Density, Mapping
T3: Disease, Control, Population, Conclusion, Vegetation, Strategy, Activity, Tool, Account, Application
T4-WT1: Dengue, Case, Disease, Outbreak, Control, Pattern, Surveillance, Risk, Uncertainty, Climate
T4-WT2: Disease, Analysis, Hotspot, Cluster, Conclusion, Area, Incidence, Distribution, Transmission, Control
T5: Disease, Cluster, Event, Distribution, Conclusion, Mountain, Factor, Condition, Area, Case

D11
Overall: Fire, Event, Risk, Analysis, Vegetation, Area, Age, Simulation, Number, Decision
T2: Fire, Event, Risk, Age, Vegetation, Area, Ecosystem, Stand, Cluster, Index
T3: Fire, Analysis, Data, Analyze, Monitoring, Number, Risk, Decision, Order, Environment

D12
Overall: Activity, Framework, Geography, Interaction, Mobility, Measure, Representation, Pattern, Case, Location
T1: Measure, Framework, Interaction, Location, Structure, Scale, Type, Relationship, Land, Level
T2: Activity, Geography, Concept, Interaction, Design, Tool, System, Analysis, Location, Group
T3: Framework, Representation, Component, Science, Information, Activity, Interaction, Map, Field, Show
T4: Activity, Constraint, Travel, Path, Geography, Behavior, Pattern, Sea, Opportunity, Interaction
T5: Mobility, Pattern, Source, Example, Access, Work, Challenge, Transportation, Activity, Geography

D13
Overall: Air, Pollution, Exposure, Concentration, Regression, Health, Level, Effect, Source, Population
T2: Exposure, Pollution, Air, Concentration, Method, Regression, Source, Group, Analysis, Level
T4: Air, Exposure, Pollution, Concentration, Regression, Health, Level, Monitoring, Effect, Estimation
T5: Air, Pollution, Concentration, Effect, Exposure, Vegetation, Road, Factor, Level, Relationship

D14
Overall: Area, Distribution, Analysis, China, Region, Population, Case, Pattern, Period, Variation
T1: Case, Pattern, Rate, Activity, Work, Period, Population, Analysis, Distribution, System
T2-WT1: Distribution, Area, Map, Specie, Site, Information, System, Pattern, Correlation, Analysis
T2-WT2: River, Area, Surface, Vegetation, Rainfall, Basin, km, Process, Variation, Source
T3-WT1: China, Variation, Trend, Variability, Concentration, Distribution, Analysis, Part, Region, Technique
T3-WT2: Area, Change, Development, Period, Measure, Management, Component, Image, Cover, Structure
T5-WT1: Population, Exposure, Habitat, Survey, Estimate, Area, Estimation, Scenario, Imagery, Distribution
T5-WT2: Area, Region, China, Province, Distribution, Analysis, Basin, Development, Cluster, Slope

D15
Overall: Method, Visualization, Movement, Integration, Technique, Analysis, Theory, Process, Work, Pattern
T1: Method, Visualization, Theory, Pattern, Work, Way, Level, Technique, Process, Data
T3: Method, Analysis, Integration, Information, Database, Technique, Type, Data, Application, Process
T4: Movement, Location, Method, Home, Geography, Range, GPS, Object, Review, Path

D16
Overall: Accessibility, Service, Network, Access, Measure, Transportation, Health, Location, Road, Traffic
T2: Network, Transportation, Algorithm, Planning, Support, Complexity, Service, Decision, Road, Method
T3: Access, Service, Location, Health, Source, Network, Data, Integration, Environment, Value
T4-WT1: Service, Accessibility, Measure, Ecosystem, Access, Demand, Hour, Transportation, Value, Variation
T4-WT2: Traffic, Road, Network, Density, Data, Estimation, Infrastructure, Travel, Number, Event
T5-WT1: Accessibility, Travel, Transportation, Service, Method, Transport, Difference, Access, Center, Area
T5-WT2: Health, Conclusion, Rate, Source, Access, Analysis, Control, Resource, Factor, Information

D17
Overall: Change, Climate, Temperature, Vegetation, Precipitation, Index, River, Event, Surface, Period
T1: Change, Event, Surface, Simulation, Period, Database, Query, Process, Technique, Function
T2-WT1: Query, Index, Interface, Language, Application, Event, Structure, Support, Database, Performance
T2-WT2: Change, Science, Rate, Analysis, Technique, Climate, Image, Detection, Period, Difference
T2-WT3: Community, Climate, Vegetation, Site, Forest, Pattern, Distance, Program, Show, Knowledge
T4-WT1: Temperature, Site, Climate, Surface, Period, Air, Effect, Power, Spread, Variation
T4-WT2: Index, River, Change, Vegetation, Precipitation, Climate, Area, Basin, Trend, Flood
T5-WT1: Wetland, Vegetation, Change, Ecosystem, Area, km, Cover, Activity, Period, Service
T5-WT2: Climate, Change, Temperature, Precipitation, Rainfall, Trend, Habitat, Variability, Sea, Season

Conceptualizations and representations of space and time, and of the spatiotemporal relationships among geographical entities and phenomena, have been continuously discussed. Whereas conceptualizations capture ontological constructs, representations formalize the conceptualized ontological constructs based on their characteristics, behaviors, and relationships in order to organize spatial and temporal data in accordance with the geographic domain (Yuan, 2016).

Two core conceptual perspectives of space and time are the absolute and the relative. Originating from Newtonian absoluteness, absolute space is defined as homogeneous and immovable, can act as a neutral or empty container, and imposes a formal frame of reference. Newtonian absolute time is defined by uniform flows of time and simultaneity, where two intervals of time are truly equal and each moment of time is defined everywhere. In this absolute viewpoint, space is Euclidean with a three-dimensional Cartesian frame of reference, and time can be added as a fourth orthogonal axis. Because of the pure mathematical representation of Euclidean geometry, it is convenient for geographers to implement absolute space in GIS to represent geographic phenomena. In addition, the two properties of absolute time (i.e., equal intervals and simultaneity) are commonly seen in geographic mapping; for example, a sequence of maps may represent demographic changes using census data or other longitudinal data with equal temporal intervals.

The alternative to the Newtonian conceptualization of absolute space and time is the concept of relativity, proposed by Leibniz and developed by Mach and Einstein (Earman, 1989). In contrast to the absolute framework, relative space and time have no absolute, independent existence but exist only by virtue of the things that exist and the events that occur (Galton, 2004). In other words, space and time are viewed as coexistence relationships between changes and events, and they are defined by the spatial elements and processes under consideration. Therefore, in studies examining relationships between spatial patterns and functions, it is the processes (e.g., migration, commuting patterns, and diffusion of ideas and information) that define the scales and regions (Meentemeyer, 1989). In addition, the relative space-time view may be defined in non-Euclidean space or nonlinear time, exists only with reference to things and processes, and is applied in studies of forms, patterns, functions, rates, or diffusion (Meentemeyer, 1989; Wachowicz, 2003).

Building on these two fundamental conceptualizations of space and time, various ST-GIS conceptualizations and representations have been proposed, such as object-oriented conceptualizations (Raper and Livingstone, 1995; Wachowicz, 2003; Worboys, 1994a,b), event-based data models (Hornsby and Cole, 2007; Peuquet and Duan, 1995), the semantic-temporal-spatial three-domain representation (Yuan, 1999), the topological temporal framework (Claramunt and Thériault, 1995), and the trajectory conceptualization (Spaccapietra et al., 2008).

Focusing on human mobility, activity, and interaction, one key approach in GIScience is to employ Hägerstrand's (1970) time geography and its central principles of space-time paths (STPs) and space-time prisms. Because time and space play an inseparable role in human activities, Hägerstrand proposed the concept of time geography to study the relationship between human activities and various constraints in a space-time context (Golledge and Stimson, 1997).
In this theoretical framework, an individual's activities are limited by three types of constraints: (1) capability constraints are physical and technological limitations, such as the need for sleep and auto ownership, respectively; (2) coupling constraints are anchors on activity that require people to bundle their activities at particular places and times (work, home, school, etc.); and (3) authority constraints are temporal and/or spatial limitations or regulations on space-time accessibility, as in the case of military areas (spatial constraints) and office hours (temporal constraints) (Yu and Shaw, 2008). Many efforts have been made to formalize and implement Hägerstrand's time geography (Hornsby and Egenhofer, 2002; Kwan, 1998; Miller, 1991; Yu and Shaw, 2007; Yuan et al., 2014), which are discussed in sections "Data analysis and visualization" and "Applications"; a small illustration of the space-time prism follows.
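As one minimal formalization, a planar space-time prism under an assumed uniform maximum travel speed reduces to a reachability test between its two anchors; the coordinates, times, and speed below are invented for illustration.

```python
# A point p lies inside the space-time prism anchored at (origin o, time t0)
# and (destination d, time t1) if it can be visited en route from o to d
# without exceeding the maximum speed vmax.
import math

def in_prism(p, o, d, t0, t1, vmax):
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    return (dist(o, p) + dist(p, d)) / vmax <= (t1 - t0)

# Can a person leaving (0, 0) at t=0 visit (3, 4) and still reach
# (10, 0) by t=3 when moving at most 5 distance units per time unit?
print(in_prism((3, 4), (0, 0), (10, 0), 0.0, 3.0, 5.0))  # True
```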

1.20.4.2 Methodology

1.20.4.2.1 Data analysis and visualization

Two ST-GIS dynamic window topics (D02, D15) are broadly associated with spatiotemporal data analysis and visualization. In addition, DTM found recent research trends in ST-GIS, such as data mining, event sequences, and moving-object and trajectory analysis, by capturing terms like mining, event, movement, GPS, and path, which appear in D02-T4 and D15-T4.

A variety of methodologies have been discussed to analyze and visualize various types of spatiotemporal data. Two major spatiotemporal data types are spatial panel data and individual movement data, and space-time analysis methods are appropriate for these data (An et al., 2015). Spatial panel data are data in cross-sectional units, which are georeferenced and often immobile (An et al., 2015). Spatial panel data can be categorized into three subtypes: (1) temporal sequences of snapshots (e.g., time series of remotely sensed images); (2) temporal sequences of polygon coverages (e.g., census tracts); and (3) multidimensional data sampled through time at a set of fixed points (e.g., weather station data) (Goodchild, 2013). Spatiotemporal exploratory data analysis is a methodological approach to detect and describe patterns, trends, and relations in data in both space and time through data query, quantification, and visualization (Andrienko and Andrienko, 2006). Example tools for spatial panel data are available in open-source and proprietary software, including PCRaster (Karssenberg et al., 2010), TGRASS (Gebbert and Pebesma, 2014), STARS (Rey and Janikas, 2006), VIS-STAMP (Guo et al., 2006), and the Extended Time-Geographic Framework Tools (Shaw et al., 2008).

Individual movement data are one of the more prominent new data types of the past decade, resulting from the tracking of objects using remote sensing and location-aware technologies such as the global positioning system (GPS), radio-frequency identification (RFID), surveillance cameras, and other techniques (Goodchild, 2013). Unprecedented amounts of tracking data are being collected and made available for researchers to study the spatiotemporal patterns, processes, and behaviors of moving objects such as humans, vehicles, and animals (Dodge et al., 2016). Time geography provides a conceptual framework and has been widely applied in GIS to explore and analyze individual movement data. The space-time path and space-time cube are used for exploratory visualization of individual-based objects, events, and activities in space and time (Gatalsky et al., 2004; Kapler and Wright, 2005; Kraak and Koussoulakou, 2005). Miller (1991, 1999) applied time geography concepts to develop accessibility measures in an urban environment and extended them to characterize complex velocities by minimum cost curves through an inverse velocity field (Miller and Bridwell, 2009). Location-based social networking research attempts to find human interactions and community structures by creating and analyzing graph networks based on spatial and spatiotemporal constraints using the time-geography framework (Crooks et al., 2016; Yuan et al., 2014). Space-time kernel density estimation can be applied to identify and visualize space-time hotspots of point patterns and moving objects (Brunsdon et al., 2007; Nakaya and Yano, 2010; Nara and Torrens, 2011). Various analyses can also be conducted to describe the behavior of mobile objects, such as speed, acceleration, turning angle, displacement, travel path, straightness index, directional distribution, and fractal dimension (Benhamou, 2004; Dodge et al., 2008; Laube et al., 2007; Nara and Torrens, 2007; Torrens et al., 2012); a small sketch of such descriptive metrics follows this paragraph. Trajectory-based analysis can extract movement characteristics and semantics of individuals and groups (Laube et al., 2005; Nara et al., 2017; Nara and Torrens, 2011; Parent et al., 2013; Yuan and Nara, 2015).
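The sketch below computes a few of these descriptive metrics (speed, turning angle, straightness index) for a single trajectory of invented (x, y, t) fixes in projected coordinates.

```python
# Descriptive movement metrics for one trajectory of (x, y, t) fixes.
import math

track = [(0.0, 0.0, 0.0), (3.0, 4.0, 10.0), (6.0, 4.0, 20.0), (6.0, 9.0, 30.0)]

def segment_metrics(track):
    """Yield (speed, heading) for each consecutive pair of fixes."""
    for (x0, y0, t0), (x1, y1, t1) in zip(track, track[1:]):
        yield (math.hypot(x1 - x0, y1 - y0) / (t1 - t0),
               math.atan2(y1 - y0, x1 - x0))

speeds, headings = zip(*segment_metrics(track))
# Turning angles: change of heading between consecutive segments.
turns = [b - a for a, b in zip(headings, headings[1:])]
# Straightness index: net displacement divided by total path length.
path_length = sum(math.hypot(x1 - x0, y1 - y0)
                  for (x0, y0, _), (x1, y1, _) in zip(track, track[1:]))
net = math.hypot(track[-1][0] - track[0][0], track[-1][1] - track[0][1])
print(speeds, turns, net / path_length)
```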

1.20.4.2.2 Modeling

The dynamic window topic of D04 depicts the modeling aspect of ST-GIS. One noticeable trend suggested by several key terms in the topic is that research interest in simulation modeling has been increasing year by year, making it one of the most popular methodological approaches for representing and ultimately predicting space-time processes.

Cellular automata (CA) and agent-based models (ABMs) are two popular simulation models applied in GIScience. Their theoretical basis is in complex systems science, which studies how the fragmented parts of a system give rise to its collective or aggregated behaviors and how the system interacts with its environment (Bar-Yam, 2002). This framework of interacting, disaggregated objects is capable of representing complex spatiotemporal dynamics, including multiscale feedback effects, nonlinear dynamics, self-organization, and emergent behaviors. In addition, the "bottom-up" modeling framework can avoid problems with the ecological fallacy (Wrigley et al., 1996) and the modifiable areal unit problem (Openshaw, 1984), common challenges in a "top-down" approach when inferring finer-scale characteristics from coarse-scale data.

CA models consist of cells, or grids, that provide the discrete confines of individual automata. A cell has a finite set of states that describe its attributes. At each discrete simulation time step, the state of each cell evolves according to well-defined uniform transition rules that are locally applied (i.e., using the states of the cell and its neighbors). Although CA can run in an N-dimensional space, two-dimensional CA are most commonly used for geographic applications. Since CA models have a natural affinity with raster data and GIS, they are suitable for simulating spatiotemporal processes in a spatially continuous field, such as urban processes and land-use/land-cover change (see the sketch at the end of this section). In contrast, ABMs abstract spatiotemporal processes around object-oriented, disaggregated, autonomous, and heterogeneous entities that interact with each other and with the environment. Agents in an ABM also consist of states, neighborhoods, and transition rules. In addition, agents can move freely in space according to defined movement rules. Therefore, ABMs are well suited to representing mobile entities such as human flows and vehicles. CA and ABM can be combined to simulate spatiotemporal processes by taking advantage of cells in CA representing immobile entities and agents in an ABM representing mobile entities. Geographic automata (GA) systems are based on this hybrid approach (Torrens and Benenson, 2005).

Besides CA and ABM, statistical modeling methodologies have also been applied to analyze and model spatiotemporal processes. Space-time autoregressive models and multivariate space-time regression models are examples of panel regression models, which account for temporal autocorrelation, spatial autocorrelation, or both (An et al., 2015; Elhorst, 2012). The set of space-time autoregressive models includes the space-time autoregressive (STAR), space-time autoregressive moving average, and space-time autoregressive integrated moving average models, whereas geographical and temporal weighted regression (Fotheringham et al., 2015) is an extension of the geographically weighted regression model (Fotheringham et al., 2003). Christakos et al. (2012) implemented a suite of space-time analysis functions for field-based applications based on Bayesian maximum entropy for studying spatiotemporal distributions of natural variables. Spatial interaction models are another modeling approach, estimating the volume of flows between origins and destinations based on the structural attributes of the two regions (Fotheringham and O'Kelly, 1989; Niedzielski et al., 2015).
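Returning to the CA framework described above, the following minimal sketch uses an invented grid and a simple neighborhood rule (an undeveloped cell urbanizes once at least three of its eight neighbors are urban); it illustrates the mechanics of locally applied uniform transition rules only, not any published model's calibrated rules.

```python
# A toy two-dimensional cellular automaton for urban growth.
import random

SIZE, STEPS = 10, 5
random.seed(42)
grid = [[1 if random.random() < 0.1 else 0 for _ in range(SIZE)]
        for _ in range(SIZE)]  # 1 = urban, 0 = undeveloped

def step(grid):
    """Apply the uniform local transition rule to every cell at once."""
    new = [row[:] for row in grid]
    for i in range(SIZE):
        for j in range(SIZE):
            urban_neighbors = sum(
                grid[(i + di) % SIZE][(j + dj) % SIZE]
                for di in (-1, 0, 1) for dj in (-1, 0, 1)
                if (di, dj) != (0, 0))          # Moore neighborhood
            if grid[i][j] == 0 and urban_neighbors >= 3:
                new[i][j] = 1                   # undeveloped cell urbanizes
    return new

for _ in range(STEPS):
    grid = step(grid)
print(sum(map(sum, grid)), "urban cells after", STEPS, "steps")
```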

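On the statistical side, one concrete illustration is a first-order STAR model (exact specifications vary by author), where $y_t$ is the vector of observations over all spatial units at time $t$, $W$ is a row-standardized spatial weights matrix, $\phi$ and $\psi$ are temporal and spatiotemporal autoregressive parameters, and $\varepsilon_t$ is an error term:

$$ y_t = \phi\, y_{t-1} + \psi\, W y_{t-1} + \varepsilon_t $$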
1.20.4.3 Applications

The result of DTM identified five subtopics related to ST-GIS applications: physical/environmental/climate geography, urban/regional dynamics, risk, mobility/accessibility, and health.

1.20.4.3.1 Physical, environmental, and climate geography

Six dynamic window topics of ST-GIS relate to physical processes in geography. Generally, D03 describes land cover change, D05 illustrates water/groundwater quality, D06 portrays soil loss/erosion dynamics, D07 depicts landscape change, D13 explains air pollution exposure and concentration, and D17 represents climate change. Researchers in water quality and climate change were early adopters of ST-GIS, starting in the 1990s (T1); in the early 2000s, research on the other topics related to physical geographic processes emerged.

Various ST-GIS frameworks and methodologies have been employed to study these topics. Examples include: (1) survival analysis coupled with GIS and remote sensing data to model and analyze land changes (An and Brown, 2008); (2) a model coupling GIS and remote sensing data to enhance the DRASTIC model, widely used for calculating an index of groundwater vulnerability, in order to assess spatiotemporal groundwater vulnerability (Albuquerque et al., 2013); (3) an integration of multidimensional modeling and GIS, implemented in the Geographic Resources Analysis Support System (GRASS), to examine the spatial and temporal process of soil erosion and deposition; (4) a voxel-based automata approach for modeling the propagation of airborne pollutants in three-dimensional space over time (Jjumba and Dragicevic, 2015); (5) an integrated approach of GIS, data mining, and spatial statistics to examine the spatial distribution and backward trajectories of airborne elemental pollution (Luo et al., 2016); and (6) a new temporal GIS framework to represent the movement of isolines in order to capture differences in patterns and locations of change of climate variables (Bothwell and Yuan, 2012).

1.20.4.3.2 Urban/Regional dynamics

Four dynamic window topics of ST-GIS are associated with urban and regional dynamics, such as urbanization, landscape change, city growth/expansion, and development in China (D03, D07, D09, and D14). ST-GIS can facilitate the understanding of the structures and mechanisms underlying urban and regional dynamics, which ultimately supports decision-making by policy-makers, urban planners, developers, and residents. Simulation-based modeling is one popular ST-GIS method for examining the process of urban and regional dynamics (Batty, 2013; Heppenstall et al., 2012). For example, CA, ABM, and other process-based simulations, such as spatial Markov chain models, have been applied to study urban growth/expansion (Batty, 1997; Clarke et al., 1997; White and Engelen, 2000), sprawl (Torrens, 2006, 2008), residential dynamics and segregation (Portugali et al., 1997), landscape change (Baker, 1989), and gentrification (O'Sullivan, 2002; Torrens and Nara, 2007). Urban and regional growth studies in China are identified as a distinct subtopic, reflecting the rapid growth of the Chinese economy over the past three decades (Kwan et al., 2014).

1.20.4.3.3 Risk

The dynamic window topics of D08 and D11 represent research related to risk assessment and management for natural hazards and disasters. D08 depicts general risk topics, including health, disaster, and hazard, whereas D11 focuses on fire events. GIS and ST-GIS have been applied within emergency management and used as decision support tools in disaster response, recovery, mitigation, and preparedness (Cutter, 2006; Emrich et al., 2011). Examples of ST-GIS research in the context of risk include: (1) modeling community evacuation vulnerability using an integer programming model and GIS (Cova and Church, 1997); (2) combining fire spread modeling and GIS to delimit wildfire evacuation trigger points using data on wind, topography, and fuel in conjunction with estimated evacuation time (Cova et al., 2005); (3) social media data analysis for situation awareness during wildfire events (Wang et al., 2016); and (4) a location-based social network and citizen-sensor approach to contextualize wildfire events (De Longueville et al., 2009).

1.20.4.3.4 Mobility/Accessibility

Research on human mobility and accessibility has greatly benefited from technological advancements in location-aware sensors and mobile devices, which have generated unprecedented amounts of data on individual mobility. Dynamic window topics capture research trends in mobility, activity, interaction, and accessibility (D12, D16). The framework of time geography has been a popular approach to studying these topics by exploring and visualizing individuals' movement patterns and accessibility over time. For example, Kwan (1998, 1999) used time geography to study urban accessibility differences by gender and ethnicity. More recently, research interest in GIS-based analysis of accessibility to health care services is growing (Blanford et al., 2012; Kwan, 2013; Neutens, 2015). Spatiotemporal data analysis and trajectory-based analysis have been applied to characterize and examine movement patterns associated with everyday life (Yuan and Nara, 2015), crime (Groff et al., 2009), and indoor environments (Nara et al., 2017). Simulation models, particularly ABMs, have been applied to model human and animal mobility, decision-making processes, human-human interaction, and human-environment interaction over space and time (Ahearn et al., 2016; An et al., 2014; Heppenstall et al., 2012; Torrens, 2015).

1.20.4.3.5 Health

In recent years, there has been growing interest in employing spatial thinking, GIS, and location-aware technologies to study disease and health geographically (Bhowmick et al., 2008; Dummer, 2008; Kulldorff et al., 1998; Kwan, 2004). By incorporating temporal information, ST-GIS can play a major role in advancing our understanding of the geographical dynamics of health, including the cause and spread of diseases, human mobility related to health behavior, and interactions between humans and the built environment over time.

GIS has long been applied in the fields of health, medicine, and epidemiology. A literature review by Lyseen et al. (2014) identified four distinct categories: (1) spatial analysis of disease, focusing on disease mapping and modeling, geographical epidemiology, and environmental epidemiology; (2) spatial analysis of health service planning, focusing on spatial analysis for the planning, management, delivery, provision, accessibility, and utilization of health care and emergency facilities; (3) public health, comprising spatial analysis for promotion, preventive rehabilitation activities, and health outcomes; and (4) health technologies and tools, concentrating on technologies for collecting health data, such as GPS, remote sensing, and wearable health monitoring devices, as well as health data manipulation tools. In these health geography applications, GIS and spatial analysis have been extensively used to map, analyze, and model disease and health data in order to develop new hypotheses in a geographic context; to analyze and predict future disease risks; and to undertake location/allocation analysis of the distribution of services and resources (Lyseen et al., 2014). Extending conventional GIS and spatial analysis to ST-GIS and spatiotemporal analysis can further advance our understanding of spatiotemporal processes (i.e., changes in spatial patterns over time) and the underlying phenomena related to disease and health geography.

The result of DTM captures three ST-GIS dynamic window topics related to disease and health (D08, D10, and D13). These three topics emerged in the early 2000s, specifically discussing disease epidemiology (D10), air pollution exposure in relation to health (D13), and risk factor analysis associated with health (D08). In disease epidemiology, for example, GIS and spatial analysis have been used to map and reveal patterns of disease spread. In one of the most famous examples, John Snow, considered the father of modern epidemiology, used proto-GIS methods in analog form to examine the spatial diffusion of cholera deaths in relation to the built environment in Soho, London (Snow, 1855). By incorporating temporal information, the same phenomenon can be reexamined to study the space-time pattern of the cholera outbreak (Shiode et al., 2015) using kernel density estimation and network-based scan statistics (Shiode, 2011).

Software tools such as SaTScan, ClusterSeer, GeoSurveillance, and the surveillance package for R have facilitated spatiotemporal analysis in health studies (Höhle, 2007; Jacquez et al., 2002; Kulldorff et al., 1998; Robertson and Nelson, 2010; Yamada et al., 2009). For instance, SaTScan was originally designed for performing geographical surveillance of disease to detect spatial, temporal, and space-time disease clusters and to test whether they are statistically significant (Kulldorff et al., 1998). It has been used in more than 250 publications across many health application fields, including infectious diseases (Liang et al., 2010), cancer (Kulldorff et al., 1997), and vaccines (Omer et al., 2008), to name a few. As a free software tool, SaTScan can be integrated into GIS (e.g., EpiScanGIS (Reinhardt et al., 2008), an online GIS to detect spatiotemporal disease clusters and visualize them on maps). Furthermore, it is now widely used in various other research fields, including criminology (Leitner and Helbich, 2011; Nakaya and Yano, 2010) and natural disasters (Stevenson et al., 2010; Vadrevu, 2008). Other health application examples include studies of the spatial and temporal distributions of dengue outbreaks (Morrison et al., 1998), long-term exposure to particulate matter air pollution in relation to cognitive decline (Weuve et al., 2012), the spread of a communicable disease in an urban environment using agent-based simulation modeling (Perez and Dragicevic, 2009), and West Nile epidemics and population-based risk analysis (Lian et al., 2007).

1.20.5 Discussion

This chapter reviews the ST-GIS literature and identifies research topics and trends by conducting a text mining analysis of the extant literature. The results depict topics and trends reflecting current research progress on ST-GIS in terms of conceptualization and representation, methodologies, and applications. A wide variety of ST-GIS conceptualizations, representations, and methodologies have been developed and applied to a broad range of applications, ranging from urban planning, transportation management, and computational visualization and animation to physical, social, behavioral, medical, and geographical studies. As a result, a number of distinct forms of ST-GIS have evolved based on distinct data types and suites of scientific questions (Goodchild, 2013).

Although considerable progress has been made in ST-GIS, many difficult and challenging issues remain to be explored. One challenge related to advancing process-based simulation models (e.g., CA and ABM) is model validation, i.e., how adequately, and to what extent, a model represents a real-world spatiotemporal process. This involves assessing the goodness-of-fit of the model to data; earlier ABM studies, for example, struggled to validate their models owing to a lack of sufficient dynamic individual-scale data. Currently, we live in the mobile age. Advancements in location-aware technologies and ubiquitously distributed geo-sensor networks allow the collection of fine-scale data about human dynamics. At the same time, high-performance computing enables the simulation of large-scale ABMs in which millions of agents interact in simulated cities. These advancements provide new opportunities and challenges for developing analytical methodologies that quantify the complex properties generated by simulation (e.g., emergent behavior) and validate models against "big data" collected from the real world.

Another challenge is the integration of various spatiotemporal data that are collected at multiple spatial and temporal scales. For example, geo-tagged social media data are a great resource for exploring and explaining human dynamics; however, only 1%–4% of such data are GPS-tagged, whereas approximately 70%–80% of social media data contain city-level geolocation information from user profiles or the actual media content (Tsou, 2015; Tsou and Leitner, 2013). The integration of various social media data could enhance spatiotemporal analysis, but developing an analytical framework is challenging because of data integration complexities arising from the modifiable areal unit problem and the modifiable temporal unit problem (Tsou, 2015).

Acknowledgment

This work was supported in part by the National Science Foundation under Grant No. 1634641, IMEE project titled "Integrated Stage-Based Evacuation with Social Perception Analysis and Dynamic Population Estimation." Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.

References Ahearn, S.C., Dodge, S., Simcharoen, A., Xavier, G., Smith, J.L.D., 2016. A context-sensitive correlated random walk: A new simulation model for movement. International Journal of Geographical Information Science 1–17. http://dx.doi.org/10.1080/13658816.2016.1224887. Albuquerque, M.T.D., Sanz, G., Oliveira, S.F., Martínez-Alegría, R., Antunes, I.M.H.R., 2013. Spatio-temporal groundwater vulnerability assessmentdA coupled remote sensing and GIS approach for historical land cover reconstruction. Water Resources Management 27, 4509–4526. http://dx.doi.org/10.1007/s11269-013-0422-0. An, L., Brown, D.G., 2008. Survival analysis in land change science: Integrating with GIScience to address temporal complexities. Annals of the Association of American Geographers 98, 323–344. http://dx.doi.org/10.1080/00045600701879045.

300

Space-Time GIS and Its Evolution

An, L., Tsou, M.-H., Crook, S.E.S., Chun, Y., Spitzberg, B., Gawron, J.M., Gupta, D.K., 2015. Space–time analysis: Concepts, quantitative methods, and future directions. Annals of the Association of American Geographers 105, 891–914. http://dx.doi.org/10.1080/00045608.2015.1064510. An, L., Zvoleff, A., Liu, J., Axinn, W., 2014. Agent-based modeling in coupled human and natural systems (CHANS): Lessons from a comparative analysis. Annals of the Association of American Geographers 104, 723–745. http://dx.doi.org/10.1080/00045608.2014.910085. Andrienko, N., Andrienko, G., 2006. Exploratory analysis of spatial and temporal data: A systematic approach. Springer Science & Business Media, Berlin. Baker, W.L., 1989. A review of models of landscape change. Landscape Ecology 2, 111–133. http://dx.doi.org/10.1007/BF00137155. Bar-Yam, Y., 2002. General features of complex systems. Encyclopedia of Life Support Systems (EOLSS). UNESCO and EOLSS Publishers, Oxford. Batty, M., 2013. The new science of cities. MIT Press, Cambridge, MA. Batty, M., 1997. Cellular automata and urban form: A primer. Journal of the American Planning Association 63, 266–274. Benhamou, S., 2004. How to reliably estimate the tortuosity of an animal’s path: Straightness, sinuosity, or fractal dimension? Journal of Theoretical Biology 229, 209–220. http:// dx.doi.org/10.1016/j.jtbi.2004.03.016. Berners-Lee, T., Cailliau, R., Groff, J., Pollermann, B., 1992. World Wide Web: The information universe. Internet Research 2, 52–58. http://dx.doi.org/10.1108/eb047254. Bhowmick, T., Griffin, A.L., MacEachren, A.M., Kluhsman, B.C., Lengerich, E.J., 2008. Informing geospatial toolset design: Understanding the process of cancer data exploration and analysis. Health Place 14, 576–607. http://dx.doi.org/10.1016/j.healthplace.2007.10.009. Blanford, J.I., Kumar, S., Luo, W., MacEachren, A.M., 2012. It’s a long, long walk: Accessibility to hospitals, maternity and integrated health centers in Niger. International Journal of Health Geographics 11, 24. http://dx.doi.org/10.1186/1476-072X-11-24. Blei, D.M., Lafferty, J.D., 2006. Dynamic topic models. In: Proceedings of the 23rd International Conference on Machine Learning. ACM, New York, pp. 113–120. Blei, D.M., Ng, A.Y., Jordan, M.I., 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research 3, 993–1022. Bothwell, J., Yuan, M., 2012. A spatiotemporal GIS framework applied to the analysis of changes in temperature patterns. Transactions in GIS 16, 901–919. http://dx.doi.org/ 10.1111/j.1467-9671.2012.01327.x. Brunsdon, C., Corcoran, J., Higgs, G., 2007. Visualising space and time in crime patterns: A comparison of methods. Computers, Environment and Urban Systems, Extracting Information from Spatial Datasets 31, 52–75. http://dx.doi.org/10.1016/j.compenvurbsys.2005.07.009. Christakos, G., Bogaert, P., Serre, M., 2012. Temporal GIS: Advanced functions for field-based applications. Springer Science & Business Media, New York. Claramunt, C., Thériault, M., 1995. Managing time in GIS: An event-oriented approach. In: Proceedings of the International Workshop on Temporal Databases: Recent Advances in Temporal Databases. Springer-Verlag, London, pp. 23–42. Clarke, K.C., Hoppen, S., Gaydos, L., 1997. A self-modifying cellular automaton model of historical urbanization in the San Francisco Bay area. Environment and Planning B Planning and Design 24, 247–261. http://dx.doi.org/10.1068/b240247. Cova, T.J., Church, R.L., 1997. Modelling community evacuation vulnerability using GIS. 
International Journal of Geographical Information Science 11, 763–784. Cova, T.J., Dennison, P.E., Kim, T.H., Moritz, M.A., 2005. Setting wildfire evacuation trigger points using fire spread modeling and GIS. Transactions in GIS 9, 603–617. http:// dx.doi.org/10.1111/j.1467-9671.2005.00237.x. Crooks, A.T., Croitoru, A., Jenkins, A., Mahabir, R., Agouris, P., Stefanidis, A., 2016. User-generated big data and urban morphology. Built Environment 42, 396–414. http:// dx.doi.org/10.2148/benv.42.3.396. Cutter, S.L., 2006. GI science, disasters and emergency management. Hazards, Vulnerability and Environmental Justice 399–406. De Longueville, B., Smith, R.S., Luraschi, G., 2009. “OMG, from here, I can see the flames!”: A use case of mining location based social networks to acquire spatio-temporal data on forest fires. In: Proceedings of the 2009 International Workshop on Location Based Social Networks, LBSN ’09. ACM, New York, pp. 73–80. http://dx.doi.org/10.1145/ 1629890.1629907. Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R., 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science 41, 391–407. http://dx.doi.org/10.1002/(SICI)1097-4571(199009)41:63.0.CO;2-9. Dodge, S., Weibel, R., Ahearn, S.C., Buchin, M., Miller, J.A., 2016. Analysis of movement data. International Journal of Geographical Information Science 30, 825–834. http:// dx.doi.org/10.1080/13658816.2015.1132424. Dodge, S., Weibel, R., Lautenschütz, A.-K., 2008. Towards a taxonomy of movement patterns. Information Visualization 7, 240–252. http://dx.doi.org/10.1057/ palgrave.ivs.9500182. Dummer, T.J.B., 2008. Health geography: Supporting public health policy and planning. CMAJ 178, 1177–1180. http://dx.doi.org/10.1503/cmaj.071783. Earman, J., 1989. World enough and spacetime. MIT Press, Cambridge, MA. Elhorst, J.P., 2012. Dynamic spatial panels: Models, methods, and inferences. Journal of Geographical Systems 14, 5–28. http://dx.doi.org/10.1007/s10109-011-0158-4. Emrich, C.T., Cutter, S.L., Weschler, P.J., 2011. GIS and emergency management. In: The SAGE handbook of GIS and society. Sage, London, pp. 321–343. Fotheringham, A., O’Kelly, M.E., 1989. Spatial interaction models: Formulations and applications. Kluwer Academic Publishers, Dordrecht. Fotheringham, A.S., Brunsdon, C., Charlton, M., 2003. Geographically weighted regression: The analysis of spatially varying relationships. Wiley, Chichester. Fotheringham, A.S., Crespo, R., Yao, J., 2015. Geographical and temporal weighted regression (GTWR). Geographical Analysis 47, 431–452. http://dx.doi.org/10.1111/ gean.12071. Galton, A., 2004. Fields and objects in space, time, and space-time. Spatial Cognition & Computation 4, 39–68. http://dx.doi.org/10.1207/s15427633scc0401_4. Gatalsky P, Andrienko N and Andrienko G (2004) Interactive analysis of event data using space-time cube. In: Eighth International Conference on Information Visualisation, 2004. IV 2004. Proceedings, pp. 145–152. http://dx.doi.org/10.1109/IV.2004.1320137. Gebbert, S., Pebesma, E., 2014. A temporal GIS for field based environmental modeling. Environmental Modelling & Software 53, 1–12. http://dx.doi.org/10.1016/ j.envsoft.2013.11.001. Golledge, R.G., Stimson, R., 1997. Spatial behavior: A geographic perspective. Guilford Press, New York. Goodchild, M.F., 2013. Prospects for a space–time GIS. Annals of the Association of American Geographers 103, 1072–1077. http://dx.doi.org/10.1080/ 00045608.2013.792175. 
Greene D and Cross JP (2016) Exploring the political agenda of the European parliament using a dynamic topic modeling approach. arXiv:1607.03055 [cs]. Groff, E., Weisburd, D., Morris, N.A., 2009. Where the action is at places: Examining spatio-temporal patterns of juvenile crime at places using trajectory analysis and GIS. In: Weisburd, D., Bernasco, W., Bruinsma, G.J.N. (Eds.), Putting crime in its place. Springer, New York, pp. 61–86. http://dx.doi.org/10.1007/978-0-387-09688-9_3. Guo, D., Chen, J., MacEachren, A.M., Liao, K., 2006. A visualization system for space-time and multivariate patterns (VIS-STAMP). IEEE Transactions on Visualization and Computer Graphics 12, 1461–1474. http://dx.doi.org/10.1109/TVCG.2006.84. Hägerstraand, T., 1970. What About People in Regional Science? Papers in Regional Science 24 (1), 7–24. http://dx.doi.org/10.1111/j.1435-5597.1970.tb01464.x. Heppenstall, A.J., Crooks, A.T., See, L.M., Batty, M. (Eds.), 2012. Agent-based models of geographical systems. Springer, Dordrecht. Höhle, M., 2007. Surveillance: An R package for the monitoring of infectious diseases. Computational Statistics 22, 571–582. http://dx.doi.org/10.1007/s00180-007-0074-8. Hornsby, K., Egenhofer, M.J., 2002. Modeling moving objects over multiple granularities. Annals of Mathematics and Artificial Intelligence 36, 177–194. http://dx.doi.org/10.1023/ A:1015812206586. Hornsby, K.S., Cole, S., 2007. Modeling moving geospatial objects from an event-based perspective. Transactions in GIS 11, 555–573. http://dx.doi.org/10.1111/j.14679671.2007.01060.x. Jacquez, G.M., Greiling, D.A., Durbeck, H., Estberg, L., Do, E., Long, A., Rommel, B., 2002. ClusterSeer user guide 2: Software for identifying disease clusters. TerraSeer Press, Ann Arbor.


Jjumba, A., Dragicevic, S., 2015. Integrating GIS-based geo-atom theory and voxel automata to simulate the dispersal of airborne pollutants. Transactions in GIS 19, 582–603. http://dx.doi.org/10.1111/tgis.12113.
Kapler, T., Wright, W., 2005. GeoTime information visualization. Information Visualization 4, 136–146. http://dx.doi.org/10.1057/palgrave.ivs.9500097.
Karssenberg, D., Schmitz, O., Salamon, P., de Jong, K., Bierkens, M.F.P., 2010. A software framework for construction of process-based stochastic spatio-temporal models and data assimilation. Environmental Modelling & Software 25, 489–502. http://dx.doi.org/10.1016/j.envsoft.2009.10.004.
Kraak, M.-J., Koussoulakou, A., 2005. A visualization environment for the space-time-cube. In: Developments in spatial data handling. Springer, Berlin and Heidelberg, pp. 189–200. http://dx.doi.org/10.1007/3-540-26772-7_15.
Kulldorff, M., Feuer, E.J., Miller, B.A., Freedman, L.S., 1997. Breast cancer clusters in the northeast United States: A geographic analysis. American Journal of Epidemiology 146, 161–170.
Kulldorff, M., Rand, K., Gherman, G., Williams, G., DeFrancesco, D., 1998. SaTScan v 2.1: Software for the spatial and space-time scan statistics. National Cancer Institute, Bethesda, MD.
Kwan, M.-P., 2013. Beyond space (as we knew it): Toward temporally integrated geographies of segregation, health, and accessibility. Annals of the Association of American Geographers 103, 1078–1086. http://dx.doi.org/10.1080/00045608.2013.792177.
Kwan, M.-P., 2004. GIS methods in time-geographic research: Geocomputation and geovisualization of human activity patterns. Geografiska Annaler: Series B, Human Geography 86, 267–280.
Kwan, M.-P., 1999. Gender and individual access to urban opportunities: A study using space–time measures. The Professional Geographer 51, 210–227. http://dx.doi.org/10.1111/0033-0124.00158.
Kwan, M.-P., 1998. Space-time and integral measures of individual accessibility: A comparative analysis using a point-based framework. Geographical Analysis 30, 191–216. http://dx.doi.org/10.1111/j.1538-4632.1998.tb00396.x.
Kwan, M.-P., Richardson, D., Wang, D., Zhou, C., 2015. Space-time integration in geography and GIScience: Research frontiers in the US and China. Springer, Dordrecht.
Langran, G., 1993. Issues of implementing a spatiotemporal system. International Journal of Geographical Information Systems 7, 305–314. http://dx.doi.org/10.1080/02693799308901963.
Langran, G., 1989. A review of temporal database research and its use in GIS applications. International Journal of Geographical Information Systems 3, 215–232. http://dx.doi.org/10.1080/02693798908941509.
Langran, G., Chrisman, N.R., 1988. A framework for temporal geographic information. Cartographica: The International Journal for Geographic Information and Geovisualization 25, 1–14. http://dx.doi.org/10.3138/K877-7273-2238-5Q6V.
Laube, P., Dennis, T., Forer, P., Walker, M., 2007. Movement beyond the snapshot: Dynamic analysis of geospatial lifelines. Computers, Environment and Urban Systems 31, 481–501. http://dx.doi.org/10.1016/j.compenvurbsys.2007.08.002.
Laube, P., van Kreveld, M., Imfeld, S., 2005. Finding REMO: Detecting relative motion patterns in geospatial lifelines. In: Developments in spatial data handling. Springer, Berlin and Heidelberg, pp. 201–215. http://dx.doi.org/10.1007/3-540-26772-7_16.
Leitner, M., Helbich, M., 2011. The impact of hurricanes on crime: A spatio-temporal analysis in the city of Houston, Texas. Cartography and Geographic Information Science 38, 213–221. http://dx.doi.org/10.1559/15230406382213.
Lian, M., Warner, R.D., Alexander, J.L., Dixon, K.R., 2007. Using geographic information systems and spatial and space-time scan statistics for a population-based risk analysis of the 2002 equine West Nile epidemic in six contiguous regions of Texas. International Journal of Health Geographics 6, 42. http://dx.doi.org/10.1186/1476-072X-6-42.
Liang, L., Xu, B., Chen, Y., Liu, Y., Cao, W., Fang, L., Feng, L., Goodchild, M.F., Gong, P., 2010. Combining spatial-temporal and phylogenetic analysis approaches for improved understanding on global H5N1 transmission. PLoS One 5, e13575. http://dx.doi.org/10.1371/journal.pone.0013575.
Long, J.A., Nelson, T.A., 2013. A review of quantitative methods for movement data. International Journal of Geographical Information Science 27, 292–318. http://dx.doi.org/10.1080/13658816.2012.682578.
Luo, N., An, L., Nara, A., Yan, X., Zhao, W., 2016. GIS-based multielement source analysis of dustfall in Beijing: A study of 40 major and trace elements. Chemosphere 152, 123–131. http://dx.doi.org/10.1016/j.chemosphere.2016.02.099.
Lyseen, A.K., Nøhr, C., Sørensen, E.M., Gudes, O., Geraghty, E.M., Shaw, N.T., Bivona-Tellez, C., 2014. A review and framework for categorizing current research and development in health related geographical information systems (GIS) studies. Yearbook of Medical Informatics 9, 110–124. http://dx.doi.org/10.15265/IY-2014-0008.
Meentemeyer, V., 1989. Geographical perspectives of space, time, and scale. Landscape Ecology 3, 163–173. http://dx.doi.org/10.1007/BF00131535.
Miller, H., 1991. Modelling accessibility using space-time prism concepts within geographical information systems. International Journal of Geographical Information Systems 5, 287–301. http://dx.doi.org/10.1080/02693799108927856.
Miller, H.J., 1999. Measuring space-time accessibility benefits within transportation networks: Basic theory and computational procedures. Geographical Analysis 31, 187–212. http://dx.doi.org/10.1111/j.1538-4632.1999.tb00976.x.
Miller, H.J., Bridwell, S.A., 2009. A field-based theory for time geography. Annals of the Association of American Geographers 99, 49–75. http://dx.doi.org/10.1080/00045600802471049.
Morrison, A.C., Getis, A., Santiago, M., Rigau-Perez, J.G., Reiter, P., 1998. Exploratory space-time analysis of reported dengue cases during an outbreak in Florida, Puerto Rico, 1991–1992. American Journal of Tropical Medicine and Hygiene 58, 287–298.
Nakaya, T., Yano, K., 2010. Visualising crime clusters in a space-time cube: An exploratory data-analysis approach using space-time kernel density estimation and scan statistics. Transactions in GIS 14, 223–239. http://dx.doi.org/10.1111/j.1467-9671.2010.01194.x.
Nara, A., Allen, C., Izumi, K., 2017. Surgical phase recognition using movement data from video imagery and location sensor data. In: Griffith, D.A., Chun, Y., Dean, D.J. (Eds.), Advances in geocomputation (Advances in geographic information science). Springer International Publishing, pp. 229–237. http://dx.doi.org/10.1007/978-3-319-22786-3_21.
Nara, A., Torrens, P., 2011. Trajectory data mining: Classification and spatio-temporal visualization of mobile objects. In: Cheng, T., Longley, T., Ellul, C., Chow, A.H.F. (Eds.), Proceedings of the 11th International Conference on GeoComputation. University College London, London, pp. 338–344.
Nara, A., Torrens, P.M., 2007. Spatial and temporal analysis of pedestrian egress behavior and efficiency. In: Proceedings of the 15th Annual ACM International Symposium on Advances in Geographic Information Systems, Seattle, WA, p. 59.
Neutens, T., 2015. Accessibility, equity and health care: Review and research directions for transport geographers. Journal of Transport Geography 43, 14–27. http://dx.doi.org/10.1016/j.jtrangeo.2014.12.006.
Niedzielski, M.A., O'Kelly, M.E., Boschmann, E.E., 2015. Synthesizing spatial interaction data for social science research: Validation and an investigation of spatial mismatch in Wichita, Kansas. Computers, Environment and Urban Systems 54, 204–218. http://dx.doi.org/10.1016/j.compenvurbsys.2015.09.004.
O'Callaghan, D., Greene, D., Carthy, J., Cunningham, P., 2015. An analysis of the coherence of descriptors in topic modeling. Expert Systems with Applications 42, 5645–5657. http://dx.doi.org/10.1016/j.eswa.2015.02.055.
Omer, S.B., Enger, K.S., Moulton, L.H., Halsey, N.A., Stokley, S., Salmon, D.A., 2008. Geographic clustering of nonmedical exemptions to school immunization requirements and associations with geographic clustering of pertussis. American Journal of Epidemiology 168, 1389–1396.
Openshaw, S., 1984. The modifiable areal unit problem. CATMOG 38. GeoBooks, Norwich.
O'Sullivan, D., 2002. Toward micro-scale spatial modeling of gentrification. Journal of Geographical Systems 4, 251–274.
Parent, C., Spaccapietra, S., Renso, C., Andrienko, G., Andrienko, N., Bogorny, V., Damiani, M.L., Gkoulalas-Divanis, A., Macedo, J., Pelekis, N., Theodoridis, Y., Yan, Z., 2013. Semantic trajectories modeling and analysis. ACM Computing Surveys 45, 42:1–42:32. http://dx.doi.org/10.1145/2501654.2501656.
Pelekis, N., Theodoulidis, B., Kopanakis, I., Theodoridis, Y., 2004. Literature review of spatio-temporal database models. Knowledge Engineering Review 19, 235–274. http://dx.doi.org/10.1017/S026988890400013X.


Perez, L., Dragicevic, S., 2009. An agent-based approach for modeling dynamics of contagious disease spread. International Journal of Health Geographics 8, 50. http://dx.doi.org/10.1186/1476-072X-8-50.
Peuquet, D.J., 1994. It's about time: A conceptual framework for the representation of temporal dynamics in geographic information systems. Annals of the Association of American Geographers 84, 441–461. http://dx.doi.org/10.1111/j.1467-8306.1994.tb01869.x.
Peuquet, D.J., Duan, N., 1995. An event-based spatiotemporal data model (ESTDM) for temporal analysis of geographical data. International Journal of Geographical Information Systems 9, 7–24. http://dx.doi.org/10.1080/02693799508902022.
Portugali, J., Benenson, I., Omer, I., 1997. Spatial cognitive dissonance and sociospatial emergence in a self-organizing city. Environment and Planning B: Planning and Design 24, 263–285. http://dx.doi.org/10.1068/b240263.
Raper, J., Livingstone, D., 1995. Development of a geomorphological spatial model using object-oriented design. International Journal of Geographical Information Systems 9, 359–383. http://dx.doi.org/10.1080/02693799508902044.
Reinhardt, M., Elias, J., Albert, J., Frosch, M., Harmsen, D., Vogel, U., 2008. EpiScanGIS: An online geographic surveillance system for meningococcal disease. International Journal of Health Geographics 7, 33. http://dx.doi.org/10.1186/1476-072X-7-33.
Rey, S.J., Janikas, M.V., 2006. STARS: Space–time analysis of regional systems. Geographical Analysis 38, 67–86. http://dx.doi.org/10.1111/j.0016-7363.2005.00675.x.
Robertson, C., Nelson, T.A., 2010. Review of software for space-time disease surveillance. International Journal of Health Geographics 9, 16. http://dx.doi.org/10.1186/1476-072X-9-16.
Shaw, S.-L., Yu, H., Bombom, L.S., 2008. A space-time GIS approach to exploring large individual-based spatiotemporal datasets. Transactions in GIS 12, 425–441. http://dx.doi.org/10.1111/j.1467-9671.2008.01114.x.
Shiode, N., Shiode, S., Rod-Thatcher, E., Rana, S., Vinten-Johansen, P., 2015. The mortality rates and the space-time patterns of John Snow's cholera epidemic map. International Journal of Health Geographics 14, 21. http://dx.doi.org/10.1186/s12942-015-0011-y.
Shiode, S., 2011. Street-level spatial scan statistic and STAC for analysing street crime concentrations. Transactions in GIS 15, 365–383. http://dx.doi.org/10.1111/j.1467-9671.2011.01255.x.
Snow, J., 1855. On the mode of communication of cholera. John Churchill, London.
Spaccapietra, S., Parent, C., Damiani, M.L., de Macedo, J.A., Porto, F., Vangenot, C., 2008. A conceptual view on trajectories. Data and Knowledge Engineering 65, 126–146. http://dx.doi.org/10.1016/j.datak.2007.10.008.
Stevenson, J.R., Emrich, C.T., Mitchell, J.T., Cutter, S.L., 2010. Using building permits to monitor disaster recovery: A spatio-temporal case study of coastal Mississippi following hurricane Katrina. Cartography and Geographic Information Science 37, 57–68. http://dx.doi.org/10.1559/152304010790588052.
Steyvers, M., Griffiths, T., 2007. Probabilistic topic models. In: Landauer, T., McNamara, D., Dennis, S., Kintsch, W. (Eds.), Handbook of latent semantic analysis. Lawrence Erlbaum Associates, Mahwah, NJ, pp. 427–448.
Torrens, P.M., 2015. Intertwining agents and environments. Environmental Earth Sciences 74, 7117–7131. http://dx.doi.org/10.1007/s12665-015-4738-3.
Torrens, P.M., 2008. A toolkit for measuring sprawl. Applied Spatial Analysis and Policy 1, 5–36.
Torrens, P.M., 2006. Simulating sprawl. Annals of the Association of American Geographers 96, 248–275.
Torrens, P.M., Benenson, I., 2005. Geographic automata systems. International Journal of Geographical Information Science 19, 385–412.
Torrens, P.M., Nara, A., 2007. Modeling gentrification dynamics: A hybrid approach. Computers, Environment and Urban Systems 31, 337–361. http://dx.doi.org/10.1016/j.compenvurbsys.2006.07.004.
Torrens, P.M., Nara, A., Li, X., Zhu, H., Griffin, W.A., Brown, S.B., 2012. An extensible simulation environment and movement metrics for testing walking behavior in agent-based models. Computers, Environment and Urban Systems 36, 1–17.
Tsou, M.-H., 2015. Research challenges and opportunities in mapping social media and big data. Cartography and Geographic Information Science 42, 70–74. http://dx.doi.org/10.1080/15230406.2015.1059251.
Tsou, M.-H., Leitner, M., 2013. Visualization of social media: Seeing a mirage or a message? Cartography and Geographic Information Science 40, 55–60. http://dx.doi.org/10.1080/15230406.2013.776754.
Vadrevu, K.P., 2008. Analysis of fire events and controlling factors in eastern India using spatial scan and multivariate statistics. Geografiska Annaler: Series A, Physical Geography 90, 315–328. http://dx.doi.org/10.1111/j.1468-0459.2008.00348.x.
Wachowicz, M., 2003. Object-oriented design for temporal GIS. CRC Press, Boca Raton.
Wang, Z., Ye, X., Tsou, M.-H., 2016. Spatial, temporal, and content analysis of Twitter for wildfire hazards. Natural Hazards 83, 523–540. http://dx.doi.org/10.1007/s11069-016-2329-6.
Weuve, J., Puett, R.C., Schwartz, J., Yanosky, J.D., Laden, F., Grodstein, F., 2012. Exposure to particulate air pollution and cognitive decline in older women. Archives of Internal Medicine 172, 219–227. http://dx.doi.org/10.1001/archinternmed.2011.683.
White, R., Engelen, G., 2000. High-resolution integrated modelling of the spatial dynamics of urban and regional systems. Computers, Environment and Urban Systems 24, 383–400. http://dx.doi.org/10.1016/S0198-9715(00)00012-0.
Worboys, M.F., 1994a. A unified model for spatial and temporal information. The Computer Journal 37, 26–34. http://dx.doi.org/10.1093/comjnl/37.1.26.
Worboys, M.F., 1994b. Object-oriented approaches to geo-referenced information. International Journal of Geographical Information Systems 8, 385–399. http://dx.doi.org/10.1080/02693799408902008.
Wrigley, N., Holt, T., Steel, D., Tranmer, M., 1996. Analysing, modelling, and resolving the ecological fallacy. In: Longley, P., Batty, M. (Eds.), Spatial analysis: Modelling in a GIS environment. John Wiley & Sons, New York, pp. 23–40.
Yamada, I., Rogerson, P.A., Lee, G., 2009. GeoSurveillance: A GIS-based system for the detection and monitoring of spatial clusters. Journal of Geographical Systems 11, 155–173. http://dx.doi.org/10.1007/s10109-009-0080-1.
Yang, C., Huang, Q., Li, Z., Liu, K., Hu, F., 2016. Big data and cloud computing: Innovation opportunities and challenges. International Journal of Digital Earth, 1–41. http://dx.doi.org/10.1080/17538947.2016.1239771.
Yu, H., Shaw, S.-L., 2008. Exploring potential human activities in physical and virtual spaces: A spatiotemporal GIS approach. International Journal of Geographical Information Science 22, 409–430. http://dx.doi.org/10.1080/13658810701427569.
Yu, H., Shaw, S.-L., 2007. Revisiting Hägerstrand's time-geographic framework for individual activities in the age of instant access. In: Miller, H.J. (Ed.), Societies and cities in the age of instant access (The GeoJournal Library). Springer, pp. 103–118. http://dx.doi.org/10.1007/1-4020-5427-0_7.
Yuan, M., 2016. Space-time GIS. In: Warf, B. (Ed.), Oxford Bibliographies in Geography. Oxford University Press, New York. http://dx.doi.org/10.1093/obo/9780199874002-0142.
Yuan, M., 1999. Representing geographic information to enhance GIS support for complex spatiotemporal queries. Transactions in GIS 3, 137–160.
Yuan, M., 1996. Modeling semantical, temporal and spatial information in geographic information systems. In: Geographic information research: Bridging the Atlantic. Taylor & Francis, London, pp. 334–347.
Yuan, M., Nara, A., 2015. Space-time analytics of tracks for the understanding of patterns of life. In: Kwan, M.-P., Richardson, D., Wang, D., Zhou, C. (Eds.), Space-time integration in geography and GIScience: Research frontiers in the US and China. Springer, Dordrecht, pp. 373–398.
Yuan, M., Nara, A., Bothwell, J., 2014. Space–time representation and analytics. Annals of GIS 20, 1–9.

1.21 Time Geography

Jie Dai, San Diego State University, San Diego, CA, United States; and University of California, Santa Barbara, CA, United States Li An, San Diego State University, San Diego, CA, United States © 2018 Elsevier Inc. All rights reserved.

1.21.1 Time in Geographical Research
1.21.2 The Notion of Time
1.21.3 Geographical Research Methods Involving Time
1.21.3.1 Classical Time Geography
1.21.3.2 Space–Time Analysis
1.21.3.2.1 Spatial panel data visualization and analysis
1.21.3.2.2 Spatial panel data regression
1.21.3.3 Survival Analysis
1.21.3.4 Spatial Latent Trajectory Models
1.21.3.5 Spatial Markov Chains Models
1.21.3.6 Cellular Automaton
1.21.3.7 Agent-Based Simulation
1.21.4 Conclusion and Discussion
1.21.4.1 Tight Integration of Space and Time
1.21.4.2 Span and Granularity of Time
1.21.4.3 Imprecision of Time Measurement
1.21.5 Future Directions
References

1.21.1 Time in Geographical Research

Traditional geographical research usually explores where, and to a large extent why, events and phenomena occur, and the notion of space is at the core of what we consider the discipline of geography. Seldom (if ever) do geographers shed light on the notion of time, considering when and for how long those events and phenomena take place. When temporal issues are taken into consideration, it is usually to examine how the subjects or phenomena of interest change over time. Nevertheless, even when time functions as a significant factor in the process, the traditional approach is to work through a "snapshot model," in which several static snapshots of the subjects or phenomena of interest are captured so as to study temporal trends and/or the mechanisms behind such trends (An and Brown, 2008; An et al., 2015; Armstrong, 1988). The objective of this review is to examine how time has been, and can be, incorporated in time geography and, more broadly, in geographical research; what research methods have been employed; and what other methods are promising for this purpose. We conclude the review with future directions for time geography.

1.21.2 The Notion of Time

The classic dichotomy of space is between absolute space and relative space (Cresswell, 2013; Dainton, 2001; Hinckfuss, 1974). The former usually uses longitude and latitude (or x and y coordinates, sometimes with a z coordinate for elevation) defined in a certain geographical coordinate system, while the latter locates an anchor/origin in geographical space as a reference, with all other locations defined by their offset from that origin. Time can be defined analogously: absolute time is represented by the exact reading of a clock or calendar (e.g., second, minute, hour, date, month, and year), and relative time is defined as the length of time before, or elapsed since, a specific point in the temporal domain.

1.21.3 Geographical Research Methods Involving Time

The related literature suggests that time can play three roles in geographical research. First, time can be considered a specific point or interval in the temporal domain that is itself the core issue of interest. Example research questions are "When will things occur, and for how long?" Second, time can be treated as a constraint, which may have a dominant effect on the occurrence of events. Research questions of this kind include "Given a certain amount of time, what will happen? Where can I go? And what can I do?" In the last scenario, time acts as an explanatory factor for events and phenomena. Most space–time geographical research involves time in this way, from pattern revelation and prediction to time-series analysis of spatial objects. Example questions are "How will the extent of a phenomenon change through time?" or "Where will the object(s) of interest be located at certain times?"


Before exploring potential research methods that consider temporal issues, it is helpful to mention the different data types involved in time geography and generic geographical research (An et al., 2015). The first category is individual movement data. The individuals of interest can be animals, people, vehicles, etc. that move across geographical space, and the data record information about their movement, for instance the time and position along each trajectory. The second category is so-called spatial panel data, which usually consist of georeferenced cross-sectional units observed at multiple times; immobile pixels, parcels, polygons, or data collection sites are good examples. Besides these two major types, another distinct category is event or transaction data, which are usually nominal, indicating whether an event or phenomenon occurs at a certain time and location. In the following sections, we introduce some typical conceptual frameworks and/or methods that incorporate time in geographical research; the methods reviewed here draw on some or all of these data types.

1.21.3.1 Classical Time Geography

Classical time geography essentially deals with individual movements over space and time. Accordingly, its methods often draw on individual movement data, and time usually acts as a constraint on each individual's schedule. The seminal article in time geography, "What about people in regional science?" (Hagerstrand, 1970), laid the foundation for classical time geography. Several key concepts are illustrated here with the help of hypothetical situations.

Consider this scenario: after finishing the day's work, a person leaves her office at 5 pm. She decides to be home no later than 6 pm, and she wants to stop by a grocery store to purchase some food for dinner. It takes her about 15 minutes to get to the store and about half an hour to do the shopping. Finally, she gets home on time. Within this scenario, there are three places, or anchors, involved in the space domain: work place, grocery store, and home. The time span is from 5 to 6 pm. The movement of the person and her time arrangement can be depicted as a space–time path (Fig. 1).

Sometimes the schedule of this person can be very flexible, but in this scenario there are constraints that shape, or at least affect, it. First, since work does not end until 5 pm, she is not supposed to leave before then. This kind of constraint is referred to as an authority constraint, under which a person must obey rules and events are under the control of a given individual, group, or organization with authority. For example, you can only visit a museum during its open hours, and you are not allowed to light a cigarette in smoke-free areas. The second category, capability constraints, relates to a person's "biological construction and/or the tools he can command" (Hagerstrand, 1970). In our scenario, it takes the person 15 minutes to travel from her work place to the grocery store. She can either walk, which involves no waiting, or take the bus, which requires waiting several minutes but travels faster; either way, she cannot get to the store in less than 15 minutes. The third family of constraints, coupling constraints, occurs when one person has to join another person, entity, or tool to accomplish a goal. For example, she decides to work out after dinner and contacts her friend, who is not available until 8 pm; unless she wants to go to the gymnasium alone, she has to schedule for 8 pm or later.

Consider the same person on another day: she decides to take her time after work and to get home no later than 6:30 pm. With this 1.5-hour time budget, she has many choices of activities in different places, and she can also decide how much time to invest in those activities during the interval. The space–time prism delimits the potential space and time available given her time budget (Fig. 2). A space–time prism can be considered an aggregation of all possible space–time paths. Its projection onto the geographical space (the gray area in Fig. 2) is the potential path space, the collection of all possible places this person can visit given the 1.5-hour time budget.

One problem with the classical time geography model is that it assumes equal accessibility to all geographical units. In this conceptually simple model, travel velocity is usually constant, so the Euclidean distance between any two locations divided by the velocity gives the commuting time between them. Under this assumption, if the time budget is fixed, the longest possible travel distance (denoted as Dmax) will also be a constant.

Fig. 1 Example of a space–time path (time of day, 5:00–6:15 pm, plotted against geographical space; anchors: Work, Grocery, Home).

Fig. 2 Example of a space–time prism (time of day, 5:00–7:00 pm, against geographical space between the Work and Home anchors; the 1.5-h total time budget bounds the prism, activity time is spent within it, and its projection onto the plane is the potential path space).

The calculation of the potential path space (Fig. 2) then becomes a simple geometry problem: we are looking for the points in geographical space for which the sum of the distances to the two anchor locations is no larger than Dmax. The answer is an ellipse with the two anchors as its foci. This can be a dangerous assumption to make, especially in developed areas, where the actual travel distance is defined by the route connecting the two locations and the speed is determined by factors such as traffic and the mode of transportation. To make the geographical space more realistic in the original model, we can introduce the road network and assign speed limits to the roads. Even more complicated road conditions, such as one-way streets and whether U-turns are allowed, can also be represented. With the help of modern GIS software such as ArcGIS, we can conduct a network analysis and derive the potential path tree (Fig. 3), which includes all potential paths that can be taken given the time budget.
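Under the constant-velocity assumption, testing whether a location falls inside the potential path space reduces to the ellipse condition above. A minimal Python sketch (the coordinates, budget, and velocity are hypothetical):

    import numpy as np

    def in_potential_path_space(p, anchor_a, anchor_b, t_budget_h, v_kmh):
        """Return True if point p is reachable between the two anchors
        within the time budget, assuming straight-line travel at a
        constant velocity. All coordinates are planar, in kilometers."""
        d_max = v_kmh * t_budget_h  # longest possible travel distance
        d = np.linalg.norm(p - anchor_a) + np.linalg.norm(p - anchor_b)
        return d <= d_max  # interior of an ellipse with the anchors as foci

    work = np.array([0.0, 0.0])  # hypothetical anchor coordinates
    home = np.array([4.0, 0.0])
    print(in_potential_path_space(np.array([2.0, 1.5]), work, home,
                                  t_budget_h=1.5, v_kmh=5.0))  # True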


Fig. 3 Examples of a road network (A), potential path tree (B), the network-based TGDE for two (C) and three (D) anchors. The red stop signs are the anchors in the geographical space. Darker blue networks indicate higher probability of visiting. (C and D) From Downs, J. A. and Horner, M. W. (2012). Probabilistic potential path trees for visualizing and analyzing vehicle tracking data. Journal of Transport Geography 23, 72–80.


Although in theory the person can visit any part of the tree, in reality the probability that a particular part of the tree is visited may vary with the nodes and topology of the road network. Utilizing vehicle-tracking data, Downs and Horner (2012) developed network-based time-geographic density estimation (TGDE) to evaluate these probabilities. Network-based TGDE is adapted from traditional kernel density estimation: it replaces the Euclidean distance in the kernel's decay function with a network-based travel time. The results of applying this function to the potential path tree are shown in Fig. 3. Panel (A) is a hypothetical road network with maximum travel velocities assigned. Given two anchors in the network, panel (B) depicts the potential path tree, which contains the roads accessible by a vehicle within the time interval. Panel (C) applies the probabilistic TGDE to the potential path tree, in which darker blue roads are more likely to be visited than lighter blue ones, and panel (D) shows the results for a three-anchor case.
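The core of network-based TGDE is a distance-decay kernel evaluated on travel times rather than Euclidean distances. The sketch below uses a simple linear decay on the unused time budget; this kernel choice is illustrative only and is not Downs and Horner's exact formulation:

    import numpy as np

    def tgde_density(t_from_a, t_to_b, t_budget):
        """Relative visit density for network locations, given precomputed
        shortest travel times from anchor A (t_from_a) and to anchor B
        (t_to_b). Locations whose total A->x->B travel time exceeds the
        budget get zero density; otherwise density decays linearly."""
        slack = t_budget - (t_from_a + t_to_b)
        return np.where(slack >= 0, slack / t_budget, 0.0)

    # Hypothetical shortest travel times (minutes) for five network locations
    t_a = np.array([5, 10, 20, 30, 50])
    t_b = np.array([40, 30, 15, 25, 20])
    print(tgde_density(t_a, t_b, t_budget=60))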

1.21.3.2 Space–Time Analysis

Traditional time-series analysis looks at the temporal changes of a certain phenomenon or a set of entities of interest without explicit consideration of the related spatial locations. Measurements are recorded at varying temporal intervals to capture changes through time; once we know the underlying temporal principle(s) or function(s), we can make predictions for a future time point or interval. In this case, time is usually the only explanatory independent variable in the statistical analysis. With spatial factor(s) involved, space–time analysis examines, most often though not exclusively, how the phenomena or entities of interest unfold and evolve over time. Typical applications include the prediction of tidal heights and rainfall in atmospheric and oceanic research, and the generation of terrestrial phenological metrics from remotely sensed satellite imagery.

1.21.3.2.1 Spatial panel data visualization and analysis

Spatial panel data largely refer to temporal sequences of snapshots, temporal sequences of polygon coverages, or time series of multidimensional data with locational information (An et al., 2015). We classify the related work into space–time pattern visualization and explanation/prediction for presentation convenience, though the two are often intertwined. We do not elaborate on software tools, especially those designed to perform spatial analysis alone, because of space limits as well as our emphasis on the ability to handle temporal variability. It is worth mentioning, however, that software development is a top-priority area for spatial panel data visualization and analysis. Several noncommercial packages such as STARS (Space–Time Analysis of Regional Systems, open source; discussed below) and CrimeStat (freely distributed for educational or research purposes, but owned by Ned Levine and Associates) have strong capabilities for space–time clustering, diffusion, and interaction (Levine, 2004). Increasingly, commercial GIS packages include space–time metrics, tools, and techniques for pattern visualization and explanation/prediction, such as visualizing events in three dimensions and performing space–time hot spot analysis in ArcGIS (ESRI, 2016). Also worthy of mention is the Spatial-Temporal Analysis of Moving Polygons (STAMP) program, designed and implemented as an ArcGIS toolbar, which can be used to generate graphs, calculate measures, and summarize space–time histories (Robertson et al., 2007).

For pattern visualization, geovisualization has long been studied with a multitude of accomplishments; excellent overviews by Dykes et al. (2005) and Nöllenburg (2007) are available for readers with interest in this topic. Pattern visualization has been the focus of a range of studies and projects worth mentioning. The STARS package was designed for exploratory analysis of spatial panel data (Rey and Janikas, 2006) and has since evolved into PySAL, a vector-data-oriented tool for spatial data analysis and geocomputation. In addition to visualizing spatial data over time, STARS provides a set of geocomputational methods to calculate metrics such as global and local Moran's I. Along this line, the ArcGIS-based Extended Time-Geographic Framework Tools aim to generate aggregate-level metrics and visualization graphs (Shaw et al., 2008). The Spatio-Temporal Moving Average and Correlated Walk Analysis make it possible to track the movement of the mean location of an event/feature, opening the door to predicting the location and timing of certain events or features (Levine, 2004). Similarly, developing space–time metrics (e.g., Anselin, 1995; Delmelle et al., 2014; Leibovici et al., 2014; Levine, 2004; Rey et al., 2005; Ye and Carroll, 2011), performing temporal queries and dynamic navigation (e.g., Lee et al., 2014), and identifying spatiotemporal clusters (or hotspots) in a GIS environment represent recent progress in pattern detection and visualization. In particular, recent years have witnessed the development of probability density maps of space–time hotspots based on individual-level movement data (e.g., GPS measurements; Scheepens et al., 2014; Scholz and Lu, 2014). Other studies explore methods for determining correlation in both time and space, including space–time covariance structures (Guttorp et al., 1994) and spatial groupings in a GIS environment (Rouhani and Wackernagel, 1990). Readers with interest are referred to An et al. (2015).
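Several of the metrics above are available in open-source form; for instance, global and local Moran's I for one time slice of a spatial panel can be computed with the PySAL family of packages. A sketch assuming the libpysal and esda packages, with placeholder random data:

    import numpy as np
    from libpysal.weights import lat2W
    from esda.moran import Moran, Moran_Local

    # One time slice of a spatial panel on a 10 x 10 lattice (placeholder)
    y = np.random.default_rng(42).random(100)

    w = lat2W(10, 10)      # rook-contiguity weights for the lattice
    w.transform = "r"      # row-standardize the weights

    mi = Moran(y, w)       # global spatial autocorrelation
    print(mi.I, mi.p_sim)  # statistic and permutation-based pseudo p-value

    lisa = Moran_Local(y, w)  # local indicators of spatial association
    print(lisa.Is[:5])        # local Moran's I for the first five cells

Repeating such a computation for each time slice is a simple way to track how spatial clustering strengthens or weakens over time.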

1.21.3.2.2 Spatial panel data regression

We now turn to the explanation or prediction of space–time data. Panel regression models, largely available in the toolbox of space–time analysis, are regression models that make use of panel data. Such models are diverse and serve multiple purposes; major ones include space–time autoregressive models (without exogenous variables), multivariate space–time regression models (with exogenous variables), and other variants. Panel regression models are unique because of their consideration of (1) temporal autocorrelation (e.g., Elhorst, 2012), (2) spatial autocorrelation, and/or (3) both spatial and temporal autocorrelation (An et al., 2015). A large number of models have been developed by econometricians (e.g., fixed and random effects models) in the last decade or
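For concreteness, one widely used dynamic spatial panel specification (a generic form following the literature Elhorst (2012) reviews; the symbols are not tied to any particular study) is

    y_{it} = \tau y_{i,t-1} + \rho \sum_{j} w_{ij} y_{jt}
           + \eta \sum_{j} w_{ij} y_{j,t-1}
           + \mathbf{x}_{it}' \boldsymbol{\beta} + \mu_i + \gamma_t + \varepsilon_{it},

where w_{ij} are spatial weights, \tau captures temporal autocorrelation, \rho contemporaneous spatial autocorrelation, \eta space–time diffusion, and \mu_i and \gamma_t are space and time fixed effects.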


so (e.g., Elhorst, 2012; Lee and Yu, 2010). Readers with interest are referred to the excellent reviews by Lee and Yu (2010) and Elhorst (2012) on panel regression models, and by An et al. (2015) on space–time analysis overall. There is a family of autoregressive models in which the space–time processes are assumed to be stationary (or near stationary) in both space and time; such models include space–time autoregressive, space–time moving average, space–time autoregressive moving average, and space–time autoregressive integrated moving average models. However, spatial or temporal stationarity may not hold in many time-geographical applications, so efforts have been invested in dealing with potential nonstationarity (e.g., Cheng et al., 2011). Based on the geographically weighted regression framework, the geographically and temporally weighted autoregressive model developed by Wu et al. (2014) is a promising approach to handling temporal nonstationarity and spatial autocorrelation simultaneously; it creatively employs a linear combination of spatial and temporal distances for all the space–time points in a spatiotemporal weights matrix.

Following An et al. (2015), we define multivariate space–time regression models as panel regression models with exogenous (or independent) variables. These models often decompose the dependent variable of interest into some function of site-dependent, time-dependent, and/or site–time interaction terms (e.g., Assuncao et al., 2001; Lophaven et al., 2004; Natvig and Tvete, 2007). Alternatively, a latent space–time process can be assumed to generate the observed data, which are also subject to some random, unknown perturbation (Cheng et al., 2011). The latent space–time process can be determined by a set of site- or time-related variables, which may be justified by the related theory or literature.

Multivariate space–time regression models may involve a large number of parameters, which presents a daunting task for data collection and limits the usefulness (generality) of such models. In such instances, modelers often choose a (much) smaller number of parameters, assuming they follow some statistical distribution(s). Such parameters can be estimated using the Bayesian space–time approach. For instance, modelers often represent the dependent variable of interest at site i and time t as a function (f) of location, time, and (sometimes) a set of covariates (e.g., Assuncao et al., 2001; Furrer et al., 2007; Natvig and Tvete, 2007). Prior knowledge about the phenomena or events of interest is employed in estimating the associated parameters. The Bayesian approach, though very useful, suffers from high complexity and high computational intensity (Biggeri and Martuzzi, 2003). Some assumptions (e.g., certain parametric functions linking the dependent variable to time and other independent variables) may be questionable and lack clear guidelines, which warrants more research. Similar challenges arise in choosing appropriate prior and posterior distributions. If the phenomenon or process of interest is too complex, researchers often consider decomposing it into several hierarchical processes or components; this pursuit largely represents the hierarchical Bayesian approach to space–time analysis. Nail et al. (2011) present an example of this type of approach, in which they decompose the ozone level at a certain site and time into two components: local emissions and regional transport. The local ozone level within a parcel of air is explained by a function of the amounts of NOx and volatile organic compounds (VOCs), the composition of VOCs, and the maximum daily temperature in the same time interval. At the regional level, the transported ozone level is a function of the weighted average of ozone observations at the previous time step.

Hitherto our focus has been on mainstream multivariate space–time regression models. Below we turn to several nontraditional methods or models chosen for their high analytical power for spatial panel data analysis. These methods, if employed appropriately, will very likely complement the abovementioned panel regression models.

1.21.3.3 Survival Analysis

Adopted from the social sciences and public health, survival analysis (also called event history analysis) incorporates time as the core research objective by examining the occurrence and timing of events or phenomena (An and Brown, 2008; An et al., 2011). The data can be individual movement data, spatial panel data, or transaction data (An et al., 2015). There are two basic concepts in survival analysis. One is the hazard rate, the instantaneous risk that an event occurs at a certain time, given that the individual survives to that time; the change of the hazard rate through time is expressed through the hazard function. The other is the survival probability, the probability that the individual will survive to a certain time of interest. The hazard rate can usually be related to a suite of explanatory variables that are time dependent (changing through time, like age) or time independent (constant through time, like gender); thus the hazard rate can fluctuate as time elapses. The survival probability equals the exponential of the negative integral of the hazard function, so the survival function is always monotonic: survival probabilities either decrease or stay constant.

Survival analysis tracks the history of individuals (people, machines, animals, etc.) until the event of interest comes up; by its very nature, it deals with identifiable entities or objects (An and Brown, 2008; An et al., 2011). Survival analysis is relevant because it predicts or explains (at least partially) at what time (or in what time interval) a certain event or phenomenon may occur, while simultaneously examining the surrounding environment through the values of the time-dependent and/or time-independent variables. Equally (if not more) importantly, survival analysis allows the use of censored timing data, in which knowledge exists only about the earliest time, the latest time, or the time interval in which an event may have happened. This handles more elegantly the time-imprecision problem that besets many geographical data (such as the spatial panel data mentioned above) and the subsequent data analysis (An and Brown, 2008). Traditional survival analysis models are often applied in the social and medical sciences to handle changes of status (e.g., marriage, divorce or end of marriage, child bearing, and death of patients), but have seldom been applied in geographic research, with very few exceptions (Coomes et al., 2000; Irwin and Bockstael, 2002; Vance and Geoghegan, 2002).
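In symbols (the standard survival-analysis definitions for an event time T, rather than anything specific to the geographic studies cited here):

    h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t},
    \qquad
    S(t) = P(T > t) = \exp\!\left( -\int_0^t h(u)\, du \right).

Because the cumulative hazard \int_0^t h(u)\,du is nondecreasing, S(t) can only decrease or stay constant.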


An and Brown (2008) and An et al. (2011) innovatively extend traditional survival analysis to the geographic and land-change arena, treating land parcels as units of analysis and exploring what variables may help explain a certain land unit's survival (remaining as is) or change of status (i.e., change in land use type).

1.21.3.4 Spatial Latent Trajectory Models

Similar to traditional time-series analysis, latent trajectory models (LTMs) look at the changes in a specific object or phenomenon of interest through time. Unlike traditional regression models that fit relationships based on all data points, LTMs fit a quantitative trajectory to the repeated measurements of each study unit over time. The assumption is that the change over time can be described by a suite of parameters (e.g., intercept and slope), and that these parameters can be explained (and in many instances predicted) by a set of site-specific independent variables. The underlying assumption is that the temporal changes of all subjects (units) follow a certain latent trajectory, and observed data derive from this latent trajectory under some (spatially or temporally local) perturbations or deviations. In practice, the trajectory could be linear, quadratic, exponential, cosine or sine (for periodical phenomena such as the seasonal change of canopy cover in deciduous forests), and the like. Traditional LTMs are often applied in the social and public health sciences to handle longitudinal data such as students' test scores over time and drug use behavior (Guo and Hipps, 2004). Combined with a spatial term that accounts for neighboring effects (Tiefelsdorf and Griffith, 2007), LTMs can be used to interpret temporal trends in spatial panel data, such as the intensity of online keyword searches for "climate change" across the United States (An et al., 2016a) and the space–time change of body mass index (a measure of obesity) in four Ghanaian regions (Crook et al., 2016).
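A minimal two-level sketch of a spatial LTM (the notation is illustrative; the spatial lag term stands in for the neighboring-effects component cited above):

    y_{it} = \alpha_i + \beta_i t + \lambda \sum_{j} w_{ij} y_{jt} + \varepsilon_{it},
    \qquad
    \alpha_i = a_0 + \mathbf{a}_1' \mathbf{z}_i + u_i, \quad
    \beta_i = b_0 + \mathbf{b}_1' \mathbf{z}_i + v_i,

where each unit i carries its own latent intercept \alpha_i and slope \beta_i, which are in turn explained by site-specific covariates \mathbf{z}_i, and w_{ij} are spatial weights.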

1.21.3.5 Spatial Markov Chains Models

Traditional Markov chain models aim to predict the status of an object (or a set of objects) or phenomenon at future times. There are several statuses to which the object(s) or phenomenon of interest can be subject, and the probability of changing from one status to another is derived from empirical observations. The principle is relatively simple: between adjacent time intervals, the transition probability between any particular pair of statuses is constant. Spatial Markov chain models address the change of spatial units, such as land cover/use change at different times. This method applies the traditional "snapshot" data model and examines temporal trends of change. What distinguishes spatial Markov chain models from regular ones is the consideration of spatial dependence (e.g., through a spatial lag or spatial weights matrix; Anselin, 2003) among nearby units. Despite the many strengths of spatial Markov chain models (in particular, they offer a simple methodology for exploring spatiotemporal changes), a number of drawbacks are also noteworthy, such as the questionable assumption of stationarity. For more detail, see An and Brown (2008), Iacono et al. (2012), and An et al. (2015).
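The mechanics can be sketched in a few lines: estimate one row-normalized transition matrix per spatial-lag class, so that a unit's transition probabilities depend on the status of its neighbors. The two-status data below are hypothetical and purely illustrative:

    import numpy as np

    def spatial_markov(status_t0, status_t1, lag_class, n_status, n_lag):
        """Estimate one row-normalized transition matrix per spatial-lag
        class from paired unit statuses observed at two times."""
        counts = np.zeros((n_lag, n_status, n_status))
        for s0, s1, g in zip(status_t0, status_t1, lag_class):
            counts[g, s0, s1] += 1
        rows = counts.sum(axis=2, keepdims=True)
        return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)

    # Hypothetical land units: status 0 = undeveloped, 1 = developed;
    # lag_class 1 means most of a unit's neighbors are developed, 0 otherwise
    s0 = np.array([0, 0, 1, 0, 1, 0, 0, 1])
    s1 = np.array([0, 1, 1, 0, 1, 1, 0, 1])
    g = np.array([0, 1, 1, 0, 1, 1, 0, 1])
    print(spatial_markov(s0, s1, g, n_status=2, n_lag=2))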

1.21.3.6 Cellular Automaton

As a simulation tool, a cellular automaton (CA) represents real geographic space as a two-dimensional plane populated with cells. At the beginning, some cells are turned on, representing the presence of some features or phenomena, while others are off. Certain predefined rules, usually neighborhood based, guide changes in cell status as time moves on. A simple example rule: if more than two of the focal cell's eight neighbors are on, and the cell was off at the previous stage, it will be turned on. More complicated rules can be applied to represent more complex landscape changes, and several example CA applications (Clarke et al., 1997; He et al., 2005; Messina and Walsh, 2001) can be found in the literature; a minimal implementation of the example rule is sketched below. Similar to most traditional geographic research methods, CA handles time through a "snapshot model": changes through time are captured by snapshots during the modeling process, and the patterns revealed in each snapshot and the differences among snapshots are analyzed. Here time functions as an additional dimension beyond the two or three dimensions that define the geographical space; time itself is not the research interest, and researchers are more interested in how the objects and phenomena of interest evolve over time.
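The example rule above translates directly into array operations. A minimal sketch (toroidal, i.e., wraparound, edges are assumed for simplicity):

    import numpy as np

    def ca_step(grid):
        """One update of a binary CA: an off cell turns on when more than
        two of its eight (Moore) neighbors are on; on cells stay on."""
        neighbors = sum(np.roll(np.roll(grid, di, axis=0), dj, axis=1)
                        for di in (-1, 0, 1) for dj in (-1, 0, 1)
                        if (di, dj) != (0, 0))
        return np.where((grid == 0) & (neighbors > 2), 1, grid)

    grid = np.zeros((5, 5), dtype=int)
    grid[2, 1:4] = 1      # seed a small row of on cells
    print(ca_step(grid))  # cells just above and below the row's center turn on

Iterating ca_step and saving each returned grid reproduces exactly the sequence of snapshots described here.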

1.21.3.7 Agent-Based Simulation

Traditional simulation methods, like the CA mentioned above, apply a top-down modeling scheme. As a major, powerful tool in complexity science and in studying many human and nonhuman systems, agent-based models (ABMs, or agent-based modeling) have been widely developed and employed in domains such as ecology, epidemiology, geography, land use, political science, and sociology (An et al., 2015). The use of ABMs has increased rapidly among various scientific communities over the last two decades; according to a Web of Knowledge survey, the number of articles reporting the development or use of ABMs has been increasing at an exponential rate since the 1990s, across research fields such as ecology, human-environment science, land system science, and sociology (An et al., 2016b). In a typical ABM, action (behavior) rules are predefined, and the subjects or phenomena evolve according to those rules. Usually these rules are static and not subject to change as time moves on. Nevertheless, the real-world situation may be more complex.


Rules can change, and there can be interactions among different subjects that reinforce or weaken (even cancel) this change. Agent-based simulation, or ABM, is a promising tool for handling this kind of complicated interaction. Similar to CA, an ABM can set up the geographical space as a plane populated with two-dimensional cells, or as a three-dimensional space with cubic units. Agents can take many forms, ranging from an individual, a household, or an organization or institution to abstract entities. ABM applies a "bottom-up" scheme, which can account for interactions among different agents and between agents and the environment. The agents are designed to be "intelligent," adjusting their behavior according to feedback from the environment and other agents. All these features contribute to the power of ABMs: their capability to represent heterogeneity, nonlinearity, feedback, and individual-level activity and decision making. Furthermore, ABM has been demonstrated to be very useful in integrating data and models across multiple disciplines and scales (An, 2012; An et al., 2015).

1.21.4 Conclusion and Discussion

Even though it has long drawn on intellectual origins in many other disciplines, the dichotomy of space and time still besets geographers. There is no doubt that contemporary advances in technology such as GIS, GPS, and remote sensing have brought a growing number of opportunities (and challenges at the same time). In this review, we conclude that time can play three roles in geographical research: as the interest of research, as a constraint, or as an attribute domain of the data. Most traditional geographic research incorporates time as a data attribute by applying a "snapshot" model; such methods tend to emphasize the spatial aspects of the problem while not paying enough attention to the temporal aspects.

1.21.4.1 Tight Integration of Space and Time

Probably due to the abovementioned difficulties in integrating space and time, few studies treat time as the core research issue, for example, investigating the timing and duration of events or phenomena. It poses considerable challenges to reveal processes or phenomena with fast-changing paces, especially when the sampling intervals are long and the temporal resolution of the data is low. One methodological opportunity is the ABM framework, which has a high potential to elegantly address many of the aforementioned challenges. The classical time geography methods and their later developments are well suited to questions where time acts as a constraint; for more complex situations with multilayer interactions and feedbacks, simulation tools like ABM are very useful for dealing with the nonlinearity.

Assume a time geographer (city planner) aims to deploy urban facilities and services such as overpass bridges. Under an ABM framework, the planner collects data regarding: (1) demographics: gender, education, race, age, and work hours; (2) the spatially heterogeneous environment: road network, home addresses, and local people's preferences, availability, and capability of conducting certain actions, for example, going to work at certain time(s); data can also be gathered about the timing and location of traffic congestion on a daily or hourly basis; and (3) interactions between agents and the environment over a time span of varying granularity; for example, one person with work hours from 8 am to 4 pm may decide to take a certain highway or local roads.

Using an ABM, the planner may locate commuters (agents), work and home addresses, roads, and intersections (objects) in a two-dimensional digital environment. Then (s)he may assign attributes to these agents and objects, such as the age, race, work hours, and preferences of commuters. Then, according to domain or survey knowledge or artificial intelligence, (s)he may assign rules to these agents, which could be in the "if...then...(with a certain probability)" or "if...then...; else if...then..." format (see the sketch below). Then the simulation begins with all such data and rules included: the agents stay or move over space along time, often interacting with other agents or the environment (e.g., stopping at a red light or in traffic congestion). Such a simulation may ultimately provide information about which areas may be subject to traffic congestion, at what time, and for how long, answering many "what-if" questions: What if a new road is added near a certain "bottleneck"? What would happen if more green-light time were allowed in one direction of an intersection?

The ABM-based simulation mimics "the sequential unfolding" of agent activities over time (Kwan, 2013, p. 1082), showing how ABM may integrate space and time in the so-called "flow perspective" (Dijst, 2013, p. 1060). Additionally, an ABM may generate a large amount of individual movement data that can be subjected to many classical time geography methods or metrics, such as space–time paths, space–time prisms, or TGDE (Downs, 2010). In the context of the above ABM, we can examine whether path bundling may exist and contribute to traffic congestion at the corresponding road or intersection. Clearly this is a typical time geography/accessibility problem, involving movement constraints or possibilities such as people's work hours, the road network, and the geographic locations of homes and work addresses. More interestingly, ABMs are able to simulate "interactions between individuals and with environmental variables" (Long and Nelson, 2013, p. 312). It is self-evident that planners are likely to reach erroneous conclusions if time is ignored or examined at too coarse a resolution. As illustrated above, ABM has the potential to become a major tool in representing, explaining, and predicting individual movements and interactions with one another and with the environment, offering important insights into space–time patterns and the mechanisms behind such patterns.
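The rule format just described maps naturally onto agent code. A schematic commuter agent follows; all attributes, thresholds, probabilities, and route names are hypothetical:

    import random

    class Commuter:
        """Schematic agent: departs only after work hours end (an
        authority constraint) and picks a route by simple
        'if...then...(with a certain probability)' rules."""
        def __init__(self, work_end_hour, prefers_highway):
            self.work_end_hour = work_end_hour
            self.prefers_highway = prefers_highway

        def choose_route(self, hour, congestion):
            if hour < self.work_end_hour:
                return None  # still at work, cannot travel yet
            if self.prefers_highway and congestion["highway"] < 0.7:
                return "highway"
            if random.random() < 0.8:  # probabilistic rule
                return "local_roads"
            return "highway"

    agents = [Commuter(16, True), Commuter(17, False)]
    congestion = {"highway": 0.9, "local_roads": 0.4}  # hypothetical state
    print([a.choose_route(hour=17, congestion=congestion) for a in agents])

Stepping many such agents over a road network, hour by hour, yields the kind of individual movement data and congestion patterns discussed above.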

1.21.4.2 Span and Granularity of Time

Current geography, GIScience in particular, has become powerful in handling spatial heterogeneity. In spite of considerable efforts, geographers still fall short of the capability to handle temporal variability (e.g., An, 2012; An and Brown, 2008; Long and Nelson, 2013; Peuquet and Duan, 1995; Yi et al., 2014; Yuan, 1999).


The advent of and advances in several modern technologies (e.g., GIS, GPS, and remote sensing) have substantially empowered time-geographical research. However, the choice of time span and temporal granularity of data is still largely driven by data availability, the convenience of data collection, or even personal preference, with little consideration of the related theory or knowledge about the process(es) of interest. Questions should be directed toward the validity of the time span or temporal granularity used in data collection or analysis; analysis based on such preference- or convenience-driven data may not uncover the realistic patterns, or the mechanisms behind such patterns. The LTM approach, adapted from the social and health sciences, may help address these kinds of questions, especially in the land use and land cover change domain. In the context of our introduction to LTMs earlier, we can illustrate how an LTM may help determine the appropriate time span and/or granularity. Assume that a spatial LTM gives rise to a set of interesting temporal trajectories characterized by insignificant time-related coefficients. What do such insignificant coefficients mean? Among many possible reasons, one could be that our time span is too short or that data are collected at too coarse a granularity of time. We might further explore whether the phenomena of interest have temporal patterns (e.g., periodicity); if so, we should adapt our data collection scheme to accommodate this type of complexity.

1.21.4.3 Imprecision of Time Measurement

In many research applications, a coarse granularity of time is unavoidable in data collection due to technological, financial, administrative, or labor limitations. When studying discrete or qualitative events at individual or aggregate levels, survival analysis models can make data collected at a coarse granularity more useful. This advantage lies in survival analysis's capability for handling censored data (see the section on survival analysis), which deserves more attention from time geographers and modelers. One intriguing feature of survival analysis is that we can record data about event (e.g., transaction) time, x and y coordinates, and the environment as continuous attributes (or at a very fine granularity if discrete) of the objects under consideration. When an event or transaction happens at time t or in a time interval (t, t + Δt), we can link the event with the data of the object itself, of other objects, and/or of the environment at time t (or earlier times such as t − 1) through a set of time-dependent variables, as sketched below. Survival analysis models deserve more effort and attention in time geography, especially in dealing with event and transaction data that are highly time variant.
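In practice, this linkage is a backward-in-time join between the event table and a covariate table. A sketch using pandas (all column names and values are hypothetical):

    import pandas as pd

    events = pd.DataFrame({"time": [5.2, 9.7], "parcel": ["a", "b"]})
    covars = pd.DataFrame({"time": [4.0, 5.0, 9.0, 10.0],
                           "temperature": [11.0, 12.5, 18.0, 17.0]})

    # For each event at time t, attach the most recent covariate value
    # observed at or before t (i.e., at t or an earlier time)
    linked = pd.merge_asof(events.sort_values("time"),
                           covars.sort_values("time"),
                           on="time", direction="backward")
    print(linked)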

1.21.5 Future Directions

Our review and discussion point to a few future directions in time geography. First, the current loose space–time integration is still a challenge in time geography and related disciplines, and we should continue to invest time and effort in this exciting yet challenging research frontier. ABM is a very promising methodological framework for handling individual movement data, spatial panel data, and event/transaction data. It would be ideal to develop ABM modules, platforms, or tools that are relatively easy to use (e.g., in combination with GIS), free (e.g., open source), and accessible (e.g., available online and well documented).

Second, time geography has come to a point at which rigorous frameworks and theories are sorely needed. Various disciplines, especially land change science, GIScience, and complexity science, may take the lead in developing such frameworks and theories. Complexity science in particular may play an important role because of its strengths in dealing with feedback, heterogeneity, time lags, path dependence, multifinality, and equifinality, which are common in complex systems (An, 2012; Liu et al., 2007; National Research Council, 2014). Modeling human actions and behavior should be a very important research frontier in time-geographic research, and there is a multitude of models or methods from which time geographers can benefit; for an overview of the related models, we recommend An (2012).

Third, time geographers should continue to move forward in the data-mining direction with input from multiple disciplines, even when we do not have enough domain knowledge, theory, and understanding of the topic or phenomena of interest. This is particularly important in this era of big data. The panel regression and simulation models reviewed in this article should contribute substantially to this direction. We have highlighted the usefulness and importance of several "nonmainstream" methods for time-geographic research, such as LTMs, survival analysis models, and ABMs, and we expect more advances in time geography to result from the application of these methods.

Fourth and last, more efficient data models and robust metrics, along with powerful statistical and simulation-based methods, should be developed to accommodate the need to handle space–time data, especially big space–time datasets. The traditional "snapshot" data model has the advantage of conceptual simplicity and ease of understanding, but suffers from low efficiency (e.g., the same data are stored repeatedly at different times). In developing space–time GIS, or STGIS (Goodchild, 2013), it is a worthwhile investment to devote more time and effort to developing, testing, and employing more efficient data models in time-geographic research, especially when dealing with individual movement and/or transaction data. Related to this need, more robust metrics, statistical and simulation-based methods, and tools should continue to be developed and tested as we move toward better visualizing, explaining, and predicting space–time patterns.

We do not claim to give a completely objective (e.g., free of personal preference or bias) review of all time-geographic research. It is our hope that this article synthesizes what has been achieved in the subarea of time geography, pinpoints areas for further research, and stimulates more meaningful efforts in the future.


References

An, L., 2012. Modeling human decisions in coupled human and natural systems: Review of agent-based models. Ecological Modelling 229, 25–36.
An, L., Brown, D.G., 2008. Survival analysis in land change science: Integrating with GIScience to address temporal complexities. Annals of the Association of American Geographers 98 (2), 323–344.
An, L., Brown, D.G., Nassauer, J.I., Low, B., 2011. Variations in development of exurban residential landscapes: Timing, location, and driving forces. Journal of Land Use Science 6 (1), 13–32.
An, L., Tsou, M., Crook, S., et al., 2015. Space–time analysis: Concepts, quantitative methods, and future directions. Annals of the Association of American Geographers 105 (5), 891–914.
An, L., Tsou, M., Spitzberg, B., Gupta, D.K., Gawron, J.M., 2016a. Latent trajectory models for space–time analysis: An application in deciphering spatial panel data. Geographical Analysis 48 (3), 314–336.
An, L., Jankowski, P., Turner, B.L., Wang, S., Manson, S., 2016b. ABM'17: The usefulness, uselessness, and impending tasks of agent-based models in social, human-environment, and life sciences. Proposal of an NSF-funded project, 2016–2018 (BCS-1638446).
Anselin, L., 1995. Local indicators of spatial association-LISA. Geographical Analysis 27 (2), 93–115.
Anselin, L., 2003. Spatial externalities, spatial multipliers, and spatial econometrics. International Regional Science Review 26 (2), 153–166.
Armstrong, M.P., 1988. Temporality in spatial databases. In: Proceedings of GIS/LIS'88, vol. 2. American Congress on Surveying and Mapping, Bethesda, MD, pp. 880–889.
Assuncao, R.M., Reis, I.A., Oliveira, C.D.L., 2001. Diffusion and prediction of Leishmaniasis in a large metropolitan area in Brazil with a Bayesian space–time model. Statistics in Medicine 20 (15), 2319–2335.
Biggeri, A., Martuzzi, M., 2003. Preface (Special issue). Environmetrics 14, 429–430.
Cheng, T., Wang, J., Li, X., 2011. A hybrid framework for space–time modeling of environmental data. Geographical Analysis 43 (2), 188–210.
Clarke, K.C., Hoppen, S., Gaydos, L., 1997. A self-modifying cellular automaton model of historical urbanization in the San Francisco Bay area. Environment and Planning B: Planning & Design 24 (2), 247–261.
Coomes, O.T., Grimard, F., Burt, G.J., 2000. Tropical forests and shifting cultivation: Secondary forest fallow dynamics among traditional farmers of the Peruvian Amazon. Ecological Economics 32, 109–124.
Cresswell, T., 2013. Geographic thought: A critical introduction. Wiley-Blackwell, West Sussex.
Crook, S.E.S., An, L., Stow, D.A., Weeks, J.R., 2016. Latent trajectory modeling of spatiotemporal relationships between land cover and land use, socioeconomics, and obesity in Ghana. Spatial Demography 4 (3), 221–244.
Dainton, B., 2001. Time and space. Cambridge University Press, London.
Delmelle, E., Dony, C., Casas, I., Jia, M., Tang, W., 2014. Visualizing the impact of space–time uncertainties on dengue fever patterns. International Journal of Geographical Information Science 28 (5), 1107–1127.
Dijst, M., 2013. Space–time integration in a dynamic urbanizing world: Current status and future prospects in geography and GIScience. Annals of the Association of American Geographers 103 (5), 1058–1061.
Downs, J.A., 2010. Time-geographic density estimation for moving point objects. In: Fabrikant, S.I., Reichenbacher, T., van Kreveld, M., Schlieder, C. (Eds.), Geographic information science. Springer, Berlin, pp. 16–26.
Downs, J.A., Horner, M.W., 2012. Probabilistic potential path trees for visualizing and analyzing vehicle tracking data. Journal of Transport Geography 23, 72–80.
Dykes, J., MacEachren, A.M., Kraak, M.J., 2005. Exploring geovisualization. Elsevier, San Diego, CA.
Elhorst, J.P., 2012. Dynamic spatial panels: Models, methods, and inferences. Journal of Geographical Systems 14 (1), 5–28.
ESRI, 2016. ArcGIS. Environmental Systems Research Institute, Redlands, CA. http://www.esri.com/.
Furrer, R., Knutti, R., Sain, S.R., Nychka, D.W., Meehl, G.A., 2007. Spatial patterns of probabilistic temperature change projections from a multivariate Bayesian analysis. Geophysical Research Letters 34 (6), L06711.
Goodchild, M.F., 2013. Prospects for a space–time GIS. Annals of the Association of American Geographers 103 (5), 1072–1077.
Guo, G., Hipps, J., 2004. Longitudinal analysis for continuous outcomes: Random effects models and latent trajectory models. In: Hardy, M., Bryman, A. (Eds.), The handbook of data analysis. SAGE Publications, Los Angeles, CA, pp. 347–368.
Guttorp, P., Meiring, W., Sampson, P.D., 1994. A space–time analysis of ground-level ozone data. Environmetrics 5 (3), 241–254.
Hagerstrand, T., 1970. What about people in regional science? Papers in Regional Science 24 (1), 7–24.
He, C., Shi, P., Chen, J., et al., 2005. Developing land use scenario dynamics model by the integration of system dynamics model and cellular automata model. Science in China Series D: Earth Sciences 48 (11), 1979–1989.
Hinckfuss, I., 1974. The existence of space and time. http://philpapers.org.
Iacono, M., Levinson, D., El-Geneidy, A., Wasfi, R., 2012. A Markov chain model of land use change in the Twin Cities. Paper presented at the 10th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Florianopolis, Santa Catarina, Brazil.
Irwin, E., Bockstael, N., 2002. Interacting agents, spatial externalities, and the endogenous evolution of residential land-use pattern. Journal of Economic Geography 2, 31–54.
Kwan, M.P., 2013. Beyond space (as we knew it): Toward temporally integrated geographies of segregation, health, and accessibility. Annals of the Association of American Geographers 103 (5), 1078–1086.
Lee, L., Yu, J., 2010. Some recent developments in spatial panel data models. Regional Science and Urban Economics 40, 255–271.
Lee, C., Devillers, R., Hoeber, O., 2014. Navigating spatio-temporal data with temporal zoom and pan in a multi-touch environment. International Journal of Geographical Information Science 28 (5), 1128–1148.
Leibovici, D.G., Claramunt, C., Guyader, D.L., Brosset, D., 2014. Local and global spatio-temporal entropy indices based on distance-ratios and co-occurrences distributions. International Journal of Geographical Information Science 28 (5), 1061–1084.
Levine, N., 2004. Space–time analysis. In: CrimeStat III, pp. 9.1–9.42. Ned Levine and Associates, Houston, TX. http://www.icpsr.umich.edu/CrimeStat.
Liu, J., Dietz, T., Carpenter, S.R., et al., 2007. Complexity of coupled human and natural systems. Science 317 (5844), 1513–1516.
Long, J.A., Nelson, T.A., 2013. A review of quantitative methods for movement data. International Journal of Geographical Information Science 28 (5), 855–874.
Lophaven, S., Carstensen, J., Rootzén, H., 2004. Space–time modeling of environmental monitoring data. Environmental and Ecological Statistics 11, 237–256.
Messina, J.P., Walsh, S.J., 2001. 2.5D morphogenesis: Modeling landuse and landcover dynamics in the Ecuadorian Amazon. Plant Ecology 156 (1), 75–88.
Nail, A.J., Hughes-Oliver, J.M., Monahan, J.F., 2011. Quantifying local creation and regional transport using a hierarchical space–time model of ozone as a function of observed NOx, a latent space–time VOC process, emissions, and meteorology. Journal of Agricultural, Biological, and Environmental Statistics 16 (1), 17–44.
National Research Council, 2014. Advancing land change modeling: Opportunities and research requirements. National Academies Press, Washington, DC.
Natvig, B., Tvete, I.F., 2007. Bayesian hierarchical space–time modeling of earthquake data. Methodology and Computing in Applied Probability 9 (1), 89–114.
Nöllenburg, M., 2007. Geographic visualization. In: Kerren, A., Ebert, A., Meyer, J. (Eds.), Human-centered visualization environments. Springer, Berlin, pp. 257–294.
Peuquet, D.J., Duan, N., 1995. An event-based spatiotemporal data model (ESTDM) for temporal analysis of geographical data. International Journal of Geographical Information Systems 9 (1), 7–24.
Rey, S.J., Janikas, M.V., 2006. STARS: Space–time analysis of regional systems. Geographical Analysis 38 (1), 67–86.
Rey, S.J., Janikas, M.V., Smirnov, O., 2005. Exploratory geovisualization of spatial dynamics. In: Brown, D., Xie, Y. (Eds.), Geocomputation. University of Michigan, Ann Arbor, MI.
Robertson, C., Nelson, T.A., Boots, B., Wulder, M.A., 2007. STAMP: Spatial-temporal analysis of moving polygons. Journal of Geographical Systems 9 (3), 207–227.
Rouhani, S., Wackernagel, H., 1990. Multivariate geostatistical approach to space–time data analysis. Water Resources Research 26 (4), 585–591.
Scheepens, R., van de Wetering, H., van Wijk, J.J., 2014. Contour based visualization of vessel movement predictions. International Journal of Geographical Information Science 28 (5), 891–909.
Scholz, R.W., Lu, Y., 2014. Detection of dynamic activity patterns at a collective level from large-volume trajectory data. International Journal of Geographical Information Science 28 (5), 946–963.
Shaw, S.L., Yu, H., Bombom, L.S., 2008. A space–time GIS approach to exploring large individual-based spatiotemporal datasets. Transactions in GIS 12 (4), 425–441.
Tiefelsdorf, M., Griffith, D.A., 2007. Semiparametric filtering of spatial autocorrelation: The eigenvector approach. Environment and Planning A 39 (5), 1193–1221.
Vance, C., Geoghegan, J., 2002. Temporal and spatial modeling of tropical deforestation: A survival analysis linking satellite and household survey data. Agricultural Economics 27, 317–332.
Wu, B., Li, R., Huang, B., 2014. A geographically and temporally weighted autoregressive model with application to housing prices. International Journal of Geographical Information Science 28 (5), 1186–1204.
Ye, X., Carroll, M.C., 2011. Exploratory space–time analysis of local economic development. Applied Geography 31 (3), 1049–1058.
Yi, J., Du, Y., Liang, F., Zhou, C., Wu, D., Mo, Y., 2014. A representation framework for studying spatiotemporal changes and interactions of dynamic geographic phenomena. International Journal of Geographical Information Science 28 (5), 1010–1027.
Yuan, M., 1999. Use of a three-domain representation to enhance GIS support for complex spatiotemporal queries. Transactions in GIS 3 (2), 137–159.

1.22 Spatial Data Uncertainty

Linna Li, Hyowon Ban, and Suzanne P Wechsler, California State University, Long Beach, CA, United States
Bo Xu, California State University, San Bernardino, CA, United States
© 2018 Elsevier Inc. All rights reserved.

1.22.1 Introduction
1.22.2 Scale and Conceptualization of Uncertainty
1.22.2.1 Data Quality: Cartographic Scale and Map Accuracy
1.22.2.1.1 Error as a component of data quality
1.22.2.1.2 Map accuracy as a measure of error
1.22.2.2 MAUP and Ecological Fallacy: The Case of Process and Temporal Scales
1.22.2.3 Spatial Extent: Continuous Surfaces and Raster Scale
1.22.3 Uncertainty Analysis Methods
1.22.3.1 Modeling Uncertainty in Vector Data
1.22.3.1.1 Positional uncertainty
1.22.3.1.2 Analytical modeling of positional uncertainty
1.22.3.1.3 Modeling positional uncertainty of point features
1.22.3.1.4 Modeling positional uncertainty of linear features
1.22.3.1.5 Modeling positional uncertainty of polygons
1.22.3.2 Attribute Uncertainty
1.22.3.2.1 Error matrix for attribute uncertainty
1.22.3.3 Modeling Topological Relations of Spatial Objects With Vague Boundaries
1.22.3.4 Analytical Methods and Monte Carlo Simulation
1.22.3.4.1 Uncertainty simulation in continuous fields
1.22.3.4.2 Uncertainty in raster data: The case of the digital elevation model
1.22.3.4.3 DEM uncertainty propagation to derived parameters
1.22.4 Uncertainty Propagation in Spatial Analysis
1.22.4.1 Modeling Uncertainty Propagation in Computational Models
1.22.4.2 Modeling Uncertainty Propagation in Simple Spatial Operations
1.22.5 Semantic Uncertainty in Spatial Concepts
1.22.5.1 Uncertainty, Fuzzy-Sets, and Ontologies
1.22.5.1.1 Uncertainty
1.22.5.1.2 Fuzzy-set approach
1.22.5.1.3 Ontologies
1.22.5.2 Applications
1.22.5.2.1 The 1990s
1.22.5.2.2 The 2000s
1.22.5.2.3 The 2010s
1.22.6 Uncertainty Visualization
1.22.7 Uncertainty in Crowd-Sourced Spatial Data
1.22.7.1 Evaluation of Uncertainty in Crowd-Sourced Spatial Data
1.22.7.2 Uncertainty in Platial Data
1.22.7.3 Uncertainty Equals Low Quality?
1.22.8 Future Directions
References

1.22.1 Introduction

Spatial data uncertainty is an important subfield of geographic information science (GIScience). Uncertainty is used as an umbrella term to encompass data quality, accuracy, error, vagueness, fuzziness, and imprecision. Each of these terms refers to imperfections in spatial datasets. Given that spatial data are representations of reality, it is impossible to have a perfect representation of the world without any loss of information. Therefore, uncertainty is inevitable in all geographic datasets and analyses. Spatial data uncertainty means that we are still uncertain about the geographic world even when we have a high-quality geographic database. From reality to representation, uncertainty is introduced and propagated at every step of the spatial analysis process, from conceptualization and generalization to measurement and analysis.


Spatial data quality is defined based on the assumption that there is a geographic truth against which a dataset can be compared: the closer a spatial dataset is to the truth, the higher its quality. The term "error" refers to how far a measurement is from that truth. A data quality report may take the form of metadata in which data providers describe known uncertainties, such as measurement errors. Error is therefore closely related to quality: high quality is associated with small errors, and low quality with large errors. Accuracy provides a measurement of error; an accurate dataset is one that is close to the represented phenomenon, with only small errors. Vagueness, fuzziness, and imprecision are generally used to describe uncertainties associated with geographic concepts, classes, and values, which must be addressed appropriately to responsibly measure, model, and communicate uncertainty in spatial datasets and analyses.

Over the past three decades, the geospatial community has addressed the issue of spatial data uncertainty in varied and meaningful ways. In 1988, the US National Center for Geographic Information and Analysis (NCGIA) hosted a conference dedicated to the "Accuracy of Spatial Databases", which marked the beginning of a growing interest in and concern for this topic and resulted in a seminal work devoted to it (Goodchild and Gopal, 1989). Since 1994, 12 International Symposia on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences have been held biennially. Nine International Symposia on Spatial Data Quality have been organized, most recently in 2015. The 2017 AAG conference included uncertainty as one of three core themes, and research related to spatial uncertainty continues to expand (Fig. 1). This article reviews many of these approaches.

Spatial data uncertainty has been described and addressed from different perspectives. Zhang and Goodchild (2002) organize and discuss uncertainty-related research in continuous fields and discrete objects, respectively. Uncertainty may be studied as positional uncertainty or attribute uncertainty, and it is intertwined with related issues such as scale.

This article is organized as follows. Section "Scale and Conceptualization of Uncertainty" discusses scale and the conceptualization of uncertainty. Section "Uncertainty Analysis Methods" focuses on methods used to evaluate and analyze uncertainty in both discrete objects and continuous fields. Section "Uncertainty Propagation in Spatial Analysis" discusses methods to measure uncertainty propagation in computational models and simple spatial operations. Section "Semantic Uncertainty in Spatial Concepts" describes semantic uncertainty in spatial concepts, and section "Uncertainty Visualization" presents visualization of spatial data uncertainty. Section "Uncertainty in Crowd-Sourced Spatial Data" discusses new dimensions of uncertainty in the era of big data. Finally, the article ends with conclusions and future research directions in section "Future Directions".

[Figure 1: dual-axis line chart for the years 2000–2016, with Citations (0–14,000) on one axis and Publications (0–800) on the other; six series plot citations and publications for "spatial data uncertainty", "spatial data accuracy", and "spatial data quality".]

Fig. 1 Yearly publications and citations from 2000 to 2016 indexed by Scopus using the following search terms: “spatial data quality”, “spatial data uncertainty”, and “spatial data accuracy”. The Boolean search was limited using AND NOT to prevent overlap. The search was limited to environment sciences and social sciences. The total number of publications including all terms was 10,521.

1.22.2 Scale and Conceptualization of Uncertainty

Scale is arguably a fundamental characteristic of geographic data. All spatial data are scale and context dependent, and these scales affect how we interpret and make meaning of resulting spatial analyses. Understanding the components of scale is essential to understanding the uncertainty associated with our results, which in turn shapes our interpretations and decisions. The word "scale" itself has many meanings, both outside and within the geographic context (Quattrochi and Goodchild, 1997; Ruddell and Wentz, 2009). The conceptualization of scale and its varied manifestations in spatial data and associated analyses have been addressed extensively in the literature (see for example Blöschl, 1996; Goodchild, 2001, 2011; Goodchild and Proctor, 1997; Goodchild and Quattrochi, 1997; Lam and Quattrochi, 1992; Quattrochi and Goodchild, 1997). In the geospatial context, scale manifests in three interrelated areas: cartographic scale, process scale, and spatial extent. Each of these categories of scale affects spatial data accuracy, error, and associated uncertainty (Goodchild, 2001, 2011; Goodchild and Proctor, 1997).

Cartographic scale refers specifically to the representative fraction that relates a map feature to the associated ground distance. This reference scale dictates the level of detail represented by map features (Goodchild, 2011; Goodchild and Proctor, 1997).

Process scale refers to the spatial representation of a natural phenomenon, such as soil erosion in a landslide hazard. The scales of natural processes are often unknown because these processes occur at a range of spatial and temporal scales. For example, topography results from many different processes operating over a range of spatial and temporal scales. Herein lies one of the perpetually confounding issues of spatial data analysis and resulting uncertainty: matching the appropriate spatial data scale to the process we are trying to understand is an ongoing exploration. Physical and spatial processes manifest at different, inconsistent space–time scales, while the mathematical relationships (spatial data and algorithms) we use to describe them are generally scale dependent.

Spatial extent refers to the geographic boundaries of a study area and influences the level of detail that can be represented, as well as the amount of storage and processing power required for spatial analysis. Spatial extent influences the scale at which a geographic phenomenon can be observed, and this observational scale in turn imposes a scale on our understanding of natural or spatial processes.

Uncertainty can be separated into three classes: (1) error, (2) vagueness, and (3) ambiguity (Fisher, 1999). Uncertainty due to error is associated with the Boolean representation of cartographic features, while uncertainty due to vagueness and ambiguity results from process scale, temporal scale, and spatial extent. In the following sections, these components of geospatial scale provide a framework for exploring the varied expressions of uncertainty in spatial data.

1.22.2.1 Data Quality: Cartographic Scale and Map Accuracy

Geospatial analyses are performed to derive understanding about reality. Such understanding derived from spatial data analyses is inextricably linked to the “quality” of the underlying geospatial datasets. The term “data quality” has been used to refer to fitness for use, data accuracy, and by association, error and related uncertainty (Chrisman, 1991; Brus, 2013). Spatial data quality has been a core focus of geospatial practice since the arrival of geographic information systems in the early 1980s (Goodchild and Gopal, 1989; Veregin, 1999; Devillers et al., 2010; Li et al., 2012). Mechanisms for describing data quality have been used as a measure of accuracy, and in turn a mechanism to quantify what we do not know, or uncertainty. Data quality implies characteristics of both error and accuracy.

1.22.2.1.1 Error as a component of data quality

Accepted descriptions and measures of data quality are required to promote effective methods for analysis and display of uncertainty (Buttenfield, 1993). For the purposes of this article, we define these related terms as follows. Error is defined as a departure of a measurement, or in the case of spatial data, the representation of a feature on a map, from its true value. The nature and extent of error in spatial datasets are often unknowable and result in uncertainty. Efforts to represent and manage error are rooted in concepts of data quality. Error can be classified as mistakes or blunders, systematic, or random. Blunders arise from errors in the data collection process and are generally removed once identified. Systematic errors result from bias in the data collection process. Random errors remain in the data once blunders and systematic errors are addressed (USGS, 1998; Wechsler and Kroll, 2006). Quantifying map accuracy provides a measure of the magnitude of error and has been used as a measure of data quality (Li et al., 2012). Positional (horizontal) and vertical accuracy of spatial data are inextricably linked to the scale of the data from which they were derived.

1.22.2.1.2 Map accuracy as a measure of error

Before the advent of computer technology, maps were drawn by hand, and the accuracy of these cartographic representations is linked to the scale at which they were drawn. Uncertainty occurs at different levels of generalization and is tied to the scale of representation (MacEachren et al., 2005). The National Map Accuracy Standards released in 1947 provided a measure of data quality for paper maps. Per these requirements, no more than 10% of sampled points could be "off" by more than 0.08 cm at map scale on maps at scales larger than 1:20,000, or by more than 0.05 cm on maps at scales of 1:20,000 or smaller. Effectively, for 1:24,000 scale maps, 10% of map features could have a horizontal positional error of up to 12 m. These standards served as the measure of map accuracy for over 50 years. Because digital maps were generated from paper maps, this measure of accuracy persisted into the digital cartographic age, and data quality, as measured by map accuracy, became the standard mechanism by which to gauge map accuracy and associated uncertainty (Buttenfield, 2000).

In 1989, standards for measuring horizontal and vertical accuracy were developed, and the root mean square error (RMSE) was adopted as the accuracy statistic by which data quality is measured (ASPRS, 1989). The RMSE is the square root of the average squared discrepancies:

\[
\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \left( y_i - y_{t_i} \right)^2}{N}} \tag{1}
\]

where \(y_i\) refers to the ith map value, \(y_{t_i}\) refers to the ith known or measured sample point, and \(N\) is the number of sample points; the RMSE is reported in the measurement units of the data. The RMSE has become the de facto mechanism for quantifying and stating accuracy in derived spatial products. Geospatial metadata standards require that all spatial data be accompanied by a statement of data quality as defined by accuracy assessments (FGDC, 1998); this remains the only requirement of geospatial practitioners to document and represent data quality.

However, the RMSE has drawbacks. The statistic has no spatial dimension: although it provides information about the overall accuracy of a dataset, errors in spatial data vary spatially (Wood and Fisher, 1993). The RMSE assumes that errors are random and normally distributed, which may not always be the case (Zandbergen, 2008), and by squaring the differences it gives more weight to values that deviate strongly from the "truth". Additionally, the sample points used to calculate the RMSE for a particular data product may not be appropriately spatially distributed, or of sufficient number, to adequately represent inherent data quality. Despite these shortcomings, the RMSE persists as the standard accuracy statistic used to quantify the quality of geospatial data. The challenge remains to provide users with a mechanism for integrating that information to avoid data misuse and make better decisions (Devillers et al., 2010).
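As a concrete illustration of Eq. (1), the following minimal Python sketch computes the RMSE for a set of checkpoints; the values are invented for illustration and are not from any cited study:

```python
import numpy as np

# Hypothetical check data: map values vs. higher-accuracy reference measurements
map_values = np.array([102.1, 98.7, 110.4, 95.2, 101.8])   # e.g., elevations (m)
reference  = np.array([101.6, 99.5, 109.8, 96.0, 101.1])   # surveyed "truth" (m)

residuals = map_values - reference
rmse = np.sqrt(np.mean(residuals ** 2))   # Eq. (1)
print(f"RMSE = {rmse:.2f} m")
```

Note that a single summary number such as this carries no information about where the large residuals occur, which is precisely the spatial limitation discussed above.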

1.22.2.2 MAUP and Ecological Fallacy: The Case of Process and Temporal Scales

Making inferences about phenomena observed at one scale based on data observed and presented at coarser or finer scales can misrepresent processes and lead to misinterpretation of results. Even before the digital age, researchers grappled with this issue: in 1934, the Journal of the American Statistical Association published a series of articles dealing with sampling issues associated with census data and discrepancies between sampling space and time (Gehlke and Biehl, 1934; Neprash, 1934; Stephan, 1934). The disconnect between the scale of spatial data and the processes they attempt to explain is a well-established geographic concept and conundrum, expressed as the Modifiable Areal Unit Problem (MAUP) and as the "ecological fallacy" (Openshaw, 1984a, 1984b; Sui, 2009). These related concepts represent the potential disconnect between the scales of processes and the scales of the spatial data used to represent them.

To understand spatial phenomena, we construct boundaries within which data are aggregated and analyzed. This very act of imposing a boundary, be it a census block, census tract, parcel, zip code, or grid cell, affects our understanding of the process or pattern we are trying to discern and the results of spatial analyses. MAUP and ecological fallacies result from integrating data derived using different measurement scales (Duckham et al., 2001): the measurement of variables through observation, their representation through spatial data, and the relationships subsequently estimated between them are all affected by the scale at which measurement and aggregation take place. Analyzing the same spatial phenomenon using different scales of areal units (i.e., "modifiable areal units") can produce differing analytical results. An ecological fallacy occurs when conclusions are drawn about individuals based on aggregate data (e.g., you live in a rich tract, so you must be rich). These concepts are well documented in the literature on spatial analysis (see for example Dark and Bram, 2007; Fotheringham and Wong, 1991; Jelinski and Wu, 1996; Openshaw, 1984b, 1998; Stouffer, 1934; Wong, 2009). An understanding of MAUP, how it manifests, and how best to minimize it is essential to accommodating uncertainty in the results of spatial analyses. Recommendations for approaching MAUP have focused on finding the scales that best match the process being studied (Kwan, 2012a). However, even if we had perfect knowledge of process scale, limitations in how we frame spatial scales would still exist.

Offshoots of this currently intractable spatial concept have recently emerged. The Modifiable Temporal Unit Problem (MTUP) (Cheng and Adepeju, 2014; Çöltekin et al., 2011) extends the issue of mismatched spatial scales to the temporal domain, acknowledging the importance of selecting the appropriate temporal resolution through which to analyze a geographic process. Spatial data are static, yet the phenomena they represent are not: digital map features represent a snapshot of the subject at a specified scale and time, and there is a persistent gap in the space–time continuum represented by geospatial data and associated analyses. Historically, GISs were not adept at managing the temporal dimension.
Recent software enhancements facilitate the integration of time steps in spatial analyses, notably the release of ESRI's space–time pattern mining functionality (ESRI, 2016). However, temporal scales are complex, and limiting temporal representation to collections of time stamps, while a step forward, does not resolve the issue (Claramunt and Theriault, 1996).

Kwan (2012a,b) suggests that while MAUP addresses the disconnect between boundaries and the inferences made based on the scale of representation, it does not address the underlying geographic context: how individuals within these boundaries behave based on experience shaped by geography and time. She argues that how we conceptualize a spatial process influences the output of an associated spatial analysis. This Uncertain Geographic Context Problem (UGCoP) is an important extension of MAUP, acknowledging that it is not just the scale imposed on an analysis that matters, but how the behaviors and experiences within that scale are matched, or mismatched. Tobler (1970) acknowledged yet chose not to incorporate this complexity; Kwan, however, extends the complexity of underlying behaviors to the spatial incongruity. By embedding the term "uncertainty" in the acronym, Kwan directly connects the "unknowns" inherent in spatial data analyses to the scales our data impose (Kwan, 2012a,b). The Modifiable Conceptual Unit Problem (MCUP) (Miller, 2016) suggests that analyses are sensitive not only to spatial scales but also to how spatial processes are conceptualized. This was demonstrated using pollen models, where the underlying conceptualization, as represented by models of the dispersal process, manifests at more than one spatial dimension (Miller, 2016).

The difficulty of identifying a demarcation between discrete and continuous boundaries based on scales of analysis has been recognized since ancient times. In the Old Testament, the demarcation between the righteous and the wicked is questioned (Genesis 18–23). The "sorites paradox", from the 4th century BCE, articulates the complexity associated with scales of analysis (Couclelis, 2003; Fisher, 2000) by asking at what point a heap of sand is still a heap. This philosophical paradox exemplifies the vagueness that pervades geospatial information (Fisher, 2000). Boundaries are inherently fuzzy. Fuzzy set theory recognizes that boundaries are not discrete and that there is a continuum along boundaries in which borders encapsulate a bit of both phenomena (Fisher, 1996). The fuzziness resulting from where boundaries are placed constitutes uncertainty. Fuzzy set theories and mathematical fuzzy logic have been integrated into geographic information systems and used as mechanisms for defining uncertainty in spatial boundaries (see section "Semantic Uncertainty in Spatial Concepts"). Although considerable academic work addresses the ecological fallacy, MAUP, and their related expressions, the resulting vagueness and ambiguity contribute uncertainty to analytical results.
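The scale effect at the heart of MAUP is easy to demonstrate numerically. The following Python sketch uses synthetic data (all values illustrative) to show how the apparent correlation between two variables strengthens when spatially clustered individual observations are aggregated into areal units:

```python
import numpy as np

rng = np.random.default_rng(42)
n_units, unit_size = 100, 100

# Spatially clustered driver: roughly constant within each areal unit
block = rng.normal(size=n_units)
x = np.repeat(block, unit_size) + 0.5 * rng.normal(size=n_units * unit_size)
y = 0.3 * x + rng.normal(size=n_units * unit_size)   # noisy individual response

r_indiv = np.corrcoef(x, y)[0, 1]
r_aggr = np.corrcoef(x.reshape(n_units, unit_size).mean(axis=1),
                     y.reshape(n_units, unit_size).mean(axis=1))[0, 1]
print(f"individual-level r = {r_indiv:.2f}, unit-level r = {r_aggr:.2f}")
# Aggregation averages out individual-level noise, so the unit-level r is much
# larger; inferring individual behavior from it would be an ecological fallacy.
```

Rerunning the sketch with different unit sizes or unit boundaries changes the unit-level correlation, which is precisely the "modifiable" character of the areal units.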

1.22.2.3 Spatial Extent: Continuous Surfaces and Raster Scale

Perhaps the disciplines that have addressed the problems of ecological fallacy related to geospatial data most directly are ecology, natural resources, and remote sensing. Considerable research in these fields grapples with the issue of scale and scaling as it relates to the ability to use spatial data to link spatial patterns with natural processes (Blöschl, 1996; Hunsaker et al., 2013; Lowell and Jaton, 2000; Mowrer and Congalton, 2003; Quattrochi and Goodchild, 1997; Sui, 2009; Wu et al., 2006). Landscape processes do not always operate at the scales represented in geospatial data, yet the geospatial data we use in a GIS to assess these systems impose a fixed scale within which we attempt to understand them.

Especially in disciplines related to ecology and natural resources, spatial data analyses revolve around the use of the raster data structure to represent continuous surfaces. The issue of spatial extent is exemplified by the grid cell structure and the scale it imposes on spatial analyses. The placement of discrete boundaries affects analyses and contributes uncertainty to derived results, and this is a considerable issue when using the raster data structure: although rasters represent continuous surfaces, the grid cell structure itself imposes a discrete boundary and an associated scale of representation. In the raster data structure, the spatial support or resolution of spatial datasets is predefined, determined by the mechanisms of the satellite (in the case of remotely sensed imagery) or the grid cell resolution (in the case of digital elevation models (DEMs)), without consideration of the natural processes that are evaluated using these data (Dark and Bram, 2007). Continuous surfaces represent spatial features that are not discrete and are commonly represented in a GIS using uniform grids. This raster grid cell resolution imposes a measurement scale on geospatial analyses and, by association, a scale on the process (e.g., hydrologic, ecologic) these data and associated analyses represent.

The concept of resolution is closely related to scale and refers to the smallest distinguishable component of an object (Lam and Quattrochi, 1992; Tobler, 1988). The grid cell is also referred to as the spatial support, a concept in geostatistics referring to the area over which a variable is measured or predicted (Dungan, 2002). Spatial resolution is related to the sampling interval; the Nyquist sampling theorem states that the sampling rate must be twice as fine as the feature to be detected. The sensitivity of model input parameters and model predictions to spatial support has been documented in numerous geospatial analyses and remains an important factor in our understanding, assessment, and quantification of uncertainty in spatial data and related modeling applications (Wechsler, 2007). Practitioners often do not have control over the grid cell resolution of a dataset (e.g., products provided by satellite remote sensing or government-produced DEMs). Subgrid variability, that is, variability at scales finer than those captured by the grid cell, cannot be resolved using a typical raster grid cell structure. This is changing as new technologies, such as data derived from UAV platforms, place the decision for selecting an appropriate support in the hands of practitioners. As technologies advance, new spatial datasets are continually being developed.
In recent years, the commercial availability of low-cost hardware and embedded computer systems has led to an explosion of lightweight aerial platforms frequently referred to as unpiloted aerial vehicles (UAVs) or "drones". UAVs are becoming a powerful, cost-effective platform for the collection of remotely sensed images. Advances in computer vision software have enabled the construction of 3D digital surface models (DSMs) from acquired imagery using Structure from Motion (SfM). SfM uses computer algorithms to find matching points in overlapping images, enabling the reconstruction of surface features from overlapping 2D images (Fonstad et al., 2013; Westoby et al., 2012). UAV-derived imagery and surfaces are cost effective and accessible, and they facilitate data collection at spatial and temporal scales that were previously inaccessible. As such, they are becoming widely used data sources in a wide range of disciplines and applications, including geomorphological mapping (Gallik and Bolesova, 2016; Hugenholtz et al., 2013), vegetation mapping (Cruzan et al., 2016), and coastal monitoring (Goncalves and Henriques, 2015). Point clouds obtained from SfM are used to generate DSMs. Data quality is addressed using the RMSE to quantify the accuracy of UAV-derived surfaces, and vertical accuracies in the centimeter range are commonplace (Harwin and Lucieer, 2012; Neitzel and Klonowski, 2011; Reshetyuk and Martensson, 2016; Verhoeven et al., 2012). However, statements of accuracy and data quality are no substitute for estimates of uncertainty and the resulting decisions about fitness for use. Data quality and accuracy assessment have become mainstream practice; the challenge remains to bridge the gap between representations of data quality and mechanisms for quantifying and communicating uncertainty.

1.22.3 Uncertainty Analysis Methods

In the previous section, scale was used as a framework for exploring uncertainties in spatial data due to cartographic and process scales, and data quality, error, and accuracy were described. This section describes methods that have been used to address uncertainty in both vector and raster data structures.

Two basic data models are used in GISs to represent real-world features. The vector data model describes features on the Earth's surface as discrete and definite objects, such as buildings, parks, and cities. The field or raster data model describes the Earth's features as continuous phenomena distributed across space (e.g., elevation, temperature, or rainfall). In the vector data model, the position of each object is expressed with pairs of x and y coordinates, and the attributes of the object are stored in a relational table linked to the location of the spatial object. In the raster data structure, space is subdivided into grid cells, each of which contains a number indicating the attribute at that location. Since the vector and raster data models represent the world differently, the methods for modeling uncertainty in the two models also differ. The first part of this section explores how uncertainty is addressed in discrete vector datasets. This is followed by a discussion of approaches to addressing uncertainty in continuous surfaces, using research related to the DEM as a representative case study.

1.22.3.1 Modeling Uncertainty in Vector Data

Uncertainty in vector data arises from errors due to lineage, positional accuracy, attribute accuracy, logical inconsistency, incompleteness, semantic uncertainty, and temporal uncertainty (Guptill and Morrison, 1995; Lo and Yeung, 2002). In this section, we explore positional and attribute uncertainty in the vector data model.

1.22.3.1.1 Positional uncertainty

In the vector data model, positional uncertainty results from our lack of knowledge about the difference between coordinate values of an object in the model and the “true” locations of that object in the real world (Drummond, 1995). Spatial objects in the vector model are represented as points, lines, and polygons. The position of a point is stored as a pair of x and y coordinates. A polyline is a sequence of connected points with associated x and y coordinates. A polygon is a closed polyline. Positional errors that result in uncertainty may be contributed by factors that include, but are not limited to, limitations due to map scale and cartographic generalization, limitations of current technology, digitizing errors, raster to vector conversion, lack of precision in measurement methods, and source map distortion.

1.22.3.1.2 Analytical modeling of positional uncertainty

The basic geometric type of vector data is the point. Analytical modeling of positional uncertainty starts with point uncertainty, which serves as the building block for discrete uncertainty modeling. Analytical models assume that the characteristics of spatial data uncertainty are known and apply error propagation techniques to perform uncertainty analyses (Hong and Vonderohe, 2014). Positional error comprises systematic and random components. Systematic errors are reproducible and tend to be consistent in magnitude and/or direction, such as errors introduced during map projection or generated by limitations of measuring instruments; once identified, they can be eliminated with correction procedures. In contrast, random errors vary in magnitude and direction and can be analyzed statistically. For example, repeated measurements of a coordinate using a GPS unit contain random errors. In this section, random errors are assumed to follow a normal distribution. With these assumptions, the uncertainty associated with these errors can be expressed using descriptive statistics, such as the variance and standard deviation.
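As a hedged illustration of this statistical treatment (the coordinates and error magnitude below are synthetic, not real GPS output), repeated coordinate measurements can be summarized by their means and standard deviations:

```python
import numpy as np

rng = np.random.default_rng(7)
true_x, true_y = 500_000.0, 3_750_000.0     # hypothetical UTM coordinates (m)

# Simulate 50 repeated GPS fixes with zero-mean normal random error (sigma = 2 m)
xs = true_x + rng.normal(0.0, 2.0, size=50)
ys = true_y + rng.normal(0.0, 2.0, size=50)

print(f"mean fix: ({xs.mean():.1f}, {ys.mean():.1f})")
print(f"std dev:  sx = {xs.std(ddof=1):.2f} m, sy = {ys.std(ddof=1):.2f} m")
```

A systematic error (e.g., a constant datum shift) would bias the means rather than inflate the standard deviations, which is why it must be removed before this kind of statistical summary is meaningful.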

1.22.3.1.3 Modeling positional uncertainty of point features

The positional uncertainty of a point has been extensively investigated in geodesy and land surveying (Mikhail and Gracie, 1981). The error ellipse is widely used to model point positional uncertainty and represents the zone of uncertainty surrounding a point. The semi-major axis, semi-minor axis, and orientation θ of the error ellipse are calculated from the x-directional variance σx², the y-directional variance σy², and the covariance of the x and y coordinates of the point (Fig. 2) (Hoover, 1984; Alesheikh et al., 1999). The uncertainty of a point feature can therefore be represented by the variance of the point. The center of the ellipse is the most likely "true" location of the point; statistically speaking, however, the true location can be anywhere inside the ellipse. Shi (2009) provides an approach to calculate the probability that the true location is inside the error ellipse, represented by the volume under the two-dimensional curved surface over the error ellipse.

Fig. 2 The error ellipse of a point.
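The semi-axes and orientation of the error ellipse follow directly from the eigenstructure of the point's covariance matrix. A minimal Python sketch (the covariance values are illustrative assumptions) is:

```python
import numpy as np

# Hypothetical 2x2 covariance matrix of a point's (x, y) coordinates (m^2)
cov = np.array([[0.040, 0.015],
                [0.015, 0.025]])

eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
semi_minor, semi_major = np.sqrt(eigvals)     # 1-sigma ellipse semi-axes (m)
major_axis = eigvecs[:, 1]                    # eigenvector of largest eigenvalue
theta = np.degrees(np.arctan2(major_axis[1], major_axis[0]))

print(f"semi-major = {semi_major:.3f} m, semi-minor = {semi_minor:.3f} m, "
      f"orientation = {theta:.1f} deg")
```

When the covariance term is zero, the ellipse axes align with the coordinate axes and the semi-axes reduce to the x- and y-directional standard deviations.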

1.22.3.1.4 Modeling positional uncertainty of linear features

Different methods have been developed to model the positional uncertainty of straight lines and curved lines.

1.22.3.1.4.1 Modeling positional uncertainty of straight lines
Three models have been used to model the positional uncertainty of straight lines: (1) the epsilon band model, (2) the error band model, and (3) the G-band model.

The most popular line uncertainty model is the epsilon (ε) band model, which has been explored and interpreted by many researchers (Perkal, 1966; Chrisman, 1989; Blakemore, 1984; Edwards and Lowell, 1996; Leung and Yan, 1998). This model is based on the idea that a line is surrounded by a fixed-width buffer zone of uncertainty on each side, referred to as an epsilon band (Fig. 3). There are two general approaches to determining the boundaries of the epsilon band. The deterministic approach supposes that the true line lies within the epsilon band, with no model of error distribution involved, which is not the case in reality. The pseudo-probabilistic approach proposes that the width of the epsilon band is a function of the various variables that contribute to error, such as digitizing error, generalization error, and scale; the result is a rectangular, bell-shaped, or bimodal distribution that delineates the epsilon band along the "true" line (Chrisman, 1982; Blakemore, 1984; Dunn et al., 1990; Alai, 1993; Alesheikh et al., 1999). Regardless of the approach, epsilon band models assume the same positional uncertainty for every point on the line, that is, a fixed band width along the line. In addition, the calculation of the band width in the pseudo-probabilistic approach does not involve a random variable, so the pseudo-probabilistic approach is not a stochastic process model (Goodchild, 1991).

The error band model relies on the placement of a boundary or band around the location of a line. Unlike in the epsilon band model, the band width in the error band model varies along the line. Dutton (1992) first proposed the error band model, assuming that the endpoints of a straight line are random variables with a circular normal distribution. By simulating the error band using the Monte Carlo method, Dutton showed that the error band is narrower in the middle and wider at the endpoints of the line. Shi (1994) applied probability analysis to describe the error distribution of the line. In his method, a joint probability function of all the points on the line is computed first, from which the probability that the true location of an individual point lies inside the corresponding error ellipse is obtained. As the number of individual points on the line tends to infinity, the final probability distribution for the line within a particular region is formed by integrating the error surfaces of the individual points on the line (Fig. 4). This error band model is derived under the assumption that the positional uncertainties of the two endpoints are independent, so the resulting error band is narrowest in the middle of the line and widest at the endpoints. This assumption may not coincide with reality, as the uncertainties of the two endpoints may be correlated.

Fig. 3 Epsilon band model. Adapted from Alesheikh, A.A., Blais, J.A.R., Chapman, M.A., Karimi, H. (1999). Rigorous geospatial data uncertainty models for GISs. In: Lowell, K., Jaton, A. (Eds.), Spatial accuracy assessment: Land information uncertainty in natural resources. Ann Arbor Press, Chelsea, MI. Fig. 24.1b, p. 196.

Fig. 4 Error band model. Adapted from Alesheikh, A.A., Blais, J.A.R., Chapman, M.A., Karimi, H. (1999). Rigorous geospatial data uncertainty models for GISs. In: Lowell, K., Jaton, A. (Eds.), Spatial accuracy assessment: Land information uncertainty in natural resources. Ann Arbor Press, Chelsea, MI. Fig. 24.1c, p. 196.
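In practice, the fixed-width epsilon band of Fig. 3 amounts to a simple geometric buffer. The sketch below uses the open-source shapely package; the coordinates and band half-width are assumed values for illustration only:

```python
from shapely.geometry import LineString

line = LineString([(0, 0), (40, 10), (90, 35)])    # digitized line (map units)
epsilon = 2.5                                      # assumed half-width of the band

band = line.buffer(epsilon)   # fixed-width zone of uncertainty around the line
print(f"epsilon band area = {band.area:.1f} square map units")

# Another plausible capture of the same feature; does it fall inside the band?
alt = LineString([(0.5, -0.8), (39.6, 10.9), (89.2, 36.1)])
print(f"alternative digitization within band: {band.contains(alt)}")
```

A variable-width error band, by contrast, would require per-vertex variances rather than a single ε, which is what the error band and G-band models provide.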

Shi and Liu (2000) presented a more generic G-band error model to compensate for the drawbacks of the error band model. The G-band model assumes that the uncertainties in the locations of the two endpoints are correlated; the error band model is a special case of the G-band model. With this relaxed assumption, the shape and size of the error band may vary according to the statistical characteristics of the line. For instance, the location of the minimum error can be anywhere along the line, depending on the variances of the endpoints and other characteristics. Other models for line uncertainty include the buffer model developed by Goodchild and Hunter (1997), the map perturbation model by Kiiveri (1997), the confidence region model by Shi (1994), the positional uncertainty model of line segments by Alesheikh and Li (1996) and Alesheikh et al. (1999), the locational-based model by Leung and Yan (1998), the entropy error band (H-band) model by Fan and Guo (2001), the covariance-based α-error band by Leung et al. (2004a), and the statistical simulation model by Tong et al. (2013). These methods are variations or expansions of the epsilon band model or error band model based on either nonprobabilistic or probabilistic approaches.

1.22.3.1.4.2 Modeling positional uncertainty of curved lines
Linear features in the vector data model also include curves, which can be represented by a series of straight-line segments or by a true curve defined by a mathematical function (e.g., a circular curve or spline curve). Modeling uncertainty in a curve approximated by a series of straight-line segments combines the approaches used for straight-line segments; here we discuss uncertainty modeling of true curves defined by mathematical functions. Alesheikh (1998) extended the error band model to curves: error ellipses for arbitrary points along the curve are defined first, and the region encompassing these error ellipses forms the confidence region of the curve. The true curve is thus located inside the region with a predefined confidence level. Shi et al. (2000) suggested two models, the εσ error band model and the εm error band model, to measure the uncertainty of a curve. The positional error (variance) of an arbitrary point Pi on the curve is derived first. The standard deviation of the point in the direction perpendicular to the tangent line of the curve (εσ) is then computed from the positional error of the arbitrary point, and the locus of εσ along the curve is defined as the εσ error band, representing the positional uncertainty of the curve. In this case, εσ may not be the maximum distance between the error ellipse of the arbitrary point and the curve. Therefore εm, the maximum distance from a potential point of the error ellipse of the arbitrary point to the curve, measured perpendicular to the curve, is calculated, and the locus of εm along the curve is the εm error band (Fig. 5). Tong and Shi (2010) further developed these two models for circular curves, with case studies computing the positional uncertainty of digital shorelines and digitized road curves.

Fig. 5 The εσ error band model and εm error band model. Adapted from Tong, X., Shi, W. (2010). Measuring positional error of circular curve features in geographic information systems (GIS). Computers & Geosciences 36, 861–870. Fig. 3, p. 864.

1.22.3.1.5 Modeling positional uncertainty of polygons

Uncertainty indicators for polygons can be used to estimate uncertainty in the area, perimeter, and centroid of a polygon. The most widely applied error indicator is for polygon area. Because polygons are composed of vertices and lines, it is natural to estimate polygon uncertainty from point uncertainty and line uncertainty. Chrisman and Yandell (1988) proposed a simple statistical model to compute the variance of a polygon's area from the variances of its vertices, under the assumption that the uncertainties of the vertices are independent and identically distributed. A similar statistical model developed by Ghilani (2000), using two less rigorous and simplified techniques, produced the same result. Zhang and Kirby (2000) suggested a conditional simulation approach to incorporate spatial correlation between vertices into the modeling of polygon uncertainty. Liu and Tong (2005) employed two approaches to compute the standard deviation of a polygon's area: one based on the variance of the polygon's vertices and the other based on the area of the standard error band of its line segments. A case study shows that the uncertainty of a polygon is caused by the positional uncertainty of its vertices and boundary lines, and that there is no significant difference between the two approaches (Kiiveri, 1997; Griffith, 1989; Prisley et al., 1989; Leung et al., 2004c).

Hunter and Goodchild (1996) integrated vector and raster approaches to address uncertainty in positional data. They enhanced the existing grid cell model and extended it to vector data. Two separate, normally distributed random error grids in the x and y directions are created, with a mean and standard deviation equal to the estimate of positional error in the original polygon dataset. The error grids are then overlaid with the polygon to create a new but equally probable version of the original polygon by applying the x and y positional shifts in the error grids to the polygon's vertices. This process can be repeated a number of times to assess uncertainty in the final products. Hunter et al. (2000) applied this model to a group of six polygons: by perturbing the set of polygons 20 times, 20 realizations were obtained, and the mean polygon areas and their standard deviations were calculated to express the area uncertainty of the six polygons.
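The vertex-perturbation idea can be sketched in a few lines of Python. This hedged example (assumed positional standard deviation, independent vertex errors, shoelace area formula) perturbs a polygon's vertices repeatedly and summarizes the resulting area uncertainty:

```python
import numpy as np

def polygon_area(xy):
    """Shoelace formula for a simple polygon given as an (n, 2) vertex array."""
    x, y = xy[:, 0], xy[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

rng = np.random.default_rng(1)
verts = np.array([[0.0, 0.0], [100.0, 0.0], [100.0, 60.0], [0.0, 60.0]])
sigma = 1.5          # assumed positional std dev of each vertex (map units)

# Generate 1000 equally probable realizations by perturbing every vertex
areas = np.array([
    polygon_area(verts + rng.normal(0.0, sigma, size=verts.shape))
    for _ in range(1000)
])
print(f"mean area = {areas.mean():.1f}, std dev = {areas.std(ddof=1):.2f}")
```

Introducing spatial correlation between the vertex errors, as in the conditional simulation approach of Zhang and Kirby (2000), would generally change the spread of the simulated areas.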

1.22.3.2 Attribute Uncertainty

Attribute error in the vector data model refers to the discrepancy between the descriptive values of an object in the model and the “true” values of that object in the real world (Goodchild, 1995). Different models are used to assess attribute uncertainty based on whether the attribute values are categorical or continuous.

1.22.3.2.1 Error matrix for attribute uncertainty

Uncertainty in categorical data is assessed using an error matrix, also called a confusion matrix. This matrix has been widely adopted to compute classification accuracy in remote sensing, but it applies equally well to modeling categorical attributes in vector models. Another method is to convert the uncertainty of categorical attributes into a probabilistic model and evaluate uncertainty using sensitivity analysis.

An error matrix is a square array of numbers that cross-tabulates the number of sample spatial data units assigned to a particular category against their "true" category. The "true" category can be acquired either from field checks or from source data of higher accuracy (Lo and Yeung, 2002; Shi, 2009). Conventionally, rows list the attribute categories in the vector spatial database, and columns represent the "true" categories in the reference data; the value at the intersection of row i and column j indicates the number of sample spatial data units assigned to category i in the vector spatial database that actually belong to category j in the reference data. From the error matrix, the overall accuracy is calculated as the ratio of the total number of correctly assigned spatial data units to the total number of sample spatial data units. The producer's and user's accuracies are also derived to evaluate the accuracy of each individual category in the reference data and in the vector spatial database, respectively. These three indices are sensitive to the structure of the error matrix (e.g., when one category of spatial data units dominates the sample) and do not consider chance agreement (Stehman and Czaplewski, 1998; Lo and Yeung, 2002). The kappa coefficient is widely used because it takes chance agreement into account (Cohen, 1960; Rosenfield and Fitzpatrick-Lins, 1986). Two further indices are the error of commission and the error of omission: errors of commission are incorrect inclusions of spatial data units in a particular "true" category, while errors of omission are the percentage of spatial data units missing from their "true" category.

Sampled data are the input to error matrices; therefore, the sampling scheme and sample size influence the assessment results. Because the collection of samples is time consuming and expensive, the sample size must be kept to a minimum, yet it should be large enough for the assessment to be conducted in a statistically valid manner (Congalton, 1988; Fukunaga and Hayes, 1989). Various sampling schemes have been designed in classical and spatial statistics, including simple random sampling, systematic sampling, stratified random sampling, and stratified systematic unaligned sampling (Shi, 2009). The selection of a sampling scheme depends on the situation and purpose of the application.
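These indices are straightforward to compute from an error matrix. The following minimal Python sketch uses invented counts for a three-category example:

```python
import numpy as np

# Rows: categories in the spatial database; columns: reference ("true") categories
m = np.array([[50,  4,  2],
              [ 6, 40,  5],
              [ 3,  7, 33]])

n = m.sum()
overall = np.trace(m) / n                      # overall accuracy
producer = np.diag(m) / m.sum(axis=0)          # producer's accuracy (1 - omission)
user = np.diag(m) / m.sum(axis=1)              # user's accuracy (1 - commission)

p_o = overall
p_e = (m.sum(axis=0) * m.sum(axis=1)).sum() / n**2   # expected chance agreement
kappa = (p_o - p_e) / (1 - p_e)                      # Cohen's kappa

print(f"overall = {overall:.3f}, kappa = {kappa:.3f}")
print("producer's:", np.round(producer, 3), "user's:", np.round(user, 3))
```

Because kappa discounts the agreement expected by chance, it is always lower than the overall accuracy for any imperfect classification.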

1.22.3.3 Modeling Topological Relations of Spatial Objects With Vague Boundaries

Vector data models describe the world with discrete and definable points, lines, and polygons. Real-world objects, however, are not that simple. For example, the boundaries between mountains and plains are not clear and sharp. This type of uncertainty, unlike positional uncertainty, is inherent in the nature of the object. Over the past two decades, the development of topological relation models for spatial objects with vague boundaries has gained increasing attention (Chen, 1996; Zhan, 1998). Clementini and Felice (1996) suggested an algebraic model for topological relations of spatial objects with broad boundaries by extending the 9-intersection model (Egenhofer and Herring, 1991) for crisp spatial objects. In their model, a spatial object has an inner boundary and an outer boundary representing the indeterminacy or uncertainty of the object; the closed region between the inner and outer boundaries is the broad boundary. A spatial object can thereby be described by three parts: the interior, the broad boundary, and the exterior. Forty-four topological relations are defined according to the geometric conditions of the three parts for two spatial objects. Cohn and Gotts (1996) expressed a vague object with two concentric subregions: the inner subregion is called the "yolk", the outer the "white", and the two together make the "egg". They extended region connection calculus theory (Randell et al., 1992) to this "egg-yolk" representation of vague objects and derived 46 topological relations between spatial objects with vague boundaries. Shi and Liu (2004) used fuzzy set theory to represent a spatial object with vague boundaries. Each fuzzy object has a fuzzy membership function. Quasi coincidence and quasi difference were applied to partition the fuzzy sets via the sum of their membership functions and the difference between their membership functions, respectively. The sum and difference values were then used to interpret topological relations between the fuzzy objects. They found that there are infinitely many topological relations between two fuzzy sets, which can be approximated by a sequence of matrices.
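As a toy illustration of the general sum/difference idea (not Shi and Liu's formal quasi coincidence and quasi difference operators), two fuzzy regions can be rasterized as membership grids and combined:

```python
import numpy as np

# Two fuzzy regions rasterized as membership grids on the same 100 x 100 window
x, y = np.meshgrid(np.linspace(0, 1, 100), np.linspace(0, 1, 100))
mu_a = np.clip(1 - 3 * np.hypot(x - 0.4, y - 0.5), 0, 1)  # fuzzy object A
mu_b = np.clip(1 - 3 * np.hypot(x - 0.6, y - 0.5), 0, 1)  # fuzzy object B

s = mu_a + mu_b  # membership sum: values > 1 mark the strong-overlap core
d = mu_a - mu_b  # membership difference: the sign marks which object dominates

print("cells where the sum exceeds 1 (overlap core):", int((s > 1).sum()))
print("cells where A dominates B:", int((d > 0).sum()))
```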

1.22.3.4 Analytical Methods and Monte Carlo Simulation

Analytical methods and Monte Carlo simulation are two approaches used to represent and quantify uncertainties in both the vector and raster data models. Analytical methods develop functional relationships that link the characteristics of uncertainties in input variables to those in output variables. However, they rely on cumbersome mathematical and statistical concepts and formulae that are analytically and computationally complex (Alesheikh, 1998). In addition, most analytical methods assume independence among input uncertainties and a linear relationship between input and output uncertainties, which is rarely the case in reality (Hong and Vonderohe, 2014). New approaches have been developed to address these problems. Among them, the Taylor series method is the most widely adopted; it approximates nonlinear relationships by a truncated series (Heuvelink, 1998; Helton and Davis, 2003; Leung et al., 2004a; Zhang, 2006; Anderson et al., 2012; Xue et al., 2015). Although the Taylor series method simplifies error propagation modeling, it also introduces approximation error into the analytical model. While various methods for simulating surfaces are available (Deutsch and Journel, 1992), the most common technique for representing uncertainty in continuous fields applies Monte Carlo simulation, in which uncertainty is addressed by generating realizations of the surface from a set of random samples drawn from the probability distribution of the input data (Hong and Vonderohe, 2014). Monte Carlo simulation has been extensively applied to uncertainty analysis for spatial objects, spatial operations, and computational models (Heuvelink, 1998). In many studies, analytical models and simulation methods have been applied together for comparison and cross-validation (Leung et al., 2004b; Shi et al., 2004; Zhang et al., 2006; Sae-Jung et al., 2008; Cheung and Shi, 2000). In fact, almost all analytical models can also be simulated with Monte Carlo methods.
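As a minimal numerical sketch of first-order (truncated Taylor series) error propagation, consider a function of two correlated inputs; the model, values, and covariances below are illustrative assumptions, not taken from the studies cited above.

```python
import numpy as np

# Illustrative output model: area of a rectangle, Y = f(w, h) = w * h
def f(w, h):
    return w * h

w0, h0 = 100.0, 40.0            # measured input values (assumed)
cov = np.array([[0.25, 0.05],   # assumed input variance-covariance matrix
                [0.05, 0.16]])

# First-order Taylor approximation: Var(Y) ~ J @ cov @ J.T,
# where J is the Jacobian of f evaluated at the measured inputs
J = np.array([h0, w0])          # [df/dw, df/dh] = [h, w]
var_y = J @ cov @ J             # propagated output variance
print(f"Y = {f(w0, h0):.1f}, std(Y) ~ {np.sqrt(var_y):.2f}")
```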

1.22.3.4.1 Uncertainty simulation in continuous fields

The most common technique for representing uncertainty in continuous fields is Monte Carlo simulation. Simulation methods regard any map as only one of an infinite number of equiprobable realizations within which the true map exists. Realizations of equiprobable error surfaces are integrated in a Monte Carlo simulation to quantify the distribution of the magnitude and spatial dependence of a map's uncertainty (Ehlschlaeger, 1998). The stochastic approach to error modeling requires a number of maps, or realizations, upon which selected statistics are performed. Uncertainty is computed by evaluating the statistics associated with the range of outputs. The general steps of a Monte Carlo simulation are as follows (Alesheikh, 1998), and a minimal sketch of them appears below:

1. Determine the probability density function of the errors in the input data;
2. Obtain a set of N random variables drawn from the probability density function;
3. Perform the spatial operations or computational models with the set of random variables to get N output realizations;
4. Calculate summary statistics from the N realizations.
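As a minimal sketch of these four steps, assume Gaussian coordinate error on two surveyed points, with Euclidean distance as the spatial operation; all values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Step 1: assume coordinate errors are zero-mean Gaussian, sigma = 0.5 m
sigma = 0.5
p1, p2 = np.array([100.0, 200.0]), np.array([400.0, 600.0])

# Step 2: draw N random realizations of both points
N = 10_000
p1_r = p1 + rng.normal(0, sigma, size=(N, 2))
p2_r = p2 + rng.normal(0, sigma, size=(N, 2))

# Step 3: apply the spatial operation (Euclidean distance) to each realization
dist = np.linalg.norm(p2_r - p1_r, axis=1)

# Step 4: summarize the output distribution
print(f"mean = {dist.mean():.2f} m, std = {dist.std():.2f} m")
print("95% interval:", np.percentile(dist, [2.5, 97.5]))
```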

The underlying assumption of these representations of continuous surfaces is that the actual variance and spatial variability of error cannot be known; random fields are therefore used to approximate this variability. Monte Carlo methods for simulating uncertainty differ in how they generate the random fields that represent the spatial structure of error. Numerous approaches to simulating error have been proposed. The concepts of spatial autocorrelation and cross-correlation have been used to derive measures of uncertainty and to include the spatial autocorrelation of errors in the generation of random fields (Ehlschlaeger, 1998; Ehlschlaeger and Shortridge, 1996; Fisher, 1991a,b; Goodchild, 2000; Griffith and Chun, 2016; Hengl et al., 2010; Hunter and Goodchild, 1997; Wechsler and Kroll, 2006). Fisher (1991a,b) used the Moran's I statistic to measure the autocorrelation of a normally distributed random field. Hunter and Goodchild (1997) used a spatially autoregressive random field as a disturbance term to propagate errors. Ehlschlaeger and Shortridge (1996) and Ehlschlaeger (1998) developed a model that creates random fields with a Gaussian distribution matching the mean and standard deviation parameters derived from a higher-accuracy source. Wechsler and Kroll (2006) integrated four approaches for generating random fields to quantify DEM error and its propagation to derived parameters. Monte Carlo simulation methods are best suited to modeling the probability of different outcomes in a process that cannot be easily predicted because of random variables. Monte Carlo methods can represent a wide range of variations of the input data, are easy to implement, and can deal with nonlinearity and interdependency of input data. However, generating a large number of realizations of output data is computationally intensive and time consuming (Crosetto and Tarantola, 2001). In addition, Monte Carlo simulation is an experimental method: it does not provide the theoretical relationship between the input and output uncertainties (Xue et al., 2015).
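One simple recipe for such a field (not the specific algorithm of any study cited above) is to smooth white noise with a Gaussian kernel and rescale it to the target error standard deviation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(7)

def autocorrelated_field(shape, sigma_z, corr_cells):
    """White noise smoothed with a Gaussian kernel and rescaled to the
    target error standard deviation; a common, simple recipe used here
    for illustration only."""
    field = gaussian_filter(rng.normal(size=shape), sigma=corr_cells)
    field *= sigma_z / field.std()  # restore the desired std after smoothing
    return field

# e.g., a 200 x 200 error field with 2 m std and roughly 5-cell correlation
err = autocorrelated_field((200, 200), sigma_z=2.0, corr_cells=5)
print(err.std())  # ~2.0
```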

1.22.3.4.2 Uncertainty in raster data: The case of the digital elevation model

Approaches to estimating uncertainty vary with the spatial data structure; here we discuss approaches to addressing uncertainty in the raster data structure. Elevation data are among the most common data used for GIS evaluation of natural systems. DEMs provide the basis for characterizing landform and are used extensively in environmental applications such as geomorphology, environmental modeling, and hydrology. Here we address the special case of DEM error and the resulting uncertainty, which has served as a source of research on uncertainty in continuous surfaces for over three decades. DEM quality is a function not only of the production method but also of the scale at which the DEM is produced, and DEM data quality and associated errors are linked to DEM production methods (Daniel and Tennant, 2001; Nelson et al., 2009; Wechsler, 2007; Wilson, 2012). Errors in DEMs arise from mistakes in data entry, measurement, and temporal and spatial variation (Heuvelink, 1999). While the exact nature and extent of the errors within a DEM are unknown, attempts to represent the spatial structure of error can be used to address uncertainty. In 1934, Neprash noted: "It is frequently assumed that if traits or conditions are closely associated with one another in their geographic distribution, they are functionally, if not causally, related" (Neprash, 1934, p. 167). This early recognition of geographic spatial dependence, made famous by Waldo Tobler's first law of geography (Tobler, 1970), is referred to as spatial autocorrelation. If geographic features are spatially autocorrelated, so too are their associated errors (Congalton, 1991). Errors are a fact of spatial data and are therefore propagated throughout spatial analyses. In the case of raster surfaces, much attention has been given to the propagation of errors from DEMs to the variety of derived parameters, including those used frequently in hydrologic analyses: slope, upslope contributing area, and the topographic index (Wechsler and Kroll, 2006, 2007). Many of these approaches use the RMSE as a springboard for generating equally viable continuous surfaces integrated into Monte Carlo simulations. Statistics derived from the variability in outcomes, based on simulations of equiprobable inputs, have been used as a baseline for representing uncertainty in numerous applications.

1.22.3.4.3 DEM uncertainty propagation to derived parameters

Primary terrain attributes, such as surface slope and aspect, are computed directly from DEM data. From slope, flow direction and flow accumulation are calculated. Routing flow in a GIS requires smoothing of the DEM surface prior to calculating hydrologic parameters, further modifying the surface (Wechsler, 2007; Wu et al., 2008). Various algorithms exist for calculating these derived terrain parameters, each producing different results (Carter, 1992; Horn, 1981; Tarboton, 1997; Zevenbergen and Thorne, 1987). In the case of raster surfaces, much attention has been given to the propagation of DEM errors to the variety of derived parameters used frequently in hydrologic analyses: slope, upslope contributing area, and the topographic index (Hunter and Goodchild, 1997; Wechsler, 2007; Wechsler and Kroll, 2006). One issue not currently addressed is that, although a variety of methods exist for deriving terrain parameters, software packages do not offer users a choice among them. For example, Esri's ArcGIS provides only one option each for calculating slope and flow direction (Horn, 1981; Jenson and Domingue, 1988). Assessing and communicating the variability in results across algorithms is not currently embedded in mainstream practice. The literature is replete with viable methods and approaches to address uncertainty. These focus on the issues of spatial extent or grid cell resolution, methods to propagate error and simulate its impact on the elevation and derived surfaces, and the propagation of this error to model results. However, the research community has not reached a consensus on how to approach uncertainty and integrate these methods into accessible components of GIS software. This has impeded the integration of the approaches described in this section into conventional practice.
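As one hedged illustration of such propagation, the sketch below perturbs a synthetic DEM with spatially autocorrelated error fields (smoothed, rescaled white noise) and summarizes the per-cell variability of the derived slope; np.gradient stands in for production slope algorithms such as Horn's method, and all parameter values are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
cell = 10.0  # assumed cell size (m)

# Synthetic DEM (a smooth hill) standing in for real elevation data
x, y = np.meshgrid(np.linspace(-1, 1, 200), np.linspace(-1, 1, 200))
dem = 100.0 * np.exp(-(x**2 + y**2))

def slope_deg(z, cell):
    # Finite-difference slope via np.gradient, a stand-in for Horn's method
    dzdy, dzdx = np.gradient(z, cell)
    return np.degrees(np.arctan(np.hypot(dzdx, dzdy)))

N, sigma_z, corr_cells = 200, 2.0, 5.0  # realizations, error std (m), correlation
stack = np.empty((N,) + dem.shape)
for i in range(N):
    err = gaussian_filter(rng.normal(size=dem.shape), corr_cells)
    err *= sigma_z / err.std()          # autocorrelated error field, std = sigma_z
    stack[i] = slope_deg(dem + err, cell)

slope_sd = stack.std(axis=0)            # per-cell slope uncertainty (degrees)
print(f"mean per-cell slope std: {slope_sd.mean():.2f} degrees")
```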

1.22.4 Uncertainty Propagation in Spatial Analysis

DEM-derived terrain parameters are frequently used as inputs to distributed parameter models that represent landscape processes. Vector data are often used as input to spatial operations and computational models such as hydrologic models, habitat models, network models, and soil erosion models. Uncertainty in the input data affects the accuracy of the final output, because spatial operations and computational models are functions of their inputs; the operations and models may themselves also contain uncertainty. As a result, uncertainties are propagated through spatial operations and computational models into the final output. Accommodating this propagation requires considering not only uncertainty in model inputs but also uncertainty in model outputs. Uncertainty analysis estimates the overall uncertainty of model output, while sensitivity analysis apportions that output uncertainty among the different sources of input variation. These two methods have been used to assess uncertainties of computational models in various areas including transportation, ecology, hydrology, urban planning, hazard susceptibility mapping, land suitability evaluation, and environmental planning (Rinner and Heppleston, 2006; Rae et al., 2007; Chen et al., 2009, 2010; Plata-Rocha et al., 2012; Feizizadeh and Blaschke, 2014; Hong and Vonderohe, 2014; Ligmann-Zielinska and Jankowski, 2014). Sensitivity analysis explores how the uncertainty of model output is apportioned to different sources of variation in the input data (Saltelli et al., 2000). It studies the relationship between the uncertainty of a model's input and that of its output, determining the contribution of each individual input uncertainty to the output uncertainty. Methods for sensitivity analysis include factor screening, differential analysis, Monte Carlo simulation, and response surface analysis. Techniques to quantify the importance of input uncertainty include linear regression analysis, correlation analysis, measures of importance, and variance-based techniques (Bonin, 2006). Among them, variance-based techniques, also called ANOVA-like methods, have gained attention in recent years. They are based on Monte Carlo simulation under the assumption that the input factors are independent. For each input factor, the variance-based techniques calculate a sensitivity index representing the fractional contribution of that factor to the variance of the model output. One advantage of variance-based techniques is that, unlike regression analysis or correlation analysis, they work for nonlinear and nonadditive models (Crosetto and Tarantola, 2001). Lodwick et al. (1990) performed a sensitivity analysis of attribute values related to polygonal mapping units and proposed two algorithms for determining the confidence level of the output using five indices, one of which is attribute uncertainty. Bonin (2000, 2002) transformed uncertainty of categorical attributes into a parametric probabilistic model and used it in a sensitivity analysis to evaluate the impact of attribute uncertainties on the results of travel time computation. Uncertainty analyses estimate the overall uncertainty of model output as a result of uncertainties associated with the model input and the model itself (Saltelli et al., 2000). Unlike sensitivity analysis, which considers only uncertainty in model input, uncertainty analysis takes both the uncertainty of the model input and the uncertainty of the model itself into account. The uncertainty of the model itself is typically represented by parameters that are used to tune the modeling hypotheses (Bonin, 2006).
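As a minimal sketch of a variance-based index, the code below estimates the first-order index S_i = Var(E[Y|X_i]) / Var(Y) by binning Monte Carlo samples of a toy two-input model; the model, distributions, and sample sizes are illustrative assumptions, not a full Sobol implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def model(x1, x2):
    # Toy nonlinear model standing in for a spatial computation
    return np.sin(x1) + 0.3 * x2**2

N = 100_000
x1 = rng.uniform(-np.pi, np.pi, N)
x2 = rng.uniform(-1, 1, N)
y = model(x1, x2)

def first_order_index(x, y, bins=50):
    """Crude estimate of S_i = Var(E[Y | X_i]) / Var(Y): bin X_i by
    quantiles and take the variance of the within-bin means of Y."""
    edges = np.quantile(x, np.linspace(0, 1, bins + 1))
    idx = np.clip(np.digitize(x, edges[1:-1]), 0, bins - 1)
    cond_means = np.array([y[idx == b].mean() for b in range(bins)])
    return cond_means.var() / y.var()

print("S1 =", first_order_index(x1, y))  # the dominant contributor here
print("S2 =", first_order_index(x2, y))
```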

1.22.4.1 Modeling Uncertainty Propagation in Computational Models

In most computational models, spatial data input includes both raster and vector data. For example, in a hydrological model the input data may consist of DEM data, river network data, and land-use/land-cover data. Computational models are simplified mathematical representations of the real world, so uncertainties are inevitably introduced into them. Input data are subject to various errors contributed by measurement, scale, and sampling issues; errors in the model itself may be introduced through parameter selection, assumptions, and model structure. The associated uncertainties propagate and accumulate through the modeling process and reside in the model output. Evaluating uncertainty propagation through a computational model helps users make more effective and responsible decisions based on the model.

1.22.4.2 Modeling Uncertainty Propagation in Simple Spatial Operations

Positional uncertainty in buffer analysis is attributed to the positional uncertainty of the original points, lines, and polygons, and to the buffer width. As a result, the position of the measured buffer is not identical to the position of the "true" buffer. Modeling uncertainty in buffer analysis has mainly been investigated for the raster data model; few studies address vector data. Zhang et al. (1998) derived an uncertainty model for buffer analysis based on the epsilon band and error band models. One limitation of this model is that it assumes the positional uncertainties of the vertices are identically and independently distributed (Shi, 2009). Shi et al. (2003) proposed a more generic model that not only circumvents this limitation but also considers both positional uncertainty and buffer width uncertainty. Three indices are computed to assess uncertainty in buffer analysis: the error of commission, the error of omission, and the normalized discrepant area between the "true" and measured locations of the buffer, according to the probability density function of the measured vertices and the measured buffer value. Overlay analysis in vector data models includes point-in-polygon, line-in-polygon, and polygon-on-polygon overlay; polygon-on-polygon overlay involves vital components of the other two. Due to methodological complexities, analytical uncertainty propagation models for overlay analysis in vector data have seldom been explored, and only a few studies are found in the literature (Prisley et al., 1989; Caspary and Scheduring, 1992; Kraus and Kager, 1994; Leung et al., 2004b). Shi et al. (2004) describe an approach in which correlation is permitted between the uncertainties of the x- and y-directions at each vertex and among all the vertices in the x-direction (or y-direction). In this more generic and comprehensive model, the following uncertainty indices are calculated based on the variance–covariance matrix of all original polygon vertices: the variance–covariance matrix of the vertices of the generated polygons; the variances of measurements of the generated polygons (i.e., the variance of the perimeter, the variance of the area, and the variance–covariance matrix of the center of gravity); and the maximum and minimum error intervals for the vertices of the generated polygons (both for individual vertices and for all vertices in the x and y directions).
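A simulation stand-in for the three buffer indices (not Shi et al.'s analytical model) can be sketched with Monte Carlo perturbation of vertices and buffer width; the coordinates, standard deviations, and shapely-based geometry operations below are illustrative assumptions.

```python
import numpy as np
from shapely.geometry import LineString

rng = np.random.default_rng(3)

# "True" line and buffer (hypothetical coordinates, meters)
coords = np.array([(0, 0), (50, 10), (100, 0), (150, 20)], dtype=float)
true_buf = LineString(coords).buffer(25.0)

sigma_xy, sigma_w, N = 1.5, 0.5, 500  # assumed vertex and width stds
commission = omission = 0.0
for _ in range(N):
    noisy = coords + rng.normal(0, sigma_xy, coords.shape)  # perturb vertices
    width = 25.0 + rng.normal(0, sigma_w)                   # perturb buffer width
    buf = LineString(noisy).buffer(width)
    commission += buf.difference(true_buf).area  # area wrongly included
    omission += true_buf.difference(buf).area    # area wrongly excluded

print("mean commission area:", commission / N)
print("mean omission area:", omission / N)
print("normalized discrepancy:", (commission + omission) / N / true_buf.area)
```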

1.22.5 Semantic Uncertainty in Spatial Concepts

The previous sections deal with uncertainty related to positional and attribute accuracy. Another major type of uncertainty in spatial data is semantic uncertainty. For example, the concept of exurbanization, or urban sprawl, has more than 18 definitions in the literature (Berube et al., 2006), and each definition may generate different spatial boundaries of exurbanization for the same area (Ban and Ahlqvist, 2009); there is therefore semantic uncertainty in the concept of exurbanization. This section introduces general concepts in semantic uncertainty, including uncertainty, ontologies, and fuzzy-set approaches, and reviews semantic uncertainty in a variety of applications.

1.22.5.1 Uncertainty, Fuzzy-Sets, and Ontologies

As stated, error, vagueness, and ambiguity contribute to uncertainty. Errors occur when classes of objects and individuals are clearly defined but poorly measured. Vagueness occurs when there is no unique distinction between objects and classes. Ambiguity occurs when two or more definitions exist for a concept (Fisher, 1999). Examples of errors such as positional accuracy were addressed in the previous sections. The concepts of the MAUP and ecological fallacies result in vagueness. An example of vagueness is the concept of a hill, since it is difficult to say what elevation a hill should have. An example of ambiguity is the exurbanization concept, since an area can be classified as either exurban or non-exurban depending on the definition used. This section reviews research on uncertainty, the fuzzy-set approach, and ontologies as they relate to the semantic uncertainty of spatial data.

1.22.5.1.1 Uncertainty

Formal representations, modeling, and databases of semantic uncertainty have been studied in uncertainty research. Gahegan (1999) explored the semantic content of spatial data transformation and interoperability using formal notations useful for multiple disciplines including geoscience, hydrology, geology, and geography. Zhang and Goodchild (2002) discussed theoretical and practical aspects of spatial data and uncertainty with emphasis on the description and modeling of uncertainty. Plewe (2002) argued that uncertainty in spatio-temporal data from historical geography can be addressed using an uncertain temporal entity model. Ahlqvist et al. (2005) pointed out that uncertainty exists in both the semantic and spatial definitions of geographic objects and argued that uncertainty research on spatial objects should include the existence or semantics of the object itself as well as its location and boundary. Morris (2008) described how to deal with uncertainty in spatial databases by using fuzzy sets for query and representation. Some of these studies concentrated on users' cognition and on critiques of uncertainty research. Couclelis (2003) argued for shifting the focus of uncertainty research in geography from information to knowledge by differentiating the cognitive information systems of humans from digital information systems such as GIS. Foody (2003) described problems in uncertainty research in the GIScience community stemming from misunderstanding of spatial concepts. Brown (2004) argued for interdisciplinary approaches to addressing uncertainty and recommended that human and physical geographers collaborate to develop relevant methodologies. Kuipers (2000) discussed the usefulness of a spatial semantic hierarchy for providing robust knowledge of local geometry for robotic agents' movements.

1.22.5.1.2 Fuzzy-set approach

A fuzzy set is a class of objects with a continuum of degrees of membership of a concept or phenomenon (Zadeh, 1965). It can address uncertainty associated with vague concepts by allowing partial membership in a set (Zadeh, 1965; Fisher, 2000). When a spatial concept is characterized by vagueness (e.g., exurbanization), it should be represented using fuzzy boundaries rather than crisp, discrete boundaries. Fuzzy boundaries of a spatial concept are created by applying a fuzzy-set approach to the spatial data. The fuzzy-set approach utilizes a membership function that assigns each spatial object a membership value (i.e., of being exurbanized) ranging between zero (no membership, not exurbanized at all) and one (full membership, entirely exurbanized). In addition, a membership value of 0.5 is given at the breakpoint of the definition (where a location could be either exurban or non-exurban). Fig. 6 presents how a fuzzy-set membership function can be generated from one of the existing exurban definitions. According to Daniels (1999), exurban areas are defined as "10–50 miles away from a city of at least 50,000 people". The definition consists of two attributes, distance and population. In this section, we focus on the numerical expression of the distance attribute, "10–50 miles away from a city". An example fuzzy-set membership function for this attribute can be developed following the logic of the fuzzy-set approach above. According to the numerical expression of the attribute, the distance values of "10 miles" and "50 miles" serve as the breakpoints of a fuzzy-set membership function for deciding whether a location is exurban; we therefore assign a membership value of 0.5 to areas that correspond to the breakpoints. To keep the membership function simple, a linear formula is used in this example. We assign full membership (1) to distance values between 20 and 40 miles, and no membership (0) to a distance of 0 miles and to distances longer than 60 miles. Fig. 6 shows a visual representation of this fuzzy-set membership function.

[Fig. 6 appears here: a trapezoidal membership function plotted as membership value (0 to 1, with 0.5 marked) against distance in miles, with breakpoints at 10, 20, 40, 50, and 60 miles.]

Fig. 6 Example of a fuzzy-set membership function of an exurbanization concept of Daniels (1999) focusing on the distance attribute. Redrawn from Ban, H., and Ahlqvist, O. (2009). Representing and negotiating uncertain geospatial concepts–Where are the exurban areas? Computers, Environment and Urban Systems. 33(4), 233–246, Table 1.


Following Ban and Ahlqvist (2009), the set of membership functions for distance in Fig. 6 can be written as

$$
mf(X) =
\begin{cases}
0.05\,X, & X < 20 \\
1, & 20 \le X < 40 \\
3 - 0.05\,X, & 40 \le X < 60 \\
0, & X \ge 60
\end{cases}
\tag{2}
$$
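A minimal sketch of Eq. (2) in code, using linear interpolation over the trapezoid's breakpoints (an implementation convenience assumed here, not Ban and Ahlqvist's original code):

```python
import numpy as np

def exurban_membership(dist_miles):
    """Piecewise-linear membership of Eq. (2): 0 at 0 mi, 0.5 at the
    10- and 50-mile breakpoints, 1 between 20 and 40 mi, 0 beyond 60 mi.
    np.interp reproduces the trapezoid exactly."""
    return np.interp(dist_miles, [0, 20, 40, 60], [0.0, 1.0, 1.0, 0.0])

print(exurban_membership(np.array([0, 10, 25, 50, 70])))  # [0.  0.5 1.  0.5 0. ]
```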

When Eq. (2) is applied to empirical GIS data, the results can be visualized in maps. Fig. 7 depicts the uncertain spatial boundaries of one of the exurban definitions of Daniels (1999), visualized using Eq. (2) and distance data from metropolitan statistical areas (MSAs) in four counties in Ohio, USA. The continuous, fuzzy grayscale values in Fig. 7 represent heterogeneous degrees of exurbanization in the study areas that crisp, Boolean-style boundaries would miss; darker grays represent higher degrees of exurbanization. There are other exurban definitions in Daniels (1999) that are not introduced here, and each can have its own fuzzy-set membership functions. If a concept consists of multiple fuzzy sets, they can be combined by operations such as inclusion, union, intersection, complement, relation, and convexity (Zadeh, 1965). The development of the concept of a fuzzy set and associated equations is well documented (Klir and Yuan, 1995; Zimmermann, 1996; Ragin, 2000; Robinson, 2003). Ban and Ahlqvist (2009) showed how multiple fuzzy-set membership functions can be combined to generate negotiated uncertain boundaries of exurban areas in maps. The fuzzy-set approach has been used in several applications of multidisciplinary research. For example, it has been used for the conceptual development of uncertainty research such as multivalued logic (Fisher, 2000) and rough-fuzzy sets (Ahlqvist, 2005b); for uncertain phenomena in human geography and the social sciences with qualitative and quantitative data (Openshaw, 1998; Ragin, 2000); for uncertainty issues in behavioral and natural environment studies such as land cover mapping and DEM data (Fisher, 1996; Fisher and Wood, 1998; Fisher and Tate, 2006); for uncertain geometry and vector boundary representation (Wang and Hall, 1996; Guesgen and Albrecht, 2000); for vagueness in GIS data, object-oriented databases, and spatio-temporal data (Cross and Firat, 2000; Dragicevic and Marceau, 2000); for similarity in statistics (Hagen-Zanker et al., 2005); and for the development of GIS curricula on uncertainty and fuzzy classification (Wilson and Burrough, 1999). Fuzzy-set tools have been integrated into GIS software packages and are therefore accessible to practitioners outside the research communities.

Fig. 7 Visualization of the uncertain boundaries of exurbanization based on the fuzzy-set membership function and empirical GIS data (distance data source: US Census Bureau, 2000). The figure depicts one of the existing definitions of the exurbanization concept regarding distance, "10–50 miles away from a major urban center" (Daniels, 1999).

1.22.5.1.3 Ontologies

Ontologies are theories from philosophy that describe a certain view of the world, for example a spatial term, by its composition, structure, properties, relations, classes, boundaries, functions, and processes (Mark et al., 1999; Fonseca et al., 2002; Couclelis, 2010). Ontologies deal with the nature of being and stem from metaphysics (Ahlqvist et al., 2005). Recently, ontologies have been highlighted in spatial uncertainty research through computing technologies such as artificial intelligence, machine learning, and the semantic web (Sen, 2008; Kuhn, 2009; Couclelis, 2010). Goodchild (2004) argues that expanded research emphasizing ontologies in geographic processes is necessary. Ontology-related research has focused on the relationship between human cognition and ontologies, including: (1) the connection between producers and consumers of an ontology for the exchange of ideas (Frank, 1997); (2) formulating and testing ontologies embodied in human cognition related to geographic categories (Smith and Mark, 1998); (3) the effect of individual differences on cognitive categorization of geographic objects (Mark et al., 1999); (4) an ontology of spatial relations for image interpretation using fuzzy representations (Hudelot et al., 2008); and (5) geographic information ontologies reflecting user intentionality and objects of discourse (Couclelis, 2010). Ontology research has developed along with the integration of systems in computing environments. Philosophical, cognitive, and formal theories of semantic uncertainty have been described along with formal tools and conceptual structures for implementing and representing uncertainty (Kavouras and Kokla, 2007). An ontology-driven GIS architecture that integrates geographic information based on semantic values has been recommended (Fonseca et al., 2002). Research has focused on the implementation of spatial ontologies in computer systems, including cooperation between critical GIS and computational approaches (Duckham and Sharp, 2005), comparison of ontologies between their philosophical and computer-science meanings (Schuurman, 2006), a formal model of uncertain geographic information based on multivalued logic (Duckham et al., 2001), and an approach to dealing with vagueness in geographical concepts using logical and semantic analysis (Bennett, 2001). In addition, some studies have dealt with the temporal aspects of ontology in geography, which were neglected until recently (Couclelis, 1999), and with methods to analyze the effect of uncertainty on spatio-temporal interactions of moving objects using space-time prisms and accessibility (Neutens et al., 2007). Research on spatial ontologies has also expanded beyond the spatial domain. For example, Goodchild et al. (2007) introduced the concept of the geo-atom, which can deal with uncertainty in the measurement of objects to represent both discrete and continuous conceptualizations of the world. Frank (2003) proposed a multitier ontology for spatio-temporal GIS databases, consisting of physical reality, observable reality, the object world, social reality, and cognitive agents, to integrate philosophical viewpoints ranging from realist and positivist to postmodern views of the world. Another group of studies has worked to represent ontologies formally so that computing can be used to deal with uncertainty.
Sen (2008) demonstrated road-network ontologies within a framework for probabilistic geospatial ontologies, using machine-based mapping to verify maps created by humans. Kuhn (2009) proposed an ontology of observation and measurement to formalize and ground the semantics of spatial phenomena observed and represented on the semantic sensor web. Buccella et al. (2011) proposed a system to integrate geographic data by formalizing information as normalized ontologies consisting of structural, syntactic, and semantic aspects, to assist users in finding more suitable correspondences.

1.22.5.2 Applications

In this section, applied spatial uncertainty research is separated by timeframe to demonstrate the depth and breadth of research over the decades. Research in this area continues to expand.

1.22.5.2.1 The 1990s

In the 1990s, research focused on introducing fuzzy-set approaches and the usefulness of uncertainty in geographic research. For example, Fisher and Pathirana (1990) explored the use of a fuzzy classifier to determine land cover components of individual pixels in remote sensing data and demonstrated that the fuzzy classifier could be useful to extract information about individual pixels. Fisher (1992) revealed uncertainty in viewshed analysis and provided an alternative approach by utilizing an error simulation algorithm and fuzzy set to produce fuzzy viewsheds. Hays (1993) examined the uncertain geographic boundary of a certain region that could be represented by a few terms and argued that the fuzzy-set theory could contribute to multiple disciplines by illustrating uncertain geographic boundaries. In addition, Davis and Keller (1997b) proposed a model to combine the techniques of fuzzy logic and Monte Carlo simulation to deal with thematic classification uncertainty and variance in continuously distributed data. Comparisons between the fuzzy-set approach and Boolean approaches were explored. Burrough et al. (1992) suggested using fuzzy classification to determine land suitability and demonstrated that the fuzzy approach provided much better classification of continuous variation than the Boolean approach. Sui (1992) and Davidson et al. (1994) compared results of land evaluation from both Boolean and fuzzy-set approaches and argued the fuzzy-set approach provided more gradual results than the Boolean approach.


De Gruijter et al. (1997) demonstrated that soil distribution modeling should be based on a fuzzy-set approach to capture soil processes and land-use effects that may be missed by traditional soil maps with higher levels of aggregation and classification. Steinhardt (1998) combined traditional assessment methods and a fuzzy-set approach for assessing large areal units of landscape and demonstrated that fuzzy sets and fuzzy logic provided a better representation than the traditional assessment method. Lastly, Hunter and Goodchild (1997) applied a spatially autoregressive error model to DEMs to demonstrate the effects of DEM uncertainty on analyses of slope and aspect.

1.22.5.2.2 The 2000s

Since 2000, research on uncertainty has continued to expand. Some works extend the use of fuzzy-set approaches in spatial uncertainty research. For example, Ahlqvist et al. (2003) and Ahlqvist (2005a) demonstrated how rough fuzzy sets can be used to deal with uncertainty in classification, especially vagueness and indiscernibility in land cover categories. Dixon (2005) incorporated a fuzzy rule-based model with GIS, GPS, and remote sensing data to generate groundwater sensitivity maps. Malczewski (2006) applied fuzzy linguistic quantifiers to GIS-based land suitability analysis using land-use data. In addition, Ban and Ahlqvist (2009) demonstrated how an uncertain urban concept such as exurbanization can be measured using a fuzzy-set approach. Uncertainty in spatial data has been examined in applications using several analytical methodologies. Liu and Phinn (2003) presented an application of cellular automata modeling to represent multiple states of urban development using GIS and a fuzzy-set approach. Ladner et al. (2003) introduced an approach that applies association rules for fuzzy spatial data to assess correlations among values in data. Comber et al. (2006) argued for the importance of assessing data uncertainty using methods such as linking data quality reporting, metadata, and fitness-for-use assessment to better deal with spatial uncertainty. Ge et al. (2009) visualized uncertainty in land cover data using methods including maximum likelihood classification, a fuzzy c-means clustering algorithm, and a parallel coordinate plot, and demonstrated that a fuzzy-set approach provided better results than a probability approach. Additional research has focused on assessment of uncertainty from the viewpoint of critical GIS. Comber et al. (2004) demonstrated the inconsistency among expert opinions on relations of land cover ontologies and suggested combining different expert opinions. Duckham and Sharp (2005) suggested combining critical thinking in GIS, such as societal issues in using technology, with computational approaches to deal with uncertainty in geographic information. In addition, Duckham et al. (2006) developed a qualitative reasoning system to describe and assess the consistency of uncertain geographic data to support integration of heterogeneous geographic datasets. Much research has dealt with the development of methodologies, frameworks, and tools. For example, Jiang and Eastman (2000) reviewed existing multicriteria evaluation (MCE) approaches in GIS, including Boolean and weighted linear combination, and proposed the use of fuzzy-set membership for a more specific instance of MCE. Pappenberger et al. (2007) introduced a method to estimate uncertainty in inundation extent using a fuzzy evaluation method and remote sensing data. Zhang and Foody (2001) compared the fuzzy c-means clustering algorithm and an artificial neural network approach for classifying remote sensing data. Research has also provided techniques for modeling spatial uncertainty. Wechsler and Kroll (2006) described a Monte Carlo methodology for evaluating the effects of uncertainty on elevation values and topographic parameters in DEM data. Peterson et al. (2006) explored uncertainty in the geographic extent of the Marburg virus using ecologic niche modeling. Heuvelink et al. (2007) developed a statistical framework and software tool for simulating and representing uncertainty in variables of environmental phenomena. Bone et al. (2005) developed a model that applies fuzzy-set theory to remote sensing and GIS data to produce susceptibility maps of insect infestations in forest landscapes. Uncertainty in the classification of spatial data has also received attention, including methods for representing vagueness in taxonomic class definitions in land-use data combined with a formal representation of fuzzy sets (Ahlqvist, 2004; Ahlqvist and Gahegan, 2005) and for evaluating similarity metrics of semantic land cover change data (Ahlqvist, 2008). In addition to uncertainty in land-use data classification, uncertainty in concepts of broad urban areas has been studied. For instance, some works introduced how the uncertainty approach can represent an inherent ordering of categories such as urban, suburban, exurban, and rural areas in the numerical measurement domain (Ahlqvist and Ban, 2007) and demonstrated how an uncertain exurban concept can be measured and represented using a fuzzy-set approach, geovisualization, and a virtual environment that enables users to negotiate the spatial boundaries of an uncertain concept (Ban and Ahlqvist, 2009). Another area deals with user expertise in the use of uncertain geographic information for risk assessment in the domain of floodplain mapping (Roth, 2009).

1.22.5.2.3 The 2010s

In the 2010s, applications of spatial data uncertainty research have focused on the development of models and data. For instance, Voudouris (2010) proposed an object and field data model combined with uncertainty and semantics using Unified Modeling Language (UML) class diagrams. Tate (2013) evaluated the reliability of a hierarchical social vulnerability index by assessing and visualizing its uncertainty using Monte Carlo-based analysis. In addition, a few works have paid attention to the perceptions of users and the broader audience. For example, Grira et al. (2010) argued that the participation of end-users in the management of spatial data uncertainty in a Volunteered Geographic Information (VGI) context contributes to improving perceived spatial data quality. Goodchild et al. (2012) argued that uncertainty concepts from geographic information and social science approaches should be incorporated into Digital Earth for better communication between scientists and the public regarding uncertainty in information technology (IT) and spatial data.


Other recent works have developed frameworks and tools for spatial uncertainty. Janowicz et al. (2011) and Bastin et al. (2013) introduced new frameworks for dealing with uncertainty in geographic information, implemented in web-based user interfaces. Bordogna et al. (2012) developed a geographic information retrieval model and a software tool to represent uncertainty in indexing the text of documents with geographic locations.

1.22.6 Uncertainty Visualization

Research on uncertainty visualization is relatively new in the timeline of research on spatial data uncertainty and has developed with the advancement of IT, especially since the 1990s. Currently, various visualization techniques from IT, including static, animated, two-dimensional, three-dimensional, higher-dimensional, and interactive web-based visualization, are used to represent spatial data uncertainty more effectively. This section focuses on different techniques to visualize data uncertainty, including both positional uncertainty and semantic uncertainty. Much of the research dealing with visualization of positional uncertainty has focused on the development of methodologies. For example, Davis and Keller (1997a) introduced modeling and visualization of multiple types of uncertainty, such as uncertainty in classification and data accuracy, with an example of slope stability modeling. Lucieer and Goovaerts (2006) developed a geostatistical method to generate a spatial distribution of risk measurements to investigate how uncertainty associated with risk is visualized as uncertain locations of spatial clusters and outliers. Monmonier (2006) focused on uncertainty generated by the processes of data preparation, modeling, and classification, and argued that cartographers should not be neglected as the technology of uncertainty visualization evolves. Xiao et al. (2007) developed a method to evaluate the robustness of choropleth map classifications when uncertainty is present in the data. Tucci and Giordano (2011) developed a method for detecting positional inaccuracy and uncertainty to measure deceptive changes in urban areas using historical maps. Pfaffelmoser et al. (2011) developed a visualization methodology for representing positional and geometrical variability of isosurfaces in uncertain 3D scalar fields with user interaction. Kwan (2012a) introduced the uncertain geographic context problem, which arises from the way contextual units or neighborhoods are geographically delineated. The research community has developed software tools to visualize semantic uncertainty. For example, Bastin et al. (2002) developed a toolkit consisting of interactive, linked views to enable visualization of data uncertainty, allowing users to consider error and uncertainty as integral elements of image data to be viewed and explored. Lucieer and Kraak (2004) developed a tool to visualize fuzzy classification of remotely sensed imagery using exploratory and interactive visualization techniques; the tool consists of dynamically linked views including an image display, a parallel coordinate plot, a 3D feature space plot, and a classified map of uncertainty. Other studies on visualization of uncertainty in data types have developed a model of fuzzy spatial data types consisting of fuzzy points, lines, and regions with a fuzzy spatial algebra (Schneider, 1999), and a multivalue data type consisting of multiple instances of the same object to visualize uncertainty in spatial multivalue data (Love et al., 2005). Efforts toward accuracy assessment of data and methodology have provided guidelines for uncertainty visualization research. For the assessment of spatial data accuracy, Goodchild and Hunter (1997) proposed a technique for evaluating the positional accuracy of digitized linear features based on a comparison with higher-accuracy data using statistics and visualization.
Woodcock and Gopal (2000) demonstrated that methods for accuracy assessment using fuzzy sets contribute to finding the magnitude of errors and assessing ambiguity in map classification. Themes related to users' cognition, perception, and behavior in visualization have been explored. For instance, Deitrick and Edsall (2006) demonstrated the importance of the way uncertainty is expressed, since uncertainty visualization can affect decision-making. Some authors argued for the importance of user interpretation and perceptual issues in understanding graphical expressions of uncertainty (Drucker, 2011; Brodlie et al., 2012). Usability is another theme that has been examined for uncertainty visualization; several works addressed how uncertainty in geographic data and classification should be visualized to aid users' understanding (Deitrick and Edsall, 2008; Slingsby et al., 2011). In terms of user evaluation, Hope and Hunter (2007) demonstrated that general end-users usually do not have an intuitive understanding of uncertainty represented in GIS outputs during decision-making, regardless of their experience with spatial data. Devillers et al. (2010) likewise criticized uncertainty visualization for remaining confined to the academic literature and failing to reach general end-users of spatial data. Several works investigate how cartographic elements should be used in uncertainty visualization. For example, some demonstrated computer graphics and visualization techniques such as three dimensions, shape, glyphs, magnitude, volume, hue, and interactivity to help users access and understand uncertainty in data (Wittenbrink et al., 1996; Pang, 2001). Another research topic in this area is evaluating users' perceptions of the effectiveness of particular visualization methods such as shape, opacity, blinking, three dimensions, hue, and saturation (Drecki, 2002). In addition, a method of choropleth mapping was developed to represent uncertainty in socioeconomic data using hierarchical tessellations of data uncertainty and a quadtree data structure (Kardos et al., 2005). Bostrom et al. (2008) argued that cartographic design features should be used in risk communications such as maps to better represent uncertainty information and to influence risk perception and behavior. Visual variables and cartographic techniques such as whitening of hues, orientation, grain, arrangement, shape, fuzziness, transparency, and iconicity have been studied in cognitive testing of uncertainty visualization (Kubíček and Šašinka, 2011; MacEachren et al., 2012; Kaye et al., 2012). With recent technological development, web-based visualization has been used in semantic uncertainty research. Examples include web-based visualization and data exploration tools for capturing uncertain geographic concepts and data such as "high crime areas" (Evans and Waters, 2007), simulation of a snow avalanche event (Kunz et al., 2011), and statistical processing for extensive psychological user evaluation (Kubíček and Šašinka, 2011). Lastly, three-dimensional visualization and augmented reality have recently been applied. Su et al. (2013) developed uncertainty-aware geospatial visualization using 3D augmented reality techniques to monitor the proximity between invisible utilities and digging implements, addressing utility strikes to improve urban excavation safety. Delmelle et al. (2014) evaluated the influence of positional and temporal inaccuracies on the 3D mapping of potential outbreaks of dengue fever.

1.22.7 Uncertainty in Crowd-Sourced Spatial Data

We have entered an era of big data: large volumes of varied datasets are created every moment. Videos are made and shared on YouTube. Photos are uploaded and pinned on Instagram, Flickr, and Pinterest. Blogs are written and commented on for virtually any topic, including science, politics, fashion, and travel. Tweets are posted on anything from presidential elections and social rebellion to local news and daily gossip. Spatial data such as Global Positioning System (GPS) trajectories and places of interest are produced in collaborative mapping projects and social networking platforms, with or without users' awareness. It is indeed an age of voluminous data produced at an unprecedented pace in all aspects of human activity. These "Big Data" and the associated analytics impact our everyday lives and shape our decisions. We are able to track product prices and purchase when a price reaches a historical low. The travel website Kayak relies on historical price changes to forecast price trends and offer purchasing advice to consumers. Navigation systems such as Waze incorporate traffic data contributed by on-road drivers via their cellphones to provide route suggestions in real time. Big spatial data are unique in the ability to analyze data based on spatial location, and they provide great potential for studying physical and human phenomena. Earth observation systems such as the Landsat program continuously acquire satellite imagery of the Earth, while widespread sensor networks constantly monitor environmental conditions including temperature, pressure, and sound. Furthermore, empowered by advanced geospatial technologies and ubiquitous computing systems, billions of human sensors (Goodchild, 2007) are now capable of creating large amounts of geotagged data, from bikeways to parking lots. In many cases we can now find more than one data source for a given geographic feature. These crowd-sourced spatial datasets have unique characteristics. First, they are assertive rather than authoritative: there is no "gold standard" or master geographic dataset any more. Anyone with access to geospatially enabled technologies can generate geographic information effortlessly, and different representations of the same geographic world can coexist simultaneously. Second, a much wider range of geographic phenomena can be mapped by citizen volunteers, and ephemeral geographic things and events may be recorded and visualized in real time. When mapping was expensive and production cycles were long, government agencies and their experts tended to map things that are stable, which ensured the accuracy and validity of maps over long periods. Nowadays, however, identifying an object's location on the Earth's surface or recording one's travel trajectory may be as easy as pressing a few buttons on a smartphone. When we are inundated with such volumes of easy-to-get information, evaluating data sources and identifying the factors that contribute to uncertainty are of particular importance.

1.22.7.1 Evaluation of Uncertainty in Crowd-Sourced Spatial Data

Uncertainty in spatial data may be (1) measured during the data creation process or (2) evaluated after datasets are produced. Goodchild and Li (2012) proposed three mechanisms (the crowd-sourcing, social, and geographic approaches) to ensure spatial data quality during data acquisition and compilation. Other researchers have evaluated data quality by comparing a crowd-sourced dataset with a reference authoritative data source (e.g., Haklay, 2010; Cipeluch et al., 2010). These two directions of research yield four major methods for evaluating uncertainty in crowd-sourced spatial data. First, uncertainty in crowd-sourced spatial data can be evaluated using the crowd-sourcing approach, which is based on Linus's Law: "given enough eyeballs, all bugs are shallow" (Raymond, 1999). In its original context, the likelihood of a bug being found and corrected increases with the number of programmers reviewing a piece of code. Applied to data uncertainty, it suggests that higher accuracy of spatial data is associated with a larger number of reviewers or editors. This has been confirmed by a study of OpenStreetMap data quality (Haklay, 2010): positional accuracy of features increases with the number of editors until a threshold is reached. The approach can also be used to update datasets such as street networks; when an off-road route is frequently traveled by an increasing number of drivers, a new road segment will likely be added to keep the dataset up to date. The approach works best for geographic features that are prominent or draw the attention of many viewers and editors, but it may be less effective for geographic areas that are sparsely populated or attract little interest. It is also problematic for spatial data that involve disagreements (e.g., over feature types), as demonstrated by the "tag wars" in OpenStreetMap (Mooney, 2011). The second method, the social approach, relies on social trust and a hierarchy of gate-keepers. A reputation system is established based on the number and quality of contributions a person makes, and data producers at different levels in the hierarchy have different privileges for editing, deletion, blocking other users, and resolving disputes. This type of mechanism has been implemented in many crowd-sourcing projects, including Wikipedia and OpenStreetMap (OSM). The third approach uses geographic theories and principles that govern our world to assess the likelihood of a piece of geographic information being true. While the first two approaches evaluate data uncertainty indirectly, by looking at either the number of editors or the credibility of a contributor, this approach evaluates spatial data uncertainty directly using geographic knowledge. A geographic fact under evaluation should be consistent with our geographic knowledge of that particular feature type and the surrounding areas, so uncertainty in the form of logical inconsistency can be identified easily with such a knowledge base. For instance, a photo of a charming house geotagged to the middle of an ocean is probably misplaced. This is a promising approach for ensuring data quality; however, implementing thousands of geographic principles and rules remains a challenge. In addition to the three approaches proposed by Goodchild and Li (2012), another common method for evaluating uncertainty in crowd-sourced spatial data is comparison between a dataset and a reference authoritative source. This method has been widely used in quality analyses of OSM data in different parts of the world. For example, Haklay (2010) assessed the positional accuracy and completeness of geographic features in London and England by comparing OSM with Ordnance Survey datasets. Girres and Touya (2010) analyzed the geometric, attribute, semantic, and temporal accuracy, logical consistency, completeness, lineage, and usage of OSM data in France by comparing them with the BD TOPO® Large Scale Referential (RGE) from the French national mapping agency (Institut Géographique National, IGN). Fan et al. (2014) evaluated building footprint data in terms of completeness, semantic accuracy, positional accuracy, and shape accuracy for the city of Munich, Germany, by comparing OSM with ATKIS (the German Authority Topographic–Cartographic Information System). Although this is a widely adopted method, we need to be aware of its limitations. It may quickly become inadequate for temporal accuracy, as authoritative data sources usually have long production cycles and may easily be outperformed by a crowd-sourced spatial database created by millions of human sensors equipped with advanced geospatial technologies; for the same reason, evaluating completeness with this method may also be inappropriate. As demonstrated by a study conflating bikeways from authoritative and crowd-sourced data (Li and Valdovinos, 2017), authoritative datasets are no longer the gold standard and can actually be complemented and improved using a crowd-sourced spatial dataset.
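As a hedged sketch of one such comparison, in the spirit of the buffer-overlap method of Goodchild and Hunter (1997), the code below computes the percentage of a crowd-sourced line's length that falls within an epsilon buffer of a reference line; the coordinates are hypothetical.

```python
from shapely.geometry import LineString

# Hypothetical coordinates (meters): reference (authoritative) and
# test (crowd-sourced) representations of the same road
reference = LineString([(0, 0), (100, 5), (200, 0)])
test = LineString([(0, 2), (100, 9), (200, 3)])

def pct_within(test_line, ref_line, epsilon):
    """Percentage of the test line's length lying within an epsilon
    buffer of the reference line (buffer-overlap method)."""
    return test_line.intersection(ref_line.buffer(epsilon)).length / test_line.length

for eps in (1, 2, 5, 10):
    print(f"epsilon = {eps:2d} m: {100 * pct_within(test, reference, eps):5.1f}% within buffer")
```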

1.22.7.2 Uncertainty in Platial Data

A new dimension that is becoming increasingly important is uncertainty regarding places in crowd-sourced data. Place is a core concept in everyday life, in the discipline of geography, and across the social sciences. It has been extensively studied in human geography from experiential, humanistic, and phenomenological perspectives (e.g., Tuan, 1977; Relph, 1976; Harvey, 1993; Hubbard and Kitchin, 2010). However, place has generally been ignored in GIScience because its vagueness and ambiguity do not meet the standard of rigor in scientific studies. Many places are geographic objects with indeterminate boundaries, or ones that change through time, and are difficult to define with the exact polygonal extents traditionally required by GIS (Burrough and Frank, 1996). Space, on the other hand, has been studied comprehensively through such concepts as location, distance, direction, geometry, and topology in GIScience, and GISs have focused on representing space using points, lines, and areas in predefined coordinate systems within which measurements are made. So far, research on spatial data accuracy and uncertainty has focused on geographic features represented as coordinate pairs in space. However, the wide adoption of digital technologies and the crowd-sourcing of geographic information have generated voluminous data centered on place, which have great potential to benefit society. It is time to take place more seriously in spite of its imprecise or absent spatial component. Just as location is essential in spatial representation, the placename is critical in representing places: it is the key that links information about the same place from different sources, just as location links properties through spatial joins. An accurate location, or even a highly inaccurate one, is no longer essential in representing place. A placename like Lower Manhattan may be more pertinent to an average citizen than the officially defined boundary of Manhattan. The development of digital technologies makes it incredibly easy to record people's lives, so placenames that were once present only in verbal conversations are now mentioned and discussed on the web, on Twitter, on Flickr, in travel blogs, and elsewhere. This is a form of informal geographic information. Consider the sentence: "In downtown Long Beach, the Fahrenheit temperature was 59 on 5 December 2016." It involves a place, "downtown Long Beach", that is poorly defined in terms of spatial location, without any coordinates in a reference system; yet a message like this is clearly meaningful and useful despite its uncertain spatial location. Place is sometimes not a well-defined geographic entity but a vague object with changing context. Because of its ubiquitous presence in human discourse, many efforts have been made to represent uncertainty associated with places, especially those mentioned and tagged in crowd-sourced datasets. For example, Montello et al. (2003) asked participants to draw the location of downtown Santa Barbara in order to extract spatial information for the vague place "downtown". Jones et al. (2008) tried to delineate the spatial extent of vague places by searching and calculating the frequency of co-occurrence between examined places and more precisely defined places. There is also increasing interest in generating spatial footprints for imprecise regions using geotagged photos, particularly photos on Flickr.
Due to the ubiquitous presence of places in human discourse, many efforts have been made to represent the uncertainty associated with them, especially places mentioned and tagged in crowd-sourced datasets. For example, Montello et al. (2003) asked participants to draw the location of downtown Santa Barbara in order to extract spatial information for the vague place "downtown". Jones et al. (2008) tried to delineate the spatial extent of vague places by searching and calculating the frequency of co-occurrence between the places under examination and more precisely defined places. There is also increasing interest in generating spatial footprints for imprecise regions using geotagged photos, particularly photos in Flickr. Grothe and Schaab (2009) used two statistical methods, kernel density estimation and support vector machines, to generate boundaries of named places. Keßler et al. (2009) implemented a clustering and filtering algorithm to assign spatial locations to placenames and to compare those locations with the locations obtained from traditional gazetteers. Li and Goodchild (2012) extracted geotagged Flickr photos and constructed different places in a hierarchy in France. Fig. 8 shows a density surface of Flickr photos tagged with the placename "Paris" (in red) along with density surfaces for many of the constituent features of a tourist's Paris. The map bears little relationship to the official, spatial definition of Paris, but clearly demonstrates the hierarchical nature of a tourist's conception of Paris. More recently, Chen and Shaw (2016) developed a modified kernel density estimation method to delineate the vague spatial extents of places extracted from Flickr photos.
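As a rough illustration of the kernel density estimation route taken in this line of work, the sketch below fits SciPy's gaussian_kde to synthetic points standing in for Flickr geotags of a single placename and thresholds the resulting surface to obtain a footprint. The coordinates, the default bandwidth, and the 95% density-mass cutoff are illustrative assumptions, not any published parameterization.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)
# Synthetic stand-ins for geotagged photo locations (lon, lat) that share
# one placename tag; a real study would use harvested Flickr coordinates.
pts = rng.normal(loc=(2.35, 48.86), scale=(0.03, 0.02), size=(500, 2))

kde = gaussian_kde(pts.T)  # kernel density estimate over the photo locations

# Evaluate the density on a regular grid.
xs = np.linspace(pts[:, 0].min(), pts[:, 0].max(), 200)
ys = np.linspace(pts[:, 1].min(), pts[:, 1].max(), 200)
gx, gy = np.meshgrid(xs, ys)
density = kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)

# One plausible footprint: the region holding the densest 95% of samples,
# taken here as cells above the 5th-percentile density of the points.
level = np.quantile(kde(pts.T), 0.05)
footprint = density >= level
print(f"{footprint.mean():.1%} of grid cells fall inside the footprint")
```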


Fig. 8 Density surfaces created from Flickr postings for Paris (red) and several major places within a tourist’s Paris (Background map: OpenStreetMap) (Li and Goodchild, 2012).

1.22.7.3 Uncertainty Equals Low Quality?

Understandably, people prefer certainty to uncertainty. Over the past few decades, the GIScience community has made considerable effort to fight spatial data uncertainty by reducing errors, removing fuzziness, and increasing quality. We judge a spatial data source by how certain or accurate it is, and we have developed numerous methods and techniques to assess uncertainty. As discussed in the previous sections, positional uncertainty is measured using precise quantitative methods, such as the difference between a represented location and its true location in reality. A high-quality spatial database is expected to have minimal uncertainty in every aspect, and geometric accuracy has always been an important measure of map quality. In the era of big geospatial data, however, a large portion of geographic information is created about places: placenames may be all that is available to link properties provided by various sources, and a locational or geometric component may not be available at all. Should we treat these data sources as having little or no value because of the high uncertainty associated with their locations? Or should we extract valuable knowledge from them in spite of that uncertainty?

Perfectly accurate data are not always necessary, or even desirable. The London Tube map (Fig. 9) is a classic example of schematic mapping that represents stations on transit lines in their relative positions. Designed by Harry Beck in 1933, it was the first attempt to represent London's complex transit railway system using a schematic diagram. Distances in central London are expanded while distances in the periphery are compressed; lines are straightened and directions are adjusted to horizontals, verticals, and diagonals. Although the geometry is intentionally distorted, the Tube map serves the major purpose of a transit map, which is to convey route information to passengers. Assisted by the map, travelers find their origins and destinations, where to board a train, where to make connections, and where to get off. Its success as a navigation aid has been proven by its wide adoption in many cities all over the world (Ovenden, 2007). This type of map significantly reduces the cognitive load of wayfinding in a complex subway or other public transit system. Schematic mapping is also used in railroad maps and electric distribution systems, where geographic accuracy between features such as junctions is not relevant. In all these cases, geometric detail is reduced for the sake of clear representation and a lower cognitive load in comprehending essential information; nonessential geometric detail between subway or railroad stations is not required for performing the task. In summary, the Tube map has very low accuracy and high uncertainty in terms of the positions of train stations, but it is a very suitable format for conveying the topological relationships between stations and is actually more convenient to use than a planimetrically correct map with accurate positions.

Another type of map that does not preserve geometry is the cartogram. Cartograms distort location and resize geographic features, exploiting the length of lines or the area of polygons to emphasize geographic variables such as population or travel time. Area cartograms, also known as value-by-area maps, have been proven an effective way to visualize spatial distributions by substituting land area for other quantitative attributes (Tobler, 2004).
For example, cartograms have been created to represent China's population and wealth distribution (Li and Clarke, 2012). They have also been used to visualize different aspects of social life in Great Britain (Dorling, 1995) and are included in newspaper stories and technical reports due to their simple interpretation.

Fig. 9 Official Tube map: London Underground. Source: http://content.tfl.gov.uk/standard-tube-map.pdf

A second type of cartogram is the distance cartogram, in which map distance corresponds not to distance on the Earth's surface but to another variable, such as travel time or travel cost; distance is distorted so as to be proportional to the magnitude of the variable of interest. In both types of cartogram, location, length, size, and shape are distorted according to the value of the investigated variable, while topology is maintained, manifested as retained adjacency between polygons and connectivity between nodes.

In geographic visualization, absolute geometric accuracy is often not the ultimate goal. High accuracy may be required to identify a geographic feature on the Earth by its shape, or to measure distance and area, but in many other tasks precise location is not required. In both schematic maps and cartograms, we deliberately distort geometry to achieve some other purpose, including the removal of unnecessary detail to provide a focused presentation. This is particularly applicable to the world of crowd-sourced geographic data. Large amounts of valuable information are embedded in blogs, websites, photo tags, and tweets, often with no accurate locational information at all. Instead of rejecting these sources outright due to their inherent uncertainty, we may choose to accept uncertainty as an intrinsic characteristic of platial data and make good use of them. A place may be represented simply as a point whose absolute location is not critical; what matters are the relative locations of places and the connectivity and relations between them.

1.22.8 Future Directions

Uncertainty is an inevitable component of any spatial database and has a profound impact on the analyses and decisions that rely on these data. GIScientists have devoted much effort to quantifying, assessing, and visualizing data uncertainty, developing mathematical formulae and simulation methods to characterize positional and attribute uncertainty in both vector and raster data models. Even though uncertainty in spatial data has been extensively studied, ranging from positional accuracy to attribute accuracy, from analytical approaches to Monte Carlo simulation approaches, and from uncertainty of spatial objects to uncertainty propagation in spatial operations and computational models, there is still much work to be done in modeling and conveying uncertainty.

Current models mainly deal with positional accuracy, whereas attribute uncertainty has received little attention in the literature. This might be attributed to the difficulty of modeling categorical data, which constitute a major part of attribute data. However, attribute data are an integral component of spatial data, and the lack of uncertainty modeling for attribute data will eventually impact uncertainty assessment as a whole. More research therefore needs to be done on models for attribute uncertainty, as well as on models that integrate positional and attribute uncertainty.

Modeling of uncertainty propagation in spatial operations such as buffer and overlay analysis has largely been investigated for raster rather than vector data, as these operations are more straightforward in the raster model. Nevertheless, many studies perform spatial analysis on vector data, and in some cases, such as network analysis, vector data are the most appropriate model. Without suitable models for uncertainty assessment, the results of such analyses cannot be fully trusted, so it is important to develop more models for uncertainty propagation in spatial operations on vector data.

The need to integrate uncertainty models for vector and raster data follows naturally from the integration of the two data models. For example, in a zonal statistics analysis that calculates the average temperature for each county in a state, the county boundaries are vector data while the temperature surface is raster data. Monte Carlo simulation might be one way to assess uncertainty propagation through such an analysis, but it is also important to develop an integrated analytical method in order to understand the underlying relationships.
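To make the zonal-statistics example concrete, here is a minimal Monte Carlo sketch in Python. The 100 × 100 temperature raster, the three "county" zones, the 0.5°C noise level, and the assumption of independent per-cell errors are all illustrative; a fuller treatment would use a spatially correlated error model (cf. Heuvelink, 1998).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy inputs: a temperature raster (deg C) and a zone raster assigning each
# cell to one of three hypothetical counties (three horizontal bands).
temperature = 15 + 3 * rng.standard_normal((100, 100))
zones = np.repeat(np.arange(3), 34)[:100]
zone_raster = np.broadcast_to(zones[:, None], (100, 100))

sigma = 0.5      # assumed std. dev. of per-cell measurement error (deg C)
n_runs = 1000    # number of Monte Carlo realizations

means = np.empty((n_runs, 3))
for i in range(n_runs):
    # Perturb the raster with one realization of the error model, then
    # recompute the zonal mean for each county.
    perturbed = temperature + sigma * rng.standard_normal(temperature.shape)
    for z in range(3):
        means[i, z] = perturbed[zone_raster == z].mean()

# The spread of the zonal means across runs estimates how the raster's
# uncertainty propagates into the vector-zone statistics.
for z in range(3):
    print(f"zone {z}: mean {means[:, z].mean():.3f} deg C, "
          f"std {means[:, z].std():.4f} deg C")
```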

Fuzzy sets and ontologies have been used to address semantic uncertainty, while cartographic techniques have been adopted to visualize uncertainty for better communication with wider audiences. The rapidly growing volume of crowd-sourced spatial data has added several new dimensions to data uncertainty research. New methods must be developed to evaluate positional and attribute uncertainty for datasets created without quality control procedures. Given that a large portion of crowd-sourced geographic data is centered on places without accurate positional information, uncertainty of platial data remains an inadequately studied area.

Regarding semantic uncertainty, the existing literature focuses on fuzzy-set approaches to measure uncertainty and on ontologies to describe uncertainty formally in computing environments. Issues of semantic uncertainty arise in a wide spectrum of applications spanning urban, environmental, technical, and social domains. Some recent studies utilize advanced methodologies from information technology and computer science, such as web-based systems and Digital Earth. With further development of technologies such as virtual environments, the study of semantic uncertainty might contribute to wider areas of knowledge by addressing uncertainty problems of data from both real and virtual spaces.

Current studies of uncertainty visualization employ cartographic variables such as hue, saturation, transparency, shape, and orientation to represent the uncertainty of spatial concepts and data. Some recent studies utilize advanced methodologies such as web-based 3D visualization and augmented reality, and uncertainty visualization could benefit from future developments in information technology that expand the range of visualization types.
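As a toy illustration of how one such cartographic variable, transparency, can carry uncertainty, the sketch below renders a value-by-alpha raster with matplotlib; the attribute surface and its uncertainty estimates are randomly generated stand-ins for real data.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm

rng = np.random.default_rng(7)
value = rng.random((50, 50))        # mapped attribute (illustrative values)
uncertainty = rng.random((50, 50))  # its estimated uncertainty, scaled 0..1

rgba = cm.viridis(value)          # colormap hue/value encodes the attribute
rgba[..., 3] = 1.0 - uncertainty  # opacity encodes confidence in each cell

plt.imshow(rgba)
plt.title("Value-by-alpha: fainter cells are less certain")
plt.axis("off")
plt.savefig("uncertainty_map.png", dpi=150)
```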
As an alternative to authoritative spatial data, crowd-sourced geographic information is promising in providing free and timely data for applications such as basic spatial infrastructure data, routing and navigation, places of interest, tourism, and emergency response. However, the uncertainty associated with this new data source needs to be investigated and understood more thoroughly. On the one hand, new methods need to be developed to evaluate data quality, as demonstrated in this article; four approaches have been proposed or applied, namely the crowd-sourcing approach, the social approach, the geographic approach, and the comparison approach, though none has yet been fully implemented as an automatic process in software packages. On the other hand, we need not be obsessed with highly accurate spatial data in every respect. New methods and techniques should be developed to take advantage of large volumes of uncertain data that are centered on places, represented by placenames and the relationships between places.

Many approaches to represent spatial data uncertainty have been proposed and summarized in this article. Currently, however, many of these approaches are not accessible to everyday geospatial practitioners, and for general users it is all but impossible to apply these models in real-world applications. This may be due to a number of factors:

• The scientific community has not reached a consensus on which approaches to take.
• Software vendors have not integrated approaches to quantify and visualize uncertainty.
• Academic programs are not consistently training students in methods to address, quantify, and visualize uncertainty, so there is little demand for software vendors to provide such tools.
• The community of practitioners may still be reluctant to accept the fuzziness inherent in all spatial data, perhaps viewing it as a threat to the perceived validity of geospatial results.

The time has come for uncertainty to be integrated into common geospatial practice. To do so requires an understanding of the complexities of uncertainty and a consensus within the scholarly and professional communities regarding how it should be approached.

References

Ahlqvist, O., 2004. A parameterized representation of uncertain conceptual spaces. Transactions in GIS 8 (4), 493–514.
Ahlqvist, O., 2005a. Using uncertain conceptual spaces to translate between land cover categories. International Journal of Geographical Information Science 19 (7), 831–857.
Ahlqvist, O., 2005b. Transformation of geographic information using crisp, fuzzy and rough semantics. In: Fisher, P., Unwin, D. (Eds.), Re-presenting GIS. John Wiley & Sons, London, p. 99.
Ahlqvist, O., 2008. Extending post-classification change detection using semantic similarity metrics to overcome class heterogeneity: A study of 1992 and 2001 US National Land Cover Database changes. Remote Sensing of Environment 112 (3), 1226–1241.
Ahlqvist, O., Ban, H., 2007. Categorical measurement semantics: A new second space for geography. Geography Compass 1 (3), 536–555.
Ahlqvist, O., Gahegan, M., 2005. Probing the relationship between classification error and class similarity. Photogrammetric Engineering & Remote Sensing 71 (12), 1365–1373.


Ahlqvist, O., Keukelaar, J., Oukbir, K., 2003. Rough and fuzzy geographical data integration. International Journal of Geographical Information Science 17 (3), 223–234.
Ahlqvist, O., Bibby, P., Duckham, M., Fisher, P., Harvey, F., Schuurman, N., 2005. Not just objects: Reconstructing objects. In: Fisher, P., Unwin, D. (Eds.), Re-presenting GIS. John Wiley & Sons, London, pp. 17–25.
Alai, J., 1993. Spatial uncertainty in a GIS. Master of Science thesis, The University of Calgary, Calgary.
Alesheikh, A.A., 1998. Modeling and managing uncertainty in object-based geospatial information systems. Doctor of Philosophy thesis, The University of Calgary, Alberta.
Alesheikh, A.A., Li, R., 1996. Rigorous uncertainty models of line and polygon objects in GIS. Paper presented at the GIS/LIS International Conference '96, Denver.
Alesheikh, A.A., Blais, J.A.R., Chapman, M.A., Kariml, H., 1999. Rigorous geospatial data uncertainty models for GISs. In: Jaton, A., Lowell, K. (Eds.), Spatial accuracy assessment: Land information uncertainty in natural resources. Ann Arbor Press, Chelsea, pp. 195–202.
Anderson, T.V., Mattson, C.A., Larson, B.J., Fullwood, D.T., 2012. Efficient propagation of error through system models for functions common in engineering. Journal of Mechanical Design 134 (1), 1–6.
ASPRS, 1989. ASPRS accuracy standards for large scale maps. Photogrammetric Engineering and Remote Sensing 56, 1068–1070.
Ban, H., Ahlqvist, O., 2009. Representing and negotiating uncertain geospatial concepts – Where are the exurban areas? Computers, Environment and Urban Systems 33 (4), 233–246.
Bastin, L., Fisher, P.F., Wood, J., 2002. Visualizing uncertainty in multi-spectral remotely sensed imagery. Computers & Geosciences 28 (3), 337–350.
Bastin, L., Cornford, D., Jones, R., Heuvelink, G.B., Pebesma, E., Stasch, C., Nativi, S., Mazzetti, P., Williams, M., 2013. Managing uncertainty in integrated environmental modelling: The UncertWeb framework. Environmental Modelling & Software 39, 116–134.
Bennett, B., 2001. What is a forest? On the vagueness of certain geographic concepts. Topoi 20 (2), 189–201.
Berube, A., Singer, A., Wilson, J.H., Frey, W.H., 2006. Finding exurbia: America's fast-growing communities at the metropolitan fringe. The Brookings Institution: Living Cities Census Series, October, pp. 1–48.
Blakemore, M., 1984. Generalization and error in spatial databases. Cartographica 21 (2), 131–139.
Blöschl, G., 1996. Scale and scaling in hydrology. Technical University, Institut für Hydraulik, Gewässerkunde und Wasserwirtschaft.
Bone, C., Dragicevic, S., Roberts, A., 2005. Integrating high resolution remote sensing, GIS and fuzzy set theory for identifying susceptibility areas of forest insect infestations. International Journal of Remote Sensing 26 (21), 4809–4828.
Bonin, O., 2000. New advances in error simulation in vector geographical databases. Paper presented at the 4th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, University of Amsterdam, The Netherlands.
Bonin, O., 2002. Large deviation theorems for weighted sums applied to a geographical problem. Journal of Applied Probability 39 (2), 251–260.
Bonin, O., 2006. Sensitivity analysis and uncertainty analysis for vector geographical applications. Paper presented at the 7th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Lisbon.
Bordogna, G., Ghisalberti, G., Psaila, G., 2012. Geographic information retrieval: Modeling uncertainty of user's context. Fuzzy Sets and Systems 196, 105–124.
Bostrom, A., Anselin, L., Farris, J., 2008. Visualizing seismic risk and uncertainty. Annals of the New York Academy of Sciences 1128 (1), 29–40.
Brodlie, K., Osorio, R.A., Lopes, A., 2012. A review of uncertainty in data visualization. In: Expanding the frontiers of visual analytics and visualization. Springer, London, pp. 81–109.
Brown, J.D., 2004. Knowledge, uncertainty and physical geography: Towards the development of methodologies for questioning belief. Transactions of the Institute of British Geographers 29 (3), 367–381.
Brus, J., 2013. Uncertainty vs. spatial data quality visualisations: A case study on ecotones. In: 13th SGEM GeoConference on Informatics, Geoinformatics and Remote Sensing, vol. 1, Albena, Bulgaria, pp. 1017–1024.
Buccella, A., Cechich, A., Gendarmi, D., Lanubile, F., Semeraro, G., Colagrossi, A., 2011. Building a global normalized ontology for integrating geographic data sources. Computers & Geosciences 37 (7), 893–916.
Burrough, P.A., Frank, A., 1996. Geographic objects with indeterminate boundaries, vol. 2. CRC Press, London.
Burrough, P.A., MacMillan, R.A., Deursen, W.V., 1992. Fuzzy classification methods for determining land suitability from soil profile observations and topography. Journal of Soil Science 43 (2), 193–210.
Buttenfield, B.P., 1993. Representing data quality. Cartographica: The International Journal for Geographic Information and Geovisualization 30 (2–3), 1–7.
Buttenfield, B.P., 2000. Mapping ecological uncertainty. In: Hunsaker, C.T., Goodchild, M.F., Friedl, M.A., Case, T.J. (Eds.), Spatial uncertainty in ecology. Springer-Verlag, New York, pp. 116–132.
Carter, J., 1992. The effect of data precision on the calculation of slope and aspect using gridded DEMs. Cartographica: The International Journal for Geographic Information and Geovisualization 29 (1), 22–34.
Caspary, W., Scheduring, R., 1992. Error-bands as measures of geometrical accuracy. Paper presented at the Third European Conference on GIS (EGIS'92), Munich.
Chen, X., 1996. Spatial relations between uncertain sets. Paper presented at the International Archives of Photogrammetry and Remote Sensing, Vienna.
Chen, J., Shaw, S.L., 2016. Representing the spatial extent of places based on Flickr photos with a representativeness-weighted kernel density estimation. In: International Conference on Geographic Information Science, pp. 130–144.
Chen, Y., Yu, J., Shahbaz, K., Xevi, E., 2009. A GIS-based sensitivity analysis of multi-criteria weights. Paper presented at the 18th World IMACS Congress and MODSIM09 International Congress on Modelling and Simulation, Cairns.
Chen, Y., Yu, J., Khan, S., 2010. Spatial sensitivity analysis of multi-criteria weights in GIS-based land suitability evaluation. Environmental Modelling & Software 25 (12), 1582–1591.
Cheng, T., Adepeju, M., 2014. Modifiable temporal unit problem (MTUP) and its effect on space-time cluster detection. PLoS One 9 (6), e100465.
Cheung, C.K., Shi, W., 2000. A simulation approach to analyze error in buffer spatial analysis. Paper presented at the International Archives of Photogrammetry and Remote Sensing, Amsterdam.
Chrisman, N.R., 1982. A theory of cartographic error and its measurement in digital bases. Paper presented at the Fifth International Symposium on Computer-Assisted Cartography and International Society for Photogrammetry and Remote Sensing (Auto-Carto 5): Environmental Assessment and Resource Management, Crystal City.
Chrisman, N.R., 1989. Error in categorical maps: Testing versus simulation. Paper presented at the Ninth International Symposium on Computer-Assisted Cartography (Auto-Carto 9), Baltimore.
Chrisman, N.R., 1991. The error component in spatial data. Geographical Information Systems 1, 165–174.
Chrisman, N.R., Yandell, B.S., 1988. Effects of point error on area calculations: A statistical model. Surveying and Mapping 48, 241–246.
Cipeluch, B., Jacob, R., Winstanley, A., Mooney, P., 2010. Comparison of the accuracy of OpenStreetMap for Ireland with Google Maps and Bing Maps. In: Proceedings of the Ninth International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, 20–23 July 2010. University of Leicester, pp. 37–40.
Claramunt, C., Theriault, M., 1996. Toward semantics for modelling spatio-temporal processes within GIS. Advances in GIS Research I, 27–43.
Clementini, E., Felice, D.P., 1996. An algebraic model for spatial objects with indeterminate boundaries. In: Burrough, P.A., Frank, A.U. (Eds.), Geographic objects with indeterminate boundaries. Taylor & Francis, London, pp. 155–169.
Cohen, J., 1960. A coefficient of agreement of nominal scales. Educational and Psychological Measurement 20 (1), 37–46.
Cohn, A.G., Gotts, N.M., 1996. The "egg-yolk" representation of regions with indeterminate boundaries. In: Burrough, P.A., Frank, A.U. (Eds.), Geographic objects with indeterminate boundaries. Taylor & Francis, London, pp. 171–187.
Çöltekin, A., De Sabbata, S., Willi, C., Vontobel, I., Pfister, S., Kuhn, M., Lacayo, M., 2011. Modifiable temporal unit problem. Paper presented at the ISPRS/ICA workshop "Persistent Problems in Geographic Visualization" (ICC2011), Paris.


Comber, A.J., Fisher, P.F., Wadsworth, R., 2004. Integrating land-cover data with different ontologies: Identifying change from inconsistency. International Journal of Geographical Information Science 18 (7), 691–708.
Comber, A.J., Fisher, P.F., Harvey, F., Gahegan, M., Wadsworth, R., 2006. Using metadata to link uncertainty and data quality assessments. In: Progress in spatial data handling. Springer, Berlin and Heidelberg, pp. 279–292.
Congalton, R.G., 1988. A comparison of sampling schemes used in generating error matrices for assessing the accuracy of maps generated from remotely sensed data. Photogrammetric Engineering & Remote Sensing 54 (5), 593–600.
Congalton, R.G., 1991. A review of assessing the accuracy of classifications of remotely sensed data. Remote Sensing of Environment 37 (1), 35–46.
Couclelis, H., 1999. Space, time, geography. Geographical Information Systems 1, 29–38.
Couclelis, H., 2003. The certainty of uncertainty: GIS and the limits of geographic knowledge. Transactions in GIS 7 (2), 165–175.
Couclelis, H., 2010. Ontologies of geographic information. International Journal of Geographical Information Science 24 (12), 1785–1809.
Crosetto, M., Tarantola, S., 2001. Uncertainty and sensitivity analysis: Tools for GIS-based model implementation. International Journal of Geographical Information Science 15 (5), 415–437.
Cross, V., Firat, A., 2000. Fuzzy objects for geographical information systems. Fuzzy Sets and Systems 113 (1), 19–36.
Cruzan, M.B., Weinstein, B.G., Grasty, M.R., Kohrn, B.F., Hendrickson, E.C., Arredondo, T.M., Thompson, P.G., 2016. Small unmanned aerial vehicles (micro-UAVs, drones) in plant ecology. Applications in Plant Sciences 4 (9). http://dx.doi.org/10.3732/apps.160004.
Daniel, C., Tennant, K., 2001. DEM quality assessment. In: Digital elevation model technologies and applications: The DEM users manual. ASPRS, Bethesda, pp. 395–440.
Daniels, T., 1999. When city and country collide. Island Press, Washington, DC.
Dark, S.J., Bram, D., 2007. The modifiable areal unit problem (MAUP) in physical geography. Progress in Physical Geography 31 (5), 471–479. http://dx.doi.org/10.1177/0309133307083294.
Davidson, D.A., Theocharopoulos, S.P., Bloksma, R.J., 1994. A land evaluation project in Greece using GIS and based on Boolean and fuzzy set methodologies. International Journal of Geographical Information Systems 8 (4), 369–384.
Davis, T.J., Keller, C.P., 1997a. Modelling and visualizing multiple spatial uncertainties. Computers & Geosciences 23 (4), 397–408.
Davis, T.J., Keller, C.P., 1997b. Modelling uncertainty in natural resource analysis using fuzzy sets and Monte Carlo simulation: Slope stability prediction. International Journal of Geographical Information Science 11 (5), 409–434.
De Gruijter, J.J., Walvoort, D.J.J., Van Gams, P.F.M., 1997. Continuous soil maps – A fuzzy set approach to bridge the gap between aggregation levels of process and distribution models. Geoderma 77 (2), 169–195.
Deitrick, S., Edsall, R., 2006. The influence of uncertainty visualization on decision making: An empirical evaluation. In: Progress in spatial data handling. Springer, Heidelberg, pp. 719–738.
Deitrick, S., Edsall, R., 2008. Making uncertainty usable: Approaches for visualizing uncertainty information. In: Geographic visualization: Concepts, tools and applications. John Wiley & Sons, Hoboken, pp. 277–291.
Delmelle, E., Dony, C., Casas, I., Jia, M., Tang, W., 2014. Visualizing the impact of space-time uncertainties on dengue fever patterns. International Journal of Geographical Information Science 28 (5), 1107–1127.
Deutsch, C.V., Journel, A.G., 1992. GSLIB: Geostatistical software library and user's guide. Oxford University Press, Oxford.
Devillers, R., Stein, A., Bédard, Y., Chrisman, N., Fisher, P., Shi, W., 2010. Thirty years of research on spatial data quality: Achievements, failures, and opportunities. Transactions in GIS 14 (4), 387–400. http://dx.doi.org/10.1111/j.1467-9671.2010.01212.x.
Dixon, B., 2005. Groundwater vulnerability mapping: A GIS and fuzzy rule based integrated tool. Applied Geography 25 (4), 327–347.
Dorling, D., 1995. A new social atlas of Britain. John Wiley and Sons, London.
Dragicevic, S., Marceau, D.J., 2000. A fuzzy set approach for modelling time in GIS. International Journal of Geographical Information Science 14 (3), 225–245.
Drecki, I., 2002. Visualisation of uncertainty in geographical data. In: Spatial data quality. Taylor & Francis, London, pp. 140–159.
Drucker, J., 2011. Humanities approaches to graphical display. Digital Humanities Quarterly 5 (1), 1–21.
Drummond, J., 1995. Positional accuracy. In: Guptill, S.C., Morrison, J.L. (Eds.), Elements of spatial data quality. Elsevier Science Ltd, Oxford, pp. 31–38.
Duckham, M., Sharp, J., 2005. Uncertainty and geographic information: Computational and critical convergence. In: Fisher, P., Unwin, D. (Eds.), Re-presenting GIS. John Wiley & Sons, London, pp. 113–124.
Duckham, M., Mason, K., Stell, J., Worboys, M., 2001. A formal approach to imperfection in geographic information. Computers, Environment and Urban Systems 25 (1), 89–103.
Duckham, M., Lingham, J., Mason, K., Worboys, M., 2006. Qualitative reasoning about consistency in geographic information. Information Sciences 176 (6), 601–627.
Dungan, J.L., 2002. Toward a comprehensive view of uncertainty in remote sensing analysis. In: Uncertainty in remote sensing and GIS, vol. 3. Wiley, Hoboken, pp. 25–35.
Dunn, R., Harrison, A.R., White, J.C., 1990. Positional accuracy and measurement error in digital databases of land use: An empirical study. International Journal of Geographical Information Systems 4 (4), 385–398.
Dutton, G., 1992. Handling positional uncertainty in spatial databases. Paper presented at the 5th International Symposium on Spatial Data Handling, Charleston.
Edwards, G., Lowell, K.E., 1996. Modeling uncertainty in photointerpreted boundaries. Photogrammetric Engineering and Remote Sensing 62 (4), 377–391.
Egenhofer, M.J., Herring, J.R., 1991. Categorizing binary topological relationships between regions, lines, and points in geographic databases. Department of Surveying Engineering, University of Maine, Orono.
Ehlschlaeger, C.R., 1998. The stochastic simulation approach: Tools for representing spatial application uncertainty. Unpublished doctoral thesis, University of California, Santa Barbara.
Ehlschlaeger, C.R., Shortridge, A., 1996. Modeling elevation uncertainty in geographical analyses. In: Proceedings of the International Symposium on Spatial Data Handling, p. 9B.
Esri, 2017. An overview of the Space Time Pattern Mining toolbox. http://desktop.arcgis.com/en/arcmap/10.3/tools/space-time-pattern-mining-toolbox/an-overview-of-the-spacetime-pattern-mining-toolbox.htm (accessed 4 November 2017).
Evans, A.J., Waters, T., 2007. Mapping vernacular geography: Web-based GIS tools for capturing fuzzy or vague entities. International Journal of Technology, Policy and Management 7 (2), 134–150.
Fan, A., Guo, D., 2001. The uncertainty band model of error entropy. Acta Geodaetica et Cartographica Sinica 30, 48–53.
Fan, H., Zipf, A., Fu, Q., Neis, P., 2014. Quality assessment for building footprints data on OpenStreetMap. International Journal of Geographical Information Science 28 (4), 700–719.
Feizizadeh, B., Blaschke, T., 2014. An uncertainty and sensitivity analysis approach for GIS-based multicriteria landslide susceptibility mapping. International Journal of Geographical Information Science 28 (3), 610–638.
FGDC, 1998. Geospatial positioning accuracy standards, part 3: National standard for spatial data accuracy. Subcommittee for Base Cartographic Data, Federal Geographic Data Committee, FGDC-STD-007.3-1998. FGDC.gov.
Fisher, P.F., 1991a. First experiments in viewshed uncertainty: The accuracy of the viewshed area. Photogrammetric Engineering and Remote Sensing 57 (10), 1321–1327.
Fisher, P.F., 1991b. Modelling soil map-unit inclusions by Monte Carlo simulation. International Journal of Geographical Information Systems 5 (2), 193–208.
Fisher, P.F., 1992. First experiments in viewshed uncertainty: Simulating fuzzy viewsheds. Photogrammetric Engineering and Remote Sensing 58 (3), 345–352.
Fisher, P., 1996. Boolean and fuzzy regions. In: Masser, I., Salge, F. (Eds.), Geographic objects with indeterminate boundaries, GISDATA 2. Taylor and Francis. ISBN-10: 0748403876.
Fisher, P.F., 1999. Models of uncertainty in spatial data. Geographical Information Systems 1, 191–205.


Fisher, P.F., 2000. Sorites paradox and vague geographies. Fuzzy Sets and Systems 113 (1), 7–18.
Fisher, P.F., Pathirana, S., 1990. The evaluation of fuzzy membership of land cover classes in the suburban zone. Remote Sensing of Environment 34 (2), 121–132.
Fisher, P.F., Tate, N.J., 2006. Causes and consequences of error in digital elevation models. Progress in Physical Geography 30 (4), 467–489.
Fisher, P., Wood, J., 1998. What is a mountain? Or the Englishman who went up a Boolean geographical concept but realised it was fuzzy. Geography 83 (3), 247–256.
Fonseca, F.T., Egenhofer, M.J., Agouris, P., Câmara, G., 2002. Using ontologies for integrated geographic information systems. Transactions in GIS 6 (3), 231–257.
Fonstad, M.A., Dietrich, J.T., Courville, B.C., Jensen, J.L., Carbonneau, P.E., 2013. Topographic structure from motion: A new development in photogrammetric measurement. Earth Surface Processes and Landforms 38 (4), 421–430. http://dx.doi.org/10.1002/esp.3366.
Foody, G.M., 2003. Uncertainty, knowledge discovery and data mining in GIS. Progress in Physical Geography 27 (1), 113–121.
Fotheringham, A.S., Wong, D.W., 1991. The modifiable areal unit problem in multivariate statistical analysis. Environment and Planning A 23 (7), 1025–1044.
Frank, A.U., 1997. Spatial ontology: A geographical information point of view. In: Spatial and temporal reasoning. Springer, Dordrecht, pp. 135–153.
Frank, A.U., 2003. Ontology for spatio-temporal databases. In: Spatio-temporal databases. Springer, Berlin and Heidelberg, pp. 9–77.
Fukunaga, K., Hayes, R.R., 1989. Effects of sample size in classifier design. IEEE Transactions on Pattern Analysis and Machine Intelligence 11 (8), 873–885.
Gahegan, M.N., 1999. Characterizing the semantic content of geographic data, models, and systems. In: Interoperating geographic information systems. Springer, US, pp. 71–83.
Gallik, J., Bolesova, L., 2016. sUAS and their application in observing geomorphological processes. Solid Earth 7 (4), 1033–1042. http://dx.doi.org/10.5194/se-7-1033-2016.
Ge, Y., Li, S., Lakhan, V.C., Lucieer, A., 2009. Exploring uncertainty in remotely sensed data with parallel coordinate plots. International Journal of Applied Earth Observation and Geoinformation 11 (6), 413–422.
Gehlke, C.E., Biehl, K., 1934. Certain effects of grouping upon the size of the correlation coefficient in census tract material. Journal of the American Statistical Association 29 (185A), 169–170.
Ghilani, C.D., 2000. Demystifying area uncertainty: More or less. Surveying and Land Information Systems 60 (3), 177–182.
Girres, J.F., Touya, G., 2010. Quality assessment of the French OpenStreetMap dataset. Transactions in GIS 14 (4), 435–459.
Goncalves, J.A., Henriques, R., 2015. UAV photogrammetry for topographic monitoring of coastal areas. ISPRS Journal of Photogrammetry and Remote Sensing 104, 101–111. http://dx.doi.org/10.1016/j.isprsjprs.2015.02.009.
Goodchild, M.F., 1991. Symposium on spatial database accuracy. Paper presented at the Symposium on Spatial Database Accuracy, Melbourne.
Goodchild, M.F., 1995. Attribute accuracy. In: Guptill, S.C., Morrison, J.L. (Eds.), Elements of spatial data quality. Elsevier Science Ltd, Oxford.
Goodchild, M.F., 2000. Introduction: Special issue on 'Uncertainty in geographic information systems'. Fuzzy Sets and Systems 113 (1), 3–5.
Goodchild, M.F., 2001. Metrics of scale in remote sensing and GIS. International Journal of Applied Earth Observation and Geoinformation 3 (2), 114–120.
Goodchild, M.F., 2004. GIScience, geography, form, and process. Annals of the Association of American Geographers 94 (4), 709–714.
Goodchild, M.F., 2007. Citizens as sensors: The world of volunteered geography. GeoJournal 69 (4), 211–221.
Goodchild, M.F., 2011. Scale in GIS: An overview. Geomorphology 130 (1), 5–9.
Goodchild, M.F., Gopal, S. (Eds.), 1989. The accuracy of spatial databases. CRC Press, Boca Raton.
Goodchild, M.F., Hunter, G.J., 1997. A simple positional accuracy measure for linear features. International Journal of Geographical Information Science 11 (3), 299–306. http://dx.doi.org/10.1080/136588197242419.
Goodchild, M.F., Li, L., 2012. Assuring the quality of volunteered geographic information. Spatial Statistics 1, 110–120.
Goodchild, M.F., Proctor, J., 1997. Scale in a digital geographic world. Geographical and Environmental Modelling 1, 5–24.
Goodchild, M.F., Quattrochi, D.A., 1997. Scale, multiscaling, remote sensing, and GIS. CRC Press. ISBN 9781566701044.
Goodchild, M.F., Yuan, M., Cova, T.J., 2007. Towards a general theory of geographic representation in GIS. International Journal of Geographical Information Science 21 (3), 239–260.
Goodchild, M.F., Guo, H., Annoni, A., Bian, L., de Bie, K., Campbell, F., Craglia, M., Ehlers, M., van Genderen, J., Jackson, D., Lewis, A.J., Pesaresi, M., Remetey-Fülöpp, G., Simpson, R., Skidmore, A., Wang, C., Woodgate, P., 2012. Next-generation digital earth. Proceedings of the National Academy of Sciences 109 (28), 11088–11094.
Griffith, D.A., 1989. Distance calculations and errors in geographic databases. In: Goodchild, M.F., Gopal, S. (Eds.), Accuracy of spatial databases. Taylor & Francis, London, pp. 81–90.
Griffith, D., Chun, Y., 2016. Spatial autocorrelation and uncertainty associated with remotely-sensed data. Remote Sensing 8 (7), 535.
Grira, J., Bédard, Y., Roche, S., 2010. Spatial data uncertainty in the VGI world: Going from consumer to producer. Geomatica 64 (1), 61–72.
Grothe, C., Schaab, J., 2009. Automated footprint generation from geotags with kernel density estimation and support vector machines. Spatial Cognition and Computation 9 (3), 195–211.
Guesgen, H.W., Albrecht, J., 2000. Imprecise reasoning in geographic information systems. Fuzzy Sets and Systems 113 (1), 121–131.
Guptill, S.C., Morrison, J.L., 1995. Elements of spatial data quality. Elsevier Science, Oxford.
Hagen-Zanker, A., Straatman, B., Uljee, I., 2005. Further developments of a fuzzy set map comparison approach. International Journal of Geographical Information Science 19 (7), 769–785.
Haklay, M., 2010. How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets. Environment and Planning B: Planning and Design 37 (4), 682–703.
Harvey, D., 1993. From space to place and back again: Reflections on the condition of postmodernity. In: Bird, J., Curtis, B., Putnam, T., Tickner, L. (Eds.), Mapping the futures. Routledge, London, pp. 3–29.
Harwin, S., Lucieer, A., 2012. Assessing the accuracy of georeferenced point clouds produced via multi-view stereopsis from unmanned aerial vehicle (UAV) imagery. Remote Sensing 4 (12), 1573–1599. http://dx.doi.org/10.3390/rs4061573.
Hays, T.E., 1993. "The New Guinea Highlands": Region, culture area, or fuzzy set? [and comments and reply]. Current Anthropology 34 (2), 141–164.
Helton, J.C., Davis, F.J., 2003. Latin hypercube sampling and the propagation of uncertainty in analyses of complex systems. Reliability Engineering & System Safety 81 (1), 23–69.
Hengl, T., Heuvelink, G., Loon, E., 2010. On the uncertainty of stream networks derived from elevation data: The error propagation approach. Hydrology and Earth System Sciences 14 (7), 1153–1165.
Heuvelink, G.B., 1998. Error propagation in environmental modelling with GIS. CRC Press, Boca Raton.
Heuvelink, G.B., 1999. Propagation of error in spatial modelling with GIS. Geographical Information Systems 1, 207–217.
Heuvelink, G.B., Brown, J.D., van Loon, E.E., 2007. A probabilistic framework for representing and simulating uncertain environmental variables. International Journal of Geographical Information Science 21 (5), 497–513.
Hong, S., Vonderohe, A.P., 2014. Uncertainty and sensitivity assessments of GPS and GIS integrated applications for transportation. Sensors 14 (2), 2683–2702.
Hoover, W.E., 1984. Algorithms for confidence circles and ellipses (NOS 107 C&GS 3). https://www.ngs.noaa.gov/PUBS_LIB/AlgorithmsForConfidenceCirclesAndEllipses_TR_NOS107_CGS3.pdf (accessed 10 April 2017).
Hope, S., Hunter, G.J., 2007. Testing the effects of positional uncertainty on spatial decision-making. International Journal of Geographical Information Science 21 (6), 645–665.
Horn, B.K., 1981. Hill shading and the reflectance map. Proceedings of the IEEE 69 (1), 14–47.
Hubbard, P., Kitchin, R. (Eds.), 2010. Key thinkers on space and place. Sage, Thousand Oaks, CA.
Hudelot, C., Atif, J., Bloch, I., 2008. Fuzzy spatial relation ontology for image interpretation. Fuzzy Sets and Systems 159 (15), 1929–1951.
Hugenholtz, C.H., Whitehead, K., Brown, O.W., Barchyn, T.E., Moorman, B.J., LeClair, A., Hamilton, T., 2013. Geomorphological mapping with a small unmanned aircraft system (sUAS): Feature detection and accuracy assessment of a photogrammetrically-derived digital terrain model. Geomorphology 194, 16–24. http://dx.doi.org/10.1016/j.geomorph.2013.03.023.


Hunsaker, C.T., Goodchild, M.F., Friedl, M.A., Case, T.J. (Eds.), 2013. Spatial uncertainty in ecology: Implications for remote sensing and GIS applications. Springer Science & Business Media.
Hunter, G.J., Goodchild, M.F., 1996. A new model for handling vector data uncertainty in GIS. Journal of the Urban and Regional Information Systems Association 8 (1), 51–57.
Hunter, G.J., Goodchild, M.F., 1997. Modeling the uncertainty of slope and aspect estimates derived from spatial databases. Geographical Analysis 29 (1), 35–49.
Hunter, G.J., Qiu, J., Goodchild, M.F., 2000. Application of a new model of vector data uncertainty. In: Jaton, A., Lowell, K. (Eds.), Spatial accuracy assessment: Land information uncertainty in natural resources. Ann Arbor Press, Michigan, pp. 203–208.
Janowicz, K., Raubal, M., Kuhn, W., 2011. The semantics of similarity in geographic information retrieval. Journal of Spatial Information Science 2011 (2), 29–57.
Jelinski, D.E., Wu, J., 1996. The modifiable areal unit problem and implications for landscape ecology. Landscape Ecology 11 (3), 129–140.
Jenson, S.K., Domingue, J.O., 1988. Extracting topographic structure from digital elevation data for geographic information system analysis. Photogrammetric Engineering and Remote Sensing 54 (11), 1593–1600.
Jiang, H., Eastman, J.R., 2000. Application of fuzzy measures in multi-criteria evaluation in GIS. International Journal of Geographical Information Science 14 (2), 173–184.
Jones, C.B., Purves, R.S., Clough, P.D., Joho, H., 2008. Modelling vague places with knowledge from the web. International Journal of Geographical Information Science 22 (10), 1045–1065.
Kardos, J., Benwell, G., Moore, A., 2005. The visualisation of uncertainty for spatially referenced census data using hierarchical tessellations. Transactions in GIS 9 (1), 19–34.
Kavouras, M., Kokla, M., 2007. Theories of geographic concepts: Ontological approaches to semantic integration. CRC Press, Boca Raton.
Kaye, N.R., Hartley, A., Hemming, D., 2012. Mapping the climate: Guidance on appropriate techniques to map climate variables and their uncertainty. Geoscientific Model Development 5 (1), 245–256.
Keßler, C., Maué, P., Heuer, J.T., Bartoschek, T., 2009. Bottom-up gazetteers: Learning from the implicit semantics of geotags. In: GeoS '09: Proceedings of the Third International Conference on GeoSpatial Semantics. Springer, Berlin, pp. 83–102.
Kiiveri, H.T., 1997. Assessing, representing and transmitting positional uncertainty in maps. International Journal of Geographical Information Science 11 (1), 33–52.
Klir, G., Yuan, B., 1995. Fuzzy sets and fuzzy logic, vol. 4. Prentice Hall, New Jersey.
Kraus, K., Kager, H., 1994. Accuracy of derived data in a geographic information system. Computers, Environment and Urban Systems 18 (2), 87–94.
Kubícek, P., Sasinka, C., 2011. Thematic uncertainty visualization usability – comparison of basic methods. Annals of GIS 17 (4), 253–263.
Kuhn, W., 2009. A functional ontology of observation and measurement. In: International Conference on GeoSpatial Semantics. Springer, Berlin and Heidelberg, pp. 26–43.
Kuipers, B., 2000. The spatial semantic hierarchy. Artificial Intelligence 119 (1), 191–233.
Kunz, M., Grêt-Regamey, A., Hurni, L., 2011. Visualization of uncertainty in natural hazards assessments using an interactive cartographic information system. Natural Hazards 59 (3), 1735–1751.
Kwan, M.-P., 2012a. The uncertain geographic context problem. Annals of the Association of American Geographers 102 (5), 958–968.
Kwan, M.-P., 2012b. Uncertain geographic context problem: Implications for environmental health research. Paper presented at the 142nd APHA Annual Meeting and Exposition, 15–19 November 2014, New Orleans, LA.
Ladner, R., Petry, F.E., Cobb, M.A., 2003. Fuzzy set approaches to spatial data mining of association rules. Transactions in GIS 7 (1), 123–138.
Lam, N.S.N., Quattrochi, D.A., 1992. On the issues of scale, resolution, and fractal analysis in the mapping sciences. The Professional Geographer 44 (1), 88–98.
Leung, Y., Yan, J., 1998. A locational error model for spatial features. International Journal of Geographical Information Science 12 (6), 607–620.
Leung, Y., Ma, J., Goodchild, M.F., 2004a. A general framework for error analysis in measurement-based GIS, part 1: The basic measurement-error model and related concepts. Journal of Geographical Systems 6, 381–402.
Leung, Y., Ma, J., Goodchild, M.F., 2004b. A general framework for error analysis in measurement-based GIS, part 3: Error analysis in intersections and overlays. Journal of Geographical Systems 6, 325–354.
Leung, Y., Ma, J., Goodchild, M.F., 2004c. A general framework for error analysis in measurement-based GIS, part 4: Error analysis in length and area measurements. Journal of Geographical Systems 6, 403–428.
Li, L., Clarke, K.C., 2012. Cartograms showing China's population and wealth distribution. Journal of Maps 8 (3), 320–323.
Li, L., Goodchild, M.F., 2012. Constructing places from spatial footprints. In: Proceedings of the 1st ACM SIGSPATIAL International Workshop on Crowdsourced and Volunteered Geographic Information, 6 November 2012, Redondo Beach. ACM, pp. 15–21.
Li, L., Valdovinos, J., 2017. Optimized conflation of authoritative and crowd-sourced geographic data: Creating an integrated bike map. In: Information Fusion and Geographic Information Systems (IF&GIS'2017). Springer International Publishing, Switzerland.
Li, D., Zhang, J., Wu, H., 2012. Spatial data quality and beyond. International Journal of Geographical Information Science 26 (12), 2277–2290.
Ligmann-Zielinska, A., Jankowski, P., 2014. Spatially-explicit integrated uncertainty and sensitivity analysis of criteria weights in multicriteria land suitability evaluation. Environmental Modelling & Software 57, 235–247.
Liu, Y., Phinn, S.R., 2003. Modelling urban development with cellular automata incorporating fuzzy-set approaches. Computers, Environment and Urban Systems 27 (6), 637–658.
Liu, C., Tong, X., 2005. Relationship of uncertainty between polygon segment and line segment for spatial data in GIS. Geo-spatial Information Science 8 (3), 183–188.
Lo, C.P., Yeung, A.K.W., 2002. Concepts and techniques of geographic information systems. Prentice-Hall Inc, Upper Saddle River.
Lodwick, W.A., Monson, W., Svoboda, L., 1990. Attribute error and sensitivity analysis of map operations in geographic information systems: Suitability analysis. International Journal of Geographical Information Systems 4 (4), 413–428.
Love, A.L., Pang, A., Kao, D.L., 2005. Visualizing spatial multivalue data. IEEE Computer Graphics and Applications 25 (3), 69–79.
Lowell, K., Jaton, A., 2000. Spatial accuracy assessment: Land information uncertainty in natural resources. CRC Press, Boca Raton.
Lucieer, A., Kraak, M.J., 2004. Interactive and visual fuzzy classification of remotely sensed imagery for exploration of uncertainty. International Journal of Geographical Information Science 18 (5), 491–512.
MacEachren, A.M., Robinson, A., Hopper, S., Gardner, S., Murray, R., Gahegan, M., Hetzler, E., 2005. Visualizing geospatial information uncertainty: What we know and what we need to know. Cartography and Geographic Information Science 32 (3), 139–160. http://dx.doi.org/10.1559/1523040054738936.
MacEachren, A.M., Roth, R.E., O'Brien, J., Li, B., Swingley, D., Gahegan, M., 2012. Visual semiotics & uncertainty visualization: An empirical study. IEEE Transactions on Visualization and Computer Graphics 18 (12), 2496–2505.
Malczewski, J., 2006. Ordered weighted averaging with fuzzy quantifiers: GIS-based multicriteria evaluation for land-use suitability analysis. International Journal of Applied Earth Observation and Geoinformation 8 (4), 270–277.
Mark, D.M., Smith, B., Tversky, B., 1999. Ontology and geographic objects: An empirical study of cognitive categorization. In: International Conference on Spatial Information Theory. Springer, Berlin and Heidelberg, pp. 283–298.
Mikhail, E.M., Gracie, G., 1981. Analysis and adjustment of survey measurements. Van Nostrand Reinhold Co, New York.
Miller, M.D., 2016. The modifiable conceptual unit problem demonstrated using pollen and seed dispersal. Global Ecology and Conservation 6, 93–104.
Monmonier, M., 2006. Cartography: Uncertainty, interventions, and dynamic display. Progress in Human Geography 30 (3), 373.
Montello, D.R., Goodchild, M.F., Gottsegen, J., Fohl, P., 2003. Where's downtown? Behavioral methods for determining referents of vague spatial queries. Spatial Cognition and Computation 3 (2–3), 185–204.
Mooney, P., 2011. The evolution and spatial volatility of VGI in OpenStreetMap. In: Hengstberger Symposium Towards Digital Earth: 3D Spatial Data Infrastructures, Heidelberg, pp. 7–8.
Morris, A., 2008. Uncertainty in spatial databases. In: Wilson, J.P., Fotheringham, A.S. (Eds.), The handbook of geographic information science. John Wiley & Sons, Oxford, pp. 80–93.


Mowrer, H.T., Congalton, R.G., 2003. Quantifying spatial uncertainty in natural resources: Theory and applications for GIS and remote sensing. CRC Press, Boca Raton.
Neitzel, F., Klonowski, J., 2011. Mobile 3D mapping with a low-cost UAV system. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 38, 1–6.
Nelson, A., Reuter, H., Gessler, P., 2009. DEM production methods and sources. Developments in Soil Science 33, 65–85.
Neprash, J.A., 1934. Some problems in the correlation of spatially distributed variables. Journal of the American Statistical Association 29 (185A), 167–168.
Neutens, T., Witlox, F., Van de Weghe, N., De Maeyer, P., 2007. Human interaction spaces under uncertainty. Transportation Research Record: Journal of the Transportation Research Board 2021, 28–35.
Openshaw, S., 1984a. Ecological fallacies and the analysis of areal census data. Environment and Planning A 16 (1), 17–31.
Openshaw, S., 1984b. The modifiable areal unit problem. CATMOG – Concepts and Techniques in Modern Geography. Geo Books, Norwich, England.
Openshaw, S., 1998. Towards a more computationally minded scientific human geography. Environment and Planning A 30 (2), 317–332.
Ovenden, M., 2007. Transit maps of the world. Penguin Books, London.
Pang, A., 2001. Visualizing uncertainty in geo-spatial data. In: Proceedings of the Workshop on the Intersections between Geospatial Information and Information Technology. National Research Council, Arlington, pp. 1–14.
Pappenberger, F., Frodsham, K., Beven, K., Romanowicz, R., Matgen, P., 2007. Fuzzy set approach to calibrating distributed flood inundation models using remote sensing observations. Hydrology and Earth System Sciences Discussions 11 (2), 739–752.
Perkal, J., 1966. On the length of empirical curves. Paper presented at the Michigan Inter-University Community of Mathematical Geography, Ann Arbor.
Peterson, A.T., Lash, R.R., Carroll, D.S., Johnson, K.M., 2006. Geographic potential for outbreaks of Marburg hemorrhagic fever. The American Journal of Tropical Medicine and Hygiene 75 (1), 9–15.
Pfaffelmoser, T., Reitinger, M., Westermann, R., 2011. Visualizing the positional and geometrical variability of isosurfaces in uncertain scalar fields. Computer Graphics Forum 30 (3), 951–960.
Plata-Rocha, W., Gómez-Delgado, M., Bosque-Sendra, J., 2012. Proposal for the introduction of the spatial perspective in the application of global sensitivity analysis. Journal of Geographic Information System 4 (6), 503–513.
Plewe, B., 2002. The nature of uncertainty in historical geographic information. Transactions in GIS 6 (4), 431–456.
Prisley, S.P., Gregoire, T.G., Smith, J.L., 1989. The mean and variance of area estimates computed in an arc-node geographic information system. Photogrammetric Engineering and Remote Sensing 55, 1601–1612.
Quattrochi, D.A., Goodchild, M.F., 1997. Scale in remote sensing and GIS. CRC Press, Boca Raton.
Rae, C., Rothley, K., Dragicevic, S., 2007. Implications of error and uncertainty for an environmental planning scenario: A sensitivity analysis of GIS-based variables in a reserve design exercise. Landscape and Urban Planning 79 (3–4), 210–217.
Ragin, C.C., 2000. Fuzzy-set social science. University of Chicago Press, Chicago.
Randell, D.A., Cui, Z., Cohn, A.G., 1992. A spatial logic based on regions and connection. Paper presented at the 3rd International Conference on Knowledge Representation and Reasoning, Cambridge, MA.
Raymond, E., 1999. The cathedral and the bazaar. Knowledge, Technology & Policy 12 (3), 23–49.
Relph, E., 1976. Place and placelessness. Pion, London.
Reshetyuk, Y., Martensson, S.G., 2016. Generation of highly accurate digital elevation models with unmanned aerial vehicles. Photogrammetric Record 31 (154), 143–165. http://dx.doi.org/10.1111/phor.12143.
Rinner, C., Heppleston, A., 2006. The spatial dimensions of multi-criteria evaluation – case study of a home buyer's spatial decision support system. In: Raubal, M., Miller, H.J., Frank, A.U., Goodchild, M.F. (Eds.), Geographic Information Science: Proceedings of the 4th International Conference, GIScience 2006, Münster, Germany, 20–23 September 2006, pp. 338–352.
Robinson, V.B., 2003. A perspective on the fundamentals of fuzzy sets and their use in geographic information systems. Transactions in GIS 7 (1), 3–30.
Rosenfield, G.H., Fitzpatrick-Lins, K., 1986. A coefficient of agreement as a measure of thematic classification accuracy. Photogrammetric Engineering and Remote Sensing 52 (2), 223–227.
Roth, R.E., 2009. The impact of user expertise on geographic risk assessment under uncertain conditions. Cartography and Geographic Information Science 36 (1), 29–43.
Ruddell, D., Wentz, E.A., 2009. Multi-tasking: Scale in geography. Geography Compass 3 (2), 681–697.
Sae-Jung, J., Chen, X., Phuong, D., 2008. Error propagation modeling in GIS polygon overlay. Paper presented at the International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Beijing.
Saltelli, A., Chan, K., Scott, M. (Eds.), 2000. Sensitivity analysis. John Wiley & Sons, New York.
Schneider, M., 1999. Uncertainty management for spatial data in databases: Fuzzy spatial data types. In: International Symposium on Spatial Databases. Springer, Berlin and Heidelberg, pp. 330–351.
Schuurman, N., 2006. Formalization matters: Critical GIS and ontology research. Annals of the Association of American Geographers 96 (4), 726–739.
Sen, S., 2008. Framework for probabilistic geospatial ontologies. International Journal of Geographical Information Science 22 (7), 825–846.
Shi, W., 1994. Modeling positional and thematic error in integration of GIS and remote sensing. ITC Publication, Enschede.
Shi, W., 2009. Principles of modeling uncertainties in spatial data and spatial analyses. CRC Press, Taylor & Francis Group, Boca Raton, FL.
Shi, W., Liu, W., 2000. A stochastic process-based model for the positional error of line segments in GIS. International Journal of Geographical Information Science 14 (1), 51–66.
Shi, W., Liu, K., 2004. Modeling fuzzy topological relations between uncertain objects in GIS. Photogrammetric Engineering and Remote Sensing 70 (8), 921–929.
Shi, W., Tong, X., Liu, D., 2000. An approach for modeling error of generic curve features in GIS. Acta Geodaetica et Cartographica Sinica 29, 52–58.
Shi, W., Cheung, C., Zhu, C., 2003. Modeling error propagation of buffer spatial analysis in vector-based GIS. International Journal of Geographical Information Science 17 (3), 251–271.
Shi, W., Cheung, C., Tong, X., 2004. Modeling error propagation in vector-based overlay spatial analysis. ISPRS Journal of Photogrammetry & Remote Sensing 59, 47–59.
Slingsby, A., Dykes, J., Wood, J., 2011. Exploring uncertainty in geodemographics with interactive graphics. IEEE Transactions on Visualization and Computer Graphics 17 (12), 2545–2554.
Smith, B., Mark, D., 1998. Ontology with human subjects testing. American Journal of Economics and Sociology 58 (2), 245–312.
Stehman, S.V., Czaplewski, R.L., 1998. Design and analysis for thematic map accuracy assessment: Fundamental principles. Remote Sensing of Environment 64 (3), 331–344.
Steinhardt, U., 1998. Applying the fuzzy set theory for medium and small scale landscape assessment. Landscape and Urban Planning 41 (3), 203–208.
Stephan, F.F., 1934. Sampling errors and interpretations of social data ordered in time and space. Journal of the American Statistical Association 29 (185A), 165–166.
Stouffer, S.A., 1934. Problems in the application of correlation to sociology. Journal of the American Statistical Association 29 (185A), 52–58.
Su, X., Talmaki, S., Cai, H., Kamat, V.R., 2013. Uncertainty-aware visualization and proximity monitoring in urban excavation: A geospatial augmented reality approach. Visualization in Engineering 1 (1), 1.
Sui, D., 1992. A fuzzy GIS modeling approach for urban land evaluation. Computers, Environment and Urban Systems 16 (2), 101–115.
Sui, D., 2009. Ecological fallacy. In: Kitchin, R., Thrift, N. (Eds.), International encyclopedia of human geography. Elsevier, pp. 291–293. https://www.elsevier.com/books/international-encyclopedia-of-human-geography/kitchin/978-0-08-044911-1.
Tarboton, D.G., 1997. A new method for the determination of flow directions and upslope areas in grid digital elevation models. Water Resources Research 33 (2), 309–319.
Tate, E., 2013. Uncertainty analysis for a social vulnerability index. Annals of the Association of American Geographers 103 (3), 526–543.
Tobler, W., 1970. A computer movie simulating urban growth in the Detroit region. Economic Geography 46 (Suppl. 1), 234–240.
Tobler, W., 1988. Resolution, resampling, and all that. In: Mounsey, H., Tomlinson, R. (Eds.), Building data bases for global science. Taylor and Francis, London, pp. 129–137.

340

Spatial Data Uncertainty

Tobler, W., 2004. Thirty five years of computer cartograms. Annals of the Association of American Geographers 94 (1), 58–73. Tong, X., Shi, W., 2010. Measuring positional error of circular curve features in Geographic Information Systems (GIS). Computers & Geosciences 36 (7), 861–870. Tong, X., Sun, T., Fan, J., Goodchild, M.F., Shi, W., 2013. A statistical simulation model for positional error of line features in Geographic Information Systems (GIS). International Journal of Applied Earth Observation and Geoinformation 21, 136–148. Tuan, Y.F., 1977. Space and place: The perspective of experience. University of Minnesota Press, Minneapolis. Tucci, M., Giordano, A., 2011. Positional accuracy, positional uncertainty, and feature change detection in historical maps: Results of an experiment. Computers, Environment and Urban Systems 35 (6), 452–463. US Census Bureau. (2000). Census 2000 summary file 3. American fact finder. http://www.factfinder.census.gov/home/en/sf3.html (Accessed 9 October 2007). USGS, 1998. National mapping program technical instructions, part 2-standards for digital elevation models. USGS, Washington, DC. Veregin H (1999) Data quality parameters. Geographical Information Systems 1: 177–189. Verhoeven, G., Taelman, D., Vermeulen, F., 2012. Computer vision-based orthophoto mapping of complex archaeological sites: The ancient quarry of Pitaranha (Portugal–Spain). Archaeometry 54 (6), 1114–1129. http://dx.doi.org/10.1111/j.1475-4754.2012.00667.x. Voudouris, V., 2010. Towards a unifying formalisation of geographic representation: The object–field model with uncertainty and semantics. International Journal of Geographical Information Science 24 (12), 1811–1828. Wang, F., Hall, G.B., 1996. Fuzzy representation of geographical boundaries in GIS. International Journal of Geographical Information Systems 10 (5), 573–590. Wechsler, S., 2007. Uncertainties associated with digital elevation models for hydrologic applications: A review. Hydrology and Earth System Sciences 11 (4), 1481–1500. Wechsler, S.P., Kroll, C.N., 2006. Quantifying DEM uncertainty and its effect on topographic parameters. Photogrammetric Engineering & Remote Sensing 72 (9), 1081–1090. Westoby, M.J., Brasington, J., Glasser, N.F., Hambrey, M.J., Reynolds, J.M., 2012. ‘Structure-from-motion’ photogrammetry: A low-cost, effective tool for geoscience applications. Geomorphology 179, 300–314. http://dx.doi.org/10.1016/j.geomorph.2012.08.021. Wilson, J.P., 2012. Digital terrain modeling. Geomorphology 137 (1), 107–121. Wilson JP and Burrough PA (1999) Dynamic modeling, geostatistics, and fuzzy classification: New sneakers for a new geography? 736–746. Wittenbrink, C.M., Pang, A.T., Lodha, S.K., 1996. Glyphs for visualizing uncertainty in vector fields. IEEE Transactions on Visualization and Computer Graphics 2 (3), 266–279. Wong, D., 2009. The modifiable areal unit problem (MAUP). Sage, London. Wood, J.D., Fisher, P.F., 1993. Assessing interpolation accuracy in elevation models. IEEE Computer Graphics and Applications 13 (2), 48–56. Woodcock, C.E., Gopal, S., 2000. Fuzzy set theory and thematic maps: Accuracy assessment and area estimation. International Journal of Geographical Information Science 14 (2), 153–172. Wu, J., Jones, B., Li, H., Loucks, O.L., 2006. Scaling and uncertainty analysis in ecology. Methods and applications. Springer. ISBN-10: 1402046626. Wu, S., Li, J., Huang, G., 2008. A study on DEM-derived primary topographic attributes for hydrologic applications: Sensitivity to elevation data resolution. 
Applied Geography 28 (3), 210–223. Xiao, N., Calder, C.A., Armstrong, M.P., 2007. Assessing the effect of attribute uncertainty on the robustness of choropleth map classification. International Journal of Geographical Information Science 21 (2), 121–144. Xue, J., Leung, Y., Ma, J., 2015. High-order Taylor series expansion methods for error propagation in geographic information systems. Journal of Geographical Systems 17 (2), 187–206. Zadeh, L.A., 1965. Fuzzy sets. Information and Control 8 (3), 338–353. Zandbergen, P.A., 2008. Positional accuracy of spatial data: Non-normal distributions and a critique of the national standard for spatial data accuracy. Transactions in GIS 12 (1), 103–130. Zevenbergen, L.W., Thorne, C.R., 1987. Quantitative analysis of land surface topography. Earth Surface Processes and Landforms 12 (1), 47–56. Zhan, F.B., 1998. Approximate analysis of binary topological relations between geographic regions with indeterminate boundaries. Soft Computing 2 (2), 28–34. Zhang, J., 2006. The calculating formulae and experimental methods in error propagation analysis. IEEE Transactions on Reliability 55 (2), 169–181. Zhang, J., Foody, G.M., 2001. Fully-fuzzy supervised classification of sub-urban land cover from remotely sensed imagery: Statistical and artificial neural network approaches. International Journal of Remote Sensing 22 (4), 615–628. Zhang, J., Goodchild, M.F., 2002. Uncertainty in geographical information. CRC Press, Boca Raton. Zhang, J., Kirby, R.P., 2000. A geostatistical approach to modeling positional error in vector data. Transactions in GIS 4 (2), 145–159. Zhang, B., Zhu, L., Zhu, G., 1998. The uncertainty propagation model of vector data on buffer operation in GIS. ACTA Geodaetica et Cartographic Sinica 27, 259–266. Zhang, L., Deng, M., Chen, X., 2006. A new approach to simulate positional error of line segment in GIS. Geo-spatial Information Science 9 (2), 142–146. Zimmermann, H. J. (1996). Fuzzy control. In: Fuzzy set theorydAnd its applications, pp. 203–240. Berlin, Germany: Springer.

1.23 Cyberinfrastructure and High-Performance Computing

Xuan Shi and Miaoqing Huang, University of Arkansas, Fayetteville, AR, United States

© 2018 Elsevier Inc. All rights reserved.

1.23.1 The Bottleneck in Conventional Hardware and Software Systems
1.23.2 The Evolving Cyberinfrastructure
1.23.3 Scalable and High-Performance Geospatial Applications Over the Modern Cyberinfrastructure
1.23.3.1 GPU and GPU Cluster
1.23.3.2 MIC and MIC Cluster
1.23.3.3 GIS and Remote Sensing Applications Using GPU and Cluster of GPUs
1.23.3.4 GIS and Remote Sensing Applications Using MIC and Cluster of MICs
1.23.4 The Vulnerable Personnel Component in Cyberinfrastructure Development
1.23.4.1 The Emerging Computation Divide
1.23.4.2 Data-Driven Geography Through Concurrent and Parallel Geocomputation
1.23.4.3 Challenges to Next-Generation GIScientists in Response to Data-Driven Geoinformatics
1.23.5 Conclusion
Acknowledgments
References

1.23.1 The Bottleneck in Conventional Hardware and Software Systems

It was observed that the number of transistors in a dense integrated circuit doubles approximately every 2 years (Moore, 1965). Moore’s law fostered the expectation that the overall performance of a computer, or the clock speed of its central processing unit (CPU), would likewise double every 2 years, since the growing number of transistors could deliver better computational performance. Traditionally, computer-based research and development in geoinformatics, urban systems, and built and natural environments could achieve scalability and performance improvements simply by waiting for the next generation of computer hardware. In that era, researchers enjoyed a “free lunch,” obtaining better software performance directly from hardware improvements. By the mid-2000s, however, it was evident that this “free lunch” was over (Sutter, 2005). Physical barriers made it difficult to achieve significant performance gains by further increasing the clock frequency of uniprocessors. A chip’s clock frequency is the number of clock cycles completed per second. Because of heat dissipation and power consumption, clock frequencies became hard to push beyond roughly 4 GHz, that is, four billion cycles per second. At the 90 nm process level, transistor gates had already become too thin to prevent current from leaking into the substrate (Geppert, 2002; Borkar and Chien, 2011). Once the clock frequency of a single core hit this physical bottleneck, multicore processors, or chip multiprocessors, were developed to improve overall system performance. A multicore processor contains two or more CPU cores on one chip; Intel, for example, released dual-core processors in 2005, quad-core processors in 2007, and octa-core processors in 2009. Although multicore processors provide multiple threads for different jobs, those threads are not deployed efficiently when a serial program uses only a single thread to complete its job. Most frequently, we enjoy the power of multicore processors by running multiple programs concurrently to handle small data and tasks. Fig. 1 displays an example in which a data- and computation-intensive program occupied most of the available memory on a desktop computer. As a result, the computer froze and could not respond to any other job request, even though this quad-core machine offers eight threads to handle different requests. Data-driven geography (Miller and Goodchild, 2015) will be seriously constrained by the inability of existing software and tools to process, analyze, and model large geospatial data. Although geographic information systems (GIS) have been widely applied across disciplines in the geosciences, earth and environmental sciences, geography, city and regional planning, transportation, population, and socioeconomic studies, the era of information explosion has exposed the inability of existing geospatial software to handle large data. For example, satellite imagery and aerial photos are sources of geospatial data that are extensively utilized in geospatial analytics for research on the dynamics of coupled natural and human systems.
With rapid advances in sensor technology, many satellites can generate very high-resolution digital images that cover spectral ranges beyond the visible spectrum, making them increasingly competitive with aerial photos (Jacobsen, 2011). As high-resolution remotely sensed data have become more available, the volume of data has grown accordingly, and existing software has struggled to process it. For example, one tile of three-band imagery at 0.5 m resolution may contain 18 GB of data for 80,000 × 80,000 × 3 pixels. Commercial remote sensing software takes about 5.5, 6.5, and 7.5 h, respectively, to run an unsupervised classification of such a tile into 10, 15, and 20 classes. Moreover, the classification often cannot satisfy the 95% convergence threshold, that is, the percentage of pixels whose class values must remain unchanged between iterations, and instead has to be terminated when it reaches the maximum number of iterations.

Fig. 1 The computer freezes when a data- and computation-intensive program is running.

Many concerns and challenges are arising in the face of increasingly available big data. In the public domain of the Extreme Science and Engineering Discovery Environment (XSEDE), a request was recently sent to the XSEDE community in search of an applicable solution for handling large geospatial data. In private communication, we received inquiries from Centers for Disease Control and Prevention researchers regarding “alternative technologies that outperform ArcGIS and/or have been parallelized for HPC machines.” In service-oriented computing, OGC’s Web Processing Service (WPS) may resolve software interoperability at the interface level; in real-world practice, however, WPS providers have had difficulty delivering a stable, capable service that can dynamically handle the large data specified by service requesters. In spatial statistics, permutation and combination calculations over large data have long been infeasible. Raster-based analytics and modeling likewise suffer from a scalability dilemma. When all the separate tiles of data are mosaicked together, the software may not be able to handle the single combined dataset; yet if the tiles are processed individually while the computation depends on global information spanning the tiles, the results of the separate procedures are not consistent or comparable. In certain kinds of spatial computation, the calculation on the boundary row(s) or column(s) of each tile depends on information retrieved from the neighboring row(s) or column(s) before it can proceed. While GIS is mostly taught in geospatially related science and education programs, such as geosciences, geography, urban planning, or environmental sciences, many such programs in colleges and universities only offer courses on how to use GIS, remote sensing, statistics, and engineering software products for data processing, analysis, and visualization. Drummond and French (2008) expressed concern that very few urban planners learn any computer programming and that relatively few urban planning students have the inclination to become database management and application programming experts. They warned that if all future planners were merely users of GIS, it is unlikely that GIS experts would understand or adequately serve the professional needs of planning practice. Although some geospatially related disciplinary programs offer a few courses on spatial database design and management and/or GIS programming for customized applications, very few teach students how software is developed through the engineering life cycle. Where existing software cannot handle big data or complex computational problems, research and education are constrained by its limitations. In contrast to the concerns expressed by Drummond and French (2008), Klosterman (2008) seems more optimistic and content to be the user, rather than the developer or creator, of GIS, since “future planners will not have to develop GIS-based tools and models from scratch, but will obtain them from the Web and link them together to perform particular tasks,” particularly as “planners have used them creatively to revolutionize their analysis and communication” (Klosterman, 2008).
Such a “free lunch” of using available software modules may well hold in the era of desktop-based GIS applications, since the user community has been generating and sharing tools and models. The same conclusion may not hold when geocomputation is implemented over the modern cyberinfrastructure (Atkins et al., 2003; NSF, 2007). Emerging advanced computing infrastructure and technologies, including many-core chips and multicore processors, such as graphics processing units (GPUs) and Intel’s many integrated core (MIC) architecture, and heterogeneous computer systems that combine accelerators with multicore nodes, have significantly improved the scalability and performance of scientific computation in a variety of domains. For example, an NVIDIA Tesla K40 GPU has 12 GB of memory and 2880 multiprocessor cores, supporting a total of 92,160 threads for general-purpose scientific computation. In the case of MIC, the Intel Xeon Phi coprocessor 5110P has 60 cores, or 240 threads. When large numbers of GPUs and MICs are integrated, a supercomputer can tackle large-scale data-driven scientific computation by applying a huge number of computing nodes and threads concurrently to the calculation problem.


To deploy such powerful computing resources efficiently, parallel and distributed computing is inevitably the way to go. However, it has been observed that “most applications in the world have not been structured to exploit parallelism. This leaves a wealth of capabilities untapped on nearly every computer system” (Jeffers and Reinders, 2013). That is to say, most software products are built on serial programs. This is particularly true in geoinformatics and geocomputation, where mainstream GIS and remote sensing software products are based on serial programs that constrain the scalability and performance of geospatial computation and applications over big data. Although the performance of commercial software has improved significantly, computational capacity and efficiency remain constrained by the memory and computing resources of a desktop computer when big data are involved. Parallel computing solutions for large-scale data and computation, however, may not be a priority for market-oriented commercial software companies dominated by PC-based products designed for small businesses, governmental agencies, and educational institutions, where cost-effectiveness of the investment is paramount. In general, when GPUs or MICs are used for geocomputation, there is no more “free lunch” in obtaining parallel versions of software modules. Even though the GPU is more accessible to general users because it is integrated into the desktop computer, it is not easy to transform a serial program into a compute unified device architecture (CUDA) program for an NVIDIA GPU. Considering the heavy workload of CUDA program development, no “free lunch” can be expected, particularly when hybrid message passing interface (MPI) + GPU programs have to be developed to deploy a cluster of GPUs, since a single GPU has its own memory limit. In the case of MIC, general users may not even have access to such an expensive product, and it would be a waste of money and resources to run serial programs on a MIC. To use MIC efficiently, at least MPI and OpenMP solutions have to be applied in addition to normal C or FORTRAN programs. Such technical thresholds mean that geographers cannot just be creative users but have to become creative developers.

1.23.2 The Evolving Cyberinfrastructure

“Cyberinfrastructure refers to infrastructure based upon distributed computer, information and communication technology,” including the “enabling hardware, algorithms, software, communications, institutions, and personnel” (Atkins et al., 2003; NSF, 2007). Cyberinfrastructure is thus not only the hardware, software, computer networks, and communications but also the organizations that construct and operate the infrastructure and the people who are able to work on it. Efforts to explore scalable and high-performance solutions over the modern cyberinfrastructure to process large volumes of data can be traced back to the 1990s. Initially, researchers used multiple computers to process data separately by running the same scripts or programs on individual machines. Later, researchers built local networks to link multiple computers into clusters, while grid systems were constructed by integrating remotely distributed homogeneous or heterogeneous computers and workstations (Foster and Kesselman, 1999; Foster et al., 2010). In recent years, cloud computing has emerged, built on the concept of service computing, offering infrastructure, platform, and data as services (Armbrust et al., 2010; Yang et al., 2011). Advances in hardware have transformed the computing infrastructure by shifting from homogeneous systems employing identical processing elements to hybrid architectures that combine multicore processors with special-purpose chips and accelerators, such as the GPU and the field-programmable gate array (FPGA) (El-Ghazawi et al., 2008). New multicore architectures combined with application accelerators hold the promise of increasing performance by exploiting levels of parallelism not supported by conventional systems. Following this long trajectory of the evolving cyberinfrastructure, many researchers have applied parallel and distributed computing technologies to high-performance processing and analytics of geospatial data. Although numerous publications document prior achievements, only a few are referenced in this article to establish the literature in this domain. In the early stage, a few computer nodes and clusters were used to demonstrate this capability (Wang et al., 2004; Zhang et al., 2009), with limited performance gains; for example, image classification on distributed workstations (Dhodhi et al., 1999) achieved a speedup of only 8 on a 12-band image of 512 × 512 pixels per band. Following the trend in grid computing, researchers deployed grid systems for high-performance image processing, analytics, and large-scale data storage, while offering data and analytic services through these systems (Giovanni et al., 2003; Yang et al., 2004; Petcu et al., 2007). By deploying Microsoft’s Azure cloud computing resources, researchers reprojected satellite data at a scale that would have taken tens of months of continuous processing on a high-end quad-core desktop machine (Li et al., 2010). In hyperspectral image processing, cutting-edge GPU, FPGA, and hybrid cluster technologies have been applied to achieve high-performance solutions (Valencia et al., 2008; Plaza and Chang, 2008; Paz and Plaza, 2010a,b; Gonzalez et al., 2012).
Because satellite imagery and aerial photos can be represented as multidimensional matrices or data cubes, high-performance computing can be achieved by deploying massively parallel computing resources in a multiprocessing environment, where the data structure matches an infrastructure built for matrix calculation, such as the GPU. Scalability, however, has attracted less attention than these impressive performance achievements. Most prior works dealt only with small data, a few tens or hundreds of megabytes, or one to two thousand pixels in each of the two dimensions of one band of a multiband image. In hyperspectral data processing, the typical dataset is 614 × 512 pixels with 224 spectral bands, for a total size of 140 MB (Plaza and Chang, 2008; Paz and Plaza, 2010a,b). Scalability can also be achieved in a batch processing mode by utilizing high-performance computing resources to process large amounts of data (Li et al., 2010), since each individual task has no dependency on the other tasks and thus no impact on tasks processed by other nodes.


Apart from such batch processing, researchers have often used only one small image or dataset to test an algorithm or to demonstrate the performance gained through an advanced computer architecture or computing technology. Today, high-resolution geospatial data are measured in gigabytes or terabytes. When hundreds of gigabytes of data are available for processing and analytics, scalable computing capability is critical to deriving consistent, high-quality output products. One obvious constraint is that a computer may not have sufficient memory to process large data at once, so a large dataset may have to be cut into pieces. Although some algorithms can be implemented in batch mode because there is little dependency between individual tasks, certain analytic modules will not generate consistent output products this way. In unsupervised classification of high-resolution imagery, if the whole image is cut into two or more pieces, the outputs of the separate classification processes are not consistent or comparable with the result of a single classification process covering the entire image: the pieces are interdependent, and a single process uses pixel information from all of them to generate the final classification, which differs from outputs derived from separate processes. In processing vector geometric data, maintaining the topological relationships and consistency between features in a distributed computing environment has been a challenging task. A dilemma is thus confronted: if the data are not cut into pieces, a computer may not be able to process such large-scale data; if the data are cut into pieces and processed separately, the output products may not meet consistent quality criteria. Designing novel algorithms and solutions in massively parallel computing environments to achieve scalable data processing and analytics over petascale, complex, and heterogeneous geospatial data with consistent quality and high performance will be the central theme of research in cyberinfrastructure and high-performance geocomputation.

1.23.3 Scalable and High-Performance Geospatial Applications Over the Modern Cyberinfrastructure

Modern cyberinfrastructure is a distributed computing environment that contains hundreds or thousands of multicore or many-core processors. By deploying large numbers of such processors concurrently, the computation bottleneck discussed in the previous section can be resolved to achieve scalable and high-performance computing. While classic supercomputers were constructed from multicore CPUs, more advanced supercomputing clusters in recent years are equipped with modern accelerators, such as GPUs and Intel’s MIC processors such as the Xeon Phi (Shi et al., 2013; Shi and Huang, 2016a,b).

1.23.3.1 GPU and GPU Cluster

Traditionally, graphics on the personal computer were handled by a video graphics array controller (Blythe, 2008; Nickolls and Kirk, 2014). NVIDIA’s GeForce 256, released in October 1999, was the first device marketed as a GPU, the term denoting that the graphics device had become a processor in its own right. Originally, the GPU was designed to process and generate computer graphics, images, and video games. As high-quality, high-resolution graphics came to be expected in a variety of applications, hundreds of GPU cores and thousands of threads had to be developed and deployed to process large volumes of pixels, vertices, and geometries efficiently. Consequently, new generations of GPUs are massively parallel programmable processors that can be used for general-purpose scientific computation. General-purpose computing on graphics processing units (GPGPU) is thus the more specific term in high-performance computing, distinguishing this role from the device’s conventional one as a graphics processor. Along with its changing role and capability, the GPU has evolved dramatically. NVIDIA and AMD are the two major GPU vendors; although technical details differ between them, the general architecture is the same. A typical GPU device contains dozens of multiprocessors, each of which consists of dozens to hundreds of processing cores. The GPU architectures of both vendors have evolved over several generations. Taking NVIDIA’s GPUs as an example, the architecture has gone through five generations: G80, GT200, Fermi, Kepler, and Maxwell. On the earliest, the G80 architecture, there are 16 multiprocessors, each of which contains 8 cores. The number of multiprocessors increased to 30 on the GT200 architecture, and a double-precision unit was added to each multiprocessor. The Fermi architecture was a significant breakthrough (NVIDIA Corporation, 2009): it contains 16 multiprocessors and a cache hierarchy, and each multiprocessor consists of 32 cores, each pair of which can jointly carry out a double-precision operation. The Kepler architecture is the successor of Fermi (NVIDIA Corporation, 2012), with 15 newly designed multiprocessors, each containing 192 processing cores and 64 double-precision units. The latest Maxwell architecture mainly targets the gaming industry; its double-precision performance is therefore relatively weak compared with Fermi and Kepler. The GPU architecture is typically organized as an array of highly threaded streaming multiprocessors (SMs) (Kirk and Hwu, 2013). Fig. 2 illustrates NVIDIA’s Kepler GPU architecture and the SM. Modern GPUs are equipped with a cache hierarchy: the L1 cache is inside the SM and shared by the processing cores belonging to that SM, while the L2 cache is shared by all the SMs. An SM consists of dozens to hundreds of processing cores and contains a large register file. Once a thread is scheduled for execution, it occupies a set of registers until it reaches the end of execution; relying on this mechanism, the processing core can switch among threads with almost zero overhead. In addition, each SM contains a scratchpad memory for low-latency data access.


Fig. 2 NVIDIA’s Kepler GPU architecture. (Reproduced from NVIDIA (2014) NVIDIA’s next generation CUDA™ compute architecture: Kepler™ GK110/210.)

There are two parallel programming languages that implement computation over the GPU: the Open Computing Language (OpenCL) and CUDA (NVIDIA Corporation, 2014). OpenCL is a cross-platform language that can be applied on both AMD and NVIDIA GPUs, whereas CUDA is the proprietary solution designed by NVIDIA for its own GPU devices. Although their flavors are slightly different, the principles are very close. In CUDA, the host refers to the CPU, while the device refers to an individual GPU. A CUDA program is composed of host functions running on the host CPU and one or more kernel functions suitable for parallel execution on the GPU. A kernel function is executed as a grid of threads. The threads in a grid are grouped into thread blocks, each of which is scheduled to execute on an SM, and every thread is physically executed on a processing core. Through this mapping of GPU ↔ kernel, streaming multiprocessor ↔ thread block, and processing core ↔ thread, the data parallelism in an application is explicitly expressed. Both the thread grid and the thread block can be up to three-dimensional. As an extension of the C language, the CUDA programming model includes both host (CPU) functions and kernel (GPU) functions, which are separated and compiled by the NVIDIA C compiler. Normally, the GPU executes the data computation, while the CPU handles data reading and writing. A general scheme for writing a CUDA C program can be summarized as follows (a minimal sketch of these steps appears after the list):

1. specify the types and sizes of input and output data;
2. allocate memory on the GPU for input data, output data, and intermediate data;
3. specify the configuration of the thread grid, that is, the number of threads per block and the total number of blocks;
4. copy input data from the CPU to the GPU;
5. execute the kernel function for data modeling and computation;
6. copy output data from the GPU to the CPU;
7. free the allocated GPU memory.
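To make the scheme concrete, the following is a minimal, illustrative CUDA C sketch that walks through the seven steps for a simple raster operation, computing a normalized difference vegetation index (NDVI) from red and near-infrared bands; the array sizes and pixel values are placeholders, and the example is not drawn from any of the studies cited in this chapter.

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

/* Kernel: each thread computes NDVI for one pixel. */
__global__ void ndvi_kernel(const float *red, const float *nir, float *ndvi, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float denom = nir[i] + red[i];
        ndvi[i] = (denom != 0.0f) ? (nir[i] - red[i]) / denom : 0.0f;
    }
}

int main(void)
{
    const int n = 1 << 20;                       /* step 1: data types and sizes */
    size_t bytes = n * sizeof(float);
    float *h_red  = (float *)malloc(bytes);
    float *h_nir  = (float *)malloc(bytes);
    float *h_ndvi = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { h_red[i] = 0.2f; h_nir[i] = 0.6f; }

    float *d_red, *d_nir, *d_ndvi;               /* step 2: allocate GPU memory */
    cudaMalloc(&d_red, bytes);
    cudaMalloc(&d_nir, bytes);
    cudaMalloc(&d_ndvi, bytes);

    int threads = 256;                           /* step 3: thread grid configuration */
    int blocks  = (n + threads - 1) / threads;

    cudaMemcpy(d_red, h_red, bytes, cudaMemcpyHostToDevice);   /* step 4: CPU to GPU */
    cudaMemcpy(d_nir, h_nir, bytes, cudaMemcpyHostToDevice);

    ndvi_kernel<<<blocks, threads>>>(d_red, d_nir, d_ndvi, n); /* step 5: run kernel */

    cudaMemcpy(h_ndvi, d_ndvi, bytes, cudaMemcpyDeviceToHost); /* step 6: GPU to CPU */
    printf("NDVI of first pixel: %f\n", h_ndvi[0]);

    cudaFree(d_red); cudaFree(d_nir); cudaFree(d_ndvi);        /* step 7: free GPU memory */
    free(h_red); free(h_nir); free(h_ndvi);
    return 0;
}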

Although the GPU can accelerate scientific computation, a single GPU has limited memory for handling big data. For example, NVIDIA’s Tesla K40 GPU has 12 GB of GDDR5 memory; if the total size of the data in a computation exceeds this limit, the task cannot be completed on one device. For this reason, a cluster of GPUs may resolve the constraint and expand scalability by utilizing multiple GPUs. Considering that data have to be copied back and forth between the host and the device, however, only certain kinds of computation are well suited to GPU clusters.

1.23.3.2 MIC and MIC Cluster

Xeon Phi is the first commercially available hardware product based on Intel’s MIC architecture. Multicore CPUs, such as Intel Xeon processors, typically coexist with Xeon Phi coprocessors in a hybrid computer node. The current Intel Xeon Phi coprocessor contains up to 61 scalar processing cores, each of which can run four threads in parallel. The cores are connected through a high-speed, bidirectional, 1024-bit-wide ring bus (512 bits in each direction). In addition to the scalar unit inside each core, there is a vector processing unit to support wide vector operations. As shown in Fig. 3, each core has its own private L1 instruction cache and L1 data cache, and cache coherence among the L1 caches is supported by hardware. There is also an onboard L2 cache shared by all the cores on the MIC card. This cache architecture is close to that of a traditional multicore CPU but quite different from the cache architecture of GPUs: on GPUs there is no direct communication between SMs, and cache coherence is therefore not supported.

Fig. 3 Internal architecture of Intel’s Xeon Phi coprocessor. Reproduced from Intel (2010). Intel® Many Integrated Core architecture. http://www.many-core.group.cam.ac.uk/ukgpucc2/talks/Elgar.pdf (Accessed September 15, 2015).

On a MIC card, there is typically an off-chip global memory shared by all the cores on the card. This global memory is separate from the main memory on the host, so data need to be transferred explicitly to the global memory on the MIC card for efficient processing. Because each core is on its own a classic processor, traditional parallel programming models, such as MPI and OpenMP, are supported on every core: communication between cores can be realized through shared-memory programming models such as OpenMP, and each core can also run MPI for message-based communication. When multiple MIC cards are integrated into a cluster, direct communication between MIC processors across different nodes is likewise supported through MPI. When massive computing resources are available, different approaches can be deployed to parallelize scientific computation on clusters equipped with MIC processors. Fig. 4 displays the two most commonly used approaches. In the native model, an MPI process runs directly on each MIC core; the 60 cores of the Xeon Phi 5110P are thus treated as 60 independent processors sharing the 8 GB onboard memory. To exploit the parallelism within each MIC core, each MPI process can additionally run four threads using OpenMP. In the offload model, the MPI processes are hosted by the CPUs, which offload data and computation to the MIC coprocessors using OpenMP. Besides the native and offload models, a hybrid model is supported as a third approach, in which data and computation are distributed by MPI onto both the CPU cores of the Xeon processors and the MIC cores of the Xeon Phi coprocessors. Considering the limited memory and the modest computing power of an individual MIC core, a cluster of MICs generally has to be deployed to process big data. (A minimal sketch of the native model appears below.)
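As an illustration only, the following hybrid MPI + OpenMP sketch follows the native model described above, with one MPI rank per core and four OpenMP threads per rank; the workload and names such as cells_per_rank are hypothetical placeholders.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank (i.e., each MIC core in the native model)
       processes its own share of the data. */
    const int cells_per_rank = 1000000;
    double local_sum = 0.0;

    /* Four OpenMP threads per rank, matching the four hardware
       threads available on each MIC core. */
    #pragma omp parallel for num_threads(4) reduction(+:local_sum)
    for (int i = 0; i < cells_per_rank; i++)
        local_sum += (double)((rank + i) % 7);   /* placeholder work */

    /* Combine the partial results across all ranks. */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("global sum across %d ranks: %f\n", size, global_sum);

    MPI_Finalize();
    return 0;
}

With the Intel toolchain, such a source would typically be built for native execution with something like mpiicc -mmic -qopenmp (exact flags vary by compiler version); the same code also runs unchanged on an ordinary CPU cluster.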

1.23.3.3 GIS and Remote Sensing Applications Using GPU and Cluster of GPUs

Due to its intrinsic massive parallelism, GPGPU has become prevalent in high-performance scientific computing in general and in geocomputation applications in GIS, remote sensing, and geospatial modeling and simulation in particular (Boyer and El Baz, 2013; Nickolls and Dally, 2010; Shi et al., 2014a; Tarditi et al., 2006).

Fig. 4 Two basic parallel approaches on MIC clusters: (A) native model and (B) offload model. Reproduced from Intel (2012). Programming models for Intel® Xeon® processors and Intel® Many Integrated Core (Intel MIC) architecture. https://www.tacc.utexas.edu/documents/13601/d9d58515-5c0a-429d-8a3f-85014e9e4dab (Accessed September 15, 2015).

Having originated as graphics processors, GPUs are well suited to matrix manipulation and processing, and the architecture of the GPU is therefore specifically appropriate for processing imagery data. In recent years, for example, many works on hyperspectral image processing using GPUs have been published (e.g., Nascimento et al., 2014; Paz and Plaza, 2010a,b; Sánchez et al., 2010; Sánchez and Plaza, 2010). A collection of research papers on GPU utilization in remote sensing applications can be found in the 2011 special issue of the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing on “High performance computing in earth observation and remote sensing.” Other relevant themes include image segmentation and classification using GPUs (Bernabé et al., 2012; Men et al., 2012; Ye and Shi, 2013; Yang et al., 2014; Li et al., 2014; Hossam et al., 2014; Hwu et al., 2008). In the field of GIS, raster-based spatial calculation is a good match to the architecture of the GPU. Among the pilot studies, spatial interpolation has been implemented on GPUs using different algorithms, such as inverse distance weighting (IDW), Kriging, spline, and natural neighbor (Cheng, 2013; De Ravé et al., 2014; Henneböhl et al., 2011; Shi and Ye, 2013; You and Zhang, 2012). Similarly, viewshed or visibility calculation is another popular topic for GPU acceleration (Chao et al., 2011; Feng et al., 2015; Gao et al., 2011; Pena et al., 2014; Stojanovic and Stojanovic, 2013; Xia et al., 2010; Strnad, 2011). Other discrete topics in raster calculation on GPUs include spatial index construction, zonal statistics, histograms, polygon rasterization, and solar energy potential estimation (Guan et al., 2014; Huang et al., 2015; Qin et al., 2014; Raina et al., 2009; Steinbach and Hemmerling, 2012; Zhang, 2011; Zhang and Wang, 2014; Zhang and You, 2014; Zhang et al., 2010, 2011, 2014). In contrast to the popular utilization of the GPU in remote sensing and raster-based geocomputation, only a few works have explored how vector geometric data can be processed on the GPU. The implementation of various spatial indexes for spatial join or query on the GPU has been discussed from different application perspectives (Gieseke et al., 2014; Luo et al., 2012; You et al., 2013; Yu et al., 2011; Zhang and You, 2012a), and very few works address vector geometric computation itself (Fort and Sellares, 2009; Liao et al., 2013; McKenney et al., 2011; Skala, 2012; Zhang and You, 2012a,b; Oh, 2012). Since parallel computing on a single GPU has memory and bandwidth limitations that are insufficient for large volumes of data, hybrid computer architectures and programming solutions integrating multiple GPUs and CPUs have been increasingly applied to support geocomputation over large-scale data. A variety of applications in remote sensing, GIS, and spatial simulation and modeling have been developed on clusters of GPUs. In this case, MPI + GPU is the general programming style: MPI controls and coordinates the tasks implemented on multiple GPUs, while data input/output (I/O) is handled by the host CPUs (a minimal sketch of this pattern appears after the following paragraph).
Many pilot studies have been reported in recent years, on topics including hyperspectral image processing and analytics, anomaly detection, IDW and Kriging interpolation, unsupervised image classification, cellular automaton-based spatial modeling, and agent-based modeling (Molero et al., 2014; Sánchez and Plaza, 2014; Sevilla et al., 2014; Shi and Ye, 2013; Shi et al., 2014a,b; Tang, 2013; Wu et al., 2014).
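The following sketch illustrates, under stated assumptions, the MPI + GPU division of labor described above: each MPI rank binds to one GPU on its node, the host performs the (here simulated) I/O, and the device executes the kernel. The kernel and data are generic placeholders, not an implementation of any of the cited applications.

#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

/* Placeholder kernel: scale one rank's tile of raster cells. */
__global__ void scale_tile(float *a, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= factor;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Bind this MPI rank to one of the GPUs on its node. */
    int ngpu = 0;
    cudaGetDeviceCount(&ngpu);
    if (ngpu > 0) cudaSetDevice(rank % ngpu);

    /* Host CPU handles I/O: here we simply fabricate a tile. */
    const int n = 1 << 20;
    float *h_tile = (float *)malloc(n * sizeof(float));
    for (int i = 0; i < n; i++) h_tile[i] = 1.0f;

    /* Device executes the computation on this rank's tile. */
    float *d_tile;
    cudaMalloc(&d_tile, n * sizeof(float));
    cudaMemcpy(d_tile, h_tile, n * sizeof(float), cudaMemcpyHostToDevice);
    scale_tile<<<(n + 255) / 256, 256>>>(d_tile, n, 2.0f);
    cudaMemcpy(h_tile, d_tile, n * sizeof(float), cudaMemcpyDeviceToHost);

    /* MPI coordinates the ranks (e.g., before writing results). */
    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0) printf("first cell after scaling: %f\n", h_tile[0]);

    cudaFree(d_tile);
    free(h_tile);
    MPI_Finalize();
    return 0;
}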

1.23.3.4 GIS and Remote Sensing Applications Using MIC and Cluster of MICs

For the utilization of MIC and clusters of MICs in geospatial applications, three representative geocomputation applications, Kriging interpolation, the iterative self-organizing data analysis technique algorithm (ISODATA), and the Game of Life (GOL), have been accomplished (Lai et al., 2014; Shi et al., 2014a) on the supercomputer Beacon. Beacon is a Cray CS300-AC cluster supercomputer that offers access to 48 compute nodes and 6 I/O nodes joined by an FDR InfiniBand interconnect providing 56 Gb/s of bidirectional bandwidth. Each compute node is equipped with two Intel Xeon E5-2670 8-core 2.6 GHz processors, four Intel Xeon Phi (MIC) 5110P coprocessors, 256 GB of RAM, and 960 GB of SSD storage; each I/O node provides access to an additional 4.8 TB of SSD storage. Each Xeon Phi 5110P coprocessor contains 60 MIC cores at 1.053 GHz and 8 GB of GDDR5 onboard memory. In aggregate, Beacon provides 768 conventional cores and 11,520 accelerator cores delivering over 210 TFLOP/s of combined computational performance, 12 TB of system memory, 1.5 TB of coprocessor memory, and over 73 TB of SSD storage. The compiler used in this work was Intel 64 Compiler XE, Version 14.0.0.080, Build 20130728. Kriging is a geostatistical estimator that infers the value of a random field at an unobserved location, based on the idea that the value at an unknown point should be a weighted average of the known values at its neighbors. The algorithm reads input data and returns a raster grid with a calculated estimate for each cell. Whether the native model or the offload model is used, MPI distributes the computation among compute nodes: for Kriging, the output raster grid is divided evenly among the MPI processes, each MPI process also receives the whole input dataset, and each process then interpolates the unknown points of its own subgrid from their neighbors. ISODATA is one of the most frequently used algorithms for unsupervised image classification in remote sensing applications (Ball and Hall, 1965). The objective of this benchmark is to classify the image into n classes. In general, ISODATA is implemented in three steps: (1) calculate the initial mean value of each class, (2) classify each pixel to the nearest class, and (3) calculate the new class means based on all pixels in each class. The second and third steps are repeated until the change between two iterations is small enough. To parallelize the computation, the whole image is partitioned into blocks of the same size, and each block is sent to a different MPI process. During each iteration, each MPI process first calculates the local means of the n classes; all processes then send their local means to the head MPI process, which computes the global means and returns them to the other processes, after which all processes begin the next iteration (see the sketch below).
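A minimal sketch of that per-iteration exchange follows, using MPI_Allreduce as a compact equivalent of the gather-at-head-process pattern just described; it assumes a single band and a fixed class count, and all variable names are hypothetical.

#include <mpi.h>
#include <math.h>

#define NCLASS 10

/* One ISODATA iteration: each rank owns a block of pixels, computes
   local class sums/counts, and the allreduce yields the global means
   that every rank uses in the next iteration. */
void isodata_iteration(const float *pixels, int npix,
                       float means[NCLASS], int *labels)
{
    double local_sum[NCLASS] = {0};
    long   local_cnt[NCLASS] = {0};

    /* Step 2: assign each local pixel to the nearest class mean. */
    for (int i = 0; i < npix; i++) {
        int best = 0;
        float bestd = fabsf(pixels[i] - means[0]);
        for (int c = 1; c < NCLASS; c++) {
            float d = fabsf(pixels[i] - means[c]);
            if (d < bestd) { bestd = d; best = c; }
        }
        labels[i] = best;
        local_sum[best] += pixels[i];
        local_cnt[best] += 1;
    }

    /* Step 3: global means from all ranks' partial sums. */
    double sum[NCLASS]; long cnt[NCLASS];
    MPI_Allreduce(local_sum, sum, NCLASS, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    MPI_Allreduce(local_cnt, cnt, NCLASS, MPI_LONG,   MPI_SUM, MPI_COMM_WORLD);
    for (int c = 0; c < NCLASS; c++)
        if (cnt[c] > 0) means[c] = (float)(sum[c] / cnt[c]);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    float pixels[4] = {0.1f, 0.4f, 0.8f, 0.9f};  /* this rank's tiny block */
    float means[NCLASS];
    int labels[4];
    for (int c = 0; c < NCLASS; c++) means[c] = (float)c / NCLASS;
    for (int it = 0; it < 5; it++)               /* a few fixed iterations */
        isodata_iteration(pixels, 4, means, labels);
    MPI_Finalize();
    return 0;
}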


Cellular automata (CAs) are a foundation for geospatial modeling and simulation. GOL (Gardner, 1970), invented by the British mathematician John Conway, is a well-known generic CA consisting of a collection of cells that live, die, or multiply according to a few mathematical rules. The universe of the GOL is a two-dimensional orthogonal grid of square cells, each of which is in one of two possible states, alive (“1”) or dead (“0”). Every cell interacts with its eight neighbors, the cells that are horizontally, vertically, or diagonally adjacent, and in each iteration the statuses of all cells are updated simultaneously. To parallelize the updating process, the cells of the square grid are partitioned into stripes in row-wise order, and each stripe is handled by one MPI process. At the beginning of each iteration, each MPI process must send the statuses of the cells along the boundaries of its stripe to its neighboring MPI processes and receive the statuses of the cells in the two adjacent rows (a minimal sketch of this halo exchange follows). On the supercomputer Beacon, MPI was used to distribute workloads among the allocated MICs. On the Intel MIC coprocessors, the native model, in which MPI processes run directly on MIC cores, was applied first. For all three representative benchmarks, the strong scalability of the Beacon cluster was examined until performance reached a plateau due to the introduced communication overhead. Furthermore, a pilot study compared different programming models on the Intel MIC coprocessor, with a detailed report on scalability and performance (Lai et al., 2014). In summary, when the native model was applied, OpenMP was further used to increase the number of threads running on each MIC core; in the offload model, in which the MPI process runs on the host CPU, various numbers of threads were deployed to the MIC coprocessor for performance comparison; and a third, hybrid model scheduled the workloads onto both the CPUs and the MICs. Experiments demonstrated that the native model is typically better than the offload model, and the hybrid model can provide additional performance improvement. The same three benchmarks were also implemented on Keeneland, a cluster of GPUs. It was concluded (Shi et al., 2014a) that GPU clusters show their advantage in the category of embarrassingly parallel computation, such as Kriging interpolation. This is reasonable, since GPU clusters have abundant computing threads for the calculation as long as the scale of data and computation fits within the memory limit. When large data and computation problems can be partitioned across multiple GPUs, a GPU cluster has great potential to accelerate geospatial applications, achieving significant performance improvement over traditional MPI + CPU parallel implementations and single-CPU implementations. For geospatial computation with simple communication between the distributed computing nodes, such as ISODATA for unsupervised image classification, especially when larger data are involved in iterative computation, the simple MPI native programming model on an Intel MIC cluster can achieve performance equivalent to the MPI + GPU model on GPU clusters when the same number of processors is allocated. This implies that an efficient cross-node communication network is the key to strong scalability for parallel applications running on multiple nodes.
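For illustration, the boundary exchange can be sketched as below; the stripe is assumed to be stored with one ghost row above and one below its nrows real rows, and all names are hypothetical.

#include <mpi.h>
#include <stdio.h>

/* Halo exchange for a row-striped Game of Life grid: before each
   iteration every rank swaps its boundary rows with its neighbors.
   'grid' holds nrows real rows plus a ghost row at top and bottom. */
void exchange_halo(unsigned char *grid, int nrows, int ncols,
                   int rank, int size)
{
    int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* Send first real row up; receive bottom ghost row from below. */
    MPI_Sendrecv(grid + ncols,               ncols, MPI_UNSIGNED_CHAR, up,   0,
                 grid + (nrows + 1) * ncols, ncols, MPI_UNSIGNED_CHAR, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    /* Send last real row down; receive top ghost row from above. */
    MPI_Sendrecv(grid + nrows * ncols,       ncols, MPI_UNSIGNED_CHAR, down, 1,
                 grid,                       ncols, MPI_UNSIGNED_CHAR, up,   1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    unsigned char grid[6 * 8] = {0};   /* 4 real rows + 2 ghost rows, 8 columns */
    exchange_halo(grid, 4, 8, rank, size);
    if (rank == 0) printf("halo exchange completed on %d ranks\n", size);

    MPI_Finalize();
    return 0;
}

MPI_PROC_NULL makes the sends and receives at the top and bottom stripes harmless no-ops, so the same code handles interior and boundary ranks alike.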
For geospatial computation with intensive data exchange and communication between the distributed computing nodes, such as CA simulation, the simple MPI native programming model on an Intel MIC cluster appears to achieve better performance than the MPI + GPU model on GPU clusters when the same number of processors is allocated. This is because data have to be copied back and forth between the host CPUs and the GPUs; when the spatial computation and simulation run for many iterations, the communication overhead between CPUs and GPUs becomes prominent.

1.23.4 The Vulnerable Personnel Component in Cyberinfrastructure Development

According to the definition of cyberinfrastructure (Atkins et al., 2003; NSF, 2007), personnel is a critical component of the modern cyberinfrastructure, a distributed and massively parallel computing environment very different from the traditional engineering environment of mainstream software, which is based on serial algorithms and programs developed for the desktop PC. If scientists want to deploy supercomputing power efficiently in this distributed and parallel environment, they need to understand and implement advanced and complex technologies (e.g., GPU programming, MPI, OpenMP, Pthreads, MapReduce on Hadoop, or Spark) and, moreover, must handle I/O, memory, and related concerns in order to complete their tasks on such infrastructure. Personnel training and education are evidently even more challenging for GIScientists and geographers, and for researchers in the broader social, behavioral, and economic (SBE) community, who lack a sufficient computer science background and face steep learning curves. While the hardware infrastructure is available and stable for scientific computation, the vulnerable personnel component has been a weak facet of cyberinfrastructure development. If few GIScientists and geographers are capable of working on the modern cyberinfrastructure, it is nearly impossible to make substantial progress in response to big data science; after all, if the data can be handled on a PC or workstation, it is not a big data problem at all.

1.23.4.1 The Emerging Computation Divide

Lathrop and Murphy (2008) indicated that “An unexpected dilemma looms over this relentless drive toward abundant computer resources and transformative computational applications: the workforce to develop, implement, and conduct the analysis of these models is shrinking!” At the same time, a computation divide over the cyberinfrastructure has been emerging and evolving within the science community in general. In another survey report, Pennington et al. (2008) indicated that our understanding of global environmental problems is limited by an inability to effectively analyze data from sources as wide-ranging as field and lab experiments, satellite imagery, and simulation models. The report noted that environmental scientists often lack the requisite computer literacy because they have had little previous involvement with advanced technologies. Although thematic workshops can help participants recognize potentially transformative technical approaches, scientists have become painfully aware of their lack of requisite skills and are frequently frustrated by the gap between their own technical skills and those needed to use advanced computing systems and infrastructure resources.


The computation divide has long been an obstacle in geoinformatics, geography, urban studies, and environmental research. If computational and geospatial researchers are to break through the computational constraints of existing software modules and deploy emerging advanced computer architectures and computing infrastructure, they must build a stronger capability that bridges the domain sciences privileging the geospatial perspective and the computer science and engineering programs. At this time, it is not the case that computer engineers, computer scientists, and information scientists are already doing spatial modeling and computation. First, geospatial computation and geoinformatics are not mainstream research programs in any computer science department; individual computer scientists may have discrete interests in a few topics in geospatial computation, but few can cover the broad range of research topics in geocomputation. Second, and most importantly, most computer scientists do not have domain knowledge in geoinformatics research. It has been witnessed that collaborating computer scientists and students may spend several months learning yet still hardly grasp the domain knowledge of geocomputation, and thus can hardly develop meaningful solutions. Spatial scientists and decision-makers should not simply assume that computer science professionals will someday resolve the problem and develop new software or solutions for geoscientists, environmental scientists, and geoinformatics scientists. Such an attitude and inclination will damage major research funding programs designed to support research in geospatial science and geoinformatics. As a result, the computation divide would widen into a vicious circle, while research in geospatial computation and geoinformatics would be slowed and deconstructed in the face of the big data challenge. The next generation of researchers (or at least a proportion of these experts) in geoinformatics, geography, urban and environmental sciences, and the broader SBE sciences should possess this capability in order to advance and transform the research of data-driven geoinformatics and geography with the support of distributed and parallel computing over the cyberinfrastructure.

1.23.4.2 Data-Driven Geography Through Concurrent and Parallel Geocomputation

Concurrency has been claimed to be “the next big thing” and “the way of the future” in software development, with parallelism as the key (Sutter and Larus, 2005). In computer science, concurrency means that computer programs are executed simultaneously on multiple cores or threads of the same processor or on distributed processors. Neither concurrency nor parallelism is a new concept, but they did not seem urgent while multicore technology was evolving incrementally. In 2012, Intel announced that Xeon Phi would be the brand name for all products based on the MIC architecture, and Reinders (2012) explained the fundamental difference between multicore and many-core technologies: MIC is a highly parallel device with dozens of cores. When clusters of MICs are integrated into a computational system, massive numbers of computing threads and processors become available. In the case of GPUs, a single Tesla K40/K80 GPU has 2880/4992 CUDA cores. Now that clusters of MICs and GPUs are available, data-driven geography can be promoted and realized through concurrent and parallel geocomputation. Several pilot studies of complex geocomputation and simulation have been completed successfully over large geospatial data. In research on urban sprawl simulation, many approaches exist to model urban land-use changes and the associated natural and socioeconomic dynamics, a large proportion of them based on variants of the CA model, a discrete computational model used to simulate dynamic spatial processes through a set of transition rules. In a CA-based simulation (Guan et al., 2015), the dataset covers the entire state of California and has five normalized layers (elevation, slope, distance to major transportation networks, distance to city centers, and land use of 1992), an exclusion layer, and an urban layer of 1992. Each layer contains 40,460 × 23,851 cells at 30 m resolution, stored in GeoTIFF format. The simulation also generates four intermediate layers when executing the transition rules (one layer per rule) and a final output layer representing the simulated urban distribution, all with the same spatial dimensions as the input layers; the total memory space required exceeds 20 GB. When 64 GPUs on the supercomputer Keeneland (http://keeneland.gatech.edu/) were utilized, a 50-year simulation of urban growth was completed in about half a minute, achieving a speedup of 400 in comparison with the serial program. High-performance solutions built on concurrency and parallelism will be indispensable to the future of data-driven geography for geospatial data processing and analytics in response to the big data challenge. For example, satellite imagery and aerial photos have been extensively applied in research in geoscience, geoinformatics, urban studies, and environmental science. By deploying the supercomputers Kraken (http://www.nics.tennessee.edu/First-academic-PetaFLOP), Keeneland, and Beacon (https://www.nics.tennessee.edu/beacon), unsupervised image classification by ISODATA was successfully implemented over 100 GPUs and MICs to understand the different performance characteristics of such hybrid computer systems (Shi et al., 2014a,b). When the aforementioned 18 GB image was classified into 10 classes, unsupervised classification completed successfully in about half a minute.
A scalability test was conducted on Kraken by deploying 10,800 processors to complete unsupervised image classification in 1 min over 12 tiles of 18 GB imagery data, a total of 216 GB, in a single computation process, which is simply infeasible on a desktop computer. In a recent endeavor of supervised image classification using the maximum likelihood classifier (MLC) algorithm, the classification of 18 GB of image data was completed in about 19 s on a Tesla K40 GPU, achieving a speedup of 99× in comparison with the same function implemented in commercial software. It can be expected that such a classification process can be accomplished in a couple of seconds when multiple GPUs are utilized. In the case of spatial interpolation using the IDW and Kriging algorithms, when three GPUs on Keeneland were deployed, both IDW and Kriging interpolation over more than one million output cells could be completed in 2–3 s on average, achieving a speedup of 1200× in comparison with the serial programs (Ye et al., 2011; Shi et al., 2013).

While the large volume of geospatial data can lead to the big data challenge in data-driven geography research, small data may generate petascale (10^15) or exascale (10^18) computing problems as well. For example, affinity propagation (AP) was first published in Science in 2007 (Frey and Dueck, 2007) and has attracted more than 2800 citations to date. As a relatively new clustering algorithm, AP is not yet widely applied in geoinformatics. Unlike other classification or clustering algorithms, such as ISODATA, k-means, and MLC, AP does not require a predefined, arbitrary number of clusters; it derives the number of clusters as part of the result. Furthermore, AP can be applied in cluster analysis on raster or image data, vector geometric data, and text data. For this reason, AP has significant potential in geoinformatics for the identification of spatial clusters and for other research and applications, such as data resampling, spatial filtering, and pattern analysis. To implement the AP algorithm (Frey and Dueck, 2007), a similarity matrix S contains N × (N − 1) records of the negative values of the distance between each point and all other points. The other input data contain the preference values of the N input points. The similarity matrix S describes how well each data point is suited to serve as an exemplar, while data points with higher preference values are more likely to be selected as cluster centers or exemplars; the preference value thus determines the number of identified clusters. In AP, all data points are considered equally as potential exemplars or cluster centers, so the preference values are initialized to a common value, usually the median of the similarity matrix. In general, AP is an optimization process that maximizes the total sum of intracluster similarities (equivalently, since similarities are negative distances, it minimizes the total intracluster distance). Unfortunately, serial AP can handle only small datasets, and with poor performance, as it easily exceeds the capability of a desktop computer. "Affinity propagation's computational and memory requirements scale linearly with the number of similarities input; for nonsparse problems where all possible similarities are computed, these requirements scale quadratically with the number of data points" (Dueck, 2009). In the case of image analytics, an image with a dimension of 100 × 200 pixels has a total of 20,000 pixels. The similarity matrix then holds on the order of 10^8 (about 4 × 10^8) entries and needs about 3.2 GB of memory, which could hardly be processed by the serial AP program, which has to allocate 3.2 × 5 = 16 GB of memory to hold the intermediate parameters. The same scalability and performance constraints exist when large numbers of geospatial features in vector datasets are used. In our pioneering study, AP calculation required large memory to process big data: 10 K points need 4 GB of memory, 20 K points need 16 GB, and 40 K points need 64 GB. Solutions over clusters of GPUs have to be developed to handle big data. For example, when we transformed AP from a serial program into a CUDA program, it could only process 15,000 points for AP cluster analysis on a single Tesla K40 GPU (Shi, 2015). If the data have more than 15,000 points, a cluster of GPUs has to be deployed.
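To make the algorithm's structure, and its quadratic memory footprint, concrete, the following is a minimal serial sketch of one iteration of AP's two message-passing updates, the responsibilities r(i,k) and availabilities a(i,k), over dense N × N arrays (after Frey and Dueck, 2007; damping and convergence checks omitted for brevity). It is an illustrative reimplementation, not the code referenced in the text.

#include <float.h>

/* One AP iteration over dense N x N matrices. s holds the (negative-distance)
   similarities, with the preference values on the diagonal; r and a are the
   responsibility and availability messages. Three N x N double arrays are
   required, which is why memory grows quadratically with the number of points. */
void ap_iterate(int n, const double *s, double *r, double *a) {
    /* r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k')) */
    for (int i = 0; i < n; ++i) {
        for (int k = 0; k < n; ++k) {
            double best = -DBL_MAX;
            for (int kp = 0; kp < n; ++kp)
                if (kp != k && a[i * n + kp] + s[i * n + kp] > best)
                    best = a[i * n + kp] + s[i * n + kp];
            r[i * n + k] = s[i * n + k] - best;
        }
    }
    /* a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
       a(k,k) = sum over i' != k of max(0, r(i',k)) */
    for (int k = 0; k < n; ++k) {
        double sum = 0.0;
        for (int ip = 0; ip < n; ++ip)
            if (ip != k && r[ip * n + k] > 0.0) sum += r[ip * n + k];
        for (int i = 0; i < n; ++i) {
            if (i == k) { a[k * n + k] = sum; continue; }
            double v = r[k * n + k] + sum - (r[i * n + k] > 0.0 ? r[i * n + k] : 0.0);
            a[i * n + k] = v < 0.0 ? v : 0.0;
        }
    }
}

As written, the responsibility update is O(N^3) per iteration; a common optimization precomputes each row's largest and second-largest a + s values in a single pass, bringing one iteration down to O(N^2), and a GPU port further assigns rows of the matrices to blocks of threads. Such restructuring is exactly the kind of algorithm redesign discussed in the next section.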
If AP is to be applied in geography and geoinformatics to resolve real-world problems, this scalability bottleneck has to be overcome. Similarly, the calculation of near repeat (NR) patterns (Ratcliffe and Rengert, 2008) can easily go beyond petascale or exascale even when the size of the data is not very big. NR is a theory or hypothesis stating that when a crime event takes place at a certain location, its immediate surroundings face an increased risk of experiencing subsequent events within a fairly short period of time (Ratcliffe and Rengert, 2008; Wells et al., 2011). The conclusions derived from the concept and theory of NR are critical for policy making, particularly in urban areas, because they can suggest interventions that might prevent subsequent crimes. NR involves two kinds of calculation: event pairs and event chains. First, all two-event pairs are derived from the input source data, which contain the space (x and y coordinates) and time (t) information for each event. The output result is a table with a given number of rows and columns. For example, within a 5 × 5 table, any two events that fall within a given distance in space and a given interval in time (e.g., every 500 m and 30 min) are placed into the corresponding cell of the table. To obtain a relatively stable statistical significance, 999 simulations are usually required in NR calculation. N incidents yield N(N − 1)/2 pairs for investigation in each single run, split into multiple bands defined by spatial distance and time interval, with an additional 999 simulated runs to derive the statistical significance for each band in the output matrix. Ratcliffe's NR calculator would spend more than 10 h to process only 100 events. When N reaches 30,000, for example, the calculation exceeds the capacity of the NR calculator, which could hardly finish such intensive computation within a reasonable amount of time. In a pilot study on a sample dataset of 32,505 events, which could be the total number of crime events in a big metropolitan area for 1 year, a single GPU on a desktop machine accomplished the two-event pair calculation (N(N − 1)/2 pairs) and permutation in about 48.5 min. When 100 GPUs on the supercomputer Keeneland were used, the two-event pair calculation and permutation were finished in about 4 min. After the N(N − 1)/2 pairs are derived from the N incidents, the results are split into multiple bands of a table according to the specified spatial distance and time interval. Within each band, event chain calculation first identifies how many sets of three events are chained together. That is, for any three events {e1, e2, e3}, if all two-event pairs {e1, e2}, {e1, e3}, and {e2, e3} exist in the band, these three events establish a three-event chain. Higher-order event chains can then be derived from the lower-order chains; event chain calculation is thus all about the combination of event pairs. The theoretical computational intensity for different orders of event chains can be estimated. When event chain calculation is performed over 30,000 incidents (N = 30,000), it needs to compare N(N − 1)(N − 2)/(3 × 2 × 1), or about 4.5 × 10^12, two-event pairs to derive the potential three-event chains. In the case of five-event chains, it needs to compare N(N − 1)(N − 2)(N − 3)(N − 4)/(5 × 4 × 3 × 2 × 1), or about 2.0 × 10^20, two-event pairs, which is beyond exascale. Obviously, not all event chains could be identified in the pilot study when 32,505 events were processed.
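Returning to the first stage, the two-event pair calculation parallelizes naturally, since every pair is independent. The CUDA sketch below illustrates the idea: each thread scans the pairs (i, j > i) for one event i and bins them atomically into the space-time table. The names, planar coordinates in meters, and times in minutes are assumptions for illustration; this is not the code used in the pilot study.

// Bin all two-event pairs into an nBins x nBins space-time table.
// x, y: event coordinates (m); t: event times (min); dBin/tBin: band widths
// (e.g., 500 m and 30 min); table: global counters updated atomically.
// (Host-side allocation, data transfer, and kernel launch omitted.)
__global__ void nrPairCount(const float *x, const float *y, const float *t,
                            int n, float dBin, float tBin, int nBins,
                            unsigned long long *table) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    for (int j = i + 1; j < n; ++j) {            // thread i scans pairs (i, j>i)
        float dx = x[i] - x[j], dy = y[i] - y[j];
        float d = sqrtf(dx * dx + dy * dy);      // spatial separation
        float dt = fabsf(t[i] - t[j]);           // temporal separation
        int db = (int)(d / dBin), tb = (int)(dt / tBin);
        if (db < nBins && tb < nBins)
            atomicAdd(&table[db * nBins + tb], 1ULL);
    }
}

The 999 significance runs can reuse the same kernel on permuted event times, which is why the pair-and-permutation stage scales so well from one GPU to 100.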
Within the 5 × 5 output table containing all two-event pairs, one band with 902 two-event pairs was selected to derive all event chains. 4800 processors on the supercomputer Kraken completed the task in 130.69 s. Starting from 902 two-event pairs, there are 162 three-event chains, 137 four-event chains, 127 five-event chains, 84 six-event chains, 36 seven-event chains, 9 eight-event chains, and 1 nine-event chain. In the case of another band with 19,118 two-event pairs, 4800 processors on Kraken completed a partial run of the multiple-chain calculation in more than 10 h. Starting from 19,118 two-event pairs, there are 107,160 three-event chains; 1,009,866 four-event chains; and 9,058,084 five-event chains. There are 1992 unique event IDs involved in the calculation of the potential six-event chains, meaning that on the order of 10^16 combinations would have to be examined. More efficient and powerful algorithms have to be explored to complete the event chain calculation, since the largest number of two-event pairs in the output table is 121,121,618.

1.23.4.3

Challenges to Next-Generation GIScientists in Response to Data-Driven Geoinformatics

Drummond and French (2008) reviewed the evolution of GIS, which originated in academia. Today, if geographers and geoinformatics researchers want to deploy the computing power of the new computer hardware in the computing infrastructure, one essential task is to understand the fundamental algorithms behind the geospatial computation in software modules and geosimulation processes, so as to explore solutions for parallelism. Both the new computer hardware, such as GPUs and MICs, and the CI are characterized by massively parallel computing environments. However, traditional GIS and remote sensing software has mainly been developed for desktop applications using sequential algorithms, and only a few pilot applications have applied multithreading technologies. Many research and education challenges arise in response to the emerging computing infrastructure; solutions may have to be explored even at the interface between software and hardware, and significant software redesign and reengineering are to be expected. While Pennington et al. (2008) found that many scientists lack requisite skills and are frequently frustrated by the gap between their technical skills and the skills needed to use advanced computing systems and infrastructure resources, Glotzer et al. (2008) reported that students are likewise unprepared to use HPC or parallel computing as a research tool. As a common issue in education and training, Glotzer et al. (2008) summarized four problems that are significant for geoinformatics programs as well: (1) difficulty in balancing domain topics and scientific computation in a way that provides both depth and breadth; (2) a lack of software engineering skills needed to write, modify, verify, and validate robust systems and applications codes that will address community needs over the long term; (3) a lack of understanding of underlying algorithms and their applicability in a highly parallel multiscale environment, and a tendency to use codes as "black boxes"; and (4) insufficient knowledge of programming and optimizing for performance. Based on the exploratory studies utilizing Kraken, Keeneland, and Beacon for geocomputation, if researchers and students in geography, geoscience, and geoinformatics programs wish to pursue parallel computing solutions over the CI, a long learning curve is expected, covering a variety of themes in computation and computer science, such as data structures, algorithms, and parallel and distributed computing with MPI and OpenMP on GPUs and MICs. In vector polygon overlay calculation, for example, data and task partitioning is the key to the efficient utilization of massively parallel computing resources, while an enhanced data structure can accelerate the partitioning process (Shi, 2012). In ISODATA for unsupervised image classification, I/O optimization is critical, since I/O time can exceed the classification time. While memory handling can be a common issue in all cases, in parallelizing LISA, a popular function in PySAL for spatial statistics, when the 220,000 polygons of the census block groups are used, even a cluster of computers may not have sufficient memory to hold a normal two-dimensional array of 220K × 220K variables of double data type.
When the compressed row storage (CRS) approach is applied as the new data structure, the LISA calculation over 220K features can be completed in a couple of seconds, even by a sequential program, as demonstrated in a recently completed pilot project. In parallelizing the affinity propagation algorithm, transforming the original serial C program into a CUDA program may require significant algorithm redesign, since the original serial program may not be intuitively transformable into parallel programs. Even one line of the original serial C program for AP may result in dozens of lines of code in the correct CUDA program, while the data structure also has to be redesigned substantially (Shi, 2016).
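The CRS point is worth making concrete. A dense 220K × 220K matrix of doubles would occupy roughly 220,000^2 × 8 bytes, close to 390 GB, yet contiguity-based spatial weights are overwhelmingly zero. The sketch below is a schematic C-style illustration of a CRS weights structure and the local Moran's I (LISA) statistic computed over it; it is an expository sketch, not PySAL's actual internals.

/* Row-standardized spatial weights in compressed row storage (CRS): only the
   nonzero neighbor links are kept, so 220K polygons with a handful of
   neighbors each need megabytes rather than hundreds of gigabytes. */
typedef struct {
    int n;        /* number of features                     */
    int *rowPtr;  /* size n+1: start of each feature's row  */
    int *colIdx;  /* neighbor feature ids                   */
    double *w;    /* weights, row-standardized              */
} CrsWeights;

/* Local Moran's I for feature i (constant scaling factor omitted);
   z holds the standardized attribute values. */
double local_moran(const CrsWeights *W, const double *z, int i) {
    double lag = 0.0;
    for (int k = W->rowPtr[i]; k < W->rowPtr[i + 1]; ++k)
        lag += W->w[k] * z[W->colIdx[k]];   /* spatial lag over neighbors only */
    return z[i] * lag;
}

Because each census block group touches only a handful of neighbors, the inner loop visits a few entries rather than 220,000, which is consistent with the seconds-scale runtime reported above even for a sequential program.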

1.23.5

Conclusion

The "free lunch" through hardware improvements is over, because it is difficult to achieve significant performance improvements by further increasing the clock frequency of uniprocessors. Moreover, the "free lunch" for creative use of software is over as well. For example, fellow geographers who would like to apply AP to retrieve exemplars from large geospatial data will not find a ready-made parallelized AP program. In response to the changing computer infrastructure and computing technologies, concurrency and parallelism are the future direction of, and challenge for, geocomputation in enabling and supporting the research of data-driven geography. There is growing evidence that parallel and high-performance computing over emerging advanced computer architectures and systems has the potential to lead to transformative advances in scientific research. While some researchers in geography, geoinformatics, and urban and environmental sciences may be satisfied with using existing software in creative ways, the computation divide may increase dramatically if no action is taken promptly to catch up with the advances in computer hardware and in parallel and distributed computing. Particularly as high-resolution and accurate geospatial data become increasingly available, along with the expanding spatial-temporal extent of geoinformatics research in the era of global change, parallel computing solutions over the emerging heterogeneous computer architectures and infrastructure will be the inevitable way forward.


Advances in computer infrastructure and computing technologies are revolutionizing the practice of science and engineering research and education. In the past, researchers frequently dealt with a small piece of data and a certain algorithm or method to derive a conclusion for a small area. After the research was published, much of it was not sustained for long or transformed into real-world practice to handle big datasets. In fact, many such efforts cannot easily be adapted to the emerging cyberinfrastructure environment, since significant software redesign and reengineering are necessary to transform serial programs into parallel ones. Pursuing the capability of petascale or exascale computing, however, will enable future researchers to reshape the landscape and frontier of geoinformatics research and geocomputation for data-driven geography and geoinformatics.

Acknowledgments

This work is partially supported by the National Science Foundation (NSF) through the award NSF SMA-1416509. Some of the research results introduced in this paper were supported partially by the awards NSF CCF-1048162 and NSF OCI-1047916.

References

Armbrust, M., Fox, A., Griffith, R., Joseph, A.D., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I., Zaharia, M., 2010. Above the clouds: A Berkeley view of cloud computing. Communications of the ACM 53 (4), 50–58.
Atkins, D.E., Droegemeier, K.K., Feldman, S.I., Garcia-Molina, H., Klein, M.L., Messerschmitt, D.G., Messina, P., Ostriker, J.P., Wright, M.H., 2003. Revolutionizing science and engineering through cyberinfrastructure: Report of the National Science Foundation Blue-Ribbon Advisory Panel on Cyberinfrastructure [online]. Available from: http://www.communitytechnology.org/nsf_ci_report/.
Ball, G.H., Hall, D.J., 1965. ISODATA: A method of data analysis and pattern classification. Technical Report. Stanford Research Institute, Menlo Park, CA.
Bernabé, S., Plaza, A., Marpu, P.R., Benediktsson, J.A., 2012. A new parallel tool for classification of remotely sensed imagery. Computers & Geosciences 46, 208–218.
Blythe, D., 2008. Rise of the graphics processor. Proceedings of the IEEE 96 (5), 761–778.
Borkar, S., Chien, A., 2011. The future of microprocessors. Communications of the ACM 54 (5), 67–77. http://dx.doi.org/10.1145/1941487.1941507.
Boyer, V., El Baz, D., 2013. Recent advances on GPU computing in operations research. In: Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International, pp. 1778–1787.
Chao, F., Chongjun, Y., Zhuo, C., Xiaojing, Y., Hantao, G., 2011. Parallel algorithm for viewshed analysis on a modern GPU. International Journal of Digital Earth 4 (6), 471–486.
Cheng, T., 2013. Accelerating universal Kriging interpolation algorithm using CUDA-enabled GPU. Computers & Geosciences 54, 178–183.
De Ravé, E.G., Jiménez-Hornero, F.J., Ariza-Villaverde, A.B., Gómez-López, J.M., 2014. Using general-purpose computing on graphics processing units (GPGPU) to accelerate the ordinary kriging algorithm. Computers & Geosciences 64, 1–6.
Dhodhi, M.K., Saghri, J.A., Ahmad, I., Ul-Mustafa, R., 1999. D-ISODATA: A distributed algorithm for unsupervised classification of remotely sensed data on network of workstations. Journal of Parallel and Distributed Computing 59 (2), 280–301.
Drummond, W.J., French, S.P., 2008. The future of GIS in planning: Converging technologies and diverging interests. Journal of the American Planning Association 74 (2), 161–174.
Dueck, D., 2009. Affinity propagation: Clustering data by passing messages. Doctoral dissertation. University of Toronto.
El-Ghazawi, T., El-Araby, E., Huang, M., Gaj, K., Kindratenko, V., Buell, D., 2008. The promise of high-performance reconfigurable computing. IEEE Computer 41 (2), 69–76.
Feng, W., Gang, W., Deji, P., Yuan, L., Liuzhong, Y., Hongbo, W., 2015. A parallel algorithm for viewshed analysis in three-dimensional Digital Earth. Computers & Geosciences 75, 57–65.
Fort, M., Sellares, J.A., 2009. GPU-based computation of distance functions on road networks with applications. In: Proceedings of the 2009 ACM Symposium on Applied Computing. ACM, New York, NY, USA, pp. 1320–1324.
Foster, I., Kesselman, C. (Eds.), 1999. The grid: Blueprint for a new computing infrastructure. Morgan Kaufmann, San Francisco, USA.
Foster, I., Kesselman, C., Tuecke, S., 2010. The anatomy of the grid. International Journal of Supercomputer Applications 15 (3), 200–222.
Frey, B.J., Dueck, D., 2007. Clustering by passing messages between data points. Science 315, 972–976. http://dx.doi.org/10.1126/science.1136800.
Gao, Y., Yu, H., Liu, Y., Liu, Y., Liu, M., Zhao, Y., 2011. Optimization for viewshed analysis on GPU. In: Geoinformatics, 2011 19th International Conference on. IEEE, pp. 1–5.
Gardner, M., 1970. Mathematical games: The fantastic combinations of John Conway's new solitaire game of life. Scientific American 223, 120–123.
Geppert, L., 2002. The amazing vanishing transistor act. IEEE Spectrum 39 (10), 28–33.
Gieseke, F., Heinermann, J., Oancea, C., Igel, C., 2014. Buffer kd-trees: Processing massive nearest neighbor queries on GPUs. In: Proceedings of the 31st International Conference on Machine Learning, pp. 172–180.
Giovanni, N.A., Luigi, F.B., Linford, J., 2003. Grid technology for the storage and processing of remote sensing data: Description of an application. Proceedings of SPIE 4881, 677–685.
Glotzer, S.C., Panoff, B., Lathrop, S., 2008. Challenges and opportunities in preparing students for petascale computational science and engineering. IEEE Computing in Science and Engineering (September/October), 22–27.
Gonzalez, C., Sanchez, S., Paz, A., Resano, J., Mozos, D., Plaza, A., 2012. Use of FPGA or GPU-based architectures for remotely sensed hyperspectral image processing. Integration, the VLSI Journal 46 (2), 89–103.
Guan, Q., Zeng, W., Gong, J., Yun, S., 2014. pRPL 2.0: Improving the parallel raster processing library. Transactions in GIS 18 (S1), 25–52.
Guan, Q., Shi, X., Huang, M., Lai, C., 2015. A hybrid parallel cellular automata model for urban growth simulation over GPU/CPU heterogeneous architectures. International Journal of Geographical Information Science. http://dx.doi.org/10.1080/13658816.2015.1039538.
Henneböhl, K., Appel, M., Pebesma, E., 2011. Spatial interpolation in massively parallel computing environments. In: Proceedings of the 14th AGILE International Conference on Geographic Information Science (AGILE 2011).
Hossam, M.A., Ebied, H.M., Abdel-Aziz, M.H., Tolba, M.F., 2014. Accelerated hyperspectral image recursive hierarchical segmentation using GPUs, multicore CPUs, and hybrid CPU/GPU cluster. Journal of Real-Time Image Processing. http://dx.doi.org/10.1007/s11554-014-0464-4.
Huang, Y., Chen, Z., Wu, B., Chen, L., Mao, W., Zhao, F., Wu, J., Wu, J., Yu, B., 2015. Estimating roof solar energy potential in the downtown area using a GPU-accelerated solar radiation model and airborne LiDAR data. Remote Sensing 7 (12), 17212–17233. http://dx.doi.org/10.3390/rs71215877.
Hwu, W.W., Keutzer, K., Mattson, T., 2008. The concurrency challenge. IEEE Design and Test of Computers (July/August), 312–320.
Jacobsen, K., 2011. Characteristics of very high resolution optical satellites for topographic mapping. In: IntArchPhRS (vol. XXXVIII-4/W19), Hannover, 6S. CD.
Jeffers, J., Reinders, J., 2013. Intel Xeon Phi coprocessor high-performance programming. Morgan Kaufmann/Elsevier, Waltham, MA, USA.


Kirk, D.B., Hwu, W.W., 2013. Programming massively parallel processors: A hands-on approach, 2nd edn. Morgan Kaufmann, Waltham, MA, USA.
Klosterman, R.E., 2008. Comment on Drummond and French: Another view of the future of GIS. Journal of the American Planning Association 74 (2), 174–176.
Lai, C., Hao, Z., Huang, M., Shi, X., You, H., 2014. Comparison of parallel programming models on Intel MIC computer cluster. In: Proceedings of the Fourth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES) as Part of IPDPS, Phoenix, AZ, May 19, pp. 925–932.
Lathrop, S., Murphy, T., 2008. High-performance computing education. IEEE Computing in Science and Engineering (September/October), 9–11.
Li, J., Agarwal, D., Humphrey, M., van Ingen, C., Jackson, K., Ryu, Y., 2010. eScience in the cloud: A MODIS satellite data reprojection and reduction pipeline in the Windows Azure platform. In: The 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2010), April.
Li, H., Yang, Z., He, H., 2014. An improved image segmentation algorithm based on GPU parallel computing. Journal of Software 9 (8), 1985–1990.
Liao, W., Zhang, Z., Yuan, Z., Fu, W., Wu, X., 2013. Parallel continuous k-nearest neighbor computing in location based spatial networks on GPUs. In: Computational and Information Sciences (ICCIS), 2013 Fifth International Conference on. IEEE, pp. 271–274.
Luo, L., Wong, M., Leong, L., 2012. Parallel implementation of R-trees on the GPU. In: ASP-DAC 2012, January 30–February 2, pp. 353–358.
McKenney, M., De Luna, G., Hill, S., Lowell, L., 2011. Geospatial overlay computation on the GPU. In: Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, New York, pp. 473–476.
Men, L., Huang, M., Gauch, J., 2012. Accelerating mean shift segmentation algorithm on hybrid CPU/GPU platforms. In: Proceedings of the 2012 International Workshop on Modern Accelerator Technologies for GIScience (MAT4GIScience 2012), held as part of GIScience 2012, Columbus, OH, September 18.
Miller, H.J., Goodchild, M.F., 2015. Data-driven geography. GeoJournal 80 (4), 449–461.
Moore, G.E., 1965. Cramming more components onto integrated circuits. Electronics 38 (8), 114–117.
Molero, J., Garzon, E., García, I., Quintana-Orti, E., Plaza, A., 2014. Efficient implementation of hyperspectral anomaly detection techniques on GPUs and multicore processors. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (6), 2256–2266.
Nascimento, J.M., Bioucas-Dias, J.M., Rodriguez Alves, J.M., Silva, V., Plaza, A., 2014. Parallel hyperspectral unmixing on GPUs. IEEE Geoscience and Remote Sensing Letters 11 (3), 666–670.
Nickolls, J., Dally, W.J., 2010. The GPU computing era. IEEE Micro 30 (2), 56–69.
Nickolls, J., Kirk, D., 2014. Appendix C: Graphics and computing GPUs. In: Patterson, D.A., Hennessy, J.L. (Eds.), Computer organization and design: The hardware/software interface. Morgan Kaufmann/Elsevier.
NSF, 2007. Cyberinfrastructure vision for 21st century discovery. Report of NSF Council. Available at: http://www.nsf.gov/od/oci/ci_v5.pdf.
NVIDIA Corporation, 2009. NVIDIA's next generation CUDA compute architecture: Fermi. http://www.nvidia.com/content/pdf/fermi_white_papers/nvidia_fermi_compute_architecture_whitepaper.pdf.
NVIDIA Corporation, 2012. NVIDIA's next generation CUDA compute architecture: Kepler GK110.
NVIDIA Corporation, 2014. CUDA C programming guide (v6.5).
Oh, B.W., 2012. A parallel access method for spatial data using GPU. International Journal on Computer Science and Engineering 4 (03), 492–500.
Paz, A., Plaza, A., 2010a. Cluster versus GPU implementation of an orthogonal target detection algorithm for remotely sensed hyperspectral images. In: Cluster Computing (CLUSTER), 2010 IEEE International Conference on. IEEE, pp. 227–234.
Paz, A., Plaza, A., 2010b. GPU implementation of target and anomaly detection algorithms for remotely sensed hyperspectral image analysis. In: Proc. SPIE 7810, Satellite Data Compression, Communications, and Processing VI. http://dx.doi.org/10.1117/12.860213. http://proceedings.spiedigitallibrary.org/proceeding.aspx?articleid=723130.
Pena, G.C., Andrade, M.V., Magalhaes, S.V., Franklin, W.R., Ferreira, C.R., 2014. An improved parallel algorithm using GPU for siting observers on terrain. In: 16th International Conference on Enterprise Information Systems (ICEIS-2014), Lisbon, Portugal, pp. 367–375.
Pennington, D.D., Michener, W.K., Katz, S., Downey, L.L., Schildhauer, M., 2008. Transforming scientists through technical education: A view from the trenches. IEEE Computing in Science and Engineering (September/October), 28–33.
Petcu, D., Zaharie, D., Gorgan, D., Pop, F., Tudor, D., 2007. MedioGrid: A grid-based platform for satellite image processing. In: 4th IEEE Workshop on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS): Technology and Applications, pp. 137–142.
Plaza, A., Chang, C., 2008. Clusters versus FPGA for parallel processing of hyperspectral imagery. International Journal of High Performance Computing Applications 22 (4), 366–385.
Qin, C.-Z., Zhan, L.-J., Zhu, A.-X., Zhou, C.-H., 2014. A strategy for raster-based geocomputation under different parallel computing platforms. International Journal of Geographical Information Science 28 (11), 2127–2144.
Raina, R., Madhavan, A., Ng, A.Y., 2009. Large-scale deep unsupervised learning using graphics processors. In: Proceedings of the 26th Annual International Conference on Machine Learning. ACM, New York, NY, USA, pp. 873–880.
Ratcliffe, J.H., Rengert, G.F., 2008. Near-repeat patterns in Philadelphia shooting events. Security Journal 21 (1), 58–76.
Reinders, J., 2012. Multicore vs. manycore. http://goparallel.sourceforge.net/ask-james-reinders-multicore-vs-manycore/.
Sánchez, S., Plaza, A., 2010. GPU implementation of the pixel purity index algorithm for hyperspectral image analysis. In: Cluster Computing Workshops and Posters (CLUSTER WORKSHOPS), 2010 IEEE International Conference on. IEEE, pp. 1–7.
Sánchez, S., Plaza, A., 2014. Fast determination of the number of endmembers for real-time hyperspectral unmixing on GPUs. Journal of Real-Time Image Processing 9 (3), 397–405.
Sánchez, S., Martín, G., Plaza, A., Chang, C.-I., 2010. GPU implementation of fully constrained linear spectral unmixing for remotely sensed hyperspectral data exploitation. In: Proc. SPIE 7810, Satellite Data Compression, Communications, and Processing VI, pp. 78100G-1–78100G-11. http://proceedings.spiedigitallibrary.org/proceeding.aspx?articleid=723106.
Sevilla, J., Bernabe, S., Plaza, A., 2014. Unmixing-based content retrieval system for remotely sensed hyperspectral imagery on GPUs. The Journal of Supercomputing 70 (2), 588–599.
Shi, X., 2012. System and methods for parallelizing polygon overlay computation in multiprocessing environment. Pending patent submitted to USPTO with Serial No. 13/523,196. Sponsored by NSF CCF-1048162.
Shi, X., 2016. Parallelizing affinity propagation using GPUs for spatial cluster analysis over big geospatial data. In: Griffith, D.A., et al. (Eds.), Advances in geocomputation: Proceedings of Geocomputation 2015. Springer. http://www.springer.com/us/book/9783319227856.
Shi, X., Huang, M., 2016a. GPGPU in GIS. In: Shekhar, S., Xiong, H., Zhou, X. (Eds.), Encyclopedia of GIS. Springer. http://link.springer.com/referenceworkentry/10.1007/978-3-319-23519-6_1649-1.
Shi, X., Huang, M., 2016b. MIC in GIS. In: Shekhar, S., Xiong, H., Zhou, X. (Eds.), Encyclopedia of GIS. Springer. http://link.springer.com/referenceworkentry/10.1007/978-3-319-23519-6_1607-1.
Shi, X., Ye, F., 2013. Kriging interpolation over heterogeneous computer architectures and systems. GIScience & Remote Sensing 50 (2), 196–211.
Shi, X., Kindratenko, V., Yang, C., 2013. Modern accelerator technologies for geographic information science. In: Shi, X., Kindratenko, V., Yang, C. (Eds.), Modern accelerator technologies for geographic information science. Springer, New York, pp. 3–6.
Shi, X., Lai, C., Huang, M., You, H., 2014a. Geocomputation over the emerging heterogeneous computing infrastructure. Transactions in GIS 18 (S1), 3–24.
Shi, X., Huang, M., You, H., Lai, C., Chen, Z., 2014b. Unsupervised image classification over supercomputers Kraken, Keeneland and Beacon. GIScience & Remote Sensing 51 (3), 321–338.
Skala, V., 2012. Interpolation and intersection algorithms and GPU. In: ICONS 2012, Saint Gilles, Reunion Island, IARIA, pp. 193–198.
Steinbach, M., Hemmerling, R., 2012. Accelerating batch processing of spatial raster analysis using GPU. Computers & Geosciences 45, 212–220.


Stojanovic, N., Stojanovic, D., 2013. Performance improvement of viewshed analysis using GPU. In: Telecommunication in Modern Satellite, Cable and Broadcasting Services (TELSIKS), 2013 11th International Conference on, vol. 02, pp. 397–400. http://dx.doi.org/10.1109/TELSKS.2013.6704407.
Strnad, D., 2011. Parallel terrain visibility calculation on the graphics processing unit. Concurrency and Computation: Practice and Experience 23 (18), 2452–2462.
Sutter, H., 2005. The free lunch is over: A fundamental turn toward concurrency in software. http://www.gotw.ca/publications/concurrency-ddj.htm.
Sutter, H., Larus, J., 2005. Software and the concurrency revolution. ACM Queue 3 (7), 54–62.
Tang, W., 2013. Accelerating agent-based modeling using graphics processing units. In: Shi, X., Kindratenko, V., Yang, C. (Eds.), Modern accelerator technologies for geographic information science. Springer, New York, pp. 113–129.
Tarditi, D., Puri, S., Oglesby, J., 2006. Accelerator: Using data parallelism to program GPUs for general-purpose uses. In: Proceedings of the 2006 ASPLOS Conference 34 (5), December 2006, pp. 325–335. ACM, New York, NY, USA. http://dl.acm.org/citation.cfm?id=1168898.
Valencia, D., Lastovetsky, A., O'Flynn, M., Plaza, A., Plaza, J., 2008. Parallel processing of remotely sensed hyperspectral images on heterogeneous networks of workstations using HeteroMPI. International Journal of High Performance Computing Applications 22 (4), 386–407.
Wang, J., Sun, X., Xue, Y., Hu, Y., Luo, Y., Wang, Y., Zhong, S., Zhang, A., Tang, J., Cai, G., 2004. Preliminary study on unsupervised classification of remotely sensed images on the Grid. LNCS 3039, 981–988.
Wells, W., Wu, L., Ye, X., 2011. Patterns of near-repeat gun assaults in Houston. Journal of Research in Crime and Delinquency 49, 186–212.
Wu, X., Huang, B., Plaza, A., Li, Y., Wu, C., 2014. Real-time implementation of the pixel purity index algorithm for endmember identification on GPUs. IEEE Geoscience and Remote Sensing Letters 11 (5), 955–959.
Xia, Y., Li, Y., Shi, X., 2010. Parallel viewshed analysis on GPU using CUDA. In: Computational Science and Optimization (CSO), 2010 Third International Joint Conference on, vol. 1. IEEE, pp. 373–374.
Yang, X.J., Chang, Z.M., Zhou, H., Qu, X., Li, C.J., 2004. Services for parallel remote-sensing image processing based on computational grid. LNCS 3252, 689–696.
Yang, C., Goodchild, M., Huang, Q., Nebert, D., Raskin, R., Xu, Y., Bambacus, M., Fay, D., 2011. Spatial cloud computing: How can the geospatial sciences use and help shape cloud computing? International Journal of Digital Earth 4 (4), 305–329.
Yang, S., Dong, J., Yuan, B., 2014. An efficient parallel ISODATA algorithm based on Kepler GPUs. In: International Joint Conference on Neural Networks (IJCNN), Beijing, 2014, pp. 2444–2449. IEEE. http://ieeexplore.ieee.org/abstract/document/6889478/.
Ye, F., Shi, X., 2013. Parallelizing ISODATA algorithm for unsupervised image classification on GPU. In: Shi, X., Kindratenko, V., Yang, C. (Eds.), Modern accelerator technologies for geographic information science. Springer, New York, pp. 145–156.
Ye, F., Shi, X., Wang, S., Liu, Y., Han, S.Y., 2011. Spherical interpolation over graphic processing units. In: Proceedings of the ACM SIGSPATIAL Second International Workshop on High Performance and Distributed Geographic Information Systems. ACM, New York, NY, USA, pp. 38–41.
You, S., Zhang, J., 2012. Constructing natural neighbor interpolation based grid DEM using CUDA. In: Proceedings of the 3rd International Conference on Computing for Geospatial Research and Applications, p. 28. ACM, New York, NY, USA.
You, S., Zhang, J., Gruenwald, L., 2013. Parallel spatial query processing on GPUs using R-trees. In: ACM BigSpatial, pp. 23–31.
Yu, B., Kim, H., Choi, W., Kwon, D., 2011. Parallel range query processing on R-tree with graphics processing unit. In: DASC 2011, pp. 1235–1242.
Zhang, J., 2011. Speeding up large-scale geospatial polygon rasterization on GPGPUs. In: Proceedings of the ACM SIGSPATIAL Second International Workshop on High Performance and Distributed Geographic Information Systems. ACM, New York, NY, USA, pp. 10–17.
Zhang, J., Wang, D., 2014. High-performance zonal histogramming on large-scale geospatial rasters using GPUs and GPU-accelerated clusters. In: Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International, Phoenix, AZ, pp. 993–1000.
Zhang, J., You, S., 2012a. CudaGIS: Report on the design and realization of a massive data parallel GIS on GPUs. In: Proceedings of the Third ACM SIGSPATIAL International Workshop on GeoStreaming. ACM, New York, NY, USA, pp. 101–108.
Zhang, J., You, S., 2012b. Speeding up large-scale point-in-polygon test based spatial join on GPUs. In: Proceedings of the 1st ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data. ACM, New York, NY, USA, pp. 23–32.
Zhang, J., You, S., 2014. Efficient and scalable parallel zonal statistics on large-scale species occurrence data on GPUs. Technical report. http://www-cs.ccny.cuny.edu/jzhang/papers/szs_gbif_tr.pdf.
Zhang, X., Chen, S., Fan, J., Wei, X., 2009. A grid environment based satellite images processing. In: 2009 1st International Conference on Information Science and Engineering (ICISE), pp. 9–11. IEEE, Danvers, MA, USA.
Zhang, J., You, S., Gruenwald, L., 2010. Indexing large-scale raster geospatial data using massively parallel GPGPU computing. In: Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, New York, NY, USA, pp. 450–453.
Zhang, J., You, S., Gruenwald, L., 2011. Parallel quadtree coding of large-scale raster geospatial data on GPGPUs. In: Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, New York, NY, USA, pp. 457–460.
Zhang, J., You, S., Gruenwald, L., 2014. Data parallel quadtree indexing and spatial query processing of complex polygon data on GPUs. In: ADMS@VLDB 2014, pp. 13–24.

1.24

Augmented Reality and GIS

Nick Hedley, Simon Fraser University, Burnaby, BC, Canada © 2018 Elsevier Inc. All rights reserved.

1.24.1 From Virtual Worlds to Augmented Reality
1.24.1.1 Setting the Scene With Virtual Worlds
1.24.2 Introducing Augmented Reality and Its Relationship to VR
1.24.2.1 Defining AR: Head's-Up-Display Versus Augmented World
1.24.2.2 Types of AR Interface
1.24.3 Technical Systems: How Is AR Achieved
1.24.4 (Geo) Spatial Uses of AR
1.24.5 Key Challenges for AR
1.24.6 Opportunities and Vision for AR in GIS and GIScience
References
Further Reading
Relevant Websites

1.24.1


From Virtual Worlds to Augmented Reality

This article discusses augmented reality (AR), a form of user interface (UI) that mixes real and virtual spaces. Using visual and spatial tracking, AR interfaces allow us to register virtual objects in real space. They therefore have considerable potential to connect GIS data and analyses, previously accessible only in the virtual spaces of analysis, to real spaces. This capability suggests fundamentally new ways to use spatial data, perform GIScience work, and view the world around us.

1.24.1.1

Setting the Scene With Virtual Worlds

In immersive virtual environments, users are enveloped by three-dimensional (3D) virtual worlds using head-mounted displays (HMDs). In these environments, users' everyday realities are replaced by 3D virtual scenes that can be navigated and manipulated using an array of transducers. Such spaces have been developed, used, and studied in many application contexts over several decades: entertainment, medicine, spaceflight training, museum education, molecular problem-solving, geological visualization, big data, ocean science, and spatial analysis. As Winn (1993) points out in his excellent conceptual framework for learning in virtual environments, virtual environments (VEs) can provide powerful knowledge-building that is not possible from first-person real-world experiences. The power of virtual environments as experiential knowledge-building tools includes the ability to go anywhere you have data for and explore it at any relative scale you choose. You can also view phenomena that would otherwise be impossible to view or perceive (deep space, abstract data space), or impossible to survive while viewing (inside a volcano or nuclear reactor, in the ocean deeps, or at the arrival of a tsunami or earthquake). VEs also enable unprecedented new ways to move, navigate, and explore virtual spaces (fly; start/stop or control the speed of time passing in the world around you). They provide opportunities to deliver digital worlds that allow us to perceive and experience phenomena that are not directly perceivable in the real world, by giving them "sensable*" (*not a typo!) form (visual, audible, and tangible). This technique of creating perceptible representations of objects and events that would otherwise have no physical form is known as "reification" (Winn, 1993). "Making the invisible perceivable" resonates very well with the well-established objectives of geovisualization: revealing the unknown and visualizing existing data in new ways. Geographers, particularly in the fields of analogue and digital mapping, have long used reification to give invisible abstract variables form and geometry in maps. GIScience scholars have used geovisualization for reification of subtle geographic features, abstract variables, and complex spatial relationships. In virtual environments, reifications of data and information not normally available to our senses can be made perceivable via an array of transducers serving more than just the visual sensory pathways. These include HMDs (sometimes called "VR goggles"), headphones and spatialized sound fields, tactile and kinesthetic haptic feedback devices (such as force-feedback vibrations in HTC Vive controllers, and force feedback from electromechanical armatures used in entertainment and tele-operation), and platforms that enable users to walk in virtual space with steps in real space, such as the Virtusphere (see link to 2013 video in references), and to receive sensory and skeletomuscular feedback of ground slope movement (VR treadmills). In the late 20th century, several geovisualization researchers were exploring the potential of virtual environments to extend the ways in which we deliver geovisual representation and analysis. A range of examples are summarized in Dykes et al. (2005), MacEachren et al. (1999), Slocum et al. (2001) (particularly the articles on virtual environments and newer interfaces), and Hedley (2016).


Fig. 1 Spatial Interface Research Lab PhD Candidate Sonja Aagesen testing immersive VR system to explore remote First Nations sea cliff pictographs, using a 3D scene generated from image processing.

It is important to note that many of the examples discussed in the citations above are from a period when VR technology was less ubiquitous, more expensive, and of lower performance than what is available now. HMDs had lower display resolutions, narrower fields of view (FOV), higher tracking and rendering latencies, and greater bulk and weight than their 21st-century successors. A new wave of more affordable, high-performance VR systems that arrived in the past 2–3 years (Oculus Rift, HTC Vive, PlayStation VR, and Gear VR) has been integrated into geographic research, pedagogy, and outreach by GIScience researchers. At the Spatial Interface Research Lab in the Department of Geography at Simon Fraser University, Canada, we have used several generations of the Oculus Rift and HTC Vive HMDs for within-lab development of interactive GIS and LiDAR data manipulation, and mobile VR systems for public engagement and science communication. Over the same period, students in my 3D geovisualization and 3D interface courses have had hands-on experiences, training, and project assignments using a variety of VR systems (Figs. 1 and 2).

Fig. 2 Students in 3D geovisualization and 3D spatial interface courses at Simon Fraser University gain critical hands-on experience, training, and project assignments using a variety of virtual reality, augmented reality and other geovisual interface technologies. Images provided by the author.


The purpose of these activities is to cultivate new cohorts of GIS and GIScience students who are experienced with emerging 3D interface and visualization technology. There is a massive opportunity (and need) for new and future GIScientists to become knowledgeable, experienced, and skilled in these emerging technologies. Flying around inside Google Earth VR with a group of senior undergraduate students and building new analytical virtual environments together, we discuss the future of GIScience interfaces and practice. We are going to need to spend a lot of time inside these spaces (using them, testing them, building them) in order to bridge the gap from where GIS interfaces are currently to where they could or should be as the 21st century unfolds. We should expect to see a new wave of applications, empirical usability studies, and theory emerge as GIScience uptake and knowledge capital increase. Immersive virtual environments, then, are unique spaces that can deliver powerful visual, spatial, and sensory experiences. They can be used for scientific visualization, geovisualization, interactive analysis, and communication of complex phenomena and abstract data spaces (such as visual analytics in big data applications, or geophysical survey interpretation and analysis). They are well suited to complex phenomena whose spatial location, spatial or temporal scale, or size relative to a human individual makes them almost impossible to perceive or experience directly in the real world. See examples of successful VEs developed to make Newtonian physics demonstrations (Salzman et al., 1996) and Newtonian mechanics (Dede et al., 1994) visible and interactive in 3D space, and VEs used to fuse research vessel data gathered in remote locations with computational ocean circulation models into virtual worlds in which students can go on virtual field trips to sample and interpret geophysical processes and relationships (Winn et al., 2001). Virtual worlds built with the latest crop of HMDs are already capturing the imagination of citizens and scientists alike. Jake Rowell's award-winning TheBlu interactive undersea virtual experiences on the WEVR VR service are exquisitely crafted, open-ended scenarios allowing force-feedback interaction with sea life, which hint at the possibilities for visual fidelity and physical interaction in future VEs (see 'Relevant Websites' section; WEVR/TheBlu, 2016). Virtual environments can (and do) deliver engaging, sometimes perspective-changing, 3D user experiences. They are well suited to applications where full immersion and completely synthetic digital environments are preferable or necessary: perhaps in order to have total control over all objects and events in the world; or to focus user attention on only the content, data, or simulations within that world; or to define a "metaverse" (to use Stephenson's (1992) construct) where real-world impediments of space, time, or navigation have been removed in support of scientific or pedagogical interpretation. However, the very fact that virtual environments supplant our everyday views of the world with wholly synthetic ones disconnects us from the realities we are attempting to better understand, interpret, analyze, and communicate to others. Users can experience temporal and spatial disorientation when entering (and especially exiting) virtual spaces.
These and other aspects of fully immersive, enveloping virtual spaces may impede spatial knowledge transfer from virtual to real space, or introduce unnecessary cognitive burdens in relating synthetic stimuli to real phenomena, thus potentially undermining the primary objectives of communication, education, and visual analytical support for science. These factors suggest that while immersive VEs may be extremely useful for focused visualization, analysis, and exploration, their inherent specification keeps them largely disconnected from reality, diminishing their utility in real geographic environments. However, another whole class of interface has steadily emerged over the past 20 or so years, one with the ability to deliver the visual and spatial content of virtual environments in real geographic spaces. After years of research and development, AR has truly arrived. And the opportunities for GIScience and its practitioners are immense.

1.24.2

Introducing Augmented Reality and Its Relationship to VR

AR presents an alternative to completely immersing yourself within an enveloping digital space. Instead, it delivers the ability to fuse virtual digital objects with real space, using tracking, registration, and rendering architectures combined with display transducers. The remainder of this article introduces AR, its various types, examples of its geospatial use to date, and the significance and opportunities for our field. In contrast to immersive virtual environments, where users are surrounded by synthetic digital spaces, AR interfaces combine virtual content with real spaces, using tracking, registration, rendering, and digital display devices to deliver user experiences in which reality is "augmented" with digital content. From a definitional standpoint, AR is a subtype of "mixed reality" (MR). Mixed reality is a term introduced by Milgram and Kishino (1994) in their foundational work to conceptualize the middle ground that lies between pure reality and pure "virtuality", where real spaces converge, overlap, and combine with virtual ones. Interfaces differ by objective, content, interaction design, transducers, and audience. Each interface implicitly has its own specific emphasis on the real or the virtual, resulting in a unique proportion of each, and the proportion of real versus virtual content allows us to place any interface on this spectrum of possibility. Within the umbrella of mixed reality, two main subtypes of interface can be identified: "augmented virtuality" interfaces are predominantly virtual spaces that have been augmented with real content, such as live video feeds from the real world; "augmented reality" interfaces are predominantly real spaces that have been enhanced by adding or integrating virtual content (such as 3D objects in views of the real world) (Fig. 3). In addition to defining what mixed reality is, Milgram and Kishino's structural framework also allows us to consider where conventional interface technologies (such as paper maps and desktop GIS) might be located on this continuum. The reality–virtuality continuum also challenges us to develop a more sophisticated conceptualization of what virtual means and what true virtuality is (i.e., supplanting a human's entire reality with another that is utterly convincing to all senses, perhaps only achievable through hallucinogenic drugs or science fiction constructs such as The Matrix) (Fig. 4).


Fig. 3 3D reconstruction of Petra UNESCO World Heritage Site, using table-mounted tangible AR interface (L); SIRL RAs Sonja Aagesen and William Morgenstern using a tangible AR globe prototype in the lab (R).

Fig. 4 A simplified version of Milgram and Kishino’s Reality-Virtuality Continuum. Adapted from Milgram, P. and Kishino, F. (1994). A Taxonomy of Mixed Reality Visual Displays. IEICE Transactions on Information and Systems, 77(12), 1321–1329.

1.24.2.1

Defining AR: Head’s-Up-Display Versus Augmented World

Now that we understand the differences between virtual reality, reality, mixed reality, augmented virtuality, and AR, we can focus on a more detailed specification of AR. Azuma (1997) provided one of the most robust definitions of AR interfaces: those that superimpose 3D virtual information on the real world (i.e., they combine virtual and physical objects in the same interaction space), are interactive in real time, and are spatial, meaning the virtual objects are registered and interactive in 3D space. As AR techniques broadened in the early 21st century, AR researchers' perspectives broadened as well: augmentation does not always require 3D virtual objects (Azuma's specification), only that virtual objects be registered in 3D physical space in relation to other physical objects. This expansion not only accommodates a multiplicity of mixed reality and AR interface types but also helps reinforce the difference between UIs that augment the 3D space of reality and "annotated vision" interfaces that enhance views of real spaces through real-time annotation on the display surface, such as heads-up-display (HUD)-type UIs.

1.24.2.2

Types of AR Interface

The two most common subtypes of AR are tangible AR (TAR) and mobile AR (MAR). Tangible AR interfaces are those that register virtual objects to surfaces or objects that can be physically manipulated, thus enabling a form of direct manipulation of AR content. The very act of touching a physical card provides tactile feedback, skeletomuscular feedback, and visual feedback of the resulting movement of the augmented scene. Examples include attaching virtual 3D objects to physical cards (Figs. 2 and 3), to physical pages of books (such as the well-known Magic Book AR project), and to spatial AR turntable displays such as those designed by the author for an international science museum exhibit (see Hedley, 2017). Tabletop TAR is simple yet dramatic. As the user moves the card to which the virtual augment is registered, the virtual object stays anchored to the real object and moves as if attached to it. This effect is made more powerful by direct physical touch and control. What were previously mouse- or button-actuated and metaphor-mediated activities, such as zoom, pan, and rotate, are now achieved by moving the augmented objects in one's hands, no differently than if one were inspecting a coffee cup (to inspect, rotate, hold closer or further away, zoom, or pan). Hedley (2017) adapted the use of individual cards to decks of cards that could be placed on a tabletop turntable to create a "live" 3D landscape with inter-object dynamics (like a "live" AR "SimCity") for the Star Wars: Where Science Meets Imagination museum exhibit. See below for a geographic example of tabletop TAR used to view and manipulate a reconstruction of Parkes and Thrift's (1980) space–time diagram (Fig. 5).

Fig. 5 Using tabletop TAR to view, manipulate and inspect the geometry of a reconstruction of Parkes and Thrift's (1980) space-time diagram. In this work by the author, a digital version of Parkes and Thrift's figure was reconstructed in 3D to assess the potential of viewing what is a very three-dimensional construct using a three-dimensional interface. Images provided by the author.

Room-anchored AR is similar to tangible tabletop AR, except that the virtual content is registered to floors, walls, or ceilings. Some researchers have developed geographic AR systems anchored to classroom spaces (such as Shelton and Hedley's (2002, 2004) AR Earth-Sun-Moon educational tool and Woolard et al.'s (2003) solar system education environment). See our geospatial AR holodeck prototype later in this article. A number of movie production companies have, for some time now, been using AR technology that allows directors and actors, while in green-screen studios, to see the CGI environment that will surround them in the final rendering of the movie. Real cameras and displays can be moved freely around these spaces to allow viewing of composite live-action and digital environments from all angles. This has been taken even further by Epic Games and Chevrolet, where an adjustable physical car rig can be reconfigured to match the wheelbase of any car and digitally "reskinned" using AR tracking, registration, and rendering combined with live-action video (see UploadVR, 2017). Other, more advanced AR systems include transitional spaces, such as the Magic Book, where users can step in and out of real worlds enhanced with AR tabletops and books, and enveloping immersive virtual worlds where participants can leave their real surroundings behind and join others to collaborate in shared virtual spaces (see Billinghurst et al., 2001). Mobile augmented reality (mobile AR or MAR) systems broadly denote AR applications that operate on mobile devices, implying that the user can move while using them. They combine (in various combinations) GPS, RF tracking, infrared scanning, compass, inclinometer, accelerometers, gyroscopes, and other sensors to establish position in geographic space and calculate the pose of the viewing device, detecting features in real space that are used to render virtual objects registered to real space. MAR interfaces run on a variety of hardware platforms, including backpack computers, smartphones and tablets, and wearable computers such as HoloLens. Displays can be head-mounted or handheld devices. Some MAR systems allow visual and spatial inspection of virtual objects attached to the real world, simply by moving around them. Other systems use multitouch inputs (on tablets or smartphones) or gestural interactions (such as Leap or HoloLens) to scale, manipulate, or modify virtual content. MAR interfaces, then, are highly spatial technologies. This inherent spatiality is well illustrated by Pokémon Go, one of the first mainstream MAR games (released by Niantic in 2016), in which users seek virtual characters, objects, and rewards in physical space using a mobile AR app. As Tsou (2016) points out, Pokémon Go integrates several GIS technologies: Google Maps, points of interest (POIs), GPS tracking, buffering (accessing PokéStops), geo-fencing, volunteered geographic information (VGI) (to create PokéStops), land use classification (for locating different types of Pokémon in different regions, such as waterfronts, beaches, forests, or parks), and network analysis (designing the best route to visit the most PokéStops within a defined amount of time). Gaming aside, Pokémon Go advanced society's awareness of MAR considerably. It demonstrated how we can link virtual digital spaces with the real world using a range of sensor data, spatial algorithms, computer vision, VGI, and LBS inputs. An additional significant outcome was how mixed reality Pokémon Go experiences influenced human behavior. There were many reports of gamers following AR objectives while ignoring real spaces and real-world hazards.
Such reports not only led to discussion of safety management for people operating in mixed reality but also raised philosophical questions of where users are psychologically present. In the following sections, we introduce and explain the mechanics and considerations behind these mainstream developments, focusing on spatial applications of AR, recent progress in geographic AR, and its implications for GIScience.

1.24.3

Technical Systems: How Is AR Achieved

Mixed reality environments combine real and virtual objects and spaces. Real and virtual objects therefore need to be accurately and simultaneously tracked in real time, in real and corresponding virtual spaces, in order to collocate them in mixed reality. Most commonly, this “mixing” or “augmentation” is achieved by using tracking, registration, and rendering to combine virtual objects with views of real environments, using a variety of digital display devices. The key elements of AR architecture are:

1. An ability to track objects in virtual and real space.
2. An ability to compute and render the positions of virtual objects registered to real-world space, at speeds that match human perceptual experiences of real space.
3. An ability to view the combination of real and virtual objects, using a display.
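To see how these three elements fit together computationally, consider a minimal, vendor-neutral sketch: a tracker supplies a camera pose each frame, and virtual anchors are registered by projecting them through the camera model before compositing. The intrinsics, pose, and anchor values below are illustrative assumptions, not values from any particular AR platform.

```python
import numpy as np

def project(K, T_cw, X_world):
    """Project a 3D world-space point into pixel coordinates, given camera
    intrinsics K and the 4x4 world-to-camera pose T_cw that tracking supplies.
    This is the registration step: deciding where an augment is drawn."""
    Xc = (T_cw @ np.append(X_world, 1.0))[:3]  # world frame -> camera frame
    uvw = K @ Xc                               # camera frame -> image plane
    return uvw[:2] / uvw[2]

# Illustrative intrinsics (800 px focal length, 640x480 image) and pose:
# identity rotation, with the world origin 2 m in front of the camera.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
T_cw = np.eye(4)
T_cw[2, 3] = 2.0

anchor = np.array([0.0, 0.0, 0.0])  # a virtual object anchored at the origin
print(project(K, T_cw, anchor))     # -> [320. 240.], the image center
```

A full AR pipeline repeats this projection for every anchor on every captured frame, then composites the rendered augments over the live camera image for display.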


Fig. 6 An example of Tangible AR. A virtual object created from structure-from-motion imagery of a Portuguese façade, registered to a physical card with a fiducial marker. Since the card can be physically manipulated, this can be considered TAR. Image provided by Ian Lochhead.

Tracking. In a similar vein to the early days of GIS, early mainstream AR research and development focused on getting hardware systems working robustly enough to start thinking about realistic applications. Tracking and registration performance in unprepared outdoor environments was a substantial challenge in early work, and much of the early work in AR concentrated on improving AR registration and tracking (Feiner et al., 1993). In early AR work, tracking using fiducial (abstract black-and-white glyph-based) markers was more robust in prepared indoor environments than in unprepared outdoor environments. Factors such as distance, light wash-out, and signal interference made tracking and registration challenging (Azuma et al., 2001), and still do for systems using visual tracking of abstract markers. Advances in computer vision in the past decade or so have seen the emergence of natural feature tracking (NFT) as a more flexible way to use computer vision algorithms to register virtual objects to real scenes. Spatial sensor data have also become far more integrated into AR applications. Over the past decade, tracking has become more reliable, employing NFT to enable more flexibility, with the ability to train an application to recognize and register 3D virtual content to everyday spaces on the fly (the popular AR platform Augment does this very well). Greater integration of technologies including GPS, accelerometers, gyroscopes, RFID, structured light sensing, and SLAM (simultaneous localization and mapping) algorithms has led to a variety of increasingly robust tracking and AR solutions in a greater range of environments (Fig. 6).

Displays. Displays for AR come in many forms. Augmented overlays can be projected onto the world, using miniaturized wearable projectors and virtual tables, a set of techniques sometimes called “spatial augmented reality.” Augmented displays can be HUDs, where digital information is projected onto a surface so that it can be viewed as part of a larger everyday task. Automotive HUDs project speed and navigation information onto the inside of a car’s windshield (see the example from Volvo (2016), available in the list of Relevant Websites in this article). TVs can act as augmented displays, when sports coverage augments live video with first down lines elegantly integrated with the objects and activity on a football field (i.e., providing occlusion and interposition) (Fig. 7).
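As a concrete illustration of the fiducial-marker tracking described above, the sketch below detects a printed ArUco glyph in a captured frame and recovers its 6DOF pose, which is what lets virtual content stay registered to the card. It assumes an OpenCV build with the contrib aruco module (whose API varies somewhat across versions); the intrinsics, frame file, and marker size are illustrative.

```python
import cv2
import numpy as np

# Illustrative camera intrinsics; a real system would calibrate these.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)  # assume negligible lens distortion for this sketch

marker_len = 0.05   # printed marker side length, in meters
# 3D corner coordinates of the marker in its own (card-centered) frame.
obj_pts = np.array([[-1, 1, 0], [1, 1, 0], [1, -1, 0], [-1, -1, 0]],
                   dtype=np.float32) * (marker_len / 2.0)

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
frame = cv2.imread("frame.png")  # one frame captured from the camera
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary)

if ids is not None:
    # Recover the marker's rotation/translation relative to the camera;
    # virtual objects rendered in this frame stay attached to the card.
    ok, rvec, tvec = cv2.solvePnP(obj_pts, corners[0].reshape(4, 2), K, dist)
    print("marker", int(ids[0]), "at camera-frame position", tvec.ravel())
```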

Fig. 7 Automotive head-up display (HUD). Screen capture from a YouTube video posted by Volvo Cars (Volvo, 2016). Source: https://www.youtube.com/watch?v=pCvvoJDO8Ck.

Fig. 8 The author using HoloLens gestural inputs to annotate geographic space with hazard classification glyphs, using the DisastARcons app.

Displays can be handheld, such as smartphones and tablets used as an AR “lens” to look through rather than look at, or the “opera glass” style used in HITLab’s MagicBook project at SIGGRAPH. AR-capable eyeglasses (such as Google Glass) can provide the wearer with simple digital overlays. Most recently, the arrival of pass-through HMD configurations is enabling developers to experiment with simple mobile HMD-based AR (such as with Gear VR or Google Cardboard) and with high-powered cabled AR in indoor spaces (using Vive; see the Techcrunch article (2016), available in the list of Relevant Websites in this article) (Fig. 8).

Microsoft’s HoloLens wearable computer, Meta’s Meta 2 AR display, Google’s Project Tango, and Occipital’s Structure Sensor and Bridge platform go beyond simple displays. All of them scan/sense the space around them as well as fuse virtual and real content into the user’s view. Occipital’s Bridge sensor/smartphone enclosure is a head-mounted, smartphone-based AR/VR system with structured-light 3D scanning (see the video, Occipital (2016), available in the list of Relevant Websites in this article) (Figs. 9 and 10). The significance of these latest scanning displays is that they provide real-time localized 3D scanning information with which to register augmentation with great accuracy and robustness. Furthermore, this enables an ability to reconstruct an approximation of the entire scene in 3D to compute occlusion zones. Managing visual occlusion and interposition in AR scenes is crucial to delivering visual experiences that are perceptually cogent. User perception of (and belief in) the veracity of AR scenes hinges on their ability to look and behave like normal first-person views. In order to convince users that virtual objects are part of real space, they must appear to integrate visually and spatially. Interleaving real and virtual objects so that they deliver robust depth cues and relationships has been a high priority for many AR and MR researchers.

The arrival of wearable devices that are both AR displays and 3D scanners is ushering in whole new ways to combine real and virtual information and an increasingly seamless interplay between real and virtual spaces. This opens up a universe of new ways to view, manipulate, and use spatial data simulations. Before HoloLens and Meta 2 devices were available, Lonergan and Hedley (2014) began exploring these emerging spaces of geovisual analysis and experience through a set of prototypes and research which introduced Flexible Mixed Reality (FMR): the ability for users to select what proportions of real and virtual worlds/content they wish to combine for geovisual simulation and analysis in real spaces. This is a particularly useful capability, and not simply an implementation of Milgram’s reality–virtuality continuum. The power of FMR is that

Fig. 9 Live geometry (green) captured by the structured light sensor of Occipital’s Structure Sensor. Image provided by the author.


Fig. 10 The derived geometry of a user’s surroundings captured by Occipital’s Structure Sensor. Image provided by the author.

we can incrementally link fully real and fully virtual spaces of spatial analysis by controlling exactly how much abstract digital analytical content is combined with views of the real world and spatially registered to geographic space. This has huge implications for the way we may perform, deliver, and consume spatial analysis.
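At its simplest, the FMR idea can be imagined as a user-controlled compositing weight between a live camera frame and a rendered virtual frame. The toy sketch below illustrates that intuition only; it is not Lonergan and Hedley’s implementation, and the frame sizes and blending factor are illustrative assumptions.

```python
import numpy as np

def flexible_mix(real_frame, virtual_frame, virtuality):
    """Blend a live camera frame with rendered virtual content.
    virtuality = 0.0 gives a fully real view, 1.0 a fully virtual one;
    intermediate values slide along the reality-virtuality continuum."""
    v = float(np.clip(virtuality, 0.0, 1.0))
    mixed = (1.0 - v) * real_frame.astype(float) + v * virtual_frame.astype(float)
    return mixed.astype(real_frame.dtype)

real = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)  # camera frame
virtual = np.zeros_like(real)   # stand-in for rendered GIS/simulation content
mixed = flexible_mix(real, virtual, 0.3)  # user requests a 30% virtual view
```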

1.24.4

(Geo)Spatial Uses of AR

Examples of spatial applications of MAR include Columbia University’s Touring Machine (Feiner et al., 1997), a self-contained backpack system that includes differential GPS, compass, inclinometer, mobile computer, and see-through HMD. Using this system, users experience information that is world-stabilized – in this case, virtual labels attached to buildings in a campus environment. Some might contend that this system constitutes “augmented vision,” in that it annotates views of the real world, as opposed to Azuma’s (1997) specification of 3D virtual objects anchored in the context of the real world. Another group that developed a similar system was the Naval Research Laboratory (NRL), whose wearable Battlefield Augmented Reality System augmented urban military operation environments with tactical information (Azuma et al., 2001). Other examples of wearable MAR include an application for tourists visiting Greek archaeological sites (Vlahakis et al., 2002) and real-time 3D augmented gaming in real-world spaces (Piekarski and Thomas, 2002; Thomas et al., 2002).

In the early 2000s, geospatial researchers developed tangible AR interfaces for collaborative 3D geographic visualization (Hedley et al., 2002) and for geographic education (Shelton and Hedley, 2004). Woolard and colleagues (2003) and Shelton and Hedley (2004) built some of the first systems to deliver AR Solar System and Earth-Sun-Moon pedagogy registered to classroom spaces, and each conducted empirical user studies which established some baselines for cognitive and pedagogical performance. Hedley (2001), Shelton and Hedley (2004), and Woolard et al. (2003) performed some of the first geographic-specific empirical studies of geographic tangible AR for pedagogy in classrooms, museums, and public outreach spaces. Hedley’s (2001) research demonstrated the significance of tangible AR as a 3D visualization system that interfaces powerfully with human proprioception and sensorimotor learning. Over time, MAR researchers have found application- and context-specific preferences for handheld “lensing” with devices. Wayne Piekarski and Bruce Thomas’s TinMith research (2002) contributed heavily to this knowledge capital. In the early 2000s, a small network of geographic information researchers were actively exploring how simple geospatial MAR capability using everyday ubiquitous computing devices might support geographic learning (see Radburn, 2006; Priestnall and Polmear, 2006, 2007). Other examples include the MARA mobile imaging project (Kähäri and Murphy, 2006), the use of aerial photographs to enhance MAR annotation (Wither et al., 2006), and Spring and Braun’s (2008) Enkin prototype application for Google’s Android platform (Fig. 11).
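Systems in the Touring Machine mold fuse GPS and orientation sensors to decide which world-stabilized labels fall within the display’s field of view. A minimal sketch of that decision follows; the coordinates, heading, and 60-degree field of view are illustrative assumptions, not the Touring Machine’s actual parameters.

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from the user to a target, in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    y = math.sin(dl) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    return math.degrees(math.atan2(y, x)) % 360.0

def label_in_view(user, heading_deg, target, fov_deg=60.0):
    """True if a building label should be drawn, given the compass heading."""
    offset = (bearing_deg(*user, *target) - heading_deg + 180.0) % 360.0 - 180.0
    return abs(offset) <= fov_deg / 2.0

user = (40.8075, -73.9626)      # illustrative campus coordinates
building = (40.8065, -73.9632)  # a building to annotate
print(label_in_view(user, 200.0, building))  # facing SSW: label is drawn
```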

Fig. 11 Hedley’s early MAR tablet work (2004) used Windows tablets and outboard GPS to enable mobile augmented geographic visualization and interaction in everyday spaces. While the hardware was cumbersome and low-performance by today’s standards, the research yielded much knowledge capital from these experiences. Images provided by the author.


The enabling technologies of early AR research led to new devices and applications that integrate location awareness, video cameras, and sufficient computational power to run the tracking, computer vision, and rendering elements of many AR software libraries. This in turn led to the emergence of mainstream AR authoring solutions, including ARToolkit, Junaio, Metaio, ARMedia, and Augment.

There has been a groundswell in geographic uses of AR, including empirical assessment of tangible AR for geovisualization (Hedley, 2001), collaborative tangible AR for geovisual interfaces (Hedley et al., 2001, 2002), dynamic tangible AR map spaces (Cheok et al., 2002), cartographic information (Schmalstieg and Reitmayr, 2007), building information overlays (such as Kamat and El-Tawil’s (2007) earthquake damage assessment and Kim and colleagues’ (2016) building damage assessment tool), underground infrastructure viewing using AR (Schall et al., 2009), augmented maps (Paelke and Sester, 2010), and broadly geographic tasks (Dünser et al., 2012). Studies by Radburn (2006), Priestnall and Polmear (2006, 2007), Hedley (2008), Mower (2009), Hedley and Behn (2012), Hedley and Chan (2012), and Lonergan and Hedley (2014, 2015) have investigated how we might use handheld devices to enable mobile augmented geographic visualization and interaction in everyday spaces, including geographic field trips, environmental performance visualization, risk communication, evacuation communication, and situated simulations. Recent interface research by Veas et al. (2013) implemented static spatial information overlays. However, while technologically advanced, the spatial information (contour maps overlaid on real views) would raise eyebrows in the GIScience community (contour lines draped irregularly on landscapes as textures, rather than rigorously aligned to a datum), cautioning us not to be seduced by compelling 3D visualization interfaces and to maintain geovisualization/GIScience/geoscience rigor. In the architecture, engineering, and construction (AEC) community, collaborations between Microsoft and Trimble have yielded some early prototypes which enable design teams to view and modify building plans and geometry, using HoloLens.

While there is considerable novelty to seeing virtual objects attached to real objects (as in tangible AR), several geographic researchers have a bigger vision: to develop AR GIS for use in outdoor spaces. Geographic researchers at Laval University, Queen’s University, and Simon Fraser University in Canada combined spatial analysis, spatial gaming, and MAR to deliver a suite of environmentally situated spatial gaming applications (see Harrap et al., 2012). Hedley and Behn (2012) implemented an AR space–time GIS visualization tool (GeoTEA) to explore the ability to fuse marine data (in this case, AIS “breadcrumbs”) with GIS datasets and perform sophisticated geovisual analysis (in this case, 3D space–time characterization of the movement of shipping), allowing the user to view the 3D space–time paths of shipping in the port of Vancouver while standing at the water’s edge (see Fig. 12).

Fig. 12 Development and implementation of an augmented reality GIS prototype to allow viewing of 3D space-time GIS analysis of shipping, AIS data and anchorages in real-world environments. Built by Hedley and Behn (2012). Images provided by the author.


Fig. 13 A Touch of Rain (described in more detail in Lonergan and Hedley, 2014) – a mobile augmented reality system which enables situated real-time physics simulations that interact with real space. Note particle simulations flowing over rooftops and buildings in the examples above. Both images are live screen captures from tablets. Images provided by the author.

Implementing AR views of “passive” GIS outputs is only part of our opportunity. We have the potential to implement localized, situated dynamic geovisual simulations and analyses. With this goal, the team at SFU’s Spatial Interface Research Lab built working demonstrators of mobile AR geovisual analytics systems that can perform serious analyses and simulations. In 2012, we completed A Touch of Rain, a mobile AR system enabling real-time 3D particle fluid simulations to be run in urban environments from a tablet (Hedley and Lonergan, 2012; Lonergan and Hedley, 2014; Fig. 13). We followed this with a mobile AR interface to run a 3D physics-based tsunami simulation in situ at a coastal field site in Ucluelet, British Columbia (the system is called Tsunamulator) (Lonergan and Hedley, 2015). In addition to being able to run 3D physics-based simulations in real landscapes, these systems also allow users to select their preferred blend of mixed reality – the FMR capability introduced in Lonergan and Hedley (2014). These examples demonstrate to the GIScience community our ability to build “Augmented GIS” using situated mobile AR. In doing so, we are linking virtual spaces of analysis and simulation with real spaces of sense-making and interpretation (Fig. 13).
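The core idea behind situated simulations like A Touch of Rain – coupling a particle system to real topography – can be sketched in a few lines: particles repeatedly step downhill along the gradient of a heightfield, which in a real system would come from scanned or surveyed terrain. The toy sketch below is a strong simplification, not the authors’ engine; the random terrain and parameters are illustrative assumptions.

```python
import numpy as np

def step_particles(height, xy, dt=0.1):
    """Advect rain particles downhill over a raster heightfield.
    height: 2D elevation array; xy: (N, 2) particle positions in cell units."""
    gy, gx = np.gradient(height)  # elevation gradients along rows and columns
    i = np.clip(xy[:, 1].astype(int), 0, height.shape[0] - 1)
    j = np.clip(xy[:, 0].astype(int), 0, height.shape[1] - 1)
    xy[:, 0] -= dt * gx[i, j]     # move opposite the gradient,
    xy[:, 1] -= dt * gy[i, j]     # i.e., toward lower terrain
    return xy

rng = np.random.default_rng(0)
dem = rng.random((100, 100)).cumsum(axis=0)  # toy sloped terrain
drops = rng.uniform(0, 99, size=(500, 2))    # 500 rain particles
for _ in range(50):                          # 50 simulation steps
    drops = step_particles(dem, drops)
```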

1.24.5

Key Challenges for AR

AR has seen a considerable rise in popularity over the past few years, especially in the entertainment sector. However, for GIScience and spatial analytical applications, AR needs to be more than entertaining. AR must be robust enough to support meaningful spatial perception, analysis, simulation, interpretation, and communication. AR-capable devices are not only getting more powerful; they must also be compact and light enough to integrate easily into everyday practice – or, better yet, to become part of normal practice. For this integration to occur, tracking, registration, and rendering will need to be extremely low latency


(no delay, no jitter). Displays will need a large field of view (FOV) and high resolution in order to deliver high-quality blending of real and virtual content. Currently, devices such as HoloLens v.1 have excellent tracking and registration but a limited FOV (which requires users to sweep the HoloLens’ FOV across their view). Adding greater FOV is possible, but may come at the expense of larger, heavier, more cumbersome displays, which may undermine the experience of AR through the distractions of weight, discomfort, and bulk. All of these factors can detract from the user experience and the ability to focus fully on the task at hand or the spatial phenomena being visualized. AR glasses or headsets will need to work in everyday practice. Whereas some companies, such as Magic Leap, have focused on generating media and investor interest, companies such as Microsoft, Meta, and DAQRI have worked steadily toward robust high-performance AR systems.

Hardware aside, this fusion of real and virtual content will also be heavily influenced by the design of virtual content in mixed reality scenes. While a display device or wearable computer may be able to deliver excellent tracking, registration, rendering, and high-resolution, large-FOV views, the virtual content will have an appearance that either abruptly contrasts with the real view or blends into it. In short, we are talking about how the visual, graphical, and spatial design of virtual additions to real views may result in more (or less) powerful and effective AR information experiences. Hedley (2001) describes the design of augments and AR information design in terms of “perceptual cogency”: do all augments fit logically with the real scene they supplement? Can augments be designed or tuned to fit the visuo-spatial context of the real-world space they supplement? Understanding this concept is part of another challenge that will need to be accommodated: visual dissonance.

Most AR displays work by superimposing virtual digital objects onto real views, using some form of display device. There are going to be times when tracking and registration perform very well, in highly controlled environments or with the most sophisticated AR systems. In these circumstances, and with well-designed augments, the fusion of real and virtual will be convincing and powerful. There are also going to be times when the virtual overlays conflict visually and/or spatially with the real world viewed through a display device. This may be a function of poor tracking and registration, or it might be due to unexpected objects between the AR user and the viewing target or vista they are trying to augment. Beyond tracking and registration performance, another factor that comes into play here is occlusion. By this we mean how the scene should look when object interposition results in closer objects hiding (or occluding) other objects that would otherwise be in the line of sight. Since most AR systems use simple digital superimposition, there is no way for a real object to influence how much of a virtual overlaid augment is visible. This challenge has attracted considerable attention from AR engineers over the last decade, and some of the more sophisticated AR displays and wearable computing devices are now able to scan the geometry of the immediate space surrounding them and integrate this information into the performance of AR applications.
Meta’s Meta 2 headset, Microsoft’s HoloLens, and Occipital’s Structure Sensor/Bridge enclosure system all feature some form of real-time 3D scanning (Occipital’s Structure Sensor, e.g., uses structured infrared light scanning; see Figs. 9 and 10). What this provides for AR performance is an ability to calculate and render interposition and occlusion relationships between real and virtual objects, resulting in far more perceptually cogent AR scenes. This, in turn, makes the fusion of real and virtual more seamless and more convincing, which delivers a more robust and reliable user experience platform as a foundation for serious spatial applications.

In summary, tracking and registration need to be robust, whether in classroom or outdoor environments. Computing performance, ergonomic elegance, and augmented performance need to converge in a package that supports and maximizes a convincing mixed reality visual experience, and that does not distract from spatial analytical objectives, tasks, and user experience. Mainstream AR focusing on entertainment and marketing is likely to be a short-term novelty. The longer-term opportunity is to supplant novelty with meaningful geospatial information experiences that deliver new capabilities to view, perceive, query, interpret, and communicate, all of which add value to science, pedagogy, and practice. Empirical evidence for tangible systems appears to support this. The optimal hardware configuration is still in flux, but there is much for the geospatial community to do to be experienced and ready to adopt each best-of-class device as it matures.
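Conceptually, the occlusion handling that scanning displays enable reduces to a per-pixel depth test between the scanned real surface and the rendered virtual layer. The sketch below illustrates that test under simplifying assumptions (both depth maps already aligned in the same camera frame; all values illustrative).

```python
import numpy as np

def composite_with_occlusion(real_rgb, real_depth, virt_rgb, virt_depth):
    """Overlay virtual pixels only where the virtual content is nearer than
    the scanned real surface, so real objects correctly occlude augments."""
    out = real_rgb.copy()
    drawn = np.isfinite(virt_depth)             # pixels the renderer produced
    visible = drawn & (virt_depth < real_depth)
    out[visible] = virt_rgb[visible]
    return out

h, w = 480, 640
real_rgb = np.zeros((h, w, 3), np.uint8)        # live camera frame (stub)
real_depth = np.full((h, w), 2.0)               # scanned wall 2 m away
virt_rgb = np.full((h, w, 3), 255, np.uint8)    # rendered augment layer
virt_depth = np.full((h, w), np.inf)            # inf = nothing rendered
virt_depth[200:280, 300:380] = 3.0              # augment placed 3 m away

# The augment sits behind the 2 m wall, so it is completely occluded.
print(composite_with_occlusion(real_rgb, real_depth, virt_rgb, virt_depth).max())  # 0
```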

1.24.6

Opportunities and Vision for AR in GIS and GIScience

In this article, we have introduced the concepts and technical mechanisms that underpin AR. We have introduced what AR is, its relationship to virtual environments, and the mixed reality spectrum. We have also summarized a cross section of research and development of AR for geographic applications. The previous section tempered the hype surrounding these compelling technologies by discussing the key challenges of making AR robust enough for geographical applications.

Several trends in the spatial information domain form a backdrop to this. First, spatial data capture has to be one of the biggest drivers of changing spatial information economies and ecosystems. Advances in spatial positioning systems, sensor technologies, and mobile devices have revolutionized spatial data workflows in many sectors. Second, the ways we consume data in society have not just been changing but dynamically morphing, in response to changing mixtures of devices, services, and mobilizations of data in formal and informal settings. Third, over the past few years, we have experienced a massive groundswell of interest in virtual technologies and virtual environments in society. It is critical for us to understand that the geospatial opportunities that lie before us are not the goggles and gadgets themselves but rather the capabilities they may provide us. The simple truth is that AR and MR have the potential to transform how and where


Fig. 14 SIRL MSc student Samantha Romano shows a 3D augmented reality globe in a classroom.

Fig. 15 AR ocean GIS datasets embedded in a smartphone. SIRL MSc student Ian Lochhead using a topographic/bathymetric GIS embedded in a physical table. Interfaces designed and developed by Ian Lochhead and Nick Hedley were introduced during a plenary address at ESRI’s Ocean GIS Forum 2016. URL: http://storify.com/deepseadawn/esri-ocean-gis-forum-2016/embed? Image credits: (left) the author; (right) Heather Lochhead.

we view, interact with, consume, and share spatial analyses, simulations, and visualizations. They may enable us to perceive, navigate, and experience spatial information in unprecedented new ways; to link abstract science with real space like never before; to collapse space and time like never before. The examples in this article present a selection of the prototypes and capabilities geographic researchers in this domain have built and tested over a number of years, and continue to build and test. It is fitting, therefore, that we conclude the article by sharing a vision for the use of AR in geovisual analytical applications.

Our future has the potential to be one where we can turn datasets into AR visualization systems in classrooms: 3D visualization systems where students can see and inspect the structural complexity of spatial phenomena (see Hedley, 2001; Shelton and Hedley, 2002; Woolard et al., 2003). These AR systems might be as simple as transforming the GeoWall displays envisioned in the 1990s into classroom-projected AR displays (Fig. 14), or creating a geospatial holodeck that one can teleport any dataset into, for inspection, query, and interpretation (Fig. 15). Or we might perform 3D simulations of 3D physical processes or human dynamics (see Hedley and Lonergan, 2012) on a table in the classroom!

In research, the big opportunity is to close the gap between fieldwork, data processing, and visualization, and to link the outcomes back to the spaces of measurement and observation. Imagine being able to stand at a field site, look through a device at a watershed, and run virtual simulations of overland flow that interact with the real topography, in real time, using FMR tools. That is exactly the vision Lonergan and Hedley (2015) pursued. But there are many opportunities to extend that work. Imagine being able to perform situated visual analytics that splice real spaces with abstract multivariate data spaces. Imagine having smartphone and tabletop AR GIS displays (Fig. 15) or a room-scale geospatial holodeck (Fig. 16) into which you could bring 3D data of any location or structure and run GIS analyses, AI, or physics simulations (Fig. 13). Imagine being able to walk through a landscape and look through the ground, using geospatial x-ray vision (Hedley, 2008). The other major opportunity lies in geospatial reification in all branches of geographic work: being able to see everyday spaces through the analytical/abstract lens of any spatial science, simply by using augmented GIS architectures.


Fig. 16 Prototyping an augmented reality Geospatial Holodeck at the Spatial Interface Research Lab: using wall-registered AR visualization to inspect 3D data derived from image processing (a 3D reconstruction of sea cliffs from the Central Coast of British Columbia) as part of our day-to-day workflow. Image provided by the author.

References

Azuma, R.T., 1997. A survey of augmented reality. Presence: Teleoperators and Virtual Environments 6 (4), 355–385.
Azuma, R.T., Baillot, Y., Behringer, R., Feiner, S., Julier, S., MacIntyre, B., 2001. Recent advances in augmented reality. IEEE Computer Graphics and Applications 21 (6), 34–47.
Billinghurst, M., Kato, H., Poupyrev, I., 2001. The magic book: A transitional interface. Computers & Graphics 25 (5), 745–753.
Cheok, A., Yang, X., Ying, Z., et al., 2002. Touch-space: Mixed reality game space based on ubiquitous, tangible, and social computing. Personal and Ubiquitous Computing 6 (5–6), 430–442.
Dede, C.J., Salzman, M.C., Loftin, R.B., 1994. The development of a virtual world for learning Newtonian mechanics. In: Proceedings of Multimedia, Hypermedia, and Virtual Reality (MHVR 1994), Moscow, Russia, pp. 87–106.
Dünser, A., Billinghurst, M., Wen, J., Lehtinen, V., Nurminen, A., 2012. Exploring the use of handheld AR for outdoor navigation. Computers & Graphics 36 (8), 1084–1095.
Dykes, J., MacEachren, A.M., Kraak, M.-J. (Eds.), 2005. Exploring geovisualization. Elsevier Press, Oxford.
Feiner, S.K., MacIntyre, B., Haupt, M., Solomon, E., 1993. Windows on the world: 2D windows for 3D augmented reality. In: Hudson, S.E., Pausch, R., Zanden, B.V., Foley, J.D. (Eds.), Proceedings of the 6th Annual ACM Symposium on User Interface Software and Technology. Association for Computing Machinery, New York, pp. 145–155.
Feiner, S.K., MacIntyre, B., Höllerer, T., Webster, T., 1997. A touring machine: Prototyping 3D mobile augmented reality systems for exploring the urban environment. In: Proceedings of ISWC ’97 (First IEEE International Symposium on Wearable Computers), October 13–14, 1997, Cambridge, MA. IEEE.
Harrap, R., Daniel, S., Power, M., Pearce, J., Hedley, N., 2012. Chapter 8: Design and implementation of mobile educational games: Networks and innovation. In: Chrisman, N., Wachowicz, M. (Eds.), The added value of scientific networking: Perspectives from the GEOIDE network members 1998–2012. GEOIDE Network, Quebec, pp. 157–187. http://digitalcommons.mtu.edu/materials_fp/55.
Hedley, N., 2001. Virtual and augmented reality interfaces: Empirical findings and implications for spatial visualization. In: Proceedings of the International Cartographic Congress (ICC 2001), Beijing. http://icaci.org/files/documents/ICC_proceedings/ICC2001/icc2001/file/f16023.pdf.
Hedley, N., 2008. Real-time reification: How mobile augmented reality may change our relationship to geographic space. Paper read at the 2nd International Symposium on Geospatial Mixed Reality, August 28–29, Laval University, Quebec City.
Hedley, N., 2016. Virtual reality and cartography in the twentieth century. In: Monmonier, M. (Ed.), History of cartography in the twentieth century. University of Chicago Press, Chicago.
Hedley, N., 2017. Augmented reality. In: Richardson, D., Castree, N., Goodchild, M., Kobayashi, A., Liu, W., Marston, R. (Eds.), The international encyclopedia of geography: People, the earth, environment, and technology. Wiley. http://dx.doi.org/10.1002/9781118786352.
Hedley, N., Behn, S., 2012. Deployable non-linear geomovies and 3D augmented reality space-time visualizations as new forms of situated public environmental information tools. Paper presented at the XXII ISPRS Congress, August 25–September 1, 2012, Melbourne.
Hedley, N., Billinghurst, M., Postner, L., et al., 2002. Explorations in the use of augmented reality for geographic visualization. Presence: Teleoperators and Virtual Environments 11 (2), 119–133.
Hedley, N., Chan, C., 2012. Using situated citizen sampling, augmented geographic spaces and geospatial game design to improve resilience in real communities at risk from tsunami hazards. Paper presented at the XXII ISPRS Congress, August 25–September 1, 2012, Melbourne.
Hedley, N., Lonergan, C., 2012. Controlling virtual clouds and making it rain particle systems in real spaces using situated augmented simulation and portable virtual environments. In: Proceedings of the XXII ISPRS Congress, August 25–September 1, 2012, Melbourne, Australia. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XXXIX-B2, pp. 113–117. International Society for Photogrammetry and Remote Sensing.
Hedley, N., Postner, L., Billinghurst, M., May, R., 2001. Collaborative AR for geographic visualization. In: Proceedings of the Second International Symposium on Mixed Reality, pp. 11–18. IEEE, Yokohama.
Kähäri, M., Murphy, D.J., 2006. MARA: Sensor based augmented reality system for mobile imaging device. In: ISMAR: Proceedings of the International Symposium on Mixed and Augmented Reality. IEEE Computer Society.
Kamat, V., El-Tawil, S., 2007. Evaluation of augmented reality for rapid assessment of earthquake-induced building damage. Journal of Computing in Civil Engineering 21 (5), 303–310.
Kim, W., Kerle, N., Gerke, M., 2016. Mobile augmented reality in support of building damage and safety assessment. Natural Hazards and Earth System Sciences 16 (1), 287–298.
Lonergan, C., Hedley, N., 2014. Flexible mixed reality. Cartographica 49 (3), 175–187.
Lonergan, C., Hedley, N., 2015. Navigating the future of tsunami risk communication: Using dimensionality, interactivity and situatedness to interface with society. Natural Hazards 78 (1), 179–201.
MacEachren, A.M., Edsall, R., Haug, D., et al., 1999. Exploring the potential of immersive virtual environments for geographic visualization. http://www.geovista.psu.edu/publications/aag99vr/fullpaper.htm (accessed May 6, 2017).


Milgram, P., Kishino, F., 1994. A taxonomy of mixed reality visual displays. IEICE Transactions on Information and Systems 77 (12), 1321–1329.
Mower, J., 2009. Creating and delivering augmented scenes. International Journal of Geographical Information Science 23 (8), 993–1011.
Paelke, V., Sester, M., 2010. Augmented paper maps: Exploring the design space of a mixed reality system. ISPRS Journal of Photogrammetry and Remote Sensing 65 (3), 256–265.
Parkes, D., Thrift, N., 1980. Times, spaces and places: A chronogeographic perspective. John Wiley and Sons, New York.
Piekarski, W., Thomas, B.H., 2002. ARQuake: The outdoor augmented reality gaming system. Communications of the ACM 45 (1), 36–38.
Priestnall, G., Polmear, G., 2006. Landscape visualization: From lab to field. In: Proceedings of the 1st International Symposium on Geospatial Mobile Augmented Reality, May 29–30, 2006, Banff.
Priestnall, G., Polmear, G., 2007. A synchronised virtual environment for developing location-aware mobile applications. In: Proceedings of the 15th Annual Geographical Information Science Research UK Conference (GISRUK’07), pp. 236–240.
Radburn, A., 2006. A mobile augmented reality demonstrator. In: Proceedings of the First International Workshop on Geospatial Mobile Augmented Reality, May 29–30, 2006, Banff. REGARD, Quebec City.
Salzman, M.C., Loftin, R.B., Dede, C., McGlynn, D., 1996. ScienceSpace: Lessons for designing immersive virtual realities. In: Conference Companion on Human Factors in Computing Systems, April 18, 1996, pp. 89–90. ACM.
Schall, G., Mendez, E., Kruijff, E., Veas, E., Junghanns, S., Reitinger, B., Schmalstieg, D., 2009. Handheld augmented reality for underground infrastructure visualization. Personal and Ubiquitous Computing 13 (4), 281–291.
Schmalstieg, D., Reitmayr, G., 2007. Augmented reality as a medium for cartography. In: Cartwright, W., Peterson, M., Gartner, G. (Eds.), Multimedia cartography. Springer, Heidelberg, pp. 267–282.
Shelton, B., Hedley, N., 2002. Using augmented reality for teaching earth-sun relationships to undergraduate geography students. In: Proceedings of the First IEEE International Augmented Reality Toolkit Workshop. ACM Press, Darmstadt.
Shelton, B., Hedley, N., 2004. Exploring a cognitive basis for learning spatial relationships with augmented reality. Technology, Instruction, Cognition and Learning 1 (4), 323–357.
Slocum, T., Blok, C., Jiang, B., Koussoulakou, A., Montello, D.R., Fuhrmann, S., Hedley, N.R., 2001. Cognition and usability issues in geovisualization. Cartography and Geographic Information Science 28 (1), 61–76.
Spring, R., Braun, M., 2008. Enkin – A Google Android prototype application. http://enkinblog.blogspot.ca/ (accessed February 16, 2017).
Stephenson, N., 1992. Snow crash. Bantam Books, New York.
Thomas, B.H., Close, B., Donoghue, J., Squires, J., De Bondi, P., Piekarski, W., 2002. First person indoor/outdoor augmented reality application: ARQuake. Personal and Ubiquitous Computing 6 (1), 75–86.
UploadVR, 2017. Epic Games teams with Chevrolet on new AR car project. https://uploadvr.com/epic-chevrolet-ar-car/ (accessed March 1, 2017).
Veas, E., Grasset, R., Ferencik, I., Grünewald, T., Schmalstieg, D., 2013. Mobile augmented reality for environmental monitoring. Personal and Ubiquitous Computing 17 (7), 1515–1531.
Vlahakis, V., Ioannidis, N., Karigiannis, J., Tsotros, M., Gounaris, M., Stricker, D., Gleue, T., Daehne, P., Almeida, L., 2002. Archeoguide: An augmented reality guide for archaeological sites. IEEE Computer Graphics and Applications 22 (5), 55–60.
Winn, W., 1993. A conceptual basis for learning in virtual environments. HITLab Tech Report R-93-9. Human Interface Technology Laboratory, University of Washington, Seattle.
Winn, W., Windschitl, M., Hedley, N.R., 2001. Using immersive visualizations to promote the understanding of complex natural systems: Learning inside virtual Puget Sound. In: Proceedings of the Annual Meeting of the National Association for Research on Science Teaching (NARST), St. Louis, USA.
Wither, J., DiVerdi, S., Höllerer, T., 2006. Using aerial photographs for improved mobile AR annotation. Paper and demonstration, International Symposium on Mixed and Augmented Reality (ISMAR) 2006, Santa Barbara, USA. IEEE.
Woolard, A., Lalioti, V., Hedley, N.R., et al., 2003. Case studies in the application of augmented reality in future media production. In: IEEE/ACM International Symposium on Mixed and Augmented Reality (ISMAR 2003), Tokyo. IEEE Computer Society.

Further Reading

Andrienko, G., Andrienko, N., Jankowski, P., Keim, D., Kraak, M.J., MacEachren, A., Wrobel, S., 2007. Geovisual analytics for spatial decision support: Setting the research agenda. International Journal of Geographical Information Science 21 (8), 839–857.
Peng, Z.R., Tsou, M.H., 2003. Internet GIS: Distributed geospatial information services for wired and wireless networks. John Wiley and Sons, New York.
Rekimoto, J., Nagao, K., 1995. The world through the computer: Computer augmented interaction with real world environments. In: Proceedings of the 8th Annual ACM Symposium on User Interface and Software Technology (UIST ’95). ACM, New York, pp. 29–36.
Rekimoto, J., 1997. Navicam: A magnifying glass approach to augmented reality. Presence: Teleoperators and Virtual Environments 6 (4), 399–412.
Tamura, H., Yamamoto, H., Katayama, A., 2001. Mixed reality: Future dreams seen at the border between real and virtual worlds. IEEE Computer Graphics and Applications 21 (6), 64–70.
Tsai, M.-K., Lee, Y.-C., Lu, C.-H., Chen, M.-H., Chou, T.-Y., Yau, N.-J., 2012. Integrating geographical information and augmented reality techniques for mobile escape guidelines on nuclear accident sites. Journal of Environmental Radioactivity 109, 36–44.
Waller, D., Hunt, E., Knapp, D., 1998. The transfer of spatial knowledge in virtual environment training. Presence 7 (2), 129–143. MIT Press.

Relevant Websites

https://www.transportvr.com/theblu-series – WEVR. TheBlu – Undersea VR series on WEVR.
https://www.youtube.com/watch?v=Iys8yo0sjYg – Occipital. Introducing Bridge.
https://techcrunch.com/2016/06/15/htc-mixed-reality/ – Techcrunch. HTC’s mixed reality demo shows onlookers what it’s like to experience VR.
https://www.linkedin.com/pulse/pokémon-gotransform-gis-location-based-services-lbs-ming-hsiang-tsou – M.H. Tsou. Will Pokémon Go transform GIS and location-based services (LBS)? Online white paper research discussion.
https://www.youtube.com/watch?v=5PSFCnrk0GI – Virtusphere. Locomotion platform for walking in virtual space.
https://www.youtube.com/watch?v=pCvvoJDO8Ck – Volvo Cars. Head-up display.
http://storify.com/deepseadawn/esri-ocean-gis-forum-2016/embed? – D. Wright. Tweets and clips covering “Immersing ourselves in virtual worlds to understand ocean environments,” plenary presented by the author at ESRI Ocean GIS Forum 2016.

1.25

GIS and Serious Games

Brian Tomaszewski, Angelina Konovitz-Davern, and David Schwartz, Rochester Institute of Technology, Rochester, NY, United States Joerg Szarzynski and Lena Siedentopp, United Nations University Institute for Environment and Human Security, Bonn, Germany Ashely Miller and Jacob Hartz, Rochester Institute of Technology, Rochester, NY, United States © 2018 Elsevier Inc. All rights reserved.

1.25.1 Introduction
1.25.1.1 Games
1.25.1.2 Serious Games
1.25.1.3 Serious Games vs. Simulations
1.25.1.4 Gamification
1.25.1.5 Spatial Representations and Serious Games
1.25.2 GIS and Serious Game Literature Review
1.25.2.1 Disaster Management and Serious Gaming
1.25.2.2 Other Examples of GIS and Serious Games
1.25.3 GIS and Serious Game Case Studies
1.25.3.1 Emergency Simulation Exercises for Capacity Development Within Postgraduate Education
1.25.3.1.1 The simulation exercise scenario
1.25.3.1.2 The simulation exercise (SimEx)
1.25.3.1.3 The pedagogic value
1.25.3.1.4 The added value of geospatial applications in the field
1.25.3.2 “Serious” GIS (SerGIS)
1.25.4 Evaluating Serious GIS Games
1.25.5 A GIS and Serious Game Research Agenda
1.25.5.1 Geo-Gamification
1.25.5.2 Spatial Representations and Serious Games
1.25.5.3 Expert Knowledge Incorporation Into Serious GIS Games
1.25.5.4 Evaluating Serious GIS Games
1.25.5.5 Technology Research
1.25.6 Summary and Conclusions
References

1.25.1


Introduction

This article presents the interdisciplinary idea of combining geographic information systems (GIS) and serious games. Maps have been an integral component of games (serious or not) for many years. In particular, maps are often used as a contextual layer for a gaming experience. For example, the first map that many people in the United States may have encountered was the Risk® game board, which used a Mercator-based map projection (Fig. 1). In the past 20 years, both the geospatial technology industry and the video game industry have grown in financial revenue and influence (the global video gaming industry had an estimated $99 billion value in 2016 (http://venturebeat.com/2016/04/21/video-games-will-become-a-99-6b-industry-this-year-as-mobile-overtakes-consoles-and-pcs/); a 2013 estimate valued the GIS industry at $270 billion (http://geospatialworld.net/uploads/magazine/Geospatial-World-December-2013.pdf)). For example, maps have become well established and commonplace in diverse fields ranging from medical analysis to disaster response (Tomaszewski et al., 2015). Furthermore, digital games, besides providing entertainment value, are becoming increasingly incorporated into teaching, training, and learning practice (Salen and Zimmerman, 2004).

In this article, we argue that further integration of GIS and serious games can have a far-reaching impact on learning and the advancement of spatial knowledge and expertise in numerous application domains. As previously stated, maps are commonplace in modern-day computer games. However, there is much more that can be done in terms of incorporating the spatial analytical and representational power of GIS into game experiences. Additionally, the integration of GIS into numerous application domains raises interesting and far-reaching research questions about the combination of GIS and serious games. In this regard, we draw heavily, although not exclusively, upon disaster management as an exemplar case study of the integration of GIS and serious games for education, spatial thinking skill development, and problem solving.

We begin the article with definitions to provide context for other parts of the article and to give readers background knowledge on important concepts. Specifically, we define games, serious games, the difference between serious games and simulations, the idea of gamification, and spatial representations within serious games. With this background, we provide a literature review focused on the

Fig. 1 The classic Risk® game board.

use of GIS and serious games in disaster management education. We also discuss GIS and serious games in relation to other concepts such as public participation GIS (PPGIS). We then draw on two GIS and serious game case studies to illustrate points made earlier in the article. The first is a real-time simulation exercise that incorporates geospatial tools into an emergency response scenario, in the context of emergency management capacity building in postgraduate education. The second is ongoing research on developing the “serious” GIS (SerGIS) system, which is designed for the flexible creation of serious games using real GIS data and tools. Next, we discuss the important topic of evaluating serious games in terms of their efficacy for learning. We then outline a GIS and serious games research agenda to provide researchers, practitioners, and educators with ideas about the long-term implications of combining GIS and serious games. The article ends with a summary and conclusions about the integration of GIS and serious games.

1.25.1.1

Games

Although the notion of play and culture was studied in the early 1900s in the classic book Homo Ludens (Huizinga, 1938), academic game degree programs are a relatively new phenomenon with respect to traditional academic fields. The advent of video games in popular culture and industry since the 1970s has driven a multitude of students to study and make games in academia. UBM Technology Group’s (2016) “Game Career Guide” currently lists over 430 colleges and universities. With faculty both responding to and furthering this interest, researchers continue to ask seemingly simple questions about games, which have surprisingly complex answers. For example, consider Tetris®, which is often considered a “game.” In most games, the player overcomes challenges to win or beat the game. But in Tetris®, the player keeps stacking rows of shapes with no final end state – so is it really a game? And if it is not, then what is it?

At the core of the work in this article is the definition of a game. It is an interesting exercise to define this seemingly obvious concept, especially given the dependent relationships of play and fun. Considering the numerous examples of activities that are fun (e.g., playing with a toy or watching a movie) or could be considered fun for some (e.g., work and school) will usually give pause. For example, if someone has fun in a class, earns points toward a grade, and learns a new skill, is that class at its core a game? We address this idea throughout the “Introduction” section.

Realizing that researchers needed common ground and terminology for advancing a body of knowledge, Salen and Zimmerman (2004) summarized a multitude of definitions by professionals and academics, which they distilled into their own: “A game is a system in which players engage in an artificial conflict, defined by rules, that results in a quantifiable outcome.” This definition captures essential notions of challenges presented to players through structured rules and goals that the players seek. Consider the Tetris® example above – with a formal academic definition, it is actually considered a “toy” that can be played. Many other experiences and devices fall into a similar category, like Minecraft® and other popular “games.” For the purpose of this article, we focus on the kinds of games that apply to GIS, and so we drill down into the specific kinds of games used in education, training, and work, as discussed next.

1.25.1.2


Serious Games

Through Salen and Zimmerman’s (2004) definition, the student having fun in a class might actually be participating in a game. Their definition broadens a game to include “game-like” experiences, like work, school, and even simulations, where structured experiences provide real rules (e.g., policies) that the participants follow to achieve real goals. Using the example of a class, course material tends to abstract the real world, which presents an artificial “conflict.” A student competing with himself/herself, or perhaps the infamous “grading on a curve,” are obvious examples of “conflict,” but many other aspects of a class present similar challenges: difficult material, abstract concepts, new contexts, or simply juggling multiple courses in school. The artificial aspect of coursework draws highly from educational material abstracting or distilling long-term, complex problems into smaller “chunks” for presentation, homework, and tests. Common examples include business and engineering courses, which borrow heavily from real-world situations – but imagine a student designing an entire luxury cruise liner in a few weeks. There are many similarities between games and education; Bayliss and Schwartz (2009) draw multiple connections between instructional design and game design. This broader definition of a game also helps extend “games for fun” to “games for work,” that is, serious games.

In the past two decades, there has been a surge of game-related work. Demand for – and by – students has been a major factor, but the lack of early academic scholarship in play and games leads one to wonder about this absence, especially given the vital importance of games and playing in cultures. Perhaps the most obvious answer is the perception of games as frivolous activities – that is, only “serious” human aspects deserve study. Researchers using games for nonentertainment purposes began to use the term “serious game” in a variety of contexts, especially for work, education, military exercises, and more (Djaouti et al., 2011). Although a broad definition of a game should help convince others of the importance of studying games, applying the adjective serious helps elevate the “importance” of such study.

When considering serious games, the research community has struggled with the term “serious,” as the word implies that such games are not fun for the players. When considering the definition of a game, as discussed above, “serious” implies that the artificial conflict and rules derive from the real world – the stakes for winning are much higher in the sense that the player may develop real-world knowledge, which is the category in which our game resides. A variety of categories exist, including health, advertising, military, education, and many more.

1.25.1.3

Serious Games vs. Simulations

A common source of confusion is the relationship between games and simulations. On their own, the importance of simulations cannot be overstated. For example, before performing crash tests of cars, crashes of simulated cars via mathematical and engineering models help narrow the “design space” to save companies a significant amount of time and resources. Fields that involve visualization, like biology and big data, use computer graphics and human–computer interfaces to help display extremely complex systems to improve understanding. The visualization and interaction of simulations often increase the confusion with games because of the nature of the interactive software with respect to its interface, presentation, and controls.

Starting from a game player’s perspective, some games offer such realistic environments as to simulate a real-world location. The worlds of games such as Assassin’s Creed®, Fallout®, and many others use real-world locations that span large areas to explore. This software is marketed as games, released on game consoles, and reflected in popular culture as games. Consider, by contrast, real-time 3D visualizations that offer interfaces like architectural walk-throughs of a proposed building: these experiences clearly do not represent games, and yet a user’s avatar navigating a simulated environment differs only in purpose – for example, not needing to survive an attack while investigating window furnishings in an office building. In fact, Autodesk® is a phenomenal example of how a software company provides both 3DSMAX (animation, games, films) and AutoCAD (engineering, architecture). When simulation and entertainment software give strikingly similar controls, interfaces, and presentation, users unfamiliar with games might assume that their simulated walk-through is somehow a game. This idea recalls a classic question: is the medium the message? In this case, no – the tools used for simulations might actually be the same ones used for making games, but the experiences created might have entirely different purposes. However, the potential to “blur the lines” between games and simulations becomes possible, given the fidelity now provided by software tools.

When thinking of simulations, especially those for scientific fields like GIS, there is a “sliding scale” of simulation of physical reality, in which a designer might reduce fidelity to focus on certain features and bring more game-like experiences into simulations. The field of wargame design and wargaming provides a useful example (Schwartz et al., 2007), especially given the resources spent by governments to test combat situations. In Schwartz (2008), the classic Tic Tac Toe game is an extremely abstracted wargame, which in itself is an abstraction of real war. The design space between these extremes is dense, given the variety of decisions, scenarios, and models that can influence the design. All of these issues continue to be the focus of academic exploration and debate. For the purpose of this article, we choose to preserve the core definition of games involving rules and goals. With respect to bridging games and simulations, we look to gamification, as explained in the next section.

1.25.1.4

Gamification

The observed similarity of games and classes, serious games, and simulations that “slide” toward game-like experiences all point to the concept of gamification. We use the definition from Deterding et al. (2011): “‘Gamification’ is the use of game design elements in nongame contexts.” By gamifying an experience, one applies game concepts to that experience, which should (theoretically) drive


interest, engagement, and passion. For example, the more fun work is for an employee, the better the productivity, retention, and more. Although many people might naturally treat nongame experiences as games for themselves, Coonradt and Nelson (2007) documented the concept in “The Game of Work.” An interesting example is our work vernacular: although informal, one may often hear about how a worker “games the system,” as in finding loopholes in rules and policies to exploit advantages. In a more productive sense, using rewards to encourage achievements is a common workplace strategy. Part of the drive of gamification is the observation of gamers “glued” to their platforms (TVs, mobile devices, computers, etc.), fixating on overcoming the challenges the game provides. If companies, schools, the military, and others could somehow map the interactions, rules, and challenges – all of which a game comprises – then somehow the workers, students, soldiers, and others would all engage more deeply. At least that is the theory (e.g., Egenfeldt-Nielsen et al., 2016). For the purpose of this article, we focus on serious games as a means to explore spatial representations.

1.25.1.5

Spatial Representations and Serious Games

The comparative effectiveness of different methods of spatial representation in serious GIS games is an important research question. More than ever, interactive platforms that allow for 3D visualization of geographic environments are being proposed to support experts in retrieving and interpreting spatial data in a variety of fields, including but not limited to disaster response and urban planning. 3D city models, 3D spatial data, and virtual reality simulations are currently being used to accomplish essential tasks such as on-scene assessment, optimal path calculation, and disaster response training. However, how 3D graphical representations compare to traditional 2D displays is a key consideration. Chen (2011) notes that one of the major disadvantages of 3D GIS and city models is the high cost of production. Despite appearing more technically and visually impressive, what advantages do 3D mapping and visualization offer over their 2D counterparts? Are 3D models worth the cost of production?

Many have addressed the first question, examining the knowledge gap between 2D and 3D visualization. Unlike traditional 2D maps, which rely on symbology, 3D maps include illumination, shadows, texture information, and the ability to change perspectives (Chen, 2011). Tiwari and Jain (2015) argue that in the context of disaster management, 3D graphical representation “reduces the cognition effort needed to interpret the situation,” thus improving the efficiency of decision-making. Herbert and Chen (2015) reported a study evaluating the perceived usefulness of 2D and 3D visualizations according to urban professionals. The study suggests that the usefulness of 2D or 3D depends on the specific tasks undertaken. 2D visualization was perceived to be more useful by urban professionals for “simple, measurement-based tasks” such as assessing building height, footprint, and setback. However, 3D visualization is suggested to help urban planners determine the “context” of a proposed building in its terrain and how it interacts with surrounding buildings from many perspectives. Additionally, 3D was considered useful for performing more complex urban planning tasks, such as assessing a building’s recession plane – tasks described as ones that “involve more imagination and mental manipulation” (Herbert and Chen, 2015).

Shen et al. (2012) address the 2D versus 3D comparison in the context of emergency management information systems (EMIS). Similar to the Herbert and Chen (2015) study, Shen et al. (2012) suggest that 3D displays do not guarantee superior decision-making for all tasks. Rather, decision performance is determined by the specific task, not the dimensionality itself. For example, 2D visualization may be better suited for tasks involving judgments of relative position and orientation. Shen et al. (2012) suggest a hybrid EMIS for toggling between 2D and 3D displays, as well as training to enable decision makers to choose the most appropriate display format based on the current situation. Although more research needs to be done on comparing 2D and 3D visualizations, it seems that the relative usefulness of each display dimension may be task dependent. From the above literature, it is suggested that 3D GIS may be an effective tool in disaster management. 3D displays and models could provide responders with essential geo-information necessary for responding promptly to events.
Additionally, 3D visualization is useful for developing simulations that could improve disaster preparedness and virtual training procedures for first responders. However, it is still unclear how the 2D and 3D displays compare for decision performance, especially in the context of serious games. For a potential disaster simulation using a 3D city model for navigation, it may be advantageous to include an option to view the 2D map.

1.25.2 GIS and Serious Game Literature Review

A small but growing body of literature addresses GIS and serious games. In the following sections, we review this literature from the perspective of disaster management and serious gaming, and also look at other examples of research that combines GIS technology and serious games.

1.25.2.1 Disaster Management and Serious Gaming

In the field of disaster management, serious games are being introduced as a means of addressing the shortcomings of traditional training tools such as simulated drills. In addition to lacking realism, these drills often require large investments of time and money to arrange and execute, which makes them difficult to repeat at short intervals. The recent introduction of serious games into the disaster management training routine allows upcoming first responders to optimize their training by achieving the most effective results with a smaller investment of time and money. The following reviews cover several released serious games that include either a mapping tool or a GIS component, as well as common mechanics seen within these games.

Many existing serious games in disaster management include a GIS component as a means of providing spatial awareness in the gaming scenario. The presentation of maps, context for the scenario, specific locations, and the ability to interact with the environment are all examples of spatial awareness as presented in a gaming context. A gaming scenario that presents this exceptionally well is C3Fire, a microworld simulation for emergency management (Granlund, 2001). Used both as a means for the leader to communicate with personnel and as a means for the personnel to keep a record of their findings, this game relies heavily on the graphical interface a GIS can provide to enhance communication between players. Granlund (2001) found that participants who chose to use the GIS and mapping tools identified fires more accurately than those who relied only on the diary and standard communication tools. He also noted that the data from the GIS tool were much more useful for debriefing, since they provided instructors with quantitative data rather than only qualitative feedback.

Several other disaster management games effectively provide spatial awareness without necessarily including real GIS functions. BreakAway Ltd. (2012) presents Incident Commander (Fig. 2), a game created in conjunction with the US Department of Justice, which also considers spatial context by giving users a map of the area surrounding the disaster. As players work through the situation, they can reflect on the context of the emergency and make decisions based on what is present in the area. Hazmat: Hotzone uses maps in a slightly different way, still giving context to users but on a much more local level (Schell et al., 2005). By showing the locations of victims, of the source of the hazardous material, and of the crew, the game lets users think about how best to handle the situation given where everything is located relative to everything else.

Although GIS and spatial components are critical to disaster training serious games, other factors seen in released games strengthen a game's viability as a training tool. One of these is the inclusion of a stress component to portray the reality of the situation at hand. There are several ways to introduce stress within a gaming context, one of which is using time as a key game mechanic. Haferkamp et al. (2011) demonstrate this in the game DREAD-ED, which limits the time for team discussion and decision-making, giving the team between 30 and 45 min to reach a decision. To add a further stress component, the game displays four scales to the players, which change based on the decisions they make, giving real-time feedback after every move; both poor and wise decisions come with feedback. The tactical decision games created by Crichton and Flin (2001) rely on similar components, allowing only an hour and a half for participants to work through the game completely and introducing contingencies throughout.
The time component emphasizes the need for emergency responders to act quickly in the face of a disaster. In addition to a time limit, stress can also be introduced through an element of unpredictability. Created in conjunction with VSTEP and several agencies around Europe, RescueSim is a flexible gaming environment that is controlled strictly through an instructor toolbox (VSTEP B.V., n.d.). The instructor not only creates the original scenario presented to the players but can also change the weather in real time, show the progression of an incident as it would look in real life, and introduce secondary events stemming from the primary one. None of these changes can be predicted by the players. SPOEL, a virtual environment created for the management and training of mass evacuations, portrays stress in a slightly different manner, using changes in human behavior as well as resource distribution and management as its primary sources of stress (Kolen et al., 2011). Victims within the game can change their opinions and actions based on the media and the decisions of the emergency crews.

Fig. 2 Screen shot of Incident Commander from https://youtu.be/Gc1CnfQKkZc.


Road systems are also a limited resource, as they can degrade within the game or become too congested to use as viable evacuation routes.

Another component present in released disaster management games is the use of news stories or information recaps within the game scenario. Information provided to the players throughout the scenario is crucial to their ability to fully understand what is going on as the incident unfolds around them. IMACSIM provides this through the use of waypoints (Benjamins and Rothkrantz, 2007). As users make their way through the simulated environment, they can visit numerous waypoints that provide information on the current state. These waypoints are flexible across scenarios, meaning that they can fit a variety of different conditions and emergencies, and they can also accurately reflect any changes that occur during game play. Disaster in my Backyard also takes advantage of the opportunity to introduce information throughout the game, using QR codes and victims as the information sources (Meesters and van de Walle, 2013). Set up as a live walk-through game, this scenario is much more hands-on in its information presentation. As players make their way from start to finish, they can interact with actors playing victims within the game, receiving various amounts of information as they do so. Similarly, participants are given an app that lets them scan QR codes placed throughout the game environment. These QR codes contain relevant information and enable communication between people as the game plays out.

1.25.2.2 Other Examples of GIS and Serious Games

GIS and its many tools have several applications in serious games outside the disaster management field as well. Ahlqvist et al. (2012), for example, provide an excellent example of using GIS technology to incorporate real-time data streams into a massive online simulation geared toward student learning about human–environment relations.

Additionally, serious GIS games are not necessarily only for student learners or professionals in training but can also target the general public. Qamar et al. (2014) present a unique example of such a game, with a novel idea for an "immersive map navigation experience" for patients undergoing physical therapy for hemiplegia. This gaming environment involves map navigation driven by both the Leap Motion controller and the Microsoft Kinect. The 3D motion sensors noninvasively detect hand therapy motions and joint movements and convert them into movements within the game interface. The serious gaming environment is advantageous for both patients and therapists because of its lightweight web interface and its use of cloud-based storage, which allow patients to use it anywhere and at any time. The GIS game interface consists of either a 2D or a 3D map that can be browsed via the patient's gestures. The two spatial representations require different controls to navigate the map interface. To navigate the 2D interface, the user must "grab" the map by clenching the fist and moving the fist in the desired direction (up, down, left, or right); a minimal sketch of this grab-and-pan interaction appears at the end of this section. 3D navigation, which is represented by a freely moving kite on the screen, is driven by more specific hand gestures that enable controls such as zooming out. Serious gaming environments such as this one show major potential for making therapy sessions more entertaining and immersive for patients.

Serious game environments have additionally been implemented with public participation GIS (PPGIS). Conceptually founded in the 1990s, PPGIS aims to enable communication between decision makers and local groups through GIS and geographical education (Poplin, 2012). Poplin (2012) studied a specific implementation of web-based PPGIS within an online questionnaire for residents of Wilhelmsburg, Germany. In this questionnaire, participants were asked questions concerning their relationship with the canals of the quarter. These questions involved having participants draw points and lines on an interactive map interface, marking locations such as a participant's favorite place in Wilhelmsburg. In the conclusion of the study, Poplin (2012) asks whether serious games could be integrated into public participatory processes to hold participants' interest in the decision-making process. The integration of serious gaming into PPGIS has indeed been implemented in recent years. In a later study, Poplin (2014) examined the design of a serious PPGIS game called "B3 – Design Your Marketplace!" This serious game provides a playful environment in which citizens can submit their own designs for a public space based on current information received about the city district. In addition to submitting designs, citizens can vote for their favorite designs and communicate with fellow participants as well as urban planning experts. Serious gaming environments such as "B3" integrated into public participatory processes not only foster citizens' interest in urban planning and decision-making but also have pedagogical value, educating participants about geospatial technologies in an engaging manner.
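Returning to the gesture-controlled map navigation of Qamar et al. (2014), the following is a minimal, hypothetical sketch of the 2D "grab and pan" logic described above. The GestureFrame and MapView types and their fields are illustrative assumptions, not the actual Leap Motion or Kinect SDK.

```python
# Hypothetical sketch of the 2D "grab and pan" interaction described above.
# GestureFrame and MapView are illustrative stand-ins for whatever a
# Leap Motion/Kinect SDK and map widget actually expose.
from dataclasses import dataclass

@dataclass
class GestureFrame:
    hand_closed: bool  # True while the fist is clenched ("grab")
    dx: float          # hand displacement since the last frame (screen units)
    dy: float

@dataclass
class MapView:
    center_x: float    # map center in projected coordinates
    center_y: float
    scale: float       # map units per screen unit

    def pan(self, dx_screen: float, dy_screen: float) -> None:
        # Move the map center opposite to the hand motion so the map
        # appears to be "dragged" in the direction of the fist.
        self.center_x -= dx_screen * self.scale
        self.center_y += dy_screen * self.scale

def handle_frame(view: MapView, frame: GestureFrame) -> None:
    # Only pan while the fist is clenched; an open hand releases the map.
    if frame.hand_closed:
        view.pan(frame.dx, frame.dy)
```

The key design point is that the open hand acts as a "release," so only deliberate, fist-clenched motion moves the map.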

1.25.3 GIS and Serious Game Case Studies

In the following sections, we draw upon two GIS and serious game case studies based on the research of the authors of this article. These case studies serve as examples of both the incorporation of GIS technology into real-time live simulations and the development of digital GIS serious games.

1.25.3.1 Emergency Simulation Exercises for Capacity Development Within Postgraduate Education

Disaster risk results from environmental hazards of diverse nature interacting with social-ecological systems and the different vulnerabilities of societies and populations (UNISDR, 2015). Policy makers and practitioners around the globe are increasingly realizing the importance of bringing together the disaster risk reduction (DRR) and climate change adaptation (CCA) communities, as well as the sustainable development goals (SDGs). Mainstreaming CCA and DRR into SDG activities is considered a strong and valuable measure to reduce vulnerability and progressively increase resilience (Harris and Bahadur, 2011). A key barrier to improving resilience to disaster risk and climate change impacts at the national level, however, has been identified as a lack of sound scientific knowledge, operational capacity, and technical expertise, often resulting from the absence of sustainable, quality-assured formal training programs in DRR- and CCA-related sectors (Jordan et al., 2010). The increasing complexity of operational environments in global DRR and emergency response in general, combined with the growing challenges facing the humanitarian system at large, also reinforces the need for knowledgeable, skilled, and readily available personnel.

Since 2013, the United Nations University Institute for Environment and Human Security (UNU-EHS) in Bonn, Germany, together with the Department of Geography, University of Bonn, has offered the Joint Master of Science (M.Sc.) Program "Geography of Environmental Risks and Human Security" as an international degree program with a research-oriented profile. This program provides an in-depth introduction to problem-oriented research methods, theories, and concepts, as well as to real-life challenges and problems that international and UN organizations face. The curriculum includes research areas such as vulnerability assessment, resilience analysis, risk management and adaptation strategies within linked human–environment systems, and environmentally induced internal displacement and transboundary migration. Alongside these major subjects, students are also trained in emergency response and preparedness.

A small-scale emergency simulation exercise has been designed around a fictional disaster event, with the purpose of understanding the implementation of standards and procedures that come into effect during a real emergency, as well as training the application of geospatial tools for mapping and assessment purposes in the field. Basic geospatial tools are used to practice an emergency assessment and to improve coordination and communication among different stakeholders. At the same time, this simulation exercise was the basis for the development of the digital serious games approach, as outlined in a number of former publications (Blochel et al., 2013; Mathews et al., 2014; Tomaszewski et al., 2014, 2016) and further discussed in the ""Serious" GIS (SerGIS)" section of this article.

1.25.3.1.1 The simulation exercise scenario

On June 9, 2013, a thunderstorm with heavy rainfall and strong winds caused large-scale flooding along the Rhine River; the city of Bonn was particularly affected (Fig. 3). The river level rose to 8 m, almost 4 m above normal. The game scenario narrative is as follows. Two children have been reported drowned, numerous persons injured, and hundreds of people displaced. A first rapid assessment reports that infrastructure was heavily affected, and several roads, railways, and bridges are closed due to flooding and fallen trees. Detailed GPS data for the reported sites are not available. A hospital located near the river is overcrowded and does not have the capacity to receive more patients. A water treatment plant has been severely flooded, and all electrical systems, including back-up pumps, have stopped working. The city's water authority is already receiving customer complaints about cloudy, poor-tasting drinking water. Strong wind gusts have cut down power lines, seriously paralyzing communication and transportation networks. In addition, a ship carrying 200,000 gal of crude oil and barrels of toxic chemicals has been reported wrecked at the South Bridge, which connects Bonn Rheinaue with Bonn-Oberkassel, with oil spilling from the damaged cargo tank into parts of the Rheinaue park area (Fig. 4). Details about the toxic content of the barrels are unknown. According to technical experts, the toxic chemicals may reach the water treatment plant within a short time due to the rapid current of the Rhine River. The official German Weather Service forecasts continued heavy rainfall combined with strong winds over the coming days. The game player then engages in the following roles. In support of the German government, the United Nations Office for the Coordination of Humanitarian Affairs has deployed a United Nations Disaster Assessment and Coordination (UNDAC) team.

Fig. 3 The Rhine River in western Germany has a long history of flooding, as seen in this 1930 image from Cologne, Germany, which is north of Bonn.

Fig. 4 The Bonn Scenario: toxic waste barrels that have washed up on shore from the flooded Rhine River.

UNDAC has established an On-site Operations Coordination Centre (OSOCC), and the team supports the activities of the German authorities in charge of disaster management and civil protection, such as firefighters; emergency medical personnel, including public health authorities; the Federal Office of Civil Protection and Disaster Assistance (BBK); the German Federal Agency for Technical Relief (THW); federal agencies such as the Environmental Protection Agency; state drinking water agencies; the German Red Cross; and local law enforcement. An additional CBRN (chemical, biological, radiological, and nuclear defense) team from Rotterdam, The Netherlands, has arrived on-site to support the BBK team with the identification and recovery of hazardous materials (HAZMAT).

1.25.3.1.2 The simulation exercise (SimEx)

In the simulation exercise (SimEx), the students are divided into different groups representing members of the UNDAC team. Their task is the assessment of four hot spot sites that pose a particularly high risk to the environmental system and the population: water, biohazard, oil spill, and infrastructure. Equipped with GPS devices, the students have to locate the four hot spots and assess their extent. Specific GIS functions, such as buffering, and further tools for combining buffered areas (union, intersection, etc.) are used to display different dissolve types. The collected GPS data (waypoints and tracks) are then analyzed, reproduced, and visualized in a map. Regular injects by the exercise control team additionally challenge the students and make the exercise more realistic. Finally, an emergency assessment report, including a map of the affected area, is requested for submission to the OSOCC.
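To make the buffering-and-combining step concrete, the following is a minimal sketch of such an analysis using the Python shapely library. The waypoint coordinates and impact radii are fictitious placeholders, not data from the actual exercise.

```python
# Minimal sketch of the buffer-and-combine workflow described above,
# using shapely. Coordinates and radii are fabricated placeholders.
from shapely.geometry import Point
from shapely.ops import unary_union

# Hot spot waypoints (projected coordinates in meters, e.g., UTM zone 32N)
hot_spots = {
    "water":          Point(365200, 5620100),
    "biohazard":      Point(365450, 5620380),
    "oil_spill":      Point(365900, 5620050),
    "infrastructure": Point(365600, 5619800),
}

# Buffer each hot spot by an assumed impact radius (meters)
impact_radius = {"water": 300, "biohazard": 150,
                 "oil_spill": 500, "infrastructure": 200}
buffers = {name: pt.buffer(impact_radius[name])
           for name, pt in hot_spots.items()}

# Union dissolves overlapping buffers into one affected area;
# intersection reveals zones threatened by more than one hazard.
affected_area = unary_union(list(buffers.values()))
overlap = buffers["water"].intersection(buffers["oil_spill"])

print(f"Total affected area: {affected_area.area / 1e6:.2f} km^2")
print(f"Water/oil-spill overlap: {overlap.area / 1e6:.3f} km^2")
```

In the field exercise itself these operations would be run in a desktop GIS, but the underlying geometric operations (buffer, union, intersection) are the same.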

1.25.3.1.3 The pedagogic value

Simulation exercises, including role-play activities, form part of a broader set of learning techniques (Shaw, 2010), relating in particular to approaches to experiential and evidence-based learning (Petty, 2009). The United Nations University Bonn recognized the high pedagogical value of simulations and the fact that role-play exercises can be used to achieve a variety of learning objectives, ranging from content and substantive knowledge to critical thinking and problem solving. From an educational point of view, integrating active learning exercises into the face-to-face classroom allows instructors to achieve several objectives that benefit students: (1) promoting student interaction and input; (2) promoting students' creativity, curiosity, and interest; (3) creating an alternative way of presenting course material; and (4) applying theories and concepts introduced in class in the field.

Within this particular scenario, the teams involved practice coordination, information management, needs assessment, safety and security procedures, cultural awareness, communications/media management, logistics, personal preparedness and self-sufficiency, as well as the individual organizations' tasks and mandates. Of particular importance is awareness and understanding of their own individual roles and responsibilities, and being reasonably comfortable with them (UK Cabinet Office and National Security Intelligence, 2014). The latter aspect is also crucial with regard to the serious games approach based on the scenario: when participants earnestly adopt their roles and responsibilities within such a game, the exercise becomes more realistic and turns into a genuine learning experience. In this respect, such teamwork training conducted within simulated environments, either physically or virtually, may offer an added benefit over more traditional didactic approaches, as it enhances the performance of the actors involved and may also help to avoid or reduce errors during real events.

1.25.3.1.4 The added value of geospatial applications in the field

For most people, a mobile phone with built-in GPS and geo-browser functions has become an everyday tool. But how can we use geospatial data as tangible sources of information for disaster response and preparedness? How can we collect geospatial data appropriately in the field? How can we interpret and analyze data even under adverse conditions and time pressure? The students find answers to these questions during the exercise. "The faster emergency responders are able to collect, analyse, disseminate and act on key information, the more effective and timely will be their response, the better needs will be met and the greater will be the benefit to the affected populations" (Van de Walle and Turoff, 2007).

GIS professionals directly supporting response operations in the field, such as the skilled humanitarian volunteers of the UK NGO MapAction (http://www.mapaction.org/), collect accurate and timely information that is extremely relevant, especially in the initial, time-critical phase of an emergency. Large amounts of up-to-date primary data can be rapidly gathered using adequate field equipment and quickly shared among the relevant actors. Final products, such as visualized maps, help stakeholders and responders to better estimate the impact of a disaster and therefore limit damage, understand the root causes of the event, and ultimately facilitate the decision-making process. Emergency response exercising is therefore a favorable way to build upon lessons learned from previous events, to progressively improve the understanding of prospective emergency events, and to learn how to better protect populations and societies from adverse consequences in the future.

1.25.3.2 "Serious" GIS (SerGIS)

The SerGIS project is an ongoing effort to develop a system for creating and playing games based on geographic information systems (GIS). It differs from the previous case study in that it is a purely virtual environment. SerGIS has been developed with a particular emphasis on using gaming concepts both to demonstrate the capabilities of GIS tools and to build spatial thinking ability (Tomaszewski et al., 2016). This is an important point, as SerGIS is not a GIS software training tool (although the use of SerGIS could provide a basis for GIS software training). Another important feature of SerGIS is that it has been designed to allow custom authoring of game scenarios. As an educational device this is important, as it allows a high level of flexibility in developing a wide range of serious GIS game types.

To develop a serious game in the SerGIS environment, the user creates a series of game question/answer prompts (Fig. 5). The order of these prompts, which can be rearranged, determines the flow of the game itself. Each SerGIS prompt consists of four customizable properties: Map, Content, Choices, and Actions. The map section allows game authors to customize several properties of the map displayed in the prompt, such as its latitude, longitude, and zoom level. Using Esri's ArcGIS platform to access web-based GIS data, the author can set the basemap to any of the reference layers provided by ArcGIS Online. Additionally, authors can add map layers by entering the URL of a layer hosted on an ArcGIS server. In addition to the map properties, the game author can edit the content, choices, and actions segments of each prompt.

Fig. 5 The SerGIS game authoring prompt.

Fig. 6 The opening prompt of a SerGIS game.

These changes are then reflected in the JavaScript Object Notation (JSON) game data file, which the author can preview, save, and publish; a minimal sketch of such a prompt appears at the end of this section.

Upon loading the game, the user is presented with a visual representation of the first prompt, containing the current game scenario as well as the accompanying map (Fig. 6). Drawing on the map data and the question posed by the game scenario, the player is prompted to select from the choices displayed beneath the context. When the player makes a selection, the results of the decision are conveyed to the user. These actions range from an explanation of the choice made by the game player to using ArcGIS to draw map features or layers. The decisions a player makes may change his or her score, depending on the point values the game author assigns to each choice. The user then moves on to the next prompt, repeating the process until all of the authored prompts are finished. At the conclusion of the game, the user's accumulated score is shown, as well as the maximum number of points that could have been scored on each prompt.

A particularly exciting aspect of the SerGIS environment has been its use for student-to-student learning about GIS technology, games, and hazards (Tomaszewski and Griffin, 2016). In this regard, a group of GIS technology students from the Rochester Institute of Technology (RIT) in the United States conducted a collaborative learning activity about GIS technology via SerGIS with a group of undergraduate hazards students from the University of New South Wales (UNSW) in Canberra, Australia (Fig. 7). With only a minimal amount of instruction, the UNSW students were able to use the authoring prompt of SerGIS to create a wide variety of hazard-related games (Fig. 8). A significant learning outcome was that the UNSW students were able to think spatially about disasters, gain a better understanding of spatial representations in GIS, and develop ideas for further training and education in GIS software technical skills.
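To illustrate the prompt structure described above, here is a minimal, hypothetical sketch of what a single prompt might look like in the JSON game data file. The field names and values are illustrative assumptions based on the four prompt properties (Map, Content, Choices, Actions), not the actual SerGIS schema.

```python
# Hypothetical sketch of one SerGIS-style prompt, serialized as JSON.
# All field names, URLs, and point values are illustrative assumptions.
import json

prompt = {
    "map": {
        "latitude": 50.73,         # Bonn, Germany
        "longitude": 7.10,
        "zoom": 12,
        "basemap": "topographic",  # e.g., an ArcGIS Online reference layer
        "layers": ["https://example.arcgis.com/rest/services/flood/MapServer"],
    },
    "content": "A flood has closed the South Bridge. Where should the UNDAC "
               "team set up its on-site coordination centre?",
    "choices": [
        {"text": "Next to the flooded water treatment plant", "points": 0},
        {"text": "On high ground near the hospital",          "points": 10},
    ],
    "actions": [
        {"type": "explain", "text": "High ground keeps the OSOCC operational "
                                    "if the river continues to rise."},
        {"type": "draw", "feature": "buffer"},  # e.g., draw a map feature
    ],
}

print(json.dumps(prompt, indent=2))
```

Because the game file is plain JSON, a sequence of such prompts can be rearranged, previewed, and published without touching the game engine itself, which is what makes the custom authoring described above flexible.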

1.25.4 Evaluating Serious GIS Games

The case studies and reviews outlined earlier in this article show that GIS serious games have been acknowledged as a viable tool for education and training in professional fields. However, in order to advance serious GIS game design and to assist both educators and professionals in selecting appropriate games for instruction, it is necessary to evaluate the effectiveness of serious games.


Fig. 7 A student-to-student learning exchange about GIS for hazards using the SerGIS environment. In this figure, students from RIT are teaching students from UNSW how to author games in SerGIS.

Fig. 8 A serious GIS game created by an undergraduate student. This particular game is related to an air traffic control situation in Sydney, Australia. The student who created this game had no experience with GIS but was able to quickly build a game that utilized numerous GIS functions in order to learn about the capabilities of GIS for disaster management.

Several evaluation frameworks and methodologies have been proposed and utilized to help determine whether a serious game produces the desired learning outcomes. It is necessary not only to review these approaches but also to check whether they have been applied to existing games in the fields of GIS and crisis management. Key considerations for this section include: factors common to serious game evaluation, how these elements appear in proposed evaluative frameworks, and how these evaluative practices have been applied to existing crisis management and GIS serious games.


First, in describing the process of evaluating serious games, learning assessment has been defined as "using data to demonstrate that learning objectives are being met" (Bellotti et al., 2013). The flexible nature of video games makes such data collection a rather difficult and time-consuming process. Bellotti et al. (2013) discuss two common types of assessment with respect to serious games.

Summative assessment refers to evaluating a learner's achievements at the conclusion of an instructional process. A familiar method of summative assessment in educational research is the use of pre- and posttesting to measure learning outcomes. In a serious gaming context, pre- and posttesting tracks a player's knowledge of a particular subject before and after playing the serious game. A case study of the United Nations International Strategy for Disaster Reduction (UNISDR)'s disaster management serious game Stop Disasters! demonstrates this evaluative process. Stop Disasters! is a free-to-play serious game developed by UNISDR that simulates disasters in five scenarios. The intended pedagogical goal of the game is to raise the player's awareness of the underlying causes of these disasters, as well as of measures for preventing and mitigating their impact. To address the gap in studies evaluating the impact of social-awareness serious games, Pereira et al. (2014) conducted an evaluation study of Stop Disasters! using several evaluative methods, one of them summative. In the user study, 27 people played through the wildfire scenario of the game twice on easy difficulty. The study used a pre- and posttest design to measure the difference in participants' knowledge before and after playing the game (a minimal sketch of such a pre/posttest analysis appears at the end of this section). The questionnaire asked four questions to assess the player's knowledge of four topics: vacant land management, inhabited land management, community management initiatives, and community mechanisms. The evaluation found statistical evidence suggesting that the game had a positive impact on players' awareness of the four topics covered by the pre/posttest questionnaire.

Summative assessment is generally easy to implement and provides statistical, quantifiable evidence of a serious game's effectiveness. However, methods of summative assessment such as pre- and posttesting are, on their own, not a clear indicator of a serious game's effectiveness. Summative assessment fails to take into account that players can cheat or simply guess the correct answer (Bellotti et al., 2013). Additionally, evaluating learning outcomes based on whether or not a player has completed the game does not reveal whether he or she has actually learned the content or has merely learned how to beat the game (Bellotti et al., 2013).

Although summative assessment alone cannot adequately evaluate game-based learning, it can be combined with formative assessment to develop a comprehensive evaluation process. Formative assessment is implemented throughout the learning process and continuously monitors a learner's progress. Bellotti et al. (2013) suggest that the immediate feedback provided by formative assessment is particularly useful in the domain of serious game evaluation. According to the general evaluation framework proposed by Mayer et al. (2014), methods of in-game assessment and data gathering should be unobtrusive to the player experience. Shute et al. (2009) describe unobtrusive evaluation in serious games as "stealth assessment," which integrates player performance evaluation in such a way that the player is unaware of the process. Stealth assessment indirectly measures a player's learning progress by requiring the player to use knowledge from previous areas of the game to advance farther. Unlike traditional questionnaires, in-game assessment does not directly measure player knowledge and is less disruptive to the user experience. Player performance can be discreetly evaluated through objective measures such as the "information trails" introduced by Loh et al. (2007), which transform in-game data into observable player actions. These player actions are then analyzed to gain a greater understanding of a player's thought process during the game (Loh et al., 2007). In the field of GIS games and training, Metello and Casanova (2009) emphasize that geospatial games must provide the tools necessary to evaluate player performance during gameplay. On the most basic level, these tools would provide a timeline of the player's decisions for each of the game's scenarios. In scenarios with predefined procedures, the game can compare the player's decisions against those procedures, although the predefined procedures and plans are themselves, like the serious game, subject to evaluation.

In addition to a game's educational value, one must assess its qualitative aspects. One of the primary challenges of serious game design is striking a sufficient balance between pedagogy and learner engagement (Rooney, 2007); favoring pedagogy in serious games risks a loss of interest and motivation by the player (Boughzala et al., 2013). Playtesting with end users, an essential aspect of the game design process, is an effective means of assessing the value of qualitative features such as challenge, enjoyment, and narrative. When playtesting, providing participants with detailed questionnaires is one common method of qualitative evaluation. These questionnaires often use a Likert scale to ascribe a relative quantitative value to qualitative data. In addition to summative assessment, Pereira et al.'s (2014) Stop Disasters! case study also measured the subjective experience of the game using a questionnaire. Participants answered a modified version of an Intrinsic Motivation Inventory (IMI) questionnaire, which measured user experience along five dimensions: interest/enjoyment, perceived competence, effort/importance, value/usefulness, and pressure/tension. Players rated sentences pertaining to each dimension on a seven-point Likert scale. At the end of the procedure, users were permitted to express free comments about the game. While players praised the interface, concept, and informative feedback of Stop Disasters!, they criticized the game's lack of a zoom mechanism and the absence of an ability to test fires in certain locations. The qualitative evaluation (the IMI questionnaire) suggested that Stop Disasters! is an enjoyable and useful game in a low-tension gaming environment (Pereira et al., 2014).

Playtesting in the context of qualitative serious game evaluation also requires feedback from a diverse range of users. These end users may differ in many variables, including but not limited to age, gender, experience in the general knowledge area, and experience with computers and digital games.
In her paper detailing the development of "B3 – Design Your Marketplace!," an urban planning serious game in the emerging field of PPGIS, Poplin (2014) notes that the game's final playtest was conducted with two groups. One group consisted entirely of senior citizens with varying degrees of computer experience, while the other consisted of university students studying urban planning. Both groups were asked to perform two tasks in this urban planning game and to answer a questionnaire concerning the game's qualitative aspects. From the evaluation, it was concluded that most of the participants appreciated the concept of a serious game for participating in the urban planning process, as well as the game's graphics.


However, the playtesting group of senior citizens had several notable problems accomplishing the two designated tasks, problems related to the participants' computer skills. This study of the serious GIS game "B3 – Design Your Marketplace!" suggests that if a serious game does not target a narrow, well-defined audience, usability testing should be conducted with a diverse set of users.

Another method proposed for the qualitative analysis of serious games is De Freitas and Oliver's (2006) Four-Dimensional Framework. Designed to assist teachers in evaluating the worth of serious games, the framework consists of four dimensions: context, learner, pedagogic considerations, and modes of representation. In particular, the "modes of representation" dimension focuses on the subjective features of the game, as well as the game's overall diegesis. De Freitas and Oliver (2006) acknowledge that video games are unique in the central role played by diegesis (the world within the narrative). Relevant considerations concerning the diegesis of a serious game include the game's interactivity, fidelity, and realism. This is especially true for serious GIS games involving training scenarios, in which the overall effectiveness of the game's instruction depends greatly on the realism of the scenario's simulation. It is necessary to evaluate the level of these qualitative aspects required to support learning outcomes.

An evaluation process commonly applied in serious games research is the facilitation of postgame debriefing sessions. According to De Freitas and Oliver (2006), it is essential to distinguish immersion in a virtual world from the processes used to reflect on the experience. In other words, one must not only play a game but also be critical of the process, in order to reflect on one's relationship with the diegesis from outside it. This "double" identification approach highlights the importance of debriefing as an evaluative tool for serious games (De Freitas and Oliver, 2006). Citing a study in which student learning facilitated by the game Savannah did not match the curriculum, De Freitas and Oliver (2006) argue that one reason for this mismatch was the lack of a clear debriefing session during the pilot project. Andrés et al. (2014) address a similar fault in their evaluative study of TimeMesh, stating that omitting a postgame class discussion of the learned material limited the learning outcomes. Bellotti et al. (2013) note the potential of video and screen recordings to facilitate debriefing sessions: participants can reveal additional information about why they chose to take specific recorded gameplay actions during the debriefing process.

There is little difference between the processes used for evaluating serious games in the area of GIS and those used for serious games in general. To maximize the effectiveness of serious game assessment, however, multiple methodologies should be integrated into the evaluation process. For example, the learning outcome of a serious GIS game should not be evaluated by a pre/posttest design alone but should also be evaluated through debriefing sessions and qualitative analysis. An evaluation study or framework that assesses the educational value, game design quality, and in-game player performance of a serious game may prove essential to educators seeking to integrate serious games into both educational and professional settings.
One significant difference in the evaluation of serious games for GIS is the evaluation of how a serious GIS game enhances spatial thinking. GIS has long been recognized as a spatial thinking support system (National Research Council, 2006). Tomaszewski et al. (2016) present some of the first research to consider the relationship between the serious GIS gaming experience and spatial thinking. In this research, participants were asked to "think aloud" spatially while playing the game, that is, to verbally describe their spatial thinking process. Results indicated that students with spatially oriented backgrounds, such as engineering, performed as well as, if not better than, students from nonspatially oriented backgrounds who had GIS classroom experience.
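To make the summative pre/posttest analysis referenced earlier in this section concrete, the following is a minimal sketch in Python. The scores are fabricated placeholders rather than the Pereira et al. (2014) data, and the paired t-test shown is one reasonable choice of significance test, not necessarily the one those authors used.

```python
# Minimal sketch of a summative pre/posttest analysis for a serious game,
# assuming one knowledge score per participant before and after play.
# The scores below are fabricated placeholders, not real study data.
import numpy as np
from scipy import stats

pre  = np.array([2, 1, 3, 2, 2, 1, 3, 2, 1, 2])  # scores before playing
post = np.array([3, 2, 4, 3, 2, 3, 4, 3, 2, 3])  # scores after playing

# A paired t-test asks whether the mean within-participant gain differs
# from zero; a small p-value suggests the game affected player knowledge.
t_stat, p_value = stats.ttest_rel(post, pre)
print(f"mean gain = {np.mean(post - pre):.2f}, "
      f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

As the text above cautions, a significant gain alone does not distinguish genuine learning from guessing or learning to beat the game, which is why such summative results should be combined with formative, in-game, and qualitative evidence.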

1.25.5 A GIS and Serious Game Research Agenda

The vibrant and exciting world of GIS and serious games offers numerous research directions. The following research agenda can ideally guide and inform basic and applied research on GIS and serious games.

1.25.5.1 Geo-Gamification

Further research should be conducted on combining gaming concepts with GIS analytic and representational capabilities. Such research could start with further examination of current gamification best practices, such as the use of virtual reality, scoring, and rewards and incentives within the gaming experience, all of which can potentially influence the learning, training, or outcomes of a serious game. Beyond this, deeper ethnographic research could examine how people in various domains interact with spatial data and representations on various virtual and computing platforms, and how those interactions could then be gamified into a serious gaming experience.

1.25.5.2 Spatial Representations and Serious Games

As discussed in the "Spatial Representations and Serious Games" section, 2D and 3D spatial representations are critical aspects of the GIS serious gaming experience. Further research should be conducted on the appropriateness of 2D versus 3D representations for specific gaming tasks, and on how those representations contribute to the overall serious gaming experience and learning outcomes. Research on spatial representations in serious games should also investigate how decision-making is affected by the type of spatial representation provided in the serious game.

1.25.5.3 Expert Knowledge Incorporation Into Serious GIS Games

As was demonstrated in this article, serious GIS games are closely coupled with application domains such as disaster management. This close coupling creates numerous research opportunities for incorporating expert knowledge into serious GIS games. For example, the SerGIS environment outlined in the ""Serious" GIS (SerGIS)" section provides the ability to give feedback on individual game choices made by players. This feedback is critical both to the learning experience the serious game provides and to validating the relevance and efficacy of the game as a training and learning tool. Expert knowledge can be incorporated in a variety of ways, such as literature reviews of practitioner materials, participant observation, interviews, and other knowledge elicitation techniques. These techniques are basic research methods in themselves that can be taught to undergraduate and graduate students. Thus, serious games can also serve as an indirect way of building research methods and skills tied to the broader and, from a student's perspective, more engaging purpose of building and creating serious games.

1.25.5.4 Evaluating Serious GIS Games

Evaluation methods should be developed that account for the unique spatial aspect of serious GIS games. These methods should not only incorporate the summative and formative assessments used in evaluating serious games in general (as discussed in the "Evaluating Serious GIS Games" section) but should also consider how the nascent field of spatial thinking evaluation (Kim and Bednarz, 2013; Lee and Bednarz, 2012) can be incorporated into serious game evaluation.

1.25.5.5 Technology Research

Numerous technical research directions exist for serious GIS games. The case studies provided earlier in this article demonstrated the use of existing commercial GIS tools to develop a serious GIS gaming experience. This is important, as the data sets and tools used by the game players are the same as those used by GIS professionals. These gaming experiences relied on desktop and web-based GIS tools. However, much more work can be done on incorporating real GIS tools and real GIS data on mobile platforms, wearable computing, and as-yet-unknown forms of computing that will emerge in the next 5 years. For example, serious GIS games could (a) incorporate real data streams collected from an actual disaster and (b) be played on a mobile device in the field, as discussed in the "Emergency Simulation Exercises for Capacity Development Within Postgraduate Education" section.

1.25.6 Summary and Conclusions

In this article, we introduced the idea of GIS and serious games. We provided context for this idea through definitions of games, serious games, the difference between serious games and simulations, the idea of gamification, and spatial representations in serious games. Given the close coupling of serious games to application domains, we drew heavily upon the domain of disaster management for the literature review and case studies that provide specific examples of GIS and serious games. In particular, we presented GIS and serious game case studies drawn from the research experiences of the authors. The first was an example of capacity development using emergency simulation exercises that incorporated geospatial tools into a real-time simulation exercise. The second was a virtual serious GIS game environment, "Serious" GIS (SerGIS), developed to allow flexible creation of gaming scenarios using real GIS tools. We then discussed the evaluation of spatial games as a basis for further research on the development and evaluation of new serious GIS games. We concluded with a GIS and serious game research agenda that can ideally provide ideas for further developing both the theoretical and the methodological basis of GIS for serious games. Although maps have been, and will continue to be, an important part of gaming experiences, we argued that the analytical and representational power of GIS, combined with the ideas of serious games, can provide a powerful and novel method for training, learning, spatial thinking development, and problem solving in a wide variety of application domains. Ideally, further research and development on GIS and serious games can take advantage of the ever-expanding volume of spatial data being created, combined with new technological platforms, to address pressing societal problems.

References

Ahlqvist, O., Loffing, T., Ramanathan, J., Kocher, A., 2012. Geospatial human-environment simulation through integration of massive multiplayer online games and geographic information systems. Transactions in GIS 16, 331–350.
Andrés, P.M.L., Arbeloa, F.J.S., Moreno, J.L., De Carvalho, C.V., 2014. TimeMesh: Producing and evaluating a serious game. In: Proceedings of the XV International Conference on Human Computer Interaction. ACM, p. 100.
Bayliss, J.D., Schwartz, D.I., 2009. Instructional design as game design. In: Proceedings of the 4th International Conference on Foundations of Digital Games. ACM, pp. 10–17.
Bellotti, F., Kapralos, B., Lee, K., Moreno-Ger, P., Berta, R., 2013. Assessment in and of serious games: An overview. Advances in Human-Computer Interaction 2013, 1.
Benjamins, T., Rothkrantz, L., 2007. Interactive simulation in crisis management. In: Proceedings of the 4th International Conference on Information Systems for Crisis Response and Management (ISCRAM 2007), Delft, Netherlands, pp. 571–580.
Blochel, K., Geniviva, A., Miller, Z., Nadareski, M., Dengos, A., Feeney, E., Mathews, A., Nelson, J., Uihlein, J., Floeser, M., Szarzynski, J., Tomaszewski, B., 2013. A serious game for measuring disaster response spatial thinking. ArcUser 16, 12–15.
Boughzala, I., Bououd, I., Michel, H., 2013. Characterization and evaluation of serious games: A perspective of their use in higher education. In: 2013 46th Hawaii International Conference on System Sciences (HICSS). IEEE, pp. 844–852.
BreakAway Ltd., 2012. Serious games for homeland security: Incident Commander™ NIMS-compliant training tool for Homeland Security [Online]. Available: http://www.breakawaygames.com/serious-games/solutions/homeland/ (Accessed: 26 November 2014).
Chen, R., 2011. The development of 3D city model and its applications in urban planning. In: 2011 19th International Conference on Geoinformatics. IEEE, pp. 1–5.
Coonradt, C.A., Nelson, L., 2007. The game of work. Gibbs Smith, Layton, UT.
Crichton, M., Flin, R., 2001. Training for emergency management: Tactical decision games. Journal of Hazardous Materials 88, 255–266.
De Freitas, S., Oliver, M., 2006. How can exploratory learning with games and simulations within the curriculum be most effectively evaluated? Computers & Education 46, 249–264.
Deterding, S., Dixon, D., Khaled, R., Nacke, L., 2011. From game design elements to gamefulness: Defining "gamification". In: Proceedings of MindTrek '11. ACM, pp. 9–15.
Djaouti, D., Alvarez, J., Jessel, J.-P., Rampnoux, O., 2011. Origins of serious games. In: Serious games and edutainment applications. Springer, London, pp. 25–43.
Egenfeldt-Nielsen, S., Smith, J.H., Tosca, S.P., 2016. Understanding video games. Routledge, New York, NY.
Granlund, R., 2001. Web-based micro-world simulation for emergency management training. Future Generation Computer Systems 17, 561–572.
Haferkamp, N., Kraemer, N.C., Linehan, C., Schembri, M., 2011. Training disaster communication by means of serious games in virtual environments. Entertainment Computing 2, 81–88.
Harris, K., Bahadur, A., 2011. Harnessing synergies: Mainstreaming climate change adaptation in disaster risk reduction programmes and policies.
Herbert, G., Chen, X., 2015. A comparison of usefulness of 2D and 3D representations of urban planning. Cartography and Geographic Information Science 42, 22–32.
Huizinga, J., 1938. Homo ludens: Proeve eener bepaling van het spel-element der cultuur. Tjeenk Willink, Haarlem.
Jordan, A., Huitema, D., Van Asselt, H., Rayner, T., Berkhout, F., 2010. Climate change policy in the European Union: Confronting the dilemmas of mitigation and adaptation? Cambridge University Press, Cambridge, UK.
Kim, M., Bednarz, R., 2013. Development of critical spatial thinking through GIS learning. Journal of Geography in Higher Education 37 (3), 1–17.
Kolen, B., Thonus, B., Zuilekom, K., De Romph, E., 2011. Evacuation: A serious game for preparation. In: 2011 IEEE International Conference on Networking, Sensing and Control (ICNSC), Delft, Netherlands. IEEE, pp. 317–322.
Lee, J., Bednarz, R., 2012. Components of spatial thinking: Evidence from a spatial thinking ability test. Journal of Geography 111, 15–26.
Loh, C.S., Anantachai, A., Byun, J., Lenox, J., 2007. Assessing what players learned in serious games: In situ data collection, information trails, and quantitative analysis. In: 10th International Conference on Computer Games: AI, Animation, Mobile, Educational & Serious Games (CGAMES 2007), pp. 25–28.
Mathews, A., Tomaszewski, B., Szarzynski, J., Vodacek, A., 2014. Disaster risk reduction spatial thinking: A serious games approach. In: 11th International Conference of the International Association for the Study of Information Systems for Crisis Response and Management (ISCRAM), University Park, Pennsylvania.
Mayer, I., Bekebrede, G., Harteveld, C., Warmelink, H., Zhou, Q., Ruijven, T., Lo, J., Kortmann, R., Wenzler, I., 2014. The research and evaluation of serious games: Toward a comprehensive methodology. British Journal of Educational Technology 45, 502–527.
Meesters, K., Van de Walle, B., 2013. Disaster in my backyard: A serious game introduction to disaster information management. In: Comes, T., Fiedrich, F., Fortier, S., Geldermann, J., Müller, T. (Eds.), Proceedings of the 10th International ISCRAM Conference, Baden-Baden, Germany.
Metello, M.G., Casanova, M.A., 2009. Training games and GIS. In: Research trends in geographic information science. Springer, Heidelberg, pp. 251–264.
National Research Council, 2006. Learning to think spatially: GIS as a support system in the K-12 curriculum. The National Academies Press, Washington, DC.
Pereira, G., Prada, R., Paiva, A., 2014. Disaster prevention social awareness: The Stop Disasters! case study. In: 2014 6th International Conference on Games and Virtual Worlds for Serious Applications (VS-GAMES). IEEE, pp. 1–8.
Petty, G., 2009. Evidence-based teaching: A practical approach. Oxford University Press, Cheltenham.
Poplin, A., 2012. Web-based PPGIS for Wilhelmsburg, Germany: An integration of interactive GIS-based maps with an online questionnaire. URISA Journal 24, 75–89.
Poplin, A., 2014. Digital serious game for urban planning: "B3 – Design Your Marketplace!". Environment and Planning B: Planning and Design 41, 493–511.
Qamar, A.M., Afyouni, I., Rahman, M.A., Rehman, F.U., Hussain, D., Basalamah, S., Lbath, A., 2014. A GIS-based serious game interface for therapy monitoring. In: Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM, pp. 589–592.
Rooney, P., 2007. Students@play: Serious games for learning in higher education.
Salen, K., Zimmerman, E., 2004. Rules of play: Game design fundamentals. MIT Press, Cambridge, MA.
Schell, J., Tellerman, S., Mussorfiti, L.T., 2005. Hazmat: Hotzone [Online]. Available: http://www.etc.cmu.edu/projects/hazmat_2005/people.php (Accessed: 26 November 2014).
Schwartz, D.I., 2008. Motivating engineering mathematics education with game analysis metrics. In: Proceedings of the ASEE Zone I Conference, West Point, NY.
Schwartz, D.I., Locke, K., Ross, D.O., Emeny, M., 2007. The future of wargaming: A componentized approach. In: Gauthier, J. (Ed.), Proceedings of the 2007 Huntsville Simulation Conference.
Shaw, C., 2010. Designing and using simulations and role-play exercises. In: Denemark, R.A. (Ed.), The International Studies Encyclopedia. Blackwell Publishing, London.
Shen, M., Carswell, M., Santhanam, R., Bailey, K., 2012. Emergency management information systems: Could decision makers be supported in choosing display formats? Decision Support Systems 52, 318–330.
Shute, V.J., Ventura, M., Bauer, M., Zapata-Rivera, D., 2009. Melding the power of serious games and embedded assessment to monitor and foster learning. Serious Games: Mechanisms and Effects 2, 295–321.
Tiwari, A., Jain, K., 2015. A detailed 3D GIS architecture for disaster management. International Journal of Advanced Remote Sensing and GIS 4, 980–989.
Tomaszewski, B., Griffin, A.L., 2016. Students learning about disaster situation training using serious games for GIS (SerGIS). In: Association of American Geographers Annual Conference, San Francisco, California.
Tomaszewski, B., Judex, M., Szarzynski, J., Radestock, C., Wirkus, L., 2015. Geographic information systems for disaster response: A review. Journal of Homeland Security and Emergency Management (Special Issue on Information and Communication Technology (ICT) and Crisis, Disaster, and Catastrophe Management) 12, 571–602.
Tomaszewski, B., Schwartz, D.I., Szarzynski, J., 2016. Crisis response serious spatial thinking games: Spatial think aloud study results. In: 13th International Conference of the International Association for the Study of Information Systems for Crisis Response and Management (ISCRAM), Rio de Janeiro, Brazil.
Tomaszewski, B., Szarzynski, J., Schwartz, D.I., 2014. Serious games for disaster risk reduction spatial thinking. In: Eighth International Conference on Geographic Information Science (GIScience 2014), Vienna, Austria.
UBM Technology Group, 2016. The Game Career Guide, list of schools [Online]. Available: http://www.gamecareerguide.com/schools/ (Accessed: 22 December 2016).
UK Cabinet Office and National Security Intelligence, 2014. Emergency planning and preparedness: Exercises and training [Online]. Available: https://www.gov.uk/guidance/emergency-planning-and-preparedness-exercises-and-training (Accessed: 31 May 2016).
UNISDR, 2015. Sendai framework for disaster risk reduction 2015–2030. United Nations Office for Disaster Risk Reduction, Geneva, Switzerland.
Van de Walle, B., Turoff, M., 2007. Introduction: Emergency response information systems: Emerging trends and technologies. Communications of the ACM 50, 29–31.
VSTEP B.V. (n.d.). VSTEP RescueSim: Incident Command Simulator [Online]. Available: http://vstepsimulation.com/product/rescuesim/ (Accessed: 31 May 2016).

1.26 Mobile GIS and Location-Based Services

Song Gao and Gengchen Mai, University of California, Santa Barbara, CA, United States © 2018 Elsevier Inc. All rights reserved.

1.26.1 Introduction
1.26.2 Definition of Mobile GIS
1.26.3 Key Components and Characteristics of Mobile GIS
1.26.3.1 Web-Based Spatial Data File Formats
1.26.3.2 Distributed Spatial Database
1.26.3.3 Mobile/Distributed Geocomputing Ability
1.26.3.4 Personalized Visualization Style
1.26.4 Positioning and Tracking Technologies for Mobile GIS
1.26.4.1 Outdoor Positioning Technologies
1.26.4.2 Indoor Positioning Technologies
1.26.4.2.1 WLAN
1.26.4.2.2 Bluetooth
1.26.4.2.3 Radio-frequency identification
1.26.4.3 Indoor Positioning Methods
1.26.4.3.1 Geometric approaches
1.26.4.3.2 Fingerprinting approaches
1.26.4.3.3 Statistical approaches
1.26.5 Mobile Databases and Field Data Collection
1.26.5.1 Mobile Databases
1.26.5.2 Field Data Collection
1.26.6 LBS and Applications
1.26.6.1 Google Maps
1.26.6.2 Waze
1.26.6.3 Yelp
1.26.6.4 Foursquare and Swarm
1.26.6.5 Uber
1.26.6.6 Nike Plus
1.26.6.7 STRAVA
1.26.7 Research Frontiers of Mobile GIS and LBS
1.26.8 Conclusions and Future Work
References

1.26.1 Introduction

With the fast development of mobile Web and computing technologies, as well as the increasing availability of mobile devices, mobile information technologies have had a revolutionary influence on human society. News, emails, microblogs, photos, videos, apps, and many other kinds of multimedia information can be easily accessed and shared with colleagues and friends through smartphones almost every day. In the mobile age, location-based services (LBS) play an important role in people's daily lives, for tasks such as searching for nearby points of interest (POI), wayfinding, and navigation. In the domain of geographic information systems (GIS), advanced mobile information technologies have lowered the fences of traditional enterprise GIS and enabled a variety of novel applications that improve positioning and tracking accuracy, efficient field data collection, real-time mapping, ground-truth validation, location intelligence and decision support, and so on (Abdalla, 2016; Lemmens, 2011). Geospatial information, spatial analysis, and spatial queries are no longer limited to a fixed environment but can be accessed at any place and at any time (Shi et al., 2009). However, mobile GIS and LBS face several challenges, such as small screens for data visualization, limited bandwidth and high network costs for transferring data, battery consumption for positioning and computing, and heterogeneous types and multilevel spatiotemporal resolutions of datasets. It is necessary for geographic information scientists and researchers to review and summarize recent progress in this fast-developing domain, with highlights on the core concepts of mobile GIS and LBS. In this article, we first introduce what mobile GIS is and describe its core components. Then the technology stack of mobile GIS and LBS is presented, followed by a review of several popular mobile GIS and LBS applications. Research frontiers and future development directions that need further investigation are then identified. Finally, we summarize the key points presented in this article and conclude this work.

1.26.2 Definition of Mobile GIS

A good science discipline starts with a good definition. However, this might not be the case for GIS, since it tends to be an integrated domain and there are many variations on the definition of GIS or GIScience. Researchers have offered comprehensive thoughts and comments on this topic. Although it is not the focus of this article, readers interested in a holistic view of this domain may dive into more detail through those references (Egenhofer et al., 2016; Goodchild, 1992, 2010; Mark, 2003; Tomlinson, 1987, 2007). In terms of functionality and usage, mobile geographic information systems (mobile GIS) extend traditional desktop GIS beyond the office and allow individuals and organizations to localize, collect, store, visualize, and even analyze geospatial data in both field and office environments. Mobile GIS applications can either store collected datasets in offline mode and upload them to a GIS server or cluster later on, or directly update features in existing Web GIS services on a cloud server infrastructure in real time via mobile devices. Any user, regardless of location and environment, can access the geospatial data and GIS capabilities over the Web, given an Internet connection. Under such a mobile-cloud architecture, it is convenient to keep geospatial data distributed and synchronized across varying locations on the Earth (as shown in Fig. 1), and to enable the sharing of GIS data and geospatial resources through the Web and mobile devices among colleagues of the same group (i.e., enterprise cloud) or any users on the Internet (i.e., public cloud).

1.26.3 Key Components and Characteristics of Mobile GIS

In this section, we present some key components of which mobile GIS should consist. Geographic information scientists and GIS engineers might have different opinions on this topic. Here, we try our best to summarize the important components based on both theoretical aspects and engineering practices:

• Mobile device: In general, this refers to personal digital assistants, smartphones, tablets, or even recently developed wearable devices like smart watches, which have limited analysis or computing capability but are sufficient for positioning, visualization, and navigation purposes.
• Apps: Similar to GIS software on the desktop, mobile GIS applications (also known as "apps") provide basic mapping functions and/or tools for collecting and storing geographic information. There are two types of apps: native and Web-browser apps. Native apps are programmed for a specific mobile operating platform (such as Android or iOS) and can take advantage of mobile operating system features, while Web-browser apps are usually implemented using HTML5 and JavaScript frameworks and can run on multiple operating platforms.
• Data/service layers: This is one of the most important and expensive components of mobile GIS. It usually consists of at least one basemap layer, which provides the geographic background of the current location of a mobile device, and thematic layers, which contain physical or socioeconomic properties such as population, traffic, and facilities. The data layers shown in the mobile GIS environment can be retrieved from offline mobile databases, plain-text files, or online Web GIS services.

Mobile devices have limited memory, computational power, battery life, and screen size, as well as inconvenient input support (Virrantaus et al., 2001). These limitations imply that mobile GIS might need some new technologies to overcome the challenges. Compared with traditional desktop GIS, mobile GIS has several characteristics, including (but not limited to) Web-based spatial data file formats, distributed spatial databases, mobile geocomputing ability, and personalized visualization styles.

Fig. 1 Screenshots of Collector for ArcGIS. Source: http://www.esri.com/products/collector-for-arcgis.

1.26.3.1 Web-Based Spatial Data File Formats

Because of the complexity of spatial data, many spatial data file formats have been proposed and implemented in GIS to capture both geometric and topological information. Among these formats, the "Shapefile," developed by Esri Inc., is the most famous and has been widely used and supported by almost all mainstream GIS products. "DXF" (drawing interchange format, or drawing exchange format), developed by Autodesk for AutoCAD, is another popular spatial data file format that has been widely used in urban planning and architectural design. However, mobile GIS requires frequent data exchange and communication between mobile devices and remote servers. The Web is the most popular user interface and has many advantages (Pundt and Brinkkötter-Runde, 2000): (1) a graphical user interface (GUI) written in HTML and JavaScript can simplify user input tasks; (2) the WWW's multimedia properties enable the visualization of spatial data in a graphical manner. These facts demonstrate that mobile GIS needs a spatial data format that can be easily transferred between mobile devices and servers and easily read by a Web GUI. The Geography Markup Language (GML) is one data format that satisfies these requirements. GML, specified by the OpenGIS Consortium (Cox et al., 2001), is an Extensible Markup Language (XML) grammar for encoding geographic data. It is suitable for data exchange and is supported by the Web. Nowadays, GML is widely used not only in mobile GIS but also in many other Web-based geographic data services.
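As a brief illustration (a minimal sketch: the coordinates and feature are invented, and GML 2.0's coordinates element is used), a point of interest can be encoded as GML with Python's standard library:

import xml.etree.ElementTree as ET

# GML namespace defined by the OpenGIS Consortium
GML_NS = "http://www.opengis.net/gml"
ET.register_namespace("gml", GML_NS)

def point_to_gml(lat, lon, srs="EPSG:4326"):
    """Encode a (lat, lon) pair as a GML 2.0 gml:Point element."""
    point = ET.Element(f"{{{GML_NS}}}Point", srsName=srs)
    coords = ET.SubElement(point, f"{{{GML_NS}}}coordinates")
    coords.text = f"{lon},{lat}"  # GML 2.0 coordinates are written x,y
    return ET.tostring(point, encoding="unicode")

# Hypothetical POI collected in the field
print(point_to_gml(34.4140, -119.8489))

Because the encoding is plain XML text, the same payload can be transmitted to a server or parsed back on the device without proprietary libraries.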

1.26.3.2 Distributed Spatial Database

Mobile devices have relatively less disk space than PCs. It might be impossible to store all the spatial data on one small device. Even though several spatial data compression techniques have been proposed, it is widely accepted that only a small portion of the spatial data, namely data close to the user's current location, can be stored on the mobile device temporarily using a caching technique or a dynamic data model strategy (Shi et al., 2009). The whole spatial database may be better stored on servers, although many mobile GIS also support an offline storage mode (more details in the "Mobile Databases" section). The spatial data can be stored on one server or distributed among multiple servers and clusters.

1.26.3.3 Mobile/Distributed Geocomputing Ability

In traditional desktop GIS, the geocomputing ability that underlies data processing and spatial analyses depends only on the local machine, which seems robust. But the low efficiency of traditional GIS has been widely criticized as the volume of spatial datasets has become large. In contrast, because of the relatively limited computing capability of mobile devices, mobile GIS advocates distributing geocomputation tasks between mobile devices and servers. The greatest limitation in the distribution of geographic information over the Internet is the difficulty of transferring and processing large volumes of spatial data. Mobile devices can only take care of simple geocomputation tasks. When a user requests a complex computation task, the mobile device sends a request to the backend server, which has larger memory and higher computation ability. The server executes the geocomputation task and sends the result back to the device. In this way, a complex geocomputation task can be completed in a reasonable time. In addition, a conceptual dynamic data model that considers the spatial, temporal, and attribute constraints of a mobile environment has been proposed to increase mobile GIS performance, which can be measured by the response time of the database to a spatial query from a mobile GIS user (Shi et al., 2009).

1.26.3.4 Personalized Visualization Style

Traditional desktop GIS often uses a standardized visualization style. For mobile visualization, however, a user may want to change the visualization style based on individual preferences to improve his/her understanding of the information in mobile GIS. Adaptive mobile cartography has been proposed by Reichenbacher (2001) to take personal needs and context information into consideration for better mobile GIS assistance. Due to the variability of screen sizes, color settings, and resolutions among different mobile devices, new concepts of mobile interface design are also required. Realizing the full potential for efficient visualization of both spatial and nonspatial data on such small mobile screens may require attention to load balancing between server and client, as well as enhanced mobile caching mechanisms.

1.26.4 Positioning and Tracking Technologies for Mobile GIS

Positioning is a key component supporting mobile GIS development, mobility tracking studies, trajectory data mining, and location-based applications. In this section, we present several important positioning technologies that have been developed for both outdoor and indoor environments.

1.26.4.1 Outdoor Positioning Technologies

Depending on the information technology infrastructure, popular outdoor positioning technologies include global navigation satellite systems (GNSS), cellular networks, and wireless networks.

• GNSS: A GNSS provides location (latitude/longitude) and time information in all weather conditions, anywhere on or near the Earth's surface where there is an unobstructed line of sight to four or more positioning satellites. Well-known GNSS infrastructures include the United States' NAVSTAR Global Positioning System (GPS), the Russian GLONASS, the Chinese BeiDou Navigation Satellite System, and the European Union's Galileo system. The GNSS most widely deployed on mobile devices on the current market is GPS. Many positioning techniques have been developed based on high-accuracy GPS chips, differential GPS, and assisted GPS (Misra and Enge, 2006).
• Cellular networks: Nowadays the cellular network is one of the most important communication infrastructures and has almost worldwide coverage. When a mobile call is made, the mobile phone signal is linked to the nearest cellphone tower or base station with particular geographic coordinates. The location of the cellular tower can therefore be used as an estimate of a mobile phone user's location when he or she makes a phone call. The spatial divisions of such cellular networks are cells (regions) based on the Voronoi diagram, in which each cell tower location (as a center) has a corresponding region consisting of all points closer to that center than to any other. That is, all phone calls within a given Voronoi polygon are closer to the corresponding cell tower than to any other cell tower (see the sketch after this list). Generally, urban core areas have a higher density of mobile cells, where the average distance between mobile base stations is approximately one kilometer; the average separation depends on the size of the study area (Gao et al., 2013).
• Wi-Fi: The Institute of Electrical and Electronics Engineers (IEEE, 2007) documents the standard use of Wi-Fi technology to enable wireless-network connections in five distinct frequency ranges: the 2.4, 3.6, 4.9, 5, and 5.9 GHz bands. The wide use of Wi-Fi access points for Internet connection in hotels, business buildings, coffee shops, and many other fixed places makes Wi-Fi an attractive technology for positioning purposes. All of those Wi-Fi routers deployed in fixed places repeatedly broadcast wireless signals to the surrounding area. These signals typically travel several hundred meters in all directions, forming wireless signal surfaces, and a device receives distinctive signals at different locations on the surface, which enables localization. The accuracy of localization then depends on the separation distance among adjacent Wi-Fi reference points (RPs) and the transmission range of these RPs (Bulusu et al., 2000).
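To make the Voronoi partition of a cellular network concrete, the sketch below (with invented tower coordinates) uses SciPy to compute the cell regions for a handful of base stations; assigning a phone call to its cell then reduces to a nearest-neighbor query:

import numpy as np
from scipy.spatial import Voronoi, cKDTree

# Hypothetical base-station coordinates in a projected system (meters)
towers = np.array([[0, 0], [1000, 200], [400, 900],
                   [1500, 1200], [800, 1800]])

# Voronoi diagram: each tower gets the region of points nearest to it
vor = Voronoi(towers)
print(vor.regions)  # vertex indices bounding each cell (-1 = unbounded)

# Assigning a phone call to its Voronoi cell is a nearest-neighbor query
tree = cKDTree(towers)
call_location = np.array([650, 700])
_, nearest_tower = tree.query(call_location)
print(f"Call is estimated at tower {nearest_tower}: {towers[nearest_tower]}")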

Zandbergen (2009) systematically compared three dominant positioning technologies: assisted GPS, Wi-Fi, and cellular positioning. Their pros and cons are discussed in terms of coverage, accuracy, and reliability. The study reports that assisted GPS achieves an average median error of 8 m outdoors, whereas Wi-Fi positioning achieves about 74 m and cellular positioning, the least accurate, has a median error of roughly 600 m on average. However, high-resolution GPS or assisted-GPS positioning chipsets do not work well in indoor environments due to limited satellite visibility; thus, a number of indoor positioning technologies and systems have been designed and developed to increase indoor positioning accuracy.

1.26.4.2 Indoor Positioning Technologies

In general, indoor positioning technologies can be classified into two broad categories: radio-frequency (RF)-based and non-radio-frequency (NRF)-based technologies. The RF group includes, but is not limited to, wireless local area network (WLAN), Bluetooth, and radio-frequency identification (RFID) systems, while the NRF group contains ultrasound, magnetic field, and vision-based systems. Researchers have made great efforts in the field of indoor positioning using these sensors and technologies (Ferris et al., 2007; Gu et al., 2009; Kaemarungsi and Krishnamurthy, 2004; Kuo et al., 2014; Li et al., 2012; Liu et al., 2007, 2012). The spatial coverage area and positioning accuracy of these different technologies have been reviewed by Mautz (2009). Here, we only briefly discuss the RF technologies that are most popular on the market and that pose many challenging issues to investigate.

1.26.4.2.1 WLAN

Location positioning systems using WLAN infrastructure are considered cost-effective and practical solutions for indoor location estimation and tracking (Chang et al., 2010). Wireless networks are widely implemented in many types of indoor buildings, in which wireless access points are usually fixed at certain positions. Those access points allow wireless devices (e.g., mobile phones, laptops, and tablets) to connect to a wired network using Wi-Fi technology. The relative distance between wireless devices and access points can be roughly estimated from Wi-Fi signal strength using signal propagation models (Hidayab et al., 2009; Motley and Keenan, 1988), which will be further discussed in the methods subsection below. It is also well known that the accuracy of indoor position estimation based on Wi-Fi signal strength is affected by many environmental and behavioral factors, such as walls, doors, the settings of access points, and the orientation of the human body (Ferris et al., 2007; Wang et al., 2003). In practical applications, a good approximation of heterogeneous environmental signal surfaces could help improve indoor positioning accuracy. Spatial regression, a widely used spatial analysis method for finding spatial patterns in surfaces (de Smith et al., 2007), could potentially be a good candidate.

1.26.4.2.2 Bluetooth

Bluetooth technology, designed for low power consumption, allows multiple electronic devices to communicate with each other without cables, using the same 2.4 GHz RF band as Wi-Fi. The distance range within which Bluetooth positioning can work is about 10 m. In beaconing mode, Bluetooth messages can be used to detect the physical proximity between two devices (Faragher and Harle, 2014). In an indoor environment equipped with three or more Bluetooth low energy (BLE) beacons, the location of a target mobile device with Bluetooth can be determined using classic positioning approaches (see section "Indoor Positioning Methods" for details). In this way, location-dependent triggers, notifications, and tracking activities can be enabled by employing multiple BLE beacons. Several popular BLE beacon-positioning protocols and technologies are available online, including Apple iBeacon (https://developer.apple.com/ibeacon/), Google Eddystone (https://developers.google.com/beacons/), and Qualcomm Gimbal (http://www.gimbal.com/), which guide developers in implementing up-to-date indoor positioning and tracking applications.

1.26.4.2.3 Radio-frequency identification

Radio-frequency identification (RFID) is a general term for a system that communicates using radio waves between a reader and an electronic tag attached to an object. Compared with Bluetooth technology, RFID systems usually comprise readers and tags that store relatively limited information about the object, such as location and attribute information. Those tags can be activated and send out their stored information when they receive the signal from RFID readers within certain distance thresholds, which can be used to estimate the reader's location and to show relevant information. Currently, RFID positioning and tracking systems are widely used for asset tracking, shipment tracking in supply chains, and object positioning in retail places and shopping malls. Because of the sensor diversity and positioning challenges in various indoor environments, there is also an increasing trend toward combining and integrating different sensor networks to get better spatial coverage and position accuracy than a single data source can provide. For example, Evennou and Marx (2006) integrated Wi-Fi and inertial navigation systems to get performance close to one meter by fusing pedestrian dead reckoning and Wi-Fi signal-strength measurements. Regalia et al. (2016) presented a novel crowdsensing framework and demonstrated that ambient sensors (e.g., temperature, pressure, humidity, magnetism, illuminance, and audio) available on mobile devices can help determine a location change between environments (e.g., moving from indoors to outdoors, or floor changes inside buildings) more accurately than typical positioning technologies (e.g., GNSS, Wi-Fi, etc.); thus, multisensor positioning technology might achieve higher positioning accuracy in the future.

1.26.4.3 Indoor Positioning Methods

The following methods usually work in both outdoor and indoor environments, but we emphasize the indoor case here. Note that these methods can be applied with most RF-based sensors, such as Wi-Fi and Bluetooth.

1.26.4.3.1 Geometric approaches

In geometry, trilateration is a method to determine the location of a target point by measuring distances to three points at known locations, using the geometry of circles, triangles, or spheres (Cayley, 1841). This method has been widely used not only in positioning systems (Doukhnitch et al., 2008; Hofmann-Wellenhof et al., 1994), but also in robot localization, computer graphics, aeronautics, and so on (Thomas and Ros, 2005). For positioning purposes, if the coordinate (lat, lon) information of three fixed access points is already known, the latitude and longitude of these locations on the Earth need to be converted to axis values in a Cartesian coordinate system with the following rules: (1) the x-axis goes through (0,0); (2) the y-axis goes through (0,90); and (3) the z-axis goes through the poles. The formulas of this conversion are expressed as follows:

x = R · cos(lat) · cos(lon)    (1)
y = R · cos(lat) · sin(lon)    (2)
z = R · sin(lat)               (3)

where R is the approximate radius of the Earth (e.g., 6371 km). We can then solve the trilateration equations to approximate the location of the target object by referring to three points with known locations (Hofmann-Wellenhof et al., 1994) (see Fig. 2). The equations are nonlinear, and it is not easy to obtain an exact solution. Several iterative arithmetic methods have been proposed to find efficient solutions for trilateration-based localization (Manolakis, 1996; Yang and Liu, 2010).
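Eqs. (1)-(3) translate directly into code; a minimal Python sketch (the sample coordinates are invented) is:

import math

EARTH_RADIUS_KM = 6371.0  # approximate Earth radius used in Eqs. (1)-(3)

def latlon_to_cartesian(lat_deg, lon_deg, r=EARTH_RADIUS_KM):
    """Convert latitude/longitude in degrees to Cartesian (x, y, z) in km."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    x = r * math.cos(lat) * math.cos(lon)  # Eq. (1)
    y = r * math.cos(lat) * math.sin(lon)  # Eq. (2)
    z = r * math.sin(lat)                  # Eq. (3)
    return x, y, z

# Hypothetical access-point coordinates
print(latlon_to_cartesian(34.4140, -119.8489))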

Fig. 2 Determine the location of the target object by referring to three Wi-Fi access points with known coordinate information and the distances to them.

Obviously, the distance between a wireless device and an access point is the key to Wi-Fi positioning using the trilateration method. The received signal strength does not directly yield an estimated distance to a Wi-Fi access point. In general, signal strength follows the expected trend of decreasing with distance, but the relationship is not a simple linear path-loss model. The log-distance path-loss model is one of the simplest radio-propagation models (Motley and Keenan, 1988; Seybold, 2005); it predicts the received signal strength at a certain distance inside a building or in densely populated areas. It can be expressed as follows:

PL(d)[dB] = PL(d0)[dB] − 10 · n · log10(d / d0) + Xg    (4)

where PL(d) is the received signal strength at a wireless device, in decibels (dB), at a distance of d meters; PL(d0) is the received signal strength at a known reference distance d0; n is a path-loss exponent factor; and Xg is a random variable that reflects the attenuation factors of a complex environment. We can solve this equation and approximate the distance to the wireless router, given empirical fits of the coefficients n and Xg, a known received signal strength at a given distance d0, and the signal strength at the unknown location. However, in many practical situations, numerous factors underlying radio propagation contribute to the reflection, refraction, absorption, and scattering of signals, so it is difficult to predict the received signal strength with a simplistic log-distance path-loss model. Rather than creating a new path-loss model, some researchers have reparameterized the classic log-distance model. For example, Bose and Foh (2007) found that at closer ranges (e.g., smaller than 5 m) the exponent factor n of the propagation model takes a higher value, and thus a dual-distance model has been proposed to adjust the propagation model at larger distances and achieve better positioning accuracy.
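Putting the two pieces together, the following sketch first inverts Eq. (4) to turn received signal strengths into distance estimates (ignoring the noise term Xg, and assuming an invented reference power of −40 dBm at 1 m and a path-loss exponent n = 2.0), and then solves the trilateration of Fig. 2 as a nonlinear least-squares problem with SciPy:

import numpy as np
from scipy.optimize import least_squares

# Hypothetical access points in a local planar coordinate system (meters)
aps = np.array([[0.0, 0.0], [20.0, 0.0], [10.0, 15.0]])
rss = np.array([-52.0, -61.0, -58.0])  # measured signal strengths (dBm)

PL_D0, D0, N = -40.0, 1.0, 2.0  # assumed reference power at d0 = 1 m, exponent n

def rss_to_distance(pl):
    """Invert Eq. (4), ignoring the random noise term Xg."""
    return D0 * 10 ** ((PL_D0 - pl) / (10 * N))

dists = rss_to_distance(rss)

def residuals(xy):
    # Differences between geometric distances to each AP and RSS-derived ones
    return np.linalg.norm(aps - xy, axis=1) - dists

# Trilateration as a nonlinear least-squares problem (cf. Manolakis, 1996)
solution = least_squares(residuals, x0=aps.mean(axis=0))
print("Estimated target location:", solution.x)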

1.26.4.3.2 Fingerprinting approaches

Although most Wi-Fi access points are located at fixed locations in buildings, it is very hard to obtain their accurate coordinates and detailed digital floor maps because of privacy concerns. The fingerprinting approach is a popular wireless network positioning technology in metropolitan areas, since it needs neither the exact positions of Wi-Fi access points nor a model of the signal strength. Instead, it requires constructing an offline database that contains the signal-strength distribution of Wi-Fi access points at known locations, that is, the "radiomap." At a given location, a device receives varying signal values from up to the maximum number of available Wi-Fi access points. The set of access points and their signal strengths are distinctive and present a "fingerprint" of a unique location in the radiomap. Thus, searching and comparison processes can be employed in the online positioning phase to find the most probable location. Several fingerprint matching and calculation methods have been developed to best estimate the location of the user based on received Wi-Fi signal-strength (SS) values at known RP locations (Gezici, 2008; Kaemarungsi and Krishnamurthy, 2004; Lin and Lin, 2005; Mok and Retscher, 2007). In its simplest form, it can be expressed mathematically as follows:

arg min over p ∈ {1, 2, ..., n} of sqrt( Σ_{i=1}^{m} [SS_RP(i, p) − SS_ME(i)]² )    (5)

where SS_RP(i, p) represents the received signal-strength value of access point i at a known RP location p on the radiomap, and SS_ME(i) is the measured signal-strength value of access point i at the current unknown location. The location p that has the minimum root of squared differences of SS values over all available access points between the RPs and the target location is considered the most probable estimated location. Machine-learning-based fingerprinting techniques have also been studied for improving the quality of location estimation in complex real environments, including Bayesian modeling, k-nearest-neighbor estimation, support vector machines, neural networks, and so on (Bahl and Padmanabhan, 2000; Brunato and Battiti, 2005; Honkavirta et al., 2009; Lin and Lin, 2005; Roos et al., 2002; Youssef et al., 2003).
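A minimal sketch of the nearest-fingerprint search in Eq. (5), using an invented radiomap with three reference points and two access points:

import numpy as np

# Offline radiomap: rows are reference points (RPs), columns are access
# points; values are signal strengths in dBm (hypothetical survey data)
radiomap = np.array([[-45.0, -70.0],
                     [-60.0, -55.0],
                     [-75.0, -48.0]])
rp_locations = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]])

# Online phase: signal strengths measured at the unknown location
measured = np.array([-58.0, -57.0])

# Eq. (5): pick the RP minimizing the root of squared SS differences
distances = np.sqrt(((radiomap - measured) ** 2).sum(axis=1))
best = int(np.argmin(distances))
print("Estimated location:", rp_locations[best])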

1.26.4.3.3 Statistical approaches

Commonly, the wireless signal-strength indicators used in positioning relate to the power, direction, and time of a received signal. Several characteristic indicators are widely used for position estimation purposes, such as time of arrival, angle of arrival, and time difference of arrival. Gezici (2008) conducted a comprehensive survey of these wireless position estimation techniques. Statistical approaches are employed to formulate a generic framework for position estimation using one or multiple characteristic indicators, which can be expressed as:

Z_i = f_i(lat, lon) + δ_i,    (i = 1, 2, ..., N)    (6)

where Z_i is the estimated i-th indicator value, f_i(lat, lon) is the function for the i-th indicator value at a given location with coordinate (lat, lon), δ_i is a noise parameter, and N represents the number of estimated indicators. The parameters can be estimated from offline signal sampling data collected at different reference locations, similar to the fingerprinting approach introduced above. The main difference between the fingerprinting and statistical approaches, however, is whether a generic parameter-based theoretical framework is formulated that can be employed for online location estimation in the second step. Depending on the available information about the noise parameter or the indicator probability density function, we can choose parametric statistical tests relying on assumptions about the shape of the distribution (e.g., a Gaussian distribution) and the form of its parameters (e.g., mean, median, and standard deviation), or nonparametric estimation methods (e.g., least squares regression) relying on a fit to empirical data in the absence of guidance or constraints from theory, to estimate the target location.
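As one concrete instance of the offline parameter-estimation step, suppose the indicator in Eq. (6) is received signal strength following the log-distance model of Eq. (4); its coefficients can then be fitted to reference measurements (invented here) by ordinary least squares:

import numpy as np

# Offline samples: distances (m) to one access point and measured RSS (dBm)
d = np.array([1.0, 2.0, 4.0, 8.0, 16.0])  # hypothetical survey distances
rss = np.array([-40.5, -46.2, -52.8, -58.9, -65.4])

# Eq. (4) is linear in log10(d): rss = PL(d0) - 10*n*log10(d/d0), with d0 = 1 m
A = np.column_stack([np.ones_like(d), np.log10(d)])
coef, *_ = np.linalg.lstsq(A, rss, rcond=None)
pl_d0, slope = coef
n = -slope / 10.0
print(f"Fitted PL(d0) = {pl_d0:.1f} dBm, path-loss exponent n = {n:.2f}")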

1.26.5 Mobile Databases and Field Data Collection

1.26.5.1 Mobile Databases

The mobile database maintains all the spatial and nonspatial data for mobile GIS applications. Different mobile GIS applications usually have different mobile databases in terms of size, content, schema, and network connection mode. The database schema, as the most important feature, defines how data are managed and organized in a given database. If two databases have heterogeneous schemata, it is difficult to integrate them in one mapping layer, and data conversion needs to be performed and the results stored before mobile visualization. Because of the limited storage space of mobile devices, mobile databases are usually maintained on one or several servers rather than on the mobile devices themselves. In this case, data held on the server are queried, retrieved, transformed, and visualized on mobile clients through network connections. Therefore, frequent communication between clients and servers is necessary for a robust mobile GIS Web service.

In fact, how to make this communication fast and stable is a hot research topic in the mobile computing field. Communication among different devices within a distributed system is a traditional goal of many network systems. However, a wireless network together with mobile clients has unique characteristics (Barbará, 1999) compared to other networks because of its asymmetrical communication between clients and servers: the bandwidth in the server-to-client direction is much larger than that in the reverse direction. That made data dissemination a hot topic in the early stage of mobile network communication. Data dissemination is the process of delivering data from servers to clients. How to deliver data to appropriate clients based on the attributes of those clients is very important and is still a hot research topic.

Data consistency is also an important area of research, especially when mobile databases are maintained by multiple servers. During a field data collection process, one field worker may connect to one server of a distributed database and update the field survey data, while another field worker queries the same part of the data from another server of the system to perform validation simultaneously. If these two servers have not been synchronized, the second field worker will get out-of-date results and his/her validation work will be meaningless. Different methods and algorithms have been proposed to maintain the consistency of the database, such as session guarantees (Terry et al., 1994), which establish rules on a sequence of I/O operations performed by a mobile client.

A fast response time for database queries, especially spatial queries, is a major objective of mobile GIS. Shi et al. (2009) proposed a specially designed dynamic database to accelerate the querying process. This dynamic database is generated and continually updated based on spatial, temporal, and thematic constraints provided by users. The idea is similar to the caching technique, which temporarily obtains from the server database a small piece of data of interest to users. Location queries in mobile GIS depend on the real-time location of mobile clients, and the caching technique works well in this process. The mobile clients "cache" both spatial and nonspatial data around the current location of users. Therefore, when users issue a location-dependent query, such as for "nearby" information, the mobile GIS application first searches the local precached database instead of sending a new request to the remote servers.

The most common way of storing and managing data for a mobile device is to use a cloud database and connect remotely to access its data. However, a mobile application then needs an active and fairly fast network connection. Embeddable databases, which are lightweight self-contained libraries with no server component and limited resource requirements, can provide offline data storage and retrieval capabilities. Popular embeddable databases across mobile operating systems include BerkeleyDB, Couchbase Lite, SQLite, etc. In order to store geospatial objects (points, polylines, and polygons) and support spatial indexing, those databases are usually extended with the OGC geometry standard, as in SpatiaLite (a geospatial extension of SQLite), providing a powerful geospatial mobile database management system.
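The following sketch illustrates such an offline cache with plain SQLite (the POI data are invented; a production app might use SpatiaLite with a true spatial index rather than this simple bounding-box filter):

import sqlite3

# In-memory cache standing in for an embedded mobile database
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE poi (name TEXT, lat REAL, lon REAL)")
db.executemany("INSERT INTO poi VALUES (?, ?, ?)", [
    ("Coffee Shop", 34.4133, -119.8610),  # hypothetical POIs
    ("Sports Bar",  34.4125, -119.8602),
    ("Bookstore",   34.4260, -119.8740),
])

def nearby(lat, lon, delta=0.005):
    """Bounding-box query around the user's cached location."""
    return db.execute(
        "SELECT name FROM poi WHERE lat BETWEEN ? AND ? "
        "AND lon BETWEEN ? AND ?",
        (lat - delta, lat + delta, lon - delta, lon + delta),
    ).fetchall()

print(nearby(34.4130, -119.8605))  # answered locally, no server round-trip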

1.26.5.2 Field Data Collection

According to whether the mobile GIS services have data editing capabilities, mobile GIS can be classified into two major application areas: field-based GIS and location-based GIS (Tsou, 2004). Field work is the first part of the research tasks in many disciplines, for example, survey mapping and environmental monitoring. Traditional field mapping with the help of desktop GIS software is time consuming, and the quality of data collected by those traditional methods is difficult to control, let alone the semantic integrity of the field data. A good candidate for solving this problem is integrating global positioning system, GIS, and remote sensing capabilities into mobile GIS services, with the help of semantic plausibility controls that enable field workers or scientists to edit, update, and validate both spatial and nonspatial data. One advantage of using mobile GIS devices for field data collection is real-time data updating and exchange between centralized map servers and distributed mobile devices (Pundt, 2002). Digital data editing capabilities and remote access to shared spatial datasets during the data collection process can improve cooperation among field workers and ensure data agreement. Mobile GIS services with data editing abilities greatly improve the data collection process in terms of efficiency and location quality. Moreover, prompt updates of geographic information, such as road network connectivity and buildings, are very important in disaster response and emergency management; mobile GIS makes use of many trending techniques and portable equipment that have been applied in an urban disaster management context (Montoya, 2003). In addition, immediate access to shared geospatial data (georeferenced, topographic, and cartographic information) while conducting a field survey (Pundt and Brinkkötter-Runde, 2000) gives field workers a more comprehensive understanding of the relationship between real-world features and digital features. By comparing outdoor situations with the digital representation of features, field workers gain a better understanding of real-world conditions, which reduces errors in the data collection process and thus improves data quality. More advanced mobile GIS services, like knowledge-based diagnostic tools and automatic plausibility controls, can help achieve semantic integrity of the mobile GIS database.

1.26.6 LBS and Applications

LBS is a relatively new term compared with mobile GIS, which focuses more on field data collection and mapping. LBS can be defined as services that utilize the ability to determine and transmit the location of a mobile user in real time, with the aim of helping people geolocate themselves and guiding them to their destinations (Abdalla, 2016; Lemmens, 2011). By using the aforementioned outdoor/indoor positioning technologies, the location of the mobile user can be determined dynamically. Combined with other geographic information, such as POIs, transportation networks, and traffic information, LBS applications can help users perform a variety of location-related tasks. In the following, we present some of the most popular LBS applications (apps).

1.26.6.1 Google Maps

A well-known example of an LBS application is the mobile version of the Google Maps services (https://www.google.com/intl/en_us/mobile/maps/). It visualizes the current location of a mobile user on a basemap. The basemap helps users interpret their location by referring to the corresponding geographic background (e.g., road networks, POIs, buildings, landmarks, etc.). Google Maps reverse-geocodes the coordinates of a user's location on the Earth into human-readable address information. The most frequently used services are the "Nearby" service and the navigation service. As for the "Nearby" service, suppose, for instance, a user would like to find a "sports bar" near the stadium before the "Super Bowl" (Fig. 3). The user can click on the "Nearby" button of the app. The app will search for bars close to the user's current location and visualize their locations on the map. A list of these bars will also show up, together with attribute information including address, opening hours, customers' reviews, ratings, etc. After getting this information, the user can easily choose the one he/she likes most. The navigation service is another popular LBS service that is widely used almost all over the world. A user can select an origin (the current GPS location as the default), a destination, and a traveling mode (i.e., walking, public transportation, driving, or biking). After entering "navigation" mode, the "shortest" path from the user's current location to the destination is highlighted on the map. The basemap is also rotated and zoomed into street-centered views. Moreover, as more and more users use the Google Maps services, the app can access the location history and navigation trajectories, as well as the real-time locations, of all users. By mining historical user logs and analyzing other information, Google Maps can derive insights about how users move through and navigate the environment. This can help find better paths that avoid heavy traffic or traffic jams by integrating real-time information from traffic sensor networks and/or crowdsourced traffic reports.

1.26.6.2 Waze

Waze is a community-based traffic and navigation app (https://www.waze.com) in which drivers or passengers share real-time traffic information to advise other drivers to detour or find an alternative route for their trips on the road network (Fig. 4). This app falls into the category of volunteered geographic information (Goodchild, 2007), in which volunteers create and contribute geographical features or location descriptions to platforms where the entries are synthesized into databases. In the Waze app, users can report accidents, traffic jams, road work, speed limits, and police traps; in the online map editor, users can update roads, landmarks, gas prices, and so on. Many users are engaged in contributing, since they can identify the cheapest fuel station nearby or navigate along a light-traffic route thanks to others' contributions. Such sharing modes and gaming mechanisms are becoming more and more popular in many LBS apps.

Fig. 3 A screenshot of LBS using the Google Maps app.

1.26.6.3 Yelp

Yelp (https://www.yelp.com) is another popular LBS application that is widely used in people's daily lives. A user can find "nearby" restaurants with detailed business information (e.g., address, place type, opening hours, contact phone number, food menu, etc.) and users' ratings and reviews (Fig. 5). Compared with other LBS, Yelp focuses more on services based on POIs, such as restaurants, shopping centers, etc., and owns a large dataset of global POIs. Having accumulated rich reviews from users over many years, Yelp can be taken as a popularity reference or even a "quality" evaluation source for POIs, which often helps users choose a food venue. The POI updating cycle is a key factor that drives the changes in both spatial and nonspatial information in the database.

1.26.6.4 Foursquare and Swarm

Foursquare (https://foursquare.com) and Swarm (https://www.swarmapp.com) are owned by the same company but differ in their functional focus on how people connect to places (Fig. 6). Up to 2016, the apps had generated an incredible amount of data: over 6 billion user check-ins, 300 million photos, 55 million tips, and hundreds of millions of edits of places. On one hand, the latest version of the Foursquare app helps users discover new places nearby or search for places (restaurants, coffee shops, bars, parks, gyms, and so on) based on the user's current location. On the other hand, the Swarm app keeps the "check-in" feature: users can share the locations they have visited or where they are currently staying and notify their friends or neighbors, enriching location-based social networks (LBSNs).

1.26.6.5 Uber

Fig. 4 A screenshot of the Waze app.

Uber provides location-based real-time carpool-sharing services (https://www.uber.com/). Uber aims to help the user find a "nearby" Uber car according to both the passenger's and the driver's locations (Fig. 7). Uber is able to track all registered Uber cars' locations on the server (in fact, these are based on the drivers' mobile phones on which the app has been installed). Based on scalable carpool matching algorithms, Uber can quickly find the cars close to a passenger's request location and notify those drivers. The Uber driver who accepts the request drives the user to the destination. In this scenario, Uber needs to manage large-scale spatiotemporal information for both users and drivers and couple them dynamically.

1.26.6.6 Nike Plus

Nike Plus (http://www.nike.com/us/en_us/c/nike-plus/training-app) is another interesting sports application using mobile GIS technologies. It tracks users' locations when they are running, riding a bike, and so on. It automatically collects users' workout data, such as average speed, total distance, duration, and total calories burned. It enables users to share their workout data with friends, which encourages them to work out regularly. It is actually very common for an LBS application to engage users' participation by connecting to a social network for sharing the user's activities.

1.26.6.7 STRAVA

Similar to Nike Plus, STRAVA (https://www.strava.com) is another application that helps people manage their sports activity data. But STRAVA is more popular among people who love to exercise on terrain, such as mountain biking and hiking, because STRAVA records not only the planar coordinates but also the elevation of a user's location. Based on the data collected from crowdsourced users, data engineers and geographic information researchers can perform advanced analysis to derive interesting geographic information, such as the accessibility and reliability of certain trails and roads, on terrain models in GIS.

Fig. 5 Search for nearby restaurants using the Yelp app.

Fig. 6 Screenshots of using the apps (A) Foursquare and (B) Swarm.

Fig. 7 Screenshot of using the Uber app.

1.26.7 Research Frontiers of Mobile GIS and LBS

There has been growing interest among researchers in studying human mobility patterns based on data collected from location-aware devices and social media, for example, GPS-enabled devices (Yue et al., 2014; Zheng et al., 2008), cellular phones (Gao, 2015; Kang et al., 2010; Xu et al., 2016; Zhao et al., 2016), and Bluetooth sensors (Nordström et al., 2007), as well as LBSNs such as Twitter, Foursquare, and Jiepang (Cheng et al., 2011; Cho et al., 2011; Gao et al., 2014; Liu et al., 2014; Scellato et al., 2011). All of these research areas offer new insights into complex human-environment interactions and into how human behaviors and social connections are captured by a variety of mobile applications; thus, they can be taken as the frontiers of mobile GIS and LBS. Looking forward, areas that need further investigation and research include (but are not limited to) the following:

(1) Seamlessly integrated outdoor and indoor positioning and mobile tracking technologies. Positioning will remain the most important feature enabling mobile GIS and LBS applications.
(2) Lightweight mobile geographic information databases. More and more mobile GIS users need access to offline geospatial data and query information during field work or when navigating maps in a new environment. More efficient data compression and decompression technologies in lightweight mobile databases may help store data with large spatial coverage and respond to spatial search requests more quickly.
(3) Scalable and efficient mobile processing capability. The computation power of mobile devices is weaker than that of PCs or servers, yet many geospatial queries and analytics require complex computation. The development of scalable and efficient mobile processing technologies and approaches will advance mobile-based decision making and geospatial intelligence.
(4) For applications, mobile GIS and LBS will play an important role in community-driven locational data surveys, disaster response and emergency management, public digital health, and smart cities.
(5) Last but not least, with the fast development of information technologies, mobile GIS may further integrate with cutting-edge technologies, such as artificial intelligence, augmented reality, the Internet of Things, wearable devices, and so on.

1.26.8 Conclusions and Future Work

In this article, we have presented a comprehensive review of mobile GIS and LBS concepts, core components and characteristics, the technology stack, a variety of applications, and research frontiers. The subject itself is fast developing and advancing. We hope this article not only educates the next generation of geographic information systems/science students in the abovementioned knowledge but also inspires them to dive into these challenging research areas and make their own contributions in the future.

References

Abdalla, R., 2016. Mobile GIS and location-based services (LBS). In: Introduction to geospatial information and communication technology (GeoICT). Springer International Publishing.
Bahl, P., Padmanabhan, V.N., 2000. RADAR: An in-building RF-based user location and tracking system. In: INFOCOM 2000. Nineteenth annual joint conference of the IEEE computer and communications societies. Proceedings, vol. 2. IEEE, pp. 775–784.
Barbará, D., 1999. Mobile computing and databases – a survey. IEEE Transactions on Knowledge and Data Engineering 11 (1), 108–117.
Bose, A., Foh, C.H., 2007, December. A practical path loss model for indoor WiFi positioning enhancement. In: 2007 6th international conference on information, communications & signal processing. IEEE, pp. 1–5.
Brunato, M., Battiti, R., 2005. Statistical learning theory for location fingerprinting in wireless LANs. Computer Networks 47 (6), 825–845.
Bulusu, N., Heidemann, J., Estrin, D., 2000. GPS-less low-cost outdoor localization for very small devices. IEEE Personal Communications 7 (5), 28–34.
Cayley, A., 1841. On a theorem in the geometry of position. Cambridge Mathematical Journal 2, 267–271.
Chang, N., Rashidzadeh, R., Ahmadi, M., 2010. Robust indoor positioning using differential Wi-Fi access points. IEEE Transactions on Consumer Electronics 56 (3), 1860–1867.
Cheng, Z., Caverlee, J., Lee, K., Sui, D.Z., 2011. Exploring millions of footprints in location sharing services. In: ICWSM 2011, pp. 81–88.
Cho, E., Myers, S.A., Leskovec, J., 2011, August. Friendship and mobility: User movement in location-based social networks. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp. 1082–1090.
Cox, S., Cuthbert, A., Lake, R., Martell, R., 2001. Geography markup language (GML) 2.0. http://www.opengis.net/gml/01-029/GML2.html.
De Smith, M.J., Goodchild, M.F., Longley, P., 2007. Geospatial analysis: A comprehensive guide to principles, techniques and software tools. Troubador Publishing Ltd.
Doukhnitch, E., Salamah, M., Ozen, E., 2008. An efficient approach for trilateration in 3D positioning. Computer Communications 31 (17), 4124–4129.
Egenhofer, M.J., Clarke, K.C., Gao, S., Quesnot, T., Franklin, W.R., Yuan, M., Coleman, D., 2016. Contributions of GIScience over the past twenty years. In: Advancing geographic information science: The past and next twenty years, pp. 9–34.
Evennou, F., Marx, F., 2006. Advanced integration of WiFi and inertial navigation systems for indoor mobile positioning. Eurasip Journal on Applied Signal Processing 2006, 164.
Faragher, R., Harle, R., 2014, September. An analysis of the accuracy of Bluetooth low energy for indoor positioning applications. In: Proceedings of the 27th international technical meeting of the satellite division of the institute of navigation (ION GNSS+'14).
Ferris, B., Fox, D., Lawrence, N.D., 2007, January. WiFi-SLAM using Gaussian process latent variable models. In: IJCAI, vol. 7, pp. 2480–2485.
Gao, S., 2015. Spatio-temporal analytics for exploring human mobility patterns and urban dynamics in the mobile age. Spatial Cognition and Computation 15 (2), 86–114.
Gao, S., Liu, Y., Wang, Y., Ma, X., 2013. Discovering spatial interaction communities from mobile phone data. Transactions in GIS 17 (3), 463–481.
Gao, S., Yang, J.A., Yan, B., Hu, Y., Janowicz, K., McKenzie, G., 2014. Detecting origin-destination mobility flows from geotagged tweets in greater Los Angeles area. In: Eighth international conference on geographic information science (GIScience'14).
Gezici, S., 2008. A survey on wireless position estimation. Wireless Personal Communications 44 (3), 263–282.
Goodchild, M.F., 1992. Geographical information science. International Journal of Geographical Information Systems 6 (1), 31–45.
Goodchild, M.F., 2007. Citizens as sensors: The world of volunteered geography. GeoJournal 69 (4), 211–221.
Goodchild, M.F., 2010. Twenty years of progress: GIScience in 2010. Journal of Spatial Information Science 2010 (1), 3–20.
Gu, Y., Lo, A., Niemegeers, I., 2009. A survey of indoor positioning systems for wireless personal networks. IEEE Communications Surveys and Tutorials 11 (1), 13–32.
Hidayab, M., Ali, A.H., Azmi, K.B.A., 2009, December. WiFi signal propagation at 2.4 GHz. In: 2009 Asia Pacific microwave conference. IEEE, pp. 528–531.
Hofmann-Wellenhof, B., Lichtenegger, H., Collins, J., 1994. Global positioning system: Theory and practice. Springer-Verlag Wien.
Honkavirta, V., Perälä, T., Ali-Löytty, S., Piché, R., 2009, March. A comparative survey of WLAN location fingerprinting methods. In: 6th workshop on positioning, navigation and communication, 2009. WPNC 2009. IEEE, pp. 243–251.
IEEE 802.11 Working Group, 2007. IEEE 802.11-2007: Wireless LAN medium access control (MAC) and physical layer (PHY) specifications. IEEE 802.11 LAN Standards 2007.
Kaemarungsi, K., Krishnamurthy, P., 2004, March. Modeling of indoor positioning systems based on location fingerprinting. In: INFOCOM 2004. Twenty-third annual joint conference of the IEEE computer and communications societies, vol. 2. IEEE, pp. 1012–1022.
Kang, C., Gao, S., Lin, X., Xiao, Y., Yuan, Y., Liu, Y., Ma, X., 2010, June. Analyzing and geo-visualizing individual human mobility patterns using mobile call records. In: Proceedings of the 18th international conference on geoinformatics. IEEE, pp. 1–7.
Kuo, Y.S., Pannuto, P., Hsiao, K.J., Dutta, P., 2014, September. Luxapose: Indoor positioning with mobile phones and visible light. In: Proceedings of the 20th annual international conference on mobile computing and networking. ACM, pp. 447–458.
Lemmens, M., 2011. Mobile GIS and location-based services. In: Geo-information. Springer, Netherlands, pp. 85–100.
Li, B., Gallagher, T., Dempster, A.G., Rizos, C., 2012, November. How feasible is the use of magnetic field alone for indoor positioning? In: 2012 international conference on indoor positioning and indoor navigation (IPIN). IEEE, pp. 1–9.
Lin, T.N., Lin, P.C., 2005, June. Performance comparison of indoor positioning techniques based on location fingerprinting in wireless networks. In: 2005 international conference on wireless networks, communications and mobile computing, vol. 2. IEEE, pp. 1569–1574.
Liu, H., Darabi, H., Banerjee, P., Liu, J., 2007. Survey of wireless indoor positioning techniques and systems. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 37 (6), 1067–1080.
Liu, J., Chen, R., Pei, L., Guinness, R., Kuusniemi, H., 2012. A hybrid smartphone indoor positioning solution for mobile LBS. Sensors 12 (12), 17208–17233.
Liu, Y., Sui, Z., Kang, C., Gao, Y., 2014. Uncovering patterns of inter-urban trip and spatial interaction from social media check-in data. PLOS ONE 9 (1), e86026.
Manolakis, D.E., 1996. Efficient solution and performance analysis of 3-D position estimation by trilateration. IEEE Transactions on Aerospace and Electronic Systems 32 (4), 1239–1248.
Mark, D.M., 2003. Geographic information science: Defining the field. In: Foundations of geographic information science, pp. 3–18.
Mautz, R., 2009. Overview of current indoor positioning systems. Geodezija ir kartografija 35 (1), 18–22.
Misra, P., Enge, P., 2006. Global positioning system: Signals, measurements and performance, 2nd edn. Ganga-Jamuna Press, Lincoln, MA.
Mok, E., Retscher, G., 2007. Location determination using WiFi fingerprinting versus WiFi trilateration. Journal of Location Based Services 1 (2), 145–159.
Montoya, L., 2003. Geo-data acquisition through mobile GIS and digital video: An urban disaster management perspective. Environmental Modelling & Software 18 (10), 869–876.
Motley, A.J., Keenan, J.M.P., 1988. Personal communication radio coverage in buildings at 900 MHz and 1700 MHz. Electronics Letters 24 (12), 763–764.
Nordström, E., Diot, C., Gass, R., Gunningberg, P., 2007, June. Experiences from measuring human mobility using Bluetooth inquiring devices. In: Proceedings of the 1st international workshop on system evaluation for mobile platforms. ACM, pp. 15–20.
Pundt, H., 2002. Field data collection with mobile GIS: Dependencies between semantics and data quality. GeoInformatica 6 (4), 363–380.
Pundt, H., Brinkkötter-Runde, K., 2000. Visualization of spatial data for field based GIS. Computers & Geosciences 26 (1), 51–56.
Regalia, B., McKenzie, G., Gao, S., Janowicz, K., 2016. Crowdsensing smart ambient environments and services. Transactions in GIS 20 (3), 382–398.
Reichenbacher, T., 2001. Adaptive concepts for a mobile cartography. Journal of Geographical Sciences 11 (1), 43–53.
Roos, T., Myllymäki, P., Tirri, H., Misikangas, P., Sievänen, J., 2002. A probabilistic approach to WLAN user location estimation. International Journal of Wireless Information Networks 9 (3), 155–164.
Scellato, S., Noulas, A., Lambiotte, R., Mascolo, C., 2011. Socio-spatial properties of online location-based social networks. In: ICWSM 2011, pp. 329–336.
Seybold, J.S., 2005. Introduction to RF propagation. John Wiley & Sons.
Shi, W., Kwan, K., Shea, G., Cao, J., 2009. A dynamic data model for mobile GIS. Computers & Geosciences 35 (11), 2210–2221.
Terry, D.B., Demers, A.J., Petersen, K., Spreitzer, M.J., Theimer, M.M., Welch, B.B., 1994, September. Session guarantees for weakly consistent replicated data. In: Proceedings of the third international conference on parallel and distributed information systems. IEEE, pp. 140–149.
Thomas, F., Ros, L., 2005. Revisiting trilateration for robot localization. IEEE Transactions on Robotics 21 (1), 93–101.
Tomlinson, R.F., 1987. Current and potential uses of geographical information systems: The North American experience. International Journal of Geographical Information System 1 (3), 203–218.
Tomlinson, R.F., 2007. Thinking about GIS: Geographic information system planning for managers. ESRI Inc.
Tsou, M.H., 2004. Integrated mobile GIS and wireless internet map servers for environmental monitoring and management. Cartography and Geographic Information Science 31 (3), 153–165.
Virrantaus, K., Markkula, J., Garmash, A., Terziyan, V., Veijalainen, J., Katanosov, A., Tirri, H., 2001, December. Developing GIS-supported location-based services. In: Proceedings of the second international conference on web information systems engineering, 2001, vol. 2. IEEE, pp. 66–75.
Wang, Y., Jia, X., Lee, H.K., Li, G.Y., 2003, July. An indoor wireless positioning system based on wireless local area network infrastructure. In: 6th international symposium on satellite navigation technology including mobile positioning & location services (No. 54).
Xu, Y., Shaw, S.L., Zhao, Z., Yin, L., Lu, F., Chen, J., Fang, Z.X., Li, Q., 2016. Another tale of two cities: Understanding human activity space using actively tracked cellphone location data. Annals of the American Association of Geographers 106 (2), 489–502.
Yang, Z., Liu, Y., 2010. Quality of trilateration: Confidence-based iterative localization. IEEE Transactions on Parallel and Distributed Systems 21 (5), 631–640.
Youssef, M.A., Agrawala, A., Udaya Shankar, A., 2003, March. WLAN location determination via clustering and probability distributions. In: Proceedings of the first IEEE international conference on pervasive computing and communications, 2003 (PerCom 2003). IEEE, pp. 143–150.
Yue, Y., Lan, T., Yeh, A.G., Li, Q.Q., 2014. Zooming into individuals to understand the collective: A review of trajectory-based travel behaviour studies. Travel Behaviour and Society 1 (2), 69–78.
Zandbergen, P.A., 2009. Accuracy of iPhone locations: A comparison of assisted GPS, WiFi and cellular positioning. Transactions in GIS 13 (s1), 5–25.
Zhao, Z., Shaw, S.L., Xu, Y., Lu, F., Chen, J., Yin, L., 2016. Understanding the bias of call detail records in human mobility research. International Journal of Geographical Information Science 30 (9), 1738–1762.
Zheng, Y., Li, Q., Chen, Y., Xie, X., Ma, W.Y., 2008, September. Understanding mobility based on GPS data. In: Proceedings of the 10th international conference on ubiquitous computing. ACM, pp. 312–321.

1.27 Societal Impacts and Ethics of GIS

Jeremy W Crampton, Eric M Huntley, and Emily C Kaufman, University of Kentucky, Lexington, KY, United States
© 2017 Elsevier Inc. All rights reserved.

1.27.1 Introduction: Concepts and History
1.27.1.1 Ethics in Geospatial Research: A Brief History
1.27.1.2 Contemporary Thought on Ethics in Geography
1.27.2 Case Study in Ethics and GIS: Policing
1.27.2.1 Data Acquisition
1.27.2.2 Data Management
1.27.2.3 Data Visualization
1.27.2.4 Data Dissemination
1.27.2.5 Data Effects
1.27.2.6 Summary
1.27.3 Ethics and GeoDesign
1.27.3.1 Introduction
1.27.3.2 Data Acquisition
1.27.3.3 Data Management
1.27.3.4 Data Visualization
1.27.3.5 Data Dissemination
1.27.3.6 Data Effects
1.27.3.7 Summary
1.27.4 Conclusion
References

1.27.1 Introduction: Concepts and History

This article provides an introduction and overview of the societal impacts and ethics of GIS. The article is divided into three main parts. Following this introduction, we provide a brief history of how ethics has been implemented and understood. We begin with the rise of ethical concerns in scientific research following World War II and then shift to ethical considerations more specifically in GIS and digital mapping. Initially, ethics in GIS was understood as a code of practice, and several influential ethical codes have been developed for GIS, most notably that adopted by the GIS Certification Institute, whose GIS Professional (GISP) certification has now been awarded to roughly 8000 individuals. More recently, ethical considerations have taken a turn toward issues of justice and power, or what might be thought of as a more explicitly political/politicized approach.

In the next two sections, we provide case studies that draw on our own research. The purpose here is to move from conceptual discussions of ethics, important as they are, to work through the messy contingency of ethics in practice. Both case studies are structured around a common set of questions: How does ethics manifest in questions of data acquisition, data management, and data dissemination? Or, more simply, who is represented, by whom, how, and with what effects?

In section "Case Study in Ethics and GIS: Policing" we consider these questions in the context of policing in New York City. In the last couple of decades policing has seized on GIS and spatial data, not only in a drive to gather and visualize data, but in the context of an increasing means and authority to do so. This has meant a proliferation of crime mapping, with its impacts on security and wellbeing and its power over life and death. More recently, the use of predictive policing (PredPol) has combined traditional crime mapping with algorithmic procedures. What, then, are the ethical questions raised by these practices?

In section "Ethics and GeoDesign" we examine ethics in the context of Geodesign, or GIS-guided design and planning. Geodesign is an interesting case study because some of its most significant ethical questions arise at a "higher order" than that captured by ethical codes of conduct or outcomes (that is, at the level of what is commonly known as "metaethics"). For example, who is it that participates in the planning process? Indeed, who may participate? Histories of Geodesign may not even capture those who did participate, especially if, like women, they were underrepresented in the first place. The recovery of their contributions may await a future historian in the vein of those who recovered the contributions of Rosalind Franklin or Marie Tharp. Geodesign, since it explicitly sets itself the task of "designing the future," will logically need to consider who adjudicates these futures, who benefits (and presumably who loses out), and on what grounds. Framed as an ethical question, it is not clear on its face what model of ethics to deploy to answer this question (deontological, consequentialist, virtue ethics, or none of these). Geodesign and policing then help us recognize that ethical questions are not simply to be applied to practice, but that they are worked out in practices, and that they may not result in answers but rather in further rounds of questioning.

The issue of ethics and geospatial information technologies or GIS is now at least 40 years old.
Under other names (such as "social justice") it may fairly be said to date back to the initial impulses of those in the quantitative revolution who were concerned with urban social justice. Bunge's well-known "Geographical Expeditions" took place in the late 1960s and early 1970s (Bunge, 1971). Prior to that, the University of Washington provided an intellectual impetus to deploy information, computing, and spatial analysis, often in a manner familiar to many contemporary GIS users. Grouped around the pioneering work of William ("Bill") Garrison, individuals such as Morrill (1965), Ronald Boyce, John Nystuen, and Duane Marble occupied privileged positions at the University of Washington in the mid-1950s (Barnes, 2004). These "Garrison Raiders" (Barnes and Wilson, 2014, p. 9) were later joined by William Bunge, Michael Dacey, Arthur Getis, and Waldo Tobler. Although their commitment to systematic geographies is well known, so much so that they are often called the "space cadets" (a term initially used in a derogatory fashion by a UCLA cultural geographer; see Barnes, 2004), their concern with social justice (or at least social concern) has received less play. But as Morrill recalls, during the 1960s he and his wife "and several other couples - white, black and mixed - barely escaped arrest for picketing and for testing real estate and banking practices (we were all members of the Congress of Racial Equality)" (Morrill, 1993, p. 352).

For example, Edgar Horwood, who founded URISA in the early 1960s and worked with extensive Census Bureau datasets, was interested in "urban blight" in Spokane, Washington, more specifically "concentrations of housing units lacking plumbing facilities, and that were observed as deteriorating or dilapidated" (Dueker, 2012, p. 39). What was sometimes known as the "urban question" involved questions of both transportation and planning as people increasingly moved to the cities and as cities became larger. Of course urban planning was not itself new. European cities had addressed concerns of security and health, and had instituted data collection programs, since the 16th century; what was new was the scale of the data and the computational power coming into view. As Horwood remarked of the early and mid-1960s:

Urban region modeling studies were at the height of their expectations as a new scientific base for urban and regional planning. The cities had not yet burned up and the words of Martin Luther King were not yet in the foreground of the national conscience.
Horwood, 2012 [1977], pp. 28–29.

During the early 1960s, as "urban renewal" took center stage in many communities, Horwood traveled the country demonstrating his Array and Card Mapping programs, written in FORTRAN and created with the help of the Garrison Raiders (Wilson, 2017); these were precursors of and influences on the development of SYMAP at Northwestern in 1964, which in turn later yielded ArcInfo and ArcGIS at Esri (see also Tobler, 1959). And although some of this was informed by a societal concern with social justice and civil rights in 1960s America, and some of it with promoting economic development, it also perhaps drew on earlier work such as that of the British social reformer Charles Booth, whose famous Life and Labour of the People in London (Booth, 1889) mapped the social conditions of the working class in that city; on political principles, however flawed, behind Wilson's famous "Fourteen Points" (perhaps partially drafted by geographer Isaiah Bowman; see Smith, 2003); and on strong statements on race and racism after World War II by UNESCO, under its first Director, the British evolutionary biologist Julian Huxley (Crampton, 2009). Much of this was in response to the fascism of the 1930s and the aftermath of WWII, especially the Nuremberg Trials, which we discuss briefly below.

But this raises the question: what is ethics in GIS? Much of the work described above was ethics avant la lettre. There is no doubt it overlaps with and informs subsequent work on ethical thinking in GIS, digital mapping, and geospatial technologies, but it was not necessarily known as that. At its most fundamental, ethics is the question of "what is the right thing for us to do?" But ethics is also reflexive in the sense that it can ask what each of those terms means. Who is "us": the individual, a community, or everybody? What is the "right" or normative thing to do, and what flavor of normativity (strong, weak, or based on "practical reason," a pragmatic "competence on the go" as Barnett puts it (Barnett, 2013, p. 153)) is to be adopted?

While this is not an article on the philosophy of ethics, it will be useful to clarify some terms and basic positions. If we examine the Code of Ethics adopted by the GIS Certification Institute, which was formed by URISA in the early 2000s and which offers GIS professional certification on the basis of the UCGIS Body of Knowledge, it is stated that: "This code is based on the ethical principle of always treating others with respect and never merely as means to an end: i.e., deontology" (https://www.gisci.org/Portals/0/Ethics/CodeOfEthics_PR.pdf). Rather than specifying particular acts which are allowable or not, deontology establishes a set of rules or principles by which to act. To date, nearly 8000 individuals have been awarded the GIS Professional (GISP) Certification under this rubric.

A competing normative theory of ethics is known as "consequentialist." On this view, there are no rules as such; rather, it is the consequences or outcomes of actions that are ethical or not. One must rationally weigh the pros and cons of an action in a cost-benefit analysis. At its most blunt, this is utilitarianism: a GIS is ethical if it produces a net social benefit (Lake, 1993). As we shall see, however, ethical thinking today tends to formulate around neither of these positions, but has seen a rise in what is known as a "virtue theory" of ethics. The remainder of this introduction will discuss these positions before sketching out how ethical thinking has appeared in GIS, cartographic, and geospatial practice and thought.

1.27.1.1 Ethics in Geospatial Research: A Brief History

We can identify three main lines of ethical concern for scientists after WWII. As codified today in Institutional Review Boards (IRBs) and informed consent rules for pursuing scientific research, they are: atomic scientists and nuclear research; human experimentation, as documented at the Nuremberg Trials but also in the US in the Tuskegee experiments, Stanley Milgram's controversial research on obedience to authority (carried out in the 1960s), and Philip Zimbardo's 1971 Prison Experiment; and the environmental movement, which developed an ethics of responsibility or care for habitus (1970s). The first of these can broadly be said to give rise to a scientific ethos of not contributing to work which potentially causes harm. This has a consequentialist flavor to it. The human experiments mentioned would not be possible today (even where they yielded insights into the human condition) because they fail the rule of informed consent, a deontological ethics. For the GIScientist, the concern of care for the lived world is perhaps the closest to their work. The concern here might be to contribute to research that improves social justice (e.g., environmental justice), to develop alternative sources of energy to mitigate global climate change (e.g., the IPCC), or to provide relief where harm or loss of wellbeing has occurred (e.g., humanitarian mapping).

These three strands represent scientists' acknowledgment that not all research is justifiable. Subsequent instruments such as IRBs, the Declaration of Helsinki on medical research (1964, revised in 2013), the US Federal "Common Rule" (45 CFR 46) used by the National Science Foundation (NSF) among others, and the Belmont Report (1979) established rules and norms against which to adjudge the permissibility of proposed research, especially research involving human and live subjects. Many of these were in turn derived from the Nuremberg Code of 1947. Milgram's experiments, although they were clearly important and potentially policy-relevant, were nonconsensual (in part) and would likely not be permitted under contemporary norms of informed consent, founded on "respect for persons, beneficence, and justice" as the Belmont Report frames it; or, in other words, on the principles of informed consent, protection of subjects' wellbeing, and equitable treatment. Ethics as normative codes therefore became a central understanding in research, if largely driven by a medical science context, which left their applicability to social science open to question.

For the GIScientist too, this understanding has occupied a consistent strand in research. The earliest engagement with ethics (under that name) that we are aware of in GIScience/GIT was Thomas Peucker's (later Poiker) call for a code of professional ethics at a landmark gathering of GIS scholars and practitioners at the Harvard Graphics Lab in 1977 (Dutton, 1978). Peucker called for a group which would provide a "strictly enforced code of ethics for the protection of each member's intellectual property" (Dutton, 1978, p. 109). Protection of intellectual property (IP) is still a core concern for many in the field of digital mapping, although today there is a significant and growing tradition of open access (OA) and citizen science (e.g., OpenStreetMap). For example, the GISCI Code of Ethics includes a section on "Obligations to Colleagues and the Profession" which explicitly includes IP. Such codes need not exist only as toothless guidelines; US law enshrines copyright protections in such Acts as the Digital Millennium Copyright Act (DMCA), which provides the legal basis for "take-down" of copyright-infringing content (e.g., YouTube videos). Similar laws exist in Europe, and the DMCA was in turn based on World Intellectual Property Organization (WIPO) international treaties.
Furthermore, the US instilled copyright protections into its constitution (known as the Copyright Clause, Article I, Section 8), although these rights were explicitly for a "limited time." Parallel provisions with enforceable consequences may also be found in student handbooks enjoining against plagiarism.

Prior to Peucker's call, however, there are a number of key events and achievements that, while not necessarily carried out under the rubric of ethics, were at least informed by a sense of social justice or social concern. Among these must be included many of the early quantifiers in geography who were interested in using information, computing, and spatial analysis in ways familiar to contemporary GIS. As has been observed in more detail elsewhere, one of the prime loci was the University of Washington in the 1950s and 1960s.

Nevertheless, other concerns have from time to time caused a reassessment and expansion of what ethics means in digital mapping and GIS. One of the more interesting and productive of these actually began as a backlash against current practice, rather than by being developed within the field. About 10 years after the Harvard Graphics Lab meetings in Boston, several authors contributed critiques of mapping and GIS that urged a rethink of ethics as more than good professional practice. Mapping and GIS, they argued, are powerful societal actors (this critique is often known as the "GIS and Society" initiative). The British historical geographer J. Brian Harley, for example, started working on a book to be called Maps as Ideology in the early 1980s. During the decade up to his death in 1991, he published around half a dozen now well-cited articles that, while not internally consistent (Harley was experimenting with new concepts and changed his approach more than once), brought into play now common concepts such as power, silences, knowledge, privilege, and representation. In this sense, Harley developed what is now called critical cartography and GIS.

At about the same time, John Pickles drafted a paper criticizing GIS as a technology of surveillance (Pickles, 1991). Surveillance has a specific meaning that separates it from simple looking: meaning "oversight," it is etymologically derived from over and vigil, watchful. The word entered the English language during the Terror after the French Revolution. Pickles argued that maps effectively perform a looking (e.g., at the landscape) but that, at least on occasion and maybe increasingly with better technology, they do more than just look; they control. But even looking is not innocent. Here Pickles could draw on similar lines of thought dating to the 1960s and the way that "the gaze" (le regard in French) was itself never innocent. For example, in his book The Birth of the Clinic (first published in 1963), Michel Foucault argued that the gaze is not just a sensory experience but comes "armed" in a system of signification. By drawing on this line of thought, in other words, Pickles was mounting a critique of GIS as being unethical in its very foundation (see also Smith, 1992). The same insight proved to be a central one in fields such as feminism. Already in The Second Sex, Simone de Beauvoir (1949) had identified women as subjected to this gaze, subjectified as "the other" (and hence inferior).
In like manner, mapping and GIS are not innocent (even if mapmakers and GIS practitioners lack malice); rather, they subjectify their objects - that is, conceive, categorize, and conduct them. In its most extreme forms, this "panoptic" surveillance (literally, all-seeing) would apply to each and all. For Pickles, it was not that GIS looked, but that it "looked at," and as it did so, it constituted subjects and objects (Scott, 1998). To give an example, the typical spatial data model comprises points, lines, and areas (vector) or gridded cells (raster), but neither people nor places are any of these things. More recently, the same critique has been applied to the rush to "spatial ontologies" and the need to "go beyond the geotag" (Crampton et al., 2013). Geographic science, despite some attempts to conceive of it as such (for example, Warntz and Bunge's proposed book Geography: The Innocent Science; see Heynen and Barnes, 2011, and Box 45 in the William Warntz papers at Cornell University), was not innocent.

This line of thought fit well with a new constellation of research that emerged in the 1980s, known as science and technology studies (STS), which aimed at exposing the "situated knowledge" of scientific practices. Behind the scenes, an AAG President's column which called GIS "nonintellectual" had also rankled many GIS practitioners (Jordan, 1988). For many, there was "alarm" that seemingly outmoded or left-behind approaches were once again sneaking back in (Lake, 1993). Lake's essay examined these concerns in the field of planning. On the one hand, he argued, there was a rise of rationalist and GIS approaches in planning, while on the other there were proliferating critiques of positivism in geography and social theory. The result was a "yawning chasm" (Lake, 1993, p. 405). For Lake, the situation was critical:

A code of ethics governs individual practice, but it ignores the ethics of the project of which that practice is a part. Equal access is ethically insufficient if it provides access to an ethically flawed project. In what sense, then, is the project of Geographic Information Systems ethically flawed? It is flawed because it relies on a partial and incomplete approach to ethics; because of the ethical consequences of its uncritical adoption of the positivist assumption of subject-object dualism; and because of its inability to comprehend and respect the subjective differences among the individuals who constitute the irreducible data points at the base of the GIS edifice. These issues are inextricably linked.
Lake, 1993, p. 407.

These critiques led one of the prime movers behind the Harvard meetings to call for a fresh engagement between GIS practitioners and GIS critics. In April 1993, Tom Poiker (then at Simon Fraser University) proposed a workshop on geographic information and society, to be held at the Marine Research Center in Friday Harbor, Washington (a town on San Juan Island). Approximately 30 people attended, including Eric Sheppard (Moderator) and Nick Chrisman (Local Arrangements), and the workshop was funded by the NCGIA. The Core Planning Group also included Helen Couclelis (UCSB), Mike Goodchild (UCSB), David Mark (Buffalo), Harlan Onsrud (Maine), and John Pickles (Kentucky). The AAG was represented by Ron Abler (who had established the three main GIS Centers under the NCGIA during the 1980s). Michael Curry also played a key role.

Outcomes of the meeting were reported in two main venues: a special issue of the journal Cartography and Geographic Information Systems (CaGIS) in 1995, and Pickles' Ground Truth (Pickles, 1995). The papers in their original form were also printed in a book distributed to the participants (this remains unpublished). Poiker (1995, p. 4) identifies the following main questions arising from the meeting:

1. Power/access
2. Education
3. Ethics and values
4. Representation
5. Cross-cultural questions
6. Applications (e.g., Gender and GIS)
7. Democracy
8. Decision making
9. Nature/Society
10. History of GIS
11. Social conflict
12. Privacy

A follow-up meeting was held in February 1995 in Annandale, MN (Sheppard, 2005). Following this, a new NCGIA Initiative on "GIS and Society," sometimes known as "I19," was launched (Crampton and Wilson, 2015).

Pickles' 1995 book had actually been planned since 1990, when it was conceived in conjunction with Harley. According to Harley's papers, now in the British Library, Pickles and Harley had first corresponded in 1984 and again in 1987 over the issue of propaganda maps, or more generally "the issue of deception and ideology" in mapping (Pickles to Harley, personal communication, 28 January 1985, Harley Archives, The British Library; see also Pickles, 2004). This issue was taken up at the 1991 AAG meetings in Miami, under the heading of "Ethics in Cartography: Issues and Scope," organized by Mark Monmonier, with Pickles presenting what would become his Friday Harbor and Ground Truth paper, and with Harley as discussant. Pickles' memories of these events are captured in a recent interview (Crampton and Wilson, 2015).

Mark Monmonier contributed a paper on "the ethics of the one-map solution," where only a single map is produced in response to a mapping problem. For Monmonier, such an approach called for one of several preferable alternatives: (1) a dynamic sequence of maps showing different views; (2) interactive mapping where the user can change the map; (3) professional standards for choosing a particular map; (4) "full disclosure" by the cartographer of draft and alternative views; (5) efforts to educate map users in "informed skepticism"; and (6) institutional structures, including classes, a journal of cartographic criticism, or public forums for promoting map critique (see also Monmonier, 1991; Monmonier, M., The Ethics of the One-Map Solution, paper given at the 1991 AAG Conference, Miami, FL). These useful suggestions show again that although a professional code may be necessary, mapping ethics is as much a practice of critique that acknowledges the limits to knowledge.


Part of the concern of the critics was that GIS was becoming more institutionalized, such as with the so-called "Big Book" of GIS (Maguire et al., 1991), in a manner "quite different" from that pursued by the rest of the geographic discipline (Pickles, 1995, p. 12). Further details may be obtained in Schuurman's institutional accounts (Schuurman, 2000), but what concerns us here is how ethics in cartography and GIS was being pursued. After Friday Harbor and the identification of the 12 research questions, it was no longer possible to understand ethics as professional conduct alone.

At about the same time, there were efforts to consider ethics explicitly in cartography, known as the cartography round-tables. These round-tables sprang from professional meetings in 1989 (NACIS Ann Arbor, AAG Baltimore) and 1991 (AAG Miami) (see McHaffie et al., 1990) and a subsequent commentary (Harley, 2001 [1991]). They pushed up against a definition of ethics as only comprising professional practice. Harley took exception to some of the viewpoints, observing:

For example, the emphasis on the copyright question as a major ethical issue seems to be misplaced. The old English rhyme tells us

The law locks up both man and woman
Who steals the goose from off the common
But lets the greater felon loose
Who steals the common from the goose.

Harley, 2001 [1991], p. 199.

Harley's ultimate point here is that ethics should be a social process, rather than one carried out by others, even if those others were professionals. In the roundtable, McHaffie made a similar point, arguing that the very labor process of mapping could fall within the remit of ethical concern. As Harley put it in his discussant comments at the 1991 meeting, ethics "was a struggle for the intellectual territory of the map" (Harley Archive, The British Library). Harley's position was subsequently summarized in his important paper (Harley, 2001 [1991]), in which he identifies three areas of ethical concern: what is covered by ethics (as a practical matter); the effort to resolve truth claims in mapping; and, thirdly, the wider social concerns and social justice to which mapping contributes. These three issues are perhaps still a good summary of ethics in today's mapping and GIS, but nevertheless, thinking about ethics has progressed more recently in geography as a whole. A short summary of contemporary thought on ethics is now offered, along with its possible implications for work in GIScience.

1.27.1.2 Contemporary Thought on Ethics in Geography

The central question of ethics today can be captured by a dilemma: the wish for clear foundations upon which to act (a normative epistemological sense of justice and truth, that is, a politics) sits alongside a parallel wish to provide critique. These two wishes appear to contradict each other. On the one hand, it would appear, we wish to assert true knowledge of something (e.g., global climate change), while on the other we wish to keep in reserve the practice of critique, derived from half a century or more of poststructuralist thought. (Both Kant and Marx saw their work as critique, but for our purposes we are concentrating on modern-day influences.) Can we both believe things and yet acknowledge that these beliefs are not foundational? As Barnett describes the situation, "[v]arious formulations finesse this problem, including contingent foundations, strategic essentialism, onto-stories, and weak ontology" (Barnett, 2010, p. 247).

A "weak ontology" is an ontology, or conception of how and what being is, that is not universal, either historically or geographically. In this sense, it is different from how some GIScientists have used "ontology" to mean objects with characteristics. That conception of ontology as essences with characteristics, known since Aristotle as predicate or substance ontology, is very narrow and incomplete in the terms we are discussing here. This is because you cannot gain insight into things by abstractly listing their attributes and ignoring all the ways they have pasts, presents, and futural possibilities. Humans especially (but ultimately anything we care about) are not best understood as having their being as entities with properties (Crampton, 2009). A better example of Barnett's weak ontology would be Hacking's "historical ontology" (Hacking, 2002). Hacking derives the phrase from the work of Michel Foucault and the latter's concern for the problem of how we think of ourselves. Foucault suggested that there were three main ways, or axes, of doing this: knowledge, power, and ethics. In terms of ethics, Foucault characterized the issue of how we "constitute" ourselves in the sense of how we invent, make, or bring ourselves into question as moral agents "in quite specific, local, historical ways" (Hacking, 2002, p. 3). In other words, there are ethical principles, but they are local and historical. Our task is not to universalize these ethics, but to understand them historically.

How is this to be done? Toward the end of his life, Foucault often spoke of ethics, but this remains an underexamined aspect of his work compared to the avalanche of writing on power and knowledge (for exceptions see Elden, 2002, 2016). In sum, Foucault saw ethics as a practice, which was as much a practice or work on the self as on or with others: ethics as a self-practice (Foucault, 1997). The work that one did on the self as ethics was not meant to be for purposes of self-improvement, aggrandizement, or selfishness. Rather, it is meant to address the problem of how freedom can be practiced, or better yet, how we can know the possibilities of freedom. "This ethical problem of practices of freedom, it seems to me, is much more important than the rather repetitive affirmation that sexuality or desire must be liberated" (Foucault, 1997, p. 283).

To give a simple example, a GIS practitioner may desire that local government data about home ownership across a town be made available or open access. One may try to "liberate" these data from their bureaucratic silos and make them open access. However, simply realizing that something needs to be liberated, be it data, speech, or desire, is insufficient to make it so. What needs to additionally occur is that a person practice an ethics of understanding how and under what conditions freedom can occur. In the case of local government data, what are the conditions for freeing these data? Under what conditions would the city release them? A different set of leaders? Other inspiring examples of open city data? Addressing fears of loss of privacy or profit? An understanding that these data contribute to the improved wellbeing of the city's residents? These are the ethical questions of this situation. As can be seen, they are in the realm of the political.

But it does not end there. A further ethical question is how these data constitute the subject as something and not as something else. In other words, reverse the current practice of starting with the subject and asking what can be known about it, and instead start with knowledges and see what subjectivities they form. What work does data or the map do in the world? These forms of resistance, which might show other possibilities and the consequences of the current situation, are also questions of ethics.

To return to our apparent contradiction of ethics, then: the desire to have foundational knowledge and the desire to apply critique allow us to understand the revival of a third branch of ethics in addition to rules-based (deontological) and outcomes-based (consequentialist) ethics, that is, virtue ethics. In virtue ethics, action is not rationalized from foundational beliefs, "but rather in terms of elaboration, elucidation, and amplification - of 'making things explicit'" (Barnett, 2010, p. 247). This is quite close to what Foucault elaborates, although it is not limited to him. Barnett offers the example of social justice and the right to the city, whereby ethical engagement might spring not so much from a foundational sense of what justice comprises but rather from a sense of injustice. This "sense" might be nonrepresentational and nonfoundational (that is, a feeling or affect). These feelings, which may be shared by some but not all others, are certainly ethical, but not because injustice is compared to some theory or set of rules about justice. Engaging with these feelings may better help us to understand seemingly "irrational" actions, such as communities which receive large European subsidies voting for Brexit, or residents of Kentucky reliant on federal benefits voting for a governor who promises to remove those benefits.

An ethical GIScientist, then, or rather one who wishes to engage ethically, is less concerned about judging behaviors against a set of rules or even of outcomes, and prefers to better understand issues of equality, justice, and public policy. According to Barnett on this view, "It is more appropriate to acknowledge that 'we' in the West stand in the position of supporters and beneficiaries of global institutional systems that contribute to the impoverishment and disenfranchisement of distant others" (Barnett, 2010, p. 250). Ethics here is less about developing self-practices than about engaging the background conditions such as class structure, racisms, and divisions of labor, less for reasons of assigning blame, and more to work out collective or shared practices of questions of justice. This position is particularly associated with the influential writers Iris Marion Young and Doreen Massey. Young argues that more than justice and knowledge (or senses of injustice) are required; also entailed are shared responsibilities (Young, 2010).
Barnett's call above for a reconsideration of normativity, as less about prescribing what must and must not be done and more as a "competence on the go," entails a more ordinary sense of normativity (Barnett, 2013). In this sense, thinking and doing sit alongside each other, rather than forming a dichotomy that needs to be mediated. How might this play out for the GIScientist? For Gerlach, the ethics of this everyday sense of thinking and doing occurs in what he calls "vernacular mappings," or those which are "of and for the everyday" (Gerlach, 2010). Gerlach uses the example of OpenStreetMap, the now well-known crowdsourced map of the "entire" world started in 2004 (the scare quotes indicate that it is of course very partial in its representations, as feminist GISers have pointed out (Stephens, 2013)), to question "counter-mapping." For Gerlach, counter-mapping is too binary; it opposes an imagined indigenous set of knowledge against "the ostensible might of Western GIS and Euclidean reasoning" (2010, p. 166). In geography perhaps the most contentious example is the Bowman Expeditions (Bryan and Wood, 2015; Wainwright, 2013) and the vigorous defenses mounted by its practitioners (Dobson, 2009). How does one proceed ethically here? For Gerlach, like Barnett, the answer lies in the more practical or pragmatic reasoning of the everyday: "what can a map do?" (Gerlach, 2010, p. 166; see also Deleuze and Guattari, 1987; Buchanan, 1997). For Gerlach and others working in this vein (Wilmott, 2016), how ethics "comes next" is less a matter of a code of conduct (e.g., the GISci Code of Ethics) than of "the generation of maps that tell open-ended and inconclusive stories, of spaces constantly on the move and coming into being, an ethics that takes care of what might come next" (Gerlach, 2010, p. 167).

For Leszczynski (2015), mediation is similarly less an intervening factor that contributes to dichotomies between thought and action (e.g., the map). Instead, she outlines a theory of mediation that understands spaces as sites of potential relations between a whole set of interested parties: people, technologies, and material objects such as APIs, smartphones, and algorithms. This concern for less of a hierarchical dichotomy, and an understanding of a "flat ontology" or assemblage (agencement), has proven very popular in geography since its introduction in the early 2000s. The ethical question emerging from this understanding is one which would have us engage with this assemblage. However, Leszczynski is adamant that this not be done to retroactively paint a picture of GIS as emerging from these assemblages. That would not only gloss over what is new in understanding spatio-technologies as media, but ignore their incompatibility. Whereas GIS is systems-based, she argues, spatial media are not. For the ethically minded GIScientist, therefore, Leszczynski demands to know whether GIS in the narrow technological sense (software, hardware, and spatial analysis) is still relevant today, or whether "our everyday being-with-each-other, and our being-with location-based objects and services" (Leszczynski, 2015) is what matters.

For all these writers (Gerlach, Wilmott, Leszczynski, Dodge, and Kitchin), the ethically pressing question becomes how everyday mappings-as-processes or performances are co-constituted with spatialities:

In addition to being constitutive of how space is ontogenetically (re)enacted through our encounters with spatial media, the ways in which location is increasingly underwriting both the organization of content itself and technically-mediated social connectivity is indicative of the ways in which networked location-enabled devices and spatial big data assemblages not only mediate experiences and perceptions of space, but also experiences of (codified) information and of sociality as structured and enacted through spatial media.
Leszczynski, 2015, p. 18.


It is interesting, therefore, that although ethics is sometimes thought of as an abstract and abstruse topic, current ethical thinking in mapping has returned it very much to the level of the ordinary or the everyday. If GIS is understood as specialized, or the domain of those with GISci certifications, the ethics of GIS described above appears to suggest a rather different future, in which socio-spatiotechnologies are immanent to everyday life, whether as Big Data, algorithmic decision making, or location-aware devices. If so, this might open up rich new avenues for research and understanding that will go beyond traditional understandings of GIS.

1.27.2 Case Study in Ethics and GIS: Policing

1.27.2.1 Data Acquisition

Before we assess how a map looks, and what it does in the world, we should consider how its data were collected. Often the process of data acquisition itself can be embodied in violent and painful ways. The New York City Police Department (NYPD), for instance, claims to rely on data from police street stops, part of the program known as Stop-and-Frisk. Around the height of the program's activity, then-Police Commissioner Raymond Kelly alleged that without Stop-and-Frisk "there will be, inevitably, killers and other criminals who won't be captured as quickly or perhaps ever" (in Del Signore, 2010b). Yet the program has been under criticism from citizens, media, scholars, legal advocacy groups, and human rights groups for nearly a decade.

One concern is the sheer number of stops, which rose from nearly 100,000 in 2002 to a peak of nearly 700,000 in 2011. The vast majority of those stopped, including 88% in 2011, are found "completely innocent according to the NYPD's own reports" (NYCLU S&F Data). The percentage found innocent is even higher among Black and Latino New Yorkers, who are stopped at vastly disproportionate rates. In 2012, 40.6% of all stops were of Black and Latino young men, although they made up only 4.7% of the city's population (NYCLU Analysis). In fact, at times the number of young black men stopped has exceeded the number of young black men in the city (NYCLU Analysis). The combined unevenness and statistical inaccuracy of Stop-and-Frisk led to the 2013 decision in a federal class action lawsuit against the city. The court found that Stop-and-Frisk was statistically unreasonable, thereby violating New Yorkers' Fourth Amendment rights against unreasonable search and seizure, and their Fourteenth Amendment rights to equal protection under the law. The number of stops declined to roughly 23,000 by 2015, but they remain racially and spatially uneven.

Furthermore, the experience of the stop remains uncomfortable, often humiliating, and at times violent and marred by sexual harassment (CCR, 2012, p. 5). As the Center for Constitutional Rights notes, "everyone subject to a Stop-and-Frisk must cope with the emotional, psychological, social, and economic impact on their lives" (CCR, 2012, p. 5). Fear of racially targeted police stops weighs heavily on the everyday lives of targeted New Yorkers, curtailing their very mobility. Black and Latino New Yorkers in heavily policed zones report changing the times and routes traveled, mode of transit, and clothing to avoid police harassment, while some avoided leaving the house and others found themselves prevented from reaching their destinations (Kaufman, 2016). Independent of what data is gathered in these stops, and how it is recorded, stored, visualized, and used, the act of stopping and questioning civilians has lasting and multiple effects.
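To make the scale of this disproportionality concrete, a back-of-envelope calculation suffices. The short sketch below (in Python; the percentages are those cited above, and the code itself is purely illustrative) computes how heavily overrepresented Black and Latino young men were among those stopped in 2012:

```python
# Back-of-envelope check of the disproportionality cited above:
# Black and Latino young men were 40.6% of 2012 stops
# but only 4.7% of the city's population.
stop_share = 0.406
population_share = 0.047

ratio = stop_share / population_share
print(f"Overrepresentation factor: {ratio:.1f}x")  # prints ~8.6x
```

In other words, a young Black or Latino man was, on these figures, roughly eight to nine times more likely to appear in the stop data than his share of the population would suggest.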

1.27.2.2 Data Management

What happens to Stop-and-Frisk data after it is collected is another site of controversy. First of all, the data gathered in each stop span 112 columns in the .CSV file the department makes public. While some categories are redundant yes/no entries, the data gathered include details of the stop itself (time, exact location), the reasons for the stop (to be entered whether or not the suspicion was borne out), the reason for escalating to a frisk, and a plethora of biometric markers including race, sex, age, height, hair color, eye color, build, and any other features such as scars or tattoos (NYPD, 2015). There is also a category for the type of physical force used by the officer, which illuminates the options built into and expected in the stop: "hands, suspect against wall, suspect on ground, weapon drawn, weapon pointed, baton, handcuffs, pepper spray, and other" (NYPD, 2015). Because these data are anonymous, they can be stored in the NYPD's massive computer data bank.

But the NYPD gathers more explicitly identifying data as well, which are managed differently. Even if those stopped are found innocent, their names, addresses, Social Security numbers, and other personal information are recorded. Until 2010, these too were entered into the computer database. But in response to privacy concerns over the swelling records of largely Black and Latino subjects, then-Governor Paterson signed legislation ending the NYPD's database of those stopped and found innocent (Del Signore, 2010b). In Paterson's words, "people who are found to be doing nothing wrong in the United States of America should not have information about them floating around the Police Department" (in Del Signore, 2010a). This information, Paterson argued, violated a principle he described as "compatible with the presumption of innocence, and ... deeply ingrained in our sense of justice - that individuals wrongly accused of a crime should suffer neither stigma nor adverse consequences by virtue of an arrest or criminal accusation not resulting in conviction" (in Baker and Moynihan, 2010).

It should be noted that the bill itself only prohibits data storage in an electronic database. As paper records are filed locally in each of the city's 79 precincts, in effect the measure renders the data unusable. But as an anonymous officer posted on the unofficial police social media board Thee Rant, "All it does is remove electronic technology from the equation. Its back to paper binders.another house mouse job as binder tender and wandering detectives seeking elusive suspect clues and witnesses.....(looking for that elusive/readable desk copy)!" (Thee Rant, 2016). The officer suggests that the personal data of the innocent remain fair game in investigations, precisely what the bill aimed to prevent. Thus, the continued existence of paper records may frustrate and hinder police, but it does not entirely protect the privacy of those stopped and found innocent.
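Because the department releases these records only as flat .CSV files, code is the public's main entry point to them. As a minimal sketch (Python with pandas; the file name and the column codes arstmade, sumissue, and frisked follow the published Stop-and-Frisk file layout in spirit but should be verified against the codebook for the year in question), one can tally how many recorded stops produced neither an arrest nor a summons:

```python
# Minimal audit of one year of the public Stop-and-Frisk file.
# File name and column codes are assumptions; check the NYPD
# codebook for the relevant year before relying on them.
import pandas as pd

stops = pd.read_csv("sqf-2011.csv", low_memory=False)
print(stops.shape)  # roughly 685,000 rows by ~112 columns for 2011

# Stops ending with neither an arrest nor a summons: the basis of
# the "completely innocent" figure cited above.
innocent = (stops["arstmade"].eq("N") & stops["sumissue"].eq("N")).mean()
print(f"{innocent:.1%} of recorded stops led to no arrest or summons")

# Share of stops that escalated to a frisk.
print(f"{stops['frisked'].eq('Y').mean():.1%} of stops included a frisk")
```

The ease of such an audit is itself part of the ethical picture: the anonymous table makes the program's statistical profile legible, while telling the analyst nothing about the bodily experience of any single stop.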


Here we see Stop-and-Frisk data storage both required and prohibited in the name of civil liberties, reminding us that neither is unilaterally more ethical. And while both measures aim to keep rampant racial profiling in check, neither affects the acquisition process. In fact, in a department-wide memo after the bill was signed, Kelly expressed his desire to keep Stop-and-Frisk practices the same: "the law does not affect an officer's ability to collect identification information at the scene of a street encounter" (in Del Signore, 2010a,b; English, 2010). Paterson himself had issued a similar statement, albeit in response to Kelly and Bloomberg's condemnation, assuring them that "this law does not in any way tamper with our stop-and-frisk policies" (in Beja, 2010). Thus, data management has significant ethical implications and as such is the site of political feuds, but these implications cannot be evaluated on the grounds of data management alone.

Stop-and-Frisk data on those found innocent may not be electronically stored, but the 2010 legislation has no bearing on those displaying some evidence of wrongdoing. Since 1994, the NYPD has employed a real-time crime tracking database called CompStat, which has allowed the NYPD to make predictions, and generalizations, about the who, what, when, and where of crime. The department still holds weekly meetings with all precinct commanders to assess the week's crime and arrest statistics. Crime maps have played a key role from the start, for the real-time database allows commanders to visualize when and where crime occurs and to overlay police distribution and arrest data. A map of crime locations is projected, and precinct commanders were, in Bratton's words, held accountable for "putting cops on the dots" (in Gilsanan and Stepan, 2014, p. 4).

Searching the internet for CompStat meetings, or watching and reading the news in New York, you will find no shortage of information about, and glimpses into, these practices. But they are largely press releases and puff pieces, including an "unprecedented look at CompStat" by NBC New York (Dienst et al., 2016). Reporters filmed segments of one CompStat meeting and interviewed commanders and chiefs, who praised the department and the program. Then-Chief, now Police Commissioner, James O'Neill echoed the dominant sentiment when he said, "There is direct accountability. I think that's the genius of Compstat." As the segment closes, an NBC anchor shakes her head, repeating "incredible access, incredible access!"

Yet despite such glimpses, the public does not have access to the way the NYPD processes and visualizes data for its own use, nor to how those data shape police policy. The department claims that crime and Stop-and-Frisk data drive the spatial and temporal allocation of police resources, as well as special tactics. But in all the discussion of CompStat's role in holding commanders and precincts accountable, there is no discussion of accountability to the public. We are left to take their word that arrest rates match crime, and we are not given any tools to keep them in check, to check up on them, and to make our own assessments. It is, therefore, difficult to assess the NYPD's data in practice, and it seems quite intentionally so.
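The overlay itself is technically trivial; what is closed off is the data. As a hypothetical sketch (file names and columns here are invented), the kind of view a CompStat meeting projects could be reproduced by the public in a few lines, if the inputs were released:

```python
# Hypothetical reconstruction of a CompStat-style overlay: reported
# crimes and recorded stops plotted in one coordinate space.
# Both input files and their columns are invented for illustration;
# no such citywide crime file is actually published.
import pandas as pd
import matplotlib.pyplot as plt

crimes = pd.read_csv("felonies.csv")  # assumed columns: lon, lat
stops = pd.read_csv("stops.csv")      # assumed columns: lon, lat

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(crimes["lon"], crimes["lat"], s=4, c="0.6", label="reported felonies")
ax.scatter(stops["lon"], stops["lat"], s=4, c="red", alpha=0.3, label="police stops")
ax.set_aspect("equal")
ax.legend(loc="upper right")
ax.set_title("Are the cops on the dots?")
plt.show()
```

That such a script cannot currently be run against official citywide data is precisely the asymmetry of accountability described above.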

1.27.2.3 Data Visualization

What the NYPD does provide are .CSV files of nonpersonal Stop-and-Frisk data, which it does not visualize itself, and an interactive crime map, but no citywide crime datasets, making it difficult or impossible to compare police practices with crime rates (though the .CSV files can be downloaded). Before moving on to an example of a police department which does facilitate this overlay, it is worth analyzing the visualizations that the NYPD does provide.

There are two separate crime maps hosted on the city government website. The first is called "NYC Crime Map" and defaults to precinct-level data: a choropleth, not a heat map. Here crime appears confined by clear boundary lines, though precinct boundaries are neither visible nor meaningful in daily life. The display can be toggled to a heat map. Rather than the red-to-green diverging color scheme common in crime maps, both the choropleth and the heat map use a sequential color scheme of flesh tones, from pale beige for low crime to deeper brown for high crime. Regardless of the map makers' intent, the implications of the color scheme (darker flesh tones indicating danger) should be an ethical consideration. The third map type avoids these color connotations and shows crime locations as blue proportional symbols. Data are aggregated to the street segment, and circles show how many crimes were committed in that segment during a time period set by the user's search. The type of crime can be selected as well, from the seven major felony offenses in New York State: murder, rape, robbery, felony assault, burglary, grand larceny, and grand larceny auto. Overall, the visualization is accessible: the selection functions are obvious, and the map is large, with a well-proportioned legend and no non-data ink.

Yet one feature at first appears not to work. Because they give a more precise location of crime than the choropleth, the heat map and the proportional symbol map do not show rape data. Presumably the omission is designed to protect rape victims' privacy and safety. However, this ethical visualization decision is at odds with the other crime map run by the same department. In 2016, the NYPD released an interactive CompStat webpage. With its NYPD shield logo, black background, charts and tables on both sides, and slim map in the center, this map has a no-nonsense aesthetic, contrasting with the NYC Crime Map's bubble letters and pastels. Aesthetics aside, the CompStat 2.0 map shows rape location data at the nearest intersection. The Deputy Commissioner for Information Technology explained that the compromise would "ensure victim privacy" while giving "the public a better sense than they had in the past of where these crimes are occurring" (Katz, 2016). There has never been a clear winner in the ethical tension between privacy and transparency, for the two often come at the expense of each other, and the loss of either can be dangerous. The two maps' visualization choices can both be justified on the grounds of protecting victims and preventing crimes, and the inconsistency between them epitomizes their inherent contradictions.
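That a color ramp is a design decision, not a property of the data, is easy to demonstrate. The sketch below (Python with matplotlib and numpy; the counts are randomly generated stand-ins for precinct-level aggregates) renders the same grid twice, once with a flesh-toned sequential ramp and once with a neutral one:

```python
# The same fabricated "crime counts" under two color ramps.
# 'pink_r' runs from pale flesh tones to dark; 'Blues' is a neutral ramp.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
counts = rng.poisson(lam=20, size=(10, 10))  # stand-in for aggregated counts

fig, axes = plt.subplots(1, 2, figsize=(9, 4))
for ax, cmap in zip(axes, ["pink_r", "Blues"]):
    im = ax.imshow(counts, cmap=cmap)
    ax.set_title(f"cmap='{cmap}'")
    fig.colorbar(im, ax=ax, shrink=0.8)
plt.show()
```

Nothing in the data changes between the two panels; only the connotations do, which is exactly why the choice deserves ethical scrutiny.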

1.27.2.4 Data Dissemination

Just as ethics in GIS is often discussed in terms of who gets to map and be represented by maps, data dissemination is often posed as an ethical concern of who gets to access data and its visualizations. As we saw above in the case of rape data, an equally pressing concern is who should not have access to what data. Perhaps the locations of reported rapes should be concealed from the public, but what about other types of crime? In other words, what are the impacts that crime maps have on civilian percipients? Do we have a right not only to the raw data, but to its visualizations? And if so, from whom do we claim that right?

Publicly available crime maps are becoming a more common output of police departments. Some, like the NYC Crime Map, are disseminated further with the click of a "share on social media" button. But private companies, individuals, and apps are prolific in their crime mapping; thus, we see crime data disseminated more widely than ever. Moving on from the visualization itself, I discuss a new venue for crime data dissemination, with a focus on its relationship with users. As but one example of a growing trend of private companies processing government data, and at times partnering with city governments, take Stamen Design's interactive San Francisco Crimespotting map. The company makes an attractive interactive map of city crime data by location. Where both NYC crime maps are aggregated to street segment or intersection, Stamen's shows exact location and time. It also has seven more categories of crime than New York's maps, including disturbing the peace, alcohol, vandalism, and prostitution: categories whose legitimacy is debated. Whereas robbery is fairly straightforward, disturbing the peace relies on police perception and discretion. Meanwhile, there is an international movement to decriminalize prostitution, and the map is likely to pick up only women walking the streets, not escort services rendered to wealthy men in elite areas.

Stamen may argue that it is simply relaying the city's data, but as Crampton points out, "maps are active and not passive - they are a capacity to do things. Not only are maps involved in achieving a goal, they also define or identify that goal: they frame the narrative" (Crampton, 2014, p. 192). They do so in part by determining "what counts as data" (Crampton, 2014, p. 195), an act which legitimizes data and, in Stamen's Crimespotting map, cements socially constructed categories. Interestingly, while the map is active in Crampton's sense, and in the literal sense of its interactivity, its designers are not necessarily looking for users to be aware of their own role in the action. Co-developer Michal Migurski (Rosling, 2011) explained the statistical operations involved in both creating and viewing the map, yet believes that "to people that are interacting with the thing it feels very much more like they're just sort of browsing a website or you know, shopping on Amazon. They're looking at data and they don't realize they're doing statistics." It's true: part of the success of the map lies in the ease with which the percipient performs a great many functions. But is it possible that these user-friendly designs, from the NYC and CompStat crime maps to Stamen's, allow users to see themselves as passive consumers of facts, rather than active participants in the construction of a narrative?

Contradictions abound, for Stamen reports a goal, and a success, of community empowerment through the map. Consuming the data as passive shoppers does not mean that civilians won't then use it to become more involved in community activities, like attending precinct meetings armed with Crimespotting's data (Rosling, 2011).
Statistics Professor Hans Rosling says of the map, "What's most exciting to me is that public statistics is making citizens more powerful and the authorities more accountable" (Rosling, 2011). Here the map facilitates civilians holding police accountable; it adds transparency to police practices; and it somewhat levels the field between police departments and civilians. All of this is commendable, exciting even. Words like "community" and "empowerment" are persuasive in discussions of ethics, and even more so when combined. But we must always ask: who is empowered? Which communities? And at whose expense? Data dissemination is a useful starting point for answering those questions.

The NYC Crime Map, the CompStat map, and Stamen's Crimespotting map are all free and available to the public, and all appear in a Google search. They employ user-friendly interfaces that require little knowledge of technology and no mapping expertise. This makes the data they convey much more public than the Stop-and-Frisk data, which can only be downloaded in .CSV files. But the digital divide still exists: not everyone has access to computers or smart devices, and not everyone knows how to productively search the internet, what to search for, and how to use the results. Those who lack access to the crime map are likely a small minority. Larger barriers to the empowerment Crimespotting promises include: (1) time to attend precinct meetings, (2) confidence that one's voice matters and will be heard, and (3) the ability to be heard. With police departments across the country embroiled in charges of racist violence, it is not unreasonable to think Black voices may count for less than white voices in precinct meetings. Whether or not this is the case, police stations are not always safe spaces for Black Americans. These are simply barriers to data's positive impacts stemming from, but not ending with, dissemination. Below, I discuss potential negative effects stemming both from these barriers (this differential access to data) and from the data itself.

1.27.2.5 Data Effects

What does a neighborhood safety map do? One longstanding use is redlining, named for the practice of drawing red lines around neighborhoods deemed dangerous. These maps were then used to justify the denial of services, from loans and insurance policies to groceries and health care centers. Redlining maps, which began with the National Housing Act of 1934, differed from the similarly shaded safe-to-dangerous maps made from crime data today in that their determinations were at surveyors’ discretion, and were made predominantly along race and class lines. The language was often explicitly racist, but redlining was not only detrimental to African Americans, their houses, neighborhoods, schools, and local infrastructure and resources—it led to the decline of entire city centers (Gordon, 2008). Today, among the first several hits in a search for a city’s crime map is often a real estate company’s crime heat map of that city. The message is that homebuyers should avoid certain areas, and that houses in those areas should be worth less. The map is as accurate as Stamen’s or the NYPD’s—it uses the same data—but it tells less of the story. Here there is no option to toggle type or time of crime. Unlike the NYC Crime Map, which defaults to a choropleth view and is therefore adjusted for population, the real estate map is based only on raw numbers. Its red areas coincide with the city’s highest population density, but the map does not indicate this. Nor does its legend list even the raw numbers; rather, it is a scale from “less safe” to “safer.” These are not factual or quantitative but deeply situated and value-laden words, reminiscent of redlining maps’ scale of “best” to “hazardous.”


To consider the impacts of real estate crime maps on property values, neighborhood diversity, and quality of life, impacts reminiscent of the effects of redlining, is not to suggest that civilians should not have a right to visualized crime data. But such data and their visualization tell only part of any story, and in doing so, may obscure the rest.
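The contrast between the raw-count convention and a population-adjusted one is easy to make concrete. The sketch below, with invented numbers, shows how the two conventions can rank the same neighborhoods in opposite order; nothing in it reflects any real city’s data.

```python
# A minimal sketch (hypothetical numbers) of why raw crime counts and
# population-adjusted rates can paint opposite pictures of "safety".
neighborhoods = {
    # name: (reported incidents, resident population)
    "Downtown": (900, 60000),  # dense: many incidents, many people
    "Hillside": (150, 5000),   # sparse: few incidents, few people
}

for name, (incidents, population) in neighborhoods.items():
    rate = incidents / population * 1000  # incidents per 1,000 residents
    print(f"{name:8s}  count = {incidents:4d}  rate = {rate:5.1f} per 1,000")

# Downtown  count =  900  rate =  15.0 per 1,000
# Hillside  count =  150  rate =  30.0 per 1,000
# A raw-count heat map shades Downtown darkest; a rate map reverses the ranking.
```

The choice between the two is exactly the kind of value-laden decision that a legend running from “less safe” to “safer” conceals.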

1.27.2.6 Summary

In each example above, ethical considerations are in tension, reminding us that the question of GIS and ethics cannot be answered with a rule book. Our aim is to highlight ethical concerns that are not always brought to the fore even in discussions on the topic. In conclusion, we offer these provocations:

1. At times not-mapping, or resisting being mapped, should take precedence over a focus on mapping-back.
2. The process of data acquisition can be more than a breach of privacy: it can be a breach of dignity, of wellbeing, and even of one’s own bodily barriers. When we talk about our right to access data, we should consider at what costs, and who bears them.
3. Making certain types of data publicly available may jeopardize not only privacy but safety; for instance, mapping the exact locations of rapes and other crimes.
4. Abstract claims to data access should be questioned. When weighing the counter-argument that civilians need access to sensitive crime location data, we must ask: what for? What will be done with the data, and by whom?
5. Even everyday, seemingly innocuous maps, from insurance maps to crime maps, made by governments or civilians, can have, and have had, widespread and devastating effects. Redlining maps were a powerful lesson, and contemporary neighborhood safety maps cannot be examined outside of this historical context.
6. Maps always tell only part of the story, but often present themselves as the whole truth.
7. More (representation, or access to data, visualizations, and mapping tools) is not always better. In other words, it may well be better for some, at the expense of others. When weighing competing claims, we might consider what sort of world we would like to live in. If the answer is a more egalitarian one, then we have a responsibility not to further exacerbate inequalities. In a case where a privileged and an oppressed group’s claims are weighed against each other, there is an ethical incentive to heed the oppressed group’s call.
8. It is well known that maps produce reality as much as they reflect it (Crampton, 2010). This co-constitutive relationship plays out directly in crime and Stop-and-Frisk mapping. The more officers deployed to an area, the more stops registered, which in turn feeds the CompStat databank. Higher concentrations of stops indicate cause for police suspicion, which can justify further deployment of officers. Some have argued that racially and spatially targeted policing leads to community dissolution, delegitimation of the law, and increased crime (Harcourt, 2007; Herbert, 1999). Without relying on this claim, it is safe to say that “disproportionate policing of profiled groups itself will lead to disproportionate data on their criminality, giving rise to further policing” (Kaufman, 2016, p. 78; see also Coleman, 2016; Harcourt, 2007). This cycle is known as the “technocratic feedback loop” (Kaufman, 2016); a toy model of it follows this summary. While the two-directional model is fitting for maps’ co-constitutive relationship with reality, the feedback loop more accurately describes the ever-expanding cycle of data gathering, managing, producing, visualizing, and disseminating discussed here.

While we have divided our case studies into these categories to illuminate different aspects of each, our final provocation is that none can exist in isolation. Acquisition does not come first; it comes after a call for more data, which is an effect of the existing data.
Effects are not the end result of the final product, but exist at every stage; even before the data is recorded, let alone stored, visualized, or shared, there are effects of its acquisition itself. Therefore, in any consideration of ethics in GIS, none of these components should be examined without taking into account the whole.
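The self-sustaining character of the technocratic feedback loop can be illustrated with a toy model. All numbers below are invented, and the model deliberately omits everything except the circular relation between recorded stops and deployment.

```python
# A toy model of the "technocratic feedback loop": two areas with identical
# underlying incident levels, where each round's deployment is allocated in
# proportion to the stops already on record. All numbers are hypothetical.
TRUE_RATE = 0.5             # stops recorded per officer-shift; SAME in both areas
OFFICERS_PER_ROUND = 100
recorded = {"Area A": 60.0, "Area B": 40.0}  # an initially skewed record

for round_no in range(1, 6):
    total = sum(recorded.values())
    for area in recorded:
        # data-driven allocation: patrol where the record says stops happen
        deployed = OFFICERS_PER_ROUND * recorded[area] / total
        # more presence produces more recorded stops, at the same true rate
        recorded[area] += deployed * TRUE_RATE
    share_a = recorded["Area A"] / sum(recorded.values())
    print(f"round {round_no}: Area A holds {share_a:.0%} of all recorded stops")

# Every round prints 60%: the skewed starting record reproduces itself
# indefinitely, even though behavior in the two areas is identical.
```

Nothing in the underlying behavior distinguishes the two areas; the disproportion lives entirely in the record, which is then read as evidence about the areas themselves.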

1.27.3 Ethics and GeoDesign

1.27.3.1 Introduction

At least since the advent of the administrative bureaucracy in the 19th century and what Ian Hacking has called the “avalanche of printed numbers” (Hacking, 2015), social and environmental measurement has been closely coupled with statecraft, military strategy, urbanism, and the governance of territory. However, with the rapid expansion of computing power and storage capacity in the period following World War II, urban planning and design professionals began to seek new ways to integrate digital data and computational forms of knowledge into their practices, often under the influence of scientists, military officials, and management professionals (Light, 2003). What is today called Geodesign is only the most recent realignment in a much longer history of the shifting cross-disciplinary affinities and configurations that Trevor Barnes has called the “mangle” (Barnes, 2008). Esri CEO Jack Dangermond says as much when he reminds us that “Geodesign is an old idea, an ancient idea, but also a new idea” (Dangermond, 2015, n.p.); what is new is the possibility of constructing an integrated and often cloud-based information infrastructure using GIS to guide design processes. Many of the antecedents of GIS can be located within the design disciplines. The SYMAP platform, which would be instrumental in the development of ArcGIS, was developed by the architect Howard Fisher, although it is worth noting that it was coded by a largely forgotten programmer named Betty Benson (Wilson, 2017). It was further developed and monetized at the Harvard Laboratory for Computer Graphics and Spatial Analysis, which was based in the Harvard Graduate School of Design (Chrisman, 2006). The landscape architect Ian McHarg, author of Design with Nature (1994 [1969]), is often credited with developing the overlay method that Nadine Schuurman calls the “sine qua non” methodology of GIS (Schuurman, 2004, p. 3). The historical validity of this claim is dubious: overlay methods had been in use by landscape architects since the early 20th century (Steinitz et al., 1976) and were arguably first formalized by the urban designer Jacqueline Tyrwhitt (Shoshkes, 2006).


More recently, urban planners have contributed to the extensive literature on what have been called Spatial Decision Support Systems, which aim “to provide decision makers with a problem solving environment within which they can explore, structure, and solve complex environmental problems” (Densham, 1991, cited in Wilson, 2015, p. 228). A similar literature exists on “Planning Support Systems,” described as “infrastructure[s] for analysis, design, and evaluation for planning-specific contexts” (Klosterman, 1997, cited in Goodspeed, 2016). Even in these very brief historical remarks, ethical questions are discernible: what exclusions are enacted by histories of GIS? Many histories exclude the women who either contributed their labor and expertise to the development of technologies credited to men (e.g., Betty Benson), or who made substantive contributions to a field ignored by subsequent male scholars (e.g., Jacqueline Tyrwhitt). Indeed, a history of Geodesign published by Esri locates aspects of Geodesign in the works of the architects Frank Lloyd Wright and Richard Neutra, as well as the landscape architects Warren Manning, Ian McHarg, and Carl Steinitz (Miller, 2012), all of whom are notably white male elites. Ethical questions, including who is represented and with what effects, would almost certainly complicate this masculinist history. We dwell on this particular history because contemporary Geodesign locates itself within a lineage of designers and planners using geospatial information and analysis to develop proposals for “better futures.” This very emphasis on possible and desirable futures raises a number of ethical considerations. A number of questions, for example, circulate around participation. If Geodesign produces futures, whose expertise is legitimate, who adjudicates between the desirability of multiple future scenarios, and on whose terms? Further, who benefits from the design decisions and who does not; what are the effects of Geodesign processes? These questions will be addressed through a case study of three Geodesign technologies, all developed by Esri: GeoPlanner, CityEngine, and ArcGIS Pro. Our focus on Esri technologies can be justified because the early success of Geodesign as a professional orientation was underwritten by Esri’s extensive investments in technological development, as well as its funding of numerous Geodesign conferences, symposia, workshops, and publications (Wilson, 2015). These investments have been buoyed by the academic legitimacy conferred by the creation of undergraduate and graduate programs at, among others, the Pennsylvania State University, the University of Arizona, the University of Southern California, and the University of Wisconsin (Foster, 2013). As the influence of this research community expands, it is essential that we bring ethical questioning to bear upon the specific technologies deployed by GIS technologists who are increasingly “hailed” by Geodesign.

1.27.3.2 Data Acquisition

One of the selling points of Geodesign technologies is that they provide access to enormous quantities of data that have already been collected and made interoperable. Jack Dangermond, in his introductory remarks for the 2016 Geodesign Summit, suggests that while data acquisition has historically required enormous and ongoing expenditure on the part of interested organizations, this no longer needs to be the case. He argues that the joint advent of cloud computing and data infrastructures capable of handling “big data” obviates the need for expensive and redundant data collection. He estimates that 3,000,000 spatial datasets exist, with an additional 5000 coming online every day. These datasets are drawn from scientific and authoritative sources. The GeoPlanner for ArcGIS application—described in marketing materials as “an app for informed, evidence-, and performance-based planning and design”—gives the user immediate access to vast quantities of land cover, slope, elevation, “green infrastructure,” and other landscape data, already processed and rendered commensurable. It functions as a largely automated tool for suitability analysis, of the type that has long been a primary method for environmental planners and landscape architects—Steinitz and McHarg both developed versions of this overlay-based model in the 1960s. While users can incorporate their own datasets into the analytical models that GeoPlanner facilitates, much of the software’s value is presumed to derive from its ability to interface with cloud infrastructures that grant access to preprocessed, rapidly deployable datasets, and to automate or simplify the process of suitability analysis for planners and designers. In addition to data hosted on Esri’s servers, GeoPlanner gives the user access to what are called “sketch” tools; these put the “design” in Geodesign, allowing user-designers to experiment with different site selections and configurations based on the results of suitability analysis. When a user “sketches” an area that they think might be appropriate for a certain land use, they might receive immediate feedback on, for example, the suitability of the land enclosed by the area drawn. These features hearken back to a long-held dream of automated data collection on the one hand and, on the other, the instrumentalist vision of an objective science (Gabrys, 2016; Daston and Galison, 2010). The provision of data through cloud-based infrastructures raises a number of ethical questions. First, what forms of authority are elevated by the GeoPlanner interface? A hint is provided by Dangermond, who is fond of invoking Ian McHarg, who used to speak of his design process as one of “incorporating all of the -ologies.” Indeed, scanning through the datasets made available for GeoPlanner reveals conventionally “authoritative” data collected by state agencies and environmental scientists: the classic top-down approach. Esri documentation treats local data as a supplement to the vast data infrastructure made available through GeoPlanner, but as Goodchild has pointed out, local, participatory (e.g., VGI), and counter-mapping data could be equally legitimate forms of knowledge to incorporate into analyses and visualizations. Furthermore, the apparent completeness and commensurability of data served by Esri conceal the fact that these data were produced by people working in institutions, with particular interests, and at particular moments in history.
The ability to simply “add” a dataset, frictionlessly, obscures the socio-historical contingency of the data. Data are collected in particular ways and do not “speak for themselves” (Gould, 1981). An ethical Geodesign, then, might seek to make visible, rather than obscure, the labor of data acquisition.


For example, Catherine D’Ignazio and Lauren Klein argue that “making labor visible” might involve visualizing metadata that makes explicit the data’s provenance (D’Ignazio and Klein, 2016; see also McHaffie, 1995). Geodesign, as envisioned by Esri, shares much with the discourses (and often hype) surrounding “big data.” While there is an intuitive appeal to the empiricist notion that data “speak for themselves” (and speak more loudly as they multiply), observational data can only ever provide partial, situated, and extensively mediated perspectives on the world that must be politically negotiated (Haraway, 1988; Latour, 1993; Kitchin, 2014). GeoPlanner, by providing access to innumerable, interoperable datasets, creates an image of a world fully described by data. We argue, instead, that an ethical GIS practice for Geodesign should create an image of the world that recognizes its own partiality and contingency. A final ethical line of questioning around data acquisition concerns the drawing of design “sketches”: who is involved in this sketching process, what alternative scenarios are entertained, and whom do these benefit? The process of design is often framed within Geodesign publications as balancing environmental and economic concerns (e.g., Steinitz, 2012); however, this frame tends to exclude the possibility of progressive political contestation. An ethical design process must not only include historically marginalized peoples, but must treat their claims, whether or not they are made through conventional data, as legitimate. Ethical GIS must do more than expand technological capability; it must constantly ask whether it is addressing the difficult problem of the “we” in design, or as Latour put it, “can we draw together?” (Latour, 2008, p. 12).
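To ground the discussion of suitability analysis, the sketch below shows the core of a McHarg-style weighted overlay of the kind GeoPlanner automates. The grids, reclassified scores, and weights are invented for illustration, and they are precisely where analyst judgment, and hence ethical responsibility, enters.

```python
# A minimal weighted-overlay suitability analysis: each layer scores grid
# cells from 1 (poor) to 9 (good); a weighted sum yields suitability.
slope      = [[9, 7], [3, 1]]  # flatter cells score higher
land_cover = [[5, 9], [9, 2]]  # e.g., grassland scores higher than wetland
proximity  = [[8, 6], [4, 9]]  # nearer to existing roads scores higher

layers  = [slope, land_cover, proximity]
weights = [0.5, 0.3, 0.2]      # analyst-chosen; must sum to 1

suitability = [
    [round(sum(w * layer[r][c] for w, layer in zip(weights, layers)), 2)
     for c in range(2)]
    for r in range(2)
]
print(suitability)  # [[7.6, 7.4], [5.0, 2.9]]
# A different reclassification or weighting yields a different "best" site;
# the automation hides, but does not remove, these choices.
```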

1.27.3.3 Data Management

As noted above, Geodesign technologies, including GeoPlanner, offer the promise of a cloud-based data infrastructure for design decision-making, locating the value of these technologies in their ability to provide ready access to data, which is reconceived as a platform instead of a process. The difficulties of data management, including the tedious construction of databases, are then framed as unnecessary when such datasets are already “out there.” Even software is increasingly removed from local management; consider, for example, Esri’s migration of ArcGIS functionality to ArcGIS Online. None of this is inherently unethical: we do not argue that ethical data management is local and tedious, and must always replicate the work done by others regardless of its availability. We do argue, however, that cloud data infrastructures tend toward the obfuscation of data as socially constructed, limited things. Schuurman and Pratt (2002), for example, argue that critical engagements with technologies require a grounded perspective on the functioning of those same technologies, a position that has been called “techno-positional” (Wilson, 2009). We are concerned that as data, in particular, migrate into the cloud and are served to users without the friction of cleaning data, building databases, and confronting absences and lacunae, the ability and inclination of Geodesign practitioners to ethically and critically question technologies will be truncated. The work that produces data, and the uncertainties and ambiguities that accompany it, should not be obscured. Ethical GIS practices might take advantage of the expediency made possible by networked access to data infrastructures, but must do so with an eye fixed on the production, assumptions, and limitations of data made instantly available.
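One modest, concrete way to keep that production in view is to bind provenance to the data structure itself, so that a layer cannot circulate as bare numbers. The following is a sketch only; the field names and example values are our own invention, not any Esri schema.

```python
# Refusing to treat a dataset as bare numbers: a layer type that carries
# its own provenance. All field names and values here are illustrative.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ProvenancedLayer:
    name: str
    collected_by: str        # the institution whose interests shaped the data
    collected_when: str
    method: str              # survey, sensor, volunteered, inferred, ...
    known_gaps: list = field(default_factory=list)

layer = ProvenancedLayer(
    name="land_cover",
    collected_by="a (hypothetical) state environmental agency",
    collected_when="2014 flight season",
    method="supervised classification of aerial imagery",
    known_gaps=["cloud cover over northern tiles", "no field validation"],
)
print(f"{layer.name}: {layer.method}, {layer.collected_when}; gaps: {layer.known_gaps}")
```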

1.27.3.4 Data Visualization

Esri ArcGIS Pro, first released in January 2015, is a significant update to the ArcGIS interface. One of the most significant capabilities it offers is support for three-dimensional visualization, which can be linked to two-dimensional planimetric views. It allows for automated three-dimensional visualization of two-dimensional data, such as building footprints, using rule packages that utilize attributes (e.g., building height, roof design) to generate wireframes. It includes a variety of preset packages that automate the visualization of, for example, trees, buildings, terrain, icons, and thematic shapes using attributes stored in the attribute tables of spatial data. It also allows designers to incorporate models drawn in more conventional CAD programs (Rhinoceros, AutoCAD), as well as in Esri Geodesign platforms like CityEngine. Perhaps most strikingly, it includes a variety of presets that can automatically apply textures to visualized buildings based on architectural styles, or model trees based on their species. These features are sold on the basis of their expediency and their ability to automate what in the past would have been arduous tasks drawn by hand. In a striking moment, Dangermond reacts to a demonstration of these features by noting that, as he watched, he “was thinking about what Carl [Steinitz] ... did in the 60s ... he did the same thing. Except the graphics weren’t as nice and it wasn’t fast. And it was by hand” (Esri, 2015, n.p.). Such equivalencies, however, tend to elide the fact that visualization technologies have significant implications for how we conceive the role of the designer. As we note above, theories of ethics following Michel Foucault ask after knowledges and the subjectivities that form around them. Following this mode of analysis, Orit Halpern has argued that data visualization as we know it today emerged in the postwar period as a means to make the uncertainty and unknowability of the world, as it exists and as it will be, tractable and actionable (Halpern, 2014). She reminds us that representational techniques and visual forms of knowledge have histories, histories that call for particular forms of analytical and observational training. In other words, they require new subjectivities. Furthermore, she reminds us that these techniques are aesthetic: data are not themselves “beautiful,” but must be shaped, machined, and crafted for their value to be realized. Methods for visualization, then, are also claims to know the world in particular historically situated ways. Ethical data visualization, therefore, must be aware of the forms of subjectivity it demands. Geodesign, enabled by ArcGIS Pro, sees the designer as no longer carrying out the tasks of drawing or rendering the context of a design intervention—such rendering is framed as rudimentary, menial, or even unnecessary, given the capacity of sophisticated and automated visualization technologies. The context of interventions is conceived of as data, which can be automatically spun into visualizations that inform design practices.


However, the context of a site vastly exceeds what can be represented, and exceeds even more vastly the scope of what is represented. The politics of land, labor, and racialized development in American cities, for example, are far less conducive to simple, automated representation, rife as they are with deeply contested histories. How do we represent conflict, and the often violent forms of exclusion that form the context of sites? Beyond the automated generation of building wireframes, an ethical GIS practice for Geodesign must grapple with the messiness of context and contestation. Our concern with the automated production of visualizations, then, has a great deal to do with what counts as “context,” and with a conceptualization of design and a designerly subjectivity that likewise understands context as a simple, even automatable, input into the design process. Visualization and mapping enabled by GIS are not simply automatable processes that render clearly visible the context of a design intervention: they “carry the characteristics of an infrastructure all their own” (McMahon, 2013, p. 458), one which can act to exclude nonexpert forms of knowledge and the political-economic complexities of sites and contexts.
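The mechanism behind such rule-driven visualization is simple enough to state in a few lines: a two-dimensional footprint plus stored attributes deterministically generates a three-dimensional volume. The sketch below uses invented data and is not the CityEngine or ArcGIS Pro API; its point is that everything the rendering shows must already exist as an attribute, and anything not recorded as an attribute simply does not appear.

```python
# Attribute-driven extrusion: the core of rule-based 3D building visualization.
def extrude(footprint, height):
    """Return the vertices of a prism: the footprint at z=0 and at z=height."""
    base = [(x, y, 0.0) for x, y in footprint]
    roof = [(x, y, height) for x, y in footprint]
    return base + roof

buildings = [  # invented footprints and attributes
    {"footprint": [(0, 0), (10, 0), (10, 8), (0, 8)], "floors": 3},
    {"footprint": [(15, 0), (22, 0), (22, 5), (15, 5)], "floors": 12},
]
FLOOR_HEIGHT_M = 3.2  # a rule-package-style assumption applied uniformly

for b in buildings:
    solid = extrude(b["footprint"], b["floors"] * FLOOR_HEIGHT_M)
    print(f"{len(solid)} vertices, height {b['floors'] * FLOOR_HEIGHT_M:.1f} m")
# Use, tenure, and conflict have no column in this table, so they have no
# geometry in the scene.
```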

1.27.3.5 Data Dissemination

Geodesign technologies such as GeoPlanner and ArcGIS Online are often framed as expanding access to essential data, as platforms that open access to datasets and visualization techniques that might otherwise incur prohibitive labor costs and require unavailable expertise. In a limited sense, this is undoubtedly true. Much of the data produced through local and regional planning processes is made available through the ArcGIS Open Data initiative, which allows ArcGIS users to “share ... live authoritative open data. Esri-hosted ArcGIS Open Data [provides] a quick way to set up public-facing websites where people can easily find and download your open data in a variety of open formats” (Esri, 2015, n.p.). However, the ability to contribute these datasets is mediated by access to Esri software, which remains prohibitively expensive for many organizations. Considerations of “access” within evaluations of data dissemination must also include considerations of access to the technical means of producing those data. Thus, a more expansive definition of “access” is necessary for ethical GIS practice: one that asks not merely how many have access but also who has access, who is excluded, and who has access to the ability to produce the data disseminated through Esri platforms. Is an open-source Geodesign possible? A Geodesign that opens technologies and the potentialities of widespread access to data outwards for use by nonexperts? Parallel to what has been called neogeography, can we imagine a neo-Geodesign that is simultaneously attentive to the limitations of discourses of democratization and openness (see, for example, Haklay, 2013; Stephens, 2013)? Who contributes data to ArcGIS’s Open Data portal? Are there datasets that should not be shared? Pursuant to our discussion of police and crime mapping above, are there datasets whose visibility might compromise communities? This tension—between recognizing open data dissemination as a potentially ethical practice and the ethical consideration of where data might be best left restricted—must be continuously navigated as GIS and Geodesign practitioners produce, upload, and distribute data.

1.27.3.6 Data Effects

Because Geodesign is a relatively new phenomenon, it is difficult to gauge its effects, particularly since many of its applications are dedicated to long-term planning and design processes. However, nation states, municipalities, ecosystem conservation organizations, and participatory mapping projects are beginning to conduct projects under the banner of Geodesign. Several of these are documented in an Esri publication entitled Geodesign in Practice (Esri, 2013), which allows us to see how practitioners narrate the effects of Geodesign on their processes and project outcomes. Several indicate that Geodesign technology eases communication with interested parties. In the context of one conservation project, Geodesign helped to strike “the needed balance between developing the analytically based methods required in conservation planning and the graphic and communicative language necessary for design implementation” (p. 4). According to a township manager, Geodesign technologies have allowed the township’s planning department to “model a 3D virtual city that more accurately represents township properties and combine numerous data to visualize and analyze the impact of proposed projects. This helps them present evidence-based plans to council officials and makes it easier to communicate new development projects to residents” (p. 17). Some cite the ability to automate aspects of the design process, permitting them to design at large scales. A wildlife conservation team developed an Automated Design Module that “allows large swaths of the landscape to be designed based on widely available data” (p. 10). The Singapore Urban Redevelopment Authority used Geodesign technologies “to help organize and address the complex sustainability needs” of a large-scale urban megaproject (p. 10). The ability to automate portions of the design process, analytically deploy existing data, rapidly produce realistic models, and manage data at large spatial scales is framed as having positive effects on practice. However, it is perhaps conspicuous that all of these are tasks that could be accomplished in the absence of technologies appearing under the banner of Geodesign, though with less speed and ease. We argue that without reorienting practice toward a “feeling of injustice,” as suggested by Barnett, neither the urban problems nor the environmental crises that Geodesign suggests are soluble will be solved. We can also look to history for suggestive precedents. Jennifer Light (2003) has documented the influx of military and cybernetic forms of expertise into urban planning in the period following World War II. As military budgets contracted in the 1960s and 1970s, US think tanks influenced by cybernetics and systems thinking, such as the RAND Corporation and the System Development Corporation, began seeking new markets to ensure their own longevity. By reframing urban problems in military and systems-theoretical terms, they mobilized large networks of urban planners, architects, city administrators, and engineers around the promise of urban decision-making infrastructures that would provide the basis for scientific planning and management.


However, the effect of this novel assemblage on urban life was minimal: both persistent socio-spatial segregation and the form of US cities remained relatively unaffected. What was then called the urban crisis proved largely immune to managerial innovations, rooted as it was in much longer histories of racialized housing policy in the United States, the suburbanization of capital investment, and processes of deindustrialization (Sugrue, 1996). How will the history of Geodesign be written? Our intention in raising this history is not to imply that Esri and others that invested in the development of Geodesign are ill-intentioned, or that the field is only a cynical attempt to reinforce professional prestige. Nor is it to suggest that Geodesign, and indeed urban and environmental management, are powerless. It is rather to suggest the opposite: that Geodesign is powerful, but that history demonstrates the degree to which the lasting effects of these changes may be the construction of these networks and the enrichment of key actors, and not substantive changes in either the form or management of cities. Ethical GIS practice in Geodesign, then, must resist easy institutionalization that benefits the producers of new technologies without meaningfully changing how planning practices are positioned vis-à-vis questions of access, participation, and representation. Again: who benefits?

1.27.3.7 Summary

The above discussion has called attention to a number of ethical considerations that must enter into GIS practice as it informs Geodesign:

1. Who is represented in our histories of GIS and Geodesign? Do we simply reaffirm histories that act to exclude the labor of women, nonexperts, and those outside of conventionalized narratives? Or do we actively seek to locate and introduce precedents that continue to historicize our understanding?
2. How do we design together? Despite calls for increased participation in the planning process, this too often takes the form of “stakeholder input” or community engagement that includes those affected by design and planning practice only in certain stages of the process—as an input to be taken into consideration by the expert. Can we instead leverage the technical capacities of GIS technology to include (and treat as legitimate) multiple and conflictual voices at all stages of the planning process?
3. How do we use GIS and Geodesign technologies to expand, rather than narrow, what and whose expertise is considered legitimate?
4. How do we ensure that expedient access to data hosted in cloud infrastructures does not lead to the sense that data are transparent and that all relevant data are already “out there”? Ethical GIS practice in Geodesign must remain aware of the assumptions, limitations, and construction of data from the cloud, while also seeking to make visible their partiality and construction.
5. What new subjectivities are demanded by visualization techniques? New visualization techniques do not transparently represent “the real world” but demand new modes of analysis, new practices, and new forms of expertise. What do these exclude?
6. Can we open Geodesign through the use of open-source software and the creation of databases that do not restrict the contribution of data to those able to afford licenses for proprietary software? At the same time, we must be cognizant of the possibility that openness is not an inherent good—some data are better left unshared—and that even well-intentioned discourses of openness and democratization can mask the perpetuation of unequal access to data, software, and the ability to represent and be represented.
7. Can Geodesign reorient design and planning practices toward a concern with injustice? In addition to the ongoing construction of social and institutional networks around the prospect of new practices, we should strive to see the capacities of those networks put to addressing persistent social and environmental injustices.

1.27.4 Conclusion

This article provides a review of the history of, and current prospects for, an ethics of GIS. After reviewing the rise of ethical concerns in scientific inquiry more generally following World War II, we considered some of the earliest work in GIS, spatial analysis, and mapping to be informed by a sense of “social justice” and equality (terms which continue to motivate ethical inquiry today) among the early quantifiers in geography. Although these motivations were not necessarily at the surface of their work, like the concerns of scientists in the postwar period at Nuremberg and elsewhere, they were deeply embedded in societal developments and the rise of “rights discourse” in the 1950s and 1960s. Ethics first manifested itself in GIS as a way to formalize best practice, again much as the Belmont Report and informed consent did in science. Today the legacy of that effort lies in the codes of ethics that have been adopted by GIS institutions such as the GIS Certification Institute (GISCI). These codes often constitute a narrowing of ethical concern relative to the earlier concerns with social justice and equality. Sometimes ethics is well defined in these codes and sometimes not; where it is, there is often a conflation of ethics and proper business practice. Esri’s code, for example, specifies that employees shall protect proprietary and confidential information, maintain accurate business records, refrain from discrimination in the workplace, declare conflicts of interest, and adhere to antitrust legislation. Normative codes of ethics are traditionally grouped into those that are rule-based (deontological), those that weigh outcomes (consequentialist), and those driven by virtues such as charity or benevolence (virtue ethics). In the latter case, and unlike the first two, these qualities of virtue are not derived from or founded in other systems such as morality. Rather, “virtues and vices will be foundational for virtue ethical theories and other normative notions will be grounded in them” (Hursthouse and Pettigrove, 2016).


We then traced the emergence of an explicit concern with ethics in GIS and mapping. This arose from a series of critiques and criticisms of GIS in practice in the 1980s, which took the debate well beyond a concern with professional codes. Again, these were embedded in societal concerns and the development (or rather redevelopment) of worries about the negative effects of technology. Such worries were far from new (both Walter Benjamin and Martin Heidegger described similar deadening effects of technology in the 1930s), and one may also point to the legacy of job fears during the agricultural and industrial revolutions (Benjamin, 1968; Heidegger, 1977). Nevertheless, they represented an about-face from the hopes of the early quantifiers in the 1950s and 1960s that technology and spatial analysis would build social justice to the fear that GIS would destroy it. In this, they perhaps did no more than retrace the steps of scientists—such as those working on the Manhattan Project. Following the rapprochement achieved at Friday Harbor and the “GIS and Society” initiative (NCGIA Initiative 19), ethical critique in GIS seems to have stalled at the level of a situational approach: technology is neither inherently good nor bad but has to be understood in its particular condition. There has not, to date, been a renewed effort to advance explicitly ethical thinking in GIS in the 21st century. Rather, there are numerous subfields such as bioethics, robot or machine ethics, military ethics, geoethics, and so on. However, there has been a sustained concern with issues of social justice and equality, especially in disciplines related to GIScience such as geography. As our case studies show, one can detect claims for Big Data, Geodesign, and predictive policing that parallel those made by the early spatial quantifiers, namely that they will solve societal problems—and similarly one can detect pushback from a nascent “critical Big Data studies” (e.g., boyd and Crawford, 2012). What then can we conclude about the ethics of GIS today? Here we can draw from our case studies to make two general points. First, whereas GIS, spatial analysis, and mapping have until now been important but largely specialized fields, what we see today is the increasing prevalence of everyday technologies that have a geospatial aspect as an essential component. Under such names as the Smart City, algorithmic governance, or the Internet of Things, technology is increasingly perfusing through society in everyday ways. And as it does so, it incorporates geolocational and spatially mediated capacities. As our case studies reveal, these raise ethical questions of privacy and geosurveillance, participation, access, and so on. Thus, we may say the issue is not so much “the ethics of GIS” as questions of ethics in geolocationally enabled technologies. One suggestive implication of this is that GIS as such is dead, replaced as it is by a more diffuse spatial mediation (Leszczynski, 2015). Second, our case studies point toward an alternative understanding of ethics beyond the traditional tripartite division of rules, outcomes, or virtues. Ethics is revealed not so much as a field that yields an answer about the right thing to do, but as a practice of posing questions.
This might appear both less satisfying and more difficult, but there is nothing particularly new about it. Under the name “critique” this practice has long antecedents (both Kant and Marx wrote books of critique). This notion of “critical ethics” can be summarized as having four major components (Crampton, 2010). First, it acknowledges that GIS and mapping continue to provide useful ways of understanding the world, but that these orders of knowledge also incorporate unexamined assumptions. These assumptions act as limits that can be challenged. Second, one way of examining assumptions is to historicize our knowledge. Putting what we know in historical perspective, or examining our intellectual histories, allows us to conceive of other knowledges. Third, a critical ethics is one that can be used to ask in what ways what we know is related to power. How does our knowledge, or what gets agreed to as knowledge or fact, get shaped by social, political, and economic relations? Fourth, a critical ethics is one directed toward an end, rather than a neutral position. This encourages its proponents to take a stand. This might appear odd given the traditional role of scientists to remain objective, but perhaps it does no more than make explicit their already existing motivations. Of particular interest here is how critical mapping, for example, has an activist and even emancipatory flavor. In this sense it overlaps with the other three components, for example, by historicizing some assumed “natural” set of knowledge as contingent. Or it may resist the formation of knowledge—even counter-knowledge—altogether in an effort to change the rules of the game (e.g., the German census boycott of the early 1980s; see Hannah, 2010). In this view, an ethics for GIS is not a question of the right thing to do, but rather of a refusal.

References

Baker A (2010) Paterson urged to veto curb on stop-and-frisk list. The New York Times. Barnes, T.J., 2004. Placing ideas: Genius loci, heterotopia, and geography’s quantitative revolution. Progress in Human Geography 28 (5), 565–595. Barnes, T.J., 2008. Geography’s underworld: The military–industrial complex, mathematical modelling and the quantitative revolution. Geoforum 39 (1), 3–16. Barnes, T.J., Wilson, M.W., 2014. Big data, social physics, and spatial analysis: The early years. Big Data and Society 1, 1–14. Barnett, C., 2010. Geography and ethics: Justice unbound. Progress in Human Geography 35 (2), 246–255. Barnett, C., 2013. Geography and ethics III: From moral geographies to geographies of worth. Progress in Human Geography 38 (1), 151–160. Beja M (2010) NY law limits NYPD stop-frisk database. Associated Press: Police One. Benjamin, W., 1968. Illuminations: Essays and reflections. Random House, New York. Booth, C., 1889. Life and labour of the people in London, vol. 1. Macmillan & Co., London and New York. boyd, d., Crawford, K., 2012. Critical questions for big data. Information, Communication & Society 15 (5), 662–679. Bryan, J., Wood, D., 2015. Weaponizing maps. Guilford Press, New York. Buchanan, I., 1997. The problem of the body in Deleuze and Guattari, or, what can a body do? Body and Society 3 (3), 73–91. Bunge, W., 1971. Fitzgerald: Geography of a revolution. Schenkman Publishing Company, Cambridge, MA. CCR, 2012. Stop and Frisk: The Human Impact. The Center for Constitutional Rights, New York.


Chrisman, N.R., 2006. Charting the unknown: How computer mapping at Harvard became GIS. ESRI Press, Redlands. Coleman M (2016) State power in blue (American Association of Geographers Conference Plenary Lecture 2015). Political Geography 51 (1): 76–86. Crampton, J.W., 2009. Rethinking maps and identity: Choropleths, clines and biopolitics. In: Dodge, M., Kitchin, R., Perkins, C. (Eds.), Rethinking maps. Routledge, London. Crampton, J.W., 2010. Mapping: A critical introduction to cartography and GIS. Wiley-Blackwell, Singapore. Crampton, J.W., 2014. The power of maps. In: Cloke, P., Crang, P., Goodwin, M. (Eds.), Introducing human geographies, 3rd edn. Hodder Education, London. Crampton, J.W., Graham, M., Poorthuis, A., Shelton, T., Stephens, M., Wilson, M.W., Zook, M., 2013. Beyond the geotag: Situating ‘big data’ and leveraging the potential of the geoweb. Cartography and Geographic Information Science 40 (2), 130–139. Crampton, J.W., Wilson, M.W., 2015. Harley and Friday Harbor: A conversation with John Pickles. Cartographica 50 (1), 28–36. Dangermond J (2015) Welcome and opening remarks. In: 2015 Geodesign Summit. http://www.esri.com/videos/watch?playlistid=series_225&channelid=LegacyVideo&isLegacy=true&title=Experiments-in-Geodesign-Synthesis&isLegacy=true&title=Welcome-&-Opening-Remarks (last accessed April 10, 2017). Daston, L.J., Galison, P., 2010. Objectivity. Zone Books, New York. Densham, P.J., 1991. Spatial Decision Support Systems. In: Maguire, D.J., Goodchild, M.F., Rhind, D.W. (Eds.), Geographical Information Systems: Principles and Applications. Wiley, New York, pp. 403–412. de Beauvoir, S., 2011 [1949]. The Second Sex. Vintage, New York. Deleuze G and Guattari F (1987) A thousand plateaus: Capitalism and schizophrenia (B. Massumi, Trans.). Minneapolis and London: University of Minnesota Press. Del Signore J (2010a) NYPD chief: Let us at least keep stop n’ frisk data for 1 year. Gothamist, New York. Del Signore J (2010b) Paterson signing bill ending massive stop and frisk database. Gothamist, New York. Dienst J, Stulberger E, and McHugh R (2016) I-Team: NYPD provides unprecedented look at Compstat. NBC I-Team. http://www.nbcnewyork.com/news/local/Compstat-New-York-NYPD-Police-Tour-375761141.html (accessed 1 May 2017). Dobson, J.E., 2009. Let the indigenous people of Oaxaca speak for themselves. Ubique 29 (1), 1–2, 4, 7–8, 10–11. Dueker K (2012) Origin and evolution of URISA. In: Weller B (ed.) Foundations of urban and regional information systems and geographic information systems and science, pp. 39–50. Des Plaines, IL: URISA. D’Ignazio C and Klein LF (2016) Feminist data visualization. In: VIS4DH: 2016 Workshop on Visualization for the Digital Humanities. http://www.kanarinka.com/wp-content/uploads/2015/07/IEEE_Feminist_Data_Visualization.pdf (accessed 10 April 2017). Dutton, G. (Ed.), 1978. First International Advanced Study Symposium on Topological Data Structures for Geographic Information Systems. Harvard Papers on Geographic Information Systems. Laboratory for Computer Graphics and Spatial Analysis, Cambridge, MA. Elden, S., 2002. The war of races and the constitution of the state: Foucault’s ‘Il faut défendre la société’. Boundary 2 (29), 125–151. Elden, S., 2016. Foucault’s last decade. Polity Press, Malden, MA. English, S., 2010. New York police to continue stop-and-frisk onslaught despite new law. World Socialist Website. https://www.wsws.org/en/articles/2010/07/fris-j31.html. Esri, 2013. Geodesign in practice: Designing a better world. Esri Press, Redlands.
Esri (2015) Esri | ArcGIS open data. http://opendata.arcgis.com/about (accessed 10 April 2017). Foster K (2013) Geodesign education takes flight. ArcNews, Fall. http://www.esri.com/esri-news/arcnews/fall13articles/geodesign-education-takes-flight (accessed 10 April 2017). Foucault, M., 1997. The ethics of the concern for self as a practice of freedom. In: Rabinow, P. (Ed.), Ethics, subjectivity and truth: Essential works of Foucault 1954–1984, Vol. I. The New Press, New York, pp. 281–301. Gabrys, J., 2016. Program earth: Environmental sensing technology and the making of a computational planet. University of Minnesota Press, Minneapolis. Gerlach, J., 2010. Vernacular mapping, and the ethics of what comes next. Cartographica 45 (3), 165–168. Gilsanen K and Stepan A (2014) From Compstat to Gov 2.0: Big Data in New York City management. Case Consortium at Columbia and the Picker Center for Executive Education, SIPA, pp. 1–20. http://ccnmtl.columbia.edu/projects/caseconsortium/casestudies/127/casestudy/files/global/127/NYC%20Big%20Data%20final%20061814.pdf (accessed 1 May 2017). Goodspeed, R., 2016. Sketching and learning: A planning support system field study. Environment and Planning B: Planning and Design 43 (3), 444–463. Gordon, C., 2008. Mapping decline: St. Louis and the fate of the American city. University of Pennsylvania Press, Philadelphia. Gould, P., 1981. Letting the data speak for themselves. Annals of the Association of American Geographers 71 (2), 166–176. Hacking, I., 2002. Historical ontology. Harvard University Press, Cambridge, MA. Hacking, I., 2015. Biopower and the avalanche of printed numbers. In: Cisney, V.W., Morar, N. (Eds.), Biopower: Foucault and beyond. University of Chicago Press, Chicago, pp. 65–81. Haklay, M., 2013. Neogeography and the delusion of democratisation. Environment and Planning A 45 (1), 55–69. Halpern, O., 2014. Beautiful data: A history of vision and reason since 1945. Duke University Press, Durham. Hannah, M., 2010. Dark territory in the information age: Learning from the West German census controversies of the 1980s. Ashgate, Farnham. Haraway, D.J., 1988. Situated knowledges: The science question in feminism and the privilege of partial perspective. Feminist Studies 14 (3), 575–599. Harcourt, B., 2007. Against prediction: Profiling, policing, and punishing in an actuarial age. University of Chicago Press, Chicago. Harley, J.B., 2001 [1991]. Can there be a cartographic ethics? In: Harley, J.B., The new nature of maps: Essays in the history of cartography. Johns Hopkins University Press, Baltimore, pp. 197–207. Reprinted from Cartographic Perspectives 10, 9–16. Heidegger, M., 1977. The question concerning technology and other essays. Harper and Row, New York. Heynen N and Barnes TJ (2011) Fitzgerald then and now. In: Bunge W (ed.) Fitzgerald, 2nd edn. Athens, GA: University of Georgia Press. Horwood E (2012 [1977]) Perspectives on URISA’s origin and on the emergence of a theory of urban and regional information systems. In: Weller B (ed.) Foundations of urban and regional information systems and geographic information systems and science, pp. 20–38. Des Plaines, IL: URISA. Hursthouse R and Pettigrove G (2016) Virtue ethics. In: Zalta EN (ed.) The Stanford encyclopedia of philosophy. https://plato.stanford.edu/archives/win2016/entries/ethics-virtue/ (accessed 10 April 2017). Jordan, T.G., 1988. The intellectual core. AAG Newsletter 23 (5), 1. Joy of Stats (2011) YouTube. https://www.youtube.com/watch?v=en2ix9f8ceM (accessed 10 January 2017).
Katz M (2016) Interactive NYPD map lets you view latest crime stats block by block. Gothamist. http://gothamist.com/2016/02/23/compstat_2_nypd_crime_map.php (accessed 1 May 2017). Kaufman, E., 2016. Policing mobilities through bio-spatial profiling in New York City. Political Geography 55, 72–81. Kitchin, R., 2014. The data revolution: Big data, open data, data infrastructures and their consequences. Sage, Los Angeles. Klosterman, R.E., 1997. Planning support systems: A new perspective on computer-aided planning. Journal of Planning Education and Research 17 (1), 45–54. Lake, R.W., 1993. Planning and applied geography: Positivism, ethics, and geographic information systems. Progress in Human Geography 17, 404–413. Latour B (1993) We have never been modern (Catherine Porter, Trans.). Cambridge: Harvard University Press. Latour B (2008) A cautious Prometheus? A few steps toward a philosophy of design (with special attention to Peter Sloterdijk). In: Hackney F, Glynne J, Minton V (eds.) “Networks of Design”, Annual International Conference of the Design History Society, September, University College Falmouth, Cornwall, United Kingdom. Universal Publishers, pp. 2–10. Leszczynski, A., 2015. Spatial media/tion. Progress in Human Geography 39 (6), 729–751. Light, J.S., 2003. From warfare to welfare: Defense intellectuals and urban problems in cold war America. Johns Hopkins University Press, Baltimore. Maguire, D.J., Goodchild, M.F., Rhind, D.W. (Eds.), 1991. Geographical information systems: Principles and applications, Vols. 1–2. Longman, Harlow.


McHaffie, P., 1995. Manufacturing metaphors: Public cartography, the market, and democracy. In: Pickles, J. (Ed.), Ground truth: The social implications of geographic information systems. The Guilford Press, New York, pp. 113–129. McHaffie, P., Andrews, S.K., Dobson, M., and two anonymous employees of a federal mapping agency, 1990. Ethical problems in cartography: A roundtable commentary. Cartographic Perspectives 7, 3–13. McHarg, I.L., 1994 [1969]. Design with nature. Wiley, New York. McMahon, C.F., 2013. Predictive machines: Data, computer maps, and simulation. In: Dutta, A. (Ed.), A second modernism: MIT, architecture, and the “techno-social” moment. The MIT Press, Cambridge, pp. 436–473. Miller, W.R., 2012. Introducing geodesign: The concept. Esri Press, Redlands. Monmonier, M., 1991. How to lie with maps. University of Chicago Press, Chicago. Morrill, R., 1965. The Negro ghetto: Problems and alternatives. The Geographical Review 55 (3), 339–361. Morrill, R., 1993. Author’s response. Progress in Human Geography 17 (3), 352–353. NYPD (2015) Stop, Question and Frisk Report Database. http://www.nyc.gov/html/nypd/html/analysis_and_planning/stop_question_and_frisk_report.shtml (accessed 1 May 2017). Pickles, J., 1991. Geography, GIS, and the surveillant society. Papers and Proceedings of Applied Geography Conferences 14, 80–91. Pickles, J., 1995. Ground truth. Guilford, New York. Pickles, J., 2004. A history of spaces: Cartographic reason, mapping and the geo-coded world. Routledge, London. Poiker, T., 1995. Preface. Cartography and GIScience 22 (1), 3–4. Rosling H (2011) Crimespotting: Joy of Stats. https://youtu.be/en2ix9f8ceM (accessed 10 April 2017). Schuurman, N., 2000. Trouble in the heartland: GIS and its critics in the 1990s. Progress in Human Geography 24 (4), 569–590. Schuurman, N., 2004. GIS: A short introduction. Blackwell, Malden. Schuurman, N., Pratt, G., 2002. Care of the subject: Feminism and critiques of GIS. Gender, Place and Culture 9 (3), 291–299. Scott, J.C., 1998. Seeing like a state: How certain schemes to improve the human condition have failed. Yale University Press, New Haven. Sheppard, E., 2005. Knowledge production through critical GIS: Genealogy and prospects. Cartographica 40, 5–21. Shoshkes, E., 2006. Jaqueline Tyrwhitt: A founding mother of modern urban design. Planning Perspectives 21 (2), 179–197. Smith, N., 1992. Real wars, theory wars. Progress in Human Geography 16, 257–271. Smith, N., 2003. American empire: Roosevelt’s geographer and the prelude to globalization. University of California Press, Berkeley. Steinitz, C., 2012. A framework for geodesign: Changing geography by design. Esri Press, Redlands, CA. Steinitz, C., Parker, P., Jordan, L., 1976. Hand-drawn overlays: Their history and prospective uses. Landscape Architecture 66 (5), 444–454. Stephens, M., 2013. Gender and the GeoWeb: Divisions in the production of user-generated cartographic information. GeoJournal 78 (6), 981–996. Sugrue, T.J., 1996. The origins of the urban crisis: Race and inequality in postwar Detroit. Princeton University Press, Princeton. Thee Rant (2016) NYCTPF. http://theerant.yuku.com/topic/47635/NYCTPF-NYCTPF#.WRCJ3ojyvIU (accessed 1 May 2017). Tobler, W., 1959. Automation and cartography. Geographical Review 49, 526–534. US Department of Health, Education and Welfare (1979) The Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research. Washington, DC.
Available: https://www.hhs.gov/ohrp/regulations-and-policy/belmont-report/index.html (last accessed April 10, 2017). Wainwright, J., 2013. Geopiracy: Oaxaca, militant empiricism, and geographical thought. Palgrave Macmillan, New York. Wilmott, C., 2016. Small moments in spatial big data: Calculability, authority and interoperability in everyday mobile mapping. Big Data & Society 3 (2). Wilson, M.W., 2009. Towards a genealogy of qualitative GIS. In: Cope, M. (Ed.), Qualitative GIS. Sage, London, pp. 156–170. Wilson, M.W., 2015. On the criticality of mapping practices: Geodesign as critical GIS? Landscape and Urban Planning 142, 226–234. Wilson, M.W., 2017. New lines: Critical GIS and the trouble of the map. University of Minnesota Press, Minneapolis. Young, I.M., 2010. Responsibility for justice. Oxford University Press, New York.

1.28 Geoprivacy

Marc P Armstrong, The University of Iowa, Iowa City, IA, United States
Ming-Hsiang Tsou and Dara E Seidl, San Diego State University, San Diego, CA, United States
© 2018 Elsevier Inc. All rights reserved.

1.28.1 Introduction 415
1.28.2 Privacy 415
1.28.3 Geoprivacy 416
1.28.3.1 System Provided Locations 420
1.28.3.2 User Provided Locations 420
1.28.4 Data Spying and Social Media 421
1.28.5 Transformations of User Provided Information to Yield Locations 422
1.28.5.1 US Addresses 422
1.28.5.2 Resolving Address Components 423
1.28.5.3 Geocoding Operations 423
1.28.5.3.1 Address geocoding 423
1.28.5.3.2 Geocoding using parcel maps 424
1.28.5.3.3 Place name geocoding 424
1.28.5.3.4 Reverse geocoding 426
1.28.6 Cross-Linking Information to Define Activity Spaces for Individuals 427
1.28.7 Concluding Remarks 429
References 429

1.28.1 Introduction

Concerns about privacy are deeply ingrained in the psyche of most US citizens. Such concerns often reach a peak following news reports about data breaches by hackers, or when information normally presumed to be confidential is accidentally revealed by a corporation or government agency. Yet many individuals who express concern about data privacy are either willfully ignorant about, or display a wanton disregard for, the revelation of locational information contained in a variety of digital data repositories, including location-based services and social media transactions. The purpose of this article is to sketch out basic principles about how locational information can be gleaned from various kinds of maps and digital records and linked with other information to compromise privacy expectations. The article first presents a conceptual overview of privacy and geoprivacy. The emphasis then shifts to examine how social media transactions can be transformed to yield information about locational activities. A focus is placed on social media since it has seen explosive growth during the past several years and, in many cases, social media information can be easily transformed and cross-linked to yield space–time traces of human activity patterns at different levels of spatial and temporal resolution. These traces are analogous to pheromones, the chemicals released by organisms—call them “geomones.” They specify where we have been (and in some instances for how long) and can therefore be used by a determined data spy to violate the privacy of individuals and groups of individuals.
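To make the idea of a space–time trace concrete, the short sketch below assembles one from a handful of fabricated, time-stamped, geotagged posts; a real trace would be harvested from an API at far higher volume, but the logic is the same.

```python
# Building a space-time trace from geotagged posts (records are fabricated).
from datetime import datetime

posts = [  # (timestamp, latitude, longitude)
    (datetime(2017, 5, 1, 8, 5), 41.6611, -91.5302),
    (datetime(2017, 5, 1, 12, 40), 41.6581, -91.5347),
    (datetime(2017, 5, 1, 18, 15), 41.6611, -91.5302),
]

trace = sorted(posts)  # ordering by time turns points into a movement path
for (t0, lat0, lon0), (t1, lat1, lon1) in zip(trace, trace[1:]):
    print(f"{t0:%H:%M} ({lat0:.4f}, {lon0:.4f}) -> {t1:%H:%M} ({lat1:.4f}, {lon1:.4f})")

# The first and last points coincide: a plausible home or work anchor,
# inferred from nothing more than three public posts in a single day.
```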

1.28.2 Privacy

Privacy is a fluid and multifaceted concept that has evolved hand-in-glove with changes in society and advances in technology. Legal restrictions against wire-tapping, for example, did not exist before the advent of the telegraph and telephone (Dash et al., 1959). Expectations about privacy may also change drastically depending on personal characteristics such as age, employment status, and other factors that exist in particular social and cultural settings. In the United States, for example, the amount of privacy accorded to an individual in a household will normally vary considerably between the time when they are a toddler (nearly no privacy) and a teenager (much privacy desired). And for adults, living in a nuclear submarine, dormitory, or military barracks will provide an individual far less privacy than residing in a single-family dwelling unit in a rural area. Despite this variability, all humans need, and expect, some measure of privacy. While a right to privacy is not explicitly guaranteed in the US Constitution (though search is explicitly restricted by the Fourth Amendment), it is a long-championed idea (Warren and Brandeis, 1890), an important element in tort law, and figures prominently in many state-level statutes and federal acts (e.g., HIPAA and FERPA). Moreover, privacy is a foundational tenet of the Universal Declaration of Human Rights (http://www.un.org/en/documents/udhr/ (last accessed 28 September 2015)) of the United Nations (Article 12). It is also important to note that since the dawn of the computer era, privacy has been a concern of scholars, pundits, and citizens alike (e.g., Miller, 1971; US Department of Health, Education and Welfare, 1973). It is also likely that privacy concerns will continue to be translated into actionable policies and legal restrictions.


For example, the European Union (EU) has recently entered into a joint framework (http://ec.europa.eu/justice/dataprotection/files/factsheets/factsheet_eu-us_privacy_shield_en.pdf) with the United States that is designed to provide increased privacy protection for EU citizens by obligating US corporations to improve their data safeguards and restricting US government agencies from mass surveillance activities.

At a more conceptual level, privacy theorists tend to fall into certain "camps" as described by Waldo et al. (National Research Council, 2007b). A predominant view is that privacy is an exceedingly complex construct, with some going so far as to suggest that it is not a concept at all, but is instead a bundle of ideas that cohere around only a general notion of privacy. Thomson (1975), for example, suggests that privacy is an assemblage of concepts such as property and bodily rights. DeCew (1997) takes a different slice through the conceptual thicket of privacy and partitions it into informational, physical, and mental access to one's self and decision-making autonomy. Solove (2008) posits privacy pluralistically, as a concept without a concrete definition that is best understood by analyzing privacy problems in society. Nissenbaum (2004) advocates a more contextual view based on social norms. Such norms condition the flow of information among parties, and it is these flows that play the key role: a norm that governs them appropriately is said to have contextual integrity, and in this view a violation of privacy is a violation of a contextual norm. Such normative interrogations have been elucidated with respect to aggregated information contained in geodemographic databases (Curry, 1998; Goss, 1995), though when aggregated data (e.g., census tracts) are used, individual characteristics are not directly available and must be inferred. Contextual integrity thus situates the assemblage of privacy constructs within a broader, contingent social framework.

Yet this apparent flexibility may not hold when certain kinds of privacy are considered, since in some instances violations can occur with neither the knowledge nor the consent of affected parties. And even when there is informed consent, people may not fully understand how individual-level information (particularly geospatial information) can be assembled and used to create activity profiles that reveal personal information thought to be private. Why is this? The capabilities of geospatial technologies are not widely understood and, more fundamentally, many US adults have a poor understanding of maps and geography in general. Consider that it is relatively straightforward to understand that a wiretap can be used to record identifiable voice information; in a similar fashion, a video camera collects identifiable visual information. Both media (audio and video) figure prominently in our everyday sensory experiences. However, understanding how a string of addresses can be geocoded to yield a sequence of time-stamped coordinates that define a mapped path of an individual's movement through urban space involves a far more abstract set of linked operations and concepts, and it requires more work to appreciate all that they reveal.

Control over information also plays a central role in the conceptualization of privacy. Individuals present themselves to the public in a variety of different ways. In some cases, a professional presentation of self might take the form of what is revealed to coworkers in a work environment.
Other presentations might be related to hobbies or other recreational activities; a white-collar worker, for example, may not reveal much about his or her professional identity to the other members of a bowling league team. And in still other cases, personal information unrelated to either setting (professional or recreational), such as health status or religious affiliation, is something that an individual may wish to keep strictly private. These examples refer to a particular facet of privacy often referred to as information privacy and, as suggested earlier, the key issue is control: we wish to have some measure of control over what is, and is not, revealed to others in particular settings. Indeed, Westin (1967, p. 7) uses this notion to define privacy as the "claim of individuals, groups, or institutions to determine for themselves when, how and to what extent information about them is communicated to others." An information privacy breach occurs when personal information that is not intended to be revealed in public venues is obtained without the permission of the individual. It is here, once again, that individuals may incorrectly feel as though they have control over what they reveal, and may not be aware that the publication and subsequent aggregation of individual geographical facts can reveal a great deal more than they ever intended.

1.28.3 Geoprivacy

Geoprivacy is a relatively new construct that has arisen with the emergence and confluence of new technologies (GIS, Global Positioning System (GPS), smart phones, and social media) that are able to capture and transform information about the movement of individuals in space (see, e.g., Armstrong and Ruggles, 1999; Beresford and Stajano, 2003; Kounadi and Leitner, 2014; de Souza e Silva and Frith, 2012). The ability to use maps to cross-link various types of information is not new. What has changed, dramatically, is the fluency (ease or facility of accomplishing a task) and flux (volume and flow of information) of such tasks. As an illustration of this effect, Fig. 1 shows a side-by-side comparison of land ownership parcels for a city block in Iowa City, IA. The map on the left is a screen shot of a modern GIS-based parcel map that links to ownership information (in its online format) by clicking in the parcel polygon. The map on the right is a Sanborn map published in 1933. Such maps played an important role in understanding the magnitude of fire risks in urbanized communities throughout the United States. Though more than 80 years have gone by since the publication of the Sanborn map, and the basic configuration of dwellings on this block is similar (considering both parcels and building footprints), some differences are apparent, particularly in the size of automobile-era garages. But going beyond this geometrical fidelity, the 1933 analog map can be cross-linked with other data sources (analog and digital) to establish information about people who resided on that block; GIS software is not required. A key link is the address, which is listed on the street frontage in the Sanborn map, and is the number contained within each parcel in the GIS-based map.


Fig. 1 Digital (2016) and analog (1933) versions of a city block in Iowa City, IA. A particular focus is placed on East Davenport Street. Sources: Left, Johnson County Property Information Viewer https://gis.johnson-county.com/piv/; Right, Digital Sanborn Maps http://sanborn.umi.com/ia/2695/dateid-000009.htm?CCSI=2802n.

Fig. 2 is a scanned screen snip from a (printed) 1943 Polk City Directory for Iowa City. It shows the names of people residing on each side of a block of East Davenport Street (between North Linn Street and North Gilbert Street), sorted by increasing address rather than by last name, as they are sorted in telephone directories. We can see from the Sanborn map that the residence at 332 East Davenport is a relatively large structure on a corner lot. Then, from the Polk directory, we can see that the resident, and presumptive owner at that time, was Emma J. Harvat. Using other links, we can uncover that Ms. Harvat was not only independently wealthy, she was the first female mayor of a US city with a population greater than 10,000. While it is clearly possible to use analog sources to establish data linkages, the process is relatively clumsy (and slow). With modern computing technology, such linkages appear to happen instantaneously and can be forged across numerous domains (http://iowacity.iowaassessors.com/sale.php?gid=282277&sid=141; https://en.wikipedia.org/wiki/Emma_J._Harvat_and_Mary_E._Stach_House; http://iowawritershouse.org/new-events/mary-rakow-workshop).

Fig. 2 A list of addresses and residents in 1943 for the 300 block of East Davenport Street, Iowa City, IA. Source: Economy Advertising Company (1943). Iowa City directory, p. 263. Omaha, NE: R.L. Polk & Company.


The property is close to a multifamily (student) apartment complex to the north (Fig. 3), is in good repair, and was relandscaped between 2010 and 2014 (Fig. 4). Using current garden-variety technology, it is also easy to ascertain that the center of the house footprint is at (41.666, -91.530), and that it was most recently purchased in March 2014 from a Professor Emerita of English at The University of Iowa, who was one of the founders of Iowa's Creative Nonfiction Program. She was formerly a co-owner of the dwelling (Fig. 5) with a University of Iowa Professor of 18th Century French Literature who passed away at a relatively early age in December 1981 (Fig. 6). Other links reveal that the current owner of the house is a writer and that the building is the current location of the Iowa Writers' House.

It is this fluent flux of easily obtained cross-linked information that has given rise to concerns about geoprivacy. Maintaining geoprivacy means that an individual is assured that they are secure from any unwanted tracking of their activities. This is important because amassed time-stamped location data allow the creation of a detailed profile of individual and group behavior, including inferred habits, preferences, and routines: private information that could be exploited to cause harm, since where you go implies what you do. However, many people leave persistent and discoverable digital trails behind them as they engage in their everyday activities.

Fig. 3 Google Maps screen shot of the city block shown in Fig. 1. Source: Google Maps.

Fig. 4 Google Street View, October 2012. Source: Iowa City Assessor, http://iowacity.iowaassessors.com/parcel.php?parcel=1010156011.


Fig. 5 Parcel record denoting ownership filed at the Johnson County Assessor's Office. Note that one owner's name was crossed out when the property was reassessed in 1983. Source: Johnson County Parcel Documents, https://ww1.johnson-county.com/ICA/ParcelDocument/Details/56243.

Fig. 6 Newspaper story from 1981 about one owner of 332 East Davenport Street. Source: The Daily Iowan, Monday, December 7, 1981, p. 11.

Some of this trail-like information is legally mandated (e.g., location for an E911 emergency) and, as described in greater detail later, is routinely collected by cell phone service providers; other locational data leaks, usually unnoticed, from social media users and others engaged in online social and economic transactions. In fact, it has been reported that 65 billion location-tagged payments are made in the United States each year (Tucker, 2013).

There are two general cases in which specific location information can be accumulated to construct locational histories. The first uses system-level information that is collected routinely by service providers. Location information, for example, is available for all cell phones connected to a network (Smit et al., 2012). Each phone continuously "checks in" with the network to determine its current cell tower (and directional antenna sector) and other information that is collected by the service provider. Such information can be used to provide location-based navigation and other services, though cell tower information is routinely supplemented with more precise GPS coordinates or Wi-Fi positioning. (GPS refers to a specific satellite location service provided by the US military; Global Navigation Satellite System is a more general term that also includes, for example, the Russian GLONASS, European Galileo, and Chinese Beidou systems.) Users in this first case have no control over what is recorded by the service provider, since the recording is an integral part of the service and is not normally accessible except when it may be divulged as a consequence of a legal request accompanied by a search warrant. Increasingly, however, wireless carriers are attempting to create profits from the information they collect and are now repackaging it (Leber, 2013). For example, AirSage (http://www.airsage.com/) is partnering with US wireless carriers to resell anonymous call detail records.

In the second case, users "opt in" (or, more commonly, do not opt out) and volunteer their location in exchange for a service, such as navigation assistance or finding nearby restaurants. For example, if locational services are enabled on a cell phone, under optimal conditions Google will know its current location to within approximately 10 m. This second general case also includes the content of information posted in public or semipublic social media transactions. This type of locational information, such as a place name or point of interest (POI), usually must undergo one or more transformations, such as the addition of a coordinate, before it can be used to compile a location history and be passed along for additional uses in a variety of GIS-enabled contexts. Though such locational information may take several forms, it can be placed into two broad types. System provided information is captured in the form of coordinates that are often measured or estimated based on known relationships between a mobile device and external sources with well-defined locations. User provided information, on the other hand, consists of either coordinates or text strings that must be parsed and processed using geographic base files to yield coordinate values. System and user provided locational information can be further refined into the following categories of (roughly) decreasing accuracy:

1.28.3.1 System Provided Locations

1. GPS coordinates. Several types of media require that accurate locational tags be attached to transactions. Some applications are explicitly location-dependent (e.g., a Yelp search) while others have location-providing options (e.g., Facebook). Most smart phones have GPS receivers that provide coordinates that can be attached as metadata tags for text and images; this is known as geotagging.

2. Cell phone towers. Carriers must implement the E911 requirements of the Wireless Communications and Public Safety Act of 1999 and provide an address to emergency responders when mobile phone users dial 911. If a GPS coordinate is unavailable, locations are determined from knowledge about which cell phone tower a user is connected to (and its angular sector); in some cases, signal strength from multiple tower locations is used to triangulate a location.

3. Wi-Fi locations. Skyhook is one of the earliest companies that constructed reference network maps containing Wi-Fi access points (with unique MAC addresses) and cellular towers to provide geolocation services (http://www.skyhookwireless.com/aboutskyhook). To achieve this goal, Skyhook collected more than 800 million wireless network reference points for Wi-Fi signature matching and Wi-Fi positioning (http://www.skyhookwireless.com/coverage-map). Similar methods have been adopted by other companies, such as Apple and Google. In 2011, it became public knowledge that Apple, Google, and other companies had been collecting locational information about Wi-Fi hotspots to build their own reference maps and improve their network location capabilities. Google employees, for example, drove down a high percentage of US streets and geocoded Wi-Fi access points as they collected images for Street View (the company subsequently announced that this practice was halted after it attracted negative publicity). Nevertheless, at least one commercial firm uses a similar approach, having collected information on the location of approximately 400 million access points, and can identify locations with a median accuracy of 20–40 m (National Research Council, 2013, p. 31). Zandbergen (2012) reports somewhat larger errors, however, based on tests conducted in three large cities with two different devices. Wi-Fi triangulation is also used to support indoor location services, such as shopping mall navigation.

4. IP geolocation. IP geolocation (Muir and van Oorschot, 2009), a technique for identifying the geographic location of Internet users, web servers, or Internet-connected devices, can be used to convert IP addresses into real-world coordinates (latitude and longitude) or geographic regions. There are two types of IP geolocation techniques: active and passive (Tsou et al., 2013). Active IP geolocation relies on the measured latency (time delay) of network packets as they are routed from one IP address to another and between the locations of network routers and receivers. Passive IP geolocation uses a database-driven procedure to match an IP address to the geolocation of its registered owner in WHOIS databases. The database can also include the latitude and longitude coordinates of geocoded addresses from the registry information. MaxMind is a leading company that provides services for the geolocation of IP addresses (https://www.maxmind.com/en/home) (Table 1; a minimal sketch of such a passive lookup appears after this list). Though some commercial IP geolocation services have claimed that their spatial resolution can reach the zip code level in the United States (and the city level for other countries), researchers argue that significant uncertainty and accuracy problems exist for IP geolocation (Poese et al., 2011). Among social media companies, Facebook is known to suggest events to users and advertise events friends are attending "near you" based solely on IP address.

5. Time zone. Many mobile devices and smart phones detect local time zones automatically when users travel to different cities or countries. Some social media services (such as Twitter) record the time zone of the device when a message is posted. Time zone information can help researchers identify the daily activity patterns of social media messages and correct the time stamp of messages from UTC (Coordinated Universal Time) to the local time zone of the user. Although the spatial resolution of a time zone is very low, it is still useful in some temporal analyses of social media data (such as analyzing the diffusion of viral messages across the globe).
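To make the passive, database-driven case concrete, the following minimal sketch uses MaxMind's geoip2 client library and assumes a locally downloaded copy of the free GeoLite2-City database; the database path and the queried IP address are illustrative.

    # A sketch of passive IP geolocation: look up an address in a local
    # MaxMind GeoLite2-City database (path and IP are illustrative).
    import geoip2.database

    with geoip2.database.Reader("GeoLite2-City.mmdb") as reader:
        response = reader.city("120.118.11.2")
        print(response.country.iso_code,        # e.g., "TW"
              response.city.name,               # e.g., "Taipei"
              response.location.latitude,       # coarse, city-level estimate
              response.location.longitude)

As noted above, such results should be treated as city-level estimates rather than precise positions.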

1.28.3.2 User Provided Locations

1. Geotagged photos. Some social media (such as Flickr and Facebook) allow users to geotag their photos using either system-provided or user-defined geolocations. The system-provided geolocation is stored in the exchangeable image file format (EXIF) header of a photo file. When users take pictures with smart phones (with the geolocation function enabled) or a GPS-equipped camera, coordinates are automatically recorded inside the header of the photo file (Fig. 7; a minimal sketch of reading such a geotag appears after this list). Users can also manually geotag the location of photos on a map provided by the social media platform or by place names associated with the photos.

Table 1  Examples of IP geolocation results from MaxMind GeoIP2 Precision service

IP address     Country code   Location                                              Coordinates          ISP
120.118.11.2   TW             Taipei, Taipei, Taiwan, Asia                          25.0392, 121.525     Taiwan Academic Network (TANet) Information Center
191.118.100    USA            San Diego, California, United States, North America   32.7751, -117.0762   California State University, Office of the Chancellor

Fig. 7 The latitude/longitude information stored inside the EXIF metadata header of a photo file (JPEG format).

2. Postal address or zip codes. Addresses are a commonly employed locational identifier (e.g., for pizza delivery). They can be transformed into coordinates for mapping using several different methods, as described later.

3. "Check-in" at POI. Many social media services, such as Twitter, Foursquare, and Facebook, provide a "check-in" function for their users. Users can identify and reveal their current locations by checking in at restaurants, hotels, or scenic places from a list of nearby POIs. These "check-in" records can then be used by various location-based services.

4. Place name. Place names are often neither accurate nor unique (there are at least seven cities named Urbana in the United States, for example, and Moscow is even more widely used; Vasiliev, 1992), but they can be transformed using resources such as telephone directories and gazetteers to yield a location, or an address that can be processed to yield a location. A place name may also be provided at different levels of locational specificity: New York can refer to either a city or a state, and New York City is less specific than Brooklyn, while Brooklyn is less specific than Williamsburg.
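As an illustration of how easily a system-provided geotag can be read, the following minimal sketch uses the Pillow imaging library to pull GPS coordinates out of a photo's EXIF header; the file name photo.jpg is a hypothetical input.

    # A sketch of extracting a geotag from a photo's EXIF header (Pillow).
    from PIL import Image
    from PIL.ExifTags import GPSTAGS

    def exif_gps(path):
        exif = Image.open(path)._getexif() or {}
        gps = {GPSTAGS.get(k, k): v for k, v in exif.get(34853, {}).items()}  # 34853 = GPSInfo
        if not gps:
            return None  # the photo carries no geotag
        def to_degrees(dms, ref):
            d, m, s = (float(x) for x in dms)  # degrees, minutes, seconds
            value = d + m / 60 + s / 3600
            return -value if ref in ("S", "W") else value
        return (to_degrees(gps["GPSLatitude"], gps["GPSLatitudeRef"]),
                to_degrees(gps["GPSLongitude"], gps["GPSLongitudeRef"]))

    print(exif_gps("photo.jpg"))  # e.g., (41.666, -91.530)

A few lines of code thus recover a coordinate pair accurate to the GPS fix of the device that took the picture.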

1.28.4 Data Spying and Social Media

Data spying is the practice of obtaining information about a person or persons without consent (Armstrong et al., 1999). In social media, data spying can occur at various scales, from the observation of a single individual to massive personal surveillance. The perpetrators of data spying may include friends or family members of the surveilled person(s), individuals unknown to the surveilled, outside researchers collecting social media data online, government agencies, or social media companies themselves. Motivations for data spying may include crime prevention, academic or recreational research, personal interest in the surveilled, or criminal activity such as stalking or burglary. These categories are not wholly inclusive examples of potential data spies or motivations for data spying, but are meant to highlight disparate threats to privacy and geoprivacy. As a means of combating the activities of data spies, various protocols have been suggested that mask or obfuscate the locational information contained in data records (Armstrong et al., 1999; Clarke, 2015; Kwan et al., 2004; Seidl et al., 2015; Zimmerman and Pavlik, 2008). While there was wide public outcry in 2014 over the discovery that Facebook had conducted a psychological experiment on almost 700,000 users, omitting or including news feed content associated with negative emotions (Albergotti, 2014; Kramer et al., 2014), it is not uncommon for social media platforms to analyze user data. Similarly, the popular music service Spotify received criticism in 2015 for updates to its privacy policy requesting access to user GPS data, photos, and other smartphone sensor data (Hern and Rankin, 2015). Smartphone applications with a social element enable data spying by developers if there are no regulations to prevent the practice. In 2014, the taxi-alternative company Uber was criticized for its "God view," which made real-time GPS tracking of all Uber drivers and customers available to employees (Hill, 2014).


Much of the personal data available to social media companies is also accessible to government agencies. For example, the US National Security Agency's secret PRISM program, revealed in a release of classified documents by Edward Snowden, had direct access to search histories, e-mail content, and file transfers from Yahoo, Google, Facebook, and Apple (Greenwald and MacAskill, 2013). From an academic perspective, there are strong benefits to accessing social media and administrative data, such as tax returns and patient information (Hayden, 2015); such data, for example, hold promise for properly evaluating the effectiveness of government social programs. At the same time, a lack of informed consent from study subjects can hint at data spying and violate regulations imposed by institutional review boards (IRBs). The growth of other technologies that can be linked to social media strengthens the potential for data spying. In particular, the boom in UAV/drone technology has led to the growth of drone-assisted social media, such as www.dronestagr.am/, where users can upload their drone flight photos and video. These new developments challenge the notion of where one can reasonably expect to find private space; for example, one UAV video that went viral depicted a monk sunbathing at the top of a wind turbine (Heim, 2015). To highlight the vulnerability of social media users to location data spying, several projects have focused on posting geocoded user locations on online maps based on geotagged Facebook, Twitter, or Instagram posts. One such effort is the Teaching Privacy Project from UC Berkeley and its "Ready or Not?" app found at app.teachingprivacy.com (last accessed 28 September 2015).

1.28.5 Transformations of User Provided Information to Yield Locations

Most original user provided locational information in social media must be transformed to make it useful in location-specific applications. Two common transformations are referred to as geocoding and reverse geocoding. Geocoding attaches locational identifiers (e.g., coordinates) to addresses. Reverse geocoding, as the term implies, refers to a process of finding addresses from mapped data with the goal of determining a single address from a coordinate value. Other transformations that yield less accurate, but still revealing, results are described in the next section of the article. The geocoding transformation requires two data sources: (1) either an address or an input file containing a list of addresses to be transformed through the addition of a coordinate and (2) a geographic base file, which may consist of a street centerline file with address ranges (e.g., US Census TIGER, used for interpolated geocoding) or a parcel map with addresses and coordinates for either a parcel or building footprint centroid (used for direct geocoding). The input file typically consists of, at a minimum, a unique ID along with a postal address that may occur in one or more fields. Though most people can easily decode and process addresses, the historical absence of uniform address protocols and the number of possible types of address combinations means that when placed into a computer system, addresses can take on considerable complexity. Thus, during the geocoding process, several sources of error are often encountered (Armstrong and Tiwari, 2007). Because of the magnitude and variety of such errors, standards have been developed to make the structure and content of addresses more uniform.

1.28.5.1 US Addresses

The United States Postal Service (USPS) delivers millions of items of addressed mail each year and has established conventions that are intended to speed up delivery. For example, the Postal Addressing Standards (USPS, 2013) specify that each address will take the following form:

IVANHOE DRUMLIN (Recipient Line)
20 EASTERN AVE (Delivery Address Line)
WATERVLIET NY 12189-0001 (Last Line)

Each line is normally completed using capital letters, contains no punctuation (except for the dash in the ZIP+4 code), and uses a set of standardized abbreviations. The recipient line is not normally used during geocoding. The remaining two lines, however, are central to the geocoding process. The delivery address line contains a set of components that are used to resolve each address:

1. primary address number (in this case 20)
2. predirectional indicator (often a cardinal direction)
3. street name (EASTERN in this case)
4. suffix (this is typically a thoroughfare type, AVE for avenue in the example)
5. postdirectional indicator (again, usually a direction)
6. secondary address identifier (these may be identifiers such as apartment or suite)
7. secondary address number (usually an integer or alphabetic code such as 3G)

The city names in the last line are normally spelled out in their entirety, although the USPS (2013) has compiled a list of standardized abbreviations for city names, as well as state names and street suffixes. For example, the commonly used suffixes and abbreviations for ALLEY are contained in the set {ALLEE | ALLEY | ALLY | ALY}, with ALY being the preferred abbreviation specified in the standard. This ontological approach enables data from different users and sources to be integrated around a common set of established terms.


The Urban and Regional Information Systems Association (URISA), in collaboration with the US Federal Geographic Data Committee (FGDC), has established a different set of standards that are somewhat more in tune with GIS applications (see http://urisa.org/about/initiatives/addressstandard). There are four types of addresses defined by URISA's Address Standards Working Group (ASWG):

1. Thoroughfare: specifies a sequential location along a linear feature, normally a road of some type (e.g., 1147 Maple Street).
2. Landmark: specifies a location through reference to a well-known (perhaps only locally well-known) feature (e.g., the White House).
3. Postal: provides a means for mail delivery without reference to the location of an individual (e.g., PO Box 53).
4. General: a mix of the first three classes.

These address types are more generic than the ones specified in the USPS standard and in some cases correspond closely to the aforementioned place names that are often encountered in social media postings.

1.28.5.2 Resolving Address Components

If address elements are not contained in separate fields, they must be parsed, or decomposed into basic elements, by geocoding software or a geocoding/address standardization API. Parsing may be performed using conventional procedures originally developed in computer science and linguistics (see, e.g., Grune and Jacobs, 2008). If a geocoding project establishes protocols based on USPS or ASWG standards, the likelihood of mismatches and errors will be diminished significantly, because these standards are designed to help both humans and machines correctly decode address components. Field workers, clerical staff, or data entry personnel can be taught to insert standardized codes or forms into appropriate fields in records. A simple example illustrates this: each element in the following list represents Street: {Street, street, Str, Str., St, St., Sttreet, Steet}. Some of the elements in this list can be corrected easily by both humans and computers, while others may present a greater challenge to software systems and lead to geocoding failures. The process of address examination and correction may be aided by geocoding software, which typically provides a match score that assesses the correspondence between an input address field and information contained in a geographic database (Table 2).

In some instances, a naming reference to a road may not correspond to information derived from a centralized governmental source, as when residents of an area refer to it using local terminology; Easy Street, for example, might be a local name for Highway 61. Such alternative referents are, however, included in many of the most widely used geocoding materials. As suggested earlier, abbreviations are often encountered in data that must be geocoded, and data input personnel will often use abbreviations to save keystrokes. Most abbreviations are easily caught by software through the use of ontologically based synonym lists unless, of course, the abbreviation contains a typographical error; such cases will typically be flagged for human inspection. In social media, however, abbreviations may take more extreme forms, particularly if a form of friend-based shorthand is adopted. For example, a nightspot might start out with a full place name but become significantly truncated as it is referred to more frequently, even, in the extreme, to a single letter: Donnelly's Irish Pub might be shortened to Donnelly's and then, ultimately, to DP.
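The synonym-list approach described above can be sketched in a few lines of Python. The lists here are illustrative stand-ins rather than the official USPS tables, and the function simply returns None for tokens (such as typographical errors) that should be flagged for human inspection.

    # A sketch of ontologically based suffix standardization.
    SUFFIX_SYNONYMS = {
        "ALY": {"ALLEE", "ALLEY", "ALLY", "ALY"},   # per the USPS standard
        "ST":  {"STREET", "STR", "ST", "STRT"},     # illustrative variants
        "AVE": {"AV", "AVE", "AVENUE", "AVN"},      # illustrative variants
    }
    CANONICAL = {alias: std for std, aliases in SUFFIX_SYNONYMS.items()
                 for alias in aliases}

    def normalize_suffix(token):
        return CANONICAL.get(token.upper().rstrip("."))  # None -> flag for review

    print(normalize_suffix("Str."))     # ST
    print(normalize_suffix("Sttreet"))  # None: a typo outside the synonym list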

1.28.5.3 Geocoding Operations

Depending on the type of input data and the available geographic base file information, geocoding can take several different forms (Zandbergen, 2008). The basic idea is that the input information to be geocoded consists of some descriptor of location that does not have an attached coordinate. The descriptor can range from the exceedingly general (Africa) to the highly specific (a postal address for a single-family dwelling unit). The latter type of input is geocoded in a wide variety of application domains (Dueker, 1974; Goldberg, 2008; Rushton et al., 2007).

1.28.5.3.1 Address geocoding

Assuming that a relatively clean input file has been created, there are several steps in a typical interpolated address geocoding process. First, select an address to geocode (1147 Maple Street, Idaho City, IA 52245) and parse it into constituent elements. Then, cross-link the address to a geographic base file. To reduce search in the geographic base file, it may be helpful to use a divide and conquer strategy that restricts the search to IA and ZIP 52245 (the city name is not needed except as an error check).

Table 2  Decreases in match score result from the entry of records that lack full correspondence with information contained in a geographic database (300 North Summit Street Ottumwa IA 52501)

Score   X          Y           Address                    City      State   Zip
100     623121.2   4612951.8   300 North Summit Street    Ottumwa   IA      52501
92      623121.2   4612951.8   300 N Sumit St.            Ottumwa   IA      52501
82      623121.2   4612951.8   300 Summit St.             Ottumwa   IA      52501
74      623121.2   4612951.8   300 Sumit St.              Ottumwa   IA      52501

Fig. 8 Interpolated geocoded location of an address (1147) with additional offset, or displacement, from the centerline.

Then, for all streets in ZIP 52245, match to "Maple" "Street," and for each chain that comprises Maple Street in the geographic base file, determine the one that contains 1147 in its range (L-H) of addresses (Fig. 8). Once the correct street segment and geometrical chain are found, the required quantities are solved by proportion. First, the address (1147) is computed as a proportion of the address range (noting odd-even parity) of the containing segment (e.g., 1101–1199). Then the same proportion is applied using geometrical coordinates to obtain a location, based on distance from one end, along the street segment centerline. Two additional steps are also often employed to make realistic maps: the geocoded coordinate is often offset by some constant value from the street centerline (e.g., 10 m), again using odd-even parity, and the location is squeezed in from the ends (e.g., 20 m) to disambiguate the assignment of a corner address to an incorrect street (Fig. 9).

In some areas, the address ranges provided for digital representations of blocks fail to correspond to those on the ground. One common variant is that blocks will be given a default range of 301–399 for one block side and 300–398 for the other. Yet in some cases the actual address range will be significantly different, leading to large geometrical errors when interpolated proportions are used to assign coordinate locations. An example of this is shown in Fig. 10, where the address range is close to 100 per block, though the real range is much smaller, as shown in Fig. 1.
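The proportion-and-offset arithmetic described above can be sketched as follows; the segment endpoints, address range, and offset distance are illustrative, and the squeeze adjustment is omitted for brevity.

    # A sketch of interpolated address geocoding along one street segment.
    import math

    def interpolate_address(house_no, lo, hi, p0, p1, offset=10.0):
        t = (house_no - lo) / (hi - lo)          # proportion along the address range
        x = p0[0] + t * (p1[0] - p0[0])          # same proportion along the geometry
        y = p0[1] + t * (p1[1] - p0[1])
        dx, dy = p1[0] - p0[0], p1[1] - p0[1]
        length = math.hypot(dx, dy)
        nx, ny = -dy / length, dx / length       # unit normal to the centerline
        side = 1 if house_no % 2 else -1         # odd-even parity picks the side
        return (x + side * offset * nx, y + side * offset * ny)

    # 1147 in the range 1101-1199 on a 100 m segment (projected coordinates)
    print(interpolate_address(1147, 1101, 1199,
                              (622000.0, 4612000.0), (622100.0, 4612000.0)))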

1.28.5.3.2 Geocoding using parcel maps

Most cities and counties no longer record real estate transactions using paper maps. Digital parcel-based representations are used instead and, increasingly, geocoding is supported by the widespread availability of these maps. Fig. 11 is an example of such a map developed by Johnson County, Iowa. Coordinates for geocoding are obtained by first matching the address to an address in the parcel file, and then creating a correspondence between the address and the coordinates of the centroid of either the parcel or the building footprint associated with it.
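Direct geocoding against a parcel file thus reduces to a table lookup, as in this minimal sketch; the parcel dictionary is an illustrative stand-in for a county parcel database (the first entry reuses the footprint centroid reported earlier in this article, the second is invented).

    # A sketch of direct (parcel-based) geocoding via centroid lookup.
    PARCELS = {
        "332 E DAVENPORT ST": (41.666, -91.530),  # building footprint centroid
        "1147 MAPLE ST": (41.659, -91.521),       # illustrative entry
    }

    def parcel_geocode(address):
        return PARCELS.get(address.upper())       # None -> fall back to interpolation

    print(parcel_geocode("332 E Davenport St"))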

1.28.5.3.3 Place name geocoding

Locational references in social media transactions are often vague, imprecise, and inaccurate. This is a characteristic of natural language and, in part, reflects the originator's assumption that the receiver is familiar with the context in which a name is used and will therefore be able to decipher the intended meaning.

Fig. 9 Offset and squeeze illustrated in diagrammatic form.


Fig. 10 Address range limits, shown in small numbers adjacent to street segments, are shown for urban blocks and range close to 100 per block. Actual addresses are shown in Fig. 1. Source: Iowa Geographic Map Server, http://isugisf.maps.arcgis.com/apps/webappviewer/index.html?id=47acfd9d3b6548d498b0ad2604252a5c; metadata: http://services.arcgisonline.com/arcgis/rest/services/USA_Topo_Maps/MapServer/0.

Fig. 11 Land parcels in a portion of Coralville, IA. Each parcel has a coordinate value that can be used to geocode addresses. Source: Johnson County Property Information Viewer, http://gis.johnson-county.com/piv/ (last accessed 28 December 2015).

There is an additional assumption that if there is a lack of clarity about information in a social media transaction, a subsequent request for clarification will be formulated by the recipient. Without clarification, place names, or toponyms, must be disambiguated by alternate means. Schulz et al. (2013) offer a multi-indicator method to disambiguate place names in tweets, combining spatial indicators from user profiles with tweet text content to estimate tweet and residence locations. Given this limited geographical specificity (locations may be specified at only a coarse level of resolution, e.g., a city, but not where inside that city), place name geocoding transformations must be formulated in a different way. A suite of approaches can be implemented to yield locations from text strings, which, like the addresses in the previous section, must be parsed to extract the locations. After text has been parsed, it is processed using a geospatial ontology and a gazetteer (see, e.g., Wang and Stewart, 2015). Ontologies have been developed to enable semantic interoperability among disparate data sources and types.


In the geospatial domain, semantic relations enable data to be cross-linked across different scales, resolutions, and levels of completeness and, with a gazetteer, are used to transform information like a place name into a coordinate value or some other geometrical representation of a place (Arpinar et al., 2006).

Gazetteers have been compiled for centuries and are particularly useful for assisting in the location of items contained in social media. In the digital era, gazetteers contain "structured information about named places that have a particular geographic location; that is the subset of places that have acquired proper or authoritative names" (Goodchild and Hill, 2008, p. 1040). These authors go on to more formally describe a gazetteer as a collection of tuples (N, F, T) where N is one or more names, F is a representation of location (a footprint, coordinate, bounding rectangle, or other locational descriptor), and T is an element of a typology (e.g., river). This last element, place type, is used by Janowicz and Kessler (2008) when they develop an ontological structure that maps between names, footprints, and types. Though there is no requirement that gazetteers be compiled only for formal place names, that is the norm. The US Geographic Names Information System (GNIS) is but one example of such a system. Compiled and maintained by the US Board of Geographic Names, GNIS contains a formal list of place names for a variety of feature types; additions and modifications to the database undergo strict review and require local input before adoption. Consequently, and as implied earlier, a significant gap exists between what is typically contained in a formal gazetteer and informal, local, and sometimes transient place names. Thus, when gazetteers and ontologies are used to specify locations from text sources such as social media, match failures and ambiguity can play a large role in the amount of error encountered. To address such limitations, Goldberg et al. (2009) have developed a useful strategy for reducing errors by text-mining web pages to develop regional compilations of place name information. GeoNames.org is a popular free web service that provides over 8 million place names (http://www.geonames.org/).
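Using the (N, F, T) tuple structure, place name geocoding can be sketched as a gazetteer lookup with a simple disambiguation rule; the entries, populations, and the prefer-the-larger-place heuristic below are all assumptions made for illustration.

    # A sketch of gazetteer-based place name geocoding with (N, F, T) tuples,
    # extended with an (illustrative) population field for disambiguation.
    GAZETTEER = [
        ({"Urbana"}, (40.1106, -88.2073), "city", 38000),   # Urbana, IL
        ({"Urbana"}, (40.1084, -83.7524), "city", 11000),   # Urbana, OH
        ({"Iowa River"}, (41.65, -91.54), "river", None),
    ]

    def lookup(name):
        matches = [entry for entry in GAZETTEER if name in entry[0]]
        if not matches:
            return None  # fall back to text mining or flag as unresolvable
        # crude disambiguation: prefer the most populous candidate
        return max(matches, key=lambda entry: entry[3] or 0)[1]

    print(lookup("Urbana"))  # the larger Urbana wins under this heuristic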

1.28.5.3.4 Reverse geocoding

As described in the previous section, place name geocoding is able to provide coordinate information even for vaguely defined places. This addition of coordinates to records with named locations is a type of transformation, and in some cases it is useful, or necessary, to invert such transformations to learn about the original input data. Reverse geocoding is a transformation that, as the name suggests, inverts the results of geocoding to recover addresses from an "anonymous" dot (or other point symbol) map (Armstrong and Ruggles, 1999, 2005; Curtis et al., 2006).

Reverse geocoding can be done in different ways, though a geographic base file, such as TIGER, is often used. Given a map of geocoded locations represented as points, the first step is to register it precisely in a coordinate system. This may require some guesswork, because in many cases the metadata used to create the map may not be available; for small areas, however, this is not normally a substantial problem, because the amount of geometric distortion will be low irrespective of the projection used. Then, for each "dot" or other symbol, a search is conducted to determine the street segment in the geographic base file that is closest to it, and the point is assigned to that segment (between intersections). The geometrical proportion of the dot along the length of that street segment is calculated, and that proportion is then used to calculate the corresponding address from the address range associated with that segment in the geographic base file, making note of parity (L-R). The result of these steps is a "best guess" of an address for a dot on a map. If done exactly, the inverse transformation should reproduce the original input: a text address that is geocoded, mapped, and then reverse geocoded should produce the same text address (Armstrong and Ruggles, 1999).

The results of geocoding and its inverse, however, are subject to errors or distortions that might be introduced intentionally or not (Armstrong and Tiwari, 2007; Armstrong et al., 1999; Leitner and Curtis, 2004). If the data are not intentionally masked, two general sources of error, often related to the parameters used in geocoding, are encountered. Offset distances can introduce small amounts of uncertainty into the reverse geocoding process (Fig. 9), but since offset is typically a small constant value (e.g., 15 m), it is easy to "back out" of calculations and will normally not introduce large errors. Squeeze, in contrast, displaces the geometrical location of addresses along the street segment and thus may shift symbols by one or more addresses in the range associated with a particular side of a street segment. Such errors are typically larger at the ends of streets than in the middle sections, where the squeeze displacement approaches zero; squeeze may therefore render the recovery of the exact original address impossible (see Armstrong and Ruggles, 2005).

In this context, we can introduce the concept of Type I and Type II reverse geocoding errors.

Type I (false positive): An address is geocoded to yield a coordinate that is mapped. This mapped location is reverse geocoded to yield an address that differs from the one used as input, typically as a consequence of the offset and squeeze factors applied during geocoding, but also because of imprecision related to projection mismatches and the dot symbolism used to create the map that is reverse geocoded.

Type II (false negative): An address is geocoded to yield a coordinate that is mapped. This mapped location is reverse geocoded but does not yield a reliable or recoverable address estimate. This may occur when the output location falls within a park or water body, for example, and may result from error in either geocoding or reverse geocoding.

Type I errors are difficult to detect and, in fact, their presence serves as the basis for methods of masking point data through various approaches to displacement (Armstrong et al., 1999; Kwan et al., 2004). Type II errors will normally be rare; if they are encountered, it is an indication that there may be systematic error in the reverse geocoding process. Such errors might occur, for example, if the source map were mis-registered or if an incorrect choice was made about the characteristics of the source map, such as the specification of an incorrect datum. Obvious Type II errors could also be used by a data spy interested in recovering the original points from a masked map, as they could reveal clues about the displacement approach employed.

In addition to reverse geocoding of street addresses, some social media also use reverse geocoding to identify a nearby POI from a user's mobile device. Users can "check in" at nearby restaurants or coffee shops using the geolocation


provided by the mobile device (Shaw et al., 2013). The reverse geocoding procedure for a POI includes data query (using the X, Y coordinates from the mobile device), data ranking (ranking candidate POIs by distance, type, and previous user choices), and data modeling (spatiotemporal models of POI distribution probabilities). For example, when a user turns on the "check-in" function in a social media platform, a top-ranked list of POIs (such as Restaurants A, B, and C) is created automatically and presented for the user to make a selection. A minimal sketch of the simpler street-address inversion described earlier follows.
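The dot-to-address inversion can be sketched as follows; the segment data are an illustrative stand-in for a TIGER-style geographic base file, and the parity and offset corrections are ignored for brevity.

    # A sketch of reverse geocoding a mapped point against street centerlines.
    import math

    def project_to_segment(pt, a, b):
        (ax, ay), (bx, by), (px, py) = a, b, pt
        t = ((px - ax) * (bx - ax) + (py - ay) * (by - ay)) / \
            ((bx - ax) ** 2 + (by - ay) ** 2)
        t = max(0.0, min(1.0, t))                     # clamp to the segment
        proj = (ax + t * (bx - ax), ay + t * (by - ay))
        return t, math.dist(pt, proj)

    def reverse_geocode(pt, segments):
        # segments: (p0, p1, low_addr, high_addr, street_name) tuples
        best = min(segments, key=lambda s: project_to_segment(pt, s[0], s[1])[1])
        p0, p1, lo, hi, name = best
        t, _ = project_to_segment(pt, p0, p1)
        return f"{round(lo + t * (hi - lo))} {name}"  # best-guess address

    segs = [((622000.0, 4612000.0), (622100.0, 4612000.0), 1101, 1199, "Maple Street")]
    print(reverse_geocode((622047.0, 4612010.0), segs))  # "1147 Maple Street"

Under the offset and squeeze distortions discussed above, the recovered address may differ from the original input; this is precisely the Type I error defined earlier.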

1.28.6 Cross-Linking Information to Define Activity Spaces for Individuals

With the ability to transform between addresses and coordinates, it becomes a relatively straightforward effort to cross-link with other types of digital information (National Research Council, 2007a). In fact, geography serves as a metaphorical Rosetta Stone that enables linkages to be established among disparate data types that have only their location in common. In addition to addresses, other identifiers can also be gleaned, which enable further linkages to be established. This can be accomplished in several ways. For example, a direct linkage can be established using a relational "join" operation; in such cases, an attribute held in common by two record types forms the basis of the link. In other cases, a point-in-polygon query can be conducted to establish an ecological linkage to attributes associated with an area: a home residence, for example, can be located within a particular census block group with certain socio-economic indicators (a sketch of this operation appears below). These types of analyses are widely used in geodemographics and in GIS applications in public health (Cromley and McLafferty, 2012; Harris et al., 2005). In still other cases, probabilistic methods can be employed. Just as there can be errors in geocoding operations, errors in cross-linking information to addresses can and do occur, and these may result in the misrepresentation of individuals or groups; there are privacy implications of falsely identifying a person's home or falsely attributing data to an incorrect individual.

As data volumes and types continue to increase in the current era of "big data," cross-linkages are often at the fore of discussions related to privacy (e.g., Craig and Ludloff, 2011). In fact, Sweeney (2000) showed that 87% of the US population could be uniquely identified based solely on three variables: five-digit ZIP code, gender, and date of birth. Sweeney also found that even if the specification of location is somewhat relaxed, approximately 50% of the population can be uniquely identified on the basis of place, gender, and date of birth, where place refers to the city or municipality of residence. It is also a straightforward exercise to use free Internet services to determine telephone land lines, household members, and housing characteristics for individual residences and residents (Mayer and Mutchler, 2013). These are but a few examples of what is sometimes referred to as cross-walking or cross-linking information.

Cross-linked locations can be used to amass information about human spatial behavior. Several researchers have developed significant conceptual advances in the study of humans as they move through time and space, with Hägerstrand (1970) serving as an important pioneer of this approach. This work is summarized by Thrift (1977), and several of the main themes of Hägerstrand's work have been placed in a modern research context and extended by, most notably, Miller (1991, 2005), Kwan (1998), Shaw (2006), and Miller and Bridwell (2009). Much of this work uses maps to visualize space–time movements and gain insights into human spatial processes and behaviors (Fig. 12). An additional concept is particularly relevant here. An activity space refers to the collection of locations (and the paths between them) that an individual has direct contact with on a daily, weekly, or other cycle (Horton and Reynolds, 1971a,b). At an individual level, the activity space for a person (Fig. 13) can be used to construct behavioral profiles that detail the journey to work, as well as the locations of, for example, day care, places of worship, social clubs, the dry cleaner, grocery and liquor stores, and other places of business (see, e.g., Sherman et al., 2005).
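The ecological linkage mentioned above reduces to a point-in-polygon test followed by an attribute join, as in this minimal sketch using ray casting; the block-group polygon, identifier, and attributes are illustrative assumptions.

    # A sketch of a point-in-polygon "ecological" join (ray casting).
    def point_in_polygon(pt, poly):
        x, y = pt
        inside = False
        for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
            if (y1 > y) != (y2 > y) and x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
        return inside

    block_group = {"geoid": "191030011001", "median_income": 52000,  # illustrative
                   "poly": [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]}
    home = (4.2, 7.7)  # a geocoded residence
    if point_in_polygon(home, block_group["poly"]):
        print("join:", block_group["geoid"], block_group["median_income"])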

Fig. 12 A diagrammatic representation of space–time. Arrows represent individual trajectories in space. Meetings are symbolized by shared space–time cylinders. Based on original work by Hägerstrand.


Fig. 13 An example of an activity space map that could be constructed from cell phone and social media transactions. Source: Esri (http://www.esri.com/).

While it is unrealistic to suggest that individuals would "tweet" with a geotag every time they visited, for example, a particular grocery store, if they post information even occasionally, a space–time profile of activity can be constructed over time and, from that, behavior inferred. Such locational profiles can be constructed because of the typical space–time activities of individuals. If someone uses mass transit to reach their workplace, their daily path is constrained by the routes used. If a private vehicle is used, mobility is potentially less restricted, but people tend to develop well-worn paths for routine travel, such as the journey to work. The formation of these routine paths is based on accumulated travel experience, typically with the goal of minimizing either transit time or cost. Of course, route alterations will often occur when trips are taken at different times of the day, and will also depend on whether a multipurpose trip is contemplated and whether temporary (construction) or permanent (new road) route changes occur.

Cross-linking information can also be implicit and require additional external knowledge to trigger the violation of geoprivacy. For example, suppose a user posts a non-geotagged social media message: "I am attending the 2016 AAG meeting!" A web search reveals the location (San Francisco) and the dates (March 29 to April 2, 2016) of the event. It is also possible to retrieve a large number of historical messages from the same user and estimate possible home and workplace locations for that individual. By combining the implicit event knowledge and the individual's estimated activity spaces, researchers can estimate an individual's travel patterns and behaviors using cross-linked social media data and web knowledge.

Activity spaces are constructed by sequencing locations that occur at points in time. This is a relatively easy task to accomplish given a series of points taken as a regular sample in space–time. Such samples, as mentioned earlier, are available to cellular service providers and to corporations that are provided information by users in exchange for services. This also serves as the basis for forensic mapping, which is used to construct space–time analyses related to the commission of criminal acts (e.g., Schmitz et al., 2013). In the United States, such detailed space–time information can be obtained only after a search warrant has been issued. In other cases, individuals voluntarily allow themselves to be tracked as part of a research project that has been approved by an appropriate IRB. In yet other cases, as when social media is parsed and geocoded, paths may need to be inferred, since the space–time density of points will be much lower. This is a more difficult task to accomplish with certainty, but methods based on reasonable assumptions can be developed. One approach establishes beacon points (repetitive locations of some type, such as home, recreation, or workplace) and uses a shortest path algorithm to compute routes among the beacons (a sketch of beacon extraction appears below). Such linked locations can serve as an analog of a trail pheromone path, such as those constructed by ants, which we term a geomone. Geomones are digital codes that establish the current and previous locations of humans as they navigate through the environment. When the location information contained in these geomonic digital trails is linked with other types of information, human behaviors can be inferred. For example, if a string of space–time coordinates reveals that an individual is fixed for approximately 2 h every Thursday night at a location whose address corresponds to a particular fraternal organization, it may be inferred that the individual is a member of that group. Of course, other more sinister examples can be developed.
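Beacon points can be extracted from a stream of time-stamped fixes with a simple revisit count, as in this sketch; the grid cell size and visit threshold are illustrative assumptions.

    # A sketch of beacon-point extraction by grid snapping and revisit counting.
    from collections import Counter

    def beacon_points(fixes, cell=0.001, min_visits=5):
        """fixes: (lat, lon) pairs; a 0.001-degree cell is roughly 100 m."""
        counts = Counter((round(lat / cell) * cell, round(lon / cell) * cell)
                         for lat, lon in fixes)
        return [center for center, n in counts.items() if n >= min_visits]

    track = [(41.6661, -91.5302)] * 7 + [(41.6702, -91.5255)] * 2
    print(beacon_points(track))  # only the repeatedly visited location survives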


Just as pheromones that mark food trails are reinforced as multiple ants move along a similar path, geomones can reinforce locations through repeated observations. Biological pheromones also degrade, thus reducing their ability to be sensed; geomones could likewise be assigned strength weights that are some decreasing function of time (a minimal sketch of such a decaying weight follows below). Geomone trails would therefore have some degree of persistence but could become indistinct if an activity space changes. The accuracy of a trail is a function of the geographical accuracy of the technology used to record it, as well as the temporal sampling strategy adopted. If locations are derived exclusively from place names in tweets, for example, the spatial specificity of the yielded locations may be low. On the other hand, GPS-based cell phone coordinates will often be highly accurate (Zandbergen, 2009), though obvious problems occur in urban canyons and inside large structures, such as shopping malls. It should be noted, however, that indoor positioning strategies based on a variety of approaches, such as Wi-Fi triangulation, are being developed rapidly (Han et al., 2014). Sampling in the time domain may be at the system level or under user control. For example, cell phones connect to specific towers in the network, and when people move, control is handed off from one cell tower to another, providing location paths in the process. In other cases, locations may be based on intermittent reports (e.g., tweets) provided by an individual.
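A geomone strength weight that decays with time, by analogy with pheromone evaporation, might take an exponential form; the half-life used below is an arbitrary assumption.

    # A sketch of a time-decaying geomone weight (exponential evaporation).
    import math

    def geomone_weight(age_days, half_life_days=30.0):
        return math.exp(-math.log(2) * age_days / half_life_days)

    print(geomone_weight(0))   # 1.0: a fresh observation
    print(geomone_weight(30))  # 0.5: one half-life old
    print(geomone_weight(90))  # 0.125: nearly faded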

1.28.7 Concluding Remarks

Though it has long been possible to link maps with other types of tabular information to reveal personal-level locational information, changes in technology have made it possible to perform such tasks seamlessly and for large numbers of unsuspecting citizens. Location-aware portable electronic devices are now a ubiquitous feature of the technological milieu of the first part of the 21st century and provide a strong foundation for the digital economy. Ranging from tablets to smart phones to fitness monitors, such devices record, store, and communicate locational information on a regular basis. In addition, the growth of social media has led to the creation of new types of location-based text and metadata that can be converted into geographic coordinates. Because these data can often be associated with individuals, locations, life paths, and activity spaces can be constructed and cross-linked with other information to yield massive quantities of information about what people do. The ease with which this information can be obtained and transformed to yield traces of activity is not generally recognized by most citizens; as such, they might feel violated were they to become aware that such information is collected and used by interested parties. Others may have no concerns whatsoever. Many people under the age of 25 were "born digital" and have had their locations monitored for a considerable portion of their lives. Indeed, in this era of big data, locational privacy may be a moot point (Narayanan and Shmatikov, 2010; Tucker, 2013). As we become increasingly cyborg-like, and as augmented reality (which requires location to be effective) becomes more pervasive, it is possible that the only people who opt out of location services will be criminals.

References

Albergotti, R., 2014. Furor erupts over Facebook’s experiment on users. The Wall Street Journal. http://www.wsj.com/articles/furor-erupts-over-facebook-experiment-on-users1404085840 (last accessed 28 September 2015).
Armstrong, M.P., Ruggles, A.J., 1999. Map hacking: on the use of inverse address-matching to discover individual identities from point-mapped information sources. In: Paper presented at Geographic Information and Society Conference 1999, Minneapolis, MN. http://ir.uiowa.edu/geog_pubs/205 (last accessed 25 September 2015).
Armstrong, M.P., Ruggles, A.J., 2005. Geographic information technologies and personal privacy. Cartographica 40 (4), 63–73.
Armstrong, M.P., Rushton, G., Zimmerman, D.L., 1999. Geographically masking health data to preserve confidentiality. Statistics in Medicine 18 (5), 497–525.
Armstrong, M.P., Tiwari, C., 2007. Geocoding methods, materials, and first steps toward a geocoding error budget. In: Rushton, G., Armstrong, M., Gittler, J., Greene, B.R., Pavlik, C.E., West, M.M., Zimmerman, D.L. (Eds.), Geocoding health data. CRC Press, Boca Raton, FL, pp. 11–35.
Arpinar, I.B., Sheth, A., Ramakrishnan, C., Usery, E.L., Azami, M., Kwan, M.P., 2006. Geospatial ontology development and semantic analytics. Transactions in GIS 10 (4), 551–575.
Beresford, A.R., Stajano, F., 2003. Location privacy in pervasive computing. IEEE Pervasive Computing 2 (1), 46–55.
Clarke, K.C., 2015. A multiscale masking method for point geographic data. International Journal of Geographical Information Science 30 (2), 300–315.
Craig, T., Ludloff, M.E., 2011. Privacy and big data. O’Reilly Media Inc, Sebastopol, CA.
Cromley, E.K., McLafferty, S.L., 2012. GIS and public health, 2nd edn. Guilford, New York.
Curry, M.R., 1998. Digital places: living with geographic information technologies. Routledge, New York.
Curtis, A., Mills, J., Leitner, M., 2006. Spatial confidentiality and GIS: re-engineering mortality locations from published maps about Hurricane Katrina. International Journal of Health Geographics 5 (44).
Dash, S., Schwartz, R.F., Knowlton, R.E., 1959. The eavesdroppers. Rutgers University Press, New Brunswick.
de Souza e Silva, A., Frith, J., 2012. Mobile interfaces in public spaces: locational privacy, control, and urban sociability. Routledge, New York.
DeCew, J., 1997. In pursuit of privacy: law, ethics and the rise of technology. Cornell University Press, Ithaca, NY.
Dueker, K.J., 1974. Urban geocoding. Annals of the Association of American Geographers 64 (2), 318–325.
Goldberg, D.W., 2008. A geocoding best practices guide. North American Association of Central Cancer Registries, Springfield, IL.
Goldberg, D.W., Wilson, J.P., Knoblock, C.A., 2009. Extracting geographic features from the internet to automatically build detailed regional gazetteers. International Journal of Geographical Information Science 23 (1), 93–128.
Goodchild, M.F., Hill, L.L., 2008. Introduction to digital gazetteer research. International Journal of Geographical Information Science 22 (10), 1039–1044.
Goss, J., 1995. We know who you are and we know where you live: the instrumental rationality of geodemographic systems. Economic Geography 71 (2), 171–198.
Greenwald, G., MacAskill, E., 2013. NSA Prism program taps in to user data of Apple, Google and others. The Guardian. http://www.theguardian.com/world/2013/jun/06/us-techgiants-nsa-data (last accessed 28 September 2015).
Grune, D., Jacobs, C.J.H., 2008. Parsing techniques: a practical guide, 2nd edn. Springer, New York.
Hägerstrand, T., 1970. What about people in regional science? Papers of the Regional Science Association 24 (1), 6–21.
Han, D., Jung, S., Lee, M., Yoon, G., 2014. Building a practical Wi-Fi-based indoor navigation system. IEEE Pervasive Computing 13 (2), 72–79.
Harris, R., Sleight, P., Webber, R., 2005. Geodemographics, GIS and neighbourhood targeting. Wiley, Chichester.
Hayden, E.C., 2015. Researchers wrestle with a privacy problem. Nature 525 (7570), 440–442. http://www.nature.com/news/researchers-wrestle-with-a-privacy-problem-1.18396 (last accessed 27 September 2015).
Heim, R.J., 2015. Drone video captures monk sunbathing atop wind turbine. NBC 10 News. http://www.turnto10.com/story/29910087/drone-video-captures-monk-sunbathingatop-wind-turbine (last accessed 28 September 2015).
Hern, A., Rankin, J., 2015. Spotify’s chief executive apologises after user backlash over new privacy policy. The Guardian. http://www.theguardian.com/technology/2015/aug/21/spotify-faces-user-backlash-over-new-privacy-policy (last accessed 28 September 2015).
Hill, K., 2014. ‘God view’: Uber allegedly stalked users for party-goers’ viewing pleasure (updated). Forbes. http://www.forbes.com/sites/kashmirhill/2014/10/03/god-view-uberallegedly-stalked-users-for-party-goers-viewing-pleasure/ (last accessed 28 September 2015).
Horton, F.E., Reynolds, D.R., 1971a. Effects of urban spatial structure on individual behavior. Economic Geography 47 (1), 36–48.
Horton, F.E., Reynolds, D.R., 1971b. Action-space differentials in cities. In: McConnell, H., Yaseen, D.W. (Eds.), Perspectives in geography 1: models of spatial variation. Northern Illinois University Press, Dekalb, IL, pp. 84–102.
Janowicz, K., Kessler, C., 2008. The role of ontology in improving gazetteer interaction. International Journal of Geographical Information Science 22 (10), 1129–1157.
Kounadi, O., Leitner, M., 2014. Why does geoprivacy matter? The scientific publication of confidential data presented on maps. Journal of Empirical Research on Human Research Ethics 9 (4), 34–45.
Kramer, A.D., Guillory, J.E., Hancock, J.T., 2014. Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences 111 (24), 8788–8790.
Kwan, M.P., 1998. Space-time and integral measures of individual accessibility: A comparative analysis using a point-based framework. Geographical Analysis 30, 191–217.
Kwan, M.P., Casas, I., Schmitz, B.C., 2004. Protection of geoprivacy and accuracy of spatial information: how effective are geographical masks? Cartographica 39 (2), 15–28.
Leber, J., 2013. How wireless carriers are monetizing your movements. MIT Technology Review. http://www.technologyreview.com/news/513016/how-wireless-carriers-aremonetizing-your-movements/ (last accessed 27 September 2015).
Leitner, M., Curtis, A., 2004. Cartographic guidelines for geographically masking the location of confidential point data. Cartographic Perspectives 49 (1), 22–39.
Mayer, J., Mutchler, P., 2013. MetaPhone: the NSA’s got your number. Web Policy. http://webpolicy.org/2013/12/23/metaphone-the-nsas-got-your-number/ (last accessed 27 September 2015).
Miller, A.R., 1971. The assault on privacy: computers, data banks, and dossiers. University of Michigan Press, Ann Arbor, MI.
Miller, H.J., 1991. Modelling accessibility using space-time prism concepts within geographical information systems. International Journal of Geographical Information Systems 5 (3), 287–302.
Miller, H.J., 2005. What about people in geographic information science? Computers, Environment and Urban Systems 27, 447–453.
Miller, H.J., Bridwell, S.A., 2009. A field-based theory for time geography. Annals of the Association of American Geographers 99 (1), 49–75.
Muir, J.A., Van Oorschot, P.C., 2009. Internet geolocation: evasion and counterevasion. ACM Computing Surveys 42 (1), 1–23.
Narayanan, A., Shmatikov, V., 2010. Myths and fallacies of “personally identifiable information”. Communications of the Association for Computing Machinery 53 (6), 24–26.
National Research Council, 2007a. Putting people on the map: protecting confidentiality with linked social-spatial data. National Academies Press, Washington, DC.
National Research Council, 2007b. Engaging privacy and information technology in a digital age. National Academies Press, Washington, DC.
National Research Council, 2013. Geotargeted alerts and warnings. National Academies Press, Washington, DC.
Nissenbaum, H., 2004. Privacy as contextual integrity. Washington Law Review 79 (1), 119–158.
Poese, I., Uhlig, S., Kaafar, M.A., Donnet, B., Gueye, B., 2011. IP geolocation databases: unreliable? ACM SIGCOMM Computer Communication Review 41 (2), 53–56.
Rushton, G., Armstrong, M.P., Gittler, J., Greene, B.R., Pavlik, C.E., West, M.M., Zimmerman, D.L., 2007. Geocoding health data. CRC Press, Boca Raton, FL.
Schmitz, P., Eloff, C., Talmakkies, R., Linner, C., Lourens, R., 2013. Forensic mapping in South Africa: four examples. Cartography and Geographic Information Science 40 (3), 238–247.
Schulz, A., Hadjakos, A., Paulheim, H., Nachtwey, J., Mühlhäuser, M., 2013. A multi-indicator approach for geolocalization of tweets. In: Proceedings of the seventh international AAAI conference on weblogs and social media. AAAI Press, Cambridge, MA, pp. 573–582.
Seidl, D.E., Paulus, G., Jankowski, P., Regenfelder, M., 2015. Spatial obfuscation methods for privacy protection of household-level data. Applied Geography 63, 253–263.
Shaw, B., Shea, J., Sinha, S., Hogue, A., 2013. Learning to rank for spatiotemporal search. In: Proceedings of the sixth ACM international conference on Web search and data mining. ACM, Rome, Italy, pp. 717–726.
Shaw, S.-L., 2006. What about ‘time’ in transportation geography? Journal of Transport Geography 14 (3), 237–240.
Sherman, J.E., Spencer, J., Preisser, J.S., Gesler, W.M., Arcury, T.A., 2005. A suite of methods for representing activity space in a healthcare accessibility study. International Journal of Health Geographics 4 (24). http://www.ij-healthgeographics.com/content/4/1/24 (last accessed 27 September 2015).
Smit, L., Stander, A., Ophoff, J., 2012. An analysis of base station location accuracy within mobile-cellular networks. International Journal of Cyber-Security and Digital Forensics 1 (4), 272–279.
Solove, D.J., 2008. Understanding privacy. Harvard University Press, Cambridge, MA.
Sweeney, L., 2000. Simple demographics often identify people uniquely (data privacy working paper 3). Carnegie Mellon University, Pittsburgh, PA.
Thomson, J., 1975. The right to privacy. Philosophy and Public Affairs 4, 295–314.
Thrift, N., 1977. An introduction to time-geography: concepts and techniques in modern geography (CATMOG) No. 13. Geo Abstracts Ltd, Norwich.
Tsou, M.-H., Kim, I.-H., Wandersee, S., Lusher, D., An, L., Spitzberg, B., Gupta, D., Gawron, J., Smith, J., Yang, J.-A., Han, S.Y., 2013. Mapping ideas from cyberspace to realspace: visualizing the spatial context of keywords from web page search results. International Journal of Digital Earth 7 (4), 316–335.
Tucker, P., 2013. Has big data made anonymity impossible? MIT Technology Review. http://www.technologyreview.com/news/514351/has-big-data-made-anonymity-impossible/ (last accessed 27 September 2015).
US Department of Health, Education and Welfare, 1973. Records, computers and the rights of citizens. DHEW Publication No. (08) 73-97. US Government Printing Office, Washington, DC.
US Postal Service (USPS), 2013. Postal addressing standards. Publication 28. United States Postal Service, Washington, DC. http://pe.usps.gov/cpim/ftp/pubs/Pub28/pub28.pdf (last accessed 27 September 2015).
Vasiliev, I., 1992. The Moscow connection. In: Janelle, D.J. (Ed.), Geographical snapshots of North America. The Guilford Press, New York, pp. 227–230.
Wang, W., Stewart, K., 2015. Creating spatiotemporal semantic maps from web text documents. In: Kwan, M.-P., Richardson, D., Wang, D., Zhou, C. (Eds.), Space-time integration in geography and GIScience: research frontiers in the U.S. and China. Springer, Dordrecht, pp. 157–174.
Warren, S.D., Brandeis, L.D., 1890. The right to privacy. Harvard Law Review 4 (5), 193–220.
Westin, A.F., 1967. Privacy and freedom. Atheneum, New York.
Zandbergen, P.A., 2008. A comparison of address point, parcel and street geocoding techniques. Computers, Environment and Urban Systems 32, 214–232.
Zandbergen, P.A., 2009. Accuracy of iPhone locations: A comparison of assisted GPS, WiFi and cellular positioning. Transactions in GIS 13, 5–36.
Zandbergen, P.A., 2012. Comparison of WiFi positioning on two mobile devices. Journal of Location Based Services 6 (1), 35–50.
Zimmerman, D.L., Pavlik, C.E., 2008. Quantifying the effects of mask metadata disclosure and multiple releases on the confidentiality of geographically masked health data. Geographical Analysis 40 (1), 52–76.

1.29 Defining Public Participation GIS

Rina Ghose, University of Wisconsin-Milwaukee, Milwaukee, WI, United States © 2018 Elsevier Inc. All rights reserved.

1.29.1 Introduction
1.29.2 Examining Public and Participation in PPGIS
1.29.3 Interrogating Access in PPGIS
1.29.4 Integrating Local Knowledge in PPGIS
1.29.5 Assessing PPGIS Outcomes
References

1.29.1 Introduction

Maps have historically conferred power on their users, who have translated the power of spatial knowledge into control over land. Such power has tended to reside with the wealthy elite, while remaining out of reach of common people. While the advancement of technology led to a transition from manual to automated cartography and eventually to geographic information system (GIS), spatial information and its analysis continued to remain in the domain of experts, rendering GIS an expert technology. GIS has been central to resource management, conservation, planning, and policy-making activities since the 1960s, as most decision-making tasks require spatial information. Yet, there is a differential level of access to GIS due to its cost and complexity. Such uneven access has allowed experts to control spatial planning, shaping the ways that land and its resources can be controlled and managed. Such land-use decisions have often disregarded the participation of indigenous or traditionally marginalized communities that reside on the land, whose experiential knowledge is dismissed as anecdotal. Based on the notion that increased use of spatial knowledge leads to more informed participation in policy-making, public participation geographic information system (PPGIS) aims to broaden the access of GIS technologies to socially marginalized groups, leading to more informed and empowered communities.

The evolution of PPGIS as a research agenda and a practice is rooted in the tumultuous debates surrounding GIS in the early 1990s. GIS was critiqued for its perceived positivist epistemology, which reduced complex social processes to points, lines, areas, and attributes (Pickles, 1995). Critics found GIS to be an elitist technology, as it remained out of reach for the masses. Critics also found GIS to be an instrument for the production of rational, expert knowledge, implemented in policy-making through a top-down approach. The presence of a GIS digital divide along class and race lines was seen as inherently disempowering to marginalized citizens, for it discouraged their knowledge and their participation in resource allocation and governance. Led by the National Center for Geographic Information and Analysis (NCGIA), the Friday Harbor conference in 1993 united the critics and the proponents of GIS to formulate the GIS and Society research agenda, in order to address these criticisms. PPGIS arose out of this research agenda as an effort to advance informed participation of marginalized communities through inclusive access to GIS and spatial data.

Grassroots organizations of marginalized communities tend to be more decentralized and more fragile in financial and staffing support when compared to larger nonprofit organizations and public/private sector agencies. Their poor resource conditions make it difficult for them to afford the cost of implementing GIS in their organizations. Purchase of data, software, and hardware or provision of ongoing GIS training to their staff members continues to be a challenge for these organizations. Moreover, these organizations are also hampered by frequent staff turnover that results in higher staff training costs. Finally, grassroots organizations have faced the additional challenge of not gaining easy access to public databases that are often the repository of valuable community-based spatial data at multiple scales.
These difficulties have created a technological divide between the institutions and policy makers that are frequent users of GIS and grassroots groups that face challenges in using GIS. Class inequalities are then further heightened by inequalities concerning the knowledge of and access to such information technologies, for it is clear that such access and understanding create greater opportunities for conferring political, economic, and social power upon the citizens of distressed communities. “Empowerment, Marginalization and Public Participation GIS” was established as a critical research agenda by NCGIA and the University Consortium for Geographic Information Science in the late 1990s, leading to an advancement of PPGIS work. A wide range of initiatives emerged across the world, aiming to enhance the accessibility of data/GIS to disadvantaged groups and to incorporate multiple voices and local knowledge within a variety of contexts. The following goals tend to define PPGIS: (1) providing equitable access to spatial data and GIS technologies among socially marginalized citizens; (2) incorporating grounded, indigenous, experiential, and local knowledge with public datasets for marginalized citizens to contest or reshape policies; (3) discouraging top-down, rational planning approaches in policy-making; (4) integrating qualitative and quantitative data in GIS; (5) creating alternate forms of mapping and representation to capture complex social processes and cultures; and (6) creating alternate forms of geospatial technology designed to suit the needs of indigenous or socially marginalized groups.


In response to the establishment of the PPGIS agenda, various international conferences and workshops have been held that have significantly shaped PPGIS work. Notable among them are the Varenius Workshop on “Empowerment, Marginalization and Public Participation GIS” (1998) held in the United States, the workshop on Access and Participatory Approaches in Using GIS (2001) held in Italy, the PPGIS international conferences held in the United States (2002–05), and the PPGIS E-Seminar (2007). Such workshops and conferences have been sponsored by significant scholarly bodies such as the US National Science Foundation, the Urban and Regional Information Systems Association (URISA), the European Science Foundation, and the Institute of British Geographers/Royal Geographical Society, which demonstrates the critical attention that has been given to this topic. A rich body of literature containing both theorization and case study analysis has been published in various journals and books. Notable among these are the special issue of Cartography and Geographic Information Systems (Barndt, 1998), as well as a seminal edited book on global PPGIS titled “Community Participation and Geographic Information Systems” (Craig et al., 2002). Literature reviews of PPGIS practices also provide significant assistance in understanding this complex process (Sieber, 2006). Other notable volumes, such as the “Handbook of GIS and Society Research” (Nyerges et al., 2011), have also devoted significant sections to PPGIS. Individuals engaged in PPGIS have thus gained a collective identity as a distinct community with the creation of new spaces of discourse such as the PPGIS conferences and listserv.

1.29.2 Examining Public and Participation in PPGIS

PPGIS projects have been carried out as community-engaged projects involving marginalized communities and a network of social groups, such as activists, universities, government and nongovernmental agencies, community-based organizations, and grassroots communities. University–community partnerships and collaborative planning programs have provided particular opportunities for GIS practitioners and scholars to collaborate with disempowered and marginalized community groups, leading to grounded, bottom–up spatial knowledge production that could be strategically employed in policy activities. Thus, the public in PPGIS involves not only the marginalized community but also a wide range of actors at multiple institutions. Schlossberg and Shuford (2005, p. 18) identified three sorts of participants in PPGIS:

1. Those affected by a decision or program;
2. Those who can bring important knowledge or information to a decision or program;
3. Those who have power to influence and/or affect implementation of a decision or program.

Further, the power positionality of such actors involved in PPGIS must also be interrogated. While the marginalized community group generally occupies the most subordinate power position, other actors tend to occupy higher power positions that can affect outcomes. Yet, marginalized community groups can also navigate networks to create powerful allies to attain their goals. The notion of stakeholders is critical in PPGIS, and Schlossberg and Shuford (2005) define PPGIS participants as “stakeholders who are affected by, bring knowledge or information to, and possess the power to influence a decision or program.” Building powerful networks of association with the multitude of publics involved in PPGIS has enabled marginalized community groups to address or contest social or environmental injustices. But PPGIS in itself cannot solve the problems of structural inequities and injustices, which must be addressed through political, social, and economic reforms. Similarly, participation is a multifaceted process. Many studies have drawn upon Arnstein’s (1969) concept of the participation ladder to illustrate the levels of participation, ranging from tokenism at the bottom to collaboration in the middle and to citizen control at the top. Representing the various perspectives of multiple publics within spatial data structures remains a key research challenge. PPGIS approaches have been used in both the global South and the global North, on a diverse range of topics such as conflict resolution, land disputes, exploitation of natural resources, environmental protection and conservation, infrastructural provision in squatter settlements, indigenous land rights, and land-use planning and redevelopment. Over the past two decades, the areas of PPGIS applications and theoretical conceptualizations have evolved and mutually informed each other. PPGIS practices have been carried out globally to address a multitude of policy-making areas, ranging from rural land use to conservation and natural resource management, urban planning, social and environmental activism, and advocacy (Craig et al., 2002). PPGIS research has been shaped by an integrated theoretical framework derived from social and political theory, science and technology studies, human–computer interaction (HCI), organizational theory, and feminist theory. PPGIS as a process has been critically scrutinized to explicate the notion of public and the nature of participation.
Such critical introspection has led to creative tension, with suggestions to rename it participatory GIS (PGIS), while other terms such as community GIS and community-integrated GIS have also been used interchangeably. At times, “citizen science” has been used in conjunction with participatory GIS (involving conservation and environmental protection). Key issues in critical GIS research span a diverse array of topics: differing national-level procedures for spatial data access, effective PPGIS practices from initiatives around the world, and the use and impacts of digital geographic data in spatial decision-making. The nature of participation and the process of empowerment through PPGIS have also been interrogated. Though their areas of inquiry are quite diverse, research indicates that geographic data access and PPGIS projects are place specific, highly contingent, and strongly shaped by the local context in which they are situated. Place thus plays an important role in shaping participatory approaches to spatial decision-making. Further, the nature of these participatory processes is crucial to understanding the differential impacts of PPGIS initiatives for the individuals and communities affected by them.

1.29.3 Interrogating Access in PPGIS

The notion of “democratizing data” was raised early in PPGIS because spatial data is expensive and hard to obtain for marginalized groups (Sawicki and Craig, 1996). Within this context, appropriateness and accuracy of data are important considerations, as data and analysis that are found to be appropriate and accurate will lead to greater action. In order to assess this issue properly, a model has been proposed by Barndt (2002), which asks the following questions: Are the data and material produced appropriate to the organization’s issues? Can the organization use the information in an action-oriented way to support decisions, enhance communication, and inform actions? Is information available to the organization in a timely manner? Is the information pertinent to the organization’s issues? Do the results have a temporal and cross-comparison component, that is, a time perspective? Are the available data sufficiently accurate?

In the case of data/GIS, access is shaped by factors such as existing spatial infrastructure developments, the presence of a supportive network of actors, and supportive local policies. Access to data and technology also lies beyond the simple availability of data, hardware, and software. Issues of appropriateness of data, data accuracy and updates, forms of data representation, and costs of training are an important and integral part of effective PPGIS practices. Access to public datasets is affected by different legal structures for copyright and licensing, and by existing traditions such as freedom of information access to public data. Data sharing between public agencies and citizen groups also depends upon the degree of government agencies’ openness to accepting grassroots citizen groups as authoritative participants in the planning process. Conversely, budgetary cutbacks and fiscal constraints may compel government agencies to sell their data to the public and/or limit data sharing with selected stakeholders. Within PPGIS, access to technology leads to questions of GIS implementation in grassroots organizations. The adoption and use of GIS by grassroots organizations has been studied under the moniker of GIS implementation. This body of work is based on the research on GIS implementation in local governments, which in turn has roots in organizational theory. Here implementation refers to the decisions made by an organization to acquire, install, implement, and use GIS in accordance with organizational needs and tasks. The presence of supportive actors that assist grassroots groups in gaining access to data is another key factor. Because of the resource-poor nature of grassroots organizations, they often rely on external expertise for access to GIS data and technology. External GIS providers in PPGIS tend to share certain common characteristics. In particular, they tend to be housed in larger institutions, which provide the budget, stability, and credibility to those using their products. However, the institutional homes of the supportive actors are varied, including academia, larger nonprofit organizations, public libraries, and city government offices. In the case of urban PPGIS studies, Leitner et al. (2002) found six major models of PPGIS provision to neighborhood organizations: community-based GIS, university–community partnerships, GIS facilities in universities and public libraries, map rooms through government GIS, Internet map servers, and neighborhood GIS centers.
The six GIS provision models are assessed through the following aspects: costs of maintaining GIS (hardware, software, and GIS training), ease of data provision and responsiveness to community organizations’ needs, stability and longevity, and ability to support collaborations among stakeholders and among grassroots organizations. Studies also indicate that access to and participation in the use of geographic information depend on social context, such as local culture and institutions. Factors such as a culture’s ability to absorb uncertainty, its level of masculinity, and its ability to accommodate human inequality can all significantly shape PPGIS. Other factors include a community’s tolerance of expert solutions, its sense of collective control, and its level of individualism. Prevalent cultural and political norms can limit the type of participants to a specific gender, class, race, or caste. Politically repressive cultures can restrict public access to critical data through legal controls and may control citizen participation to such an extent as to render PPGIS efforts ineffective and irrelevant. Finally, the internal characteristics of a grassroots organization also significantly shape its PPGIS activities, including its data access. In particular, Elwood and Ghose (2001) provide an organizational framework to examine neighborhood organizations’ usage of GIS. The framework includes four major dimensions: organizational knowledge and experience, networks of collaborative relationships, organizational stability, and organizational priorities, strategies, and status. In terms of data access, the organizational ability to form and sustain relationships of support with local government agencies and GIS provider actors is vital. Further, a grassroots group’s awareness of the value of spatial information in policy-making will shape its efforts to access data. Lastly, the availability of data in terms of appropriateness, accuracy, access, ownership, and representation is an equally pertinent issue.

The context of equitable access to GIS depends not only upon the mere availability of software and hardware but also upon the GIS skills that reside within the grassroots group. Within the PPGIS agenda, GIS represents a socially constructed technology that can be redesigned to suit the goals of its users, provided they have a fairly high level of technical skill. PPGIS practitioners in the environmental conservation arena often display a high level of technical skill and are adept at using the technology to suit their needs. Therefore, in-house GIS use by such grassroots user groups is common. In the developmental context, the Integrated Approaches to Participatory Development group developed a process of blending GIS, physical terrain models, and community participation into “participatory three-dimensional modeling.” This approach exemplifies an interface that is nontechnical. On the other hand, many community mapping projects exist in both rural and urban groups where community members do not engage directly with the GIS but work in collaboration with technical teams to provide their input (such as indigenous, experiential, or local knowledge) and evaluate output. Here, map reading skills are necessary for community groups to acquire, as outputs are in the form of maps. Some projects increase an application’s ease of use and lessen the need for GIS skills by enhancing the HCI.
Other research has produced a bird’s-eye viewer for GIS participants who experienced difficulty in comprehending their community from a two-dimensional planimetric map.


HCI can be characterized along a continuum from no direct use to passive use, active use, and proactive use. These are not hierarchical; proactive is not necessarily the optimal level of usage. Nor will stakeholders benefit equally. An effective PPGIS application depends on understanding how much and when technology should be brought into a process. The corollary is how much GIS must be learned by individual stakeholders and what technologies can be supported by available resources. Advancements in Internet technology have significantly enhanced the ease of data access and mapping, and have facilitated data input and communication between different groups. Government agencies have facilitated public access to spatial data and analysis through their Internet GIS sites. However, access to a high-speed Internet connection is vital for grassroots groups to effectively access data and perform analysis. Internet GIS is also commonly utilized to elicit the public’s input and opinions in resource management and planning. More recent studies have emphasized the development of geospatial technologies in combination with Web 2.0 technologies (Goodchild, 2007). Google Maps, Google Street View, and Google Earth have emerged as popular, user-friendly mapping interfaces, which can also be linked with GIS technologies to create analytical functions. Citizens can locate places and view them at a high resolution through street maps, detailed aerial photos, and satellite images. Such efforts have significantly increased spatial awareness and mapping among average citizens and have resulted in individuals providing user-generated content through “volunteered geographic information” (VGI) (Goodchild, 2007). VGI has provided any citizen with the opportunity to add information to a place, upload videos or images, and share information, enriching our understanding of that place. For managing emergencies and disasters, or even for citizen science, such user-generated content and mapping through VGI activities have been very helpful. Both VGI and PPGIS thus provide any internet-using layperson with the opportunity to contribute geographic information for a broad spectrum of purposes, allowing users to act as both information consumers and information providers. The distinction, though, is that while VGI is more concerned with individualized information sharing and mapping, PPGIS is oriented toward group decision-making that includes the voice of marginalized citizens and explicitly seeks social change. The development of free and open source GIS, or Open GIS, is a significant achievement in eliminating the factor of software cost in PPGIS (Sui, 2014). The principles of Open GIS include the creation of free, open source software; the harnessing of collective intelligence through a bottom-up information flow and user-generated content; the idea of the web as a platform and the development of web services, cloud computing, and lightweight programming models allowing for application development, such as mashups; software that extends beyond a single device to laptops, mobile smartphones, and tablets; and the rich usability of an easy-to-use graphic interface. Moreover, the increasing employment of smartphones with specialized apps provides great ease in collecting geocoded spatial data and enables new ways of data collection through participatory photomapping. The diverse, mobile, and flexible nature of Open GIS provides greater opportunities to collect local knowledge and build spatial narratives in cost-effective ways.
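As a concrete illustration of the kind of lightweight open format such tools exchange, the sketch below encodes a participatory photomapping observation as a GeoJSON point feature using only the Python standard library. The attribute schema (photo_url, narrative, contributor) and the coordinates are hypothetical, not a standard PPGIS data model.

```python
import json

# A smartphone capture becomes a GeoJSON point whose attributes carry the
# volunteered content; coordinates follow GeoJSON [longitude, latitude] order.
def photo_observation(lon, lat, photo_url, narrative, contributor):
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {
            "photo_url": photo_url,      # link to the participant's photo
            "narrative": narrative,      # experiential/local knowledge
            "contributor": contributor,  # could be pseudonymous
        },
    }

collection = {
    "type": "FeatureCollection",
    "features": [
        photo_observation(-87.93, 43.05, "https://example.org/p/1.jpg",
                          "Vacant lot used as an informal playground",
                          "resident_017"),
    ],
}

# Any open source GIS (e.g., QGIS) or web mapping library can load this file.
with open("photomap.geojson", "w") as f:
    json.dump(collection, f, indent=2)
```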
Overall, web-based GIS has transitioned from one-way information dissemination to two-way interactive communication to three-way public–public communication. Grassroots groups have a better chance to participate in public policy formulation if spatial knowledge and GIS are incorporated into the organization and utilized as a management tool, a form of community interaction, and a means of resource allocation. The ongoing development of web-based geospatial technologies provides community-based organizations a space to represent their knowledge of local places through the maps they create. Web-based GIS mapping programs are a popular way for organizations to incorporate GIS as part of their internal management and planning strategies and have also made it easier for community-based organizations to share information with one another as well as with the different social actors with whom they are collaborating on participatory projects. Web-based GIS mapping applications have been cited as effective tools for public participation in collaborative research projects, urban governance, and spatial planning initiatives. Community organizations have used Web 2.0 mapping technologies for a variety of reasons, first and foremost to collect information that can be shared in a way that effectively communicates a message to the organization’s members, associated partners, and those with whom the organization is attempting to form a relationship. Increasingly, the boundaries between spatial data users and spatial data creators have blurred.

1.29.4 Integrating Local Knowledge in PPGIS

PPGIS research has emphasized the value of indigenous or local knowledge and ways of incorporating it into GIS databases. Significant efforts have been made to incorporate local knowledge and value-based data into GIS (Cope and Elwood, 2009), including community organizations’ active reworking of the meaning of mapping through traditional GIS tools, creative engagement of visualization and multimedia representations, and efforts at rewriting GIS software to embody multiple forms of spatial knowledge. A number of studies have sought to investigate the existing everyday practices of conventional GIS that integrate local knowledge. Integration of local, experiential knowledge with public datasets creates nuanced spatial knowledge that has proved to be significant for marginalized communities. In the context of urban revitalization processes, studies have shown that community organizations use such spatial knowledge to enhance inter- and intraorganizational communication, to legitimize existing experiential knowledge for obtaining action, to monitor neighborhood conditions for strategic planning, to prepare for organizational tasks and funding recruitment efforts, to enhance service delivery tasks, and to explore spatial relations to challenge or reshape urban policies. PGIS projects from the global South also show that there are three major uses facilitated through community mapping: communicating information within communities, communicating information between neighboring communities, and communicating information to outside groups.


Visualization has been acknowledged as an important approach used by community groups in PPGIS to state their position, communicate with stakeholders, and contest official discourse. PPGIS projects increasingly incorporate multiple ways to enhance such visualization. Maps are a popular form of visualization used by community organizations. In inner-city revitalization, maps help organizations to create spatial strategies to both combat problems and identify opportunities for growth. While mapping of neighborhood problems is important to seek action, community asset maps that showcase the neighborhood’s positive aspects are also frequently used to attract investment opportunities. Such maps can range from simple thematic maps to more complex maps employing kernel density functions or spatial analysis. Mapping of neighborhood indicators can be used as an effective way to communicate neighborhood conditions and can be used to compare a neighborhood’s condition to other neighborhoods, represented at multiple scales (inner-city scale, municipal scale, metropolitan scale). Such scale jumping can be used as a spatial strategy to highlight the low quality of life in a target neighborhood in order to demand resources and policy changes. The intersection of critical (or interpretive) epistemologies and qualitative methodologies has led to innovative modes of visualization in PPGIS, as it continues to grapple with underlying questions concerning human identity, lived experience, situated and contingent knowledge, power, and positionality, as well as concerns of social oppression and spatial exclusion. Representation is particularly critical where local knowledge is to be integrated. In order to incorporate local knowledge into the building of GIS databases, efforts have been made to include value-based, traditionally intangible information through approaches such as geoethnography. Feminist geographers employing qualitative ethnographic research have created feminist visualization techniques to more effectively understand subjectivities and represent the different experiences and emotions of individuals (Kwan, 2002). Such approaches provide multiple ways of integrating indigenous and experiential knowledge in PPGIS. PPGIS studies have thus incorporated pictures, photos, narratives, videos, sketches, architectural footprints, and similar materials to construct spatial knowledge. Through the use of a multimedia community-integrated GIS project in South Africa, Harris and Weiner (1998) demonstrated how different forms of community knowledge could be integrated to create powerful spatial narratives. Through innovative approaches, PPGIS projects show that multimedia files can be attached as attributes to point layers to represent oral histories, nontraditional weighting schemes for site suitability can be evaluated, and language-specific user interfaces can be created. PPGIS projects have also demonstrated the use of alternate mapping techniques such as mental mapping, sketch mapping, and participatory photomapping in order to include qualitative data in GIS. Through the act of drawing, a mental map reveals significant dimensions of memory and human perception. Features become emphasized or deemphasized, revealing individual perceptions and experiential knowledge about places. However, it is important to avoid any normative use of mental maps that assumes humans possess perfect locational knowledge.
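As a minimal sketch of the kernel density mapping mentioned above, the following code smooths a handful of hypothetical neighborhood incident points onto a grid with a Gaussian kernel; the coordinates, grid extent, and bandwidth are illustrative values only, not data from any PPGIS project.

```python
import numpy as np

# Hypothetical incident points (x, y in km) and an assumed smoothing bandwidth.
points = np.array([[2.0, 3.0], [2.2, 3.1], [5.0, 7.0]])
bandwidth = 0.5  # km; controls how far each point's influence spreads

# Evaluate the Gaussian kernel density surface on a regular grid.
xs, ys = np.meshgrid(np.linspace(0, 8, 80), np.linspace(0, 8, 80))
density = np.zeros_like(xs)
for px, py in points:
    d2 = (xs - px) ** 2 + (ys - py) ** 2
    density += np.exp(-d2 / (2 * bandwidth ** 2))
density /= 2 * np.pi * bandwidth ** 2 * len(points)  # normalize the estimate

print(density.max())  # the hot spot sits near the pair of clustered points
```

The resulting surface is what a thematic "hot spot" map renders: clustered observations reinforce one another, while isolated points produce weak, diffuse peaks.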
The aforementioned critical and qualitative turn should not be interpreted as a comprehensive rejection of quantitative research methods. Sketch maps and participatory photomaps are spatially referenced and can be considered examples of methodological pluralism. Narratives and videos can also be georeferenced and used in conjunction with traditional GIS activities. Through these mixed-, multi-, or hybrid-methodological approaches to human spatiality, qualitative GIS offers innovative opportunities to bridge the intractable qualitative–quantitative divide in the direction of synergism and holism. Applications of such qualitative GIS techniques in PPGIS projects have provided greater opportunities for marginalized individuals and communities of varying age groups, races, and genders to counteract hegemonic narratives of exclusion, to incorporate multiple perceptions of reality, to emphasize local issues of importance and concern, and to draw attention to persistent silences in data. There have also been notable efforts to rewrite existing GIS software to accommodate the needs of community groups and to challenge hegemonic forms of data representation. In an effort to bridge the qualitative–quantitative methodological divide, qualitative GIS projects have been created to blend GIS software with computer-aided qualitative data analysis software. In order to facilitate the ease of GIS use among marginalized communities, efforts have also been made to create interfaces customized to the unique needs of communities. Nonetheless, while rewriting the code of GIS opens greater opportunities for multiple data representations, this approach is largely initiated by academia or public agencies rather than by the community groups themselves, given the level of technological expertise required. Further, while access to spatial data and GIS can enable disadvantaged groups to participate more effectively in planning and policy-making activities, there is also evidence of contradictory outcomes, as such access may empower certain groups within an organization while disempowering others. The ability to use computers and spatial technology varies widely among marginalized groups, leading to the exclusion of some members of the community. The contradictory outcomes of PPGIS can be reflected in web-based mapping projects as well. Access to high-speed Internet differs among different social groups and in different regions. The ability to use the Internet for PPGIS is also quite variable, based on race, class, age, and gender divides. It is therefore important to inquire into whose knowledge is being represented through PPGIS.

1.29.5 Assessing PPGIS Outcomes

Scholars have called for critical examinations of the role of GIS usage in a variety of contexts, asking whether GIS has empowered marginalized communities or consolidated existing power relations. PPGIS outcomes can differ based on the goals of the project, the local contexts of participation and access, and internal organizational characteristics. Early on, PPGIS associated outcomes with empowerment, and PPGIS research has often claimed that it leads to the empowerment of marginalized groups. The concept of empowerment has a wide array of definitions. In particular, Elwood (2002) provides an insightful
multidimensional analysis of empowerment in the context of GIS usage in community planning. In this framework, empowerment contains three dimensions: distributive change, procedural change, and capacity building. Among these three dimensions, distributive change is the least sustainable and capacity building is the most sustainable. A PPGIS project may result in all three dimensions of empowerment, reflected in multiple forms. The sustainability of PPGIS projects is also a factor in such discussions. Ultimately, empowerment must be seen as a gradual process and not as the end product of PPGIS. Notions of expanded participation also may presuppose some degree of homogeneity of benefits among those involved in PPGIS. However, various studies have noted that the notion of participation in PPGIS is complex and needs explication. The notion of individual participation varies according to cultural norms, and the notion of participation in indigenous or nonwestern cultures is significantly different from that in western cultures. Social structures based on class, race, and gender divides also shape participation. Further, certain individuals may be better able to participate than others even within the same group. Similarly, all factors being equal, certain organizations are better able to participate than others. In other cases, widespread participation may not be desirable. Similarly, the use of the word participatory may also be problematic, as it implies the need for an intermediary actor or institution. Evaluating the effectiveness of increased participation through PPGIS has not been easy and is shaped by its contexts. PPGIS activities among indigenous communities show a more activist and confrontational stance. Their goals include gaining recognition of land rights, protecting traditional land, gathering and guarding traditional knowledge, and achieving social justice. In contrast, PPGIS activities in urban governance are undertaken by community organizations in a framework of accommodation and collaboration, which may not lead to any alterations in power. Political and economic policies that shape governance and policy-making are also deeply influential in shaping the nature of PPGIS. The impacts of neoliberal policies have significantly affected the process of governance and citizen participation across the world, and such impacts have been felt in PPGIS practices as well (Ghose, 2007). In particular, the shift to collaborative governance emphasizing public–private partnership and advocating a leaner government has reconfigured the role of the state and led to the rise of new intermediary stakeholders. Simultaneously, citizen participation in governance has been more formalized in order to shift many of the responsibilities of the state onto citizen groups. Collaborative governance has led to considerable data sharing between government agencies and citizen groups, and consequently the use of spatial information and mapping among citizen groups in such collaborative governance has become more common. However, state budgetary cutbacks and reduced funding support for community organizations have diminished the capacity of organizations to participate effectively in collaborative governance. Consequently, community groups still lack sufficient resources to implement in-house GIS, resulting in variable PPGIS productions. The effectiveness of participation and spatial knowledge production is now increasingly dependent upon internal organizational factors (Elwood and Ghose, 2001).
In particular, the organizational ability to navigate the local politics of turf and to build relations among stakeholders plays an important role in shaping PPGIS outcomes. Relationships among stakeholders in PPGIS range from cooperation, compliance, and collaboration to control. Studies show that grassroots organizations must build dense networks of relationships to build and sustain their PPGIS efforts. Further, dynamic networks of support must be built at different spatial scales (local, regional, national) among community groups and other stakeholders (Ghose, 2007). Community organizations in urban governance thus increasingly adopt a cooperative stance rather than a confrontational one. Within the context of urban governance, marginalized community organizations therefore tend to utilize PPGIS projects for effective navigation of the mechanisms of governance. Their projects have primarily four functions: administrative, organizational, tactical, and strategic. These functional categories mirror public and private sector goals, such as increasing efficiency in tasks like map production, reducing redundancy in databases, and improving effectiveness in decision-making. Certain scholars have identified PPGIS as a mode of “collaborative decision support.” Here, PPGIS has been perceived to add value at several stages of the decision-making process: improving the articulation of stakeholders’ views, increasing individuals’ or groups’ understanding of technology, making complex decisions more transparent and objective, augmenting deliberation and consensus, furthering communication and linkages among internal participants and between internal and external parties, disseminating or sharing information, resolving conflicts, and enabling greater exploration of ideas. Evaluation of the effectiveness of PPGIS projects has overall been a difficult process. Certain scholars argue that PPGIS projects should be assessed based on (1) appropriateness and match with an organization’s existing activities; (2) adaptability to local conditions such as local culture and political climate; (3) fitness to current organizational goals and capacity; and (4) ability to be integrated into broader societal goals. Others argue that the contingent and place-specific nature of PPGIS makes it difficult to craft quantifiable assessment measures, and that qualitative assessments of satisfaction with PPGIS projects should instead be practiced.

In conclusion, PPGIS practices have been increasingly employed in diverse areas across the world. Major barriers of access to spatial data and technology remain a significant challenge for many disadvantaged groups. At the same time, there have been numerous creative engagements with GIS to incorporate local and multiple forms of spatial knowledge through traditional GIS software, rewriting GIS, and the recent combination of Web 2.0 technologies with geospatial technologies. Research has shown that PPGIS practices are highly context dependent and that outcomes of empowering community organizations are variable. Existing studies have shown the depth and breadth of PPGIS practices. However, there is very little longitudinal research on PPGIS practices. This may partly reflect the challenging issue of maintaining the sustainability of PPGIS projects. Yet, a long-term investigation of PPGIS practices within a particular place would provide important insights into the theoretical streams reviewed above.


References

Arnstein, S.R., 1969. A ladder of citizen participation. Journal of the American Institute of Planners 35 (4), 216–224.
Barndt, M., 1998. Public participation GIS: Barriers to implementation. Cartography and Geographic Information Systems 25 (2), 105–112.
Barndt, M., 2002. A model for evaluating public participation GIS. In: Craig, W., Harris, T., Weiner, D. (Eds.), Community participation and geographic information systems. Taylor & Francis, London, pp. 346–356.
Cope, M., Elwood, S. (Eds.), 2009. Qualitative GIS: A mixed methods approach. Sage, Thousand Oaks.
Craig, W.J., Harris, T.M., Weiner, D. (Eds.), 2002. Community participation and geographical information systems. Taylor & Francis, London.
Elwood, S., 2002. GIS use in community planning: A multi-dimensional analysis of empowerment. Environment and Planning A 34 (5), 905–922.
Elwood, S., Ghose, R., 2001. PPGIS in community development planning: Framing the organizational context. Cartographica 38 (3–4), 19–33.
Ghose, R., 2007. Politics of scale and networks of association in PPGIS. Environment and Planning A 39, 1961–1980.
Goodchild, M., 2007. Citizens as sensors: The world of volunteered geography. GeoJournal 69, 211–221.
Harris, T., Weiner, D., 1998. Empowerment, marginalization, and community-oriented GIS. Cartography and Geographic Information Systems 25 (2), 67–76.
Kwan, M., 2002. Feminist visualization: Re-envisioning GIS as a method in feminist geography research. Annals of the Association of American Geographers 92 (4), 645–661.
Leitner, H., McMaster, R., Elwood, S., McMaster, S., Sheppard, E., 2002. Models for making GIS available to community organizations: Dimensions of difference and appropriateness. In: Craig, W., Harris, T., Weiner, D. (Eds.), Community participation and geographic information systems. Taylor & Francis, London, pp. 37–52.
Nyerges, T., Couclelis, H., McMaster, R., 2011. Handbook of GIS and society research. Sage, Thousand Oaks.
Pickles, J., 1995. Representations in an electronic age: Geography, GIS, and democracy. In: Pickles, J. (Ed.), Ground truth. Guilford Press, New York, pp. 1–30.
Sawicki, D., Craig, W., 1996. The democratization of data: Bridging the gap for community groups. Journal of the American Planning Association 62 (4), 512–523.
Schlossberg, M., Shuford, E., 2005. Delineating ‘public’ and ‘participation’ in PPGIS. URISA Journal 16 (2), 15–26.
Sieber, R.E., 2006. Public participation geographic information systems: A literature review and framework. Annals of the Association of American Geographers 96 (3), 491–507.
Sui, D., 2014. Opportunities and impediments in open GIS. Transactions in GIS 18 (1), 1–24.

1.30 User-Centered Design for Geoinformation Technologies

Sven Fuhrmann, George Mason University, Fairfax, VA, United States © 2018 Elsevier Inc. All rights reserved.

1.30.1 Introduction
1.30.2 User-Centered Design
1.30.2.1 Working With Users
1.30.2.2 Usability and Usefulness
1.30.2.3 Ethical Issues in UCD
1.30.3 UCD Methods
1.30.3.1 UCD Planning
1.30.3.2 Specifying Context of Use
1.30.3.3 User Requirements
1.30.3.4 Prototyping and Usability Testing
1.30.4 Conclusions
References

1.30.1 Introduction

A young start-up company has acquired a sizable project to develop a mobile Geographic Information System App for environmental monitoring. The App will be used by citizens to document and monitor environmental changes over a longer period of time and will run on several operating systems and devices. The project is almost completed. The project manager asks the development team if they expect any delays in publishing the first version to the stakeholders and the public. The software development manager responds that she and her team expect to be on time and might even have the resources to conduct a focus group to solicit general user feedback. However, time and funding limits might require selecting focus group participants from the development group. The project manager signs off on this suggestion and instructs the public relations team to start advertising the release date in the media and to the investors. How do you think this story will end?

Over the last four decades the term user-centered design (UCD) has been used to generally describe design processes that are influenced by users and their tasks (Abras et al., 2004). The term originated in the mid-1980s through researchers in the human–computer interaction domain, most prominently through Norman and Draper (1986). Norman, then a researcher at the University of California, San Diego, published several books on the topic of UCD. “The Design of Everyday Things” (Norman, 1988) became a best-selling publication describing how design generally serves as a communication vehicle between objects and users. In addition, Norman (1988) provides several guidelines on how to optimize a design to make the experience of using an object intuitive, useful, and enjoyable. It took several years until the UCD approach made its way into Geographic Information System (GIS) development. The desire for a better and more productive user experience was mostly driven by newly emerging graphical user interfaces, the need for customized GIS applications, and a larger and more diverse user group. In the early 1990s, Medyckyj-Scott and Hearnshaw (1993) and Davies and Medyckyj-Scott (1996) started to discuss and describe cognitive abilities, conceptual GIS use models, and the need for user and performance studies. Their initial work and the resulting research questions have laid a foundation for past and current research and development in user-centered geoinformation technology design.

1.30.2 User-Centered Design

Nowadays, research and development in geoinformation technologies has generally accepted UCD as part of the project development process. While UCD approaches often differ, a clear trend toward developing usable and useful products is noticeable. In fact, it is beneficial that UCD does not prescribe a fixed set of methods for product development; rather, it is a flexible, multilayered, and customized process that is often refined during different stages of the UCD workflow. At its core, UCD applies a set of specialized methods for collecting user feedback and transforming it into a product or design. Successful UCD processes involve not only computer scientists, but also designers, engineers, user-experience specialists, and experts from other domains (Vredenburg et al., 2002). Nielsen (1992) and especially Nielsen (1993) are among the most cited publications on UCD. Nielsen (1993) states that organizations will only create useful and usable products if the need for UCD is acknowledged and supported across the organizational structure, and if resources are devoted to this process, which he calls usability engineering. As user interfaces of interactive systems advanced during the late 1990s, the International Organization for Standardization (ISO) drafted ISO Standard 13407:1999 "Human-centred design processes for interactive systems" (ISO 13407, 1999). ISO 13407 provided a first internationally accepted guide toward incorporating UCD activities throughout the life cycle of hardware and software products, including GIS. The design and engineering community quickly realized that these


ISO guidelines needed to incorporate human factors, ergonomics, process, and productivity aspects to provide usable, useful, safe, and productive information products to a wide range of users and tasks. ISO 13407 provided four major UCD themes, that is:

- understanding and specifying the context of use,
- specifying the user and organizational requirements,
- producing design solutions, and
- evaluating designs against requirements (Bevan and Curson, 1999).

In 2010 this standard was revised to ISO 9241-210:2010 "Ergonomics of human-system interaction – Part 210: Human-centred design for interactive systems." It is a process standard written for personnel involved in managing design processes, and it presents a high-level overview of the activities that are recommended for UCD (ISO 9241-210, 2010). ISO 9241-210 describes six key principles that will ensure UCD:

(1) The design is based upon an explicit understanding of users, tasks, and environments: This principle is about understanding users' "context of use," that is, who are the users, which tasks should be accomplished with the system, and where will the system be used? Consider this: you need to design a mobile accident reporting App for first responders. What would you need to consider?

(2) Users are involved throughout the design: This principle ensures that design teams involve users in all design phases. The idea is that design teams are not just conducting a focus group at the start or administering a survey at the end of a project. The standard emphasizes that user participation needs to be active, that is, users need to be engaged in the design process. Consider this: How can you find out about user tasks before you start working on an App design? How can you get users engaged and excited about using a new application?

(3) The design is driven and refined by user-centered evaluations: In the past, this step often did not get much attention during the design process. Usually it was applied, if at all, at the end of a design process. User evaluation generally helps design teams to improve a development. ISO 9241-210 suggests that user testing should be conducted throughout a project and not just at the end. Preliminary designs can be tested with mock-ups and prototypes early on to receive user feedback and initial performance indicators. Consider this: You have three possible user interface designs. How would you decide which one to pick?

(4) The process is iterative: "The most appropriate design for an interactive system cannot typically be achieved without iteration" (ISO 9241-210, 2010). This section describes that application development should not be linear, but cyclic. Sometimes initial designs will fail during product development. Allowing the team to revisit stakeholder requirements, review artifacts, test design ideas, or revise the task catalog will help to create a user-centered application. Consider this: How can you solicit application requirements from a client that has not used a comparable system or struggles to describe the envisioned functionality, including the graphical user interface design?

(5) The design addresses the whole user experience: This aspect looks not only at the performance aspect of a design, for example, time on task. It also considers the wider perceptual and emotional aspects associated with a user experience. Consider this: Will users enjoy working with the new application? Will they be enabled to easily report use issues with the new system?

(6) The design team has multidisciplinary skills and perspectives: A range of experiences and views needs to be included to address essential design and application aspects, that is, technical writing, programming, graphical user interface design, accessibility issues, marketing, etc. Consider this: How successful will a project team be that consists predominantly of programmers and project managers? (Garrett, 2011; Lowdermilk, 2013; Roth et al., 2015).

1.30.2.1 Working With Users

While the UCD principles outlined by ISO are relatively intuitive, the core question usually is how to measure and document the success of an application, including user interface interactions. The unique key to the success of an application development is the user. Thus one of the core principles in UCD is to describe and define the intended user group as well as possible. The application development team needs to ask several questions: Who are the core users of the application? Will this group evolve over time, that is, will additional departments use this application? How can the requirements of the core users best be summarized? Only well-defined tasks and core users provide a solid foundation for successful application development and later use. Future users of an application often have slightly different requirements, skill, or performance levels. These users can range from novice and intermediate to expert users who might already have used an application for the assumed tasks in the past. Within this group, there might be users who understand the general concepts of a workflow very well but have not used and will not use the new application. Then there might be users who have only memorized a sequence of certain steps within an application, or users who are experts in a subset of the application but might not have used all available functions due to their assignments (Horton, 1994). In addition, the core users might not know what they want and need when faced with the challenge of contributing to a system design. Thus the design team needs to translate existing tasks and new task requests into functional requirements (Tsou and Curran, 2008). Noyes and Baber (1999) propose simple steps for defining and selecting users for design tasks:

- define the characteristics of the users, and
- work with a representative sample.

Hackos and Redish (1998) provide more guidance in this matter. They suggest to:

- describe the main user characteristics and user numbers,
- describe the main user groups and prioritize them,
- select a representative sample from the group, and
- meet with the users and revise the group categories, if necessary.

1.30.2.2 Usability and Usefulness

Usability is one criterion to describe the success or failure of the UCD process. Usability describes the ease of using an interface to complete a set of tasks, that is, objectives (Grinstein et al., 2003). Nielsen (1993) and Shackel (1991) developed usability parameters for application development. These usability parameters provide clues about whether the designed user interface or software:

- is easy to learn,
- is efficient to use,
- is easy to remember,
- prevents user errors, and
- is pleasant to use (Nielsen, 1993).

Usability has also been defined in an ISO standard as "the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use" (ISO 9241-11, 1998). Usability can be described as an overall characteristic, where:

- effectiveness describes the extent to which the intended goals are achieved,
- efficiency describes the time, money, and mental effort invested in reaching the objectives, and
- satisfaction relates to whether the user finds the application performance acceptable (ISO 9241-11, 1998).

An important achievement of the UCD process is to ensure the usefulness of the developed application (Slocum et al., 2001). Usefulness can best be described as the degree to which an application allows users to fulfill their desired tasks and objectives and those of the organization (Fuhrmann et al., 2005). Usefulness is often embedded in the effectiveness measurements during user testing (Bowman et al., 2002; Fuhrmann et al., 2005; ISO 9241-11, 1998). It is a description and measure of the alignment or mismatch between users, tasks, and available application functionality.
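To make the three ISO 9241-11 dimensions concrete, the following is a minimal sketch of how a test session is often summarized: a completion rate (effectiveness), mean time per completed task (efficiency), and a mean questionnaire rating (satisfaction). All task records, names, and scores below are hypothetical illustrations, not data from any actual study.

```python
# Illustrative sketch: summarizing one usability-test session along the three
# ISO 9241-11 dimensions. Task records and satisfaction scores are made up.

tasks = [  # (task id, completed?, seconds on task)
    ("locate parcel", True, 42.0),
    ("edit attribute", True, 95.5),
    ("export map", False, 180.0),
]
satisfaction_scores = [4, 5, 3, 4]  # e.g., items from a 1-5 post-test questionnaire

completed = [t for _, done, t in tasks if done]
effectiveness = len(completed) / len(tasks)            # task completion rate
efficiency = sum(completed) / max(1, len(completed))   # mean s per completed task
satisfaction = sum(satisfaction_scores) / len(satisfaction_scores)

print(f"effectiveness (completion rate): {effectiveness:.0%}")
print(f"efficiency (mean s/completed task): {efficiency:.1f}")
print(f"satisfaction (mean rating): {satisfaction:.1f}")
```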

1.30.2.3 Ethical Issues in UCD

While UCD has become an important component of geoinformation technology development, one has to realize that development teams cannot simply go out and conduct user studies. Especially in the academic environment, an Institutional Review Board (IRB) determines whether the envisioned forms of user participation are suitable. IRBs are responsible for reviewing, that is, approving or disapproving, medical and behavioral research that involves humans (Bankert and Amdur, 2005). The IRBs are guided by standards published by the U.S. Food and Drug Administration and the Department of Health and Human Services (Amdur and Bankert, 2011). Human subject research at higher education and government organizations must be approved and monitored by an IRB. All universities and colleges that facilitate research usually have an established IRB that meets at regular intervals to review proposed studies. These IRBs have the final authority over any research with human subjects. Researchers considering involving human subjects need to complete IRB investigator training that is provided by the university or via an educational training provider (often through CITI, the Collaborative Institutional Training Initiative). During the IRB training, participants learn how to recruit participants, protect the identity of participants, safely store study data, identify and document risks, disclose information about the study to participants, and discard old study information (Braunschweiger and Goodman, 2007; Braunschweiger and Hansen, 2010). The IRB requires researchers to think about participant protection issues so that risks are minimized, participant consent is archived, confidentiality and anonymity are ensured, and vulnerable populations are protected. Commercial environments are not fully exempt from protecting participants in user studies through an IRB, but usually the decision rests with the project manager and the company's ethics policies. However, if a company receives U.S. government funding for a project, IRB approval is required; often the IRB review is conducted by a third party in these cases (Jackman and Kanerva, 2016). It is safe to say that if human subjects are involved in any kind of user study, IRB training is essential and the study should be reviewed or exempted by the organizational IRB. Swierenga and Pierce (2012) developed a code of professional conduct for user experience practitioners. These ethical principles state that their members will:

- act in the best interest of everyone,
- be honest with everyone,
- do no harm and, if possible, provide benefits,
- act with integrity,
- avoid conflicts of interest,
- respect privacy, confidentiality, and anonymity, and
- provide all resultant data (Swierenga and Pierce, 2012).

1.30.3 UCD Methods

At its core UCD involves five different phases that can be applied to a range of methodologies. While reviewing all possible methodologies for UCD would exceed the purpose of this section, the most prominent ones for each phase are introduced here. Maguire (2001) classifies the five UCD phases as:

- UCD planning,
- specifying context of use,
- user and organizational requirements,
- prototyping, and
- usability testing.

Fig. 1 highlights the iterative nature of the UCD process.

1.30.3.1 UCD Planning

During the first project phase, all stakeholders should participate in a "kick-off" meeting that creates a common vision and initial objectives for the application development. This stage is an important component of the UCD process as it prepares the project for effective information flow and stakeholder involvement. Maguire (2001) describes that the initial meeting provides a platform for discussing how UCD can contribute to the project objectives and makes it possible to prioritize user involvement. A second aspect of the initial meeting is to develop a cost-benefit analysis. Heavy user involvement and usability testing can become costly very quickly and delay release timelines. Thus, the initial planning meetings aim to carefully calculate the costs and time associated with a UCD development, including necessary revisions and versioning. Fewer application revisions and help requests after the first product release generally indicate satisfied users, fewer user errors, task efficiency, and easy learnability (Vredenburg et al., 2002).
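A toy version of the cost-benefit calculation such a planning meeting might run is sketched below; every line item and figure is a hypothetical planning assumption introduced for illustration, not data from this chapter.

```python
# Illustrative cost-benefit sketch for UCD planning; all figures are
# hypothetical assumptions a planning meeting would have to supply itself.
ucd_costs = {"user studies": 18_000, "prototyping": 12_000, "usability tests": 15_000}
expected_benefits = {"avoided rework": 35_000, "reduced help-desk load": 20_000}

cost = sum(ucd_costs.values())
benefit = sum(expected_benefits.values())
print(f"UCD investment: ${cost:,}; expected benefit: ${benefit:,}; "
      f"net: ${benefit - cost:,}")
```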

1.30.3.2 Specifying Context of Use

Understanding the context of use is one of the most critical components of the UCD process. Often a new application needs to be embedded into existing workflows, or the application will enhance or change processes that have been handled in a certain fashion for years. Thus, it is important to understand what the new application needs to achieve. A good geospatial example is community planning processes that involve citizen participation. In these processes users contribute in a certain role and take part in specific, well-defined processes. A good understanding of user goals and tasks, for example, editing land-use change suggestions or other planning and zoning tasks, is essential for successful user involvement and citizen response documentation. In this particular case, the focus is not solely on the user but on the task(s) that the application needs to support; that is, development team members need to understand the procedural, technical, physical, social, and/or organizational requirements that will ensure the usefulness of the application. Several methods have been suggested for this phase. One of the most common methods in this step is the survey of existing users. This is often done by questionnaires or interviews. While questionnaires and interviews can be held at different stages of the UCD process, the context of use assessment focuses on five categories, that is, the organizational environment, physical environment, technical environment, tasks, and the users. Given the

Fig. 1 The user-centered design cycle (planning, context of use, user requirements, prototyping, and testing). Adapted from Maguire, M. (2001). Methods to support human-centred design. International Journal of Human-Computer Studies 55, 587–634.


possible range of information needed, this task can easily become daunting and thus needs to be structured without losing too much valuable information or taking too much of the project time. Thomas and Bevan (1996) developed a context of use handbook that can serve as a potential template for developing such a questionnaire. Questionnaires usually contain a set of predetermined questions that are typically answered in a fixed sequence (Kirwan and Ainsworth, 1992). Overall, questionnaires can be used to reach a wider user group, but they are usually less flexible compared to interviews or scenario development since questions and answers are often determined in advance. Open-ended questions could solve this problem, but some studies have indicated that answers to open-ended UCD questions often do not contain additional information (Hackos and Redish, 1998). Answers to closed questions can be analyzed more rigorously and allow the processing of many responses. Overall, questionnaires are generally inexpensive and easy to use within the UCD process (Nielsen, 1993; Dix et al., 1998).

Many aspects in UCD can be studied by talking with users. Interviews can be held at different stages of the UCD process and are a method for obtaining first-hand information from potential users. Two kinds of interviews are possible: structured and unstructured. In structured interviews the content of the interview, its questions, and their sequence are predetermined. This is not the case for unstructured interviews (Miller and Salkind, 2002). For both interview types the investigator talks to one human subject at a time, usually an expert in the domain for which the new application will be developed. This data collection elicits information about user preferences, impressions, and attitudes and might reveal problems or tasks that were not anticipated before (Rubin and Chisnell, 2008). Interviewing is also an inexpensive method for data collection. Kirwan and Ainsworth (1992) distinguish informal and survey interviews. Informal interviews are set up to collect a wide range of information on a task situation, while survey interviews have a more specific objective, that is, they might just review a section of the task in more detail.

A very time-consuming and expensive method to assess the context of use is direct observation. This method is most appropriate for larger organizations that will integrate a new application into their existing workflows. The general idea is to conduct workplace shadowing to better understand the tasks that are associated with certain positions and workflows within an office or branch. One or several of the user experience team members will observe future users and document work routines. This monitoring is done in the background and often combined with interviews or focus groups to pose questions that need clarification (Hackos and Redish, 1998).

A method that is much easier to conduct and builds on existing task or workflow knowledge is card sorting (Spencer, 2009). Card sorting creates, documents, and verifies tasks and workflows: tasks and other relevant topics are written on cards, and users are asked to sort these cards into groups and, depending on the task, hierarchies. This context of use method helps structure content and tasks, decide on the general workflow, and reveal information about labeling and user interface design (Roth et al., 2011; Spencer, 2009).
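As an illustration of how card-sort results are commonly aggregated, the sketch below (with hypothetical cards and participants, not taken from this chapter) counts how often two cards end up in the same participant-defined group; high-agreement pairs are candidates for shared menus or workflow steps.

```python
# Illustrative sketch: aggregating open card-sort results into co-occurrence
# counts -- how many participants placed two cards in the same group.
from collections import Counter
from itertools import combinations

# Each participant's sort: {group label: [cards]}; data is hypothetical.
sorts = [
    {"editing": ["add point", "move vertex"], "output": ["print map", "export pdf"]},
    {"tools": ["add point", "move vertex", "export pdf"], "share": ["print map"]},
]

together = Counter()
for sort in sorts:
    for cards in sort.values():
        for a, b in combinations(sorted(cards), 2):
            together[(a, b)] += 1

for (a, b), n in together.most_common():
    print(f"{a!r} + {b!r}: grouped together by {n}/{len(sorts)} participants")
```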

1.30.3.3 User Requirements

While one focus in the UCD process is on the use of the application, another focus should be on the users. Establishing user requirements is not a simple task, especially if the user group is very diverse (Vredenburg et al., 2002). The context elicitation methods outlined earlier generally allow user-specific questions to be embedded; however, there are several methods that provide an additional basis for assessing and documenting user requirements. Focus groups are often used to bring a group of potential users together to discuss their requirements for an improved user experience. Focus groups are low-cost and informal qualitative meetings, usually consisting of 6–10 domain-specific users and one moderator. In the course of a session, users are introduced to the topic of interest and provided with prompting questions to start the discussion. The general idea is that the group discusses a topic freely and formulates ideas while the moderator leads the discussion and keeps it focused. Usually a set of predetermined questions is prepared, including some probes to encourage everyone to express different views and to stimulate the discussion between all participants (Krueger, 1998). Focus groups that consist of individuals who do not know one another are considered to be more vivid and free from hierarchical burden (Monmonier and Gluck, 1994).

Scenarios have also become a popular method to explore future uses of an application and user requirements. Carroll (2000) describes scenarios as "stories about people and their activities." Usually these stories consist of user-interaction narratives, which are descriptions of what users do and experience as they try to make use of hardware and software. Nielsen (1995) describes scenarios as a description of an individual user who interacts with an application to complete a certain objective. Usually tasks and interactions are described as a sequence of actions and events, representing the interactions that a user performs. Scenarios, based on the context of use descriptions, can be realistic descriptions of what a user tries to achieve without having to develop a prototype. Rather, they describe examples of future use and thus help elicit user requirements. Scenarios are very helpful, especially at the beginning of the UCD process.

1.30.3.4 Prototyping and Usability Testing

Once a good understanding of users and uses is achieved, functionality and design requirements can be formalized. Often this process is supported by creating simple mock-ups of a user interface to test their acceptance with future users. These prototypes do not need to be fully functional, as much of the design process is focused on the user experience. Paper prototyping applies hand-drawn user interfaces to rapidly design, simulate, and test user interfaces. While this technique is relatively simple, it allows user involvement at an early stage in the prototype design. It is flexible and cost effective, encourages creativity and communication, and does not require any programming or scripting skills (Nielsen, 1993). A similar technique is storyboarding, where user interface designers provide a series of interface sketches to illustrate and organize workflows and interface design suggestions (Hackos and Redish, 1998). Software prototyping is often done after the initial prototyping steps to explore and develop interface style guides for realistic working conditions. While the prototype might functionally still be a mock-up, this step allows the team to compare and test different interface design proposals prior to full implementation. After developing a prototype, the application needs to be tested for usability in order to find out whether the user-task model behind it is valid. There are several qualitative and quantitative evaluation methods available, each with advantages and disadvantages (Rubin and Chisnell, 2008). Overall, usability testing can be undertaken at three different levels:

- theory-based methods, where the design is evaluated by the development team themselves,
- expert-based methods, where external interface design experts provide insight and design recommendations, and
- user-centered methods that involve members of the target user group to provide qualitative and quantitative feedback (Barnum and Dragga, 2001; Dumas and Redish, 1999).

Heuristic evaluation is an affordable expert-based method that can be conducted fairly quickly (Nielsen, 1993). It is usually applied when a prototype exists and the application responds to user interaction. An expert evaluator receives a set of heuristics and has to judge whether these heuristics are met or not (Lindgaard, 1994). Usually, the number and types of problems identified depend very much on the domain expertise of the evaluator. Thus, it is important to recruit both experts in user experience and experts in the context of use domain (Hackos and Redish, 1998; Shneiderman et al., 2016). The outcome of the inspection method will reveal a range of usability problems, from serious to trivial ones. Overall, heuristic evaluation has several advantages: the study is rather quick to perform, relatively inexpensive, and uncovers, depending on the team expertise, many potential usability issues. However, potential bias, lack of expertise, and difficulties in ranking usability problems might lead to heuristic evaluation success rates between 30% and 95% (Nielsen, 1992; Nielsen, 1993; Lindgaard, 1994).

Thinking aloud is a prominent example of user involvement during the evaluation. The method was developed to investigate human cognitive processes during problem-solving tasks and short-term memory processes (Weidle and Wagner, 1994; Karat, 1997). In UCD this method is used to assess human skills, knowledge acquisition, and the usability of user interfaces (Rubin and Chisnell, 2008). While testing a user interface, the thinking aloud method asks participants to verbalize their thoughts as they try to solve a particular task. Participants usually do not only report how they solve a particular task but also include information about their perceptions and feelings, such as anger or fear, which is important for gauging user satisfaction (Weidle and Wagner, 1994). Additionally, participants often subjectively comment on the prototype, which supports identifying flaws and errors in the user interface. It is important to note that thinking aloud only documents cognitive processes that the participant verbalizes. Other processes might not be identified using this technique, since not all mental processes might be verbalized (Kirwan and Ainsworth, 1992; Someren et al., 1994). Major advantages of utilizing thinking aloud are the qualitative data richness, the ability to map cognitive processes in combination with performance and preference information, and the relatively rapid data collection. The data analysis is, however, usually very time consuming, relies on the accuracy of verbal protocols, and it is often uncertain whether verbal abilities reflect cognitive processes correctly. Thus, Nielsen (1993) suggests using a simplified thinking aloud method to find the most important usability issues faster. The simplified method recruits five to seven participants, and the experiment team records the thinking aloud session and simultaneously takes notes during the test session. The final report will include user quotes and user experience observations of the experiment team. Lindgaard (1994) and Nielsen (1993) state that generally the outcome of a detailed thinking aloud protocol analysis is not significantly better than a written report.
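The practical force behind small-sample recommendations of this kind is often illustrated with a simple problem-discovery model: if each participant (or expert evaluator) independently uncovers a given problem with probability L, then n participants find a share of 1 - (1 - L)^n of all problems. The sketch below is illustrative only; L = 0.31 is a commonly quoted literature average, not a property of any particular study.

```python
# A minimal sketch of the problem-discovery model often cited alongside
# small-sample usability testing: share found by n participants is
# 1 - (1 - L)**n. L = 0.31 is an assumed, commonly quoted average.
L = 0.31
for n in (1, 3, 5, 7, 10):
    print(f"{n:2d} participants -> ~{1 - (1 - L) ** n:.0%} of problems found")
# With these assumptions, five participants already find roughly 84%.
```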
Thinking aloud is often combined with eye movement tracking to gain deeper insight into cognitive processes. While this technique is not new, recent technological developments have driven down the price of eye movement tracking units, so that eye movement tracking can be used in more application developments. At its core, modern eye tracking technology is nonintrusive and based on near-infrared light that is beamed toward the eyes. The reflection of these beams in the cornea is tracked in real time, and movement patterns of the eyes are recorded. Eye movement trackers have been developed for stationary and mobile use. User experience researchers investigate different parameters when working with eye movement tracking. Gaze points and fixations are the most often used metrics. Gaze points are locations that the eye movement tracker identifies as it records the eye movements. Depending on the sampling rate and the measurement accuracy (which can vary between 0.1 and 1 degree), the gaze points are usually mapped as small dots connected by lines of movement. Fixations indicate areas of interest that were examined by the eye for a longer period of time. In user experience design these fixations can mean several things: a prolonged fixation could indicate an area of interest, or it could indicate problems in a particular region of the interface, for example, higher cognitive load. Often user experience designers create heat maps and areas of interest from the gaze and fixation patterns so that sections of an interface can be reviewed. Other eye movement parameters give additional insight into cognitive processes and emotional responses, for example, saccades, eye blinks, pupil dilation, vergence, or vestibulo-ocular movements (Fuhrmann et al., 2009; Nielsen and Pernice, 2010). While eye movement tracking has become an important component in user experience and UCD, it should be combined with additional assessment methods to achieve a complete picture of the application or interface in development.
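One common way raw gaze samples are reduced to the fixation metrics described above is dispersion-threshold identification (I-DT). The following is a minimal sketch of that algorithm under simplifying assumptions: a fixed sampling rate, pixel coordinates, and made-up thresholds; commercial eye trackers ship their own, more robust event-detection filters.

```python
# Illustrative sketch of dispersion-threshold fixation detection (I-DT).
# Thresholds and inputs are hypothetical; real trackers use tuned filters.

def dispersion(window):
    """Horizontal plus vertical spread of a list of (x, y) gaze points."""
    xs, ys = zip(*window)
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def fixations(samples, max_disp=25.0, min_samples=6):
    """samples: (x, y) gaze points at a fixed sampling rate.
    Returns (start index, end index, centroid) per detected fixation."""
    out, i = [], 0
    while i < len(samples):
        j = i + min_samples
        window = samples[i:j]
        if len(window) < min_samples or dispersion(window) > max_disp:
            i += 1  # spread too large (or stream exhausted): slide onward
            continue
        # grow the window while the points stay tightly clustered
        while j < len(samples) and dispersion(samples[i:j + 1]) <= max_disp:
            j += 1
        xs, ys = zip(*samples[i:j])
        out.append((i, j - 1, (sum(xs) / len(xs), sum(ys) / len(ys))))
        i = j
    return out
```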

1.30.4 Conclusions

UCD has become an integral and embedded component of geoinformation application development. At its core it focuses on current and future users of an application and ensures usefulness and usability. While the toolbox and possible methods for


UCD might be overwhelming at the beginning of a project, good project planning will ensure a balanced approach. Nielsen (1989) introduced the idea of "discount usability testing" to the research community and started a long discussion about standards and quality of user experience design. At its core, Nielsen (1989) promoted simplified user testing with five participants, simplified prototyping at early project stages, and heuristic evaluations. This approach allows, according to Nielsen (1989), early and rapid application iterations and thus lowers production costs and shortens the project timeframe. Web-based GIS and mobile, location-based services provide a broad bandwidth of UCD issues that reach from content topics to editing, analytics, and mapping requirements (Tsou, 2011). UCD has already changed how cartographers and GIScientists approach modern geoinformation technology design, and UCD efforts will intensify over the coming years as geoinformation technology becomes more ubiquitous and requires a higher, more intuitive level of user experience. National and international geospatial agencies have started to address these issues and aim to provide user-centered, web-based geoinformation access and dissemination platforms. The spectrum of UCD approaches in the geoinformation field is wide, and the overall objectives and applications of location-based geoinformation services are manifold. Tsou (2011) concludes that one important outcome of these innovative developments will be quality of life enhancements. The UCD process is ongoing: many stationary geospatial applications are now transitioning to mobile devices, and new geovisualization technologies are becoming available at the consumer level. Geospatial augmented reality applications will provide new challenges and opportunities for developing UCD guidelines and heuristics in the geoinformation field. This next evolution in the geoinformation technology market will provide a wealth of new research questions and outstanding project and career opportunities to graduates who are interested in combining geoinformation research and development with UCD.

References

Abras, C., Maloney-Krichmar, D., Preece, J., 2004. User-centered design. In: Sims, B.W. (Ed.), Berkshire encyclopedia of human-computer interaction, vol. 2. Berkshire Publishing Group, Great Barrington, MA, pp. 763–768.
Amdur, R.J., Bankert, E.A., 2011. Institutional Review Board member handbook, 3rd edn. Jones and Bartlett, Sudbury.
Bankert, E.A., Amdur, R.J., 2005. Institutional Review Board: management and function, 2nd edn. Jones and Bartlett, Sudbury.
Barnum, C.M., Dragga, S., 2001. Usability testing and research. Allyn & Bacon, Needham Heights.
Bevan, N., Curson, I., 1999. Planning and implementing user-centered design. In: Proceedings ACM CHI 1999 Conference, Adjunct Proceedings. ACM, Pittsburgh.
Bowman, D.A., Gabbard, J.L., Hix, D., 2002. A survey of usability evaluation in virtual environments: classification and comparison of methods. Presence 11, 404–424.
Braunschweiger, P., Goodman, K., 2007. The CITI program: an international online resource for education in human subjects protection and the responsible conduct of research. Academic Medicine 82, 861–864.
Braunschweiger, P., Hansen, K., 2010. Collaborative Institutional Training Initiative (CITI). Journal of Clinical Research Best Practices 6, 1–6.
Carroll, J.M., 2000. Five reasons for scenario-based design. Interacting with Computers 13, 43–60.
Davies, C., Medyckyj-Scott, D., 1996. GIS users observed. International Journal of Geographical Information Systems 10, 363–384.
Dix, A.J., Finlay, J.E., Abowd, G.D., Beale, R., 1998. Human-computer interaction. Prentice Hall, Englewood Cliffs.
Dumas, J.S., Redish, J., 1999. A practical guide to usability testing. Intellect, Exeter.
Fuhrmann, S., Ahonen, P., Edsall, R.M., Fabrikant, S.I., Koua, E.L., Tobon, C., Ware, C., Wilson, S., 2005. Making useful and useable geovisualization. In: Dykes, J., MacEachren, A.M., Kraak, M.J. (Eds.), Exploring geovisualization. Elsevier, Amsterdam, pp. 553–566.
Fuhrmann, S., Komogortsev, O., Tamir, D., 2009. Investigating hologram-based route planning. Transactions in GIS 13, 177–196.
Garrett, J.J., 2011. The elements of user experience: user-centered design for the web and beyond, 2nd edn. New Riders, Berkeley.
Grinstein, G., Kobsa, A., Plaisant, C., Shneiderman, B., Stasko, J.T., 2003. Which comes first, usability or utility? In: Proceedings of the 14th IEEE Visualization 2003. IEEE Computer Society, Washington, DC, p. 112.
Hackos, J.T., Redish, J.C., 1998. User and task analysis for interface design. Wiley, New York.
Horton, W., 1994. Designing and writing online documentation, 2nd edn. Wiley, New York.
ISO 13407, 1999. Human-centred design processes for interactive systems. International Organization for Standardization, Genève.
ISO 9241-11, 1998. Ergonomic requirements for office work with visual display terminals (VDTs) – Part 11: Guidance on usability. International Organization for Standardization, Genève.
ISO 9241-210, 2010. Ergonomics of human-system interaction – Part 210: Human-centred design for interactive systems. International Organization for Standardization, Genève.
Jackman, M., Kanerva, L., 2016. Evolving the IRB: building robust review for industry research. Washington and Lee Law Review Online 72, 442–457.
Karat, J., 1997. User-centered software evaluation methodologies. In: Helander, M., Landauer, T.K., Prabhu, P. (Eds.), Handbook of human-computer interaction. Elsevier, Amsterdam, pp. 689–704.
Kirwan, B., Ainsworth, L.K. (Eds.), 1992. A guide to task analysis: the task analysis working group. Taylor and Francis, London.
Krueger, R.A., 1998. Moderating focus groups. Sage, Thousand Oaks.
Lindgaard, G., 1994. Usability testing and system evaluation: a guide for designing useful computer systems. Chapman & Hall, London.
Lowdermilk, T., 2013. User-centered design: a developer's guide to building user-friendly applications. O'Reilly Media, Sebastopol.
Maguire, M., 2001. Methods to support human-centred design. International Journal of Human-Computer Studies 55, 587–634.
Medyckyj-Scott, D., Hearnshaw, H.M. (Eds.), 1993. Human factors in geographical information systems. Belhaven Press, London.
Miller, D.C., Salkind, N.J., 2002. Handbook of research design and social measurement. Sage, Thousand Oaks.
Monmonier, M., Gluck, M., 1994. Focus groups for design improvement in dynamic cartography. Cartography and Geographic Information Systems 21, 37–47.
Nielsen, J., 1989. Usability engineering at a discount. In: Proceedings of the third international conference on human-computer interaction on designing and using human-computer interfaces and knowledge based systems. Elsevier Science, New York, pp. 394–401.
Nielsen, J., 1992. Finding usability problems through heuristic evaluation. In: Proceedings ACM CHI 1992 Conference. ACM, Monterey, pp. 373–380.
Nielsen, J., 1993. Usability engineering. AP Professional, Boston.
Nielsen, J., 1995. Scenarios in discount usability engineering. In: Carroll, J.M. (Ed.), Scenario-based design. Wiley, New York, pp. 59–83.
Nielsen, J., Pernice, K., 2010. Eyetracking web usability. New Riders, Berkeley.
Norman, D.A., 1988. The design of everyday things. Currency/Doubleday, New York.
Norman, D.A., Draper, S.W. (Eds.), 1986. User centered system design. Lawrence Erlbaum Associates, Hillsdale.
Noyes, J., Baber, C., 1999. User-centred design of systems. Springer, Berlin.
Roth, R.E., Finch, B.G., Blanford, J.I., Klippel, A., Robinson, A.C., MacEachren, A.M., 2011. Card sorting for cartographic research and practice. Cartography and Geographic Information Science 38, 89–99.
Roth, R.E., Ross, K.S., MacEachren, A.M., 2015. User-centered design for interactive maps: a case study in crime analysis. ISPRS International Journal of Geo-Information 4, 262–301.
Rubin, J., Chisnell, D., 2008. Handbook of usability testing: how to plan, design and conduct effective tests, 2nd edn. Wiley, Indianapolis.
Shackel, B., 1991. Usability: context, framework, definition, design and evaluation. In: Shackel, B., Richardson, S.J. (Eds.), Human factors for informatics usability. Cambridge University Press, Cambridge, pp. 21–37.
Shneiderman, B., Plaisant, C., Cohen, M., Jacobs, S., Elmqvist, N., Diakopoulos, N., 2016. Designing the user interface: strategies for effective human-computer interaction, 6th edn. Pearson, Boston.
Slocum, T.A., Blok, C., Jiang, B., Koussoulakou, A., Montello, D.R., Fuhrmann, S., Hedley, N.R., 2001. Cognitive and usability issues in geovisualization. Cartography and Geographic Information Systems 28, 61–75.
Someren, M.V., Barnard, Y.F., Sandberg, J.A., 1994. The think aloud method: a practical approach to modelling cognitive processes. Academic Press, London.
Spencer, D., 2009. Card sorting: designing usable categories. Rosenfeld Media, Brooklyn.
Swierenga, S., Pierce, G., 2012. Should we conduct this usability study? Ethics considerations in evaluations. User Experience Magazine 11, 24–25.
Thomas, C., Bevan, N., 1996. Usability context analysis: a practical guide. Serco Usability Services, London.
Tsou, M.H., 2011. Revisiting web cartography in the United States: the rise of user-centered design. Cartography and Geographic Information Science 38 (3), 249–256.
Tsou, M.H., Curran, J.M., 2008. User-centered design approaches for web mapping applications: a case study with USGS hydrological data in the United States. In: Peterson, M.P. (Ed.), International perspectives on maps and the internet. Springer, Berlin, pp. 301–321.
Vredenburg, K., Isensee, S., Righi, C., 2002. User-centered design – an integrated approach. Prentice Hall PTR, Upper Saddle River.
Weidle, R., Wagner, A.C., 1994. Die Methode des Lauten Denkens. In: Huber, G.L., Mandl, H. (Eds.), Verbale Daten. Beltz Psychologie Verlag, Weinheim, pp. 81–103.

1.31 GIS Project Management

Jochen Albrecht, Hunter College, City University of New York, New York, NY, United States © 2018 Elsevier Inc. All rights reserved.

1.31.1 Project Management Overview
1.31.1.1 Project Attributes
1.31.1.1.1 Definition of a project
1.31.1.2 Project Characteristics
1.31.1.2.1 Projects versus programs versus portfolios
1.31.1.3 The Process of Project Management
1.31.2 Project Management and GIS Implementation Concepts
1.31.2.1 The Triple Constraint of Scope, Schedule, and Budget
1.31.2.1.1 Cost
1.31.2.1.2 Scope
1.31.2.1.3 Quality
1.31.2.1.4 Risk
1.31.2.1.5 Resources
1.31.2.1.6 Time
1.31.2.2 GIS Project Team Roles and Responsibilities
1.31.2.3 Communications
1.31.2.4 Project Management and System Development Lifecycles
1.31.2.4.1 Project initiation
1.31.2.4.2 Project planning
1.31.2.4.3 SCRUM development overview
1.31.2.4.4 System prototyping
1.31.2.4.5 Project execution and control
1.31.2.4.6 Resource planning
1.31.2.4.7 Procurement management
1.31.2.4.8 Budget planning
1.31.2.4.9 Cost-benefit analysis
1.31.2.4.10 Application development strategies and techniques
1.31.2.4.11 Project risk and opportunity analysis
1.31.2.4.12 Monitoring and control
1.31.2.4.13 Deployment (technology roll-out activities)
1.31.2.4.14 Closeout and evaluation
1.31.3 Stakeholder Management
1.31.3.1 Stakeholders
1.31.3.1.1 Top management
1.31.3.1.2 Project team
1.31.3.1.3 Internal customers
1.31.3.1.4 Contractors and suppliers
1.31.3.2 Politics of Projects
1.31.3.3 Culture of Stakeholders
1.31.3.4 Strategies for Stakeholder Management
1.31.4 Project Management Expertise
1.31.4.1 Application Knowledge
1.31.4.2 Understanding the Project Environment
1.31.4.3 Management Knowledge and Skills
1.31.4.3.1 Communication
1.31.4.3.2 Trust building
1.31.4.3.3 Leadership
1.31.4.3.4 Problem solving
1.31.4.4 Certification
1.31.4.5 Ethics
1.31.5 Summary
References


Glossary

Budgeted Costs of Work Scheduled/Performed Respectively, the detailed cost estimates for each activity in the project, or the sum of those activity costs that have been delivered so far.
Earned Value Analysis The intermediate sum of all costs expended minus the values already earned.
Earned Value Management Periodically compares the budgeted costs with the actual costs during the project.
Myers-Briggs (Type Indicator) A large, world-wide administered survey of personality traits that helps human resource managers create teams that work with less interpersonal friction.
Project Business Case The project proposal that describes what problem or opportunity will be addressed by the project.
QA/QC The acronym for the twin concepts of quality assurance/quality control. QC embodies the set of methods to test for adherence to quality standards at each step of the project. QA is the larger framework within which the QC tests are designed and implemented.
Schedule Variance The costs accrued by a project because of the amount of time it is behind schedule.
SCRUM A project management methodology that allows a manager to start by building on empirical data, and then replan and iterate from there.
SMART Goals Goals that are specific, measurable, agreed-upon, realistic, and time-framed.
Stakeholders Individuals who either care about or have a vested interest in the project.
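Several of the glossary terms above (budgeted costs of work scheduled/performed, earned value analysis and management, schedule variance) reduce to a few standard formulas. The sketch below shows that bookkeeping with hypothetical figures; it is an illustration of the conventional earned-value relationships, not a reproduction of any calculation from this chapter.

```python
# A minimal earned-value sketch; all monetary figures are hypothetical.
bcws = 120_000.0  # planned value: budgeted cost of work scheduled to date
bcwp = 100_000.0  # earned value: budgeted cost of work actually performed
acwp = 115_000.0  # actual cost of work performed to date

cost_variance = bcwp - acwp      # negative -> over budget
schedule_variance = bcwp - bcws  # negative -> behind schedule
cpi = bcwp / acwp                # cost performance index (<1 is unfavorable)
spi = bcwp / bcws                # schedule performance index (<1 is unfavorable)

print(f"CV = {cost_variance:+,.0f}, SV = {schedule_variance:+,.0f}")
print(f"CPI = {cpi:.2f}, SPI = {spi:.2f}")
```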

1.31.1 Project Management Overview

There is a big gulf between GIScience as an academic endeavor and its application in the form of GIS project management in the real world. Project activities are complex because they rarely involve routine repetitive acts, but often require specific knowledge and skills in their design, execution, and management. This article explains what project management is, its objectives, and the required ingredients, from personnel to budgets, as well as the integration of a GIS project into the larger context of an organization's culture and even society at large.

1.31.1.1 Project Attributes

Projects are temporary in nature. They are not an everyday business process and have definitive start dates and end dates. This characteristic is important because a large part of the project effort is dedicated to ensuring that the project is completed at the appointed time. To do this, schedules are created showing when tasks should begin and end. Projects can last minutes, hours, days, weeks, months, or years. Each project is unique: it exists to produce a product or service that hasn't existed before. This is in contrast to operations (which a project may consist of), as operations are typically ongoing and repetitive. The purpose of operations is to keep the organization functioning, while the purpose of a project is to meet its goals and conclude. Therefore, operations are ongoing while projects are unique and temporary.

1.31.1.1.1 Definition of a project

There are many definitions of a project. Wysocki et al. (2003, p. 38), for example, define a project as "a sequence of unique, complex and connected activities having one goal or purpose that must be completed by a specific time, within budget, and according to specification". Central to this definition is a logical sequence of activities that must be completed within a specific time frame. The Project Management Institute (PMI) (2013a, p. 8) defines project management as "the application of knowledge, skills, tools, and techniques to project activities to meet the project requirements". This definition is supplemented by five Project Management Process Groups (PMPG) that describe the lifecycle of typical projects, and 10 knowledge areas in which project managers must be competent. The five PMPG are initiating processes, planning processes, executing processes, monitoring and controlling processes, and closing processes. The 10 knowledge areas, on the other hand, focus on management expertise in project integration management, project scope management, time management, cost management, quality management, human resources management, communications management, risk management, procurement management, and stakeholder management. Project management has also been defined in many other ways in the related literature. However, it is apparent that many authors have accepted the PMI proposition that project management is a special branch of management characterized by the application of management principles and best practices that seek to steer the initiation, planning, implementation, monitoring, and closing of projects toward their ultimate success. It is also apparent that many authors have adopted the PMI's approach of grouping all project management activities into five sequential phases or levels, commonly called the project management lifecycle (PMLC).


1.31.1.2 Project Characteristics

Projects have several characteristics:

- Projects are unique.
- Projects are temporary in nature and have a definite beginning and ending date.
- Projects are completed when the project goals are achieved or it is determined the project is no longer viable.

A successful project is one that meets or exceeds the expectations of the stakeholders.

1.31.1.2.1 Projects versus programs versus portfolios

Every organization that has multiple GIS users has de facto a GIS Program (Peters, 2008). If the users get their work done and are not aware of the business unit that allows them to do their work, then the program manager does her job well. If, on the other hand, every GIS project starts from scratch and the only institutional memory is buried in the heads of those who did other GIS projects before, then the tool that constitutes GIS is clearly not used to its highest potential. The PMI defines program management as "the application of knowledge, skills, tools, and techniques to a program to meet the program requirements and to obtain benefits and control not available by managing projects individually" (PMI, 2013b, p. 6). The scope of programs is hence beyond the sum of individual projects and includes training, operations, and maintenance activities. All this applies to GIS Programs as well. Two dimensions are useful to keep in mind when there is confusion about the differences between projects and programs:

- Uncertainty: well-managed projects generally have a low level of uncertainty associated with them. This starts with the project specification and improves as a project moves toward its goal. Programs, on the other hand, do not start out with a well-defined scope and require continuous adjustment. In extreme cases, a successful project may still be abandoned because its program context has changed.
- Change management: for projects, change management is usually in the form of fixes when the original outcomes seem to become unattainable. Program management, however, anticipates changes and aims to adapt the program to changing contexts.

The practice of GIS Programs would be categorized in management science as a portfolio, a higher-level management structure that has no temporal bounds and combines multiple programs to achieve an organization's strategic objectives (PMI, 2013c). In addition, portfolio projects do not need to be related to each other. Both of these characteristics (no temporal bounds and possible nonrelatedness of the projects) are characteristic of GIS Programs. It follows that GIS Programs combine the components of traditional programs and portfolios, namely strategic planning, governance, benefits management, and stakeholder engagement. GIS Programs, like portfolios, manage recurring activities (producing values) as well as projectized activities that are aimed at increasing value production capability. Projects come in a wide range of scopes: from single-purpose projects that serve one-time objectives, to departmental projects that typically are handled within a dedicated GIS department, to enterprise-wide projects that may be line-managed by a GIS department, which may then play a strategic role within an organization, and finally to consortial GIS projects, where the costs are shared by a large number of stakeholders. A project must have a well-defined goal with respect to the mission or mandate of an organization. In many instances, a project may be too complicated to be carried out as a single undertaking. Hence, it is necessary to divide it into several sub- or part-projects according to the prevailing organization structure (e.g., by departments or business functions) or geographical divisions (e.g., by regions, sales territories, or watersheds). Under such circumstances, each subproject is considered a separate but interdependent undertaking in its own right. All subprojects have their specific goals, but when added together these goals collectively constitute the specific goal of the parent project. Every project or subproject is generally subject to three constraints regardless of its objective and scale. As depicted in Fig. 1, these are:

- Time. Projects have definitive milestones that specify when particular components (e.g., progress reports, prototypes) must be delivered, and a completion date when the database system being developed will become fully functional and operational.
- Cost. Projects have cost or budgetary limits, which will impact the availability of human and technical resources.
- Specification. Deliverables of a project are required to meet a specific level of functionality and quality, both independently and when working as a whole.

Fig. 1 The triple constraint of scope, schedule, and budget (a triangle balancing scope/quality against schedule and cost).

It is important to understand that the constraints of time, cost, and specification are interdependent and, as a result, changes in one constraint always cause changes in the others. For example, a change in the specification will inevitably lead to changes in the time and cost requirements. Similarly, delays in the delivery of intermediate and final products inevitably necessitate an extension of the project time frame, which will, in turn, increase the cost and resource requirements of the project. Clearly, the dynamic nature of the interplay among the constraints requires that projects must be properly managed in order to succeed. The principles and practice of management are often deployed in the context of managing tangible entities such as people, physical and financial assets, and the business operations of an organization. These same principles, however, can be equally applied to the management of tangible resources and nontangible activities that are required to complete a project.

1.31.1.3 The Process of Project Management

In spite of a long history of project management research, the application of that knowledge is lacking in the GIS world as well as, more generally, in the realms of information technology and business administration. According to the CHAOS Report, published by the Standish Group (2016), which tracks over 50,000 projects around the world, some 20% of all projects fail and over 52% of all projects face major challenges, with only 22% considered to be successful. The vast majority of these challenges and failures are avoidable by making sure that the business needs are understood early on and by ensuring that project management techniques are applied and followed. Having good project management skills does not completely eliminate problems, risks, or surprises. The value of good project management is to have standard processes in place to deal with all contingencies. Project management is the application of knowledge, skills, tools, and techniques to project activities in order to meet the project requirements. It is a process that includes planning, putting the project plan into action, and measuring progress and performance.

1.31.2 Project Management and GIS Implementation Concepts

The design, development, and implementation of a GIS are complex tasks that should not be underestimated. They require leadership, adequate planning, and a project-based approach. This section, therefore, provides a generic organizational framework which is meant to support the design, development, and installation of a GIS.

1.31.2.1 The Triple Constraint of Scope, Schedule, and Budget

On any project, a number of project constraints compete for the project manager's attention: cost, scope, quality, risk, resources, and time. These six constraints are often reduced to the "triple constraint" of time, cost, and scope, illustrated in the form of a triangle to visualize the project work and see the relationship between scope/quality, schedule/time, and cost/resources. In this triangle, each side represents one of the constraints (or related constraints), wherein any change to one side causes a change in the other sides. The best projects have a perfectly balanced triangle. Maintaining this balance is difficult because projects are prone to change.

1.31.2.1.1 Cost

The definition of project success often includes completing the project within budget. Developing and controlling a project budget that will accomplish the project objectives is a critical project management skill. Although clients expect the project to be executed efficiently, cost pressures vary on projects. On some projects, the project completion or end date is the largest contributor to the project complexity. The development of a new drug to address a critical health issue, the production of a new product that will generate critical cash flow for a company, and the competitive advantage for a company to be first in the marketplace with a new technology are examples of projects with schedule pressures that override project costs. The accuracy of the project budget is related to the amount of information known by the project team. In the early stages of the project, the amount of information needed to develop a detailed budget is often missing. To address the lack of information, the project team develops different levels of project budget estimates. The conceptual estimate (or “ballpark estimate”) is developed with the least amount of knowledge. The major input into the conceptual estimate is expert knowledge or past experience. A project manager who has executed a similar project in the past can use those costs to estimate the costs of the current project. When more information is known, the project team can develop a rough order of magnitude (ROM) estimate. Additional information such as the approximate square feet of a building, the production capacity of a plant, and the approximate number of hours needed to develop a software program can provide a basis for providing a ROM estimate. After a project design is more complete, a detailed project estimate can be developed.


The cost of the project is tracked relative to the progress of the work and the estimate for accomplishing that work. Based on the cost estimate, the cost of the work performed is compared against the cost budgeted for that work. If the cost is significantly higher or lower, the project team explores reasons for the difference between expected costs and actual costs. Project costs may deviate from the budget because the prices in the marketplace were different from what was expected. Project costs may also deviate based on project performance. For example, a GIS manager estimated that a particular survey would take 750 labor hours, but 792 hours were actually expended. The project team captures the deviation between costs budgeted for work and the actual cost for work, revises the estimate as needed, and takes corrective action if the deviation appears to reflect a trend. The project manager is responsible for assuring that the project team develops cost estimates based on the best information available and revises those estimates as new or better information becomes available. The project manager is also responsible for tracking costs against the budget and conducting an analysis when project costs deviate significantly from the project estimate. The project manager then takes appropriate corrective action to ensure that project performance matches the revised project plan. More detail on aspects of project cost management can be found in section "Budget planning".
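A worked version of the survey example above amounts to a small calculation. The labor-hour figures come from the text; the hourly rate is a made-up assumption added purely for illustration.

```python
# Deviation analysis for the survey example above; the hourly rate is a
# hypothetical assumption, not a figure from the chapter.
estimated_hours, actual_hours = 750, 792
hourly_rate = 60.0  # assumed for illustration

deviation = actual_hours - estimated_hours
print(f"overrun: {deviation} h ({deviation / estimated_hours:.1%}), "
      f"~${deviation * hourly_rate:,.0f} at the assumed rate")
# A ~5.6% overrun on one task may be noise; a recurring pattern is a trend
# that warrants a revised estimate and corrective action.
```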

1.31.2.1.2 Scope

The scope is a document that defines the parameters of the project (the factors that define a system and determine its behavior), the work that is done within the boundaries of the project, and the work that is outside the project boundaries. The scope of work (SOW) is typically a written document that defines what work will be accomplished by the end of the project, that is, the deliverables of the project. The project scope defines what will be done, and the project execution plan defines how the work will be accomplished. No template works for all projects: some projects have a very detailed scope of work, and some have a short summary document. The quality of the scope is measured by the ability of the project manager and project stakeholders to develop and maintain a common understanding of what products or services the project will deliver. The size and detail of the project scope are related to the complexity profile of the project; a more complex project often requires a more detailed and comprehensive scope document. According to Burek (2011), the scope statement should include the following:

- Description of the scope.
- Product acceptance criteria.
- Project deliverables.
- Project exclusions.
- Project constraints.
- Project assumptions.

The scope document is the basis for agreement by all parties. A clear project scope document is also critical to managing change on a project. Since the project scope reflects what work will be accomplished on the project, any change in expectations that is not captured and documented creates opportunity for confusion. One of the most common trends in projects is the incremental expansion of the project scope, labeled "scope creep." Scope creep threatens the success of a project because the small increases in scope require additional resources that were not in the plan. Increasing the scope of the project is a common occurrence, and adjustments are made to the project budget and schedule to account for these changes; scope creep occurs when these changes are not recognized or not managed. The ability of a project manager to identify potential changes is often related to the quality of the scope documents. Virtually all GIS project scopes include background/context, the project goal, objectives, information categories, and the actual information products, as depicted in Fig. 2.

Events do occur that require the scope of the project to change. Changes in the marketplace may require a change in a product design or the timing of the product delivery. Changes in the client's management team or the financial health of the client may also result in changes in the project scope. Changes in the project schedule, budget, or product quality will have an effect on the project plan. Generally, the later in the project the change occurs, the greater the increase in the project costs. Establishing a change management system for the project that captures changes to the project scope, and assures that these changes are authorized by the appropriate level of management in the client's organization, is the responsibility of the project manager. The project manager also analyzes the cost and schedule impact of these changes and adjusts the project plan to reflect the changes authorized by the client. Changes to the scope can cause costs to increase or decrease. More detail on aspects of project scope management can be found in section "Defining the project scope".

1.31.2.1.3 Quality

Quality is a combination of the standards and criteria to which the project's products must be delivered for them to perform effectively. The product must perform to provide the functionality expected, solve the identified problem, and deliver the benefit and value expected. It must also meet other performance requirements, or service levels, such as availability, reliability, and maintainability, and have acceptable finish and polish. Quality in a project is controlled through quality assurance (QA), which is the process of evaluating overall project performance on a regular basis to provide confidence that the project will satisfy the relevant quality standards.

Project quality focuses on the end product or service deliverables that reflect the purpose of the project. The project manager is responsible for developing a project execution approach that provides for a clear understanding of the expected project deliverables and the quality specifications. Developing a good understanding of the project deliverables through documenting specifications and expectations is critical to a good quality plan. The processes for ensuring that the specifications and expectations are met are integrated into the project execution plan.

Fig. 2 Components of a GIS project scope document: goal (research question); background (client/stakeholder aspects, literature review); objectives ("need to know" questions); information categories (the nouns of the "need to know" questions); and information products (which contain the information categories and determine how the information is presented, as text, tables, maps, or graphs).

Just as the project budget and completion dates may change over the life of a project, the project specifications may also change. Changes in quality specifications are typically managed in the same process as cost or schedule changes. The impact of the changes is analyzed for effect on cost and schedule, and with appropriate approvals, changes are made to the project execution plan (see also section "Project execution and control").

Although any of the quality management techniques designed to make incremental improvement to work processes can be applied to a project work process, the character of a project (unique and relatively short in duration) makes small improvements less attractive. Rework on projects, as in manufacturing operations, increases the cost of the product or service and often increases the time needed to complete the reworked activities. Because of the duration constraints of a project, the development of the appropriate skills, materials, and work processes early in the project is critical to project success. In more complex projects, time is allocated to developing a plan to understand and develop the appropriate levels of skills and work processes.

Project management organizations that execute several similar types of projects may find process improvement tools useful in identifying and improving the baseline processes used on their projects. Process improvement tools may also be helpful in identifying cost and schedule improvement opportunities. Opportunities for improvement must be found quickly to influence project performance. The investment in time and resources to find improvements is greatest during the early stages of the project, when the project is in the planning stages. During later project stages, as pressures to meet project schedule goals increase, the culture of the project is less conducive to making changes in work processes. More detail on aspects of project quality can be found in section "Monitoring and control".

1.31.2.1.4 Risk

Risk is defined by potential external events that will have a negative impact on the project if they occur. Risk refers to the combination of the probability that the event will occur and the impact on the project if it does. If the combination of the probability of occurrence and the impact on the project is too high, the event is identified as a potential risk, which in turn should prompt the project manager to develop a proactive plan to manage it. More detail on aspects of project risk can be found in section "Project risk and opportunity analysis".
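A minimal sketch of this probability-times-impact scoring is shown below; the risks, probabilities, impacts, and threshold are all hypothetical and serve only to illustrate how a risk register might flag items that need a proactive plan.

```python
# Score risks as expected exposure = probability x impact (illustrative).
risks = [
    # (description, probability 0-1, impact in dollars if it occurs)
    ("Key GIS analyst leaves mid-project",   0.2, 40_000),
    ("Parcel data delivered late by vendor", 0.5, 15_000),
    ("Server hardware fails during rollout", 0.1,  8_000),
]

THRESHOLD = 5_000   # assumed: exposure above this requires a response plan

for description, probability, impact in risks:
    exposure = probability * impact          # expected cost of the event
    action = "develop response plan" if exposure > THRESHOLD else "monitor"
    print(f"{description}: ${exposure:,.0f} -> {action}")
```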

1.31.2.1.5 Resources

Resources are required to carry out the project tasks. They can be people, equipment, facilities, funding, or anything else required for the completion of a project activity. More detail on aspects of project resources can be found in section "Resource planning".

1.31.2.1.6 Time

Time refers to the duration available to complete the project. Time is the most frequently overlooked constraint in developing projects, which is reflected in missed deadlines and incomplete deliverables. Proper control of the schedule requires the careful identification of the tasks to be performed, accurate estimates of their durations, the sequence in which they are going to be done, and how people and other resources are to be allocated. Any schedule should take into account vacations and holidays.


The definition of project success often includes completing the project on time. The development and management of a project schedule that will complete the project on time is a primary responsibility of the project manager, and completing the project on time requires the development of a realistic plan and the effective management of that plan. On smaller projects, project managers may lead the development of the project plan and build a schedule to meet that plan. On larger and more complex projects, a project controls team that focuses on both cost and schedule planning and controlling functions will assist the project management team in developing the plan and tracking progress against it.

To develop the project schedule, the project team analyzes the project scope, contract, and other information that helps the team define the project deliverables. Based on this information, the project team develops a milestone schedule. The milestone schedule establishes key dates throughout the life of a project that must be met for the project to finish on time. The key dates are often established to meet contractual obligations or at intervals that will reflect appropriate progress of the project. For less complex projects, a milestone schedule may be sufficient for tracking the progress of the project. For more complex projects, a more detailed schedule is required.

To develop a more detailed schedule, the project team first develops a work breakdown structure (WBS), a description of tasks arranged in layers of detail. Although the project scope is the primary document for developing the WBS, the WBS incorporates all project deliverables and reflects any documents or information that clarifies them. From the WBS, a project plan is developed. The project plan lists the activities that are needed to accomplish the work identified in the WBS; the more detailed the WBS, the more activities are identified to accomplish the work. After the project team identifies the activities, the team sequences them according to the order in which they are to be accomplished. An outcome of this work process is the project logic diagram, which represents the logical sequence of the activities needed to complete the project.

The next step in the planning process is to estimate the time it will take to accomplish each activity, i.e., the activity duration. Some activities must be done sequentially, and some activities can be done concurrently. The planning process creates a project schedule by scheduling activities in a way that effectively and efficiently uses project resources and completes the project in the shortest time. On larger projects, several paths are created that represent a sequence of activities from the beginning to the end of the project; the longest path to the completion of the project is the critical path. If the critical path takes less time than is allowed by the client to complete the project, the project has a positive total float, or project slack. If the client's project completion date precedes the calculated critical path end date, the project has negative float. Understanding and managing activities on the critical path is an important project management skill. To successfully manage a project, the project manager must also know how to accelerate a schedule to compensate for unanticipated events that delay critical activities.
Compressing, or "crashing," the schedule describes the techniques used to shorten the project schedule. During the life of the project, scheduling conflicts often occur, and the project manager is responsible for resolving these conflicts while maintaining project quality and meeting cost goals. More detail on aspects of project scheduling can be found in sections "Project schedule planning" and "Scheduling tools".
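The sketch below illustrates the float calculation described above on a tiny, invented activity network: a forward pass finds the longest (critical) path, and the total float is the difference between the client's deadline and that path length.

```python
# Forward pass through a small activity network (all values hypothetical).
durations = {"A": 3, "B": 5, "C": 2, "D": 4, "E": 3}          # working days
predecessors = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["D"]}

earliest_finish = {}

def finish(task):
    """Earliest finish = max earliest finish of predecessors + own duration."""
    if task not in earliest_finish:
        start = max((finish(p) for p in predecessors[task]), default=0)
        earliest_finish[task] = start + durations[task]
    return earliest_finish[task]

critical_length = max(finish(t) for t in durations)   # longest path = critical path
client_deadline = 18                                  # assumed contract date (day)
total_float = client_deadline - critical_length       # negative float = in trouble
print(f"Critical path length: {critical_length} days, total float: {total_float}")
```

With these invented numbers the critical path runs A-B-D-E (15 days), leaving 3 days of positive float; shortening a non-critical activity such as C would not accelerate the project.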

1.31.2.2 GIS Project Team Roles and Responsibilities

Staffing the project with the right skills, at the right place, and at the right time is an important responsibility of the project management team. The project usually has two types of team members: functional managers and process managers. The functional managers and team focus on the technology of the project. On a construction project, the functional managers would include the engineering manager and construction superintendents; on a training project, the functional managers would include the professional trainers; on an information technology project, the software development managers would be the functional managers. The project management team also includes project process managers. The project controls team would include process managers who have expertise in estimating, cost tracking, planning, and scheduling. The project manager needs both functional and process expertise to plan and execute a successful project.

Because projects are temporary, the staffing plan for a project typically reflects both the long-term goals of skilled team members needed for the project and the short-term commitment that reflects the nature of the project. Exact start and end dates for team members are often negotiated to best meet the needs of individuals and the project. The staffing plan is also determined by the different phases of the project: team members needed in the early or conceptual phases are often not needed during the later or closeout phases, and team members needed during the implementation phase are often not needed during the conceptual or closeout phases. Each phase has its staffing requirements, and the staffing of a complex project requires detailed planning to have the right skills, at the right place, at the right time.

Typically, a core project management team is dedicated to the project from start-up to closeout. This core team would include the project manager, project controls, project procurement, and key members of functional management or experts in the technology of the project. Although longer projects may experience more team turnover than shorter projects, it is important in all projects to have team members who can provide continuity through the project phases.

Project team members can be assigned to the project from a number of different sources. The organization that charters the project can assign talented managers and staff from functional units within the organization, contract with individuals or agencies to staff positions on the project, temporarily hire staff for the project, or use any combination of these staffing options. This staffing approach allows the project manager to create the project organizational culture.


Some project cultures are more structured and detail oriented, and some are less structured, with less formal roles and communication requirements. The type of culture the project manager creates depends greatly on the type of project.

1.31.2.3 Communications

Completing a complex project successfully requires teamwork, and teamwork requires good communication among team members. If those team members work in the same building, they can arrange regular meetings, simply stop by each other's office space to get a quick answer, or even discuss the project informally at other office functions. Increasingly, however, team members hail from widely separated locations, and face-to-face meetings are replaced by electronic methods of communication, resulting in so-called virtual teams that may work synchronously and asynchronously. Communications technologies require a variety of compatible devices, software, and service providers, and communication with a global virtual team can involve many different time zones. Establishing effective communications, therefore, requires a communications plan.

1.31.2.4 Project Management and System Development Lifecycles

The Project Management Lifecycle (PMLC) defines how a project is managed effectively and efficiently from its conceptualization through implementation to its operationalization. A further term, the project lifecycle (PLC), is also used to describe this process. The terms PMLC and PLC are always used in conjunction with project management, but they actually refer to two relatively distinct sets of concepts and processes. The purpose of a PLC is to describe the activities that must be completed in order to create a product or a service; the PLC varies from one project to another because of the unique nature of each project. The Systems Development Lifecycle (SDLC) and Database Development Lifecycle (DDLC) are examples of the PLC for GIS implementation projects.

The PLC focuses on the tasks that are necessary for a project. In contrast, the focus of the PMLC is on how these tasks can be managed. In this regard, the PMLC is more a conceptual framework for the systematic application of managerial principles and best practices than the actual steps of building a product or service. As such, the PMLC remains the same for all projects regardless of the PLC being employed. This means that while the PLC of a GIS project is markedly different from that of, for example, a project to build a new highway, the PMLC for both projects is essentially the same in terms of the project management cycle or phases. Throughout the course of every project, the PLC and PMLC work in conjunction with one another, and it is the project manager's responsibility to ensure that all PLC activities use the conceptual framework of the PMLC.

The Project Management Institute (2013a) groups project activities generally into the five phases depicted in Fig. 3, adapted from the PMBOK Guide (Project Management Institute, 2013a). The five PMLC phases are essentially sequential in nature. However, there is a feedback loop from the monitoring and control phase to the planning phase, as noted in Fig. 3. This loop can be followed as many times as required until the project manager is satisfied that the project is sufficiently complete for it to be closed out and for the evaluation process to commence. The duration of each phase and the amount of overlap between sequential phases may vary considerably depending on the nature of the project and the complexity of the activities in, and hence the effort and resources required by, individual phases.

The full PMLC starts with the initiation phase, which aims to scope the project. The deliverable of the initiation phase is a document called the project proposal or business case that describes what problem or opportunity will be addressed by the project, the project goal and objectives, the costs incurred and the resulting benefits, how success will be measured, and potential risks and obstacles that may be encountered. Once the project proposal has received approval from the management of the organization, the planning phase commences. The major deliverables and the participating work groups are identified, and the project team begins to take shape. While most of the activities in this phase are undertaken by one or a few individuals who will form the core of the project team, it is common practice to hold a formal planning session for all stakeholders who will affect or be affected by the project.
The deliverable of the planning activities is a detailed project plan that provides a description of each project activity, the resources required to complete the activities, the project schedule, the milestones for the delivery of intermediate results, and the dates for acceptance tests and final delivery of the project products.

The project execution phase, also commonly called the project implementation phase, is probably the most labor- and resource-intensive phase of the PMLC. This phase usually starts with the organization of the project team. Members of a typical project team include people transferred internally from other departments within the organization, new staff hired specifically for the project, and contract staff from external consulting firms. The actual people assigned to work on each activity of the project are identified, and detailed descriptions of the activities are developed, reviewed, and signed off. This phase comprises the actual design, building, and testing of the GIS to be delivered by the project.

The monitoring and control phase begins as soon as the implementation activities have commenced. It runs in parallel with the project execution phase as a means of quality assurance and quality control. Generally speaking, change management is the most critical component of this particular phase, and very clear protocols and procedures must be established to ensure that when conflicts arise (between users and developers as well as between different users), the problem(s) can be addressed expediently and effectively. As change requests always have some impact on the initial time, cost, and resource allocation, adjustment to the original project plan is necessary, and the feedback loop is then activated.


Fig. 3 The five phases of the project management lifecycle: project initiation (identify problem/opportunity; establish project goal; define project objectives; perform cost/benefit analysis; determine success criteria; list assumptions, risks, and obstacles), followed by managerial approval; project planning (identify project activities; estimate resource requirements; construct workflow; prepare project proposal), followed by managerial approval; project execution (recruit/organize project team; establish team operating rules; assemble project resources; schedule/execute work plan; document work progress); monitoring and control, running in parallel with execution (monitor project progress against plan; establish reporting protocol/procedures; install change management procedures; establish problem resolution mechanism; revise project plan if necessary), with a feedback loop to planning activated as part of the change management process; and, in preparation for completion of the project, close-out and evaluation (conduct acceptance test; establish roll-out plan/schedule; complete project documentation; conduct post-implementation audit; complete final project report).

Preparation for the final phase of the project usually starts well ahead of the conclusion of the execution phase. In this phase, the closeout activities include the installation and testing of the deliverables, a post-implementation audit or evaluation, and the compilation of the final project report summarizing all project progress reports, acceptance test results, and a brief description of the lessons learned. The final project report may also include recommendations to enhance and refine the GIS in response to anticipated changing user needs and advancements in the technological environment.

1.31.2.4.1 Project initiation

The start-up of a project is similar to the start-up of a new organization. The project leader develops the project infrastructure used to design and execute the project, and the project management team must develop alignment among the major stakeholders during the early, or definition, phases of the project. Logically, project initiation is the first phase of the PMLC. In practice, however, project managers actually start at the end and work backward mentally: the discussion of project initiation starts by defining the required outputs upon completion of the project, then makes preliminary estimates of the resources required, and finally develops the strategies required to construct and deliver the project output on time, within budget, and according to specification.

All projects are created for a reason. Someone identifies a need or an opportunity and devises a project to address that need, and how well the project ultimately addresses that need defines the project's success or failure. Often, the pressure to get results encourages people to go right into identifying possible solutions without fully understanding the need or what the project is trying to accomplish. This strategy can create a lot of immediate activity, but it also creates significant chances for waste and mistakes if the wrong need is addressed. One of the best ways to gain approval for a project is to clearly identify the project's objectives and describe the need or opportunity for which the project will provide a solution.

Activities in the project initiation phase are conducted mostly by the project manager, possibly with the aid of one or more systems analysts and application specialists who are identified as potential members of the future project team.


The very first task of project initiation is to define the goal of the project, including the scope of the project in terms of what is included and what is excluded. Furthermore, the goal statements identify the constraints of time, cost, and specification. The plan for developing and tracking the detailed schedule, the procurement plan, and the plan for building the budget and estimating and tracking costs are developed during start-up. The plans for information technology, communication, and tracking client satisfaction are also all developed during the start-up phase of the project. Flowcharts, diagrams, and responsibility matrices are tools to capture the work processes associated with executing the project plan. The first draft of the project procedures manual captures the historic and institutional knowledge that team members bring to the project. The development and review of these procedures and work processes contribute to the development of the organizational structure of the project.

1.31.2.4.1.1 Writing a (request for) proposal
GIS projects can be implemented by internal staff or by external consultants, either in total or in part (a process commonly referred to as outsourcing). There are three common approaches to outsourcing, namely sole source, invitation to tender (ITT), and request for proposal (RFP). As the name implies, outsourcing by sole source means that the entire contract for a project is awarded to a single contractor without going through a competitive procurement process. When compared with the other two approaches, sole sourcing is less flexible and not as rigorous a procedure, although it takes much less time to pick a contractor from a list of contractors of record and start the project. For public organizations, sole sourcing can be seen as a form of favoritism, and the decision may be challenged by consultants or companies who feel that they have not been fairly treated. Hence, sole sourcing should be avoided except in cases where the chosen consultant is the only possible candidate who can supply the material or service required.

An ITT and an RFP are competitive processes with different intents and purposes. An ITT is used when the organization is absolutely clear what it wants and how it wants things to be done. Suppliers or vendors are openly invited to bid for the contract on the basis of factors such as price and ability to meet the specified requirements. An ITT is most suitable in situations where the supply of materials (e.g., computers and peripherals) and the type of services involved are relatively well defined (e.g., facility maintenance and security services). An RFP, on the other hand, is used when the organization knows generally or exactly what it wants but prefers to seek solutions from the consulting community. In GIS implementation projects, an RFP is a more prevalent approach than an ITT for soliciting external consulting services.

An RFP is a relatively complex procurement process that demands considerable effort and expenditure on the part of both the requesting organization and the responding consultants. The complexity of an RFP depends on its objective and scope, which may cover all or a substantial part of the project execution activities. There is no standard procedure for conducting an RFP, but the workflow of a typical RFP exercise starts with the approval of the recommendation in a project proposal to elicit external work. The project team then prepares the RFP information package for approval.
Since an RFP always ends up with a contract between the organization and the selected consultant, legal advice must be sought in advance so that the wording and general content can be reviewed from a legal standpoint, and a sample of the contract to be signed can be prepared. Any changes that are made to the RFP or the contract should, of course, be re-scrutinized by legal counsel before either document is issued and signed off on.

The extent to which individual components are included in the final RFP document, and the steps followed in compiling it, are to some extent a function of the scale of the work being called for. Some tasks are relatively straightforward and have minimal risk associated with them. Other tasks, and indeed overall projects, carry considerable risks, especially where subcontractors may be involved or where the actual configuration of outputs is not clear at the time of developing the RFP document. In the latter case, care should be taken not to rush the preparation of the RFP document, as errors or omissions made at the stage of the call for work will likely be compounded in the work that is produced. In the former case, the lines of responsibility for the completion of any subcontracted work must be made clear in the RFP document, and all legal issues concerning subcontracting must be accounted for.

Writing a well-conceived goal statement requires considerable brainstorming and discussion with the project sponsor and representatives of potential users of the resulting GIS. It is sometimes also necessary to research the possible relationships between the project and legislated responsibilities and regulatory obligations of the organization, as well as safety and quality standards of the GIS that will result from the project. Goal statements should be SMART, i.e.:

- Specific, so that any individual with basic knowledge of the project can understand the goal(s). The statement should be concise and clear, to avoid ambiguity or project creep within the context of the project's activities.
- Measurable, to determine clearly whether particular goal statements were achieved.
- Agreed upon, by the sponsor and representatives of potential stakeholders.
- Realistic or achievable, measured in terms of affordability and fiscal sustainability, acknowledging technical and political constraints.
- Time-framed, i.e., with a detailed schedule (see section "Project execution and control").

A project proposal, also referred to as a project overview statement or project definition form, is a document that summarizes the findings of the project initiation phase for approval by senior management of the organization. A Project Business Case (PBC) form can be developed for this purpose using a template. This approach is often preferred because a template is easy to create, edit, read, and standardize so that all projects within the same organization can be presented for management consideration in the same format.


Fig. 4 Project planning flow chart, covering a requirements study; application development (DB conceptual design, DB logical/physical design); data acquisition (search for data resources, data loading); HW/SW acquisition (research on HW/SW advances, HW/SW installation); systems testing/integration; user training; and database operationalization and roll-out.

Alternatively, a Proposed Solution (PS) form can be used for this purpose. The PBC form contains information that aims to help senior managers understand the justification for the project. The information items in the form should be written in plain and concise business language. The PS form, on the other hand, provides a clear description of the proposed solutions. If alternative solutions are proposed to address a particular problem, a clear explanation of the pros and cons of the different approaches, as well as the reasons for picking the final choice, must be given.

1.31.2.4.2 Project planning

The GIS project set-up entails defining the management structure within which the GIS project will be executed, as depicted in Fig. 4. This includes:

1.31.2.4.2.1 Project steering group
The GIS project steering group represents the interest of the stakeholders for whom the system is designed. It should be chaired by the project sponsor and should include, in particular, those agents who are funding the system and those who provide the pertinent (local) data. The role of the GIS project steering group is to provide guidance to the GIS project team.

1.31.2.4.2.2 GIS project team
The GIS project team is in charge of executing the design, development, and installation activities under the guidance of the GIS project steering group. The GIS steering group selects the project team and receives its reports. The GIS project team includes at least:

1.31.2.4.2.2.1 Project manager
A project manager is in charge of the overall coordination of the GIS design, development, and installation. The project manager is the key person in the project team. In many large organizations, such as government agencies or engineering consulting firms, there are professional project managers on the permanent staff whose job is to lead corporate projects. However, it is commonplace for organizations to appoint project managers from existing unit managers or senior IT personnel who have the training and experience to assume such a job function. All project managers are expected to be strategic in their thinking, tactful in dealing with people, and knowledgeable in the business area served by the project.

The project manager serves as the chief executive officer of a project. He or she is the technical advisor to the project sponsor, mentor and supervisor of members of the project team, and representative of the project when dealing with internal and external stakeholders. The project manager usually has to work closely with members of the IT department to ensure adherence to corporate standards, protocols, and resource-sharing policies.

It is the responsibility of the project manager to recruit members for the project team and organize them into a coherent working group. Team members can be co-opted from internal staff, or hired externally on a contractual or permanent basis if internal expertise is not available. The number of members of a project team varies according to the nature of the particular project. However, it is essential to recruit and choose team members with the understanding that they collectively have all of the skills required to complete the project successfully. For large-scale GIS implementation projects, it is helpful to divide the team into small working groups, each headed by an experienced technical lead, covering areas such as map data conversion and acquisition, database design and system development, quality assurance and control, and end-user training. This makes the team more manageable and creates a clear sense of accountability and responsibility.


Experience has shown that while recruiting is seldom a problem, organizing members into a coherent high-performance team can be problematic. It is a real challenge for the project manager to use his or her people skills to ensure the commitment of the members to the project (i.e., that their other competing duties or tasks will not negatively impact the project). The project manager must also keep motivating team members throughout the course of the project by mentoring them, practicing open communication, giving mutual understanding and respect, recognizing achievements, and using disciplinary actions where and if necessary.

1.31.2.4.2.2.2 Other team members
As such a project team has to fit into the larger organizational context, an organizational chart as depicted in Fig. 5 is helpful for all stakeholders. The team typically also includes:

- Application specialists.
- A database architect.
- Software programmer(s).
- Independent validation team members.

1.31.2.4.2.3 Defining the project scope
The project scope is prepared by the GIS project team and describes all tasks to be carried out in relation to the design, development, and installation of the GIS. The scope has to be approved by the GIS project steering group during a kick-off meeting between the project team and the project steering group. See also section "Scope" for more on the project scope.

1.31.2.4.2.4 Sequencing of project tasks
The aim of the project work plan is to determine the most efficient order for the tasks by maximizing the use of resources. This is usually accomplished by having more than one task in progress at the same time and by preventing the delay of any given task from holding up the start of another task. The concept of "precedence relationships" between individual tasks is the governing principle for project scheduling. This concept identifies tasks as either "dependent" or "independent." Dependent tasks are those that cannot proceed until another task is completed, whereas independent tasks are those that can be carried out at any time during the course of project execution. As noted earlier, an effective way of project scheduling is to consider the project from the end point and work backward to the starting point. In this way, it is relatively easy to identify those tasks that must be completed prior to the commencement of their respective successors. A complex project is made manageable by first breaking it down into individual components in a hierarchical structure, introduced as the work breakdown structure or WBS in section "Time".

A commonly used tool to list all the phases, activities, and tasks is the Gantt chart, which also illustrates estimates of how long each should take and the dates on which tasks should begin and end. A PERT chart displays the tasks in a project along with the dependencies between them. Using a PERT chart is a good way to define and display the dependency relationships that exist between tasks. The PERT chart is used to document and track the critical path of the project, i.e., which essential tasks must get done, in exactly what order, to successfully complete the project. Section "Scheduling tools" provides further details on such scheduling tools.

Another essential component of the project plan is the identification of the resources required for each task. The project team should first identify the type of resources needed and then estimate the cost for that particular task. Cost can be estimated in one of two ways, namely a fixed cost (e.g., $5000 per year for a multiuser software license) or a variable cost (e.g., 50 hours of initial Java programming and 20 hours or less of additional programming work per year for maintenance and update).

Fig. 5 Organigram of a GIS project and its setting in a larger organizational context, relating corporate management, the project sponsor, clients/end users and their representatives, and the corporate IT department to the project manager and the project team (application specialists, facility planner(s), GIS specialists, and the corporate DB administrator).


The project team should carefully consider the matching of available resources with estimated requirements. If, for example, a particular task requires skills that are not found among project team members, then arrangements must be made for outside assistance. This may necessitate amendments to estimated time and costs, and the project plan must be adjusted accordingly.

1.31.2.4.2.5 Project requirements
1.31.2.4.2.5.1 Requirement engineering
The first step after defining the scope of the project is to conduct a needs assessment. During this task, the GIS project team reviews and details the user requirements provided in the preliminary analysis and derives a consolidated set of product and system requirements, including functional needs, data needs, and processing needs. While these all flow into the technical specifications of the next section, it is important to integrate existing user practices. Because the incorporation of a new system may alter or conflict with existing systems and procedures at the level of user organizations, the linkage between the future GIS and those existing systems and procedures must be adequately examined. Recommendations to reengineer these existing systems and procedures are formulated on the basis of a detailed analysis of the current working procedures. The results of this task are reported in a document called the Requirements Baseline (RB).

1.31.2.4.2.5.2 Technical specifications
In response to the consolidated user requirements, the GIS project team provides a technical answer to the requirements baseline with a detailed and complete specification of the products and information system expected. The results of this activity are reported in a Technical Specifications document and an associated Inventory of Existing Datasets, Design Justification Report (DJR), and Data Model Report. The DJR assembles the analyses performed by the GIS project team on all implementation choices for the proposed GIS. It describes, in particular, all trade-offs, design choice justifications, feasibility analyses, make-or-buy decisions, and supporting technical assessments done during the software development. Ideally, the data model report is developed in accordance with the requirements of the ISO 19000 series. This includes, in particular, the adoption of the ISO 19104 terminology, the development of the data model using the Unified Modeling Language (UML), and the adoption of the ISO 19115 requirements for metadata modeling.

1.31.2.4.2.5.3 System qualification planning
This planning process is a response to the Requirements Baseline and includes a "scientifically sound" validation protocol for all products to be generated by the information system, including the description of all ground and ancillary data available. Problems, such as a lack of sufficient validation data, need to be investigated, their impact assessed, and solutions identified. The results of this task are recorded in a System Qualification Plan (SQP).

1.31.2.4.2.6 Comparing options using a weighted decision matrix
Sometimes there are multiple options to choose from when determining requirements and deciding which project to work on. One of the preferred tools to select the best option is a weighted decision matrix, which is a simple form of linear programming. A basic decision matrix consists of establishing a set of criteria for options that are scored and summed to obtain a total score that can then be ranked. Importantly, the basic matrix is unweighted, which allows a quick selection process.
A weighted decision matrix operates in the same way as the basic decision matrix but introduces the concept of weighting the criteria in order of importance. The resulting scores better reflect the importance to the decision maker of the criteria involved: the more important a criterion, the higher the weighting it receives. Each of the potential options is scored and then multiplied by the weighting given to each of the criteria to produce a result. The advantage of the weighted decision matrix is that subjective opinions about one alternative versus another can be made more objective. Another advantage of this method is that sensitivity studies can be performed.

1.31.2.4.2.7 Financial considerations
In many new project endeavors, it is important to find out if the project is financially feasible. Measures for such determination include the net present value (NPV), the return on investment (ROI), and payback analysis.

1.31.2.4.2.7.1 Net present value
A dollar earned today is worth more than a dollar earned one or more years from now. The NPV of a time series of cash flows, both incoming and outgoing, is defined as the sum of the present values (PVs) of the individual cash flows of the same entity. In the case when all future cash flows are incoming and the only outflow of cash is the purchase price, the NPV is simply the PV of the future cash flows minus the purchase price (which is its own PV). NPV is a standard method for using the time value of money to appraise long-term projects. Used for capital budgeting and widely applied throughout economics, finance, and accounting, it measures the excess or shortfall of cash flows, in present value terms, once financing charges are met.

NPV can be described as the "difference amount" between the sums of discounted cash inflows and cash outflows. It compares the present value of money today to the present value of money in the future, taking inflation and returns into account. The NPV of a sequence of cash flows takes as input the cash flows and a discount rate or discount curve and outputs a price. Each cash inflow/outflow is discounted back to its present value and these are then summed, so that NPV is the sum of all terms R_t / (1 + i)^t, where R_t is the net cash flow at time t and i is the discount rate. NPV is an indicator of how much value an investment or project adds to the firm. For a particular project, a positive NPV indicates a net discounted cash inflow over the period considered, while a negative NPV indicates a net discounted cash outflow. Sometimes, risky projects with a positive NPV could be accepted. This does not necessarily mean that they should be undertaken, since NPV at the cost of capital may not account for opportunity cost (i.e., comparison with other available investments).

1.31.2.4.2.7.2 Return on investment
Return on investment is a performance measure used to evaluate the efficiency of an investment or to compare the efficiency of a number of different investments. It is one way of considering profits in relation to capital invested, calculated by subtracting the project's costs from its benefits and then dividing by the costs. Return on investment studies of GIS projects are a relatively new field of academic research; the URISA Journal devoted an issue of its 2015 volume to this topic (URISA Journal, 2015).

1.31.2.4.2.7.3 Payback analysis
Payback analysis determines the amount of time it will take for a project to recoup its investment: the point at which the benefits start to outweigh the costs. The best way to see that is by charting the cumulative benefits and costs.
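The following Python sketch shows a weighted decision matrix of the kind described above, here applied to three hypothetical GIS software options; the criteria, weights, and scores are invented for illustration.

```python
# Weighted decision matrix (illustrative values throughout).
weights = {"functionality": 0.40, "cost": 0.30,
           "vendor support": 0.20, "ease of use": 0.10}

# Raw scores per option on a 1-10 scale for each criterion.
options = {
    "Option A": {"functionality": 9, "cost": 4, "vendor support": 8, "ease of use": 6},
    "Option B": {"functionality": 6, "cost": 9, "vendor support": 6, "ease of use": 8},
    "Option C": {"functionality": 7, "cost": 7, "vendor support": 7, "ease of use": 7},
}

def weighted_score(scores):
    # Multiply each raw score by its criterion weight and sum the results.
    return sum(scores[criterion] * w for criterion, w in weights.items())

for name, scores in sorted(options.items(), key=lambda kv: -weighted_score(kv[1])):
    print(f"{name}: {weighted_score(scores):.2f}")
```

A sensitivity study then amounts to re-running the ranking with perturbed weights and checking whether the preferred option changes; with the invented numbers above, the top two options are separated by only 0.1 points, so the choice is sensitive to the weighting.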

1.31.2.4.3 SCRUM development overview

"Scrum" is a formal project management/product development methodology and part of agile project management. Scrum is a term from rugby (scrimmage) that means a way of restarting a game; it is like restarting the project effort every X weeks. It is based on the idea that it is impossible to plan the whole project up front; therefore, a good program manager starts by building on empirical data and then re-plans and iterates from there.

Scrum uses sequential sprints for development. Sprints are like small project phases (ideally 2 to 4 weeks). The idea is to take one day to plan for what can be done now, then develop what was planned for, and demonstrate it at the end of the sprint. Scrum uses a short daily meeting of the development team to check what was done yesterday, what is planned for the next day, and what, if anything, is impeding the team members from accomplishing what they have committed to. At the end of the sprint, what has been demonstrated can be tested, and the next sprint cycle starts.

1.31.2.4.3.1 SCRUM roles
Scrum methodology defines several major roles:

- Product owner: essentially the business owner of the project, who knows the industry, the market, the customers, and the business goals of the project. The product owner must be intimately involved with the Scrum process, especially the planning and the demonstration parts of the sprint.
- Scrum Master: somewhat like a project manager, but not exactly. The Scrum Master's duties are essentially to remove barriers that impede the progress of the development team, teach the product owner how to maximize return on investment (ROI) in terms of development effort, facilitate creativity and empowerment of the team, improve the productivity of the team, improve engineering practices and tools, run daily standup meetings, track progress, and ensure the health of the team.
- Development team: a self-organizing (light-touch leadership), empowered group; they participate in planning and estimating for each sprint, do the development, and demonstrate the results at the end of the sprint. The development team can be broken into "teamlets" that "swarm" on user stories, which are created in the sprint planning session.

Planning meetings for each sprint require participation by the product owner, the Scrum Master, and the development team. In the planning meeting, they set the goals for the upcoming sprint and select a subset of the product backlog to work on. The development team decomposes these items into tasks and estimates them. The development team and product owner then conduct final negotiations to determine the backlog for the following sprint. The Scrum methodology has its own set of metrics to help with future planning and tracking of progress.
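As a rough illustration of sprint planning, the sketch below fills a sprint with the highest-priority backlog items that fit the team's capacity; the story names, point estimates, and velocity-style capacity model are assumptions, not part of the Scrum definition itself.

```python
from dataclasses import dataclass

@dataclass
class Story:
    name: str
    points: int     # development team's effort estimate
    priority: int   # product owner's ranking (1 = highest)

def plan_sprint(backlog, capacity):
    """Select the highest-priority stories that fit within sprint capacity."""
    selected, remaining = [], capacity
    for story in sorted(backlog, key=lambda s: s.priority):
        if story.points <= remaining:
            selected.append(story)
            remaining -= story.points
    return selected

backlog = [Story("Map viewer", 5, 1), Story("Geocoder", 8, 2),
           Story("Print layout", 3, 3), Story("User audit log", 5, 4)]
for story in plan_sprint(backlog, capacity=13):
    print(story.name)
```

In practice the development team negotiates this selection with the product owner rather than computing it mechanically; the code only makes the capacity constraint explicit.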

1.31.2.4.4 System prototyping

The objective of this phase is to prototype the GIS project and to demonstrate preliminary compliance with the consolidated user requirements by executing a representative set of the products to be routinely generated by the future system. The system prototype does not feature the entire range of requirements identified in the requirement consolidation, but it should be complete enough to allow a proper demonstration of the products to the GIS project steering group and to justify its authorization to proceed.

1.31.2.4.4.1 Software prototype design
During this task, the GIS project team documents all architectural software elements of the GIS following the instructions of the Design Justification Report (DJR) and the Data Model Report (DMR). Moreover, this design task also lists and details all verification test procedures for the validation of the different modules. The results of this task are reported in a Design Definition Report (DDR), a Data Model Implementation Report (DIR), and a Verification Test Procedures (VTP) document.

1.31.2.4.4.2 Software prototype development
This task encompasses the actual coding of the different software modules identified and their preliminary integration into a GIS software prototype. The output of this task is the GIS software prototype.


1.31.2.4.4.3 Sample data acquisition
In order to test the GIS prototype and to check its ability to generate products that conform to the requirements baseline and technical specifications, sample data need to be acquired. These data are used for the execution of sample products (see the next section) and have to conform to the input data types required by the technical specifications.

1.31.2.4.4.4 Sample product execution
On the basis of the GIS software prototype, the project team implements a sample production run. The outputs of this task are sample products generated by the GIS software prototype.

1.31.2.4.5 Project execution and control

Project implementation (or project execution) is the third phase of the project management lifecycle, where visions and plans become reality. It is the logical conclusion after evaluating, deciding, visioning, planning, applying for funds, and finding the financial resources for a project. The implementation phase involves putting the project plan into action. It is here that the project manager coordinates and directs project resources to meet the objectives of the project plan. As the GIS project unfolds, it is the project manager's job to direct and manage each activity, every step of the way. The better the original plan, the easier it will be for the project manager to handle any problems that come up (Tomlinson, 2005).

It is important to take into account that, independently of the nature of the project, implementation takes time, usually more than planned, and that many external constraints can appear which should be considered when initiating the implementation step (for example, seasonality in the availability of community engagement and resources).

The basic requirement for starting the implementation process is to have the work plan ready and understood by all the actors involved. Technical and non-technical requirements have to be clearly defined, and the financial, technical, and institutional frameworks of the specific project have to be prepared considering the local conditions. The working team should identify their strengths and weaknesses (internal forces), and opportunities and threats (external forces). The strengths and opportunities are positive forces that should be exploited to efficiently implement a project; the weaknesses and threats are hindrances that can hamper project implementation, and the implementers should ensure that they devise means of overcoming them. Another basic requirement is that the financial, material, and human resources are fully available for the implementation (NETSSAF, 2008).

Other actions need to be taken before work can begin to implement the detailed action plan, including:

- Scheduling activities and identifying potential bottlenecks.
- Communicating with the members of the team and ensuring all the roles and responsibilities are distributed and understood.
- Providing for project management tools to coordinate the process.
- Ensuring that the financial resources are available and distributed accordingly.

The implementation phase is where the project team actually does the project work to produce the deliverables. The word "deliverable" means anything the project delivers, including all of the products or services and, last but not least, the documentation of these deliverables. The steps undertaken to build each deliverable will vary depending on the type of project and cannot, therefore, be described here in any real detail. For instance, engineering and telecommunications projects will focus on using equipment, resources, and materials to construct each project deliverable, whereas computer software projects may require the development and implementation of software code routines to produce each project deliverable. The activities required to build each deliverable will be clearly specified within the project requirements document and project plan.

Beyond mere direction of the work and delivery of results, it is the job of the project manager to also keep track of how the project team performs. The implementation phase keeps the project plan on track with careful monitoring and control processes to ensure the final deliverable meets the acceptance criteria set by the customer. This phase is typically where approved changes are implemented. Most often, changes are identified by looking at performance and quality control data. Routine performance and quality control measurements should be evaluated on a regular basis throughout the implementation phase, and gathering reports on those measurements will help to determine where the problem is and recommend changes to fix it.

1.31.2.4.5.1 Project charter
A project charter, project definition, or project statement is a statement of the scope, objectives, and participants in a project. It provides a preliminary delineation of roles and responsibilities, outlines the project objectives, identifies the main stakeholders, and defines the authority of the project manager. It serves as a reference of authority for the future of the project. The purpose of a project charter is to:

- Provide an understanding of the project, the reason it is being conducted, and its justification.
- Establish early on in the project the general scope.
- Establish the project manager and his or her authority level. A note of who will review and approve the project charter must be included.


1.31.2.4.5.2 Project schedule planning
In order to develop the schedule, we first need to define the activities, sequence them in the right order, estimate the resources needed, and estimate the time it will take to complete the tasks.

The activity definition process is a breakdown of the project into work package elements. It documents the specific activities needed to fulfill the deliverables detailed in the project plan. These activities are not the deliverables themselves but the individual units of work that must be completed to fulfill the deliverables. Activity definition uses everything we already know about the project to divide the work into activities that can be estimated. Expert judgment, in the form of project team members with prior experience developing project scope statements, can help to define activities; such experts may be employed to review an activity list created by the project manager, or they could be involved from the very beginning.

Once the activity definitions for the work packages have been completed, the next task is to complete the activity list. The project activity list is a list of everything that needs to be done to complete the project, including all the activities that must be accomplished to deliver each work package. The next step is to define the activity attributes, starting with a description of each activity and determining the sequence of the work. Any predecessor activities, successor activities, or constraints should be listed in the attributes, along with descriptions and any other information about resources or time.

All of the important checkpoints of the project are tracked as milestones. Some of them could be listed in a contract as requirements of successful completion, and some could just be significant points in the project that the manager wants to keep track of. The milestone list needs to let everyone know which milestones are required and which are not.

1.31.2.4.5.3 Scheduling tools
For all but the smallest projects, it is useful to become familiar with scheduling tools. The following paragraphs provide a brief introduction to four of them.

1.31.2.4.5.3.1 Gantt chart
A Gantt chart is a type of bar chart that illustrates a project schedule. Gantt charts are easy to read and are commonly used to display schedule activities. These charts display the start and finish dates of the terminal elements and summary elements of a project; terminal elements and summary elements together comprise the work breakdown structure of the project. Some Gantt charts also show the dependency relationships (i.e., the precedence network) between activities. Gantt charts show all the key stages of a project and their duration as a bar chart, with the time scale across the top. The key stages are placed on the bar chart in sequence. A Gantt chart can be drawn quickly and easily and is often the first tool a project manager uses to provide a rough estimate of the time that it will take to complete the key tasks. Sometimes it is useful to start with the target deadline for completion of the whole project, because it soon becomes apparent if the time scale is too short or unnecessarily long. The detailed Gantt chart is usually constructed after the main objectives have been determined.

1.31.2.4.5.3.2 Network diagrams
Many project managers use network diagrams when scheduling a project. The network diagram is a way to visualize the interrelationships of project activities. Network diagrams provide a graphical view of the tasks and how they relate to one another.
The tasks in the network are the work packages of the project. All tasks must be included in the network because they have to be accounted for in the schedule. Leaving even one task out of the network could change the overall schedule duration, estimated costs, and resource allocation commitments. The first step is to arrange the tasks into a sequence. Some tasks can be accomplished at any time throughout the project, whereas other tasks depend on input from another task or are constrained by time or resources. The network diagram provides important information to the project team: how the tasks are related, where the risk points are on the schedule, how long the project as currently planned will take to finish, and when each task needs to begin and end.

1.31.2.4.5.3.3 PERT

Another way to show how tasks relate is with the activity-on-arrow (AOA) diagram. Although the activity-on-node (AON) convention is more commonly used and is supported by all project management programs, the Program Evaluation and Review Technique (PERT) is the best-known AOA-type diagram and is the historical basis of all network diagramming. The main difference is that the AOA diagram is traditionally drawn using circles as the nodes, with nodes representing the beginning and ending points of the arrows or tasks. In the AOA network, the arrows represent the activities or tasks.

1.31.2.4.5.3.4 Critical path

The critical path is the sequence of tasks that determines the shortest possible time in which the project can be completed. It is based on the idea that some tasks must be completed before others can begin. A critical path diagram is a useful tool for scheduling dependencies and controlling a project. In order to identify the critical path, the length of time that each task will take must be calculated.
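To make the computation concrete, here is a minimal sketch in Python that derives the critical path of a small, entirely hypothetical task network (the task names, durations, and dependencies are invented for illustration). It runs the standard forward pass to find the earliest finish times and a backward pass to find the latest finish times; tasks with zero slack lie on the critical path.

# Hypothetical GIS project tasks: name -> (duration in days, predecessors).
# Insertion order is assumed to be topological (every task appears after
# all of its predecessors).
tasks = {
    "needs_assessment": (5,  []),
    "data_inventory":   (10, ["needs_assessment"]),
    "db_design":        (8,  ["needs_assessment"]),
    "data_conversion":  (15, ["data_inventory", "db_design"]),
    "app_development":  (12, ["db_design"]),
    "acceptance_test":  (5,  ["data_conversion", "app_development"]),
}

# Forward pass: earliest finish = max of predecessors' earliest finishes + duration.
earliest_finish = {}
for name, (duration, preds) in tasks.items():
    start = max((earliest_finish[p] for p in preds), default=0)
    earliest_finish[name] = start + duration

project_duration = max(earliest_finish.values())

# Backward pass: a task's latest finish = min over its successors of
# (successor's latest finish minus successor's duration); sinks end with the project.
latest_finish = {name: project_duration for name in tasks}
for name in reversed(list(tasks)):
    duration, preds = tasks[name]
    for p in preds:
        latest_finish[p] = min(latest_finish[p], latest_finish[name] - duration)

# Zero slack (earliest finish == latest finish) marks the critical path.
critical_path = [n for n in tasks if earliest_finish[n] == latest_finish[n]]
print(f"Shortest possible completion: {project_duration} days")
print("Critical path:", " -> ".join(critical_path))

Running the sketch reports a 35-day duration along needs_assessment -> data_inventory -> data_conversion -> acceptance_test; the two remaining tasks carry slack and can slip without delaying the project.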

1.31.2.4.6 Resource planning

Another essential component of the project plan is the identification of the resources required for each task. The project team first identifies the type of resource needed and then estimates its cost. Cost can be estimated in one of two ways: as a fixed cost (e.g., $5000.00 per year for a multiuser software license) or as a variable cost (e.g., 50 hours of initial Java programming and 20 hours or less of additional programming work per year for maintenance and updates).
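A minimal sketch of these two cost types, reusing the license fee and programming hours from the examples above; the three-year horizon and the $80-per-hour labor rate are assumptions added purely for illustration.

def fixed_cost(annual_fee: float, years: int) -> float:
    """Fixed cost, e.g., a multiuser software license billed per year."""
    return annual_fee * years

def variable_cost(rate_per_hour: float, initial_hours: float,
                  maintenance_hours_per_year: float, years: int) -> float:
    """Variable cost, e.g., initial programming plus yearly maintenance work."""
    return rate_per_hour * (initial_hours + maintenance_hours_per_year * years)

# Three-year estimate for the examples in the text, at an assumed $80/hour:
total = fixed_cost(5000.00, 3) + variable_cost(80.0, 50, 20, 3)
print(f"Estimated three-year cost: ${total:,.2f}")   # $23,800.00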

Resources are people, equipment, places, money, or anything else needed to perform the planned activities. Every item on the activity list needs to have resources assigned to it. Before resources can be assigned to a project, one needs to determine their availability. Resource availability includes information about what resources are necessary, when they are available, and the conditions of their availability. If, for example, a particular task requires skills that are not found among project team members, then arrangements must be made for outside assistance. This may necessitate amendments to estimated time and costs, and the project plan must be adjusted accordingly. It is important to schedule some resources, such as consultants or training rooms, well in advance, as they might be available only at certain times. This resource estimation is the basis for determining how long each activity will take, a process known as activity duration estimation. Here the project manager identifies how long it takes to perform each activity, starting with the information about that activity and the resources assigned to it, and then working with the project team to come up with an estimate.

1.31.2.4.7 Procurement management

The procurement effort in projects varies widely and depends on the type of project. In less complex projects, the client organization will often provide procurement services. In this case, the project team identifies the materials, equipment, and supplies needed by the project, and provides product specifications and a detailed delivery schedule. When the procurement department of the parent organization provides procurement services, a liaison from the project can help the procurement team better understand the unique requirements of the project and the time-sensitive or critical items of the project schedule. On larger, more complex projects, personnel are dedicated to procuring and managing the equipment, supplies, and materials needed by the project. Because of the temporary nature of projects, equipment, supplies, and materials are procured as part of the product of the project or for the execution of the project. In GIS projects, we distinguish data, functional, and processing needs.

Data needs:
- Every GIS project requires a data inventory. This needs to be broken down into which maps or data are important for successful completion of each function in the unit.
- Data are rarely perfect; it is hence important to describe problems of the current data and point out future needs as well.
- Although maps are usually the final output of a GIS project, creating a map inventory form will help clarify issues involved in map use down the road.

Functional needs:
- Identify the activities that the organization performs to carry out its mission, and identify all of its organizational units.
- List the functions that require maps or other geographic information. Huxhold (1996) has a useful overview of functions requiring geographic information, by department and function.

Processing needs:
- Define how the data are to be used to fulfill the functional needs of the organization.
- An application definition form contains data input requirements, processing requirements, and output products.

More complex projects will typically procure through different acquisition and management methods. Commodities are common products that are purchased based on the lowest bid. Commodities include items like concrete for building projects, office supplies, or even lab equipment for a research project. The second type of procurement includes products that are specified for the project. Vendors who can produce these products bid for a contract. The awarding of a contract can take into account price, the ability to meet the project schedule, the fitness of the product for its purpose, and other considerations important to the project. These vendors' performance becomes an important part of the project, and the project manager assigns resources to coordinate the work and schedule of each vendor. The third procurement approach is the development of one or more partners. A design firm that is awarded the design contract for a major part of an airport and a research firm that is conducting a study of passenger flows are examples of potential project partners. A partner contributes to and is integrated into the execution plan. Partners perform best when they share the project vision of success and are emotionally invested in the project. The project management team builds and implements a project procurement plan that recognizes the most efficient and effective procurement approach to support the project schedule and goals.

Procurement management follows a logical order, starting with a plan for the whole project. Before doing anything else, it is important to identify all of the work that will be contracted out so that one can plan for any purchases and acquisitions. For each of these items, one then needs to set up the contract, identify the metrics that will be used to determine that the work is considered successful, pick a seller, and have a process in place to administer the contract once the work is happening. The procurement management plan details how the procurement process will be managed. It includes the following information:

- The types of contracts and any metrics that will be used to measure the contractors' performance.
- The planned delivery dates for the work or products.
- The company's standard documents.

- The number of vendors or contractors involved, and how they will be managed.
- How purchasing may impact the constraints and assumptions of the project plan.
- The coordination of purchasing lead times with the development of the project schedule.
- The identification of prequalified sellers (if known).

The procurement management plan, like all other management plans, becomes a subsidiary of the project management plan. Some tools and techniques used during the procurement planning stage include make-or-buy analysis and definition of the contract type.

1.31.2.4.7.1 Spatial data acquisition and evaluation

During this task, the complete set of input data identified in the requirements baseline and technical specifications (see section "Sample data acquisition") is actually acquired. If data acquisition turns out to be a complex or long-term task (e.g., programming of aerial photography or LiDAR surveys), it is recommended to first establish a data acquisition plan. Data acquisition is often regarded as the most challenging and expensive part of a GIS project. However, maintaining data quality through the lifecycle of a database system can, in total, create greater challenges and be even more expensive. Spatial data are much more readily available now than ever before. With the construction of global and national geospatial data infrastructures, data warehouses, and Web-based information dissemination technologies, GIS projects now rely more on third-party data than on internal data conversion. New data collection technologies are now able to capture spatial data directly in digital form quickly and with a higher degree of accuracy than in the past. As a result, the focus of data in GIS projects has moved from acquisition to quality or usability. However, up to now, greater data availability has not necessarily translated into higher usability; therefore, data and data quality remain mission-critical issues in GIS projects, and the compliance of all delivered input data with their technical specifications needs to be checked (see section "Monitoring and control" further down).

1.31.2.4.7.2 Technology acquisition and evaluation

Technology acquisition and evaluation is a relatively straightforward process when compared with data acquisition and evaluation. Technology, in this case, refers to computer hardware, software, and local and wide area networks, together with related peripherals and supplies. In many organizations, GIS must be implemented using corporate architectures and standards. Therefore, a GIS project team is often more concerned with evaluating the suitability of corporate resources specifically for spatial applications than with their acquisition. Developing a technology evaluation plan is a relatively complex task. It normally starts with a review of a user requirements study that seeks to identify the hardware, software, and network needs of the deliverables of the project. The project team then identifies the key features of these relative to the generation of the deliverables. These features are progressively broken down into further levels of detail and recorded in a software evaluation manual that is regularly updated during the process of development. The features to be evaluated are assigned either a "must" or a "should" requirement to indicate their relative significance in the evaluation process. In the evaluation process, potential vendors are required to supply detailed, specific information about each of the features of their products that are to be evaluated. The project team then reviews the completed product evaluations and shortlists the four or five best submissions that are deemed to meet the requirements of the project.

1.31.2.4.7.3 HR planning

The most important resource to a project is its people: the project team.
Projects require specific expertise at specific moments in the schedule, depending on the milestones being delivered or the given phase of the project. An organization can host several strategic projects concurrently over the course of a budget year, which means that its employees can be working on more than one project at a time. Alternatively, an employee may be seconded away from his or her role within an organization to become part of a project team because of a particular expertise. Moreover, projects often require talent and resources that can only be acquired via contract work and third-party vendors. Procuring and coordinating these human resources, in tandem with managing the time aspect of the project, is critical to overall success. Through performance evaluation, the manager obtains the information needed to ensure that the team has adequate knowledge, to establish a positive team environment and a healthy communication climate, and to ensure accountability. Managing the project team includes an appraisal of employee performance and project performance. The performance reports provide the basis for managerial decisions on how to manage the project team.

Working with other people involves dealing with them both logically and emotionally. A successful working relationship between individuals begins with appreciating the importance of emotions and how they relate to personality types, leadership styles, negotiations, and setting goals. Emotions are both a mental and a physiological response to environmental and internal stimuli. Leaders need to understand and value their emotions to respond appropriately to the client, project team, and project environment. The Myers-Briggs Type Indicator (MBTI) is one of the most widely used tools for exploring personal preference, with more than two million people taking the MBTI each year. The MBTI is often referred to as simply the Myers-Briggs. It is a tool that can be used in project management training to develop awareness of preferences for processing information and relationships with other people.

On larger, more complex projects, some project managers will use the Myers-Briggs as a team-building tool during project startup. This is typically a facilitated work session where team members take the Myers-Briggs and share with the team how they process information, what communication approaches they prefer, and what decision-making preferences they have. This allows the team to identify potential areas of conflict, develop communication strategies, and build an appreciation for the diversity of the team.

No single leadership approach is appropriate for every project. Due to the unique circumstances inherent in each project, the leadership approach and the management skills required to be successful vary depending on the complexity profile of the project. However, the Project Management Institute published Shi and Chen's (2006) research on project management leadership traits, which concluded that good communication skills and the ability to build harmonious relationships and motivate others are essential. Beyond this broad set of leadership skills, the successful leadership approach will depend on the profile of the project. For example, a transactional project manager with a strong command-and-control leadership approach may be very successful on a small software development project or a construction project, where tasks are clear, roles are well understood, and the project environment is cohesive. This same project manager is less likely to be successful on a larger, more complex project with a diverse project team and complicated work processes. Matching the appropriate leadership style and approach to the complexity profile of the project is a critical element of project success. Even experienced project managers are less likely to succeed if their leadership approach does not match the complexity profile of the project.

1.31.2.4.8 Budget planning

Every project boils down to money. If there were a bigger budget, one could probably get more people to do the project more quickly and deliver more. That is why no project plan is complete without a budget. Regardless of the size of the project, and no matter how many resources and activities are in it, the process of figuring out the bottom line is always the same. It is important to come up with detailed estimates for all the project costs. Once this is done, the cost estimates are combined into a budget plan, and it becomes possible to track the project against that budget while the work is ongoing.

1.31.2.4.8.1 Cost estimation techniques

Here are some tools and techniques for estimating cost.

1.31.2.4.8.1.1 Resource cost rates

People who will be working on the project all work at a specific rate. Any materials used to build the project (e.g., wood or wiring) will be charged at a rate too. Determining resource costs means figuring out what the rates for labor and materials will be.

1.31.2.4.8.1.2 Vendor bid analysis

Sometimes it is necessary to work with an external contractor to get the project done. One might even have more than one contractor bid on the job. This tool is about evaluating those bids and choosing the winner.

1.31.2.4.8.1.3 Reserve analysis

Most projects have cost overruns. If the risk of something expensive happening is known ahead of time, it is better to have some cash available to deal with it. Reserve analysis means setting some cash aside in case of overruns.

1.31.2.4.8.1.4 Cost of quality

Fig. 1 alluded to the balance between costs and quality. A good project manager will weigh the cost of all quality-related activities into the overall budget. Because it is cheaper to find bugs earlier in the project than later, there are always quality costs associated with everything a project produces. The cost of quality is a way of tracking the cost of those activities.

1.31.2.4.8.2 Managing the budget

An activity can have costs from multiple vendors in addition to internal costs for labor and materials. Detailed estimates from all sources can be reorganized so that the costs associated with a particular activity can be grouped, by adding the activity code to each detailed estimate. The detailed cost estimates can then be sorted and subtotaled by activity to determine the cost for each activity.

1.31.2.4.8.2.1 Managing the cash flow

If the total amount spent on a project is equal to or less than the amount budgeted, the project can still be in trouble if the funding for the project is not available when it is needed. There is a natural tension between the financial people in an organization, who do not want to pay for the use of money that is just sitting in a checking account, and the project manager, who wants to be sure that there is enough money available to pay for project expenses. The financial people prefer to keep the company's money working in other investments until the last moment before transferring it to the project account. The contractors and vendors have similar concerns; they want to get paid as soon as possible so they can put the money to work in their own organizations. The project manager would like to have as much cash available as possible to use if activities exceed budget expectations.

1.31.2.4.8.2.2 Contingency reserves

Most projects have something unexpected occur that increases costs above the original estimates. If estimates are rarely exceeded, the estimating method should be reviewed, because the estimates are too high.
It is impossible to predict which activities will cost more than expected, but it is reasonable to assume that some of them will. Estimating the likelihood of such events is part of the risk analysis discussed above.

Instead of overestimating each cost, money is budgeted for dealing with unplanned but statistically predictable cost increases. Funds allocated for this purpose are called contingency reserves. Because it is likely that this money will be spent, it is part of the total budget for the project. If this fund is adequate to meet the unplanned expenses, then the project will be completed within the budget.

1.31.2.4.8.2.3 Management reserves

If something occurs during the project that requires a change in the project scope, money may be needed to deal with the situation before a change in scope can be negotiated with the project sponsor or client. It could be an opportunity as well as a challenge. For example, if a new technology were invented that would greatly enhance the completed project, there would be additional cost and a change to the scope, but it would be worth it. Money can be made available at the manager's discretion to meet needs that would change the scope of the project. These funds are called management reserves. Unlike contingency reserves, they are not likely to be spent and are not part of the project's budget baseline, but they can be included in the total project budget.

1.31.2.4.8.3 Evaluating the budget

A project manager must regularly compare the amount of money spent with the budgeted amount and report this information to managers and stakeholders. It is necessary to establish an understanding of how this progress will be measured and reported.

1.31.2.4.8.3.1 Earned value analysis

A method that is widely used for medium- and high-complexity projects is the earned value management (EVM) method. EVM is a method of periodically comparing the budgeted costs with the actual costs during the project. It combines the scheduled activities with detailed cost estimates of each activity. It allows for partial completion of an activity if some of the detailed costs associated with the activity have been paid but others have not. The budgeted cost of work scheduled (BCWS) comprises the detailed cost estimates for each activity in the project. The amount of work that should have been done by a particular date is the planned value (PV). Some sources use these terms interchangeably, but in formulas the planned value refers to the sum of the budgeted cost of work up to a particular point in the project, and that is the distinction made in the definitions here. The budgeted cost of work performed (BCWP) is the budgeted cost of work scheduled that has actually been done. The sum of the BCWP values up to a given point in the project schedule is the earned value (EV). The amount spent on an item is often more or less than the estimated amount that was budgeted for that item. The actual cost (AC) is the sum of the amounts actually spent on the items.

1.31.2.4.8.3.2 Schedule variance

The project manager must know whether the project is on schedule and within budget. The difference between planned and actual progress is the variance. The schedule variance (SV) is the difference between the earned value (EV) and the planned value (PV). Expressed as a formula, SV = EV − PV. If less value has been earned than was planned, the schedule variance is negative, which means the project is behind schedule. Similarly, the cost variance (CV) is the difference between the earned value and the actual cost: CV = EV − AC. The schedule variance and the cost variance indicate the amount by which the work is behind (or ahead of) schedule and the amount by which the project is exceeding (or not fully using) its budget. They do not, however, show how these amounts compare with the total budget.
The ratio of earned value to planned value gives an indication of how much of the project is completed. This ratio is the schedule performance index (SPI): SPI = EV/PV. An SPI value of less than 1 indicates the project is behind schedule. The ratio of the earned value to the actual cost is the cost performance index (CPI): CPI = EV/AC.

1.31.2.4.8.4 Estimated cost to complete the project

Part way through the project, the manager evaluates the accuracy of the cost estimates for the activities that have taken place and uses that experience to predict how much money it will take to complete the unfinished activities, known as the estimate to complete (ETC). To calculate the ETC, the manager must decide whether the cost variance observed up to that point is representative of the future. For example, if unusually bad weather causes increased cost during the first part of the project, it is not likely to have the same effect on the rest of the project. If the manager decides that the cost variance up to this point in the project is atypical (not typical), then the estimate to complete is the difference between the original budget for the entire project, the budget at completion (BAC), and the earned value (EV) up to that point. Expressed as a formula, ETC = BAC − EV. If the manager decides that the cost variance is caused by factors that will affect the remaining activities, such as higher labor and material costs, then the estimate to complete needs to be adjusted by dividing it by the cost performance index (CPI). For example, if labor costs on the first part of a project are estimated at $60,000 (EV) and they actually cost $63,000 (AC), the CPI will be approximately 0.95 (recall that CPI = EV/AC). To calculate the estimate to complete, assuming the cost variance on known activities is typical of future costs, the formula is ETC = (BAC − EV)/CPI. If the budget at completion (BAC) of the project is $600,000, the estimate to complete is ($600,000 − $60,000)/0.95 = $568,421.

1.31.2.4.8.5 Estimate final project cost

If the costs of the activities up to the present vary from the original estimates, this will affect the total estimate of the project cost. The new estimate of the project cost is the estimate at completion (EAC). To calculate the EAC, the estimate to complete (ETC) is added to the actual cost (AC) of the activities already performed. Expressed as a formula, EAC = AC + ETC.
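The earned value formulas above are straightforward to verify in a few lines of Python. The sketch below reuses the figures from the worked example (EV = $60,000, AC = $63,000, BAC = $600,000); the planned value is an added assumption, set equal to EV so that the project is exactly on schedule. Note that the text rounds the CPI to 0.95, which yields the $568,421 figure; with the unrounded CPI the ETC comes to $567,000.

# Figures from the worked example above; PV is an assumed value for illustration.
EV, AC, BAC = 60_000.0, 63_000.0, 600_000.0
PV = 60_000.0

SV = EV - PV    # schedule variance: negative means behind schedule
CV = EV - AC    # cost variance: negative means over budget
SPI = EV / PV   # schedule performance index: < 1 means behind schedule
CPI = EV / AC   # cost performance index: < 1 means over budget

ETC_atypical = BAC - EV          # past variance not expected to continue
ETC_typical = (BAC - EV) / CPI   # past variance expected to continue
EAC = AC + ETC_typical           # estimate at completion

print(f"CV = ${CV:,.0f}, CPI = {CPI:.3f}")
print(f"ETC (typical) = ${ETC_typical:,.0f}, EAC = ${EAC:,.0f}")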

1.31.2.4.9 Cost-benefit analysis

Many organizations require a cost-benefit analysis to be conducted as part of the project initiation process. Cost-benefit analysis, also called investment analysis, is based on the realization that justification for investments is best made when accompanied by some level of analysis of the associated costs and benefits, both of the investment itself and of its returns over a relevant time frame. In addition to providing a framework for planning, a cost-benefit analysis provides some assurance of the prudence of the initial capital expenditures on the project and of the long-term support and maintenance costs of the resulting GIS. However, as a project management instrument, cost-benefit analyses are often imprecise because they do not account for many of the non-monetarized subtleties and complexities that surround GIS projects. Cost-benefit analysis is particularly difficult in the case of large-scale spatial data infrastructure projects that take several years to complete. Although the costs of information technology are relatively easy to measure and account for, the benefits side of the equation is difficult to formulate. This is because many significant benefits are intangible (e.g., improving the integrity of land tenure systems, better stewardship of the environment, and increasing public awareness of and interest in participatory democracy) and, hence, cannot be quantified precisely in monetary terms.

Cost-benefit calculations are further complicated by three other factors. The first concerns what kinds of costs and benefits are measured and how they are measured, at what time the benefits are realized, and whether to include the values of external benefits or only those that relate directly to the project being undertaken. The second is the principle of the time value of money, which stipulates that costs incurred today are "worth" more than benefits of the same monetary value received in the future. The third factor is the variation and variability of the lifecycles of different project components (e.g., the relatively short, perhaps 3- to 5-year, technology lifecycle and the longer-term, often indefinite, data lifecycle).

Working within the above limitations, a typical cost-benefit analysis can be conducted as an "educated guess" to assist in project management decision making, using the following two steps.

Assumptions. There are four sets of assumed parameters or variables used for the calculation of costs and benefits:

- The time frame for the project lifecycle, payback period, initial investment period, and benefit accrual period.
- Costs for personnel (wages and benefits), facilities, equipment, and materials, obtained from the Resource Requirements Form (see section "Culture of Stakeholders").
- Benefits from productivity gains, cost recovery (including royalties from the sale of data, potential revenues, and user fees), value-added services, and economic spin-offs internally and externally.
- Discount rates for use as differential weightings in the calculation of future costs and benefits.

Calculation. The cost-benefit analysis calculates annual benefits less annual costs relating to the creation and maintenance of the GIS resulting from the project. Two sets of values are normally calculated, namely:

- The Sum of Flow of Net Benefit (SFNB), which simply calculates the sum of all payments and income/benefits over the assumed payback period.
- The Net Present Value (NPV), also called "discounted net present value", which calculates the sum of future payments (negative values) and income/benefits (positive values) over an assumed payback period and reduces them to present value using an assumed discount rate.
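As a minimal sketch of the two calculations, consider a hypothetical five-year project: a year-0 investment followed by net annual benefits. All cash flows and the 6% discount rate are invented for illustration.

# Net cash flow per year: index 0 is the initial investment (negative),
# later entries are annual benefits less annual costs. All figures hypothetical.
cash_flows = [-250_000.0, 40_000.0, 70_000.0, 90_000.0, 110_000.0, 120_000.0]
discount_rate = 0.06  # assumed annual discount rate

# Sum of Flow of Net Benefit (SFNB): the undiscounted sum over the payback period.
sfnb = sum(cash_flows)

# Net Present Value (NPV): each year's flow reduced to present value.
npv = sum(cf / (1 + discount_rate) ** year for year, cf in enumerate(cash_flows))

print(f"SFNB: ${sfnb:,.0f}")
print(f"NPV at {discount_rate:.0%}: ${npv:,.0f}")

The gap between the two figures (here $180,000 versus roughly $102,000) illustrates the time-value-of-money principle discussed above: benefits received later are worth less in present terms.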

1.31.2.4.10 Application development strategies and techniques

A GIS project requires a substantial amount of effort and resources to be focused on application development. This includes the design, programming, testing, and integration of software modules for the user interface, database connectivity, information retrieval and analysis, generation of reports, presentation of graphics, and multimedia information products. GIS applications are now developed predominantly using software engineering methods (Pressman, 2005; Sommerville, 2005).

1.31.2.4.11 Project risk and opportunity analysis

Despite the best of intentions and careful thought invested in the project initiation and planning processes, the possibility of unexpected problems and benefits occurring during the course of the project lifecycle is real. Risk and opportunity analysis are not always seen as integral and critical components of project planning; however, a prudent project manager should realize that there is no perfect approach to project planning, and if something can go wrong, it usually will. The idea of potential risk and opportunity analysis is based on the belief that an experienced project team does not and cannot know exactly what problems will occur and when they will occur, but they are able to anticipate the types of potential problems and have the capability of dealing with them if they materialize. In essence, potential risk analysis is a preventive measure rather than a regular element of a project. The idea of potential risk analysis is closely related to the notion of project quality above. Risk analysis essentially aims to help the project team develop contingency plans so that it can respond quickly and correctly to problematic situations before irreparable damage is done. One of the most common problems is the resignation or reassignment of project team members to another project. This problem is particularly acute in the case of large-scale multiyear infrastructure projects, or for projects in organizations where staff resources are limited and priorities change within short time frames. Another very common problem is the late delivery or delays in the availability of source materials. In addition to risks, the project manager should also look at the possibility of maximizing the opportunities of the project. They can do this by, for example, exploring the potential of extending the functionality of the GIS, providing value-added sales of the data to

other users, and using the experience gained in the project being undertaken to offer consulting services to other organizations. All of these can potentially bring additional revenue to the project, thus increasing its return on investment and long-term sustainability.

Risk exists for all projects. The role of the project management team is to understand the kinds and levels of risks in the project and then to develop and implement plans to mitigate these risks. Risk represents the likelihood that an event will happen during the life of the project that negatively affects the achievement of project goals. The type and amount of risk vary by industry type, complexity, and phase of the project. The project risk plan will also reflect the risk profile of the project manager and key stakeholders. People have different comfort levels with risk, and some members of the project team will be more risk averse than others.

The first step in developing a risk management plan is identifying potential project risks. Some risks are easy to identify, such as the potential for a damaging storm in the Caribbean, and some are less obvious. Many industries or companies have risk checklists developed from past experience. The Construction Industry Institute, for example, published a 100-item risk checklist that provides examples and areas of project risks. No risk checklist will include all potential risks; the value of a checklist is the stimulation of discussion and thought about the potential risks on a project. The project team analyzes the identified risks and estimates the likelihood of each risk occurring. The team then estimates the potential impact on project goals if the event does occur. The outcome of this process is a prioritized list of estimated project risks, with a value that represents the likelihood of occurrence and the potential impact on the project. The project team then develops a risk mitigation plan that reduces the likelihood of an event occurring or reduces the impact on the project if the event does occur. The risk management plan is integrated into the project execution plan, and mitigation activities are assigned to the appropriate project team member. It is extremely unlikely that all the potential events identified in the risk analysis will occur, but the likelihood that one or more events will happen is high. The project risk plan reflects the risk profile of the project and balances the investment in mitigation against the benefit for the project. One of the more common risk mitigation approaches is the use of contingency. Contingency consists of funds set aside by the project team to address unforeseen events. Projects with a high-risk profile will typically have a large contingency budget. If the team knows which activities have the highest risk, contingency can be allocated to those activities; when risks are less identifiable with specific activities, contingency is identified in a separate line item. The plan includes periodic risk-plan reviews during the life of the project. The risk review evaluates the effectiveness of the current plan and explores possible risks not identified in earlier sessions.

1.31.2.4.11.1 Defining risk

Following Holton's (2004) essay, we define risk as the probability of an uncertain event or condition that, if it occurs, has a negative effect on a program's objectives. Both the size of the effect and the probability combine to give the overall measure of risk. Damodaran (2007) offers a more expansive definition that sees risk as much as an opportunity, one that gets squashed if risk management is aimed solely at minimizing exposure to risk.
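A minimal sketch of this combined measure: each register entry below carries a probability (on an assumed 0-1 scale) and an impact score (on an assumed 1-10 scale), and the register is prioritized by their product. The entries and owners are hypothetical, though the first two echo the common problems named earlier in this section.

# Hypothetical risk register; probability on a 0-1 scale, impact on a 1-10 scale.
risks = [
    {"risk": "key analyst reassigned to another project", "prob": 0.4, "impact": 8, "owner": "project manager"},
    {"risk": "late delivery of source materials",         "prob": 0.5, "impact": 6, "owner": "data lead"},
    {"risk": "weather delays to aerial survey",           "prob": 0.2, "impact": 9, "owner": "procurement liaison"},
]

# Overall measure of each risk = probability x impact; sort highest first.
for entry in sorted(risks, key=lambda e: e["prob"] * e["impact"], reverse=True):
    score = entry["prob"] * entry["impact"]
    print(f"{score:4.1f}  {entry['risk']}  (owner: {entry['owner']})")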
1.31.2.4.11.2 Risk management process

At a higher level, overall risk is defined as the exposure of stakeholders to the consequences of variation in individual risks. The high-level process starts with an initiation step that defines the scope and objectives of risk management. A key output from the initiation step is the risk management plan, which details how risk will be managed throughout the life cycle. Risks should be identified and documented in the risk register. The relative significance of identified risks is assessed using qualitative techniques to enable them to be prioritized for further attention. Quantitative risk analysis may also be used to determine the combined effect of risks on objectives. The process continues with risk response planning, aiming to avoid, reduce, transfer, or accept threats as well as to exploit, enhance, share, or reject opportunities, with contingency (time, cost, resources, and course of action) for risks that cannot be managed proactively. The final step is the implementation of agreed responses. The whole process is iterative: assessment or response planning can lead to the identification of further risks; planning and implementing responses can trigger a need for further analysis; and so on. It is important that risk management is not conducted in isolation. Risks at the project level, for instance, may have to be escalated to the program level or vice versa. Risk management must contribute, as appropriate, to both business risk assessments and organizational governance requirements. The manager must be aware of risks that have an effect outside his or her scope of responsibility, e.g., those that could affect the organization's reputation. The management of general health and safety risks is usually excluded from program risk management, as the management of these risks is traditionally handled by a separate function within the organization.

1.31.2.4.11.2.1 Risk by phases

Risk management consists of six steps or phases:

(1) Identify what the risk is and what is at risk. It could be timescales, the realization of a benefit, or the delivery of a capability.
(2) Although the program manager is ultimately responsible, each risk should be given an owner who is best positioned to perform mitigating actions on the risk and to monitor it.
(3) The owner then evaluates the risk and assigns it an impact and probability score.
(4) Every risk has a standard set of mitigations which can be applied to it:
- Transfer: can the risk be transferred to another party, for example, could an insurance policy be taken out?
- Tolerate: this is frequently used for risks with very low impact and is effectively the do-nothing option. Tolerate means the risk is monitored, but the program proceeds without proactive action being taken to address the risk.

- Terminate: this refers to adjusting the program so the risk is no longer applicable, for example, a project may be removed entirely from the program so the risk can never materialize.
- Treat: this is where concrete actions are taken to reduce the probability of the risk materializing, or its impact should it materialize.
(5) The risk owner then takes concrete actions to ensure that the above mitigation measures are carried out.
(6) All risks and actions which have been created need to be reviewed regularly, so that the risk impact and probability can be updated following any actions taken to treat them.

1.31.2.4.11.3 Project risk and project complexity profile

Risk management can be understood as a complex system. It is affected by individuals, groups, stakeholders, host organizations, clients, and the broad external environment. Fundamental to understanding this system are the concepts of risk attitude and risk appetite. Risk attitude is an individual's or group's natural disposition toward uncertainty and is influenced by their perception of risk. Perception is itself influenced by many factors, including conscious and subconscious reactions to risk. Risk attitude will affect the way people develop responses to risk and the way they react if a risk event occurs. The risk attitude of a group or individual is often described in one of three ways:

- Risk averse, where risk is avoided;
- Risk seeking, where risk is actively sought;
- Risk neutral, where risk is neither actively sought nor avoided.

A risk-averse attitude may be useful in some situations (e.g., local government) but detrimental in others (e.g., an entrepreneurial technology start-up company). Conversely, risk seeking is a positive attribute in some situations but unsuitable in others. Understanding risk attitude can help geospatial managers by giving insight into why some situations are considered more risky than others, and why individuals or groups behave in certain ways when confronted with risk. Risk appetite is the amount of risk an individual, group, or organization is prepared to take in a given situation in order to achieve its objectives; it is influenced by the propensity to take risk and/or the organizational risk culture. A manager needs to understand the risk appetite of the stakeholders. In the definition phase of a life cycle, the development of a solution to meet requirements will be heavily influenced by the stakeholders' risk appetite. Some ways of meeting requirements may be delivered quickly or produce high returns but also involve high levels of risk. These would be acceptable to risk-seeking stakeholders but not to those who are risk averse. The manager also needs to understand the risk attitude of the team members and ensure that they are managed in a way that is compatible with the stakeholders' risk appetite.

1.31.2.4.12 Monitoring and control

Project monitoring and control should be done independently of project execution in order to prevent potential conflicts of interest from occurring when the individuals executing a project monitor and control their own activities.

1.31.2.4.12.1 Quality planning

Quality planning is a systematic process that translates top management's expression of its intentions, direction, and aims regarding quality into measurable objectives and requirements, and lays down a sequence of steps for realizing them within a specified timeframe. Identification and characterization of quality as a basis for quality management procedures is helped by reference to accepted geospatial standards such as ISO, OGC, FGDC, or URISA (see section "Standards").

1.31.2.4.12.2 Quality planning tools

A preventative approach to data quality uses a four-tier Total Data Quality Management (TDQM) model to define, measure, analyze, and improve data quality (Wang, 1998). This approach differs from conventional methods in two ways. First is the emphasis on the use of pre-defined rules to validate the data before they are entered into the database. Second is the systematic capture and analysis of detected errors, as well as the subsequent feedback of information to develop preventative measures at the data collection and supply end of the database construction process.

1.31.2.4.12.3 Contract management and control

Contract management requires considerable people skills. The project manager not only deals on a day-to-day basis with the contractor but also serves as the primary liaison between the contractor and the project team on various aspects of the project. A common pitfall in contract management is the use of more than one contractor in the same project. Although it is not uncommon for a project contract to be awarded to more than one consultant (e.g., one for the database, the other for the application program development), awarding multiple contracts for a single project can be the project manager's worst nightmare and, therefore, should be avoided if possible. In the case of large-scale projects that are difficult for one contractor to complete alone, the usual solution is to allow the contractor to subcontract, subject to compliance with certain conditions (e.g., the principal contractor must be able to satisfy the project manager that the subcontractors have the capabilities and resources to complete the assigned components to the required performance standard, or the contractor takes full responsibility for delivery of

subcontracted components). When this happens, the project manager typically interacts directly only with the principal contractor. He or she should avoid intervening in the working relationship between the principal contractor and any subcontractors (e.g., getting involved in dispute resolution between the principal contractor and subcontractors on any specific aspect of the project). During the course of the project, disagreements between the project team and the contractor occur from time to time. In order to facilitate the resolution of potential conflicts, it is important for the project manager to specify the rules and protocols for communication between the two parties, and to apply these rules strictly and consistently when problems occur. Typical rules and protocols include communication channels (e.g., all communication must be between the person designated by the principal contractor and the project manager only), the exclusive use of written communication, the time limit for the contractor to respond when questions are raised, and the time limit for the project manager to indicate acceptance after a response is received from the contractor.

1.31.2.4.12.4 Change control

A commonly overlooked aspect of contract management is change management. No matter how carefully a project manager plans a project and develops project specifications for hardware, software, and performance standards, unforeseen events may necessitate a change in the plan and the specifications. Changes in project plans and project specifications can generally be classified into five categories. Whenever a change is required, the person (e.g., the contractor or a particular project team member) initiating the change will start the change request process, usually by completing a project change request form. When a problem occurs, one cannot simply make a change, because it may be too expensive or take too long. Instead, one should see how it affects the triple constraint of Fig. 1 and how it impacts project quality, to determine whether the change is worth making. If the change impact evaluation does not show an impact on the project triple constraint, then it is acceptable to make the change without going through change control. In all other cases, change control is the set of procedures that enables the project manager to make changes in an organized way. Any such change to a project plan starts with a change request. Every change to the project must be documented, so that one can figure out what needs to be done, by when, and by whom, and to form a basis for learning from the management of past projects to improve the management of new ones. Once the change request is documented, it is submitted to a change control board, a group of people who consider changes for approval. If there is no such body, the change request can also be submitted to the project sponsor or management for review and approval. Putting the recommended changes through change control will help to evaluate the impact and update all the necessary documents. Not all changes are approved, but if the changes are approved, they are returned to the team to put in place.

The implementation phase uses the most project time and resources, and as a result, costs are usually the highest during this phase. Project managers also experience the greatest conflicts over schedules in this phase.
When the project is running behind, one can sometimes find ways to do activities more quickly by adding more resources to critical path tasks, a process known as crashing. Crashing the schedule means adding resources or moving them around to bring the project back into line with the schedule. Crashing always costs more and does not always work. There is no way to crash a schedule without raising the overall cost of the project. Therefore, if the budget is fixed and there is no extra money to spend, this technique is not applicable. Project fast tracking refers to a situation where two activities that were originally planned to occur in sequence are implemented at the same time. An example from the GIS world would be running user acceptance testing (UAT) concurrently with functional performance testing. Fast tracking is a rather risky approach, though, as there is always a good chance that some of the work done concurrently will need to be redone. Crashing and fast tracking are schedule compression tools. Managing a schedule change means keeping all of the schedule documents up to date; that way, one can always compare the results to the correct plan (and report them to higher management).

1.31.2.4.12.5 Project documentation and control

Project monitoring and control starts with proper project documentation and record keeping, which is among the greatest challenges for a project manager. Project documentation includes all documents resulting from the project initiation and planning phases as well as ongoing amendments to project schedules, memoranda distributed to project team members, minutes of project team meetings, communication with the contractor and representatives of stakeholders, progress reports from the contractors, expenditure reports, and progress reports from internal project team members. Project documentation and record keeping serve several important purposes in project management. One of these is to help the project manager maintain control of the project and keep track of its progress. They also allow the project manager to track the resources used against the activities completed and in progress. From the beginning of a project, its manager must ensure that all members of the project team keep detailed notes of their respective activities (e.g., where they store their data and application program files, and the methods used for, as well as progress made in, the development of a particular application software module). As discussed in section "Standards", standards are very important in this context. Even mid-sized organizations will eventually develop their own standards as part of their GIS program or portfolio management. If a particular member is reassigned to a new role in the project or leaves the project altogether, these notes will allow the project manager to quickly provide the replacement with sufficient information to continue seamlessly with the work in progress.

Project documentation and record keeping are tedious and time-consuming tasks. However, it is important to keep a balance between the time and cost of record keeping on the one hand and the usefulness of the documentation on the other. Obsession with detailed documentation and record keeping may be a waste of valuable resources, but keeping a collection of unorganized documents and project notes will not help either. The project manager has to use his or her discretion to set up a project documentation and record keeping system that is both economically sustainable to maintain and easy to access and use.

1.31.2.4.12.6 Quality assurance

The terms quality assurance (QA) and quality control (QC) are often used in conjunction with one another (QA/QC). In reality, however, they refer to two relatively distinct but closely interrelated sets of concepts and practices. QA and QC have their origin in manufacturing and engineering. Their purpose is to safeguard the quality of products against possible imperfections in design, craftsmanship, materials, and production processes. Thus, QA/QC are preventive measures that are applied throughout the entire production or construction cycle of hardware manufacturing and engineering development projects. GIS projects, being design and implementation undertakings, require stringent QA/QC measures to ensure that all activities, from design, development, production, and installation to servicing and documentation, are carried out according to accepted practices and standards and will lead to the delivery of a system that is useful for its intended purposes. The relationship between QA and QC in GIS projects is often confused. QA should be seen as the umbrella of activities that provide the support infrastructure for the technical application of QC measures. This support infrastructure includes corporate information quality and standards, competencies of project staff, and performance standards of business procedures served by the GIS resulting from individual projects. Of particular interest here are the ISO and OGC standards that will be discussed in the next section. At the technical level, QC is applied by conducting a sequential series of software tests:

- Unit tests. These tests target the smallest unit of software construction, namely the functional modules, and are carried out immediately after the source-level code is developed, reviewed, and checked for correct syntax.
- Integration tests. These tests aim at detecting potential errors associated with the interaction between individual functional modules. There are different approaches to integration testing; the most commonly used is regression testing, which is carried out incrementally each time a new module is completed and added to the software system.
- Systems test. This is the final stage of software testing, conducted when the entire development process is complete. In practice, the systems test is actually a series of tests with different purposes that collectively seek to verify the functionality of the database system as a whole.
The four commonly used systems tests are a recovery test (to verify the ability of the system to restore itself to its state before a failure), a security test (to verify the ability of the system to protect itself from unauthorized use), a stress test (to examine the response of the system to abnormal events such as excessive interruptions, an excessive number of users, and maximum memory use), and a performance test (to monitor the functioning of the system at run-time). As the systems test is carried out to determine whether the final system is functionally acceptable for implementation, it is also commonly called the acceptance test.
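As a small illustration of the unit-test tier described above, the following sketch exercises a single hypothetical functional module (a degrees-minutes-seconds conversion routine invented for this example) using Python's standard unittest framework. Under the regression-testing approach mentioned above, such tests would be rerun each time a new module is added to the system.

import unittest

def dms_to_decimal(degrees: int, minutes: int, seconds: float) -> float:
    """Hypothetical functional module: convert a degrees-minutes-seconds
    coordinate to decimal degrees."""
    sign = -1 if degrees < 0 else 1
    return sign * (abs(degrees) + minutes / 60 + seconds / 3600)

class TestDmsToDecimal(unittest.TestCase):
    """Unit test targeting one module, run as soon as the source-level
    code has been reviewed and checked for correct syntax."""

    def test_positive_coordinate(self):
        self.assertAlmostEqual(dms_to_decimal(45, 30, 0.0), 45.5)

    def test_negative_coordinate(self):
        self.assertAlmostEqual(dms_to_decimal(-122, 19, 48.0), -122.33)

if __name__ == "__main__":
    unittest.main()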

Once fully operational, the GIS has to be validated in accordance with the validation protocols agreed in the system qualification plan. Modular tests are conducted by team members who have not taken part in the coding operations. Tests have to conform to the verification test procedures, and the results are reported in a verification test report. On the basis of that report, a final tuning of the information system specifications is then carried out by the GIS project team. This tuning may result in improved versions of the requirements baseline, the technical specifications, the design justification report, the data model report, and the system qualification plan.

1.31.2.4.12.7 Standards

Identification and characterization of quality as a basis for quality management procedures is helped by reference to accepted standards such as the following.

The family of ISO technical specifications 191xx (as of 2016, some 68 different specifications have been passed; hence the "xx"). The International Organization for Standardization (ISO) is an independent, nongovernmental organization whose members are the standards organizations of its 162 member countries. Its technical committee 211 has been tasked with the development of technical specifications for digital geographic information, covering areas such as:

- Simple features access.
- Reference models.
- Spatial and temporal schemas.
- Location-based services.
- Metadata.
- Web feature and map services.
- Classification systems.

The Open Geospatial Consortium (OGC) is an independent standards organization with participation of industry and government organizations, and coordination with professional associations and other standards organizations. The OGC has established standards for spatial data format, data classification, geospatial services and applications, and GIS operational practices. The OGC standards have been adopted by many GIS software and database companies and by user organizations. As of 2016, the OGC has published some 49 standards including CityGML, GML, NetCDF, and the plethora of web services such as WCS, WFS, WMS, WPS that are now the foundation of all web-based geospatial applications.

The Federal Geographic Data Committee (FGDC) is a U.S. Federal government organization that develops and approves standards for GIS data and metadata quality, format, content, and classification, and for related GIS data collection and maintenance practices. Similar organizations exist in Europe (INSPIRE), Canada (CGDI), Australia/New Zealand (ANZLIC), and elsewhere. The Urban and Regional Information Systems Association (URISA) is an international professional and educational association that promotes standards and best practices for the management and use of GIS technology. Of particular relevance to this article is its GIS Management Institute (GMI), which develops tools and best practices useful for GIS planning and management.

1.31.2.4.12.8 Ethical responsibility in publicly financed projects

While the topic of ethics will be discussed in more general terms in section "Ethics", this section deals with the need for an extra layer of monitoring and control in publicly financed projects. This starts with the procurement process, where the bidding process may involve criteria that go beyond the scope of the GIS project itself, e.g., minimum wage requirements or a preference for women- and minority-owned enterprises. Lobbying rules and other conflict-of-interest declarations put an additional burden on the monitoring requirements, sometimes dismissively labeled as "red tape".

1.31.2.4.13 Deployment (technology roll-out activities)

After the deliverables have been physically constructed and accepted by the customer, a phase review is carried out to determine whether the project is complete and ready for closure.

1.31.2.4.13.1 System engineering
During this task, the GIS project team completes all developments in accordance with the refined technical specifications and data model report. This includes the final coding of software modules and their final integration into an operational GIS package. Besides the GIS package itself, it is the project manager's responsibility to also refine the design definition report and the data model implementation report. Implementing a comprehensive GIS requires a large investment in staffing resources and in funding; a critical success factor is therefore that the implementation be recognized, and managed, as a major information technology project in order to achieve optimal benefits and results.

1.31.2.4.14 Closeout and evaluation

1.31.2.4.14.1 Development and testing
The objective of this phase is to complete the development of the geographical information system and to demonstrate compliance with the consolidated user requirements refined during the prototyping phase.

1.31.2.4.14.2 Support (user documentation, training, technology transfer)
Prior to its installation at the users' premises, the information system has to be adequately documented. This documentation includes all the documents mentioned above, extended by the documents described hereafter.

1.31.2.4.14.2.1 Installation guide
The installation guide describes all operations needed to install and configure the information system at the users' premises. It specifies, in particular, the minimum hardware requirements and the versions of operating platforms. Another important item at this stage is the need to pay attention to the implementation of the data model in the GIS software.

1.31.2.4.14.2.2 Software user manual
The software user manual describes all the functionalities of the software elements of the GIS package and has to be written in such a way that it is understandable by the users' organizations.

1.31.2.4.14.2.3 Manual of procedures
The manual of procedures describes all procedures which support the operations routinely performed by the information system, especially those which require human intervention. These include:
• Overall system administration procedures, including back-up and archiving procedures, maintenance operations, the configuration of new users, etc.
• Data diffusion policy and procedures (including pricing policy, if any).
• Quality control procedures.
• Financial procedures.
• Documentation procedures.
• Annual performance review procedures.

1.31.2.4.14.2.4 Training plan
The training plan specifies the human skills which are required to operate the information system properly and the training efforts to be undertaken so that the staff of the users' organizations matches these skill requirements. It describes the different training modules which are required and, for each of them, the number of hours to be lectured, the list of staff members who will attend the module, and a preliminary investigation of potential training service providers.


1.31.2.4.14.3 Closing-out contracts
Contractual closeout includes identification and status of each project contract and subcontract, their values, and their terms and conditions. The contract status should include any incomplete deliverables; terms, conditions, and dates for obtaining remaining deliverables; real and potential claims; pending and any ongoing legal actions; warranties made as part of the contract; and any other information that might prove useful to the user organization in relation to legal, contractual, warranty, or deliverable matters.

Before the contract is closed, any minor items that need to be repaired or completed are placed on a punch list of all the items that still remain to be done. The project team will then work through the items on the list, building a small schedule to complete the remaining work. As the punch list becomes smaller, the project manager begins closing down the project, maintaining only enough staff and equipment to support the team that is working on the punch list. In many public projects, a major closeout component is verifying regulatory compliance, which ascertains that no additional active management is needed. The development and execution of closing-out contracts may take as much time as the project itself because, once signed, a closing-out contract usually does not allow a client to request any changes or repairs, and it serves as protection against legal recourse once the project is finished. Signing of the closing-out contract usually coincides with the last payment as well.

1.31.2.4.14.4 Post-project evaluations
In many cases when a project is completed, it is time to start thinking about its upgrade and enhancement. The purpose of the post-implementation evaluation is to summarize the experiences gained in the project just completed so that the same mistakes will not be repeated in the next version of the database system and in other similar projects. In essence, a post-implementation evaluation is a comparison between the predicted events and the events that actually occurred during the entire course of the project's lifecycle.

Experience has shown that projects rarely run exactly as planned and scheduled. Variances between planned and actual events can generally be traced to two primary causes, namely technical and personnel. Thus, a post-implementation evaluation report is best approached from these two perspectives. It is common practice for the project sponsor to analyze the performance of project personnel, while the project manager is responsible for the evaluation of the performance of equipment and applications. In conducting the technical portion of the evaluation, the project manager should pay special attention to the problems encountered and the solutions that were used to deal with them successfully. Upon completion of the evaluation report, the project manager should submit copies to the project sponsor and senior management to signify the completion of the entire project, and distribute copies of the report to all project staff.

1.31.2.4.14.5 Project archival
The documents associated with the project must be stored in a safe location where they can be retrieved for future reference. Signed contracts or other documents that might be used in tax reviews or lawsuits must be stored. Organizations will have legal document storage and retrieval policies that apply to project documents and must be followed. Some project documents can be stored electronically, and care should be taken to store documents in a form that can be recovered easily.
If the documents are stored electronically, standard naming conventions should be used so documents can be sorted and grouped by name. If documents are stored in paper form, the expiration date of the documents should be determined so they can be destroyed at some point in the future. The following are documents that are typically archived:
• Charter documents.
• Scope statement.
• Original budget.
• Change documents.
• Manager's summary (lessons learned).
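As a minimal sketch of what such a naming convention might look like, consider the hypothetical pattern date_project_doctype_version; the convention itself, the document-type vocabulary, and the file names below are illustrative assumptions, not a prescribed standard.

```python
# A hypothetical archival naming convention:
#   <YYYYMMDD>_<project code>_<document type>_v<NN>.<ext>
# e.g., 20161125_gis-rollout_charter_v02.pdf
# Validating names at archive time keeps the store sortable by date,
# project, and document type.
import re

ARCHIVE_NAME = re.compile(
    r"^(?P<date>\d{8})_(?P<project>[a-z0-9-]+)_"
    r"(?P<doctype>charter|scope|budget|change|lessons)_v(?P<version>\d{2})\.\w+$"
)

def is_valid_archive_name(filename: str) -> bool:
    """Return True if the file name follows the archive convention."""
    return ARCHIVE_NAME.match(filename) is not None

assert is_valid_archive_name("20161125_gis-rollout_charter_v02.pdf")
assert not is_valid_archive_name("final_FINAL_v2(3).pdf")
```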

1.31.3 Stakeholder Management

People and organizations can have many different relationships to the project. Most commonly, these relationships can be grouped into those who will be impacted by the project and those who can impact the project. A project is successful when it achieves its objectives and meets or exceeds the expectations of the stakeholders. This insight is fairly new: the Project Management Institute, for example, added stakeholder management to its core list of knowledge areas only in the latest (fifth) edition of its project management guidebook.

Stakeholders are individuals who either care about or have a vested interest in the project. They are the people who are actively involved with the work of the project or have something to either gain or lose as a result of the project. In a project to add lanes to a highway, motorists are stakeholders who are positively affected, whereas residents who live near the highway are negatively affected, both during the project (construction noise) and after it, with far-reaching implications (increased traffic noise and pollution).

A successful project manager will identify stakeholders early in the project. For each stakeholder, it is important to identify what they want or need and what influence or power they have over the project. Based on this information, the need to communicate with the stakeholder or stakeholder group can be identified, followed by the creation of a stakeholder management plan.


Fig. 6 Project stakeholders: contractors and subcontractors, suppliers, government, top management, project team members, your manager, peers, resource managers, internal customers, and external customers. Adopted from Barron M and Barron A (2011) Project management for scientists and engineers. Galway: Connexions.

A stakeholder register is used to identify and track the interactions between the project and each stakeholder. This register must be updated on a regular basis, as new stakeholders can arise at any time, and the needs and interest levels of a particular stakeholder may change through the course of the project.

Often there is more than one major stakeholder in the project. An increase in the number of stakeholders adds stress to the project and influences the project's complexity level. The business or emotional investment of the stakeholders in the project and their ability to influence the project's outcomes or execution approach also influence the stakeholder complexity of the project. In addition to the number of stakeholders and their level of investment, the degree to which the project stakeholders agree or disagree influences the project's complexity.

A small commercial construction project, for example, will typically have several stakeholders. All the building permitting agencies, environmental agencies, and labor and safety agencies have an interest in the project and can influence its execution plan. The neighbors will have an interest in the architectural appeal, the noise, and the purpose of the building. The following section details the functions and relationships of the stakeholders depicted in Fig. 6.
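Before turning to those groups individually, the following minimal sketch shows one way a stakeholder register entry might be kept in code. The field names, the 1-5 scoring scale, and the example stakeholders are illustrative assumptions; no project management standard prescribes this exact layout.

```python
# A minimal sketch of a stakeholder register entry (field names illustrative).
from dataclasses import dataclass, field
from typing import List

@dataclass
class StakeholderEntry:
    name: str
    role: str                  # e.g., "internal customer", "supplier"
    power: int                 # 1 (low) .. 5 (high) influence over the project
    interest: int              # 1 (low) .. 5 (high) stake in the outcome
    needs: str = ""
    interactions: List[str] = field(default_factory=list)  # dated log entries

register = [
    StakeholderEntry("Planning department", "internal customer", power=4, interest=5),
    StakeholderEntry("Base-map data supplier", "contractor", power=2, interest=3),
]

# The register is revisited regularly: new stakeholders can appear at any
# time, and power/interest scores drift over the project lifecycle.
register[0].interactions.append("2016-11-25: reviewed draft data model")
```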

1.31.3.1 Stakeholders

1.31.3.1.1 Top management

Top management may include the chief executives, directors, division managers, and others. These people direct the strategy and development of the organization. If one has top management support, it will be easier to recruit the best staff to carry out the project and to acquire needed material and resources; in addition, the visibility can enhance a project manager's professional standing in the company. This comes with a risk, though: failure can be quite dramatic and visible to all, and if the project is large and expensive (most are), the cost of failure will be more substantial than for a smaller, less visible project.

1.31.3.1.2 Project team

The project team is made up of those people dedicated to the project or borrowed on a part-time basis. A project manager has to provide leadership, direction, and above all, support to team members as they go about accomplishing their tasks. Working closely with the team to solve problems helps to build rapport. Team management is not an easy task; some of the difficulties that many projects run into include:
• If project team members are borrowed and they don't report to the project manager, their priorities may be elsewhere. They may be juggling many projects as well as their full-time job and have difficulty meeting deadlines.
• Personality conflicts may arise. These may be caused by differences in social style or values, or they may be the result of some bad experience when people worked together in the past.
• If communication breaks down, one may learn about missed deadlines only when it is too late to recover.

1.31.3.1.3 Internal customers

Internal customers are individuals within the organization who are customers for projects that meet the needs of internal demands. The customer holds the power to accept or reject the project outcomes. Early in the relationship, the project manager will need to negotiate, clarify, and document project specifications and deliverables. After the project begins, the project manager must stay tuned in to the customer's concerns and issues and keep the customer informed. Common stumbling blocks when dealing with internal customers include:
• A lack of clarity about precisely what the customer wants.
• A lack of documentation for what is wanted.
• A lack of knowledge of the customer's organization and operating characteristics.
• Unrealistic deadlines, budgets, or specifications requested by the customer.
• Hesitancy of the customer to sign off on the project or accept responsibility for decisions.
• Changes in project scope.

1.31.3.1.4 Contractors and suppliers

There are times when organizations don't have the expertise or resources available in-house, and work is farmed out to contractors or subcontractors. This can be a construction management foreman, network consultant, electrician, carpenter, architect, or anyone who is not an employee. Managing contractors or suppliers requires many of the skills needed to manage full-time project team members. Any number of problems can arise with contractors or subcontractors:
• Quality of the work.
• Cost overruns.
• Schedule slippage.

Many projects depend on goods provided by outside suppliers. This is true, for example, of construction projects, where lumber, nails, bricks, and mortar come from outside suppliers. If the supplied goods are delivered late, are in short supply or of poor quality, or if the price is greater than originally quoted, the project may suffer. Depending on the project, managing contractor and supplier relationships can consume more than half of the project manager's time. Such management is not purely intuitive; it involves a sophisticated skill set that includes managing conflicts, negotiating, and other interpersonal skills.

1.31.3.2 Politics of Projects

Many times, project stakeholders have conflicting interests. It is the project manager's responsibility to understand these conflicts and try to resolve them. It is also the project manager's responsibility to manage stakeholder expectations. Be certain to identify and meet with all key stakeholders early in the project to understand all their needs and constraints.

Project managers are somewhat like politicians. Typically, they are not inherently powerful or capable of imposing their will directly on coworkers, subcontractors, and suppliers. Like politicians, if they are to get their way, they have to exercise influence effectively over others. On projects, project managers have direct control over very few things; therefore, their ability to influence others, that is, to be good politicians, may be very important. A good rule for the project politician is that, when in doubt, stakeholder conflicts should be resolved in favor of the customer.

1.31.3.3 Culture of Stakeholders

When project stakeholders do not share a common culture, project management must adapt its organization and work processes to cope with cultural differences. Communication is perhaps the most visible manifestation of culture; project managers encounter cultural differences in communication in language, context, and candor.

Language is clearly the greatest barrier to communication. When project stakeholders do not share the same language, communication slows down and is often filtered to share only information that is deemed critical. This barrier can hamper project execution, where the quick and accurate exchange of ideas and information is critical. The interpretation of information reflects the extent to which context and candor influence cultural expressions of ideas and the understanding of information. In some cultures, an affirmative answer to a question does not always mean yes, which can create confusion on a project whose stakeholders represent more than one culture.

1.31.3.4 Strategies for Stakeholder Management

Securing stakeholders' support is absolutely critical to the success of a project. Even if all the deliverables are met and the objectives are satisfied, if the key stakeholders aren't happy, the project will be deemed a failure.

The first step, therefore, is to identify who the stakeholders are. Just because someone is important in the organization does not necessarily mean she is important to the project. The typical suspects are the manager, her boss, the client, the client's manager, any subject matter expert whose involvement is necessary, and the board reviewing and approving the project. In some situations, there are people who think they are stakeholders; from the manager's perspective they may not be, but they need to be handled carefully, as they could be influential with those who have the power to impact the project, and they should not be dismissed out of hand.

Second, a project manager needs to determine what power the stakeholders have and what their intentions toward the project are. Do they have the power to have an impact on the project? Are they in support of or in opposition to the project?

Third, what is the relationship among stakeholders? Can the project's chances be improved by working with those who support the project, co-opting them to improve the views of those who oppose it? A key piece of any stakeholder management effort is constant communication with the stakeholders.
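The power and intention questions above are often organized as a power-interest grid. A minimal sketch of one common convention follows; the thresholds, scoring scale, and strategy labels are illustrative assumptions rather than a fixed standard.

```python
# A sketch of the classic power-interest grid used to prioritize
# stakeholder communication (thresholds and labels are one convention).
def engagement_strategy(power: int, interest: int) -> str:
    """Map 1-5 power/interest scores to a communication strategy."""
    high_power, high_interest = power >= 3, interest >= 3
    if high_power and high_interest:
        return "manage closely"   # key players, e.g., the project sponsor
    if high_power:
        return "keep satisfied"
    if high_interest:
        return "keep informed"
    return "monitor"

print(engagement_strategy(power=5, interest=5))  # -> manage closely
print(engagement_strategy(power=1, interest=4))  # -> keep informed
```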


1.31.4 Project Management Expertise

1.31.4.1 Application Knowledge

Application-specific knowledge and skills pertaining to a particular GIS project are not particularly important at the management level. The project manager can usually pick up relevant knowledge and skills through interaction with application specialists and user representatives as the project unfolds. Technical competencies and skills, on the other hand, are essential throughout the entire project lifecycle, and especially so in the project execution phase. Such competencies and skills are project-specific because different types of projects require different technical skill sets. A project manager is not normally expected to be a technical expert. However, it is essential for the project manager to have, as a minimum, a good understanding of the basic principles of GIS.

It is useful to consider application knowledge from another perspective by taking technical competencies and skills into consideration. Fig. 7 shows the role of technical competencies and skills in project management. Project management competencies and skills are classified into three categories according to their respective nature, namely strategic, tactical, and technical. Strategic competencies and skills cover mainly the two areas of "organizational culture" and "communication", which are both described in the following sections; they are applied mainly in the initiation and closing phases of the project lifecycle. The tactical competencies and skills discussed in section "Stakeholder Management" are needed most for the planning and monitoring and control phases of the project management lifecycle (PMLC).

1.31.4.2 Understanding the Project Environment

There are many factors that need to be acknowledged in the project environment. At one level, it is important to think in terms of cultural and social environments such as people, demographics, and education. Many GIS projects take place in an international context, where projects are doomed to fail if they do not heed political and cultural influences. Of all the factors, the physical ones are the easiest to understand; it is the cultural and international factors that are often misunderstood or ignored. How we deal with clients, customers, or project members from other countries can be critical to the success of the project. For example, the culture of the United States values accomplishments and individualism. Americans tend to be informal and call each other by first names, even if having just met. Europeans tend to be more formal, using surnames instead of first names in a business setting, even if they know each other well. In addition, their communication style is more formal than in the United States, and while they tend to value individualism, they also value history, hierarchy, and loyalty. Harvey (1997) did a fascinating study on the differences in the culture of GIS implementation between Münster, Germany, and Seattle, WA (United States).

Cultural differences of a very different kind also occur within (larger) organizations. All projects inevitably involve changes in the organizational culture, institutional structure, business processes, and people, and it is the responsibility of the project manager to steer the organization through the transition with minimum disruption to the organization's business, the least anxiety for its staff, at the lowest possible cost, and within the shortest realistic time frame.

1.31.4.3 Management Knowledge and Skills

Project management is the responsibility of a project manager. This individual seldom participates directly in the activities that produce the GIS, but rather strives to maintain the progress and productive interaction of the various parties in order to minimize the overall risk of failure. A project manager is expected to have a specific level of competency and skills in technical management, financial management, and people management in order to accomplish this objective.

Fig. 7 Project management competencies and skills: strategic, tactical, and technical competencies mapped against the initiation, planning, execution, monitoring/control, and evaluation/close-out phases.

In many organizations, the project manager is required to possess a professional designation from an accreditation body, such as the PMI noted above. Although a professional designation for a manager is by no means a panacea for the success of a project, such a requirement at least reflects the general realization that a set of managerial competencies and skills is essential for the practice of effective project management. Typical project manager job functions include:
• Define the scope of the project
• Identify stakeholders, decision-makers, and escalation procedures
• Develop a detailed task list (work breakdown structure; a minimal sketch follows this list)
• Estimate time requirements
• Identify required resources and budget
• Evaluate project requirements
• Identify and evaluate risks
• Prepare a contingency plan
• Identify interdependencies
• Identify and track critical milestones
• Participate in project phase reviews
• Secure needed resources
• Manage the change control process
• Report project status
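As referenced in the task-list item above, a work breakdown structure is essentially a tree whose leaves carry effort estimates that can be rolled up for budgeting. The sketch below uses purely illustrative task names and hours.

```python
# A minimal sketch of a work breakdown structure (WBS) with rolled-up
# effort estimates; task names and hours are purely illustrative.
wbs = {
    "1 Data model": {"1.1 Requirements workshop": 16, "1.2 Schema design": 40},
    "2 Pilot application": {"2.1 Geocoding service": 24, "2.2 Web map": 32},
}

def total_hours(node) -> float:
    """Sum leaf estimates; branches are dicts, leaves are hours."""
    if isinstance(node, dict):
        return sum(total_hours(child) for child in node.values())
    return float(node)

for phase, tasks in wbs.items():
    print(phase, total_hours(tasks), "h")
print("project total:", total_hours(wbs), "h")
```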

The New York State Project Management Guidebook (Mulholland, 2003) identifies five core sets of competencies and skills for project managers. These include:

1.31.4.3.1 Communication

Good communication skills are a critical requirement for project managers. Project managers spend 90% of their time communicating. Communication in project management is bidirectional in the sense that talking and listening are equally important. The project manager must be able to convey his or her messages clearly both verbally and in writing. At the same time, he or she must be willing to listen to suggestions and ideas put forth by project sponsors, project team members, and other stakeholders of the project both within and outside of the organization. Project communication can thus be summed up as knowing “who needs what information and when” and making sure they have it.

1.31.4.3.2 Trust building

Project management can be a very difficult task if the project manager is unable to gain the trust of the project sponsor, project staff, and other stakeholders. Trust and credibility cannot be built overnight. They must be developed over time and can be inspired only if the project manager exhibits behaviors compatible with the competency and skill requirements described above, is willing to admit mistakes and accept responsibility for actions, values differences and diversity of opinions and cultures, and treats everyone equally and equitably.

1.31.4.3.3 Leadership

Leadership is the ability to motivate and inspire individuals to work toward expected results. Leaders inspire vision and rally people around common goals. A good project manager can motivate and inspire the project team to see the vision and value of the project, and can inspire the team to find a solution to overcome perceived obstacles to get the work done.

Experience has shown that project teams seldom become high performing immediately after their formation. It takes time and effort to build an effective and coherent project team. As the team leader, the project manager should continuously motivate members by letting them know the benefits and potential opportunities of the experience and skills to be gained in the project. He or she must provide team members with the necessary training to perform their assigned tasks effectively, and recognize their efforts and accomplishments appropriately. Team leadership also means delegation and empowerment, where necessary, to increase the sense of belonging among team members. At the same time, team leadership never ignores accountability and discipline, which will be applied impartially, transparently, and promptly when and if the situation warrants them.

1.31.4.3.4 Problem solving

While a project manager has considerable responsibility for the success of a project, he or she does not always have absolute authority and control over human, financial, and technical resources allocated to the project. Therefore, it is essential for a project manager to be politically astute, and have good networking and negotiation skills to deal with senior managers and all stakeholders to ensure appropriate and sustainable support for the project. It is also important for a project manager to have good mediation skills to resolve disputes or conflicts among members of the project team as well as those between the project team and other stakeholders of the project.

1.31.4.4 Certification

Certification is a process designed to recognize individuals who have demonstrated a level of expertise in their profession. Although there are specialized certifications for a number of disciplines (e.g., intelligence, photogrammetry, surveying), the GIS Certification Institute's (GISCI) designation of GIS Professional (GISP) is the most widely accepted industry-wide, internationally recognized, software-agnostic certification available to geospatial professionals. There are, as of 2016, more than 8000 GISP-certified professionals in 25 countries. The GISP helps employers identify professionals who are committed to the skilled and ethical use of GIS.

GISPs have a unique set of skills and responsibilities that can enhance an employer's workforce. In contrast to a degree, the GISP certification emphasizes regular renewal intervals that require a combination of work experience, education, and contributions to the GIS profession. The value of the certification is reflected in the fact that GISPs command, on average, about 25% higher salaries, all other qualifications being equal.

1.31.4.5 Ethics

Good ethics is good business (Jakobs, 2017). In the United States, both the PMI and the GISCI require their members to sign a Code of Ethics and Professional Conduct. GIS professionals are expected to deliver quality work; part of this is addressed by the certifications discussed in the previous section. This includes the need for professionals to keep up to date in the field through readings and professional development. Each member of the project team is expected to identify risks and the potential to reduce them. Many of the characteristics of project management are mirrored in the expectations of the individuals working on the project. They include the identification of alternative strategies to reach employer/funder goals and the implications of each, as well as the requirement to document one's work so that others can use it (see also section "Project documentation and control"). Ethical behavior requires GIS professionals to hold information confidential unless authorized to release it. Any conflict of interest ought to be avoided or, where this is not possible, disclosed.

1.31.5 Summary

GIS project management is somewhat of a neglected child in the academic treatment of geospatial work. Management practices are seen as just that: applications that do not merit further investigation. The fact that business schools exist and thrive hints at the gaping hole that GIScientists have left uncovered. One of the main messages of this article is that we need to look beyond the immediate application of GIS and see GIS implementations in a larger enterprise context. Much can be derived from the body of traditional management literature. But even here, the bon mot "spatial is special" holds true. It starts with the characteristics of spatial data and continues with the unusual breadth of application areas that require generalists and specialists to collaborate even more than in standard information technology.

Most college-level GIS programs teach the technology and the science behind it, which serves the science community well, and the industry adequately, up to a point. GIS certificates, both academic and vendor-driven, abound, addressing the needs of the bottom rung of a hierarchy of GIS professionals. There is not, or at least has not been, a corresponding body of knowledge for higher-level GIS professionals or managers, which is surprising given that there are thousands of GIS departments in the United States alone. A systematic investigation of what makes a GIS program successful (the URISA Awards for Exemplary Systems in Government come to mind) requires analyzing and abstracting all the aspects of GIS project management discussed on the previous pages. A compendium of best practices, and how they might even inform GIS use in more traditional academic settings, has yet to be written. In the meantime, this article will hopefully shed some light on the complexities that experienced GIS managers have learnt to deal with.

References

Barron, M., Barron, A., 2011. Project management for scientists and engineers. Connexions, Galway.
Burek, P., 2011. Influence of the scope statement on the WBS. Project Management Institute, Newtown Square.
Damodaran, A., 2007. Strategic risk taking: A framework for risk management. Prentice-Hall, Upper Saddle River.
Harvey, F., 1997. National cultural differences in theory and practice. Information, Technology, and People 10 (2), 132–146.
Holton, G., 2004. Defining risk. Financial Analysts Journal 60 (6), 19–25.
Huxhold, W., 1996. Managing geographic information systems projects. Oxford University Press, New York.
Jakobs, R., 2017. Good business: Why placing ethics over profits pays off. Philips Blog. http://www.philips.com/a-w/innovationmatters/blog/good-business-why-placing-ethics-over-profits-pays-off.html (Accessed 25 November 2016).
Mulholland, N. (Ed.), 2003. New York State project management guidebook. New York State Office for Technology, Albany.
NETSSAF, 2008. Proceedings, Network for the Development of Sustainable Approaches for Large Scale Implementation of Sanitation in Africa Conference, 24–27 September 2008, Ouagadougou, Burkina Faso. http://www.ircwash.org/resources/netssaf-international-conference-pathways-towards-sustainable-sanitation-africa-24-27 (Accessed 23 May 2016).
Peters, P., 2008. Building a GIS. ESRI Press, Redlands.
Pressman, R., 2005. Software engineering: A practitioner's approach, 6th edn. McGraw-Hill, New York.
Project Management Institute (PMI), 2013a. A guide to the project management body of knowledge (commonly referred to as the PMBOK Guide), 5th edn. Project Management Institute, Newtown Square.
Project Management Institute (PMI), 2013b. The standard for program management. Project Management Institute, Newtown Square.
Project Management Institute (PMI), 2013c. The standard for portfolio management. Project Management Institute, Newtown Square.
Shi, Q., Chen, J., 2006. The human side of project management: Leadership skills. Project Management Institute, Newtown Square.
Sommerville, I., 2005. Software engineering, 7th edn. Addison-Wesley, Boston.
Standish Group, 2016. The CHAOS report 2015. Standish Group, Boston.
Tomlinson, R., 2005. Thinking about GIS: Geographic information system planning for managers. ESRI Press, Redlands.
URISA Journal, 2015. Special issue on return on investment in GIS. URISA Journal 27 (1), 13–46. Urban and Regional Information Systems Association (URISA), Des Plaines.
Wang, R., 1998. A product perspective on total data quality management. Communications of the ACM 41 (2), 58–65.
Wysocki, R.K., Beck Jr., R., Crane, D.B., 2003. Effective project management, 3rd edn. John Wiley & Sons, New York.

COMPREHENSIVE GEOGRAPHIC INFORMATION SYSTEMS

VOLUME 2: GIS APPLICATIONS FOR ENVIRONMENT AND RESOURCES

EDITOR IN CHIEF
Bo Huang, The Chinese University of Hong Kong, Hong Kong

VOLUME EDITORS
Georg Bareth, University of Cologne, Cologne, Germany
Chunqiao Song, University of California, Los Angeles, CA, United States
Yan Song, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States


EDITOR IN CHIEF

Bo Huang
Dr. Bo Huang is a professor in the Department of Geography and Resource Management, The Chinese University of Hong Kong, where he is also the Associate Director of the Institute of Space and Earth Information Science (ISEIS). Prior to this, he held faculty positions at the University of Calgary, Canada, and the National University of Singapore. He has a background and experience in diverse disciplines, including urban planning, computer science, Geographic Information Systems (GIS), and remote sensing. His research interests cover most aspects of GIScience, specifically the design and development of models and algorithms in spatial/spatiotemporal statistics, remote sensing image fusion, and multiobjective spatial optimization, and their applications in environmental monitoring and sustainable land use and transportation planning. The Geographically and Temporally Weighted Regression (GTWR) model (available via his ResearchGate), which he developed in 2010, has now been used in a wide range of areas, including economics, environment, geography, and urban planning. Dr. Huang serves as the Asia-Pacific Editor of the International Journal of Geographical Information Science (Taylor & Francis), the Executive Editor of Annals of GIS (Taylor & Francis), and the Chief Scientist of the Joint Laboratory of Smart Cities (Beijing). He was awarded a Chang Jiang Chair Professorship in 2016 by the Ministry of Education of PR China.


VOLUME EDITORS

Georg Bareth
Georg Bareth studied Physical Geography at the University of Stuttgart and graduated in 1995. From 1996 to 1999 he received a PhD scholarship from the German Research Foundation (DFG) and worked on his thesis "Emissions of Greenhouse Gases from Agriculture – Regional Presentation and Estimation for a Dairy Farm Region by Using GIS" at the University of Hohenheim. In 2004, he habilitated in Agroinformatics, and since 2004 he has held a professorship for Geoinformatics at the University of Cologne.

Kai Cao
Kai Cao is a lecturer in the Department of Geography at the National University of Singapore (NUS), an affiliated researcher in the Institute of Real Estate Studies, a research associate in the Center for Family and Population Research, and a member of the steering committee of the Next Age Institute at NUS. He is serving on the Board Committee and as the chair of the Newsletter Committee in the International Association of Chinese Professionals in Geographic Information Science. He had also been a member of the National Geographic's Committee for Science and Exploration for one year. He obtained his BSc degree in Geography (Cartography and Geographic Information Science) and MPhil degree in Geography (Remote Sensing and Geographic Information Science) from Nanjing University in China, and his PhD degree in Geography from The Chinese University of Hong Kong. Prior to joining the Department of Geography at NUS, he had worked in the Center for Geographic Analysis at Harvard University, in the Department of Geography at the University of Illinois at Urbana–Champaign, and in the World History Center at the University of Pittsburgh, respectively. He was also a visiting research scholar in the Department of Human Geography and Spatial Planning at Utrecht University in 2009, and a visiting scholar in the Center for Spatial Studies and Department of Geography at the University of California, Santa Barbara (UCSB) in 2012. Dr. Kai Cao specializes in GIScience, spatial simulation and optimization, urban analytics, and spatially integrated social science. He has published numerous internationally refereed journal articles, book chapters, and conference papers in his field and has also been a guest editor of a special issue of the International Journal of Geographical Information Science on the topic of "Cyberinfrastructure, GIS and Spatial Optimization", together with Dr. Wenwen Li from Arizona State University and Prof. Richard Church from UCSB.


Tom Cova

Tom Cova is a professor of Geography at the University of Utah and director of the Center for Natural and Technological Hazards. He received a BS in Computer Science from the University of Oregon and an MA and PhD in Geography from the University of California, Santa Barbara where he was an Eisenhower Fellow. Professor Cova’s research and teaching interests are environmental hazards, emergency management, transportation, and geographic information science (GIScience). His initial focus was regional evacuation modeling and analysis, but this has since been expanded to include emergency preparedness, public warnings, and protective actions. He has published in many leading GIS, hazards, and transportation journals including the International Journal of Geographical Information Science (IJGIS), Transactions in GIS, Computers, Environment and Urban Systems, Transportation Research A and C, Natural Hazards, Geographical Analysis, Natural Hazards Review, and Environment and Planning A. His 2005 paper in Natural Hazards Review resulted in new standards in the United States for transportation egress in fire-prone regions (National Fire Protection Association 1141). Concepts drawn from his 2003 paper on lane-based evacuation routing in Transportation Research A: Policy and Practice have been used in evacuation planning and management worldwide, most notably in the 2012 Waldo Canyon Fire evacuation in Colorado Springs. Professor Cova was a coinvestigator on the National Center for Remote Sensing in Transportation (NCRST) Hazards Consortium in 2001–04. Since then most of the support for his research has been provided by the National Science Foundation on projects ranging from evacuation versus shelter-in-place in wildfires to the analytical derivation of warning trigger points. He chaired the GIS Specialty Group for the Association of American Geographers in 2007–08 and the Hazards, Risks and Disasters Specialty Group in 2011–12. In 2008 he served as program chair for the International Conference on Geographical Information Science (GIScience, 2008) in Park City, Utah. He was a mentor and advisor for the National Science Foundation project “Enabling the Next Generation of Hazards Researchers” and is a recipient of the Excellence in Mentoring Award from the College of Social & Behavioral Science at the University of Utah.

Elisabete A. Silva
Elisabete Silva, BA, MA (Lisbon), PhD (Massachusetts), MRTPI, is a University Senior Lecturer (Associate Professor) in Spatial Planning and a Fellow and DoS of Robinson College, University of Cambridge, UK. Dr. Silva has a research track record of 25 years, both in the public and private sectors. Her research interests are centered on the application of new technologies to spatial planning, in particular city and metropolitan dynamic modeling through time. The main subject areas include land use change, transportation and spatial plans and policy, the use of Geographic Information Systems (GIS), spatial analysis, and new technologies/models in planning (i.e., CA and ABM). She is the coauthor of the Ashgate book A Planners' Encounter With Complexity (2010) and The Routledge Handbook of Planning Research Methods (2014).

Chunqiao Song
Chunqiao Song received his BS degree from Wuhan University in 2008 and his MS degree from the Chinese Academy of Sciences in 2011, both in geographic information science. He received his PhD degree in geography from the Chinese University of Hong Kong in 2014. He is currently working as a researcher at the University of California, Los Angeles. His research focuses on developing applications of remote sensing and geographic information techniques in large-scale environmental monitoring and process modeling. It aims to contribute to the development of novel scientific, theoretical, and methodological aspects of geoinformatics techniques to understand how key environmental elements (e.g., water, ice, and ecosystems) respond to a changing climate and human intervention in High Mountain Asia and worldwide. His current research includes (1) developing high-resolution satellite-based lake hydrographical datasets that are available at the global scale, and (2) understanding lake water storage dynamics and the associated hydrological and cryospheric processes on the Tibetan Plateau (Earth's "Third Pole") and in high mountainous regions. He is the author of more than 50 primary research articles, reviews, and book chapters in the hydrological, remote sensing, ecological, and environmental fields.


Yan Song
Yan Song is a full professor in the Department of City and Regional Planning and director of the Program on Chinese Cities at the University of North Carolina at Chapel Hill. Dr. Song's research interests include low-carbon and green cities, plan evaluation, land use development and regulations, spatial analysis of urban spatial structure and urban form, land use and transportation integration, and how to accommodate research in the above fields by using planning support systems such as GIS, big data, and other computer-aided planning methods and tools.

Ming-Hsiang Tsou
Ming-Hsiang (Ming) Tsou is a professor in the Department of Geography, San Diego State University (SDSU), and the founding director of the Center for Human Dynamics in the Mobile Age (HDMA) (http://humandynamics.sdsu.edu/). He received a BS (1991) from the National Taiwan University, an MA (1996) from the State University of New York at Buffalo, and a PhD (2001) from the University of Colorado at Boulder, all in Geography. His research interests are in human dynamics, social media, big data, visualization and cartography, web GIS, high-performance computing (HPC), mobile GIS, and K-12 GIS education. He is a coauthor of Internet GIS, a scholarly book published in 2003 by Wiley, and served on the editorial boards of the Annals of GIS (2008–), Cartography and GIScience (2013–), and the Professional Geographer (2011–). Tsou was the chair of the Cartographic Specialty Group (2007–08) and the chair of the Cyberinfrastructure Specialty Group (2012–13) in the Association of American Geographers (AAG), and the cochair of the NASA Earth Science Enterprise Data System Working Group (ESEDWG) Standard Process Group (SPG) from 2004 to 2007. He has served on two US National Academy of Science committees: "Research Priorities for the USGS Center of Excellence for Geospatial Information Science" (2006–07) and "Geotargeted Alerts and Warnings: A Workshop on Current Knowledge and Research Gaps" (2012–13). In 2010, Tsou was awarded a $1.3 million research grant funded by the National Science Foundation and served as the principal investigator (PI) of the "Mapping Ideas from Cyberspace to Realspace" (http://mappingideas.sdsu.edu/) research project (2010–14). This NSF-CDI project integrates GIS, computational linguistics, web search engines, and social media APIs to track and analyze publicly accessible websites and social media (tweets) for visualizing and analyzing the diffusion of information and ideas in cyberspace. In Spring 2014, Tsou established a new research center, Human Dynamics in the Mobile Age (HDMA), a transdisciplinary research area of excellence at San Diego State University, to integrate research from GIScience, public health, social science, sociology, and communication. Tsou is the founding director of the HDMA Center. In Fall 2014, Tsou received an NSF Interdisciplinary Behavioral and Social Science Research (IBSS) award for "Spatiotemporal Modeling of Human Dynamics Across Social Media and Social Networks" (Award #1416509, $999,887, 2014–18, http://socialmedia.sdsu.edu/). This large interdisciplinary research project studies human dynamics across social media and social networks, focusing on information diffusion modeling over time and space and the connection between online activities and real-world human behaviors (including disaster evacuation, vaccine exemption, etc.). Tsou is also involved with several GIS education projects for K-12 and higher education. He has served on the AP GIS&T course advisory board at the AAG, as a senior researcher in the National GeoTech Center, and as the Geospatial Technology Coordinator in the California Geographic Alliance to promote GIS education in universities, community colleges, and high schools. Tsou has conducted professional GIS training workshops for GIS teachers annually at San Diego State University during the last 10 years (http://geoinfo.sdsu.edu/hightech/).


CONTRIBUTORS TO VOLUME 2

Patrícia Abrantes, Universidade do Porto, Porto, Portugal; and Institute of Geography and Spatial Planning of Universidade de Lisboa, Lisboa, Portugal
Thomas K Alexandridis, Aristotle University of Thessaloniki, Thessaloniki, Greece
Agamemnon Andrianopoulos, Interbalkan Environment Centre, Lagadas, Greece
Georg Bareth, University of Cologne, Cologne, Germany
Benjamin Bechtel, University of Hamburg, Hamburg, Germany
Todd K BenDor, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
Jan Blöthe, University of Bonn, Bonn, Germany
Jürgen Böhner, University of Hamburg, Hamburg, Germany
Tobias Bolch, University of Zurich, Zürich, Switzerland
Gabriele Buttafuoco, National Research Council of Italy (CNR-ISAFOM), Rende, Italy
Arnab Chakraborty, University of Illinois at Urbana-Champaign, Champaign, IL, United States
Yan Chen, University of North Carolina-Chapel Hill, Chapel Hill, NC, United States
Celena Cui, Colorado School of Mines, Golden, CO, United States
Brian Deal, University of Illinois at Urbana-Champaign, Champaign, IL, United States
Agathoklis Dimitrakos, Aristotle University of Thessaloniki, Thessaloniki, Greece; and Interbalkan Environment Centre, Lagadas, Greece
Lynne Falconer, University of Stirling, Stirling, United Kingdom
George Galanis, Interbalkan Environment Centre, Lagadas, Greece; and Aristotle University of Thessaloniki, Thessaloniki, Greece
Eduardo Gomes, Institute of Geography and Spatial Planning of Universidade de Lisboa, Lisboa, Portugal
Li Huan, City University of Hong Kong, Hong Kong, China
Zhengdong Huang, Shenzhen University, Shenzhen, China
He Jinliao, Nanjing University, Nanjing, China
Eleni Kalopesa, Interbalkan Environment Centre, Lagadas, Greece
Fotios Katsogiannos, Aristotle University of Thessaloniki, Thessaloniki, Greece; and Interbalkan Environment Centre, Lagadas, Greece
Wolfgang Korres, University of Cologne, Cologne, Germany
Chaosu Li, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
Xuejun Liu, Wuhan University, Wuhan, China
David Loibl, Humboldt University of Berlin, Berlin, Germany
Federica Lucà, National Research Council of Italy (CNR-ISAFOM), Rende, Italy
Austin Madson, University of California, Los Angeles, CA, United States
Andrew McMillan, University of Illinois at Urbana-Champaign, Champaign, IL, United States
Matthew D Minnick, Colorado School of Mines, Golden, CO, United States; and RESPEC Consulting & Services, Rapid City, SD, United States
Paulo Morgado, Institute of Geography and Spatial Planning of Universidade de Lisboa, Lisboa, Portugal
Alireza Motevalli, Tarbiat Modares University, Tehran, Iran
Jan-Christoph Otto, University of Salzburg, Salzburg, Austria
Haozhi Pan, University of Illinois at Urbana-Champaign, Champaign, IL, United States
Qisheng Pan, Texas Southern University, Houston, TX, United States
Kim Long Pham, University of Stirling, Stirling, United Kingdom
Hamid Reza Pourghasemi
Günther Prasicek, University of Salzburg, Salzburg, Austria
Lindsay Ross, University of Stirling, Stirling, United Kingdom
Karl Schneider, University of Cologne, Cologne, Germany
Lothar Schrott, University of Bonn, Bonn, Germany
Yongwei Sheng, University of California, Los Angeles, CA, United States
Chunqiao Song, University of California, Los Angeles, CA, United States
Trevor Telfer, University of Stirling, Stirling, United Kingdom
Oreste Terranova, Research Institute for Geo-Hydrological Protection, National Research Council of Italy (CNR-IRPI), Rende, Italy
Kristen A Vitro, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
Guido Waldhoff, University of Cologne, Cologne, Germany
Bev Wilson, University of Illinois at Urbana-Champaign, Champaign, IL, United States
Sierra Woodruff, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
Qiusheng Wu, Binghamton University, State University of New York, Binghamton, NY, United States
Huang Xianjin, Nanjing University, Nanjing, China
Zong Yueguang, Nanjing University, Nanjing, China
Mohsen Zabihi, Tarbiat Modares University, Tehran, Iran
George Zalidis, Interbalkan Environment Centre, Lagadas, Greece; and Aristotle University of Thessaloniki, Thessaloniki, Greece
Ming Zhang, University of Texas at Austin, Austin, TX, United States
Wendy Zhou, Colorado School of Mines, Golden, CO, United States
Youshan Zhuang, University of Illinois at Urbana-Champaign, Champaign, IL, United States

CONTENTS OF VOLUME 2

Editor in Chief
Volume Editors
Contributors to Volume 2
Contents of All Volumes
Preface

GIS for Biophysical Environment
2.01 GIS for Mapping Vegetation (Georg Bareth and Guido Waldhoff)
2.02 GIS for Paleo-limnological Studies (Yongwei Sheng, Austin Madson, and Chunqiao Song)
2.03 GIS and Soil (Federica Lucà, Gabriele Buttafuoco, and Oreste Terranova)
2.04 GIS for Hydrology (Wolfgang Korres and Karl Schneider)
2.05 GIS Applications in Geomorphology (Jan-Christoph Otto, Günther Prasicek, Jan Blöthe, and Lothar Schrott)
2.06 GIS for Glaciers and Glacial Landforms (Tobias Bolch and David Loibl)
2.07 GIS and Remote Sensing Applications in Wetland Mapping and Monitoring (Qiusheng Wu)

GIS for Resources
2.08 GIS for Natural Resources (Mineral, Energy, and Water) (Wendy Zhou, Matthew D Minnick, and Celena Cui)

GIS for Energy
2.09 GIS for Urban Energy Analysis (Chaosu Li)

GIS and Climate Change
2.10 GIS in Climatology and Meteorology (Jürgen Böhner and Benjamin Bechtel)
2.11 GIS and Coastal Vulnerability to Climate Change (Sierra Woodruff, Kristen A Vitro, and Todd K BenDor)

GIS for Disaster Management
2.12 Assessment of GIS-Based Machine Learning Algorithms for Spatial Modeling of Landslide Susceptibility: Case Study in Iran (Alireza Motevalli, Hamid Reza Pourghasemi, and Mohsen Zabihi)
2.13 Data Integration and Web Mapping for Extreme Heat Event Preparedness (Bev Wilson)

GIS for Agriculture and Aquaculture
2.14 GIS Technologies for Sustainable Aquaculture (Lynne Falconer, Trevor Telfer, Kim Long Pham, and Lindsay Ross)
2.15 An Integrated Approach to Promote Precision Farming as a Measure Toward Reduced-Input Agriculture in Northern Greece Using a Spatial Decision Support System (Thomas K Alexandridis, Agamemnon Andrianopoulos, George Galanis, Eleni Kalopesa, Agathoklis Dimitrakos, Fotios Katsogiannos, and George Zalidis)

GIS for Land Use and Transportation Planning
2.16 GIS and Placemaking Using Social Media Data (Yan Chen)
2.17 GIS and Scenario Analysis: Tools for Better Urban Planning (Arnab Chakraborty and Andrew McMillan)
2.18 Transit GIS (Qisheng Pan, Ming Zhang, Zhengdong Huang, and Xuejun Liu)
2.19 Modeling Land-Use Change in Complex Urban Environments (Brian Deal, Haozhi Pan, and Youshan Zhuang)
2.20 Application of GIS-Based Models for Land-Use Planning in China (Huang Xianjin, Li Huan, He Jinliao, and Zong Yueguang)
2.21 GIS Graph Tool for Modeling: Urban–Rural Relationships (Paulo Morgado, Patrícia Abrantes, and Eduardo Gomes)

CONTENTS OF ALL VOLUMES

VOLUME 1: GIS METHODS AND TECHNIQUES

New Perspectives on GIS (Multidisciplinary)
1.01 The Future Development of GISystems, GIScience, and GIServices (Ming-Hsiang Tsou)
1.02 Geocomputation: Data, Methods, and Applications in a New Era (Shaun Fontanella and Ningchuan Xiao)

Data Management
1.03 Big Geodata (Michael F Goodchild)
1.04 Current Themes in Volunteered Geographic Information (Colin J Ferster, Trisalyn Nelson, Colin Robertson, and Rob Feick)
1.05 Open Data and Open Source GIS (Xinyue Ye)
1.06 GIS Databases and NoSQL Databases (Peng Yue and Zhenyu Tan)
1.07 Geospatial Semantics (Yingjie Hu)
1.08 Geocoding and Reverse Geocoding (Dapeng Li)
1.09 Metadata and Spatial Data Infrastructure (Scott Simmons)

Spatial Analysis and Modeling
1.10 Spatial Analysis Methods (David W S Wong and Fahui Wang)
1.11 Big Data Analytic Frameworks for GIS (Amazon EC2, Hadoop, Spark) (Chen Xu)
1.12 Network Analysis (Kevin M Curtin)
1.13 Analysis and Modeling of Movement (Paul Holloway and Jennifer A Miller)
1.14 Spatial Metrics: The Static and Dynamic Perspectives (Saad Saleem Bhatti, José Pedro Reis, and Elisabete A Silva)
1.15 Multicriteria Analysis (Jacek Malczewski)
1.16 Agent-Based Modeling (Andrew Crooks, Alison Heppenstall, and Nick Malleson)
1.17 Spatial Optimization for Sustainable Land Use Planning (Kai Cao)
1.18 Geostatistical Approach to Spatial Data Transformation (Eun-Hye Yoo)

Space-Time GIS
1.19 Spatial and Spatiotemporal Data Mining (Shashi Shekhar, Yan Li, Reem Y Ali, Emre Eftelioglu, Xun Tang, and Zhe Jiang)
1.20 Space-Time GIS and Its Evolution (Atsushi Nara)
1.21 Time Geography (Jie Dai and Li An)

Spatial Data Quality
1.22 Spatial Data Uncertainty (Linna Li, Hyowon Ban, Suzanne P Wechsler, and Bo Xu)

Cyberinfrastructure and GIS
1.23 Cyberinfrastructure and High-Performance Computing (Xuan Shi and Miaoqing Huang)

Virtual GIS
1.24 Augmented Reality and GIS (Nick Hedley)
1.25 GIS and Serious Games (Brian Tomaszewski, Angelina Konovitz-Davern, David Schwartz, Joerg Szarzynski, Lena Siedentopp, Ashely Miller, and Jacob Hartz)

Mobile GIS
1.26 Mobile GIS and Location-Based Services (Song Gao and Gengchen Mai)

Public GIS
1.27 Societal Impacts and Ethics of GIS (Jeremy W Crampton, Eric M Huntley, and Emily C Kaufman)
1.28 Geoprivacy (Marc P Armstrong, Ming-Hsiang Tsou, and Dara E Seidl)
1.29 Defining Public Participation GIS (Rina Ghose)

GIS Design and Project Management
1.30 User-Centered Design for Geoinformation Technologies (Sven Fuhrmann)
1.31 GIS Project Management (Jochen Albrecht)

VOLUME 2: GIS APPLICATIONS FOR ENVIRONMENT AND RESOURCES

GIS for Biophysical Environment
2.01 GIS for Mapping Vegetation (Georg Bareth and Guido Waldhoff)
2.02 GIS for Paleo-limnological Studies (Yongwei Sheng, Austin Madson, and Chunqiao Song)
2.03 GIS and Soil (Federica Lucà, Gabriele Buttafuoco, and Oreste Terranova)
2.04 GIS for Hydrology (Wolfgang Korres and Karl Schneider)
2.05 GIS Applications in Geomorphology (Jan-Christoph Otto, Günther Prasicek, Jan Blöthe, and Lothar Schrott)
2.06 GIS for Glaciers and Glacial Landforms (Tobias Bolch and David Loibl)
2.07 GIS and Remote Sensing Applications in Wetland Mapping and Monitoring (Qiusheng Wu)

GIS for Resources
2.08 GIS for Natural Resources (Mineral, Energy, and Water) (Wendy Zhou, Matthew D Minnick, and Celena Cui)

GIS for Energy
2.09 GIS for Urban Energy Analysis (Chaosu Li)

GIS and Climate Change
2.10 GIS in Climatology and Meteorology (Jürgen Böhner and Benjamin Bechtel)
2.11 GIS and Coastal Vulnerability to Climate Change (Sierra Woodruff, Kristen A Vitro, and Todd K BenDor)

GIS for Disaster Management
2.12 Assessment of GIS-Based Machine Learning Algorithms for Spatial Modeling of Landslide Susceptibility: Case Study in Iran (Alireza Motevalli, Hamid Reza Pourghasemi, and Mohsen Zabihi)
2.13 Data Integration and Web Mapping for Extreme Heat Event Preparedness (Bev Wilson)

GIS for Agriculture and Aquaculture
2.14 GIS Technologies for Sustainable Aquaculture (Lynne Falconer, Trevor Telfer, Kim Long Pham, and Lindsay Ross)
2.15 An Integrated Approach to Promote Precision Farming as a Measure Toward Reduced-Input Agriculture in Northern Greece Using a Spatial Decision Support System (Thomas K Alexandridis, Agamemnon Andrianopoulos, George Galanis, Eleni Kalopesa, Agathoklis Dimitrakos, Fotios Katsogiannos, and George Zalidis)

GIS for Land Use and Transportation Planning
2.16 GIS and Placemaking Using Social Media Data (Yan Chen)
2.17 GIS and Scenario Analysis: Tools for Better Urban Planning (Arnab Chakraborty and Andrew McMillan)
2.18 Transit GIS (Qisheng Pan, Ming Zhang, Zhengdong Huang, and Xuejun Liu)
2.19 Modeling Land-Use Change in Complex Urban Environments (Brian Deal, Haozhi Pan, and Youshan Zhuang)
2.20 Application of GIS-Based Models for Land-Use Planning in China (Huang Xianjin, Li Huan, He Jinliao, and Zong Yueguang)
2.21 GIS Graph Tool for Modeling: Urban-Rural Relationships (Paulo Morgado, Patrícia Abrantes, and Eduardo Gomes)

VOLUME 3: GIS APPLICATIONS FOR SOCIO-ECONOMICS AND HUMANITY

GIS for Economics
3.01 GIS and Spatial Statistics/Econometrics: An Overview (Daniel A Griffith and Yongwan Chun)
3.02 Estimating Supply Elasticities for Residential Real Estate in the United Kingdom (Thies Lindenthal)
3.03 Forced Displacement and Local Development in Colombia: Spatial Econometrics Analyses (Néstor Garza and Sandra Rodriguez)
3.04 Searching for Local Economic Development and Innovation: A Review of Mapping Methodologies to Support Policymaking (Alexander Kleibrink and Juan Mateos)
3.05 An Agent-Based Model of Global Carbon Mitigation Through Bilateral Negotiation Under Economic Constraints: The Key Role of Stakeholders' Feedback and Facilitated Focus Groups and Meetings in the Development of Behavioral Models of Decision-Making (Douglas Crawford-Brown, Helin Liu, and Elisabete A Silva)

GIS for Business and Management
3.06 GIS-Based Approach to Analyze the Spatial Opportunities for Knowledge-Intensive Businesses (Mei Lin Yeo, Saad Saleem Bhatti, and Elisabete A Silva)

GIS for History
3.07 GIS for History: An Overview (N Jiang and D Hu)
3.08 PastPlace Historical Gazetteer (Humphrey Southall, Michael Stoner, and Paula Aucott)
3.09 Collaborative Historical Information Analysis (Patrick Manning, Pieter François, Daniel Hoyer, and Vladimir Zadorozhny)
3.10 A Review on the Current Progress in Chinese Historical GIS Research (Peiyao Zhang, Ning Bao, and Kai Cao)

GIS for Linguistics
3.11 GIS in Linguistic Research (Jay Lee, Jiajun Qiao, and Dong Han)
3.12 GIS in Comparative-Historical Linguistics Research: Tai Languages (Wei Luo, John Hartmann, Fahui Wang, Huang Pingwen, Vinya Sysamouth, Jinfeng Li, and Xuezhi Cang)

GIS for Politics
3.13 Spatial Dimensions of American Politics (Iris Hui and Wendy K Tam Cho)
3.14 GIS-Enabled Mapping of Electoral Landscape of Support for Political Parties in Australia (Robert J Stimson, Prem Chhetri, and Tung-Kai Shyy)

GIS for Law and Regulations
3.15 A Global Administrative Solution to Title and Tenure Insecurity: The Implementation of a Global Title and Rights Registry (C Kat Grimsley)
3.16 Revamping Urban Immovable Property Tax System by Using GIS and MIS: A Case Study of Reforming Urban Taxation Systems Using Spatial Tools and Technology (Nasir Javed, Ehsan Saqib, Abdul Razaq, and Urooj Saeed)

GIS for Human Behavior
3.17 Urban Dynamics and GIScience (Chenghu Zhou, Tao Pei, Jun Xu, Ting Ma, Zide Fan, and Jianghao Wang)
3.18 Sensing and Modeling Human Behavior Using Social Media and Mobile Data (Abhinav Mehrotra and Mirco Musolesi)
3.19 GIS-Based Social Spatial Behavior Studies: A Case Study in Nanjing University Utilizing Mobile Data (Bo Wang, Feng Zhen, Xiao Qin, Shoujia Zhu, Yupei Jiang, and Yang Cao)
3.20 The Study of the Effects of Built Form on Pedestrian Activities: A GIS-Based Integrated Approach (Ye Zhang, Ying Jin, Koen Steemers, and Kai Cao)
3.21 The Fusion of GIS and Building Information Modeling for Big Data Analytics in Managing Development Sites (Weisheng Lu, Yi Peng, Fan Xue, Ke Chen, Yuhan Niu, and Xi Chen)

GIS for Evidence-Based Policy Making
3.22 Smarter Than Smart Cities: GIS and Spatial Analysis for Socio-Economic Applications That Recover Humanistic Media and Visualization (Annette M Kim)
3.23 Comparing Global Spatial Data on Deforestation for Institutional Analysis in Africa (Aiora Zabala)
3.24 Constructing a Map of Physiological Equivalent Temperature by Spatial Analysis Techniques (Poh-Chin Lai, Pui-Yun Paulina Wong, Wei Cheng, Thuan-Quoc Thach, Crystal Choi, Man Sing Wong, Alexander Krämer, and Chit-Ming Wong)
3.25 GIS-Based Accessibility Analysis of Health-Care Facilities: A Case Study in Hong Kong (Wenting Zhang, Kai Cao, Shaobo Liu, and Bo Huang)
3.26 From Base Map to Inductive Mapping: Three Cases of GIS Implementation in Cities of Karnataka, India (Christine Richter)
3.27 Using GIS to Understand Schools and Neighborhoods (Linda Loubert)

Index

PREFACE

Since its inception in the 1960s, the Geographic Information System (GIS) has undergone tremendous development, rendering it a technology widely used for geospatial data management and analysis. The past several decades have also witnessed increasing applications of GIS in a plethora of areas, including environment, energy, resources, economics, planning, transportation, logistics, business, and humanity. The rapid development of GIS is partly due to advances in computational technologies and the increasing availability of geospatial data such as satellite imagery and GPS traces. Along with the technological development of GIS, its underlying theory has progressed significantly, especially on data representation, data analysis, and uncertainty. As a result, the theory, technology, and application of GIS have made great strides, making this the right time to summarize such developments comprehensively. Comprehensive Geographic Information Systems (CGIS) is the result.

CGIS provides an in-depth, state-of-the-art review of GIS, with an emphasis on basic theories, systematic methods, state-of-the-art technologies, and applications in many different areas, covering not only the physical environment but also socioeconomics. Organized into three volumes (GIS theories and techniques; GIS applications for environment and resources; and GIS applications for socioeconomics and humanity), the book comprises 79 chapters and provides comprehensive coverage of the various aspects of GIS. In particular, a rich set of applications in socioeconomics and humanity is presented. Authored and peer-reviewed by recognized scholars in the area of GIS, each chapter provides an overview of the topic, the methods used, and case studies.

The first volume covers a wide spectrum of topics related to GIS methods and techniques, ranging from data management and analysis to various new types of GIS, e.g., virtual GIS and mobile GIS. While fundamental topics such as data management, data analysis, and data quality are included, the latest developments in space-time GIS, cyber GIS, virtual GIS, mobile GIS, and public GIS are also covered, together with new perspectives on GIS and geocomputation. The further development of GIS is driven by the demands of applications, and various new data may be required; big data has emerged as an opportunity to fuel GIS development. Mike Goodchild provides an overview of such data, followed by a chapter on volunteered geographic information, an important part of big geodata. Open data, though closely related to big data, refers to publicly accessible data; the two are not the same. Spatial analysis is indispensable to a GIS: after an overview of spatial analysis methods, big data analytics, spatial metrics, spatial optimization, and other relevant topics are included. Space and time are interrelated, and their integration has long been an active research area in GIS; this section covers space-time data mining, space-time GIS, and time geography. Drawing on developments in computer science and engineering, GIS has become more powerful through integration with virtual reality and wireless technologies. This volume thus provides new insights into different designs of GIS catering to the widespread needs of applications, and it will be of great interest not just to GIS researchers but also to computer scientists and engineers.

Environment and resources are fundamental to human society. The second volume focuses on GIS applications in these areas. GIS has been widely used in studying natural environments, so applications in vegetation, soil, hydrology, geomorphology, wetlands, glaciers and glacial landforms, and paleolimnology are covered. Resources and energy are closely related to the environment, and applications in these areas are covered as well. Climate change represents a challenge to sustainable human development, in part because it increases the odds of extreme weather events. GIS has been capitalized on to address the related issues, from climatology and meteorology to disaster management and vulnerability analysis. Parallel to the applications for the natural environment, resources, energy, and climate, GIS has also been applied to human production activities, such as agriculture and aquaculture, which are also covered in this volume. In addition to the natural environment, the built environment and its associated topics, such as placemaking, public transit, and land use modeling and planning, are included.

Parallel to the second volume, the third volume covers the applications of GIS in socioeconomics and humanities. Such applications are comparatively fewer than those in environment and resources; however, due to the increasing availability of data that capture human activities, more applications have emerged in areas including economics, business management, history, linguistics, politics, law, human behavior, and policy making. Starting from Dan Griffith's overview of GIS and spatial statistics/econometrics, GIS applications in real estate, local economic development, and carbon mitigation are covered. Innovation drives economic growth in today's knowledge-based economy; their relationship is covered in both the economics and the business management sections. In addition to economics, GIS has also been widely applied to the humanities, with chapters on history, linguistics, politics, and law. Human behavior has received renewed emphasis due to the advent of social media and other types of big data; the first chapter in that section provides an overview of urban dynamics and geographic information science, and several further chapters are devoted to this topic. Finding evidence to support socioeconomic policy making is a highly important contribution that GIS can make, and this volume includes several chapters on finding such evidence.

This book could not have been completed without the help and advice of many people, and I would like to thank those who were instrumental in bringing this project to fruition. First, I would like to acknowledge the enthusiastic support of an outstanding editorial team: Thomas Cova and Ming-Hsiang Tsou (Volume 1), Yan Song, Georg Bareth, and Chunqiao Song (Volume 2), and Kai Cao and Elisabete Silva (Volume 3). From the initial discussions of the structure of the book, through the selection of authors for the chapters in the different volumes, to the encouragement of authors and the review of chapters, they made significant contributions at every stage, and I am very grateful for their invaluable input and hard work. I would also like to express my sincere gratitude to the production team at Elsevier (Priscilla, Paula, Katie, and in particular Laura) for their many efforts, perseverance, and skillful management of every aspect of this project. Last, and certainly not least, I am hugely indebted to all of our authors; we have been extraordinarily fortunate in attracting individuals from all over the world willing to take time from their busy schedules to prepare this set of contributions. Finally, my special thanks go to my wife Rongrong and our daughter Kate for their love, help, and understanding. Without their endless support, this book would never have been completed.

Bo Huang, Editor in Chief

PERMISSION ACKNOWLEDGMENTS

The following material is reproduced with kind permission of Taylor & Francis (www.taylorandfrancisgroup.com):
Figures 6, 7, and 12: Spatial Analysis Methods
Figures 2-6 and Tables 2 and 3: GIS for Linguistic Research
Figure 1: GIS and Scenario Analysis: Tools for Better Urban Planning
Figure 4: GIS Applications in Geomorphology
Figures 8 and 18: GIS for Glaciers and Glacial Landforms
Tables 1 and 2: Spatial Metrics: The Static and Dynamic Perspectives
Figures 6, 7, and 8: Urban Dynamics and GIScience
Table 2: Using GIS to Understand Schools and Neighborhoods

2.01 GIS for Mapping Vegetation

Georg Bareth and Guido Waldhoff, University of Cologne, Cologne, Germany
© 2018 Elsevier Inc. All rights reserved.

2.01.1 Introduction
2.01.2 Plant Communities and Vegetation Inventories
2.01.3 Vegetation Data in Official Information Systems
2.01.4 Multi-Data Approach for Land Use and Land Cover Mapping
2.01.4.1 Introduction to Land Use and Land Cover Mapping
2.01.4.2 Methods for Remote Sensing-Based Land Use/Land Cover Mapping
2.01.4.3 Integration of Remote Sensing and GIS for Land Use/Land Cover Mapping
2.01.4.4 Data and GIS Methods for Enhanced Land Use/Land Cover Mapping
2.01.4.5 Multi-Data Approach for Enhanced Land Use/Land Cover Mapping
2.01.4.6 Examples for Land Use/Land Cover Map Enhancement With the MDA
2.01.4.7 Summary and Conclusion
2.01.5 Analysis of High-Resolution DSMs in Forestry and Agriculture
2.01.6 Conclusion and Outlook
References
Further Reading

2.01.1 Introduction

GIS-based mapping of vegetation is a very common and broadly established application, interconnected with remote sensing of vegetation (Jones and Vaughan, 2010), with digital surveying and mapping of vegetation (Küchler and Zonneveld, 1988), and with traditional approaches to vegetation mapping (Mueller-Dombois, 1984). The latter three are not the focus of this chapter but are included where strong dependencies occur. Traditionally, mapping of vegetation is a major objective of vegetation science, geobotany, biogeography, and landscape ecology using cartographic techniques (Küchler and Zonneveld, 1988; Pedrotti, 2013). Pedrotti (2013) even states that "Geobotanical cartography is a field of thematic cartography," which includes, as always in GIS applications, the expertise of cartography. Consequently, GIS-based mapping of vegetation is a strongly interdisciplinary topic.

As in almost all GIS applications, and especially in the context of vegetation, the overarching purpose of vegetation maps is to serve spatial decision making (Küchler, 1988a). The planning of protected areas, future land use, land use change, etc. depends on spatial data that are relevant and adequate for the decision-making process. Furthermore, analysis of change in vegetation inventories mapped for certain time windows gives a clear signal of the magnitude of changes in the environment. Finally, in the context of agriculture, forestry, and resource management, monitoring the temporal development of vegetation, its phenology, is of key interest for management decisions such as fertilization, weeding, and pest control (Brown, 2005; Mulla, 2013; Naesset, 1997).

In general, the mapping scale also determines the mapping technologies. Several approaches to classifying scale categories or regional extents for surveying techniques are available in the literature (e.g., Alexander and Millington, 2000). Those authors categorize spaceborne remote sensing (>5 m resolution) as suitable for potentially global areal extent, while aerial photography with a much higher spatial resolution (1-10 m) is limited to several square kilometers. Such categories are often found in textbooks, but this approach is not followed in this chapter in the context of GIS applications. On the contrary, we strongly believe that proximal, airborne, and satellite-based remote sensing, in combination with ground surveying techniques like field sampling, GPS, and laser scanning, nowadays support vegetation mapping at all scales. Increasing data availability, computing resources, and software capabilities now enable, e.g., global land cover mapping at 30 m resolution (Hansen et al., 2013), a complete data product that was not foreseen in the scale categorizations of earlier publications. This trend of increasing spatial resolution of global vegetation data products will continue, given that satellite imagery from ESA's Sentinel-2 already provides global coverage at a spatial resolution of 10 m and is openly accessible (https://sentinel.esa.int). Nowadays it is thus a combination of scale-overlapping methods that produces vegetation maps.
For example, mapping and monitoring the spatial variability of plant growth can be supported at subcentimeter resolution with unmanned aerial systems (UAS) or low-altitude manned vehicles like gyrocopters, or at submeter to meter resolution from optical or microwave satellite-based remote sensing (Bendig et al., 2015; Hütt et al., 2016; Koppe et al., 2013), for study areas ranging from several square meters to several thousand hectares. Additionally, these airborne sensors can be operated at low cost, are available globally, and are limited only by national aviation regulations. The use of such technologies for vegetation mapping will grow further with the increasing availability of sensor data (e.g., ESA's Copernicus program; http://www.copernicus.eu) and of easy-to-use imaging UASs (Bareth et al., 2015). In Fig. 1, a UAS data example is shown from Turner et al. (2014), who carried out multisensor UAS campaigns for mapping Antarctic moss beds. It is evident that such sensor-carrying systems can support the mapping of vegetation in almost any environment.


Fig. 1 Robinson Ridge study site: (A) visible mosaic of entire area, (B) RGB image subset, (C) multispectral image subset, (D) thermal infrared image subset, and (E) typical multispectral reflectance function of a healthy Antarctic moss turf. From Turner, D., Lucieer, A., Malenovsky, Z., King, D.H., and Robinson, S.A. (2014). Spatial co-registration of ultra-high resolution visible, multispectral and thermal images acquired with a Micro-UAV over Antarctic moss beds. Remote Sensing 6(5), 4003-4024.


Having in mind the developments of traditional vegetation mapping and of proximal and remote sensing over the last decade, this chapter on GIS applications for mapping vegetation focuses on four major topics:

- vegetation inventories;
- vegetation data in available spatial information systems;
- GIS-supported land use and land cover (LULC) mapping;
- GIS-based analysis of super-high-resolution digital surface models (DSMs).

For the mapping of plant species and communities, GISs serve as a spatial database to manage, administer, and visualize the surveyed vegetation data. GIS-based analyses are mostly used for point-to-polygon regionalization or for (geostatistical) interpolation between sampling points. Spatial topographic information systems like the Authoritative Topographic-Cartographic Information System (ATKIS: Amtliches Topographisch-Kartographisches Informationssystem) in Germany are also an important source of spatial vegetation data (www.atkis.de); such spatial information systems are available in many countries at a scale of at least 1:25,000. More precise spatial vegetation data are mapped, e.g., in Germany, in so-called biotope maps, which record biotopes that deserve protection at scales of 1:25,000 or larger (http://www4.lubw.baden-wuerttemberg.de/servlet/is/19264/). Special vegetation data products are produced as land cover and land use data sets; examples are the European CORINE Land Cover data (Coordination of Information on the Environment Land Cover, CLC; www.eea.europa.eu/publications/COR0-landcover) and the German DeCover activities (www.decover.info). Finally, the previously mentioned new high-resolution sensor technologies result in new data products with astonishing centimeter or subcentimeter spatial resolution. Such data can be analyzed with GIS techniques in a new context to derive plant species or plant dynamics.
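To make the regionalization step concrete, the following is a minimal Python sketch of inverse-distance-weighted (IDW) interpolation between vegetation sampling points, one common way to interpolate between survey plots; the plot coordinates, cover values, and grid spacing are hypothetical, and production work would more likely use a dedicated kriging package.

```python
import numpy as np

def idw_interpolate(xy_samples, values, xy_targets, power=2.0):
    """Inverse-distance-weighted interpolation between sampling points,
    a simple stand-in for the geostatistical regionalization above."""
    # Pairwise distances between target cells and sample points
    d = np.linalg.norm(xy_targets[:, None, :] - xy_samples[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)       # avoid division by zero at sample locations
    w = 1.0 / d ** power           # inverse-distance weights
    return (w * values).sum(axis=1) / w.sum(axis=1)

# Hypothetical field survey: plot coordinates (m) and a vegetation
# attribute (e.g., species cover in percent) at five sampling plots.
samples = np.array([[0, 0], [100, 0], [0, 100], [100, 100], [50, 50]], float)
cover = np.array([20.0, 35.0, 40.0, 55.0, 30.0])

# Regular 10 m grid over the plot, flattened to (N, 2) target coordinates
gx, gy = np.meshgrid(np.arange(0, 101, 10), np.arange(0, 101, 10))
targets = np.column_stack([gx.ravel(), gy.ravel()])

grid = idw_interpolate(samples, cover, targets).reshape(gx.shape)
print(grid.round(1))
```

The same pattern generalizes to any point-sampled vegetation attribute; only the weighting function changes when moving from IDW to kriging.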

2.01.2 Plant Communities and Vegetation Inventories

The mapping of plant species and plant communities (phytocenoses) has a long tradition, dating back to the 15th century (Küchler, 1988b). While these early vegetation mapping activities were more or less unstandardized, the development of topographical surveying and the generation of detailed topographic maps from the 18th century onward resulted in the mapping of consistent vegetation classes (Küchler, 1988b). The first pure vegetation maps were produced in the middle of the 19th century, and in the early 20th century the field of geobotanical cartography emerged (Küchler, 1988b; Pedrotti, 2013). In addition to the vegetation surveys that underpin vegetation maps, spatial data technologies emerged in the middle of the 20th century (GIS, geographical information systems; RS, remote sensing), and they have supported vegetation mapping ever since (Alexander and Millington, 2000; Pedrotti, 2013).

Scales are an important issue for vegetation mapping and generally determine the level of detail that can be mapped (Küchler, 1988c). The consideration of scales and the corresponding level of mapped detail is closely related to the "levels of synthesis in geobotanical mapping" described by Pedrotti (2013), who classified eight such mapping levels, each corresponding to a distinct map type:

1. The level of individual plants (species) enables the mapping of plant populations in very high detail, even considering individual plants. The corresponding map type is the population map at a scale larger than 1:2000 (Küchler, 1988c; Pedrotti, 2013).
2. The level of populations (species) also uses the population map type. The focus at this level is on the distribution of species in a given small study or mapping area. The preferred map scale is larger than 1:5000 (Küchler, 1988c; Pedrotti, 2013).
3. The level of synusiae is for the mapping of synusia, a "group of functionally similar plant species in a vegetation stand." The map type is defined as a synusial map at a scale larger than 1:5000 (Küchler, 1988c; Pedrotti, 2013).
4. The mapping of plant communities (phytocenoses) is defined as the level of phytocoenosis. Plant associations are of major importance here, and the map type is a phytosociological map at a preferred scale larger than 1:25,000 (Küchler, 1988c; Pedrotti, 2013).
5. The level of ecotopes (teselas) represents the mapping of sigmetum series, i.e., a combination of phytocenoses in a landscape unit. The corresponding map type is a synphytosociological map at a scale larger than 1:25,000 (Küchler, 1988c; Pedrotti, 2013).
6. The landscape scale is categorized as the level of catenas. The idea of this geo-synphytosociological map type is the mapping of "catenas of vegetation series," at a scale larger than 1:100,000 (Küchler, 1988c; Pedrotti, 2013).
7. The regional or national scale is defined as the "level of lower phytogeographical units." The recommended scale for this "regional-phytogeographical" map type is larger than 1:250,000 (Küchler, 1988c; Pedrotti, 2013).
8. Finally, the level of higher phytogeographical units and biomes serves continental or global scales, including maps of vegetation zones. The map scale is larger than 1:5,000,000 (Küchler, 1988c; Pedrotti, 2013).
In addition to these map types, Zonneveld (1988) differentiates between "real vegetation maps" and "potential natural vegetation maps." Maps of plant biodiversity are also the focus of some mapping activities (Pedrotti, 2013; Scott et al., 1993). In particular, GIS-based multiscale inventories for biodiversity were recognized as an important approach to identifying "landscape patterns, vegetation, habitat structure, and species distribution" (Noss, 1990). In Germany, for example, selected rare biotopes are mapped in high detail at a scale of 1:25,000. The responsibility for this Biotopkartierung (biotope mapping) lies with each of the federal states. In North Rhine-Westphalia (NRW), for example, such biotopes, covering 18% of the state's area (approximately 6135 km²), are mapped, and the data is openly accessible (http://bk.naturschutzinformationen.nrw.de).


Fig. 2 Example of the Biotope Map of North Rhine-Westphalia, Germany: colored polygons represent surveyed and mapped biotopes. Data sources: Biotope Map of NRW (http://bk.naturschutzinformationen.nrw.de/bk/de/downloads); WMS: digital topographic map of NRW (https://www.wms.nrw.de/geobasis/wms_nw_dtk). Source: Schutzwürdige Biotope in Nordrhein-Westfalen, http://bk.naturschutzinformationen.nrw.de.

In Fig. 2, a part of the biotope map of NRW between Cologne and Aachen is shown. The colored polygons represent the biotopes mapped and described in detail; the attribute descriptions include plant communities and plant species. The data is openly downloadable as a shapefile or accessible via a WebGIS (http://bk.naturschutzinformationen.nrw.de/bk/de/karten/bk). Finally, the mapping of invasive species is increasing in importance, and comprehensive inventories and databases like the Delivering Alien Invasive Species Inventories for Europe project (DAISIE; www.europe-aliens.org) (Lambdon et al., 2008) or the USDA's National Invasive Species Information Center (NISIC; https://www.invasivespeciesinfo.gov) have been set up. Other mapping and classification approaches are described by Mueller-Dombois (1984) and, scale-dependently, by Alexander and Millington (2000).

Besides these approaches to mapping vegetation as a function of scale, numerous guidelines and mapping procedures are available. The United States Geological Survey (USGS), together with the National Park Service (NPS), has provided a final draft of "Field Methods for Vegetation Mapping" (1994; https://www1.usgs.gov/vip/standards/fieldmethodsrpt.pdf). The guidelines consider GIS technologies and the usage of orthoimagery; the report is designed for a map scale of 1:24,000. The NPS provides up-to-date information on its vegetation mapping online, including a 12-Step Guidance for NPS Vegetation Inventories (https://science.nature.nps.gov/im/inventory/veg/docs/Veg_Inv_12step_Guidance_v1.1.pdf), updated in 2013. Similar resources on vegetation mapping are available from Australia: the Australian National Vegetation Information System (NVIS) is a comprehensive spatial data information system that provides extensive additional information in books, reports, and fact sheets (https://www.environment.gov.au/land/native-vegetation/national-vegetation-information-system). Via the NVIS webpage, numerous documents on classification, vegetation mapping, legends, etc. are available, including the documents of Australia's Native Vegetation Framework (2012). Similar activities are also found, e.g., in China: the vegetation map of China (1:1,000,000) is accompanied by extensive additional information on vegetation and plant communities (Guo, 2010).

The usage of GIS in mapping plant communities and vegetation inventories is more or less reduced to spatial data storage, visualization, and map production (van der Zee and Huizing, 1988). Boundaries, plots, or stands can be captured in field surveys with technologies like tachymeters, compasses, measuring tapes, altimeters, laser distance meters, and GPS (Pedrotti, 2013). The data are stored as raster data or vector data (points, lines, or polygons with corresponding attributes), depending on the field mapping approach (Mueller-Dombois, 1984; Pedrotti, 2013; Zonneveld, 1988). Analysis is of minor importance and is limited to geostatistical interpolation and extrapolation methods, as well as point-to-polygon regionalization concepts. Nowadays, GIS technologies support vegetation mapping in the field more actively: portable computers (smartphones, tablets, notebooks) can be interfaced with a GPS receiver or have a GPS module included. Together with internet access via a mobile data connection, digital geodata available via, e.g., Web Map Service (WMS), and adequate GIS software (a minimal WMS request sketch is given at the end of this section), it is possible to electronically use digital orthophotos (DOPs),


satellite imagery, soil or geological data, and topographic base maps as background and orientation information in the field, and to carry out direct digital vegetation surveys or mapping.

The potential and beneficial usage of remote sensing for vegetation mapping is well described by Wyatt (2000). While from the 1950s to the 1980s the usage of aerial photographs dominated vegetation mapping, from the late 1980s until today the importance of satellite-based remote sensing has grown exponentially (Houborg et al., 2015; Millington and Alexander, 2000). Three recent developments in remote sensing will have a fundamental impact on vegetation mapping:

(i) Increase of spatial resolution:
- satellite-based RS: up to submeter resolution
- airborne-based RS: up to subdecimeter resolution
- UAS-based RS: up to subcentimeter resolution
(ii) Increase of temporal resolution:
- satellite-based RS: up to daily repetition
- airborne-based RS: up to multiple repetitions per day
- UAS-based RS: up to hourly repetitions
(iii) Increase of spectral resolution:
- satellite-based RS: multi- and hyperspectral, multithermal, X-, C-, L-band sensors
- airborne-based RS: multi- and hyperspectral, multithermal, mm-microwave, X-, C-, L-band sensors
- UAS-based RS: low-weight multi- and hyperspectral, multithermal, mm-microwave, X- and C-band sensors

In summary, all available sensor technologies are mountable on all RS platforms, which enables the determination not only of species or plant communities but also of vitality, nutrient status, and stresses, e.g., in crops or trees (Jones and Vaughan, 2010; Thenkabail et al., 2011; Wulder and Franklin, 2003). The high temporal resolution can even capture vegetation phenology from leaf development to flowering, ripening, and senescence (More et al., 2016; Parplies et al., 2016; Bendig et al., 2015). Finally, using fluorescence remote sensing technology, photosynthesis can be observed in a diurnal or seasonal context (Schickling et al., 2016; Wieneke et al., 2016; Rascher et al., 2015; Zarco-Tejada et al., 2003; https://earth.esa.int/web/guest/missions/esa-futuremissions/flex).

The focus of this chapter is clearly not on remote sensing or surveying techniques for mapping vegetation, but on how GIS technologies can support vegetation mapping. According to Pedrotti (2013), "a vegetation map consists of a topographic map that shows vegetation units." This more traditional view of vegetation mapping fits very well into the understanding of spatial information systems described by Bill (2016): all content-related spatial information systems for soil, geology, vegetation, etc. should be based on a topographic information system. This concept is also described by Bareth (2009) in the context of establishing a spatial environmental information system (SEIS). Nowadays, topographic information systems have replaced traditional topographic mapping procedures. Consequently, modern mapping of vegetation should be based on those systems, which are usually established, maintained, and provided by official surveying and mapping agencies.
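As a minimal illustration of the WMS-based field support mentioned above, the following Python sketch requests a topographic backdrop from the NRW WMS endpoint cited earlier using OWSLib. The layer name, bounding box, and CRS are assumptions for illustration, not verified service metadata; the printed layer list shows what the service actually offers.

```python
from owslib.wms import WebMapService

# WMS endpoint of the digital topographic map of NRW (cited above)
wms = WebMapService("https://www.wms.nrw.de/geobasis/wms_nw_dtk", version="1.3.0")

# Inspect which layers the service actually offers before requesting one
print(list(wms.contents))

# Request a map image as a field-mapping backdrop. The layer name,
# bbox (ETRS89 / UTM 32N coordinates), and image size are hypothetical.
img = wms.getmap(
    layers=["nw_dtk_col"],
    srs="EPSG:25832",
    bbox=(340000, 5620000, 345000, 5625000),
    size=(1024, 1024),
    format="image/png",
)
with open("background_map.png", "wb") as f:
    f.write(img.read())
```

On a GPS-equipped field tablet, such a request would be issued by the mobile GIS itself; the sketch only makes the underlying protocol call explicit.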

2.01.3 Vegetation Data in Official Information Systems

The availability of vegetation data in spatial information systems is manifold and spans all scales. Global land cover data are available from several research initiatives, such as the Global Land Cover Facility (GLCF) (http://glcf.umd.edu), the Global Land Analysis & Discovery Group (http://glad.umd.edu), or the GlobeLand30 initiative, which provides a 30-m land cover data set (http://www.globallandcover.com; Arsanjani et al., 2016; Chen et al., 2015; Congalton et al., 2014). The latter is of particular significance because it contains global land cover classes for cultivated lands, forests, grasslands, shrublands, tundra, and wetlands (compare Fig. 3).

Beyond such global LULC data, numerous LULC data products containing more detailed information on vegetation are available at regional or national scales. For example, the European Union's CORINE Land Cover (CLC) data are available for most member states for 1990, 2000, 2006, and 2012, with a minimal mapping unit of 25 ha and 44 land cover classes (http://land.copernicus.eu/paneuropean/corine-land-cover). The data can be downloaded from various EU or national official websites (e.g., for Germany: http://www.corine.dfd.dlr.de/intro_en.html). Some EU states improved the spatial quality of the CLC data by refining the minimal mapping unit to 10 ha or even 1 ha; for example, the CLC data for 2012 for Germany are based on a minimal mapping unit of 1 ha and contain 37 land cover classes (Fig. 4) (Keil et al., 2015). For this enhancement of spatial quality, the Authoritative Topographic-Cartographic Information System (ATKIS; www.atkis.de) was incorporated into the land cover analysis (Keil et al., 2015). The importance of official LULC data, usually provided by official surveying and mapping agencies, for LULC products is also stressed by Inglada et al. (2017), who present a processing scheme based on Sentinel-2 image data to derive LULC for France at 10 m resolution, using additional LULC data as reference data. Finally, LULC data are available in varying detail from national authorities in many countries, e.g., for the United States from the USGS Land Cover Institute (LCI) (https://landcover.usgs.gov) or from the USDA's CropScape project (https://nassgeodata.gmu.edu/CropScape). Roy et al. (2015) present a seamless and very detailed vegetation type map for India at a scale of 1:50,000 (Fig. 6); the methodological approach of this vegetation mapping is shown in Fig. 5. The authors combined several spatial data sources in a GIS environment for vegetation classification, mapping, and accuracy assessment.
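To give a flavor of how such raster land cover products are handled in practice, the following sketch tallies per-class areas for a hypothetical GlobeLand30-style tile; the file name and the example class codes in the comment are assumptions, not a verified product specification.

```python
import numpy as np
import rasterio

# Hypothetical GlobeLand30-style tile: one band of integer class codes,
# e.g. 10 = cultivated land, 20 = forest, 30 = grassland, 40 = shrubland.
with rasterio.open("globeland30_tile.tif") as src:
    classes = src.read(1)
    # Pixel area in map units squared (30 m x 30 m for GlobeLand30)
    px_area = abs(src.transform.a * src.transform.e)

codes, counts = np.unique(classes, return_counts=True)
for code, count in zip(codes, counts):
    print(f"class {code}: {count * px_area / 1e6:.1f} km²")
```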

Fig. 3 The GlobeLand30 global land cover map. Legend classes: cultivated land, forest, grassland, shrubland, wetland, water bodies, tundra, artificial surfaces, bareland, permanent snow and ice. From Chen, J., Chen, J., Liao, A., Cao, X., Chen, L., Chen, X., He, C., Han, G., Peng, S., Lu, M., Zhang, W., Tong, X., Mills, J. (2015). Global land cover mapping at 30 m resolution: A POK-based operational approach. ISPRS Journal of Photogrammetry and Remote Sensing 103, 7-27, doi:10.1016/j.isprsjprs.2014.09.002, with permission.

Fig. 4 LULC classes of CLC 2012 for Germany:
ARTIFICIAL SURFACES. Urban fabric: 111 Continuous urban fabric; 112 Discontinuous urban fabric. Industrial, commercial and transport units: 121 Industrial, commercial and public units; 122 Road and rail networks and associated land; 123 Port areas; 124 Airport. Mines, dumps and construction sites: 131 Mineral extraction sites; 132 Dump sites; 133 Construction sites. Artificial non-agricultural vegetated areas: 141 Green urban areas; 142 Sport and leisure facilities.
AGRICULTURAL AREAS. Arable land: 211 Non-irrigated arable land. Permanent crops: 221 Vineyards; 222 Fruit trees and berries plantations. Pastures: 231 Pastures. Heterogeneous agricultural areas: 242 Complex cultivation patterns; 243 Land principally occupied by agriculture, with significant areas of natural vegetation.
FOREST AND SEMINATURAL AREAS. Forests: 311 Broad-leaved forest; 312 Coniferous forest; 313 Mixed forest. Scrubs and/or herbaceous vegetation: 321 Natural grassland; 322 Moors and heathland; 324 Transitional woodland-scrub. Open spaces with little or no vegetation: 331 Beaches, dunes, sand; 332 Bare rock; 333 Sparsely vegetated areas; 334 Burnt areas; 335 Glaciers and perpetual snow.
WETLANDS. Inland wetlands: 411 Inland marshes; 412 Peat bogs. Coastal wetlands: 421 Salt marshes; 423 Intertidal flats.
WATER BODIES. Inland waters: 511 Water courses; 512 Water bodies. Marine waters: 521 Coastal lagoons; 522 Estuaries; 523 Sea and ocean.
From Keil, M., Esch, T., Divanis, A., Marconcini, M., Metz, A., Ottinger, M., Voinov, S., Wiesner, M., Wurm, M., Zeidler, J. (2015). Updating the Land Use and Land Cover Database CLC for the Year 2012: "Backdating" of DLM-DE from the Reference Year 2009 to the Year 2006. Umweltbundesamt: Dessau-Roßlau, 80 p. (http://www.umweltbundesamt.de/publikationen/updating-the-land-use-landcover-database-clc-for), with permission.

Fig. 5 Methodological approach of vegetation type mapping for India, combining spatial descriptions (remote sensing, topography, climate, biogeography, classification, soil type), satellite remote sensing images, bio-geographical zones, and ground sampling plots with a knowledge base. From Roy, P.S., Behera, M.D., Murthy, M.S.R., Roy, A., Singh, Sarnam, Kushwaha, S.P.S., Jha, C.S., Sudhakar, S., Joshi, P.K., Reddy, C.S., Gupta, S., Pujar, G., Dutt, C.B.S., Srivastava, V.K., Porwal, M.C., Tripathi, P., Singh, J.S., et al. (2015). New vegetation type map of India prepared using satellite remote sensing: comparison with global vegetation maps and utilities. International Journal of Applied Earth Observation and Geoinformation 39, 142-159, with permission.

Fig. 6 Detailed vegetation type map of India, produced at an original mapping scale of 1:50,000. Legend groups: 1. Mixed forest formation; 2. Gregarious forest formation; 3. Locale-specific formation; 4. Plantation; 5. Degraded formation; 6. Woodland; 7. Scrub/shrub land; 8. Grassland; 9. Managed ecosystems; 10. Others. From Roy, P.S., Behera, M.D., Murthy, M.S.R., Roy, A., Singh, Sarnam, Kushwaha, S.P.S., Jha, C.S., Sudhakar, S., Joshi, P.K., Reddy, C.S., Gupta, S., Pujar, G., Dutt, C.B.S., Srivastava, V.K., Porwal, M.C., Tripathi, P., Singh, J.S., et al. (2015). New vegetation type map of India prepared using satellite remote sensing: comparison with global vegetation maps and utilities. International Journal of Applied Earth Observation and Geoinformation 39, 142-159.


As Pedrotti (2013) states, and as is very obvious from the case studies just mentioned, vegetation maps are based on topographic maps; their combined use for vegetation mapping is described in detail in the section "Multi-Data Approach for Land Use and Land Cover Mapping" below. Topographic maps are produced in many countries at scales ranging from 1:25,000 to 1:1,000,000. The change from static paper map production to topographic information systems since the 1990s is based on GIS technologies, and the official surveying and mapping agencies are responsible for implementing and maintaining such systems. Examples are the National Geomatics Center of China (NGCC) in P.R. China (http://ngcc.sbsm.gov.cn/article/en), the Kadaster in the Netherlands (https://www.kadaster.nl/-/top10nl), and the USGS's The National Map in the United States (https://nationalmap.gov). These topographic products differ in detail; e.g., the German ATKIS is a multiscale topographic information system that provides digital landscape models (DLMs) (vector data), digital elevation models (DEMs) (raster data), digital orthophotos (DOPs) (raster data), and digital topographic maps (raster data) (www.atkis.de). The most precise and content-rich product is the Basis-DLM, which represents topographic information at scales of 1:10,000 to 1:25,000. The LULC information in the Basis-DLM is very rich and is summarized in Table 1. As Table 1 clearly shows, ATKIS already provides rich vegetation information at high spatial resolution, and the data is updated, according to differing priority classes, at intervals from every 6 months to every 5 years. Additionally, the DLMs for the various scales are documented by detailed mapping procedures and quality requirements; the metadata of the ATKIS DLMs therefore provide a sound basis for judging the scales and purposes for which each DLM can be used. The combined analysis of RS-based LULC with official geodata like ATKIS is presented in detail in the section "Multi-Data Approach for Land Use and Land Cover Mapping" below. Finally, user-driven data portals like OpenStreetMap (OSM) (www.openstreetmap.org) must be mentioned. These are unofficial, community-driven spatial data mapping activities, but they are also very rich in topographic information and include LULC data, albeit in varying detail and in a less organized LULC class scheme than official topographic map products. Nevertheless, the data is extremely valuable and can also be incorporated into LULC analysis using remote sensing (Johnson and Iizuka, 2016).
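As a small illustration of working with such vector LULC layers, the following GeoPandas sketch filters vegetation classes from a hypothetical polygon extract and sums their areas per class; the file name, attribute name, and class labels are assumptions for illustration, not the official ATKIS schema.

```python
import geopandas as gpd

# Hypothetical extract of vegetation-related polygons, e.g. exported from
# the ATKIS Basis-DLM or an OSM landuse/natural layer as a shapefile.
gdf = gpd.read_file("vegetation_polygons.shp")

# Keep only the vegetation feature classes of interest (attribute name
# and class labels are assumed for this sketch).
veg_classes = ["Forest", "Shrubland", "Heathland", "Peatland", "Wetland"]
veg = gdf[gdf["feat_class"].isin(veg_classes)].copy()

# Reproject to a metric CRS (ETRS89 / UTM 32N covers much of Germany)
# so that polygon areas come out in square meters.
veg = veg.to_crs(epsg=25832)
veg["area_ha"] = veg.geometry.area / 10_000

print(veg.groupby("feat_class")["area_ha"].sum().round(1))
```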

Table 1  Vegetation-specific LULC classes contained in the ATKIS Basis-DLM

ID      Feature class               Feature sub-class        Sub-class ID
43001   Agriculture                 Arable Land              1010
                                    Arable Field Orchard     1011
                                    Hops                     1012
                                    Grassland                1020
                                    Meadow Orchard           1021
                                    Horticulture             1030
                                    Tree Nursery             1031
                                    Vineyard                 1040
                                    Orchard                  1050
43002   Forest                      Deciduous Forest         1100
                                    Coniferous Forest        1200
                                    Mixed Forest             1300
43003   Shrubland
43004   Heathland
43005   Peatland
43006   Wetland
54001   Characteristic vegetation   Deciduous Tree           1011
                                    Coniferous Tree          1012
                                    Deciduous Trees          1021
                                    Coniferous Trees         1022
                                    Mixed Trees              1023
                                    Hedge                    1100
                                    Deciduous Tree Row       1210
                                    Coniferous Tree Row      1220
                                    Mixed Trees Row          1230
                                    Brushwood                1250
                                    Shrubs                   1260
                                    Forest Aisle             1300
                                    Reed Bed                 1400
                                    Grass                    1500
                                    Fruit Tree               1600


2.01.4 Multi-Data Approach for Land Use and Land Cover Mapping

2.01.4.1 Introduction to Land Use and Land Cover Mapping

The availability of spatial land use (LU) and land cover (LC) information for larger areas is essential for numerous topics. The spectrum ranges from local to global applications, with different demands on LULC data from the various stakeholders (Giri, 2012; Komp, 2015; Mora et al., 2014). Key areas are, for instance, land use planning (Manakos and Lavender, 2014), food security (Foley et al., 2005, 2011; Thenkabail, 2010, 2012), and environmental modeling and climate change studies (Bareth, 2009; Bojinski et al., 2014; Simmer et al., 2014). In contrast to vegetation inventories, which focus on selected vegetation types, LULC datasets provide comprehensive information on the composition of the entire land surface; they therefore also contain information on built-up areas, water bodies, or barren land (Anderson et al., 1976). Nevertheless, vegetation mapping, especially of croplands, plays a preeminent role and is often a major driver for conducting LULC mapping (Teluguntla et al., 2015).

Nowadays, LULC data is usually provided in the form of digital maps in either raster or vector format, in which areas of different LULC are allocated to different categories (e.g., urban, forest, water). Although LU and LC are strongly interconnected, they have different meanings. Land cover denotes the observed biotic and abiotic composition of the Earth's surface, e.g., forests, water bodies, or urban areas (Giri, 2012; Meyer and Turner, 1992). Land use, in contrast, refers to the usage of land by humans (Campbell and Wynne, 2011a; Loveland and DeFries, 2004). For instance, a vegetated area may have the land cover of forest or trees, but the land use may be a recreation area or a tree nursery; additionally, a specific land use can be composed of several land cover types. To satisfy as many potential users as possible, maps therefore often contain a mixture of LU and LC (Anderson et al., 1976).

The categorical information provided by LULC maps is usually based on a classification scheme, either newly created or adapted from an established nomenclature. One of the first classification schemes is that of Anderson et al. (1976), which differentiates basic land cover types like Urban, Agricultural Land, or Water with up to two sublevels (e.g., Level I: Urban or built-up (1), Level II: Residential (11), Level III: Single-family Units (111); or Level I: Agricultural Land (2), Level II: Cropland or Pasture (21)). Many succeeding classification systems build upon this structure, although adaptations to a specific thematic focus or observation scale are common (e.g., NOAA, 2016; Xian et al., 2009). Further examples of popular contemporary LULC products in this regard are the National Land Cover Database 2011 (NLCD 2011) for the United States of America (Homer et al., 2015) and its predecessors, and the CORINE Land Cover (CLC) program of the European Union (Büttner et al., 2014; EEA, 2007).

LULC data is usually needed for rather large areas, irrespective of the investigation scale. Nevertheless, depending on the main application and the size of the study area, mapping endeavors can be roughly categorized as local, regional to countrywide, or continental to global scale studies (Table 2). Due to the large amount of work and costs associated with LULC mapping through field surveys, mapping of large areas only became possible after the initiation of the Landsat Program and the launch of Landsat-1 in 1972 (Loveland, 2012).
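As a toy illustration of such a hierarchical scheme, the following sketch encodes a few of the Anderson et al. (1976) codes quoted above and resolves a Level III code to its parent levels; the dictionary is a deliberately tiny, illustrative subset, not the full nomenclature.

```python
# Minimal sketch of an Anderson-style hierarchical LULC scheme, using
# the illustrative codes from the 1976 system cited above.
ANDERSON = {
    "1": "Urban or built-up land",
    "11": "Residential",
    "111": "Single-family units",
    "2": "Agricultural land",
    "21": "Cropland or pasture",
}

def hierarchy(code: str):
    """Yield (level prefix, name) pairs, e.g. '111' -> 1, 11, 111."""
    for i in range(1, len(code) + 1):
        prefix = code[:i]
        yield prefix, ANDERSON.get(prefix, "unknown")

for level, name in hierarchy("111"):
    print(level, name)
```

The prefix structure is what makes such schemes convenient in GIS work: a map can be generalized from Level III to Level I simply by truncating class codes.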
Countless satellite-borne earth observation systems with a variety of sensor specifications have emerged since then. Table 2 provides examples of contemporary sensors for each scale region. The rough categorization in the table is based on the general relationship between sensor capabilities (in terms of spatial, spectral, and temporal resolution), the size of the investigation area, and the minimum mapping unit (MMU). The MMU determines the size of the smallest features that are differentiated as discrete areas in a map (Lillesand et al., 2014; Warner et al., 2009); a simple MMU sieve sketch is given after Table 2. However, many sensors may be suitable for LULC mapping endeavors at multiple scales. The spatial resolution of the Landsat sensors, about 30 m or finer (from Landsat-4 onward), can be considered a quasi-standard for input data for regional to nationwide LULC mapping. Other popular moderate spatial resolution sensors (ca. 10-100 m), which are

Table 2  Selection of popular satellite sensors that are frequently used for land use/land cover mapping, at different spatial scales

Scale region                Spatial resolution    Satellite (sensor)   Swath width (km)   Spatial resolution (m)   Spectral range (nm)
Continental to global       Coarse (>100 m)       NOAA 17 (AVHRR)      2940               1100                     500-1250
                                                  SPOT (VGT)           2250               1000                     430-1750
                                                  Terra (MODIS)        2330               250-500                  366-14,385
Regional to countrywide /   Moderate (10-100 m)   …                    …                  …                        450-2350; 450-2350; 443-842;
  regional to national                                                                                             500-1730; 520-1700; 530-1165
Local                       Fine (<10 m)          …                    …                  …                        400-1040; 440-850
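The MMU concept maps directly onto a common post-classification step: sieving out classified patches smaller than the MMU. A minimal sketch using rasterio's sieve follows; the file names and the 1 ha MMU are assumptions for illustration.

```python
import numpy as np
import rasterio
from rasterio import features

# Hypothetical classified LULC raster (integer class codes). The sieve
# removes connected patches smaller than the minimum mapping unit (MMU),
# merging them into the largest neighboring patch.
with rasterio.open("lulc_classified.tif") as src:
    classified = src.read(1)
    profile = src.profile
    px_area = abs(src.transform.a * src.transform.e)  # m² per pixel

mmu_m2 = 10_000                       # assumed 1 ha minimum mapping unit
min_pixels = int(np.ceil(mmu_m2 / px_area))

sieved = features.sieve(classified, size=min_pixels)

with rasterio.open("lulc_mmu.tif", "w", **profile) as dst:
    dst.write(sieved, 1)
```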

2.01.5 Analysis of High-Resolution DSMs in Forestry and Agriculture

Over 90% of the tree canopy area was automatically classified with GIS surface analyses based on super-high-resolution DSMs derived by low-cost UASs. The potential of such GIS-based morphometric analysis of canopy surfaces offers major opportunities. In particular, the combined analysis of DSMs for structural vegetation parameters with (hyper-)spectral analysis for physiological vegetation parameters opens a new and promising research field for mapping of vegetation (Aasen et al., 2015).
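As a sketch of the kind of DSM-based surface analysis described here, the following derives a canopy height model (CHM) by differencing a DSM against a ground terrain model and then masks canopy with a simple height threshold; the file names and the 2 m threshold are illustrative assumptions, not the authors' workflow.

```python
import rasterio

# Hypothetical co-registered rasters from a UAS photogrammetry run:
# a digital surface model (canopy top) and a digital terrain model (ground).
with rasterio.open("dsm.tif") as dsm_src, rasterio.open("dtm.tif") as dtm_src:
    dsm = dsm_src.read(1).astype("float32")
    dtm = dtm_src.read(1).astype("float32")
    profile = dsm_src.profile

# Canopy height model: vegetation height above ground
chm = dsm - dtm

# Simple structural classification: pixels taller than 2 m are treated
# as tree canopy (the threshold is an assumption for illustration).
canopy = chm > 2.0
print(f"canopy fraction: {canopy.mean():.1%}")

# Write the CHM back to disk with the same georeferencing
profile.update(dtype="float32", count=1)
with rasterio.open("chm.tif", "w", **profile) as dst:
    dst.write(chm, 1)
```

Slope, aspect, and other morphometric measures of the canopy surface can then be computed from the CHM in the same way such measures are derived from terrain models in geomorphology.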

2.01.6 Conclusion and Outlook

The applications of GIS for mapping vegetation are manifold. From supporting data acquisition in field surveys to advances in spatial analysis for species extraction, GIS technologies are almost omnipresent in vegetation mapping, and the same is true for remote sensing technologies. The use of GIS for vegetation-related research was a key focus from the very beginning of GIS development in the early 1960s. Indeed, the "father of GIS," Dr. Roger F. Tomlinson, developed a land information system for the Canada Land Survey that is considered the first conceptualized and implemented GIS, and it already included vegetation and land use/land cover data for spatial decision making (Rura et al., 2014). Hence, the spatial data handling capabilities of GIS led to application and spatial data analysis developments in botany, landscape ecology, and geography, and GIS is nowadays a key tool for mapping vegetation in all its forms.

The ongoing miniaturization of sensors and mobile computing devices, the ongoing multisensor integration in proximal, airborne, and satellite sensing platforms, and the significant enhancement of spatial, temporal, and spectral resolution have led, and will continue to lead, to increasing data acquisition on vegetation in every context. In remote sensing, three current developments or initiatives will have a profound and challenging impact on vegetation mapping in the next 5-10 years. The first is ESA's Copernicus program with the Sentinel satellite family, in combination with its open data policy. The new and unique potential of the Sentinel data lies in its spatial resolution from 10 m, its temporal repetition


from several days, and its multisensor data acquisition comprising multispectral, thermal, and C-band data. Multitemporal and multisensoral data analysis is of key importance for investigating phenology and, consequently, plant vitality. Therefore, the Sentinel data, in combination with already operating and planned satellite sensors, will improve spaceborne vegetation mapping and sensing on a global scale. The second development is the increasing number of satellite sensors providing very high panchromatic (<1 m), multispectral (<2 m), or X-band (<2 m) resolution with a potential repetition of several days (e.g., WorldView-3/-4, TerraSAR-X). Although these data products still require substantial investment, their value for regional- and local-scale vegetation mapping has an unexploited potential, especially for tree or forest monitoring and for precision agriculture purposes. Finally, the third development is related to improvements in low-altitude remote sensing using manned vehicles (e.g., gyrocopters) or UASs (e.g., fixed-wing or multirotor UASs). The recently introduced lightweight multispectral and hyperspectral sensors (e.g., Cubert's UHD185 or Parrot's Sequoia), which can be mounted on low-weight UASs (<5 kg), are very capable and provide data at a spatial resolution of <0.01 m with an information richness not seen before.

These ongoing and coming remote sensing developments, in combination with recent photogrammetric software developments for DSM and 3D point cloud generation (e.g., Pix4D, Photoscan, SURE), will result in an increased demand for established and new GIS analysis methods. A prominent example is the GIS-based surface analysis toolset (e.g., slope, aspect, morphometry), comprising methods established in geomorphology that have been introduced to vegetation mapping only in the last few years. In our opinion, the surface analysis of DSMs with super-high spatial resolution (<5 cm) and high temporal resolution (repetition <10 days) will be one of the key research areas of vegetation mapping in the next few years. Finally, the combination of such structural vegetation parameters with physiological vegetation parameters like chlorophyll content or nutrient status, or spectral properties in general, has been a research focus for only a few years and bears still-unexploited potential. The identification and monitoring of single plant species or plant communities will be enhanced significantly by these approaches. GIS technologies for mapping vegetation therefore face a prosperous future, because the new technologies, resolutions, and amounts of spatial data produced demand new GIS solutions.

References

Aasen, H., Burkart, A., Bolten, A., Bareth, G., 2015. Generating 3D hyperspectral information with lightweight UAV snapshot cameras for vegetation monitoring: from camera calibration to quality assurance. ISPRS Journal of Photogrammetry and Remote Sensing 108, 245–259.
Adam, E., Mutanga, O., Odindi, J., Abdel-Rahman, E.M., 2014. Land-use/cover classification in a heterogeneous coastal landscape using RapidEye imagery: evaluating the performance of random forest and support vector machines classifiers. International Journal of Remote Sensing 35, 3440–3458. http://dx.doi.org/10.1080/01431161.2014.903435.
AdV, 2006. Documentation on the modelling of geoinformation of official surveying and mapping (GeoInfoDok) – Chapter 5 – Technical applications of the basic schema – Section 5.4 – Explanations on ATKIS®, Version 5.1, Status 31 July 2006. In: Afflerbach, S., Kunze, W. (Eds.), Working Committee of the Surveying Authorities of the States of the Federal Republic of Germany (AdV), p. 74. http://www.adv-online.de/AAA-Modell/Dokumente-der-GeoInfoDok/binarywriterservlet?imgUid=8df46f15-1ff9-f216-afd6ff3072e13d63&uBasVariant=11111111-1111-1111-1111-111111111111&isDownload=true (accessed 20.04.16).
AdV, 2016. Authoritative Real Estate Cadastre Information System (ALKIS®). Working Committee of the Surveying Authorities of the Laender of the Federal Republic of Germany. http://www.adv-online.de/Products/Real-Estate-Cadastre/ALKIS/.
Aguilar, M.A., Saldaña, M.M., Aguilar, F.J., 2013. GeoEye-1 and WorldView-2 pan-sharpened imagery for object-based classification in urban environments. International Journal of Remote Sensing 34, 2583–2606. http://dx.doi.org/10.1080/01431161.2012.747018.
Anderson, J.R., Hardy, E.E., Roach, J.T., Witmer, R.E., 1976. A land use and land cover classification system for use with remote sensor data – Geological Survey Professional Paper 964 – a revision of the land use classification system as presented in U.S. Geological Survey Circular 671. U.S. Geological Survey, Washington. http://landcover.usgs.gov/pdf/anderson.pdf (accessed 05.08.13).
Alexander, R., Millington, A.C. (Eds.), 2000. Vegetation mapping: from patch to planet. John Wiley & Sons Ltd, New York, p. 350.
Arsanjani, J.J., Tayyebi, A., Vaz, E., 2016. GlobeLand30 as an alternative fine-scale global land cover map: challenges, possibilities, and implications for developing countries. Habitat International 55, 25–31.
Ban, Y., Gong, P., Giri, C., 2015. Global land cover mapping using Earth observation satellite data: recent progresses and challenges. ISPRS Journal of Photogrammetry and Remote Sensing 103, 1–6. http://dx.doi.org/10.1016/j.isprsjprs.2015.01.001.
Bareth, G., 2001. Integration einer IRS-1C-Landnutzungsklassifikation in das ATKIS zur Verbesserung der Information zur landwirtschaftlichen Nutzfläche am Beispiel des württembergischen Allgäus. GIS – Zeitschrift für raumbezogene Informationen und Entscheidungen 6 (2001), 40–45. Wichmann Verlag, Heidelberg.
Bareth, G., 2008. Multi-data approach (MDA) for enhanced land use/land cover mapping. In: The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. XXXVII, Part B8, Beijing 2008. International Society for Photogrammetry and Remote Sensing, Beijing, pp. 1059–1066.
Bareth, G., 2009. GIS- and RS-based spatial decision support: structure of a spatial environmental information system (SEIS). International Journal of Digital Earth 2, 134–154. http://dx.doi.org/10.1080/17538940902736315.
Bareth, G., Bolten, A., Bongartz, J., Jenal, A., Kneer, C., Lussem, U., Waldhoff, G., Weber, I., 2017. Single tree detection in agro-silvo-pastoral systems from high resolution digital surface models obtained from UAV- and gyrocopter-based RGB-imaging. Zenodo. http://dx.doi.org/10.5281/zenodo.375603.
Bareth, G., Bolten, A., Hollberg, J., Aasen, H., Burkart, A., Schellberg, J., 2015. Feasibility study of using non-calibrated UAV-based RGB imagery for grassland monitoring: case study at the Rengen Long-term Grassland Experiment (RGE), Germany. In: DGPF Annual Conference '15, DGPF-Proceedings 24, pp. 55–62. http://www.dgpf.de/src/tagung/jt2015/start.html.
Bareth, G., Waldhoff, G., 2012. Regionalization of agricultural management by using the Multi-Data Approach (MDA). International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XXXIX-B8, 225–230.
Bargiel, D., Herrmann, S., 2011. Multi-temporal land-cover classification of agricultural areas in two European regions with high resolution spotlight TerraSAR-X data. Remote Sensing 3, 859–877.
Bendig, J., Bolten, A., Bareth, G., 2013. UAV-based imaging for multi-temporal, very high resolution crop surface models to monitor crop growth variability. Photogrammetrie Fernerkundung Geoinformation 2013 (6), 551–562.
Bendig, J., Yu, K., Aasen, H., Bolten, A., Bennertz, S., Broscheit, J., Gnyp, M.L., Bareth, G., 2015. Combining UAV-based crop surface models, visible and near infrared vegetation indices for biomass monitoring in barley. International Journal of Applied Earth Observation and Geoinformation 39, 79–87.


Benediktsson, J.A., Swain, P.H., Ersoy, O.K., 1990. Neural network approaches versus statistical methods in classification of multisource remote sensing data. IEEE Transactions on Geoscience and Remote Sensing 28, 540–552. http://dx.doi.org/10.1109/TGRS.1990.572944.
Benz, U.C., Hofmann, P., Willhauck, G., Lingenfelder, I., Heynen, M., 2004. Multi-resolution, object-oriented fuzzy analysis of remote sensing data for GIS-ready information. ISPRS Journal of Photogrammetry and Remote Sensing 58, 239–258.
Bill, R., 2016. Grundlagen der Geo-Informationssysteme. Wichmann, Heidelberg, 871 p.
Blaschke, T., 2010. Object based image analysis for remote sensing. ISPRS Journal of Photogrammetry and Remote Sensing 65, 2–16.
Blaschke, T., Hay, G.J., Kelly, M., Lang, S., Hofmann, P., Addink, E., Queiroz Feitosa, R., van der Meer, F., van der Werff, H., van Coillie, F., Tiede, D., 2014. Geographic object-based image analysis – towards a new paradigm. ISPRS Journal of Photogrammetry and Remote Sensing 87, 180–191. http://dx.doi.org/10.1016/j.isprsjprs.2013.09.014.
Blaschke, T., Strobl, J., 2001. What's wrong with pixels? Some recent developments interfacing remote sensing and GIS. GeoBIT/GIS 6, 12–17.
Bojinski, S., Verstraete, M., Peterson, T.C., Richter, C., Simmons, A., Zemp, M., 2014. The concept of essential climate variables in support of climate research, applications, and policy. Bulletin of the American Meteorological Society 95, 1431–1443. http://dx.doi.org/10.1175/BAMS-D-13-00047.1.
Brown, G., 2005. Mapping spatial attributes in survey research for natural management: methods and applications. Society & Natural Resources 18 (1), 17–39.
Burrough, P.A., McDonnell, R.A., 1998. Principles of geographical information systems. Oxford University Press, Oxford, New York.
Büttner, B., Kosztra, B., Maucha, G., Pataki, R., 2012. Implementation and achievements of CLC2006. European Environment Agency (EEA), Copenhagen. http://www.eea.europa.eu/data-and-maps/data/clc-2006-vector-data-version-2/ (accessed 27.05.13).
Büttner, G., Soukup, T., Kosztra, B., 2014. CLC2012 Addendum to CLC2006 Technical Guidelines – Final Draft V2, 14.08.2014. European Environment Agency. http://land.copernicus.eu/user-corner/technical-library/Addendum_finaldraft_v2_August_2014.pdf (accessed 11.01.17).
Campbell, J.B., Wynne, R.H., 2011a. Land use and land cover (Chapter 20). In: Introduction to remote sensing, 5th edn. The Guilford Press, New York, pp. 585–613.
Campbell, J.B., Wynne, R.H., 2011b. Introduction to remote sensing, 5th edn. The Guilford Press, New York.
Carneggie, D.M., Lauer, D.T., 1966. Use of multiband remote sensing in forest and range inventory. Photogrammetria 21, 115–141.
Castillejo-González, I.L., López-Granados, F., García-Ferrer, A., Peña-Barragán, J.M., Jurado-Expósito, M., de la Orden, M.S., González-Audicana, M., 2009. Object- and pixel-based analysis for mapping crops and their agro-environmental associated measures using QuickBird imagery. Computers and Electronics in Agriculture 68, 207–215. http://dx.doi.org/10.1016/j.compag.2009.06.004.
Cecchi, G., Magli, R., Mazzinghi, P., Pantani, L., Pippi, I., 1984. Vegetation remote sensing: a new field for Lidar applications. In: Proc. SPIE 0492, 1984 European Conference on Optics, Optical Systems, and Applications, March 27, 1985, vol. 180. http://dx.doi.org/10.1117/12.94369.
Chen, J., Chen, J., Liao, A., Cao, X., Chen, L., Chen, X., He, C., Han, G., Peng, S., Lu, M., Zhang, W., Tong, X., Mills, J., 2015. Global land cover mapping at 30 m resolution: a POK-based operational approach. ISPRS Journal of Photogrammetry and Remote Sensing 103, 7–27. http://dx.doi.org/10.1016/j.isprsjprs.2014.09.002.
Colomina, I., Molina, P., 2014. Unmanned aerial systems for photogrammetry and remote sensing: a review. ISPRS Journal of Photogrammetry and Remote Sensing 92, 79–97.
Congalton, R.G., Gu, J., Yadav, K., Thenkabail, P.S., Ozdogan, M., 2014. Global land cover mapping: a review and uncertainty analysis. Remote Sensing 6, 12070–12093.
Conrad, C., Dech, S., Dubovyk, O., Fritsch, S., Klein, D., Löw, F., Schorcht, G., Zeidler, J., 2014. Derivation of temporal windows for accurate crop discrimination in heterogeneous croplands of Uzbekistan using multitemporal RapidEye images. Computers and Electronics in Agriculture 103, 63–74. http://dx.doi.org/10.1016/j.compag.2014.02.003.
Coppin, P., Jonckheere, I., Nackaerts, K., Muys, B., Lambin, E., 2004. Digital change detection methods in ecosystem monitoring: a review. International Journal of Remote Sensing 25, 1565–1596. http://dx.doi.org/10.1080/0143116031000101675.
Corcoran, J., Knight, J., Gallant, A., 2013. Influence of multi-source and multi-temporal remotely sensed and ancillary data on the accuracy of random forest classification of wetlands in Northern Minnesota. Remote Sensing 5, 3212.
De Fries, R.S., Hansen, M., Townshend, J.R.G., Sohlberg, R., 1998. Global land cover classifications at 8 km spatial resolution: the use of training data derived from Landsat imagery in decision tree classifiers. International Journal of Remote Sensing 19, 3141–3168. http://dx.doi.org/10.1080/014311698214235.
De Wit, A.J.W., Clevers, J.G.P.W., 2004. Efficiency and accuracy of per-field classification for operational crop mapping. International Journal of Remote Sensing 25, 4091–4112. http://dx.doi.org/10.1080/01431160310001619580.
DeCover, 2012. DeCover 2 – space-based services for German land cover. EFTAS Fernerkundung Technologietransfer GmbH, Münster, Germany. http://www.decover.info/public/DeCOVER_Brochure_engl_V1_1_small.pdf (accessed 10.02.17).
Defries, R.S., Townshend, J.R.G., 1994. NDVI-derived land-cover classifications at a global-scale. International Journal of Remote Sensing 15, 3567–3586.
Dixon, B., Candade, N., 2008. Multispectral landuse classification using neural networks and support vector machines: one or the other, or both? International Journal of Remote Sensing 29, 1185–1206. http://dx.doi.org/10.1080/01431160701294661.
Drusch, M., Del Bello, U., Carlier, S., Colin, O., Fernandez, V., Gascon, F., Hoersch, B., Isola, C., Laberinti, P., Martimort, P., Meygret, A., Spoto, F., Sy, O., Marchese, F., Bargellini, P., 2012. Sentinel-2: ESA's optical high-resolution mission for GMES operational services. Remote Sensing of Environment 120, 25–36. http://dx.doi.org/10.1016/j.rse.2011.11.026.
Duro, D.C., Franklin, S.E., Dubé, M.G., 2012. A comparison of pixel-based and object-based image analysis with selected machine learning algorithms for the classification of agricultural landscapes using SPOT-5 HRG imagery. Remote Sensing of Environment 118, 259–272. http://dx.doi.org/10.1016/j.rse.2011.11.020.
EEA, 2007. CLC2006 technical guidelines. EEA Technical report, European Environment Agency, Copenhagen. http://land.copernicus.eu/user-corner/technical-library/CLC2006_technical_guidelines.pdf (accessed 11.01.17).
Ehlers, M., 1992. Remote sensing and geographic information systems: image-integrated geographic information systems. In: Johnson, A.I., Pettersson, C.B., Fulton, J.L. (Eds.), Geographic information systems (GIS) and mapping – practices and standards. American Society for Testing and Materials, Philadelphia, pp. 53–67.
ESRI, 2012a. Esri Grid format, ArcGIS Desktop 10.0 Help. Environmental Systems Research Institute. http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//009t0000000w000000.
ESRI, 2012b. Raster dataset attribute tables, ArcGIS Desktop 10.0 Help. Environmental Systems Research Institute. http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/Raster_dataset_attribute_tables/009t00000009000000/.
Foley, J.A., DeFries, R., Asner, G.P., Barford, C., Bonan, G., Carpenter, S.R., Chapin, F.S., Coe, M.T., Daily, G.C., Gibbs, H.K., Helkowski, J.H., Holloway, T., Howard, E.A., Kucharik, C.J., Monfreda, C., Patz, J.A., Prentice, I.C., Ramankutty, N., Snyder, P.K., 2005. Global consequences of land use. Science 309, 570–574.
Foley, J.A., Ramankutty, N., Brauman, K.A., Cassidy, E.S., Gerber, J.S., Johnston, M., Mueller, N.D., O'Connell, C., Ray, D.K., West, P.C., Balzer, C., Bennett, E.M., Carpenter, S.R., Hill, J., Monfreda, C., Polasky, S., Rockstrom, J., Sheehan, J., Siebert, S., Tilman, D., Zaks, D.P.M., 2011. Solutions for a cultivated planet. Nature 478, 337–342. http://www.nature.com/nature/journal/v478/n7369/abs/nature10452.html#supplementary-information.
Foody, G.M., Mathur, A., 2006. The use of small training sets containing mixed pixels for accurate hard image classification: training on mixed spectral responses for classification by a SVM. Remote Sensing of Environment 103, 179–189.
Friedl, M.A., Brodley, C.E., 1997. Decision tree classification of land cover from remotely sensed data. Remote Sensing of Environment 61, 399–409. http://dx.doi.org/10.1016/s0034-4257(97)00049-7.
Friedl, M.A., McIver, D.K., Hodges, J.C.F., Zhang, X.Y., Muchoney, D., Strahler, A.H., Woodcock, C.E., Gopal, S., Schneider, A., Cooper, A., Baccini, A., Gao, F., Schaaf, C., 2002. Global land cover mapping from MODIS: algorithms and early results. Remote Sensing of Environment 83, 287–302. http://dx.doi.org/10.1016/S0034-4257(02)00078-0.
Giri, C., Pengra, B., Long, J., Loveland, T.R., 2013. Next generation of global land cover characterization, mapping, and monitoring. International Journal of Applied Earth Observation and Geoinformation 25, 30–37. http://dx.doi.org/10.1016/j.jag.2013.03.005.
Giri, C., Zhu, Z., Reed, B., 2005. A comparative analysis of the Global Land Cover 2000 and MODIS land cover data sets. Remote Sensing of Environment 94, 123–132. http://dx.doi.org/10.1016/j.rse.2004.09.005.


Giri, C.P., 2012. Brief overview of remote sensing of land cover. In: Giri, C.P. (Ed.), Remote sensing of land use and land cover: principles and applications. CRC Press, Boca Raton, pp. 3–12.
Gislason, P.O., Benediktsson, J.A., Sveinsson, J.R., 2006. Random Forests for land cover classification. Pattern Recognition Letters 27, 294–300. http://dx.doi.org/10.1016/j.patrec.2005.08.011.
Gong, P., Wang, J., Yu, L., Zhao, Y., Zhao, Y., Liang, L., Niu, Z., Huang, X., Fu, H., Liu, S., Li, C., Li, X., Fu, W., Liu, C., Xu, Y., Wang, X., Cheng, Q., Hu, L., Yao, W., Zhang, H., Zhu, P., Zhao, Z., Zhang, H., Zheng, Y., Ji, L., Zhang, Y., Chen, H., Yan, A., Guo, J., Yu, L., Wang, L., Liu, X., Shi, T., Zhu, M., Chen, Y., Yang, G., Tang, P., Xu, B., Giri, C., Clinton, N., Zhu, Z., Chen, J., Chen, J., 2013. Finer resolution observation and monitoring of global land cover: first mapping results with Landsat TM and ETM+ data. International Journal of Remote Sensing 34, 2607–2654. http://dx.doi.org/10.1080/01431161.2012.748992.
Guo, K., 2010. Vegetation map and vegetation monographs of China. Bulletin of the Chinese Academy of Science 24 (4), 240–242.
Hansen, M., Dubayah, R., DeFries, R., 1996. Classification trees: an alternative to traditional land cover classifiers. International Journal of Remote Sensing 17, 1075–1081.
Hansen, M.C., Defries, R.S., Townshend, J.R.G., Sohlberg, R., 2000. Global land cover classification at 1 km spatial resolution using a classification tree approach. International Journal of Remote Sensing 21, 1331–1364. http://dx.doi.org/10.1080/014311600210209.
Hansen, M.C., Potapov, P.V., Moore, R., Hancher, M., Turubanova, S.A., Tyukavina, A., Thau, D., Stehman, S.V., Goetz, S.J., Loveland, T.R., Kommareddy, A., Egorov, A., Chini, L., Justice, C.O., Townshend, J.R.G., 2013. High-resolution global maps of 21st-century forest cover change. Science 342 (6160), 850–853.
Hazeu, G.W., 2014. Operational land cover and land use mapping in the Netherlands. In: Manakos, I., Braun, M. (Eds.), Land use and land cover mapping in Europe: practices & trends. Springer Netherlands, Dordrecht, pp. 283–296.
Heywood, I., Cornelius, S., Carver, S., 2011. An introduction to geographical information systems, 4th edn. Prentice Hall, Harlow.
Hirschmugl, M., Ofner, M., Raggam, J., Schardt, M., 2007. Single tree detection in very high resolution remote sensing data. Remote Sensing of Environment 110, 533–544.
Hoffmeister, D., Bolten, A., Curdt, C., Waldhoff, G., Bareth, G., 2010. High resolution Crop Surface Models (CSM) and Crop Volume Models (CVM) on field level by terrestrial laserscanning. In: Proc. SPIE, vol. 7840, 78400E, p. 6. http://dx.doi.org/10.1117/12.872315.
Hoffmeister, D., Waldhoff, G., Korres, W., Curdt, C., Bareth, G., 2016. Crop height variability detection in a single field by multi-temporal terrestrial laserscanning. Precision Agriculture 17 (3), 296–312.
Homer, C., Dewitz, J., Yang, L.M., Jin, S., Danielson, P., Xian, G., Coulston, J., Herold, N., Wickham, J., Megown, K., 2015. Completion of the 2011 National Land Cover Database for the conterminous United States – representing a decade of land cover change information. Photogrammetric Engineering and Remote Sensing 81, 345–354. http://dx.doi.org/10.14358/pers.81.5.345.
Houborg, R., Fisher, J.B., Skidmore, A.K., 2015. Advances in remote sensing of vegetation function and traits. International Journal of Applied Earth Observations and Geoinformation 43, 1–6.
Hovenbitzer, M., Emig, F., Wende, C., Arnold, S., Bock, M., Feigenspan, S., 2014. Digital land cover model for Germany – DLM-DE. In: Manakos, I., Braun, M. (Eds.), Land use and land cover mapping in Europe: practices & trends. Springer Netherlands, Dordrecht, pp. 255–272.
Huang, C., Davis, L.S., Townshend, J.R.G., 2002. An assessment of support vector machines for land cover classification. International Journal of Remote Sensing 23, 725–749.
Hütt, C., Koppe, W., Miao, Y., Bareth, G., 2016. Best accuracy land use/land cover (LULC) classification to derive crop types using multitemporal, multisensor, and multi-polarization SAR satellite images. Remote Sensing 8 (8), 684.
Hutchinson, C.F., 1982. Techniques for combining Landsat and ancillary data for digital classification improvement. Photogrammetric Engineering and Remote Sensing 48, 123–130.
Hyyppa, J., Kelle, O., Lehikoinen, M., Inkinen, M., 2001. A segmentation-based method to retrieve stem volume estimates from 3-D tree height models produced by laser scanners. IEEE Transactions on Geoscience and Remote Sensing 39 (5), 969–975.
Immitzer, M., Vuolo, F., Atzberger, C., 2016. First experience with Sentinel-2 data for crop and tree species classifications in Central Europe. Remote Sensing 8, 166.
Inglada, J., Vincent, A., Arias, M., Marais-Sicre, C., 2016. Improved early crop type identification by joint use of high temporal resolution SAR and optical image time series. Remote Sensing 8, 362.
Inglada, J., Vincent, A., Arias, M., Tardy, B., Morin, D., Rodes, I., 2017. Operational high resolution land cover map production at the country scale using satellite image time series. Remote Sensing 9, 95. http://dx.doi.org/10.3390/rs9010095.
Janssen, L.L.F., Jaarsma, M.N., Vanderlinden, E.T.M., 1990. Integrating topographic data with remote sensing for land-cover classification. Photogrammetric Engineering and Remote Sensing 56, 1503–1506.
Jaakkola, A., Hyyppa, J., Kukko, A., Yu, X.W., Kaartinen, H., Lehtomaki, M., Lin, Y., 2010. A low-cost multi-sensoral mobile mapping system and its feasibility for tree measurements. ISPRS Journal of Photogrammetry and Remote Sensing 65 (6), 514–522.
Jawak, S.D., Luis, A.J., 2013. Improved land cover mapping using high resolution multiangle 8-band WorldView-2 satellite remote sensing data. Journal of Applied Remote Sensing 7, 073573. http://dx.doi.org/10.1117/1.JRS.7.073573.
Johnson, B.A., Iizuka, K., 2016. Integrating OpenStreetMap crowdsourced data and Landsat time-series imagery for rapid land use/land cover (LULC) mapping: case study of the Laguna de Bay area of the Philippines. Applied Geography 67, 140–149.
Jones, H.G., Vaughan, R.A., 2010. Remote sensing of vegetation: principles, techniques, and applications. Oxford University Press, 380 p.
Kadaster, 2016. Basisregistratie Topografie: Catalogus en Productspecificaties. https://www.kadaster.nl/documents/20838/88032/BRT+Catalogus+en+productspecificaties/c446e010-e8ef-4660-b4d4-9063ed07bb46 (accessed 25.01.17).
Kanellopoulos, I., Varfis, A., Wilkinson, G.G., Megier, J., 1992. Land-cover discrimination in SPOT HRV imagery using an artificial neural network – a 20-class experiment. International Journal of Remote Sensing 13, 917–924.
Kavzoglu, T., Mather, P.M., 2003. The use of backpropagating artificial neural networks in land cover classification. International Journal of Remote Sensing 24, 4907–4938. http://dx.doi.org/10.1080/0143116031000114851.
Keil, M., Esch, T., Divanis, A., Marconcini, M., Metz, A., Ottinger, M., Voinov, S., Wiesner, M., Wurm, M., Zeidler, J., 2015. Updating the land use and land cover database CLC for the year 2012 – "Backdating" of DLM-DE from the reference year 2009 to the year 2006. Umweltbundesamt, Dessau-Roßlau, 80 p. http://www.umweltbundesamt.de/publikationen/updating-the-land-use-land-cover-database-clc-for.
Koenig, K., Höfle, B., 2016. Full-waveform airborne laser scanning in vegetation studies – a review of point cloud and waveform features for tree species classification. Forests 7 (9), 198. http://dx.doi.org/10.3390/f7090198.
Komp, K.U., 2015. High resolution land cover/land use mapping of large areas – current status and upcoming trends. Photogrammetrie Fernerkundung Geoinformation 2015, 395–410. http://dx.doi.org/10.1127/pfg/2015/0276.
Koppe, W., Gnyp, M.L., Hütt, C., Yao, Y.K., Miao, Y., Chen, X., Bareth, G., 2013. Rice monitoring with multi-temporal and dual-polarimetric TerraSAR-X data. International Journal of Applied Earth Observation and Geoinformation 21, 568–576.
Küchler, A.W., Zonneveld, I.S. (Eds.), 1988. Vegetation mapping. Handbook of Vegetation Science, vol. 10. Kluwer Academic Publishers, Dordrecht, 632 p.
Küchler, A.W., 1988a. Preface. In: Küchler, A.W., Zonneveld, I.S. (Eds.), Vegetation mapping, Handbook of vegetation science, vol. 10. Kluwer Academic Publishers, Dordrecht, pp. 1–2.
Küchler, A.W., 1988b. Historical sketch. In: Küchler, A.W., Zonneveld, I.S. (Eds.), Vegetation mapping, Handbook of vegetation science, vol. 10. Kluwer Academic Publishers, Dordrecht, pp. 3–11.
Küchler, A.W., 1988c. Aspect of maps. In: Küchler, A.W., Zonneveld, I.S. (Eds.), Vegetation mapping, Handbook of vegetation science, vol. 10. Kluwer Academic Publishers, Dordrecht, pp. 97–104.


Kumar, L., Dury, S.J., Schmidt, K., Skidmore, A., 2003. Imaging spectrometry and vegetation science. In: van der Meer, F.D., de Jong, S.M. (Eds.), Imaging spectrometry. Kluwer Academic Publishers, London, pp. 111–156.
Lambdon, P.W., Pysek, P., Basnou, C., Hejda, M., Arianoutsou, M., Essl, F., Jarosik, V., Pergl, J., Winter, M., Anastasiu, P., Andriopoulos, P., Bazos, I., Brundu, G., Celesti-Grapow, L., Chassot, P., Delipetrou, P., Josefsson, M., Kark, S., Klotz, S., Kokkoris, Y., Kuhn, I., Marchante, H., Perglova, I., Pino, J., Vila, M., Zikos, A., Roy, D., Hulme, P.E., 2008. Alien flora of Europe: species diversity, temporal trends, geographical patterns and research needs. Preslia 80 (2), 101–149.
Land NRW, 2017. Digitales Basis-Landschaftsmodell. https://www.opengeodata.nrw.de/produkte/geobasis/dlm/basis-dlm/basis-dlm_EPSG25832_Shape.zip (accessed 10.02.17). Datenlizenz Deutschland – Namensnennung – Version 2.0 (www.govdata.de/dl-de/by-2-0).
Lefsky, M.A., Cohen, W.B., Acker, S.A., Parker, G.G., Spies, T.A., Harding, D., 1999. Lidar remote sensing of the canopy structure and biophysical properties of Douglas-fir western hemlock forests. Remote Sensing of Environment 70, 339–361.
Liang, L., Gong, P., 2015. Evaluation of global land cover maps for cropland area estimation in the conterminous United States. International Journal of Digital Earth 8, 102–117. http://dx.doi.org/10.1080/17538947.2013.854414.
Lillesand, T., Kiefer, R.W., Chipman, J., 2014. Remote sensing and image interpretation. Wiley.
Long, H., Zhao, Z., 2005. Urban road extraction from high-resolution optical satellite images. International Journal of Remote Sensing 26, 4907–4921. http://dx.doi.org/10.1080/01431160500258966.
Longley, P.A., Goodchild, M.F., Maguire, D.J., Rhind, D.W., 2015. Geographic information systems and science, 5th edn. Wiley, West Sussex.
Loveland, T.R., 2012. History of land-cover mapping. In: Giri, C.P. (Ed.), Remote sensing of land use and land cover: principles and applications. CRC Press, Boca Raton, pp. 13–22.
Loveland, T.R., DeFries, R.S., 2004. Observing and monitoring land use and land cover change. In: DeFries, R.S., Asner, G.P., Houghton, R.A. (Eds.), Ecosystems and land use change. American Geophysical Union, Washington, pp. 231–246.
Löw, F., Michel, U., Dech, S., Conrad, C., 2013. Impact of feature selection on the accuracy and spatial uncertainty of per-field crop classification using Support Vector Machines. ISPRS Journal of Photogrammetry and Remote Sensing 85, 102–119. http://dx.doi.org/10.1016/j.isprsjprs.2013.08.007.
Lu, D., Mausel, P., Brondizio, E., Moran, E., 2004. Change detection techniques. International Journal of Remote Sensing 25, 2365–2407. http://dx.doi.org/10.1080/0143116031000139863.
Lussem, U., Hütt, C., Waldhoff, G., 2016. Combined analysis of Sentinel-1 and RapidEye data for improved crop type classification: an early season approach for rapeseed and cereals. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLI-B8, 959–963. http://dx.doi.org/10.5194/isprs-archives-XLI-B8-959-2016.
Mallet, C., Bretar, F., 2009. Full-waveform topographic lidar: state-of-the-art. ISPRS Journal of Photogrammetry and Remote Sensing 64 (1), 1–16.
Manakos, I., Lavender, S., 2014. Remote sensing in support of the geo-information in Europe. In: Manakos, I., Braun, M. (Eds.), Land use and land cover mapping in Europe: practices & trends. Springer Netherlands, Dordrecht, pp. 3–10.
McNairn, H., Champagne, C., Shang, J., Holmstrom, D., Reichert, G., 2009. Integration of optical and Synthetic Aperture Radar (SAR) imagery for delivering operational annual crop inventories. ISPRS Journal of Photogrammetry and Remote Sensing 64, 434–449. http://dx.doi.org/10.1016/j.isprsjprs.2008.07.006.
McNairn, H., Ellis, J., Van Der Sanden, J.J., Hirose, T., Brown, R.J., 2002. Providing crop information using RADARSAT-1 and satellite optical imagery. International Journal of Remote Sensing 23, 851–870. http://dx.doi.org/10.1080/01431160110070753.
Merchant, J., Narumalani, S., 2009. Integrating remote sensing and geographic information systems. In: Warner, T.A., Nellis, M.D., Foody, G.M. (Eds.), The SAGE handbook of remote sensing. SAGE Publications Ltd, London, pp. 257–269.
Meyer, W.B., Turner, B.L., 1992. Human population growth and global land-use/cover change. Annual Review of Ecology & Systematics 23, 39–61.
Millington, A.C., Alexander, R., 2000. Vegetation mapping in the last three decades of the twentieth century. In: Alexander, R., Millington, A.C. (Eds.), Vegetation mapping: from patch to planet. John Wiley & Sons Ltd, New York, pp. 321–332.
Mora, M., Tsendbazar, N.-E., Herold, M., Arino, O., 2014. Global land cover mapping: current status and future trends. In: Manakos, I., Braun, M. (Eds.), Land use and land cover mapping in Europe: practices & trends. Springer, Dordrecht Heidelberg, pp. 11–30.
More, R.S., Manjunath, K., Jain, N.K., Panigrahy, S., Parihar, J.S., 2016. Derivation of rice crop calendar and evaluation of crop phenometrics and latitudinal relationship for major south and south-east Asian countries: a remote sensing approach. Computers and Electronics in Agriculture 127, 336–350.
Moskal, L.M., Styers, D.M., Halabisky, M., 2011. Monitoring urban tree cover using object-based image analysis and public domain remotely sensed data. Remote Sensing 3, 2243.
Mountrakis, G., Im, J., Ogole, C., 2011. Support vector machines in remote sensing: a review. ISPRS Journal of Photogrammetry and Remote Sensing 66, 247–259.
Mueller-Dombois, D., 1984. Classification and mapping of plant communities: a review with emphasis on tropical vegetation. In: Woodwell, G.M. (Ed.), The role of terrestrial vegetation in the global carbon cycle: measurement by remote sensing. John Wiley & Sons Ltd, New York, pp. 21–88.
Mulla, D.J., 2013. Twenty-five years of remote sensing in precision agriculture: key advances and remaining knowledge gaps. Biosystems Engineering 114 (4), 358–371.
Naesset, E., 1997. Geographical information systems in long-term forest management and planning with special reference to preservation and planning of biological diversity: a review. Forest Ecology and Management 93 (1–2), 121–136.
Naesset, E., Gobakken, T., 2008. Estimation of above- and below-ground biomass across regions of the boreal forest zone using airborne laser. Remote Sensing of Environment 112, 3079–3090.
Nelson, R., 1997. Modeling forest canopy heights: the effects of canopy shape. Remote Sensing of Environment 60, 327–334.
NOAA, 2016. C-CAP land cover atlas. National Oceanic and Atmospheric Administration (NOAA Office for Coastal Management), Charleston, SC. https://coast.noaa.gov/digitalcoast/tools/lca.html.
Noss, R.F., 1990. Indicators for monitoring biodiversity: a hierarchical approach. Conservation Biology 4 (4), 355–364.
Odenweller, J.B., Johnson, K.I., 1984. Crop identification using Landsat temporal-spectral profiles. Remote Sensing of Environment 14, 39–54. http://dx.doi.org/10.1016/0034-4257(84)90006-3.
Pacifici, F., Chini, M., Emery, W.J., 2009. A neural network approach using multi-scale textural metrics from very high-resolution panchromatic imagery for urban land-use classification. Remote Sensing of Environment 113, 1276–1292. http://dx.doi.org/10.1016/j.rse.2009.02.014.
Pal, M., Mather, P.M., 2003. An assessment of the effectiveness of decision tree methods for land cover classification. Remote Sensing of Environment 86, 554–565. http://dx.doi.org/10.1016/s0034-4257(03)00132-9.
Paloscia, S., Pampaloni, P., 1988. Microwave polarization index for monitoring vegetation growth. IEEE Transactions on Geoscience and Remote Sensing 26 (5), 617–621.
Parplies, A., Dubovyk, O., Tewes, A., Mund, J.P., Schellberg, J., 2016. Phenomapping of rangelands in South Africa using time series of RapidEye data. International Journal of Applied Earth Observations and Geoinformation 53, 90–102.
Pedrotti, F., 2013. Plant and vegetation mapping. Springer, Heidelberg, 293 p.
Rascher, U., Alonso, L., Burkart, A., Cilia, C., Cogliati, S., Colombo, R., Damm, A., Drusch, M., Guanter, L., Hanus, J., Hyvarinen, T., Julitta, T., Jussila, J., Kataja, K., Kokkalis, P., Kraft, S., Kraska, T., Mateeva, M., Moreno, J., Muller, O., Panigada, C., Pikl, M., Pinto, F., Prey, L., Pude, R., Rossini, M., Schickling, A., Schurr, U., Schuttemeyer, D., Verrelst, J., Zemek, F., 2015. Sun-induced fluorescence – a new probe of photosynthesis: first maps from the imaging spectrometer HyPlant. Global Change Biology 21 (12), 4673–4684.
Reitberger, J., Schnörr, C., Krzystek, P., Stilla, U., 2009. 3D segmentation of single trees exploiting full waveform LIDAR data. ISPRS Journal of Photogrammetry and Remote Sensing 64 (6), 561–574.
Richards, J.A., 2013. Remote sensing digital image analysis: an introduction, 5th edn. Springer, Berlin Heidelberg.


Rodriguez-Galiano, V.F., Ghimire, B., Rogan, J., Chica-Olmo, M., Rigol-Sanchez, J.P., 2012. An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS Journal of Photogrammetry and Remote Sensing 67, 93–104. http://dx.doi.org/10.1016/j.isprsjprs.2011.11.002.
Rohierse, A., 2004. Regionale Darstellung der Umweltbelastungen durch klimarelevante Gase in der Agrarlandschaft Kraichgau – Das Boden-Landnutzungs-Informations-System für Treibhausgasemissionen. Universität Hohenheim, Germany. http://opus.ub.uni-hohenheim.de/volltexte/2004/48.
Rohierse, A., Bareth, G., 2004. Integration einer multitemporalen Satellitenbildklassifikation in ATKIS zur weiteren Differenzierung der Objektart Ackerland. GIS 2004, 35–41.
Roy, P.S., Behera, M.D., Murthy, M.S.R., Roy, A., Singh, Sarnam, Kushwaha, S.P.S., Jha, C.S., Sudhakar, S., Joshi, P.K., Reddy, C.S., Gupta, S., Pujar, G., Dutt, C.B.S., Srivastava, V.K., Porwal, M.C., Tripathi, P., Singh, J.S., et al., 2015. New vegetation type map of India prepared using satellite remote sensing: comparison with global vegetation maps and utilities. International Journal of Applied Earth Observation and Geoinformation 39, 142–159.
Rura, M., Marble, D., Alvarez, D., 2014. In Memoriam, Roger Tomlinson – "The Father of GIS" and the transition to computerized geographic information. Photogrammetric Engineering & Remote Sensing 80 (5), 401–402.
Schickling, A., Matveeva, M., Damm, A., Schween, J.H., Wahner, A., Graf, A., Crewell, S., Rascher, U., 2016. Combining sun-induced chlorophyll fluorescence and photochemical reflectance index improves diurnal modeling of gross primary productivity. Remote Sensing 8, 574. http://dx.doi.org/10.3390/rs8070574.
Scott, J.M., Davis, F., Csuti, B., Noss, R., Butterfield, B., Groves, C., Anderson, H., Caicco, S., Derchia, F., Edwards, T.C., Ulliman, J., Wright, R.G., 1993. GAP analysis – a geographical approach to protection of biological diversity. Wildlife Monographs 123, 1–41.
Shalaby, A., Tateishi, R., 2007. Remote sensing and GIS for mapping and monitoring land cover and land-use changes in the Northwestern coastal zone of Egypt. Applied Geography 27, 28–41. http://dx.doi.org/10.1016/j.apgeog.2006.09.004.
Shao, Y., Lunetta, R.S., 2012. Comparison of support vector machine, neural network, and CART algorithms for the land-cover classification using limited training data points. ISPRS Journal of Photogrammetry and Remote Sensing 70, 78–87. http://dx.doi.org/10.1016/j.isprsjprs.2012.04.001.
Simmer, C., Masbou, M., Thiele-Eich, I., Amelung, W., Bogena, H., Crewell, S., Diekkrüger, B., Ewert, F., Franssen, H.-J.H., Huisman, J.A., Kemna, A., Klitzsch, N., Kollet, S., Langensiepen, M., Löhnert, U., Mostaquimur Rahman, A.S.M., Rascher, U., Schneider, K., Schween, J., Shao, Y., Shrestha, P., Stiebler, M., Sulis, M., Vanderborght, J., Vereecken, H., van der Kruk, J., Waldhoff, G., Zerenner, T., 2014. Monitoring and modeling the terrestrial system from pores to catchments – the transregional collaborative research center on patterns in the soil-vegetation-atmosphere system. Bulletin of the American Meteorological Society. http://dx.doi.org/10.1175/BAMS-D-13-00134.1.
Smith, G.M., Fuller, R.M., 2001. An integrated approach to land cover classification: an example in the Island of Jersey. International Journal of Remote Sensing 22, 3123–3142. http://dx.doi.org/10.1080/01431160152558288.
Solberg, A.H.S., Jain, A.K., Taxt, T., 1994. Multisource classification of remotely sensed data – fusion of Landsat TM and SAR images. IEEE Transactions on Geoscience and Remote Sensing 32, 768–778. http://dx.doi.org/10.1109/36.298006.
Strahler, A.H., Logan, T.L., Bryant, N.A., 1978. Improving forest cover classification accuracy from Landsat by incorporating topographic information. In: Proceedings of the Twelfth International Symposium on Remote Sensing of the Environment. Environmental Research Institute of Michigan, Ann Arbor, Michigan, pp. 927–942.
Teluguntla, P.G., Thenkabail, P.S., Xiong, J.N., Gumma, M.K., Giri, C., Milesi, C., Ozdogan, M., Congalton, R., Tilton, J., Sankey, T.T., Massey, R., Phalke, A., Yadav, K., 2015. Global Cropland Area Database (GCAD) derived from remote sensing in support of food security in the twenty-first century: current achievements and future possibilities. In: Land resources monitoring, modeling, and mapping with remote sensing (Remote Sensing Handbook). Taylor & Francis, Boca Raton, Florida.
Thenkabail, P.S., 2010. Global croplands and their importance for water and food security in the twenty-first century: towards an ever green revolution that combines a second green revolution with a blue revolution. Remote Sensing 2, 2305–2312.
Thenkabail, P.S., 2012. Global croplands and their water use for food security in the twenty-first century: foreword. Photogrammetric Engineering and Remote Sensing 78, 797–798.
Thenkabail, P.S., Lyon, J.G., Huete, A., 2011. Advances in hyperspectral remote sensing of vegetation and agricultural croplands. In: Thenkabail, P.S., Lyon, J.G., Huete, A. (Eds.), Hyperspectral remote sensing of vegetation. CRC Press, Boca Raton, pp. 3–36.
Thenkabail, P.S., Smith, R.B., de Pauw, E., 2000. Hyperspectral vegetation indices and their relationships with agricultural crop characteristics. Remote Sensing of Environment 19 (3), 427–438.
Tilly, N., Aasen, H., Bareth, G., 2015. Fusion of plant height and vegetation indices for the estimation of barley biomass. Remote Sensing 7 (9), 11449–11480.
Tucker, C.J., Townshend, J.R.G., Goff, T.E., 1985. African land-cover classification using satellite data. Science 227, 369–375. http://dx.doi.org/10.1126/science.227.4685.369.
Turner, D., Lucieer, A., Malenovsky, Z., King, D.H., Robinson, S.A., 2014. Spatial co-registration of ultra-high resolution visible, multispectral and thermal images acquired with a micro-UAV over Antarctic moss beds. Remote Sensing 6 (5), 4003–4024.
Turner, D., Lucieer, A., Watson, C., 2012. An automated technique for generating georectified mosaics from ultra-high resolution Unmanned Aerial Vehicle (UAV) imagery, based on Structure from Motion (SfM) point clouds. Remote Sensing 4 (5), 1392–1410.
Turker, M., Arikan, M., 2005. Sequential masking classification of multi-temporal Landsat-7 ETM+ images for field-based crop mapping in Karacabey, Turkey. International Journal of Remote Sensing 26, 3813–3830. http://dx.doi.org/10.1080/01431160500166391.
Vanderbilt, V.C., Silva, L.F., Bauer, M.E., 1990. Canopy architecture measured with a laser. Applied Optics 29 (1), 99–106.
Van Niel, T.G., McVicar, T.R., 2004. Determining temporal windows for crop discrimination with remote sensing: a case study in south-eastern Australia. Computers and Electronics in Agriculture 45, 91–108. http://dx.doi.org/10.1016/j.compag.2004.06.003.
Waldhoff, G., 2014. Multidaten-Ansatz zur fernerkundungs- und GIS-basierten Erzeugung multitemporaler, disaggregierter Landnutzungsdaten. Methodenentwicklung und Fruchtfolgenableitung am Beispiel des Rureinzugsgebiets. Universität zu Köln, Köln, 334 p. http://kups.ub.uni-koeln.de/5861/.
Waldhoff, G., Bareth, G., 2009. GIS- and RS-based land use and land cover analysis: case study Rur-Watershed, Germany. In: Geoinformatics 2008 and Joint Conference on GIS and Built Environment: Advanced Spatial Data Models and Analyses, Proc. SPIE 7146. SPIE, pp. 714626–714628. http://dx.doi.org/10.1117/12.813171.
Waldhoff, G., Curdt, C., Hoffmeister, D., Bareth, G., 2012. Analysis of multitemporal and multisensor remote sensing data for crop rotation mapping. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences I-7, 177–182. http://dx.doi.org/10.5194/isprsannals-I-7-177-2012.
Waldhoff, G., Eichfuss, S., Bareth, G., 2015. Integration of remote sensing data and basic geodata at different scale levels for improved land use analyses. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XL-3/W3, 85–89. http://dx.doi.org/10.5194/isprsarchives-XL-3-W3-85-2015.
Waldhoff, G., Lussem, U., Bareth, G., 2017. Multi-Data Approach for remote sensing-based regional crop rotation mapping: a case study for the Rur catchment, Germany. International Journal of Applied Earth Observations and Geoinformation. http://dx.doi.org/10.1016/j.jag.2017.04.009. Accepted.
Walsh, S.J., 1980. Coniferous tree species mapping using Landsat data. Remote Sensing of Environment 9, 11–26.
Wang, J., Zhao, Y., Li, C., Yu, L., Liu, D., Gong, P., 2015. Mapping global land cover in 2001 and 2010 with spatial-temporal consistency at 250 m resolution. ISPRS Journal of Photogrammetry and Remote Sensing 103, 38–47. http://dx.doi.org/10.1016/j.isprsjprs.2014.03.007.
Warner, T.A., Nellis, M.D., Foody, G.M., 2009. Remote sensing scale and data selection issues. In: Warner, T.A., Nellis, M.D., Foody, G.M. (Eds.), The SAGE handbook of remote sensing. SAGE Publications Ltd, London, pp. 3–17.
Waske, B., Braun, M., 2009. Classifier ensembles for land cover mapping using multitemporal SAR imagery. ISPRS Journal of Photogrammetry and Remote Sensing 64, 450–457. http://dx.doi.org/10.1016/j.isprsjprs.2009.01.003.
Wieneke, S., Ahrends, H., Damm, A., Pinto, F., Stadler, A., Rossini, M., Rascher, U., 2016. Airborne based spectroscopy of red and far-red sun-induced chlorophyll fluorescence: implications for improved estimates of gross primary productivity. Remote Sensing of Environment 184, 654–667.
Wu, Q., Li, H.-Q., Wang, R.-S., Paulussen, J., He, Y., Wang, M., Wang, B.-H., Wang, Z., 2006. Monitoring and predicting land use change in Beijing using remote sensing and GIS. Landscape and Urban Planning 78, 322–333. http://dx.doi.org/10.1016/j.landurbplan.2005.10.002.
Wu, W., Zucca, C., Karam, F., Liu, G., 2016. Enhancing the performance of regional land cover mapping. International Journal of Applied Earth Observation and Geoinformation 52, 422–432. http://dx.doi.org/10.1016/j.jag.2016.07.014.


Wulder, M., Franklin, S. (Eds.), 2003. Remote sensing of forest environments – concepts and case studies. Springer, New York, 519 p.
Wyatt, B.K., 2000. Vegetation mapping from ground, air and space – competitive or complementary techniques? In: Alexander, R., Millington, A.C. (Eds.), Vegetation mapping: from patch to planet. John Wiley & Sons Ltd, New York, pp. 3–17.
van der Zee, D., Huizing, H., 1988. Automated cartography and electronic Geographic Information Systems. In: Küchler, A.W., Zonneveld, I.S. (Eds.), Vegetation mapping, Handbook of vegetation science, vol. 10. Kluwer Academic Publishers, Dordrecht, pp. 163–189.
Xian, G., Homer, C., Fry, J., 2009. Updating the 2001 National Land Cover Database land cover classification to 2006 by using Landsat imagery change detection methods. Remote Sensing of Environment 113, 1133–1147. http://dx.doi.org/10.1016/j.rse.2009.02.004.
Yang, X., Lo, C.P., 2002. Using a time series of satellite imagery to detect land use and land cover changes in the Atlanta, Georgia metropolitan area. International Journal of Remote Sensing 23, 1775–1798. http://dx.doi.org/10.1080/01431160110075802.
Yuan, F., Sawaya, K.E., Loeffelholz, B.C., Bauer, M.E., 2005. Land cover classification and change analysis of the Twin Cities (Minnesota) Metropolitan Area by multitemporal Landsat remote sensing. Remote Sensing of Environment 98, 317–328.
Zarco-Tejada, P.J., Pushnik, J.C., Dobrowski, S., Ustin, S.L., 2003. Steady-state chlorophyll a fluorescence detection from canopy derivative reflectance and double-peak red-edge effects. Remote Sensing of Environment 84 (2), 283–294.
Zonneveld, I.S., 1988. Examples of vegetation maps, their legends and ecological diagrams. In: Küchler, A.W., Zonneveld, I.S. (Eds.), Vegetation mapping, Handbook of vegetation science, vol. 10. Kluwer Academic Publishers, Dordrecht, pp. 135–147.

Further Reading

Reitberger, J., Krzystek, P., Stilla, U., 2008. Analysis of full waveform LIDAR data for the classification of deciduous and coniferous trees. International Journal of Remote Sensing 29 (5), 1407–1431.
Wulder, M.A., White, J.C., Goward, S.N., Masek, J.G., Irons, J.R., Herold, M., Cohen, W.B., Loveland, T.R., Woodcock, C.E., 2008. Landsat continuity: issues and opportunities for land cover monitoring. Remote Sensing of Environment 112, 955–969. http://dx.doi.org/10.1016/j.rse.2007.07.004.

2.02 GIS for Paleo-limnological Studies

Yongwei Sheng, Austin Madson, and Chunqiao Song, University of California, Los Angeles, CA, United States

© 2018 Elsevier Inc. All rights reserved.

2.02.1 Introduction
2.02.2 GIS Toolbox for Paleo-lake Reconstruction
2.02.3 Regional-Scale Paleo-lake Reconstruction on the TP
2.02.4 Lithospheric Rebound Effect Modeling
2.02.5 Discussions
2.02.6 Concluding Remarks
References

2.02.1 Introduction

Lakes have been experiencing distinct changes in response to ongoing and accelerating climatic and environmental change (Oki and Kanae, 2006). In addition, human population growth and the accompanying demands on water usage have put increasing pressure on lake water (Barnett et al., 2008). The extent of environmental and human-induced impacts on lakes has increased tremendously over the past century at all spatial scales (Vörösmarty et al., 2000; Alcamo et al., 2007). As a result, lakes serve as important indicators of human-induced stressors as well as of climatic and environmental changes within their drainage basins. Changes in lake extents have broader implications for regional water balances, ecosystem health, biogeochemical cycles, the exchange of energy and trace gases with the atmosphere, and human water use.

Knowledge of paleo-environmental conditions is crucial for a better understanding of the context of the global climatic changes we are currently experiencing (Sheng, 2009). In particular, reconstruction of paleo-lake conditions has large ramifications for paleo-limnological and paleo-climatological studies, and these techniques have become a vital means to a better understanding of current and future global-scale climatic changes. While contemporary lake dynamics are being monitored using various remote sensing data (Cretaux et al., 2015; Li et al., 2011a,b; Nie et al., 2013; Sheng and Li, 2011; Sheng et al., 2016; Smith et al., 2005; Song and Ke, 2014; Song and Sheng, 2015; Verpoorter et al., 2014), paleo-lake changes are rather difficult to determine. Geographic information systems (GISs), although not widely used in paleo-limnology, provide a useful tool for analyzing the environmental conditions of paleo-lakes. GIS has advantages in managing spatial lake coring and sampling databases (latitude/longitude coordinates, core depths, laboratory identification numbers, sediment types, dating information, etc.) (Heim et al., 2008), and it provides the means for complex analysis of other thematic and auxiliary datasets relevant to limnology. For example, lake basin boundaries and drainage networks can be extracted from digital elevation models (DEMs) using various GIS hydrological modeling functions (a minimal sketch of this kind of DEM-based catchment delineation follows this introduction), and this information is essential to a better understanding of runoff pathways and water budgets for paleo-lakes. Reconstructing paleo-lake inundation extents, however, is much more challenging than contemporary lake mapping from remotely sensed optical and/or radar datasets, because the reconstruction must be based on discontinuous relict paleo-shoreline features and lacustrine deposit indicators, and these features can only be resolved by remotely sensed data at sufficiently fine spatial resolutions.

Drastic lake shrinkage can have severe negative impacts on the surrounding environment. The environmental disaster caused by the Aral Sea's shrinkage is a recent but classic example. The Aral Sea, in 1960 the world's fourth-largest lake with an area of 68,000 km², has been shrinking dramatically over the last five decades. This sharp reduction in areal extent is due to the diversion of the Syr Darya and Amu Darya Rivers for agricultural irrigation and for alleviating anthropogenic water stress during climatic drought events.
From 1960 to 1987, the Aral Sea's water level dropped by approximately 13 m, its areal extent was reduced by 40%, and its salinity increased from 10 to 27 g/L. The combination of these factors turned the Aral Sea into a residual brine lake (Micklin, 1988). Owing to this dramatic desiccation, the Aral Sea split into two separate water bodies in the mid-2000s: the smaller North Aral Sea and the relatively large South Aral Sea, which together cover less than 20% of the Sea's original size (Sheng, 2014). The Aral Sea's dried lakebed is estimated to contain approximately 10 Gt (gigatons) of salt, and an estimated 43 million tons of these salts are transported annually by eolian processes into adjacent areas, deposited as aerosols and by rain and dew over an area as large as 150,000 km². This deposition and the increased salinity have caused a steep decline in biological diversity and productivity in the region. In addition, desert animals have suffered from the greatly increased mineral content of their drinking water, and native plant communities have been severely degraded, causing desertification. Lastly, people living in the area suffer from freshwater deficiencies and from high rates of cancer and respiratory illnesses linked to these processes.

Many paleo-lakes in other endorheic basins have undergone processes similar to those described for the Aral Sea. Paleo-shoreline features, such as wave-cut ridges and sand bars, are oftentimes found around modern lakes. Such indicators of paleo-lake shrinkage are widely found across the globe, especially in endorheic basins in the arid and semi-arid regions of Western and Central Asia, Northern Africa, Central Australia, the Altiplano of South America, and the Great Basin of the United States, as well as in other arid regions of the world.


An endorheic basin, also known as a closed basin, is a drainage basin that retains water within its boundary, with no outlets linking it to external rivers or the ocean. Because these basins are not hydrologically connected to major rivers and oceans, many of their lakes are particularly sensitive and vulnerable to climatic changes and human activities within their respective basin extents. Owing to the closed-off nature of endorheic basins, lake levels are regulated mainly by precipitation and evaporation. Modern-day endorheic basins are typically located in arid or semi-arid climates. However, the climate regime in these regions during the Pleistocene was generally much wetter, and these regions saw drastic climatic fluctuations during the Holocene as well (Street and Grove, 1979). Many of the lakes are consequently surrounded by paleo-shore relicts left by shrinkage during the subsequent drier climatic conditions. These relicts serve as indicators of past lake extents at various periods since the high stand during the great lake period (GLP).

Though preserved, most paleo-shoreline relicts remain only as discontinuous segments visible in remotely sensed images. Due to these discontinuities, it is impossible to recover complete paleo-lake boundaries merely by tracing the relicts in remotely sensed images with edge detection algorithms or classification schemes. A lake must have experienced a climatic and hydrologic steady-state period for a paleo-shoreline to leave its mark on the surrounding landscape, which implies that the shorelines were horizontal upon their creation. In theory, preserved paleo-shoreline segments should therefore remain at similar elevations, provided that no significant neo-tectonic activity has taken place since the shorelines formed during the steady-state period. Paleo-shoreline reconstruction cannot be a fully automated process, since most paleo-shoreline sections have not been completely preserved through the long-term geomorphological and hydrological weathering processes.

Geospatial information technologies, including remote sensing and GIS, provide a feasible tool for an objective and systematic inventory of paleo-lake inundation extents. Sizable paleo-lake shoreline relicts and lacustrine deposits are visible on remotely sensed images, and elevation data have proven useful in recovering paleo-lakes (Ghienne et al., 2002; Wilkins, 1997). By combining aerial photographs, satellite images, and terrain data, paleo-lake extents have previously been reconstructed in various studies, yet mostly for individual large lakes such as Lake Chad in Africa (Drake and Bristow, 2006; Ghienne et al., 2002; Leblanc et al., 2006a,b; Schuster et al., 2005) and Lake Eyre in Australia (DeVogel et al., 2004), through tedious procedures. To facilitate efficient paleo-lake mapping at regional scales, Sheng (2009) developed a semi-automated paleo-lake extent recovery toolbox, PaleoLakeR, through the integration of satellite imagery and DEMs. This GIS-based paleo-lake reconstruction toolbox is described in section "GIS Toolbox for Paleo-lake Reconstruction".
In section “Regional-Scale Paleo-lake Reconstruction on the Tibetan Plateau,” we present a case study of Pleistocene/Holocene lake inundation area recovery and water volume change estimation in the Tibetan Plateau (TP) by utilizing the PaleoLakeR tools described in section “GIS Toolbox for Paleo-lake Reconstruction.” Substantial water volume losses due to paleo-lake shrinkage can cause the underlying lithosphere to rebound, and section “Lithospheric Rebound Effect Modeling” quantifies this lithospheric rebound for a large lake on the TP by utilizing a spherically symmetric, non-rotating, elastic, and isotropic (SNREI) Earth model. Lastly, section “Discussions” provides a discussion on future directions and possible extensions of this work.
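As referenced in the introduction, a minimal sketch of DEM-based extraction of drainage structure is given below. It is a pure-NumPy D8 illustration under simplifying assumptions (a small, depression-filled DEM held in an array; all names are ours and not part of any paleo-limnology toolbox); operational workflows would instead use the hydrological modeling functions of a GIS, including pit filling and flat resolution.

```python
import numpy as np
from collections import deque

# The eight D8 neighbours as (row, col) offsets, with their unit distances.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
           (0, 1), (1, -1), (1, 0), (1, 1)]
DISTS = [2 ** 0.5, 1.0, 2 ** 0.5, 1.0, 1.0, 2 ** 0.5, 1.0, 2 ** 0.5]

def d8_downstream(dem):
    """Steepest-descent D8 receiver of every cell; (-1, -1) marks pits/edges.

    `dem` is a 2-D NumPy array of elevations, assumed depression-filled.
    """
    nrow, ncol = dem.shape
    down = -np.ones((nrow, ncol, 2), dtype=int)
    for r in range(nrow):
        for c in range(ncol):
            steepest = 0.0
            for (dr, dc), dist in zip(OFFSETS, DISTS):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrow and 0 <= cc < ncol:
                    drop = (dem[r, c] - dem[rr, cc]) / dist
                    if drop > steepest:
                        steepest = drop
                        down[r, c] = (rr, cc)
    return down

def catchment(down, outlet):
    """Boolean mask of cells draining to `outlet`, walking the D8 graph upstream."""
    nrow, ncol = down.shape[:2]
    mask = np.zeros((nrow, ncol), dtype=bool)
    mask[outlet] = True
    queue = deque([outlet])
    while queue:
        r, c = queue.popleft()
        for dr, dc in OFFSETS:
            rr, cc = r + dr, c + dc
            if (0 <= rr < nrow and 0 <= cc < ncol
                    and not mask[rr, cc] and tuple(down[rr, cc]) == (r, c)):
                mask[rr, cc] = True
                queue.append((rr, cc))
    return mask
```

Applied to an endorheic basin, the catchment mask of a terminal lake cell approximates the closed basin boundary that controls the lake's water budget.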

2.02.2 GIS Toolbox for Paleo-lake Reconstruction

Typical paleo-lakes were usually much larger than the modern lakes descended from them. Paleo Lake Eyre in Central Australia, Paleo Lake Chad in Africa, the Paleo Aral Sea in Central Asia, and Paleo Siling Co on the TP, for example, each occupied tens of thousands of square kilometers. Extensive satellite imagery and DEM databases are therefore required to completely cover these paleo-lakes. To further complicate matters, DEMs and satellite images are often available at different spatial resolutions and in different map projections. The PaleoLakeR toolbox was designed and developed in the ENvironment for Visualizing Images (ENVI) using the Interactive Data Language (IDL) to meet these challenges of data quantity and heterogeneity. To tackle the heterogeneity of map projections and resolutions, the elevation value of a pixel in the satellite image is retrieved from the DEM through their common geographic coordinates. PaleoLakeR provides an interactive environment with a high degree of automation for efficient paleo-lake mapping.

The underlying assumption of shoreline-based paleo-lake retrieval is that the lake basin has been relatively tectonically stable since the shoreline's formation. A paleo-shore feature is identified on a satellite image as illustrated in Fig. 1, and its elevation is determined from topographic data. The complete paleo-lake shoreline can then be recovered by tracing the terrain contour at this elevation until it closes. The water volume change caused by paleo-lake shrinkage can be estimated by summing the volume of each DEM cell between the paleo-lake extent and the modern-day lake extent. The volume at a given DEM cell, which is the volume of a 3-D pillar, is calculated from the present-day lake level derived from the same DEM, the recovered paleo-lake level, and the cell size.

The user interface of PaleoLakeR, shown in Fig. 2, contains a group of menus accessible in the image display window. An operator first interactively picks sample points on the outermost paleo-shoreline segments around a contemporary lake in the satellite image. The tool then automatically extracts their elevation values from the DEM, records the elevations of the uppermost shoreline segments, and uses them as the benchmark elevation to recover the paleo-lake extent. PaleoLakeR also automatically computes the water level change and the water volume change caused by the lake shrinkage. The operator can use the Point-Move tool to display the current cursor location coordinates and the corresponding elevation in the DEM. The displayed elevation value helps identify paleo-shore segments that belong to the same paleo-shoreline, as segments formed in the same period are likely found at similar water levels. The Paleo-shore Relict Sample function allows the operator to pick the paleo-shore feature points.

Fig. 1 Paleo-lake extent recovery and water loss computation. The sketch shows the recovered paleo-lake extent with identified paleo-shore relics, the modern lake extent over the modern terrain surface, and a DEM cell i with the paleo and modern water levels Hp and Hm. The water volume change is computed as ΔV = Σ_{i=1}^{N} S_pixel × (Hp − Hm), where S_pixel is the DEM cell area and N is the number of cells between the two extents.

Fig. 2 The interface of the PaleoLakeR toolbox. Paleo-shore relicts are extensively found surrounding the endorheic lake. The green pluses are paleo-shore samples picked at the outermost paleo-shorelines, and the red polygon outlines the recovered paleo-lake extent.

Finally, the core recovery tool takes the benchmark elevation from the Point-Report tool and automatically recovers the paleo-lake extent by tracing the contour at the benchmark elevation in the DEM. The tool also computes the water level and water volume changes due to the paleo-lake shrinkage. Once the recovery results are satisfactory, the operator can save the retrieved paleo-lake extent, together with these paleo-lake change attributes, as a polygon in ArcView® Shapefile format.
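To make the recovery and volume computation concrete, the following minimal Python sketch (a schematic stand-in, not the ENVI/IDL implementation of PaleoLakeR; the function and variable names are hypothetical) traces the DEM at the benchmark elevation and sums the per-cell pillar volumes of Fig. 1:

import numpy as np
from scipy import ndimage

def recover_paleo_lake(dem, benchmark_elev, seed_rc, cell_area):
    """Recover a paleo-lake extent and estimate the water volume change.

    dem            : 2-D array of elevations (modern terrain/lake surface, Hm)
    benchmark_elev : recovered paleo-shoreline elevation (Hp)
    seed_rc        : (row, col) of a cell inside the modern lake
    cell_area      : area of one DEM cell (S_pixel)
    """
    # Candidate cells lie at or below the benchmark contour.
    candidates = dem <= benchmark_elev
    # Keep only the candidate region connected to the modern lake,
    # i.e., trace the contour until it closes around the basin.
    labels, _ = ndimage.label(candidates)
    paleo_mask = labels == labels[seed_rc]
    # Pillar height per cell is Hp - Hm; summing over all cells inside
    # the paleo extent gives the volume change of Fig. 1.
    pillar_heights = np.where(paleo_mask, benchmark_elev - dem, 0.0)
    return paleo_mask, float(pillar_heights.sum() * cell_area)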

2.02.3

Regional-Scale Paleo-lake Reconstruction on the TP

PaleoLakeR has been used to recover the paleo-lake extents for hundreds of lakes across the TP, which is known as one of the world's major endorheic lake regions. A large number of lakes on the plateau are surrounded by preserved shore relicts and lightly toned lacustrine deposits. We identified paleo-shoreline segments and recovered the paleo-lake extents (Fig. 3) at identified benchmark elevations by using the PaleoLakeR toolbox in combination with 30-m resolution circa-2000 Landsat/ETM+ (Enhanced Thematic Mapper Plus) mosaic imagery and three-arc-second resolution Shuttle Radar Topography Mission (SRTM) DEM data. After a careful inspection of the Landsat mosaic, we found 653 contemporary lakes (with a total area of 21,613 km2; blue polygons in Fig. 3) with visible paleo-shorelines. By utilizing the recovery toolbox, we retrieved their paleo-lake extents and estimated their water volume changes due to lake shrinkage. The end product of this work is a Tibetan paleo-lake GIS database, containing paleo-lake extent polygons for the recovered lakes along with their volumetric changes. Analysis of the results shows that these contemporary lakes have evolved from only 173 large paleo-lakes (orange polygons in Fig. 3), which previously occupied a total area of 63,622 km2. The total lake area shrinkage and water volume change are estimated at 42,109 km2 (i.e., two-thirds of the total lake area) and ~2936 Gt, respectively. As an example of a specific lake reconstruction, the inset in Fig. 3 demonstrates the evolution of Siling Co (Co: lake in Tibetan), the largest lake in the central TP. During recent decades, Siling Co has experienced a dramatic expansion, from 1640 km2 in 1976 to 2335 km2 in 2009, with its water level reaching 4544 m. The shoreline recovery tool determined that Paleo Siling Co occupied an area of 7665 km2 and subsequently shrank to its modern extent as the regional climate became drier. This drastic reduction in areal extent saw Paleo Siling Co evolve into 56 different contemporary lakes, including the current Siling Co as well as the sizable Pangkog Co, Urru Co, and Qiagui Co. According to our retrieval results, the paleo-lake level was estimated at 4600 m (~56 m above the present-day water level). The paleo-lake retrieval reveals that the lake shrinkage led to a loss of ~310 billion m3 of water and the disappearance of ~64% of the original lake extent. The paleo-shoreline identification and the water level drop were validated in a field campaign conducted during the summer of 2009 using laser-ranging devices and vertical transects (Fig. 4). Sand samples were collected at identified paleo-shoreline relicts for optically stimulated luminescence (OSL) dating, and the dating results indicate that the GLP of Siling Co occurred ~11 ka BP, during the early Holocene. The above paleo-lake retrieval is considered a conservative estimate due to the effect of isostatic rebound caused by the water mass unloading associated with the lake shrinkage (Sella et al., 2007). The lithospheric rebound from the severe hydrologic unloading, caused by the extreme changes in lake loads from the paleo-extent to the modern-day extent, has very likely caused the lake basins to uplift and, consequently, our volume losses to be underestimated.

Fig. 3 Recovered extent of paleo-lakes across the plateau (78°E–94°E, 28°N–36°N). One hundred and seventy-three large paleo-lakes (in orange) are recovered from paleo-shore relicts surrounding 653 contemporary lakes (in blue) in the ETM+ mosaic. The inset details the Siling Co reconstruction.

Fig. 4 Field validation of the paleo-lake reconstruction using a laser-ranging device.

2.02.4

Lithospheric Rebound Effect Modeling

This section uses Siling Co as a specific case study to quantify the amount of lithospheric rebound using an SNREI model. Lithospheric flexural modeling has been applied to a multitude of research topics, ranging from glacial isostatic adjustment (GIA) studies to post-seismic rheology probes to both recent and paleo-lacustrine loading and unloading studies (Bradley et al., 2009; Doin et al., 2015; England et al., 2013; Lundgren et al., 2009; Madson, 2016; Milne et al., 2001; Wen et al., 2012). Most lithospheric flexural analyses utilize either elastic or viscoelastic models in order to determine how the lithosphere responds to different forcings. Although an in-depth discussion of the underlying physical parameters behind these models is outside the scope of this book, we briefly introduce the SNREI Earth model utilized herein to provide an example of crustal rebound from paleo-lacustrine hydrologic unloading for a large lake on the TP. A modified version of the Regional ElAstic Rebound calculator (REAR) created by Melini et al. (2015) is employed to derive the lake-adjacent accumulated flexural rebound from hydrologic unloading. The REAR model is based on the flexural theory of Farrell (1972) and derives the instantaneous response to a given constant loading or unloading scenario. The first step in calculating the flexural response is to derive the Green's functions (GFs). These GFs are then convolved with hydrologic unloading inputs to derive the lithospheric rebound. We determine the GFs from load Love numbers (LLNs) computed from two different Earth models: the STW105 model of Kustowski et al. (2007) and the TC1P model of Wang et al. (2015). We scrutinized both Earth models to determine which of the two would provide more accurate results for the Siling Co study area. The STW105 model builds upon the Preliminary Reference Earth Model (PREM) and takes into account nonlinear crustal effects during the inversion of the tomographic waveform in order to increase the model's accuracy (Kustowski et al., 2007). The density, velocity, and thickness parameters of the two models differ, however, in that the TC1P model takes its crustal structure from the recent CRUST1.0 model (Wang et al., 2015). The CRUST1.0 model computes more accurate densities for the TP region and updates the corresponding data within PREM (Laske et al., 2013). The TC1P model was therefore utilized to calculate the GFs and the subsequent modeled flexural response, as it uses more accurate and more recent data for the study area. We focus on the paleo-lacustrine unloading of Siling Co and the subsequent crustal rebound for this case study. Siling Co has seen a substantial reduction in lake area and water level over the past 11,000 years, and this drastic reduction in water volume is used to derive the accumulated flexural uplift with the REAR SNREI Earth model. We follow techniques similar to those developed in a recent study (Madson, 2016) that modeled the downward flexural response to the modern-day expansion of Siling Co.
First, the paleo-shoreline for Siling Co is calculated using the methods described in section "Regional-Scale Paleo-lake Reconstruction on the Tibetan Plateau," and this areal extent is utilized in combination with a DEM and the recent Siling Co areal extent from Li and Sheng (2012) to determine the overall areal lake reduction as well as the accumulated water volume loss. Several physical constants (e.g., density of water, Earth's radius, gravitational constant, average Earth density) serve as model inputs. The model then computes GFs for a constant load using the LLNs described in the preceding paragraph, and this array of scalars is used to derive the accumulated paleo-lake rebound. The LLNs provide the model with the underlying lithospheric parameters required to appropriately derive the GFs. We utilize geospatial processing toolsets to create a paleo-lake load (level change) input array for Siling Co's paleo-lake extent. The elastic model uses this lake-level-change array in conjunction with the previously created GFs to model and scale the elastic lithospheric response to the Siling Co unloading. The methodology used to determine the flexural rebound is summarized in Fig. 5. The outputs from the SNREI model consist only of the elastic flexural response; to determine a more accurate flexural response to the Siling Co unloading, we apply a scale factor determined by Madson (2016) to account for the upward lithospheric flexure that is not captured in the elastic modeling process.
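The convolution of the GFs with the unloading grid can be sketched as follows in Python; this is a minimal illustration rather than the actual REAR code, and the tabulated Green's function (assumed precomputed from the TC1P LLNs) and all names are hypothetical:

import numpy as np

def accumulated_uplift(load_kg, lon, lat, gf_deg, gf_m_per_kg, obs_lon, obs_lat):
    """Accumulated elastic uplift at one observation point from a gridded
    water-mass change field (load_kg is negative for unloading).

    gf_deg, gf_m_per_kg : tabulated Green's function, vertical displacement
                          per unit point mass vs. angular distance (degrees)
    """
    rad = np.radians
    # Angular distance between the observation point and every load cell
    # (spherical law of cosines), in degrees.
    cos_d = (np.sin(rad(obs_lat)) * np.sin(rad(lat)) +
             np.cos(rad(obs_lat)) * np.cos(rad(lat)) * np.cos(rad(obs_lon - lon)))
    theta = np.degrees(np.arccos(np.clip(cos_d, -1.0, 1.0)))
    # "Convolve": interpolate the GF at each load distance and sum the
    # contributions; a downward (negative) GF response to a positive load
    # turns into uplift when the mass change is negative (unloading).
    return float(np.sum(np.interp(theta, gf_deg, gf_m_per_kg) * load_kg))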


Fig. 5 Processing workflow utilized to derive the Siling Co lithospheric rebound from paleo-lake extents: (1) determine the paleo-lake extent along with the modern-day lake extent; (2) determine hydrologic loading/unloading from DEM information and the lake extents; (3) derive the SNREI model gridded inputs; (4) calculate Green's functions (GFs) from the TC1P LLNs; (5) use the GFs to calculate the accumulated lithospheric rebound from the paleo-lake load changes; (6) scale the modeled elastic response; and (7) plot the total accumulated lithospheric rebound.

The scale factor was determined by comparing spaceborne laser altimetry-derived lithospheric flexural responses to the elastic Earth model-derived response. Fig. 6 highlights the scaled and modeled results caused by the reduction of Siling Co's hydrologic mass loading from its paleo-lake extent to its modern-day extent. A closer inspection of Fig. 6 reveals several findings. First, the load centroid for the paleo-lake extent is only slightly shifted to the west of the contemporary lake-load centroid. This westward shift reflects the inclusion of Urru Co and Qiagui Co, which lie to the west of the lake's modern-day extent, within the paleo-lake extent. Note that the flexural rebound closely follows the paleo-lake shoreline and that the largest upward flexural response is located at or near the paleo-lake load center of mass, denoted by a black square in Fig. 6. As expected, the flexural response diminishes with increasing distance from the paleo-lake load. This case study demonstrates the flexural rebound of one particular lake on the TP by utilizing an SNREI Earth model and a scaling factor that accounts for the true response of the lithosphere to lake unloading.

2.02.5

Discussions

Paleo-lake extents can be recovered through the integration of remote sensing and digital elevation data to reveal important paleo-lake dynamics. Though we have demonstrated that geospatial information technologies can successfully recover paleo-lake areal extents and quantify lithospheric rebound, several enhancements and expansions are possible. First, the increasing availability of high-resolution imagery and high-quality DEMs will spur GIS applications in paleo-limnological studies. Satellite imagery with higher spatial resolutions is becoming more readily available; for example, meter/sub-meter resolution WorldView and Gaofen satellite imagery can identify subtle paleo-shoreline relicts and other features that would not be discernible in the 30-m Landsat images used in this study. The use of such higher-resolution imagery would allow a much greater number of paleo-lakes to be recovered. In addition, the one-arc-second SRTM DEMs released in 2016 provide higher horizontal and vertical accuracies than the DEMs discussed in this article, and this improvement in input DEMs is expected to improve the quality of paleo-lake mapping. We note that at the long time scales involved in paleo-lake rebound studies, viscous relaxation should be taken into account. Future work to model the regional lithospheric rebound from all paleo-lake extents on the TP should utilize a coupled viscous and elastic (viscoelastic) model in order to more accurately derive the regional lithospheric response from these paleo-lake volume change datasets. We also note that the modeled rebound described in the preceding section is based on an instantaneous removal of the paleo-lake's water load, and this rebound is termed the accumulated flexural response. Further work would provide a more meaningful temporal scaling of this unloading and thus a more accurate flexural response product. The geospatial processing techniques highlighted in this article provide the means to derive the regional fluctuations in paleo-lake areal extent and volume relative to modern-day water levels. These lake area and volume fluctuations provide the necessary parameters to model the lithospheric flexure and to gain a robust understanding of the regional lithospheric responses to the extreme changes in paleo-lacustrine hydrologic mass loading and unloading on the TP.

Fig. 6 Plot of the scaled and modeled flexural response to the paleo-lake unloading (87.5°E–90°E, 31°N–32.5°N; rebound contours from +8 to +130 cm). Siling Co's modern-day extent is shown as a dashed line and the paleo-lake extent as a solid black line. Also displayed are the load centroids for both extents: the square represents the paleo-lake load centroid and the triangle the modern-day lake load centroid.

Finally, the methods discussed in this article should be readily applicable to other endorheic basins. Approximately 20% of the Earth's land drains to endorheic lakes (Sheng, 2014). Continents vary in their concentration of endorheic regions due to climatic conditions and geography. In general, endorheic basins are extensively distributed in Western and Central Asia, Northern Africa, Central Australia, the Altiplano Plateau of South America, and the Great Basin of North America. Western and Central Asia is home to the world's largest group of endorheic lakes, including the Caspian Sea, the Dead Sea, and the Aral Sea. Africa hosts three major groups of endorheic lakes, in the Sahara Desert, the East African Rift, and the Kalahari Desert; Lake Chad in the southern Sahara Desert is continually shrinking due to decreased precipitation and increased irrigation pressure. The Great Basin in the United States contains a large region of endorheic basins, in which the Great Salt Lake is the largest endorheic lake in North America. The Valley of Mexico in Central America also contains a number of dried lakes, and the Altiplano basin in South America includes a number of closed lakes such as Lake Poopó. Central Australia has many endorheic drainages, including the highly variable Lake Eyre. So far, GLP extents have been recovered for only a few very large endorheic lakes, and we anticipate that most paleo-lake extents will be mapped at regional and global scales in the near future.

2.02.6

Concluding Remarks

This article reviews a sample of GIS applications in paleo-limnological studies and, in particular, discusses paleo-lake inundation area retrieval using geospatial information technologies. The semi-automated toolbox for paleo-lake reconstruction is developed based on paleo-shore relicts through an effective integration of remotely sensed imagery and digital elevation data. Though it is not possible to recover paleo-lakes in a completely automatic fashion, the recovery toolbox can retrieve the spatially explicit extent of paleo-lakes and estimate their water loss both efficiently and effectively, even at regional scales. Without such tools, paleo-lake extent recovery would be a tedious and very time-consuming process. As demonstrated on the TP, the methods have successfully recovered a large group of paleo-lakes at regional scales. With the increasing availability of satellite imagery and DEM data at global coverage, the developed tools are readily applicable to recovering any paleo-lake with paleo-shore relicts visible within the confines of the spatial resolution of the input remotely sensed datasets, especially those extensively found in arid and semiarid regions. GIS and remote sensing datasets and processing techniques allow for the calculation of lake water volume changes on the TP, and these products can be coupled with elastic and viscoelastic Earth models in order to determine the lithospheric response to regional paleo-lake unloading scenarios. The results from these types of studies allow for a better understanding of modern-day lithospheric responses to contemporary lake level changes, and this increased understanding can provide meaningful information with respect to future climatic and lacustrine changes on the TP and in other endorheic basins as well.

References
Alcamo, J., Flörke, M., Märker, M., 2007. Future long-term changes in global water resources driven by socio-economic and climatic changes. Hydrological Sciences Journal 52 (2), 247–275.
Barnett, T.P., Pierce, D.W., Hidalgo, H.G., Bonfils, C., Santer, B.D., Das, T., Cayan, D.R., 2008. Human-induced changes in the hydrology of the western United States. Science 319 (5866), 1080–1083.
Bradley, S., Milne, G., Teferle, F.N., Bingley, R., Orliac, E., 2009. Glacial isostatic adjustment of the British Isles: New constraints from GPS measurements of crustal motion. Geophysical Journal International 178, 14–22.
Cretaux, J.-F., Biancamaria, S., Arsen, A., Berge-Nguyen, M., Becker, M., 2015. Global surveys of reservoirs and lakes from satellites and regional application to the Syrdarya river basin. Environmental Research Letters 10 (1), 015002.
DeVogel, S.B., Magee, J.W., Manley, W.F., Miller, G.H., 2004. A GIS-based reconstruction of late Quaternary paleohydrology: Lake Eyre, arid central Australia. Palaeogeography Palaeoclimatology Palaeoecology 204, 1–13.
Doin, M.P., Twardzik, C., Ducret, G., Lasserre, C., Guillaso, S., Jianbao, S., 2015. InSAR measurement of the deformation around Siling Co Lake: Inferences on the lower crust viscosity in central Tibet. Journal of Geophysical Research: Solid Earth 120, 5290–5310.
Drake, N., Bristow, C., 2006. Shorelines in the Sahara: Geomorphological evidence for an enhanced monsoon from palaeolake Megachad. Holocene 16, 901–911.
England, P.C., Walker, R.T., Fu, B., Floyd, M.A., 2013. A bound on the viscosity of the Tibetan crust from the horizontality of palaeolake shorelines. Earth and Planetary Science Letters 375, 44–56.
Farrell, W., 1972. Deformation of the Earth by surface loads. Reviews of Geophysics 10, 761–797.
Ghienne, J.F., Schuster, M., Bernard, A., Duringer, P., Brunet, M., 2002. The Holocene giant Lake Chad revealed by digital elevation models. Quaternary International 87, 81–85.
Heim, B., Klump, J., Oberhansli, H., Fagel, N., 2008. Assembly and concept of a web-based GIS within the paleolimnological project CONTINENT (Lake Baikal, Russia). Journal of Paleolimnology 39, 567–584.
Kustowski, B., Dziewonski, A.M., Ekström, G., 2007. Nonlinear crustal corrections for normal-mode seismograms. Bulletin of the Seismological Society of America 97, 1756–1762.
Laske, G., Masters, G., Ma, Z., Pasyanos, M., 2013. Update on CRUST1.0: A 1-degree global model of Earth's crust. Geophysical Research Abstracts 15, 2658.
Leblanc, M., Favreau, G., Maley, J., Nazoumou, Y., Leduc, C., Stagnitti, F., van Oevelen, P.J., Delclaux, F., Lemoalle, J., 2006a. Reconstruction of Megalake Chad using Shuttle Radar Topographic Mission data. Palaeogeography Palaeoclimatology Palaeoecology 239, 16–27.
Leblanc, M.J., Leduc, C., Stagnitti, F., van Oevelen, P.J., Jones, C., Mofor, L.A., Razack, M., Favreau, G., 2006b. Evidence for Megalake Chad, north-central Africa, during the late Quaternary from satellite data. Palaeogeography Palaeoclimatology Palaeoecology 230, 230–242.
Li, J., Sheng, Y., 2012. An automated scheme for glacial lake dynamics mapping using Landsat imagery and digital elevation models: A case study in the Himalayas. International Journal of Remote Sensing 33, 5194–5213.
Li, J., Sheng, Y., Luo, J., 2011a. Automatic extraction of Himalayan glacial lakes with remote sensing. Yaogan Xuebao/Journal of Remote Sensing 15, 29–43.
Li, J., Sheng, Y., Luo, J., Shen, Z., 2011b. Remotely sensed mapping of inland lake area changes in the Tibetan Plateau. Hupo Kexue 23, 311–320.
Lundgren, P., Hetland, E.A., Liu, Z., Fielding, E.J., 2009. Southern San Andreas-San Jacinto fault system slip rates estimated from earthquake cycle models constrained by GPS and interferometric synthetic aperture radar observations. Journal of Geophysical Research: Solid Earth 114.
Madson, A., 2016. ICESat Derived Lithospheric Flexure as Caused by an Endorheic Lake's Expansion on the Tibetan Plateau and its Rheological Constraints. Master's Thesis. University of California, Los Angeles, 53 pp.
Melini, D., Gegout, P., Spada, G., 2015. On the rebound: Modeling Earth's ever-changing shape. EOS 96.
Micklin, P.P., 1988. Desiccation of the Aral Sea: A water management disaster in the Soviet Union. Science 241, 1170–1176.
Milne, G.A., Davis, J.L., Mitrovica, J.X., Scherneck, H.-G., Johansson, J.M., Vermeer, M., Koivula, H., 2001. Space-geodetic constraints on glacial isostatic adjustment in Fennoscandia. Science 291, 2381–2385.
Nie, Y., Liu, Q., Liu, S., 2013. Glacial lake expansion in the Central Himalayas by Landsat images, 1990–2010. PLoS One 8, e83973.
Oki, T., Kanae, S., 2006. Global hydrological cycles and world water resources. Science 313 (5790), 1068–1072.
Schuster, M., Roquin, C., Duringer, P., Brunet, M., Caugy, M., Fontugne, M., Mackaye, H.T., Vignaud, P., Ghienne, J.F., 2005. Holocene Lake Mega-Chad palaeoshorelines from space. Quaternary Science Reviews 24, 1821–1827.
Sella, G.F., Stein, S., Dixon, T.H., Craymer, M., James, T.S., Mazzotti, S., Dokka, R.K., 2007. Observation of glacial isostatic adjustment in "stable" North America with GPS. Geophysical Research Letters 34 (2), L02306.
Sheng, Y., 2009. PaleoLakeR: A semiautomated tool for regional-scale paleolake recovery using geospatial information technologies. IEEE Geoscience and Remote Sensing Letters 6, 797–801.
Sheng, Y., 2014. Endorheic lake dynamics: Remote sensing. In: Wang, Y. (Ed.), Encyclopedia of Natural Resources: Water. Taylor & Francis, New York, pp. 687–695.
Sheng, Y., Li, J., 2011. Satellite-observed endorheic lake dynamics across the Tibetan Plateau between circa 1976 and 2000. In: Wang, Y. (Ed.), Remote Sensing of Protected Lands. CRC Press, New York, pp. 305–319.
Sheng, Y., Song, C., Wang, J., Lyons, E.A., Knox, B.R., Cox, J.S., Gao, F., 2016. Representative lake water extent mapping at continental scales using multi-temporal Landsat-8 imagery. Remote Sensing of Environment 185, 129–141.
Smith, L., Sheng, Y., MacDonald, G., Hinzman, L., 2005. Disappearing arctic lakes. Science 308, 1429.
Song, C., Ke, L., 2014. Recent dramatic variations of China's two largest freshwater lakes: Natural process or influenced by the Three Gorges Dam? Environmental Science & Technology 48 (3), 2086–2087.
Song, C., Sheng, Y., 2015. Contrasting evolution patterns between glacier-fed and non-glacier-fed lakes in the Tanggula Mountains and climate cause analysis. Climatic Change 135 (3–4), 493–507.
Street, F.A., Grove, A.T., 1979. Global maps of lake-level fluctuations since 30,000 yr BP. Quaternary Research 12, 83–118.
Verpoorter, C., Kutser, T., Seekell, D.A., Tranvik, L.J., 2014. A global inventory of lakes based on high-resolution satellite imagery. Geophysical Research Letters 41, 6396–6402.
Vörösmarty, C.J., Green, P., Salisbury, J., Lammers, R.B., 2000. Global water resources: Vulnerability from climate change and population growth. Science 289 (5477), 284–288.


Wang, H., Xiang, L., Wu, P., Jia, L., Jiang, L., Shen, Q., Steffen, H., 2015. The influences of crustal thickening in the Tibetan Plateau on loading modeling and inversion associated with water storage variation. Geodesy and Geodynamics 6 (3), 161–172.
Wen, Y., Li, Z., Xu, C., Ryder, I., Bürgmann, R., 2012. Postseismic motion after the 2001 MW 7.8 Kokoxili earthquake in Tibet observed by InSAR time series. Journal of Geophysical Research: Solid Earth 117, B08405.
Wilkins, D.E., 1997. Hemiarid basin responses to abrupt climatic change: Paleolakes of the trans-Pecos closed basin. Physical Geography 18, 460–477.

2.03

GIS and Soil

Federica Lucà and Gabriele Buttafuoco, National Research Council of Italy (CNR-ISAFOM), Rende, Italy
Oreste Terranova, National Research Council of Italy (CNR-IRPI), Rende, Italy
© 2018 Elsevier Inc. All rights reserved.

2.03.1 Introduction 37
2.03.2 Soil Properties and Spatial Scale 38
2.03.3 Soil–Landscape Modeling 40
2.03.4 Digital Soil Mapping 41
2.03.4.1 Statistical Methods 41
2.03.4.2 Geostatistical Methods 42
2.03.4.3 Hybrid Methods 43
2.03.5 Applications of GIS in Soil Sciences 43
2.03.5.1 Data Fusion 43
2.03.5.2 Soil Erosion and Hydrological Models 44
2.03.5.3 Precision Agriculture 46
2.03.6 Conclusive Remarks 46
Acknowledgement 46
References 47
Further Reading 50
Relevant Websites 50

Glossary
Auxiliary variables/maps The expression refers to nonsoil data used to improve soil mapping. A synonym for secondary data, ancillary maps, environmental covariates, or nonsoil layers. Typical examples are topographic attributes, remote and proximal sensor data, and geological, geomorphological, and hydrological maps.
Digital soil mapping The creation and population of spatial soil information by the use of field and laboratory observational methods, coupled with spatial and nonspatial soil inference systems (Carré et al., 2007; Lagacherie and McBratney, 2007).
Geomorphometry The science of quantitative land-surface analysis (Pike, 2000). A synonym for digital terrain analysis or terrain modeling. It involves the extraction of topographic attributes and objects (i.e., landscape units) from digital elevation models.
Legacy soil data All existing soil information (i.e., laboratory data, soil profile descriptions) useful to characterize or map soils and landscapes.
Pedometrics The application of mathematical and statistical methods for the quantitative modeling of soils, with the purpose of analyzing their distribution, properties, and behaviors.
Soil variable A generic name for quantitative (measurable) and qualitative (descriptive) soil properties or characteristics.
Soil–landscape modeling The use of topographic attributes to improve the modeling (spatial prediction) of soil variables. Soil–landscape modeling is sometimes used as a synonym for the pedometric approach to soil mapping.
Topographic attributes Variables (or maps) derived using a terrain analysis algorithm. Topographic attributes may refer to geomorphological (e.g., slope curvature), hydrological (e.g., wetness index), or climatic (e.g., insolation) features of a study area. A synonym for morphometric variables or terrain attributes. Primary topographic attributes are derived directly from a DEM, whereas secondary attributes are calculated from the combination of two or more primary attributes.

2.03.1

Introduction

Soil is a valuable nonrenewable natural resource, which provides essential support to ecosystems throughout the world. According to the European Commission (2006), soil is important for (1) biomass production in both agriculture and forestry; (2) storing, filtering, and transforming nutrients, substances, and water; (3) the biodiversity pool of habitats and species; (4) the human physical and cultural environment and activities; (5) a source of raw materials; (6) acting as a carbon pool; and (7) an archive of geological and archeological heritage. Accurate information about soils is required for land resource management, monitoring, and policymaking. Understanding the spatial distribution of soil is critical to assess the interplay between chemical and physical processes and for environmental protection at a variety of scales, ranging from the management of a single field up to the study of global climate change impacts. In recent decades, soil scientists have made great efforts to develop both regional and global soil databases. Larger-scale digital soil data are available for North America, Australia, and Europe, even though the geographical coverage is uneven within countries (Morvan et al., 2008). Soil databases at scales smaller than 1:250,000 are frequent (Nachtergaele, 1999; Rossiter, 2004), and a list of Soil Geographic Databases (updated on October 26) is available at: http://www.css.cornell.edu/faculty/dgr2/research/sgdb/sgdb.html. Legacy data are an additional source of soil information but, as suggested by Lagacherie and McBratney (2007), they must be appropriately organized before being used. In fact, data coming from different projects and created at different times by different surveyors need to be harmonized for location, information, and soil depth (Sulaeman et al., 2013). Location harmonization allows one to express information in a common georeference system; information harmonization depends upon the aim and relies on selecting the type of data and field properties to be used; soil depth harmonization is required to convert observations to the same standard depths (Sulaeman et al., 2013). Providing new, more specific, and more detailed soil information, particularly suitable for analyses in geographic information systems (GIS), represents a challenge for soil scientists (Lagacherie and McBratney, 2007). Traditional soil survey is based on the recognition of soil properties at a few sites and on their qualitative relation with landscape and environmental variables based on conceptual models. Although this approach implicitly incorporates the expertise of the soil scientist, it does not make use of the geocomputational technologies that are now widely available. Remote sensing and photogrammetric techniques provide digital representations of the Earth's surface that can be combined with soil data in a GIS to allow efficient storage and analysis of vast amounts of data. Technological advances during the last few decades have created the potential to improve the way soil maps are produced. Digital soil mapping (DSM) is defined as "the creation and population of spatial soil information by the use of field and laboratory observational methods, coupled with spatial and non-spatial soil inference system" (Carré et al., 2007; Lagacherie and McBratney, 2007). DSM relies on the computer-assisted production of soil maps by assessing the relationships between sparsely available soil properties and extensively available ancillary data. In the context of digital soil mapping, therefore, the use of topographic attributes and remote sensing data plays a key role. With recent advances in environmental sensing, ancillary data production is increasing, thus promoting the use of "environmental covariates" as digital spatial data sets for soil mapping. For example, remotely sensed data provide proxies for organisms and the density of vegetation in addition to soil mineralogy, particle size, and parent materials (Sullivan et al., 2005; Saadat et al., 2008), whereas topographic variables derived from a digital elevation model (DEM) provide proxies for relief and local-scale variation in solar radiation, surficial water redistribution, and erosion processes (Moore et al., 1993; Florinsky et al., 2002; Ziadat et al., 2003).
The abovementioned environmental covariates may then be used to predict the distribution of soil properties using soil prediction models. The most recent definition of Pedometrics (McBratney et al., 2003) is: "the application of mathematical and statistical methods for the quantitative modelling of soils, with the purpose of analyzing its distribution, properties and behaviours" (www.pedometrics.org). In the International Union of Soil Science (www.iuss.org) organization, Pedometrics is Commission 1.5 of Division 1, Soil in Space and Time. Pedometrics can be considered an interdisciplinary subject where soil science, applied statistics, and GIS intersect. It involves the use of terrain attributes for improving the estimation (spatial prediction) and modeling of soil properties. The expression "soil–landscape modeling" is sometimes used as a synonym for pedometrics. This article represents an attempt to synthesize the use of GIS in soil sciences at different spatial scales by (i) describing the increasing availability of ancillary data for soil characterization; (ii) illustrating the main relationships between soil properties and DEM-derived topographic features; (iii) summarizing spatial and nonspatial pedometric techniques for estimating and modeling soil properties; and (iv) illustrating some applications of GIS in soil science, also highlighting the contribution of DSM to land use planning and soil protection.

2.03.2

Soil Properties and Spatial Scale

Soil is an open system, and its formation and evolution are described by the "Factors of Soil Formation," or clorpt model (Jenny, 1941). The state-factor equation is expressed as:

S = f(cl, o, r, p, t)   (1)

where soil (S) is considered to be a function of climate (cl), organisms (o), relief (r), and parent material (p) through time (t). It is noteworthy that the five variables in the equation are not pedogenic processes but state factors of the environmental system, which condition the processes. Recently, McBratney et al. (2003) modified the state-factor equation as follows:

S = f(s, c, o, r, p, a, n)   (2)

where a soil is a function of the external factors of climate (c), organisms (o), relief (r), parent material (p), age (a), and its location in space (n). Scorpan includes not only the same factors as clorpt but also the correlation with soil properties (s) and space (n), thus highlighting the importance of prior soil information (s) and of spatial relationships (n) for predicting the spatial variation of soil properties. The spatial complexity of soils makes it difficult to predict and model soil processes acting from the topographic surface down to the bedrock, such as soil erosion, hydrology, carbon cycling, and soil pollution. In order to develop reliable, high-resolution soil maps, accurate soil data are needed for hydrological analysis, environmental protection, planning agricultural crop production, and forest management.

Fig. 1 Sketch of spatial scales for soil data, from the profile and sample scales through the DEM pixel size to the field scale, as observed by GPR, LiDAR, drones, and SPOT and WorldView satellites.

GIS allows one to store, manage, and analyze such amounts of data coming from different sources and with different spatial resolutions (Fig. 1). Field soil survey is the primary method for acquiring soil data and is generally performed by collecting samples either systematically or randomly and recording their location with a global positioning system. Laboratory analysis provides point measurements of soil physical, chemical, or biological properties, which vary both laterally and vertically across the different horizons along the soil profile. The main physical properties include soil texture, structure, porosity, and density; they influence root penetration, water availability, the degree to which water moves both laterally and vertically through the soil, biomass production, and the distribution of agricultural or forest species. Physical properties may be almost permanent in time, unless modified by harvesting operations, shifting cultivation, hillslope processes, or fires. On the contrary, chemical properties (e.g., organic matter) and biological properties (e.g., enzymatic activities), especially those involved in nutrient cycling (carbon, nitrogen, and phosphorus), have often been shown to be sensitive to small changes in soil or environmental conditions, thus providing information regarding edaphic variations (Bastida et al., 2008). Although direct measurement provides the best information, soil sampling and laboratory analyses are time consuming and cost prohibitive, especially in areas with topographic constraints. Moreover, without an adequate sampling design, measurements do not allow one to describe soil horizontal and/or vertical variability (Lucà et al., 2014). As such, there is a need for more efficient methods to generate accurate high-resolution soil maps over larger areas in the most cost-effective way. Remote and proximal sensing may provide data sources useful for characterizing soil properties, and they can be used both as primary data sources and (or) as auxiliary variables. Remote sensing is the process of inferring surface parameters from airborne and spaceborne acquisitions of the electromagnetic radiation emitted or reflected by the terrain surface. This section briefly describes the main aspects, whereas an extensive review of remote sensing for soil mapping can be found in Mulder et al. (2011). Sensors may be active or passive and can operate in various portions of the electromagnetic spectrum. Remote sensing provides spatially distributed digital data for characterizing several soil, terrain, and vegetation properties. Among passive remote sensing, optical multispectral and hyperspectral sensors are particularly suitable for land use and mineralogical analysis; spectroscopy sensors are suitable for deriving soil properties (e.g., mineralogical composition, iron oxides, and organic matter); microwave sensors are mainly used for estimating soil moisture, whereas thermal infrared sensors are particularly used for soil temperature estimation. Most active sensors operate in the microwave portion of the electromagnetic spectrum, where wavelengths are unaffected by meteorological conditions. Airborne systems (LiDAR, multispectral, and hyperspectral) have demonstrated capabilities for monitoring and analyzing relevant variables for soil science (e.g., mineralogy, moisture, and elevation) at finer spatial resolutions and over smaller extents. RADAR and passive microwave systems have been used for assessing soil properties at the regional or basin scale.
In addition, moderate- and coarse-resolution sensors, such as the Landsat, ASTER, and SPOT sensors, provide more frequent coverage than high-resolution sensors.


Unlike remote sensing, proximal soil sensing measures properties close to, or even in contact with, the soil, and the sensors may or may not be mounted on vehicles for on-the-go data acquisition. Among proximal sensors, visible and near-infrared (Vis–NIR) spectroscopy is commonly used to predict several physical and chemical properties based on soil reflectance. Vis–NIR has been used for estimating soil texture (Ben-Dor, 2002; Conforti et al., 2015), organic and total carbon (Stevens et al., 2010; Lucà et al., 2015, 2017), nitrogen (Selige et al., 2006; Lucà et al., 2015), and moisture (Ben-Dor, 2002). Geophysical sensors are also widely used by soil scientists to better understand the spatial variability of soil properties at both field and landscape scales, because of their speed of data acquisition, ease of use, and relatively low cost (Corwin, 2008). Electromagnetic induction (EMI) sensors measure changes in the apparent electrical conductivity (ECa) of the soil without direct contact with the sampled volume and are suitable as a supplement to soil sampling in areas with high soil variability (Castrignanò et al., 2008). Several studies have shown that the main soil properties affecting ECa are soil salinity (Corwin, 2008), the content and type of clay (King et al., 2005; Wienhold and Doran, 2008; Cockx et al., 2009), and soil–water content (Brevik et al., 2006; Weller et al., 2007). ECa has also been used as a surrogate measure of bulk density, soil structure, ionic composition, CEC, pH, and soil organic carbon, nutrient, and CaCO3 contents (Doolittle and Brevik, 2014 and references therein). Given the interaction among different soil properties, the relationship between ECa and soil properties is site-specific and sometimes complex to detect, and it can vary over short distances or with direction (Brevik et al., 2004; Farahani et al., 2005). Ground-penetrating radar (GPR) relies on the radiation of high-frequency electromagnetic waves into the soil and on the recording of the reflected signals to image the subsoil and quantify its properties. The GPR signal is mainly sensitive to soil dielectric permittivity, which primarily depends on soil–water content (Huisman et al., 2003). EMI and GPR are successfully applied in DSM for assessing soil properties and mapping the spatial variability of soil physical and chemical properties.

2.03.3

Soil–Landscape Modeling

Since topography is one of the five factors of soil formation (Jenny, 1941; McBratney et al., 2003), in DSM morphometric variables are commonly used as key predictors of soil properties when dependence and strong statistical relationships occur. Topography exerts a great influence on soil properties since it affects (a) water and sediment redistribution, (b) the meteorological features controlling soil temperature and moisture, and (c) vegetation distribution. In addition, as topography is the result of the interplay of endogenous and exogenous processes acting at different scales, it can reflect the geological structure of a study area (Scheidegger, 2004; Brocklehurst, 2010). DEM-derived attributes used to quantify the morphology of the terrain are the basis of geomorphometry (Pike, 2000). In a GIS environment, primary topographic attributes are calculated directly from a DEM, whereas secondary (or compound) attributes are combinations of primary ones. Secondary terrain attributes are catchment-related (Li et al., 2016) and give insight into processes occurring along a hillslope, such as soil erosion or deposition potential. The application of statistical techniques to analyze soil spatial distribution using topographic and other environmental variables is commonly referred to as soil–landscape modeling. Terrain attributes have been successfully used to predict quantitative soil variables, such as horizon thicknesses and physical, chemical, and biological soil properties (e.g., particle size fractions, organic carbon, nutrients, enzymatic activities), as well as qualitative data (soil taxonomic units). The number of topographic attributes used for predicting soil properties has ranged from one (e.g., in Baxter and Oliver, 2005, the spatial prediction of nitrogen was based on elevation alone) up to 69 (Behrens et al., 2005). Only the attributes most widely used for analyzing the influence exerted by topography on soil spatial distribution are briefly described below. Slope gradient can be considered an indicator of energy gradient and influences flow erosivity. Slope curvature is important for soil mapping because it influences local water flow in terms of convergence or divergence (tangential curvature) and acceleration or deceleration (profile curvature); it exerts a great influence on hillslope processes such as soil and water redistribution, throughflow, and vertical infiltration. Slope aspect reflects the role of topography in modifying solar radiation at the surface; therefore, it locally adjusts climatic factors such as soil temperature and moisture that may affect pedogenesis (Wilson and Gallant, 2000). Contributing area (CA) represents the area of land upslope of a specific contour length. It may give insight into the amount of effective precipitation received by an area from upslope and can be used to infer landscape position (Gessler et al., 1995). Among the secondary terrain attributes, the topographic wetness index (TWI) is commonly used in hydrologic soil studies. The TWI is calculated as the natural logarithm of the ratio of the specific CA (SCA = CA/grid size) to the slope gradient (Wilson and Gallant, 2000). It is an index of the likelihood of a cell to collect water and is considered a predictor of soil saturation. The stream power index (SPI) is calculated as the logarithm of the product of SCA and slope gradient.
It is a measure of the erosive power of overland flow, based on the assumption that discharge is proportional to the specific catchment area (Moore et al., 1991). Generally, the higher the SPI, the higher the likelihood of soil erosion and related hillslope processes. The length-slope factor, also used in the RUSLE equation, reflects the control exerted by topography on soil erosion (Renard et al., 1997). It is a function of the slope steepness factor (S) and the slope length (L), which influence surface runoff speed, and is considered a sediment transport capacity index. The distance of an element (such as a pixel) from the local drainage influences the probability that eroded sediment contributes to the basin sediment yield. In terrain–soil modeling, DEMs are usually treated as error-free models even though they can be affected by errors deriving from sampling design, measurement, and interpolation (Hengl et al., 2004; Castrignanò et al., 2006a,b). A number of soil–terrain modeling approaches have reported empirical evidence of a scale-dependent correlation structure between soil spatial variability and DEM-derived attributes in different environments such as agricultural fields and hillslope systems (e.g., Thompson et al., 2001; Erskine et al., 2007; Park et al., 2009).


The variation of the relationship between soil properties and topographic attributes with DEM resolution has inhibited the development of generalizable rules for selecting appropriate DEM spatial resolutions. Although high-resolution DEMs may be able to capture microscale topographic relief in a portion of a system, the use of such detailed geomorphic information will not necessarily be desirable in predictive soil mapping if edaphic conditions (e.g., nutrient content, soil pH, pollution, etc.) are more or less homogeneous within the given area (Thompson et al., 2001; Smith et al., 2006; Park et al., 2009; Kim and Zheng, 2011). A unique "suitable" DEM resolution does not exist (Claessens et al., 2005), since the selection of the appropriate DEM resolution is site dependent (Hutchinson and Gallant, 2000) and the optimal geomorphic scale may vary depending on different environmental factors (Kalin et al., 2003). Since some chemical and biological properties vary over time, especially as a result of variations in meteorological conditions, agronomic practices, or human activities, temporal variability also exists in the relationships between morphometric attributes and the spatial distribution of dynamic soil properties (e.g., microbiological activities, water content, and organic carbon).
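As a simple illustration of how the primary and secondary attributes defined above can be derived, the following Python sketch computes slope from a DEM and then TWI and SPI from a specific catchment area grid; the flow-accumulation input is assumed to be precomputed by a GIS flow-routing tool, and all names are hypothetical:

import numpy as np

def slope_radians(dem, cell_size):
    """Slope gradient (radians) from a DEM via finite differences."""
    dz_dy, dz_dx = np.gradient(dem, cell_size)
    return np.arctan(np.hypot(dz_dx, dz_dy))

def twi(sca, slope_rad, eps=1e-6):
    """Topographic wetness index: ln(SCA / tan(slope))."""
    return np.log((sca + eps) / (np.tan(slope_rad) + eps))

def spi(sca, slope_rad, eps=1e-6):
    """Stream power index: ln(SCA * tan(slope))."""
    return np.log(sca * np.tan(slope_rad) + eps)

# Example on a synthetic DEM; 'sca' would normally come from a
# flow-accumulation routine (SCA = CA / grid size).
dem = np.cumsum(np.random.rand(100, 100), axis=0) * 5.0
slope = slope_radians(dem, cell_size=30.0)
sca = np.full(dem.shape, 900.0)  # placeholder specific catchment area
wetness, stream_power = twi(sca, slope), spi(sca, slope)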

2.03.4

Digital Soil Mapping

DSM can be used to predict both categorical variables (e.g., soil taxonomic classes) and quantitative soil properties by applying various mathematical and statistical approaches from the local (i.e., single field) to the regional scale. Such approaches can be grouped into three main categories: classical statistics, geostatistics, and hybrid methods (McBratney et al., 2000). The classical statistical methods deal with deterministic relations between soil properties and auxiliary variables, but they do not account for the spatial autocorrelation of the data, especially at the local level. To address this issue, geostatistical methods have been developed. Geostatistics (Matheron, 1971) allows one to quantify the spatial variability of soil and to produce continuous maps starting from sparse data. Geostatistical interpolation techniques can be used even without ancillary information, provided that sufficient data on the soil property of interest are available within the study area. Since soil properties are the result of environmental covariates, it may be beneficial to model both (i) the deterministic component of soil spatial variation as a function of the environmental covariates and (ii) any residual stochastic component. Hybrid methods derive from the combination of classical and geostatistical techniques and therefore account for both the deterministic and stochastic components of soil variability. Fig. 2 summarizes the main features characterizing the three aforementioned groups. Each of them includes a variety of available quantitative techniques. A detailed description of such approaches goes beyond the scope of this article; interested readers may refer to specific papers (McBratney et al., 2003; Scull et al., 2003). Below, we briefly describe those most commonly applied for predicting soil classes or quantitative properties.

2.03.4.1

Statistical Methods

Among statistical methods, multiple regression analysis has been most commonly applied for assessing the relationship between a soil property (the dependent variable) and several morphometric attributes as independent predictors (Moore et al., 1993; Gessler et al., 2000). The approach assumes a linear relationship between soil and topography, but the simplicity of data processing, model structure, and interpretation explains its wide application for predicting several quantitative soil properties. Regression has been used, for example, to assess soil horizon thickness (Moore et al., 1993; Odeh et al., 1994; Gessler et al., 2000; Florinsky et al., 2002). The relationships between soil properties and other topographic or biophysical variables are, however, rarely linear in nature.

Fig. 2 Schematic overview of pedometric approaches for digital soil mapping. Statistical methods (nonspatial) model deterministic soil variation, require correlation with auxiliary data, and do not require stationarity (examples: regression models, ANN, classification and regression trees). Geostatistical methods (spatial) model stochastic soil variation and require spatial dependence and stationarity (examples: simple kriging, ordinary kriging, stochastic simulation). Hybrid methods (spatial) combine the two, modeling both deterministic and stochastic soil variation and requiring spatial dependence, correlation with auxiliary data, and stationarity (examples: cokriging, regression kriging, kriging with external drift).


Such considerations have led to the application of more robust methods such as generalized linear models (GLM) and generalized additive models (GAM). GLM are used for both regression and classification purposes; they allow the dependent variable to follow distributions other than the normal and assume that the predictors combine additively on the response. Aside from being able to handle multiple distributions, GLM have additional benefits, such as being able to use both categorical and continuous variables as predictors. Thanks to their ability to model complex data structures, GLM have been widely applied. In GAM, the linear function between soil properties and topographic covariates is replaced by an unspecified nonparametric function (e.g., a spline). An artificial neural network (ANN) is a nonparametric modeling technique used to overcome the nonlinearity of the relationships characterizing soils. The ANN is a form of artificial intelligence that can use both qualitative and quantitative data. It automatically analyzes the relationships between multisource inputs by adopting self-learning methods and works without any hypothesis on the statistical distribution of the variables. Zhao et al. (2009) developed an ANN model to predict soil properties based on hydrological attributes (soil–terrain factor, sediment delivery ratio, and vertical slope position) derived from a high-resolution DEM. Fuzzy logic (Zadeh, 1965; McBratney and Odeh, 1997) is a method for grouping multivariate data into clusters, defining the membership of an element to a set of classes. Unlike hard logic, which requires an individual to lie within a single mutually exclusive class, fuzzy logic (sometimes called fuzzy k-means) allows an individual to form a bridge between classes. Since soil landscapes are continuous in nature, fuzzy logic is useful in predictive soil mapping. Fuzzy logic has been used, for example, to cluster topographic attributes (elevation, slope, plan curvature, TWI, SPI, catchment area) derived from a 5-m DEM in order to predict topsoil clay content at the field scale (de Bruin and Stein, 1998). The method has proved useful for predicting chemical properties such as soil mineral nitrogen, organic matter, available phosphorus, and soil pH (Lark, 1999) at the field scale, and soil taxonomic classes in large-scale soil mapping (Odeh et al., 1992; Lark, 1999; Barringer et al., 2008). The combination of fuzzy logic with discriminant analysis is also reported in the literature (Sorokina and Kozlov, 2009). Decision trees work by splitting data into homogeneous subsets. Two main types of decision tree analysis are used in DSM: classification tree analysis (the dependent variable is categorical) and regression tree analysis (the dependent variable is numeric). Classification trees have been applied for predicting soil drainage classes using digital elevation and remotely sensed data (Cialella et al., 1997) and soil taxonomic classes (Lagacherie and Holmes, 1997; McBratney et al., 2000; Moran and Bui, 2002; Zhou et al., 2004; Scull et al., 2005; Mendonça-Santos et al., 2008). Regression trees have instead been used for predicting soil cation exchange capacity (Bishop and McBratney, 2001), soil profile thickness, and total phosphorus (McKenzie and Ryan, 1999). Discriminant analysis is used to assess the group membership of an individual based on the attributes of the individual itself. This method allows one to determine the attributes adequate to discriminate between classes using a multivariate dataset.
The approach has been used to map soil texture classes (Hengl et al., 2007), soil drainage classes (Kravchenko et al., 2002), and taxonomic classes (Thomas et al., 1999; Hengl et al., 2007). Logistic regression is used to predict a categorical variable from a set of continuous and/or categorical predictors (Kleinbaum et al., 2008). Logistic regression can be binary or multinomial, depending on the number of soil categories to be predicted. For example, multinomial logistic regression has been used to predict soil taxonomic classes and soil texture (Hengl et al., 2007; Giasson et al., 2008). Binary logistic regression has instead been used to assess the presence or absence of a specific horizon (Gessler et al., 1995), soil salinity risk (Taylor and Odeh, 2007), and gully erosion (Lucà et al., 2011; Conoscenti et al., 2014).
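As a minimal illustration of one of the methods above, the following Python sketch fits a regression tree (scikit-learn) to predict a soil property from terrain attributes; the data are synthetic and every name is hypothetical:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical training set: terrain attributes (elevation, slope, TWI)
# at 300 sampled sites, with a synthetic response standing in for a
# laboratory-measured property such as topsoil organic carbon.
n = 300
X = np.column_stack([
    rng.uniform(200, 900, n),   # elevation (m)
    rng.uniform(0.0, 0.5, n),   # slope (radians)
    rng.uniform(2.0, 14.0, n),  # TWI
])
y = 0.002 * X[:, 0] - 3.0 * X[:, 1] + 0.2 * X[:, 2] + rng.normal(0, 0.5, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
tree = DecisionTreeRegressor(max_depth=5).fit(X_train, y_train)
print("R^2 on held-out sites:", tree.score(X_test, y_test))

The fitted tree would then be applied to the full covariate grids to produce a continuous predicted map.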

2.03.4.2

Geostatistical Methods

Geostatistical methods (Matheron, 1971) provide a valuable tool to study the spatial structure of soil properties; geostatistics, in fact, allows one to model the spatial dependence between neighboring observations as a function of their distance. The mathematical model of spatial correlation is expressed by the variogram. The information provided by variograms is used in one of several spatial interpolation techniques (known as kriging) to estimate the variable at unsampled locations. Variographic analysis allows one to detect anisotropic behavior of soil variables. The advantage of anisotropic modeling lies in its capability of disclosing important changes in spatial dependence along a particular direction, which, in turn, is a function of soil-forming processes. Variogram modeling is sensitive to strong departures from normality, because a few exceptionally large values may contribute very large squared differences. Geostatistical analysis is therefore most efficient when variables have a Gaussian distribution; it also requires an assumption of data stationarity, but this condition is not always verified. Kriging provides the "best," unbiased, linear estimate of a variable, where "best" is defined in a least-squares sense (Webster and Oliver, 2007; Chilès and Delfiner, 2012). Ordinary kriging is one of the most commonly used kriging methods. The estimation involves only the primary soil variable, and the method provides a kriging variance, or its square root, the kriging error, which can serve as a guide to the reliability of the estimate. Ordinary kriging shows that geostatistical interpolation techniques can be used at the regional level, even without ancillary data, especially if there are sufficient data in some localities within the study area. Every kriging algorithm essentially leaves the job of spatial pattern reproduction unfinished, because the kriging map is unique and smooth (Caers, 2003). To model heterogeneity and assess the uncertainty of a soil variable at unsampled locations, the kriging map should be replaced by a set of alternative maps, which honor the sample measurements and try to reproduce the true spatial variability of soil properties. Stochastic simulation (Journel and Alabert, 1989; Goovaerts, 2001) represents an alternative modeling technique, particularly suitable for applications where global statistics are more important than local accuracy. Simulation consists of computing a set of alternative stochastic images of a random process and then carrying out an uncertainty analysis that the classical methods cannot adequately provide (Castrignanò et al., 2002; Buttafuoco et al., 2012). Each simulation is an equally probable realization of the unknown soil property, and the postprocessing of a large set of simulated images allows one to assess uncertainty and to evaluate the consequences of data uncertainty on decision-making.

GIS and Soil

43

evaluating the consequences of data uncertainty on decision-making. The probability of exceeding a particular threshold value can be computed from a set of simulations by counting the percentage of stochastic images that exceed the stated threshold. For example, Lucà et al. (2014) used 500 stochastic images to assess the probability that soil thickness value did not exceed a predefined threshold, useful for identifying where surficial landslide could occur. The approach also allowed delineating the areas characterized by greater uncertainty in pyroclastic thickness estimation, suggesting supplementary measurements to further improve the cover thickness distribution model, thus reducing the uncertainty. Due to high cost of getting accurate and quantitative information, a high value can be relied upon the available data at different scales.
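The exceedance-probability computation described above reduces to counting realizations cell by cell. Below is a minimal sketch, assuming a stack of simulated grids is already available (here replaced by synthetic arrays):

```python
# Minimal sketch: probability of (not) exceeding a threshold from a stack
# of stochastic realizations, as in the soil-thickness example above.
import numpy as np

rng = np.random.default_rng(42)
n_sim, ny, nx = 500, 100, 100
# Synthetic stand-in for, e.g., 500 simulated soil-thickness grids (m).
realizations = rng.lognormal(mean=0.0, sigma=0.5, size=(n_sim, ny, nx))

threshold = 1.5   # hypothetical critical soil thickness (m)

# Fraction of realizations below the threshold at each cell,
# i.e., an estimate of P(thickness < threshold).
prob_below = (realizations < threshold).mean(axis=0)

# Cell-wise spread across realizations flags where the estimate is most
# uncertain and supplementary sampling would pay off.
uncertainty = realizations.std(axis=0)
print(prob_below.shape, float(prob_below.mean()), float(uncertainty.max()))
```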

2.03.4.3 Hybrid Methods

Hybrid geostatistical techniques combine classical statistics and geostatistical methods in various ways and use ancillary variables, such as landscape attributes or remotely sensed data, for soil property estimation (Webster and Oliver, 2007). Several hybrid approaches exist, and the choice of the best method depends on the specific case study, on data availability, and on the presence of a spatial trend. Cokriging considers two or more variables simultaneously, but requires that the variables be correlated; moreover, it is computationally demanding because of the modeling of the simple and cross variograms that describe the pairwise correlations between variables. Kriging with external drift is similar to universal kriging but uses an ancillary variable to describe the spatial changes in the relationship between variables (that relationship, however, must be linear). Regression kriging combines a linear regression model on the ancillary variables with ordinary or simple kriging of the regression residuals (Webster and Oliver, 2007). Both topographic attributes and electromagnetic data have been used as auxiliary variables to improve the estimation of soil texture and other properties related to soil fertility, in regression kriging (Moral et al., 2010) or kriging with external drift (De Benedetto et al., 2012).
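As an illustration of regression kriging, the sketch below regresses a hypothetical soil property on a single covariate and kriges the residuals. It assumes the third-party pykrige and scikit-learn packages; all coordinates and values are synthetic.

```python
# Minimal regression-kriging sketch: regress the target on an ancillary
# covariate, ordinary-krige the residuals, then add the two parts.
import numpy as np
from sklearn.linear_model import LinearRegression
from pykrige.ok import OrdinaryKriging

rng = np.random.default_rng(1)
x, y = rng.uniform(0, 1000, 80), rng.uniform(0, 1000, 80)  # sample coords (m)
elev = 200 + 0.05 * x + rng.normal(0, 5, 80)               # covariate, e.g., elevation
clay = 10 + 0.1 * elev + rng.normal(0, 2, 80)              # target property (%)

# 1) Deterministic trend from the covariate (available everywhere).
trend = LinearRegression().fit(elev.reshape(-1, 1), clay)
residuals = clay - trend.predict(elev.reshape(-1, 1))

# 2) Ordinary kriging of the regression residuals.
ok = OrdinaryKriging(x, y, residuals, variogram_model="exponential")
gridx = np.linspace(0, 1000, 50)
gridy = np.linspace(0, 1000, 50)
res_pred, res_var = ok.execute("grid", gridx, gridy)

# 3) Final map = trend (from a hypothetical elevation grid) + kriged residuals.
elev_grid = 200 + 0.05 * gridx[np.newaxis, :] + np.zeros((50, 50))
clay_pred = trend.predict(elev_grid.ravel().reshape(-1, 1)).reshape(50, 50) + res_pred
print(clay_pred.shape)
```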

2.03.5 Applications of GIS in Soil Sciences

Given the variety of soil variables, ancillary data, and pedometric methods, GIS should no longer be used in soil studies as a simple map-overlay tool (Fig. 3). Individual datasets may be combined through data fusion techniques or used as input for assessing soil erosion or predicting crop yield. Some examples of GIS applications in soil science are reported below.

2.03.5.1 Data Fusion

Fig. 3 Flow chart of GIS procedures for soil data processing.

DSM requires the integration of soil data and their environmental covariates. As described above, soils can be characterized through many types of data from different sources, such as field sampling, laboratory analyses, and proximal and remote sensors (e.g., spectral, electrical, electromagnetic, or radiometric measurements of both soils and plants), at various spatial and temporal scales. After adopting a common reference system, the critical point is data integration through the choice of an appropriate support: soil samples are collected at points, whereas auxiliary maps commonly have a much larger support size (Fig. 1), the support size being the discretization level of a geographical surface. The term "data fusion" indicates the integration of data with different spatial, spectral, and radiometric resolutions by changing the support size; in a GIS, for example, multiresolution airborne and satellite data can be fused with LiDAR. The objective of changing the support size is to fit the resolution of the environmental soil covariates to the target resolution of the soil assessment (Hengl, 2006). This goal can be achieved by upscaling the auxiliary maps or by averaging the soil samples within the cell size of the ancillary data. As mentioned above, the large amount of available data can be treated through the multivariate geostatistical technique called "data fusion" (Adamchuk et al., 2011), which is appropriate for integrating data from different inputs and adjusting them to the same spatial resolution (Shaddad et al., 2016).
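A minimal sketch of the second option, averaging point samples within the cells of a coarser ancillary raster, is shown below; the coordinates, cell size, and values are hypothetical.

```python
# Minimal sketch of a support-size change: average point soil samples into
# the (coarser) cells of an ancillary raster so both share one support.
import numpy as np

rng = np.random.default_rng(2)
px = rng.uniform(0, 1000, 200)   # sample x coordinates (m)
py = rng.uniform(0, 1000, 200)   # sample y coordinates (m)
val = rng.normal(25, 5, 200)     # measured property, e.g., clay content (%)

cell = 100.0                     # target support: 100 m raster cells
nx = ny = int(1000 / cell)

# Bin each sample into its raster cell, then average per cell.
ix = np.clip((px / cell).astype(int), 0, nx - 1)
iy = np.clip((py / cell).astype(int), 0, ny - 1)
sums = np.zeros((ny, nx))
counts = np.zeros((ny, nx))
np.add.at(sums, (iy, ix), val)
np.add.at(counts, (iy, ix), 1)
block_mean = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
print(np.nanmean(block_mean))
```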

2.03.5.2 Soil Erosion and Hydrological Models

Soil erosion by water is considered one of the major causes of land degradation, and a large number of investigations have aimed at developing and testing GIS-based methods for evaluating soil erosion processes. Gullies are among the main soil–water erosion features in many parts of the world, and several studies have focused on the topographic threshold that must be exceeded for a gully to initiate (Torri et al., 2012a,b; Torri and Poesen, 2014 and references therein). Such thresholds are evaluated from combinations of primary and secondary topographic attributes, especially those controlling water and sediment redistribution. In addition, GIS coupled with statistical methods is well suited to assessing and mapping gully erosion susceptibility. The locations of future events can be predicted from the relationships between existing gullies and the environmental factors controlling the critical conditions for their development, mainly related to topography, soil, lithology, and land use. After estimating the contribution of each predisposing factor, the study area can be classified into sectors of different susceptibility. Various bivariate and multivariate statistical methods have been used to predict the spatial distribution of gullies, using DEMs with cell sizes ranging from 5 m to 1 km to extract topographic attributes (Bou Kheir et al., 2007; Kakembo et al., 2009; Ndomba et al., 2009; Lucà et al., 2011; Conoscenti et al., 2014). Lucà et al. (2011) showed how DEM cell size influences the extent of each erosion susceptibility class at the watershed scale: the coarser the resolution, the smaller both the extent of the most susceptible classes and the accuracy of the map, which suggests adopting as the optimal DEM cell size the characteristic size of the water erosion processes.

GIS may also help to assess potential soil loss from hillslopes using empirical or physically based models. The most widely used empirical erosion model is the Universal Soil Loss Equation (USLE) and its derivatives RUSLE and RUSLE2 (Renard et al., 1997; Terranova et al., 2009; and references therein). RUSLE and RUSLE2 extend the original USLE with improvements in determining the factors controlling erosion. RUSLE estimates the annual soil loss due to erosion through a factor-based approach using as input variables rainfall erosivity (R), soil erodibility (K), slope length and steepness (LS), cover management (C), and conservation practices (P). RUSLE can be easily implemented in a GIS environment by simply multiplying the maps of these environmental factors, as sketched below.

Various physically based models for erosion assessment have been proposed during recent decades and applied at the local (plot, farm, and hillslope), regional, and national scales (Terranova et al., 2009; and citations therein); Table 1 reports the most commonly used. Since the beginning of the 1980s, some erosion models have described the physical processes governing water erosion mathematically, with different levels of complexity (Shoemaker et al., 1990). Because the attributes required by such models vary in space and time, GIS is essential for applying them, particularly over large areas (e.g., Lu et al., 2004; Terranova et al., 2009).
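Because RUSLE is a cell-wise product of factor maps, its GIS implementation is essentially one raster expression. Here is a minimal sketch with synthetic stand-in arrays; a real application would read the factor rasters from a GIS.

```python
# Minimal sketch of a grid-based RUSLE calculation: the annual soil loss
# A = R * K * LS * C * P is the cell-wise product of the factor rasters.
import numpy as np

rng = np.random.default_rng(3)
shape = (200, 200)
R  = rng.uniform(800, 1200, shape)   # rainfall erosivity
K  = rng.uniform(0.02, 0.05, shape)  # soil erodibility
LS = rng.uniform(0.5, 6.0, shape)    # slope length/steepness factor
C  = rng.uniform(0.01, 0.4, shape)   # cover management factor
P  = rng.uniform(0.5, 1.0, shape)    # conservation practice factor

A = R * K * LS * C * P               # annual soil loss per cell
print(f"mean soil loss: {A.mean():.2f}, max: {A.max():.2f}")
```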
In a more or less detailed physical/mathematical form, these models usually include hydrological modules that describe the different processes: spatial and temporal distribution of precipitation, rainfall interception, soil infiltration, surface runoff, plant growth, nutrient cycling, water pollution, and so on. A control surface and a control volume are used to model these physical processes. The control volume is usually delimited by two horizontal planes (top and bottom) and by a lateral vertical surface. Through the upper plane (Fig. 4), precipitation and solar radiation enter the control volume and evapotranspiration fluxes leave it. Through the lower plane, groundwater and, eventually, its solutes drain out of the control volume depending on its hydraulic conductivity. The lateral surface follows the boundary of the river basin or aquifer and is affected by many different fluxes of matter, which are controlled by hydraulics (e.g., surface runoff and infiltration), thermodynamics, meteorology (precipitation, air temperature, wind, etc.), and crops or land cover (evaporation and evapotranspiration). With reference to a control volume containing a small plot of soil (Fig. 4), the conservation of mass for soil water can be simplified and described by the following equation:

\frac{\Delta S}{\Delta t} = P(t) + I(t) - E(t) - ET_0(t) - D(t) - R(t) \quad (3)

in which the term ΔS/Δt is the change in soil moisture during a given period Δt, P(t) is the precipitation, I(t) is the amount of irrigation, E(t) is the soil evaporation, ET0(t) is the plant transpiration, D(t) is the deep drainage, and R(t) is the surface and subsurface runoff. To build a spatial hydrological model, at each cell of the control volume the physical variables are associated with a time sequence of states. Usually, such models are built as submodels included in a GIS-based hydrological model.
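A minimal sketch of Eq. (3) as an explicit bucket model for a single cell is given below; the forcing series, drainage coefficient, and storage capacity are invented for illustration.

```python
# Minimal sketch of Eq. (3) as an explicit bucket model for one grid cell:
# soil storage is updated each time step from the water-balance terms.
import numpy as np

rng = np.random.default_rng(4)
n_days = 365
P   = rng.gamma(0.4, 8.0, n_days)   # precipitation (mm/day), synthetic
I   = np.zeros(n_days)              # irrigation (mm/day), none here
E   = np.full(n_days, 0.8)          # soil evaporation (mm/day)
ET0 = np.full(n_days, 2.0)          # plant transpiration (mm/day)

S, S_max = 150.0, 300.0             # current storage and capacity (mm)
storage = np.empty(n_days)
for t in range(n_days):
    D = 0.02 * S                                 # deep drainage, proportional to storage
    R = max(0.0, S + P[t] + I[t] - S_max)        # runoff: spill above capacity
    dS = P[t] + I[t] - E[t] - ET0[t] - D - R     # Eq. (3)
    S = max(0.0, S + dS)                         # storage cannot go negative
    storage[t] = S
print(storage[-1])
```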

Table 1 Mathematical models for soil–water erosion

Acronym and model evolution | Meaning of the acronym and/or notes | References
ANSWERS, ANSWERS_2000 | Areal nonpoint source watershed environment response simulation | Beasley et al. (1980); Bouraoui and Dillaha (1996)
WEPP | Water erosion prediction project | Nearing et al. (1989)
GeoWEPP | ESRI ArcView extension able to display erosion risk maps | Renschler et al. (2002) and Renschler (2003)
SHE | Système Hydrologique Européen | Abbott et al. (1986)
SHE–SED | A module aimed at modeling the coupled water flow and bulk sediment transport | Wicks and Bathurst (1996)
MIKE-SHE | Can model advection and dispersion of conservative solutes in multilayered aquifer systems, transformations of nitrogenous elements in the root zone, and irrigation | Refsgaard and Storm (1995)
EuroSEM | European soil erosion model | Morgan et al. (1998)
LiSEM | Limburg soil erosion model | de Roo et al. (1994)
KINEROS | KINematic runoff and EROSion | Woolhiser et al. (1990)
EROSION-2D/3D | | Schmidt et al. (1996)
STM-2D/3D | Sediment transport model | Biesemans (2000)
AGNPS | AGricultural NonPoint Source model | Young et al. (1994)
EPIC | Erosion productivity impact calculator | Williams et al. (1983)
CREAMS | Chemicals, runoff and erosion from agricultural management systems | Knisel (1980)
GLEAMS | Groundwater loading effects of agricultural management systems. Model developed to simulate edge-of-field and bottom-of-root-zone loadings of water, sediment, pesticides, and plant nutrients from the complex climate-soil-management interactions. Model improvements related to specific topics often resulted in new version releases | Leonard et al. (1987) and Knisel (1993)
OPUS | Used to simulate, on small basins, the water movement and transport in the root-zone profile, as well as the surface runoff and sediment transport during rainfall events. Therefore, it can be adopted for runoff and erosion predictions | Smith (1992)

Fig. 4 The hydrological cycle and major fluxes in a hydrological model scheme: layers include the atmosphere (blue), vegetal coverage (green), soil layers (dark and light brown), and the groundwater reservoir (gray), as well as the exchanges between them.


To ensure the spatial and temporal continuity of all physical variables, the submodels are based on a common database. Moreover, each submodel may need its own resolution in space and/or time, but physical consistency must be ensured when the submodels interact with each other. The prediction of runoff and of soil detachment, transport, and deposition is important for identifying the areas most susceptible to soil loss. The results of empirical, statistical, or physically based models should guide authorities in implementing appropriate land use and soil conservation measures, in order to reduce the likelihood of soil loss and to minimize its social and economic effects.

2.03.5.3 Precision Agriculture

GIS plays a significant role in agronomic studies, since it can be used to model the variability of nutrient status within a field and to assess site-specific requirements for fertilization, irrigation, or other management practices. GIS can assemble layers of chemical and biological properties related to soil fertility, moisture content, and topography to produce a map showing which factors influence crop yield. Precision agriculture is oriented toward field management that takes soil spatiotemporal variability into account in a site-specific context.

The most widely used approach for managing field variability is the delineation of management zones (MZs): subregions of the field that exhibit a relatively homogeneous combination of yield-limiting factors with respect to the treatment regime to be adopted. Each zone is treated with the appropriate level of inputs (soil tillage, irrigation, fertilizer rate, pesticide, and crop protection), differently from the adjacent areas. In general, identifying subfield zones is difficult because of the complex combination of factors that can influence the response variables (i.e., the requested quality and quantity of crop yield). There are different approaches for delineating within-field MZs, but most rely on spatial information sources such as soil properties (e.g., organic carbon, available nutrients, and soil texture), topographic attributes, remotely or proximally sensed data, and yield maps. A first method simply consists of map overlay: after reclassifying each map, the unique combinations of classes define the MZs in the field. This approach depends on arbitrary classification criteria defined by the user. A sounder approach, available in most GIS software, is the use of clustering algorithms to group areas of the field with similar features; an example of fuzzy clustering applied to combined soil properties for dividing a field into MZs can be found in Fleming et al. (2000).

The contribution of geostatistics to precision agriculture is undoubted (Buttafuoco and Lucà, 2016 and references therein), since it provides the opportunity to understand the relationships among the spatial variability of soil properties, field management practices, and crop yields. One possibility for summarizing the variation of attributes or limiting factors affecting agricultural production is factorial kriging (Morari et al., 2009), which quantifies and reduces the spatial variability of multivariate data to a few factors related to different spatial scales, thus dividing the field into areas of a size manageable by farmers. Further, polygon kriging can be used to assess the effectiveness of field delineation based on soil attributes (Diacono et al., 2014; Buttafuoco et al., 2017). The delineation of appropriate MZs can guide recommendations for best management practices, including the amount of fertilizers or pesticides to use, the estimation of productivity and crop yields, and the assessment of appropriate crop types or irrigation needs. Such site-specific recommendations can enhance both profitability and environmental protection: using a lower amount of pesticides may not only maintain economic stability but also minimize environmental impacts from agrochemical pollutants.
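A minimal sketch of zone delineation by clustering follows; k-means from scikit-learn is used here for brevity, whereas the cited studies use fuzzy clustering (e.g., fuzzy c-means, which additionally yields membership grades). All attribute values are synthetic.

```python
# Minimal sketch of management-zone delineation by clustering: cells with
# similar soil/terrain attributes are grouped into k candidate zones.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
n_cells = 2500  # e.g., a 50 x 50 field grid
# Hypothetical per-cell attributes: organic carbon (%), clay (%), elevation (m).
attrs = np.column_stack([
    rng.normal(1.5, 0.4, n_cells),
    rng.normal(25, 6, n_cells),
    rng.normal(110, 3, n_cells),
])

# Standardize so no attribute dominates the distance metric.
X = StandardScaler().fit_transform(attrs)

zones = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
zone_map = zones.reshape(50, 50)   # back to field geometry for mapping
print(np.bincount(zones))
```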

2.03.6 Concluding Remarks

Soil is the result of many biotic and abiotic factors, and its spatial variability depends on the complex interaction of several soil processes working simultaneously, over long periods, and at different spatial scales. GIS allows one to store and analyze the available information on soil properties, topography, geology, vegetation, and other auxiliary data derived from remote and proximal sensors at different spatial resolutions. DSM allows one to study soil–terrain relationships, predicting soil properties or attributes through several pedometric techniques. Although this article highlights how relationships between soils and environmental factors can be identified and assessed within a GIS environment, it is worth stressing that many soil properties are nonstationary and their spatial variation may change over time. Since DSM relies heavily on auxiliary data, their quality, inaccuracy, or artifacts affect the reliability of the resulting soil maps, whose quality also depends on the quantity of available soil information and on the adopted pedometric technique. In many earth and environmental science applications, soil properties serve as input for evaluating the risk of soil loss, pollution, or degradation. Assessing the propagation of error through the modeling process is therefore needed to quantify the uncertainty of the final product, which may in turn guide soil decision support systems.

Acknowledgement

Financial support for this work comes from the project "ALForLab" (PON03PE_00024_1), co-funded by the National Operational Programme for Research and Competitiveness (PON R&C) 2007–2013, through the European Regional Development Fund (ERDF) and national resources (Revolving Fund – Cohesion Action Plan (CAP), MIUR). The Editor and the anonymous reviewers are acknowledged for their suggestions.


References

Abbott, M.B., Bathurst, J.C., Cunge, J.A., O'Connell, P.E., Rasmussen, J., 1986. An introduction to the European Hydrological System – Systeme Hydrologique Europeen, SHE. History and philosophy of a physically-based distributed modelling system. Journal of Hydrology 87, 45–59.
Adamchuk, V.I., Viscarra Rossel, R.A., Sudduth, K.A., Schulze Lammers, P., 2011. Sensor fusion for precision agriculture. In: Thomas, C. (Ed.), Sensor fusion – Foundation and applications. InTech, Rijeka, Croatia, pp. 27–40.
Barringer, J.R.F., Hewitt, A.E., Lynn, I.H., Schmidt, J., 2008. National mapping of landform elements in support of S-Map, a New Zealand soils database. In: Zhou, Q., Lees, B., Tang, G.-A. (Eds.), Advances in digital terrain analysis. Springer, Berlin, pp. 443–458.
Bastida, F., Zsolnay, A., Hernández, T., García, C., 2008. Past, present and future of soil quality indices: A biological perspective. Geoderma 147, 159–171.
Baxter, S.J., Oliver, M.A., 2005. The spatial prediction of soil mineral N and potentially available N using elevation. Geoderma 128, 325–339.
Beasley, D.B., Huggins, L.F., Monke, E.J., 1980. ANSWERS: A model for watershed planning. Transactions of ASAE 23 (4), 938–944.
Behrens, T., Förster, H., Scholten, T., Steinrücken, U., Spies, E.-D., Goldschmitt, M., 2005. Digital soil mapping using artificial neural networks. Journal of Plant Nutrition and Soil Science 168, 21–33.
Ben-Dor, E., 2002. Quantitative remote sensing of soil properties. Advances in Agronomy 75, 173–243.
Biesemans, J., 2000. Erosion modelling as support for land management in the loess belt of Flanders, Belgium. Ph.D. thesis, Ghent University, Belgium.
Bishop, T.F.A., McBratney, A.B., 2001. A comparison of prediction methods for the creation of field-extent soil property maps. Geoderma 103, 149–160.
Bou Kheir, R.B., Wilson, J., Deng, Y., 2007. Use of terrain variables for mapping gully erosion susceptibility in Lebanon. Earth Surface Processes and Landforms 32, 1770–1782.
Bouraoui, F., Dillaha, T.A., 1996. ANSWERS-2000: Runoff and sediment transport model. Journal of Environmental Engineering 122 (6), 493–502.
Brevik, E.C., Fenton, T.E., Horton, R., 2004. Effect of daily soil temperature fluctuations on soil electrical conductivity as measured with the Geonics® EM-38. Precision Agriculture 5, 143–150.
Brevik, E.C., Fenton, T.E., Lazari, A., 2006. Soil electrical conductivity as a function of soil water content and implications for soil mapping. Precision Agriculture 7, 393–404.
Brocklehurst, S.H., 2010. Tectonics and geomorphology. Progress in Physical Geography 34, 357–383.
Buttafuoco, G., Lucà, F., 2016. The contribution of geostatistics to precision agriculture. Annals of Agricultural & Crop Sciences 1 (2), 1008.
Buttafuoco, G., Conforti, M., Aucelli, P.P.C., Robustelli, G., Scarciglia, F., 2012. Assessing spatial uncertainty in mapping soil erodibility factor using geostatistical stochastic simulation. Environmental Earth Sciences 66, 1111–1125.
Buttafuoco, G., Castrignanò, A., Cucci, G., Lacolla, G., Lucà, F., 2017. Geostatistical modelling of within-field soil and yield variability for management zones delineation: A case study in a durum wheat field. Precision Agriculture 18, 37–58.
Caers, J., 2003. Geostatistics: From pattern recognition to pattern reproduction. In: Nikravesh, M., Aminzadeh, F., Zadeh, L. (Eds.), Soft computing and intelligent data analysis in oil exploration. Elsevier, Amsterdam, pp. 97–115.
Carré, F., McBratney, A.B., Mayr, T., Montanarella, L., 2007. Digital soil assessments: Beyond DSM. Geoderma 142 (1–2), 69–79.
Castrignanò, A., Lopez, N., Prudenzano, M., Steduto, P., 2002. Estimation of stochastic simulation in soil science. In: Zdruli, P., Steduto, P., Kapur, S. (Eds.), Selected Papers of the 7th International Meeting on Soils with Mediterranean Type of Climate. Options Méditerranéennes, Serie A n. 50, Paris, pp. 167–182.
Castrignanò, A., Buttafuoco, G., Comolli, R., Ballabio, C., 2006a. Error propagation analysis of DEM-based slope and aspect. In: Proceedings of the Second Global Workshop on Digital Soil Mapping for Regions and Countries with Sparse Soil Data Infrastructures, Rio de Janeiro, Brazil, 4–7 July 2006. Embrapa Solos, Rio de Janeiro.
Castrignanò, A., Buttafuoco, G., Comolli, R., Ballabio, C., 2006b. Accuracy assessment of digital elevation model using stochastic simulation. In: Caetano, M., Painho, M. (Eds.), Proceedings of the 7th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences (Accuracy 2006), 5–7 July 2006. Instituto Geográfico Português, Lisbon, Portugal, pp. 490–498. ISBN 972-8867-27-1.
Castrignanò, A., Buttafuoco, G., Puddu, R., 2008. Multi-scale assessment of the risk of soil salinization in an area of south-eastern Sardinia (Italy). Precision Agriculture 9, 17–31.
Chilès, J.P., Delfiner, P., 2012. Geostatistics: Modelling spatial uncertainty, 2nd edn. Wiley, New York.
Cialella, A.T., Dubayah, R., Lawrence, W., Levine, E., 1997. Predicting soil drainage class using remotely sensed and digital elevation data. Photogrammetric Engineering and Remote Sensing 63, 171–178.
Claessens, L., Heuvelink, G.B.M., Schoorl, J.M., Veldkamp, A., 2005. DEM resolution effects on shallow landslide hazard and soil redistribution modelling. Earth Surface Processes and Landforms 30, 461–477.
Cockx, L., Van Meirvenne, M., Vitharana, U.W.A., Verbeke, L.P.C., Simpson, D., Saey, T., Van Coille, F.M.B., 2009. Extracting topsoil information from EM38DD sensor data using a neural network approach. Soil Science Society of America Journal 73 (6), 1–8.
Conforti, M., Froio, R., Matteucci, G., Buttafuoco, G., 2015. Visible and near infrared spectroscopy for predicting texture in forest soil: An application in Southern Italy. iForest – Biogeosciences and Forestry 8, 339–347.
Conoscenti, C., Angileri, S., Cappadonia, C., Rotigliano, E., Agnesi, V., Märker, M., 2014. Gully erosion susceptibility assessment by means of GIS-based logistic regression: A case of Sicily (Italy). Geomorphology 204, 399–411.
Corwin, D.L., 2008. Past, present, and future trends in soil electrical conductivity measurements using geophysical methods. In: Allred, B.J., Daniels, J.J., Ehsani, M.R. (Eds.), Handbook of agricultural geophysics. CRC Press, Taylor and Francis Group, Boca Raton, Florida, pp. 17–44.
De Benedetto, D., Castrignanò, A., Sollitto, D., Modugno, F., Buttafuoco, G., Lo Papa, G., 2012. Integrating geophysical and geostatistical techniques to map the spatial variation of clay. Geoderma 171–172, 53–63.
de Bruin, S., Stein, A., 1998. Soil-landscape modelling using fuzzy c-means clustering of attribute data derived from a Digital Elevation Model (DEM). Geoderma 83, 17–33.
de Roo, A.P.J., Wesseling, C.G., Cremers, N.H.D.T., Offermans, R.J.E., Ritsema, C.J., van Oostindie, K., 1994. LISEM: A new physically-based hydrological and soil erosion model in a GIS-environment: Theory and implementation. In: Olive, L.J., Loughran, R.J., Kesby, J.A. (Eds.), Variability in stream erosion and sediment transport (Proceedings of the Canberra Symposium, December 1994). IAHS Publ. no. 224, pp. 439–448.
Diacono, M., Castrignanò, A., Vitti, C., Stellacci, A.M., Marino, L., Cocozza, C., 2014. An approach for assessing the effects of site-specific fertilization on crop growth and yield of durum wheat in organic agriculture. Precision Agriculture 15, 479–498.
Doolittle, J.A., Brevik, E.C., 2014. The use of electromagnetic induction techniques in soils studies. Geoderma 223–225, 33–45.
Erskine, R.H., Green, T.R., Ramirez, J.A., MacDonald, L.H., 2007. Digital elevation accuracy and grid cell size: Effects on estimated terrain attributes. Soil Science Society of America Journal 71, 1371–1380.
European Commission, 2006. Proposal for a directive of the European parliament and of the council establishing a framework for the protection of soil and amending directive 2004/35/EC. COM(2006) 232 final, Brussels.
Farahani, H.J., Buchleiter, G.W., Brodahl, M.K., 2005. Characterization of soil electrical conductivity variability in irrigated sandy and non-saline fields in Colorado. Transactions of the American Society of Agricultural Engineers 48 (1), 155–168.
Fleming, K.L., Westfall, D.G., Weins, D.W., Brodahl, M.C., 2000. Evaluating farmer defined management zone maps for variable rate fertilizer application. Precision Agriculture 2, 201–215.
Florinsky, I.V., Eilers, R.G., Manning, G., Fuller, L.G., 2002. Prediction of soil properties by digital terrain modelling. Environmental Modelling and Software 17, 295–311.
Gessler, P.E., Moore, I.D., McKensie, N.J., Ryan, P.J., 1995. Soil-landscape modelling and spatial prediction of soil attributes. International Journal of Geographical Information Science 9, 421–432.


Gessler, P.E., Chadwick, O.A., Chamran, F., Althouse, L., Holmes, K., 2000. Modeling soil-landscape and ecosystem properties using terrain attributes. Soil Science Society of America Journal 64, 2046–2056.
Giasson, E., Figueiredo, S.R., Tornquist, C.G., Clarke, R.T., 2008. Digital soil mapping using logistic regression on terrain parameters for several ecological regions in Southern Brazil. In: Hartemink, A.E., McBratney, A., Mendonça-Santos, M.L. (Eds.), Digital soil mapping with limited data. Springer, Dordrecht, pp. 225–232.
Goovaerts, P., 2001. Geostatistical modelling of uncertainty in soil science. Geoderma 103 (1–2), 3–26.
Hengl, T., 2006. Finding the right pixel size. Computers & Geosciences 32, 1283–1298.
Hengl, T., Heuvelink, G.B.M., Stein, A., 2004. A generic framework for spatial prediction of soil variables based on regression-kriging. Geoderma 120, 75–93.
Hengl, T., Toomanian, N., Reuter, H.I., Malakouti, M.J., 2007. Methods to interpolate soil categorical variables from profile observations: Lessons from Iran. Geoderma 140, 417–427.
Huisman, J.A., Hubbard, S.S., Redman, J.D., Annan, A.P., 2003. Measuring soil water content with ground penetrating radar: A review. Vadose Zone Journal 2, 476–791.
Hutchinson, M.F., Gallant, J.C., 2000. Digital elevation models and representation of terrain shape. In: Wilson, J.P., Gallant, J.C. (Eds.), Terrain analysis: Principles and applications. John Wiley and Sons, New York, pp. 29–50.
Jenny, H., 1941. Factors of soil formation. McGraw-Hill, New York.
Journel, A.G., Alabert, F., 1989. Non-Gaussian data expansion in the earth sciences. Terra Nova 1, 123–124.
Kakembo, V., Xanga, W.W., Rowntree, K., 2009. Topographic thresholds in gully development on the hillslopes of communal areas in Ngqushwa Local Municipality, Eastern Cape, South Africa. Geomorphology 110, 188–194.
Kalin, L., Govindaraju, R.S., Hantush, M.M., 2003. Effect of geomorphic resolution on modeling of runoff hydrograph and sedimentograph over small watersheds. Journal of Hydrology 276, 89–111.
Kim, D., Zheng, Y., 2011. Scale-dependent predictability of DEM-based landform attributes for soil spatial variability in a coastal dune system. Geoderma 164, 181–194.
King, J.A., Dampney, P.M.R., Lark, R.M., Wheeler, H.C., Bradley, R.I., Mayr, T.R., 2005. Mapping potential crop management zones within fields: Use of yield-map series and patterns of soil physical properties identified by electromagnetic induction sensing. Precision Agriculture 6, 167–181.
Kleinbaum, D.G., Kupper, L.L., Nizam, A., Muller, K.E., 2008. Applied regression analysis and other multivariable methods, 4th edn. Thomson Brooks/Cole, Belmont, CA. 928 pp.
Knisel, W.G. (Ed.), 1980. CREAMS: A field-scale model for chemical, runoff, and erosion from agricultural management systems. Conservation Research Report 26, U.S. Department of Agriculture, Washington, DC.
Knisel, W.G. (Ed.), 1993. GLEAMS: Groundwater loading effects of agricultural management systems, Version 2.10. Dept. Publication No. 5, Biological & Agricultural Engineering Department, University of Georgia-Coastal Plain Experiment Station, Tifton. 260 pp.
Kravchenko, A.N., Bollero, G.A., Omonode, R.A., Bullock, D.G., 2002. Quantitative mapping of soil drainage classes using topographical data and soil electrical conductivity. Soil Science Society of America Journal 66, 235–243.
Lagacherie, P., Holmes, S., 1997. Addressing geographical data errors in a classification tree for soil unit prediction. International Journal of Geographical Information Science 11, 183–198.
Lagacherie, P., McBratney, A.B., 2007. Spatial soil information systems and spatial soil inference systems: Perspectives for digital soil mapping. In: Lagacherie, P., McBratney, A.B., Voltz, M. (Eds.), Digital soil mapping: An introductory perspective. Elsevier, Amsterdam, pp. 3–22.
Lark, R.M., 1999. Soil–landform relationships at within-field scales: An investigation using continuous classification. Geoderma 92, 141–165.
Leonard, R.A., Knisel, W.G., Still, D.A., 1987. GLEAMS: Groundwater loading effects of agricultural management systems. Transactions of ASAE 30 (5), 1403–1418.
Li, X., Chang, S.X., Liu, J., Zheng, Z., Wang, X., 2016. Topography-soil relationships in a hilly evergreen broadleaf forest in subtropical China. Journal of Soils and Sediments. http://dx.doi.org/10.1007/s11368-016-1573-4.
Lu, D., Li, G., Valladares, G.S., Batistella, M., 2004. Mapping soil erosion risk in Rondônia, Brazilian Amazonia: Using RUSLE, remote sensing and GIS. Land Degradation and Development 15 (5), 499–512.
Lucà, F., Conforti, M., Robustelli, G., 2011. Comparison of GIS-based gullying susceptibility mapping using bivariate and multivariate statistics: Northern Calabria, South Italy. Geomorphology 134, 297–308.
Lucà, F., Buttafuoco, G., Robustelli, G., Malafronte, A., 2014. Spatial modelling and uncertainty assessment of pyroclastic cover thickness in the Sorrento Peninsula. Environmental Earth Sciences 72 (9), 3353–3367.
Lucà, F., Conforti, M., Matteucci, G., Buttafuoco, G., 2015. Prediction of organic carbon and nitrogen in forest soil using visible and near-infrared spectroscopy. EAGE Near Surface Geoscience, First Conference on Proximal Sensing Supporting Precision Agriculture, Torino, 6–10 September. http://dx.doi.org/10.3997/2214-4609.201413834. Available at: http://www.earthdoc.org/publication/publicationdetails/?publication=82429.
Lucà, F., Conforti, M., Castrignanò, A., Matteucci, G., Buttafuoco, G., 2017. Effect of calibration set size on prediction at local scale of soil organic carbon by Vis-NIR spectroscopy. Geoderma 288C, 175–183.
Matheron, G., 1971. The theory of regionalised variables and its applications. Les Cahiers du Centre de Morphologie Mathématique de Fontainebleau, vol. 5. École Nationale Supérieure des Mines de Paris, Paris.
McBratney, A.B., Odeh, I.O.A., 1997. Application of fuzzy sets in soil science: Fuzzy logic, fuzzy measurement, and fuzzy decisions. Geoderma 77, 85–113.
McBratney, A.B., Odeh, I.O.A., Bishop, T.F.A., Dunbar, M.S., Shatar, T.M., 2000. An overview of pedometric techniques for use in soil survey. Geoderma 97, 293–327.
McBratney, A.B., Santos, M.L., Minasny, B., 2003. On digital soil mapping. Geoderma 117 (1–2), 3–52.
McKenzie, N.J., Ryan, P.J., 1999. Spatial prediction of soil properties using environmental correlation. Geoderma 89 (1–2), 67–94.
Mendonça-Santos, M.L., Santos, H.G., Dart, R.O., Pares, J.G., 2008. Digital mapping of soil classes in Rio de Janeiro State, Brazil: Data, modelling and prediction. In: Hartemink, A.E., McBratney, A., Mendonça-Santos, M.L. (Eds.), Digital soil mapping with limited data. Springer, Dordrecht, pp. 381–396.
Moore, I.D., Grayson, R.B., Ladson, A.R., 1991. Digital terrain modelling: A review of hydrological, geomorphological, and biological applications. Hydrological Processes 5, 3–30.
Moore, I.D., Gessler, P.E., Nielsen, G.A., Peterson, G.A., 1993. Soil attribute prediction using terrain analysis. Soil Science Society of America Journal 57, 443–520.
Moral, F.J., Terrón, J.M., Marques da Silva, J.R., 2010. Delineation of management zones using mobile measurements of soil apparent electrical conductivity and multivariate geostatistical techniques. Soil and Tillage Research 106, 335–343.
Moran, C.J., Bui, E.N., 2002. Spatial data mining for enhanced soil map modelling. International Journal of Geographical Information Science 16, 533–549.
Morari, F., Castrignanò, A., Pagliarin, C., 2009. Application of multivariate geostatistics in delineating management zones within a gravelly vineyard using geo-electrical sensors. Computers and Electronics in Agriculture 68, 97–107.
Morgan, R.P.C., Quinton, J.N., Smith, R.E., Govers, G., Poesen, J.W.A., Auerswald, K., Chisci, G., Torri, D., Styczen, M.E., 1998. The European Soil Erosion Model (EUROSEM): A dynamic approach for predicting sediment transport from fields and small catchments. Earth Surface Processes and Landforms 23, 527–544.
Morvan, X., Saby, N.P.A., Arrouays, D., Le Bas, C., Jones, R.J.A., Verheijen, F.G.A., Bellamy, P.H., Stephens, M., Kibblewhite, M.G., 2008. Soil monitoring in Europe: A review of existing systems and requirements for harmonisation. Science of the Total Environment 391 (1), 1–12.
Mulder, V.L., de Bruin, S., Schaepman, M.E., Mayr, T.R., 2011. The use of remote sensing in soil and terrain mapping – A review. Geoderma 162, 1–19.
Nachtergaele, F.O., 1999. From the soil map of the world to the digital global soil and terrain database: 1960–2002. In: Sumner, M.E. (Ed.), Handbook of soil science. CRC Press, Boca Raton.
Ndomba, P.M., Mtalo, F., Killingtveit, A., 2009. Estimating gully erosion contribution to large catchment sediment yield rate in Tanzania. Physics and Chemistry of the Earth 34, 741–748.
Nearing, M.A., Foster, G.R., Lane, L.J., Finkner, S.C., 1989. A process-based soil erosion model for USDA Water Erosion Prediction Project technology. Transactions of ASAE 32 (5), 1587–1593.


Odeh, I.O.A., McBratney, A.B., Chittleborough, D.J., 1992. Fuzzy-c-means and kriging for mapping soil as a continuous system. Soil Science Society of America Journal 56, 1848–1854.
Odeh, I.O.A., McBratney, A.B., Chittleborough, D.J., 1994. Spatial prediction of soil properties from landform attributes derived from a digital elevation model. Geoderma 63, 197–214.
Park, S.J., Ruecker, G.R., Agyare, W.A., Akramhanov, A., Kim, D., Vlek, P.L.G., 2009. Influence of grid cell size and flow routing algorithm on soil–landform modeling. Journal of the Korean Geographical Society 44, 122–145.
Pike, R.J., 2000. Geomorphometry – Diversity in quantitative surface analysis. Progress in Physical Geography 24 (1), 1–20.
Refsgaard, J.C., Storm, B., 1995. MIKE SHE. In: Singh, V.P. (Ed.), Computer models of watershed hydrology. Water Resource Publications, Littleton, CO, pp. 806–846.
Renard, K.G., Foster, G.R., Weesies, G.A., McCool, D.K., Yoder, D.C., 1997. Predicting soil erosion by water: A guide to conservation planning with the Revised Universal Soil Loss Equation (RUSLE). U.S. Department of Agriculture, Agriculture Handbook No. 703, Washington, DC. 404 pp.
Renschler, C.S., 2003. Designing geo-spatial interfaces to scale process models: The GeoWEPP approach. Hydrological Processes 17 (5), 1005–1017.
Renschler, C.S., Flanagan, D.C., Engel, B.A., Frankenberger, J.R., 2002. GeoWEPP: The geospatial interface to the Water Erosion Prediction Project. ASAE Paper No. 022171. ASAE, St. Joseph, MI.
Rossiter, D.G., 2004. Digital soil resource inventories: Status and prospects. Soil Use and Management 20 (3), 296–301.
Saadat, H., Bonnell, R., Sharifi, F., Mehuys, G., Namdar, M., Ale-Ebrahim, S., 2008. Landform classification from a digital elevation model and satellite imagery. Geomorphology 100 (3–4), 453–464.
Scheidegger, A.E., 2004. Morphotectonics. Springer, Berlin. 197 pp.
Schmidt, J., von Werner, M., Michael, A., 1996. EROSION 2D/3D: Ein Computermodell zur Simulation der Bodenerosion durch Wasser. Sächsische Landesanstalt für Landwirtschaft/Sächsisches Landesamt für Umwelt und Geologie, Dresden/Freiberg.
Scull, P., Chadwick, O.A., McArthur, D., 2003. Predictive soil mapping: A review. Progress in Physical Geography 27 (2), 171–197.
Scull, P., Franklin, J., Chadwick, O.A., 2005. The application of classification tree analysis to soil type prediction in a desert landscape. Ecological Modelling 181, 1–15.
Selige, T., Bohner, J., Schmidhalter, U., 2006. High resolution topsoil mapping using hyperspectral image and field data in multivariate regression modeling procedures. Geoderma 136 (1–2), 235–244.
Shaddad, S.M., Madrau, S., Castrignanò, A., Mouazen, A.M., 2016. Data fusion techniques for delineation of site-specific management zones in a field in UK. Precision Agriculture 17, 200–217.
Shoemaker, L.L., Magette, W.L., Shirmohammadi, A., 1990. Modeling management practices effects on pesticide movement to groundwater. Ground Water Monitoring Review XI (1), 109–115.
Smith, R.E., 1992. OPUS, an integrated simulation model for transport of nonpoint source pollutants at the field scale: Volume I, Documentation. U.S. Department of Agriculture, Agricultural Research Service, ARS-98. 120 pp.
Smith, M.P., Zhu, A.-X., Burt, J.E., Stiles, C., 2006. The effects of DEM resolution and neighborhood size on digital soil survey. Geoderma 137, 58–69.
Sorokina, N.P., Kozlov, D.N., 2009. Experience in digital mapping of soil cover patterns. Eurasian Soil Science 42, 182–193.
Stevens, A., Udelhoven, T., Denis, A., Tychon, B., Lioy, R., Hoffmann, L., van Wesemael, B., 2010. Measuring soil organic carbon in croplands at regional scale using airborne imaging spectroscopy. Geoderma 158, 32–45.
Sulaeman, Y., Minasny, B., McBratney, A.B., Sarwani, M., Sutandi, A., 2013. Harmonizing legacy soil data for digital soil mapping in Indonesia. Geoderma 192, 77–85.
Sullivan, D.G., Shaw, J.N., Rickman, D., 2005. IKONOS imagery to estimate surface soil property variability in two Alabama physiographies. Soil Science Society of America Journal 69, 1789–1798.
Taylor, J.A., Odeh, I.O.A., 2007. Comparing discriminant analysis with binomial logistic regression, regression kriging and multi-indicator kriging for mapping salinity risk in Northwest New South Wales, Australia. In: Lagacherie, P., McBratney, A.B., Voltz, M. (Eds.), Digital soil mapping: An introductory perspective. Elsevier, Amsterdam, pp. 455–464.
Terranova, O., Antronico, L., Coscarelli, R., Iaquinta, P., 2009. Soil erosion risk scenarios in the Mediterranean environment using RUSLE and GIS: An application model for Calabria (southern Italy). Geomorphology 112 (3–4), 228–245.
Thomas, A.L., King, D., Dambrine, E., Couturier, A., Roque, J., 1999. Predicting soil classes with parameters derived from relief and geologic materials in a sandstone region of the Vosges mountains (Northeastern France). Geoderma 90, 291–305.
Thompson, J.A., Bell, J.C., Butler, C.A., 2001. Digital elevation model resolution: Effects on terrain attribute calculation and quantitative soil–landscape modelling. Geoderma 100, 67–89.
Torri, D., Poesen, J., Borselli, L., Bryan, R., Rossi, M., 2012a. Spatial variation of bed roughness in eroding rills and gullies. Catena 90, 76–86.
Torri, D., Borselli, L., Gariano, S.L., Greco, R., Iaquinta, P., Iovine, G., Poesen, J., Terranova, O.G., 2012b. Identifying gullies in the Mediterranean environment by coupling a complex threshold model and a GIS. Rendiconti Online della Società Geologica Italiana 21, 441–443.
Torri, D., Poesen, J., 2014. A review of topographic threshold conditions for gully head development in different environments. Earth Science Reviews 130, 73–85.
Webster, R., Oliver, M.A., 2007. Geostatistics for environmental scientists, 2nd edn. Wiley, Chichester. 315 pp.
Weller, U., Zipprich, M., Sommer, M., Castell, W.Zu., Wehrhan, M., 2007. Mapping clay content across boundaries at the landscape scale with electromagnetic induction. Soil Science Society of America Journal 71 (6), 1740–1747.
Wicks, J.M., Bathurst, J.C., 1996. SHESED: A physically based, distributed erosion and sediment yield component for the SHE hydrological modelling system. Journal of Hydrology 175, 213–238.
Wienhold, B.J., Doran, J.W., 2008. Apparent electrical conductivity for delineating spatial variability in soil properties. In: Allred, B.J., Daniels, J.J., Ehsani, M.R. (Eds.), Handbook of agricultural geophysics. CRC Press, Taylor and Francis Group, Boca Raton, Florida, pp. 211–215.
Williams, J.R., Renard, K.G., Dyke, P.T., 1983. EPIC: A new method for assessing erosion's effect on soil productivity. Journal of Soil and Water Conservation 38 (5), 381–383.
Wilson, J.P., Gallant, J.C., 2000. Digital terrain analysis. In: Wilson, J.P., Gallant, J.C. (Eds.), Terrain analysis: Principles and applications. John Wiley & Sons Inc, New York, pp. 1–27.
Woolhiser, D.A., Smith, R.E., Goodrich, D.C., 1990. KINEROS, a kinematic runoff and erosion model: Documentation and user manual. U.S. Department of Agriculture, Agricultural Research Service, ARS-77. 130 pp.
Young, R.A., Onstad, C.A., Bosch, D.D., Anderson, W.P., 1994. Agricultural non-point source pollution model, Version 4.03: AGNPS user's guide, July 1994. USDA-NRS-NSL, Oxford, MS.
Zadeh, L.A., 1965. Fuzzy sets. Information and Control 8, 338–353.
Zhao, Z., Chow, T.L., Rees, H.W., Yang, Q., Xing, Z., Meng, F., 2009. Predict soil texture distributions using an artificial neural network model. Computers and Electronics in Agriculture 65, 36–48.
Zhou, B., Zhang, X.-G., Wang, R.-C., 2004. Automated soil resources mapping based on decision tree and Bayesian predictive modeling. Journal of Zhejiang University Science 5, 782–795.
Ziadat, F.M., Taylor, J.C., Brewer, T.R., 2003. Merging Landsat TM imagery with topographic data to aid soil mapping in the Badia region of Jordan. Journal of Arid Environments 54, 527–541.



Relevant Websites

http://esdac.jrc.ec.europa.eu – Digital soil mapping.
www.pedometrics.org – Pedometrics.
www.css.cornell.edu – Soil Geographic Databases.
www.iuss.org – International Union of Soil Science.
http://geomorphometry.org – Geomorphometry.

2.04 GIS for Hydrology

Wolfgang Korres and Karl Schneider, University of Cologne, Cologne, Germany
© 2018 Elsevier Inc. All rights reserved.

2.04.1 Introduction
2.04.2 Data for Hydrology
2.04.2.1 Data Structures
2.04.2.2 Data Sources for Hydrology
2.04.2.2.1 Digital elevation models
2.04.2.2.2 Hydrography data
2.04.2.2.3 Soil
2.04.2.2.4 Land use and vegetation
2.04.2.2.5 Climate and precipitation
2.04.2.3 Uncertainties and Errors in Hydrological Data
2.04.3 Standard GIS Methods for Hydrology
2.04.3.1 Terrain Analysis
2.04.3.1.1 Flow direction
2.04.3.1.2 Drainage network extraction
2.04.3.2 Spatial Interpolation of Precipitation Data
2.04.3.3 NRCS Curve Number
2.04.3.4 Pedotransfer Functions
2.04.4 GIS and Hydrological Models
2.04.4.1 Integrating GIS With Hydrological Models
2.04.4.2 Taxonomy of Hydrological Models
2.04.5 State and Perspectives of GIS Applications
2.04.5.1 Spatial Data Infrastructure in Hydrology
2.04.5.1.1 International portals
2.04.5.1.2 National portals and services
2.04.5.1.3 Observatories
2.04.6 Decision Support Systems for Hydrology
2.04.7 Future Prospects
References

2.04.1 Introduction

“Hydrology is the science that treats the waters of the earth, their occurrence, circulation and distribution, their chemical and physical properties, and their reaction with the environment, including their relation to living things. The domain of hydrology embraces the full life history of water on earth” (Maidment, 1993a). With its reference to properties, environmental interactions, and distribution, this definition already implies the intricate and tight relationship between hydrology and GIS science and technology. De Haar (1974) explicitly addressed the human dimension in his definition of hydrology by including the interactions between natural prerequisites and human impacts. In the era of the Anthropocene (Waters et al., 2016), the interactions and feedbacks between natural and human systems have become a particular focus of attention for basic as well as applied hydrology. The recent development of decision support systems in hydrology and water resources management shows that this integrative approach, which spans the natural and social sciences as well as applied sciences and engineering, has developed far beyond a qualitative description of this complex system. The availability of measurement networks, remote sensing, and GIS techniques, together with the advancement of scientific understanding, are prerequisites for this progress.

In the infancy of hydrology, the focus was on improved process understanding based on field studies or, with respect to modeling, on lumped watershed models and time series analysis. The latter were, and still are, particularly relevant for problems of applied hydrology (e.g., flood forecasting). The Unit Hydrograph Approach (Sherman, 1932) is an example of nonspatial modeling of hydrological processes. Such lumped models assume uniform spatial properties within the watershed, which can be addressed with effective model parameters, or so-called system response functions, where the watershed properties define the system response function. Some models also allowed for a small number of spatial subunits with uniform properties (Maidment, 1993b). Later hydrological models (e.g., HEC-1) used an inherent topology by connecting subwatersheds, but an explicit areal representation or contiguity was still lacking (Chow et al., 1988).

Since hydrological modeling and analysis approaches existed before the advent of spatial technologies, GIS and hydrology developed largely in parallel (Sui and Maggio, 1999). Traditionally, hydrologists focused mainly on time series, lumped watershed models, and stochastic analysis, whereas GIS developers addressed methods of spatial analysis. With the advent of spatial technologies such as remote sensing, GIS, and spatial statistics, these lumped models were augmented by a new model family, the distributed models, which explicitly use spatial information to describe the hydrological system. Deterministic, distributed models in particular require large amounts of spatially distributed data, which can be provided through remote sensing, GIS techniques, and spatial statistics. The advancement of hydrological process understanding and of computer and sensor technologies has led to an ongoing and rapid development of distributed hydrological models. Traditionally, hydrological models addressed above all the randomness and time variation of hydrological variables by analyzing the stochastic or time-variant behavior of the hydrologic system; these dimensions, however, are not the primary focus of GIS. Thus, while the advancement of spatial technology has been stimulating for the hydrological sciences, integrating hydrological data, models, and GIS techniques poses a particular challenge.

Efforts to integrate GIS and hydrological modeling started in the 1980s. With the development of new analytical capabilities in GIS systems, a closer integration with hydrological models and data analysis tools became feasible, with terrain analysis from digital elevation models (DEMs) becoming a standard preprocessing procedure for hydrological analyses, oftentimes directly integrated into GIS systems. The advent of personal computers was particularly relevant for the broad acceptance and spread of GIS technology, hydrological data analysis, and hydrological modeling. The increasing interest in integrating GIS and hydrology is evidenced by the number of dedicated international conferences since 1990. One of the pioneering conferences was the First International Conference on GIS and Environmental Modeling, held in Boulder, Colorado in 1991 (Maidment, 1993b), followed by the IAHS conferences on Applications of Geographic Information Systems (GIS) in Hydrology and Water Resources Management in 1993 and 1996 (Kovar et al., 1993, 1996). Today, conferences focusing on GIS and hydrology as well as dedicated journals on GIS and remote sensing applications in hydrology are well established (e.g., Journal of Spatial Hydrology) and indicate the strong interest in this field. While fewer than 10 papers per year were published on GIS and hydrology before 1990, this number increased to more than 180 papers in each of 2015 and 2016 (http://apps.webofknowledge.com, search for hydrology and GIS, accessed 9 January 2017). The growing number of books specifically addressing GIS and hydrology likewise shows the increasing relevance of this subject for science and application, as well as the active research done in this area (Chen et al., 2015; Dixon and Uddameri, 2016; Gurnell and Montgomery, 2000; Johnson, 2009; Singh and Fiorentino, 1996; Vieux, 2016).

Against this background, this article aims to:
(a) provide an overview of the fundamental GIS approaches used in hydrology;
(b) present the state of the art of integrating GIS and hydrology;
(c) provide a point of reference for hydrological data sources and GIS applications; and
(d) discuss future perspectives of GIS and hydrology, particularly with regard to decision support systems.

Since entire books are dedicated to the application of GIS techniques in hydrology and water resources, this article aims to provide a point of reference rather than a comprehensive account.

2.04.2 Data for Hydrology

Spatial data are the foundation of GIS and of spatial hydrology. By spatial data, we mean information that is associated with a geographic coordinate system. This section gives a short overview of data in hydrology, data structures, and data sources. Traditionally, spatial data were stored in the form of maps, so one basic approach to providing spatial datasets is to convert these analog media into a digital format: maps and photogrammetric data can be scanned, digitized, and referenced with spatial information. Existing digital datasets (e.g., from sensor systems) can be augmented with spatial information and imported into a GIS database; often, simple data format conversions are applied during the import process. Other approaches to providing spatial data involve ground-, aerial-, or satellite-based measurement systems (Johnson, 2009).

The advent of GPS (global positioning system) has proved to be a major advancement for collecting field data, allowing quick and simple tagging of measurements with spatial information from a handheld device. A GPS receiver calculates its position (latitude, longitude, and altitude) by trilateration using the signals of at least four dedicated satellites. The importance of GPS for civilian and military applications is evidenced by the fact that the first GPS system, called Navstar (Navigation System Using Timing And Ranging) and completed by the United States in the mid-1990s initially for military use only, was cleared for civilian use in 2000. The importance of GPS technology has led to the development of additional systems by other nations as well as the European Union: the Russian Global Navigation Satellite System (GLONASS), the European Union's Galileo positioning system, the Chinese BeiDou Navigation Satellite System, the Japanese Quasi-Zenith Satellite System, and the Indian Regional Navigation Satellite System (IRNSS, 2017). The horizontal accuracy of a handheld GPS receiver is typically about 5 m; accuracies down to a few millimeters are feasible with DGPS (differential GPS) systems, which use correction signals from fixed ground-based reference stations. More details on GPS systems are provided in another article of this book. Many modern measurement devices are already equipped with an on-board GPS receiver and store the spatial information in their metadata (data about data). The ubiquitous availability of GPS information in smartphones has opened a new realm of data sources for spatial hydrology through citizen science (Fienen and Lowry, 2012; Le Coz et al., 2016) and big-data approaches (Chen and Han, 2016).

Remote sensing is one of the key technologies for repetitive spatial measurements of hydrologically relevant data, providing data on all spheres of hydrology, from dedicated missions on atmospheric parameters (e.g., the Global Precipitation Measurement mission; GPM, 2017) to groundwater (e.g., the Gravity Recovery And Climate Experiment; GRACE, 2017), and from applications addressing water supply (e.g., water levels) to water use (e.g., irrigation). Remote sensing provides large amounts of spatial data with inherent spatial properties defined by the spatial resolution of a pixel and the spatial extent of the image, while the temporal resolution (repetition rate) of the sensing system allows observation of the temporal dynamics of hydrological processes or parameters. Typically, spatial resolution, image extent, and repetition rate are inversely proportional to each other. The increasing number of spaceborne systems provides a vast amount of data for hydrology, with information on a wide range of parameters (rainfall, soil moisture, topography, vegetation, etc.). Satellite instruments measure the reflectance of the earth's surface in the visible and infrared spectrum, with spectral configurations ranging from panchromatic (one band for the visible range, higher spatial resolution) through multispectral (several spectral bands, lower spatial resolution) to hyperspectral. While these measurements are limited to daytime observations, thermal measurements are also available at night. Surface reflectance measurements are particularly useful for deriving information on the type and status of the earth's surface (e.g., land use, leaf area); surface temperature measurements relate to energy fluxes at the land and water surface and are particularly useful for determining evapotranspiration and its spatial patterns (Bastiaanssen et al., 1998; Mauser and Schädlich, 1998; Schneider and Mauser, 1996). Starting in 1972, NASA's (National Aeronautics and Space Administration) LANDSAT mission has provided an extensive data source for many hydrological applications. Many other space agencies have added to these earth observation capabilities, namely the European Space Agency (ESA) with its SENTINEL 2 mission, the Centre national d'études spatiales (CNES) with SPOT, NASA with TERRA and Aqua, and the Indian Space Research Organisation (ISRO) with the IRS systems, to name just a few. Private enterprises such as DigitalGlobe (DigitalGlobe, 2017) also add to the plethora of available satellite data.

While optical and thermal remote sensing of the earth's surface is limited by daylight and/or cloud cover, radar systems provide a data quality that is particularly relevant for hydrology, since they allow observations of the earth's surface also under cloudy conditions. Two types of radar systems can be distinguished: (a) active systems, which carry an on-board illumination source and measure the backscatter intensity of a pulse transmitted by the satellite instrument and returned from the earth's surface, and (b) passive systems, which use the radiation emitted by the earth's surface. Active radar systems typically provide high spatial resolution but a low repetition rate, while passive systems usually have low spatial and high temporal resolution. Examples of active radar systems are the ERS, ENVISAT, and SENTINEL 1 (C-Band) satellites from ESA, RADARSAT 2 (C-Band) from the CSA (Canadian Space Agency), and ALOS 2 (L-Band) from JAXA (Japan Aerospace Exploration Agency); a passive system (radiometer) is, for instance, SMOS (L-Band) from ESA. NASA's SMAP (Soil Moisture Active Passive, L-Band) system combines passive and active instruments.
As the backscatter intensity measured by radar systems depends upon the dielectric constant of the surface, which is in turn a function of soil moisture, radar systems are particularly useful for surface soil moisture determination (Kornelsen and Coulibaly, 2013). However, surface roughness effects, vegetation effects, and system-inherent effects such as speckle make soil moisture retrievals difficult. The operating agencies provide different levels of data products, from raw data up to high-level products such as daily soil moisture maps merged from remote sensing information and modeling. Even though these satellite missions are multibillion-dollar projects, their products provide large amounts of GIS-ready data at lower cost than traditional methods. Moreover, their temporal repetition and spatial coverage cannot be achieved with any other measurement approach. Remote sensing techniques are essential to bridge the scale gap between the measurements and time series provided by local field studies and the spatial coverage needed for large-scale distributed information. Data fusion of field measurements, remote sensing measurements, and models is an essential task for applying GIS methods in hydrology and for providing the data needed for hydrologic analysis at the required scale, spatial resolution, and temporal recurrence rate. While for small research areas the challenge is to acquire data with suitably high spatial and temporal resolution, for large areas a special challenge is typically the homogenization of the dataset: data from different sources must be assembled into a consistent reference frame. A standard procedure of GIS software is to transform data with different projections into the required reference frame.

2.04.2.1 Data Structures

GIS data are typically differentiated into two basic categories, attribute data and spatial data. Attribute data characterize features by, for example, locations, names, or properties (e.g., capacities of dams and reservoirs, pumps, or turbines). It is also possible to manage time series in attribute databases, for example river flow rates or reservoir releases (Johnson, 2009). Spatial data include the location of a feature (for example, a gauging station, a stream, or a watershed) and can be represented by one of six fundamental data structures (Maidment, 1993b): three basic structures (point, line, polygon) and three derived structures (grid, triangulated irregular network (TIN), and network). The three basic structures of points, lines, and polygons are vector structures placing geographical features or objects precisely into a continuous map space, comparable to traditional hard-copy maps (Garbrecht et al., 2001). A key aspect of the vector data model is the inclusion of topological information (spatial relationships or connectivity of neighboring objects), allowing for the automated analysis and interpretation of spatial data in GIS systems (Meijerink, 1994). The simplest geographic feature in GIS is a point, containing spatial and attribute information and representing, in hydrology, for example gauging stations, wells, or point sources of contamination. In a simple network model, points can also be used to represent a node within a lumped system model without spatial dimensions. In this case the network model only describes the connectivity between different nodes and determines the order in which hydrologic operations (e.g., water flowing through the system) are performed (Maidment, 1993b). A line or arc feature, which begins and ends at a node, usually describes a river, a channel, or a pipe in hydrology. Flow and transport processes can be calculated by differential equations for segments of such a line, describing the motion of water and its constituents.


Polygons are sets of connected lines or arcs defining features such as the boundaries of watersheds or aquifers. In hydrology, these polygon features normally set the boundaries for grid or TIN structures that capture the spatial variability within these polygons (Maidment, 1993b). The resolution of vector data depends on the proximity of vertices, with closer vertices resulting in higher resolutions. Vector data is a very good choice for representing discrete objects with well-defined geometries, but it is not the best way to describe phenomena varying continuously over space such as temperature, precipitation, or elevation (Dixon and Uddameri, 2016). A grid partitions the spatial domain into a rectangular pattern of cells, with each cell storing the value of the parameter. Each grid cell is referenced by its coordinates (row and column number), with the grid itself being georeferenced in space. In a grid data structure a point is represented by a single grid cell with a point identifier, a line by connected cells forming a line with a line identifier, and an area by a group of adjacent cells with an area identifier. The resolution of a raster data model is determined by the cell size, with smaller cell sizes leading to higher resolutions. DEMs are sometimes stored as TINs, interconnected irregular triangles with points of known location and elevation at the vertices. Because triangles can be adapted in size, with smaller triangles in rapidly changing topography and larger triangles in smooth topography, fewer data points are needed to describe a surface with equal accuracy compared to a grid structure (Garbrecht et al., 2001). Martin et al. (2005) give examples of the misuse of data structures, such as the use of polygons with defined boundaries to represent the continuously varying distribution of soil properties, or the rasterization of soil group polygons, which distorts the original boundaries. The availability of GIS data in raster format from remote sensing applications and the simplicity of data processing have contributed to the popularity of the raster data format (Garbrecht et al., 2001). Johnson (2009) compiled advantages and disadvantages of both data structures. Advantages of vector data structures include a good representation of point, line, and polygon features, compactness of data storage, accurate graphics at all scales, and relational representation of objects. Advantages of the raster data format are its simple data structure, the ease of spatial analysis, and data availability. For a more comprehensive review of this topic, the reader is referred to the corresponding article of this book or to Burrough et al. (1998).
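To make the two categories concrete, the following minimal sketch (illustrative Python; the class and variable names are ours and do not come from any particular GIS package) represents hydrological vector features as geometry plus attribute data and a raster as a georeferenced grid:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class GaugingStation:      # point feature: one coordinate pair plus attributes
    x: float
    y: float
    attributes: dict = field(default_factory=dict)   # e.g., name, river km

@dataclass
class StreamSegment:       # line (arc) feature: vertex chain between two nodes
    vertices: list         # [(x, y), ...] ordered from upstream to downstream
    attributes: dict = field(default_factory=dict)   # e.g., roughness, width

@dataclass
class Watershed:           # polygon feature: closed boundary ring
    boundary: list         # [(x, y), ...], first vertex equals last vertex
    attributes: dict = field(default_factory=dict)

# Raster structure: a grid of cell values; the grid as a whole is georeferenced
# by the coordinates of its lower-left corner and the cell size.
elevation = np.zeros((200, 300))                 # 200 rows x 300 columns
xll, yll, cellsize = 350000.0, 5620000.0, 25.0   # lower-left corner, 25 m cells
```

Real GIS packages add topology, spatial indexing, and projection metadata on top of such bare structures; the sketch only illustrates the separation of geometry and attributes.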

2.04.2.2 Data Sources for Hydrology

2.04.2.2.1 Digital elevation models

The relief of the earth changes within a watershed, causing water to move from higher to lower elevations following the gradient of potential energy. Detailed information about topography is therefore important to delineate the watershed boundaries corresponding to a defined outlet point and to route water as runoff through the watershed. DEMs provide essential hydrologic data such as the course of the streams within a catchment, the connectivity of the river system, flow paths, slope, aspect, or curvature, to name just a few. Three structures are commonly used to represent topographic information: (a) grids, (b) TINs, and (c) contour lines. Square-grid data structures are the most widely used format of DEMs, because they are very simple and efficient to process. TIN-structured DEMs are more complex to process, but the size of the triangles can adapt to the complexity of the terrain, they can easily be up- and downscaled, and their accuracy does not depend on a fixed spatial resolution. Contour-based structures provide a better visualization of landscape features, but the description of the topography by the one-dimensional features of contour lines is more complex and requires additional data. All these data formats thus have their specific advantages and disadvantages. Nevertheless, the most widely used format for DEMs is the square-grid data structure (Garbrecht et al., 2001). This is due not only to the simplicity of the data format, but also to the data source, as DEMs are often derived from remote sensing data. It is important to recognize that DEMs are models of topography and that their accuracy depends not only on the terrain roughness, grid resolution, and the number, distribution, and accuracy of the elevation measurements, but also on the interpolation and resampling methods used to generate the final product (Dixon and Uddameri, 2016). The USGS (United States Geological Survey) provides several DEM datasets through its EROS data center (Earth Resources Observation and Science, USGS, 2017a): the Global Multiresolution Terrain Elevation Data 2010 (GMTED2010) contains seven raster elevation products (derived with different aggregation methods) from different global elevation data sources at spatial resolutions from 10 m for selected areas to 1 km. The National Elevation Dataset (NED, USGS, 2017b) is the primary dataset of the USGS, covering the United States, Alaska, Hawaii, and territorial islands with resolutions between 3 and 30 m. The NED was constructed using high-resolution methods for some areas, such as light detection and ranging (LIDAR) and interferometric synthetic aperture radar. LIDAR is an aircraft-mounted laser recording high-resolution elevation and location data. LIDAR (USGS, 2017c) measurements are suitable for highly detailed terrain mapping, but they are expensive and not available for all areas. On a national level, it is often possible to find data with higher resolutions. For example, for Germany the German Federal Agency for Cartography and Geodesy (Bundesanstalt für Kartographie und Geodäsie, 2017) provides DEMs with 10, 25, 50, 200, and 1000 m spatial resolution. Other global DEM datasets are a 90 m resolution dataset (30 m inside the USA) from the shuttle radar topography mission (SRTM, USGS, 2017d), covering 80% of the earth's surface between 60° north and 56° south, and the ASTER Global Digital Elevation Model Version 2 (GDEM V2, JPL, 2017) with a global resolution of 30 m.

2.04.2.2.2 Hydrography data

Hydrography data provide information on flowing water (streams, pipes, and channels), standing water (lakes and estuaries), and hydrologic boundaries (watersheds, drainage divides, and dams). Surface drainage and the channel network are important landscape attributes and can be determined by field surveys or by digitizing data from remote sensing or topographic maps (Johnson, 2009). It is also possible to use already digitized data or to derive these data from DEMs.


Digitized stream data are line elements, most often provided in vector format, with an accurate representation of the location, length, and connectivity of the stream network within the watershed (Garbrecht et al., 2001). Users might need to edit the data (add or remove stream segments) to match their needs regarding the required scale and resolution. There might be differences in the position of stream data from different sources (digitized from maps, field surveys, derived from DEMs) due to differences in acquisition time, differences in the resolution of the data, or methodological differences. A separate attribute table can provide stream attributes (e.g., channel geometry, roughness information) in addition to the physical location of the stream (Garbrecht et al., 2001). A complete description of a stream consists of a map view showing the location of the center line of the stream, a series of cross-section profiles whose locations are indicated on the map, and a longitudinal profile of the stream bed (Maidment, 1993b). With this information it is also possible to delineate floodplains and identify flood-risk zones, for example the limits of the 10-year flood, the 100-year flood, or other recurrence intervals (Maidment, 2002). For lakes and estuaries, the water depth is considered a critical parameter, because it influences the flow characteristics, mixing, and the temperature profile within the water body. Depending on the volume-to-flow ratio and on the morphology, the flow characteristics in some lakes and particularly in dammed-up rivers are comparable to very slowly flowing rivers with a large water depth (Maidment, 1993b). Other important data for hydrology are time series of stream flow and stream level. These data from stream gauges integrate the water-related processes of the catchment upstream of the measurement location. Gauges are important points of reference to calibrate and validate hydrological models. In a GIS, they are represented as point features. However, such a point feature might relate to a linear feature (e.g., a stream gauge on a river) or to a surface, as is the case for groundwater level gauges measuring the groundwater surface. Different agencies are involved in measuring and providing these different data sources. Major efforts have been and are being made to standardize and harmonize data from the different sources to establish a consistent data format and quality control methods. For the United States, surface waters are available from the USGS in the national hydrography dataset (NHD). The NHD includes datasets covering all streams and lakes at scales of 1:24,000 and 1:100,000, as well as ponds, canals, dams, stream gauges, and wells, in some areas even with higher resolutions. Watershed boundaries are provided in the Watershed Boundary Dataset (WBD, USGS, 2017e). The National Water Information System (NWIS, USGS, 2017f) is another web portal maintained by the USGS, providing streamflow information from historical data to near real-time data. NWIS also provides groundwater level data from thousands of wells and other water-related data from all over the United States. The Global Runoff Database maintained by the Global Runoff Data Centre (GRDC, Bundesanstalt für Gewässerkunde, 2017) provides a large collection of historical and current river discharge data collected at daily or monthly intervals from more than 9200 stations in 160 countries. Watershed boundaries for the gauging stations are also provided. The GRDC operates under the auspices of the World Meteorological Organization (WMO) and is hosted by the German Federal Institute of Hydrology (BfG).
The German Federal Agency for Cartography and Geodesy (Bundesanstalt für Kartographie und Geodäsie, 2017) provides GIS data layers on streams, rivers, canals, and lakes at different resolutions, while streamflow data are provided by the BfG.

2.04.2.2.3 Soil

Soils play a key role in hydrology, because they influence the partitioning of precipitation into infiltration and runoff and, subsequently, the movement of water in the soil and the groundwater recharge. Soil is also important for evaporation and, through its water-holding capacity for plant growth, for transpiration. Soil data are typically provided by soil-mapping agencies, which are often linked to the agricultural sector. Some of the datasets contain soil hydraulic properties. For other datasets, soil hydraulic properties must be derived from soil texture and, oftentimes, soil organic carbon contents by using pedotransfer functions (PTFs) (for examples of the use of PTFs see the "Pedotransfer Functions" section). The Harmonized World Soil Database (HWSD, current Version 1.21, IIASA, 2017) is a 30 arc-second raster database with over 16,000 different soil mapping units. It combines existing regional and national updates of soil information worldwide. The raster map is linked to attribute data providing information on selected soil parameters (organic carbon, pH, water storage capacity, soil depth, cation exchange capacity, clay fraction, total exchangeable nutrients, lime and gypsum contents, sodium exchange percentage, salinity, textural class, and granulometry). The European Soil Database (ESDB) is a harmonized database for Eurasia consisting of four databases. The spatial database is the Soil Geographical Database of Eurasia at scale 1:1,000,000 (SGDBE), connected to three attribute databases: the Pedotransfer Rules Database (PTRDB), the Soil Profile Analytical Database of Europe of measured parameters (SPADBE), and the Database of Hydraulic Properties of European Soils (HYPRES). More detailed soil information for a given region is typically available from federal agencies. In the United States, digital soil information is available from the Natural Resources Conservation Service (NRCS, USDA, 2017). It includes the State Soil Geographic database (STATSGO) with a mapping scale of 1:250,000 and the more detailed Soil Survey Geographic database (SSURGO) with mapping scales ranging from 1:12,000 to 1:63,360. For Germany, soil maps at scales of 1:200,000 and 1:1,000,000 are available from the Federal Institute for Geosciences and Natural Resources (BGR, Bundesanstalt für Geowissenschaften und Rohstoffe, 2017) in cooperation with the National Geological Surveys (SGDs) of the federal states. More detailed information at a scale of 1:50,000, sometimes 1:5000, is provided by the German federal states. Increasingly, soil data are available as web-based services with GIS functionalities. In some cases, spatial data and attribute databases are available for download.

2.04.2.2.4 Land use and vegetation

Land use and land cover influence important components of the hydrological cycle, such as runoff, infiltration, evapotranspiration, and groundwater recharge. Forest, for example, usually generates less surface runoff than agriculture, pasture, grassland, and urban areas. The temporal dynamics of land use and land cover change are also particularly relevant for hydrology. Forest, arable land, pasture, grassland, and urban areas all have their specific character with respect to the vegetation growth period and the temporal dynamics of vegetation turnover, vegetation parameters, and related hydrological fluxes.


Arable land often shows changes in vegetation type of up to three cycles or more per year, whereas grassland, pasture, and forest typically have a longer persistence of several years to decades. Thus, depending upon the task at hand, it might be necessary to acquire more than one land use map per year and/or to have information about land management procedures, especially crop rotation patterns and sowing, cutting, or harvesting dates. Land management characteristics, together with the size of the fields, determine to a large degree the spatial patterns as well as the temporal dynamics of water fluxes (Korres et al., 2013). Remote sensing provides the means to obtain detailed land use information over large areas. To establish a relationship between the remotely sensed signal and the surface parameter of interest, ground truth measurements are often required. The level of detail regarding the spatial resolution, the number of vegetation classes, or the accuracy of the vegetation parameters derived from remote sensing data can vary greatly. More information on land use classifications is provided in the corresponding article of this book. Time series of remote sensing observations can also be used to detect management parameters (e.g., cutting dates for grassland or sowing densities of crops) (Schneider, 2003). Spatially distributed vegetation indices such as the NDVI and the related LAI (leaf area index) derived from remote sensing are particularly relevant as input parameters for hydrological process models, because the evapotranspiration rates of plants largely depend upon the LAI. A particular challenge in using remote sensing data to update and validate model results is the saturation effect of the reflectance: at higher LAI or biomass values the remote sensing signal typically saturates. This saturation effect can be reduced by utilizing plant-specific combinations of narrow-band reflectance (Gnyp et al., 2014). Misinterpretation of remote sensing products used for model validation or as model inputs can be avoided by utilizing the reflectance or reflectance-based indices instead of biophysical parameters such as LAI or biomass. An example of such an approach is given in Fig. 1. This approach employs a GIS-based crop growth model, remote sensing measurements, and a radiative transfer model to adjust model parameters. Land use datasets are available for download from various sources. For the US, the Multiresolution Land Characteristics Consortium (MRLC) produced land use maps with 16 classes and a 30 m resolution for the years 1992, 2001, 2006, and 2011 (DOI, 2017). The CORINE (Coordination of Information on the Environment) land cover dataset from the European Union provides harmonized land use information for Europe with 44 land use classes and a minimum mapping width of 100 m. The first map was produced for 1990 and was updated in 2000, 2006, and 2012 (European Commission, 2017a). Even with these land use maps available for download, the specific requirements of hydrological applications may necessitate the production of customized land use maps.
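As an aside on the vegetation indices mentioned above, the NDVI is a simple per-pixel band combination; a minimal sketch (the band arrays and their scaling are assumptions of the example):

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Normalized Difference Vegetation Index, (NIR - Red) / (NIR + Red),
    computed per pixel from near-infrared and red surface reflectance;
    values near 1 indicate dense green vegetation. The saturation at high
    LAI/biomass discussed above also applies to such indices."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)  # eps avoids division by zero
```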

2.04.2.2.5 Climate and precipitation

Spatial climate and precipitation data are essential drivers for hydrological models. These data are typically provided by networks of measuring stations, by ground-based radar estimates, or by remote sensing estimates. Which data sources are suitable depends on the spatial and temporal scale and on the accuracy requirements of the analysis. Climate data are usually measured and collected by national agencies. For example, the German Weather Service (DWD) operates a dense network of climate stations, with approximately 130 stations run by DWD staff and over 1700 stations maintained by other personnel.


Fig. 1 Estimation of model parameters from a coupled remote sensing and modeling approach. Figure from Bach H, Mauser W, Schneider K (2003) The use of radiative transfer models for remote sensing data assimilation in crop growth models. In: Stafford J and Werner A (eds.) Precision Agriculture, pp. 35–40. The Netherlands: Wageningen Academic Publishers. http://dx.doi.org/10.3920/978-90-8686-514-7, with permission.


NOAA's (National Oceanic and Atmospheric Administration) National Centers for Environmental Information (NCEI, formerly NCDC) provide climate data for the United States from about 1600 locations in the Quality Controlled Local Climatological Data (QCLCD) and from a network of 10,000 volunteers in the Cooperative Observer Network (COOP). Climate data are often also measured by private companies, government agencies, or university researchers for individual projects, and it can be very elaborate and time-consuming to identify these local sources (Garbrecht et al., 2001). Employing interpolation techniques, spatial data can be derived from these point measurements (see the "Spatial Interpolation of Precipitation Data" section). Precipitation intensity, its spatial patterns, and its temporal dynamics may also be derived from precipitation radar. This technique is based on the relationship between radar reflectivity and rainfall intensity. While radar systems locate precipitation quite exactly, climate stations measure precipitation amounts more accurately than precipitation radar products. Thus, typically both measurement systems are combined to yield high-quality measurements with high spatial resolution. The NEXRAD (Next Generation Radar) network consists of 160 high-resolution Doppler weather radars in the United States operated by the NOAA National Weather Service (NWS), the Federal Aviation Administration (FAA), and the U.S. Air Force (USAF). Its Level III product is merged over all available stations and includes estimated ground-accumulated rainfall amounts for 1- and 3-h periods, storm totals, and digital arrays. The German Weather Service (DWD) operates a similar radar network with 17 radar stations for Germany. For global applications, satellite systems also provide precipitation information. One example is the Tropical Rainfall Measuring Mission (TRMM) satellite from NASA and the Japanese Aerospace Exploration Agency (JAXA). It flew from 1997 to 2015 and provided rainfall estimates in a 0.25° grid over the latitude band 50°N–50°S. Currently, the Global Precipitation Measurement mission, an international satellite mission, provides observations of rain and snow worldwide every 3 h (GPM, 2017). Other precipitation products are available from NCAR/UCAR through the global precipitation climatology project (GPCP), providing daily rainfall estimates in a 1° grid globally (NCAR and UCAR, 2017), or from reanalysis projects. The WATCH project of the European Union has produced several datasets useful for regional and global studies of water, for example the WATCH-Forcing-Data-ERA-Interim global reanalysis dataset, providing eight meteorological variables at 3-hourly time steps and as daily averages with a 0.5° resolution between 1979 and 2012 (Weedon et al., 2014).

2.04.2.3 Uncertainties and Errors in Hydrological Data

To derive information from data, knowledge about the errors and uncertainties of the data is critical: knowledge of input data uncertainties enables the interpretation of the results and gives them meaning. Already during the data collection process, measurement errors may occur, for instance due to malfunction or misuse of the instrument or device. GIS methods can help to identify errors, for instance by identifying outliers through the visualization of spatial or temporal data or by supporting the selection of suitable datasets for statistical analysis. For example, erroneous values from climate stations can be identified with cumulative sum analysis or other homogeneity tests by comparing the values of one station with the values of suitable neighboring stations. GIS data also allow for applying a moving-window filter on spatial data, using the proximity or distance relationships of the different pixels. Location errors must also be considered. In most cases, these errors arise from the conversion of analog to digital data, from conversions between raster and vector formats, or simply from misinterpreting the required input format. A general problem is the projection of objects from the 3D surface of the earth's geoid to the 2D map format. This process always leads to some distortion of directionality, distance, size, or shape. Maps that preserve some of these properties are called azimuthal maps (true directionality), equidistant maps (true distances), equivalent maps (true size), or conformal maps (true shape) (Dixon and Uddameri, 2016). Data may also contain topological errors, meaning errors in the relationships between objects within one layer or between different layers. Uncertainties in data can arise from the limits of accuracy and precision of the measurement method, from the interpolation of point data to obtain spatial data, or from generalization (resampling to coarser resolution) of the data. It is imperative to communicate these errors and uncertainties, for example within the metadata or inside the attribute table of a dataset, by listing the method of data collection with its uncertainties, the method of data conversion, and the method of derivation and transformation used to produce the final product. GIS systems usually provide several visual methods or cartographic techniques to represent and document uncertainties and to communicate them, also visually, to the user (Sui and Maggio, 1999).
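As an illustration of the moving-window filtering mentioned above, the following sketch flags cells that deviate strongly from their neighborhood median (a hypothetical plausibility check; window size and threshold are assumptions, and a production version would rather build on a library filter such as scipy.ndimage.median_filter):

```python
import numpy as np

def flag_local_outliers(grid, size=3, threshold=5.0):
    """Flag cells deviating from the median of their size x size window
    by more than `threshold` times the window's median absolute deviation."""
    pad = size // 2
    padded = np.pad(grid, pad, mode="edge")      # repeat edge values at borders
    flags = np.zeros(grid.shape, dtype=bool)
    for r in range(grid.shape[0]):
        for c in range(grid.shape[1]):
            window = padded[r:r + size, c:c + size]
            med = np.median(window)
            mad = np.median(np.abs(window - med)) + 1e-9  # robust spread
            flags[r, c] = abs(grid[r, c] - med) > threshold * mad
    return flags
```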

2.04.3 Standard GIS Methods for Hydrology

This section describes some of the fundamental GIS procedures used in hydrological applications. Flood analysis and runoff modeling are of primary importance for hydrology. The dynamics of water fluxes at the land surface are essential both for the management of water supply and for the control of flood or drought risks. Fundamental characteristics of a watershed concerning water storage and flow are determined by the terrain. Thus, the analysis of DEMs is of critical importance for hydrology, and terrain analysis and the determination of flow direction, flow accumulation, and drainage networks can be regarded as standard GIS-based analyses for hydrology. Deriving spatially distributed datasets from precipitation measurements, methods to determine the separation of precipitation into surface runoff and infiltration, and methods to derive the hydrological properties of the soil column are essential to understanding the pathways of water within a catchment.


2.04.3.1 Terrain Analysis

Topography has a major impact on hydrological, geomorphological, and biological processes within a catchment. The distribution of topographic attributes in a catchment is often correlated with the spatial variability of these processes and can be used as a proxy (Moore et al., 1991). Manual extraction of topographic information from topographic maps can be time-consuming, subjective, and error-prone (Garbrecht and Martz, 1995a). The availability of DEMs and of automated GIS methods in several software packages has established a user-friendly and quasi-standard option to extract or derive hydrographic data from topographic information, even for large watersheds (Jenson and Domingue, 1988). Topographic attributes can be classified as primary and secondary (or compound) attributes. Primary attributes can be directly derived from topographic data, whereas compound attributes involve combinations of primary attributes that describe or characterize the spatial variability of specific processes occurring in a landscape (Moore et al., 1991). Considering the hydrologic significance of some of the primary attributes, even simple attributes such as altitude or aspect (the directional orientation of the slope) have a large influence on the spatial variability of climate parameters in a catchment (e.g., temperature, precipitation, solar irradiation). Slope is the first derivative of elevation and describes the change of elevation over a certain distance. It influences overland and subsurface flow velocities and runoff rates, and erosion processes depend on it; slope is therefore one parameter in the universal soil loss equation, a widely used empirical model of erosion processes. Even with a basic attribute such as slope, it is important to understand the differences arising from the different standard calculation procedures implemented in GIS packages. Warren et al. (2004) pointed out that the standard method of computing the slope by a trigonometric function (change in elevation over a certain distance) is less accurate than using differential geometry (the magnitude of the tangent vector of the surface pointing in the direction of steepest slope). Surface curvature can be seen as the second derivative of elevation, as it describes the change of slope over a distance. The curvature along the maximum slope is termed profile curvature and influences parameters like flow acceleration, erosion, and deposition rates. The curvature perpendicular to the maximum slope is termed plan curvature and is connected to converging and diverging flow and to soil water content. Other examples of primary attributes are the upslope area, defined as the catchment area above a certain grid point, which contains information about the runoff volume and the runoff rate under steady-state conditions, or the flow path length, defined as the maximum distance of water flow to a point in the catchment, which influences erosion rates, sediment yield, and the time of concentration of the water flow (Moore et al., 1991). Two examples of secondary or compound attributes are the wetness index and the stream power index. Both are derived by arithmetic combination of primary attributes. The topographic wetness index was developed within the runoff model TOPMODEL by Beven and Kirkby (1979). This unitless steady-state index is a function of both the slope and the upstream contributing area (catchment area divided by the cell width in slope direction) and is useful as an indicator of soil water content and soil water drainage.
The stream power index is the product of catchment area and slope gradient and can be used to describe potential flow erosion.
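A minimal sketch of these attributes on a gridded DEM is given below (illustrative Python; it uses the common formulations TWI = ln(a / tan β) after Beven and Kirkby (1979) and SPI = a · tan β, where a is the specific catchment area and β the local slope; function and variable names are ours):

```python
import numpy as np

def slope_tangent(dem, cellsize):
    """Slope as rise over run (= tan(beta)) from central differences;
    np.gradient returns the derivatives along rows and columns."""
    dzdy, dzdx = np.gradient(dem, cellsize)
    return np.hypot(dzdx, dzdy)

def wetness_index(spec_area, tan_beta, eps=1e-6):
    """Topographic wetness index ln(a / tan(beta)); spec_area is the
    specific catchment area (upslope area per unit contour width)."""
    return np.log((spec_area + eps) / (tan_beta + eps))

def stream_power_index(spec_area, tan_beta):
    """Stream power index a * tan(beta), an indicator of potential flow erosion."""
    return spec_area * tan_beta
```

The specific catchment area itself comes from the flow direction and flow accumulation computations described in the following subsections.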

2.04.3.1.1 Flow direction

Methods to compute the aforementioned attributes or parameters and to segment the watershed are based on some form of overland flow simulation to define drainage courses and catchment areas (Martz and Garbrecht, 1992). This requires the direction of the overland flow to be determined. Regardless of the method used to calculate flow directions, closed depressions (pits, sinks) and flat areas in digital models of the land surface pose problems, as they do not provide an outlet for surface flow (Martz and Garbrecht, 1998). Thus, a standard preprocessing step before calculating the flow routing is the removal of closed depressions from the DEM. This processing step works under the general assumption that the closed depressions are spurious and are caused by the limited quality of the input data, interpolation errors during the computation of the DEM, truncation or rounding of interpolated values to lower precision, and/or averaging of elevation information within grid cells (Martz and Garbrecht, 1998, 1992; Tribe, 1992). As a consequence, natural depressions (e.g., in karstic landscapes) are also removed by raising their value to the level of the lowest grid cell at the rim of the depression (Vogt et al., 2003). This common approach to the removal of closed depressions is described, for example, by O'Callaghan and Mark (1984) and Jenson and Domingue (1988). If the depressions are hydrologically significant, the numerical filling of the depressions can be used as a method to determine the storage volumes of the sinks (Moore et al., 1991; Moore and Larson, 1979). Another simple method to address problematic features in a DEM regarding flow routing is the application of a smoothing filter. However, this reduces the overall information content of the DEM, because it uniformly applies smoothing to both problematic and unproblematic areas of the DEM (Tribe, 1992). Martz and Garbrecht (1998) combined the method of filling depressions arising from elevation underestimation with a breaching algorithm that eliminates or reduces depressions that can reasonably be expected to have resulted from elevation overestimation. They also proposed a method to determine the flow directions over flat areas by using information from the surrounding topography and allowing for flow convergence within flat areas. After preprocessing the DEM, a flow direction grid can be computed. Tarboton (1997) listed relevant issues regarding the evaluation and design of flow direction algorithms. The algorithm should (i) avoid or minimize dispersion of the flow, (ii) avoid grid biases (due to the orientation of the numerical grid), (iii) resolve the flow directions with a high precision, (iv) have a simple and efficient grid-based matrix storage structure, and (v) be robust, to cope with difficult structures in the DEM. The simplest flow direction method, widely used in many GIS systems and hydrological models, is the D8 method (O'Callaghan and Mark, 1984). It is a single flow direction method, which defines the flow from a central cell to one of its eight neighboring cells by using the steepest down-slope gradient. This method has a very coarse resolution of the flow direction because, as in all single flow direction methods, all the water flows into exactly one of the eight neighbors, resulting in a 45° angular resolution and introducing a grid bias (Tarboton, 1997). A minimal sketch of the D8 scheme is given below.
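The following sketch (illustrative Python, not taken from any GIS package; function and constant names are ours) assigns to each cell the index of its steepest downslope neighbor, assuming a depression-filled DEM:

```python
import numpy as np

# The eight neighbor offsets (row, col) and their distances in cell units.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
DIST = [2 ** 0.5, 1.0, 2 ** 0.5, 1.0, 1.0, 2 ** 0.5, 1.0, 2 ** 0.5]

def d8_flow_direction(dem, cellsize=1.0):
    """Return, per cell, the index (0-7) into OFFSETS of the steepest
    downslope neighbor, or -1 where no neighbor is lower (pits, edges).
    Assumes the DEM has already been depression-filled."""
    nrows, ncols = dem.shape
    fdir = np.full(dem.shape, -1, dtype=int)
    for r in range(nrows):
        for c in range(ncols):
            steepest = 0.0
            for k, (dr, dc) in enumerate(OFFSETS):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    slope = (dem[r, c] - dem[rr, cc]) / (DIST[k] * cellsize)
                    if slope > steepest:
                        steepest, fdir[r, c] = slope, k
    return fdir
```

Cells marked -1 are pits or edge outlets; in practice these are handled by the depression-filling and flat-routing procedures described above.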


Multiple flow direction methods route the water from the central cell to more than one of the neighboring cells using specified rules; however, they tend to produce flow dispersion, while single flow direction methods lead to flow concentration (Wainwright and Mulligan, 2013). Other commonly used algorithms (single and multiple flow direction) are the Rho-8 algorithm (Fairfield and Leymarie, 1991), the FD8 algorithm (Quinn et al., 1991), the D-Infinity algorithm (Tarboton, 1997), the kinematic routing algorithm (Lea, 1992), and the DEM network algorithm (DEMON) (Costa-Cabral and Burges, 1994). While multiple flow direction algorithms tend to give better results on convex hillslopes, found in the headwater regions of catchments, the single flow direction algorithm D8 provides a reasonable representation of flow patterns for convergent flow conditions, as usually considered in distributed watershed models (Freeman, 1991; Quinn et al., 1991), and is therefore widely used for DEM analysis (Garbrecht et al., 2001). Detailed comparisons of different flow routing algorithms and their influence on topographic and hydrologic attributes can be found in Moore (1996), Wilson et al. (2008), Wainwright and Mulligan (2013), and Tarboton (1997). Wilson et al. (2008) stated that the choice of the flow routing algorithm is of great importance because it affects the calculation of the upslope contributing area, the prediction of flow accumulation, and several other topographic and hydrologic attributes. However, to fully capture the characteristics of the flow paths of water through the landscape, the variability of factors other than topography must also be considered. For example, borders between different land cover types or management actions (e.g., cross-slope tillage) can have important effects on flow paths in small-scale studies (Mitas and Mitasova, 1998). Another problem is the temporal variability of the extent of the upslope contributing area, caused by the changing spatiotemporal distribution of soil water content (Walter et al., 2000). In many watersheds, for at least some parts of the year, many points receive contributions from only a small part of the total upslope contributing area. It is therefore important to be able to characterize the spatial variability of soil water content to generate meaningful hydrologic predictions at the catchment scale (Moore et al., 1993).

2.04.3.1.2 Drainage network extraction

From the flow direction grid a flow accumulation grid can be computed by calculating how many upstream cells contribute flow to each cell. This flow accumulation grid is subsequently used for the extraction of the drainage network. High values in the accumulation grid correspond to stream channels, while zero values correspond to catchment or watershed boundaries. The computation of the channel sources (the starting points of stream channels) is very critical when extracting the drainage network, because the drainage network can be extracted with an arbitrary drainage density or resolution depending on the definition of the channel sources (Tarboton et al., 1991). A common approach is the constant area threshold method (O'Callaghan and Mark, 1984): a grid cell is considered part of a channel if its contributing area is larger than a defined threshold. Setting the contributing area threshold to smaller values results in a higher drainage density. The contributing area can be computed by multiplying the flow accumulation value of each cell with the cell area. For the identification of an appropriate contributing area threshold, Tarboton et al. (1992) proposed to use the break value in a log–log plot of the local slope against the contributing area; this break value expresses the transition from convex hillslopes to concave valleys (Vogt et al., 2003). Vogt et al. (2003) extended this method by combining topographic attributes and environmental characteristics for the computation of the drainage network. Over extended areas, it might be necessary to apply a variable contributing area threshold to account for the spatial variation in drainage density in different parts of the catchment (Garbrecht and Martz, 1995b). Another slope-dependent critical support area method, described by Dietrich et al. (1993), is based on the assumption that the channel source represents an erosional threshold. When comparing a drainage network extracted from a DEM with information from maps or aerial photos, discrepancies in the positioning of channels can occur, particularly in low-relief landscapes (Garbrecht et al., 2001). A primary reason may be that the resolution of the DEM cannot capture the relevant topographic information, such as a levee. To account for this problem, previously digitized stream channels are used in a so-called "stream burning" method to artificially lower the elevation data of the DEM along these digitized channels (Lindsay, 2016; Maidment, 2002). This method can, however, introduce artifacts, because the stream channels may not be consistent with the digital topography of the DEM (Garbrecht et al., 2001). The grid cell size of the DEM determines the accuracy of the drainage network extraction: smaller channels or hillslope characteristics cannot be resolved sufficiently with coarse- or medium-resolution DEMs. Zhang and Montgomery (1994) analyzed the influence of different scales of gridded DEM data (2–90 m cell size) on topographic and hydrologic attributes and suggested a maximum grid size of 10 m. After the computation of the streams and channels, a drainage network topology has to be implemented to index and organize the drainage (or channel) network (Garbrecht et al., 2001). Stream sections between intersections have to be assigned unique numbers, and the order and flow directions of these sections have to be defined for flow routing. Catchments and subcatchments inside the watershed can then be computed.
Fig. 2 shows a result of an automated watershed delineation with the SWAT model (Arnold et al., 1998). The physical characteristics of a catchment can also be used to compute the geomorphological unit hydrograph (Rodríguez-Iturbe et al., 1979). Defining watersheds and stream networks in cities is more complicated than in rural areas, because water flows along curbs and drainage ditches with very small elevation gradients that empty into underground sewer systems (Maidment, 2002). All these processing steps, from DEM preprocessing, the computation of the flow direction and flow accumulation, and the drainage network extraction to the assignment of the drainage network topology, are often readily implemented in the hydrology toolsets of GIS software packages or can be computed with specialized software packages. An example of such a dedicated and widely used software package is TOPAZ (Topographic Parameterization), developed by Garbrecht and Martz (1995a).
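Building on the hypothetical d8_flow_direction sketch above (and reusing the OFFSETS table defined there), the following illustrative code computes flow accumulation by visiting cells from highest to lowest elevation and then applies the constant area threshold method; it is a minimal version valid only for a depression-free DEM:

```python
import numpy as np

def flow_accumulation(dem, fdir):
    """Number of upstream cells draining into each cell. Processing cells
    from highest to lowest elevation guarantees that every upstream cell is
    finished before its downstream neighbor (D8 flow is strictly downslope)."""
    acc = np.zeros(dem.shape, dtype=np.int64)
    for idx in np.argsort(dem, axis=None)[::-1]:        # highest cell first
        r, c = np.unravel_index(idx, dem.shape)
        k = fdir[r, c]
        if k >= 0:                                      # pass load downstream
            rr, cc = r + OFFSETS[k][0], c + OFFSETS[k][1]
            acc[rr, cc] += acc[r, c] + 1
    return acc

def extract_channels(acc, cellsize, area_threshold):
    """Constant area threshold method: a cell belongs to the channel
    network if its contributing area exceeds the threshold (both in m2)."""
    contributing_area = (acc + 1) * cellsize ** 2       # include the cell itself
    return contributing_area >= area_threshold
```

Lowering area_threshold yields a denser extracted network, which corresponds to the dependence of drainage density on the channel source definition discussed above.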

2.04.3.2 Spatial Interpolation of Precipitation Data

Distributed hydrologic analyses require spatially distributed input datasets: datasets generated from the DEM, and datasets for land use and its management, soil, and climate.


Fig. 2 Pleisbach catchment in Western Germany calculated for the Niederpleis gauge (catchment outlet). Delineated with the SWAT-watershed delineation tool (Arnold et al., 1998) from a 25 m resolution DEM and critical source area of 150 ha.

Precipitation data are among the most important inputs, because precipitation has a direct impact on runoff and discharge from the catchment (Obled et al., 1994). Precipitation data are traditionally collected at rain gauges or weather stations. For distributed analyses, for example as input into distributed hydrological models, interpolation to unrecorded locations (mostly onto a regular grid) becomes necessary. Besides gauge measurements, precipitation data are also available from ground-based or spaceborne radar systems. While networks of rain gauges or weather stations provide accurate but very localized measurements of rainfall, radars provide rain-rate estimates over large areas at high spatial and temporal resolution, but with lower accuracy (Jewell and Gaussiat, 2015). Rain gauges are often the only source of historical precipitation information. Thus, methods to derive spatially distributed data from point measurements remain important (Dingman, 2015), and some commonly used interpolation methods are introduced here. The simplest method to derive areal precipitation from point measurements is the arithmetic average of all stations. This approach assumes a spatially uniform distribution over the whole area of interest. Only in special cases does this method provide a sufficient representation of the precipitation patterns at any given time, but it might be useful particularly in flat terrain without significant spatial precipitation heterogeneities. The Thiessen polygon method, also known as Voronoi polygons or Dirichlet tessellation, assigns the precipitation value of the nearest station to every grid cell. This method requires the construction of the Thiessen polygon network, with one gauging station at the center of each polygon. It takes the distribution of the stations into account and is an objective method to estimate a spatial average when the distribution of the stations is irregular. While it is computationally very efficient as long as the measurement network remains unchanged, it does not provide a realistic model of the actual spatial variability of precipitation (Dingman, 2015). Another method is to fit a trend surface to the measured values by minimizing the differences between a multiple regression function and the measurements. The simplest trend surface is a plane fitted to the data; higher-order polynomials give increasingly irregular surfaces. A trend surface is a smoothing method, except if the number of terms in the polynomial is equal to or higher than the number of stations; thus, the fitted surface does not generally pass exactly through the measured values (Tabios and Salas, 1985). In the inverse distance weighting (IDW) method the interpolation is based on the distance between the grid point and each station. The value is computed as the weighted average of all stations, with the weight of each station being inversely proportional to its distance from the grid point. This weighting scheme is based on the idea that measurements that are closer together are more alike and should therefore receive a larger weight than those further apart (Goovaerts, 2000). There are several modifications of the IDW method. The first is to change the power of the distance weight from one to some higher value in order to increase the decay of the weights with distance; most frequently, an exponent of two is used. The second group of modifications uses only a subset of stations for the interpolation, thus changing the method from a global to a local interpolation method.
The subset is typically defined either by a certain maximum distance of the measurement locations from the target location or by choosing a certain minimum number of measurement locations to be used for the interpolation. For the latter, the "quadrant + two" method is often used, which takes six measurement locations into account: one for each quadrant around the target location plus the two additional stations closest to the target location.
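A minimal sketch of IDW interpolation (illustrative Python; the power-of-two weighting and the optional search radius correspond to the modifications described above, and the function name is ours):

```python
import numpy as np

def idw(xy_obs, values, xy_targets, power=2.0, max_dist=None):
    """Inverse distance weighted interpolation of station values.
    xy_obs: (n, 2) station coordinates; values: (n,) measurements;
    xy_targets: (m, 2) grid point coordinates; max_dist switches from
    the global to a local (limited search radius) variant."""
    out = np.empty(len(xy_targets))
    for i, p in enumerate(xy_targets):
        d = np.hypot(xy_obs[:, 0] - p[0], xy_obs[:, 1] - p[1])
        if np.any(d == 0.0):
            out[i] = values[np.argmin(d)]   # exact hit: reproduce measurement
            continue
        w = 1.0 / d ** power
        if max_dist is not None:
            w[d > max_dist] = 0.0           # ignore stations beyond the radius
        s = np.sum(w)
        out[i] = np.sum(w * values) / s if s > 0 else np.nan  # no station in radius
    return out
```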


The IDW method reproduces the measurements exactly. A geostatistical method similar to inverse distance weighting is kriging (Matheron, 1971); instead of using the Euclidian distance to define the interpolation weights, geostatistics uses the semivariogram. Geostatistical methods are based on the assumption of spatial autocorrelation, expressed in Tobler's first law of geography: "Everything is related to everything else, but near things are more related than distant things" (Tobler, 1970). The semivariogram describes the dissimilarity between observations as a function of the space or time lag between them. The empirical (or experimental) semivariogram (Cressie, 1991) is half the averaged squared difference between the components of data pairs grouped by distance classes (called lags). It is a function of both distance and direction and can therefore be computed omnidirectionally (isotropic) or direction-dependent (anisotropic) (Goovaerts, 2000). A variogram model (for example a linear, spherical, exponential, or Gaussian model), also called a theoretical variogram, can then be fitted to the calculated semivariances. These models provide the parameters "nugget" (the variance at zero distance, associated with the measurement error), "range" (the distance beyond which the data are no longer autocorrelated), and "sill" (the variance limit reached at large lag distances). Kriging uses the spatial autocorrelation structure of the data, characterized by the variogram model, to calculate the interpolation weights of each station. There are different types of kriging: simple kriging is used when the mean of the variable is known and constant, ordinary kriging where the mean is constant but unknown (one of the most used and most robust methods), and universal kriging where there is known to be a drift in the mean of the data (no first-order stationarity) (Webster and Oliver, 2007). Stationarity means that the estimated value is determined only by the distances to the observations and not by the spatial location itself (no correlation with the coordinates). In addition to these point kriging methods, block kriging can be used to generate smoother surfaces by estimating an average over an area around an unsampled point rather than the exact value at that point. Another advantage of using kriging for interpolation is the automatic computation of an uncertainty estimate for the interpolated values. All these methods are univariate interpolation methods, using only data of the variable of interest (in our case precipitation). The spatial patterns of the interpolated precipitation then depend on the interpolation method and the inherent variability of the measurements, ignoring cause-and-effect relationships. Precipitation patterns, for instance, significantly depend upon topography. Thus, it is useful to integrate additional parameters that correlate with precipitation and are easier to measure into the interpolation process; for precipitation, the combination with elevation data enhances the accuracy of the interpolation results (Goovaerts, 2000). The regression-based IDW approach first calculates a regression equation between precipitation and elevation. For each measurement location, the residual of the regression is calculated. The interpolated value is then computed as the sum of the precipitation derived from the regression equation and the residuals interpolated with the IDW method (Mauser and Schädlich, 1998).
This technique reproduces the measurements exactly, it contains a cause-and-effect relationship explaining the patterns, and it retains the simplicity of the IDW method. High-resolution ground-based radar data are also used as a secondary variable, and temporally aggregated rainfall patterns from satellite-based systems (e.g., TRMM data) can be helpful as well, particularly in regions with a sparse measurement network (Wagner et al., 2012). Secondary variables can also be included through multivariate regression or kriging methods (e.g., ordinary cokriging, kriging with an external drift, or regression kriging). A detailed overview of geostatistical theory and methods can be found in Cressie (1991), Goovaerts (1997), or Webster and Oliver (2007). In general, geostatistical methods using secondary variables produce smaller prediction errors than univariate methods (Goovaerts, 2000). Jewell and Gaussiat (2015) showed that external drift kriging is the preferred method for combining rain gauge and radar data. The significance of the spatial precipitation patterns produced by different methods for runoff estimation depends on several factors, such as the size of the watershed or the character of the precipitation event. For instance, the sensitivity to spatial precipitation data is enhanced in urbanized catchments, for small-scale convective events, or in mountainous regions, while it is dampened in larger catchments (Segond et al., 2007). Many of these interpolation methods are readily implemented in the toolsets of GIS software packages or are integrated into hydrological models. Li and Heap (2014) provide a review of spatial interpolation methods applied in the environmental sciences. Specialized software packages for precipitation interpolation, such as PRISM (Parameter-elevation Relationships on Independent Slopes Model), are also available (Daly et al., 1994).
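The regression-based IDW approach can be sketched as follows (illustrative Python reusing the hypothetical idw function from the previous sketch; a simple linear precipitation-elevation regression is assumed):

```python
import numpy as np

def regression_idw(xy_obs, z_obs, p_obs, xy_grid, z_grid, power=2.0):
    """Interpolate precipitation with an elevation trend: fit a linear
    regression P = a*z + b to the stations, interpolate the station
    residuals with IDW, and add trend and residual fields on the grid."""
    a, b = np.polyfit(z_obs, p_obs, deg=1)   # precipitation-elevation regression
    residuals = p_obs - (a * z_obs + b)      # station-wise regression residuals
    trend = a * z_grid + b                   # regression part on the grid
    return trend + idw(xy_obs, residuals, xy_grid, power=power)
```

Because the residuals vanish at the stations, the combined field reproduces the measurements exactly, as stated above.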

2.04.3.3 NRCS Curve Number

One of the most commonly used methods to estimate the volume of surface runoff for a given rainfall event is the Natural Resources Conservation Service Curve Number (NRCS-CN) method, formerly known as the Soil Conservation Service Curve Number (SCS-CN) method. The NRCS curve number is an empirical number depending on the ability of soils to infiltrate water, on land use, and on the soil water conditions at the beginning of a rainfall event. It is a dimensionless number ranging between 30 and 100 that indicates the runoff potential of a surface (high values correspond to high runoff potential). To account for the infiltration characteristics of soils, the NRCS method distinguishes four hydrologic soil groups: soils with high, medium, low, and very low infiltration rates (A, B, C, D), corresponding to low runoff potential (A) and high runoff potential (D). Assignment to one of these four hydrologic soil groups should be based on infiltration measurements; typically, however, soil texture information derived from soil maps is used. Curve numbers for many different land cover types are tabulated for the four hydrologic soil groups. Runoff is also affected by the soil moisture conditions at the beginning of the rainfall event (antecedent moisture condition, AMC). Therefore, the curve number for normal conditions (AMC II) can be adjusted to dry conditions (AMC I) or wet conditions (AMC III) using tabulated factors that are multiplied with the curve number. If soil moisture measurements are not available, the AMC can be estimated from the season and the 5-day antecedent precipitation. A complete description of the method can be found in the National Engineering Handbook Hydrology of the USDA (2004). Enhancements of this method can be found in the literature; changes in the integration of the AMC in particular lead to improved runoff predictions (Michel et al., 2005; Mishra et al., 2004).


This method is implemented in many hydrological models. Although it is per se not a specific GIS method, the NRCS-CN method lends itself very well to use in GIS, as GIS provides the tools to compute spatially distributed curve number maps.
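As a worked example of the method, the standard curve number runoff equation in SI units, with the conventional initial abstraction Ia = 0.2 S (a minimal sketch without the AMC adjustment described above):

```python
def scs_runoff_mm(precip_mm, cn):
    """Direct runoff depth Q after the NRCS-CN method:
    S = 25400/CN - 254 is the potential maximum retention (mm),
    Q = (P - 0.2*S)**2 / (P + 0.8*S) for P > 0.2*S, else Q = 0."""
    s = 25400.0 / cn - 254.0
    ia = 0.2 * s                       # initial abstraction
    if precip_mm <= ia:
        return 0.0
    return (precip_mm - ia) ** 2 / (precip_mm - ia + s)

# Example: a 50 mm storm on a surface with CN = 80 gives
# S = 63.5 mm, Ia = 12.7 mm, and Q of roughly 13.8 mm of direct runoff.
```

Applied cell by cell to a curve number map, this yields the spatially distributed runoff estimate mentioned above.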

2.04.3.4 Pedotransfer Functions

Soil hydraulic parameters are crucial input parameters for any hydrological modeling study, but they are often not available from soil maps, and the direct measurement of soil hydraulic parameters is very time-consuming and costly. Consequently, PTFs are typically used to predict hydraulic parameters from already available soil properties such as soil texture information, bulk density, and organic carbon content (Wösten et al., 2001). Most PTFs are developed to predict soil hydraulic properties such as hydraulic conductivity or water retention, but PTFs to estimate soil chemical, biological, and mechanical properties have also been developed (McBratney et al., 2002). The two main methods to derive PTFs are statistical regression models (linear and nonlinear) and data mining/data exploration techniques (e.g., artificial neural networks, regression trees) (Vereecken et al., 2010). Their development is based on large soil databases from which the empirical relationships linking basic soil properties to hydraulic properties are derived. Based on the type of data used to derive the PTF, Wösten et al. (1995) distinguish between class PTFs (based, for example, on soil texture classes) and continuous PTFs (based, for example, on the particle size distribution or on clay content and bulk density). The most important properties described by PTFs are water retention curves (also called moisture-characteristic curves, the relation between pressure head and water content) and hydraulic conductivity curves (the relation between hydraulic conductivity and water content). Point estimation PTFs predict water retention at defined water potentials (for example, at field capacity or at the permanent wilting point, to predict the available water content) or the saturated hydraulic conductivity (Ksat); examples can be found in Cosby et al. (1984) or Saxton et al. (1986). With the development of analytical approximations of water retention curves and hydraulic conductivity curves by Brooks and Corey (1964), Campbell (1974), and van Genuchten (1980), so-called parametric PTFs can be used to estimate the parameters of these models (Rawls et al., 1982; Schaap et al., 1998; Vereecken et al., 2010). The parametric approach is usually preferable, because it yields a continuous function of water retention and hydraulic conductivity and can therefore be used directly in soil-water transport models (Minasny et al., 1999). Because PTFs are based on empirical relationships, the choice of a particular PTF should be based on the similarity of the soils used to derive the PTF to the soils in the given area of investigation; extrapolation to other soils is not recommended (Wösten et al., 2001). As spatial information on soil hydraulic parameters is essential for hydrologic modeling, GIS systems are particularly suitable to derive these spatially distributed parameters and to analyze them in their spatial context. Romano and Chirico (2004), for instance, discussed the role of terrain analysis in using and developing PTFs.
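For illustration, the van Genuchten (1980) water retention model mentioned above, whose parameters (theta_r, theta_s, alpha, n) are exactly what parametric PTFs estimate; a minimal sketch:

```python
import numpy as np

def van_genuchten_theta(h, theta_r, theta_s, alpha, n):
    """Water retention curve of van Genuchten (1980):
    theta(h) = theta_r + (theta_s - theta_r) * (1 + (alpha*|h|)**n)**(-m),
    with m = 1 - 1/n; h is the pressure head (negative when unsaturated),
    alpha a scale parameter (1/length), and n a shape parameter (> 1)."""
    m = 1.0 - 1.0 / n
    h = np.asarray(h, dtype=float)
    se = (1.0 + (alpha * np.abs(h)) ** n) ** (-m)   # effective saturation
    theta = theta_r + (theta_s - theta_r) * se
    return np.where(h >= 0.0, theta_s, theta)       # saturated at h >= 0
```

Fed with PTF-estimated parameter maps, such a function yields spatially distributed retention curves for every grid cell.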

2.04.4 GIS and Hydrological Models

Hydraulic measurements and experiments flourished during the 18th and 19th centuries, but quantitative hydrology was still immature at the beginning of the 20th century (Chow et al., 1988). Gradually, hydrologists replaced empirical relationships with rational analysis of hydrological problems. Chow et al. (1988) listed important examples of this development toward process understanding, such as (i) the development of a physically based model for infiltration (Green and Ampt, 1911), (ii) the frequency analysis of flood peaks and water storage requirements (Hazen, 1914), (iii) the unit hydrograph method to transform effective rainfall into direct runoff (Sherman, 1932), (iv) the infiltration theory (Horton, 1933), (v) the extreme value law for hydrologic studies (Gumbel, 1941), (vi) the description of drainage basin form (Horton, 1945), and (vii) the rescaled range time series analysis (Hurst, 1951). With the advent of computers, more complex models have been developed and applied at ever larger scales. GIS technology helps to provide, organize, manipulate, and communicate large quantities of spatial data for complex hydrological models.

2.04.4.1 Integrating GIS With Hydrological Models

Four different approaches are widely used to integrate GIS with hydrological models (Sui and Maggio, 1999):

(a) Embedding GIS in hydrological modeling: In this approach, GIS data structures are implemented in hydrological models and are tailored to the requirements of the particular model. This gives the model developers the most flexibility for system design, but the programming effort for these GIS packages is usually considerable, and their data management and visualization capabilities are most often inferior to those of commercial GIS software packages (Sui and Maggio, 1999).

(b) Embedding hydrological modeling in GIS: Commercial GIS software vendors have implemented GIS modules that can be used for certain hydrological modeling needs. The main functionality oftentimes focuses on the preparatory processing steps needed for hydrological analyses (e.g., DEM processing, flow paths, watershed delineation) rather than on implementing full hydrological modeling capabilities.

(c) Loose coupling of GIS and hydrologic models: This is probably the most common approach to integrate GIS and hydrologic modeling (Hartkamp et al., 1999). Standard GIS packages are used for preprocessing the data, which are then transferred in a suitable format to the hydrological model. The GIS takes care of issues regarding scale, coordinate system, data structure, and format conversion to produce the required input for the hydrological model. The modeling procedure is conducted independently from the GIS, and the results are imported into the GIS package for postprocessing, visualization, analysis, and report generation. The great advantage of this approach is that specialized personnel on different computer systems can handle the two tasks, GIS processing and hydrological modeling, independently (Dixon and Uddameri, 2016). This can be very effective, but a precise definition of the data requirements, specifications of the data format, and communication of the uncertainties of the transferred data sets are absolute prerequisites for such cooperative work (a minimal sketch of this pattern follows below).

(d) Tight coupling of GIS and hydrologic models: With the tight coupling approach, the hydrological model is fully embedded within the GIS package and uses the same database, so the user is not forced to leave the GIS software environment to run the model. Already implemented code or additional functionality programmed with the built-in scripting languages of the GIS software can be used to code the required model (Dixon and Uddameri, 2016).

Sui and Maggio (1999) provide a review of the practices, problems, and prospects of hydrological modeling based on GIS and discuss a large number of studies using these four general approaches or combinations thereof.
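As a minimal sketch of the loose-coupling pattern described under (c), assuming a hypothetical hydrological model that exchanges plain CSV files (all file names, column names, and values are illustrative):

    import csv

    # GIS side: export preprocessed subcatchment parameters for the model
    subcatchments = [
        {"id": 1, "area_km2": 12.4, "mean_cn": 78.0, "mean_slope": 0.06},
        {"id": 2, "area_km2": 8.1, "mean_cn": 85.0, "mean_slope": 0.11},
    ]
    with open("model_input.csv", "w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=["id", "area_km2", "mean_cn", "mean_slope"])
        w.writeheader()
        w.writerows(subcatchments)

    # The hydrological model runs independently on model_input.csv and writes
    # model_output.csv; a dummy result stands in for that external run here.
    with open("model_output.csv", "w", newline="") as f:
        f.write("id,runoff_mm\n1,14.2\n2,22.7\n")

    # GIS side: import simulated runoff for joining back to the subcatchment
    # layer for postprocessing, visualization, and report generation
    with open("model_output.csv") as f:
        runoff = {int(r["id"]): float(r["runoff_mm"]) for r in csv.DictReader(f)}
    print(runoff)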

2.04.4.2 Taxonomy of Hydrological Models

Hydrological models can be distinguished by their concept of handling randomness (deterministic or stochastic), spatial variation (lumped or distributed, space-independent or space-dependent), and time variation (steady-flow or unsteady-flow, time-independent or time-correlated). This taxonomy of hydrological models was offered by Chow et al. (1988). Consequently, five sources of variation can be considered in a hydrologic model: randomness, three space dimensions, and time (Maidment, 1993b). A stochastic model has probabilistic or random variables with no fixed values at a particular point in time and space. These variables are described by probability distributions (Chow et al., 1988); for example, the occurrence and intensity of precipitation during rainfall events can be highly variable in space and time and can be represented by a random field (a probability distribution of a variable at every point in time and space). Depending on whether or not these random variables influence each other in the space or time domain, they can be classified as space-independent or space-dependent, and time-independent or time-correlated, respectively. For example, during a rainfall event, precipitation occurring in adjacent cells or time steps is connected; as a result, this variable is categorized as space-dependent and time-correlated. A deterministic model does not consider randomness and produces a single value for the variable (Chow et al., 1988). When temporal variability in flow rates is considered, a deterministic model is classified as unsteady-flow, otherwise as steady-flow (Chow et al., 1988).

Concerning the spatial resolution, hydrological systems are conceptualized as lumped, semidistributed, or fully distributed models (Fig. 3). In lumped models, parameters are spatially averaged, sometimes to single values without any space dimensions (e.g., uniform precipitation input for a whole catchment). A distributed model accounts for the spatial variability of the parameters by discretizing the spatial domain into smaller subunits (typically in a uniform grid) and by performing its calculations on these subunits; the results can then be accumulated to analyze the whole catchment (Pullar and Springer, 2000). Semidistributed models typically account for spatial variability at the level of subcatchments and ignore the spatial variability within them. Thus, they are lumped at the subcatchment scale and provide spatial information at the catchment scale. Distributed models require a large amount of data; particularly in this regard, GIS makes a big contribution to hydrological modeling by facilitating and solidifying the treatment of spatial variation (Maidment, 1993b). Empirical models (based on the observed pattern of a variable) tend to be lumped, conceptual models (conceptual descriptions of a catchment, with no emphasis on physics) tend to be semidistributed, and physically based models tend to be fully distributed (Wainwright and Mulligan, 2013).

An example of a very popular lumped approach to describe a hydrological system is the unit hydrograph (Sherman, 1932), which calculates the direct runoff hydrograph from effective rainfall on the watershed.
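Numerically, the unit hydrograph idea, taken up again below, reduces to a discrete convolution of the effective rainfall series with the unit hydrograph ordinates. A minimal sketch with purely illustrative numbers:

    import numpy as np

    # Unit hydrograph ordinates: direct runoff response (m3/s per mm of
    # effective rainfall) in successive time steps; values are illustrative.
    uh = np.array([0.1, 0.4, 0.3, 0.15, 0.05])
    effective_rain_mm = np.array([0.0, 5.0, 12.0, 3.0])  # per time step

    # Direct runoff hydrograph = convolution of effective rainfall and the UH,
    # relying on the linearity and superposition assumptions of the method
    direct_runoff = np.convolve(effective_rain_mm, uh)
    print(direct_runoff)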

Fig. 3 Spatial concepts of hydrological models: lumped model, semidistributed model, and fully distributed model.


Fig. 4 Definition of hydrological response units (HRUs) as areas with unique hydrological properties, here derived from land use properties, soil properties, and slope classes.

The system response function is the watershed response to a unit volume of rainfall of constant intensity distributed uniformly over the watershed. The effective rainfall is the amount of precipitation leading to a direct runoff response at the gauge. Details on the unit hydrograph approach can be found in Ramirez (2000).

A typical example of a semidistributed conceptualization of the hydrologic space is the widely used concept of hydrological response units (HRUs), proposed by Flügel (1995). HRUs are defined as areas with homogeneous hydrological properties (see Fig. 4); thus, each HRU can be described with a unique set of hydrological parameters. HRUs are derived from natural properties of the land surface and are delineated as areas with homogeneous soil, land use, and topography; sometimes geological properties are also taken into account. Climate properties within an HRU are also assumed to be constant. The HRU concept lends itself very well to GIS analysis, as HRUs can be constructed with GIS techniques such as overlay or merge functions. Examples of this approach are models such as PRMS (Flügel, 1995), J2000 (Krause, 2002), SWAT (Arnold et al., 1998; Arnold and Fohrer, 2005), and PREVAH (Viviroli et al., 2009). Based upon the previously described topographic analysis, subwatersheds are defined which contain a set of HRUs, and a topology of subwatersheds is needed to describe the flow path to the outlet. Within a given subwatershed, spatial dependencies are typically ignored, as are man-made structures such as reservoirs. Man-made features, however, may affect hydrological processes significantly: today's drainage networks are often impacted by man-made structures, which may not be represented well by DEMs. As a result, some models (e.g., Geo-MHYDAS) provide tools to address the effect of landscape features, such as man-made structures that are not necessarily ordered along the slope, upon the water flow paths (Lagacherie et al., 2010). However, increasing the complexity of the HRU representation does not necessarily increase accuracy or performance. In fact, overlaying a large number of information layers may strongly increase the number of HRUs and produce rather unrealistic forms, features, and topologies. Sanzana et al. (2013) proposed and tested various mesh generation tools to improve the HRU representation in distributed hydrological models.

Fully distributed hydrological models are often based upon a grid-based discretization of space. The simple data format, the inherent neighborhood relationships, the topology defined by the DEMs, and the availability of gridded input data derived from remote sensing or models (e.g., general circulation models) are some of the main reasons for the popularity of this discretization concept. Examples of models built on this raster-based approach are plentiful, such as PROMET (Mauser and Schädlich, 1998; Schneider, 2003), ANSWERS (Beasley et al., 1980), SHE (Abbott et al., 1986a, 1986b), TOPMODEL (Beven and Kirkby, 1979), and AGNPS (Young et al., 1989). While the simplicity of the data structure is an asset, the pixel resolution and shape do not necessarily correspond to landscape features such as land use boundaries, soil units, or agricultural fields. Particularly for grids with coarse spatial resolution, subscale heterogeneity within a pixel is a major problem, as each pixel is typically represented with a unique parameter set.
To circumvent this problem, the geo-complex approach of utilizing subscale classes within each raster cell (Fig. 5) combines the advantages of HRUs with the raster-based approach (Mauser and Bach, 2009).
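A minimal sketch of the aggregation step behind the geo-complex idea: tabulate, for each coarse cell, the areal fractions of the classes found in the underlying high-resolution grid (grid sizes and class codes are illustrative):

    import numpy as np

    def subscale_fractions(fine_classes, block):
        """Fraction of each class within non-overlapping block x block windows
        of a high-resolution class grid (e.g., HRU-like classes from a
        landuse/soil/slope overlay)."""
        ny, nx = fine_classes.shape
        coarse = {}
        for j in range(0, ny, block):
            for i in range(0, nx, block):
                window = fine_classes[j:j + block, i:i + block]
                classes, counts = np.unique(window, return_counts=True)
                coarse[(j // block, i // block)] = dict(
                    zip(classes.tolist(), (counts / window.size).round(3)))
        return coarse

    # Example: a 4x4 fine grid (e.g., 30 m) aggregated to 2x2 coarse blocks
    fine = np.array([[1, 1, 2, 2],
                     [1, 3, 2, 2],
                     [4, 4, 2, 3],
                     [4, 4, 3, 3]])
    print(subscale_fractions(fine, block=2))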

Fig. 5 Geocomplex concept as a hybrid between HRUs and raster-based approaches: a high spatial resolution grid (e.g., 30 m) of land use, soil, slope, aspect, and elevation is converted into subscale class fractions (e.g., Class 1: 3%, Class 2: 24%, Class 3: 33%, Class 4: 40%) within each cell of a low spatial resolution grid (e.g., 1000 m).

2.04.5 State and Perspectives of GIS Applications

There are many ways to organize and structure the application categories of GIS and hydrology:

(a) According to discipline: Surface hydrology, groundwater hydrology, water resources management, waste and storm water management, flood plain management, water quality analysis, water resources monitoring and forecasting, and engineering all have their particular modes of GIS applications. Entire books have been written to address GIS applications for hydrological and water resource systems modeling, urban facilities management, and decision support. Johnson and LaBadie (2008) provide a comprehensive overview of the various GIS applications in the different hydrological subdisciplines.

(b) According to principal use: Hydrological inventory, monitoring of the hydrological status, planning and design of infrastructure, forecasting of hydrological processes, and early warning systems each require a specific subset of GIS methods.

(c) According to principal user: The use of GIS technologies has evolved rapidly from the realm of experts, engineers, and scientists to a widespread public. GIS applications in hydrology are particularly important (i) in administration, private enterprises, and industry, (ii) in science, and (iii) for public use, particularly as web-based applications.

The immediate relevance of water as a fundamental necessity of human livelihood, its inherent risk potential, and the spatiotemporal nature of hydrological processes result in a large diversity of GIS applications in hydrology. With the rapid development of computer technology, internet applications, and communication technology (especially smartphones), new opportunities and challenges have emerged to collect, analyze, and communicate up-to-date spatial information to stakeholders. Spatial information is no longer static as in printed maps, but rather a continuum of time and space, which is particularly essential when addressing highly dynamic processes such as water fluxes. Moreover, today's communication is not merely the unilateral provision of information from the expert to the user. Stakeholder participation is essential, particularly when it comes to the basic needs of human livelihood. Thus, interaction and feedback with the stakeholders are needed to provide timely, updated, and relevant information and to foster appropriate actions. This results in a paradigm shift from the stakeholders' earlier role as recipients of information to a new role in which the stakeholders are an integral part of the information generation, dissemination, and use continuum. Professionals in administration, industry, and science have used and developed hydrological GIS applications for a long time. However, the rapid development of information and communication technology (ICT) has had a significant bearing on the mode of decision making and communication, which requires the inclusion of all stakeholders. ICT not only facilitates fast dissemination of information, it has also led to the need for a more inclusive and participatory approach. Today, stakeholders not only need to be informed, they need to be an integral part of the decision-making and implementation process, particularly if this process relates to a fundamental basis of their livelihood. Citizens today are no longer mere recipients of information; they are an active part of the process to generate information, to validate the data, and to make the appropriate decisions.
In many parts of the world, changes in the patterns of water supply and demand, which come about in the wake of socioeconomic as well as climate/environmental change, result in an increasing potential for conflict over water resources. Managing extreme situations, floods as well as droughts, requires appropriate participation of the stakeholders. Integrating web-based GIS applications with ICT provides an approach to mitigate conflicts of interest and to facilitate appropriate stakeholder participation. GIS technologies assume a central role in this process, not only because they can inform about the current state of water resources, but also because they serve as an education tool to better understand spatial dependencies such as upstream and downstream relationships, facilitate stakeholder participation by providing data, observations, and measurements, and promote a sense of responsibility and ownership (Fienen and Lowry, 2012). The widespread use of GIS by the public is particularly evident with regard to applications such as navigation systems, Google Maps, or Google Earth. The availability of these services has changed the perception of space. Navigation systems, for instance, focus the attention on the next crossing or exit rather than on the overall spatial dependencies of the chosen route. On the other hand, up-to-date data on the current state of the earth and on current processes are accessible at any time through services such as Google Maps or Google Earth.

The importance of local properties and their spatial connectivity is particularly obvious for flood risk assessment. Against this background, it is essential for professional GIS users and citizens alike to be informed about GIS services and data pertaining to water resources and hydrology. The following section presents and discusses some of the main GIS service and data providers in hydrology, ranging from global and continental services to national, regional, and local ones. Owing to the vast number of hydrological data and service providers and to the rapid development of new approaches, the following account must remain incomplete, but it may serve as an initial point of reference. Thus, after some general remarks on hydrological spatial data infrastructures, data sources and services are discussed according to disciplinary focus.

2.04.5.1 Spatial Data Infrastructure in Hydrology

During the infancy of GIS technology, most GIS solutions were desktop-based with local data storage, utilizing mainly proprietary data and software. The specific data requirements within the field of water resources management led to the development of specialized databases, which were either not, or only weakly, connected. The increasing need for spatial data by many different agencies and stakeholders required the development of an integrated data infrastructure in which disciplinary applications, GIS software, and server solutions work seamlessly hand in hand. This need for integration was supported by national and international legislation mandating the establishment of a Spatial Data Infrastructure (SDI), which integrates all necessary components, from the legal framework to quality control and from data access to norms and standards. The European Union has passed legislation to establish the Infrastructure for Spatial Information in the European Community (INSPIRE). INSPIRE is an EU initiative to establish an infrastructure for spatial information in Europe that is geared to making spatial or geographical information more accessible and interoperable for a wide range of purposes supporting sustainable development (European Union, 2007). The INSPIRE directive provides a framework for an SDI to support the environmental policies of the European Community, and it makes use of the SDIs established by the member states. Hydrography is one of the 34 spatial data themes. Access to the INSPIRE SDI is provided by the INSPIRE Geoportal (European Commission, 2017b). Similar SDIs are available or are being set up in many other regions of the world. In the United States, for instance, the United States Geological Survey (USGS) maintains the National Water Information System (NWIS) (USGS, 2017g). Australia is in the process of setting up the Australian Water Resources Information System (AWRIS) (Australian Government, 2017). Worldwide statistical information on water resources is provided by the United Nations Food and Agriculture Organization (FAO) through its AQUASTAT (FAO, 2017) homepage. Although AQUASTAT focuses mainly on statistical data on a national scale, it also provides maps and spatial data. SDIs are also set up at the local and regional level, in cities, municipalities, federal states, and nations. Fig. 6 provides an example of the basic elements of an SDI as implemented by the Bavarian Environmental Protection Agency (Kazakos et al., 2012). A large diversity of stakeholders, agencies, disciplines, and organizations is involved in developing, using, and maintaining a water SDI. Hydrology is an interdisciplinary science, which addresses not only natural sciences but also social sciences and engineering. As a result, access to data and services is rather fragmented and distributed, involving many different entities, ranging from local entities such as communities and cities to federal organizations, nations, and international organizations. The diversity of stakeholders, ranging from administrations to industry and from private businesses to nongovernmental organizations, adds to the complexity and diversity of hydrological applications and data. The spheres of responsibility assumed by these organizations or institutions often overlap; hydrologically relevant information therefore cannot be found through one single portal. Some of the main portals are briefly presented in the following.

Fig. 6 Outline of a geodata infrastructure as utilized by the Bavarian Environmental Protection Agency, comprising metadata, geodata, and geodata services (web services), supported by coordination and monitoring mechanisms, the relevant legal framework, agreements on access and use, and norms and standards (specifications). Adapted from Kazakos, W., Reineke, A., and Heidmann, C. (2012). GDI Bayern-Geodateninfrastrukturen in der Wasserwirtschaft Bayern. Presented at the EnviroInfo, pp. 783–789.


Several national and international portals provide a good starting point for hydrological information and data. A fundamental requirement to make the data meaningful for different users is the availability of standardized metadata explaining the main features of the data.

2.04.5.1.1 International portals

Worldwide hydrological data on the scale of individual nations are provided in particular by international organizations such as the United Nations (UN) through its Food and Agriculture Organization (FAO), the World Meteorological Organization (WMO), the World Health Organization (WHO), and UNICEF. The importance of water for agricultural production as well as for health is among the main reasons for the UN's commitment to water research and spatial data. These activities are particularly relevant for monitoring and documenting the achievement of the Millennium Development Goals (MDGs). Web-based maps provide a fast overview of essential water-related data. Hydrological data are provided in particular within the WMO's Hydrology and Water Resources Programme (HWRP) (WMO, 2017a), but other programs also provide hydrologically relevant information, services, and research. The WMO home page (WMO, 2017b) provides a good starting point for an overview of the different programs. The AQUASTAT homepage (FAO, 2017) of the FAO provides access to a large range of water-related GIS data sets and statistics pertaining particularly to agricultural water use. Water is one of the key prerequisites for agricultural production, and agriculture has the highest water demand worldwide. Thus, developing, providing, and supporting expertise in water resource management is essential for FAO's mission to help eradicate hunger, to work toward the elimination of poverty, and to support the sustainable management and utilization of natural resources. AQUASTAT was started with the aim of contributing to FAO's goals through the collection, analysis, and dissemination of information related to water resources, water uses, and agricultural water management, with an emphasis on countries in Africa, Asia, Latin America, and the Caribbean. Beyond agricultural water use, the UN-Water web site provides global maps of key water indicators; its key water indicators portal provides a quick overview on a national level (United Nations, 2017). The scope of UN-Water's work encompasses all aspects of freshwater, including surface and groundwater resources and the interface between fresh- and seawater. A dedicated system for groundwater data is the Global Groundwater Information System (GGIS) (IGRAC, 2017a), which is supported by the International Groundwater Resources Assessment Centre (IGRAC), a UNESCO facility working under the auspices of the WMO. GGIS is an interactive, web-based portal to groundwater-related information and knowledge. Its main purpose is to assist in the collection and analysis of information on groundwater resources and the sharing of this information among water experts, decision makers, and the public. GGIS modules are structured around six themes: transboundary groundwater, global country data and regional maps, groundwater monitoring, managed aquifer recharge, project-related information, and small islands. With its map-based viewer, underlying database, and web-based GIS functionality, this system allows storing, visualizing, sharing, and analyzing geospatial data in a systematic way. A Global Groundwater Monitoring Network (GGMN) (IGRAC, 2017b) is organized by IGRAC to support collaboration between countries and to achieve global coverage of groundwater monitoring data. Similar to the GGMN, the Global Runoff Data Centre (GRDC) (Bundesanstalt für Gewässerkunde, 2017) provides access to runoff data globally.
As mentioned earlier, the GRDC maintains not only a global runoff database of river discharge data, but also provides geospatial data products such as GIS data on rivers and watershed boundaries. The GRDC also operates under the auspices of the WMO. Research institutions such as the International Water Management Institute (IWMI) maintain and support a Water Data Portal (WDP) (International Water Management Institute, 2017). The WDP provides access to a large amount of data related to water and agriculture, containing meteorological, hydrological, and socioeconomic data, spatial data layers, and satellite images, as well as hydrological model setups. The data in the WDP, both spatial and nonspatial, are available for download by users including academia, scientists, researchers, and decision makers. Within its AQUEDUCT project, the World Resources Institute (World Resources Institute, 2017a) provides hydrological data embedded in GIS functionality related to measuring, mapping, and understanding water risks around the globe. The water risk atlas provides information on water risks pertaining to current and future conditions, described in different dimensions such as physical, regulatory, and reputational risks. The Aqueduct Global Flood Analyzer (World Resources Institute, 2017b) provides information on flood risks at the country, state, and river basin scale across the globe using a web-based interactive platform. It aims to raise awareness about flood risks and climate change impacts by providing open access to global flood risk data. By employing climate and socioeconomic change scenarios, it helps decision makers to assess flood damage risks and supports strategic planning. The Global Water Forum's web site (UNESCO, 2017) provides an overview of international providers of hydrological data. With respect to meteorological data, the Global Observing Systems Information Center (GOSIC) portal (NOAA, 2017a) serves as a clearinghouse providing convenient, central, one-stop access to Global Climate Observing System (GCOS) data and information through the GCOS essential climate variables (ECVs) data and information access matrix. GCOS addresses the total climate system, including hydrological components, and provides access to data, metadata, and information from GCOS and partner observing systems. An overview of the climate variables provided by GCOS is given on the ECV Data Access web site (NOAA, 2017b). A large range of meteorological measurements is provided through this website, as are key hydrological parameters (e.g., precipitation, river runoff, water use, soil moisture, and groundwater) and parameters influencing water fluxes (especially those pertaining to vegetation properties). Basic hydrological data on land use, topography, and soil properties are provided by a range of different agencies; suitable starting points for these datasets are presented in the data section of this article. Supranational bodies such as the European Climate Assessment & Dataset project also provide meteorological data (ECA&D, 2017). An example of a supranational effort for a common water data infrastructure is the Water Information System for Europe (WISE) (European Commission, 2017c).


Its main components are the European Environment Agency's Water Data Centre (European Environmental Agency, 2017), the European Commission's Joint Research Centre (JRC) water portal (European Commission, 2017d), and EUROSTAT (European Commission, 2017e), which collects and disseminates water statistics and provides input to the development of the GIS part of WISE. Many watersheds are documented using dedicated spatial information systems. In particular, the European Water Framework Directive (European Union, 2000) was instrumental in establishing watershed information systems. Many countries and states have implemented web-based information systems on surface water, groundwater, freshwater, wastewater, and special issues concerning the Water Framework Directive. Flood risk management is particularly essential for regional and urban planning. The European Community has established the European Flood Awareness System (EFAS), the first operational European system for monitoring and forecasting floods across Europe. It provides complementary flood early warning information up to 10 days in advance (European Commission, 2017f).

2.04.5.1.2 National portals and services

On the national level, data on weather and climate are provided by the national weather services. In Germany, this is the German Weather Service (DWD). The DWD provides not only measured meteorological data from its network, covering a range of climatological parameters including phenology and soil moisture at temporal resolutions from hourly to annual, but also calculated parameters (e.g., potential and real evapotranspiration, soil moisture, soil temperature, frost depth) as well as spatial patterns of key hydrological variables at resolutions of up to 1 km (Deutscher Wetterdienst, 2017). The data are publicly available via FTP server (Deutscher Wetterdienst, 2017b). In addition to the measurements provided by the DWD, the KOSTRA data set is particularly valuable for water resource management and hydrology. This data set is an example of using GIS technology to address stochastic processes in their spatial context. KOSTRA (Koordinierte Starkniederschlags-Regionalisierungs Auswertungen) provides regionalized extreme precipitation data for a range of recurrence frequencies. The first version of the KOSTRA data set (KOSTRA-DWD-2000) was released in 2005 and utilized precipitation measurements from 1951 to 2000; the current data set (KOSTRA-DWD-2010) includes measurements from 1951 to 2010. Extreme precipitation data are provided as precipitation height (mm) or precipitation yield (l/(s ha)) for precipitation durations ranging from 5 min to 72 h and recurrence frequencies ranging from 1 year to 100 years. The data are provided as a grid with 67 km2 cell size covering all of Germany. This data set is particularly useful for applied tasks, such as the planning and engineering of sewage systems, flood control structures, and climate change adaptation. Generating, analyzing, and disseminating climate model results is another example where GIS technology is of key relevance. Similar services are provided by many national agencies, such as NOAA's National Center for Environmental Information (NOAA, 2017b). The measurement networks maintained by national agencies are augmented by data measured by federal state agencies, specialized measurement networks such as agro-meteorological services, and private bodies. Examples in Germany are the Bayerische Landesanstalt für Landwirtschaft (LfL, 2017), the Agrarmeteorologie und Hydrometeorologie Rheinland-Pfalz (MWVLW-RLP, 2017), and the Regionale Klimainformationssystem Sachsen (ReKIS, 2017).
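The two KOSTRA units mentioned above are directly convertible: 1 mm of precipitation equals 1 l/m2, i.e., 10,000 l/ha, so a height h (mm) over a duration D (seconds) corresponds to a yield r = 10,000 h/D in l/(s ha), or equivalently h = 0.006 r D with D in minutes. A minimal sketch of this conversion (function names are illustrative):

    def height_to_yield(h_mm, duration_min):
        """Convert precipitation height (mm) over a duration (minutes) to a
        precipitation yield in l/(s*ha); 1 mm = 10,000 l/ha."""
        return h_mm * 10000.0 / (duration_min * 60.0)

    def yield_to_height(r_l_s_ha, duration_min):
        """Inverse conversion: h (mm) = 0.006 * r * D, with D in minutes."""
        return 0.006 * r_l_s_ha * duration_min

    # Example: a 10 mm shower falling within 5 min
    print(height_to_yield(10.0, 5.0))   # about 333.3 l/(s*ha)
    print(yield_to_height(333.3, 5.0))  # about 10 mm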

ELWAS (Electronic Water Resource Management System) is an example of a GIS system used mainly by state and local authorities, water management associations, and experts in water resources management in the state of North Rhine-Westphalia (MUNLV-NRW, 2017a). ELWAS provides GIS functionality without maintaining its own dedicated database; instead, interfaces to existing dedicated databases are established, which ensures that the data provided from the different sources are up to date. Fig. 7 provides an overview of the data content provided by the ELWAS system.

Fig. 7 Overview of the ELWAS information system (ELWAS-Web, with intranet and internet access), covering waste water (industrial waste water, municipal sewage, precipitation, comprehensive analysis), groundwater (quantity, groundwater quality, groundwater level), surface water (quality, swimming waters, structural quality of water bodies, facilities), potable water and water supply (inventory, quality monitoring), and monitoring and measures under the Water Framework Directive (WFD). Modified from MUNLV-NRW (2017). https://www.umwelt.nrw.de/fileadmin/redaktion/Bilder/Content/Grafik_Datensysteme.TIF (accessed on January 29, 2017).


On the national level, flood forecast and warning systems are implemented, such as the flood information portal in Germany (LUBW und LfU, 2017) or the flood information service in the UK (Environment Agency UK, 2017). The relevance of river stage information and flood warnings to the public has led to many web sites and apps (e.g., Floodwatch, FloodSpot, Meine Pegel) that are used to communicate water levels (Bessiere, 2017; LANUF, 2017; WSV, 2017). Particularly for urban areas, understanding the flood risk at different recurrence rates is essential. Flood protection involves not only experts but also citizens; thus, interactive maps such as the flood risk maps for the city of Cologne provide valuable information for experts and citizens alike (STEB, 2017). Concerning flood protection, appropriate action in the case of a flood is important and involves many actors. Measures that must be implemented well before a flood event (e.g., the construction of polders, retention basins, and dikes) are just as important as appropriate actions during a flood to minimize the potential damage. To this end, risk assessment and avoidance tools such as the flood risk pass aim at educating the public about risks arising from floods and about measures for flood protection (HKC, 2017). All of these approaches require the integration of GIS technologies, hydrological data, and information and communication technologies. Private organizations are also involved in providing GIS-based hydrological services. One example is the river basin information system FluGGS developed by the Wupper water authority (Wupperverband, 2017). Besides essential information on water resources, water use, the status of water quantity and quality, water infrastructure, ecology, and much more, it provides dedicated water-related information of particular relevance to the public, such as walking and bicycle trails, access points to the river, and information for water sports (e.g., canoeing). Up-to-date content is an essential asset for the attractiveness of these information systems. The integration of basic water-related geodata, ranging from watershed information to the location of essential infrastructure, with current information derived from sensor networks is essential to provide up-to-date information. The required data may pertain to complex tasks such as flood management, but also to less far-reaching decisions, such as whether or not a canoe trip or a swimming trip to the lake is feasible. Providing information to the public is more than an appreciated service; it is also a means to encourage stewardship to protect common resources. Smartphone apps are an essential communication tool in this respect.

2.04.5.1.3 Observatories

The need for interdisciplinary integration is particularly evident with respect to global change issues. Ensuring water security and the sustainable use of water resources is one of the critical challenges arising from global change. Water resources are intricately linked to ecosystem functions and services. Long-term data are needed, and particularly valuable, to understand and diagnose global change effects, and the provision of long-term data on entire ecosystems is an important mission of long-term observatories. One of the first initiatives to establish and maintain Long Term Ecological Research (LTER) is the LTER network created by the National Science Foundation (NSF) in 1980. This network is the largest and longest-lived ecological network in the United States. It provides the scientific expertise, research platforms, and long-term datasets necessary to document and analyze environmental change (LTER, 2013). The tasks of environmental observatories go far beyond the measurement and provision of environmental data. They must provide methods and tools for data storage, data description with metadata, and data dissemination. Quality assurance and quality control are essential, as are data analysis and presentation. Horsburgh et al. (2011) describe the essential components of environmental observatory information systems. These components are shown in Fig. 8 and are implemented in the observatories described in the following, although the functionality with respect to data analysis often varies greatly. Critical zone observatories particularly adhere to the goal of integrative, interdisciplinary measurement concepts. In 2007, the NSF funded the critical zone observatory program (White and Sharkey, 2016), which was followed by the Soil Transformations in European Catchments (SoilTrEC) program funded by the European Community (Menon et al., 2014). In France, the Network of Drainage Basins (Réseau des Bassins Versants, RBV) was established as a multidisciplinary research network focusing particularly on fundamental and applied studies of streams, rivers, and their watersheds. The RBV consists of around 15 observatories, and a metadata catalog provides a GIS-based overview of the data available at these observatories. Similarly, the Terrestrial Environmental Observatories (TERENO) in Germany have been established as interdisciplinary and long-term research sites in six regions ranging from the northern lowlands to the Bavarian Alps (Bogena et al., 2012). These observatories provide long-term series of key environmental data, which are needed for the analysis and prognosis of global change consequences using integrated model systems. The TERENO web site provides a comprehensive overview of and access to the relevant data using GIS-based graphical user interfaces (Helmholtz Gemeinschaft, 2017). Hydrological data are relevant to a large group of people interested in hydrology and to a large group of stakeholders. The need to access and visualize the vast amount of hydrological data led to the development of dedicated web-based hydrological GIS systems. One prominent example is the HydroDesktop system, which makes use of the CUAHSI-HIS database (Ames et al., 2012). The primary purpose of HydroDesktop is to provide access to hydrologic data and to allow for data manipulation and synthesis.
This software provides access to data from distributed data services, and it enables a diverse group of users, ranging from K-12 students to faculty and consultants, to operate within a relatively uncomplicated software environment. As an open source, free software application, HydroDesktop is widely available and facilitates the growth of a community of users. Swain et al. (2015) reviewed many state-of-the-art open source software projects for water resources web applications in order to characterize their key features and capabilities. The projects were categorized in terms of (i) their GIS components and tools and (ii) their web development software. By establishing standards and specifications, the Open Geospatial Consortium provides the foundation for many web applications (OGC, 2017). Services such as web mapping services, web feature services, web coverage services, catalog services, and web processing services, but also the data format standards (e.g., the simple features interface standard, the geography markup language, and the keyhole markup language), are essential points of reference for developers creating new software applications.
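As an illustration of how these OGC standards look in practice, a Web Map Service GetMap call is simply a parameterized HTTP request. The endpoint and layer name below are placeholders, but the parameter names are those defined by the WMS 1.3.0 specification:

    import urllib.parse

    # Hypothetical WMS endpoint of a hydrological data provider
    base_url = "https://example.org/geoserver/wms"
    params = {
        "service": "WMS",
        "version": "1.3.0",
        "request": "GetMap",
        "layers": "hydrology:river_network",  # illustrative layer name
        "crs": "EPSG:4326",
        "bbox": "47.0,8.0,49.0,11.0",  # minlat,minlon,maxlat,maxlon in WMS 1.3.0
        "width": "800",
        "height": "600",
        "format": "image/png",
    }
    print(base_url + "?" + urllib.parse.urlencode(params))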


Fig. 8 Components of an environmental observatory information system: observation and communication (sensors such as stream gauging, weather stations, and water quality sampling; telemetry network), QA/QC and provenance, data storage and metadata (standard protocols and formats, controlled vocabularies), publication and interoperability (common formats and vocabularies), and discovery and presentation (data visualization and analysis). Figure from Horsburgh JS, Tarboton DG, Maidment DR, and Zaslavsky I (2011) Components of an environmental observatory information system. Computers & Geosciences 37: 207–218. http://dx.doi.org/10.1016/j.cageo.2010.07.003, with permission.

Swain et al. (2015) limited their review to free and open source software (FOSS) projects. They identified 45 water resources and earth science web apps developed in the decade from 2004 to 2014, ranging from prototype or demonstration systems to full-featured data and modeling services. These apps address a wide range of water resource-related aspects, such as flood warning, water quality, urban planning, and ecology; 45% of the reviewed apps were published in the last 2 years of the analysis period. This strong recent increase indicates the growing interest in web apps as a medium for communication, public participation, earth science monitoring, and modeling. Concerning the spatial database, the PostgreSQL database with the PostGIS spatial extension was by far the preferred solution, followed by MySQL. One reason for the popularity of PostGIS is its extensive raster data support and its large number of database functions; raster data are particularly relevant for hydrology and water resources research. Regarding the preferred development environment, Java solutions were identified as the clear favorite, followed at a considerable distance by PHP web development. While most of the applications target experts as their primary users, the importance of involving a broader audience is increasingly recognized by decision makers, planners, administrators, and scientists. The increasing number of apps to record and communicate place-based hydrological data is evidence of the growing interest in stakeholder participation. GIS applications are particularly useful due to the inherent spatial character of hydrological information, whether related to water supply, risk management, water quality, recreation, or water infrastructure. The availability of FOSS for GIS and web applications is essential for the widespread use of these tools. Water resources planning increasingly becomes a bottom-up participatory approach, which needs the input of the public as much as that of the experts. ICT-based methods are essential to facilitate stakeholder participation, and ICT has the potential to be a robust approach to strengthen it (Rinner et al., 2008). Accordingly, web-based planning tools and smartphone applications are becoming increasingly popular. While currently the dominant role of the web is communication through information dissemination, it has also played an important role in public consultation by providing a forum for targeted questions and feedback (Kelly et al., 2012). Systems such as WRESTORE (Watershed Restoration Using Spatio-Temporal Optimization of Resources) are designed as web-based, participatory planning tools to engage with watershed stakeholder communities and involve them in using science-based, interactive simulation-optimization approaches (Babbar-Sebens et al., 2015). Fienen and Lowry (2012) describe a crowdsourcing tool to acquire environmental data. The benefit of activating the public in hydrological research lies not only in generating valuable data, but also in raising awareness and engagement. Citizen science is particularly important in areas with limited access to data or deteriorating measurement networks. Outreach to civil society is increasingly recognized as an important component of science projects; after all, most science projects are funded from public sources. Beyond research and teaching, a third mission has been formulated for universities, addressing technology transfer and outreach.
The latter are of key importance for universities and civil society alike (Roessler et al., 2015). Schools play a key role as educational institutions and multipliers of knowledge. International science and education programs such as GLOBE (Global Learning and Observations to Benefit the Environment) provide students and the public worldwide with the opportunity to participate in data collection and the scientific process and to contribute meaningfully to our understanding of the Earth system (GLOBE, 2017).


By providing protocol-based instructions for measuring environmental data, tools to upload and download measurements, a GIS-based data access tool, apps such as the GLOBE Data Entry App, and an extensive worldwide database, GLOBE enables a broad audience to participate in environmental observations and research. Concerning hydrology, the GLOBE protocols on the atmosphere, biosphere, hydrosphere, and pedosphere are particularly relevant. The availability of web-based applications and mobile phone technology is a breakthrough in terms of public participation, communication, and the application of GIS technologies.

2.04.6 Decision Support Systems for Hydrology

The fundamental GIS functionality to analyze and visualize spatial data forms the basis for one of its most relevant applications in hydrology, namely decision support. The multitude of partially conflicting interests in utilizing water requires management and planning strategies that facilitate the use of water by the respective stakeholders without unduly jeopardizing its use by others. As water is essential for all ecosystem functions, the term stakeholder reaches far beyond the appropriation of water for human use. Spatial decision support systems (SDSSs) are particularly powerful for understanding the multiple and complex interactions of processes at the land surface and for evaluating alternative land and water use strategies (Bareth, 2009). SDSSs are computer-based systems that combine the storage, search, and retrieval capabilities of GISs with models and optimization algorithms to support decision making concerning spatial problems (Pontius and Si, 2015). While all SDSSs must possess the functionality to manipulate geographic information, to analyze and report the effects of management alternatives, and to allow users to evaluate these, different SDSSs vary widely in terms of goals, area of application, complexity, and targeted user group. Oftentimes the purpose of decision support is already achieved by simply overlaying different types of spatial data and visually analyzing the resulting map; other applications require a complex model-based analysis of the interaction of natural and societal processes. Flood risk assessment is an example of the first category, while assessing the effects of global change upon water resources and their appropriation is an example of the latter. Many municipalities, cities, provinces, or states provide flood maps with different recurrence rates to inform citizens about the flood risk in their respective area (e.g., MUEEF-RLP, 2017; MUNLV-NRW, 2017b; STEB, 2017). While recurrence rates are relevant for long-term decisions such as flood protection measures for buildings, information on current and forecasted water levels is also provided and is essential for decisions on the necessity of immediate action. Assessing water quality parameters such as nitrogen content or leaching into the groundwater typically requires a more involved approach, as these depend upon a range of different environmental and societal controls such as aquifer properties, soil characteristics, precipitation patterns and regime, fertilization, and plant dynamics. While the vulnerability of groundwater to nitrate pollution may be assessed from more or less time-invariant spatial properties using standard GIS tools (Lake et al., 2003), analyzing the temporal and spatial dynamics of nitrogen fluxes requires spatially distributed models that integrate hydrological, plant growth, and land use management processes (Schneider, 2003). Decisions on water-related issues are made by a large variety of stakeholders, ranging from agriculture to households and industries. Agricultural management decisions often have an inherent spatial dimension, whether related to water quantity (e.g., irrigation) or water quality (e.g., fertilization). At the same time, decisions made by one stakeholder affect decisions made by others. Excess fertilizer applications might, for instance, necessitate appropriate water treatment at waterworks to provide drinking water of suitable quality to households.
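To illustrate the simpler, overlay-based end of this spectrum, such as the groundwater vulnerability assessments mentioned above: rate each time-invariant input layer, weight it, and sum the rasters. The layer names, ratings, and weights below are illustrative and not those of any published scheme:

    import numpy as np

    # Illustrative rating rasters (1 = low contribution to vulnerability,
    # 5 = high), e.g., reclassified from soil, depth-to-groundwater, and
    # recharge maps with standard GIS reclassify tools
    soil_rating = np.array([[1, 3], [4, 5]])
    depth_rating = np.array([[2, 2], [5, 4]])
    recharge_rating = np.array([[1, 4], [3, 5]])

    weights = {"soil": 0.3, "depth": 0.4, "recharge": 0.3}

    vulnerability = (weights["soil"] * soil_rating
                     + weights["depth"] * depth_rating
                     + weights["recharge"] * recharge_rating)
    print(vulnerability)  # higher values = higher nitrate vulnerability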
The Global Water Partnership (GWP) underlines the necessity of integrated water resources management (IWRM) and defines IWRM as "a process, which promotes the coordinated development and management of water, land and related resources in order to maximize economic and social welfare in an equitable manner without compromising the sustainability of vital ecosystems" (GWP, 2017). IWRM thus requires appropriate approaches and tools to analyze water use, to identify suitable alternatives, to communicate opportunities and threats of current and future water uses, and to provide the necessary data and participatory approaches for decision making. SDSSs play a key role in IWRM. One main feature of SDSSs is their capability to address what-if scenarios based upon modeling the key drivers and processes. The processes and drivers analyzed in SDSSs are not limited to the natural sciences but also include societal processes. The need to analyze what-if scenarios arises not only from current water use, which often already ignores the limits of sustainability; global change poses challenges that cannot be appropriately addressed with trial-and-error approaches but instead need sound scientific analysis. SDSSs are therefore needed that are based on state-of-the-art science and that facilitate cross-disciplinary integration as well as stakeholder participation. The program on global change and the hydrological cycle (GLOWA), established in 2000 by the German Ministry of Education and Research, aimed at developing new simulation tools to help realize sustainable water management under global change conditions (BMBF, 2017). A central objective of these projects, which addressed different watersheds around the world (Elbe, Danube, Drâa, Ouémé, Jordan, and Volta), was to cooperate directly with local and regional stakeholders and decision makers in order to identify scientifically sound adaptation strategies to secure water availability, water quality, and equitable allocation. By integrating expertise, data, and model components from different disciplines, one key aim of the GLOWA program was to reduce the uncertainty in estimating the impact of climate and land use change and other environmental and socioeconomic impacts upon the change of water resource availability (GLOWA, 2008). Within the GLOWA projects, SDSSs were developed and tested. These SDSSs take into account global environmental changes as well as changes of the socioeconomic framework. The projects addressed a range of current scientific core themes, such as (i) climate changes and their effect on the hydrological cycle, (ii) interactions within and between natural and social systems regarding water availability, and (iii) approaches to identify and address conflicting water uses.


An overview of the GLOWA projects is available online (GLOWA, 2008), and a full description of the project results is given by Mauser and Prasch (2016), Speth (2010), and Wechsung et al. (2008). The natural and societal processes relevant for future water availability act on rather different scales, reaching from the global scale of climate change to the local scale of decision making at the household or farm level (Fig. 9). To facilitate meaningful interaction of the different components of an SDSS, spatial data at the appropriate resolution are required. Data sources such as satellite data or statistics are typically not available at the spatial or temporal resolution required for the SDSS. Thus, upscaling and downscaling are essential functions of an SDSS for both natural and social science issues. Up- and downscaling involve both spatial and temporal aspects: while spatial downscaling refers to methods used to derive finer-resolution information from coarser-resolution data, temporal downscaling refers to the derivation of fine-scale temporal information from coarser-scale temporal data (e.g., daily rainfall sequences from monthly or seasonal rainfall amounts (Trzaska and Schnarr, 2014)). Upscaling can typically be achieved by aggregation of high-resolution data (e.g., by averaging, summation, or grouping). Downscaling, however, is often a rather complex problem requiring subscale information on the disaggregation rules. A particular challenge in developing an SDSS for water management is the cross-disciplinary integration of model components. Most integrative projects develop the SDSS by coupling existing models. However, model coupling often poses the inherent problem that a process is described by more than one model component. Transpiration, for instance, can be calculated with a hydrological model component using a Penman-Monteith approach (Monteith, 1965), or it may be calculated as a process coupled to photosynthesis, since stomatal control regulates both the H2O and the CO2 flux. Modeling the same flux with different approaches typically results in model inconsistencies.
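A minimal sketch of the simplest of these operations, upscaling by block aggregation, together with a naive downscaling by replication (real downscaling would add subscale information, as noted above; the grid sizes are illustrative):

    import numpy as np

    def upscale_mean(fine, factor):
        """Aggregate a fine grid to a coarser grid by block averaging."""
        ny, nx = fine.shape
        blocks = fine.reshape(ny // factor, factor, nx // factor, factor)
        return blocks.mean(axis=(1, 3))

    def downscale_repeat(coarse, factor):
        """Naive downscaling: replicate each coarse cell over the fine grid."""
        return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)

    fine = np.arange(16, dtype=float).reshape(4, 4)  # e.g., a 30 m grid
    coarse = upscale_mean(fine, 2)                   # e.g., a 60 m grid
    print(coarse)
    print(downscale_repeat(coarse, 2))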

Fig. 9 Concept of up- and downscaling, linking climate drivers (e.g., 10 km), coarse-scale hydrological models (e.g., 1 km), and fine-scale hydrological models (e.g., 30 m) of soil, hydrology, vegetation, topography, and social systems.

Fig. 10 Architecture of the DANUBIA system, with the five main components (Atmosphere, Actor, Landsurface, Groundwater, and Rivernetwork, each with its controller and simulation models), the DANUBIA Core system (developer framework and runtime environment), and the DeepActor framework (abstract actor model with abstract actors, plans, and actions). The ball and socket notation shows the interfaces between the model components. Modified from Hennicker R, Janisch S, Kraus A, and Ludwig M (2016) DANUBIA: A web-based modelling and decision support system to investigate global change and the hydrological cycle in the upper Danube basin. In: Mauser W and Prasch M (eds.) Regional Assessment of Global Change Impacts, pp. 19–27. Cham: Springer International Publishing. http://dx.doi.org/10.1007/978-3-319-16751-0_2.

Thus, model integration is preferable to model coupling, but it is also more complex. Key science aspects of these integrated systems are ensuring the conceptual compatibility among the components (ontology) and specifying the information to be exchanged between components (semantics) (Laniak et al., 2013). Following this basic principle, GLOWA Danube developed a new framework for interdisciplinary model integration instead of a model coupling approach. The DANUBIA decision support system was developed by integrating 16 simulation models grouped into five main components: Atmosphere, Actor, Landsurface, Groundwater, and Rivernetwork. Fig. 10 provides an overview of the architecture of DANUBIA (Hennicker et al., 2016). Embedded into the DANUBIA DSS are essential GIS functionalities for data handling and analysis, such as aggregation and disaggregation, interpolation, and display procedures. The design and development of DANUBIA followed the principles of object-oriented software engineering. The Unified Modeling Language (UML) was used to define the interfaces of the model components and simulation models. Encapsulating models into components and classes facilitates cooperation among researchers, the analysis of interacting environmental processes, the comparison among different modeling solutions, and the adoption of reproducible research strategies (Formetta et al., 2014). UML was not only used to define the model interfaces; it was also instrumental in negotiating the responsibility of the project partners for their respective simulation models or model components. DANUBIA consists of the DANUBIA components with their simulation models, the DANUBIA Core component with the developer framework and a runtime environment, the DeepActor framework to account for the management plans of the different actors implemented in the socioeconomic models, and the user interface, which interacts with the DANUBIA Core component. DANUBIA is a distributed system, which can run on individual computers as well as on small networks or computer clusters. At runtime, the main components are orchestrated by the runtime environment, which instructs (i) the base data component to execute the model initialization, (ii) the land-use component to administer land use changes during run time, and (iii) the time controller, which performs the temporal coordination and synchronization of the respective model components. The spatial concept of DANUBIA is a raster grid with a resolution of 1 km2 per raster cell. Following the object-oriented paradigm, each raster cell represents a proxel (process pixel) where the relevant processes take place at a defined location. A proxel is an object identified by an identification number, and all proxel objects are administered in a table object. Each proxel object carries the universal basic information of the specified grid point, such as coordinates, terrain elevation, and land use. Additional features of each proxel object are added via specialization by the respective DANUBIA components; a proxel containing agricultural land, for instance, includes, among others, features describing vegetation and the respective agricultural actor (a minimal sketch of this concept follows below). The parallel execution of the model components requires a dedicated temporal concept, as the time intervals required by the different model components may differ. While model components describing processes with a daily course typically run on an hourly time step, others require only a daily or even longer time step.
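A minimal sketch of the proxel idea (not the actual DANUBIA code, which is written in Java; all names and values here are illustrative):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Proxel:
        """Process pixel: universal basic information of one grid point."""
        pid: int            # identification number
        x: float            # coordinates
        y: float
        elevation_m: float  # terrain elevation
        land_use: str

    @dataclass
    class AgriculturalProxel(Proxel):
        """Specialization added by a component for agricultural land."""
        crop: str = "none"
        actor_id: Optional[int] = None  # the responsible agricultural actor

    # All proxel objects administered in a table object, keyed by id
    table = {p.pid: p for p in [
        AgriculturalProxel(1, 4471000.0, 5333000.0, 480.0, "arable",
                           crop="maize", actor_id=42)]}
    print(table[1])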
Owing to the different time steps of the processes, the distributed execution of the model components on different nodes, and the differing computation times of the models, a proper time synchronization of the model components must be guaranteed. This duty is performed by the time controller, which ensures stable data exchange and data validity with respect to the model time of the receiving component. The time controller performs its task by cycling through five steps: (i) wait for data, (ii) get data, (iii) compute, (iv) wait for provide, and (v) provide. "Wait for data" holds the execution of a component or model until all model components have arrived at the respective time step; "wait for provide" holds the execution of the component until the release command is issued by the time controller.
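This five-step cycle can be paraphrased with synchronization barriers. The sketch below is a strongly simplified, single-process assumption of the mechanism described above (the actual DANUBIA runtime is a distributed Java system); the component names and time steps are made up:

import threading

N_COMPONENTS = 3
data_barrier = threading.Barrier(N_COMPONENTS)     # released when all components reach time t
provide_barrier = threading.Barrier(N_COMPONENTS)  # released by the "provide" command

def run_component(name, step_hours, exchange, hours=24):
    result = None
    for t in range(hours):
        data_barrier.wait()              # (i) wait for data: hold until all components arrive at t
        inputs = dict(exchange)          # (ii) get data that is valid for model time t
        if t % step_hours == 0:          # (iii) compute only on this component's own time step
            result = f"{name} output for t={t} h from {len(inputs)} inputs"
        provide_barrier.wait()           # (iv) wait for provide: hold until release is issued
        exchange[name] = result          # (v) provide results for the next time step

exchange = {}
threads = [threading.Thread(target=run_component, args=(name, step, exchange))
           for name, step in [("Atmosphere", 1), ("Landsurface", 1), ("Groundwater", 24)]]
for th in threads:
    th.start()
for th in threads:
    th.join()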
Socioeconomic simulation models are embedded into the DeepActor framework. The DeepActor approach is based on a social science concept for agent-based simulation (Barthel et al., 2008; Gilbert and Troitzsch, 2005). An actor represents an entity acting within the simulation area; actor-based models for water supply companies, the tourism industry, households, agriculture, the economy, and demography are implemented in DANUBIA. The actors have "sensors" with which they perceive their surroundings, especially the state of the water resources and the reactions of other actors. These "sensors" are established through quantitative or qualitative information (flags) communicated by the model interfaces, which informs the actors about the state of their environment. Actors also have a history, or memory, in which they store the decisions of previous simulation steps. Plans represent the behavior options of an actor; each plan contains a number of actions, which explicitly model the effects of a plan's implementation (Hennicker et al., 2016). By integrating natural and social science components, the SDSS DANUBIA not only addresses interactions and feedbacks in the natural environment, it also captures the feedback mechanisms between man and the environment. Stakeholders can thus investigate the effects of their management options by changing the management plans and thereby the actors' behavior. DANUBIA is designed as a framework-based, open-source, distributed system implemented in Java. Comprehensive documentation of the system, the program code, and all major components is available at the GLOWA Danube website (GLOWA-Danube, 2017a), and simulation results are publicly available in the Global Change Atlas for the upper Danube watershed (GLOWA-Danube, 2017b).
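The actor concept of sensors, memory, and plans can likewise be sketched in a few lines. The class below is a hypothetical illustration of the idea, not the DeepActor API; the flag name and plan actions are invented for the example:

class WaterSupplyActor:
    """Illustrative actor: senses flags, remembers decisions, executes a plan."""
    def __init__(self, plans):
        self.plans = plans    # behavior options; each plan is a list of actions
        self.memory = []      # decisions of previous simulation steps

    def sense(self, flags):
        # "Sensors": qualitative information (flags) communicated by the model interfaces
        return flags.get("low_water", False)

    def step(self, flags):
        water_scarce = self.sense(flags)
        plan_name = "restrict" if water_scarce else "normal"
        self.memory.append(plan_name)     # store the decision in the actor's history
        for action in self.plans[plan_name]:
            action()                      # actions explicitly model the plan's effects

actor = WaterSupplyActor(plans={
    "normal":   [lambda: print("supply at full capacity")],
    "restrict": [lambda: print("issue water-saving appeal"),
                 lambda: print("cap groundwater abstraction")],
})
actor.step({"low_water": True})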
SDSSs are tools to evaluate the consequences of management alternatives. They require stakeholder participation at all stages: during the conceptualization, development, and implementation phases. Stakeholders must be identified according to criteria relevant for the decision-making and management objectives; such criteria might be based on spatial features such as region, on the role of the stakeholder, on the societal sector, or on the sphere of activity, to name a few (Büttner, 2016). Stakeholder participation should be conceived as a process rather than a mere exchange at a given time. The participation process facilitates open communication among all parties involved and results in a change of attitude, perception, or actions of the persons or entities involved. Integrating actor-based models into an SDSS not only constituted an important scientific achievement and a milestone in methodological development for cross-disciplinary cooperation; it is also essential as a tool to understand process-and-effect relationships in the man-environment continuum and to assess the effects of management alternatives against the background of a scientific analysis of societal processes rather than a perceived ideal of action alternatives. However, an SDSS should be understood neither as a mere computer system nor as an optimizing system that provides optimal solutions. SDSSs in hydrology support decision making by providing spatial information about water fluxes and water resources and about the effects of management alternatives, global change, and/or the variability of the environmental systems. Essential components of an SDSS are therefore the stakeholders and experts, who interact via suitable interfaces with the system consisting of spatial databases, models, GIS, and other tools. The system's output must be translated into a form easily understandable by users and stakeholders. The process of identifying a suitable option for action is a societal process, which should be informed by the best available knowledge and facts. GIS and SDSS can play a decisive role in providing the basis for such informed decisions.

2.04.7 Future Prospects

The development of computer technology and GIS has strongly influenced hydrology as a science. Spatial hydrology has thrived in recent decades owing to the increasing abundance of spatial data, the advancement of GIS methods, and the development of spatially distributed models. Hydrology is inherently a spatial discipline; consequently, GIS has become an indispensable method for hydrologists and water managers. The integration of standard procedures of hydrological analysis into GIS systems shows the relevance of GIS to hydrology and vice versa. Hydrology is also a good example of the need to integrate across disciplines and to develop truly transdisciplinary approaches.

The importance of spatial data for hydrology is evident from the multitude of services and data portals addressing hydrological issues. These portals range from worldwide to local data sets and are maintained by international organizations, national and local entities, and governmental bodies as well as private enterprises.

Water resource management is always associated with decision making, and more often than not conflicts of interest must be mitigated. The decision-making process must therefore be fact-based and transparent, acknowledging the rightful interests of all stakeholders. The complexity of the different interests as well as the interactions and feedbacks of the processes involved requires the use of SDSSs. The significance of SDSSs, particularly also for the social sciences, has been underlined by Pontius and Si (2015), who understand SDSSs as an important element in social science theories and models to explain decisions. For water resources management, the transdisciplinary integration of natural and social science approaches is essential, both for the advancement of the respective disciplines and for applied tasks. Such integration, however, requires an appropriate methodological approach. Object-oriented, component-based modeling technologies were identified as particularly suitable to facilitate transdisciplinary cooperative model development (Formetta et al., 2014). Undoubtedly, the development of transdisciplinary cooperative models will continue in the future. Understanding that today's environmental problems, decisions, and policies require transdisciplinary science approaches and computer capabilities that allow the environment to be considered in a holistic way, Laniak et al. (2013) called for the development of an integrated environmental modeling (IEM) science containing four interdependent elements: applications, science, technology, and community.


Among the highest priorities to foster IEM they see the need to (i) develop standards for publishing IEM data and models, (ii) educate the next generation of environmental stakeholders, with a focus on transdisciplinary research, development, and decision making, and (iii) provide a web-based platform for community interactions.

Particularly with respect to water, citizens represent an often-overlooked group of stakeholders. Citizen-science networks can contribute in many different ways, for example by (i) monitoring natural resources and environmental conditions, (ii) providing historical and local knowledge, (iii) facilitating knowledge transfer, (iv) validating monitoring technologies and processes, or (v) evaluating management alternatives (Laniak et al., 2013).

GIS technologies have expanded beyond the realm of experts and have entered the public sphere. Water resources web applications are growing in popularity. These apps differ from standard web applications because of their spatial data components. Swain et al. (2015) propose to address these spatial data needs through a combination of existing free and open-source software (FOSS) for GIS (FOSS4G) and FOSS for web development. However, the abundance of FOSS projects can be overwhelming to new developers and users alike, as a single platform presenting them in a structured way does not exist. The establishment of a water resources web app platform would enable future web apps to be developed much more rapidly (Swain et al., 2015). With the increasing availability and popularity of smartphone apps, the interaction between stakeholders will likely be augmented by additional options for participation, such as interactive social-web tools for crowdsourcing, mapping tools, and other tools to effectively communicate spatial data in support of decision making (Brown and Weber, 2011; Hudson-Smith et al., 2009). However, while big data from crowdsourcing offers many opportunities, quality control of the measurements, data, and information provided is essential to ensure trustworthiness and accuracy. Quality control includes more than a rigorous check of the accuracy of the data; it also includes observing and documenting a measurement and data analysis protocol, providing the necessary auxiliary data and metadata, and ensuring appropriate spatial, temporal, and thematic accuracy, resolution, consistency, and completeness (Li et al., 2016).

Water is the essence of life, at times posing as much a threat as it is the basis for human and environmental wellbeing. With water security decreasing in many parts of the world and water conflicts increasing, a major challenge for hydrology is to find the right balance in appropriating water for human use and for ecosystems. This balance is variable in space and time and must observe a sufficient margin of security to sustain the natural and social systems. GIS technologies, remote sensing, modeling, and stakeholder participation are essential components in the development of urgently needed solutions for a sustainable use of water.

References

Abbott, M.B., Bathurst, J.C., Cunge, J.A., O'Connell, P.E., Rasmussen, J., 1986a. An introduction to the European Hydrological System – Systeme Hydrologique European, "SHE", 2: Structure of a physically-based, distributed modelling system. Journal of Hydrology 87, 61–77.
Abbott, M.B., Bathurst, J.C., Cunge, J.A., O'Connell, P.E., Rasmussen, J., 1986b. An introduction to the European Hydrological System – Systeme Hydrologique European, "SHE", 1: History and philosophy of a physically-based, distributed modelling system. Journal of Hydrology 87, 45–59.
Ames, D.P., Horsburgh, J.S., Cao, Y., Kadlec, J., Whiteaker, T., Valentine, D., 2012. HydroDesktop: Web services-based software for hydrologic data discovery, download, visualization, and analysis. Environmental Modelling & Software 37, 146–156. http://dx.doi.org/10.1016/j.envsoft.2012.03.013.
Arnold, J.G., Fohrer, N., 2005. SWAT2000: Current capabilities and research opportunities in applied watershed modelling. Hydrological Processes 19, 563–572. http://dx.doi.org/10.1002/hyp.5611.
Arnold, J.G., Srinivasan, R., Muttiah, R.S., Williams, J.R., 1998. Large area hydrologic modeling and assessment part I: Model development. Journal of the American Water Resources Association 34, 73–89. http://dx.doi.org/10.1111/j.1752-1688.1998.tb05961.x.
Australian Government (2017) Australian water resources information system: Water information: Bureau of Meteorology [WWW Document]. http://www.bom.gov.au/water/about/wip/awris.shtml (accessed on January 29, 2017).
Babbar-Sebens, M., Mukhopadhyay, S., Singh, V.B., Piemonti, A.D., 2015. A web-based software tool for participatory optimization of conservation practices in watersheds. Environmental Modelling & Software 69, 111–127. http://dx.doi.org/10.1016/j.envsoft.2015.03.011.
Bach, H., Mauser, W., Schneider, K., 2003. The use of radiative transfer models for remote sensing data assimilation in crop growth models. In: Stafford, J., Werner, A. (Eds.), Precision Agriculture. Wageningen Academic Publishers, Wageningen, pp. 35–40. http://dx.doi.org/10.3920/978-90-8686-514-7.
Bareth, G., 2009. GIS- and RS-based spatial decision support: Structure of a spatial environmental information system (SEIS). International Journal of Digital Earth 2, 134–154.
Barthel, R., Janisch, S., Schwarz, N., Trifkovic, A., Nickel, D., Schulz, C., Mauser, W., 2008. An integrated modelling framework for simulating regional-scale actor responses to global change in the water domain. Environmental Modelling & Software 23, 1095–1121. http://dx.doi.org/10.1016/j.envsoft.2008.02.004.
Bastiaanssen, W.G.M., Menenti, M., Feddes, R.A., Holtslag, A.A.M., 1998. A remote sensing surface energy balance algorithm for land (SEBAL). 1. Formulation. Journal of Hydrology 212–213, 198–212. http://dx.doi.org/10.1016/S0022-1694(98)00253-4.
Beasley, D.B., Huggins, L.F., Monke, E.J., et al., 1980. ANSWERS: A model for watershed planning. Transactions of ASAE 23, 938–944.
Bessiere, F. (2017) RiverApp | Mobile App für aktuelle Pegelstände für Kanu- und Kajakfahrer auf Smartphone [WWW Document]. http://www.riverapp.net/ (accessed on January 29, 2017).
Beven, K.J., Kirkby, M.J., 1979. A physically based, variable contributing area model of basin hydrology. Hydrological Sciences Bulletin 24, 43–69.
BGR – Bundesamt für Geowissenschaften und Rohstoffe (2017) [WWW Document]. http://www.bgr.bund.de/DE/Home/homepage_node.html (accessed on February 15, 2017).
BMBF (2017) GLOWA – Global Change and the Hydrological Cycle [WWW Document]. http://www.glowa.org/ (accessed on February 13, 2017).
Bogena, H., Kunkel, R., Krüger, K., Zacharias, S., Pütz, T., Schwank, M., Bens, O., Borg, T., Brauer, A., Dietrich, P., Hajnsek, I., Kunstmann, H., Munch, J., Papen, H., Priesack, E., Schmid, H., Teutsch, G., Wollschläger, U., Vereecken, H., 2012. TERENO – Ein langfristiges Beobachtungsnetzwerk für die Global Change Forschung. Hydrologie und Wasserbewirtschaftung 56, 138–143.
Brooks, R.H., Corey, A.T., 1964. Hydraulic properties of porous media. Colorado State University, Fort Collins, CO. Hydrology Papers No. 3.
Brown, G., Weber, D., 2011. Public Participation GIS: A new method for national park planning. Landscape and Urban Planning 102, 1–15. http://dx.doi.org/10.1016/j.landurbplan.2011.03.003.
Bundesanstalt für Gewässerkunde (2017) BfG – The GRDC [WWW Document]. http://www.bafg.de/GRDC/EN/Home/homepage_node.html (accessed on January 29, 2017).
Bundesanstalt für Kartographie und Geodäsie (2017) BKG – Bundesamt für Kartographie und Geodäsie [WWW Document]. https://www.bkg.bund.de/DE/Home/home.html (accessed on February 11, 2017).
Burrough, P.A., McDonnell, R., 1998. Principles of geographical information systems, Spatial information systems. Oxford University Press, Oxford/New York.


Büttner, H., 2016. The stakeholder dialogue in the third project phase of GLOWA-Danube. In: Mauser, W., Prasch, M. (Eds.), Regional assessment of global change impacts. Springer International Publishing, Cham, pp. 49–53. http://dx.doi.org/10.1007/978-3-319-16751-0_5.
Campbell, G.S., 1974. A simple method for determining unsaturated conductivity from moisture retention data. Soil Science 117, 311–314.
Chen, Y., Han, D., 2016. On big data and hydroinformatics. In: Proceedings of the 12th International Conference on Hydroinformatics, HIC 2016 – Smart Water for the Future, 154, 184–191. http://dx.doi.org/10.1016/j.proeng.2016.07.443.
Chen, Y., Neale, C., Cluckie, I., Su, Z., Zhou, J., Huan, Q., Xu, Z. (Eds.), 2015. Remote sensing and GIS for hydrology and water resources: Proceedings RSHS14 and ICGRHWE14, Guangzhou, China, August 2014. Peer-reviewed papers presented at the 3rd Remote Sensing and Hydrology Symposium and the 3rd International Conference of GIS/RS in Hydrology, Water Resources and Environment. IAHS Publication, Wallingford.
Chow, V.T., Maidment, D.R., Mays, L.W., 1988. Applied hydrology, McGraw-Hill series in water resources and environmental engineering. McGraw-Hill, New York.
Cosby, B.J., Hornberger, G.M., Clapp, R.B., Ginn, T.R., 1984. A statistical exploration of the relationships of soil moisture characteristics to the physical properties of soils. Water Resources Research 20, 682–690.
Costa-Cabral, M.C., Burges, S.J., 1994. Digital Elevation Model Networks (DEMON): A model of flow over hillslopes for computation of contributing and dispersal areas. Water Resources Research 30, 1681–1692. http://dx.doi.org/10.1029/93WR03512.
Cressie, N.A.C., 1991. Statistics for spatial data, Wiley series in probability and mathematical statistics: Applied probability and statistics. Wiley, New York.
Daly, C., Neilson, R.P., Phillips, D.L., 1994. A statistical-topographic model for mapping climatological precipitation over mountainous terrain. Journal of Applied Meteorology 33, 140–158. http://dx.doi.org/10.1175/1520-0450(1994)033<0140:ASTMFM>2.0.CO;2.
de Haar, U., 1974. Beitrag zur Frage der wissenschaftssystematischen Einordnung und Gliederung der Wasserforschung. Beiträge zur Hydrologie 2, 85–100.
Deutscher Wetterdienst (2017b) Index von ftp://ftp-cdc.dwd.de/pub/CDC/ [WWW Document]. ftp://ftp-cdc.dwd.de/pub/CDC/ (accessed on January 29, 2017).
Dietrich, W.E., Wilson, C.J., Montgomery, D.R., McKean, J., 1993. Analysis of erosion thresholds, channel networks, and landscape morphology using a digital terrain model. The Journal of Geology 101, 259–278. http://dx.doi.org/10.1086/648220.
DigitalGlobe (2017) DigitalGlobe – See a Better World With High-Resolution Satellite Imagery [WWW Document]. https://www.digitalglobe.com/ (accessed on February 23, 2017).
Dingman, S.L., 2015. Physical hydrology, 3rd edn. Waveland Press, Long Grove, IL.
Dixon, B., Uddameri, V., 2016. GIS and geocomputation for water resource science and engineering. John Wiley & Sons Ltd, West Sussex.
DOI (2017) Multi-Resolution Land Characteristics Consortium (MRLC) [WWW Document]. https://www.mrlc.gov/ (accessed on February 11, 2017).
ECA&D (2017) Home European Climate Assessment & Dataset [WWW Document]. http://www.ecad.eu/ (accessed on January 29, 2017).
Environment Agency UK (2017) Flood information service – GOV.UK [WWW Document]. https://flood-warning-information.service.gov.uk/ (accessed on January 29, 2017).
European Commission (2017a) CORINE Land Cover – Copernicus Land Monitoring Service [WWW Document]. http://land.copernicus.eu/pan-european/corine-land-cover (accessed on January 29, 2017).
European Commission (2017b) INSPIRE Geoportal [WWW Document]. http://inspire-geoportal.ec.europa.eu/ (accessed on January 29, 2017).
European Commission (2017c) The Water Information System for Europe [WWW Document]. http://water.europa.eu/ (accessed on January 29, 2017).
European Commission (2017d) JRC Water Portal [WWW Document]. http://water.jrc.ec.europa.eu/waterportal (accessed on January 29, 2017).
European Commission (2017e) Home – Eurostat [WWW Document]. http://ec.europa.eu/eurostat/ (accessed on January 29, 2017).
European Commission (2017f) European Flood Awareness System (EFAS) [WWW Document]. https://www.efas.eu/ (accessed on January 29, 2017).
European Environmental Agency (2017) Data centre overview – European Environment Agency [WWW Document]. http://www.eea.europa.eu/themes/water/dc (accessed on January 29, 2017).
European Union (2000) Directive 2000/60/EC of the European Parliament and of the Council of 23 October 2000 establishing a framework for Community action in the field of water policy. OJ L 327, p. 1.
Fairfield, J., Leymarie, P., 1991. Drainage networks from grid digital elevation models. Water Resources Research 27, 709–717. http://dx.doi.org/10.1029/90WR02658.
FAO (2017) AQUASTAT – FAO's Information System on Water and Agriculture [WWW Document]. http://www.fao.org/nr/water/aquastat/main/index.stm (accessed on January 29, 2017).
Fienen, M.N., Lowry, C.S., 2012. Social Water – A crowdsourcing tool for environmental data acquisition. Computers & Geosciences 49, 164–169. http://dx.doi.org/10.1016/j.cageo.2012.06.015.
Flügel, W.-A., 1995. Delineating hydrological response units by geographical information system analyses for regional hydrological modelling using PRMS/MMS in the drainage basin of the River Bröl, Germany. Hydrological Processes 9, 423–436. http://dx.doi.org/10.1002/hyp.3360090313.
Formetta, G., Antonello, A., Franceschi, S., David, O., Rigon, R., 2014. Hydrological modelling with components: A GIS-based open-source framework. Environmental Modelling & Software 55, 190–200. http://dx.doi.org/10.1016/j.envsoft.2014.01.019.
Freeman, T.G., 1991. Calculating catchment area with divergent flow based on a regular grid. Computers & Geosciences 17, 413–422. http://dx.doi.org/10.1016/0098-3004(91)90048-I.
Garbrecht, J., Martz, L.W. (1995a) TOPAZ: An automated digital landscape analysis tool for topographic evaluation, drainage identification, watershed segmentation and subcatchment parameterisation: TOPAZ user manual. US Department of Agriculture.
Garbrecht, J., Martz, L.W., 1995b. Advances in automated landscape analysis. In: Espey, W.H., Combs, P.G. (Eds.), Presented at the First International Conference on Water Resources Engineering. American Society of Civil Engineers, San Antonio, TX, pp. 844–848.
Garbrecht, J., Ogden, F.L., DeBarry, P.A., Maidment, D.R., 2001. GIS and distributed watershed models I: Data coverages and sources. Journal of Hydrologic Engineering 6, 506–514. http://dx.doi.org/10.1061/(Asce)1084-0699(2001)6:6(506).
Helmholtz Gemeinschaft (2017) Willkommen bei TERENO – TEODOOR [WWW Document]. http://teodoor.icg.kfa-juelich.de/overview-de (accessed on February 4, 2017).
Gilbert, G.N., Troitzsch, K.G., 2005. Simulation for the social scientist, 2nd edn. Open University Press, Maidenhead/New York, NY.
GLOBE (2017) Overview – GLOBE.gov [WWW Document]. https://www.globe.gov/de/about/overview (accessed on February 6, 2017).
GLOWA (2008) GLOWA_broschuere_eng.pdf [WWW Document]. http://www.glowa.org/eng/glowa_eng/pdf_eng/GLOWA_broschuere_eng.pdf (accessed on February 6, 2017).
GLOWA-Danube (2017a) GLOWA-Danube – A research project in the framework of GLOWA [WWW Document]. http://www.glowa-danube.de/eng/opendanubia/opendanubia.php (accessed on February 6, 2017).
GLOWA-Danube (2017b) GLOWA-Danube – Interaktiver Online Atlas [WWW Document]. http://www.glowa-danube.de/atlas/ (accessed on February 13, 2017).
Gnyp, M.L., Bareth, G., Li, F., Lenz-Wiedemann, V.I.S., Koppe, W., Miao, Y., Hennig, S.D., Jia, L., Laudien, R., Chen, X., Zhang, F., 2014. Development and implementation of a multiscale biomass model using hyperspectral vegetation indices for winter wheat in the North China Plain. International Journal of Applied Earth Observation and Geoinformation 33, 232–242. http://dx.doi.org/10.1016/j.jag.2014.05.006.
Goovaerts, P., 1997. Geostatistics for natural resources evaluation. Oxford University Press, Oxford, 483 pp.
Goovaerts, P., 2000. Geostatistical approaches for incorporating elevation into the spatial interpolation of rainfall. Journal of Hydrology 228, 113–129. http://dx.doi.org/10.1016/S0022-1694(00)00144-X.
GPM (2017) GPM – Global Precipitation Measurement [WWW Document]. http://www.nasa.gov/mission_pages/GPM/main/index.html (accessed on February 11, 2017).
GRACE (2017) Grace Mission [WWW Document]. http://www.nasa.gov/mission_pages/Grace/index.html (accessed on February 23, 2017).
Green, W.H., Ampt, G., 1911. Studies on soil physics. The Journal of Agricultural Science 4, 1–24.
Gumbel, E.J., 1941. The return period of flood flows. Annals of Mathematical Statistics 12, 163–190. http://dx.doi.org/10.1214/aoms/1177731747.
Gurnell, A.M., Montgomery, D.R. (Eds.), 2000. Hydrological applications of GIS, Advances in hydrological processes. John Wiley, New York.


GWP (2017) What is IWRM? – The Challenge – Global Water Partnership [WWW Document]. http://www.gwp.org/en/The-Challenge/What-is-IWRM/ (accessed on February 13, 2017).
Hartkamp, A.D., White, J.W., Hoogenboom, G., 1999. Interfacing geographic information systems with agronomic modeling: A review. Presented at the 89th Annual Meeting of the ASA, Anaheim, CA, 26–31 Oct. 1997. Agronomy Journal 91, 761–772. http://dx.doi.org/10.2134/agronj1999.915761x.
Hazen, A., 1914. Storage to be provided in impounding reservoirs for municipal water supply. Proceedings of the American Society of Civil Engineers 39, 1943–2044.
Hennicker, R., Janisch, S., Kraus, A., Ludwig, M., 2016. DANUBIA: A web-based modelling and decision support system to investigate global change and the hydrological cycle in the upper Danube basin. In: Mauser, W., Prasch, M. (Eds.), Regional assessment of global change impacts. Springer International Publishing, Cham, pp. 19–27. http://dx.doi.org/10.1007/978-3-319-16751-0_2.
HKC (2017) Startseite [WWW Document]. Hochwasserpass. http://hochwasser-pass.com/ (accessed on January 29, 2017).
Horsburgh, J.S., Tarboton, D.G., Maidment, D.R., Zaslavsky, I., 2011. Components of an environmental observatory information system. Computers & Geosciences 37, 207–218. http://dx.doi.org/10.1016/j.cageo.2010.07.003.
Horton, R.E., 1933. The rôle of infiltration in the hydrologic cycle. Eos, Transactions American Geophysical Union 14, 446–460. http://dx.doi.org/10.1029/TR014i001p00446.
Horton, R.E., 1945. Erosional development of streams and their drainage basins; hydrophysical approach to quantitative morphology. Geological Society of America Bulletin 56, 275–370.
Hudson-Smith, A., Batty, M., Crooks, A., Milton, R., 2009. Mapping for the masses: Accessing Web 2.0 through crowdsourcing. Social Science Computer Review 27, 524–538. http://dx.doi.org/10.1177/0894439309332299.
Hurst, H.E., 1951. Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers 116, 770–808.
IGRAC (2017) Global Groundwater Information System (GGIS) | International Groundwater Resources Assessment Centre [WWW Document]. https://www.un-igrac.org/global-groundwater-information-system-ggis (accessed on January 29, 2017).
IGRAC (2017) GGMN – Global Groundwater Network [WWW Document]. https://ggmn.un-igrac.org/ (accessed on January 29, 2017).
IIASA (2017) IIASA – Land Use Change and Agriculture Program [WWW Document]. http://webarchive.iiasa.ac.at/Research/LUC/External-World-soil-database/HTML/ (accessed on February 11, 2017).
International Water Management Institute (2017) Water Data Portal [WWW Document]. http://waterdata.iwmi.org/ (accessed on January 29, 2017).
IRNSS (2017) IRNSS [WWW Document]. http://www.isro.gov.in/irnss-programme (accessed on February 23, 2017).
Jenson, S.K., Domingue, J.O., 1988. Extracting topographic structure from digital elevation data for geographic information-system analysis. Photogrammetric Engineering & Remote Sensing 54, 1593–1600.
Jewell, S.A., Gaussiat, N., 2015. An assessment of kriging-based rain-gauge–radar merging techniques. Quarterly Journal of the Royal Meteorological Society 141, 2300–2313. http://dx.doi.org/10.1002/qj.2522.
Johnson, L.E., 2009. Geographic information systems in water resources engineering. CRC Press, Boca Raton.
Johnson, L.E., LaBadie, J., 2008. Geographic information systems in water resources engineering. CRC Press Inc., Boca Raton/London.
JPL (2017) ASTER Global Digital Elevation Map [WWW Document]. https://asterweb.jpl.nasa.gov/gdem.asp (accessed on February 11, 2017).
Kazakos, W., Reineke, A., Heidmann, C. (2012) GDI Bayern – Geodateninfrastrukturen in der Wasserwirtschaft Bayern. Presented at the EnviroInfo, pp. 783–789.
Kelly, M., Ferranto, S., Lei, S., Ueda, K., Huntsinger, L., 2012. Expanding the table: The web as a tool for participatory adaptive management in California forests. Journal of Environmental Management 109, 1–11. http://dx.doi.org/10.1016/j.jenvman.2012.04.035.
Kornelsen, K.C., Coulibaly, P., 2013. Advances in soil moisture retrieval from synthetic aperture radar and hydrological applications. Journal of Hydrology 476, 460–489. http://dx.doi.org/10.1016/j.jhydrol.2012.10.044.
Korres, W., Reichenau, T.G., Schneider, K., 2013. Patterns and scaling properties of surface soil moisture in an agricultural landscape: An ecohydrological modeling study. Journal of Hydrology 498, 89–102. http://dx.doi.org/10.1016/j.jhydrol.2013.05.050.
Kovar, K. (Ed.), 1993. Application of geographic information systems in hydrology and water resources management: Proceedings of an international conference held in Vienna, Austria, from 19 to 22 April 1993 (HydroGIS '93). IAHS Publication, Wallingford.
Kovar, K. (Ed.), 1996. Application of geographic information systems in hydrology and water resources management: Proceedings of the HydroGIS '96 conference held in Vienna, Austria, from 16 to 19 April 1996. IAHS Publication, IAHS Press, Wallingford.
Krause, P., 2002. Quantifying the impact of land use changes on the water balance of large catchments using the J2000 model. Physics and Chemistry of the Earth, Parts A/B/C 27, 663–673. http://dx.doi.org/10.1016/S1474-7065(02)00051-7.
Lagacherie, P., Rabotin, M., Colin, F., Moussa, R., Voltz, M., 2010. Geo-MHYDAS: A landscape discretization tool for distributed hydrological modeling of cultivated areas. Computers & Geosciences 36, 1021–1032. http://dx.doi.org/10.1016/j.cageo.2009.12.005.
Lake, I.R., Lovett, A.A., Hiscock, K.M., Betson, M., Foley, A., Sünnenberg, G., Evers, S., Fletcher, S., 2003. Evaluating factors influencing groundwater vulnerability to nitrate pollution: Developing the potential of GIS. Journal of Environmental Management 68, 315–328. http://dx.doi.org/10.1016/S0301-4797(03)00095-1.
Laniak, G.F., Olchin, G., Goodall, J., Voinov, A., Hill, M., Glynn, P., Whelan, G., Geller, G., Quinn, N., Blind, M., Peckham, S., Reaney, S., Gaber, N., Kennedy, R., Hughes, A., 2013. Integrated environmental modeling: A vision and roadmap for the future. Environmental Modelling & Software 39, 3–23. http://dx.doi.org/10.1016/j.envsoft.2012.09.006.
LANUF NRW (2017) Landesamt für Natur, Umwelt und Verbraucherschutz NRW [WWW Document]. http://luadb.it.nrw.de/LUA/hygon/pegel.php?karte=nrw (accessed on January 29, 2017).
Le Coz, J., Patalano, A., Collins, D., Guillén, N.F., García, C.M., Smart, G.M., Bind, J., Chiaverini, A., Le Boursicaud, R., Dramais, G., Braud, I., 2016. Crowdsourced data for flood hydrology: Feedback from recent citizen science projects in Argentina, France and New Zealand. Journal of Hydrology 541 (Part B), 766–777. http://dx.doi.org/10.1016/j.jhydrol.2016.07.036.
Lea, N.L., 1992. An aspect driven kinematic routing algorithm. In: Parsons, A.J., Abrahams, A.D. (Eds.), Overland flow: Hydraulics and erosion mechanics. Chapman and Hall, New York, pp. 147–175.
LfL (2017) Bayerische Landesanstalt für Landwirtschaft: Agrarmeteorologie Bayern [WWW Document]. http://www.wetter-by.de/Internet/AM/inetcntrBY.nsf/cuhome.xsp?src=L941ES4AB8&p1=K1M7X321X6&p3=10VER48553 (accessed on January 29, 2017).
Li, J., Heap, A.D., 2014. Spatial interpolation methods applied in the environmental sciences: A review. Environmental Modelling & Software 53, 173–189. http://dx.doi.org/10.1016/j.envsoft.2013.12.008.
Li, S., Dragicevic, S., Castro, F.A., Sester, M., Winter, S., Coltekin, A., Pettit, C., Jiang, B., Haworth, J., Stein, A., Cheng, T., 2016. Geospatial big data handling theory and methods: A review and research challenges. ISPRS Journal of Photogrammetry and Remote Sensing 115, 119–133. http://dx.doi.org/10.1016/j.isprsjprs.2015.10.012.
Lindsay, J.B., 2016. The practice of DEM stream burning revisited. Earth Surface Processes and Landforms 41, 658–668. http://dx.doi.org/10.1002/esp.3888.
LTER (2013) The Long Term Ecological Research Network | Long-term, broad-scale research to understand our world [WWW Document]. https://lternet.edu/ (accessed on February 4, 2017).
LUBW und LfU (2017) Kontakt: Länderübergreifendes Hochwasserportal [WWW Document]. http://www.hochwasserzentralen.de (accessed on January 29, 2017).
Maidment, D.R., 1993a. Handbook of hydrology. McGraw-Hill, New York.
Maidment, D.R., 1993b. GIS and hydrologic modeling. In: Goodchild, M.F., Parks, B.O., Steyaert, L.T. (Eds.), Environmental modeling with GIS. Oxford University Press, Oxford, pp. 147–167.
Maidment, D.R., 2002. Arc Hydro: GIS for water resources. ESRI Press, Redlands, CA.


Martin, P.H., LeBoeuf, E.J., Dobbins, J.P., Daniel, E.B., Abkowitz, M.D., 2005. Interfacing GIS with water resource models: A state of the art review. Journal of the American Water Resources Association 41, 1471–1487. http://dx.doi.org/10.1111/j.1752-1688.2005.tb03813.x.
Martz, L.W., Garbrecht, J., 1992. Numerical definition of drainage network and subcatchment areas from digital elevation models. Computers & Geosciences 18, 747–761. http://dx.doi.org/10.1016/0098-3004(92)90007-E.
Martz, L.W., Garbrecht, J., 1998. The treatment of flat areas and depressions in automated drainage analysis of raster digital elevation models. Hydrological Processes 12, 843–855.
Matheron, G. (1971) The theory of regionalized variables and its applications. École Nationale Supérieure des Mines, Paris.
Mauser, W., Bach, H., 2009. PROMET – Large scale distributed hydrological modelling to study the impact of climate change on the water flows of mountain watersheds. Journal of Hydrology 376, 362–377. http://dx.doi.org/10.1016/j.jhydrol.2009.07.046.
Mauser, W., Prasch, M. (Eds.), 2016. Regional assessment of global change impacts. Springer International Publishing, Cham.
Mauser, W., Schädlich, S., 1998. Modelling the spatial distribution of evapotranspiration on different scales using remote sensing data. Journal of Hydrology 212–213, 250–267. http://dx.doi.org/10.1016/S0022-1694(98)00228-5.
McBratney, A.B., Minasny, B., Cattle, S.R., Vervoort, R.W., 2002. From pedotransfer functions to soil inference systems. Geoderma 109, 41–73. http://dx.doi.org/10.1016/S0016-7061(02)00139-8.
Meijerink, A.M.J. (1994) Introduction to the use of geographic information systems for practical hydrology. International Institute for Aerospace Survey and Earth Sciences (ITC), ITC Publication, Enschede.
Menon, M., Rousseva, S., Nikolaidis, N.P., van Gaans, P., Panagos, P., de Souza, D.M., Ragnarsdottir, K.V., Lair, G.J., Weng, L., Bloem, J., Kram, P., Novak, M., Davidsdottir, B., Gisladottir, G., Robinson, D.A., Reynolds, B., White, T., Lundin, L., Zhang, B., Duffy, C., Bernasconi, S.M., de Ruiter, P., Blum, W.E.H., Banwart, S.A., 2014. SoilTrEC: A global initiative on critical zone research and integration. Environmental Science and Pollution Research 21, 3191–3195. http://dx.doi.org/10.1007/s11356-013-2346-x.
Michel, C., Andréassian, V., Perrin, C., 2005. Soil Conservation Service Curve Number method: How to mend a wrong soil moisture accounting procedure? Water Resources Research 41 (2). http://dx.doi.org/10.1029/2004WR003191.
Minasny, B., McBratney, A.B., Bristow, K.L., 1999. Comparison of different approaches to the development of pedotransfer functions for water-retention curves. Geoderma 93, 225–253.
Mishra, S.K., Jain, M.K., Singh, V.P., 2004. Evaluation of the SCS-CN-based model incorporating antecedent moisture. Water Resources Management 18, 567–589. http://dx.doi.org/10.1007/s11269-004-8765-1.
Mitas, L., Mitasova, H., 1998. Distributed soil erosion simulation for effective erosion prevention. Water Resources Research 34, 505–516. http://dx.doi.org/10.1029/97WR03347.
Monteith, J.L., 1965. Evaporation and environment. In: Symposia of the Society for Experimental Biology, p. 205.
Moore, I.D., 1996. Hydrologic modeling and GIS. In: Goodchild, M.F., Parks, B.O., Steyaert, L.T. (Eds.), GIS and environmental modeling: Progress and research issues. GIS World Books, Fort Collins, CO, pp. 143–148.
Moore, I.D., Larson, C.L., 1979. Estimating micro-relief surface storage from point data. Transactions of the American Society of Agricultural Engineers 22 (5), 1073–1077. http://dx.doi.org/10.13031/2013.35158.
Moore, I.D., Grayson, R.B., Ladson, A.R., 1991. Digital terrain modelling: A review of hydrological, geomorphological, and biological applications. Hydrological Processes 5, 3–30. http://dx.doi.org/10.1002/hyp.3360050103.
Moore, I.D., Turner, A.K., Wilson, J.P., Jenson, S.K., Band, L.E., 1993. GIS and land surface-subsurface modelling. In: Goodchild, M.F., Parks, B.O., Steyaert, L.T. (Eds.), Environmental modeling with GIS. Oxford University Press, Oxford, pp. 196–230.
MUEEF-RLP (2017) Hochwassergefahrenkarten [WWW Document]. http://www.geoportal-wasser.rlp.de/servlet/is/8201/ (accessed on February 6, 2017).
MUNLV-NRW (2017) ELWAS Web [WWW Document]. http://www.elwasweb.nrw.de/elwas-web/index.jsf (accessed on January 29, 2017).
MUNLV-NRW (2017) HWRMRL/Risiko- und Gefahrenkarten – Flussgebiete NRW [WWW Document]. http://www.flussgebiete.nrw.de/index.php/HWRMRL/Risiko-_und_Gefahrenkarten (accessed on February 6, 2017).
MWVLW-RLP (2017) Agrarmeteorologie/Startseite [WWW Document]. http://www.wetter.rlp.de/Internet/global/inetcntr.nsf/dlr_web_full.xsp?src=L941ES4AB8&p1=1PJCNH7DKW&p2=IB26DJ6C96&p3=9IQ84WEY3L&p4=XJPZBV4849 (accessed on January 29, 2017).
NCAR, UCAR (2017) Climate Data Sets | NCAR – Climate Data Guide [WWW Document]. https://climatedataguide.ucar.edu/climate-data (accessed on February 11, 2017).
NOAA (2017) Global Observing Systems Information Center (GOSIC) | National Centers for Environmental Information (NCEI), formerly known as National Climatic Data Center (NCDC) [WWW Document]. https://www.ncdc.noaa.gov/gosic (accessed on January 29, 2017).
NOAA (2017) GCOS Essential Climate Variable (ECV) Data Access | National Centers for Environmental Information (NCEI), formerly known as National Climatic Data Center (NCDC) [WWW Document]. https://www.ncdc.noaa.gov/gosic/gcos-essential-climate-variable-ecv-data-access-matrix (accessed on January 29, 2017).
O'Callaghan, J.F., Mark, D.M., 1984. The extraction of drainage networks from digital elevation data. Computer Vision, Graphics, and Image Processing 28, 323–344. http://dx.doi.org/10.1016/S0734-189X(84)80011-0.
Obled, C., Wendling, J., Beven, K., 1994. The sensitivity of hydrological models to spatial rainfall patterns: An evaluation using observed data. Journal of Hydrology 159, 305–333. http://dx.doi.org/10.1016/0022-1694(94)90263-1.
OGC (2017) OGC Standards | OGC [WWW Document]. http://www.opengeospatial.org/docs/is (accessed on February 6, 2017).
Pontius Jr., R.G., Si, K., 2015. Spatial decision support systems. In: Wright, J.D. (Ed.), International Encyclopedia of the Social & Behavioral Sciences, 2nd edn. Elsevier, Oxford, pp. 136–141.
Pullar, D., Springer, D., 2000. Towards integrating GIS and catchment models. Environmental Modelling & Software 15, 451–459. http://dx.doi.org/10.1016/S1364-8152(00)00023-2.
Quinn, P., Beven, K., Chevallier, P., Planchon, O., 1991. The prediction of hillslope flow paths for distributed hydrological modelling using digital terrain models. Hydrological Processes 5, 59–79. http://dx.doi.org/10.1002/hyp.3360050106.
Ramirez, J.A. (2000) Prediction and modeling of flood hydrology and hydraulics. In: Inland flood hazards: Human, riparian and aquatic communities, p. 498.
Rawls, W.J., Brakensiek, D.L., Saxton, K.E., 1982. Estimation of soil water properties. Transactions of ASAE 25, 1316–1320.
ReKIS (2017) [WWW Document]. http://141.30.160.224/fdm/index.jsp?k=rekis (accessed on January 29, 2017).
Rinner, C., Keßler, C., Andrulis, S., 2008. The use of Web 2.0 concepts to support deliberation in spatial decision-making. Computers, Environment and Urban Systems 32, 386–395. http://dx.doi.org/10.1016/j.compenvurbsys.2008.08.004.
Rodríguez-Iturbe, I., Devoto, G., Valdés, J.B., 1979. Discharge response analysis and hydrologic similarity: The interrelation between the geomorphologic IUH and the storm characteristics. Water Resources Research 15, 1435–1444. http://dx.doi.org/10.1029/WR015i006p01435.
Roessler, I., Duong, S., Hachmeister, C.-D. (2015) Welche Missionen haben Hochschulen? Third Mission als Leistung der Fachhochschulen für die und mit der Gesellschaft. CHE Centrum für Hochschulentwicklung, Arbeitspapier.
Romano, N., Chirico, G.B., 2004. The role of terrain analysis in using and developing pedotransfer functions. In: Developments in Soil Science. Elsevier, Amsterdam, pp. 273–294.
Sanzana, P., Jankowfsky, S., Branger, F., Braud, I., Vargas, X., Hitschfeld, N., Gironás, J., 2013. Computer-assisted mesh generation based on hydrological response units for distributed hydrological modeling. Computers & Geosciences 57, 32–43. http://dx.doi.org/10.1016/j.cageo.2013.02.006.
Saxton, K.E., Rawls, W.J., Romberger, J.S., Papendick, R.I., 1986. Estimating generalized soil-water characteristics from texture. Soil Science Society of America Journal 50, 1031–1036. http://dx.doi.org/10.2136/sssaj1986.03615995005000040039x.
Schaap, M.G., Leij, F.J., van Genuchten, M.T., 1998. Neural network analysis for hierarchical prediction of soil hydraulic properties. Soil Science Society of America Journal 62, 847–855.


Schneider, K., 2003. Assimilating remote sensing data into a land-surface process model. International Journal of Remote Sensing 24, 2959–2980.
Schneider, K., Mauser, W., 1996. Processing and accuracy of Landsat Thematic Mapper data for lake surface temperature measurement. International Journal of Remote Sensing 17, 2027–2041. http://dx.doi.org/10.1080/01431169608948757.
Segond, M.-L., Wheater, H.S., Onof, C., 2007. The significance of spatial rainfall representation for flood runoff estimation: A numerical evaluation based on the Lee catchment, UK. Journal of Hydrology 347, 116–131. http://dx.doi.org/10.1016/j.jhydrol.2007.09.040.
Sherman, L.K., 1932. Streamflow from rainfall by the unit-graph method. Engineering News-Record 108, 501–505.
Singh, V.P., Fiorentino, M. (Eds.), 1996. Geographical information systems in hydrology, Water Science and Technology Library. Springer, Dordrecht. http://dx.doi.org/10.1007/978-94-015-8745-7.
Speth, P. (Ed.), 2010. Impacts of global change on the hydrological cycle in West and Northwest Africa. Springer, New York.
STEB (2017) Hochwassergefahrenkarten Köln [WWW Document]. http://www.hw-karten.de/koeln/ (accessed on January 29, 2017).
Sui, D.Z., Maggio, R.C., 1999. Integrating GIS with hydrological modeling: Practices, problems, and prospects. Computers, Environment and Urban Systems 23, 33–51. http://dx.doi.org/10.1016/S0198-9715(98)00052-0.
Swain, N.R., Latu, K., Christensen, S.D., Jones, N.L., Nelson, E.J., Ames, D.P., Williams, G.P., 2015. A review of open source software solutions for developing water resources web applications. Environmental Modelling & Software 67, 108–117. http://dx.doi.org/10.1016/j.envsoft.2015.01.014.
Tabios, G.Q., Salas, J.D., 1985. A comparative analysis of techniques for spatial interpolation of precipitation. Journal of the American Water Resources Association 21, 365–380. http://dx.doi.org/10.1111/j.1752-1688.1985.tb00147.x.
Tarboton, D.G., 1997. A new method for the determination of flow directions and upslope areas in grid digital elevation models. Water Resources Research 33, 309–319. http://dx.doi.org/10.1029/96WR03137.
Tarboton, D.G., Bras, R.L., Rodriguez-Iturbe, I., 1991. On the extraction of channel networks from digital elevation data. Hydrological Processes 5, 81–100. http://dx.doi.org/10.1002/hyp.3360050107.
Tarboton, D.G., Bras, R.L., Rodriguez-Iturbe, I., 1992. A physical basis for drainage density. Geomorphology 5, 59–76. http://dx.doi.org/10.1016/0169-555X(92)90058-V.
Tobler, W.R., 1970. A computer movie simulating urban growth in the Detroit region. Economic Geography 46, 234–240. http://dx.doi.org/10.2307/143141.
Tribe, A., 1992. Automated recognition of valley lines and drainage networks from grid digital elevation models: A review and a new method. Journal of Hydrology 139, 263–293. http://dx.doi.org/10.1016/0022-1694(92)90206-B.
Trzaska, S., Schnarr, E. (2014) A review of downscaling methods for climate change projections. U.S. Agency for International Development, Tetra Tech ARD, pp. 1–42.
UNESCO (2017) Global Water Forum | Data and Tools.
European Union (2007) Directive 2007/2/EC of the European Parliament and of the Council of 14 March 2007 establishing an Infrastructure for Spatial Information in the European Community (INSPIRE).
United Nations (2017) UN-Water: Home [WWW Document]. http://www.unwater.org/ (accessed on January 29, 2017).
USDA, 2004. Chapter 10: Estimation of direct runoff from storm rainfall. In: National Engineering Handbook: Hydrology. Natural Resources Conservation Service, Washington, DC.
USDA (2017) Web Soil Survey [WWW Document]. https://websoilsurvey.sc.egov.usda.gov/App/HomePage.htm (accessed on February 15, 2017).
USGS (2017a) Elevation Products | Earth Resources Observation and Science (EROS) Center [WWW Document]. https://eros.usgs.gov/elevation-products (accessed on February 11, 2017).
USGS (2017b) National Elevation Dataset (NED) | The Long Term Archive [WWW Document]. https://lta.cr.usgs.gov/NED (accessed on February 11, 2017).
USGS (2017c) Light Detection and Ranging (LIDAR) | The Long Term Archive [WWW Document]. https://lta.cr.usgs.gov/lidar_digitalelevation (accessed on February 11, 2017).
USGS (2017d) Shuttle Radar Topography Mission (SRTM) | The Long Term Archive [WWW Document]. https://lta.cr.usgs.gov/SRTM (accessed on February 11, 2017).
USGS (2017e) U.S. Geological Survey – National Hydrography Dataset [WWW Document]. https://nhd.usgs.gov/wbd.html (accessed on February 11, 2017).
USGS (2017f) USGS Water Data for the Nation [WWW Document]. https://waterdata.usgs.gov/nwis/ (accessed on February 11, 2017).
USGS (2017g) Water Resources of the United States – National Water Information System (NWIS) Mapper [WWW Document]. https://maps.waterdata.usgs.gov/mapper/index.html (accessed on January 29, 2017).
van Genuchten, M.T., 1980. A closed-form equation for predicting the hydraulic conductivity of unsaturated soils. Soil Science Society of America Journal 44, 892–898.
Vereecken, H., Weynants, M., Javaux, M., Pachepsky, Y., Schaap, M.G., van Genuchten, M.T., 2010. Using pedotransfer functions to estimate the van Genuchten–Mualem soil hydraulic properties: A review. Vadose Zone Journal 9, 795–820. http://dx.doi.org/10.2136/vzj2010.0045.
Vieux, B.E., 2016. Distributed hydrologic modeling using GIS, Water Science and Technology Library, 3rd edn. Springer, Dordrecht.
Viviroli, D., Zappa, M., Gurtz, J., Weingartner, R., 2009. An introduction to the hydrological modelling system PREVAH and its pre- and post-processing tools. Environmental Modelling & Software 24, 1209–1222. http://dx.doi.org/10.1016/j.envsoft.2009.04.001.
Vogt, J.V., Colombo, R., Bertolo, F., 2003. Deriving drainage networks and catchment boundaries: A new methodology combining digital elevation data and environmental characteristics. Geomorphology 53, 281–298. http://dx.doi.org/10.1016/S0169-555X(02)00319-7.
Wagner, P.D., Fiener, P., Wilken, F., Kumar, S., Schneider, K., 2012. Comparison and evaluation of spatial interpolation schemes for daily rainfall in data scarce regions. Journal of Hydrology 464–465, 388–400. http://dx.doi.org/10.1016/j.jhydrol.2012.07.026.
Wainwright, J., Mulligan, M., 2013. Environmental modelling: Finding simplicity in complexity, 2nd edn. Wiley, Chichester, West Sussex/Hoboken, NJ.
Walter, M.T., Walter, M.F., Brooks, E.S., Steenhuis, T.S., Boll, J., Weiler, K., 2000. Hydrologically sensitive areas: Variable source area hydrology implications for water quality risk assessment. Journal of Soil and Water Conservation 55, 277–284.
Warren, S.D., Hohmann, M.G., Auerswald, K., Mitasova, H., 2004. An evaluation of methods to determine slope using digital elevation data. CATENA 58, 215–233. http://dx.doi.org/10.1016/j.catena.2004.05.001.
Waters, C.N., Zalasiewicz, J., Summerhayes, C., Barnosky, A.D., Poirier, C., Gałuszka, A., Cearreta, A., Edgeworth, M., Ellis, E.C., Ellis, M., Jeandel, C., Leinfelder, R., McNeill, J.R., Richter, D. deB., Steffen, W., Syvitski, J., Vidas, D., Wagreich, M., Williams, M., Zhisheng, A., Grinevald, J., Odada, E., Oreskes, N., Wolfe, A.P., 2016. The Anthropocene is functionally and stratigraphically distinct from the Holocene. Science 351, aad2622. http://dx.doi.org/10.1126/science.aad2622.
Webster, R., Oliver, M.A., 2007. Geostatistics for environmental scientists, 2nd edn. John Wiley & Sons, Chichester.
Wechsung, F., Kaden, S., Behrendt, H., Klöcking, B., 2008. Integrated analysis of the impacts of global change on environment and society in the Elbe Basin. Weißensee Verlag, Berlin.
Weedon, G.P., Balsamo, G., Bellouin, N., Gomes, S., Best, M.J., Viterbo, P., 2014. The WFDEI meteorological forcing data set: WATCH Forcing Data methodology applied to ERA-Interim reanalysis data. Water Resources Research 50, 7505–7514. http://dx.doi.org/10.1002/2014WR015638.
Deutscher Wetterdienst (2017a) Wetter und Klima – Deutscher Wetterdienst – Startseite [WWW Document]. http://www.dwd.de/DE/Home/home_node.html (accessed on January 29, 2017).
White, T., Sharkey, S., 2016. Critical Zone. http://dx.doi.org/10.1093/obo/9780199363445-0055.
Wilson, J.P., Aggett, G., Yongxin, D., Lam, C.S., 2008. Water in the landscape: A review of contemporary flow routing algorithms. In: Zhou, Q., Lees, B., Tang, G. (Eds.), Advances in digital terrain analysis. Springer, Berlin/Heidelberg, pp. 213–236.
WMO (2017) World Meteorological Organization Extranet | www.wmo.int [WWW Document]. http://www.wmo.int/pages/index_en.html (accessed on January 29, 2017).
WMO (2017) Hydrology and Water Resources Programme (HWRP) | WMO [WWW Document]. http://www.wmo.int/pages/prog/hwrp/index_en.php (accessed on January 29, 2017).
World Resources Institute (2017) Aqueduct | World Resources Institute [WWW Document]. http://www.wri.org/our-work/project/aqueduct/ (accessed on January 29, 2017).


World Resources Institute (2017) Aqueduct Global Flood Analyzer | World Resources Institute [WWW Document]. http://www.wri.org/resources/maps/aqueduct-global-flood-analyzer (accessed on January 29, 2017).
Wösten, J., Finke, P., Jansen, M., 1995. Comparison of class and continuous pedotransfer functions to generate soil hydraulic characteristics. Geoderma 66, 227–237.
Wösten, J., Pachepsky, Y.A., Rawls, W., 2001. Pedotransfer functions: Bridging the gap between available basic soil data and missing soil hydraulic characteristics. Journal of Hydrology 251, 123–150.
WSV (2017) Pegelonline [WWW Document]. http://www.pegelonline.wsv.de/gast/start (accessed on January 29, 2017).
Wupperverband (2017) FluGGS – Wupperverband [WWW Document]. http://fluggs.wupperverband.de/v2p/web/fluggs (accessed on January 29, 2017).
Young, R.A., Onstad, C.A., Bosch, D.D., Anderson, W.P., 1989. AGNPS: A nonpoint-source pollution model for evaluating agricultural watersheds. Journal of Soil and Water Conservation 44, 168–173.
Zhang, W., Montgomery, D.R., 1994. Digital elevation model grid size, landscape representation, and hydrologic simulations. Water Resources Research 30, 1019–1028. http://dx.doi.org/10.1029/93WR03553.

2.05 GIS Applications in Geomorphology

Jan-Christoph Otto and Günther Prasicek, University of Salzburg, Salzburg, Austria
Jan Blöthe and Lothar Schrott, University of Bonn, Bonn, Germany
© 2018 Elsevier Inc. All rights reserved.

2.05.1 Introduction
2.05.2 Land Surface Parameters and Geomorphological Indices
2.05.3 Data Sources
2.05.3.1 Digital Terrain Models
2.05.3.2 Optical Imagery
2.05.3.2.1 Optical satellite imagery
2.05.3.2.2 Unmanned aerial vehicles and structure from motion
2.05.3.3 Other Data Sources
2.05.4 Digital Geomorphological Mapping
2.05.4.1 Map Creation
2.05.4.2 Automated Land Surface Classification
2.05.4.2.1 General land surface classification
2.05.4.2.2 Specific land surface classification
2.05.5 Application of Various Geomorphological Indices for Process and Landform Analysis – Case Study Obersulzbach Valley, Eastern Alps, Austria
2.05.5.1 Hillslopes and Gravitational Processes
2.05.5.2 Glacier Environments
2.05.5.3 Periglacial Environments
2.05.5.4 Fluvial Environments
2.05.5.5 Sediment Flux and Erosion in Mountain Areas
2.05.6 Conclusions
References
Relevant Websites

2.05.1 Introduction

Modern geomorphological research is inextricably linked with geospatial technology and geographic information systems (GIS). Driven by rapid technological advances in remote sensing, geodesy, photogrammetry, computer science, and GIS, the application of analysis tools using digital information on the land surface has revolutionized quantitative geomorphological research (Bishop, 2013). In the last three decades, GIS has increasingly influenced various fields of geomorphology. GIS are designed to facilitate spatial investigations, for example through geostatistical analyses or the mathematical description of surfaces, and are hence inherently linked to methodology and concepts in geomorphology. GIS tools support and enable many active research fields in geomorphology, from the quantitative description of landforms to process modeling, the investigation of form–process interrelations and linkages to climate and environmental conditions, and the assessment of sediment flux. Furthermore, process and form modeling, statistical analysis and regionalization of field data, as well as graphical visualization and map creation are key features of GIS applied in geomorphology. A common starting point for GIS studies is the digital elevation model (DEM), supplemented with image data of various types (see section "Data Sources"). However, GIS tools also allow linking remotely sensed information with field data recorded with geopositioning systems, for example, land surface features, process rates, or subsurface information.

The roots of the first geomorphographic relief analyses can be identified in the early studies of Penck (1894). His pioneering ideas on landforms led to the establishment of taxonomical structures that have been used in many subsequent studies (e.g., Ahnert, 1970; Kugler, 1975; Evans, 1972). A new era in the application of GIS in geomorphological studies, however, started almost 100 years later, in the 1990s. Classic papers by Dikau et al. (1991), Moore et al. (1991), Pike and Dikau (1995), and Wilson and Gallant (2000) focused on digitally derived landform classifications and general geomorphometric advances using DEMs. First applications of GIS to traditional geomorphological topics such as landslides, soil erosion, and mountain permafrost distribution were successful on regional or local scales (Chairat and Delleur, 1993; Deroo et al., 1989; Dikau and Jäger, 1995; Eash, 1994; Jäger, 1997; Keller, 1992; van Westen and Terlien, 1996; Koethe and Lehmeier, 1993). Since the late 1990s, an increasing use of GIS in geomorphological studies can be observed (see Fig. 1). This development is strongly related to advances in computer science, remote sensing and photogrammetric techniques, as well as shallow geophysics (Bishop, 2013).


[Fig. 1 is a chart: x-axis "Year" (1990–2015), y-axis "Number of papers" (0–60), annotated with the release of GTOPO30 (1 km), the SRTM DEM (90 m and 30 m), the ASTER GDEM (30 m), and the prerelease of TanDEM-X (1 m).]
Fig. 1 Total annual number of papers explicitly including "GIS" in the title, abstract, or keywords, published in four international journals of geomorphology from 1989 to 2016 (data source: Web of Knowledge). Release dates of global DEM data sets have been included as benchmarks of data availability. Data from 1989–2009 are taken from Oguchi, T. and Wasklewicz, T.A. (2011). Geographic information systems in geomorphology. In: Gregory, K.J. and Goudie, A.S. (eds.) The SAGE handbook of geomorphology. London: SAGE.

In particular, the availability of global digital terrain data sets has boosted applications and research in GIS for land surface and process analysis. On a global scale, DEMs with resolutions between 1 and 30 m are now available for the entire terrestrial landmass. In addition, laser scanning (LIDAR: light detection and ranging) and structure from motion (SFM) techniques, both ground- and air-based, provide high-resolution DEMs (<1 m) on local and regional scales. Additionally, numerous GIS software tools, both commercial and open source, are available today, opening vast opportunities for scientists. As a consequence, the use of GIS tools for geomorphological analyses has become increasingly popular. Comprehensive reviews on basic elements of remote sensing techniques and the application of GIS in geomorphological research are provided by Bishop (2013) and Oguchi and Wasklewicz (2011).

Applications of GIS in geomorphology span from pure visualization approaches, landform classification, land surface and hydrological analysis, process and erosion modeling, and topographic change detection to hazard susceptibility modeling. While many applications focusing on land surface analysis, change detection, or hazard modeling are performed within dedicated GIS software, some approaches use statistical software (e.g., the R software package) or specialized modeling software (e.g., Matlab or IDL, among others) to perform geospatial analysis. For example, modeling of erosional processes and landform evolution often has requirements that exceed the capabilities of GIS software, and such models are built using other resources (e.g., Chen et al., 2014; Coulthard, 2001; Tucker and Hancock, 2010; also refer to https://csdms.colorado.edu/ for a list of available models).

While GIS software became more powerful and even provided advanced graphical tools, a simultaneous increase in geomorphological mapping cannot be observed. This is somewhat surprising, because the overlay of different geomorphological, litho-, and pedological information is one of the most important tools in GIS applications and improves the applicability of maps (Otto and Smith, 2013). Nevertheless, geomorphological mapping and GIS became a self-evident combination, and geomorphological symbol sets are designed for specific purposes and frequently used (Gustavsson et al., 2006; Otto and Dikau, 2004; Schoeneich, 1993). Moreover, geomorphological maps now serve as an intermediate product for quantitative sediment budget analyses. For this purpose, GIS-based modeling of landforms is combined with subsurface information such as soil or regolith thickness derived from geophysical surveys. The knowledge gained on the spatial distribution of sediment storage types plays an important role in quantitative sediment budget studies (Otto et al., 2009; Schrott et al., 2003b; Theler et al., 2008).


Many useful GIS modeling approaches have been developed in the field of natural hazards. Rockfalls, landslides, floods, avalanches, and soil erosion share inherent characteristics of hazards such as magnitude or spatial extent and depend strongly on slope angle, aspect, or other parameters that can be ideally integrated and displayed in GIS environments (e.g., Gruber and Mergili, 2013; Gruber and Bartelt, 2007; Lan et al., 2007; van Westen and Terlien, 1996; Wilford et al., 2004; Wichmann and Becht, 2006). Hazard assessment using GIS often combines geomorphometric analysis with geostatistical analysis of related parameters to generate models of spatial susceptibility (Carrara and Guzzetti, 1995). Comprehensive reviews concerning methodological aspects and GIS-based hazard assessments can be found in Guzzetti et al. (1999), Huabin et al. (2005), and van Westen et al. (2008). This article gives an overview of various GIS applications in geomorphology. We introduce basic principles of parameters and indices used for landform and process analysis and briefly highlight typical and innovative data sources and references to geomorphological mapping. Instead of reviewing the vast number of GIS applications in the literature, we illustrate GIS capabilities for geomorphology by applying a selection of tools and indices in a case study area in alpine terrain (Obersulzbach Valley, Austria, European Alps). The selection touches various fields of geomorphology but is far from complete. Applications in the following fields are presented:

(i) hillslope and gravitational processes,
(ii) glacial processes,
(iii) periglacial processes,
(iv) fluvial processes, and
(v) sediment flux and erosion in mountain areas.

The results of the case study can be accessed online on a WebGIS application (https://tinyurl.com/webgis-book-chapter).

2.05.2 Land Surface Parameters and Geomorphological Indices

Quantitative analysis of the land surface is defined by the term geomorphometry, a highly active research field within geomorphology (Hengl and Reuter, 2009). Its focus is on the quantification of land surface parameters (LSPs) and the detection of objects from digital elevation data. In turn, geomorphometry as a research area builds a theoretical foundation and serves as a bridge between GIS and geomorphology (Dikau, 1996). Geomorphometric analysis can be separated into general and specific approaches (Evans, 1972; Goudie, 1990). The main distinction between the two approaches is the continuous or discontinuous character of the object in focus. General approaches analyze the continuous land surface without addressing specific landforms or boundaries. Specific geomorphometry aims to identify and describe discrete landforms and their morphological characteristics. One focus of specific approaches is the extraction of these forms from a continuous surface (see below), an issue that is at the research frontier of geomorphometry (Evans, 2012). LSPs are geometrical or statistical attributes of a land surface that can be derived directly from a DEM. They can be quantified locally or involve a regional analysis approach (Olaya, 2009). While local parameters are quantified for a single location in relation to its immediate surrounding cells, regional parameters include relations to more distant cells. The most common basic LSPs, altitude, aspect, slope, and curvature, are examples of local parameters. Regional parameters include aspects of flow over the surface and are utilized, for example, in the modeling of hydrological conditions or the calculation of viewsheds or solar radiation (Gruber and Peckham, 2009; Böhner and Antonic, 2009). Examples of hydrological LSPs are flow direction, flow accumulation, and drainage network architecture. Relating LSPs to the three fundamental concepts in geomorphology, (i) form, (ii) process, and (iii) material (Gregory and Lewin, 2014), we can identify curvature and slope as principal descriptors of form; altitude, slope, and contributing drainage area as influential factors of process activity (in the case of fluvial and gravitational processes); and surface roughness as an indicator of surface material characteristics (Otto et al., 2012). Based on these basic parameters, numerous topographic or geomorphological indices have been developed to study geomorphological form and process configurations (Table 1). Geomorphological indices are combinations of the primary attributes that describe or characterize the spatial variability of specific processes or landforms occurring in the landscape and can be used for landform and process analysis or landscape comparison (Pike and Wilson, 1971; Wilson and Gallant, 2000). These indices are applied in erosional process modeling, hydrological modeling, or digital soil mapping, to name just a few (Marthews et al., 2015; Moore et al., 1991). Many geomorphological indices were formulated before the rise of GIS, originating from classical works from the early days of quantitative geomorphology (e.g., Leopold et al., 1964; Strahler, 1952, 1957; Bagnold, 1960). GIS tools, however, facilitate the quantification of these parameters and, in combination with DEMs, enable rapid application of these indices over large areas. It must be acknowledged, however, that some indices are connected to a distinct spatial scale and their application makes sense on large scales only. They serve, for example, for comparing drainage basin characteristics or landform assemblages (e.g., drainage density, hypsometry, elevation relief ratio, among others). Other indices, for example terrain or surface roughness, can be applied on several scales.
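As a simple illustration of local LSPs, the following minimal sketch derives slope and aspect from a gridded DEM held in a NumPy array; the cell size, sign conventions, and the toy input are assumptions for demonstration, not tied to any particular GIS package.

```python
# Minimal sketch: slope and aspect as local LSPs derived from a gridded DEM
# stored in a NumPy array (assumed square cells, north-up orientation).
import numpy as np

def slope_aspect(dem, cellsize):
    """Return slope (degrees) and aspect (degrees clockwise from north)."""
    # np.gradient differentiates along rows and columns; rows of a
    # north-up raster increase southwards, so the row derivative is negated.
    dzdy, dzdx = np.gradient(dem, cellsize)
    dzdy = -dzdy
    slope = np.degrees(np.arctan(np.hypot(dzdx, dzdy)))
    # Aspect = compass direction of steepest descent (-gradient).
    aspect = np.degrees(np.arctan2(-dzdx, -dzdy)) % 360.0
    return slope, aspect

# Toy example: a plane dipping east at 45 degrees (10 m cells).
dem = np.tile(np.arange(50.0, 0.0, -10.0), (5, 1))
slope, aspect = slope_aspect(dem, cellsize=10.0)
print(slope[2, 2], aspect[2, 2])  # -> 45.0, 90.0 (east-facing)
```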


2.05.3 Data Sources

2.05.3.1 Digital Terrain Models

An abundance of gridded elevation data has been produced from different sources within the last decades. Extent and resolution of such datasets have grown with computing power and digital storage capacities. Modern geomorphological studies typically employ DEMs with a resolution between 1 and 90 m. Low-resolution DEMs (cell size ≥ 30 m) are widely used for large-scale analyses of landscape evolution. The extensive, often global coverage of low-resolution DEMs allows large-scale analyses and comparisons between study areas worldwide.

Table 1 Frequently used indices in GIS applications in geomorphology (collected from various sources, see references in table; online resource: http://gis4geomorphology.com/)

Channel sinuosity. The sinuosity index (SI) describes the ratio of the sinuous length (measured down the centerline of the channel) to the straight-line distance of a reach. A sinuosity of 1 describes a completely straight channel; ratios around 1.5 refer to sinuous channels, while channels with higher ratios are considered meandering channels (Burbank and Anderson, 2011). Formula: SI = Lc/Lv, where SI = channel sinuosity, Lc = length of channel, Lv = length of valley.

Drainage density. Drainage density (DD) of a catchment is the total line length of the stream network divided by catchment area. High density values potentially reveal maturity of the channel system, rapid surface runoff and low infiltration rates, or thin vegetation cover (Horton, 1932). Formula: DD = sum(L)/A, where DD = drainage density, L = length of channel, A = basin area.

Hypsometry, hypsometric index, hypsometric curves. Hypsometry is a measure of the relationship between elevation and area in a catchment; catchment hypsometry may reveal local flood response and erosional maturity. The hypsometric integral (HI) expresses the elevation/relief ratio and is often used as an estimate of the erosional development of a catchment. Strahler (1952) described HI values above 0.60 as indicating "actively uplifting" or "young" basins, while intermediate or straight hypsometric curves (HI around 0.50) suggest a relatively stable landscape. Please note that the complex interplay of climatic and tectonic factors, as well as sedimentation and rock resistance, may produce similarly shaped curves; the HI may hence be somewhat ambiguous and needs to be evaluated carefully. Hypsometric curves plot normalized relief against the catchment's normalized cumulative area. The curve's shape may reveal the dominant geomorphic processes in the catchment (diffusive/fluvial): convexity indicates a larger portion of the catchment's area (volume of rock and soil) in the higher parts of the catchment, where diffusive hillslope processes dominate; concavity implies a larger portion of the catchment's area at lower elevation and more channelized, linear, fluvial or alluvial processes. Hypsometric curves and integrals were first introduced by Strahler (1952) and are commonly calculated for catchments. If the HI is calculated using a regular kernel (e.g., a 3 × 3 cell window), it is referred to as the elevation relief ratio (ERR). Formula: HI = (Emean − Emin)/(Emax − Emin), where HI = hypsometric integral, E = elevation.

Mountain front sinuosity. A classic index of tectonic activity, based on the notion that straight mountain fronts tend to lie along active faults (Burbank and Anderson, 2011). Formula: Smf = Lmf/Ls, where Smf = mountain front sinuosity, Lmf = sinuous length measured along a path at the break of mountain slope and alluvial fan, Ls = length of the mountain front segment (straight line).

Relief ratio. Catchment relief (km) divided by catchment length (km); units km/km.

Stream frequency. Stream frequency (F) counts all stream segments per unit area of a catchment to describe the stream network's texture, which is strongly governed by bedrock and surficial material properties (strength, fracture density, infiltration, mass wasting tendencies). Formula: F = N/A, where F = stream frequency (total number of channels per unit area), N = number of channels of all stream orders, A = catchment area.

Terrain or surface roughness (ruggedness). (A) Relative topographic position (also: topographic position index) estimates terrain ruggedness and serves as an index for local elevation; the topographic position of each pixel is a relative metric based on its local neighborhood. It is used to identify landscape patterns corresponding to environmental (geomorphic, vegetational, etc.) factors and is also applicable to bathymetric data. Formula: (DEMsmooth − DEMmin)/(DEMmax − DEMmin), where DEMsmooth = smoothed elevation raster (10 × 10 pixels), DEMmin = minimum elevation raster, DEMmax = maximum elevation raster. (B) Standard deviation of elevation is a statistical measure of topographic roughness. Formula: (DEMmean − DEM)/DEMrange, where DEMmean = mean elevation raster, DEMrange = raster containing the range of elevation values, DEM = original elevation raster. (C) Slope variability calculates the slope relief based on a slope raster and its local cell neighborhood (e.g., > 100 m). Formula: SV = Smax − Smin, where SV = slope variability, Smax = maximum slope value raster, Smin = minimum slope value raster. (D) Basin-scale ruggedness (Rb) compares the relief of catchments using streamlines and basin boundary polygons. Formula: Rb = A/DD, where Rb = catchment ruggedness index, A = area of the catchment, DD = drainage density (calculated from the segmented streamline; see also: Drainage density). (E) Standard deviation of residual topography compares the ratio of surface height and an averaged surface on a local cell neighborhood (Grohmann et al., 2011). Formula: STD(DEM − DEMaverage); the averaged surface can be low-pass filtered, for example. (F) Standard deviation of slope (Smith, 2014). Formula: STD(slope).

Valley width-to-height ratio. The valley width-to-height ratio (Vf) compares erosional patterns between catchments (one Vf value per catchment) based on values derived from a DEM or aerial photos along a single cross-section per catchment. Vf was originally used to distinguish V-shaped valleys (low Vf values, often close to 0) from U-shaped valleys (higher Vf values) (Burbank and Anderson, 2011). Formula: Vf = 2Vfw/[(Eld − Esc) + (Erd − Esc)], where Erd = elevation of the river-right valley divide (ridgeline), Eld = elevation of the river-left valley divide (ridgeline), Esc = elevation of the valley floor, Vfw = width of the valley floor.

Glaciality index. The glaciality index (GI) measures the concavity of a valley flank based on the idea that glacial valleys have a parabolic cross-section. It is defined as the exponent of a power-law fitted to the valley flank, after Svensson (1959), and was calculated automatically by Prasicek et al. (2015). A GI of 1 represents a fluvial valley, and progressively higher exponents indicate progressively more U-shaped valleys. Formula: y = a·x^b, where b = power-law exponent.

Steepness index (stream power). The steepness index (ks) is the factor of a power-law describing the drainage area–channel slope relation after Flint (1974). Drainage area can be combined with channel slope to derive the stream power or steepness index, a simple metric for the ability of a stream to incise into bedrock (Flint, 1974). It should be emphasized that channel slope represents elevation change over flow path length and hence differs from the topographic gradient as calculated in most GIS. The exponent q depicts the shape of the power-law that describes the relation between channel slope and A. This relation can only be described by a single power-law (i.e., uniform stream power regardless of location) if (i) influencing factors such as climate and rock type are homogeneous and (ii) the topography is steady over time (Whipple and Tucker, 1999). If q is known and fixed, ks becomes ksn, the normalized steepness index, and can be used to identify a change in stream power and hence deviations from the spatial and temporal constraints mentioned previously. q generally ranges between 0.25 and 0.7 (Tucker and Whipple, 2002; Whipple, 2004; Whipple et al., 2013), but most studies use a reference value of 0.5 (Hack, 1957) or 0.45 (Whipple et al., 2013). Formula: ks = A^q (∂H/∂L), where ks = steepness index, A = drainage area, q = concavity index, ∂H/∂L = channel slope.

Gradient index. The gradient index (SL) (Hack, 1973) allows the identification of breaks in channel geometry based on the assumption that channel length L is related to channel slope (Hack, 1957). Formula: SL = (∂H/∂L) · L, where SL = gradient index, ∂H/∂L = channel slope, L = channel length from the divide.

Topographic wetness index (TWI). A parameter describing the tendency of a location to accumulate water. TWI was developed within the hydrological runoff model TOPMODEL (Beven and Kirkby, 1979) and has been applied in studies on soil moisture, soil chemistry, and species distribution (Marthews et al., 2015). Formula: TWI = ln[A/tan β], where A = upslope area draining through a certain point per unit contour length, β = local slope.

Connectivity index (IC). The connectivity index (IC) focuses on the influence of topography on sediment flux. It is intended to represent the linkage between different parts of the catchment and aims, in particular, at evaluating the potential connection between hillslopes and features of interest like channels, sinks, and sediment storage landforms (Cavalli et al., 2013). Formula: IC = log10(Dup/Ddn), with the upslope component Dup = W·S·√A and the downslope component Ddn = Σi di/(Wi·Si), where W = weighting factor (based on roughness), S = average slope gradient of the upslope/downslope contributing area, A = contributing area, d = length of the per-cell flow path in the steepest downslope direction.
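As a worked example for one Table 1 entry, the short sketch below computes the hypsometric integral HI = (Emean − Emin)/(Emax − Emin) from a catchment's elevation values; the elevation array is an invented illustration, not data from the case study.

```python
# Worked example for the hypsometric integral of Table 1:
# HI = (Emean - Emin) / (Emax - Emin). Elevations are invented values.
import numpy as np

def hypsometric_integral(elevations):
    e = np.asarray(elevations, dtype=float)
    return (e.mean() - e.min()) / (e.max() - e.min())

elev = np.array([850.0, 1200.0, 1650.0, 2100.0, 2700.0, 3400.0])  # m
print(round(hypsometric_integral(elev), 2))  # -> 0.44, an intermediate curve
```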

Medium to high resolution datasets (cell size < 30 m and ≥ 1 m) are typically national grids with a more limited extent and are a good choice for regional modeling of different LSPs. Submeter resolutions are mostly produced by individual campaigns and are spatially limited to single catchments or landscape patches. Such data sets are indispensable for detailed analyses of weathering processes, soil erosion, and rock wall retreat. Acquisition techniques vary and comprise active (radar, LiDAR) and passive (optical) remote sensing. While terrestrial and airborne LiDAR dominated the acquisition of high-resolution elevation models over the last two decades, photogrammetry has experienced a renaissance due to affordable drone technology and very high-resolution DEMs from SFM techniques. The most widely used global DTMs with a resolution < 90 m are the elevation data of the shuttle radar topography mission (SRTM) (Farr et al., 2007), available from 60 degrees north to 60 degrees south with a resolution of 1 arc second (approximately 30 m at the equator), and the ASTER GDEM (Gesch et al., 2012), a global DSM derived from satellite imagery via stereo photogrammetry and available from 83 degrees north to 83 degrees south with a resolution of 1 arc second. Furthermore, HydroSHEDS (Hydrological data and maps based on Shuttle Elevation Derivatives at multiple Scales) (Lehner et al., 2008) is derived from different SRTM versions and provides hydrographic information (e.g., imprinted river networks) along with elevation data at a resolution of 3 arc seconds. The radar interferometry-based WorldDEM (http://www.intelligence-airbusds.com/worlddem/), developed within the TanDEM-X mission by the German Aerospace Center, was completed in October 2016 and for the first time provides a global terrain model with a resolution of 12 m (Zink et al., 2006, 2011).

2.05.3.2 Optical Imagery

2.05.3.2.1 Optical satellite imagery

Geomorphologists have been using remotely sensed imagery since it became available during the first half of the 20th century. Carl Troll was one of the first physical geographers who systematically used aerial imagery for the emerging field of geomorphology (Lautensach, 1959). While conventional aerial photography is still widely applied for local studies, satellite remote sensing has become a useful tool when looking at larger areas. With the launch of the first Landsat satellite in 1972, conventional earth observation from space entered a new era. It was now possible to survey large areas continuously from space, revisiting the same locality in only 18 days' time. Since then, low earth orbit (160–2000 km above ground) has become packed with satellites from different agencies. In addition to the large fleet of satellites launched by the US National Aeronautics and Space Administration (NASA) since the 1970s, several other national space agencies and private companies launched their own earth observation missions, for example, the SPOT 1–7 satellites (launched between 1986 and 2014) by CNES (France), the IRS family of satellites (launched between 1988 and 1996) operated by ISRO (India), the WorldView 1–4 satellites (DigitalGlobe) (launched between 2007 and 2016), and the Sentinel 1–3 satellites by ESA (launched between 2015 and 2017), to name only a few.



With the ongoing development of new sensors (and the launch of new satellites), the spatial ground resolution of optical imagery has been enhanced very quickly. While the first Landsat satellites had a maximum ground resolution of 60 m (resampled), the WorldView-3 satellite's panchromatic channel collects images with a ground resolution of 31 cm, a level of detail that only aerial surveys could achieve before. Services like Google Earth or Bing Maps make these high-resolution images available for visual interpretation and comparison of different time steps. For geomorphological applications, the temporal resolution of satellite images might be at least as important as the ever-increasing spatial resolution of newly launched missions. For the detection of geomorphological changes, both long time series of satellite imagery and short revisit times of the same locality are of vital importance. Increasing computational power but also political decisions during the past decade have boosted the use of satellite remote sensing products in geosciences (Wulder and Coops, 2014). Since a change in data policy in 2008, millions of images acquired by the Landsat family of satellites between 1972 and today have become available at no cost. The Landsat 8 satellite, launched in 2013 to ensure data continuity with nearly the same specifications as before, collects several hundred images each day, revisiting the same location every 16 days. The free distribution of these images via the USGS EarthExplorer (https://earthexplorer.usgs.gov) has spurred the use of satellite products in geomorphological research, enabling the detection of geomorphological change from space over more than four decades. Shortly after 2008, geoscientific studies making use of freely available Landsat imagery increased sharply (Fig. 2). Landsat imagery is not the only high-resolution imagery available at no cost. In 2016, data recorded by the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER), a sensor onboard the Terra satellite (sensor resolution: 15–90 m), became freely available as well (https://asterweb.jpl.nasa.gov/). Since 2014, Sentinel satellite imagery from the European Space Agency (ESA) has been distributed via the ESA Science Hub (https://scihub.copernicus.eu) at no cost. Launched in 2015, the Sentinel-2A satellite already collects optical imagery at 10 m ground resolution with a revisit interval of 10 days at the equator, delivering ground resolution and spectral specifications comparable to Landsat and ASTER imagery. Together with the Sentinel-2B satellite (launched in early 2017; phased at 180 degrees with Sentinel-2A), the Sentinel-2 mission will deliver optical imagery with a revisit time of 5 days at the equator (2–3 days in mid-latitudes). Alongside the remote sensing products, ESA offers a stand-alone desktop Sentinel-2 Toolbox that features a set of processing and visualization tools for Sentinel-2 imagery, but also for other ESA and third-party remote sensing data. The potential for geomorphological research to gain knowledge from the enormous amount of remote sensing products is huge. One of the most prominent applications in geomorphology is the detection of glacial changes from space (Kääb et al., 2016; Paul et al., 2015), but multitemporal satellite imagery has also been applied to landslide studies (Scaioni et al., 2014; Stumpf et al., 2017), fluvial geomorphology (Legleiter and Fonstad, 2012; Rowland et al., 2016), and coastal erosion (Hara et al., 2015; Li and Damen, 2010).
It is beyond the scope of this article to give a full overview of potential applications; the following example shall merely highlight the potential of satellite imagery to (a) pinpoint geomorphological change in remote regions and (b) determine rates of certain processes from space. Despite potential obstruction by intensive cloud cover, the high spatiotemporal resolution of Sentinel-2A makes it possible to detect geomorphological change in remote areas. In early 2016, a rock avalanche detached from the northeastern flank of the Cerro

Fig. 2 Web of Science search results for different keywords: remote sensing (n = 2111), Landsat (n = 626), photogrammetry (n = 324), and structure from motion (n = 217). Results show the absolute number of publications per year that have the keyword listed as topic; the opening of the Landsat archive is marked. Search results were refined by Web of Science categories: geosciences multidisciplinary, physical geography, environmental sciences, and geology. Data from webofknowledge.com.


Fig. 3 Observation of geomorphological processes in remote regions: multitemporal Sentinel-2A optical imagery (10 m ground resolution) of a rock avalanche at the Cerro Alto San Juan massif on the border of Argentina and Chile. (A) Image acquired on 26 January 2017 showing the rock avalanche in very clear conditions. (B) Image acquired on 22 January 2016, shortly before the rock avalanche detached. (C) Image acquired on 04 February 2016 showing the freshly deposited material on the glacier. Image courtesy of ESA.

Alto San Juan, situated on the border between Argentina and Chile (Fig. 3). The mass detached at around 5200 m and dropped onto the large glacier descending from the massif. The rock avalanche deposit covers an area of roughly 1 km2 (> 2000 m length; 500 m width). Repeated Sentinel-2A imagery narrows the detachment down to the time span between 22 January 2016 and 04 February 2016. Comparison with an image obtained on 26 January 2017 shows that the debris has been transported 50–100 m on top of the moving glacier.

2.05.3.2.2 Unmanned aerial vehicles and structure from motion

Regarding ground resolution, conventional aerial photography still outperforms satellite imagery by far. The major downside of conventional aerial photography, however, is the cost-intensive data acquisition from small planes or helicopters. Technological advances during the past decade have revitalized the use of aerial photography in geosciences. Unmanned aerial vehicles (UAVs), also referred to as drones or multicopters, are available at low cost and make data acquisition cheap and relatively easy. Especially in remote areas and in steep and rugged terrain, where shadowing effects limit the use of conventional aerial photography, the acquisition of high-resolution aerial photography from UAV platforms offers new possibilities. Furthermore, repeated surveys make geomorphological change detection feasible also on small scales that remote sensing data from satellites cannot resolve in comparable detail. Along with the advances in data acquisition, newly developed image processing software and algorithms offer new opportunities for geoscientific research. Digital photogrammetry, grounded in the same principles as classical photogrammetry, is a powerful method for extracting digital topography from overlapping images (Baltsavias et al., 2001; Keutterling and Thomas, 2006; Lane et al., 2000). In the last couple of years, SFM, a new low-budget photogrammetric method to create DEMs from overlapping images,


has caught the attention of the geoscience community (Fig. 2). In contrast to classical photogrammetry, knowledge of the picture geometry and of the orientation and position of the cameras relative to the object of interest is not a prerequisite. SFM software calculates these parameters by matching common features in a set of overlapping digital images (Snavely et al., 2008). For a comprehensive summary of SFM tools and their applications in geosciences, see Westoby et al. (2012).

2.05.3.3 Other Data Sources

GIS offers numerous possibilities to combine data sets from different sources for analysis and visualization. Besides DEM and image data, all kinds of field and lab data can be imported into GIS for analysis. Data on surface material characteristics produced by sampling, coring, or near-surface geophysics, mapping data, as well as measurements of process rates and lab data, for example, sediment analysis or dating information, can be combined with digital land surface data. A prerequisite for the combination of field/lab and digital data is correct geopositioning. Global positioning systems (GPSs) have become a standard requisite for field work. Modern GPS receivers combined with correction signals transferred via mobile communications deliver high-accuracy positioning data. Additionally, low-resolution positioning is available in every smartphone, which is almost ubiquitous today. Great potential lies in the combination of surface and subsurface information. Especially with the commonly applied techniques of near-surface geophysics, landforms can be studied in three dimensions. The most frequently used techniques in geomorphology are ground penetrating radar, seismic methods, resistivity and electromagnetic (EM) methods, and gravity methods (Kruse, 2013; Schrott and Sass, 2008). Since most geophysical systems have a distinct data format and many methods deliver data along a survey line, the transfer of subsurface information often requires a conversion of, for example, depth information into point or line data before further processing in GIS software is possible.
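To make this conversion step concrete, the following hedged sketch turns depth readings recorded along a straight geophysical survey line into georeferenced 3-D points ready for GIS import; the coordinates, distances, depths, and surface elevations are invented illustration values.

```python
# Hedged sketch: depth readings along a straight geophysical survey line are
# converted into georeferenced 3-D points for GIS import. All coordinates,
# distances, depths, and surface elevations are made-up illustration values.
import numpy as np

def survey_line_to_points(start_xy, end_xy, distances, depths, surface_elev):
    """Interpolate map coordinates along the profile and attach depths."""
    start = np.asarray(start_xy, dtype=float)
    end = np.asarray(end_xy, dtype=float)
    t = np.asarray(distances, dtype=float) / np.linalg.norm(end - start)
    xy = start + np.outer(t, end - start)          # easting, northing
    z = np.asarray(surface_elev, float) - np.asarray(depths, float)
    return np.column_stack([xy, z])                # columns: x, y, z

points = survey_line_to_points(
    start_xy=(351200.0, 5232400.0), end_xy=(351300.0, 5232400.0),
    distances=[0.0, 25.0, 50.0, 75.0, 100.0],      # m along the profile
    depths=[2.1, 3.4, 5.0, 3.8, 2.6],              # m below the surface
    surface_elev=[1812.0, 1814.5, 1816.0, 1817.2, 1818.0],
)
print(points)  # ready to be written to a point shapefile or GeoPackage
```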

2.05.4 Digital Geomorphological Mapping

2.05.4.1 Map Creation

Geomorphological mapping is a fundamental tool for geomorphologists. Geomorphological maps are highly complex thematic maps that contain various layers of geomorphological information. Digital map creation requires good map design and sophisticated graphical tools to produce a readable and understandable map. The development of graphical and analytical functions in GIS software provides numerous valuable tools that facilitate map creation and distribution. Compared to analog methods of map creation, the application of GIS software tools represents a significant simplification of the production process and an important reduction of creation time and production costs. Within GIS software, manipulation and analysis of various types of geomorphological information, for example, delineation, measurement, mathematical operations and others, as well as the design and production of the map are possible. Furthermore, the logical storage structure of geomorphological data enables rapid production of derivative maps with a special thematic focus, like process domains, surface processes, surface material, or others. Geomorphological maps are created using either data gathered during field campaigns and/or data extracted from digital data sources like aerial photography, satellite imagery, and DEMs. Field mapping is significantly enhanced using mobile devices like tablets or handheld computers connected with GPS. Field mapping software, usually a GIS-type software, enables direct collection of observations into a georeferenced database system that can later be transferred to the desktop GIS used for map creation (Gustavsson et al., 2008; Minar et al., 2005). The mapping process can be performed manually, automated, or semiautomated. Manual mapping relies on the experience and competence of the mapper using visual heuristics to identify landforms of interest. The method is simple and rapid to deploy, and accuracy is generally high. Automated or semiautomated mapping allows generation of more objective and repeatable information but usually falls behind in accuracy compared to manual approaches. Corresponding methods rely on feature extraction techniques applied to satellite/aerial imagery or different types of DEMs and their derivatives (see section "Automated land surface classification"). The representation of a landform on an image is dependent upon (i) the landform itself, (ii) the data source, and (iii) the visualization method (Otto and Smith, 2013; Smith, 2011). Smith and Wise (2007) identified three main controls on the representation of landforms on images: (i) relative size: the size of the landform relative to the spatial resolution, (ii) azimuth biasing: the orientation of the landform with respect to solar azimuth, and (iii) landform signal strength: the tonal/textural differentiation of the landform. Consequently, the relative reflectance of the landform in relation to surrounding features determines the detectability of a landform. DEMs are applied using derivatives of elevation that provide various forms of visualization of inherent information, including relief shading, gradient (slope angle), or curvature classification. The complex content of geomorphological maps is depicted using compound and often illustrative symbols (Otto et al., 2011). GIS software provides tools for the creation of custom symbols representing geomorphological features and functions for cartographic design and map production.
Digitally produced maps are easily distributed in various formats ranging from print maps to online web services, making full use of the data organization structure and the georeferencing of the data (Smith et al., 2013). Additionally, the standard PDF (Portable Document Format) has been extended into a GeoPDF for display and dissemination of referenced map data. Geospatial functionality of a GeoPDF includes scalable map display, layer visibility control, access to attribute data, coordinate queries, and spatial measurements (www.terragotech.com).


2.05.4.2 Automated Land Surface Classification

2.05.4.2.1 General land surface classification

The extraction of discrete entities from a continuous digital image has been a main research field within geomorphometry for decades. Evans (2012) stresses the need to distinguish between landform and land surface form. The difference between the two arises from the discontinuity or continuity of the feature, respectively (referring to the concept of general and specific geomorphometry, see the previous discussion). Tasks related to the description of the continuous land surface are here addressed as general land surface classification, while discontinuous landforms will be treated in the following section on specific land surface classification. Theoretical frameworks for general land surface classification have been developed by several authors, mostly addressing basic LSPs such as elevation, slope, aspect, or curvature. Based on these characteristics, the continuous land surface is split into discrete parts termed land surface elements. Their main characteristic is geometric homogeneity (Minár and Evans, 2008). Dikau (1989), for example, proposed nine landform elements, defined by their profile and plan curvature, to represent the building blocks of the land surface (Fig. 4). MacMillan et al. (2000) and Schmidt and Hewitt (2004) both used plan

Fig. 4 (A) Relief classification using the approach by Dikau (1989). (B) Basic landform elements based on curvature conditions in two directions (profile and plan curvature each convex, straight, or concave, giving the nine classes X/X, SF/X, V/X, X/SL, SF/SL, V/SL, X/V, SF/V, and V/V). Based on Dikau, R. (1989). The application of a digital relief model to landform analysis in geomorphology. In: Raper, J. F. (ed.) Three dimensional applications in geographical information systems. London: Taylor & Francis.


and profile curvature, slope, and slope position for automated segmentation of landforms into landform elements based on DEMs, heuristic rules, and fuzzy logic. They extend the model of Dikau (1989) by nine elements, with classes for ridges, peaks, valleys, spurs, terraces, hollows, plains, saddles, and slope position. Shary et al. (2005) presented 12 slope types, while Minár and Evans (2008) even distinguished between 25 elementary landforms. The main function of these elementary forms lies in their relation to process dynamics rather than the delineation of discrete real landforms. Curvature changes evoke an acceleration or deceleration of gravity-driven flow and result in dispersion or concentration of transported matter (Minár and Evans, 2008). However, general land surface classification can be used as a starting point for specific classification approaches (Drăguţ and Blaschke, 2006).
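The curvature-based building blocks of Fig. 4 lend themselves to a very small amount of code. The sketch below labels each DEM cell with a Dikau-style element code from the signs of two curvature proxies; the use of plain second derivatives instead of true profile/plan curvature, the flatness tolerance, and the toy surface are simplifying assumptions of this sketch.

```python
# Sketch of a Dikau-style element classification: each cell is labelled by
# the sign of two curvature proxies (X = convex, SF/SL = straight in
# profile/plan, V = concave). Plain second derivatives stand in for true
# profile/plan curvature, a deliberate simplification.
import numpy as np

def classify_dikau(dem, cellsize, flat_tol=1e-4):
    d2row = np.gradient(np.gradient(dem, cellsize, axis=0), cellsize, axis=0)
    d2col = np.gradient(np.gradient(dem, cellsize, axis=1), cellsize, axis=1)

    def sign_class(curv, straight_code):
        lab = np.full(curv.shape, straight_code, dtype="<U2")
        lab[curv > flat_tol] = "V"    # concave (curving upward)
        lab[curv < -flat_tol] = "X"   # convex
        return lab

    prof = sign_class(d2row, "SF")    # straight in profile
    plan = sign_class(d2col, "SL")    # straight in plan
    return np.char.add(np.char.add(prof, "/"), plan)   # e.g. "X/SL"

# Toy surface: convex ridge profile in one direction, planar in the other.
x = np.linspace(0.0, 40.0, 9)
dem = np.add.outer(-0.05 * x ** 2, 0.5 * x)
print(classify_dikau(dem, cellsize=5.0))
```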

2.05.4.2.2 Specific land surface classification

Specific landform classification ambitiously aims at combining single pixels or landform elements into landforms as perceived by experts and hence is somewhat more subjective and adds a lot of additional complexity (Hengl and Reuter, 2009). Attempts to classify entire scenes or at least specific landform types have only partially been successful. This is mostly owed to the exceptional complexity inherent in Earth's landscapes. The surface of real-world landscapes, or even small parts of it, can never be exactly matched by mathematical representations of landforms. As a consequence, variations in the appearance of landforms have to be considered and measures of similarity have to be applied. Beyond these technical issues, different geomorphological processes may produce similar landforms, a mechanism that is known as "equifinality" and considerably hampers the interpretation of landscape shape. Analysis of remote sensing data, including DEMs as well as aerial and satellite imagery, is mainly performed in a pixel-based manner. Each pixel is treated separately and only a few attributes are at hand for characterization and classification. The adoption of image segmentation and object classification techniques from computer science to GIS applications (Blaschke and Strobl, 2001) allowed these limitations to be overcome, which can be particularly useful for the automated interpretation of complex landforms. Image segments are regions that are automatically merged from pixels according to one or more criteria of homogeneity in one or more dimensions of feature space (e.g., spectral reflectance). The resulting objects can be described and classified using additional spectral (e.g., mean, median, variance), geometrical (e.g., circularity), textural, and hierarchical information (Blaschke, 2010). Schneevoigt et al. (2008) combined ASTER satellite imagery with DEM data to detect 20 different alpine landform types. They report an overall accuracy of 92% with good results for talus slopes, free rock faces and cirque walls, but detection problems for fluvial, glacial, and debris flow deposits. Among the most successful approaches for a gapless classification of a landscape into individual landforms, Anders et al. (2011) employed an object-based approach to extract karst, glacial, fluvial, and denudation landforms from a high-resolution DTM and reached an overall accuracy of about 70%. Focusing on specific landforms only, d'Oleire-Oltmanns et al. (2013) and Eisank et al. (2014) reached accuracies around 60% when attempting to automatically map gullies and drumlins. Object-based image analysis is a promising technique for landform classification, but clearly further research is needed to handle the exceptional complexity of real-world landscapes in automated routines of landform mapping.

2.05.5 Application of Various Geomorphological Indices for Process and Landform Analysis: Case Study Obersulzbach Valley, Eastern Alps, Austria

Instead of reviewing the vast number of studies that apply GIS in geomorphological research, we now present the application of selected topographic indices and geomorphological analyses for a local test site in the Eastern European Alps. The Obersulzbach valley is located north of the main divide in the Hohe Tauern range in Austria and represents a southern tributary of the Salzach river (Fig. 5). It covers an area of 81 km2 of complex high alpine relief between 850 and 3657 m. Glacial imprint dominates surface morphology and is manifested by numerous cirques in the upper reaches and pronounced U-shaped sections along the main channel of the valley. The longitudinal profile of the main valley shows two pronounced steps, one located about 4 km into the valley and the other located 6 km from the valley head (Fig. 6). The first step separates the deeply incised, rather V-shaped lower level that connects to the Salzach valley from a U-shaped upper level. The upper step separates the U-shaped section from the cirque-like valley head area. Glaciers are still present in some of the cirques and cover large parts of the valley head, summing up to 17% glacier coverage in total. The Obersulzbachkees glacier at the valley head covered an area of 15 km2 in the Little Ice Age and is now split into several glacier parts of 9 km2 area in total (last glacier extent from 2009; data: Fischer et al., 2015). A large proglacial lake has formed in the valley (at the second pronounced step) since the late 1990s. The calculation of the geomorphological indices is based on a DEM with a cell size of 10 m (provided by data.gv.at under the INSPIRE framework). We used ArcGIS, SAGA GIS, and TauDEM for the analysis and ArcGIS for visualization. The results of the case study can be accessed online on a WebGIS application (https://tinyurl.com/webgis-book-chapter).

2.05.5.1 Hillslopes and Gravitational Processes

Process dynamics are governed by surface topography, which impacts energy potentials, for example, through intensification or attenuation of friction or diffusion, or through concentration of flow. In the absence of convergent flow and moving ice, hillslope processes are mostly controlled by surface slope and curvature. Erosion is most commonly modeled by hillslope diffusion, where sediment transport depends only on slope and a diffusivity constant (Montgomery and Foufoula-Georgiou, 1993; Dadson and Church, 2005;


Fig. 5 (A) Location of the study area. (B) Aerial image of the Obersulzbach valley, Eastern European Alps, Austria. Free orthophoto data from basemap.at.

Fig. 6 Longitudinal profile of the Sulzbach Creek draining the Obersulzbach Valley (elevation and channel slope plotted against distance). Note the changes in channel slope along the path and the distinct steps along the profile (cf. text for details).


Fig. 7 Hillslope length index in the study area.

Egholm et al., 2012; Tucker and Bras, 1998; Kirkby, 1987). This makes basic LSPs such as slope and curvature very valuable for evaluating hillslope processes. Surface slope controls how gravitational acceleration is split into retentive and detaching forces, and profile curvature holds information on how this relation changes. Convexity indicates areas of increasing slope, increased sediment transport capacity and hence erosion, while concavity represents reduced transport capacity, which leads to deposition of material. The length and relief of the hillslopes hold information on forcing at a range of scales (e.g., Grieve et al., 2016) and play an important role in the universal soil loss equation for estimating soil erosion (e.g., Hickey et al., 1994; Moore and Burch, 1986; Liu et al., 2000) and in slope susceptibility for shallow landslides (e.g., Carrara, 1983; Gómez and Kavzoglu, 2005). Furthermore, following Salcher et al. (2014), glacial landscapes tend to show longer hillslopes and greater hillslope relief than fluvial ones. For our study area, we calculated both hillslope length and relief with TauDEM (Tarboton et al., 1991) and observe this trend only in distinct cirques on the western valley flank. However, less dissected terrain can generally be expected to have longer hillslopes (Fig. 7). Another example of assessing the impact of surface characteristics on hillslope processes is the surface roughness index. Roughness, also termed ruggedness or microtopography, describes the local variability of elevation on a given scale. On a large scale, the roughness of a land surface is controlled by the number, size, and distribution of landforms and is expressed by landforms and landform elements or breaklines. On small scales, the roughness of a single landform results from material properties, the processes acting upon it, and the time since formation (Grohmann et al., 2011). In this case it signifies the grain size of surficial deposits or surface smoothing due to erosion. The parameter is applied, for example, in fluvial geomorphology to characterize bed morphologies and fluvial erosion, in hydrological modeling, in weathering studies, for landslide detection, and for relative age dating (Smith, 2014). In permafrost studies, surface roughness controls energy transmission into the ground and fosters permafrost formation (Otto et al., 2012). Numerous approaches exist to quantify surface roughness (see Table 1; Grohmann et al., 2011; Smith, 2014). We have applied two ways to calculate roughness in the study site: (1) standard deviation (STD) of slope (Fig. 8A) and (2) STD of residual topography (Fig. 8B). Both calculations use a focal statistics tool with a 3 × 3 cell analysis window. Residual topography results from the difference between the surface height and a low-pass filtered height. STD of slope describes sensitivity to changes in curvature, while STD of residual topography represents changes in altitude and hence gradient. Both approaches are sensitive to major breaklines and steps, especially at the cirque–main valley boundary (Fig. 8). They allow for a delineation of linear landforms, like ridges, lateral moraines, or steep walls. They also depict differences in roughness between bedrock slopes, for example, cirque walls, and sediment-covered cirque floors (see eastern valley flanks).
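Both roughness measures reduce to generic focal statistics. The sketch below reproduces the idea with SciPy rather than a specific GIS toolbox; the window size, the synthetic terrain, and the variance-based focal STD are assumptions for illustration.

```python
# Generic focal-statistics sketch of the two roughness measures: (1) STD of
# slope and (2) STD of residual topography, both in a 3 x 3 window. SciPy
# stands in for the focal statistics tool of a GIS; the terrain is synthetic.
import numpy as np
from scipy import ndimage

def focal_std(arr, size=3):
    """Moving-window standard deviation via E[x^2] - E[x]^2."""
    mean = ndimage.uniform_filter(arr, size=size)
    mean_sq = ndimage.uniform_filter(arr * arr, size=size)
    return np.sqrt(np.maximum(mean_sq - mean * mean, 0.0))

def roughness(dem, cellsize, size=3):
    dzdy, dzdx = np.gradient(dem, cellsize)
    slope = np.degrees(np.arctan(np.hypot(dzdx, dzdy)))
    std_slope = focal_std(slope, size)
    # Residual topography: DEM minus a low-pass filtered DEM.
    residual = dem - ndimage.uniform_filter(dem, size=size)
    return std_slope, focal_std(residual, size)

rng = np.random.default_rng(0)
dem = rng.normal(0.0, 1.0, (50, 50)).cumsum(axis=0) + 2000.0
std_slope, std_residual = roughness(dem, cellsize=10.0)
print(std_slope.mean(), std_residual.mean())
```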

2.05.5.2 Glacier Environments

GIS play an important role in analyzing and visualizing glaciers and glacially sculpted landscapes (compare chapter on GIS for Glaciers). GIS are used by glacial geomorphologists to integrate multisource data, manage multiscale studies, identify spatial



Fig. 8 Surface roughness calculated using the standard deviation of local slope (A) and the residual topography approach (B) in the study area (extract). See text for explanations.

and temporal relationships and patterns in geomorphological data, and to link landform data with numerical models as part of model calibration and verification (Napieralski et al., 2007). When looking at the ice itself, mass balance constraints are indispensable for understanding glacier development, and GIS are ideal to process and visualize related data. For example, GIS can be used to visualize manually mapped glacier extent or even to apply automated mapping routines based on DEM and spectral data (Racoviteanu et al., 2008). If the glacier extent is known, the altitude of the equilibrium line (ELA), the virtual border between accumulation and ablation area that is central for mass balance assessment, can be calculated by applying different methods such as the accumulation area ratio (AAR).
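A minimal sketch of the AAR idea follows: with an AAR of 0.6, the accumulation area (above the ELA) covers 60% of the glacier, so the ELA can be approximated as the 40th percentile of the glacier's cell elevations. The toy hypsometry below is invented, not the Obersulzbach data.

```python
# Sketch of the AAR method: with AAR = 0.6, the accumulation area (above the
# ELA) covers 60% of the glacier, so the ELA is the 40th percentile of the
# glacier's cell elevations (equally sized raster cells assumed).
import numpy as np

def ela_from_aar(cell_elevations, aar=0.6):
    """ELA for equally sized raster cells of a single glacier."""
    return float(np.percentile(cell_elevations, 100.0 * (1.0 - aar)))

elev = np.linspace(2500.0, 3400.0, 1000)   # toy glacier elevations (m)
print(ela_from_aar(elev, aar=0.6))         # -> ~2860 m
```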

Fig. 9 Equilibrium line altitudes of the glaciers of the Obersulzbach Valley.

The ELAs for the glaciers of our study area range between 2400 and 3300 m (Fig. 9). The distribution of the glaciers in the Obersulzbach valley already indicates the importance of climatic influences. While glaciers still exist on the slopes facing east and north, western slopes only show remnants of past ice cover. The climatic influence is further indicated by the relation between ELA and aspect: while northeast-facing glaciers feature the lowest ELAs, south-facing glaciers have the highest ELAs. When applying the AAR method for ELA calculation, the mass balance of the reference glaciers needs to be considered. If these glaciers were not fully adapted to the climatic conditions during the ELA measurements, any deviation in mass balance is transferred to the ELAs calculated with the AAR method. Once the ELA is known, be it from field campaigns or GIS-based models, the glacier surface can be split into accumulation and ablation area. Shear stress models can be used to derive ice thickness estimates and hence calculate ice volume and water equivalent (Huss and Farinotti, 2012; Frey et al., 2013). Mass balance-based models can be applied to calculate ice flux and to retrieve similar results. We applied the model GlabTOP2, developed by Frey et al. (2013), to the current glacier extent of our study site. The model calculates ice thickness from slope and shear stress relationships based on a simplified term presented by Haeberli and Hölzle (1995). Modeled ice thickness ranged from less than 25 m for many small cirque glaciers to maximum values of more than 200 m for the larger glaciers at the valley head (Fig. 10).
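The per-cell thickness term behind such shear stress models can be sketched in a few lines: h = τ/(f·ρ·g·sin α), with τ estimated from the glacier's elevation range following the widely cited parameterization of Haeberli and Hölzle (1995). The shape factor, the slope floor, and the test values below are assumptions of this sketch, not the calibrated GlabTOP2 settings.

```python
# Sketch of the thickness term h = tau / (f * rho * g * sin(alpha)), with
# basal shear stress tau parameterized from the glacier's elevation range
# after Haeberli and Hoelzle (1995). Constants and inputs are assumptions.
import numpy as np

RHO, G, F = 900.0, 9.81, 0.8            # ice density, gravity, shape factor

def basal_shear_stress(dz_km):
    """tau in Pa from the glacier's elevation range (km), capped at 1.5 bar."""
    tau_bar = 0.005 + 1.598 * dz_km - 0.435 * dz_km ** 2
    return min(tau_bar, 1.5) * 1.0e5    # 1 bar = 100 kPa

def ice_thickness(slope_deg, dz_km):
    alpha = np.radians(np.maximum(slope_deg, 5.0))   # avoid sin(0) blow-up
    return basal_shear_stress(dz_km) / (F * RHO * G * np.sin(alpha))

slopes = np.array([10.0, 20.0, 35.0])                # surface slope (degrees)
print(ice_thickness(slopes, dz_km=1.2))              # thickness in metres
```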


Fig. 10 Ice thickness modeling using the model GlabTOP2 applied to the glaciers in the study area (Frey et al., 2013).

or hypsometric indices (e.g., Prasicek et al., 2015; Smith et al., 2006). Furthermore, a variety of GIS-based models has been applied to assess the volume of glacial valley fill (e.g., Mey et al., 2016; Jaboyedoff and Derron, 2005a). The hypsometry of glaciated landscapes has been used to predict the state of glacial landscape evolution and to distinguish between glacial and fluvial terrain (Brocklehurst and Whipple, 2004; Sternai et al., 2011). In our example, we quantify the U-shapedness of the valleys in our study area based on the glaciality index (GI), derived from parabolas fitted to the valley flanks for all flow path cells (Prasicek et al., 2014, 2015). To analyze the glaciers themselves, we use outlines mapped by Abermann et al. (2009). We calculate the approximate ELA of all glaciers in the study area employing an AAR of 0.6, which is in the center of the range of reported values (e.g., Porter, 1975; Benn and Lehmkuhl, 2000; Gross et al., 1977). For calculating U-shapedness as a proxy for the degree of glacial imprint, we automatically determine valley width using a multiscale curvature approach (Prasicek et al., 2014) and subsequently fit power-laws to the valley flanks (Prasicek et al., 2015). These calculations are performed for flow path cells only and then averaged for a spatially continuous result. Results show that deeply incised and dissected parts of the study area have a fluvial GI and hence rather straight valley flanks, while the main trough and distinct cirques show increased glacial imprint. However, the effect of valley fill producing flat valley floors, as well as of other deposits, on valley cross-sectional shape needs to be considered for interpretation (Fig. 11).
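The core of the GI calculation is an ordinary power-law fit. The sketch below recovers the exponent b of y = a·x^b by linear regression in log-log space for two synthetic flanks; the cross-section geometries are invented, and the bare regression is a simplification of the published multiscale procedure.

```python
# Sketch of the GI idea: fit y = a * x**b to a valley flank (height y above
# the valley floor vs. distance x from the valley axis) by regression in
# log-log space; the exponent b is the GI (~1 fluvial, ~2 parabolic/glacial).
import numpy as np

def glaciality_index(x, y):
    b, _log_a = np.polyfit(np.log(x), np.log(y), 1)   # slope = exponent b
    return b

x = np.linspace(10.0, 500.0, 50)                # distance from valley axis (m)
print(glaciality_index(x, 0.8 * x))             # straight flank   -> ~1.0
print(glaciality_index(x, 0.004 * x ** 2.0))    # parabolic flank  -> ~2.0
```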

Fig. 11 Glaciality index of U-shapedness in the study area.

2.05.5.3 Periglacial Environments

The land surface is strongly interrelated with climate on regional and local scales. The spatial differentiation of near-ground atmospheric processes and climate variables is dominantly controlled by topography. This impact of topography on climate is expressed by the term topoclimate or topoclimatology (Böhner and Antonic, 2009). In geomorphology, topoclimatic effects are related to the spatial distribution and variable morphological characteristics of landforms. Especially landforms of process domains sensitive to climatic influences, such as glacial, periglacial, and fluvial processes, show interrelations with topoclimatic parameters such as aspect, altitude, or slope angle (Olyphant, 1977; Allen, 1998; Humlum, 1998; Sattler et al., 2016). Within periglacial geomorphology, GIS have been used to model and visualize permafrost distribution since the early 1990s (Riseborough et al., 2008). Two classic models originally applied in the Swiss Alps, namely PERMAKART (Keller, 1992) and PERMAMAP (Funk and Hoelzle, 1992), can be considered the starting point of GIS applications in permafrost research. The model PERMAKART is based on empirical geomorphological evidence concerning permafrost occurrence (e.g., lower limit of active rock glaciers, basal temperature of snow) and integrates, besides perennial snow avalanche deposits (which protect the ground surface from radiation), classic topoclimatic parameters such as altitude and aspect. These LSPs are used as proxy information for air temperature and solar radiation, respectively, two influential factors in the formation of mountain permafrost. Additionally, slope position and slope

Fig. 12 Topoclimatic key used in the GIS-based permafrost distribution model PERMAKART 3.0 (Schrott et al., 2013).

angle are used to represent the influence of snow cover on permafrost. The distribution modeling thus consists of a regionalization approach based on these LSPs, using a DEM and the empirical data on permafrost occurrence classified by altitude and aspect, the so-called topoclimatic key. Over the years, the original model structure has been modified and improved several times. One of the latest GIS-based empirical permafrost models comprises a topoclimatic key of 24 different relief classes, subdivided into eight classes of aspect, each of them divided into three slope categories (rock, steep slopes, and slope foot positions; see Fig. 12) (Schrott et al., 2013). The empirical model PERMAKART 3.0 has an index-based permafrost probability ranging from 1 (very unlikely) to 100 (very likely). This allows a more transient and realistic visualization, also because the relief class "rock" more realistically displays lower permafrost probabilities even at higher altitudes (Fig. 13).
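To convey how such a topoclimatic key works as a lookup, the heavily simplified sketch below maps aspect class, slope category, and altitude to an index between 1 and 100. Every class limit and index value here is an invented placeholder, not the calibrated key of Schrott et al. (2013).

```python
# Heavily simplified sketch of a topoclimatic key lookup: aspect class,
# slope category, and altitude yield a permafrost index between 1 and 100.
# All class limits and index values are invented placeholders.
import numpy as np

LOWER_LIMIT = {"N": 2500.0, "E": 2650.0, "S": 2900.0, "W": 2700.0}  # m, assumed

def permafrost_index(altitude, aspect_deg, slope_deg):
    classes = np.array(["N", "E", "S", "W"])
    aspect_class = classes[(((aspect_deg + 45.0) % 360.0) // 90.0).astype(int)]
    limit = np.array([LOWER_LIMIT[a] for a in aspect_class])
    rock_penalty = np.where(slope_deg > 37.0, -10.0, 0.0)  # steep rock: less snow
    index = 50.0 + (altitude - limit) / 10.0 + rock_penalty
    return np.clip(index, 1.0, 100.0)

print(permafrost_index(np.array([2400.0, 2800.0, 3100.0]),
                       aspect_deg=np.array([10.0, 100.0, 190.0]),
                       slope_deg=np.array([30.0, 45.0, 50.0])))
```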


Fig. 13 Map of permafrost index in the Obersulzbach valley (extract). Note the varying index values in similar aspect situations (e.g., on the eastern cirque walls) that result from differences in the topoclimatic key between rock slopes, steep slopes, and foot slopes.

2.05.5.4 Fluvial Environments

The investigation of fluvial environments is a key task in GIS applications, not only in the field of geomorphology but also in hydrology and ecology. This is mainly because river networks constitute the backbone of most humid and also semi-arid landscape types worldwide. Even the course of present-day glacial valleys is determined by pre-glacial fluvial topography. Consequently, the structure of river networks manifested in basins and watersheds is commonly used to partition the land surface, and the fluvial catchment is the standard unit for geomorphological and environmental analyses. In our example we provide an overview of the standard tools to extract the river network and subsequently calculate a number of indices to further characterize the drainage system. First, sinks in the DEM should be filled if a continuous drainage network is desired. The pit-filled DEM can then be used to calculate flow direction and flow accumulation. Drainage area is one of the most important LSPs, and it is typically derived from gridded elevation data via standardized GIS routines. It is used as a simple and easily determined proxy for discharge. Thus, drainage area is a major factor for assessing the erosive power of convergent flow and reveals the architecture of the drainage network, the backbone of many landscape types worldwide. A wealth of methods exists to determine drainage area from DEMs. The integration of drainage area in the direction of flow is central to all of them, but they differ in the way flow directions are determined. It is beyond the scope of this book section to discuss the details of these approaches, and the interested reader is referred to the work of, for example, Jenson and Domingue (1988), Fairfield and Leymarie (1991), Freeman (1991), Tarboton et al. (1991), Costa-Cabral and Burges (1994), or Seibert and McGlynn (2007). While in the classic procedure of Jenson and Domingue (1988) all flow is assumed to descend via the steepest path, cell area is partitioned between two or more flow directions in most other approaches, a difference that needs consideration for geomorphological applications. In single flow direction algorithms, drainage area cannot decrease downstream, which leads to the formation of distinct flow paths. In contrast, multiple flow direction algorithms allow flow dispersion. Differences between the two types of algorithms are most pronounced in divergent areas such as ridges, hillslopes, and planar valley fills (Erskine et al., 2006). Consequently, algorithms capable of flow dispersion should be used for detailed analyses of ridges and hillslopes, while single flow direction algorithms are designed for the extraction of drainage networks, Strahler orders, and other derivatives such as flow length and river longitudinal profiles. Once the per-cell upstream drainage area has been computed, drainage area–slope relations can be used to identify the drainage area threshold between divergent and convergent flow, that is, the extent of the valley network, on a regional basis. Montgomery and Foufoula-Georgiou (1993) showed that the extent of topographically divergent hillslopes, and thus the extent of the valley network, corresponds to a change in sign of the relation between local slope and contributing drainage area. They further demonstrated that debris flow-dominated channels can be determined from an inflection in the drainage area–slope relation.
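A single flow direction scheme of the kind described above fits in a short script. The sketch below implements a D8-style flow accumulation on a pit-filled DEM by sending each cell's area to its steepest downslope neighbour, visiting cells from high to low elevation; the tiny tilted test grid is an invented example.

```python
# Sketch of a D8-style single flow direction scheme on a pit-filled DEM:
# each cell drains to its steepest downslope neighbour, and drainage area
# is integrated by visiting cells from high to low elevation.
import numpy as np

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]

def d8_flow_accumulation(dem, cellsize):
    nrows, ncols = dem.shape
    acc = np.full(dem.shape, cellsize * cellsize)     # each cell's own area
    flat_order = np.argsort(dem, axis=None)[::-1]     # high to low
    rows, cols = np.unravel_index(flat_order, dem.shape)
    for r, c in zip(rows, cols):
        steepest, target = 0.0, None
        for dr, dc in NEIGHBOURS:
            rr, cc = r + dr, c + dc
            if 0 <= rr < nrows and 0 <= cc < ncols:
                dist = cellsize * (2.0 ** 0.5 if dr and dc else 1.0)
                grad = (dem[r, c] - dem[rr, cc]) / dist
                if grad > steepest:
                    steepest, target = grad, (rr, cc)
        if target is not None:                        # pass area downslope
            acc[target] += acc[r, c]
    return acc

dem = np.add.outer(np.arange(5.0, 0.0, -1.0), np.arange(5.0, 0.0, -1.0))
print(d8_flow_accumulation(dem, cellsize=1.0))        # outlet collects all 25 cells
```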
In practice, a regional drainage area threshold for valley network initiation can be determined by establishing drainage area bins and plotting mean slope against drainage area. For our study area, such a plot is shown in Fig. 14. In our example, the slope–area relation shows a change in sign around 10−4 km2, at a scale where convergent terrain starts to establish and colluvial processes come into play. Further breaks are evident around 10−2 and 10−1 km2, and we interpret the latter to indicate the transition from colluvial to fluvial processes. We thus chose this drainage area cutoff for river network extraction.
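This binning analysis reduces to a few lines once per-cell drainage area and slope grids are available; a minimal sketch, where the number of bins and the bin range (matching the axes of Fig. 14) are illustrative assumptions:

import numpy as np
import matplotlib.pyplot as plt

def slope_area_plot(area_km2, slope, n_bins=32):
    """Plot mean slope per logarithmic drainage-area bin."""
    edges = np.logspace(-5, 3, n_bins + 1)        # 10^-5 to 10^3 km2
    which = np.digitize(area_km2.ravel(), edges)
    means = np.array([slope.ravel()[which == i].mean()
                      if np.any(which == i) else np.nan
                      for i in range(1, n_bins + 1)])
    centers = np.sqrt(edges[:-1] * edges[1:])     # geometric bin centers
    plt.loglog(centers, means, "o-")
    plt.xlabel("Drainage area (km$^2$)")
    plt.ylabel("Mean slope")
    plt.show()

Breaks in the resulting curve are then inspected visually; in our example, the kink used for channel initiation appears near 10−1 km2.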

Fig. 14 Slope-area plot of the Obersulzbach Valley. Note the pronounced kink at a drainage area of approximately 0.1 km2.


The extracted drainage network acts as the basis for the calculation of stream orders (Horton, 1945; Strahler, 1952), subcatchments, and drainage density. We calculated drainage density per square kilometer (i) for the entire study area using a kernel with a size of 1 km2 (Fig. 15A), and (ii) for catchments of Strahler order 2 by dividing the summed length of all drainage lines in each catchment by catchment size (Fig. 15B). Results show that elevated and cirque-shaped parts of the study area have a lower drainage density, probably due to more recent and/or more intensive activity of ice.

In addition to identifying the transition between divergent and convergent flow, drainage area–slope relations and flow length–slope relations can be used to identify breaks in channel geometry and catchment evolution. The steepness index (ks) and the gradient index (SL) hold information on deviations from the fundamental relations between drainage area and slope, and between flow length and slope, in a fluvial landscape. On glacially sculpted terrain, deviations from fluvial topography can of course be expected to be ubiquitous. Nevertheless, both indices clearly indicate disturbances in narrow valley sections where knickpoints with comparatively steep channel slopes are located (Figs. 5 and 16). Horton–Strahler orders (Horton, 1945; Strahler, 1957) can also be calculated from the extracted drainage network. Subsequently, subcatchments can be delineated for each Strahler order, which allows the comparison of landscape patches with similar drainage topology.
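The kernel-based drainage density calculation (variant (i) above) can be sketched with a moving-window sum. The grid `channel_len_km`, holding the channel length in kilometers contained in each cell (zero off-network), and the 100 m cell size are assumptions for illustration:

import numpy as np
from scipy import ndimage

def drainage_density(channel_len_km, cell_km=0.1):
    """Drainage density (km/km2) within a moving 1 km2 window."""
    n = int(round(1.0 / cell_km))          # 10 x 10 cells = 1 km2 window
    # uniform_filter returns the window mean; multiplying by the number
    # of cells in the window gives the summed channel length (km).
    total_len = ndimage.uniform_filter(channel_len_km.astype(float),
                                       size=n) * n * n
    return total_len / 1.0                 # window area is 1 km2

For variant (ii), the same summation would instead be grouped by catchment label and divided by each catchment's area.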

Fig. 15 (A) Drainage density calculated for the entire Obersulzbach Valley using a kernel of 1 km2. (B) Drainage density calculated for catchments of Strahler order 2.


Hypsometry has been used to describe and interpret the topography of both fluvial and glacial landscapes (e.g., Brocklehurst and Whipple, 2004; Strahler, 1952). The hypsometric integral (HI) can be used in both cases to assess landscape maturity. However, if landscapes are influenced by both process domains, the interpretation of the HI becomes difficult. We nevertheless calculated the HI in our study area (Fig. 17), once for Strahler order 2 catchments and once applying a kernel-based approach on the entire Obersulzbach Valley (the elevation relief ratio, ERR), using a kernel size of 1 km2. The ERR is high for convex landscape patches and becomes lower with increasing concavity (Fig. 17A). The HI calculated for the irregularly shaped catchments of Strahler order 2 shows a similar pattern, with high HI values for hillslope-dominated catchments and low HI values for catchments that include a considerable amount of valley floor (Fig. 17B). Both the steepness index and the gradient index can be applied to identify spatial and/or temporal distortions of the drainage system and are hence very valuable tools for GIS-based geomorphological studies.
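Both measures reduce to simple statistics of the elevations within a unit. A minimal sketch, assuming `z` holds the elevations of one Strahler order 2 catchment or one kernel window (the ERR formulation follows Pike and Wilson, 1971):

import numpy as np

def elevation_relief_ratio(z):
    """ERR = (mean - min) / (max - min), an estimator of the HI."""
    z = np.asarray(z, dtype=float)
    return (np.nanmean(z) - np.nanmin(z)) / (np.nanmax(z) - np.nanmin(z))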

2.05.5.5 Sediment Flux and Erosion in Mountain Areas

GIS tools are also applied for the analysis and quantification of sediment flux. One focus in GIS-based sediment flux analysis is the issue of connectivity. Coupling of geomorphological processes and catchment connectivity are central to the efficiency of sediment flux and strongly influence the sensitivity of geomorphological systems toward changes (Heckmann and Schwanghart, 2013; Harvey, 2001).

Fig. 16 (A) Normalized steepness index (calculated with a concavity index of 0.5) for the Obersulzbach Valley. (B) Gradient index for the Obersulzbach Valley.

Fig. 17 (A) ERR for the lower part of the Obersulzbach Valley. (B) HI for the lower part of the Obersulzbach Valley.


Various approaches exist to visualize and quantify catchment connectivity using GIS tools. Indices of hydrological connectivity are presented by Borselli et al. (2008) and were modified by Cavalli et al. (2013). They are based on parameters like slope gradient, flow length, surface roughness, and contributing area. The tools have been applied in various environments and locations and used to discuss landform distribution and sediment flux (Gay et al., 2016; Lane et al., 2017; Messenzehl et al., 2014). Heckmann and Schwanghart (2013) apply a different approach using network analysis and graph theory to identify sediment cascades and delineate subcatchments of sediment flux. We have applied the connectivity index (IC) of Cavalli et al. (2013) to our study area (Fig. 18). The IC depicts highly variable connectivity conditions, a characteristic typical of high mountain topography. Low connectivity areas (blue) are located in the floodplain and at footslopes. Zones of high connectivity represent flowlines, mainly in channels and gullies. The index visualizes, for example, how sediment production and storage in cirques on the eastern valley side are connected to, or disconnected from, the main valley and the dominant fluvial transport system of the main river.

Several approaches have been applied for sediment flux quantification in GIS. Quantification of deposits is achieved by combining surface and subsurface data on sediment depth, for example, from logging or geophysical surveying. In this approach, sediment volume results from the difference between the landform surface and the level of various sedimentary units or the bedrock boundary. Since data from geophysical surveying or from logging often represent local information from single points or along transects, interpolation of the subsurface information is required and is performed using GIS tools. The density of subsurface data, the interpolation method applied, and the resolution of the surface and subsurface data determine the accuracy of the quantification result and need to be considered. Because detailed subsurface data are costly and time-consuming to acquire in the field, these approaches usually generate information on a limited number of single landforms. Examples of single landform quantification with GIS and subsequent calculation of sediment flux or erosion rates include glacial valley fills, talus slopes, rock glaciers, and moraine deposits (Sass, 2007; Schrott and Adams, 2002). Sediment budgets of entire catchments or valley fill deposits have been generated based on interpolation of dated sedimentary layers from cores (Tunnicliffe et al., 2012), on geophysical data (Otto et al., 2009; Hinderer, 2001), or on a combination of both (Götz et al., 2013).
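A minimal sketch of this interpolation-and-differencing workflow, assuming scattered thickness observations `depths` (m) at positions `pts` (an (n, 2) array, e.g., from seismic transects), meshgrid coordinate arrays `gx`, `gy` of the DEM cells, and a cell size in meters (all names illustrative):

import numpy as np
from scipy.interpolate import griddata

def deposit_volume(pts, depths, gx, gy, cellsize):
    """Sediment volume (m3) from scattered sediment-depth observations."""
    # Interpolate sediment thickness onto the DEM grid; cells outside
    # the convex hull of the observations become NaN and are ignored.
    thickness = griddata(pts, depths, (gx, gy), method="linear")
    return float(np.nansum(np.clip(thickness, 0.0, None)) * cellsize ** 2)

The choice of interpolation method and the density of the observations control the result, as noted above, and should be varied in a sensitivity test.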
An alternative solution for quantifying sediment volumes is to approximate the three-dimensional shape of the deposit or the bedrock boundary using geometric models or mathematical functions. Simple geometric shapes or power-law functions are applied to represent the volumetric body of landforms or the bedrock boundary, respectively (Hoffmann and Schrott, 2002; Schrott et al., 2003a). For single landforms, sediment volumes of deposits or erosional features, for example, gullies, fans, or talus slopes, are approximated using geometrical shapes, for example, cone sectors or prisms, that represent the landform. Volumes are quantified using the outline dimensions of the landform (height, length, width, depth) and the respective mathematical term corresponding to the geometry type (Campbell and Church, 2003; Curry, 1999; Shroder et al., 1999). Valley cross-sections and valley fill deposits of formerly glaciated terrain have been quantified using power-law or polynomial functions, based on the assumption that bedrock topography sculpted by glaciers can be described mathematically (Harbor and Wheeler, 1992; James, 1996). These mathematical approximations are applied to cross-sections, and the results are interpolated in order to quantify entire valley sections (Schrott et al., 2003a; Jaboyedoff and Derron, 2005b).

Finally, GIS tools are applied to quantify erosion and deposition using DEM data from different points in time. Calculating a DEM of difference between two surfaces yields volumetric surface changes that can be associated with geomorphological process activity. This approach is mostly applied to rockfall processes using high-resolution LIDAR or SfM data (Rabatel et al., 2008; Rosser et al., 2005; Warrick et al., 2017; Bremer and Sass, 2012). Even though the spatial resolution of the data used in the presented cases is high, the temporal resolution depends on the frequency of measurements; the resulting erosion rates thus only represent current developments. A look into past erosion and more long-term rates can be established using DEMs from historical aerial imagery (Fischer et al., 2011). Micheletti et al. (2015) used seven different scenes of aerial images acquired between 1967 and 2005 to quantify surface changes in a high mountain environment and related their observations to decadal climate changes at their study site. Within this environment, glacial and periglacial landforms show the greatest changes over the period. The authors relate phases of warming to increasing surface displacement and downwasting, and identify the most dynamic changes in periods of increased precipitation and high temperatures. Bennett et al. (2013) quantified hillslope and channel erosion using a similar approach in the Illgraben catchment (Switzerland); here, too, periods of temperature increase and of pronounced frequency and magnitude of intense rainfall events are associated with the observed changes. It is important to acknowledge that the resolution of the aerial images used, as well as the accuracy of the ground control points and the software used for DEM generation, significantly determine the observable surface changes. The studies presented generated DEMs at average resolutions of 0.3–0.5 m (Micheletti et al., 2015) and 2–4 m (Bennett et al., 2012).

An alternative way to quantify erosion is the approach of geophysical relief (Champagnac et al., 2007; Small and Anderson, 1998). Geophysical relief describes the eroded volume of valleys or entire mountain ranges, derived from the difference between the actual surface heights and an interpolated surface connecting the highest points. The approach is used to analyze long-term landform evolution of mountain ranges and to differentiate between the impacts of tectonic uplift, erosion, and isostatic rebound on relief. We have applied the geophysical relief index to the study area (Fig. 19). The pattern of geophysical relief in the Obersulzbach Valley reveals the greatest removal in the central part above the lower step. Relief values there exceed 1000 m, compared to 700–900 m at the lower valley level.
This could hint at increased glacial erosion at this location, possibly due to a longer duration of glacier presence, for example, during late glacial times.
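Two of the quantification approaches above lend themselves to compact raster sketches: a DEM of difference with a simple level of detection (LoD), and geophysical relief from a crude envelope through local high points. The window size and LoD values are illustrative assumptions, and the maximum filter merely stands in for a proper interpolation across summits:

import numpy as np
from scipy import ndimage

def dem_of_difference(dem_t2, dem_t1, cellsize, lod=0.5):
    """Erosion and deposition volumes (m3) above a detection threshold."""
    dod = dem_t2 - dem_t1
    dod = np.where(np.abs(dod) < lod, 0.0, dod)   # suppress noise < LoD
    erosion = -dod[dod < 0].sum() * cellsize ** 2
    deposition = dod[dod > 0].sum() * cellsize ** 2
    return erosion, deposition

def geophysical_relief(dem, window):
    """Envelope through local maxima minus the present-day surface."""
    envelope = ndimage.maximum_filter(dem, size=window)
    envelope = ndimage.uniform_filter(envelope, size=window)  # smooth
    return np.clip(envelope - dem, 0.0, None)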

Fig. 18 (A) Connectivity index (IC) for the Obersulzbach Valley (part). The index displays distinct variation between valley floor locations and cirques; the largest contrasts are present between channels and flats (based on the IC tool by Cavalli et al., 2013). (B) Orthoimage of the same area. Data from basemap.at, http://maps.wien.gv.at/basemap/1.0.0/WMTSCapabilities-arcmap.xml.


Fig. 19 Geophysical relief calculated after Small and Anderson (1998). Note the differences between the central valley and the northern valley section (see text for explanation).

2.05.6 Conclusions

Modern quantitative geomorphological research can be regarded as inextricably linked with GIS analysis. The availability of both high-resolution and global data on the land surface has contributed significantly to recent fundamental advances in the discipline and opened new fields of research. Applications of GIS in geomorphology span from pure visualization approaches, landform classification, land surface and hydrological analysis (usually derived from DEMs), process and erosion modeling, and topographic change detection to hazard zonation and susceptibility modeling. Herein, statistical analysis and spatial interpolation of field data as well as graphical visualization and map creation represent key features of GIS applied in geomorphology. Numerous topographic and geomorphological indices have been developed to study geomorphological form and process configurations using GIS techniques. In our paper we presented a selection of GIS-based tools and indices in some classic fields of geomorphology such as fluvial, gravitational, glacial, and periglacial environments. A promising approach that has become more popular in recent years focuses on quantifying sediment fluxes and deposits using digital data. The latter can be achieved by combining surface and subsurface data on sediment depth or by comparing surface data from various points in time. Increasing resolution of both DEM and image data, free availability of local and global data sets, and low-cost technology for generating high-resolution surface information will foster the possibilities of geomorphological analysis using GIS. Challenges, however, exist with respect to scale and the applicability of tools and parameters originally developed using data of lower resolution. A high level of detail can contribute to scientific insights but may also represent noise that prevents clarity (Drăguţ and Blaschke, 2006). The scale-dependency of LSPs and objects needs to be carefully considered when performing quantitative landform analysis.

References

Abermann, J., Lambrecht, A., Fischer, A., Kuhn, M., 2009. Quantifying changes and trends in glacier area and volume in the Austrian Ötztal Alps (1969–1997–2006). The Cryosphere 3, 205.
Ahnert, F., 1970. Functional relationships between denudation, relief, and uplift in large, mid-latitude drainage basins. American Journal of Science 268, 243–263.
Allen, T.R., 1998. Topographic context of glaciers and perennial snowfields, Glacier National Park, Montana. Geomorphology 21, 207–216.
Anders, N.S., Seijmonsbergen, A.C., Bouten, W., 2011. Segmentation optimization and stratified object-based analysis for semi-automated geomorphological mapping. Remote Sensing of Environment 115, 2976–2985.
Bagnold, R.A., 1960. Sediment discharge and stream power: a preliminary announcement. US Geological Survey Circular 421, Reston, Virginia.
Baltsavias, E.P., Favey, E., Bauder, A., Bosch, H., Pateraki, M., 2001. Digital surface modelling by airborne laser scanning and digital photogrammetry for glacier monitoring. The Photogrammetric Record 17, 243–273.
Benn, D.I., Lehmkuhl, F., 2000. Mass balance and equilibrium-line altitudes of glaciers in high-mountain environments. Quaternary International 65–66, 15–29.
Bennett, G.L., Molnar, P., Eisenbeiss, H., McArdell, B.W., 2012. Erosional power in the Swiss Alps: characterization of slope failure in the Illgraben. Earth Surface Processes and Landforms 37, 1627–1640.
Bennett, G.L., Molnar, P., McArdell, B.W., Schlunegger, F., Burlando, P., 2013. Patterns and controls of sediment production, transfer and yield in the Illgraben. Geomorphology 188, 68–82.
Beven, K.J., Kirkby, M.J., 1979. A physically based, variable contributing area model of basin hydrology/Un modèle à base physique de zone d'appel variable de l'hydrologie du bassin versant. Hydrological Sciences Bulletin 24, 43–69.
Bishop, M.P., 2013. Remote sensing and GIScience in geomorphology: introduction and overview. In: Shroder, J.F. (Ed.), Treatise on geomorphology. Academic Press, San Diego.
Blaschke, T., 2010. Object based image analysis for remote sensing. ISPRS Journal of Photogrammetry and Remote Sensing 65, 2–16.
Blaschke, T., Strobl, J., 2001. What's wrong with pixels? Some recent developments interfacing remote sensing and GIS. Image Rochester NY 6, 12–17.
Böhner, J., Antonic, O., 2009. Land-surface parameters specific to topo-climatology. In: Hengl, T., Reuter, H.I. (Eds.), Developments in soil science. Elsevier, Amsterdam. Chapter 8.
Borselli, L., Cassi, P., Torri, D., 2008. Prolegomena to sediment and flow connectivity in the landscape: a GIS and field numerical assessment. Catena 75, 268–277.
Bremer, M., Sass, O., 2012. Combining airborne and terrestrial laser scanning for quantifying erosion and deposition by a debris flow event. Geomorphology 138, 49–60.
Brocklehurst, S.H., Whipple, K.X., 2004. Hypsometry of glaciated landscapes. Earth Surface Processes and Landforms 29, 907–926.
Burbank, D.W., Anderson, R.S., 2011. Tectonic geomorphology. John Wiley & Sons, Ltd, Hoboken, NJ.
Campbell, D., Church, M., 2003. Reconnaissance sediment budgets for Lynn Valley, British Columbia: Holocene and contemporary time scales. Canadian Journal of Earth Sciences 40, 701–713.
Carrara, A., 1983. Multivariate models for landslide hazard evaluation. Journal of the International Association for Mathematical Geology 15, 403–426.
Carrara, A., Guzzetti, F., 1995. Geographical information systems in assessing natural hazards. Springer, Dordrecht.
Cavalli, M., Trevisani, S., Comiti, F., Marchi, L., 2013. Geomorphometric assessment of spatial sediment connectivity in small Alpine catchments. Geomorphology 188, 31–41.
Chairat, S., Delleur, J.W., 1993. Effects of the topographic index distribution on predicted runoff using GRASS. Water Resources Bulletin 29, 1029–1034.
Champagnac, J.D., Molnar, P., Anderson, R.S., Sue, C., Delacou, B., 2007. Quaternary erosion-induced isostatic rebound in the western Alps. Geology 35, 195–198.
Chen, A., Darbon, J., Morel, J.-M., 2014. Landscape evolution models: a review of their fundamental equations. Geomorphology 219, 68–86.
Costa-Cabral, M.C., Burges, S.J., 1994. Digital Elevation Model Networks (DEMON): a model of flow over hillslopes for computation of contributing and dispersal areas. Water Resources Research 30, 1681–1692.
Coulthard, T.J., 2001. Landscape evolution models: a software review. Hydrological Processes 15, 165–173.
Curry, A.M., 1999. Paraglacial modification of slope form. Earth Surface Processes and Landforms 24, 1213–1228.
Dadson, S.J., Church, M., 2005. Postglacial topographic evolution of glaciated valleys: a stochastic landscape evolution model. Earth Surface Processes and Landforms 30, 1387–1403.
De Roo, A.P.J., Hazelhoff, L., Burrough, P.A., 1989. Soil-erosion modeling using ANSWERS and geographical information systems. Earth Surface Processes and Landforms 14, 517–532.
Dikau, R., 1989. The application of a digital relief model to landform analysis in geomorphology. In: Raper, J.F. (Ed.), Three dimensional applications in geographical information systems. Taylor & Francis, London.
Dikau, R., 1996. Geomorphologische Reliefklassifikation und -analyse. Heidelberger Geographische Arbeiten 104, 15–36.
Dikau, R., Jäger, S., 1995. Landslide hazard modelling in New Mexico and Germany. In: McGregor, D., Thompson, D. (Eds.), Geomorphology and land management in a changing environment. John Wiley, Chichester.
Dikau, R., Brabb, E.E., Mark, R.M., 1991. Landform classification of New Mexico by computer. US Geological Survey Open-File Report.


D'Oleire-Oltmanns, S., Eisank, C., Drăguţ, L., Blaschke, T., 2013. An object-based workflow to extract landforms at multiple scales from two distinct data types. IEEE Geoscience and Remote Sensing Letters 10, 947–951.
Drăguţ, L., Blaschke, T., 2006. Automated classification of landform elements using object-based image analysis. Geomorphology 81, 330–344.
Eash, D.A., 1994. A geographic information-system procedure to quantify drainage-basin characteristics. Water Resources Bulletin 30, 1–8.
Egholm, D.L., Pedersen, V.K., Knudsen, M.F., Larsen, N.K., 2012. Coupling the flow of ice, water, and sediment in a glacial landscape evolution model. Geomorphology 141–142, 47–66.
Eisank, C., Smith, M., Hillier, J., 2014. Assessment of multiresolution segmentation for delimiting drumlins in digital elevation models. Geomorphology 214, 452–464.
Erskine, R.H., Green, T.R., Ramirez, J.A., MacDonald, L.H., 2006. Comparison of grid-based algorithms for computing upslope contributing area. Water Resources Research 42, 1–9.
Evans, I.S., 1972. General geomorphometry, derivatives of altitude, and descriptive statistics. In: Chorley, R.J. (Ed.), Spatial analysis in geomorphology. Methuen, London.
Evans, I.S., 2012. Geomorphometry and landform mapping: what is a landform? Geomorphology 137, 94–106.
Fairfield, J., Leymarie, P., 1991. Drainage networks from grid digital elevation models. Water Resources Research 27, 709–717.
Farr, T.G., Rosen, P.A., Caro, E., Crippen, R., Duren, R., Hensley, S., Kobrick, M., Paller, M., Rodriguez, E., Roth, L., Seal, D., Shaffer, S., Shimada, J., Umland, J., Werner, M., Oskin, M., Burbank, D., Alsdorf, D., 2007. The Shuttle Radar Topography Mission. Reviews of Geophysics 45, RG2004.
Fischer, L., Eisenbeiss, H., Kääb, A., Huggel, C., Haeberli, W., 2011. Monitoring topographic changes in a periglacial high-mountain face using high-resolution DTMs, Monte Rosa East Face, Italian Alps. Permafrost and Periglacial Processes 22, 140–152.
Fischer, A., Seiser, B., Stocker Waldhuber, M., Mitterer, C., Abermann, J., 2015. Tracing glacier changes in Austria from the Little Ice Age to the present using a lidar-based high-resolution glacier inventory in Austria. The Cryosphere 9, 753–766.
Flint, J.J., 1974. Stream gradient as a function of order, magnitude, and discharge. Water Resources Research 10, 969–973.
Freeman, T.G., 1991. Calculating catchment area with divergent flow based on a regular grid. Computers & Geosciences 17, 413–422.
Frey, H., Machguth, H., Huss, M., Huggel, C., Bajracharya, S., Bolch, T., Kulkarni, A., Linsbauer, A., Salzmann, N., Stoffel, M., 2013. Ice volume estimates for the Himalaya–Karakoram region: evaluating different methods. The Cryosphere Discussions 7, 4813–4854.
Funk, M., Hoelzle, M., 1992. A model of potential direct solar radiation for investigating occurrences of mountain permafrost. Permafrost and Periglacial Processes 3 (2), 139–142.
Gay, A., Cerdan, O., Mardhel, V., Desmet, M., 2016. Application of an index of sediment connectivity in a lowland area. Journal of Soils and Sediments 16, 280–293.
Gesch, D., Oimoen, M., Zhang, Z., Meyer, D., Danielson, J., 2012. Validation of the ASTER Global Digital Elevation Model Version 2 over the conterminous United States. In: Proceedings of the XXII ISPRS Congress. International Society for Photogrammetry and Remote Sensing, Melbourne, pp. 281–286.
Gómez, H., Kavzoglu, T., 2005. Assessment of shallow landslide susceptibility using artificial neural networks in Jabonosa River Basin, Venezuela. Engineering Geology 78, 11–27.
Götz, J., Otto, J.C., Schrott, L., 2013. Postglacial sediment storage and rockwall retreat in a semi-closed inner-alpine sedimentary basin (Gradenmoos, Hohe Tauern, Austria). Geografia Fisica e Dinamica Quaternaria 36, 63–80.
Goudie, A.S. (Ed.), 1990. Geomorphological techniques. Unwin Hyman, London.
Gregory, K.J., Lewin, J., 2014. The basics of geomorphology: key concepts. Sage, London.
Grieve, S.W.D., Mudd, S.M., Hurst, M.D., Milodowski, D.T., 2016. A nondimensional framework for exploring the relief structure of landscapes. Earth Surface Dynamics 4, 309–325.
Grohmann, C.H., Smith, M.J., Riccomini, C., 2011. Multiscale analysis of topographic surface roughness in the Midland Valley, Scotland. IEEE Transactions on Geoscience and Remote Sensing 49, 1200–1213.
Gross, G., Kerschner, H., Patzelt, G., 1977. Methodische Untersuchungen über die Schneegrenze in alpinen Gletschergebieten. Zeitschrift für Gletscherkunde und Glazialgeologie 12, 223–251.
Gruber, U., Bartelt, P., 2007. Snow avalanche hazard modelling of large areas using shallow water numerical methods and GIS. Environmental Modelling & Software 22, 1472–1481.
Gruber, F.E., Mergili, M., 2013. Regional-scale analysis of high-mountain multi-hazard and risk indicators in the Pamir (Tajikistan) with GRASS GIS. Natural Hazards and Earth System Sciences 13, 2779–2796.
Gruber, S., Peckham, S., 2009. Land-surface parameters and objects in hydrology. In: Hengl, T., Reuter, H.I. (Eds.), Developments in soil science. Elsevier, Amsterdam. Chapter 7.
Gustavsson, M., Kolstrup, E., Seijmonsbergen, A.C., 2006. A new symbol-and-GIS based detailed geomorphological mapping system: renewal of a scientific discipline for understanding landscape development. Geomorphology 77, 90–111.
Gustavsson, M., Seijmonsbergen, A.C., Kolstrup, E., 2008. Structure and contents of a new geomorphological GIS database linked to a geomorphological map, with an example from Liden, central Sweden. Geomorphology 95, 335–349.
Guzzetti, F., Carrara, A., Cardinali, M., Reichenbach, P., 1999. Landslide hazard evaluation: a review of current techniques and their application in a multi-scale study, Central Italy. Geomorphology 31, 181–216.
Hack, J.T., 1957. Studies of longitudinal stream profiles in Virginia and Maryland. Shorter Contributions to General Geology, 45–97.
Hack, J.T., 1973. Stream-profile analysis and stream-gradient index. Journal of Research of the U.S. Geological Survey 1, 421–429.
Haeberli, W., Hölzle, M., 1995. Application of inventory data for estimating characteristics of and regional climate effects on mountain glaciers: a pilot study with the European Alps. Annals of Glaciology 21, 206–212.
Hara, K., Zhao, Y., Harada, I., Tomita, M., Park, J., Jung, E., Kamagata, N., Hirabuki, Y., 2015. Multi-scale monitoring of landscape change after the 2011 tsunami. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, ISPRS Archives XL-7/W3, 805–809.
Harbor, J., Wheeler, D.A., 1992. On the mathematical description of glaciated valley cross sections. Earth Surface Processes and Landforms 17, 477–485.
Harvey, A.M., 2001. Coupling between hillslopes and channels in upland fluvial systems: implications for landscape sensitivity, illustrated from the Howgill Fells, northwest England. Catena 42, 225–250.
Heckmann, T., Schwanghart, W., 2013. Geomorphic coupling and sediment connectivity in an alpine catchment: exploring sediment cascades using graph theory. Geomorphology 182, 89–103.
Hengl, T., Reuter, H.I. (Eds.), 2009. Geomorphometry: concepts, software, applications. Elsevier, Oxford.
Hickey, R., Smith, A., Jankowski, P., 1994. Slope length calculations from a DEM within ARC/INFO GRID. Computers, Environment and Urban Systems 18, 365–380.
Hinderer, M., 2001. Late Quaternary denudation of the Alps, valley and lake fillings and modern river loads. Geodinamica Acta 14, 231–263.
Hoffmann, T., Schrott, L., 2002. Modelling sediment thickness and rockwall retreat in an Alpine valley using 2D-seismic refraction (Reintal, Bavarian Alps). Zeitschrift für Geomorphologie, Supplement Band 127, 153–173.
Horton, R.E., 1932. Drainage-basin characteristics. EOS, Transactions American Geophysical Union 13, 350–361.
Horton, R.E., 1945. Erosional development of streams and their drainage basins; hydrophysical approach to quantitative morphology. Geological Society of America Bulletin 56, 275–370.
Huabin, W., Gangjun, L., Weiya, X., Gonghui, W., 2005. GIS-based landslide hazard assessment: an overview. Progress in Physical Geography 29, 548–567.
Humlum, O., 1998. The climatic significance of rock glaciers. Permafrost and Periglacial Processes 9, 375–395.
Huss, M., Farinotti, D., 2012. Distributed ice thickness and volume of all glaciers around the globe. Journal of Geophysical Research: Earth Surface 117, F04010.
Jaboyedoff, M., Derron, M.H., 2005a. A new method to estimate the infilling of alluvial sediment of glacial valleys using a sloping local base level. Geografia Fisica e Dinamica Quaternaria 28, 37–46.


Jaboyedoff, M., Derron, M.H., 2005b. A new method to estimate the infilling of alluvial sediment of glacial valleys using a sloping local base level. Geografia Fisica e Dinamica Quaternaria 28, 37–46.
Jäger, S., 1997. Fallstudien zur Bewertung von Massenbewegungen als geomorphologische Naturgefahr. Selbstverlag des Geographischen Instituts der Universität, Heidelberg.
James, L.A., 1996. Polynomial and power functions for glacial valley cross-section morphology. Earth Surface Processes and Landforms 21, 413–432.
Jenson, S.K., Domingue, J.O., 1988. Extracting topographic structure from digital elevation data for geographical information system analysis. Photogrammetric Engineering and Remote Sensing 54, 1593–1600.
Kääb, A., Winsvold, S., Altena, B., Nuth, C., Nagler, T., Wuite, J., 2016. Glacier remote sensing using Sentinel-2. Part I: radiometric and geometric performance, and application to ice velocity. Remote Sensing 8, 598.
Keller, F., 1992. Automated mapping of mountain permafrost using the program PERMAKART within the geographical information system ARC/INFO. Permafrost and Periglacial Processes 3, 133–138.
Keutterling, A., Thomas, A., 2006. Monitoring glacier elevation and volume changes with digital photogrammetry and GIS at Gepatschferner glacier, Austria. International Journal of Remote Sensing 27, 4371–4380.
Kirkby, M.J., 1987. Modelling some influences of soil erosion, landslides and valley gradient on drainage density and hollow development. Catena Supplement 10, 1–14.
Koethe, R., Lehmeier, F., 1993. SARA: Ein Programmsystem zur Automatischen Relief-Analyse. Zeitschrift für Angewandte Geographie 4, 11–21.
Kruse, S., 2013. Near-surface geophysics in geomorphology. In: Shroder, J.F. (Ed.), Treatise on geomorphology. Academic Press, San Diego.
Kugler, H., 1975. Das Georelief und seine kartographische Modellierung. Dissertation, Martin-Luther-Universität Halle.
Lan, H., Derek Martin, C., Lim, C.H., 2007. RockFall Analyst: a GIS extension for three-dimensional and spatially distributed rockfall hazard modeling. Computers & Geosciences 33, 262–279.
Lane, S.N., James, T.D., Crowell, M.D., 2000. Application of digital photogrammetry to complex topography for geomorphological research. The Photogrammetric Record 16, 793–821.
Lane, S.N., Bakker, M., Gabbud, C., Micheletti, N., Saugy, J.N., 2017. Sediment export, transient landscape response and catchment-scale connectivity following rapid climate warming and Alpine glacier recession. Geomorphology 277, 210–227.
Lautensach, H., 1959. Carl Troll: Ein Forscherleben. Erdkunde 13, 245–258.
Legleiter, C.J., Fonstad, M.A., 2012. An introduction to the physical basis for deriving river information by optical remote sensing. In: Fluvial remote sensing for science and management. John Wiley & Sons, Ltd, Hoboken, NJ.
Lehner, B., Verdin, K., Jarvis, A., 2008. New global hydrography derived from spaceborne elevation data. EOS, Transactions American Geophysical Union 89, 93–94.
Leopold, L.B., Wolman, M.G., Miller, J.P., 1964. Fluvial processes in geomorphology. W.H. Freeman and Co, San Francisco.
Li, X., Damen, M.C.J., 2010. Coastline change detection with satellite remote sensing for environmental management of the Pearl River Estuary, China. Journal of Marine Systems 82 (Suppl.), S54–S61.
Liu, B.Y., Nearing, M.A., Shi, P.J., Jia, Z.W., 2000. Slope length effects on soil loss for steep slopes. Soil Science Society of America Journal 64, 1759–1763.
MacMillan, R.A., Pettapiece, W.W., Nolan, S.C., Goddard, T.W., 2000. A generic procedure for automatically segmenting landforms into landform elements using DEMs, heuristic rules and fuzzy logic. Fuzzy Sets and Systems 113, 81–109.
Marthews, T.R., Dadson, S.J., Lehner, B., Abele, S., Gedney, N., 2015. High-resolution global topographic index values for use in large-scale hydrological modelling. Hydrology and Earth System Sciences 19, 91–104.
Messenzehl, K., Hoffmann, T., Dikau, R., 2014. Sediment connectivity in the high-alpine valley of Val Müschauns, Swiss National Park: linking geomorphic field mapping with geomorphometric modelling. Geomorphology 221, 215–229.
Mey, J., Scherler, D., Wickert, A.D., Egholm, D.L., Tesauro, M., Schildgen, T.F., Strecker, M.R., 2016. Glacial isostatic uplift of the European Alps. Nature Communications 7, 13382.
Micheletti, N., Lambiel, C., Lane, S.N., 2015. Investigating decadal-scale geomorphic dynamics in an alpine mountain setting. Journal of Geophysical Research: Earth Surface 120, 2155–2175.
Minár, J., Evans, I.S., 2008. Elementary forms for land surface segmentation: the theoretical basis of terrain analysis and geomorphological mapping. Geomorphology 95, 236–259.
Minár, J., Mentlík, P., Jedlicka, K., Barka, I., 2005. Geomorphological information system: idea and options for practical implementation. Geografický Časopis 57, 247–264.
Montgomery, D.R., Foufoula-Georgiou, E., 1993. Channel network source representation using digital elevation models. Water Resources Research 29, 3925–3934.
Moore, I.D., Burch, G.J., 1986. Physical basis of the length-slope factor in the universal soil loss equation. Soil Science Society of America Journal 50, 1294–1298.
Moore, I.D., Grayson, R.B., Ladson, A.R., 1991. Digital terrain modelling: a review of hydrological, geomorphological, and biological applications. Hydrological Processes 5, 3–30.
Napieralski, J., Harbor, J., Li, Y., 2007. Glacial geomorphology and geographic information systems. Earth-Science Reviews 85, 1–22.
Oguchi, T., Wasklewicz, T.A., 2011. Geographic information systems in geomorphology. In: Gregory, K.J., Goudie, A.S. (Eds.), The SAGE handbook of geomorphology. SAGE, London.
Olaya, V., 2009. Basic land-surface parameters. In: Hengl, T., Reuter, H.I. (Eds.), Developments in soil science. Elsevier, Amsterdam. Chapter 6.
Olyphant, G.A., 1977. Topoclimate and the depth of cirque erosion. Geografiska Annaler: Series A, Physical Geography 59, 209–213.
Otto, J.-C., Dikau, R., 2004. Geomorphologic system analysis of a high mountain valley in the Swiss Alps. Zeitschrift für Geomorphologie, NF 48, 323–341.
Otto, J.C., Smith, M., 2013. Section 2.6 Geomorphological mapping. In: Clarke, L.E. (Ed.), Geomorphological techniques (online edition). British Society for Geomorphology, London.
Otto, J.-C., Schrott, L., Jaboyedoff, M., Dikau, R., 2009. Quantifying sediment storage in a high alpine valley (Turtmanntal, Switzerland). Earth Surface Processes and Landforms 34, 1726–1742.
Otto, J.C., Gustavsson, M., Geilhausen, M., 2011. Cartography: design, symbolisation and visualisation of geomorphological maps. In: Smith, M.J., Paron, P., Griffiths, J. (Eds.), Geomorphological mapping: methods and applications. Elsevier, London.
Otto, J.-C., Keuschnig, M., Götz, J., Marbach, M., Schrott, L., 2012. Detection of mountain permafrost by combining high resolution surface and subsurface information: an example from the Glatzbach catchment, Austrian Alps. Geografiska Annaler: Series A, Physical Geography 94, 43–57.
Paul, F., Bolch, T., Kääb, A., Nagler, T., Nuth, C., Scharrer, K., Shepherd, A., Strozzi, T., Ticconi, F., Bhambri, R., Berthier, E., Bevan, S., Gourmelen, N., Heid, T., Jeong, S., Kunz, M., Lauknes, T.R., Luckman, A., Merryman Boncori, J.P., Moholdt, G., Muir, A., Neelmeijer, J., Rankl, M., Vanlooy, J., Van Niel, T., 2015. The glaciers climate change initiative: methods for creating glacier area, elevation change and velocity products. Remote Sensing of Environment 162, 408–426.
Penck, A., 1894. Morphologie der Erdoberfläche, 2nd edn. Engelhorn, Stuttgart.
Pike, R., Dikau, R. (Eds.), 1995. Advances in geomorphometry. Proceedings of the Walter F. Wood Memorial Symposium. Zeitschrift für Geomorphologie, Supplement 101, 238 pp.
Pike, R.J., Wilson, S.E., 1971. Elevation-relief ratio, hypsometric integral, and geomorphic area-altitude analysis. Geological Society of America Bulletin 82, 1079–1084.
Porter, S.C., 1975. Equilibrium-line altitudes of late Quaternary glaciers in the Southern Alps, New Zealand. Quaternary Research 5, 27–47.
Prasicek, G., Otto, J.-C., Montgomery, D.R., Schrott, L., 2014. Multi-scale curvature for automated identification of glaciated mountain landscapes. Geomorphology 209, 53–65.
Prasicek, G., Larsen, I.J., Montgomery, D.R., 2015. Tectonic control on the persistence of glacially sculpted topography. Nature Communications 6, 8028.
Rabatel, A., Deline, P., Jaillet, S., Ravanel, L., 2008. Rock falls in high-alpine rock walls quantified by terrestrial lidar measurements: a case study in the Mont Blanc area. Geophysical Research Letters 35 (10), L10502.
Racoviteanu, A., Williams, M., Barry, R., 2008. Optical remote sensing of glacier characteristics: a review with focus on the Himalaya. Sensors 8, 3355.
Riseborough, D., Shiklomanov, N., Etzelmüller, B., Gruber, S., Marchenko, S., 2008. Recent advances in permafrost modelling. Permafrost and Periglacial Processes 19, 137–156.


Rosser, N.J., Petley, D.N., Lim, M., Dunning, S.A., Allison, R.J., 2005. Terrestrial laser scanning for monitoring the process of hard rock coastal cliff erosion. Quarterly Journal of Engineering Geology and Hydrogeology 38, 363–375.
Rowland, J.C., Shelef, E., Pope, P.A., Muss, J., Gangodagamage, C., Brumby, S.P., Wilson, C.J., 2016. A morphology independent methodology for quantifying planview river change and characteristics from remotely sensed imagery. Remote Sensing of Environment 184, 212–228.
Salcher, B.C., Kober, F., Kissling, E., Willett, S.D., 2014. Glacial impact on short-wavelength topography and long-lasting effects on the denudation of a deglaciated mountain range. Global and Planetary Change 115, 59–70.
Sass, O., 2007. Bedrock detection and talus thickness assessment in the European Alps using geophysical methods. Journal of Applied Geophysics 3, 254–269.
Sattler, K., Anderson, B., Mackintosh, A., Norton, K., de Róiste, M., 2016. Estimating permafrost distribution in the maritime Southern Alps, New Zealand, based on climatic conditions at rock glacier sites. Frontiers in Earth Science 4.
Scaioni, M., Longoni, L., Melillo, V., Papini, M., 2014. Remote sensing for landslide investigations: an overview of recent achievements and perspectives. Remote Sensing 6, 9600.
Schmidt, J., Hewitt, A., 2004. Fuzzy land element classification from DTMs based on geometry and terrain position. Geoderma 121, 243–256.
Schneevoigt, N.J., Van der Linden, S., Thamm, H.-P., Schrott, L., 2008. Detecting Alpine landforms from remotely sensed imagery. A pilot study in the Bavarian Alps. Geomorphology 93, 104–119.
Schoeneich, P., 1993. Comparaison des systèmes de légendes français, allemand et suisse: principes de la légende IGUL. Travaux et Recherches 9, 15–24.
Schrott, L., Adams, T., 2002. Quantifying sediment storage and Holocene denudation in an Alpine basin, Dolomites, Italy. Zeitschrift für Geomorphologie N.F. 128 (Suppl. Bd), 129–145.
Schrott, L., Sass, O., 2008. Application of field geophysics in geomorphology: advances and limitations exemplified by case studies. Geomorphology 93, 55–73.
Schrott, L., Hufschmidt, G., Hankammer, M., Hoffmann, T., Dikau, R., 2003a. Spatial distribution of sediment storage types and quantification of valley fill deposits in an alpine basin, Reintal, Bavarian Alps, Germany. Geomorphology 55, 45–63.
Schrott, L., Hufschmidt, G., Hankammer, M., Hofmann, T., Dikau, R., 2003b. Spatial distribution of sediment storage types and quantification of valley fill deposits in an alpine basin, Reintal, Bavarian Alps, Germany. Geomorphology 55, 45–63.
Schrott, L., Otto, J.C., Keller, F., 2013. Modelling alpine permafrost distribution in the Hohe Tauern region, Austria. Austrian Journal of Earth Science 105, 169–183.
Seibert, J., McGlynn, B., 2007. A new triangular multiple flow direction algorithm for computing upslope areas from gridded digital elevation models. Water Resources Research 43, 1–8.
Shary, P.A., Sharaya, L.S., Mitusov, A.V., 2005. The problem of scale-specific and scale-free approaches in geomorphometry. Geografia Fisica e Dinamica Quaternaria 28, 81–101.
Shroder, J.F., Scheppy, R.A., Bishop, M.P., 1999. Denudation of small alpine basins, Nanga Parbat Himalaya, Pakistan. Arctic, Antarctic and Alpine Research 31, 121–127.
Small, E.E., Anderson, R.S., 1998. Pleistocene relief production in Laramide mountain ranges, western United States. Geology 26, 123–126.
Smith, M.J., 2011. Digital mapping: visualisation, interpretation and quantification of landforms. In: Smith, M.J., Paron, P., Griffiths, J. (Eds.), Geomorphological mapping: methods and applications. Elsevier, London.
Smith, M.W., 2014. Roughness in the Earth Sciences. Earth-Science Reviews 136, 202–225.
Smith, M.J., Wise, S.M., 2007. Problems of bias in mapping linear landforms from satellite imagery. International Journal of Applied Earth Observation and Geoinformation 9, 65–78.
Smith, M.J., Rose, J., Booth, S., 2006. Geomorphological mapping of glacial landforms from remotely sensed data: an evaluation of the principal data sources and an assessment of their quality. Geomorphology 76, 148–165.
Smith, M.J., Hillier, J., Otto, J.C., Geilhausen, M., 2013. Geovisualisation. In: Shroder, J.F. (Ed.), Treatise on geomorphology. Academic Press Elsevier, San Diego.
Snavely, N., Seitz, S.M., Szeliski, R., 2008. Modeling the world from Internet photo collections. International Journal of Computer Vision 80, 189–210.
Sternai, P., Herman, F., Fox, M.R., Castelltort, S., 2011. Hypsometric analysis to identify spatially variable glacial erosion. Journal of Geophysical Research 116, F03001.
Strahler, A.N., 1952. Hypsometric (area-altitude) analysis of erosional topography. Geological Society of America Bulletin 63, 1117–1142.
Strahler, A.N., 1957. Quantitative analysis of watershed geomorphology. Transactions of the American Geophysical Union 38, 913–920.
Stumpf, A., Malet, J.P., Delacourt, C., 2017. Correlation of satellite image time-series for the detection and monitoring of slow-moving landslides. Remote Sensing of Environment 189, 40–55.
Svensson, H., 1959. Is the cross-section of a glacial valley a parabola? Journal of Glaciology 3, 362–363.
Tarboton, D.G., Bras, R.L., Rodriguez-Iturbe, I., 1991. On the extraction of channel networks from digital elevation data. Hydrological Processes 5, 81–100.
Theler, D., Reynard, E., Bardou, E., 2008. Assessing sediment dynamics from geomorphological maps: Bruchi torrential system, Swiss Alps. Journal of Maps 4, 277–289.
Tucker, G.E., Bras, R.L., 1998. Hillslope processes, drainage density, and landscape morphology. Water Resources Research 34, 2751–2764.
Tucker, G.E., Hancock, G.R., 2010. Modelling landscape evolution. Earth Surface Processes and Landforms 35, 28–50.
Tucker, G.E., Whipple, K.X., 2002. Topographic outcomes predicted by stream erosion models: sensitivity analysis and intermodel comparison. Journal of Geophysical Research: Solid Earth 107, ETG 1-1–ETG 1-16.
Tunnicliffe, J., Church, M., Clague, J.J., Feathers, J.K., 2012. Postglacial sediment budget of Chilliwack Valley, British Columbia. Earth Surface Processes and Landforms 37, 1243–1262.
Van Westen, C.J., Castellanos, E., Kuriakose, S.L., 2008. Spatial data for landslide susceptibility, hazard, and vulnerability assessment: an overview. Engineering Geology 102, 112–131.
Van Westen, C.J., Terlien, M.T.J., 1996. An approach towards deterministic landslide hazard analysis in GIS. A case study from Manizales (Colombia). Earth Surface Processes and Landforms 21, 853–868.
Warrick, J.A., Ritchie, A.C., Adelman, G., Adelman, K., Limber, P.W., 2017. New techniques to measure cliff change from historical oblique aerial photographs and structure-from-motion photogrammetry. Journal of Coastal Research 33, 39–55.
Westoby, M.J., Brasington, J., Glasser, N.F., Hambrey, M.J., Reynolds, J.M., 2012. 'Structure-from-Motion' photogrammetry: a low-cost, effective tool for geoscience applications. Geomorphology 179, 300–314.
Whipple, K.X., 2004. Bedrock rivers and the geomorphology of active orogens. Annual Review of Earth and Planetary Sciences 32, 151–185.
Whipple, K.X., Tucker, G.E., 1999. Dynamics of the stream-power river incision model: implications for height limits of mountain ranges, landscape response timescales, and research needs. Journal of Geophysical Research: Solid Earth 104, 17661–17674.
Whipple, K.X., DiBiase, R.A., Crosby, B.T., 2013. Bedrock rivers. In: Shroder, J.F., Wohl, E. (Eds.), Treatise on geomorphology. Academic Press, San Diego.
Wichmann, V., Becht, M., 2006. Rockfall modelling: methods and model application in an Alpine basin. Goltze, Göttingen.
Wilford, D.J., Sakals, M.E., Innes, J.L., Sidle, R.C., Bergerud, W.A., 2004. Recognition of debris flow, debris flood and flood hazard through watershed morphometrics. Landslides 1, 61–66.
Wilson, J.P., Gallant, J.C. (Eds.), 2000. Terrain analysis: principles and applications. Wiley, New York.
Wulder, M.A., Coops, N.C., 2014. Make Earth observations open access. Nature 513, 30–31.
Zink, M., Fiedler, H., Hajnsek, I., Krieger, G., Moreira, A., Werner, M., 2006. The TanDEM-X mission concept. IEEE International Symposium on Geoscience and Remote Sensing, 1938–1941.
Zink, M., Bartusch, M., Miller, D., 2011. TanDEM-X mission status. IEEE International Geoscience and Remote Sensing Symposium 2011, 2290–2293.


Relevant Websites

www.geomorphometry.org/ – The science of digital terrain analysis.
http://gis4geomorphology.com/ – GIS 4 Geomorphology.
https://www.qgis.org/en/site/ – QGIS: A Free and Open Source Geographic Information System.
http://www.saga-gis.org/en/index.html – SAGA: System for Automated Geoscientific Analyses.
http://serc.carleton.edu/NAGTWorkshops/geomorph/vignettes.html – SERC Carleton: Geomorphology Vignettes.


2.06 GIS for Glaciers and Glacial Landforms

Tobias Bolch, University of Zurich, Zürich, Switzerland
David Loibl, Humboldt University of Berlin, Berlin, Germany
© 2018 Elsevier Inc. All rights reserved.

2.06.1 Introduction
2.06.2 Mapping of Glaciers Using Remote Sensing and GIS
2.06.2.1 Mapping Glacier Extents
2.06.2.2 Mapping of Debris-Covered Glaciers
2.06.2.2.1 Surface temperature
2.06.2.2.2 Morphometric approaches
2.06.2.2.3 Glacier motion
2.06.2.3 Mapping Former Glacier Extents
2.06.3 Generation of a Glacier Inventory
2.06.4 Glacier Volume and Glacier Bed Topography
2.06.5 Glacier Changes
2.06.6 Terrain Analysis of Glaciers and Glacial Landforms
2.06.6.1 Morphometric Analysis
2.06.6.1.1 Key morphometric parameters in the glacial context
2.06.6.2 Morphometric Analysis of the Equilibrium Line Altitude
2.06.6.3 Landform Classification
2.06.7 Conclusions and Future Perspectives
References

2.06.1 Introduction

Glaciers outside the Greenland and Antarctic ice sheets cover more than 700,000 km2 (about 5% of the total ice cover) and vary from small cirque glaciers to large ice caps (Pfeffer et al., 2014). Although these glaciers are much smaller in area and volume than the ice sheets, their melt contribution to global sea level rise during the last decade was similar to, or probably even higher than, the contribution of the two ice sheets (Gardner et al., 2013). Glacier melt also contributes significantly to river run-off (Barnett et al., 2005; Radic and Hock, 2013). This meltwater is important for agriculture, industrial and domestic use, and hydropower, not only in arid regions like Central Asia but also in Central Europe and North America (Huss, 2011; Kaser et al., 2010). This is especially the case in the summer months, when water demand is usually highest (Barnett et al., 2005). Glaciers are also a typical element of the landscape and can be a tourist attraction in mountain and polar areas.

A glacier is defined as "a mass of surface-ice on land which flows down under gravity. In general, a glacier is formed and maintained by accumulation of snow at high altitudes, balanced by melting at low altitudes or discharge into lakes and the sea" (WGMS, 2008). Glaciers react sensitively to climate forcing, and fluctuations have occurred at all times. These fluctuations are easy to recognize and can be measured; hence, glaciers serve as key indicators of climate change in remote mountain areas where climate stations are uncommon (Vaughan et al., 2013). The large majority of the glaciers on Earth have been retreating since the end of the Little Ice Age (LIA), with an increasing rate in the last 20–30 years. The rate of recession, however, varies between mountain regions, and there have also been advancing glaciers in recent years, especially in the Karakoram Himalaya (Barry, 2006; Kargel et al., 2014; WGMS, 2008; Zemp et al., 2015; Bolch et al., 2012). Former glacier extents can be recognized in the landscape by moraines. Most prominent are the moraines of the last maximum extent, from the so-called LIA, a period of lower temperatures starting around the 1350s and ending around the mid-19th century (Wanner et al., 2008; Fig. 1).

Recent changes in glacier area and length are visible even to a layperson on multitemporal aerial and/or satellite images and can be investigated using remote sensing and GIS. However, these changes show an indirect signal only, as the response time which a glacier needs to adjust to a new equilibrium has to be considered (Bolch et al., 2012). Glaciers with debris-covered tongues, common in the Himalayas (Bolch et al., 2012; Scherler et al., 2011) and several other mountain ranges of the world, need special consideration, as thick debris insulates the ice (Nicholson and Benn, 2006) but also favors the development of supraglacial lakes and ice cliffs, which are hot spots of glacier melt (Benn et al., 2012; Pellicciotti et al., 2015). Some glaciers also show periodic rapid advances in which ice mass is redistributed to lower elevations, probably triggered by changes in subglacial hydrology (Sevestre and Benn, 2015). The best measure of glacier status is mass balance, which is directly coupled to climate and allows comparison with other regions (WGMS, 2008). In-situ field measurements are of high importance but laborious and, hence, only feasible for a small number of glaciers.
Remote sensing offers the possibility of investigating glacier changes for a larger region simultaneously, because glaciers can in general be well identified, both visually and in an automated way, using multispectral imagery (Bolch et al., 2010a; Paul et al., 2002). The debris-covered glacier parts are, however, difficult to identify automatically due to the similar spectral signal of the surrounding debris. The first sensor suitable for automated glacier mapping was the Thematic Mapper (TM), launched with Landsat 4 in 1982.


Fig. 1 Tschiervagletscher in the Swiss Alps (A, photo: T. Bolch); debris-covered glaciers in Khumbu Himalaya with Lhotse and Imja glaciers (B, pseudo-3D view using ASTER data, generated by T. Bolch); Aksu Glacier in the northern Tien Shan (C, photo: T. Bolch). The moraines indicating the glacier extents of the Little Ice Age are clearly visible. The areal loss of the debris-covered glaciers is clearly smaller.

Earlier images are rare but, since about the year 2000, suitable multispectral satellite images have been available for most of the glacierized areas of the world, due to the frequent acquisitions of Terra ASTER, Landsat ETM+, OLI, and other sensors, and the opening of the whole Landsat archive. Nowadays, acquisitions are made every few days by Sentinel-2, Landsat OLI, and other sensors, so the chances of obtaining suitable cloud-free images are much higher (Paul et al., 2016). High-resolution images have been available since 2000 from the Ikonos and Quickbird satellites, and several options now exist to obtain images of glaciers with a resolution of 1 m and better. Declassified reconnaissance images, such as those from the Corona and Hexagon missions of the 1960s and 1970s, are suitable to extend the analysis back in time. These images are panchromatic but have a resolution of better than 8 m, KH-4B even better than 3 m (Dashora et al., 2007; Surazakov and Aizen, 2010).

Multitemporal digital elevation models (DEMs) allow the calculation of glacier mass balances (Bamber and Rivera, 2007) and complement existing in-situ mass balance measurements (Bolch, 2015) or can be used to reanalyze existing in-situ measurements (Zemp et al., 2010). Accurate DEMs are required in order to obtain significant signals, especially for smaller mountain glaciers. The DEM from the Shuttle Radar Topography Mission (SRTM), acquired in February 2000 (Farr et al., 2007), is a suitable data set and is often used for this purpose (e.g., Gardelle et al., 2013; Paul and Haeberli, 2008; Schiefer et al., 2007). However, it has a single time stamp and, more problematically, the C-band radar beam can penetrate several meters into ice and snow, leading to large uncertainties (Gardelle et al., 2012; Kääb et al., 2015). Stereo ASTER is an economical source for DEM generation with a spatial resolution of 30 m, but suffers from height inaccuracies and artifacts (Kääb, 2002; Kamp et al., 2005; Toutin, 2008). Higher-resolution data such as WorldView-3 and -4, Pléiades, and Cartosat-1 are a suitable source of DEMs for the recent period (Ahmed et al., 2007; Bignone and Umakawa, 2008), while Corona and Hexagon also offer stereo data, which makes them a suitable source for reconstructing past surface conditions (Altmaier and Kany, 2002; Surazakov and Aizen, 2010).

An accurate representation of the surface is also required to obtain topographic and geomorphometric parameters. These parameters support, among other tasks, the mapping of debris-covered glaciers (Bishop et al., 2001; Bolch and Kamp, 2006; Shukla et al., 2010). While image processing often requires specific image analysis or photogrammetric software, the final processing, manual digitization, detailed analysis, and change detection are mostly done in GIS software. Within this contribution, we provide an overview of the possibilities of remote sensing and GIS technologies to map and analyze glacial landscapes, glaciers and their changes over time, and to identify typical landforms such as drumlins, roches moutonnées, and the moraines of the LIA.
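As a compact illustration of the geodetic approach mentioned above (differencing multitemporal DEMs), the following sketch converts an elevation-change grid into a mean specific mass balance. The volume-to-mass conversion density of 850 kg m−3 is a commonly used assumption that must be justified case by case, and all names are illustrative:

import numpy as np

def geodetic_mass_balance(dem_t1, dem_t2, glacier_mask, years, rho=850.0):
    """Mean specific mass balance (m w.e. per year) over the glacier."""
    dh = (dem_t2 - dem_t1)[glacier_mask]     # elevation change on ice (m)
    return np.nanmean(dh) * rho / 1000.0 / years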

2.06.2 Mapping of Glaciers Using Remote Sensing and GIS

2.06.2.1 Mapping Glacier Extents

Much work has been done to map ice-covered areas using remote sensing techniques since the launch of the first Landsat satellite in 1972. Although some promising approaches exist to map glaciers based on the earlier MSS sensor (e.g., Della Ventura et al., 1987; Gratton et al., 1990; Rott, 1976), accurate automated mapping of clean-ice glaciers has only been possible since the launch of Landsat TM in 1982, which carried a short-wave infrared (SWIR) band. This is important, as snow and clean ice have high reflectance in the visible bands and very low reflectance in the SWIR band. The near-infrared (NIR) band allows distinguishing between snow and ice (Fig. 2A). Since then, several promising approaches to automatically delineate glacier ice have been presented, such as supervised classification or thresholding of ratio images and the Normalized Difference Snow Index (NDSI) (Aniya et al., 1996; Bayr et al., 1994; Hall et al., 1995; Paul et al., 2002; Racoviteanu et al., 2008; Rott, 1994). A good evaluation of automated mapping methods is presented by Sidjak and Wheate (1999). The ratio VIS-RED/SWIR has the advantage over the ratio NIR/SWIR that it works better in shadow and with thin debris cover (Andreassen et al., 2008; Paul and Kääb, 2005), but the number of misclassified pixels can be higher (Bolch and Kamp, 2006; Fig. 2B).

Thresholding of ratio images was found to be the most accurate, robust, and time-effective method to map clean-ice glaciers, besides time-intensive manual digitizing (Paul et al., 2013). Most important are suitable images (no clouds on glaciers, small shadows, and little snow cover) and a carefully selected threshold (which for Landsat images is usually around 2.0; Bolch et al., 2010a; Paul et al., 2002). The application is simple and can be realized in almost all GIS software; a minimal scripting sketch of this workflow is given after the list of limitations below. Hence, the ratio method is currently often utilized to generate glacier inventories for many mountain regions of the world (e.g., Andreassen et al., 2008; Bolch et al., 2010a; Guo et al., 2015). The launch of Landsat 8 OLI (11 Feb. 2013) and especially of Sentinel-2 MSI (23 Jun. 2015) provided a major improvement, due to increased spatial and radiometric resolution, a wider swath width, and more frequent coverage (Paul et al., 2016). However, the automated methods have some major disadvantages:

(a) The reflection of water bodies is similar to that of the used ratio bands, with much higher reflection in the visible and near-infrared bands than in the SWIR band. Therefore, adjacent water would be included in the glacier area (Fig. 2B).

Fig. 2 Example of automated glacier mapping at Bernina Group, Swiss Alps. (A) TM4/TM5 ratio image; (B) delineation TM4/TM5 >1 (white), delineation TM3/TM5 >1 (white and red); (C) delineation of glaciers (white), vegetated areas (green), water (blue), rock and debris (brown); (D) raw glacier outlines after raster-to-vector transformation (background: Landsat ETM+ scene, bands: SWIR, NIR, VIS-RED). Bolch, T. and Kamp, U. (2006). Glacier mapping in high mountains using DEMs, Landsat and ASTER Data. Grazer Schriften der Geographie und Raumforschung 41, 13–24.


(b) The ratio method, as well as other standard automated methods, fails to include debris-covered glacier areas due to their spectral similarity to the surrounding bedrock. (c) Adverse snow conditions can make a delineation of the upper glacier boundary impossible. Wrongly classified clear water bodies can be detected using the Normalized Difference Water Index (NDWI) (Fig. 2C), but automated methods have problems with turbid or ice-covered lakes. Several automated methods have been suggested to map debris-covered glaciers. These reduce the workload, but manual editing is still required to achieve suitable accuracy (see next section). No automated method can detect glacier ice under snow cover. A possibility for removing misclassified snow-covered areas is to apply a slope threshold (usually between 45 and 60 degrees; Bajracharya and Shrestha, 2011) using a DEM, as snow usually cannot accumulate under these conditions. However, cold hanging glaciers can occur on steep slopes and would also be removed. Hence, to achieve the best possible accuracy, manual editing is required. Most favorable for this step is the vector data domain in a GIS environment. Therefore, the gridded results need to be converted into vector format (Fig. 2D). Usually the manual editing of the automatically derived outlines is quite time consuming and, hence, glacier inventories relying on manual digitizing are also widely applied, even for large areas (Nuimura et al., 2015). The uncertainty of the glacier outlines depends primarily on the image resolution, the suitability of the scenes, the glacier characteristics (especially debris cover) and the chosen methods. With suitable scenes the uncertainty is usually between 2% and 5% (Bolch et al., 2010a; Paul et al., 2013). Glacier areas delineated from lower-resolution images tend to be larger (Paul et al., 2013, 2016). Most problematic is the accurate delineation of the debris-covered parts. Sometimes glacier margins cannot be detected even with very high-resolution imagery, and huge differences in interpretation exist among experts (Fig. 3; Paul et al., 2013).
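The core of this workflow reduces to a few raster operations. The following minimal sketch, assuming the Landsat bands and a slope grid have already been read into co-registered float numpy arrays (e.g., with rasterio), combines ratio thresholding, an NDWI water mask and a slope filter; the band names, the NDWI variant (a McFeeters-style index) and all thresholds are illustrative assumptions that would need tuning per scene:

```python
import numpy as np

def map_clean_ice(green, red, nir, swir, slope_deg,
                  ratio_threshold=2.0,   # typical for Landsat ratio images (see text)
                  slope_limit=60.0):     # snow rarely accumulates above 45-60 degrees
    """Boolean clean-ice mask from TM-like bands; a sketch, not a validated workflow."""
    eps = 1e-6                                     # guard against division by zero
    ice = (red / (swir + eps)) > ratio_threshold   # VIS-RED/SWIR thresholding

    # Mask clear water with a McFeeters-style NDWI; turbid or ice-covered
    # lakes will still need manual checks, as noted in the text.
    ndwi = (green - nir) / (green + nir + eps)
    ice &= ~(ndwi > 0.0)

    # Slope filter against misclassified steep, snow-covered rock faces
    ice &= slope_deg < slope_limit
    return ice
```

The resulting raster would then be converted to polygons (raster-to-vector transformation) for the manual editing step described above.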

Fig. 3 Examples from the multiple digitizations of glaciers using a Landsat TM scene (A, B) or high-resolution images (C, D) performed by different analysts (colored lines). White outlines refer to the automatically derived extents; large deviations exist for the debris-covered glacier snout (C). Paul, F., Barrand, N., Baumann, S., Berthier, E., Bolch, T., Casey, K. A., Frey, H., Joshi, S.P., Konovalov, V., LeBris, R., Mölg, N., Nosenko, G., Nuth, C., Pope, A., Racoviteanu, A., Rastner, P., Raup, B., Scharrer, K., Steffen, S. and Winsvold, S. (2013). On the accuracy of glacier outlines derived from remote sensing data. Annals of Glaciology 54 (63), 171–182.


2.06.2.2 Mapping of Debris-Covered Glaciers

Debris-covered glaciers occur in every glacierized mountain range and are especially common in the Himalayas, where more than 10% of the glacier area is covered by supraglacial debris (Bajracharya and Shrestha, 2011; Bolch et al., 2012). It is therefore important to automate not only the mapping of clean ice but also that of debris-covered ice. Methods relying on multispectral images alone have problems due to the similar spectral signal of the debris surrounding the glacier. Hence, additional information is required to map debris-covered glaciers, and several methods relying on GIS technologies have been presented for this purpose. Encouraging approaches are based on thermal information (Ranzi et al., 2004), morphometric analysis (Bishop et al., 2001), analysis of slope gradients in combination with change detection and neighborhood analysis (Paul et al., 2004), information about motion based on SAR or optical imagery (Frey et al., 2012; Smith et al., 2015), or a combination of several of these parameters using multidimensional approaches (Bhambri et al., 2011a; Bolch et al., 2007; Racoviteanu and Williams, 2012) or object-based image classification (Rastner et al., 2012; Robson et al., 2015). These approaches are briefly presented in the following.

2.06.2.2.1 Surface temperature

The underlying ice is expected to cool the supraglacial debris (Ranzi et al., 2004). This is measurable at the surface with the thermal information from satellite imagery (e.g., ASTER or Landsat), provided the debris cover does not exceed a certain density and/or thickness (e.g., 0.5 m). Mihalcea et al. (2008) showed the suitability of ASTER thermal data not only for delineating supraglacial debris but also for estimating its thickness at Miage Glacier (Mont Blanc Massif, Italy) and Baltoro Glacier (Karakoram, Pakistan). However, thermal information is problematic for glaciers with thick debris cover, as nonglacier areas in shadow can have surface temperatures similar to clean-ice areas, while debris-covered ice under direct radiation is warmer. This hinders a clear differentiation of glacier and nonglacier areas. An additional drawback is the relatively low resolution of the thermal bands.

2.06.2.2.2 Morphometric approaches

Debris-covered glacier tongues have typical morphometric characteristics. They usually have relatively shallow slope gradients, and their surfaces are typically slightly convex, with the convexity increasing toward the edges, followed by a concave break toward the lateral moraines or mountain slopes. Most GIS software, e.g., ArcGIS, SAGA or GRASS, allows DEM analyses to derive the relevant morphometric parameters. Slope has been shown to be the key parameter to support the identification of debris-covered glaciers (Bhambri et al., 2011a; Paul et al., 2004). Suggested threshold values differ depending on glacier type and the characteristics of the snout, and vary between 12 degrees for Khumbu Glacier (Himalaya; Bolch et al., 2007) and 24 degrees for Oberaletschgletscher (Swiss Alps; Paul et al., 2004). Curvature allows better detection of changes in slope. Profile curvature reveals the convexity of lateral moraines and also the concavity of the glacier valley boundary where steep lateral moraines join. Plan curvature accentuates the crests of ridges and the glacier valley bed. Curvatures are, hence, suitable to detect glacier margins or glacial landforms such as lateral moraines (Bolch and Kamp, 2006). The main problems occur at the end of the glacier termini, but also at the glacier margins if the transition to nonglaciated terrain is smooth, e.g., if the lateral moraine is missing or not represented in the DEM (Fig. 4). Plan and profile curvature can also be used to identify the margins of the snouts of valley glaciers (Fig. 17E).
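As a minimal sketch of such a morphometric screening, the following derives slope from a DEM grid with simple finite differences and flags gently sloped candidate pixels; the 24-degree threshold is taken from the range quoted above, while the curvature proxy (a plain Laplacian rather than the exact plan/profile curvature of any particular GIS package) is an assumption for illustration only:

```python
import numpy as np

def slope_degrees(dem, cellsize):
    """Slope angle in degrees from a DEM array via finite differences."""
    dzdy, dzdx = np.gradient(dem, cellsize)   # elevation gradients per axis
    return np.degrees(np.arctan(np.hypot(dzdx, dzdy)))

def curvature_proxy(dem, cellsize):
    """Laplacian of elevation as a crude stand-in for curvature; it
    highlights the convex/concave slope breaks discussed in the text."""
    dzdy, dzdx = np.gradient(dem, cellsize)
    return (np.gradient(dzdx, cellsize, axis=1) +
            np.gradient(dzdy, cellsize, axis=0))

# Candidate debris-covered tongue pixels: shallow slopes below a
# glacier-type-dependent threshold (12-24 degrees per the text).
# dem = ...  # 2D numpy array, e.g., an SRTM subset read with rasterio
# candidates = slope_degrees(dem, 30.0) < 24.0
```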

2.06.2.2.3 Glacier motion

Glacier tongues show ice flow downvalley. The surface velocity usually decreases toward the terminus, and the most lateral parts of the debris-covered tongues might be stagnant (Bolch et al., 2008a; Quincey et al., 2009). However, even slow movements cause decorrelation of SAR images and, hence, SAR coherence images, especially from ALOS PALSAR and Sentinel-1 data, have been shown to be a suitable tool to detect debris-covered glacier snouts. Frey et al. (2012) used coherence imagery to support the detection and manual delineation of debris-covered glacier snouts in a GIS. However, all methods presented here either have high inaccuracies (error > 10%), are optimized for a small region or a single glacier (Bhambri et al., 2011a), or provide additional information to support manual editing (Bolch and Kamp, 2006; Frey et al., 2012). The accuracy is limited by the coarser resolution of the thermal bands and the quality of the utilized DEM. Hence, the results are only reasonable for larger valley glaciers with clearly defined margins (e.g., lateral moraines). Paul et al. (2004) have shown that multispectral image classification (glacier ice, vegetation), neighborhood analysis (connection to glacier ice), and change detection are useful to map debris-covered glaciers. Most promising for automated mapping of debris-covered glaciers are multidimensional approaches combining several parameters, such as (i) glacier velocity, morphometric and multispectral information (Smith et al., 2015) or (ii) morphometric, multispectral and thermal information, plus information about the shape of the feature using artificial intelligence (Quincey et al., 2014). The results of the latter approach are encouraging but require in-depth programming skills. Object-based image analysis (OBIA) also allows the combination of different relevant parameters and is available in software packages. Rastner et al. (2012) and Robson et al. (2015) show the suitability of the OBIA approach for glacier mapping (Fig. 5).

2.06.2.3 Mapping Former Glacier Extents

Fig. 4 Examples from debris-covered glaciers at Mt. Everest: ASTER thermal bands 14–12–10; the debris-covered glacier tongues appear in light purple (A); slope gradients calculated from the SRTM3 DEM, with gentle slopes shown in green and slopes steeper than 24 degrees in red; the glaciers can almost be identified based on slope alone (B). Bolch, T., Buchroithner, M. F., Kunert, A. and Kamp, U. (2007). Automated delineation of debris-covered glaciers based on ASTER data. In: Gomarasca, M. A. (ed.). GeoInformation in Europe. Proceedings of the 27th EARSeL Symposium, 4–7 June 2007, Bozen, Italy, pp. 403–410. Millpress: Netherlands.

Fig. 5 Comparison of the object-based image analysis results (OBIA, A) and the pixel-based image analysis (PBIA, B). Rastner, P., Bolch, T., Notarnicola, C., and Paul, F. (2014). A comparison of pixel- and object-based glacier classification with optical satellite images. IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing 7 (3), 853–862. http://dx.doi.org/10.1109/JSTARS.2013.2274668.

Fig. 6 Glacier des Bossons seen from le Brévent before 1906 (“Le massif du Montblanc vu du Brévent”), cut-out; photochrom print; 27.0 × 80.0 cm; CAH, Annecy, Collection Paul Payot; photograph by H. J. Zumbühl (A); glacier foreland and LIA moraines of Glacier des Bossons (B). Nussbaumer, S. U. and Zumbühl, H. J. (2012). The Little Ice Age history of the Glacier des Bossons (Mont Blanc massif, France): A new high-resolution glacier length curve based on historical documents. Climatic Change 111 (2), 301–334. doi:10.1007/s10584-011-0130-9.

The application of GIS technology does not add a completely novel approach for mapping and analyzing former glacier extents to the existing methodological framework (Ehlers and Gibbard, 2004a,b,c; Lowe and Walker, 1997). Nevertheless, GIS provides an outstandingly versatile tool to combine and extend existing approaches, some of which have been applied since the dawn of glaciation theory in the early 19th century (Krüger, 2013). Four different approaches to map former glacier extents can be distinguished based on the available data, implicitly referring to the age of the glaciation under investigation: (i) image-based mapping for studies within the period in which the study area is covered by satellite and/or airborne imagery, (ii) document-based reconstruction analyzing historical maps, written accounts, paintings, etc., (iii) landform-based mapping for paleoglaciations whose landform record can still be reconstructed, and (iv) facies-based reconstruction, combining the spatial distribution of glacigenic sediments with constraints on paleogeography to investigate glaciations whose landform record was entirely erased. (i) Mapping former glacier extents by the image-based approach requires datasets depicting the glacier's state at a known point in time, e.g., satellite and/or airborne images (see previous section). Information can also be provided by oblique, perspectively distorted images, but the effort for georeferencing and orthorectification is substantial and therefore such photos have rarely been used for quantitative mapping. (ii) Historical records that allow the reconstruction of past glacier extents, e.g., drawings, paintings, sketches, engravings, photographs, chronicles, topographic maps, or reliefs, are a valuable source of information, particularly when both spatial extent and timing are documented. However, such documents are rare and unevenly distributed across the globe; the majority of glaciers in remote mountain regions were neither drawn nor mentioned in texts before the dawn of sophisticated surveying and imaging
technologies. Some glaciers are shown in maps from explorers of the late 19th or early 20th century, and some glaciers were even the target of dedicated mapping (see, e.g., Bhambri and Bolch, 2009 for information about the Indian Himalaya). In the relatively densely populated European Alps and Scandinavia, however, several prominent glaciers are depicted in hundreds of documents covering several hundred years. In the latter cases, GIS provides powerful tools to collect, evaluate, manage and present the researched data. For example, Nussbaumer and Zumbühl (2012) reconstructed maxima at Glacier des Bossons in the Mont Blanc massif, France, back to 1580 CE, including calculated cumulative length changes and maps of georeferenced frontal positions (Fig. 6). About 400 documents allowed the investigation of up to 300 years of glacial history at eight outlet glaciers of the Jostedalsbreen and Folgefonna ice caps, Norway (Nussbaumer et al., 2011). Knoll et al. (2009) combined historical data sources, including paintings, photographs and historical maps, with airborne LiDAR DEMs and digital orthophotos to constrain LIA glacier extents in South Tyrol, Italy. When working with historical maps, the quality of the initial mapping must also be assessed; particularly glaciers in remote regions were often not mapped in accordance with modern geodetic standards. (iii) Investigation of glacial stages older than the first available imagery requires a different approach. For most regions this applies to all advances older than 50–100 years. Landform evidence representing ice edges during a certain maximum advance is used to delineate glacier extents for glacier stages attributed to the LIA or older. In this context, several linear landforms, particularly crests of laterofrontal moraines and trimlines, indicate the position of former ice edges quite clearly and can be well identified in many remote sensing data by distinct optical contrasts (Figs. 1 and 7). Unlike when mapping from dated imagery, close attention must be paid to the age of individual landforms to ensure that they correlate to the same glacial stage. It is therefore often an advisable first step to research and consider numerical dating results for the landforms under investigation. Before starting extensive mapping, it is recommended to analyze the moraine configuration in detail for at least several prominent glaciers in a particular study area to establish a morphostratigraphy (Lukas, 2006; Penck and Brückner, 1909). It is also worth noting that moraine evidence is limited to the paleoglacier's ablation area owing to the pathways of debris transport in a glacier. As a consequence, former ice boundaries are typically poorly constrained in the upper reaches of a paleoglacier, and it is generally impossible to assess whether a specific rock outcrop, scarp, etc. was formerly covered by ice or not. Moreover, moraine and trimline evidence is patchy or lacking in many settings, inhibiting a straightforward delineation of the former ice boundary. In such cases, reconstructions of ice extent are typically based on interpolating between areas where general glacial landforms are present, including, e.g., drumlins, eskers, and roches moutonnées. Besides these relatively clearly defined features, it is often reasonable to also include more general types such as “hummocky terrain” or “U-shaped valleys”, which are prone to ontological challenges and typically require a specific definition for a particular study region.
In most cases, freely available satellite imagery (e.g., Landsat 7 ETM+ and 8 OLI, Sentinel-2 or Terra ASTER) and DEMs (e.g., SRTM1 or ASTER GDEM2) are well suited to map landform evidence for paleoglacier extents in GIS. A series of studies by Sweden-based researchers used such approaches to map glacial landforms in High Asia, aiming to
constrain the extent of late Pleistocene glaciation in its different mountain ranges (e.g., Heyman et al., 2009; Lindholm and Heyman, 2016; Stroeven et al., 2013) (Fig. 8). The BRITICE project (Clark et al., 2004) compiled >20,000 glacial landform features from more than 1000 publications in one large GIS database, forming a map of the last ice sheet in Britain. (iv) The oldest glacial stages known on Earth today date back to ~2.3 billion years (Tang and Chen, 2013). Because glacial landform evidence from pre-Quaternary glaciations is widely lacking, the reconstruction of ice extents for those periods is based on interpolation between sites providing geological evidence, i.e., rocks formed of glacial sediments (tillites). Further details on the GIS methods relevant in this context are provided in the geology chapter. Briefly reviewing the four approaches to reconstructing former glacier extents reveals that they vary substantially depending on the age of glaciation. Particularly when investigating prehistory, mapping of former glacier extents is closely related to geomorphological mapping and landform classification (cf. “Landform Classification” section). In addition, and owing to the often complex nature of glacial landscapes, mapping and interpretation usually require good knowledge of the overall landscape configuration, its specific glacial history, general glacial geomorphology, and relevant landform types as well as landform associations (Benn and Evans, 2010). Also, levels of uncertainty increase from approach (i) to (iv). This is essentially a consequence of time: the further an investigation reaches into the past, the more depleted the datasets and archives that can be drawn upon. However, local factors, particularly topography, climate and lithology, also have significant effects on the configuration of detectable evidence. The following concluding remarks therefore focus on how to design mapping studies to minimize uncertainty and how to manage and visualize uncertainty in the output. First, a thorough investigation of former glacier extents always requires a good understanding of regional glacier dynamics and its forcing. This can often be achieved by considering different phases of glaciation (Rother et al., 2017) and applying polymethodological approaches, e.g., by combining mapping of modern glacier outlines with landform-based mapping of paleo-extents under consideration of sedimentological evidence (Chandler et al., 2016; Loibl et al., 2015). The degree of confidence with which a former ice boundary can be determined will typically vary spatially. It is therefore important to inform readers in which areas actual evidence exists and where the delineation is based on interpolation or informed guessing. Conveniently, this can be achieved by including the actual evidence in figures and maps, e.g., by using concepts from geomorphological mapping to indicate the locations of relevant landforms, and by applying basic cartographic tools such as solid vs. dashed lines for visual differentiation between evidence and interpolation, respectively.

Fig. 7 Mapping concept for the delineation of modern and Little Ice Age glacier states in southeastern Tibet from Landsat 7 ETM+ imagery (channels 5–4–3); mapped classes comprise Little Ice Age lateral and terminal moraines, active ice indicators (linear flow structures, constrained tributaries), dead ice indicators (unconstrained tributaries, dense vegetation cover), ridge lines (obscured or clearly visible), and the transient snowline. Modified after Loibl, D., Lehmkuhl, F. and Grießinger, J. (2014). Reconstructing glacier retreat since the Little Ice Age in SE Tibet by glacier mapping and equilibrium line altitude calculation. Geomorphology 214, 22–39. doi:10.1016/j.geomorph.2014.03.018.

Fig. 8 Example of landform-based reconstruction of former glacier extents from the Maidika Basin, Tibet. Lindholm, M. S. and Heyman, J. (2016). Glacial geomorphology of the Maidika region, Tibetan Plateau. Journal of Maps 12, 797–803.

2.06.3 Generation of a Glacier Inventory

A glacier inventory does not only consist of data about the glacier area (e.g., calculated from the outlines of single glaciers) but should include further information such as glacier length; minimum, maximum, and median elevation; aspect; and mean slope (Paul et al., 2009). This information is needed for better characterization and comparison of the glaciers, and especially for modeling the glaciers' response to climate. It can readily be calculated in a GIS and requires only glacier outlines and a DEM. The Randolph Glacier Inventory (RGI), whose first version was published in 2012, was the first freely available global glacier inventory (Pfeffer et al., 2014). To generate a glacier inventory, the first step after the glacier outlines have been derived with semiautomated methods is to separate contiguous ice masses into single glaciers, e.g., at ice divides, which separate glaciers as individual entities in a hydrologic sense. This can be done manually with the aid of a flow direction grid (e.g., Paul et al., 2002). However, this is quite time-consuming and hardly feasible for large areas, so automation is recommendable. Most published (semi-)automated approaches are based on hydrological analyses implemented in most GIS packages. These approaches require the identification of a starting point (the pour point at the lowest glacier elevation) from which the contributing upslope area is computed (e.g., Manley, 2008; Schiefer et al., 2008). A simplified approach is to derive glacier drainage basins using the DEM and a buffer around each glacier (Bolch et al., 2010a; Fig. 9); this approach was, for example, applied for the glacier inventory of the whole of Greenland (Rastner et al., 2012) and can be used for different, also larger, extents, in contrast to methods linked to specific glacier outlines at a certain time (Manley, 2008; Schiefer et al., 2008). The optimum buffer size varies according to glacier size and the investigated time period (Bolch et al., 2010a). After clipping the DEM to this buffer, the glacier basins are calculated with hydrological functions implemented in most GIS software packages. Subsequently, the calculated basin grid needs to be converted into polygons representing the glacier drainage basins. These polygons can then be used within the GIS software to clip the glaciers. The basins do not necessarily contain only one glacier, but may also contain several smaller ones. The main uncertainties of this and most other hydrology-based approaches stem from DEM artifacts in the accumulation zones (Fig. 9). A further problem is how to consistently separate small glaciers and ice caps without a distinctive tongue. Finally, small glacier parts reaching slightly over a mountain ridge are incorrectly separated by this algorithm. Hence, manual improvements are necessary in this approach as well. The ice divides calculated by hydrological methods using a single DEM assume no migration of the divides through time. Kienholz et al. (2013) further automated the existing approaches using the lowermost point of a glacier and included a neighborhood analysis in order to prevent glacier tongues with more complicated shapes from being erroneously split, and to merge the sliver polygons (the small parts reaching over the mountain ridges) with the main glacier part. Guo et al. (2015) used hydrological modeling tools and the aspect calculated from a DEM to identify mountain ridges. These ridges are ice divides and were subsequently used to split the contiguous ice masses. Once outlines of individual glaciers are available, glacier area, length, and topographic parameters such as mean, median, minimum, and maximum elevation, slope, and aspect can be calculated. The surface area and also the perimeter can be calculated automatically in a GIS; important in this regard is the use of an appropriate metric projection. The calculation of the glacier length is more demanding. Glacier length is usually defined as “the length of the longest flowline of the whole glacier.” GIS tools can be used to identify the highest point of a glacier, and hydrological tools can then be used to follow the steepest gradient until the lowermost point of the glacier is reached. This method is relatively easy to apply automatically but overestimates the length, as the path does not follow the main flow line in the ablation region but rather one margin, due to the convex nature of the glacier tongues. However, this method is still suitable to get an estimate of the glacier length and allows comparison of individual glaciers (Schiefer et al., 2008). Successful further developments use the glacier geometry to identify center points for defined elevation steps, which are then connected to derive the flowline (Le Bris and Paul, 2013). Kienholz et al. (2014) approximated the glacier length by generating a grid-based least-cost path. Machguth and Huss (2014) calculated the length of all glaciers in the world, also based on constructing a grid-based path, by maximizing the slope angle from one center-line point to the next, with the additional criterion of maximizing the distance to the glacier margins. However, most approaches give the longest centerline only, whereas Kienholz et al. (2014) generate an entire branch-line network.
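The simple steepest-descent variant can be sketched directly on the DEM grid. The following is a minimal illustration, assuming `dem` and a boolean `glacier_mask` are co-registered numpy arrays; as noted above, the resulting path tends to run along one margin of a convex tongue, so the value is an estimate rather than the true flowline length:

```python
import numpy as np

def steepest_descent_length(dem, glacier_mask, cellsize):
    """Approximate glacier length by walking from the highest glacier cell
    to its lowest neighbor until no lower neighbor remains or the path
    leaves the glacier; a sketch of the simple approach in the text."""
    nrows, ncols = dem.shape
    masked = np.where(glacier_mask, dem, -np.inf)
    r, c = np.unravel_index(int(np.argmax(masked)), dem.shape)
    length = 0.0
    while True:
        best = None
        for dr in (-1, 0, 1):             # inspect the 8-neighborhood
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (dr == 0 and dc == 0) or not (0 <= rr < nrows and 0 <= cc < ncols):
                    continue
                if dem[rr, cc] < dem[r, c] and (best is None or dem[rr, cc] < dem[best]):
                    best = (rr, cc)
        if best is None or not glacier_mask[best]:
            return length                  # terminus or glacier edge reached
        length += cellsize * float(np.hypot(best[0] - r, best[1] - c))
        r, c = best
```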
The topographic parameters can be calculated using the glacier outlines, a DEM and its derivatives (slope, aspect), and the zonal statistics tools implemented in GIS software (Fig. 10B). For the aspect, it must be taken into account that it is a circular parameter. Hence, the mean values must be derived by decomposing the aspect into its sine and cosine components (Paul et al., 2009). The median elevation can be considered a suitable proxy of the equilibrium line altitude (ELA), the altitude at which the accumulation equals the ablation of a glacier (Braithwaite and Raper, 2009). The ELA is one of the most important glacier indices as it strongly relates to climate conditions and climate change. Further important information that can be calculated in a GIS is the glacier hypsometry (area-elevation distribution, Fig. 10C). With the ELA and the hypsometry it can, e.g., be calculated how much ice would be prone to additional melt due to an increase of the ELA caused by climate change (Huss and Hock, 2015; Paul et al., 2007).

Fig. 9 Sample of delineated glacier drainage basins for the Western Canadian Glacier Inventory. Red ellipses indicate examples of uncertain ice divides due to the erroneous DEM in the accumulation zones. Bolch, T., Menounos, B. and Wheate, R. D. (2010a). Landsat-based inventory of glaciers in western Canada, 1985–2005. Remote Sensing of Environment 114 (1), 127–137. doi:10.1016/j.rse.2009.08.015.

Fig. 10 Example of important glacier characteristics derived from glacier outlines and a DEM using GIS software: glacier number and area for different size classes (A); aspect of the glaciers (B); area-elevation distribution for two different years (C). Bolch, T., Yao, T., Kang, S., Buchroithner, M.F., Scherer, D., Maussion, F., Huintjes, E. and Schneider, C. (2010b). A glacier inventory for the western Nyainqentanglha Range and Nam Co Basin, Tibet, and glacier changes 1976–2009. Cryosphere 4, 419–433. doi:10.5194/tc-4-419-2010.
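The zonal-statistics step and the circular averaging of aspect described above can be sketched as follows, assuming all inputs are co-registered 2D numpy arrays; the function and key names are illustrative:

```python
import numpy as np

def glacier_topo_stats(dem, aspect_deg, slope_deg, glacier_mask):
    """Per-glacier topographic parameters from a DEM and its derivatives,
    mimicking a GIS zonal-statistics step; a sketch, not a full inventory tool."""
    z = dem[glacier_mask]
    stats = {
        "z_min": float(z.min()),
        "z_max": float(z.max()),
        "z_median": float(np.median(z)),   # proxy for the ELA (see text)
        "slope_mean": float(slope_deg[glacier_mask].mean()),
    }
    # Aspect is circular: decompose into sine/cosine components before averaging
    a = np.deg2rad(aspect_deg[glacier_mask])
    stats["aspect_mean"] = float(
        np.degrees(np.arctan2(np.sin(a).mean(), np.cos(a).mean())) % 360.0
    )
    return stats
```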

2.06.4 Glacier Volume and Glacier Bed Topography

Information about the glacier volume and its distribution is essential for several purposes, such as estimating the potential maximum contribution of glaciers to sea level rise (Huss and Farinotti, 2012; Radic and Hock, 2011), improved modeling of glacier retreat, projections of the future evolution of the glaciers (Huss and Hock, 2015), or their hazard potential (Frey et al., 2010). Whereas glacier area can be mapped directly from satellite, ice thickness information is only indirectly available through the application of geophysical methods or the drilling of boreholes (Clarke, 1987). An approximation of the total ice volume can be calculated from the glacier area using a power-law function (volume-area scaling; Bahr et al., 1997, 2015; Chen and Ohmura, 1990). This approach is widely used due to its simplicity (Adhikari and Marshall, 2012; Grinsted, 2013; Radic and Hock, 2011). Most important is that the scaling parameter and exponent be carefully chosen for different regions and glacier types. However, calculating volume based on the area only can be highly uncertain and depends significantly on the quality of the outlines. Volumes would be significantly different depending on whether contributing areas or glaciers on steep slopes are considered part of the main glacier or not. Moreover, the volume would change suddenly when a glacier disintegrates. This approach is simple to calculate within a GIS but is accompanied by high uncertainties (Frey et al., 2014). In addition, no information is given regarding the spatial distribution of the ice volume. Recently, different modeling approaches have been developed to determine ice thickness and its distribution from glacier outlines and DEMs. With the topographic information available in detailed glacier inventories, it is possible to use glacier length and elevation range to derive a slope-dependent mean thickness for large samples of glaciers (Haeberli and Hoelzle, 1995). The corresponding thickness estimates for individual glaciers are considered more realistic than area-dependent estimates, because glacier thickness is related to flow and, hence, slope-dependent (Cuffey and Paterson, 2010). In contrast to volumes derived by scaling, glaciers of the same size can thus have different volumes. A further important parameter to consider is the basal shear stress of the glacier. This parameter is hard to measure but can be approximated from the vertical glacier extent (Haeberli and Hoelzle, 1995). More physically based approaches rely on mass conservation and principles of ice flow dynamics. For example, Farinotti et al. (2009), whose approach was further developed by Huss and Farinotti (2012), estimated ice thickness by inverting the estimated ice volume flux along the glacier, relying on mass turnover and the shallow ice approximation and assuming perfect plasticity of the ice. These estimates require a detailed parameterization or modeling of the involved physical processes (e.g., ice flux, basal velocity, surface mass balance). Remote sensing can provide relevant inputs such as surface velocity and mass balances. Clarke et al. (2013) estimated the ice thickness for a sample of glaciers in Western Canada using ice extent, surface topography, surface mass balance, and the rate of surface elevation change as basic input. Gantayat et al. (2014) derived ice thickness values for a clean-ice glacier based on surface velocity and the equation of laminar flow. McNabb et al. (2012) calculated the ice thickness with an inverse approach, solving the continuity equation with surface mass-balance rates and surface velocity as basic inputs. An approach which is suitable to implement in a GIS was developed by Linsbauer et al. (2009) (Fig. 11A, B) and applied to the Swiss Alps (Linsbauer et al., 2012). The authors relate ice thickness to local topographic parameters along glacier flow lines, assuming decreasing ice thickness with increasing slope angle. Local ice thickness is estimated in 50 m elevation bins together with a glacier-specific mean value of basal shear stress and is subsequently spatially interpolated within a GIS. The basal shear stress is determined by the elevation range covered by the glacier system, with an upper bound of 1.5 bar for glaciers with an elevation
range exceeding 1600 m (Haeberli and Hoelzle, 1995). This approach requires only glacier outlines, glacier flow or branch lines, and a DEM. The model results are calculated automatically for a large glacier sample using map algebra and a specific tool for spatial extrapolation, such as inverse distance weighting. Fig. 11C, D illustrates the application of this approach for Morteratschgletscher in the Swiss Alps. Comparison with ice thickness measurements from ground-penetrating radar revealed that the general shape of the Morteratsch glacier bed, including the major overdeepenings, is well captured, although locally larger deviations exist. It can therefore be concluded that the approach reproduces the general characteristics of a glacier bed well. Using the same principle, Frey et al. (2014) calculated ice thickness values for randomly selected glacier pixels by considering the local surface topography. Thus, they could avoid manual branch-line digitization and were able to estimate the ice thickness and its distribution for the entire Himalaya. The approach is also suitable to detect overdeepenings in the glacier bed, which can be locations of future glacial lakes in the case of continued glacier retreat (Linsbauer et al., 2012, 2016).

Fig. 11 Flowchart of the method (A) and schematic diagram of the modeled parameters (B); input data: DEM, glacier outline and central branch lines (C); the corresponding modeled ice thickness distribution of Morteratschgletscher (D). (A) and (B) Linsbauer, A., Paul, F., Hoelzle, M., Frey, H. and Haeberli, W. (2009). The Swiss Alps without glaciers – A GIS-based modelling approach for reconstruction of glacier beds. In: Proceedings of Geomorphometry, Zurich, Switzerland, 31 August–2 September 2009, pp. 243–247. (C) and (D) Linsbauer, A. (2013). Modeling ice thickness distribution and glacier bed topography from sparse input data to assess future glacier changes. PhD Thesis. Geographisches Institut: Zurich, 166 pp.
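Two of the simpler estimators above reduce to one-line formulas. The sketch below implements volume-area scaling and the slope-dependent mean-thickness relation h = tau / (f * rho * g * sin(alpha)) after Haeberli and Hoelzle (1995); the scaling constants (c = 0.034, gamma = 1.375) are frequently cited defaults, not values endorsed by this chapter, and must be adapted to region and glacier type as stressed above:

```python
import numpy as np

RHO_ICE = 900.0   # ice density [kg/m3]
G = 9.81          # gravitational acceleration [m/s2]
F_SHAPE = 0.8     # valley shape factor, a common assumption

def volume_area_scaling(area_km2, c=0.034, gamma=1.375):
    """Total ice volume [km3] from glacier area [km2] via V = c * A**gamma;
    parameters must be chosen per region and glacier type (see text)."""
    return c * area_km2 ** gamma

def mean_thickness_from_slope(tau_pa, slope_deg):
    """Slope-dependent mean ice thickness [m] from basal shear stress [Pa],
    after the parameterization of Haeberli and Hoelzle (1995)."""
    return tau_pa / (F_SHAPE * RHO_ICE * G * np.sin(np.deg2rad(slope_deg)))

# Example: a 10 km2 glacier yields ~0.8 km3 by scaling; with tau = 1.5 bar
# (1.5e5 Pa) and a 10-degree mean slope, the mean thickness is ~122 m.
```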


2.06.5 Glacier Changes

In this section, the use of GIS for assessing glacier changes is presented. Glacier length changes or fluctuations of the glacier snout were traditionally measured in the field, sometimes on an annual basis. Systematic measurements started mainly in the mid-20th century, in some cases somewhat earlier. Glacier extents prior to the start of measurements can be derived from historical, geological, and biological evidence (Leclercq et al., 2014). For example, positions of terminal moraines, especially those from the LIA, can often be (i) identified from satellite imagery (see “Mapping Former Glacier Extents” section; Figs. 1 and 14) and (ii) used to extend length-change information back in time. One difficulty, especially in the field, is to identify a clear point from which to measure the glacier length change. Therefore, the distance is measured from several points perpendicular to the glacier snout (Fig. 12A), and the length change is calculated as the mean of the changes along each line. With the increasing availability of airborne and satellite imagery, it also became practical to measure glacier length changes or snout fluctuations using multitemporal images in a GIS. Most important is that the images are all properly orthorectified or coregistered. The change is often measured along the central flow line, which can be identified more easily from images than on the ground. However, changing shapes of glacier termini can make comparable measurements difficult; for example, reported retreat rates of Gangotri Glacier in Garhwal Himalaya vary significantly (Bhambri et al., 2012). Hence, it is recommended to use a methodology similar to ground-based measurements and to calculate retreat rates from the intersection of parallel stripes with the glacier outline (Fig. 12A). Almost 500 length series are available which start before 1950 and cover at least four decades. The longest record starts in 1535 (Mer de Glace), but the majority of time series start after 1850 (Fig. 12B; Leclercq et al., 2014). The earliest available sources in which glaciers are adequately represented are usually topographic maps. One of the oldest accurate topographic map series is the Dufour map of Switzerland, which covers the entire country at a scale of 1:100,000 and was published between 1845 and 1860. The glacier extent shown is, hence, close to the maximum extent of the LIA. Several accurate maps are available for glaciers in the Himalayas, generated by explorers in the early 20th century (Bhambri and Bolch, 2009; Schmidt and Nüsser, 2009). The earliest available satellite images are from the Corona and Hexagon missions, acquired by the United States in the 1960s and 1970s, especially over the Soviet Union during the Cold War. These panchromatic images have a relatively high spatial resolution of between 2 and 7.6 m (Dashora et al., 2007). Despite complex distortions, they have proven suitable for glacier mapping and glacier change analysis. Ideally, orthoimages should be generated. However, as this process is quite demanding due to the distortions and the only approximately known camera parameters, these images are often coregistered to recent imagery using transformations implemented in GIS software (e.g., Bhambri et al., 2011b; Bolch et al., 2010b; Narama et al., 2010; Schmidt and Nüsser, 2012). Glacier monitoring became possible with the start of the Landsat mission in 1972 and further improved with the Landsat 4 mission carrying the improved TM sensor (cf. “Mapping Glacier Extents” section). However, cloud cover was a serious problem for glacier monitoring in humid regions like the Norwegian Scandes (Andreassen et al., 2008) or SE Tibet (Guo et al., 2015). This situation improved significantly with the launch of Landsat 8 OLI and Sentinel-2, as the chance of acquiring a suitable image is much higher due to the repeat cycle of 10 days or less (Paul et al., 2016). The opening of the entire Landsat archive in 2008 provided a huge boost to glacier studies, so that studies are now available for almost every glacierized region of the globe. However, many studies focus only on a single catchment or a smaller region. Studies presenting glacier changes over a large area include Narama et al. (2010) for the Tien Shan and Bolch et al. (2010a) for Western Canada. GIS tools are instrumental in analyzing glacier changes with respect to area and topographic setting. Usually, larger glaciers lose relatively less area than smaller ones (Fig. 13), and south-exposed glaciers show a stronger retreat than north-exposed ones. Glaciers are an excellent indicator of climate change because even laypeople can recognize the changes easily. GIS tools allow glacier changes to be visualized, e.g., by overlaying outlines derived from satellite images of different years (Fig. 14). As mentioned previously, glacier area and length changes are indirect signals of climate, while the glacier mass budget can be linked directly to climate forcing. Comparison of DEMs from two different times allows the analysis of changes in surface elevation and hence provides information about glacier mass changes (Bamber and Rivera, 2007). The information can be derived by a simple difference operation in a GIS and needs only a glacier outline as additional input. Most important is that the utilized DEMs are well coregistered to each other (Nuth and Kääb, 2011). The freely available near-global SRTM DEM (coverage 60°N to 56°S), acquired in Feb. 2000, provided an important baseline dataset for comparisons with DEMs from other periods. These can be, e.g., national DEMs such as the TRIM DEM for Western Canada (Schiefer et al., 2007) or the DHM25 and the recent SwissAlti3d from Switzerland (Fischer et al., 2015; Paul and Haeberli, 2008). DEMs can also be generated from stereo satellite images. Frequently used data with stereo capabilities are ASTER (Berthier et al., 2016; Bolch et al., 2017; Kääb, 2008) or SPOT data (Berthier et al., 2007; Berthier and Toutin, 2008; Gardelle et al., 2013; Pieczonka et al., 2013). InSAR-derived DEMs, especially from TerraSAR-X and TanDEM-X data, are increasingly used (Neckel et al., 2013; Rankl and Braun, 2016). Most available studies cover the period after the year 2000. Declassified stereo data from the 1960s and 1970s have also been successfully applied to estimate glacier mass changes, e.g., in the Mt. Everest area using Corona data (Bolch et al., 2008b; Bolch et al., 2011; Fig. 15). Especially promising are Hexagon data, due to less image distortion and larger ground coverage (Bolch et al., 2017; Maurer et al., 2016; Pieczonka and Bolch, 2015).

Fig. 12 Glacier outlines derived from different satellite imagery with overlaid stripes at 50 m spacing; the average retreat rates are derived from the intersection of the glacier outlines with the band of stripes (A), source: Bhambri, R., Bolch, T. and Chaujar, R. K. (2012). Frontal recession of Gangotri Glacier, Garhwal Himalayas, from 1965–2006, measured through high resolution remote sensing data. Current Science 102 (3), 489–494. Examples of glacier length records from different parts of the world; each dot represents a data point (B), source: Leclercq, P. W., Oerlemans, J., Basagic, H. J., Bushueva, I., Cook, A. J., and Le Bris, R. (2014). A data set of worldwide glacier length fluctuations. The Cryosphere 8 (2), 659–672. doi:10.5194/tc-8-659-2014.

Fig. 13 Scatter plot showing area changes vs. initial glacier area for a case study in the northern Tien Shan: smaller glaciers show larger relative area changes than large glaciers, and similar characteristics can be found in most regions of the globe. Bolch, T. (2007). Climate change and glacier retreat in northern Tien Shan (Kazakhstan/Kyrgyzstan) using remote sensing data. Global and Planetary Change 56, 1–12.
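The DEM-difference operation described above is straightforward once the DEMs are co-registered. A minimal sketch, assuming two co-registered DEM arrays and a glacier mask; the conversion density of 850 kg/m3 is a commonly used assumption in the geodetic mass-balance literature, not a value prescribed here:

```python
import numpy as np

DENSITY_CONVERSION = 850.0  # kg/m3; assumed volume-to-mass conversion density

def geodetic_mass_balance(dem_t0, dem_t1, glacier_mask, cellsize, years):
    """Mean elevation change, volume change and specific mass balance from
    two co-registered DEMs; a minimal sketch without outlier filtering,
    void handling, or uncertainty assessment."""
    dh = np.where(glacier_mask, dem_t1 - dem_t0, np.nan)     # elevation change [m]
    mean_dh = float(np.nanmean(dh))                          # glacier-wide mean [m]
    dv = float(np.nansum(dh)) * cellsize ** 2                # volume change [m3]
    b_spec = mean_dh * DENSITY_CONVERSION / 1000.0 / years   # [m w.e. per year]
    return mean_dh, dv, b_spec
```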

Fig. 14 Pseudo-3D view showing the area loss of Tschierva and Roseg glaciers in the Swiss Alps. Background: ASTER image from 2004; lower left: derived area changes.

Fig. 15 Elevation changes of the glaciers south of Mt. Everest derived from 1970 Corona and 2007 Cartosat-1 DEMs. Slightly modified after Bolch, T., Pieczonka, T. and Benn, D. I. (2011). Multi-decadal mass loss of glaciers in the Everest area (Nepal, Himalaya) derived from stereo imagery. Cryosphere 5, 349–358. doi:10.5194/tc-5-349-2011.

2.06.6 Terrain Analysis of Glaciers and Glacial Landforms

In this chapter, methods from analytical cartography that apply three-dimensional data of Earth's surface (especially DEMs) in the context of glaciers and glacial landforms are discussed. The level of detail attainable by such investigations depends largely on the properties and quality of the DEM used. Besides horizontal resolution, also referred to as posting, the type and distribution of errors are of particular relevance, especially when working in high mountain environments, where flaws may affect substantial portions of a dataset and impair terrain analysis (Fig. 16). The SRTM and ASTER GDEM datasets (each with a spatial resolution of 1 arcsec, or ~30 m) are most widely used today owing to free availability and competitive properties (Farr et al., 2007; Hayakawa et al., 2008). Prior to 2015, the SRTM C-band data was only available at a resolution of 3 arcsec (~90 m). Supplemental data providing insights regarding quality and data gaps are available for both DEMs, and it is certainly advisable to consider these before starting detailed analysis. In glaciological studies, the X-band SRTM DEM also has a certain relevance despite its patchy coverage, owing to less penetration into ice and snow (Gardelle et al., 2012). As a commercial product with worldwide coverage and high resolution (~12 m), the TanDEM-X DEM (Moreira et al., 2004) is particularly relevant to glaciologists, since it facilitates detailed investigations of remote study areas. Properties and applications of these DEMs are evaluated in Pipaud et al. (2015). In settings requiring DEMs of higher resolution, digital elevation data must be obtained by individual field campaigns, e.g., through structure from motion (SfM; Westoby et al., 2012), unmanned aerial vehicle flights (Immerzeel et al., 2014), or light detection and ranging (LiDAR) measurements (McCormack et al., 2008).

Fig. 16 Comparison of ASTER (left column) and SRTM (right column) DEMs in glaciated high mountain environments using the example region of Inylchek Glacier, Tien Shan, Central Asia. (A) Elevation and semitransparent hillshade from ASTER GDEM2 data; (B) number of ASTER tiles that were used for DEM processing; (C) elevation and semitransparent hillshade from the void-filled SRTM three-arcsecond DEM; (D) voids in the raw one-arcsecond SRTM dataset. Geodetic elevations for the summits marked in (A) and (C) are 7010 m a.s.l. (Khan Tengri, north) and 7439 m a.s.l. (Mt. Tomur/Jengish Chokusu, south); note the underestimation of summit elevations, particularly by the SRTM dataset. Red markers indicate prominent examples of: (1) etching-like blunders resulting from clouds in the original ASTER imagery; (2) bumpy surface noise; (3) sharp edges between original SRTM and void-fill data (GTOPO30); (4) smooth surfaces lacking detail (interpolated GTOPO30 data).


2.06.6.1 Morphometric Analysis

The term morphometry refers to the quantitative analysis of form, complementary to morphography, conceived as qualitative description. Owing to the specific relationships between glaciers, climate, topography, bedrock, sediments, etc., the majority of glacial landforms have well-defined morphographic characteristics. A straightforward example is a terminal moraine, which morphographically could be defined as an elongated linear landform rising above its surroundings, typically forming a lobe. Morphometric analysis of this feature could involve measuring properties such as its highest and lowest elevation, relative height above the surrounding ground, outer and inner slope angles, and the direction it is facing. In contrast to mere morphographic description, the results of morphometric measurements often provide crucial insights into the dynamics of the coupled system under which the features formed, even if the glaciers are long gone. Basically all morphometric information on a landform can be obtained from DEMs, provided sufficiently high resolution and data quality. Owing to its outstanding capabilities regarding DEM analysis, GIS software provides a powerful tool for morphometric investigations. In the following, different morphometric properties are discussed with a focus on their potential in the context of analyzing glacial landforms in GIS.

2.06.6.1.1 Key morphometric parameters in the glacial context

A DEM is typically displayed in a GIS environment with a color ramp stretching from the DEM's lowest to highest values. To facilitate visual interpretation, it is advisable in most cases to load several instances of the DEM and apply different styles and configurations. Depending on the desired look and focus of study, this may include experimenting with different color ramps, using classified vs. stretched display, or stretching the full color ramp over the portion that is currently being displayed. In addition, putting a semitransparent hillshade layer on top of the DEMs creates a three-dimensional, map-like design (Fig. 17A). The resulting setup provides the basis for beginning morphometric analysis with the most fundamental parameter, i.e., elevation. Surface elevation and glaciation are coupled in multiple ways in complex interactive systems. The key to disentangling this system is to understand the two basic controls elevation exerts on local climates: (a) temperature, through the adiabatic lapse rate, meaning air cools by ~0.5 to ~0.98 K per 100 m it is lifted vertically, depending on the moisture content of the air, and (b) precipitation from advected air masses, through orographic rainfall and precipitation shielding in windward and lee positions, respectively (Bishop and Shroder, 2004). One of the fundamental contributions of terrain analysis to glaciology is complementing glacier inventory data with hypsometric information, such as the elevations of the highest summit and the terminus (cf. “Generation of a Glacier Inventory” section). However, hypsometric attributes can also be appended in individual GIS-based glacier studies by applying zonal statistics tools to a polygon shapefile of glacier outlines and a DEM. A similar approach can be used to obtain datasets of cirque floor altitudes (CFA). Cirque floors will typically be mapped in expert-driven manual approaches, because clear morphometric criteria to identify cirques are hard to establish. Conversely, visual identification is relatively straightforward, facilitating the creation of a point shapefile of cirque floor locations. Deriving altitudes from a DEM can subsequently be accomplished with the same zonal statistics technique. Analyzing datasets of CFA measurements, e.g., regarding their spatial distributions or various statistical properties, may provide insights into (paleo)climatic forcing and local variance of glaciation (Barr and Spagnolo, 2015a,b; Rother et al., 2017). Beyond measuring elevations of specific points, further insights on glaciers, individual landforms, or the whole study area can be obtained by investigating the hypsometric properties of 3D surfaces. In a GIS-based DEM analysis, this is achieved by classifying the DEM in adequate steps (also referred to as “elevation bins”). The choice of classification steps depends on the range of values in the sample under consideration and the desired resolution. Subsequent to classification, plots of the histogram and of relative as well as absolute hypsometric curves facilitate investigations of elevation-area relationships (Fig. 10C). Clearly indicating the portions of terrain at individual levels of elevation, such plots are a powerful tool to identify morphological properties of a glacier under investigation, including possible tectonic and lithologic impacts as well as erosional and accumulative properties.
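A sketch of this binning step, assuming the terrain or glacier subset is given as a DEM array plus a boolean mask, with the 100 m bin size chosen purely for illustration:

```python
import numpy as np

def hypsometry(dem, mask, bin_size=100.0):
    """Area-elevation distribution ('elevation bins') and cumulative
    hypsometric curve for the masked terrain; bin size depends on relief
    and desired resolution, as discussed in the text."""
    z = dem[mask]
    lo = np.floor(z.min() / bin_size) * bin_size
    hi = np.ceil(z.max() / bin_size) * bin_size
    edges = np.arange(lo, hi + bin_size, bin_size)
    counts, _ = np.histogram(z, bins=edges)     # pixels per elevation bin
    cumulative = np.cumsum(counts)              # absolute hypsometric curve
    relative = cumulative / cumulative[-1]      # relative hypsometric curve
    return edges, counts, relative
```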
Orogen-scale and worldwide investigations of hypsometric properties of mountain glaciers revealed distinct relationships to regional ELAs and mountain topography, forming the base of the vividly discussed “glacial buzzsaw” theory (Brocklehurst and Whipple, 2004; Hall and Kleman, 2014; Whipple et al., 1999). The example in Fig. 17B presents hypsographic data from the Mt. Everest Region in the Nepalese Himalaya. Here, the DEM histogram shows an almost Gaussian distribution, revealing the transient character of this landscape. The highest points refer to the prominent summits, including the spectacular peaks of Everest and Lhotse. However, the distribution also shows that, despite outstandingly good performance of the SRTM DEM in this region, the geodetic elevation of Everest (8848 m a.s.l.) is not met. The bulk of values lies between 4900 and 5900 m a.s.l., indicated by peak values in pixel counts and a steep rise of the cumulative curve, representing surface elevations of vast avalanching troughs and debris-covered glacier tongues as well as the extraordinarily high base level of erosion of the Tibetan Plateau to the North. Lowest elevations in the detail refer to the deeply incised V-shaped valley in the upper reaches of Dudh Koshi River (lower left corner in Fig. 17A). Implicitly, the discussion of patterns of hypsometric distributions already contains references to the slope angle, which is defined as the change of elevation with distance or, in more mathematical terms, the first derivative of elevation (Hengl and Reuter, 2008). Slope angles exert substantial influence on ice dynamics, including overall flow velocities as well as occurrence of ice falls and avalanching. Calculating slope angles from DEM data is a fundamental feature of all GIS software; sophisticated solutions such as SAGA (Conrad et al., 2015; http://www.saga-gis.org/) or QGIS (http://qgis.org/) allow a choice between different algorithms. In the Everest example, steep cirque headwalls exhibit the highest slope angles, signifying the region’s extremely rugged topography with vast areas of slope angles > 60 degrees. These are in marked contrast to both the sharp, narrow ridges and the surfaces of the debris-covered glacier tongues. Characteristic to the latter are irregular to undulating surfaces of low slope angles, typically in a range of 1–15 degrees (see also “Mapping of Debris-Covered Glaciers” section). Distinct surfaces of near-zero slope angle typically

129

GIS for Glaciers and Glacial Landforms

86°45′

87° 140

6000

28°

28°

(B) 120

5000

100 Count [pixel]

4000 80 3000 60 2000 40 1000

20

0 00

00

85

00

80

00

75

00

70

00

65

00

60

00

55

00

50

00

45

40

35

30

Elevation [m a.s.I.] 87°

86°45

28°

28°

(D) 28°

(C)

00

0 00

87°

27°45′

27°45′

86°45′

3 Cumulative count [pixel • 10 ]

(A)

N NW

NE

87°

27°45′

SE S

87°

86°45′

87°

(F)

28°

28°

(E)

SW

86°45′

28°

27°45′

86°45′

27°45′

E

W

Plan curvature

27°45′

convex –

87°

27°45′

86°45′

concave –straight –

27°45′

Profiel curvature

convex – –straight concave –

Fig. 17 Key morphometric parameters of the Everest region, Nepal, based on the void-filled SRTM3 DEM (A) Elevation with semitransparent hillshade, (B) elevation histogram and cumulative hypsometric curve related to the same DEM detail, (C) classified slope angle in degrees. (D) Classified slope aspects, combined with classified slope angles; (E) classified plan and profile curvature combined (after after Dikau, 1988); (F) landform types from topographic position indices (after Jenness, 2006).


Distinct surfaces of near-zero slope angle typically indicate water bodies. In this high mountain environment, these are typically moraine-dammed proglacial lakes, which are of particular research interest owing to their potential to drain in so-called “glacier lake outburst floods,” posing substantial risks to life and infrastructure downstream (Bolch et al., 2008a; Richardson and Reynolds, 2000). The most prominent example in Fig. 17C is located slightly east of the center, representing Lake Imja Tsho. The second derivative of elevation, i.e., the change of slope angle with distance, is referred to as “curvature.” In contrast to slope angles, where the direction of measurement is always along the steepest path, curvature can be calculated in different directions. The most commonly used types, “profile curvature” and “plan curvature,” quantify the rate of change along the steepest paths and parallel to contour lines, respectively. Curvature has been used to identify moraines and glacier edges (Bishop et al., 2001; Bolch and Kamp, 2006; see also “Mapping of Debris-Covered Glaciers” section). Profile curvature facilitates the identification of knickpoints along a glacier’s course based on a classification of curvatures into convex, concave, and straight. It is also possible to combine profile and plan curvatures to obtain a simple landform classification (Dikau, 1988; Fig. 17E). Further, more specialized approaches to analyzing glacial topography and landforms are discussed in the “Landform Classification” section. The direction a slope faces, commonly referred to as the slope aspect, exerts substantial control on the surface energy balance of mountain glaciers (Evans, 2006). Receipt of solar radiation is higher on slopes facing toward the equator and lower on slopes facing poleward. The resulting contrasts increase with latitude; however, the effect may also be very pronounced at lower latitudes owing to higher potential solar radiation. The Mount Everest massif is a prime example of pronounced contrasts between north- and south-facing slopes, facilitating markedly larger glaciers on the northern side (e.g., Rongbuk and Kangshung) than on the southern side (e.g., Lhotse and Nuptse). Fig. 17D shows how DEM-based aspect calculations also highlight ridges, making aspect a versatile basis for their delineation.
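To make these definitions concrete, the sketch below (Python with NumPy; a simplified illustration assuming a regular grid with square cells and rows running from north to south) derives aspect and approximates profile and plan curvature as directional second derivatives along and across the gradient. Sign and normalization conventions differ between GIS packages, so such outputs should be checked against a reference implementation before use:

import numpy as np

def curvatures(dem, cellsize):
    """Approximate profile and plan curvature from a gridded DEM.

    Directional second derivatives along the gradient (profile) and along
    the contour (plan); simplified relative to full GIS formulas, whose
    sign and normalization conventions vary.
    """
    zy, zx = np.gradient(dem, cellsize)      # first derivatives (rows, columns)
    zyy, _ = np.gradient(zy, cellsize)       # second derivatives
    zxy, zxx = np.gradient(zx, cellsize)
    g2 = zx**2 + zy**2
    g2 = np.where(g2 == 0, np.nan, g2)       # curvature undefined on flat cells
    profile = (zxx * zx**2 + 2 * zxy * zx * zy + zyy * zy**2) / g2
    plan = (zxx * zy**2 - 2 * zxy * zx * zy + zyy * zx**2) / g2
    return profile, plan

def aspect_degrees(dem, cellsize):
    """Compass direction of steepest descent (0 = N, 90 = E), rows = southward."""
    dz_drow, dz_dcol = np.gradient(dem, cellsize)
    east, north = dz_dcol, -dz_drow          # uphill gradient components
    return np.degrees(np.arctan2(-east, -north)) % 360.0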

2.06.6.2 Morphometric Analysis of the Equilibrium Line Altitude

The ELA, i.e., the boundary between a glacier’s accumulation and ablation areas, is a key diagnostic feature in glaciology (Porter, 1975). It integrates the forcing of exogenous drivers from both climate and topography (Nesje, 1992; Ohmura et al., 1990), correlates closely with the glacier’s mass balance, and can be measured or calculated by a wealth of different methods (Benn and Lehmkuhl, 2000). A general classification distinguishes between glaciologic and morphometric approaches, which are based on glaciological field data and on geometric properties of the glacier, respectively. Morphometric approaches to ELA calculation assume distinct relationships between different parameters of a glacier’s form and its mass balance. Many of these relationships were established in empirical studies more than 100 years ago. For example, observations in the European Alps led Höfer (1879) to the conclusion that the ELA was always at the same ratio of altitude between a glacier’s terminus and its average ridge altitude above the ELA (note the circular argument). Since then, a variety of similar approaches has been proposed, varying essentially in the feature considered the “highest point of a glacier,” e.g., the summit (Louis, 1955) or the headwall (Meierding, 1982). Besides these so-called glacier elevation indices (GEI; Benn and Lehmkuhl, 2000), area-based morphometric approaches to estimating ELAs also have a long tradition, beginning with the work of Kurowski (1891; see the review in Braithwaite, 2015). The accumulation area ratio (AAR; Porter, 1975) is considered the most accurate morphometric approach to ELA calculation and has become a widely used standard (Nesje, 1992). A recently released GIS tool has the potential to facilitate AAR application further (Pellitero et al., 2015). Area-based and GEI approaches have both been criticized, e.g., for being too sensitive to slope angles and valley morphology (Nesje, 1992) or for lacking a conceptual relationship to the steady-state ELA (Benn and Lehmkuhl, 2000). Conversely, several studies investigating correlations between morphometric ELA calculations and glaciological ELA data or proxies found good agreement, particularly for mountain glaciers (e.g., Braithwaite, 2015; Loibl et al., 2014). Nevertheless, all studies evaluating morphometric methods to calculate ELAs agree that careful calibration and evaluation are fundamental prerequisites for adequate results. Abundant debris cover and reconstituted glaciers with free ice fall are examples of features rendering all of the previously mentioned methods impossible, whereas glaciers with clean ice and relatively simple geometries are best suited (Benn and Lehmkuhl, 2000; Loibl et al., 2014). It is advisable to check and calibrate morphometric ELA measurements against available mass balance measurements, e.g., at the World Glacier Monitoring Service (WGMS, http://wgms.ch), and against other ELA indicators, e.g., the late summer transient snowline (Loibl et al., 2014; Shea et al., 2013; Spiess et al., 2015). Comparing the focus and potential of the different morphometric approaches, the AAR’s greatest potential is to obtain well-founded estimates for individual glaciers, whereas well-configured GEIs may be applied to gather large datasets with high spatial resolution but less confidence for calculations at individual glaciers (Loibl et al., 2014; Fig. 18).
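For equal-area DEM cells, the AAR method reduces to a quantile of the glacier’s hypsometry: if a calibrated fraction of the glacier area (the AAR) lies above the ELA, the ELA is the (1 − AAR) quantile of the glacier cell elevations. The following minimal sketch (Python with NumPy; the AAR value of 0.6 and the toy hypsometry are illustrative assumptions, and real applications require the calibration discussed above) demonstrates the idea:

import numpy as np

def ela_from_aar(glacier_elevations, aar=0.6):
    """ELA estimate via the accumulation area ratio (AAR) method.

    glacier_elevations : 1-D array of DEM elevations of all glacier cells
                         (equal cell areas assumed)
    aar                : calibrated accumulation area ratio; values around
                         0.5-0.6 are often used but must be checked regionally

    The accumulation area (fraction `aar` of the glacier) lies above the
    ELA, so the ELA is the (1 - aar) quantile of the hypsometry.
    """
    return np.quantile(glacier_elevations, 1.0 - aar)

# Toy hypsometry: most of the glacier area around 5200 m, a smaller share higher up
rng = np.random.default_rng(0)
elev = np.concatenate([rng.normal(5200, 150, 8000), rng.normal(5800, 200, 2000)])
print(round(ela_from_aar(elev, aar=0.6)))  # approximately 5200 m for this example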

2.06.6.3 Landform Classification

As outlined before, morphometric characterization of glacial environments provides quantitative insights into a variety of properties. Its great advantages are objectivity, transferability, and ease of use, at least for standard approaches. However, some tasks cannot be accomplished by outlining morphometric properties alone but require a morphogenetic assignment to landform types, i.e., defining which processes created a certain landform. Such classification may be based exclusively on morphometric parameters but can also include various other information. Sedimentological and lithological properties are of particular relevance in glaciological landform analysis, owing to the abundant information they convey regarding former process regimes (Benn and Evans, 2010). In glaciology, landform classification is primarily applied in investigations of paleoglaciation (section “Mapping Former Glacier Extents”), where the extent and/or former dynamics are reconstructed from landform evidence.


Fig. 18 An example of high-resolution (A) ELA and (B) dELA data from southeastern Tibet, calculated using a “refined toe-to-ridge altitude method” (rTRAM). From Loibl, D., Lehmkuhl, F. and Grießinger, J. (2014). Reconstructing glacier retreat since the Little Ice Age in SE Tibet by glacier mapping and equilibrium line altitude calculation. Geomorphology 214, 22–39. doi:10.1016/j.geomorph.2014.03.018.

The techniques applied for this purpose are closely related to those used, e.g., in geomorphological mapping, but often focus strongly on the specifics of glacial landforms. Before going into further detail, it is useful to revisit some key challenges of geomorphological mapping that have fueled vivid debates for approximately five decades and are also highly relevant when investigating glacial landforms. First, it is important to appreciate that a priori definitions of landforms do not exist; landform classification is always based on human perception. As a consequence, classic geomorphological schools often had a distinct focus on semantics, resulting in an outstandingly elaborate and complex terminology. Nevertheless, semantic approaches to landform definition remain largely descriptive and often vague, making them easy to challenge for their lack of distinct thresholds (Smith and Mark, 2003). This lack of accurately defined thresholds also affects quantitative approaches to landform definition, complicating the determination of boundaries or ranges that a specific morphometric parameter must not exceed for a given landform (e.g., Fisher, 2000). Adding to this challenge, such criteria need to be suitable at different spatial scales even though relevant morphometric parameters, such as curvature, are strictly scale-dependent. Another relevant concept in this context is self-similarity, referring to the fact that many landforms, or different variants of landforms, exist across a wide range of spatial scales, e.g., from microripples to megadunes. Also, different processes may result in similar landforms. Despite these challenges, improved DEMs and enhanced GIS software tools have allowed the nexus between morphometric surface parameters and landform definition to be investigated in more detail (e.g., Böhner et al., 1997; Rasemann et al., 2004). Allred and Luo (2016) demonstrate that combining DEM-based GIS and data mining methods provides a promising tool to identify the key defining morphometric parameters for specific landforms, in their case glacial valleys. From a glaciologist’s point of view, the relevance of these aspects may not be apparent at first glance. However, every time glacial landforms are investigated, these challenges inevitably play a role; not knowing or not considering them will in most cases reduce the quality of the resulting datasets. This is particularly important in the context of reconstructing paleoglaciations, e.g., when building morphosequences, mapping former ice extents, or analyzing moraine chronologies. In practice, creating a table listing all relevant landforms with definitions and boundary criteria provides a good means of clarifying the classification concept (Glasser et al., 2008). The most straightforward approach to classifying landforms in a GIS environment is manual mapping from remote sensing data. Base data are typically DEMs plus DEM-based rasters such as hillshade, slope, and curvature, as well as optical imagery. Even though this setup is relatively simple, the mapping procedure usually requires an expert-level understanding of the landform configuration in the study area. Unless the study area is already very well known, it is not advisable to start mapping the first time the base data are loaded into the GIS environment. Instead, successful mapping requires becoming acquainted with the specific landform configuration before delineating the first feature.
It is therefore advisable to visually analyze the study area first, using combinations of 3D tools such as Google Earth or ArcScene with the 2D view in the GIS environment. This will typically result in identifying one or more key sites which can subsequently serve for initial test mapping. Critically assessing the results of the test mapping will show whether all relevant landforms are considered, whether the shapefiles contain all required attributes, whether mapping scale and map layout are adequate, etc. Manual classification of glacial landforms in GIS environments has become a standard tool. Consequently, a wealth of high-quality examples exists, a brief selection of which is presented in the following to illustrate the range of different applications. However, a unified legend or methodological standard does not exist, resulting in a variety of different designs and different foci of content. At very small scales, remote sensing–based landform classification allows the investigation of patterns of distribution for whole mountain ranges. Such approaches were, for example, used to investigate patterns of spatial distribution of glacial landforms and paleoglaciation by Glasser et al. (2008) and Harrison et al. (2008) in Patagonia as well as by Stroeven


et al. (2013) and Fu et al. (2013) in High Asia. More detailed maps often combine field and remote sensing data. A comparative analysis by Smith et al. (2006) revealed that at larger scales (they used 1:10,000) a combination of field work and high-resolution LiDAR data facilitated the creation of a near-complete inventory of glacial landforms, whereas remote sensing–based mapping from freely available DEMs and satellite imagery was limited to delineating regional patterns of the largest landforms. Lukas (2012) combined aerial photographs and topographic maps to map selected glacial features, i.e., moraines, roches moutonnées, and ice-molded bedrock. Loibl et al. (2015) created high-resolution maps of glacier forefields in remote regions of southeastern Tibet by compiling results from field mapping with DEM analyses and high-resolution satellite imagery from Google Earth. High-resolution aerial imagery and field mapping were used to delineate moraine ridges in the Skálafellsjökull foreland, Iceland, at high spatial detail (Chandler et al., 2016). In the context of dating moraine stages from former glaciation, maps combining landform classification with chronological classification are a powerful tool to visualize complex deglaciated environments (Putnam et al., 2013; Reznichenko et al., 2016; Rother et al., 2017) (Fig. 19). Expert-driven manual mapping is still by far the most precise approach to mapping landforms; the higher the complexity of the landform configuration and of individual landforms, the greater the advantages of manual mapping become. Conversely, manual mapping is very time-consuming and thus expensive in most cases. To address this, several automated and semiautomated GIS methods have been developed. Typically, these approaches aim to speed up recognition but are limited to relatively simple landforms. In addition, the technical implementation of these techniques often requires distinct GIS and programming skills. Probably the most frequently applied approach to semiautomatic classification of landscape elements is the Topographic Position Index (TPI; Jenness, 2006; Fig. 17F). The TPI is based on relatively simple algorithms that calculate the difference between a cell’s elevation value and the average elevation of the neighboring cells. A basic TPI classification is limited to identifying slope positions. However, combining two TPI calculations with different neighborhood sizes allows the delineation of a variety of landforms. Several more sophisticated yet less convenient techniques are demonstrated in Pipaud et al. (2015). A semiautomated DEM-based approach to map geomorphological units in high mountain environments is presented by van Asselen and Seijmonsbergen (2006). Smith et al. (2009) present a semiautomated approach to extract drumlins from DEM data. Hiller and Smith (2008) use a synthetic DEM, representing the same landscape with certain statistically valid modifications, to investigate the quality of the same semiautomated method for delineating drumlins. Beyond the aforementioned approaches, which are all based on relatively straightforward methods of spatial or optical analysis, several studies have applied more complex algorithms to classify landforms. Brown et al. (1998) experimented with maximum likelihood classification and artificial neural networks to classify different landscape elements from Pleistocene glaciations in Michigan, USA. Anders et al. (2015) created an object-based rule set to extract cirques from LiDAR data and color-infrared orthophotos.
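The underlying calculation is simple enough to sketch in a few lines (Python with NumPy and SciPy; the square neighborhoods, the 5 m threshold, and the class names are simplifying assumptions for illustration, whereas Jenness’ original tool uses circular annuli and standardized TPI values):

import numpy as np
from scipy.ndimage import uniform_filter

def tpi(dem, radius_cells):
    """Topographic Position Index: cell elevation minus the mean elevation
    within a square neighborhood of the given radius (in cells)."""
    return dem - uniform_filter(dem, size=2 * radius_cells + 1, mode="nearest")

def landform_classes(dem, small=3, large=15, threshold=5.0):
    """Simplified two-scale TPI landform classification (after Jenness, 2006).

    Thresholds are in elevation units (here meters) and must be tuned to
    DEM resolution and terrain roughness.
    """
    t_small, t_large = tpi(dem, small), tpi(dem, large)
    classes = np.full(dem.shape, "slope", dtype=object)
    classes[(t_small > threshold) & (t_large > threshold)] = "ridge"
    classes[(t_small < -threshold) & (t_large < -threshold)] = "valley"
    classes[(t_small > threshold) & (t_large < -threshold)] = "rise within valley"
    classes[(t_small < -threshold) & (t_large > threshold)] = "incision on upland"
    return classes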

2.06.7 Conclusions and Future Perspectives

We have shown that geo-information technology and software, in combination with remote sensing imagery and DEMs, offer a wide range of applications for analyzing glaciers and glacial landforms. GIS supports not only the mapping of glaciers; the information extraction capabilities of GIS software are key for the generation of glacier inventories. While the extraction of topographic parameters was quite demanding in earlier times, when it was based on topographic map data, this information can now be extracted in a short amount of time from a DEM and the generated glacier outlines. In addition, formerly developed methods can nowadays be implemented in GIS software relatively easily and without in-depth programming skills. An example is the parameterization scheme for estimating glacier volume suggested by Haeberli and Hoelzle (1995), which was later realized in a GIS (Linsbauer et al., 2012). The GIS approach allows estimation not only of the total ice volume of the glaciers but also of the ice volume/thickness distribution, which is especially important for glacier projections. In addition, the information can be calculated for entire mountain ranges simultaneously. Besides the quantification of glacier area and length changes, obtained from direct (glacier margins) or indirect (moraines) evidence using multitemporal sources such as satellite and aerial images or topographic maps, GIS also allows the investigation of changes in glacier surface elevation by differencing DEMs from two or more points in time. Most important in this regard are, besides reliable DEMs, precise co-registration; GIS offers the baseline tools for this process. Not only can glacier changes easily be quantified, but the variability of the changes can also be analyzed with respect to glacier characteristics such as area, length, slope, aspect, or the area-elevation distribution. A GIS also allows the extraction of topographic information from a DEM for morphometric analysis of the terrain, providing information about former glacier characteristics and extents. These analyses are supported by the freely available, near-global SRTM DEM and ASTER GDEM, both with a spatial resolution of about 30 m. Morphometric analysis includes the identification of specific terrain features such as moraines, drumlins, U-shaped valleys, and cirques, which all enable the reconstruction of former glacier coverage. In most cases, these analyses are used to support the manual mapping of such features or of debris-covered glaciers, but automation is developing rapidly. Important features of GIS software are its powerful visualization capabilities, which make it easy, e.g., to display the relief with a 3D effect so that surface features can be readily identified (Fig. 20). Overlays of glacier outlines from multiple periods enable even the layman to grasp the magnitude of glacier changes, including in inaccessible parts of the world. 3D views and animations are especially valuable in this regard.
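Once two DEMs are co-registered on a common grid (e.g., following Nuth and Kääb, 2011), the geodetic computation itself is elementary, as the following sketch shows (Python with NumPy; function and variable names are illustrative, and the outlier filtering and uncertainty treatment required in practice are deliberately omitted):

import numpy as np

def glacier_volume_change(dem_t1, dem_t2, cellsize, glacier_mask):
    """Glacier volume change from two co-registered DEMs on the same grid.

    dem_t1, dem_t2 : 2-D elevation arrays [m] for the earlier/later epoch
    cellsize       : grid spacing [m]
    glacier_mask   : boolean array, True for glacier cells
    """
    dh = dem_t2 - dem_t1                             # surface elevation change [m]
    dh_glacier = np.where(glacier_mask, dh, np.nan)  # restrict to the glacier
    dv = np.nansum(dh_glacier) * cellsize**2         # volume change [m^3]
    mean_dh = np.nanmean(dh_glacier)                 # area-averaged elevation change [m]
    return dv, mean_dh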

[Fig. 19 map legend (panel D, Skálafellsjökull): moraine ridges, fluted moraines, roches moutonnées, glaciofluvial sediments, meltwater channels, kame terraces, water bodies, contour lines (20 m)]
Fig. 19 Examples of GIS-based landform classification in glacial studies: (A) Map of a glacier forefield in New Zealand (Putnam, A. E., Schaefer, J. M., Denton, G. H., Barrell, D. J. A., Birkel, S. D., Andersen, B. G., Kaplan, M. R., Finkel, R. C., Schwartz, R. and Doughty, A. M. (2013). The last glacial maximum at 44°S documented by a 10Be moraine chronology at Lake Ohau, Southern Alps of New Zealand. Quaternary Science Reviews 62, 114–141. doi:10.1016/j.quascirev.2012.10.034); (B) Excerpt from a small-scale map of glacial features in Patagonia (Glasser, N. F., Jansson, K. N., Harrison, S. and Kleman, J. (2008). The glacial geomorphology and Pleistocene history of South America between 38°S and 56°S. Quaternary Science Reviews 27, 365–390. doi:10.1016/j.quascirev.2007.11.011); (C) Large-scale map of a glacier forefield in the eastern Nyainqêntanglha Range, southeastern Tibet (Loibl, D., Hochreuther, P., Schulte, P., Hülle, D., Zhu, H., Bräuning, A. and Lehmkuhl, F. (2015). Toward a late Holocene glacial chronology for the eastern Nyainqêntanglha Range, southeastern Tibet. Quaternary Science Reviews 107, 243–259. doi:10.1016/j.quascirev.2014.10.034); (D) Detailed, very large-scale map of glacial landforms at Skálafellsjökull, Iceland (Chandler, B. M. P., Evans, D. J. A., Roberts, D. H. (2016). Characteristics of recessional moraines at a temperate glacier in SE Iceland: Insights into patterns, rates and drivers of glacier retreat. Quaternary Science Reviews 135, 171–205. doi:10.1016/j.quascirev.2016.01.025).

GIS technology, in combination with freely available or low-cost satellite imagery and DEMs, has led to significantly improved knowledge of glaciers and their changes over recent years and allows the investigation of almost every region of the globe. The knowledge gained provides fundamental data to improve the modeling of glaciers and glacial processes, not only to better understand the past but also to improve future predictions. This is of high relevance for society with respect to the impact of glacier changes on sea-level rise, local hydrology, and water availability, as well as their impact on natural hazards such as rock falls, mudflows, and glacial lake outburst floods.


Fig. 20 Hillshade of a DEM generated from Pleiades data (DEM resolution 2 m) at Muztag Ata (Eastern Pamir). The glacier extent, specific landforms such as rock glaciers and moraines, and also the knick caused by a tectonic fault are clearly visible.

GIS and remote sensing are currently experiencing rapid development as more and more data and study results become available and are, ideally, shared through international databases. Even more important is the development of earth observation technologies. A wealth of data is now provided on a regular basis by the Sentinel missions of the European Space Agency, in addition to the long-running Landsat mission of the USGS. Satellite data from rapidly developing countries like China, India, and Brazil may also become available with fewer restrictions or at lower cost in the near future. Very high resolution stereo imagery, such as Pleiades and WorldView-3, and data from SAR missions like TerraSAR-X or COSMO-SkyMed provide the opportunity to study glaciers and glacial landscapes in unprecedented detail (e.g., Fig. 20). The generation of the WorldDEM, a new global DEM based on TanDEM-X data with a spatial resolution of 12 m and unmatched vertical accuracy, is almost complete, and even higher-resolution DEMs generated from WorldView-3 data have recently been released for several parts of the world, with a focus on the Arctic. Hence, the challenge for the future will be to handle all the available data and information in the most efficient way. In this regard, further automation, effective methods to analyze the data, and new scale-dependent parameterizations are required. GIS technology and software were key in the past and will be even more important in the future, ensuring that the huge amounts of data produced lead to a significant improvement of our knowledge.

References

Adhikari, S., Marshall, S.J., 2012. Glacier volume-area relation for high-order mechanics and transient glacier states. Geophysical Research Letters 39. http://dx.doi.org/10.1029/2012GL052712.
Ahmed, N., Mahtab, A., Agrawal, R., Jayaprasad, P., Pathan, S.K., Ajai, Singh, D.K., Singh, A.K., 2007. Extraction and validation of Cartosat-1 DEM. Journal of the Indian Society of Remote Sensing 35 (2), 121–127. http://dx.doi.org/10.1007/BF02990776.
Allred, K.J., Luo, W., 2016. Quantifying and predicting the glacial extent using valley morphometry and data-mining techniques. Annals of GIS 22 (3), 203–214. http://dx.doi.org/10.1080/19475683.2016.1195873.
Altmaier, A., Kany, C., 2002. Digital surface model generation from CORONA satellite images. ISPRS Journal of Photogrammetry and Remote Sensing 56 (4), 221–235.
Anders, N.S., Seijmonsbergen, A.C., Bouten, W., 2015. Rule set transferability for object-based feature extraction: An example for cirque mapping. Photogrammetric Engineering & Remote Sensing 81 (6), 507–514. http://dx.doi.org/10.14358/PERS.81.6.507.
Andreassen, L.M., Paul, F., Kääb, A., Hausberg, J.E., 2008. Landsat-derived glacier inventory for Jotunheimen, Norway, and deduced glacier changes since the 1930s. The Cryosphere 2 (2), 131–145.
Aniya, M., Sato, H., Naruse, R., Skvarca, P., Casassa, G., 1996. The use of satellite and airborne imagery to inventory outlet glaciers of the Southern Patagonian Icefield, South America. Photogrammetric Engineering & Remote Sensing 62, 1361–1369.
Bahr, D.B., Meier, M.F., Peckham, S.D., 1997. The physical basis of glacier volume–area scaling. Journal of Geophysical Research 102 (B9), 20,355.
Bahr, D.B., Pfeffer, W.T., Kaser, G., 2015. A review of volume–area scaling of glaciers. Reviews of Geophysics 53 (1), 95–140. http://dx.doi.org/10.1002/2014RG000470.
Bajracharya, S.R., Shrestha, B.R. (Eds.), 2011. The status of glaciers in the Hindu Kush-Himalayan region. ICIMOD, Kathmandu, 127 pp.
Bamber, J.L., Rivera, A., 2007. A review of remote sensing methods for glacier mass balance determination. Global and Planetary Change 59 (1–4), 138–148.
Barnett, T.P., Adam, J.C., Lettenmaier, D.P., 2005. Potential impacts of a warming climate on water availability in snow-dominated regions. Nature 438. http://dx.doi.org/10.1038/nature04141.
Barr, I.D., Spagnolo, M., 2015a. Glacial cirques as palaeoenvironmental indicators: Their potential and limitations. Earth-Science Reviews 151, 48–78. http://dx.doi.org/10.1016/j.earscirev.2015.10.004.
Barr, I.D., Spagnolo, M., 2015b. Understanding controls on cirque floor altitudes: Insights from Kamchatka. Geomorphology 248, 1–13. http://dx.doi.org/10.1016/j.geomorph.2015.07.004.
Barry, R.G., 2006. The status of research on glaciers and global glacier recession: A review. Progress in Physical Geography 30 (3), 285–306.
Bayr, K., Hall, D., Kovalick, W., 1994. Observations on glaciers in the eastern Austrian Alps using satellite data. International Journal of Remote Sensing 15, 1733–1742.
Benn, D.I., Evans, D.J.A., 2010. Glaciers and glaciation, 2nd edn. Hodder Education, London. 802 pp.
Benn, D.I., Lehmkuhl, F., 2000. Mass balance and equilibrium-line altitudes of glaciers in high-mountain environments. Quaternary International 65–66, 15–29.


Benn, D.I., Bolch, T., Hands, K., Gulley, J., Luckman, A., Nicholson, L.I., Quincey, D., Thompson, S., Toumi, R., Wiseman, S., 2012. Response of debris-covered glaciers in the Mount Everest region to recent warming, and implications for outburst flood hazards. Earth-Science Reviews 114 (1–2), 156–174. http://dx.doi.org/10.1016/j.earscirev.2012.03.008.
Berthier, E., Toutin, T., 2008. SPOT5-HRS digital elevation models and the monitoring of glacier elevation changes in North-West Canada and South-East Alaska. Remote Sensing of Environment 112 (5), 2443–2454. http://dx.doi.org/10.1016/j.rse.2007.11.004.
Berthier, E., Arnaud, Y., Kumar, R., Ahmad, S., Wagnon, P., Chevallier, P., 2007. Remote sensing estimates of glacier mass balances in the Himachal Pradesh (Western Himalaya, India). Remote Sensing of Environment 108 (3), 327–338.
Berthier, E., Cabot, V., Vincent, C., Six, D., 2016. Decadal region-wide and glacier-wide mass balances derived from multi-temporal ASTER satellite digital elevation models. Validation over the Mont-Blanc area. Frontiers in Earth Science 4. http://dx.doi.org/10.3389/feart.2016.00063.
Bhambri, R., Bolch, T., 2009. Glacier mapping: A review with special reference to the Indian Himalayas. Progress in Physical Geography 33 (5), 672–704. http://dx.doi.org/10.1177/0309133309348112.
Bhambri, R., Bolch, T., Chaujar, R.K., 2011a. Mapping of debris-covered glaciers in the Garhwal Himalayas using ASTER DEMs and thermal data. International Journal of Remote Sensing 32 (23), 8095–8119. http://dx.doi.org/10.1080/01431161.2010.532821.
Bhambri, R., Bolch, T., Chaujar, R.K., Kulshreshtha, S.C., 2011b. Glacier changes in the Garhwal Himalayas, India 1968–2006 based on remote sensing. Journal of Glaciology 57 (203), 543–556.
Bhambri, R., Bolch, T., Chaujar, R.K., 2012. Frontal recession of Gangotri Glacier, Garhwal Himalayas, from 1965 to 2006, measured through high resolution remote sensing data. Current Science 102 (3), 489–494.
Bignone, F., Umakawa, H., 2008. Assessment of ALOS PRISM digital elevation model extraction over Japan. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XXXVII (Part B1), 1135–1138.
Bishop, M.P., Shroder, J. (Eds.), 2004. Geographic information science and mountain geomorphology. Springer, Berlin.
Bishop, M.P., Bonk Jr., R., Kamp Jr., U., Shroder Jr., J.F., 2001. Terrain analysis and data modeling for alpine glacier mapping. Polar Geography 25 (3), 182–201. http://dx.doi.org/10.1080/10889370109377712.
Böhner, J., Köthe, R., Trachinow, C., 1997. Weiterentwicklung der automatischen Reliefanalyse auf der Basis von digitalen Geländemodellen. Göttinger Geographische Arbeiten 100, 3–21.
Bolch, T., 2015. Glacier area and mass changes since 1964 in the Ala Archa Valley, Kyrgyz Ala-Too, northern Tien Shan. Лёд и Снег (Ice and Snow) 1 (219), 28–39.
Bolch, T., Kamp, U., 2006. Glacier mapping in high mountains using DEMs, Landsat and ASTER data. Grazer Schriften der Geographie und Raumforschung 41. In: Proceedings of the 8th International Symposium on High Mountain Remote Sensing Cartography, 20–27 March 2005, La Paz, Bolivia, pp. 13–24.
Bolch, T., Buchroithner, M.F., Kunert, A., Kamp, U., 2007. Automated delineation of debris-covered glaciers based on ASTER data. In: Gomarasca, M.A. (Ed.), GeoInformation in Europe. Proceedings of the 27th EARSeL Symposium, 4–7 June 2007, Bozen, Italy. Millpress, Netherlands, pp. 403–410.
Bolch, T., Buchroithner, M.F., Peters, J., Baessler, M., Bajracharya, S.R., 2008a. Identification of glacier motion and potentially dangerous glacier lakes at Mt. Everest area/Nepal using spaceborne imagery. Natural Hazards and Earth System Sciences 8 (6), 1329–1340.
Bolch, T., Buchroithner, M.F., Pieczonka, T., Kunert, A., 2008b. Planimetric and volumetric glacier changes in Khumbu Himalaya since 1962 using Corona, Landsat TM and ASTER data. Journal of Glaciology 54 (187), 592–600. http://dx.doi.org/10.3189/002214308786570782.
Bolch, T., Menounos, B., Wheate, R.D., 2010a. Landsat-based inventory of glaciers in western Canada, 1985–2005. Remote Sensing of Environment 114 (1), 127–137. http://dx.doi.org/10.1016/j.rse.2009.08.015.
Bolch, T., Yao, T., Kang, S., Buchroithner, M.F., Scherer, D., Maussion, F., Huintjes, E., Schneider, C., 2010b. A glacier inventory for the western Nyainqentanglha Range and Nam Co Basin, Tibet, and glacier changes 1976–2009. The Cryosphere 4, 419–433. http://dx.doi.org/10.5194/tc-4-419-2010.
Bolch, T., Pieczonka, T., Benn, D.I., 2011. Multi-decadal mass loss of glaciers in the Everest area (Nepal, Himalaya) derived from stereo imagery. The Cryosphere 5, 349–358. http://dx.doi.org/10.5194/tc-5-349-2011.
Bolch, T., Kulkarni, A., Kääb, A., Huggel, C., Paul, F., Cogley, J.G., Frey, H., Kargel, J.S., Fujita, K., Scheel, M., Bajracharya, S., Stoffel, M., 2012. The state and fate of Himalayan glaciers. Science 336 (6079), 310–314. http://dx.doi.org/10.1126/science.1215828.
Bolch, T., Pieczonka, T., Mukherjee, K., Shea, J., 2017. Brief communication: Glaciers in the Hunza catchment (Karakoram) have been nearly in balance since the 1970s. The Cryosphere 11 (1), 531–539. http://dx.doi.org/10.5194/tc-11-531-2017.
Braithwaite, R.J., 2015. From Doktor Kurowski’s Schneegrenze to our modern glacier equilibrium line altitude (ELA). The Cryosphere 9 (6), 2135–2148. http://dx.doi.org/10.5194/tc-9-2135-2015.
Braithwaite, R.J., Raper, S., 2009. Estimating equilibrium-line altitude (ELA) from glacier inventory data. Annals of Glaciology 50 (53), 127–132.
Brocklehurst, S., Whipple, K.X., 2004. Hypsometry of glaciated landscapes. Earth Surface Processes and Landforms 29 (7), 907–926.
Brown, D., Lusch, D., Duda, K., 1998. Supervised classification of types of glaciated landscapes using digital elevation data. Geomorphology 21 (3/4), 233–250.
Chandler, B.M.P., Evans, D.J.A., Roberts, D.H., 2016. Characteristics of recessional moraines at a temperate glacier in SE Iceland: Insights into patterns, rates and drivers of glacier retreat. Quaternary Science Reviews 135, 171–205. http://dx.doi.org/10.1016/j.quascirev.2016.01.025.
Chen, J., Ohmura, A., 1990. Estimation of Alpine glacier water resources and their change since 1870s. IAHS Publication 193, 127–135.
Clark, C.D., Evans, D.J.A., Khatwa, A., Bradwell, T., Jordan, C.J., Marsh, S.H., Mitchell, W.A., Bateman, M.D., 2004. Map and GIS database of glacial landforms and features related to the last British Ice Sheet. Boreas 33 (4), 359–375. http://dx.doi.org/10.1111/j.1502-3885.2004.tb01246.x.
Clarke, G., 1987. A short history of scientific investigations on glaciers. Journal of Glaciology, Special Issue, 4–24.
Clarke, G., Anslow, F., Jarosch, A., Radic, V., Menounos, B., Bolch, T., Berthier, E., 2013. Ice volume and subglacial topography for western Canadian glaciers from mass balance fields, thinning rates, and a bed stress model. Journal of Climate 26, 4282–4303. http://dx.doi.org/10.1175/JCLI-D-12-00513.1.
Conrad, O., Bechtel, B., Bock, M., Dietrich, H., Fischer, E., Gerlitz, L., Wehberg, J., Wichmann, V., Böhner, J., 2015. System for Automated Geoscientific Analyses (SAGA) v. 2.1.4. Geoscientific Model Development 8 (7), 1991–2007. http://dx.doi.org/10.5194/gmd-8-1991-2015.
Cuffey, K., Paterson, W.S.B., 2010. The physics of glaciers, 4th edn. Academic Press, Cambridge, MA. 704 pp.
Dashora, A., Lohani, B., Malik, J.N., 2007. A repository of earth resource information – CORONA satellite programme. Current Science 92 (7), 926–932.
Della Ventura, A., Rampini, A., Rabagliati, R., Barbero, R., 1987. Development of a satellite remote sensing technique for the study of alpine glaciers. International Journal of Remote Sensing 8, 203–215.
Dikau, R., 1988. Entwurf einer geomorphografisch-analytischen Systematik von Reliefeinheiten. Heidelberger Geographische Bausteine 5.
Ehlers, J., Gibbard, P.L., 2004a. Quaternary glaciations – extent and chronology – Part I: Europe, 1st edn. Elsevier, Amsterdam. 475 pp.
Ehlers, J., Gibbard, P.L., 2004b. Quaternary glaciations – extent and chronology – Part III: South America, Asia, Africa, Australia, Antarctica, 1st edn. Elsevier, Amsterdam. 396 pp.
Ehlers, J., Gibbard, P.L., 2004c. Quaternary glaciations – extent and chronology – Part II: North America, 1st edn. Elsevier, Amsterdam. 452 pp.
Evans, I.S., 2006. Local aspect asymmetry of mountain glaciation: A global survey of consistency of favoured directions for glacier numbers and altitudes. Geomorphology 73 (1–2), 166–184. http://dx.doi.org/10.1016/j.geomorph.2005.07.009.
Farinotti, D., Huss, M., Bauder, A., Funk, M., Truffer, M., 2009. A method to estimate ice volume and ice thickness distribution of alpine glaciers. Journal of Glaciology 55, 422–430.
Farr, T.G., Rosen, P.A., Caro, E., Crippen, R., Duren, R., Hensley, S., Kobrick, M., Paller, M., Rodriguez, E., Roth, L., Seal, D., Shaffer, S., Shimada, J., Umland, J., Werner, M., Oskin, M., Burbank, D., Alsdorf, D., 2007. The shuttle radar topography mission. Reviews of Geophysics 45 (2), RG2004.
Fischer, M., Huss, M., Hoelzle, M., 2015. Surface elevation and mass changes of all Swiss glaciers 1980–2010. The Cryosphere 9, 525–540.


Fisher, P., 2000. Sorites paradox and vague geographies. Fuzzy Sets and Systems 113 (1), 7–18.
Frey, H., Haeberli, W., Linsbauer, A., Huggel, C., Paul, F., 2010. A multi-level strategy for anticipating future glacier lake formation and associated hazard potentials. Natural Hazards and Earth System Sciences 10 (2), 339–352.
Frey, H., Paul, F., Strozzi, T., 2012. Compilation of a glacier inventory for the western Himalayas from satellite data: Methods, challenges, and results. Remote Sensing of Environment 124, 832–843. http://dx.doi.org/10.1016/j.rse.2012.06.020.
Frey, H., Machguth, H., Huss, M., Huggel, C., Bajracharya, S., Bolch, T., Kulkarni, A., Linsbauer, A., Salzmann, N., Stoffel, M., 2014. Estimating the volume of glaciers in the Himalayan–Karakoram region using different methods. The Cryosphere 8 (6), 2313–2333. http://dx.doi.org/10.5194/tc-8-2313-2014.
Fu, P., Harbor, J.M., Stroeven, A.P., Hättestrand, C., Heyman, J., Zhou, L., 2013. Glacial geomorphology and paleoglaciation patterns in Shaluli Shan, the southeastern Tibetan Plateau – Evidence for polythermal ice cap glaciation. Geomorphology 182, 66–78. http://dx.doi.org/10.1016/j.geomorph.2012.10.030.
Gantayat, P., Kulkarni, A.V., Srinivasan, J., 2014. Estimation of ice thickness using surface velocities and slope: Case study at Gangotri Glacier, India. Journal of Glaciology 60 (220), 277–282.
Gardelle, J., Berthier, E., Arnaud, Y., 2012. Impact of resolution and radar penetration on glacier elevation changes computed from DEM differencing. Journal of Glaciology 58 (208), 419–422.
Gardelle, J., Berthier, E., Arnaud, Y., Kääb, A., 2013. Region-wide glacier mass balances over the Pamir–Karakoram–Himalaya during 1999–2011. The Cryosphere 7, 1263–1286. http://dx.doi.org/10.5194/tc-7-1263-2013.
Gardner, A.S., Moholdt, G., Cogley, J.G., Wouters, B., Arendt, A.A., Wahr, J., Berthier, E., Hock, R., Pfeffer, W.T., Kaser, G., Ligtenberg, S.R.M., Bolch, T., Sharp, M.J., Hagen, J.O., van den Broeke, M.R., Paul, F., 2013. A reconciled estimate of glacier contributions to sea level rise: 2003 to 2009. Science 340, 852–857.
Glasser, N.F., Jansson, K.N., Harrison, S., Kleman, J., 2008. The glacial geomorphology and Pleistocene history of South America between 38°S and 56°S. Quaternary Science Reviews 27 (3–4), 365–390. http://dx.doi.org/10.1016/j.quascirev.2007.11.011.
Gratton, D., Howarth, P., Marceau, D., 1990. Combining DEM parameters with Landsat MSS and TM imagery in a GIS for mountain glacier characterization. IEEE Transactions on Geoscience and Remote Sensing 28, 766–769.
Grinsted, A., 2013. An estimate of global glacier volume. The Cryosphere 7 (1), 141–151. http://dx.doi.org/10.5194/tc-7-141-2013.
Guo, W., Liu, S., Xu, J., Wu, L., Shangguan, D., Yao, X., Wei, J., Bao, W., Yu, P., Liu, Q., Jiang, Z., 2015. The second Chinese glacier inventory: Data, methods and results. Journal of Glaciology 61 (226), 357–372.
Haeberli, W., Hoelzle, M., 1995. Application of inventory data for estimating characteristics of and regional climate-change effects on mountain glaciers: A pilot study with the European Alps. Annals of Glaciology 21, 206–212.
Hall, A.M., Kleman, J., 2014. Glacial and periglacial buzzsaws: Fitting mechanisms to metaphors. Quaternary Research 81 (2), 189–192. http://dx.doi.org/10.1016/j.yqres.2013.10.007.
Hall, D.K., Riggs, G.A., Salomonson, V.V., 1995. Development of methods for mapping global snow cover using moderate resolution imaging spectroradiometer (MODIS) data. Remote Sensing of Environment 54, 127–140.
Harrison, S., Glasser, N., Winchester, V., Haresign, E., Warren, C., Duller, G.A., Bailey, R., Ivy-Ochs, S., Jansson, K., Kubik, P., 2008. Glaciar León, Chilean Patagonia: Late-Holocene chronology and geomorphology. The Holocene 18 (4), 643–652. http://dx.doi.org/10.1177/0959683607086771.
Hayakawa, Y.S., Oguchi, T., Lin, Z., 2008. Comparison of new and existing global digital elevation models: ASTER G-DEM and SRTM-3. Geophysical Research Letters 35, L17404. http://dx.doi.org/10.1029/2008GL035036.
Hengl, T., Reuter, H.I., 2008. Geomorphometry: Concepts, software, applications, 1st edn. Elsevier, Amsterdam. 796 pp.
Heyman, J., Stroeven, A.P., Alexanderson, H., Hättestrand, C., Harbor, J., Li, Y., Caffee, M.W., Zhou, L., Veres, D., Liu, F., Machiedo, M., 2009. Palaeoglaciation of Bayan Har Shan, northeastern Tibetan Plateau: Glacial geology indicates maximum extents limited to ice cap and ice field scales. Journal of Quaternary Science 24 (7), 710–727. http://dx.doi.org/10.1002/jqs.1305.
Hiller, J.K., Smith, M., 2008. Residual relief separation: Digital elevation model enhancement for geomorphological mapping. Earth Surface Processes and Landforms 33 (14), 2266–2276.
Höfer, H.V., 1879. Gletscher und Eiszeitstudien. Sitzungsberichte der Akademie der Wissenschaften Wien, math.-phys. Klasse I 79, 331–367.
Huss, M., 2011. Present and future contribution of glacier storage change to runoff from macroscale drainage basins in Europe. Water Resources Research 47, W07511. http://dx.doi.org/10.1029/2010WR010299.
Huss, M., Farinotti, D., 2012. Distributed ice thickness and volume of all glaciers around the globe. Journal of Geophysical Research 117, F04010. http://dx.doi.org/10.1029/2012JF002523.
Huss, M., Hock, R., 2015. A new model for global glacier change and sea-level rise. Frontiers in Earth Science 3, 54. http://dx.doi.org/10.3389/feart.2015.00054.
Immerzeel, W.W., Kraaijenbrink, P., Shea, J.M., Shrestha, A.B., Pellicciotti, F., Bierkens, M.F., de Jong, S.M., 2014. High-resolution monitoring of Himalayan glacier dynamics using unmanned aerial vehicles. Remote Sensing of Environment 150, 93–103.
Jenness, J., 2006. Topographic Position Index (TPI) v. 1.3a.
Kääb, A., 2002. Monitoring high-mountain terrain deformation from repeated air- and spaceborne optical data: Examples using digital aerial imagery and ASTER data. ISPRS Journal of Photogrammetry and Remote Sensing 57, 39–52.
Kääb, A., 2008. Glacier volume changes using ASTER satellite stereo and ICESat GLAS laser altimetry. A test study on Edgeøya, Eastern Svalbard. IEEE Transactions on Geoscience and Remote Sensing 46 (10), 2823–2830.
Kääb, A., Treichler, D., Nuth, C., Berthier, E., 2015. Brief communication: Contending estimates of 2003–2008 glacier mass balance over the Pamir–Karakoram–Himalaya. The Cryosphere 9, 557–564. http://dx.doi.org/10.5194/tc-9-557-2015.
Kamp, U., Bolch, T., Olsenholler, J., 2005. Geomorphometry of Cerro Sillajhuay, Chile/Bolivia: Comparison of DEMs derived from ASTER remote sensing data and contour maps. Geocarto International 20 (1), 23–34.
Kargel, J.S., Leonard, G.J., Bishop, M.P., Kääb, A., Raup, B.H., 2014. Global Land Ice Measurements from Space. Springer, Berlin Heidelberg.
Kaser, G., Grosshauser, M., Marzeion, B., 2010. Contribution potential of glaciers to water availability in different climate regimes. Proceedings of the National Academy of Sciences of the United States of America 107 (47), 20223–20227. http://dx.doi.org/10.1073/pnas.1008162107.
Kienholz, C., Rich, J.L., Arendt, A.A., Hock, R., 2014. A new method for deriving glacier centerlines applied to glaciers in Alaska and northwest Canada. The Cryosphere 8, 503–519.
Kienholz, C., Hock, R., Arendt, A.A., 2013. A new semi-automatic approach for dividing glacier complexes into individual glaciers. Journal of Glaciology 59 (217).
Knoll, C., Kerschner, H., Heller, A., Rastner, P., 2009. A GIS-based reconstruction of Little Ice Age glacier maximum extents for South Tyrol, Italy. Transactions in GIS 13 (5–6), 449–463. http://dx.doi.org/10.1111/j.1467-9671.2009.01173.x.
Krüger, T., 2013. Discovering the ice ages: International reception and consequences for a historical understanding of climate. Brill, Leiden, 554 pp.
Kurowski, L., 1891. Die Höhe der Schneegrenze mit besonderer Berücksichtigung der Finsteraarhorngruppe. Penck’s Geographische Abhandlungen 5 (1).
Le Bris, R., Paul, F., 2013. An automatic method to create flow lines for determination of glacier length: A pilot study with Alaskan glaciers. Computers & Geosciences 52, 234–245.
Leclercq, P.W., Oerlemans, J., Basagic, H.J., Bushueva, I., Cook, A.J., Le Bris, R., 2014. A data set of worldwide glacier length fluctuations. The Cryosphere 8 (2), 659–672. http://dx.doi.org/10.5194/tc-8-659-2014.
Lindholm, M.S., Heyman, J., 2016. Glacial geomorphology of the Maidika region, Tibetan Plateau. Journal of Maps 12 (5), 797–803. http://dx.doi.org/10.1080/17445647.2015.1078182.


Linsbauer, A., Paul, F., Hoelzle, M., Frey, H., Haeberli, W., 2009. The Swiss Alps without glaciers – A GIS-based modelling approach for reconstruction of glacier beds. In: Proceedings of Geomorphometry, Zurich, Switzerland, 31 August–2 September 2009, pp. 243–247.
Linsbauer, A., Paul, F., Haeberli, W., 2012. Modeling glacier thickness distribution and bed topography over entire mountain ranges with GlabTop: Application of a fast and robust approach. Journal of Geophysical Research 117, F03007. http://dx.doi.org/10.1029/2011JF002313.
Linsbauer, A., Frey, H., Haeberli, W., Machguth, H., Azam, M.F., Allen, S., 2016. Modelling glacier-bed overdeepenings and possible future lakes for the glaciers in the Himalaya–Karakoram region. Annals of Glaciology 57 (71), 117–130.
Loibl, D., Lehmkuhl, F., Grießinger, J., 2014. Reconstructing glacier retreat since the Little Ice Age in SE Tibet by glacier mapping and equilibrium line altitude calculation. Geomorphology 214, 22–39. http://dx.doi.org/10.1016/j.geomorph.2014.03.018.
Loibl, D., Hochreuther, P., Schulte, P., Hülle, D., Zhu, H., Bräuning, A., Lehmkuhl, F., 2015. Toward a late Holocene glacial chronology for the eastern Nyainqêntanglha Range, southeastern Tibet. Quaternary Science Reviews 107, 243–259. http://dx.doi.org/10.1016/j.quascirev.2014.10.034.
Louis, H., 1955. Schneegrenze und Schneegrenzbestimmung. Geographisches Taschenbuch 1954/55, 414–418.
Lowe, J.J., Walker, M.J.C., 1997. Reconstructing quaternary environments, 2nd edn. Longman, Harlow. 446 pp.
Lukas, S., 2006. Morphostratigraphic principles in glacier reconstruction – a perspective from the British Younger Dryas. Progress in Physical Geography 30 (6), 719–736. http://dx.doi.org/10.1177/0309133306071955.
Lukas, S., 2012. Processes of annual moraine formation at a temperate alpine valley glacier: Insights into glacier dynamics and climatic controls. Boreas 41 (3), 463–480. http://dx.doi.org/10.1111/j.1502-3885.2011.00241.x.
Machguth, H., Huss, M., 2014. The length of the world’s glaciers – a new approach for the global calculation of center lines. The Cryosphere 8 (5), 1741–1755. http://dx.doi.org/10.5194/tc-8-1741-2014.
Manley, W.F., 2008. Geospatial inventory and analysis of glaciers: A case study for the eastern Alaska Range. In: Williams, R., Ferrigno, J.G. (Eds.), Satellite image atlas of glaciers of the world. USGS professional paper 1386-K, pp. 424–439.
Maurer, J., Rupper, S., Schaefer, J.M., 2016. Quantifying ice loss in the eastern Himalayas since 1974 using declassified spy satellite imagery. The Cryosphere 10, 2203–2215.
McCormack, D.C., Irving, D.H.B., Brocklehurst, S.H., Rarity, F., 2008. Glacial geomorphological mapping of Coire Mhic Fhearchair, NW Scotland: The contribution of a high-resolution ground-based LiDAR survey. Journal of Maps 4 (1), 315–331. http://dx.doi.org/10.4113/jom.2008.1033.
McNabb, R.W., Hock, R., O’Neel, S., Rasmussen, L., Ahn, Y., Braun, M., Conway, H., Herreid, S., Joughin, I., Pfeffer, W.T., Smith, B.E., Truffer, M., 2012. Using surface velocities to calculate ice thickness and bed topography: A case study at Columbia Glacier, Alaska, USA. Journal of Glaciology 58 (212), 1151–1164. http://dx.doi.org/10.3189/2012JoG11J249.
Meierding, T.C., 1982. Late Pleistocene glacial equilibrium-line altitudes in the Colorado Front Range: A comparison of methods. Quaternary Research 18 (3), 289–310. http://dx.doi.org/10.1016/0033-5894(82)90076-X.
Mihalcea, C., Mayer, C., Diolaiuti, G., D’Agata, C., Smiraglia, C., Lambrecht, A., Vuillermoz, E., Tartari, G., 2008. Spatial distribution of debris thickness and melting from remote sensing and meteorological data, at debris-covered Baltoro glacier, Karakoram, Pakistan. Annals of Glaciology 48 (1), 49–57. http://dx.doi.org/10.3189/172756408784700680.
Moreira, A., Krieger, G., Hajnsek, I., Hounam, D., Werner, M., Riegger, S., Settelmeyer, E., 2004. TanDEM-X: A TerraSAR-X add-on satellite for single-pass SAR interferometry. In: Proceedings of the IGARSS Conference, Alaska, September 2004, 4 pp.
Narama, C., Kääb, A., Duishonakunov, M., Abdrakhmatov, K., 2010. Spatial variability of recent glacier area changes in the Tien Shan Mountains, Central Asia, using Corona (1970), Landsat (2000), and ALOS (2007) satellite data. Global and Planetary Change 71 (1–2), 42–54. http://dx.doi.org/10.1016/j.gloplacha.2009.08.002.
Neckel, N., Braun, A., Kropáček, J., Hochschild, V., 2013. Recent mass balance of the Purogangri Ice Cap, central Tibetan Plateau, by means of differential X-band SAR interferometry. The Cryosphere 7 (5), 1623–1633. http://dx.doi.org/10.5194/tc-7-1623-2013.
Nesje, A., 1992. Topographical effects on the equilibrium-line altitude on glaciers. GeoJournal 27 (4), 383–391.
Nicholson, L., Benn, D.I., 2006. Calculating ice melt beneath a debris layer using meteorological data. Journal of Glaciology 52 (178), 463–470.
Nuimura, T., Sakai, A., Taniguchi, K., Nagai, H., Lamsal, D., Tsutaki, S., Kozawa, A., Hoshina, Y., Takenaka, S., Omiya, S., Tsunematsu, K., Tshering, P., Fujita, K., 2015. The GAMDAM glacier inventory: A quality-controlled inventory of Asian glaciers. The Cryosphere 9 (3), 849–864. http://dx.doi.org/10.5194/tc-9-849-2015.
Nussbaumer, S.U., Zumbühl, H.J., 2012. The Little Ice Age history of the Glacier des Bossons (Mont Blanc massif, France): A new high-resolution glacier length curve based on historical documents. Climatic Change 111 (2), 301–334. http://dx.doi.org/10.1007/s10584-011-0130-9.
Nussbaumer, S.U., Nesje, A., Zumbühl, H.J., 2011. Historical glacier fluctuations of Jostedalsbreen and Folgefonna (southern Norway) reassessed by new pictorial and written evidence. The Holocene 21 (3), 455–471. http://dx.doi.org/10.1177/0959683610385728.
Nuth, C., Kääb, A., 2011. Co-registration and bias corrections of satellite elevation data sets for quantifying glacier thickness change. The Cryosphere 5 (1), 271–290. http://dx.doi.org/10.5194/tc-5-271-2011.
Ohmura, A., Lang, H., Blumer, D., 1990. Glacial climate research in the Tianshan. Research report on project glacier No. 1, 1985–1987.
Paul, F., Haeberli, W., 2008. Spatial variability of glacier elevation changes in the Swiss Alps obtained from two digital elevation models. Geophysical Research Letters 35, L21502. http://dx.doi.org/10.1029/2008GL034718.
Paul, F., Kääb, A., 2005. Perspectives on the production of a glacier inventory from multispectral satellite data in Arctic Canada: Cumberland Peninsula, Baffin Island. Annals of Glaciology 42, 59–66.
Paul, F., Kääb, A., Maisch, M., Kellenberger, T., Haeberli, W., 2002. The new remote-sensing-derived Swiss glacier inventory: I. Methods. Annals of Glaciology 34, 355–361.
Paul, F., Huggel, C., Kääb, A., 2004. Combining satellite multispectral image data and a digital elevation model for mapping of debris-covered glaciers. Remote Sensing of Environment 89 (4), 510–518.
Paul, F., Maisch, M., Rothenbühler, C., Hoelzle, M., Haeberli, W., 2007. Calculation and visualisation of future glacier extent in the Swiss Alps by means of hypsographic modelling. Global and Planetary Change 55 (4), 343–357.
Paul, F., Barry, R.G., Cogley, J.G., Haeberli, W., Ohmura, A., Ommanney, C., Raup, B., Rivera, A., Zemp, M., 2009. Recommendations for the compilation of glacier inventory data from digital sources. Annals of Glaciology 50 (53), 119–126.
Paul, F., Barrand, N., Baumann, S., Berthier, E., Bolch, T., Casey, K.A., Frey, H., Joshi, S.P., Konovalov, V., LeBris, R., Mölg, N., Nosenko, G., Nuth, C., Pope, A., Racoviteanu, A., Rastner, P., Raup, B., Scharrer, K., Steffen, S., Winsvold, S., 2013. On the accuracy of glacier outlines derived from remote sensing data. Annals of Glaciology 54 (63), 171–182.
Paul, F., Winsvold, H.S., Kääb, A., Nagler, T., Schwaizer, G., 2016. Glacier remote sensing using Sentinel-2. Part II: Mapping glacier extents and surface facies, and comparison to Landsat 8. Remote Sensing 8 (7), 15. http://dx.doi.org/10.3390/rs8070575.
Pellicciotti, F., Stephan, C., Miles, E., Herreid, S., Immerzeel, W.W., Bolch, T., 2015. Mass balance changes of the debris-covered glaciers in the Langtang Himal in Nepal between 1974 and 1999. Journal of Glaciology 61 (256), 373–386. http://dx.doi.org/10.3189/2015JoG13J237.
Pellitero, R., Rea, B.R., Spagnolo, M., Bakke, J., Hughes, P., Ivy-Ochs, S., Lukas, S., Ribolini, A., 2015. A GIS tool for automatic calculation of glacier equilibrium-line altitudes. Computers & Geosciences 82, 55–62. http://dx.doi.org/10.1016/j.cageo.2015.05.005.
Penck, A., Brückner, E., 1909. Die Alpen im Eiszeitalter. Tauchnitz, Leipzig.
Pfeffer, W., Arendt, A.A., Bliss, A., Bolch, T., Cogley, J.G., Gardner, A.S., Hagen, J.-O., Hock, R., Kaser, G., Kienholz, C., Miles, E.S., Moholdt, G., Mölg, N., Paul, F., Radic, V., Rastner, P., Raup, B.H., Rich, J., Sharp, M.J., 2014. The Randolph Glacier Inventory: A globally complete inventory of glaciers. Journal of Glaciology 60 (221), 537–552.
Pieczonka, T., Bolch, T., 2015. Region-wide glacier mass budgets and area changes for the Central Tien Shan between ~1975 and 1999 using Hexagon KH-9 imagery. Global and Planetary Change 128, 1–13. http://dx.doi.org/10.1016/j.gloplacha.2014.11.014.


Pieczonka, T., Bolch, T., Wei, J., Liu, S., 2013. Heterogeneous mass loss of glaciers in the Aksu-Tarim Catchment (Central Tien Shan) revealed by 1976 KH-9 Hexagon and 2009 SPOT-5 stereo imagery. Remote Sensing of Environment 130, 233–244. http://dx.doi.org/10.1016/j.rse.2012.11.020.
Pipaud, I., Loibl, D., Lehmkuhl, F., 2015. Evaluation of TanDEM-X elevation data for geomorphological mapping and interpretation in high mountain environments – A case study from SE Tibet, China. Geomorphology 246, 232–254. http://dx.doi.org/10.1016/j.geomorph.2015.06.025.
Porter, S.C., 1975. Equilibrium-line altitudes of late Quaternary glaciers in the Southern Alps, New Zealand. Quaternary Research 5 (1), 27–47. http://dx.doi.org/10.1016/0033-5894(75)90047-2.
Putnam, A.E., Schaefer, J.M., Denton, G.H., Barrell, D.J.A., Birkel, S.D., Andersen, B.G., Kaplan, M.R., Finkel, R.C., Schwartz, R., Doughty, A.M., 2013. The last glacial maximum at 44°S documented by a 10Be moraine chronology at Lake Ohau, Southern Alps of New Zealand. Quaternary Science Reviews 62, 114–141. http://dx.doi.org/10.1016/j.quascirev.2012.10.034.
Quincey, D., Luckman, A., Benn, D.I., 2009. Quantification of Everest region glacier velocities between 1992 and 2002, using satellite radar interferometry and feature tracking. Journal of Glaciology 55 (192), 596–606.
Quincey, D., Bishop, M., Kääb, A., Berthier, E., Flach, B., Bolch, T., Buchroithner, M., Kamp, U., Khalsa, S., Toutin, T., Haritashya, U., Racoviteanu, A., Shroder, J., Raup, B., 2014. Digital terrain modeling and glacier topographic characterization. In: Kargel, J.S., Leonard, G.J., Bishop, M.P., Kääb, A., Raup, B.H. (Eds.), Global land ice measurements from space. Springer, Berlin, pp. 113–144.
Racoviteanu, A., Williams, M.W., 2012. Decision tree and texture analysis for mapping debris-covered glaciers in the Kangchenjunga area, eastern Himalaya. Remote Sensing 4, 3078–3109. http://dx.doi.org/10.3390/rs4103078.
Racoviteanu, A.E., Williams, M.W., Barry, R.G., 2008. Optical remote sensing of glacier characteristics: A review with focus on the Himalaya. Sensors 8, 3355–3383. http://dx.doi.org/10.3390/s8053355.
Radic, V., Hock, R., 2011. Regionally differentiated contribution of mountain glaciers and ice caps to future sea-level rise. Nature Geoscience 4 (2), 91–94. http://dx.doi.org/10.1038/ngeo1052.
Radic, V., Hock, R., 2013. Glaciers in the Earth’s hydrological cycle: Assessments of glacier mass and runoff changes on global and regional scales. Surveys in Geophysics 35 (3), 313–387. http://dx.doi.org/10.1007/s10712-013-9262-y.
Rankl, M., Braun, M., 2016. Glacier elevation and mass changes over the central Karakoram region estimated from TanDEM-X and SRTM/X-SAR digital elevation models. Annals of Glaciology 57 (71), 273–281. http://dx.doi.org/10.3189/2016AoG71A024.
Ranzi, R., Grossi, G., Iacovelli, L., Taschner, T., 2004. Use of multispectral ASTER images for mapping debris-covered glaciers within the GLIMS Project. IEEE Transactions on Geoscience and Remote Sensing 2, 1144–1147.
Rasemann, S., Schmidt, J., Schrott, L., Dikau, R., 2004. Geomorphometry in mountain terrain. In: Bishop, M.P., Shroder, J. (Eds.), Geographic information science and mountain geomorphology. Springer, Berlin, pp. 101–137.
Rastner, P., Bolch, T., Mölg, N., Machguth, H., Le Bris, R., Paul, F., 2012. The first complete inventory of the local glaciers and ice caps on Greenland. The Cryosphere 6, 1483–1495. http://dx.doi.org/10.5194/tc-6-1483-2012.
Reznichenko, N.V., Davies, T.R.H., Winkler, S., 2016. Revised palaeoclimatic significance of Mueller Glacier moraines, Southern Alps, New Zealand. Earth Surface Processes and Landforms 41 (2), 196–207. http://dx.doi.org/10.1002/esp.3848.
Revised palaeoclimatic significance of Mueller Glacier moraines, Southern Alps New Zealand. Earth Surface Processes and Landforms 41 (2), 196–207. http://dx.doi.org/10.1002/esp.3848. Richardson, S., Reynolds, J., 2000. An overview of glacial hazards in the Himalayas. Quaternary International 65/66 (1), 31–47. Robson, B.A., Nuth, C., Dahl, S.O., Hölbling, D., Strozzi, T., Nielsen, P.R., 2015. Automated classification of debris-covered glaciers combining optical, SAR and topographic data in an object-based environment. Remote Sensing of Environment 170, 372–387. http://dx.doi.org/10.1016/j.rse.2015.10.001. Rother, H., Stauch, G., Loibl, D., Lehmkuhl, F., Freeman, S., 2017. Late Pleistocene glaciations at Lake Donggi Cona, eastern Kunlun Shan (NE-Tibet): Early maxima and a diminishing trend of glaciation during the last glacial cycle. Boreas. http://dx.doi.org/10.1111/bor.12227. Rott, H., 1976. Analyse der Schneeflächen auf Gletschern der Tiroler Zentralalpen aus Landsat Bildern. Zeitschrift für Gletscherkunde und Glazialgeologie 12, 1–28. Rott, H., 1994. Thematic studies in alpine areas by means of polarimetric SAR and optical imagery. Advances in Space Research 14, 217–226. Scherler, D., Bookhagen, B., Strecker, M.R., 2011. Spatially variable response of Himalayan glaciers to climate change affected by debris cover. Nature Geoscience 4, 156–159. http://dx.doi.org/10.1038/NGEO1068. Schiefer, E., Menounos, B., Wheate, R.D., 2007. Recent volume loss of British Columbia glaciers Canada. Geophysical Research Letters (34), L16503. http://dx.doi.org/10.1029/ 2007GL030780. Schiefer, E., Menounos, B., Wheate, R.D., 2008. An inventory and morphometric analysis of British Columbia glaciers Canada. Journal of Glaciology 54 (186), 551–560. Schmidt, S., Nüsser, M., 2009. Fluctuations of Raikot Glacier during the past 70 years: A case study from the Nanga Parbat massif, northern Pakistan. Journal of Glaciology 55 (194), 949–959. Schmidt, S., Nüsser, M., 2012. Changes of high altitude glaciers from 1969 to 2010 in the Trans-Himalayan Kang Yatze Massif, Ladakh, Northwest India. Arctic, Antarctic, and Alpine Research 44 (1), 107–121. http://dx.doi.org/10.1657/1938-4246-44.1. Sevestre, H., Benn, D.I., 2015. Climatic and geometric controls on the global distribution of surge-type glaciers: Implications for a unifying model of surging. Journal of Glaciology 61 (228), 646–662. http://dx.doi.org/10.3189/2015JoG14J136. Shea, J.M., Menounos, B., Moore, R.D., Tennant, C., 2013. An approach to derive regional snow lines and glacier mass change from MODIS imagery, western North America. The Cryosphere 7 (2), 667–680. http://dx.doi.org/10.5194/tc-7-667-2013. Shukla, A., Arora, M.K., Gupta, R.P., 2010. Synergistic approach for mapping debris-covered glaciers using optical-thermal remote sensing data with inputs from geomorphometric parameters. Remote Sensing of Environment 114 (7), 1378–1387. http://dx.doi.org/10.1016/j.rse.2010.01.015. Sidjak, R., Wheate, R., 1999. Glacier mapping of the Illecillewaet icefield, British Columbia, Canada, using Landsat TM and digital elevation data. International Journal of Remote Sensing 20, 273–284. Smith, B., Mark, D.M., 2003. Do mountains exist? Towards an ontology of landforms. Environment and Planning B: Planning and Design 30 (3), 411–427. Smith, M., Rose, J., Booth, S., 2006. Geomorphological mapping of glacial landforms from remotely sensed data: An evaluation of the principal data sources and an assessment of their quality. Geomorphology 76, 148–165. 
Smith, M.J., Rose, J., Gousie, M.B., 2009. The Cookie Cutter: A method for obtaining a quantitative 3D description of glacial bedforms. Geomorphology 108 (3–4), 209–218. http://dx.doi.org/10.1016/j.geomorph.2009.01.006. Smith, T., Bookhagen, B., Cannon, F., 2015. Improving semi-automated glacier mapping with a multi-method approach: Applications in central Asia. The Cryosphere 9 (5), 1747– 1759. http://dx.doi.org/10.5194/tc-9-1747-2015. Spiess, M., Maussion, F., Möller, M., Scherer, D., Schneider, C., 2015. MODIS derived equilibrium line altitude estimates for Purogangri Ice Cap, Tibetan Plateau, and their relation to climatic predictors (2001–2012). Geografiska Annaler. Series A, Physical Geography 97 (3), 599–614. http://dx.doi.org/10.1111/geoa.12102. Stroeven, A.P., Hättestrand, C., Heyman, J., Kleman, J., Morén, B.M., 2013. Glacial geomorphology of the Tian Shan. Journal of Maps 9 (4), 505–512. Surazakov, A., Aizen, V.B., 2010. Positional accuracy evaluation of declassified Hexagon KH-9 mapping camera imagery. Photogrammetric Engineering & Remote Sensing 76 (5), 603–608. Tang, H., Chen, Y., 2013. Global glaciations and atmospheric change at ca. 2.3 Ga. Geoscience Frontiers 4 (5), 583–596. http://dx.doi.org/10.1016/j.gsf.2013.02.003. Toutin, T., 2008. ASTER DEMs for geomatic and geoscientific applications: A review. International Journal of Remote Sensing 29 (7), 1855–1875. van Asselen, S., Seijmonsbergen, A.C., 2006. Expert-driven semi-automated geomorphological mapping for a mountainous area using a laser DTM. Geomorphology 78 (3–4), 309–320.

GIS for Glaciers and Glacial Landforms

139

Vaughan, D.G., Comiso, J.C., Allison, I., Carrasco, J., Kaser, G., Kwok, R., Mote, P., Murray, T., Paul, F., Ren, J., Rignot, E., Solomina, O., Steffen, K., Zhang, T., 2013. Observations: Cryosphere. In: Stocker, T., Qin, D., Plattner, G.-K., Tignor, M., Allen, S., Boschung, J., Nauels, A., Xia, Y., Bex, V., Midgley, P. (Eds.)Climate change 2013: The physical science basis. Contribution of working group I to the fifth assessment report of the intergovernmental panel on climate change. Cambridge University Press, Cambridge, pp. 317–382. Wanner, H., Beer, J., Bütikofer, J., Crowley, T.J., Cubasch, U., Flückiger, J., Goosse, H., Grosjean, M., Joos, F., Kaplan, J.O., Küttel, M., Müller, S.A., Prentice, I.C., Solomina, O., Stocker, T.F., Tarasov, P., Wagner, M., Widmann, M., 2008. Mid- to Late Holocene climate change: An overview. Quaternary Science Reviews 27 (19–20), 1791–1828. http:// dx.doi.org/10.1016/j.quascirev.2008.06.013. Westoby, M.J., Brasington, J., Glasser, N.F., Hambrey, M.J., Reynolds, J.M., 2012. ‘Structure-from-Motion’ photogrammetry: A low-cost, effective tool for geoscience applications. Geomorphology 179, 300–314. http://dx.doi.org/10.1016/j.geomorph.2012.08.021. WGMS, 2008. Global glacier changes: Facts and figures, 88 pp. Whipple, K.X., Kirby, E., Brocklehurst, S.H., 1999. Geomorphic limits to climate-induced increases in topographic relief. Nature 401 (6748), 39–43. http://dx.doi.org/10.1038/ 43375. Zemp, M., Jansson, P., Holmlund, P., Gärtner-Roer, I., Koblet, T., Thee, P., Haeberli, W., 2010. Reanalysis of multi-temporal aerial images of Storglaciären, Sweden (1959–99) Part 2: Comparison of glaciological and volumetric mass balances. The Cryosphere 4, 345–357. Zemp, M., Frey, H., Gärtner-Roer, I., Nussbaumer, S.U., Hoelzle, M., Paul, F., Haeberli, W., Denzinger, F., Ahlstrøm, A.P., Anderson, B., Bajracharya, S., Baroni, C., Braun, L.N., Cáceres, B.E., Casassa, G., Cobos, G., Dávila, L.R., Delgado Granados, H., Demuth, M.N., Espizua, L., Fischer, A., Fujita, K., Gadek, B., Ghazanfar, A., Hagen, J.O., Holmlund, P., Karimi, N., Li, Z., Pelto, M., Pitte, P., Popovnin, V.V., Portocarrero, C.A., Prinz, R., Sangewar, C.V., Severskiy, I., Sigurðsson, O., Soruco, A., Usubaliev, R., Vincent, C., 2015. Historically unprecedented global glacier decline in the early 21st century. Journal of Glaciology 61 (228), 745–762. http://dx.doi.org/10.3189/ 2015JoG15J017.

2.07 GIS and Remote Sensing Applications in Wetland Mapping and Monitoring

Qiusheng Wu, Binghamton University, State University of New York, Binghamton, NY, United States

© 2018 Elsevier Inc. All rights reserved.

2.07.1 Introduction
2.07.2 Wetland Indicators
2.07.2.1 Hydrology
2.07.2.2 Hydrophytic Vegetation
2.07.2.3 Hydric Soils
2.07.2.4 Topographic Position
2.07.3 Wetland Classification
2.07.3.1 Geospatial Data
2.07.3.2 Classification Methods
2.07.3.3 Classification Systems
2.07.4 Current Large-Scale Wetland Inventories
2.07.4.1 Global-Scale Wetland Inventories
2.07.4.2 U.S. National Wetlands Inventory
2.07.5 Case Study: Mapping Prairie Wetlands and Surface Hydrologic Flow Pathways Using LiDAR Data and Aerial Imagery
2.07.5.1 Introduction
2.07.5.2 Methods
2.07.5.3 Results and Discussion
2.07.6 Conclusion
References
Relevant Websites

Abbreviations
CCI Climate Change Initiative
CIFOR Center for International Forestry Research
DEM Digital elevation model
ELUs Ecological Land Units
ESA European Space Agency
ESRI Environmental Systems Research Institute
ETM Enhanced Thematic Mapper
FAO Food and Agriculture Organization of the United Nations
GIEMS Global Inundation Extent from Multi-Satellites
GIS Geographical Information Systems
GLCF Global Land Cover Facility
GLWD Global Lakes and Wetlands Database
LiDAR Light detection and ranging
MODIS Moderate Resolution Imaging Spectroradiometer
NAIP National Agriculture Imagery Program
NDVI Normalized difference vegetation index
NDWI Normalized difference water index
NIR Near infrared
NRCS Natural Resources Conservation Service
NWI National Wetlands Inventory
OBIA Object-based image analysis
PPR Prairie Pothole Region
SAR Synthetic aperture radar
TM Thematic Mapper
UAS Unmanned aerial systems
UNEP United Nations Environment Program
UNESCO United Nations Educational, Scientific and Cultural Organization
USDA United States Department of Agriculture
USGS United States Geological Survey
WCMC World Conservation Monitoring Centre

2.07.1 Introduction

Wetlands are recognized as one of the world's most valuable natural resources (Burton and Tiner, 2009; Tiner, 2015b). Wetlands provide numerous ecological and socioeconomic benefits, such as providing critical habitats for fish, wildlife, and plant communities, storing floodwater and reducing peak runoff, recharging groundwater, filtering impurities in water, acting as nutrient and sediment sinks, protecting shorelines from erosion, and providing a range of recreational opportunities (e.g., boating, fishing, hunting). With the increasing world population, human demands on wetland resources for agricultural expansion and urban development continue to increase (Mitsch and Gosselink, 2000; Zedler and Kercher, 2005). In addition, global climate change has pronounced impacts on wetland ecosystems through alterations in hydrological regimes (Erwin, 2009). It was estimated that 64% of the world's wetlands have disappeared since 1900 (Ramsar Convention, 2009). The rate of wetland loss varies considerably from country to country. In a report to the United States Congress on the status of wetland resources, Dahl (1990) reported that the conterminous United States lost an estimated 53% of its original wetlands over a period of 200 years between the 1780s and the 1980s. Similarly, China suffered a 33% wetland loss in just 30 years from 1978 to 2008 (Niu et al., 2012).

To better manage and conserve wetland resources, we need to know the distribution and extent of wetlands and monitor their dynamic changes. However, there is no single, indisputable, universally accepted definition of wetlands due to the diversity of wetlands (Cowardin et al., 1979; Tiner, 2009b), making it difficult to determine the global wetland extent. The term "wetlands" covers a wide variety of aquatic habitats, including marshes, swamps, bogs, fens, peatlands, prairie potholes, vernal pools, and aquatic beds, among others. In general, wetlands are transitional habitats situated between wet (e.g., lakes, rivers, streams, estuaries) and dry (upland) environments. Thus, the demarcation of a wetland lies along a continuum of the water gradient and is somewhat arbitrary. Some wetland definitions include open-water habitats (e.g., lakes, rivers, streams) as wetlands, while others exclude permanent deepwater and focus more on shallow water habitats. For example, in the national wetland classification system adopted in the United States, permanently flooded freshwater aquatic systems deeper than 2 m are generally classified as deepwater habitats and are not considered wetlands (Cowardin et al., 1979). In contrast, under the Ramsar international wetland conservation treaty, wetlands are defined as areas of marsh, fen, peatland, or water, whether natural or artificial, permanent or temporary, with water that is static or flowing, fresh, brackish or salt, including areas of marine water, the depth of which at low tide does not exceed 6 m (Ramsar Convention Secretariat, 2016). In the Canadian wetland classification system, a wetland is defined as land that is saturated with water long enough to promote wetland or aquatic processes as indicated by poorly drained soils, hydrophytic vegetation, and various kinds of biological activity adapted to a wet environment (National Wetlands Working Group, 1997). Although the technical definitions of wetlands adopted by different countries vary to some degree, they still have much in common.
Water, whether permanent or temporary, must be present long enough to support animal species, plant communities, soil development, and the variety of functions attributed to these natural resources (Tiner, 2015b).

Geographic Information System (GIS) and remote sensing technologies have proven to be useful for mapping and monitoring wetland resources (Adam et al., 2010; Lang et al., 2015; Lyon et al., 2001; Ozesmi and Bauer, 2002; Rebelo et al., 2009; Sader et al., 1995). Wetland maps and inventories provide crucial information for wetland conservation, restoration, and management. Since the first multispectral satellite data (i.e., Landsat MSS) became publicly available in the 1970s, significant efforts have been made to develop remote sensing technology. These technological advances have led to the increasing availability of remotely sensed imagery with ever finer spatial, temporal, and spectral resolutions. In the meantime, image analysis and processing methods have been improving, which enables us to map wetlands and monitor changes with unprecedented accuracy. In particular, the availability of high-resolution light detection and ranging (LiDAR), synthetic aperture radar (SAR), hyperspectral, and multispectral data, and the introduction of multisensor and multiscale data fusion techniques, hold great potential for improving large-scale wetland mapping and monitoring (Lang et al., 2015). This article presents an introduction to the uses of GIS and remote sensing technologies for wetland mapping and monitoring. A case study will be presented to demonstrate the use of high-resolution LiDAR data and aerial photographs for mapping prairie potholes and surface hydrologic flow pathways.

2.07.2 Wetland Indicators

As noted earlier, there is no universally accepted definition of wetlands due to the diversity of wetland types. Wetlands can occur in a variety of landscape, hydrological, and climatic settings. They differ in size and shape, and in plant, soil, and hydrologic conditions. Despite these differences, wetlands can still be categorized into certain types based on their common characteristics. Some wetland types are easier to identify in the field than others due to their distinctive features. Although identifying wetlands in the field is indispensable to wetland inventory and field verification, it is labor-intensive, time-consuming, and impractical for surveying a large area. GIS and remote sensing techniques can facilitate wetland identification and delineation by analyzing a combination of wetland indicators such as hydrology, vegetation, soil types, and topographic position. These wetland indicators can be represented as various wetland indicator layers in a GIS environment, which can be overlaid or integrated to identify areas where there is a high probability that wetlands may be present (i.e., potential wetlands).
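As a rough illustration of this overlay logic, the minimal sketch below combines four co-registered indicator rasters into a single potential-wetland mask. It assumes the layers have already been derived and aligned (e.g., with rasterio); the function name and all thresholds are illustrative assumptions, not values prescribed by any of the cited studies.

```python
import numpy as np

# A minimal sketch of indicator overlay, assuming four co-registered
# indicator layers are available as NumPy arrays. All thresholds are
# illustrative assumptions.
def potential_wetlands(ndwi, ndvi, hydric_pct, twi,
                       ndwi_min=0.0, ndvi_min=0.2, hydric_min=80, twi_min=8.0):
    """Flag cells where soil, topography, and water/vegetation signals agree."""
    wet = ndwi >= ndwi_min             # hydrology: water or high soil moisture
    vegetated = ndvi >= ndvi_min       # vegetation signal (loose cutoff)
    hydric = hydric_pct >= hydric_min  # rasterized hydric-soil percentage
    accumulating = twi >= twi_min      # topographic tendency to receive water
    return hydric & accumulating & (wet | vegetated)
```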

2.07.2.1 Hydrology

Among the many wetland indicators, hydrology is probably the most important factor affecting the formation and functions of a wetland, as it influences plant communities, animal species, soil properties, and human use. Lands must remain "wet" for a long period of time during the growing season in order to be designated as wetlands. The prolonged wetness of wetlands results from water received from various sources, including precipitation, snowmelt, surface water runoff, and groundwater discharge, among others. Based on the frequency and duration of inundation or soil saturation, wetlands can generally be classified as ephemeral, temporary, seasonal, semipermanent, and permanent. In the United States, the minimum wetness for a federally regulated wetland is defined by saturation within 30 cm of the surface for at least 2 weeks during the growing season in most years (Tiner, 2015b). In general, wetlands with high wetness are easier to identify through remote sensing than dried wetlands. Apart from cloud shadows, a dark tone in multispectral remote sensing imagery is often indicative of water or high soil moisture areas, where wetlands are likely to occur. The normalized difference water index (NDWI) is a commonly used index to detect and delineate water-like features and high soil moisture areas (McFeeters, 1996). The formula for calculating NDWI is:

\[ \text{NDWI} = \frac{\text{GREEN} - \text{NIR}}{\text{GREEN} + \text{NIR}} \tag{1} \]

where GREEN and NIR represent the spectral reflectance values acquired in the green and near-infrared portions of the electromagnetic spectrum, respectively. Theoretically, NDWI values range from −1 to +1. An NDWI value that is negative or close to zero means no water, whereas an NDWI value close to +1 indicates the highest wetness. In addition to multispectral imagery, high-resolution LiDAR and SAR data are increasingly being used to map surface water and wetlands (Brian, 2015; Huang et al., 2011b; Lang and McCarty, 2009; Wu and Lane, 2016).
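For concreteness, the following minimal sketch evaluates Eq. (1) with NumPy. The toy arrays stand in for green and near-infrared reflectance bands that would normally be read from imagery (e.g., with rasterio); the water threshold is an illustrative assumption.

```python
import numpy as np

def normalized_difference(a, b):
    """Compute (a - b) / (a + b), masking cells where the sum is zero."""
    a, b = a.astype("float64"), b.astype("float64")
    denom = a + b
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(denom == 0, np.nan, (a - b) / denom)

green = np.array([[0.10, 0.25], [0.05, 0.30]])  # toy green-band reflectance
nir = np.array([[0.40, 0.10], [0.45, 0.05]])    # toy near-infrared reflectance
ndwi = normalized_difference(green, nir)        # Eq. (1); values in [-1, +1]
water_like = ndwi > 0                           # positive NDWI suggests water/wet soil
```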

2.07.2.2 Hydrophytic Vegetation

The nature of the plants colonizing wetlands is considered one of the most distinctive features of wetlands, as vegetation life forms and patterns, if present, can be easily observed and recognized (Tiner, 2009a). These plants are called hydrophytic vegetation. In the United States, the national list of wetland plants contains nearly 6700 species (Tiner, 1991). These species have adapted to the frequent and prolonged flooding events that occur in wetlands. Remotely sensed data are frequently used to identify specific plant species or vegetation types indicative of wetlands. The most well-known and commonly used index for detecting green vegetation in multispectral remote sensing data is the normalized difference vegetation index (NDVI) (Tucker, 1979). The formula for calculating NDVI is:

\[ \text{NDVI} = \frac{\text{NIR} - \text{RED}}{\text{NIR} + \text{RED}} \tag{2} \]

where NIR and RED represent the spectral reflectance values acquired in the near-infrared and red portions of the electromagnetic spectrum, respectively. Theoretically, NDVI values range from −1 to +1. An NDVI value that is negative or close to zero means no vegetation, whereas an NDVI value close to +1 indicates the highest concentration of green vegetation.
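NDVI is the same normalized-difference construction with the NIR and red bands swapped in, so the helper sketched above for NDWI can be reused; the red band values and the vegetation cutoff below are again illustrative assumptions.

```python
red = np.array([[0.30, 0.08], [0.35, 0.06]])    # toy red-band reflectance
ndvi = normalized_difference(nir, red)          # Eq. (2); values in [-1, +1]
dense_vegetation = ndvi > 0.3                   # illustrative cutoff for green vegetation
```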

2.07.2.3 Hydric Soils

Hydric soils are soils that are saturated, ponded, or flooded long enough during the growing season to promote the development of anaerobic conditions in the upper horizons. These conditions favor the growth and reproduction of hydrophytic vegetation. Most wetlands have both hydric soils and hydrophytic vegetation present; however, there are also some nonvegetated wetlands (e.g., mudflats). The U.S. Department of Agriculture (USDA) Natural Resources Conservation Service (NRCS) developed a national list of hydric soils, which is updated periodically (USDA-NRCS, 2010). Currently, there are approximately 2000 hydric soil types in this national list (USDA-NRCS, 2016a). Each hydric soil type is a unique combination of physical, chemical, and moisture properties. The USDA-NRCS also developed a GIS database called the Soil Survey Geographic Database (SSURGO), which contains soil information collected over the course of a century (USDA-NRCS, 2016b). The SSURGO database consists of spatial data (map unit polygons) and tabular data (attribute tables). The map unit polygons (MUPOLYGON) delineate the extent of different soils. The attribute table (muaggatt) contains the hydric soil information, which can be joined to the map unit polygons through the common MUKEY column. In the muaggatt attribute table, there is a field called hydclprs (alias: Hydric Classification - Presence), which indicates the proportion of the map unit that is hydric. Map units with a higher proportion of hydric soils are more likely to contain wetlands. In other words, wetlands are less likely to occur on nonhydric soils. However, the absence of hydric soil does not mean that an area is always without wetlands, since SSURGO data have a limited map scale between 1:24K and 1:12K. Therefore, it is recommended that the hydric soil indicator be used in conjunction with hydrology and vegetation indicators to identify areas with a high probability of wetland occurrence.
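The MUKEY join described above can be sketched with geopandas and pandas as follows. The file paths and the tabular file format are hypothetical (SSURGO tabular data are distributed in several formats), and the 80% hydclprs cutoff is an illustrative choice rather than an NRCS recommendation.

```python
import geopandas as gpd
import pandas as pd

# Hypothetical paths; MUPOLYGON is the SSURGO spatial layer and muaggatt
# the aggregated attribute table, both keyed on MUKEY.
mupolygon = gpd.read_file("ssurgo/spatial/MUPOLYGON.shp")
muaggatt = pd.read_csv("ssurgo/tabular/muaggatt.csv", dtype={"mukey": str})

# Join the hydric percentage (hydclprs) onto the map unit polygons.
mupolygon["MUKEY"] = mupolygon["MUKEY"].astype(str)
joined = mupolygon.merge(muaggatt[["mukey", "hydclprs"]],
                         left_on="MUKEY", right_on="mukey", how="left")

# Keep map units that are predominantly hydric (illustrative 80% cutoff).
likely_hydric = joined[joined["hydclprs"] >= 80]
likely_hydric.to_file("hydric_map_units.shp")
```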

2.07.2.4 Topographic Position

In addition to the three key wetland indicators (hydrology, hydrophytic vegetation, and hydric soil) mentioned earlier, topographic position can be used as a supplementary indicator of wetland occurrence. Digital elevation models (DEMs) are commonly used to derive primary topographic metrics (e.g., slope, aspect, and curvature) and secondary topographic metrics, which are computed from two or more primary metrics. One of the most widely used secondary topographic metrics is the topographic wetness index (TWI), which quantifies the tendency of a grid cell to receive and accumulate water (Sörensen et al., 2006). The TWI is defined as:

\[ \text{TWI} = \ln\left(\frac{A}{\tan\beta}\right) \tag{3} \]

where A is the upslope contributing area and β is the local slope angle. The higher the TWI of a cell, the higher its tendency to accumulate water, and thus the higher the likelihood of wetland presence. Traditionally, coarse-resolution DEMs (e.g., the USGS National Elevation Dataset (NED) with 10–30 m resolution) have been used to derive TWI. More recently, high-resolution LiDAR-based DEMs have been used to derive TWI and facilitate forested wetland mapping (Lang et al., 2013). In addition to the TWI, other algorithms have been developed to extract surface depressions and map depressional wetlands based on LiDAR-based DEMs in conjunction with aerial photographs, such as the stochastic depression analysis method for mapping vernal pools (Wu et al., 2014) and the localized contour tree method for mapping prairie wetlands (Wu and Lane, 2016). An emerging open-source GIS software package called Whitebox Geospatial Analysis Tools (Whitebox GAT) also provides a number of geoprocessing tools for computing topographic metrics based on DEMs (Lindsay, 2016).
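The sketch below evaluates Eq. (3) on toy grids. It assumes the upslope contributing area and slope rasters have already been derived from a DEM with standard flow-routing tools (e.g., Whitebox GAT); the slope floor guarding against division by zero is an implementation assumption.

```python
import numpy as np

def topographic_wetness_index(contributing_area, slope_rad, min_slope=1e-4):
    """TWI = ln(A / tan(beta)); the slope is floored to avoid division by zero."""
    tan_beta = np.tan(np.maximum(slope_rad, min_slope))
    return np.log(np.maximum(contributing_area, 1.0) / tan_beta)

area = np.array([[1.0, 50.0], [400.0, 9000.0]])   # toy contributing areas (m^2)
slope = np.radians([[12.0, 6.0], [2.0, 0.5]])     # toy slope angles in radians
twi = topographic_wetness_index(area, slope)      # higher values -> wetter cells
```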

2.07.3 Wetland Classification

2.07.3.1 Geospatial Data

Geospatial data for wetland mapping and monitoring include imagery collected by a variety of airborne or satellite sensors. These sensors can be broadly divided into passive and active sensors. Passive sensors measure electromagnetic radiation naturally reflected from the Earth's surface, which usually requires daytime acquisition, when the reflected energy from the sun is detectable by the sensor. In contrast, active sensors emit radiation toward the Earth's surface using their own energy source and measure the returned signals; they can therefore acquire imagery day and night under all weather conditions. Geospatial data acquired by passive sensors include aerial photography, multispectral imagery, and hyperspectral imagery, whereas LiDAR data and SAR imagery are collected by active sensors.

Aerial photography has been used for wetland mapping for many decades. With technological advances, the quality of aerial photography has improved, from black and white (panchromatic), to true color (RGB), and then to color infrared (CIR). Aerial photographs are commonly collected by state and local governments. For example, the State of Massachusetts collected 1:12,000 scale CIR aerial photographs to conduct a statewide inventory of potential vernal pool habitats (Burne, 2001). One of the most common sources of aerial photography in the United States is the USDA National Agriculture Imagery Program (NAIP), initiated in 2002. The original 5-year imagery acquisition cycle has been upgraded to a 3-year cycle since 2009. The statewide NAIP imagery can be freely downloaded from the USDA Geospatial Data Gateway (USDA, 2016). This high-resolution natural color and CIR imagery has been used in numerous wetland studies (see examples in Enwright et al., 2011; Johnston, 2013; Vanderhoof et al., 2016; Wu and Lane, 2016).

Similar to aerial photographs, multispectral satellite images are collected by passive sensors. In addition to the visible (red, green, blue) and near-infrared portions of the electromagnetic spectrum, many satellite sensors also collect information at longer wavelengths, such as the short-wave infrared and thermal infrared. The most commonly used multispectral satellite sensors for wetland mapping include Landsat MSS/TM/ETM+/OLI, MODIS, AVHRR, SPOT-4/5/6/7, IKONOS, QuickBird, GeoEye-1, RapidEye, Sentinel-2, and WorldView-1/2/3/4, among others. Comprehensive reviews of these commonly used satellite sensors for wetland mapping can be found in Ozesmi and Bauer (2002), Klemas (2011), and Lang et al. (2015). Compared to aerial photography, satellite sensors can provide multispectral imagery with finer spectral and better temporal resolutions, which are essential for classifying wetland vegetation types and analyzing wetland water dynamics.

In addition to aerial photography and multispectral imagery, LiDAR data have increasingly been incorporated into the wetland mapping process. LiDAR sensors are active systems that use laser pulses to measure ranges to the Earth, producing precise (x, y, z) measurements in the form of LiDAR point clouds. High-resolution DEMs can then be derived from LiDAR point clouds by using interpolation algorithms. Importantly, the LiDAR-based DEMs can be used to compute various topographic metrics, which serve as essential wetland indicators as noted earlier.
Although LiDAR sensors are primarily used to generate precise information on surface elevation, some LiDAR sensors can also record LiDAR intensity, which represents the returned signal strength relative to the emitted energy. Since most LiDAR sensors operate in the near-infrared spectrum, laser light is strongly absorbed by water, resulting in very weak or no signal returns. As a result, water areas appear as dark features in LiDAR intensity imagery. Therefore, LiDAR intensity data are particularly useful for mapping surface water and wetland inundation extent. A number of studies have reported improved accuracy of wetland inundation mapping by using LiDAR intensity data with simple thresholding techniques (Huang et al., 2011b; Lang and McCarty, 2009; Wu and Lane, 2016). It is worth noting that high-resolution DEMs can also be derived from aerial imagery acquired using other emerging geospatial technologies such as unmanned aerial systems (UAS), or drones. Two of the leading software packages for processing drone imagery are Drone2Map for ArcGIS (ESRI, 2016) and ENVI OneButton (Harris Geospatial Solutions, 2016), both of which can take raw imagery from drones and create high-resolution orthomosaics and digital surface models for wetland mapping.
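Returning to the intensity-thresholding idea noted above, a minimal sketch follows. The intensity cutoff and the minimum patch size used to suppress speckle are illustrative assumptions, not values taken from the cited studies.

```python
import numpy as np
from scipy import ndimage

def inundation_from_intensity(intensity, threshold=30, min_cells=25):
    """Return a boolean inundation mask from a LiDAR intensity grid."""
    water = intensity < threshold                        # weak NIR returns -> water
    labels, n = ndimage.label(water)                     # group contiguous patches
    sizes = ndimage.sum(water, labels, range(1, n + 1))  # patch sizes in cells
    keep_ids = np.flatnonzero(sizes >= min_cells) + 1    # drop tiny speckle patches
    return np.isin(labels, keep_ids)

intensity = np.random.default_rng(0).integers(0, 256, size=(200, 200))
mask = inundation_from_intensity(intensity)              # toy demonstration
```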

2.07.3.2 Classification Methods

Wetland classification methods have been developing for decades, along with methods for land use and land cover classification. Common classification methods can be divided into two broad categories: supervised classification and unsupervised classification. In a supervised classification, the analyst first selects training samples (i.e., homogeneous and representative image areas) for each land cover class and then uses them to guide the computer to identify spectrally similar areas for each class. The selection of training samples can be based on field data collection or expert knowledge. The most common supervised classification methods include maximum likelihood, parallelepiped, minimum distance, decision tree, random forest, and support vector machine, among others (Lang et al., 2015). Unsupervised classification, however, does not start with training samples. Instead, the analyst specifies the desired number of classes, and the computer automatically groups pixels that are statistically similar into categories using clustering algorithms. The most commonly used clustering algorithms include K-Means, the Iterative Self-Organizing Data Analysis Technique (ISODATA), and agglomerative hierarchical clustering (Duda and Canty, 2002). The iterative clustering process results in a preset number of "spectral classes," which can then be assigned class labels and become "information classes." Unsupervised classification is particularly useful when field data or prior knowledge about the study area is not available. Some studies have used a hybrid approach that combines unsupervised and supervised classification methods with field surveys (Lane et al., 2014).

Supervised and unsupervised methods have been used for decades for classifying remote sensing images. They are pixel-based classification methods relying solely on spectral information (i.e., digital number values), which often results in a "salt and pepper" effect in the classification result. To overcome the issues associated with pixel-based classification methods, object-based image analysis (OBIA) methods for image classification have been developed (Blaschke, 2010; Liu et al., 2010). The OBIA approach can incorporate spectral, spatial, textural, and contextual information into the classification process. Numerous studies have reported that the OBIA approach can achieve greater accuracy for wetland mapping than traditional pixel-based approaches (Joseph et al., 2015). Trimble eCognition Developer is one of the most popular software packages for object-based image classification and analysis (Trimble, 2016).
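As a small illustration of the unsupervised route described above, the sketch below clusters pixel spectra into a preset number of spectral classes with scikit-learn's KMeans. The toy image and the choice of five clusters are assumptions for demonstration only.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
image = rng.random((50, 50, 4))                 # toy 4-band reflectance image

pixels = image.reshape(-1, image.shape[-1])     # one row of band values per pixel
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(pixels)
spectral_classes = kmeans.labels_.reshape(image.shape[:2])

# spectral_classes holds five clusters; relabeling them as information
# classes (water, marsh, upland, ...) remains an analyst's step.
```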

2.07.3.3 Classification Systems

There are two kinds of wetland classification systems: horizontal and vertical (Tiner, 2009a). A horizontal classification system classifies wetlands into a finite number of types based on the major characteristics of each type. The wetland types in a horizontal classification system are mutually exclusive and highly generalized. For example, some commonly used terms such as marsh, swamp, bog, and fen belong to the horizontal classification system. In terms of water permanence (i.e., frequency and duration of water ponding), wetlands can be classified into permanently flooded, semipermanently flooded, seasonally flooded, temporarily flooded, and ephemeral wetlands (Sloan, 1972). Similarly, based on their topographic position, wetlands can be classified into marine, estuarine, lotic (rivers and streams), lentic (lakes), terrene, and geographically isolated wetlands (Tiner, 2015a). Vertical classification systems utilize a hierarchical approach that classifies wetlands into a few general types and then further subdivides each type into progressively more detailed types. Higher-level wetland types share more generalized characteristics, such as topographic position and water source, while lower-level wetland types are based on more specific characteristics (e.g., dominant vegetation species, water chemistry, substrate characteristics, and water-level fluctuations) (Tiner, 2015a). A good example of a vertical classification system is the Cowardin et al. (1979) classification hierarchy developed and adopted by the U.S. Fish and Wildlife Service (FWS) for conducting a nationwide wetland inventory. This hierarchical classification system classifies wetlands and deepwater habitats into five levels: system, subsystem, class, subclass, and modifiers. More information about the Cowardin et al. (1979) classification system is provided in the section "U.S. National Wetlands Inventory." It should be noted that vertical classifications usually require high-resolution aerial photographs or submeter satellite imagery in conjunction with field verification. Traditional medium-resolution satellite data are generally not suitable for developing vertical classification systems.

2.07.4 Current Large-Scale Wetland Inventories

Significant progress has been made in mapping large-scale (e.g., global-scale, regional-scale) wetlands during the past decades. A number of large-scale wetland inventories have been developed by various individuals, agencies, and organizations (Channan et al., 2014; Cowardin et al., 1979; Dugan, 1993; Feng et al., 2016; Finlayson et al., 1999; Fluet-Chouinard et al., 2015; Gumbricht, 2012; Lehner and Döll, 2004; Ramsar Convention Secretariat, 2016; Rebelo et al., 2009; Zheng et al., 2015). As noted earlier, there is no universally accepted definition of wetlands due to the diversity of wetlands. As a result, the currently available large-scale wetland inventories are not consistent in their wetland definition, methodology, or wetland classification system. The inconsistencies between these existing large-scale wetland inventories make it difficult to conduct comparative analysis. Nevertheless, these wetland inventories do represent the best available wetland datasets and could serve as a good starting point for analyzing wetland extents at a global or regional scale.

2.07.4.1 Global-Scale Wetland Inventories

The advancement of remote sensing technology has enabled satellites to provide global land cover images with increasing spatial, temporal, and spectral resolutions. Currently available wetland inventories at the global scale are inconsistent in many ways, such as wetland definition, wetland classification system, wetland classification method, data type, and spatial resolution. A list of the existing global datasets of wetlands is shown in Table 1. Some current wetland maps at the global scale were extracted from different global land cover products derived from remotely sensed data with spatial resolutions ranging from 5-min to 30-m. Despite the inconsistencies and limitations of these existing inventories, they provide important information about the global extent of wetlands and serve as valuable data sources for wetland research, conservation, and management.

The Ramsar Convention, also known as the Convention on Wetlands of International Importance, is an international treaty for promoting conservation and sustainable use of wetland habitats, especially as habitats for migratory waterfowl. It was named after the city of Ramsar in Iran, where the Convention was first held in 1971. Initially, only seven countries signed the agreement on December 21, 1975; ever since then, the number of contracting countries has been growing, and currently the Convention has 169 contracting countries. As of November 2016, there are 2243 sites worldwide that the Ramsar Convention has designated as wetlands of international importance, covering approximately 2.16 million km2. The spatial distribution of these Ramsar sites is shown in Fig. 1. The country with the highest number of Ramsar sites is the United Kingdom with 170 sites, while the country with the largest total area of Ramsar wetlands is Bolivia with 148,424 km2. The GIS data for the Ramsar sites are available through the Ramsar Sites Information Service (Ramsar Convention, 2016), which provides point coordinates as well as (partial) polygons for Ramsar sites in the ESRI Shapefile format.

Lehner and Döll (2004) developed the Global Lakes and Wetlands Database (GLWD) in the form of a global raster map at 30-s resolution. They estimated that global wetlands cover approximately 8.3–10.2 million km2, or 6.2%–7.6% of the global land surface area (excluding Antarctica and Greenland). Geographically, nearly half of the global wetlands occur in the high northern latitudes between 50°N and 70°N, in boreal and arctic regions where permafrost, bogs, and fens are abundant (Fig. 2). The remainder of global wetlands are primarily located in the humid tropical and subtropical regions where forested wetlands and marshes are dominant (Melton et al., 2013; Tiner, 2009b). It should be noted that the estimated global wetland extents reported in the literature vary significantly, with an almost threefold difference between the lower and upper estimates (4.3–12.9 million km2).

In addition to the above-mentioned global wetland inventories developed specifically for mapping global wetland resources, there are a number of global land cover or water datasets from which wetlands can be extracted (see Table 1). For example, the European Space Agency's (ESA) Climate Change Initiative (CCI) produced a Global Land Cover Map for the 2010 epoch, which is a 300-m resolution raster that classifies global land cover into one of 36 classes (Fig. 3). Data for the 2010 epoch were collected from 2008 to 2012 by ESA's MERIS sensor.
It should be noted that the ESA's Global Land Cover Map does not have a dedicated wetland land cover type. Nevertheless, there are some wetland-related land cover types, such as water bodies, shrub or herbaceous flooded, tree flooded (fresh water), and tree flooded (saline water). These wetland-related land cover types can be combined to extract global wetland extent (see the sketch below). More recently, the Association of American Geographers (AAG) published a groundbreaking 250-m resolution global map and database of Ecological Land Units (ELUs), which was derived from a stratification of the Earth into unique physical environments and their associated vegetation (Fig. 4). The mapping approach first characterizes the climate regime, landforms, lithology, and land cover of the Earth, and then models terrestrial ecosystems as combinations of those four land surface components (Sayre et al., 2014). These four components resulted in 3639 different combinations, or ELUs. This global map of ELUs, implemented by USGS and ESRI, represents the latest collective effort to map standardized, high-resolution terrestrial ecosystems of the Earth.
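The class-combination step referenced above reduces to a categorical lookup. In the sketch below, the class codes follow commonly documented ESA CCI legend values for the flooded and water classes, but they are assumptions that should be verified against the product documentation before use; the toy grid is illustrative.

```python
import numpy as np

# Wetland-related class codes (assumed from the ESA CCI legend: flooded
# tree/shrub cover and water bodies); verify against the official legend.
WETLAND_RELATED_CODES = [160, 170, 180, 210]

landcover = np.array([[10, 160, 210],
                      [180, 30, 170]])          # toy land cover class grid
wetland_extent = np.isin(landcover, WETLAND_RELATED_CODES)
```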

2.07.4.2 U.S. National Wetlands Inventory

In 1974, the U.S. FWS initiated the National Wetlands Inventory (NWI) Program to conduct a nationwide inventory of wetlands in the United States, aiming to provide decision-makers with information on the distribution and status of wetlands to aid in wetland conservation efforts (Tiner, 2009c). To achieve this goal, the NWI developed a national wetland classification system (Cowardin et al., 1979), which has now become the federal standard for wetland classification and has been adopted for use in other countries. Compared to the currently available global wetland inventories, the NWI classification system has more detailed wetland types for both wetlands and deepwater habitats. The classification is primarily based on vegetation, hydrologic regime, soil, salinity, and the location of wetlands. It is a hierarchical classification system consisting of five basic levels (from general to detailed): system, subsystem, class, subclass, and modifiers. The five major systems are: marine (open ocean and its associated coastline), estuarine (estuary and associated tidal and adjacent tidal wetlands), riverine (wetlands and deepwater habitats within banks of rivers and streams), lacustrine (permanently flooded lakes and reservoirs, intermittent lakes, and tidal lakes), and palustrine (inland vegetated wetlands, such as marshes, swamps, bogs, fens, ponds, and prairie wetlands) (Tiner, 2009a). Each system (except palustrine) is further subdivided into subsystems. More information about the classification system can be found in Cowardin et al. (1979).

Table 1 Overview of existing global datasets of wetlands

No. | Name | Data source | Data types and resolution | Description | Website
1 | Ramsar Wetlands Database | Ramsar Convention (2016) | Global representative point coordinates | Currently comprises 2243 wetland sites worldwide, covering approximately 2.16 million km2 | https://rsis.ramsar.org/
2 | Global Lakes and Wetlands Database (GLWD) | Lehner and Döll (2004) | Global raster map; 30-s resolution | Comprises lakes, reservoirs, rivers, and different wetland types; wetlands were estimated to cover about 8–10 million km2, or 6.2%–7.6% of the global land surface area | http://www.worldwildlife.org/
3 | Global Ecological Land Units (ELUs) | USGS and ESRI (2015) | Global raster map; 250-m resolution | Comprises areas of distinct bioclimate, landform, lithology, and land cover that form the basic components of terrestrial ecosystem structure; 3639 different combinations or ELUs | http://esriurl.com/eco/
4 | ESA CCI Global Land Cover Dataset | ESA (2016) | Global raster map; 300-m resolution | Includes three land cover maps corresponding to the epochs 2000, 2005, and 2010 that classify land cover into 1 of 36 classes | http://www.esa-landcover-cci.org/
5 | Global Mosaics of the standard MODIS land cover | Channan et al. (2014) | Global raster map; 5-min resolution | Based on the MODIS land cover type data product (MCD12Q1), classified into 17 land cover types, including "permanent wetlands" | http://www.landcover.org/
6 | Global Water Frequency | Feng et al. (2016) | Global raster map; 30-m resolution | Contains estimates of the percentage of water occurrence among all valid Landsat observations circa 2000; provides a more comprehensive estimate of global water area and changes compared to static inland water maps | http://www.landcover.org/
7 | CIFOR World Wetland Distribution | Gumbricht (2012) | Global raster map; 236-m resolution | Covers the tropics and subtropics; consists of seven classes: fen, bog-ombrotrophic peat domes, riverine, mangrove, flood-out, floodplain, and swamp and marsh | http://www.cifor.org/
8 | Global Inundation Extent from Multi-Satellites (GIEMS) | Fluet-Chouinard et al. (2015) | Global raster map; 500-m resolution | Provides surface water extent and dynamics over the globe and over a long time record (1993–2007), based on a collection of satellite observations | https://lerma.obspm.fr/
9 | World Water Bodies | ESRI (2016) | Global vector map; 1:2 million resolution | Includes 2.23 million polygons, classified as "Inland intermittent," "Inland perennial," and "Dry salt flat" | http://www.arcgis.com/
10 | Global Distribution of Wetlands Map | USDA-NRCS (1997) | Global raster map; 1:5 million resolution | Based on a reclassification of the FAO-UNESCO Soil Map of the World combined with a soil climate map; five major wetland classes were identified | http://www.nrcs.usda.gov/
11 | Global Wetland Project | ESA and Ramsar (2012) | Global vector map | Involves 10 countries in Northern Africa and the Middle East | http://www.globwetland.org/
12 | Wetlands Map of the UNEP-WCMC | UNEP-WCMC (1993), Dugan (1993) | Global vector map | Includes 24,685 wetland and lake polygons, classified into eight types | https://www.unep-wcmc.org/

Fig. 1 Global distribution of Ramsar sites. Source: Ramsar Convention.

The NWI maps were primarily produced by manually interpreting mid-1980s aerial photographs at a scale of 1:24K, with subsequent support from soil surveys and field verification. So far, the NWI has produced maps for more than 90% of the conterminous United States, the entire state of Hawaii, and 30% of Alaska (Tiner, 2009c). The spatial data of the entire NWI have been made available through the internet via the Wetlands Mapper online tool (USFWS, 2016) and can be downloaded in ESRI Geodatabase or Shapefile format. The NWI target mapping unit (i.e., the minimum-sized wetland that is consistently mapped) for different regions of the United States varies between 1000 and 20,000 m2 (or 0.1–2.0 ha), depending on the types of aerial imagery used and the types of wetland being mapped (Tiner, 1997). It is generally accepted that NWI mapping is most accurate for permanently flooded wetlands, where distinct changes between vegetation, hydrology, and soil occur at the wetland boundary (Lang et al., 2015). In contrast, other wetland types, such as seasonally and temporarily flooded wetlands, ephemeral wetlands, and forested wetlands, are mapped more conservatively. It should be noted that the NWI is a static dataset that might not reflect current wetland conditions, especially in areas where changes have occurred over the past 30 years due to natural changes and human activities. Nevertheless, the NWI remains the most comprehensive nationwide wetland inventory in the United States and does provide a valuable source of wetland location information. Great efforts have been made by the U.S. FWS and some states to update the NWI by incorporating additional data and advanced remote sensing techniques.

2.07.5 Case Study: Mapping Prairie Wetlands and Surface Hydrologic Flow Pathways Using LiDAR Data and Aerial Imagery

2.07.5.1 Introduction

The Prairie Pothole Region (PPR) of North America encompasses a vast area of approximately 715,000 km2, including parts of five north-central U.S. states (Montana, North Dakota, South Dakota, Minnesota, and Iowa) and three south-central Canadian provinces (Alberta, Saskatchewan, and Manitoba) (Fig. 5A). The landscape of the PPR is characterized by millions of closed-basin wetland depressions (see Fig. 5B) in clay-rich glacial deposits left by the last glacial retreat (van der Kamp et al., 2016; Winter, 1989). The PPR is considered one of the largest and most productive wetland areas in the world (Keddy, 2010; Steen et al., 2014). These wetland depressions, commonly known as potholes, perform important hydrological and ecological functions, such as providing critical habitat for many migratory and breeding waterbirds (Minke, 2009; Rover and Mushet, 2015), acting as nutrient sinks (Oslund et al., 2010), and storing surface water that can attenuate peak runoff during a flood event (Huang et al., 2011b). The potholes range from a relatively small area of less than 100 m2 to as large as 30,000 m2, with an estimated median size of 1600 m2 (Huang et al., 2011a; Wu and Lane, 2016; Zhang et al., 2009). The depths of potholes are generally less than 1 m, with varying water permanency (ephemeral, temporal, seasonal, semipermanent, and permanent) (Sloan, 1972). Due to their small size and shallow depth, these wetlands are highly sensitive to climate variability and are vulnerable to ecological, hydrological, and anthropogenic changes. Their ponded water areas are highly variable, reflecting alternating wet and dry periods. In extremely wet periods, many small wetland depressions may coalesce to form larger wetland depressions through the fill-spill mechanism. The time-series aerial imagery in Fig. 6 clearly demonstrates the ponded water dynamics of prairie wetland depressions. Prairie wetlands in the PPR have been extensively drained and filled for agricultural purposes, which is considered the greatest source of wetland loss in the PPR (Johnston, 2013).

Fig. 2 Global Lakes and Wetlands Database (GLWD). Source: USDA-NRCS.

Fig. 3 300-m resolution Global Land Cover Map (2010 epoch). Source: ESA-CCI.

Fig. 4 250-m resolution global map and database of Ecological Land Units (ELUs). Source: USGS and ESRI.

Fig. 5 The Prairie Pothole Region (PPR) of North America. (A) Geographic extent of the PPR and (B) aerial photograph showing the abundance of prairie pothole wetlands formed by the last glacial retreat.

Fig. 6 Time-series aerial photographs illustrate the dynamic nature of prairie pothole wetlands under alternating wet and dry periods.

In a report to the United States Congress on the status of wetland resources, Dahl (1990) estimated that the lower 48 states lost 53% of their original wetlands over the 200-year period between the 1780s and the 1980s. More recently, Dahl (2014) reported that the total wetland area in the PPR declined by an estimated 301 km2, or 1.1%, between 1997 and 2009. This represents an average annual net loss of 25 km2. In terms of numbers, wetland depressions were estimated to have declined by over 107,177, or about 4%, between 1997 and 2009 (Dahl, 2014).


The extensive wetland drainage and removal have increased precipitation runoff into regional river basins, which is largely responsible for the increasing magnitude and frequency of flooding events in the PPR (Bengtson et al., 1999; Miller and Nudds, 1996; Todhunter and Rundquist, 2004). Concerns over flooding along rivers in the PPR have stimulated interest in developing hydrologic models to simulate the effects of depression storage on peak river flows (Gleason et al., 2008; Gleason et al., 2007; Huang et al., 2011b; Hubbard and Linder, 1986).

Since most of these prairie wetlands do not have surface outlets or well-defined surface water connections, they are generally considered geographically isolated wetlands (Cohen et al., 2016; Lane and D'Amico, 2016). Despite their lack of an apparent surface water connection, it is important to note that these wetlands may be hydrologically connected to other wetlands and waterbodies through groundwater or intermittent surface water connections during extremely wet periods (Leibowitz et al., 2016; Tiner, 2015a). A number of recent studies focusing on the hydrologic connectivity of prairie wetlands have been reported in the literature. For example, Chu (2015) proposed a puddle-to-puddle modeling framework to delineate prairie wetlands and characterize their dynamic hydro-topographic properties in the Cottonwood Lake area (2.55 km2) using a 10-m resolution DEM. Vanderhoof et al. (2016) examined the effects of wetland expansion and contraction on surface water connectivity in the PPR using time-series Landsat imagery. Ameli and Creed (2016) developed a physically based subsurface–surface hydrological model to characterize both the subsurface and surface hydrologic connectivity of prairie wetlands and explore the time and length variations in these connections to a river. In a comprehensive overview of the hydrology of prairie wetlands, Hayashi et al. (2016) highlighted that prairie wetlands and their catchments should be considered highly integrated hydrological units, because the existence of prairie wetlands depends on lateral inputs of runoff water from their catchments in addition to direct precipitation. However, few studies on the hydrology of prairie wetlands have treated wetlands and catchments as integrated hydrological units. Furthermore, high-resolution LiDAR data have rarely been used in broad-scale (e.g., basin- or subbasin-scale) studies to delineate wetland catchments and model wetland connectivity in the PPR.

In this case study, a semiautomated framework was proposed to delineate nested hierarchical wetland depressions, their corresponding wetland catchments, and surface water connectivity using high-resolution LiDAR data (Wu and Lane, 2017). The nested hierarchical structure of wetland depressions and catchments was identified and quantified using the localized contour tree method (Wu et al., 2015). The surface water connectivity between wetlands and streams was characterized using the least-cost path algorithm. The derived surface water flow network successfully captured intermittent flow paths that are generally not available in the National Hydrography Dataset (NHD) of the PPR. The results demonstrated that the proposed framework is promising for improving overland flow modeling and hydrologic connectivity analysis.

2.07.5.2 Methods

In general, there are two types of surface water connectivity between prairie wetlands: fill-spill and fill-merge-spill. Whether fill-spill or fill-merge-spill occurs depends on the relative elevation of spill points and the water levels. If two adjacent wetland depressions share the same spill point (elevation), the fill-merge-spill hydrological process will occur. However, if a wetland depression has no adjacent wetland depressions sharing the same spill point, it will spill directly to a downstream waterbody or wetland; in that case, it is a fill-spill-only hydrological process. Fig. 7 illustrates the fill and spill dynamics of prairie wetlands. As the water level gradually increases in the individual wetland depressions B and C, they will eventually begin to coalesce and form a larger wetland complex. Once the larger wetland complex is fully filled, it will spill to the downstream wetland depression D. Similarly, depressions D and E will experience the same fill-merge-spill hydrological process. In contrast, depression A will experience the fill-spill process, as no adjacent depressions sharing the same spill point are available. As shown in Fig. 7, both wetland depressions and catchments exhibit a nested hierarchical structure. It should be noted that the wetland depression is different from the wetland inundation area. The standing water surface of a wetland is referred to as the inundation area (see the dark blue area in Fig. 7), whereas the maximum potential ponded extent is referred to as the wetland depression (see the light blue area in Fig. 7). The wetland catchment is defined as the upslope contributing area that drains water into the wetland depression. The catchment is also known as the watershed, contributing area, or drainage basin. For example, the corresponding wetland catchment of depression B is bounded by the two vertical dashed lines surrounding it. When depressions B and C coalesce to form a larger wetland complex, the wetland catchment of the resulting wetland complex is the aggregated area of wetland catchments B and C.

Fig. 7 A schematic diagram illustrating the fill and spill dynamics of prairie wetlands.
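The nested hierarchy described above can be represented as a simple tree in which child depressions merge into a parent complex and catchment areas aggregate upward. The toy sketch below mirrors depressions B and C of Fig. 7; the class name, elevations, and areas are all illustrative assumptions rather than the study's actual data structures.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Depression:
    name: str
    spill_elevation: float                 # elevation at which water spills out
    own_catchment_area: float              # m^2 draining directly to this depression
    children: List["Depression"] = field(default_factory=list)

    def total_catchment_area(self) -> float:
        """Catchment of a complex = its own area plus its children's catchments."""
        return self.own_catchment_area + sum(
            c.total_catchment_area() for c in self.children)

# B and C share a spill point, so they fill-merge-spill into complex BC.
b = Depression("B", spill_elevation=100.0, own_catchment_area=4000.0)
c = Depression("C", spill_elevation=100.0, own_catchment_area=2500.0)
bc = Depression("BC", spill_elevation=102.0, own_catchment_area=0.0,
                children=[b, c])
print(bc.total_catchment_area())           # 6500.0, the aggregated catchment
```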


In this case study, a 1-m resolution LiDAR-derived DEM in conjunction with LiDAR intensity imagery was used to map prairie wetlands and surface hydrologic flow pathways. The LiDAR intensity imagery was used to delineate wetland inundation areas, whereas the LiDAR DEM was used to delineate wetland depressions, catchments, and surface hydrologic flow pathways. Thresholding techniques have been commonly applied to LiDAR intensity imagery to extract inundation areas (Lang and McCarty, 2009; McCauley and Anteau, 2014). The proposed methodology for delineating nested wetland catchments and flow paths is a semiautomated approach consisting of several key steps: (a) extraction of hierarchical wetland depressions using the localized contour tree method (Wu et al., 2015); (b) delineation of nested wetland catchments; (c) calculation of potential water storage; and (d) derivation of flow paths using the least-cost path search algorithm. The LiDAR-derived bare-earth DEM is used to delineate hierarchical wetland depressions and nested wetland catchments. The LiDAR intensity imagery is used to extract standing waterbodies on the ground. The potential water storage of each individual wetland depression is calculated as the volume between the standing water surface and the maximum water boundary where water overspills into downstream wetlands or waters. The flow paths representing surface water connectivity can then be derived according to the potential water storage and simulated rainfall intensity. The flowchart in Fig. 8 shows the detailed procedures of the proposed framework for delineating wetland catchments and flow paths.

To streamline the procedures for automated delineation of wetland catchments and flow paths, the proposed framework has been implemented as an ArcGIS toolbox, Wetland Hydrology Analyst. The core algorithms of the toolbox were implemented in the Python programming language. The toolbox consists of three tools: the Wetland Depression Tool, the Wetland Catchment Tool, and the Flow Path Tool. The Wetland Depression Tool asks the user to select a DEM grid and then automatically executes the localized contour tree algorithm with user-specified parameters (e.g., base contour, contour interval, minimum depression size, minimum ponding depth) to delineate hierarchical wetland depressions. The depressional wetland polygons can be saved as an ESRI Shapefile or a Feature Dataset in a Geodatabase. Various morphometric properties (e.g., width, length, area, perimeter, maximum depth, mean depth, volume, elongatedness, and compactness) are computed and included in the attribute table of the wetland polygon layers. The Wetland Catchment Tool takes the DEM grid and the wetland polygon layers resulting from the Wetland Depression Tool as input, and exports wetland catchment layers in both vector and raster formats. The Flow Path Tool can be used to derive overland flow paths of surface water based on the DEM grid and the wetland polygon layers.
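To illustrate the least-cost path idea behind the Flow Path Tool, the sketch below traces the cheapest route across a toy DEM with scikit-image, treating offset elevation as traversal cost. The start and end cells standing in for a spill point and a downstream depression are assumptions, and the actual tool's cost surface may differ.

```python
import numpy as np
from skimage import graph

dem = np.array([[5.0, 4.0, 6.0, 7.0],
                [6.0, 3.0, 2.5, 6.0],
                [7.0, 6.0, 2.0, 1.5],
                [8.0, 7.0, 6.0, 1.0]])
cost = dem - dem.min() + 0.01          # strictly positive cost surface

start, end = (0, 1), (3, 3)            # spill point -> downstream depression (toy cells)
indices, total_cost = graph.route_through_array(
    cost, start, end, fully_connected=True, geometric=True)
print(indices)                          # sequence of cells approximating a flow path
```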

2.07.5.3 Results and Discussion

The proposed methods were tested in the PPR of North Dakota, and a small portion of the results is shown in Fig. 9. Comparing the inundation polygons derived from the 2011 LiDAR intensity data with the NWI polygons created in the early 1980s by the U.S. FWS clearly shows that the NWI wetlands inventory in this region is considerably out of date. It is a static dataset that neither reflects the wetland changes of the past decades nor captures the fill-spill dynamics. Some relatively large, disjointed NWI wetlands coalesced and formed even larger wetland complexes during the extremely wet period in October 2011, when the LiDAR data were acquired. Conversely, some small NWI wetlands appeared to have dried out, with no visible standing water. The median size of the dried NWI wetlands is approximately 1200 m², considerably smaller than the median size of all NWI wetlands in this region (1780 m²). The decline in the number of small NWI wetlands can be partly attributed to the high sensitivity of these wetlands to hydrological and climatic changes.

In addition to mapping the wetland inundation areas using LiDAR intensity imagery, the maximum potential ponded extent of wetlands can be delineated from the LiDAR DEM using the localized contour tree method (Wu et al., 2015), whereas the potential hydrologic flow pathways can be derived using the least-cost path algorithm. A small portion of the resulting map is shown in Fig. 10. The derived flow paths captured not only the permanent surface water flow paths (see the thick blue NHD flowline in Fig. 10) but also the intermittent and infrequent flow paths that have not been mapped previously.
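
The least-cost path derivation can be sketched as a standard Dijkstra search over a cost raster, as below. This is a generic illustration, not the case study's implementation: the cost surface (here simply elevation shifted to stay positive) is an assumption, and a real application would derive cost from the DEM, potential water storage, and the rainfall scenario.

```python
import heapq
import numpy as np

def least_cost_path(cost, start, goal):
    """Dijkstra search over a cost raster (8-connected); moving into a cell
    costs its raster value times the step length. Returns the cell path."""
    rows, cols = cost.shape
    dist = np.full(cost.shape, np.inf)
    prev = {}
    dist[start] = 0.0
    pq = [(0.0, start)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == goal:
            break
        if d > dist[r, c]:
            continue  # stale queue entry
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                    nd = d + np.hypot(dr, dc) * cost[nr, nc]
                    if nd < dist[nr, nc]:
                        dist[nr, nc] = nd
                        prev[(nr, nc)] = (r, c)
                        heapq.heappush(pq, (nd, (nr, nc)))
    path, node = [], goal
    while node != start:
        path.append(node)
        node = prev[node]
    path.append(start)
    return path[::-1]

dem = np.random.rand(100, 100) * 10   # placeholder for a LiDAR DEM tile
cost = dem - dem.min() + 0.01         # keep costs strictly positive
route = least_cost_path(cost, (0, 0), (99, 99))
```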

Fig. 8 Flowchart of the proposed framework for delineating wetland catchments and flow paths. (Inputs: LiDAR DEM grid and LiDAR intensity grid. After preprocessing, the contour tree method yields wetland depressions and wetland catchments, the thresholding method yields standing waterbodies, and potential water storage is computed; combined with a rainfall scenario, these produce the potential flow paths and final flow paths. Results are validated against NHD flowlines, NWI wetlands, NAIP imagery, and a shaded relief map.)



Fig. 9 Comparison between inundation areas (derived from LiDAR intensity data) and NWI wetland polygons. (A) Inundation areas and NWI wetlands overlaid on LiDAR intensity image; (B) inundation areas and NWI wetlands overlaid on color infrared aerial photograph; and (C) maximum ponded extent overlaid on shaded relief of LiDAR DEM.

Fig. 10 Examples of LiDAR-derived wetland depressions and flow paths in the Prairie Pothole Region. (A) Wetland depressions and flow paths overlaid on LiDAR shaded relief map and (B) wetland depressions and flow paths overlaid on color infrared aerial photograph.


By examining the flow paths overlaid on the CIR aerial photograph (Fig. 10B), it can be seen that the majority of flow paths appear to be surrounded by vegetated areas. This indicates that flow paths are located in areas of high soil moisture that are directly or indirectly related to surface water or groundwater connectivity. It is important to note that the methodology proposed in this case study was designed to reflect the topography and hydrologic connectivity between wetlands in the PPR. Assumptions were made to simplify the complex prairie hydrology, and physically based hydrological models have not yet been integrated into this framework. In reality, fill-and-spill is a complex and spatially distributed hydrological process strongly affected by many factors, such as surface topography, surface roughness, soil infiltration, soil properties, depression storage, precipitation, evapotranspiration, snowmelt runoff, and groundwater exchange. Nevertheless, this case study presents the first attempt to use LiDAR data for deriving nested wetland catchments and simulating flow paths at a broad scale in the PPR.

2.07.6 Conclusion

Wetland mapping capabilities have greatly improved over the past decades. Initial wetland mapping efforts were primarily based on manual photointerpretation of aerial photographs in conjunction with field data collection and verification, which is time-consuming and labor-intensive. Since the first multispectral satellite data became publicly available in the 1970s, the science of wetland mapping and monitoring has developed rapidly. Technological advances in GIS and remote sensing have provided wetland mapping science with improved GIS tools and remotely sensed imagery of ever-increasing spatial, temporal, and spectral resolution. In particular, recent advances in the quality and availability of high-resolution LiDAR, SAR, UAS, hyperspectral, and multispectral data, and the introduction of multisensor and multiscale data fusion techniques, hold great potential for improving large-scale wetland mapping and monitoring. The multitude of these geospatial datasets can provide complementary information about wetland occurrence and characteristics, and a growing number of semiautomated and automated wetland mapping techniques and large-scale wetland inventories have become available during the past few years.

Though the use of GIS and remote sensing has improved wetland mapping capabilities, many challenges remain that require further investigation. They can be summarized in three aspects, corresponding to the spatial, temporal, and spectral resolution of wetland inventories. First, except for North America and parts of Europe, comprehensive national-scale wetland inventories are not available for most countries. Currently available large-scale wetland inventories are inconsistent in their wetland definitions, classification methods, and classification systems, making it difficult to conduct comparative analyses. Consequently, there is a pressing need for a universally accepted wetland definition and wetland classification system in order to conduct a global-scale wetland inventory. In addition, the spatial resolution of most large-scale wetland inventories ranges from 250 to 500 m, which might not be sufficient for fine-scale wetland mapping and management, especially for small temporary and ephemeral wetlands (e.g., vernal pools). Second, the temporal resolution of current large-scale wetland inventories is very limited. Most inventories are static datasets derived from one-time airborne or satellite imagery, which cannot reflect seasonal or annual changes (e.g., hydroperiods, phenology) of wetlands. The increasing availability of SAR data (e.g., ESA's Sentinel-1A) holds great potential for mapping temporal changes of wetlands. Last but not least, there is a lack of spectral libraries for the large number of hydrophytic vegetation species. Hyperspectral data can potentially fill this gap and provide more spectral information than other types of remote sensing imagery. However, the availability of hyperspectral data is relatively limited, and the algorithms and tools for processing hyperspectral imagery are less developed compared with other data types.

References

Adam, E., Mutanga, O., Rugege, D., 2010. Multispectral and hyperspectral remote sensing for identification and mapping of wetland vegetation: a review. Wetlands Ecology and Management 18, 281–296.
Ameli, A.A., Creed, I.F., 2016. Quantifying hydrologic connectivity of wetlands to surface water systems. Hydrology and Earth System Sciences Discussions 2016, 1–28.
Bengtson, M.L., Padmanabhan, G., 1999. A hydrologic model for assessing the influence of wetlands on flood hydrographs in the Red River Basin: development and application. Report submitted to the International Joint Commission Red River Task Force. North Dakota State Water Commission. Available at https://goo.gl/xDMngM (accessed on November 21, 2016).
Blaschke, T., 2010. Object based image analysis for remote sensing. ISPRS Journal of Photogrammetry and Remote Sensing 65, 2–16.
Brian, B., 2015. Mapping and monitoring surface water and wetlands with synthetic aperture radar. In: Lang, M.W., Bourgeau-Chavez, L.L., Tiner, R.W., Klemas, V.V. (Eds.), Remote sensing of wetlands: applications and advances. CRC Press, Boca Raton, FL, pp. 119–136.
Burne, M.R., 2001. Massachusetts aerial photo survey of potential vernal pools. Natural Heritage & Endangered Species Program, Westborough, MA.
Burton, T.M., Tiner, R.W., 2009. Ecology of wetlands. In: Encyclopedia of inland waters. Academic Press, Oxford, pp. 507–515.
Channan, S., Collins, K., Emanuel, W., 2014. Global mosaics of the standard MODIS land cover type data. University of Maryland and the Pacific Northwest National Laboratory, College Park, MD, p. 30.
Chu, X., 2015. Delineation of pothole-dominated wetlands and modeling of their threshold behaviors. Journal of Hydrologic Engineering 22 (1). http://dx.doi.org/10.1061/(ASCE)HE.1943-5584.0001224, D5015003.
Cohen, M.J., Creed, I.F., Alexander, L., Basu, N.B., Calhoun, A.J., Craft, C., D'Amico, E., DeKeyser, E., Fowler, L., Golden, H.E., 2016. Do geographically isolated wetlands influence landscape functions? Proceedings of the National Academy of Sciences of the United States of America 113, 1978–1986.
Cowardin, L.M., Carter, V., Golet, F.C., LaRoe, E.T., 1979. Classification of wetlands and deepwater habitats of the United States. US Fish and Wildlife Service FWS/OBS 79, 131.
Dahl, T.E., 1990. Wetlands losses in the United States, 1780's to 1980's. Report to the Congress. U.S. Department of the Interior, Fish and Wildlife Service, Washington, DC, p. 13.


Dahl, T.E., 2014. Status and trends of prairie wetlands in the United States 1997 to 2009. U.S. Department of the Interior, Fish and Wildlife Service, Ecological Services, Washington, DC, p. 67.
Duda, T., Canty, M., 2002. Unsupervised classification of satellite imagery: choosing a good algorithm. International Journal of Remote Sensing 23, 2193–2212.
Dugan, P., 1993. Wetland in danger: a world conservation atlas. Oxford University Press, Oxford.
Enwright, N., Forbes, M.G., Doyle, R.D., Hunter, B., Forbes, W., 2011. Using geographic information systems (GIS) to inventory coastal prairie wetlands along the Upper Gulf Coast, Texas. Wetlands 31, 687–697.
Erwin, K.L., 2009. Wetlands and global climate change: the role of wetland restoration in a changing world. Wetlands Ecology and Management 17, 71.
ESA, 2016. ESA CCI Global Land Cover Dataset. Available at https://www.esa-landcover-cci.org/ (accessed on November 21, 2016).
ESA and Ramsar, 2012. Global wetland project. Available at http://www.globwetland.org/ (accessed on November 21, 2016).
ESRI, 2016. Drone2Map for ArcGIS. Available at http://doc.arcgis.com/en/drone2map/ (accessed on November 21, 2016).
Feng, M., Sexton, J.O., Channan, S., Townshend, J.R., 2016. A global, high-resolution (30-m) inland water body dataset for 2000: first results of a topographic–spectral classification algorithm. International Journal of Digital Earth 9, 113–133.
Finlayson, C., Davidson, N., Spiers, A., Stevenson, N., 1999. Global wetland inventory – current status and future priorities. Marine and Freshwater Research 50, 717–727.
Fluet-Chouinard, E., Lehner, B., Rebelo, L.-M., Papa, F., Hamilton, S.K., 2015. Development of a global inundation map at high spatial resolution from topographic downscaling of coarse-scale remote sensing data. Remote Sensing of Environment 158, 348–361.
Gleason, R.A., Tangen, B.A., Laubhan, M.K., Kermes, K.E., Euliss, N.H., Jr., 2007. Estimating water storage capacity of existing and potentially restorable wetland depressions in a subbasin of the Red River of the North. U.S. Geological Survey Open-File Report 2007-1159, p. 36.
Gleason, R.A., Laubhan, M.K., Euliss, N.H., Jr. (Eds.), 2008. Ecosystem services derived from wetland conservation practices in the United States Prairie Pothole Region with an emphasis on the U.S. Department of Agriculture Conservation Reserve and Wetlands Reserve Programs. U.S. Geological Survey Professional Paper 1745, 58 p. Available at https://pubs.usgs.gov/pp/1745/ (accessed on November 21, 2016).
Gumbricht, T., 2012. Mapping global tropical wetlands from earth observing satellite imagery. Center for International Forestry Research (CIFOR), Bogor.
Harris Geospatial Solutions, 2016. ENVI OneButton. Available at http://www.harrisgeospatial.com/ (accessed on November 21, 2016).
Hayashi, M., van der Kamp, G., Rosenberry, D.O., 2016. Hydrology of prairie wetlands: understanding the integrated surface-water and groundwater processes. Wetlands 36, 237–254.
Huang, S., Dahal, D., Young, C., Chander, G., Liu, S., 2011a. Integration of Palmer Drought Severity Index and remote sensing data to simulate wetland water surface from 1910 to 2009 in Cottonwood Lake area, North Dakota. Remote Sensing of Environment 115, 3377–3389.
Huang, S., Young, C., Feng, M., Heidemann, K., Cushing, M., Mushet, D.M., Liu, S., 2011b. Demonstration of a conceptual model for using LiDAR to improve the estimation of floodwater mitigation potential of Prairie Pothole Region wetlands. Journal of Hydrology 405, 417–426.
Hubbard, D.E., Linder, R.L., 1986. Spring runoff retention in prairie pothole wetlands. Journal of Soil & Water Conservation 41, 122–125.
Johnston, C.A., 2013. Wetland losses due to row crop expansion in the Dakota Prairie Pothole Region. Wetlands 33, 175–182.
Joseph, F.K., Jennifer, M.C., Lian, P.R., Keith, C.P., 2015. Theory and applications of object-based image analysis and emerging methods in wetland mapping. In: Tiner, R.W., Lang, M.W., Klemas, V.V. (Eds.), Remote sensing of wetlands: applications and advances. CRC Press, Boca Raton, FL, pp. 175–194.
Keddy, P.A., 2010. Wetland ecology: principles and conservation. Cambridge University Press, Cambridge.
Klemas, V., 2011. Remote sensing of wetlands: case studies comparing practical techniques. Journal of Coastal Research 27, 418–427.
Lane, C.R., D'Amico, E., 2016. Identification of putative geographically isolated wetlands of the conterminous United States. JAWRA Journal of the American Water Resources Association 52 (3), 705–722.
Lane, C., Liu, H., Autrey, B., Anenkhonov, O., Chepinoga, V., Wu, Q., 2014. Improved wetland classification using eight-band high resolution satellite imagery and a hybrid approach. Remote Sensing 6, 12187–12216.
Lang, M., McCarty, G., 2009. Lidar intensity for improved detection of inundation below the forest canopy. Wetlands 29, 1166–1178.
Lang, M., McCarty, G., Oesterling, R., Yeo, I.-Y., 2013. Topographic metrics for improved mapping of forested wetlands. Wetlands 33, 141–155.
Lang, M.W., Bourgeau-Chavez, L.L., Tiner, R.W., Klemas, V.V., 2015. Advances in remotely sensed data and techniques for wetland mapping and monitoring. In: Tiner, R.W., Lang, M.W., Klemas, V.V. (Eds.), Remote sensing of wetlands: applications and advances. CRC Press, Boca Raton, FL, pp. 79–116.
Lehner, B., Döll, P., 2004. Development and validation of a global database of lakes, reservoirs and wetlands. Journal of Hydrology 296, 1–22.
Leibowitz, S.G., Mushet, D.M., Newton, W.E., 2016. Intermittent surface water connectivity: fill and spill vs fill and merge dynamics. Wetlands 36, 323–342.
Lindsay, J.B., 2016. Whitebox GAT: a case study in geomorphometric analysis. Computers & Geosciences 95, 75–84.
Liu, H., Wang, L., Sherman, D., Gao, Y., Wu, Q., 2010. An object-based conceptual framework and computational method for representing and analyzing coastal morphological changes. International Journal of Geographical Information Science 24, 1015–1041.
Lyon, J.G., Lopez, R.D., Lyon, L.K., Lopez, D.K., 2001. Wetland landscape characterization: GIS, remote sensing and image analysis. CRC Press, Boca Raton, FL.
McCauley, L.A., Anteau, M.J., 2014. Generating nested wetland catchments with readily-available digital elevation data may improve evaluations of land-use change on wetlands. Wetlands 34, 1123–1132.
McFeeters, S.K., 1996. The use of the Normalized Difference Water Index (NDWI) in the delineation of open water features. International Journal of Remote Sensing 17, 1425–1432.
Melton, J., Wania, R., Hodson, E., Poulter, B., Ringeval, B., Spahni, R., Bohn, T., Avis, C., Beerling, D., Chen, G., 2013. Present state of global wetland extent and wetland methane modelling: conclusions from a model intercomparison project (WETCHIMP). Biogeosciences 10, 753–788.
Miller, M.W., Nudds, T.D., 1996. Prairie landscape change and flooding in the Mississippi River Valley. Conservation Biology 10, 847–853.
Minke, A.G.N., 2009. Estimating water storage of prairie pothole wetlands. University of Saskatchewan.
Mitsch, W.J., Gosselink, J.G., 2000. The value of wetlands: importance of scale and landscape setting. Ecological Economics 35, 25–33.
National Wetlands Working Group, 1997. In: Warner, B.G., Rubec, C.D.A. (Eds.), The Canadian wetland classification system, 2nd edn. Wetlands Research Branch, University of Waterloo, Waterloo, ON, p. 68.
Niu, Z., Zhang, H., Wang, X., Yao, W., Zhou, D., Zhao, K., Zhao, H., Li, N., Huang, H., Li, C., Yang, J., Liu, C., Liu, S., Wang, L., Li, Z., Yang, Z., Qiao, F., Zheng, Y., Chen, Y., Sheng, Y., Gao, X., Zhu, W., Wang, W., Wang, H., Weng, Y., Zhuang, D., Liu, J., Luo, Z., Cheng, X., Guo, Z., Gong, P., 2012. Mapping wetland changes in China between 1978 and 2008. Chinese Science Bulletin 57, 2813–2823.
Oslund, F.T., Johnson, R.R., Hertel, D.R., 2010. Assessing wetland changes in the Prairie Pothole Region of Minnesota from 1980 to 2007. Journal of Fish and Wildlife Management 1, 131–135.
Ozesmi, S.L., Bauer, M.E., 2002. Satellite remote sensing of wetlands. Wetlands Ecology and Management 10, 381–402.
Ramsar Convention, 2009. Wetlands: a global disappearing act. Available at https://goo.gl/GCBTT9 (accessed on November 21, 2016).
Ramsar Convention, 2016. Ramsar Sites Information Service. Available at https://rsis.ramsar.org/ (accessed on November 21, 2016).
Ramsar Convention Secretariat, 2016. An introduction to the Ramsar convention on wetlands, 7th edn. Ramsar Convention Secretariat, Gland, Switzerland (previously The Ramsar Convention Manual).
Rebelo, L.M., Finlayson, C.M., Nagabhatla, N., 2009. Remote sensing and GIS for wetland inventory, mapping and change analysis. Journal of Environmental Management 90, 2144–2153.
Rover, J., Mushet, D.M., 2015. Mapping wetlands and surface water in the Prairie Pothole Region of North America. In: Tiner, R.W., Lang, M.W., Klemas, V.V. (Eds.), Remote sensing of wetlands: applications and advances. CRC Press, Boca Raton, FL, pp. 347–368.


Sader, S.A., Ahl, D., Liou, W.S., 1995. Accuracy of Landsat-TM and GIS rule-based methods for forest wetland classification in Maine. Remote Sensing of Environment 53, 133–144.
Sayre, R., Dangermond, J., Frye, C., Vaughan, R., Aniello, P., Breyer, S., Cribbs, D., Hopkins, D., Nauman, R., Derrenbacher, W., 2014. A new map of global ecological land units: an ecophysiographic stratification approach. Association of American Geographers, Washington, DC.
Sloan, C.E., 1972. Ground-water hydrology of prairie potholes in North Dakota. US Government Printing Office, Washington, DC.
Sörensen, R., Zinko, U., Seibert, J., 2006. On the calculation of the topographic wetness index: evaluation of different methods based on field observations. Hydrology and Earth System Sciences Discussions 10, 101–112.
Steen, V., Skagen, S.K., Noon, B.R., 2014. Vulnerability of breeding waterbirds to climate change in the Prairie Pothole Region USA. PLoS ONE 9, e96747.
Tiner, R.W., 1991. The concept of a hydrophyte for wetland identification. BioScience 41, 236–247.
Tiner, R.W., 1997. NWI maps: what they tell us. National Wetlands Newsletter 19, 7–12.
Tiner, R.W., 2009a. Ecology of wetlands: classification systems. In: Encyclopedia of inland waters. Academic Press, Oxford, pp. 516–525.
Tiner, R.W., 2009b. Global distribution of wetlands. In: Encyclopedia of inland waters. Academic Press, Oxford, pp. 526–530.
Tiner, R.W., 2009c. Status report for the National Wetlands Inventory program: 2009. US Department of the Interior, Fish and Wildlife Service, Division of Habitat and Resource Conservation, Branch of Resource and Mapping Support, Arlington, TX.
Tiner, R.W., 2015a. Classification of wetland types for mapping and large-scale inventories. In: Tiner, R.W., Lang, M.W., Klemas, V.V. (Eds.), Remote sensing of wetlands: applications and advances. CRC Press, Boca Raton, FL.
Tiner, R.W., 2015b. Wetlands: an overview. In: Tiner, R.W., Lang, M.W., Klemas, V.V. (Eds.), Remote sensing of wetlands: applications and advances. CRC Press, Boca Raton, FL, pp. 3–18.
Todhunter, P.E., Rundquist, B.C., 2004. Terminal lake flooding and wetland expansion in Nelson County, North Dakota. Physical Geography 25, 68–85.
Trimble, 2016. eCognition Developer 9. Available at http://www.ecognition.com/ (accessed on November 21, 2016).
Tucker, C.J., 1979. Red and photographic infrared linear combinations for monitoring vegetation. Remote Sensing of Environment 8, 127–150.
UNEP-WCMC, 1993. Global wetlands. Available at https://www.unep-wcmc.org/resources-and-data/global-wetlands (accessed on November 21, 2016).
USDA, 2016. Geospatial Data Gateway. Available at https://gdg.sc.egov.usda.gov/ (accessed on November 19, 2016).
USDA-NRCS, 1997. Global distribution of wetlands map. Available at https://www.nrcs.usda.gov/ (accessed on November 21, 2016).
USDA-NRCS, 2010. In: Vasilas, L., Hurt, G., Noble, C. (Eds.), Field indicators of hydric soils in the United States. United States Department of Agriculture, Natural Resources Conservation Service, in cooperation with the National Technical Committee for Hydric Soils, Washington, DC.
USDA-NRCS, 2016a. National List of Hydric Soils. Available at https://www.nrcs.usda.gov/wps/portal/nrcs/main/soils/use/hydric/ (accessed on November 17, 2016).
USDA-NRCS, 2016b. Web Soil Survey (WSS). Available at http://websoilsurvey.nrcs.usda.gov/ (accessed on November 17, 2016).
USGS and ESRI, 2015. Global Ecological Land Units (ELUs). Available at https://rmgsc.cr.usgs.gov/ecosystems/ (accessed on November 21, 2016).
USFWS, 2016. NWI Wetlands Mapper. Available at https://www.fws.gov/wetlands/Data/Mapper.html (accessed on November 21, 2016).
van der Kamp, G., Hayashi, M., Bedard-Haughn, A., Pennock, D., 2016. Prairie pothole wetlands – suggestions for practical and objective definitions and terminology. Wetlands 36, 229–235.
Vanderhoof, M., Alexander, L., Todd, M.J., 2016. Temporal and spatial patterns of wetland extent influence variability of surface water connectivity in the Prairie Pothole Region, United States. Landscape Ecology 31, 805–824.
Winter, T., 1989. Hydrologic studies of wetlands in the northern prairie. In: Van Der Valk, A.G. (Ed.), Northern prairie wetlands. Iowa State University Press, Ames, IA.
Wu, Q., Lane, C.R., 2016. Delineation and quantification of wetland depressions in the Prairie Pothole Region of North Dakota. Wetlands 36, 215–227.
Wu, Q., Lane, C.R., 2017. Delineating wetland catchments and modeling hydrologic connectivity using LiDAR data and aerial imagery. Hydrology and Earth System Sciences Discussions 2017, 1–32.
Wu, Q., Lane, C., Liu, H., 2014. An effective method for detecting potential woodland vernal pools using high-resolution LiDAR data and aerial imagery. Remote Sensing 6, 11444–11467.
Wu, Q., Liu, H., Wang, S., Yu, B., Beck, R., Hinkel, K., 2015. A localized contour tree method for deriving geometric and topological properties of complex surface depressions based on high-resolution topographical data. International Journal of Geographical Information Science 29, 2041–2060.
Zedler, J.B., Kercher, S., 2005. Wetland resources: status, trends, ecosystem services, and restorability. Annual Review of Environment and Resources 30, 39–74.
Zhang, B., Schwartz, F.W., Liu, G., 2009. Systematics in the size structure of prairie pothole lakes through drought and deluge. Water Resources Research 45.
Zheng, Y., Niu, Z., Gong, P., Wang, J., 2015. A database of global wetland validation samples for wetland mapping. Science Bulletin 60, 428–434.

Relevant Websites

http://www.cifor.org/global-wetlands - CIFOR Global Wetlands.
http://www.edenextdata.com/?q=content/global-gis-datasets-links-0 - Global GIS Datasets.
http://www.esa-landcover-cci.org/ - ESA CCI Global Land Cover Dataset.
http://esriurl.com/eco/ - Global Ecological Land Units (ELUs).
http://freegisdata.rtwilson.com/ - Free GIS Data.
https://www.fws.gov/wetlands/ - U.S. Fish & Wildlife Service National Wetlands Inventory.
http://www.landcover.org/ - Global Land Cover Facility.
https://rsis.ramsar.org - Ramsar Sites Information Service.
http://websoilsurvey.nrcs.usda.gov/ - USDA-NRCS SSURGO Database.
http://www.worldwildlife.org/ - Global Lakes and Wetlands Database (GLWD).

2.08 GIS for Natural Resources (Mineral, Energy, and Water)

Wendy Zhou, Colorado School of Mines, Golden, CO, United States; Matthew D Minnick, Colorado School of Mines, Golden, CO, United States, and RESPEC Consulting & Services, Rapid City, SD, United States; and Celena Cui, Colorado School of Mines, Golden, CO, United States

© 2018 Elsevier Inc. All rights reserved.

2.08.1 Introduction
2.08.2 GIS Application in Water Resource Geospatial Infrastructure Development for Supporting Oil Shale Development in Piceance Basin, Northwestern Colorado, USA
2.08.2.1 Background
2.08.2.2 Baseline Data Collection and Integration
2.08.2.3 Analytical Models in Support of Oil Shale Development
2.08.2.3.1 Three-dimensional geologic modeling
2.08.2.3.2 Three-phase energy resource development systems models
2.08.2.4 Summary
2.08.3 GIS Application in Assessment of the Vulnerability of Surficial Aquifer System for the State of Florida, USA
2.08.3.1 Background
2.08.3.2 Data Collection
2.08.3.3 Methods for SAS Vulnerability Assessment
2.08.3.3.1 Transport equation
2.08.3.3.2 Fate component: Effects of environmental factors on N transformation
2.08.3.4 Model Input Parameters
2.08.3.5 GIS Model and Implementation
2.08.3.5.1 Single-step process model with blanket application
2.08.3.5.2 Two-step model with blanket application
2.08.3.5.3 Existing OWTS application for single-step and two-step models
2.08.3.6 Results
2.08.3.7 Summary
2.08.4 GIS Application in Analyzing the Offshore Placer Gold at Nome, Alaska, USA
2.08.4.1 Background
2.08.4.2 Data Collection and Geodatabase Development
2.08.5 Methods for Offshore Placer Gold Resource Estimation
2.08.5.1 Results
2.08.5.2 Summary
2.08.6 Conclusions
Acknowledgment
References

2.08.1 Introduction

Natural resources embrace a broad array of categories, including agricultural, conservational, forestry, oceanic, water, energy, and mineral resources. This article focuses only on the last three. The process of natural resource development includes resource exploration, resource assessment, resource management, and resource production. Traditional methods for natural resource exploration include, but are not limited to, geophysical exploration, field geological mapping, geochemical analysis, and aerial-photo interpretation. Integrating field survey data and other pertinent information for the purpose of natural resource estimation can be a time-consuming task with traditional methods.

Natural resource-related research is by nature a spatial problem. With the help of geographic information systems (GIS) technology, most tasks related to natural resource development can be conducted in ways that are nearly impossible with traditional methods. With the digital mapping capacity of GIS and its ever-evolving functionality in recent decades, industry has shifted to using GIS as the preferred tool for resource exploration, planning, analysis, visualization, and management. Moreover, the state and federal agencies involved in natural resource management and environmental regulation are adopting the GIS format as the standard for communicating spatial data in digital form (Bonham-Carter, 1996; Price, 2001).

The development of GIS has a long history. One important milestone was the creation of the Canada Geographic Information System (CGIS), the first national GIS, in 1964. Since then, GIS has become a powerful, time-efficient, and cost-effective technique for miners, geologists, scientists, and engineers who had been solving problems related to geospatial data in traditional ways for generations. There are many different definitions of GIS; for example, the United States Geological Survey (USGS) defines GIS as a computer system capable of assembling, storing, managing, analyzing, and displaying geographically referenced data, i.e., geospatial data identified according to their locations (USGS, 2007). GIS experts also consider the sum of GIS to include the operating personnel and the data that go into the system.

GIS has been used to manage and archive large volumes of natural resource data. A good example of such a data management system is the USGS Mineral Resources Data System (MRDS), a collection of reports describing metallic and nonmetallic mineral resources throughout the world, including deposit name, location, commodity, deposit description, geologic characteristics, production, reserves, resources, and references. GIS is an ideal platform for bringing together data in heterogeneous formats and delivering meaningful information. GIS can also be used to integrate survey data with block models or mine design data from other software packages such as GeoSoft, Vulcan, MineSight, SURPAC Range, or the Mining Visualization System (MVS) (ESRI, 2006).

There are many examples of GIS applications in natural resource management; the literature cited here is not intended to be a complete bibliography but rather to give representative examples. For instance, Wing and Bettinger (2008) wrote a book on GIS applications in natural resource management, an introductory textbook for college students majoring in forestry, natural resource management, field forestry, biology, and other natural resource-related fields. An example given by ESRI (2006) is the GIS application in building a virtual three-dimensional (3D) GIS model for the Mayflower Gold Mine in southwest Montana. Hammond (2002) showed an example of GIS application in underground mining, focusing on four areas: land ownership and mineral claims, exploration management, production, and mine safety. Berry and Pistocchi (2003) presented an application of GIS, together with multicriteria analysis for supporting decision making, in the environmental impact assessment of open-pit quarries. Dillon and Blackwell (2003) discussed GIS applications in surface mining development in general terms: starting from the exploration database, their paper described how a GIS can be used to develop mining plans based on topography, geology, and mineralization information stored in a relational database. The application of GIS to bauxite mining in Jamaica (Almarales-Hamm et al., undated) is an example of GIS applications supported by satellite imagery and orthorectified aerial photography to manage, analyze, and display data on tonnage, ore quality, location, and ownership; a customized 3D modeling and mine planning system using ArcGIS and its extensions provides tools that assist in the decision-making process and reserve management.

Water is one of the most precious natural resources on the Earth. GIS applications in water resource studies embrace a broad spectrum of topics, ranging from the hydrological cycle and hydrologic process modeling (e.g., Naiman et al., 1997; National Research Council, 1999) to watershed characterization and assessment of the environmental condition of water catchments (e.g., Aspinall and Pearson, 2000; Di Luzio et al., 2004), as well as water resource assessment in both quality and quantity (e.g., Cui et al., 2016; Zhou et al., 2015). Additionally, Wilson et al. (2000) examined the advancement of water resource assessment and management by integrating GIS and hydrological simulation models. GIS has also been used intensively in both conventional and unconventional fossil energy resource assessment and management. For instance, the USGS energy resource program did outstanding work in the assessment of in-place oil shale resources in the Green River Formation (Johnson et al., 2011; Mercier et al., 2011a, 2011b; Mercier, 2011; Self et al., 2011; Brownfield et al., 2011). Their efforts demonstrated GIS applications in in-place resource assessment and overburden calculation.

Using mineral resources as an example, Zhou (2009) summarized GIS applications for the mining industry as follows:

1. Preproduction phase of a mine: (a) site selection; (b) land ownership; (c) mineral claims; (d) exploration management.
2. Production phase of a mine: (a) environmental quality monitoring; (b) facilities management; (c) volume computations; (d) emergency management and industrial security; (e) transport routes.
3. Postproduction phase of a mine: (a) reclamation and vegetation characterization; (b) slope-aspect characterization; (c) volume computations; (d) visualization.
4. GIS-based analysis methods for technical research in minerals and mining: (a) resource estimation; (b) environmental impact assessment.

In the following sections, three comprehensive case studies of GIS applications in natural resource and environmental assessment are presented. The first study is an example of GIS application in water resource geospatial infrastructure development for supporting oil shale development in the Piceance Basin, northwestern Colorado. The second study describes a GIS application in the assessment of surficial groundwater aquifer vulnerability for the State of Florida. The last study presents an example of in-place resource estimation for a placer gold mine in Nome, Alaska. This article is a synthesis of previously unpublished and published works. Through all case studies, GIS technology has been applied to compile, integrate, analyze, and visualize natural resource data in 2D or 3D domains. It is demonstrated that GIS-based analyses have helped in making informed decisions and in revealing or discovering new information or knowledge in ways that are impossible using conventional methods alone.

2.08.2 GIS Application in Water Resource Geospatial Infrastructure Development for Supporting Oil Shale Development in Piceance Basin, Northwestern Colorado, USA

2.08.2.1 Background

Large oil shale deposits are found throughout the Midwestern and Eastern United States. However, the deposits of the Green River Formation in northwestern Colorado, southwestern Wyoming, and northeastern Utah (Fig. 1) are the most likely to be developed because of their richness, accessibility, and extensive prior characterization (USGS, 2005). Development of oil shale resources in the Western U.S. will require significant quantities of water for oil shale retorting, reclamation, and associated economic growth. The current water consumption estimate, based on retorting methods from the oil shale industry, is a 3:1 water-to-oil ratio (Wood et al., 2008). For an oil shale industry producing 2.5 million barrels of oil per day, this equates to between 105 and 315 million gallons of water per day (at 42 gallons per barrel, 2.5 million barrels of oil per day corresponds to 105 million gallons per day at a 1:1 ratio and 315 million gallons per day at 3:1) for power generation for in-situ heating processes, retorting, refining, reclamation, dust control, and onsite worker demands.

Collecting regional "baseline" data and compiling them into an integrated database is the groundwork for addressing potential water issues due to oil shale development on a regional basis (NETL, 2007). The volume of data currently collected and stored for oil shale overwhelms our ability to make these valuable data resources easily available to both the scientific community and policy-makers. Despite different levels of technical knowledge, data consumers face similar problems of locating, assembling, and integrating heterogeneous domain-specific data into a format that meets their needs. This task may be possible for the technically savvy data consumer, but often only with significant and time-consuming effort that could be better spent on data analysis. The ability to view products based on multiple heterogeneous datasets in a new and novel manner is often the key to enhancing scientific understanding.

Among all the Green River Formation basins, the Piceance Basin in northwestern Colorado has the smallest area but the largest resource (Johnson et al., 2011); hence, it was selected as the study area. The USGS has done intensive work on the assessment of in-place oil shale resources in the Green River Formation. Readers interested in the formation and evolution of these oil shale basins and the assessment of their in-place oil shale resources are referred to the series of publications by the USGS Oil Shale Assessment Project, Oil Shale Resources of the Eocene Green River Formation, Greater Green River Basin, Wyoming, Colorado, and Utah, led by Ronald C. Johnson (Johnson et al., 2011; Mercier et al., 2011a, 2011b; Mercier, 2011; Self et al., 2011; Brownfield et al., 2011).

In this project, our research focused on building a GIS-based water resource geospatial infrastructure for data storing, managing, manipulating, modeling, and visualizing, and on integrating the infrastructure with 3D geologic, system dynamic, and surface water resource analysis models. Study of water availability and environmental impact is a critical early step for the potential development of oil shale resources in the Western U.S. The ultimate goal of this study is to provide supporting information for water resource assessment and for better decision making on oil shale resource development in the Western U.S., as well as for facilitating environmental impact studies. Research protocols developed in this study were based on the Piceance Basin but are intended to be general so that they can be readily adapted to other similar study areas.

The following sections present this research, which is a synthesis of previously published and unpublished works by a multiinstitutional research group led by the Colorado School of Mines, joined by Idaho National Laboratory, University of Texas San Antonio, and Oklahoma Geological Survey (Zhou et al., 2012, 2015; Mattson et al., 2012). The presentation starts with data collection and integration, followed by the 3D geologic model and the analytical models in support of oil shale development, and ends with results and a summary.

2.08.2.2 Baseline Data Collection and Integration

Data collected and compiled include a Microsoft Access database containing Fischer assays of oil shale drill cores for the Piceance Basin created by the USGS Oil Shale Assessment Team, the National Hydrography Dataset Plus (NHD Plus), a 10-m digital elevation model (DEM), geologic maps, subsurface geology, a land use dataset, vegetation classification data, stream flow, precipitation, climate, well, groundwater level, water use, water right, and water quality data. Water quality data were collected from the USGS National Water Information System (NWIS) and the United States Environmental Protection Agency (US EPA) STOrage and RETrieval (STORET) Data Warehouse. Locations for 893 springs were collected from the Colorado Decision Support System (CDSS). Table 1 summarizes the various data collected, the data sources, and brief descriptions of the data (Zhou et al., 2012, 2015). The 1:100,000 scale geologic quadrangle maps (Hail and Smith, 1994, 1997) of the Piceance Basin were obtained from the USGS in DJVU format and were georeferenced into ArcGIS-compatible images.


Fig. 1 The locations of four Green River Formation basins (in upright diagonal fill pattern) in Colorado, Utah, and Wyoming. Modified from Zhou W, Minnick MD, Mattson ED, Geza M, and Murray KE (2015) GIS-based geospatial infrastructure of water resource assessment for supporting of oil shale development in Piceance Basin of northwestern Colorado. Computers and Geosciences 77: 44–53.

Table 1 Summary of data acquisition

Name | Source | Description | Geodatabase feature
Watersheds (HUCS) | NHDplus | Watershed polygons at various scales from the National Hydrologic Dataset | Basin feature class
Elevation | NED | Digital elevation models 90 m, 30 m, and 10 m from the National Elevation Dataset | Georasters
Catchments | NHDplus | Lowest level of surface water divisions defined by the stream networks from the National Hydrologic Dataset | Catchment feature class
Stream networks | NHDplus | Stream line data networked in a reach and nodal system from the National Hydrologic Dataset | Hydroline feature class
Flow accumulation | NHDplus | Flow network and direction data linked to the stream network from the National Hydrologic Dataset | Related table
Flow gages | CDSS, NWIS | USGS flow gage point locations | Monitoring point feature class
Flow data | NWIS | Time series stream flow data linked to flow gage ID | Time series table
Daymet extraction points | Centroid of WARMF model catchments | Points calculated at centroids of WARMF model catchments for Daymet data extraction | Custom point feature class
Precipitation data (time series) | Daymet | Time series precipitation data from Daymet linked to monitoring stations; processed yearly and monthly precipitation data trends for watersheds | Time series table
Meteorological data (time series) | Daymet | Time series temperature data and processed temperature datasets from Daymet | Time series table
Climate monitoring stations | NOAA, CDSS | Point locations for climate monitoring stations in and around the Piceance Basin | Monitoring point feature class
Climate monitoring stations | NOAA | Downloaded time series for up to 55 climate/weather parameters | Time series table
Surface water quality | NWIS, EPA STORET | Water quality data linked to monitoring locations | Monitoring point feature class
Aerial imagery | USGS, NAIP, ESRI services | Color aerial imagery at varying resolutions | Raster catalog
Geologic maps | CGS, USGS | Images of geologic maps at various scales, georeferenced, from the CGS and USGS | Georasters and geoarea feature class
Subsurface geology | USGS, CSM database | Borehole data from exploration wells including geophysical data, formation tops, and oil shale richness; data input for 3D geologic model | Geovolume multipatch feature class
Wells | NWIS | Water wells with production and source data | Well point feature class
Water level data | NWIS | Time series data of water level measurements for wells | Time series table
Ground water quality | NWIS | Water quality data associated with wells | Time series tables
Hydrogeologic data | CGS, USGS | Hydrologic parameter data derived from cores and pump tests | Tables
Land cover | NLCD | Vegetation and barren land data from the national land cover dataset | Raster feature set
Land use/ownership | BLM | Land use and ownership data | Custom polygon feature class
Base map layers | USGS, ESRI Services | General map data including roads, towns, population, site names, USGS topographic maps | ESRI services not included in geodatabase
Springs | CDSS | Point data for locations and time series tables for flow | Hydropoint feature class
Spring flow | CDSS | Time series data of water flow from springs | Time series tables
Diversions | CDSS | Irrigation ditches, stock ponds, reservoirs, stream pumping locations, and wells | Water discharge and water withdraw point feature classes
Diversion flow | CDSS | Time series data of water flow and usage | Time series tables
Pumping tests | TEOSR | Tests conducted by various institutions throughout the years, compiled from nondigital documents | Point feature class
Surficial geological structure | Digitized from USGS geologic map | Surface expression of faults in Piceance Basin | Polyline feature class
Surficial alluvial deposits | Digitized from USGS geologic map | Surficial alluvial deposits that make up the stream valleys in the Piceance Basin | Polygon feature class

Two major products, the surface expression of faults and the surficial alluvial deposits, were georeferenced and digitized from the USGS 1:100,000 scale geologic maps for the Piceance Basin and added to the "baseline" geodatabase. Additional data were sought from the Tell Ertl Oil Shale Repository (TEOSR) at the Arthur Lakes Library of the Colorado School of Mines after readily available digital resources were exhausted. The TEOSR contains materials related to oil shale and the history of the oil shale industry; technical materials include journals, government and contractor reports, unpublished papers of key oil shale players, original research maps, charts, and data compilations.

A database is strictly defined as one or more structured sets of persistent data, managed and stored as a unit and generally associated with software to update and query the data (Litton, 1987; Navathe and Elmasri, 2002). A geodatabase is a collection of geographic datasets for use by ArcGIS (Date, 2003; ESRI, 2004a); it can include the spatial locations and shapes of geographic features recorded as points, polylines, polygons, pixels, or grid cells, as well as their attributes and the relationships among them (Date, 2003). The geodatabase format in ArcGIS functions similarly to any relational database management system (RDBMS). Data retrieved from various sources were integrated via the geodatabase format into an integrated geodatabase, as shown in Fig. 2. The integrated geodatabase allows one to perform surface creation, multipatch creation, and multicriteria decision analysis tasks and is capable of performing both basic and advanced GIS operations such as spatial, geostatistical, and 3D analyses and queries.

Arc Hydro, the current industry-standard relational database schema for water resource analysis on the ArcGIS platform, was chosen for the prototype database framework. Arc Hydro has two separate geodatabase schemas, one to support surface water datasets and the other to support groundwater datasets, and the Arc Hydro framework supports a toolbar and a geoprocessing toolbox in ArcGIS for data analysis and simple modeling (Maidment, 2002). The surface water and groundwater databases were built separately at the early stage of the project, which allowed us to better manage the data and avoid duplication of effort; they were then broken down into basic components and rebuilt into one geodatabase. The data schema was customized to support generating the input data of the surface water and system dynamic models.

The definition of the database schema was accomplished by selecting a "data model" on which the project geodatabase was based. A "data model" is the representation of a real-world phenomenon or system within a database using a conceptually logical framework. When designing a data model, the main features of the system must be defined using geographic features, tabular data, and the relationships between those features, expressed as cardinality or topological relationships. A well-designed data model allows for efficient analysis of the system behavior. The Arc Hydro Data Model (AHDM) was selected as the database schema for this project because it supports the fundamental data types used in this project while being extensible, flexible, and adaptable to our modeling and web-based applications. As mentioned above, we elected to build two separate databases at the early stage of the project: an Arc Hydro Surface Water (AHSW) geodatabase and an Arc Hydro Ground Water (AHGW) geodatabase. The two databases were integrated into one as the project evolved. The basic framework of the overall geospatial infrastructure is represented in Fig. 2, which shows the relationship between the databases and the analytical models. An ArcGIS geodatabase schema for the implementation of the integrated database is summarized in Fig. 3.
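
To illustrate how such a schema can be instantiated programmatically, the sketch below creates a small Arc Hydro-style file geodatabase with arcpy. It is a minimal sketch, not the project's actual build script: the paths, dataset names, field names (FeatureID, TSDateTime, TSValue, VariableID), and the coordinate system (NAD83 / UTM zone 13N) are all illustrative assumptions.

```python
import arcpy

# Hypothetical output location; the real project schema follows the AHDM.
gdb = arcpy.management.CreateFileGDB(r"C:\piceance", "water_resources.gdb")[0]
sr = arcpy.SpatialReference(26913)  # assumed CRS: NAD83 / UTM zone 13N

# Feature datasets mirroring the surface water / groundwater split
arcpy.management.CreateFeatureDataset(gdb, "SurfaceWater", sr)
arcpy.management.CreateFeatureDataset(gdb, "GroundWater", sr)

# A monitoring-point feature class and a related time series table,
# linked through a shared FeatureID key in the Arc Hydro style.
pts = arcpy.management.CreateFeatureclass(
    f"{gdb}\\SurfaceWater", "MonitoringPoint", "POINT", spatial_reference=sr)[0]
arcpy.management.AddField(pts, "FeatureID", "LONG")

ts = arcpy.management.CreateTable(gdb, "TimeSeries")[0]
for name, ftype in [("FeatureID", "LONG"), ("TSDateTime", "DATE"),
                    ("TSValue", "DOUBLE"), ("VariableID", "LONG")]:
    arcpy.management.AddField(ts, name, ftype)

# Relationship class tying each monitoring point to its measurements
arcpy.management.CreateRelationshipClass(
    pts, ts, f"{gdb}\\MonitoringPointHasTimeSeries", "SIMPLE",
    "TimeSeries", "MonitoringPoint", "NONE", "ONE_TO_MANY", "NONE",
    "FeatureID", "FeatureID")
```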

2.08.2.3 Analytical Models in Support of Oil Shale Development

Four analytical models, namely 3D geologic, system dynamic, surface water, and groundwater models, were developed for this project. The centralized geospatial infrastructure served as the groundwork for setting up the frameworks of these analytical models, making it possible to generate model inputs directly from the same database and to couple the different models indirectly through their inputs and outputs. This article focuses on the 3D geologic and system dynamic models; readers interested in the surface water and groundwater models are referred to Zhou et al. (2012, 2015).

2.08.2.3.1 Three-dimensional geologic modeling

3D visualization and volume calculation are essential for in-place natural resource evaluation. A fully attributed 3D geologic model of the Piceance Basin was built to support groundwater and system dynamic modeling. The 3D geologic model consists of, from top to bottom, the Uinta, Upper Green River, Lower Green River, and Wasatch and Mesaverde Formations. The oil shale-bearing Lower Green River strata were separated into alternating layers of oil-rich zones (R-zones) and oil-lean zones (L-zones), following the naming convention of Cashion and Donnell (1972) and Self et al. (2010). There are 16 layers in the Lower Green River Formation; from top to bottom, these are the A-groove, Mahogany, B-groove, R-6, L-5, R-5, L-4, R-4, L-3, R-3, L-2, R-2, L-1, R-1, L-0, and R-0 zones. The input data for the 3D geologic model mainly included the USGS Fischer assay data (Mercier et al., 2009), geologic tops data, and a 10-m DEM.

To ensure that the surface interpolations were reliable, a lengthy process of model quality assurance/quality control (QA/QC) was conducted to correct geometric inconsistencies in the model. Each interpolated surface was singled out, and the data distribution and resulting structural representation were verified. When necessary, additional data points were added to an interpolated surface to fill in missing sections of the original data and maintain consistency in the geometric structure and average layer thickness of the neighborhood. Fig. 4 shows the final top surface after QA/QC for the Mahogany Zone of the Green River Formation; the digitized structure information from the USGS geologic maps is overlain on the surface interpolation to verify consistency in the layers. The QA/QC process was applied to all interpolated surfaces in the model. After all the interpolated surfaces were verified, a full basin-scale model was reconstructed at various grid resolutions (Fig. 5).

Fig. 2 The high-level geospatial infrastructure.


Fig. 3 The ArcCatalog tree of the integrated database, partially showing the structure of the database, such as file geodatabase tables, relationship classes, feature datasets, and rasters. From Zhou W, Minnick MD, Mattson ED, Geza M, and Murray KE (2015) GIS-based geospatial infrastructure of water resource assessment for supporting of oil shale development in Piceance Basin of northwestern Colorado. Computers and Geosciences 77: 44–53.


Fig. 4 Image of the top of Mahogany surface in the Green River Formation colored by elevation reveals the layer structure which is verified via the USGS structural interpretations.

Once this was completed, cross sections could be extracted from the model and exported to a 3D geospatial dataset in the project database. These layers and cross sections can be served out via an ArcGIS service and accessed through ArcExplorer (Fig. 6). The Mining Visualization System (MVS) software by C-Tech was chosen for building the 3D geologic model because of the nature of the data and its compatibility with ArcGIS. MATLAB scripts were written to process the raw assay and geologic tops data from the USGS geospatial database into MVS input files to facilitate advanced visualization and interpolation of the dataset.

As part of the 3D geologic framework output and the input file generation for the system dynamic model, an initial retort-distribution grid was created to generate individual retort cells for detailed data interpolation (Fig. 7). Each grid cell can be input into the 3D geologic framework to create a retort block (Fig. 8). Other datasets can then be interpolated into the 3D retort framework, including Fischer assay resource assessments, water content, fracture distribution, and hydrogeologic parameters (Fig. 9). The intention of generating the individual retort cells was to produce data input files for the system dynamic model; this ties the water usage estimated over time to the specific spatial location of each retort column within the model.
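
The surface QA/QC step described above can be illustrated with a simple stratigraphic-order check. The sketch below is a schematic stand-in for the project's MVS/MATLAB workflow, written in Python with NumPy under the assumption that each interpolated horizon is a 2D elevation grid ordered from top to bottom; it counts crossing cells and clamps any horizon that rises above the one overlying it.

```python
import numpy as np

def crossing_report(surfaces):
    """Count cells violating stratigraphic order, a simple QA/QC check."""
    return [int(np.sum(lower > upper))
            for upper, lower in zip(surfaces[:-1], surfaces[1:])]

def enforce_layer_order(surfaces, min_gap=0.0):
    """Clamp a stack of horizon grids (ordered top to bottom) so that no
    layer rises above the layer overlying it."""
    fixed = [np.asarray(surfaces[0], dtype=float)]
    for s in surfaces[1:]:
        # any cell poking above the horizon overlying it is clamped down
        fixed.append(np.minimum(np.asarray(s, dtype=float), fixed[-1] - min_gap))
    return fixed

# Toy example: three horizons with one deliberate crossing at cell (0, 0)
top = np.full((4, 4), 2000.0)
mid = np.full((4, 4), 1950.0); mid[0, 0] = 2010.0
bot = np.full((4, 4), 1900.0)
print(crossing_report([top, mid, bot]))                       # -> [1, 0]
print(crossing_report(enforce_layer_order([top, mid, bot])))  # -> [0, 0]
```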

2.08.2.3.2 Three-phase energy resource development systems models

In order to evaluate the water balance for in-situ oil shale conversion, a system dynamic model was constructed (Mattson et al., 2012) using the Powersim Studio 9 (version 9.01) software package. Three phases of an in-situ retort were considered: (1) a construction phase, which primarily accounts for water needed for drilling and water produced during dewatering; (2) an operation phase, which includes the production of water from the retorting process; and (3) a remediation phase, which involves water used to remove heat and solutes from the subsurface as well as to return the ground surface to its natural state (Mattson et al., 2012). Throughout these three phases, water is both consumed and produced. Water is consumed by the drilling process, dust control, returning the groundwater to its initial level, and making up water losses during the remedial flushing of the retort zone. Water is produced by the dewatering of the retort zone and during the chemical pyrolysis reaction of the kerogen conversion. The major water consumption occurs during the remediation of the in-situ retorting zone (Mattson et al., 2012).

A large-scale hypothetical in-situ oil shale retort in the Piceance Basin was simulated with the Powersim system dynamic model. The Shell experimental area is in the northwestern part of the Piceance Basin, and the retorting site location was southwest of the Shell demonstration sites #1 and #3, as shown in Fig. 10, with an assumed dimension of 3000 × 3000 ft. At this location, subsurface information from well CO213 was accessible in the geodatabase (see Table 2).


Fig. 5 Output of a basin-wide 3D geologic model post QA/QC with a vertical exaggeration of 10 times.

Fig. 6 A fence diagram in ArcGIS stored in a 3D dataset in the project database, exported from the 3D geologic framework.


Fig. 7 Map of the initial grid used to generate spatially tied retort cells within the 3D geologic framework (north arrow and 10-mile scale bar shown).

Based on this information, it was assumed that oil shale from the A-Groove through the L-0 unit would be retorted; the total volume of the retort is about 360 million cubic meters. The models of the three phases were run independently from one another, and therefore the water production/consumption of the three phases must be added sequentially. Fig. 11 illustrates the cumulative water consumption for all phases of the hypothetical retort, with positive slopes representing water production and negative slopes representing water consumption. As shown in Fig. 11, although some water is consumed during drilling and dust control, water is generally produced in the first half of a retort operation due to dewatering of the retort volume and steam production of residual water during heating, whereas water is consumed in the final remediation phase. Overall, approximately 500 million barrels (about 15.75 billion gallons) of water is consumed (lost) for this hypothetical retort (Fig. 11). However, the retort is calculated to produce 341 million barrels of oil. The resulting water-to-oil ratio of 1.47 is in the range of what the industry has claimed as the expected water use rate. Data generated by the in-situ retort system dynamic model can be imported into the geodatabase for subsequent analysis of the available water resources within the basin.
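
The phase-by-phase bookkeeping behind Fig. 11 can be sketched as a simple stock-and-flow integration. The rates and durations below are illustrative placeholders chosen only so that the totals echo the case study's net consumption of roughly 500 million barrels against 341 million barrels of oil; they are not the calibrated values of the Powersim model.

```python
# A minimal stock-and-flow sketch of the three-phase water balance.
PHASES = [
    # (name, duration in years, net water rate in million barrels per year;
    #  positive = production (dewatering, pyrolysis water), negative = consumption)
    ("construction", 2.0, +40.0),
    ("operation",    4.0, +25.0),
    ("remediation",  4.0, -170.0),  # remedial flushing dominates the balance
]

def water_balance(phases, dt=0.25):
    """Integrate the net flow phase by phase; returns (time, stock) pairs."""
    t, stock, series = 0.0, 0.0, [(0.0, 0.0)]
    for _name, duration, rate in phases:
        for _ in range(int(round(duration / dt))):
            stock += rate * dt
            t += dt
            series.append((t, stock))
    return series

series = water_balance(PHASES)
net = series[-1][1]          # negative value means net consumption
oil_mbbl = 341.0             # oil produced in the hypothetical retort
print(f"net water: {net:.0f} Mbbl; water-to-oil ratio: {abs(net) / oil_mbbl:.2f}")
# -> net water: -500 Mbbl; water-to-oil ratio: 1.47
```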

2.08.2.4 Summary

A GIS-based water resource geospatial infrastructure supporting oil shale development was developed in this project. The geospatial infrastructure serves not only as a repository for managing large volumes of geological, hydrogeological, topographical, water resource, and oil shale data but also as the groundwork for generating input data for different analytical models. The geodatabase within the geospatial infrastructure allows for collaborative regional/basin-wide assessments for future oil shale development based on the same "baseline." This type of collaboration provides an ideal atmosphere for the development of new, generic approaches that utilize new technology and procedures, promoting the best and most widespread use of our enormous data holdings despite their disparate locations and heterogeneous formats.

Fig. 8 Single retort cell or column within the Green River Formation. The dimensions of a cell are 3000 × 3000 × 2300 ft.

Fig. 9 Image of a cell displaying Fischer assay data (oil shale resource, gallons/ton) interpolated into the extracted retort framework.


Fig. 10 The simulated retort location is in the northwestern part of the Piceance Basin.

Table 2 Subsurface information of a retorting cell obtained from the integrated GIS database

Layer name | Average thickness (m) | Volume (10⁶ m³) | Porosity (%) | Hydraulic conductivity (cm/day) | Average oil (GPT) based on CO213 | Water in matrix (GPT) | Oil volume (gal)
Upper GRF | 121.3 | 101.4 | 10 | 36.6 | N/A | N/A | N/A
A Groove | 3.7 | 3.2 | 10 | 36.6 | 6.2 | 1.3 | 44,104,320
Mahogany | 36.3 | 30.6 | 1 | 0.3 | 25.4 | 3.8 | 1,711,756,800
B Groove | 7.3 | 6.1 | 10 | 36.6 | 5.7 | 1.0 | 76,826,880
R6 | 36.0 | 30.3 | 1 | 0.3 | 19.9 | 3.4 | 1,328,683,200
L5 | 24.7 | 20.9 | 10 | 18.3 | 11.7 | 5.4 | 538,068,960
R5 | 55.2 | 46.4 | 15 | 12.2 | 28.9 | 9.0 | 2,957,510,400
L4 | 20.1 | 16.8 | 20 | 61.0 | 21.8 | 7.1 | 806,669,760
R4 | 21.3 | 18.0 | 15 | 12.2 | 33.3 | 4.8 | 1,321,557,120
L3 | 11.3 | 10.8 | 8 | 12.2 | 11.3 | 4.5 | 270,060,960
R3 | 22.9 | 19.2 | 1 | 0.3 | 24.5 | 4.2 | 1,039,584,000
L2 | 7.9 | 7.1 | 5 | 6.1 | 15.3 | 4.8 | 237,725,280
R2 | 24.4 | 20.4 | 1 | 0.3 | 26.9 | 6.4 | 1,211,920,320
L1 | 9.4 | 8.0 | 3 | 6.1 | 5.4 | 10.5 | 94,685,760
R1 | 33.8 | 28.3 | 0.5 | 0.3 | 19.4 | 9.1 | 1,210,560,000
L0 | 14.3 | 12.0 | 3 | 6.1 | 5.5 | 7.8 | 145,860,000
R0 | 43.9 | 36.8 | 0.5 | 0.3 | N/A | N/A | N/A

Source: Zhou W, Minnick MD, Mattson ED, Geza M, and Murray KE (2015) GIS-based geospatial infrastructure of water resource assessment for supporting oil shale development in Piceance Basin of northwestern Colorado. Computers and Geosciences 77: 44–53.

Fig. 11 Cumulative water extracted versus time from the hypothetical simulation (water usage in millions of barrels over 10 years). Water production is shown as a positive water usage value, while water consumption is shown as a negative water usage value.

The components of this geospatial infrastructure, including data frames, geodatabases, and customized tools and analytical models, were designed to be interlinked. The interlinking allows for "synchronized" updating. The final results of this project help decision makers make informed decisions. The procedures, tools, and models developed in this research were designed to be general and are readily adaptable to other, similar study areas.

2.08.3 GIS Application in Assessment of the Vulnerability of the Surficial Aquifer System for the State of Florida, USA

2.08.3.1 Background

There are three main aquifer systems in the State of Florida, namely the Surficial Aquifer System (SAS), the Intermediate Aquifer System (IAS), and the Floridan Aquifer System (FAS) (Fig. 12). While the IAS and the FAS are mostly confined, the SAS comprises unconfined aquifers, including the Sand and Gravel Aquifer and the Biscayne Aquifer. Due to its proximity and connectedness to the land surface, the SAS is highly susceptible to direct infiltration of contaminants from onsite wastewater treatment systems (OWTS) (Arthur et al., 2007). This study focuses only on the vulnerability of the SAS due to OWTS. In the State of Florida, OWTS have been a feasible and economical wastewater treatment option for about 30% of Florida's population (Florida Department of Health, 2014). OWTS release nitrogen-rich effluent, mostly in the form of ammonium and nitrate, negatively impacting human and environmental health. Groundwater contamination from OWTS may reach the SAS and surface water bodies via percolation and subsurface transport of nitrogen.

Fig. 12 Map of aquifer systems and their extent in the State of Florida (Florida Department of Environmental Protection, undated). Source: http://www.dep.state.fl.us/swapp/Aquifer_Pframe.html.

The detrimental impact of excess nitrogen in the environment warrants vulnerability studies that allow the delineation of areas more or less susceptible to contamination from land use practices (Cui et al., 2016). A regional-scale GIS-based nitrogen fate and transport model (GIS-N model) was developed to assess aquifer vulnerability to contamination by examining the fate and transport of ammonium and nitrate from OWTS. The GIS-N model analyzed the fate and transport of nitrogen through the unsaturated zone using a simplified advection–dispersion equation (ADE) incorporated into a GIS framework. Operational inputs considered in this model include wastewater effluent ammonium or nitrate concentration and hydraulic loading rates. The GIS-N model considers two different approaches: a single-step and a two-step modeling approach. The single-step model considers a denitrification process, assuming all the ammonium is converted to nitrate before land application, while the two-step model uses ammonium as an input and considers nitrification followed by denitrification. The two approaches were evaluated for two nitrogen application scenarios: a uniform blanket input of nitrogen to the entire study area and a spatially variable input based on existing OWTS locations. The resulting maps from the different modeling approaches were classified into vulnerability zones based on the natural breaks in the data histogram. It was revealed that groundwater vulnerability from OWTS is sensitive to the depth to water table and to the first-order reaction rates, the parameters controlling, respectively, the time available for and the extent of conversion. Nitrate concentration is highest in areas with a shallow water table depth. The vulnerability maps produced in this study will help planners make informed decisions on the placement of OWTS and on groundwater protection and management (Cui, 2014). In the following sections, the study of the GIS application in the assessment of the vulnerability of the SAS for the State of Florida is presented. This is a synthesis of previously published or unpublished work by a research group at the Colorado School of Mines (Cui, 2014; Cui et al., 2016). The presentation of this project starts with data collection, followed by methods, input parameters, GIS implementation of the models, and results, and ends with a summary.

2.08.3.2 Data Collection

GIS data for this study were acquired from various sources, including the Natural Resources Conservation Service (NRCS) soil survey, the Florida Department of Environmental Protection, the Florida Department of Health, the Florida Fish and Wildlife Conservation Commission, and the Florida Geographic Data Library (FGDL).

Table 3 Summary of datasets and sources

Name | Source | Description | GeoDatabase feature
Gridded Soil Survey Geographic database | U.S. Department of Agriculture (USDA) | Soil data displayed as tables and maps | GeoRaster and tables
Wastewater inventory database | Florida Department of Health | Active OWTS locations | Polygon feature class
Porosity | Rawls et al. (1982) | Porosity values for USDA soil textures | GeoRaster
Coefficients | STUMOD (McCray et al., 2005); Water Environment Research Foundation | Values for coefficients used in equations | GeoRaster
Temperature regime | USDA | Soil temperature regime | GeoRaster
FL land cover | Florida Fish and Wildlife Conservation Commission | Water bodies and wetland | Polygon feature class

Table 3 summarizes the data sources of parameter values and spatial data used in the study. The NRCS provides soil data for the entire State of Florida from the Gridded Soil Survey Geographic (gSSURGO) database in the format of an Environmental Systems Research Institute, Inc. (ESRI) file geodatabase. Attributes used from the NRCS database include soil organic carbon, soil water content at field capacity, density, soil temperature, and soil texture. Locations of OWTS, effluent concentrations, and loading rates were obtained from the Florida Department of Health. Florida land cover data were obtained from the Florida Fish and Wildlife Conservation Commission. Other parameters used in the contaminant removal calculation were obtained from the literature and reports; nitrification, denitrification, and sorption rates were obtained from peer-reviewed literature.

2.08.3.3 Methods for SAS Vulnerability Assessment

The contaminant fate and transport approach was used in determining aquifer vulnerability. The spatial variability of the input parameters was taken into account by implementing the approach on a GIS platform.

2.08.3.3.1 Transport equation

The calculation for contaminant removal in the vadose zone was based on the simplified advection–dispersion equation in N-CALC (Rao et al., 1985; McCray et al., 2010). The contaminant removal equation accounts for contaminant removal through first-order nitrification and denitrification processes and considers operational parameters (effluent concentration, effluent loading rates, porosity, and soil depth) and sorption and reaction parameters for nutrient transformation (linear sorption, nitrification rates, and denitrification rates) (Jury et al., 1987; McCray et al., 2010). The simplification ignores the effects of dispersion and assumes steady-state conditions (Schlosser et al., 2002). The simplified contaminant removal equation is an exponential decay function, which calculates the concentration of ammonium and nitrate as a function of the removal processes, expressed as:

\[ C(Z) = C_0 \exp\left(-\frac{R\,K_r}{v_z}\,Z\right) \tag{1a} \]

\[ C(Z) = C_0 \exp(-R\,K_r\,T) \tag{1b} \]

where C₀ is the initial concentration of ammonium or nitrate (mg L⁻¹), Z is the soil depth (cm), v_z is the vertical water velocity evaluated as the hydraulic loading rate divided by porosity (cm day⁻¹), K_r is the first-order reaction rate (nitrification for NH4+ and denitrification for NO3-), and R is the retardation factor. The equation includes reaction rates, retardation, applied effluent concentration, and the travel time for attenuation of contaminants. Note that Z/v_z in Eq. (1a) is replaced by the travel time (T) in Eq. (1b). The velocity was estimated as the hydraulic loading rate divided by the porosity, assuming steady state (Cui et al., 2016).
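A minimal Python sketch of Eq. (1a) follows; the example values (60 mg L⁻¹ input, an HLR of 2 cm day⁻¹, the sand porosity of 0.437, and a denitrification rate of 0.27 day⁻¹) are the representative values given later in this section, and the 100-cm water table depth is illustrative.

```python
import math

def remaining_concentration(c0, z_cm, hlr_cm_day, porosity, kr_per_day, r=1.0):
    """Eq. (1a): C(Z) = C0 * exp(-(R * Kr / vz) * Z), with vz = HLR / porosity."""
    vz = hlr_cm_day / porosity  # vertical water velocity (cm/day)
    return c0 * math.exp(-r * kr_per_day * z_cm / vz)

# Nitrate (R = 1) applied at 60 mg/L, water table 100 cm below the surface.
print(remaining_concentration(60.0, 100.0, 2.0, 0.437, 0.27))  # ~0.16 mg/L
```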

2.08.3.3.2 Fate component: Effects of environmental factors on N transformation

Nitrogen transformation is microbially facilitated and, thus, affected by environmental factors that influence soil microbial activity. The GIS-N model considers the effect of environmental factors on biological reaction rates by adjusting the maximum reaction rate, which occurs at optimum environmental conditions, for the effects of soil temperature and soil moisture. Those factors, defined as response functions, are combined multiplicatively and reflect the nonoptimum conditions controlling the first-order biological reaction rates. The first-order reaction rate, K_r, is defined as the maximum reaction rate after adjustment for nonoptimal biological activity. K_rmax is adjusted for the effects of soil temperature, soil moisture, and soil organic carbon content. The effect of those processes on the first-order reaction rate is represented by response functions, which are empirical factors accounting for nonoptimal conditions, expressed as:

\[ K_r = K_{r\max}\, f_t\, f_{sw}\, f_z \tag{2} \]


where K_rmax is the maximum first-order reaction rate (day⁻¹), f_t is the soil temperature response function, f_sw is the soil moisture response function, and f_z is the soil organic carbon response function.

Soil temperature regulates organic carbon decomposition and nitrogen transformation processes. The temperature response function accounts for the influence of a temperature deviation from the optimum on the biological process rate, with a maximum value at the optimum temperature (Youssef, 2003). The temperature response function used for both nitrification and denitrification is based on the Van't Hoff equation:

\[ f_t = \exp\left[-0.5\,b\,T_{opt} + b\,T\left(1 - \frac{0.5\,T}{T_{opt}}\right)\right] \tag{3} \]

where T is the soil temperature (°C), T_opt is the optimum temperature (°C) at which f_t equals unity, and b is an empirical coefficient (Youssef, 2003). The Van't Hoff equation describes the temperature effect on the nitrification and denitrification processes and accounts for the temperature sensitivity. The temperature response function results in values between 0 and 1, with the value of 1 at the optimum temperature and less than 1 at soil temperatures below and above the optimum (Youssef, 2003).

Soil moisture content is another sensitive parameter in the nitrification and denitrification processes. Nitrification rates decrease significantly when the soil moisture content exceeds an optimum amount and may cease at saturation. On the other hand, denitrification conditions are optimal as the relative soil moisture content reaches its maximum at complete saturation (Youssef, 2003; Barton et al., 1999). The response function for soil moisture is based on relative saturation rates at field capacity. The soil moisture response for denitrification, f_sw,dn, is expressed as:

\[ f_{sw,dn} = \begin{cases} 0 & s < s_{dn} \\ \left(\dfrac{s - s_{dn}}{1 - s_{dn}}\right)^{e_1} & s \geq s_{dn} \end{cases} \tag{4} \]

where s is the relative saturation, defined as the ratio of the actual moisture content to the moisture content at saturation and ranging from 0 to 1, s_dn is a threshold relative saturation below which denitrification does not occur, and e₁ is an empirical exponent. Because of the lack of actual moisture data, the field capacity was used in place of the soil moisture content. The soil moisture response function for nitrification is expressed as:

\[ f_{sw} = \begin{cases} f_s + (1 - f_s)\left(\dfrac{1 - s}{1 - s_h}\right)^{e_2} & s_h < s \leq 1 \\ 1 & s_l \leq s \leq s_h \\ f_{wp} + (1 - f_{wp})\left(\dfrac{s - s_{wp}}{s_l - s_{wp}}\right)^{e_2} & s_{wp} \leq s < s_l \end{cases} \tag{5} \]

where s is the relative saturation (at field capacity), s_h and s_l are the upper and lower limits of the relative saturation range within which nitrification proceeds at the optimum rate, s_wp is the relative saturation at the permanent wilting point, f_s and f_wp are the values of the soil water function at saturation and the permanent wilting point, respectively, and e₂ is an empirical exponent (McCray et al., 2010; Youssef, 2003).

The last function controlling the first-order rate process equation is the organic carbon response function, f_z. In denitrification, microbes use soil organic carbon as an electron donor to obtain energy through oxidation (Rivett et al., 2008). The organic carbon content in soil varies with depth. The rate adjustment factor for the organic carbon response function (f_z) is 1 when organic carbon is not limiting. In this study, organic carbon is assumed to be nonlimiting due to the organic matter continuously supplied by the applied wastewater effluent (McCray et al., 2010). Furthermore, for nitrification, carbon dioxide gas is the main energy source for the microorganisms.
Soil gas is known to have high concentrations of CO2 (Jury and Horton, 2004); thus, it is assumed that sufficient carbon will always be present for nitrification, provided that gas diffusion is not inhibited by high soil water contents. An additional process, retardation, represented by R in Eq. (6), is considered for the positively charged ammonium adsorbing to negatively charged soils. Nitrate, on the other hand, is considered not to sorb and, thus, has a retardation factor of 1. Retardation is defined as:

\[ R = 1 + \frac{K_d\,\rho}{\theta} \tag{6} \]

where ρ (g cm⁻³) is the bulk density of the soil, K_d (L kg⁻¹) is the distribution coefficient, and θ (%) is the soil moisture content. K_d, the distribution coefficient, is dependent on soil type and independent of water content (McCray et al., 2005).
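The response-function machinery of Eqs. (2)-(6) is compact enough to sketch directly. In the sketch below, the moisture-function defaults are the Table 5 coefficient values; the value of the empirical temperature coefficient b is illustrative only, since the chapter does not list it.

```python
import math

def f_temperature(t_c, t_opt=25.0, b=0.1):
    """Eq. (3), Van't Hoff form; equals 1 at t_opt. The value of b here is
    illustrative only."""
    return math.exp(-0.5 * b * t_opt + b * t_c * (1.0 - 0.5 * t_c / t_opt))

def f_moisture_denitrification(s, s_dn=0.0, e1=1.0):
    """Eq. (4): denitrification response to relative saturation s in [0, 1]."""
    return 0.0 if s < s_dn else ((s - s_dn) / (1.0 - s_dn)) ** e1

def f_moisture_nitrification(s, s_l=0.5, s_h=0.85, s_wp=0.0,
                             f_s=0.0, f_wp=0.0, e2=1.0):
    """Eq. (5): nitrification response; optimal between s_l and s_h
    (defaults are the Table 5 coefficient values)."""
    if s > s_h:
        return f_s + (1.0 - f_s) * ((1.0 - s) / (1.0 - s_h)) ** e2
    if s >= s_l:
        return 1.0
    return f_wp + (1.0 - f_wp) * ((s - s_wp) / (s_l - s_wp)) ** e2

def adjusted_rate(kr_max, t_c, s, nitrification=True):
    """Eq. (2) with fz = 1 (organic carbon assumed nonlimiting)."""
    f_sw = (f_moisture_nitrification(s) if nitrification
            else f_moisture_denitrification(s))
    return kr_max * f_temperature(t_c) * f_sw

def retardation(kd_l_per_kg, bulk_density_g_cm3, theta):
    """Eq. (6); Kd in L/kg equals cm3/g, so Kd*rho/theta is dimensionless
    when theta is the volumetric moisture fraction."""
    return 1.0 + kd_l_per_kg * bulk_density_g_cm3 / theta

# Example: maximum nitrification rate of 3.25 1/day at 20 C and s = 0.6.
print(adjusted_rate(3.25, 20.0, 0.6, nitrification=True))
```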

2.08.3.4 Model Input Parameters

The model input parameters that describe the nitrification and denitrification processes include the first-order reaction rate, sorption, and operational parameters. The operational parameters considered in this nitrogen removal model include effluent concentration, hydraulic loading rates, porosity, depth to water table, soil moisture, and soil temperature. The following sections will provide a brief description of each of these parameters. Refer to Cui (2014) and Cui et al. (2016) for detailed information on how the values of these parameters were determined.


Onsite wastewater treatment system effluent releases organic nitrogen that is readily decomposed into ammonium. The total nitrogen concentration in conventional OWTS effluent is assumed to be in the form of ammonium nitrogen with a median concentration of 58 mg L⁻¹; thus, a value of 60 mg L⁻¹ of NH4+-N is used as the ammonium input concentration for the nitrification process in the single-step and two-step models (McCray et al., 2005). The Florida Department of Health maintains a statewide inventory of onsite sewage treatment and disposal systems for the State of Florida. The 2009 wastewater inventory database was used in this study. For parcels with an unknown wastewater treatment method, a logistic regression model was used to estimate the probability of the parcel being on an active OWTS based on parcels with a known wastewater treatment method (EarthSTEPS et al., 2009).

While nitrate does not readily sorb onto soil, ammonium exhibits sorption, which slows the transport of the contaminant and allows its transformation to nitrate via nitrification. Sorption is an important process controlling ammonium transformation. Ammonium is adsorbed during the wetting pulse of effluent application and is held onto the soil for nitrification when the soil dries (Ramesh Reddy and Delaune, 2008). The cation exchange process of ammonium sorption is assumed to be linear, in equilibrium, and reversible (McCray et al., 2010).

The seepage velocity depends on the effluent hydraulic loading rate and the porosity for the corresponding USDA soil textures. A hydraulic loading rate (HLR) of 2 cm day⁻¹ for subsurface trenches was used as a representative value for drain field discharge (McCray et al., 2010). The porosity values from Rawls et al. (1982) were used for the seepage velocity calculation and correspond to the USDA soil textures (Table 4). Soil texture refers to the relative proportion of particles of various sizes in a given soil and affects the percolation rate of a soil. Coarser soil textures retain less water than fine-grained soils, allowing the contaminated water to leach into the subsurface faster (Witheetrirong et al., 2011).

The depth to water table controls the depth available for contaminant transformation from microbial activity in the vadose zone, as described by the travel time (Z/v_z) in Eq. (1a). The USDA annual minimum water table depth field used for the soil depth input is measured as the shallowest depth to a wet soil layer (water table) at any time during the year, expressed in centimeters from the soil surface, for components whose composition in the map unit is equal to or exceeds 15%. The depth to water table layer uses the recorded representative value, with a range of 0–203 cm.

The maximum reaction rate (K_rmax) is adjusted by factors calculated as response functions (f_t, f_sw, and f_z) to represent nonoptimal conditions. The first-order maximum nitrification and denitrification rate coefficients were obtained from the cumulative frequency diagrams (CFDs) of reaction rates developed by McCray et al. (2005) (Fig. 13). The CFDs were created based on a literature review of nitrification and denitrification rates observed for natural soils under both saturated and unsaturated conditions (Heatwole and McCray, 2007; Anderson and Otis, 2000). The nitrogen transformation rate increases with increasing soil temperature until the optimal value of 25 °C and declines with additional increases in temperature (McCray et al., 2010).
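Given the fitted CFD trends shown in Fig. 13, a rate coefficient at any chosen cumulative frequency can be read off by inverting the regression, as sketched below. Which percentiles the study used is not stated in the chapter; we note only that the median of the nitrification CFD reproduces the K_rmax of 3.25 day⁻¹ quoted below, and roughly the 75th percentile of the denitrification CFD reproduces 0.27 day⁻¹.

```python
import math

def rate_at_frequency(y, slope, intercept):
    """Invert a CFD fit of the form y = slope * ln(rate) + intercept."""
    return math.exp((y - intercept) / slope)

# Nitrification fit (Fig. 13): y = 0.107 ln(x) + 0.3736
print(rate_at_frequency(0.50, 0.107, 0.3736))    # ~3.25 1/day
# Denitrification fit (Fig. 13): y = 0.1348 ln(x) + 0.9288
print(rate_at_frequency(0.75, 0.1348, 0.9288))   # ~0.27 1/day
```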
The temperature function for both nitrification and denitrification is represented by Eq. (3). The soil temperature (T) for Florida is determined from the USDA soil annual average temperature map for the contiguous United States. The USDA assigns soil temperatures based on interpolation between NRCS soil temperature stations or through extrapolation.

The soil organic carbon content is assumed to be nonlimiting for the nitrogen transformation process due to the introduction of organic carbon with OWTS effluent. The soil organic carbon response function is therefore set to 1 to represent the presence of sufficient carbon.

Soil moisture content affects the diffusivity of gases into the soil, controlling oxygen availability to nitrifying microbes. The soil moisture function for nitrification represents the optimal relationship between substrate and available oxygen levels (McCray et al., 2010). The soil moisture function for the transformation processes is represented by Eqs. (4) and (5). The relative saturation is calculated as the soil moisture at field capacity divided by porosity; field capacity was used as the soil moisture value due to limited soil moisture data. Additional parameter values for the coefficients present in the equations are listed in Table 5. The soil moisture function values for the nitrification and denitrification reaction rate adjustment calculations were computed in GIS.

Table 4 Porosity classified by soil texture

USDA soil texture | Sample size | Total porosity/saturation, θs (cm³/cm³)
Sand | 762 | 0.437
Loamy sand | 338 | 0.437
Sandy loam | 666 | 0.453
Loam | 383 | 0.463
Silt loam | 1206 | 0.501
Sandy clay loam | 498 | 0.398
Clay loam | 366 | 0.464
Silty clay loam | 689 | 0.471
Sandy clay | 45 | 0.430
Silty clay | 127 | 0.479
Clay | 291 | 0.475

Source: Rawls WJ, Brakensiek DL, and Saxton KE (1982) Estimation of soil water properties. Transaction of American Society of Agriculture Engineering 25(5): 1316–1320.


Fig. 13 Cumulative frequency diagrams (CFDs) for the first-order nitrification and denitrification rates. The fitted trends are cumulative frequency = 0.107 ln(x) + 0.3736 (R² = 0.9838) for the nitrification rate and cumulative frequency = 0.1348 ln(x) + 0.9288 (R² = 0.9835) for the denitrification rate (rates in day⁻¹). McCray JE, Kirkland SL, Siegrist RL, and Thyne GD (2005) Model parameters for simulating fate and transport of on-site wastewater nutrients. Groundwater 43(4): 628–639.

Table 5 Coefficient values used for soil moisture function

Parameter | Value
s_dn | 0
e | 1.4
e1 | 1
e2 | 1
s_h | 0.85
s_l | 0.5
s_wp | 0
f_s | 0
f_wp | 0

The first-order reaction rate (K_r) for the nitrification or denitrification process is the maximum reaction rate adjusted by the response functions for nonoptimal biological activity; the K_r values combine the maximum reaction rate with the soil temperature, soil moisture, and soil organic carbon response functions. After adjustment, the first-order reaction rate ranges from 0.36 to 3.25 day⁻¹ for nitrification and from 0.0046 to 0.27 day⁻¹ for denitrification, with K_rmax values of 3.25 day⁻¹ and 0.27 day⁻¹, respectively.

2.08.3.5 GIS Model and Implementation

The Florida aquifer vulnerability model was implemented on a GIS platform. GIS allows the integration of spatial data in heterogeneous formats to represent spatially variable events by relating a series of data layers (Bonham-Carter, 1996). In this study, ArcGIS 10.1 was used to process and manage the spatial data through the input of created data layers. Each layer represents a variable in the contaminant removal equation, and the layers were algebraically combined based on the contaminant fate and transport equation. Performing the calculations in GIS allows the display of spatially variable data. The spatially variable parameters used in the calculation were incorporated into the GIS-N model to produce zonation maps illustrating Florida's surficial aquifer vulnerability based on the concentrations of ammonium and nitrate reaching the water table. Remaining-concentration maps were produced for four modeled scenarios: single-step and two-step models with a uniform blanket application, and single-step and two-step models with the existing OWTS application.

2.08.3.5.1 Single-step process model with blanket application

The single-step GIS-N model simulates the nitrification and denitrification processes separately via a raster approach through the combination of layers based on the simplified advection–dispersion Eqs. (1a) and (1b). Both processes were calculated with the depth to water table value set as the soil depth (Z) input, and the remaining concentration output from the contaminant fate and transport equation was determined at the water table depth. For the single-step nitrification model, a uniform blanket application of 60 mg L⁻¹ of NH4-N was applied to the entire State of Florida; the model uses this input concentration in the contaminant transport equation and produces the remaining ammonium concentration. Similarly, the denitrification model uses a uniform input concentration of 60 mg L⁻¹ of NO3-N to calculate remaining nitrate concentrations for a sensitivity analysis. The initial input concentration of 60 mg L⁻¹ of NO3-N for the single-step denitrification model assumes that all of the ammonium is converted to nitrate before application to the soil surface.
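In raster form, the single-step calculation is one map-algebra expression. The numpy sketch below is a stand-in for the ArcGIS raster calculator; the small arrays are hypothetical cells, not the statewide layers.

```python
import numpy as np

# Hypothetical per-cell rasters (the real layers are statewide grids).
depth_cm = np.array([[30.0, 100.0], [203.0, 5.0]])   # depth to water table (Z)
kr = np.array([[0.27, 0.20], [0.05, 0.27]])          # adjusted rate, 1/day
porosity = np.array([[0.437, 0.453], [0.501, 0.437]])

HLR = 2.0   # hydraulic loading rate, cm/day
C0 = 60.0   # blanket NO3-N input, mg/L (single-step model, R = 1)

vz = HLR / porosity                            # vertical velocity raster
remaining = C0 * np.exp(-kr * depth_cm / vz)   # Eq. (1a) applied cell by cell
print(remaining.round(2))
```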

2.08.3.5.2 Two-step model with blanket application

The two-step GIS-N model calculates the nitrification and denitrification processes as dependent steps. The two-step model assumes the nitrification and denitrification processes occur in a stepwise manner and not simultaneously. The first step simulates the nitrification process with an input concentration of 60 mg L⁻¹ of NH4-N, with the soil depth equivalent to the depth to water table layer capped at a maximum of 31 cm, since at a depth of 31 cm below the soil treatment unit the nitrification process is usually completed (Fischer, 1999; Beach, 2001). Any depth to water table distance remaining beyond 31 cm was used as the depth in the denitrification process. The input concentration for the denitrification process is the concentration of nitrate converted from the ammonium via the nitrification process in step one. Step two is the removal of nitrate through denitrification, providing final results of the remaining nitrate concentration at the water table depth. For the blanket application approach, a uniform input concentration of contaminant is applied to the entire State.
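A sketch of the two-step chaining described above, under two simplifying assumptions flagged in the comments: all nitrified ammonium is taken to appear as nitrate, and retardation is left at R = 1 for brevity.

```python
import numpy as np

def two_step_remaining_no3(depth_cm, kr_nit, kr_denit, vz,
                           c0_nh4=60.0, cap_cm=31.0):
    """Two-step model sketch: nitrify over the first min(Z, 31) cm, feed the
    converted nitrate into denitrification over the remaining depth."""
    z_nit = np.minimum(depth_cm, cap_cm)
    nh4_left = c0_nh4 * np.exp(-kr_nit * z_nit / vz)   # R = 1 for brevity
    no3_in = c0_nh4 - nh4_left   # assumes nitrified NH4 all becomes NO3
    z_denit = np.maximum(depth_cm - cap_cm, 0.0)
    return no3_in * np.exp(-kr_denit * z_denit / vz)

# One cell: water table at 100 cm, vz = 2 / 0.437 cm/day.
print(two_step_remaining_no3(np.array(100.0), 3.25, 0.27, 2.0 / 0.437))
```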

2.08.3.5.3 Existing OWTS application for single-step and two-step models

The OWTS model calculates nitrogen removal with the single-step and two-step approaches based on effluent input at the existing locations of active OWTS in a point feature approach. An initial contaminant concentration was applied to the areas influenced by OWTS effluent discharge. Parameters from the developed raster layers were converted to point feature classes for the calculation. The calculated remaining nitrate concentrations from the single-step and two-step modeling methods provided information on areas currently affected by OWTS. The point feature results were then interpolated with the kriging method described below to provide the probability of nitrate exceeding a given threshold and the areal extent of contaminant influence from OWTS effluent loading.

Kriging is a stochastic interpolation method used to predict values at unmeasured locations by weighting the surrounding measured values based on autocorrelation. Kriging associates a probability with the predictions and also assesses the errors. Predictions are computed by assigning weights based on the distances between measured points and prediction locations and on the spatial arrangement among the measured points (ESRI, 2003). Points that are closer together are assumed to be more similar than points farther apart. The observed trend, which models autocorrelation as a function of distance, can be described by different kriging models. Indicator kriging predicts the probability of a point exceeding a given threshold through the process of ordinary kriging; here it determines the probability of the remaining contaminant level at the water table reaching the set threshold. The model is based on the equation:

\[ I(s) = m + \varepsilon(s) \tag{7} \]

where I(s) is a binary variable, m is the unknown mean constant, and ε(s) is the autocorrelated error. Continuous data points are converted to binary values (0 or 1) based on the threshold, with 0 if the value is below the threshold and 1 if it is above. The indicator kriging method was applied to the existing OWTS application for the single-step and two-step models. Different thresholds were used for the single-step and two-step models based on the distribution of the remaining nitrate concentration determined from the geometric interval.
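The indicator transform plus ordinary kriging can be sketched with the PyKrige library (one of several Python packages exposing ordinary kriging; its use here is our choice, not the study's). The spherical variogram model is likewise an illustrative assumption.

```python
import numpy as np
from pykrige.ok import OrdinaryKriging  # assumes the PyKrige package

def exceedance_probability(x, y, conc, threshold, gridx, gridy):
    """Indicator kriging sketch: krige 0/1 indicators of threshold
    exceedance to obtain a probability surface (Eq. 7)."""
    indicator = (conc > threshold).astype(float)  # I(s)
    ok = OrdinaryKriging(x, y, indicator, variogram_model="spherical")
    prob, _variance = ok.execute("grid", gridx, gridy)
    # Kriged indicators can stray slightly outside [0, 1]; clip them.
    return np.clip(prob, 0.0, 1.0)
```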

2.08.3.6 Results

Florida surficial aquifer vulnerability maps were produced from the GIS-N single-step and two-step models with uniform blanket application, and existing OWTS application for the single-step and two-step models. The models indicated the likelihood of areas susceptible to nitrate contamination based on remaining nitrate concentrations calculated from the contaminant fate and transport equation. Fig. 14 shows the Florida surficial aquifer vulnerability map based on the two-step OWTS nitrogen removal model with vulnerability classification based on the natural break in the predicted probability of exceedance. Among the four nitrogen removal models, each has different advantages and limitations. A comparison of the four models provided in Table 6 will assist in determining the optimal model to utilize in reducing the human and environmental impacts of land use decisions.

2.08.3.7 Summary

Florida's SAS is vulnerable to contamination from anthropogenic sources stemming from land use practices. In Florida, OWTS contribute to nitrogen loading into the vadose zone and the aquifer system. This study modeled the fate and transport of nitrogen in the vadose zone based on a simplified groundwater flow equation implemented on a GIS platform. The resulting aquifer vulnerability maps, produced with spatially variable soil data, will facilitate land management efforts to protect water resources for better human and environmental health. The key findings of the study are listed below:

1. The depth to water table in Florida is generally shallow, ranging from 0 to 203 cm, with 24.3% of the area at 5 cm or less. The most vulnerable areas correlate with shallow depth to water table measurements.
2. Zones within the Sand and Gravel Aquifer are less vulnerable. The lower vulnerability is attributed to the deeper depth to water table and correlates with known occurrences of silt and clay confining lenses.
3. Streams near OWTS are of concern due to the discharge of groundwater as baseflow. Groundwater from the SAS moves along quick and short flow paths, which can prevent any additional denitrification in the groundwater.

2.08.4 GIS Application in Analyzing the Offshore Placer Gold at Nome, Alaska, USA

2.08.4.1 Background

Nome is located on the southern coastline of the Seward Peninsula, on the northern coast of Norton Sound, in the west of the State of Alaska (Fig. 15).

Fig. 14 Florida surficial aquifer vulnerability map based on the two-step OWTS nitrogen removal model, with vulnerability classification based on the natural breaks in the predicted probability of exceedance: less vulnerable (0–0.30), vulnerable (0.30–0.71), and more vulnerable (0.71–1); water bodies/wetlands masked.

Table 6 Advantages and limitations of the different nitrogen removal models

Single-step blanket application
  Advantages: provides sensitivity analysis; most simple, with the least assumptions and uncertainty; considers vulnerability based on soil conditions for the entire State.
  Limitations: assumes 60 mg/L NO3-N as the initial contaminant concentration input; not representative of current aquifer vulnerability conditions from OWTS effluent; only considers the denitrification process; bias at shallow water table depth.

Two-step blanket application
  Advantages: calculates vulnerability based on soil conditions of the entire State; considers both the nitrification and denitrification processes.
  Limitations: not representative of current aquifer vulnerability conditions; nitrification and denitrification not modeled as simultaneous reactions.

Single-step existing OWTS application
  Advantages: considers aquifer vulnerability from the current locations of OWTS.
  Limitations: assumes 60 mg/L NO3-N as the initial contaminant concentration input; estimate of vulnerability based on kriging interpolation not representative of the radial extent of OWTS effluent; only considers the denitrification process; bias at shallow water table depth.

Two-step existing OWTS application
  Advantages: considers aquifer vulnerability from existing OWTS effluent discharge; considers both the nitrification and denitrification processes.
  Limitations: nitrification and denitrification not modeled as simultaneous reactions; estimate of vulnerability based on kriging interpolation not representative of the radial extent of OWTS effluent.

Fig. 15 Location of the study site at Nome, Alaska. The individual blocks in the study site (Tomcod, Silver, Red, Pink, King, Herring, Coho, Humpy, Halibut, and others) are for leasing purposes only; there are no geologic or bathymetric influences in dividing these blocks.


There is abundant placer gold offshore at Nome (Koschmann and Bergendahl, 1968; Garnett, 2000) because fluvial and glacial processes transported gold from gold-enriched bedrock in the uplands into the marine environment, where it was further concentrated by wave and current action. The Nome offshore area was studied extensively due to the extent and richness of the placer gold resources, and the geological, geophysical, and geochemical characteristics of the offshore gold deposits are well documented in the literature. The USGS and the United States Bureau of Mines (USBM) summarized much of the geology of the area (Nelson and Hopkins, 1972; Tagg and Greene, 1973; Bronston, 1990). There are 22 metasedimentary, metavolcanic, and metaplutonic bedrock units in the area. The Nome Group is a series of four lithostratigraphic units that are locally deformed by low-angle thrust faults and consists of the following subunits (Bundtzen et al., 1994): (1) a basal, complexly deformed quartz-rich pelitic schist; (2) mafic and pelitic schists and marble; (3) a mafic-dominated schist assemblage; and (4) dirty marble.

Nome is one of the most recently active areas of marine and beach placer mining in the State of Alaska (Koschmann and Bergendahl, 1968; Garnett, 2000). From 1897 to 1962, the Nome area produced about five million ounces of gold (Koschmann and Bergendahl, 1968). Because of the huge amount of data, in a variety of forms, accumulated from placer gold exploration and production over the past century, the analysis and management of these data for future development of the Nome offshore gold resource is an enormous task. GIS technology is a tool well suited to meet this challenge. A GIS-based approach to data management provides an in-depth understanding of the Nome offshore gold deposit, which, in turn, will greatly assist the development of the offshore gold resource in that area.

In the following sections, a study of the GIS application to placer gold resource estimation in Nome, Alaska, is presented. This is a synthesis of previously published or unpublished works by a research group at the University of Alaska Fairbanks (Huang et al., 2001; Chen et al., 2003, 2005; Li et al., 2005; Luo et al., 2005; Zhou et al., 2007; Zhou, 2009). The leading author of this article was part of that research team. The presentation of this case study starts with data collection and geodatabase development, followed by methodology and resource estimation, and ends with a summary.

2.08.4.2 Data Collection and Geodatabase Development

Data sources for this study include the private mining and mineral exploration sector, published literature, unpublished reports, maps and open-file reports from government agencies, documents from recording offices, and information from the Internet. During the summers of 1986 and 1987, WestGold Exploration Mining Company, Limited Partnership (WestGold) carried out 3400 line-km of high-resolution seismic surveys of the lease area. The seismic data were interpreted to provide facies interfaces and thicknesses, allowing faulting to be identified and profiles to be drawn. Simultaneous side scan sonar surveys, with 3-mm penetration, were used to map sediment type on the seafloor. From 1987 to 1989, WestGold completed 2530 holes and collected 57 bulk samples. Each hole was drilled in 1-m increments; the sediment from each 1-m interval was collected and stored, a brief sediment description was recorded, and the gold content was assayed (Bronston, 1989).

Data files from 3468 drill holes in the offshore area at Nome were reformatted and compiled, and these files were the principal sources of information for this project (Huang et al., 2001). Most drill hole logs record lithology, gold concentration value, and penetration blow count. Blow count data, the number of blows needed to drive each barrel through a sample length of 30 cm, provide sediment hardness information. A geologic key describes the lithologic types intercepted. Gold concentration values are tabulated in oz/m³ and oz/yd³, along with "reclassified" and "intensity" values. The reclassified value is the relative gold concentration as compared with a given reference. The intensity value recasts the reclassified gold values (from 0 to 45) as nine even intervals, with an intensity of nine signifying the highest gold concentration. The research team at UAF then recompiled and integrated the information to study the geologic characteristics, geochemical and geophysical signatures, borehole data, economic considerations, oceanographic factors, submarine topography, and potential environmental impacts (Zhou et al., 2007; Zhou, 2009).

In this project, GIS was applied to analyze the offshore marine placer gold deposits at Nome, Alaska. Two geodatabases, namely the Integrated Geodatabase (IG) and the Regularized 2.5D Geodatabase (R2.5DG), were created in this study. The IG served as a data container to manage various geological data, such as borehole, bedrock geology, surficial geology, and geochemical data. The R2.5DG was generated based on the IG and was used for the gold resource estimate. The two geodatabases are linked and related (cross-referenced) to each other through common attributes. Other features of this GIS infrastructure include data updating, query, analysis, visualization, and volume calculation. The gold resource at various cutoff grades can be estimated interactively. A geostatistical study was carried out to optimize the resource estimation approach. Fig. 16 shows the flowchart of the conceptual GIS architecture of this project, and Table 7 lists the feature classes stored in the IG.

The IG is capable of displaying 3D surfaces using ArcGIS extensions such as 3D Analyst, Geostatistical Analyst, and Spatial Analyst. However, it does not have the capability for solid 3D analyses and cannot yield information for volume calculation and resource estimation. In order to perform resource estimation, the Regularized 2.5D Geodatabase (a quasi-3D geodatabase) was created by subdividing the study area into cells.
The R2.5DG is able to define ore body boundaries, perform grade extrapolation, and estimate the resource over any given spatial domain. The study area is approximately 40 km², containing 3468 drill holes distributed irregularly and spaced 50 to 120 m apart. The study area was divided into 10 × 10 m grid cells to develop the regularized geodatabase, forming 404,319 spatial records, one record for each cell. The finer the grid, the more precise the result; however, finer grids demand more computer storage and take longer to perform ore resource estimation. For the purposes of this project, a 10 × 10 m grid is sufficiently precise and yet time efficient for ore resource estimation (Zhou et al., 2007).
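The record count follows directly from the study area and the cell size, as the short check below shows.

```python
# Cell-count arithmetic behind the R2.5DG (values from the text).
study_area_km2 = 40.4319
cell_size_m = 10.0
cells = study_area_km2 * 1e6 / (cell_size_m ** 2)
print(int(cells))   # -> 404319 spatial records, one per 10 x 10 m cell
```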

Fig. 16 Conceptual GIS architecture of the project: original drill hole data feed the Integrated Geodatabase (managed through ArcMap, ArcCatalog, and ArcScene), from which ore body boundaries and grade predictions populate the Regularized 2.5D Geodatabase (built with VBA, VC, MapObjects, and ArcObjects); information query and statistical analysis, advanced query/statistics/analysis, and Internet distribution (via ArcIMS, ArcObjects, and VBA) are supported downstream. Zhou W, Chen G, Li H, Luo H, and Huang SL (2007) GIS application in mineral resource analysis: A case study of offshore marine placer gold at Nome, Alaska. Computers and Geosciences 33: 773–788.

Table 7 List of feature classes in the integrated geodatabase

Feature class | Feature type | Description
DH_location | Table | Drill hole location info
DH_segmentinfo | Table | Borehole segment attributes
DH_sliceinfo | Table | Layered ore body information
GeoDH_sliceinfo | Point | Layered ore body information
Lith_slice_Poly | Polygon | Layered lithology
OnshoreGeo | Polygon | Onshore geology
OffshoreGeo | Polygon | Ocean floor geology
Structure | Polygon | Offshore sediment structure elements
Permitblk | Polygon | Exploration permit blocks
Rivers | Polygon | Rivers, town, and roads of Nome
Studyarea | Polygon | Study area boundary

Source: Zhou W, Chen G, Li H, Luo H, and Huang SL (2007) GIS application in mineral resource analysis: A case study of offshore marine placer gold at Nome, Alaska. Computers and Geosciences 33: 773–788.

2.08.5 Methods for Offshore Placer Gold Resource Estimation

Any in-place resource calculation must deal with estimating two interrelated items: the grade and the associated volume. The grade could be the economic cutoff grade or the average grade within a prespecified volume; the volume of ore can be estimated based on the ore body boundary. Five steps were used for the in-place placer gold estimation.

The first step is to determine the ore body thickness and average gold grade for every borehole. Calculating the average gold grade for each borehole at various cutoff grades is a time-consuming task: since the ore body boundaries differ at various cutoff grades, the associated thicknesses and average grades also differ. The calculations are performed repetitively in MS Access using SQL, based on the segment records stored in the borehole segment information table in the Integrated Geodatabase. The calculated average grades and thicknesses of the various layers form the layered ore body information table.

The second step is to create a point layer and grid coverage and add them to the 2.5D Geodatabase. The point layer stores the layered ore body grades and thicknesses. In this layer, the entire study area is divided into 10 × 10 m grid cells, and a point feature object at the center of each cell represents that cell, with all of its grades and thicknesses stored in the point object. The entire study area is approximately 40.4319 km², so the Regularized 2.5D Geodatabase contains 404,319 spatial point records, each representing one 10 × 10 m grid cell.

The third step is to determine the thickness for each cell created during the second step. The attribute data in the GeoDH_sliceinfo layer are utilized to build the ore body boundaries. To determine the ore body thickness of each regularized cell, two ore body boundaries are generated based on the levels of the ore body's beginning depth and ending depth, respectively. The ore body boundaries are interpolated using the Natural Neighbor method, which estimates the value of a cell using weighted values of the input data points that are its natural neighbors, determined by creating a triangulation of the input points. The Natural Neighbor tool in ArcGIS can efficiently handle large numbers of input points, whereas other interpolators may have difficulty with large point datasets (ESRI, 2004b).


The next step is to interpolate the gold content for each regularized cell. Five interpolation methods were investigated and compared, and IDW and IK were selected for interpolating the gold concentration contour maps (Li et al., 2005). The geostatistical analysis is described in the next section.

The last step is to estimate the gold resource within the study area. After the thickness (T_i) and average grade (G_i) of each cell are obtained, a new column (G_T) can be created to store G_i × T_i. For any selected polygon area, Σ(G_i × T_i) and ΣT_i can be obtained using the "statistics" tool in ArcMap. The gold resource within a selected polygon area is estimated by:

\[ R = \sum_i (G_i \times T_i) \times 10 \times 10 \tag{8} \]

where G_i is the grade value of each cell above the cutoff grade and T_i is the thickness of each cell. The average grade (G_ave) within a polygon area is given by:

\[ G_{ave} = \frac{\sum_i (G_i \times T_i)}{\sum_i T_i} \tag{9} \]
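Eqs. (8) and (9) amount to a masked sum over the 10 × 10 m cells; a numpy sketch follows. The unit handling (mg/m³ times m³ gives mg; divide by 1e6 for kg) matches the Table 8 figures.

```python
import numpy as np

def estimate_resource(grade_mg_m3, thickness_m, cutoff_mg_m3, cell_area_m2=100.0):
    """Eqs. (8)-(9): in-place resource (kg) and average grade (mg/m3)
    over the cells at or above the cutoff grade."""
    mask = grade_mg_m3 >= cutoff_mg_m3
    g, t = grade_mg_m3[mask], thickness_m[mask]
    resource_mg = np.sum(g * t) * cell_area_m2   # Eq. (8); 10 m x 10 m cells
    avg_grade = np.sum(g * t) / np.sum(t)        # Eq. (9)
    return resource_mg / 1e6, avg_grade          # mg -> kg

# Consistency check against Table 8 at the 1000 mg/m3 cutoff:
# 1,834,422 m3 of ore at an average 1929 mg/m3 is about 3539 kg.
print(1_834_422 * 1929 / 1e6)   # ~3539 kg
```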

Table 8 gives the total resource estimates at different cutoff grades within the entire study area. Both the actual and normalized gold values are calculated and stored in the IG. Depending on the cutoff grade, the total gold resource ranges from 113,767 oz (with a cutoff grade of 1000 mg/m³) to 2,309,664 oz (with a cutoff grade of 0 mg/m³), and the average grade ranges from 0.233 g/m³ (cutoff grade of 0 mg/m³) to 1.929 g/m³ (cutoff grade of 1000 mg/m³).

During the surface interpolation process, five geostatistical approaches built into ArcGIS were evaluated and compared: Inverse Distance Weighted (IDW), Ordinary Kriging (OK), Ordinary Kriging with lognormal transformation (OK-log), Simple Kriging (SK), and Indicator Kriging (IK). Cross-validation was used to examine the accuracy of the five approaches. It was determined that the OK and SK algorithms are not suitable for this dataset because of the difficulty of fitting the semivariogram model, despite their lower root-mean-square error (RMSE) and higher root-mean-square standardized error (RMSS). The OK-log algorithm does not provide satisfactory predictions either, because it places too much weight on extreme values, resulting in overestimation and an extremely high RMSE. The reserve estimate for each block was computed using each of the interpolation methods and compared with the reserve calculated using the conventional polygon method; the IDW and IK algorithms provide estimates most agreeable with the conventional polygon method. The geostatistical analysis is briefly described next; detailed information about this geostatistical study can be found in Li et al. (2005).

IDW is one of many methods for interpolating scattered spatial data. A neighborhood about the point to be interpolated is selected, and a weighted average of the observed values within this neighborhood is calculated. The weighting factors of the observed points are a function of the distance between the observed points and the interpolated point: the closer the distance, the bigger the weight. The interpolating function is constructed as a linear combination of the observed values v_i multiplied by weight functions w_i (Fisher et al., 1987):

\[ f(x) = \sum_{i=1}^{n} v_i\, w_i(x) \tag{10} \]

where the weight factors w_i (i = 1, 2, ..., n) are constructed by normalizing the inverse distances:

\[ w_i(x) = \frac{d_i(x)^{-\lambda_i}}{\sum_{j=1}^{n} d_j(x)^{-\lambda_j}} \tag{11} \]

where d_i(x) is the Euclidean distance from point x to node x_i, and λ_i is the exponential power of the weight. Each weight function w_i is inversely proportional to the distance from the point x_i where the value v_i is prescribed, and the weight factors w_i (i = 1, 2, ..., n) sum to one. The possible redundancy between samples depends not simply on the distance but also on the spatial continuity.
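A minimal sketch of the IDW interpolator of Eqs. (10) and (11) at a single location, with a single common power λ for all points:

```python
import numpy as np

def idw(x0, y0, xs, ys, values, power=2.0):
    """Eqs. (10)-(11): inverse-distance-weighted estimate at (x0, y0)."""
    d = np.hypot(xs - x0, ys - y0)
    if np.any(d == 0.0):
        return values[np.argmin(d)]   # honor an exactly coincident sample
    w = d ** -power
    return np.sum(w * values) / np.sum(w)   # weights normalized to sum to 1
```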

Table 8 Resource estimations at different cutoff grades

Cutoff grade (mg/m³) | Orebody area (m²) | Average thickness (m) | Average grade (g/m³) | Ore volume (m³) | Resource (kg) | Resource (oz)
0 | 40,429,800 | 7.63 | 0.233 | 308,325,290 | 71,840 | 2,309,664
200 | 18,060,600 | 4.86 | 0.554 | 87,737,850 | 48,607 | 1,562,718
500 | 2,438,900 | 5.28 | 1.047 | 12,886,388 | 13,492 | 433,772
800 | 740,400 | 5.33 | 1.441 | 3,948,949 | 5,690 | 182,949
1000 | 404,600 | 4.53 | 1.929 | 1,834,422 | 3,539 | 113,767

Source: Zhou W, Chen G, Li H, Luo H, and Huang SL (2007) GIS application in mineral resource analysis: A case study of offshore marine placer gold at Nome, Alaska. Computers and Geosciences 33: 773–788.

Table 9 Comparison of prediction errors (Herring block)

Interpolation method | RMSE | MSE
IDW | 455.1 | 0.0714
Ordinary Kriging | 452.7 | 0.000218
Ordinary Kriging (log) | 718.6 | 0.0568
Simple Kriging | 450.5 | 0.003029
Indicator Kriging | 452.9 | 0.067

Notes: RMSE, root-mean-square error; MSE, mean standardized prediction error.

Kriging methods, which use a customized statistical distance rather than a geometric distance to decluster the available sample data, are linear estimators that develop optimal weights to be applied to each sample in the vicinity of the block being estimated. They are based on best linear unbiased estimation, which minimizes the mean error and the variance of the errors, and they depend on the statistical model employed and the following formula:

\[ Z(s) = m(s) + \varepsilon(s) \tag{12} \]

where Z(s) is the value of interest, m(s) is the deterministic trend, and ε(s) is the autocorrelated error; the variable s indicates the location, i.e., the spatial coordinates in ArcGIS. Based on different assumptions about the error term ε(s), there are several different kriging methods, such as OK, SK, Universal Kriging, IK, Cokriging, and others, all of which are built into the ArcGIS Geostatistical Analyst extension.

Cross-validation was used to assess the surface interpolation of the different kriging methods. The mean standardized prediction errors (MSE) and the RMSE are shown in Table 9. The RMSE for estimates made using OK-log is the highest, whereas the RMSEs of the other methods are almost the same. As noted above, the OK-log algorithm places too much weight on extreme values, and the OK and SK algorithms are hampered by the difficulty of fitting the semivariogram model despite their lower RMSE and higher RMSS.

2.08.5.1 Results

The reserves of the Herring block (see Fig. 15 for its location) were calculated using the five interpolation methods, and the tonnages of gold for the Herring block at various cutoff grades are shown in Table 10. The tonnage estimated by OK-log is excluded from this table due to its extremely high RMSE and tonnages almost double those of the other interpolation methods; it is deemed unsuitable for this dataset. For comparison, the resource estimation was also done using the Thiessen polygon method. At lower cutoff grades, from 0 to 400 mg/m³, the tonnages estimated using the various interpolation methods show relatively little difference (Fig. 17), but the difference increases rapidly at higher cutoff grades. The Thiessen polygon method gives higher tonnages at all cutoff grade levels up to 1400 mg/m³, and IDW gives the second-highest estimates. The kriging methods (OK, SK, IK) yield no estimates above the cutoff grade of 800 mg/m³ due to their smoothing effect.

2.08.5.2 Summary

Two geodatabases were developed during this research, namely the IG and the R2.5DG. The IG is a data container used for data management and information query, while the R2.5DG handles volume calculation and resource estimation. A customized resource estimation approach was built around this project that can estimate gold resources at different cutoff grades and over different spatial domains in a time-efficient way. The geostatistical analysis in this case study revealed that the gold concentration data collected from the Nome offshore deposit are approximately lognormally distributed, positively skewed, and directional, and contain outliers. IDW, OK, OK-log, SK, and IK were assessed and compared in the study, using cross-validation to examine the accuracy of the five approaches. The IDW and IK algorithms were found to provide estimates most agreeable with the conventional polygon method.

Table 10 Tonnages (kg) of gold of the Herring block at various cutoff grades (mg/m³)

Method | 0 | 200 | 400 | 600 | 800 | 1000 | 1200 | 1400
IK | 5380 | 4761 | 2974 | 424 | 124 | 2 | - | -
IDW | 5209 | 4482 | 2696 | 1336 | 749 | 438 | 294 | 200
OK | 5353 | 4737 | 2869 | 710 | - | - | - | -
SK | 5427 | 4993 | 2548 | 516 | - | - | - | -
Polygon | 4999 | 4146 | 2999 | 2460 | 1868 | 1246 | 1079 | 923


Fig. 17 Tonnages of gold at various cutoff grades (Herring block), comparing IK, IDW, OK, SK, and the Thiessen polygon method (tonnage in kg versus cutoff grade in mg/m³). Li H, Luo H, Chen G, Zhou W, and Huang SL (2005) Visualized geostatistical analysis of Nome offshore gold placer deposit using ArcGIS: A case study. In: The Proceedings of 2005 Society for Mining, Metallurgy, and Exploration (SME) Annual Meeting, 28 February–2 March 2005, Salt Lake City, Utah.

2.08.6 Conclusions

In this article, we have demonstrated the comprehensive applications of GIS in natural resource analyses through three case studies. Natural resources embrace a broad range of resources; we were only able to focus on a few, i.e., water, energy, and mineral resources. Natural resource analyses are by nature spatial problems. The digital and geospatial capacities of GIS can bring the analyses to a depth that was impossible, or nearly impossible, to achieve by traditional means. The in-depth analyses presented in this article include:

1. Managing a large amount of data in heterogeneous formats.
2. Integrating geological data and information management.
3. Constructing geospatial infrastructure as a central data repository and as a connector for various analytical models.
4. Visualizing geospatial data in two and three dimensions.
5. Enabling 3D geological or subsurface modeling.
6. Quantitatively analyzing groundwater vulnerability.
7. Interactively estimating the total in-place mineral resource at different cutoff grades.

Traditional methods for natural resource analysis remain very important: they collect the first-hand data and cannot be replaced by any other means. The combination of traditional methods and GIS-based methods is therefore essential for any natural resource analysis project; traditional methods collect the first-hand data, and GIS brings the analysis to a new depth in a time-effective and cost-efficient fashion.

Acknowledgments

The Piceance Basin water for oil shale project was funded by the US Department of Energy (Award # DE-NT0006554). The Florida groundwater vulnerability project was funded by the Florida Department of Health through a subcontract with HAZEN and SAWYER, P.C. (JOB No.: 44237-001). The Nome placer gold GIS project was funded by the MMS (Minerals Management Service, U.S. Department of the Interior).

References Almarales–Hamm D, Campos GD,, Murray B and Russell N (undated) The application of GIS to bauxite mining in Jamaica. http://www.esri.com/mapmuseum/mapbook_gallery/ volume19/mining7.html. (Accessed August 2005). Anderson DL and Otis RJ (2000) Integrated wastewater management in growing urban environments. In: Managing soils in an urban environment. Agronomy Monograph 39. American Society of Agronomy, Crop Science Society of America, Madison, Wisconsin: Soil Science Society of America. Arthur, J.D., Wood, H.A.R., Baker, A.E., Cichon, J.R., Raines, G.L., 2007. Development and implementation of a Bayesian-based aquifer vulnerability assessment in Florida. Natural Resource Research 16 (2), 93–107.


Aspinall, R., Pearson, D., 2000. Integrated geographical assessment of environmental condition in water catchments: Linking landscape ecology, environmental modeling and GIS. Journal of Environmental Management 59 (4), 299–319.
Barton, L., McLay, C.D.A., Schipper, L.A., Smith, C.T., 1999. Annual denitrification rates in agricultural and forest soils: A review. Australian Journal of Soil Research 37, 1073–1093.
Beach DN (2001) Infiltration of wastewater in columns. M.S. thesis. Colorado School of Mines, Golden.
Berry, P., Pistocchi, A., 2003. A multicriterial geographical approach for the environmental impact assessment of open-pit quarries. International Journal of Surface Mining, Reclamation and Environment 17 (4), 213–226.
Bonham-Carter GF (1996) Geographic information systems for geoscientists – Modeling with GIS. Computer methods in the geosciences (vol. 13, 1st edn.). Tarrytown: Pergamon Press and Elsevier Science Ltd. 1994, reprint in 1996. 398p.
Bronston, M.A., 1989. Offshore placer drilling technology – A case study from Nome, Alaska. Mining Engineering 42 (1), 26–31.
Bronston MA (1990) A view of sea-floor mapping priorities in Alaska from the mining industry: C1052. US Geological Survey Reports on Alaska Released in 1991, 1990, pp. 86–91.
Brownfield ME, Self JG, and Mercier TJ (2011) Fischer assay histograms of oil-shale drill cores and rotary cuttings from the Great Divide, Green River, and Washakie Basins, Southwestern Wyoming. USGS Digital Data Series DDS-69-DD, 9p.
Bundtzen TK, Reger RD, Laird GM, Pinney CS, Clautice KH, Liss SA, and Cruse GR (1994) Progress report on the geology and mineral resources of the Nome Mining District. Division of Geological & Geophysical Surveys, Public-Data File 94-39.
Cashion WB and Donnell JR (1972) Chart showing correlation of selected key units in the organic-rich sequence of the Green River Formation, Piceance Creek Basin, Colorado, and Uinta Basin, Utah. U.S. Geological Survey Oil and Gas Investigations Chart OC-65.
Chen G, Huang SL, and Zhou W (2003) GIS applications to Alaskan near-shore marine mineral resources, stage II: Enhancement of web site; improvement of GIS structure; predictive model development; and new site identification and study. An unpublished report to Minerals Management Services, U.S. Department of the Interior by University of Alaska Fairbanks, 45p.
Chen G, Zhou W, Huang SL, Li H, and Luo F (2005) GIS applications to Alaskan near-shore marine mineral resources (phase III). An unpublished report to Minerals Management Services, U.S. Department of the Interior by University of Alaska Fairbanks, 159p.
Cui C (2014) GIS-based nitrogen removal model for assessing Florida's surficial aquifer vulnerability from onsite wastewater treatment systems. Master's thesis, Colorado School of Mines, 86p.
Cui, C., Zhou, W., Geza, M., 2016. GIS-based nitrogen removal model for assessing Florida's surficial aquifer vulnerability. Environmental Earth Sciences 75 (6), 1–15.
Date, C.J., 2003. An introduction to database systems, 8th edn. Addison Wesley, Reading. 1024p.
Di Luzio, M., Srinivasan, R., Arnold, J.G., 2004. A GIS-coupled hydrological model system for the watershed assessment of agricultural nonpoint and point sources of pollution. Transactions in GIS 8 (1), 113–136.
Dillon, U., Blackwell, G., 2003. The use of a geographic information system for open pit mine development. CIM Bulletin 96 (1069), 119–121.
EarthSTEPS, LLC and GlobalMind (2009) Statewide inventory of onsite sewage treatment and disposal systems in Florida. Final report prepared for the Florida Department of Health. http://www.floridahealth.gov/healthy-environments/onsite-sewage/research/_documents/research-reports/_documents/inventory-report.pdf. (Accessed 12 December 2013).
ESRI (2003) Using ArcGIS Geostatistical Analyst. ESRI (Environmental Systems Research Institute, Inc.) Digital Book, 306p.
ESRI (2004a) Building a geodatabase. ESRI (Environmental Systems Research Institute, Inc.) Digital Book, 382p.
ESRI (2004b) Using Spatial Analyst. ESRI (Environmental Systems Research Institute, Inc.) Digital Book, 232p.
ESRI (2006) GIS best practices for mining. http://www.esri.com/industries/mining/business/literature.html. (Accessed August 2006).
Fischer E (1999) Nutrient transformation and fate during intermittent sand filtration of wastewater. M.S. thesis, Colorado School of Mines, Golden.
Fisher, N.I., Lewis, T., Embleton, B.J.J., 1987. Statistical analysis of spherical data. Cambridge University Press, Cambridge, 329p.
Florida Department of Environmental Protection (undated) http://www.dep.state.fl.us/swapp/Aquifer_Pframe.html. (Accessed 24 February 2017).
Florida Department of Health (2014) Onsite sewage. http://www.floridahealth.gov/healthy-environments/onsite-sewage/index.html. (Accessed 30 January 2014).
Garnett, R.H.T., 2000. Marine placer gold, with particular reference to Nome, Alaska. In: Cronan, David S. (Ed.), Handbook of marine mineral deposits. CRC Press, Boca Raton, pp. 67–101.
Hail WJ and Smith MC (1994) Geologic map of the northern part of the Piceance Creek Basin, northwestern Colorado. U.S. Geological Survey Miscellaneous Investigations Series Map I-2400.
Hail WJ and Smith MC (1997) Geologic map of the southern part of the Piceance Creek Basin, northwestern Colorado. U.S. Geological Survey Miscellaneous Geologic Investigations Map I-2529.
Hammond, A.D., 2002. GIS applications in underground mining. Mining Engineering 54 (9), 27–30.
Heatwole, K.K., McCray, J.E., 2007. Modeling potential vadose-zone transport of nitrogen from onsite wastewater systems at the development scale. Journal of Contaminant Hydrology 91, 184–201.
Huang SL, Chen G, Maybrier S, and Brennan KL (2001) GIS applications to Alaskan near-shore marine mineral resources. An unpublished report to Minerals Management Services, U.S. Department of the Interior by University of Alaska Fairbanks, 280p.
Johnson RC, Mercier TJ, Ryder RT, Brownfield ME, and Self JG (2011) Assessment of in-place oil shale resources of the Eocene Green River Formation, Greater Green River Basin, Wyoming, Colorado, and Utah. USGS Digital Data Series DDS-69-DD, 68p.
Jury, W., Horton, R., 2004. Soil physics. Wiley, New York, 384p.
Jury, W.A., Focht, D.D., Farmer, W.J., 1987. Evaluation of pesticide ground water pollution from standard indices of soil-chemical adsorption and biodegradation. Journal of Environmental Quality 16 (4), 422–428.
Koschmann AH and Bergendahl MH (1968) Principal gold-producing areas of the United States. Geological Survey Professional Paper, 610p.
Li H, Luo H, Chen G, Zhou W, and Huang SL (2005) Visualized geostatistical analysis of Nome offshore gold placer deposit using ArcGIS – A case study. In: The Proceedings of 2005 Society for Mining, Metallurgy, and Exploration (SME) Annual Meeting, 28 February–2 March 2005. Salt Lake City: Society of Mining Metallurgy and Exploration.
Litton, G., 1987. Introduction to database management: A practical approach. William C Brown, Dubuque, p. 532.
Luo H, Li H, Chen G, Zhou W and Huang SL (2005) Application of web GIS for Nome Alaska offshore mineral resource management and utilization. In: The Proceedings of 2005 Society for Mining, Metallurgy, and Exploration (SME) Annual Meeting, 28 February–2 March 2005, Salt Lake City.
Maidment, D.R., 2002. ArcHydro – GIS for water resources, 1. ESRI Press, Redlands. 208p.
Mattson ED, Hull L, and Cafferty K (2012) Water usage for in-situ oil shale retorting – A system dynamic model. Idaho National Laboratory Report, Report No. INL/EXT-1227365.
McCray, J.E., Kirkland, S.L., Siegrist, R.L., Thyne, G.D., 2005. Model parameters for simulating fate and transport of on-site wastewater nutrients. Groundwater 43 (4), 628–639.
McCray, J.E., Geza, M., Lowe, K., Tucholke, M., Wunsch, A., Roberts, S., Drewes, J., Amador, J., Atoyan, J., Kalen, D., Loomis, G., Boving, T., Radcliffe, D., 2010. Quantitative tools to determine the expected performance of wastewater soil treatment units: Guidance manual. Water Environment Research Foundation, Alexandria, 198p.
Mercier TJ (2011) Calculation of overburden above the LaClede Bed of the Laney Member of the Eocene Green River Formation, Green River and Washakie Basins, Southwestern Wyoming. USGS Digital Data Series DDS-69-DD, 10p.
Mercier TJ, Brownfield ME, Johnson RC, and Self JG (2009) Fischer assays of oil shale drill cores and rotary cuttings from the Piceance Basin, Colorado – 2009 update. USGS Open-File Report 98-483, Version 2, 16p.


Mercier TJ, Brownfield ME, and Johnson RC (2011a) Methodology for calculating oil shale resources for the Green River and Washakie Basins, Southwestern Wyoming. USGS Digital Data Series DDS-69-DD, 52p.
Mercier TJ, Gunther GL, and Skinner CC (2011b) The GIS project for the geologic assessment of in-place oil shale resources of the Eocene Green River Formation, Greater Green River Basin, Wyoming, Colorado, and Utah. USGS Digital Data Series DDS-69-DD, 5p.
Naiman, R.J., Bisson, P.A., Lee, R.G., Turner, M.G., 1997. Approaches to management at the watershed scale. In: Kohm, Kathryn A., Franklin, Jerry F. (Eds.), Creating a forestry for the 21st century: The science of ecosystem management. Island Press, Washington, DC, pp. 239–253.
National Research Council, 1999. New strategies for America's watersheds. National Academy Press, Washington, DC, 328p.
Navathe, S.B., Elmasri, R., 2002. Fundamentals of database systems, 3rd edn. Addison Wesley and Longman, Harlow. 1000p.
Nelson CH and Hopkins DM (1972) Sedimentary processes and distribution of particulate gold in the Northern Bering Sea. Geological Survey Professional Paper 689.
NETL (National Energy Technology Laboratory, U.S. Department of Energy) (2007) Oil shale environmental issues and needs workshop report, 61p.
Price MJ (2001) Geographic information systems and industrial minerals, preprint 01-116. Society of Mining Engineers.
Ramesh Reddy, K., Delaune, R.D., 2008. Biochemistry of wetlands: Science and application. CRC Press, Boca Raton, p. 800.
Rao, P.S.C., Hornsby, A.G., Jessup, R.E., 1985. Indices for ranking the potential for pesticide contamination of groundwater. Soil Crop Science Society Florida Proceedings 44, 1–8.
Rawls, W.J., Brakensiek, D.L., Saxton, K.E., 1982. Estimation of soil water properties. Transaction of American Society of Agriculture Engineering 25 (5), 1316–1320.
Rivett, M.O., Buss, S.R., Morgan, P., Smith, J.W., Bemment, C.D., 2008. Nitrate attenuation in groundwater: A review of biogeochemical controlling processes. Water Research 42 (16), 4215–4232.
Schlosser, S.A., McCray, J.E., Murray, K.E., Austin, B., 2002. A subregional-scale method to assess aquifer vulnerability to pesticides. Ground Water 40 (4), 361–367.
Self JG, Johnson RC, Brownfield ME, and Mercier TJ (2010) Simplified stratigraphic cross sections of the Eocene Green River Formation in the Piceance Basin, Northwestern Colorado. USGS Digital Data Series DDS-69-Y, Chapter 5, p. 7.
Self JG, Ryder RT, Johnson RC, Brownfield ME, and Mercier TJ (2011) Stratigraphic cross sections of the Eocene Green River Formation in the Green River Basin, Southwestern Wyoming, Northwestern Colorado, and Northeastern Utah. USGS Digital Data Series DDS-69-DD, 11p.
Tagg AR and Greene HG (1973) High-resolution seismic survey of an offshore area near Nome, AK. Geological Survey Professional Paper 759-A.
USGS (2005) Geology and resources of some world oil-shale deposits. USGS Scientific Investigations Report 2005-5294.
USGS (2007) Geographic information systems. http://erg.usgs.gov/isb//pubs/gis_poster/. (Accessed 22 February 2007).
Wilson, J.P., Mitasova, H., Wright, D.J., 2000. Water resource applications of geographic information systems. Urban and Regional Information System Association Journal 12 (2), 61–79.
Wing, M.G., Bettinger, P., 2008. Geographic information systems: Applications in natural resource management. Oxford University Press, Oxford, 272p.
Witheetrirong, Y., Tripathi, N.K., Tipdecho, T., Parkpian, P., 2011. Estimation of the effect of soil texture on nitrate-nitrogen content in groundwater using optical remote sensing. International Journal of Environmental Research and Public Health 8 (8), 3416–3436.
Wood, T., Dammer, A., Wilson, C., Parker, J., Skaggs, R., Stovall, T., 2008. An overview of the water management cross-cut plan for commercialization of America's unconventional fuels. In: Boak, J., Whitehead, H. (Eds.), Proceedings of the 27th Oil Shale Symposium, Colorado Energy Research Institute Document 2008-1, Colorado School of Mines, Golden, CO, USA (CD-ROM).
Youssef MA (2003) Modeling nitrogen transport and transformations in high water table soils. Doctoral dissertation, North Carolina State University, Raleigh.
Zhou, W., 2009. An outlook of GIS applications in mineral resource estimation. In: Corral, Melanie D., Earle, Jared L. (Eds.), Gold mining: Formation and resource estimation, economics and environmental impact. Nova Science Publishers Inc., New York, pp. 33–62.
Zhou, W., Chen, G., Li, H., Luo, H., Huang, S.L., 2007. GIS application in mineral resource analysis – A case study of offshore marine placer gold at Nome, Alaska. Computers and Geosciences 33, 773–788.
Zhou W, Minnick MD, Geza M, Murray KE, and Mattson ED (2012) GIS- and web-based water resources geospatial infrastructure for oil shale development. The final project report to United States Department of Energy, National Energy Technology Laboratory, December 29, 2012, Report No. DOE/NT0006554-1, 143p.
Zhou, W., Minnick, M.D., Mattson, E.D., Geza, M., Murray, K.E., 2015. GIS-based geospatial infrastructure of water resource assessment for supporting of oil shale development in Piceance Basin of northwestern Colorado. Computers and Geosciences 77, 44–53.

2.09 GIS for Urban Energy Analysis

Chaosu Li, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States © 2018 Elsevier Inc. All rights reserved.

2.09.1 Introduction
2.09.2 Literature Review
2.09.3 GIS Use in Urban Energy Analysis and Planning
2.09.3.1 Suitability Analyses
2.09.3.2 Analysis/Prediction of Energy Usage and Emissions
2.09.4 Tools and Models
2.09.4.1 Renewable Energy Potential Analysis Tools
2.09.4.2 Urban Building Energy Consumption/Microclimate Tools and Models
2.09.4.3 Energy Consumption/Emission Assessment Tools and Models
2.09.5 Conclusions
References

2.09.1 Introduction

The relationship between energy consumption and space in cities has been somewhat overlooked to date. The basic use of geographic information systems (GIS) within urban energy analysis and planning allows for the capture, storage, and display of in-depth information. When cleaned and layered, this information can be used to illustrate, analyze, and draw conclusions about important urban features such as energy usage, capacity, and potential. The flexibility of GIS and the accompanying extensions allows for a wide variety of analyses. Although the use of GIS in urban energy analysis beyond the basic level is still at an early stage of development, the work done to date shows that, because the spatial dimensions of energy systems dramatically affect their cost efficiency and environmental impacts, GIS holds tremendous promise for boosting the analytical power of energy planning and modeling techniques. In particular, GIS has proven its utility in assessing renewable energy generation potential, constructing and maintaining efficient energy distribution systems, and understanding the spatial dimensions of energy consumption and heat transfer within an urban landscape (Resch et al., 2014). By enhancing our understanding of how energy flows, GIS helps with designing and constructing sustainable buildings that use less energy and with responding appropriately to interactions between the built and natural features of the urban landscape. This article presents an overview of various approaches to urban energy analysis that rely upon GIS. It is organized into four main sections. First, existing studies regarding GIS use in urban energy analysis are critically reviewed. Second, the main concentrations of potential GIS applications in urban contexts are summarized. Third, based on these main concentrations, tools and models for urban energy analysis are summarized and analyzed. Finally, I discuss the review results and draw conclusions.

2.09.2 Literature Review

Because of the advanced spatial analysis abilities of GIS and the several third-party extensions available to users, many studies have used GIS to analyze the potential for renewable energy production in and around urban centers. As far back as 1998, a study used GIS to consolidate data on wind, topography, and urban areas on the island of Crete in Greece. These data were used to evaluate the available land, the potential energy production, and the economic impact of the island's renewable potential (Voivontas et al., 1998). Similarly focused on wind energy potential but using different methods, an environmental assessment of Western Turkey used GIS to create a raster-style dataset and associate environmental factors with each grid cell. A satisfaction score was then calculated based on the fulfillment of the factors, and a map was created to show the best possible wind farm locations (Aydin et al., 2010). Other studies focused on solar energy, either by creating a relational database that considered baseline energy consumption and projected energy savings for domestic housing, clustered and extracted from different urban maps using GIS (Rylatt et al., 2001), or by plotting the potential yields of solar hot water systems in cities, using the calculated thermal performance of water heating systems as a basis to predict long-term energy savings and carbon dioxide reductions (Gadsden et al., 2003). Amado and Poggi (2014) used both energy consumption and potential energy production data to determine the electricity offset potential of photovoltaic panels on rooftops. This study, along with many others, is helping to expand the range of uses and interpretations available for GIS and urban energy analysis programs. The integrated use of Excel, ArcGIS, Rhinoceros, Grasshopper, and Ecotect in this study demonstrates the need for more integrated database management and interpretation software. As the need for urban energy analysis expands, the creation of extensions and the functionality of programs will expand with it. Still other studies have used GIS to analyze a more wide-sweeping range of renewable and sustainable development. One such study created a constrained cellular automata raster model with GIS to consider factors such as urban form, environmental standards, and land and water consumption in the planning and construction of a city, in order to develop sustainable development principles (Yeh and Li, 2001).

Another strand of GIS research focuses on energy consumption in urban settings. Ratti et al. (2005) used raster models of cities and digital elevation models to analyze building energy consumption and create energy simulations of urban areas such as London and Berlin. Identifying the spatial variance of urban energy consumption is necessary to restructure energy systems and reduce emissions. As a data management system that supports spatial data, ArcGIS is an obvious resource for planners looking to understand the patterns of urban energy consumption. In a study on the distribution of urban energy consumption in China, researchers used ArcGIS to represent urban energy consumption indicators in spatial form (Zhang et al., 2010). An additional study analyzed energy consumption and carbon dioxide emissions at the urban scale, breaking the United States into counties and examining the urban–rural distribution of per-capita consumption (Parshall et al., 2010). In addition, Howard et al. (2012) created a model that estimates the energy end-use intensity of urban buildings in New York City and arranges them according to each building's primary use. This allows every zoned lot within the New York City area to be assigned an estimated energy consumption level. Beyond papers published in academic journals, various other sources have used GIS to create planning tools and models. One GIS-based model was developed at the Georgia Institute of Technology to gauge how much solar energy can be produced by a given city and what proportion of the energy for a given building can be supplied by solar power (Quan et al., 2015a). The process of modeling these scenarios involved creating four models using the ModelBuilder tool in ArcToolbox, a component of ArcGIS, as well as employing Python scripts. The data integration model takes urban-scale data and refines it to feed the next two models, the urban building energy model (UBEM) and the urban roof solar energy model. The latter used a raster dataset of a 3D urban environment model to drive the Area Solar Radiation tool included in ArcGIS (Quan et al., 2015a). The results of these two models are finally fed to the energy balance model in order to see how much solar energy is produced both by individual buildings and by the city as a whole (Quan et al., 2015a). Quan et al. (2015b) also developed a GIS-based urban building energy modeling system using the Urban-EPC simulation engine, which is urban context sensitive, urban data driven, planning-oriented, and resolution controlled, and is compatible with other planning tools. GIS can also be used to generate three-dimensional models to track energy usage and demand in a given urban area. Researchers at HFT Stuttgart in Germany developed a model to map out the demand for heating in each building in the city of Ludwigsburg, which has approximately 14,000 buildings (Nouvel et al., 2014). The first model used was CityGML, which served as a basis for three-dimensional geospatial analysis and visualization. Data to generate buildings in the model were collected from the ALKIS database (which contains building codes and other information) and population census data. To measure the heating demand for each building, the researchers used a quasistatic energy balance algorithm driven by monthly mean irradiances per façade orientation and monthly mean dry-bulb temperatures (Nouvel et al., 2014). Using these models and data, a three-dimensional map of Ludwigsburg was generated in ArcScene, with a color-coded scale indicating which buildings had the highest demand for heating. The distribution of demand also exhibited a normal trend when plotted on a histogram. This example again shows the relevance of GIS in mapping the need for energy in a particular urban area. While the previous examples used GIS to perform a one-time analysis of a given facet of urban energy, GIS can also be employed in creating and updating an integrated model on a daily basis. Members of the Sustainable Design Lab at the Massachusetts Institute of Technology worked in conjunction with the City of Boston to create models tracking urban building energy demand that could be continuously updated and modified to adjust to demand. This UBEM used GIS shapefiles of the city for geometric input in order to properly map out energy demand in each area of the city. The shapefiles included tax parcels, building footprints, and property tax records; the property tax record data were stored as points, and the building footprint and tax parcel data were stored as polygons. Once these shapefiles were mapped, the buildings were segmented and categorized by the year they were built as well as by their type of usage. These categories give users an indication of which buildings have the most successfully integrated energy systems and, as a result, which buildings will demand more energy. Next, thermal data are imported and applied to each building and tax parcel to see which areas of Boston have the highest energy demand. The UBEM is then continuously updated to reflect changes at a given time of year. The results indicated that commercial buildings required the most energy in the summer (electricity) and residential buildings required the most energy in the winter (heating fuel). Using GIS to continuously update the UBEM allows for quick analysis and a clear presentation of results.
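The quasistatic monthly energy balance used in the Ludwigsburg study can be illustrated with a simple sketch: monthly transmission and ventilation losses are offset by usable solar and internal gains, and only positive remainders count as heating demand. All coefficients and climate values below are illustrative assumptions, not data from Nouvel et al. (2014).

# Minimal monthly quasistatic heat balance for one building. Every number
# here is an invented placeholder, chosen only to make the script runnable.
HOURS = [744, 672, 744, 720, 744, 720, 744, 744, 720, 744, 720, 744]
T_OUT = [0.5, 1.8, 5.9, 10.3, 14.6, 17.9, 19.8, 19.4, 15.4, 10.2, 4.9, 1.5]  # degC
SOLAR_GAIN_KWH = [60, 90, 140, 180, 220, 230, 240, 210, 160, 110, 70, 50]    # per month

T_SET = 20.0        # indoor set-point temperature (degC)
H = 250.0           # overall heat-loss coefficient (W/K), envelope + ventilation
INTERNAL_W = 400.0  # mean internal gains (W)
ETA = 0.9           # utilization factor for gains

annual_kwh = 0.0
for hours, t_out, solar in zip(HOURS, T_OUT, SOLAR_GAIN_KWH):
    losses = H * (T_SET - t_out) * hours / 1000.0        # kWh lost this month
    gains = ETA * (solar + INTERNAL_W * hours / 1000.0)  # usable solar + internal gains
    annual_kwh += max(0.0, losses - gains)               # heating demand, never negative

print(f"Annual heating demand: {annual_kwh:.0f} kWh")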

2.09.3 GIS Use in Urban Energy Analysis and Planning

As indicated in the literature review above, GIS has a range of applications in urban energy analysis and planning. The potential applications discussed in this section involve the use of GIS in suitability analyses and in analyzing energy usage and emissions from existing energy systems in urban contexts.

2.09.3.1 Suitability Analyses

ArcGIS and other GIS programs are often used to conduct site suitability analyses for energy systems, in both urban and nonurban contexts. In the context of urban energy analyses, there are several examples of using ArcGIS in academic and nonacademic publications. In particular, suitability analyses have been used to determine the placement of energy-generating facilities, including wind farms, solar farms, rooftop photovoltaic cells, nuclear facilities, and power plants (Belles et al., 2013; Charabi and Gastli, 2011; Idris and Latif, 2012; Miller and Li, 2014; Rylatt et al., 2001; Voivontas et al., 2001). Rooftop photovoltaic cell placement represents a common example of the use of GIS in suitability analyses. The general workflow for such an analysis is to (1) identify significant criteria for placement (e.g., the angle of the ground), (2) generate layers/masks representing these criteria (e.g., an aspect or elevation map), and (3) determine areas that meet all or most of these criteria, often using overlay tools to create an output raster that classifies the range of suitability across the area of interest (a minimal code sketch of this workflow is given at the end of this section). In the context of urban solar panel placement, common criteria are elevation, aspect, slope, and average radiation received. Using the Solar Radiation geoprocessing tools, especially the Area Solar Radiation tool, and the Surface tools within the Spatial Analyst toolbox in ArcGIS, solar radiation, aspect, and slope maps can be generated from a digital surface or digital elevation model. Additionally, the Spatial Analyst tools allow the application of a fuzzy logic framework or weighted overlays to create output that displays the distribution of suitability throughout an area based on the input criteria. Notably, input masks must be in raster format for use in the Overlay tools in ArcGIS. Academic articles detailing this process have been published for urban energy development in a range of cities, including Alexandria, Egypt; Thessaloniki, Greece; and Bari, Italy (Hassaan, 2015; Karteris et al., 2013; Morano et al., 2015). The output suitability raster facilitates decision making by displaying suitable locations for energy facility placement. While ArcGIS and its Spatial Analyst extension are commonly used to perform site suitability analyses, ESRI also produces a web-based application called GeoPlanner for ArcGIS, which is specialized for site suitability analyses, among other functions commonly used in urban planning. Specifically, GeoPlanner for ArcGIS contains a service known as Modeler, which takes input raster maps of particular criteria of interest and outputs an overlay raster displaying the distribution of suitability across a geographic area. Specifically for solar panel placement, the web-based GIS program Smart City Energy (iguess.tudor.lu) can also be used to determine suitable rooftop locations in urban centers. According to Smart City Energy, this module requires vector data representing building parcels, tables containing economic parameters (e.g., cost of panel installation, potential cost savings), and raster maps of rooftop solar irradiation. The output is a table detailing the rooftop photovoltaic potential for the rooftop patches defined in the input vector. Similar functions can be found in the domestic energy, carbon counting and carbon reduction model (DECoRuM), which will be discussed in greater detail in the next section of this article (Gupta, 2009). GRASS, an open-source GIS, also has a program for calculating solar power potential, called r.sun, which takes the same input data forms as ArcGIS to produce an output raster-based map of the potential solar power yield over a given area (Nguyen and Pearce, 2010). This open-source software has been used to determine the photovoltaic potential of several cities, including Bardejov, Slovakia, and Lisbon, Portugal (Hofierka and Kaňuk, 2009; Redweik et al., 2013). These programs facilitate problem-solving by generating easy-to-use visual output that displays suitable areas for building energy-generating systems within a given area. While common examples derive from urban rooftop solar panel placement, GIS has also been used for power plant site selection in urban settings (Hassaan, 2015).
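As a minimal sketch of the three-step suitability workflow described above, the following ArcPy fragment derives slope and aspect criterion layers from a surface model and combines them with a weighted sum; the workspace path, thresholds, and weights are hypothetical and would be tuned to the study area.

# Sketch of the rooftop-suitability workflow using the ArcGIS Spatial
# Analyst extension (arcpy). Paths, thresholds, and weights are placeholders.
import arcpy
from arcpy.sa import Aspect, Con, Raster, Slope

arcpy.CheckOutExtension("Spatial")
arcpy.env.workspace = r"C:\data\solar_suitability.gdb"  # hypothetical workspace

dsm = Raster("city_dsm")  # digital surface model of the study area

# Step 2: derive criterion layers from the surface model.
slope = Slope(dsm, "DEGREE")
aspect = Aspect(dsm)

# Score each criterion 0/1: gently or moderately pitched roofs facing
# roughly south (aspect 135-225 degrees, northern hemisphere).
slope_ok = Con(slope <= 35, 1, 0)
aspect_ok = Con((aspect >= 135) & (aspect <= 225), 1, 0)

# Step 3: combine criteria; a weighted sum stands in for the overlay tools.
suitability = 0.4 * slope_ok + 0.6 * aspect_ok
suitability.save("rooftop_suitability")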

2.09.3.2 Analysis/Prediction of Energy Usage and Emissions

Once energy systems are in place, GIS are often used to examine consumption and the associated emissions within a given area. With respect to usage analyses, GIS have been applied to achieve several different kinds of results. First, GIS are used for visualization of consumption data, down to the parcel or building-type level. Mapped consumption data have been used, via hotspot analysis, to identify buildings within cities that are targets for retrofitting interventions to foster more sustainable energy usage (Balta, 2014; Mattinen et al., 2014). Other energy consumption analyses have focused on predicting energy consumption and modeling the potential economic and environmental benefits of adopting renewable energy and sustainable practices in urban centers (Reiter and Wallemacq, 2011; Resch et al., 2014). Related to energy usage, GIS are also often used to model atmospheric emissions resulting from energy consumption (Guttikunda and Calori, 2013). GIS may or may not be used to conduct the statistical analysis itself, but are routinely used for the storage, retrieval, and visualization of such data; examples of GIS-based statistical analysis do exist, as in studies predicting household energy usage in Texas and China (Valenzuela et al., 2014; Xie et al., 2014). In ArcGIS, regression models to predict energy consumption can be created using the regression tools in the Spatial Statistics toolbox, which take input vector data and associated attributes and create feature classes containing model diagnostics and predictions (see the code sketch after this paragraph). However, many dedicated energy models (GIS- and non-GIS-based) have also been developed, and statistical analyses may be conducted using more sophisticated statistical software, such as R (Aksoezen et al., 2015; Fumo and Biswas, 2015; Mastrucci et al., 2014). Alternative GIS platforms that can be used to analyze energy consumption include the web-based Smart City Energy and the energy and environmental prediction model, which takes input building characteristics and energy usage to predict energy consumption at a variety of levels, including the individual unit, district, or city level. These platforms are also designed to store and visualize data and to model the potential benefits of implementing renewable energy systems in urban centers (Jones et al., 2000). There are also models currently being developed, including Energy Atlas Berlin, the MEU Project, and CityGML, which are designed to facilitate both the analysis/visualization of urban energy consumption and the modeling of renewable energy potentials (Capezzali and Cherix, 2012; Krüger and Kolbe, 2012). These web-based platforms require the construction of 3D city models as well as geodatabases of energy consumption determinants. It is unclear what specific data formats are required for using these tools, as they are still in the testing phases. Nevertheless, they are designed to allow urban planners to visualize current, and predict future, energy usage throughout a city and to test alternative scenarios to identify solutions and reduce consumption. These tools are important for fostering the construction of more sustainable cities and for identifying problem areas in which mitigation strategies may be carried out.
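A minimal sketch of the regression step follows, assuming a building feature class with the hypothetical field names shown; the Ordinary Least Squares tool in the Spatial Statistics toolbox writes out a feature class containing residuals and predictions.

import arcpy

arcpy.env.workspace = r"C:\data\energy.gdb"  # hypothetical geodatabase

# Fit observed annual consumption (KWH_YEAR) against building attributes.
# The tool requires a unique long-integer ID field (UNIQ_ID here); all
# feature class and field names are illustrative assumptions.
arcpy.stats.OrdinaryLeastSquares(
    Input_Feature_Class="buildings",
    Unique_ID_Field="UNIQ_ID",
    Output_Feature_Class="buildings_ols",
    Dependent_Variable="KWH_YEAR",
    Explanatory_Variables="FLOOR_AREA;BUILT_YEAR;HOUSEHOLDS",
)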


In the context of energy emissions, ArcGIS can be used to calculate predicted emissions using a bottom-up approach, in which emissions from different sources are calculated from known characteristics (e.g., the number and type of power plants) of each cell of a raster map covering an urban area (Behera et al., 2011); a toy example of this per-cell accounting is sketched at the end of this section. However, other GIS applications are available. One potential model, with greater spatial resolution, is the DECoRuM, a GIS-based platform designed to assess greenhouse gas emissions in urban areas and to model the reduction in emissions that can be achieved through interventions at the unit, street, district, or city level. It requires feature class input of parcels and their associated building attributes and energy consumption in order to create thematic maps that display this information at the dwelling level. It also incorporates existing statistical energy consumption models to perform analyses using information derived from the input data. Filtering criteria can be used alongside the constructed maps in order to identify homes that are suitable for greenhouse gas emission reduction measures. To date, this model has only been applied to the city of Oxford, England, but it can in principle be constructed for any city that is invested in reducing its carbon emissions through building-level interventions (Gupta, 2009). These GIS applications empower problem-solving within the context of urban planning by giving planners the tools to visualize energy use, greenhouse gas emissions, and the effects of mitigation strategies at various levels, including the building, street, district, or city level. This ensures that decision makers understand how new and existing buildings will affect energy use and greenhouse gas emissions and allows smarter decision making based on these factors. Another key component of urban energy analysis is the reduction of consumption. As cities and urban areas grow, it is vital that consumption does not grow to the point where energy demand exceeds energy supply. Researchers at the University of Liège in Belgium sought a way to reduce energy usage per building as well as in the transport sector. GIS tools were used to assess current energy usage in Liège in order to understand which areas could be altered to be more efficient in their consumption. Through the use of urban-based GIS, the researchers developed an energy model of the residential building stock and spatialized the major components of the city of Liège (Reiter and Wallemacq, 2011). The areas of the city were divided into urban blocks and then joined to the cadastre (a property registry comprising financial and tax information) using a spatial join in ArcMap. Once the data were properly processed, a model was built to chart energy performance. Demographic data, such as census block data, were charted from 2000 to 2008 to examine which areas were growing the fastest and, as a result, had the highest energy consumption. Several scenarios were created to determine the most effective way to reduce energy usage in the building and transport sectors. Ultimately it was determined that the most effective approach would be to revitalize older systems in these sectors rather than create new buildings and transport routes. This study, as well as the previously mentioned applications of GIS, shows how the many varieties of GIS tools and software available may be successfully integrated to tackle important problems of urban energy distribution, demand, and consumption reduction, among many others.
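The bottom-up emission accounting mentioned above reduces to a per-cell multiply-and-sum over source categories. The toy NumPy sketch below uses invented activity grids and emission factors purely to show the structure of the calculation.

import numpy as np

# Toy bottom-up inventory on a 3x3 raster: per-cell activity data multiplied
# by source-specific emission factors, then summed across source categories.
# Activities and factors are invented for illustration only.
cells = (3, 3)
activity = {
    "power_plants_mwh": np.random.default_rng(0).uniform(0, 50, cells),
    "households": np.random.default_rng(1).integers(0, 400, cells),
}
emission_factor = {  # kg CO2 per unit of activity (illustrative values)
    "power_plants_mwh": 900.0,
    "households": 1200.0,
}

emissions = sum(activity[k] * emission_factor[k] for k in activity)
print(emissions)  # kg CO2 per cell, ready to write back to a raster layer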

2.09.4 Tools and Models

GIS have a range of applications in the context of urban energy analysis. This section will discuss the variety of GIS tools and GIS-integrated models that may be used to implement these analyses. The basic information, input data, output, spatial resolution/scale and potential applications of these tools and models are summarized in Table 1.

2.09.4.1 Renewable Energy Potential Analysis Tools

One of the most common types of energy analysis conducted in ArcGIS is the evaluation of potential for renewable energy generation. ArcGIS is capable of modeling incoming solar radiation for an entire landscape or a particular point, based on an input digital elevation model raster and optional inputs such as the time period/interval, transmittivity, etc. As part of this modeling process, ArcGIS also models upward-looking viewsheds from specified locations in order to identify areas of shading (and therefore obstruction of solar radiation) (ESRI, 2016). The collection of Solar Radiation tools in ArcGIS has been used by many researchers to identify potential sites for rooftop solar photovoltaic array installation in cities (ESRI, 2009a, b; Freitas et al., 2015). An important question to resolve when it comes to solar energy is how much total solar energy can be produced in a city and what portion of building energy use it can supply. One study addressed this by developing a GIS-based energy balance modeling system for urban solar buildings (Quan et al., 2015a, b). The models used urban-scale data to provide input for simulations, and all of them run on a GIS platform using ModelBuilder toolboxes and ArcPy code. ArcGIS also has the capability, via a desktop extension called Airflow Analyst for ArcGIS, to model air movement through urban landscapes, which can in turn help to identify potential sites for rooftop wind turbines. This extension makes it easier to process wind energy data in a GIS environment; it can model data, perform complex calculations, and generate animated or temporal representations (ESRI, 2009a). In addition to generating statistics, the model produces three-dimensional surfaces and animations depicting wind flow. Required inputs include raster or TIN elevation data and shapefiles with height and building shape attributes (Airflow Analyst, 2016; ESRI, 2009a).
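For the solar case, a minimal ArcPy sketch of calling the Area Solar Radiation tool discussed above might look as follows; the DSM path, latitude, year, and output location are placeholder assumptions.

# Annual insolation (Wh/m^2) for every cell of a city digital surface model.
# Shading by buildings captured in the DSM is accounted for by the tool's
# upward-looking viewshed calculation. All paths/values are hypothetical.
import arcpy
from arcpy.sa import AreaSolarRadiation, TimeWholeYear

arcpy.CheckOutExtension("Spatial")

radiation = AreaSolarRadiation(
    in_surface_raster=r"C:\data\city_dsm.tif",
    latitude=45.0,                        # approximate site latitude
    time_configuration=TimeWholeYear(2016),
)
radiation.save(r"C:\data\annual_solar.tif")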

2.09.4.2 Urban Building Energy Consumption/Microclimate Tools and Models

GIS-based urban microclimate models can also contribute to urban energy analysis, since several studies have demonstrated that urban microclimate affects building energy consumption. For example, during heat waves the overall energy consumption for cooling (i.e., refrigeration and air-conditioning) increases.

Table 1    GIS tools/models for urban energy analysis

Renewable energy potential analysis

ArcGIS Spatial Analyst Extension
  Input data: Raster maps representing the distribution of criteria over a given area
  Output: Raster maps demonstrating the suitability of a geographic area
  Spatial resolution/scale: If using high-resolution input, can get house-level suitability
  Applications: Any power-generating facility

GeoPlanner for ArcGIS
  Input data: Raster maps representing the distribution of criteria over a given area
  Output: Raster maps demonstrating the suitability of a geographic area
  Spatial resolution/scale: Depends on the input raster data
  Applications: Any power-generating facility

Solar Radiation tools in ArcGIS
  Input data: Digital elevation model of the study area with detailed elevation information of building shape and topographies
  Output: Raster maps demonstrating solar radiation of the study area
  Spatial resolution/scale: If using a high-resolution digital surface model, can get house-level output
  Applications: Urban solar energy potential

Airflow Analyst for ArcGIS (a)
  Input data: Terrain (raster or TIN (b)) data and building shape data
  Output: Three-dimensional mesh (also known as a computational mesh)
  Spatial resolution/scale: Depends on the input data
  Applications: Urban wind energy potential

Smart City Energy Platform
  Input data: Vectors of building polygons, tables of building features, raster maps of rooftop solar irradiance
  Output: Rooftop photovoltaic potential added to attribute tables for the input building polygons
  Spatial resolution/scale: House-level suitability
  Applications: Urban solar energy potential

DECoRuM
  Input data: 3D city model
  Output: Roofs with enough area to fit solar panels
  Spatial resolution/scale: House-level suitability
  Applications: Urban solar energy potential

Urban building energy consumption and microclimate analysis

Smart City Energy Platform
  Input data: Aggregated energy consumption, vector point layers of buildings, tables of housing characteristics for each address, and a vector polygon layer representing the level at which results are to be shown
  Output: Disaggregated predicted energy consumption
  Spatial resolution/scale: Depends on the input vector polygon layer – can be done at the unit, block, or zip-code level
  Applications: Urban building energy consumption

ENVI-met
  Input data: Vegetation and soil type, building materials, building height
  Output: Comprehensive simulation results of factors that affect thermal comfort, such that they can be integrated to develop indicators
  Spatial resolution/scale: Neighborhood level
  Applications: Surface–plant–air interactions

Townscope
  Input data: Three-dimensional landscape renderings, meteorological profiles, foliage density, latitude, and altitude
  Output: Simulation results of solar access and thermal comfort
  Spatial resolution/scale: Neighborhood level
  Applications: Solar access and thermal comfort

RayMan
  Input data: Building and vegetation properties, meteorological data (wind/cloud parameters)
  Output: Radiation flux densities, sunshine duration, shadow spaces, and thermo-physiologically relevant assessment indices
  Spatial resolution/scale: Neighborhood level
  Applications: Solar radiation and mean radiant temperature

AUSSSM
  Input data: Building design, meteorological, and HVAC systems data
  Output: Parameters including air temperature, exhaust heat from air-conditioning systems, etc.
  Spatial resolution/scale: Neighborhood level
  Applications: Energy heat balance within the urban canopy structure

Urban energy consumption and greenhouse gas emissions analysis

HEAT
  Input data: Roof material, fuel type, and preferred temperature ranges
  Output: Stored Google Maps imagery with thermal infrared energy data overlaid, providing information on annual home energy usage, energy costs, and GHG (c) emissions
  Spatial resolution/scale: Community and home level
  Applications: Energy efficiency upgrades

Energy and Environmental Prediction Platform
  Input data: 3D city model, property age, and building characteristics
  Output: Feature classes with predicted total domestic GHG emissions
  Spatial resolution/scale: Individual unit or regional level
  Applications: Urban building energy consumption and GHG emissions

DECoRuM
  Input data: Building materials, building age, energy usage survey data, and thermal imaging
  Output: GHG emissions, fuel use, and energy costs in residential buildings; map of house-by-house results
  Spatial resolution/scale: Depends on the level specified – can be by the unit, street, or district level
  Applications: Urban building energy consumption and GHG emissions

GOSOL
  Input data: Building type and proportions, site design, topography, and vegetation
  Output: Building energy demand and GHG emissions, solar radiation and shading, energy balance for particular surfaces
  Spatial resolution/scale: Building or neighborhood level
  Applications: Urban building energy consumption and GHG emissions

(a) Airflow Analyst for ArcGIS is a third-party extension for ArcGIS.
(b) TIN refers to triangulated irregular network.
(c) GHG refers to greenhouse gas.

Therefore, the assessment of urban microclimate is necessary for researchers to identify and create targets for reducing energy consumption in buildings. Several models exist for determining and simulating the urban microclimate and can be used alongside ArcGIS. ENVI-met is a three-dimensional microclimate model that simulates surface–plant–air interactions in the urban environment. The required inputs are detailed data pertaining to meteorological profiles, soil wetness, and the properties of ground surfaces, buildings, and vegetation. The model calculates air movement patterns and particle dispersion as well as temperature interactions among the natural and built features of an urban landscape (Yang et al., 2013). Similar to ENVI-met, Townscope conducts three-dimensional modeling of solar radiation and temperatures in urban spaces using meteorological parameters and vegetation masks (Freitas et al., 2015; TownScope, 2013). RayMan is another model designed for analyzing the urban microclimate. Its data inputs are meteorological information (e.g., air temperature, humidity, air transparency, cloud cover), the time of day and time of year, the albedo (reflectivity) and angles of landscape features, and the spatial relationships of landscape features (necessary for modeling shading) (Matzarakis and Frohlich, 2009). Finally, the AUSSSM Tool models urban microclimates; its outputs include energy heat balance, waste heat produced by buildings, and air temperature. It utilizes a large number of data inputs relating to urban and building design (e.g., building proportions and number of floors, building materials, building reflectivity, ground cover type and soil condition, heat generation and transfer by buildings, roof material or cover type); HVAC systems (system type, exhaust pipe altitudes); and meteorology (e.g., solar radiation, regional air temperature, humidity, wind velocity) (Tanimoto et al., 2003).
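The microclimate-to-cooling-energy link described above is often summarized to first order with cooling degree days. The short sketch below computes them from invented daily mean temperatures; the balance-point temperature is an assumption, not a standard value.

# Cooling degree days (CDD): a first-order proxy for air-conditioning demand.
BASE_T = 22.0  # degC balance-point temperature (an illustrative assumption)

daily_mean_t = [24.5, 26.1, 27.8, 23.0, 21.4, 25.2, 28.9]  # one hot week (invented)
cdd = sum(max(0.0, t - BASE_T) for t in daily_mean_t)
print(f"CDD for the week: {cdd:.1f} degree-days")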

2.09.4.3 Energy Consumption/Emission Assessment Tools and Models

With regard to energy consumption and emission assessment, much of the modeling effort has been directed at individual residents and dwellings. One typical GIS-based tool is Home Energy Assessment Technologies (HEAT), a free online GeoWeb mapping service designed to assist residents of Canadian cities in assessing the thermal efficiency of their dwellings. HEAT uses recorded thermal infrared energy data and cadastral datasets within Geographic Object-Based Image Analysis (GEOBIA) software, which generates three maps at the residential, community, and home levels (Hay et al., 2010). The most noteworthy feature of HEAT is that the model displays its data within Google Maps, making it easy for residents to navigate to their own homes. HEAT's data are timely, in-depth, easy to use, and location-specific. Using these data, planners and contractors can identify communities for marketing and for incentivizing energy efficiency upgrades. DECoRuM is a GIS-based toolkit that assesses baseline domestic energy use and greenhouse gas emissions at the individual home level, identifies carbon emission hotspots, predicts the potential for reduction, and monitors reductions achieved as a result of deployed energy efficiency measures and renewable energy systems (Droege, 2011). One benefit of using DECoRuM is that it requires less input data to estimate baseline energy use and greenhouse gas emissions than its predecessor, the BREDEM-12 model (Droege, 2011). Required data include primary data for every dwelling, and outputs are produced for individual dwellings. DECoRuM is a useful tool for urban planners developing carbon emission reduction plans, as it can help them establish baseline energy use and the potential for reductions, and monitor energy efficiency improvements. DECoRuM is an award-winning model and is widely respected as a tool for carbon emission assessment and reduction. The data used in Gupta's study (2009) were locally relevant and simplified to make input data easier to collect and reuse in the future. The findings of the study corresponded to other national findings, suggesting that the model's calculations are accurate. The filtering criteria for identifying dwellings suitable for installing energy-saving measures are time-saving and useful for planners identifying target areas. The model provides a useful visual aid, in the form of thematic maps, for encouraging householders to install energy efficiency measures or for presentations to developers. Finally, the cost-benefit analysis enables the cost comparison of different measures (a toy comparison is sketched at the end of this section), again making it a useful resource for planners. GOSOL is a German urban environmental analysis model that evaluates solar radiation and shading, greenhouse gas emissions, energy balance for particular surfaces, and building energy demand. Required inputs include building type and proportions, site design, topography, and vegetation (Energy City, 2011; Freitas et al., 2015). Lastly, Autodesk's Ecotect Analysis and Green Building Studio model energy use (including heating and cooling loads), carbon dioxide emissions, and solar radiation (including daylight illuminance and shadow analysis). This software is designed to support architects in designing sustainable buildings and retrofitting older buildings to improve performance. Users input data relating to the thermal zone of the area, building type, projected numbers of occupants, and various other design parameters of the building and surrounding landscape features (Cadpoint, 2012). These two models are not GIS-based, although their outputs can be integrated into GIS software for further analysis.
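A cost comparison of the kind performed in DECoRuM's cost-benefit step can be sketched as a simple payback ranking; all costs and savings below are invented examples, not DECoRuM data.

# Rank candidate retrofit measures by simple payback (cost / annual saving).
measures = {
    "loft_insulation": {"cost": 400.0, "annual_saving": 120.0},
    "condensing_boiler": {"cost": 2300.0, "annual_saving": 310.0},
    "rooftop_pv": {"cost": 6000.0, "annual_saving": 550.0},
}

for name, m in sorted(measures.items(),
                      key=lambda kv: kv[1]["cost"] / kv[1]["annual_saving"]):
    print(f"{name}: payback {m['cost'] / m['annual_saving']:.1f} years")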

2.09.5 Conclusions

This article provides an overview of potential GIS applications in the context of urban energy analysis. GIS applications for site suitability analyses and for analyzing the usage and emissions associated with existing urban energy systems have been covered. In particular, while ArcGIS can be used for many of these analyses, there are other notable GIS-based platforms, including Smart City Energy, the energy and environmental prediction model, HEAT, and DECoRuM, which are designed to enable city planners to model the energy consumption and emission reductions associated with various mitigation strategies, including increased renewable energy usage within cities. This article also provides a holistic review of existing GIS models and tools for urban energy analysis. Based on the review results, it is evident that various GIS-based tools and models for urban energy analysis already exist. Three important categories of models/tools have been identified and analyzed: (1) renewable energy potential analysis tools; (2) urban building energy consumption/microclimate tools and models; and (3) energy consumption/emission assessment tools and models. It is worth noting that the existing GIS tools and models for renewable energy potential analysis are in general simple. In order to analyze renewable energy potential within detailed urban morphologies, more sophisticated models that employ technologies such as remote sensing (RS) and light detection and ranging (LIDAR), with large-scale local three-dimensional data extraction and management, should be further developed. It is also worth noting that current renewable energy potential tools mainly focus on solar and wind energy. Further, the current urban energy consumption and greenhouse gas emissions models are limited to a single spatial scale, such as the neighborhood or building scale. Modeling urban energy consumption with GIS at multiple scales is a promising area to explore. Finally, there is great potential to integrate GIS into current urban energy system models to support evidence-based urban energy infrastructure planning, especially in the context of transitioning from centralized generating systems to smarter, decentralized systems.

References

Airflow Analyst (2016) Features. Retrieved from http://airflowanalyst.com/en/function.html.
Aksoezen, M., Daniel, M., Hassler, U., Kohler, N., 2015. Building age as an indicator for energy consumption. Energy and Buildings 87, 74–86.
Amado, M., Poggi, F., 2014. Solar urban planning: A parametric approach. Energy Procedia 48, 1539–1548.
Aydin, N.Y., Kentel, E., Duzgun, S., 2010. GIS-based environmental assessment of wind energy systems for spatial planning: A case study from Western Turkey. Renewable and Sustainable Energy Reviews 14 (1), 364–373.
Balta, C., 2014. GIS-based energy consumption mapping (master's thesis). The University of Edinburgh. Retrieved from https://www.era.lib.ed.ac.uk/handle/1842/10341.
Behera, S.N., Sharma, M., Dikshit, O., Shukla, S.P., 2011. Development of GIS-aided emission inventory of air pollutants for an urban environment. INTECH Open Access Publisher, Rijeka, Croatia.
Belles R, Copinger DA, Mays GT, Omitaomu OA, and Poore III WP (2013) Evaluation of suitability of selected set of coal plant sites for repowering with small modular reactors. ORNL/TM-2013/109. Oak Ridge National Laboratory (ORNL).
Cadpoint (2012) Autodesk Ecotect Analysis 2011. Retrieved from http://www.cadpoint.co.uk/ecotectanalysis/.
Capezzali M and Cherix G (2012) MEU – A cartographic-based web-platform for urban energy management and planning. ESRI World Summit, Palm Spring.
Charabi, Y., Gastli, A., 2011. PV site suitability analysis using GIS-based spatial fuzzy multi-criteria evaluation. Renewable Energy 36 (9), 2554–2561.
Droege, P. (Ed.), 2011. Urban energy transition: From fossil fuels to renewable power. Elsevier, Oxford.
Energy City (2011) Existing urban energy models. Retrieved from www.energycity2013.eu/pages/results/data-catalogue/existing-urban-energy-models.php.
ESRI (2009a) GIS to meet renewable energy goals. Retrieved from http://www.esri.com/news/arcnews/fall09articles/gis-to-meet.html.
ESRI (2009b) Mapping the solar potential of rooftops. Retrieved from http://www.esri.com/news/arcnews/fall09articles/mapping-the-solar.html.
ESRI (2016) An overview of the solar radiation tools. Retrieved from http://desktop.arcgis.com/en/arcmap/10.3/tools/spatial-analyst-toolbox/an-overview-of-the-solar-radiation-tools.htm.
Freitas, S., Catita, C., Redweik, P., Brito, M.C., 2015. Modelling solar potential in the urban environment: State-of-the-art review. Renewable and Sustainable Energy Reviews 41, 915–931.
Fumo, N., Biswas, M.R., 2015. Regression analysis for prediction of residential energy consumption. Renewable and Sustainable Energy Reviews 47, 332–343.
Gadsden, S., Rylatt, M., Lomas, K., 2003. Putting solar energy on the urban map: A new GIS-based approach for dwellings. Solar Energy 74 (5), 397–407.
Gupta R (2009) Carbon emission reduction model: A new GIS-based approach. Arizona State University.
Guttikunda, S.K., Calori, G., 2013. A GIS based emissions inventory at 1 km × 1 km spatial resolution for air pollution analysis in Delhi, India. Atmospheric Environment 67, 101–111.
Hassaan, M.A., 2015. A GIS-based suitability analysis for siting a solid waste incineration power plant in an urban area. Case study: Alexandria Governorate, Egypt. Journal of Geographic Information System 7 (6), 643.
Hay, G.J., Hemachandran, B., Chen, G., Kyle, C., 2010. HEAT – Home energy assessment technologies: A web 2.0 residential waste heat analysis using GEOBIA and airborne thermal imagery. In: Proceedings of the ISPRS Conference GEOBIA. Ghent, Belgium.
Hofierka, J., Kaňuk, J., 2009. Assessment of photovoltaic potential in urban areas using open-source solar radiation tools. Renewable Energy 34 (10), 2206–2214.
Howard, B., Parshall, L., Thompson, J., Hammer, S., Dickinson, J., Modi, V., 2012. Spatial distribution of urban building energy consumption by end use. Energy and Buildings 45, 141–151.
Idris R and Latif ZA (2012) GIS multi-criteria for power plant site selection. In: Proceedings of IEEE Control and System Graduate Research Colloquium (ICSGRC), pp. 203–206. IEEE.
Jones, P., Williams, J., Lannon, S., 2000. Planning for a sustainable city: An energy and environmental prediction model. Journal of Environmental Planning and Management 43 (6), 855.
Karteris, M., Slini, T., Papadopoulos, A.M., 2013. Urban solar energy potential in Greece: A statistical calculation model of suitable built roof areas for photovoltaics. Energy and Buildings 62, 459–468.
Krüger, A., Kolbe, T.H., 2012. Building analysis for urban energy planning using key indicators on virtual 3D city models – The energy atlas of Berlin. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 39 (B2), 145–150.
Mastrucci, A., et al., 2014. Estimating energy savings for the residential building stock of an entire city: A GIS-based statistical downscaling approach applied to Rotterdam. Energy and Buildings 75, 358–367.
Mattinen, M.K., Heljo, J., Vihola, J., Kurvinen, A., Lehtoranta, S., Nissinen, A., 2014. Modeling and visualization of residential sector energy consumption and greenhouse gas emissions. Journal of Cleaner Production 81, 70–80.
Matzarakis A and Frohlich D (2009) RayMan. Retrieved from http://www.urbanclimate.net/rayman/.
Miller, A., Li, R., 2014. A geospatial approach for prioritizing wind farm development in Northeast Nebraska, USA. ISPRS International Journal of Geo-Information 3 (3), 968–979.
Morano, P., Locurcio, M., Tajani, F., 2015. Energy production through roof-top wind turbines: A GIS-based decision support model for planning investments in the City of Bari (Italy). In: International Conference on Computational Science and Its Applications. Springer, New York, NY, pp. 104–119.

GIS for Urban Energy Analysis

195

Nguyen, H.T., Pearce, J.M., 2010. Estimating potential photovoltaic yield with r. sun and the open source geographical resources analysis support system. Solar Energy 84 (5), 831–843. Nouvel R, Zirak M, Dastageeri H, Coors V, and Eicker U, (2014) September. Urban energy analysis based on 3D city model for national scale applications. In: IBPSA Germany Conference. RWTH Aachen University. Parshall, L., Gurney, K., Hammer, S.A., Mendoza, D., Zhou, Y., Geethakumar, S., 2010. Modeling energy consumption and CO2 emissions at the urban scale: Methodological challenges and insights from the United States. Energy Policy 38 (9), 4765–4782. Quan, S.J., Li, Q., Augenbroe, G., Brown, J., Yang, P.P.J., 2015a. A GIS-based energy balance modeling system for urban solar buildings. Energy Procedia 75, 2946–2952. Quan, S.J., Li, Q., Augenbroe, G., Brown, J., Yang, P.P.J., 2015b. Urban data and building energy modeling: A GIS-based urban building energy modeling system using the urbanEPC engine. In: Planning Support Systems and Smart Cities. Springer, New York, NY, pp. 447–469. Ratti, C., Baker, N., Steemers, K., 2005. Energy consumption and urban texture. Energy and Buildings 37 (7), 762–776. Redweik, P., Catita, C., Brito, M., 2013. Solar energy potential on roofs and facades in an urban landscape. Solar Energy 97, 332–341. Reiter S and Wallemacq V (2011) City energy management: A case study on the urban area of Liège in Belgium. Paper presented at the Third International Conference on Advanced Geographic Information Systems, Applications, and Services. February 23-28, Gosier, France. Resch, B., Sagl, G., Törnros, T., Bachmaier, A., Eggers, J.B., Herkel, S., Narmsara, S., Gündra, H., 2014. GIS-based planning and modeling for renewable energy: Challenges and future research avenues. ISPRS International Journal of Geo-Information 3 (2), 662–692. Rylatt, M., Gadsden, S., Lomas, K., 2001. GIS-based decision support for solar energy planning in urban environments. Computers, Environment and Urban Systems 25 (6), 579–603. Tanimoto J, Hagishima A, and Chimklai P (2003) Development of an advanced computer tool, AUSSSM Tool, for a coupling simulation with building thermal system and urban climatology. In: IBPSA Proceedings. Eindhoven, Netherlands. TownScope (2013) Software: Description. Retrieved from http://www.townscope.com/index.php?page¼software&lang¼EN&theme¼default. Valenzuela, C., Valencia, A., White, S., Jordan, J.A., Cano, S., Keating, J., Nagorski, J., Potter, L.B., 2014. An analysis of monthly household energy consumption among singlefamily residences in Texas, 2010. Energy Policy 69, 263–272. Voivontas, D., Assimacopoulos, D., Mourelatos, A., Corominas, J., 1998. Evaluation of renewable energy potential using a GIS decision support system. Renewable Energy 13 (3), 333–344. Voivontas, D., Assimacopoulos, D., Koukios, E.G., 2001. Assessment of biomass potential for power production: A GIS based method. Biomass and Bioenergy 20 (2), 101–112. Xie, H., Liu, G., Liu, Q., Wang, P., 2014. Analysis of spatial disparities and driving factors of energy consumption change in china based on spatial statistics. Sustainability 6 (4), 2264–2280. Yang, X., Zhao, L., Bruse, M., Meng, Q., 2013. Evaluation of a microclimate model for predicting the thermal behavior of different ground surfaces. Building and Environment 60, 93–104. Yeh, A.G.O., Li, X., 2001. A constrained CA model for the simulation and planning of sustainable urban forms by using GIS. Environment and Planning B: Planning and Design 28 (5), 733–753. 
Zhang, L., Yang, Z., Liang, J., Cai, Y., 2010. Spatial variation and distribution of urban energy consumptions from cities in China. Energies 4 (1), 26–38.

2.10 GIS in Climatology and Meteorology

Jürgen Böhner and Benjamin Bechtel, University of Hamburg, Hamburg, Germany © 2018 Elsevier Inc. All rights reserved.

2.10.1 Introduction
2.10.1.1 Setting the Scene
2.10.1.2 Structure and Content
2.10.2 Data Collection and Inventory
2.10.2.1 Ground-Based Observations
2.10.2.1.1 Meteorological station networks
2.10.2.1.2 Databases and data availability
2.10.2.1.3 Quality control and error detection
2.10.2.2 Remotely Sensed Data Sources
2.10.2.2.1 Remote sensing principles
2.10.2.2.2 Challenges in atmospheric remote sensing
2.10.2.2.3 Observed parameters and methods
2.10.2.3 Crowd Sensing
2.10.3 Data Analysis and Spatialization
2.10.3.1 Deterministic Interpolation
2.10.3.1.1 Thiessen polygons
2.10.3.1.2 Triangle interpolation
2.10.3.1.3 Inverse distance weighting interpolation
2.10.3.1.4 Trend surface analysis
2.10.3.1.5 Thin plate spline interpolation
2.10.3.2 Geostatistical Interpolation
2.10.3.2.1 Basics of kriging
2.10.3.2.2 Semivariance and variogram
2.10.3.2.3 Ordinary kriging and simple kriging
2.10.3.2.4 Universal kriging and regression kriging
2.10.3.3 Multivariate Statistical Approaches
2.10.3.3.1 Multivariate regression analysis
2.10.3.3.2 Geographically weighted regression
2.10.3.3.3 Machine learning algorithms
2.10.3.4 Validation and Accuracy Assessment
2.10.4 Climate Modeling and Application
2.10.4.1 Terrain Parameterization
2.10.4.1.1 Topographic solar radiation
2.10.4.1.2 Topographic temperature
2.10.4.1.3 Topographic precipitation
2.10.4.2 Land-Surface Parameterization
2.10.4.2.1 LSP types and development
2.10.4.2.2 Land surface temperature
2.10.4.2.3 Soil moisture and vegetation
2.10.4.2.4 Urban areas
2.10.4.3 Downscaling
2.10.4.3.1 Dynamical downscaling
2.10.4.3.2 Statistical downscaling
2.10.4.4 Environmental Applications
2.10.5 Summary and Conclusions
Acknowledgements
References
Further Reading

Abbreviations
AC Autocorrelation
ACP Annual cycle parameters
ADW Angular distance weighting
AEM Maximum of absolute deviation
AMV Atmospheric motion vectors
ANN Artificial neural networks
AOD Aerosol optical depth
ASTER Advanced spaceborne thermal emission and reflection radiometer
BLUEWS Boundary layer urban energy water scheme
CART Classification and regression tree
CDD Correlation decay distance
CMV Cloud motion vectors
CORDEX Coordinated regional climate downscaling experiment
DEM Digital elevation model
DKRZ German climate computing center
DWD German Weather Service (Deutscher Wetterdienst)
ERA European Centre for Medium-Range Weather Forecasts Re-Analysis
ESA European Space Agency
FTP File transfer protocol
GCMs General circulation models
GEO Geostationary orbit
GHCN Global historical climatology network
GIS Geographic information systems
GOES Geostationary operational environmental satellite
GWR Geographically weighted regression
HPC High performance computing
IDW Inverse distance weighting
IPCC Intergovernmental Panel on Climate Change
ISA International standard atmosphere
LEO Low earth orbit
LOOCV Leave-one-out cross-validation
LSP Land surface parameterizations
LST Land surface temperature
LUCY Large scale urban consumption of energy model
m.a.s.l. Meters above sea level
MAE Mean absolute error
METEOSAT Meteorological satellite
MODIS Moderate resolution imaging spectroradiometer
MOS Model output statistics
MRA Multivariate regression analysis
NCAR National Center for Atmospheric Research
NCEP National Centers for Environmental Prediction
NDVI Normalized difference vegetation index
NOAA National Oceanic and Atmospheric Administration
NRMSE Normalized root mean square error
NWPM Numerical weather prediction model
OK Ordinary kriging
PP Perfect prognosis
PRISM Parameter regression on independent slopes model
RADAR Radio detection and ranging
RCMs Regional climate models
RF Random forest
RMSE Root mean square error
RS Remote sensing
SK Simple kriging
SMOS Soil moisture ocean salinity
SOLWEIG Solar and longwave environmental irradiance geometry model
SRTM Shuttle radar topography mission
SUEWS Surface urban energy and water balance scheme
SVAT Soil-vegetation-atmosphere-transfer
SVF Sky view factor
TEI Terrain exposure index
TI Triangle interpolation
TIN Triangulated irregular network
TIROS-1 Television infrared observation satellite
TP Thiessen polygon
TPS Thin plate spline
TRMM Tropical rainfall measurement mission
TSA Trend surface analysis
UK Universal kriging
UMEP Urban multi-scale universal predictor
VGI Volunteered geographic information
WCRP World climate research program
WLI Windward-leeward index
WMO World meteorological organization
WUDAPT World urban database and access portal tools

2.10.1 Introduction

The importance of geographic information systems (GIS) in climate research has increased in recent decades. The recognition that climate variations and climatically determined processes are spatially explicit has paved the way for environmental research combined with sophisticated GIS tools and spatial distribution models. Climate research, though traditionally the core domain of climatologists and meteorologists, is an increasingly cross-sectorial task which requires multidisciplinary knowledge and accordingly integrates the specific scientific perceptions and methodological paradigms of a wide range of disciplines. In particular, the Intergovernmental Panel on Climate Change (IPCC) process from the First Assessment Report (FAR; IPCC, 1990) to the Fifth Assessment Report (AR5; IPCC, 2013) has expanded the scientific scope and fostered a more holistic assessment of climate change, its natural and socioeconomic drivers, its scale-dependent atmospheric processes, and its environmental and socioeconomic impacts. As a result, the professional designation "climate scientist" today comprises media scientists studying the public perception and presence of climate risks, as well as economists, e.g., balancing the social burdens and benefits of climate mitigation and adaptation measures. Additionally, climate modeling and the development of new, sophisticated climate models, which formerly fell primarily to atmospheric physicists and theoretical meteorologists (dynamicists), today involves bioclimatologists, biogeochemists, hydrometeorologists, physical geographers, mathematicians and information scientists (to name but a few). This reflects both the ongoing disciplinary specialization and the interdisciplinary cooperation required by the complexity of the climate system. The result has been an ever-growing demand for computationally efficient IT solutions and, to a certain extent, epistemological progress in climate research.

2.10.1.1 Setting the Scene

Despite this disciplinary diversification, almost all climate scientists in both basic and applied research, as well as decision makers across the numerous branches faced with climate issues (politics, economics, environmental planning, etc.), ultimately depend on spatially explicit information, today routinely collected, processed and/or presented with support from GIS. Indeed, a climate modeler who aims to improve the parameterization of surface processes in a mesoscale climate model might not necessarily be aware of the role of GIS in their research design, but will typically apply vector-to-raster conversion of land use/land cover, a fundamental GIS routine as old as GIS itself. Likewise, nonscientists, or rather anyone who owns a television set, consume GIS applications when viewing the daily weather map, since the predicted temperature or precipitation fields are typically a GIS visualization of the output of a numerical weather prediction model (NWPM), or even the result of a GIS-based spatial refinement (downscaling) of the NWPM forecasts. These two examples alone illustrate the range of GIS applications in climatology and meteorology, from rather simple geodata processing to quite complex modeling techniques. It is therefore not surprising that GIS has recently gained increasing attention in meteorology and climatology (Chapman and Thornes, 2003; Tveito et al., 2008; Dobesch et al., 2013). In view of the broad variety of opportunities to use GIS in climate-related surveys, research activities, and decision processes, it is clear that this chapter can only cover a selection of elementary GIS applications in this disciplinary context.

Fig. 1 Conceptual model of a three-stage research process aiming at climate and environmental modeling. The stages of research (data collection & inventory; data analysis & spatialization; modelling & application) are plotted against increasing methodical and scientific sophistication, linking climate records, remotely sensed data, land use/vegetation, digital terrain data, reanalysis series and GCM/RCM data to continuous climate layers and, finally, to climate and process models.

Considering the core competencies of contemporary geospatial science and the capabilities of current GIS, we see three major methodical fields of GIS application:

1. Data collection and inventory.
2. Data analysis and spatialization.
3. Climate modeling and application.

The inherent hierarchy of these principal application scopes and related tasks largely corresponds to the three-step GIS evolution scheme of Crain and MacDonald (1984), which describes the development and implementation of a comprehensive GIS as a three-stage process of core activities and tasks of differing time intensity. Assuming a research and development process ultimately targeting GIS-based environmental assessments and the modeling of climate-driven environmental processes, Fig. 1 adapts this GIS evolution scheme to the respective disciplinary needs in our context, in order to illustrate the thematic framework and structure of this contribution.

2.10.1.2 Structure and Content

The first stage of the research involves the selection, assessment and processing of data and metadata. These tasks are addressed in the "Data Collection and Inventory" section, where we highlight the principal sources of observational climate data: meteorological station observations, remotely sensed climate data, and crowdsourced data. Given that observations from meteorological networks and remotely sensed datasets seldom directly meet researchers' requirements on the spatial resolution of climate and weather data, the "data analysis and spatialization" activity faces the challenging task of predicting climate values for particular locations or areas without (sufficient) direct observations. Accordingly, the "Data Analysis and Spatialization" section introduces basic interpolation and spatial modeling techniques, exemplified for selected climate variables. From a climate science perspective, the development and application of geospatial methods is not only geared to the generation of areally extensive climate datasets but particularly aims to deepen the understanding of climatic phenomena and processes. This aim, however, can rarely be achieved by pure spatialization techniques; it rather requires the use of statistical or numerical modeling approaches covering elementary cause-effect relations between spatiotemporal climate variations and their atmospheric and topographic controls. Moreover, in a broader environmental research context, climate spatialization and modeling results are often required to assess (for example) the environmental risks of climate change or climate extremes. The "climate modeling and application" activity in the third stage of the schematic research process mirrors both the development and implementation of climate models and the frequent utilization of climate and weather data in environmental assessment and application. These aspects are finally targeted in the "Climate Modeling and Application" section, where we emphasize GIS-based terrain and surface parameterization techniques, supporting an improved representation of the small-scale, topographically forced climate variations which are particularly required in the downscaling of climate model outputs. Finally, we draw these elements together by exemplifying applications of climate and weather data in environmental assessments and process modeling.

Regarding the outlined conceptual model of a research process as a whole, it is worth noting that the meaning of "climate" evolves from an object of inventory in the first stage to a spatiotemporally highly variable, process-controlling subject in the final stage of research. Against this background, it should be obvious that the "methodical and scientific sophistication" increases within such a research process. While "data collection and inventory" as well as most of the "data analysis and spatialization" methods are well-established, readily available standard routines in current GIS software, the "climate and environmental modeling" task typically requires advanced programmable GIS environments, sufficient hardware resources, computationally efficient geodata and modeling structures, and frequently the utilization of GIS-external modeling tools and components. However, even if a research process has multiple aims, for example to support a spatially explicit, high-resolution assessment of degradation risks requiring the application of both a numerical climate model and a process-based soil erosion model, GIS in this context serves as an indispensable integrative platform for the interlinking of data, methods and modeling components and, finally, for the dissemination of research results with respect to the specific requirements of scientists, stakeholders and decision makers. This rather abstract representation will be elaborated on in the following sections.

2.10.2 Data Collection and Inventory

In general, the major primary sources of climate data for GIS applications are the databases and data portals of governmental agencies and authorities, given that the regular observation of climate and weather is mostly the responsibility of national weather services. In the following subsections, we highlight the main types of observational data usable in GIS. Of the various sources available, ground-based weather station observations are by far the most frequently used climate database in GIS, followed by the more recently available remote sensing (RS) products measured using active and passive RS techniques. Additionally, we introduce crowdsourcing as a novel means of extensive data acquisition which we expect to become an increasingly important GIS issue in the near future. Further data sources derived from primary observational data, such as spatially high-resolution gridded data products or atmospheric reanalyses, as well as climate model simulations, are considered in later sections.

2.10.2.1 Ground-Based Observations

Weather observations are likely as old as mankind, given the significance of climate and weather for almost all areas of human activity. Early evidence for observations of climate phenomena such as cloud formation, the beginning and duration of the rainy season, or the timing of phenological stages traces back to the early civilizations of Babylon, India and China, underlining the significance of climate and weather for human wellbeing. In a narrower sense, however, quantitative weather observations only became possible after the invention of the thermometer by Galileo Galilei (1564–1642) in 1593, the first scientific rain gauge measurements by Benedetto Castelli (1578–1643) in 1639, and the experimental evidence that "air has weight", provided in 1643 by Evangelista Torricelli (1608–1647), the inventor of the mercury barometer. These ingenious inventions mark not only the beginning of the quantitative era in meteorology; the principal recognition that climate varies from place to place, and the resulting scientific need for a systematic analysis of observations from different locations, is likely also one of the most important epistemological roots of the geospatial sciences in general.

2.10.2.1.1 Meteorological station networks

The first weather observing network was implemented as far back as 1654, shortly after the invention of these measuring techniques, and consisted of ten meteorological stations, mainly located in Italy, plus one station each in Innsbruck, Paris, Osnabrück and Warsaw. Funding by the Florentine Medici family enabled instrumentation, measurement and delivery of the recorded data at regular time intervals to the Accademia del Cimento in Florence. Although the coordinated data transfer and central collection of observed records resembled a modern meteorological network, the foundation of the Accademia del Cimento by students of Galileo was devoted to experimental physics rather than to atmospheric studies (Boschiero, 2007). A more disciplinary focus evolved in the Age of Enlightenment in the late 17th and 18th centuries with the constitution of mostly national academies of sciences and meteorological societies such as the European Societas Meteorologica Palatina, founded in 1781, which fostered the installation of standardized station networks (Kington, 1974). In this context, it is quite remarkable that the imprint of these early academies and societies is still detectable even today, for instance in the timing of daily measurements and likewise in the regional differences in calculating daily mean temperatures. While many countries, including the United States, average the daily minimum and maximum temperature, the daily mean in Germany (up until 2000) was based on the "Mannheimer Stunden": the average of the temperatures recorded at 7 am and 2 pm, with the 9 pm reading counted twice, as originally proposed by the Societas Meteorologica Palatina. The historical propagation of instrumental weather observations, however, was of course not a purely scientifically driven process. The colonial interests of the European powers and their particular need for environmental information from overseas were likewise important drivers for the establishment of national meteorological departments (e.g., the India Meteorological Department, established in 1875), and this fostered the global spread of weather observatories, which, according to the US National Oceanic and Atmospheric Administration (NOAA), already counted more than 2000 stations at the end of the 19th century, mostly located in North America, Central Europe and Southeast Australia.
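As a worked example of the "Mannheimer Stunden" convention mentioned above, the daily mean is computed from the three fixed observation times, with the evening reading counted twice in lieu of a night observation:

\[
\bar{T}_{\mathrm{day}} = \frac{T_{07} + T_{14} + 2\,T_{21}}{4}
\]

where the subscripts denote the local observation times 7 am, 2 pm and 9 pm.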


These long-term records from historical networks differ in terms of instrumentation, homogeneity and spatial coverage, and often contain measurement gaps which, as in the case of the famous Central England Temperature record (since 1659), had to be filled with observational material from other locations (Manley, 1974); nevertheless, centennial or multicentennial records are the most valuable source for the direct quantification of global warming since industrialization (cf. IPCC, 2013). Even today, despite increasing RS capacities for climate monitoring, ground-based weather station records remain a fundamental building block of climate research. Weather data are recorded at variously equipped observatories, which may be broadly grouped into synoptic meteorological stations, weather stations and rain-gauge stations, each supporting different core tasks of national meteorological services. For GIS-based surveys this hierarchical differentiation is particularly relevant with respect to the different numbers of installations and thus the principal data availability and spatial coverage. In Germany, for instance, the weather network in 2016 included 172 synoptic stations (0.5 stations per 1000 km²), 576 weather stations (1.6 stations per 1000 km²) and 3162 rain gauges including weather stations (8.8 gauges per 1000 km²). In the following we briefly outline the basic features of these networks. Rain gauges form the densest networks, reflecting the need to account for the high spatial variability of precipitation. In its simplest form, precipitation is measured by a nonrecording rain gauge, where the total amounts of precipitation per unit time, mostly daily values, are gathered and manually read out. However, less labor-intensive self-recording rain gauges, particularly required in remote areas, are becoming increasingly common. Standard weather stations are ideally equipped for in situ measurements of the primary (i.e., directly measurable) meteorological variables: air temperature, pressure, humidity, solar radiation (often substituted by sunshine duration), wind speed and wind direction. Accordingly, the instrumentation of weather station networks includes a thermometer, barometer, hygrometer, pyranometer (or alternatively a sunshine recorder), anemometer and wind vane, typically installed at 2 m above ground. National weather station networks are the backbone of, and major source for, the statistical description of climate given in terms of means, variability and extremes of the major weather variables recorded over a longer time period of ideally at least three decades, frequently referred to as "climate normal periods" (currently 1981–2010). Since many GIS users tend to assume that this statistically aggregated material is the final product of a digital data stream originating at the station site, it is important to stress that many of these digital data sets, hosted for download by national or international agencies, originate from manual Stevenson screen (weather hut) measurements and are the result of long engagements of voluntary observers, often spanning decades. Despite the long-standing presence of analog measuring techniques, which continue today, automatic weather stations are increasingly becoming the standard, at least at the level of synoptic meteorological networks.

Automatic weather stations consist of weather sensors installed on masts at standard heights of 2, 3 or, more rarely, 10 m (the latter particularly used to measure wind speed and direction at a level less obstructed by surface obstacles) and a data logger unit, which collects, processes and stores the data from all sensors and moreover manages the communication protocols with remote servers. The high temporal resolution recording at synoptic times and the real-time exchange of ground-based observations with regional or global synoptic networks enable the monitoring of weather conditions over broad domains and thus provide an essential observational component for weather forecasting.

2.10.2.1.2 Databases and data availability

Baseline climatologies and monthly, daily, hourly and even sub-hourly resolution climate records are increasingly available through a variety of national and international data portals. Particularly in the United States, long-standing data policies have fostered free access to national weather records, while in many European states the slower pace of open-data policies has only recently eased the open availability of climate and environmental data, at least in some countries. Although almost all European national weather agencies provide climate data for research purposes, free access to climate data for the general public is still being implemented. More restrictive data policies are particularly evident in some Asian states, which often provide only a minor selection of climate data, such as baseline climatologies for a few locations, while in developing countries the often pressing economic constraints are reflected in a poor or missing online presence of the responsible agencies. In the case of limited or missing web-based data access, queries for climate data may typically start with the database of the Global Historical Climatology Network (GHCN), which provides probably the most comprehensive collection of climate summaries from land surface stations, obtained from more than 20 sources. The GHCN data portal enables systematic query of and access to quite a large amount of station data and metadata, available for download via file transfer protocol (FTP). The entire station list comprises more than 100,000 stations, though climate data are only available for a subset of these. But even under the assumption of complete data availability for all these stations, their global distribution, shown in Fig. 2, reveals a core problem of climate surveys: the uneven coverage (and thus limited representativeness) of climate records in many areas of the world. At first glance, the distribution of stations does not necessarily reveal a clear gradient of station densities between the global north and south, nor that the distribution is somewhat biased by different national reporting policies; on closer inspection, however, the station coverage is often poorer in developing countries, particularly in Africa, which are actually assumed to be most endangered by climate change (IPCC, 2013), while the well-established geodata infrastructure, e.g., in the United States, in Central and Northern Europe, or in Australia, is mirrored by a quite dense observation network. India likewise has a remarkably high station density, likely reflecting the traditional significance of weather observations with respect to the important role of the South Asian monsoon system for India's agricultural sector and economy (Böhner, 2006). A further constraint of climate surveys concerns the spatial coverage of climate data with respect to different natural environments. Polar regions, rainforested areas in the moist tropics of South America and Africa, the Central Asian and African drylands, and high-mountain environments are rarely covered by station observations. The latter aspect is illustrated in Fig. 2 by an almost exponential decrease of the total number of weather stations with elevation.

Fig. 2 Spatial distribution of weather stations of the Global Historical Climatology Network (Peterson and Vose, 1997). The inset plots station elevation (m a.s.l., 0–6000 m) against the number of stations on a logarithmic axis.

Moreover, the stations at elevations above 4000 m a.s.l., especially those located in High Asia (Himalayas, Karakoram, Tibetan Plateau), are typically situated at (bioclimatically favorable) settlements in valleys and thus do not necessarily represent the actual high-mountain climate. Due to these data constraints, the acquisition of climate data often requires time-consuming, patience-demanding but highly rewarding requests to additional sources, for example national or federal environmental agencies, which may operate their own meteorological networks independently of the national weather service, or may already have successfully collected datasets from sources not accessible via the internet. The data situation for climate surveys in extreme high-mountain or desert environments, however, can ultimately only be improved by establishing additional weather stations.

For this chapter, we demonstrate the hierarchy of working steps and GIS applications using an example area in Germany. The area chosen (49,875 km²) captures a transition from the northern German lowlands, with elevations almost at sea level, to the central low mountain ranges, with elevations up to 1141 m a.s.l. at the Brocken in the Harz mountains (Fig. 3). Our example region therefore covers a fair range of different topographic settings, suitable for exemplifying the subsequent working steps and GIS applications. Starting with a data query at the GHCN data portal, the search for stations in this particular region yielded daily resolution climate records and metadata for only 26 weather stations, differing in terms of record length, available variables and completeness. Much better data coverage was obtained from the German Weather Service (Deutscher Wetterdienst, DWD), comprising daily and monthly resolution time series since 1971 for 128 rain gauge stations and 76 weather stations (Fig. 3). The data sets from the DWD are used in the subsequent sections to exemplify interpolation and spatialization techniques.

Fig. 3 Spatial distribution of weather stations in the German example region: GHCN stations (crosses), DWD weather stations (circles) and rain gauges (dots).
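A minimal sketch of such a station query, assuming a local copy of the GHCN-Daily station inventory; the fixed-width column layout used below follows the published ghcnd-stations.txt format but should be verified against the current documentation, and the bounding box values are purely illustrative:

```python
import pandas as pd

# Fixed-width column layout assumed from the GHCN-Daily readme:
# station id, latitude, longitude, elevation (m), station name.
colspecs = [(0, 11), (12, 20), (21, 30), (31, 37), (41, 71)]
names = ["id", "lat", "lon", "elev", "name"]
stations = pd.read_fwf("ghcnd-stations.txt", colspecs=colspecs, names=names)

# Illustrative bounding box roughly enclosing the German example region.
region = stations[
    stations["lat"].between(50.5, 54.0) & stations["lon"].between(7.5, 12.5)
]
print(len(region), "candidate stations")
```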

2.10.2.1.3 Quality control and error detection

While the data sets from national weather agencies or the GHCN are commonly quality controlled, an evaluation of the reliability of the data is particularly required when using records from less-established sources; this often reveals incomplete or missing metadata. With respect to use in GIS-based studies, the station coordinates and elevation should be checked first, given that incorrect information can severely hinder proper spatialization. Station elevations can easily be checked against elevation values obtained from a digital elevation model (DEM), and if contradictory station elevations are indicated in different sources (e.g., WMO station lists, metadata from a web portal, printed sources), the DEM supports the identification of the more reliable value. The screening of records should first identify obvious mistakes, e.g., negative precipitation values, temperature minima (maxima) above (below) the respective means, or records outside a physically plausible range, all easily detectable by automated logical queries. In the next step, a z-transformation of the records and the calculation of statistical exceedance probabilities, which in the case of precipitation may require a logarithmic transformation in advance, support an initial identification of questionable records (e.g., those with an exceedance probability of less than 0.01%). However, a purely statistical detection of extremes without a comparative analysis of neighboring stations bears the risk of misjudgment. Superior methods for quality control and error detection are offered by GIS-based interpolation methods in combination with cross-validation (see "Data Analysis and Spatialization" section). The logged result of the cross-validation protocol lists the differences between station records and interpolation results for each station, and thus enables the identification of stations with extreme residuals, which, particularly when occurring frequently, indicate low-quality or questionable records. Hijmans et al. (2005) applied this procedure of quality control, considering the largest root-mean-square residuals from cross-validation protocols of thin plate spline (TPS) interpolation as error indicators. Further methods used to identify abrupt changes and inhomogeneities in longer time series, typically induced by the relocation of a station site or changes in instrumentation, are more a topic of time series analysis than of GIS, and are thus not addressed in this chapter; for absolute and relative homogeneity tests and their application see, e.g., Yozgatligil and Yazici (2016). Although statistical and geostatistical methods are undoubtedly helpful for the evaluation of climate data, a simple visual quality check of the plotted time series is always recommended, given that frequent mistakes are sometimes hardly detectable automatically (cf. Xavier et al., 2008). Typical errors in this regard are missing values, quite often wrongly labeled as "0", which are more or less detectable depending on the measured quantity, unit and expected value. In precipitation series these wrongly attributed gaps often become more apparent through visual synoptic comparison of all variables, even though the statistical analysis of changing covariance patterns or the cross-validation from spatial interpolation may likewise be helpful in this respect.
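The plausibility and z-score screening described above can be sketched as follows; a minimal example assuming a pandas DataFrame with daily columns 'prcp' (mm) and 'tmin', 'tmean', 'tmax' (°C), where the 0.01% exceedance probability roughly corresponds to |z| > 3.9 under a normal assumption:

```python
import numpy as np
import pandas as pd

def quality_flags(df: pd.DataFrame) -> pd.DataFrame:
    """Flag physically implausible and statistically extreme daily records."""
    flags = pd.DataFrame(index=df.index)
    # Obvious mistakes, detectable by simple logical queries:
    flags["neg_prcp"] = df["prcp"] < 0
    flags["tmin_above_mean"] = df["tmin"] > df["tmean"]
    flags["tmax_below_mean"] = df["tmax"] < df["tmean"]

    # z-transformation of temperature records; |z| > 3.9 roughly marks an
    # exceedance probability below 0.01% under a normal assumption.
    z_t = (df["tmean"] - df["tmean"].mean()) / df["tmean"].std()
    flags["t_extreme"] = z_t.abs() > 3.9

    # Precipitation is log-transformed first (wet days only), as suggested
    # in the text, before the same z-score screening is applied.
    wet = np.log(df.loc[df["prcp"] > 0, "prcp"])
    z_p = (wet - wet.mean()) / wet.std()
    flags["prcp_extreme"] = (z_p.abs() > 3.9).reindex(df.index, fill_value=False)
    return flags
```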
A further mistake, more likely to be detectable by visual checks, may occur in precipitation series which have been labeled as daily resolution data but in fact are monthly totals, each assigned to the last day of a month while all other days are given as "0". In a long time series this may go undetected in statistical tests, but a visual check easily identifies the unrealistic temporal pattern. Furthermore, in the experience of the authors, an on-site visit to a weather station provides a good (though sometimes disillusioning) impression of the quality of the observations and distinctly eases access to data and metadata that may otherwise be unavailable. Lastly, the overall quality of observational material in terms of precision and completeness of course depends on the technical equipment and instrumentation of the station networks. The implementation of automated screens and the use of high-quality sensors are severely limited by their large maintenance costs. This is reflected in particular in the relative data sparseness in poorer and more remote locations. Even in the industrialized world, existing networks are not sufficient to fully represent the three-dimensional and microclimatic variations across the world, and have moreover been operating under increased pressure in recent decades, given the insatiable need for quality climate information. The most suitable approach for simultaneously gathering information with a synoptic view over wider areas is remote sensing, which is addressed in the following section.

2.10.2.2 Remotely Sensed Data Sources

Since the launch of the Television InfraRed Observation Satellite (TIROS-1), the first meteorological satellite, on Apr. 1, 1960, satellite remote sensing has become an important instrument for climate research. The growing availability of high-resolution multispectral data from weather and earth observation satellites (especially GOES, METEOSAT and NOAA), the new possibilities for continuous monitoring of hydrometeorological phenomena, and the development of innovative remote sensing techniques for the quantitative detection of atmospheric processes have fostered atmospheric RS. These data are now utilized in various subdisciplines of climate and atmospheric research. In particular, various sensors have enabled a deeper understanding of the earth-atmosphere system, as well as its better representation in NWPMs (Kidd et al., 2009; Thies and Bendix, 2011). Nowadays, the vast majority of observations assimilated into operational NWPMs originate from satellite remote sensing. In the following sections, selected methods are presented after a brief discussion of the fundamental principles and challenges of remote sensing approaches.


2.10.2.2.1 Remote sensing principles

Remote sensing in general is defined as the gathering of information about an object without being in physical contact with it. Here it is used more specifically as the sensing of the Earth from satellites (or aircraft) by means of electromagnetic radiation and its properties. According to the origin of the electromagnetic waves, passive and active remote sensing techniques are distinguished. In passive remote sensing, the radiation is either emitted by the object itself or originates from the sun, while in active remote sensing the radiation is emitted by the sensing platform itself, which illuminates the scene comparable to a flash in photography. In the latter two cases, the properties of the electromagnetic waves (such as spectral intensity and polarization) are altered by the object of interest through reflection, absorption or scattering, and hence allow inferences about the object itself. Sounders observe the vertical structure of the atmosphere, while imagers produce 2D representations of the Earth's surface or the atmosphere, which are especially suitable for further processing in GIS.

2.10.2.2.2 Challenges in atmospheric remote sensing

A physical limitation in satellite remote sensing is the orbit of the sensing platform, which is governed by Kepler's laws of planetary motion. The most common orbits are the geostationary orbit (GEO) and the low earth orbit (LEO). In the GEO, the satellite is placed above the equator with a revolution period of one day and thus remains at a fixed position relative to the Earth's surface. The trade-off is a distance of about 36,000 km to the surface, resulting in high temporal but low spatial resolution. Conversely, satellites in LEO have a ground distance of only about 500–1000 km, resulting in much higher spatial resolution but long revisit times of days to weeks. Most of them are in a sun-synchronous orbit, which means they always pass the equator at the same local time and thus enable macroscale spatial comparability but no observation of the diurnal cycle. The trade-off between spatial and temporal resolution is complemented by the spectral resolution, with narrower spectral bands also resulting in coarser spatial resolution. Another general challenge in remote sensing is the diverse origin of the radiation observed at the sensor. In particular, it typically comprises a mix of radiation reflected and emitted from the surface and radiation emitted or scattered by the atmosphere. While in atmospheric remote sensing the latter is the signal and the former needs to be corrected for, the perspective in terrestrial remote sensing is reversed. This makes the inversion (deducing the interfering object from the altered radiation) more challenging. Moreover, the radiation is typically directionally dependent (anisotropic).
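The geostationary distance quoted above follows directly from Kepler's third law: with Earth's gravitational parameter GM ≈ 3.986 × 10¹⁴ m³ s⁻² and one sidereal day T ≈ 86,164 s, the orbital radius is

\[
a = \left(\frac{GM\,T^2}{4\pi^2}\right)^{1/3} \approx 42{,}164\ \mathrm{km},
\]

which, after subtracting the Earth radius of about 6378 km, yields the familiar altitude of roughly 35,800 km above the surface.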

2.10.2.2.3 Observed parameters and methods

The most important atmospheric quantities observed by satellites comprise wind fields; radiation; cloud cover, type and height; precipitation; air temperature and water vapor; as well as other trace gases and aerosols. In the following, only selected methods are introduced; a comprehensive overview of missions and their applied techniques is given in the supplementary materials of Thies and Bendix (2011). Wind fields, providing information about the dynamics of the atmosphere, are highly relevant for NWP, and several techniques are used for their derivation. Most common are feature-tracking algorithms, which are differentiated into cloud motion vectors (CMV) and atmospheric motion vectors (AMV) according to the tracked features (clouds and water vapor, respectively). Ocean microwave emissivity and active RADAR backscatter can be exploited to estimate near-surface winds. The key driver of the large-scale atmospheric dynamics, namely the distribution of incoming and outgoing radiation, can likewise be measured comparatively straightforwardly at the top of the atmosphere from satellite sensors (Kidd et al., 2009). Moreover, various sensors have been developed to measure trace gases and aerosols, which are crucial for the energy balance and atmospheric chemistry. Gases are typically detected based on specific absorption features and include carbon dioxide, methane, ozone, nitrous gases and sulfur oxides. Aerosols (a dispersion of solid particles or liquid droplets in air) are characterized by the aerosol optical depth (AOD) and the Ångström exponent (the dependency on wavelength) and can be derived over oceans, with their low reflectance in the red and near-infrared region, while over land assumptions about the surface reflectance properties are needed (Thies and Bendix, 2011). An alternative approach exploits the polarization of light scattered by particles. Specific cases of great relevance are water vapor and clouds. Water vapor is the prime greenhouse gas and is detected at absorption features in the near-infrared, thermal infrared or microwave regions. Water vapor and air temperature are also observed as vertical profiles. In the latter case, the emission around an absorption band of a well-mixed gas is used, since the altitude of the contribution peak varies with optical thickness (the emissions from the lowest layers are entirely absorbed, while the highest layers contribute less due to their small density). Cloud detection algorithms are essential to both atmospheric and terrestrial remote sensing and are often grouped into empirical (implicit physics) and theoretical (explicit physics) approaches (Bankert et al., 2009), based on threshold and histogram approaches or on more complex cloud models, respectively. Moreover, relevant cloud properties such as optical thickness, liquid water path or cloud-top height are derived from remote sensing. The latter can be estimated from the cloud-top temperature observed in the thermal bands, and subsequently used to estimate precipitation with thresholds (cloud index methods). If the surface temperatures are estimated beforehand from their annual cycle, this can improve the cloud masking (Bechtel, 2012). Despite the simplicity of threshold-based methods, they are presently used by most geostationary systems. However, while they provide robust estimates for convective clouds, they are known to underestimate precipitation from stratiform clouds (Thies and Bendix, 2011).
Other methods for precipitation retrieval are based on passive and active microwave sensing, since larger droplets interact particularly strongly in this spectral region. The first spaceborne precipitation RADAR was launched in 1997 with the Tropical Rainfall Measurement Mission (TRMM).
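A cloud-index retrieval of the kind described above can be illustrated with a GPI-style threshold scheme; this is a deliberately simplified sketch, not the operational algorithm of any particular mission, where the 235 K threshold and 3 mm/h rate follow the classical GOES Precipitation Index:

```python
import numpy as np

# tb: infrared brightness temperatures (K) from a geostationary imager
# (illustrative values; a real scene would be a full image array).
tb = np.array([[221.0, 248.0, 233.5],
               [239.8, 210.4, 290.1]])

# Pixels colder than 235 K are treated as raining convective cloud tops
# and assigned a constant rate of 3 mm/h; all other pixels are set to zero.
rain_rate = np.where(tb < 235.0, 3.0, 0.0)
```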

2.10.2.3 Crowd Sensing

A more recent phenomenon is the widespread engagement of untrained citizens in the creation of volunteered geographic information (VGI), which "represents a dramatic innovation that will certainly have profound impacts on geographic information systems (GIS) and more generally on the discipline of geography and its relationship to the general public" (Goodchild, 2007). Crowdsourcing has a long tradition in disciplines such as astronomy and ecology, but it remains in its relative infancy in the atmospheric sciences (Muller et al., 2015), despite its high potential to overcome both the data sparseness and the high maintenance costs of sensing networks. This is largely due to justified concerns about the quality of crowd data, but partly also reflects the tradition of a discipline in which the standardization of measurements in protocols and norms is of high importance for achieving globally consistent datasets. Atmospheric crowd sensing is especially relevant in urban areas, which have the best coverage of VGI and a large variation of influencing factors and resulting microclimatic conditions, leading to limited representativeness of individual stations. Despite the general hesitancy, individual exploratory studies based on crowdsourced data have yielded very promising results. Overeem et al. (2013) applied a heat transfer model to derive daily mean air temperatures from smartphone battery temperatures for eight cities worldwide. Anderson et al. (2012) evaluated data collected from the on-board equipment of nine vehicles. Meier et al. (2017) showed that citizen weather stations are an alternative and cost-efficient source for the monitoring of urban climates, and that filtered data from 1500 stations in and around Berlin provided a robust estimate of hourly and daily urban air temperatures and revealed spatiotemporal characteristics of the urban heat island.

2.10.3 Data Analysis and Spatialization

To date, a variety of methods have been applied to generate continuous climate surfaces from point source data. The range of methods is already mirrored in the variety of terms used, which to a certain extent reflect disciplinary perceptions of the challenge of estimating areal climate layers from local observations. A meteorologist, for instance, may prefer the term "upscaling", given that weather station observations, each recorded at the local scale, are transformed onto a coarser grid network. "Regionalization" is likewise an often-used expression which, however, is ambiguous due to the original concept of regionalization as a method for the spatial discretization of homogeneous regions. Probably the most common term is "interpolation", which is an explicit part of the names of a variety of methods (e.g., inverse distance weighting interpolation) but is inadequate for regression-based approaches (see "Multivariate Statistical Approaches" section). Against this background, in this chapter we use "spatialization" as an increasingly accepted umbrella term for all kinds of techniques applied to generate spatial data from point observations, while "interpolation" is used more specifically for methods explicitly targeting the estimation of climate values at nonsampled locations between sample points. In the following, we introduce deterministic interpolation (see "Deterministic Interpolation" section), geostatistical "kriging" interpolation (see "Geostatistical Interpolation" section), and multivariate statistical spatialization approaches (see "Multivariate Statistical Approaches" section). Although this structure may be questioned, particularly by scientists with a stronger mathematical background, given that the elementary differentiation between deterministic and probabilistic principles is not stringently represented, in the context of this book we assume this structure to sufficiently capture the core problem of spatialization: the immediate dependency of the quality of results on the spatial distribution and density of point source observations on the one hand, and the ability of different spatialization approaches to tackle different input data situations on the other. Accordingly, in the "Validation and Accuracy Assessment" section we finally introduce validation measures, basically derived from cross-validation, and provide a comparative evaluation of methods and interpolation results obtained from long-term means and daily values of temperature and precipitation in our example region in Germany.

Assuming a research process which requires regularly gridded climate data (in fact by far the most common representation of continuous climate fields), in the following a target grid node located at the geographic or geodetic coordinates (xg, yg) is denoted G, and the respective climate value ṽg is the resulting estimate (or prediction) of the unknown value vg of the climate variable V of interest. Weather station locations are indicated as data points Pi (P1 to Pn) with the coordinates (xi, yi) (x1, y1 to xn, yn) at altitudes zi (z1 to zn), while the values vi (v1 to vn) denote the corresponding input data. Mapped results of the different spatialization approaches are inferred from the observational database of the example region, previously introduced in the "Databases and data availability" section, considering at first the long-term annual means of precipitation and temperature. Where the statistical accuracy of spatialization results is discussed in terms of the coefficient of determination (R²), all numbers refer to the validation measures described in the "Validation and Accuracy Assessment" section.

2.10.3.1 Deterministic Interpolation

The term "deterministic interpolation" designates mathematically exact (i.e., non-approximative) methods, explicitly designed to preserve the values of the input data exactly (or as exactly as possible) when generating a continuous surface from irregularly distributed data points. Deterministic interpolation is frequently differentiated into "local" and "global" approaches. The latter consider the entire input data set for each grid node, while local deterministic methods use only a subset of data points neighboring a target grid node. This differentiation, however, is only unambiguous for local Thiessen polygon (TP) and triangle interpolation (TI), while inverse distance weighting (IDW), trend surface analysis (TSA) and thin plate spline (TPS) allow both: application to the entire input data set or to a subset of data points in a moving window. Although TP and TI today play only a minor role in climate spatialization, we consider these methods for completeness, and because they ease the introduction to spatialization.

2.10.3.1.1 Thiessen polygons

The Thiessen polygon (TP) method, frequently also referred to as Voronoi tessellation (Zhu, 2016), is originally not targeted at the estimation of regularly gridded surfaces but aims to geometrically define the spatial "responsibility" of point observations or information at the points Pi, by delimiting the area that is closest to a particular data point. The optimal determination of TP is mostly performed using the Delaunay triangulation algorithm, which results in a gapless, non-overlapping triangulated irregular network (TIN) and moreover ensures a maximization of the minimum angle in each triangle (McCloy, 2005). Based on the triangle faces, each defined by three vertices and edges, the Thiessen polygons are geometrically defined by the perpendicular bisectors of the edges, halfway between adjacent vertices. As a result, the plane is discretized into a finite set of areas, each assigned to a specific observation point (or seed). Since no new values are estimated or interpolated, but only the observed values vi at the points Pi are assigned to the respective surface unit, TP is not an interpolation method in the narrower sense. However, the TP vector data operation enables "natural neighbor" interpolation, which estimates weighted means from neighboring values vi (cf. Hofstra et al., 2008), or may be directly rasterized to determine the nearest neighbor, i.e., the nearest neighboring point value for each grid node. Although both methods yield rather poor representations of continuous climate fields, the nearest neighbor method is useful when trying to separate spatially discrete climate phenomena, e.g., areas with or without precipitation.
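Rasterizing Thiessen polygons amounts to a nearest-neighbor query, which can be sketched with a k-d tree; coordinates and values below are illustrative, and scipy is assumed to be available:

```python
import numpy as np
from scipy.spatial import cKDTree

# Data points P_i (x_i, y_i) with observed values v_i (illustrative).
pts = np.array([[3.2, 1.1], [7.8, 4.5], [1.0, 6.3], [6.1, 8.2]])
vals = np.array([620.0, 740.0, 580.0, 810.0])

# Each grid node G takes the value of its nearest data point, i.e., of the
# Thiessen polygon (Voronoi cell) that contains the node.
xg, yg = np.meshgrid(np.linspace(0, 10, 200), np.linspace(0, 10, 200))
_, nearest = cKDTree(pts).query(np.column_stack([xg.ravel(), yg.ravel()]))
grid = vals[nearest].reshape(xg.shape)
```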

2.10.3.1.2 Triangle interpolation

Triangle interpolation (TI) generates spatially continuous surfaces for a target variable V. As with TP, TI commonly uses a Delaunay triangulation to generate a proper TIN from irregularly distributed data points as the basis for linear interpolation. Although the TIN is a frequently used vector data format for DEMs, given its relatively low storage requirements compared to gridded elevation data, non-overlapping triangles may likewise be used to represent irregularly distributed climate values. The interpolation of gridded values from a TIN is rather simple. Assuming three values v1, v2 and v3 observed at the points P1, P2 and P3 with the respective coordinates (x1, y1), (x2, y2) and (x3, y3), the spanned triangular surface enables a linear interpolation for each grid node G within the triangle. The interpolation result is then the "height" ṽg of the triangular surface perpendicularly above the point G. A weakness of the method is the assignment of values to triangles, which is not straightforward. A further deficit is that the result inherits artifacts from the triangular pattern or the polyhedral structure of the input data, with "distortions" at the boundaries. Fig. 4 exemplifies the temperature and precipitation surfaces obtained by TI. As shown by the distinct triangular facets, e.g., in the Harz mountains, TI tends to reproduce the TIN geometry, particularly in data-sparse areas with strong variations. Accordingly, TI is only recommended at high observation densities, to limit the influence of the triangular pattern.
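Triangle interpolation can be reproduced with scipy's LinearNDInterpolator, which internally constructs a Delaunay TIN and interpolates linearly within each triangle; a minimal sketch with illustrative data, where grid nodes outside the convex hull of the stations receive NaN:

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

# Illustrative station coordinates and observed values.
pts = np.array([[3.2, 1.1], [7.8, 4.5], [1.0, 6.3], [6.1, 8.2]])
vals = np.array([620.0, 740.0, 580.0, 810.0])

tin = LinearNDInterpolator(pts, vals)   # Delaunay TIN with linear facets
xg, yg = np.meshgrid(np.linspace(0, 10, 200), np.linspace(0, 10, 200))
grid = tin(xg, yg)                      # facet "height" at each grid node
```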

2.10.3.1.3 Inverse distance weighting interpolation

Inverse distance weighting (IDW) interpolation is a computationally efficient method which enables the calculation of spatially high-resolution gridded data sets from large volumes of irregularly distributed point source data. Accordingly, IDW has been particularly used by geodesists to generate DEMs, given the large amounts of point elevation data, e.g., from digitized isohypses of topographic maps. In climate surveys, IDW is likewise a well-established method, particularly used if the input data derive from a dense observational network. As the method's name indicates, IDW considers the reciprocals of the Euclidean distances di between the data points Pi and a target grid node G as weights when determining the value ṽg as the weighted arithmetic mean of the observed values vi:

\[
\tilde{v}_g = \frac{\sum_{i=1}^{n} w_i\, v_i}{\sum_{i=1}^{n} w_i} \quad \text{with} \quad w_i = \frac{1}{d_i^{\,p}} \tag{1}
\]
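Eq. (1) translates almost directly into code; a minimal dense implementation for moderate station counts, whereas production tools would typically restrict the computation to a search radius or the k nearest points:

```python
import numpy as np

def idw(pts, vals, grid_xy, p=2.0):
    """Inverse distance weighting after Eq. (1).

    pts: (n, 2) station coordinates, vals: (n,) observed values,
    grid_xy: (g, 2) target grid node coordinates, p: power parameter.
    """
    d = np.linalg.norm(grid_xy[:, None, :] - pts[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)       # guard against zero distance at stations
    w = d ** -p                    # weights w_i = 1 / d_i^p
    return (w * vals).sum(axis=1) / w.sum(axis=1)
```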

Accordingly, the weight wi of an observed value vi at a point Pi decreases with increasing distance di to the target grid node G (xg, yg). The power parameter p in Eq. (1) controls this distance-dependent weighting. A p of 0 results in a non-weighted arithmetic mean of all values, while for p > 1 nearer input data points are weighted more strongly than distant points. This effect strengthens with increasing p, and the estimate converges to the value of the closest input data point (nearest neighbor) as p approaches infinity. Besides this rather simple weighting scheme, the correlation decay distance (CDD) is sometimes employed to define the distance component of station weights, particularly when estimating monthly or daily climate surfaces from station records. Based on a cross-correlation matrix for the climate time series of a given set of stations and the respective distances between all station pairs, CDDs are estimated as the kilometer distances at which the correlation drops below a critical significance level (e.g., alpha = 0.05). In this procedure, for each station record the correlations ri with all other station records are plotted against the distances hi and fitted by an exponential decay function of the form (2). The distance where rc equals 1/e in Eq. (2) is then considered the CDD estimate used to define the distance-dependent station weights wi for IDW applications (3):

$$r_c = e^{-h_i/\mathrm{CDD}} \qquad (2)$$

$$w_i = \left(e^{-d_i/\mathrm{CDD}}\right)^m \qquad (3)$$
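As a minimal sketch of the distance-power weighting of Eq. (1) (the station coordinates and values below are illustrative, not data from the example domain):

```python
import numpy as np

def idw(px, py, v, gx, gy, p=2.0):
    """IDW estimate (Eq. 1) at grid nodes (gx, gy) from observations v
    at data points (px, py), with power parameter p."""
    # distances between every grid node and every data point
    d = np.hypot(gx[:, None] - px[None, :], gy[:, None] - py[None, :])
    d = np.maximum(d, 1e-10)              # guard against zero distances
    w = 1.0 / d**p                        # weights of Eq. (1)
    return (w * v).sum(axis=1) / w.sum(axis=1)

px, py = np.array([0.0, 10.0, 5.0]), np.array([0.0, 0.0, 8.0])
v = np.array([2.0, 4.0, 3.0])             # observed values
gx, gy = np.array([2.0, 7.0]), np.array([1.0, 4.0])
print(idw(px, py, v, gx, gy, p=2.0))
```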

Fig. 4 Annual mean temperature (°C, left; legend 2.6–9.6 °C) and precipitation (mm, right; legend 600–1600 mm) fields (1961–1990) estimated with different interpolation techniques: triangle interpolation (TI), inverse distance weighting (IDW), thin plate spline (TPS).

The empirical exponent m in Eq. (3) controls the gradient of the distance-dependent weights and needs to be iterated through cross-validation (New et al., 2000; Caesar et al., 2006). Given that the weights wi are a function of both the CDD and the distance di between input data points Pi and the target grid node G, the weighting scheme considers not only the distances but likewise the representativeness of the temporal climate variations observed at a particular station. Another improved weighting scheme, which addresses interpolation for areas with strongly varying station densities, is the so-called angular distance weighting (ADW) method. If we, for instance, estimate a climate value for a grid node from a set of, say, 11 station observations, 10 of them lying south and only 1 station at a comparable distance north of the grid node, the interpolation result will of course be largely determined by the southern stations, even if the northern location represents a spatial trend in the variable of interest and should therefore be weighted comparably to the entire southern cluster. To tackle this problem, ADW extends the weights of Eq. (3) by an angular distance term for each input data point Pi out of n points:

$$w_i = \left(1 + \frac{\sum_{n \ne i} w_n \left[1 - \cos(\theta_n - \theta_i)\right]}{\sum_{n \ne i} w_n}\right)\left(e^{-d_i/\mathrm{CDD}}\right)^m \qquad (4)$$

where θi is the angle of an input data point Pi to the north, relative to the target grid node G (Caesar et al., 2006). The extension of the weighting by the angular distance term ensures that values of more isolated input data points obtain a greater weight (Hofstra and New, 2009). Although ADW is sometimes regarded as an independent interpolation method, the Euclidean distance is still an essential parameter and warrants the designation of ADW as a specific IDW method.

Today IDW and its derivatives are widely implemented standard routines in GIS software, which may be applied to a subset of observations by either limiting the search radius or the number of input data points to be considered when determining a value for a grid node. If IDW is applied to the entire input data set, it is by definition a global deterministic interpolation approach, but the strong imprint of input values close to grid nodes remains distinct. The major disadvantage of IDW interpolation is the frequent production of "bull's eyes," i.e., more or less circular patterns clearly indicating the positions of input data points. This problem becomes apparent in Fig. 4, e.g., with somewhat isolated high precipitation pockets in the Harz mountains indicating the location of the Brocken Observatory. Due to these constraints, IDW interpolation requires densely and preferably evenly distributed input data, sufficiently representing the spatial variations of the target climate variable. Against this background, more sophisticated interpolation methods are nowadays increasingly preferred in climate studies. However, referring to Xavier et al. (2008), who developed high-resolution interpolated grids (0.25 × 0.25 degrees) of daily precipitation, temperature, solar radiation, relative humidity, and wind speed for Brazil, the comparative evaluation of alternative interpolation schemes based on cross-validation results clearly reveals the competitiveness of IDW even against thin plate spline (see "Thin plate spline interpolation" section) and kriging interpolation (see "Basics of kriging" section).
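To make the angular term of Eq. (4) concrete, the following sketch computes ADW weights for a single grid node; the coordinates, the CDD and the exponent m are illustrative assumptions. For the constellation described above (ten clustered southern stations and one isolated northern station), the northern station obtains a markedly larger weight:

```python
import numpy as np

def adw_weights(px, py, gx, gy, cdd, m=4.0):
    """Angular distance weights (Eq. 4) of stations (px, py) for one
    grid node (gx, gy); cdd is the correlation decay distance."""
    d = np.hypot(px - gx, py - gy)
    w_cdd = np.exp(-d / cdd) ** m                 # distance weight, Eq. (3)
    theta = np.arctan2(px - gx, py - gy)          # bearing of each station
    w = np.empty_like(w_cdd)
    for i in range(w_cdd.size):                   # angular term over the other stations
        o = np.delete(np.arange(w_cdd.size), i)
        a = (w_cdd[o] * (1.0 - np.cos(theta[o] - theta[i]))).sum() / w_cdd[o].sum()
        w[i] = (1.0 + a) * w_cdd[i]
    return w / w.sum()

px = np.array([0.0, 1.0, -1.0, 2.0, -2.0, 0.5, -0.5, 1.5, -1.5, 0.2, 0.0])
py = np.array([-10.0] * 10 + [10.0])              # 10 southern, 1 northern station
print(adw_weights(px, py, 0.0, 0.0, cdd=50.0))    # the last (isolated) weight is largest
```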

2.10.3.1.4 Trend surface analysis

Interpolation based on trend surface analysis (TSA) employs a spatial trend function to predict the value ṽg for a particular grid node G, either by fitting a plane or a curved surface to all input values vi (global deterministic interpolation) or to a subset of them (local deterministic interpolation). In simple terms, TSA is a multiple regression with the statistically dependent target variable V and the x- and y-coordinates as independent variables (statistical predictors). In its simplest form, a spatial trend function sv(x, y) is determined by the parameters of a two-fold linear regression:

$$s_v(x, y) = a_0 + b_1 x + b_2 y \qquad (5)$$

with

$$b_1 = \frac{\sum_{i=1}^{n} y_i^2 \sum_{i=1}^{n} x_i v_i - \sum_{i=1}^{n} x_i y_i \sum_{i=1}^{n} y_i v_i}{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} x_i y_i\right)^2}, \qquad
b_2 = \frac{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i v_i - \sum_{i=1}^{n} x_i y_i \sum_{i=1}^{n} x_i v_i}{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} x_i y_i\right)^2},$$

$$a_0 = \frac{1}{n}\sum_{i=1}^{n} v_i - \frac{b_1}{n}\sum_{i=1}^{n} x_i - \frac{b_2}{n}\sum_{i=1}^{n} y_i$$

where xi and yi are the x- and y-coordinates of the data points Pi with the input values vi (taken as deviations from their respective means in the slope terms), b1 and b2 are the regression coefficients determining the slope of the trend surface sv(x, y) in the x- and y-directions, and a0 is the constant term (intercept) of the regression. Although the observed input values vi at the data points Pi may distinctly differ from the trend surface, which to a certain extent relativizes the previously outlined idea of deterministic interpolation as a mathematically exact method, Eq. (5) determines the regression parameters a0, b1 and b2 by minimizing:

$$\sum_{i=1}^{n} \left(v_i - \hat{v}(x_i, y_i)\right)^2 \qquad (6)$$

This so-called least-squares method minimizes the sum of the squared residuals (i.e., the squared deviations of the observed values vi from the respective trend surface values ṽi at the points Pi) and thus ensures that the overall deviations of the observed values from the trend surface are minimized. Since this approximation is mathematically exact, the classification of TSA as a deterministic approach is justified. If applied to the coordinates (xg, yg) of each grid node G of the grid, by:

$$\hat{v}_g = s_v(x_g, y_g) \qquad (7)$$

the resulting values form a sloping planar surface, which represents the spatial gradient (i.e., the gradual linear change) of the target variable in one direction (e.g., a unidirectional increase of precipitation from SSE to NNW). TSA, however, is not limited to producing planar surfaces as defined by the first-order (linear) regression (5); it can also produce curved surfaces, using higher-order polynomials estimated with the least-squares method, such as a second-order (quadratic) polynomial regression (8) or a third-order (cubic) polynomial regression (9):

$$s_v(x, y) = a_0 + b_1 x + b_2 y + b_3 x^2 + b_4 y^2 + b_5 xy \qquad (8)$$

$$s_v(x, y) = a_0 + b_1 x + b_2 y + b_3 x^2 + b_4 y^2 + b_5 xy + b_6 x^3 + b_7 y^3 + b_8 x^2 y + b_9 x y^2 \qquad (9)$$
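A minimal least-squares sketch of Eqs. (5)-(8): the design matrix columns hold the polynomial terms, and NumPy's lstsq minimizes the residual sum of squares of Eq. (6) (the synthetic plane below is an assumption for demonstration):

```python
import numpy as np

def fit_trend_surface(x, y, v, order=1):
    """Fit a polynomial trend surface by ordinary least squares;
    order=1 corresponds to Eq. (5), order=2 to Eq. (8)."""
    cols = [np.ones_like(x), x, y]
    if order >= 2:
        cols += [x**2, y**2, x * y]
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), v, rcond=None)
    return coef

def eval_trend_surface(coef, gx, gy):
    """Apply the fitted surface to grid node coordinates (Eq. 7)."""
    cols = [np.ones_like(gx), gx, gy]
    if coef.size > 3:
        cols += [gx**2, gy**2, gx * gy]
    return np.column_stack(cols) @ coef

rng = np.random.default_rng(0)
x, y = rng.uniform(0, 100, 50), rng.uniform(0, 100, 50)
v = 600.0 + 2.0 * x - 1.0 * y + rng.normal(0, 5, 50)   # noisy plane
coef = fit_trend_surface(x, y, v, order=1)
print(coef)                                             # close to [600, 2, -1]
print(eval_trend_surface(coef, np.array([50.0]), np.array([50.0])))
```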


Polynomials of even higher order increase the complexity of the fitted surface and the R², but complicate a meaningful interpretation of the resulting equation. Moreover, overfitting by higher-order polynomials bears the risk of generating overshoots, especially at the edges of a spatialization domain (see "Validation and Accuracy Assessment" section). Against this background, TSA should be used as a standalone spatialization method only when the target variable shows a distinct spatial trend. Today, TSA is mainly used to separate a spatial trend as part of other spatialization techniques such as thin plate spline (see "Thin plate spline interpolation" section) or universal kriging (see "Universal kriging and regression kriging" section).

2.10.3.1.5 Thin plate spline interpolation

Spline interpolation is a general term for methods that employ mathematical functions designed to minimize the curvature of an interpolated surface when estimating spatial values from scattered input data. The objective of generating a smooth (i.e., gradually changing) surface is addressed by different methods, with thin plate spline (TPS) interpolation being the one most commonly implemented in GIS. TPS interpolation methods generate gridded estimates in a manner that keeps the continuous surface as smooth as possible (analogously to a thin metal plate, hence "thin plate") while the input values at each data point are exactly or almost exactly preserved. Following the notation of Zhu (2016) and Lloyd (2010), for our purpose the general form of a TPS function can be expressed as:

$$\hat{v}(x, y) = s_v(x, y) + \sum_{i=1}^{n} \lambda_i r(d_i) \quad \text{with} \quad r(d_i) = d_i^2 \log d_i \qquad (10)$$

where ṽ(x, y) is the TPS function, x and y are the coordinates of the data points Pi, sv(x, y) is a spatial trend function, di are the distances to the data points, λi are weights and r(di) is the so-called basis function. Eq. (10) is expressed in a generic form (Lloyd, 2010), which is designed to minimize the curvature of the surface by minimizing the sum of the squares of the second derivative (i.e., the curvature) over each point Pi (Zhu, 2016). In simple terms, the TPS function comprises two components: (1) a spatial trend function (see "Trend surface analysis" section), and (2) a weighted basis function, which, applied to the coordinates (xg, yg) of each grid node G of a grid network, by:

$$\hat{v}_g(x_g, y_g) = s_v(x_g, y_g) + \sum_{i=1}^{n} \lambda_i r(d_i) \qquad (11)$$

results in a continuous climate surface with a smooth appearance. A comprehensive description of the determination of the weights λi and the parameters of the spatial trend function using matrix algebra is given in Lloyd (2010). Beyond this general form, TPS functions exist in a variety of different realizations; the methods "regularized spline" and "spline with tension" are the most established ones (for full mathematical descriptions see Mitas and Mitasova, 1999). Both methods are well suited to delineate smooth surfaces from irregularly distributed input data, but the results may require subsequent corrections, given that the estimated values frequently lie outside of the observed value range, exceeding (or undercutting) the observed maximum (or minimum) beyond physically plausible limits. This problem is more distinct when using the regularized spline function, which tends to create so-called overshoots, i.e., non-plausible extreme values, particularly in areas with low data density, even though the objective of minimizing the curvature of the surface is fulfilled. This is addressed by the tension spline with an adjustable tension parameter to control the curvature tolerance. Fig. 4 exemplifies an interpolation result obtained with the tension spline. Compared to the patchy IDW pattern, the smooth appearance of the temperature and precipitation fields is clearly an advantage, although topographic effects are barely captured. More advanced in this respect are TPS techniques which allow for the incorporation of additional independent variables (e.g., elevation), either as parametric linear submodels or directly considered in the spatial trend function (Hutchinson, 2004). A prominent example of the latter option of TPS interpolation is the WorldClim data set (Hijmans et al., 2005). Its spatially high-resolution (30 arc sec) monthly climate surfaces were generated using a second-order spline with latitude, longitude and elevation as independent variables. Fig. 12 gives examples of WorldClim spatialization results and illustrates that the orography is clearly represented.
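As an illustration, SciPy's radial basis function interpolator offers a thin plate spline basis r(d) = d² log d; the scattered input below is synthetic. A positive smooth parameter relaxes the exact interpolation toward a smoother surface, loosely comparable to the regularization options discussed above; newer SciPy versions provide the equivalent scipy.interpolate.RBFInterpolator with kernel='thin_plate_spline'.

```python
import numpy as np
from scipy.interpolate import Rbf

rng = np.random.default_rng(1)
x, y = rng.uniform(0, 100, 40), rng.uniform(0, 100, 40)
v = 8.0 + 0.02 * x - 0.01 * y + np.sin(x / 20.0)     # smooth synthetic field

# thin plate spline interpolant, cf. Eq. (10); smooth=0 keeps the
# input values at the data points exactly preserved
tps = Rbf(x, y, v, function='thin_plate', smooth=0.0)

gx, gy = np.meshgrid(np.linspace(0, 100, 5), np.linspace(0, 100, 5))
print(tps(gx, gy).round(2))                          # Eq. (11) at the grid nodes
```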

2.10.3.2 Geostatistical Interpolation

Geostatistics is a relatively young discipline based on the theory of regionalized variables introduced by Matheron (1963). This theory basically states that a locally quantified variable, such as temperature, generally comprises two components: a deterministic component representing an areal average (or a gradual change) of the respective variable, and a random component. Accordingly, a regionalized variable can be seen as the realization of a random function (random function model) in the form:

$$V = m_v + d_v \qquad (12)$$

where mv is the "first-order" deterministic component and dv is the "second-order" random component of the spatial variation of the variable V. From the perspective of a meteorologist or climatologist, the notion that a measured climate value is a priori an expression of a random process may contradict the geodetically exact georeferenced locations of weather stations, the use of standardized instruments or, more generally, the level of scientific understanding of the underlying processes. However, if we consider temperatures at a set of stations under a high fog (stratus) cloud, which locally dissolves during the course of the day, the


turbulent entrainment at the cloud top and the resulting cloud dissolution can be seen as a stochastic process given the great complexity of cloud physics. Accordingly, a forecast of which particular locations in such a situation will experience higher temperatures is rather a lottery. Moreover, if we assume that cloud dissolution affects all stations with the same probability, the mean and the variance of subsets of stations in different parts of the area should be comparable or, in other words, all values could be seen as realizations of the same random function model. This assumption is called stationarity of the random function model and is a basic assumption in geostatistics. Given that statistics generally rely on repeated observations to estimate mean and variation measures and their uncertainties, in geostatistics the stationarity assumption is required to obtain the necessary replications for spatial predictions. With respect to the diverse spatial-statistical characteristics of input data and the respective geostatistical methods to be used, this general assumption is further divided into different degrees of stationarity, of which the less restrictive intrinsic hypothesis is adequate for most geostatistical applications (Isaaks and Srivastava, 1989). Intrinsic stationarity assumes, first, that the expected values of the mean and the variance of a regionalized variable are spatially invariant and, secondly, that differences between point observations solely depend on their distance and direction, but not on their location. This distance dependency of spatial variations is addressed in geostatistical analyses and kriging interpolation. In the following sections, we outline the basics of kriging and the underlying concept of semivariography as a technique to model spatial dependencies, and introduce the major kriging derivatives used in GIS.

2.10.3.2.1 Basics of kriging

Kriging interpolation is frequently attributed to the South African statistician and mining engineer Danie Gerhardus Krige, who pioneered the field of geostatistics (Krige, 1951). The kriging approach itself, however, though named after Krige, was formalized by the French mathematician and engineer Georges Matheron (Matheron, 1963). Kriging may be described as a mathematical manifestation of Waldo Tobler's prominent "First Law of Geography," which states that "everything is related to everything else, but near things are more related than distant things" (Tobler, 1970). Expressed more generally, kriging is based on the assumption that environmental processes and settings (in this context weather and climate) at locations close to one another are more alike than those further apart. In accordance with this assumption, kriging predicts an unknown value vg of a variable V at a non-sampled location (e.g., a grid node) G by a weighted arithmetic mean of the values vi observed at n neighboring data points Pi. In contrast to IDW, the kriging weights λi are not only determined by the distances between the data points and the target location but also by the functional relation between variability and the distances hi between the points Pi, referred to as the spatial autocorrelation of the variable V. The consideration of spatial autocorrelation is based on the assumption that any interpolation between observed values vi is only meaningful if the variable V shows a neighborhood relation, or a spatial-statistical dependency, at neighboring points. Given that spatial autocorrelations vary with variables and regions of interest, this distance dependency of the variability of the target variable V needs to be analyzed for kriging applications using a variogram. The variogram analysis, introduced subsequently, is a central part of kriging interpolation, since it determines the weights λi for a statistically optimized (i.e., unbiased) prediction with minimized error variance. This statistical objective function is an important distinction from deterministic TPS interpolation. Although TPS comprises a trend component and a local component, quite similar to the random function model in geostatistics, the TPS basis function minimizes the curvature of the interpolated surface, which is a mathematical formalism to achieve a spatial representation of local deviations from the underlying spatial trend, while kriging interpolation minimizes the error variance and thus follows a statistical formalism in spatial prediction.

2.10.3.2.2 Semivariance and variogram

In simple terms, a variogram (or semivariogram) is a graph that expresses the dissimilarity of paired values vi as a function of the distances between the respective data points. The dissimilarity is statistically measured by the empirical (experimental) semivariance γ(h), which is defined as:

$$\gamma(h) = \frac{1}{2m} \sum_{j=1}^{m} \left[v(P_j + h) - v(P_j)\right]^2 \qquad (13)$$

where m is the number of pairs of data points, Pj + h and Pj are the vector coordinates of the points, and v(Pj + h) and v(Pj) are the paired values separated by the distance (and optionally direction) h, referred to in variogram analyses as the "lag." In an empirical variogram, the semivariances are plotted against lags, each given as a distance class with equidistant "bin size" (e.g., a bin size of 1 km with the lags 0–1 km, 1–2 km and so on). Note that the selection of an appropriate bin size is crucial, since smaller bin sizes may result in a rather noisy variogram pattern, while bin sizes that are too large mask small-scale variation structures. For the analysis of directional "anisotropic" effects and phenomena, e.g., a spatial gradient in the variable V, variograms can be computed for different directions (e.g., NW–SE within a prescribed angular tolerance of ±30 degrees) or "omnidirectionally," i.e., for all directions. In the latter case, the maximum number m of data pairs is given by 0.5(n² − n) and only occurs at higher lags. When m equals 0.5(n² − n), the semivariance is identical to the statistical variance of all values vi of the entire input data set, while the semivariance proper, as expressed by the prefix "semi," is based on the subset of data pairs falling within the respective lag class. A generic example of an empirical variogram is given in Fig. 5, depicting the typical distance-dependent changes of semivariances. Starting from the low semivariance level near the origin, the semivariances increase with lag distance, indicating that the spatial correlation of the input values becomes smaller.

Fig. 5 Empirical variogram (dots) with fitted variogram model (left); standard variogram models (right): Gaussian γ(h) = C0 + (C − C0)·[1 − exp(−h²/A²)]; spherical γ(h) = C0 + (C − C0)·(3h/2A − h³/2A³) for h ≤ A and γ(h) = C for h > A; exponential γ(h) = C0 + (C − C0)·[1 − exp(−h/A)], with nugget variance C0, sill C and range A.

At some distance the semivariance stabilizes at the so-called "sill" level, which equals the variance of the input values vi. The distance where this sill level is reached is called the "range" (A) and indicates the maximum distance to which data are spatially correlated. Unlike IDW, which requires an a priori definition of a search radius or a maximum number of data points when estimating grid node values from subsets of neighboring values, the range obtained from the variogram objectifies the distance up to which data points should be considered when predicting a grid node value. Empirical variograms provide various insights into the spatial variation structure and particularly depict typical patterns in the distance-dependent changes, and thus the heterogeneity, of climate when, for example, comparing mountainous areas with lowlands. For the determination of the kriging prediction function, however, empirical variograms need to be approximated by mathematical functions. These curve functions, also denoted theoretical variograms or variogram models, are fitted to the empirical semivariances mostly using least-squares fitting, which minimizes the sum of the squared differences between the empirical semivariances and the curve. Fig. 5 (left) gives an example of a spherical variogram model. The combined curve function γ(h) comprises a constant C, which represents the sill variance for lags above the range (h > A), a spherical component (or structured component) for lags below the range (h ≤ A), and a so-called "nugget variance" C0 for infinitely small distances below the sampling interval. The term nugget variance (or "nugget effect") refers to the origin of the kriging method, which was developed for geophysical prospecting efforts in South Africa. In the case of gold mining, the irregular, random distribution of gold nuggets is associated with a high variance in the gold content when analyzing rock samples from the same sample location. In addition to this small-scale random variability, measurement, sampling and analysis errors can also cause a nugget effect.
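Both steps can be sketched compactly: empirical semivariances for equidistant lag bins (Eq. 13), followed by a least-squares fit of a spherical model; the toy data, bin size and initial parameters are assumptions:

```python
import numpy as np
from scipy.optimize import curve_fit

def empirical_variogram(x, y, v, bin_size, n_bins):
    """Omnidirectional empirical semivariances (Eq. 13) per lag bin."""
    h = np.hypot(x[:, None] - x[None, :], y[:, None] - y[None, :])
    dv2 = (v[:, None] - v[None, :]) ** 2
    iu = np.triu_indices(v.size, k=1)        # count each pair once
    h, dv2 = h[iu], dv2[iu]
    lags, gammas = [], []
    for k in range(n_bins):
        sel = (h >= k * bin_size) & (h < (k + 1) * bin_size)
        if sel.any():
            lags.append(h[sel].mean())
            gammas.append(0.5 * dv2[sel].mean())
    return np.array(lags), np.array(gammas)

def spherical(h, c0, c, a):
    """Spherical variogram model with nugget c0, sill c and range a."""
    g = c0 + (c - c0) * (1.5 * h / a - 0.5 * (h / a) ** 3)
    return np.where(h <= a, g, c)

rng = np.random.default_rng(2)
x, y = rng.uniform(0, 180, 100), rng.uniform(0, 180, 100)
v = np.sin(x / 40.0) + 0.1 * rng.normal(size=100)    # spatially correlated toy field
lags, gammas = empirical_variogram(x, y, v, bin_size=10.0, n_bins=15)
(c0, c, a), _ = curve_fit(spherical, lags, gammas,
                          p0=[0.01, gammas.max(), 60.0], bounds=(0, np.inf))
print(f"nugget={c0:.3f}  sill={c:.3f}  range={a:.1f}")
```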

2.10.3.2.3 Ordinary kriging and simple kriging

Kriging is an umbrella term for a variety of geostatistical methods, of which ordinary kriging (OK) is the most frequently used approach. Referring to the previously introduced intrinsic hypothesis, for OK the mean of the target variable V neither needs to be known nor needs to be strictly spatially invariant, but is only assumed to be constant over the search neighborhood (i.e., within the range A). The unknown value vg of a variable V at a non-sampled location G is predicted from the input values vi of the neighboring data points Pi as a weighted average:

$$\hat{v}_g = \sum_{i=1}^{n} \lambda_i v_i \quad \text{with} \quad \sum_{i=1}^{n} \lambda_i = 1 \qquad (14)$$

where n is the number of data points Pi lying within the range A and λi are weights which need to sum to 1 to ensure an unbiased prediction. Unlike IDW, where the weights are rather simple derivatives of the distances between data points and target grid nodes, the weights λi in OK explicitly minimize the error variance (17) and prevent systematic over- or underestimation. Since, according to the intrinsic hypothesis, the mean mv is assumed to be constant within the range A, and all differences between the unknown value vg and the input values vi as well as between all paired values vi and vj depend only on their distance (and possibly direction), the weights λi are estimated solely from the coefficients of the variogram model, by solving:

$$\sum_{j=1}^{n} \lambda_j \gamma(P_i - P_j) + \psi = \gamma(P_i - G) \quad \text{and} \quad \sum_{j=1}^{n} \lambda_j = 1 \qquad (15)$$


where γ(Pi − Pj) are the semivariances for the distances (lags) between paired data points with the coordinate vectors Pi and Pj, γ(Pi − G) are the semivariances for the distances between the data points Pi and the target grid node G, and ψ is the Lagrange multiplier required for the minimization of the kriging error variance. According to Eq. (15), the weights λi are optimally estimated if the semivariances between the data points Pi and the target grid node G equal the sum of the weights λi multiplied by the semivariances for the distances between paired data points, plus the Lagrange multiplier (cf. Lloyd, 2010). Solving this optimization problem is based on the OK equation system with the (n + 1) × (n + 1) matrix of semivariances:

$$\begin{bmatrix} \gamma(P_1 - P_1) & \cdots & \gamma(P_1 - P_n) & 1 \\ \vdots & \ddots & \vdots & \vdots \\ \gamma(P_n - P_1) & \cdots & \gamma(P_n - P_n) & 1 \\ 1 & \cdots & 1 & 0 \end{bmatrix} \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \\ \psi \end{bmatrix} = \begin{bmatrix} \gamma(P_1 - G) \\ \vdots \\ \gamma(P_n - G) \\ 1 \end{bmatrix} \qquad (16)$$

where λ1 to λn are the weights, n is the number of data points Pi and ψ is the Lagrange multiplier. Note that both the weights and the Lagrange multiplier are estimated solely from the lag-dependent semivariances γ obtained from the variogram model. Since kriging is a probabilistic method, which assumes observations to be realizations of a random process and consequently conceptualizes interpolation results as one of many probable realizations, kriging allows for the calculation of an error variance of the predictions. Based on the variogram model, the error variance:

$$\hat{\sigma}_g^2 = \sum_{i=1}^{n} \lambda_i \gamma(P_i - G) + \psi \qquad (17)$$

is given by the sum of the weights λi multiplied by the semivariances γ for the lags separating the data points Pi and the grid node G, plus the Lagrange multiplier ψ. To explain this in simple terms, assume we want to interpolate a grid value halfway between only two points with the vector coordinates P1 and P2; the error variance is then the semivariance at the respective distance obtained from the variogram model. Accordingly, for grid nodes which have no neighboring data points within the range A, the error variance equals the sill of the variogram. Based on the error variance of each grid node, the square root of the error variance may additionally be considered an estimate of the standard error of the predicted value ṽg, which allows us to compute confidence intervals for the "real" unknown grid values vg. Results of OK interpolation of annual mean precipitation and temperature, as well as the respective kriging variances, are shown in Fig. 6 for the German example domain. The pattern of the error variance clearly depicts the close dependency on the distribution of the input data, with the lowest variances near the station locations equaling the nugget variance, and values close to the sill variance in areas with lower station densities. Another kriging approach frequently implemented in GIS is simple kriging (SK). As the name suggests, SK is a mathematically simplified kriging variant, but it presumes stationarity of the first moment over the entire domain with a known mean mv. Since this is seldom the case (especially when interpolating climate values), practical assumptions for the application of SK are that the expected mean is either 0 over the entire target area or is prescribed (e.g., from a climate model). Although SK is mathematically the simplest variant, the requirement of first-moment stationarity limits its application; SK is therefore assessed to be the least general kriging derivative. However, if we assume that the results of a statistical or numerical climate model already sufficiently represent the large-scale (first-order) spatial variation of a target variable, SK may be applied to the model residuals (i.e., the differences between observed and modeled values). Given that the mean of the residuals should equal 0 (at least in the case of properly bias-corrected model results), SK interpolation of the model residuals, representing second-order effects in the variation pattern of the target variable, then supports a geostatistical optimization of the modeling results.
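The following minimal sketch assembles and solves the OK system of Eq. (16) for a single grid node and returns the prediction (Eq. 14) together with the error variance (Eq. 17); the spherical variogram parameters and station values are illustrative assumptions:

```python
import numpy as np

def spherical_gamma(h, c0=0.1, c=1.0, a=60.0):
    """Spherical variogram model (parameters assumed for illustration)."""
    g = c0 + (c - c0) * (1.5 * h / a - 0.5 * (h / a) ** 3)
    return np.where(h <= 0, 0.0, np.where(h <= a, g, c))

def ordinary_kriging(px, py, v, gx, gy, gamma):
    """OK prediction and error variance at one grid node (Eqs. 14-17)."""
    n = v.size
    A = np.ones((n + 1, n + 1))                          # system matrix, Eq. (16)
    A[:n, :n] = gamma(np.hypot(px[:, None] - px, py[:, None] - py))
    A[n, n] = 0.0
    b = np.ones(n + 1)                                   # semivariances point -> node
    b[:n] = gamma(np.hypot(px - gx, py - gy))
    sol = np.linalg.solve(A, b)
    lam, psi = sol[:n], sol[n]                           # weights, Lagrange multiplier
    return lam @ v, lam @ b[:n] + psi                    # Eq. (14), Eq. (17)

px, py = np.array([0.0, 30.0, 10.0, 50.0]), np.array([0.0, 10.0, 40.0, 50.0])
v = np.array([8.1, 8.5, 7.9, 8.7])                       # e.g., temperatures (C)
print(ordinary_kriging(px, py, v, 20.0, 20.0, spherical_gamma))
```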

2.10.3.2.4 Universal kriging and regression kriging

Fig. 6 Annual mean temperature (°C, left) and precipitation (mm, right) fields (1961–1990) estimated with different kriging techniques: ordinary kriging (OK) and the respective kriging variance (σ² of OK), universal kriging (UK).

Although the intrinsic hypothesis was previously introduced as less restrictive in terms of stationarity requirements, the assumption that the expected values of the mean and the variance of a regionalized variable are spatially invariant is of course violated when addressing climate spatialization in environments with distinct large-scale changes in the variation pattern, such as, for instance, mountainous areas. To tackle the problem of overlying spatial trends in the input data, various modifications and extensions of the general kriging approach have been developed, with universal kriging (UK) being the most prominent one in this regard. UK was introduced by Matheron (1973) himself, who proposed to detrend the input data by a spatial trend function and to apply (universal) kriging to the deviations of the observed values vi from the respective trend surface estimates (i.e., the residuals of the trend surface). Formally, universal kriging splits the random function model into a deterministic trend component m(x, y), represented by a spatial trend function, and an intrinsically stationary random component d(x, y). If we assume an input data set with an overriding trend, approximated by a trend surface sv(x, y) obtained using trend surface analysis (see "Trend surface analysis" section), and an autocorrelation among the detrended data values vi, the unknown value vg of a variable V at a non-sampled location G is then predicted from the neighboring data points Pi as a weighted average:

$$\hat{v}_g = \sum_{i=1}^{n} \lambda_i v_i + s_v(x_g, y_g) \quad \text{with} \quad \sum_{i=1}^{n} \lambda_i = 1 \qquad (18)$$

where n is the number of data points Pi lying within the range A, λi are weights, and sv(xg, yg) is the value of the trend surface at the coordinates (xg, yg). Based on the coefficients of a variogram model, the weights λi for the interpolation can then be derived by solving the following system of linear equations:

$$\sum_{j=1}^{n} \lambda_j \gamma(P_i - P_j) + \psi + s_v(x_i, y_i) = \gamma(P_i - G) \quad \text{with} \quad \sum_{j=1}^{n} \lambda_j = 1 \qquad (19)$$

where γ(Pi − Pj) are the semivariances for the distances (lags) between paired data points with the coordinate vectors Pi and Pj, γ(Pi − G) are the semivariances for the distances between the data points Pi and the target grid node G, and ψ is the Lagrange multiplier, as required in OK for the minimization of the kriging error variance. In accordance with Eq. (19), the error variance of predictions obtained by UK is then given by:

$$\hat{\sigma}_g^2 = \sum_{i=1}^{n} \lambda_i \gamma(P_i - G) + \psi + s_v(x_i, y_i) \qquad (20)$$


where γ(Pi − G) are the semivariances for the distances between the data points Pi and the target grid node G obtained from a variogram model, sv(xi, yi) are the values of the trend surface at the coordinates (xi, yi), and ψ is the Lagrange multiplier. UK may be performed using first- or higher-order polynomials. Moreover, UK may merge TSA with either simple kriging or OK; today, however, the combination of TSA and OK is a common standard in GIS software. In Fig. 6, the interpolation results of UK, integrating only a first-order polynomial, are compared to the results of OK. Although the variation patterns of both realizations are largely comparable, the cross-validation indicates a higher accuracy of the universal kriging approach for both the temperature and the precipitation surface (see "Validation and Accuracy Assessment" section). UK commonly considers the x- and y-coordinates as statistically independent variables in the underlying TSA, while an integration of other covariates (i.e., statistical predictors) is not generally supported by GIS software. Statistically speaking, however, the separation of the deterministic component in a random function model is not necessarily limited to the coordinates but may likewise consider additional statistical predictors. This is addressed in regression kriging (RK), or kriging with external drift, where the deterministic component of the random function is modeled based on one or a set of independent predictors using uni- or multivariate linear regression analysis. From a geostatistical perspective, these additional variables may be termed "auxiliary" variables, considered to improve the precision of a kriging approach. However, as shown in the following section, these predictors ideally explain a high share of the observed spatial variability. Against this background, the kriging element in RK could likewise be seen as an extension of multivariate statistical spatialization methods.

2.10.3.3 Multivariate Statistical Approaches

Multivariate statistics is an area of statistics that simultaneously analyzes two or more variables, particularly targeting the statistical relations among different variables. From the broad range of methods subsumed under the general term multivariate statistics, correlation and regression analyses are of major concern in the context of climate spatialization. Correlation analysis quantifies the strength of a relationship between variables, expressed in a correlation coefficient, while regression analysis quantifies statistical dependencies between variables, expressed in a statistical function that explicitly differentiates between a dependent variable (target variable, predictand) and one or a set of independent variables (explanatory variables, predictors). A regression analysis can thus be seen as a statistical form of predictive modeling, which estimates the expected value of a predictand as a function of the respective values of one or a set of predictors with a certain statistical probability. Results of correlation and regression analyses are closely related, given that the coefficient of determination of a regression (R²) equals the squared correlation coefficient (R), while the correlation coefficient can be derived as the tangent of the slope angle of a linear regression of z-transformed variables (Bahrenberg et al., 2010). In the thematic context of climate spatialization, multivariate statistics offer possibilities for a diverse range of applications, given that the spatial variations of a climate variable, though formally regarded in geostatistics as realizations of a random function, are often clearly related to forcing factors determining the spatial behavior of the target variable. In the following subsections we briefly summarize multivariate regression analysis (MRA) and its application as a global probabilistic approach for climate spatialization (see "Multivariate regression analysis" section). Thereafter, we introduce geographically weighted regression (GWR) as a local spatialization approach (see "Geographically weighted regression" section) and finally highlight machine learning, an increasingly emerging field in climate spatialization (see "Machine learning algorithms" section), considered in this chapter with respect to its substantial overlaps with multivariate statistics.

2.10.3.3.1 Multivariate regression analysis

Multivariate regression analysis (MRA) is a well-established approach in climate spatialization, given that the method, though rather simple, is quite efficient, particularly when addressing climate variables with a distinct statistical dependence, as is indeed the case for many climate variables dependent on elevation. Given the increasing availability of high-resolution DEM data, regression of climate variables using elevation and optionally the x- and y-coordinates as explanatory variables is a frequently used, though somewhat ad hoc, approach in spatialization. Referring to the "Trend surface analysis" section, an extension of the first-order polynomial of the x- and y-coordinates by the altitude z results in a linear spatial trend function of the form:

$$s_v(x, y, z) = a_0 + b_1 x + b_2 y + b_3 z + \varepsilon \qquad (21)$$

where x and y are the x- and y-coordinates, z is the altitude and ε is the error term. The regression coefficients b1, b2 and b3 determine the slope of the trend surface sv(x, y, z) in the x-, y- and z-directions, and a0 is the constant term (intercept) of the regression. As previously introduced with respect to TSA, the "ordinary least squares" multiple regression with the regression parameters a0 and b1 to b3 minimizes the sum of the squared residuals of the statistical regression model, defined by:

$$\sum_{i=1}^{n} \varepsilon_i^2 \quad \text{with} \quad \varepsilon_i = v_i - \hat{v}(x_i, y_i, z_i) \qquad (22)$$

where vi and ṽ(xi, yi, zi) are the observed and estimated values of the target variable V, xi and yi are the x- and y-coordinates of the data points Pi, and zi are the altitudes. The explicit notation of the residuals as random errors εi reflects that regression analysis is a probabilistic method. Comparable with geostatistical methods (but in contrast to deterministic approaches), probabilistic spatialization methods assume observations to be random variables (i.e., realizations of a random process) entailing random errors.


Accordingly, the statistical significance of the obtained results needs to be tested, commonly using F-test statistics for the overall significance of the regression equation and t-test statistics for the significance of the regression parameters. The F-test compares the explained variance and the error variance, using the F statistic defined by:

$$F = \frac{\frac{1}{k}\sum_{i=1}^{n} (\hat{v}_i - \bar{v})^2}{\frac{1}{n-k-1}\sum_{i=1}^{n} \varepsilon_i^2} \qquad (23)$$

where ṽi are the estimated values of the target variable V with the observed mean v̄, n is the number of data points Pi, k is the number of explanatory variables, and εi are the random errors. As shown in Eq. (23), the F statistic is the ratio of the explained variance to the error variance. Statistical significance is given if the probability of the test value F, obtained from the probability density function of the Fisher F-distribution, falls below a critical level (e.g., alpha = 0.05). The t-test statistics for testing the significance of the regression coefficients are based on the standard errors of the regression parameters. The t statistic, exemplified as follows for the explanatory variable altitude (z), is given by:

$$t_b = \frac{b_z}{s_b} \quad \text{with} \quad s_b = \frac{\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(v_i - \bar{v})^2 - b_z^2 \, \frac{1}{n-1}\sum_{i=1}^{n}(z_i - \bar{z})^2}}{\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(z_i - \bar{z})^2}\;\sqrt{n-2}} \qquad (24)$$

where bz is the regression coefficient with the standard error sb, vi are the observed values of the target variable V with the mean v̄, zi are the values of the explanatory variable (here: altitude z) with the mean z̄, and n is the number of data points. The t statistic indicates a significant regression coefficient (i.e., a coefficient significantly differing from 0) if the probability of the test value tb, obtained from the probability density function of the Student t-distribution, falls below a critical level (e.g., alpha = 0.05). Significance tests of regression coefficients are particularly relevant given that only statistically significant explanatory variables should be included in the regression equation. Accordingly, when using stepwise regression in most GIS and statistical software packages, explanatory variables are successively added or removed based on the t statistics of their coefficients. For further details on regression statistics, see Sen and Srivastava (2012). Implications of multicollinearity, falsely inflating the significance of regression coefficients in the case of significantly correlated predictors, are addressed in Zhu (2016). Variants of regression approaches, such as "generalized least squares" regression, which allows for examining the spatial dependence of variables, are discussed in Ward and Gleditsch (2008). Here, we only refer to ordinary least squares regression, since this is the most commonly used regression approach in climate spatialization. Assuming significant results of a multiple linear regression analysis with (for instance) altitude and the x- and y-coordinates as predictors, the regression equation of the form (21) can then be applied as a spatialization function to estimate the unknown values vg for each grid node G:

$$\hat{v}_g = s_v(x_g, y_g, z_g) = a_0 + b_1 x_g + b_2 y_g + b_3 z_g \qquad (25)$$

where xg and yg are the x- and y-coordinates of the grid nodes G, and zg are the altitudes obtained from a DEM. As shown in Fig. 7, the resulting trend surfaces for annual temperatures and precipitation totals largely mirror the influence of altitude on the spatial pattern. In particular, the regression coefficients b3 indicate a quite distinct gradient in the mean annual precipitation of 108.6 mm/100 m, and the environmental (i.e., near-ground) temperature lapse rate of −0.61 °C/100 m almost equals the tropospheric reference gradient of −0.65 °C/100 m of the International Standard Atmosphere (ISA). The predictor altitude alone already explains approximately 40% of the precipitation variance and even more than 85% of the temperature variance, and the adjusted R² values of the entire regression models obtained by cross-validation are 70.9% for precipitation and 90.6% for temperature (see Table 1). Although ordinary least squares linear regression analyses only quantify the linear components of the statistical relationships among the analyzed variables, nonlinear relations may be captured by using higher-order polynomials or by transforming the input data. The latter option is particularly recommended in the case of asymmetric data distributions masking the statistical dependencies. Precipitation data, for instance, are often strongly right-skewed, and their logarithmic transformation may result in an increased R² of the multiple regression and particularly in a better representation of the expected patterns, such as the exponential increase of precipitation with elevation below the average condensation level (Karger et al., 2016).
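A minimal sketch of Eqs. (21)-(25) with synthetic stations; the assumed lapse rate of −0.6 °C/100 m and all station data are illustrative, not values from the example domain:

```python
import numpy as np

def fit_mra(x, y, z, v):
    """Ordinary least squares fit of Eq. (21): v = a0 + b1*x + b2*y + b3*z."""
    A = np.column_stack([np.ones_like(x), x, y, z])
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)   # minimizes Eq. (22)
    v_hat = A @ coef
    r2 = 1.0 - ((v - v_hat) ** 2).sum() / ((v - v.mean()) ** 2).sum()
    return coef, r2

rng = np.random.default_rng(3)
x, y = rng.uniform(0, 100e3, 76), rng.uniform(0, 100e3, 76)
z = rng.uniform(50, 1100, 76)                      # station altitudes (m)
v = 9.5 - 0.006 * z + rng.normal(0, 0.3, 76)       # temperature with assumed lapse rate
(a0, b1, b2, b3), r2 = fit_mra(x, y, z, v)
print(f"lapse rate = {b3 * 100:.2f} C/100 m, R2 = {r2:.2f}")
# Eq. (25): v_g = a0 + b1*xg + b2*yg + b3*zg applied to grid nodes with DEM altitudes
```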
Further improvements of the spatialization results may be achieved by residual kriging, i.e., applying kriging interpolation to the residuals εi of the statistical regression model. In Fig. 7, correction layers obtained by OK interpolation of the regression residuals were added to the spatial trend surfaces. According to the results of cross-validation (see "Validation and Accuracy Assessment" section), consistently applied to both the multiple regression and the residual kriging, the already high R² of the temperature regression was only slightly improved, while the R² of the precipitation distribution in contrast increased markedly, to 86.9%. MRA combined with residual kriging is thus an efficient approach when aiming at precise climate mapping. Regression analysis, however, is certainly not only targeted at mapping applications but also supports explorative analyses of topographic effects on climate variations at different scales (see "Terrain Parameterization" section), especially when integrating further meaningful predictors. Of high importance in this regard is terrain analysis, given the ubiquitous availability of quite accurate, high-resolution DEMs from radar interferometry (SRTM), stereo-photogrammetry (ASTER) or laser scanning (Conrad et al., 2015).

Fig. 7 Annual mean temperature (°C, left) and precipitation (mm, right) fields (1961–1990) estimated using multivariate statistical approaches: multivariate regression analysis (MRA), multivariate regression analysis with residual kriging (MRA-OK), geographically weighted regression (GWR).

Although the surface elevation is commonly the primary orographic predictor, the first and second derivatives of elevation (slope, aspect, curvature) are likewise occasionally considered when developing regression models, either as standalone approaches or combined with interpolation methods (e.g., Hudson and Wackernagel, 1994; Böhner and Schröder, 1999; Agnew and Palutikof, 2000; Ninyerola et al., 2000; Perry and Hollis, 2005; Bolch, 2006; Böhner and Selige, 2006; Kessler et al., 2007; Soria-Auza et al., 2010). More sophisticated, process-oriented terrain parameters as well as surface parameters are highlighted in the "Land-Surface Parameterization" and "Downscaling" sections.
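A compact sketch of the MRA-OK workflow outlined above, combining an ordinary least squares trend with ordinary kriging of the residuals at a single grid node; the variogram parameters and all input data are assumed for illustration:

```python
import numpy as np

def spherical_gamma(h, c0=0.0, c=1600.0, a=50e3):
    """Spherical variogram of the residuals (parameters assumed)."""
    g = c0 + (c - c0) * (1.5 * h / a - 0.5 * (h / a) ** 3)
    return np.where(h <= 0, 0.0, np.where(h <= a, g, c))

def krige_residuals(px, py, res, gx, gy):
    """Ordinary kriging of the regression residuals at one grid node."""
    n = res.size
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = spherical_gamma(np.hypot(px[:, None] - px, py[:, None] - py))
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = spherical_gamma(np.hypot(px - gx, py - gy))
    return np.linalg.solve(A, b)[:n] @ res

rng = np.random.default_rng(4)
px, py = rng.uniform(0, 100e3, 60), rng.uniform(0, 100e3, 60)
z = rng.uniform(0, 1000, 60)
v = 600.0 + 0.9 * z + rng.normal(0, 40, 60)        # precipitation-like data (mm)
X = np.column_stack([np.ones_like(px), px, py, z])
coef, *_ = np.linalg.lstsq(X, v, rcond=None)       # 1) regression trend, Eq. (21)
res = v - X @ coef                                 # 2) residuals
gx, gy, gz = 50e3, 50e3, 400.0                     # grid node with DEM altitude
trend = coef @ np.array([1.0, gx, gy, gz])
print(trend + krige_residuals(px, py, res, gx, gy))  # 3) trend + kriged residual
```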

2.10.3.3.2 Geographically weighted regression

The MRA approaches introduced thus far may already capture the basic regularities of climate variations. The seasonal course of temperature gradients in the German example domain, for instance, clearly mirrors the higher frequencies of dry adiabatic lapse rates in summer, with average monthly temperature gradients of almost −0.8 °C/100 m, while the recurrent occurrence of temperature inversions in winter weakens the environmental temperature lapse rates to below −0.5 °C/100 m. Cold air accumulation and the formation of inversion layers, however, may be more pronounced in areas with steeply cut valleys and diminish with lessening relief energy. In cases of such regional variations, which are particularly apparent in larger areas with different climate regimes, and which become even more distinct when analyzing daily observations instead of long-term means, a spatially static representation of the relationships between a target variable and its predictors by conventional global MRA is hardly suitable.


Table 1 Spatialized grid statistics (minimum, mean, maximum, and standard deviation) and leave-one-out cross-validation (LOOCV) results for monthly (January and July) and annual temperature (°C) and precipitation totals (mm) (1961–1990) using different spatialization methods: triangle interpolation (TI), inverse distance weighting (IDW), trend surface analysis (TSA), thin plate spline (TPS), ordinary kriging (OK), universal kriging (UK), multivariate regression analysis (MRA), MRA with residual kriging (MRA-OK), and geographically weighted regression (GWR)

Temperature (°C), n = 76 observations. Grid statistics:

| | Obs | TI | IDW | TSA | TPS | OK | UK | MRA | MRA-OK | GWR |
| January Min | −4.20 | −3.89 | −3.96 | −1.07 | −4.20 | −3.72 | −3.82 | −5.02 | −5.12 | −5.08 |
| January Mean | 0.04 | 0.26 | 0.04 | 0.19 | 0.17 | 0.20 | 0.20 | 0.04 | 0.04 | 0.07 |
| January Max | 2.50 | 2.46 | 2.44 | 2.04 | 2.52 | 2.46 | 2.56 | 1.63 | 1.73 | 1.64 |
| January SD | 1.21 | 0.90 | 0.73 | 0.82 | 1.21 | 0.98 | 1.03 | 1.15 | 1.20 | 1.17 |
| July Min | 10.30 | 10.81 | 10.74 | 14.33 | 10.30 | 11.13 | 11.03 | 10.43 | 10.33 | 10.27 |
| July Mean | 16.45 | 16.77 | 16.55 | 16.77 | 16.75 | 16.78 | 16.78 | 16.45 | 16.45 | 16.31 |
| July Max | 18.30 | 18.11 | 18.26 | 21.43 | 19.88 | 18.26 | 18.36 | 17.86 | 17.96 | 17.87 |
| July SD | 1.32 | 0.75 | 0.59 | 0.86 | 1.10 | 0.80 | 0.85 | 1.28 | 1.33 | 1.30 |
| Year Min | 2.90 | 3.32 | 3.24 | 6.67 | 2.90 | 3.55 | 3.45 | 2.64 | 2.54 | 2.58 |
| Year Mean | 8.18 | 8.49 | 8.27 | 8.46 | 8.44 | 8.47 | 8.47 | 8.18 | 8.18 | 8.11 |
| Year Max | 9.70 | 9.68 | 9.66 | 11.66 | 10.10 | 9.69 | 9.79 | 9.36 | 9.46 | 9.19 |
| Year SD | 1.21 | 0.73 | 0.59 | 0.75 | 1.05 | 0.79 | 0.84 | 1.18 | 1.23 | 1.19 |

Temperature, LOOCV results:

| | TI | IDW | TSA | TPS | OK | UK | MRA | MRA-OK | GWR |
| January NRMSE | 0.14 | 0.13 | 0.13 | 0.13 | 0.12 | 0.12 | 0.06 | 0.05 | 0.07 |
| January RMSE (°C) | 0.96 | 0.88 | 0.88 | 0.85 | 0.82 | 0.83 | 0.40 | 0.34 | 0.49 |
| January R² (%) | 36.48 | 48.28 | 45.73 | 49.68 | 53.36 | 51.88 | 88.78 | 91.87 | 83.28 |
| January AEM | 3.41 | 2.63 | 3.20 | 2.91 | 2.63 | 2.65 | 1.25 | 0.77 | 1.54 |
| July NRMSE | 0.15 | 0.13 | 0.14 | 0.15 | 0.13 | 0.13 | 0.11 | 0.04 | 0.05 |
| July RMSE (°C) | 1.21 | 1.04 | 1.13 | 1.19 | 1.03 | 1.05 | 0.85 | 0.36 | 0.43 |
| July R² (%) | 14.96 | 39.63 | 25.43 | 17.51 | 38.00 | 35.62 | 80.36 | 92.58 | 87.11 |
| July AEM | 4.94 | 4.25 | 5.40 | 4.51 | 4.28 | 4.28 | 2.16 | 0.94 | 1.23 |
| Year NRMSE | 0.15 | 0.14 | 0.15 | 0.15 | 0.13 | 0.13 | 0.07 | 0.04 | 0.06 |
| Year RMSE (°C) | 1.05 | 0.92 | 1.00 | 1.02 | 0.89 | 0.91 | 0.47 | 0.30 | 0.40 |
| Year R² (%) | 23.28 | 43.63 | 30.78 | 27.22 | 44.66 | 42.06 | 90.62 | 93.55 | 88.79 |
| Year AEM | 4.18 | 3.31 | 4.43 | 3.68 | 3.36 | 3.36 | 1.32 | 0.86 | 1.04 |

Precipitation (mm), n = 128 observations. Grid statistics:

| | Obs | TI | IDW | TSA | TPS | OK | UK | MRA | MRA-OK | GWR |
| January Min | 26.70 | 27.20 | 27.18 | 0.00 | 23.00 | 27.09 | 26.55 | 16.25 | 16.87 | 17.50 |
| January Mean | 63.28 | 59.58 | 60.92 | 59.91 | 58.02 | 58.08 | 57.50 | 56.91 | 56.91 | 57.42 |
| January Max | 186.40 | 170.64 | 167.00 | 123.21 | 186.53 | 171.02 | 174.44 | 171.04 | 179.04 | 187.04 |
| January SD | 22.78 | 17.28 | 13.61 | 16.69 | 19.39 | 18.18 | 18.36 | 20.48 | 20.67 | 20.87 |
| July Min | 44.70 | 45.48 | 44.95 | 23.07 | 42.59 | 44.89 | 44.00 | 43.48 | 45.15 | 46.83 |
| July Mean | 71.95 | 69.00 | 70.04 | 69.20 | 68.06 | 68.14 | 68.82 | 67.49 | 67.89 | 67.50 |
| July Max | 132.90 | 124.79 | 124.74 | 121.43 | 133.38 | 124.74 | 127.23 | 125.49 | 131.36 | 137.22 |
| July SD | 15.77 | 12.83 | 10.55 | 13.19 | 14.19 | 13.47 | 13.60 | 13.31 | 13.43 | 13.56 |
| Year Min | 469.20 | 469.91 | 469.80 | 118.62 | 421.92 | 468.51 | 459.14 | 391.17 | 406.27 | 421.38 |
| Year Mean | 762.30 | 731.15 | 742.35 | 732.28 | 717.88 | 717.81 | 710.63 | 713.55 | 713.59 | 718.62 |
| Year Max | 1814.10 | 1682.07 | 1651.87 | 1284.00 | 1815.69 | 1676.81 | 1710.34 | 1658.10 | 1735.62 | 1813.14 |
| Year SD | 189.04 | 144.43 | 114.46 | 145.76 | 158.80 | 151.01 | 152.52 | 168.37 | 169.95 | 171.52 |

Precipitation, LOOCV results:

| | TI | IDW | TSA | TPS | OK | UK | MRA | MRA-OK | GWR |
| January NRMSE | 0.10 | 0.10 | 0.12 | 0.10 | 0.09 | 0.09 | 0.07 | 0.06 | 0.06 |
| January RMSE (mm) | 16.36 | 16.21 | 18.50 | 16.15 | 14.28 | 14.45 | 11.24 | 8.80 | 9.00 |
| January R² (%) | 40.26 | 49.28 | 33.47 | 49.47 | 60.44 | 59.51 | 72.56 | 84.97 | 84.28 |
| January AEM | 95.68 | 102.91 | 121.98 | 93.91 | 89.84 | 89.78 | 33.31 | 22.20 | 22.71 |
| July NRMSE | 0.12 | 0.11 | 0.13 | 0.11 | 0.10 | 0.11 | 0.09 | 0.07 | 0.07 |
| July RMSE (mm) | 10.24 | 10.08 | 11.05 | 9.97 | 9.18 | 9.36 | 7.92 | 6.13 | 6.27 |
| July R² (%) | 53.58 | 59.14 | 50.48 | 59.76 | 65.90 | 64.50 | 54.93 | 84.74 | 84.04 |
| July AEM | 49.27 | 52.18 | 63.42 | 47.69 | 46.38 | 46.35 | 27.04 | 25.12 | 25.69 |
| Year NRMSE | 0.10 | 0.10 | 0.11 | 0.10 | 0.09 | 0.09 | 0.07 | 0.05 | 0.06 |
| Year RMSE (mm) | 134.38 | 129.84 | 149.74 | 130.93 | 116.69 | 118.86 | 89.69 | 68.28 | 80.90 |
| Year R² (%) | 42.71 | 52.88 | 36.72 | 51.76 | 61.65 | 60.21 | 70.88 | 86.85 | 82.48 |
| Year AEM | 773.22 | 844.41 | 1026.48 | 752.98 | 729.70 | 729.20 | 264.67 | 223.74 | 291.36 |

Spatially varying statistical relations are better represented when conducting a standard regression analysis on subsets of the input data in a moving window. If the regression coefficients are simply assigned to the center of each moving window and subsequently interpolated over the domain, the outputs of such an approach are spatially continuous gridded surfaces of regression parameters, accounting for regional variations in the analyzed statistical relationships. An advanced extension of the standard moving window technique is geographically weighted regression (GWR). Based on the assumption that statistical relations between target variables and predictors change with increasing distance, GWR extends the standard regression approach by a distance-dependent weighting scheme. In simple terms, GWR integrates the inverse-distance weighting of input data, as considered in several interpolation methods, with local regression analysis.


If we at first analyze solely the elevation-dependent variations of a climate variable V from a set of n observations vi at altitudes zi, the local GWR for each location (data point) Pj is defined by:

$$\hat{v}_j = a_j + b_j z_j + \varepsilon_j$$

with

$$b_j = \frac{\sum_{i=1}^{n} w_{ij} \sum_{i=1}^{n} w_{ij} z_i v_i - \sum_{i=1}^{n} w_{ij} z_i \sum_{i=1}^{n} w_{ij} v_i}{\sum_{i=1}^{n} w_{ij} \sum_{i=1}^{n} w_{ij} z_i^2 - \left(\sum_{i=1}^{n} w_{ij} z_i\right)^2}, \qquad
a_j = \frac{\sum_{i=1}^{n} w_{ij} z_i^2 \sum_{i=1}^{n} w_{ij} v_i - \sum_{i=1}^{n} w_{ij} z_i \sum_{i=1}^{n} w_{ij} z_i v_i}{\sum_{i=1}^{n} w_{ij} \sum_{i=1}^{n} w_{ij} z_i^2 - \left(\sum_{i=1}^{n} w_{ij} z_i\right)^2} \qquad (26)$$

where aj is the intercept, bj is the regression coefficient and εj is the error term of the regression equation. The observed values vi and the altitudes zi are weighted by wij, denoting the weights of the input data points Pi with respect to the jth location Pj. In its simplest form, a GWR equation may be obtained from a subset of observations within a specified radius around the target point Pj, each equally weighted by 1, while data points outside the search radius are not considered (i.e., wij = 0). Given that regression parameters are largely determined by extreme values, however, such a simple approach results in discontinuously changing regression parameters, particularly in the case of varying spatial data densities. More advanced in this respect are continuous weighting functions (so-called "kernel" functions) as introduced, e.g., for IDW (see "Inverse distance weighting interpolation" section). An often-used kernel in GWR is the Gaussian weighting scheme, which results in a Gaussian curve-like decrease of the weights with increasing distance. The Gaussian kernel is given by:

$$w_{ij} = e^{-\frac{1}{2}\left(\frac{d_{ij}}{b}\right)^2} \qquad (27)$$

where wij are the weights of the values of the data points Pi with respect to the jth location Pj and dij are the respective distances. The parameter b is the so-called "bandwidth," which is given in the same units as the coordinates and controls the distance-dependent weighting. Lower bandwidths compress the Gaussian curve, leading to a rapid decline of the weights with increasing distance, while at higher bandwidths the weights become increasingly similar. Accordingly, the bandwidth needs to be adjusted to the data situation, using higher (lower) b values in the case of sparsely (densely) distributed data points. However, given that in climate studies the data coverage often varies in space, especially when addressing larger areas with remote environments, adaptive kernels, e.g., auto-adjusted to a specified number of data points lying within the bandwidth radius around the target point Pj, are generally advantageous and accordingly form a common standard in GIS today. Further methods and criteria addressing the problem of kernel optimization are detailed in Fotheringham et al. (2002).

The GWR approach introduced so far is assumed to be applied for each data point Pi separately. Accordingly, the results are n sets of regression parameters, which need to be interpolated to the target domain to enable an estimation of the values vg for each grid node. Alternatively, Eq. (26) may likewise be calculated directly for each grid node, but the computational requirements increase markedly. A compromise in this respect is to calculate the regression parameters for a regularly spaced coarser grid network, which is subsequently refined to the target grid resolution using, e.g., TPS interpolation. The latter option was applied in Fig. 7, first computing GWR parameters at a 1 × 1 km resolution, which was subsequently refined to the target 250 × 250 m resolution grid. To facilitate comparability with the MRA results, only altitude was considered as a predictor. Although the variation patterns of GWR and MRA are fairly comparable, in the case of mean annual precipitation the GWR approach clearly outperforms MRA, with an R² of 82.5% (see Table 1). Further predictors suitable for GWR applications are highlighted in the "Land-Surface Parameterization" section.

In general, GWR is a powerful spatialization approach which enables a flexible integration of predictors for systematic analyses of spatially varying relationships. Accordingly, soon after its introduction by Fotheringham et al. (2000), GWR became an established standard GIS routine and is presently applied widely in climate spatialization (e.g., Brunsdon et al., 2001; Fotheringham et al., 2002; Foody, 2003). Although GWR allows for a spatially differentiated analysis of predictor–predictand relationships, one has to keep in mind that these are statistical relations whose physical meaning requires careful examination. Particularly when analyzing, for example, elevation–temperature dependencies in areas of rather low relief energy, GWR may yield physically implausible lapse rates which, in the case of sparse data coverage in adjacent mountainous areas, would bias the spatialization results.
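Under these definitions, a local elevation regression with the Gaussian kernel can be sketched as follows; the station data, the spatially varying lapse rate and the bandwidth are synthetic assumptions:

```python
import numpy as np

def gwr_at(px, py, z, v, qx, qy, bandwidth):
    """Local weighted regression (Eq. 26) at location (qx, qy) using the
    Gaussian kernel of Eq. (27); returns intercept a_j and slope b_j."""
    d = np.hypot(px - qx, py - qy)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)       # Eq. (27)
    X = np.column_stack([np.ones_like(z), z])
    XtW = X.T * w                                  # weighted least squares
    a_j, b_j = np.linalg.solve(XtW @ X, XtW @ v)
    return a_j, b_j

rng = np.random.default_rng(5)
px, py = rng.uniform(0, 100e3, 200), rng.uniform(0, 100e3, 200)
z = rng.uniform(0, 1000, 200)
v = 9.0 - (0.004 + 0.004 * px / 100e3) * z + rng.normal(0, 0.2, 200)
for qx in (10e3, 90e3):                            # lapse rate steepens eastward
    a_j, b_j = gwr_at(px, py, z, v, qx, 50e3, bandwidth=25e3)
    print(f"x = {qx/1e3:.0f} km: local lapse rate {b_j * 100:.2f} C/100 m")
```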
More robust in this regard is the parameter-elevation regressions on independent slopes model (PRISM), which applies local climate–elevation regression functions in areas with distinct terrain features but disregards elevation in rather flat areas (Daly et al., 1994; Daly et al., 2008). In this empirical modeling procedure, the observational database is first spatially clustered with respect to different climate regimes and topographic facets, using a coordinated set of rules, calculations and, where necessary, "expert" decisions. Although the governing equation applied to the different climate regimes is a simple linear climate–elevation regression, we refer to PRISM here as a multivariate statistical approach with respect to its comprehensive weighting scheme considering, e.g., elevation, distance, terrain aspect and proximity to coastlines. Moreover, PRISM allows a two-layer atmosphere representation for a given

Table 2 Spatialized grid statistics (minimum, mean, maximum, and standard deviation) and leave-one-out cross-validation (LOOCV) results for daily (selected days: 6/28/2002, 1/2/2003, 2/1/2003, 8/7/2003) temperature (°C; n = 31 observations) and precipitation totals (mm; n = 87 observations) using different spatialization methods: triangle interpolation (TI), inverse distance weighting (IDW), trend surface analysis (TSA), thin plate spline (TPS), ordinary kriging (OK), universal kriging (UK), multivariate regression analysis (MRA), MRA with residual kriging (MRA-OK), and geographically weighted regression (GWR)

18.68 18.84 9.34 RMSE (mm) 4.58 4.51 4.20 4.13 4.14 4.27 4.92 4.51 3.77

28.52 31.54 34.57 R2 (%) 26.85 29.15 38.21 40.23 40.06 36.10 15.07 29.08 50.14

1.96 2.60 3.24 AEM 13.46 14.85 16.47 11.07 16.55 16.46 14.10 14.85 10.82

0.71 0.76 0.80 RMSE (mm) 0.58 0.54 0.54 0.52 0.54 0.53 0.50 0.46 0.46

2.23 2.35 2.48 R2 (%) 14.83 25.06 26.48 31.83 26.22 28.29 37.18 46.25 45.26

0.42 0.41 0.40 AEM 1.83 1.71 1.42 1.62 1.68 1.80 1.42 1.28 2.65

6.45 16.13 25.81 R2 (%) 23.75 32.60 37.80 23.01 32.37 11.59 24.89 45.73 48.91

* * * RMSE (mm) * * * * * * * * *

* * * R2 (%) * * * * * * * * *

* * * AEM * * * * * * * * *

0.00 0.00 0.00 NRMSE 0.45 0.42 0.42 0.40 0.42 0.41 0.39 0.36 0.36

0.00 0.00 0.00 NRMSE 1.12 1.05 1.01 1.13 1.05 1.21 1.11 0.94 0.92

1.61 1.45 1.30 RMSE (mm) 3.72 3.51 3.37 3.76 3.51 4.02 3.71 3.15 3.07

1.73 2.15 2.57 AEM 18.38 20.43 19.64 16.61 20.13 20.50 22.41 19.43 15.13

* * * NRMSE * * * * * * * * *

GIS in Climatology and Meteorology

Table 2 Spatialized grid statistics (minimum, mean, maximum, and standard deviation) and leave-one-out cross-validation (LOOCV) results for daily (selected days) temperature ( C) and precipitation totals (mm) using different spatialization methods: triangle interpolation (TI), inverse distance weighting (IDW), trend surface analysis (TSA), thin plate spline (TPS), ordinary kriging (OK), universal kriging (UK), multivariate regression analysis (MRA), MRA with residual kriging (MRA-OK), and geographically weighted regression (GWR)dcont'd

GIS in Climatology and Meteorology

221

(prescribed) boundary layer depth to account for temperature inversions. PRISM is commonly assessed to be a powerful spatialization tool particularly in areas with sparse station networks, not sufficiently representing topographic variations and thus was widely applied to compile high quality gridded climate data sets (e.g., Daly et al., 1994; Daly et al., 2008; Schwarb et al., 2001).

2.10.3.3.3 Machine learning algorithms

Machine learning algorithms are a relatively new approach for spatial data analytics in general and data interpolation in particular, but have proved their predictive capability in various other disciplines and applications. They encompass an enormous number of methods that allow computers to generalize from experience by learning from a training dataset. Commonly, algorithms are distinguished according to their learning style into supervised and unsupervised techniques. While unsupervised algorithms search for inherent structures in the data, supervised algorithms are provided the correct labels or function values (here meteorological parameters) in the learning phase. Further, they are differentiated into regression- and classification-type algorithms, which predict numeric or categorical values, respectively. Eventually, they can be grouped according to similarities into broader categories such as regression (discussed before), instance-based algorithms (e.g., k-nearest neighbor), decision trees (e.g., classification and regression trees, CART), artificial neural networks (ANNs) and deep learning algorithms (essentially more complex ANNs). Finally, ensembles (committees of classifiers, ensemble forecasting) are used, which combine different classifiers to jointly make a decision, following the same logic that human committees tend to make better decisions than an individual. Machine learning approaches can moreover be combined with deterministic and geostatistical approaches.

A comparison of different approaches for spatial interpolation using mud content samples was conducted by Li et al. (2011), who found, in accordance with other studies, random forest (RF) algorithms to be particularly effective. Thus, this algorithm is the focus here and is discussed in more detail, along with ANNs, which are presumably the most relevant class of machine learning methods. RF was created by Leo Breiman (2001) and is an ensemble of tree classifiers which vote for the outcome. For each tree, a random sample is drawn from the full training set and a random subset of the input parameters is considered, to assure that the individual trees are different and thus the overall vote is less sensitive to noise. The remaining part of the training set can be used to compute an unbiased error estimate, called the "out of bag" error. RF has excellent accuracy and is highly computationally efficient at the same time.

ANNs imitate biological neural networks. They consist of a number of neurons (nodes in a graph), typically arranged in different connected layers (input layer, hidden layers, and output layer). The nodes have an activation level, which is sent to several other connected neurons (output signal). Each node has a weighted nonlinear activation function, which transfers the input signals into the output signal (activation level). The weights of the single nodes determine the output of the entire network and are optimized in the training phase. ANNs have been used for many tasks and have been shown to offer great advantages in interpolation, such as robustness against noisy data and nonlinear function approximation (Tveito et al., 2008).
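As an illustration of RF used for spatialization, the following sketch (assuming scikit-learn is available; the synthetic data and parameter choices are purely illustrative) trains a regression forest on station coordinates and altitude and predicts values at DEM grid nodes, reporting the "out of bag" error mentioned above:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic station sample: easting, northing (km) and altitude (m)
n = 200
xy = rng.uniform(0, 100, size=(n, 2))
alt = rng.uniform(50, 1500, size=n)
X = np.column_stack([xy, alt])

# Toy predictand: temperature decreasing with altitude, plus a spatial trend and noise
y = 12.0 - 0.0065 * alt + 0.02 * xy[:, 0] + rng.normal(0, 0.5, size=n)

rf = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X, y)
print(f"OOB R^2: {rf.oob_score_:.2f}")  # unbiased "out of bag" error estimate

# Spatialization step: predict at DEM grid nodes (coordinates plus elevation)
grid_nodes = np.array([[50.0, 50.0, 300.0], [50.0, 50.0, 1200.0]])
print(rf.predict(grid_nodes))
```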

2.10.3.4 Validation and Accuracy Assessment

The accuracy of spatialization results in the first instance depends on the quality of the input data, but equally, inappropriate spatialization methods may yield biased results or even objective errors, particularly when applying methods such as TPS or MRA, where the spatial prediction functions are not bound to the observational data range. Apart from visual checks of the results obtained, grid statistics compared against observations facilitate an a priori detection of questionable results. The most common approach to evaluate the accuracy of spatialization results is cross-validation, which is presently implemented in almost all GIS software packages. In cross-validation, a particular spatialization approach is applied only to a subset of the input data (observations), while the unconsidered data subsequently serve as a "statistically independent" data basis to estimate the precision of the predicted values. A robust but elaborate approach is leave-one-out cross-validation (LOOCV, e.g., Daly et al., 2008). In this validation procedure, climate surfaces are estimated n times from n − 1 data points P_i, each time omitting one value from the input data. The residuals (i.e., the differences between the observed values v_i and estimated values v̂_i) are subsequently statistically aggregated as the root mean square error (RMSE), normalized root mean square error (NRMSE), mean absolute error (MAE), coefficient of determination (R²), adjusted coefficient of determination (R²_a) and coefficient of correlation (R) by:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(v_i - \hat v_i\right)^2} \qquad (28)$$

$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{v_{\max} - v_{\min}} \qquad (29)$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|v_i - \hat v_i\right| \qquad (30)$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(v_i - \hat v_i\right)^2}{\sum_{i=1}^{n}\left(v_i - \bar v\right)^2}, \qquad R_a^2 = 1 - \frac{n-1}{n-k-1}\left(1 - R^2\right) \qquad (31)$$

$$R = \sqrt{R^2} \qquad (32)$$


where v_i and v̂_i are the observed and estimated (predicted) values at the points P_i, n is the number of data points, k is the number of predictors, and v_max and v_min are the maximum and minimum values of the observational data base, considered in Eq. (29) to normalize the RMSE with the observed data range. The coefficient of determination is the ratio between the explained (systematic) variance and the total variance of v_i and accordingly quantifies the share of the total variance of v_i which is "explained" by the spatialization function. The adjusted R² is considered in spatialization approaches which are based on explanatory variables or statistical predictors, as is the case in some deterministic and geostatistical interpolation methods and particularly in multivariate statistical approaches. Since the unadjusted R² automatically increases with an increasing number k of explanatory variables, even if we add a dummy variable from a random number generator, this spurious increase of R² is countered by the adjusted R². The coefficient of correlation R is the ratio between the covariance of the observed and estimated values (v_i, v̂_i) and the product of their standard deviations. In this context, R is the Pearson product-moment correlation coefficient, which is simply derived from a given R² as its square root. In cases where more than one explanatory variable is considered in spatialization, R is the multiple (Pearson product-moment) correlation coefficient.

Grid statistics and cross-validation results of the previously introduced spatialization methods are listed in Table 1, referring at first to the long-term January, July and annual means of temperature and precipitation from the German example domain. In addition to the aforementioned validation measures, the maximum of the absolute deviations (AEM) is considered to illustrate the dimension to which spatialization results may differ from observations. The comparative assessment shows that regression-based approaches distinctly outperform interpolation methods, both for temperature and precipitation. Although both kriging variants perform slightly better than the deterministic interpolation methods and do not produce implausible values like TSA (e.g., the minimum of annual precipitation of −118.6 mm), their coefficients of determination remain in almost all cases below those of MRA and GWR. The most precise approximations of temperature and precipitation variations are obtained with MRA in combination with residual kriging, where the maximum of the absolute deviations of temperature is less than 1 °C.

Spatial prediction of long-term means, however, is not as demanding as predicting daily values, and particularly the spatialization of daily precipitation remains a challenge, given the often discrete variation patterns of convective precipitation. To enable a comprehensive evaluation of the performance of different spatialization techniques in this respect, we performed cross-validation of spatialization results for daily records from a reduced data set of only 31 temperature and 87 precipitation stations. We selected four dates with differing circulation patterns over Europe, resulting in warm moist (Jan. 2, 2003) and cold dry (Feb. 1, 2003) weather conditions in winter, and hot dry (Aug. 7, 2003) and rather cool moist (Jun. 28, 2002) conditions in summer. The resulting grid statistics and validation results are listed in Table 2. Since almost no rain was recorded on Aug. 7, 2003, this date was not considered in the precipitation statistics.

As shown in the cross-validation results, the accuracy of daily spatialization is generally significantly worse compared to the results for long-term means. Although the regression-based approaches again mostly attain a higher accuracy for daily temperatures, the performance of MRA for precipitation is poor, especially for the moist conditions on Jan. 2, 2003, and is partly outperformed by kriging or even by IDW. Better fitting results are obtained by MRA with subsequent residual kriging (MRA-OK), stressing that the high spatial variability of precipitation can only be captured by integrating local methods. Quite accurate results in terms of validation measures and AEM are in most cases achieved by GWR, both for temperature and precipitation. Apart from altitude, GWR considered the terrain exposure index (TEI) for temperature and, additionally for precipitation, the windward-leeward index (WLI) adjusted to the particular atmospheric flow pattern. These methods are addressed in the following sections.
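To make the LOOCV procedure concrete, the following minimal sketch (using IDW as the spatialization method purely for illustration; function names are hypothetical) omits each station in turn, predicts it from the remaining stations, and aggregates the residuals as RMSE and NRMSE according to Eqs. (28) and (29):

```python
import numpy as np

def idw(xy, v, targets, p=2.0):
    """Inverse distance weighting prediction at target points."""
    d = np.linalg.norm(targets[:, None, :] - xy[None, :, :], axis=2)
    w = 1.0 / np.maximum(d, 1e-12) ** p
    return (w @ v) / w.sum(axis=1)

def loocv(xy, v):
    """Leave-one-out cross-validation: predict each station from the others."""
    preds = np.empty_like(v, dtype=float)
    for i in range(len(v)):
        mask = np.arange(len(v)) != i          # omit one observation
        preds[i] = idw(xy[mask], v[mask], xy[i:i + 1])[0]
    resid = v - preds
    rmse = np.sqrt(np.mean(resid ** 2))        # Eq. (28)
    nrmse = rmse / (v.max() - v.min())         # Eq. (29)
    return rmse, nrmse
```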

2.10.4 Climate Modeling and Application

Climate models are abstractions of the climate system, or of components of it, representing (in particular) atmospheric processes and phenomena in systems of equations. In general, modeling approaches are commonly divided into empirical (bottom-up) and numerical (top-down) models. Both basic modeling paradigms are well established in the atmospheric sciences; however, physically based numerical models, and especially general circulation models (GCMs), are of particular significance for environmental sciences and policies, given that these models allow quantitative simulations and projections of past, present and future climate under explicit consideration of the anthropogenic forcings of the climate system and its interactions with the oceans, cryosphere and biosphere (Maraun et al., 2010; Schoof, 2013). Despite significant scientific and technological progress during the IPCC process, mirrored in the development from pure circulation models to complex Earth System Models (Cubasch et al., 2013; McGuffie and Henderson-Sellers, 2014), the typical spatial discretization of state-of-the-art GCMs, on the order of 10² km, often falls short of the spatio-temporal resolution required, e.g., for case studies and integrated climate impact assessment. In this regard, current programmable GIS software provides a range of options supporting the processing and downscaling of climate model outputs. In the following subsections, we introduce specific terrain and surface parameterization methods improving the spatial representation of climate variables. Finally, we provide examples of the use of climate data in environmental modeling and assessment.

2.10.4.1 Terrain Parameterization

The orography of the Earth is a major control for the spatial differentiation of atmospheric processes and associated climatic variations (Geiger, 1969). Terrain-induced or terrain-influenced atmospheric processes are detectable at all meteorological scales (cf. Deacon, 1969; Emeis and Knoche, 2009). However, their most distinct influence on climatic patterns is due to boundary layer processes at topoclimatic scales, with dimensions between 10¹ km (meso-γ scale) and 10⁻³ km (micro-γ scale, cf. Oke, 2000). A prominent example is the differential solar radiation budget of slopes due to varying aspects, slopes, and horizon screening, which is very often modeled using GIS. The impacts of mountain ranges on the distribution patterns of precipitation, or the influences of the roughness and shape of the terrain on the lower boundary layer wind field or on the flow path of cold air, are further examples to be parameterized in GIS. This section provides a short introduction to basic DEM parameters applicable as topoclimatic estimators in climate spatialization or in the downscaling of climate model outputs. Detailed discussions and justifications of the different parameterization approaches are given in Böhner and Antonic (2009). To ease comparative reading, the terminology and notation of the following equations largely conform to Böhner and Antonic (2009).

2.10.4.1.1 Topographic solar radiation

Shortwave solar radiation is the primary climatic factor entering the energy and heat budget and thus a significant determinant of all processes of energy exchange. Moreover, the radiation gradients in longitudinal and latitudinal directions are the key driver of the atmospheric circulation. Given that solar radiation is also a major limiting factor for photosynthesis and plant growth, as well as a key factor for evapotranspiration, radiation is an often-required climate parameter for many applications.

Incoming shortwave solar radiation covers wavelengths from approximately 0.3–3.5 μm, with the highest share in the visible spectrum (0.39–0.71 μm), and reaches the Earth's surface either as direct solar radiation or as diffuse radiation received from the sky's hemisphere. Assuming an unobstructed horizontal surface, the net shortwave radiation S_n is commonly expressed as:

$$S_n = S_g\,(1 - r) \quad\text{with}\quad S_g = S_d + S_h \qquad (33)$$

where S_g is the global radiation and S_d and S_h are the direct and diffuse components of down-welling irradiance. The surface albedo (or surface reflectance) r reduces S_n to the absorbed share of S_g and depends on the nature of the underlying ground. For tabulated standard values of r for different natural surfaces see, e.g., Oke (2000). Values of global radiation and its components, considered in GIS applications, may be either obtained from station observations, climate model outputs, or atmospheric reanalyses, or estimated from atmospheric data using a transmittance parameterization approach of the form:

$$S_d = \sin\theta\; S_c\, \tau_z \qquad (34)$$

$$S_h = 0.5\,\sin\theta\; S_c\, c\,(1 - \tau_z) \qquad (35)$$

$$\tau_z = b^{\int_z^{N} \ell \,\mathrm{d}z} \qquad (36)$$

where θ is the sun elevation angle, S_c is the solar constant and ℓ is the air density, integrated from the elevation z of a target location (e.g., a DEM grid node) to the top of the troposphere N. The parameterization according to the Bouguer–Lambert law (Malberg, 2007; Kyle, 2013) approximates the transmittance of the atmosphere τ_z by an empirical estimation of its optical depth. The empirical coefficients c and b represent the strength of scattering and absorption of the solar irradiance passing the atmosphere and may be calibrated from pyranometer measurements or estimated as empirical functions of water vapor or precipitable water (Böhner, 2006). Note that Eqs. (34)–(36) refer to clear-sky conditions and assume an atmosphere which is homogeneous in terms of its vertical chemical composition. For a comprehensive discussion of the attenuation effects of clouds and atmospheric inhomogeneities see, for example, Kyle (2013).

In Eq. (33) the net shortwave radiation refers to an ideal planar surface, not obstructed by any orographic effects. At sloping surfaces, the radiation components are modified by the local geometry of the terrain and horizon screening. The net shortwave radiation is then expressed as:

$$S_n^* = S_g^*\,(1 - r) \quad\text{with}\quad S_g^* = S_d^* + S_h^* \qquad (37)$$

where S_g*, S_d* and S_h* now denote the radiation components altered by the terrain. Given that the direct solar radiation S_d* reaches the Earth's surface as a directed beam while the diffuse radiation S_h* is rather isotropic, S_d* and S_h* need to be modeled separately. Assuming a temporally high-resolution (e.g., hourly) computation of the diurnal course of S_d* for a particular point (i.e., a DEM grid node) on a sloping surface not obstructed by cast shadowing, orographic effects at first require a calculation of the sun elevation θ and azimuth Φ for each grid node and time step, given by:

$$\sin\theta = \cos\lambda\cdot\cos\delta\cdot\cos\omega + \sin\lambda\cdot\sin\delta \qquad (38)$$

$$\cos\Phi = \frac{\cos\delta\cdot\cos\omega - \sin\theta\cdot\cos\lambda}{\sin\lambda\cdot\cos\theta} \qquad (39)$$

where λ is the latitude, δ is the solar declination angle, and ω is the hour angle (all given in degrees). Based on the primary DEM parameters slope β and aspect α, the direct solar radiation S_d* is then a function of the illumination angle γ for a given sun position and time step:

$$S_d^* = \frac{S_d\cdot\cos\gamma}{\sin\theta} \quad\text{with}\quad \cos\gamma = \cos\beta\cdot\sin\theta + \sin\beta\cdot\cos\theta\cdot\cos(\Phi - \alpha) \qquad (40)$$

In the case of self-shadowing, indicated by cos γ < 0, the point (grid node) is not directly illuminated and S_d* is set to zero. The same applies for cast shadowing by the terrain, which occurs if the solar elevation angle θ is smaller than the horizon angle. In a DEM, the horizon angle φ_z of each grid node for a given azimuth is calculated by:

$$\varphi_z = \arctan\!\left(\max\frac{z_\varphi - z}{d}\right)$$

where z is the elevation of the target grid node and d is the distance to the grid node with the elevation z_φ determining the horizon angle. The computation of the horizon angle using a DEM is a common standard routine in GIS software. The effect of cast shadowing is illustrated in Fig. 8A.

Modeling orographic effects on the diffuse solar radiation S_h* requires an estimation of the obstruction of the overlying sky hemisphere by the surrounding terrain, referred to as the sky view factor (SVF). The computation of the SVF for every grid node aggregates the horizon angle φ_z over all azimuth directions of the full circle by:

$$\mathrm{SVF} = \frac{1}{2\pi}\int_0^{2\pi}\left[\cos\beta\,\cos^2\varphi + \sin\beta\,\cos(\Phi - \alpha)\cdot\left(90 - \varphi - \sin\varphi\,\cos\varphi\right)\right]\mathrm{d}\Phi \qquad (41)$$

The accuracy of the calculation depends on the number of directions considered; however, given that the SVF is a static terrain parameter which only needs to be calculated once, azimuth increments of 1 degree are recommended. As shown in Fig. 8B, the SVF has a value of 1 at peaks, ridges, and flat areas not obstructed by horizon screening, while minimum values of about 0.5 occur in deeply cut valleys, indicating that the incoming diffuse shortwave radiation is reduced by half. Based on the SVF, the diffuse component S_h* in each time step is then determined by:

$$S_h^* = \mathrm{SVF}\cdot S_h \qquad (42)$$

Referring to Eq. (37), daily totals are finally obtained as sums of the direct and diffuse radiation components. The examples of daily totals of potential net shortwave radiation under clear-sky conditions at winter and summer solstice, shown in Fig. 9, were obtained from simulations at 1 h resolution. For GIS-based modeling of net long-wave radiation, see Böhner and Antonic (2009).
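As a compact illustration of Eqs. (38)–(40), the following sketch (under the angle conventions reconstructed above; all angles in radians) computes the terrain-corrected direct radiation for one grid node and time step, setting S_d* to zero for night and self-shadowed situations:

```python
import numpy as np

def direct_radiation_on_slope(S_d, lat, decl, hour_angle, slope, aspect):
    """Terrain-corrected direct shortwave radiation for one grid node.

    All angles in radians; S_d is the direct radiation on an unobstructed
    horizontal surface. Cast shadowing (horizon angles) is not treated here.
    """
    # Sun elevation, Eq. (38)
    sin_theta = (np.cos(lat) * np.cos(decl) * np.cos(hour_angle)
                 + np.sin(lat) * np.sin(decl))
    theta = np.arcsin(np.clip(sin_theta, -1.0, 1.0))
    # Sun azimuth, Eq. (39) as reconstructed (singular at the equator)
    cos_phi = ((np.cos(decl) * np.cos(hour_angle) - sin_theta * np.cos(lat))
               / (np.sin(lat) * np.cos(theta)))
    phi = np.arccos(np.clip(cos_phi, -1.0, 1.0))
    # Illumination angle, Eq. (40)
    cos_gamma = (np.cos(slope) * sin_theta
                 + np.sin(slope) * np.cos(theta) * np.cos(phi - aspect))
    if sin_theta <= 0 or cos_gamma <= 0:   # night, or self-shadowed slope
        return 0.0
    return S_d * cos_gamma / sin_theta
```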

Fig. 8 (A) Solar illumination after sunrise at summer solstice; dark blue indicates cast shadowing and self-shadowing. (B) Sky view factor (SVF); values range from 1 (white) to 0.5 (blue).

Fig. 9 (A) Potential topographic solar radiation under clear-sky conditions at winter solstice; values range from 0.371 kW/m² (blue) to 1.131 kW/m² (white). (B) Potential topographic solar radiation under clear-sky conditions at summer solstice; values range from 7.024 kW/m² (blue) to 8.419 kW/m² (white).

2.10.4.1.2 Topographic temperature

As shown in the "Data Analysis and Spatialization" section, temperature is closely correlated with altitude, particularly when analyzing long-term means. Given that the saturation vapor pressure is determined by the air temperature, the same is valid for the atmospheric moisture content, which typically decreases exponentially with height. Spatial variations of temperature and moisture, however, are determined by the vertical state of the troposphere, which varies with circulation modes altering temperature and moisture lapse rates, particularly in the case of inversions. At the topoclimatic scale, the diurnal differential heating of slopes and the nocturnal cold air formation and cold air flow lead to distinct distribution patterns of moisture and temperature in the near-surface layer. Although both processes can be partly expressed in physical terms, using microclimatic models, terrain analysis also allows an approximation of these orographically induced modulations of near-ground processes.

The diurnal differential heating of slopes is due to shifts in the Bowen ratio, with a higher fraction of latent heat flux in the morning hours, when the ground surface is still moist due to condensation, and an increasing transfer of sensible heat in the afternoon, which results in a relative heat surplus at western slopes; this is particularly evident in the preference for south- to west-facing slopes in the planting of sensitive crops such as vineyards. A sufficient approximation of the anisotropic diurnal heat distribution may be obtained by:

$$H_a = \cos\left(\alpha_{\max} - \alpha\right)\cdot\arctan\left(\beta\right) \qquad (43)$$

where H_a is the anisotropy parameter, β is the slope and α is the aspect. The term α_max defines the aspect with the maximum total heat surplus, which may be calibrated based on field observations.

Cold air formation and cold air flow are typical phenomena in calm and cloud-free nights, resulting in radiative heat loss of the ground surface and a transfer of sensible heat from the near-surface layer to the ground. In sloping terrain, cold (and thus denser) air flows gravitationally downhill and accumulates in sinks, valleys or basins. Particularly in mountainous areas with steeply sloping surfaces, pulsating cold air currents are frequently occurring phenomena (Deacon, 1969), and in the mountain-rimmed basins of Central Asia, stagnating cold air throughout the winter even forms high-reaching cold air domes and persistent inversion layers (Böhner, 2006). Despite some analogies, cold air currents differ from the flow patterns of the much denser agent, water. This is particularly apparent in broad valleys, where the cold air distribution is not limited to channel lines but disperses and typically covers the entire valley ground. Accordingly, a simple DEM-based parameterization of the cold air contributing upslope area by the DEM catchment area fails to adequately capture this effect (see Fig. 10A). A more suitable approach in this regard is a slope-dependent iterative modification of the DEM catchment area CA, using:

$$CA_m = CA_{\max}\left(\frac{1}{10}\right)^{\beta\exp\left(10^{-\beta}\right)} \quad\text{for}\quad CA < CA_{\max}\left(\frac{1}{10}\right)^{\beta\exp\left(10^{-\beta}\right)} \qquad (44)$$

where CA_m is the modified catchment area, β is the slope angle and CA_max is the maximum DEM catchment area size in a 3 × 3 moving window, computed according to the multiple flow direction method of Freeman (1991). This algorithm proves suitable to approximate the flow path of cold air and the size of the cold air contributing upslope area (Dietrich and Böhner, 2008). The resulting distribution pattern is shown in Fig. 10B.

A second important terrain parameter, which is especially relevant for estimating potential inversion heights in valleys experiencing nocturnal or persistent wintertime temperature inversions, is the altitude above channel lines. Based on a reasonable channel network grid, which in GIS is typically initialized with a catchment area threshold, each grid node of the channel network is assigned the base elevation of 0 m, and the altitude above the channel network is subsequently computed using the single flow direction method. The iterative procedure required to prevent abruptly changing values at the watersheds is described in Böhner and Köthe (2003). The altitude above channel lines moreover allows the computation of the normalized altitude, scaled between 0 and 1, and of the midslope zone, indicating the thermal belt at slopes (Böhner and Antonic, 2009).
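The anisotropic heating term of Eq. (43) translates directly into code; in the following minimal sketch, the value of α_max is an assumed placeholder that would be calibrated from field observations:

```python
import numpy as np

def anisotropic_heating(aspect, slope, aspect_max=np.deg2rad(202.5)):
    """Anisotropy parameter H_a after Eq. (43); aspect and slope in radians.

    The default aspect_max (here an assumed SSW direction) is a placeholder;
    it should be calibrated from field observations as noted in the text.
    """
    return np.cos(aspect_max - aspect) * np.arctan(slope)
```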

2.10.4.1.3 Topographic precipitation

Although the spatio-temporal dynamics of cloud formation and precipitation are also significantly affected by the terrain, the relationship is rather complex, owing to alternating thermally and dynamically induced processes, and moreover varies between precipitation regimes.

Fig. 10 (A) DEM catchment area. (B) Cold air contributing upslope area. In both figures the values are normalized from 0 (white) to 1 (blue).

In the convective regimes of the tropics, precipitation amounts typically increase in the convective boundary layer, while the exponentially decreasing air moisture content in the mid- to upper troposphere results in a corresponding drying above the average condensation level. In the subtropics, and more distinctly in the mid-latitudes, frequent high-reaching advection of moisture-bearing air at fronts leads to increasing precipitation amounts in high mountains such as the Alps (cf. Weischet and Endlicher, 2012). Vertical precipitation gradients in high mountain ranges may be further strengthened by diurnal autochthonous upslope breezes, intensifying cloud and shower formation in upper slope positions, while the subsiding branch of these local circulation systems along the valley axis leads to cloud dissolution and a corresponding reduction of precipitation in the valley grounds. This phenomenon is particularly pronounced in subtropical and tropical high-mountain ranges like the Himalayas or the Bolivian Andes, where the thermally induced daytime circulation is mirrored in the vegetation physiognomy, with semidesert vegetation in the interior dry valleys and humid forests at neighboring upper slopes (Böhner et al., 2015; Miehe et al., 2015).

The most obvious orographic effect, however, is orographic precipitation caused by the uplift of moist air currents at the windward side of mountain ranges, resulting in increased precipitation amounts, while the rain shadow effect at leeward settings, induced by the blockage of moisture-bearing air and subsiding air currents, is simultaneously mirrored by low precipitation rates or even dry conditions. This differentiation may be roughly captured by the DEM aspect when using a coarse resolution or strongly generalized DEM; however, the decisive trigger for orographic precipitation is not simply the aspect of a particular location (i.e., a DEM grid node) but the large-scale orientation of a mountain range against moist air currents. Given that the uplift of moist air at windward settings of a mountain range and the resulting precipitation pattern are associated with an increasing angular slope of moisture-distributing trajectories, Eq. (45) for the windward horizon parameter H_Φ and Eq. (46) for the leeward horizon parameter H_h proved suitable as parameterizations of terrain-determined windward–leeward effects (WLI) on precipitation amounts (Böhner, 2006; Böhner and Antonic, 2009):

$$H_{\Phi} = \frac{\sum_{i=1}^{n}\frac{1}{\Delta h_{\Phi i}}\tan^{-1}\!\left(\frac{\Delta z_{\Phi i}}{\Delta h_{\Phi i}^{0.5}}\right)}{\sum_{i=1}^{n}\frac{1}{\Delta h_{\Phi i}}} + \frac{\sum_{i=1}^{n}\frac{1}{\Delta h_{h i}}\tan^{-1}\!\left(\frac{\Delta z_{h i}}{\Delta h_{h i}^{0.5}}\right)}{\sum_{i=1}^{n}\frac{1}{\Delta h_{h i}}} \qquad (45)$$

$$H_{h} = \frac{\sum_{i=1}^{n}\frac{1}{\ln\left(\Delta h_{h i}\right)}\tan^{-1}\!\left(\frac{\Delta z_{h i}}{\Delta h_{h i}^{0.5}}\right)}{\sum_{i=1}^{n}\frac{1}{\ln\left(\Delta h_{h i}\right)}} \qquad (46)$$

$$\mathrm{WLI} = H_{\Phi}\cdot H_{h} \qquad (47)$$

With respect to a particular grid node, Δh_Φi and Δz_Φi are the horizontal and vertical distances to the grid nodes in the wind direction, and Δh_hi and Δz_hi are the corresponding horizontal and vertical distances in the opposite (leeward) direction. Note that the calculation either requires a prescribed wind direction or, in the case of statistical downscaling, a wind vector taken from climate model outputs or atmospheric reanalyses. Fig. 11A gives an example of the resulting distribution pattern for the wind direction WNW.

The computation of the WLI over all directions of the full circle results in the terrain exposure index (TEI), which indicates the degree to which a particular location is exposed to (or sheltered from) advection. As shown in Fig. 11B, the TEI is lowest in deeply cut valleys largely sheltered from advection and highest at exposed peaks and ridges. Accordingly, the TEI is particularly suitable as an estimator in settings where local circulation systems cause a distinct small-scale differentiation between dry valleys and moist slopes.
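Under the notational assumptions above, the following sketch illustrates how Eqs. (45)–(47) may be evaluated for one grid node from hypothetical profiles of distances along the wind and leeward directions (a direct transcription of the reconstructed formulas, not a validated implementation):

```python
import numpy as np

def windward_leeward_index(dz_wind, dh_wind, dz_lee, dh_lee):
    """WLI for one grid node after Eqs. (45)-(47).

    dz_wind/dh_wind: vertical/horizontal distances (m) to grid nodes
    sampled along the wind direction; dz_lee/dh_lee: the same for the
    opposite (leeward) direction. Requires dh > 1 m so that the
    logarithmic weights in Eq. (46) stay positive.
    """
    def weighted_mean(dz, dh, w):
        return np.sum(w * np.arctan(dz / np.sqrt(dh))) / np.sum(w)

    h_phi = (weighted_mean(dz_wind, dh_wind, 1.0 / dh_wind)
             + weighted_mean(dz_lee, dh_lee, 1.0 / dh_lee))      # Eq. (45)
    h_lee = weighted_mean(dz_lee, dh_lee, 1.0 / np.log(dh_lee))  # Eq. (46)
    return h_phi * h_lee                                         # Eq. (47)
```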

Fig. 11 (A) Windward–leeward index (WLI) for WNW wind direction; values range from 0.70 (white) to 1.45 (blue). (B) Terrain exposure index (TEI) for WNW wind direction; values range from 0.75 (white) to 1.35 (blue).

2.10.4.2 Land-Surface Parameterization

Land surface parameterizations (LSP; also land surface models, LSM) generally represent the interface between land surface and atmosphere and in particular include the surface energy and water balances, the aerodynamic interaction and momentum sinks, and more recently also carbon storage and release. Due to the large radiative fluxes at the surface, they are a critical component of all weather and climate models. In general, there is strong evidence that the choice of LSP is critical for modeling results (Pitman, 2003). Thomas et al. (2014), for instance, simulated Western Disturbances with the Advanced Research WRF using different LSPs and found that circulation features and precipitation were sensitive to the selection of the LSP scheme.

2.10.4.2.1 LSP types and development

In the earliest LSPs, soil moisture was prescribed and static, until Manabe (1969) first introduced a "bucket model" for a GCM, in which the soil water content increases with precipitation and decreases with evapotranspiration; this allowed the simulation of the hydrologic cycle, i.e., the interaction between Earth surface hydrology and the general circulation. Meanwhile, the complexity of LSPs has increased considerably and a diversity of approaches has developed. Soil-vegetation-atmosphere-transfer (SVAT) models are of specific relevance and account for the impact of vegetation on the surface energy and water balances. In particular, the effects of vegetation on the latent and sensible heat fluxes (i.e., the Bowen ratio), momentum and the radiation balance are parameterized (Koster et al., 2000). Even within SVAT models, there is a large variety of approaches of different complexity, which can be divided into meteorological and physical models with different representations of the stomatal response (Niyogi and Raman, 1997). Operational meteorological models typically use a resistance approach to model the transpiration of the plant as a function of meteorological parameters. For the simpler models this is equivalent to the Penman-Monteith evapotranspiration equation (Monteith, 1965), which parameterizes net evapotranspiration from temperature, wind speed, humidity and radiation:

$$\lambda E_{PM} = \frac{\Delta\left(R_n - G\right) + \rho\, c_p\left(E - e\right)/r_a}{\Delta + \gamma\left(1 + r_c/r_a\right)} \qquad (48)$$

where λE_PM is the latent heat flux (J m⁻² s⁻¹); Δ is the change rate of saturation vapor pressure with air temperature (hPa K⁻¹); R_n is the net irradiation (J m⁻² s⁻¹); G is the ground heat flux (J m⁻² s⁻¹); ρ is the density of dry air (kg m⁻³); c_p is the specific heat capacity of dry air (J kg⁻¹ K⁻¹); e (E) is the (saturated) vapor pressure (hPa); γ is the psychrometric constant (hPa K⁻¹); and r_a and r_c are the aerodynamic and canopy resistances (s m⁻¹), which can be defined as:

$$r_a = \frac{\ln\!\left[\frac{z - \frac{2}{3}h}{z_{0m}}\right]\cdot\ln\!\left[\frac{z - \frac{2}{3}h}{z_{0h}}\right]}{k^2\, u_z} \qquad (49)$$

$$r_c = \frac{r_l}{LAI_{\mathrm{active}}} \qquad (50)$$

where z is the height above ground (m); h is the height of the vegetation canopy (m); k is the von Kármán constant (0.41); z_0m and z_0h are the roughness lengths for momentum and heat (m); r_l is the stomatal resistance (s m⁻¹); LAI_active is the active leaf area index; and u_z is the wind speed at height z (m s⁻¹).

A limitation of most LSPs is the large emphasis on the formulation of one-dimensional, vertical physics, while lateral processes and the horizontal heterogeneity of surface properties (such as soil moisture variability and runoff) are often neglected (Koster et al., 2000). Therefore, these authors introduced a new model approach based on hydrological catchments at higher resolution, which is also more compatible with GIS-based hydrological models. Ludwig and Mauser (2000) presented a GIS-based SVAT model. Further GIS-based approaches include spatialization techniques based on statistical predictors. Such approaches allow for the assimilation of observations, but depend on spatially consistent, near real-time input data, which can best be derived from remote sensing. The most relevant parameters (and thus covariates) for the surface energy and water balances at the interface between land and atmosphere include land surface temperature (LST), soil moisture, and vegetation indices.
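Eqs. (48)–(50) translate directly into code; the following sketch is a plain transcription of the formulas with the units listed above (not a calibrated evapotranspiration model):

```python
import math

def penman_monteith_le(delta, Rn, G, rho, cp, E, e, ra, rc, gamma):
    """Latent heat flux lambda*E_PM after Eq. (48)."""
    return ((delta * (Rn - G) + rho * cp * (E - e) / ra)
            / (delta + gamma * (1.0 + rc / ra)))

def aerodynamic_resistance(z, h, z0m, z0h, uz, k=0.41):
    """Aerodynamic resistance r_a after Eq. (49); the (z - 2/3 h) terms
    contain the zero-plane displacement of the canopy."""
    zd = z - (2.0 / 3.0) * h
    return math.log(zd / z0m) * math.log(zd / z0h) / (k ** 2 * uz)

def canopy_resistance(rl, lai_active):
    """Bulk canopy resistance r_c after Eq. (50)."""
    return rl / lai_active
```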

2.10.4.2.2 Land surface temperature

LST is typically derived by inversion of Planck's law in the atmospheric window between 10 and 12 μm, which corresponds to the emission maximum of the Earth:

$$B_\lambda\left(\lambda, T\right) = \frac{2hc^2}{\lambda^5}\,\frac{1}{e^{hc/\left(\lambda k_B T\right)} - 1} \qquad (51)$$

where B_λ is the spectral radiant emittance in W/m³, λ is the wavelength, h is Planck's constant, T is the absolute temperature of the body, c is the velocity of light, and k_B is Boltzmann's constant. The most common approaches to correct for the atmospheric radiance are split-window methods (based on the different atmospheric attenuation in two adjacent bands), and the land surface emissivity is often estimated from vegetation indices or land cover.

Since these data mostly have cloud gaps, which complicate spatial analysis, aggregated parameters from long time series, such as the annual cycle parameters (ACP), provide a good alternative for empirical modeling (Bechtel, 2015; Bechtel, 2012). Another option is passive microwave data, which are not affected by clouds and thus largely weather-independent, but have a much coarser spatial resolution. A comprehensive review of LST remote sensing is given by Li et al. (2013), and a review of applications in climatology and meteorology by Tomlinson et al. (2011). Since the near-surface air temperature cannot be directly observed from satellites, it is often subsequently estimated from LST, where the differences in the diurnal cycle have to be taken into account (Bechtel et al., 2014). An operational system for high-resolution air temperature monitoring based on atmospheric temperature profiles and LST is presented by Keramitsoglou et al. (2016).
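For illustration, inverting Eq. (51) for the temperature of an ideal black body (deliberately omitting the atmospheric and emissivity corrections discussed above) can be sketched as follows:

```python
import math

H = 6.626e-34    # Planck constant (J s)
C = 2.998e8      # speed of light (m/s)
KB = 1.381e-23   # Boltzmann constant (J/K)

def brightness_temperature(radiance, wavelength):
    """Black-body temperature from spectral radiance by inverting Eq. (51).

    radiance: B_lambda in W m^-3, per the units given in the text;
    wavelength in m, e.g., 11e-6 for a thermal band near 11 um.
    """
    return (H * C / (wavelength * KB)) / math.log(
        1.0 + 2.0 * H * C ** 2 / (radiance * wavelength ** 5))
```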


2.10.4.2.3 Soil moisture and vegetation

Soil moisture is mainly obtained using microwave techniques, as suggested by Jackson (1993), since in the case of low vegetation soil moisture is the dominant control on the microwave emission (Njoku and Entekhabi, 1996). The most common wavelength is the L-band at about 1.4 GHz, which is, for instance, used by the Soil Moisture Ocean Salinity (SMOS) mission launched by ESA in 2009. Active radar observations are strongly influenced by the surface roughness, and therefore different wavelengths are needed.

Vegetation coverage is often parameterized with the help of vegetation indices such as the normalized difference vegetation index (NDVI), which are based on the red-edge characteristics (the increase of reflectance between the red and NIR wavelengths) of photosynthetically active vegetation:

$$\mathrm{NDVI} = \frac{\rho_{NIR} - \rho_{RED}}{\rho_{NIR} + \rho_{RED}} \qquad (52)$$

where ρ_NIR and ρ_RED are the reflectances in the near infrared and red band, respectively. Subsequently, fractional vegetation cover or the leaf area index can be estimated. Additional indices account for the influence of bare soil, like the soil-adjusted vegetation index:

$$\mathrm{SAVI} = \frac{\rho_{NIR} - \rho_{RED}}{\rho_{NIR} + \rho_{RED} + L}\left(1 + L\right) \qquad (53)$$

where L is a correction factor for the canopy density between 0 (dense) and 1 (sparse) (Huete, 1988), or focus on the vegetation water content, like the normalized difference water index:

$$\mathrm{NDWI} = \frac{\rho_{NIR} - \rho_{SWIR}}{\rho_{NIR} + \rho_{SWIR}} \qquad (54)$$

where ρ_SWIR is the reflectance in the shortwave infrared at about 1.24 μm, as defined by Gao (1996). Sentinel-2, launched in 2015, has three additional bands at the red edge, which allow a more detailed analysis of vegetation composition and health status.

Finally, the previous parameters can be combined to estimate fluxes of the surface energy balance, i.e., the latent heat flux, often incorporating SVAT models. In particular, soil moisture and heat fluxes can be estimated from the combination of LST and vegetation indices (Petropoulos et al., 2009), and the roughness lengths can be derived from SAR interferometry (Bechtel et al., 2011). Since the full range of remote flux retrieval methods is beyond the scope of this chapter, Kalma et al. (2008) is recommended for a review.
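The three indices of Eqs. (52)–(54) translate directly into code; in the following sketch the default L = 0.5 is a common intermediate choice (Huete, 1988):

```python
def ndvi(nir, red):
    """Normalized difference vegetation index, Eq. (52)."""
    return (nir - red) / (nir + red)

def savi(nir, red, L=0.5):
    """Soil-adjusted vegetation index, Eq. (53); L between 0 (dense
    canopy) and 1 (sparse canopy)."""
    return (nir - red) * (1.0 + L) / (nir + red + L)

def ndwi(nir, swir):
    """Normalized difference water index, Eq. (54); swir at ~1.24 um."""
    return (nir - swir) / (nir + swir)
```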

2.10.4.2.4 Urban areas

The need for detailed information on the surface is especially evident in urban areas, which are of particular relevance due to their large population and specific vulnerability to climate change. While urban land surface schemes of different complexity are available (Chen et al., 2011; Martilli et al., 2002; Masson, 2000), which parameterize the urban canopy and the building energy balance, the needed input data on the heterogeneous urban structure are frequently lacking. The World Urban Database and Access Portal Tools (WUDAPT) is an international initiative to collect data on the form and function of cities around the world for the atmospheric sciences, using remote sensing and crowdsourcing approaches (Bechtel et al., 2015).

The urban multi-scale universal predictor (UMEP) is an urban climate service and modeling system designed to be applied from street canyon to city scale and builds on a number of existing state-of-the-art urban atmospheric modeling components. In particular, it integrates the solar and longwave environmental irradiance geometry model (SOLWEIG), which simulates 3D radiation fluxes; the surface urban energy and water balance scheme (SUEWS), which simulates the urban energy and water balances based on surface cover and routinely observed meteorological variables; the convective boundary layer urban energy water scheme (BLUEWS); and the large scale urban consumption of energy model (LUCY), which simulates the anthropogenic heat flux (Lindberg et al., 2015). While the previous implementations of the model components were typically written in FORTRAN and MATLAB, and implied the editing and reading of various ASCII files for setting input parameters and analyzing results, UMEP has chosen a GIS environment as its integration platform and common graphical user interface. The model is split into a preprocessor, a processor, and a postprocessor, which are all implemented in an open source GIS. This is to deliver climate-sensitive planning tools that can be directly operated and visualized by practitioners and that take advantage of the available GIS functionality, such as the in- and output of various geodata formats and the handling of projections and grid definitions.

2.10.4.3 Downscaling

In order to bridge the scale gap between limited resolution GCMs and the climate information needs at the regional or local level, various downscaling methods have been developed. Presuming that local scale climates are determined by large-scale atmospheric processes, these refinement techniques either aim to estimate point-scale climatic information for particular locations or to generate distributed fields of the major climatic variables with high spatial resolution (Maraun et al., 2010; Schoof, 2013). From a methodological perspective, downscaling methods can be distinguished into physically and empirically based approaches. However, the terms dynamical and statistical downscaling today are more common in the literature. Basic approaches and principal assumptions, advantages and limitations of these major downscaling techniques are summarized in this chapter.

2.10.4.3.1 Dynamical downscaling

Dynamical downscaling is performed using high-resolution regional climate models (RCMs), either nested in GCM simulations or driven by reanalyzed atmospheric fields. Due to their higher spatial resolution and, subsequently, relatively detailed representation of topographic features and surface characteristics of the underlying ground, mesoscale atmospheric processes, such as tropical disturbances or orographic effects, can be directly resolved by RCMs (Maraun et al., 2010; Rummukainen, 2010). Forced by lateral boundary conditions from GCMs or reanalyses, RCMs solve the same fundamental differential equations of thermo- and hydrodynamics as the driving global models, and are thus considered advantageous, compared to statistical downscaling, in terms of physical consistency. On the other hand, due to the close numerical interlinkage between the forcing global model and the mesoscale model, dynamical downscaling inherits the biases of the driving model and accordingly commonly requires additional bias corrections to improve the quality of the model output (Maraun et al., 2010). Further constraints concern the parameterization of subscale processes as well as the spatial resolution, length of simulation and size of the model domain. Although RCMs, and especially nonhydrostatic mesoscale models (cf. Skamarock et al., 2005; Langkamp and Böhner, 2011), are principally able to refine the coarse resolution model forcings down to a grid size of 1 km or even less, RCM-based studies are typically conducted with a grid size of about 10 km or coarser, owing to the comparatively high computational requirements (Gerlitz et al., 2014).

Dynamical downscaling has particularly been performed in the context of the Coordinated Regional Climate Downscaling Experiment (CORDEX) of the World Climate Research Program (WCRP). The coordinated framework enabled a systematic comparison of different modeling initiatives and provided insights into the skills and shortcomings of the participating RCMs, so as to identify uncertainties in the regional climate simulations and projections (Giorgi et al., 2009). Although comparative assessments of the performance of CORDEX South America (Solman, 2013; Solman et al., 2013) and CORDEX South Asia (Mishra, 2015; Hasson, 2016; Ghimire et al., 2015) prove the principal ability of RCMs to reproduce basic spatio-temporal climate features, the modeled results often entail distinct RCM-specific biases, and the spread among ensemble members reflects both the different behavior of GCM/RCM setups and uncertainties, e.g., due to the insufficient representation of land surface processes and resolutions still too coarse to sufficiently incorporate regional forcings (Hasson, 2016; Solman, 2013).

2.10.4.3.2 Statistical downscaling

In view of the shortcomings of RCMs in terms of their high computational requirements, limited spatial resolution and typically high number of degrees of freedom in the modeling process (e.g., nesting architecture and nesting levels, domain size and resolution, parameterization schemes), all affecting the model results (Bhaskaran et al., 2012; Maraun et al., 2010), statistical downscaling is sometimes considered a suitable alternative (Böhner, 2006; Gerlitz et al., 2014, 2015). The basic idea of statistical downscaling is to exploit the observed (empirical) relationship between large-scale atmospheric variables (represented by GCMs or reanalyses) and local observations, in order to obtain statistical transfer functions which predict the local weather variations of interest in dependence of the controlling large-scale variations (von Storch, 1995). As compared to dynamical approaches, statistical downscaling is not computationally expensive and simulates local climate variations directly based on the physically consistent climate model output. However, the physical consistency of statistically downscaled climatic fields is clearly limited. Given that the statistical transfer functions have to be calculated separately for each variable of interest, the covariance between different variables may not be properly captured in the modeling results (cf. Böhner, 2005).

Statistical downscaling is frequently distinguished into model output statistics (MOS) and perfect prognosis (PP) techniques. PP-based approaches suppose that the large-scale free atmospheric model variables are perfectly simulated, so that their deterministic or probabilistic relationships with local-scale predictands can be explored directly (Rummukainen, 2010; Maraun et al., 2010). Methods applied to determine quantitative transfer functions range from multivariate standard methods (e.g., product-moment or canonical correlation analyses) to complex nonlinear machine-learning algorithms such as artificial neural networks (Schoof, 2013). Particularly ANNs are increasingly used in PP downscaling, given that this self-learning technique emulates biological neuronal networks by a set of connectionist models, which are suitable to capture the various nonlinearities and parameter interactions within the climate system (Schoof and Pryor, 2001). Application examples are given in Mendes et al. (2014) and Marengo et al. (2012), who compared ANNs with statistical autocorrelation (AC) techniques for the temporal downscaling of daily precipitation time series over the Amazon Basin. Their results indicate that ANNs significantly outperform the AC approach.

MOS downscaling assumes that the results of limited resolution climate models are inaccurate (i.e., biased) and accordingly require an adjustment of the modeled near-surface climate estimates using additive or multiplicative bias corrections. Böhner (2004, 2005, 2006) proposed a GIS-based spatialization approach which basically merges MOS downscaling and surface parameterization techniques. Supposing the spatio-temporal variability of a climatic variable to be predominantly controlled by both tropospheric and terrain-forced processes, different DEM parameters and monthly resolution tropospheric fields from NCAR-NCEP (cf. Cavazos and Hewitson, 2005) were considered as statistical predictors, supporting a high spatial resolution estimation of climate variables for Germany (Böhner, 2004), southern Amazonia (Böhner et al., 2013), the Okavango catchment (Weinzierl et al., 2014) and different modeling domains in Central and High Asia (Böhner, 2006; Gerlitz et al., 2014; Klinge et al., 2014). At the global level, MOS downscaling of the ERA-Interim reanalysis (cf. Berrisford et al., 2009; Dee et al., 2011) was applied in the CHELSA project (Climatologies at High Resolution for the Earth's Land Surface Areas) to compute high-resolution monthly temperature and precipitation climatologies since 1979 (http://chelsa-climate.org/). The data sets obtained cover the Earth's entire land surface with a horizontal grid spacing of 30 arc sec (~1 km) and are hosted at the German Climate Computing Center (DKRZ) under http://cera-www.dkrz.de. Refinement of ERA-Interim from a T255 spectral resolution (0.75 degrees Lat./Long.) to the target grid resolution was completely performed in a programmable GIS environment running on a high performance computing (HPC) architecture. MOS downscaling comprised the altitude adjustment of ERA temperatures, delineated from tropospheric temperature profiles (Gerlitz et al., 2014), and a boundary layer correction of the ERA precipitation fields, integrating the WLI with respect to the flow directions at the upper boundary layer and the TEI to account for dry valley phenomena. For a comprehensive description and justification of the downscaling algorithm and its evaluation against station observations from the Global Historical Climatology Network (GHCN) and high resolution gridded TRMM (Tropical Rainfall Measuring Mission) and MODIS (Moderate Resolution Imaging Spectroradiometer) data, see Karger et al. (2016). Fig. 12 exemplifies the CHELSA results for the German modeling domain, clearly depicting the effects of the WLI on the distribution pattern of precipitation.

Fig. 12 Temperature and precipitation distribution according to data from WorldClim (1961–1990) and CHELSA (1979–2013).

2.10.4.4 Environmental Applications

Climate spatialization and climate modeling results are seldom stand-alone outcomes of research, but are often considered as basic information, statistical predictor variables, or dynamical forcings for further environmental analysis, modeling, and assessment. Particularly climate impact analyses, which today address a broad spectrum of apparently different aspects of climate-driven processes, place multiple demands on climate information. Although one rather intrinsic demand in most environmental studies is the need for spatially explicit climate data at high resolution, the needs differ in terms of data formats (vector, raster), spatial aggregation and, in particular, temporal resolution. Since GIS applications in specific fields are discussed in other chapters of this book, only selected examples are briefly highlighted here.

Focusing first on research dealing with climatically driven environmental process changes, the temporal scales of course vary with the respective sensitivity and temporal dynamics of the research objects addressed. Geomorphological and paleoecological studies, for instance, which aim to reconstruct the Quaternary landscape and climate history, use baseline climatologies to detect present climate-determined environments. These are subsequently considered as analogs to reconstruct paleoclimatic settings from paleoecological or geomorphological findings. Examples of multiproxy-based approaches using long-term mean climate values for paleoenvironmental modeling, obtained from statistical downscaling of NCAR-CDAS reanalyses (cf. Böhner, 2006), are given in Aichner et al. (2010), Böhner and Lehmkuhl (2005), Miehe et al. (2014), Herzschuh et al. (2009, 2011) and Wang et al. (2014). Moreover, baseline climatologies at high spatial resolution are essential in most macroecological and geobotanical studies on vegetation and species distribution. The position of the upper treeline, for instance, has been addressed in numerous studies to date, given that the treeline ecotone within the zonation of vegetation in high mountains is the most conspicuous physiognomic boundary and at the same time one of the most fundamental ecological boundaries. Comprehensive reviews of studies addressing climate factors and climatically driven changes in the position and structure of the treeline ecotone are given in Holtmeier (1985, 1989, 1995, 2003), Holtmeier and Broll (2005, 2007), Schickhoff (2005) and Körner and Paulsen (2004).

Considerably higher demands on the temporal resolution of climate data are placed by dendroecological studies, typically requiring monthly time series and delineated indices to infer climate transfer functions from tree-ring and stable oxygen isotope chronologies (e.g., Bräuning, 2001; Helle and Schleser, 2004; Bräuning and Grießinger, 2006). The climate sensitivity of forests, however, is of course not only an issue in paleoecological studies but of general interest, given the economic and environmental relevance of forests and especially the effects of climate change on the potential future state and distribution of tree species. Nothdurft et al. (2012) used statistically downscaled RCM variables for the German state of Baden-Württemberg to quantify climate change effects on major tree species for the IPCC A1B and A2 climate scenarios.

The same requirements on the temporal resolution of climate data are likewise placed by most process-oriented hydrological models. The general data needs in hydrological modeling are detailed in the chapter "GIS for Hydrology" (Korres and Schneider, 2017). For a taxonomy of hydrological models see Chow et al. (1988). A comprehensive review of GIS in glaciology is given by Gao and Liu (2001). The highest requirements on climate data in terms of temporal resolution, however, are placed by process-oriented soil erosion and runoff models, typically requiring sub-daily (hourly to 10 min resolution) input data, given that the relevant subprocesses (saturation runoff, Hortonian runoff, soil erosion, transportation, deposition) during events are largely forced by temporally highly variable rain intensities (Morgan et al., 1998; Schmidt et al., 1999; Böhner, 2011). Although soil erosion and runoff modeling clearly show the limits of GIS-based climate data support, GIS-based statistical or dynamical modeling approaches are well established to assess soil erosion and runoff risks and moreover, when considering contemporary Web-GIS opportunities, serve as a key technology to further the dissemination and communication of modeling applications in practice.

Applications in agriculture also need a high temporal resolution and thus have great potential to benefit from GIS applications, which can integrate spatial climate and weather datasets with positioning systems and remotely and in situ sensed information on soil status, nutrients, and fertility to predict crop yield and other relevant parameters (Chapman and Thornes, 2003; Pierce and Clay, 2007; Wilson, 1999). Likewise, ecological biodiversity, species and plant community occurrence, and habitat suitability can be modeled based on GIS-based spatial climate datasets for nature conservation (Vogiatzakis, 2003).

2.10.5 Summary and Conclusions

GISs, as powerful tools for the generation, integration, analysis, storage, and visualization of cross-disciplinary data on climatological, meteorological, hydrological, and environmental matters, are widely accepted in the geosciences, while their added value for meteorological applications is still underexploited (Tveito et al., 2008). However, GIS-based methods are implicitly included in many applications familiar to meteorologists and climatologists, from the preprocessing of model data to the visualization of NWPM output for weather forecasts. Hence, contemporary geospatial science, and in particular GIS-based analysis and modeling opportunities, provide substantial support for meteorological and climatological studies, which currently seldom fully exploit the potential of existing geodata sources. Beyond restricted recognition by the modeling community, one rather pragmatic reason for the limited application of these data may be the multitude of products, held in multiple layer structures that are not necessarily adequate to satisfy the demands of the modeling community.

Contemporary GIS, and in particular the amalgamation of computer-assisted cartography with database integration in modularly organized, object-oriented programming environments, offers enhanced options for raw data processing (e.g., denoising, filtering, smoothing), a variety of geostatistical and numerical up- and downscaling strategies for adjusting geodata to the requirements of the climate modeling side, and a wide range of techniques for the spatialization of point-source, in situ weather and climate data on the observation side. However, the spatial patterns derived from such spatialization, as well as the results of subsequent impact models, depend crucially on the applied spatialization method. Therefore, a number of relevant (deterministic and geostatistical) interpolation and multivariate statistical spatialization techniques were discussed in detail in this chapter and compared using an exemplary dataset and workflow. Knowledge of their requirements, skills, and limitations is considered necessary to fulfill the multiple requirements regarding the spatiotemporal resolution of climate data in case studies and climate change impact assessments, which are seldom sufficiently covered directly by observations or by the often very specific output of climate models. The introduced suite of spatialization algorithms and strategies, readily available as modularly organized routines for geospatial analyses, supports atmospheric research and climate impact assessment at different scales. Moreover, current programmable GIS software enables the interlinking of climate forcings with environmental processes (e.g., via process models), and the bridging of scale gaps by merging statistical downscaling with surface parameterization techniques.
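To make the spatialization step concrete, the following minimal Python sketch interpolates point-source station values onto a regular grid by inverse distance weighting, one of the deterministic techniques discussed in this chapter. The station coordinates, values, grid extent, and distance-weighting power are purely illustrative assumptions, not data or parameters from the chapter's example workflow.

```python
import numpy as np

def idw_grid(xs, ys, vs, gx, gy, power=2.0):
    """Inverse distance weighted interpolation of station values (vs)
    at coordinates (xs, ys) onto grid nodes given by 1-D arrays gx, gy."""
    XX, YY = np.meshgrid(gx, gy)                          # target grid nodes
    d = np.hypot(XX[..., None] - xs, YY[..., None] - ys)  # node-to-station distances
    d = np.maximum(d, 1e-12)                              # avoid division by zero at stations
    w = 1.0 / d ** power                                  # inverse distance weights
    return (w * vs).sum(axis=-1) / w.sum(axis=-1)         # weighted mean per node

# Hypothetical stations: x, y in km; mean annual temperature in deg C
xs = np.array([2.0, 8.0, 5.0, 9.0])
ys = np.array([3.0, 1.0, 7.0, 8.0])
vs = np.array([8.4, 9.1, 7.2, 6.8])

grid = idw_grid(xs, ys, vs, np.linspace(0, 10, 101), np.linspace(0, 10, 101))
print(grid.shape, float(grid.min()), float(grid.max()))
```

Deterministic schemes of this kind ignore covariates such as elevation; the regression-based and geostatistical approaches compared above address exactly this limitation.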

Acknowledgements

We thank Tobias Kawohl, Johannes Weidinger, Ines Friedrich, Torben Kraft, Georg Bareth, and Paul Alexander for their great support with the spatialization analysis, the cartography, and the literature, and for their valuable comments on the manuscript.

References

Agnew, M.D., Palutikof, J.P., 2000. GIS-based construction of baseline climatologies for the Mediterranean using terrain variables. Climate Research 14 (2), 115–127.
Aichner, B., Herzschuh, U., Wilkes, H., Vieth, A., Böhner, J., 2010. δD values of n-alkanes in Tibetan lake sediments and aquatic macrophytes – a surface sediment study and application in a 16 ka record from Lake Koucha. Organic Geochemistry 41 (8), 779–790.
Anderson, A.R., Chapman, M., Drobot, S.D., et al., 2012. Quality of mobile air temperature and atmospheric pressure observations from the 2010 development test environment experiment. Journal of Applied Meteorology and Climatology 51 (4), 691–701.


Bahrenberg, G., Giese, E., Mevenkamp, N., Nipper, J., 2010. Statistische Methoden in der Geographie. Borntraeger, Stuttgart, 416 pp.
Bankert, R.L., Mitrescu, C., Miller, S.D., Wade, R.H., 2009. Comparison of GOES cloud classification algorithms employing explicit and implicit physics. Journal of Applied Meteorology and Climatology 48 (7), 1411–1421.
Bechtel, B., 2012. Robustness of annual cycle parameters to characterize the urban thermal landscapes. IEEE Geoscience and Remote Sensing Letters 9 (5), 876–880.
Bechtel, B., 2015. A new global climatology of annual land surface temperature. Remote Sensing 7 (3), 2850–2870.
Bechtel, B., Alexander, P.J., Böhner, J., et al., 2015. Mapping local climate zones for a worldwide database of the form and function of cities. ISPRS International Journal of Geo-Information 4 (1), 199–219.
Bechtel, B., Langkamp, T., Ament, F., et al., 2011. Towards an urban roughness parameterisation using interferometric SAR data taking the Metropolitan Region of Hamburg as an example. Meteorologische Zeitschrift 20 (1), 29–37.
Bechtel, B., Wiesner, S., Zaksek, K., 2014. Estimation of dense time series of urban air temperatures from multitemporal geostationary satellite data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7 (10), 4129–4137.
Berrisford, P., Dee, D., Fielding, K., et al., 2009. The ERA-Interim archive (Report No. 1). European Centre for Medium-Range Weather Forecasts, Shinfield Park, Reading.
Bhaskaran, B., Ramachandran, A., Jones, R., Moufouma-Okia, W., 2012. Regional climate model applications on sub-regional scales over the Indian monsoon region: The role of domain size on downscaling uncertainty. Journal of Geophysical Research 117, D10113.
Böhner, J., Schröder, H., 1999. Zur Klimamorphologie des Tian Shan. Petermanns Geographische Mitteilungen 143 (1), 17–32.
Böhner, J., Köthe, R., 2003. Bodenregionalisierung und Prozessmodellierung: Instrumente für den Bodenschutz. Petermanns Geographische Mitteilungen 147 (3), 72–82.
Böhner, J., 2004. Regionalisierung bodenrelevanter Klimaparameter für das Niedersächsische Landesamt für Bodenforschung (NLfB) und die Bundesanstalt für Geowissenschaften und Rohstoffe (BGR). Arbeitshefte Boden 2004 (4), 17–66.
Böhner, J., 2005. Advancements and new approaches in climate spatial prediction and environmental modelling. Arbeitsberichte des Geographischen Instituts der HU zu Berlin 109, 49–90.
Böhner, J., Lehmkuhl, F., 2005. Climate and environmental change modelling in Central and High Asia. Boreas 34, 220–231.
Böhner, J., 2006. General climatic controls and topoclimatic variations in Central and High Asia. Boreas 35, 279–295.
Böhner, J., Selige, T., 2006. Spatial prediction of soil attributes using terrain analysis and climate regionalisation. In: Böhner, J., McCloy, K.R., Strobl, J. (Eds.), SAGA – Analyses and Modelling Applications. Göttinger Geographische Abhandlungen 115. Goltze, Göttingen, pp. 13–28.
Böhner, J., Antonic, O., 2009. Land-surface parameters specific to topo-climatology. In: Hengl, T., Reuter, H.I. (Eds.), Geomorphometry: Concepts, Software, Applications. Elsevier, Amsterdam, pp. 195–226.
Böhner, J., 2011. Modelle und Modellierungen. In: Gebhardt, H., Glaser, R., Radtke, U., Reuber, P. (Eds.), Geographie – Physische Geographie und Humangeographie, 2nd edn. Spektrum Akademischer Verlag, Heidelberg, pp. 44–49.
Böhner, J., Aurelio, M., Dietrich, H., Fraedrich, K., Kawohl, T., 2013.
Development and implementation of a hierarchical model chain for modelling regional climate variability and climate change over southern Amazonia. In: Proceedings of the CarBioCial Status Conference 2013, Cuiaba, Brazil.
Böhner, J., Miehe, G., Miehe, S., Nagy, L., 2015. Climate and weather of Nepal. In: Miehe, G. (Ed.), Nepal: An Introduction to the Environment, Ecology and Human Impact in the Himalayas, 1st edn. Royal Botanic Garden Edinburgh, Flora of Nepal, pp. 23–90.
Bolch, T., 2006. GIS- und fernerkundungsgestützte Analyse und Visualisierung von Klima- und Gletscheränderungen im nördlichen Tien Shan (Kasachstan/Kyrgyzstan) – mit einem Vergleich zur Bernina-Gruppe/Alpen. Dissertation, Faculty of Sciences, Friedrich-Alexander-Universität Erlangen-Nürnberg, 210 pp.
Boschiero, L., 2007. Experiment and Natural Philosophy in Seventeenth-Century Tuscany. Springer, Dordrecht.
Bräuning, A., 2001. Climate history of the Tibetan Plateau during the last 1000 years derived from a network of juniper chronologies. Dendrochronologia 19, 127–137.
Bräuning, A., Grießinger, J., 2006. Late Holocene variations in monsoon intensity in the Tibetan-Himalayan region – evidence from tree rings. Journal of the Geological Society of India 68 (3), 485–493.
Breiman, L., 2001. Random forests. Machine Learning 45 (1), 5–32.
Brunsdon, C., McClatchey, J., Unwin, D.J., 2001. Spatial variations in the average rainfall-altitude relationship in Great Britain: An approach using geographically weighted regression. International Journal of Climatology 21 (4), 455–466.
Caesar, J., Alexander, L., Vose, R., 2006. Large-scale changes in observed daily maximum and minimum temperatures: Creation and analysis of a new gridded dataset. Journal of Geophysical Research 111, D05101.
Cavazos, T., Hewitson, B.C., 2005. Performance of NCEP-NCAR reanalysis variables in statistical downscaling of daily precipitation. Climate Research 28, 95–107.
Chapman, L., Thornes, J.E., 2003. The use of geographical information systems in climatology and meteorology. Progress in Physical Geography 27 (3), 313–330.
Chen, F., Kusaka, H., Bornstein, R., et al., 2011. The integrated WRF/urban modelling system: Development, evaluation, and applications to urban environmental problems. International Journal of Climatology 31 (2), 273–288.
Chow, V.T., Maidment, D.R., Mays, L.W., 1988. Applied Hydrology. McGraw-Hill Series in Water Resources and Environmental Engineering. McGraw-Hill, New York.
Conrad, O., Bechtel, B., Bock, M., Dietrich, H., Fischer, E., et al., 2015. System for Automated Geoscientific Analyses (SAGA) v. 2.1.4. Geoscientific Model Development 8, 1991–2007.
Crain, I.K., Macdonald, C.L., 1984. From land inventory to land management. Cartographica 21, 40–46.
Cubasch, U., Wuebbles, D., Chen, D., Facchini, M.C., Frame, D., et al., 2013. Introduction. In: Stocker, T.F., Qin, D., Plattner, G.K., Tignor, M., Allen, S.K., Boschung, J., Nauels, A., Xia, Y., Bex, V., Midgley, P.M. (Eds.), Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge University Press, Cambridge.
Daly, C., Neilson, R.P., Phillips, D.L., 1994. A statistical-topographic model for mapping climatological precipitation over mountainous terrain. Journal of Applied Meteorology 33, 140–158.
Daly, C., Halbleib, M., Smith, J.I., Gibson, W.P., et al., 2008.
Physiographically sensitive mapping of climatological temperature and precipitation across the conterminous United States. International Journal of Climatology 28 (15), 2031–2064.
Deacon, E.L., 1969. Physical processes near the surface of the earth. In: Flohn, H. (Ed.), World Survey of Climatology 2. Elsevier, Amsterdam, pp. 39–104.
Dee, D.P., Uppala, S.M., Simmons, A.J., et al., 2011. The ERA-Interim reanalysis: Configuration and performance of the data assimilation system. Quarterly Journal of the Royal Meteorological Society 137, 553–597.
Dietrich, H., Böhner, J., 2008. Cold air production and flow in a low mountain range landscape in Hessia. In: Böhner, J., Blaschke, T., Montanarella, L. (Eds.), SAGA – Seconds Out. Hamburger Beiträge zur Physischen Geographie und Landschaftsökologie 19. Universität Hamburg, Hamburg, pp. 37–48.
Dobesch, H., Dumolard, P., Dyras, I., 2013. Spatial Interpolation for Climate Data: The Use of GIS in Climatology and Meteorology. Wiley, New York. https://books.google.com/books?id=rcgoj_aFbbsC (accessed 12.01.17).
Emeis, S., Knoche, H.R., 2009. Applications in meteorology. Developments in Soil Science 33, 772.
Foody, G.M., 2003. Geographical weighting as a further refinement to regression modelling: An example focused on the NDVI-rainfall relationship. Remote Sensing of Environment 88 (3), 283–293.
Fotheringham, A.S., Brunsdon, C., Charlton, M., 2002. Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. Wiley, Chichester.
Fotheringham, A.S., Brunsdon, C., Charlton, M., 2000. Quantitative Geography: Perspectives on Spatial Data Analysis. Sage, Los Angeles.
Freeman, T.G., 1991. Calculating catchment area with divergent flow based on a regular grid. Computers & Geosciences 17, 413–422.


Gao, B.-C., 1996. NDWI – a normalized difference water index for remote sensing of vegetation liquid water from space. Remote Sensing of Environment 58 (3), 257–266.
Gao, J., Liu, Y., 2001. Applications of remote sensing, GIS and GPS in glaciology: A review. Progress in Physical Geography 25 (4), 520–540.
Geiger, R., 1969. Topoclimates. In: Flohn, H. (Ed.), World Survey of Climatology 2. Elsevier, Amsterdam, pp. 105–138.
Gerlitz, L., Conrad, O., Böhner, J., 2015. Large scale atmospheric forcing and topographic modification of precipitation rates over High Asia – a neural network based approach. Earth System Dynamics 6, 1–21.
Gerlitz, L., Conrad, O., Thomas, A., Böhner, J., 2014. Assessment of warming patterns for the Tibetan Plateau and its adjacent lowlands based on an elevation- and bias-corrected ERA-Interim data set. Climate Research 58, 235–246.
Ghimire, S., Choudhary, A., Dimri, A., 2015. Assessment of the performance of CORDEX-South Asia experiments for monsoonal precipitation over the Himalayan region during present climate: Part I. Climate Dynamics 2015, 1–24.
Giorgi, F., Jones, C., Asrar, G.R., 2009. Addressing climate information needs at the regional level: The CORDEX framework. World Meteorological Organization (WMO) Bulletin 58 (3), 175.
Goodchild, M.F., 2007. Citizens as sensors: The world of volunteered geography. GeoJournal 69 (4), 211–221.
Hasson, S., 2016. Seasonality of precipitation over Himalayan watersheds in CORDEX South Asia and their CMIP5 forcing experiments. Atmosphere 7 (10), 123.
Helle, G., Schleser, G.H., 2004. Beyond CO2-fixation by Rubisco – an interpretation of 13C/12C variations in tree rings from novel intra-seasonal studies on broad-leaf trees. Plant, Cell and Environment 27, 367–380.
Herzschuh, U., Birks, H.J.B., Mischke, S., Zhang, C., Böhner, J., 2009. A modern pollen-climate calibration set based on lake sediments from the Tibetan Plateau and its application to a Late Quaternary pollen record from the Qilian Mountains. Journal of Biogeography 37 (4), 752–766.
Herzschuh, U., Ni, J., Birks, J.B., Böhner, J., 2011. Driving forces of mid-Holocene vegetation shifts on the upper Tibetan Plateau, with emphasis on changes in atmospheric CO2 concentrations. Quaternary Science Reviews 30, 15–16.
Hijmans, R.J., Cameron, S.E., Parra, J.L., Jones, P.G., Jarvis, A., 2005. Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology 25, 1965–1978.
Hofstra, N., Haylock, M., New, M., Jones, P., Frei, C., 2008. Comparison of six methods for the interpolation of daily, European climate data. Journal of Geophysical Research: Atmospheres 113 (D21), 1–19.
Hofstra, N., New, M., 2009. Spatial variability in correlation decay distance and influence on angular-distance weighting interpolation of daily precipitation over Europe. International Journal of Climatology 29 (12), 1872–1880.
Holtmeier, F.K., 1985. Die klimatische Waldgrenze – Linie oder Übergangssaum (Ökoton)? – Ein Diskussionsbeitrag unter besonderer Berücksichtigung der Waldgrenzen in den mittleren und hohen Breiten der Nordhalbkugel. Erdkunde 39, 271–285.
Holtmeier, F.K., 1989. Ökologie und Geographie der oberen Waldgrenze. Berichte der Reinhold-Tüxen-Gesellschaft 1, 15–45.
Holtmeier, F.K., 1995. Waldgrenze und Klimaschwankungen – Ökologische Aspekte eines vieldiskutierten Phänomens. Geoökodynamik 16, 1–24.
Holtmeier, F.K., 2003. Mountain Timberlines: Ecology, Patchiness, and Dynamics. Kluwer, Dordrecht.
Holtmeier, F.K., Broll, G., 2005.
Sensitivity and response of northern hemisphere altitudinal and polar treelines to environmental change at landscape and local scales. Global Ecology and Biogeography 14, 395–410.
Holtmeier, F.K., Broll, G., 2007. Treeline advance – driving processes and adverse factors. Landscape Online 1, 1–21.
Hudson, G., Wackernagel, H., 1994. Mapping temperature using kriging with external drift: Theory and an example from Scotland. International Journal of Climatology 14, 77–91.
Huete, A.R., 1988. A soil-adjusted vegetation index (SAVI). Remote Sensing of Environment 25 (3), 295–309.
Hutchinson, M.F., 2004. ANUSPLIN Version 4.3. Centre for Resource and Environmental Studies, The Australian National University, Canberra.
IPCC, 1990. Report Prepared for Intergovernmental Panel on Climate Change by Working Group I. Cambridge University Press, Cambridge, 410 pp.
IPCC, 2013. Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge University Press, Cambridge, 1535 pp.
Isaaks, E.H., Srivastava, R.M., 1989. An Introduction to Applied Geostatistics. Oxford University Press, Oxford.
Jackson, T.J., 1993. III. Measuring surface soil moisture using passive microwave remote sensing. Hydrological Processes 7 (2), 139–152.
Kalma, J.D., McVicar, T.R., McCabe, M.F., 2008. Estimating land surface evaporation: A review of methods using remotely sensed surface temperature data. Surveys in Geophysics 29 (4–5), 421–469.
Karger, D.N., Conrad, O., Böhner, J., Kawohl, T., Kreft, H., et al., 2016. Climatologies at high resolution for the Earth's land surface areas. http://arxiv.org/abs/1607.00217.
Keramitsoglou, I., Kiranoudis, C.T., Sismanidis, P., Zaksek, K., 2016. An online system for nowcasting satellite derived temperatures for urban areas. Remote Sensing 8 (4), 306.
Kessler, M., Böhner, J., Kluge, J., 2007. Modelling tree height to assess climatic conditions at tree lines in the Bolivian Andes. Ecological Modelling 207, 223–233.
Kidd, C., Levizzani, V., Bauer, P., 2009. A review of satellite meteorology and climatology at the start of the twenty-first century. Progress in Physical Geography 33 (4), 474–489.
Kington, J.A., 1974. The Societas Meteorologica Palatina: An eighteenth-century meteorological society. Weather 29, 416–426.
Klinge, M., Böhner, J., Erasmi, S., 2014. Modelling forest lines and forest distribution patterns with remote sensing data in a mountainous region of semi-arid Central Asia. Biogeosciences Discussions 11, 14667–14698.
Körner, C., Paulsen, J., 2004. A world-wide study of high altitude treeline temperatures. Journal of Biogeography 31, 713–732.
Korres, W., Schneider, K., 2017. GIS for hydrology. In: Huang, B. (Ed.), Comprehensive Geographic Information Systems. Elsevier, Amsterdam.
Koster, R.D., Suarez, M.J., Ducharne, A., Stieglitz, M., Kumar, P., 2000. A catchment-based approach to modeling land surface processes in a GCM. Part 1: Model structure. https://ntrs.nasa.gov/search.jsp?R=20000034093 (accessed 31.01.17).
Krige, D.G., 1951. A statistical approach to some basic mine valuation problems on the Witwatersrand. Journal of the Chemical, Metallurgical and Mining Society of South Africa 52 (6), 119–139.
Kyle, T.G., 2013. Atmospheric Transmission, Emission and Scattering. Elsevier, Pergamon.
Langkamp, T., Böhner, J., 2011. Influence of the compiler on multi-CPU performance of WRFv3. Geoscientific Model Development 4, 611–623.
Li, J., Heap, A.D., Potter, A., Daniell, J.J., 2011.
Application of machine learning methods to spatial interpolation of environmental variables. Environmental Modelling & Software 26 (12), 1647–1659.
Li, Z.-L., Tang, B.-H., Wu, H., et al., 2013. Satellite-derived land surface temperature: Current status and perspectives. Remote Sensing of Environment 131, 14–37.
Lindberg, F., Grimmond, C., Onomura, S., Järvi, L., Ward, H., 2015. UMEP – an integrated tool for urban climatology and climate sensitive planning applications. In: 9th International Conference on Urban Climate, Toulouse, France. http://www.meteo.fr/icuc9/LongAbstracts/tukup7_@28cont@29-3-3431366_a.pdf (accessed 30.01.17).
Lloyd, C., 2010. Spatial Data Analysis: An Introduction for GIS Users. Oxford University Press, Oxford.
Ludwig, R., Mauser, W., 2000. Modelling catchment hydrology within a GIS based SVAT-model framework. Hydrology and Earth System Sciences 4 (2), 239–249.
Malberg, H., 2007. Meteorologie und Klimatologie: Eine Einführung. Springer-Verlag, Berlin.
Manabe, S., 1969. Climate and the ocean circulation. Monthly Weather Review 97 (11), 739–774.
Manley, G., 1974. Central England temperatures: Monthly means 1659 to 1973. Quarterly Journal of the Royal Meteorological Society 100, 389–405.
Maraun, D., Wetterhall, F., Ireson, A.M., Chandler, R.E., Kendon, E.J., et al., 2010. Precipitation downscaling under climate change: Recent developments to bridge the gap between dynamical models and the end user. Reviews of Geophysics 48 (3).
Marengo, J.A., Chou, S.C., Kay, G., Alves, L.M., Pesquero, J.F., et al., 2012. Development of regional future climate change scenarios in South America using the Eta CPTEC/HadCM3 climate change projections: Climatology and regional analyses for the Amazon, São Francisco and the Paraná river basins. Climate Dynamics 38 (9), 1829–1848.
Martilli, A., Clappier, A., Rotach, M.W., 2002. An urban surface exchange parameterization for mesoscale models. Boundary-Layer Meteorology 104, 261–304.


Masson, V., 2000. A physically-based scheme for the urban energy budget in atmospheric models. Boundary-Layer Meteorology 94 (3), 357–397.
Matheron, G., 1963. Principles of geostatistics. Economic Geology 58, 1246–1266.
Matheron, G., 1973. The intrinsic random functions and their applications. Advances in Applied Probability 5 (3), 439–468.
McCloy, K.R., 2005. Resource Management Information Systems: Remote Sensing, GIS and Modelling, 2nd edn. Taylor & Francis, Boca Raton.
McGuffie, K., Henderson-Sellers, A., 2014. The Climate Modelling Primer, 4th edn. Wiley-Blackwell, Chichester.
Meier, F., Fenner, D., Grassmann, T., Otto, M., Scherer, D., 2017. Crowdsourcing air temperature from citizen weather stations for urban climate research. Urban Climate. http://dx.doi.org/10.1016/j.uclim.2017.01.006.
Mendes, D., Marengo, J.A., Rodrigues, S., Oliveira, M., 2014. Downscaling statistical model techniques for climate change analysis applied to the Amazon region. Advances in Artificial Neural Systems 2014, 1–10.
Miehe, G., Miehe, S., Böhner, J., Bäumler, R., Ghimire, S., 2015. Vegetation ecology. In: Miehe, G. (Ed.), Nepal: An Introduction to the Environment, Ecology and Human Impact in the Himalayas. Royal Botanic Garden Edinburgh, Flora of Nepal, pp. 385–472.
Miehe, G., Miehe, S., Böhner, J., Kaiser, K., Hensen, I., et al., 2014. How old is the human footprint in the world's largest alpine ecosystem? A review of multiproxy records from the Tibetan Plateau from the ecologists' viewpoint. Quaternary Science Reviews 86, 190–209.
Mishra, V., 2015. Climatic uncertainty in Himalayan water towers. Journal of Geophysical Research: Atmospheres 120, 2689–2705.
Mitas, L., Mitasova, H., 1999. Spatial interpolation. In: Geographical Information Systems: Principles, Techniques, Management and Applications. Wiley, New York, pp. 481–492.
Monteith, J.L., 1965. Evaporation and environment. Symposia of the Society for Experimental Biology 19, 4.
Morgan, R.P.C., Quinton, J.N., Smith, R.E., et al., 1998. The European Soil Erosion Model (EUROSEM): A dynamic approach for predicting sediment transport from fields and small catchments. Earth Surface Processes and Landforms 23, 527–544.
Muller, C.L., Chapman, L., Johnston, S., et al., 2015. Crowdsourcing for climate and atmospheric sciences: Current status and future potential. International Journal of Climatology 35 (11), 3185–3203.
New, M., Hulme, M., Jones, P.D., 2000. Representing twentieth-century space-time climate variability. Part II: Development of 1901–1996 monthly grids of terrestrial surface climate. Journal of Climate 13, 2217–2238.
Ninyerola, M., Pons, X., Roure, J.M., 2000. A methodological approach of climatological modelling of air temperature and precipitation through GIS techniques. International Journal of Climatology 20, 1823–1841.
Niyogi, D.S., Raman, S., 1997. Comparison of four different stomatal resistance schemes using FIFE observations. Journal of Applied Meteorology 36 (7), 903–917.
Njoku, E.G., Entekhabi, D., 1996. Passive microwave remote sensing of soil moisture. Journal of Hydrology 184 (1), 101–129.
Nothdurft, A., Wolf, T., Ringeler, A., Böhner, J., Saborowski, J., 2012. Spatio-temporal prediction of site index based on forest inventories and climate change scenarios. Forest Ecology and Management 279, 97–111.
Oke, T.R., 2000. Boundary Layer Climates. Taylor & Francis, London, 435 pp.
Overeem, A., Robinson, R., Leijnse, J.C., et al., 2013. Crowdsourcing urban air temperatures from smartphone battery temperatures.
Geophysical Research Letters 40 (15), 4081–4085.
Perry, M., Hollis, D., 2005. The generation of monthly gridded datasets for a range of climatic variables over the UK. International Journal of Climatology 25, 1041–1054.
Peterson, T.C., Vose, R.S., 1997. An overview of the Global Historical Climatology Network temperature database. Bulletin of the American Meteorological Society 78 (12), 2837–2849.
Petropoulos, G., Carlson, T.N., Wooster, M.J., Islam, S., 2009. A review of Ts/VI remote sensing based methods for the retrieval of land surface energy fluxes and soil surface moisture. Progress in Physical Geography 33 (2), 224–250.
Pierce, F.J., Clay, D., 2007. GIS Applications in Agriculture. Taylor & Francis, Boca Raton.
Pitman, A.J., 2003. The evolution of, and revolution in, land surface schemes designed for climate models. International Journal of Climatology 23 (5), 479–510.
Rummukainen, M., 2010. State-of-the-art with regional climate models. WIREs Climate Change 1, 82–96.
Schickhoff, U., 2005. The upper timberline in the Himalayas, Hindu Kush and Karakorum: A review of geographical and ecological aspects. In: Broll, G., Keplin, B. (Eds.), Mountain Ecosystems. Springer, Berlin, pp. 275–354.
Schmidt, J., Werner, M.v., Michael, A., 1999. Application of the EROSION 3D model to the Catsop watershed, The Netherlands. Catena 37 (3–4), 449–456.
Schoof, J.T., 2013. Statistical downscaling in climatology. Geography Compass 7, 249–265.
Schoof, J.T., Pryor, S.C., 2001. Downscaling temperature and precipitation: A comparison of regression-based methods and artificial neural networks. International Journal of Climatology 21 (7), 773–790.
Schwarb, M., Daly, C., Frei, C., Schär, C., 2001. Mean annual and seasonal precipitation throughout the European Alps 1971–1990. In: Sperafico, R., Weingartner, R., Leibundgut, C. (Eds.), Hydrological Atlas of Switzerland. Landeshydrologie und Geologie, Institute of Geography, University of Bern, Bern.
Sen, A., Srivastava, M., 2012. Regression Analysis: Theory, Methods, and Applications, 4th edn. Springer-Verlag, Berlin.
Skamarock, W.C., Klemp, J.B., Dudhia, J., Gill, D.O., Barker, D.M., 2005. A Description of the Advanced Research WRF Version 2. NCAR Technical Note NCAR/TN-468+STR.
Solman, S.A., 2013. Regional climate modeling over South America: A review. Advances in Meteorology 2013, 13 pp.
Solman, S.A., Sanchez, E., Samuelsson, P., da Rocha, R.P., Li, L., et al., 2013. Evaluation of an ensemble of regional climate model simulations over South America driven by the ERA-Interim reanalysis: Model performance and uncertainties. Climate Dynamics 41, 1139–1157.
Soria-Auza, R.W., Kessler, M., Bach, K., Barajas-Barbosa, P.M., Lehnert, M., 2010. Impact of the quality of climate models for modelling species occurrences in countries with poor climatic documentation: A case study from Bolivia. Ecological Modelling 221 (8), 1221–1229.
Thies, B., Bendix, J., 2011. Satellite based remote sensing of weather and climate: Recent achievements and future perspectives. Meteorological Applications 18 (3), 262–295.
Thomas, L., Dash, S.K., Mohanty, U.C., 2014. Influence of various land surface parameterization schemes on the simulation of Western disturbances. Meteorological Applications 21 (3), 635–643.
Tobler, W., 1970. A computer movie simulating urban growth in the Detroit region. Economic Geography 46 (2), 234–240.
Tomlinson, C.J., Chapman, L., Thornes, J.E., Baker, C., 2011. Remote sensing land surface temperature for meteorology and climatology: A review.
Meteorological Applications 18 (3), 296–306.
Tveito, O.E., Wegehenkel, M., van der Wel, F., COST Office, 2008. COST Action 719: The Use of Geographic Information Systems in Climatology and Meteorology: Final Report. EUROP, Luxembourg.
Vogiatzakis, I.N., 2003. GIS-based Modelling and Ecology: A Review of Tools and Methods. Department of Geography, University of Reading, Reading. http://www.readingconnect.net/web/FILES/geographyandenvironmentalscience/GP170.pdf (accessed 14.03.17).
von Storch, H., 1995. Inconsistencies at the interface of climate impact studies and global climate research. Meteorologische Zeitschrift NF 4, 72–80.
Wang, Y., Herzschuh, U., Shumilovskikh, L.S., Mischke, S., Birks, H.J.B., et al., 2014. Quantitative reconstruction of precipitation changes on the NE Tibetan Plateau since the Last Glacial Maximum – extending the concept of pollen source area to pollen-based climate reconstructions from large lakes. Climate of the Past 10, 21–39.
Ward, M.D., Gleditsch, K.S., 2008. Spatial Regression Models. Quantitative Applications in the Social Sciences 155. Sage, Los Angeles.
Weinzierl, T., Conrad, O., Böhner, J., Wehberg, J., 2014. Räumliche Interpolation von Klimamodelldaten im Einzugsgebiet des Okavango-Flusses. Zentralblatt für Geologie und Paläontologie Teil I 1, 267–290.
Weischet, W., Endlicher, W., 2012. Einführung in die allgemeine Klimatologie, 8th rev. edn. Borntraeger, Berlin.
Wilson, J.P., 1999. Local, national, and global applications of GIS in agriculture. In: Geographical Information Systems: Principles, Techniques, Management, and Applications. Wiley, New York, pp. 981–998.


Xavier, A.C., King, C.W., Scanlon, B.R., 2016. Daily gridded meteorological variables in Brazil (1980–2013). International Journal of Climatology 36, 2644–2659.
Yozgatligil, C., Yazici, C., 2016. Comparison of homogeneity tests for temperature using a simulation study. International Journal of Climatology 36, 62–81.
Zhu, H., 2016. GIS for Environmental Applications – A Practical Approach. Routledge, London and New York, 471 pp.

Further Reading

Bendix, J., 2004. Geländeklimatologie. Borntraeger, Stuttgart, 282 pp.
Böhner, J., 1996. Säkulare Klimaschwankungen und rezente Klimatrends Zentral- und Hochasiens. Göttinger Geographische Abhandlungen 101. Goltze, Göttingen, 180 pp.
Gerlitz, L., Bechtel, B., Zaksek, K., Kawohl, T., Böhner, J., 2013. SAGA GIS based processing of high resolution temperature data. In: Proceedings of the EnviroInfo Conference, Hamburg.
Hasson, S., Lucarini, V., Böhner, J., 2015. Prevailing climatic trends and runoff response from Hindukush-Karakoram-Himalaya, upper Indus basin. Earth System Dynamics Discussions 6, 579–653.
Holtmeier, F.-K., 2009. Mountain Timberlines: Ecology, Patchiness, and Dynamics. Springer, New York.
Schmidt, J., Böhner, J., Brandl, R., Opgenoorth, L., 2015. Mass elevation and lee effect override latitudinal effects in determining the distribution ranges of species: An example from ground beetles in the Himalayan-Tibetan orogen. Journal of Global Ecology and Biogeography.
Wehberg, J., Bock, M., Weinzierl, T., Conrad, O., Böhner, J., et al., 2013. Terrain-based landscape structure classification in relation to remote sensing products and soil data for the Okavango catchment. Biodiversity and Ecology 5, 221–233.
Weinzierl, T., Conrad, O., Böhner, J., Wehberg, J., 2013. Regionalization of baseline climatologies and time series for the Okavango catchment. Biodiversity and Ecology 5, 235–245.

2.11 GIS and Coastal Vulnerability to Climate Change

Sierra Woodruff, Kristen A Vitro, and Todd K BenDor, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States © 2018 Elsevier Inc. All rights reserved.

2.11.1 Introduction
2.11.2 Climate Change in Coastal Areas
2.11.2.1 Sea Level Rise
2.11.2.2 Hurricanes
2.11.3 Vulnerability Assessment
2.11.3.1 Exposure and Biophysical Determinants of Vulnerability
2.11.3.1.1 Bathtub models
2.11.3.1.2 Cell connectivity
2.11.3.1.3 Hydrodynamic models
2.11.3.1.4 DEM accuracy
2.11.3.1.5 DEM and error sources
2.11.3.1.6 Modeling data errors
2.11.3.2 Sensitivity and Adaptive Capacity
2.11.3.2.1 Contrasting physical and social vulnerability
2.11.3.2.2 Physical vulnerability
2.11.3.2.3 Social vulnerability
2.11.3.2.4 Established social vulnerability indices
2.11.3.2.5 Validation of social vulnerability indices
2.11.3.2.6 Spatial resolution of vulnerability assessments
2.11.3.2.7 Dynamic landscapes and the assumption of stationary socioeconomic characteristics
2.11.4 Conclusions: Improving Decision-Making for Climate Vulnerability
2.11.4.1 What are the objectives of the assessment? What are the benefits for the stakeholders?
2.11.4.2 Who Is the Audience? Who Will Participate and How Will Results be Communicated?
2.11.4.3 How Is the Assessment Framed? For Whom, and to What, Is Vulnerability Being Assessed?
2.11.4.4 Should We Represent Uncertainty, and if so, How?
2.11.4.5 A Final Word
References

2.11.1 Introduction

GIS is a powerful tool that can be leveraged to better understand the consequences of climate change. Communities across the United States and throughout the world are already experiencing the impacts of climate change, which include more severe heat, longer periods of drought, heavier precipitation events, and increased nuisance flooding (Melillo et al., 2014). Given the potential severity of future climate change impacts, there is a pressing need to assess community-level climate change vulnerability (Lieske, 2015). While vulnerability may have specific definitions across organizations or agencies, it is most simply defined as "the potential for loss" (Romero Lankao and Qin, 2011; Frazier et al., 2014; Füssel, 2007). The Intergovernmental Panel on Climate Change (IPCC, 2012) defines vulnerability as the predisposition to harm due to existing characteristics of societal assets (sensitivity) and the ability of those assets to cope with and recover from events (adaptive capacity). Vulnerability is widely understood to be a function of exposure, sensitivity, and adaptive capacity (Cutter et al., 2008; Frazier et al., 2014; Sahin and Mohamed, 2013). Exposure is the proximity of an asset or area to a hazard; sensitivity refers to the level of impact a hazard has on the asset; and adaptive capacity is the ability to adjust to and cope with the effect of the hazard. Exposure, sensitivity, and adaptive capacity all vary spatially. GIS can be used to examine spatial variation in the dimensions of vulnerability, as well as how these dimensions interact with one another. This can help identify areas that are particularly vulnerable to climate change and, in doing so, can improve understanding of problems, educate the public, inform decision-making, and direct investments (Zerger, 2002; Preston et al., 2011b).

GIS offers a number of advantages for analyzing community vulnerability to climate change, including data layering, querying, geo-referencing, and visualization (Gemitzi and Tolikas, 2007). Given these advantages, the application of GIS to analyze climate change vulnerability has grown exponentially in the last decade. A large, diverse literature has developed, providing frameworks, conceptual models, and methods for vulnerability assessment (Romero Lankao and Qin, 2011; Cutter et al., 2008). However, challenges remain in using GIS for climate change vulnerability assessments, including data availability and data error, the communication of uncertainty, and the presentation of information in a readily understandable manner so that it may aid the decision-making process.


Coastal areas are particularly vulnerable to climate change. Although coastal areas account for only about 2% of the globe's total land area, roughly 10% of the world's population lives in coastal regions within 10 m elevation of sea level (Neumann et al., 2015). Many of the world's largest cities, including Mumbai, Shanghai, Jakarta, Bangkok, London, and New York, are located along coastlines (FitzGerald et al., 2008; Neumann et al., 2015). Moreover, the population in coastal areas is expected to continue to grow in the coming decades (Neumann et al., 2015). Sea level rise (SLR) may put these areas at substantial risk, both in terms of human safety and economic security (Beatley, 2009). Due to the high populations and large economic investments in coastal areas, it is critical to assess the vulnerability of these areas to climate change.

Compared to other climate change impacts, vulnerability assessments for coastal threats such as SLR and storm surge are more established, with studies dating back several decades. Coastal impacts, including SLR and storm surge, also have a strong spatial component, illustrating the utility of GIS for conducting vulnerability assessments. The challenges confronted in coastal vulnerability assessments are also illustrative of vulnerability assessments more broadly.

The goals of this chapter are to (1) summarize how GIS can be used to assess climate change impacts in coastal areas, (2) identify challenges in current practice, and (3) provide relevant examples of recent advances in the implementation of GIS as a tool for understanding potential coastal impacts of climate change. We first summarize the climate threats in coastal areas, including SLR and hurricanes. Next, we address exposure and biophysical determinants of vulnerability that are examined within the context of vulnerability assessments. We then address the use of GIS for the analysis of sensitivity and adaptive capacity. Finally, we conclude with a discussion of the role of GIS in the decision-making process, including its use as a communicative tool and role in stakeholder engagement, as well as a means for comparing policy options.

2.11.2 Climate Change in Coastal Areas

In coastal areas, the greatest and most immediate threat posed by climate change is SLR. Approximately 7% of the world’s population (upward of 431.4 million people) live in areas at risk of inundation (Li et al., 2009). Neumann et al. (2015) estimate that about 189 million people, or one-third of the population living in coastal areas, live within the 100-year floodplain. With rising sea levels and growing coastal populations, more than 400 million people could be living within the 100-year floodplain by 2100 (Neumann et al., 2015). In addition to causing more nuisance flooding and eventual inundation, rising sea levels may cause saltwater intrusion (Chang et al., 2011; Loáiciga et al., 2012) and erosion, and will exacerbate the impact of coastal storms (Melillo et al., 2014).

2.11.2.1 Sea Level Rise

Like mercury in a thermometer, water expands as it warms, resulting in an increase in its volume and thus causing sea levels to rise. This is referred to as "thermal expansion." In addition, melting glaciers and ice sheets contribute to SLR at increasing rates. Since the late 1800s, data from tide gauges indicate that global sea level has risen by approximately 0.2 m, equivalent to an annual rise of 1.7 mm (±0.3 mm). Satellite data indicate that, since 1992, global SLR has accelerated to approximately twice the rate observed over the last century, a yearly increase of 3.1 mm (±0.7 mm) (FitzGerald et al., 2008).

While SLR is expected to continue to accelerate, projecting future SLR remains challenging. Climate models cannot simulate rapid changes in ice sheet dynamics, and are therefore likely to underestimate future SLR. "Semiempirical" methods project future rates of SLR based on simple statistical relationships between past rates of globally averaged temperature change and SLR (Schaeffer et al., 2012). Even for low greenhouse gas emission scenarios (see IPCC, 2012 for an in-depth explanation of scenarios and subsequent predictions of SLR), thermal expansion and melting of glaciers will result in approximately 0.28 m of SLR by 2100, not including contributions from melting ice sheets in Greenland and Antarctica. This suggests that, at a minimum, we should see approximately 1 m of SLR by the end of the century (Melillo et al., 2014). Taking into account higher greenhouse gas emission scenarios, it is plausible to expect upward of 4 m of SLR by 2100 (Melillo et al., 2014). It is important to recognize that SLR will not stop in 2100. The oceans take a long time to respond to warmer global temperatures; consequently, the oceans will continue to warm and sea levels will continue to rise for many centuries at rates equal to or higher than those of this century.

To add further complexity to SLR predictions, rates of rise are not uniform; rather, they vary regionally. For example, between 1950 and 2009 the rate of SLR along the eastern seaboard of the United States from North Carolina to Massachusetts was approximately three to four times greater than the global average (Sallenger et al., 2012). Regional SLR is influenced by a number of factors including local temperature, salinity, ocean currents, and vertical land movement. Land movement and global SLR combine to cause relative SLR: if land is subsiding, sea level rises faster relative to the land than the nominal increase in water height alone. In the Chesapeake Bay region of the United States, more than half of the relative SLR has been attributed to land subsidence (Eggleston and Pope, 2014). Vertical land movement can be caused by several factors, including plate tectonic forces. For example, glacial isostatic adjustment, or the flexing of the Earth's crust in response to glacier formation and subsequent melting, contributes to land subsidence along the eastern coast of the United States, including the Chesapeake Bay region (Eggleston and Pope, 2014). Compaction of aquifers from extensive groundwater pumping can also contribute to subsidence; in the Chesapeake Bay region, aquifer compaction accounts for more than half the observed land subsidence (Eggleston and Pope, 2014).


Subsidence can also be caused by oil and gas drilling operations and groundwater withdrawal; for example, along the Gulf Coast of the United States, oil and gas extraction are estimated to have caused relative SLR of 5–11 mm annually (Sharp and Hill, 1995), with extremes calculated at 75–1200 mm per year in some areas of Texas (White and Tremblay, 1995). In contrast, tectonic activity can lead to positive vertical land movement. The 9.0 magnitude earthquake that struck Japan on 11 Mar. 2011 caused part of the Pacific plate off the shore of Honshu to rise by approximately 3 m (Chock et al., 2013). The combination of these factors means relative SLR can vary widely along the coast, sometimes even within a single state. For example, in North Carolina, the Coastal Resources Commission's scientific panel, a state agency tasked with developing SLR projections as the basis of state policy, found that when localized factors are accounted for, estimates of 30-year relative SLR under identical scenario projections ranged from 2.4 in. in Wilmington, NC, to 5.4 in. in Duck, NC (NCCRC, 2016).

Rising sea levels will result in both temporary and permanent inundation. Temporary inundation, such as flooding that occurs during storm events, may be damaging to property, but the water eventually recedes. Permanent inundation, in addition to flooding property and rendering the area uninhabitable, can have substantial impacts on local ecosystems. Fields, shrublands, and forests may become bogs and wetlands, leading to vegetative change as some species may be unable to tolerate consistently wet soil (Krauss et al., 2009). Stands of dead trees, called ghost forests, are often indicative of a long-term change in soil moisture (Saha et al., 2011; Upton, 2016). Even plants that are tolerant of high levels of moisture can be affected by changing water levels; wetlands and tidal marshes may eventually die off and become open water (Day et al., 2007; Hackney et al., 2007).

Barrier islands are particularly vulnerable to inundation. Under natural conditions, rising sea levels cause barrier islands to migrate landward, as overwash moves sediment from the seaward side of the island to the landward side (Davis and Fitzgerald, 2004). However, barrier islands that have been developed and protected by hardened infrastructure are trapped in place: the natural sediment transport processes that allow their landward migration are hindered, leading to increasing relative SLR. Over time, without continued investment in the construction and maintenance of hardened infrastructure and beach nourishment projects, barrier islands will eventually "drown" under the rising sea (Lorenzo-Trueba and Ashton, 2014).

In addition to inundation, SLR will cause saltwater intrusion into coastal aquifers (Werner et al., 2013; Ferguson and Gleeson, 2012; Conrads and Darby, 2016). Saltwater intrusion is accelerated in regions of high groundwater withdrawal. For the populations of small islands that rely on freshwater aquifers as a water source, the reduction or disappearance of potable water may be the greatest threat to their survival (FitzGerald et al., 2008). Entire island nations such as Tuvalu and the Marshall Islands are already confronting saltwater intrusion (FitzGerald et al., 2008). Saltwater intrusion may also impact biological communities, as certain plant species may be unable to tolerate increasingly saline conditions (Krauss et al., 2009).
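Because relative SLR is, to first order, the sum of the global component and local vertical land movement, simple scenario arithmetic can be scripted directly. The rates in the Python sketch below are illustrative placeholders (the global rate echoes the satellite-era estimate cited above; the subsidence rates are hypothetical), not site measurements.

```python
# Minimal sketch: relative SLR = global rise + local vertical land movement.
global_rate_mm_yr = 3.1  # satellite-era global mean rise (FitzGerald et al., 2008)

# Hypothetical local subsidence rates (mm/yr); positive values mean sinking land
sites = {
    "stable coast": 0.0,
    "glacial isostatic adjustment": 1.3,
    "aquifer compaction": 2.0,
}

horizon_years = 50
for site, subsidence in sites.items():
    relative_m = (global_rate_mm_yr + subsidence) * horizon_years / 1000.0  # mm to m
    print(f"{site:>30s}: {relative_m:.2f} m relative SLR over {horizon_years} years")
```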

2.11.2.2 Hurricanes

The ability to assess long-term trends in hurricane activity is limited by the availability of data. Over the period for which high quality satellite data are available, there has been a substantial increase in hurricane activity in the Atlantic basin (Melillo et al., 2014). Since the 1980s, the intensity, frequency, and duration of hurricanes, as well as the number of strong storms, have increased. While the historic record dates back to the mid-1800s, there is considerable uncertainty in the early records (Melillo et al., 2014). The recent increase in hurricane activity is linked, in part, to higher sea surface temperatures in the region of the Atlantic basin where hurricanes form. Hurricane development, however, is influenced by more than just sea surface temperature; ultimately, the link between climate change and hurricanes is complex and represents an active area of research (Melillo et al., 2014). Climate models suggest that by the end of this century the number of tropical storms will decrease on average, but the strongest hurricanes (Categories 4 and 5 on the Saffir-Simpson scale) will increase in number.

Regardless of projected hurricane intensity, rising sea levels will magnify hurricane storm surge. As sea level rises, surges from storms of any given magnitude will reach higher elevations, overtopping sea walls and levees, overwashing barriers more frequently, and, ultimately, producing more extensive areas of flooding (FitzGerald et al., 2008; Kleinosky et al., 2007). For example, in Lancaster, England, a predicted 0.28 m rise in sea level will turn what is now considered a 250-year extreme water level or wave height (0.4% annual chance of an event of this magnitude) into a 50-year (2% annual chance) extreme water level (Prime et al., 2015). This issue is further exacerbated by continued reliance on hardened infrastructure along the coast, and the false sense of security it provides (Beatley, 2009). Stronger storms combined with SLR will worsen beach erosion and cause major shoreline change (Li et al., 2013). Increased swash from waves and intensified longshore currents have the capacity to move large amounts of sediment during storm events, substantially altering the coastline in the wake of the event (Fredsøe and Deigaard, 1992; Davis and Fitzgerald, 2004).
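The shift in return periods described by Prime et al. (2015) can be reproduced in outline if annual maximum water levels are assumed to follow an extreme-value distribution: raising mean sea level by a fixed amount means a given total water level is reached by a smaller surge, and so is exceeded more often. The Gumbel location and scale parameters in this Python sketch are hypothetical stand-ins, not the values fitted for Lancaster.

```python
import math

# Hypothetical Gumbel fit to annual maximum water levels (m above datum)
mu, beta = 4.0, 0.25  # location and scale -- illustrative, not fitted values

def return_period(z, slr=0.0):
    """Return period (years) of total water level z after mean sea level rises by slr.
    With SLR, level z is reached whenever the surge component exceeds z - slr."""
    p_exceed = 1.0 - math.exp(-math.exp(-((z - slr) - mu) / beta))  # 1 - Gumbel CDF
    return 1.0 / p_exceed

# Present-day 250-year level from the Gumbel quantile function
z250 = mu - beta * math.log(-math.log(1 - 1 / 250.0))
print(f"250-year level today: {z250:.2f} m")
print(f"Return period of that level after 0.28 m SLR: "
      f"{return_period(z250, slr=0.28):.0f} years")
```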

2.11.3 Vulnerability Assessment

Determining the economic and social impacts of SLR and increased hurricane intensity in coastal areas is critical to improving future development and investment decisions. GIS has an important role to play in understanding the vulnerability of coastal areas to climate change. Most vulnerability assessments use GIS to map exposure, physical vulnerability, and social vulnerability, and to overlay these dimensions of vulnerability to identify the most vulnerable locations. Exposure to inundation, storm surge, saltwater intrusion, and other climate threats depends on physical, ecological, and biological processes (Preston et al., 2011b). For example, determinants of exposure include elevation, natural hazards, land cover,


and regional climate conditions. Exposure may also be called biophysical vulnerability (Kashem et al., 2016). Sensitivity to exposure, in contrast, is based on socioeconomic variables, such as poverty, governance, and access to transportation (Preston et al., 2011b). Both exposure and sensitivity exhibit high spatial heterogeneity (Zerger, 2002), making GIS important for improving our understanding of these determinants, their distribution within a community, and their interactions, with the ultimate goal of improving decision-making (Krishnamurthy et al., 2011). GIS is a powerful platform for combining different types and sources of data, such as biophysical and socioeconomic indicators of vulnerability. Social vulnerability exists independent of hazards, but the implications of that vulnerability may only become apparent when a population is exposed to climatic hazards (Preston et al., 2011b). Conversely, populations and infrastructure with equivalent exposure are not uniformly vulnerable (Frazier et al., 2014). Vulnerability cannot be defined by the hazard alone, nor can it be fully represented by the underlying properties of the system being stressed (Romero Lankao and Qin, 2011). Overlaying biophysical and socioeconomic data layers can help identify "hot spots" of vulnerability. Mapping vulnerability in this way can aid understanding of the problem, communicate issues, and may hold the promise of transparent, defensible priority setting (de Sherbinin, 2014).
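The overlay logic described here reduces, in raster form, to normalizing each layer and combining them with weights before flagging the upper tail of the composite as hot spots. The following numpy sketch uses random rasters, arbitrary weights, and an arbitrary 95th-percentile cutoff purely for illustration; a real assessment would substitute co-registered exposure and social vulnerability layers and justified weights.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (200, 200)                # stand-ins for co-registered raster layers
exposure = rng.random(shape)      # e.g., inundation depth or surge exposure
sensitivity = rng.random(shape)   # e.g., a social vulnerability indicator
capacity = rng.random(shape)      # e.g., an adaptive capacity proxy

def rescale(layer):
    """Min-max normalize a layer to [0, 1] so layers are comparable."""
    return (layer - layer.min()) / (layer.max() - layer.min())

# Composite vulnerability: high exposure and sensitivity, low adaptive capacity
v = (0.4 * rescale(exposure)
     + 0.4 * rescale(sensitivity)
     + 0.2 * (1.0 - rescale(capacity)))

hot_spots = v >= np.percentile(v, 95)  # flag the top 5% of cells
print("hot-spot cells:", int(hot_spots.sum()))
```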

2.11.3.1 Exposure and Biophysical Determinants of Vulnerability

2.11.3.1.1 Bathtub models

The simplest way to model the area exposed to flooding from SLR or storm surge is a "bathtub" approach, in which a grid cell becomes flooded if its elevation is less than the projected sea level or storm surge height. The bathtub approach has also been referred to as the contour-line method, the static method, the planar surface projection, or the equilibrium method (Gallien et al., 2014). This is the most commonly used approach in studies to date (Frazier et al., 2010; Eckert et al., 2012; Kleinosky et al., 2007). In the bathtub approach, projected sea level or storm surge height is simply subtracted from the surface elevation. Digital elevation models (DEMs), which provide a continuous representation of the earth's elevation surface (Wechsler, 2007), are ubiquitous in coastal vulnerability studies. A DEM is based on a grid, or matrix, with each cell having certain x- and y-coordinates, as well as an elevation value, z (Fig. 1; Blomgren, 1999). Elevation data, provided by DEMs, is often the most critical element in assessing the potential impacts of SLR (Sahin and Mohamed, 2013).

Studies predominantly indicate that the increase in area exposed to flooding and inundation is not linear. In North Carolina, for example, a much larger area will be flooded by an initial 1.5 m SLR compared with a subsequent SLR of equal magnitude, due to the greater slope of the land between 1.5 and 3.0 m in elevation (Poulter and Halpin, 2008). Slope of the land surface governs the horizontal extent of flooding, but model results are also influenced by data resolution and the particular modeling approach.
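In raster terms the bathtub rule is a single comparison per grid cell. A minimal numpy sketch on a synthetic, uniformly sloping DEM follows; the water level and 30 m cell size are assumptions for illustration only.

```python
import numpy as np

# Synthetic DEM: a plane rising from 0 to 5 m eastward from the shore
ny, nx = 100, 100
dem = np.tile(np.linspace(0.0, 5.0, nx), (ny, 1))

water_level = 1.5                    # hypothetical projected sea level / surge (m)
flooded = dem <= water_level         # the entire bathtub rule

cell_area_m2 = 30 * 30               # assume 30 m grid cells
area_km2 = flooded.sum() * cell_area_m2 / 1e6
print(f"Flooded area at {water_level} m: {area_km2:.2f} km^2")
```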

2.11.3.1.2 Cell connectivity

While the simple bathtub approach can provide insight into the vulnerability of coastal areas, it does not account for hydrological connectivity or geophysical barriers (Fig. 2). Some areas below the elevation of projected SLR may not be inundated due to protective infrastructure, such as levees, and geophysical barriers, such as dunes, that exist between the ocean and the area of focus (Poulter and Halpin, 2008; Brown, 2006; Li et al., 2009). Accounting for these features is important to accurately map the area at risk of flooding. Poulter and Halpin (2008) consider two connectivity definitions: a "four-sided rule," in which a grid cell is connected if any of its cardinal neighbors is a flooded cell, and an "eight-sided rule," in which a grid cell is connected if any of its cardinal or diagonal neighbors is a flooded cell. They found that specifying connectivity decreased the area flooded in comparison with a bathtub model. Enforcing hydrologic connectivity also increased the importance of fine-scale landscape features such as ditches and dikes (see Manda et al. (2014) for additional discussion of the impact of artificial drainage networks on water movement). For example, water was forced to pass around dikes rather than flood low-elevation cells in front of and behind the obstruction.
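One straightforward way to implement such connectivity rules is a flood fill seeded at open-water cells, so that a low-lying cell is marked flooded only if a chain of flooded neighbors links it to the sea. The breadth-first-search sketch below, run on a synthetic DEM with a dune ridge containing diagonally staggered gaps, is one possible implementation of the four- and eight-sided rules, not the code used by Poulter and Halpin (2008).

```python
from collections import deque
import numpy as np

def connected_flood(dem, water_level, seeds, neighbors=4):
    """Bathtub flooding restricted to cells connected to seed (open water) cells.
    neighbors=4 uses cardinal adjacency; neighbors=8 adds the diagonals."""
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if neighbors == 8:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    low = dem <= water_level
    flooded = np.zeros(dem.shape, dtype=bool)
    queue = deque((r, c) for (r, c) in seeds if low[r, c])
    for r, c in queue:                       # mark seeds before traversal
        flooded[r, c] = True
    while queue:                             # breadth-first search from the sea
        r, c = queue.popleft()
        for dr, dc in offsets:
            rr, cc = r + dr, c + dc
            if (0 <= rr < dem.shape[0] and 0 <= cc < dem.shape[1]
                    and low[rr, cc] and not flooded[rr, cc]):
                flooded[rr, cc] = True
                queue.append((rr, cc))
    return flooded

# Synthetic test: a low basin behind a dune ridge; open water along the west edge
dem = np.full((50, 50), 0.5)          # low coastal plain
dem[:, 20:22] = 3.0                   # two-cell-wide dune ridge blocks flow
dem[25, 20] = 0.2                     # staggered gaps, connected only diagonally
dem[26, 21] = 0.2
dem[:, 22:] = 0.2                     # low-lying basin behind the ridge
seeds = [(r, 0) for r in range(50)]   # open-water cells at the coast
for rule in (4, 8):
    cells = int(connected_flood(dem, 1.0, seeds, neighbors=rule).sum())
    print(f"{rule}-sided rule: {cells} cells flooded")
```

On this synthetic surface the four-sided rule stops at the ridge while the eight-sided rule passes through the diagonal gap, illustrating why the choice of rule, like the barrier representation itself, changes the mapped flood extent.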

[Fig. 1 near here: a grid of numeric z (elevation) values; axis labels "DEM extent in x-direction" and "DEM extent in y-direction".]
Fig. 1 Example of DEM representation commonly used in bathtub models. Adapted from Blomgren, S. (1999).


[Fig. 2 near here: two map panels, (A) and (B); panel B marks a "pour point".]
Fig. 2 Panel A: Some inundation models fail to account for physical barriers. However, to better account for physical barriers, connectivity rules can be incorporated into inundation models. Panel B: Using connectivity rules, a grid cell becomes flooded only if its elevation is below sea level and it is connected to a flooded or open-water grid cell (Brown, 2006; Sahin and Mohamed, 2013; Poulter and Halpin, 2008).

Using models that include connectivity rules is critical when considering policy decisions about hard infrastructure such as seawalls and levees. For example, Brown (2006) aimed to compare landscape change and the number of properties at risk of inundation under four policy scenarios: (1) maintaining existing coastal defenses; (2) making minor inland adjustments; (3) major retreat of defensive lines inland; and (4) no defenses. Understanding the consequences of these different policies requires a modeling approach that incorporates connectivity and accounts for how these barriers affect water flow and flooding. Since protective coastal infrastructure is a common policy, models of exposure should account for connectivity and for the consequences of this infrastructure and other barriers for flooding.

2.11.3.1.3 Hydrodynamic models

While requiring connectivity is an improvement over the simple bathtub model, static approaches that determine flood extent exclusively from topography have been criticized for poor predictive power (Poulter and Halpin, 2008; Gallien et al., 2014). All of these models assume that flooding occurs instantaneously upon exceeding the "pour point" elevation, and many fail to account for drainage and flood defense infrastructure that significantly alter flood outcomes (e.g., Fig. 2). Hydrodynamic models that enforce the laws of physics to describe the flow of water can provide more precise estimates of exposure (Seenath, 2015; Gallien et al., 2014). Hydrodynamic models can take into account a number of factors that influence floodwater flow, such as friction, wind, tides, and barriers, as well as temporal effects (Seenath, 2015; Gallien et al., 2014). Gallien et al. (2014) compared static models and a hydrodynamic model against the flood extent observed during a 2011 storm that overtopped coastal barriers and caused flooding in Newport Beach, California. Two static models were tested, one based on tide elevation and one based on wave height; neither effectively predicted the realized flood extent. The first predicted no flooding, while the second predicted flooding of the entire study area. Flooding predicted by the static models differed by two orders of magnitude from the observed flood. The hydrodynamic model significantly improved flood prediction.

Selecting between static and hydrodynamic models depends largely on the goal of the project. Static methods can be helpful to increase awareness and communicate the risk of long-term impacts of SLR (Gallien et al., 2014), but they are not effective for accurately predicting episodic flooding from storm surge or wave overtopping. Gallien et al. (2014) argue that these approaches can even undermine flood-risk management and optimal resource allocation. When hydrodynamic models are used, however, input parameters (e.g., friction, wind, and barriers) must be carefully defined (Gallien et al., 2014; Seenath, 2015). These variables may have a large effect on model outcomes; Gallien et al. (2014), for example, found that results varied significantly depending on whether or not the hydrodynamic model accounted for local drainage.

2.11.3.1.4 DEM accuracy

Regardless of whether a static or hydrodynamic approach is used, model results are highly dependent on data accuracy and spatial resolution (Eckert et al., 2012). DEM products provide detailed renditions of topography and terrain surfaces; however, the spatial resolution and vertical accuracy of DEMs introduce uncertainty into results (Wechsler, 2007; Sahin and Mohamed, 2013).

Spatial resolution refers to the size of the DEM grid cell, with a smaller grid cell size indicating higher resolution (Wechsler, 2007). High-resolution DEMs are better able to resolve features of complex topography, such as dunes and barriers, that are otherwise missed in coarser DEMs (Wechsler, 2007; Sahin and Mohamed, 2013). DEM resolution has been shown to affect derived hydrological measures such as flow direction (Poulter et al., 2008). Greater spatial resolution tends to reduce the area exposed to flooding, especially if connectivity is enforced: the greater topographic complexity detected in higher-resolution DEMs fragments hydrological connectivity, limiting the area flooded (Poulter and Halpin, 2008).

The appropriate DEM resolution for a vulnerability analysis ultimately depends on characteristics of the study area such as topographic complexity, the purpose of the analysis, and the funds available to purchase high-resolution DEM data (Wechsler, 2007). For example, in their analysis comparing flood defense policies, Brown (2006) used high-resolution LiDAR (Light Detection and Ranging) data because it better captures the influence of barriers on flooding, a critical component in understanding the consequences of different defensive policies. Even high-resolution LiDAR, however, can fail to capture defense infrastructure and other complex topographic features (Gallien et al., 2014). Often, these features must be manually added to the data (Brown, 2006): they can be surveyed using GPS and merged into existing LiDAR data (Gallien et al., 2014). It is also possible to survey flood control infrastructure and remove it from existing LiDAR data to represent premitigation landscape conditions (Manda et al., 2014; Poulter et al., 2009).

Ongoing research along the Pamlico-Albemarle peninsula of North Carolina seeks to "undo" much of the topographic manipulation in that area, primarily drainage and irrigation ditches and flood control infrastructure, to allow hydrological models to be compared for pre- and postdevelopment periods. High-resolution LiDAR data may also be limited in accurately determining vertical elevation in areas dominated by large extents of marsh (Schmid et al., 2011).

Vertical accuracy in elevation data sets also poses a challenge to mapping exposure to SLR, storm surge, and flooding. All DEMs include some elevation error, that is, measurements that depart from the true value (Wechsler, 2007; Coveney and Fotheringham, 2011). Errors are a fact of spatial data and cannot be avoided, and errors in DEMs can often be large enough to seriously affect model outcomes (Coveney and Fotheringham, 2011). These errors are particularly problematic for modeling exposure of shallow-gradient coastlines, where minor vertical errors can produce large miscalculations of the horizontal extent of flooding.

The annual rate of SLR is an order of magnitude smaller than the vertical error of most elevation data sets (Poulter and Halpin, 2008), and even lower-end projections for the end of the century fall within this vertical error (Frazier et al., 2010). This presents serious challenges, as there are often significant increases in the horizontal extent of inundation, the number of structures damaged, and the people at risk at SLR increments smaller than the vertical accuracy of the DEM (Zerger, 2002). Gesch (2009) argues that mapping increments of SLR that fall within the error of the DEM is highly questionable. For example, when examining how SLR will increase flooding from storm surge, Frazier et al. (2010) intended to consider multiple SLR projections ranging from 0.3 to 1.2 m. The lower projections, however, fell within the 1-m vertical accuracy of the DEM and would have been statistically inappropriate to map, so the authors focused on the 1.2 m SLR projection. Several other studies take this approach of excluding scenarios that fall within the vertical error of the DEM (see Cooper et al., 2013 for a summary of studies and how they address errors). Uncertainty in the DEM representation of elevation, however, is rarely accounted for by DEM users, or is addressed only as a side issue (Wechsler, 2007; Fisher and Tate, 2006). In a content analysis of climate change vulnerability assessments, Preston et al. (2011b) found that most assessments neglect uncertainty entirely.
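As a rough illustration of the point made by Gesch (2009), the following sketch compares hypothetical SLR scenarios against a DEM's vertical uncertainty, assuming normally distributed errors so that the linear error at 95% confidence is approximately 1.96 times the RMSE. The RMSE and scenario values below are invented, not drawn from any of the studies cited above.

```python
# A back-of-the-envelope check of which SLR scenarios exceed a DEM's vertical
# uncertainty. Assuming normally distributed errors, the linear error at 95%
# confidence is ~1.96 * RMSE; all numbers here are illustrative.
rmse_m = 0.30                      # hypothetical DEM vertical RMSE (meters)
le95_m = 1.96 * rmse_m             # ~0.59 m linear error at 95% confidence

for slr_m in (0.2, 0.5, 1.0, 1.2):
    mappable = slr_m >= le95_m
    print(f"SLR {slr_m:.1f} m: {'mappable' if mappable else 'within DEM error'}")
```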

2.11.3.1.5 DEM and error sources

DEMs can be produced using a number of different methods; today, they are frequently created using remote sensing rather than direct survey data. Older methods of generating DEMs often involve interpolating digital contour maps or data from direct surveys of the land surface: for example, Blomgren (1999) created a DEM for the Falsterbo Peninsula, Sweden, by interpolating elevation data from contour maps and land surveys to examine different scenarios of SLR.

DEM errors are inextricably linked to production methods (Wechsler, 2007). Sources of DEM error have been described in detail elsewhere, and there is a significant body of literature on the quantification of DEM error (Fisher and Tate, 2006; Wechsler, 2007; Coveney and Fotheringham, 2011); here, we provide a brief overview of some key concepts from this literature. Common sources of error in DEM production include variations in the accuracy, density, and distribution of the measured source data, as well as the processing and interpolation of source data to derive the DEM (Fisher and Tate, 2006; Blomgren, 1999).

Historically, DEMs were most frequently created from digitized contour lines, imagery (such as stereo aerial photographs), or direct land surveys (Fisher and Tate, 2006). All of these approaches are subject to inaccuracies. In the case of a digitized contour map, there may be error in the source map arising from the processes of collection, recording, generalization, symbolization, and production inherent in the cartographic process (Fisher and Tate, 2006). Remote sensing, namely LiDAR, has advanced efforts to improve elevation accuracy; even LiDAR, however, includes some measurement error (Fisher and Tate, 2006). The quality of the elevation information obtained is a function of the sensor and scanning system, aircraft speed and flying height, and the characteristics of the terrain surface (Fisher and Tate, 2006). Errors are often most significant in areas with high terrain and land cover variability, areas with dense vegetation cover (Coveney and Fotheringham, 2011), and low-lying coastal marshes (Schmid et al., 2011).

Irrespective of the method of DEM construction, the error in a DEM is also influenced by the density and distribution of the measured point source data (Fisher and Tate, 2006). Relatively dispersed measurement points produce smoother DEMs that fail to capture local variations in topography (Coveney and Fotheringham, 2011; see the discussion of spatial resolution earlier). The density and distribution of data points are also related to error from the processing and interpolation of the data: the degree of processing and interpolation required to derive a regular gridded DEM from a set of measurements depends on the density and distribution of the data (Fisher and Tate, 2006). If the source data are irregularly distributed or not at the desired spacing, then some degree of processing or interpolation of values at grid intersections is required, which can itself be a source of error. In general, no single method appears to be uniformly most accurate for the interpolation of terrain data (Fisher and Tate, 2006); the success of a given interpolation method depends on the distribution of the measured source data and the nature of the terrain surface. Blomgren (1999) noted that redundant, clustered samples created challenges in the data interpolation process, and that removing some points improved the agreement between the modeled and actual surfaces.
Clustered points were mainly located around areas of high vertical variability, such as hills and ditches, producing greater small-scale variability in the data set than actually existed. Blomgren (1999) also found that interpolating the DEM from elevation data smoothed topographic features such as road embankments and ditches. These features are not preserved because the interpolated value assigned to a grid cell is a weighted average of several surrounding known points; in the case of a ditch, the surrounding points lie at relatively higher elevations, resulting in a consistent overestimation (positive bias) of the elevation of the true lowest point of the ditch.
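The positive bias described above can be reproduced with a few lines of code. The sketch below uses a one-dimensional inverse distance weighting (IDW) interpolator, a common if simple choice, on an invented transect containing a narrow ditch; the grid nodes nearest the ditch receive elevations well above the ditch bottom.

```python
# A one-dimensional illustration of interpolation smoothing bias: IDW pulls the
# estimated value near a ditch bottom toward the higher surrounding terrain.
# The sample points and the ditch are invented for illustration.
import numpy as np

def idw(x_query: float, xs: np.ndarray, zs: np.ndarray, power: float = 2.0) -> float:
    """Inverse-distance-weighted estimate of elevation at x_query."""
    d = np.abs(xs - x_query)
    if np.any(d == 0):                      # exact hit on a sample point
        return float(zs[d == 0][0])
    w = 1.0 / d ** power
    return float(np.sum(w * zs) / np.sum(w))

# Transect samples: flat terrain at 2.0 m with a narrow ditch (0.5 m) at x = 10.
xs = np.array([0.0, 5.0, 9.0, 10.0, 11.0, 15.0, 20.0])
zs = np.array([2.0, 2.0, 2.0, 0.5, 2.0, 2.0, 2.0])

# Interpolate at grid nodes that straddle, but do not coincide with, the ditch.
for xq in (9.5, 10.5):
    print(f"x = {xq}: IDW = {idw(xq, xs, zs):.2f} m (ditch bottom = 0.50 m)")
# Both estimates come out near 1.3 m, well above the 0.5 m ditch bottom:
# a consistent positive bias at the feature's lowest point.
```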

For research purposes, teams may develop and utilize DEMs for further modeling. In these cases, the local and global accuracies are known, and the methods used to derive the product are understood and often well documented. For practical applications, however, DEMs are often acquired by purchase or from publicly available products. Publicly available and purchased DEM data sets typically come with a global error statistic, but do not provide information on data collection, processing methods, or the distribution of errors (Januchowski et al., 2010).

2.11.3.1.6 Modeling data errors

The most common global error statistic is the root mean square error (RMSE):

$$ \mathrm{RMSE} = \sqrt{\frac{\sum \left( z_{\mathrm{DEM}} - z_{\mathrm{Ref}} \right)^{2}}{n}} $$

where z_DEM represents the elevation measurement from the DEM, and z_Ref represents a more accurate reference measurement of elevation for a sample of n points (Fisher and Tate, 2006). RMSE equals the standard deviation of the error when the mean error is (or is assumed to be) zero.

Error statistics provided by data suppliers are commonly used to justify selecting a given DEM (Coveney and Fotheringham, 2011); however, these statistics often understate actual elevation error. Different statistics also provide different information, so users should consider several complementary measures to better understand and characterize error (Januchowski et al., 2010); no single measure of error can serve as a surrogate for the others (see Cooper et al., 2013 for further discussion). RMSE, for example, does not describe the spatial heterogeneity of errors, which is problematic because DEM error is likely to vary spatially with terrain and land cover variability (Fisher and Tate, 2006; Januchowski et al., 2010; Coveney and Fotheringham, 2011). In effect, RMSE confounds the magnitude of error across a DEM with the spatial variation in error (Januchowski et al., 2010). In a number of studies the mean error has not been found to equal zero, in which case the RMSE is not necessarily a good description of the statistical distribution of the error (Fisher and Tate, 2006). Moreover, RMSE is usually based on comparison with a limited sample of reference points (Fisher and Tate, 2006), and these global statistics may not capture elevation error outside the best-performing ground reference areas (Coveney and Fotheringham, 2011). Because measures of DEM error beyond the RMSE are rarely provided, data users have little understanding of the spatial distribution of DEM errors (Wechsler, 2007; Januchowski et al., 2010), and it can be difficult to recreate the spatial patterns of errors for preprocessed DEMs (Januchowski et al., 2010).

There are multiple approaches to modeling data errors. One can assume that error follows a normal distribution around the elevation of each pixel and use the global RMSE as an estimate of the local error variance (Cooper et al., 2013). Alternatively, an error for each pixel can be drawn from a normal distribution (based on the RMSE) and added to the DEM; this approach assumes error is random, although spatial dependence of error can be added. Monte Carlo simulations may then be used to examine multiple realizations of the model to assess error.

For the study of error distributions in data to have any meaning, it is important to study their propagation into subsequent models, such as predictions of inundated areas (Fisher and Tate, 2006). Unfortunately, such propagation is complicated, and studies on error propagation are rare (Fisher and Tate, 2006; Wechsler, 2007). Sensitivity analysis and simulation have not been widely used to assess the influence of DEM uncertainty on hydrologic model output (Wechsler, 2007). The absence of tools within GIS software for handling the effects of input data uncertainty, and thus possible error propagation, has led some to question whether GIS can appropriately serve as a basis for decision-making (Wechsler, 2007).
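A minimal sketch of both ideas, computing an RMSE from reference points and then propagating normally distributed, spatially independent error into a flood-extent estimate via Monte Carlo simulation, is shown below. The DEM, reference elevations, flood level, and number of realizations are all illustrative assumptions.

```python
# Compute RMSE from reference points, then propagate DEM error into a
# flood-extent estimate via Monte Carlo. The spatially independent N(0, RMSE)
# error model is the simple case described in the text; data are invented.
import numpy as np

rng = np.random.default_rng(42)

# RMSE from a sample of n reference elevations (e.g., GPS check points).
z_dem = np.array([1.10, 2.05, 0.95, 3.20, 1.48])
z_ref = np.array([1.00, 2.00, 1.10, 3.00, 1.50])
rmse = np.sqrt(np.mean((z_dem - z_ref) ** 2))
print(f"RMSE = {rmse:.3f} m")

# Monte Carlo propagation: perturb a toy DEM with N(0, RMSE) error fields and
# record the distribution of area below a 1.0 m flood level.
dem = rng.uniform(0.0, 3.0, size=(50, 50))      # stand-in DEM (meters)
flood_level = 1.0
areas = []
for _ in range(500):
    perturbed = dem + rng.normal(0.0, rmse, size=dem.shape)
    areas.append(np.mean(perturbed < flood_level))  # fraction of cells flooded
areas = np.array(areas)
print(f"Flooded fraction: mean {areas.mean():.3f}, "
      f"5th-95th pct {np.percentile(areas, 5):.3f}-{np.percentile(areas, 95):.3f}")
```

The spread of the flooded fraction across realizations is a direct, if simplified, expression of how DEM uncertainty propagates into an exposure estimate.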
Wechsler and Kroll (2006) created a toolbox in a GIS program that allows users to simulate the effects of DEM error on elevation, slope, upslope area, and the topographic index. While this tool is relatively limited, it demonstrates how GIS can better assess the uncertainty of input data.

One approach to dealing with DEM error is to determine the minimum data requirements for the specific model or application (Fisher and Tate, 2006; Januchowski et al., 2010). In some situations results may be sensitive to DEM error, but in other applications this may not be the case. When a DEM is combined with other data, as in hydrological modeling, the effect of DEM error may be diluted, or may be relatively unimportant compared to errors in other data and uncertainty in the model itself (Zerger, 2002; Fisher and Tate, 2006); sensitivity to DEM error may be minor, for example, compared to modeling decisions about connectivity. In vulnerability assessments that combine exposure and sensitivity, the exposure component generally presents fewer issues, since biophysical data sets are reasonably well advanced and the uncertainties in the data are, for the most part, quantifiable (de Sherbinin, 2014). Still, vulnerability assessments often ignore error altogether, whether it originates in the DEM, other input data sources, or modeling decisions (Preston et al., 2011b).

To better account for uncertainty, vulnerability assessments could use sensitivity analysis to compare the area of inundation under different modeling decisions, such as DEM resolution and error, connectivity, and hazard level (Poulter and Halpin, 2008; Hinkel et al., 2014). Model validation, in which model results are compared to field observations from historic events, can also be a powerful tool for assessing model performance and accuracy. Validation, however, is severely limited by the availability of data: greater resources should be dedicated to field observations of coastal flooding to help improve models of future flooding and inundation (Gallien et al., 2014).

2.11.3.2 Sensitivity and Adaptive Capacity

Economic and social consequences of hazards are not simply proportional to the area affected. For example, Prime et al. (2015) found that while SLR increases the extent of flooding by a factor of two, it increases the cost by a factor of approximately five.

This nonlinear relationship, which occurs because populations and economic assets are not equally distributed across the landscape, means that simply estimating the extent of flooding can underestimate the overall community and economic impacts (Martinich et al., 2013). Consequently, it is essential to combine robust analysis of exposure or biophysical impacts with socioeconomic data. Additionally, human populations and economic resources vary in their ability to cope with events, further emphasizing the need for robust analysis drawing on a variety of data sources.

2.11.3.2.1 Contrasting physical and social vulnerability

Most vulnerability assessments focus on physical vulnerability, particularly the total number of people that may be affected by a given hazard (Van Zandt et al., 2012). Physical vulnerability describes the ability of the built environment, including homes, roads, bridges, hospitals, schools, and government buildings, to withstand impacts, and is generally represented as the monetary value of physical assets in the hazardous zone. In recent years, vulnerability assessments have moved away from focusing solely on physical assets and increasingly incorporate social vulnerability. Social vulnerability is defined as the susceptibility of social groups to the impacts of hazards, as well as their ability to adequately recover from them (Cutter, 2006).

Past disasters, including Hurricane Katrina, illustrate that vulnerability is determined not simply by the location and concentration of human populations, but also by the characteristics of the population that shape its ability to anticipate, respond to, and recover from hazardous events (Van Zandt et al., 2012). Socioeconomic characteristics such as age, race, and income are typically emphasized in social vulnerability assessments, as these factors may influence the ability of a community to prepare for and respond to a hazardous event (Kashem et al., 2016). For example, low-income and minority households tend to be less prepared for hazards, for instance lacking hurricane preparation supplies or hurricane shutters (Van Zandt et al., 2012). Similarly, low-income, minority, and elderly households are less likely to evacuate in advance of a hazardous event, perhaps due to a combination of lack of resources, ineffective public or evacuation transportation, and limited refuge opportunities outside the hazardous zone (Van Zandt et al., 2012). Focusing on how different social groups respond to hazards, not simply the physical location of hazards, can help create more effective adaptation policies (Garbutt et al., 2015). If we understand the differences in social vulnerability across space, policies and disaster management can be tailored to the population, thus saving lives and reducing property losses (Cutter, 2006).

There is a lineage of research that focuses exclusively on the inherent characteristics of social vulnerability (Romero Lankao and Qin, 2011). In this school of thought, social vulnerability is understood as a product of social inequalities within society: natural hazards magnify existing social and economic inequalities; they do not change them (Cutter, 2006). As such, social vulnerability can be measured independently of exposure to hazards, and, to reduce vulnerability, we must focus on creating a more just and equitable society. Studies in this area often describe inequities in resource distribution and access, but do not describe the full causal sequence of how these inequities interact with hazard exposure to produce differential impacts (Romero Lankao and Qin, 2011). While research on inherent social vulnerability has significantly advanced our understanding of overall vulnerability, it is only one dimension of vulnerability: exposure, physical vulnerability, and social vulnerability must be considered holistically. Social vulnerability is often closely linked to physical vulnerability.
For example, if a component of social vulnerability is access to health care, one must consider the physical location of hospitals and health care providers, as well as the state of that infrastructure and the quality of service. Damage to physical infrastructure will inevitably affect social functions (Romero Lankao and Qin, 2011). The Community Resilience Planning Guide for Buildings and Infrastructure Systems (Community Resilience Group, 2015), released by the US National Institute of Standards and Technology in 2015, focuses on the role physical infrastructure systems play in ensuring social functions. The guide directs communities to consider how people and social institutions, such as government, business, healthcare, and education, depend on the built environment. The importance of buildings and infrastructure in supporting these critical institutions should determine both their level of protection and their sequence of recovery after an event. For example, to support emergency healthcare, communities may set a goal that hospitals remain functional during and immediately after a hazard event. This approach argues that protection of physical assets should be based on their importance in maintaining social institutions and limiting social vulnerability.

Building on this, Garbutt et al. (2015) attempted to capture the link between physical and social vulnerability by including access to health care facilities, food stores, and schools, along with more traditional demographics, in their assessment of vulnerability to flooding in Norfolk, England. Flooding results in lost or diminished access to health care facilities for some of the most vulnerable populations. The connection between physical infrastructure and social welfare, however, is frequently overlooked in vulnerability assessments, and understanding the complex linkages between physical and social systems, or systems of systems, is an ongoing area of research (Romero Lankao and Qin, 2011).

Flanagan et al. (2011) further argue that it is important to incorporate data on physical infrastructure because it may expose vulnerabilities that are masked in social vulnerability indices. For example, during Hurricane Katrina, 30 residents of St. Rita's nursing home in St. Bernard Parish, Louisiana, died in the flooding; yet census-tract-level GIS analysis did not identify this area as particularly vulnerable based on the overall census counts of elderly individuals. Although census counts include nursing home residents, they made up only a small share of the tract's population. For this reason, it is important to include data on critical infrastructure, such as nursing homes, that houses or supports vulnerable populations (Flanagan et al., 2011).

2.11.3.2.2 Physical vulnerability

Many assessments focus exclusively on physical or economic vulnerability, attempting to quantify the cost of damage.

To better understand the economic impacts of SLR on coastal property in New Jersey and to assess potential policy responses, Neumann et al. (2010) used GIS to overlay multiple data layers, including coastal elevation, parcel value, and land use and zoning categories. They estimated the value of property at risk of inundation and then, using the cost of different adaptation strategies such as beach nourishment, elevation, armoring, and abandonment, determined which strategies would be most cost-effective.

Studies that focus on physical vulnerability often overlay exposure to the particular biophysical hazard in question with parcel data and maps. In addition to being sensitive to the modeling decisions for exposure (discussed earlier), results also depend on assumptions about when damage occurs. Neumann et al. (2010) found that requiring the parcel centroid to be inundated decreases damage estimates by about 25% compared with the assumption that any inundation of the parcel causes damage. In estimating property damage in the City of Washington, NC, Berke et al. (2015) used building footprint data, considering a building vulnerable if any part of its footprint intersects the hazard zone. Several studies have incorporated additional data, such as base floor height (i.e., building first-floor elevation), to more accurately represent the vulnerability of existing structures to flood-related hazards (Eckert et al., 2012; Bodoque et al., 2016). Physical characteristics including wind design features, height of the structure relative to potential floods, and artificial and natural flood defense measures will also affect the overall extent of building damage (Van Zandt et al., 2012).

To recognize that floods do not result in complete property damage, several studies make use of depth-damage functions, which describe the relationship between flood depth and relative damage (Hinkel et al., 2014; Jongman et al., 2012). Depth-damage functions essentially assume that damage is related to the depth to which a building is submerged, and they tend to have a declining slope, reflecting that additional water depth causes diminishing additional damage (Hinkel et al., 2014).

Several studies use estimates of property damage from SLR to spatially determine which flood responses would be economically efficient. For example, Martinich et al. (2013) divided the United States into a 150-m grid and determined which areas should be abandoned, protected, or nourished (sand beach nourishment) based on calculations comparing the value of at-risk property to the cost of flood responses. They found that when decisions about protection and abandonment are based solely on property value, more land is abandoned in areas with high social vulnerability and more land is protected in areas with low social vulnerability. These results emphasize the disproportionate impacts of SLR and the need to incorporate social vulnerability into decision-making.
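The depth-damage logic is straightforward to sketch. In the example below, the concave damage curve and the sample structures are invented; applied studies use empirically fitted, building-class-specific functions rather than the generic exponential form assumed here.

```python
# A hedged sketch of a depth-damage calculation: each structure's loss is its
# value times a damage fraction that rises, with declining slope, toward 1 as
# flood depth above the first floor increases.
import numpy as np

def depth_damage_fraction(depth_m: np.ndarray) -> np.ndarray:
    """Concave damage curve: fast initial damage, diminishing with depth."""
    return np.clip(1.0 - np.exp(-0.7 * np.maximum(depth_m, 0.0)), 0.0, 1.0)

# Structures: value (dollars) and first-floor elevation (m); one flood level.
values = np.array([250_000, 400_000, 180_000], dtype=float)
first_floor = np.array([1.2, 0.4, 2.5])
flood_level = 1.5

depths = flood_level - first_floor           # water depth above the first floor
losses = values * depth_damage_fraction(depths)
print(losses.round(0))   # the structure above the flood level takes no damage
```

Incorporating first-floor elevation in this way is what lets such models distinguish a flooded parcel from a flooded building, the refinement noted above for Eckert et al. (2012) and Bodoque et al. (2016).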

2.11.3.2.3 Social vulnerability

Hazards affect communities and social groups differently (Felsenstein and Lichter, 2014), owing to variations in spatial distribution and in the ability to cope with events. Development patterns often concentrate poverty and isolate vulnerable populations (Van Zandt et al., 2012). Lower-income populations often live in older and poorer-quality housing located in low-lying areas that are more susceptible to damage (Van Zandt et al., 2012), and vulnerable populations are less likely to have access to the information and resources needed to anticipate and respond to threats (Van Zandt et al., 2012). This susceptibility is exacerbated where large informal settlements exist, as these settlements are generally densely populated with relatively few resources and, unfortunately, are rarely prioritized in vulnerability assessments (De Risi et al., 2013).

Measuring and mapping social vulnerability is complex (de Sherbinin, 2014; Holand and Lujala, 2013). The use of composite indicators is one of the most prevalent approaches to measuring, assessing, and aggregating social vulnerability. Composite indicators combine individual variables that represent the diverse dimensions of vulnerability (Sullivan and Meigh, 2005; Cutter et al., 2010), and have been praised for their ability to simplify and visually communicate the complexity of social vulnerability. Quantitative indicators can be rapidly developed and provide a systematic and consistent method to measure progress, compare the relative vulnerability of different areas, and identify priority areas for intervention, making them a valuable tool for decision-making (Sullivan and Meigh, 2005; Cutter et al., 2008; Balica et al., 2012). However, the composite indicator approach has been criticized for subjective variable selection and weighting, for being limited to readily available data, for misrepresenting the scale and dimensions of vulnerability, and for being difficult to validate (Cutter et al., 2008; Garbutt et al., 2015). An extensive literature on composite indicators has developed in recent years (Cutter et al., 2010), but there is little consensus regarding the best methods for developing indicators (Tate, 2013). Generally, the process of creating a composite indicator includes four key steps: (1) development or application of a theoretical framework; (2) variable selection; (3) transformation of variables into comparable scales; and (4) data aggregation and weighting.

2.11.3.2.3.1 Development of a theoretical framework
The first step involves developing or applying a theoretical framework to support variable selection, the weighting of variables, and the statistical framework by which the index is evaluated (Cutter et al., 2010; Rygel et al., 2006). This theoretical framework should describe or conceptualize the cause of vulnerability (Holand and Lujala, 2013) and will depend on the goal of the index. Questions that should be considered include: is the index intended to apply to all hazards, or is it hazard specific? Should it represent generalized vulnerability or focus on a particular stage of a disaster (e.g., evacuation, response, recovery; Tate, 2013)?

During this initial step, it is also important to determine the statistical approach for developing the indicator (Tate, 2012). Deductive, hierarchical, and inductive approaches are all commonly used (Fig. 3). Deductive designs are the simplest: upward of 10 normalized variables are assembled and then used directly to compute the index.
Hierarchical models (Flanagan et al., 2011; Berke et al., 2015) use a greater number of indicators grouped into thematic subindexes, such as household composition and economic resources, which are then combined to form the overall index. Hierarchical approaches have a greater level of theoretical organization than deductive models and less statistical complexity than inductive approaches (Tate, 2012). Inductive approaches (Fekete, 2009; Cutter et al., 2010) typically begin with more than 20 variables, which are reduced to a smaller number of latent variables using multivariate statistical techniques (typically principal components analysis, PCA); these latent variables are then aggregated to compute the index.

Fig. 3 Depiction of (A) deductive, (B) hierarchical, and (C) inductive models that structure vulnerability indices. Adapted from Tate, E. (2012). Social vulnerability indices: A comparative assessment using uncertainty and sensitivity analysis. Natural Hazards 63, 325–347.

2.11.3.2.3.2 Variable selection for social vulnerability indicators
After the development and application of a theoretical framework, variables of interest are selected for analysis. There is no single established set of indicators for quantifying vulnerability; consequently, it is up to the researcher or practitioner to determine which variables to include. Decisions about which variables to include may be based on a number of factors, including data availability, but selected variables should have strong theoretical grounding. To capture the multifaceted nature of vulnerability, composite indicators often consider a large number of variables. Early vulnerability studies claimed that including more variables universally strengthened results (Balica et al., 2012); however, many studies now run correlation analyses to remove duplicative variables that add little unique explanatory power, in order to improve overall statistical validity.

There are no direct measures for many concepts associated with social vulnerability, so proxy measures must be used. For example, the extent of community or citizen participation in civic affairs is arguably an important component of social vulnerability, but there are rarely direct measures of such concepts; as an alternative, composite indicators may include the number of civic and social advocacy organizations. Although the dimensions and concepts of social vulnerability are common across studies, there is significant variation in the variables selected as proxies (Tate, 2013). For example, both income and poverty rates are plausible proxies for material resources or economic capacity: which should be included in the index, and why (Tate, 2012)? When selecting proxy variables, it is important to consider the "validity" of the variable; the variable must be representative of the dimension or process of interest. Global assessments frequently use indicators such as relative mortality rate or GDP, as these proxy variables are measurable and well defined. At local scales, vulnerability indices commonly include poverty, race, age, disability, education, and language (Felsenstein and Lichter, 2014; Kashem et al., 2016). Generally, communities with higher levels of educational attainment, fewer elderly residents, and fewer disabled residents are thought to be less vulnerable (Cutter et al., 2010), as are communities with high rates of personal vehicle ownership, telephone access, and health insurance (Van Zandt et al., 2012). Indicators such as proximity to hazardous land uses and pollution sources, as well as health risk and exposure, have also been used as proxies for environmental justice-related dimensions of vulnerability (Martinich et al., 2013).

Ultimately, the strength of a composite indicator of overall social vulnerability rests on the quality of the variables it comprises. A major critique of the composite indicator approach is that variable selection is subjective (Holand and Lujala, 2013) and often driven by data availability rather than theoretical importance (Fekete, 2009); indicators are often selected based on untested assumptions regarding the determinants of vulnerability (Romero Lankao and Qin, 2011).
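The correlation screening mentioned above can be sketched in a few lines. The variables and data below are random stand-ins, the 0.70 cutoff mirrors the threshold used by BRIC (discussed later), and the greedy keep-or-drop rule is one simple possibility, not a standard prescribed by the literature.

```python
# Correlation-based screening: candidate proxy variables are dropped when they
# are nearly duplicative of one already kept. Data are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
n_tracts = 200
poverty = rng.random(n_tracts)
income = -poverty + 0.1 * rng.random(n_tracts)   # nearly duplicative of poverty
pct_elderly = rng.random(n_tracts)

candidates = {"poverty": poverty, "per_capita_income": income,
              "pct_elderly": pct_elderly}

kept: dict[str, np.ndarray] = {}
for name, values in candidates.items():
    duplicative = any(abs(np.corrcoef(values, v)[0, 1]) > 0.70
                      for v in kept.values())
    if not duplicative:
        kept[name] = values
print(sorted(kept))   # per_capita_income is dropped as redundant with poverty
```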


2.11.3.2.3.3 Variable comparison and scaling
It is also important to recognize that variables cannot be readily compared if they are measured on different scales; to combine different variables, they must be normalized. Three normalization schemes are commonly proposed and implemented in vulnerability studies: unscaled variables, linear scaling, and z-score standardization (Tate, 2013).

The use of unscaled variables is appropriate only when the data are measured on comparable scales, such as percentages, per capita values, or area-based density measures. When using unscaled variables, it is important to consider the implications of the different measures: percentages and per capita measures emphasize average composition over the total number of individuals at risk, while density measures imply that areas with more people or households have greater potential for damage than less-populated areas (Kleinosky et al., 2007).

Linear scaling is a widely applied normalization technique, and there are multiple ways to scale variables linearly. In min-max scaling, the difference between each individual value and the minimum value is divided by the range of observed values, generating a measure that falls between zero and one. While min-max scaling is easy to comprehend, it does not perform well in the presence of outliers, as the difference between minimum and maximum values can distort index scores (Tate, 2013). When outliers are present, linear scaling using only the maximum value may be preferred: the transformed value is calculated by dividing the unscaled value by the maximum observed value.

Alternatively, variables can be normalized using z-scores, which produces values with a mean of 0 and a standard deviation of 1. The score represents the number of standard deviations a particular observation lies from the mean of all observations, and also conveys the direction of the deviation: a positive z-score indicates that the observation is above the mean, while a negative z-score indicates that it is below. z-scores are calculated by subtracting the mean value for the study area from a given observed value and dividing the difference by the standard deviation for the study area. z-score standardization is preferred for data sets with extreme values (Tate, 2013), and it produces standardized variables with the same values regardless of the measurement unit of the raw variable (Tate, 2013).

Prior to normalization, indicators for which vulnerability decreases as values increase should undergo a directionality adjustment to address these negative associations between variables (Tate, 2013). For example, higher household wealth corresponds to lower social vulnerability, so a wealth variable should be inverted, or multiplied by −1.
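The three schemes, together with the directionality adjustment, can be expressed compactly; in the sketch below, the income values are invented and the inversion step reflects the assumption that higher income implies lower vulnerability.

```python
# Normalization schemes for a toy median-household-income variable.
import numpy as np

income = np.array([28_000.0, 45_000.0, 52_000.0, 61_000.0, 140_000.0])

# Directionality adjustment: higher income means lower vulnerability, so invert.
x = -income

# Min-max scaling to [0, 1]; the 140k outlier compresses the other values.
minmax = (x - x.min()) / (x.max() - x.min())

# Scaling by the maximum only (here applied to the raw values), preferred when
# outliers would distort the min-max range.
max_scaled = income / income.max()

# z-score standardization: mean 0, std 1, unit-free, better with extreme values.
zscore = (x - x.mean()) / x.std()

print(minmax.round(2))
print(zscore.round(2))
```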
Normalization essentially converts variables into comparative measures. Vulnerability is rarely measured in absolute terms; rather, assessments usually take a comparative approach in which indicators assess relative levels of vulnerability (Cutter et al., 2008, 2010). There is no cut-off percentage of the population living below the poverty line at which a particular county is labeled vulnerable, but we can compare a county to its neighbor to determine which is more vulnerable on this particular measure. Vulnerability assessments typically compare measures of vulnerability among particular places or analyze trends over time for a given area (Cutter et al., 2008).

Because vulnerability measures are typically comparative, results depend on the reference study area selected: comparing counties within a state will produce different relative vulnerabilities than comparing them to counties across the entire nation. To illustrate this point, Felsenstein and Lichter (2014) categorized social indicators in each statistical area (equivalent to a US census tract) within Israel into national quintiles, facilitating comparison of vulnerability measures to national averages even though the study area was limited to two cities, Tel Aviv and Haifa. Similarly, rather than comparing each coastal county to all other coastal counties in the United States, Martinich et al. (2013) accounted for regional differences by limiting comparisons to coastal counties in the same region (North Atlantic, South Atlantic, Gulf Coast, and Pacific).

For inductive approaches to developing vulnerability indicators, there is an additional step before data aggregation in which variables are reduced to a smaller number of latent variables using PCA. The social vulnerability index (SoVI), one of the most widely used composite indicators, uses PCA to reduce 42 variables to latent variables that are included in the final aggregation. PCA returns a set of orthogonal components that are linear combinations of the original variables (Martinich et al., 2013): the first component is the linear combination that explains the greatest variation among the original variables, the second explains the greatest remaining variation, and so on. PCA and other multivariate statistical techniques arguably increase the objectivity of variable selection (Holand and Lujala, 2013). Furthermore, the method circumvents potential issues with multicollinearity among the original variables considered for inclusion: many of the proxies typically chosen to represent vulnerability are highly correlated with one another and measure essentially the same themes. For example, a high percentage of the population living below the poverty line and a low per capita income both indicate limited economic capacity. PCA avoids these challenges by grouping variables into components, thereby allowing a more robust and consistent set of variables (Holand and Lujala, 2013). While PCA has many benefits, the method has been criticized for its sensitivity to outliers and its tendency to reflect spatial relationships between variables rather than the variables' influence on vulnerability (Holand and Lujala, 2013; Frazier et al., 2013, 2014).
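A NumPy-only sketch of this reduction step is shown below: six synthetic proxies, generated from two latent dimensions, are standardized and decomposed so that two components capture most of the variance. Real applications such as SoVI start from dozens of census variables and involve component-retention and rotation choices not modeled here.

```python
# Inductive (PCA) reduction: correlated proxies are collapsed into orthogonal
# components ordered by variance explained. Data are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(1)
n = 300
economic = rng.normal(size=n)                 # latent "economic capacity"
demographic = rng.normal(size=n)              # latent "demographic structure"

# Six observed proxies, each a noisy reflection of one latent dimension.
X = np.column_stack([
    economic + 0.3 * rng.normal(size=n),      # poverty rate
    economic + 0.3 * rng.normal(size=n),      # per capita income (inverted)
    economic + 0.3 * rng.normal(size=n),      # unemployment
    demographic + 0.3 * rng.normal(size=n),   # pct elderly
    demographic + 0.3 * rng.normal(size=n),   # pct under 5
    demographic + 0.3 * rng.normal(size=n),   # pct single-parent households
])

Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize before PCA
cov = np.cov(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]
explained = eigvals[order] / eigvals.sum()
print(explained.round(2))                     # two components dominate

scores = Z @ eigvecs[:, order[:2]]            # latent variables for aggregation
```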

2.11.3.2.3.4 Variable aggregation and weighting
After determining the variables for inclusion in a vulnerability index and scaling those variables, one final step is needed to create the composite indicator: aggregation and weighting. The most common way to aggregate variables across deductive, hierarchical, and inductive models is simply to average all the components (an additive approach); to date, nearly every social vulnerability assessment has taken this approach (Tate, 2012).

Not only is this approach simple and easy to understand, it is also insensitive to outliers that could otherwise influence results (Tate, 2013). The additive approach, however, assumes that variables have no influence on each other; in statistical terms, it assumes there are no interactions between variables. Another disadvantage of additive aggregation is compensability, whereby a low value on one indicator can mask a high value on another (Tate, 2013; Kleinosky et al., 2007; Rygel et al., 2006). For example, an area with high levels of poverty might not receive a high vulnerability score if it also has low numbers of racial and ethnic minorities.

To avoid these two issues, other aggregation techniques can be applied. Geometric aggregation multiplies the normalized indicators, reducing the aforementioned problems. A potential downside of geometric aggregation is its high sensitivity to outliers, but if outliers are properly identified, these extreme scores can be addressed to allow the method's appropriate implementation (Tate, 2013). Alternatively, other measures can be used to identify areas that have extreme scores on one indicator but do not score highly on the overall composite. Flanagan et al. (2011), for example, counted the number of individual variables with percentile ranks of 90 or greater to identify census tracts with vulnerable populations due to a high percentile on at least one demographic variable. Kleinosky et al. (2007) similarly used Pareto ranking of census block groups, a method for ordering cases on multiple criteria that gives higher rankings to cases that score highly on any one factor (Rygel et al., 2006).

Simply adding or averaging components assumes that they are equally important. Equal weighting is usually applied as a default because there is insufficient understanding of the underlying processes to assign meaningful weights, not because the indicators are actually of equal importance (Tate, 2012, 2013). For example, Cutter et al. (2010) chose to weight their selected indicators equally, arguing that there is "no theoretical or practical justification for the differential allocation of importance across indicators" (p. 10). Equal weighting, however, does not make an index unweighted (Tate, 2013; Felsenstein and Lichter, 2014): while differential weighting has been criticized as too subjective, equal weighting is just as subjective. High correlations between indicators can also introduce implicit weighting into an equal weighting scheme, as the associated dimensions are effectively double counted (Tate, 2013).

The goal of weighting is to assign greater weight to the indicators deemed most important for measuring vulnerability (Tate, 2013). Unequal weights are often determined through either statistical or participatory approaches. Statistical weights might be based on the percentage of variance explained in exploratory factor analysis, coefficients in linear regression, or the inverse of the coefficient of variation. Statistical weighting is determined with little or no input from decision-makers or the community being analyzed (Garbutt et al., 2015) and, consequently, may fail to account for what matters in the decision-making process; Sullivan and Meigh (2005) suggest that the weighting of variables should instead be based on participatory consultation and expert opinion. Ultimately, the weighting of index components is among the most subjective decisions in the vulnerability index construction process (Tate, 2013).
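The compensability problem, and the "flag" remedy of Flanagan et al. (2011), can be illustrated with invented percentile ranks for two analysis units; the three indicators and their values below are hypothetical.

```python
# Equal-weight additive aggregation versus Flanagan-style "flags" that expose
# units whose extreme score on a single indicator is masked by averaging.
# Data are invented percentile ranks (0-100) for three indicators.
import numpy as np

ranks = np.array([
    [95.0, 10.0, 15.0],   # unit A: extreme poverty, low on everything else
    [45.0, 45.0, 45.0],   # unit B: moderate across the board
])

additive = ranks.mean(axis=1)          # equal weights mask unit A's extreme value
flags = (ranks >= 90.0).sum(axis=1)    # count of indicators at/above the 90th pct

print(additive)   # [40.0, 45.0]: B looks slightly *more* vulnerable than A
print(flags)      # [1, 0]: the flag reveals A's masked extreme indicator
```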

2.11.3.2.4 Established social vulnerability indices

Established vulnerability indices employ a multitude of methodological approaches (Tate, 2012). The social vulnerability index (SoVI), for example, is perhaps the most well-known and widely used composite indicator of social vulnerability. Since it was first developed, it has been applied extensively across geographic settings, spatial scales, and time periods. Originally designed to compare disaster vulnerability at the county level throughout the United States, SoVI uses 42 socioeconomic, demographic, and built-environment variables to assess vulnerability. These variables represent factors and characteristics consistently identified in past research as contributing to vulnerability. The variables are entered into a principal components analysis, from which components are selected. The original study selected 11 components, which explained 76.4% of the variance in the original data set, with socioeconomic status being the most influential factor in explaining vulnerability. These 11 components were then scaled and summed with equal weights to create a comprehensive composite indicator. SoVI has been praised for integrating theory, conceptualization, and indicator selection (Holand and Lujala, 2013), but has been criticized for its compensatory logic. Its use of PCA has also been questioned because of the method's sensitivity to outliers and its tendency to reflect spatial relationships between variables rather than the variables' influence on vulnerability (Holand and Lujala, 2013).

Today, hierarchical composite indicators, such as the social vulnerability index (SVI) developed by Flanagan et al. (2011), are becoming an increasingly popular method for assessing the vulnerability of populations (Tate, 2013). The SVI includes four subindices: (1) socioeconomic status, which consists of income, poverty, employment, and education variables; (2) household composition and disability, which includes age, single-parenting, and disability variables; (3) minority status and language, which comprises race, ethnicity, and English proficiency variables; and (4) housing and transportation, which describes housing structure and vehicle access. In total, the index incorporates 15 census variables measured at the census-tract level. To construct the index, the percentile rank of each variable is calculated at the census-tract level; the percentile ranks of the variables in each subindex are summed, yielding an overall percentile rank for each census tract. To address the issue of compensability, Flanagan et al. (2011) also "flagged" the variables in each tract with a percentile rank of 90 or above, recording the total number of such flags within both the subindices and the overall composite score.

Another widely used hierarchical composite indicator of social vulnerability is the Baseline Resilience Indicators for Communities (BRIC; Cutter et al., 2010). The BRIC measure includes five subcomponents: social resilience, economic resilience, institutional resilience, infrastructural resilience, and community capital. Each subcomponent was operationalized using seven to eight variables (36 in total), selected on the basis of popular measures of resilience in the existing literature as well as the availability of consistent, high-quality data from national sources.
Variables that were highly correlated with one another (r > 0.70) were removed from the analysis, and all remaining variables were normalized to a min–max scale (see earlier discussion for further explanation of this normalization procedure).
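As a minimal sketch of the percentile-rank, hierarchical construction described above for the SVI, the code below ranks invented variables within thematic subindices and combines the subindex ranks; the groupings and random data are stand-ins for the real 15-variable, four-subindex design.

```python
# A hierarchical, percentile-rank index in the style of the SVI construction:
# variables are percentile-ranked per tract, summed within thematic subindices,
# and the re-ranked subindex sums combined into an overall score.
import numpy as np

def percentile_rank(values: np.ndarray) -> np.ndarray:
    """Rank of each value as a fraction of the number of observations."""
    order = values.argsort().argsort()           # 0..n-1 rank of each element
    return (order + 1) / len(values)

rng = np.random.default_rng(7)
n_tracts = 100

subindices = {
    "socioeconomic": {"pct_poverty": rng.random(n_tracts),
                      "pct_unemployed": rng.random(n_tracts)},
    "household": {"pct_over_65": rng.random(n_tracts),
                  "pct_single_parent": rng.random(n_tracts)},
}

overall = np.zeros(n_tracts)
for theme, variables in subindices.items():
    theme_sum = sum(percentile_rank(v) for v in variables.values())
    overall += percentile_rank(theme_sum)        # re-rank each subindex sum

print(overall.argsort()[::-1][:5])               # five most vulnerable tracts
```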


In BRIC, social resilience describes demographic variables that may affect a community's response to hazardous events, including age, transportation access, language, educational equity, and health care coverage. Economic resilience refers to the economic vitality of the community and is operationalized using data on income inequality, employment, and homeownership. Institutional resilience, which assesses the capacity for risk reduction within the community, includes variables such as the percentage of homes covered by flood insurance, the number of disaster declarations, and the presence of hazard mitigation planning. Infrastructural resilience captures the physical response and recovery capacity of a community and includes variables such as the percentage of housing units that are mobile homes, the number of hospital beds, and the age of the housing stock. Finally, community capital is operationalized by voter participation, the number of civic and religious organizations within the community, and migration rates. Variable scores in each subindex are averaged and then summed to produce a final composite resilience score.

There are several other social vulnerability indices, including the Prevalent Vulnerability Index; the Index of Social Vulnerability to Climate Change for Africa; the Disaster Risk Index; the Predictive Indicator of Vulnerability; and the Disaster Resilience of Place. While some of these indices have been replicated and applied in multiple studies, many were developed for a specific study or application. Indices vary in their scale of analysis, the hazard under investigation, and the particular conception of vulnerability adopted (Felsenstein and Lichter, 2014). Unfortunately, the bulk of these vulnerability index studies fail to explain why particular choices were made and how those choices may affect the output index (Tate, 2012).

Existing techniques for developing social vulnerability indices have been criticized on a number of grounds, listed here and explained in further detail in the subsequent section: (1) variables used in composite indicators assess socioeconomic determinants of vulnerability but provide no information on the perceptions of risk, behavioral responses, or political institutions that are the actual causes of vulnerability (Preston et al., 2011b); (2) indicators are typically based on widely available proxy measures, not primary data, and may not adequately characterize local circumstances (Preston et al., 2011b); (3) the weighting of indicators is subjective (Frazier et al., 2013); (4) indices are rarely validated and may not have predictive applications (Preston et al., 2011b); (5) the scale of indicators may not correspond to the locations or scales at which vulnerability is manifested (Kienberger et al., 2013); and (6) vulnerability assessments tend to focus on current demographic and socioeconomic data, ignoring trends and likely future states (Preston et al., 2011b).

2.11.3.2.5 Validation of social vulnerability indices

Measures and indices of social vulnerability are highly dependent on the indicators chosen for inclusion, indicator weighting, data aggregation, scale of analysis, and data source (Frazier et al., 2014). At each stage of constructing a social vulnerability assessment, researchers face many choices (e.g., hierarchical or inductive design, census tracts or blocks, equal or unequal weights; Tate, 2013). Whenever an index developer chooses between competing viable options, uncertainty and subjectivity are introduced into the modeling process (Tate, 2013), and these decisions can significantly influence the output. Consequently, validation is needed to understand how well indices represent the concept of vulnerability (Tate, 2012). Composite indicators can be validated either by comparing the index to an independent, second data set (external validation) or by testing the internal validity of the model using sensitivity analysis (Fekete, 2009; Tate, 2012).

2.11.3.2.5.1 External vulnerability index validation
External validation tests the conceptual validity of a composite indicator by asking whether the indicator concretely explains the phenomenon of interest (Fekete, 2009). Validation against an independent second data set has been stymied because social vulnerability is not a directly observable phenomenon: there exists no device with which to measure it. As a result, validation requires proxies such as evacuation behavior, physical damage, mortality, or recovery activity.

Van Zandt et al. (2012) created a composite social vulnerability indicator for Galveston, TX (USA), and then examined the correlation between calculated social vulnerability and evacuation behavior, degree of damage, resources for recovery, and rebuilding activity after Hurricane Ike (2008). Their findings indicate that spatial disparities persist for disadvantaged populations at every stage of disaster response and recovery; in other words, indicators of social vulnerability do help identify neighborhoods with higher susceptibility to damage and lower capacity to respond. Fekete (2009) employed a similar approach, assessing the correlation between a composite social vulnerability indicator and evacuation behavior during a 2002 flood in Germany. He likewise found that home-owners, urban populations, and the elderly were differentially affected by the flood, supporting the idea that demographic variables and composite indicators are meaningful in discerning differential hazard impacts on particular groups or populations. Both studies examine whether the social vulnerability predicted from the variables in the composite measure is observed during real events. That the composite indicators in these studies are, in fact, correlated with observed damage suggests that the variables are valid for identifying vulnerability, and that composite indicators can be created without direct reference to exposure or hazard parameters.

Data for external validation, however, are often lacking (Schmidtlein et al., 2008; Fekete, 2009): evacuation behavior, degree of damage, cost of recovery, and extent of recovery are not systematically or uniformly recorded after disasters.
These values may also be difficult to quantify, and they do not necessarily reflect sensitivity across cases: every hazard event is unique, and it is difficult to disentangle whether outcomes are due to differing exposure or differing susceptibility (Van Zandt et al., 2012).

2.11.3.2.5.2 Internal vulnerability index validation
An alternative approach to validation is to assess how changes in the underlying construction of the composite indicator change the resulting composite score. Such assessments typically calculate social vulnerability indices multiple times, changing decisions in construction slightly each time.

This can help identify the influence of each modeling decision (e.g., variable selection, weighting) on the index, as well as the overall robustness of the index. The change in ranks of individual analysis units (e.g., US Census tracts) can help determine the sensitivity of the index to different construction approaches (Tate, 2013).

Tate (2012) assessed how variable selection, scale, transformation, normalization, and weighting influence the results of deductive, hierarchical, and inductive social vulnerability indices. He calculated indicators using different combinations of approaches, changing a single modeling decision during each run to determine the sensitivity of the index to each decision and to identify interactions between decisions. He found that the deductive index performed worst of the three approaches. The hierarchical index was more robust, with smaller changes in vulnerability under different modeling decisions. The inductive model was less sensitive to construction decisions but tended to produce more outliers relative to the range of indicator scores. Most of the uncertainty in the deductive model emerges during the transformation stage, as there is significant variation between indices that use percentages, per capita measures, or densities (Tate, 2012); transformation is also a major source of uncertainty in the hierarchical model (Tate, 2012). Additional analysis found that the vulnerability rank of census tracts varied greatly under different weighting schemes (Tate, 2013), and that tracts with higher vulnerability showed more uncertainty, or variation, due to changes in weighting schemes and other construction decisions.

In both the deductive and hierarchical models, most of the variance is explained by first-order effects, that is, by single decisions in the construction process, with few interactions between decisions (Tate, 2012). In the inductive model, by contrast, little variance is explained by first-order effects, but almost all decisions interact strongly with one another. This high degree of interaction makes it difficult to determine which step in the decision process should be modified to increase accuracy (Tate, 2012). Schmidtlein et al. (2008), however, did find that SoVI results (an inductive model) are sensitive to the exact PCA methods used to derive components and weighting, and conclude that SoVI results may vary substantially based on these decisions, including which areas are identified as most vulnerable within the given reference frame.

These sources of uncertainty appear to be consistent across multiple study areas, indicating that the uncertainty and sensitivity of social vulnerability indices is more likely a function of construction methodology than of differences in demographics (Tate, 2012). Other studies, however, have shown that the influence of index construction decisions on the composite indicator differs with the study area (Schmidtlein et al., 2008), suggesting that the geographic context of the analysis has an important impact on the relative behavior of the index. Regardless of the type of model or location, differences in how variables are weighted produce substantially different results. To address sensitivities to weighting and other construction decisions,
To address sensitivities to weighting and other construction decisions, Schmidtlein et al. (2008) suggest that index builders seek expert guidance to ensure that both the overall results and the variables used are consistent with local understanding and contextual knowledge of the study area. Based on their experience using SoVI to evaluate social vulnerability in Norway, Holand and Lujala (2013) similarly recommend that conceptualizations of the factors that influence social vulnerability indicators be modified for the context in which they are applied. To date, validation studies demonstrate that social vulnerability indices are useful and may help predict differential preparation, response, damage, and recovery from disasters. Internal validation, however, demonstrates that the modeling decisions made during index construction truly matter, and such decisions should not be made arbitrarily. This finding underscores the need for researchers and practitioners to create and distribute detailed documentation on indicator construction. Sensitivity analysis should also be widely adopted to help identify the sources of uncertainty that modeling decisions introduce into the indices.
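To make the mechanics concrete, the following minimal sketch (synthetic indicators and invented weighting schemes, not those of Tate (2012, 2013)) recomputes a simple additive index under alternative weighting schemes and summarizes how far each unit's rank moves:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical standardized indicators (z-scores) for 100 census tracts.
tracts = pd.DataFrame(
    rng.normal(size=(100, 4)),
    columns=["pct_poverty", "pct_elderly", "pct_no_vehicle", "pct_renters"],
)

# Alternative weighting schemes: one construction decision to vary.
schemes = {
    "equal":       np.array([0.25, 0.25, 0.25, 0.25]),
    "poverty_led": np.array([0.40, 0.20, 0.20, 0.20]),
    "access_led":  np.array([0.20, 0.20, 0.40, 0.20]),
}

# Rank tracts under each scheme (rank 1 = most vulnerable).
ranks = pd.DataFrame({
    name: pd.Series(tracts.values @ w).rank(ascending=False)
    for name, w in schemes.items()
})

# Sensitivity summary: how far a tract's rank drifts across schemes.
rank_spread = ranks.max(axis=1) - ranks.min(axis=1)
print("median rank shift:", rank_spread.median())
print("max rank shift:   ", rank_spread.max())
```

A fuller analysis in the spirit of Tate (2012) would also vary transformation, normalization, and variable selection in the same loop, and decompose the resulting variance into first-order and interaction effects.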

2.11.3.2.6 Spatial resolution of vulnerability assessments

The scale of analysis of social data has long been an issue of great importance in geographical analysis (Kienberger et al., 2013). The modifiable areal unit problem can affect geographical studies that use spatially aggregated data: this issue manifests as a combination of a scale problem, where correlations between variables often increase with the level of aggregation, and an aggregation problem, in which correlations between enumerated units might depend just as much on the manner of the aggregation as on the actual relationships between variables (Tate, 2013). Vulnerability assessments are often conducted at the census-tract or block-group level, as this is the level at which most socioeconomic data are available. Representing socioeconomic data in these polygons, however, can be problematic; grouping populations into socially constructed classifications is a normative exercise, and these spatial scales may not correspond to the location or scale where vulnerability manifests (Kienberger et al., 2013). The majority of the research on social vulnerability has been conducted at a household or individual level using qualitative methods (Tate, 2012; Schmidtlein et al., 2008). This work has identified key drivers of vulnerability, and is often used to select variables and characteristics to represent the vulnerability of populations in quantitative methods (Tate, 2012). While this seems reasonable, the variables that influence vulnerability at the individual or household level may not have the same relationship at the population level (Schmidtlein et al., 2008). The relationship between indicators and vulnerability might change when the analysis moves between scale levels (Tate, 2012, 2013). Indeed, Schmidtlein et al. (2008) found that the power of PCA to explain variability in observed data decreased for smaller levels of aggregation (i.e., more variation was explained for counties compared to census tracts). Scale, however, did not strongly influence which variables were identified as most important in explaining variation. Relationships between variables aggregated in socially constructed classifications may be just as likely due to the aggregation scheme as to the relationships between the variables themselves. It is important to consider whether the scale of analysis is appropriate to assess the phenomenon in question. Only when the scale of the model is appropriate are the results relevant; similarly, policies must be commensurate with the intrinsic scale of the problem to be effective (Kienberger et al., 2013).
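The scale effect is easy to reproduce with synthetic data. The sketch below (purely illustrative, with invented numbers) simulates a spatially smooth underlying process observed with heavy point-level noise, then aggregates contiguous points into progressively coarser units; the correlation between the two aggregated variables climbs as the units grow:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# A spatially smooth underlying process along a transect, plus strong
# point-level noise on each of the two observed variables.
loc = np.linspace(0.0, 1.0, n)
signal = np.sin(2 * np.pi * loc)
x = signal + rng.normal(scale=2.0, size=n)
y = signal + rng.normal(scale=2.0, size=n)

# Aggregate contiguous points into progressively coarser areal units:
# noise averages out within units, but the smooth signal survives.
for units in (10_000, 1_000, 100, 20):
    size = n // units
    xm = x.reshape(units, size).mean(axis=1)
    ym = y.reshape(units, size).mean(axis=1)
    print(f"{units:>6} units: r = {np.corrcoef(xm, ym)[0, 1]:.2f}")
```

The aggregation component of the problem would appear if the same points were grouped by different zoning schemes at a fixed scale, yielding different correlations for the same data.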
2.11.3.2.6.1 Data availability

Data availability often determines the spatial and temporal extent of a vulnerability assessment, but this may not align with the boundaries and dynamics of the system (Preston et al., 2011b). For example, most vulnerability assessments in the United States are conducted at the census-tract level because it is the smallest unit for which complete demographic data are available. This data availability issue may be particularly problematic for less-developed countries. For example, in their attempt to assess the vulnerability of Alexandria, Egypt to tsunamis, Eckert et al. (2012) found that they were severely limited by the availability of socioeconomic data. In many situations, there may be large amounts of relevant data, but the data are often fragmented and inconsistent in their collection or coding (Sullivan and Meigh, 2005). Inconsistency between data sources can be a major challenge when comparing different countries; for comparisons to be valid, data sources must be reliable (Balica et al., 2012). Reliance on preprocessed economic and social information can severely limit the ability of researchers and practitioners to accurately map socioeconomic vulnerability and support decision-making (Lieske, 2015). As a result, there is an urgent need to develop systematic and robust approaches to data collection and storage to facilitate vulnerability assessments (Sullivan and Meigh, 2005).

2.11.3.2.6.2 Data scaling and aggregation issues

Another important consideration in vulnerability assessments is that spatial data can be aggregated to provide a more generalized representation of an area, but in the absence of additional information, data cannot be scaled down to finer levels. This is an example of the ecological fallacy: individual attributes cannot be deduced from statistical inferences about the group to which those individuals belong. Block groups may be too coarse for local government planning efforts (Garbutt et al., 2015). However, block groups cannot be easily downscaled due to uncertainty in how the individuals in the population and their characteristics are spatially distributed within each block group. This is a major issue when only a portion of a block group polygon intersects a hazard zone, as one cannot accurately discern the individuals and characteristics of the population within the given hazard zone (Berke et al., 2015; Sahin and Mohamed, 2013). In this case, part of the population of the block group is exposed to the hazard, but the available data assume spatial homogeneity of socioeconomic characteristics. When intersecting socioeconomic polygons with hazard exposure, we may assume that the population is evenly distributed across the polygon and that the total number of individuals (and their respective characteristics) exposed to the hazard is proportional to the percentage of the socio-demographic polygon's area that is covered by the hazard exposure polygon. For example, in determining the number of residents in a floodplain, one would assume that the fraction of the population in the floodplain is equivalent to the fraction of the area in the floodplain.
If 50% of the block group falls within the floodplain, this approach assumes that 50% of the population is also in the floodplain and that demographic characteristics are uniform across the population. When assessing the vulnerability of Sarasota County, FL to elevated storm surge under projected SLR, Frazier et al. (2014) used census blocks as their unit of analysis, determining the areal percentage of each block in the hazard zone and then including this percentage as a vulnerability indicator within their calculations. Several researchers have developed techniques to better estimate the population at risk. Felsenstein and Lichter (2014) improve upon the simple areal approach by using building data. They calculate the total floor space per building and then proportionately allocate people and their socioeconomic characteristics to buildings on a per-square-meter basis. They then re-aggregate to the hazard area. This provides a more accurate spatial distribution of inhabitants and allows for a better estimate of the population exposed to hazards, but it still assumes that demographic characteristics are uniform across the population. Prasad (2016) addresses this challenge in a different way through the use of dasymetric mapping, a technique that utilizes ancillary data, such as land use and land cover data, to allocate the population based on density. To evaluate the evacuation needs of residents in the South East Florida floodplain, he created a 30 × 30 m raster grid population estimate based on land use and land cover data. Population was excluded from uninhabited areas (e.g., golf courses, agricultural land, water) and allocated to residential areas based on whether land cover analysis identified areas as high, medium, or low density. He then used the floodplain boundary as a mask to extract the pixels in the floodplain. This technique provides a more realistic estimate of the distribution of people within a given area and their vulnerability to the hazard of interest. Berke et al. (2015) use a similar approach to Prasad (2016) in their assessment of the social vulnerability of Washington, NC. Using LandScan data, a global population distribution model developed at Oak Ridge National Laboratory that estimates the number of people living within 90-m cells, they applied a weighting system for social vulnerability indicators; densely populated cells with high social vulnerability were scored higher than sparsely populated cells with similar social vulnerability (Berke et al., 2015). LandScan is commonly used to disaggregate population counts within administrative units, transforming census-tract-level data to a smaller, finer-scale grid (Mondal and Tatem, 2012; Li et al., 2009). LandScan compiles data from population censuses around the world, distributing the population into cells based on land cover, proximity to roads, terrain slope, and nighttime light intensity (Mondal and Tatem, 2012). LandScan provides “ambient” population distributions, integrating diurnal movements and collective travel habits into a single measure. The Global Rural Urban Mapping Project (GRUMP) is another product commonly used to estimate populations at risk of SLR and coastal hazards (Neumann et al., 2015; Mondal and Tatem, 2012). Similar to LandScan, GRUMP disaggregates population data from administrative units to grids using nighttime light satellite data, with nighttime light intensity serving as a proxy measure to identify urban areas.
GRUMP distributes the population within the administrative unit based on rural–urban extents.
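Returning to the simple areal-weighting approach that these dasymetric products refine, a minimal GIS implementation might look like the sketch below. The file names and the pop_total field are hypothetical, and the final allocation inherits the uniform-density assumption criticized above:

```python
import geopandas as gpd

# Hypothetical inputs: block groups carrying a total-population field,
# plus a floodplain (hazard) polygon layer reprojected to match.
blocks = gpd.read_file("block_groups.gpkg")
flood = gpd.read_file("floodplain.gpkg").to_crs(blocks.crs)

# Record each block group's full area before clipping.
blocks["bg_area"] = blocks.geometry.area

# Intersect block groups with the hazard zone.
exposed = gpd.overlay(blocks, flood, how="intersection")

# Areal weighting: exposed population is proportional to the share of
# each block group's area inside the floodplain. This assumes the
# population is spread uniformly within the block group, which is
# exactly the limitation dasymetric mapping tries to relax.
exposed["area_frac"] = exposed.geometry.area / exposed["bg_area"]
exposed["pop_exposed"] = exposed["area_frac"] * exposed["pop_total"]

print("Estimated residents in floodplain:",
      round(exposed["pop_exposed"].sum()))
```

A dasymetric variant would first mask out uninhabited land cover (water, golf courses, agricultural land) and weight the remaining area by residential density class before computing the fractions.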


While disaggregated population datasets can improve estimates of vulnerable populations globally, it is important to recognize that the choice of dataset may influence the accuracy of results. Mondal and Tatem (2012) highlight this issue by comparing estimates of population in low-lying coastal areas using LandScan and GRUMP. While these disaggregated datasets produced similar results for developed countries in Europe and North America, there were substantial differences for less-developed countries in Africa, Asia, and South America. Some of these countries do not conduct regular censuses of their populations, thus limiting overall knowledge of human population distributions. As models of biophysical change improve in accuracy and detail, knowledge of human population distribution, especially in less-developed regions of the world, may remain a major limiting factor in our understanding of vulnerability to particular hazards, especially for hazards that have distinct spatial extents (Mondal and Tatem, 2012).

2.11.3.2.7 Dynamic landscapes and the assumption of stationary socioeconomic characteristics

Many vulnerability studies do not explicitly account for temporal changes within their analyses (Sahin and Mohamed, 2013; Balica et al., 2012). Studies often overlay future scenarios of biophysical change on current demographic and socioeconomic data, implicitly assuming these characteristics are stationary (Preston et al., 2011b; Berry and BenDor, 2015). Consequently, vulnerability assessments may underestimate the total population and value of assets at risk from future climate impacts, which can lead to nonadaptive or even maladaptive outcomes (Kashem et al., 2016). Disaster losses worldwide have risen substantially due to increased investment in coastal areas (Beatley, 2009; de Moel et al., 2011). Development patterns will continue to be an important determinant of vulnerability in the future (McNamara et al., 2011). Therefore, examining trends over time can help researchers understand why people and investments are located in hazardous areas in the first place (Kashem et al., 2016). Social vulnerability assessments often focus solely on who is vulnerable and what characteristics make them vulnerable. Kashem et al. (2016), for example, argue that further research should explore why populations with characteristics typically associated with higher vulnerability (see above) relocate to, or are concentrated in, certain areas. Using SoVI, Kashem et al. (2016) analyzed how social vulnerability changed over three decades (1980–2010) in Houston, TX; Tampa, FL; and New Orleans, LA. They found that the variables most important in explaining vulnerability change over time. Each city shows different patterns of change: in Houston, the growth of the Hispanic population has made ethnicity and citizenship status more important in determining vulnerability, while in Tampa, gentrification of the inner city has pushed socially vulnerable populations to suburban census tracts, and some coastal locations have experienced high rates of growth in elderly populations due to the development of retirement communities. This work demonstrates that the relative importance of specific characteristics that influence vulnerability, and the concentration of those characteristics, varies spatially and temporally. Social vulnerability is not static, but is instead a dynamic concept. Some studies have attempted to model future population growth and migration to better estimate the number of people that will be exposed to future natural hazards (Neumann et al., 2015; Kleinosky et al., 2007). For example, in their SLR vulnerability analysis of the City of the Gold Coast in Southeast Queensland, Australia, Sahin and Mohamed (2013) attempted to incorporate SLR, population growth, and changing development patterns. To accomplish this, they combined GIS modeling with system dynamics modeling and multicriteria analyses. For each time step, they modeled inundation, the overall area at risk, population growth, and the population at risk. One major challenge in this approach involves accounting for the interactions and feedbacks between environmental change and migration (Neumann et al., 2015). Vulnerability assessments that account for population growth rely on separate models of population growth and predicted flooding. Typically, these studies assume past trends will continue, and that coastal cities will continue to grow (Balica et al., 2012). However, increased flooding and extreme events may reverse migration to, and discourage future investment in, coastal areas.
Several studies have attempted to better understand when coastal residents choose to retreat from the coastal area (Werner and McNamara, 2007; Smith et al., 2009; Williams et al., 2013), but further research is needed to better understand feedbacks between climate change and population and development patterns. Rather than model population growth, some studies assess vulnerability based on alternative future scenarios of population and economic change. Kleinosky et al. (2007), for example, developed three future impact scenarios based on various SLR and population distribution scenarios. Although dozens of possible future impact scenarios could have been created (by combining population growth, population distribution, and SLR scenarios), the selected scenarios provide realistic upper and lower bounds of future vulnerability. This work demonstrates that, even with low estimates of population growth and SLR, the absolute number of people in each flood-risk zone increases over time. Similarly, de Moel et al. (2011) combined two extreme scenarios of population and economic change with flood projections in the Netherlands. The socioeconomic scenarios were considered the upper and lower extremes with regard to the amount of projected urban development. Using the extent of current flood hazard zones, they calculated the potential flooding damage for each scenario. Their analysis suggested that urban development in flood-prone areas may increase by between 30% and 125%, equating to a 3- to 10-fold increase in flood damage. This work demonstrates that time is an important consideration in determining overall social vulnerability. As with biophysical vulnerability or exposure, social vulnerability will change over time. Historically, increasing investments and growing populations in high-risk locations have increased total disaster losses. The total cost of hazards is likely to continue rising in the future due to development patterns, even if there is no increase in storm intensity or sea level. Development patterns also have a profound influence on the underlying causes of social vulnerability. As gentrification and suburbanization occur, vulnerable populations may be displaced to the urban fringe, creating new challenges and new centers of vulnerability.
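The bounding logic of these scenario studies reduces to simple combinatorial arithmetic: cross a low and a high population trajectory with a low and a high exposure scenario, and report the resulting envelope. Below is a minimal sketch with invented rates and fractions (not the values used by Kleinosky et al. (2007) or de Moel et al. (2011)):

```python
# All numbers are hypothetical, chosen only to illustrate the method.
base_pop = 150_000                       # current coastal population
years = 50                               # planning horizon

growth = {"low": 0.004, "high": 0.015}   # annual population growth rates
exposure = {"low SLR": 0.06,             # share of population inside the
            "high SLR": 0.14}            # hazard zone under each SLR case

for g_name, rate in growth.items():
    future_pop = base_pop * (1 + rate) ** years
    for e_name, frac in exposure.items():
        at_risk = future_pop * frac
        print(f"{g_name} growth / {e_name}: {at_risk:,.0f} people at risk")
```

Even the low-growth, low-SLR corner of this grid exceeds today's exposed population (150,000 × 0.06 = 9,000), echoing the finding that absolute exposure grows under every plausible combination.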

2.11.4 Conclusions: Improving Decision-Making for Climate Vulnerability

Zerger (2002) argued that the success of GIS-based vulnerability assessments should be judged by their ability to improve decision-making. However, research has rarely, if ever, examined the consequences of spatial modeling for decision-making. There is limited evidence that vulnerability assessments actually influence decisions, investments, or adaptation activities in a meaningful way (Frazier et al., 2013; de Sherbinin, 2014). Vulnerability assessments published in the peer-reviewed literature often assert that they can improve decision-making by providing additional information to decision-makers. This argument rests on the concept of a “knowledge deficit,” which assumes that the quality of decision-making outcomes is limited by a lack of information (Preston et al., 2011b). This deficit model of communication, which asserts that the public and decision-makers merely need to be informed and presented with facts in order to perceive risks, is overly simplistic (Milfont et al., 2014). Multiple barriers to action exist, including distrust in information sources, externalizing responsibility, optimism bias, attention to other priorities, fatalism, lack of perceived power to take action, and ideological beliefs (Milfont et al., 2014). To effectively aid decision-making, those producing vulnerability assessments need to better understand decision-making processes and barriers to action. There are opportunities to integrate assessments of climate change vulnerability into existing decision-making processes. For example, Berry and BenDor (2015) demonstrate how the risk of inundation due to SLR could be integrated into urban development suitability analysis. Development suitability analysis is a common tool in land use planning, identifying preferred areas for development. Including SLR projections in suitability analysis better reflects future development suitability and discourages growth in areas that will be at risk of inundation due to SLR. Suitability analysis may also be a better tool to guide land use planning than separate vulnerability assessments, as planners are familiar with suitability analysis, and the inclusion of additional factors offers a more holistic approach for identifying appropriate areas for future development (Berry and BenDor, 2015). As a final part of this discussion, it is important to emphasize the growing body of literature demonstrating that when spatial decision support systems and visualization tools are integrated with decision-making processes, they can be an important aid in climate change adaptation (Lieske, 2015). To ensure that vulnerability assessments are in fact relevant, we encourage researchers and practitioners to consider the following questions, which we explore in more detail below:

1. What are the objectives of the assessment? What are the benefits for the stakeholders?
2. Who is the audience? Who will participate and how will results be communicated?
3. How is the assessment framed? For whom, and to what, is vulnerability being assessed?
4. Should we represent uncertainty, and if so, how?

2.11.4.1 What Are the Objectives of the Assessment? What Are the Benefits for the Stakeholders?

In considering the objectives of a particular vulnerability assessment, it can be useful to understand the process by which communities adapt to climate change (Lieske, 2015). While multiple models of the climate adaptation process have been proposed, here we draw on the model described by Moser and Ekstrom (2010), as it provides an appropriate amount of detail without being overly specific, and focuses on the process as a whole and its iterative nature. They divide adaptation into three primary phases: understanding, planning, and managing, each of which includes three subphases (see Fig. 4). Phase I, Understanding, includes (i) problem detection and awareness raising; (ii) information gathering and use to deepen problem understanding; and (iii) problem (re)definition. Phase II, Planning, involves (iv) development of adaptation options; (v) assessment of options; and (vi) selection of option(s). Finally, Phase III, Managing, consists of (vii) implementation of the selected option(s); (viii) monitoring of the environment and the outcome of the realized option(s); and (ix) evaluation. Vulnerability assessments can aid in many of these phases. Most existing vulnerability assessments focus primarily on the very first subphase, problem detection and awareness raising. Early vulnerability assessments largely focused on communicating the extent of potential damage from SLR (Neumann et al., 2010; Cutter et al., 2008). For information to be accepted and used appropriately in decision-making processes, it must be accessible, salient, trusted, and legitimate (Moser and Ekstrom, 2010). Stakeholders may feel that vulnerability indices are a black-box process focused on turning out results, and may not accept or fully integrate findings into decision-making. This is particularly true for social vulnerability assessments, which tend to rely more on theory and proxies (Balica et al., 2012). For vulnerability indices to be accepted, the process and underlying assumptions need to be explained to the user (Balica et al., 2012). Ensuring that vulnerability assessments are accessible, salient, and legitimate requires an understanding of the audience and their receptivity to the issue. Most vulnerability assessments fail to consider these issues, and many lack a specific target audience (de Sherbinin, 2014). Raising awareness is only the first step in the process; for action to occur, several further steps must follow, including defining the problem, selecting adaptation options, and implementing those options. Vulnerability assessment can also be used in the later phases of adaptation planning. For example, social vulnerability assessments may help redefine climate change adaptation as an issue encompassing both social justice and equity. Evaluations of early adaptation plans found that these plans predominantly frame climate change as an environmental issue (Preston et al., 2011a; Baker et al., 2012). Vulnerability assessments focused on social aspects may help change how the problem is understood, thereby engaging stakeholders from different organizations. In turn, this can ensure that the options considered in the planning phase include actions to address underlying causes of vulnerability, such as poor education, limited economic opportunities, lack of affordable housing, and concentrated poverty.

Fig. 4 The primary phases and subphases of climate adaptation, shown as an iterative cycle: Understanding (detect problem; gather/use information; (re)define problem), Planning (develop options; assess options; select option(s)), and Managing (implement option; monitor option and environment; evaluate). Adapted from Moser and Ekstrom (2010).

Vulnerability assessments can be powerful tools for assessing and comparing policy options (Ran and Nedovic-Budic, 2016; Preston et al., 2011b). In fact, some argue that vulnerability assessments should be used primarily as a policy evaluation tool, with the goal of comparing adaptation strategies and determining the most appropriate and effective option (Preston et al., 2011b). For example, Brown (2006) used GIS to help decision-makers in North Norfolk, UK visualize different coastal defense policies, modeling landscape change and the number of properties at risk of inundation under eight scenarios. They applied each of two SLR scenarios, 8 and 42 cm, to four policy scenarios: (1) maintaining existing coastal defenses; (2) making minor inland adjustments; (3) a major retreat of defensive lines inland; and (4) maintaining no defenses. Landscape change varied greatly between the different coastal defense policies, resulting in important differences in the number of properties at risk. In addition, the model was used to estimate the cost of policy options based on the maintenance cost of existing structures and the capital cost of building new structures. Similarly, Sahin and Mohamed (2013) surveyed residents, politicians, and experts to determine stakeholders' preferred policies to address SLR and then modeled the most popular policies: (1) retreat; (2) improvement of building design; (3) improvement of public awareness; (4) construction of protective structures; or (5) no action. Their results suggest that protective structures provided no reduction in total land area or total population at risk, especially for high SLR projections; in this case, rivers and canals in their study area allowed floodwaters to move inland despite the protective structures. Such studies aid in the assessment of different adaptation options; however, to date, most vulnerability assessments do not consider how differing decisions will influence outcomes. Many vulnerability assessments claim to be baseline assessments (Cutter et al., 2010). As such, they can play an important role in the evaluation phase of the adaptation process. In theory, repeated vulnerability assessments can help determine whether previous adaptation actions were effective. However, it is extremely difficult, if not impossible, to demonstrate whether adaptation actions cause changes in vulnerability (Baer, 1997). Studies analyzing vulnerability over time (Kashem et al., 2016) show that many of the changes in vulnerability are related to larger patterns of development and migration, such as suburbanization. Disentangling the effect of these larger patterns from the influence of adaptation actions is a major challenge.
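Structurally, such policy comparisons amount to evaluating an impact model over a small scenario grid, as in Brown's two SLR levels crossed with four defense policies. The toy sketch below is not Brown's model; the property elevations, storm water level, and protection heights are invented solely to show the scenario-matrix pattern:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic property elevations above current mean sea level (m).
elevation = rng.uniform(0.0, 3.0, size=5_000)

slr_scenarios = {"low": 0.08, "high": 0.42}   # m, the two levels in Brown (2006)
surge = 1.0                                   # design storm water level (m, invented)

# Effective protection each policy adds to the flooding threshold
# (values invented for illustration only).
policies = {
    "maintain defenses": 1.0,
    "minor realignment": 0.6,
    "major retreat":     0.2,
    "no defenses":       0.0,
}

# Evaluate the full 2 x 4 scenario grid (eight scenarios in total).
for slr_name, slr in slr_scenarios.items():
    for pol_name, protection in policies.items():
        flood_level = surge + slr - protection
        at_risk = int((elevation < flood_level).sum())
        print(f"SLR {slr_name} ({slr} m), {pol_name}: {at_risk} properties at risk")
```

A real policy evaluation would replace the single threshold test with an inundation model and attach maintenance and capital costs to each policy, as Brown (2006) did.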

2.11.4.2 Who Is the Audience? Who Will Participate and How Will Results Be Communicated?

The public often views climate change impacts as “remote” (Milfont et al., 2014); many individuals feel that climate change will occur far away, in the distant future, and to people different from themselves (Milfont et al., 2014). Maps are a powerful tool for communicating risk to both the public and decision-makers (Preston et al., 2011b; de Sherbinin, 2014). When determining the exact method of communication, it is important to consider the target audience. When communicating with the public, maps become “boundary objects,” or points of reference that facilitate conversation and learning (Preston et al., 2011b). Maps can be used to engage community members in a conversation about their perceptions of risks and to solicit their knowledge about the many drivers of vulnerability (Krishnamurthy et al., 2011).
Pairing vulnerability maps with visualization tools can make an even more powerful impression on the audience. For example, inundation and landscape change can be depicted using computer-generated graphics or historical images (Lieske, 2015; Brown, 2006). Dockerty et al. (2005) used GIS to create “futurescapes,” potential future landscapes that incorporate climate change impacts, to engage agricultural landowners in the Wissey Valley region of England in conversations about climate change impacts. Such imagery can strongly influence risk perceptions and deepen understanding of hazards, and may be more accessible to a wide range of audiences and stakeholders (Lieske, 2015). For example, the Collaborative for Landscape Planning launched a website to help individuals visualize projected climate impacts and potential adaptation scenarios. This information can be quite powerful when discussing the outcomes of, and tradeoffs between, different alternatives with elected officials.

2.11.4.3 How Is the Assessment Framed? For Whom, and to What, Is Vulnerability Being Assessed?

Vulnerability assessments often assert that they are developed to support policy or decision-making, even though such efforts suffer from a lack of specificity regarding who precisely constitutes the “policy audience” (de Sherbinin, 2014). Generally, it is assumed that distilling complex concepts like social vulnerability into indicators and maps will be useful for decision-makers. Unfortunately, there is often a disconnect between the information provided in such assessments and the information decision-makers need (Moser and Ekstrom, 2010). Practitioners often criticize third-party assessments for failing to adequately understand and adapt to a municipality's needs. When municipalities are not engaged early in the process, results tend to be viewed as “overly complex” (Graham and Mitchell, 2016). Alternatively, some organizations may take a long time to generate their assessments, only to come back to the municipality with too much information to sort through. To increase the relevance of vulnerability assessments for decision-making, assessments should better define the targeted end-users and incorporate them into the assessment process (Preston et al., 2011b; Balica et al., 2012). This involvement may range from simply considering the scale that is actually relevant for decision-making to collaboratively developing the variables used in the assessment. For example, Frazier et al. (2013) conducted focus groups with representatives from the hazard mitigation, public safety, engineering, and public works sectors in Sarasota County, FL, to determine which vulnerability indicators were most relevant at the local level. These indicators were then incorporated into the vulnerability assessment for the area. The demonstrated sensitivity of social vulnerability indices to variable selection, transformation, and weighting suggests that end-users should be involved in making these decisions to ensure the end result is relevant (Balica et al., 2012). Vulnerability assessments often assert that they can help make decision-making more “transparent” and data driven (de Sherbinin, 2014), but it is important to recognize that there are multiple subjective decisions involved in developing vulnerability assessments and indicators. If these decisions are not explained, stakeholders may feel that assessments are a veiled process aimed at turning out results, which may not be appropriate for a particular area or community.

2.11.4.4 Should We Represent Uncertainty, and If So, How?

The outcome of a vulnerability assessment is typically a map or visualization of the area, depicting the physical assets and populations that may be exposed to a hazard. While sets of maps and images can illustrate a range of possible outcomes, representing uncertainty in such visualizations can be challenging (Dockerty et al., 2005). Clear and detailed depictions of vulnerable areas, whether visualizations or maps, can convey a false sense of precision and legitimacy (Wechsler, 2007). This raises the question: how should uncertainty be visualized? As discussed earlier, there are multiple sources of uncertainty in vulnerability assessments, including data and modeling decisions, which are rarely discussed when communicating results (Zerger, 2002; Preston et al., 2011b). Failing to verbally express or graphically represent this uncertainty can lead to overconfidence in predictions and convey a false sense of precision and legitimacy of the information to end-users (Preston et al., 2011b). Presenting multiple scenarios is one way of highlighting the uncertainty in vulnerability assessments. Scenarios also convey the fact that the future is not fixed and that outcomes depend on how communities respond to the challenges posed by climate change and other drivers of change (Dockerty et al., 2005). Still, some worry that presenting multiple scenarios without sufficient guidance can result in information overload: users may ignore information that is too complicated for them, or disproportionately make use of some types of information over others (Preston et al., 2011b). Alternatively, uncertainty can be communicated cartographically, a topic that has received increasing scholarly attention in recent years. Techniques for visualizing attribute or variable uncertainty include color saturation, crispness, transparency, texture, interactivity, animation, and dimensionality (Tate, 2013). However, does uncertainty in vulnerability assessments actually matter? Zerger (2002) interviewed risk managers about how they use maps for decision-making, concluding that visualizing uncertainty in maps was not important to decision-makers. For example, when assessing flooding or SLR inundation, emergency managers tended to take a precautionary approach, evacuating entire neighborhoods and glossing over uncertainty about whether individual structures would be inundated. Public officials were also resistant to using detailed maps that visualize uncertainty for public communication, as the public may misinterpret their personal risk. Decisions about how best to represent uncertainty depend on the objectives and the audience of the vulnerability assessment.
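Of the cartographic techniques listed above, transparency is among the simplest to implement. The hedged sketch below (synthetic raster values, drawn with matplotlib) encodes the vulnerability score as color and certainty as opacity, so that cells with shaky scores literally fade from view:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm

rng = np.random.default_rng(3)
score = rng.random((20, 20))          # synthetic vulnerability scores in [0, 1]
uncertainty = rng.random((20, 20))    # synthetic uncertainty in [0, 1]

# Encode score as color and certainty (1 - uncertainty) as the alpha channel.
rgba = cm.viridis(score)
rgba[..., 3] = 1.0 - uncertainty

fig, ax = plt.subplots()
ax.imshow(rgba)
ax.set_title("Vulnerability (color) with uncertainty (transparency)")
ax.set_axis_off()
plt.show()
```

The same pattern extends to choropleth maps by setting each polygon's alpha from its uncertainty estimate; saturation or texture could be substituted for alpha where transparency conflicts with a basemap.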

2.11.4.5 A Final Word

Throughout this article, we have explored the theoretical considerations, technical caveats, process, and dissemination associated with using GIS to evaluate vulnerability to coastal climate risks, including flooding and SLR inundation. Many significant research avenues remain, including a number relating to the use of GIS in vulnerability assessments for changing policies and helping communities adapt to uncertain futures. Adaptation remains a complex issue with multiple political, social, environmental, and economic dimensions. While GIS-based assessments can help identify problems, prioritize solutions, and raise awareness of climate risks through innovative means of communicating detailed spatial and temporal information, they may not be enough to motivate adaptation action. In order to identify the goals and objectives of adaptation and prevent conflict among stakeholder goals (Lieske, 2015), addressing the four questions we pose above will become increasingly imperative for improving adaptation efforts.

References

Baer, W.C., 1997. General plan evaluation criteria: An approach to making better plans. Journal of the American Planning Association 63, 329–344.
Baker, I., Peterson, A., Brown, G., McAlpine, C., 2012. Local government response to the impacts of climate change: An evaluation of local climate adaptation plans. Landscape and Urban Planning 107, 127–136.
Balica, S.F., Wright, N.G., van der Meulen, F., 2012. A flood vulnerability index for coastal cities and its use in assessing climate change impacts. Natural Hazards 64, 73–105.
Beatley, T., 2009. Planning for coastal resilience: Best practices for calamitous times. Island Press, Washington, DC.
Berke, P., Newman, G., Lee, J., et al., 2015. Evaluation of networks of plans and vulnerability to hazards and climate change: A resilience scorecard. Journal of the American Planning Association 81, 287–302.
Berry, M., BenDor, T.K., 2015. Integrating sea level rise into development suitability analysis. Computers, Environment and Urban Systems 51, 13–24.
Blomgren, S., 1999. A digital elevation model for estimating flooding scenarios at the Falsterbo Peninsula. Environmental Modelling & Software 14, 579–587.
Bodoque, J.M., Guardiola-Albert, C., Aroca-Jiménez, E., Eguibar, M.A., Martínez-Chenoll, M.L., 2016. Flood damage analysis: First floor elevation uncertainty resulting from LiDAR-derived digital surface models. Remote Sensing 8, 604.
Brown, I., 2006. Modeling future landscape change on coastal floodplains using a rule-based GIS. Environmental Modelling & Software 21, 1479–1490.
Chang, S.W., Clement, T.P., Simpson, M.J., Lee, K.K., 2011. Does sea-level rise have an impact on saltwater intrusion? Advances in Water Resources 34, 1283–1291.
Chock, G., Robertson, I.N., Kriebel, D.L., Francis, M., Nistor, I., 2013. Tohoku, Japan, earthquake and tsunami of 2011: Performance of structures under tsunami loads. American Society of Civil Engineers, Reston, Virginia.
Community Resilience Group, 2015. Community resilience planning guide for buildings and infrastructure systems. National Institute of Standards and Technology (NIST), Gaithersburg, Maryland. http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.1190v1.pdf.
Conrads, P.A., Darby, L.S., 2016. Development of a coastal drought index using salinity data. Bulletin of the American Meteorological Society. http://dx.doi.org/10.1175/BAMS-D15-00171.1 (in press).
Cooper, H.M., Fletcher, C.H., Chen, Q., Barbee, M.M., 2013. Sea-level rise vulnerability mapping for adaptation decisions using LiDAR DEMs. Progress in Physical Geography 37, 745–766.
Coveney, S., Fotheringham, A.S., 2011. The impact of DEM data source on prediction of flooding and erosion risk due to sea-level rise. International Journal of Geographical Information Science 25, 1191–1211.
Cutter, S.L., 2006. Moral hazard, social catastrophe: The changing face of vulnerability along the hurricane coasts. The Annals of the American Academy of Political and Social Science 604, 102–112.
Cutter, S.L., Barnes, L., Berry, M., et al., 2008. A place-based model for understanding community resilience to natural disasters. Global Environmental Change 18, 598–606.
Cutter, S.L., Emrich, C.T., Burton, C., 2010. Disaster resilience indicators for benchmarking baseline conditions. Journal of Homeland Security and Emergency Management 7, 1–22.
Davis, R.A., FitzGerald, D.M., 2004. Beaches and coasts. Blackwell Publishing, Malden, MA.
Day, R.H., Williams, T.M., Swarzenski, C.M., 2007. Hydrology of tidal freshwater forested wetlands of the Southeastern United States. In: Ecology of Tidal Freshwater Forested Wetlands of the Southeastern United States. Springer, The Netherlands, pp. 29–63.
De Risi, R., Jalayer, F., De Paola, F., et al., 2013. Flood risk assessment for informal settlements. Natural Hazards 69, 1003–1032.
Dockerty, T., Lovett, A., Sünnenberg, G., Appleton, K., Parry, M., 2005. Visualizing the potential impacts of climate change on rural landscapes. Computers, Environment and Urban Systems 29, 297–320.
Eckert, S., Jelinek, R., Zeug, G., Krausmann, E., 2012. Remote sensing-based assessment of tsunami vulnerability and risk in Alexandria, Egypt. Applied Geography 32, 714–723.
Eggleston, J., Pope, J.P., 2014. Land subsidence and relative sea-level rise in the southern Chesapeake Bay Region. Circular 1392. United States Geological Survey, Reston, VA.
Fekete, A., 2009. Validation of a social vulnerability index in context to river-floods in Germany. Natural Hazards and Earth System Sciences 9, 393–403.
Felsenstein, D., Lichter, M., 2014. Social and economic vulnerability of coastal communities to sea-level rise and extreme flooding. Natural Hazards 71, 463–491.
Ferguson, G., Gleeson, T., 2012. Vulnerability of coastal aquifers to groundwater use and climate change. Nature Climate Change 2, 342–345.
Fisher, P.F., Tate, N.J., 2006. Causes and consequences of error in digital elevation models. Progress in Physical Geography 30, 467–489.
FitzGerald, D.M., Fenster, M.S., Argow, B.A., Buynevich, I.V., 2008. Coastal impacts due to sea-level rise. Annual Review of Earth and Planetary Sciences 36, 601.
Flanagan, B.E., Gregory, E.W., Hallisey, E.J., Heitgerd, J.L., Lewis, B., 2011. A social vulnerability index for disaster management. Journal of Homeland Security and Emergency Management 8, 1–22.
Frazier, T.G., Thompson, C.M., Dezzani, R.J., Butsick, D., 2013. Spatial and temporal quantification of resilience at the community scale. Applied Geography 42, 95–107.
Frazier, T.G., Thompson, C.M., Dezzani, R.J., 2014. A framework for the development of the SERV model: A spatially explicit resilience-vulnerability model. Applied Geography 51, 158–172.
Frazier, T.G., Wood, N., Yarnal, B., Bauer, D.H., 2010. Influence of potential sea level rise on societal vulnerability to hurricane storm-surge hazards, Sarasota County, Florida. Applied Geography 30, 490–505.
Fredsøe, J., Deigaard, R., 1992. Mechanics of coastal sediment transport. World Scientific Publishing, Hackensack, NJ.
Füssel, H.M., 2007. Adaptation planning for climate change: Concepts, assessment approaches, and key lessons. Sustainability Science 2, 265–275.
Gallien, T.W., Sanders, B.F., Flick, R.E., 2014. Urban coastal flood prediction: Integrating wave overtopping, flood defenses and drainage. Coastal Engineering 91, 18–28.
Garbutt, K., Ellul, C., Fujiyama, T., 2015. Mapping social vulnerability to flood hazard in Norfolk, England. Environmental Hazards 14, 156–186.
Gemitzi, A., Tolikas, D., 2007. HYDRA model: Simulation of salt intrusion in coastal aquifers using Visual Basic and GIS. Environmental Modelling & Software 22, 924–936.
Gesch, D.B., 2009. Analysis of LiDAR elevation data for improved identification and delineation of lands vulnerable to sea-level rise. Journal of Coastal Research 10053, 49–58.
Graham, A., Mitchell, C.L., 2016. The role of boundary organizations in climate change adaptation from the perspective of municipal practitioners. Climatic Change 139, 381–395.
Hackney, C.T., Avery, G.B., Leonard, L.A., Posey, M., Alphin, T., 2007. Biological, chemical, and physical characteristics of tidal freshwater swamp forests of the Lower Cape Fear River Estuary, North Carolina. In: Conner, H., Doyle, T.W., Krauss, K.W. (Eds.), Ecology of tidal freshwater forested wetlands. Springer, Dordrecht, The Netherlands, pp. 183–222.
Hinkel, J., Lincke, D., Vafeidis, A.T., et al., 2014. Coastal flood damage and adaptation costs under 21st century sea-level rise. Proceedings of the National Academy of Sciences 111, 3292–3297.
Holand, I.S., Lujala, P., 2013. Replicating and adapting an index of social vulnerability to a new context: A comparison study for Norway. The Professional Geographer 65, 312–328.
IPCC (Intergovernmental Panel on Climate Change), 2012. Managing the risks of extreme events and disasters to advance climate change adaptation. Cambridge University Press, New York, NY.
Januchowski, S.R., Pressey, R.L., VanDerWal, J., Edwards, A., 2010. Characterizing errors in digital elevation models and estimating the financial costs of accuracy. International Journal of Geographical Information Science 24, 1327–1347.
Jongman, B., Ward, P.J., Aerts, J.C.J.H., 2012. Global exposure to river and coastal flooding: Long term trends and changes. Global Environmental Change 22, 823–835.
Kashem, S.B., Wilson, B., Van Zandt, S., 2016. Planning for climate adaptation: Evaluating the changing patterns of social vulnerability and adaptation challenges in three coastal cities. Journal of Planning Education and Research 36, 304–318.
Kienberger, S., Blaschke, T., Zaidi, R.Z., 2013. A framework for spatio-temporal scales and concepts from different disciplines: The ‘vulnerability cube’. Natural Hazards 68, 1343–1369.
Kleinosky, L.R., Yarnal, B., Fisher, A., 2007. Vulnerability of Hampton Roads, Virginia to storm-surge flooding and sea-level rise. Natural Hazards 40, 43–70.
Krauss, K.W., Duberstein, J.A., Doyle, T.W., et al., 2009. Site condition, structure, and growth of baldcypress along tidal/non-tidal salinity gradients. Wetlands 29 (2), 505–519.
Krishnamurthy, P.K., Fisher, J.B., Johnson, C., 2011. Mainstreaming local perceptions of hurricane risk into policymaking: A case study of community GIS in Mexico. Global Environmental Change 21, 143–153.
Li, H., Lin, L., Burks-Copes, K.A., 2013. Modeling of coastal inundation, storm surge, and relative sea-level rise at Naval Station Norfolk, Norfolk, Virginia, USA. Journal of Coastal Research 286, 18–30.
Li, X., Rowley, R.J., Kostelnick, J.C., et al., 2009. GIS analysis of global impacts from sea level rise. Photogrammetric Engineering & Remote Sensing 75, 807–818.
Lieske, D.J., 2015. Coping with climate change: The role of spatial decision support tools in facilitating community adaptation. Environmental Modelling & Software 68, 98–109.
Loáiciga, H.A., Pingel, T.J., Garcia, E.S., 2012. Sea water intrusion by sea-level rise: Scenarios for the 21st century. Ground Water 50, 37–47.
Lorenzo-Trueba, J., Ashton, A.D., 2014. Rollover, drowning, and discontinuous retreat: Distinct modes of barrier response to sea-level rise arising from a simple morphodynamic model. Journal of Geophysical Research: Earth Surface 119, 779–801.
Manda, A.K., Giuliano, A.S., Allen, T.R., 2014. Influence of artificial channels on the source and extent of saline water intrusion in the wind tide dominated wetlands of the Southern Albemarle Estuarine System (USA). Environmental Earth Sciences 71, 4409–4419.
Martinich, J., Neumann, J., Ludwig, L., Jantarasami, L., 2013. Risks of sea level rise to disadvantaged communities in the United States. Mitigation and Adaptation Strategies for Global Change 18, 169–185.
McNamara, D.E., Murray, A.B., Smith, M.D., 2011. Coastal sustainability depends on how economic and coastline responses to climate change affect each other. Geophysical Research Letters 38, 1–5.
Melillo, J.M., Richmond, T.C., Yohe, G.W. (Eds.), 2014. Climate change impacts in the United States: The third national climate assessment. U.S. Global Change Research Program. http://purl.fdlp.gov/GPO/gpo48682.
Milfont, T.L., Evans, L., Sibley, C.G., Ries, J., Cunningham, A., 2014. Proximity to coast is linked to climate change belief. PLoS ONE 9, 1–8.
de Moel, H., Aerts, J.C.J.H., Koomen, E., 2011. Development of flood exposure in the Netherlands during the 20th and 21st century. Global Environmental Change 21, 620–627.
Mondal, P., Tatem, A.J., 2012. Uncertainties in measuring populations potentially impacted by sea level rise and coastal flooding. PLoS ONE 7, 1–7.
Moser, S.C., Ekstrom, J.A., 2010. A framework to diagnose barriers to climate change adaptation. Proceedings of the National Academy of Sciences 107, 22026–22031.
Neumann, B., Vafeidis, A.T., Zimmermann, J., Nicholls, R.J., 2015. Future coastal population growth and exposure to sea-level rise and coastal flooding: A global assessment. PLoS ONE 10, 1–34.
Neumann, J.E., Hudgens, D.E., Herter, J., Martinich, J., 2010. Assessing sea-level rise impacts: A GIS-based framework and application to coastal New Jersey. Coastal Management 38, 433–455.
North Carolina Coastal Resources Commission, 2016. North Carolina Sea Level Rise Assessment Report: 2015 Update to the 2010 Report and 2012 Addendum. North Carolina Coastal Resources Commission Science Panel, Raleigh, NC.
Poulter, B., Halpin, P.N., 2008. Raster modeling of coastal flooding from sea-level rise. International Journal of Geographical Information Science 22, 167–182.
Poulter, B., Feldman, R.L., Brinson, M.M., et al., 2009. Sea-level rise research and dialogue in North Carolina: Creating windows for policy change. Ocean and Coastal Management 52, 147–155.
Poulter, B., Goodall, J.L., Halpin, P.N., 2008. Applications of network analysis for adaptive management of artificial drainage systems in landscapes vulnerable to sea level rise. Journal of Hydrology 357, 207–217.
Prasad, S., 2016. Assessing the need for evacuation assistance in the 100 year floodplain of South Florida. Applied Geography 67, 67–76.
Preston, B.L., Westaway, R.M., Yuen, E.J., 2011a. Climate adaptation planning in practice: An evaluation of adaptation plans from three developed nations. Mitigation and Adaptation Strategies for Global Change 16, 407–438.
Preston, B.L., Yuen, E.J., Westaway, R.M., 2011b. Putting vulnerability to climate change on the map: A review of approaches, benefits, and risks. Sustainability Science 6, 177–202.
Prime, T., Brown, J.M., Plater, A.J., 2015. Physical and economic impacts of sea-level rise and low probability flooding events on coastal communities. PLoS ONE 10, 1–28.
Ran, J., Nedovic-Budic, Z., 2016. Integrating spatial planning and flood risk management: A new conceptual framework for the spatially integrated policy infrastructure. Computers, Environment and Urban Systems 57, 68–79.
Romero Lankao, P., Qin, H., 2011. Conceptualizing urban vulnerability to global climate and environmental change. Current Opinion in Environmental Sustainability 3, 142–149.
Rygel, L., O'Sullivan, D., Yarnal, B., 2006. A method for constructing a social vulnerability index: An application to hurricane storm surges in a developed country. Mitigation and Adaptation Strategies for Global Change 11, 741–764.
Saha, A.K., Saha, S., Sadle, J., et al., 2011. Sea level rise and South Florida coastal forests. Climatic Change 107, 81–108.
Sahin, O., Mohamed, S., 2013. A spatial temporal decision framework for adaptation to sea level rise. Environmental Modelling & Software 46, 129–141.
Sallenger, A.H., Doran, K.S., Howd, P.A., 2012. Hotspot of accelerated sea-level rise on the Atlantic Coast of North America. Nature Climate Change 2, 884–888.
Schaeffer, M., Hare, W., Rahmstorf, S., Vermeer, M., 2012. Long-term sea-level rise implied by 1.5°C and 2°C warming levels. Nature Climate Change 2, 1–4.
Schmid, K.A., Hadley, B.C., Wijekoon, N., 2011. Vertical accuracy and use of topographic LIDAR data in coastal marshes. Journal of Coastal Research 27 (6), 116–132.
Schmidtlein, M.C., Deutsch, R.C., Piegorsch, W.W., Cutter, S.L., 2008. A sensitivity analysis of the social vulnerability index. Risk Analysis 28, 1099–1114.
Seenath, A., 2015. Modeling coastal flood vulnerability: Does spatially-distributed friction improve the prediction of flood extent? Applied Geography 64, 97–107.
Sharp, J.M., Hill, D.W., 1995. Land subsidence along the Northeastern Texas Gulf Coast: Effects of deep hydrocarbon production. Environmental Geology 25, 181–191.
de Sherbinin, A., 2014. Climate change hotspots mapping: What have we learned? Climatic Change 123 (1), 23–37.
Smith, M.D., Slott, J.M., McNamara, D.E., Murray, A.B., 2009. Beach nourishment as a dynamic capital accumulation problem. Journal of Environmental Economics and Management 58 (1), 58–71.
Sullivan, C., Meigh, J., 2005. Targeting attention on local vulnerabilities using an integrated index approach: The example of the climate vulnerability index. Water Science and Technology: A Journal of the International Association on Water Pollution Research 51, 69–78.
Tate, E., 2012. Social vulnerability indices: A comparative assessment using uncertainty and sensitivity analysis. Natural Hazards 63, 325–347.
Tate, E., 2013. Uncertainty analysis for a social vulnerability index. Annals of the Association of American Geographers 103, 526–543.
Upton, J., 2016. Ghost forests appear as rising seas kill trees. Climate Central. http://www.climatecentral.org/news/ghost-forests-appear-as-rising-tides-kill-trees-20701.
Van Zandt, S., Peacock, W.G., Henry, D.W., et al., 2012. Mapping social vulnerability to enhance housing and neighborhood resilience. Housing Policy Debate 22, 29–55.
Wechsler, S.P., 2007. Uncertainties associated with digital elevation models for hydrologic applications: A review. Hydrology and Earth System Sciences 11, 1481–1500.
Wechsler, S.P., Kroll, C.N., 2006. Quantifying DEM uncertainty and its effect on topographic parameters. Photogrammetric Engineering & Remote Sensing 72, 1081–1090.
Werner, A.D., Bakker, M., Post, V.E.A., et al., 2013. Seawater intrusion processes, investigation and management: Recent advances and future challenges. Advances in Water Resources 51, 3–26.
Werner, B.T., McNamara, D.E., 2007. Dynamics of coupled human-landscape systems. Geomorphology 91, 393–407.
White, W.A., Tremblay, T.A., 1995. Submergence of wetlands as a result of human-induced subsidence and faulting along the upper Texas Gulf Coast. Journal of Coastal Research 11, 788–807.
Williams, Z.C., McNamara, D.E., Smith, M.D., Murray, A.B., Gopalakrishnan, S., 2013. Coupled economic-coastline modeling with suckers and free riders. Journal of Geophysical Research: Earth Surface 118, 887–899.
Zerger, A., 2002. Examining GIS decision utility for natural hazard risk modeling. Environmental Modelling & Software 17, 287–294.

2.12 Assessment of GIS-Based Machine Learning Algorithms for Spatial Modeling of Landslide Susceptibility: Case Study in Iran

Alireza Motevalli, Tarbiat Modares University, Tehran, Iran
Hamid Reza Pourghasemi*, Shiraz University, Shiraz, Iran
Mohsen Zabihi, Tarbiat Modares University, Tehran, Iran

*Corresponding author.

© 2018 Elsevier Inc. All rights reserved.

2.12.1 Introduction
2.12.1.1 Importance and Application of GIS in Determining the Spatial Susceptibility Map of Landslide
2.12.1.1.1 GIS techniques in landslide hazard mapping
2.12.1.1.2 GIS techniques in landslide hazard assessment and evaluation
2.12.1.1.3 GIS techniques in landslide prediction and monitoring
2.12.2 Materials and Methods
2.12.2.1 The Study Area
2.12.2.2 Methodology
2.12.2.2.1 Landslide inventory mapping
2.12.2.2.2 Landslide conditioning factors (LCFs)
2.12.2.2.3 Landslide susceptibility spatial modeling (LSSM)
2.12.2.2.4 Assessment of the importance of variables using the LVQ algorithm
2.12.2.2.5 Accuracy assessment of LSSMs
2.12.3 Results
2.12.3.1 Investigation of Multicollinearity
2.12.3.2 Application of EBF
2.12.3.3 The Importance of LCFs Using LVQ Algorithm
2.12.3.4 Landslide Susceptibility Maps Using SVM, RF, and NB Models
2.12.3.4.1 Support vector machine
2.12.3.4.2 Random forest
2.12.3.4.3 Naïve Bayes
2.12.3.5 Validation of Landslide Susceptibility Maps
2.12.4 Discussion
2.12.5 Conclusion
References

2.12.1 Introduction

Landslides are among the most important natural hazards, causing serious financial damage and loss of life (directly and indirectly) every year in many countries, including Iran. Consequently, governmental agencies and nongovernmental organizations have paid much attention to susceptibility assessment and the mitigation of damage caused by landslides (Pham et al., 2016). Based on aerial photos and detailed field surveys reported by the organizations responsible for landslide hazard management in the Mazandaran Province of Iran (519 recorded landslides), 438 ha of forest and rangeland and 1264 ha of farmland have suffered landslide damage; in addition, 12 km of rural roads and 50 villages in this area have been destroyed (Land use planning of Mazandaran Province Governor, 2007). In this regard, identifying areas in different susceptibility classes (low, moderate, high, and very high) and determining the spatial variability of landslides across regions are important for finding appropriate solutions for landslide control and mitigation (Barrile et al., 2016). Landslide susceptibility mapping and determination of the spatial distribution of landslides in each watershed is a useful and practical tool for land use planning and integrated watershed management (Akgun, 2012). Due to the increasing occurrence of natural hazards such as landslides in Iran and elsewhere in recent years, governmental managers and planners need appropriate solutions that support the best possible decision-making and hazard management. In general, disaster management can take place before, during, and after events, and geographic information systems (GIS) and remote sensing (RS) can serve all of these goals. GIS is a computer system that makes spatial data accessible and allows multiple issues to be addressed simultaneously (Van Westen et al., 2003). This technology can overlay different maps for various aims, and can also provide a comprehensive survey of a specific field, such as natural hazards (Carrara et al., 1999). Field-based monitoring and assessment of past landslide occurrences over large areas is not feasible; therefore, the use of GIS and RS tools for investigating landslide susceptibility and hazards is necessary and inevitable (Van Westen et al., 2003).


investigating landslide susceptibility and hazards is necessary and inevitable (Van Westen et al., 2003). So far, different types of models, including expert knowledge/opinion, statistical (bivariate and multivariate), analytical, and machine-learning models, have been used for landslide spatial modeling (LSM) and prediction (Goetz et al., 2015; Pham et al., 2016). Among these, the efficiency of machine learning-based models has been confirmed in several studies (Zare et al., 2013; Goetz et al., 2015; Pham et al., 2016), and many methods and models have been applied by researchers in different countries according to data availability and accessibility. According to the literature, most studies of landslide susceptibility zonation have focused on statistical (Pourghasemi et al., 2012a,b; Devkota et al., 2013; Jaafari et al., 2014; Akgun and Erkan, 2015), probabilistic (Kouli et al., 2014; Lari et al., 2014; Regmi et al., 2014), and machine-learning (Colkesen et al., 2016; Hong et al., 2016a,b; Vasu and Lee, 2016) models; the efficiency of machine-learning models in solving complex and uncertain problems is undeniable (Toll, 1996). Hence, support vector machines (SVM; Yao et al., 2008; Ballabio and Sterlacchini, 2012; Were et al., 2015), multivariate adaptive regression splines (Felicísimo et al., 2013; Conoscenti et al., 2016; Pourghasemi and Rossi, 2016), random forest (RF; Miner and Warner, 2010; Stumpf and Kerle, 2011; Catani et al., 2013; Youssef et al., 2015), boosted regression trees (Lombardo et al., 2015; Youssef et al., 2015), classification and regression trees (Nefeslioglu et al., 2010; Alkhasawneh et al., 2014), generalized additive models (Park and Chi, 2008; Goetz et al., 2011; Canli et al., 2015), Naïve Bayes (NB; Tien Bui et al., 2012; Tsangaratos and Ilia, 2016), linear discriminant analysis (Murillo-García and Alcántara-Ayala, 2015), quadratic discriminant analysis (Eker et al., 2014), and artificial neural networks (Lee et al., 2004; Zare et al., 2013; Dou et al., 2015; Wang et al., 2016) have been considered in different parts of the world. Although the performance of machine-learning models depends on the study area and the phenomenon considered, choosing the best and most efficient model among those available remains the main concern for researchers. Today, GIS is used in various fields, including environmental, geographic, social, industrial, and economic subjects, to save costs and time (Carrara et al., 1999). Recently, GIS has emerged as a powerful and useful tool for developing, managing, and analyzing spatial data and thematic maps. Because GIS can cover large areas and combine many map layers for a specific area over successive years, its use for landslide hazard zonation supports decision-making and landslide management, and the environmental sciences increasingly rely on this capability.
So, the aims of the present study are to (1) determine the relationship between each conditioning factor and landslide occurrence using the evidential belief function (EBF) model; (2) perform landslide susceptibility spatial modeling (LSSM) using three GIS-based machine-learning models, namely SVM, RF, and NB; (3) evaluate the considered models and select the best one using receiver operating characteristic (ROC) curves and the area under the curve (AUC); and (4) select features and rank the importance of the conditioning factors using the learning vector quantization (LVQ) algorithm.

2.12.1.1 Importance and Application of GIS in Determining the Spatial Susceptibility Map of Landslide

Disaster management is very important and necessary (Kunlong et al., 2007; Zhishan et al., 2012), because communities must cope with crises and unexpected events, and attention to prevention is the most important factor for improving crisis management (Andersson-Sköld et al., 2015). Landslides are among the most common natural hazards, causing heavy losses of life and property all over the world each year (Dikau et al., 1996; Guzzetti et al., 2012), and the occurrence of mass movements, with their associated losses, will undoubtedly become even more widespread in the future (Westen et al., 2013). Many researchers and decision makers in different countries have focused on different aspects of this phenomenon and on the optimal ways to predict and inhibit it (Zhishan et al., 2012; Andersson-Sköld et al., 2015). Planning and decision-making require accurate information in the relevant field (Remondo and Oguchi, 2009; Westen et al., 2013); if decision and policy making is to be based on correct information, the importance and quality of information, and of information systems, must be well understood (Walsh et al., 1998; Guzzetti et al., 2012). Collecting such information by traditional means requires a great deal of time and cost (Schuster, 1995). The need for quick access to information for planning, decision-making, utilization, monitoring, and implementation in various branches of science has driven the development of new technologies, including GIS, which is now used for many purposes (Dikau et al., 1996; Walsh et al., 1998; Corominas and Moya, 2008). By collecting and developing data layers in GIS, the goals of crisis management can be achieved. The importance of GIS techniques is determined by the purpose of the study, the scale studied, the type of analysis, and the types of input data collected; GIS techniques respond to these four major issues in the spatial modeling of landslide hazards (Van Westen et al., 2003). In recent years, the spatial distribution of landslides has been estimated using GIS techniques, and landslide hazards have been mapped accordingly (Carrara et al., 1999; Guzzetti et al., 1999; Van Westen et al., 2003; Li et al., 2012). In summary, the applications of GIS in the spatial modeling of landslide susceptibility and hazards are the development of spatial information, thematic mapping, guidance and planning for landslide hazards, and crisis management of natural disasters (Carrara et al., 1999). Given these features, GIS is an efficient tool for displaying information and exploring results in order to address the ever-increasing problems caused by landslides (Carrara et al., 1999). The main reason for using GIS is to prepare spatial landslide inventories and related information in order to develop a spatial landslide susceptibility map for further analysis (Dikau et al., 1996). The use of GIS-based models (like the three models used in the present study) makes risk-management planning possible (Dikau et al., 1996; Walsh et al., 1998; Corominas and Moya, 2008). GIS-based models and techniques also help to prioritize landslide-susceptible areas in terms of risk and, therefore, save time and cost (Dikau et al., 1996; Van Westen et al., 2003).


2.12.1.1.1 GIS techniques in landslide hazard mapping

GIS techniques were first employed in geographical studies in the early 1970s (Carrara et al., 1999). Many scientists have since used different GIS-based methods to map landslide hazards and the spatial distribution of these natural events (Carrara et al., 1995). GIS-based approaches can be divided into two general categories: qualitative and quantitative (Dai and Lee, 2002). Scientists initially applied indirect and direct qualitative methods to recognize and identify landslides, namely indicator-based techniques (Eidsvig et al., 2014) and geomorphological mapping (Lee et al., 2003), respectively. Then, because of human and financial losses, the need to predict landslide occurrences increased (Ohlmacher and Davis, 2003). Updating databases with GIS tools and information related to landslide development (such as geomorphological mapping and monitoring of landslide inventories) (Pradhan and Lee, 2007), together with the preparation of high-precision digital elevation models and other digital information, led to the significant development of quantitative methods (Carrara et al., 1995; Gritzner et al., 2001). GIS, as a powerful tool for understanding the behavior of landslides, was once viewed as conflicting with expert human judgment (Carrara et al., 1999; Guzzetti et al., 1999).

2.12.1.1.2 GIS techniques in landslide hazard assessment and evaluation

Researchers using quantitative landslide modeling approaches have attempted to map spatial landslide susceptibility; these approaches can be divided into two subcategories: physically based and statistically based models (Dai and Lee, 2002). In physically based modeling, researchers have used the mechanical properties of the soil and the hillslope geometry, combined with hydrological factors (e.g., rainfall, subsurface flow), to simulate the safety and stability factor at the watershed scale (Montgomery and Dietrich, 1989, 1994; Wu and Sidle, 1995; Pack and Tarboton, 2001; Borga et al., 2002; Wilkinson et al., 2002; Deb and El-Kadi, 2009). Consideration of the landslide occurrence mechanism is the advantage of this type of modeling (Montgomery and Dietrich, 1994). Its limitations are that it requires detailed information on soil characteristics and hydrological parameters at the watershed scale, which may be too costly to obtain (Dai and Lee, 2002); that the effect of artificial factors (e.g., road construction) on landslide occurrence is not considered; and that most studies have addressed only shallow landslides (Dai and Lee, 2002; Pradhan and Lee, 2007; Pradhan and Kim, 2016). As a result, most researchers have used GIS techniques and statistical methods to assess or evaluate landslide susceptibility, hazard, and risk. Because the same kinds of factors (topography, geology, hydrology, human activity, soil, etc.) are involved, the final landslide hazard maps obtained from these models can be compared, and some studies have evaluated and compared statistical methods against physically based methods (Yilmaz and Keskin, 2009).

2.12.1.1.3 GIS techniques in landslide prediction and monitoring

In different countries, the information needed to assess landslide risk is inadequate in terms of accuracy, stability, and currency because of the inherent complexity of landslide disasters (Carrara et al., 1995), and the zoning and prediction of risk remain unsolved in many respects (Carrara et al., 1999). GIS-based predictive landslide models can be used to decrease the damage of future landslide occurrences (Pradhan and Lee, 2007). Recently, statistical models have been used extensively for landslide hazard mapping (Carrara et al., 1991; Lee et al., 2003). These models are varied (bivariate, multivariate, multiple logistic regression, and their submethods) and, in comparison with physically based models, can consider the effect of multiple factors on landslide occurrence (Carrara et al., 1991; Pradhan and Lee, 2007). In addition, the evaluation (e.g., using ROC) of statistical models against a landslide inventory in the validation step, in order to select the best model, has been widely applied in recent studies (Devkota et al., 2013; Felicísimo et al., 2013; Pourghasemi et al., 2013a,b; Pradhan, 2013; Jaafari et al., 2014; Kouli et al., 2014; Wu et al., 2014; Pham et al., 2015; Tsangaratos and Ilia, 2016; Wang et al., 2016; Colkesen et al., 2016; Hong et al., 2016a,b). So, GIS has an important and undeniable role in the modeling, monitoring, evaluation, and prediction of landslide phenomena (Carrara et al., 1991, 1995, 1999; Gritzner et al., 2001; Dai and Lee, 2002; Lee et al., 2003; Ohlmacher and Davis, 2003; Pradhan and Lee, 2007), especially with multiple logistic regression, statistical methods, and artificial intelligence. In this study, we used three machine-learning algorithms for the spatial modeling of landslide susceptibility.

2.12.2 Materials and Methods

2.12.2.1 The Study Area

The Chahardangeh Watershed is located between longitudes 53°09′00″ and 53°22′30″ E and latitudes 36°16′00″ and 36°24′00″ N, with an area of 180 km² in the Mazandaran Province of Iran. The watershed is mountainous, with an elevation range of 194–1287 m above sea level and a mean elevation of 688.7 m. The average slope of the study area is 16.58 degrees. The average annual rainfall and daily maximum average rainfall are 730 mm and 80 mm, respectively (IWRRC, 2011). The geology of the study area is covered by the Pol-e Sefid Sheet at a scale of 1:100,000, with different rocks and sediments from the Mesozoic to Cenozoic eras. The land use types of the Chahardangeh Watershed were classified as forestland (F1), cultivated area (DF), thin forest (F2), and irrigated farming (IF), with areas of 99.5 km² (55%), 2.65 km² (1.46%), 37.4 km² (20.6%), and 41.1 km² (22.7%), respectively. The location of the study area is shown in Fig. 1.


Fig. 1 Location of study area for LSSM (inset panels A–C; legend shows landslide locations and the DEM, 194.8–1287.1 m).

2.12.2.2 Methodology

The main purpose of this study is to assess the efficiency of three GIS-based machine-learning models, including SVM, RF, and NB, for the spatial modeling of landslide susceptibility in eastern Mazandaran Province, Iran. The flowchart of methodology applied in the Chahardangeh Watershed as a landslide-prone area is given in Fig. 2.

2.12.2.2.1 Landslide inventory mapping

The landslide locations were prepared for the study area using national reports (Iranian Department of Natural Resources and Watershed Management, and the General Directorate of Roads and Urban Development) as well as extensive field monitoring (Fig. 3). For LSM, the spatial distribution of landslides needs to be considered. A total of 146 landslides were detected, and approximately 70% (102) and 30% (44) of the locations were used as training and validation sets, respectively (Pourghasemi et al., 2013a,b; Chen et al., 2015; Hong et al., 2016a,b). Most of the landslides identified in the Chahardangeh Watershed are rotational landslides (Varnes, 1958). Fig. 3 shows some landslides recorded in the field surveys.
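A minimal sketch of this random 70%/30% partition follows; the chapter performed the split in a GIS environment, so scikit-learn is used here only as an illustrative stand-in, and the array names are assumptions.

```python
# Randomly partition the 146 mapped landslide locations into ~70%/~30%
# training and validation subsets (illustrative stand-in for the GIS workflow).
import numpy as np
from sklearn.model_selection import train_test_split

landslide_ids = np.arange(146)                       # 146 mapped landslides
train_ids, valid_ids = train_test_split(
    landslide_ids, test_size=0.30, random_state=42   # ~70% (102) / ~30% (44)
)
print(len(train_ids), len(valid_ids))                # 102, 44
```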

2.12.2.2.2 Landslide conditioning factors (LCFs)

Various factors combine to cause landslides, so it is crucial to identify which factors are involved in LSM (Guzzetti et al., 1999). In this research, 12 conditioning factors, including aspect, altitude, drainage density, distance from faults (DFF), lithology, slope, land use, plan curvature, profile curvature, distance from rivers (DFRI), distance from roads (DFRO), and topographic wetness index (TWI), were used for GIS-based LSSM using machine-learning algorithms (Fig. 5).

Fig. 2 The flowchart of methodology applied to the Chahardangeh Watershed, eastern Mazandaran Province, Iran (the 12 conditioning factors and the landslide inventory map feed a random partition into training (70%, 102) and validation (30%, 44) sets; the SVM, RF, and NB models are then applied, validated, and the best model is selected using the ROC–AUC curve).

2.12.2.2.2.1 Topographic and geomorphological characteristics
The important topographic features of a watershed, namely slope, aspect, and elevation, may determine the inherent characteristics of landslide occurrences (Dai and Lee, 2002). All topographical and geomorphological features (primary and secondary) used in this study (elevation/digital elevation model (DEM), aspect, slope, topographic wetness index, plan curvature, and profile

curvature) were derived from a DEM with a spatial resolution of 10 m × 10 m (Fig. 5A–F) (NCC, 1997). At low altitudes, underdip slopes are usual (Ayalew and Yamagishi, 2005). At middle altitudes there are zones of dip slopes, where the slopes and bedrock are inclined parallel to each other (Ayalew and Yamagishi, 2005). Higher altitudes represent landslide-prone areas, but only within a specific range (Pachauri and Pant, 1992; Ercanoglu and Gokceoglu, 2004). Aspect is directly associated with the bedrock, especially where landslides have occurred; similarly, the failure surface of the hillslope directly reflects the slope plan (Dai and Lee, 2002). Aspect also strongly modulates evapotranspiration and other hydrological processes; its effect is stronger on north-facing slopes, which typically experience less evaporation and more gradual infiltration of moisture into the soil, increasing the driving force of gravity (Ayalew and Yamagishi, 2005; Hong et al., 2016a,b). The slope gradient is the main driver of instability, since it governs the gravitational driving force, shear strength and shear force, friction, subsurface water flow, infiltration, and the orientation of slope material, thereby intensifying mass movements (Deb and El-Kadi, 2009; Evans, 2012). The topographic wetness index was proposed by Beven et al. (1984) and Tarboton (2003) to simulate the watershed response to a precipitation event. The topographic wetness index for a given pixel within the watershed is computed as follows:

$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan \beta}\right) \quad (1)$$

where $A_s$ is the specific catchment area (the upslope contributing area draining through the pixel, per unit contour length, expressed in meters) and $\beta$ is the local slope at the pixel in question, expressed in degrees (Fig. 5D). To study the effect of geomorphic and topographic characteristics on landslides, the hillslope form is described by combining the profile curvature (curvature along the slope) and the plan curvature (curvature in the direction perpendicular to the slope) (Talebi et al., 2007; Evans, 2012). The plan curvature is important because it determines the degree of topographic convergence and the concentration of surface flow (Talebi et al., 2007). The profile curvature is important because it controls the rate at which mass moves downward on the slope (Sidle and Bogaard, 2016).
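To make Eq. (1) concrete, a minimal per-pixel computation might look as follows; the arrays are illustrative stand-ins for the specific catchment area and slope rasters derived from the 10 m DEM, not the chapter's actual data.

```python
# Per-pixel TWI = ln(As / tan(beta)) from Eq. (1); toy 2x2 rasters only.
import numpy as np

specific_catchment_area = np.array([[120.0, 45.0], [300.0, 80.0]])  # As (m)
slope_deg = np.array([[12.0, 30.0], [5.0, 18.0]])                   # beta (deg)

beta_rad = np.deg2rad(slope_deg)            # convert degrees to radians
twi = np.log(specific_catchment_area / np.tan(beta_rad))  # Eq. (1)
print(twi)
```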

2.12.2.2.2.2 Buffer and density parameters
The instability of many landslides originates in connection with fractures, faults, and joints, and with distance from them (Umar et al., 2014; Guo et al., 2015). In general, tectonics play a dominant role and are known as an effective factor in mass movement (Li et al., 2013; Katz et al., 2015). The distance from fault map, extracted from a geological map and reclassified into five classes at 150 m intervals, is shown in Fig. 5G. The proximity of rivers to hillslopes, undercutting of the slope foot by floods, and surface runoff or river flows are processes that cause many landslide occurrences (Pham et al., 2015; Stefanelli et al., 2016); these actions increase damaging shear forces and eventually cause slope instability (Wang et al., 2012). The vicinity of hillslopes to drainage systems is therefore an important landslide factor (Pourghasemi et al., 2014). So, the distance from river map for the Chahardangeh Watershed was prepared according to the topographical map and reclassified into five classes at 100 m intervals (Fig. 5H).

Fig. 3 Some pictures of landslides recorded in the field surveys.

Fig. 4 Destruction of the road network in the Chahardangeh Watershed as a result of landslide occurrence.

Fig. 5 The LCF maps for landslide spatial modeling in the study area: (A) DEM, (B) aspect, (C) slope, (D) topographic wetness index, (E) plan curvature, (F) profile curvature, (G) distance from faults, (H) distance from rivers, (I) drainage density, (J) land use/land cover, (K) distance from roads, and (L) lithology units.

To calculate the drainage density map (Fig. 5I), the total length of drainage lines in each cell of the watershed was divided by the area of the corresponding cell (Montgomery and Dietrich, 1989).

2.12.2.2.2.3 Human factors
Human activities associated with landslides can be divided as follows (Mugagga et al., 2012; Promper et al., 2014): change and destruction of vegetation, such as conversion of forest to pasture, agriculture, and urban development; conversion of forest species; destruction of forest and pasture due to increased livestock grazing; and general interventions in land use and land management. Landslide susceptibility mapping is therefore necessary for land use planning (Leventhal and Kotze, 2008). The land use map was delineated from a Landsat 7/ETM+ (2011) image by supervised classification with the maximum likelihood algorithm and reclassified into four land use types: forestland, cultivated area, thin forest, and irrigated farming (Fig. 5J). The strip lines (scan-line gaps) in the Landsat 7/ETM+ image were filled using Gapfill software, following Mohammady et al. (2014). Undercutting and road construction in mountainous or rural areas destabilize hillslopes and promote slope failure (Das et al., 2012; Hong et al., 2016a,b). Two examples of the destruction of the road network in the Chahardangeh Watershed as a result of landslides are given in Fig. 4, where 330 m of asphalted road between villages were destroyed. The best way to include the effect of road sections in landslide research is to delineate a buffer around them (Ayalew and Yamagishi, 2005). The distance from road map for the Chahardangeh Watershed was prepared according to the General Directorate of Roads and Urban Development and reclassified into five classes at 250 m intervals (Fig. 5K).

2.12.2.2.2.4 Lithology units
The lithology map of the study area was prepared from a geological survey of Iran (GSI, 1997). The lithology units of this area are varied and consist of nine classes (Fig. 5L and Table 1). Based on GSI (1997), 59.4% of the study area consists of rock units with code M2,3M,S,L, comprising marl, calcareous sandstone and siltstone, silty marl, sandy limestone, and mudstone (Table 1).

2.12.2.2.3 Landslide susceptibility spatial modeling (LSSM)

2.12.2.2.3.1 Support vector machine
The SVM model was first introduced by Vapnik (1995), is based on statistical learning theory, and has been applied by many researchers in environmental and natural hazard studies (Shruthi et al., 2014; Tehrany et al., 2015; Tramontana et al., 2015; Yousefi et al., 2015; Rahmati et al., 2016). The method has been used successfully in information categorization (Liu et al., 2016a,b) and extensively for LSM (Marjanovic et al., 2011; Pourghasemi et al., 2013a,b; Pradhan, 2013; Taner San, 2014; Wu et al., 2014; Hong et al., 2015; Colkesen et al., 2016). The purpose of this class of algorithms is to detect and distinguish complex patterns in data, for tasks such as clustering, classification, ranking, and cleaning (Vapnik, 1995). To classify data of high complexity, a kernel function maps the data into a higher-dimensional space, where the transferred training data are separated by an optimal hyperplane (Liu and Li, 2013). The training samples closest to the optimal hyperplane are called support vectors (Marjanovic et al., 2011; Tien Bui et al., 2012; Scardapane et al., 2016). New data can then be classified by solving the decision function (Tien Bui et al., 2012; Pradhan, 2013; Tehrany et al., 2015). The positive and negative categories lie on the plus-plane (+1) and minus-plane (−1), with the weight vector W perpendicular to both planes, so W determines the distance between the two planes (Marjanovic et al., 2011). Consider training samples $(x_i, y_i)$ with $x_i \in \mathbb{R}^n$, $y_i \in \{-1, 1\}$, and $i = 1, \ldots, L$; samples below and above the hyperplane are classified as −1 and +1, respectively.

Table 1 Lithology of the Chahardangeh Watershed (Code: Lithology (Age, Era))

QAL: Recent loose alluvium in the river channels (Quaternary, Cenozoic)
Q2: Young alluvial fans and terraces, river terraces, mainly cultivated (Quaternary, Cenozoic)
PlQCS: Conglomerate, sandstone, siltstone, and silty marl (Pliocene, Cenozoic)
M2,3M,S,L: Marl, calcareous sandstone and siltstone, silty marl, sandy limestone, and mudstone (Miocene, Cenozoic)
MMS: Gray-red marls, thin-medium bedded sandstone, and calcareous sandstone intercalations (Miocene, Cenozoic)
MG: Gypsum, gypsiferous marl, and gray-green marl (Miocene, Cenozoic)
MML: Homogeneous ochre-colored marls (Miocene, Cenozoic)
PEL: Medium-thick bedded cream-gray limestone and sandy limestone, with locally thin marl layers (Late Cretaceous, Mesozoic)
K2M: Gray-blue marls and silty marls (Late Cretaceous, Mesozoic)


In these circumstances, for landslide susceptibility, x is a vector of the input space comprising slope aspect, altitude, drainage density, distance from faults, lithology, slope degree, land use, plan curvature, profile curvature, distance from rivers, distance from roads, and TWI, and the two classes {+1, −1} denote landslide and non-landslide pixels, respectively. The classification function is then given by Eq. (2):

$$y = \mathrm{sign}(f(x)) = \mathrm{sign}\left(\sum_{i=1}^{L} W_i x_i + b\right) = \mathrm{sign}(W \cdot x + b) \quad (2)$$

where y is the class label, W and b are the parameters of the hyperplane, and sign is the sign function (Vapnik, 1995). Solving the resulting optimization problem directly is difficult (Vapnik, 1995). To simplify it, the weight vector of the optimal hyperplane can be expressed as a linear combination of the training samples, $W = \sum_{i=1}^{L} \lambda_i y_i x_i$, $i = 1, \ldots, L$, so that Eq. (2) becomes:

$$f(x) = \mathrm{sign}\left(\sum_{i=1}^{L} \lambda_i y_i (x_i \cdot x) + b\right) \quad (3)$$

where $\lambda_i$ is greater than zero for support vectors and zero otherwise. Therefore, according to Eq. (3), because $\lambda_i = 0$ for any $x_i$ that is not a support vector, the decision boundary is determined by the few training points that are support vectors; the full training set is not needed (Vapnik, 1995; Marjanovic et al., 2011). After finding W, the quantity b is determined from Eq. (4) for each support vector, and the final b is obtained as the mean over all of them (Vapnik, 1995):

$$f(x; W, b) = \mathrm{sign}(W \cdot x + b) \quad (4)$$

In this research, the radial basis function kernel (SVM-RBF) was used for LSSM (Eq. 5) (Zare et al., 2013). Because of the widespread use of the RBF kernel, the final classification can be achieved using Eq. (6) (Marjanovic et al., 2011; Liu and Li, 2013; Pourghasemi et al., 2013a,b; Pradhan, 2013; Taner San, 2014; Hong et al., 2015; Tehrany et al., 2015):

$$f(x; \lambda_1, \ldots, \lambda_L) = \mathrm{sign}\left(\sum_{i=1}^{L} \lambda_i y_i k(x_i, x) + b\right) \quad (5)$$

$$K(x, y) = \exp\left(-\gamma \|x - y\|^2\right) \quad (6)$$

where $\gamma$ is the kernel width, which must be set in the SVM classification.
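As an illustration of Eqs. (5)–(6), the following is a minimal sketch of fitting an RBF-kernel SVM with scikit-learn; the arrays, gamma, and C values are illustrative assumptions, not the chapter's actual data or tuned settings.

```python
# RBF-kernel SVM for a binary landslide / non-landslide problem; X rows are
# pixels, columns are the 12 conditioning factors (toy random data here).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.random((102, 12))          # stand-in for 102 training locations
y_train = rng.choice([-1, 1], size=102)  # stand-in labels (-1 / +1)

# gamma is the kernel width of Eq. (6); C trades margin width against errors.
model = SVC(kernel="rbf", gamma=0.5, C=1.0, probability=True)
model.fit(X_train, y_train)

# Susceptibility for new pixels = probability of the landslide class (+1).
X_new = rng.random((5, 12))
susceptibility = model.predict_proba(X_new)[:, list(model.classes_).index(1)]
print(susceptibility)
```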

2.12.2.2.3.2 Random forest
Random forest, invented and developed by Breiman (2001), is one of the most efficient techniques for estimating target variables in regression problems or for pattern classification. A decision forest divides the input space into a set of separate regions and assigns a response value to each region (Trigila et al., 2015). In the simplest case, this response is the average of the target values of the patterns falling in each region (Breiman, 2001; Liaw and Wiener, 2002). When constructing each tree, the RF algorithm draws a bootstrap sample of the training patterns, selecting with replacement (Stumpf and Kerle, 2011); the size of this sample equals the total number of available patterns (Breiman, 2001). This sampling scheme leaves roughly one-third of the patterns out of bag (OOB) for each tree (Breiman, 2001; Rahmati et al., 2016). Each tree is grown on its bootstrap sample up to a predetermined maximum depth, where the depth is determined by the minimum number of patterns allowed in each terminal node (Breiman, 2001). At each node during tree growth, a subset of the features is selected at random, and the best split among the selected features creates the next new nodes (Breiman, 2001; Belgiu and Drăguţ, 2016). Two regulatory parameters thus arise: MinD, the minimum number of patterns in each terminal node, and PercD, the percentage of features randomly chosen to determine each split (Breiman, 2001; Liaw and Wiener, 2002). The OOB samples allow the generalizability of each tree to be tested during tree generation and are used to tune MinD and PercD (Breiman, 2001; Liaw and Wiener, 2002). RF, like SVM, is widely used in environmental and natural hazard studies, including groundwater, landslide, flood, and gully erosion modeling (Stumpf and Kerle, 2011; Chen et al., 2014; Shruthi et al., 2014; Naghibi and Pourghasemi, 2015; Trigila et al., 2015; Youssef et al., 2015; Hong et al., 2016a,b; Naghibi et al., 2016; Pourghasemi and Kerle, 2016; Rahmati et al., 2016; Zabihi et al., 2016), and also in RS classification (Waske et al., 2012; Belgiu and Drăguţ, 2016; Wang et al., 2016).

2.12.2.2.3.3 Naive Bayes
NB is one of the most efficient and effective inductive learning algorithms for machine learning and data mining (Zhang, 2004; Larsen, 2005). In probability theory and statistics, Bayes' theorem characterizes the probability of an event based on conditions that might be related to it (Collins, 2012). Because of its simplicity in classification, the Naive Bayes classification (NBC) method is applied when the independent variables are numerous, and it offers a useful bias-variance trade-off in probability estimation: NB classification tends to produce biased estimates of the class probabilities, but it preserves a notable separation around these estimates (Zhang, 2004; Larsen, 2005). NB is comparable to the SVM and RF procedures and is


currently widely used for modeling natural hazards and in classification applications (Fernandes et al., 2010; Ballabio and Sterlacchini, 2012; Das et al., 2012; Pourghasemi et al., 2012a,b; Song et al., 2012; De Risi et al., 2015; Mao et al., 2015; Pham et al., 2015; Silva et al., 2015; Bozzeda et al., 2016; Liu et al., 2016a,b; Motevalli and Vafakhah, 2016; Pham et al., 2016; Tsangaratos and Ilia, 2016; Zhang et al., 2016). An advantage of the Bayesian approach is that it requires only a small amount of training data to estimate the parameters needed for classification (Bhargavi and Jyothi, 2009; Pham et al., 2016). The principal aim of NB is to characterize the probability of an event based on observed cases (Pham et al., 2016). In this study, 12 conditioning factors were used to evaluate the performance of the NB model. Suppose that $a = (a_1, a_2, \ldots, a_{12})$ is the vector of the 12 landslide conditioning factors (LCFs) and $z = (z_1, z_2)$ is the vector of the classifier variables (landslide, non-landslide). The NB classifier is defined as follows (Zhang, 2004; Pham et al., 2015; Zhang et al., 2016):

$$Y_{NB} = \underset{z_i \in \{\text{landslide, non-landslide}\}}{\arg\max} \; P(z_i) \prod_{j=1}^{12} P(a_j \mid z_i) \quad (7)$$

where $P(z_i)$ is the prior probability of $z_i$, estimated as the proportion of observations of class $z_i$ in the training dataset, and $P(a_j \mid z_i)$ is the conditional probability, computed from Eq. (8) (Zhang, 2004; Pham et al., 2015; Zhang et al., 2016):

$$P(a_j \mid z_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(a_j - m)^2}{2\sigma^2}} \quad (8)$$

where m is the mean and $\sigma$ is the standard deviation of $a_j$.
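The Gaussian class-conditional rule of Eqs. (7)–(8) can be sketched directly; the priors, means, and standard deviations below are illustrative stand-ins, not values estimated from the chapter's training data.

```python
# NB decision rule per Eqs. (7)-(8): pick the class maximizing
# prior * product of per-factor Gaussian densities. Toy values only.
import numpy as np

def gaussian_pdf(a, mean, std):
    # Eq. (8): normal density with per-factor mean and standard deviation.
    return np.exp(-((a - mean) ** 2) / (2 * std ** 2)) / (np.sqrt(2 * np.pi) * std)

def nb_predict(x, priors, means, stds):
    # Eq. (7): argmax over classes of P(z) * prod_j P(a_j | z).
    scores = {z: priors[z] * np.prod(gaussian_pdf(x, means[z], stds[z]))
              for z in priors}
    return max(scores, key=scores.get)

priors = {"landslide": 0.3, "non-landslide": 0.7}
means = {"landslide": np.full(12, 0.6), "non-landslide": np.full(12, 0.4)}
stds = {"landslide": np.full(12, 0.2), "non-landslide": np.full(12, 0.2)}
print(nb_predict(np.full(12, 0.55), priors, means, stds))
```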

2.12.2.2.4 Assessment of the importance of variables using the LVQ algorithm

LVQ is a neural network method that combines competitive learning with supervised classification (Sato and Yamada, 1996; Kohonen, 1998; Hollm et al., 2000; Valdivia et al., 2007). This class of algorithms has an open-box nature, providing a thorough understanding of the information learned by the classifier (Nova and Estévez, 2013; Ortiz-Bayliss et al., 2013; Hammer et al., 2014; de Vries et al., 2015; Nebel et al., 2015; Kumar et al., 2016). Owing to its transparent learning procedure and ease of implementation, the LVQ method can be used for pattern classification and applied to determine uncertainty (Naghibi et al., 2016; Rahmati et al., 2016). In classification, a data point $z_i$ is assigned to the class of the nearest codebook vector (Kohonen, 1998; Nova and Estévez, 2013). The training algorithm performs an iterative gradient update of the winner unit, $w_u$, defined by (Kohonen, 1998; Hollm et al., 2000):

$$u = \arg\min_k \|z_i - w_k\| \quad (9)$$

where k is the codebook index (Kohonen, 1998). The direction of the gradient update depends on the correctness of the classification, using a nearest-neighbor rule in Euclidean space (Kohonen, 1998; Hammer et al., 2014). If a data sample is correctly classified, the nearest codebook vector is attracted toward it; if the sample is misclassified, it has a repulsive effect on the codebook vector (Sato and Yamada, 1996; Hollm et al., 2000). The update equation for the winner unit $w_u$, the codebook vector nearest (by the nearest-neighbor rule) to the data sample x(t), is (Kohonen, 1998):

$$w_u(t+1) = w_u(t) \pm \beta(t)\,[x(t) - w_u(t)] \quad (10)$$

where the sign is positive if the data sample is correctly classified and negative if it is misclassified (Kohonen, 1998; Nebel et al., 2015). The learning rate $\beta(t) \in [0, 1]$ must decrease monotonically (Hollm et al., 2000). The procedure is repeated, picking different data samples from the training set, until convergence (Kohonen, 1998). In this research, the 12 conditioning factors of landslide occurrence, namely aspect, altitude, drainage density, distance from faults, lithology, slope, land use, plan curvature, profile curvature, distance from rivers, distance from roads, and TWI, were used as inputs, and their importance was prioritized by the LVQ algorithm.
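To make Eqs. (9)–(10) concrete, here is a minimal numpy sketch of a single LVQ1 update step; the codebook vectors, labels, and learning rate are illustrative assumptions, not values from the chapter.

```python
# One LVQ1 training step per Eqs. (9)-(10): find the winner codebook vector
# and move it toward (correct label) or away from (incorrect label) the sample.
import numpy as np

def lvq1_step(codebooks, codebook_labels, x, y, beta):
    # Winner unit: u = argmin_k ||x - w_k||  (Eq. 9)
    u = int(np.argmin(np.linalg.norm(codebooks - x, axis=1)))
    # Attract the winner if its label matches y, repel it otherwise (Eq. 10).
    sign = 1.0 if codebook_labels[u] == y else -1.0
    codebooks[u] = codebooks[u] + sign * beta * (x - codebooks[u])
    return codebooks

# Toy usage: two codebook vectors over the 12 conditioning factors.
rng = np.random.default_rng(2)
codebooks = rng.random((2, 12))
labels = np.array([0, 1])                  # 0 = non-landslide, 1 = landslide
codebooks = lvq1_step(codebooks, labels, rng.random(12), 1, beta=0.05)
```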

2.12.2.2.5 Accuracy assessment of LSSMs

To evaluate the three models used for the spatial modeling of landslide susceptibility, the data set was divided randomly into two parts using the Hawth's Tools extension in ArcGIS 9.3. The first part, comprising 70% of the landslide occurrences (102 locations), was used as the training data set, and the remaining part (30%, or 44 locations) was devoted to validation. The ROC curve is a useful method for illustrating the quality of deterministic and probabilistic models (Swets, 1988). One purpose of ROC curve analysis is to determine a cutoff value (Pourghasemi et al., 2014; Pham et al., 2015; Youssef et al., 2015; Hong et al., 2016a,b; Motevalli and Vafakhah, 2016; Pourghasemi and Kerle, 2016). The ROC curve is a graph of sensitivity (y-axis) versus 1 − specificity (x-axis) (Swets, 1988). Once a prediction threshold has been adopted, the binary predictions can be compared with the validation data, allowing the construction of a confusion matrix (Table 2), which shows the numbers of correctly and incorrectly predicted observations for both positive and negative cases (Beguería, 2006). From Table 2, sensitivity = a/(a + c), specificity = d/(b + d), false positive rate = b/(b + d), false negative rate = c/(a + c), and the likelihood ratio = sensitivity/(1 − specificity) (Beguería, 2006).


Table 2 Confusion matrix

                 Observed
Predicted        X1 (unsafe)   X0 (safe)
X1′ (unsafe)     a             b
X0′ (safe)       c             d

a, true positives; b, false positives; c, false negatives; d, true negatives (Beguería, 2006). X1′, unsafe prediction; X0′, safe prediction; X1, unsafe observation; X0, safe observation.

The AUC of the ROC curve summarizes the performance of a prediction model by quantifying the model's ability to predict the correct occurrence, or nonoccurrence, of a predetermined event. The qualitative correlation between AUC and prediction accuracy can be categorized as follows (Swets, 1988): 0.9–1 (perfect), 0.8–0.9 (very good), 0.7–0.8 (good), 0.6–0.7 (average), and 0.5–0.6 (poor).
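A minimal sketch of this AUC computation with scikit-learn follows; the labels and scores are illustrative stand-ins for the predicted susceptibility values at validation landslide and non-landslide points.

```python
# ROC curve (sensitivity vs. 1-specificity) and AUC from predicted scores.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0])         # illustrative labels
scores = np.array([0.9, 0.7, 0.4, 0.8, 0.2, 0.5, 0.6, 0.3])

auc = roc_auc_score(y_true, scores)
fpr, tpr, thresholds = roc_curve(y_true, scores)    # x = 1-specificity, y = sensitivity
print(f"AUC = {auc:.3f}")                           # 0.9-1 perfect ... 0.5-0.6 poor
```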

2.12.3 Results
2.12.3.1 Investigation of Multicollinearity

Tolerance and the variance inflation factor (VIF) are two essential indices for detecting multicollinearity among independent variables. Tolerance values less than 0.20 or 0.10, and/or VIF values of 5 or 10 and above, indicate multicollinearity problems (O'Brien, 2007; Hong et al., 2016a,b). The results of the multicollinearity test are shown in Fig. 6. All of the criteria were satisfied, so there was no limitation in terms of multicollinearity among the conditioning factors for landslide occurrence. Aspect and the topographic wetness index had the highest (0.916) and lowest (0.442) tolerance values, respectively. In addition, minimum and maximum VIF values of 1.091 and 2.264 indicated that there was no multicollinearity among the 12 conditioning factors applied for LSM within the study area.
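A minimal sketch of this tolerance/VIF screening, assuming X is a matrix of the 12 conditioning-factor values sampled at the training points (random stand-in data here):

```python
# Tolerance = 1/VIF for each conditioning factor; flag VIF >= 5-10
# (tolerance <= 0.1-0.2) as a multicollinearity problem.
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
X = rng.random((200, 12))  # stand-in for the 12-factor sample matrix

vif = np.array([variance_inflation_factor(X, j) for j in range(X.shape[1])])
tolerance = 1.0 / vif
print(vif, tolerance)
```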

2.12.3.2 Application of EBF

EBF was applied to evaluate and assess the spatial correlation between landslides and the different classes of the conditioning factors; the resulting EBF values are shown in Fig. 7. For altitude, an inverse relationship was detected between belief values and altitude: the < 426.06 m and > 884.97 m classes had the maximum (0.631) and minimum (0.023) belief values, respectively, showing that the first altitude class had the highest probability of landslide occurrence. For aspect, flat areas had the lowest belief, and the highest belief was assigned to south-facing slopes. The maximum (0.458) and minimum (0.000) EBF values were estimated for the 5–15 degree and > 30 degree slope classes, respectively. For the topographic wetness index, the 7.94–12.3 class had the highest belief value. For plan curvature and profile curvature, the highest values corresponded to the flat class. There was a direct relationship between drainage density and belief values, with the lowest density class having the lowest belief and the highest class the highest. For DFRI, the minimum and maximum belief values were 0.141 and 0.247, respectively. Based on the obtained results, there were direct and inverse relationships between DFF and DFRO, respectively, and the belief values.
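The chapter does not reproduce the EBF equations, so the following is only a loose, illustrative sketch of the belief (Bel) component as a normalized class-wise landslide density, one common way such belief values are tabulated; the counts are invented.

```python
# Simplified, illustrative belief values for the classes of one conditioning
# factor: conditional landslide density per class, normalized over classes.
# This is NOT the chapter's exact EBF formulation; treat as a sketch only.
import numpy as np

n_pix_class = np.array([5000, 8000, 6000, 4000])   # pixels per factor class
n_slide_class = np.array([10, 60, 40, 5])          # landslide pixels per class

density = n_slide_class / n_pix_class              # conditional landslide density
belief = density / density.sum()                   # normalize so beliefs sum to 1
print(belief)
```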

Fig. 6 Tolerance and VIF values of the conditioning factors in the multicollinearity test (maximum VIF 2.264; minimum tolerance 0.442).

For lithology, the highest belief value (0.313) was related to the Q2 class, and the lowest (0.000) to the MG class. For the land use factor, the belief values were 0.116, 0.153, 0.165, and 0.566 for the F1, DF, F2, and IF land use types, respectively. According to these results, the relationships obtained between each conditioning factor and the belief values can aid understanding of landslide processes and their mechanisms.

2.12.3.3 The Importance of LCFs Using LVQ Algorithm

In this research, the LVQ method was implemented to specify the importance of the LCFs; the results are presented in Fig. 8. Based on our findings, the maximum and minimum LVQ values were 0.825 and 0.465, corresponding to the altitude and profile curvature factors, respectively. In other words, altitude and profile curvature have the highest and lowest effects on landslide susceptibility. DFRO, lithology, land use, and DFF were assigned ranks 2 through 5, respectively. The LVQ values of aspect, slope, DFRI, drainage density, plan curvature, and topographic wetness index were 0.542, 0.535, 0.526, 0.496, 0.493, and 0.492, respectively (Fig. 8).

Fig. 7 EBF belief values for the classes of the landslide conditioning factors: (A) altitude, (B) aspect, (C) slope, (D) topographic wetness index, with further panels for distance from road, distance from fault, lithology, and the remaining factors.