Spatiotemporal Analytics 1032303050, 9781032303055


English Pages 266 [267] Year 2023


Table of contents :
Cover
Half Title
Title Page
Copyright Page
Table of Contents
Editor
Contributors
Chapter 1 Introduction to Spatiotemporal Analytics
1.1 From Spatial Analytics to Spatiotemporal Analytics
1.2 Spatial Dependency and Spatiotemporal Dependency Among Geographic Events or Objects
1.3 Space–Time Dependency
1.4 Concluding Remarks
References
Chapter 2 Spatiotemporal Centrography and Dispersion
2.1 Introduction
2.2 Review of Relevant Literature
2.3 Analytical Methods
2.3.1 Centrography of Spatiotemporal Points
2.3.1.1 Spatiotemporal Mean Center
2.3.1.2 Weighted Spatiotemporal Mean Center
2.3.1.3 Changes in Spatiotemporal Mean Centers
2.3.2 Dispersion of Spatiotemporal Points
2.3.2.1 Standard Spatiotemporal Distance
2.3.2.2 Standard Spherical Volume
2.4 Application Example
2.5 Software and Usage
2.5.1 Hardware/Software Requirements
2.5.2 Software Usage for Spatiotemporal Mean Centers
2.5.3 Software Usage for Standard Spatiotemporal Distance
2.6 Concluding Remarks
References
Chapter 3 Spatiotemporal Quadrat Analytics
3.1 Introduction
3.2 Review of Relevant Literature
3.3 Analytical Methods
3.4 Application Example
3.5 Software and Usage
3.5.1 Hardware/Software Requirements
3.5.2 Software Usage for Spatiotemporal Quadrat Analysis
3.6 Concluding Remarks
References
Chapter 4 Spatiotemporal Nearest Neighbor Analytics
4.1 Introduction
4.2 Nearest Neighbor Index
4.3 Spatiotemporal Nearest Neighbor Index
4.3.1 The Time Dimension
4.3.2 Space–Time Nearest Neighbor Index
4.3.3 STNNI Application
4.3.4 Some Final Remarks
4.4 Software and Usage
4.4.1 Installation and Uninstallation
4.4.1.1 Install QGIS and NNI Plugin
4.4.1.2 Uninstall
4.4.2 Run STNNI and NNI Scripts
4.4.2.1 Space–Time Nearest Neighborhood Index
4.4.2.2 Spatial Nearest Neighborhood Index
4.5 Concluding Remarks
References
Appendix
Chapter 5 Spatiotemporal Ripley's K and L Functions
5.1 Introduction
5.2 Concept and Methods
5.2.1 Spatial Ripley's K Function
5.2.2 Spatiotemporal Ripley's K Function
5.3 An Example Application
References
Chapter 6 Spatiotemporal Autocorrelation Analytics
6.1 Introduction
6.2 Methodology
6.2.1 Spatial Autocorrelation Moran's I
6.2.2 Temporal Autocorrelation
6.2.2.1 Global Temporal Moran's I_t
6.2.2.2 Localized Temporal Moran's I
6.2.3 Spatiotemporal Autocorrelation (Temporal and Spatial Moran's I)
6.2.3.1 Global Spatiotemporal Moran's Index
6.2.3.2 Localized Spatiotemporal Moran's Index
6.3 Example Application
6.3.1 Disease Patterns
6.3.2 Simulation Experiments
6.3.2.1 Monte Carlo Simulation Process
6.3.2.2 Sensitivity and Temporal and Spatial Trend Analysis
6.4 Software and User Manual
6.4.1 Moran's I Tool User Manual
6.4.2 Demonstration of Software Results
6.4.3 Supplementary Explanation
References
Chapter 7 Spatiotemporal G Statistical Analytics
7.1 Introduction
7.2 The Getis–Ord G_i and G_i* Statistics
7.2.1 Space–Time Weight Matrix
7.2.2 Space–Time G_i and G_i*
7.3 Space–Time Crime Pattern in Chicago
7.3.1 Software and Usage
7.3.2 Hardware/Software Requirements
7.3.3 Software Usage for ST G_i and G_i* Analysis
7.4 Concluding Remarks
References
Chapter 8 Spatiotemporal Kernel Density Estimation
8.1 Introduction
8.2 Methods
8.2.1 Classic Spatiotemporal Kernel Density Estimation (CL_STKDE)
8.2.2 Conditional Spatiotemporal Kernel Density Estimation (CN_STKDE)
8.2.3 Integrative Spatiotemporal Kernel Density Estimation (IN_STKDE)
8.2.4 Validation Measurement
8.2.4.1 Hit Rate
8.2.4.2 Compactness Index
8.3 Example Application
8.4 Software and User Manual
References
Chapter 9 Spatiotemporally Weighted Regression
9.1 Introduction
9.2 Methodology
9.2.1 OLS Model
9.2.2 GWR Model
9.2.3 GTWR Model
9.3 Application Examples
9.3.1 House Price Estimation
9.3.2 Environmental Pollution Monitoring
9.3.3 Transportation Management
9.3.4 Crime Analysis–Based Urban Planning
9.4 Software and Usage
9.4.1 Installation and Uninstallation
9.4.1.1 How to Install GTWR Add-in
9.4.1.2 Uninstall
9.4.2 Run GTWR
9.4.2.1 Data Input
9.4.2.2 Setting
9.4.2.3 Output
9.4.2.4 Error
9.4.3 Some Notes
9.4.3.1 Data Requirements
9.4.3.2 Model Test
9.4.3.3 Spatiotemporal Distance
9.5 Concluding Remarks
References
Chapter 10 Spatiotemporal Bayesian Regression
10.1 Introduction to Bayesian Inference
10.1.1 Disease Mapping
10.1.2 Adding a Temporal Component
10.1.3 Parametric Time Trend
10.1.4 Exceedance Probabilities and Hotspot Identification
10.2 Example Applications
10.2.1 Example 1: Modeling Drug Overdose Incident
10.2.1.1 Defining Spatial Adjacency
10.2.1.2 Mapping the Relative Risk
10.2.1.3 Spatial Risk
10.2.1.4 Spatiotemporal Trend and Exceedance Probabilities
10.2.2 Example Application 2: Predictive Distribution of Spatiotemporal Bayesian Model
10.2.2.1 Predictive Distribution of Spatiotemporal Bayesian Model
10.2.2.2 Parameter Estimation via MCMC
10.2.2.3 Application Example 2
10.3 Concluding Remarks
References
Chapter 11 Spatiotemporal Process Analytics and Simulations
11.1 Introduction to Space–Time Network Simulations
11.2 Network Complexity
11.3 Classifying Network Diffusion Processes
11.4 Spatiotemporal Simulation with Agent-Based Modeling (ABM)
11.5 Application Example
11.5.1 Modeling the Spatiotemporal Network of a Dengue Fever Outbreak
11.6 Concluding Remarks
References
Chapter 12 Spatiotemporal Analytical Unit Problems
12.1 Introduction
12.2 Review of Relevant Literature
12.3 Analytical Methods
12.3.1 Modifiable Areal-Temporal Unit Problem (MATUP)
12.3.1.1 Spatiotemporal Scale
12.3.1.2 Spatiotemporal Divisions
12.3.1.3 Spatiotemporal Boundaries
12.3.2 Research Method
12.4 Application Example
12.4.1 Data
12.4.2 The Scale Effect of Space–Time Unit
12.4.3 Effects by Different Division Schemes
12.4.4 Effects of the Spatiotemporal Boundary
12.5 Software and Usage
12.6 Concluding Remarks
References
Index


Spatiotemporal Analytics

This book introduces readers to spatiotemporal analytics that are extended from spatial statistics. Spatiotemporal analytics help analysts to quantitatively recognize and evaluate the spatial patterns, and the temporal trends of those patterns, in a set of geographic events or objects. Spatiotemporal analyses are very important in geography, environmental sciences, economics, and many other domains. Spatiotemporal Analytics explains in very simple terms the concepts of spatiotemporal data and statistics, and the theories and methods used. Each chapter introduces a case study as an example application for an in-depth learning process. The software used and the code provided enable readers not only to learn the analytics but also to use them effectively in their projects.

• Provides a comprehensive understanding of spatiotemporal analytics to readers with minimum knowledge in statistics.
• Written in simple, understandable language with step-by-step instructions.
• Includes numerous examples for all theories and methods explained in the book, covering a wide range of applications from different disciplines.
• Each application includes the software code needed to follow the instructions.
• Each chapter also has a set of prepared PowerPoint slides to help instructors of a course on spatiotemporal analytics with the content explained.

Undergraduate and graduate students who use Geographic Information Systems or study Geographical Information Science will find this book useful. The subject matter also pertains to an array of disciplines such as agriculture, anthropology, archaeology, architecture, biology, business administration and management, civic engineering, criminal justice, epidemiology, geography, geology, marketing, political science, and public health.

Spatiotemporal Analytics

Jay Lee

Designed cover image: Image from Shutterstock (Image ID: 1115087306)

First edition published 2023
by CRC Press, 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press, 4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

CRC Press is an imprint of Taylor & Francis Group, LLC

© 2023 selection and editorial matter, Jay Lee; individual chapters, the contributors

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact [email protected]

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.

ISBN: 978-1-032-303055 (hbk)
ISBN: 978-1-032-303062 (pbk)
ISBN: 978-1-003-304395 (ebk)

DOI: 10.1201/9781003304395

Typeset in Times by codeMantra

Access the Support Material: www.routledge.com/9781032303055

Contents
Editor
Contributors
Chapter 1 Introduction to Spatiotemporal Analytics
    Jay Lee
Chapter 2 Spatiotemporal Centrography and Dispersion
    Langxue Dang, Jay Lee, and Huiyu Lin
Chapter 3 Spatiotemporal Quadrat Analytics
    Zhuo Chen
Chapter 4 Spatiotemporal Nearest Neighbor Analytics
    Qingsong Liu and Jay Lee
Chapter 5 Spatiotemporal Ripley’s K and L Functions
    Jay Lee
Chapter 6 Spatiotemporal Autocorrelation Analytics
    Shengwen Li, Xuyang Cheng, Bo Wan, Junfang Gong, and Jay Lee
Chapter 7 Spatiotemporal G Statistical Analytics
    Huiyu Lin and Zhuo Chen
Chapter 8 Spatiotemporal Kernel Density Estimation
    Junfang Gong, Zhuang Zeng, Bo Wan, Shengwen Li, and Jay Lee
Chapter 9 Spatiotemporally Weighted Regression
    Bo Huang and Sensen Wu
Chapter 10 Spatiotemporal Bayesian Regression
    Ortis Yankey, Tao Hu, Han Yue, Peixiao Wang, and Xiao Xu
Chapter 11 Spatiotemporal Process Analytics and Simulations
    Moira O’Neill and Jay Lee
Chapter 12 Spatiotemporal Analytical Unit Problems
    Langxue Dang, Huiyu Lin, and Jay Lee
Index

Editor Dr. Jay Lee received his PhD in geography from the University of Western Ontario in 1989. Since then, he has taught GIS and quantitative methods in geography at Kent State University. Dr. Lee’s research interests stem from integrating operations research and spatial analysis. He coauthored two editions of Statistical Analysis with GIS. Since 2017, Dr. Lee has worked on extending spatial analytics to spatiotemporal analytics. This book represents the research outcomes from the Applied Geography Laboratory at Kent State University from 2017 to 2022.


Contributors

Zhuo Chen, Department of Geography, Kent State University, Kent, Ohio

Xuyang Cheng, National Engineering Research Center for Geographic Information Systems, China University of Geosciences, Wuhan, China

Langxue Dang, School of Computer and Information Engineering, Henan University, Henan, China

Junfang Gong, School of Geography and Information Engineering, China University of Geosciences, Wuhan, China

Tao Hu, Department of Geography, Oklahoma State University, Stillwater, Oklahoma

Bo Huang, Department of Geography and Resource Management, The Chinese University of Hong Kong, Hong Kong, China

Jay Lee, Department of Geography, Kent State University, Kent, Ohio

Shengwen Li, School of Computer Science, China University of Geosciences, Wuhan, China

Huiyu Lin, Department of Geography, Kent State University, Kent, Ohio

Qingsong Liu, Shenzhen e-Traffic Technology, Shenzhen, China

Moira O’Neill, Department of Geography, Kent State University, Kent, Ohio

Bo Wan, School of Computer Science, China University of Geosciences, Wuhan, China

Peixiao Wang, Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, China

Sensen Wu, Department of Geography and Resource Management, The Chinese University of Hong Kong, Hong Kong, China, and School of Earth Sciences, Zhejiang University, Zhejiang, China

Xiao Xu, Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan, China

Ortis Yankey, Department of Geography and Environmental Science, University of Southampton, Southampton, United Kingdom

Han Yue, Center for GeoInformatics for Public Security, School of Geography and Remote Sensing, Guangzhou University, Guangzhou, China

Zhuang Zeng, School of Computer Science, China University of Geosciences, Wuhan, China

Chapter 1 Introduction to Spatiotemporal Analytics

Jay Lee, Kent State University

CONTENTS
1.1 From Spatial Analytics to Spatiotemporal Analytics
1.2 Spatial Dependency and Spatiotemporal Dependency among Geographic Events or Objects
1.3 Space–Time Dependency
1.4 Concluding Remarks
References

DOI: 10.1201/9781003304395-11

Over the last decade or so, several spatiotemporal analytics have been proposed for analyzing the spatial processes of a set of geographic objects or events. These analytics not only enable users to observe and analyze the different spatial patterns of events or objects at different times, but they also provide ways to quantitatively assess how the time-specific spatial patterns evolve over time into a spatial process. Figure 1.1 shows how a series of time-observed spatial patterns can be linked to form a spatial process.

FIGURE 1.1 From spatial patterns to a spatial process.

In many ways, understanding spatial patterns is not the ultimate goal of a scientific inquiry into data that have both spatial and temporal elements. We are often interested not only in finding the spatial patterns of a certain phenomenon but also in how such patterns change over time. When linking a sequence of time-specific spatial patterns of events or objects into a spatial process, conventional approaches often burden viewers of the sequence of displays (or maps) with the task of imagining how those displays relate to each other and what that relationship may suggest. This is a weakness of current geographical information systems (GIS): users are limited to only a handful of analytics when analyzing spatiotemporal processes.

For example, Figure 1.2 shows sequences of maps of reported cases of dengue fever in Kaohsiung City, Taiwan, from 2003 to 2007. Whether the cases are reported as raw counts or against population densities, the space–time interaction embedded in these trends is extremely difficult to quantify. Yet this task is often left to readers of scientific reports, who must ‘imagine’ or make an ‘educated guess’.

FIGURE 1.2 Reported dengue fever cases in Kaohsiung City, Taiwan, 2003–2007.

Many studies have proposed ways to quantify spatiotemporal interactions in order to understand the dynamics of changing spatial patterns as a spatial process (for example, Diggle et al. 1995; Wilesmith et al. 2003; Sanchez et al. 2005; Picado et al. 2007, among many other studies with similar objectives). Such quantification efforts, however, often stopped short of fully accounting for the interaction between space and time of the geographic objects or events being studied.

To fill this gap, this book discusses spatiotemporal analytics over the next 11 chapters. Each of these chapters explains the background, concepts, theories, and methods of a particular spatiotemporal analytic. Some of these analytics are technically simpler than others but may be easier to use, and their results easier to interpret. Others are more technically sophisticated but provide users with more options for performing analytical tasks. It must be noted, however, that these analytics share a common issue: they are not readily available to users, because current commercially available GIS software packages offer few ready-to-use tools for them. It is for this reason that this book accompanies its discussions of spatiotemporal analytics with ready-to-use computer code/programs.

Each chapter in this book also includes at least one application example. In the chapters, the authors provide detailed instructions for using the accompanying test dataset and software tools to arrive at the presented analytic results. These computer code/programs were developed by the chapter authors to help readers better understand and experiment with the spatiotemporal analytics discussed in this book. It should be noted that they were taken directly from previous studies carried out by the chapter authors, so they may not have the friendliest user interfaces, and they were developed in different programming languages. Hence, we hope that the step-by-step instructions for using the software will alleviate most potential issues or difficulties. Another note on the computer code/programs is that we do not have any phone lines or designated email addresses for software support, but it is always possible to communicate with the authors to discuss potential applications for future uses.

1.1 FROM SPATIAL ANALYTICS TO SPATIOTEMPORAL ANALYTICS

Geographic events or objects may be defined by their locations and by attributes that describe their characteristics. They can be represented in a vector data structure as points, lines, or polygons, or in a raster data structure as rasters (or pixels), also known as a field-based GIS data structure. In a raster data structure, an event or an object may be represented by a raster or a group of spatially adjacent rasters. An attribute table may be constructed to describe some characteristics of the rasters. In a vector data structure, geographic objects are defined by their coordinates and the information describing their attributes.

If an event or an object takes effectively no space in the world at the scale of the map (such as a city on a small-scale map, a power pole on a large-scale map, a crime event, a reported disease case, and so on), it can be represented as a point. If the event or object has a linear form (such as the centerline of a street segment, a segment of a river, a geological fault line, a tornado touchdown path, and so on), it can be represented as a line segment. Events or objects that do occupy space in the world (such as a building, a farm, a city, a county, a state, etc.) are usually represented by polygons.

Locations of the events or objects may be defined by a pair or a sequence of geographic coordinates. For example, a pair of longitude and latitude coordinates, or coordinates in a user-defined coordinate system, is conventionally denoted as (x, y). Of course, the actual numeric expression of the location coordinates is determined by the map projection of the coordinate system. For the purpose of the spatiotemporal analytics discussed in this book, a third coordinate can be added to reflect the time when an event or an object occurs, giving (x, y, t), with t denoting the time.
For ease of illustration and discussion, from here onward, events will be used to include objects in this book.
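As a small illustration of the (x, y, t) notation above, the following Python sketch stores a point event with a timestamp and converts it to a numeric triple. The `Event` class, the `as_xyt` helper, the choice of days as the time unit, and the sample coordinates are all hypothetical and are not taken from the book's software.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """A point event located in space and time (hypothetical structure)."""
    x: float      # easting or longitude, in the units of the coordinate system
    y: float      # northing or latitude
    t: datetime   # time at which the event occurred


def as_xyt(event, origin):
    """Return (x, y, t) with t expressed as days elapsed since `origin`."""
    days = (event.t - origin).total_seconds() / 86400.0
    return (event.x, event.y, days)


# Made-up example: one reported case, ten days after the start of the study.
origin = datetime(2003, 1, 1)
case = Event(x=120.30, y=22.62, t=datetime(2003, 1, 11))
xyt = as_xyt(case, origin)
```

Expressing t as a number relative to a common origin is one convenient choice, because temporal distances between events can then be computed by simple subtraction.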

1.2 SPATIAL DEPENDENCY AND SPATIOTEMPORAL DEPENDENCY AMONG GEOGRAPHIC EVENTS OR OBJECTS

When attempting to understand a set of events (or objects), analysts often start by examining how the events are distributed spatially or spatiotemporally. Through such an assessment, the events may be found to be clustered, dispersed, or spatially or spatiotemporally random. Following such observations, analysts often explore environmental, social, economic, or even political factors that may have influenced or contributed to shaping such spatial or spatiotemporal patterns. If the patterns are deemed desirable, it would be beneficial to find ways to promote the influencing factors. Alternatively, intervention programs or policies may have to be implemented to reverse any negative effects that the influencing factors may have.

The assessment of a spatial or a spatiotemporal pattern as clustered, dispersed, or random is conventionally done with the assumption that near things tend to be more alike than distant things (a.k.a. the First Law of Geography, Tobler 1970; Miller 2004). If this notion stands, analysts can choose from a suite of quantitative analytics, as will be discussed in the coming chapters of this book, to assess the degree (strength) and direction of the spatial or spatiotemporal dependency among the set of events they study. In short, if spatial or spatiotemporal dependency exists among events, it is likely that events occurring at specific locations and times have been influenced by certain factors or have occurred for certain reasons. To explore or detect these underlying factors, current GIS software packages do offer plenty of tools for analyzing spatial dependency. However, only a limited set of tools is currently available for analyzing spatiotemporal dependency among events. We hope the subsequent chapters will help.

In moving from spatial analytics to spatiotemporal analytics, users should keep in mind that certain limitations remain after the temporal dimension of event data is added to the analysis.
First, the modifiable areal unit problem (MAUP) that is associated with spatial analytics has a closely related sibling in spatiotemporal analytics. In carrying out spatial or spatiotemporal analysis, analysts need to carefully assess the different options and choose appropriate analytical units. As discussed in the chapter on spatiotemporal analytical units in this book, different analytics may lead to different analytical results, and different spatial or spatiotemporal units would also lead to different analytical outcomes. Spatial dependency between a pair of events may differ from the spatiotemporal dependency between them in that an earlier event may or may not influence a later event, but a later event cannot influence an earlier one. Consequently, spatiotemporal analytics should be used with care and with thorough consideration of the temporal implications of how events are related to one another. The MAUP is therefore extended to the modifiable areal and temporal unit problem (MATUP), as discussed in the last chapter of this book.

1.3 SPACE–TIME DEPENDENCY

Tests for spatial patterns fail when applied to study the dynamics of a spatial process. When geographic events are associated with temporal attributes, it should be possible to check whether space and time are dependent, that is, whether there is a link between where and when the events occur. For space–time interactions, it is possible to use the Knox test to see if space–time clusters can be identified at certain space–time distances (Knox 1964; Knox and Bartlett 1964). However, the definition of how close two events must be in space and in time to be considered space–time adjacent (a.k.a. the critical distances) has been determined subjectively. To improve on this, the Mantel test (Mantel 1967) applied the notion of distance decay, which weights nearby events more than distant events.

In practical terms, spatial adjacency can be defined as 1 when two events are closer to each other than a critical (or user-defined) spatial distance, or when they are assessed to be adjacent based on a ratio that an analyst defines. If the spatial distance exceeds the critical distance, a value of 0 is assigned to indicate non-adjacency. Similarly, temporal adjacency can be 1 or 0, according to whether the temporal distance is within the critical temporal distance or abides by what distance decay defines. A pair of events is space–time (or spatiotemporally) adjacent, that is, the two events are space–time neighbors, only if both spatial adjacency and temporal adjacency are 1. Mathematically, space–time adjacency is the product of spatial adjacency and temporal adjacency, so it equals 1 only when both are 1 (1 × 1 = 1). This is generally known as the space–time multiplicative principle. It should be noted, however, that temporal adjacency may need to be considered further and defined as one-directional if only earlier events are allowed to influence later events (but not vice versa).

Adding time to spatial problems often requires increasing the number of dimensions of analysis.
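Before turning to higher-dimensional representations, the binary adjacency rules and the multiplicative principle described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the book's accompanying software: the function name `space_time_adjacency`, its parameters, and the sample coordinates are hypothetical, and the count of space–time adjacent pairs is only the raw quantity that a Knox-type test would then assess for significance (that step is not shown).

```python
import numpy as np


def space_time_adjacency(xy, t, d_crit, t_crit, directional=False):
    """Binary space-time adjacency matrix (hypothetical helper).

    Entry [i, j] is 1 when events i and j are within the critical spatial
    distance d_crit AND within the critical temporal distance t_crit.
    With directional=True, [i, j] is 1 only if event i is not later than j.
    """
    xy = np.asarray(xy, dtype=float)
    t = np.asarray(t, dtype=float)

    # Pairwise Euclidean distances in space.
    diff = xy[:, None, :] - xy[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=2))

    # Signed temporal differences: dt[i, j] = t[j] - t[i].
    dt = t[None, :] - t[:, None]

    spatial = (d <= d_crit).astype(int)
    if directional:
        temporal = ((dt >= 0) & (dt <= t_crit)).astype(int)
    else:
        temporal = (np.abs(dt) <= t_crit).astype(int)

    # Multiplicative principle: 1 only when both adjacencies are 1.
    st = spatial * temporal
    np.fill_diagonal(st, 0)  # an event is not its own neighbor
    return st


# Made-up sample: four events with planar coordinates and times in days.
xy = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (0.5, 0.5)]
t = [0.0, 1.0, 1.5, 20.0]

adj = space_time_adjacency(xy, t, d_crit=2.0, t_crit=3.0)
close_pairs = int(adj.sum()) // 2  # each undirected pair is counted twice

adj_dir = space_time_adjacency(xy, t, d_crit=2.0, t_crit=3.0, directional=True)
```

In this sample only the first two events are close in both space and time, so the symmetric matrix holds a single adjacent pair, and the directional variant keeps only the entry pointing from the earlier event to the later one.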
Many researchers have used Minkowski space–time, which merges 3-D Euclidean space with time to create a 4-D manifold that Harvey (1969) argued to be essential to the theory of relativity. The field of Time Geography (Hagerstrand 1963, 1973) pioneered the analytical and theoretical inquiry in that area by adding time to an x, y plane to create a 3-D cube that conveyed individual movements over space and time, which are also known as life-line trajectories. Research in time geography led to work on the space–time prism that was referenced by many time geographers such as Miller (1994) and Peuquet (1994).

Statistical detection of spatial patterns may assist analysts in assessing the level and direction of spatial and spatiotemporal dependency by determining whether the patterns are (or are similar to) clustered, dispersed, or seemingly random. For this, a family of spatial interaction models exists for analyzing and understanding such relationships, as Wilson (1970) reviewed. These models cleverly addressed the issue of geographical data violating a fundamental assumption of traditional (classic) statistics, which requires observations to be unrelated and independent of each other.

Blending space and time is an important area of research in mathematical physics, evolutionary biology, and epidemiology. The literature on spatiotemporal analysis has drawn on the transdisciplinary nature of the topic. Within geography and GIS, two closely related sub-streams have emerged, namely spatiotemporal dependency and spatiotemporal interaction. The former concerns the challenges that arise when traditional statistics are used to analyze geographical data whose observations are correlated in time and space. The latter involves characterizing the space–time relationship of variables that travel between two or more areas.

There have been some attempts to assess spatiotemporal clustering/dispersion in geographic events. For example, Kulldorff (1997) and Kulldorff et al. (1998) use space–time scans (SaTScan) through the space–time distribution of geographic events to find space–time clusters. The approach is based on a simple notion, and the algorithm is practical. Cylinders with incrementally increased spatial extents as the base and incrementally lengthened time durations as the height are used in the scans. SaTScan (www.satscan.org) starts with small cylinders when searching for clusters.
Clusters of different sizes can be found as the cylinders are enlarged incrementally over scan iterations. While the approach is robust, the identified clusters are often coarse, sometimes encompassing geographical extents or time durations too large for practical use.

A separate attempt is the use of the space–time cube in ArcGIS and ArcGIS Pro (esri.com) to find hotspots (space–time clusters) of the events being studied. The space–time extent defined by the distribution of the geographic events is partitioned into bins. A statistic based on counts of events in bins is used to test the statistical significance of the identified hotspots. As this tool is offered in commercial software, it is easy to use. The main advantage of this approach is the classification of hotspots into different types, including those that are consistently hotspots, those that were not hotspots but are becoming so, those that were hotspots but are no longer so, and so on. This new way of recognizing the different characteristics of hotspots brought the analysis of space–time patterns to a much higher level than before.
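The binning step behind a space–time cube can be sketched with NumPy's `histogramdd`, which counts points in a regular grid of (x, y, t) bins. This is only an illustration of partitioning a space–time extent into bins and counting events. It is not the ArcGIS implementation, it does not compute any hotspot statistic, and the function name `space_time_cube`, the bin counts, and the sample data are assumptions made for the example.

```python
import numpy as np


def space_time_cube(x, y, t, bins):
    """Count events in a regular grid of space-time bins (hypothetical helper).

    bins is a tuple (nx, ny, nt) giving the number of bins along x, y, and t.
    Returns the 3-D array of counts and the list of bin edges per dimension.
    """
    sample = np.column_stack([x, y, t])
    counts, edges = np.histogramdd(sample, bins=bins)
    return counts, edges


# Made-up sample: three early events in one corner, two late events in another.
x = [0.1, 0.2, 0.9, 0.8, 0.15]
y = [0.1, 0.3, 0.9, 0.7, 0.2]
t = [0.0, 1.0, 9.0, 10.0, 2.0]

counts, edges = space_time_cube(x, y, t, bins=(2, 2, 2))
```

Each cell of `counts` corresponds to one bin of the cube, so `counts[0, 0, 0]` holds the number of events in the lowest x, lowest y, earliest time bin. A significance test on those counts would be a separate step.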


Spatiotemporal interactions are a class of relationships that include human mobilities such as commuting and tourism, as well as biogeographical migration, the spread of diseases, supply chains of goods and services, communication linkages, or even on-/off-line rumor circulation. Geographers have borrowed from physics the notion of gravity models to analyze spatial interactions between places (Wilson 1971; Gonzalez et al. 2008; Liu et al. 2012; Xiao et al. 2013; Gao et al. 2013; Kim et al. 2018; Jia et al. 2020). Spatial diffusion is a special type of spatial interaction that need not depend solely on distance to model geographical phenomena. Goods, diseases, and information are often transmitted across space and time through networks of social or economic contacts (Bertazzon 2003; Lee et al. 2014). Such processes may follow expansion/contagious diffusion patterns, which traditionally spread across spatially autocorrelated networks, or they may follow relocation or hierarchical diffusion, in which the spatial phenomenon ‘jumps’ over distances (Lee et al. 2014). Bertazzon’s (2003) dynamic Moran’s I relies on a ‘hypermatrix’ or ‘matrix of matrices’ of weights to explicitly add temporal dependence to a spatial diffusion model of ski lift innovation in the Italian Alps. While the exponential growth of computing power and the proliferation of software that can manage multidimensional modeling tasks have facilitated recent progress such as that described in Lee et al. (2014), the curse of dimensionality continues to pose challenges for accurate prediction in dynamic spatiotemporal systems research. In contrast to static, descriptive spatiotemporal models that report means, variances, and covariances as ‘moments’ in a process, dynamical models are explicitly based on the evolution of time.
Wikle (2015) suggests conceptualizing this difference as one between a descriptive model characterized by a marginal probability distribution of the process and a dynamical model based on conditional probability distributions, ‘where the present state of the process is conditioned on its past’ (p. 87). Bayesian hierarchical models (BHM) are designed to do just that (Cressie and Wikle 2011; Wikle 2015). As we have discussed, observations in space and time often form clusters whose membership depends on shared properties. These shared properties allow us to nest parameters at different levels of groups (a hierarchy), and to come up with a weighted average of pooled models (i.e., multiple models for each group that mix fixed and random effects to produce different versions with the same variance) and unpooled models (i.e., models with unequal variances). The conceptual stages of Bayesian theory can be followed, including a data model [data | process, data parameters], a process model [process | process parameters], and a parameter model [data parameters, process parameters].

Together, these stages produce a posterior distribution that is proportional to the product of the stages that make up the hierarchical model, and from which we can generate our inferences. This book has a chapter devoted to Bayesian spatiotemporal analytics to extend this discussion.

1.4 CONCLUDING REMARKS

One last note about the chapters: this book is meant to introduce the technical aspects of spatiotemporal analytics, not to provide a complete literature review or literature lineage. We therefore hope that authors of research articles, reports, or books related to space–time analysis will pardon us for not citing all relevant publications, which keeps the length of the book at a manageable level. Chapters in this book were written by different authors, so their writing styles may vary. The software accompanying the chapter texts was also developed by different chapter authors in different computer languages and compiled or prepared in different ways for users to experience the analytics firsthand. Beyond serving as learning tools, these software tools can be used to carry out analyses in actual studies. By making them accessible to all readers, we hope to promote the use of spatiotemporal analytics. To this end, we hope to hear from readers and users of the software tools if any bugs or issues are encountered. We will attempt to correct the problems in future revisions. Part of our educational ambition is that Spatiotemporal Analytics can be offered as a semester-long course in as many universities as possible. To help potential instructors initiate this, the chapter authors have developed instructional slides in PowerPoint (Microsoft) format, available upon request. For accompanying software tools and/or instructional slide sets, please send your request to Jay Lee, [email protected].

REFERENCES

Bertazzon, S. (2003). Spatial and temporal autocorrelation in innovation diffusion analysis. Computational Science and Its Applications – ICCSA 2003, 23–32.
Cressie, N. and C. K. Wikle (2011). Statistics for Spatiotemporal Data. Hoboken, NJ: John Wiley & Sons, Inc.
Diggle, P. J., A. G. Chetwynd, R. Haggkvist, and S. E. Morris (1995). Second-order analysis of space-time clustering. Statistical Methods in Medical Research 4, 124–136.
Gao, S., Y. Liu, Y. Wang, and X. Ma (2013). Discovering spatial interaction communities from mobile phone data. Transactions in GIS 17(3), 463–481.
Gonzalez, M. C., C. A. Hidalgo, and A. L. Barabasi (2008). Understanding individual human mobility patterns. Nature 453, 779–782.
Hagerstrand, T. (1963). Geographic measurements of migration: Swedish data. In Sutter, J. (ed.) Les Déplacements Humains. Monaco: Entretiens de Monaco en Sciences Humaines, Première Session.
Hagerstrand, T. (1973). Innovation Diffusion as a Spatial Process. Chicago: University of Chicago Press.
Harvey, D. (1969). Explanation in Geography. London: Edward Arnold.
Jia, J. S., X. Lu, Y. Yuan, G. Xu, J. Jia, and N. A. Christakis (2020). Population flow drives spatiotemporal distribution of COVID-19 in China. Nature 582, 389–394.
Kim, S., S. Jeong, I. Woo, Y. Jang, R. Maciejewski, and D. S. Ebert (2018). Data flow analysis and visualization for spatiotemporal statistical data without trajectory information. IEEE Transactions on Visualization and Computer Graphics 24(3), 1287–1300.
Knox, G. (1964). Epidemiology of childhood leukaemia in Northumberland and Durham. British Journal of Preventive & Social Medicine 18(1), 17.
Knox, E. G. and M. S. Bartlett (1964). The detection of space-time interactions. Journal of the Royal Statistical Society. Series C (Applied Statistics) 13(1), 25–30.
Kulldorff, M. (1997). A spatial scan statistic. Communications in Statistics – Theory and Methods 26(6), 1481–1496.
Kulldorff, M., W. F. Athas, E. J. Feuer, B. A. Miller, and C. R. Key (1998). Evaluating cluster alarms: A space-time scan statistic and brain cancer in Los Alamos, New Mexico. American Journal of Public Health 88(9), 1377–1380.
Lee, J., J.-G. Lay, W. C. B. Chin, Y.-L. Chi, and Y.-H. Hsueh (2014). An experiment to model spatial diffusion process with nearest neighbor analysis and regression estimation. International Journal of Applied Geospatial Research 5(1), 1–15.
Liu, Y., C. Kang, S. Gao, Y. Xiao, and Y. Tian (2012). Understanding intra-urban trip patterns from taxi trajectory data. Journal of Geographical Systems 14, 463–483.
Mantel, N. (1967). The detection of disease clustering and a generalized regression approach. Cancer Research 27, 209–220.
Miller, H. J. (1994). Modelling accessibility using space-time prism concepts within geographical information systems. International Journal of Geographical Information Systems 5(3), 287–301.
Miller, H. J. (2004). Tobler’s first law and spatial analysis. Annals of the Association of American Geographers 94(2), 284–289.
Peuquet, D. J. (1994). It’s about time: A conceptual framework for the representation of temporal dynamics in geographic information systems. Annals of the Association of American Geographers 84(3), 441–461.
Picado, A., F. J. Guitian, and D. U. Pfeiffer (2007). Space-time interaction as an indicator of local spread during the 2001 FMD outbreak in the UK. Preventive Veterinary Medicine 79, 3–19.
Sanchez, J., H. Stryhn, M. Flensburg, A. K. Ersboll, and I. Dohoo (2005). Temporal and spatial analysis of the 1999 outbreak of acute clinical infectious bursal disease in broiler flocks in Denmark. Preventive Veterinary Medicine 71, 209–223.
Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography 46, 234–240.
Wikle, C. (2015). Modern perspectives on statistics of spatiotemporal data. WIREs Computational Statistics 7, 86–98.
Wilesmith, J. W., M. A. Stevenson, C. B. King, and R. S. Morris (2003). Spatiotemporal epidemiology of foot-and-mouth disease in two counties of Great Britain in 2001. Preventive Veterinary Medicine 61, 157–170.
Wilson, A. G. (1971). A family of spatial interaction models and associated developments. Environment and Planning A: Society and Space 3, 1–32.
Xiao, Y., F. Wang, Y. Liu, and J. Wang (2013). Reconstructing gravitational attraction of major cities in China from air passenger flow data 2001–2008: A particle swarm optimization approach. Professional Geographer 65, 265–282.

2 Spatiotemporal Centrography and Dispersion

Langxue Dang
Henan University

Jay Lee and Huiyu Lin
Kent State University

CONTENTS
2.1 Introduction
2.2 Review of Relevant Literature
2.3 Analytical Methods
2.3.1 Centrography of Spatiotemporal Points
2.3.1.1 Spatiotemporal Mean Center
2.3.1.2 Weighted Spatiotemporal Mean Center
2.3.1.3 Changes in Spatiotemporal Mean Centers
2.3.2 Dispersion of Spatiotemporal Points
2.3.2.1 Standard Spatiotemporal Distance
2.3.2.2 Standard Spherical Volume
2.4 Application Example
2.5 Software and Usage
2.5.1 Hardware/Software Requirements
2.5.2 Software Usage for Spatiotemporal Mean Centers
2.5.3 Software Usage for Standard Spatiotemporal Distance
2.6 Concluding Remarks
References

2.1 INTRODUCTION

DOI: 10.1201/9781003304395-2

The first step in analyzing a large volume of data is often the calculation of their statistical summaries. The summaries provide quantitatively digested information that offers a glance at the data, sometimes as an initial assessment. This is especially useful when dealing with large quantities of data. In addition, such summaries calculated from multiple datasets enable analysts to make a general comparison between or among the datasets. Following the terminology used in Lee and Wong (2001), we refer to these summary statistics as those measuring the central tendency of the data. In this era of big data, such statistical summaries seem to be even more important than before.

In a similar way, measuring the central tendency of a set of spatiotemporal objects or events that may be represented as points allows analysts to understand how the points concentrate around their central point, or the centroid of the set of points, and how the other points deviate from the central point. Such measurements of central tendency are not only a quantitative way to understand and compare spatiotemporal datasets, but they also enable analysts to visually, or qualitatively, map the distributions of multiple datasets for comparison.

Statistical summaries of a set of data usually include the maximum, minimum, average, median, mode, standard deviation, and variance of the data values. To measure the variation of data values beyond these descriptive statistics, one may calculate the skewness and kurtosis coefficients of a set of data values. For example, consider the exam scores students achieve in a spatiotemporal statistics class. Assuming there are n students, their scores are $x_i$, $i = 1, 2, 3, \ldots, n$. In this case, the scores are our data values. The mean score tells us how the group performs on average. If mean scores of other similar datasets (from other class groups) are available, the calculated mean score lets us compare how this group of students performs relative to last year’s group or to another group of students taking the same course or exam with a different instructor, and so on. The mean score can be calculated with the following equation:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \tag{2.1}$$

It should be noted, however, that the mean of a dataset containing a value much larger or smaller than the others would typically not be a good representative of the entire dataset. Therefore, the measured central tendency is often accompanied by some measurement of the average deviation of the data values from their mean. The statistics for such average deviation include the standard deviation and the variance, both of which give the degree to which data values in a dataset deviate from their mean. The smaller the values of standard deviation and variance, the more concentrated the values are around their mean. Conversely, large values of standard deviation and variance indicate that the data values are more dispersed from their mean.

To assess the dispersion of data values, the following equation can be used to calculate the standard deviation (S) of the data:



$$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}} \tag{2.2}$$
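As a small illustration of Equations 2.1 and 2.2, the sketch below computes the mean and the standard deviation of a handful of exam scores. The scores are made up for illustration, the function names are hypothetical, and the code is written for Python 3 rather than for the Python 2.7 tools that accompany this chapter. Note that the denominator is n, as in Equation 2.2, not n − 1.

```python
import math

def mean(values):
    """Arithmetic mean, as in Equation 2.1."""
    return sum(values) / len(values)

def std_dev(values):
    """Standard deviation with denominator n, as in Equation 2.2."""
    m = mean(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))

# Hypothetical exam scores for five students (illustrative values only).
scores = [72, 85, 90, 66, 87]

x_bar = mean(scores)      # central tendency
s = std_dev(scores)       # dispersion around the mean
v = s ** 2                # variance is the square of the standard deviation

print(x_bar, round(s, 4), round(v, 4))
```

The same value of S should be returned by `statistics.pstdev` in the Python standard library, which also divides by n.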

The variance (V) of values in a dataset can be calculated as

$$V = S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} \tag{2.3}$$

because the variance of a set of data values is the square of their standard deviation.

Geographic objects or events are often represented as points, lines, or polygons. These three types of objects are often referred to as forming the geometry of geographic elements. Points represent objects that do not occupy any area in space. Examples include a tree, a water hydrant, or a power pole by a street. Lines are used to represent objects whose primary properties of concern are their lengths and directions, such as segments of rivers, street centerlines, or boundaries of administrative units. Schools, hospitals, or buildings on a large-scale map are often represented as polygons because they do occupy space on the ground. These objects are depicted by polygons whose boundaries delineate their areal extents.

Geographic scale affects how objects are represented on a map or in a dataset of geographic information. An example is a city that occupies the entire map sheet at a large geographic scale, e.g., one inch on the map representing 100 m on the ground. On a map where 1 inch represents 10,000 km on the ground (a small-scale map), however, that same city may become just a point.

Geographic objects or events, when represented by points, can be defined by a pair of coordinates, (x, y), in space. In practice, the coordinates may be longitudes and latitudes. Alternatively, coordinates of a user-defined coordinate system can also be used to define locations of geographic objects or events. It should be noted that distances computed between locations on the surface of the Earth may differ, depending on the map projections and the coordinate systems in which the coordinates are defined.

In a two-dimensional space, the mean center (or spatial mean) is the average location of a set of geographic objects or events represented as points.
Once the coordinate system of these points is defined, the mean center can be easily calculated by using the following equation:

$$(\bar{x}, \bar{y}) = \left( \frac{\sum_{i=1}^{n} x_i}{n}, \; \frac{\sum_{i=1}^{n} y_i}{n} \right) \tag{2.4}$$

where $\bar{x}$, $\bar{y}$ are the coordinates of the mean center, $x_i$, $y_i$ are the coordinates of the i-th point, and n is the number of points.

A coordinate pair, (x, y), can define a location in a two-dimensional space. When the time dimension is added, a space–time position can be defined in a three-dimensional space, (x, y, t). For example, a bank ATM needs only a pair of coordinates to define its location in space. When its installation date is added, the space–time position of the ATM needs a triplet to define it. In this fashion, all ATMs in a city, together with their installation dates, can be analyzed to detect any spatiotemporal clusters, which would likely be associated with how the city grew or how its economy developed. Spatiotemporal data can be used in many applications and analyses, such as understanding how climatic trends change in different locations, the prevalence of public health concerns, analysis of earthquakes, management and control of traffic patterns, and changes in crime hotspots.

Figure 2.1 shows how the spatiotemporal coordinates, (x, y, t), define positions of spatiotemporal objects or events. Note that the x-axis and y-axis define the two-dimensional space. The (vertical) t-axis defines the temporal dimension, with the bottom of the diagram as the starting time and the top of the diagram as the ending time of the included time period.

Many analytical methods that are used for analyzing point patterns in two dimensions can be extended to analyzing point patterns in three dimensions. The rest of this chapter first reviews the relevant literature as the basis for introducing the extended methods for analyzing point patterns in three dimensions. At the end of this chapter, a description and an example application are given for the analysis of central tendency and dispersion of a set of spatiotemporal points.
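The ATM example above pairs a location with a calendar date, so the date has to be turned into a number before it can serve as the t coordinate. The sketch below shows one possible convention, days elapsed since the earliest date in the dataset. The coordinates, dates, and the `to_triplet` helper are hypothetical, and the choice of days as the temporal unit is an assumption, not something prescribed by this chapter. The code targets Python 3.

```python
from datetime import date

# Hypothetical ATM records: planar coordinates and installation dates.
atms = [
    (1200.0, 3400.0, date(2015, 3, 1)),
    (1850.0, 2900.0, date(2016, 7, 15)),
    (900.0, 4100.0, date(2018, 1, 10)),
]

# Assumption: the earliest installation date is used as the time origin (t = 0).
t0 = min(d for _, _, d in atms)

def to_triplet(x, y, d, origin=t0):
    """Return (x, y, t), with t measured in whole days since `origin`."""
    return (x, y, (d - origin).days)

points = [to_triplet(x, y, d) for x, y, d in atms]

for p in points:
    print(p)
```

Whatever unit is chosen (days, weeks, months), it should be applied identically to every point in the dataset so that the triplets are comparable.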

FIGURE 2.1 A set of spatiotemporal events in three dimensions (x, y, t).

2.2 REVIEW OF RELEVANT LITERATURE

Classic statistics offer descriptive statistics that summarize and digest data values, including mean, mode, median, standard deviation, variance, skewness, and kurtosis. All these statistics allow users to understand how the data values are distributed, how concentrated or dispersed they are, or whether the distribution has any directional bias (Dickinson, 1963). Such statistics, along with the computation of standard distance, can be found in Bachi (1963). Levine (1996) provides tools for spatial statistics in CrimeStat (https://nij.ojp.gov/topics/articles/crimestat-spatial-statistics-program-analysis-crime-incident-locations#about), a software package that he and his group developed. Later, Lee and Wong (2001) extended one-dimensional descriptive statistics into two-dimensional descriptive statistics for spatial data, with accompanying computer scripts (Avenue) for users to apply spatial statistics. Their discussion included the measurements for central tendency and dispersion of points defined in space. Later, Burt et al. (2009) also discussed similar measures for central tendency and dispersion of spatially defined locations. The introduction of spatial statistics was soon noticed by commercial vendors of GIS software. For example, Scott and Janikas (2010) provided a summary of tools that ArcGIS developed and offered.

Spatial descriptive statistics were widely applied in mapping and analysis of geographic information as first presented by Dickinson (1963). Others also extended the classic statistics into spatial statistics in their applications. For example, Wong (1999) reviewed centrographic measures as examples for implementing spatial statistics in GIS. Scott and Warmerdam (2005) used spatial statistics to analyze crime events. For public health applications, Lu (2011) applied spatial statistics to identify and visualize disease patterns in a doctoral dissertation. Not surprisingly, spatial statistics were also used to carry out socioeconomic studies, such as that described in Melendez-Pastor et al. (2014) on land cover changes in rural areas. Romanian researchers used spatial statistics to analyze the spatial trends of tourism (Bujdoso et al., 2015). Spatial statistics were also applied in tracking how the center points (mean centers) of population moved over time (Thapar et al., 1999; Plane and Rogerson, 2015).

Most, if not all, descriptive spatial statistics have now become the de facto initial steps in exploratory spatial analysis. Once a geographic pattern is assessed to be statistically significantly different from a random pattern, researchers would explore further how the non-random patterns are formed and whether those patterns are related to any particular attributes of the studied places. The same processes are expected to be the case for spatiotemporal studies. The measurements for spatiotemporal central tendency and degrees of spatiotemporal dispersion among space–time events represented as points, though seemingly simple and straightforward to compute, offer very useful hints for whether any space–time patterns warrant further investigation.

2.3 ANALYTICAL METHODS

Each spatiotemporal event can be represented as a coordinate triplet (x, y, t). These triplets define space–time positions that are referred to as points in this chapter. The analytical methods described here include those for finding the centrography of spatiotemporal points and those measuring the degree to which these points disperse spatiotemporally.

2.3.1 Centrography of Spatiotemporal Points

Centrography of spatiotemporal points includes a mean center, a weighted mean center, standard distances, and how these centers change over time. Each spatiotemporal point has a location and a time. Locations in space may have different coordinates under different map projections. Coordinates may also be different at different geographic scales. Similarly, spatiotemporal points may also have different temporal definitions when time is partitioned into different units. Given these varying situations, it is important that all spatiotemporal points in the same dataset be defined in precisely the same way. The last chapter of this book addresses the issue of employing different spatiotemporal units in an analysis of spatiotemporal data and the potentially varying analytical outcomes that one can expect.

2.3.1.1 Spatiotemporal Mean Center

Given a set of n spatiotemporal points, $p_i$, $i = 1, 2, \ldots, n$, whose coordinates are $\{(x_i, y_i, t_i) \mid i = 1, 2, \ldots, n\}$, the spatiotemporal mean center of the points can be calculated as

$$(x_{mc}, y_{mc}, t_{mc}) = (\bar{x}, \bar{y}, \bar{t}) = \left( \frac{\sum_{i=1}^{n} x_i}{n}, \; \frac{\sum_{i=1}^{n} y_i}{n}, \; \frac{\sum_{i=1}^{n} t_i}{n} \right) \tag{2.5}$$

where $(x_{mc}, y_{mc}, t_{mc})$ or $(\bar{x}, \bar{y}, \bar{t})$ are the coordinates of the mean center, $x_i$, $y_i$, $t_i$ are the coordinates of point $p_i$, the i-th point in the dataset, and n is the number of points in the dataset. As shown in Figure 2.2, triangular symbols represent the spatiotemporal positions of data points. The mean center is represented by a circular symbol that indicates the central tendency of all points in the dataset.

FIGURE 2.2 A set of spatiotemporal points (triangles) and their mean center (circle).

2.3.1.2 Weighted Spatiotemporal Mean Center

Just as each location in space is different from others in some way, every spatiotemporal position is unique when its properties (or attributes) are considered. To account for such differences, it is possible to give each spatiotemporal point a weight in calculating the spatiotemporal mean center such that the different levels of importance of the points can be accounted for. The weights can be defined in different ways. For example, bank robberies may have occurred at different locations (i.e., bank branches or ATM locations), at different times, and have involved different amounts of money. The amounts of money taken can be used as weights in calculating the mean center of all such robberies. The essence of calculating a weighted mean center lies in finding not only where the events may have concentrated, but also how that central tendency may be affected by the characteristics of concern. A weighted spatiotemporal mean center of a set of spatiotemporal points can be calculated as

$$(x_{wmc}, y_{wmc}, t_{wmc}) = \left( \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}, \; \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i}, \; \frac{\sum_{i=1}^{n} w_i t_i}{\sum_{i=1}^{n} w_i} \right) \tag{2.6}$$
where $w_i$ is the weight associated with point $p_i$. In the above equation, each point is associated with a given weight by multiplying the coordinates of the point by the weight. Consequently, the denominators in the equation remove such weights to return the weighted coordinates to their original magnitudes. This equation is a simple extension of the weighted mean center as given in Lee and Wong (2001), different only in adding the time dimension. Please note that a weighting scheme for a set of spatiotemporal positions depends on the attribute information associated with the events. For example, the bank robberies can be weighted by the amounts of money lost at those spatiotemporal positions. They can also be weighted by the number of times robberies occurred at each location.

2.3.1.3 Changes in Spatiotemporal Mean Centers

Since spatiotemporal mean centers can be calculated for different time periods, it is useful to measure or visualize the movements of the mean centers. Because each mean center can be considered a representative of a set of points in a particular time period, mean centers from different time periods provide snapshots of how the points and their distributions change over time. Without mean centers calculated from different time periods, it is virtually impossible for one to appreciate the changes in the positions of points and the changes in their distributions. By detecting and visualizing the movements of mean centers over time, one can better understand the scope and the trend of behavioral patterns of the spatiotemporal points. Note that mean centers can also be calculated from multiple sets of spatiotemporal points. The distances, or the magnitudes of changes in the positions of mean centers, can give hints for how different the point sets are. So, a spatiotemporal deviation distance between two mean centers can be calculated as

$$D_{ab} = \sqrt{(x_a - x_b)^2 + (y_a - y_b)^2 + (t_a - t_b)^2} \tag{2.7}$$

where $D_{ab}$ is the distance between spatiotemporal mean centers a and b; a and b represent two different scopes of coordinate ranges, two different time periods, etc.; and $(x_a, y_a, t_a)$ and $(x_b, y_b, t_b)$ are the coordinates of the two mean centers. Figure 2.3 shows the distance (or movement) between mean center a and mean center b as a straight line. In this figure, points of the two sets are represented by triangles of two gray shades and the two mean centers are the two circles.

FIGURE 2.3 The movement (or distance) between two mean centers. Triangles are spatiotemporal points and mean centers are represented by circular points.
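The mean center of Equation 2.5, the weighted mean center described above, and the deviation distance of Equation 2.7 can be sketched in a few lines. This is a minimal illustration in Python 3 under stated assumptions: points are plain (x, y, t) tuples, the function names are hypothetical and are not those of the tools distributed with this chapter, and the sample coordinates and weights are invented.

```python
import math

def mean_center(points):
    """Unweighted spatiotemporal mean center (Equation 2.5)."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def weighted_mean_center(points, weights):
    """Weighted mean center: weighted coordinate sums divided by the sum of weights."""
    total = sum(weights)
    return tuple(
        sum(w * p[k] for p, w in zip(points, weights)) / total
        for k in range(3)
    )

def center_distance(a, b):
    """Deviation distance between two mean centers (Equation 2.7)."""
    return math.sqrt(sum((ak - bk) ** 2 for ak, bk in zip(a, b)))

# Two hypothetical point sets, e.g., events from two time periods.
period_a = [(0.0, 0.0, 1.0), (2.0, 0.0, 2.0), (1.0, 3.0, 3.0)]
period_b = [(4.0, 4.0, 5.0), (6.0, 4.0, 6.0), (5.0, 7.0, 7.0)]

mc_a = mean_center(period_a)
mc_b = mean_center(period_b)
d_ab = center_distance(mc_a, mc_b)

# Hypothetical weights, e.g., amounts of money involved in each event.
wmc_a = weighted_mean_center(period_a, [1.0, 1.0, 2.0])

print(mc_a, mc_b, d_ab, wmc_a)
```

Because Equation 2.7 adds squared spatial and temporal differences, the resulting distance depends on the units chosen for x, y, and t, so the three coordinates should be expressed on comparable scales before the distance is interpreted.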

2.3.1.3.1 Orientations and Changes in Spatiotemporal Mean Centers

For the same set of geographic objects or events, their locations may change over time. With the changes, the mean centers may be at different locations at different times. Additionally, a mean center can be calculated from each set of space–time points, so there would be multiple mean centers if multiple sets of space–time points are being compared. Regardless of which of the two situations is of interest, orientation, i.e., an indication of the directional changes, between two mean centers can be calculated as a vector V, using the equation below:

$$\mathbf{V} = (x_b - x_a, \; y_b - y_a, \; t_b - t_a) \tag{2.8}$$

When examining just the spatial orientation without considering time, that is, the projection of the two mean centers onto a spatial plane, we can calculate the angle of such movement, θ, as shown below:

$$\theta = \tan^{-1}\left[ \frac{y_b - y_a}{x_b - x_a} \right] \tag{2.9}$$

On the horizontal spatial plane, θ shows the orientation of the movement between mean centers. Such information is useful when comparing multiple sets of space–time points, for example, the changing orientations of two sets of tornado touchdown locations or two sets of earthquake locations.
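The orientation vector of Equation 2.8 and the planar angle of Equation 2.9 can be sketched as follows. One substitution is made here and should be read as an assumption rather than as the chapter's method: the code uses the two-argument `math.atan2(dy, dx)` instead of a plain inverse tangent of the ratio, because it keeps the quadrant of the movement and does not fail when the two centers share the same x coordinate. The angle is measured counterclockwise from the positive x-axis, and the function names and sample coordinates are hypothetical (Python 3).

```python
import math

def orientation_vector(a, b):
    """Vector from mean center a to mean center b (Equation 2.8)."""
    return tuple(bk - ak for ak, bk in zip(a, b))

def planar_angle_deg(a, b):
    """Angle of the movement projected onto the x-y plane, in degrees.

    Uses atan2 (a quadrant-aware variant of Equation 2.9), measured
    counterclockwise from the positive x-axis.
    """
    dx, dy, _ = orientation_vector(a, b)
    return math.degrees(math.atan2(dy, dx))

# Hypothetical mean centers of two point sets.
mc_a = (1.0, 1.0, 2.0)
mc_b = (5.0, 5.0, 6.0)

v = orientation_vector(mc_a, mc_b)
theta = planar_angle_deg(mc_a, mc_b)

print(v, theta)
```

With a plain inverse tangent, a movement toward the northeast and one toward the southwest would give the same angle, which is why the quadrant-aware form is used in this sketch.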

2.3.2 Dispersion of Spatiotemporal Points

Spatiotemporal mean centers show the central tendency among space–time points. They give a general indication of where the likely centers of different space–time point sets are. But, in many cases, understanding just the central locations of space–time points may not be sufficient, especially if point sets contain a few points whose distances from the mean center are significantly longer than those of most other points in the set. These few points may distort the picture of how the entire set of points is distributed. To that end, it is also necessary to keep in mind that most geographic objects or events do not occur or distribute evenly across space and over time. Such observations often require additional summary statistics to help us better understand the spatiotemporal distribution of space–time points.

2.3.2.1 Standard Spatiotemporal Distance

There is a measurement in spatial statistics, standard distance, that indicates how dispersed or concentrated a set of points is across space. Extending that notion, a spatiotemporal standard distance (SD) is given as

$$SD = \sqrt{\frac{\sum_{i=1}^{n} (x_i - x_{mc})^2 + \sum_{i=1}^{n} (y_i - y_{mc})^2 + \sum_{i=1}^{n} (t_i - t_{mc})^2}{n}} \tag{2.10}$$

where $(x_{mc}, y_{mc}, t_{mc})$ is the coordinate triplet of the spatiotemporal mean center. The calculation in the above equation compares the coordinates of each point with those of the spatiotemporal mean center, squares the differences so that positive and negative differences do not offset each other, divides the sum of such squared differences by the number of points, and then takes the square root of the result to return it to the original numerical magnitude. If the spatiotemporal points are associated with weights, the calculation of spatiotemporal standard distance can incorporate the weighted spatiotemporal mean center as shown in the equation below:

$$SD = \sqrt{\frac{\sum_{i=1}^{n} w_i (x_i - x_{wmc})^2 + \sum_{i=1}^{n} w_i (y_i - y_{wmc})^2 + \sum_{i=1}^{n} w_i (t_i - t_{wmc})^2}{\sum_{i=1}^{n} w_i}} \tag{2.11}$$

where $(x_{wmc}, y_{wmc}, t_{wmc})$ is the coordinate triplet of the weighted spatiotemporal mean center.

2.3.2.2 Standard Spherical Volume

The standard distance in spatial statistics, once extended to include the temporal dimension, becomes a standard spherical volume. (Note: Conceptually, this may also be considered a standard spherical space. To avoid confusing the space with the spatiotemporal wording, volume is used.) It can be formulated by using the spatiotemporal mean center as the centroid and the standard distance (calculated with the aforementioned equations) as the radius. In general, the more dispersed a set of spatiotemporal points is, the longer their standard distance and, in turn, the larger the volume of the spatiotemporal standard sphere. Conversely, a concentrated set of spatiotemporal points is expected to have a small standard sphere, reflecting their clustered distribution. One such spherical volume is shown in Figure 2.4. In the figure, the square at the center denotes the spatiotemporal mean center. Solid circles represent the spatiotemporal points inside the sphere while the empty circles are outside of the sphere.
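The standard distance of Equations 2.10 and 2.11 and the inside/outside test for the standard sphere can be sketched together. This is an illustrative Python 3 sketch, not the chapter's software: the function names are hypothetical, the six sample points are invented, and the unweighted case is handled by assuming unit weights, in which case Equation 2.11 reduces to Equation 2.10. The volume uses the usual sphere formula, (4/3)πr³, with SD as the radius.

```python
import math

def standard_distance(points, weights=None):
    """Return (center, SD) following Equation 2.11.

    With weights=None every point gets weight 1, which reduces
    the calculation to Equation 2.10.
    """
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    center = tuple(
        sum(w * p[k] for p, w in zip(points, weights)) / total
        for k in range(3)
    )
    squared = sum(
        w * sum((p[k] - center[k]) ** 2 for k in range(3))
        for p, w in zip(points, weights)
    )
    return center, math.sqrt(squared / total)

def inside_sphere(point, center, radius):
    """True if the point lies within the standard sphere."""
    dist = math.sqrt(sum((pk - ck) ** 2 for pk, ck in zip(point, center)))
    return dist <= radius

# Hypothetical points: four close to the center, two far away in time.
pts = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
       (0.0, -1.0, 0.0), (0.0, 0.0, 3.0), (0.0, 0.0, -3.0)]

center, sd = standard_distance(pts)
flags = [inside_sphere(p, center, sd) for p in pts]
volume = 4.0 / 3.0 * math.pi * sd ** 3

print(center, sd, flags, volume)
```

In this toy dataset the two points that are far away along the t-axis fall outside the sphere, which mirrors the solid and empty circles described for Figure 2.4.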

FIGURE 2.4 Standard sphere example. Square is the spatiotemporal mean center, solid color circles are inside the sphere, and the empty circles are outside of the sphere.

2.4 APPLICATION EXAMPLE

To illustrate the spatiotemporal descriptive statistics discussed so far in this chapter, we use a set of space–time events, namely the burglary cases recorded in the 2013 911 calls for service in Portland, Oregon. These data were downloaded from http://nij.gov/funding/pages/fy16-crime-forecasting-challenge.aspx. The Portland map (in shapefile format) was also downloaded from the same website. Figure 2.5 shows the distribution of burglary events extracted from the downloaded dataset. Using the methods discussed in this chapter, Figure 2.6 shows the locations of spatiotemporal mean centers of burglaries in 2013 by weeks and Figure 2.7 shows them by months. As expected, the general trends of how mean centers moved are similar between the two figures, with Figure 2.6 offering a more detailed illustration of such movements. Please note that the temporal dimension (i.e., the vertical axis) starts from the bottom (earlier time) and proceeds upward (later time). Furthermore, Figure 2.8 shows the deviation among weekly spatiotemporal mean centers over the same time period. Using the equation given in this chapter, the deviation was calculated as 3,871.66 data units. This deviation was calculated using coordinate triplets of all weekly spatiotemporal mean centers.

FIGURE 2.5 Burglary events in Portland, Oregon, 2013.

FIGURE 2.6 Spatiotemporal mean centers by weeks.

FIGURE 2.7 Spatiotemporal mean centers by months.

FIGURE 2.8 Deviation of weekly spatiotemporal mean centers.

Given that the spatial units and temporal units are not the same and are not directly compatible, we suggest that all coordinates be standardized prior to the calculation of the deviation, as follows.

Given a set of n spatiotemporal points, $P = \{p_i \mid (x_i, y_i, t_i), \; i = 1, 2, \ldots, n\}$, coordinates can be standardized as $(x_i', y_i', t_i')$, using

$$x_i' = \frac{x_i}{x_{max} - x_{min}} \tag{2.12}$$

$$y_i' = \frac{y_i}{y_{max} - y_{min}} \tag{2.13}$$

$$t_i' = \frac{t_i}{t_{max} - t_{min}} \tag{2.14}$$

Based on the equation for calculating standard spatiotemporal distance, we can use the standardized coordinates to compute this distance, which is 0.4146 data unit. Then we use it as the radius, together with the spatiotemporal mean center, to construct a standard sphere as shown in Figure 2.9. Please note that the small red circles are events lying outside of the standard sphere and the green ones are inside the sphere.

FIGURE 2.9 Standard sphere of the 2013 burglary events in Portland, Oregon.
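The standardization of Equations 2.12 to 2.14 can be sketched as below. This is a minimal Python 3 illustration with a hypothetical function name and invented coordinates, not the Portland data. It follows the equations as written, dividing each coordinate by its range without subtracting the minimum. The guard against a zero range is an addition of this sketch.

```python
def standardize(points):
    """Divide each coordinate by its range, as in Equations 2.12-2.14."""
    ranges = []
    for k in range(3):
        values = [p[k] for p in points]
        span = max(values) - min(values)
        if span == 0:
            # Added safeguard: a constant coordinate cannot be rescaled this way.
            raise ValueError("coordinate %d has zero range" % k)
        ranges.append(span)
    return [tuple(p[k] / ranges[k] for k in range(3)) for p in points]

# Hypothetical events: projected x and y in map units, t as day of year.
raw = [
    (7640000.0, 680000.0, 1.0),
    (7650000.0, 690000.0, 183.0),
    (7660000.0, 700000.0, 365.0),
]

scaled = standardize(raw)

for p in scaled:
    print(p)
```

After this rescaling every coordinate spans exactly one unit, so the spatial and temporal terms contribute on a comparable scale when the standard distance is computed. Since that distance depends only on differences from the mean center, not subtracting the minimum does not change its value.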

2.5 SOFTWARE AND USAGE

To help readers use the analytics introduced in this chapter, we developed a set of Python tools. This section discusses the procedures for downloading and using the tools. For software code and example data sets, please email [email protected] with name, affiliation information, and a copy of purchase receipt, and allow up to a week to receive links to download Chapter02.zip. Once downloaded, the .zip file can be uncompressed or restored to any folder on your computer hard drive, such as C:\sttools\centrography.

2.5.1 Hardware/Software Requirements

Before using the downloaded tools, it is necessary to set up the working environment. All the required installers are available in the Software and Package folder in the uncompressed folder. The installation steps are:

Step 1: Install Python 2.7. Double-click python-2.7.13.amd64.msi. The working environment for Python 2.7 needs to be set according to the specification of Python 2.7; the details can be found easily with an Internet search engine.

Step 2: Install PyQt. Double-click PyQt4-4.11.4-gll-Py2.exe, then install PyQt according to the instructions on screen.

After finishing Steps 1 and 2, locate the folder where Python 2.7 was installed and open the subfolder Scripts, for example, C:\Python27\Scripts. In this folder, press Alt and Ctrl keys and use the mouse cursor to open the command window, then continue with the following steps. Note that each file name given on the command line must include the actual folder path where the tool was unzipped. For example, if the files were unzipped to C:\sttools\centrography\MeanCenter, the command for Step 3 becomes pip install C:\sttools\centrography\MeanCenter\numpy-1.13.1-cp27-none-win_amd64.whl.

Step 3: Install numpy. Execute the command pip install numpy-1.13.1-cp27-none-win_amd64.whl

Step 4: Install matplotlib. Execute the command pip install matplotlib-2.0.2-cp27-cp27m-win_amd64.whl

Step 5: Install pyshp. Execute the command pip install pyshp-1.2.12.tar.gz

Step 6: Install xlrd. Execute the command pip install xlrd_with_formulas-1.0.0-py2.py3-none-any.whl
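After the six steps, a short script can confirm that the modules load. The sketch below is not part of the distributed tools. The import names are assumptions: pyshp is assumed to be imported as shapefile, and the other packages are assumed to be importable under their package names (xlrd, PyQt4, and so on).

```python
import importlib

# Assumed import names; pyshp is normally imported as "shapefile".
REQUIRED = ["numpy", "matplotlib", "shapefile", "xlrd", "PyQt4"]

def check_modules(names):
    """Return {module_name: True/False} depending on whether the import succeeds."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status

if __name__ == "__main__":
    # Print one line per required module
    for name, ok in sorted(check_modules(REQUIRED).items()):
        print("%-12s %s" % (name, "OK" if ok else "MISSING"))
```

Any module reported as MISSING points to the installation step that may need to be repeated.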

2.5.2 Software Usage for Spatiotemporal Mean Centers

Once the environment for Python and the tools has been set up as described in Section 2.5.1, the spatiotemporal analytics introduced here can be used. To do so, execute MeanCenter.bat (by double-clicking the file). After the tool starts, a user interface such as the one shown in Figure 2.10 is displayed. The interface contains multiple input fields in which users enter parameter values. Table 2.1 explains these parameters. Here we use the 2013 burglary events of Portland, Oregon, as an example, with Portland.shp. In addition, the burglary events in the 2013 911 calls for service are recorded in the 2013burglary.csv file. These files are in the Data subfolder.

FIGURE 2.10 Interface of MeanCenter.bat.


This tool supports multiple time formats, and users can select the format they prefer. In addition to the common year, month, and day formats, events can simply be sequenced in numerical order, such as 1, 2, 3, …. Please note, however, that the sequence numbers must be integers. For this purpose, data may need to be pre-processed so that temporal data are expressed as sequential integers. An example run proceeds as follows:

Step 1: Select the input file (e.g., Data\NIJ2013Burglary.csv).

Step 2: Select the corresponding X field, Y field, and T field as in Figure 2.11 (e.g., x_coordinate, y_coordinate, and occ_date as the x, y, t input values). If weights are to be used, choose the Weight field accordingly. The example in Figure 2.11 does not use weights.

Step 3: Set Time Format. This example run uses Year/Month/Day, with Week as the Time Granularity, to calculate spatiotemporal mean centers.

Step 4: Click Set Path to specify Output File. In this example, we set it as Output/Result.

FIGURE 2.11 Example for using MeanCenter.bat.


Step 5: A base map can be selected for visualization of the calculated output. Here we choose Data\portland-police-districts\Portland-Police_Districts.shp as the base map.

Step 6: (Optional) Check the desired output files and formats.

Step 7: Click the OK button to proceed with the calculation. Otherwise, click the Close button to exit the tool. The output from this example is shown in Figure 2.6.
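For readers who want to see what the week granularity amounts to, the following sketch groups events by week and averages their coordinates. It is not the code behind MeanCenter.bat. It assumes that the unweighted mean center is the arithmetic mean of each coordinate, that weeks are ISO calendar weeks, and that time is measured in days since the earliest event. The field names follow the example above; the function name and the three sample records are hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

def weekly_mean_centers(rows, x="x_coordinate", y="y_coordinate",
                        t="occ_date", fmt="%Y/%m/%d"):
    """Group events by ISO week and average x, y and t (days since first event)."""
    events = [(float(r[x]), float(r[y]), datetime.strptime(r[t], fmt))
              for r in rows]
    origin = min(e[2] for e in events)      # earliest event defines day 0
    groups = defaultdict(list)
    for ex, ey, when in events:
        iso = when.isocalendar()            # (ISO year, ISO week, weekday)
        groups[(iso[0], iso[1])].append((ex, ey, (when - origin).days))
    centers = {}
    for week, members in sorted(groups.items()):
        n = float(len(members))
        centers[week] = tuple(sum(m[i] for m in members) / n for i in range(3))
    return centers

# Hypothetical records in the Year/Month/Day format
sample = io.StringIO(
    "x_coordinate,y_coordinate,occ_date\n"
    "100,200,2013/01/01\n"
    "300,400,2013/01/03\n"
    "500,600,2013/01/08\n"
)
centers = weekly_mean_centers(csv.DictReader(sample))
for week, center in centers.items():
    print(week, center)
```

A weighted version would multiply each coordinate by the event weight and divide by the sum of the weights instead of the count.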

2.5.3 Software Usage for Standard Spatiotemporal Distance

To calculate and use the standard spatiotemporal distance, execute StandardDistance.bat. Once initiated, a window appears as shown in Figure 2.12. The parameters can be set as described in Table 2.1. The steps in using this tool are:

Step 1: Select the input file. In this example, we use NIJ2013Burglary_Normal.csv. Please note that this file contains standardized coordinates for the (x, y, t) triplets. The standardized coordinates should all lie within [0, 1].

FIGURE 2.12 User interface of StandardDistance.bat.


TABLE 2.1 Parameters in MeanCenter.bat

Parameter                 Description
Input point file          The name of the input file. The file should contain spatial coordinates and time stamps of each spatiotemporal point
X field                   Name of the x-coordinate
Y field                   Name of the y-coordinate
Time field                Name of the temporal coordinate (time stamp)
Weight field (optional)   Name of the field that contains weights. This is optional, not required
Time format               The format of time, choose among month/day/year, month-day-year, year/month/day, year-month-day, or time series in integers
Time granularity          Day, week, month, or year
Output file               Folder path and name of output file
Base map (optional)       The base map (shapefile format) for display. This is optional
Create txt file           Check to create output in txt format (for other applications)
Create shapefile          Check to create shapefile of the calculated points
Plot 3D                   Check to create 3D visualization of the spatiotemporal centrography

Step 2: Select and define the X field, Y field, and T field.

Step 3: This example does not use weights.

Step 4: Click Set Path to set the folder path for Output File.

Step 5: Select Create txt file to output the calculated values. Select Plot 3D View to output calculated Standard Sphere.

Step 6: Select OK to start the calculation. The resulting 3D plot is shown here as Figure 2.9.

2.6 CONCLUDING REMARKS

While the spatiotemporal analytics introduced and discussed in this chapter may not seem complex, they are by no means less useful than their more complicated counterparts. The now widely referenced spatial exploratory data analysis (SEDA) can be enhanced with the spatiotemporal analytics discussed here. This is because the temporal dimension of geographic information is important: ignoring it can distort the results of spatial analysis. Such distortions can be seen in many spatial analyses that do not consider the time stamps of geographic events. For example, the hotspots of burglary events derived without considering when the events occurred can be very different from those derived with temporal data.

A final note on the spatiotemporal analytics discussed in this chapter is that selecting appropriate spatial and temporal units for the analysis can be very important. Just as spatial analysis yields different results at different geographical resolutions, spatiotemporal analysis may yield different results with different spatial and temporal units. Unfortunately, there is no single criterion that can guide all analyses. Users should carefully consider the implications of using particular spatiotemporal analytical units in their work.

REFERENCES

Bachi, R. (1963). Standard distance measures and related methods for spatial analysis. Papers in Regional Science, 10(1), 83–132.
Bujdoso, Z., J. Penzes, S. Madaras, and L. David (2015). Analysis of the spatial trends of Romanian tourism between 2000–2012. Geographia Technica, 10(2), 9–19.
Burt, J. E., G. M. Barber, and D. L. Rigby (2009). Elementary Statistics for Geographers. Guilford Press.
Dickinson, G. C. (1963). Statistical Mapping and the Presentation of Statistics. London: E. Arnold.
Lee, J., and D. W. Wong (2001). Statistical Analysis with ArcView GIS. John Wiley & Sons.
Levine, N. (1996). Spatial statistics and GIS: Software tools to quantify spatial patterns. Journal of the American Planning Association, 62(3), 381–391.
Lu, T. C. (2011). Cross-correlation networks to identify and visualize disease transmission patterns (Doctoral dissertation, University of Washington).
Melendez-Pastor, I., E. I. Hernandex, J. Navarro-Pedreno, and I. Gomez (2014). Socioeconomic factors influencing land cover changes in rural areas: The case of the Sierra de Albarracin (Spain). Applied Geography, 52, 34–45.
Plane, D. A., and P. A. Rogerson (2015). On tracking and disaggregating center points of population. Annals of the Association of American Geographers, 105(5), 968–986.
Scott, L. M., and M. V. Janikas (2010). Spatial Statistics in ArcGIS. Redlands, CA: ESRI Press.
Scott, L., and N. Warmerdam (2005). Extend Crime Analysis with ArcGIS Spatial Statistics Tools. Redlands, CA: Esri Press.
Thapar, N., D. Wong, and J. Lee (1999). The changing geography of population centroids in the United States between 1970 and 1990. The Geographical Bulletin, 41(1), 45.
Wong, D. W. (1999). Several fundamentals in implementing spatial statistics in GIS: Using centrographic measures as examples. Geographic Information Sciences, 5(2), 163–174.

3 Spatiotemporal Quadrat Analytics

Zhuo Chen
Kent State University

CONTENTS
3.1 Introduction
3.2 Review of Relevant Literature
3.3 Analytical Methods
3.4 Application Example
3.5 Software and Usage
3.5.1 Hardware/Software Requirements
3.5.2 Software Usage for Spatiotemporal Quadrat Analysis
3.6 Concluding Remarks
References

3.1 INTRODUCTION

Spatial data analysis has been a salient feature of quantitative geography and a foundation for GIS. However, the temporal aspect of spatial data should not be ignored when studying geographic objects or events (from here onward simply events). This is because spatial analysis of a set of geographic events often fails to capture the underlying process when time is not considered. To understand how events evolve in space and over time, spatiotemporal (ST) analysis is often needed to identify and measure the level of spatiotemporal clustering or dispersion, so that an ST process can be characterized as random or non-random. Similar to the research paradigm of spatial analysis, measuring the level of ST clustering in a set of ST points (that represent geographic events) is generally the very first step in ST analysis. If the ST points are found to form a non-random process with statistical significance, the analysis then proceeds to examine whether any underlying socio-economic or environmental factors help form such a non-random process.

Depending on the data type, different exploratory ST techniques can be used to analyze ST processes. Analyzing ST point processes is particularly important when only the locations (spatial coordinates) and times of events are available. Examples of such events include public health incidents, crime incidents, habitats of endangered/threatened species of animals or plant communities, and locations of stores, among others. To that end, this chapter focuses on analyzing data that represent geographical events as points in an ST domain. In this case, analysis of ST point processes extends the analysis of spatial point patterns to ST cases by incorporating the time dimension in the data. Research on exploratory analysis of ST point processes typically focuses on assessing whether the distribution of certain attributes of ST points is statistically significantly different from complete spatiotemporal randomness (CSTR).

An intuitive way of analyzing the spatial distribution of points is to partition the study area into a number of spatial units using one of the geometric shapes (e.g., squares or pentagons) and then count the number of points falling into each unit. Each such unit is called a quadrat. We then test the distribution of frequencies of points in quadrats against a Poisson distribution to decide whether the studied distribution is similar to or different from a random pattern as defined by the Poisson distribution. This method is known as quadrat analysis (QA). To become more familiar with QA, readers are encouraged to read the discussion of QA in Lee and Wong (2001). Figure 3.1 demonstrates the idea of quadrat analysis, in which the study area is partitioned into a set of squares.

Similarly, when we want to measure how clustered ST point processes are, we can rely on spatiotemporal quadrat analysis (STQA), which is derived by extending QA to a 3-dimensional volume. That is, STQA allows time to be the third dimension when partitioning the data space into a volume of ST cubes. Figure 3.2 illustrates how a volume of cubes is drawn in a 3-dimensional space. By counting the points in each quadrat/cube, we obtain the frequencies of points in cubes and use them to statistically analyze the frequency distribution.

FIGURE 3.1 (a) A set of random points in two dimensions. (b) Same set of random points with 25 quadrats (squares) imposed on the 2-dimensional space.

FIGURE 3.2 (a) A set of random spatiotemporal points in three dimensions. (b) Same set of the points with eight quadrats (cubes) imposed on the 3-dimensional space.

The rest of this chapter first reviews existing methods of conventional spatial point analysis and quadrat analysis. Following that, we introduce the method of using STQA to analyze ST point processes. Then, an application of STQA to a real-world dataset of public health incidents demonstrates the analytical method. The results of the application are compared with those from a distance-based ST analytical method: the spatiotemporal nearest neighbor index (STNNI), which is discussed in the next chapter. In addition, this chapter provides a walkthrough of using our software to perform STQA. Finally, the chapter concludes with a discussion of the advantages and limitations of STQA in spatiotemporal analysis.

DOI: 10.1201/9781003304395-3
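The counting step can be illustrated with a short sketch. It is not taken from the software described in Section 3.5. It assumes that the cubes evenly divide the bounding box of the data along each of the x, y, and t axes; the function name and the random sample points are hypothetical.

```python
import numpy as np

def cube_counts(points, k):
    """Count points in each of the k x k x k cubes spanning the data's bounding box."""
    pts = np.asarray(points, dtype=float)
    counts, _edges = np.histogramdd(pts, bins=(k, k, k))
    return counts.astype(int)

# Hypothetical ST points: 200 random (x, y, t) triplets
rng = np.random.default_rng(0)
pts = rng.random((200, 3))

counts = cube_counts(pts, 2)        # eight cubes, as in Figure 3.2
freq = np.bincount(counts.ravel())  # freq[x] = number of cubes holding x points
print(counts)
```

The array freq is the observed frequency distribution that is compared with a Poisson distribution.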

3.2 REVIEW OF RELEVANT LITERATURE

There are three types of methods in conventional spatial point analysis that we can use when no attribute information is available for the points: quadrat methods, distance-based methods, and kernel density estimation (KDE) (Fischer, 2015). ST point analysis follows these methods but integrates the temporal information of the data. For example, spatiotemporal kernel density estimation (ST-KDE) methods (Hu et al., 2018; Lee et al., 2017) have been developed to estimate and assess the ST processes of geographic events. It should be noted that, although these ST-KDE methods can estimate probability density and thus enable the mapping of ST processes, they cannot quantitatively measure the level of ST clustering in a set of ST points. Distance-based methods for analyzing ST point processes include space–time interaction methods (Knox & Bartlett, 1964; Mantel, 1967) and spatiotemporal k-nearest neighbor tests (Jacquez, 1996; Lee et al., 2020). However, scalability issues may arise in these methods when the ST point dataset becomes large. Since these methods must consider each possible pair of points in the ST point set (sometimes iteratively), the analysis may become very time-consuming and computationally inefficient.

The STQA methods that we discuss here are advanced variants of conventional QA. Conceptually, the STQA method shares some notions with space–time scan statistics (Kulldorff et al., 2005) and spatiotemporal density-based scan (ST-DBSCAN) (Birant & Kut, 2007; Wang et al., 2006). Like ST-KDE methods, these scan statistics are mainly concerned with the locations and time periods of ST clusters instead of measuring the level of ST clustering. These analytical scans of ST points do not necessarily incorporate attribute information; rather, they focus on the locations and time stamps of the incidents/points. Nevertheless, the scan statistic methods have been widely employed in research on crime and disease incidents because more effective alternatives have not been available, at least until the STQA we discuss here. Furthermore, scan statistics allow only the mapping and identification of ST clusters. They do not support the measurement of overall ST autocorrelation among ST points, which, as mentioned earlier, is the first step in analyzing ST processes.

Beyond scan statistics, ST analysis of the data is typically concerned with space–time autocorrelation. Extensions of the concept of spatial autocorrelation to ST autocorrelation have been proposed. Spatial autocorrelation, which indicates the level of spatial dependency among data, follows the notion of what is now widely known as the First Law of Geography: “Everything is related to everything else, but near things are more related than distant things” (Tobler, 1970). ST autocorrelation, similarly, indicates the level of non-randomness in global or local space–time point processes. Most methods developed for measuring ST autocorrelation have focused on modifying and extending existing spatial algorithms to work with ST data. While global and local spatial autocorrelation measures have been widely used (Anselin, 1995; Moran, 1950), ST autocorrelation measures did not draw equivalent attention until the mid-1970s, when Cliff and Ord’s (1975) seminal approach of incorporating a spatial and temporal weights matrix sparked an interest in measuring ST autocorrelation. Examples of these ST autocorrelation statistics include the global ST Moran’s Index (Cliff & Ord, 1981; Griffith, 1981; Reynolds & Madden, 1988; Gao et al., 2019; Lee & Li, 2017) and spatiotemporal Getis–Ord Gi and Gi* (Wang & Lam, 2020). ST autocorrelation statistics allow rigorous quantitative measurement and testing of the levels of ST clustering in ST processes. It should be pointed out, however, that the ST methods for measuring ST autocorrelation all require a variable of interest (or attribute information) associated with the data points, such as the socio-economic status, demographics, or environmental factors of the ST events.

As one of the oldest methods for exploring spatial point patterns, QA has been used in many fields, such as plant ecology, geography, corrosion science, and geology (Al-Ahmadi et al., 2014; Dale & Fortin, 2014; Greig-Smith, 1952; De La Cruz & Gutiérrez, 2008). The origin of QA can be traced back to the 1920s, when plant ecologists of the Uppsala school used a set of square quadrats of 1-m side length to investigate the spatial patterns of plant communities (Du Rietz, 1929; Diggle, 2013). In general, quadrat analysis is an area-based method that partitions the study area into a number of subregions, or quadrats, of equal size and shape and uses the counts of points/events that fall in each quadrat in the analysis (Lee & Wong, 2001). Admittedly, quadrat analysis has been criticized for several drawbacks, including the lack of a theoretically optimal quadrat size and of a universally suitable quadrat shape for each dataset, the inability to show the relationships among data points, and the failure to recognize spatial variations at the local scale. This approach, however, still has advantages: it allows a quick assessment of the basic spatial point pattern because it is computationally efficient and does not require the precise point locations that distance-based methods need (Diggle, 2013).

3.3 ANALYTICAL METHODS

STQA can be used to assess whether a point process pattern is statistically different from a random process pattern. This can be done by examining how closely the frequency distribution of counts of points in individual quadrats corresponds to a process defined by a Poisson distribution of frequencies. If the frequency distribution of counts closely corresponds to a Poisson distribution, the point process pattern is considered to be (statistically significantly) close to CSTR. The Poisson process can be defined as follows. In the context of QA, let λ denote the intensity parameter, or the average number of ST points per quadrat over all quadrats. The value of λ depends on the number of quadrats in the analysis and can be calculated as

$$\lambda = \frac{N}{M} \tag{3.1}$$


where N is the number of points and M is the number of quadrats. Explicitly, the probability of having x points in a quadrat is

$$p(x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots \tag{3.2}$$

For example, $p(6) = 0.15$ would indicate that, under a Poisson process with the corresponding λ, the probability of having 6 points in a quadrat is 0.15. Recall that λ is the average number of ST points in a quadrat, and it should be computed before using the equation. With this equation, we can compute the expected frequency distribution when the ST points follow a Poisson process:

$$f_{\text{expected}} = \{ n_x \mid x = 0, 1, 2, \ldots, N \} \tag{3.3}$$

$$n_x = p(x) \times \lambda \tag{3.4}$$

where $n_x$ is the expected count of quadrats in which the number of points equals x. To test whether the frequency distribution of event points follows a Poisson process in ST dimensions, a helpful test statistic is the Chi-squared statistic:

$$\chi^2 = \sum \frac{(f_{\text{observed}} - f_{\text{expected}})^2}{f_{\text{expected}}} \tag{3.5}$$

where $f_{\text{observed}}$ is the observed frequency distribution of ST points using the current ST quadrats. The closer $\chi^2$ is to 0, the more closely the ST points follow a Poisson process.
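The comparison of observed and expected frequencies can be sketched as follows. This is a minimal illustration rather than the implementation used in the tool. It makes two assumptions that should be checked against the equations above: the expected number of quadrats holding x points is taken as M × p(x), so that the expected counts are on the same scale as the observed counts of quadrats, and the sum in the Chi-squared statistic runs only up to the largest observed count. The function names and the sample counts are hypothetical.

```python
import math
import numpy as np

def poisson_pmf(x, lam):
    """Probability of x points in a quadrat under a Poisson process."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def chi_square_vs_poisson(counts):
    """Compare quadrat counts (non-negative integers) with a Poisson distribution."""
    counts = np.asarray(counts).ravel()
    m = counts.size                 # M, the number of quadrats
    lam = float(counts.sum()) / m   # lambda = N / M
    observed = np.bincount(counts)  # observed[x] = quadrats holding x points
    # Assumption: expected quadrat count for x points is M * p(x)
    expected = np.array([m * poisson_pmf(x, lam) for x in range(observed.size)])
    chi2 = float(((observed - expected) ** 2 / expected).sum())
    return lam, observed, expected, chi2

# Hypothetical counts of points in eight cubes
lam, observed, expected, chi2 = chi_square_vs_poisson([0, 1, 1, 2, 0, 1, 3, 0])
print(lam, observed, expected.round(3), round(chi2, 3))
```

For large counts the factorial and power terms can overflow, so a production version would likely work with logarithms or a statistics library instead.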


Another, perhaps more informative, test is the variance-to-mean ratio (VMR), or index of dispersion:

$$\text{VMR} = \frac{\sigma^2}{\mu} \tag{3.6}$$

where $\sigma^2$ is the variance and $\mu$ is the mean of the observed frequency distribution ($f_{\text{observed}}$). This index can be used to assess point process patterns because $\sigma^2$ and $\mu$ would be equal if the points followed a Poisson process. If the variance $\sigma^2$ is greater than the mean $\mu$ (i.e., VMR > 1), the point process pattern is more clustered than a random process. If $\sigma^2$ is less than $\mu$ (i.e., VMR < 1), the points are more dispersed than a random process pattern in the study region. Many other indices for detecting clustering patterns (in spatial cases) have also been documented in the literature. Examples include the index of cluster size/index of clumping (David & Moore, 1954), Morisita’s index (Morisita, 1961), Green’s index (Green, 1966), the index of cluster frequency (Douglas, 1960), the index of mean crowding, and the index of patchiness (Lloyd, 1967).

When using QA/STQA to examine the spatial pattern or ST process pattern of geographical events, the most important step is to determine the quadrat size. Conventionally, users choose the quadrat size arbitrarily (or by experiment), based on their experience or on whatever assumptions they formulate. Because the size can affect the results, it is necessary to examine how different quadrat sizes affect the results of testing event process patterns in a spatiotemporal context. For the same event process, different quadrat sizes may produce completely different outcomes in terms of measuring the pattern of ST processes.
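The sensitivity of VMR to the quadrat size can be explored with a small experiment. The sketch below is illustrative only. It assumes that the variance and mean are taken over the counts of points per cube, that the population variance is used, and that the cubes divide the bounding box of the data evenly. The function name and the synthetic point sets are hypothetical.

```python
import numpy as np

def vmr(points, k):
    """Variance-to-mean ratio of point counts over k slices per dimension."""
    pts = np.asarray(points, dtype=float)
    counts, _edges = np.histogramdd(pts, bins=(k,) * pts.shape[1])
    return counts.var() / counts.mean()  # population variance is an assumption

rng = np.random.default_rng(42)
# Hypothetical random pattern: 500 uniform points in the unit cube
random_pts = rng.random((500, 3))
# Hypothetical clustered pattern: sparse background plus one tight cluster
clustered_pts = np.vstack([
    rng.random((50, 3)),
    0.3 + 0.02 * rng.standard_normal((450, 3)),
])

for k in (2, 4, 8):
    print(k, round(vmr(random_pts, k), 2), round(vmr(clustered_pts, k), 2))
```

Running the loop for several values of k shows how the same point set can yield quite different VMR values as the number of slices changes.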

3.4 APPLICATION EXAMPLE

In this section, we present results from a real-world dataset to show the applicability of STQA for testing the randomness of event process patterns. The study area is the City of Kaohsiung, a large port city in southern Taiwan. The dataset consists of dengue fever cases in the city for four consecutive annual cycles (2004–8). Figure 3.3 shows the spatial distribution of these cases in each year. There are far more cases in the 2006–7 cycle, with 766 reported cases, than in the 2004–5 (38 cases), 2005–6 (90 cases), and 2007–8 (131 cases) cycles.

FIGURE 3.3 Dengue fever cases in Kaohsiung, Taiwan, from 2004 to 2008.

The spatial distribution of dengue fever cases alone cannot tell us whether a spatial cluster in a region is also a temporal cluster. Therefore, as shown in Figure 3.4, when we look at the cases for each year and use days of the year as the unit time scale, the ST process of dengue fever cases gives us another perspective. It indicates that these cases occurred over a long period of time, mostly from July to January of the subsequent year.

FIGURE 3.4 ST distribution of dengue fever cases in Kaohsiung from 2004 to 2008. Panels: 2004-2005, 2005-2006, 2006-2007, 2007-2008.

Nineteen different quadrat sizes and cube sizes were used to test the spatial and spatiotemporal process patterns of the dengue fever cases in this region using VMR. The results are shown in Figures 3.5 and 3.6. Both results show that year 2004–5 has the smallest VMR for all cube/quadrat sizes, with smaller cube/quadrat sizes yielding VMRs near 1. QA has relatively larger VMRs than STQA. This indicates that dengue fever cases in this year are generally randomly distributed in space and time, but slightly clustered in space. Spatiotemporally, year 2006–7 has the highest VMR for all cube sizes except for the test with 16 × 16 × 16 slices, where year 2007–8 has the highest (Figure 3.6). We wish to note, however, that for quadrat analysis, the VMR for year 2006–7 is markedly higher than that for any other year. Nevertheless, year 2006–7 can be seen as the year with the most spatially and spatiotemporally clustered distribution of dengue fever in Kaohsiung. In general, when the cube/quadrat size is large (that is, the number of slices is small), VMRs are generally large but fluctuating.


FIGURE 3.5 VMR tests for different sizes of quadrat on the dataset of dengue fever in Kaohsiung from 2004 to 2008.

FIGURE 3.6 VMR tests for different cubic sizes on the dataset of dengue fever in Kaohsiung from 2004 to 2008.

As the sizes of cubes/quadrats decrease, the corresponding VMRs decrease yet become more stable. This is because larger space–time cubes are more likely to contain more cases. For comparison, we used STNNI to test the spatiotemporal distributions for the dataset of dengue fever in Kaohsiung from 2004 to 2008 (Table 3.1). Similarly, tests on cases from years 2005–6, 2006–7, and 2007–8 showed that these cases are spatiotemporally clustered, as their STNNIs are smaller than 1. Years 2005–6 and 2007–8 had very similar degrees of clustering, and year 2006–7 shows the strongest clustering in space and time. Surprisingly, the results from ST quadrat analysis and from STNNI disagree for the data in year 2004–5. The STNNI is slightly higher than 1 in year 2004–5, indicating that the cases had a random but slightly dispersed distribution (Table 3.1). However, the VMR in ST quadrat analysis on these data shows a slightly clustered distribution in space and time (Figure 3.5).

TABLE 3.1 STNNI Tests on the Dataset of Dengue Fever in Kaohsiung from 2004 to 2008

Period    STNNI          Z Score
2004–5    1.03069537     0.08445
2005–6    0.451995511    −1.50772
2006–7    0.382793987    −1.69811
2007–8    0.446386789    −1.52315

3.5 SOFTWARE AND USAGE

A handy Python tool is provided for readers to use the ST quadrat analysis introduced in this chapter. In this section, we introduce the procedures for downloading, installing, and using the tool. For software code and example data sets, please email [email protected] with name, affiliation information, a copy of purchase receipt and allow up to a week to receive links to download Chapter03.zip. Once downloaded, the .zip file can be uncompressed or restored to any folder on your computer hard drive, such as C:\sttools\quadrat.

3.5.1 Hardware/Software Requirements

Before running the tool with the data in the uncompressed folder, it is necessary to set up the environment on the computer. The required files are all available in the same subfolder in the uncompressed folder. The setup basically involves installing Python and the dependent modules. Do not panic if you are not familiar with such procedures. We have reduced the installation to only two steps, and users do not need to write a single line of code:

Step 1: Install Python 3.9. Download Python 3.9 (https://www.python.org/downloads/release/python-390/). Based on your computer system, choose an appropriate installer to download. For example, Windows users can download the Windows x86-64 executable installer or the Windows x86-64 web-based installer. Double-click the downloaded installer to install Python. Be sure to check Add Python 3.9 to PATH during installation, as it ensures that the system knows where Python is installed.

Step 2: Install Python requirements. Double-click install-reqs.bat to automatically download and install all the modules required by this tool. After the process is done, press any key to exit the command window.

FIGURE 3.7 Interface of Spatiotemporal Quadrat Analysis tool.

3.5.2 Software Usage for Spatiotemporal Quadrat Analysis

Once Python and the environment have been set up as described in Section 3.5.1, we can run the analytical tool for Spatiotemporal Quadrat Analysis. To do so, double-click run.bat in the uncompressed folder. This opens a window with a user interface like the one in Figure 3.7. The interface contains multiple input fields in which users enter parameter values. These parameters are explained in Table 3.2. To illustrate the usage of the software, we use the 2006–7 dengue fever cases in Kaohsiung, Taiwan, as an example. The records of the 2006–7 dengue fever cases are stored in the DF2006–2007.shp file, which can be found in the Data subfolder. Please note that the temporal coordinate (Z field) needs to be pre-processed so that its values are sequential numbers in some temporal unit, such as a day, a month, or 40 hours. For example, in the 2006–7 dengue fever data, we pre-processed the dates of the cases and used the day as the temporal unit. The dates of the cases ranged from 7/1/2006 to 3/17/2007, so after pre-processing we have days ranging from 180 to 436, correspondingly. To use the Spatiotemporal Quadrat Analysis tool:

Step 1: Click the Open file button to select the input file (e.g., …/Data/DF2006–2007.shp).


TABLE 3.2 Parameters in Spatiotemporal Quadrat Analysis Tool

Parameter/Function            Description
Input point file              The name of the input file. The file should contain spatial coordinates and time stamps of each spatiotemporal point. It supports both csv file and ESRI’s shapefile.
X field                       Name of the x-coordinate. For shapefile, this is optional because it can automatically read its x-coordinate.
Y field                       Name of the y-coordinate. For shapefile, this is optional because it can automatically read its y-coordinate.
Z(time) field                 Name of the temporal coordinate (time stamp).
Number of quadrats (cubes)    Ranges from 2 × 2 × 2 (8) quadrats to 20 × 20 × 20 (8,000) quadrats.
Plot points                   Create 3-dimensional visualization of the spatiotemporal points as well as the quadrats drawn around them.
Run                           Run the application with specified parameters.
Close                         Close the tool.

FIGURE 3.8 Example of using Spatiotemporal Quadrat Analysis tool.

Step 2: Select the corresponding X field, Y field, and Z field as in Figure 3.8. (For a shapefile, X field and Y field are optional.) As discussed above, we use Days as the Z field to represent the temporal coordinate.


FIGURE 3.9 Interactive figure generated by clicking Plot points button.

FIGURE 3.10 Results of the Spatiotemporal Quadrat Analysis tool.

Step 3: Select the Number of quadrats used in the analysis. In this example, we used 3 × 3 × 3 quadrats.

Step 4: (Optional) Click the Plot points button to visualize the points and the 3 × 3 × 3 quadrats. This opens a new interactive window for the plot (see Figure 3.9).


Step 5: Click the Run button to proceed with the calculation, or click the Close button to exit the tool. After a run, the output is shown in a new popup window, as in Figure 3.10.
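The steps above can be made more concrete with a minimal Python sketch of the two computations involved: turning calendar dates into sequential day numbers for the Z field, and counting points in k × k × k quadrats to obtain a variance-to-mean ratio (VMR). This is not the tool's own code. The function names, the free choice of an origin date, the use of the data's bounding box as the quadrat extent, and the use of the sample variance are all assumptions made for illustration.

```python
from datetime import date
from statistics import mean, variance


def to_day_numbers(dates, origin):
    """Convert calendar dates to whole-day offsets from an origin date."""
    return [(d - origin).days for d in dates]


def quadrat_counts(points, k):
    """Count (x, y, t) points in each of the k*k*k cubes of their bounding box."""
    mins = [min(p[d] for p in points) for d in range(3)]
    maxs = [max(p[d] for p in points) for d in range(3)]
    counts = [0] * (k ** 3)
    for p in points:
        cell = 0
        for d in range(3):
            span = maxs[d] - mins[d]
            # A point on the upper boundary is assigned to the last cube.
            i = 0 if span == 0 else min(int((p[d] - mins[d]) / span * k), k - 1)
            cell = cell * k + i
        counts[cell] += 1
    return counts


def vmr(counts):
    """Variance-to-mean ratio of quadrat counts (sample variance assumed)."""
    return variance(counts) / mean(counts)


# Hypothetical data: the origin date and coordinates are made up.
origin = date(2006, 7, 1)
days = to_day_numbers([date(2006, 7, 1), date(2006, 7, 3)], origin)
even = [(x, y, t) for x in (0.0, 1.0) for y in (0.0, 1.0) for t in days]
clustered = [(0.0, 0.0, 0)] * 7 + [(1.0, 1.0, 2)]

print(vmr(quadrat_counts(even, 2)))       # one point per cube: VMR below 1
print(vmr(quadrat_counts(clustered, 2)))  # points piled in one cube: VMR above 1
```

A VMR near 1 is what a homogeneous Poisson pattern would give. Values well above 1 point toward clustering, and values well below 1 toward regularity.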

3.6 CONCLUDING REMARKS

VMR is chosen as the indicator of a spatiotemporal pattern in the analytical tool because it is a simple and straightforward method. However, its limitations should be considered before employing it in your studies. The results of the experiments discussed in this chapter confirm that, in the context of spatiotemporal analysis, the index of dispersion test (i.e., VMR) is still more suitable for clustered patterns than alternatives such as STNNI, but may be inappropriate for regular patterns (Diggle, 1979). Hurlbert (1990) also suggested that some spatial arrangements of non-random patterns could produce a VMR of 1. This is because the VMR test considers neither the spatial information among individual events nor the spatial information of the quadrats. With this caveat in mind, the index is still an effective starting point for testing, when the exact distribution is unknown, whether and how it departs from a homogeneous Poisson distribution. One final note on ST quadrat analysis is that the method discussed here cannot account for time-delayed events, and it is subject to the same boundary issue as spatial quadrat analysis (i.e., it does not account for events outside of the study area but very close to its boundary).

In summary, STQA is an effective tool in exploratory space–time data analysis for assessing the pattern of a spatiotemporal process. It serves as the first step for research on the spatiotemporal trends of events. Although STQA tends to ignore the individual spatiotemporal relationships between point pairs that distance-based ST analytical methods account for, it saves computational resources and gives a quick overview of the spatiotemporal process patterns, so that research can proceed to the subsequent steps of identifying factors that influence those patterns.

REFERENCES
Al-Ahmadi, K., Al-Amri, A., & See, L. (2014). A spatial statistical analysis of the occurrence of earthquakes along the Red Sea floor spreading: clusters of seismicity. Arabian Journal of Geosciences, 7(7), 2893–2904.
Anselin, L. (1995). Local indicators of spatial association—LISA. Geographical Analysis, 27(2), 93–115. https://doi.org/10.1111/j.1538-4632.1995.tb00338.x.
Birant, D., & Kut, A. (2007). ST-DBSCAN: An algorithm for clustering spatial-temporal data. Data and Knowledge Engineering, 60(1), 208–221. https://doi.org/10.1016/j.datak.2006.01.013.


Cliff, A. D., & Ord, J. K. (1975). Space-time modelling with an application to regional forecasting. Transactions of the Institute of British Geographers, (64), 119–128.
Cliff, A. D., & Ord, J. K. (1981). Spatial and temporal analysis: Autocorrelation in space and time. Quantitative Geography: A British View, 104–110.
Dale, M. R., & Fortin, M. J. (2014). Spatial Analysis: A Guide for Ecologists. Cambridge University Press.
David, F. N., & Moore, P. G. (1954). Notes on contagious distributions in plant populations. Annals of Botany, 18(1), 47–53. https://doi.org/10.1093/oxfordjournals.aob.a083381.
De La Cruz, J. L., & Gutiérrez, M. A. (2008). Spatial statistics of pitting corrosion patterning: Quadrat counts and the non-homogeneous Poisson process. Corrosion Science, 50(5), 1441–1448.
Diggle, P. J. (1979). On parameter estimation and goodness-of-fit testing for spatial point patterns. Biometrics, 35(1), 87. https://doi.org/10.2307/2529938.
Diggle, P. J. (2013). Statistical Analysis of Spatial and Spatio-Temporal Point Patterns (3rd ed.). Chapman and Hall/CRC.
Douglas, J. B. (1960). Clustering and aggregation. Source: Sankhyā: The Indian Journal of Statistics, Series B, 37.
Du Rietz, G. E. (1929). The fundamental units of vegetation. Proceedings of the International Congress of Plant Sciences, 1(1), 623–627.
Fischer, M. M. (2015). Spatial analysis in geography. International Encyclopedia of the Social & Behavioral Sciences: Second Edition, 94–99. https://doi.org/10.1016/B978-0-08-097086-8.72054-X.
Gao, Y., Cheng, J., Meng, H., & Liu, Y. (2019). Measuring spatio-temporal autocorrelation in time series data of collective human mobility. Geo-Spatial Information Science, 22(3), 166–173. https://doi.org/10.1080/10095020.2019.1643609.
Green, R. H. (1966). Measurement of non-randomness in spatial distributions. Researches on Population Ecology, 8(1), 1–7. https://doi.org/10.1007/BF02524740.
Greig-Smith, P. (1952). The use of random and contiguous quadrats in the study of the structure of plant communities. Annals of Botany, 67, 293–316.
Griffith, D. W. (1981). Interdependence of space and time: Numerical and interpretative considerations. In Dynamic Spatial Models (pp. 258–287). https://www.researchgate.net/publication/266241770.
Hu, Y., Wang, F., Guin, C., & Zhu, H. (2018). A spatio-temporal kernel density estimation framework for predictive crime hotspot mapping and evaluation. Applied Geography, 99, 89–97. https://doi.org/10.1016/j.apgeog.2018.08.001.
Hurlbert, S. H. (1990). Spatial distribution of the montane unicorn. Oikos, 58(3), 257. https://doi.org/10.2307/3545216.
Jacquez, G. M. (1996). A k nearest neighbour test for space-time interaction. Statistics in Medicine, 15(18), 1935–1949. https://doi.org/10.1002/(sici)1097-0258(19960930)15:183.0.co;2-i.
Knox, E. G., & Bartlett, M. S. (1964). The detection of space-time interactions. Source: Journal of the Royal Statistical Society. Series C (Applied Statistics), 13. https://about.jstor.org/terms.


Kulldorff, M., Heffernan, R., Hartman, J., Assunção, R., & Mostashari, F. (2005). A space–time permutation scan statistic for disease outbreak detection. PLoS Medicine, 2(3), e59. https://doi.org/10.1371/journal.pmed.0020059.
Lee, J., Gong, J., & Li, S. (2017). Exploring spatiotemporal clusters based on extended kernel estimation methods. International Journal of Geographical Information Science, 31(6), 1154–1177. https://doi.org/10.1080/13658816.2017.1287371.
Lee, J., & Li, S. (2017). Extending Moran’s index for measuring spatiotemporal clustering of geographic events. Geographical Analysis, 49(1), 36–57. https://doi.org/10.1111/gean.12106.
Lee, J., Li, S., Wang, S., Wang, J., & Li, J. (2020). Spatio-temporal nearest neighbor index for measuring space-time clustering among geographic events. Papers in Applied Geography, 1–14.
Lee, J., & Wong, D. W. S. (2001). GIS and Statistical Analysis with ArcView. John Wiley & Sons.
Lloyd, M. (1967). Mean crowding. The Journal of Animal Ecology. https://about.jstor.org/terms.
Mantel, N. (1967). The detection of disease clustering and a generalized regression approach. Cancer Research, 27, 209–220.
Moran, P. A. P. (1950). Notes on continuous stochastic phenomena. Biometrika. https://doi.org/10.2307/2332142.
Morisita, M. (1961). Measuring of dispersion of individuals and analysis of the distributional patterns. Japanese Journal of Ecology, 11(6), 252. https://doi.org/10.18960/seitai.11.6_252_3.
Reynolds, K. M., & Madden, L. V. (1988). Analysis of epidemics using spatiotemporal autocorrelation. Phytopathology, 78(2), 240–246. https://doi.org/10.1094/phyto-78-240.
Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography. https://doi.org/10.2307/143141.
Wang, Z., & Lam, N. S. N. (2020). Extending Getis–Ord statistics to account for local space–time autocorrelation in spatial panel data. The Professional Geographer, 1–10. https://doi.org/10.1080/00330124.2019.1709215.
Wang, M., Wang, A., & Li, A. (2006). Mining spatial-temporal clusters from geodatabases. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 4093 LNAI, 263–270. https://doi.org/10.1007/11811305_29.

4 Spatiotemporal Nearest Neighbor Analytics

Qingsong Liu
Shenzhen e-Traffic Technology

Jay Lee
Kent State University

CONTENTS
4.1 Introduction
4.2 Nearest Neighbor Index
4.3 Spatiotemporal Nearest Neighbor Index
  4.3.1 The Time Dimension
  4.3.2 Space–Time Nearest Neighbor Index
  4.3.3 STNNI Application
  4.3.4 Some Final Remarks
4.4 Software and Usage
  4.4.1 Installation and Uninstallation
    4.4.1.1 Install QGIS and NNI Plugin
    4.4.1.2 Uninstall
  4.4.2 Run STNNI and NNI Scripts
    4.4.2.1 Space–Time Nearest Neighborhood Index
    4.4.2.2 Spatial Nearest Neighborhood Index
4.5 Concluding Remarks
References
Appendix

4.1 INTRODUCTION

Patterns of distribution of geographic phenomena, such as plant and animal populations, are the basis for researchers to understand the dynamics of geographical objects or events (henceforth geographic events) in the context of environmental influences. The nearest neighbor index (NNI) is one of the earliest and easiest indices for describing the spatial distribution of geographic events. It uses the average nearest neighbor distance between events to assess how, and to what degree, the distribution of events or objects is clustered or dispersed in space. Such an assessment of the degree of clustering or dispersion of a spatial pattern is then used to infer and understand the mechanisms behind the spatial relationships among geographic events and their interactions with the surrounding environment.

Like NNI, many methods have been developed to detect and measure the level of spatial clustering in a set of geographical events. These methods include coefficients and indices that measure the local and global spatial autocorrelation that exists in a set of geographic events. It should be noted, however, that these commonly used methods tend to focus mainly on information in the spatial dimension, while the information along the temporal dimension is often neglected. The absence of temporal information prevents an index from capturing changes in the degree of aggregation or dispersion of events or objects over time. Results calculated without time information may even be misleading, because the temporal distribution of geographic events such as human activities and animal/plant habitats can be inherently heterogeneous. Thus, most of the commonly used indexes need to be extended with information on the time dimension. In this chapter, we start with the simplest, NNI, to introduce the recent contributions made by scholars who extended this index to include the time dimension.

DOI: 10.1201/9781003304395-4

4.2 NEAREST NEIGHBOR INDEX

The nearest neighbor index (NNI) was introduced by two botanists in the 1950s (Clark and Evans 1954). Initially, it was mainly used to describe and test the spatial pattern of plant locations in fieldwork, comparing the observed event/object distribution with random distribution. Through this comparison process, the authors primarily answered whether plants of interest were randomly distributed throughout the study area. They showed that a plant community could form (1) a cluster pattern when there was attraction among locations of plants and (2) a regular pattern when there existed inhibition (i.e., “competition”) among plants. In the following text, we use Complete Spatial Randomness (CSR) to describe the random distribution in space. Intuitively, CSR means that the events or objects are independently and randomly distributed within the study area and are homogeneous. In other words, there are no areas where events or objects are considered more (or less) likely to happen, and there is also no relationship between the occurrence probability of any two events or objects. It is worth noting that, in practice, researchers often use Homogeneous Poisson processes to represent CSR. The calculation of NNI is relatively straightforward. NNI provides a concise measure of point pattern with a single value. The single output value is a ratio that simply divides the observed mean nearest neighbor distance by the expected mean nearest neighbor distance under CSR condition. It is defined as

\[
R = \frac{r_{obs}}{r_{E}} \tag{4.1}
\]

where \(R\) is the NNI value, \(r_{obs}\) denotes the observed mean distance from the selected events or objects to their nearest neighbors, and \(r_{E}\) denotes the expected mean distance between nearest neighbors, assuming the population is distributed randomly. As mentioned above, a random distribution in space follows a Poisson distribution. For a population of \(N\) individuals with a specified density \(\rho\), the expected mean distance \(r_{E}\) can be shown to equal \(1/(2\sqrt{\rho})\). The detailed derivation can be found in Clark and Evans (1954). The NNI ranges from 0 (clustered pattern) through 1 (random pattern) to 2.15 (dispersed/uniform pattern). Note that there MUST be more than 30 events or objects in the distribution to obtain a meaningful NNI.
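As a rough illustration of Equation 4.1, the following Python sketch computes the NNI for a small set of 2-D points by brute force, taking the expected distance as 1/(2√ρ). It is a sketch for reading alongside the text, not the plugin's implementation. The function names are illustrative, and the study area is supplied by the caller rather than derived from the data.

```python
import math


def mean_nearest_neighbor_distance(points):
    """Observed mean distance from each point to its nearest neighbor."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


def nni(points, area):
    """Nearest neighbor index R = r_obs / r_E for 2-D points in a study area."""
    density = len(points) / area                # rho: points per unit area
    r_exp = 1.0 / (2.0 * math.sqrt(density))    # expected mean distance under CSR
    return mean_nearest_neighbor_distance(points) / r_exp


# A 6 x 6 square lattice with unit spacing. Each point is assumed to occupy
# one unit cell, so the study area is taken to be 36 square units.
lattice = [(float(x), float(y)) for x in range(6) for y in range(6)]
print(nni(lattice, area=36.0))  # regular pattern: R is well above 1

# Two tight pairs far apart in a 10 x 10 area.
pairs = [(0.0, 0.0), (0.01, 0.0), (10.0, 10.0), (10.01, 10.0)]
print(nni(pairs, area=100.0))   # clustered pattern: R is close to 0
```

The brute-force search costs O(n²) distance evaluations, which is fine for a few thousand points. A spatial index would be the usual choice for larger sets.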

4.3 SPATIOTEMPORAL NEAREST NEIGHBOR INDEX

Given that NNI considers only the locations of points, its usefulness is limited in many situations. To incorporate the temporal dimension of the data into the calculation, Lee et al. (2020) extended NNI to the spatiotemporal nearest neighbor index (STNNI). The need for incorporating temporal data is apparent: human activities and animal/plant habitats, for example, are inherently unevenly distributed across space and over time because environmental conditions are not the same everywhere and do not remain unchanged over time. The associations between environmental conditions and human/animal/plant dynamics cannot be understood separately from time.

In addition, STNNI provides a powerful tool for evaluating the point pattern of events without needing attribute information about the events or objects. When measuring the level of spatiotemporal autocorrelation to assess how a set of geographic events is clustered in space and over time, existing methods often require interval- or ratio-scale attribute information associated with the studied events. Without such attribute information, conventional tools, such as the spatiotemporal Moran’s index (Lee and Li 2017), become less practical. Given this limitation, commonly used workarounds have been to (1) partition the study area into mutually exclusive but collectively exhaustive spatial units, such as a grid of square units, administrative units, or census units; (2) count the number of events or objects inside each spatial unit and use that frequency as an attribute associated with the unit; and (3) estimate the level of spatiotemporal autocorrelation based on the frequencies associated with the spatial units. Such point-to-area aggregation processes have an apparent shortcoming in that they over-generalize how events or objects are distributed in space and over time. This shortcoming leads to biased estimates of the level of spatiotemporal autocorrelation among the events or objects. In other words, whenever possible, the assessment of the level of spatiotemporal clustering should be based on individual events, not on aggregated events or object clusters. The traditional spatial nearest neighbor index can evaluate the clustering, randomness, or dispersion of a point distribution without information about the attributes connected to the spatial points. Thus, the STNNI proposed by Lee et al. (2020) can be of significant use for this purpose.

Lee et al. (2020) note that the extension of NNI to STNNI is based on mathematical reasoning so as to allow fast calculation. A series of experiments for estimating the maximum value of STNNI can be found in Lee et al.’s work. Section 4.4 demonstrates the tools we have developed for STNNI, which we validated with actual Boston 911 robbery data. It should be noted that STNNI is used to assess the level at which events cluster in space and over time. It is not intended to identify individual spatiotemporal clusters, such as those that can be found by space–time scan (http://satscan.org) or space–time cubes (http://www.esri.com).

4.3.1 The Time Dimension

Geographic research often starts with speculation about the patterns of how and where certain geographic events occurred and whether any socio-economic and/or environmental conditions influence such spatial patterns. Spatial patterns at different times, when linked, form spatial processes. Knowing spatial patterns without a coupled understanding of the consequential spatial processes may lead to biased interpretations of how geographic events evolve. The spatial processes of geographic events of interest can be explored, and often quantified, in terms of how such events or objects are clustered in space and over time. Investigating the spatial processes by which a set of geographic events evolves helps to find ways to promote the influences of underlying factors or to turn around undesirable trends.

Current literature in GIScience and spatial analysis has provided several methods for detecting and measuring the levels of spatial clustering in a set of geographic events. These include a variety of spatial autocorrelation coefficients, including Moran’s index (Moran 1950), Geary’s ratio (Geary 1954), G-Statistics (Getis and Ord 2010), and others for measuring the spatial dependency among geographic events. In addition, Local Indicators of Spatial Association (LISA) (Anselin 1995) and kernel density estimation (Silverman 1986; Scott 2015) have been proposed and are now widely used to detect the locations of spatial clusters. These methods, however, use only spatial data and not the temporal data that describe when the events or objects occurred and how they change over time.

Beyond these methods, there is only a handful of analytics capable of detecting and measuring spatiotemporal clusters in data, for example, the Knox and Mantel tests (Siemiatycki 1978; Jacquez 1996), spatiotemporal and spatial autocorrelation indexes (Cliff and Ord 1975; Lee and Li 2017), spatiotemporal kernel density estimation (Li et al. 2017), space-and-time scans (Kulldorff et al. 1998), and space–time cubes (ESRI, Inc., ArcMap 10.x and later). Global spatiotemporal autocorrelation indexes measure the level at which geographic events of similar properties cluster with one another. Local spatiotemporal autocorrelation indexes assess how spatiotemporal dependency varies in space and over time in different parts of the entire set of events or objects.

The methods and tools mentioned here assume that spatial and temporal units can be integrated by using what is known as a multiplicative principle, which means that the variation in the spatial dimensions and the variation along the temporal dimension can be multiplied so that they are considered together. Furthermore, the mutual influences between the objects or events being analyzed are in most cases assumed to be equal when measuring the level of spatiotemporal autocorrelation (clustering) among them. To better reflect real-world situations, this treatment can be easily extended by adding varying spatial weights to the objects and events being studied.
Although not explicitly discussed here, it should be noted that an alternative way of integrating spatial and temporal data is to standardize both before integration. More discussion on this can be found in Lee and Li (2017) and Lee et al. (2017). As mentioned earlier, to provide another alternative, we extend a widely used index, the (spatial) nearest neighbor index (NNI) (Clark and Evans 1954; Lee and Wong 2001), so that it also incorporates the temporal data associated with the geographic events being analyzed. The extended index allows analysts to explore and visualize a set of spatiotemporally distributed geographic events without attribute information and to see whether the events or objects are more (or less) clustered than would be expected by random chance. This chapter discusses the logical and mathematical reasoning behind the extended index, the spatiotemporal nearest neighbor index (STNNI). We also discuss a series of experiments that test and demonstrate the use of STNNI.

4.3.2 Space–Time Nearest Neighbor Index

A set of space–time events represented by points, e.g., criminal activities, animal/plant habitats, or disease incidents, each has a location and a time at which it occurred or was observed. The spatiotemporal nearest neighbor index (STNNI) measures the degree of space–time (spatiotemporal) clustering or dispersion. With this consideration, a 3-dimensional space \((x, y, t)\) is defined to have an \((x, y)\) plane for the spatial dimensions and \(t\) for the temporal dimension. Together they represent the volume \(v\), which is the studied space. Similar to NNI, STNNI assumes that:

• geographic events being analyzed occur within a finite volume, defined by an \((x, y)\) plane and a temporal dimension \(t\), and
• geographic events can occur anywhere in that finite volume.

With \(v\) defined by \((x, y, t)\), the density of the \(n\) geographic events being analyzed is the number of points per unit volume, or \(n/v\), where \(v\) is the volume of the study space. This density varies from one part of the entire volume to other parts. The mean density is the average of the densities, denoted as \(\rho\). For sparsely distributed geographic events, the Poisson distribution gives the probability of \(n\) events found in the studied space,

\[
p(n) = \frac{m^{n} e^{-m}}{n!} \tag{4.2}
\]

where \(m\) is the mean number of events in the volume \(v\), or \(m = v\rho\). According to Lee et al.’s (2020) derivation,

\[
r_{E} = \frac{0.55396}{\sqrt[3]{\rho}} \tag{4.3}
\]

\[
\mathrm{var}(r_{E}) = \frac{0.04054}{\sqrt[3]{\rho^{2}}} \tag{4.4}
\]

The index values are all positive and asymmetric with respect to \(r_{E}\). To test the statistical significance of the index value under the normality assumption, we assume that any discrepancy between the value calculated for a set of spatiotemporal geographic events (\(r_{obs}\)) and the value expected when the set is randomly distributed (\(r_{E}\)) follows a normal distribution. This is the same logic as testing the statistical significance of a (spatial) nearest neighbor index value. The Z-score is given by

\[
Z = \frac{r_{obs} - r_{E}}{\sqrt{\mathrm{var}(r_{E})}} \tag{4.5}
\]

The above equations allow STNNI to test whether an index value is significantly different from randomness. The statistical significance levels are calculated based on the normality assumption, namely that the observed spatiotemporal distribution of a set of geographic objects or events is a sample from a normal distribution of all possible spatiotemporal distributions of these objects or events. Because the significance levels are defined by equations built from the standard error and the expected index value, there is no need to generate artificially controlled datasets and run simulations on them to establish probability density functions for significance testing.

Finally, the spatiotemporal nearest neighbor index presented here inherits the same constraints and limitations as its spatial version. These include edge effects, which occur when there are objects or events immediately outside (or outside but in close proximity to) the boundary defined for the study area of the set of points being considered. In that case, the calculated STNNI values would be distorted. In addition, STNNI, as discussed here, is limited to first-lag spatiotemporal neighborhoods. Higher-order neighborhoods are dealt with by the spatiotemporal Ripley’s K function, which is discussed in another chapter.

Edge effects arise because we assume an unbounded area when we derive the first-order nearest neighbor distance distribution, whereas the observed nearest neighbor distances are calculated from points within a predefined study area. The nearest neighbor distance calculated for a point close to the boundary can therefore be incorrect, because its true nearest neighbor may be a point just outside the study area. Edge effects lead to overestimation (positive bias) of the mean nearest neighbor distance. Nearest neighbor distances are calculated only for points in the predefined study area, but points that are outside but close to the study area can still be involved in the nearest distance calculation. If we set a large enough buffer around the study area, it is possible to eliminate the boundary effect. However, doing so runs the risk of losing more points from the computation, leading to an increase in the variance of the final result.
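The calculation described in this section can be sketched in a few lines of Python. The sketch reads the expected distance as 0.55396/ρ^(1/3) and its variance as 0.04054/ρ^(2/3), takes the index to be the ratio of observed to expected mean nearest neighbor distance (by analogy with NNI), and uses the square root of the variance in the Z-score. It also assumes straight-line distance in (x, y, t), which presumes the time axis has been scaled to be comparable with the spatial axes. These readings, the function name `stnni`, and the caller-supplied study volume are assumptions for illustration, not the plugin's code.

```python
import math

R_E_COEF = 0.55396  # coefficient of the expected distance (Equation 4.3)
VAR_COEF = 0.04054  # coefficient of its variance (Equation 4.4)


def stnni(points, volume):
    """Return (index, z) for (x, y, t) points within a study volume."""
    n = len(points)
    # Observed mean distance to the nearest neighbor in (x, y, t), brute force.
    r_obs = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    rho = n / volume                          # mean density
    r_exp = R_E_COEF / rho ** (1.0 / 3.0)     # expected mean distance
    var_exp = VAR_COEF / rho ** (2.0 / 3.0)   # variance of the expected distance
    z = (r_obs - r_exp) / math.sqrt(var_exp)  # Z-score (Equation 4.5)
    return r_obs / r_exp, z


# A 4 x 4 x 4 lattice with unit spacing in a volume of 64 cubic units.
lattice = [(float(x), float(y), float(t))
           for x in range(4) for y in range(4) for t in range(4)]
print(stnni(lattice, volume=64.0))   # regular pattern: index above 1, z > 0

# Four events packed together inside a much larger space-time volume.
packed = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, 0.0, 0.1)]
print(stnni(packed, volume=1000.0))  # clustered pattern: index below 1, z < 0
```

Because the result depends on how the volume and the temporal unit are chosen, the same events can yield different index values under different study-volume definitions.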

4.3.3 STNNI Application

STNNI is an effective tool for measuring the level of spatiotemporal clustering. It is especially useful for measuring the degree to which events or objects are related to each other in space and over time. STNNI also enables comparison among sets of geographic events in terms of their spatiotemporal clustering patterns. Here, two sets of disease incidents are used to demonstrate this capability of STNNI. Using STNNI, it is possible to find out whether the disease has become more or less spatiotemporally clustered, which helps the development of intervention programs for this public health issue. The first set is the reported dengue fever cases in Kaohsiung, Taiwan, from 2004 to 2008. Dengue fever is a disease caused by a virus that is carried and spread by mosquitos. It typically prevails in subtropical climate regions and is most prevalent in summer or in warm and wet seasons. In Taiwan, dengue fever cases typically begin to appear in late spring or early summer and recede when winter arrives. Due to the subtropical climate pattern, however, cases can continue to appear until January or early February of the subsequent year. Patients infected with dengue have a high fever, which can be fatal to infant patients. The data, made available by the Kaohsiung Health District, show that the number of cases differs from year to year. Knowing the level of clustering of the disease, the Health District can refer to the climate patterns of the corresponding years to develop effective intervention programs that help reduce the spread of this disease. As shown in Table 4.1, dengue fever cases in the 2006–2007 cycle have the lowest STNNI value and were therefore the most spatiotemporally clustered, even though the result is not statistically significant. As shown in Table 4.1, STNNI reflects the levels of clustering more than spatial NNI does.
Given the values calculated for dengue fever cases over the three yearly cycles listed in Table 4.1, it may be desirable to visualize the space–time distributions of the cases so that the index values can be linked to the visualized distributions. Figure 4.1 shows the three cycles.

TABLE 4.1 STNNI Values of Dengue Fever Incidents in Kaohsiung, Taiwan, 2004–2007

Cycle | STNNI | Z-Score (STNNI) | SNNI | Z-Score (SNNI)
2004–5 | 1.0307 | 0.0845 | 0.6345 | −4.3106
2005–6 | 0.4520 | −1.5077 | 0.3191 | −12.3571
2006–7 | 0.3828 | −1.6981 | 0.3664 | −33.5467
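The STNNI Z-scores in Table 4.1 can be cross-checked from the index values alone. If the expected distance is read as 0.55396/ρ^(1/3), its variance as 0.04054/ρ^(2/3), and the index as the ratio of observed to expected distance, the density cancels and the Z-score reduces to (STNNI − 1) × 0.55396/√0.04054. The Python sketch below applies that reduction to the three tabulated values. The reduction is derived here under those readings rather than quoted from Lee et al. (2020), and the function name is illustrative.

```python
import math

R_E_COEF = 0.55396  # coefficient of the expected distance
VAR_COEF = 0.04054  # coefficient of its variance


def z_from_stnni(index):
    """Z-score implied by an STNNI value once the density has cancelled."""
    return (index - 1.0) * R_E_COEF / math.sqrt(VAR_COEF)


# (STNNI, reported Z-score) pairs copied from Table 4.1.
table_4_1 = {
    "2004-5": (1.0307, 0.0845),
    "2005-6": (0.4520, -1.5077),
    "2006-7": (0.3828, -1.6981),
}

for cycle, (index, z_reported) in table_4_1.items():
    print(cycle, round(z_from_stnni(index), 4), z_reported)
```

The recomputed values agree with the tabulated Z-scores to within 0.001. Under this reading the Z-score does not depend on the number of events, which is consistent with none of the three STNNI values reaching the usual ±1.96 threshold even for the cycle with 766 cases.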

FIGURE 4.1 Space–time distributions of three yearly cycles of reported dengue fever cases: (a) 2004–2005, (b) 2005–2006, (c) 2006–2007.

Note that the vertical axes in three graphics represent time, with earlier time at the bottom and progressing upward. Also, note that the numbers of dengue fever cases are 38 for the 2004–2005 cycle, 90 for the 2005–2006 cycle, and 766 for the 2006–2007 cycle.

4.3.4 Some Final Remarks

As the first step in analyzing the spatial and temporal patterns of a set of geographic events, it is necessary to assess how clustered or dispersed the events or objects are. If they are identified as clustered, the study can explore potential underlying factors that may affect the formation of such spatiotemporal processes. To that end, STNNI provides a way to measure the level of spatial and temporal clustering, considered simultaneously, in a set of space–time events. The calculation of STNNI is simple and fast using the equations provided here. With knowledge of the minimum and maximum values of STNNI and the associated Z values for statistical significance, STNNI is a quick and simple way to assess the level of spatiotemporal clustering of the studied events. STNNI also enables us to compare such clustering levels among multiple sets of events.

Again, STNNI, as presented here, was derived as a direct extension of classic NNI by adding the temporal dimension. The statistical significance of calculated STNNI values can be tested directly, so no simulations are needed. STNNI values are affected by the way the study volume is defined, the temporal units used, and the coordinate system used to define the spatial locations of the events or objects, and they are limited to considering only the first spatiotemporal lag among events or objects.

STNNI advances NNI by incorporating the time dimension in the event data. It is useful because studying spatial patterns alone no longer satisfies the need to understand the spatiotemporal processes of a set of geographic events. Considering the spatial and temporal dimensions of geographic events simultaneously allows us to explore spatiotemporal processes, not just spatial patterns. We wish to note that STNNI is subject to the same limitations as NNI, namely, the issue of properly defining the extent of the study space–time volume. This is similar to the boundary problem in spatial NNI, and analysts need to consider it carefully in order to calculate meaningful STNNI values.
Furthermore, STNNI, as proposed here, is based on spatiotemporal distances between first-order neighbors, just like spatial NNI is based on first-order spatial neighbors. If necessary, distances between secondorder (or higher-order) neighbors can be used to avoid potential bias. The discussion of STNNI for second- or higher-order neighbors is beyond the scope of this chapter. Readers are welcome to communicate with authors to explore further on these topics.
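For comparison, the classic spatial NNI that STNNI extends can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the plugin's code: it uses the Clark and Evans (1954) expected nearest-neighbor distance for a random pattern, 0.5/sqrt(N/A), with standard error 0.26136/sqrt(N²/A), applies no boundary correction, and the function name `nearest_neighbor_index` is hypothetical. The STNNI equations given earlier in the chapter are not reproduced here.

```python
import math

def nearest_neighbor_index(points, area):
    """Classic (spatial) NNI and Z-score; no edge correction (assumption)."""
    n = len(points)
    if n < 2 or area <= 0:
        raise ValueError("need at least two points and a positive area")
    # Observed mean distance from each point to its first-order neighbor.
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points) if j != i
        )
    r_obs = total / n
    # Expected mean distance and standard error under a random pattern.
    r_exp = 0.5 / math.sqrt(n / area)
    se = 0.26136 / math.sqrt(n * n / area)
    return r_obs / r_exp, (r_obs - r_exp) / se

# Four points on the corners of a unit square inside a 2 x 2 study area.
nni, z = nearest_neighbor_index([(0, 0), (1, 0), (0, 1), (1, 1)], area=4.0)
print(f"NNI = {nni:.2f}, Z = {z:.2f}")
```

An index below 1 suggests clustering and an index above 1 suggests dispersion; the Z-score indicates whether the departure from 1 is statistically significant.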

4.4 SOFTWARE AND USAGE

In order to facilitate the understanding and use of STNNI, we implemented the program used to calculate this index in the QGIS environment. For comparison purposes, we also implemented the traditional NNI.

4.4.1 Installation and Uninstallation

The following demonstration shows how to install the plugin into QGIS. If you do not already have QGIS installed, first download and install the latest version of QGIS from https://qgis.org/en/site/forusers/download.html.

4.4.1.1 Install QGIS and NNI Plugin

The STNNI plugin, packaged as Chapter04.zip, can be obtained by emailing spatiotemporal. [email protected] with your name, affiliation information, and a copy of the receipt for purchasing this book. Once downloaded, uncompress Chapter04.zip to a workspace folder.

1. After opening the above link in your browser, click on the Code dropdown menu on the right and click Download Zip. Save the zip file and unzip it into the workspace of your PC. See Figure 4.2.
2. Start QGIS. Click Settings → User Profiles → Open Active Profile Folder. See Figure 4.3.
3. Copy the unzipped STNNI plugin into your Active Profile Folder\processing\scripts.
4. Back in QGIS, add the scripts to the QGIS Processing Toolbox. The Processing Toolbox is the main panel of the processing GUI and the one you are most likely to use in your daily GIS processing work. It lists all available geoprocessing algorithms, and custom models and scripts can be added to extend the algorithm set. In some cases, the Processing Toolbox is not activated by default, and you need to activate it by opening the plugin manager.
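Step 3 can also be done from a script. The sketch below is only an illustration: the function name `install_scripts` and both folder arguments are hypothetical, the scripts are assumed to be Python (`.py`) files, and the profile folder is whatever Settings → User Profiles → Open Active Profile Folder opens on your machine. It copies every `.py` file from the unzipped folder into the profile's `processing/scripts` subfolder.

```python
import shutil
from pathlib import Path

def install_scripts(unzipped_dir, profile_dir):
    """Copy Processing scripts into <profile_dir>/processing/scripts."""
    target = Path(profile_dir) / "processing" / "scripts"
    target.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    copied = []
    for script in sorted(Path(unzipped_dir).glob("*.py")):
        shutil.copy2(script, target / script.name)  # keeps file timestamps
        copied.append(script.name)
    return copied
```

The function returns the names of the files it copied, so an empty list signals that the unzipped folder passed in was not the one holding the scripts.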

FIGURE 4.2 Download STNNI and NNI plugin.

FIGURE 4.3 Open QGIS Active Profile Folder.

FIGURE 4.4 QGIS plugin manager.



FIGURE 4.5 Add scripts to toolbox.



4.4.1.2 Uninstall

You can delete the two scripts from the Toolbox by right-clicking each script and choosing Delete Script…. Note, however, that Delete Script also deletes the script file on disk. Thus, if you are developing your own scripts, please back up your scripts before deleting them from the Toolbox.

4.4.2 Run STNNI and NNI Scripts

FIGURE 4.6 Import the two scripts.

FIGURE 4.7 Added scripts in the toolbox.

4.4.2.1 Space–Time Nearest Neighborhood Index

Like most GIS software toolboxes, the STNNI toolbox interacts with the user through dialog boxes. Once you double-click the Space–time Nearest Neighborhood Index algorithm, the dialog shown in Figure 4.8 appears. Before you click Run, you need to set nine parameters. We briefly explain them below.

FIGURE 4.8 STNNI toolbox.

Input layer: This parameter corresponds to the point layer of interest. In addition to the coordinates of each event, the layer also needs to contain a time field that records when each point incident occurred.

Datetime info field: This parameter corresponds to the time field of the input point layer. The field must be of text (string) type. By default, three forms of time strings are handled:
a. yyyy-MM-dd hh:mm:ss (e.g., 2017-07-05 12:54:23)
b. yyyy-MM-dd (e.g., 2017-02-12)
c. hh:mm:ss (e.g., 23:28:56)

Datetime format: If your time string does not match any of the three formats above, you can define your own format for the time field. For details on how to write the format string, see Table A1 in the Appendix.

Define distance unit in meters: This parameter defines the spatial resolution and makes the calculation more flexible. For example, if two points on the map are 1,000 m apart and this parameter is set to 100 m, the distance between the two points in the calculation is 1,000/100 = 10.

Number of time units and calculation time unit: These two parameters together set the unit of time. This setting is analogous to the distance unit set above.

Base layer: This parameter sets the x–y boundary on the map. The boundary is shifted up and down along the time axis according to the time units, forming a three-dimensional space–time volume.

Use on-the-fly map projection in calculation: This parameter keeps the map projection consistent between layers. The STNNI calculation requires all input layers to be in a projected coordinate system. In QGIS, newly added layers are automatically converted to the coordinate system of the map project, so this requirement can be met without changing the coordinate system of the original layer.

Statistics [optional]: Sets the output result.

We nevertheless recommend manually converting data from a latitude–longitude coordinate system to a projected coordinate system before use. Most importantly, make sure the point layer and the base boundary layer have the same coordinate system.

In the following, we use Boston's 911 calls for robbery during January 2017 to test STNNI. The sample dataset is located in the folder named exampledata (Figure 4.9). The folder contains the robbery incidents in Boston for January, March, May, July, September, and November 2017, and the Boston boundary (City_of_Boston_Boundary_prj.shp).

FIGURE 4.9 Example data and code.

First, add the robbery 911-call point layer (robbery201701.shp) and the boundary layer (City_of_Boston_Boundary_prj.shp) to QGIS. If you add these data to a blank map project, the bottom right corner of the QGIS interface shows that the default projection of the layers is set to “EPSG:26986” (NAD83 / Massachusetts Mainland) (Figure 4.10).

FIGURE 4.10 QGIS interface.

In the Processing Toolbox panel, locate the STNNI tool under Script → Distance Indicators → Space–time Nearest Neighborhood Index (Figure 4.11). Double-click Space–time Nearest Neighborhood Index. Set the input layer to robbery201701.shp and the base layer to City_of_Boston_Boundary_prj.shp. You also need to set the output file to your workspace folder. For the other parameters, we use the default settings for now (Figure 4.12). After setting all the parameters, click the Run button. The output is shown in the Results Viewer panel (Figure 4.13). When you click the File path: link, the result opens in your browser (Figure 4.14). In our demonstration, the STNNI is 1.84 and the Z-score is 2.32.

4.4.2.2 Spatial Nearest Neighborhood Index

For the traditional (spatial) NNI, which does not have a time dimension, the interface is even more straightforward. Before running NNI, you need to set four parameters (Figure 4.15):

FIGURE 4.11 Added scripts.

FIGURE 4.12 STNNI parameter setting.

FIGURE 4.13 STNNI results.

FIGURE 4.14 STNNI results in html format.

Input layer: This parameter corresponds to the point layer of interest. The layer supplies the coordinates of each event; no time field is involved, since spatial NNI has no time dimension.

Define distance unit in meters: This parameter defines the spatial resolution and makes the calculation more flexible. For example, if two points on the map are 1,000 m apart and this parameter is set to 100 m, the distance between the two points in the calculation is 1,000/100 = 10.

FIGURE 4.15 NNI interface.

Projection used in the calculation: This parameter defines the coordinate system used in the calculation. It can differ from the input layer’s coordinate system, but it needs to be a projected coordinate system.

Base layer: This parameter sets the x–y boundary of the study area on the map.

Statistics [optional]: Sets the output result.

We again recommend converting latitude–longitude coordinates to a projected coordinate system before using the data. Most importantly, make sure the point layer and the boundary layer have the same coordinate system.

To examine the difference between the STNNI and NNI results, we again use the Boston 911 robbery-call data as an example. The basic NNI operation steps are the same as for STNNI. We performed the calculation separately for each of the 6 months of data. The results of all calculations are combined in Table 4.2. The first two columns show the STNNI values and their corresponding Z-scores, and the last two columns show the NNI values and their corresponding Z-scores.

TABLE 4.2 Results of STNNI and NNI Using 911-Call of Robbery in Boston

Month      STNNI     Z-Score    NNI       Z-Score
2017.01    1.8435    2.3207     0.6404    −8.34032029
2017.03    1.8374    2.3039     0.582     −7.01674628
2017.05    2.1788    3.2432     0.7178    −5.34435346
2017.07    1.9561    2.6305     0.6409    −6.69561611
2017.09    1.8994    2.4746     0.6371    −7.50995821
2017.11    2.3014    3.5805     0.7507    −4.72175392

4.5 CONCLUDING REMARKS

As the first step in the analysis of the spatial and temporal patterns of a set of geographic events, it is necessary to assess how clustered or how dispersed the events or objects are. If they are identified as clustered, the study can proceed to explore the underlying factors that may shape such spatiotemporal processes. To that end, STNNI measures the level of spatial and temporal clustering, considered simultaneously, that a set of space–time events may have. The calculation of STNNI as discussed in this chapter is fast and straightforward using the equations provided here. With the minimum and maximum values of STNNI known, and with the associated Z-scores for statistical significance, STNNI is a quick and simple way to assess the level of spatiotemporal clustering of the studied events. STNNI also enables us to compare such clustering levels among multiple sets of events.

Again, STNNI, as presented here, has been derived as a direct extension of the classic NNI by adding the temporal dimension. The calculated STNNI values can be tested for statistical significance directly, so no simulations are needed. STNNI values are affected by the way the study volume is defined, the temporal units used, and the coordinate system used to define the spatial locations of the objects or events, and they consider only the first spatiotemporal lag among objects and events.

STNNI advances NNI by incorporating the time dimension in the event data. It is useful because studying spatial patterns alone no longer satisfies the need to understand the spatiotemporal processes of a set of geographic events. Considering the spatial and temporal dimensions of geographic events simultaneously allows us to explore spatiotemporal processes, not just spatial patterns.

We wish to note that STNNI is subject to limitations similar to those of NNI. Namely, the definition of the study volume (as opposed to the study area in two dimensions) faces issues similar to the boundary problems in NNI. These limitations require analysts to consider carefully how the study volume can be properly defined in order to calculate a meaningful STNNI. Furthermore, STNNI is based on spatiotemporal distances between first-order neighbors, just as NNI is based on first-order spatial neighbors. If necessary, distances between second-order (or higher-order) neighbors can be used to avoid potential bias.
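As a small illustration of comparing clustering levels across several event sets, the sketch below screens the Table 4.2 values against the conventional two-tailed 5% critical value of 1.96. The threshold, the labels, and the function name `classify` are assumptions made for illustration and are not part of the plugin. The sketch also assumes that, as for the classic NNI, an index below 1 indicates clustering and an index above 1 indicates dispersion.

```python
# Rows of Table 4.2: (month, STNNI, Z-score, NNI, Z-score).
TABLE_4_2 = [
    ("2017.01", 1.8435, 2.3207, 0.6404, -8.34032029),
    ("2017.03", 1.8374, 2.3039, 0.5820, -7.01674628),
    ("2017.05", 2.1788, 3.2432, 0.7178, -5.34435346),
    ("2017.07", 1.9561, 2.6305, 0.6409, -6.69561611),
    ("2017.09", 1.8994, 2.4746, 0.6371, -7.50995821),
    ("2017.11", 2.3014, 3.5805, 0.7507, -4.72175392),
]

def classify(index, z, z_crit=1.96):
    """Label a nearest neighbor index using its Z-score (assumed 5% level)."""
    if abs(z) < z_crit:
        return "not significantly different from random"
    return "clustered" if index < 1 else "dispersed"

for month, stnni, z_st, nni, z_sp in TABLE_4_2:
    print(month, "| STNNI:", classify(stnni, z_st), "| NNI:", classify(nni, z_sp))
```

Under these assumptions, every month is labeled dispersed by STNNI and clustered by the spatial NNI, which makes the contrast between the two indices in Table 4.2 explicit.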

REFERENCES

Anselin, L. 1995. Local indicators of spatial association-LISA. Geographical Analysis 27 (2): 93–115. https://doi.org/10.1111/j.1538-4632.1995.tb00338.x.
Clark, P. J., and F. C. Evans. 1954. Distance to nearest neighbor as a measure of spatial relationships in populations. Ecology 35 (4): 445–453. https://doi.org/10.2307/1931034.
Cliff, A. D., and J. K. Ord. 1975. Space-time modelling with an application to regional forecasting. Transactions of the Institute of British Geographers 64 (March): 119. https://doi.org/10.2307/621469.
Geary, R. C. 1954. The contiguity ratio and statistical mapping. The Incorporated Statistician 5 (3): 115. https://doi.org/10.2307/2986645.
Getis, A., and J. K. Ord. 2010. The analysis of spatial association by use of distance statistics. Geographical Analysis 24 (3): 189–206. https://doi.org/10.1111/j.1538-4632.1992.tb00261.x.
Jacquez, G. M. 1996. A k nearest neighbour test for space-time interaction. Statistics in Medicine 15 (18): 1935–1949. https://doi.org/10.1002/(SICI)1097-0258(19960930)15:183.0.CO;2–I.
Kulldorff, M., W. F. Athas, E. J. Feurer, B. A. Miller, and C. R. Key. 1998. Evaluating cluster alarms: A space-time scan statistic and brain cancer in Los Alamos, New Mexico. American Journal of Public Health 88 (9): 1377–1380. https://doi.org/10.2105/AJPH.88.9.1377.
Lee, J., and S. Li. 2017. Extending Moran’s index for measuring spatiotemporal clustering of geographic events. Geographical Analysis 49 (1): 36–57. https://doi.org/10.1111/gean.12106.
Lee, J., J. Gong, and S. W. Li. 2017. Exploring spatiotemporal clusters based on extended kernel estimation methods. International Journal of Geographical Information Science 31 (6): 1154–1177. https://dx.doi.org/10.1080/13658816.2017.1287371.
Lee, J., S. Li, S. Wang, J. Wang, and J. Li. 2020. Spatio-temporal nearest neighbor index for measuring space-time clustering among geographic events. Papers in Applied Geography, 1–14. https://doi.org/10.1080/23754931.2020.1810112.
Lee, J., and D. W. S. Wong. 2001. Statistical Analysis with ArcView GIS. New York: Wiley.
Li, S., X. Ye, J. Lee, J. Gong, and C. Qin. 2017. Spatiotemporal analysis of housing prices in China: A big data perspective. Applied Spatial Analysis and Policy 10 (3): 421–433. https://doi.org/10.1007/s12061-016-9185–3.
Moran, P. A. P. 1950. Notes on continuous stochastic phenomena. Biometrika 37 (1/2): 17. https://doi.org/10.2307/2332142.
Scott, D. W. 2015. Multivariate Density Estimation: Theory, Practice, and Visualization. Hoboken, NJ: Wiley.
Siemiatycki, J. 1978. Mantel’s space-time clustering statistic: Computing higher moments and a comparison of various data transforms. Journal of Statistical Computation and Simulation 7 (1): 13–31. https://doi.org/10.1080/00949657808810206.
Silverman, B. W. 1986. Density Estimation for Statistics and Data Analysis. London: Chapman & Hall.

APPENDIX

TABLE A1 Time Format String

Expression    Output
d             The day as a number without a leading zero (1–31)
dd            The day as a number with a leading zero (01–31)
ddd           The abbreviated localized day name (e.g., “Mon” to “Sun”). Uses the system locale to localize the name
dddd          The long localized day name (e.g., “Monday” to “Sunday”). Uses the system locale to localize the name
M             The month as a number without a leading zero (1–12)
MM            The month as a number with a leading zero (01–12)
MMM           The abbreviated localized month name (e.g., “Jan” to “Dec”). Uses the system locale to localize the name
MMMM          The long localized month name (e.g., “January” to “December”). Uses the system locale to localize the name
yy            The year as a two-digit number (00–99)
yyyy          The year as a four-digit number. If the year is negative, a minus sign is prepended, making five characters
h             The hour without a leading zero (0–23 or 1–12 if AM/PM display)
hh            The hour with a leading zero (00–23 or 01–12 if AM/PM display)
H             The hour without a leading zero (0–23, even with AM/PM display)
HH            The hour with a leading zero (00–23, even with AM/PM display)
m             The minute without a leading zero (0–59)
mm            The minute with a leading zero (00–59)
s             The whole second, without any leading zero (0–59)
ss            The whole second, with a leading zero where applicable (00–59)
z             The fractional part of the second, to go after a decimal point, without trailing zeroes (0–999). Thus “s.z” reports the seconds to full available (millisecond) precision without trailing zeroes
zzz           The fractional part of the second, to millisecond precision, including trailing zeroes where applicable (000–999)
t             The time zone (for example, “CEST”)
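The three default time-string forms listed in Section 4.4.2.1 can be mimicked outside QGIS to check data before loading them. The sketch below is an illustration only: it uses Python's `datetime.strptime` codes rather than the expressions in Table A1, the mapping between the two notations is our assumption, and the function name `parse_time_string` is hypothetical.

```python
from datetime import datetime

# Table A1-style expression -> assumed equivalent strptime pattern.
DEFAULT_FORMATS = {
    "yyyy-MM-dd hh:mm:ss": "%Y-%m-%d %H:%M:%S",
    "yyyy-MM-dd": "%Y-%m-%d",
    "hh:mm:ss": "%H:%M:%S",
}

def parse_time_string(text):
    """Try each default form in turn; return (expression, datetime)."""
    for expression, pattern in DEFAULT_FORMATS.items():
        try:
            return expression, datetime.strptime(text.strip(), pattern)
        except ValueError:
            continue  # not this form, try the next one
    raise ValueError(f"unrecognized time string: {text!r}")

for sample in ("2017-07-05 12:54:23", "2017-02-12", "23:28:56"):
    print(parse_time_string(sample))
```

A time-only string is parsed with Python's default date of 1900-01-01, so such values are only comparable with one another, not with full dates.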

5 Spatiotemporal Ripley’s K and L Functions

Jay Lee
Kent State University

CONTENTS
5.1 Introduction
5.2 Concept and Methods
5.2.1 Spatial Ripley’s K Function
5.2.2 Spatiotemporal Ripley’s K Function
5.3 An Example Application
References

DOI: 10.1201/9781003304395-5

5.1 INTRODUCTION

As discussed in earlier chapters, the initial steps in analyzing a spatial process are often to assess whether the geographic objects or events forming the process are clustered, dispersed, or show seemingly no pattern or trend. This has been the focus of many exploratory studies (for example, as discussed in Bailey and Gatrell 1995; Lee and Wong 2001; Delmelle 2009; Hohl et al. 2017). If a spatial pattern is determined to be clustered or dispersed, the logical next step is to explore the underlying influencing factors, whether environmental, socioeconomic, or even political. Knowing this would likely facilitate the formation of policies either to promote the benefits that the spatial process brings about or to slow down or stop the negative effects caused by the evolving patterns or trends.

Also introduced and discussed in earlier chapters are spatiotemporal quadrat analysis and spatiotemporal nearest neighbor analysis. These two spatiotemporal (ST) analytics are conceptually and operationally straightforward to understand and to apply in real-world studies. ST quadrat analysis compares the observed (real) ST distribution to an ST distribution based on its corresponding Poisson distribution (see Lee and Wong 2001). It depends on the density of events in the space–time domain and on how far apart or close together events are. In turn, ST nearest neighbor analysis compares the observed average space–time distance with that of a theoretically constructed random pattern and process (Diggle et al. 1976; Lee and Wong 2001; Hohl et al. 2016). Both analytics are easy to understand and to use. Both provide hints as to whether a space–time distribution of events can be characterized as clustered, dispersed, or neither.

These two ST analytics are, however, not without shortcomings. For spatiotemporal quadrat analysis, any attempt to account for spatially (or spatiotemporally) varying attribute values would encounter difficulties, since the analytic was designed to use only the locations of spatial or spatiotemporal points. In addition, let’s first consider only the spatial case of nearest neighbor analysis, which uses only distances in assessing the level of clustering or dispersion in a point distribution, as in the study by Arbia et al. (2010). While this analytic is useful for that purpose, it does not work well when points are distributed as in Figure 5.1b. Figure 5.1a shows a distribution of points that can be characterized as moderately dispersed overall, or even close to random. In Figure 5.1b, by contrast, each point has a close neighbor, forming small groups of two points. The overall pattern of these two-point groups is dispersed, but a typical (spatial) nearest neighbor analysis would judge the overall pattern in Figure 5.1b as clustered because the observed average nearest neighbor distance is short. The neighboring relationship in Figure 5.1b is a typical case of a second-order nearest neighbor relationship.

FIGURE 5.1 (a) First-order and (b) second-order neighboring relationships.

Ripley’s K function (Ripley 1976; Dixon 2002) is an effective alternative for dealing with a pattern like the one in Figure 5.1b. The K function indicates whether the observed points form a random, clustered, or dispersed/regular pattern, even at second or higher orders. The K function can be applied to a set of points distributed in an n-dimensional space to estimate the second-order property (i.e., variance) embedded in the data. Computing the values of the K function takes into account the numbers of points and the distances between them. It quantifies the degree to which the observed pattern deviates from a random pattern of the same number of points in the same n-dimensional domain. The K function is a second-order analysis of point patterns first proposed and used in two-dimensional space (Haase 1995; Dixon 2013). In other words, second-order effects may be due to spatial dependency, often measured as spatial autocorrelation (see Diggle and Chetwynd 1995). Building on this, a number of studies have been carried out using the spatiotemporal Ripley’s K function (Clements et al. 2012; Dixon 2013; Mollalo et al. 2013).

5.2 CONCEPT AND METHODS

To assess the level of clustering or dispersion in a distribution of geographic objects or events represented as points, Ripley’s K function uses an iterative approach. At each iteration, a series of circles of gradually increasing radii is centered on each event in turn, and the number of other events falling inside each circle is counted. At the end of each iteration, each event is associated with a series of event counts, one per radius. For each radius, an average event count is derived by summing the event counts over all events and dividing the sum by the number of events. After incorporating event intensity and any correction for edge effects, a K function value can be calculated for each radius. Together, the K function values can be plotted as a K function curve.

In Figure 5.2, three hypothetical distributions are shown. In Figure 5.2a, which approximates a uniform distribution, a circle with a given radius is drawn to demonstrate the iterative process of placing circles centered at each event to count the number of other events falling inside the circle. Figure 5.2b approximates a second-order clustering of events, and Figure 5.2c shows an approximately dispersed spatial distribution of events. Given the three distributions, three K function curves can be plotted, as shown in Figure 5.2d. In the plot, the colors of the three curves correspond to the colors of the event points in Figure 5.2a–c. As can be seen in Figure 5.2d, the blue curve fluctuates within a small range of K function values as the radii increase through the iterations (i.e., toward the right side of the chart). This suggests that the distribution in Figure 5.2a approximates a uniform pattern. The red curve in Figure 5.2d shows that the red events in Figure 5.2b form a clustered pattern, with an increasing trend at short radii. The red curve turns downward after reaching a peak, which marks the radius beyond which neighborhood clustering is no longer clearly detectable. Finally, the green curve in Figure 5.2d suggests that the green events are distributed in a dispersed manner. Together, the three hypothetical distributions demonstrate the different distribution types as well as how the shapes of Ripley’s K function curves help us to assess the spatial patterns of these distributions.
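The iterative procedure described above can be sketched as follows. This is a minimal illustration, not the program that accompanies the chapter: the function name `ripley_k_l` is hypothetical, no edge correction is applied (every pair is weighted equally), and the scaling by A/N² and the L transform follow Eqs. 5.2 and 5.3 in Section 5.2.1.

```python
import math

def ripley_k_l(points, area, radii):
    """Return [(h, K(h), L(h))] for 2-D points; no edge correction (w_ij = 1)."""
    n = len(points)
    results = []
    for h in radii:
        # Count ordered pairs (i, j), i != j, with d_ij <= h: for each event,
        # the number of other events inside a circle of radius h around it.
        pairs = sum(
            1
            for i, (xi, yi) in enumerate(points)
            for j, (xj, yj) in enumerate(points)
            if i != j and math.hypot(xi - xj, yi - yj) <= h
        )
        k = area / n**2 * pairs          # Eq. 5.2 with w_ij = 1
        l = math.sqrt(k / math.pi) - h   # Eq. 5.3
        results.append((h, k, l))
    return results

# Two tight pairs of events far apart, in a 10 x 10 study region.
pts = [(1.0, 1.0), (1.5, 1.0), (8.0, 8.0), (8.5, 8.0)]
for h, k, l in ripley_k_l(pts, area=100.0, radii=[1.0, 2.0, 3.0]):
    print(f"h={h:.1f}  K={k:.2f}  L={l:.2f}")
```

In this toy example L(h) is positive at h = 1 and negative at h = 3, reflecting tight pairs that are themselves far apart. The double loop is quadratic in the number of events, which is acceptable for a sketch but slow for large datasets.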

FIGURE 5.2 Hypothetic spatial distributions of events. (a) A uniform pattern, (b) a clustered pattern, (c) a dispersed pattern, and (d) K function curves corresponding to (a), (b), and (c). In panel (d), the vertical axis is K(r) and the horizontal axis is search volume size.

5.2.1 Spatial Ripley’s K Function

Given a set S of N points (each representing an event), an intensity of points λ can be calculated from the n-dimensional domain and N. A spatial Ripley’s K function value K(d) is defined by a distance d and the intensity λ of S (a first-order property) as

$$K(d) = \frac{E(d)}{\lambda} \tag{5.1}$$

where E(d) is the expected number of other points within distance d of a point. For this equation, dividing the total number of points N by the area of a circle with a radius of d (or $\pi d^2$) results in an estimated intensity λ. Values of the K function can be calculated for a series of d, from a small distance to a larger distance with a pre-defined increment, by centering a circle of radius d on each point and counting the number of other points in the event set that fall inside the circle. In this manner, one such point count is found for each point at radius d. For N points, there are N such point counts at each d. The K function at radius d is derived from the average of all such counts at that radius. The K function is a cumulative distribution of the observed point events S with increasing distance. It is expected that $K(d) = \pi d^2$ if the point distribution conforms to complete spatial randomness (CSR). If $K(d) > \pi d^2$, point events are likely clustered. Finally, $K(d) < \pi d^2$ when the points exhibit a regular or dispersed pattern.

At each iteration, a circle of radius d is placed at each point to count the number of other points falling inside the circle. When all points have one such count, the average of all such counts is calculated and associated with radius d. This procedure is repeated after the radius is increased by a pre-defined increment until a pre-determined d is reached. Specifically, writing the radius as h, values of the K function can be calculated as

$$K(h) = \frac{A}{N^2} \sum_{i} \sum_{j \neq i} I_h(d_{ij})\, w_{ij} \tag{5.2}$$

where $d_{ij}$ is the distance between events i and j; A is the size of the study region; and $w_{ij}$ is a factor that corrects for edge effects. The K function is potentially biased because edge effects arise when circles intersect the boundary of the study region. To remove this influence, $w_{ij}$ may be set to 0 if such a circle intersects any part of the boundary of the study region. Other approaches to dealing with edge effects can be found in Haase (1995) and Yamada and Rogerson (2003). Finally, $I_h(d_{ij})$ is an indicator function that defines the neighboring relationship between events i and j: $I_h(d_{ij}) = 1$ if $d_{ij} \le h$, that is, when event j falls inside the circle of radius h centered at event i, and 0 otherwise. The value of the K function increases as the distance h becomes larger. Operationally, values of Ripley’s K function can be transformed to values of Ripley’s L function, which is

$$L(h) = \sqrt{\frac{K(h)}{\pi}} - h \tag{5.3}$$

where $L(h) = 0$ if the spatial pattern of events conforms to a random pattern; $L(h) > 0$ if clustered; and $L(h) < 0$ for a regularly distributed event pattern. Note that

$$L(h) = \sqrt{\frac{K(h)}{\pi}} - h \tag{5.4}$$

$$\left(L(h) + h\right)^2 = \frac{K(h)}{\pi} \tag{5.5}$$

$$K(h) = \pi \left(L(h) + h\right)^2 \tag{5.6}$$

5.2.2 Spatiotemporal Ripley’s K Function

Now, let’s extend the spatial Ripley’s K function to a spatiotemporal Ripley’s K function. In working with events that are defined by both space and time, let’s assume each event has a location (x, y) in space at a certain time t. In a way similar to the spatial Ripley’s K function, a spatiotemporal Ripley’s K function can be formulated as (following Bailey and Gatrell 1995):

$$K(h, t) = \frac{L \cdot R}{n^2} \sum_{i} \sum_{j \neq i} I_{h,t}(d_{ij}, t_{ij})\, w_{ij} \tag{5.7}$$

where $t_{ij}$ is the time difference separating events i and j; $d_{ij}$ is the distance between events i and j; L is the area of the study region (not to be confused with the L function); R is the temporal duration of the study period; n is the number of events; and $w_{ij}$ is the edge-correction factor. The indicator $I_{h,t}(d_{ij}, t_{ij}) = 1$ if $d_{ij} \le h$ and $t_{ij} \le t$, and 0 otherwise. A larger (h, t) interval contributes to an increase in the spatiotemporal K function. If there is no space–time interaction, $K(h, t) = K(h) \cdot K(t)$. Testing for space–time dependence can be done by calculating $K(h, t) - K(h) \cdot K(t)$, as suggested by Gabriel (2014). Similar to the spatial Ripley’s K function, the spatiotemporal Ripley’s K function can be transformed to a spatiotemporal Ripley’s L function as

$$L(h, t) = \sqrt{\frac{K(h, t)}{\pi t}} - h \tag{5.8}$$

where $L(h, t) = 0$ suggests that the space–time distribution is under complete space–time randomness (CSTR); $L(h, t) > 0$ hints at a clustered pattern; and $L(h, t) < 0$ hints at a regular pattern of space–time events.

Finally for our discussion here, please note that Ripley’s K function, whether used to assess a spatial pattern or a spatiotemporal pattern, should not be thought of as a single index for each spatial (or spatiotemporal) pattern. Instead, Ripley’s K function gives us a series of K function values and a series of L function values that form a K trend (curve) and an L trend (curve). Together, these two trends (curves) allow us to assess the event pattern based on the shapes of the curves.
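Equations 5.7 and 5.8 can be sketched in the same style as the spatial case. The code below is an illustration under stated assumptions, not the accompanying program: the function name `st_ripley_k_l` is hypothetical, the edge-correction weights are all set to 1, and the time difference between two events is taken as the absolute difference of their times.

```python
import math

def st_ripley_k_l(events, area, duration, h, t):
    """K(h, t) and L(h, t) for events given as (x, y, time); w_ij = 1."""
    n = len(events)
    pairs = 0
    for i, (xi, yi, ti) in enumerate(events):
        for j, (xj, yj, tj) in enumerate(events):
            if i == j:
                continue
            close_in_space = math.hypot(xi - xj, yi - yj) <= h
            close_in_time = abs(ti - tj) <= t   # assumed definition of t_ij
            if close_in_space and close_in_time:
                pairs += 1                       # indicator equals 1
    k = area * duration / n**2 * pairs           # Eq. 5.7
    l = math.sqrt(k / (math.pi * t)) - h         # Eq. 5.8
    return k, l

# Three events: two close in space and time, one close in space but much later.
events = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (0.5, 0.5, 30.0)]
k, l = st_ripley_k_l(events, area=100.0, duration=30.0, h=2.0, t=5.0)
print(f"K(h,t) = {k:.2f}, L(h,t) = {l:.2f}")
```

Only the first two events count as space–time neighbors here, because the third lies within the search radius but outside the time window. Repeating the call over a grid of (h, t) values yields the K and L surfaces rather than a single index.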

5.3

AN EXAMPLE APPLICATION

A computer program for calculating values of both (1) spatial Ripley’s K function and L function and (2) spatiotemporal Ripley’s K function and L function and the dataset used in this example application are compressed in Chapter 05.zip. This file can be accessed and downloaded by emailing [email protected] with name, affiliation information, and a copy of receipt for purchasing this book. This computer program calculates values of the functions and outputs them as plain ASCII text files. For generating graphics using calculated K and L function values, we suggest a simple Excel spreadsheet can be used to produce charts depicting K and L values in spatial Ripley’s K function. For spatiotemporal K and L function values, either a GIS software that interpolates ( x ,  y, t ,  K ) or ( x ,  y, t ,  L ) to a mesh/grid would work well. In this example, we use Excel to plot the spatial Ripley’s K values and L values into curves. We use Surfer (Golden Software, http://www.goldensoftware.com) to plot the spatiotemporal Ripley’s K and L values. To demonstrate the use of the accompanying computer program for calculating spatial and spatiotemporal K values and L values, an example data file is provided with the computer program. The example data file contains 1,408 reported cases of dengue fevers in a city in south Taiwan. There are four columns in the data file. In the data file, each row is a reported case, and each row has four items. In the data file, the first column has identification for reported cases, the second column has x-coordinates, the third column has y-coordinates, and the fourth column has the times t when cases were reported. Please see Figure 5.3 for the first few rows of the data file. From uncompressing the downloaded zip file, you should see an example data file (as shown in Figure 5.3), AllCases _ dbf _ 4fields. txt, and an executable computer program, KFunctionApp.exe. This

84

Spatiotemporal Analytics

FIGURE 5.3 Example data file for reported dengue fever.

program is only executable in a Windows-based operating system. To run this program, simply double-click the program to initiate it. Please note that a warning message may appear if your desktop computer has the real-time virus scan turned on. To go past this message, please click at the More Info link and then click at the Run Anyway. For those who are keen to be sure the program is safe to run, we suggest you scan it with any virus protection software first. Figure 5.4 shows the link and the button mentioned here. Once the Run Anyway button is clicked, the K Function Application appears. Note that there are three text fields for users to browse the computer for input data file, for naming two output files after the computation is completed. As shown in Figure 5.5, the Input File Browse allows users to navigate to a folder on the computer hard drive and select the data file. The two text boxes are for users to name the output files, one for output from calculating spatial Ripley’s K function and another for output from calculating spatiotemporal Ripley’s K and L functions. Each of the two text boxes for output


FIGURE 5.4 Possible warning messages.

FIGURE 5.5 User interface of K function application program.

file names has an on–off check box to indicate if the program should calculate one or both. In the user interface, an on–off check box for Time Restrict allows users to indicate whether the calculation of spatiotemporal Ripley’s K and L function values should be limited to one-directional consideration between events occurring at different time periods. This option was added

FIGURE 5.6 Curves of K(h) and L(h) values from the example data file. (Chart title: Spatial Ripley’s K and L Functions; horizontal axis: bandwidth h; left vertical axis: K(h); right vertical axis: L(h).)

for situations where users can assume earlier events may influence or affect later events but not vice versa. When this check box is not checked, all events, regardless of their chronological order, are included in the calculation. Below the Time Restrict check box, a sliding bar allows users to set the number of increments (iterations) to be executed in the calculation. For the spatial Ripley’s K and L functions, the number of increments is the number of times the radius of the search circle is increased. Similarly, for the spatiotemporal Ripley’s K and L functions, it is the number of steps by which the search cylinder is enlarged, by simultaneously increasing the radius of the base circle and the height (time) of the cylinder. Finally, click the Process button to start the computation of function values. The bottom of the user interface reports the progress of the computation. Upon seeing the message Status: Done analyzing function(s), the user interface can be closed. Now, let’s take a look at the output files. For 50 increments, Spatial.txt has three columns: Radius (bandwidth), K(h), and L(h). This file can be imported into any charting program. In the example shown in Figure 5.6, Excel was used to produce the chart, which has two vertical axes and a horizontal axis of bandwidth (50 increments of radii). The left axis marks the K(h) values and the right axis marks the L(h) values. In Figure 5.6, it can be seen that K(h) values, being a cumulative series, increase faster when radii are small and much more slowly for larger radii. L(h) values should be read against the 0 mark on the right axis: L(h) > 0 hints at a clustering trend among the events when radii are small, whereas L(h) < 0 at larger radii indicates a more dispersed pattern.
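The relationship between the K(h) and L(h) columns can be illustrated with a short sketch. The Python code below is a hypothetical illustration and not the code of KFunctionApp.exe. It assumes a naive estimator without edge correction, K(h) = A / (n(n − 1)) times the number of ordered pairs of events within distance h, and the common transformation L(h) = sqrt(K(h)/π) − h, which agrees with reading L(h) against the 0 mark. The function name ripley_k_l and the area argument A are assumptions made for this example.

```python
import math


def ripley_k_l(points, radii, area):
    """Return (h, K(h), L(h)) rows for a list of (x, y) points.

    Naive estimator without edge correction (an assumption of this sketch):
    K(h) = area / (n * (n - 1)) * number of ordered pairs (i, j), i != j,
    with distance d_ij <= h, and L(h) = sqrt(K(h) / pi) - h.
    """
    n = len(points)
    # Distance of every unordered pair of distinct events.
    dists = [math.dist(points[i], points[j])
             for i in range(n) for j in range(i + 1, n)]
    rows = []
    for h in radii:
        ordered_pairs = 2 * sum(1 for d in dists if d <= h)
        k = area * ordered_pairs / (n * (n - 1))
        l = math.sqrt(k / math.pi) - h  # > 0 hints at clustering, < 0 at dispersion
        rows.append((h, k, l))
    return rows


# Two events one unit apart inside a study area of size 4.
for h, k, l in ripley_k_l([(0.0, 0.0), (0.0, 1.0)], [0.5, 1.0], area=4.0):
    print(f"{h:.1f}\t{k:.3f}\t{l:.3f}")
```

Each printed row mirrors the three-column layout (radius, K, L) described for Spatial.txt, so the same rows could be charted on two vertical axes.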


FIGURE 5.7 (a) Vertical axis marks K(h) values, radii of the search cylinders are along the y-axis, and the temporal units are along the x-axis. (b) Vertical axis marks the L(h) values, radii of the cylinder base are along the y-axis, and the temporal units are along the x-axis.

From spatiotemporal Ripley’s K and L functions, the output file spatiotemporal.txt contains five columns: (1) IDs, (2) x-coordinates, (3) y-coordinates, (4) K ( h ) values, and (5) L ( h ) values. In the example here, we used Surfer software by Golden Software, Inc. (www.goldensoftware.com) to produce the wireframe three-dimensional diagrams shown in Figure 5.7. Again, upon viewing the diagrams, one should keep in mind that the surface shows the trend of K ( h ) and L ( h ) values as radii (of the cylinder bases) and time increase. For K ( h ) values, pay attention to how they increase over time and through the enlargement of the bases of the search cylinders. For L ( h ) values, make a reference to the 0 values for


clustering/dispersion of the events at different radii and time units. To that end, it is interesting to note that a dent can be seen in the first increment of the L(h) diagram. This may suggest a reduction of clustering even with the first increment applied. In turn, it is very likely that a strong second-order clustering is embedded in the spatiotemporal clustering of the events (i.e., dengue fever cases in this example). One final note on the computer program is that it is designed for users to experiment with different numbers of increments in the calculation. For a finer illustration of how K(h) and L(h) values progress, more increments can be used. In this implementation, the maximum number of increments is 100 and the minimum is 20. The calculation takes longer to complete when the number of increments is set higher.

REFERENCES

Arbia, G., G. Espa, D. Giuliani, and A. Mazzitelli 2010. Detecting the existence of space-time clustering of firms. Regional Science and Urban Economics 40, 311–323.
Bailey, T. and A. Gatrell 1995. Interactive Spatial Data Analysis. Edinburgh Gate, England: Pearson Education Limited.
Clements, R. A., F. P. Schoenberg, and A. Veen 2012. Evaluation of space-time point process models using super-thinning. Environmetrics 23, 606–616.
Delmelle, E. 2009. Point pattern analysis. International Encyclopedia of Human Geography 8, 204–211.
Diggle, P. J., J. Besag, and J. T. Gleaves 1976. Statistical analysis of spatial point patterns by means of distance methods. Biometrics 32, 659–667.
Diggle, P. J. and A. G. Chetwynd 1995. Second-order analysis of space-time clustering. Statistical Methods in Medical Research 4, 124–136.
Dixon, P. M. 2002. Ripley’s K function. Encyclopedia of Environmetrics, 3. Wiley Online Library, pp. 1796–1803.
Dixon, P. M. 2013. Ripley’s K function. Encyclopedia of Environmetrics. New York: John Wiley & Sons.
Gabriel, E. 2014. Estimating second-order characteristics of inhomogeneous spatio-temporal point processes. Methodology and Computing in Applied Probability 16(2), 411–431.
Haase, P. 1995. Spatial pattern analysis in ecology based on Ripley’s K-function: Introduction and methods of edge correction. Journal of Vegetation Science 6(4), 575–582.
Hohl, A., E. Delmelle, W. Tang, and I. Casas 2016. Accelerating the discovery of space-time patterns of infectious diseases using parallel computing. Spatial and Spatio-temporal Epidemiology 19, 10–20.
Hohl, A., M. Zheng, W. Tang, E. Delmelle, and I. Casas 2017. Spatiotemporal point pattern analysis using Ripley’s K function. Geospatial Data Science Techniques and Applications, 155–176.
Lee, J. and D. Wong 2001. Statistical Analysis of Geographical Information with ArcView GIS. New York: John Wiley & Sons.
Mollalo, A., A. Alimohammadi, M. R. Shirzadi, and M. R. Malek 2013. Geographic information system-based analysis of the spatial and spatiotemporal distribution of Zoonotic Cutaneous Leishmaniasis in Golestan Province, North-East of Iran. Zoonoses and Public Health 62, 18–28.
Ripley, B. D. 1976. The second-order analysis of stationary point processes. Journal of Applied Probability 13(2), 255–266.
Yamada, I. and P. A. Rogerson 2003. An empirical comparison of edge effect correction methods applied to K-function analysis. Geographical Analysis 35(2), 97–109.

6 Spatiotemporal Autocorrelation Analytics

Shengwen Li, Xuyang Cheng, Bo Wan, and Junfang Gong
China University of Geosciences

Jay Lee
Kent State University

CONTENTS
6.1 Introduction
6.2 Methodology
6.2.1 Spatial Autocorrelation Moran’s I
6.2.2 Temporal Autocorrelation
6.2.2.1 Global Temporal Moran’s It
6.2.2.2 Localized Temporal Moran’s I
6.2.3 Spatiotemporal Autocorrelation (Temporal and Spatial Moran’s I)
6.2.3.1 Global Spatiotemporal Moran’s Index
6.2.3.2 Localized Spatiotemporal Moran’s Index
6.3 Example Application
6.3.1 Disease Patterns
6.3.2 Simulation Experiments
6.3.2.1 Monte Carlo Simulation Process
6.3.2.2 Sensitivity and Temporal and Spatial Trend Analysis
6.4 Software and User Manual
6.4.1 Moran’s I Tool User Manual
6.4.2 Demonstration of Software Results
6.4.3 Supplementary Explanation
References

DOI: 10.1201/9781003304395-6

6.1 INTRODUCTION

Tobler’s First Law of Geography (Tobler 1970) states that everything is related to everything else, but near things are more related than distant things. If the law stands, when studying geographic events (or objects), we often need to evaluate how the spatial distribution of the events changes over time in order to understand why and how the changes occur, how they may evolve, and how they relate to the surrounding environmental conditions (Cliff and Ord 1969, 1973, 1975). Spatial autocorrelation coefficients are an effective tool for describing the level of spatial clustering among events, and they have attracted much attention over the last few decades because they do so quantitatively (Lopez and Chasco 2007). To test whether the spatial pattern of a set of events shows a clustering or a dispersion trend that differs significantly from a random pattern, Moran’s Index (Moran 1950; Lee and Wong 2001) can be used to measure the degree of clustering, dispersion, or randomness of the events in the study area. This coefficient has been widely used to evaluate the spatial autocorrelation between the attribute values of geographic objects or events. In addition to space, time is another dimension that should not be overlooked when trying to understand geographic phenomena. The autocorrelation between geographic events along the time dimension is helpful for in-depth analysis of how and why geographic events cluster, disperse, or do neither. To account for temporal autocorrelation, Bertazzon (2003) proposed a solution for estimating the value of a spatiotemporal Moran’s Index. This method is based on the spatiotemporal specification of the spatial interaction model. In Bertazzon’s work, the values of the space–time Moran’s Index can be estimated by solving a set of eight simultaneous equations. The coefficient values of the simultaneous equations are arranged as a matrix to illustrate the dependence of space and time (Martin and Oeppen 1975). Temporal autocorrelation and spatial autocorrelation together form the spatiotemporal autocorrelation of geographic events. With the increase in the number of sensing devices that create geographic data, the volume of spatiotemporal data has grown explosively. Because of this growth, spatiotemporal analysis and modeling of geographic data is more important than ever. Since Moran’s Index extends readily, it makes sense to extend it to measure the temporal and spatial autocorrelation of geographic events simultaneously. The rest of this chapter is organized as follows. In Section 6.2, we present the basic framework of the global and local spatiotemporal Moran’s Indices, $I_{st}$ and $I_{sti}$, and offer the key details of technical implementation. In Section 6.3, an application example of the spatiotemporal autocorrelation


model is reported. In Section 6.4, we introduce the software and provide instructions for using the software. Finally, we summarize this discussion with some concluding remarks.

6.2 METHODOLOGY

6.2.1 Spatial Autocorrelation Moran’s I

Moran’s Index (Moran 1950) is arguably the most popular index for measuring global spatial autocorrelation, which is defined as

$$ I = \frac{n \sum_{i} \sum_{j} w_{ij} (a_i - \bar{a})(a_j - \bar{a})}{\sum_{i} \sum_{j} w_{ij} \sum_{i} (a_i - \bar{a})(a_i - \bar{a})} \tag{6.1} $$

where $n$ is the number of geographic events (or objects); $a_i$ and $a_j$ are the values of some attribute of geographic events $i$ and $j$; $\bar{a}$ is the mean value of that attribute over all events; and $w_{ij}$ is the weight assigned to the pair of events $i$ and $j$, usually called a spatial weight. The spatial weight matrix, denoted $W$, is an important tool to quantify the spatial relationship between all pairs of locations among the events. It is usually expressed as an $n$-th order non-negative matrix $W$ as shown below:

$$ W = \begin{bmatrix} w_{11} & w_{12} & w_{13} & \cdots & w_{1n} \\ w_{21} & w_{22} & w_{23} & \cdots & w_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & w_{n3} & \cdots & w_{nn} \end{bmatrix} \tag{6.2} $$

where $n$ is the number of locations in space and $w_{ij}$ represents the spatial relationship between location $i$ and location $j$. The larger the weight value, the stronger the spatial dependence between locations, and vice versa. In spatial measurement, the definition of “distance” can be broad: it includes not only geographic proximity or Euclidean distance but also the proximity of cooperative relationships in the economic sense or the closeness of interpersonal relationships in the sociological sense. Here, we use the Euclidean distance in space as the “distance” between geographic events $i$ and $j$, which is:

$$ d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \tag{6.3} $$

and

$$ w_{ij} = \frac{1}{d_{ij}} \tag{6.4} $$

where $(x_i, y_i)$ are the geographic coordinates of event $i$. Considering the nonstationarity of spatial autocorrelation of some attribute of the set of events, a local indicator of spatial association based on Moran’s I was defined by Anselin (1995). It allows for the decomposition of global Moran’s I into contributions by individual observations (attribute values of the events). The calculation for local indicators of spatial association (LISA) is

$$ I_i = L_i \sum_{j} w_{ij} L_j \tag{6.5} $$

where $L_i$ and $L_j$ are the deviations of attribute values of events $i$ and $j$ from the average attribute value. They are calculated as follows:

$$ L_i = \frac{a_i - \bar{a}}{\delta} = \frac{z_i}{\delta} \tag{6.6} $$

where $\delta$ is the standard deviation of attribute $a$. Based on this, we can get:

$$ I_i = \frac{z_i}{\delta^2} \sum_{j} w_{ij} z_j \tag{6.7} $$

6.2.2 Temporal Autocorrelation

A simple way to measure the temporal autocorrelation is to use the time attribute $t$ of the geographic event as the attribute value $a$ in the $I$ equation above. That is, simply replacing the attribute $a$ with the time $t$ yields the temporal autocorrelation Moran’s Index $I_t$. From this, we formulate the needed equations for calculating the global temporal Moran’s Index and the local temporal Moran’s Index.
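The substitution of $t$ for $a$ can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code of the tool that accompanies this chapter: it uses the inverse-distance weights of Eqs. (6.3) and (6.4), sets $w_{ii} = 0$, and assumes that no two events share the same location. The function names inverse_distance_weights and morans_i are hypothetical.

```python
import math


def inverse_distance_weights(xy):
    """w_ij = 1 / d_ij (Eqs. 6.3 and 6.4); w_ii = 0 is an assumption of this sketch."""
    n = len(xy)
    return [[0.0 if i == j else 1.0 / math.dist(xy[i], xy[j])
             for j in range(n)] for i in range(n)]


def morans_i(values, w):
    """Global Moran's I (Eq. 6.1) for a list of values and a weight matrix w."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    total_weight = sum(sum(row) for row in w)
    return n * cross / (total_weight * sum(d * d for d in z))


# Hypothetical events: coordinates, attribute values a, and times t.
xy = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
a = [1.0, 2.0, 3.0, 4.0]
t = [5.0, 1.0, 4.0, 2.0]
w = inverse_distance_weights(xy)
print("I   =", morans_i(a, w))  # spatial Moran's I of attribute a
print("I_t =", morans_i(t, w))  # temporal Moran's I: t takes the place of a
```

The same function serves both indices because only the value vector changes; the weights remain purely spatial.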

6.2.2.1 Global Temporal Moran’s It

6.2.2.1.1 It Formula Definition

$$ I_t = \frac{n \sum_{i} \sum_{j} w_{ij} (t_i - \bar{t})(t_j - \bar{t})}{W \sum_{i} (t_i - \bar{t})(t_i - \bar{t})} \tag{6.8} $$

It should be noted that $w_{ij}$ is still the spatial weight value of event $i$ and event $j$, defined by the spatial distance between the two events, and that $W$ in the denominator is the sum of all spatial weights, $W = \sum_{i} \sum_{j} w_{ij}$. Let $z_{ti} = t_i - \bar{t}$; then $I_t$ can be defined as follows:

$$ I_t = \frac{n \sum_{i} \sum_{j} w_{ij} z_{ti} z_{tj}}{W \sum_{i} z_{ti}^2} \tag{6.9} $$

6.2.2.1.2 Assumption of Normality

The normality assumption means that the attribute values of the geographic events follow a normal distribution. In this case, the occurrence time of each event is drawn from a normal distribution of possible occurrence times. With the assurance that $I_t$ and $\sum_{i} z_{ti}^2$ (defined previously) remain constant and independent, the expected value of $I_t$ can be obtained as:

$$ E_N(I_t) = \frac{-1}{n - 1} \tag{6.10} $$

Since $I_t$ is an inferential statistic, we cannot take the index at face value; its statistical significance must be determined before the result is used. This is done with a simple hypothesis test, calculating a Z score and its associated p-value for each $I_t$:

$$ Z_N(I_t) = \frac{I_t - E_N(I_t)}{\sqrt{\mathrm{VAR}_N(I_t)}} \tag{6.11} $$

6.2.2.1.3 Assumption of Randomization

Based on the concept of time and space described by Cliff and Ord (1981) and the newly defined time attribute, the expected value of the index under no spatiotemporal autocorrelation is again $E_R(I_t) = -1/(n - 1)$, and the expected value of the squared index, which enters the variance in Eq. (6.18), is:

$$ E_R(I_t^2) = \frac{n\left[(n^2 - 3n + 3) S_1 - n S_2 + 3 S_0^2\right] - b_{t2}\left[(n^2 - n) S_1 - 2n S_2 + 6 S_0^2\right]}{(n - 1)(n - 2)(n - 3) S_0^2} \tag{6.12} $$

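The definitions of $S_0$, $S_1$, $S_2$, and $b_{t2}$ are as follows:

$$ S_1 = \frac{\sum_{i} \sum_{j} (w_{ij} + w_{ji})^2}{2} \tag{6.13} $$

$$ S_2 = \sum_{i} (w_{i\cdot} + w_{\cdot i})^2 \tag{6.14} $$

while $w_{i\cdot} = \sum_{j} w_{ij}$ and $w_{\cdot i} = \sum_{j} w_{ji}$,

$$ S_0 = W = \sum_{i} \sum_{j} w_{ij} \tag{6.15} $$

$$ b_{t2} = \frac{n \sum_{i} (t_i - \bar{t})^4}{\left[\sum_{i} (t_i - \bar{t})^2\right]^2} \tag{6.16} $$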

As discussed before, each index value should be referenced with a standard score Z so that the statistical significance of the index value can be determined. The Z score under the randomization assumption is defined as:

$$ Z_R(I_t) = \frac{I_t - E_R(I_t)}{\sqrt{\mathrm{VAR}_R(I_t)}} \tag{6.17} $$

The variance of $I_t$ under the randomization assumption is calculated as:

$$
\begin{aligned}
\mathrm{VAR}_R(I_t) &= \frac{n\left[(n^2 - 3n + 3) S_1 - n S_2 + 3 S_0^2\right] - b_{t2}\left[(n^2 - n) S_1 - 2n S_2 + 6 S_0^2\right]}{(n - 1)(n - 2)(n - 3) S_0^2} - \left[E_R(I_t)\right]^2 \\
&= \frac{n\left[(n^2 - 3n + 3) S_1 - n S_2 + 3 S_0^2\right] - b_{t2}\left[(n^2 - n) S_1 - 2n S_2 + 6 S_0^2\right]}{(n - 1)(n - 2)(n - 3) S_0^2} - \frac{1}{(n - 1)^2}
\end{aligned}
\tag{6.18}
$$
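The chain from $S_0$, $S_1$, $S_2$, and $b_{t2}$ to the Z score of Eq. (6.17) can be followed in a short Python sketch. It is a hedged illustration and not the chapter’s software: the function name z_randomization is hypothetical, the weight matrix is passed as a list of lists, and $E_R(I_t) = -1/(n - 1)$ is taken from the second line of Eq. (6.18). At least four events are needed because of the factor $(n - 3)$.

```python
import math


def z_randomization(index_value, times, w):
    """Z score of a global temporal Moran's I under randomization (Eqs. 6.13-6.18)."""
    n = len(times)
    mean = sum(times) / n
    z = [t - mean for t in times]
    b2 = n * sum(d ** 4 for d in z) / sum(d ** 2 for d in z) ** 2        # Eq. 6.16
    s0 = sum(sum(row) for row in w)                                      # Eq. 6.15
    s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2
                   for i in range(n) for j in range(n))                  # Eq. 6.13
    s2 = sum((sum(w[i]) + sum(w[j][i] for j in range(n))) ** 2
             for i in range(n))                                          # Eq. 6.14
    expected = -1.0 / (n - 1)
    first = n * ((n * n - 3 * n + 3) * s1 - n * s2 + 3 * s0 ** 2)
    second = b2 * ((n * n - n) * s1 - 2 * n * s2 + 6 * s0 ** 2)
    expected_sq = (first - second) / ((n - 1) * (n - 2) * (n - 3) * s0 ** 2)
    variance = expected_sq - expected ** 2                               # Eq. 6.18
    return (index_value - expected) / math.sqrt(variance)                # Eq. 6.17


# Hypothetical case: five events in a row, each adjacent only to its neighbours.
w = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(5)] for i in range(5)]
print(z_randomization(0.5, [1.0, 2.0, 3.0, 4.0, 5.0], w))
```

For this small example the index value 0.5 passed in is the $I_t$ that Eq. (6.9) gives for the same times and weights.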

6.2.2.2 Localized Temporal Moran’s I

Similar to the above-mentioned global temporal Moran’s Index, the localized temporal Moran’s Index can also be calculated as:

$$ I_{ti} = u_i \sum_{j} w_{ij} u_j \tag{6.19} $$

where $u_i$ and $u_j$ are the deviations of the corresponding $t$ values from the mean value:

$$ u_i = \frac{t_i - \bar{t}}{\delta_t} \tag{6.20} $$

In the equation, $\delta_t$ is the standard deviation of the values of the $t$ variable. In this case, the expected values of $I_{ti}$ under the normality and randomization assumptions can be calculated as:

$$ E_N(I_{ti}) = \frac{w_i}{n} \tag{6.21} $$

$$ E_R(I_{ti}) = \frac{w_i}{n - 1} \tag{6.22} $$

where $w_i = \sum_{j} w_{ij}$.

Finally, the Z values under the normality assumption and the randomization assumption can be calculated as:

$$ Z_N(I_{ti}) = \frac{I_{ti} - E_N(I_{ti})}{\sqrt{\mathrm{VAR}_N(I_{ti})}} \tag{6.23} $$

$$ Z_R(I_{ti}) = \frac{I_{ti} - E_R(I_{ti})}{\sqrt{\mathrm{VAR}_R(I_{ti})}} \tag{6.24} $$

6.2.3 Spatiotemporal Autocorrelation (Temporal and Spatial Moran’s I)

Next, we extend the classic Moran’s Index, taking into account both the spatial pattern of attribute values and the temporal pattern of geographic events. This extension considers that temporal adjacency and spatial adjacency may coexist, so we assume that the spatial weights and the temporal weights are multiplied to derive spatiotemporal weights, following the multiplicative principle (for example, as in Schertzer and Lovejoy 1987).

6.2.3.1 Global Spatiotemporal Moran’s Index

Time adjacency can be defined by calculating the time difference between two events. This means:

$$ t_{ij} = \frac{1}{|t_i - t_j|} \tag{6.25} $$

We can also set a threshold $t_\theta$ and convert it to a binary value:

$$ t_{ij} = \begin{cases} 1, & |t_i - t_j| \le t_\theta \\ 0, & |t_i - t_j| > t_\theta \end{cases} \tag{6.26} $$

Please note that temporal adjacency can be mutual between two temporally adjacent events. However, the implied influence of one event on another may or may not be mutual. If it is not mutual, the calculation of the time difference can be set to be one-directional.

6.2.3.1.1 Spatiotemporal Weight Matrix

Spatiotemporal Moran’s Index formula

For event $i$ and event $j$, the spatiotemporal weight is defined as $w_{ij} t_{ij}$. The spatiotemporal Moran’s Index $I_{st}$ can take the following form:

$$ I_{st} = \frac{n \sum_{i} \sum_{j} w_{ij} t_{ij} (a_i - \bar{a})(a_j - \bar{a})}{\sum_{i} \sum_{j} w_{ij} t_{ij} \sum_{i} (a_i - \bar{a})(a_i - \bar{a})} \tag{6.27} $$

In order to simplify the formula, let $v_{ij} = w_{ij} t_{ij}$ and $z_i = a_i - \bar{a}$; the calculation of $I_{st}$ then becomes:

$$ I_{st} = \frac{n \sum_{i} \sum_{j} v_{ij} z_i z_j}{\sum_{i} \sum_{j} v_{ij} \sum_{i} z_i^2} \tag{6.28} $$
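The multiplicative weights and Eq. (6.28) can be sketched as follows. This Python sketch makes several assumptions that are not part of the chapter: the function names spatiotemporal_weights and morans_i_st are hypothetical, $v_{ii}$ is set to 0, all locations are distinct, and, when the reciprocal form of Eq. (6.25) is used, all times are distinct as well.

```python
import math


def spatiotemporal_weights(xy, t, t_theta=None):
    """v_ij = w_ij * t_ij with w_ij = 1 / d_ij (Eq. 6.4).

    t_ij is 1 / |t_i - t_j| (Eq. 6.25) when t_theta is None; otherwise it is
    the binary weight of Eq. 6.26. v_ii = 0 is an assumption of this sketch.
    """
    n = len(xy)
    v = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w_ij = 1.0 / math.dist(xy[i], xy[j])
            dt = abs(t[i] - t[j])
            t_ij = 1.0 / dt if t_theta is None else float(dt <= t_theta)
            v[i][j] = w_ij * t_ij
    return v


def morans_i_st(a, v):
    """Global spatiotemporal Moran's I (Eq. 6.28)."""
    n = len(a)
    mean = sum(a) / n
    z = [x - mean for x in a]
    cross = sum(v[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return n * cross / (sum(sum(row) for row in v) * sum(d * d for d in z))


# Hypothetical events on a line, one time unit apart.
xy = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
t = [0.0, 1.0, 2.0, 3.0]
a = [1.0, 2.0, 3.0, 4.0]
print(morans_i_st(a, spatiotemporal_weights(xy, t)))               # Eq. 6.25 weights
print(morans_i_st(a, spatiotemporal_weights(xy, t, t_theta=1.0)))  # Eq. 6.26 weights
```

A one-directional variant, as mentioned above, could set $t_{ij}$ to 0 whenever event $j$ occurs before event $i$; that choice is left out of the sketch.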

Normality assumption

To calculate the variance of the index (if no spatiotemporal autocorrelation exists among the events), we need the expected value of the square of the index value, $E_N(I_{st}^2)$. This is required for the Z score used to test the statistical significance of a calculated index value. It can be obtained with the following:

$$ E_N(I_{st}^2) = \frac{n^2 V_1 - n V_2 + 3 V_0^2}{(n - 1)(n + 1) V_0^2} \tag{6.29} $$

where

$$ V_0 = V = \sum_{i} \sum_{j} v_{ij} \tag{6.30} $$

$$ V_1 = \frac{\sum_{i} \sum_{j} (v_{ij} + v_{ji})^2}{2} \tag{6.31} $$

$$ V_2 = \sum_{i} (v_{i\cdot} + v_{\cdot i})^2 \tag{6.32} $$

and $v_{i\cdot} = \sum_{j} v_{ij}$, $v_{\cdot i} = \sum_{j} v_{ji}$.

Randomization assumption

Under the randomization assumption, the expected value of the spatiotemporal Moran’s I is:

$$ E_R(I_{st}) = -\frac{1}{n - 1} \tag{6.33} $$

Replace the spatial weight matrix with the spatiotemporal weight matrix to get:

$$ E_R(I_{st}^2) = \frac{n\left[(n^2 - 3n + 3) V_1 - n V_2 + 3 V_0^2\right] - b_2\left[(n^2 - n) V_1 - 2n V_2 + 6 V_0^2\right]}{(n - 1)(n - 2)(n - 3) V_0^2} \tag{6.34} $$

where

$$ b_2 = \frac{n \sum_{i} (a_i - \bar{a})^4}{\left[\sum_{i} (a_i - \bar{a})^2\right]^2} \tag{6.35} $$

Z equations

Under the normality assumption and the randomization assumption, respectively, the Z scores are:

$$ Z_N(I_{st}) = \frac{I_{st} - E_N(I_{st})}{\sqrt{\mathrm{VAR}_N(I_{st})}} \tag{6.36} $$

$$ Z_R(I_{st}) = \frac{I_{st} - E_R(I_{st})}{\sqrt{\mathrm{VAR}_R(I_{st})}} \tag{6.37} $$
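For the normality case, Eqs. (6.29) to (6.32) and (6.36) combine as in the sketch below. Two assumptions are made here that the text does not state explicitly: the variance is taken as $\mathrm{VAR}_N(I_{st}) = E_N(I_{st}^2) - [E_N(I_{st})]^2$, and $E_N(I_{st})$ is taken as $-1/(n - 1)$, the same form as Eq. (6.10). The function name z_normality_st is hypothetical.

```python
import math


def z_normality_st(i_st, v):
    """Z score of I_st under normality from a spatiotemporal weight matrix v."""
    n = len(v)
    v0 = sum(sum(row) for row in v)                                  # Eq. 6.30
    v1 = 0.5 * sum((v[i][j] + v[j][i]) ** 2
                   for i in range(n) for j in range(n))              # Eq. 6.31
    v2 = sum((sum(v[i]) + sum(v[j][i] for j in range(n))) ** 2
             for i in range(n))                                      # Eq. 6.32
    expected = -1.0 / (n - 1)  # assumed, by analogy with Eq. 6.10
    expected_sq = (n * n * v1 - n * v2 + 3 * v0 ** 2) / ((n - 1) * (n + 1) * v0 ** 2)  # Eq. 6.29
    variance = expected_sq - expected ** 2  # assumed definition of VAR_N
    return (i_st - expected) / math.sqrt(variance)                   # Eq. 6.36


# Hypothetical weights: five events, each linked only to its neighbours.
v = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(5)] for i in range(5)]
print(z_normality_st(0.5, v))
```

Unlike the randomization case, this Z score depends only on the weights and the index value, not on the kurtosis term $b_2$.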

6.2.3.2 Localized Spatiotemporal Moran’s Index

The overall process of extending the localized Moran’s Index to a localized spatiotemporal Moran’s Index is similar to that of extending the global spatial Moran’s Index to the spatiotemporal Moran’s Index. Without going into too much detail, only the final formulas are given here. Detailed discussions can be found in Lee and Li (2017).

$$ I_{sti} = L_i \sum_{j} t_{ij} w_{ij} L_j \tag{6.38} $$

can be simplified to:

$$ I_{sti} = \frac{z_i}{\delta^2} \sum_{j} v_{ij} z_j \tag{6.39} $$
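Eq. (6.39) translates almost directly into code. The sketch below is illustrative only: the function name local_morans_i_st is hypothetical, and $\delta^2$ is computed as the population variance (dividing by $n$), which is an assumption because the text does not say which variance estimator is used.

```python
def local_morans_i_st(a, v):
    """Local spatiotemporal Moran's I for every event (Eq. 6.39)."""
    n = len(a)
    mean = sum(a) / n
    z = [x - mean for x in a]
    variance = sum(d * d for d in z) / n  # delta^2, population form (assumption)
    return [z[i] / variance * sum(v[i][j] * z[j] for j in range(n))
            for i in range(n)]


# Hypothetical weights: four events, each linked only to its neighbours.
v = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(4)] for i in range(4)]
local = local_morans_i_st([1.0, 2.0, 3.0, 4.0], v)
print(local)
# With the population variance, the local values sum to V0 times the global
# I_st of Eq. (6.28), which ties the local index back to the global one.
print(sum(local) / sum(sum(row) for row in v))
```

Dividing by $n - 1$ instead of $n$ in the variance would rescale every local value by the same factor and leave their relative ordering unchanged.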

Z equations

$$ Z_N(I_{sti}) = \frac{I_{sti} - E_N(I_{sti})}{\sqrt{\mathrm{VAR}_N(I_{sti})}} \tag{6.40} $$

$$ Z_R(I_{sti}) = \frac{I_{sti} - E_R(I_{sti})}{\sqrt{\mathrm{VAR}_R(I_{sti})}} \tag{6.41} $$

6.3 EXAMPLE APPLICATION

6.3.1 Disease Patterns

To illustrate the spatiotemporal autocorrelation in spatiotemporal data, here we use data from Sina Weibo in 2014 (https://www.techinasia.com/weibo-2014-176-million-monthly-activeusers/) (http://politics.people.com.cn/n/2014/0526/c1001-25065640.html) to demonstrate the calculation and use of the spatiotemporal Moran’s Index discussed here. The data used in this case are the Weibo comment data related to an outbreak of dengue in China in 2014. Dengue fever is an infectious disease caused by the dengue virus, with strong transmission ability. Around 2013, dengue fever spread rapidly in many countries. Early detection of diseases and understanding of their temporal and spatial patterns help us assess the risks posed by new outbreaks and alleviate the seasonal and epidemic effects of this disease. Here we choose to use Sina Weibo’s microblog data about dengue fever in Guangdong to study their spatial and temporal clustering patterns and to further verify the feasibility and practicality of the spatiotemporal version of the Moran’s Index. We downloaded a total of 1,908 Weibo posts for analysis. The Weibo microblogs (posts) used in the experiment have no attribute values in the traditional sense. However, each Weibo microblog had a quantitative indicator, which is the number of comments posted on the post. That was


used to measure the influence of posts. Specifically, the number of comments of a posted microblog is the number of comments received for it, from the time it was posted to the time it was deleted. Secondly, each microblog had a timestamp, that is, the time of publication, and the coordinates of its corresponding microblog account (recorded when the account was registered). Therefore, here we took the number of comments as the attribute value a of a posted microblog, its release time as the corresponding time attribute t, and the coordinates of the Weibo account as the spatial information. As shown in Figure 6.1, the microblogs are represented by dots, whose distribution represents the spatial distribution of the microblogs. The different sizes of the circular symbols indicate the numbers of comments received for the corresponding microblogs. In addition, colors indicate the time when the Weibo microblogs were posted, with the release time of the first article on dengue fever in Guangdong as the time origin and days as the time unit. As can be seen in Figure 6.1, the Weibo posts are not evenly distributed in either time or space. Therefore, this data set can be used to test the feasibility and practicability of the Moran’s Index. We defined six spatial weight matrices. In the first, each spatial weight is inversely proportional to the distance between a pair of events. In order to avoid excessive weights due to very short distances, we aggregated all the locations within 0.1 km of each other into the same location. The other five weighting schemes are defined using distance thresholds of 1, 2, 3, 4, and 5 km, respectively.

FIGURE 6.1 Temporal and space distribution of Weibo.

$$ w_{ij} = \begin{cases} 1, & d_{ij} < d_\theta \\ 0, & d_{ij} \ge d_\theta \end{cases} \tag{6.42} $$

Tables 6.1 and 6.2 show the calculation results. The first three columns after the case labels list the values of the classic Moran’s Index and its Z scores, calculated using a spatial weight that is inversely proportional to the distances between events (first row) and the weighting schemes using the five distance thresholds (remaining rows). The second group of columns lists the results of calculating the temporal autocorrelation. The rightmost columns list the spatiotemporal autocorrelation index values calculated based on the equations discussed in the previous sections. Using the same data set, the calculated index values are all statistically significant, which means that there is spatial clustering and temporal clustering, and the clustering exists for spatial and temporal autocorrelation simultaneously. In addition, among the three Moran’s Indices, the spatiotemporal Moran’s Index values are the highest, which means that this index is more capable of measuring the level of spatiotemporal clustering. It is worth noting that the calculated index values in Tables 6.1 and 6.2 cannot be directly compared between spatial, temporal, or spatiotemporal autocorrelation coefficients, because they each measure different things. So far, the analysis shows a strong spatiotemporal clustering pattern in the Weibo data set. Both spatial and temporal trends show positive autocorrelation. These trends are further verified by the positive spatiotemporal autocorrelation shown in the data set. Specifically, they are verified by using spatial weights that are inversely proportional to the distances between paired events with a predefined distance threshold. With the help of the extended equations for calculating the local spatiotemporal autocorrelation, we calculated the Z score of a local spatiotemporal autocorrelation index value for each microblog (the temporal threshold was set to 7 days and the distance threshold was set to 5 km). The spatial weight was assigned to be inversely proportional to the distances between paired microblogs (events). In order to visualize the trend of the calculated indicator values (since point values are difficult to display graphically), the point-based indicator values were spatially interpolated using the inverse distance weighting (IDW) function to generate Figure 6.2 (under the normality assumption) and Figure 6.3 (under the randomization assumption). For the maps shown in Figures 6.2 and 6.3, the pixel size was set to 2 km, and the search radius was set to be variable, up to 12 points. Figures 6.2a and 6.3a show the Z scores of the local Moran’s Index when only spatial weights were used. Figures 6.2b and 6.3b show the Z scores of the local Moran’s Index when only temporal weights were used. Figures 6.2c and 6.3c are the results of combining spatial and temporal weights, which demonstrate the effectiveness and practicality of the new index combining space

TABLE 6.1 Comparison of the Performance of Different Forms of Moran’s Index Using the 2014 Guangdong Province Dengue Fever Weibo Data (the Threshold is 7 Days)

Column groups: Classic (I, ZN, ZR); t in Place of a (It, ZN, ZR); ∑ wij tij (IUTI, ZN, ZR); ∑ wij ∑ tij (IITI, ZN, ZR).

| Case# | I | ZN | ZR | It | ZN | ZR | IUTI | ZN | ZR | IITI | ZN | ZR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| wij = 1/dij | 0.063 | 4.634 | 4.689 | 0.080 | 5.877 | 5.877 | 0.099 | 4.066 | 4.115 | 0.120 | 4.062 | 4.111 |
| wij = 1 if dij < 1 km | −0.001 | −3.056 | −3.092 | −0.001 | −4.083 | −4.083 | 0.000 | 0.844 | 0.853 | 0.000 | 0.844 | 0.852 |
| wij = 1 if dij < 2 km | −0.001 | −2.525 | −2.554 | −0.001 | −2.277 | −2.277 | 0.000 | 0.890 | 0.899 | 0.000 | 0.890 | 0.899 |
| wij = 1 if dij < 3 km | −0.001 | −2.761 | −2.791 | −0.001 | −2.980 | −2.980 | 0.000 | 0.675 | 0.682 | 0.000 | 0.675 | 0.682 |
| wij = 1 if dij < 4 km | −0.001 | −2.723 | −2.752 | −0.001 | −4.162 | −4.162 | 0.000 | 0.693 | 0.701 | 0.000 | 0.694 | 0.701 |
| wij = 1 if dij < 5 km | −0.001 | −2.559 | −2.584 | −0.001 | −4.490 | −4.490 | 0.000 | 0.673 | 0.680 | 0.000 | 0.673 | 0.680 |

TABLE 6.2 Comparison of the Performance of Different Forms of Moran’s Index Using the 2014 Guangdong Dengue Fever Weibo Data (Threshold is 14 Days)

Column groups: Classic (I, ZN, ZR); t in Place of a (It, ZN, ZR); ∑ wij tij (IUTI, ZN, ZR); ∑ wij ∑ tij (IITI, ZN, ZR).

| Case# | I | ZN | ZR | It | ZN | ZR | IUTI | ZN | ZR | IITI | ZN | ZR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| wij = 1/dij | 0.063 | 4.634 | 4.689 | 0.080 | 5.877 | 5.877 | 0.086 | 4.854 | 4.912 | 0.094 | 4.851 | 4.909 |
| wij = 1 if dij < 1 km | −0.001 | −3.056 | −3.092 | −0.001 | −4.083 | −4.083 | 0.000 | 1.046 | 1.054 | 0.000 | 1.045 | 1.053 |
| wij = 1 if dij < 2 km | −0.001 | −2.525 | −2.554 | −0.001 | −2.277 | −2.277 | 0.000 | 1.097 | 1.106 | 0.000 | 1.097 | 1.105 |
| wij = 1 if dij < 3 km | −0.001 | −2.761 | −2.791 | −0.001 | −2.980 | −2.980 | 0.000 | 0.877 | 0.884 | 0.000 | 0.878 | 0.884 |
| wij = 1 if dij < 4 km | −0.001 | −2.723 | −2.752 | −0.001 | −4.162 | −4.162 | 0.000 | 0.863 | 0.869 | 0.000 | 0.863 | 0.870 |
| wij = 1 if dij < 5 km | −0.001 | −2.559 | −2.584 | −0.001 | −4.490 | −4.490 | 0.000 | 0.666 | 0.672 | 0.000 | 0.667 | 0.672 |

and temporal autocorrelation. Through this visualization, it is also possible to identify hot and cold spots in space and over time. In addition, Figures 6.2c and 6.3c show very different results, which highlights the difference between the outcomes calculated under the normality assumption and those calculated under the randomization assumption. Therefore, for both the classic Moran’s Index and the extended Moran’s Index, the choice between the normality assumption and the randomization assumption needs to be considered carefully.
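The interpolation step behind Figures 6.2 and 6.3 can be sketched with a basic inverse distance weighting function. The Python code below is a hypothetical illustration, not the routine used to produce the figures: the function name idw is an assumption, the power of 2 is an assumed default because the text does not state the exponent, and k = 12 mirrors the "up to 12 points" search setting.

```python
import math


def idw(sample_xy, sample_values, target, k=12, power=2.0):
    """Inverse distance weighted estimate at target from the k nearest samples."""
    nearest = sorted(
        (math.dist(p, target), value)
        for p, value in zip(sample_xy, sample_values)
    )[:k]
    if nearest[0][0] == 0.0:
        return nearest[0][1]  # target coincides with a sample point
    weights = [1.0 / d ** power for d, _ in nearest]
    return sum(w * value for w, (_, value) in zip(weights, nearest)) / sum(weights)


# Hypothetical local Z scores at three microblog locations (coordinates in km).
xy = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
z_scores = [1.0, 3.0, -2.0]
# Estimate at the center of one 2 km pixel.
print(idw(xy, z_scores, (1.0, 0.0)))
```

Evaluating idw at the center of every 2 km pixel would yield a raster comparable in structure to the maps described above.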

6.3.2 Simulation Experiments

Experimental steps:



FIGURE 6.2 The distribution of Z scores under the assumption of normality based on the reciprocal of distance for (a) spatial autocorrelation $I$; (b) temporal autocorrelation $I_t$; and (c) temporal and spatial autocorrelation $I_{sti}$.

FIGURE 6.3 Distribution of Z scores under the assumption of randomization and based on the reciprocal of distance for (a) spatial autocorrelation $I$; (b) temporal autocorrelation $I_t$; and (c) temporal and spatial autocorrelation $I_{sti}$.


with different spatial autocorrelation levels and different levels of temporal autocorrelation through the randomly generated point set and their associated attribute values and timestamps.

6.3.2.1 Monte Carlo Simulation Process

a. Set the target spatial (or temporal) autocorrelation level (for example, 0.1, 0.2, etc.).
c. Randomly select two of the 100 points and swap their spatial (or temporal) attribute values.
d. Recalculate the spatial (or temporal) autocorrelation level after the swap.
e. Compare the recalculated spatial (or temporal) autocorrelation level with the current level.
   i. If the recalculated value is closer to the target level than the current level, keep the swap and set the current level to the recalculated level.
   ii. If the recalculated value is not closer to the target level, swap the values back.
f. Check whether the target level is reached, or whether the difference between the current level and the target level is less than a preset range.
   i. If reached, stop the simulation and output the data set, including each point’s x, y, t, and a.
   ii. If the target level is not reached, return to step c to select another pair of points.

Through the above process, a total of seven data sets were generated to simulate spatial autocorrelation. The levels were set to −0.1, 0.0, 0.1, 0.2, 0.3, 0.4, and 0.5. For temporal autocorrelation, a similar process was used to generate another seven data sets with the same levels. Theoretically, simulations could produce data sets with spatial and temporal autocorrelations less than −0.1 or greater than 0.5. However, reaching levels beyond −0.1 and 0.5 would be computationally very expensive. Therefore, we limited the experiments to generating simulated patterns between −0.1 and 0.5. In the end, we generated 49 sets of simulated data, one for each combination of the specified spatial and temporal autocorrelation levels.
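The swap-and-compare loop of steps c to f can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the authors’ simulation code: the function names, the tolerance tol, the swap limit max_swaps, and the 5 × 5 grid with rook adjacency are all hypothetical choices made to keep the example small (the experiment above used 100 points).

```python
import random


def morans_i(values, w):
    """Global Moran's I (Eq. 6.1)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return n * cross / (sum(sum(row) for row in w) * sum(d * d for d in z))


def simulate_to_target(values, w, target, tol=0.01, max_swaps=5000, seed=0):
    """Swap values between random pairs of points until Moran's I nears target."""
    rng = random.Random(seed)
    vals = list(values)
    current = morans_i(vals, w)
    for _ in range(max_swaps):
        if abs(current - target) < tol:                # step f: close enough
            break
        i, j = rng.sample(range(len(vals)), 2)         # step c: pick two points
        vals[i], vals[j] = vals[j], vals[i]
        candidate = morans_i(vals, w)                  # step d: recalculate
        if abs(candidate - target) < abs(current - target):
            current = candidate                        # step e.i: keep the swap
        else:
            vals[i], vals[j] = vals[j], vals[i]        # step e.ii: swap back
    return vals, current


# Hypothetical example: 5 x 5 grid with rook adjacency and shuffled values.
cells = [(r, c) for r in range(5) for c in range(5)]
w = [[1.0 if abs(r1 - r2) + abs(c1 - c2) == 1 else 0.0 for (r2, c2) in cells]
     for (r1, c1) in cells]
start = list(range(len(cells)))
random.Random(1).shuffle(start)
vals, level = simulate_to_target(start, w, target=0.3)
print(round(morans_i(start, w), 3), "->", round(level, 3))
```

Because only improving swaps are kept, the search can stall before reaching an extreme target, which is one way to see why levels far beyond the range used in the experiment become expensive to simulate.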
6.3.2.2 Sensitivity and Temporal and Spatial Trend Analysis
The results of the simulation experiment are shown in Table 6.3. The results are listed according to the spatial autocorrelation levels measured by the classical Moran’s Index from −0.1 to 0.5 (seven levels). Similarly, the temporal autocorrelation is also controlled between −0.1 and 0.5. We directly compare data sets generated by different spatial autocorrelation levels and different temporal autocorrelation levels. When the spatial autocorrelation was controlled at −0.1, the minimum value of the spatiotemporal Moran’s Index I_st was −0.291. As the spatial autocorrelation increased, the numerical trend of the calculated I_st values became more linear.

TABLE 6.3 Comparison of Experimental Results of Different Moran’s Indices Based on Simulated Data
I       I_st Min   I_st Max   I_st Range   I_st Trend
−0.1    −0.291     −0.014     0.278        Linear
0.0     −0.010     0.031      0.041        Non-linear
0.1     0.035      0.164      0.129        Non-linear
0.2     0.140      0.344      0.204        Linear
0.3     0.181      0.543      0.362        Linear
0.4     0.348      0.678      0.331        Linear
0.5     0.455      0.888      0.434        Linear

6.4 SOFTWARE AND USER MANUAL

We have developed a Python tool for calculating the classic Moran’s Index and its extended version. Here we briefly introduce how to download and use the software provided in this book. Please email [email protected] with name, affiliation information, and a copy of receipt for purchasing this book to download Chapter06.zip. After the download is complete, unzip it to any location on your computer hard disk.

6.4.1 Moran’s I Tool User Manual

The software interface is shown in Figure 6.4. For convenience, we provide default values for all required parameters.

FIGURE 6.4 Software interface diagram.

Parameter Name: Description
Point File.csv: Point data input file (csv format), including coordinates, attribute values, and time. (For the data file format, refer to the sample file Dengue_spatial.csv.)
Spatial Weights-Threshold: Spatial weight matrix parameter setting. Set the distance threshold.
Temporal Weights-Threshold: Temporal weight matrix parameter setting. Set the time threshold.
Local Moran’s I: Choose whether to calculate the local Moran’s Index; if so, select the output file (the default is Result.csv).
Temporal Moran’s I: Choose whether to calculate the temporally extended local Moran’s Index (only the classic local Moran’s Index is calculated by default).
Spatial Temporal Moran’s I: Choose whether to calculate the spatiotemporally extended local Moran’s Index.
Run: Run the software.
Output: Output box for the global space–time Moran’s Index calculation result.
Use initial map: Choose whether to use the default map of Guangdong Province. (If the input data are the standard data located in Guangdong Province, check it; otherwise leave it unchecked to avoid an unbalanced legend display.)
Draw: Used to directly draw the preset calculation results and provide a legend display.
Result.csv: Preset output file name for the local Moran’s Index (you can choose the file location and rename it yourself).

6.4.2 Demonstration of Software Results

Clicking the Draw button brings up a diagram showing the geographic distribution of Z scores of the local spatiotemporal Moran’s Index of the dengue microblog data under the randomization assumption, shown in Figure 6.5. The value range is 0–1. The base map is Guangdong Province. Figure 6.6 is a 3-dimensional schematic diagram of the Z scores of the local spatiotemporal Moran’s Index of the dengue microblog data under the randomization assumption. Figure 6.7 shows the global Moran’s Index calculated by the software and the corresponding Z score. The results can be copied and pasted for local record keeping.

FIGURE 6.5 Geographical distribution map of 2-dimensional local spatiotemporal Moran’s Index Z score.

FIGURE 6.6 Geographical distribution map of 3D local spatiotemporal Moran’s Index Z score.

FIGURE 6.7 Illustration of calculation results.

6.4.3 Supplementary Explanation

• The default input file contains a relatively large amount of data, so the calculation is computationally demanding and takes some time to complete (about 3–4 hours if all options are selected).
• The default file corresponds to Guangdong Province, and so does the base map. If the user selects data outside of Guangdong Province, only the offline point distribution will be displayed, without a base map.
• The input file must be a csv file with the same format as the example data file.
• Because the calculation may take a long time, users may prefer to view the graphical output directly. Clicking the Draw button draws the preset calculated results without running the calculation.

REFERENCES Anselin, L (1995). Local indicator of spatial association: LISA. Geographical Analysis 27(2), 93–115. Bertazzon, S. (2003). “Spatial and Temporal Autocorrelation in Innovation Diffusion Analysis.” In Computational Science and Its Applications, ICCSA 2003 Part III, 23–32, edited by V. Kumar et al., Berlin Heidelberg: Springer. Cliff, A. D., and J. K. Ord (1969). The problem of spatial autocorrelation. In London Papers in Regional Science, 25–55, edited by A. J. Scott. London: Pion. Cliff, A. D., and J. K. Ord (1973). Spatial Autocorrelation. London: Pion. 178 pp. Cliff, A. D., and J. K. Ord (1975). Space-time modelling with an application to regional forecasting. Transactions of the Institute of British Geographers 64, 119–128. Cliff, A. D., and J. K. Ord (1981). Spatial processes: models & applications. Taylor & Francis. Lee, J. and S. W. Li (2017). Extending Moran’s Index for spatial autocorrelation to the detection and measurement of spatiotemporal autocorrelation. Geographical Analysis 49, 36–57. Lee, J. and D. Wong (2001). Statistical Analysis of Geographic Information with ArcView. New York: Wiley and Sons. Lopez, H. F. A., and Y. C. Chasco (2007). Time-trend in spatial dependence, specification strategy in the first-order spatial autoregressive model. Estudios de Economia Aplicada 25(2), 631–650. Martin, R. L., and J. E. Oeppen (1975). The identification of regional forecasting models using space: Time correlation functions. Transactions of the Institute of British Geographers 66, 95–118. Moran, P. A. P. (1950). Notes on continuous stochastic phenomena. Biometrika 37(1), 17–23.


Schertzer, D., and S. Lovejoy (1987). Physical modeling and analysis of rain and clouds by anisotropic scaling multiplicative processes. Journal of Geophysical Research: Atmospheres 92(D8), 9693–9714. Tobler, W. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography 46(Supplement), 234–240.

7 Spatiotemporal G Statistical Analytics
Huiyu Lin and Zhuo Chen
Kent State University
DOI: 10.1201/9781003304395-7

CONTENTS
7.1 Introduction
7.2 The Getis–Ord Gi and Gi* Statistics
7.2.1 Space–Time Weight Matrix
7.2.2 Space–Time Gi and Gi*
7.3 Space–Time Crime Pattern in Chicago
7.3.1 Software and Usage
7.3.2 Hardware/Software Requirements
7.3.3 Software Usage for ST Gi and Gi* Analysis
7.4 Concluding Remarks
References

7.1 INTRODUCTION

In recent years, the development of mobile devices and high-speed internet has significantly increased the production of volunteered geographic data. To mine such data, understand complex spatial phenomena, and extract useful information from large volumes of data, many scholars have developed and proposed a variety of exploratory spatial data analysis (ESDA) techniques. These have become crucial elements in geographical studies (for example, Anselin 1999; Goodchild et al. 2000). More specifically, ESDA techniques usually mine geographical information to assess the degree of spatial autocorrelation of events represented by the data, to see if the events are spatially related. This attempt is based on a notion known as Tobler’s First Law of Geography: everything is related to everything else, but near things are more related than distant things (Tobler 1970). Some of the most widely used global and local indexes of spatial autocorrelation include Moran’s I, local indicators of spatial association (LISA), and Getis–Ord G statistics. Although ESDA is powerful for analyzing and visualizing spatial patterns of events, it lacks the ability to capture how spatial patterns of the phenomena change over time.

To better capture the complex dynamics of geographical events, efforts have been made to extend spatial analytics to space–time analytics. The extensions incorporate the time dimension of the data into the algorithms for calculating different indexes of spatiotemporal autocorrelation. Some examples include extending kernel density estimation into space–time kernel density estimation (Nakaya and Yano 2010; Lee et al. 2017), adding time in constructing geographically weighted regression models to develop geographically and temporally weighted regression models (Huang et al. 2015), and space–time statistics such as space–time cubes in ArcGIS (ESRI, Inc.), space–time scan statistics (Kulldorff 2001; Kulldorff et al. 2005), and the space–time Ripley’s K function (Bailey and Gatrell 1995). It is worth noting that different terms have been used to describe space–time analytics in research, including spatial–temporal analysis, spatiotemporal analysis, and space–time analysis. To avoid confusion, terms such as “space–time”, “spatial–temporal”, and “spatiotemporal” in this chapter will be abbreviated to ST.

Although adding the temporal dimension to computation algorithms is mathematically feasible and has been applied in previous research, measuring and using the level of ST autocorrelation in a set of events has not been extensively studied in ST clustering. Cliff and Ord (1975) proposed the concept of ST autocorrelation, which built the foundation for measuring ST autocorrelation. Some of the most recent advancements in discussing local ST autocorrelation include the extended Moran’s I (Lee and Li 2017) and Getis–Ord Gi and Gi* (Wang and Lam 2020). Getis and Ord (1992, 1996) and Ord and Getis (1995) introduced the Getis–Ord Gi and Gi* to identify local clusters as represented by some attribute values of a set of events being studied.
The outcome may be used to identify hot and cold spots, where hot spots are locations of high attribute values surrounded by other high attribute values and cold spots are locations of low attribute values surrounded by other low attribute values. For the rest of the chapter, Getis–Ord Gi and Gi* will be referred to as Gi and Gi* for simplicity. The Gi and Gi* analytics are among the most used methods in spatial pattern mining due to their ability to identify local outliers and the straightforward interpretation of their results. For example, in ArcGIS, Gi* statistics are embedded in the Hot Spot Analysis tool. If a local feature has a high attribute value and its neighboring features also have high values, the location of the feature is considered a hot spot. The same applies to the detection of a cold spot: when a location has a low attribute value and its neighbors all have low attribute values, the location is identified as a cold spot. A z-score and a p-value, which are normally computed for each Gi* statistic at each location, can then be used to assess its statistical significance. A positive and statistically significant z-score indicates a spatial hot spot, or the location of a high-value cluster, while a negative and statistically significant z-score indicates a spatial cold spot, or the location of a low-value cluster. Mapping the z-scores gives users a visual presentation of where the hot/cold spots are.

One attempt to visualize ST clustering is presented in the Emerging Hot Spot Analysis tool in ArcGIS (ESRI 2019). The tool creates ST cubes with customized spatial and temporal units defined by users and produces different types of ST hot/cold spots. Although the results are easy to map and interpret, the ST adjacency is not well constructed because the tool contains two independent algorithms to analyze the spatial and temporal associations between ST cubes. Detailed instructions and explanation of the Emerging Hot Spot Analysis tool can be found in the toolset’s Item Description in ArcGIS or its user manuals.

In crime research, for instance, the routine activity theory in environmental criminology proposes that crimes result from three environmental elements: a motivated offender, a suitable target, and the absence of a capable guardian. In addition, these elements must converge in both space and time (Cohen and Felson 1979). Following this school of thought, it has been generally accepted in previous research that crime incidents are indeed concentrated spatially. However, new discussions revolving around the stability of crime clusters have recently been raised and have ignited much debate (Weisburd et al. 2012; Weisburd 2015; Hunt 2016). To contribute to the discussion on the dynamics of crime hot spots over space and time, we developed a Python program incorporating the ST Gi and Gi* statistics developed by Wang and Lam (2020). An example examining the spatial–temporal hot spots of Chicago crime is provided in this chapter, with stepwise instructions for using the program.

7.2 THE GETIS–ORD Gi AND Gi* STATISTICS

Introduced by Getis and Ord (1992) and Ord and Getis (1995), the Getis–Ord Gi and Gi* statistics have been extensively used in spatial studies to analyze local patterns in spatial data. The null hypothesis of Gi is that the local sum of the observation’s neighbors is not significantly different from the sum of all observations across the entire region. The formula for Gi is

$$G_i(W_s) = \frac{\sum_{j,\, j \neq i} w_{ij} x_j}{\sum_{j,\, j \neq i} x_j} \tag{7.1}$$

where x_j is the attribute value at location j and W_s is the spatial weight matrix; each entry w_ij in W_s equals 1 if locations i and j are geographic neighbors (i.e., spatially adjacent) and 0 otherwise. In Gi, i is not its own neighbor, and hence w_ii = 0.


If the observation itself is also considered along with its neighbors when compared to all observations, we have the null hypothesis for Gi*. In that case, the formula is

$$G_i^*(W_s) = \frac{\sum_{j} w_{ij} x_j}{\sum_{j} x_j} \tag{7.2}$$

where i is also defined as its own neighbor; therefore, when j = i, w_ii = 1. However, the spatial Gi and Gi* formulas do not include temporal attributes. Wang and Lam (2020) extended the Gi and Gi* statistics to ST Gi and Gi* statistics by incorporating a space–time weight matrix into the original statistics.
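The difference between Gi and Gi* can be illustrated with a small numerical sketch. The example below assumes binary adjacency weights as described above; the function name, the four-location chain, and the attribute values are hypothetical, and the code returns only the raw ratios, not the standardized z-scores.

```python
import numpy as np

def getis_ord_g(x, w, star=False):
    """Raw Gi (star=False) or Gi* (star=True) ratios for binary weights w."""
    x = np.asarray(x, dtype=float)
    w = np.array(w, dtype=float)                 # copy so the caller's matrix is untouched
    np.fill_diagonal(w, 1.0 if star else 0.0)    # w_ii = 1 for Gi*, w_ii = 0 for Gi
    local_sum = w @ x                            # sum of x_j over the neighbors of i
    total = x.sum() if star else x.sum() - x     # Gi excludes x_i from the denominator
    return local_sum / total

# Hypothetical example: four locations on a line, each adjacent to the next.
w_s = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
x = np.array([1.0, 2.0, 3.0, 4.0])

gi = getis_ord_g(x, w_s)                  # [2/9, 4/8, 6/7, 3/6]
gi_star = getis_ord_g(x, w_s, star=True)  # [3/10, 6/10, 9/10, 7/10]
```

For the first location, Gi uses only its single neighbor (2 out of the remaining total of 9), whereas Gi* also counts the location itself (1 + 2 out of the overall total of 10).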

7.2.1 Space–Time Weight Matrix

The original spatial weight matrix defines neighboring relationships by the spatial adjacency between events. For example, a neighbor of location i can be a location that shares a boundary with i or lies within a threshold distance from i. In much the same way, the conceptualization of space–time adjacency also has two specifications: two direct adjacencies or two thresholds are needed, one in space and one in time. That being said, when constructing a temporal association, it is important to consider the following rule: only incidents in the past can influence current or future incidents, not vice versa. Furthermore, the temporal threshold, i.e., the temporal interval within which incidents are considered “close” to one another, has to be carefully considered and constructed. Griffith (2010, 2012) proposed two conceptualizations of the space–time association structure: the contemporaneous specification and the lagged specification. The contemporaneous specification describes the space–time neighbor for i as “its preceding in situ location as well as the instantaneous neighboring locations”, while in the space–time lagged specification, i’s neighbor is “its preceding in situ location as well as the preceding neighboring locations”. Both conceptualizations can be written in matrix form. Here we reference the summary of the two specifications provided by Wang and Lam (2020): “The space–time contemporaneous specification is:

$$V_{ST} = I_T \otimes W_S + W_T \otimes I_S \tag{7.3}$$

The space–time lagged specification is


$$G_i^*(V_{ST}) = \frac{\sum_{j} v_{ij} x_j}{\sum_{j} x_j} \tag{7.6}$$

To interpret the result: a positive standardized z-score suggests an ST hot spot, whereas a negative z-score indicates an ST cold spot. Wang and Lam also applied the algorithm to panel data, which are often used in regional economic studies. Panel data are generally considered more informative because they usually contain more variation and less collinearity among the variables (Elhorst 2003). Other advantages of panel data shown in previous research include increased efficiency in estimation due to the greater availability of degrees of freedom and more capacity for modeling the complexity of human behavior than a single cross-section or time series (Hsiao 2005). The rows of the panel data contain repeated observations over time on the same set of cross-sectional (geographic) units (e.g., countries, states, counties, census tracts, and ZIP codes), while the columns of the panel data represent the geographical units. In addition, they proposed applying conditional permutation as the inference approach for ST Gi and Gi*. To control Type I error and identify as many true ST clusters as possible, a false discovery rate (FDR) procedure is incorporated into the program.
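The pieces described in this section can be put together in a short sketch: a contemporaneous space–time weight matrix built with Kronecker products as in Equation 7.3, the ST Gi* ratio of Equation 7.6, conditional-permutation pseudo p-values, and an FDR cut-off. Several parts are assumptions made for illustration only: the temporal matrix W_T links each period solely to the immediately preceding one, observations are stacked period by period, the pseudo p-value uses a folded one-sided count, and the FDR step is the Benjamini–Hochberg procedure. None of the function names come from the program supplied with the book.

```python
import numpy as np

def contemporaneous_weights(w_s, n_periods):
    """V_ST = I_T (x) W_S + W_T (x) I_S for observations stacked period by period."""
    n_units = w_s.shape[0]
    w_t = np.eye(n_periods, k=-1)   # assumed W_T: period t is linked to period t-1 only
    return np.kron(np.eye(n_periods), w_s) + np.kron(w_t, np.eye(n_units))

def st_gi_star(x, v_st):
    """Raw ST Gi* ratio: the focal observation counts as its own neighbor."""
    v = v_st.copy()
    np.fill_diagonal(v, 1.0)
    return (v @ x) / x.sum()

def pseudo_p_values(x, v_st, n_perm=999, seed=0):
    """Conditional permutation: keep x_i fixed and redraw its neighbors from the rest."""
    rng = np.random.default_rng(seed)
    observed = st_gi_star(x, v_st)
    p = np.empty(len(x))
    for i in range(len(x)):
        k = int(np.count_nonzero(np.delete(v_st[i], i)))   # number of ST neighbors
        others = np.delete(x, i)
        sims = np.array([(x[i] + rng.choice(others, size=k, replace=False).sum()) / x.sum()
                         for _ in range(n_perm)])
        larger = np.count_nonzero(sims >= observed[i])
        p[i] = (min(larger, n_perm - larger) + 1) / (n_perm + 1)
    return observed, p

def fdr_reject(p, alpha=0.05):
    """Benjamini-Hochberg step-up rule (an assumed choice of FDR procedure)."""
    p = np.asarray(p, dtype=float)
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, len(p) + 1) / len(p)
    reject = np.zeros(len(p), dtype=bool)
    if passed.any():
        reject[order[:np.flatnonzero(passed).max() + 1]] = True
    return reject

# Hypothetical panel: 3 units on a line observed over 2 periods, stacked period by period.
w_s = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
v_st = contemporaneous_weights(w_s, n_periods=2)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
g_star, p_values = pseudo_p_values(x, v_st, n_perm=99)
significant = fdr_reject(p_values)
```

In this layout the middle unit in the second period (index 4) has four contributors to its local sum: itself, its two spatial neighbors in the same period, and its own location in the preceding period. The first-period observations have no temporal neighbors, which reflects the rule that only the past can influence the present.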

7.3 SPACE–TIME CRIME PATTERN IN CHICAGO

The identification and analysis of crime hot spots have long been of interest to both academics and the public. Identifying crime hot spots has become the basis for policing strategies such as problem-oriented policing and community-oriented policing. Spatial analytics such as local Moran’s I, LISA, and Getis–Ord Gi* statistics (Hot Spot Analysis) have contributed to furthering research on the geography of crime. In addition, according to crime theories such as the routine activity theory and the near-repeat theory, crime incidents are linked in both space and time. To analyze how the spatial patterns of violent crimes in Chicago change in space and time at a fine local level (block group), crime incident data were downloaded from the Chicago Data Portal (https://data.cityofchicago.org/). We reorganized the data for Chicago violent crimes into 12 temporal units by month. The spatial unit is the census block group. Hence, the ST panel data in this example consist of 25,704 observations (12 times the number of block groups in Chicago, which is 2,142). Each observation represents the crime rate of a block group within a month (i.e., the number of crimes recorded for the month divided by the corresponding local population).
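The reorganization into monthly crime rates can be sketched with the Python standard library. The incident records, block group identifiers, and population figures below are invented for illustration and do not reflect the actual layout of the Chicago Data Portal files; the monthly field names follow the 19_01, 19_02, ... pattern used later in this chapter.

```python
from collections import Counter
from datetime import date

# Hypothetical incident records: (block group id, date of the incident).
incidents = [
    ("A", date(2019, 1, 5)),
    ("A", date(2019, 1, 20)),
    ("B", date(2019, 1, 7)),
    ("A", date(2019, 2, 11)),
    ("B", date(2019, 3, 2)),
]
# Hypothetical local population of each block group.
population = {"A": 1000, "B": 500}

# Twelve monthly fields in chronological order: 19_01, 19_02, ..., 19_12.
months = [f"19_{m:02d}" for m in range(1, 13)]

# Count incidents per (block group, month); Counter returns 0 for missing pairs.
counts = Counter((bg, f"{d:%y_%m}") for bg, d in incidents if d.year == 2019)

# Panel of monthly crime rates: one row per block group, one value per month.
panel = {bg: [counts[(bg, m)] / pop for m in months] for bg, pop in population.items()}

n_observations = sum(len(row) for row in panel.values())   # 2 block groups x 12 months
```

With the 2,142 block groups of the real data set, the same layout yields 12 × 2,142 = 25,704 observations.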

7.3.1 Software and Usage

An ST Gi and Gi* program coded in Python is provided for readers to execute the ST Gi and Gi* statistics (Wang and Lam 2020) introduced in this chapter. In this section, we present the procedures for downloading, installing, and using the tool. To access and download the program, please email [email protected] with your name, affiliation information, and a copy of the receipt for purchasing this book to download Chapter 07.zip. Once downloaded, the .zip file can be uncompressed to any folder on your computer hard drive, such as C:\sttools\st_g.

7.3.2 Hardware/Software Requirements

Before running the tool in the uncompressed folder, it is necessary to install Python and the dependent modules (skip Step 1 if Python is already installed):
Step 1: Install Python 3.9. Download Python 3.9 (https://www.python.org/downloads/release/python-390/). Depending on the setup of your computer system, choose an appropriate file to download. For example, Windows users can download the Windows x86-64 executable installer or the Windows x86-64 web-based installer. Double-click the downloaded installer to run it. Be sure to check Add Python 3.9 to PATH when installing, as it ensures that the system knows where Python is installed.
Step 2: Install the required modules. Double-click install-reqs.bat to automatically download and install all the modules required by this tool. After the process is done, press any key to exit the command window.

7.3.3 Software Usage for ST Gi and Gi* Analysis

Once Python and the environment have been set up as described in the previous sections, you can run the analytical tool for ST Gi and Gi* analysis. The software developed for this chapter and the dataset used in this example application are compressed into Chapter 07.zip, which can be requested by emailing [email protected] with your name, affiliation information, and a copy of the receipt for purchasing this book. First, double-click run.bat in the uncompressed folder. This will open the user interface shown in Figure 7.1. The interface contains multiple input fields for users to enter values for the specified parameters. An explanation of these parameters can be found in Table 7.1. Then open the data file named Chicago_Joined.shp. You can find the file in the Data subfolder. Note that the order of the chosen temporal fields is important and should be chronological. For example, the fields need to be in the order of January, February, March, etc.

FIGURE 7.1 The interface of ST Gi and Gi* statistics tool.

TABLE 7.1 Parameters of the ST Gi and Gi* Statistics Tool
Parameter/Function: Description
Input shapefile (polygon): The name of the input file. The file should be in the format of ESRI’s shapefile.
Z (time) fields: Names of the temporal fields containing the values of interest in each period.
Type: Indicates whether to compute Gi* (including the focal observation) or Gi (not including the focal observation).
Permutations: The number of random permutations for calculating pseudo p values.
Output shapefile: The name of the output file (the .shp extension is not necessary in the file name). It will be saved as a shapefile.
Run: Run the application with the specified parameters.
Close: Close the tool.

To use the ST Gi and Gi* tool:
Step 1: Click the Open file button to select the input file (e.g., …/Data/Chicago_Joined.shp).
Step 2: Select the corresponding Z fields as in Figure 7.2. As discussed above, they should be in chronological order. We use 19_01 (January 2019), 19_02, 19_03… in Z fields to represent the values of interest in each month.
Step 3: Select the Type used in the analysis. In this example, we use Gi* to include the focal observation when considering the spatial weight.
Step 4: Set the number of random permutations for calculating pseudo p values. We use the default value, 999.
Step 5: Specify the output shapefile name (including the path) or use the Browse… button to open the dialog to do so.
Step 6: Click the Run button to proceed with the calculation. Otherwise, click the Close button to exit the tool.

FIGURE 7.2 Parameters set up for ST Gi and Gi* statistics tool.

The calculation will take a minute. Once the process is completed, a window notifies the user, as in Figure 7.3. Use GIS software to map and check the result in the newly created shapefile. The results are shown in Figure 7.4. Each map represents the ST hot/cold spots in Chicago for the corresponding month. The result shows that violent crime distributions in Chicago change over time and in space. More crime hot spots were detected in the months from July to November than in the rest of the year, and they were clustered in the center and on the south side of Chicago. Furthermore, depending on the need of the user, different ST units can be used to uncover the dynamics of the desired phenomena.
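Because the tool relies on the temporal fields being chosen in chronological order, it can help to check the order before running it. The following is a small sketch using only the Python standard library; it assumes field names of the form YY_MM as in this example (19_01, 19_02, ...), and the helper names are hypothetical rather than part of the supplied program.

```python
from datetime import datetime

def parse_field(name):
    """Interpret a field name such as '19_03' as March 2019 (assumed YY_MM naming)."""
    return datetime.strptime(name, "%y_%m")

def is_chronological(fields):
    """True if every field is strictly later than the one before it."""
    dates = [parse_field(f) for f in fields]
    return all(earlier < later for earlier, later in zip(dates, dates[1:]))

# Hypothetical selection made in the wrong order.
fields = ["19_03", "19_01", "19_02"]
ordered = sorted(fields, key=parse_field)   # ['19_01', '19_02', '19_03']
```

Sorting by the parsed date rather than by the raw string keeps the order correct even when the fields span more than one year.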

FIGURE 7.3 Popup window notifying the completion of the calculation.

7.4 CONCLUDING REMARKS

The topic of spatiotemporal analysis has long been of interest in the geography and GIS communities, and it has been given increasing attention recently. However, no applicable executable program has been available or easily accessible to users. Although the Emerging Hot Spot Analysis tool in ArcGIS provides ST analysis with an easy-to-interpret result, the tool has limitations in terms of providing detailed calculations for ST autocorrelation. In this chapter, we provide a Python program incorporating the ST Gi and Gi* statistics proposed by Wang and Lam (2020) in the hope of providing a handy executable tool for users to test and apply the ST Gi and Gi* method in spatiotemporal research. Step-by-step instructions for installing and using the program are provided. An example analyzing the ST hot spots of violent crime in Chicago showed that violent crime clusters shifted in both space and time across the block groups of Chicago from month to month. In addition, as suggested by Wang and Lam (2020), FDR is used in the program to control Type I error and identify as many true space–time clusters as possible, which results in a more reliable outcome.

FIGURE 7.4 ST hot and cold spots of violent crimes in Chicago by block groups and months, 2019. (Twelve map panels, January through December.)


REFERENCES Anselin, L. 1999. Interactive techniques and exploratory spatial data analysis. Geographical Information Systems: Principles, Techniques, Management and Applications 1:251–64. Bailey, T. C., and A. C. Gatrell 1995. Interactive Spatial Data Analysis. Vol. 413. Essex, UK: Longman Scientific and Technical. Cliff, A. and J. K. Ord. 1975. Space-time modelling with an application to regional forecasting. Transactions of the Institute of British Geographers 64:119–28. Doi:10.2307/621460. Cohen, L. E., and M. Felson 1979. Social change and crime rate trends: A routine activity approach. American Sociological Review 44:588–608. Elhorst, J. P. 2003. Specification and estimation of spatial panel data models. International Regional Science Review 26(3):244–68. doi: 10.1177/ 0160017603253791. ESRI (Environmental Systems Research Institute). 2019. How emerging hot spot analysis works. Accessed August 20, 2021. Retrieved from: https:// pro.arcgis.com/en/pro-app/tool-reference/space-time-pattern-mining/learnmoreemerging.htm. Getis, A., and J. K. Ord 1992. The analysis of spatial association by use of distance statistics. Geographical Analysis 24(3):189–206. doi: 10.1111/j.1538– 4632.1992.tb00261.x. Getis, A., and J. K. Ord 1996. Local spatial statistics: An overview. In P. Longley and M. Batty (eds.) Spatial Analysis: Modelling in a GIS Environment, 261– 77. Cambridge, UK: GeoInformation International. Goodchild, M. F., L. Anselin, R. P. Appelbaum, and B. H. Harthorn 2000. Toward spatially integrated social science. International Regional Science Review 23(2):139–59. doi: 10.1177/016001760002300201. Griffith, D. A. 2010. Modeling spatio-temporal relationships: Retrospect and prospect. Journal of Geographical Systems 12(2): 111–23. Doi: 10.1007/ s10109-010-0120-x. Griffith, D. A. 2012. Space, time, and space–time eigenvector filter specifications that account for autocorrelation. Estadıstica Espanola 54(177):7–34. Griffith, D. A., and J. H. Paelinck 2018. 
Space–time autocorrelation. In D. A. Griffith and J. H. P. Paelinck (eds.) Morphisms for Quantitative Spatial Analysis, 25–34. Cham, Switzerland: Springer. Hsiao, C. 2005. Why panel data? Singapore Economic Review 50(2):1–12. Huang, J., Huang, Y., Pontius Jr, R. G., and Zhang, Z. 2015. Geographically weighted regression to measure spatial variations in correlations between water pollution versus land use in a coastal watershed.  Ocean & Coastal Management 103:14–24. Hunt, J. M. 2016. Do crime hot spots move? Exploring the effects of the modifiable areal unit problem and modifiable temporal unit problem on crime hot spot stability (Doctoral dissertation, American University). Retrieved from: http://search.proquest.com/openview/57d0440647a2a47cbf84cfada487ec4f/ 1?pq-origsite=gscholar&cbl=18750&diss=y.


Kulldorff, M. 2001. Prospective time periodic geographical disease surveillance using a scan statistic. Journal of the Royal Statistical Society: Series A (Statistics in Society) 164(1):61–72. doi: 10.1111/1467–985X.00186. Kulldorff, M., R. Heffernan, J. Hartman, R. Assunçao, and F. Mostashari 2005. A space–time permutation scan statistic for disease outbreak detection. PLoS Medicine 2(3):e59. doi: 10.1371/journal.pmed.0020059. Lee, J., J. Gong, and S. Li 2017. Exploring spatiotemporal clusters based on extended kernel estimation methods. International Journal of Geographical Information Science 31(6):1154–77. doi: 10.1080/13658816.2016.1170133. Lee, J., and S. Li 2017. Extending Moran’s index for measuring spatiotemporal clustering of geographic events. Geographical Analysis 49(1):36–57. doi: 10.1111/gean.12106. Nakaya, T., and K. Yano 2010. Visualising crime clusters in a space–time cube: An exploratory data analysis approach using space–time kernel density estimation and scan statistics. Transactions in GIS 14(3):223–39. doi: 10.1111/j.1467–9671.2010.01194.x. Ord, J. K., and A. Getis 1995. Local spatial autocorrelation statistics: Distributional issues and an application. Geographical Analysis 27(4):286–306. doi: 10.1111/j.1538–4632.1995.tb00912.x. Tobler, W. R. 1970. A computer movie simulating urban growth in the Detroit region. Economic Geography 46(Suppl. 1):234–40. doi: 10.2307/143141. Wang, Z. and N. S. N. Lam. 2020. Extending Getis-Ord statistics to account for local space-time autocorrelation in spatial panel data. The Professional Geographer, 72(3). https://doi.org/10.1080/00330124.2019.1709215. Weisburd, D. 2015. The law of crime concentration and the criminology of place. Criminology 53:133–57. Weisburd, D., E. R. Groff, and S-M. Yang 2012. The Criminology of Place: Street Segments and Our Understanding of the Crime Problem. New York: Oxford University Press.

8

Spatiotemporal Kernel Density Estimation Junfang Gong, Zhuang Zeng, Bo Wan, and Shengwen Li China University of Geosciences

Jay Lee Kent State University

CONTENTS 8.1 8.2

Introduction ...................................................................................127 Methods.........................................................................................131 . 8.2.1 Classic Spatiotemporal Kernel Density Estimation (CL_STKDE) .................................................131 8.2.2 Conditional Spatiotemporal Kernel Density Estimation (CN_STKDE) .................................................131 8.2.3 Integrative Spatiotemporal Kernel Density Estimation (IN_STKDE) ..................................................132 8.2.4 Validation Measurement ...................................................133 8.2.4.1 Hit Rate .............................................................133 8.2.4.2 Compactness Index ...........................................134 8.3 Example Application.....................................................................135 8.4 Software and User Manual............................................................138 References ..............................................................................................143

DOI: 10.1201/9781003304395-8

8.1 INTRODUCTION

When we study a set of geographic events or objects, we usually need to characterize how the events (or objects) are distributed in the study region, so that we can analyze them as a whole and see whether underlying factors influence their spatial distribution. There are different ways to obtain these distribution characteristics: by analyzing the distribution of the target events directly with tools such as histograms, or by studying the attribute values of the target events quantitatively using mathematical formulas. Among the existing approaches, kernel density estimation (KDE) is highly valued by researchers, as is evident from its increasing use. KDE studies the characteristics of a data distribution from the data themselves. It does not use any prior knowledge of the target distribution and does not require any assumptions about its form. KDE uses a smooth peak function to fit a surface to the data points, thus approximating the underlying probability distribution surface. It has been widely used in studies of the spatiotemporal distributions of geographic events to create density surfaces that describe the intensity of those distributions.

KDE was first used for one-dimensional data analysis, e.g., historical traffic incidents on a section of highway recorded as points. KDE assumes that the probability of a traffic incident recurring is highest at a location where a previous incident occurred, and that this probability decreases with increasing distance from that location. Following the same logic, we can place a density kernel at each location of prior occurrence (observation) and use its probability density function to estimate the probability of the same event occurring at other locations. As shown in the example in Figure 8.1, the portion of the roadway where more crashes occurred ends up with a higher cumulative probability density (depicted by the solid curve). This is because the probability density functions of nearby observations overlap, and the final probability curve is obtained by accumulating (adding) the contributions of the individual probability density curves. The only assumption the method makes is the shape of the density estimation kernel (dashed curves). Because of this simplicity, KDE has found wide application in fields such as geographic information science after being extended from one-dimensional to two-dimensional space.
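The accumulation of overlapping kernels described above can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's software: the crash positions, the bandwidth value, and the function names `gaussian_kernel` and `kde_1d` are all assumptions made for the example.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel shape (dashed curves)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde_1d(x, observations, h):
    """Density at position x: kernels centred on each observation, added and scaled."""
    n = len(observations)
    return sum(gaussian_kernel((x - xi) / h) for xi in observations) / (n * h)

# Hypothetical crash positions (km along a road section); illustrative values only.
crashes = [1.0, 1.2, 1.3, 4.0]
h = 0.5  # bandwidth in km, chosen arbitrarily for the sketch

for x in (1.2, 2.5, 4.0):
    print(f"x = {x:.1f} km  density = {kde_1d(x, crashes, h):.4f}")
```

In this sketch the density near the three clustered crashes is higher than at the isolated crash, which in turn is higher than in the empty stretch between them, mirroring the solid curve described for Figure 8.1.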
It should be noted that the probability density function in Figure 8.1 is based on a Gaussian curve. If preferred (or needed), any other reasonable mathematical function can be used to define the shape of the kernel. Common kernel functions include the constant (Const), exponential (Exponential), Gaussian (Gaussian), Epanechnikov (Epanechnikov), quartic (Quartic), and fifth-order polynomial (Polynomial Order 5) kernel functions, among others.

FIGURE 8.1 Kernel density estimation in one dimension.

In two-dimensional space, we typically use a pair of coordinates, (x, y), to represent an event as a point and create a density surface based on the locations of the data points. Accordingly, the estimated kernel density of events in two-dimensional space can be represented using three parameters, where the first two are the point coordinates (x, y) and the third is the probability density. Specifically, the kernel density estimation (KDE) can be described by the following equation:

$$\hat{f}(x, y) = \frac{1}{n h_s^2} \sum_{i=1}^{n} k_s\!\left(\frac{x - x_i}{h_s}, \frac{y - y_i}{h_s}\right) \tag{8.1}$$

In the formula, n denotes the total number of data points involved in the calculation; $x_i$ and $y_i$ are the coordinates of point i; and $h_s$ is the bandwidth of the kernel density function (i.e., how widely the kernel is spread). The larger the bandwidth $h_s$, the larger the domain around point i and the smoother the estimated density function $\hat{f}(x, y)$. For this reason, the bandwidth $h_s$ is also called the "smoothing parameter". $k_s$ denotes the kernel density function, which is essentially a weighting function. Now let us assume that the kernel density function is a Gaussian kernel density function (another mathematically defined function can be used if appropriate). Writing the argument of the kernel compactly as $u_s$, we have:

$$k_s(u_s) = k_s\!\left(\frac{x - x_i}{h_s}, \frac{y - y_i}{h_s}\right), \qquad u_s = \left(\frac{x - x_i}{h_s}\right)^2 + \left(\frac{y - y_i}{h_s}\right)^2 \tag{8.2}$$

Then the formula for KDE can be simplified as

$$\hat{f}(x, y) = \frac{1}{n h_s^2} \sum_{i=1}^{n} k_s(u_s) \tag{8.3}$$

In two-dimensional space, we focus on the location information of data points, from which we can learn the spatial distribution of geographic events at a given moment in time. However, as time passes, the distribution of the same geographic events at different time points may differ. To capture these changes, we can extend KDE by combining the two spatial dimensions with time to form a three-dimensional space (Figure 8.2). We call this method spatiotemporal kernel density estimation (STKDE). Because space and time are measured in different units, spatial and temporal data cannot be integrated directly. Any extension of spatial statistics that incorporates temporal data must therefore take the temporal properties of the events into account. Many researchers have proposed methods for exploring the spatiotemporal clusters of geographic events.

FIGURE 8.2 Events distribution in three dimensions.

KDE was widely used as early as 1986 to create density surfaces describing the intensity of the distribution of geographic events. Not long afterward, KDE was introduced to GIS (see, for example, Boros and Lee, 1995). Because KDE uses points to represent the locations of geographic events, it is widely used in epidemiology, criminology, and other fields to describe the spatial distribution of disease incidents or crime activities. In early spatial analyses of geographic events, the way events were distributed in time and space was usually evaluated quantitatively for individual time periods, and the results were then checked for significant deviation from a theoretically constructed random distribution. In most cases, however, the estimated density surface based on a set of geographic events does not remain static, as the spatial distribution of geographic events often changes over time. Some researchers have studied the visualization of spatial and temporal density estimates of spatiotemporal data and applied it to construct spatiotemporal cubes used to explore the spatial and temporal distribution of crime events, diseases, etc. (see, for example, Brunsdon et al., 2007).

The rest of this chapter is organized as follows. In Section 8.2, the basic principles and key details of the technical implementation of three typical spatiotemporal kernel density estimation models are given (see, for example, Lee et al., 2017). In Section 8.3, application examples of spatiotemporal kernel density estimation models are presented. In Section 8.4, the software accompanying this chapter and its usage are introduced.

8.2 METHODS

8.2.1 Classic Spatiotemporal Kernel Density Estimation (CL_STKDE)

KDE can be extended to space and time by assuming a multiplicative, orthogonal relationship between the two spatial dimensions and the temporal dimension. This method is called classic spatiotemporal kernel density estimation (CL_STKDE). The formula for CL_STKDE can be expressed as follows:

$$\hat{f}(x, y, t) = \frac{1}{n h_s^2 h_t} \sum_{i=1}^{n} k_s\!\left(\frac{x - x_i}{h_s}, \frac{y - y_i}{h_s}\right) k_t\!\left(\frac{t - t_i}{h_t}\right) = \frac{1}{n h_s^2 h_t} \sum_{i=1}^{n} k_s(u_s)\, k_t(u_t) \tag{8.4}$$

where the notation is as defined in equation (8.1); $u_t = \frac{t - t_i}{h_t}$; $k_t$ is a probability function defined over time with bandwidth $h_t$; and n is the number of observations of the form $(x_i, y_i, t_i)$ for i = 1, …, n.

CL_STKDE thus treats time and space separately: it uses different kernel functions as density estimates in the space and time dimensions and then multiplies the estimated densities in the spatial and temporal dimensions to obtain estimates of the spatiotemporal densities.
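Equation (8.4) can be sketched as follows. This is an illustrative implementation only: it assumes Gaussian kernels for both $k_s$ and $k_t$, and the event coordinates, bandwidths, and function names are hypothetical.

```python
import math

def k_s(a, b):
    """Bivariate standard Gaussian kernel on bandwidth-scaled spatial offsets (assumed shape)."""
    return math.exp(-0.5 * (a * a + b * b)) / (2.0 * math.pi)

def k_t(u):
    """Univariate standard Gaussian kernel on the bandwidth-scaled time offset (assumed shape)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def cl_stkde(x, y, t, events, h_s, h_t):
    """Equation (8.4): spatial kernel times temporal kernel, summed over all events."""
    total = 0.0
    for xi, yi, ti in events:
        total += k_s((x - xi) / h_s, (y - yi) / h_s) * k_t((t - ti) / h_t)
    return total / (len(events) * h_s ** 2 * h_t)

# Hypothetical events as (x, y, t); units are arbitrary.
events = [(0.0, 0.0, 1.0), (0.5, 0.2, 2.0), (3.0, 3.0, 10.0)]
early = cl_stkde(0.2, 0.1, 1.5, events, h_s=1.0, h_t=2.0)
late = cl_stkde(0.2, 0.1, 9.0, events, h_s=1.0, h_t=2.0)
print(early, late)
```

At the same location, the estimate is much larger at a time close to the two nearby events than at a time when only the distant event is temporally close, because each event contributes the product of its spatial and temporal weights.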

8.2.2 Conditional Spatiotemporal Kernel Density Estimation (CN_STKDE)

Applying the distance decay concept to KDE, the probability of an event occurring at a given location depends on how far that location is from another location where such an event occurred before. The longer the distance, the lower the probability that the event will occur at the estimated location. The same rule can be applied in the temporal dimension. Considering that the units of the spatial and temporal dimensions are usually different and thus difficult to integrate directly, we propose a threshold $h_t$ for filtering geographic events with respect to temporal distances $u_t$ (the time difference between each pair of consecutive events). In other words, two events in adjacent locations must be closer to each other in time than a predefined temporal threshold to be both spatially and temporally adjacent. With such a temporal threshold defined, the temporal density kernel function $k_t$ becomes a constant. In addition, the estimate does not need to be divided by $h_t$, since no bandwidth is defined for a temporal kernel. With this change of incorporating a threshold for temporal adjacency, the conditional spatiotemporal kernel density estimation (CN_STKDE) is calculated as follows:

$$\hat{f}(x, y, t) = \frac{1}{n h_s^2} \sum_{i=1}^{n} k_s\!\left(\frac{x - x_i}{h_s}, \frac{y - y_i}{h_s}\right) = \frac{1}{n h_s^2} \sum_{i=1}^{n} k_s(u_s), \qquad u_t < h_t \tag{8.5}$$
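A minimal sketch of equation (8.5) is given below. It assumes a Gaussian spatial kernel and, as a further assumption, measures the temporal distance $u_t$ as the absolute time difference between the estimation time and each event; the data and names are hypothetical.

```python
import math

def k_s(a, b):
    """Bivariate standard Gaussian kernel on bandwidth-scaled spatial offsets (assumed shape)."""
    return math.exp(-0.5 * (a * a + b * b)) / (2.0 * math.pi)

def cn_stkde(x, y, t, events, h_s, h_t):
    """Equation (8.5): spatial kernel summed only over events that pass the time filter."""
    total = 0.0
    for xi, yi, ti in events:
        u_t = abs(t - ti)  # ASSUMPTION: temporal distance taken as |t - t_i|
        if u_t < h_t:      # events outside the temporal threshold contribute nothing
            total += k_s((x - xi) / h_s, (y - yi) / h_s)
    return total / (len(events) * h_s ** 2)  # no division by h_t, as in the text

# Hypothetical events as (x, y, t); units are arbitrary.
events = [(0.0, 0.0, 1.0), (0.5, 0.2, 2.0), (3.0, 3.0, 10.0)]
near = cn_stkde(0.2, 0.1, 1.5, events, h_s=1.0, h_t=2.0)
late = cn_stkde(0.2, 0.1, 20.0, events, h_s=1.0, h_t=2.0)
print(near, late)
```

In this sketch the temporal threshold acts as a hard filter: at a time far from every event, no event passes the filter and the estimate is zero regardless of spatial proximity.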

8.2.3 Integrative Spatiotemporal Kernel Density Estimation (IN_STKDE)

The two methods in Sections 8.2.1 and 8.2.2 still consider the time dimension and the space dimension separately, without further in-depth consideration of the connection between time and space. The integrative estimation of spatiotemporal kernel density chooses the same kernel function in the time and space dimensions, i.e., a $k_{st}$ that is the same as $k_s$ and $k_t$ is chosen, while a multiplicative orthogonal relationship between the spatial and temporal dimensions can be assumed. To do this, we first need to normalize the spatial and temporal data, and for this purpose different denominators need to be chosen:

$$s = \frac{s' - \bar{s}}{h_s} \tag{8.6}$$

$$t = \frac{t' - \bar{t}}{h_t} \tag{8.7}$$

In these formulas, $s'$ and $t'$ are the data before normalization. $\bar{s}$ and $\bar{t}$ are the reference values used for normalizing the spatial and temporal data, such as mean values. $h_s$ is the bandwidth in the spatial dimensions and $h_t$ is the time threshold in the temporal dimension. The purpose of normalization is to eliminate the differences in measurement units between spatial and temporal data. Reference values and bandwidths should be selected so that the standardized data values have similar ranges. After standardization, the standardized data values can be integrated directly, and the integrative spatiotemporal kernel density estimation (IN_STKDE) can be calculated as follows:

$$\hat{f}(x, y, t) = \frac{1}{n h_s^2 h_t} \sum_{i=1}^{n} k_{st}(u_{st}) = \frac{1}{n h_s^2 h_t} \sum_{i=1}^{n} k_{st}\!\left(\frac{x - x_i}{h_s}, \frac{y - y_i}{h_s}, \frac{t - t_i}{h_t}\right) \tag{8.8}$$

where $u_{st}$ is calculated as follows:

$$u_{st} = \left(\frac{x - x_i}{h_s}\right)^2 + \left(\frac{y - y_i}{h_s}\right)^2 + \left(\frac{t - t_i}{h_t}\right)^2 \tag{8.9}$$
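Equations (8.8) and (8.9) can be sketched as below. The Gaussian form of $k_{st}$ is an assumption made for illustration, as are the function names and input values.

```python
import math

def u_st(x, y, t, xi, yi, ti, h_s, h_t):
    """Equation (8.9): squared distance in bandwidth-scaled space-time."""
    return ((x - xi) / h_s) ** 2 + ((y - yi) / h_s) ** 2 + ((t - ti) / h_t) ** 2

def k_st(u):
    """Trivariate standard Gaussian kernel expressed through the squared distance u (assumed shape)."""
    return math.exp(-0.5 * u) / (2.0 * math.pi) ** 1.5

def in_stkde(x, y, t, events, h_s, h_t):
    """Equation (8.8): one joint kernel evaluated on the combined space-time distance."""
    total = sum(k_st(u_st(x, y, t, xi, yi, ti, h_s, h_t)) for xi, yi, ti in events)
    return total / (len(events) * h_s ** 2 * h_t)

# Hypothetical events as (x, y, t); units are arbitrary.
events = [(0.0, 0.0, 1.0), (0.5, 0.2, 2.0), (3.0, 3.0, 10.0)]
print(in_stkde(0.2, 0.1, 1.5, events, h_s=1.0, h_t=2.0))
```

One design note: with a Gaussian kernel, the joint kernel factorizes into a spatial and a temporal Gaussian, so this sketch gives the same values as the product form of equation (8.4). The two estimators differ once a kernel that does not factorize in this way, such as the Epanechnikov kernel, is chosen.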

8.2.4 Validation Measurement

The estimated density is computed in a three-dimensional space. We define V as a set of spatiotemporal points $p_{xyt}$:

$$V = \{\, p_{xyt} \mid (x, y) \in S,\ t \in T \,\} \tag{8.10}$$

In the space–time point set V, (x, y) denotes the spatial dimensions and t denotes the temporal dimension. Using cells in the (x, y) plane to form surfaces gives the spatial dimensions a raster-like data structure. If the set of spatiotemporal points V is considered as a cube, the partitions of the cube can be called cube cells. To evaluate the effect of different STKDEs, we choose two evaluation metrics: hit rate and compactness index.

8.2.4.1 Hit Rate

To calculate the hit rate, let N be the total number of spatiotemporal data points and n be the number of spatiotemporal data points inside the clusters identified using STKDE. The hit rate Hit is then calculated as follows:

$$Hit = \frac{n}{N} \tag{8.11}$$

We classify the identified spatiotemporal clusters into two categories: hot clusters and cold clusters. The definition of hot and cold clusters is similar to that of hot and cold spots in the two-dimensional case: hot clusters are spatiotemporal clusters of objects with similarly high values of a certain unitary property, and cold clusters are spatiotemporal clusters of objects with similarly low values of that property. To define cold and hot clusters, it is necessary to define a threshold d for the estimated density. Given this threshold, a cube cell $p_{xyt}$ must satisfy $p_{xyt} \ge d$ to be a candidate cell for inclusion in a hot cluster. Intuitively, a larger number of spatiotemporal clusters may capture more spatiotemporal data points than a smaller number of spatiotemporal clusters. Let θ be a spatiotemporal cluster; the volume v of θ is then proportional to the number n of spatiotemporal data points in the cluster. Let P be the ratio of the volume v of the cluster to the total volume V defined by all spatiotemporal data points. P is calculated as follows:

$$P = \frac{v}{V} \tag{8.12}$$
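The hit rate of equation (8.11) at a given volume fraction P can be sketched as follows. The sketch assumes equally sized cube cells, so that a volume fraction equals a fraction of the cell count, and it assumes that the hot cluster is formed by the densest cells; the density values, points, and function names are hypothetical.

```python
def hot_cells(density, p):
    """Return the densest cells that together make up fraction p of all (equal-size) cells."""
    k = max(1, round(p * len(density)))
    ranked = sorted(density, key=density.get, reverse=True)
    return set(ranked[:k])

def hit_rate(points, cell_of, cluster_cells):
    """Equation (8.11): share of data points that fall inside the identified cluster cells."""
    n = sum(1 for pt in points if cell_of(pt) in cluster_cells)
    return n / len(points)

# Hypothetical 2 x 2 x 5 cube of unit cells with two dense cells.
density = {(i, j, k): 0.1 for i in range(2) for j in range(2) for k in range(5)}
density[(0, 0, 0)] = 5.0
density[(1, 1, 4)] = 3.0

# Hypothetical events as (x, y, t) in cell units; a point belongs to the cell it falls in.
points = [(0.2, 0.5, 0.1), (0.7, 0.3, 0.9), (1.5, 1.5, 4.2), (1.1, 0.4, 2.5)]
cell_of = lambda pt: (int(pt[0]), int(pt[1]), int(pt[2]))

for p in (0.05, 0.10):
    print(p, hit_rate(points, cell_of, hot_cells(density, p)))
```

Holding the volume fraction fixed is what makes hit rates of different estimators comparable: each estimator is allowed to flag the same share of the cube, and the one that captures more events within that share scores higher.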

Using values of P, it is possible to compare the hit rates of different KDEs at the significance levels of 0.01 or 0.05, where 0.01 and 0.05 indicate clusters covering 1% and 5% of the total volume V, respectively.

8.2.4.2 Compactness Index

To identify the compactness of spatiotemporal clusters, the analysis was performed using a block index (CL). CL is defined as the ratio of neighboring cells with similar properties in the raster data to that of data that are completely randomly distributed in space. The value of CL is larger when cells with certain properties form only a few large clusters, and smaller when the clustering of cells with similar properties is not significant. To generalize CL to spatiotemporal data, a spatiotemporal CL must be calculated. Let R be the ratio of the volume of all cubes in all identified hot clusters to the total volume. Also let G1 be the total number of adjacent faces between adjacent cubic units in all identified hot clusters, and G2 be the total number of adjacent faces between identified hot clusters and adjacent cold cluster cubic units. G is calculated as:

$$G = \frac{G_1}{G_1 + G_2} \tag{8.13}$$
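The ratio G of equation (8.13) can be obtained by counting faces directly, as in the sketch below. Several choices here are assumptions rather than the chapter's definitions: each face shared by two hot cells is counted once, every cell that is not hot is treated as cold, and faces on the outer boundary of the cube are ignored. The sketch computes only G, not the full CL index.

```python
NEIGHBOR_OFFSETS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def face_ratio(hot, shape):
    """Return (G1, G2, G) for a set of hot cells inside a cube of the given shape."""
    nx, ny, nt = shape
    hot_hot = 0   # hot-hot faces, seen once from each side
    hot_cold = 0  # faces between a hot cell and a non-hot cell inside the cube
    for i, j, k in hot:
        for di, dj, dk in NEIGHBOR_OFFSETS:
            a, b, c = i + di, j + dj, k + dk
            if not (0 <= a < nx and 0 <= b < ny and 0 <= c < nt):
                continue  # ASSUMPTION: faces on the cube boundary are not counted
            if (a, b, c) in hot:
                hot_hot += 1
            else:
                hot_cold += 1
    g1 = hot_hot // 2  # each shared hot-hot face was visited twice
    g2 = hot_cold
    g = g1 / (g1 + g2) if (g1 + g2) else 0.0
    return g1, g2, g

# A compact 2 x 2 x 2 block versus two isolated cells in a 4 x 4 x 4 cube.
block = {(i, j, k) for i in (1, 2) for j in (1, 2) for k in (1, 2)}
scattered = {(0, 0, 0), (2, 2, 2)}
print(face_ratio(block, (4, 4, 4)))
print(face_ratio(scattered, (4, 4, 4)))
```

Under these assumptions, hot cells that form a compact block share many faces with one another and yield a larger G than the same kind of cells scattered through the cube, which share none.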

Let n be the total number of cubic units in the hot clusters; then 6 × n is the total number of faces of all cubic units in the clusters, so G2 can be calculated from G1. G2 is calculated as:

8.3 EXAMPLE APPLICATION

FIGURE 8.3 Distribution of residential burglaries in Akron, Ohio, 2012.

FIGURE 8.4 Kernel estimation of burglary density in Akron, Ohio, 2012.

We used the three aforementioned STKDE methods and one spatial KDE method to estimate burglary densities. We also experimented with six different kernel functions for all four KDEs. The experimental results are shown in Table 8.1. IN_KDE has the highest hit rate in most cases. Table 8.2 shows the CL values of the different KDE algorithms. CN_KDE is the most sensitive to variation in the dataset, having the highest maximum density, the highest average density, the highest mean, the highest median, and the largest variance. The experimental results demonstrate that the STKDE methods can identify cold and hot clusters better than the spatial KDE method can.

FIGURE 8.5 Temporal trend of reported residential burglaries in Akron in 2012, by week.

FIGURE 8.6 Comparison of KDE surfaces in different weeks: (a) week 1, (b) week 2, (c) week 27, and (d) week 52.

TABLE 8.1 Hit Rates by Different KDE Methods

Kernel Function       Spatial KDE        CL_KDE             IN_KDE             CN_KDE
Significance level    0.01     0.05      0.01     0.05      0.01     0.05      0.01     0.05
Const                 6.99     28.57     11.36    33.71     10.53    40.85     11.36    33.71
Exponential           11.31    38.59     78.42    100.00    56.06    100.00    32.99    78.42
Gaussian              10.53    37.67     44.50    100.00    44.76    100.00    24.97    70.45
Epanechnikov          9.04     33.40     26.36    69.84     29.96    100.00    19.12    60.43
Quartic               9.97     36.54     39.88    100.00    40.90    100.00    23.64    66.60
Polynomial order 5    10.28    37.82     42.24    100.00    46.09    100.00    25.13    69.99

TABLE 8.2 Compactness Index (CL) Values by Different KDE Methods

Kernel Function       Spatial KDE      CL_KDE           IN_KDE           CN_KDE
Significance level    0.01    0.05     0.01    0.05     0.01    0.05     0.01    0.05
Const                 0.62    0.77     0.68    0.55     0.41    0.61     0.68    0.55
Exponential           0.69    0.79     0.65    0.75     0.68    0.76     0.70    0.76
Gaussian              0.71    0.80     0.70    0.76     0.70    0.76     0.72    0.79
Epanechnikov          0.73    0.81     0.75    0.80     0.73    0.78     0.74    0.80
Quartic               0.70    0.79     0.71    0.76     0.69    0.75     0.72    0.79
Polynomial order 5    0.68    0.78     0.69    0.75     0.68    0.74     0.72    0.79

8.4 SOFTWARE AND USER MANUAL

We have developed a tool that computes the three STKDEs described above, evaluates their accuracy, and visualizes the results spatially. Here we describe how to download and use the tool provided with this book. Please email spatiotemporal. [email protected] with your name, affiliation information, and a copy of the receipt for purchasing this book to download Chapter 08.zip. After downloading, unzip it to any location. The software interface is shown in Figure 8.7. For convenience, we only provide options to select different STKDEs and the corresponding kernel functions (Table 8.3). The tool supports only numerically encoded time series, such as 1, 2, 3, 4…, to indicate time sequences in days (or whatever unit is used). After starting the tool, click the Open button to select the input file. Check S_Show and ST_Show to display the spatial distribution and the spatial and temporal distribution of the input data, respectively. Be sure to select the input file; otherwise the program will report that no data is selected (Figure 8.8). The input data contains three dimensions: x-coordinates, y-coordinates, and time t.

FIGURE 8.7 Software interface.

TABLE 8.3 Illustration of Software Parameters

Parameter              Description
Input file             The input file of point data, including coordinates and occurrence time.
S_Show                 If selected, a two-dimensional distribution of the point data will be displayed.
ST_Show                If selected, a three-dimensional distribution of the point data will be displayed.
Selection algorithm    Option to select one KDE algorithm.
CL_STKDE               Classic spatiotemporal kernel density estimation.
CN_STKDE               Conditional spatiotemporal kernel density estimation.
IN_STKDE               Integrative spatiotemporal kernel density estimation.
S_kernel               Kernel in spatial dimension.
T_kernel               Kernel in temporal dimension.
ST_kernel              Kernel in space and time.
Loading                Density calculation progress.
Run                    Run program.

FIGURE 8.8 Prompt for no data selected.

FIGURE 8.9 Prompt for no function selected.

When running the software, make sure to check at least one of S_Show, ST_Show, CL_STKDE, CN_STKDE, and IN_STKDE. Otherwise, the program will respond with an error message that no function is selected (Figure 8.9). It is recommended to select only one of CL_STKDE, CN_STKDE, and IN_STKDE. If you select more than one, the calculations are done in the order CL_STKDE, CN_STKDE, IN_STKDE, and each calculation overwrites the results of the previous one, resulting in calculation errors.

We use a small amount of data to demonstrate the tool's functions. Checking S_Show displays an image like that in Figure 8.10. Checking ST_Show displays a graph similar to that in Figure 8.11. Select the kernel function (S_kernel) corresponding to the STKDE method (Figure 8.12). Figure 8.13 shows the progress display for the calculation. Figure 8.14 is similar to Figure 8.11: the density of regions containing data points is usually higher, but some regions that do not contain data points also have higher density values.

FIGURE 8.10 Spatial distribution.

FIGURE 8.11 Spatial and temporal distribution.

FIGURE 8.12 Option of kernel function.

FIGURE 8.13 Density calculation progress.

FIGURE 8.14 Regional distribution map of high-density areas.

Note (Tables 8.1 and 8.2): The best results under the two statistical significance levels are bolded.

In the calculation of Figure 8.14, we divided the whole spatiotemporal region into 100 × 100 × 100 subregions along the three dimensions. We then used the density value at the center point of each subregion as the density value of that subregion. In the three STKDE algorithms, we set the bandwidth in each of the three dimensions to 1% of the extent of the corresponding dimension. In CN_STKDE, we chose the time threshold as 1% of the time span. In IN_STKDE, we used the mid-value of each dimension for data normalization. Since a total of one million subregions were created, each subregion is first checked: if its density function value is greater than one million, the subregion is considered a high-density subregion. The average value of all high-density subregions is then calculated. In addition, subregions are displayed in different colors according to their density values. Subregions with density values greater than twice the average density are displayed in red, those with a density value greater than the average density are displayed in yellow, those with a density value greater than one million are displayed in blue, and the remaining subregions are not colored.
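The gridding and coloring procedure described above can be sketched as follows. This is not the code of the accompanying tool: the function names are hypothetical, the high-density cutoff is passed in as a parameter, and a toy density function on a 2 × 2 × 2 grid stands in for an STKDE surface on the 100 × 100 × 100 grid.

```python
def cell_centers(lo, hi, m):
    """Centers of m equal subintervals of [lo, hi]."""
    step = (hi - lo) / m
    return [lo + (i + 0.5) * step for i in range(m)]

def classify_cells(density_fn, bounds, m, threshold):
    """Evaluate density at each cell center and assign the colors described in the text."""
    (x0, x1), (y0, y1), (t0, t1) = bounds
    values = {}
    for i, x in enumerate(cell_centers(x0, x1, m)):
        for j, y in enumerate(cell_centers(y0, y1, m)):
            for k, t in enumerate(cell_centers(t0, t1, m)):
                values[(i, j, k)] = density_fn(x, y, t)
    high = [v for v in values.values() if v > threshold]  # high-density subregions
    if not high:
        return {}
    avg = sum(high) / len(high)  # average over high-density subregions only
    colors = {}
    for cell, v in values.items():
        if v > 2 * avg:
            colors[cell] = "red"
        elif v > avg:
            colors[cell] = "yellow"
        elif v > threshold:
            colors[cell] = "blue"
    return colors  # cells absent from the result stay uncolored

# Toy stand-in for an estimated density surface (hypothetical values).
def toy_density(x, y, t):
    if x < 0.5 and y < 0.5 and t < 0.5:
        return 10.0
    return 1.0 if x < 0.5 else 0.0

colors = classify_cells(toy_density, ((0, 1), (0, 1), (0, 1)), m=2, threshold=0.5)
print(colors)
```

With m = 100 the same loop visits one million cell centers, which is why the tool reports calculation progress; in that setting `density_fn` would be one of the three STKDE estimators.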

REFERENCES

Boros, A., and J. Lee. 1995. Point pattern analysis of offender residences in Cuyahoga County, Ohio. In M. Salling (ed.), URISA Proceedings, San Antonio, TX: 56–65.
Brunsdon, C., J. Corcoran, and G. Higgs. 2007. Visualising space and time in crime patterns: A comparison of methods. Computers, Environment and Urban Systems 31(1):52–75. doi: 10.1016/j.compenvurbsys.2005.07.009.
Lee, J., J. Gong, and S. Li. 2017. Exploring spatiotemporal clusters based on extended kernel estimation methods. International Journal of Geographical Information Science 31(6):1154–1177.

9 Spatiotemporally Weighted Regression

Bo Huang
The Chinese University of Hong Kong

Sensen Wu
The Chinese University of Hong Kong
Zhejiang University

DOI: 10.1201/9781003304395-9

CONTENTS
9.1 Introduction
9.2 Methodology
  9.2.1 OLS Model
  9.2.2 GWR Model
  9.2.3 GTWR Model
9.3 Application Examples
  9.3.1 House Price Estimation
  9.3.2 Environmental Pollution Monitoring
  9.3.3 Transportation Management
  9.3.4 Crime Analysis–Based Urban Planning
9.4 Software and Usage
  9.4.1 Installation and Uninstallation
    9.4.1.1 How to Install GTWR Add-in
    9.4.1.2 Uninstall
  9.4.2 Run GTWR
    9.4.2.1 Data Input
    9.4.2.2 Setting
    9.4.2.3 Output
    9.4.2.4 Error
  9.4.3 Some Notes
    9.4.3.1 Data Requirements
    9.4.3.2 Model Test
    9.4.3.3 Spatiotemporal Distance
9.5 Concluding Remarks
References

9.1 INTRODUCTION

Space and time are two fundamental dimensions of geographical processes and social phenomena. As the world becomes instrumented and interconnected, spatiotemporal data are more ubiquitous and richer than ever before. Trajectories of moving objects (such as taxis and birds) recorded by GPS devices, social events (such as microblog posts and crimes) with location tags and time stamps, and environmental monitoring readings are typical spatiotemporal data that we can observe every day. The development of Internet of Things (IoT) technology allows sensors, radio-frequency identification, and Bluetooth to be integrated into the real-world environment through highly networked services. As a result, IoT promises to increase the volume of spatiotemporal data by several orders of magnitude.

Analysis and modeling of spatiotemporal data has long been one of the main focuses of geographical information science. Regression analysis of geographical relationships is a basic topic in the study of spatiotemporal modeling. Developing new geo-enabled regression methods that improve the capabilities of spatiotemporal analysis and data mining is important for understanding geographical processes and social phenomena.

Regression methods can be grouped into two general categories: global and local regression models. Global spatial models are usually an improved form of the ordinary least squares (OLS) model. Spatial or temporal effects are addressed by modeling the residual variance–covariance matrix directly or by inverting the residual variance–covariance matrix to eliminate dependency in the residuals. However, a major problem with global methods when applied to spatial or temporal data is that the processes being examined are assumed to be constant over space. For a specific model (e.g., of the price of real estate), the assumption of stationarity or structural stability over time and space is generally unrealistic, as parameters tend to vary over the study area.

To capture spatial heterogeneity in geographical relationships, various localized modeling techniques have been proposed. Notably, Brunsdon et al. (1996) and Fotheringham et al. (2002) proposed geographically weighted regression (GWR) as a local variation modeling technique. GWR allows the exploration of the variation of the parameters as well as the testing of the significance of this variation. GWR has therefore been widely used in numerous fields, such as environmental modeling (Zhang et al. 2016, Zhai et al. 2018, Yang et al. 2019, Zhang et al. 2019), public health (Chen et al. 2016, Wang et al. 2016, Ge et al. 2017), housing price analysis (Harris et al. 2013, Lu et al. 2016, Cao et al. 2019, Fang et al. 2019), transport geography (Selby and Kockelman 2013, Li et al. 2016, Xu et al. 2019), and crime research (Cahill and Mulligan 2007, Stein et al. 2016).

In addition to space, time is another basic dimension associated with geographical processes and social phenomena. To model temporally heterogeneous effects, Huang et al. (2010) developed a geographically and temporally weighted regression (GTWR) model that deals with spatial and temporal non-stationarity simultaneously by incorporating temporal effects into the standard GWR model. Because the space–time distance defined by GTWR solves the problem of space–time integration, the GTWR model has been successfully applied in diverse domains with excellent results and decent spatiotemporal interpretability (Liu et al. 2017, Du et al. 2018, Wu et al. 2019, Zeng et al. 2019, Zhou and Lin 2019, Dong et al. 2020). Moreover, compared to the GWR model, the GTWR model often shows superior performance by taking temporal dynamics into account, resulting in apparent improvements in both prediction accuracy and efficiency.

The rest of this chapter is organized as follows. In Section 9.2, we present the basic framework of the GTWR model and offer the key details of technical implementation. In Section 9.3, several application examples of the GTWR model are reported. In Section 9.4, we introduce the usage of the GTWR model in the ArcGIS software. Finally, we summarize and draw conclusions.

9.2 METHODOLOGY

9.2.1 OLS Model

In spatial analysis, the OLS model is a basic method for identifying the nature of the relationships among factors in the form of a linear regression. In this technique, the relationship between the dependent variable $y_i$ and the independent variables $x_{i1}, x_{i2}, \ldots, x_{ip}$ can be expressed as:

$$y_i = \beta_0 + \sum_{k=1}^{p} \beta_k x_{ik} + \varepsilon_i, \qquad i = 1, 2, \ldots, n \tag{9.1}$$

where $\beta_0$ is the constant coefficient of the regression; $\beta_1, \ldots, \beta_p$ are the regression coefficients of the corresponding independent variables; and $\varepsilon_i$ is the error term of sample i with zero mean and constant variance $\sigma^2$. The estimates of the OLS model in matrix form are as follows:

$$\hat{\beta} = \left(X^{T} X\right)^{-1} X^{T} y \tag{9.2}$$

where

$$\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}, \quad X = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1p} \\ 1 & x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}, \quad \text{and} \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$
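Equation (9.2) can be sketched with the normal equations and a small linear solver, using only the Python standard library. The solver, the function names, and the sample data are assumptions made for illustration; in practice a numerical library would normally be used instead.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # pivot row
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                for k in range(c, n + 1):
                    m[r][k] -= f * m[c][k]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(x_rows, y):
    """Equation (9.2): solve (X^T X) beta = X^T y, with a leading column of ones in X."""
    X = [[1.0] + list(r) for r in x_rows]
    p = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve(xtx, xty)

# Hypothetical data generated exactly from y = 2 + 3*x1 - x2 (no noise).
x_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [2, 5, 1, 4, 7]
beta = ols(x_rows, y)
print(beta)
```

Because the sample data lie exactly on a plane, the estimated coefficients recover the intercept and the two slopes up to floating-point rounding.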

     

9.2.2 GWR Model

The basic concern associated with the GWR model is that a global model’s coefficient estimates are probably unable to express the sophisticated local variations over space. Therefore, the global form is extended by the GWR to allow local estimation, and the GWR can be mathematically expressed as (Fotheringham et al. 2002): yi = β 0 ( ui , vi ) +

p

∑ β (u , v ) x k

i

i

ik

+ ε i      i = 1,2,…, n

(9.3)

k=1

where ( ui , vi ) represents the coordinates of point  i in space and β 0 ( ui , vi ) denotes the intercept value. β k ( ui , vi ) is a series of coefficient values at point i . In the GWR model, by allowing the coefficient values to vary spatially, one can capture the local effects. The estimation of the coefficient values β ( ui , vi ) can be expressed as: −1 βˆ ( ui , vi ) =  X T W ( ui , vi ) X  X T W ( ui , vi ) y

(9.4)

where W ( ui ,vi ) is an n × n geographical weighting matrix. The diagonal elements of the W ( ui ,vi ) matrix denote the geographical weights, and the off-diagonal elements are set to zero. A certain weight kernel is used to calculate the weight matrix for each point  i . In the GWR model, the weight kernels usually adopted are the Gaussian, bi-square, tri-cube, and exponential functions and can be either fixed or adaptive (Fotheringham et al. 2002). For instance, the fixed Gaussian-based kernel function can be expressed as:

$$a_{ij} = \exp\left( -\frac{(d_{ij}^{s})^2}{h^2} \right) \tag{9.5}$$

where $h$, the bandwidth, is a non-negative attenuation parameter that produces a declining effect relative to the distance $d_{ij}^{s}$. Adaptive kernels, which use adaptive bandwidths, are commonly constructed to ensure adequate local calibration whether the samples are dense or sparse. For instance, the form of an adaptive bi-square weighting function is as follows:

$$a_{ij} = \begin{cases} \left[ 1 - \left( d_{ij}^{s} / h_i \right)^2 \right]^2, & \text{if } d_{ij}^{s} < h_i \\ 0, & \text{otherwise} \end{cases} \tag{9.6}$$

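where $h_i$ denotes the distance from point $i$ to its $q$th nearest neighbor; calibrating the adaptive kernel is therefore equivalent to calibrating the value of $q$. The bandwidth is usually calculated using a cross-validation (CV) procedure:

$$CV(h) = \sum_{i} \left( y_i - \hat{y}_{\neq i}(h) \right)^2 \tag{9.7}$$

where $\hat{y}_{\neq i}(h)$ is the fitted value of $y_i$ obtained with bandwidth $h$ when observation $i$ is omitted from the calibration.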

In addition, the parameter $h$ or $q$ can also be fitted by minimizing the corrected Akaike Information Criterion (AICc), which takes the following form:

$$AIC_c = n \log_e\left( \hat{\sigma}^2 \right) + n \log_e(2\pi) + n \left[ \frac{n + \mathrm{tr}(S)}{n - 2 - \mathrm{tr}(S)} \right] \tag{9.8}$$

where S is the hat matrix. The fitted values yˆ are obtained by premultiplying the observed values y with matrix S : yˆ = Sy .
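As an illustration only, Eqs. (9.4), (9.5), (9.7), and (9.8) can be sketched in plain Python. The helper names (`solve`, `gaussian_weights`, `local_beta`, `gwr_fit`, `cv_score`, `aicc`), the toy data, and the candidate bandwidths are assumptions made for this sketch and are not part of any published GWR software. The error variance in AICc is taken here as RSS/n, which is also an assumption.

```python
import math

def solve(A, b):
    """Solve the small linear system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [A[r][:] + [b[r]] for r in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gaussian_weights(coords, i, h):
    """Eq. (9.5): a_ij = exp(-(d_ij)^2 / h^2) for every observation j."""
    ui, vi = coords[i]
    return [math.exp(-((ui - u) ** 2 + (vi - v) ** 2) / h ** 2) for u, v in coords]

def weighted_normal_matrix(X, w):
    """X^T W X for a diagonal W stored as the list w."""
    p = len(X[0])
    return [[sum(wj * row[a] * row[b] for wj, row in zip(w, X)) for b in range(p)]
            for a in range(p)]

def local_beta(X, y, w):
    """Eq. (9.4): beta_hat = [X^T W X]^{-1} X^T W y."""
    p = len(X[0])
    xtwy = [sum(wj * row[a] * yj for wj, row, yj in zip(w, X, y)) for a in range(p)]
    return solve(weighted_normal_matrix(X, w), xtwy)

def gwr_fit(coords, X, y, h):
    """Return fitted values and tr(S), using one local regression per observation."""
    y_hat, trace_s = [], 0.0
    for i, xi in enumerate(X):
        w = gaussian_weights(coords, i, h)
        beta = local_beta(X, y, w)
        y_hat.append(sum(b * x for b, x in zip(beta, xi)))
        # Diagonal of the hat matrix: S_ii = w_ii * x_i [X^T W X]^{-1} x_i^T
        z = solve(weighted_normal_matrix(X, w), xi)
        trace_s += w[i] * sum(a * b for a, b in zip(xi, z))
    return y_hat, trace_s

def cv_score(coords, X, y, h):
    """Eq. (9.7): leave-one-out CV; observation i gets zero weight in its own fit."""
    total = 0.0
    for i, xi in enumerate(X):
        w = gaussian_weights(coords, i, h)
        w[i] = 0.0
        beta = local_beta(X, y, w)
        total += (y[i] - sum(b * x for b, x in zip(beta, xi))) ** 2
    return total

def aicc(y, y_hat, trace_s):
    """Eq. (9.8), assuming sigma_hat^2 = RSS / n."""
    n = len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    return (n * math.log(rss / n) + n * math.log(2 * math.pi)
            + n * (n + trace_s) / (n - 2 - trace_s))

# Toy data (hypothetical): a 3 x 3 grid, an intercept column plus one variable.
coords = [(u, v) for u in range(3) for v in range(3)]
x1 = [0.5, 1.0, 1.5, 2.0, 3.0, 2.5, 4.0, 3.5, 5.0]
X = [[1.0, x] for x in x1]
y = [1.0 + 2.0 * x + 0.1 * (-1) ** j for j, x in enumerate(x1)]

best_h = min([1.0, 1.5, 2.0, 3.0], key=lambda h: cv_score(coords, X, y, h))
y_hat, trace_s = gwr_fit(coords, X, y, 3.0)
print("CV-selected bandwidth:", best_h, "AICc at h = 3:", aicc(y, y_hat, trace_s))
```

As the bandwidth grows, every weight approaches 1 and tr(S) approaches the number of columns of X, so the local fits collapse to the global OLS estimate of Eq. (9.2).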

9.2.3 GTWR Model

The GTWR model was originally extended from the traditional GWR model to address both spatial and temporal non-stationarity. It was first applied to spatiotemporal real estate data, where its accuracy and statistical significance were compared with those of other estimation methods (Huang et al. 2010). The GTWR model extends the GWR framework by allowing parameters to be estimated locally and instantaneously over time and space, so that the model can be expressed as:

$$Y_i = \beta_0(\mu_i, \nu_i, t_i) + \sum_{k} \beta_k(\mu_i, \nu_i, t_i)\, X_{ik} + \varepsilon_i, \tag{9.9}$$

where $(\mu_i, \nu_i, t_i)$ denotes the space–time location of point $i$, $\beta_0(\mu_i, \nu_i, t_i)$ represents the intercept value, and $\beta_k(\mu_i, \nu_i, t_i)$ is a set of parameter values at point $i$. This model allows the parameter estimates to vary across space and time to account for spatial–temporal non-stationarity. To calibrate the model, it is assumed that the observed data close to point $i$ in the space–time coordinate system have a greater influence on the estimation of the $\beta_k(\mu_i, \nu_i, t_i)$ parameters than the data located farther from observation $i$. A weight matrix $W_{ij}$ is therefore adopted to represent the importance of each individual observation $j$ in the dataset used to estimate the parameters at location $i$. Thus, the estimation of $\beta_k(\mu_i, \nu_i, t_i)$ can be expressed as:

$$\hat{\beta}(\mu_i, \nu_i, t_i) = \left[ X^T W(\mu_i, \nu_i, t_i) X \right]^{-1} X^T W(\mu_i, \nu_i, t_i)\, Y \tag{9.10}$$

where W ( µi ,ν i , ti ) = diag (α i1 ,α i 2 ,…,α in ) , n is the number of observations. Here the diagonal elements α ij   (1 ≤ j ≤ n ) are space–time distance functions of ( µ ,ν ,t ) corresponding to the weights when calibrating a weighted regression adjacent to observation point i . Thus, the GTWR model relies on the appropriate specification of the space–time kernel function α ij. Considering that location and time usually have different scaling effects, Huang et al. (2010) combined the spatial distance d S and the temporal distance d T to form a spatiotemporal distance d ST = d S ⊗ d T , with symbol ⊗ standing for different operators. Specifically, assuming the Euclidean distance function and Gaussian distance decay-based kernel function are used to construct the spatial–temporal weight matrix, we will have

$$\left( d_{ij}^{ST} \right)^2 = \lambda \left[ (\mu_i - \mu_j)^2 + (\nu_i - \nu_j)^2 \right] + \mu\, (t_i - t_j)^2, \tag{9.11}$$

where $d_{ij}^{ST}$ represents the extent of “closeness” in spatiotemporal space, and $t_i$ and $t_j$ are the observed times at locations $i$ and $j$. The corresponding Gaussian weight is

$$\begin{aligned}
\alpha_{ij} &= \exp\left\{ -\frac{\lambda \left[ (\mu_i - \mu_j)^2 + (\nu_i - \nu_j)^2 \right] + \mu\, (t_i - t_j)^2}{h_{ST}^2} \right\} \\
&= \exp\left\{ -\left( \frac{(\mu_i - \mu_j)^2 + (\nu_i - \nu_j)^2}{h_S^2} + \frac{(t_i - t_j)^2}{h_T^2} \right) \right\} \\
&= \exp\left\{ -\left( \frac{(d_{ij}^{S})^2}{h_S^2} + \frac{(d_{ij}^{T})^2}{h_T^2} \right) \right\} \\
&= \exp\left\{ -\frac{(d_{ij}^{S})^2}{h_S^2} \right\} \times \exp\left\{ -\frac{(d_{ij}^{T})^2}{h_T^2} \right\} \\
&= \alpha_{ij}^{S} \times \alpha_{ij}^{T}
\end{aligned} \tag{9.12}$$

where

$$\alpha_{ij}^{S} = \exp\left\{ -\frac{(d_{ij}^{S})^2}{h_S^2} \right\}, \qquad \alpha_{ij}^{T} = \exp\left\{ -\frac{(d_{ij}^{T})^2}{h_T^2} \right\},$$

$$(d_{ij}^{S})^2 = (\mu_i - \mu_j)^2 + (\nu_i - \nu_j)^2, \qquad (d_{ij}^{T})^2 = (t_i - t_j)^2,$$

$h_{ST}$ is a parameter of spatiotemporal bandwidth, and $h_S^2 = h_{ST}^2 / \lambda$ and $h_T^2 = h_{ST}^2 / \mu$ are parameters of the spatial and temporal bandwidths, respectively. As such, the weighting construct of the GTWR model retains a diagonal matrix, whose diagonal elements are the products $\alpha_{ij}^{S} \times \alpha_{ij}^{T}$ $(1 \le j \le n)$. It follows that we can build a spatially weighted matrix $W^S$ and a temporally weighted matrix $W^T$ and then combine them to form a spatial–temporal weight matrix $W^{ST} = W^S \times W^T$. After the distances $d_{ij}^{ST}$ between location $i$ and all observations are computed, the weighting functions can be constructed. In theory, if there is no temporal variation in the observation data, the parameter $\mu$ can be set to 0 (i.e., $\mu = 0$), which reduces the distance calculation to the traditional GWR distance. If, on the other hand, the parameter $\lambda$ is set to 0 (i.e., $\lambda = 0$), only the temporal distances and the temporal non-stationarity are considered. This leads to a temporally weighted regression (TWR). In most real cases, however, neither $\lambda$ nor $\mu$ equals zero, and both spatial and temporal distances are modeled. In practice, the bandwidth $h$ and the ratio parameters $\mu$ and $\lambda$ can be optimized using cross-validation in terms of the coefficient of determination ($R^2$) or AICc if no a priori knowledge is available (Huang et al. 2010).
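The factorization in Eq. (9.12) and the two limiting cases (µ = 0 and λ = 0) can be checked numerically. The following is a minimal sketch, not the implementation of any GTWR software. The function names and sample points are hypothetical, and the spatial coordinates are written `u`, `v` in the code so that they are not confused with the ratio parameter `mu`.

```python
import math

def st_distance_sq(p, q, lam, mu):
    """Eq. (9.11): squared spatiotemporal distance between two (u, v, t) points."""
    (ui, vi, ti), (uj, vj, tj) = p, q
    return lam * ((ui - uj) ** 2 + (vi - vj) ** 2) + mu * (ti - tj) ** 2

def gtwr_weight(p, q, lam, mu, h_st):
    """Eq. (9.12), first line: Gaussian kernel on the spatiotemporal distance."""
    return math.exp(-st_distance_sq(p, q, lam, mu) / h_st ** 2)

def separable_weight(p, q, lam, mu, h_st):
    """Eq. (9.12), last line: spatial kernel times temporal kernel (needs lam, mu > 0)."""
    h_s2 = h_st ** 2 / lam  # spatial bandwidth squared
    h_t2 = h_st ** 2 / mu   # temporal bandwidth squared
    (ui, vi, ti), (uj, vj, tj) = p, q
    alpha_s = math.exp(-((ui - uj) ** 2 + (vi - vj) ** 2) / h_s2)
    alpha_t = math.exp(-((ti - tj) ** 2) / h_t2)
    return alpha_s * alpha_t

def weight_row(points, i, lam, mu, h_st):
    """Diagonal of W(u_i, v_i, t_i) in Eq. (9.10) for regression point i."""
    return [gtwr_weight(points[i], q, lam, mu, h_st) for q in points]

# Hypothetical observations: two locations, each observed at t = 1 and t = 8.
points = [(0.0, 0.0, 1.0), (3.0, 4.0, 1.0), (0.0, 0.0, 8.0), (3.0, 4.0, 8.0)]
row = weight_row(points, 0, lam=1.0, mu=0.5, h_st=10.0)
print(row)
```

With `mu=0` the weight between two observations at the same location is 1 regardless of their time difference, which is the GWR case. With `lam=0` the weight depends only on the time difference, which is the TWR case.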

9.3 APPLICATION EXAMPLES

Spatiotemporal analysis and modeling have long been a major concern of geographic information science, environmental science, epidemiology, and other research areas. The wide applications of the GTWR model include exploring the spatiotemporal patterns of human behavior, crime activities, and disease outbreaks to analyze and visualize space–time data. In this section, typical applications (in environmental science, landscape ecology, hedonic house pricing, human behavior, health research, and crime studies) are used to demonstrate the effectiveness of the GTWR model and its superiority to traditional estimation methods, highlighting the importance of temporally explicit spatial modeling.

9.3.1 House Price Estimation

In this case, the GTWR approach was employed to calibrate local hedonic price models in the city of Calgary, Canada (Huang et al. 2010). The performance of different models (OLS, GWR, TWR, and GTWR) was compared in terms of (1) conventional goodness-of-fit measured by R², and (2) statistical significance measured by McNemar’s test. GTWR achieved better modeling accuracy than both the global OLS model, which incorporates no spatiotemporal non-stationarity, and the GWR model, which deals with spatial non-stationarity only. Compared with the global OLS model, TWR and GWR increased the R² value from 0.763 to 0.779 and 0.889, respectively, and GTWR yielded a considerably higher R² of 0.928. The residual sum of squares (RSS) for GTWR also showed a 46.4% improvement over OLS and a 15.6% improvement over GWR. Statistical tests showed significant differences among GTWR, GWR, and TWR. The study therefore concluded that it is meaningful to incorporate temporal non-stationarity into a GWR model, and that GTWR can provide an additional useful methodology for computer-assisted mass estimation of real property prices.

9.3.2 Environmental Pollution Monitoring

In this case, the GTWR approach was used to generate ground-level PM2.5 concentrations from satellite-derived 500 m AOD in a region covering the adjacent parts of Jiangsu, Shandong, Henan, and Anhui provinces in central China (Bai et al. 2016). The GTWR model incorporates the SARA (simplified high-resolution MODIS aerosol retrieval algorithm) AOD product with meteorological variables, including planetary boundary layer height (PBLH), relative humidity (RH), wind speed (WS), and temperature (TEMP) extracted from WRF (weather research and forecasting) assimilation, to depict the spatiotemporal dynamics in the PM2.5-AOD relationship (Figure 9.1). A cross-validation was done to evaluate the performance of the GTWR model. Compared to the OLS, GWR, and TWR models, the GTWR model showed the best performance in depicting the spatiotemporal dynamics of the PM2.5-AOD relationship for both model fitting and cross-validation, obtaining the highest value of R² and the lowest values of mean absolute difference (MAD), root mean square error (RMSE), and mean absolute percentage error (MAPE).

FIGURE 9.1 Location of AERONET station and environmental monitoring stations in JSHA. (Revised from Bai et al. 2016.)

9.3.3 Transportation Management

The rapid increase in private car ownership aggravates metropolitan traffic congestion, causing a series of issues such as air pollution, high energy consumption, and accidents. To deal with these problems and to improve service quality in order to promote the public transit system, it is critical to identify the key determinants that affect transit ridership and to analyze how their influence evolves over space and time. OLS regression is the most representative and widely used statistical approach for unraveling the complex relationship between the built environment and transit ridership; however, ridership data from a particular place do not conform to the independence assumption because of local interaction and spatial non-stationarity. Hence, several OLS-extended models have been proposed to consider spatial heterogeneity in parameter estimation. Typical examples include distance decay weighted regression, two-stage least squares regression, and geographically weighted regression models. Several studies on the influence of the built environment on transit ridership are summarized in Table 9.1. Compared to the traditional global OLS regression model, the listed extended models overcome the drawback of neglecting the spatial autocorrelation effect. Among them, the GWR model is specifically designed for spatial data regression and effectively captures the spatial pattern of data via spatially varying coefficients. However, when modeling spatiotemporal data using GWR, the input (i.e., dependent variable) must be aggregated or averaged over a certain period, such as average annual traffic data or daily boarding passengers. Time is another critical dimension that cannot be adequately captured by traditional GWR models.

TABLE 9.1 Summary of the Studies on the Impact of the Built Environment on Transit Ridership

| Author | Dependent Variable | Model | Key Explanatory Variable |
|---|---|---|---|
| Quick et al. (2019) | Taxi ridership in the zip code tabulation areas | Geographically weighted regression | Road density, bike lane density, parking spaces, subway, and bus accessibility |
| Jun et al. (2015) | Station-level ridership of the pedestrian catchment areas | Mixed geographically weighted regression | Population and employment densities, mixed land use, intersection density, road density, number of bus stops |
| Zhao et al. (2013) | Ridership within the pedestrian catchment area of metro stations | Ordinary least squares regression | Area of residential, office, and other-use buildings; number of educational institutions, hotels, restaurants, entertainment venues, shopping centers, and hospitals; road length; number of feeder bus lines |
| Cardozo et al. (2012) | Stop-level boarding passengers | Geographically weighted regression | Land use mix, street density, number of metro lines, number of urban bus lines, number of suburban bus lines |
| Taylor et al. (2009) | Transit ridership for each of the 265 urbanized areas | Two-stage simultaneous equation regression | Regional geography, metropolitan economy, population characteristics, auto/highway system, transit system characteristics |

To fill this gap, Ma et al. (2018) used the GTWR approach to explore the spatial and temporal influence of the built environment on transit ridership. First, a GTWR model extended from the GWR model was applied to analyze transit ridership with high fitting precision; second, the spatiotemporal pattern of coefficients was analyzed, and an empirical study in Beijing was used to validate the effectiveness of the GTWR model in exploring the relationship between transit ridership and the built environment.

In this case study, the traffic analysis zone (TAZ) provided by the Beijing Urban Planning Bureau was adopted as the analysis unit. Transit ridership was characterized by smart card data because of the high usage rate of such cards. The built environment variables are described in Table 9.2. The results show that the GTWR model can simultaneously incorporate spatial and temporal non-stationarities into transit ridership data analysis. The time-dependent effects of the built environment on transit ridership were confirmed by the spatial and temporal distributions of coefficients. Moreover, the GTWR model achieves a significantly better goodness-of-fit than the traditional OLS and GWR models. In particular, R² increases from 0.153 in the OLS model and 0.406 in the GWR model to 0.965 in the GTWR model. The AIC value is reduced from 256,781.9 and 239,019.7 in the OLS model and GWR model, respectively, to 137,818.1 in the GTWR model.

TABLE 9.2 Built Environment Variables

| Type | Variable | Description (Unit: per km² in Each TAZ) |
|---|---|---|
| Land use | Residential building density | Number of residential buildings |
| | Place of employment density | Number of companies, research and education agencies, and government agencies |
| | Commercial establishment density | Number of shopping malls, restaurants, retail stores, and entertainment centers |
| | Service facility density | Number of automobile, telecommunication, financial, and medical service facilities |
| | Hotel density | Number of hotels |
| | Attraction density | Number of attractions |
| Transport | Bus stop density | Number of bus stops |
| | Metro station density | Number of metro stations |
| | Road density | Length of road |
| | External station density | External stations (airport, rail stations, and long-distance bus stations) |

9.3.4 Crime Analysis–Based Urban Planning

Local land use composition shapes the situational conditions necessary for crime offenses to occur and is often interpreted through routine activity theory, which hypothesizes that crimes result from the convergence of motivated offenders, suitable targets, and a lack of capable guardianship in space and time. However, past research has generally applied cross-sectional spatial analysis, while only a few studies have investigated if, and how, the relationships between land use and crime change over time. The study by Quick et al. (2019) explores the time-varying relationships between land use and property crime at the small area scale in the Region of Waterloo, Canada, for 12 seasons from Spring 2011 to Winter 2013–2014. Figure 9.2 shows the seasonal property crime trend for the study region. Consistent with past research, property crime was highest in summer seasons and lowest in winter seasons. Quick et al. (2019) show the geographical distribution of property crime in Spring 2011 and Winter 2013–2014, the seasons with the highest and lowest total property crime counts, respectively. Generally, dissemination areas (DAs) with high property crime counts are clustered in central areas of the study region during Spring 2011, whereas DAs with high crime counts during Winter 2013–2014 are relatively more dispersed. Property crimes were the sum of break and enters, thefts under $5,000, thefts over $5,000, motor vehicle thefts, property damage, and graffiti incidents. Eight distinct land use variables were analyzed at the small area scale: location in a central business district, commercial land use, eating and drinking establishments, government institution land use, parks, residential land use, schools, and public transit stations. Eight socio-demographic variables were tested to account for neighborhood disadvantage: residential population, 5-year residential mobility, percent of immigrant residents, index of ethnic heterogeneity, percent of lone-parent families, percent of low-income families, median income, and percent young adult population.

FIGURE 9.2 Seasonal property crime trend. (Revised from Quick et al. 2019.)

Model 3 (based on the GTWR idea), which allows the regression coefficients associated with land uses to vary over time, shows the best fitting performance. The time-constant and time-varying coefficients of socio-demographic and land use characteristics were then derived, where values greater than 1 indicate positive associations with property crime. Central business districts and commercial land uses are representative of small areas with a high concentration of material goods that may attract motivated offenders regardless of season. Three land uses were examined for recurring, seasonally varying relative risk trends: parks, public transit stations, and eating and drinking establishments. While public transit stations are associated with overall property crime risk, there is little evidence of a recurring seasonal influence on small area property crime. Parks and eating and drinking establishments both show evidence of a recurring seasonal trend. This research also informs crime reduction and prevention initiatives in both urban planning and law enforcement. Urban planning has the potential to reduce time-constant property crime risk in specific small areas via modifications to the built environment. However, because many land uses found to be associated with property crime are desirable amenities and serve important functions, a better option may be to implement crime prevention through environmental design (CPTED) standards, which influence offender decision-making by increasing perceptions of capable guardianship. Public awareness campaigns and targeted policing initiatives may prevent and deter crime, influencing time-constant and season-specific crime risk in targeted small areas and across the study region.

9.4 SOFTWARE AND USAGE

To help readers use the GTWR model introduced in this chapter, we developed a GTWR tool as an ArcGIS Add-In. This section discusses the procedures for downloading and using the tool. Please email [email protected] with your name, affiliation information, and a copy of the receipt for purchasing this book to download Chapter 09.zip. Once downloaded, the .zip file can be uncompressed to any folder on your computer hard drive.

9.4.1 Installation and Uninstallation

9.4.1.1 How to Install GTWR Add-in

1. Download the install package and double-click GTWR_Beta.esriAddIn (Figures 9.3 and 9.4).
*Unpack the install package, and you can see some Add-in support files.
*Double-click this file to install the GTWR Add-in.

FIGURE 9.3 Files in the install package.

FIGURE 9.4 Esri AddIn File: GTWR_Beta.esriAddIn.

FIGURE 9.5 The window for confirming Add-In file installation.

FIGURE 9.6 The message box shows “Installation succeeded”.

FIGURE 9.7 The window for GTWR.

FIGURE 9.8 The location of the Add-in Manager button.

*If the GTWR does not show up, please check the Add-in Manager (Customize → Add-in Manager) (Figure 9.8).

Make sure GTWR has been installed successfully in ArcMap (Figure 9.9).

9.4.1.2 Uninstall



FIGURE 9.9 The GTWR tool has been installed successfully to ArcMap.

FIGURE 9.10 Select GTWR_Beta in Add-in Manager window.

FIGURE 9.11 The GTWR tool has been uninstalled successfully from ArcMap.



9.4.2 Run GTWR

9.4.2.1 Data Input
Open/select your data file (*.csv, *.shp, or a point layer in ArcMap).
*Only 2D point data are accepted in the current version.

2. For point layer – Click [icon], and select a layer that has already been added to the content window (Figure 9.13).

The input data should include only fields of type double. Non-numeric values, such as characters and symbols, will lead to an error. For panel data, observations with different time stamps but the same location should be placed in separate rows. In other words, each row represents one observation with a distinct spatial or temporal coordinate.

9.4.2.2 Setting

9.4.2.2.1 Dependent Variable/Explanatory Variable(s)
Users should select at least one field as an explanatory variable and one field as the dependent variable.

FIGURE 9.12 Click the “Open” button and select the file.

FIGURE 9.13 Select a layer which has already been added to the content window.

*The data in that field should include only digits.
*The first line of a *.csv file is read as the field names, separated by commas.
*A field can be used only once across all selections (dependent variable, explanatory variable(s), x/y-coordinate, or time stamp) (Figure 9.14).

1. Select a dependent variable for the regression model.
2. Select explanatory variable(s) for the regression model.

9.4.2.2.2 Output Class
The path of the output features is created automatically and shown in the textbox. Users can also define the path by clicking the Save button.

9.4.2.2.3 Regression Model Type
GWR, TWR, GTWR, and global ordinary least squares regression (Globe-OLS) are supported. It should be noted that
* The time stamp applies to TWR and GTWR.
* The x/y-coordinate applies to GWR and GTWR.
* If users select Globe-OLS as their regression type, no spatial or temporal parameters are needed.

FIGURE 9.14 The Settings of GTWR.

9.4.2.2.4 Spatiotemporal Distance Ratio
This specifies a ratio k used when combining spatial distance and temporal distance into spatiotemporal distance: dST = dS + k*dT.

9.4.2.2.5 X/Y-Coordinate and Time Stamp
These three fields mark the geographic and temporal information of an individual observation. The X- and Y-coordinates should be projected coordinates. Latitude and longitude (like 77°55′20″ N) are not applicable. The time stamp should be a number (e.g., 1, 7, 14, …) transformed from a time or date. A format like 11:22:33 is not applicable.

9.4.2.2.6 Kernel Type
This specifies whether the kernel is constructed using a fixed distance or is allowed to vary in extent as a function of feature density.
• FIXED: the spatial context (the Gaussian kernel) used to solve each local regression analysis is a fixed distance.
• ADAPTIVE: the spatial context (the Gaussian kernel) is a function of a specified number of neighbors. When the feature distribution is dense, the spatial context is smaller; when the feature distribution is sparse, the spatial context is larger.

9.4.2.2.7 Bandwidth Method
This specifies how the extent of the kernel function should be determined. When the corrected Akaike Information Criterion (AICc) or cross-validation (CV) is selected, the optimal distance/neighbor parameter will be searched using that criterion. Typically, you will select either AICc or CV if you do not know what to use for the distance (Kernel type = FIXED) or the number of neighbors (Kernel type = ADAPTIVE). If you select BANDWIDTH_PARAMETER, you will need to specify the distance or number of neighbors.
• AICc: the extent of the kernel is determined using AICc.
• CV: the extent of the kernel is determined using CV.
• BANDWIDTH_PARAMETER: the extent of the kernel is determined by a fixed distance or a fixed number of neighbors.

9.4.2.2.8 Max Distance
This specifies a fixed bandwidth spatial/temporal extent or distance whenever the kernel type is FIXED and the bandwidth method is BANDWIDTH_PARAMETER (“0” means no constraint on spatial/temporal distance).

9.4.2.2.9 Number of Neighbors
An integer reflecting the exact number of neighbors to be included in the local bandwidth of the Gaussian kernel when the kernel type is ADAPTIVE and the bandwidth method is BANDWIDTH_PARAMETER.
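The time stamp requirement in Section 9.4.2.2.5 means that dates have to be converted to plain numbers before the data are loaded. One possible way to prepare such a field is sketched below. The choice of days as the unit, the origin date, and the function names are assumptions made for illustration; the Add-in itself only expects a numeric column. The second helper restates the linear combination of Section 9.4.2.2.4 so that the effect of the ratio k can be seen on concrete numbers.

```python
from datetime import date

def to_time_stamp(day, origin):
    """Whole days elapsed since `origin`, returned as a plain number."""
    return float((day - origin).days)

def st_distance(d_s, d_t, k):
    """Linear combination from Section 9.4.2.2.4: dST = dS + k * dT."""
    return d_s + k * d_t

origin = date(2011, 3, 1)  # hypothetical start of the study period
observed = [date(2011, 3, 1), date(2011, 3, 8), date(2011, 3, 15)]
stamps = [to_time_stamp(d, origin) for d in observed]
print(stamps)

# With k = 10, a gap of 7 time units counts as much as 70 spatial units.
print(st_distance(500.0, 7.0, 10.0))
```

Because k converts time units into the units of the projected coordinates, its value depends on both the time unit chosen here and the coordinate system of the input data.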

9.4.2.3 Output

9.4.2.3.1 Regression Result
The regression result will be output as a *.shp file and saved at the output path. The output features will have the same geometry and order as the input features. Some regression results (Residual, Predicted, Observed, Coefficients, and Intercept) will be recorded as fields (Figures 9.15 and 9.16).

FIGURE 9.15 Attribute table information for the regression result.

FIGURE 9.16 Symbolize the regression result.

9.4.2.3.2 Supplementary Table Showing Model Variables and Diagnostic Results

FIGURE 9.17 Model variables and diagnostic results.

9.4.2.3.3 Bandwidth or Neighbors
This is the bandwidth or number of neighbors used for each local estimation and is perhaps the most important parameter for a GTWR model. It controls the degree of smoothing in the model. Typically, you will let the program choose a bandwidth or neighbor value for you by selecting either AICc or CV for the Bandwidth method parameter. Both options try to identify an optimal fixed distance or optimal adaptive number of neighbors. Since the criteria for “optimal” differ between AICc and CV, it is common to get a different optimal value from each. You may also provide an exact fixed distance or a particular number of neighbors by selecting BANDWIDTH_PARAMETER for the Bandwidth method. The bandwidth units depend on the specified Kernel type. If you select FIXED, the bandwidth value will reflect a distance in the same units as the Input feature class (for example, if the input feature class is projected using UTM coordinates, the distance reported will be in meters). If you select ADAPTIVE, the bandwidth distance will change according to the spatial density of features in the Input feature class. The bandwidth becomes a function of the number of nearest neighbors such that each local estimation is based on the same number of features. Instead of a specific distance, the number of neighbors used for the analysis is reported.

9.4.2.3.4 Residual Squares
This is the sum of squared residuals in the model (the residual being the difference between an observed y value and its estimated value returned by the model). The smaller this value, the closer the fit of the model to the observed data. This value is used in several other diagnostic measures.

9.4.2.3.5 Sigma
This value is the square root of the normalized residual sum of squares, where the residual sum of squares is divided by the effective degrees of freedom of the residual. This is the estimated standard deviation of the residuals. Smaller values of this statistic are preferable. Sigma is used for AICc computations.

9.4.2.3.6 AICc
This is a measure of model performance and is helpful for comparing different regression models. Taking model complexity into account, the model with the lower AICc value provides a better fit to the observed data. AICc is not an absolute measure of goodness-of-fit but is useful for comparing models with different explanatory variables as long as they apply to the same dependent variable.
If the AICc values for two models differ by more than 3, the model with the lower AICc is held to be better. Comparing the GTWR AICc value to the OLS AICc value is one way to assess the benefits of moving from a global model (OLS) to a local regression model (GTWR).

9.4.2.3.7 R2
R-squared is a measure of goodness-of-fit. Its value varies from 0.0 to 1.0, with higher values being preferable. It may be interpreted as the proportion of dependent variable variance accounted for by the regression model. The denominator for the R² computation is the total sum of squares of the dependent variable (squared deviations from its mean). Adding an extra explanatory variable to the model does not alter the denominator but does alter the numerator; this gives the impression of an improvement in model fit that may not be real. See Adjusted R² below.

9.4.2.3.8 R2 Adjusted
Because of the problem described above for the R² value, calculations for the adjusted R-squared value normalize the numerator and denominator by their degrees of freedom. This has the effect of compensating for the number of variables in a model, and consequently, the Adjusted R² value is almost always smaller than the R² value. However, in making this adjustment, you lose the interpretation of the value as a proportion of the variance explained. In GTWR, the effective number of degrees of freedom is a function of the bandwidth, so the adjustment may be quite marked in comparison to a global model like OLS. For this reason, the AICc is preferred as a means of comparing models.

9.4.2.4 Error
Some common errors are listed below.

9.4.2.4.1 Repeated Variables
Any field can be used as a dependent/explanatory variable, x/y-coordinate, or time stamp. Using one field twice in the same regression is not allowed, and a message box will remind the user of such an error.

9.4.2.4.2 Missing Required Field
Some fields are required for a particular type of regression, e.g., the x/y-coordinate for GWR or GTWR. A missing required field will lead to an error.

9.4.2.4.3 High Variance Inflation Factor (VIF)
VIF is used as a criterion to test whether explanatory variables show multi-collinearity with each other. Selecting factors with a high VIF may result in an unstable and incorrect regression result. Therefore, any factor that fails the VIF test (VIF > 30) will be reported to users.

9.4.2.4.4 Inapplicable Neighbor Number
If users choose BANDWIDTH_PARAMETER and ADAPTIVE in a regression, the number of neighbors is set manually. Too small a number of neighbors (i.e., NUMBER OF NEIGHBORS < 3n, where n is the number of explanatory variable(s)) will lead to an unsuccessful regression.

9.4.2.4.5 Invalid Input
Only numeric values are accepted as valid input for all fields (dependent variable, explanatory variable(s), x/y-coordinate, and time stamp). Unexpected characters result in an error, and the software will use 0 to replace them. However, the error (the line ID of the unexpected characters) will still be reported to users.
*Only the first 20 unexpected characters will be listed in the message box.

9.4.2.4.6 No Valid Result
An unknown error occurred during the regression, and the regression assessment could not be executed successfully. A singular matrix can lead to such an error.
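Several of the errors listed in Section 9.4.2.4 can be caught before the data are loaded into the Add-in. The sketch below is a user-side pre-check written for illustration only. It does not reproduce the Add-in's own validation, and the function names, the sample field names, and the sample CSV text are all hypothetical.

```python
import csv
import io

def check_roles(dependent, explanatory, x, y, time):
    """Return field names assigned to more than one role (Section 9.4.2.4.1)."""
    roles = [dependent, *explanatory, x, y, time]
    return sorted({f for f in roles if f is not None and roles.count(f) > 1})

def non_numeric_rows(text, fields):
    """Return 1-based data-row numbers where a selected field is not a number."""
    bad = []
    for n, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        for f in fields:
            try:
                float(row[f])
            except (TypeError, ValueError):
                bad.append(n)
                break
    return bad

def neighbors_ok(q, n_explanatory):
    """Section 9.4.2.4.4: fewer than 3n neighbors leads to a failed regression."""
    return q >= 3 * n_explanatory

# Hypothetical CSV content: header row, then one observation per row.
sample = (
    "price,area,x,y,t\n"
    "100,50,1000.5,2000.5,1\n"
    "120,n/a,1010.0,2005.0,7\n"
)
print(check_roles("price", ["area", "price"], "x", "y", "t"))
print(non_numeric_rows(sample, ["price", "area", "x", "y", "t"]))
print(neighbors_ok(5, 2))
```

Running such checks first avoids relying on the replacement of invalid values by 0, which would otherwise enter the regression as if it were real data.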

9.4.3 Some Notes

9.4.3.1 Data Requirements

9.4.3.1.1 Input Data Order

1. *.shp file
Only point feature shapefiles can be used in the Add-in. 3D point, polygon, polyline, or other unexpected features will result in an unknown error. Field names should not violate the ArcGIS field name rules; otherwise, the Add-in may fail to create an output *.shp with legal field names. Typically, a field name with more than 10 characters (five for Chinese characters) will be truncated to 10.

2. Layer feature
Only point feature layers can be used in the Add-in. 3D point, polygon, polyline, or other feature layers will result in an unknown error.

3. *.csv file
The first row of the CSV data should contain the field names separated by “,” and the remaining rows should be data in the same format (Figure 9.18). Only numerical data are acceptable. Unexpected characters may result in errors, and rows with unexpected characters will be ignored in the regression.

FIGURE 9.18 *.csv file and data format.

9.4.3.1.2 Statistical Requirements of Input Data

1. Explanatory variable(s)
Each explanatory variable should be independent of the others. High collinearity between two or more variables will result in an incorrect regression model. Each explanatory variable should have a linear relationship with the dependent variable; otherwise, a Box–Cox transformation is needed to make the relationship linear. A constant field (intercept) should not be included in the input dataset. An intercept will be added to the output result automatically.
2. Model
Each explanatory variable included in a GWR/TWR/GTWR model should be statistically significant, i.e., the null hypothesis that its coefficient is zero should be rejected. It is strongly recommended that an OLS model be created with step-wise regression before a weighted regression is performed. Usually, data showing some spatial/temporal autocorrelation are more suitable for GWR/TWR/GTWR.

9.4.3.2 Model Test

9.4.3.2.1 OLS Model Test
Before performing a weighted regression, an OLS regression should be performed. From this, a valid list of explanatory variables will be formed and used in the weighted regression later on. A passing OLS model should meet several criteria:
• R-squared should meet your expected threshold.
• For all explanatory variables, p-values should be less than what you specified.
• For all explanatory variables, VIF values should be less than what you specified.
• A Jarque–Bera p-value larger than what you specified should be returned.
A “step-wise” method can be used to create an OLS model. Then, the Moran’s I value of its residuals can be calculated.


9.4.3.2.2 Weighted Regression Model Test
Some tests should be performed after a weighted (GWR/TWR/GTWR) regression:
• Coefficients of explanatory variables should reflect an expected, or at least a justifiable, relationship between each explanatory variable and the dependent variable.
• Explanatory variables should capture different aspects of what you are trying to model (none is redundant; VIF values are small, less than 7.5).
• Residuals should be normally distributed, which indicates that your model is free from bias (the Jarque–Bera p-value is not statistically significant).
• Over- and under-predictions should be randomly distributed in space, which indicates that the model residuals are not spatially autocorrelated (the spatial autocorrelation p-value is not statistically significant).

9.4.3.3 Spatiotemporal Distance
It is defined as a linear combination of spatial distance and temporal distance:

$$d_{ST} = d_S + k \cdot d_T$$

where $d_S$ is the spatial distance, $d_T$ is the temporal distance, and $k$ is the spatiotemporal distance ratio. For more details, see the article by Huang et al. (2010).
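The linear combination above can be sketched in a few lines of Python. The sketch assumes, for illustration only, that the spatial distance is Euclidean in projected coordinates and that the temporal distance is the absolute difference between time stamps; the function name is hypothetical.

```python
import math


def spatiotemporal_distance(p, q, k):
    """d_ST = d_S + k * d_T for two observations given as (x, y, t) tuples."""
    d_s = math.hypot(p[0] - q[0], p[1] - q[1])  # assumed Euclidean spatial distance
    d_t = abs(p[2] - q[2])                      # assumed absolute temporal distance
    return d_s + k * d_t


a = (0.0, 0.0, 2010)
b = (3.0, 4.0, 2012)
print(spatiotemporal_distance(a, b, k=0.5))  # 5.0 + 0.5 * 2 = 6.0
```

With k = 0 the measure reduces to the purely spatial distance, so k controls how strongly separation in time counts relative to separation in space.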

9.5 CONCLUDING REMARKS

Spatiotemporal analysis and modeling of geographical data has long been one of the main focuses of geographical information science. Regression analysis of geographical relationships is a basic topic in the study of spatiotemporal modeling. As most previous studies have demonstrated that the hypothesis of a stationary geographical process is unlikely to be supported, this chapter introduces a GTWR model for analyzing spatiotemporally non-stationary relationships. Compared to the GWR model, the GTWR model often shows superior performance by taking temporal dynamics into account, resulting in apparent improvements in both prediction accuracy and efficiency. Further, we have developed a GTWR tool as an ArcGIS Add-In and discussed the procedures for using it to help readers apply the GTWR model.

REFERENCES
Bai, Y., et al., 2016. A geographically and temporally weighted regression model for ground-level PM2.5 estimation from satellite-derived 500 m resolution AOD. Remote Sensing, 8, 262.
Brunsdon, C., A. S. Fotheringham and M. E. Charlton, 1996. Geographically weighted regression: A method for exploring spatial nonstationarity. Geographical Analysis, 28, 281–298.
Cahill, M. and G. Mulligan, 2007. Using geographically weighted regression to explore local crime patterns. Social Science Computer Review, 25, 174–193.
Cao, K., M. Diao and B. Wu, 2019. A big data-based geographically weighted regression model for public housing prices: A case study in Singapore. Annals of the American Association of Geographers, 109, 173–186.
Cardozo, O. D., J. C. García-Palomares and J. Gutiérrez, 2012. Application of geographically weighted regression to the direct forecasting of transit ridership at station-level. Applied Geography, 34, 548–558.
Chen, Q., et al., 2016. Impacts of land use and population density on seasonal surface water quality using a modified geographically weighted regression. Science of the Total Environment, 572, 450–466.
Dong, F., et al., 2020. Can industrial agglomeration promote pollution agglomeration? Evidence from China. Journal of Cleaner Production, 246.
Du, Z., et al., 2018. Extending geographically and temporally weighted regression to account for both spatiotemporal heterogeneity and seasonal variations in coastal seas. Ecological Informatics, 43, 185–199.
Fang, L., H. Li and M. Li, 2019. Does hotel location tell a true story? Evidence from geographically weighted regression analysis of hotels in Hong Kong. Tourism Management, 72, 78–91.
Fotheringham, A. S., C. Brunsdon and M. Charlton, 2002. Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. UK: John Wiley & Sons.
Ge, Y., et al., 2017. Geographically weighted regression-based determinants of malaria incidences in northern China. Transactions in GIS, 21, 934–953.
Harris, R., G. Dong and W. Zhang, 2013. Using contextualized geographically weighted regression to model the spatial heterogeneity of land prices in Beijing, China. Transactions in GIS, 17, 901–919.
Huang, B., B. Wu and M. Barry, 2010. Geographically and temporally weighted regression for modeling spatio-temporal variation in house prices. International Journal of Geographical Information Science, 24, 383–401.
Jun, M., et al., 2015. Land use characteristics of subway catchment areas and their influence on subway ridership in Seoul. Journal of Transport Geography, 48, 30–40.
Li, X., et al., 2016. Exploring the impact of high speed railways on the spatial redistribution of economic activities – Yangtze River Delta urban agglomeration as a case study. Journal of Transport Geography, 57, 194–206.
Liu, J., et al., 2017. A mixed geographically and temporally weighted regression: Exploring spatial-temporal variations from global and local perspectives. Entropy, 19, 53.
Lu, B., et al., 2016. The Minkowski approach for choosing the distance metric in geographically weighted regression. International Journal of Geographical Information Science, 30, 351–368.
Ma, X., et al., 2018. A geographically and temporally weighted regression model to explore the spatiotemporal influence of built environment on transit ridership. Computers, Environment and Urban Systems, 70, 113–124.
Quick, M., J. Law and G. Li, 2019. Time-varying relationships between land use and crime: A spatio-temporal analysis of small-area seasonal property crime trends. Environment and Planning B: Urban Analytics and City Science, 46, 1018–1035.
Selby, B. and K. M. Kockelman, 2013. Spatial prediction of traffic levels in unmeasured locations: Applications of universal kriging and geographically weighted regression. Journal of Transport Geography, 29, 24–32.
Stein, R. E., J. F. Conley and C. Davis, 2016. The differential impact of physical disorder and collective efficacy: A geographically weighted regression on violent crime. GeoJournal, 81, 351–365.
Taylor, B. D., et al., 2009. Nature and/or nurture? Analyzing the determinants of transit ridership across US urbanized areas. Transportation Research Part A: Policy and Practice, 43, 60–77.
Wang, W., et al., 2016. Local spatial variations analysis of smear-positive tuberculosis in Xinjiang using Geographically Weighted Regression model. BMC Public Health, 16.
Wu, C., et al., 2019. Multiscale geographically and temporally weighted regression: Exploring the spatiotemporal determinants of housing prices. International Journal of Geographical Information Science, 33, 489–511.
Xu, C., J. Zhao and P. Liu, 2019. A geographically weighted regression approach to investigate the effects of traffic conditions and road characteristics on air pollutant emissions. Journal of Cleaner Production, 239.
Yang, Q., et al., 2019. The relationships between PM2.5 and aerosol optical depth (AOD) in mainland China: About and behind the spatio-temporal variations. Environmental Pollution, 248, 526–535.
Zeng, J., et al., 2019. The local variations in regional technological evolution: Evidence from the rise of transmission and digital information technology in China’s technology space, 1992–2016. Applied Geography, 112.
Zhai, L., et al., 2018. An improved geographically weighted regression model for PM2.5 concentration estimation in large areas. Atmospheric Environment, 181, 145–154.
Zhang, K., et al., 2019. Estimating spatio-temporal variations of PM2.5 concentrations using VIIRS-derived AOD in the Guanzhong Basin, China. Remote Sensing, 11.
Zhang, T., et al., 2016. Ground level PM2.5 estimates over China using satellite-based geographically weighted regression (GWR) models are improved by including NO2 and enhanced vegetation index (EVI). International Journal of Environmental Research and Public Health, 13.
Zhao, J., et al., 2013. What influences Metro station ridership in China? Insights from Nanjing. Cities, 35, 114–124.
Zhou, S. and R. Lin, 2019. Spatial-temporal heterogeneity of air pollution: The relationship between built environment and on-road PM2.5 at micro scale. Transportation Research Part D: Transport and Environment, 76, 305–322.

10 Spatiotemporal Bayesian Regression

Ortis Yankey University of Southampton

Tao Hu Oklahoma State University

Han Yue Guangzhou University

Peixiao Wang and Xiao Xu Wuhan University

CONTENTS
10.1 Introduction to Bayesian Inference
10.1.1 Disease Mapping
10.1.2 Adding a Temporal Component
10.1.3 Parametric Time Trend
10.1.4 Exceedance Probabilities and Hotspot Identification
10.2 Example Applications
10.2.1 Example 1: Modeling Drug Overdose Incident
10.2.1.1 Defining Spatial Adjacency
10.2.1.2 Mapping the Relative Risk
10.2.1.3 Spatial Risk
10.2.1.4 Spatiotemporal Trend and Exceedance Probabilities
10.2.2 Example Application 2: Predictive Distribution of Spatiotemporal Bayesian Model
10.2.2.1 Predictive Distribution of Spatiotemporal Bayesian Model
10.2.2.2 Parameter Estimation via MCMC
10.2.2.3 Application Example 2
10.3 Concluding Remarks
References

DOI: 10.1201/9781003304395-10


In spatial statistics, spatial regression methods are often used to quantify the relative influence of factors on outcomes such as health and crime. The Spatial Lag Model (SLM) and the Spatial Error Model (SEM) are widely adopted in spatial regression analysis. However, these models assume that dependent variables are continuous and normally distributed and require that parameters be non-random quantities. These assumptions limit the kinds of spatial data that can be analyzed. In contrast, a Bayesian spatial regression model treats the data as fixed and the unknown quantities or parameters as random variables described by probability distributions. Thus, it can leverage information from adjacent regions to estimate the dependent variable, overcoming the data sparseness and small-area problems that spatial analysis often encounters. This approach also makes the estimation of model parameters more stable.

10.1 INTRODUCTION TO BAYESIAN INFERENCE

Law et al. (2014) first used the Bayesian modeling approach to analyze the trend over time of lost property cases in different local regions of York City, Canada. Taking spatial autocorrelation and variation into account, the Bayesian modeling approach was able to predict the general trend for lost property and its variations in different local regions (Liu and Zhu, 2017). Bayesian models are becoming increasingly popular for the analysis of spatial and spatiotemporal data, particularly in spatial epidemiology, where one of the objectives is the quantification of spatial or spatiotemporal risk (Blangiardo and Cameletti, 2015; Moraga, 2019; Lawson, 2013).

Bayesian models differ from frequentist (classical) statistics in how parameters are treated during inference. In frequentist statistics, the parameter of interest is treated as an unknown fixed quantity that can be estimated using maximum likelihood or ordinary least squares. Bayesian statistics, in contrast, treats the parameter of interest as a random variable and assigns it a prior probability distribution that reflects our belief about the parameter or the uncertainty associated with it (Gelman et al., 2020; Kéry, 2010; Kaplan, 2014). In a Bayesian model, we use probability to represent the uncertainty associated with our model parameters, which are estimated based on Bayes' theorem. Assuming we have a matrix of observations (data) denoted as $y$ that is distributed according to a certain probability distribution $y \sim P(y; \theta)$, where $\theta$ is the parameter of interest, Bayes' theorem is given by the formula:

$$P(\theta \mid y) = \frac{P(y \mid \theta)\, P(\theta)}{P(y)} \qquad (10.1)$$


In the equation, $P(\theta \mid y)$ is called the posterior distribution and is read as the probability ($P$) of $\theta$ given $y$. $P(y \mid \theta)$ is referred to as the likelihood of the data $y$ given $\theta$. $P(\theta)$ is called the prior distribution of $\theta$. The denominator $P(y)$ is called the marginal probability. It acts as a normalization constant, but it is computationally difficult to derive, so it is dropped and Bayes' theorem becomes:

$$P(\theta \mid y) \propto P(y \mid \theta)\, P(\theta) \qquad (10.2)$$

Thus, the posterior probability of the parameter θ given y is proportional to the likelihood of y given the parameter θ multiplied by the prior probability of the parameter θ. Unlike point estimation in classical statistics, Bayesian inference summarizes the parameter of interest by a probability distribution (the posterior distribution), which indicates the possible range of the parameter (Gelman et al., 2020). We can then compute a single-point summary from the posterior distribution and express the uncertainty around it with a credible interval (the Bayesian counterpart of the confidence interval in frequentist statistics).

Although Bayesian inference is logically intuitive, the complex computation involved has been a setback for most people. Markov Chain Monte Carlo (MCMC) simulation was one of the earliest methods for Bayesian inference. MCMC is an iterative process in which random samples are drawn from a complex stochastic process, each new sample depending on the previous one, until the chain converges to the posterior distribution (Brooks et al., 2011; Zuur, 2012). Algorithms such as Gibbs sampling and Metropolis–Hastings are used to run the MCMC. Different software packages such as Stan, WinBUGS, OpenBUGS, and JAGS can be used to run MCMC. Although MCMC simulation has been used extensively in Bayesian inference, there are still computational difficulties in the iteration process, especially when handling a big dataset, for example, a spatiotemporal dataset with high spatial and temporal resolution. The iterations can run for a very long time (even days) before the chain converges.

Rue et al. (2009) developed a deterministic approach called the Integrated Nested Laplace Approximation (INLA) to circumvent the computational challenges associated with MCMC. The main advantage of the INLA approach over MCMC is that it is much faster, arriving at answers in seconds or minutes, whereas MCMC often requires hours to days (Martino and Rue, 2009). INLA uses a numerical approximation to estimate the posterior distribution. Unlike MCMC, it does not require any iterative sampling. INLA has been implemented as an R package that can be used to handle a class of statistical models called Latent Gaussian Models, which include spatiotemporal models. In this chapter, we use the INLA R package for the spatiotemporal Bayesian analysis. For an in-depth discussion of INLA, we refer users to the INLA website https://www.r-inla.org/home, which offers a range of literature and discussion.
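To make the sampling idea concrete, the following Python sketch runs a random-walk Metropolis sampler (the symmetric-proposal special case of Metropolis–Hastings) for a single Poisson rate θ with a Gamma prior. It is an illustration under stated assumptions, not the workflow used in this chapter, which relies on R-INLA: the counts, the Gamma(1, 1) prior, the step size, and all function names are made up for the example. Because this prior is conjugate, the exact posterior is Gamma(shape + Σy, rate + n), which gives a reference value for the sampled mean.

```python
import math
import random


def log_posterior(theta, counts, shape, rate):
    """Unnormalized log posterior: Poisson log-likelihood plus Gamma(shape, rate) log-prior."""
    if theta <= 0:
        return -math.inf
    log_lik = sum(y * math.log(theta) - theta for y in counts)  # terms free of theta dropped
    log_prior = (shape - 1) * math.log(theta) - rate * theta
    return log_lik + log_prior


def metropolis(counts, shape, rate, n_iter=20000, burn_in=2000, step=0.5, seed=1):
    """Random-walk Metropolis draws from the posterior of theta (after burn-in)."""
    rng = random.Random(seed)
    theta = 1.0
    current = log_posterior(theta, counts, shape, rate)
    draws = []
    for i in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, counts, shape, rate)
        # Accept with probability min(1, posterior ratio); the next state depends on the current one.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn_in:
            draws.append(theta)
    return draws


counts = [3, 5, 4, 6, 2]                      # hypothetical observed counts
draws = metropolis(counts, shape=1.0, rate=1.0)
post_mean = sum(draws) / len(draws)
ordered = sorted(draws)
lower = ordered[int(0.025 * len(ordered))]    # 95% credible interval from the draws
upper = ordered[int(0.975 * len(ordered))]
exact_mean = (1.0 + sum(counts)) / (1.0 + len(counts))  # conjugate posterior mean = 3.5
print(round(post_mean, 2), (round(lower, 2), round(upper, 2)), exact_mean)
```

Only the unnormalized posterior of Equation (10.2) is evaluated; the marginal probability P(y) cancels in the acceptance ratio, which is why MCMC can proceed without computing it.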

10.1.1 Disease Mapping

Disease mapping is an important analytical tool in spatial epidemiological analysis. With disease mapping, the aim is to understand spatial patterns in disease etiologies by identifying areas with an unusually high risk for the disease and to predict the trends of the disease outcome across space. Showing spatial variations in disease occurrences helps to identify areas where diseases are particularly prevalent, which may lead to the identification of previously unknown risk factors. Spatial statistics have been widely used in disease mapping, and a wide range of books and articles have been published in the field of spatial epidemiology (Elliot et al., 2000; Lawson, 2013; Pfeiffer et al., 2008).

Disease mapping often uses an areal data structure in which the data are aggregated into a finite number of well-defined, non-overlapping geographic units. The geographic units can be regular, such as a grid of cells, or irregular, such as counties, census tracts, or block groups. When the data are aggregated to areal units, disease risk estimates may be unstable because disease occurrences may be rare within areas with small population sizes. Observations may also be similar to each other because of the spatial proximity of geographic units, which can also result in unreliable disease risk estimates. Both problems, low observed disease counts and spatial dependency in the neighborhood structure, need to be accounted for in the model in order to obtain accurate estimates. In practice, spatial smoothing techniques may be used to address such challenges. Spatial smoothing techniques average the data according to a certain neighborhood structure by borrowing information from neighboring areas in the disease risk estimation, for instance, averaging the population of neighboring areas to represent the local population. We can also incorporate covariate information such as the disease rate for a given geographic neighborhood (Moraga, 2019; Blangiardo and Cameletti, 2015). Disease risk estimates obtained from such spatial smoothing are more reliable and robust because of the increased precision of the risk estimates in areas with few observations (Kang et al., 2016).

Bayesian smoothing of risk estimates is one of the methods used in disease mapping. Estimating the disease risk (θ) for an area is based on our prior belief about θ (the prior distribution) and the likelihood of the observed data given θ.


Combining the prior distribution and the likelihood gives the posterior distribution of the risk estimate for each neighborhood. Given a vector of disease counts $y = (y_1, \ldots, y_n)$ corresponding to the area units $j = 1, \ldots, n$, the count in each area is modeled with a Poisson distribution, $y_j \sim \text{Poisson}(e_j \theta_j)$. Here $e_j$ is the expected count at area $j$ and $\theta_j$ is the risk at area $j$. The risk $\theta_j$ is transformed to a log scale and modeled as

$$\log(\theta_j) = \alpha + x\beta_j + u_j + v_j \qquad (10.3)$$

where $\alpha$ is the intercept, $x\beta_j$ is the covariate term (the covariates multiplied by their coefficients), $u_j$ is a spatially structured effect that accounts for spatial dependency, and $v_j$ is a spatially unstructured random effect. Bayesian inference assumes that all the parameters of interest arise from a probability distribution; hence, prior probability distributions are assigned to all the parameters above, indicating the likely range of each parameter. If previous epidemiological studies have been conducted, we can incorporate their results as our prior belief in the model. In the absence of such information, we assign non-informative priors to the parameters. In general, the intercept is assigned a normal distribution with mean 0 and precision $\tau = 0$, and the covariate coefficients are assigned a normal distribution with mean 0 and precision $\tau = 0.001$, which corresponds to a standard deviation of $1/\sqrt{0.001} \approx 31.62$.

A popular method for modeling the spatial component is the Besag, York, and Mollié (BYM) model (Besag et al., 1991). Under the BYM model, the spatially structured effect $u_j$ is assigned a conditional autoregressive (CAR) distribution that accounts for spatial dependency between neighboring units; its conditional variance is inversely proportional to the number of neighbors. The unstructured component $v_j$ is modeled as an independent and identically distributed normal variable with zero mean and variance $\sigma_v^2$. The hyperpriors, i.e., the priors on the variance components of the spatially structured and spatially unstructured effects, are gamma priors with a shape parameter a = 1 and a scale parameter b = 0.00005. These priors are the default priors used in INLA.
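The role of the neighborhood structure in the CAR prior can be illustrated with a short Python sketch. It assumes the intrinsic CAR form commonly used for the structured effect in the BYM model, in which the conditional mean of an area's effect is the average of its neighbors' effects and the conditional variance is the variance parameter divided by the number of neighbors; this specific form, the toy adjacency list, and the numerical values are assumptions for illustration, not taken from the text. The last line converts the precision 0.001 quoted above into a standard deviation.

```python
import math


def car_conditional(j, u, neighbors, sigma2_u):
    """Conditional mean and variance of u_j given its neighbors (intrinsic CAR, assumed form)."""
    nbrs = neighbors[j]
    mean = sum(u[i] for i in nbrs) / len(nbrs)   # average of the neighboring effects
    variance = sigma2_u / len(nbrs)              # shrinks as the number of neighbors grows
    return mean, variance


# Hypothetical adjacency list for four areas and their current spatial effects.
neighbors = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
u = {0: 0.2, 1: -0.1, 2: 0.4, 3: 0.0}

mean_1, var_1 = car_conditional(1, u, neighbors, sigma2_u=0.3)
print(mean_1, var_1)

# A normal prior with precision tau has standard deviation 1 / sqrt(tau).
prior_sd = 1 / math.sqrt(0.001)
print(round(prior_sd, 2))  # 31.62
```

An area with many neighbors therefore has its effect pulled more tightly toward the local average, which is the "borrowing of information" that stabilizes estimates in sparsely populated areas.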

10.1.2 Adding a Temporal Component

Observed disease counts within a geographic unit may also have a temporal dimension when the data are aggregated periodically, such as weekly, monthly, or yearly. A temporal component can be added to the spatial model described above to estimate the spatiotemporal risk or spatiotemporal hotspots of the disease. Many space–time models of disease risk have been proposed in the literature, but in this chapter, we focus on the model by Bernardinelli et al. (1995) to estimate spatiotemporal variation in disease risk.

10.1.3 Parametric Time Trend

In the spatiotemporal Bayesian model, the temporal effect is treated as linear, which is useful for identifying developing trends. For the spatial effect, the model uses the structured spatial random effect to capture the interdependence between different regions. This shared information serves to smooth and stabilize the estimation results. Bernardinelli et al. (1995) proposed a space–time model with a parametric linear time trend. This model assumes a linear relationship between the log of the disease risk and time within a geographic area. The time trend is decomposed into two effects: a main linear time trend (global trend) and a differential time trend that is allowed to vary for each geographic unit. Assuming the observed count of disease within a geographic area follows a Poisson distribution, the spatial model described above can be reparametrized by adding a temporal component. The model is given by:

$$\log(\theta_j) = \alpha + x\beta_j + u_j + v_j + (\beta + \delta_j) \times t \qquad (10.4)$$

where $\alpha$ is the intercept, $x\beta_j$ is the covariate term, $u_j + v_j$ is the main spatial effect, $\beta$ is the main linear trend (global effect), and $\delta_j$ represents the differential time trend (space–time interaction). The global time trend represents the average temporal pattern for the entire study area. The differential time trend represents the difference between the global time trend and the time trend of the $j$-th focal spatial unit. If the differential time trend $\delta_j$ is negative (less than 0), the trend of the focal spatial unit is less steep than the global trend. Conversely, if $\delta_j$ is positive (greater than 0), the trend of the focal spatial unit is steeper than the global trend $\beta$. Thus, each spatial unit has a spatial risk given by $u_j + v_j$ and its own time trend given by the sum $(\beta + \delta_j)$. The main spatial effect $(u_j + v_j)$ and the differential time trend $\delta_j$ are assigned the same CAR specification.
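The effect of the differential time trend can be seen by evaluating Equation (10.4) for a few values of δ_j. The Python sketch below leaves out the covariate term and uses made-up parameter values (α = 0, u_j + v_j = 0.1, β = 0.10); the function name is hypothetical.

```python
import math


def relative_risk(alpha, spatial_effect, beta, delta_j, t):
    """theta_j at time t from log(theta_j) = alpha + (u_j + v_j) + (beta + delta_j) * t."""
    return math.exp(alpha + spatial_effect + (beta + delta_j) * t)


# Same spatial effect and global trend, three different differential trends.
for delta_j in (-0.05, 0.0, 0.05):
    trend = [round(relative_risk(0.0, 0.1, 0.10, delta_j, t), 3) for t in range(1, 7)]
    print(delta_j, trend)
```

Areas with δ_j > 0 drift above the global trajectory over time and areas with δ_j < 0 fall below it, even though all three start from the same spatial effect.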

10.1.4 Exceedance Probabilities and Hotspot Identification

Exceedance probabilities refer to the probability that a parameter of interest exceeds a given threshold. For example, the probability that the spatial risk ($\varsigma = u_j + v_j$) exceeds a given threshold $c$ is given as $P(\varsigma > c \mid \text{data})$. A high exceedance probability is interpreted as an area having an unusual or excess disease risk. Bayesian exceedance probabilities have been proposed as a Bayesian approach to hotspot/coldspot identification (Richardson et al., 2004). The exceedance probabilities range between a low value of 0 and a high value of 1. Richardson et al. (2004) provide a categorization of the exceedance probabilities into hotspots and coldspots. Exceedance probabilities between 0 and 0.2 are considered coldspots, 0.2–0.8 neither coldspots nor hotspots, and 0.8–1 hotspots.

In a spatiotemporal model, we can apply exceedance probabilities to the differential time trend $\delta_j$ to identify spatiotemporal hotspots, i.e., areas whose trend exceeds the overall trend (Li et al., 2014; Law et al., 2014, 2015; Luan et al., 2015). The probability that the differential time trend $\delta_j$ exceeds a given threshold $c$ is given by $P(\delta_j > c \mid \text{data})$. The exceedance probabilities of the differential time trend, which also range between 0 and 1, indicate whether an area is experiencing an increasing disease risk over time, a decreasing disease risk, or a stable trend. Thus, differential posterior exceedance probabilities between 0 and 0.2 can be considered a spatiotemporal coldspot (decreasing trend), those between 0.2 and 0.8 neither coldspot nor hotspot (stable), and those between 0.8 and 1 a hotspot (increasing trend). The threshold value $c$ used in estimating the exceedance probabilities is chosen by the researcher. Ideally, a value of 0 can be used as the threshold value.
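As a rough illustration of the classification rule, the Python sketch below estimates an exceedance probability as the share of posterior draws above the threshold and then applies the 0.2 and 0.8 cut-offs. It assumes posterior draws of δ_j are available as a plain list; the draws shown are invented, the function names are hypothetical, and the handling of values exactly equal to 0.2 or 0.8 is an arbitrary choice.

```python
def exceedance_probability(samples, c=0.0):
    """Share of posterior draws above the threshold c, estimating P(parameter > c | data)."""
    return sum(1 for s in samples if s > c) / len(samples)


def classify(prob):
    """Hotspot/coldspot label using the 0.2 and 0.8 cut-offs."""
    if prob < 0.2:
        return "coldspot"
    if prob > 0.8:
        return "hotspot"
    return "neither"


# Hypothetical posterior draws of the differential time trend for one area.
draws_delta = [0.12, 0.08, -0.01, 0.15, 0.20, 0.05, 0.11, 0.09, 0.03, 0.17]
p = exceedance_probability(draws_delta, c=0.0)
print(p, classify(p))  # 9 of 10 draws exceed 0 -> 0.9 hotspot
```

In practice many more draws (or the posterior marginal itself) would be used, so the estimated probability is far less coarse than in this ten-draw example.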

10.2 EXAMPLE APPLICATIONS

10.2.1 Example 1: Modeling Drug Overdose Incident

We examine the spatiotemporal pattern of drug overdose risk using overdose incidents for the state of Ohio and the parametric time trend model of Bernardinelli et al. (1995). The data are aggregated at the county level for the period 2013–2018. The data were downloaded from https://odh.ohio.gov/wps/portal/gov/odh/explore-data-and-stats. A map of Ohio showing the counties (in shapefile format) was also downloaded from https://ogrip-geohio.opendata.arcgis.com/datasets/7212b60cc5fc4949869995813628182c_12. We used a 5-year population estimate for Ohio (2014–2018) by the US Census Bureau to compute the expected overdose incidences for each county. There are 88 counties within the state of Ohio. The data are in csv format, arranged in longitudinal form from the year 2013 to the year 2018.

A common assumption for modeling count data is to use a Poisson regression where the observed count of area $i$ ($i = 1, \ldots, 88$) at time $t$ ($t = 1, \ldots, 6$) is modeled as $y_{it} \sim \text{Poisson}(\theta_i e_{it})$, where $\theta_i$ is the risk and $e_{it}$ is the expected count of overdose. The risk of overdose in each county is modeled following the parametric time trend form of Equation (10.4), where each term has the same meaning as previously discussed in this chapter.

We model the risk of drug overdose using the R-INLA package. To install the package, the address of the INLA repository needs to be supplied:

install.packages("INLA", repos = c(getOption("repos"), INLA = "https://inla.r-inla-download.org/R/stable"), dep = TRUE)

You also need to load the following R packages (install them if they are not already installed in R):

library(sf)
library(INLA)
library(spdep)
library(reshape2)
library(RColorBrewer)
library(tidyverse)
library(classInt)

We read the overdose data using the read.csv() function. The head() function shows the first rows of the data. The data are arranged in a long format starting from the year 2013, followed by the year 2014, until the year 2018. Each row contains six columns of information, as follows:

County – Name of the county
ID – An ID field with one unique identifier for each county
Overdose – Observed number of overdose incidents for each county
E – Expected number of overdose incidents
Year – Data year
Pop – Total population

The expected number of overdose incidents was estimated using indirect standardization. The expected count for area $i$ and time $t$ is given by

$$e_{it} = \frac{\text{Total number of cases for year } t \times \text{Population for area } i}{\text{Total population in all counties for year } t} \qquad (10.6)$$
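The indirect standardization in Equation (10.6) can be sketched as follows. This Python fragment is only an illustration with two invented areas; the function name and the dictionary layout keyed by (area, year) are assumptions and are not part of the chapter's R workflow.

```python
def expected_counts(cases, population):
    """e_it = (total cases in year t * population of area i) / total population in year t.

    cases, population -- dicts keyed by (area, year); names are hypothetical.
    """
    years = {year for _, year in cases}
    expected = {}
    for t in years:
        total_cases = sum(v for (_, yr), v in cases.items() if yr == t)
        total_pop = sum(v for (_, yr), v in population.items() if yr == t)
        for (area, yr), pop in population.items():
            if yr == t:
                expected[(area, yr)] = total_cases * pop / total_pop
    return expected


# Two made-up areas observed in one year.
cases = {("A", 2013): 30, ("B", 2013): 10}
population = {("A", 2013): 60000, ("B", 2013): 40000}

e = expected_counts(cases, population)
print(e[("A", 2013)], e[("B", 2013)])    # 24.0 16.0
print(cases[("A", 2013)] / e[("A", 2013)])  # observed / expected = 1.25
```

The expected counts sum to the observed total for each year, so a ratio of observed to expected above 1 marks an area with more cases than its share of the population would suggest.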

The expected number of cases has already been estimated and can be found in the E column.

###To load overdose data file
Overdose_data