Introduction to Environmental Data Science 9781032322186, 9781032330341, 9781003317821

Introduction to Environmental Data Science focuses on data science methods in the R language applied to environmental re


Language: English. Pages: 402 [403]. Year: 2023.


Table of contents:
Cover
Half Title
Title Page
Copyright Page
Contents
Author/editor biographies
List of Figures
1. Background, Goals and Data
1.1. Environmental Data Science
1.2. Environmental Data and Methods
1.3. Goals
1.3.1. Some definitions:
1.4. Exploratory Data Analysis
1.5. Software and Data
1.5.1. Data
1.6. Acknowledgements
I. Exploratory Data Analysis
2. Introduction to R
2.1. Data Objects
2.1.1. Scalars and assignment
2.2. Functions
2.3. Expressions and Statements
2.4. Data Classes
2.4.1. Integers
2.5. Rectangular Data
2.6. Data Structures in R
2.6.1. Vectors
2.6.2. Lists
2.6.3. Matrices
2.6.4. Data frames
2.6.5. Factors
2.7. Accessors and Subsetting
2.7.1. [] Subsetting
2.7.2. [[]] The mysterious double bracket
2.7.3. $ Accessing a vector from a data frame
2.8. Programming scripts in RStudio
2.8.1. function : creating your own
2.8.2. if : conditional operations
2.8.3. for loops
2.8.4. Subsetting with logic
2.8.5. Apply functions
2.9. RStudio projects
2.9.1. R Markdown
2.10. Exercises: Introduction to R
3. Data Abstraction
3.1. The Tidyverse
3.2. Tibbles
3.2.1. Building a tibble from vectors
3.2.2. tribble
3.2.3. read_csv
3.3. Summarizing variable distributions
3.3.1. Stratifying variables by site using a Tukey box plot
3.4. Database operations with dplyr
3.4.1. Select, mutate, and the pipe
3.4.2. filter
3.4.3. Writing a data frame to a csv
3.4.4. Summarize by group
3.4.5. Count
3.4.6. Sorting after summarizing
3.4.7. The dot operator
3.5. String abstraction
3.5.1. Detecting matches
3.5.2. Subsetting strings
3.5.3. String length
3.5.4. Replacing substrings with other text (“mutating” strings)
3.5.5. Concatenating and splitting
3.6. Dates and times with lubridate
3.7. Calling functions explicitly with ::
3.8. Exercises: Data Abstraction
4. Visualization
4.1. plot in base R
4.2. ggplot2
4.3. Plotting one variable
4.3.1. Histogram
4.3.2. Density plot
4.3.3. Boxplot
4.4. Plotting Two Variables
4.4.1. Two continuous variables
4.4.2. Two variables, one discrete
4.4.3. Color systems
4.4.4. Trend line
4.5. General Symbology
4.5.1. Categorical symbology
4.5.2. Log scales instead of transform
4.6. Graphs from Grouped Data
4.6.1. Faceted graphs
4.7. Titles and Subtitles
4.8. Pairs Plot
4.9. Exercises: Visualization
5. Data Transformation
5.1. Data joins
5.2. Set operations
5.3. Binding rows and columns
5.4. Pivoting data frames
5.4.1. pivot_longer
5.4.2. pivot_wider
5.4.3. A free_y faceted graph using a pivot
5.5. Exercise: Transformation
II. Spatial
6. Spatial Data and Maps
6.1. Spatial Data
6.1.1. Simple geometry building in sf
6.1.2. Building points from a data frame
6.1.3. SpatVectors in terra
6.1.4. Creating features from shapefiles
6.2. Coordinate Referencing Systems
6.3. Creating sf Data from Data Frames
6.3.1. Removing geometry
6.4. Base R’s plot() with terra
6.4.1. Using maptiles to create a basemap
6.5. Raster data
6.5.1. Building rasters
6.5.2. Vector to raster conversion
6.6. ggplot2 for Maps
6.6.1. Rasters in ggplot2
6.7. tmap
6.8. Interactive Maps
6.8.1. Leaflet
6.8.2. Mapview
6.8.3. tmap (view mode)
6.8.4. Interactive mapping of individual penguins abstracted from a big dataset
6.9. Exercises: Spatial Data and Maps
6.9.1. Project preparation
7. Spatial Analysis
7.1. Data Frame Operations
7.1.1. Using grouped summaries, and filtering by a selection
7.2. Spatial Analysis Operations
7.2.1. Using topology to subset
7.2.2. Centroid
7.2.3. Distance
7.2.4. Buffers
7.2.5. Spatial overlay: union and intersection
7.2.6. Clip with st_crop
7.2.7. Spatial join with st_join
7.2.8. Further exploration of spatial analysis
7.3. Exercises: Spatial Analysis
8. Raster Spatial Analysis
8.1. Terrain functions
8.2. Map Algebra in terra
8.3. Distance
8.4. Extracting Values
8.5. Focal Statistics
8.6. Zonal Statistics
8.7. Exercises: Raster Spatial Analysis
9. Spatial Interpolation
9.1. Null Model of the Original Data
9.2. Voronoi Polygon
9.2.1. Cross-validation and relative performance
9.3. Nearest Neighbor Interpolation
9.3.1. Cross-validation and relative performance of the nearest neighbor model
9.4. Inverse Distance Weighted (IDW)
9.4.1. Using cross-validation and relative performance to guide inverse-distance weight choice
9.4.2. IDW: trying other inverse distance powers
9.5. Polynomials and Trend Surfaces
9.6. Kriging
9.6.1. Create a variogram.
9.6.2. Fit the variogram based on visual interpretation
9.6.3. Ordinary Kriging
9.7. Exercises: Spatial Interpolation
III. Statistics and Modeling
10. Statistical Summaries and Tests
10.1. Goals of Statistical Analysis
10.2. Summary Statistics
10.2.1. Summarize by group: stratifying a summary
10.2.2. Boxplot for visualizing distributions by group
10.2.3. Generating pseudorandom numbers
10.3. Correlation r and Coefficient of Determination r²
10.3.1. Displaying correlation in a pairs plot
10.4. Statistical Tests
10.4.1. Comparing samples and groupings with a t test and a non-parametric Kruskal-Wallis Rank Sum test
10.4.2. Analysis of variance
10.4.3. Testing a correlation
10.5. Exercises: Statistics
11. Modeling
11.1. Some Common Statistical Models
11.2. Linear Model (lm)
11.3. Spatial Influences on Statistical Analysis
11.3.1. Mapping residuals
11.4. Analysis of Covariance
11.5. Generalized linear model (GLM)
11.5.1. Binomial family: logistic GLM with streams
11.5.2. Logistic landslide model
11.5.3. Poisson regression
11.5.4. Models employing machine learning
11.6. Exercises: Modeling
12. Imagery and Classification Models
12.1. Reading and Displaying Sentinel-2 Imagery
12.1.1. Individual bands
12.1.2. Spectral subsets to create three-band R-G-B and NIR-R-G for visualization
12.1.3. Crop to study area extent
12.1.4. Saving results
12.1.5. Band scatter plots
12.2. Spectral Profiles
12.3. Map Algebra and Vegetation Indices
12.3.1. Vegetation indices
12.3.2. Histogram
12.3.3. Other vegetation indices
12.4. Unsupervised Classification with k-means
12.5. Machine Learning Classification of Imagery
12.5.1. Read imagery and training data and extract sample values for training
12.5.2. Training the CART model
12.5.3. Prediction using the CART model
12.5.4. Validating the model
12.6. Classifying with 10 m Sentinel-2 Imagery
12.6.1. Subset bands (10 m)
12.6.2. Crop to RCV extent and extract pixel values
12.6.3. Training the CART model (10 m) and plot the tree
12.6.4. Prediction using the CART model (10 m)
12.7. Classification Using Multiple Images Capturing Phenology
12.7.1. Create a 10-band stack from both images
12.7.2. Extract the training data (10 m spring + summer)
12.7.3. CART model and prediction (10 m spring + summer)
12.8. Conclusions and Next Steps for Imagery Classification
12.9. Exercises: Imagery Analysis and Classification Models
IV. Time Series
13. Time Series Visualization and Analysis
13.1. Structure, Seasonality, and Decomposition of Time Series
13.2. Creation of Time Series (ts) Data
13.2.1. Frequency, start, and end parameters for ts()
13.2.2. Associating times with time series
13.2.3. Subsetting time series by times
13.2.4. Changing the frequency to use a different period
13.2.5. Time stamps and extensible time series
13.3. Data smoothing: moving average (ma)
13.4. Decomposition of data logger data: Marble Mountains
13.5. Facet Graphs for Comparing Variables over Time
13.6. Lag Regression
13.6.1. The lag regression, using a lag function in a linear model
13.7. Ensemble Summary Statistics
13.8. Learning more about Time Series in R
13.9. Exercises: Time Series
V. Communication and References
14. Communication with Shiny
14.1. Shiny Document
14.1.1. Input and output objects in the Old Faithful Eruptions document
14.1.2. Input widgets
14.1.3. Other input widgets
14.2. A Shiny App
14.2.1. A brief note on reactivity
14.3. Shiny App I/O Methods
14.3.1. Data tables
14.3.2. Text as character: renderPrint() and verbatimTextOutput()
14.3.3. Formatted text
14.3.4. Plots
14.4. Shiny App in a Package
14.5. Components of a Shiny App (sierra)
14.5.1. Initial data setup
14.5.2. The ui section, with a tabsetPanel structure
14.5.3. The server section, including reactive elements
14.5.4. Calling shinyApp with the ui and server function results
14.6. A MODIS Fire App with Web Scraping and observe with leafletProxy
14.6.1. Setup code
14.6.2. ui
14.6.3. Using observe and leafletProxy to allow changing the date while retaining the map zoom
14.7. Learn More about Shiny Apps
14.8. Exercises: Shiny
References
Index


Introduction to Environmental Data Science

Introduction to Environmental Data Science focuses on data science methods in the R language applied to environmental research, with sections on exploratory data analysis in R, including data abstraction, transformation, and visualization; spatial data analysis in vector and raster models; statistics and modeling, ranging from exploratory analysis through confirmatory statistics to machine learning models; time series analysis, focusing especially on carbon and micrometeorological flux; and communication.

It is an ideal textbook for teaching undergraduate- to graduate-level students in environmental science, environmental studies, geography, earth science, and biology, but can also serve as a reference for environmental professionals working in consulting, NGOs, and government agencies at the local, state, federal, and international levels.

Features
• Gives thorough consideration of the needs for environmental research in both spatial and temporal domains.
• Features examples of applications involving field-collected data ranging from individual observations to data logging.
• Also includes examples of applications involving government and NGO sources, ranging from satellite imagery to environmental data collected by regulators such as the EPA.
• Contains class-tested exercises in all chapters other than case studies. Solutions manual available for instructors.
• All examples and exercises make use of a GitHub package for functions and especially data.

Taylor & Francis Group

http://taylorandfrancis.com

Introduction to Environmental Data Science

Jerry D. Davis

Designed cover image: By Anna Studwell and Jerry D. Davis

First edition published 2023
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742

and by CRC Press
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

CRC Press is an imprint of Taylor & Francis Group, LLC

© 2023 Jerry D. Davis

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact [email protected]

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.

ISBN: 978-1-032-32218-6 (hbk)
ISBN: 978-1-032-33034-1 (pbk)
ISBN: 978-1-003-31782-1 (ebk)

DOI: 10.1201/9781003317821

Typeset in LM Roman by KnowledgeWorks Global Ltd.

Publisher’s note: This book has been prepared from camera-ready copy provided by the authors.

“Dandelion fluff – Ephemeral stalk sheds seeds to the universe” by Anna Studwell



Author/editor biographies

Jerry Douglas Davis is a Professor of Geography & Environment (https://geog.sfsu.edu/) and the Director of the Institute for Geographic Information Science (https://gis.sfsu.edu/) at San Francisco State University, and borrows heavily from his and his students’ field-based environmental research for examples in the book.


Taylor & Francis Group

http://taylorandfrancis.com

List of Figures

1.1 Environmental data science
1.2 California counties simple features data in igisci package

2.1 Variables, observations, and values in rectangular data
2.2 Temperature plotted by index (left) and elevation (right)
2.3 The three penguin species in palmerpenguins. Photos by KB Gorman. Used with permission
2.4 Diagram of penguin head with indication of bill length and bill depth (from Horst, Hill, and Gorman (2020), used with permission)
2.5 Temperature and elevation scatter plot
2.6 TRI dataframe – DT datatable output
2.7 Crude river map using x y coordinates
2.8 Longitudinal profile built from cumulative distances and elevation

3.1 Visualization of some abstracted data from the EPA Toxic Release Inventory
3.2 Euc-Oak paired plot runoff and erosion study (Thompson, Davis, and Oliphant (2016))
3.3 Eucalyptus/Oak paired site locations
3.4 Tukey boxplot of runoff under eucalyptus canopy

4.1 Flipper length by mass and by species, base plot system. The Antarctic peninsula penguin data set is from @palmer.
4.2 Simple bar graph of meadow vegetation samples
4.3 Distribution of NDVI, Knuthson Meadow
4.4 Distribution of Average Monthly Temperatures, Sierra Nevada
4.5 Cumulative Distribution of Average Monthly Temperatures, Sierra Nevada
4.6 Density plot of NDVI, Knuthson Meadow
4.7 Comparative density plot using alpha setting
4.8 Runoff under eucalyptus and oak in Bay Area sites
4.9 Boxplot of runoff by site
4.10 Runoff at Bay Area Sites, colored as eucalyptus and oak
4.11 Marble Valley, Marble Mountains Wilderness, California
4.12 Marble Mountains soil gas sampling sites, with surface topographic features and cave passages
4.13 Visualizing soil CO2 data with a Tukey box plot
4.14 Scatter plot of discharge (Q) and specific electrical conductance (EC) for Sagehen Creek, California
4.15 Q and EC for Sagehen Creek, using log10 scaling on both axes
4.16 Setting one color for all points
4.17 Two variables, one discrete
4.18 Using aesthetics settings for both points and lines
4.19 Color set within aes()
4.20 Streamflow (Q) and specific electrical conductance (EC) for Sagehen Creek, colored by temperature
4.21 Channel slope as range from green to red, vertices sized by elevation
4.22 Channel slope as range of line colors on a longitudinal profile
4.23 Channel slope by longitudinal distance as scatter points colored by slope
4.24 Trend line with a linear model
4.25 EPA TRI, categorical symbology for industry sector
4.26 Using log scales instead of transforming
4.27 NDVI symbolized by vegetation in two seasons
4.28 Eucalyptus and oak: rainfall and runoff
4.29 Faceted graph alternative to color grouping (note that the y scale is the same for each)
4.30 Titles added
4.31 Pairs plot for Sierra Nevada stations variables
4.32 Enhanced GGally pairs plot for palmerpenguin data

5.1 Color classified by phenology, data created by a pivot
5.2 Euc vs oak graphs created using a pivot
5.3 Runoff/rainfall scatterplot colored by tree, created by pivot and binding rows
5.4 Flux tower installed at Loney Meadow, 2016. Photo credit: Darren Blackburn
5.5 free-y facet graph supported by pivot (note the y axis scaling varies among variables)
5.6 Goal

6.1 A simple ggplot2 map built from scratch with hard-coded data as simple feature columns
6.2 Using an sf class to build a map in ggplot2, displaying an attribute
6.3 Base R plot of one attribute from two states
6.4 Points created from a dataframe with Simple Features
6.5 Simple plot of SpatVector point data with labels (note that overlapping labels may result, as seen here)
6.6 ggplot of twostates and stations
6.7 Base R plot of twostates and stations SpatVectors
6.8 A simple plot of polygon data by default shows all variables
6.9 A single map with a legend is produced when a variable is specified
6.10 Points created from data frame with coordinate variables
6.11 Plotting SpatVector data with base R plot system
6.12 Features added to the map using the base R plot system
6.13 Using maptiles for a base map
6.14 Converted sf data for map with tiles
6.15 Simple plot of a worldwide SpatRaster of 30-degree cells, with SpatVector of CA and NV added
6.16 Stream raster converted from stream features, with 30 m cells from an elevation raster template
6.17 Shuttle Radar Topography Mission (SRTM) image of Virgin River Canyon area, southern Utah
6.18 simple ggplot map
6.19 labels added
6.20 repositioned legend
6.21 Using bbox to zoom into two counties
6.22 Rasters displayed in ggplot by converting to points
6.23 tmap of the world
6.24 tmap fill colored by variable
6.25 hillshade, borders and point symbols in tmap
6.26 Two western states with a basemap in tmap
6.27 Leaflet map showing the location of the SFSU Institute for Geographic Information Science with choices of basemaps
6.28 View (interactive) mode of tmap with selection of basemaps
6.29 Observations of Adélie penguin migration from a 5-season study of a large colony at Ross Island in the SW Ross Sea, Antarctica; and an individual – H36CROZ0708 – from season 0708. Data source: Ballard et al. (2019). Fine-scale oceanographic features characterizing successful Adélie penguin foraging in the SW Ross Sea. Marine Ecology Progress Series 608:263-277.
6.30 tmap View mode (goal)

7.1 Plotting filtered data: above 2,000 m and 38°N latitude with a basemap
7.2 A Bodie scene, from Bodie State Historic Park (https://www.parks.ca.gov/)
7.3 Sierra data
7.4 Northern Sierra stations and places
7.5 California county centroids
7.6 Map scaled to cover Bay Area tracts using a bbox
7.7 Nile River points, colored by channel slope
7.8 Nile River channel slope as range of colors from green to red, with great circle channel distances derived using the haversine method
7.9 Selection of soil CO2 sampling sites, July 1995
7.10 Selection of soil CO2 and in-cave water samples
7.11 Distance from CO2 samples to closest streams (not including lakes)
7.12 Distance to towns (places) from weather stations
7.13 100 m trail buffer, Marble Mountains
7.14 Unioned trail buffer, dissolving boundaries
7.15 Intersection of trail and stream buffers
7.16 Union of two sets of buffer polygons
7.17 Cropping with specified x and y limits
7.18 TRI points with census variables added via a spatial join
7.19 Transect Buffers (goal)

8.1 Marble Mountains (California) elevation
8.2 Slope
8.3 Aspect
8.4 Classified slopes
8.5 Hillshade
8.6 Map algebra conversion of elevations from metres to feet
8.7 Boolean: slope > 20
8.8 Boolean intersection: (slope > 20) * (elev > 2000)
8.9 Stream distance raster
8.10 Random points in the Marble Valley area, Marble Mountains, California
8.11 Points colored by geology extracted from raster
8.12 Elevation by stream distance, colored by geology, random point extraction
8.13 Dissolved calcium carbonate grouped by geology extracted at water sample points
8.14 Slope by elevation colored by extracted geology
8.15 Logarithm of calcium carbonate total hardness at sample points, showing geologic units
8.16 9x9 focal mean of elevation
8.17 Hillshade of 9x9 focal mean of elevation
8.18 Marble Mountains geology raster
8.19 Modal geology in 9 by 9 neighborhoods
8.20 Geology and elevation by stream and trail distance (goal)

9.1 Precipitation map in Teale Albers in Sierra counties
9.2 Voronoi polygons around Sierra stations
9.3 Precipitation mapped by Voronoi polygon
9.4 Rasterized Voronoi polygons
9.5 Nearest neighbor interpolation of precipitation
9.6 IDW interpolation, power = 2
9.7 IDW interpolation, power = 1
9.8 Linear trend
9.9 2nd order polynomial, precipitation
9.10 Third order polynomial, temperature
9.11 Third order polynomial with extremes flattened
9.12 Third order local polynomial, precipitation
9.13 Variogram of precipitation at Sierra weather stations
9.14 Fitted variogram
9.15 Spherical fit
9.16 Exponential model
9.17 Ordinary Kriging
9.18 Voronoi polygons of precipitation (goal)
9.19 IDW (goal)

10.1 Tukey boxplot by group
10.2 Marble Mountains average soil carbon dioxide per site
10.3 Random uniform histogram
10.4 Random normal histogram
10.5 Random normal density plot
10.6 Random normal plotted against random uniform
10.7 Scatter plot illustrating negative correlation
10.8 Pairs plot with r values
10.9 NDVI by phenology
10.10 Runoff under eucalyptus and oak in Bay Area sites
10.11 Runoff at various sites contrasting euc and oak
10.12 East Bay sites
10.13 Eucalyptus and oak sediment runoff box plots
10.14 Facet density plot of eucalyptus and oak sediment runoff
10.15 Water sampling in varying lithologies in a karst area
10.16 Total hardness from dissolved carbonates at water sampling sites in Upper Sinking Cove, TN
10.17 Sinking Cove dissolved carbonates as total hardness by lithology
10.18 Upper Sinking Cove (Tennessee) stratigraphy
10.19 Sinking Cove dissolved carbonates as TH and elevation by lithology

11.1 Original February temperature data
11.2 Temperature predicted by elevation model
11.3 Temperature predicted by elevation raster
11.4 Residuals of temperature from model predictions by elevation
11.5 Meandering river
11.6 Braided river
11.7 Anastomosed river
11.8 Q vs S with stream type
11.9 Landslide in San Pedro Creek watershed
11.10 Landslides in San Pedro Creek watershed
11.11 Sediment source analysis
11.12 Raw random points
11.13 Landslides and buffers to exclude from random points
11.14 Landslides and random points (excluded from slide buffers)
11.15 Logistic model prediction of 1983 landslide probability
11.16 Black-footed albatross counts, July 2006
11.17 Prediction of temperature from elevation (one of two goals)
11.18 Prediction raster (goal)

12.1 Four bands of a Sentinel-2 scene from 20210628.
12.2 R-G-B image from Sentinel-2 scene 20210628.
12.3 Color image from Sentinel-2 of Red Clover Valley, 20210628.
12.4 NIR-R-G image from Sentinel-2 of Red Clover Valley, 20210628.
12.5 Relations between Red and NIR bands, Red Clover Valley Sentinel-2 image, 20210628
12.6 Spectral signature of nine-level training polygons, 20 m Sentinel-2 imagery from 20210628.
12.7 Spectral signature of seven-level training polygons, 20 m Sentinel-2 imagery from 20210628.
12.8 Spectral signature of six-level training polygons, 20 m Sentinel-2 imagery from 20210628
12.9 NDVI from Sentinel-2 image, 20210628
12.10 NDVI histogram, Sentinel-2 image, 20210628
12.11 NDMI from Sentinel-2 image, 20210628
12.12 NDMI histogram, Sentinel-2 image, 20210628
12.13 NDGI from Sentinel-2 image, 20210628
12.14 NDGI histogram, Sentinel-2 image, 20210628
12.15 Unsupervised k-means classification, Red Clover Valley, Sentinel-2, 20210628
12.16 Training samples
12.17 CART Decision Tree, Sentinel-2 20 m, date 20210628
12.18 CART classification, probabilities of each class, Sentinel-2 20 m 20210628
12.19 CART classification, highest probability class, Sentinel-2 20 m 20210628
12.20 10 m CART regression tree
12.21 CART classification, probabilities of each class, Sentinel-2 10 m 20210628
12.22 CART classification, highest probability class, Sentinel-2 10 m 20210628
12.23 CART decision tree, Sentinel 10-m, spring and summer 2021 images
12.24 CART classification, probabilities of each class, Sentinel-2 10 m, 2021 spring and summer phenology
12.25 CART classification, highest probability class, Sentinel-2 10 m, 2021 spring and summer phenology
12.26 Classification of Sentinel-2 20 m image
12.27 Classification of Sentinel-2 10 m spring and summer images

13.1 Red Clover Valley eddy covariance flux tower installation
13.2 Loney Meadow net ecosystem exchange (NEE) results (Blackburn et al. 2021)
13.3 Time series of Nile River flows
13.4 Decomposition of Mauna Loa CO2 data
13.5 Seasonal decomposition of time series using loess (stl) applied to CO2
13.6 San Francisco monthly highs and lows as time series
13.7 SF data with yearly period
13.8 Greenhouse gases with 20 year observations, so 0.05 annual frequency
13.9 Monthly sunspot activity from 1749 to 2013
13.10 Monthly sunspot activity from 1940 to 1970
13.11 Sunspots of the first 20 years of data
13.12 11-year sunspot cycle decomposition
13.13 San Pedro Creek E. coli time series
13.14 Decomposition of weekly E. coli data, annual period (frequency 52)
13.15 Moving average (order=15) of E. coli data
13.16 GHG CO2 time series
13.17 Moving average (order=7) of CO2 time series
13.18 Random variation seen by subtracting moving average
13.19 Decomposition using stl of a 15th-order moving average of E. coli data
13.20 Marble Mountains resurgence data logger design
13.21 Marble Mountains resurgence data logger equipment
13.22 Data logger data from the Marbles resurgence
13.23 stl decomposition of Marbles water level time series
13.24 Flux tower installed at Loney Meadow, 2016. Photo credit: Darren Blackburn
13.25 Facet plot with free y scale of Loney flux tower parameters
13.26 Scatter plot of Bugac solar radiation and air temperature
13.27 Solstice 8-day time series of solar radiation and temperature
13.28 Bugac solar radiation and temperature
13.29 Manaus ensemble averages with error bars
13.30 Facet graph of Marble Mountains resurgence data (goal)

14.1 New Shiny Document dialog
14.2 Shiny Document Editor
14.3 Old Faithful geyser eruptions Shiny interface
14.4 numericInput and renderPrint code
14.5 Numeric and slider inputs and print outputs
14.6 Plot modified by input
14.7 Radio buttons and check boxes
14.8 Simple Inline app
14.9 Simple inline coding
14.10 Rendered data table
14.11 Text entry and rendered text
14.12 Rendered box plot
14.13 Shiny app of Sierra climate data, with multiple tabs
14.14 MODIS fire detection Shiny app

1 Background, Goals and Data

1.1 Environmental Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from noisy, structured and unstructured data (Wikipedia). A data science approach is especially suitable for applications involving large and complex data sets, and environmental data is a prime example, with rapidly growing collections from automated sensors in space and time domains.

Environmental data science is data science applied to environmental science research. In general, data science can be seen as the intersection of math and statistics, computer science/IT, and some research domain, which in this case is environmental science (Figure 1.1).

FIGURE 1.1 Environmental data science

1.2 Environmental Data and Methods

Environmental research calls for a wide range of methods because environmental data take many forms, including environmental measurements in space and time domains. These methods include:


• data analysis and transformation methods
  – importing and other methods to create data frames
  – reorganization and creation of fields
  – filtering observations
  – data joins
  – reorganizing data, including pivots
• visualization
  – graphics
  – maps
  – imagery
• spatial analysis
  – vector and raster spatial analysis
    ∗ spatial joins
    ∗ distance analysis
    ∗ overlay analysis
    ∗ terrain modeling
  – spatial statistics
  – image analysis
• statistical summaries, tests and models
  – statistical summaries and visualization
  – stratified/grouped summaries
  – confirmatory statistical tests
  – physical, statistical and machine learning models
  – classification models
• temporal data and time series
  – analyzing and visualizing long-term environmental data
  – analyzing and visualizing high-frequency data from loggers

1.3 Goals

While the methodological reach of data science is very great, as is the spectrum of environmental data, our goal is to lay the foundation and provide useful introductory methods in the areas outlined above and, as a “live” book, to extend into more advanced methods and provide a growing suite of research examples with associated data sets. We’ll briefly explore some data mining methods that can be applied to so-called “big data” challenges, but our focus is on exploratory data analysis in general, applied to environmental data in space and time domains. For clarity in understanding the methods and products, much of our data will in fact be quite small, derived from field-based environmental measurements where we can best understand how the data were collected, but these methods extend to much larger data sets. It will primarily be in the areas of time series and imagery, where automated data capture and machine learning are employed, that we’ll dip our toes into big data.


1.3.1 Some definitions:

Machine Learning: building a model using training data in order to make predictions without being explicitly programmed to do so. Related to artificial intelligence methods. Used in:
• image and imagery classification, including computer vision methods
• statistical modeling
• data mining

Data Mining: discovering patterns in large data sets
• databases collected by government agencies
• imagery data from satellite, aerial (including drone) sensors
• time-series data from long-term data records or high-frequency data loggers
• methods may involve machine learning, artificial intelligence and computer vision

Big Data: data having a size or complexity too big to be processed effectively by traditional software
• data with many cases or dimensions (including imagery)
• many applications in environmental science due to the great expansion of automated environmental data capture in space and time domains
• big data challenges exist across the spectrum of the environmental research process, from data capture, storage, sharing, visualization, querying

Exploratory Data Analysis: procedures for analyzing data, techniques for interpreting the results of such procedures, ways of structuring data to make its analysis easier
• summarizing
• restructuring
• visualization
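To make the machine learning definition above a little more concrete, here is a minimal sketch of the train-then-predict pattern in base R. It is only an illustration under stated assumptions: the `airquality` data set ships with R and is not one of the book's data sets, the 70/30 split is arbitrary, and a simple linear model stands in for the more elaborate learners covered in later chapters.

```r
# Built-in data: daily air quality measurements, New York, May-September 1973
data(airquality)
aq <- na.omit(airquality[, c("Ozone", "Temp", "Wind")])

set.seed(42)                                  # make the random split reproducible
n <- nrow(aq)
train_rows <- sample(n, size = round(0.7 * n))
train <- aq[train_rows, ]                     # data used to build the model
test  <- aq[-train_rows, ]                    # data held back for evaluation

# Build the model from the training data only
fit <- lm(Ozone ~ Temp + Wind, data = train)

# Predict ozone for observations the model has not seen
pred <- predict(fit, newdata = test)

# Root mean squared error of the predictions on the held-out data
rmse <- sqrt(mean((test$Ozone - pred)^2))
print(rmse)
```

The point of the split is that the error is measured on observations that played no part in fitting, which is what distinguishes prediction from simply describing the training data.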

1.4 Exploratory Data Analysis

Just as exploration is a part of what National Geographic has long covered, it’s an important part of geographic and environmental science research. Exploratory data analysis is exploration applied to data, and has grown as an alternative approach to traditional statistical analysis. This basic approach perhaps dates back to the work of Thomas Bayes in the eighteenth century, but Tukey (1962) may have best articulated the basic goals of this approach in defining the “data analysis” methods he was promoting:

“Procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data.”

Some years later Tukey (1977) followed up with Exploratory Data Analysis.


Exploratory data analysis (EDA) is an approach to analyzing data via summaries and graphics. The key word is exploratory, and while one might view this in contrast to confirmatory statistics, in fact they are highly complementary. The objectives of EDA include (a) suggesting hypotheses; (b) assessing assumptions on which inferences will be based; (c) selecting appropriate statistical tools; and (d) guiding further data collection. This philosophy led to the development of S at Bell Labs (led by John Chambers, 1976), then to R.
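As a small illustration of EDA as summaries and graphics, the following sketch uses only base R. The `airquality` data set is an assumption chosen because it ships with R; it is not one of the data sets used in the book.

```r
# Built-in data: daily air quality measurements, New York, May-September 1973
data(airquality)

# Numerical summary of one variable (quartiles, mean, and a count of NAs)
ozone_summary <- summary(airquality$Ozone)
print(ozone_summary)

# Stratified summary: mean temperature for each month
temp_by_month <- tapply(airquality$Temp, airquality$Month, mean)
print(round(temp_by_month, 1))

# Graphical summary: Tukey box plot of ozone stratified by month
boxplot(Ozone ~ Month, data = airquality,
        xlab = "Month", ylab = "Ozone (ppb)")
```

A summary like this can suggest hypotheses (for example, that ozone peaks in midsummer) and flag issues such as missing values before any confirmatory test is chosen.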

1.5 Software and Data

First, we’re going to use the R language, designed for statistical computing and graphics. It’s not the only way to do data analysis – Python is another important data science language – but R with its statistical foundation is an important language for academic research, especially in the environmental sciences.

## [1] "This book was produced in RStudio using R version 4.2.1 (2022-06-23 ucrt)"
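A line like the one above can be generated from within R. The following is a guess at one way to do it, not the book's own source code; `R.version.string` is a base R variable holding the version of the running session, so your output will name whatever version you have installed.

```r
# Build a one-line note naming the R version of the current session
version_note <- paste("This book was produced in RStudio using",
                      R.version.string)
print(version_note)
```

Checking this string is a quick way to compare your own setup with the version the book was built with.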

For a start, you’ll need to have R and RStudio installed, then you’ll need to install various packages to support specific chapters and sections.
• In Introduction to R (Chapter 2), we will mostly use the base installation of R, with a few packages to provide data and enhanced table displays:
  – igisci
  – palmerpenguins
  – DT
  – knitr
• In Abstraction (Chapter 3) and Transformation (Chapter 5), we’ll start making a lot of use of tidyverse (Section 3.1) packages such as:
  – ggplot2
  – dplyr
  – stringr
  – tidyr
  – lubridate
• In Visualization (Chapter 4), we’ll mostly use ggplot2, but also some specialized visualization packages such as:
  – GGally
• In Spatial (starting with Chapter 6), we’ll add some spatial data, analysis and mapping packages:
  – sf
  – terra
  – tmap
  – leaflet
• In Statistics and Modeling (starting with Chapter 10), no additional packages are needed, as we can rely on base R’s rich statistical methods and ggplot2’s visualization.


• In Time Series (Chapter 13), we’ll find a few other packages handy:
  – xts (Extensible Time Series)
  – forecast (for a few useful functions like a moving average)

And there will certainly be other packages we’ll explore along the way, so you’ll want to install them when you first need them. That will typically be when you first see a library() call in the code, when a function is prefaced with the package name, as in dplyr::select(), or when R raises an error that it can’t find a function you’ve called or that the package isn’t installed. One of the earliest we’ll need is the suite of packages in the “tidyverse” (Wickham and Grolemund 2016), which includes some of the ones listed above: ggplot2, dplyr, stringr, and tidyr. You can install these individually, or all at once with:
`install.packages("tidyverse")`

This is usually done from the console in RStudio and not included in an R script or markdown document, since you don’t want to be installing the package over and over again. You can also respond to a prompt from RStudio when it detects that a script you open calls a package you don’t have installed. From time to time, you’ll want to update your installed packages, usually when something stops working because a change in one package has broken another package that depends on it. Fortunately, in the R world, especially at the main repository at CRAN, there’s a lot of effort put into making sure packages work together, so usually there are no surprises if you’re using the most current versions. There can be exceptions: occasionally a new package version creates problems for other packages, either through inter-package dependencies or by introducing functions whose names duplicate those in other packages. The packages installed for this book were current as of R version 4.2.1, but newer package versions may occasionally introduce errors. Once a package like dplyr is installed, you can access all of its functions and data by adding a library call, like …
library(dplyr)

… which you will want to include in your code, or to provide access to multiple libraries in the tidyverse, you can use library(tidyverse). Alternatively, if you’re only using maybe one function out of an installed package, you can call that function with the :: separator, like dplyr::select(). This method has another advantage in avoiding problems with duplicate names – and for instance we’ll generally call dplyr::select() this way.
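One way to find out which packages still need installing, without attaching any of them, is sketched below. The package vector is only an example and the variable names are ours; the install itself is left to the console, as described above.

```r
# Example package list; edit it to match the chapter you are working on.
needed <- c("stats", "tidyverse", "sf")

# requireNamespace() reports whether a package is installed without attaching it.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

# Print the console command to run, rather than installing from a script.
if (length(missing_pkgs) > 0) {
  message("Run in the console: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
}

# A function called with :: works without a library() call; stats ships with R.
stats::median(c(3, 1, 2))
```

Because the check only reports what is missing, it is safe to leave in a script that gets run repeatedly.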

1.5.1 Data

We’ll be using data from various sources, including data packages on CRAN, which you install the same way as the code packages above, for example with install.packages("palmerpenguins"). We’ve also created a repository on GitHub that includes data we’ve developed in the Institute for Geographic Information Science (iGISc) at SFSU, and you’ll need to install that package a slightly different way.


GitHub packages require a bit more work on the user’s part since we need to first install remotes, then use that to install the GitHub data package:
install.packages("remotes")
remotes::install_github("iGISc/igisci")

Then you can access it just like other built-in data by including:
library(igisci)

To see what’s in it, list the datasets with:
data(package="igisci")

For instance, Figure 1.2 is a map of California counties using the CA_counties sf feature data. We’ll be looking at the sf (Simple Features) package later in the Spatial section of the book, but the library(sf) call here is one place where you’d need to have installed another package, with install.packages("sf").
library(tidyverse); library(igisci); library(sf)
ggplot(data=CA_counties) + geom_sf()

The package datasets can be used directly as sf data or data frames. And similarly to functions, you can access the (previously installed) data set by prefacing with igisci:: this way, without having to load the library. This might be useful in a one-off operation:
mean(igisci::sierraFeb$LATITUDE)

## [1] 38.3192
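The same two steps, listing a package's data sets and then reaching one with ::, can be tried without installing anything. This sketch uses the datasets package that ships with R as a stand-in for igisci; the quakes data and its lat column are our choice of example, not part of the book's data.

```r
# data(package = ...) returns an object whose $results matrix lists each data set.
listing <- data(package = "datasets")$results
head(listing[, c("Item", "Title")])

# Access a single data set with :: without attaching the package.
mean(datasets::quakes$lat)
```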

Raw data such as .csv files can also be read from the extdata folder that is installed on your computer when you install the package, using code such as:
csvPath <- …

… %>%
  group_by(trtype) %>%


  summarize(meanfines = mean(fines_g, na.rm=TRUE), sdfines = sd(fines_g, na.rm=TRUE),
            meantotal = mean(total_g, na.rm=TRUE), sdtotal = sd(total_g, na.rm=TRUE))

## # A tibble: 2 x 5
##   trtype meanfines sdfines meantotal sdtotal
## 1 euc         14.2    3.50      48.6    35.0
## 2 oak         39.4   20.4       86.7    26.2

eucoakLong <- … %>%
  pivot_longer(cols=c(fines_g,litter_g), names_to = "sed_type", values_to = "sed_g")
eucoakLong %>%
  ggplot(aes(trtype, sed_g, col=sed_type)) + geom_boxplot()

eucoakLong %>%
  ggplot(aes(sed_g, col=sed_type)) + geom_density() + facet_grid(trtype ~ .)

Tests of euc vs oak based on fine sediments:
shapiro.test(eucoaksed$fines_g[eucoaksed$trtype == "euc"])
shapiro.test(eucoaksed$fines_g[eucoaksed$trtype == "oak"])
t.test(fines_g~trtype, data=eucoaksed)

##
##  Shapiro-Wilk normality test
##
## data:  eucoaksed$fines_g[eucoaksed$trtype == "euc"]
## W = 0.9374, p-value = 0.6383
##
##  Shapiro-Wilk normality test
##
## data:  eucoaksed$fines_g[eucoaksed$trtype == "oak"]
## W = 0.96659, p-value = 0.8729
##
##  Welch Two Sample t-test
##
## data:  fines_g by trtype
## t = -3.2102, df = 6.4104, p-value = 0.01675
## alternative hypothesis: true difference in means between group euc and group oak is not equal to 0
## 95 percent confidence interval:
##  -44.059797  -6.278299
## sample estimates:
## mean in group euc mean in group oak
##          14.21667          39.38571


Tests of euc vs oak based on total sediments:
shapiro.test(eucoaksed$total_g[eucoaksed$trtype == "euc"])
shapiro.test(eucoaksed$total_g[eucoaksed$trtype == "oak"])
kruskal.test(total_g~trtype, data=eucoaksed)

##
##  Shapiro-Wilk normality test
##
## data:  eucoaksed$total_g[eucoaksed$trtype == "euc"]
## W = 0.76405, p-value = 0.02725
##
##  Shapiro-Wilk normality test
##
## data:  eucoaksed$total_g[eucoaksed$trtype == "oak"]
## W = 0.94988, p-value = 0.7286
##
##  Kruskal-Wallis rank sum test
##
## data:  total_g by trtype
## Kruskal-Wallis chi-squared = 3.449, df = 1, p-value = 0.06329

So we used a t test for fines_g, since the Shapiro-Wilk tests gave no reason to reject normality for either group (p = 0.64 and 0.87), and it suggests a significant difference in sediment yield for fines (p = 0.017). For total sediment (including litter), the euc sample departed from normality (p = 0.027), so we used the nonparametric Kruskal-Wallis test, which did not show a significant difference (p = 0.063). Both results support the conclusion that oaks in this study produced more soil erosion, largely because the Eucalyptus stands generate so much litter cover, and that litter also made the total sediment yield not significantly different. See Thompson, Davis, and Oliphant (2016) for more information on this study and its conclusions.
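The pattern used here, checking normality in each group and then choosing between a t test and a rank-based test, can be written as a short sketch. The data below are simulated stand-ins (the values, the data frame name sed and the 0.05 cutoff are our assumptions), so the results will not match the eucoaksed output.

```r
set.seed(42)
# Simulated stand-in data: two groups of made-up sediment weights, not the eucoaksed values.
sed <- data.frame(
  trtype = rep(c("euc", "oak"), each = 8),
  sed_g  = c(rnorm(8, mean = 15, sd = 4), rnorm(8, mean = 40, sd = 20))
)

# Shapiro-Wilk p-value for each group; small values argue against normality.
normal_p <- tapply(sed$sed_g, sed$trtype, function(x) shapiro.test(x)$p.value)

# Use Welch's t test when neither group rejects normality at 0.05,
# otherwise fall back to the rank-based Kruskal-Wallis test.
if (all(normal_p > 0.05)) {
  result <- t.test(sed_g ~ trtype, data = sed)
} else {
  result <- kruskal.test(sed_g ~ trtype, data = sed)
}
result$method
result$p.value
```

With samples this small the Shapiro-Wilk test has little power, so it is worth looking at the density plots as well rather than relying on the p-value alone.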

10.4.2 Analysis of variance

The purpose of analysis of variance (ANOVA) is to compare groups based upon continuous variables. It can be thought of as an extension of a t test where you have more than two groups, or as a linear model where one variable is a factor. In a confirmatory statistical test, you’ll want to see if you can reject the null hypothesis that the group means are equal, which ANOVA tests by comparing the between-sample variance with the within-sample variance.
• The response variable is a continuous variable
• The explanatory variable is the grouping – categorical (a factor in R)

From a study of a karst system in Tennessee (J. D. Davis and Brook 1993), we might ask the question: Are water samples from streams draining sandstone, limestone, and shale (Figure 10.15) different based on solutes measured as total hardness? We can look at this spatially (Figure 10.16) as well as by variables graphically (Figure 10.17).
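To make the structure of a one-way ANOVA concrete, here is a minimal sketch with made-up hardness values for three lithologies. The numbers, the data frame name water and the column names are invented for illustration and are not the Tennessee karst data.

```r
set.seed(1)
# Made-up total hardness values for three lithologies (not the study data).
water <- data.frame(
  lithology = factor(rep(c("sandstone", "limestone", "shale"), each = 6)),
  hardness  = c(rnorm(6, 20, 5), rnorm(6, 150, 20), rnorm(6, 60, 10))
)

# One-way ANOVA: a continuous response modeled by a categorical factor.
fit <- aov(hardness ~ lithology, data = water)
summary(fit)

# The F statistic is the ratio of between-group to within-group mean squares.
tab <- summary(fit)[[1]]
tab[["Mean Sq"]][1] / tab[["Mean Sq"]][2]
```

The last line reproduces the F value in the summary table, which shows that the test compares variation between the groups with variation inside them.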


FIGURE 10.15 Water sampling in varying lithologies in a karst area

wChemData <- … %>%
  mutate(siteLoc = str_sub(Site, start=1L, end=1L))
wChemTrunk <- … %>% filter(siteLoc == "T") %>% mutate(siteType = "trunk")
wChemDrip <- … %>% filter(siteLoc %in% c("D","S")) %>% mutate(siteType = "dripwater")
wChemTrib <- … %>% filter(siteLoc %in% c("B", "F", "K", "W", "P")) %>% mutate(siteType = "tributary")
wChemData