GIS: An Introduction to Mapping Technologies [1 ed.] 9781498740234, 9781498740258, 9781498740241, 9780429441028, 9780429804779, 9780429804762, 9780429804786, 9780367570903

Over the past few decades the world has been organized through the growth and integration of geographic information syst

1,217 107 50MB

Pages [363] Year 2018

Report DMCA / Copyright

DOWNLOAD FILE

Polecaj historie

GIS: An Introduction to Mapping Technologies [1 ed.]
 9781498740234, 9781498740258, 9781498740241, 9780429441028, 9780429804779, 9780429804762, 9780429804786, 9780367570903

Table of contents :

1. Making Sense of Geotechnology 2. Georeferencing 3. Getting the World Into Your GIS 4. Geospatial Analysis: Introduction5. Thematic Mapping 6. Data Models 7. Web GIS 8. Open-Source GIS 9. Introduction to Remote Sensing and GIS 10. Introduction to Health GIS Applications 11. GIS and Digital Humanities 12. Organizational GIS

Citation preview

GIS

An Introduction to Mapping ­Technologies

GIS

An Introduction to Mapping ­Technologies

Patrick McHaffie, Sungsoon Hwang, and Cassie Follett

CRC Press Taylor & Francis Group 6000 Broken Sound Parkway NW, Suite 300 Boca Raton, FL 33487-2742 © 2019 by Taylor & Francis Group, LLC CRC Press is an imprint of Taylor & Francis Group, an Informa business No claim to original U.S. Government works Printed on acid-free paper International Standard Book Number-13: 978-1-4987-4023-4 (Hardback) This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged, please write and let us know so we may rectify in any future reprint. Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers. For permission to photocopy or use material electronically from this work, please access www.­ copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged. Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Contents Preface.......................................................................................................................ix Acknowledgments..................................................................................................xi Authors.................................................................................................................. xiii 1. Making Sense of Geotechnology.................................................................1 1.1 Coming to GIS........................................................................................1 1.2 The Geospatial Analyst: A Definition.................................................6 1.3 What This Book Is About......................................................................7 1.4 How This Book Is Organized...............................................................9 References........................................................................................................ 20 2. Georeferencing............................................................................................... 23 2.1 Needs for Metric Georeferencing...................................................... 23 2.2 Understanding Datums...................................................................... 24 2.3 Understanding Map Projection.......................................................... 27 2.4 Coordinate Systems for GIS................................................................ 31 2.5 Address Geocoding............................................................................. 36 References........................................................................................................ 59 3. Getting the World Into Your GIS............................................................... 61 3.1 Introduction: A World of Geospatial Data....................................... 61 3.2 Data Acquisition: Getting Started.....................................................63 3.3 Private Sector Geospatial Services and Data................................... 67 3.3.1 “Snapshot” Services............................................................... 67 3.3.2 Subscription/“On-Demand” Services................................. 67 3.4 Conversion Services............................................................................. 69 3.5 GNSS: A Viable Approach to Data Collection................................. 70 3.5.1 DIY Data Collection Using GNSS......................................... 71 3.6 Geodatabases in ArcGIS Desktop...................................................... 72 References........................................................................................................ 74 4. Geospatial Analysis: Introduction.............................................................77 4.1 Introduction to Geospatial Analysis.................................................77 4.2 Selection and Subsetting..................................................................... 78 4.3 Classification.........................................................................................83 4.4 Exploring Proximity, Buffering, and Spatial Relationships...........83 4.5 Interpolation.........................................................................................85 4.6 Networks............................................................................................... 86 References........................................................................................................ 88

v

vi

Contents

5. Thematic Mapping........................................................................................ 91 5.1 Fundamental Elements of Maps........................................................ 91 5.2 Gallery of Maps.................................................................................... 93 5.3 How Data Representation Affects Map Reading and Interpretation������������������������������������������������������������������������������������� 102 5.4 Choosing Symbology (Visual Variables)........................................ 107 5.5 Choosing a Quantitative Thematic Mapping Technique............. 112 5.6 Organizing Thematic Content......................................................... 113 References...................................................................................................... 137 6. Data Models.................................................................................................. 139 6.1 Geospatial Data is Special................................................................. 139 6.2 How Spatial Data Is Structured—Raster and Vector.................... 140 6.3 How Attribute Data Is Structured—Relational Databases.......... 144 6.3.1 Relational Databases............................................................ 144 6.3.2 Database Normalization...................................................... 145 6.3.3 Field Data Types.................................................................... 148 6.3.4 Table Join................................................................................ 149 6.4 How Spatial and Attribute Data are Put Together........................ 151 References...................................................................................................... 181 7. Web GIS......................................................................................................... 183 7.1 Introduction........................................................................................ 183 7.2 Web and Internet Fundamentals..................................................... 185 7.3 Web GIS History................................................................................. 190 7.4 Mainstream Web GIS Applications................................................. 194 7.5 Desktop Web GIS Exercise................................................................ 198 7.6 ArcGIS Online Exercise..................................................................... 206 7.7 Emerging Trends in Web GIS and Advanced Topics.................... 212 References...................................................................................................... 216 8. Open-Source GIS......................................................................................... 217 8.1 What Is Open-Source GIS?............................................................... 217 8.2 Getting Started with QGIS............................................................... 218 8.3 GDAL and Raster Data......................................................................223 8.4 GeoDa and Open-Source GIS Databases........................................ 231 8.5 The Future of Open-Source GIS....................................................... 238 References...................................................................................................... 238 9. Introduction to Remote Sensing and GIS ............................................. 239 9.1 Remote Sensing Fundamentals........................................................ 239 9.2 Organizational Considerations When Beginning Remote Sensing Work������������������������������������������������������������������������������������� 245 9.3 Major Land Remote Sensing Projects............................................. 247 9.3.1 Government Aerial Photography....................................... 247

Contents

vii

9.3.2 Landsat and Other Multispectral Imaging Programs......252 Digital Remote Sensing Data............................................................ 255 9.4.1 Sensors.................................................................................... 255 9.5 Remote Sensing Applications .......................................................... 258 References...................................................................................................... 269 9.4

10. Introduction to Health GIS Applications............................................... 271 10.1 Introduction ....................................................................................... 271 10.2 Accessibility ....................................................................................... 272 10.3 Accessibility in Design...................................................................... 279 10.4 Cartograms......................................................................................... 281 10.5 Ring Maps........................................................................................... 283 References...................................................................................................... 286 11. GIS and Digital Humanities..................................................................... 289 11.1 Digital Humanities Meets GIS......................................................... 289 11.2 Thinking Spatially with GIS............................................................. 293 11.3 Georeferencing Images..................................................................... 295 11.4 Mapping Texts.................................................................................... 297 11.5 Representing Time in a Geographic Database.............................. 299 11.6 Mapping Temporal Data................................................................... 302 11.7 Mapping Qualitative Data................................................................306 References...................................................................................................... 321 12. Organizational GIS..................................................................................... 323 12.1 Working as a GIS Professional......................................................... 323 12.2 Organizational Implementation of GIS.......................................... 324 12.3 Organizational Best Practices.......................................................... 325 12.4 Collaborative GIS............................................................................... 327 12.5 Survey123 and Collaborative StoryMaps........................................ 329 References...................................................................................................... 339 Index...................................................................................................................... 341

Preface This project is a product of the work of numerous faculty and staff over the past few decades developing a strong and focused curriculum and ­program in the geospatial and geographical information sciences. Because we are ­residents of Chicago, a vast city of immigrants, and working teachers of bright and motivated students who challenge us every day to be relevant, engaged, and interesting, we have tended to take advantage of the urban milieu around us. Working with staff and faculty from university centers and institutions like the Steans Center for Community-based Service Learning, the Chaddick Institute for Metropolitan Development, and the College of Liberal Arts and Social Sciences at DePaul University we have been able to make connections with and learn from communities across the city. At the same time, we have learned to listen to our students. Teachers are often accused of not listening to their subjects (and some of us do this better than others), but in our case we’ve done this and have learned from it. Faculty in strong undergraduate teaching programs in geography like ours (and there are many) draw much of their inspiration, motivation, and sustenance from the time they spend with their students and ultimately the success those students achieve during their matriculation and beyond as scholars, professionals, and citizens. Our experiences and interests range across numerous fields and ­subject areas, including Manual Cartographic Production, Card-punched Computer Mapping, Analog and Digital Photogrammetry, Environmental and Social Geography, Geographic Information Systems (GIS) and Science, Web GIS, Augmented Reality GIS, Mobile GIS, Web App Development, GIS in Sustainability Education and Research, Community GIS, GIS in Development Studies, Housing Market Analysis, Transportation GIS, and Global Navigation Satellite Systems (GNSS) Trajectory Computing. This smorgasbord is the palette we draw on when working in our classrooms, interacting with students, and creating and shaping courses in our programs. At DePaul Geography we have grown into a premier provider of undergraduate GIS education over the past decade, principally focused on undergraduate programs including a GIS Certificate Program and Geotechnology concentration in the BA program. In addition, we have recently collaborated in the launch of a very promising Master’s Program in Sustainable Urban Development (SUD), with the previously mentioned Steans Center and the Chaddick Institute. SUD combines coursework in planning and policy, green infrastructure, brownfields redevelopment, GIS for community development, and GIS for sustainable urban development with coursework in public service, statistics, and a broad array of practical electives and a GIS practicum. In many ways ix

x

Preface

this program is a crystallization of the community GIS ethic we have incorporated into our undergraduate program over the last decade. A number of people have participated in this and are working now to make the program better, some as faculty, others as students. All who have can take some pride in the success of SUD, and the successes of students who have completed it and are working in the field as well as those who will be part of this program in the future. In 1973, William Warntz, describing his work at the Harvard Lab for Computer Graphics and Spatial Analysis speaking to Maynard Dow, said “It was no trick to actually get a computer (to) output map form, but you could combine all the facilities of the computer for analysis and manipulation purposes so that literally the map became an experimental tool. So the map became the geographer’s laboratory. We could experiment in ways that were denied to us previously by using the enormous power of the computer, and hence I would say that we literally experiment with spatial structure via… computer mapping in a way that we could never do before.” This vision of the map as experiment has become a reality through GIS, and the computer now is an ubiquitous and embedded form of “knowledge matter” in our pockets, knapsacks, briefcases, offices, classrooms, homes, and vehicles. The data collected by these place-aware, space-calculating devices is now and forever biographical, so that our every movement, motion, and action is knowable, trackable, and analyzable. Work with these data seems to be a big part of the emerging future for geospatial analysis and location-based services. The need is for educated, smart, well-trained, ethical GIS professionals who are capable of navigating this future. This text is intended to provide a starting point for many, and our hope is that this will be the beginning of a collaboration that fills a gap in the GIS literature with resources and guidance that will deepen and broaden the diverse knowledge community in this field. Users of this text can obtain data and other peripheral files associated with the included lab activities at the CRC Press site: https://www.crcpress.com/GIS-An-Introduction-to-Mapping-Technologies/ McHaffie-Hwang-Follett/p/book/9781498740234. These materials will be updated as necessary.

Reference Association of American Geographers, Maynard Weston Dow, Maynard Weston Dow, and William Warntz. Geographers on Film Interview with William Warntz. Association of American Geographers, 1973. Video. https://www.loc. gov/item/mbrs01844953/.

Acknowledgments My contribution to this project has been helped by many colleagues over the years. Irma Shagla-Britton and her staff at Taylor and Francis/CRC have been patient and supportive. My friends Jeremy Crampton, Alan MacEachren, Matt Wilson, John Cloud, John Krygier, Nick Chrisman, Francis Harvey, Rina Ghose, John Pickles, and Laurel Smith helped me to think about ­mapping as a human, cultural process early on. At Kentucky, Adrian Smith, Ian Cook, and Karen Falconer-al Hindi taught me how to think, and J.P. Jones and Wolfgang Natter showed me what was important to think about. In Chicago, Alex Papadopoulos gave me a chance to be part of a small but strong group of geographers. Sarah Elwood and Howard Rosing showed me what ­community means, and Sungsoon Hwang showed me how to do things that really matter for our students and the communities we serve. And of course Amy, Tessa, and Kienan give me strength, sustenance, and an occasional smile every day of my life. Patrick McHaffie This book would not be possible without those who put thought and care into advancing GIS and GIS education. Many of my writings were inspired by the work of Paul Longley, Mike Goodchild, David Maguire, David Rhind, Jon Kimerling, Phillip Muehrcke, Nick Chrisman, Mike Worboys, John Krygier, Denis Wood, Paul Bolstad, Menno-Jan Kraak, F. J. Ormeling and Timothy Nyerges. I thank my ­colleagues Euan Hague and Patrick McHaffie for their friendship and tireless support. I thank students who became ­audience to my classes, and inspired me to think how to teach. Finally, my deepest thanks go to my family. Sungsoon Hwang My contributions would not have been possible without the help and ­encouragement of the DePaul Geography department, especially the support from Dr. Euan Hague and the other authors. My special thanks to Cameron, Belinda, Penny, and Diane, who gave ­feedback on my drafts and other materials. Thanks also to Claire, Adalyn, and my mom for being there for me this past year. And finally, my thanks to Chris and Jamie, Steve and Steph, and my many friends at DeSoto. Cassie Follett

xi

Authors

Patrick McHaffie: Professor McHaffie joined the faculty at DePaul University in 1996 after academic appointments at West Virginia University, Dartmouth College, and West Georgia College. During the 1980s he served with the Kentucky Geological Survey where he conducted environmental research related to coal mining, directed the National Cartographic Information Center affiliate office, and coordinated the State Topographic Mapping Program. His research interests and publication history include the social history of cartography and GIS, the cartographic labor process, cartographic ethics, the geography of education spending, Appalachian social geography, and the social construction of the global. He currently is completing a study of highway planning in Chicago during the cold war. His work has been supported by grants from the National Science Foundation, History of Cartography Project (University of Wisconsin), and the University Research and Quality of Instruction Councils (DePaul University), and he holds a GIS Professional (GISP) certification. Sungsoon Hwang:  Sungsoon Hwang is an associate professor in the Department of Geography at DePaul University. She received her BA in Geography Education from Seoul National University. She received her MS in Geographic Information Systems from the University of Nottingham. She received her PhD in Geography with a concentration on Urban and Regional Analysis from State University of New York (SUNY) at Buffalo. She has ­created and taught 20 courses related to GIS. She has published on fuzzy geodemographics, GIS for sustainability education, and uncertain GPS ­trajectory computing in the disciplines of geography, urban planning, and computer science. Cassie Follett: Cassie Follett is the GIS Coordinator for the Department of Geography at DePaul University, helping to plan and expand GIS ­capabilities on-campus and managing the new GIS Lab. Before coming to DePaul she worked for NASA and the federal Environmental Protection Agency, and received her MA from West Virginia University while researching big data and open-source web GIS. She holds a bachelor’s degree with a double major in Geography and History from Carthage College. ‘She contributes code to the open source QGIS project, and was a “Hacker in Residence” with local Chicago startup Hologram, and a “Cyberacademy Fellow” with SANS. Her research interests are participatory GIS, GIS programming, critical GIS, ­environmental justice, and virtual reality.

xiii

1 Making Sense of Geotechnology

1.1 Coming to GIS Imagine a world where spatial relationships between people and their physical and human environments, economies, and cultures are made clear in ways that have never before been possible. This is the promise of geographic information systems (GIS). Over the past century scientific and technical advancements have completely changed the ways in which we map. First, through the incorporation of aerial photography and photogrammetry taking mapping out of the field and moving it into the factory, then through a series of developments in the materials, machines, and methods used to make representations of the world mapmaking became more standardized and thus more efficient and precise. In the decades following World War II a societal shift to (1) electronic data processing and quantification in many public and private bureaucratic processes, (2) the professionalization and implementation of operations research and systems analysis methods in many settings, and (3) the widespread deployment of general- purpose computing systems in government, business, higher education, and the military. Each of these three were closely bound up in defense research in both peacetime and war, and the subsequent transfer of defense technologies into civil society and applications in civilian settings has been a driving force in these changes. Once general-purpose paper and film maps produced by national mapping agencies were digitized (layers of rivers, roads, and terrains were converted to digital files of x, y coordinates) they became the “elixir of life” for emerging new technologies such as GIS and remote sensing. It is no exaggeration to point out that mappers entered the 20th century on the back of a mule and left it staring at georeferenced electronic images collected by robotic spaceborne sensors, all embedded in a framework of exquisitely precise models of the geoid, festooned with layer upon layer of digital cartographic data. Mature GIS is an information constellation; a database management system with all the characteristics of other IS technologies, foregrounding powerful geospatial logic and “spatio-enabling” platforms ranging from smart watches to interplanetary probes. 1

2

GIS: An Introduction to Mapping Technologies

However, there still exists a wide gap between the haves and have-nots in their ability to access technologies and thus to derive the benefits. The digital divide is evident and stark in GIS and geotechnology more ­generally— indeed, even access to reliable and widely available electric service (often accepted in the developed world as a given) is considered a luxury in much of sub-Saharan Africa, south and southeast Asia, and Latin America. Yet over the past two decades there has been a stunning democratization of technologies through a shift to less expensive and more powerful mobile platforms and the creation and articulation of global wireless data networks, accelerating the pace of change and perhaps even speeding up the bridging of the technological divide. The coming years will determine whether the technological tsunami that has swamped the developed world over the past several decades in the realms of culture, communication, information, and governance will be made available to all people, communities, and nations. In the realm of geography, the academic home of GIS, it will help the reader to view this book as a disciplinary-resident response to the common ­misunderstanding of what we do and why it matters. We often refer to the two “big questions” we (as geographers) are expected to answer. The first question actually comes in two parts (What and where is it?), commonly seen as “the inventory question.” Answering these questions has occupied mappers for millennia. Our answer at present is the geodatabase: massive sets of geographically referenced features with associated attribute tables that refine and detail each and every feature. This systematizes a Cartesian inventory of the world and everything in it, albeit in machine-readable form, and presents a rich matrix of associated spatially explicit information as the raw material that can fuel possible answers to the second question (Why is it there?). This often requires research in other disciplinary realms while trying to understand the processes that have produced the outcomes we map, hinting at the multidisciplinary nature of our work. The cartographic answers to the first question can be considered as general-purpose “base maps,” while answers to the second are most assuredly “thematic maps” (specialized content with quantitative or qualitative evidence of processes and their effects). Answering the second question opens a dialogue with our two-part representations (features and attributes) that is only becoming more complex, reflexive, and power-laden over time. Responses to the “why” question in GIS should always be: • Complex—to reflect the reality of the world we all share • Reflexive—rather than pulling apart causes and effects, understanding them as swirling in cyclical, asymmetric relationships • Powerful—in that it (GIS) can endow individuals, collectives, and institutions with competitive advantages in a multitude of contexts And it is not presumptive to suggest a third big question for GIS users of all disciplines—and this is perhaps the most important of the three—can

Making Sense of Geotechnology

3

a different arrangement of spatial facts revealed by geospatial technologies make possible more just, humane, and sustainable futures for our shared and finite planet? The answer to this question will be determined largely by the next generation of GIS users. This work is intended to open a door to new users of geotechnology, whether they are pursuing accreditation in geography and GIS, or have arrived here from other disciplines hoping to incorporate geospatial analysis in their work. Perhaps the most compelling aspect of GIS and geospatial technology has been its meteoric rise in the public consciousness and entry into the mainstream over the past two decades, alongside other “techs” (e.g., nano-, bio-). The broad umbrella presented by this category has grown to include such diverse technologies and activities as global navigation satellite systems/ global positioning systems (GNSS/GPS), remote sensing, interactive web mapping, open-source geospatial data and systems projects and user communities, and participatory and community GIS (P GIS/C GIS). The constellation of categories that comprise and enable GIS includes: • general-purpose and specialized devices; • applications and logical systems (from simple one-task phone apps to complex, all-encompassing meta-systems in commercial and open-source contexts); • geographically referenced data in vector, raster, and hybrid formats; • rules and standards (allowing communication, integration, and coordination between and among different users); • the people who work with the technology (from individuals to large and complex private and public organizations); and • wired and wireless digital communications networks (local, regional, and global). This collection of “GI things” that make up “the system” on one hand suggests a certain complexity, but also allows new users to grasp the scope and scale of the whole and its parts: their interdependence, relationships, roles, and purposes in the enterprise. At the same time, it gives some evidence of the embeddedness of geotech and its connections to many other activities, disciplines, economic sectors, and governmental structures, rather than drawing a boundary around the whole. In GIS we use two different representational models to organize the spatial data used to depict the world; this data is separated into layers that are grouped based on their membership in particular classes. The vector data model uses a geometry of points, lines, or polygons defined by Cartesian coordinates referenced to systems based on the spherical coordinates of latitude and longitude. When separated into layers, and then combined, the representation is very much like a traditional paper map, with points, lines, and

4

GIS: An Introduction to Mapping Technologies

areas (Figure 1.1). The raster data model divides the same geography into a system of equal-sized grid cells. The grid is also referenced to a geographic coordinate system, a process called georeferencing. These grids are used to represent a single characteristic of the area contained within the cell, and they can represent either classed or continuous data. In a sense, the raster model is like a photo or digital image (Figures 1.1 and 1.2). Recent appraisals of the growing place this technology has taken within the US economy give some sense of its relevance and the degree that our world has become dependent on maps. In late 2013 the Federal Geographic Data Committee released an eye-popping estimate of the size of the “G-Conomy”: A 2012 study by the Boston Consulting Group (BCG) estimated that the U.S. geospatial industry generated approximately $73 billion in revenues in 2011 and comprises at least 500,000 high-wage jobs. In addition, BCG found that geospatial services deliver efficiency gains in the rest of the economy that are valued at many times the size of the sector itself—with geospatial services driving $1.6 trillion in revenue and $1.4 trillion in cost savings. These benefits, representing 15 to 20 times the size of the geospatial services sector itself, create an important competitive advantage

FIGURE 1.1 Vector data model.

Making Sense of Geotechnology

5

FIGURE 1.2 Raster data model.

for the U.S. economy. In addition, the U.S. Department of Labor recently identified the geospatial technology sector as one of the three technology areas that would create the greatest number of new jobs over the next decade… Worldwide, a 2012 study by Oxera commissioned by Google estimated that the global geospatial services sector generates $150 to $270 billion annually. By comparison, this is greater than the $25 billion generated by the video games industry, roughly equivalent to the $140 billion in revenue from the global security services industry, and about one-third of the global airline industry’s annual revenues of $594 billion.*

Clearly, GIS has moved beyond the realm of specialized mapping software and into the mainstream of global economic activity. * This excerpt is taken from page 10, Federal Geographic Data Committee, 2013, National Spatial Data Infrastructure Strategic Plan 2014–2016: Reston, Va.; Federal Geographic Data Committee. The entire report can be accessed online at http://www.fgdc.gov/nsdi-plan/ nsdi-strategic-plan-final-draft-12-4-13.pdf.

6

GIS: An Introduction to Mapping Technologies

1.2 The Geospatial Analyst: A Definition One of the most common questions from students first studying GIS is, “What is a geospatial analyst, and what does she do?” This is a difficult question, as geospatial analysis is a relatively recent job category, perhaps the largest category (in numbers) in the industry, and its roles and descriptions of duties encompass a broad understanding of GIS, even within specialized application areas. We should probably start with a definition of analysis. At the risk of committing rhetorical tautology, analysis is what analysts do; more specifically, what professional analysts do in response to requests or queries from public or private organizations and individuals (clients) regarding unknowns. They produce reports that may be written, tabular, quantitative or qualitative, and may include graphical elements of various kinds. Analysts typically use database management systems, statistical analysis, synthesis of multidisciplinary logics, as well as critical analysis of written texts, images, and other artifacts to answer questions. They may work using secondary data sources and extract specialized subsets of these, or they may gather their own information and record it using various methods in the field. Strong communication skills (spoken, written, graphical, photographic, and videographic) are needed to be a successful analyst. Analysts are judged based on their ability to compose strong and convincing arguments within their reports that ­support their responses to clients. Analysts produce reports that answer questions from clients regarding unknowns in the human and physical world, but what do geospatial analysts do? Geospatial analysts do all of this using all of the tools, skills, and logics that most other analysts use, but in addition they use GIS and other geotechnologies to produce their reports. Geospatial analysts produce reports that include maps (conventional paper maps and/or smart interactive maps), which can include qualitative and quantitative data such as evidence of social or environmental phenomena that strengthen claims or arguments being made. They also produce image maps or animations that may incorporate the results of analytical techniques applied to satellite images that have been used to make visible that which was only assumed or theorized. Quite often they are expected to work within a larger group of specialists making contributions to a flexible or ad hoc working group. They may work with analysts from other fields and collaborate with them, using their knowledge and expertise to respond to requests for specific products (graphical, tabular, or textual) that will contribute to larger projects and team-written reports. They may also be asked to help develop applications that can be used by non-specialists to access, edit, manipulate, or analyze geospatial data. In summary, a geospatial analyst is necessarily a generalist in a world of specialists. This suggests that a broad breadth of interests, training,

Making Sense of Geotechnology

7

knowledge, and skills are a necessary prerequisite, both in quantitative analytical techniques (e.g., mathematics, statistics, logic, content-laden fields in the social and physical disciplines). While it is critical for the geospatial analyst to master the technologies we work with it is also necessary for them to bring something more to the table. GIS can open doors to new ways of understanding, analyzing, and representing the world and is only the beginning of what can become a lifelong engagement with a learning community that is growing larger, smarter, and more intellectually diverse every year. The authors of this book are colleagues in a strong Department of Geography situated in Chicago, one of the more interesting and diverse cities in the United States with a long history and tradition in cartography, mapping, and geography. In 2016 the Department was awarded the initial Association of American Geographers award for program excellence. In particular, the program was recognized for “developing a curriculum that advances urban social justice, community service, and geotechnology.” The faculty and staff maintain high standards in geospatial analysis and technology, and are committed to engagement and service in the city that serves as both home and workplace. As a result, each of our worldviews and voices are distinct, focused, and geographical, concerned primarily with understanding and sharing the rich spatiality of the world at all scales.

1.3 What This Book Is About The purpose of this text is to provide an accessible introduction to geotechnology for a wide range of students. If you are considering pursuing baccalaureate credentials in geography, the home discipline of the authors, this book will provide a strong introduction to geotech. You may be coming from another academic discipline as an undergraduate with an interest in applying spatial analysis to datasets or research questions you have been investigating (e.g., environmental science, public policy, biology, sociology, political science). The techniques and approaches to problem solving, project organization and management, and data visualization found here will produce information products that support the work that you are doing and provide new ways to view your results. If you are a graduate student in one of these disciplines this text can serve as a manageable and adaptable introduction to GIS that will add gravitas to your work. Finally, if you are a working professional or analyst in any of a broad array of professions and considering retooling or refreshing your credentials, this book will be a useful reference and introduction to the technology.

8

GIS: An Introduction to Mapping Technologies

This guide to GIS and geospatial analysis is grounded in applications based on three broad and linked ontologies: • Social: Understanding the world of people as a spatial system of urban and rural places, made meaningful by the actions of individuals, bound together more or less loosely by political, economic, and cultural structures, and each with temporal and spatial qualities that may or may not be quantifiable. • Environmental: Understanding the physical world as a single spatial system of interlocked subsystems, bio-, hydro-, litho-, and atmospherical, that can be analyzed using geoscientific techniques and technologies. • Sustainable: Accepting and working toward a spatial system of social and environmental ethics that obliges relationships and exchanges between and among entities from the first two categories to be organized and conducted in ways that neither harm nor exclude the possibilities of future similar relationships and exchanges. Like a three-legged table, these must all “stand together.” That is, analyses, models, and representations of the contemporary world derived and produced using GIS should not stand in isolation within a particular silo, but rather be embedded in an understandable and accessible framework that draws on what is known across the disciplines in these three categories. For example, if we are working to understand human land use in a particular geometry (e.g., states, communities, watersheds) for the purpose of rational land-use planning, the primary goal often is to inform decisions regarding development or conversion of land from one use to another. A proper analysis then would consider not only the economic consequences of development but a wide range of physical, environmental, social, and cultural effects as well. The ability of analysts to synthesize, visualize, and even model the relationships between a myriad of human and physical entities, spatially and temporally, makes the use of GIS and the skills of geospatial analysts critically necessary. For this reason, we repeat the importance of the geospatial analyst as necessarily being a generalist. In this book we will work with a number of necessary tools including software, hardware, data, and network resources. The principal system we use to explore GIS is ArcGIS Desktop 10.5. This is the leading enterprise and desktop GIS worldwide, and those interested in becoming proficient and competitive in the current environment will be well served to learn to use this powerful, integrative toolset. We will also explore a number of other working environments, both directly within GIS and within domains connected and related to this technology. These will include public-domain/open-source resources that may be easily accessed and acquired through the Internet, technologies such as GNSS/GPS locational systems, image processing and remote sensing systems, and other

Making Sense of Geotechnology

9

related technologies. We are a methodological and content-diverse group and our writing will reflect that, allowing each of us to explore and present analytical and substantive topics that cover a broad gamut of applications, techniques, locales, and contexts. While everything here may interest some, for others there may be particular parts of interest—take the time to explore each chapter with an open and inquisitive mind. Throughout this book there will be a number of online resources students will need to access in order to complete activities, learn about freely available data and applications, and build a complete foundation for a lifetime of work in the geospatial services sector. The authors recognize and understand that not all people have access to the technological resources needed to use a text of this sort. This is not to suggest or propose a solution to the larger problem of free and open access to the Internet and information technology but rather as an admission of the unevenness of these resources in many parts of the world. Our hope is that we are moving to a time where increased and open access to all is achievable. Whether this is possible is beyond the scope of this book.

1.4 How This Book Is Organized This book is comprised of chapters that are organized first around general then more specific topics and tasks that are necessary to become a comfortable and confident user of GIS. Several chapters contain detailed exercises (activities) for a particular analytical task or workflow that can be adapted to other circumstances, data types, or areas of interest. Detailed lists of references and occasional resource guides are also included with each chapter. The following includes an overview of the chapters in this book: 1. Making sense of geotechnology (McHaffie) A lucid and engaging description of what geotechnology is, what role professionals engaged with the technology assume, and the ontology of the text. A brief description of the structure and outline of the text is presented. Activity 1.1 can be found at the end of this chapter. 2. Georeferencing (Hwang) This chapter helps readers make sense of map datums, map projections, and coordinate reference systems (CRSs). Other fundamental mapping concepts (e.g., scale, position, distance) depend on precise georeferencing that seamlessly converts between diverse CRSs in use within a multilayer dataset. The chapter includes visuals and ArcGIS tutorials designed to foster understanding of these concepts.

10

GIS: An Introduction to Mapping Technologies

3. Getting the world into your GIS (McHaffie) There is a world of data available for GIS projects through many ­possible sources, and in the age of the network, these are often just a click away. Both public-domain and commercial datasets provide options for mappers in generally usable formats, whether the application requires physical, human, vector, or raster data. Sometimes data that exists in tabular form with spatial fields can be converted quickly into mappable layers. Users may also require remote sensing images or digital terrain data that is used to visualize physical and human landscapes. Increasingly users collect their own vector data in the field using GNSS/GPS devices. Or they may rely on in situ measurements collected in near real time to populate their analytical models. And some users still rely on in-house or contracted digitizing services to convert scanned or hard copy materials into map layers. This chapter presents an organized guide to finding, collecting, or converting and organizing different types of geospatial data for a GIS project. 4. Geospatial Analysis: Introduction (McHaffie) This discussion introduces spatial analysis and presents some of the basic operations performed by analysts including selection and subsetting, classification, proximity functions and buffering, interpolation, and network analysis. An activity demonstrates selection by attributes and locations using ArcGIS. 5. Thematic mapping (Hwang) The most common use of GIS is to create maps based on attribute values, which is also called thematic mapping. While it has become easy to create maps using GIS, learning to read maps in an informed manner and designing maps effectively are more difficult. The chapter discusses on the fundamental elements of maps—namely, map scale and cartographic abstraction—to compare maps to other types of media. Different types of thematic maps are also explained. The impact of data representation (namely, data classification and normalization) on thematic maps is discussed with illustrations. Finally, the chapter addresses questions essential to effective cartographic design—i.e., how to go about choosing aspects of map symbols such as color and shape, choosing thematic mapping techniques, and organizing thematic content. An activity shows how to create thematic maps from geospatial data using ArcGIS. 6. Data models (Hwang) One of the challenges in many GIS projects is the diversity of data. Most GIS projects involve a variety of geospatial data formats and nonspatial (attribute) data stored in tables. This chapter explains how both spatial and nonspatial aspects of the world are represented in GIS. Understanding data models for both spatial and attribute data is

Making Sense of Geotechnology

crucial to exploring geospatial data and manipulating attribute tables (e.g., tasks such as field calculation, query, table join). The activities in this chapter explain how to display and review geospatial data and how to create a thematic map from census data using table join. 7. Web GIS (Follett) Web GIS bring static maps to life, allowing interaction, flowing narratives, and new ways to present maps and data through interactive maps. Technical knowledge and history and the basics of HTML, the web, and server technology relevant to GIS on the web are explained. An overview of different web GIS platforms and technologies precedes an in-depth exploration of the ArcGIS Online platform, with special attention and activities relating to web maps and ESRI Storymaps. Readers will learn how to share their data and maps with a global audience, learning techniques of design and features unique to the web. 8. Open-source GIS (Follett) The world of GIS has changed appreciably over the past 20 years as numerous open-source and web-based mapping platforms have become available to users and developers, creating a variety of ways to make, manipulate, and deliver content. As discussed in this chapter, this could include providing free standalone desktop or mobile applications like QGIS or GRASS, or developing specialized mapping applications with analytical power delivered through ordinary web browsers on different platforms. 9. Introduction to Remote sensing and GIS (McHaffie) Remote sensing, the interpretation and analysis of digital images acquired by aerial platforms in the atmosphere and outer space, is a major component of GIS. It is described as a linear process that begins with electromagnetic radiation from a natural or humanmade source and ends with products of analysis such as images, maps, and reports. This chapter covers several ways to acquire digital image data, different sensors and data types, and approaches to analysis. A short activity showing how to acquire digital imagery using the Landsat program and how to produce a false-color composite image is also included. 10. Introduction to Health GIS applications (Follett) In the health sector GIS are used by everyone from emergency responders directing ambulances to epidemiologists analyzing disease outbreaks and by those providers in between. Methods and approaches of special relevance to health GIS concerns are discussed as well as techniques presented in a health GIS context that are useful for other fields as well, such as network analysis. Issues of accessibility in map design are also discussed.

11

12

GIS: An Introduction to Mapping Technologies

11. GIS and Digital Humanities (Hwang) GIS mapping has become one of the tools used by humanities scholars to present their narratives, and to transform histories into ­“geohistories” (referred to as digital humanities, or DH). DH ­scholars have distinct needs since they often deal with poorly ­structured data, historical records of poorly understood geographies, and q ­ ualitative data. This chapter begins with case studies from DH projects that utilize GIS within the framework of spatial thinking. This chapter explains GIS-related tasks that scholars in the humanities are most likely to perform; for example, structuring data amenable to GIS mapping/analysis, mapping places and locations using a digital gazetteer, mapping historical data, mapping qualitative data, and curating geospatial data online. QGIS tutorials for these tasks are provided. 12. Organizational GIS (Follett) GIS as used by organizations, from businesses to government, involve a great deal of teamwork and collaboration; large projects rarely involve one analyst working alone. Real-time editing of maps shared between several GIS users is discussed in this chapter, along with best practices for organizing GIS products and sharing data. A hands-on activity for building and sharing a mobile GIS application is also included.

ACTIVITY 1.1 ARCGIS 10.5: ARCMAP, ARCCATALOG, AND CREATING A THEMATIC MAP This activity will allow you to explore the display and manipulation capabilities of the application. It also presents several modules that are incorporated within the ArcGIS framework (Figure 1.3).

FIGURE 1.3 Word cloud for Activity 1.1.

Making Sense of Geotechnology

The more familiar and comfortable you are with the help resources available the quicker you will become a GIS maven. There are also some interesting tutorials and a guide to the toolbars in the various modules we will be using. The more adept you become at helping yourself learn, the more successful you will be here and beyond. You should always work with the following items close at hand: flash drive (minimum 4 GB recommended), dedicated notebook and pencil with eraser, and headphones or earbuds (to access audio content from video resources). Learning Goals • Become familiar with the ArcGIS Desktop work environment • Become familiar with ArcCatalog and ArcMap • Learn how to compose thematic choropleth maps using census data • Learn how to produce and export a high-quality digital map Starting Up Data folder [data folder included with submission—Lab01] Save this file (download it) to your desktop or flash drive. You will need to extract the zipped folder (compressed) and place it in a normal folder using a tool such as Winzip or 7-zip. The folder contains a map layer of census tracts for Cook County, Illinois (an ESRI shapefile) with a data table from the 2010 census with demographic and other associated data. It also contains an Excel spreadsheet with explanations of the attribute codes (column headings) for the census data. You should refer to this spreadsheet (DP_TableDescriptions) anytime you are working with these data. If you are trying to determine whether a layer uses a vector or raster data model, you can zoom in close to the layer and carefully view it. A raster layer will eventually be “pixilated” (it will begin to break up into regular-shaped squares or rectangles), but a vector layer will not. ArcGIS Desktop is a complex suite of applications with several major modules that either run as components within the principal gateway (ArcMap) or as standalone applications themselves. These include ArcMap, ArcCatalog, ArcGlobe, and ArcScene. Most of the work here will be accomplished using ArcMap and often ArcCatalog. You can start this application by clicking the Start menu (lower left) and selecting ArcMap from the list of programs. The interface can be customized, mainly by opening and closing “Toolbars” and moving them around the workspace. The screen is divided into several sections: menus (highlighted at top of screen), several toolbars between the menu and the main display, the data frame (where your map will be displayed), and the table of contents (on left). There can also be a number of windows open and anchored vertically on the right, left, or free floating in the display (e.g., ArcToolbox, ArcCatalog). The toolbars are the most variable part of the user ­interface—they are accessed through the Customize menu and remain open in your user profile on the desktop until you close them.

13

14

GIS: An Introduction to Mapping Technologies

This is only a quick introduction to help you get started—as you work you will become more familiar and comfortable with the workspace. In this activity we will create a choropleth map of Cook County, Illinois that displays a quantitative measure of one of the data items (attributes) collected at the tract level in the 2010 census. 2010 Census Demographic Profile 1 These shapefiles join the geometry and selected attributes from the 2010 Census TIGER/Line shapefiles and the 2010 Census Summary File 1 Demographic Profile (DP1) for the United States and Puerto Rico. They contain data items such as total population, total housing units, median age, population 16 years and over, population 65 years and over, race, relationship, average household size, and others. All files are zipped. The zip-compressed folder contains the data for Cook County, Illinois that has been extracted from the national dataset for census tracts. The census tract is a geographically bounded unit with 4–8,000 people residing within it. These data can be accessed at http://www.census.gov/geo/ maps-data/data/tiger-data.html. This site is a clearinghouse of geographic data that can be accessed and used in ArcGIS. There are a number of other locations that can be accessed and explored that have public-domain GIS data available including: • City of Chicago (www.cityofchicago.org/city/en/depts/doit/ provdrs/gis.html) • Cook County GIS (https://datacatalog.cookcountyil.gov/ browse?category=GIS%2FMaps) • Illinois Geospatial Data Clearinghouse (http://clearinghouse. isgs.illinois.edu/frontpage) • The National Map (http://viewer.nationalmap.gov/viewer/) Feel free to explore these resources on your own. ArcCatalog and ArcMap Once ArcMap is open, create an ArcCatalog window using the ArcCatalog button in ArcMap (Figure 1.4). • When you open ArcCatalog the first time it will be “free floating,” but you may want to dock it in a part of the interface. Use the Catalog to connect to the folder you just downloaded and extracted on your desktop. If you do not see the folder you can connect to it using the Connect tool in Catalog. When you have connected to a folder you will stay connected to it through your profile on the desktop you are using. Explore the data in this

FIGURE 1.4 Connect button from ArcGIS Desktop; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Making Sense of Geotechnology

folder using the Catalog. Look around in the folder and find the cook_tracts_2010_demog file (Figure 1.5). Open this in ArcMap. (Hint: You can drag and drop these from the Catalog into the Table of Contents or the Data Frame. Try to get used to using the Open File dialog with the Add Data button (Figure 1.6 in ArcMap). Also spend some time familiarizing yourself with the Catalog. Move back over to the ArcMap window from ArcCatalog. This is the main framework where you will do most of your work creating maps and manipulating information in ArcGIS. You should aim to master this part of the application. Part of this will be understanding the ways that projections, datums, and coordinate systems are implemented in ArcGIS Desktop. Much of what you will do in ArcMap will be accessed through the Properties dialog boxes of the respective layers and data frames. These can be most easily accessed by pointing to the layer or data frame in the table of contents and right-clicking. As you can see there are many parts to a Layer dialog, accessed through a system of tabs at the top. Explore the different tabs and familiarize yourself with the way they work. You will also use the Data Frame Properties dialog. It can be accessed by pointing at the “Layers” label at the top of the table of contents and right-clicking, then selecting “Properties.” This is the Data Frame—there can be more than one of these open in an ArcMap document. This dialog also uses a system of tabs for different screens. There are also a number of toolbars accessible to you that can be accessed through the Customize – Toolbars menu selection. These can be turned on and off interactively and will be useful to you as you proceed. Thematic Mapping When we have a layer with rich attribute data associated with the features in the layer we can do a number of different types of thematic mapping. In this activity we will work with the attribute data associated with the census tract layer for Cook County to produce a Choropleth map— this is a type of thematic map normally made with polygon features and quantitative data associated with these features. Now you are ready to start mapping. Access the properties of the layer by right-clicking it and selecting “Properties” in the Table of Contents. Select “Quantities” on the left of this dialog.

FIGURE 1.5 ArcCatalog button from ArcGIS Desktop; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 1.6 Add data button from ArcGIS Desktop; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

15

16

GIS: An Introduction to Mapping Technologies

Now you can select an attribute field to map by using “Value.” Most of these are raw values so they may need to be “Normalized” (divided by) another field. This converts the raw value into a percentage of the total, which is easily comparable between features/census tracts, unlike raw values. In choropleth mapping most raw data needs to be normalized to convert it to a comparable ratio. Only data that is recorded as some measure of central tendency should not be normalized (values such as percentages, means, median values). For example, if you wanted to map the percentage of children under 5 you would select the field DP0010002 and normalize it by DP0010001, this will convert the raw number to a percentage of the total population. You do this because when you have many attributes/columns associated with each tract sometimes raw numbers are biased by the total number of a particular category. Remember: You have to use the spreadsheet referenced above that is found in the dataset (DP_TableDescriptions) to decode the attribute codes for the census tracts. There are 186 attributes for each tract in this dataset. Several “raw” attribute fields can be used to “normalize” data (i.e., convert it to a ratio)—these are 100% counts of an attribute of interest: • DP0010001 (100% count of persons in each census tract) • DP0030001 (100% count of persons 16 and older in each census tract) • DP0040001 (100% count of persons 18 and older in each census tract) • DP0180001 (Total housing units) • DP0130001 (Total households) • If you have questions about normalizing data do not hesitate to ask. When you normalize data this should be clearly explained in either the legend or a text box in the margins of your map. Select an attribute to map from the list of attributes—make sure you properly normalize the data if it is raw data or leave it if it is an average or some other aggregate statistic. • When you have selected data to map you should also classify the data. The Classification dialog can be accessed using the “Classify...” button in the Symbology tab of the Layer Properties dialog. Data classification in ArcGIS offers several choices for your map. Notice that the histogram (it will appear different depending on the dataset you select to map) displays the locations of the class boundaries for the classification scheme you have selected. As you change the scheme (e.g., quantile, natural breaks) or the number of classes, the class boundaries will move in the histogram to their new location, and the “Break Values” will change. You can also view the percentage of the total values that are represented in each class. Be careful here— some choices work better than others. Think about what you are trying to do—i.e., show the variability of a particular census statistic in Cook County, Illinois. Always make certain that you can explain the classification scheme you have chosen. • It is easy to make mistakes with color in mapping. ArcGIS tries to make selections easy for you with a system of palettes that

Making Sense of Geotechnology

can be easily selected, but the default choices are generally not the best. Do not ever settle for a default color scheme without trying some different options and becoming more adept at map design! There is a useful guide to color choices in mapping from the faculty and staff in the Department of Geography at Pennsylvania State University that can be helpful (http:// colorbrewer2.org/). Use the tools available to you to make an effective and aesthetically pleasing map that clearly communicates what you are trying to show. • Now—to be clear—you are to produce a high-quality map of a single demographic attribute for Cook County formatted as instructed—do not rush!! Take your time. Sloppy work is usually the result of working too quickly. Saving Your Work There are normally two options for saving your work in ArcGIS—either as a map document file (mxd) or map package (mpk) file. In general, a map file is less portable than a map package since it does not contain the data used to make the map. If you want to take everything with you that will allow you to make your map completely portable then create a map package. To see detailed instructions on these two procedures read the help sections “saving a map” and “creating a map package.” Remember if you create a map package all the data contained in the map is saved in the package, so these files can be quite large, especially when they include raster layers. The new files should be slightly larger than the sum of all the data layers used to create them. It is also important to remember that a map document file (mxd filetype) does not contain the layers used to make your map—rather it contains pointers to the locations of those layers and other conditions and edits you may have applied to them. For your map to work you have to have all of this. The easiest way is to use one system for all your work, but this may be a problem if you are using a desktop in a shared or public facility. Later we will generate “pdf” files to “turn in” our work (or share it with others). Finishing: Creating a Layout When your map is well-centered, switch to the “Layout” view of the Data Frame. You have been working in the “Data” frame—you can switch between the Data and the Layout frames using the selections in the “View” menu or use the small buttons in the bottom-left corner of the frame. As you begin the Layout, make sure that you uncheck the “Use Printer Paper Settings” box in the page and print setup of the file menu, and then set the size to 36” by 24.” Select “Landscape” or “Portrait” and click the “Scale Map Elements proportionally...” box. Map design is a massive topic and encompasses the discipline of ­cartography, which is unfortunately becoming a subset of GIS, and many design tasks that were formerly done by specialists are now automated. Design is perhaps best learned by doing. Imagine that you are the user of the map you are designing and then design a map that you find attractive, useful, and logically structured. When you start designing you will

17

18

GIS: An Introduction to Mapping Technologies

find that there are many intangibles that require subjective decisions— any change that you make, even the most minor—can change the way every element of the map works. In the end do not be afraid to experiment with your design—it matters. Do not blindly accept the defaults. And the more maps you make the more you will begin to understand the importance of design. You should begin to think of the tasks involved with creating maps as a form of writing, mainly using different symbols than when creating text. In most cases clarity is the most important. When you think you are done, look at your map and ask, “Is this clear?” If not, get back to work. Your final map should be formatted to most effectively use the space on the page, whether landscape or portrait format. It should include an appropriate title, legend, scale, and a text box that briefly describes the process of making the map and the data used. You also have the option of adding and using a basemap layer to your map composition. If you use outside sources for your text make sure you cite appropriately. These

FIGURE 1.7 Export map dialog; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Making Sense of Geotechnology

can all be added using the “Insert” menu. Also make certain to give yourself at least a 1-inch margin around the edges of your layout to make certain it all makes it onto the pdf file you will later create. When you have your map/image where you want it to be in the Layout module, use the File > Export Map... function to create a pdf copy of your map and save it in a secure location. This becomes a high-quality (and portable) digital file for your portfolio. We highly recommend that you also check the box in the “Advanced” tab to Export Map Georeference Information. Exporting a map with the Export Map Georeference Information option enabled will record each data frame’s georeference information inside the pdf. When the georeferenced pdf is opened in a compatible viewer, such as Adobe Reader 9 or later, the user can access geospatial functions such as coordinate readout and finding x,y (more information on this can be found in ArcGIS Help). The file can also be added to ArcGIS Desktop (Figures 1.7 and 1.8).

FIGURE 1.8 Export map dialog with georeference selection checked; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

19

20

GIS: An Introduction to Mapping Technologies

In a separate document, answer the following questions:



1. What attribute values did you choose to map? Why? 2. What data model do you think is used on this layer? Raster or vector? Look carefully at some of the census tracts in Cook County—are they all identical? Define the data model used here. 3. What classification scheme did you choose (e.g., quantile, equal interval)? Explain how this scheme works, including a discussion of the number of classes, how class boundaries were chosen, etc. Why did you choose it?

When you have completed both of these (the pdf map composition and the Word document responding to these questions) submit both of them to your instructor’s dropbox.

References Burns, R., Dalton, C. M., and Thatcher, J. E. 2018. Critical Data, Critical Technology in Theory and Practice. The Professional Geographer 70 (1): 126–28. Crampton, J. W. 2009. Cartography: Maps 2.0. Progress in Human Geography 33 (1): 91–100. Curry, M. 1998. Digital Places: Living with Geographic Information Technologies. Routledge: London and New York. Kwan, M.-P. 2002. Feminist Visualization: Re-envisioning GIS as a Method in Feminist Geographic Research. Annals of the Association of American Geographers 92 (4): 645–61. McHarg, I. 1969. Design with Nature. Doubleday/Natural History Press: Garden City, NY. McLafferty, S. L. 2002. Mapping Women’s Worlds: Knowledge, Power and the Bounds of GIS. Gender, Place and Culture: A Journal of Feminist Geography 9 (3): 263–69. Pickles, J. ed. 1994. Ground Truth: The Social Implications of Geographic Information Systems. Guilford: New York. Sack, R. D. 1980. Conceptions of Space in Social Thought: A Geographic Perspective. Macmillan: London. Schulten, S. 2001. The Geographical Imagination in America, 1880–1950. University of Chicago Press: Chicago. Sheppard, E. S. 1995. GIS and Society: Towards a Research Agenda. Cartography and Geographic Information Systems 22 (1): 5–16. Sui, D. Z. 2004. Tobler’s First Law of Geography: A Big Idea for a Small World? Annals of the Association of American Geographers 94 (2): 269–77.

Video Resources Opening Application (http://youtu.be/oYh-NiTUyQw) Help System (http://youtu.be/NGAzPEKN0YQ)

Making Sense of Geotechnology

Add Data (http://youtu.be/_YPAsrV5sRM) Using Basemaps (http://youtu.be/A71NZDH118E) ArcCatalog (http://youtu.be/s57WVD6-g2U) ArcCatalog Properties (http://youtu.be/PNRGRYdgZc0) Table of Contents (http://youtu.be/VRvr_afHoXw) Layer Properties (http://youtu.be/R9qKAh1MO4s) Dataframe Properties (http://youtu.be/ifXbWfXnJec) Saving a Map (http://youtu.be/fW2WyggF-Gs) Coordinate Systems 1 (http://youtu.be/cxNmuLvT4MI) Coordinate Systems 2 (http://youtu.be/9_3mF2q51Ho) Coordinate Systems 3 (http://youtu.be/JlQ3RV5eLXE) Choropleth Mapping (http://youtu.be/GYkKA5j02KQ) Choropleth Classification (http://youtu.be/ZVGkxNMyqus) Attribute Table 1 (http://youtu.be/ZLpHmnj0UZM) Attribute Table 2 (http://youtu.be/fKZ3cpRgRv4) Layout Intro (http://youtu.be/dAR2ews8nSY) Layout 1 (http://youtu.be/fRmpzVCIm-s) Layout 2 (http://youtu.be/cJdk9LUViiM) Layout 3 (http://youtu.be/1j9KbXs88i0) Layout 4 (http://youtu.be/wqF4j3uQf1Q) Layout 5 (http://youtu.be/Rvz5AWINGQg) Exporting a pdf (http://youtu.be/WUYFPGOyexE)

21

2 Georeferencing

2.1 Needs for Metric Georeferencing Geospatial data differ from other types of data in that data are geographically referenced. One must understand and master coordinate systems to become functionally adept in the use and application of GIS. Locations are uniquely assigned to data in reference to coordinate systems in GIS. Georeferencing enables geospatial data of disparate sources to be integrated. Without this integration of data, relationships among different variables will not be holistically explored, and high-value information based on multiple criteria (such as suitable sites) will not be created. Consider two methods for identifying a location on the Earth’s surface. The location of the White House can be expressed in a street address or ­latitude/ longitude as follows: • 1600 Pennsylvania Ave NW, Washington, DC 20500 • 38°53′52.6452″ N 77°2′11.6160″ W The former is a relative and nonmetric method of georeferencing whereas the latter is an absolute and metric method of georeferencing. A street address is identified relative to a street network. Thus, the location of the building would change if the street network changed. Two people in the same building cannot be distinguished if a street address is used as a method of georeferencing. In contrast to a street address, a location referenced by latitude and longitude is not measured with respect to geographic features that might change (absolute), and can be identified in an infinitely fine spatial resolution (metric). This enables any location to be identified unambiguously and precisely. One of the aspects that distinguish geospatial data from other data types is that coordinates of spatial entities are explicitly coded in a database. Such data are geographically referenced with respect to a universally recognized coordinate system in contrast to spreadsheets, photos, or computeraided design files. To understand coordinate systems that are used in GIS, it is necessary to understand datums and map projections.

23

24

GIS: An Introduction to Mapping Technologies

2.2 Understanding Datums Just like street addresses are measured with respect to street networks, latitude and longitude are measured with respect to the surface of the Earth. Consider how your height (one-dimensional value) is measured. Height is measured with respect to the floor. You can measure the location of trees (two-dimensional value) as (x, y) coordinates with respect to the origin (0, 0) in a local Cartesian coordinate system if you presume the surface is flat. The presumption is not valid any more if the goal is to measure locations in the entire Earth’s curved surface. To measure locations on the Earth’s surface, consider a three-dimensional earth model as a base that forms the theoretic component of a geodetic (or map) datum. Figure 2.1 shows how latitude and longitude are determined. Latitude and longitude are measured in angular units, not linear units (such as meters) because latitude and longitude are measured in reference to a three-­ dimensional earth model. If you assume the Earth’s surface is a sphere, latitude is the angular distance (φ) between the plane of the equator and a line passing through the point under investigation and the center of the Earth. Latitude ranges from 0° at the equator to 90° at the poles. The latitude 38°53′52.6452″ N means the location is 38°53′52.6452″ north of the equator. Longitude is the angular distance (λ) between the prime meridian and the meridian of the point under investigation. Longitude ranges from 0° to 180°. The longitude 77°2′11.6160″ W means the location is 77°2′11.6160″ west of the prime meridian. In geodetic surveying and GIS, a sphere is not usually used as an Earth model because a sphere is not the most accurate way to characterize the Earth. After measuring the shape and size of the Earth for many years, geodesists realized that an oblate ellipsoid represents the Earth more accurately than a sphere. Accumulated centrifugal force caused by the Earth’s rotation makes

FIGURE 2.1 Determining latitude and longitude.

Georeferencing

25

the equatorial axis (semimajor axis) longer than the polar axis (semiminor axis). Hence, geodetic coordinates are measured with respect to an ellipsoid. Figure 2.2 illustrates the difference between the Earth as a sphere and the Earth as an oblate ellipsoid in reference to how latitude is determined. Although it would be ideal to have one ellipsoid for the entire world, this is not reality. Different countries have developed their own ellipsoids to best fit their local areas (Table 2.1). For instance, GRS 80, the ellipsoid used for mapping North America, models the Earth as the elliptical surface that has semimajor axis 6,378,137 m and semiminor axis 6,356,752.3141 m. Different countries have also developed a network of surveyed control points that form an empirical component of datums (Figure 2.3). Different specifications of the Earth also reflect surveying technology at the time of surveying. The United States previously used Clarke 1866 as a reference ellipsoid. This ellipsoid forms the basis of the North American Datum of 1927 (NAD27), which was phased out in the early and mid-2000s. WGS 84 is the reference system for GPS and for mapping the entire world. Latitude and longitude collected with a GPS-equipped device (such as a smartphone) is measured using WGS 84. Pseudo Mercator—widely used as a coordinate system used for web mapping—is based on WGS 84 as the web map is designed to show any location in the world unlike other projection that is usually designed to show particular areas. Altitude is measured with respect to a geoid that approximates the mean sea level. Unlike a sphere or an oblate ellipsoid, a geoid is not a mathematical surface. The ellipsoid serves as the basis for horizontal datums, and a geoid serves as the basis of vertical datum. Coordinates reference different datums depending on what jurisdictions created the geospatial data. Figure 2.4 shows the spatial reference section of the metadata in XML format. The metadata shows that latitude and longitude are measured with respect to the North American Datum of 1983, using GRS 80 as an ellipsoid. In sum, datums consist of control points as empirical components and ellipsoids (geoids) as theoretical components for horizontal (vertical) measurement and are used in coordinate systems as reference surfaces.

FIGURE 2.2 Earth as sphere vs. Earth as oblate ellipsoid. (Reprinted with permission from Paul Longley, Michael Goodchild, David Maguire, and David Rhind, Geographic Information Systems and Science, 2nd Edition, Wiley, 2005.)

26

GIS: An Introduction to Mapping Technologies

TABLE 2.1 Selected common ellipsoids used for regional, national, and international mapping and GIS applications Ellipsoid Australian National 1966 Clarke 1866 International 1924 GRS80: Geodetic Reference System 1980 WGS84: World Geodetic System 1984 Normal Radius of the Earth

Semimajor Axis (m)

Semiminor Axis (m)

Use

6,378,160

6,356,774.719

Australia

6,378,206.4 6,378,388 6,378,137

6,356,584.467 6,356,911.946 6,356,752.3141

6,378,137

6,356,752.3142

North America Remaining parts of the world Adopted in North America for 1983 Earth-centered coordinate system Used with GPS and by NASA

6,370,997

6,370,997

A perfect sphere

FIGURE 2.3 Geodetic control points that are established through surveying.

FIGURE 2.4 Spatial reference section in the geospatial metadata.

Georeferencing

27

2.3 Understanding Map Projection Map projection is another key component of coordinate systems that must be understood. Map projection can be considered as the process of flattening the Earth’s curved surface. Although the Earth is curved, most of media that represent the features of the Earth (such as maps and computer screens) are flat. This means that positions on the Earth’s curved surface should be rendered (projected) onto a flat map surface (Figure 2.5). Map projection refers to mathematical operations that convert longitude and latitude defined on the curved Earth onto (x, y) coordinates in a flat (Cartesian) coordinate system. One way to understand how features on the curved Earth are rendered onto a flat map surface is to visualize projecting light from a lightbulb at the center of a miniature globe onto a transparent sheet of paper that wraps around it (Figure 2.6). The graticule (gridlines made of parallels and meridians) drawn on the curved Earth will be projected onto the upwrapped sheet of paper (a flat map surface). The miniature globe is called the reference globe, and the sheet of paper is called the developable surface. Mapping the contiguous United States on a flat surface can be understood as the process of drawing the contiguous United States on the reference globe, projecting light through the globe onto the developable surface, and unwrapping the developable surface. The size and shape of the land shown in maps are artifacts of map projection. Figure 2.7 shows the world in Mercator projection and Gall-Peters projection. The Mercator map on the left shows the shape accurately but distorts size greatly toward high latitude. The Peters map on the right portrays area accurately but distorts shape. For example, you can determine the true shape of Alaska using a Mercator map and the true size of Alaska relative to other features using a Peters map. A caveat of using map projection is that at

FIGURE 2.5 Map projection flattens the curved Earth’s surface.

28

GIS: An Introduction to Mapping Technologies

FIGURE 2.6 How map projection is constructed. (Adapted from ESRI, Understanding Map Projections, Retrieved from http://downloads2.esri.com/support/documentation/ao_/710Understanding_ Map_Projections.pdf.)

FIGURE 2.7 The world in Mercator projection vs. Gall-Peters projection.

least one geometric property is distorted. A Mercator map may suggest that Alaska and Brazil are similar in size, but Brazil is nearly five times larger than Alaska. The size of Alaska might be true to scale in the Peters map, but the shape of Alaska will be greatly distorted. Thousands of map projections have been devised to serve different purposes and thus it is useful to classify them. There are three families of map projection: azimuthal, cylindrical, and conic. They use developable surface in different shapes. If the shape of a developable surface is a plane, cylinder, or cone, the map projection is called azimuthal, cylindrical, and conic, respectively, as shown in Figure 2.8. Cylindrical projection can show the entire world. Conic projection is good for showing areas with a longer ­east-west extent in mid-latitude. Azimuthal projection is commonly used to show hemispheres. Map projection can also be classified in terms of a geometric property preserved in a map. The type of map projection that preserves shape true to scale is called conformal. The type of map projection that preserves area true to scale is called equal-area. Figure 2.9 shows how circles drawn on the reference globe change after map projection. The shape of the circles does not change in a Mercator map, which is conformal. The shape of circles changes from orthogonal circles to skewed circles, but size is preserved in Mollweide which is equal-area. Distortion is unavoidable in projected maps that represent features on the curved Earth (Figure 2.10). Since the amount of distortion affects the

Georeferencing

29

FIGURE 2.8 Family of map projection. (From QGIS 2.8  Documentation, retrieved from https://docs.qgis. org/2.8/en/docs/gentle_gis_introduction/coordinate_reference_systems.html)

FIGURE 2.9 Properties of map projection. (From Eric Gaba, Wikimedia Commons User: Sting, retrieved from https://commons.wikimedia.org/wiki/File:Tissot_indicatrix_world_map_Mercator_proj.svg and https://commons.wikimedia.org/wiki/File:Tissot_indicatrix_world_map_Mollweide_ proj.svg.)

positional accuracy of maps, it is important to understand the patterns of distortion in map projection. Figure 2.11 shows that the line that intersects a developable surface and reference globe—called the standard parallel—has the least distortion. The farther from standard parallel the more distortion present (i.e., less accurate). Letting the developable surface cut through the

30

GIS: An Introduction to Mapping Technologies

FIGURE 2.10 Distortion is unavoidable in map projection.

FIGURE 2.11 Pattern of distortion in map projection. (From Anzlic Committee on Surveying & Mapping, retrieved from https://www.icsm.gov.au/education/fundamentals-mapping/projections.)

reference globe results in two standard parallels (known as the secant case). Standard parallels are chosen so that the overall distortion is minimized in the areas to be projected. The same principle applies to other families of map projection. The contiguous United States is commonly depicted in a type of map projection known as Albers Equal-Area Conic (Figure 2.12), which uses a cone as a developable surface (conic family) that cuts through the reference globe around two standard parallels (secant case) and preserves area true to scale (equal area). Figure 2.12 also shows that the first standard parallel is in 29°30′ and the second standard parallel is in 45°30′. These two lines of parallels fall within the areas of interest (contiguous United States). The aim is to minimize the overall distortion, which is lowest around standard parallels. Having two standard parallels within an area of interest reduces distortion more than having one standard parallel in the middle of an area of interest.

Georeferencing

31

FIGURE 2.12 Parameters of Albers Equal-area map projection showing the contiguous United States; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

2.4 Coordinate Systems for GIS Coordinate systems used in GIS can be understood in terms of a progressive flattening of the Earth’s irregular shape (Figure 2.13). First, the rugged Earth’s surface is modeled as a geoid to measure altitude and as an ellipsoid to ­measure latitude and longitude. Second, latitude and longitude are defined with respect to a horizontal datum that fits the area of interest. Third, if needed, ­appropriate parameters of map projection (such as family, property, and standard p ­ arallels) are selected to depict an area of interest in a flat surface as r­ ealistically as possible while meeting the requirements of the map being created. GIS include tools that can display maps in any coordinate system on the fly or change coordinate systems permanently as a preliminary to spatial analysis based on Euclidean geometry, which assumes coordinates on a flat surface.

FIGURE 2.13 Coordinate systems in terms of progressive flattening of the Earth’s irregular surface.

32

GIS: An Introduction to Mapping Technologies

There are two types of coordinate systems used in GIS—the geographic coordinate system (GCS) and the projected coordinate system (PCS). The GCS refers to coordinates defined on the curved Earth, and the PCS refers to coordinates defined on the flat (projected) surface as illustrated in Figure 2.14. A GCS is simply an alternative name for latitude and longitude, which are defined on the basis of a particular datum. (Note that the same latitude and longitude may differ depending on datum.) Longitude and latitude on the curved Earth may be rendered onto a flat surface using map projection, resulting in (x, y) in a Cartesian (projected) coordinate system if needed. A PCS is defined on the basis of a GCS. Many publicly available geospatial data are stored in GCS to allow the data to be displayed and projected as needed. Since there is no way to display geospatial data stored in a GCS (a threedimensional coordinate system) on a flat computer screen, it is displayed by default using Plate Carrée in a GIS. Plate Carrée represents one degree of latitude and longitude in one linear unit on a flat surface. While a line of longitude has constant spacing, one degree of spacing in a line of latitude gets smaller toward high latitude and becomes zero at the pole. The zero dimension at the pole is forced to expand to the equator, resulting in distortion in shape (extended in east-west direction) and size (larger) in high latitude. Figure 2.15 shows the contiguous United States in Plate Carrée on the left and in Albers Equal-Area Conic with standard parallels falling within an area of interest after the layer is projected on the right. Latitude and longitude (or GCS) are commonly expressed in degrees, minutes, and seconds (DMS) format. In a GIS, decimal degrees are widely used to express latitude and longitude. Decimal degree expresses latitude and longitude in fractional degree by converting minutes and seconds to

FIGURE 2.14 Geographic coordinate system vs. projected coordinate system.

Georeferencing

33

FIGURE 2.15 Data stored in GCS displayed in Plate Carrée vs. Albers Equal-Area Conic.

degrees. To convert minutes to degrees, you divide the number of minutes by 60. To convert seconds to degrees, you need to divide the number of seconds by 3600. For example, latitude 42°55′26.30″ N is equal to 42.92397 (= 42 + 55/60 + 26.30/3600) in decimal degrees. Decimal degree takes a negative value if the direction of latitude is south or the direction of longitude is west. Figure 2.16 shows longitude, latitude in quadrants of the geographic grid depicted in Plate Carrée. Longitude and latitude in the northern and eastern hemisphere will have positive values (+, +) in decimal degrees. Longitude and latitude in the southern and western hemisphere have negative values (−, −) in decimal degrees. For instance, longitude 87°39′19.67″ W is equal to −87.65546 (= −1 × (87 + 39/60 + 19.67/3600)) in decimal degrees since the direction of longitude is west. One of widely used PCS in GIS is the Universal Transverse Mercator (UTM). The UTM was initially developed by the US Army Corps of Engineers and has been adopted by various mapping agencies as a standard PCS for topographic (reference) mapping all around the world. Many publicly available raster data (such as digital orthophoto quadrangles) are cast to the UTM. The UTM divides the world into 60 zones by six-degree longitudinal strips (Figure 2.17). The zone numbering increases eastward from the international dateline (180° longitude). While Mercator uses a cylindrical developable

FIGURE 2.16 Signs of decimal degree in quadrants of geographic grid.

34

GIS: An Introduction to Mapping Technologies

FIGURE 2.17 UTM zones of the world. (Reprinted with permission from National Geospatial-Intelligence Agency, retrieved from http://earth-info.nga.mil/GandG/coordsys/grids/grid1.html.)

FIGURE 2.18 How UTM zones are defined. (From United Nations Statistics Division, UNGEGN-ICA webcourse on Toponymy, retrieved from https://unstats.un.org/unsd/geoinfo/UNGEGN/docs/_ data_ICAcourses/_HtmlModules/_Selfstudy/S06/S06_03.html.)

surface placed around the equator, the UTM uses six-degree longitudinal strips as a developable surface placed around the meridian in the middle of each UTM zone (known as the central meridian; Figure 2.18). Each UTM zone has its own Cartesian coordinate system, where the equator serves as an x axis and central meridian serves as a y axis. A position can be located using UTM zone and (x, y) offset from an origin of the Cartesian coordinate system. A meter is a measurement unit for the UTM. To avoid negative coordinate values, an origin shifts westward (500,000 m) and shifts

Georeferencing

35

southward (1,000,000 m) in the southern hemisphere. The amount of shift in the west and south directions is referred to as false northing and false easting, respectively. For instance, UTM coordinates 16N (445572, 4641636) means the coordinates are located 445,572 m east and 4,641,636 m north of the origin in UTM zone 16N. Municipalities and jurisdictions govern local and regional assets such as land records and utilities and thus have a need for precisely measuring the locations of these assets using a specific coordinate system. State Plane Coordinate (SPC) systems were developed to meet the needs of administrative units to precisely survey and inventory properties in the United States. SPC systems divide the United States into zones that follow state boundaries and then subdivide each zone into smaller zones that are designated mostly by direction (West, East, North, South; Figure 2.19). For instance, Illinois has two SPC zones—IL East and IL West. An SPC system is a projected coordinate system. Two map projections— Lambert conformal conic and transverse Mercator—are used for the SPC system (Figure 2.20). A developable surface cuts through two standard parallels in Lambert conformal conic, making it suitable to map areas that have a greater east-west extent. Transverse Mercator is good for showing areas with a great south-north extent as distortion is minimized around the central meridian. For instance, Kansas (with a greater east-west extent) uses Lambert conformal conic, and Illinois (with a greater south-north extent) uses transverse Mercator. The SPC system also use false northing and false easting to avoid negative coordinate values.

FIGURE 2.19 SPC system zones.

36

GIS: An Introduction to Mapping Technologies

FIGURE 2.20 Map projections used for SPC systems. (From NOAA, retrieved from https://www.ngs.noaa. gov/PUBS_LIB/ManualNOSNGS5.pdf.)

2.5 Address Geocoding As locations are commonly identified by street addresses, they often need to be converted to metric coordinates such as latitude and longitude. A standard GIS software comes with address geocoding. Geocoding refers to the process of turning the textual descriptions of locations (e.g., street addresses, place names) to metric descriptions of locations (e.g., decimal degrees). Provided that addresses are defined in reference to a street network, address geocoding requires three components: (a) input table containing street addresses; (b) reference database (transportation network) against which addresses can be matched; and (c) an algorithm that determines how addresses will be matched against the reference data. Today geocoding is commonly performed using an online geocoding service as it is convenient to store a large volume of reference data with frequent updates on a server. For address geocoding to work, street addresses should be formatted in the order of housing number, prefix direction (if any), street name, street type, suffix direction (if any), city, state, and zip code (e.g., 1234 N. Main St, Chicago IL 60012) in the case of US streets. A reference database should be comprised of a road segment that contains the street name, address range, adjacent zip code, etc., in a predefined format. Figure 2.21 shows that a selected road segment falls within the address range between 2300 and 2399 in N Kenmore Ave. The fields named L_F_ADD_INT and L_T_ADD_INT contain the beginning and ending house number on the left-hand side (at the digitizing order) of the street centerline, respectively. A geocoding algorithm goes through several steps—parsing and standardization, matching, and location assignment. It first splits street addresses into standardized values of components. For example, the street address “2350

Georeferencing

37

FIGURE 2.21 Reference data for address geocoding; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

North Kenmore Avenue, Chicago IL 60614” is parsed into standardized components “2350 N Kenmore Ave, Chicago IL 60614” so that Avenue and Ave are treated equally. Second, it finds potential matches between an input street address and a road segment in a reference database by calculating a score that measures how well input data matches candidate road segments in reference data. The road segment with the highest score such as “2300–2399 N Kenmore Ave, Chicago, 60614” would be found as a match. Third, if a match whose score exceeds a minimum score is found, the location is estimated with the assumption that parcel size is equal along the matched road segment. That is, 2350 N Kenmore will be located in the middle of the matched road segment within the address range of 2300 to 2399. Thus, the quality of address geocoding results depends on the quality of the input table, reference database, and geocoding algorithm. Geocoding can be customized as there are multiple parameters that can be adjusted including the minimum required score, misspelling sensitivity (how much to penalize typos in input table), and offset value (how far the output point can be located from the street centerline). GIS software provides interactive and manual geocoding tools to help deal with cases where either multiple matches or no match is found. Although network-based address geocoding is the most common, it may not be adequate for use cases that require highly precise geocoding results for emerging applications like self-driving.

38

GIS: An Introduction to Mapping Technologies

ACTIVITY 2.1 COORDINATE REFERENCE SYSTEM (ARCGIS 10.5) In this lab activity, you will learn how to turn XY data (earthquake spreadsheet data) into geospatial data and project maps showing different parts of the world on the fly. 1.  Add a text file and basemap in ArcMap. • Get earthquake data signif.txt in the Lab02 folder. The data comes from the NOAA Significant Earthquake Database website at https://www.ngdc.noaa.gov/nndc/struts/form?t=​ 101650&s=1&d=1. The text file signif.txt is in tab-delimited ­format. Variable definitions can be viewed at https://www. ngdc.noaa.gov/nndc/struts/results?&t=101650&s=225&d=225. With ArcMap, you can open and view a tab-delimited text file if data is consistently formatted. A. Add a tab-delimited text file. • Open ArcMap. • Click Cancel in the ArcMap—Getting Started. This will open an empty map. • Add data: (1) Click the Add Data icon in the Standard Toolbar. (2) Connect to Folder in the Add Data dialog box to navigate to the working folder where data is saved. (3) Select the working folder from the Look in: dropdown list. (4) Choose data. This will add signif.txt from the folder connection (Figure 2.22). • Open the attribute table of the text file added to the Table of Contents (right-click on signif.txt in the Table of Contents > Open) (Figure 2.23). • Freeze the field (column) YEAR (right-click on the column header YEAR > Freeze/Unfreeze Column). This will keep this field visible when scrolling the table left or right (Figure 2.24). • Scroll the table right to check out fields LATITUDE and LONGITUDE (Figure 2.25). • Scroll the table all the way to the bottom to check out significant earthquakes in year 2015 (Figure 2.26). • Close the attribute table.

FIGURE 2.22 ArcMap– Add data; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Georeferencing

FIGURE 2.23 ArcMap—Open a table; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 2.24 ArcMap—Freeze a column; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.



B. Add a basemap. • Add a basemap (click the arrow next to the Add Data ­button > Add Basemap) (Figure 2.27). • Choose Topographic in the Add Basemap window and click the Add button. This will add a basemap (Figure 2.28). The added basemap references the WGS 84 Web Mercator. This means that latitude and longitude are defined using WGS 84 (the specification of the Earth’s dimension as an oblate ellipse), and then coordinates are projected onto the flat surface using Web Mercator (also known as Pseudo

39

40

GIS: An Introduction to Mapping Technologies

FIGURE 2.25 ArcMap—Scroll to the right in a table; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 2.26 ArcMap—Scroll to the bottom in a table; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 2.27 ArcMap—Add a basemap; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Mercator). Since the basemap is the first layer added to the map, the coordinate system of the basemap automatically sets the coordinate system of a Data Frame. • To check the coordinate system of the current active Data Frame, right-click on a stack of yellow layers named Layers in the Table of Content > Properties. This will open the Data

41

Georeferencing

FIGURE 2.28 ArcMap—Available basemaps; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Frame Properties window. The coordinate system should be set to WGS 84 Web Mercator. Close the Data Frame Properties window (Figure 2.29). 2.  Display XY data in ArcMap. • Right-click on the signif.txt in the Table of Contents and click Display XY Data (Figure 2.30). A. Choose XY fields. • Set LONGITUDE from the dropdown list to X field and LATITUDE to Y field. Recall that longitude is east and west of the Prime Meridian, equivalent to X, and latitude is north and south of the equator, equivalent to Y. If it is confusing, develop a habit of calling long/lat for (X,Y; Figure 2.31) • Note that Input Coordinates references the Web Mercator (which will be corrected later). Click OK for now. • Click OK if the message “Table Does Not Have Object-ID Field” appears. The message is to inform you that a text file does not have a unique ID and any attribute table in ArcGIS needs a unique ID. We will fix this problem later.

42

GIS: An Introduction to Mapping Technologies

FIGURE 2.29 ArcMap—Coordinate systems in Data Frame properties; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 2.30 ArcMap—Display XY data in the table of content; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Georeferencing

FIGURE 2.31 ArcMap—Display XY data dialog box; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

This will create events data from the significant earthquake data. This will plot earthquakes somewhere in the ocean west of Africa. What’s gone wrong? • If you would like to zoom to the extent of the signif.txt Events, right-click on the signf.txt Events > Zoom to Layer (Figure 2.32). • Longitude and latitude is a GCS, but coordinates are referenced to the wrong reference system (Web Mercator). In other words, this data needs to be referenced to GCS, not Web Mercator (which is a PCS). Let us fix this problem. • Zoom to the full extent of the map (Figure 2.33). • Remove the signif.txt Events (Figure 2.34). • Right-click on the signif.txt in the Table of Contents and click Display XY Data (again). Select LONGITUDE from the X Field dropdown list and LATITUDE from the Y Field dropdown list (Figure 2.35).

43

44

GIS: An Introduction to Mapping Technologies

FIGURE 2.32 ArcMap—Zoom to layer; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 2.33 ArcMap—Zoom to the full extent of a map; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 2.34 ArcMap—Remove a layer in the table of contents; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Georeferencing

FIGURE 2.35 ArcMap—Edit button in a display XY data dialog box; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Note that data references a PCS. ArcMap automatically imports the coordinate system of a basemap (the layer added first to the current Data Frame) into the Display XY data window. Longitude and latitude should reference GCS. To change the coordinate system of the input coordinates or tie coordinates (latitude and longitude) to the appropriate datum:



B. Choose an appropriate coordinate system of input coordinates. • Click Edit... button (at the lower right) in the Display XY data window. Spatial Reference Properties will appear. • Scroll all the way to the top and expand the GCS by clicking the plus sign left of Geographic Coordinate Systems. Then choose World > WGS 1984. Click OK (Figure 2.36). Now XY coordinates are referenced to a GCS defined on the basis of WGS 1984 (horizontal datum for mapping the world; Figure 2.37). • Click OK after confirming that the XY coordinates reference the GCS. • Click OK when the message on a unique identifier appears. C. Export Events to a shapefile

Events that result from the Display XY Data tool are temporary, meaning the data is not saved permanently. If you exit the map, the Event layer will be gone. To save the Event layer permanently:

45

46

GIS: An Introduction to Mapping Technologies

FIGURE 2.36 ArcMap—Choose WGS 1984 in the Data Frame properties window; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 2.37 ArcMap—A result of WGS 84 chosen in the Data Frame properties window; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

• Right-click on signif.txt Events > Data > Export Data (Figure 2.38). • Export all features using the same coordinate system as this layer’s source data. Click the Browse button to save the shapefile in a working folder (Figure 2.39). • Click Yes if prompted for “Do you want to add the exported data to the map as a layer?” • Remove signif.txt Events and signif.txt in the Table of Contents (right-click on the layer > Remove). • Unselect features if any of them are selected (Figure 2.40). • Save a map as Earthquakes in Web Mercator.mxd in the working folder (File > Save). Close map.

Georeferencing

FIGURE 2.38 ArcMap—Export a layer into a geospatial dataset; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 2.39 ArcMap—Save a layer as a shapefile; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 2.40 ArcMap—Clear selected features; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

47

48

GIS: An Introduction to Mapping Technologies

3.  Display earthquakes in different map projections • Open an empty ArcMap. • Add the layer Signif_Earthquakes.shp in the Earthquakes folder. The coordinate system of a Data Frame is set by the coordinate system of the layer you just added. A status bar at the bottom right of the map display area should show decimal degree (Figure 2.41). • Add a basemap “Imagery with Labels.” If the basemap is not shown after a while, refresh the view by clicking the Refresh icon (or hitting F5) in the status bar at the bottom left of the map display area (Figure 2.42). • Open the Data Frame Properties window (right-click on the yellow stack Layers > Properties). Click the Coordinate System tab. Close the Data Frame Properties window. Note that the layer Signif_Earthquakes is stored in WGS 84/GCS. Data is stored in GCS but is displayed in Plate Carrée.

FIGURE 2.41 ArcMap—Coordinate units in the status bar; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 2.42 ArcMap—Choose and refresh a basemap; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Georeferencing

A. Mapping the world • Change the coordinate system of the Data Frame to the Cylindrical Equal Area (World) (i.e., Gall-Peters). To do this, open the Data Frame Properties window > click the Coordinate System tab > Projected Coordinate Systems > Cylindrical Equal Area (world). This will change the coordinate system of the Data Frame on the fly. It can be noted that standard parallel is at (0.0), the equator in this map projection. This is an equal area map. However, the distortion of shape increases as it gets far away from standard parallel (Figure 2.43). • Change the coordinate system of the Data Frame to Projected Coordinate System (PCS) > World > Mercator. It should be noted that standard parallel is at the equator. This is a conformal map that portrays shape true to scale. However, the distortion of area (size) increases moving toward high latitude. • Change the coordinate system of a Data Frame to PCS > World > The World from Space. The map will show the world from space with the map centered at longitude and latitude (−72.53, 42.53) of origin (Figure 2.44). • Change the coordinate system of a Data Frame to PCS > World > Goode Homolosine (Land). The Goode map is an equal-area, and is interrupted in order to reduce distortion in shapes (Figure 2.45). • Change the coordinate system of a Data Frame to Projected Coordinate Systems (PCS) > World > Mollweide. This map projection is an equal-area that depicts area true to scale. B. Mapping the American Continent • Zoom to the North America. North America should be significantly distorted in the Mollweide. To fix this, let us display North America with less distortion (Figure 2.46). • Change the coordinate system of a Data Frame to PCS > Continental > North America > North American Albers Equal Area Conic. Note that two standard parallels (of

FIGURE 2.43 ArcMap—Project the world to cylindrical equal area on the fly; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

49

50

GIS: An Introduction to Mapping Technologies

FIGURE 2.44 ArcMap—Project the world to orthographic (the world from space) on the fly; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 2.45 ArcMap—A result of projecting a map to Goode Homolosine (land).

Georeferencing

FIGURE 2.46 ArcMap—North America in the Mollweide projection.

FIGURE 2.47 ArcMap—Display North America in North American Albers Equal Area Conic; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

51

52

GIS: An Introduction to Mapping Technologies

FIGURE 2.48 ArcMap—A result of displaying North America in North American Albers Equal Area Conic; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.



latitude) lie at 20° north and 60° north (Figures 2.47 and 2.48). • Zoom to the Full Extent. It can be seen that the map projection is supposed to show North America best, and a conic developable surface is used to construct map projection. • Change the coordinate system of a Data Frame to PCS > Continental > South America > South America Lambert Conformal Conic. Note where two standard parallels (of latitude) lie: They are 5° south and 42° south (Figure 2.49). • Zoom to South America. The map projection is designed to show South America with the least distortion using two standard parallels that lie in South America (Figure 2.50). C. Mapping the United States • Zoom to the Full Extent. • Change the coordinate system of a Data Frame to PCS > UTM > NAD 183 > NAD 1983 UTM Zone 10N while noting where the central meridian −123 lies. The central meridian is the line of longitude where a developable surface basically wraps around a reference globe (Figure 2.51). UTM Zone 10N shows the most of the Pacific Coast in the United States because the central meridian lies in the middle of the area marked as zone 10. This minimizes distortion around the meridian −123° (west of the Prime Meridian).

Georeferencing

FIGURE 2.49 ArcMap—Display South America in South America Lambert Conformal Conic; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

• Zoom to the Pacific Coast that includes California, Oregon, and Washington. • Change the symbology of the layer Signif_Earthquakes by clicking a symbol selector (Figure 2.52). • Save the map as Earthquakes in the Pacific Coast.mxd in the working folder (File > Save As). To export the map as an image file, File > Export Map. Export the saved map into Earthquakes in the Pacific Coast.jpg in the working folder. • Change the coordinate system of a Data Frame to PCS > State Plane > NAD 1983 (US Feet) > NAD 1983 StatePlane Washington North FIPS 4601 (US Feet). Note that this projection uses Lambert conformal conic with two standard parallels at 47.5 and 48.73° in the northern latitude (Figure 2.53). The SPC for Washington State is divided into two zones, North and South. This map projection is the best way to show the northern part of Washington State where Seattle is located. • Zoom to Seattle, WA. • Save a map as Earthquakes in Seattle. mxd in the working folder (File > Save As).

53

54

GIS: An Introduction to Mapping Technologies

FIGURE 2.50 ArcMap—A result of displaying South America in South America lambert conformal conic.

FIGURE 2.51 ArcMap—Change a map projection to UTM Zone 10N; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Georeferencing

FIGURE 2.52 ArcMap—Change a symbology of a layer using symbol selector; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 2.53 ArcMap—Change a map projection to state PCS Washington North Zone.

ACTIVITY 2.2 GEOCODING STREET ADDRESSES IN BATCHES (ARCGIS 10.5) In this lab activity, you will learn how to geocode street addresses in batches. This activity requires an ArcGIS organizational account. Unlike a public account, the organizational account comes with credits that can be used for online services such as geocoding. You can contact your GIS instructor or your organization’s administrator to request an organizational account if your institution has ArcGIS.

55

56

GIS: An Introduction to Mapping Technologies

• Open an empty ArcMap. • Add address data DePaulColleges.csv in the Lab02 folder (click the Add Data button). The CSV file will be added to the Table of Contents on the left. • Right-click on the DePaulColleges.csv in the Table of Contents > Open. Note that street addresses are populated in the fields named Address, City, State, and Zip code. Close the table (Figure 2.54). • Right-click on the DePaulColleges.csv in the Table of Contents > Geocode Addresses (Figure 2.55). • Select Esri World Geocoder for an Address Locator to use and click OK (Figure 2.56). • Enter the username and password for your organizational account if prompted (Figure 2.57). • Set appropriate address input fields. That is, the field named Address in the input address table will be set to Address in this case. Other fields such as City, State, and Zip code are set to City, Region, and Postal, respectively. Name an output shapefile. Click OK (Figure 2.58).

FIGURE 2.54 ArcMap—Check an input address table.

FIGURE 2.55 ArcMap—Geocode addresses in the table of contents.

Georeferencing

FIGURE 2.56 ArcMap—Choose an address locator to use.

FIGURE 2.57 ArcMap—ArcGIS online sign in.

• This will result in a 100% match. If there are unmatched cases, you can click Rematch button to deal with unmatched cases. Otherwise, you can simply click Close. Now geocoded data will be added to the map (Figure 2.59). • Add a basemap (File > Add Data > Add Basemap… from the menu). Choose Streets. • Review the ArcGIS Help article “About rematching a geocoded feature class” for further information.

57

58

GIS: An Introduction to Mapping Technologies

FIGURE 2.58 ArcMap—Geocode addresses dialog box.

FIGURE 2.59 ArcMap—A summary of address geocoding.

Georeferencing

59

References Bolstad, P. 2016. GIS Fundamentals: A First Text on Geographic Information Systems, 5th Edition. Acton, MA: XanEdu and White Bear Lake, MN: Eider Press. Jensen, J. and R. Jensen. 2012. Introductory Geographic Information Systems. New York, NY: Pearson. Kimerling, J. A., A. R. Buckley, P. C. Muehrcke, and J. O. Muehrcke. 2016. Map Use: Reading, Analysis, Interpretation, 8th Edition. Esri Press: Redlands, CA. Longley, P. A., M. F. Goodchild, D. J. Maguire, D. W. Rhind. 2015. Geographic Information Science and Systems, 4th Edition. John Wiley & Sons: Hoboken, NJ.

3 Getting the World Into Your GIS Do you know what you want to map but do not know how to make or get data that will work for the project? Where do you start? There is a world of data available for GIS projects through many possible sources, and in today’s age of global networking, it is often just a click away. Both public domain and commercial datasets provide options for mappers in generally usable formats, whether the application requires physical, human, vector, or raster data. Sometimes data that exists in tabular form with spatial fields can be converted quickly into mappable layers. You may also require remote sensing images or digital terrain data that can be used to visualize physical and human landscapes. Increasingly users collect their own vector data in the field using GNSS/GPS devices. Or they may rely on in situ measurements collected in near real time to populate their analytical models. Some users still rely on in-house or contracted digitizing services to convert scanned or hard copy materials into map layers. This chapter presents an organized guide to finding and getting, collecting, or converting and organizing different types of geospatial data for a GIS project.

3.1 Introduction: A World of Geospatial Data When mappers begin any project in geographic information systems they have a multitude of options for acquiring or creating data layers for their work. Ideally a mapper would be able to find datasets that integrate geospatially with all of the layers needed in the project. While it may be possible to find workable data that is available at either no or very low cost that can meet the project’s needs, it is unlikely. A more likely situation would be to find data layers, both raster and vector, which are a mix of formats from sources that cover a range of cost and compatibility characteristics. This can leave analysts with a dizzying mix of possibilities for acquiring suitable data for their project. Most projects begin with a search for geospatial data. Taking the time to establish some baseline categories and characteristics of the required mapping data at this point can help organize and simplify this search and make later stages of the project easier. There are several ways to categorize data sets based on their source, ownership, format, size, or organization that can help 61

62

GIS: An Introduction to Mapping Technologies

guide data acquisition. The following checklist contains some simple queries that could provide guidance during the initial stages of data acquisition: Source • Public sector or private sector? • Hard copy, digital offline, digital online, or cloud service? • Freely available to all or restricted? • Domestic (i.e., from the user’s country) or international? • Can be used/transferred internationally without restriction or restricted geographically? • Public domain (without restriction) or privately owned (with or without copyright protection)? • Can be used/republished (with or without citation)? • Can be published online (with or without citation)? • Free to access and use by all or restricted to a particular class of user? • License for use required (with temporal or spatial restriction)? • Can add value to and resell or share? Format • Proprietary or open source? • Single format (proprietary or open) or multiple formats available? • Historically fixed or may be updated? • Contains associated metadata in a recognized, readable format? • Metadata is external to actual data or is embedded in data files? • Includes detailed narrative description of data genealogy? • Media (e.g., paper, film, punched cards, magnetic, optical)? Size • Raw/lossless or compressed? • Required storage space needed if acquired (hard copy standard file space, digital—binary space in standard measurement units)? • When acquired will storage space be fixed or will it necessarily expand? • Maximum limit of space required through duration of project? • Temporal limits of stability/expected shelf-life of media used for archiving data? Organization • Systematic series that cover the ROI or covers only part of the ROI? • Consistent technical/precision/measurement genealogy or variable technical precision/measurement genealogy?

Getting the World Into Your GIS

63

• Coordinate system well-documented and accessible through metadata? • Data includes external topological/geospatial files? • Geospatial status (i.e., can the data be brought into a standard GIS without conversion or does it require geospatial preprocessing/­ georeferencing before use)? • Is attribute data/level of measurement/precision documented? • Does data consist of separate layers or is it stored and shared as a geodatabase? These questions can be used to guide any project, and their application to a given context can be ad hoc and selective or formalized with a written report that can be used to inform the acquisition, storage, sharing, maintenance, and disposition of data. As a GIS project unfolds it makes a lot of sense to revisit this process at cyclical and regular intervals. This may include checking particular categories of data to see if they have been updated or revised and also investigating new sources that may have become available or are expected in the near future. This may change the direction of your work, opening new avenues for analysis and the possibility of new geospatial information products as a result. You should always keep a “data diary” that chronicles the acquisition and disposition of data sets associated with your project as it progresses. Again, this can be more or less formal, depending on the needs and requirements of your work.

3.2 Data Acquisition: Getting Started After working through the preceding checklist you may still feel daunted by the number of choices to be made. This can be anxiety-inducing, particularly when one considers the cost (actual fiscal and more abstract intellectual expense) normally required to complete this seemingly mundane but necessary task. However, when we consider the answers to the questions in the above checklist we will have a better idea of what choices need to be made. For example, consider the question “Public sector or private sector?” Laws and regulations pertaining to geospatial data created in government agencies at any level and certainly that were created in private sector organizations are extraordinarily dense and complex. Since the authors of this text live and work in the United States, the examples provided are in that national context (although not always). In the United States, geospatial data is produced and owned by national/federal, state government, and local jurisdictions of various types. While in this country restrictions on government data sharing allow for relatively open access to geospatial data, this can vary within

64

GIS: An Introduction to Mapping Technologies

levels of government and organizations. In some instances, these regulations have become less restrictive and access to data has become much easier and economical; in other instances the opposite has occurred. Generally speaking geospatial data that is provided by government agencies in the United States is or can be much more economical than similar data that is either created by the user or acquired through private contract services, largely an effect of saved labor costs on the frontend of data creation or conversion. But this may not always be the case, as factors such as detail, veracity, accuracy, precision, and timeliness must be considered when making decisions about data. Nevertheless, it is best to conduct an exhaustive search of relevant public sector data archives at the appropriate level of government early in the data acquisition process. As you become familiar with the terrain of geospatial data in the United States (or in your own relevant national or international context) this component of project management will become hardwired into your individual or organizational human capital. If public sector data does not suffice for a project’s needs other data solutions have to be considered. This may mean either developing the capacity to create vector and raster data layers in-house or going shopping for geospatial data services in the open market. In a later section we will explore some different methods for creating vector and raster data layers. We do this with the goal of teaching analysts to effectively communicate with specialists in geospatial data within their own group, or to participate in writing specifications for outsourcing for geospatial data services outside of the organization. Some of the best sources for public domain geospatial data in the United States can be found online through gateway sites hosted by major federal, state, and local government agencies involved in mapping, social data collection, physical science research, and planning. A sampling of these are listed and briefly described as follows: https://www.geoplatform.gov/ The Federal Geographic Data Committee hosts the Geospatial Platform, a vast gateway to many geospatial data sets that have been produced by federal government agencies and are available in numerous commonly accepted formats. There are a number of very useful applications hosted through the Geoplatform that build on the hosted data sets to provide easy web access and discovery including Map Manager, Map Viewer, and Marketplace Preview. https://nationalmap.gov/ The national map, hosted by the US Geological Survey, is a treasure trove of geospatial data from multiple federal agencies. Virtually all of these data are available in the public domain at little or no direct cost to the user. These include current and historical topographic maps in a number of formats, the national boundary dataset, a number of different elevation data sets at various resolutions in raster and vector formats (DEM, contours), and elevation

Getting the World Into Your GIS

65

source data (DEM, lidar, and radar). There are also hydrographic data sets including watershed boundaries and vector stream lines, polygons, and many other data layers that are too numerous to mention here. https://nassgeodata.gmu.edu/CropScape/ The National Agricultural Statistics Service (NASS) is the data arm of the US Department of Agriculture. Their product, Cropscape, is very useful to natural resource planners, remote sensors, and any analyst concerned with agricultural data analytics. Through the NASS gateway analysts have access to very detailed geospatial agricultural crop data dating from 1997 through 2016. Crop data layers for each year at 30 m resolution are derived from supervised classification of Landsat imagery. These layers are field checked for accuracy and crops are mapped using strong and precise georeferencing. There are many possible ways to extract and download data sets in formats usable in most GIS, at county, state, and user-defined levels. https://gdg.sc.egov.usda.gov/ The Natural Resources Conservation Service is a branch of the US Department of Agriculture that focuses on providing assistance to farmers in both environmental conservation and agricultural operations. Through the geospatial data gateway they make available numerous different data resources covering physical and human concerns including direct links to agency internal offices such as the aerial photography field office as well as external entities from the department of the Interior and the Department of Commerce. http://clearinghouse.isgs.illinois.edu/ In the United States, most states have and maintain detailed and state-focused gateways to geospatial data collected through federal, state, and local agencies. A good example of this is the Illinois Geospatial Data Clearinghouse, a node of the National Geospatial Data Clearinghouse hosted by the Illinois State Geological Survey at the University of Illinois Urbana-Champaign. State agencies like this provide more focused data for regional users that includes resources from the federal government as well as specialized data sets only available at the state level. Examples of this are historical collections of aerial photography from the 1920s, 1930s, and 1940s covering much of the state of Illinois available as high-resolution digital images that can be easily accessed through the Clearinghouse. https://gis1.cookcountyil.gov/cookviewer/mapViewer.html Most of the 3,142 counties and many other units of local government use and maintain geospatial data at the center of their missions to provide services to their constituents. As technology has evolved and become more tightly integrated into information processing and data collection the logic inherent in spatial and geographic data has become an organizing principle for many if not most types of data used in managing government services and

66

GIS: An Introduction to Mapping Technologies

workflows. In Cook County Illinois, a large urban area where Chicago is located, GIS has become central to the structure and organization of public information in county government. In the United States, material (e.g., literature, images, maps, and digital data) produced by a government agency is generally considered to be in the public domain (i.e., it is available to the public for use without restriction or limitation and at minimal cost). These current rights are established and made legal by the Copyright Act of 1976 and were extended to digital material with the passage of the Digital Millennium Copyright Act in 1998. In this context databases such as those collected and compiled by government agencies (e.g., the Bureau of the Census) are in the public domain, and are generally available online. This generally also applies to databases and other forms of digital data collected by state and local governments. These policies of open access to government data have been a boon to open GIS, and the amount and quality of geodatabases and digital cartographic data available through this channel have created what David Rhind called “the elixir of life” for GIS and the location-based services industry. A related body of law in the United States, falling under the categories of Right to Know (RTK), freedom of information, and government in the sunshine, have affected open information policies at all levels of government across many different categories. These laws were generally enacted in the 1970s and 1980s and, while serving to expand and codify policies regarding public access to government information and data, have also complicated policies and procedures regarding public access to geographic information. While these types of policies are not uncommon in other nations, analysts should become familiar with their local information access policies. Digital geographic information and data can be a valuable resource. Since large amounts of this data is freely available in the public domain in the United States, private users can modify and reformat these data to make it more usable or compatible with particular systems, applications, or other data sets. There are generally very low or minimal requirements for additional creative work to be applied to public domain data before it can be protected by a registered copyright. In general, geographic databases, when they have been reformatted or have had other value added to them, are treated as new compilations and may be copyrighted by a nongovernmental or private user. For example, while an original database may be considered in the public domain a user may compile a selection or subset from the original data and arrange it in a particular way that makes it more usable for a special purpose. This subset may then be eligible for copyright while the original dataset maintains its public domain status. Users of digital cartographic data or geodatabases from any source should always make certain that they understand fully the copyright status and their legal rights including Fair Use limitations and exceptions regarding data.

Getting the World Into Your GIS

67

3.3 Private Sector Geospatial Services and Data Private sector geospatial services and data is a broad and rapidly expanding field that will only be briefly covered here. A discussion of these data can be divided into a few categories that may help users who are considering using commercial services for data discovery, acquisition, and deployment. These include services with existing data that are updated regularly and may be purchased as one-time “snapshots” of a particular area of interest, services with similar data that require a subscription to access data from network resources or that acquire and process data “on demand” in situ from various types of active and passive systems, or services that convert existing analog or digital data into digital layers that can be used in a particular system. Some suppliers of digital data may function in two or more of these geospatial consumer markets, and depending on your needs for particular ad hoc projects, or projected needs for geospatial data as operational or ongoing sources for recurrent analytical, inventory, or validation processes, you may find a single source or need to acquire geospatial data from multiple suppliers. Suppliers commonly offer different options for acquiring data that allow users to choose the one that works best for their projects. 3.3.1 “Snapshot” Services This type of service gives users the option of getting the geospatial data they need without entering into a long-term or difficult (from a budgetary standpoint) arrangement with a supplier. When using this approach, you should always make certain that specifications for the data sets you are acquiring are clearly and completely written and agreed upon by all parties. This should include additional considerations of metadata, geographic reference systems, accuracy and precision, liability, attribute data, and copyright/ reproduction permissions and restrictions. This type of arrangement is commonly used by individuals or organizations with little experience working with geospatial data processing and analysis who are hoping to learn more. 3.3.2 Subscription/“On-Demand” Services Geospatial data subscription services have many obvious advantages. Users have access to a defined set of data giving near immediate startup for projects, initiatives, and other client services. The staffing and infrastructure advantages of this approach can be substantial. Many suppliers also provide support services. Of course, the same additional considerations mentioned above apply here as well. When entering into a relationship with a subscription service it is important to investigate the company’s reputation and reliability to determine its ability to provide usable geospatial data in a timely and satisfactory manner.

68

GIS: An Introduction to Mapping Technologies

On-demand and subscription-based services are the fastest-growing segment of data providers in geospatial and location-based services. Products or services that fall in this category include the technological and labor process categories mentioned above, but more specifically, fresh, often real-time, digital geospatial data. The sectors that they operate within have come to be known as Software as a Service (SaaS) or Data as a Service (DaaS). In some instances, new technologies have allowed new geospatial data collection processes to be formalized and to become part of a new sector of location-based services. Global Positioning Systems (GPS) is a good example of this. GPS is not the only global navigation satellite system (GNSS). The Russian GPS, Globalnaya Navigazionnaya Sputnikovaya Sistema (GLONASS,) and the European Union system (Galileo), are other examples. GNSS has evolved into a vast constellation of hardware, software, and service providers that cross many disciplinary and professional boundaries. There has also been substantial growth in areas such as digital remote sensing that were formerly dominated by aerial photography service firms and others. There are a number of private firms operating in this ­sector, in the United States, Europe, and Asia, that have taken advantage of the medium and small-scale imagery offered by national space and earth science agencies to offer their users more current and higher-resolution ­satellite imagery. Another technology impacting the widespread adoption and acceptance of GIS and offering a low-cost platform for high-resolution and relatively inexpensive imagery acquisition is unmanned aerial vehicle (UAV) technology. Again, many of these technologies started in governmental defense agencies but are now used by different public and private sector organizations as well as by equipment manufacturers and software developers. UAV technology supports a growing list of active and passive data-collection devices such as high-resolution multi- and hyperspectral sensors, LiDAR, and imaging RADAR. But note that this has not occurred without significant unanswered questions regarding these technologies surveillant nature and potential for misuse. Clearly, we are in a period of regulatory response at multiple jurisdictional levels that is being sorted out by various government entities and the judicial system, even though there is most certainly room for many possible ethical uses of UAVs and the devices being deployed on them across many different disciplines and application areas. Yet another fast-growing area of interest for geospatial data users discussed in other parts of this text is the collection and dissemination of data extracted from social media streams such as Twitter, Facebook, Google, and Instagram. Of course, crowd-sourced data often comes with a number of questions, problems, and limitations in its use, in particular regarding the substitution of inherently biased social media data points for statistically derived random samples, as well as substantial and troubling privacy concerns. But the advantages include more timely, in fact, near real-time, responses to events, queries, surveys, or policies from many users. In

Getting the World Into Your GIS

69

addition, location or geographic reference data can be shared by users and used by companies to create and update the geospatial layers of their products. The voluntary nature of data sourced in this fashion has given rise to the moniker “voluntary geographic information,” even though many or perhaps most users are unaware that they are actively sharing their precise location in real time. A number of commercial enterprises have been launched with the aim of aggregating, classifying, and reselling this geospatial data from various social and quasi-social media, including data provided by smartcar and smart-city technologies. Hybrid social media platforms (e.g., Waze) explicitly rely on the geospatial nature of crowdsourced data and market this data to application developers using geospatiality as a principal selling point. Another example of this type of enterprise is the private firm INRIX (http://inrix.com/) founded in 2004, who collect and aggregate social media and other data, often with geospatial attributes, for a variety of purposes.

3.4 Conversion Services Many organizations in both public and private sectors have large amounts of legacy geospatial data in a number of formats ranging from paper and other analog media to digital formats that may no longer meet the needs of users. Since the large-scale initiation of GIS in the 1960s and 1970s this has been a major concern for organizations who are transitioning into this technology. While this seems to be a continuing issue, it has shifted from the early adopters of GIS to a much larger community of organizations and individuals. In some instances, this may involve relatively mundane processes of manual or automated digitizing from existing sources (e.g., converting map data from historical documents or publications). It may also include analytical components (e.g., using geospatial datasets that may be public or private, contemporary or historical, to derive new geospatial data such as classifications or indices). Also included in this category are firms or organizations that work in the more traditional areas of survey research and that now also collect and code geographic information as part of their standard practices. In general, this activity can take a number of forms that includes in-house work by staff to convert existing data into formats that are usable in GIS using existing technology and techniques, in-house conversion in preparation for transfer to outsourced services for additional conversion, and transfer of analog legacy media to outsourced services for conversion. This conversion may involve many different linked tasks and workflows, manual sorting and organization of textual or graphic material, parsing databases to expose and organize locational information, data entry, georeferencing scanned graphical imagery, heads-up digitizing, and raster-to-vector conversion of scanned analog maps. In the early stages of the geospatial services industry data

70

GIS: An Introduction to Mapping Technologies

conversion often consumed 70%–80% of project budgets, mainly due to labor costs. This has improved significantly, however, as large sets of usable public domain data are now available online, and the large volume of analog and legacy digital data has been reduced due to large public and private sector conversion projects. Of course, the availability of useful public data provided at low cost depends on a number of variables such as copyright law, public domain policies, privacy restrictions, and fiscal limitations. In the United States there is a strong culture of openness regarding cartographic and geographic data that dates back to the 19th and 20th centuries when many national mapping programs were funded. This culture has carried over to digital cartographic databases at nearly all jurisdictional levels, and the Internet has enabled easy and low-cost geospatial data, seeding the “g-conomy” with useful and inexpensive data feedstock.

3.5 GNSS: A Viable Approach to Data Collection Over the past few decades, GNSS have changed the landscape of geographic information systems and have become the vehicle for revolutionizing any activity involved with the collection, compilation, and analysis of geospatial data. There are currently several systems that comprise the GNSS constellation, and the number of navigation satellites and systems that are in development and planned is growing rapidly. GNSS were initially associated principally with the US global positioning system GPS, which was created by the US Department of Defense for military and security navigation and positioning during the 1970s and 1980s. As the technologies associated with these systems have become less expensive, more precise, and much more widely available in civilian applications there has been rapid growth of special-purpose applications and devices in the general area of location-based services. There are currently three systems classed as true global navigation systems: GPS, GLONASS, and Galileo. All GNSS operate in the same basic configuration, which consists of a space segment made up of constellations of medium Earth-orbiting space vehicles, a control segment that coordinates and serves as master of the space segment, and a user segment comprised of devices controlled through many different logical systems that are typically mobile and compact. Most people experience and utilize these systems to collect, record, and share geospatial data through the user segment. User devices range from expensive and exquisitely precise surveying instruments with capabilities of submillimeter precision to very inexpensive, although still precise, components combined with hybrid mobile platforms such as tablet computers, small handheld devices, wearable technology, and virtually all smartphones.

Getting the World Into Your GIS

71

These systems work through a process of trilateration, where the device calculates the time difference of radio signals sent from orbiting satellites to nanosecond precision. Each satellite also continuously broadcasts a NAV file, which includes information on all satellites in the system’s position and atmospheric corrections. This allows corrections to be applied to distance calculations for ionospheric and tropospheric delays, to relativistic errors as a result of satellite speed and differential gravity between satellite and user clocks, and to multipath errors resulting from buildings and other objects in the signal path. These calculations create a geometry of intersecting spheres and circles and produce precise positions of user devices. During normal operation user devices correct for several sources of error in timing calculations. When distances to three satellites are calculated a user’s position is recorded using the geometry of the three intersecting spheres, which produces two points. Normally the closest of these two points to the Earth model in use is selected as the correct location of the user device. When the device completes a fourth satellite distance calculation, an unambiguous location is reported. In 2003 the US Federal Aviation Administration (FAA) deployed an enhanced system intended to provide instrument flight control positioning information for civilian aviation across North America, the Caribbean, and Central America. This wide-area augmentation system (WAAS) uses a network of ground receiving stations that measure small variations in GPS positioning. This information is uplinked to geostationary satellites that broadcast a signal providing WAAS correction for GPS users. The use of a wide network of monitoring stations allows the signal to be corrected for local variations in GPS precision due to atmospheric and other factors. Most commercial civilian GPS receivers are capable of using the WAAS signal to enhance the precision of a reported location. In addition to the three main global navigation systems there are several other regional augmentation systems created by countries developing space programs. These programs aim to improve coverage provided by the three global systems for specific regions and include autonomous GNSS space vehicles to supplement the coverage provided by global systems. Nations with these programs include China (projected to be a global system by 2020), India, and Japan. 3.5.1 DIY Data Collection Using GNSS Over the past decade major developments in technology have allowed most people to carry extraordinarily smart microcomputers in their pockets. Most smartphones now have GNSS chipsets allowing location services to perform many spatially sensitive functions and spawning a new segment of the geospatial services industry focused on personal location-based services. While location-based services are most commonly used for day-to-day navigation, there are many other possibilities inherent in the powerful capabilities of GNSS. Low-cost applications are available for both macOS and Android

72

GIS: An Introduction to Mapping Technologies

platforms that perform useful diagnostic and production functions using harvested location data. Furthermore, recently developed mobile applications have significantly improved functionality, accuracy, and precision. GNSS chipsets can now access L1, L2, and L5 signals from navigation satellites. The L1 signal is the standard GNSS data stream that has been in use since the 1980s, and is used by virtually all GNSS devices to calculate positions through trilateration. The L2 and L5 signals are used by some of the newer GPS satellites (since at least 2010). These signals provide significant advantages to the previous standard, largely eliminating problems with multipath signals and improving corrections for ionospheric delays, and they hold the promise of approximately 30 cm precision without augmentation. Some smartphones using the new chipset are scheduled for release during 2018. When deployed on tablet computers and other small handheld and wearable devices these chipsets will result in extraordinary performance improvements. GIS users can already find many of these apps (e.g., on Google Play, Mac App Store, Amazon). For example, Survey 1–2–3 for ArcGIS and QField for QGIS (Quantum GIS) both allow the collection of vector data using GNSS functionality and have design and recording interfaces for field attribute collection. There is much more on this in Chapter 12.

3.6 Geodatabases in ArcGIS Desktop With the purchase of Salamanca Software in the mid-1990s the Environmental Systems Research Institute (ESRI) obtained the Spatial Database Engine (SDE), a relational database management system (RDBMS) designed to work with spatial data types. Rebranded as ArcSDE, the geodatabase has been in use in ArcGIS since the release of ArcGIS Desktop 8.0 in 1999, pushing powerful database capabilities into the large and expanding installed base of pc-GIS users. With the creation of the geodatabase model, ESRI data architecture began to shift beyond the pc-based shapefile and ArcInfo coverage to a relational model that can better handle various types of spatial data (e.g., vector feature classes, raster and image data, attribute tables) and integrate more readily with mainstream enterprise DBMS such as Oracle, Microsoft SQL Server, and others. Today there are three geodatabase types established by ESRI: Personal Geodatabases: The personal geodatabase was released with ArcGIS in the late 1990s as an option for data management at the individual user level. The geodatabase file uses a Microsoft Access file structure and has a relatively small effective size limit of around 500 MB before system performance begins to degrade. It is only available on Windows OS. While ESRI still supports Access files they recommend the use of the geodatabase file to improve performance and scalability.

Getting the World Into Your GIS

73

File Geodatabases: The file geodatabase was released with ArcGIS Desktop a few years after the release of personal databases. The file geodatabase was a major improvement over the personal geodatabase in a number of ways. It is capable of scaling to a very large size, fundamentally limited only by the storage capacity of a user’s system, and is capable of being used on multiple OS platforms. There are also numerous performance and usability improvements when using file geodatabases. Both personal and file geodatabases are easily portable and can be shared between multiple users, but personal geodatabases are not capable of being edited by more than one user at a time and file geodatabases only support limited simultaneous access and editing. ArcSDE Geodatabases: ArcSDE Enterprise geodatabases are intended for large organizations with many users who need to access data simultaneously and to edit and query geometries and attribute data as well. This model allows for virtually limitless storage of geospatial data and direct and continuous access. The ArcSDE RDBMS normally comes with the ArcGIS Enterprise Server license. Both personal and file geodatabases can be easily created using ArcCatalog. Simply navigate to the desired folder, right-click, and select New > File Geodatabase or New > Personal Geodatabase (Figure 3.1). When the new geodatabase has been created it can be populated with either raster or vector layers from the table of contents by right-clicking “layer name” > Data > Export data... (Figure 3.2). You can then select or enter Input Features, Output Location, and Output Feature Class (Figure 3.3).

FIGURE 3.1 Creating a new file geodatabase; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

74

GIS: An Introduction to Mapping Technologies

FIGURE 3.2 Exporting a shapefile; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 3.3 Exporting a shapefile to a file geodatabase; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

References Kerski, J. and Clark, J. 2012. The GIS Guide to Public Domain Data. ESRI Press: Redlands, CA. Masser, I. 2005. GIS Worlds: Creating Spatial Data Infrastructures. ESRI Press: Redlands, CA.

Getting the World Into Your GIS

75

Miller, H. J. and Goodchild, M. F. 2015. Data-driven geography. GeoJournal 80: 4. Monmonier, M. 1985. Technological Transition in Cartography. University of Wisconsin Press: Madison, WI. Rhind, D. 1992. War and peace: GIS data as a commodity. GIS World 5 (11): 37–39. Steiniger, S. and G. Hay. 2009. Free and open source geographic information tools for landscape ecology. Ecological Informatics 4 (4): 183–195. Selected Online Resources for Geospatial Data. 2018. ArcGIS Online. http://www. arcgis.com. Accessed April 08, 2018. CropScape – U.S. Department of Agriculture. 2018. https://nassgeodata.gmu.edu/ CropScape/. Accessed April 08, 2018. GeoPlatform. 2018. https://www.geoplatform.gov/. Accessed April 08, 2018. GIS Data Sources (Wikipedia). 2018. https://en.wikipedia.org/wiki/List_of_GIS_ data_sources. Accessed April 08, 2018. Inspire Geoportal. 2018. http://inspire-geoportal.ec.europa.eu/. Accessed April 08, 2018. National Centers for Environmental Information (NOAA). 2018. https://www.ngdc. noaa.gov/. Accessed April 08, 2018. National Map (USA). 2018. https://nationalmap.gov/. Accessed April 08, 2018. Natural Earth. 2018. http://www.naturalearthdata.com/. Accessed April 08, 2018. U.S. Census Bureau. 2018. https://www.census.gov/geography.html. Accessed April 08, 2018. United Nations Environmental Program (UNEP). 2018. Environmental Data Explorer. http://geodata.grid.unep.ch/. Accessed April 08, 2018.

4 Geospatial Analysis: Introduction

4.1 Introduction to Geospatial Analysis In Chapter 1 we explained analysis as that which an analyst does. Our goal here is to set out some working categories for the types of analyses that can be accomplished using GIS, and relate these categories to the types of questions that can be answered using geospatial data and techniques. The second goal is to offer examples of information products that can be created as either substantive components of reports or as supplemental illustrative or supportive evidence that can be used in explanatory or argumentative text. Some of the terms used in spatial analysis carry implicit meanings in general usage but become more specific and enriched in the context of a geospatial analyst’s work. For example, a term such as distance can take on multiple meanings in geospatial analysis when using different metrics and modifiers such as economic cost, resistance, and mode of conveyance and other push/ pull, friction, or temporality factors. In an urban transportation network, “as the crow flies” (also called Euclidean) distance can be alternatively specified as “as the crow walks, drives, bikes, or uses public transit.” Indeed, some of the terms used to refer to spatial characteristics and qualities become flexible and adaptable terms when used in specific analytical contexts. Furthermore, some of the most commonly used spatial terms (e.g., location, situation, proximity, or direction) can serve as categorical headings, each with a subset of context-sensitive dependents. Here are some examples: Category (dependents) Location (healthcare access, food access, employment opportunities, first-responder coverage, etc.) Situation (presence of sanitary water supplies, presence of medicine, presence of fear or anxiety, etc.) Direction (prevailing wind, passive solar alignment, direction of egress, etc.) Proximity (time to work, nearness to environmental or technological hazards, nearness to parks, etc.) 77

78

GIS: An Introduction to Mapping Technologies

These dependents can serve as prompts for structuring analytical research into the particular phenomena represented, and can produce new and useful information. As is normal at the start of a project, geospatial and tabular data is collected (see Chapter 3). For example, if the aim of the project was to investigate the status of children in Cook County, Illinois you would take a number of specific steps. After data is assembled and organized from public and private sources analytical tasks can begin. In geospatial analysis, the results could then be used as part of status reports, policy recommendations, or other documents that will include maps, charts, graphs, etc., categorized in a number of ways. The following gives an overview of some of the spatial analyses used in the chapter of this text using ArcGIS Desktop 10.5.

4.2 Selection and Subsetting Big data… it’s everywhere. In the last decade it has sprung into public consciousness, first as a techno—meme/dream, and then quickly becoming one of several engines (along with geospatial analysis) driving economies, markets, and analytics touching virtually every sector. But geospatial data has been “big” for a while—in fact, enormous. Since the 1980s with the creation of topologically integrated geographic encoding and referencing (TIGER) files resulting from a partnership between the US Census Bureau and the United States Geological Survey, GIS analysts have had direct access to more than 156 million addresses, ready to be geocoded and mapped. There are many other examples of geospatial big data. For instance, the Landsat program, which has imaged the entire Earth a couple times each month since 1972. The current satellite, Landsat 8, using the Operational Land Imager (OLI) instrument, has the capability to image each of approximately 165 billion land pixels (30 m × 30 m) every 16 days, collecting 108 bits of visible and reflected infrared spectral information for each land pixel.* We have been swimming in big geospatial data for some time. With the volume of valuable information buried within vast datasets, the need for tools to select and create subsets extracted from big data archives is clear. Census data in the public domain is one such example of big data that can be easily accessed. TIGER/Line files with selected demographic data are available in various geographies and file formats (e.g., Shapefiles, Geodatabases) and can be found at https://www.census.gov/geo/maps-data/data/tiger-data.html. * These data are archived and available online within hours of their collection. Image frames approximately 185 km on a side are precisely georeferenced and available as GeoTiff raster files. The entire archive comprising over 45 years of near continuous data is available at no cost at https://earthexplorer.usgs.gov/.

Geospatial Analysis

79

These files are easily accessible in ArcGIS and include the most current full population census (2010) and updated estimates to within two years of the current date. For example, the demographic profile 1 (DP1) files contain over 180 detailed attributes for each geometry in the feature class. Cook County Illinois is comprised of 1318 census tracts that completely enclose the city of Chicago and numerous other suburban communities. Figure 4.1 shows a choropleth map of single female households with children in Cook County, Illinois from the 2010 census. In the following example the “Select by Attributes…” and “Select by Location…” tools are used to create subsets of census tracts that meet specific criteria. Both of these tools are found in the Selection menu. In GIS generally, and more specifically in ArcGIS, vector data sets with rich attribute data associated with map features can be queried using easily constructed structured query language (SQL) statements that build on the logical structure and content of specific attributes. Queries can be simple or complex depending on the selective logic of statements the analyst constructs. Logical queries that use particular spatial logics to select features from a target layer using another source layer as a spatial criterion for the selection can also be constructed. First, let us look at how the attribute query (Select by Attributes…) works. A simple attribute query of the Cook County census tracts is shown in Figure 4.2. This query selects all census tracts with single female households with children under 18, which comprise over 25% of all households. Only 37 census tracts are selected (the results are shown in Figure 4.3 and highlighted in cyan).

FIGURE 4.1 Layout view, choropleth map; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

80

GIS: An Introduction to Mapping Technologies

FIGURE 4.2 Select by Attributes dialog; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Next, we demonstrate a relatively simple spatial query (Select by Location…). We added a feature class that shows facilities in the Toxic Release Inventory (TRI) maintained by the Environmental Protection Agency (EPA) for 2010, shown in Figure 4.4. We leave the census tracts with greater than 25% single female households with children under 18 selected to use as a source to target the TRI layer. We select TRI sites that are located within 5 km of any census tract selected in our first query. This spatial selection query is shown in Figure 4.5. This query selected 33 facilities; the largest 5 releasers were each responsible for over 100,000 pounds of toxics in the air, soil, or water in 2010. Altogether the 33 facilities released over 1.3 million pounds of toxic substances into the environment in 2010. These selected facilities are shown in Figure 4.6 and highlighted in cyan. Selected subsets from population datasets that meet particular criteria can be easily extracted for further processing or analysis. The activity at the end of this chapter further illustrates these techniques.

Geospatial Analysis

81

FIGURE 4.3 Layout view, choropleth map with census tracts >25% single female households with children selected; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 4.4 Layout view, choropleth map with TRI facilities, 2010; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

82

GIS: An Introduction to Mapping Technologies

FIGURE 4.5 Select by Location dialog; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 4.6 Layout view, choropleth map with TRI facilities within 5 km of census tracts >25% single female households with children; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Geospatial Analysis

83

4.3 Classification Historians of cartography have traced in great detail the rise of thematic cartography and statistical mapping from the Enlightenment period and accelerating through the 19th century. Technical advances during the 20th century in lithography and graphic design allowed cartographers to expand the horizons of statistical and thematic cartography. With the development of computer-aided cartography, automated cartography, and eventually GIS much of what had become during the 20th century a rich field designated as thematic cartography became a subset of the techniques and technologies of automated mapping, now largely subtended within GIS. Cartography and map design, long recognized as an art and science, has shifted over the past few decades from an activity dominated by human craft, manual skill, and photolithography to an automated process controlled and executed by programmed logic-driven devices under light human direction. In thematic cartography, the need to generalize large datasets for graphical presentation has been addressed by classification. The goal of classification is to reduce the array of symbols found in thematic maps (graphical patterns, colors, etc.) to a manageable set of classes to allow easy interpretation of statistical thematic maps. Classes may be applied to all types of vector graphics (points, lines, and polygons) as well as raster datasets. In Chapter 5 the various components of classification as an analytical technique are discussed and explained including data normalization, which allows unbiased comparison of feature values, and the various classification schemes used to produce particular views of attribute data in thematic maps. In some ways data classification can be thought of as an exploratory analytical technique that makes visible the nuances of spatial distribution for ­particular attributes of vector and raster layers.

4.4 Exploring Proximity, Buffering, and Spatial Relationships The notion of proximity is closely related to one of the fundamental principles of geographic thought: Tobler’s Law, which simply means that “everything is related to everything else, but near things are more related than distant things.” The logical grounding for this arises from the added cost of travel to interact with more distant things than those that are closer. When considered in specific contexts, logical foundation of Tobler’s Law becomes more complicated as the friction of distance cannot always be considered as a completely fungible cost from one place to another.

84

GIS: An Introduction to Mapping Technologies

The process of buffering and other related location sensitive procedures are the main ways that distance, and more specifically proximity (relational distance), become analytically operational. Buffering normally refers to ­vector-to-vector operations, in which vector source features (points, lines, and polygons) produce new geometries, normally polygons, that define zones around source geometries. These new buffer polygons can then be used as tools in subsequent analytical operations with their proximal attributes defining their locational status in reference to the source features used to establish their existence. An example of this would be using a buffer polygon defined by a sensitive and legally specified distance around a highway where hazardous materials are frequently transported. This polygon could be used as a cookie cutter (as a former baker one of my favorite tools) to select features from another layer such as childcare facilities, assisted living facilities, or even census subdivisions such as tracts or block groups. These facilities or particular subsets of the population derived from the census could then be set aside for special consideration or treatment, and later used in planning response to hazard events, such as accidents or spills, for example. Another way proximity may be treated is in the calculation of raster distance or density surfaces. These are continuous raster layers that can be calculated to very high precision or resolution. Distance surfaces are normally calculated for each grid cell using the Euclidean distance to the nearest source feature. Density surfaces are calculated by searching a specified threshold distance from each grid cell and calculating the areal density of all source features found within the threshold distance. Using either approach the calculated value in each grid cell can be considered as a measure of the expected interaction between the source features and features in any other layers that intersect or overlay that cell. Using the zonal tools (in particular, zonal statistics or zonal statistics in a table) in ArcGIS you can then calculate or summarize values from the raster surface for features in another proximate layer. When these values are reported as a table they can be joined to the attribute table of that layer, adding new attributes that can be used as a measure of the expected relationship with the source layer for the distance or density surface. There are many possible instances where raster distance or density layers might be used, for instance when investigating the relationship between two point features. For example, if a distance surface is calculated using a point layer showing the locations of hazardous facilities such as point sources of chemical releases, this raster layer can be summarized and joined to another point layer showing the locations of sensitive facilities such as schools, hospitals, or assisted-living facilities. Similarly the distance surface can be used to summarize the zones of polygon layers such as census tracts, and once joined to these layers, can add new attributes such as distance to or density of hazardous facilities that could be used as a measure of environmental spatial justice.

Geospatial Analysis

85

4.5 Interpolation In mapping, interpolation is the process of estimating or calculating estimated values for locations where no known values have been measured, using known values from locations where measurements or counts have been taken and recorded. It is normally performed using values that have been measured at a point and recorded as an attribute of that feature, or aggregated from a polygon (e.g., population measures assigned to a census subdivision), and then assigned to a point within the polygon. In interpolation, a raster grid is specified that covers the area of interest, or a sizable area where randomly spaced measurements or even nonrandomly located measurements have been made (e.g., the locations of weather stations making meteorological observations). There are several algorithms commonly used to calculate estimated values for each of the grid cells in the raster model, referred to as a z-surface. Figure 4.7 illustrates the most common algorithm, called Inverse Distance Weighted (IDW), which is used to interpolate values for z-surfaces. Grid cell z is calculated from the values of six measured points weighted by the inverse of the distance of each measured point from the center point of the cell (z). To calculate an estimated value for an unknown grid cell (z) measured points are weighted by the inverse of the distance from the unknown cell and then summed. This value is then divided by the sum of each measured point’s weight and the result is the estimated value of grid cell z. Optionally, an

FIGURE 4.7 Algorithm used to calculate estimated IDW interpolation for single cell (Original graphic by author, algorithm from (Bolstad 2016)).

86

GIS: An Introduction to Mapping Technologies

exponent may be applied to distance as another means of controlling the effect of varying distances in grid cell calculations. Exponents less than 1 tip the balance toward more distant known points, while those greater than 1 emphasize the values of more proximal points in the calculation of an unknown grid cell. Other options for interpolation in ArcGIS Desktop include Kriging (a technique that uses a semivariogram); spline with barriers options; topographic interpolations using elevation values; and trend surface analysis using linear or logistic regression. Chapter 5 discusses IDW interpolation.

4.6 Networks The analysis of networks has long been one of the most challenging problems faced in work with transportation, utilities, and drainage networks. The requirements of any system that hopes to realistically capture the complexity and functional structure of networks demands that it accounts for the many different types of movement through segments in the mesh, and reflects the specific characteristics of different types of connected linear geometries. For example, most of the characteristics of surface vehicle transportation networks such as direction of travel, capacity, speed parameters, traffic control structures, and the traffic itself can be associated with linear segments of networks through the attributes of those segments. Networks are built up through linear features that are connected into systems that allow flow through the connected linear features to occur. Examples of networks occur in natural (e.g., circulatory systems, drainage networks) and human (e.g., transportation systems, utility systems) entities. The connectivity of networks is also necessary and building and maintaining this characteristic of topologically robust transportation systems is part of the important preprocessing of street and road data layers. In ArcGIS most work with network data layers is accomplished using the Network Analyst toolbox (Figure 4.8). This toolbox includes tools and modules that can perform numerous analytical tasks such as routing functions, origin destination analysis, service area creation, and numerous other utilities involved in maintaining different types of network data layers. This tool also includes multiple maintenance routines that allow near real-time repair and restructuring of network layers. This book includes several activities covering network analysis tools. Chapter 2 discusses transportation networks in the context of address georeferencing and the locations provided by different georeferencing services. Chapter 10 includes a Health GIS case study involving the Network Analyst toolbox. These activities will enhance your ability to use these tools in different and diverse contexts.

Geospatial Analysis

FIGURE 4.8 Network Analyst toolsets and tools (from ArcGIS Desktop, permission pending).

ACTIVITY 4.1 QUERIES AND SELECTION IN ARCGIS DESKTOP This activity covers selecting sets of records/features from a vector ­feature class to solve analytical problems using a geodatabase as the source that contains vector and raster layers. We use a geodatabase file that contains three vector feature classes (layers) to spatially analyze the different characteristics of the TRI facilities reported for the state of Illinois during calendar year 2015. There is also a raster digital elevation model (DEM) of the state of Illinois that will allow you to more readily visualize the drainage network in the state (IL_Dem_30m). It is found in the folder IL_Dem_30m elev. There is also a text file in this folder (IL_Dem_30m.txt) that contains detailed metadata for the DEM. The compressed folder can be found in this location:

87

88

GIS: An Introduction to Mapping Technologies

[data folder included with submission Lab04] The zip archive contains the geodatabase and the enclosed folder with the DEM. Extract the contents of the folder and open the three layers using ArcMap. When you navigate to the folder with a geodatabase in ArcGIS its icon will resemble a small cylinder. We will use two modules in this activity to query these layers: Select by Attributes and Select by Location. You can learn more about these in the following Help files: • • • • •

Building a query expression Using Select By Attributes Using Select By Location Working with selected features SQL reference for query expressions used in ArcGIS

Read these Help topics to learn how to build SQL queries in ArcGIS. There are also a couple YouTube videos that will help you: • Select by Attributes (https://youtu.be/J5BC_P6mlAw) • Select by Location (https://youtu.be/9t06AE6vSzs) Answer the following questions. Each query should produce a single set of facilities extracted from the target layer. In each response you need to do two things: A. State how you answered the question, with a clear transcription of the entire logical query(s) you used to produce the final result. If you sequenced the queries, clearly state each as a complete and logical statement. B. The number of features that meet the stated criteria in the ­question—how many features are in the final result? 1. How many TRI facilities (total) are there in counties that border Wisconsin? 2. How many TRI facilities in Illinois released more than 1,000,000 pounds of toxic chemicals onsite in 2015? 3. How many TRI facilities in Illinois are there within 10 km of Cook County (not including any facilities in Cook County)? 4. How many TRI facilities are there in Cook County that are located a distance greater than 1 km from a stream? 5. How many TRI facilities are there in Illinois located within 50 m or greater than 1000 m from a stream?

References Bolstad, P. 2016. GIS Fundamentals: A First Text on Geographic Information Systems, 5th Edition. XanEdu: Ann Arbor, MI. Brimicombe, A. 2010. GIS, Environmental Modeling and Engineering, 2nd Edition. CRC Press: Boca Raton, FL.

Geospatial Analysis

89

De Smith, M., M. Goodchild, and P. Longley. 2015. Geospatial Analysis: A Comprehensive Guide to Principles, Techniques and Software Tools, 5th Edition. Winchelsea Press: Winchelsea, UK. Foresman, T. ed. 1998. The History of Geographic Information Systems: Perspectives from the Pioneers. Prentice Hall: Upper Saddle River, NJ. Goodchild, M., K. Kemp eds. 1990. NCGIA Core Curriculum in GIS. National Center for Geographic Information and Analysis: UC Santa Barbara, CA. Longley, P., M. Goodchild, D. Maguire, D. Rhind. 2010. Geographic Information Systems and Science, 3rd Edition, John Wiley & Sons: Chichester, UK. Slocum, T., R. McMaster, F. Kessler, H. Howard. 2008. Thematic Cartography and Geographic Visualization, 3rd Edition, Prentice-Hall. Tobler, W. 1970. A computer movie simulating urban growth in the Detroit region. Economic Geography, 46 (Supplement): 234–240.

5 Thematic Mapping

5.1 Fundamental Elements of Maps Do you use maps often? Are there any characteristics that different maps have in common? Most maps represent features of the Earth (such as roads, buildings, and lakes) from a viewpoint directly above those features. That is, planimetric (or 2D) representations of geographic features are common in maps, although the Earth is three dimensional. To represent features in spatially constrained and flat media such as a sheet of paper or computer screen, cartographers have to select, simplify, project, and symbolize the features of interest at a reduced scale. Therefore, cartographic maps (except for physical 3D models) have the following common denominators: • map scale: how much a map reduces features on the ground • map projection: how features on the curved Earth are flattened • cartographic abstraction: how features are cartographically represented using visual variables such as color, shape, size, shading, etc. In other words, a map is a reduced, flattened, and abstracted version of selected features on the Earth’s surface. Suppose you are given the topographic map shown in Figure 5.1 and are asked to determine the 2D distance between Belknap Crater and Black Crater on the ground, marked with a purple bidirectional arrow in the middle of the map. To do this, you would first measure the distance between the two craters using a ruler. If the distance is 1 in., you could then mark the distance (1 in.) in the scale bar at the bottom of the map, to find out how this map distance corresponds to the distance on the ground. The scale bar shows that the distance between the two craters is 5 miles. This means that 1 in. on the map represents 5 miles (=316,800 in.) on the ground. In other words, 316,800 units on the ground are reduced to 1 unit on the map. Therefore, the map scale can be expressed as 1:316,800, where the number on the left (1) represents a unit on the map and the number on the right (316,800) represents a unit on the ground. Although the scale ratio is 1:250,000 in this map, the

91

92

GIS: An Introduction to Mapping Technologies

actual map scale of this map is 1:316,800 because the map is digitally reduced to the image shown in Figure 5.1. This means that scale ratio would not be valid if the map image was resized (enlarged or reduced). It is always a good practice to go by the scale bar because the scale bar (unlike scale ratio) is resized in proportion to the map image when maps are resized.

FIGURE 5.1 Topographic map used as a reference map showing the location of natural and cultural features (from US Geological Survey).

Thematic Mapping

93

Compare 1:1,000,000 to 1:24,000. Which do you think is smaller? The former (1/1,000,000) is smaller than the latter (1/24,000). Therefore, it is correct to say that the former is a small-scale map while the latter is a large-scale map. In cartographic terms, a small-scale map shows a large area with less detail (like US maps), and a large-scale map shows a small area with more detail (like neighborhood maps). This distinction is often used when classifying topographic map series. Some geospatial data is derived from topographic maps and thus the scale of source topographic maps is an important factor influencing the quality of that geospatial data. All else being equal, the larger the map scale the more precise and accurate the geospatial data derived from the map will be. You can use topographic maps to determine location. In the lower-left corner of the map shown in Figure 5.1, you might read 44° and 122° latitude and longitude of that corner. The topographic map also shows location in Universal Transverse Mercator (UTM) coordinates. For example, the location of the red symbol with the number 1 is read 10N 590,000 Easting (x) 4,880,000 Northing (y) in UTM coordinates. You can infer a UTM zone number based on longitude since the UTM zone number increases every 6 degrees longitude from the international dateline. Mapmakers rely on cartographic conventions to reduce ambiguity. For instance, in Figure 5.1 the names of natural features (like lakes and mountains) are displayed in italics, and place names are displayed in upright text. Primary highways (e.g., route # 20) and secondary highways (e.g., route # 242) are displayed using standardized symbols. Some maps are easier to read and more informative than other maps even when the same data is mapped. Cartographers apply cartographic principles to design and create effective maps. Unlike nonvisual media such as letters, many factors that dictate the appearance of a map—such as the colors and symbols used and how the data is classified—can significantly affect map reading. We will address some of the principles of cartographic design in later sections, but first we will explore the types of maps available for use.

5.2 Gallery of Maps There are two types of cartographic maps: reference maps and thematic maps. A reference map shows as many features as possible as long as the map size allows. A topographic map (Figure 5.1) showing both natural and cultural features belongs to a reference map. In contrast, a thematic map picks a particular theme or variable (e.g., primary language, population density, GDP by country) and depicts the spatial distribution of the variable. A population density map (Figure 5.2) is an example of a thematic map. A reference map works like an encyclopedia, and a thematic map is more like an

94

GIS: An Introduction to Mapping Technologies

FIGURE 5.2 Choropleth map as a thematic map (from US Census Bureau).

essay due to its emphasis on a particular topic. In the following we focus on thematic maps because they can be easily created based on field values from the attribute table in GIS. In Figure 5.2, people per square mile is depicted as graduated shades at the level of counties (areal units), where the darker shades indicate higher population density. This type of map is called a choropleth map. “Choro” means area (region) and “pleth” means value. Hence, a choropleth map depicts “values assigned or constrained to areas” (e.g., population density by county). A choropleth map is also referred to as a graduated color map in GIS software. The population density map uses insets to show Alaska, Hawaii, and Puerto Rico. These three insets have different map projections and map scales as indicated by the different scale bars in each of those insets. The legend on the right shows that the population density values are grouped into seven classes. This method of dividing quantitative data into classes is referred to as data classification in GIS, and will be discussed in a later section. There are two types of thematic maps: qualitative maps and quantitative maps. The former maps categorical data (such as land cover and primary language), and the latter maps quantitative data (such as population density and temperature). Some maps only show the locations of features. These

Thematic Mapping

95

types of maps are called point maps. Figure 5.3 shows an example of a point map that depicts the location of UN cultural heritage sites in South America. The qualitative map shown in Figure 5.4 represents language families (categorical data) as color. Maps depicting categorical data for boundaries that exhaustively cover an entire area of interest are called categorical maps. Example categorical maps include maps showing geology type, ethnicity, and climate regions where boundaries are delineated based on the homogeneity of attribute values within each boundary. In contrast, in choropleth maps attribute values are assigned to pre-existing areal units. This type of map is also referred to as a unique value map. There are several other commonly used quantitative thematic mapping techniques. Figure 5.5 shows AIDS cases in July of 1989 and December of 1995. In this map, one dot represents 30 AIDS cases as shown at the bottom of the map. In Montana, AIDS cases increased from zero to 120 cases from 1989 to 1995 as shown by the four dots in Montana in the 1995 map. A large number of dots indicates high density. The type of map that scales the number of dots in proportion to numerical data is called a dot density map. Both dot density maps and choropleth maps depict data that are aggregated at the level of areal tabulation units (such as counties and states). In GIS software, dots are placed randomly within an areal unit because it is not

FIGURE 5.3 Point map showing the location of geographic features.

96

GIS: An Introduction to Mapping Technologies

FIGURE 5.4 Categorical map showing language families. (From Eric Gaba, Wikimedia Commons User: Sting, retrieved from https://commons.wikimedia.org/wiki/File:Map_of_African_language_ families.svg).

FIGURE 5.5 Dot density map depicting AIDS cases (from US Centers for Disease Control).

known exactly where individual AIDS cases occurred within the areal unit. Thus, the placement of dots should not be considered as the actual location of events. The smaller the areal unit, the more accurately dots will be placed. Figure 5.6 shows where green building projects were implemented in the Washington, DC region. In this map, the size of a circle is scaled in proportion to the square footage of the green building project. This type of map is called a proportional (graduated) symbol map. Similar to a choropleth map, in a proportional symbol map the data (in this case the square footage of the green building projects) can be grouped into classes (there are four classes in

Thematic Mapping

97

FIGURE 5.6 Proportional symbol map portraying Leadership in Energy & Environmental Design (LEED) buildings in the Washington metropolitan area (from US Green Building Council).

Figure 5.6). A proportional symbol map is good for visualizing quantitative data at point locations. A proportional symbol map can also be used to map quantitative data at areal units. Figure 5.7 shows the number of microbreweries and brewpubs by state. If a proportional symbol map is used to map areal data, the exact location of breweries is not of interest because the aim

98

GIS: An Introduction to Mapping Technologies

FIGURE 5.7 Proportional symbol map that shows data tabulated at the level of areal units (states).

is to show numerical data representatively by areal units (states). This type of proportional symbol map is often used to depict non-normalized data (such as the number of microbreweries) to complement a choropleth map that is good for depicting normalized data (such as microbreweries per 1,000 population). Another widely used quantitative thematic map is an isarithmic map. Figure 5.8 shows an isarithmic map that depicts temperature using a spectral (rainbow) color scheme. An isarithmic map is the most appropriate way to visualize continuous geographic phenomena with smooth variation, such as temperature, elevation, and carbon dioxide. To create an isarithmic map using GIS software, you need to use a spatial interpolation tool that creates a continuous surface from sample point data. There are two types of isarithmic maps: isometric maps and isopleth maps. An isometric map depicts data collected at physical point locations (e.g., temperature at weather monitoring stations). An isopleth map, on the other hand, depicts data collected at perceptual or representative locations (e.g., poverty rate by census tracts). Figure 5.9 shows an isopleth map with the median home value to complement the choropleth map. Isarithmic maps should only be used when similar values cluster together (i.e., positive spatial autocorrelation). The world air travel map in Figure 5.10 depicts the number of international flights per year along the major air routes. Data is represented by the thickness of the two directional arrows. The thicker the arrow the more international flights. This type of map depicting the amount of flow between origin and destination is referred to as a flow map. This map also depicts two more variables: traffic in passenger miles at the origin airport by choropleth

Thematic Mapping

99

FIGURE 5.8 Isarithmic map showing temperature in the contiguous United States.

FIGURE 5.9 Isopleth map depicting median home price by census tracts in Chicago (2010).

mapping, and the number of passengers at major airports by proportional symbol mapping. In contrast to a proportional symbol and choropleth map that shows spatial distribution at a particular location, a flow map depicts the amount of spatial interaction between places. It is interesting to note that Fuller map projection is employed to map air traffic that is centered around the north pole. Figure 5.11 shows how population density by county changed over time in the western United States. A map that represents numeric data using perspective height is called a prism map. Note that the same data classification

100

GIS: An Introduction to Mapping Technologies

FIGURE 5.10 Flow map showing air traffic in the world.

FIGURE 5.11 Prism map showing population density in the western United States from 1850 to 1900.

Thematic Mapping

101

is used to facilitate comparison across different years. To make a prism map, the GIS software should have 3D modeling and visualization capabilities. ArcGIS, for example, provides this functionality through ArcScene. We have examined many thematic maps that represent data aggregated at areal units. A choropleth map, dot density map, proportional symbol map, and isopleth map all represent such data. While these maps use different visual variables (e.g., color shading, the number of dots, size of symbol), the geometric properties (e.g., the size of areal unit) do not change. However, some thematic maps do modify geometric properties in proportion to the data being used. This type of thematic map is called a cartogram. The cartogram shown in Figure 5.12 scales the size of the areal units (countries) in proportion to the per capita consumption (translated as land area using the ecological footprint framework) by country. Although the distortion of geometric properties (e.g., sizes and shapes of countries) can make it challenging to read this type of map, global inequality of consumption is effectively conveyed due to the unfamiliar appearance of the map. Thus far we have looked at thematic maps depicting one variable at a time (univariate maps). But sometimes it is useful to map charts comprised of multiple variables at a particular location. For example, the map in Figure 5.13 depicts age structure by tracts, where the height of each bar represents the population counts by different age groups. This map shows that the elderly population is more concentrated near Lake Michigan compared to other age groups in Uptown, Chicago. A bar chart map is good for visualizing related variables that are ordered (e.g., income group, population by year). In contrast, a pie chart map can most effectively show the relative share of related categorical variables that make up the whole. In Figure 5.14 a pie chart is created in areal units (tracts) to represent ethnic composition. It is important to note that

FIGURE 5.12 Cartogram that portrays consumption level by country in the world (2003).

102

GIS: An Introduction to Mapping Technologies

FIGURE 5.13 Bar chart map showing age composition by tracts in Uptown, Chicago; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

categories that make up slices of a pie chart are mutually exclusive. For example, in Figure 5.14 the Hispanic population is not shown in the pie chart map because it is subsumed in the White population according to the census ethnic category. Figure 5.14 also shows that tracts that border the intersection between Sheridan and Lawrence appear to be most diverse in Uptown, Chicago.

5.3 How Data Representation Affects Map Reading and Interpretation All four of the choropleth maps shown in Figure 5.15 represent the same data (median house value by tracts in a Seattle metropolitan area in 2000), but they all look different. Map readers will likely draw different conclusions from those maps. In those maps data values are grouped into five to seven classes using different data classification methods. You can compare data classification methods by referring to a histogram and noting where classes are divided in the histogram.

Thematic Mapping

103

FIGURE 5.14 Pie chart map showing ethnic composition in Uptown, Chicago; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 5.15 Effect of data classification methods on map reading; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

104

GIS: An Introduction to Mapping Technologies

In an equal interval map (upper left of Figure 5.15), the data values are divided into five classes with equal intervals. To determine the intervals, you divide the range (maximum value minus minimum value) by the number of classes. Classes are equally divided at the interval 150,154 (=750,773/5). The fifth class has the range $639,445–$789,600 in an equal interval map. While the equal area method is easy to understand, it does not fairly represent data when the data is highly skewed. For example, in the equal area map in Figure 5.15 the fourth and fifth classes are overrepresented despite the low frequency of the data. A quantile map (lower left of Figure 5.15) divides data in terms of percentiles. If data is grouped into five classes, the first class ($38,827–$144,900) represents the 0–20th percentile, the second class represents the 20–40th percentile, and the fifth class ($266,200–$789,600) represents the 80–100th percentile. In a quantile map, an equal number of ordered cases (units of observation) is placed in each class. The histogram in the lower left of Figure 5.15 shows that the breaking points are close to each other in the second, third, and fourth classes because many cases are concentrated in those values in the range $144,900–$266,200. The result is that the remaining data is grouped into the fifth class in the large range $266,200–$789,600. This works to mask internal variation in the range $266,200–$789,600, but data in the range $144,900–$266,200 are more precisely depicted. The natural break method (also known as Jenk’s method) divides data into the most homogeneous classes. Notice that the fifth class ($505,000–$789,600) in the natural break map is internally homogeneous and is distinct from the other classes. Breaking points are determined through a clustering algorithm that maximizes intra-class similarity and inter-class difference. The default option for data classification in many GIS software is natural break because natural break organizes data in itself, and works in most (if not all) cases. Standard deviation calculates the mean and standard deviation from data, and determines breaking points by subtracting multiples of standard deviation from the mean. The standard deviation method is useful when it is important to convey how much data deviates from the average for each areal unit. In contrast to the equal interval and quantile methods, natural break and standard deviation take into account the frequency distribution of data. Using GIS software, you can also set your own breaking points to meet map requirements. For example, it would make sense to group data into classes manually when you need to visualize income data by income tax brackets. Data should be normalized as needed before mapping. Data normalization is intended to facilitate comparison of data values that are often measured in different units. For example, if you are studying how prevalent obesity is in a community, it would be more useful to look at obesity rates than the number of obese persons because variations in base populations across communities make it hard to compare obesity prevalence equally. Examples of normalized data include mortality rate, rate of population change, median household income, and average SAT score.

Thematic Mapping

105

Consider the two choropleth maps in Figure 5.16. The map on the left (A) depicts raw population count by tract, and the map on the right (B) shows population density (population per square mile) in Cook County, Illinois. Cook County contains the City of Chicago at the center right. Do you see any difference between these maps? Which map do you think more accurately represents the spatial distribution of the population? Map A may give you the false impression that there are many people living in the suburbs and relatively few people living in the city. But note that tract sizes are typically large in suburbs and small in a city. When mapping raw count at areal units remember that the size of areal units varies greatly, and thus mapping nonnormalized data (such as raw count) may simply reflect the size of areal units rather than the actual data being mapped. Similarly, if you map the number of deaths instead of the mortality rate, the number of deaths will most likely reflect the base population as well. When interpreting a map showing nonnormalized data, it is easy to be unfairly influenced by varying areal sizes or base populations. To illustrate this concept further, suppose that you will choropleth map trees that are evenly distributed (or are planted at equal intervals) in a study area divided into two sections, where the first section is 10 acres and the second section is 5 acres. If you map raw count (the number of trees), the first section will look darker than the second section in the map because the first section is larger than the second section, giving a false impression higher density of trees in the first section than in the second section. Mapping

FIGURE 5.16 Effect of data normalization on map reading; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

106

GIS: An Introduction to Mapping Technologies

normalized data (e.g., the number of trees per acre) will show that the two sections have the same density (equal distribution). Mapping normalized data can help map readers neutralize the effects of varying areal sizes and base populations. Moreover, shading is usually interpreted as intensity (e.g., population density or crime rate). That is, map readers tend to associate darker tints with higher intensity (or density). Map B in Figure 5.16 gives the correct impression that there are more people in the city than the suburbs by naturally associating dark tints with high intensity. Further, map B is more effective at portraying spatial distribution of population than map A because additional information (e.g., area, base population) that helps compare data among different areal units is taken into account, and a relative measure facilitates comparison among cases. It is important to consider data normalization to create an effective choropleth map that is not misleading. Consider the two maps in Figure 5.17. The map on the left (A) shows the crude death rate and the map on the right (B) shows the age-adjusted death rate. The crude death rate is simply the number of deaths per 100,000 population. The age-adjusted death rate is adjusted for differences in age distribution. To calculate age-adjusted death rates, you multiply the age-specific crude death rate by the national standard proportion for each age group (as weight), and sum the products over all age groups. That way, the age-specific death rate of the state with a high proportion of elderly population will be deflated. Age adjustment is a common procedure in calculating the rates of health statistics to allow communities with different age structures to be compared. There are higher crude death rates in a few states (e.g., Pennsylvania, Maine, Florida) compared to other parts of the country because there are

FIGURE 5.17 Effect of data normalization methods on map reading.

Thematic Mapping

107

relatively more elderly people in those states. The map of the age-adjusted death rate gives a different pattern than the map of the crude death rate. Figure 5.17 shows that map interpretation can be significantly influenced by how data is normalized. It is important to consider an appropriate method for data normalization that meets the requirements of the map being created.

5.4 Choosing Symbology (Visual Variables) The previous three sections focused on how to read cartographic maps in an informed and critical manner. After familiarizing yourself with how features and attributes are cartographically represented, you know that making maps involves numerous choices regarding cartographic parameters. What symbology should you use? What thematic mapping technique would be most appropriate for a given data and problem? Should you normalize data? If so, how? What data classification method would be most effective? How can you make a map easy to read? In the following we consider some of the following mapmaking questions: • What symbology should be used? • What thematic mapping technique should be used? • How should thematic contents be organized? Compare the two maps in Figure 5.18. Both maps depict the percentage of multiple races by state in the United States. Which map do you think is more effective at depicting the spatial distribution of multiple races? What is wrong with the other map? One of the problems with the map on the left is that the visual representation is not logically aligned with the characteristics of data. In contrast, the visual representation from light shade to dark shade

FIGURE 5.18 Choosing visual variables: use color value (lightness) for quantitative data.

108

GIS: An Introduction to Mapping Technologies

is naturally in line with the graduation of data values (from low percentage to high percentage of multiple races) in the map on the right. Consider the two maps in Figure 5.19 showing presidential election results. Which map do you think is more effective at depicting election results, and why? Why did you choose that map over the other one? The map on the right exploits the symbolic connotation of color; red is associated with the Republican political party, and blue is associated with the Democratic political party. The two colors (red and blue) are well suited to depict categorical differences. In contrast, dark green and light green in the map on the left can be perceived as data with different weights, where dark green potentially looks more important than light green. In other words, visual representation that suggests ordinal difference is not naturally aligned with data that suggests qualitative difference. Consider the two maps in Figure 5.20. The map on the left shows the number of active hate groups by state, and the map on the right shows the dominant hate groups by state. Which map do you think is poorly designed, and how would you improve it? The number of active hate groups would have

FIGURE 5.19 Choosing visual variables: use color hue for categorical data. (Reprinted with permission from Krygier, J. and Wood, D. Making Maps: A Visual Guide to Map Design for GIS, 1st Edition, The Guilford Press, 2005.)

FIGURE 5.20 Choosing visual variables: use shape for categorical data. (Reprinted with permission from Krygier, J. and Wood, D. Making Maps: A Visual Guide to Map Design for GIS, 1st Edition, The Guilford Press, 2005.)

Thematic Mapping

109

been better represented by the same shape (a circle rather than various geometric shapes) in graduated size, where the number of active hate groups is depicted by the size. The map on the right is logically designed as different pictorial map symbols that suggest qualitative difference are used to represent qualitative data (dominant hate groups). The human eye can perceive differences in map elements or symbology, such as color, symbol size, and shape. Those visual aspects that result in perceived differences are called visual variables (or graphic elements) in cartography. Figure 5.21 shows shape, size, color and perspective height, the visual variables commonly used in mapmaking. Humans perceive three dimensions of color: hue, brightness (value), and saturation (intensity). Hue refers to what we commonly think of as color, such as red, blue, and green. We can perceive a varying degree of brightness within a color hue, like dark blue to light blue. Humans can also distinguish between pure (more saturated) blue and impure (less saturated) blue by adding or subtracting gray. Each of the visual variables has its own inherent logic. Some visual variables suggest qualitative difference, and some visual variables suggest quantitative difference. Suppose you are given cards with different color hues (blue, red) and are asked to order them. If you can order them, they are quantitative. Otherwise, they are qualitative. Thus, visual variables are classified

FIGURE 5.21 Visual variables used for map design. (Reprinted with permission from Krygier, J. and Wood, D. Making Maps: A Visual Guide to Map Design for GIS, 1st Edition, The Guilford Press, 2005.)

110

GIS: An Introduction to Mapping Technologies

TABLE 5.1 Two Types of Visual Variables Qualitative (No Order) Qualitative (Ranked) Shape Size Color hue Color value Color intensity Perspective height

x x (x)

x (x) x x x

into two categories, qualitative (no order) or quantitative (ranked), as shown in Table 5.1. Shape and color hue suggest difference in kind or category and hence are qualitative visual variables. Size, color value, color intensity, and perspective height are quantitative visual variables because they can be ordered, suggesting a difference in quantity. Although color hue is mostly qualitative, color can be grouped into warm colors and cool colors, and there is some perceived order within that category. For example, we can order yellow, orange, and red within a group of warm colors. Similarly, color intensity is in most cases quantitative (pure to impure red), but the most impure color or achromatic color (e.g., white, black, gray) is qualitatively different from chromatic color (e.g., red, blue, green). Hence, color intensity can be used to suggest qualitative difference if achromatic colors and chromatic colors are used together. When choosing visual variables you should match the type of visual variables with the nature of the data. That is, you should use qualitative visual variables if you are mapping categorical data such as the dominant hate group, but you should use quantitative visual variables when mapping ordinal or numeric data such as earthquake magnitude or the percentage of multiple races. Figure 5.22 uses a qualitative visual variable—shape (coffee cups and coffee bags)—to represent categorical data (coffee consumption and coffee production). This map also uses a quantitative visual variable—size— to represent quantitative data (the relative share of coffee consumption and production). Pictorial symbols—in this case, cups and bags—make it easy to recognize two categories (here, coffee consumption and production). Even when working with the same raw data, you will often need to choose different visual variables to meet different map requirements. For instance, Figure 5.23 shows age-adjusted death rate in two different color schemes— sequential vs. diverging color schemes. While the sequential color map in Figure 5.23 is good to show death rate in natural sequence, the diverging color map would be effective if map requirements are to show how data (death rate) deviates from mean. The diverging color map in Figure 5.23 divides data into two categories—data below mean, and data above mean—to represent two categories in color hues (blue and red), and amount of deviation within each category in color value (or lightness). In general,

Thematic Mapping

111

FIGURE 5.22 Choosing visual variables: shape for categorical data and size for numerical data.

FIGURE 5.23 Sequential color scheme vs. diverging color scheme.

a diverging color scheme would be appropriate for depicting data with a natural breaking point (or baseline) such as population growth which can be positive or negative. Color Brewer 2.0 (http://colorbrewer2.org) provides color specifications for suggested sequential, diverging, and qualitative color schemes.

112

GIS: An Introduction to Mapping Technologies

5.5 Choosing a Quantitative Thematic Mapping Technique The previous section provided guidance on choosing thematic mapping techniques based on measurement levels of data. A unique value map (qualitative thematic map) that uses color hue as a visual variable was created to map election results (qualitative data) in Figure 5.19. A point map that uses shape (pictorial symbol) as a visual variable was created to map dominant hate groups in Figure 5.20. However, matching the type of visual variables with the nature of the data provides little guidance on choosing among the different quantitative thematic maps—namely, choropleth maps, dot density maps, proportional symbol maps, and isarithmic maps. Suppose you need to map the following four quantitative data in a metropolitan area: • Precipitation, where data is collected at weather monitoring stations • Hospital discharge rates for diabetics, where data is aggregated at the level of zip code • Revenue of shopping malls • Population (raw counts), where data is aggregated at the level of census block What thematic mapping techniques would you consider for each of the above data types, and why? In cartography data used can be classified depending on whether it is continuous or not (y axis) and how data values vary (x axis), as shown in Figure  5.24. In a y axis, data is spatially continuous if the data values exist at all locations and are seen as a continuous surface filled with values. Temperature, elevation, and precipitation are good examples. Data is spatially discrete if the values exist at discrete locations and do not occur between those locations. Examples include buildings, trees, and traffic accidents. Population is considered discrete data as each data point falls into

FIGURE 5.24 Matching thematic mapping techniques with data characteristics.

Thematic Mapping

113

each person at discrete location and no values exist between two persons. Population density is considered continuous data because the value calculated for a given area occurs at all locations as if it fills the entire area continuously. Normalized data for an areal unit such as SAT score by school district, median household income by tract, and mortality rate by county are considered continuous data. In an x axis, data is considered smooth if discrete data points occur relatively frequently or continuous data varies smoothly in the geographic space. Data is considered abrupt if discrete data points occur rather infrequently or continuous data does not vary smoothly in the geographic space. If you compare shopping malls and buildings in a metropolitan area, shopping malls are on the abrupt side and buildings are on the smooth side. Temperature would have a smooth variation whereas population density would have an abrupt variation as the level of population density varies abruptly across boundaries of areal enumeration units. In summary, data used for quantitative thematic mapping can be organized into the phenomenon space in two dimensions of continuity (y) and variation (x), and the phenomenon space can be matched with the symbolization space as shown in Figure 5.24. An isarithmic map is the best way to depict continuous phenomenon with smooth variation, such as precipitation, temperature, elevation, and ozone level. A dot density map is well suited for portraying discrete phenomenon that occurs frequently, such as population count, number of jobs, number of churches, and acreage of cropland harvested by county subdivision. A proportional symbol map works well for depicting discrete phenomenon that does not occur frequently, such as the number of cases of the West Nile virus, revenue of shopping malls, enrollment by universities, and the number of visitors to national and state parks. Obviously, the map scale or study area influences the variation factor in an x axis. For example, enrollment by university in Chicago may be abrupt, but enrollment by university in the United States may be smooth. A choropleth map is suitable for depicting continuous phenomenon whose value abruptly changes across areal units, such as average SAT score by school district, median household income by tracts, hospital discharge rates for diabetics by zip code, and population density by county. Two dimensions of cartographic data (continuity and variation) can be considered along with map purposes and audience when choosing the right quantitative thematic mapping technique.

5.6 Organizing Thematic Content After you choose your visual variables and a thematic map, next you need to consider how to organize thematic content. Consider the two maps shown in

114

GIS: An Introduction to Mapping Technologies

Figure 5.25. The purpose of these maps is to show where the Black Heritage Trail is located. Which map do you think serves the purpose more effectively? The map on the left is less effective than the map on the right because the Black Heritage Trail, the most important thematic content, is not emphasized. “Black Heritage Trail” in the title should have been emphasized more than “Eli County, VA” as the map is about “Black Heritage Trail,” and a mapping area is secondary to the trail. The map on the right clearly emphasizes the trail. Unlike the former map, the trail stands out relative to other content such as topography, place names, and other trails. The two maps in Figure 5.26 show the Latino population using a dot density map. The only difference is the visual representation of dots and boundaries. The map on the right is more effective than the map on the left because the most important thematic content (dots) stand out most while less important content (tracts boundary) are deliberately toned down. Consider how the AIDS maps in Figure 5.5 would have looked if the background color was orange instead of black. Black was chosen because it helps the yellow dots representing AIDS cases stand out. In the AIDS map, good contrast is achieved using color saturation as well, where dots (yellow) are highly saturated and a backdrop color (black) is unsaturated (achromatic). In a welldesigned map, visual weight corresponds to importance of map content. Map design uses visual hierarchy to reflect varying levels of intellectual importance of map content by exploiting the figure-ground relationship. Figure-ground relationship refers to a perceptual grouping necessary for recognizing objects through vision. When you see a guinea pig in Figure 5.27, you will distinguish a guinea pig as a figure from the background. In an effectively designed map, more important content stands out to map readers and less important content retreats to the background. This is illustrated in the figure on the right (Figure 5.27) in which darker visuals are seen more clearly by map readers, and lighter visuals are de-emphasized. The two maps (Black Heritage Trail and Latino population) on the right in Figures 5.25 and 5.26, respectively, are effective because the thematic content is hierarchically

FIGURE 5.25 Visual hierarchy can be achieved by highlighting important elements. (Reprinted with permission from Krygier, J. and Wood, D. Making Maps: A Visual Guide to Map Design for GIS, 1st Edition, The Guilford Press, 2005.)

Thematic Mapping

115

FIGURE 5.26 Visual hierarchy can be achieved by toning down less important elements.

FIGURE 5.27 Figure-ground relationship. (Reprinted with permission from Krygier, J. and Wood, D. Making Maps: A Visual Guide to Map Design for GIS, 1st Edition, The Guilford Press, 2005.)

organized and depth is added through visual hierarchy. Compare these maps to those on the left, which look flat and do not immediately convey what is important in the map. While there are many cartographic conventions at your disposal, many of them rely on your cognitive capacity to determine the best choices for your map content. The human brain is incredibly efficient at processing visual information but has limited short-term memory with a tendency to organize fuzzy constructs into discrete ones. Cartographers understand this and group data into no more than seven classes because humans can only process limited amount of information at once. Cartographers also make use of the figure-ground relationship to organize content.

116

GIS: An Introduction to Mapping Technologies

A map is similar to language in that both are media through which content is communicated. But like a map, visuals are inherently different from letters in that the look of visuals affects interpretation and leaves a lasting impression. For that reason, a map is considered an opaque media whose look interferes with map reading and interpretation whereas letters are considered a transparent media. This means mapmakers can influence how their maps are interpreted through their map design choices. ACTIVITY 5.1 MAKE THEMATIC MAPS (ARCGIS 10.5) 1. Open an Existing Map Template For this activity, you will use data in the Lab05 folder. cities, counties, and states are extracted from ESRI DataMaps 2010. ustemp come from the US Historic Climate Network data version 2. • Open ArcMap. The ArcMap—Getting Started window will appear. • Choose New Maps > Templates > Traditional Layouts > USA, and click ConterminousUSA. Click OK (Figure 5.28). This will open an existing map template tailored to depicting the contiguous United States.

FIGURE 5.28 ArcMap—Open an existing map template; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Thematic Mapping

2. Point Maps Let’s make a map showing the location of major cities in the contiguous United States. • Add cities.shp from the Lab05 folder (click the Add Data icon). • Turn off layers in the Table of Contents except for cities (the first layer) and three layers from the bottom (State Boundaries, States, and Neighboring Countries). Since the map is cluttered, we will limit the display of cities with populations over 500,000: • Right-click on a cities layer > Properties > Definition Query > click Query Builder… button; type in “POP2007” > 500000 in the input box in the Query Builder. Click OK (Figure 5.29). This will only show cities where the population in the year 2007 was over 500,000. • Click a symbol selector of the layer cities, and browse the available symbols by scrolling down. Type “place” in the search box of the Symbol Selector to look for more symbols. Select a point symbol style and size that you find appropriate by considering visual hierarchy (or figure-ground relationship; Figure 5.30).

FIGURE 5.29 ArcMap—Definition query; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

117

118

GIS: An Introduction to Mapping Technologies

FIGURE 5.30 ArcMap—Search a symbol marker in a symbol selector; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

• Label cities based on the field NAME (right-click on the cities layer > Properties > click Labels tab. Check “Label features in this layer” at the top, and then choose NAME from the Label Field dropdown list. Click OK (Figure 5.31). • Save the map as point.mxd with a relative pathname to source data in the Lab05 folder. To set a relative path name, File > Map Document Properties > and check “Store relative pathnames to data sources” at the bottom. 3. Proportional (Graduated) Symbol Maps This time we will make a map that depicts the population of these major cities. • cities Layer Properties > Symbology > Choose Quantities on the left then Graduated symbols; choose POP2007 from the Value dropdown list; and change symbol size 6–32 (Figure 5.32). In ArcGIS, a graduated symbol map is a classed map, and a proportional symbol map is an unclassed map. • To change the properties of all symbols, click the Template. To change the properties of the selected symbols, double-click any symbol in the symbol selector as shown in Figure 5.33. To break down data into classes manually (instead of using the natural breaks method):

Thematic Mapping

FIGURE 5.31 ArcMap—Label features in a layer; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

• Click the Classify button in the Classification panel in the Layer Properties > Symbology (Figure 5.34). • In the Classification dialog box, choose Manual from the Method dropdown list, and enter break values manually (minimum value, 1 million, 2 million, 4 million, and maximum value). Click OK (Figure 5.35). • If the legend (on the lower left in the map) is covered by the map area, select the legend and cut (ctrl-x) and paste (ctrl-v). Reposition and resize the map elements (e.g., legend, text) as needed (Figure 5.36). • Save the map as gradsymbol.mxd in the working folder. 4. Choropleth (Graduated Color) Maps • Turn off the cities layer, and add counties.shp from the Lab05 folder. • Click the List By Drawing Order icon in the Table of Contents (Figure 5.37). • Click State Boundaries in the Table of Contents, and drag and drop the selected layer on top of the counties layer. State Boundaries should be drawn above Counties (Figure 5.38).

119

120

GIS: An Introduction to Mapping Technologies

FIGURE 5.32 ArcMap—Create a graduated symbol map.

FIGURE 5.33 ArcMap—Change properties of graduated symbols; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Thematic Mapping

FIGURE 5.34 ArcMap—Open data classification for thematic mapping; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 5.35 ArcMap—Set class breaks manually for thematic mapping; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

To map the raw count of the Hispanic population by county: • counties Layer Properties > Symbology > Choose Quantities on the left, then Graduated colors; choose HISPANIC from the Value field dropdown list; and choose appropriate color ramp. Click OK. This will map the raw count of Hispanic by county (Figure 5.39).

121

122

GIS: An Introduction to Mapping Technologies

FIGURE 5.36 ArcMap—Manipulate map elements in the layout view; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 5.37 ArcMap—List layers by drawing order in the table of contents; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 5.38 ArcMap—Change the drawing order of a layer in the table of contents; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

To map the proportion of the Hispanic population by county: • Select POP2000 from the Normalization dropdown list. This will map proportion of Hispanics (i.e., the number of Hispanics in the year 2000 divided by POP2000). Click OK (Figure 5.40). To make a choropleth map showing the percentage of Hispanics using different data classification methods: • county Layer Properties > Symbology > Click Classify button. • Select Equal Interval from the Method dropdown list (Figure 5.41).

Thematic Mapping

FIGURE 5.39 ArcMap—Make a choropleth (graduated color) map; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 5.40 ArcMap—Normalize data in a choropleth map; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 5.41 ArcMap—Choose equal intervals for data classification; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

123

124

GIS: An Introduction to Mapping Technologies

• Pay attention to the histogram (Figure 5.42). Click OK. See what the map looks like when data is classified into five groups using the equal interval method. • See what the map looks like when data is classified using quantile, natural breaks, and standard deviation, respectively. • Save the map as gradcolor.mxd in the working folder. 5. Dot Density Maps • counties Layer Properties > Symbology > Choose Quantities— Dot Density (Figure 5.43): (1) Choose POP2010, then click > button. (2) Change the color of the dots to black by double-clicking symbol selector. (3) Change the dot size to 2. (4) Change the dot value to 50,000. That way, each dot represents 50,000 persons. (5) Tone down the background (county boundaries; Figure 5.44). Click OK. • Experiment with other cartographic parameters, including dot color, dot size, dot value, etc. • Save the map as dotdensity.mxd in the working folder.

FIGURE 5.42 ArcMap—Equal intervals in the histogram; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Thematic Mapping

FIGURE 5.43 ArcMap—Create a dot density map; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 5.44 ArcMap—Change the symbology of background for a dot density mapping; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

125

126

GIS: An Introduction to Mapping Technologies

6. Isarithmic Maps





• Turn off the counties and State Boundaries layers, and add ustemp.shp and states.shp to the map. The map shows weather monitoring stations in the contiguous United States. Data are permanently projected to the Contiguous USA Albers Equal Area Conic. The ustemp shapefile contains the mean temperature in the year 2011. • Open an attribute table of the ustemp layer. Pay attention to the field VALUE10. VALUE10 represents the mean temperature value in tenth of Fahrenheit in October 2011. For example, 613 means 61.3 degrees in Fahrenheit. A. Exclude Data Points with Missing Values Note that there are 1218 stations at the bottom of the attribute table (0 out of 1218 selected), and some records have values 9999. Data value 9999 means missing values. If records with missing values are not treated, maps made based on VALUE10 will be unfairly influenced by these outliers. We need to remove those records where VALUE10 is 9999 and only keep records where VALUE10 is not 9999. • ustemp Layer Properties > Definition Query > Click Query Builder… button. In the Query Builder dialog box, enter the following expression: “VALUE10” 9999. means “not equal to.” Click OK. This will select records where values are not 9999. In other words, stations with missing values will be excluded as a result. This is how you exclude records with abnormal values. Note that the definition query does not permanently delete those records, but rather those records with missing values are hidden temporarily for mapping purposes (Figure 5.45). • Check the attribute table to check whether those records with values 9999 for VALUE10 have been removed. Now the number of records should be reduced to 1199 (Figure 5.46). B. Spatial Interpolation Isarithmic mapping will create a smooth surface (raster data) from temperature values collected at sample weather stations. Temperature values between weather stations are unknown. Unknown values from unsampled locations should be estimated. The method of estimating unknown values at unsampled locations from known values at sampled locations is called spatial interpolation. You can perform spatial interpolation using ArcGIS extensions like Spatial Analyst, Geostatistical Analyst, and 3D Analyst. • To activate a Spatial Analyst Extension, click Customize > Extensions from the menu. Check Spatial Analyst. Click Close. This will activate the extension Spatial Analyst. This extension specializes in raster data analysis (Figure 5.47). • Open ArcToolbox (the red box icon in the Standard toolbar; Figure 5.48). Then click Spatial Analyst Tools > Interpolation > double-click IDW.

Thematic Mapping

FIGURE 5.45 ArcMap—Exclude missing data using definition query; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 5.46 ArcMap—Number of records in the attribute tables; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 5.47 ArcMap—Activate extensions; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

127

128

GIS: An Introduction to Mapping Technologies

FIGURE 5.48 ArcMap—Open an ArcToolbox; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.



• •



IDW, which stands for inverse distance weighting, estimates unknown z values at any given location i as the weighted average of ‘data points within its neighborhood’ j, where weight is the inverse distance between i and j. In the IDW dialog box, specify input data (ustemp) and z value field (VALUE10). This will generate a raster surface based on VALUE10 (the temperature in October as of the year 2011) from the ustemp layer. Click the browse button under the Output raster to create new raster data in the working folder (Figures 5.49 and 5.50). By default, the output cell size is set to 11539 m, which is the shorter of the width or the height of the extent of the input point features divided by 250. This is equivalent to 7.17 miles. In other words, raster data created from this operation will have spatial resolution (pixel size) 7.17 miles. By default, twelve neighboring data points will be considered in estimating z values at any given location, as shown under the Search Radius setting > number of points. Before you click OK, Click the Environments… button at the bottom (Figure 5.51). In the Environment Settings dialog box, choose Raster Analysis and select states from the Mask dropdown list. This will cut raster data by the mask layer (states). Click OK. This will create a raster surface from the temperature data, clipped by the contiguous state boundary (Figure 5.52). Turn off the ustemp layer.

FIGURE 5.49 ArcMap—Choose input parameters for the IDW tool dialog box; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Thematic Mapping

FIGURE 5.50 ArcMap—Save an output raster for the IDW tool dialog box; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 5.51 ArcMap—Choose environment settings for the IDW tool; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

• Change the symbology of states to Hollow (no fill), and move states on top of oct2011 in the Table of Contents if states are below oct2011. This should show interpolated temperature along with state boundaries (Figure 5.53). • Save a map as isarithmic.mxd in the working folder.

129

130

GIS: An Introduction to Mapping Technologies

FIGURE 5.52 ArcMap—Choose a mask layer for environment settings; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 5.53 ArcMap—Change the symbology of polygon features to no fill in a symbol selector; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

7. Prism Maps • Turn off oct2011 in the Table of Contents. • Make a choropleth (graduated color) map based on the field POP2010 from a states layer (not State Boundaries). To do this, right-click on the states layer > Properties > Symbology. Then choose Quantities > Graduated colors on the left. Choose POP2010 from the field value dropdown list. • Right-click on a layer states > Save As Layer File…. Save it as states.lyr in the working folder. A layer file (.lyr) stores cartographic information (such as color ramp, data classification). Similar to a map file (.mxd), it does not store source data, but the path to source data. • • • •

Open ArcScene (All Programs > ArcGIS > ArcScene). If prompted, open New Blank Scene. Add states.lyr from the working folder (Figure 5.54). In ArcScene, right-click on the states layer in the Table of Contents > Properties > Click Extrusion tab.

Thematic Mapping

FIGURE 5.54 ArcScene—Add a layer file in an ArcScene; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 5.55 ArcScene—Set an extrusion value based on a field value for 3D mapping.

• Check the radio box “Extrude features in layer.” Extrusion turns points into vertical lines, lines into walls, and polygons into blocks.” Click the calculator button, and click POP2010 in the Expression Builder. Click OK (Figure 5.55).

131

132

GIS: An Introduction to Mapping Technologies

Since population values are highly variable, extrusion will clutter a map. To remedy this (i.e., to adjust an exaggeration factor): • Right-click on Scene Layers > Scene Properties (Figure 5.56). • Click the Calculate from Extent button. A vertical exaggeration factor 0.014568 means that extruded blocks will be reduced to 1.4% of the original block height. Click OK (Figure 5.57). • Navigate the map using a Navigate tool. You can change the vantage point, and you can tilt the map in different angles and from different vantage points (Figure 5.58).

FIGURE 5.56 ArcScene—Open a scene properties window.

FIGURE 5.57 ArcScene—Calculate a vertical exaggeration factor from extent in Scene Properties window.

FIGURE 5.58 ArcScene—Navigate a map.

Thematic Mapping

• Save a scene as prism.sxd in the working folder. • Close ArcScene. If you do not close ArcScene, you will not be able to edit the source data (states layer) later. 8. Graph Maps • Go back to ArcMap. • Right-click on Layer States > Properties > Symbology > (1) Choose Charts then Bar/Column. (2) Select both POP2000 and POP2010 from the Field Selection, and click the > button. (3) Set the background symbol to Hollow. Click OK. This will make a bar graph map showing how the population changed from 2000 to 2010 (Figure 5.59). • Save the map as graph.mxd. To make a map showing the rate of change in population from 2000 to 2010, you first need to create and calculate a new field containing the change rate. • Open the attribute table of the states layer. Add a new field named ChangeRate with type Float. Float stores real number with fractional values. To add a new field, click the Table Option at the upper-left corner in the Attribute Table > Add Field. Type the name of a field, and select data type. Click OK (Figure 5.60).

FIGURE 5.59 ArcMap—Make a bar chart (graph) map.

133

134

GIS: An Introduction to Mapping Technologies

FIGURE 5.60 ArcMap—Add a field in an attribute table.

FIGURE 5.61 ArcMap—Calculate a field in an attribute table.

• Right-click on the column header Change > Field Calculator. Click OK if prompted. This will open a field calculator (Figure 5.61). − [POP2000])/ • Calculate a field change with ([POP2010]  [POP2000]*100, which is the rate of population change from 2000 to 2010 in percent. Click OK (Figure 5.62). • Copy the states layer, and paste it into the same data frame. To do this, right-click States Layer > Copy. Then right-click on the data frame Conterminous United States > Paste Layer(s) (Figure 5.63). • Make a choropleth map based on the field Change from the original states layer you copied (Figure 5.64). • Save the map as change.mxd. • Put a finishing touch on the map you just created. You may need to change the title and resize and move the legend (Figure 5.65). • When you are done, export the map into an image file in 300 dpi (File > Export Map). Dpi stands for dots per inch. The higher dpi the higher the spatial resolution (the better image quality; Figure 5.66). • Open the image file you just exported.

Thematic Mapping

FIGURE 5.62 ArcMap—Enter an expression for a field calculator.

FIGURE 5.63 ArcMap—Copy and paste a layer in the current data frame.

FIGURE 5.64 ArcMap—Make a choropleth map from an original layer.

135

136

GIS: An Introduction to Mapping Technologies

FIGURE 5.65 ArcMap—Select elements in the layout view.

FIGURE 5.66 ArcMap—Export a map.

Thematic Mapping

137

References Monmonier, M. 1996. How to Lie with Maps, 2nd Edition. University of Chicago Press: Chicago. Dent, B. D. 1998. Cartography: Thematic Map Design, 5th Edition. William C Brown Pub: Boston. Jensen, J. and Jensen, R. 2013. Introductory Geographic Information Systems. Pearson: New York. Krygier, J. and Wood, D. 2015. Making Maps: A Visual Guide to Map Design for GIS. The Guilford Press: New York.

6 Data Models

6.1 Geospatial Data is Special One of the reasons there is a steep learning curve for GIS is due to the diversity of the geospatial and attribute data formats involved with most GIS projects. Some of the commonly used geospatial formats include shapefiles, geodatabases, KMLs, GeoTIFFs, ERDAS imagine files (.img), AutoCAD Drawing Exchange Format files (.dxf), GPS Exchange Format files (.gpx), and GeoJSON files. GIS data can also be in tabular form (e.g., Excel, CSV, TXT, and dBase files). This diversity comes from the different fields (e.g., image processing, computer-aided design, and database management system) that have contributed to the genesis of the geospatial industry and due to the complexity of geospatial data itself. The complexity can be contrasted alongside attribute data in three respects. First, attribute data stores only nonspatial attributes, while geospatial data stores spatial aspects as well as nonspatial attributes. For example, mapping populations by county requires storing county boundaries (spatial objects) as well as populations (attributes) in GIS. Second, compared to attribute data, which is typically stored as a table, there is no unified method to encode the spatial aspects of the world. Some geographic phenomena are spatially continuous and amorphous (e.g., temperature, precipitation, carbon dioxide) whereas some geographic entities are spatially discrete and have well-defined shapes (e.g., land parcel, roads, buildings). Therefore, a single method cannot adequately represent geographic phenomena. Lastly, spatial data is multidimensional and representation of spatial entities is scale-dependent. For instance, the spatial aspects of a building can be represented in multiple dimensions (x, y, z) whereas nonspatial aspects (e.g., name, ownership, number of floors) are represented with one dimension for each attribute. Buildings can be represented as a zero-dimensional point in a city map but can be represented as a 2D polygon in a detailed land parcel map. A 3D representation of a building is required for virtual globe software such as Google Earth. All this complexity explains why unique methods have been developed to represent geography in GIS.

139

140

GIS: An Introduction to Mapping Technologies

Most GIS software has functionality for loading (and editing) geospatial file formats or has conversion tools for different formats. For example, ArcGIS can directly display IMG and GeoTIFF files and convert KML, DXF, and GPX files to proprietary formats (shapefile and geodatabase). Open-source QGIS can directly load GeoJSON and OpenStreetMap (OSM) file formats. While proprietary GIS file formats are still common, recent trends are toward open standards that facilitate easy exchange of data and protocols among the general public (e.g., KML, GeoJSON) that can be exchanged across platforms and among multiple users on the web. In the following sections, we will look at how geospatial entities and phenomena are encoded in a GIS database. More specifically, we will review how spatial objects, events, and processes are structured in data (Section 6.2), how attribute data are stored (Section 6.3), and how spatiality and attributes are all encoded in geospatial data (Section 6.4). This chapter will help you make sense of the different data formats and competently work with both spatial and attribute data in GIS.

6.2 How Spatial Data Is Structured—Raster and Vector Spatial aspects of the world are structured as data in GIS. Data model (or data structure) is used to encode aspects of entities that are relevant to an application domain. There are two fundamentally different data structures used to represent the spatial aspects of the world—raster and vector. Figure 6.1 illustrates the difference between raster and vector data structures in representing land cover such as forest and water. Raster data models divide space into pixels of uniform size and assigns attribute values (land cover) to each pixel. For example, the four pixels that cover the lower right are encoded as water for raster data in Figure 6.1. Vector data models delineate boundaries for different land covers and assigns other additional attribute values to each land cover area. A line forming boundaries for vector data is made of a set of points (vertices). The building block of raster data models is a pixel and that of vector data models is a point. Both building blocks are associated with (x, y) coordinates in reference to a spatial reference system as discussed in Chapter 2. In Figure 6.1, raster data structure was used to encode a representative land cover value for each pixel since boundaries among different land covers are drawn along pixels. Therefore, pixelated lines between two different land covers (e.g., between water and beach) approximate the transition from one land cover to another land cover. The smaller the pixel size the more precise the land cover will be represented, but this precision is achieved at the expense of increased data size. Although the vector data structure might look more accurate in this example, it is not safe to say that vector representation is more accurate than raster as it is often difficult to delineate

Data Models

141

boundaries among different land covers. However, it suffices to say that both vector and raster data structures have their strengths and weaknesses in representing geography. Since raster encodes attribute values given a controlled spatial unit (pixel), it is suitable for representing continuous and amorphous phenomenon such as temperature, where the pixel serves as the spatial sampling unit. By contrast, vector works well for representing spatial entities with well-defined shapes due to a relative emphasis on spatial forms. Raster and vector are also different in how (nonspatial) attributes are stored. For vector, attributes are typically stored as a table, where a row (record) is linked to a spatial object (land cover area). Multiple attributes such as Name, Public, and Owner are assigned to each land cover area in vector. For example, the second row in the table for vector in Figure 6.1 states that water is Public and Owner is state. For raster that record only one attribute (land cover) per pixel, it would be tedious to store an attribute value per pixel as a row in a table since there are too many pixels (rows) with the same value. Instead, if there is a table for raster it summarizes how many pixels exist for each unique pixel value. The fifth row in the table for raster in Figure 6.1 tells us that there are four pixels (see the column Count) for water. As vector and raster are distinctly different in how they represent both spatial and nonspatial aspects, the ways in which a query is performed and analysis is conducted are also different. For instance, subsets of vector data are selected using query tools (e.g., Select by Attribute in ArcGIS), but those of raster data are selected using a map algebra tool (e.g., Raster Calculator in ArcGIS). Due to the relative focus of vector model on shape, vector data is associated with different geometry types (or dimensionality)—point, line, and polygon.

FIGURE 6.1 Data structure for spatial data: raster vs. vector.

142

GIS: An Introduction to Mapping Technologies

Point (zero-dimensional) data is good for storing the location of any point of interest or event; line (1D) data can store length and direction such as roads. Polygon (2D) data is used to represent areas with size and shape. Figure 6.2 shows that vector data is comprised of points with (x, y) coordinates. Each line (or polyline) is made of a series of four vertices. Polygon data is made of a series of (x, y) coordinates where the first coordinate is the same as the last coordinate. Geometry type of vector data is largely dependent on intended use as well as geographic scale in which the data is displayed. Roads can be represented as polygons at local scale or for detailed cartographic display but can also be represented as connected linear features at regional scale or for routing applications. To find the shortest path between two points along a transportation network, the transportation network should be represented as a network data model, which represents it as a set of nodes (junctions or intersections) and

FIGURE 6.2 Representation of point, line, and polygon objects using the vector data model (Reprinted with permission from Paul Longley, Michael Goodchild, David Maguire, and David Rhind, Geographic Information Systems and Science, 2nd Edition (Wiley, 2005) p. 184).

Data Models

143

links (edges or linear segments), and focuses on representing the connectivity among these elements along with the cost associated with traversing them. Figure 6.3 illustrates how connectivity is stored in a table. The first row for the Links table under a column header indicates that link 1 traverses from node C to node A in one way. Node C is linked to node E in both directions. Similarly, adjacency can be stored in a table as well (Figure 6.4). Moving in the direction of the arrow by Polyline# 6, the left-hand side polygon of the line is two and the right-hand side polygon of the line is five. Explicitly encoding qualitative spatial relationships among elements of the data model is a prerequisite to some spatial analyses including network analysis, spatial query, and overlay. Topological vector models use this type of vector data, compared to a simple vector model that does not store topology. Finally, data structure is also dictated by the ways in which data is captured. For instance, the raster data structure of satellite imagery (e.g., Landsat TM product, DigitalGlobe’s WorldView-2 product) is built into data capture methods just as an image is captured by an electronic scanner mounted on a satellite. Digitizing contours (comprised of vertices) from a topographic map placed on a digitizing table will result in vector data (a set of points with elevation values). Air-quality attributes such as ozone level can be measured using a GPS-equipped sensor at sample locations, and is thus stored as point (vector) data. In summary, geospatial data formats are either vector or raster data structures in which vector data is associated with geometry type (point, line, polygon; e.g., shapefile) and raster has a fixed spatial resolution (pixel size; e.g., GeoTIFF). In the next section, we will examine how attribute data is typically stored in GIS.

FIGURE 6.3 Representation of connectivity among links and nodes for network data model.

144

GIS: An Introduction to Mapping Technologies

FIGURE 6.4 Representation of adjacency of a topologically structured polygon data layer. (Reprinted with permission from Paul Longley, Michael Goodchild, David Maguire, and David Rhind, Geographic Information Systems and Science, 2nd Edition (Wiley, 2005) p. 187).

6.3 How Attribute Data Is Structured—Relational Databases 6.3.1 Relational Databases Probably the best-known data model for storing attribute data is the table view of the world, which became popular with the prevalence of relational databases. A relational database is based on a relational data model and organizes data into one or more tables of columns and rows, with relations defined among tables. As illustrated in Figure 6.5, a table represents an entity type or object class such as Cities or Students. In the table, rows represent instances of an object class such as Chicago and John Doe, and columns represent attributes of instances of an object class such as Name and Age as shown in the column header. Cells in the table contain attribute values of instances of an object class for a given domain or attribute, such as 1833 for the year Chicago was founded and 24 for John Doe’s Age. Tables can be related or joined based on common attribute values in relational databases. To establish the relation “lives in” in the statement like “a student lives in a city,” two participating tables should have columns whose values exactly match. Figure 6.5 shows that a Cities table has a column Name

Data Models

145

FIGURE 6.5 Relational database.

and a Students table has a column City, where values for two columns match; i.e., Cities.Name = Students.City given Table.Column notation. One student lives in one city if the relationship “lives in” is defined as the current primary residency. Therefore, it can be said that there is a one-to-one relationship between Students and Cities based on some predefined rules. 6.3.2 Database Normalization It is important that data is consistent and structured to prevent any incorrect query or analysis results. The process of organizing data to ensure consistency and minimize redundancy is called database normalization. Normalization is especially crucial when you work with poorly structured spreadsheet files. The following checklist for database normalization follows Codd’s rules for relational databases and will ensure that data are well structured, consistent, and have integrity: • Is the first row a column header? • Is the column about the same subject (domain)? • Is a value atomic (single-valued)? • Is a row unique (nonduplicating)? • Is there a column that uniquely identifies an instance, and is the column populated? • Is there a column that relates one table to another table (if a relationship exists), and is the column populated? • Are other columns functionally dependent on a unique identifier? Column header: the first row is always reserved for a column header in relational databases. Both tables (Cities and Students) in Figure 6.5 have a column header in the first row. It is good practice to use a descriptive but concise name without a special character (except for underscores) for the name of a column. There are usually limits to the number of characters that can be entered for the name of a column. Any column name longer than 10 characters will be truncated after data is loaded into a file geodatabase table; i.e., YearFounded will become YearFounde after data loading. “FirstName” or

146

GIS: An Introduction to Mapping Technologies

“First_Name” is preferred to “First*Name” or “1stName” as “First*Name” contains a special character and “1stName” begins with a number. Column domain: a value entered under a given column should pertain to the domain of a given column. For example, it is wrong to enter “Denver” under YearFounded because “Denver” is about the Name (not YearFounded) of Cities. Atomic values: you cannot have multiple values for a cell. That is, it is not good to enter ‘24, 25’ for the age of John Doe even if he is 24.5 years old because having multiple values will result in inconsistent query results. A unique row: no duplicates are allowed for identifying each instance. Suppose there are two rows representing Chicago with conflicting YearFounded values in the Cities table. Then which row should we go by? Identity integrity: this means that there should be a column that uniquely identifies an instance in the table, and the column should be populated with values. One or more columns that serve as a unique identifier is called a primary key (PK). For example, ID or Name can be a PK for the Cities table in Figure 6.3. If there are two students with the name Jenny in the table, then Name cannot be a PK for the Students table as it does not uniquely identify a student. PK values should not be missing because an instance cannot be accessed if the PK value is null. Referential integrity: this rule applies to at least two tables that are related (e.g., Cities and Students in Figure 6.5). The rule states that there should be a column that relates one table to another table, and the column should be populated. In Figure 6.5, the relationship is established because the values of Cities.Name match the values of Students.City. Here the column Students. City is the foreign key (FK) that relates the Students table to another table, Cities. The FK value should not be null because another table (Cities) cannot be referenced from one table (Students) if the FK value is null. Functional dependency: non-PK columns should be functionally dependent on the PK in the table. For instance, suppose that you add another column PoliticalParty that represents the political affiliation of a city mayor in the Cities table. Then are those values (democratic, republican) a function of cities or mayors? Those values are a function of mayors, not cities. This means you should deal with those attributes (PoliticalParty) in a separate table called Mayor because political party is functionally dependent on a unique instance of mayor. Organizing data into tables that only contain relevant attributes serves to reduce redundancy in datasets. Now we will apply the rules of database normalization to a CSV file you can download from the TRI (Toxic Release Inventory) Explorer at https://iaspub. epa.gov/triexplorer/tri_release.facility. This CSV file contains the toxic chemicals released at TRI facilities (Figure 6.6). Clearly, the file is not in the standard form. There are many problems that need to be fixed. First, rows 1, 3, 4, and 5 should be deleted while row 2 can be retained as a column header. A breakdown of the chemical compounds released from each facility is listed under the Facility column, which violates the second rule (column domain).

Data Models

147

FIGURE 6.6 Before database normalization.

You can delete the chemical compounds (in rows 7, 8, 10–15) if you are not interested in specific chemical compounds. Otherwise, you can transpose the amount of chemical compounds into new columns added to the last column as they are functionally dependent on a facility. Figure 6.7 shows the data after normalization. The first row is a column header with a concise and descriptive name without special characters. The column is about the same subject, but each row is unique. TRIF_ID can be set to a PK as it uniquely and unambiguously identifies each facility. Two columns, OnSiteTx and TotalToxic, representing the amount of onsite and total toxic chemicals, have atomic values. Latitude, Longitude, OnSiteTx, and TotalToxic are functionally dependent on a PK. Now suppose that you have data for lung cancer rate by zip code, and would like to examine how toxic chemicals relate to lung cancer. To link the TRI table to a cancer table, it is necessary to add and populate a new column named Zipcode in the TRI table (Figure 6.8). This column (TRI.Zipcode) is used as a FK that references the cancer table from the TRI table. Since there

FIGURE 6.7 After database normalization.

148

GIS: An Introduction to Mapping Technologies

FIGURE 6.8 Referential integrity.

are many TRI facilities in a zip code, the relationship between the TRI table and the cancer table is many to one. 6.3.3 Field Data Types When adding a new field (column) to a table in GIS software, you will be prompted to choose a data type for the field. There are three main data types: text, date, and number. Text (or string) stores text data such as the name of a city (e.g., Chicago), and date stores date and time (e.g., October 21, 2018, 8:32 PM). Numeric data can be stored in one of four data types: short integer, long integer, float, or double. The name of these four numeric data types may vary depending on the computer system being used. Short integer and long integer data types store numerical values without fractional values. Float and double data types store numerical values with fractional values. More bits (units of computer storage) are assigned to a long integer (32 bits) than a short integer (16 bits). A short integer uses 15 bits to store a range of numbers and uses 1 bit to store sign (+,−). With 15 bits, you can store up to 215 (2 to the 15th power = 32,768). A short integer can store an integer between −32,768 and 32,767. A long integer can store an integer between −2,147,483,648 to 2,147,483,647. Float has seven decimal digits of precision with maximum, and double has 15 decimal digits of precision. You can use double to store sufficiently large or small numerical values, such as annual government spending in dollars or rate of rare diseases in fractions. What data type you should use depends on the type of data, the presence of fractional values, and the possible range of numerical values. For example, a short integer is inadequate to store populations of large cities since the

Data Models

149

largest possible value with a short integer is 32,767. In contrast, using double to represent only three possible values (1, 2, 3) is a waste of storage space. Another consideration when choosing data type are operations. It makes sense to use a numerical data type if you want to do algebraic operations such as calculating population changes. In US census data, the FIPS (Federal Information Processing Standard) code is used to uniquely identify administrative units such as county and tract. It is convenient to store FIPS codes as text (string) data to utilize string manipulation functions for parsing or concatenation. For example, the FIPS code for the census tract 102.01 in Cook County, Illinois is 17031010201, where the first two characters (17) represents a state, the next three characters (031) represents a county, and the next six characters (010201) represent a tract. A string function Left([fips],2) can be used to extract the first two characters (state) from the field [fips] in a field calculator tool. Since attribute data is stored as a table following a relational data model in GIS, you can use SQL queries. SQL, pronounced as “sequel,” is a standard (or structured) query language for relational databases that allows you to interact with the data. With the SQL SELECT statement, you can select features or records that meet a certain condition. For instance, the SQL SELECT statement SELECT * FROM tl_2013_17_tract WHERE “COUNTYFP” = ‘031’ selects records for all columns from the table tl_2013_17_tract where a field COUNTYFP is equal to ‘031’. The general form for the query expression within WHERE is . If a field data type is a string (text), the value should be single quoted. Otherwise, the value does not have to be quoted. You can combine expressions with the AND and OR operators. For example, “AREA” > 1,500 AND “GARAGE” > 3 selects houses that have more than 1,500 square feet and a garage for three or more cars. 6.3.4 Table Join As illustrated above, it is often necessary to join or relate tables in GIS. Table join is a fairly common task. For example, you may want to map unemployment rate by county from a CSV file or tab-delimited file containing unemployment rates. To make this map, an attribute table of unemployment rate by county should be joined to spatial data of county boundaries. Table join appends attributes of a source table (unemployment rate) to the end of a target table (county boundary). Answering the following questions will help you use table join properly: • • • •

Do the two tables share the same attribute values? Are those common attributes of the same data types? What is the relationship between the two tables? In what direction should you perform table join?

150

GIS: An Introduction to Mapping Technologies

Each question is answered using the TRI table and Cancer table example shown in Figure 6.9. The TRI table contains the ID, the amount of toxic chemical, and the zipcode of the TRI facility, and the Cancer table contains the cancer rate by zipcode. Presence of common attribute: two tables should share the attribute values on which a table join will be based. The values of the last column zipcode in the TRI table are the same as those of the first column zipcode in the Cancer table. Hence, two tables can be joined based on the condition TRI. zipcode = Cancer.zipcode. Data type of a common key: if the data types of two tables to be joined do not belong to the same category (number, text, date), table join cannot be used. That is, if TRI.zipcode is the text data type, then Cancer.zipcode should also be the text data type for table join to work. If TRI.zipcode is a numerical data type (short, long, float, double), then Cancer.zipcode should also be the numerical data type. Relationship between tables: relationship refers to how many rows in one table are associated with rows in another table. One zipcode contains many TRI facilities. Therefore, there is one-to-many relationship between the Cancer by zipcode and the TRI table. Direction of table join: Table join can be used in two directions. That is, you can join the TRI table to the Cancer table, or vice versa. Since a source table is appended to a target table, it is always a good idea to join a one-side table to a

FIGURE 6.9 Table join.

Data Models

151

many-side table to avoid losing any information. If you join Cancer (one-side) to TRI (many side), then all the data from the source table (Cancer) will be preserved after the table join as illustrated in the second panel of Figure 6.7. If you join TRI (many-side) to Cancer (one-side), some data from the source table will be lost. That is, either 1 or 2 (not all) for the first row with zipcode 60001 will be saved as shown in the third panel of Figure 6.9.

6.4 How Spatial and Attribute Data are Put Together So far we have reviewed spatial and attribute data structures separately. As mentioned earlier, GIS store both spatial and attribute data, but the question is how are these two data types combined? To answer this question, consider how you would store both spatial and attribute data in a table to explore the demographic characteristics of some communities (Figure 6.10). Storing attributes in a table is relatively straightforward. A community named Lincoln Park has a population of 1,000 with 500 households as shown in Figure 6.8. That’s easy enough. However, storing spatial aspects in a table is not that straightforward. You could enter a series of coordinates into a new column named Boundaries (see the last column in Figure 6.10), but the value for the last column would be multivalued unlike other attribute values. This violates Codd’s rule of database normalization, which states that data values should be single-valued. In addition, it is hard to tell how many coordinate values will be entered in this column. This is not desirable as the storage unit for a column is limited. In other words, encoded spatial data does not neatly fit into a table representation. The IT industry has long been dominated by relational databases. In fact, the market share of database management systems (DBMS), which are based on relational data models, is 86%. Thus the GIS industry has had to learn to work with the data dissemination practices of the database industry. So how does the GIS industry work with the issue of spatial data not being neatly encoded into the relational model? One way is to store spatial data and attribute tables separately and to link spatial data to attribute tables through common keys (attributes), otherwise known as a georelational model. In this model a widely used data format called a shapefile separates spatial data (stored in.shp and. shx files) from attribute data (.dbf files). Another option is to use predefined spatial data types or geometry types (such as Point, LineString, Polygon) that

FIGURE 6.10 Representing spatial data in a table.

152

GIS: An Introduction to Mapping Technologies

can be recognized by different software systems. More recently invented geospatial data formats such as KML and GeoJSON rely on Open Geospatial Consortium (OGC) spatial data types. By building your geodatabase (native data format in ArcGIS) on a relational model, you can organize all types of geospatial data (vector, raster, and table) in one place, and utilize advanced capabilities to ensure data integrity and enable high-level data modeling. In summary, a table cannot adequately represent spatial data because spatial data is diverse in nature, scale-dependent, and multidimensional. To address this complexity, new spatial data models such as geodatabases incorporate the concept of object-orientation that represents constructs as objects while retaining the advantages that relational databases offer. An object-based model views the world as a hierarchy of object classes and allows users to define data by its behavior (‘do’) in addition to its status (‘be’). In addition to dealing with the complexity of geospatial data, the geospatial industry is also faced with the needs for simple geospatial data that works across different platforms and software systems. Today open standards are increasingly adopted, and spatial data models have evolved to meet the needs of the geospatial industry. ACTIVITY 6.1 GET TO KNOW GEOSPATIAL DATA (ARCGIS 10.5) • Get the data for this lab activity. The Lab06 folder contains the following public-domain geospatial data:

Name NED CensusBlockGroup NLCD_clipped Streets

Data Format ESRI GRID Shapefile ERDAS IMAGINE Shapefiles

Data Downloaded and Processed From National Map Viewer US Census Bureau National Map Viewer Census TIGER/Line

1.  Explore Properties of Geospatial Data in ArcCatalog • Open ArcCatalog in ArcGIS. ArcCatalog allows you to explore, manage, and organize geospatial data. • Connect to the working folder (Figures 6.11 and 6.12). The folder connection will be mapped in the Catalog Tree on the left (Figure 6.13). Items in the selected folder connection are shown in the Catalog Display on the right. There are four datasets (metadata in XML excluded). Among them, two data layers are shapefiles and two are raster datasets as shown under the Type panel in the Catalog Display.

FIGURE 6.11 ArcCatalog—Connect to folder 1; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Data Models

FIGURE 6.12 ArcCatalog—Connect to folder 2; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 6.13 ArcCatalog—Catalog tree and catalog display; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

153

154

GIS: An Introduction to Mapping Technologies

FIGURE 6.14 ArcCatalog—Item property—Shapefile properties; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

• Right-click on the item CensusBlockGroup.shp in the Catalog Display > Properties. In the General tab, the feature (or geometry) type stored in this feature class is shown to be polygon (Figure 6.14). • Click the Fields tab in the Shapefile Properties window. This will show the fields (columns) of the attribute table linked to the boundaries of the census block group. Close the window. You can infer that CensusBlockGroup.shp (shapefile) is a vector data layer with the polygon geometry type. • Right-click on the item NLCD_clipped.img > Properties. This shows the cell size (30, 30) and data format (IMAGINE) under Raster Information (Figure 6.15). • Scroll down in the General tab of the Raster Data Properties window. This will show Meter as the Linear Unit under Spatial Reference as shown in Figure 6.16. Click OK. This suggests that NLCD_clipped.imag is raster data with pixel size 30 m by 30 m. 2.  Manipulate (Rename, Copy, etc.) Geospatial Data in ArcCatalog In addition to exploring the properties of geospatial data, you can also copy, move, delete, and rename it using ArcCatalog. Renaming all eight files prefixed by CensusBlockGroup can be tedious in a Windows Explorer (or My Computer). To rename all the shapefiles at once: • Click on the item CensusBlockGroup and enter the name you want to change to (Figure 6.17). To copy the shapefile States in the Lab05 folder to the Lab06 folder: • Right-click on the item States.shp in the Lab05 folder > Copy (Figure 6.18). Then right-click an empty area in the Lab06 folder in the Catalog Display > Paste (Figure 6.19). States.shp will be copied to the Lab06 folder. • Close ArcCatalog

Data Models

FIGURE 6.15 ArcCatalog—Item property—Raster dataset properties 1; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

3.  Explore NED (National Elevation Dataset) in ArcMap • Open ArcMap. Open an empty map by choosing My Templates > Blank Map. • Click the Add Data tool in the Standard Toolbar (Figure 6.20). • Select the working folder for Lab06 from the Look in: dropdown list (Figure 6.21). Since the folder connection was previously created in ArcCatalog, the folder connection is mapped under the dropdown list. • Add the layer ned in the Add Data dialog box (Figure 6.22). It will be added. This layer shows elevation (source: National Elevation Dataset). The Table of Contents shows that elevation ranges from 184.73 to 175.048. What is the measurement unit here? To find out the measurement unit of the elevation: • Click the metadata NED_1_n42w088_ArcGrid_meta.html in the Lab06 folder. Then click the link to Spatial_Reference_Information. In this section, you will find information about horizontal and vertical coordinate systems. Under Vertical_Coordinate_System_ Definition, you will see meters for Altitude_Distance_Units.

155

156

GIS: An Introduction to Mapping Technologies

FIGURE 6.16 ArcCatalog—Item property—Raster dataset properties 2; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 6.17 ArcCatalog—Rename an item.

When you download public-domain geospatial data, it usually comes with metadata. Metadata (data about data) includes characteristics of data, which is useful for determining the fitness of use for data. To add a basemap to see where this study area is: • Click the arrow to the right of the Add Data icon > Add Basemap… (Figure 6.23). These base maps are map services that require an Internet connection. Choose Light Gray Canvas. Click Add.

Data Models

FIGURE 6.18 ArcCatalog—Copy an item; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 6.19 ArcCatalog—Paste an item; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 6.20 ArcMap—Add data 1; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

157

158

GIS: An Introduction to Mapping Technologies

FIGURE 6.21 ArcMap—Add data 2; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 6.22 ArcMap—Add data 3; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

The ned layer blocks a basemap. To make the raster layer transparent: • In the Table of Contents on the left, right-click on the layer ned > Properties > Display tab > enter 70 for Transparent (Figure 6.24). This will make this layer 70% transparent so that a basemap can be still seen underneath. The layer shows the Lincoln Park community area in Chicago. • Click the Find tool in the Tools toolbar (Figure 6.25). Click the Locations tab. • Type “2350 N Kenmore Ave Chicago IL” in the Single Line Input text box. This will find the location of the typed street address using the ArcGIS World Geocoding Service. Right-click on the result at the bottom and click Add Labeled Point (Figure 6.26). • Zoom to the labeled point. This shows the Richardson Library on the DePaul University Lincoln Park Campus (LPC).

Data Models

FIGURE 6.23 ArcMap—Add Basemap; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 6.24 ArcMap—Make a raster layer transparent; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 6.25 ArcMap—Find tool 1; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

159

160

GIS: An Introduction to Mapping Technologies

FIGURE 6.26 ArcMap—Find tool 2; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

To find the elevation value of any particular location: • Click the Identify tool (Figure 6.27) and click on the labeled point. By default, ArcMap will identify information from the topmost layer, which is the ned layer. You should be able to see 181.614990 for the Pixel value. This means that the altitude for this location is about 182 m. Click the Identify window. • Right-click the layer ned > Zoom To Layer. • Determine the elevation of other locations (such as near Lake Michigan) within the spatial extent of the ned layer.

FIGURE 6.27 ArcMap—Identify tool; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Data Models

FIGURE 6.28 ArcMap—Zoom in; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

To review descriptive statistics (like min, max, mean, and standard deviation) of the pixel values: • Right-click on the layer ned > Properties > Source tab > scroll until you see Statistics. The average altitude for Lincoln Park, Chicago is about 181 m with very low standard deviation (1.65). • Zoom to North Pond (at the northeastern part of Lincoln Park) as seen in Figure 6.28. Here, the elevation is represented by grid cells (or pixels) of uniform size. To measure the dimension of each pixel: • Zoom in until you see an individual pixel clearly. Click the Measure tool (Figure 6.29) and click the lower-left corner and then double-click the lower-right corner to measure the length in east-west direction (or width). The segment length can be read as width. Repeat the same process to measure height (Figure 6.30). To verify pixel size: • Right-click on the layer ned > Properties > Source tab. Make a note of the values for Cell Size (X, Y) under Raster Information. • Turn off the ned layer. 4.  Explore NLCD (National Land Cover Database) in ArcMap • Add a layer NLCD_clipped.img from the working folder. • Make the NLCD layer transparent 75%. • Check a pixel value within North Pond using an Identify tool. The Color Index should be 11.

161

162

GIS: An Introduction to Mapping Technologies

FIGURE 6.29 ArcMap—Measure tool 1; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 6.30 ArcMap—Measure tool 2; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

• Find out what this pixel value (or color index) 11 means by looking up the NLCD metadata: NLCD_2011_landcover_2011_ edition_2014_10_10.html or NLCD_2011_landcover_2011_edition_2014_10_10.xml in the working folder. HTML and XML files can be read in a web browser. Look for the section Entity_ and_Attribute_Information if you open the HTML file or if you open the XML file. • Right-click on the layer NLCD_clipped.image > Open Attribute Table. It can be seen that Value 11 means open water, and there are 2,548 pixels representing water in this study area given the attribute table. • Close the attribute table. • Turn off the layer NLCD_clipped.img. 5.  Explore Census Block Group Data in ArcMap • Zoom to the layer NLCD_clipped.img (right-click on the layer > Zoom to Layer). • Add the layer BlkGrp.shp from the working folder. • Make the layer BlkGrp transparent using the Layer Properties dialog box of the layer BlkGrp (Figure 6.31).

Data Models

• Open the attribute table of the BlkGrp layer (right-click on the layer BlkGrp > Open Attribute Table). Move the attribute table so that the table does not block the features (boundaries) of the block groups in a map area. Note that there are 46 records (rows) shown at the bottom of the attribute table (0 out of 46 Selected). That is, there are 46 census block groups in Lincoln Park. To sort the table based on an attribute value: • Right-click on the column header POP2010 > Sort Descending (Figure 6.32). The first row (record) will have the largest population. • Select the first record (with the largest population in year 2010) by selecting a record selector (Figure 6.33). The feature (boundaries of a census block) linked to the selected record will be highlighted in a map area. • Clear selected features (Figure 6.34). To get descriptive statistics of the field POP2010 (population in year 2010): • Right-click on the column header POP2010 > Statistics. This will report minimum, maximum, sum, mean, and standard deviation along with frequency distribution (or histogram; Figure 6.35).

FIGURE 6.31 ArcMap—Make a vector layer transparent; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 6.32 ArcMap—Sort column for a table; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

163

164

GIS: An Introduction to Mapping Technologies

FIGURE 6.33 ArcMap—Select a record in a table; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 6.34 ArcMap—Clear selected features; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 6.35 ArcMap—Statistics for a column in a table; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

The descriptive statistics on the left show that POP2010 ranges from 6 to 4,658 with an average of 1,375. The total sum of the populations in year 2010 in Lincoln Park is 63,295. The histogram suggests that most of the block groups have populations less than 2,000, but there are some outliers with large populations. • Find the data distribution of the populations with ages between 18 and 21 (using the field named AGE_18_21) based on the descriptive statistics and histogram above.

Data Models

• Find block groups with large populations aged between 18 and 21. This can be done by using a record selector (at the leftmost bar) after sorting the table in the attribute table. It can be seen that this age group is concentrated around the college campus. • Clear the selected features. • Turn off the BlkGrp layer. 6.  Explore Streets Data in ArcMap • Add Streets.shp in the working folder. • Zoom to North Pond. • Select N Stockton Dr (the road just west of North Pond) using the Select Features tool (Figure 6.36). Use the shift key to select multiple features. • Open the attribute table of the layer Streets. Then click the Show selected records (Figure 6.37). N Stockton Dr is comprised of two lines. To see what each of these lines are made of: • Choose Customize > Toolbar > Editor from the menu. Make sure the Editor tool is checked. • Editor > Start Editing (Figure 6.38). Click Continue if prompted on spatial reference. • Then double-click on the road segment (Figure 6.39). One line segment that makes up N Stockton Dr is comprised of nine vertices (points). • Editor > Stop Editing. • Close the Editor tool. • Close the attribute table. • Save a map in the working folder (File > Save).

FIGURE 6.36 ArcMap—Select features; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 6.37 ArcMap—Show selected features in a table; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

165

166

GIS: An Introduction to Mapping Technologies

FIGURE 6.38 ArcMap—Editor 1; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 6.39 ArcMap—Editor 2; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

ACTIVITY 6.2 MAKE THEMATIC MAPS FROM CENSUS DATA (ARCGIS 10.5) 1.  Download Census Data (American Community Survey) from American FactFinder • Go to American FactFinder https://factfinder.census.gov/. • Click Download Center > Download Center. • Check “I know the dataset or table(s) that I want to download.” Click NEXT. • (1) Select American Community Survey from the “Select a program” dropdown list. (2) Choose 2015 ACS 5-year estimates. (3) Click ADD TO YOUR SELECTIONS at the bottom (Figure 6.40). The selection will be added to Your Selection in the upper-right corner. Click NEXT.

Data Models

FIGURE 6.40 Select a data set from American FactFinder.

• (1) Select a geographic type: Census Tract-140. (2) Select a state: Illinois. (3) Select a county: Cook. (4) Select All Census Tracts within Cook County, Illinois. (5) Click ADD TO YOUR SELECTIONS. Click NEXT (Figure 6.41). • Select table(s) to download by checking AGE AND SEX (the first row). Click NEXT. • Click OK if you are prompted for “You have selected one item(s) to download” (Figure 6.42).

FIGURE 6.41 Select geographies from American FactFinder.

167

168

GIS: An Introduction to Mapping Technologies

FIGURE 6.42 Download census data from American FactFinder.

This will download the table containing the ACS 2011–2015 5-year estimate AGE and SEX table. • Extract the downloaded zip file. • Open ACS_15_5YR_S101_with_ann.csv in the extracted folder with Excel. • Go to the cell FJ2 (Total; Estimate; SELECTED AGE CATEGORIES, 65 years and over). This is the field you need since you will map the percentage of the population over 65 years. Available field names for this table are listed in ACS_15_5YR_S0101_metadata. csv. • Delete all fields except for the first five fields (Id, Id2, Geography, Total population, Margin of Error; Total Population) and the percentage of the population over 65 years (HC01_EST_VC31), and Margin of error of the percent of population over 65 years (HC01_MOE_VC31). • Delete the first row (Figure 6.43). • Change the name of the fields (column header) as shown in Figure 6.44. • Save the file as ACS_15_5YR_Elderly.csv in the extracted folder (aff_download). Close the CSV file.

FIGURE 6.43 Normalize a spreadsheet File 1.

Data Models

FIGURE 6.44 Normalize a spreadsheet File 2.

2.  Download Census Boundary Data (Tracts) from TIGER/Line • Go to the Census Geography site https://www.census.gov/ geo/. • Click Maps & Data. • Click TIGER Products under Geographic Data. • Click TIGER/Line Shapefiles in the table Which product should I use? (Figure 6.45). • Click year 2013 (which is the middle value between 2011 and 2015 of the ACS data). Click Download > Web interface (Figure 6.46). • Select year 2013 and select the layer type: Census Tracts. Click Submit. • Select a State: Illinois. Click Download. • Extract the zip file tl_2013_17_tract.zip.

FIGURE 6.45 Download data from TIGER/Line 1.

169

170

GIS: An Introduction to Mapping Technologies

FIGURE 6.46 Download data from TIGER/Line 2.

3.  Extract Tracts in Illinois Cook County into a New Layer In this part of activity, you will extract census tracts in Cook county, IL using an attribute query. • Create a subfolder named Census in the Lab06 folder. • Copy the shapefile (all files prefixed by tl_2013_17_tract) and the csv file ACS_15_5YR_Elderly.csv into the subfolder Census. • Open an empty map in ArcMap. • Add the shapefile (tl_2013_17_tract.shp) in the working folder in ArcMap. This will show all census tracts in Illinois. Since we are only interested in mapping tracts in Cook County, we will select those tracts using Select by Attribute. • Open an attribute table of the tract layer. STATEFP is the FIPS code for state (17 for Illinois), and COUNTYFP is the FIPS code for county.

Data Models

• Selection > Select by Attributes in the menu. (1) select the tract layer. (2) Double-click the field “COUNTYFP.” (3) Click = button. (4) Click the Get Unique Value button. (5) Double-click “031,” which is the FIPS code for Cook County, IL. (6) Click OK (Figure 6.47). To export the selected tracts into a new shapefile: • Right-click on the tract layer in the Table of Contents > Data > Export Data. Save selected features as ChiTracts.shp (shapefile) in the working folder (Figures 6.48 and 6.49). • Click Yes if prompted for “Do you want to add the exported data to the map as a layer?”

FIGURE 6.47 ArcMap—Select by Attribute; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

171

172

GIS: An Introduction to Mapping Technologies

FIGURE 6.48 ArcMap—Export selected features; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 6.49 ArcMap—Save a layer as a shapefile; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Data Models

FIGURE 6.50 ArcMap—Copy a value from an identify result window; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

• Remove the layer tl_2013_17_tract in the Table of Contents since you will not need it (right-click on the layer in the Table of Contents > remove). • Zoom to the layer ChiTracts. • Using an Identify tool, copy the GEOID of the tract that covers part of Lake Michigan (Figure 6.50). • Right-click on the ChiTracts layer > Properties > Definition Query. Using Definition Query, exclude this tract (where GEOID = ‘17031990000’). means “not equal to,” Make sure the 11-digit FIPS code for the tract is single-quoted in the WHERE expression input box (Figure 6.51). This will show tracts where “GEOID” is not equal to ‘17031990000’, effectively getting rid of the water body that is part of the tracts in Cook County. 4.  Prepare the Census Table for Table Join • Open the attribute table of the layer ChiTracts. You can see that the field GEOID contains the 11-digit FIPS code that is used to uniquely identify each census tract. Close the attribute table. • Add the CSV file ACS_15_5YR_Elderly.csv from the working folder. Spreadsheet files such as CSV, TXT, and XLS files can be added to ArcMap as long as data is in a neat tabular form. While this data is not editable in ArcMap, you can export this table to an editable data format. • Right-click on the ACS_15_5YR_Elderly.csv in the Table of Contents > Open. • Scroll all the way down to see if the data looks OK. You might notice that the last three records have missing values as no populations were reported.

173

174

GIS: An Introduction to Mapping Technologies

FIGURE 6.51 ArcMap—Definition query; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

• Click Table Options (the upper-left corner of the table) in the Attribute table > Export (Figure 6.52). • Export this table to a new table you can edit. Save the table as ACS15_Senior.dbf in the dBase file in the working folder (Figures 6.53 and 6.54). • Click Yes if prompted. • Remove the ACS_15_5YR_Elderly.csv. • Open the table ACS15_Senior. Make note of the values for the field Id and ID2. The last 11 digits of Id is a unique identifier for the census tract. • Right-click on the table ACS15_Senior > Properties > Fields. Select Id and make note of the data type. Do the same for Id2. As shown the field Id is text, and the field ID2 is double (numeric data; Figures 6.55 and 6.56). • Determine the data type of the field GEOID of the layer ChiTracts; it should be text. We will join ACS15_Senior to ChiTracts because we want to retain spatial boundaries for mapping and have attributes from ACS15_Senior appended to the ChiTracts layer. For this table join, it is necessary to have common keys in the same category of data type. Note that the portion of the field Id of the layer ACS15_Senior equals the field GEOID of the layer ChiTracts, and the two fields are the same data type (text).

Data Models

FIGURE 6.52 ArcMap—Export a table from table option; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 6.53 ArcMap—Export data into table; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

175

176

GIS: An Introduction to Mapping Technologies

FIGURE 6.54 ArcMap—Save data as a dBase file; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 6.55 ArcMap—Check data type in the table properties window 1; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Data Models

FIGURE 6.56 ArcMap—Check data type in the table properties window 2; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

For this table join to work, we need to add a new field named TRACT and populate it with 11 characters from the left of the field Id using a string manipulation function. • Table Options > Add Field. • Enter the name TRACT, Text for Type, and 11 for length (Figure 6.57). • This will add a new field named TRACT with the text data type that can store up to 11 characters. • Right-click on the column header TRACT > Field Calculator. Click Yes if prompted. • Enter Right([Id], 11) in the input box. This will return the 11 characters from the right in the field Id and populate the new field TRACT as the output value (11 characters from the right). Click OK (Figure 6.58). Now you have two common attributes with the same data type with the join condition: ChiTracts.GEOID = ACS15_Senior.TRACT. Now you’re ready to join ACS15_Senior to ChiTracts.

177

178

GIS: An Introduction to Mapping Technologies

FIGURE 6.57 ArcMap—Add field; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 6.58 ArcMap—Field calculator; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Data Models

5.  Join the Census Table to the Census Boundary Layer • Open the attribute table of the ChiTracts layer. • Right-click on ChiTracts (target table) > Joins and Relates > Joins. (1) Choose the field in the layer (ChiTracts) that the join will be based on, which is GEOID. (2) Choose the table to join to this layer, which is ACS15_Senior. (3) Choose the field in the table to base the join on, which is Tract (Figure 6.59). Click OK. Click Yes if prompted for the index. This will join ACS15_Senior (source table) to ChiTracts (target table). The result of the table join is that the attributes of the source table will be appended to the target table. • Open the attribute table of the ChiTracts layer (target table). Scroll all the way to the right. You should now see AgeOvr65 (attributes from the source table) appended to the destination table.

FIGURE 6.59 ArcMap—Join data; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

179

180

GIS: An Introduction to Mapping Technologies

FIGURE 6.60 ArcMap—Export data into shapefile; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Note that table join temporarily appends attributes from the source table. To permanently save the appended attributes: • Right-click on the ChiTract layer in the Table of Contents > Data > Export Data. Save all the features as ChiTractsJoined. shp in the working folder. Click OK. Click Yes if prompted (Figure 6.60). • Remove the layer ChiTracts. • Make a choropleth map based on AgeOvr65 (the percentage of the population over 65). You can do this by right-clicking on the ChiTractsJoined layer > Properties > Symbology > Graduated Color and choosing AgeOvr65 in the Value dropdown list (Figure 6.61). You will see that northern part of Cook county has many tracts with a high percentage of elderly people. • Change the coordinate system of the data frame to State Plane Illinois East. To do this, right-click on the data frame called Layers (yellow layer stacks at the top of Table of Contents)  > Properties > Coordinate System > Projected Coordinate Systems > State Plane > NAD 1983 (2011) (Meters) > NAD 1983 (2011) StatePlane Ilinois East FIPS 1201 (Meters). Click OK. • Save the map in the working folder (File > Save).

Data Models

181

FIGURE 6.61 ArcMap—Make a graduated color map; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

References Longley, P. A., M. F. Goodchild, D. J. Maguire, and D. W. Rhind. 2015. Geographic Information Science and Systems. 4th edition. Wiley: Hoboken, NJ. Shekhar, S., and C. Sanjay. 2003. Spatial Databases: A Tour. Pearson: London. Rigaux, P., M. Scholle, and A. Voisard. 2001. Spatial Databases: With Application to GISs. Morgan Kaufmann: Burlington.

7 Web GIS

The web is more a social creation than a technical one. I designed it for a social effect – to help people work together – and not as a technical toy. The ultimate goal of the web is to support and improve our weblike existence in the world. We clump into families, associations, and companies. We develop trust across the mules and distrust around the corner. Tim Berners-Lee, creator of the worldwide web, in “Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web”

7.1 Introduction For most people in developed countries, and a little over half of the world’s population, the Internet and the ecosystems and technology that have grown around it are no longer a novelty but instead deeply ingrained in our daily routines, redefining how we work, live, and play to the point. Generations of people now have grown up with the Internet, and commerce and business around the world from stock trading to retail are now so intertwined with the Internet that many companies depend on it to function. Those who are “digital natives,” a sociological term referring to those who are/were growing up with Internet technology such as the web, social media, and filesharing at a young age, may not find it too difficult to imagine ways to share or feature the products of GIS on the Internet; static maps could be hosted on social media or websites, GIS results could be hosted on a dropbox and shared with a colleague, etc. However, we can go further and leverage the resources, technologies, and frameworks of the Internet and the web with GIS to create dynamic interactive maps, data visualizations that go beyond the capabilities of any static map, GIS that enable active participation from people around the world through real-time data feeds, and other techniques to shape our GIS projects simply not possible or feasible without the Internet. We call this category of GIS “web GIS” because the Internet and the web are not indistinguishable; “the web” describes both the technological and semantic structure that has made a network of connected computers more than simply sending files back and forth. We say “web GIS” rather than “Internet GIS” 183

184

GIS: An Introduction to Mapping Technologies

because before the “World Wide Web” as we know it with websites, search engines, and social media was created, it was still possible for early adopters of the Internet such as universities and government agencies to send GIS projects back and forth to each other using the Internet. The sharing of static GIS products using the Internet is much more convenient and quicker than was possible before its invention, but web GIS enables functionality well beyond simply speeding up the sharing of GIS products. For example, the popular website Google Maps, given as an example due to its mainstream popularity without comment on quality, is more than a GIS map with road and business layers made available through the Internet. Google Maps is a dynamic service that can give transit directions in real time, accounting for live feeds of traffic data or potential road closures and using predictions based on the time of day to tell users what their expected transit times will be. Google Maps also “crowdsources” to an extent, which means taking the knowledge of visitors and volunteers to improve the veracity of their website. Users on mobile devices can allow Google Maps to know their location, and if they pass by or stay near a business where Google is uncertain if the information is up to date, they can be prompted to tell Google Maps if the  listed hours of a business are still correct. The information reported by the user may affirm or update the information for the next person that queries that business using Google Maps. Web GIS and related developments make these features possible. In fact, many lay people have no idea that web GIS are fundamental to many products and technologies they use in their daily lives. Even before web GIS were widespread, the GIS community was quick to predict that the rise of web GIS would change the way that GIS are consumed and produced. With the rapid pace of technological change and the growing number of people with access to the Internet, this process has only accelerated. There now exist GIS platforms that can be run through a browser on a computer or smartphone, replicating and extending the features of desktop GIS programs, without needing to be installed on the user’s computer at all. This is a more recent development in web GIS; contributing factors such as innovations in web technologies and the decreased cost of servers to process user data have made this possible. Web GIS practitioners and developers in the current ecosystem can also rely on and refer to standards that were nonexistent or informal in the nascent years of web GIS. One intimidating downside to the pace of change in the web GIS world is how important it is to keep an ear to the ground regarding new technologies and developments. But a strong understanding of the technologies and paradigms behind the Internet and the web helps. “HTML,” first proposed in 1989, is still used to serve web pages to browsers to this day, albeit with major changes over the years. We will go over the history of web GIS and the related innovations and technologies powering the Internet to demystify some of the fundamentals of how web GIS function and give an overview of popular web GIS platforms and software packages. We will also discuss popular web GIS applications and techniques, as well as more niche web GIS

Web GIS

185

of special interest to some GIS practitioners. Finally, we discuss current and emerging trends in web GIS along with resources and suggestions on how to continue with web GIS beyond this chapter.

7.2 Web and Internet Fundamentals Before we explore web GIS, a basic understanding of Internet technology and history is helpful. When we discuss GIS in the context of desktop software, we do not typically go in depth with explanations of how a computer processor works, or who the first person to invent the computer was. In part, this is because GIS at an introductory level, with some exceptions, allow abstraction from the inner workings of a computer through the help of a GUI (graphical user interface). The capabilities and rapid progress of web GIS are directly tied to Internet and web history and functionality, so understanding the basics of the web is necessary to appreciate how many web GIS techniques actually work. We begin with a broad overview of what happens when you search “GIS” on a search engine such as Google. The first real step is to open an “Internet browser” such as Google Chrome, Mozilla Firefox, Safari, or Internet Explorer. Each browser has its own quirks and features, but they all share the same core functionality of requesting and serving web pages and other content to users. When we type in “google.com” in our Internet browser, we are requesting the Google web page to be delivered to us. A conventional web page consists of HTML, which stands for “HyperText Markup Language,” CSS, which stands for cascading style sheets, and optionally, a web programming language such as JavaScript. Put another way, computer programming code organizes the content of the web in forms of text, images, and video, and packages it as a web page delivered to your browser. Your browser then “renders,” which means to interpret the code and content into a visual medium, such as the Google logo with the search bar below it. Before being sent and rendered, of course, the Google web page has to be found. Web pages are hosted on servers; when you type google.com into your browser, your browser is saying “Find me the web page corresponding to the address google.com” and the DNS, or domain name service, responds with the hosting server for the web page and retrieves it for rendering. “Google. com” is a URL, or uniform resource locator, designed to be human friendly and readable, while the actual IP address where Google’s homepage may be located is a series of numbers such as 8.8.4.4. An IP address is much like a physical home address, except for computers, allowing people to send and receive data from different IP addresses. The DNS works like a directory or a map, making sure that even if the actual IP changes, the intended recipient will be reached.

186

GIS: An Introduction to Mapping Technologies

Requests for web pages are made using HTTP (the HyperText Transport Protocol). Once requested, the Google web page is sent to you via what is called TCP/IP (the Transmission Control Protocol/Internet Protocol). Everything from email to the web relies heavily on TCP/IP, which has been mostly unchanged since its inception in the 1980s. TCP/IP allows data (including the web page and content) to be sent to and from the browser using packets. If you have ever clicked on a website and seen “404 Page Not Found,” you have encountered an HTTP error. The 404 HTTP code is a response from the server saying that the resource requested could not be found. A 200 HTTP response, on the other hand, means that the request was successful. We have skipped a great deal of detail in this overview of how a browser tracks down a web page, and how a web page gets from a server to you, but it is enough to set the foundation for this chapter. In the Google Maps example we discussed earlier and in the web GIS examples and techniques we will share later in this chapter, the above steps are key to how interactive maps are updated by users, transferred between servers and to users, and rendered by browsers. Figure 7.1 shows a screenshot of the OpenStreetMap web page. OpenStreetMap functions like Google Maps or other popular consumer mapping applications, allowing users to put in an address or location and

FIGURE 7.1 A screenshot of an OpenStreetMap, made possible due to OpenStreetMap volunteers.

Web GIS

187

find it within the application, or generate transit directions between two locations. It is open source, meaning that much like the popular online encyclopedia “Wikipedia.org,” anyone can sign up and contribute. Figure 7.2 shows a screenshot of the OpenStreetMap using developer tools that show the HTML code of the homepage. In the top address bar, marked by a rectangle and a “1,” notice that the URL has the OpenStreetMap address (www.openstreetmap.org) but also ends with a sequence, “#map=17/41.81317/87.59060.” These numbers refer to the zoom of the map (17, in this case) and determine how zoomed in the map should be for the user as well as the coordinates of the center of the map. In this case, the center of the map is at coordinates 41.81317, -87.59060, in Web Mercator projection. This centers us on the South Lake Shore Drive of Chicago, Illinois. For the most part, when using web GIS and web mapping applications, you can assume that the projection used is “Web Mercator” due to the influential Google Maps in the early 2000s adopting it as their convention and web GIS platforms following suite. On the right-hand side marked with the rectangle and the “2,” you will notice a block of HTML code, starting with the phrase “OpenStreetMap.” This is how HTML documents are organized, with “tags” that enclose content and HTML code, such as , or . Each tag has semantic or functional meaning; some tags are designed to tell the browser how to organize content, while other tags such as are intended to communicate to anyone looking through the HTML code what the tag is enclosing.

FIGURE 7.2 A screenshot of an OpenStreetMap, made possible due to OpenStreetMap volunteers. The HTML page is made visible using the Google Chrome browser’s developer tools.

188

GIS: An Introduction to Mapping Technologies

In Figure 7.3, we have investigated the content that is being sent over the network to our browser, and selected a “Tile” being sent to us, called “48746. png.” You might recognize this tile segment, shown in the right pane in Figure 7.3, as being from the very scene we are centered on; if we clicked on the other PNGs in the list, such as “48749” and “48748,” we would be able to assemble the whole scene as it appears in our browser. This is done for efficiency purposes. Although the scene appears to be vector data, it would be both computationally and network resource intensive to draw and redraw vector layers in the browser across different scales. Instead, OpenStreetMap sends pictures of the tiles in PNG (the “portable network graphics” format) to the browser. PNG data can be considered “raster” data, but note that some interactive web mapping applications do use vector tiles rather than raster tiles. If we zoomed in to another level, we would receive a whole new set of PNG tiles from OpenStreetMap at the new scale. Figure 7.4 shows the results of zooming in one level. First, in the address bar, you can see that the URL has changed from “www.openstreetmap.org/#map=17/41.81317/-87.59060” to “www.openstreetmap.org/#map=18/41.81291/-87.59109”; the scale has changed so we are requesting a new web page at a new scale and center that is at slightly different scale than our previous scene view. In the right panel of Figure 7.4 the list of resources sent from the OpenStreetMap server to our browser can be seen. There are many PNGs, all with the “Status” of 200, which is the HTTP response code that indicates everything was sent correctly. All the PNGs with the naming conventions “97495, 97494, 97493,” etc., are tiles that represent the scene at our current scale and place of interest.

FIGURE 7.3 A screenshot of an OpenStreetMap, made possible due to OpenStreetMap volunteers. We see a “Tile” from the OpenStreetMap server being sent over the network using the Google Chrome browser’s developer tools.

Web GIS

189

FIGURE 7.4 A screenshot of an OpenStreetMap, made possible due to OpenStreetMap volunteers. We see a list of resources being sent via the OpenStreetMap server, which are all Raster tiles in the PNG format.

Tiled web maps are commonly used in web GIS, though different providers and services may use custom methods for generating and providing them. As mentioned, an advantage to using tiled services is the ability to only send what is needed in the scene of focus for the user, with efficiently compressed tiles that do not demand large amounts of data sent back and forth between the browser and the server. Because OpenStreetMaps is an open-source platform, anyone can use and download its datasets, but if you were to download the entire dataset of the world, at the time of writing it would be a file of over 940 GB, which would be unfeasible, inefficient, and costly to send to the browser. Even a subset of the data, say, the state of Illinois, would be several gigabytes in size and take a significant amount of time to transfer on a reliable Internet connection. The less data that has to be transferred from the user, the faster the web GIS application will function, giving the user the impression of a fast, streamlined web GIS application. Although much of the work involved in keeping web GIS applications running smoothly and efficiently is due to the programming and design of the web GIS application, a web GIS practitioner seeking to create and share web GIS maps should keep the following in mind to ensure that their maps perform properly and efficiently. Inefficient web maps not only annoy users who have to wait for their content to load, but can also incur costs, either in terms of website hosting or “service credits” (as in the case of the soon-to-be discussed ArcGIS Online platform). Inexperienced web GIS practitioners sometimes design maps without thinking about how much unnecessary or inefficient data is being sent to users. An inexperienced web GIS practitioner may put a web map online that has ten or more data layers, such as local fire stations, bus stations, health

190

GIS: An Introduction to Mapping Technologies

clinics, etc., all turned on by default. This means that when users view the web map, they will be stuck waiting for all the layers to load and display in their browsers. It is better practice, for instance, to allow users to instead turn on layers of interest to them when they demand it.

7.3 Web GIS History The rapid technological changes from the web in the 1990s to the present are still incremental; year after year Internet browsers are updated and Internet technologies such as HTML have changed version as well. As of 2018, HTML is version 5, also known as HTML5. Nonetheless, many have marked the differences between the “old” and the “new” web by referring to Web 1.0 and Web 2.0, respectively. Beyond the technological advancements of web technology, “Web 2.0” is also used to refer to the different ways and habits of using the web that have come with those advancements, such as social media. An in-depth explanation of the web design and web technologies more common in the 1990s versus the 2010s is out of scope for this book, but we can do a compare and contrast of the original MapQuest homepage from 1997 and the MapQuest homepage as of 2018. MapQuest was one of the first widely known and used web GIS applications, with millions of users in 1997. Figure 7.5 shows the MapQuest home page from 1997, courtesy of the Internet Archive “WayBack Machine.” When we compare it to Figure 7.6, which shows the MapQuest homepage of 2018, we can easily see that the style and design have changed significantly. In Figure 7.5, the website is built using HTML and frames, and featuring ActiveX and Java to serve the interactive maps. In Figure  7.6, the MapQuest homepage uses HTML5, with responsive web design, JavaScript, and CSS. With the 1997 design, navigating content required redirecting the user to new pages each time, but the 2018 design in Figure 7.6 dynamically redraws the webpage if buttons and layers are selected. This is made possible using the combination of HTML5, CSS, and JavaScript. We note here that although Java and JavaScript have similar names, they have widely divergent histories. Java is a programming language that is typically used “server-side,” meaning that the browser sends a request to the server hosting the web page, and the server runs Java code to generate a response and send the result back to the browser. JavaScript has a complicated history but was designed to be run inside the user’s browser and not only on servers, and has become the web programming language of choice over the years. Between 1995, Adobe Flex, Java, Microsoft Silverlight, ActiveX, Flash, Perl, among others, were variously used to power interactive web maps. Several are still used today on servers hosting web GIS applications, or powering legacy applications (legacy

Web GIS

191

FIGURE 7.5 The MapQuest homepage of 1997, courtesy of the Internet Archive: WayBack Machine service.

applications refer to applications that are kept running even after superior technology has been developed, typically due to the cost of migrating to a new system, such as in government agencies that have made large investments into older technology). Figure 7.7 shows the 2018 MapQuest page once we enable web layers and click on navigational buttons. As can be seen, on the right side the “traffic” bar expands to allow clickable icons that enable “Traffic Slow Downs,” “Traffic Incidents,” and “Traffic Webcams.” If we clicked on these checkboxes, the webpage would dynamically redraw and add those layers to the screen while keeping our place. This has been done with the “Pharmacy” tab in the top of the image, with pharmacies being visible in the map. This exemplifies one of the points noted earlier; rather than enabling all layers in a map, allowing users to indicate the layers they want to see cuts down on the resources used to send data to users. Figure 7.8 shows another innovation of the recent web, “Responsive design.” Responsive design refers to designing web pages and content in a way that conforms to the user’s screen size, whether it is a desktop computer, a small laptop, or a mobile phone. In Figure 7.8, we shrunk the screen to the space that a mobile phone would take,

192

GIS: An Introduction to Mapping Technologies

FIGURE 7.6 The MapQuest Homepage as of July 2018. 

FIGURE 7.7 An example of MapQuest’s dynamic web page.

Web GIS

193

FIGURE 7.8 An example of MapQuest exhibiting responsive modern web design elements with automatic adjustment to window resizing.

and the icons and sizing of the web page change accordingly. Tapping or clicking the three circles on the top right will show all the menu items hidden by the smaller screen. If we attempted to shrink the screen of Figure 7.5 instead, it would not rearrange the content to fit this smaller space but would simply hide most of the content and the user would have to manually scroll to see it. Responsive design is made possible thanks to HTML5 extending the original HTML code to allow for relative sizing. For the rest of this chapter, we will focus on current web GIS applications, as well as on emerging trends and technologies, but from time to time we will look to the past to give context to and help explain how certain web GIS ecosystems developed.

194

GIS: An Introduction to Mapping Technologies

7.4 Mainstream Web GIS Applications A web GIS is a massively diverse ecosystem with many product offerings, from the proprietary products that require payment to the open-source web GIS platforms that can be implemented on a web server without cost besides hosting expenses. Google Maps, Bing Maps, MapQuest, and OpenStreetMap are examples of web GIS platforms geared toward consumers rather than GIS professionals. Someone using Google Maps is expected to query locations of interest for navigational purposes, and the design of the service is geared around making that user experience as seamless as possible. Since commanding a market share for this type of web GIS service is expensive in terms of running servers due to the millions of visitors, most examples of this type of web GIS service are monetized. (OpenStreetMap is a notable exception here, but the market share and thus visitor demand of services of OpenStreetMap is much lower than the proprietary services mentioned here, allowing it to be supported via donations). Google Maps, Bing Maps, and MapQuest generate revenues through advertisements; visitors generate data that can be sold to advertisers, and typically also view advertisements themselves while using the services. Quick and efficiently generated transit directions are in demand by the casual user just as much as a GIS professional, but the above-mentioned services provide plenty of functionality for power users hidden from the casual user. We focus on Google Maps and OpenStreetMap when discussing advanced functionality before moving on to some other web GIS platforms that are designed with the GIS practitioner in mind. Google Maps and OpenStreetMap provide advanced functionality through application programming interfaces, or APIs. GIS developers, a subset of GIS professionals who build GIS applications and solutions beyond analysis, often turn to the APIs of web GIS products to implement GIS solutions that blend their own code with the products of other companies. Figure 7.9 shows an example advertisement for the Google Maps API, which aims to “Help users discover the world with rich location data for over 100 million places. Enable them to find specific places using phone numbers, addresses, and real-time signals.” GIS developers pay per-usage for access to the API, which allows functionality to be provided via Google Servers and licensed products rather than by a company’s own servers. Similar APIs exist for Bing, OpenStreetMaps, and other web GIS platforms. Many web GIS applications are aimed primarily at GIS users and website designers, as well as data analysts and other map producers, rather than the general public. Among this category are platforms MapBox, CARTO, and ArcGIS Online, to name a few web GIS providers that provide hosting and access to services for a cost. All three services provide “freemium” service,

Web GIS

195

FIGURE 7.9 Google API “Places” javascript library.

which means that up until a certain level of functionality, the platforms can be used free of charge. After that, users must pay to these services. Because users of these services do not have to worry about the specific server their content is being hosted on, or manage their domain address or other common web hosting tasks, we consider these GIS services as hosting people’s content “in the cloud.” The following gives an overview of each of the three services: 1. CARTO, formerly called CartoDB, advertises itself as being especially compatible with third-party data science sources, such as socioeconomic data and information drawn from databases. For a fee, users can combine the CARTO web mapping platform with live feeds of social media data, Vodafone, Mastercard data, and other sources. Interactive web maps are designed within the browser instead of through a desktop application. Although many features can be accessed and used by pointing and clicking, an understanding of HTML, CSS, and JavaScript allows more complex usage of the service, as seen in Figures 7.10 and 7.11. 2. MapBox also offers third-party integration with data feeds and allows the creation of interactive web maps through “MapBox Studio.” There are a variety of MapBox APIs that can be used to

196

GIS: An Introduction to Mapping Technologies

FIGURE 7.10 Example javascript code from Google Maps API.

FIGURE 7.11 A basic cholorpleth map being made within CARTO using OpenStreetMap tiles.

create and host web map applications. Like CARTO, an understanding of HTML, CSS, and JavaScript opens up more possibilities for making maps within MapBox Studio. 3. Like CARTO and MapBox, ArcGIS Online allows for third-party data feeds, advanced usage using the ArcGIS Online API, and the creation of interactive maps within the browser. However, ArcGIS Online also has the advantage of integration with desktop ArcGIS

Web GIS

197

products, and is anecdotally friendlier to general users without a web development background. This comes at the cost of some limitations with customization; the themes and templates provided within ArcGIS Online are limited in how they can be used and customized compared to some of the other platforms when used by someone with more experience with HTML/CSS/JavaScript. An odd duck out here is QGIS Cloud, which provides hosting for maps produced using the open-source desktop GIS program, QGIS, allowing users to design web maps within QGIS and share on the QGIS cloud using a plugin. Since it does not allow the creation of web maps within the platform like the above three examples, it is more a service to extend QGIS and a hosting platform than a complete web GIS platform. Finally, there are several projects that provide JavaScript API and web GIS applications intended to be hosted on one’s own infrastructure (rather than “in the cloud”). To use the following web GIS projects, a GIS user typically has to be comfortable with JavaScript as well as have the ability to host a website or web server (or otherwise delegate it to another person). Leaflet is a popular, lightweight JavaScript library. A JavaScript library wraps up code in modules that can be accessed by name, saving a GIS developer or website designer from having to write code from scratch to make chloropleth maps from data, create popups for points, etc. Unlike an API, anyone can see the underlying code in the Leaflet library and make changes or modifications to suit their own purposes. CesiumJS is another example of a JavaScript library that provides web mapping features; CesiumJS is primarily centered around 3D visualizations and scenes. While Leaflet is simple and lightweight, CesiumJS is more “heavy duty” and can be used to visualize 3D imagery in web maps. Finally, “Three.js” is a JavaScript library designed around 3D imagery, with extensions for web GIS purposes. Using APIs and JavaScript libraries allows the creation of web GIS products for specialized purposes. When promoting “Independence Day 2,” Fox Studios created two companion promotion websites, “www.­ independencedaymystreet.com“ and “www.warof1996.com/.” The first website used the Google Maps Street View API, which generates a 360-degree panorama view of a given location (to see the normal usage of Google Maps Street View, navigate to https://mapstreetview.com/), and draped battle scenes and custom visualizations over the street view imagery to make it seem like a battle was being waged. The “War of 1996” web page used Three. js to create a 3D visualization of the earth, but with custom markers denoting events of the movie franchise series. These are just two examples, but the web is full of hobbyists, nonprofit organizations, and companies creating GIS products to promote their brand, ideology, or just to have fun. Although not many have ever opened a desktop GIS, many people can say they have looked up a satellite view of their home in an interactive web map. It is an exciting time for web GIS!

198

GIS: An Introduction to Mapping Technologies

7.5 Desktop Web GIS Exercise One of the innovations that web GIS has brought to the table is the capability to run GIS operations completely within an Internet browser to create and manipulate maps. However, even with the advances that have been made in web GIS, desktop GIS software (i.e., software installed on a personal computer) typically still has more advanced capabilities than online web mapping platforms. ESRI ArcMap supports publishing GIS maps created in the desktop, directly to the web. ArcMap maps can be published to an ArcGIS Online personal or organizational account, or to an ESRI GIS server. An ESRI GIS server, called an ArcGIS Enterprise/Portal for ArcGIS server, allows users to host ESRI maps using the ArcGIS Online API on the desktop rather than in the cloud, as well as allowing access to certain extensions and capabilities unavailable through ArcGIS Online. Because setting up and using a GIS server is an advanced topic, here we focus on the ArcGIS Online cloud platform as it allows any user to create a trial account to follow along. To create an account click “Sign in” at the top right of “www.esri.com/en-us/home,” and request an account. To follow along with this exercise, you should have ArcMap installed on your computer. You should also have downloaded the lab data accompanying this chapter, “Hybrid_Electric_Taxi_Chicago.” It has already been geocoded. There are two shapefiles: Hybrid_Electric_Taxi_Chicago and Hybrid_Electric_Taxi_Subset. The first is every hybrid and electric licensed taxi as of July, 2018 in Chicago. The latter is a subset of electric and hybrid taxis for the Albany Park and Lincoln Square neighborhoods only, and is a substantially smaller dataset (197 records versus 5879 records). If you are having trouble with the larger dataset, or are trying to budget your ArcGIS Online credits, follow along using the smaller dataset. Figure 7.12 shows the login screen that is presented when the user goes to “File – Sign In” from within ArcMap. Notice that the option is given to sign in with “Enterprise Login”; if your account is associated with an enterprise organization (such as a university or corporation) be sure to sign in through that tab. In this scenario, we have a layer of public chauffeurs (nonrideshare taxi drivers) with hybrid and electric cars in Chicago, courtesy of the Chicago Data Portal. This layer is a subset of the original dataset that contained all licensed taxi drivers in the city and was tens of thousands of entries. Normally it is not a bad practice to create views of our data but leave the overall layer intact in case we want to return to looking at the whole dataset rather than a subset, but in the case of pushing this dataset to an online map, it will be slow and inefficient to publish the entire dataset if we know we will only be looking at electric and hybrid vehicles. In Figure 7.13 we expand the “File” menu to show the “Share As” option, which allows us to choose between Map Package and Service. The “Map

Web GIS

199

FIGURE 7.12 ArcGIS sign in page for ArcGIS Online from the ArcMap program; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Package” deserves brief mention here as it is one of the best ways to contain and share a dataset created in ArcMap. Most beginners in ArcMap have on occasion run into the issue of saving a map document (MXD) without realizing that the data layers used in the MXD are not packaged together. Instead, to move a GIS project from place to place, or send it to other people, it is best “zip” the files together so that all the data layers are together. But even this can cause problems if the original directory structure is not maintained (ArcMap MXD files when opened will show red exclamation points until the user manually links them to the data layers that are being referenced). A map package (MPK) solves this issue by wrapping the project and all the relevant files in the data frame into one single package that can be uploaded to ArcGIS Online or saved to disk. When a map package is double-clicked by a user with ArcMap installed on their desktop, it will automatically open ArcMap and the project with the data layers intact. However, one of the problems with the map package is that it is limited to the ESRI product suite only; trying to open a MPK with QGIS will not work. The QGIS ecosystem does have their own version of Map Package called “GeoPackage,” and it accomplishes the same goal. Figure 7.14 shows how to create a Map Package.

200

GIS: An Introduction to Mapping Technologies

FIGURE 7.13 “Share As” dialogue for Map Package and Service publishing options; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

But for our purposes we will instead choose Service, so that we can push this dataset about hybrid/electric taxis to our ArcGIS Online account. Figure 7.15 shows the dialog for publishing to ArcGIS Online; you can specify a GIS server or a different organizational/personal ArcGIS Online account at this stage if desired. Figure 7.16 shows the initial “Service Editor” dialog, as well as the attribute table for the Hybrid/Electric taxi dataset overlaid. There are five tabs on the left-hand side of the dialog box: Parameters, Capabilities, Feature Access, Item Description, and Sharing. On the top right we see “Import,” “Analyze,” “Preview,” and “Publish.” The Parameters tab has one item of interest for us now, the Properties, which reads “Maximum number of records returned by the server.” If we leave this at the default value of 1,000, we will have a problem as the records in this dataset exceed that number, as shown in the attribute table. Of course, we could be careful to zoom in to such a level that upon opening the web map not all records would be returned at once. Nonetheless, in this case we want to be sure that if a visitor wants to view all records that there will not be a limit on the server end keeping them from retrieving all 5,879 records.

Web GIS

201

FIGURE 7.14 Map Package publishing dialogue. This dialogue will not be available if you are not signed in to ArcGIS Online first; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 7.15 Publish a Service dialogue. You can select between different connections, or click the icon with a plus sign to add a new connection, as well as set the service name; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

202

GIS: An Introduction to Mapping Technologies

FIGURE 7.16 Parameters tab of the Service Editor. This allows you to set different options for the map service you are publishing; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Next, is the Capabilities tab shown in Figure 7.17, where we see the options to enable “Tiled Mapping” or “Feature Access.” Since Feature Access is enabled by default, it shows up as the next tab. We will not enable Tiled Mapping as we are dealing with point data; if we were instead trying to push a raster dataset to ArcGIS Online we might consider Tiled Mapping as an ideal solution. Figure 7.18 shows the “Feature Access” tab. Note that it refers to a “REST URL,” which stands for “Representational State Transfer,” and is the method that the ArcGIS Online JavaScript API uses to make requests between the client (the desktop or mobile browser) and the server sending out the web map data. “Create, Delete, Query, Sync, and Update” refer to the potential actions a client can make on the dataset. Typically, beginners should be sure that Query is enabled, and leave the rest unchecked. If “Create” were enabled, someone with access to the web map could submit new data to the layer, or if “Delete” were enabled, an attribute could be deleted. There are use cases for this type of behavior, such as with collaborative web maps, but “Query” enables a user to have the features sent to and rendered by the browser, which is the most typical scenario.

Web GIS

203

FIGURE 7.17 Capabilities tab of the Service Editor; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

The Item Description tab prevents users from publishing a map service until the Tags and Summary are filled out in this section, ensuring that map services without any explanation or documentation are not published. It is good practice to give a thorough explanation of the dataset and where it came from here, especially in a larger GIS organization where others may use your map services and want a way to understand them and the data they contain. Finally, the last tab to be completed is the Sharing tab, as shown in Figure 7.19. In this example, we have the option to share it with “My Content,” “DePaul University,” and “Everyone (public).” If we share it to My Content only, we will be the only people able to see it, whereas if we share it with DePaul University (if you belong to another organizational ArcGIS Online account, the name of your organization will be listed here instead) then only people with ArcGIS Online accounts associated with DePaul University will be able to view the content. Finally, if we click Everyone (public), it will be fully accessible to the public. Depending on the intended audience of the data, you should be very careful with these settings. If you are working with very sensitive data, mistakenly clicking “Everyone” will allow anyone to find and

204

GIS: An Introduction to Mapping Technologies

FIGURE 7.18 The Feature Access tab allows you to set which operations such as delete, update, and sync, for a map service; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 7.19 The Sharing tab allows you to set who the map service will be available to, or whether it will be available to the public; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

use your data. Conversely, if you limit the audience, but attempt to embed the dataset in an interactive web map for public consumption, the sharing properties will default to “least privilege,” meaning that if any item is limited, it will not be overridden and embedded in a publicly available map.

Web GIS

205

FIGURE 7.20 When preparing a map service through ArcMap, the program will warn you of serious errors that need correction before publishing. Right clicking on an error will give options to resolve the error, as well as link to help files that explain what the error is; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

We are now ready to analyze our map. The “Analyze” button allows us to see if we have any serious errors with our dataset. In this case, as shown in Figure 7.20, we have several issues that will prevent us from successfully publishing the map without changes. Notice the “Code” next to each error; ArcGIS “Help” resources are helpful for tracking down assistance with error codes not mentioned in this book. High: Unresolved: 00098 Missing Tags in Item Description High: Unresolved: 00097 Missing Summary in Item Description These two errors pop up frequently. It is easy to forget that ArcMap will not let you publish a map without tags and a description, so remember to fill out the item descriptions before attempting to publish. High: Unresolved: 00037: Basemap Layers cannot be published directly to a service (2 items) In this ArcMap document, we loaded an online basemap to help with visualization of the taxi license locations. However, since basemaps are themselves a service, be careful to remove any basemaps services from the layer list before attempting to publish. Medium: Unresolved: 10045: Map is being published with data copied to the server using data frame full extent. Issues like this can often be resolved by right-clicking on the item in the list and seeing what options pop up. In this case, right-clicking will take you to the Data Frame Properties dialog where you can set the data frame to only be the extent of the dataset we are uploading (rather than, for instance, the entire planet). Since our dataset is entirely within Chicago, it is sensible to

206

GIS: An Introduction to Mapping Technologies

FIGURE 7.21 Full Extent dialogue of ArcMap; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

limit the data frame extent. Figure 7.21 shows the extent dialog that allows us to limit the extent to the outline of the layer of interest. Low: Unresolved: 30003: Layer draws at all scale ranges (2 items) This message warns that, if published in the current state, users will be able to zoom in and out and the map will redraw each time. Layer properties allow you to change this behavior and specify the exact scale ranges. Think back to our example of the OpenStreetMap scales; drawing a county road would be computationally pointless if we are zoomed out so far that we can see the entire region. In this case, we should limit the scale to be the overall city of Chicago.

7.6 ArcGIS Online Exercise If you have followed along with the previous steps, you should have the “Hybrid and Electric Taxi Chicago” dataset in your ArcGIS Online account. Under “My Content,” click the item and then click “Open in Map Viewer” on the right-hand side, which will open up the map viewer that allows changes and analysis, as seen in Figure 7.22, and click “Add to new map.” You may want to click the “Save” button and name the map something you will remember so you can come back to it. The Map Viewer allows us to create web maps, which can combine different layers and apply analysis or stylistic changes to them, before finally publishing the maps.

Web GIS

207

FIGURE 7.22 The web map after navigating from the My Content page (you should, of course, be looking at your own “My Content” page, not the “My Content” page of the authors… an important distinction!); from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Note that ArcGIS Online is a changing, evolving service, and these steps could change. Visit “learn.arcgis.com/en” for an ESRI-maintained repository of tutorials, some of which are completely free to the user, if you run into problems. The Help files can also help in these cases too. Even though we previously published the layer from ArcMap, take a look at the “Add” button, shown in Figure 7.23. Clicking this button gives the following options: Search for Layers, Browse Living Atlas Layers, Add Layer from Web, Add Layer from File, and Add Map Notes.

FIGURE 7.23 The add button in the Map Viewer area of ArcGIS Online; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

208

GIS: An Introduction to Mapping Technologies

Search for Layers allows you to look through layers from your own content, your organization’s content, or any content available through ArcGIS Online that is public. For instance, after clicking “Search for Layers” and then “ArcGIS Online,” typing “police” and searching will give several layers from different ArcGIS Online users. “Browse Living Atlas Layers,” meanwhile, is a more curated list of layers maintained by ESRI. Add Layer from File and Add Layer from Web allow you to upload layers from your desktop or from a GIS server, respectively. Add Map Notes allows you to draw points, polygons, and lines directly onto the map with the option to create a feature class from the shapes. Figure 7.24 shows the “Add Layer” dialog box; note that although you can add shapefiles, they have to be enclosed within a zip file before being uploaded to the ArcGIS Online platform. Figure 7.25 shows what would happen if you added the shapefile we have been working to ArcGIS Online rather than preparing it through ArcMap first. The file is too large. The problem here is that we are trying to add a data layer directly into a web map; the proper method is to “host” the layer by adding it from My Content, as shown in Figure 7.26, and then by adding the layer from the Add button within the web map. Rather than uploading a large dataset as a layer in a web map, we are hosting the layer through ArcGIS Online, then making it available as a layer that can be imported. In addition, this method bypasses ArcMap altogether and shows that ArcGIS Online can be used alone to import and analyze data. This is ideal for systems that cannot meet the system requirements of ArcMap, but the tradeoff is that some settings (such as allowing the server to return more than 1,000 records) are either hidden or do not exist when publishing directly to ArcGIS

FIGURE 7.24 The “Add Layer” dialog box. Notice that you can add various GIS datasets in addition to shapefiles. You are encouraged to consider the disadvantages and advantages of the different file formats discussed here; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Web GIS

209

FIGURE 7.25 ArcGIS Online prevents the large (5,000+ record) dataset from being added to the web map before becoming a hosted layer first; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 7.26 Hosting a large layer directly through ArcGIS Online without using ArcMap as an intermediary; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Online. Another advantage is that ArcMap supports Python scripting, allowing GIS users to programmatically prepare and upload interactive web maps through ArcMap with hard-coded (or relative) settings. If you are working with simplified or small datasets, these limitations may not be of much concern. If you are not still in the Map viewer, go back to the web map (denoted in My Content as a “Web Map”). Click the “Add” button and then navigate to

210

GIS: An Introduction to Mapping Technologies

Living Atlas: type in “population density 2018” (as shown in Figure 7.27). As the description points out, the 2018 USA Population Density layer shows population density by county to block group (depending on zoom-in level). We may be interested in seeing whether electric/hybrid taxis are centered north side rather than south side as an exploratory analysis, and adding a population density layer may help. There are pros and cons to gathering layers through the Living Atlas or sifting through ArcGIS Online; sometimes they save time, as with adding a population density layer directly rather than having to do field calculator on raw US Census datasets, but other times the data may be unreliable, overly simplified, or have other issues. Be sure to examine the metadata of any datasets you intend to use to ensure they are appropriate for your analysis. Figure 7.28 shows the Analysis tab (near the Basemap button on the lefthand side). Expand “Analyze Patterns” and then click “Calculate Density.” Figure 7.29 shows some suggested settings. The ArcGIS Online Analysis tools are very similar to the “Kernel Density” and “Point Density” tools used in ArcMap Desktop. Figure 7.30 shows the resulting density map (which is in raster format and clipped to the furthest extent of the point data, similar to the equivalent ArcMap tool). This cluster map has three apparent clusters, and one of them is placed near the O’Hare airport (it is on the far left, bisected). If we were basing our exploratory analysis on the assumption that taxis should be allocated in a negative or positive spatial correlation across

FIGURE 7.27 Living Atlas of ArcGIS Online. Searching through the Living Atlas provides datasets such as USA Population Density; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Web GIS

211

FIGURE 7.28 Hybrid_and_ElectricTaxi_Locations web map through ArcGIS Online; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

the city, this could be the first big clue that variables are missing from the equation. It makes intuitive sense that taxi depots would be located closes to airports for efficiency purposes. The largest cluster, to the right of O’Hare Airport, is centered around the Kennedy Expressway. Near the center of the cluster is a business park where several trucking companies and related services are also located due to easy access to the highway. Again, it makes sense that taxi companies would cluster near easy access to a highway, allowing them to extend their range for dispatch. There is one other factor this analysis is missing, which is perhaps ironic given this chapter’s focus on web GIS. Many people now rely on decentralized ride-sharing services such as Lyft and Uber, which use mobile phone applications, GPS, and GIS to find efficient paths between riders and drivers. These are not (as of this writing) licensed as taxi drivers. Like the disastrous polling efforts of the Chicago Tribune prematurely predicting “DEWEY DEFEATS TRIBUNE” based on erroneous polls, if we continued to assume that our dataset shows only licensed taxi cabs tell the whole picture of hybrid/electric taxi rides, we would be wrong. We can now wrap this up. Click on “Share,” which should bring you to a window like the one in Figure 7.31. Click the button on the right that says “Create a Web App.” Scroll down and click “Live Maps,” and then click “Done” to create the Web App. This web app allows you to pull in live feeds of datasets as a layer, which populates over time given the search parameters. Figure 7.32 shows an example of a live feed that has been set up to include any tweets using the phrase “Prius.” A company or organization may set

212

GIS: An Introduction to Mapping Technologies

FIGURE 7.29 Calculate Density option of the Analysis tab within ArcGIS Online; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

up a live feed to track public tweets of interest to build their marketshare, or given the nature of public data, law enforcement may track for keywords that might indicate criminal activity or cyberbullying, for example.

7.7 Emerging Trends in Web GIS and Advanced Topics In the last decade mobile phone technology has developed so rapidly that today smartphones are comparable to the computers of yesteryear in terms

Web GIS

213

FIGURE 7.30 Results of the Calculate Density tool in ArcGIS Online; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

of processing power. Mobile networking technology has also increased, allowing in some cases (with 4G LTE cellular tower access) speeds exceeding the domestic broadband of some markets. Mobile GIS with GPS receivers were components of fieldwork and surveying, as well as other adjacent fields, well before mobile computers were at today’s level of sophistication. The increasing sophistication of consumer cell phones is gradually democratizing mobile GIS beyond specialized use. For example, Pokemon and its antecedent, “Ingress,” are mobile video games developed by Niantic Labs (licensing the Pokemon property from Nintendo) that use a smartphone’s internal GPS receiver, processing power, and data connection to create augmented reality worlds, where players move about the real world to catch

214

GIS: An Introduction to Mapping Technologies

FIGURE 7.31 Share dialogue within ArcGIS Online, which allows you to set who can view a web map; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

FIGURE 7.32 Live Feed of Twitter, accessed through the social media feed features within ArcGIS Online; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Web GIS

215

creatures in the digital world. The Pokemon Go game generates the “pocket monsters” by cataloging real life places of interest and peppering Pokemon in real-time around those locations. Players have their GPS location tracked while in game, and when they are close to the Pokemon, have the opportunity to play a short mini-game that gives them the chance to collect one of over 150 different varieties of Pokemon. This popularity of this game may hint at the future of web GIS. While augmented reality applications that use GIS data and the movements and behaviors of people promise exciting applications from disaster relief to improvements in quality of life, many web applications, GIS and otherwise, consume and disseminate a large amount of user-generated data, not always with the best of intentions, raising privacy concerns for consumers and corporations alike. As web GIS become easier to use in the browser, and more tools are made available to use web GIS platforms or build one’s own fully fledged web GIS site from scratch, we are also seeing more unexpected and unorthodox uses of web GIS, for political purposes, such as crowdsourcing volunteered GIS information. “FashMaps,” a website that maps known or estimated locations of Neo-Nazis around the world, was built using the open-source Leaflet and keeps an up-to-date list of Neo-Nazis and their estimated locations, to keep people informed of possible far-right extremists in their neighborhoods. During a period of student protests in London in 2011, protesters created a mobile web GIS application suite called Sukey to help protestors avoid kettling (methods police use to control protester movements and shift them into a smaller, more controllable area using barriers and roadblocks). Just as web GIS have been democratized from the ground up, state agencies and governments have also used web GIS to assert social control using geotagged photos (digital photographs with GPS imprints) in criminal investigations, geolocating phones using “Stingray” cell tower simulators, and civil projects such as Chicago’s “Array of Things,” which plans to outfit Chicago light poles across the city with sensors that can detect everything from air quality to how many pedestrians walk by and feed into massive datasets released to the public. Figure 7.33 shows an augmented reality installation assembled at DePaul University using cheap computer components and off-the-shelf parts. The installation functions by using a 3D camera paired and calibrated with an overhead projector to sense the heights of sand in the sandbox below and project a isoline surface on top of the sand. Moving the sand changes the visualization, and holding a hand under the 3D camera generates water visualizations. In 2010, the Microsoft Kinect was cutting-edge consumer technology but is now defunct and can be bought for pennies on the dollar only a few years later. As of this writing, virtual reality headsets like the Hololens, the Oculus Rift, and HTC Vive are fairly expensive and require expensive setups, but in the near future, they may be commonplace and drive new web GIS applications. As web GIS continue to be a major part of the growing

216

GIS: An Introduction to Mapping Technologies

FIGURE 7.33 An augmented reality installation at DePaul University, completed thanks to public build guides available from the UC Davis “Augmented Reality Sandbox” web page, https://­ arsandbox.ucdavis.edu/about.

geospatial business sector, we can only assume that the rapid pace of web GIS functionality and related technology will continue as well.

References Google. https://developers.google.com/maps/documentation/javascript/examples/ map-geolocation. Accessed July 09, 2018. ArcGIS Online: Cloud-Based GIS Mapping Platform. ArcWatch: A New View of the Arctic. www.esri.com/en-us/arcgis/products/arcgis-online/resources. Accessed July 09, 2018. Berners-Lee, T., and M. Fischetti. 2011. Weaving the Web: The Original Design and Ultimative Destiny of the World Wide Web by Its Inventor. HarperBusiness: New York, NY. Felke-Morris, T. 2016. Basics of Web Design: HTML5 & CSS3. Harper College: Boston, NJ. Fu, P., and J. Sun. 2011. Web GIS: Principles and Applications. ESRI Press: Redlands. Simon. Nazis in Your Neighborhood. FashMaps. February 14, 2018. http://fashmaps. com/2018/01/28/daily-stormer-nazis-organizing-in-the-us/. Accessed July 09, 2018. Official MapQuest - Maps, Driving Directions, Live Traffic. Official MapQuest Maps, Driving Directions, Live Traffic. www.mapquest.com/. Accessed July 02, 2018. OpenStreetMap. www.openstreetmap.org/. Accessed July 03, 2018. Strike Debt. 2014. The Debt Resisters Operations Manual: A Project of Strike Debt. PM Press: Oakland. Tomlinson, R. F. 2013. Thinking about GIS: Geographic Information System Planning for Managers. Esri Press: Redlands, CA.

8 Open-Source GIS

8.1 What Is Open-Source GIS? In a kitchen, a chef may be privileged to have a wide array of tools: a German 8 in, “chef’s knife” for chopping vegetable, a santoku 7 in. blade for mincing, an immersion blender for making sauces, and so on. To a casual observer, some blades and appliances might look the same yet each has distinct advantages or disadvantages depending on the task they are used for. A chef may also choose a tool based on subjective reasons, such as experience or comfort level. In a similar way, being aware of different ways to approach GIS problems using different software tools is a valuable skill. But in other ways this analogy quickly falls apart. While software programs are tools, they are complex and learning and keeping up with multiple GIS programs can be difficult. To add to this complexity, it is common for GIS software programs to significantly change between versions, with changes to tools and other functionality. In this book, we focus on the ESRI ArcMap program, one of the market leaders in GIS software across private industry and governmental sectors in the United States. However, other GIS programs such as QGIS have traction and a wide user base, especially in other parts of the world such as Europe and Asia. QGIS is a desktop GIS program, just like ArcMap, with many of the same features as ArcMap. Although the tools within QGIS and ArcMap might have different names, GIS professionals can usually accomplish the same tasks in both QGIS and ArcMap, though the speed of completing tasks may differ. However, a major distinction between QGIS and ArcMap is that ArcMap is a “closed source” proprietary system while QGIS is an “open source” program. This means that the core code that powers the ArcMap program is copyrighted and cannot be used for free, whereas the code for QGIS is freely available without cost. QGIS has been built entirely by volunteers creating and sharing code to build and improve upon the software over time. This gives QGIS (and other similar open-source GIS programs) some key situational advantages not shared by paid software. First, some organizations such as nonprofit organizations or government offices in developing countries lack adequate resources to invest in paid GIS software. While GIS companies often give steep discounts on their paid software to nonprofit 217

218

GIS: An Introduction to Mapping Technologies

companies or for educational purposes, this is not guaranteed. By comparison, when a GIS software product is released as an open-source program, it will always be free. There are several-open source GIS programs besides QGIS with their own histories. GRASS GIS (Geographic Resources Analysis Support System) dates back to 1982 and is an open-source program that was originally created by the US Military. In the 1990s, the GRASS was put into the public domain and its software development was taken over by Baylor University, later released with a permissive “GNU General Public License”, allowing people to make changes and contributions to the code as long as the code changes were transparent and viewable by all parties. Today, GRASS GIS is a stable release that can be used through the command line or with a GUI. GRASS GIS’s functionalities and core features are especially tied to scientific uses, and there are some uses of GRASS GIS that are especially powerful in these areas due to a long history of the codebase. Furthermore, due to collaboration between the QGIS and GRASS project teams and volunteers, many of these tools and features can be called directly in QGIS using the GRASS plugin, augmenting the existing features of QGIS. There are many other open-source desktop GIS releases. To name a few, gVSIG, which is especially good for 3D visualization, SAGA GIS (with a focus on scientific GIS applications similar to GRASS), and GeoDa (which is excellent for exploratory spatial data analysis). But remember that you ultimately want the best tool for whatever GIS task you have at hand. For a complex application such as GIS desktop software, it’s important to choose software that is actively updated by a large userbase of volunteers. This helps make sure that the tool keeps pace with current trends, bugs and issues in programs are timely addressed, and security defects that could harm your computer are kept to a minimum.

8.2 Getting Started with QGIS QGIS is installed as an application, just like ArcMap or other desktop GIS. QGIS is available across several different platforms: Windows, Mac, and even Linux. “Plugins,” community-developed code packages that add features to QGIS, work independently of the operating system QGIS is installed on. As of this writing, QGIS is installed by visiting their main website, “https:// qgis.org” (Figure 8.1). In this chapter, version 3.0 of QGIS is used, at the time of writing the latest QGIS release, but QGIS (and many open-source projects) also has a version 2.18 “long-term release.” QGIS, along with other open-source software projects, releases two versions supported at the same time; one including the most latest features that may not be fully tested for stability, and the other a

Open-Source GIS

219

FIGURE 8.1 The homepage for QGIS, at “ https://qgis.org.” The different tabs at the top, Discover QGIS, For Users, Get Involved, and Documentation, are worth exploring to understand the philosophy behind QGIS as well as to get up to speed on how to download and install QGIS. For Users provides the base installer for QGIS.

“long-term release” designed with stability in mind. If you experience unexpected issues such as crashing or tools not working, you may want to try a long-term release instead of the latest release. We will begin by getting data into QGIS. The program supports a wide array of GIS data types, CSV, shapefiles, GeoJSON, raster formats, and PostGIS databases, but CSV files and shapefiles are fairly ubiquitous in the GIS world so importing CSV files is a good start to get familiar with file importation. We will start with a CSV file downloaded from the Chicago Data Portal, at https://data.cityofchicago.org, where you can find a whole catalog of data related to city issues. Some of the files on the Chicago Data Portal can be downloaded in shapefile format and used without much processing in a GIS project, but many are in Excel or CSV format and need a bit more work to be mapped within GIS software. Figure 8.2 shows a snapshot of the “Map of Urban Farms” dataset with the different icons for managing/exporting the data. Using the search bar in the middle of the site (or navigating directly to the dataset at https://data.cityofchicago.org/Environment-SustainableDevelopment/Map-of-Urban-Farms/uti6-fp3f) look for the “Map of Urban Farms” dataset and export it as a CSV. If you were to open up this file in a spreadsheet program such as Microsoft Excel, you would see that it has both addresses and latitude/longitude coordinates in fields within the sheet. If we import this CSV into QGIS, with just a few clicks we can map the points within the program, and from there, export the file into other formats. Once QGIS is installed, and the example CSV is downloaded, click the “Data Source Manager” on the far left (the icon that appears as three sticky notes with a plus sign) or alternatively, either click the Layer tab and click Data Source Manager, or click the Layer tab, click “Add Layer, and then click “Add

220

GIS: An Introduction to Mapping Technologies

FIGURE 8.2 “Map of Urban Farms” from the “ Chicago Data Portal,” a source for information, GIS and otherwise, relating to the City of Chicago. By clicking the “Export” tab, this particular dataset can be exported in CSV format.

Delimited Text Layer.” (Since CSV files are a type of Delimited Text Layer, this is the correct choice to add this file type.) As you can see, there are a wide variety of layers types that can be brought in to QGIS, with buttons corresponding to Raster layer file types, Vector shapefiles, and so on. You will even see layer types such as “ArcGIS Map Server” and “ArcGIS Feature Server.” These two layer types are ArcGIS web maps hosted online that are linked to the QGIS application and mapped inside the desktop. Even though these types of online maps can only be hosted using paid ArcGIS products, QGIS can use them in proprietary layers. However, note that if we were attempting to import a GIS dataset in shapefile format instead of a CSV, all we would have to do is choose “Vector” in the Data Source Manager and select our shapefile, a process not much different from importing shapefiles in ArcMap. Once you have specified that you want to bring in a Delimited Text layer, a prompt will appear giving you the opportunity to specify the location of the file and set different flags. The three periods to the right of the “File Name” prompt allow you to choose the file, at which point it should automatically set the “File Format” flag to “CSV.” However, you will still have to set the “Geometry definition” so QGIS will know how to display the CSV file as mapped points.

Open-Source GIS

221

You might intuit that we should set the X and Y fields to the respective Longitude and Latitude field values. However, understanding the dataset that we are working with is prudent to avoid unnecessary headaches. The Chicago Data Portal is maintained using a service called “Socrata,” which is a bit hidden on the website. Clicking the “Documentation” button at the bottom right of the page brings us to the “Socrata” home page, but it might not be obvious that the website runs using Socrata, nor why that matters. But if you were to dig in to Socrata’s documentation, you would learn three important things: 1. Socrata CSV datasets with geometry information default to the EPSG:4326 coordinate system. 2. Socrata coordinates are encoded in longitude, latitude order. 3. They are DD (Decimal Degrees) rather than DMS (Degrees, Minutes, Seconds). Figure 8.3 shows the QGIS dialog with these things. If we did not uncheck “DMS coordinates” in the dialog, set the correct coordinate system, or

FIGURE 8.3 CSV Import dialog. (a) Farmers markets mapped. Notice the “Layers Panel” in the bottom left. If Map_of_Urban_Farms is below the Google Streets basemap, it will not show since the top layers of the map were drawn first (and thus cover our vector point layer entirely). To access basemaps in QGIS, go to “Plugins,” “Manage Plugins,” and then search for the “OpenLayers” plugin. You can then bring in several types of basemaps as layers by clicking “Web” and then navigating to the OpenLayers plugin. If you are using QGIS 3.0 or above, look for the QuickMapServices plugin instead. (Continued)

222

GIS: An Introduction to Mapping Technologies

FIGURE 8.3 (CONTINUED) (b) Layer Properties settings, similar to ArcMap.

indicate the X and Y fields correctly, the CSV file would be imported into QGIS, but it would not properly display the points. In Figure 8.3a you can see the urban farms mapped inside QGIS, denoted by large triangles. Similar to ArcMap, the symbology can be adjusted by right-clicking the layer in the sidebar (called “Layers”), entering the “Layer Properties” settings, and scrolling to the Symbology tab. Figure 8.3b shows the different settings that can be adjusted within the Layer Properties. As noted, in the Symbology you can adjust the size, color, and shape of symbols, choose choropleth maps with classification methods (Quantile, Natural Breaks, Equal Interval), and change other symbology settings. Other settings can be controlled through the Layer Properties: The Labels setting allows you to use a field to label points, polygons, or lines and the Joins tab allows for table joins and relates, similar to how this is done in ArcMap. We are working with a CSV file that includes coordinate data, but what if we need to bring in a CSV file that only has addresses? We would have to geocode the file, a process that is slightly different in QGIS than in ArcMap. As a reminder, geocoding is the process of looking up addresses and geographic information and translating them into coordinate information, allowing geometry to be associated with a dataset. When geocoding in

Open-Source GIS

223

ArcMap, it is typical to conduct the process using an ArcGIS Online account, calling an ESRI service that returns information about the coordinates of a given address. This typically requires a small amount of “ArcGIS Online service credits” such that large geocoding requests over time end up costing money. While geocoding in QGIS is free there is a different catch. Since geocoding uses servers available for the public such as Google Maps and OpenStreetMap, attempting large geocoding projects can cause you to hit “usage limits” quickly.

8.3 GDAL and Raster Data GIS analysis often uses raster datasets. While vector datasets are represented by points, lines, and polygons and contain an attribute table, raster datasets are structured instead as an array of cells, or pixels. Think of a digital image; even with a very good camera, if you zoom in close enough, you will start to see small squares of a single color. Those squares are called pixels. The resolution of a picture signifies how many pixels in total are in the picture; a resolution of 600 × 400 means the image is 600 square pixels of width and 400 square pixels of height. In a digital photo, each pixel contains a value that is used to map color to each pixel. In a grayscale image, one value alone for each cell is typically enough to determine an intensity from black to white, as seen in Figures 8.4 and 8.5. In color images, “bands” create a composite of three different values for each pixel; these three bands represent red, green, and blue, or RGB. This is how cameras store their digital images, and it is also how aerial photography, earth monitoring satellites, and drones equipped with cameras record and store data. The processing and analysis of satellite imagery, aerial photography, and other similar raster datasets is the primary focus of a sister technology called “remote sensing.” Operations and analysis for remote sensing raster datasets demand completely different tools and operations than vector datasets. However, because raster datasets represent continuous data, they are especially effective for certain types of analysis. Imagine we have a raster dataset that has been generated from a satellite in space. This satellite has an instrument that can sense the surface temperature in an area while it passes over in space. Since the satellite can record extremes in temperature, we can look for extremes of temperature and map active fires. In simple terms, this is how NASA scientists use thermal data from the Moderate Resolution Imaging Spectroradiometer (MODIS) in orbit to map active wild fires here on earth, as shown in Figure 8.6. Other satellites can record real color images with RGB color. These are often used as “basemaps” in regular GIS programs, with vector datasets overlaid. The power of remote sensing imagery is that, using a particular imaging satellite, you can sweep the same area year to year

224

GIS: An Introduction to Mapping Technologies

FIGURE 8.4 A grid of values ranging from 0 to 255, without color assigned to each value.

and have the same resolution of the resulting image, and look for changes between time periods. For instance, you could compare the chlorophyll in tree cover in a national park over the course of a decade to gauge overall tree health, or compare land cover change in a city in terms of development, as Figure 8.7 shows. Both proprietary and open-source tools are available that work with raster datasets. ArcMap has the “Raster Calculator” toolset among other tools, and other products such as ERDAS IMAGINE and ENVI are specifically designed to work with raster datasets and remote sensing imagery. However, because of the long history of scientists and public organizations creating their own tools and algorithms to work with raster datasets, there are robust and well-supported open-source programs that offer the same functionality, and in some cases, unique functionality. QGIS has raster tools comparable to ArcMap, GRASS GIS has a legacy of tools and libraries going back decades,

Open-Source GIS

225

FIGURE 8.5 The same grid from Figure 8.4, with color values assigned, such that a low value is a low intensity (0 is completely white) and a high value is a higher intensity (255 is completely black). Color ramps are assigned by the user, so this could easily be reversed in the GIS program such that 0 is pitch black, and 255 is completely white.

and NASA releases tools like Worldview to interact with NASA remote sensing datasets. The Geospatial Data Abstraction Library (GDAL) is an opensource software library that dates back to the year 2000 and now powers raster analysis in several programs, both proprietary and open source. A “software library” refers to computer code designed to provide specific functions (such as raster analysis) that can be called from other software. For example, in ArcMap you can convert a raster to another raster format using the “gdal_translate” utility, while in QGIS, the same “gdal_translate” function is used to accomplish the same goal. In other words, if you are familiar with and understand the GDAL software library, you can perform raster analysis regardless of which software you use. Older maps completed before GIS or even computers were common can be scanned and brought into GIS applications as raster datasets as well. If

226

FIGURE 8.6 MODIS fire production.

FIGURE 8.7 LANDSAT basemap.

GIS: An Introduction to Mapping Technologies

Open-Source GIS

227

we wanted to take an older map of a city from the eighteenth century and overlay it over a current map of the city, we could not just simply import the older map into the GIS system and be done because the current map would most likely have a map project and coordinate system set, and the older map would not. Even if we applied the correct coordinate system and projection to the older map, we would also have to warp the map so that the corners matched the current extent and position of the subject area being analyzed. Consider the map created by Dr. John Snow investigating the cholera epidemic surrounding the Broad Street water pump in London 1854. You can find the map in the supplemental files corresponding to this chapter. First, if you have not already, install QGIS. If you are using QGIS 2.18 or earlier, you should also install the plugin “OpenLayers.” If you are using the newer QGIS 3.0, you should enable the “QuickMapServices” plugin. Whichever version you are using, enable the “OSM Standard” layer from either OpenLayers or the QuickMapServices plugin. Zoom down to the small neighborhood around Broad Street. An easy way to do this is by clicking the GeoCoding button and, when prompted, entering “Broad Street, London, UK.” This will prompt you to select from a few different possible full addresses that have the complete location (as shown in Figure 8.8). You should now have a single point geolocated in the neighborhood that you can right-click in the Layer List sidebar to zoom in to. Now we will bring in the John Snow cholera map. Go to QGIS “Plugins” in the menu and then to “Manage and Install Plugins.” Type in “Georeferencer GDAL” and ensure the plugin is activated. It should now be available as the “Georeferencer” tool under the Raster tab. Click on the ‘Georeferencer’ tool to start the plugin up and click on “Add Raster.” Next, add the cholera map as shown in Figure 8.9, ignoring any questions about applying transformations (as you do not yet have control points yet). Figure 8.10 shows the Georeferencer tool with the 1856 map imported as a raster. Review the supplemental lab for naming conventions for the raster. When hovered over, there are icons in the navigational bar of the georeferencer

FIGURE 8.8 Geocoding tool.

228

GIS: An Introduction to Mapping Technologies

FIGURE 8.9 Adding the cholera map.

FIGURE 8.10 The Georeference tool with John Snow’s cholera map imported.

Open-Source GIS

229

tool that relate to importing rasters, applying transformations, and applying control points. As navigation menus often change between releases, refer to lab documentation or, ideally, the Help files related to this plugin to learn more about your particular version. But briefly, from left to right, the checkboard is the “Add Raster” button, the Play symbol applies the transformation and georeferences the image (so do not click this until you are ready!), and the gear is the Settings icon. The icons that look like small satellites (one of them is noted by the “1” arrow as shown in Figure 8.10) relate to the GCP, or Ground Control Points. As noted, the “GDAL” library powers many QGIS operations, including the Georeferencer. The GCP are used to tell the GDAL “gdal_translate” utility how to “translate” the pixel coordinates in the map to be georeferenced to match the system of your QGIS project. “Translate” refers to moving a set of points in a given direction. If you have ever played with image processing software such as Microsoft Paint, you know that when you move an image using your mouse, you are telling the computer to geometrically translate the object in a given distance of pixels according to the path of the mouse. However, since older maps often inconsistent, we need to apply more complex translations that are not necessarily applied in a uniform direction across the entire map. Since the Golden Square park in the cholera map is still present in our current maps, we can place a GCP right in the middle of Golden Park, in both map views, to tell the gdal_translate utility to “translate this image so that Golden Park will be rubbersheet (seamlessly overlap) the same Golden Park in modern maps, as much as possible. Figure 8.11 shows the results of georeferencing in the Golden Square area.

FIGURE 8.11 The overlaid John Snow map with slight transparency.

230

GIS: An Introduction to Mapping Technologies

The small dots in the middle and at the top left represent the GCP used to make sure that Golden Square overlaps after the transformation is applied; the map is set to a slight transparency so that the OpenStreetMap Basemap layer with the contemporary layout of the neighborhood can be seen. Figure 8.12 shows the Transformation dialog that allows us to apply a transformation algorithm to the translation. You will see several different possible methods, several of which are included in ArcGIS suite of products, but we will not discuss each method in length here. Rather, when you want to georeference a map, ask yourself “What is the coverage of this map? Is it on a regional level, or a global one? What is the coordinate system of my project? Is it spheroid?” Some transformation methods such as Thin Plate Spline are more appropriate for low-quality originals, which might apply to our cholera map, but in this case, the Polynomial Second Order is appropriate. It is a nonlinear algorithm (linear algorithms can only translate, not transform), and it is a fairly popular and standard approach that is a good “first pass” before experimenting with other transformations. The appropriate minimum number of points for Polynomial Second Order is about three or four control points; control points should be strategically placed where you think the greatest distortion between the two maps might occur. In the case of an urban map, you may want to choose parks (such as Golden Park), known street intersections shared between both maps, and landmarks. Once you have placed the control points and set the transformation, you are ready to complete the georeferencing. Figure 8.12 shows the resulting map, with transparency set to view the contemporary street. You may notice that although some features appear aligned, others

FIGURE 8.12 Transformation dialog within the Georeference tool.

Open-Source GIS

231

seem “off.” This could be due to a variety of reasons. Maybe the control points could have been placed better, or maybe there needs to be some reallife ground truthing and understanding of city records to know how best to translate the city plan across 150 years! In Chicago, for instance, a city built on swampy muck soils, early buildings were said to shrink as much as an entire foot after construction! Notice that we also have the (albeit blurry) marks that signify deaths in certain places. A next step might be to convert these marks into marker features in a shapefile or similar format that record the total deaths over each point, to “modernize” this flat map into a GIS file that can be queried, etc.

8.4 GeoDa and Open-Source GIS Databases GeoDa is a “spatial data analysis” tool suited for exploratory data analysis. When GIS professionals conduct operations that involve spatial statistics such as regression analysis, they must first ask, “Do I have parametric distribution of my data, or is it skewed toward certain values? Are my observations spatially autocorrelated? A proper explanation of spatial statistics is outside the scope of this chapter, but there it is valuable tool for GIS professionals to approach and understand, especially if in making scientific claims about a dataset. A basic understanding of spatial statistics helps to determine whether points or polygons are “positively spatial autocorrelated” (i.e., whether similar values are clustered together in space). Imagine segregation of neighborhoods where richer neighborhoods cluster together or negative spatial autocorrelation where a wide range of values occur together in space (imagine a more diverse neighborhood where people with different incomes reside near each other). Exploring GIS data in this way is important both for understanding the nature of the data you are working with but is also a prerequisite for attempting certain forms of spatial statistics analysis as certain operations have assumptions about the distribution of level or lack of spatial autocorrelation, among other things. We will cover a few basic tools in GeoDa, but readers are encouraged to also approach the excellent tutorials on the GeoDa homepage (https://­geodacenter. github.io/index.html) as well. But first, in the words of Monty Python, “now for something completely different.” We will briefly discuss the GeoJSON data type as well as open-source GIS servers. This activity (located in the supplemental lab materials) uses a GeoJSON count of food inspections during a time period in Chicago, from the Plenario platform (based on PostGreSQL and PostGIS), an open-source database technology. GeoJSON is the standard for web GIS, and has many advantages over the shapefile format. Most obvious is the fact that in only uses one file rather than five or six separate files, making it an ideal export format for a web service.

232

GIS: An Introduction to Mapping Technologies

Plenario, a “big data” repository for GIS datasets, is maintained by the Urban Center for Computation and Data at the Computation Institute of the University of Chicago and Argonne. Plenario does not host static links to datasets. Instead, all data in Plenario is hosted in a PostGreSQL database instance using PostGIS. When you import a CSV file, even though it is just rows and columns separated by commas, a GIS program such as ArcMap or QGIS converts it into a relational database table. If you have ever tried a GIS operation using what you think is valid GIS file and are told that you are missing a GUID field (for example), the GIS software you are using assumes that your dataset is organized according to the rules of relational tables and that every row can be uniquely identified via a primary key or unique ID. This and other assumptions of your data are essential for operations such as table joins and relates, union operations, and selections (Select by Location, Select by Attribute), GIS operations that use both SQL and relational algebra, key aspects of database design and management. GeoJSON is an extension of JSON, or JavaScript Object Notation. Since JavaScript is so fundamental to web applications today, it is used for sending, receiving, and storing data. A JSON file wraps data in “objects” in name/ value pairs. Since these name/value pairs are in human readable text, the file is both lightweight and human-readable. However, one disadvantage of the JSON format is that there is no standardized or formal way to determine what a name/value pair actually is outside of the rules of JavaScript and the internal rules of an application working with the JSON data. For example, if you running an online banking service for your customers, you might decide that the name/value pair that represents customers should be named “customer.” But another bank might choose to represent that name/value pair using “customerName.” In the world of GIS, this lack of formalization in JSON would cause issues because GIS data often needs to be shared, and applications need to know how data should be interpreted and processed for basic GIS functionality to work. This is the rationale behind GeoJSON. At its core it is JSON, but it also is extended with formal rules such that certain keywords, such as polygon, point, and coordinates, are more than simply words with semantic meanings. Instead, they are standardized so that GIS applications that work with GeoJSON data know to process and project the name/value pairs in a GeoJSON file as points or polygon, in an efficient and lightweight manner. But note two key disadvantages of GeoJSON here. As noted in the specifications all GeoJSON should use the WGS84 (World Geodetic System 1984) reference system (so other transformations must be done through GIS applications and in other formats), and GeoJSON is inadequate for some advanced representation of features, such as topology or expanding beyond the elements of point, line, or polygon geometry objects. For many web GIS applications, these constraints do not prevent GeoJSON from being efficient, including Plenario. As noted, Plenario (accessible via web address “plenario.io”) is built using the open-source database PostGreSQL. The site hosts many different datasets from various cities, with the goal of making public data more accessible, as

Open-Source GIS

233

shown in Figure 8.13. Since they are all represented and stored on the platform in a database structure, these datasets can be queried to get the exact time frame, extent, and desired fields before being exported into the file format of choice (GeoJSON, shapefile, or KML). But note that event datasets can only be exported in GeoJSON or CSV format because it is much more efficient for simple datasets that would be represented as vector points to be exported as GeoJSON or CSV rather than an ESRI shapefile. This is because compared to the ESRI shapefile format, GeoJSON files are easier to create, run processing commands on to add or select records, and are smaller file sizes. In Data Explorer we can filter based on an event dataset of interest; in this case, we will search “food inspection” and choose the Chicago dataset “City of Chicago: Food Inspections.” The Data Explorer can be accessed at plenar. io/explore/discover, navigate to that web page to follow along. Figure 8.14 shows the event dataset and the ability to refine or filter it before exporting (we could, alternatively, download the “raw” complete dataset). It is often best to preview how the data is structured before applying such queries. In this case, we chose to select all recorded failed food inspections during the period between December 8, 2017 and March 8, 2018 by setting the start and end dates, and then setting “Select Field” to Results, the relational operator to “=”, and the value for the selection to “Fail.” This is a simplified userfriendly version of what may otherwise be constructed as “SELECT from “Food_Inspections” WHERE “results” = “Fail.” We will only know this selection will work if we preview the data and see that the possible values

FIGURE 8.13 The Plenario homepage.

234

GIS: An Introduction to Mapping Technologies

FIGURE 8.14 Plenario’s Data Explorer, with the Food Inspections of Chicago dataset selected.

are “Pass,” “Fail,” and some other possible values such as “Not Ready” and “Out of Business.” But note that we are running a query based on a given parameter, in this case, “Return a downloadable view of this dataset where all values of this column are typed as “Fail.” What happens if an entry in the Results column is mistyped, such as a value being “Fail”? It will not be returned in the results, because the SQL query is not “smart” enough to catch these types of human mistakes. Once we click “Add Filter,” we can add another filter, either applied to the same field or another; in this case, we add the Facility Type = Restaurant filter to limit the results to restaurant types. When we click “Refine Query,” a map is generated on the right with aggregated counts of events meeting our filters in small gridded polygons. When we click the “Download” button and select “Complete Raw Data,” we will get a GeoJSON (or CSV) file that includes only those food inspections that failed. If we export as a map grid, we will get a map grid that represents polygons aggregated with the counts of the failed food inspections. But what if we wanted to determine if there is a pattern to food inspections in the city of Chicago? Do certain parts of the city receive fewer food inspections than other parts of the city? This could be a precursor to an investigatory analysis of whether certain neighborhoods face more food inspections after controlling for population density and number of restaurants. While investigating this issue, we may want to investigate whether food inspections are spatially autocorrelated. GeoDa is especially suited for this type of analysis, ESDA, or Exploratory Spatial Data Analysis. You can download GeoDa from https://geodacenter.github.io/index.html. In addition to this simplified demonstration of GeoDa, a robust and continually updated tutorial is available on the website. After installing and opening a session of GeoDa, click on the folder button on the far left and select your dataset. Plenario will export a dataset in GeoJSON format, but the resulting file will still be labeled as a .json file and so on. Don’t worry—they will still adhere to the GeoJSON standards and be interpreted as GeoJSON file once imported. You will see a mapped

Open-Source GIS

235

grid of the food inspections, with each box having the value of counts (how many food inspections took place across the specified period of time within that gridded area). Before we investigate spatial autocorrelation with this dataset, we have to establish “spatial weights.” Spatial weights establish the logic that determines whether observations are neighbors. To access the spatial weight settings, click Tools in the GeoDa navigation bar and  then Weights Manager (or the shortcut represented by a large W), and then click “Create.” We need to establish a spatial weight before investigating spatial autocorrelation to determine the basis for whether observations are “close” or “distant” from each other. You can choose between contiguity weight or distance weight, but in this dataset, several grids are very distant from each other and would not be contiguous (touching), so we will investigate using the distance weight (click the Distance Weight) tab. Since we do not have IDs for each record, we must first click the “Add ID Variable” and select the default POLY_ID. Our options are Distance Band, K-Nearest Neighbors, and Adaptive Kernel. Distance Band uses a set distance (which corresponds to the units used by the projection of the dataset) to determine which observations might be clustered based on being inside or outside of that distance, with the option to weight that distance by an inverse power distance. Adaptive Kernel is appropriate for nonparametric datasets when investigating spatial covariance (i.e., how much two variables change alongside each other spatially. For this ­activity, as shown in Figure 8.15, we only demonstrate and discuss the K-Nearest neighbor option, which is a machine learning algorithm that calculates the distance between different observations. We will look at the distances in the aggregate to determine the proper threshold for which neighbors would be considered close or far and then assign more significance to closer neighbors than further neighbors as a weight. Click “create” and save the weight file to your desktop. We now have weights we can use to run a Univariate Local Moran’s I analysis, which investigates spatial autocorrelation. Click the Space tab and then click “Univariate Local Moran’s I”, being sure to set “count” as the “First Variable” and the weights to those you created in the last step. You will get a window asking what results you want; check all of them as shown in Figure 8.16. You will see a chart of the Moran’s I results across different counts, a LISA (Local Indicators of Spatial Autocorrelation) significant map that shows where the Moran’s I values are significant, and a LISA Cluster Map that shows the clustering of different values, as shown in Figures 8.17 and 8.18. Figure 8.18 also shows the basemap underlaid, illustrating that the clusters of high counts of food inspections surrounded by other high counts of food inspections appear to be concentrated in the heart of downtown of Chicago as well as the northside. Downtown Chicago is a hub for dining and tourist restaurants, so a cluster there makes sense. But it’s possible this exploratory data analysis may lead us in other directions, such as generating a map of known restaurants in Chicago covering this period to see if the food inspections match up with a map of restaurants. By overlaying

236

GIS: An Introduction to Mapping Technologies

FIGURE 8.15 Weights File Creation menu (MacOS).

FIGURE 8.16 GeoDa dialog box asking which results to save.

Open-Source GIS

FIGURE 8.17 Moran’s I significance map.

FIGURE 8.18 LISA clustering map.

237

238

GIS: An Introduction to Mapping Technologies

these two datasets, we could do a Bivariate Moran’s I and determine if food inspections are clustered around areas with more restaurants, or randomized. One of the advantages of GeoDa is that a wide array of spatial statistics tools needed to approach such questions are available and easy to use, such as spatially weighted regression and time series analyses. The “GeoDa Workbook” compiled by Luc Anselin (the lead developer of GeoDa and a noted figure in spatial econometrics) is an excellent resource for learning spatial statistics and GeoDa at the same time, and is highly recommended.

8.5 The Future of Open-Source GIS In the early days of GIS, tools and applications were built for highly specialized tasks and audiences, and for a time, the dominant tools were noncommercial. Now GIS is in demand and usage by nonspecialists, and while commercial tools are available for specialists and nonspecialists alike, opensource GIS are considered as viable alternatives. The time and energy spent by volunteers around the world on open-source GIS projects to keep these programs competitive with commercial GIS offerings is valuable.

References Anselin, L. and S. J. Rey. Modern Spatial Econometrics in Practice: A Guide to GeoDa, GeoDaSpace and PySAL. GeoDa Press LLC: Chicago, IL, 2014. Anselin, Luc. GeoDa Workbook. Documentation | GeoDa on Github. Accessed April 30, 2018. https://geodacenter.github.io/documentation.html. Graser, A. Learning QGIS: The Latest Guide to Using QGIS 2.14 to Create Maps and Perform Geoprocessing Tasks with Ease. Packt Publishing: Birmingham, 2016. McKee, L. OGC History. OGC History (detailed) OGC. Accessed April 30, 2018. http://www.opengeospatial.org/ogc/historylong. Menke, K., R. Smith, L. Pirelli, and J. Van Hoesen. Mastering QGIS: Go beyond the Basics and Unleash the Full Power of QGIS with Practical Step-by-step Examples. Packt Publishing: Birmingham, UK, 2016. Momjian, B. PostgreSQL: Introduction and Concepts. Addison-Wesley, Boston, MA, 2001. Neteler, M., and H. Mitasova. Open Source GIS: A GRASS GIS Approach. Springer: New York, 2010. Obe, R. O. and L. S. Hsu. PostGIS in Action. Manning, Shelter Island, NY, 2015. QGIS Documentation. Documentation. April 30, 2018. Accessed April 30, 2018. https://www.qgis.org/en/docs/index.html. Wegmann, M., B. Leutner, and S. Dech. Remote Sensing and GIS for Ecologists: Using Open Source Software. Pelagic Publishing, Exeter, 2017.

9 Introduction to Remote Sensing and GIS Remote sensing, the geotechnology that includes the processing, manipulation, and analysis of analog and digital images collected using devices that are not in contact with the Earth, has become one of the driving forces behind the rapid growth of GIS in the past few decades. Remote sensing analysts, a specialized category of geospatial professionals, produce stunning visualizations of environmental and human processes such as climate change, natural resource exploitation, and sprawling urbanization using a vast and growing catalog of digital satellite imagery. Digital image processing and a steady stream of new sensors and airborne platforms (built upon a strong foundation of over 100 years of mapping using aerial photography and photogrammetry) have resulted in powerful automated and reliable methods of big data visualization, analysis, and knowledge creation (Figures 9.1 and 9.2).

9.1 Remote Sensing Fundamentals Remote sensing can be thought of as a linear process that begins with an electromagnetic radiation (EMR) source, commonly designated as passive or active (see Figure 9.3). Passive energy sources exist in nature such as solar radiation from the sun or thermal emission (heat) from Earth. Most remote sensing relies on passive energy sources. An active energy source is humanproduced and generated by a sensors. Active remote sensing includes radar scanners (using microwave pulses), LiDAR (using laser pulses in the visible and infrared), and even ordinary flash cameras. In passive remote sensing using solar radiation, before the EMR reaches Earth it must pass through the atmosphere where it may be partially absorbed or scattered by different atmospheric components (see Figure 9.3b). This has the effect of reducing the amount of EMR that reaches the Earth’s surface, and occurs selectively across the visible and IR spectrum depending upon wavelength. The electromagnetic spectrum, a graphical representation of electromagnetic energy categorized by wavelength, is a useful tool for characterizing energy by wavelengths (Figures 9.4 and 9.5). Upon striking the Earth’s surface EMR is either reflected, transmitted, or absorbed. It is reflected selectively by wavelengths depending on the type and condition of any surface or material that is illuminated. Because the surface 239

240

GIS: An Introduction to Mapping Technologies

FIGURE 9.1 Chicago, Landsat 5 Thematic Mapper (TM) 24-bit color composite, channels 4–2–1, red–green– blue, April 15, 2008. This classic false-color IR image shows substantial turbidity (vivid turquoise) in Lake Michigan, and moderately robust vegetation (bright red) in lakeshore and suburban parkland.

FIGURE 9.2 Chicago, Landsat 8 Operational Land Imager (OLI) 24-bit color composite (scaled from 48-bit raw data), channels 5–3–2, red–green–blue, October 14, 2016. Notice the much more robust irrigated early autumn vegetation, lower lakeshore turbidity, and substantial expansion of O’Hare Airport. 48-bit OLI data scaled to 24-bit for VGA display.

Introduction to Remote Sensing and GIS

241

FIGURE 9.3 Remote sensing using passive sensors, depicted as a linear process.

FIGURE 9.4 Percentage of atmospheric transmission by wavelength (nm), Landsat multispectral sensor channels, 1972 to present.

of Earth is two-thirds water, EMR may be transmitted through the water depending on its clarity, and in shallow water, it may strike a surface and be reflected back into the atmosphere. Any surface can also absorb EMR raising its temperature, and this energy can be released over a period of time as longer-wavelength thermal IR EMR. Because different surfaces reflect energy differently depending on wavelength, we can characterize the reflectance of different surfaces and materials using a spectral reflectance graph (see Figure 9.6). When EMR strikes a surface it tends to reflect in a diffuse manner, but reflection in nature is never perfectly diffuse and normally contains

242

GIS: An Introduction to Mapping Technologies

FIGURE 9.5 Electromagnetic spectrum.

a specular or mirror-like component. Diffuse reflectance from a flat surface can be thought of as exiting that surface around a hemispherical space with uneven levels of reflectance in particular directions, normally biased at the angle equal and opposite to the solar azimuth and elevation. Remote sensing instruments normally target diffuse reflectance in the vertical angle.

Introduction to Remote Sensing and GIS

243

FIGURE 9.6 Spectrograph of healthy lawn grass (Kokaly et al., 2017) probably Kentucky bluegrass, from the lawn outside the USGS Spectral Library.

After reflectance has occurred the EMR must then pass back through the atmosphere (Figure 9.3d) where it is subjected to selective scattering and absorption once again. It may be detected and recorded by analog or digital sensors (e.g., cameras, scanners) operating on platforms in the atmosphere (e.g., UAVs, airplanes, helicopters), or it may pass out of the atmosphere and into low-earth space where sensors on orbiting spacecraft can record and transmit digital image data to ground receiving stations through a global network of geostationary communications satellites (Figure 9.3e). In the case of low-altitude UAVs the atmospheric effects after reflection are minimal. For higher altitude aircraft and satellites these effects are added to those from the incoming path of the EMR. All of this (from sun to sensor) takes place at or near the speed of light (approximately 300,000,000 meters/second). Concurrently, thermal IR sensors may detect longer wavelength thermal energy (heat) from earth surface features as well. After EMR is sensed by a digital device (either fixed-frame, high-resolution digital cameras or line scanners of various types) it may be split into many

244

GIS: An Introduction to Mapping Technologies

bands or channels, and defined by a typically narrow wavelength range of the electromagnetic spectrum. Each of these channels (which can also be described as colors) are typically arranged in a raster data geometry and processed as high-precision georeferenced layers. The instantaneous field of view (IFOV) of line scanners (a solid angle) defines the pixel size that relates to the resolution of the image, typically given as the ground length width or height of a pixel. Imaging devices typically define and select channels between the wavelengths of 0.4 and 2.5 micrometers (μm). These values are sometimes given as nanometers (nm) such as 400–2500 nm. Wavelength ranges for multispectral channels are defined by the purpose of a particular sensor and the available windows in the atmosphere for sensing. Sensors comprised of several discrete channels (with spectral gaps between them) are termed multispectral. Those that record data across all or part of the normal range (e.g., 0.4–2.5 μm) using a large number of very narrow continuous channels are referred to as Hyperspectral Imaging Systems. These are becoming more commonly used, on platforms from UAVs to polar orbiting satellites, with hundreds of continuous, narrow-band channels. After image data has been collected, with raw values recorded for reflectivity for each pixel in each channel, these values are then typically quantized to 8- or 16-bit positive integer values (Figure 9.3f). In the case of Landsat, 8-bit integers were used as per-pixel digital numbers (DNs) from 1972 until 2013. The 4-channel multispectral scanners on Landsat 1–3 thus produced 32-bit per pixel data through the 1970s. With the Thematic Mapper (TM) and Enhanced Thematic Mapper (ETM) scanners on Landsat 4, 5, and 7 the six-channel combination (channels 1–5, 7) produced 48-bit per pixel data— astronomically more dense than MSS data. With the launch of Landsat 8 in February 2013, image data was quantized to 16-bit positive integers. The seven multispectral channels (1–7) produce 112-bits per pixel, covering nearly the entire earth with 30 meter pixels every 16 days. The consistent and well-documented data policies of this landmark program have made it even more valuable as it passes 40+ years of (near) continuous global coverage, data thickness, and (when brought into the GIS constellation) proven analytical power. Perhaps most importantly, nearly the entire catalog of multispectral data since 1972 is in the public domain and available at no cost through US Geological Survey portals such as GloVis (https://glovis.usgs.gov/) and EarthExplorer (https://earthexplorer.usgs.gov/). The next step in the remote sensing process is a robust combination of fieldwork and analysis (Figure 9.3g). In best-case scenarios fieldwork is integrated through the entire remote sensing process. It precedes analysis, may be conducted during or concurrent with analytical work, and normally follows analysis. The types of field studies that may accompany remote sensing studies are as varied as the industries, services, and academic disciplines that use remote sensing technologies in their work. Professionals across the physical sciences (e.g., biology, chemistry, forestry, geology, geography, agronomy, pedology, hydrology, environmental science, etc.) employ remote sensing

Introduction to Remote Sensing and GIS

245

technology as a regular component of their basic and applied research. Remote sensing analysis has become a central component of many environmental analyses, and it is now commonly incorporated at several stages of work. In diverse engineering fields, LiDAR technology has become a key part of site selection and analysis, augmenting analytical and automated photogrammetry as important geotechnical services. In particular, transportation engineering has benefitted from the absolute precision and georeferencing inherent in LiDAR data services and products. Industries that are highly reliant on remote sensing services include the oil and gas, mining, forestry, and commercial agriculture sectors. In addition, numerous government agencies use remote sensing products and services. These include the US Geological Survey (USGS), National Oceanic and Atmospheric Administration (NOAA), National Weather Service (NWS), US Department of Agriculture (USDA), Environmental Protection Agency (EPA), National Aeronautics and Space Administration (NASA), Bureau of Land Management (BLM), Federal Emergency Management Agency (FEMA), and many more.

9.2 Organizational Considerations When Beginning Remote Sensing Work When considering analytical remote sensing in any organization there are several questions that should be considered, as follows (along with comments to help guide answers): 1. Do we need to acquire new technology to conduct remote sensing analyses? The answer to this will usually be “no.” If you have an up-to-date desktop computer this should suffice as a hardware platform for your work. Software is much less a problem than in the past. If you use ArcGIS or other commercial GIS you will likely already have the tools you need. ESRI, for example, has been adding remote sensing functionality over the past several major versions. There are also numerous low-cost or free software options available to meet the needs of most users for relatively sophisticated analytical work such as quantum GIS (QGIS), Multispec, GRASS GIS, and others. 2. Do we need to hire new staff with specialized skills in remote sensing analysis? The answer to this question depends on the size of your organization and perhaps the disciplinary diversity of your workforce. If you are part of a small organization and need remote sensing analysis you may not have employees skilled in the techniques and technology

246

GIS: An Introduction to Mapping Technologies

needed for this work and will need to determine whether it is cost effective to develop these skills inhouse or outscource the work to service firms that specialize in these types of analysis. Most larger organizations (i.e., work with this data) have skilled staff with at least training in remote sensing courses and/or project experience with remote sensing data, image processing systems, and related applications. In this case you can work with HR to identify possible staff members for these types of projects for providing these skills and services internally. It may be the case that these capabilities already exist and are a regularly accessed functional component of your organization. These two questions can lead to another more complicated one: 3. What types of analysis will we be able to do? Introductory remote sensing courses normally build skills that focus on two big questions. The first question, what (and where) is it, suggests the use of different kinds of techniques for automated identification of particular types of land cover, features, or even objects in digital remotely sensed imagery. Image classification aims to drastically reduce the number of possible values in individual pixels of multispectral imagery to a manageable and useful set of classes. For example, Landsat 8 OLI data contains at least 112 bits of reflectance data in the first seven multispectral channels, producing an astronomical number of possible values in each pixel. Using image classification techniques we try to reduce that enormous number to a much more manageable set of categories with descriptive names, such as “forest,” “agriculture,” “urban – high density,” and so on. We may also want to extract a small set of special values from an image, perhaps focusing on plant health, water or land temperature, or the presence or absence of vegetation. The second big question addressed in remote sensing is when (and how) something occurred. In this case, we often have to go outside our disciplinary boundaries to understand how something has occurred, in effect developing an understanding of specific processes by referencing other physical and human disciplines. For example, using remote sensing we may be able to determine that a flood has occurred and even the precise extent and volume of the flood through georeferencing and photogrammetry. We may even be able to determine with some precision when the flood occurred within the constraints of the temporal resolution of a specific sensor system. This is normally done by a process of image differencing, where an analyst subtracts georeferenced digital images of the same area (normally the older image from the newer) using the same channel (from the same range of the electromagnetic spectrum) but from different dates. This produces a difference image where pixels in the image that have not changed will be near zero, and pixels that have

Introduction to Remote Sensing and GIS

247

increased substantially or decreased substantially will be near the tails of the difference image histogram. This technique can be effective for isolating reflectance changes over large areas for particular types of land-cover change analysis. To understand how or why a flood has occurred in a particular place and time, for example, we may need to use data from other disciplines such as climatology, meteorology, and hydrology. Finally the remote sensing process that began at the sun (Figure 9.3a) ends with an analyst or team of analysts producing a set of knowledge products (Figure 9.3h) from their analysis of remotely sensed data and other peripheral geospatial and numerical datasets. This may include digital imagery or interactive web images as well as cartographic maps and thematic visualizations. Written reports link together all of this into a coherent and convincing response to a client’s requirements.

9.3 Major Land Remote Sensing Projects Since the inception of aerial photography and its incorporation into cartography and mapping allowed the relatively rapid and inexpensive mapping of much of the world during the 20th century, it should be no surprise that the incorporation of computing and digital imaging into this realm has created many new and useful geospatial information products. While the beginning of human space travel in the 1960s and the development of meteorological satellites first caused wonder and amazement, it very quickly resulted in determined activity to develop ways to create and use imagery from space for surveillance, environmental analysis, monitoring urban growth, and many other utilitarian applications. This reaction was not unlike the response to photographic images collected using early cameras from the gondolas of balloons over Paris by the Frenchman Nadar (Gaspard-Félix Tournachon) during the 1850s (Figure 9.7). The following section will discuss some of the major remote sensing projects, principally from the United States, that have shaped the terrain of civilian remote sensing today. 9.3.1 Government Aerial Photography Following World War I, many government agencies began to experiment and use aerial photography for map compilation and other tasks related to natural resource and land management. This included federal agencies such as the USGS, US Department of Agriculture, and US Army Corps of Engineers. Many state agencies and other organizations operating at the

248

GIS: An Introduction to Mapping Technologies

FIGURE 9.7 Nadar Élevant la Photographie à la Hauteur de l’Art, Daumier; Honoré-Victorin Daumier, lithograph on newsprint, Le Boulevard, 25 May 1862, currently in the collection of The Brooklyn Museum. Public-domain faithful reproduction of 2D artwork.

state and local levels also became active users of aerial photography for a number of purposes. The USGS was perhaps the most active user of camera and film aerial photography in the period between the wars, compiling a 7.5-minute quadrangle map for the first time using aerial photography and photogrammetry in 1925. Through co-op programs with each of the states they regularly photographed the entire country as part of the program to complete 1:24000 scale topographic maps of the United States. Much of the photography that was used in this program, dating generally from the 1940s, is available online as single-frame high-resolution TIF files through a number of portals, most easily through EarthExplorer (see Figure 9.8). Although not generally well georeferenced at this time (2018), these images are invaluable resources for historical studies of urban change and development, and with minimal work they can be geospatially referenced and imported into project geodatabases. There are also many other aerial image products available through the EarthExplorer portal. Images from the last

Introduction to Remote Sensing and GIS

249

FIGURE 9.8 US Geological Survey Aerial Mapping Photograph, December 13, 1951, 1:23,600 scale, B/W film. At the time, Orchard Field, which was later renamed O’Hare Field, had no commercial flights and served as an Air Force base. Within 10 years, it would be the busiest airport in the world.

15 years are tightly georeferenced, high-resolution, four-channel digital images. The EarthExplorer interface is intuitively designed for exploration and discovery—dive in! The USDA has also been a very active producer and user of aerial photography over the same period as the USGS. For decades these images were stored and provided to different parts of the agency and other related agencies from the Aerial Photography Field Office (APFO) in Salt Lake City. While the USGS tends to prefer leaf-off photography where available for easy viewing of cultural features in leafy cities, the USDA prefers leaf-on photography for viewing agricultural and natural resource features. The USDA, in cooperation with the Natural Resources Conservation Service (NRCS), offers a rich collection of raster and vector datasets through the Geospatial Data Gateway (https://gdg.sc.egov.usda.gov/) again with a relatively intuitive UI. This portal includes links to other rich resources of geospatial data such as

250

GIS: An Introduction to Mapping Technologies

FIGURE 9.9 Geospatial Data Gateway, Natural Resources Conservation Service, USDA.

the US Interagency Elevation Inventory (https://coast.noaa.gov/inventory/) from NOAA, a great source for LiDAR and bathymetric datasets. The USGS GIS data site is also an extraordinary resource for many project-specific datasets not easily available elsewhere (https://www.usgs.gov/products/maps/ gis-data; Figure 9.9). Many state agencies are also notable producers and providers of publicdomain aerial imagery, much of it in very usable digital formats. In Illinois, the Illinois State Geological Survey at the University of Illinois UrbanaChampaign provides a large collection of historical aerial imagery projects through the Illinois Geospatial Data Clearinghouse (http://clearinghouse. isgs.illinois.edu/; Figure 9.10). One of these projects, completed between 1936 and 1941, was the first complete aerial imaging of the state of Illinois, acquired by the USDA for agricultural mapping. The original negatives of this project, including over 33,000 prints, were destroyed at the National Archives during the 1970s, most likely due to deterioration and instability of these materials.

Introduction to Remote Sensing and GIS

251

FIGURE 9.10 Illinois Geospatial Data Clearinghouse, Illinois State Geological Survey, University of Illinois.

Only two complete copies of contact paper prints of these negatives existed at the Illinois State Library and the University of Illinois Map and Geography Library. Partial or regional copies of portions of the complete collection exist in several county libraries and local government agencies. The images were scanned during 2001–2008 using original prints from these collections to produce a full collection of digital images covering the state. They have yet to be georeferenced. These priceless images, and an additional collection covering the Mississippi River channel, flown in 1927 following the Mississippi flood of that year, exemplify the importance of digital archives for these types of cartographic and remote sensing materials. Examples of their georeferenced and annotated use in the Google Earth browser can be seen at http://gis.depaul.edu/geography/webmapping/ (Figures 9.11 and 9.12).

252

GIS: An Introduction to Mapping Technologies

FIGURE 9.11 Illinois Geospatial Data Clearinghouse, Illinois State Geological Survey, Illinois Historical Photographs, 1937–1947.

9.3.2 Landsat and Other Multispectral Imaging Programs What was later to become the Landsat program (originally dubbed the Earth Resource Technology Satellite, or ERTS) was conceived in the 1960s as an early response to the perception of environmental degradation and crisis, a civilian satellite focused on the Earth. Beginning in 1972 repetitive coverage of nearly the entire Earth by a medium-resolution, digital imaging satellite became a reality (initially every 18 days, but after the launch of Landsat 4 in 1982 every 16 days). This program has continued with only slight interruption to the present day, and in the age of the Internet under current US public-domain data policy, the entire catalog of Landsat imagery is available through several portals that allow contextual and geographically referenced searches for easy access. This archive is arguably the most important dataset in existence to document the onset and progression of global change in what many have dubbed the Anthropocene epoch. During the 40+ years of operation through seven satellites and multiple sensors, the technologies for multispectral imaging in the program have changed substantially. There are three main categories of multispectral sensors that have been used (Table 9.1).

253

Introduction to Remote Sensing and GIS

FIGURE 9.12 Historical Aerial Photograph, Schiller Park, Illinois, November 24, 1938.

TABLE 9.1 Landsat Satellites: Multispectral Sensors Approximate Dates

Satellite(s)

Sensor

Landsat 1–3

Multi Spectral Scanner (MSS)

1972–1983

Landsat 4, 5, and 7

Thematic Mapper (TM – L4 and L5) or Enhanced Thematic Mapper (ETM+ – L7) Operational Land Imager (OLI) and Thermal Infrared Scanner (TIRS)

1982–present (2018)

Landsat 8

2013–present (2018)

Comments For data continuity the MSS was also part of the Landsat 4 and 5 system, but it was quickly superseded by the higher spectral and spatial-resolution TM. Some data from these sensors is available in the EarthExplorer archive. The Landsat 7 ETM+ sensor operated nominally until May 2003 when the failure of a scanning component degraded the data collected by the sensor. It is still marginally operational in 2018. The OLI–TIRS sensor system has moved civilian remote sensing into a new era with more spectral channels, improved image calibration and pre-processing procedures, and other system enhancements.

254

GIS: An Introduction to Mapping Technologies

Detailed documentation of all aspects of the Landsat program are readily available online (https://landsat.usgs.gov/landsat-7-data-users-handbook). Another useful tool, the USGS Spectral Characteristics Viewer, is found at https://landsat.usgs.gov/spectral-characteristics-viewer. Other nations have also joined in collecting and disseminating remotely sensed imagery since at least the 1980s, most notably France with the SPOT (Satellite Pour l’Observation de la Terre) satellite series beginning in the 1980s, and more recently the European Space Agency with the Sentinel remote sensing platform. The Sentinel project (https://lta.cr.usgs.gov/sentinel_2) is very similar to Landsat (and in some ways superior), with 10-meter resolution (visible and near IR), 5-day repetitive coverage, and 13 spectral channels. Imagery dated from 2015 to present is now available to remote sensing analysts through the EarthExplorer portal. India, Japan, and China have also become participants in the internationalization of civilian space remote sensing. Virtually all of these programs have followed the Landsat model, with some variation, using a raster data model for the storage and processing of images, resulting in discrete multispectral imagery in the visible and reflected IR spectrum, and (perhaps most exciting for geospatial analysts) georeferencing imagery available on the web. Many private firms such as DigitalGlobe, provider of much of the high-resolution imagery used in Google Earth and other Internet mapping platforms, are joining to meet the rapidly growing global demand for high spatial- and temporal-resolution imagery. Over the past decade UAVs, commonly known as drones, have become an important and relatively inexpensive component of the remote sensing constellation. Autonomous aerial vehicles have a long history dating back to the early 20th century. Contemporary UAVs typically consist of small autonomous aerial vehicles, either fixed-wing or mini-helicopters, with a remote controlling wireless system and human operator. These devices may have different degrees of autonomous flight capability, and can be programmed to follow precise paths using onboard GNSS and perform imaging functions automatically. With the wide availability of many different types of inexpensive miniaturized sensors, and their ability to operate at very low elevations, UAVs have become an inexpensive way to collect high-resolution imagery. In the United States this explosion of largely unregulated civilian aircraft has resulted in governmental response, placing restrictions on where flight can occur, limits on elevations of UAVs, and the qualifications and training necessary to operate these devices. Of course, most of the restrictions relate to airports (Figure 9.13). At the time of writing, regulations regarding the operation of UAVs are fluid and locally specific. For example, Chicago airspace is regulated by federal, state, local, and even university regulations regarding UAV operation. Those considering using these devices as remote sensing platforms should always review and abide by all applicable regulations for their location.

Introduction to Remote Sensing and GIS

255

FIGURE 9.13 Photo by Author, sign along Highway A1A, Patrick AFB, Satellite Beach, Florida, June 17, 2018.

9.4 Digital Remote Sensing Data The wide availability of sensors, computing systems, and platforms coupled with the need to integrate these products into an array of programming and development environments has resulted in a diverse landscape of remote sensing data types and formats. The following is an overview of some of the most commonly used. 9.4.1 Sensors Commonly divided into active and passive categories, virtually all sensors operate by recording reflected or emitted electromagnetic radiation, normally within specific wavelength ranges. They range in size from large electronic systems with several modules weighing hundreds of kilograms to small micro-sized devices with low power requirements that easily mount on small- to medium-sized UAVs. They almost always structure their data as fixed-line length raster arrays. A notable exception to this are active lidar scanners where raw data is structured as point-vector 3D layers. Table 9.2 gives the typologies of some common sensors currently in wide use. Due to the rapid development of miniaturization in digital cameras and

256

TABLE 9.2 Sensor Types and Data Characteristics Digital Sensor Type

Resolution

Wavelength Sensitivity/Channels

Common File Types

Comments

Rapidly changing, currently 2.5 cm ground 4-channel imaging resolution is fairly common. Public domain common, visible blue historical imagery (scans from analog (0.4–0.5 μm) green images) and digital full-frame camera (0.5–0.6 μm) red (0.6– orthoimages can be acquired through 0.7 μm) and Near IR EarthExplorer (https://earthexplorer.usgs. (0.7–1.1 μm) gov/).

GeoTIFF, HDF, ArcGRID

These are commonly used on aerial platforms operating in the atmosphere at elevations ranging from very low (tens of meters) to very high (up to 15,000 meters). Because resolution (ground pixel size) is dependent on platform elevation, these are often used for producing orthophotos (georeferenced aerial images) as the final product.

Multispectral cross-track scanner

Common spaceborne medium resolutions range from 10 to 30 meters with frequent revisits. UAV applications can produce extremely high resolution (as fine as 1 cm) with low altitude scans. Multispectral scanner data in the public domain at various resolutions can be acquired from EarthExplorer (https://earthexplorer.usgs. gov/) or GloVis (https://glovis.usgs. gov/).

GeoTIFF, HDF, IMG

While discrete multispectral scanners have long been associated with well-known polar orbiting land remote sensing satellites (e.g., Landsat, SPOT, Sentinel) these types of sensors are now being used on UAV platforms much more commonly.

Thermal imaging scanner

Lower resolution (60–100 m) than Thermal IR ranges are multispectral sensors in spaceborne land approximately 8–15 μm, remote sensing satellites. Public-domain but the Landsat satellites thermal imagery from the Landsat with thermal channels Program (1984–present) can be acquired have taken advantage of from EarthExplorer (https://earthexplorer. the 10.5–12.5 μm range in usgs.gov/) or GloVis (https://glovis.usgs. particular. gov/).

GeoTIFF, HDF

Many meteorological satellites have thermal IR scanning capabilities, but since they are generally intended for sea surface temperature studies the resolution of these sensors (e.g., GOES, MODIS) have historically been rather coarse. Since the 1980s the Landsat Thematic Mapper and now the TIRS sensor on Landsat 8 have provided medium-resolution thermal IR sensing that has given much greater detail of thermal features on land and in near-coastal areas.

Medium resolution commonly acquire seven to ten channels across the visible, and near to mid (reflected) IR (0.4–2.5 μm). Lightweight UAV multispectral scanners may have up to five discrete channels in the visible and near IR.

(Continued)

GIS: An Introduction to Mapping Technologies

Full-frame aerial camera

Sensor Types and Data Characteristics Digital Sensor Type

Resolution

Wavelength Sensitivity/Channels

Common File Types

Comments

Hyperspectral scanner

Can range from 1 cm scale on UAV-mounted Typical maximum spectral systems to 30 m on spaceborne systems range covers 0.3–2.5 μm such as Hyperion, an experimental sensor using hundreds of on the EO-1 satellite. Hyperion data can be channels. Some sensors, acquired through EarthExplorer (https:// particularly those on UAV earthexplorer.usgs.gov/). platforms, cover smaller subsets of this range.

GeoTIFF

Hyperspectral sensors produce a continuous set of values for each pixel across a wide range of the visible and IR spectrum. Each channel covers a very narrow spectral range, normally 2-15 nm, producing hundreds of separate channels. Because this requires much more data to be processed simultaneously, the number of pixels across a single track will normally be substantially less than typical multispectral scanners.

LiDAR (light radar)

Raw lidar data is collected and stored as As an active topographic vector data, fundamentally points with sensor, lidar uses pulses three (x, y, and z) ordinate values in from visible and near IR reference to an earth model. Public-domain lasers, measuring the time lidar data for a growing number of sites are of their reflected return available through EarthExplorer (https:// using the speed of light. earthexplorer.usgs.gov/). Normally sensing in the near IR, precisely positioned sensors generate a point cloud of classifiable 3D data.

LAS, LAZ

Lidar has rapidly become a technology driving innovation in diverse fields such as archaeology, GIS and surveying, geology, hydrography, etc. Operating from UAV platforms, low-cost LiDAR sensors allow production of relatively inexpensive, precise, and rapid digital elevation models (DEM).

Radar

Shuttle Radar Topography Mission (SRTM) data (February 2000) can be acquired at 1-arc second (approximately 30 m) resolution for 80% of Earth’s land area. Available through GloVis (https://glovis. usgs.gov/).

GeoTIFF

The SRTM mission was conducted on a shuttle flight in 2000. The radar equipment had been previously flown on two shuttle missions in 1994. It was modified for the 2000 mission to collect return signals using two separate antennas on the shuttle, allowing highresolution elevation data to be acquired for land areas between 60° north and 56° south.

257

An active sensor, the Shuttle c-band radar collected interferometric data in the 5.8 cm microwave range. In ranges used for topographic mapping the advantage of cloud penetration has made radar imagery highly desirable for use in tropical regions.

Introduction to Remote Sensing and GIS

TABLE 9.2 (Continued)

258

GIS: An Introduction to Mapping Technologies

increasing widespread availability of UAV platforms, this is a dynamic area of sensor integration with GNSS and application development. The capabilities and sizes of the cameras used in aerial sensing have changed rapidly over the past decade and include greater sensitivity and precision in spectral discrimination and much finer resolution. Resolution is typically given as a linear ground-distance measurement of one side of a single pixel. Wavelength values or ranges are given in micrometers or centimeters (microwaves). Filetypes referenced in Table 9.2 for specific sensors are those commonly used in the systems discussed.

9.5 Remote Sensing Applications When used in GIS, remote sensing is often used to answer the two questions discussed in section 9.3.3. The first question is relatively simple, but because the subjects of GIS and geography are so complex (encompassing the world and everything in it) answering it can be quite involved. • What (and where) is it? In remote sensing this question would most likely be answered using some form of image classification, either unsupervised or supervised. In unsupervised classification an analyst determines the number of desired classes and using the ISO Cluster routine (ArcGIS Desktop 10.5, ISO Cluster Unsupervised Classification) operating on a multispectral image composite across visible and IR channels completes an iterative grouping, placing each pixel into the class it most closely resembles based on reflectance values recorded for that pixel. Field work then establishes the most likely identity of each class of pixels. Supervised classification starts with the analyst creating small training samples, vector polygons that outline small areas of land cover to be identified. Statistics from each of the multispectral channels in an image composite are collected using the training sample polygons and used to establish a “spectral signature” for each training sample. Using any number of different classification algorithms, these signatures are then compared to each pixel and each pixel is then categorized into one of the classes derived from the training sample. These classified images can then be used to analyze land cover. The “where” part to the question is answered by the tightly georeferenced imagery being used in the context of GIS. The resulting classified images are georeferenced using the same parameters as its parent images. The second big question follows almost intuitively: • When (and perhaps why) was it located there?

Introduction to Remote Sensing and GIS

259

The when question can often be answered by performing image differencing change analysis. This technique is used more generally to establish widespread changes that have occurred in an image area between any two points in time for which we have good multispectral imagery. Another possible resource for exploratory work on the “when” question is the Google Timelapse project (https://earthengine.google.com/timelapse/). There are a number of prerequisites to performing this type of analysis: • • • •

Tightly georeferenced raster imagery from two dates Completely cloud free (or as much so as possible) Imagery from the same channel or same spectral range Imagery with the same bit depth (8- or 16-bit normal)

Precise georeferencing ensures that pixels covering a particular feature or area in one image covers the same area in an image from a different date. Cloud-free images ensure that clouds (that will be detected as an area of great change) do not bias or corrupt the analysis. Imagery from the same channel or spectral range ensures that we are comparing reflectance in the same part of the spectrum in both images. The most common problem in performing this analysis with Landsat imagery regarding bit depth occurs when using Landsat 8 and previous Landsat data. Landsat 8 sensors collect data using a 16-bit integer (65,536 possible values), while previous Landsat satellites recorded reflectance data using an 8-bit integer (256 possible values). We normally down-convert Landsat 8 data to 8-bit integers before performing image differencing with data from an earlier Landsat multispectral sensor. After performing a simple radiometric correction to account for different sun elevations at different times of the year, we apply image differencing onto appropriate single-channel images. The outcome of this calculation is a difference image. When viewing the histogram of this image we can see that the form of the histogram is near normal and that the mean of the image data is relatively near zero. The closer to zero the mean of the image is gives us some idea of how successful our radiometric correction has been. The tails of the histogram are areas in the image that have changed substantially between the two dates, and those we are most interested in. If we subtracted the older image from the newer image, the highest values from the difference image would be the pixels that have become substantially brighter. Areas that have become substantially darker will be in the left or negative tail of the histogram. These values can then be classified either manually or using some other logic (Figure 9.14). It is also possible to link together two or more change analyses to compare rates of change in a particular area over longer periods of time (see Figure  9.15). For purposes of comparison, the analyst should determine the number of days between successive images and then normalize the

260

GIS: An Introduction to Mapping Technologies

FIGURE 9.14 Temporal image differencing using Raster calculator. Image 1 (newer) – Image 2 (older) produces a difference image where 0 values are no change, significantly positive values are substantially brighter and significantly negative values are substantially darker.

FIGURE 9.15 Sequencing temporal change analyses.

Introduction to Remote Sensing and GIS

261

determined degree of change as an area change/day unit of measurement (e.g., square kilometers/day). This produces comparable numbers between sequential change analyses of different durations, from events such as wildfire, drought, or deforestation. ACTIVITY 9.1 EXPLORING IMAGE ANALYSIS USING ARCGIS10.5.1 This lab activity will introduce you to image analysis using ArcGIS, providing a quick introduction to some of the more commonly used functions of this system. The data we use will comes from the EROS Data Center (http://eros.usgs.gov/) using an online portal called EarthExplorer (http://earthexplorer.usgs.gov/). There is a list of useful YouTube videos at the end of this activity. What You will Need

1. Dedicated flash drive (at least 4 GB capacity) 2. Notebook (for writing stuff down as you do it) 3. Pencil with eraser

Learning Goals

1. Learn to browse, select, download, and unpack georeferenced Landsat satellite data from the principal US archive for these data 2. Understand how to bring these data into ArcGIS and process them for high-resolution, large-format 24-bit color output 3. Analyze and interpret 24-bit color composite images 4. Learn the fundamental characteristics of Landsat data 5. Understand how to create a cartographically proper layout and export it as a georeferenced pdf

Starting Out… ArcGIS 10.5.1 Desktop (A10) is a complex application with several major modules that either run as components within the principal gateway (ArcMap) or as standalone applications. These include ArcCatalog, ArcGlobe, and ArcScene. Most of the work we will do will be accomplished using ArcMap. Start the application by clicking the Start menu (lower left) and selecting ArcMap from the list of programs. The UI can be customized by the user, mainly by opening and closing toolbars. The toolbars are the most variable and customizable part of the UI, and are accessed through the Customize menu. They remain open on the desktop until you close or move them. You can also have a number of windows open and anchored vertically on the right or left (e.g., ArcCatalog, ArcToolbox, Image Analysis window). You may also need to access extensions. These are separate modules that perform functions within A10, and may also be run as separate applications. This is only a quick introduction to help you get started. As you work you will become more familiar and comfortable with the workspace.

262

GIS: An Introduction to Mapping Technologies

Saving Your Work There are normally two options for saving and sharing your work in ArcGIS: either as a map file (mxd) or map package (mpk) file. In general, a map file is less portable than a map package since it does not contain the data used to make the map. If you want to take everything with you to make your map completely portable then create a map package. Assuming that you will be returning to the same workstation you will normally use the map (.mxd) file option. To see detailed instructions on these two procedures read the help sections “saving a map” and “creating a map package.” Remember if you create a map package all the data contained in your map is saved in the package. This means these files can be quite large, especially when they include raster layers. The new map package file should be slightly larger than the sum of all the data layers used to create it. In A10 most of the modules we will use in this course can be accessed either through the ArcToolbox, accessed through the Geoprocessing menu. These resources should be referred to from time to time to build familiarity. The Toolbox is a vast collection of modules that can be accessed through the UI and includes toolsets that are thematically arranged, along with individual tools normally deployed through dialogs requiring user input. The Catalog is used to do many different types of file management and processing functions—it can always be found in the Windows menu in ArcGIS. When it is anchored on the side of the Desktop it gives you quick access to the folders and files you need (Figure 9.16). The Image Analysis window packages some of the more commonly used tools from the Toolbox in a similar but thematic UI—it can be found in the Windows menu in ArcGIS. While it is not necessary to use this resource in this activity, take a look at it and get comfortable with its characteristics. A10 is a large and somewhat complex system, but once you get comfortable with working in this environment you will appreciate this complexity and the analytical power it offers. Accessing Satellite Imagery Using EarthExplorer The raster image files we are working with in this activity are Landsat Thematic Mapper (TM) images from Landsat 4 or 5, Enhanced Thematic Mapper (ETM) images from Landsat 7 acquired before May, 2003, or Operational Land Imager (OLI) images from Landsat 8 that have been georeferenced to the UTM coordinate system, a format known as GeoTiff. Landsat 8 (launched in February 2013) is the newest satellite in the Landsat program, which dates back to 1972. We begin by learning to download and unpack image files from Landsat using EarthExplorer (EE). This site provides easy access to a massive collection of Landsat, EOS, MODIS, AVHRR, and other civilian image datasets (Figure 9.17) and can be found at http://earthexplorer. usgs.gov/. The site provides intuitive tools for delivering data in a very usable format. There is also a help page at https://lta.cr.usgs.gov/ee_ help. We highly recommend a quick pass through the EE Tutorial on this help page before using the system.

Introduction to Remote Sensing and GIS

FIGURE 9.16  Image analysis window from ArcGIS Desktop 10.5.1; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

To download Landsat data from the EE you need to register using the link on the upper right. Once you are logged in we recommend that you use the Landsat Collection 1 Level-1 data found inside the Landsat archive, although you can look at any other of the collections. Landsat 8 is the most current system and has substantial improvements to previous Landsat satellites. It usually helps to search first for an area that you have some familiarity with, through travel or perhaps some place you have lived. When you

263

264

GIS: An Introduction to Mapping Technologies

FIGURE 9.17  EarthExplorer portal at https://earthexplorer.usgs.gov/.

have found an image that you are interested in studying more carefully begin your review using EE. Look for images that are clear and as cloudfree as possible. The toolbar at bottom of the left frame allows you to do a number of things that help you review the image quality and suitability. Explore these. The most important for us is the Download Options button, as shown in Figure 9.18. There may be different products available here, but the one that we use to do detailed work with Landsat data is a Level 1 Product. If this option is not present there are other ways to access the data (that are relatively

Introduction to Remote Sensing and GIS

FIGURE 9.18  EarthExplorer download options for Landsat 8 OLI imagery.

slow and tedious), but for this activity it is probably best to search until you find an image with this type of product available. Click the “Select Download Option” button. This will download a file with the ID name of the image you were viewing to your default download location. To work with these data your system will need to have a file extraction utility like 7-Zip installed (http://www.7-zip.org/). This utility will allow you to unpack compressed files in several formats, including GZIP, TAR, and ZIP. When you have downloaded your image archive from EE, extract this file with the 7-Zip file manager by right-clicking the file and opening with 7-Zip. After doing this extraction, take the extracted file and do the same process again. This double compression (kind of like a box inside a box) minimizes the size of the file being transferred and increases download speeds—the first file extracted is a GZ filetype, inside this is a TAR filetype. This second file (TAR) contains a number of files as shown in Figure 9.19 Extract these files to a secure location, in a new folder. If you have downloaded an image from Landsats 4, 5, or 7 the TIF files are

FIGURE 9.19  Landsat 5 TM channels from EarthExplorer, GeoTiff files for channels 1–7, georeferencing control points, and metadata.

265

266

GIS: An Introduction to Mapping Technologies

georeferenced raster images, channels 1–7 Landsat TM. These contain 8-bit values (integer values from 0–255) that are recorded reflectances for the particular spectral range of that channel. The filenames contain information that identifies the satellite, date acquired, and Landsat Thematic Mapper (TM) channel (e.g., “B10” refers to channel 1). Channel 6 is the TM channel (emitted energy not reflected) on Landsat 4, 5, and 7 and is at a lower resolution than the other multispectral (reflected) channels. In some applications, this may hamper its use in making a multichannel color composite, although A10 handles georeferenced images at different resolutions very well. There are also georeferencing and metadata files included. The MTL file, for example, contains metadata that describes many of the technical parameters of each image. This file can normally be opened in ArcGIS and is easy to read. You should place all of these in a folder with a descriptive name and archive them (back them up) in a secure location off the machine you have used to do the download. File/data management is an important skill. Try to keep folders and files in organized, logical order. When you extract image files from compressed archives name the folders they are stored in using descriptive filenames. This will become more important as the numbers of files, folders, and datasets begin to multiply. Working With These Files in ArcGIS There are a number of ways to work with these data. First, we will explore some of the generic tools in all versions of ArcGIS. You should begin by adding all of the multispectral channels to your project using the “Add Data” button. When they are added you should see them in the Table of Contents—they are arranged in your Data View like a stack of pancakes— the one on top is the one you see. You can click them on and off using the checkbox in each layer. Try to add them in order (e.g., channels 1 through 7. Single-channel images are usually displayed using a grayscale palette. In this palette, darker tones have lower reflectance values and brighter tones have higher values. You can change the palette using the Symbology tab in the Layer Features dialog. There are several analytical functions in ArcGIS Desktop that allow you to perform some of the main tasks used in remote sensing. The Data Management toolbox is a rich source of functionality. Most of what we will use here can be found in the Raster Processing toolset. In our current activity we will use the “Composite Bands” tool. Clicking this tool will open a dialog that allows you to create a stack of layers from the dataset you just downloaded. You will then be able to swap channels in and out from this stack into the RGB slots of the display system using Properties > Symbology controls on the multispectral composite, allowing you to create many different possible color composite images (Figure 9.20). Right now you will create your multiband image using channels 1–7, but at times you may not want to use all the channels. The order that the bands are listed in the Multi-value Input control box will determine the order of the bands in the output raster dataset. The channels will be renumbered 1–7 so be sure to keep them in the correct order to start with.

Introduction to Remote Sensing and GIS

FIGURE 9.20  Composite bands dialog with seven Landsat 5 TM channels loaded; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

You can save your output from the Composite Bands function to a number of different filetypes. You must specify the file extension when naming your output raster file if you are sending it somewhere other than the default geodatabase. If it is going to the default then you should leave the filename as is. Remember this file will be large. The output raster dataset takes the cell size from the first raster band in the list. By default, the output raster dataset takes the extent and the spatial reference of the first raster band with a spatial reference in the list. You should also pay careful attention to where the multiband image is being saved. This may take a while so be patient. Data is added to a data frame the same way it is added in any other ArcGIS application. After you have created your composite, spend some time trying out different combinations of channels in the red, green, and blue slots of the display system. You can use the Identify tool to view the red, green, and blue values in any pixel—use this tool to explore the different hues and tones in your image. The colors are a combination of 8- or 16-bit integers (probably displayed in the 24-bit space of VGA color used in your graphics card) in the sensor channels you have chosen for the red, green, and blue channels. Consider Figure 9.21. In the selected pixel (at the green “x”) of this Landsat 8 image from October 14, 2016, using channels 5, 3, and 2 for red, green, and blue, the bright red hues are in channel 5 where a brightness value of 15321 is recorded, twice the value of the other two channels (visible red and green). This suggests a well-irrigated fairway, at the Riverside Golf Club in suburban Riverside, Illinois. Remember that the order in which you place the channels in the composite dialog determines their order in the multiband image—they are renumbered in the composite image in a sometimes confusing way. Refer back to the Landsat pages above to determine the wavelengths of the different channels. Try to find a combination that seems to have the most visual information content, then move on to the next step.

267

268

GIS: An Introduction to Mapping Technologies

FIGURE 9.21  Using the Identify tool to investigate pixel-level reflectance values from 24-bit color composite; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Creating a Layout Zoom in to a particular feature you are interested in (say a city, mountain range, lake, etc.) and use the Layout module to make a nice visualization of that image. You can find a useful introduction to the Layout module at the help topic “A quick tour of page layouts.” Make sure you uncheck the “Use Printer Paper Settings” box, and then set the size to 36" by 24" in the Page and Print Setup of the File menu. Select “Landscape” and click the “Scale Map Elements proportionally...” box. Make sure to insert a descriptive Title, North Arrow, Scale Bar, and Text box that describe all the technical particulars of your image. Your text box should include technical details that are extracted from the metadata file included with your image such as the date the image was acquired by the satellite, channels used in the composite with their wavelength ranges, sun elevation, sun azimuth, etc. Get in the habit of including this technical data in each image you publish.

Introduction to Remote Sensing and GIS

269

You should also insert another text box with very small font (~12 point) for the name of the person responsible for this. Give yourself at least a 1-in. margin around the edges of your layout to make certain it all makes it onto the pdf you will create next. When you have your map/image where you want it to be in the Layout module, use the File > Export Map... function to create a pdf copy of your map, including your surname(s) in the file name and the settings as shown below. Save this in a secure location. We highly recommend that you also check the box in the “Advanced” tab to Export Map Georeference Information (see the following). Exporting a map with the Export Map Georeference Information option enabled will record each data frame’s georeference information inside the pdf. When the georeferenced pdf is opened in a compatible viewer, such as Adobe Reader 9 or later, the user can access geospatial functions such as coordinate readout and finding x, y (more information on this can be found in ArcGIS Help). Get in the habit of opening this file before submitting to make sure you have a valid pdf for your portfolio, among other things. Finishing Now that you are almost finished, write a short (about 250 words), articulate, technical description of the process you just completed. A good technical description should: • describe the area you have chosen to map, pointing out some interesting aspect of the image (e.g., the reason you chose particular colors or the combination of brightness values) in your analysis; • explain in clear language the process you used to create your image using ArcMap; • clearly state the TM, ETM, or OLI channels you used to create your three-color composite image noting the spectral ranges and explaining the general color patterns in your image; and • include technical data from the metadata file, especially the image ID, acquisition date of the image, solar elevation and solar azimuth, spacecraft, and sensor.

References Jensen, J. 2015. Introductory Digital Image Processing: A Remote Sensing Perspective, 4th Ed. Pearson: Glenview, IL. Kokaly, R. F., Clark, R. N., Swayze, G. A., Livo, K. E., Hoefen, T. M., Pearson, N. C., Wise, R. A., Benzel, W. M., Lowers, H. A., Driscoll, R. L., and Klein, A. J. 2017. USGS Spectral Library Version 7: U.S. Geological Survey Data Series 1035, 61 p. doi:10.3133/ds1035.

270

GIS: An Introduction to Mapping Technologies

Lillesand, T., Kiefer, R., and Chipman, J. 2015. Remote Sensing and Image Interpretation, 7th Ed. Wiley: Hoboken, NJ. Rudd, R. 1974. Remote Sensing: A Better View. Wadsworth: Belmont, CA.

Video Resources Opening Application (http://youtu.be/oYh-NiTUyQw) Arcmap Raster Options (https://youtu.be/toTJtWKtosg) Help System (http://youtu.be/NGAzPEKN0YQ) Table of Contents (http://youtu.be/VRvr_afHoXw) Toolbars (http://youtu.be/5EMG4Y4soUg) Extensions (http://youtu.be/aRRGpRcus7w) Saving an mxd (http://youtu.be/fW2WyggF-Gs) Toolbox (http://youtu.be/E2rdRQlsMio) Catalog (http://youtu.be/6CjohuTDTzQ) IA Window (http://youtu.be/w6Mpq3i3rKw) Earth Explorer (http://youtu.be/wRTLs7lH-JM) Composite Image (http://youtu.be/djTAi45O3tg) Layout Intro (http://youtu.be/dAR2ews8nSY) Layout 1 (http://youtu.be/fRmpzVCIm-s) Layout 2 (http://youtu.be/cJdk9LUViiM) Layout 3 (http://youtu.be/1j9KbXs88i0) Layout 4 (http://youtu.be/wqF4j3uQf1Q) Using Basemaps (http://youtu.be/A71NZDH118E) Exporting a Pdf (http://youtu.be/WUYFPGOyexE)

10 Introduction to Health GIS Applications

10.1 Introduction Many contemporary textbooks about cartography and medical GIS relate the story of John Snow and his mapping of the 1849 cholera epidemic in London as the start of medical mapping, if not the first geographical analysis that paired mapping human events with statistical analysis. His seminal publication, “On the Mode of Communication of Cholera,” suggested that cholera was transmitted through water rather than “bad airs” or “miasma,” which was the predominant ideology at the time. He conducted surveys and analyzed the spread of cholera and found that most cases were involved in some way with the “Broad Street Pump.” Soiled linens of a cholera carrier had been dumped into the well water source, causing cases of cholera during the epidemic to be traced to the pump. Supporting his argument, a local brewery’s workers (who typically drank beer from the brewery during their work day that was uncontaminated due to the boiling process of the manufacturer) in close proximity to the pump were spared from the cholera epidemic. John Snow illustrated the epidemic in the form of a map, which showed deaths by residence as ticks and made a strong visual case for the Broad Street Pump to be removed to help the cholera epidemic, shown in Figure 10.1. However, the impact of John Snow’s cholera map is often overstated. First, by the time the Broad Street Pump’s handle was removed, ending access to the water, the epidemic had already “burned out,” meaning that the epidemic had slowed down and was not generating enough new infections to continue the epidemic. Secondly, at the time the miasma theory of transmission of disease was still the predominant view and it was several decades before the scientific consensus shifted to the contemporary germ theory of disease. Nevertheless, the innovation of John Snow’s approach is now widely known and heralded as one of the founding moments in epidemiology and considered as an example of early public health mapping. Today, GIS are essential tools for many epidemiologists, public health advocates, and others in the health sector such as insurance adjusters and emergency responders. You will find GIS on computers in the Environmental Protection Agency and at the Centers for Disease Control and Protection. 271

272

GIS: An Introduction to Mapping Technologies

FIGURE 10.1  John Snow’s cholera map.

In  this chapter we will cover GIS tools used in the health sector, but also those used in other sections. Note that in addition to ESRI ArcMap, this chapter uses GeoDa and QGIS software packages, which are free and open source and covered in more detail in Chapter 8.

10.2 Accessibility Public health practitioners are often concerned with accessibility. Using GIS we can investigate whether a neighborhood has a higher proportion of senior citizens (controlling for area and population) than other neighborhoods, and if so, whether it would be prudent to plan for increased handicap accessible transit stops in the area due to increased mobility issues among

Introduction to Health GIS Applications

273

the elderly. We can also use GIS to determine whether neighborhoods that have increased risk of HIV/AIDS have clinics accessible to them that offer STD testing or other related services. Especially in urban regions, the transit distance between people at home and a health clinic is not just a matter of point A to point B; the transit distance is affected by road conditions, traffic lights, one-way streets, and other barriers. We call a distance between two points that is direct Euclidean distance. If you were traversing a park from one end to the other with nothing between both ends besides grass, you would be able to walk a straight line according to Euclidean distance. On the other hand, Manhattan distance refers to distances that are not entirely straight lines, which might include turns and twists similar to city roads (and the term gets its name from the Manhattan borough of New York City). When approaching the question of how many people a health clinic can serve, there are many factors to consider that go beyond only the Euclidean distance and beyond road networks as well. In GIS, the term “network analysis” is used to describe operations that investigate transit times, service areas, and other GIS problems that depend on knowing and understanding a road network. Network analysis in GIS is a complex topic that incorporates many insights and algorithms that originate from the fields of computer science and mathematics. Dijkstra’s algorithm, invented by computer scientist Edsger W. Dijkstra in 1956, is especially relevant here as decades later it is still used both in computer science and GIS. The network analysis tools in ArcMap and other GIS tools still use a variant of Dijkstra’s algorithm to find the shortest distance between two points by the cost to travel. The algorithm works by simulating every combination of paths from an origin to a destination and then chooses the path with the lowest cost. For efficiency purposes, if we find that a given Path A from an origin to the destination will cost X, and another potential path starts to exceed X even without completing the journey, we know that Path A will remain the shortest known path. A visualization may help here. Figure 10.2 shows five nodes: A, D, C, F, and E. Connecting the nodes are lines with set “cost,” which represents a variety of real-world conditions such as windy roads with low speed limits that cause travel to be slower, high-speed highways, temporary road conditions such as construction, etc. Imagine that we want to travel from node A to node F, as shown in Figure 10.2. Even though traveling from A–C–F (total cost of 8) would only involve traveling between three nodes, it would have a higher cost than traveling A–D–C–F (total cost of 7). We know very quickly that the other routes are not feasible, with A direct to F having a cost of 50, A–D–E–F having a cost of 46, and so on. Based on what we know about the costs along path A–D–C–F, it seems to be the most efficient path. As you might imagine, the real world has more complex topologies than this, and also more complex costs that can be hard to determine. Imagine roads near schools; during mornings and early afternoons, they have heavy restrictions (speed limitations, school buses that make multiple stops, etc.) and heavier traffic, but

274

GIS: An Introduction to Mapping Technologies

FIGURE 10.2  Network analysis paths.

during other times they may be part of an optimal path. Ambulances also have different rules of the road than civilian vehicles. The more carefully we consider these and other factors, the more accurate and complete our analysis of cost-distance will be, but the more complicated the process gets. Finding up-to-date data on roads that is topologically correct and includes attribute data that present a most complete picture of road conditions and hazards can be a challenge for GIS analysts and often demands significant data preparation and cleaning even when available. This can make commercial solutions valuable. If you are part of an institution that has the ArcGIS product suite, you might have access to an ArcGIS Online organizational account. ArcGIS Online also offers trial accounts and personal accounts, but keep in mind that for network analysis you may have to pay separately for Service Credits if you do not have an organizational account. In ArcMap, after signing in to your ArcGIS Online account within the application, you will have access to “Ready to Use Services” in the ArcToolbox that use ArcGIS Online servers to complete processing (rather than such processing occurring in the desktop application). This can simplify network analysis since, instead of having to provide a topologically correct roads network dataset for your study area, or having a desktop network analyst license, with a few clicks you can run the needed analysis off of ESRI’s servers using their prebuilt network datasets. Figure 10.3 shows the Ready to Use Services available from within ArcMap. After signing in to ArcGIS Online through ArcMap, then opening ArcToolbox, scroll down to the “Ready-To-Use Services,” expanding the folder and then expanding Logistics and World. “ClosestFacility” can be used to determine the closest facility to a point while accounting for traffic and cost of traveling, mimicking the “Closest Facility Analysis” tool within ArcMap. “LocationAllocation” runs the Location-Allocation analysis tool that chooses which of a given list of facilities is most suitable to serving a hypothetical population. “ServiceAreas” runs the “Service Area Analysis” tool and computes the surrounding area

Introduction to Health GIS Applications

275

FIGURE 10.3  Ready-to-use services within ArcMap’s ArcToolbox; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

that a facility can serve given the cost distance surrounding it. Other features involving routing can also be called through this service. You can read more about the ArcGIS Online Ready to Use Services for Network Analysis at http://desktop.arcgis.com/en/arcmap/10.5/extensions/network-analyst/ consuming-the-logistics-services-from-arcmap-10-2-.htm. All the tools you would use if following the Hosted Services options exist in ArcMap, just in different locations (and slightly different names in some cases). To find them, go to “ArcToolbox” and then click on “Network Analyst” and then expand “Server” as seen in Figure 10.6. You should see “Generate Service Areas” and “Solve Location-Allocations,” for example. If we were to click on and open Generate Service Areas tool, we would be presented with several options to best determine the constraints of a service area given the facilities we pass as input. Running the Generate Service Areas allows us to simulate, for instance, how far from a given hospital facility an emergency vehicle can drive within set break points of time, how far in each direction an emergency vehicle can drive within 8 min of driving time, and so on. The tool uses data about network topology and given restrictions and time constraints and generates polygons that represent the service area given the parameters. In Figure 10.4, for example, we see the option to set whether uturns at junctions are allowed; in some municipalities, everyone can do a uturns, but in others, only emergency vehicles are allowed. We can also set whether the service area would be of a taxi, a bus, or an emergency

276

GIS: An Introduction to Mapping Technologies

FIGURE 10.4  Generate service areas prompt; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

vehicle among other examples; if we set “Impedance” to Drive Time and set the break values to be in minutes to generate service areas in polygons. One of the key disadvantages of using the Ready to Use Services available within ESRI ArcMap is that you need to consume ArcGIS Online “service credits” to access them. However, with some additional preparation, freely available roads datasets can be used in network analysis, albeit with more work. For example, ArcGIS Editor for OpenStreetMap can be used to designate a region to download from the OpenStreetMap database and convert it into a network dataset (http://www.esri.com/software/arcgis/extensions/openstreetmap). Previous versions of ArcGIS Editor for OpenStreetMap can be found at https:// github.com/Esri/arcgis-osm-editor/wiki. OpenStreetMap is a volunteerrun project where individuals contribute data about street-level changes and other information, such as speed limits and one-way streets. It is organized similar to the Wikipedia collaborative encyclopedia project; volunteers sign up using accounts and contribute information. Protections are in place to prevent “vandalism” of the mapping project by malicious users, so you can be reasonably sure that information will be accurate (but may be outdated or missing, depending on the region of focus). With ArcGIS Editor for OpenStreetMap, you can directly import OSM data into ArcMap by creating and importing the downloaded data into a network dataset for use in network analysis. To get started with a network dataset in ArcMap, you have to create a “file geodatabase” for the eventual feature dataset to be stored in, which can be done through ArcCatalog. Once you have the file geodatabase created and the ArcGIS OSM Editor tool installed (which can be found within ArcMap through the ArcCatalog nested in the System Toolbox), expand the System Toolboxes to show the OpenStreetMap Toolbox as seen in Figure 10.5 (OSM Toolbox image). The “Download, Extract, and Symbolize OSM Data” tool will download OSM data directly into ArcMap based on your settings.

Introduction to Health GIS Applications

277

FIGURE 10.5  OSM toolbox; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

Once you open the tool, you will see a few options, as shown in Figure 10.6. The Extent of Data Download is especially important since downloading from the OpenStreetMap servers in this fashion is limited to 50,000 nodes (at the time of this writing) each time you run this tool. What that means in practical terms is that you will likely have no issue downloading OSM data if your screen is centered around a small city or town, but if you have your display or extent set to a densely populated area such as Chicago or London, it will likely exceed the 50,000 nodes and fail to download. A node, in this case, refers to data points such as a storefront, an intersection between streets, etc.; the denser the area, the more careful you have to be with extent while using this tool. Also note the “Extract OSM Tags into Standalone Attributes.” Checking this box ensures that, in addition to streets, you are also getting important attribute information relevant to analysis, such as speed limits, which can be used in network analysis. Set the Target Feature Dataset to the file geodatabase created earlier and it will create a symbolized OSM network dataset that can be used in network analysis. You may run into issues downloading for dense, urban areas using this tool. As an alternative, you can request regional extracts from OSM, or even the entire “planet.osm” dataset that contains all OpenStreetMap data for the entire planet. As of 2018, this dataset is over 60 GB in size! Since OpenStreetMap is a nonprofit organization run by volunteers and donations and large downloads costs money and resources, be sure to download only

278

GIS: An Introduction to Mapping Technologies

FIGURE 10.6  OSM download tool; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

the regional extracts relevant to your scope of analysis. More on downloading regional extracts (rather than using tools like ArcGIS OSM Editor tool with built-in limits) can be found on the OpenStreetMap wiki: https://wiki. openstreetmap.org/wiki/Planet.osm#Country_and_area_extracts. Figure 10.7 shows an example of running a Generate Service Areas tool.

FIGURE 10.7  Example service area (from ArcGIS documentation).

Introduction to Health GIS Applications

279

10.3 Accessibility in Design Accessibility also refers to ensuring our results and maps are accessible to those with visual impairment. As maps are an inherently visual medium, this typically involves including as much semantic and descriptive information encoded into GIS as possible, so that screenreading programs can accurately reproduce information encoded in a map through speech synthesis programs. It is sadly the case that these map-making steps to provide accessibility are not always taken. Being mindful of how those with color-blindness is another aspect of map design that GIS practitioners should consider it. Some estimates suggest that the worldwide prevalence of colorblindness is 4.5% of the entire population. It is a common misconception that color-blindness means the lack of ability to perceive colors; typically, colorblindness presents as a decreased ability to discern between red and green, especially if the hue of the two colors are similar, otherwise known as protanopia. This means if some basic considerations are kept in mind, maps can be made accessible to people with colorblindness. Luckily, these same considerations are also a part of good map design. Dr. Cynthia Brewer, a cartographer and researcher on color theory, has several tools and resources related to balancing color in maps available to the GIS community. One of the tools created by Dr. Brewer, ColorBrewer (http://colorbrewer2.org), allows users to simulate color schemes, with the option to make them “colorblind friendly.” When checked, users can export color schemes by hex value. Hex values allow the precise calibration of a color value to be communicated. “Hex” is short for hexadecimal notation; a hexadecimal number has values that range from 0 to 15, with values above 9 being represented by letters (A would be 10, F would be 15). There are six digits in hex color values, with two digits corresponding to red, two digits corresponding to green, and two digits corresponding to blue, making up the “RGB” color composite value. The two digits of each color value are multiplied to create up to 256 values of each color intensity; for instance, a hex color value of “4C” would be a value of 76, and a hex value of 01 would be a value of 1. In the RGB color model (used by most digital devices), variants of colors are created due to different intensities of red, green, and blue. For example, with full intensity of each color producing white, or a mix of red and green with low blue intensity producing yellow. So a hex color value of #FFFFFF would have the max intensity for each color value (256, 256, 256) and be paper-white. A solid yellow would be #FFFF00, with the max values for red and green and zero for blue. Figure 10.8 shows the ColorBrewer website. In practical terms, choosing hex color values during map design that do not overlap too closely will produce maps that are less confusing to all users. Using the ColorBrewer tool, you can prototype color schemes and then save the hex values for use in GIS programs. Most GIS programs (and

280

GIS: An Introduction to Mapping Technologies

FIGURE 10.8  ColorBrewer website.

FIGURE 10.9  QGIS screenshot showing how to specify colors by hex value.

Introduction to Health GIS Applications

281

other image processing software) allow users to input hex values rather than choose from a color picker or color wheel. Figure 10.5 shows an example from the QGIS Symbology tab, under “HTML Notation.” Being selective about the colors used in map classification is good practice for considering those with color deficiency, but it is also good practice for overall map design as going beyond five color classes reduces the viewer’s ability to discern patterns (Figure 10.9).

10.4 Cartograms A “cartogram” is a popular data visualization technique that distort areas of a map based on the intensity of values. Prior to computerized GIS, cartograms were constructed painstakingly by hand. Modern image processing algorithms have enabled quicker and easier cartograms that display exaggeration of regions according to value. For example, an epidemiological cartogram that exaggerates region based on density of new cases of an illness. Figure 10.10 shows a cartogram constructed using QGIS. For information on installing and using QGIS, refer to Chapter 8. Notice that in Figure 10.10 we have not only the cartogram but also a complete legend and layout. We can work backward from the image to understand how it was created through QGIS. Figure 10.11 shows the Plugins view of QGIS. Searching “cartogram3” shows us the Cartogram plugin in QGIS, which then shows up in the Vector menu once installed. Running the Cartogram plugin allows us to specify an input layer and then the variable that will be exaggerated, along with a number of iterations that will be performed. The more iterations, the more exaggerated the cartogram will be. But how do we configure the layout and legend of QGIS as in Figure 10.10? In QGIS, after setting up our layers and running the cartogram tool, we can create a print composer. We do this by going to Project and then clicking “New Print Composer” and giving the print composer a name. We start with a blank “composer,” but we can fill it with our cartogram by clicking “Layout” and then clicking “Add Map.” We then “draw” a space for the map inside the blank area, and the layers will draw within the print composer. From there, we can resize this window and add a legend or scale bar by clicking Layout and then those options. Every map should have a legend. Once we have placed the legend in the print composer, we can go to Item Properties and customize the text. Figure 10.12 shows that using the conventions of HTML, we can customize the font and add effects and colors using HTML code. HTML is commonly used in applications such as QGIS and ArcMap for formatting. Of course, many GIS applications allow styling and

282

GIS: An Introduction to Mapping Technologies

Community Areas of Chicago Exaggerated by Cumulative Infant Death 2005-2009

Legend Hardship Index (natural breaks) 1 - 17 17 - 38 38 - 58 58 - 78 78 - 98

Highest three community areas by infant mortality 2005-2009:

Austin - 125 Lawndale - 68 Humboldt Park - 61 Lowest three community areas:

0

2.5

5

7.5

10 mi

Burnside - 3 O'Hare - 2 Armour Square - 1

FIGURE 10.10  A cartogram of Cook County, Illinois, exaggerated by cumulative infant death. O’Hare, which is the site of an international airport and otherwise mostly elderly population, noticeably almost shrinks to nothingness between the two side-by-side images as infant mortality is not a major health concern in this area.

FIGURE 10.11  The Plugins menu of QGIS, with the “cartogram3” plugin.

Introduction to Health GIS Applications

283

FIGURE 10.12  Item properties of a legend that include HTML tags. A great resource for learning HTML is “https://www.w3schools.com/html/,” but here we can see a few examples of HTML. The

tags surrounding “Highest three community areas by infant mortality 2005–2009:” indicate that this line should be in bold font and that it is the beginning of a paragraph of text. The

    indicates that the cities should be in a listed format separated by squares, which you can see reflected in Figure 10.10.

    formatting without using any HTML code, but understanding basic HTML allows more advanced customization.

    10.5 Ring Maps In the last section we used a plugin to show a feature not normally included in the basic installation of QGIS. But other advanced GIS tools are only

    284

    GIS: An Introduction to Mapping Technologies

    available via programming scripts and models that must be imported into a GIS program manually. Often, these are specialized programming scripts written by a researcher or GIS professional for a specific use case and then released to the community. GIS programming is outside the scope of this book but understanding how to use and implement programming scripts written by other GIS professionals is a key skill. Both ArcMap and QGIS, the most dominant GIS programs, use Python for custom scripts, so a burgeoning GIS professional is highly advised to look into learning Python programming. In this case we will be using a Ring Map script from Minpa Lee, found at https://github.com/mapplus/qgis-scripts/blob/master/scripts/Create%20 Ring%20Maps.py. It is also located in the supplemental lab files. Ring Maps create concentric rings around a centroid and allow for a time series visualization of a dataset, providing an alternative to chloropeth maps or animations. Once you have the Python script downloaded, go to the Processing tab of QGIS and then click Toolbox. Figure 10.13 shows the Processing Toolbox window. We click “Add script from file” to import the Python script. We can then double-click the resulting script, which will show as [GEEPS] and run the program on our dataset. Since the script is “homemade” we may need to read the documentation on how it works, or in some cases, visit the creator’s website. In this script we have several buttons: “Input Vector Layer,” “Comma-separated fields or Ring Number,” “Output Ring Value Field,” and “Ring Gap.” We also have “Output ring anchor” and “Output ring maps,” which are by default set to temporary locations. These options are shown in Figure 10.14. In this case

    FIGURE 10.13  Processing Toolbox window.

    Introduction to Health GIS Applications

    285

    FIGURE 10.14  “Create Ring Map” options.

    our vector layer is “Diabetes_Incidence_Illinois.” The northwestern counties have been joined to a diabetes dataset that records several years of diabetes incidence rates for each county. Under “Comma-separated Fields or Ring Number,” we add a list of fields that will be used to generate the rings in that box, separated by commas. For instance, if we have three fields we want to make three rings out of, and they are called “2004,” “2005, and “2006,” we would type “2004, 2005, 2006” in that box. “Output Ring Value Field” transposes a field that will contain the value for the ring polygon; since the resulting rings will be an entirely new shapefile we need to specify where the new values will be placed. Finally, “Ring Gap” is a completely optional setting that places gaps between the rings; if the value is 0 it will be one continuous ring with the different values touching. In Figure 10.15 we can see the results of running this script. GIS are used in many different fields, such as in marketing, epidemiology, environmental science, and so on. In each field, GIS professionals use specialized tools, scripts, and plugins to meet the needs of their projects. GIS programming can also enhance GIS functionality, as discussed in this chapter.

    286

    GIS: An Introduction to Mapping Technologies

    FIGURE 10.15  Ring map results. Notice that there is no way to tell from this image what the numbers correspond to (diabetes incidence rates per 1,000 persons each year) and which rings are which years (inner rings are earlier, outer rings are later). This is precisely why careful map design with a legend and explanation is vitally important.

    References Brewer, C. A. 2016. Designing Better Maps: A Guide for GIS Users. Esri Press: Redlands, CA. Duffey, S. 2012. A More Accessible Map. A List Apart the Full. http://alistapart.com/ article/cssmaps. Accessed April 30, 2018. Kalkbrenner, A.E., J. L.Daniels, M. Emch, J. Morrissey, C. Poole, and J. C. Chen. 2012. Geographic access to health services and diagnosis with an autism spectrum disorder. Annals of Epidemiology. http://europepmc.org/articles/PMC3476466. Accessed April 30, 2018. Esri/arcgis-osm-editor. GitHub. https://github.com/Esri/arcgis-osm-editor/wiki/ Create-a-network-dataset-from-osm-data. Accessed April 30, 2018.

    Introduction to Health GIS Applications

    287

    Fernandes, H., N. Conceição, H. Paredes, A. Pereira, P. Araújo, and J. Barroso. Providing accessibility to blind people using GIS. Universal Access in the Information Society 11 (4): 399–407. doi:10.1007/s10209-011-0255–7. Frerichs, R. R. 2017. Deadly River: Cholera and Cover-Up in Post-earthquake Haiti. Cornell University Press: Ithaca. Grady, S. C. and I. J. Ramírez. 2008. Mediating medical risk factors in the residential segregation and low birthweight relationship by race in New York City. Health and Place 14 (4): 661–77. doi:10.1016/j.healthplace.2007.10.011. Graser, A. 2016. Learning QGIS: The Latest Guide to Using QGIS 2.14 to Create Maps and Perform Geoprocessing Tasks with Ease. Packt Publishing: Birmingham. Johnson, S. 2006. The Ghost Map: The Story of London’s Most Terrifying Epidemic—And How It Changed Science, Cities, and the Modern World. Riverhead Books: New York. Lee, M. On Spatial. [QGIS] Create Ring Maps Using PyQGIS Script. http://www. onspatial.com/2014/01/qgis-create-ring-maps-using-pyqgis.html. Accessed April 30, 2018. Pearce, J., R. Barnett, and S. Kingham. 2006. Slip! Slap! Slop! Cutaneous Malignant Melanoma Incidence and Social Status in New Zealand, 1995–2000. Health and Place 12 (3): 239–252. doi:10.1016/j.healthplace.2004.11.006. QGIS Documentation. 2018. Documentation. https://www.qgis.org/en/docs/index. html. Accessed April 30, 2018. Roger, S. Robin’s Blog. http://blog.rtwilson.com/john-snows-cholera-data-in-moreformats/. Accessed April 30, 2018.

    11 GIS and Digital Humanities

    11.1 Digital Humanities Meets GIS Digital humanities is an emerging interdisciplinary field concerned with applications of computational methods to the subjects of the humanities disciplines, including literature, history, philosophy, and art. It is only recently that digital humanities scholars have discovered GIS. This is mainly because the concept of space and places—despite its clear influence on the human condition—has not been a predominant mode of representation and inquiry among humanities scholars. However, there is an ever-increasing collection of digital humanities projects that use GIS as the main computation method (Levin, 2011). Historical events unfold over space, and thus mapping those events over time is instrumental in revealing any underlying processes. With a few clicks on an interactive animated map, you can make sense of the complex political history of Africa efficiently (Figure 11.1). Similarly, well-curated maps online can help you explore how residential segregation has evolved over time in relation to zoning, home loan, and urban renewal in St. Louis (Gordon, 2009; Figure 11.2). Archaeologists have used GIS to map archaeological artifacts and explore the spatial distribution of those artifacts in relation to natural and cultural features. Recently, GIS and LiDAR have been revolutionizing archaeological surveying. A recent study showed that LiDAR unveiled archaeological finds of the Lost Mayan city that are invisible to the naked eye (Wootson and Cleve, 2018). A searchable database of ancient place names has been developed for archeologic research and education. You can browse ancient places and locations at Pleiades, a community-built gazetteer of ancient places (Figure 11.3). A research team at Stanford University developed an application for estimating time cost between origin and destination by different modes of transportation in the ancient Roman World (Figure 11.4). The application was built on a network of known cities, places, and transportation routes. Literary scholars also have used geospatial tools and methods for digital literary mapping (Cooper et al., 2016). GIS were used to map places in literary works as a means to enhance text and corpus analysis. By mapping texts, geographic dimensions of literary works can be brought to light. Researchers 289

    290

    GIS: An Introduction to Mapping Technologies

    FIGURE 11.1 Screenshot of “Animated Atlas of African History 1879–2002” (from http://www.brown.edu/ Research/AAAH/map.htm).

    FIGURE 11.2 Screenshot of “Mapping Decline: St. Louis and the American City” (from http://mappingdecline.lib.uiowa.edu/map/); from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

    at Boston College mapped the journey of characters in James Joyce’s novel, Ulysses (Figure 11.5). The locations of characters in the novel were overlaid on historical maps registered to Google Maps. The web platform that combines literary texts with maps invokes readers’ imaginations through a spatial sense and helps them get a sense of what the City of Dublin was like at the turn of the 20th century.

    GIS and Digital Humanities

    291

    FIGURE 11.3 Screenshot of a query result at Pleiades (from https://pleiades.stoa.org/); from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

    FIGURE 11.4 Screenshot of ORBIS: The Stanford Geospatial Network Model of the Roman World (from http://orbis.stanford.edu/); from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

    292 GIS: An Introduction to Mapping Technologies

    FIGURE 11.5 Screenshot of “Walking Ulysses: Joyce’s Dublin Today” (from http://ulysses.bc.edu/).

    293

    GIS and Digital Humanities

    FIGURE 11.6 Path density calculated from the travel writing. (Reprinted with permission from Murrieta-Flores, P., Donaldson, C., and Gregory, I. GIS and literary history: Advancing digital humanities research through the spatial analysis of historical travel writing and topographical literature. Digital Humanities Quarterly, 11 (1), 2017); from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

    Literary scholars would face creative challenges as to how best to utilize geospatial methods to enlighten literary works. One research team combined works of travel writing with topographic literature, and used surface analysis to infer the paths that would have been taken (Figure 11.6; MurrietaFlore et al., 2017). Spatial interpolation (isarithmic mapping) can be used to represent vague and fuzzy constructs such as emotions that are revealed in the physical space from literary works. Density mapping could be useful in visualizing the intensity of events captured in text analysis. Recent developments in the intersection between GIS and digital humanities are positive forces for both. GIS bring spatial perspectives to the humanities disciplines and help diversify modes of representation beyond textual or temporal modes. New insight can be gained when space is construed as a medium in which the human condition is embedded. Subject matter can be better understood if put into context (space and time). GIS enable the integration of data across different domains through geography, and thus can serve to facilitate the interdisciplinary mode of knowledge production. Conversely, humanities scholars challenge the notion that GIS are only good for representing quantitative data. All in all, these developments cause creative disruption on both sides as they provide an opportunity for broadening a means for revealing the human condition and rethinking GIS through a humanistic lens.

    11.2 Thinking Spatially with GIS It is possible to examine digital humanities projects that use GIS in terms of spatial thinking. Spatial thinking can be defined as the cognitive process of representing and reasoning in space, about space, and with space (National

    294

    GIS: An Introduction to Mapping Technologies

    Research Council, 2006). Thinking spatially encompasses using spatial concepts such as location, distance, network, and scale to address questions such as the following (Hwang, 2013): • where things—objects, fields, and events—are (spatial distributions); • how things interact between places (spatial interactions, movement, flow); • how things are related in a place (spatial relationships, correlation); and • how things change in a place (temporal relationships, change). GIS support spatial thinking for digital humanities, as illustrated by the examples discussed in the previous section. You can find out where ancient cities are by looking up a digital gazetteer and discover archaeological finds by analyzing LiDAR data (spatial distributions). You can infer the movement of people by computing the travel cost between places in the Roman World (spatial interactions). By following the routes taken by the characters in Ulysses on an interactive map, you can reimagine sentiments of those characters embedded in places (spatial interactions). It is possible to explore how residential segregation relates to home loan practices by looking at archives that are curated on a map (spatial relationships). Finally, an animated map can assist in exploring how political entities ebb and flow in Africa over time (temporal relationships). To determine where things are, it is necessary to georeference data whether they appear as texts in literary works or in the form of spatial archives such as paper maps. To enter spatial archives into a GIS database, data need to be scanned and georegistered. In this process, local coordinate units such as scanner units are transformed into global coordinate units such as UTM. This enables historical maps to be aligned with base maps in GIS. We detail how to georeference images in Section 11.3. For GIS-based analysis to work, locations in text (such as place names) need to be translated into the metric locations (such as latitude and longitude). As part of geocoding, digital gazetteer services (as shown in Figure 11.3) can be developed to promote the shared understanding of place names and to turn imprecise locations into metric constructs that can be mapped in GIS. We describe how to geocode geographic feature names in Section 11.4. Various GIS tools can be used to explore how things change. Time is critical to defining spatial entities. For example, hurricanes cannot be fully characterized without the temporal dimension at a fine scale. There is no one-size-fits-all solution to representing time in geographic databases as all spatial entities are different with regard to interaction with time. We discuss how to represent time in a database in Section 11.5 and how to map changes in Section 11.6. A great deal of the constructs humanities disciplines are concerned with (such as emotions and sentiments) are difficult to measure and thus yield poorly structured data. This poses challenges to GIS that are amenable to quantitative data. We consider approaches to mapping qualitative data in Section 11.7.

    GIS and Digital Humanities

    295

    11.3 Georeferencing Images History researchers may want to overlay historical maps on current aerial images to uncover any changes. Figure 11.7 shows the map of 1893 Columbian Exposition overlaid on Google Maps. This requires the process of transforming unreferenced XY coordinates to real-world XY coordinates (Figure 11.8). Through coordinate transformation, an unreferenced source layer will be aligned with a geographically referenced destination layer. For this coordinate transformation to work, you need to have a geographically referenced destination layer to which an unreferenced source layer can be registered.

    FIGURE 11.7 Historical maps overlaid on Google Maps.

    FIGURE 11.8 Illustration of georeferencing image using coordinate transformation.

    296

    GIS: An Introduction to Mapping Technologies

    FIGURE 11.9 Displacement link table for coordinate transformation.

    If you use a GIS tool, you will need to enter a pair of source coordinates and destination coordinates (or ground control points) at the same location. Figure 11.9 shows the displacement link between source XY coordinates (srcX, srcY) and destination XY coordinates (dstX, dstY) at seven locations. The table is used to determine the relationship between source coordinates and destination coordinates. Four aspects of the source layer can be altered as a result of transformation: translation, rotation, scaling, and skew (Figure 11.10). The placement can be shifted (translation). The orientation can be changed (rotation). The size or dimension can be changed (scaling). Finally, the shape can be pulled at a slant (skew). When using a GIS tool, users are provided with different transformation methods, such as similarity (conformal), affine (first-order polynomial), and second-order polynomial. A similarity transformation allows rotation,

    FIGURE 11.10 Four aspects in which unreferenced data can be transformed to referenced data.

    297

    GIS and Digital Humanities

    translation, and scaling, but does not allow skew. An affine transformation allows changes in all four aspects. A second-order polynomial allows some curvature in addition to changes in those four aspects. The affine transformation is recommended for most transformations and requires at least three ground control points. A similarity transformation can be used if shape is not altered. At least two control points are required for the similarity method. A second-order polynomial requires at least six control points. The relationship between source coordinates and destination coordinates for the affine method is expressed as follows:

    x ′ = Ax + By + C



    y ′ = Dx + Ey + F

    where x′ and y′ are real-world coordinates from a destination layer, x and y is unreferenced (local) coordinates from a source layer, and A, B, C, D, E, and F are the six coefficients to be estimated. These coefficients control skew, rotation, scaling, and translation. With at least three control points (or six known x, y, x′, and y′ values), it is possible to solve for six unknown coefficients. It is important for control points to be placed as accurately as possible to ensure accurate transformation. It is a good practice to choose control points at an unambiguous location such as a road intersection or building corner to reduce uncertainty. It is also recommended that you spread out the placement of control points because uncertainty will be introduced in areas that lack them. After coordinate transformation, the transformed locations of source coordinates do not perfectly match with control points locations (i.e., destination coordinates). A GIS tool can calculate how much the transformed location deviates from the matched control point location for each of the displacement links. You can sum the squared deviation, divide the sum over the number of displacement links, and put a square root on that value. This results the average error, referred as the root mean square error (RMSE). The lower the RMSE the better the transformation. The RMSE provides auxiliary information about how accurate the coordinate transformation is.

    11.4 Mapping Texts Many text sources contain geographic feature names, such as Chicago, Chicago River, and Soldier Field. These text sources are not easily mapped using conventional address geocoding due to the ambiguity inherent in those names. For instance, “Lincoln Park” in Chicago is not only known as a neighborhood, but also known as Chicago’s largest public park. A primitive identity of Lincoln Park as a neighborhood did not exist until the

    298

    GIS: An Introduction to Mapping Technologies

    mid 1800s. Lincoln Park was recognized in the 1920s as a community area first by researchers, and much later by planners and residents in Chicago. “Lincoln Park” means different things to different people. Due to the vague and dynamic nature of some geographic features, it can be challenging to pinpoint location. These inconsistencies and contradictions among some geographic features is a problem to scientists and digital humanities scholars who need uniform and nonconflicting names. Thus, there is a need for standardization. To resolve this problem, many nations have established policies related to geographic feature names. Traditionally, directories of geographic feature names—known as gazetteers—have been used. Today print forms of gazetteers have been replaced with digital gazetteers, found online or in searchable databases such as the Geographic Names Information System (GNIS). The GNIS contains information about the physical and cultural geographic features in the United States, associated areas, and Antarctica, current and historical, including the federally recognized name of each feature and location by state, county, and geographic coordinates. Other attributes include names or spellings other than the official name, feature classification, and historical and descriptive information. The GNIS assigns a unique, permanent feature identifier as the only standard federal key for retrieving, integrating, or reconciling feature data from multiple datasets. Figure 11.11 shows the search results of “Lincoln Park” in Illinois on the GNIS website. This kind of network-accessible service that retrieves geographic feature names given a query request is called a gazetteer service.

    FIGURE 11.11 Query result from GNIS Domestic Name Search.

    GIS and Digital Humanities

    299

    Digital humanities researchers can geocode geographic feature names using a gazetteer service. Although gazetteer services may come in handy for mapping geographic feature names that appear in text, they come with limitations. First, you cannot geocode geographic feature names if there is no reasonable consensus as to whether a geographic feature exists and what it encompasses. Second, gazetteers are drawn from limited authoritative sources, such as topographic maps, and are not always complete. Indeed, downloadable gazetteer files in the GNIS database have many missing features. Given that places are landmarks of cultural ecology, it is necessary to consider various data sources (e.g., religion, political, economic, natural events) when building gazetteers. Crowdsourced geographic data such as Google Maps or OpenStreetMap should be considered as complements to authoritative gazetteers if currency and completeness is a priority. Third, it is not safe to presume the locations of geographic features are accurate one as the positional accuracy of geographic features is influenced by many factors, such as map scale, processing methods, feature type of data (point or polygon), and quality of source data. Fourth, note that you may be limited by your own knowledge since alternate (unofficial) names are probably only known to local residents. Finally, current gazetteer services rely on one-to-one matches based on text similarity, thereby limiting query results. Alternatively, proximity and spatial relationships can be considered to improve query results of geographic features.

    11.5 Representing Time in a Geographic Database Working with temporal or historical data is essential to exploring change. Historians tend to examine how events unfold over time without necessarily looking at space. Geographers tend to examine characteristics of places without necessarily looking at time. However, it makes more sense to understand the human environment through a lens of space-time than space-dominated or time-dominated views as time is intricately linked to space (Gregory, 2010). Literary scholars have largely organized and analyzed texts based on themes (topical areas), but texts can also be structured by space and time. Unexpected insight can be gained from organizing literary works based on the space and time in which they are set. The space-time view can provide an alternative lens through which humanities data such as literary work, archives, and artifacts can be interpreted. Before we consider different methods for visualizing changes from temporal data, it is necessary to look into how time is represented in a geographic database because visualization options are dictated by database requirements. For example, creating an animated map (time slider) in ArcGIS requires time to be represented at a row level in a geographic database. Time

    300

    GIS: An Introduction to Mapping Technologies

    FIGURE 11.12 Layer-level time-stamping for a geographic database. (Reprinted with permission from Armstrong, M. Temporality in spatial databases. Proceedings of GIS/LIS’88, 2, 880–889, 1988.)

    can be represented in a geographic database at three levels: layer-, column-, and row-level. First, a snapshot of a geographic phenomenon at a given point in time can be represented at a layer level. For example, each raster layer shown in Figure 11.12 represents a phenomenon at different points in time (T1, T2, …, Tn). If pixel values represent precipitation, then a layer A at T1 and a layer B at T2 will show precipitation in two time periods. Map algebra can be used to calculate differentials (B − A) to determine changes in precipitation from T1 to T2. Layer-level time-stamping is a common way to represent an amorphous geographic phenomenon such as temperature and ozone level at a particular time. A stack of temporal layers can be viewed as space-time cube if the third (temporal) dimension is a continuum. Time (T1, T2, …., Tn) can be ideally chosen when changes occur, but it is not always feasible to do so as natural geographic phenomena (such as temperature and ozone level) are continuously changing. Time is often chosen at a regular interval for statistical reporting (such as every 10 years for decennial census). Layer-level time-stamping can be implemented in any type of geospatial data (vector and raster), but it is not the most parsimonious method of representing time in a geographic database especially when no noticeable changes occur between two time periods. Second, attributes that change over time can be stored at the level of columns (fields) of a geographic database. Figure 11.13 illustrates how rural-tourban transition occurs in five areas over four time periods from T1 to T4. All five areas used to be rural at T1, an area with id 2 became urban at T2, areas

    FIGURE 11.13 Column-level time-stamping for a geographic database.

    GIS and Digital Humanities

    301

    with id 3 and 4 became urban at T3, and an area with id 5 became urban at T4. Column-based time-stamping is appropriate for representing changes in an attribute (such as population change and land cover change) over spatial features that do not change. For example, populations in US counties in 2014, 2015, 2016, and 2017 can be stored in four columns, respectively, as geographic boundaries of those counties have not changed during those time periods. Some geographic data experience both attribute change and spatial change over time. This is the most realistic way in which geographic phenomena go through change over time. For example, earthquakes with varying magnitudes occur at different locations at different times. Hurricanes are formed, touch down, and fade away with different amounts of rainfall over different time periods. Landform changes, although at a much slower pace than other dynamic phenomena. Attribute and spatial changes can be represented at row levels as illustrated in Figure 11.14. Examples show how three land covers (agriculture, urban, and industry) change over time (from t1, t2, to  t3). There used to be two land covers, agriculture (S1) and urban (T1) at t1. Agricultural land shrank from S1 to S2 at t2 where urban land expanded at t2. New industrial land was created, and urban land shrank at t3. To represent both spatial and attribute changes, an attribute for land cover (agriculture, urban, and industry) can be stored for each of the six space-time atoms (U, T1, T2, T3, S1, and S2) at a row level. The beginning and ending time periods should be stored for each object in a database. Row-level timestamping is the most versatile time-stamping method as it precisely stores temporal data per spatial object as changes occur.

    FIGURE 11.14 Row-level time-stamping for a geographic database. (Reprinted with permission from Worboys, M. A unified model for spatial and temporal information. The Computer Journal, 37 (1), 26–34, 1994.)

    302

    GIS: An Introduction to Mapping Technologies

    11.6 Mapping Temporal Data There are different ways to depict changes in temporal data with GIS. Suppose you want to map settlements in London by different time periods. It is possible to compress all the data into a single static map where darker tints represent older settlements (Figure 11.15a). Time can be represented in a geographic database at row level as time values need to be associated with each object (row) representing the spatial boundaries of the settlements. For instance, to depict how settlements expanded (Figure 11.15a) the map authors divided the time periods into four groups, and then drew boundaries representing settlements established during the four time periods. Four multipart polygons were digitized and time values were assigned to each feature (or object). Then the color value (lightness) was used as a visual variable to depict time. You can also create a single static map from geographic data where time is represented at a column level. Suppose you are interested in mapping Hispanic population change from 2000 to 2010 by county. The population

    FIGURE 11.15 Modes of depicting changes. (Reprinted with permission from Kraak, M.-J. and Ormeling, F. J. Cartography: Visualization of Spatial Data, 3rd Edition, Routledge, 2009.)

    GIS and Digital Humanities

    303

    FIGURE 11.16 An example of the static map from column-level time-stamping. (Reprinted with permission from The Texas Tribune, retrieved from https://www.texastribune.org/2011/03/25/maps-​ ­visualize-us-population-growth-by-county/.)

    in the year 2000 and the population in the year 2010 will be stored in two columns of a database table, and then the rate of population change can be populated in a new field. You can create a choropleth map (Figure 11.16) from the new field using a diverging color scheme, where a greenish color hue is associated with population increase and a red color hue is associated with population decrease, while color value (lightness) is graduated in proportion to the absolute magnitude within the spectrum of the similar color hue. Graphs can be combined with maps to visualize information from temporal data. Suppose you want to explore changes in income inequality during multiple time periods. You can create a bar chart map where the bar is chronologically ordered and the bar size represents an income inequality measure in each time period. For example, Figure 11.17 shows changes in Gini index (a measure of income inequality) by county from 1959 to 1989 using a bar chart map. Figure 11.18 shows how the motor vehicle theft rate changes over time. The mean motor vehicle theft rate over the entire time period was set as a baseline. If the theft rate is above mean, those rates are depicted in yellow. If the theft rate is below mean, those rates are depicted in blue. This kind of temporal graph map compresses the large amount of information (motor

    304

    GIS: An Introduction to Mapping Technologies

    FIGURE 11.17 An example of a bar chart map to depict changes from column-level time-stamping.

    FIGURE 11.18 Temporal graph map depicting changes (from Common GIS).

    GIS and Digital Humanities

    305

    FIGURE 11.19 An example of a series of multiple maps depicting changes (temperature 1900s–2000s).

    vehicle rate by different time periods in relation to the norm) into a set of visual variables that are easy to understand. You can create this kind of map from geographic data with column-level time-stamping. Although a single static map might represent information concisely, map readers might be overwhelmed if there are too many time periods. As an alternative to a single static map, you can consider a series of static maps that display settlements in London per time period for each map, for example (Figure 11.15b). Spatial boundaries of settlements in London can be created as a layer per time period, and each of the maps can be created from the respective layer associated with a time period. Figure 11.19 shows the average temperature in the 1990s and in 2000s as a series of static maps. The same data classification scheme was used to facilitate comparison between the two time periods. By comparing these two maps, it can be inferred that the overall temperature has risen over the last 100 years, especially west of the Mississippi River. Animation can also be used to depict changes. The series of maps discussed above can make up slides (or frames), and a series of those frames can be played in temporally ascending order (old to new). Animations can be created using different methods. If time is stamped at a layer level, then each layer per time frame can be used to make a map, and the map can serve as a frame of animation in a particular order. Each map per timeframe can be exported as an image file, and the sum of those image files can be put together to create a GIF file using freeware such as GIF Animator. The Time Slider tool (in the Tools toolbar) in ArcGIS can only work if temporal data is represented at row level. In other words, a column on time (such as beginning and ending time periods) needs to be present, and time values should be populated in the column for each object (row) to enable time before using the Time Slider. Daily activities or events that occur at a temporal continuum can be depicted using a space-time cube (Figure 11.20). In the space-time cube, the x- and y-axis represents geographic space in two dimensions, and the z-axis

    306

    GIS: An Introduction to Mapping Technologies

    FIGURE 11.20 Space-time cube depicting daily activities over time. (Reprinted with permission from Kraak, M-J. The space-time cube revisited from a geovisualization perspective. Proceedings of the 21st International Cartographic Conference (ICC). The International Cartographic Association, 2003.)

    (or the cube’s height) represents time. Figure 11.20 shows that a person stayed at a workplace from 8:10 to 17:30 and visited restaurants after work before getting back home at 19:55. This kind of map can be created using 3D modeling tools (e.g., ArcScene), where temporal values are set to the third dimension. This framework inspired by time geography demonstrates that space and time are fundamental dimensions of dynamic processes such as daily activities and climate change. Since geospatial data at high spatiotemporal resolution have become increasingly available, demand for mapping this type of data will only increase. For instance, GPS trajectory data can record location and time—(x, y, t) coordinates—of a person who carries a mobile device (such as smartphone) at every second or minute. Wireless sensors can record and send climate data on a real-time basis. Array-oriented scientific data are disseminated as NetCDF (Network Common Data Form) files that can be displayed and analyzed in GIS.

    11.7 Mapping Qualitative Data A great deal of the data generated from the humanities disciplines can be considered qualitative data. While qualitative data usually refers to categorical data (e.g., land cover, soil type) in the context of GIS, in this chapter qualitative data encompasses any information that can be captured that is not numerical in nature. The major sources of qualitative data in humanities disciplines include observation, interviews, and written documents. Many humanities data are not amenable to being measured in terms of content (e.g., sentiments and emotion), and will not be measured as precisely as other constructs whose measurement is well agreed upon. Humanities data are not well structured in terms of format. Humanities researchers commonly work with unstructured texts, images, and videos that represent vague constructs. Such data are not largely compatible with conventional GIS.

    GIS and Digital Humanities

    307

    FIGURE 11.21 Mapping emotions by adapting qualitative data to GIS.

    You can consider two distinct approaches for mapping qualitative data as there is an irreconcilable ontological difference between GIS and humanities disciplines. The first approach adapts qualitative data so that it can be easily handled in GIS. Humanities data are analyzed within the framework of GIS in this approach. The second approach does not turn qualitative data to data that are compatible with GIS, but uses qualitative data as is and combines those data with GIS for curation and other purposes. GIS or maps are used as media that augments the interpretation of qualitative data in this approach. To adapt qualitative data to GIS (the first approach), you should conduct data coding that turns qualitative data to data that are amenable for mapping or analysis in GIS. For example, you may consider sources like geotagged social media data to map emotions at different urban scenes. Since emotions are multifaceted, you can focus on mapping one dimension such as comfort. To create a map depicting comfort level, you can select social media posts that are relevant to level of comfort, place those relevant texts in an ordinal scale (e.g., from very comfortable to very uncomfortable), and make an isarithmic map portraying level of comfort using spatial interpolation. Figure 11.21 illustrates mapping the coded data (i.e., level of comfort). In this approach, you can take advantage of a full array of GIS functionality for mapping and spatial analysis since data is analyzed in GIS. This will enable GIS users to tap into spatial thinking and aid understanding of how comfort level spatially varies and how such emotion relates to different characteristics of urban scenes. Findings might be considered as input for urban design. Of course, some of original data will be lost in the data coding processes that turn qualitative data to data amenable for mapping in GIS. It is also challenging to come up with an effective coding scheme that defines what diction is used to represent a certain emotion in text and how text should be labeled. Alternatively, you may prefer to keep original data to be intact while still putting qualitative data in spatial perspectives. Suppose you would like to read entire literary texts associated with a particular location rather than

    308

    GIS: An Introduction to Mapping Technologies

    FIGURE 11.22 Combining qualitative data with a map: the Roaring Twenties: a soundmap of New York in the 1920s (from http://vectorsdev.usc.edu/NYCsound/777b.html.)

    work with coded data in GIS. For instance, Walking Ulysses (Figure 11.5) lets you read segments of original texts by different characters at places of interest that you can click on. Similarly, Figure 11.22 shows an example of combining sounds with a map. On this website, you can play sounds at locations in 1920s New York City in an interactive map. Geographic feature names are mapped using geocoding and then sound archives are linked to the locations of the geocoded features. Since data is not curated in a full-fledged GIS, a full array of GIS functionality cannot be utilized in this approach. As GIS platforms move toward web GIS, combining multimedia and unstructured texts with maps will become increasingly common. ACTIVITY 11.1 GEOREFERENCE SCANNED MAPS (QGIS 2.18.16) To georeference images (a source layer), you first need a destination layer that is stored in a projected coordinate system. A destination layer can be a basemap or other geospatial data stored in a known projected coordinate system. In this activity, we will use a web-based Google Street map as a destination layer. You will georeference a scanned historical map to be aligned with the destination layer. You need to have QGIS, free and open-source GIS software, installed on your computer to do this activity. • Open QGIS Desktop. • Install the plugin OpenLayers (Plugins > Manage and install Plugins from the menu). This plugin lets you add web maps from Google Maps and OpenStreetMap as basemaps. This will add a menu called “Web.” • Add a basemap Google Street (Web > OpenLayers plugin > Google Map > Google Street). • Zoom to Chicago.

    GIS and Digital Humanities

    FIGURE 11.23 QGIS—Read a coordinate reference system in the status bar.

    • Add Chicago 1857.jpg (in Lab11 folder) to a map (Layer > Add Layer > Add Raster Layer). You will be prompted to select a coordinate reference system (CRS). Since a source layer (Chicago 1857.jpg) is unreferenced, you can simply click OK at this point. This will simply add a source layer with missing CRS. • Zoom to the Chicago 1857 layer added (right-click on the layer Chicago_1857 in Layers Panel > Zoom to Layer). Zoom out quite a bit until you see the African continent. Note that the image is not aligned with a basemap since the source layer is not geographically referenced. • Click EPSG at the bottom right of QGIS Desktop (Figure 11.23). • This will open the CRS of a currently open project. By default, the first layer added to QGIS determines the CRS of project. A basemap retrieved from the Internet is stored in WGS 84 Pseudo Mercator (EPSG code 3857; Figure 11.24). • Close the Project Properties | CRS window. • Open the Georeference plugin (which is already installed for this version): from the menu, Raster > Georeferencer > Georeferencer. This will open the Georeferencer window. If you do not see Georeferencer under the Raster menu, do the following: (1) close QGIS. (2) Open regedit. (3) Look for the folder \HKEY_CURRENT_USER\Software\QGIS\QGIS2\Plugins. (4) Delete the folder “Plugin_Georeferencer.” (5) Restart QGIS. Georeferencer should now open. • Open a source layer (an image without CRS) in the Georeferencer window. You can click File > Open Raster from the menu, or click the Open Raster icon shown in Figure 11.25.

    309

    310

    GIS: An Introduction to Mapping Technologies

    FIGURE 11.24 QGIS—Project Properties window.

    FIGURE 11.25 QGIS—Open Raster in a Georeferencer tool (plugin).

    • Select the CRS of a destination layer (i.e., WGS 84/ Pseudo Mercator). If the CRS of a destination layer does not show up, type “Pseudo Mercator” in the Filter input box to select it. Click OK. You are choosing the CRS of a destination layer because new coordinates of a source layer after transformation will be cast to the CRS of a destination layer (Figure 11.26).

    GIS and Digital Humanities

    FIGURE 11.26 QGIS—Select a coordinate reference system for the layer to be georeferenced.

    FIGURE 11.27 QGIS—A source layer on the left and destination layer on the right for the Georeferencer tool.

    • Resize and move the Georeferencer window and QGIS Map Display Area so that you can see the historical map on the left and the Chicago area in Google Map Street on the right (Figure 11.27).

    311

    312

    GIS: An Introduction to Mapping Technologies

    FIGURE 11.28 QGIS—Add control points for the Georeferencer tool.

    FIGURE 11.29 QGIS—Click OK after adding a control point for the Georeferencer tool.

    • Add control points. To add control points, click Edit > Add point from the menu in the Georeferencer window or click the Add point icon in the Georeferencer window (Figure 11.28). For example, Figure 11.28 shows how to add a control point at the intersection between W. Webster Ave (formerly Asylum Place) and N. Clark Street (formerly Creek Bay Road) from a source layer and a target layer. • Click OK in the Enter map coordinates window when a control point is added (Figure 11.29). • Continue adding six additional sets of control points and disperse them as much as possible. Figure 11.30 shows seven sets of control points added. srcX is a X coordinate of a source layer, and dstX is an x coordinate of a destination layer. • Choose a transformation type (polynomial 1) and specify output raster (Settings > Transformation Settings in the Georeferencer image). Be sure to save the output raster (click the Browse button Output raster).

    GIS and Digital Humanities

    FIGURE 11.30 QGIS—Ground control point table for the Georeferencer tool.

    FIGURE 11.31 QGIS—Root mean square error for the Georeferencer tool.

    FIGURE 11.32 QGIS—Start georeferencing for the Georeferencer tool.

    • To find the transformation type (method), Google “QGIS georeferencer plugin documentation.” Read the available transformation algorithms. • Check out the mean error (i.e., RMSE) in the GCP Table window in the Georeferencer window (Figure 11.31). A unit for RMSE corresponds to the linear unit of destination CRS (meter). • Transform the image. To transform the image, click File > Start Georeferencing from the menu, or click the Start Georeferencing icon shown in Figure 11.32. • Save the control points (File > Save GCP Points as…). You can use this file to load GCP files to continue to (or modify) coordinate transformation (File > Load GCP Points). • Add the output raster and set the CRS of a transformed layer to the QGIS Map Display Area (Figure 11.33).

    313

    314

    GIS: An Introduction to Mapping Technologies

    FIGURE 11.33 QGIS—Set a coordinate reference system to a layer added.

    • Zoom to the output raster (Chicago 1857_modified.tif). • Make the output raster transparent (50%) (right-click on the layer in the Map Legend > Properties > Transparency). • Zoom in to check how well the image was transformed to the CRS of the destination layer. • Save your work (Project > Save).

    GIS and Digital Humanities

    ACTIVITY 11.2 MAP TEMPORAL DATA (QGIS) 1.  Mapping change in earthquakes 1960–2015. In this activity, you will map earthquakes in the world. You will use two shapefiles in the subfolder named World_Earthquakes in Lab11 folder for this activity. • Open QGIS Desktop. • Add quakes_post1960.shp and Country.shp from the World_ Earthquakes subfolder of Lab11 (Layer > Add Layer > Add Vector Layer. Click the Browse button and click a file with an extension shp). • Change the symbology of the quakes_post1960 layer to a red symbol with sizes 1 (right-click on the layer quakes_post1960 in Layers Panel > Properties). • Change the symbology of the Country layer to land (Figure 11.34). • Open the attribute table of the quakes_post1960 layer (rightclick on the layer > Open Attribute Table). Scroll all the way to the right. The field Date is populated from other fields like YEAR, MONTH, and DAY. This field will be used to create an animated map. • Install the TimeManager plugin. This will the TimeManager window to the Map View. • From the menu, Plugins > TimeManager > Toggle Visibility. This will add the TimeManager window in the Map View. • Click the Settings button in the TimeManager window (Figure 11.35). • Click Add Layer in the TimeManager settings. Select quakes_ post1960 from the Layer dropdown list. Select Date from the Start Time dropdown list. Click OK (Figure 11.36). • Click the Time display options button in the TimeManager settings. Then type 24 for the Font size, change Time Format to %Y, and select N from the Placement Direction dropdown list. This will display time in year format at the top (North) of the map. Click OK (Figure 11.37).

    FIGURE 11.34 QGIS—Choose style in Layer Properties window.

    315

    316

    GIS: An Introduction to Mapping Technologies

    FIGURE 11.35 QGIS—Choose settings for Time Manager tool.

    FIGURE 11.36 QGIS—Select a layer and time for the Time Manager tool.

    • This will automatically find the minimum and maximum value from the Date field. The begin time is 1960–01–13 and the end time is 2015–08-07 as shown. • To show earthquakes every 10 years, change Time frame size to 10 years as shown in Figure 11.38. • To play an animation, click the Play at the lower left (Figure 11.39). The animation will show earthquakes that occurred from time periods 1960s, 1970s, 1980s, 1990s, 2000s, and 2010s. There are six frames total and each frame is displayed for 0.5 seconds.

    GIS and Digital Humanities

    FIGURE 11.37 QGIS—Change time display options for the Time Manager tool.

    FIGURE 11.38 QGIS—Set time frame size for the Time Manager tool.

    • If you want to replay the animation, drag the play bar to the beginning point in time (1960–01–13), and click the Play at the lower left. • Currently each frame is shown for 500 ms (0.5 s). If you want to play each frame (earthquake for every 10 years) slowly (like 1 s), click the Settings button and type 1000 ms (Figure 11.40). • This time we will create a heatmap to provide a smoothed representation of earthquakes. • Open the Properties of the quakes_post1960 layer. Click Style on the left. (1) Change map style to Heatmap. (2) Change a color ramp to a graduated symbol OrRd. (3) Make the layer transparent 50%. Click OK. This will create a heatmap that smooths the density of the earthquakes (Figure 11.41). • Play the animation again. • Save your work (Project > Save).

    317

    318

    GIS: An Introduction to Mapping Technologies

    FIGURE 11.39 QGIS—Play an animation in the Time Manager tool.

    FIGURE 11.40 QGIS—Set a duration for each frame in the Time Manager tool.

    FIGURE 11.41 QGIS—Make a heat map.

    2.  Mapping change in diabetes prevalence. In this activity, you will map change in diabetes prevalence rates from 2004 to 2012 in Illinois. • Open QGIS Desktop. Open a new project (Project > New). • Add a shapefile ILCountiesDiabetes.shp from the folder \Lab11\IL_Diabetes. • Open the attribute table of the layer. Scroll all the way to the right. The field DPR2004 is age-adjusted diabetes prevalence rate (%) in 2004, and the field DPR2012 is age-adjusted diabetes prevalence rates in 2012. • To find the descriptive statistics of the two fields, from the menu click the Vector > Analysis Tools > Basic Statistics for numeric

    GIS and Digital Humanities

    FIGURE 11.42 QGIS—Open a field calculator.

    FIGURE 11.43 QGIS—Use a field calculator.



    • • •



    fields. Select ILCountiesDiabetes from the Input Vector Layer dropdown list and select DPR2004 from the ‘Field to calculate statistics on’ dropdown list. (Note that field names are alphabetically ordered.) Click Run. A Results window will appear. You can see that on average diabetes prevalence rates in 2004 for Illinois counties are 7.2, ranging from 5.4% to 8.7%. Find the descriptive statistics of diabetes prevalence rates in 2012. The average diabetes prevalence rate in 2012 is 10.71% with minimum 7.6% and maximum 13.4%. It appears that diabetes prevalence rates increased. This time we will create a new field named Change and populate the field with DPR2012–DPR2004 to map changes. Open an attribute table and open the field calculator (Figure 11.42). Populate the field. (1) Name the output field. (2) Choose the output field type: decimal number. (3) Click the plus next to Fields at the lower right and enter DPR2012-DPR2004 by clicking the appropriate fields and an operator button. Click OK (Figure 11.43). Map the field change using the graduated symbol (right-click on the layer > Properties. (1) Choose Style: Graduated. (2) Choose Column: Change. (3) Choose Color ramp. (4) Choose

    319

    320

    GIS: An Introduction to Mapping Technologies

    FIGURE 11.44 QGIS—Make a choropleth map.

    FIGURE 11.45 QGIS—Project the map on the fly.

    the Mode of data classification (Natural Breaks) and click the Classify button. Click OK (Figure 11.44). • Change the map projection of the current project to UTM 16N. To change map projection, Project > Project Properties > CRS. Check “Enable ‘on the fly’ CRS transformation,” and type UTM zone 16N in the Filter Input box. Then select NAD83/ UTM zone 16N. Click OK. This will project the map on the fly (Figure 11.45). • Save your work.

    GIS and Digital Humanities

    321

    References Cooper, D., C. Donaldson, and P. Murrieta-Flores, eds. 2016. Literary Mapping in the Digital Age, 1 Edition. Farnham: Surrey, England; Routledge: Burlington, VT. Council, National Research. 2005. Learning to Think Spatially. https://doi. org/10.17226/11019. Gordon, C. 2009. Mapping Decline: St. Louis and the Fate of the American City. F Second Printing Used edition. University of Pennsylvania Press: Philadelphia, PA. Gregory, I. N., and A. Geddes, eds. 2014. Toward Spatial Humanities: Historical GIS and Spatial History. Indiana University Press. http://www.jstor.org/stable/j. ctt16gz7s5. Hwang, S. 2013. Placing GIS in sustainability education. Journal of Geography in Higher Education 37 (2): 276–291. doi:10.1080/03098265.2013.769090. Kraak, M.-J. and Ormeling, F. J. 2009. Cartography: Visualization of Spatial Data, 3rd Edition, Routledge: Abingdon-on-Thames. Levin, J. 2011. DH GIS Projects. Anterotesis (blog). http://anterotesis.com/wordpress/mapping-resources/dh-gis-projects/. March 16, 2011. Murrieta-Flores, P., C. Donaldson, and I. Gregory. 2017. GIS and literary history: Advancing digital humanities research through the spatial analysis of historical travel writing and topographical literature. Digital Humanities Quarterly, 11 (1). http://www.digitalhumanities.org/dhq/vol/11/1/000283/000283.html. Wootson Jr. and Cleve R. 2018. Maya Civilization Was Much Vaster than Known, Thousands of Newly Discovered Structures Reveal. Washington Post, February 3, 2018, sec. Speaking of Science. https://www.washingtonpost.com/news/ speaking-of-science/wp/2018/02/03/mayan-civilization-was-much-vasterthan-known-thousands-of-newly-discovered-structures-reveal/. Worboys, M. 1994. A unified model for spatial and temporal information. The Computer Journal, 37 (1), 26–34.

    12 Organizational GIS

    12.1 Working as a GIS Professional You are likely reading this book as part of your first steps toward becoming a GIS professional, or at least proficient in GIS to enable your other work or research interests. Maybe you are hoping to get a job as a GIS analyst out of college, or hoping to learn GIS techniques that can be applied to your research or work in your main home discipline, such as environmental science, archeology, etc. You may be wondering how to proceed after this step in terms of pursuing employment or furthering your own skills. GIS have many roles to fill across many different organizations. In the “public sector,” which typically refers to governmental services, A GIS can play a supplementary or primary role at different levels in the organization. For example, two separate divisions of a city government may use GIS; one for planning purposes and as-needed analysis of city needs, and the other for maintaining a GIS of public roads that is accessible to the public at all times and that is updated regularly. Both divisions utilize GIS, but for completely different reasons and audiences. Meanwhile, in the private sector, which typically refers to private enterprise undertaken for profit (or otherwise not by those who would be considered direct government employees), a GIS may be used to scout potential retail locations, or to maintain a GIS network of roads for use by a rideshare company, for example. Throughout this book you have been exposed to different tools and methods of analysis using GIS, but whether you deploy these skills in academia, or as a government official, or as a freelance GIS analyst, there are various “soft” and “hard” skills that will help you succeed and implement GIS outside of the classroom. “Soft” skills refer to skills like teamwork, time management, being comfortable speaking in public, keeping control of emotions in the workplace, and so on, while “hard” skills are more quantifiable skills such as writing ability and computer skills. “Hard skills” and “soft skills” as terminology may have a questionable, gendered etymology, but several studies suggest that they often determine what qualifies you beyond your technical competency with GIS. Understanding how to develop and hone such skills to supplement your GIS skills help you succeed professionally. It also helps to 323

    324

    GIS: An Introduction to Mapping Technologies

    be adaptable and to think critically about how GIS skills can best support an organization and in which capacity. A good way to start is to understand how GIS are typically implemented.

    12.2 Organizational Implementation of GIS Dr. Roger Tomlinson discusses how different organizations implement GIS in his book Thinking About GIS. Broadly speaking, GIS can be categorized at levels such as “Departmental,” “Enterprise Systems level,” and “Community Systems level.” When a GIS is implemented at a departmental level, it is typically to serve the primary goal of the department. For instance, a large fast food corporation might have a department dedicated entirely to real estate concerns such as scouting new locations, accessing trends relating to existing business in current locations, or anything else that might affect the bottom line of the company relating to real estate. In this real estate department, there might be a handful of GIS professionals, or even a few dozen solely dedicated to GIS analysis. Alternatively, in this department there might be a mix of people in different roles with different levels of GIS skills. Whatever the makeup of this imaginary department, we can assume that when GIS analysis occurs, it is probably completed using tools particular to the goal of their overall department, such as generating service areas of potential locations using network analysis tools. For this reason, if you are job seeking and are aware that a potential GIS job posting is within a specific department of an organization, it would likely pay off to determine the core functions of that department and the most pressing GIS needs. This would allow you to emphasize your competency in those areas through your resume or during the interview process. An alternative structure is an “Enterprise” system. Consider, for example, a different fast food corporation, where the GIS is considered useful to several different departments and the overall mission of the organization. This fast food company may implement a GIS intended to be shared and used throughout the organization. Rather than having departments that use GIS professionals, some companies may centralize GIS professionals under a “GIS department” within the organization to produce maps. Enterprise systems vary in complexity and implementation. One organization might maintain a single GIS server, whereas another might maintain a “cluster” of highly available GIS servers, combined with in-house code and custom websites. The third example is a “Community” GIS system. In broad terms, this refers to enterprise systems that are built with common standards so that

    Organizational GIS

    325

    other organizations or individuals can share the GIS resources. There are several examples of this in the public sector. For example, the US Geological Survey produces and maintains GIS systems for the sole purpose of disseminating data to the public. Major cities also often maintain community GIS, but often in the interest of private sector organizations as well, again with different levels of complexity. For example, Google allows app developers and regular users to access their maps and other services in the hopes that more users adopt their products and generate ad revenue.

    12.3 Organizational Best Practices Implementing and maintaining GIS in organizations demand skills from GIS professionals in many areas. In addition to technical and communication skills, a GIS professional should be mindful of best practices and other considerations when organizing and planning their GIS work, such as following the CIA/AIC Triad of confidentiality, availability, and integrity when planning GIS. The CIA Triad (often referred to as the AIC Triad, as used here to distinguish it from the well-known US government agency with the same acronym) is a model designed to guide information security within an organization, and can be applied to GIS as well. The three components of the AIC Triad are Availability, Integrity, and Confidentiality. The AIC Triad is used to determine the priorities and needs of an organization and to act accordingly. For instance, if we were implementing a community GIS to disseminate nonsensitive public GIS files such as the political boundaries of a town, it would likely be prudent to focus on Availability to ensure users can access and download the data at all times, and to a lesser extent, Integrity, to ensure users receive the same unaltered versions of files. In this example, Confidentiality would be less important as the data would not be considered sensitive. On the other hand, other organizations often have the exact opposite priorities. Take, for example, a GIS research team working with the sensitive health data of individuals. If the GIS research team was located geographically separate from each other, the organization might decide to implement a system for sharing GIS data using some techniques outlined in the Web GIS chapter as well as later on in this chapter, such as sharing work using Map Packages or ArcGIS Online. In this case, if the organization prioritized Availability over Integrity and Confidentiality, serious consequences could occur if a breach of data occurred, for example. Here are some more examples of how different components of the AIC Triad might look in the context of GIS in organizations as well as how different components of the Triad might be prioritized (and why).

    326

    GIS: An Introduction to Mapping Technologies

    Low Need for Availability: Small GIS research groups that work with sensitive data are a good example of GIS projects that do not require high availability. The work may be saved only on each researcher’s personal computer and only shared among the group using encryption. High Need for Availability: Many GIS professionals routinely pull in basemaps from corporations such as Google and Here. In order for these basemaps to be available, they have to be hosted on servers that can be reached reliably from people around the world. Failing to invest in the hardware needed to ensure 24/7 availability would be quickly noticed, and even scenarios such as a natural disaster (an earthquake near a company’s servers, for example) need to be considered and planned for to prevent downtime of those GIS services. Low Need for Integrity: Nearly all GIS projects require Integrity to ensure reliability of GIS data. But imagine a collaborative web GIS where anyone who visits a website can draw points on a map to contribute local knowledge about where community health centers are located. In this case, being restrictive and cautious about whether the contributions are trustworthy and accurate could be counter to the goals of the project and limit engagement. Also, if we have a small GIS department sharing data only internally the only precautions to ensure data integrity might be to archive GIS results and be careful not to overwrite them. High Need for Integrity: On the other hand, many well-known collaborative crowdsourced projects, such as Wikipedia, or the prominent collaborative mapping service known as OpenStreetMap, need to place restrictions on anonymous submissions or revisions, and use “version control” to keep track of changes and to revert between versions if needed. Long-term projects, which might use a GIS dataset multiple times over several years, may consider Integrity among the highest priorities to ensure data analysis and projections can be trusted year to year. Alternatively, if a company that relies on a GIS for their business needs does not prioritize Integrity, for example, a ride-sharing company that uses maps to allow drivers to navigate to passengers, they may find themselves dispatching rides to incorrect or outdated addresses, ultimately affecting their bottom line. Low Need for Confidentiality: GIS projects are often completed without the need for Confidentiality. This is the case when projects are completed using public data, or results are made available to the public without restriction. High Need for Confidentiality: Any organization that uses sensitive data such as individual health data or classified government data has a high need for Confidentiality. GIS professionals working in these sectors need to understand and follow organizational guidelines relating to proper use and storage of sensitive GIS data. The AIC Triad should be used for guidance only. Nonetheless, when using a GIS at an organization and working with sensitive data, it can be a useful place to start.

    Organizational GIS

    327

    12.4 Collaborative GIS GIS and cartography have a long history but in recent years it become a part of everyday life, given the proliferation of GPS-enabled cell phones. “Collaborative GIS” are more common today than ever and allow individuals and organizations to share documents across the Internet using cloud services and mobile applications. These advancements in Internet technology have enabled GIS to grow beyond the former bounds of desktop applications and the “sneakernet” (files passed by hand by being personally carried!). To give some frame of reference, ArcView, the precursor to ArcMap, was originally released in 1995. JavaScript, the programming language used in most every contemporary web GIS today, was released in a crude form around that time and was not widely used. Most home users and many businesses were limited to “dial-up” Internet speeds that capped at 33.6 kilobits per second (today the average Internet speed is over 500 times that at 18.75 megabits per second). Ten years later, web technologies such as HTML, XML, and JavaScript matured and the buzzword “Web 2.0” was used to herald the new generation of Internet capabilities. ArcGIS Server was released in 2004, and in 2005 Google released their first version of Google Maps. MapServer, an open-source alternative to ArcGIS Server, was on its fifth release in 2007. Many of the underlying technologies behind these programs and web GIS applications are still key to some of the web GIS platforms used today. Just as importantly, many organizations private and public still use older versions of web GIS rather than “cutting edge” tools so understanding and being aware of the older web GIS applications is sometimes just as important as knowing the latest trends. Using “legacy” technology is not limited to GIS; many banks in 2018 still use COBOL, a programming language dating back to 1959. Some legacy GIS can run smoothly for years. At times organizations write customized tools, or adapt GIS programs and applications to their customized workflow. In cases like this changing to a newer system could disrupt business continuity. This is how GIS professionals can find themselves working with legacy systems. You must be adaptive and willing to learn and engage with older systems when necessary. On the other end of the spectrum, sometimes organizations are forced to upgrade or make other software changes due to security considerations. Sometimes even years after a software release flaws and bugs are found in the code that make the software too vulnerable to use. For example, in the past, many web GIS applications were built using Adobe Flash and Adobe Flex. At one point in time, Adobe Flash was used by just about every website and in every browser. However, both Adobe Flash and Flex were eventually removed from web browsers and smartphones due to their inherent security risks. This left many organizations that had depended on web GIS implemented using Flash and Flex unprepared for this change and without a working web GIS until they were able to switch to newer technologies! Unfortunately, due to the

    328

    GIS: An Introduction to Mapping Technologies

    FIGURE 12.1  A product lifecycle of ArcMap 10.5.1’s planned support cycle; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

    increasing prominence of cybersecurity threats and even state-run attacks on governmental infrastructure, the need to stay current with software updates and newer technologies is more important than ever. A good practice to ensure GIS software is up to date and secure (and to be sure you have the latest version of the software with the newest features) is to keep checking for new updates from the software maintainer as well as other avenues. Many GIS programs, such as ArcMap, do not automatically install updates. All GIS software, whether proprietary or open source, has some sort of web presence where news about updates and bug fixes are disseminated, such as their web homepage, or newsletters that provide updates via email. Figure 12.1 shows the planned product lifecycle of ArcMap 10.6. Sometimes it is the GIS community that finds the security risks in the industry’s tools. For example, the “National Vulnerability Database” maintained by United States’ National Institute of Standards and Technology (NIST) maintains a list of security concerns for software, GIS and otherwise can be found at https://nvd.nist.gov (Figure 11.2). However, the above only applies to GIS software installed on an organization’s premises, such as with desktop GIS software or web GIS software installed on an organization’s own servers. When using web GIS with cloud services, the considerations are fairly different. Broadly speaking, “cloud services” refer to organizations hosting applications and resources, including GIS resources, on another organization’s servers, which are often decentralized. Amazon, for instance, hosts “cloud services” that share computing

    Organizational GIS

    329

    FIGURE 12.2  “CVE-2014–9741,” a GIS vulnerability cataloged on the NIST webspace.

    load across many different geographically located servers, such that even if an entire array of servers in a building lost power, business continuity is assured. Because of this, there is a growing trend to move GIS infrastructure “into the cloud” rather than incurring the costs of maintaining GIS data centers and related infrastructure.

    12.5 Survey123 and Collaborative StoryMaps Organizational GIS often involve maintaining datasets, for instance, through relational databases or by archiving files. Some organizations also engage in “fieldwork,” where data is collected out in the real world using GPS-equipped survey devices such as ArcPad devices or specialized drones, for example. A health organization, for instance, may go door-to-door surveying residents about whether they have been afflicted by a variant of norovirus to better track the pattern of disease. Using ArcPad devices or tablets equipped with surveying technology, they can link patient interviews and GIS information to the fieldwork survey. Two tools are commonly used for surveys like this, “Collector” and “Survey123,” both made available by ESRI with integration into the overall ArcGIS product ecosystem. Open-source alternatives, such as GeoNode, are available but not robust enough for widespread adoption. We will demonstrate Survey123 in this section. To follow along with this exercise, you will need either an ArcGIS Online trial account or access to an ArcGIS Online organizational account. If the former, navigate to https:// www.esri.com/en-us/arcgis/trial; if the latter, reach out to someone in your organization for access to the ArcGIS Online organizational account.

    330

    GIS: An Introduction to Mapping Technologies

    FIGURE 12.3  Survey123 homepage; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

    Depending on your organization, you may also have Survey123 implemented through ArcGIS Enterprise. To get started, navigate to the Survey123 website at https://survey123. arcgis.com/surveys and log in using your ArcGIS Online account. As shown in Figure 12.3, you will see a “Create a New Survey” button (if you are logged ins and have access; you need Publisher permission or above for Survey123 functionality). Once you click “Create a New Survey,” you will see a prompt that asks whether you want to use Web Designer, or Survey123 Connect, which is an XLSforms application that runs on your desktop rather than in the browser. This is shown in Figure 12.4. We will be using “Web Designer,” which lets you use Survey123 in your internet browser. In the past, Survey123 Connect offered much more functionality, but the two different applications are now

    FIGURE 12.4  Web designer vs. Survey123 Connect features; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

    Organizational GIS

    331

    FIGURE 12.5  Prompt for creating a new Survey123 survey; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

    more or less comparable, the main difference being that Survey123 Connect supports more “smart form” rules. Smart form rules, as you will soon see, allow us to determine which questions appear based on the answers to previous questions. This functionality is expanded in the desktop Survey123 Connect software. Web Designer also supports rules, at a basic level, and is intuitive and easy to use. Next, decide on the scope and subject of your survey. In Figure 12.5, we created a survey for Albany Park (a northside neighborhood in Chicago) to record local health clinics and information about them. Think about a subject you would be interested in investigating in your neighborhood, such as local grocery stores, health clinics, or other places such as parks. Make the scope of it “local,” no bigger than a given community area, such as Lincoln Park or Wicker Park. Also choose a subject for the survey that volunteers and field workers could feasibly walk around and survey, such as locations or subjects that the public has access to, such as number of benches in a local park, or whether a business has disability accessible bathrooms. You will now see the “Design” page for your new survey, as shown in Figure 12.6, with the different areas of interest numbered. 1. In the top bar, you will see tabs labeled “Overview, Design, Collaborate, Analyze, and Data.” Until you are ready to publish the survey, you will want to stick with “Design.” Later, the “Collaborate” tab will allow you to control who submits to the survey; members of

    332

    GIS: An Introduction to Mapping Technologies

    FIGURE 12.6  The design page for a new Survey123 survey; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.



    your organization with ArcGIS accounts, or anyone who has access to the link, for instance. Analyze gives you charts and histograms about responses as well as dates of responses. Data allows you to export reports to CSV files or shapefiles, or to create a web map directly from the Survey for use in ArcGIS Online. 2. Under “Common Questions” you will see several possible responses to your survey. You drag and drop (click and hold, then pull them into the left-hand panel) to add them to the survey. Several are selfexplanatory, but there are a few things you should know: - GeoPoint will use the survey participant’s GPS functionality on their device, if it is on and the device supports it, to assign a record of the coordinate location of where the survey response was recorded. Unfortunately, many consumer cell phones have poor GPS functionality, and are even worse in urban environments with tall buildings. Fortunately, the GeoPoint can be “dragged” to an exact location (depending on the device), so it may be helpful to write a “Hint” (the second text box) asking the user to drag the GeoPoint to a proper location if it does not appear accurate. - “Validation” is available for each question. Checking “Validation” will require the question to be answered in order for the survey to be submitted. Think critically about this, as you do not want, for example, a survey about trees to be submitted without any actual tree selected, but you also do not want to require questions to be answered in edge cases where the question is not applicable. For instance, say you had a multiple choice question in your survey asking whether pet shops that catered to dogs, cats, and/or birds,

    Organizational GIS

    333

    but a survey participant found a pet shop that catered only to reptiles. If the question was required, the respondent would have to answer inaccurately to proceed! (This is also why having “Other/ Not listed” as an option for required questions is sometimes good to avoid such situations.) “Likert” is a strange labeled “Common Question but allows users to answer on a spectrum from “Strongly Disagree” to “Strongly Agree.” You can customize the labels, but this implies a range of responses from weak to strong feelings. - Single Choice restricts the response to one response only, while Multiple Choice allows multiple answers. 3. Once you drag a question into the left panel, click on the question to open up the Edit functionality for that question. This allows you to edit labels, provide notes for survey participants, and set validation in addition to other options. To add additional questions, click “Add” next to the Edit button. 4. After you are done designing the survey, click “Publish” to make the survey available to participants. Once you do this, you will not be able to make changes again, so it may be wise to click “Preview” first to see what the survey will look like. If you click the box directly below the title of your survey (in this case, “Description content for the survey”) you can add additional information for the user. This can be entire paragraphs if you like, and you can even import images. The small icon on the bottom right that looks like a small mountain with a sun can be used to hotlink images as shown in Figure 12.7. Keep in mind that if you find an image on the Internet and you link to it, you are at the mercy of the website host and it may change. An alternative is to host and link to your own image; a popular “Image Host” is “Imgur,” at https://imgur.com/. An advanced option within Survey123 is to set “rules.” In Figure 12.8, we ask if the clinic is open outside of 9AM to 5PM on weekdays. Single Choice, Likert, Rating, and Dropdown questions support Rules, which allow you to use “If X, then show Y question.” In this hypothetical survey, we want an

    FIGURE 12.7  You can edit the description of the survey, which will be shown to field workers or survey participants; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

    334

    GIS: An Introduction to Mapping Technologies

    accurate and up-to-date understanding of clinics in the neighborhood, and want to know which of them are open outside normal business hours. If they are open outside 9–5 pm, there are a series of follow-up questions we want to ask, but if they are not, there is no reason for those questions to appear in the survey. As you can see, rules can be useful. When adding a Single Choice, Dropdown, Likert, or Rating question, the “Set Rule” icon will appear in the bottom right. It looks like three arrows pointing in different directions, as shown in Figure 12.8, and the “Set Rules” dialog is shown in Figure 12.9. For our next question, we will ask the respondent to record an image of the health clinic using the “Image” option. On a mobile device, this will use the Camera application. Finally, we will also drag a “Singleline Text” question here, allowing respondents to add any other information they find pertinent to their recording. By the way, the “Note” question does not provide this functionality, but instead allows you to add another textbox.

    FIGURE 12.8  A question eligible to have a Smartforms rule set. The icon in the bottom right, when clicked, will enable this functionality; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

    FIGURE 12.9  The “Set Rule” dialog box. It uses “If, then” logic to decide whether a question will appear to a participant based on answers to previous questions. Survey123, the desktop application, allows for more advanced manipulation of Smartforms rules; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

    Organizational GIS

    335

    Keep in mind that we will eventually be making StoryMaps out of this information. For that reason, when you make your survey, you may find it wise to map information that will be “quantifiable.” If you use Single Choice, DropDown, Number, or Rating questions, you will get information that is easily categorized in a map. For instance, if we had ratings of health clinics from respondents, we could easily map the health clinics by average rating; red for 2 stars or below, yellow for 3 stars, and blue for 4 stars and 5 stars, to show a map of where the highly rated health clinics are. Once you have designed your survey, before publishing, click on the “Settings” tab. This allows you to customize the message that will be displayed after the survey is submitted, and determine whether multiple recordings are allowed from your respondent. Since it is likely you will be the only person using your survey be sure to check the “Allow the respondent to add another record” button, as shown in Figure 12.10. After you click Publish, click on the Collaborate tab. Set the “Choose who can submit data to this survey” to Everyone. (If this involved a sensitive survey or sensitive information, you would limit this to members of the organization or even a group within, but the odds of a random person stumbling upon the survey link are almost nil.) Note that the “Survey link” is available

    FIGURE 12.10  Notice the “Allow the respondent to add another record” button. If you were sending out field workers to make multiple observations, if this was not checked, they would be limited to a single survey response; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

    336

    GIS: An Introduction to Mapping Technologies

    FIGURE 12.11  The data tab. This allows us to export the Survey123 into a StoryMaps application to view our completed surveys; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

    here; if you were to share this with the public, you would note this and share this link. You can also can set how people interact with the survey here. People can complete the survey from the desktop, or using their mobile devices. Try and open the survey and do a few yourself, to make sure it works and flows as intended, and also to collect some sample data. We want to embed this into a StoryMap to get a live stream of data. Once you have a few survey recordings to work with, say, at least five survey responses, click on the “Data” tab and then the “Open in Map Viewer” button, as seen in Figure 12.11. You will now see the Web Map editor as in ArcGIS Online. The survey will now be a map layer, on the left sidebar under “Contents.” If you click on the three dots, you will be presented with a plethora of options. “Rename” allows us to rename our survey. We also want to “Share” the map, both so the general public can view it and so we can embed it in a StoryMap application. Be sure to check “Everyone” as shown in Figure 12.12; otherwise no one else will be able to see it, even if you embed it in a StoryMaps application. We also want to click “Save.” For the purposes of the StoryMap, go through the previous steps a second time for a study area of your interest and make a second Survey123 web map with different survey results. The idea behind the StoryMap we are going to experiment with is to compare and contrast between multiple datasets; in this case, we will compare two different Survey123 results. The easiest way to do this is to save the Survey123 you just created, then run the Survey123 creation process a second time. After you have two Survey123 web maps, you can choose one of them and click “Add” inside the web map to combine the Survey123 layers into a single web map. Go ahead and click the “Embed this map” option of “Create A Web App.” You will be given a list of default templates to work from; your list of templates may look slightly different as they change from time to time. Figure 12.13 shows the “StoryMap Series” template located in the “Compare Maps/Layers” section. If this does not appear for you, try to search for “StoryMap Series” in

    Organizational GIS

    337

    FIGURE 12.12  “Share” settings for the Survey123 web map. Make sure this is set to “Everyone” if you want this to be publically viewable. Also note the “Embed this map” options; click “Create a Web App” to import this into a StoryMaps application; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

    FIGURE 12.13  “Create a New Web App” dialog; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

    338

    GIS: An Introduction to Mapping Technologies

    FIGURE 12.14  “Mapping the Shale Gas Boom” StoryMap Series; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

    the search box in the top right or reach out to ESRI. Click on Compare Maps/ Layers, and then click on “StoryMap Series.” You may want to click “Preview” to see what this map looks like when finished. We want to give the user the functionality to switch between the results of the surveys. You will be asked to “Add Entry.” Start with the Map. If you have followed this guide, the map you used to create the StoryMap will have “Map used to publish the Map Series” to the right of it. Be sure to check out the “Settings” tab of the Map Series Builder. We made sure to uncheck “Synchronize Map Locations” (otherwise the layers would be locked in terms of extent), as well as customize the appearance. You can read more about StoryMap Series at https://storymaps.arcgis.com/en/applist/map-series/ and see an example of a Side Accordion StoryMap Series as shown in Figure 12.14. The sidebars number 1: Major Shale Plays, 2: State by State Comparison, and 3: Where is Fracking Happening?” open different web maps depending on the different tabs clicked in the sidebar. Figure 12.15 shows a StoryMap in progress using Survey123 data. We see the “Add” and “Organize” buttons at the bottom of this page. In this example StoryMap, we have two layers, Clinic Hours (a survey that records hours of health clinics) and Clinic Ratings (a survey that records ratings of health clinics), and we organize them into different tabs. In Figure 12.15 we see the underlined phrase “Right Care Medical Center,” a “Story Action” that when clicked centers the right-hand panel around the Right Care Medical Center observation. Story Actions allow the viewer to jump around the StoryMap based on navigational links; they can lead to new tabs, change the view on the right, or link to an outside website. More on Story Actions is found at https:// www.esri.com/arcgis-blog/products/arcgis-enterprise/uncategorized/ using-story-map-journal-story-actions/.

    Organizational GIS

    339

    FIGURE 12.15  StoryMap series with health clinics; from Copyright © 2018 Esri, ArcGIS, ArcMap, and the GIS User Community. All rights reserved. With Permission.

    Survey123 is an intriguing tool for getting data and incorporating it into StoryMaps or other GIS datasets. GIS organizations often do not simply use data—they create it using a variety of tools, from pulling from databases and adding geometry, to working with Excel files and converting tables into shapefiles. We have given a brief overview of some of these tools in this chapter.

    References ESRI. Story Map Series. Story Map Series Story Maps. https://storymaps.arcgis.com/ en/app-list/map-series/. Fu, P. and J. Sun. 2011. Web GIS: Principles and Applications. ESRI Press. Stamp, M. 2011. Information Security: Principles and Practice. Wiley. Survey123 for ArcGIS. https://doc.arcgis.com/en/survey123/. Tomlinson, R. F. 2013. Thinking About GIS: Geographic Information System Planning for Managers. Esri Press.

    Index A Aerial photography, 1 Aerial Photography Field Office (APFO), 249 Age-adjusted death rate, 106 AIC Triad, see CIA Triad Air-quality attributes, 143 Albers Equal-Area Conic, 30, 31, 33 American FactFinder, 166–168 Analytical contexts, 77–78 APFO, see Aerial Photography Field Office (APFO) ArcCatalog and ArcMap, 14–15 elevation value, 160 geospatial data, properties of, 152–154 manipulate geospatial data in, 154–155 National Elevation Dataset, 155–158 ArcGIS creating Layout, 268–269 EarthExplorer, 261–266 image analysis window, 263 Online trial, 329 Server, 327 technical description, 269 working with files in, 266–268 ArcGIS Desktop, 13 ArcSDE geodatabases, 73–74 file geodatabases, 73 personal geodatabases, 72 ArcGIS framework, 12 ArcGIS Online, 196–197, 199–201, 223, 274 exercise, 206–212 Generate service areas, 275–276 OSM Editor tool, 276–277 Ready-to-use services, 274–275 Archaeology and GIS, 289 ArcMap, 198 basemap, 39–41 census block group data in, 162–165

    choropleth maps, 119–124 data frame properties, 42, 46 dBase file, 176 display earthquakes, 48–55 display XY data in, 41–47 dot density maps, 124–125 full extent dialogue of, 206 geocode addresses dialog box, 58 graph maps, 133–136 isarithmic maps, 126–130 map template, 116 National Land Cover Database, 161–162 online sign in, 57 point maps, 117–118 prism maps, 130–133 product lifecycle of, 328 proportional symbol maps, 118–119 streets data in, 165–166 tab-delimited text file, 38–39 vector layer transparent, 163 Association of American Geographers, 7 Atomic values, 146 Azimuthal projection, 28 B Bar Chart Map, 101-102 Base maps, 2 BCG, see Boston Consulting Group (BCG) Belknap Crater, 91 Bing Maps, 194 Black Crater, 91 Black Heritage Trail, 114 Boston Consulting Group (BCG), 4 C Cartesian coordinate system, 24 Cartesian inventory, 2 CARTO, 195–196 Cartogram, 101, 281–283

    341

    342

    Cartographic maps, 91 Census Block Group, 154, 162–165 Census boundary data, 169–170 Census tracts, in Cook County, 170–173 ChiTracts layer, 173, 179–181 Choropleth maps, 94, 119–124, 180 CIA Triad, 325, 326 Codd’s rules, 145 Collaborative GIS, 327–329 Column domain, 146 Column header, 145–146 Coming to GIS: Making sense of geotechnology (McHaffie), 9 Common attribute, presence of, 150 Common key, data type of, 150 Community GIS system, 324–325 Computer-aided cartography, 83 Conic projection, 28 Content-diverse group, 9 Cook County, Illinois, 13–17, 20 Coordinate reference systems (CRSs), 9, 31–36 Copyright Act (1976), 66 Crowdsourced geographic data, 299 CRSs, see Coordinate reference systems (CRSs) Cylindrical projection, 28 D DaaS, see Data as a Service (DaaS) Data acquisition, 63–66 initial stages of, 62 Data as a Service (DaaS), 68 Database management systems (DBMS), 151 Database normalization, 145–148 Data classification methods, 102, 103 Data Frame, “Layout” view of, 17–20 Data models database normalization, 145–148 field data types, 148–149 geospatial data, 139–140 raster and vector, 140–144 relational databases, 144–145 spatial and attribute data, 151 table join, 149–151 Datums, 24–26

    Index

    DBMS, see Database management systems (DBMS) DEM, see Digital elevation model (DEM) Demographic profile 1 (DP1), 14, 79 Departmental GIS system, 324 Digital cartographic data/geodatabases, 1, 66 Digital elevation model (DEM), 87 Digital gazetteers, 298 Digital geographic information, 66 Digital humanities, 289–293 georeferencing images, 295–297 mapping texts, 297–299 qualitative data, 306–308 spatial thinking, 293–294 temporal data, 302–306, 315–320 Digital humanities and GIS (Hwang), 12 Digital Millennium Copyright Act, 66 Dijkstra’s algorithm, 273 Disciplinary-resident response, 2 DIY data collection, 71 Dot density maps, 124–125 DP1, see Demographic profile 1 (DP1) E EarthExplorer (EE), 261–266 Earth’s irregular shape, 31 Electromagnetic radiation (EMR), 239, 241, 243 Electromagnetic spectrum, 242 “Elixir of life,” 1 Ellipsoid, 24-25 EMR, see Electromagnetic radiation (EMR) Enhanced Thematic Mapper (ETM), 244, 262 Enterprise GIS system, 324 Environmental Protection Agency (EPA), 80 EPA, see Environmental Protection Agency (EPA) Equal interval map, 104 ESDA, see Exploratory Spatial Data Analysis (ESDA) ESRI, 198, 245 ESRI ArcMap program, 217 ETM, see Enhanced Thematic Mapper (ETM)

    Index

    European Space Agency, 254 Exploratory Spatial Data Analysis (ESDA), 234 Export Map Georeference Information, 19 F FAA, see Federal Aviation Administration (FAA) Facebook, 68 FashMaps website, 215 Federal Aviation Administration (FAA), 71 Federal Geographic Data Committee, 4, 64 Federal Information Processing Standard (FIPS), 149 Field data types, 148–149 Figure-ground relationship, 114 FIPS, see Federal Information Processing Standard (FIPS) Foreign key (FK), 146 Functional dependency, 146 G “G-Conomy,” 4 GCS, see Geographic coordinate system (GCS) GDAL, see Geospatial Data Abstraction Library (GDAL) General-purpose computing systems, 1 Geocoding algorithm, 36–55 addresses, 36-37 batches, street addresses in, 55–58 reference data for, 37 Geocoding tool, 227 GeoDa, 218 advantages of, 238 definition of, 231 LISA clustering map, 237 Moran’s I significance map, 237, 238 Plenario, 232–234 spatial weights, 235 Weights File Creation, 236 Geodatabases, types, 72-74 Geographic coordinate system (GCS), 32

    343

    Geographic database, representing time in, 299–301 Geographic information systems (GIS), 1–5, 61 academic home of, 2 applications of, 8 disciplinary-resident response, 2 “elixir of life,” 1 and geospatial technology, 3 global wireless data networks, 2 mobile platforms, 2 quantitative analytical techniques, 7 Geographic Names Information System (GNIS), 298, 299 Geographic Resources Analysis Support System GIS (GRASS GIS), 218 Geoid, 25 GeoJSON, 140, 232–234 Geological Survey portals, 244 GeoPackage, 199 Georeferencing coordinate systems, GIS, 31–36 datums, 24–26 geocoding, 36–55 geodetic control points, 26 images, 295–297 latitude and longitude, determination, 24 map projection, 27–31 metric method, 23 relative and nonmetric method of, 23 scanned maps, 308–314 tool of, 227–230 Georeferencing (Hwang), 9 Geospatial analysis, 6–7 applications of, 8 buffering, process of, 84 classification of, 83 definition of, 6 description of, 77–78 interpolation, 85–86 networks, analysis of, 86–88 proximity, notion of, 83 selection and subsetting, 78–82 and spatial relationships, 83–84 Geospatial Analyst, 6-7 Geospatial data, 139–140 in ArcCatalog, 152 data collection, 70–72

    344

    Geospatial data, (cont.) description of, 61–63 national map, 64 in United States, 64 Geospatial Data Abstraction Library (GDAL), 225, 227, 229 Geospatial Data Gateway, 249, 250 Geospatial Platform, 64 GeoTiff, 262 Getting the world into your GIS (McHaffie), 10 GIS, see Geographic information systems (GIS) GIS in organizations (Follett), 12 Global navigation satellite system (GNSS), 68 Global Positioning Systems (GPS), 68 Global wireless data networks, 2 GNIS, see Geographic Names Information System (GNIS) GNSS, see Global navigation satellite system (GNSS) Google Maps, 184, 194 Google Timelapse project, 259 Google web page, 185 Government aerial photography EarthExplorer, 248–249 federal agencies, 247 Illinois Geospatial Data Clearinghouse, 250–253 GPS, see Global Positioning Systems (GPS) Graph maps, 133–136 GRASS GIS, see Geographic Resources Analysis Support System GIS (GRASS GIS) gVSIG, 218 H Health GIS applications accessibility in design, 279–281 Euclidean and Manhattan distance, 273 Generate Service Areas tool, 275–276, 278 network analysis paths, 273–274 OSM download tool, 277–278

    Index

    OSM Editor tool, 276–277 Ready-to-use services, 274–275 cartogram, 281–283 cholera map, 271, 272 ring maps, 283–286 Historical maps and GIS, 295 HTML5, 190 HyperText Transport Protocol (HTTP), 186 I Identity integrity, 146 IDW, see Inverse Distance Weighted (IDW) IFOV, see Instantaneous field of view (IFOV) Illinois Geospatial Data Clearinghouse, 250–252 Instagram, 68 Instantaneous field of view (IFOV), 244 Internet technology, 185–190 Interpolation, 85-86 Introduction to health GIS applications (Follett), 11 Inverse Distance Weighted (IDW), 85 Isarithmic map, 98 ISO Cluster routine, 258 J JavaScript Object Notation (JSON), 232 L Lambert conformal conic, 35 Landsat program, 252–255, 259, 263 Latitude and Longitude, 24 Layout, creating, 17-19 LiDAR technology, 245, 289 Lincoln Park Campus (LPC), 151, 158 Line (1D) data, 142 Low-cost platform, 68 LPC, see Lincoln Park Campus (LPC) M Map algebra tool, 141 MapBox, 195

    345

    Index

    Map package (MPK), 199 Map projection, 27–31, 28 MapQuest web page, 190–194 Map Scale, 91-93 Mercator map, 27, 28 Metric method, 23 Miasma theory, 271 Miniature globe, 27 Mobile phone technology, 212–213 Mobile platforms, 2 Moderate Resolution Imaging Spectroradiometer (MODIS), 223, 226 MPK, see Map package (MPK) N NAD27, see North American Datum of 1927 (NAD27) National Agricultural Statistics Service (NASS), 65 National Elevation Dataset (NED), 155–158 National Geospatial Data Clearinghouse, 65 National Institute of Standards and Technology (NIST), 328 National Land Cover Database (NLCD), 161–162 National mapping agencies, 1 Natural break method, 104 Natural Resources Conservation Service (NRCS), 65, 249 NED, see National Elevation Dataset (NED) Network Analyst toolbox, 86, 87 Network Common Data Form (NetCDF), 306 Network data model, 142 NIST, see National Institute of Standards and Technology (NIST) NLCD, see National Land Cover Database (NLCD) North American Datum of 1927 (NAD27), 25 NRCS, see Natural Resources Conservation Service (NRCS)

    O OGC, see Open Geospatial Consortium (OGC) OLI, see Operational Land Imager (OLI) Open Geospatial Consortium (OGC), 152 Open-source GIS (Follett), 11 OpenStreetMap (OSM), 140, 186–189, 194, 230, 326 Operational Land Imager (OLI), 78, 240, 246, 262 Operations research, 1 Organizational GIS Collaborative, 327–329 implementation of, 324–325 practices and considerations, 325–326 StoryMap Series, 336–339 Survey123 (see Survey123) working as professional, 323–324 OSM, see OpenStreetMap (OSM) P PCS, see projected coordinate system (PCS) Peters map, 27 Photogrammetry, 1 Pie Chart Map, 101-102 PK, see Primary key (PK) Plate Carrée, 32, 33 Point maps, 117–118 PoliticalParty, 146 Polygon (2D) data, 142 Portable network graphics (PNG), 188 Primary key (PK), 146 Prism maps, 99-101, 130–133 Private sector geospatial services and data, 67–69 Projected coordinate system (PCS), 32 Proportional symbol maps, 97, 118–119 Pseudo Mercator, 25 Public-domain geospatial data, 152 Q QGIS, 197 ArcGIS Online, 223 cartogram, 281–283

    346

    QGIS (cont.) CSV Import dialog, 221–222 definition of, 217 ecosystem, 199 GeoDa (see GeoDa) georeference scanned maps, 308–314 installation, 218–219 layer types, 220 “Map of Urban Farms,” 219–220 map temporal data, 315–320 raster datasets (see Raster datasets) Symbology tab, 280–281 Qualitative map, 95, 306 Quantitative thematic mapping technique, 112–113 R Raster data models, 4, 5, 140–144 Raster datasets Geocoding tool, 227 Georeference tool, 227–230 LANDSAT basemap, 226 proprietary and open-source tools, 224 remote sensing, operations and analysis for, 223 Transformation dialog, 230 Reference map, 93 Referential integrity, 146 Relational databases, 144–145 Remote sensing applications ArcGIS, 261–269 image classification, 258 image differencing change analysis, 259–261 electromagnetic radiation, 239, 241, 243 electromagnetic spectrum, 242 fieldwork and analysis, 244–245 government aerial photography EarthExplorer, 248–249 federal agencies, 247 Illinois Geospatial Data Clearinghouse, 250–253 Landsat program, 252–255 Nadar photographic images, 248

    Index

    organizational considerations, 245–247 sensors, 255–258 spectrograph, 243 using passive sensors, 241 Remote sensing and GIS (McHaffie), 11 Responsive web design, 191 Right to Know (RTK), 66 RTK, see Right to Know (RTK) S SaaS, see Software as a Service (SaaS) SAGA GIS, 218 Sentinel project, 254 Shapefile separates spatial data, 151 “Snapshot” services, 67 Socrata, 221 Software as a Service (SaaS), 68 Spatial analysis (McHaffie), 10 Spatial thinking, 294 SPC systems, see State Plane Coordinate systems SQL, see Structured query language (SQL) Standard deviation method, 104 State Plane Coordinate (SPC) systems, 35, 36 StoryMaps, 329–339 Structured query language (SQL), 79 Subscription/“On-Demand” services, 67–68 Survey123 creating survey, 331 data tab, 336 description of, 333 design page for, 331–333 homepage, 330 “Set Rule” dialog box, 334 Share settings, 337 Web designer vs., 330 Symbology, 107 Systems analysis methods, 1 T Table join, direction of, 150, 173–178 Table Join task, 149–151

    347

    Index

    TCP/IP, see Transmission Control Protocol/Internet Protocol (TCP/IP) Temporal image differencing analysis, 259, 260 Temporal mapping, 302 Thematic Mapper (TM), 240, 244, 262, 266 Thematic mapping, 2, 15–17, 83 census data, 166–169 fundamental elements of, 91–93 gallery of, 93–102 organizing thematic content, 113–116 quantitative technique, 112–113 reading and interpretation, 102–107 symbology (visual variables), 107–111 visual variables, 109–110 Thematic mapping (Hwang), 10 Three-dimensional earth model, 24 TIGER, see topologically integrated geographic encoding and referencing (TIGER) TIGER/Line, 169–170 Tiled web maps, 189 Tobler’s Law, 83 Topographic map, 93 Topologically integrated geographic encoding and referencing (TIGER), 78 Toxic Release Inventory (TRI), 80, 146 Transmission Control Protocol/Internet Protocol (TCP/IP), 186 Transportation network, 142 Transverse Mercator, 35 TRI, see Toxic Release Inventory (TRI) Twitter, live feed of, 68, 214

    Unique value map, 112 Universal Transverse Mercator (UTM), 33, 34, 93 Unmanned aerial vehicle (UAV), 68, 254 Urban transportation network, 77 US Census Bureau, 78 US Department of Agriculture (USDA), 249 U.S. Department of Labor, 5 US Geological Survey (USGS), 64, 78, 248–250 UTM, see Universal Transverse Mercator (UTM) V Vector data models, 3, 140–141 Vector-to-vector operations, 84 Visual Variables, 107, 109 W WAAS, see Wide-area augmentation system (WAAS) Web GIS, 183–185 applications, 194–197 desktop exercise, 198–206 history, 190–193 and Internet fundamentals, 185–190 mobile phone technology, 212–213 Web GIS (Follett), 11 Wide-area augmentation system (WAAS), 71 Working with geospatial data (Hwang), 10–11

    U UAV, see Unmanned aerial vehicle (UAV)

    X XY data, 41

    Taylor & Francis eBooks www.taylorfrancis.com A single destination for eBooks from Taylor & Francis with increased functionality and an improved user experience to meet the needs of our customers. 90,000+ eBooks of award-winning academic content in Humanities, Social Science, Science, Technology, Engineering, and Medical written by a global network of editors and authors.

    TA YLOR & FRANCIS EBOOKS OFFERS: A streamlined experience for our library customers

    A single point of discovery for all of our eBook content

    Improved search and discovery of content at both book and chapter level

    REQUEST A FREE TRIAL [email protected]