Geographic Information System 817993537X, 9788179935378


English Pages 169 [174] Year 2013


Jatin Pandey • Darshana Pathak

Geographic Information System (GIS) aims to organize complex interrelations between different layers of information through a process of gathering, analysing, processing, storing, and presenting the spatial data and images available through different sources. It integrates hardware, software, and data for capturing, managing, analysing, and displaying all forms of geographically referenced information. This book presents theory, methods, and the latest research findings for problem-solving and decision-making using GIS-based technologies.

Key Features
• Explains raster and vector data and attribute databases.
• Discusses applications of GIS in geotechnical engineering, transport engineering, and water resource engineering.
• Includes model question papers and is well illustrated.


Geographic Information System

Jatin Pandey • Darshana Pathak

The Energy and Resources Institute

© The Energy and Resources Institute, 2014

ISBN 978-81-7993-537-8

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the publisher.

All export rights for this book vest exclusively with The Energy and Resources Institute (TERI). Unauthorized export is a violation of terms of sale and is subject to legal action.

Suggested citation
Pandey, Jatin and Darshana Pathak. 2014. Geographic Information System. New Delhi: TERI

Published by
The Energy and Resources Institute (TERI)
TERI Press
Darbari Seth Block
IHC Complex, Lodhi Road
New Delhi – 110 003
India

Tel. 2468 2100 or 4150 4900
Fax 2468 2144 or 2468 2145
India +91 • Delhi (0) 11
Email [email protected]
Website www.teriin.org

Printed in India

Dedication

This book is dedicated to my parents Mr S.C. Pandey and Mrs Hema Pandey, and my brother Mr Kunal Pandey for their love and support.
Jatin Pandey

I dedicate this book to my parents Mr G.D. Pathak and Mrs Ganga Pathak, my husband Mr Bharat Joshi, and all family members.
Darshana Pathak

Foreword

I am happy to present this book on Geographic Information System to the readers. The work put forth by the authors Mrs Darshana Pathak and Mr Jatin Pandey is commendable. I convey my best wishes to both of them and wish them success in their future endeavours. I hope that the readers will find this book useful and fruitful.

Padma Vibhushan Shri Sundarlal Bahuguna
Environmentalist and Social Worker
Leader of Chipko Movement

Foreword

Geographic information system (GIS) is a computer-based technology that allows digitization and graphical representation of geographical data for efficient planning and decision-making. It captures, stores, manipulates, represents, and analyses spatial data. The present book has been organized splendidly to address the basic terms, tools, and techniques of GIS technology. Geographic information system has manifold applications, spanning natural resource management, network management, planning, and a host of other fields. The book intends to first acquaint readers with GIS and subsequently guide them in using this technology for various applications. The book has been structured into six chapters, and the content of these chapters is extremely useful and relevant for students of science and technology.

I sincerely congratulate the young and dynamic authors, Mrs Darshana Pathak and Mr Jatin Pandey, for coming up with a book on such an efficient technology. They truly deserve compliments. I do believe that the book will light up the hearts of many GIS practitioners, implementers, policymakers, and enthusiasts.

Dr Durgesh Pant
Professor and Head of Department
School of Computer Sciences and Information Technology
Uttarakhand Open University

Acknowledgements

This book would not have taken shape without the contribution of many people. First and foremost, we thank Mr R.K. Joshi of TERI Press for encouraging us to write this book. We express our sincere gratitude to Prof Kamal Kumar Ghanshala, Chancellor, Graphic Era Hill University, and Prof Jitendra Shah, Senior Research Scientist, IIT Bombay, for their guidance. We also thank Ms Nayna Garg and Mr Kapil Bhadouria for their help with data entry and the figures used in this book. Moreover, we express our heartfelt gratitude to TERI for considering this manuscript worth publishing. We specially thank Ms Sushmita Ghosh and Mr Arun Kumar Paul of TERI Press for their patience and support during the entire process. Last but not least, we are grateful to all our teachers, who molded us into what we are today and were the catalyst for our transformation.

Preface

Information technology is an umbrella term that includes sets of tools, processes, and methodologies with associated digital equipment to collect, process, and present information. Over the past several decades, information technology has been expanded by various principal technologies to input, process, output, and distribute information around the globe. Information technology and the informatics wave have touched the shores of every sphere of human life, making it simpler and more sophisticated at the same time. This information technology revolution has been a catalyst for the metamorphosis of isolated communities into a globally connected village. Digital technologies are now part of our everyday work. Today, innovations in information technology are having wide-ranging impacts across numerous domains of society. Successive digital revolutions and technologies have carried information technology through its pre-mechanical, mechanical, electromechanical, and now electronic phases. Informatics is so intricately interwoven into the fabric of our lives that at times we find it difficult to imagine a life without it. Today, a life without a mobile phone seems a far-fetched possibility, but this was not the case 20 years ago.

Geographic information system, or GIS as it is called, is one such technology. It is still in its initial stage, but it will soon catch up and integrate into our day-to-day activities. It is an information system that has the potential to organize complex interrelations between different layers of information through the process of gathering, analysing, processing, storing, and presenting spatial data and images available through different sources. Information management is an important aspect of informatics, aided by better presentation techniques. The pictorial (graphic) representation of data always provides better opportunities for data interpretation and analysis; this fact triggered the inclusion of geographical data with digital technologies. The more manageable the information is, the easier it is to receive, collect, organize, interpret, and verify it. This in turn helps in making optimal decisions. Geographic information system takes the traditional study of geography and projects it to the digital level.

Dr Roger Tomlinson, a Canadian geographer, is known as the “father of GIS”. He was the first to use a computerized GIS, feeding data to a computer program that then assisted in understanding the management and use of land in Canada. GIS is a dynamic and relatively new field of study comprising applied informatics. It is, in essence, an emerging and evolving science. The future holds vast applications of GIS, and hence it is becoming a very important interdisciplinary subject.

The motivation to write this book on GIS was the non-availability of a quick reference guide, especially for students. The book is comprehensive and covers the major topics of the GIS syllabus in many universities across India. In Chapter 1, we begin with a review of the basic elements, principles, and research on information technology, digital communication, and information systems. Chapters 2 and 3 include an introduction to basic GIS data types and the methods or techniques for processing these data types. Within these chapters, basic raster and vector data types and data models are presented and illustrated with examples. The mapping of geographical data is an essential part of a geographical information system. Chapter 3 focuses on the underlying concepts of GIS functionality and GIS database management. Data acquisition is the first step of GIS functionality, and GIS has the potential to assimilate data from different sources for further processing. Today, with the advancement of satellite technology, remote sensing is growing at a rapid pace as a means of acquiring data on the earth’s surface. Chapter 4 explains the fundamentals of electromagnetic radiation and the basic working of remote sensing technology. Chapter 5 concentrates on a topic too seldom discussed: the application of GIS technology in various fields, with a wide range of advanced GIS-based solutions for optimal planning and decision-making procedures. Chapter 6 takes numerical problems in GIS head-on, with solutions provided to assist readers.

Although proper care was taken to avoid errors and ambiguities in the book, to err is human. Readers are requested to point out any faults they come across and give suggestions that may help in improving later editions of the book. For this, you can contact the authors at [email protected] and [email protected]. We hope the book serves its purpose and becomes a guide in the true sense for students in their journey through GIS study.

“Let the light within us guide our path”.

Contents

Foreword by Shri Sundarlal Bahuguna
Foreword by Dr Durgesh Pant
Acknowledgements
Preface

1. Introduction to Geographic Information System
    Introduction
    Information System
    Geographic Information System
    Cartography and GIS
    GIS Database
    GIS Data Type
    GIS Data Models
    Topology and GIS
    Exercises
    References

2. Raster and Vector Data
    Introduction
    Vector Data
    Raster Data
    Raster Encoding Methods
    Shape of the Earth
    Transformation
    Digitization
    Exercises
    References

3. Attribute Database and Overlay
    Attribute Data
    Relations
    GIS Functionality
    Spatial Query
    Vector Data Queries
    Classification
    Overlay
    Buffer
    Inter-visibility
    Network Theory
    Exercises
    References

4. Remote Sensing and Digital Image Processing
    Remote Sensing
    Sources of Energy for Remote Sensing
    Interaction of Electromagnetic Radiation with Atmosphere
    Interaction of Electromagnetic Radiation with the Earth’s Surface
    Use of Electromagnetic Spectrum for Remote Sensing Purposes
    Process of Remote Sensing
    Sensors and Platforms
    Orbits and Swaths
    Platforms
    Image Processing
    Applications of Remote Sensing
    Exercises

5. Applications of GIS
    GIS in Planning and Management of Utility Lines
    Geotechnical Engineering
    Water Resource Engineering
    Example of GIS Application Development with Open Source
    References

6. Numerical Problems
    Scale Conversions
    Playing with Database

Glossary
Bibliography
Index
About the Authors

1

Introduction to Geographic Information System

INTRODUCTION

Driven by a revolution in digital technologies, the requisites and practice of science are changing. All elements of science (observation, experiment, theory, and modelling) are being transformed by the continuous cycle of generation, access, and use of an ever-increasing range and volume of digital data. Advances in computational capacity and tools, coupled with the accelerating collection and accumulation of data in many disciplines, have given rise to new modes of conducting research. Computers and information technology have penetrated almost all aspects of science and human lives, and data are the essence of this new field. Reuse of digital data has dramatic benefits; it opens up opportunities to utilize information over unlimited time periods and for unlimited purposes, thus aiding science and society.

The term “data” is the plural of the Latin word “datum”, which means “something given”. Data refer to the lowest level of abstraction: the raw input that yields meaningful output when processed. Data are raw and have no significance beyond their existence. When data are processed, they become information. Information is data that have been given meaning by way of relational connection. In computers, a relational database makes information from the data stored within it. Contextual information integrated with a set of skills, experiences, and relevant concepts generates knowledge. Knowledge is a term that can be defined in many ways and is being constantly redefined. In the simplest words, it is a cognitive process based on a specific context and a dynamic set of information combined with an individual’s expertise and capability to derive new information and conclusions.

Recent developments in information and knowledge acquisition, along with the proliferation of personal computers and the rise of information systems, have bridged the gap between information and its application in various imperative fields of management and decision-making. To reach an optimal solution, an individual needs a set of relevant information and criteria to support his or her decision. Information systems have an inherent potential to enhance decision-making processes. Whatever the nature of the domain (business management or enterprise information systems; natural resource management; or medical, defence, or social facilities), an information system is an important tool in it. Table 1.1 gives the differences between data and information.

Table 1.1

Differences between data and information

Data
• Symbols
• Collection of raw facts
• Unprocessed and may not be in order
• Difficult to understand
• Example: spreadsheet

Information
• Processed data
• Easy to understand and always in order
• Provides answers to what, who, when, and where questions
• An appropriate collection of information that is able to define reasoning forms knowledge
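The distinction in Table 1.1 can be sketched with Python's built-in sqlite3 module. The table name, the column names, and the sample rows are hypothetical and serve only to illustrate how a relational query gives raw records meaning. This is a minimal sketch, not a prescribed schema.

```python
import sqlite3

# Hypothetical raw data: unordered (city, year, rainfall_mm) records.
raw_rows = [
    ("Dehradun", 2012, 2100.0),
    ("Delhi", 2012, 650.0),
    ("Dehradun", 2013, 2300.0),
    ("Delhi", 2013, 750.0),
]


def mean_rainfall_by_city(rows):
    """Load raw rows into an in-memory table and return {city: mean rainfall}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rainfall (city TEXT, year INTEGER, rainfall_mm REAL)"
    )
    conn.executemany("INSERT INTO rainfall VALUES (?, ?, ?)", rows)
    # The GROUP BY query relates the records to one another: this is the
    # step that turns stored data into information.
    cursor = conn.execute(
        "SELECT city, AVG(rainfall_mm) FROM rainfall GROUP BY city ORDER BY city"
    )
    result = dict(cursor.fetchall())
    conn.close()
    return result


information = mean_rainfall_by_city(raw_rows)
print(information)
```

The raw tuples answer no question on their own; the grouped averages answer "what is the typical rainfall where", which is the kind of what/where question the table attributes to information.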

INFORMATION SYSTEM

An information system is an integrated set of components for collecting, storing, and processing data and for delivering information, knowledge, and digital products. It is a combination of hardware, software, infrastructure, and trained personnel organized to facilitate planning, control, coordination, and decision-making in an organization. Technically, an information system can be defined from two different perceptions:
1. Functional perception: An information system is a technologically implemented medium to record, store, and disseminate information as well as to support decision-making and inference.
2. Structural perception: An information system consists of a collection of people, processes, data, models, technology, and partly formalized language, forming a cohesive structure that serves some organizational purpose or function.


An information system captures raw data from within the organization or the external environment as input, processes them, and delivers more meaningful information as output for decision-making. The output is provided to end users or used in other activities. In addition to supporting decision-making, coordination, and control, information systems may also help in analysing problems, visualizing complex subjects, and creating new products.

Components of an Information System

A computer-based information system uses the computer to perform its intended tasks. The components of an information system are as follows (Figure 1.1).
• Software: The software consists of carefully organized instructions and code written by programmers in any of the various special computer languages.
• Hardware: This refers to the physical parts of a computer and related devices.
• People: Stakeholders such as end users, specialists, programmers, and database administrators are involved at different stages of the life cycle of an information system.
• Database: A database is a knowledge base containing data.
• Network: Communication media and network support are vital components of an information system.

Figure 1.1

Computer-based information system

Source O’Brien (1993)


Types of Information System

Information systems are classified into groups depending on the following four parameters: organizational levels, mode of data processing, type of support provided, and system objectives.

Organizational levels

Enterprise information systems, inter-organizational systems, and intra-organizational systems are classified as organizational-level information systems. These systems are organized in a hierarchy in which the top system consists of many subsystems below it.

Mode of data processing

Information systems are categorized broadly into three groups based on how they process data.
1. Batch processing systems: In batch processing systems, transactions that have already occurred are processed periodically, at a later time.
2. Online batch systems: In online batch systems, data are captured by online devices and processed periodically.
3. Online real-time systems: In online real-time systems, data are captured and processed in real time to update records.
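The difference between periodic and real-time processing can be sketched as follows. The class names and the running-balance record are hypothetical; the sketch only contrasts when a captured transaction updates the stored record, and it does not model any particular product.

```python
class BatchSystem:
    """Captures transactions now, updates the record only when the batch runs."""

    def __init__(self):
        self.balance = 0
        self.pending = []  # transactions captured but not yet processed

    def submit(self, amount):
        self.pending.append(amount)

    def run_batch(self):
        # Periodic step: apply every queued transaction in one pass.
        self.balance += sum(self.pending)
        self.pending.clear()


class RealTimeSystem:
    """Updates the record at the moment each transaction is captured."""

    def __init__(self):
        self.balance = 0

    def submit(self, amount):
        self.balance += amount


batch = BatchSystem()
live = RealTimeSystem()
for amount in (100, -30):
    batch.submit(amount)
    live.submit(amount)

balance_before_batch = batch.balance  # still the old value
batch.run_batch()
print(balance_before_batch, batch.balance, live.balance)
```

Until `run_batch()` is called the batch record is stale, whereas the real-time record is always current. An online batch system would capture transactions through online devices but still defer the update in the same way.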

Type of support provided

Information systems under this category include different types of office automation systems used by the lower part of the office hierarchy.

System objectives

Information systems are classified into four subcategories depending on their system objectives.
1. Transaction-processing system: It is a computerized system intended to perform and record the routine daily transactions necessary to conduct a business. A transaction is an event that generates or modifies data, which are eventually stored in an information system.
2. Management information system: It is a planned system of collecting, storing, and disseminating data in the form of information needed to carry out the functions of management.
3. Executive information system: It is a management information system tailored to the strategic information needs of top managers.
4. Decision support system: It is a specific class of computerized information systems that support business and organizational decision-making activities. A properly designed decision support system is an interactive, software-based system intended to help decision-makers compile useful information from raw data, documents, personal knowledge, or business models to identify and solve problems and make decisions.

Table 1.2 briefly lists the major functions and examples of different types of information systems.

Table 1.2

Types of information systems

Type of system | Function | Example
Functional area information system | Supports the activities within a specific functional area | System for processing payroll
Transaction-processing system | Processes transaction data from business events | Walmart checkout point-of-sale terminal
Enterprise resource planning system | Integrates all functional areas of the organization | Oracle, SAP
Management information system | Produces reports summarized from transaction data, usually in one functional area | Report on total sales for each customer
Decision support system | Provides access to data and analysis tools | “What-if” analysis of changes in budget
Expert system | Mimics human expert in a particular area and makes a decision | Credit card approval analysis
Executive information system | Presents structured, summarized information about aspects of business important to executives | Status of production by product

GEOGRAPHIC INFORMATION SYSTEM

A geographic information system (GIS) is an information system that has the potential to organize complex interrelations between different layers of information through gathering, analysing, processing, storing, and presenting the spatial data and images available through different sources. It is a computer-based information system that integrates hardware, software, and data for capturing, managing, analysing, and displaying all forms of geographically referenced information.

Geographic information system allows us to view, understand, question, interpret, and visualize geographical data in ways that reveal relationships, patterns, and trends in the form of maps, globes, reports, and charts. It is an important platform for visualizing geographical data. Some important terms associated with GIS are discussed in the following subsections.

Geography

Geography is the science concerned with the formulation of the laws governing the spatial distribution of certain features on the surface of the earth (Schaefer 1953). It is the study of both human and environmental landscapes, where these landscapes comprise real as well as prescriptive spaces. The adjective “geographical” indicates belonging to, or being characteristic of, a geographical (spatial) location.

Location

Location is an indispensable concept of geography that distinguishes it from other fields. In simple terms, it is the position of an object on the earth’s surface with respect to a coordinate system. Geocoding is the process of determining the geographical coordinates of a place such as a street or a school.

Distance

Distance is another important concept to be considered for any system founded on theories of geography. It is the measure of the degree of separation between any two points on the earth’s surface. A variety of units are used to measure distance. For a two-dimensional plane surface, distance can be mathematically calculated as follows.

$$ d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} $$

Here $(x_1, y_1)$ and $(x_2, y_2)$ are the coordinates of the two points.
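The planar distance formula can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes both points are given in the same planar (projected) coordinate system and unit, for example metres; it is not suitable for raw latitude and longitude values.

```python
import math


def planar_distance(x1, y1, x2, y2):
    """Straight-line distance between (x1, y1) and (x2, y2) on a plane."""
    # math.hypot(dx, dy) computes sqrt(dx**2 + dy**2).
    return math.hypot(x2 - x1, y2 - y1)


# Two points 3 units apart in x and 4 units apart in y.
print(planar_distance(0.0, 0.0, 3.0, 4.0))  # 5.0
```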

GIS, however, also deals with three-dimensional data obtained with the help of surface analysis and map analysis. Direction and space are other geographical concepts to be explored while designing any geography-based decision-making system.

Information

Information is processed raw data; it is a valuable asset to any organization and plays a vital role in planning, decision-making, and management. Geographical information describes the location and attributes of objects on the earth’s surface, the geographic relationships among objects and phenomena, patterns, and residents. It is wide and varied, and answers the question “what is where”. Geographical information can be illustrated with the help of two-dimensional data (that is, in text format), such as a feature table, as well as three-dimensional data, such as maps and graphics (Figure 1.2).

Figure 1.2

Representation of geographical information using map

System

A system is a well-defined and purposeful structure comprising functional, interdependent, and interrelated elements joined together to achieve a common goal. Every system is said to be composed of subsystems. Every subsystem has some defined objectives. The general components of a system include the following.
• Input and output are the data fed to the system and the resulting information provided by the system.
• Transformation processes convert the input into the desirable output.
• Control is a process that monitors accuracy, safety, performance, and continuity.
• Feedback is a subsystem that feeds results back to the system and controls the system by making changes to the input and/or process.
• Boundaries are defined by system observers.

Geographic information system integrates geographical features with tabular data to assess real-world problems. At the simplest level, GIS can be thought of as a high-tech equivalent of a map. It provides the facility to visualize, question, analyse, interpret, and understand geographical data to reveal relationships, patterns, and trends in the form of maps, globes, reports, and charts.

Components of a GIS

A working GIS comprises the following five key elements (Figure 1.3).
1. Hardware: The computer on which a GIS operates is its hardware. Peripheral devices such as digitizers and scanners are used to convert data from maps and documents into digital form and send them to the computer. A digitizer board is a flat board used to vectorize any map object. Plotters or other kinds of display devices are used to present the results of data processing, and a tape device is used to store data or programs on magnetic tape.
2. Software: The GIS software provides the functions and tools needed to store, analyse, and display geographic information. The basic software components are as follows.
   • A geo-database management system that supports the storage of spatial data for query and analysis.
   • Software tools for input and processing of geographical data.
   • A graphical user interface to support geographical and imagery data.
3. Data: Data are the essence of GIS. A GIS integrates spatial data with other existing data resources. The integration of spatial and tabular data stored in a database management system is a key functionality afforded by GIS. GIS data can be acquired through data collection devices and methods, or can be purchased from a commercial data provider.
4. Methods: A successful GIS operates according to a well-designed plan and business rules, which are models and operating practices unique to each organization. These methods are used to perform complex spatial analysis providing both qualitative and quantitative results.
5. People: The real power of GIS comes from the stakeholders involved in the different phases of a GIS life cycle, from the data acquirer to the end user of the developed GIS application. GIS users range from technical specialists who design and maintain the system to those who use it to help them perform their everyday work.

Figure 1.3

Components of GIS
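The integration of spatial and tabular data described under the Data component can be sketched as a join on a shared feature identifier. The layer contents, field names, and the `join_attributes` helper are all hypothetical; a real GIS would perform this inside its geo-database management system.

```python
# Hypothetical spatial layer: feature id -> point geometry (x, y).
spatial_layer = {
    101: (77.21, 28.61),
    102: (78.03, 30.32),
}

# Hypothetical attribute table: one row of non-spatial data per feature.
attribute_table = [
    {"id": 101, "name": "School A", "students": 450},
    {"id": 102, "name": "School B", "students": 300},
    {"id": 103, "name": "School C", "students": 120},  # no geometry recorded
]


def join_attributes(layer, table, key="id"):
    """Attach geometry to every attribute row whose key exists in the layer."""
    joined = []
    for row in table:
        geometry = layer.get(row[key])
        if geometry is not None:
            # Build a new record so the input table is left unchanged.
            joined.append({**row, "geometry": geometry})
    return joined


features = join_attributes(spatial_layer, attribute_table)
for feature in features:
    print(feature["name"], feature["geometry"])
```

Rows without a matching geometry are dropped here, which corresponds to an inner join; keeping them with an empty geometry would be an equally reasonable design choice.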

How GIS Works

Geographic information system has the ability to display and analyse spatial data integrated with a database. Maps can be drawn from databases, and data can be referenced from maps. A GIS database holds a wide range of geographical information. GIS overlays interrelated information from different sources for processing, and it works on themes or layers. A GIS theme is a collection of similar geographical objects such as a road network, waterbodies, or soil types. Figure 1.4 represents superimposed layers of geographical data to create a map.

Figure 1.4

Layer representation of GIS

Any GIS operation consists of the following five central phases.
1. Data acquisition: Data acquisition is the process of gathering relevant geographical information (spatial and non-spatial) from different sources. GIS data can be captured in two forms: (1) analogue or physical data (for example, maps) and (2) digital data, that is, data in computer-readable form (for example, satellite data). The various sources from which GIS data can be captured include maps, satellite images, aerial photographs, tabular data, and field data (see Chapter 2 for data acquisition techniques).
2. Preprocessing: The process of converting gathered data into a suitable format for input into a system is called preprocessing. Data format conversion, digitization of maps, and recording of field or attribute information in the database are the key steps of this phase. Error detection, data reduction and generalization, and map projection and interpolation are other important techniques of this phase, intended to ready the data for analysis and product generation.
3. Data management: GIS databases allow storing, querying, and operating on geographical information, thus enabling adding, deleting, updating, or defining the database contents. Most GIS databases are relational (tabular) in nature, which makes them easy to manage and manipulate.
4. Analysis and manipulation: The graphical user interface of GIS software enhances its power to manipulate data sets according to the analysis requirements. The most common usage of GIS is spatial analysis. With the help of a GIS database and preprocessed data, it is possible to obtain an interactive visualization of the spatial patterns of any given phenomenon. A GIS can take care of a wide range of mathematical operations, overlay techniques, Boolean operations, or logical operators.
5. Product generation: The last phase of a GIS life cycle is product generation. An end product may be interactive digitized maps, reports, charts, documentation, or a GIS web application with the ability to provide a collaborative platform for decision support and policymaking.

A block diagram to illustrate the GIS central operations is given in Figure 1.5.
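The overlay and Boolean operations mentioned in the analysis phase can be sketched with two small grid layers. The layer names, the 3 × 3 grids, and the `overlay_and` helper are hypothetical; each cell holds 1 where the theme is present and 0 where it is absent, and both layers are assumed to cover the same area at the same resolution.

```python
# Hypothetical theme 1: cells covered by forest.
forest = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]

# Hypothetical theme 2: cells inside a flood zone.
flood_zone = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 1],
]


def overlay_and(layer_a, layer_b):
    """Cell-by-cell Boolean AND of two grids with identical dimensions."""
    same_shape = len(layer_a) == len(layer_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(layer_a, layer_b)
    )
    if not same_shape:
        raise ValueError("layers must have the same dimensions")
    return [
        [a & b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(layer_a, layer_b)
    ]


# New theme derived from the two inputs: forest cells that are flood-prone.
flooded_forest = overlay_and(forest, flood_zone)
for row in flooded_forest:
    print(row)
```

The result is itself a layer, so it can feed a further overlay or be counted, mapped, or reported in the product generation phase.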

Objectives of GIS

Geographic information system has the following objectives.
• To facilitate generation of a layered structure of geographical information and help visualization of geographical information from different sources as a set of relevant layers and their interrelationships.
• To provide the facility to conduct complex analysis and query on different layers and their geographical attributes, thus making it possible to retrieve new information for optimized decision-making and planning.
• To define existing patterns and trends in the three-dimensional format for better understanding and conceptualization.
• To integrate data from different sources.
• To eliminate redundant data, if any.
• To provide efficient data handling and distribution.
• To incorporate remotely sensed data for resource mapping, monitoring, and management.

Figure 1.5

Block diagram of GIS

Why is GIS Important?

Geography is not concerned only with the distribution of various elements on the earth’s surface. It also deals with a complex pattern of interrelated phenomena, geological structures, and residents. Hence, a set of geographical information about a single element is not sufficient for analysis; information about the location and features of the other elements to which it relates is also needed. However, it is difficult to depict the relationships among elements or phenomena, which limits the potential for analysis. Thus, there is a need for a tool or system that provides a platform for data acquisition, processing, and output generation to analyse interrelated relevant geographical information. GIS enables analysis of the complex interrelations between different layers of geographical information through a process of gathering, presenting, analysing, and visualizing the data and images that may be available from different sources.

GIS is an extension of traditional cartographical (art of map making) sciences integrated with faculties of information systems. It has rapidly accelerated the use of spatial data in different disciplines of management, planning, and decision-making. It makes map data more interactive and collaborative and hence more useful. GIS databases are often large collections of geographical features and their attributes. An important benefit of GIS is its capacity to combine layers of data into a single map in which the user can turn layers on or off according to requirements. A GIS user can generate new information from existing information by using different combinations of layers.

Query is another strength of GIS. A query is similar to a “search” for a web page: it retrieves from the database the data relevant to the request and presents them as a new theme. It is not only the visualization of data but also the ability to combine different thematic layers that makes GIS a powerful tool. Some examples of thematic layers include roads, rivers, forest, and buildings. GIS can also combine these layers to produce new themes, which answer questions like “How many buildings are at a 50 m distance from Lake Site?”
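The Lake Site question can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lake outline, the building names, and their coordinates are hypothetical, coordinates are planar and measured in metres, and the question is read as "buildings lying within 50 m of the shoreline".

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (planar metres)."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Position of the closest point along the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_shoreline(p, polygon):
    """Distance from p to the nearest edge of a polygon given as a vertex list."""
    n = len(polygon)
    return min(
        point_segment_distance(p, polygon[i], polygon[(i + 1) % n])
        for i in range(n)
    )

# Hypothetical layers: one lake polygon and four building points.
lake = [(0, 0), (100, 0), (100, 60), (0, 60)]
buildings = {"B1": (120, 30), "B2": (180, 30), "B3": (50, -40), "B4": (50, 130)}

# The query combines the two layers into a new theme: buildings near the lake.
near_lake = sorted(
    name for name, xy in buildings.items()
    if distance_to_shoreline(xy, lake) <= 50
)
print(near_lake, len(near_lake))
```

The result is itself a new layer (a subset of the buildings theme), which is the sense in which a query produces a new theme from existing ones.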

Applications of GIS

Geographic information system is a technology used by a variety of industries and fields for simulation of complex patterns, visualization and analysis of real-world situations, management, and decision support. GIS plays a decisive role in transport, forestry, natural resource management, business, tourism, public safety, health domain, and education. It helps stakeholders by collecting, processing, manipulating, and displaying data in different formats as per requirement. These computer-based information systems assist experts in different fields to make an appropriate decision by analysing spatial data in the desired format. A detailed discussion on the applications of GIS in different fields is given in Chapter 5. Table 1.3 gives some uses of GIS in various industries.

Table 1.3 Uses of GIS in different industries

Industry                                          Use of GIS
Forestry                                          Inventory and management of resources
Police                                            Crime mapping to target resources
Epidemiology                                      To link clusters of diseases to sources
Transport                                         Monitoring routes
Utilities                                         Managing pipe networks
Oil                                               Monitoring ships and managing pipelines
Central and local government                      Evidence for funding and policy
Health                                            Planning services and health impact assessments
Environment agencies                              Identifying areas of risk (for example, floods)
Emergency departments (for example, ambulance)    Planning quicker routes
Retail                                            Store location
Marketing                                         Locating target customers
Military                                          Troop movement
Mobile phone companies                            Locating masts
Land registry                                     Recording and managing land and property
Estate agents                                     Locating properties that match certain criteria
Insurance                                         Identifying risks
Agriculture                                       Analysing crop yields


CARTOGRAPHY AND GIS

Cartography is the art and science of map making. A map is a set of points, lines, and areas, all defined both by position with reference to a coordinate system and by their non-spatial attributes. The term cartography comes from two Greek words: chartis, meaning “map”, and graphos, meaning “to draw” or “write”. Basic cartography covers the following two data components.

1. Location data: These indicate where the area being depicted is located.
2. Attribution data: These show bodies of water, mountains, valleys, hills, and other geographical features of interest.

Cartography relies heavily on mathematics to represent the earth and on science to help describe and understand geological features (Figure 1.6). A map of the world reflects an immense mathematical and artistic challenge: that of translating the three-dimensional globe to a two-dimensional surface.

Figure 1.6 Cartography

Cartography is an ancient discipline that dates from the prehistoric depiction of hunting and fishing territories. Evidence of map making suggests that the map evolved independently in many different parts of the earth. The people of Marshall Islands made stick charts for navigation. Pre-Columbian maps in Mexico used footprints to represent roads. The oldest known maps are preserved on Babylonian clay tablets dating from about 2300 BC.

Traditional cartography was a difficult and tedious task due to some fundamental challenges. It was difficult to accurately represent terrains with different heights and slopes on a two-dimensional flat surface. It was also difficult to determine which spatial information was not relevant to the map’s purpose. It was hard to design the schema of the map and select the traits of elements to be referenced in the map. The discovery of the New World led to the need for new techniques in cartography, particularly for the systematic representation on a flat surface of the features of a curved surface.

GIS emerged in the 1970s and 1980s. It represents a major shift in the cartography paradigm. In traditional (paper) cartography, the map was both the database and the display of geographic information. In GIS, the database, analysis, and display are physically and conceptually separate aspects of handling geographic data. The advancements in mapping offered by modern computers are incredible. There is a practically unlimited number of available colours, very high resolution of graphical displays, software supporting generation of realistic three-dimensional scenes, dynamic views (animation), user interaction with displays, dynamic display transformation, and interlinking of multiple views (Figure 1.7). All this is available even on standard personal computers, while more complex and powerful equipment can further enhance some of these features.

Figure 1.7 Digital cartography

Modern cartography largely involves the use of aerial photographs as a base for any desired map or chart. The procedures for translating these photographic data into maps are governed by the principles of photogrammetry and yield a degree of accuracy previously unattainable. Satellite photography has made possible the mapping of the features of the moon and of several planets and their satellites.

GIS DATABASE

A GIS database is a collection of geographic data sets, feature classes (a feature class is a collection of features or a table of rows where each row has a geographic column), raster data, and attribute tables. It is used primarily to store, query, and manipulate GIS data. Like a map, GIS data commonly have two data components.

1. Spatial component: The spatial component of the data describes the unique geographical location for objects or phenomena; for example, the location of a lake. The geographical location must be specified in a unique way. A coordinate system that is used to specify position in an absolute way on the earth’s surface is known as a geo-reference system. Examples of geo-reference systems are Universal Transverse Mercator and the latitude–longitude system. For small areas, the simplest coordinate system is the regular square grid. Internationally, there are many different coordinate systems. Location information on a map is provided with the help of points, lines, and polygons.
2. Attribute component: Attributes refer to the properties of spatial entities. They are often referred to as non-spatial data since they do not, in themselves, represent location information. This type of data describes the characteristics of the spatial features; for example, the quality of water, the quantity of water, area, and depth of a lake. Characteristics can be quantitative or qualitative in nature or both. Attribute data are often referred to as tabular data.

Geo-databases store geometry, a spatial reference system, attributes, and behavioural rules for data. Various types of geographic data sets can be collected within a geo-database, including feature classes, attribute tables, raster data sets, network data sets, topologies, and many others. Geo-databases can be stored in IBM DB2, IBM Informix, Oracle, Microsoft Access, Microsoft SQL Server, and PostgreSQL relational database management systems, or in a system of files, such as a file geo-database.
A GIS database maintains the following basic characteristics.

• It stores a rich collection of spatial data in a centralized location.
• It applies sophisticated rules and relationships to data.
• It defines advanced geospatial relational models (for example, topologies, networks).
• It maintains integrity of spatial data with a consistent, accurate database.
• It works within a multi-user access and editing environment.
• It integrates spatial data with other information technology databases.
• It supports custom features and behaviour.
• It is expensive to create and update, often resulting in outdated data sets, which may have last been updated years ago.
• Internal data stored in a GIS database must possess completeness, logical consistency, temporal consistency, thematic consistency, and positional consistency (Table 1.4).

Table 1.4 Desired properties of internal data stored in GIS

Completeness              Presence and absence of features, their attributes and relationships
Logical consistency       Degree of adherence to logical rules of data structure, attribution, and relationships
Positional consistency    Accuracy of the position of features
Temporal consistency      Accuracy of temporal attributes and temporal relationships of features
Thematic consistency      Accuracy of quantitative and non-quantitative attributes

GIS DATA TYPE

Geographic information system stores spatial information about the real world as a collection of thematic layers, where layers are linked together with associated geography. This simple but extremely powerful and versatile concept has proven invaluable for solving many real-world problems. The ability of GIS to handle and process geographically referenced data distinguishes it from other information systems. A data type is the classification method that distinguishes different types of data used by computer systems. Human beings can easily recognize different types of data and use special symbols such as $ and % to represent data. Similarly, computer systems use special internal codes to keep track of the different types of data they process. Geographically referenced data describe both the location and characteristics of spatial features on the earth’s surface. GIS supports two basic spatial data types: raster and vector (Figure 1.8).


Figure 1.8 (a) Raster data and (b) Vector data

Raster Data Type

Raster data type represents spatial (geographical) information by dividing it into regularly spaced and quantized cells. A cell is a small square of the grid and is known as a pixel (picture element). Raster cells are organized as a matrix of rows and columns. Each pixel has two associated values: (1) the pixel location, represented as a row/column number, and (2) the cell value, which represents the attribute or property of interest.
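As a minimal sketch of this idea, a raster can be held as a list of rows in Python, so that the pixel location is a (row, column) index and the cell value is the number stored there. The grid, the legend codes, and the function names below are hypothetical and serve only as an illustration.

```python
# Hypothetical 3 x 4 land-cover raster; each number is a cell value.
land_cover = [
    [1, 1, 2, 2],
    [1, 3, 3, 2],
    [1, 3, 3, 2],
]

# Assumed legend linking cell values to the attribute of interest.
LEGEND = {1: "forest", 2: "water", 3: "built-up"}

def cell_value(grid, row, col):
    """Return the value stored at a pixel location (row, col)."""
    return grid[row][col]

def count_cells(grid, value):
    """Count how many cells carry a given value."""
    return sum(row.count(value) for row in grid)

print(LEGEND[cell_value(land_cover, 1, 2)])   # attribute at row 1, column 2
print(count_cells(land_cover, 2))             # number of water cells
```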

Vector Data Type

The following are examples of vector data types.

• Points represent discrete points on the earth’s surface.
• Lines represent linear features such as rivers and roads. Each line has several coordinate points, which maintain its shape.
• Polygons represent bounded areas such as waterbodies and political boundaries.
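The three vector types can be sketched as plain coordinate structures. The coordinates and function names below are hypothetical; a planar coordinate system is assumed so that lengths and areas follow from simple geometry.

```python
import math

# A point is one coordinate pair; a line is an ordered list of pairs;
# a polygon is a closed ring of pairs (the closing edge is implied).
point = (3.0, 4.0)
line = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]

def line_length(coords):
    """Sum of the straight-line distances between consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))

def polygon_area(ring):
    """Area of a simple polygon using the shoelace formula."""
    n = len(ring)
    twice_area = sum(
        ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0

print(line_length(line), polygon_area(polygon))
```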

GIS DATA MODELS

A model is a simplified representation of a phenomenon or a system. GIS modelling involves the symbolic representation of the location properties (where), as well as the thematic (what) and temporal (when) attributes describing the characteristics and conditions of space and time. A GIS model attempts to emulate processes of the real world at some point of time or for a limited time period. It allows the testing of a hypothesis with different data sets related to a geographical scenario. A model can be embedded into a GIS application for easier reproduction of data. A GIS model can be exported as a flow chart or modelling data structure. There are different types of GIS models with some fundamental characteristics such as scale, extent, purpose, approach, technique, association, and aggregation. A large number and variety of data models are used in GIS, some of which are as follows (John 1997).1

• Vector data models
    - Spaghetti data model
    - Topological data model
• Raster data models (more specifically, tessellation model)
• Surface models
    - Triangular irregular network (TIN) model
    - Digital elevation model (DEM)
• Conceptual models
    - Entity-relationship model
    - Enhanced entity-relationship model
• Network models
• Relational models
• Object-oriented models
• Hierarchical models
• Semantic data models

Vector Data Models

Vector data models use points, lines, and polygons to represent any geographical location. In vector representation, the boundaries are defined as a series of points and each point is uniquely mapped to the x–y coordinates of a geo-reference coordinate system. The non-spatial attributes of these locations are stored in conventional database management systems. Two very common types of vector data models are the spaghetti data model and the topological data model.

Spaghetti data model

A vector-based data model where each element on the map becomes a logical record in a digital file and is defined as a string with x–y coordinates is called a spaghetti data model. This is the simplest data model, in which every object is stored independently. Objects in the spaghetti model are stored as a set of two elements: the name of the object and the x–y coordinate values of the object location. A spaghetti model is illustrated in Figure 1.9. Some properties of the spaghetti model are as follows.

• A common boundary between two polygons is recorded twice. Hence, redundant data exist in a spaghetti model.
• Lines are encoded as strings of x–y coordinates, while polygons are encoded as closed loops.
• No spatial relationships are stored in the spaghetti model.

1 Details available at
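The redundancy of the spaghetti model can be seen in a small sketch. The two parcels below are hypothetical; each is stored only as a name and an independent coordinate string, so their common boundary appears in both records and can be discovered only by comparing coordinates.

```python
# Each object is a name plus a closed string of x-y coordinates.
spaghetti = {
    "parcel_A": [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)],
    "parcel_B": [(10, 0), (20, 0), (20, 10), (10, 10), (10, 0)],
}

def segments(ring):
    """Return the set of undirected segments of a closed coordinate string."""
    return {frozenset((a, b)) for a, b in zip(ring, ring[1:])}

# The model stores no relationship between the parcels; the shared boundary
# has to be recomputed from the duplicated coordinates.
shared = segments(spaghetti["parcel_A"]) & segments(spaghetti["parcel_B"])
print(shared)
```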

Figure 1.9 A spaghetti data model

Topological data model

A vector-based data model that encodes the spatial relationships of points, lines, and polygons and defines how they share geometry is called a topological data model. A topological model introduces two new elements from discrete mathematics: node and edge. A node is a uniquely defined point that joins several arcs. An edge is an arc that has a defined starting node and ending node. This model stores geometry as a series of nodes and arcs. A shared geometry, such as a common boundary between two polygons, is stored only once in a topological model; hence, redundancy is eliminated. A topological data model is illustrated in Figure 1.10. Some properties of a topological model are as follows.

• The node is the basic entity in this kind of model. It is the point where several arcs meet.
• An arc is a series of points having a starting node and an ending node.
• A point is a single x–y coordinate and is considered a polygon with no area.
• A polygon is a closed loop of arcs that represents the boundary of the polygon.
• Every object of the model is composed of less complex structures.
• Topological models provide opportunities for geometric analysis of location without actual access to the location.

Figure 1.10 A topological data model
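A node–arc–polygon structure of this kind can be sketched as three small tables. The identifiers (n1, a1, A, and so on) and coordinates are hypothetical; the point of the sketch is that the boundary common to polygons A and B exists once, as arc a1, and both polygons merely refer to it.

```python
# Nodes: uniquely defined points where arcs meet.
nodes = {"n1": (10, 0), "n2": (10, 10)}

# Arcs: (start node, intermediate shape points, end node).
arcs = {
    "a1": ("n1", [], "n2"),                      # common boundary
    "a2": ("n2", [(0, 10), (0, 0)], "n1"),       # rest of polygon A
    "a3": ("n1", [(20, 0), (20, 10)], "n2"),     # rest of polygon B
}

# Polygons: closed loops described by the arcs that bound them.
polygons = {"A": ["a1", "a2"], "B": ["a1", "a3"]}

def arc_coordinates(arc_id):
    """Expand an arc into its full list of x-y coordinates."""
    start, shape_points, end = arcs[arc_id]
    return [nodes[start], *shape_points, nodes[end]]

def shared_arcs(poly_a, poly_b):
    """Arcs referenced by both polygons; no coordinate comparison is needed."""
    return set(polygons[poly_a]) & set(polygons[poly_b])

print(shared_arcs("A", "B"), arc_coordinates("a2"))
```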

Raster Data Models

Raster data models represent geographical location as a series of interconnected cells where each cell is bounded and represents an equal area of the earth’s surface. Raster data models use the raster data type to encode spatial data of the area of interest. The matrix (row–column structure) of cells is called a grid. In raster data models, the accuracy of data depends on the cell size, since the cell is the smallest unit that contains spatial information of a location. Each cell of a raster data model contains an associated data value. For a 1-bit raster file, there are only two possible values for the cell, 0 or 1, while for an 8-bit raster file, there are 256 possible values for each pixel. Figure 1.11 shows a 4-bit raster file. A data value can represent a colour or grey value, depth or height, and measurements or any other thematic value. The area covered by each pixel is known as spatial resolution. An important property of a raster model is that all 0-dimensional (points) and 1-dimensional (lines) features will be located towards the centre of the cell. There are several raster-based models, and the common ones include ESRI Grid files, digital orthophotos, and satellite imagery. Some properties of a raster data model are as follows.

• It is often used for biological and physical subsystems of the geosphere, such as temperature, elevation, and vegetation cover.
• It focuses on analysis and modelling of images.
• Lines and points move towards the centre of cells in a raster model.
• The spatial position of each cell in a raster model can easily be calculated by defining the origin of the raster and the spatial resolution (cell size) of each cell.
• TIFF, JPEG, and BMP are various data formats based on the raster data model.
• Landsat TM satellite imagery data are raster data with a spatial resolution of approximately 30 m on one side.

Figure 1.11 A 4 bit raster file
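Two of the statements above reduce to simple arithmetic: the number of values an n-bit cell can hold is 2 raised to the power n, and the position of any cell follows from the raster origin and the cell size. The sketch below assumes, as a convention not stated in the text, that the origin is the upper-left corner of the raster and that row numbers increase southwards; the coordinates used are hypothetical.

```python
def possible_values(bits):
    """Number of distinct values a cell of the given bit depth can store."""
    return 2 ** bits

def cell_centre(origin_x, origin_y, cell_size, row, col):
    """Map coordinates of the centre of cell (row, col).

    Assumes the origin is the upper-left corner of the raster and that
    rows are counted downwards (southwards) from it.
    """
    x = origin_x + (col + 0.5) * cell_size
    y = origin_y - (row + 0.5) * cell_size
    return x, y

print(possible_values(1), possible_values(4), possible_values(8))
# A 30 m cell size, as for Landsat TM, with a hypothetical origin.
print(cell_centre(500000.0, 4100000.0, 30.0, row=2, col=3))
```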

Surface Models

Triangulated irregular network model

A triangulated irregular network (TIN) model uses contiguous, non-overlapping triangles to represent a three-dimensional surface (length, width, and height). A geographical region can be divided into either regular (raster) or irregular non-overlapping polygons for modelling and analysis. A TIN model allows surface models to be generated efficiently to analyse and display terrain and other types of surfaces. The elevation value of a specific point on the earth’s surface is modelled as the vertex of a triangle, whereas arcs represent the estimation of elevations between two vertices (two points on the earth’s surface). To maintain accuracy in drawing the triangles, that is, to maintain accuracy in elevation modelling, the Delaunay construction rule is applied. According to the Delaunay construction rule, “three points form a Delaunay triangulation if and only if (iff) a circle which passes through all three points contains no other points in the set.” This rule can be used to divide areas of similar slope into irregular triangles. For example, a rectangular region can be divided into two triangles by joining the north-east and south-west corners of the rectangle. By placing a point at the centroid of each triangle, six more non-overlapping triangles can be constructed (Figure 1.12). This process proceeds until a predefined threshold value is reached.

Nodes are the elementary building blocks of the TIN data. They are connected to their nearest neighbours by edges, according to a set of rules. The user is not responsible for selecting the nodes; all the nodes are added according to a set of rules. The TIN creates triangles from a set of points called mass points, which always become nodes. Mass points can be located anywhere, but the accuracy of the model depends on the proper selection of mass points. Every triangle is assigned a unique identifier and is defined by its three nodes and its two or three neighbouring triangles. Some properties of a TIN model are as follows.

• The model was developed in the early 1970s as a simple way to build a surface from a set of irregularly spaced points.
• It is a vector-based model (in the form of lines, points, and polygons) dividing a surface into polygons having the attributes of slope, aspect, and area, with three vertices having elevation attributes and three edges with slope and direction attributes.
• Fewer points are required to model the surface; hence, it has a smaller file size.
• It is an irregular model because vertices are scattered in ad hoc fashion.
• It is simple and economic.
• A TIN can be created using contours [a line through all contiguous points with equal height (or other values)] and breaklines (linear features that define and control the surface behaviour in terms of smoothness and continuity).

Figure 1.12 Triangulated irregular network model
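The Delaunay rule quoted above can be checked for a single triangle with a short sketch: compute the circle through the three vertices and confirm that no other point of the set lies inside it. The function names and sample coordinates are hypothetical, and this only tests one candidate triangle; it is not a full triangulation algorithm.

```python
import math

def circumcircle(a, b, c):
    """Centre and radius of the circle passing through points a, b, and c."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        raise ValueError("the three points are collinear")
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), math.hypot(ax - ux, ay - uy)

def is_delaunay(triangle, points, tolerance=1e-9):
    """True if no point other than the vertices lies inside the circumcircle."""
    centre, radius = circumcircle(*triangle)
    return all(
        math.dist(centre, p) >= radius - tolerance
        for p in points
        if p not in triangle
    )

triangle = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
print(is_delaunay(triangle, [(5.0, 5.0)]))   # point outside the circle
print(is_delaunay(triangle, [(1.0, 1.0)]))   # point inside the circle
```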

Digital elevation model

A digital elevation model (DEM) is a sampled array of spot heights at regular intervals over a surface. A spot height gives the height of the highest point in a given area, expressed in feet or metres above sea level, as marked on topographical charts. In a DEM, digital information about surface elevations is presented in raster format. Each pixel value in the grid structure represents the spot height on the surface. Surfaces like the earth’s surface are continuous phenomena; hence, an infinite number of points would be needed to represent them exactly, and a DEM approximates them with a finite data set of samples. Specific computer software interprets the DEMs by converting them into a three-dimensional depiction of the surface (Figure 1.13). A DEM is the most common and simplest form of digital representation of topography. It is called a digital terrain model when it represents the earth’s surface without objects on it (the bare earth’s surface). It is called a digital surface model when it represents heights of landscape features such as trees and buildings. Elevation and height are technically different. Elevation is the height above a given level, especially that of the sea, whereas height is the measurement from base to top. Some properties of DEM are as follows.

• The accuracy of a DEM is measured by resolution and height.
• A DEM contains only the specific elevation values at specific grid point locations.
• Elevation contours are specified in DEM representation.
• A DEM is specifically used for many geo-analysis processes such as landslide study and topographical feature extraction.
• A DEM is widely popular for terrain analysis due to its simplicity and extensive software support.
• Resolution (distance between two adjacent grid points) is the most critical parameter to be decided in a DEM model.
• A DEM is used to find features on the terrain, such as drainage basins and watersheds, drainage networks and channels, peaks and pits, and other landforms.

Figure 1.13 A DEM diagram
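Finding peaks and pits in a DEM can be sketched as a comparison of each cell with its eight neighbours. The elevation grid below is hypothetical, and only interior cells are examined, which is a simplifying assumption of this sketch rather than a rule from the text.

```python
# Hypothetical 4 x 4 DEM; each value is a spot height in metres.
dem = [
    [10, 12, 11, 10],
    [11, 15, 12,  9],
    [10, 12,  8,  9],
    [ 9, 10,  9, 10],
]

def neighbours(grid, r, c):
    """Values of the cells surrounding cell (r, c), up to eight of them."""
    rows, cols = len(grid), len(grid[0])
    return [
        grid[rr][cc]
        for rr in range(max(0, r - 1), min(rows, r + 2))
        for cc in range(max(0, c - 1), min(cols, c + 2))
        if (rr, cc) != (r, c)
    ]

def peaks_and_pits(grid):
    """Interior cells strictly higher (peaks) or lower (pits) than all neighbours."""
    peaks, pits = [], []
    for r in range(1, len(grid) - 1):
        for c in range(1, len(grid[0]) - 1):
            around = neighbours(grid, r, c)
            if grid[r][c] > max(around):
                peaks.append((r, c))
            elif grid[r][c] < min(around):
                pits.append((r, c))
    return peaks, pits

print(peaks_and_pits(dem))
```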

Network Models

Network models are graphs consisting of arcs, which represent linear flows, and nodes, which represent the interconnections between the arcs. Nodes can be junctions, and edges can be roads in a network model (Figure 1.14). A network can also be considered a system of vertices and edges, mathematically defined as a graph G = (N, E), where N is the set of nodes and E is the set of edges in the network. Networks are used to store connectivity of source features. Because of their node–arc structure, network models preserve topology and are widely used for allocation, path finding, and tracing. The geometry or topology of a network model should be close to the real-world scenario. Network models find a connected path through a network; they then analyse and manage the parts and assets associated with it. Arcs in a network model can be broadly classified into two types:

• Directed links are straight lines connecting two nodes (Figure 1.15a).
• Directed chains are arcs with intermediate shape points between two nodes (Figure 1.15b).

Two important aspects of a network model are network topology and feature connectivity. Network models are widely used to analyse vehicle traffic over transportation systems, load analysis over an electric network, or pollution tracking over a river.

Figure 1.14 A network model

Figure 1.15 Arcs in network chains: (a) directed link and (b) directed chain
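Path finding over a node–arc network can be sketched with Dijkstra's shortest-path algorithm, one common choice that the text itself does not name. The junction labels and road lengths below are hypothetical, and the roads are assumed to be two-way, so each edge is stored in both directions.

```python
import heapq

# Hypothetical road network: (junction, junction, length in km).
edges = [("A", "B", 4.0), ("A", "C", 2.0), ("C", "B", 1.0),
         ("B", "D", 5.0), ("C", "D", 8.0)]

def build_graph(edge_list):
    """Adjacency list for an undirected network (two-way roads assumed)."""
    graph = {}
    for u, v, length in edge_list:
        graph.setdefault(u, []).append((v, length))
        graph.setdefault(v, []).append((u, length))
    return graph

def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (list of nodes, total length)."""
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            break
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, length in graph[node]:
            candidate = dist + length
            if candidate < best.get(nxt, float("inf")):
                best[nxt] = candidate
                previous[nxt] = node
                heapq.heappush(queue, (candidate, nxt))
    if target not in best:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1], best[target]

print(shortest_path(build_graph(edges), "A", "D"))
```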

Relational Models

A model that organizes data into a tabular format is called a relational data model. Relational data models store data in tables. Each table has a unique name and identity. The table has two aspects: a set of columns representing field names and rows containing information. Rows are known as tuples, and the order in which they occur in a table is immaterial. No two rows can have the same values for all columns in the table. In a GIS, each row is usually linked to a separate spatial feature. Accordingly, each row would consist of several columns, each column containing a specific value for that geographic feature. Data are often stored in several tables (Figure 1.16). Tables can be joined or referenced to each other by common columns (relational fields). The possibility of join operations is what makes relational data models commonly used in GIS. The relational database model is the most widely accepted model for managing non-spatial attribute data. It has emerged as the dominant commercial data management tool in GIS implementation and application. A relational data model has the following properties.

• It is simple to organize information into tables and model it.
• Data can be manipulated in an ad hoc manner by joining tables.
• It reduces data redundancy by a proper storage of data tables.
• There is no need to take into account the internal organization of data.

Figure 1.16 Relational database
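A join on a common column can be sketched with the sqlite3 module from the Python standard library. The table names, column names, and values below are hypothetical; each row of the lakes table stands for one spatial feature, and the second table holds further attributes linked through the lake_id field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lakes (lake_id INTEGER PRIMARY KEY, name TEXT, area_km2 REAL);
CREATE TABLE water_quality (lake_id INTEGER, ph REAL);
INSERT INTO lakes VALUES (1, 'Lake A', 2.5), (2, 'Lake B', 0.8);
INSERT INTO water_quality VALUES (1, 7.2), (2, 6.4);
""")

# Join the two tables on the relational field lake_id and filter on an attribute.
rows = conn.execute("""
SELECT lakes.name, lakes.area_km2, water_quality.ph
FROM lakes
JOIN water_quality ON lakes.lake_id = water_quality.lake_id
WHERE water_quality.ph >= 7.0
ORDER BY lakes.name
""").fetchall()

print(rows)
conn.close()
```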

Object-oriented Models

Object-oriented models store data in objects. An object can be accessed only through the methods specified by its class (a group of objects with similar attributes and methods) (Figure 1.17). An object-oriented model incorporates the following fundamental concepts.

• Any real-world entity can be modelled as an object. Every object has a unique identification.
• Every object possesses a state (values of different variables at an instance of time) and behaviour (set of methods that operate on the state of the object). The state and methods of an object can be accessed by another object only by passing a message.
• A class is a group of all objects that share the same attributes and methods.
• Each class has a super class from which it can inherit attributes, methods, or both.
• The essence of an object-oriented model lies in its properties, which are explained as follows.
    - Encapsulation: Encapsulation is an attribute of object design by virtue of which all the data related to an object are contained by and hidden in the object. The data can be accessed only by members of the object’s class.
    - Polymorphism: Polymorphism is the occurrence of something in many forms. It is a characteristic that allows an object to have more than one form.
    - Inheritance: Inheritance is an attribute that allows a super class to transfer its state and attributes to its children.

In GIS, object-oriented modelling not only allows the data to be held as an object (for example, an element on a map) but also allows these objects to be operated on by their methods and establishes relationships between these objects through message transfer. In this approach, querying is very natural, as features can be bundled together with attributes if the application requires. Object-oriented modelling thus holds many operational benefits with respect to geographic data processing.

Figure 1.17 Object data model
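The three properties can be sketched with a small class hierarchy. The class names (Feature, Road, Lake) and their data are hypothetical: the coordinates are kept inside each object (encapsulation), Road and Lake take their name handling from Feature (inheritance), and the same call to describe() yields a length for one and an area for the other (polymorphism).

```python
import math

class Feature:
    """Super class: every map feature has a name and can describe itself."""
    unit = ""

    def __init__(self, name):
        self._name = name          # state held inside the object

    def measure(self):
        raise NotImplementedError

    def describe(self):
        return f"{self._name}: {self.measure():.1f} {self.unit}"

class Road(Feature):
    unit = "m"

    def __init__(self, name, coords):
        super().__init__(name)
        self._coords = coords

    def measure(self):
        """Length of the road centre line."""
        return sum(math.dist(a, b)
                   for a, b in zip(self._coords, self._coords[1:]))

class Lake(Feature):
    unit = "m2"

    def __init__(self, name, ring):
        super().__init__(name)
        self._ring = ring

    def measure(self):
        """Area of the lake outline (shoelace formula)."""
        n = len(self._ring)
        twice = sum(self._ring[i][0] * self._ring[(i + 1) % n][1]
                    - self._ring[(i + 1) % n][0] * self._ring[i][1]
                    for i in range(n))
        return abs(twice) / 2.0

features = [Road("Main Road", [(0, 0), (3, 4)]),
            Lake("Lake Site", [(0, 0), (10, 0), (10, 10), (0, 10)])]
descriptions = [f.describe() for f in features]   # same message, two behaviours
print(descriptions)
```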


Hierarchical Models

Hierarchical models present data as a family tree in which each record has only one parent. Figure 1.18 presents a hierarchical data model representing an animal family. In this classical data model, data sets and their subsets are organized in layers with a parent–child structure. Hierarchical models are similar to the classic file structure of data in computers. These are the oldest type of data models. They support only a one-to-many relationship among data items. Actual geographical phenomena may not allow the number of parents to be limited; thus this model has very limited scope in GIS applications.
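The single-parent restriction can be sketched with one dictionary that maps each record to its parent. The animal names are hypothetical stand-ins for the kind of tree shown in the figure; because a dictionary key can hold only one value, a record with two parents simply cannot be expressed, which mirrors the limitation noted above.

```python
# Each record has exactly one parent; the root ("Animal") has none.
parent_of = {
    "Mammal": "Animal",
    "Bird": "Animal",
    "Dog": "Mammal",
    "Cat": "Mammal",
    "Eagle": "Bird",
}

def path_to_root(record):
    """Follow parent links from a record up to the root of the tree."""
    path = [record]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path

def children_of(record):
    """One-to-many relationship: all records whose parent is the given record."""
    return sorted(child for child, parent in parent_of.items()
                  if parent == record)

print(path_to_root("Dog"), children_of("Mammal"))
```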

Semantic Data Models

Semantic data models (SDMs) represent data in logical structures. They focus on providing the meaning of data along with attributes and interrelationships with other data. In semantic data models, an entity represents an aspect or a phenomenon of the real world. These models support dynamic schema evolution to capture new or evolving types of semantic information. Semantic models are widely used in natural language processing to define the semantic context of entities (words) used at any instance. They follow an arc–node structure where a node represents a basic entity and an arc represents the relationship between entities (Figure 1.19). SDMs incorporate two types of relationship between entities: the “is-a” (inheritance) relationship and the “has-a” (membership) relationship.
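The arc–node structure can be sketched as a list of (entity, relationship, entity) triples. The entities below are hypothetical; the sketch follows “is-a” arcs upwards and collects the “has-a” arcs found along the way, so that a river also picks up what is recorded for a waterbody.

```python
# Each triple is one arc: (source node, relationship, target node).
triples = [
    ("river", "is-a", "waterbody"),
    ("waterbody", "is-a", "geographic feature"),
    ("waterbody", "has-a", "water quality"),
    ("river", "has-a", "flow direction"),
]

def related(entity, relation):
    """Targets of all arcs of the given type leaving an entity."""
    return [obj for subj, rel, obj in triples
            if subj == entity and rel == relation]

def all_properties(entity):
    """Own has-a targets plus those reached by following is-a arcs."""
    props = list(related(entity, "has-a"))
    for broader in related(entity, "is-a"):
        props.extend(all_properties(broader))
    return props

print(all_properties("river"))
```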

Figure 1.18 A hierarchical model

Figure 1.19 Semantic database

Conceptual Models

Conceptual models are a type of abstraction that uses logical concepts and hides the details of implementation and data storage. Conceptual models are the most abstract form of data. Detailed information, such as data types, is omitted from conceptual data models. There are two standard ways in which spatial information is modelled conceptually: object-based and field-based models.

Object-based models

Object-based models represent information as discrete geo-referenced entities. Each entity has a coordinate pair (x, y) associated with it, defining its location in the real world. Because this conceptual model is focused on objects, its implementation will yield data models and structures that are focused on objects.

Field-based models

Field-based models represent information as collections of spatial relationships, where each relationship is formalized as a mathematical function from a spatial framework to an attribute domain. The spatial framework indicates that the model will divide an area into a finite tessellation of spatial units.

TOPOLOGY AND GIS

Topology is the framework to model the relationship among vector features (point, line, and polygon) and determines the way these features share the geometry with their neighbouring vector objects. It is a branch of mathematics that studies continuity and connectivity (Figure 1.20). Topology is the study of qualitative properties of objects that are invariant under transformation. In GIS, topologies are important to preserve spatial properties when data pass through some transformations. Some basic topological relationships that are not affected by the coordinate system are as follows.

• Connectivity: Connectivity represents the arc–node architecture of objects. The arc represents the spatial relation between the starting node and the end node in context of connectivity.
• Contiguity: Contiguity is the identification of adjacent polygons by recording the left and right polygon of each arc. A polygon is a closed area generated by a chain of arcs having the same start and end node. Polygons sharing common arcs are regarded as adjacent or contiguous polygons. Thus the polygons on the left and right sides of each arc can be defined. This left and right polygon information is stored explicitly within the attribute information of the topological data model. The “universe polygon” is an essential component of polygon topology that represents the external area located outside the study area.
• Area definition: A closed area is defined by a boundary. The concept of area definition is that an arc that surrounds an area defines a polygon. Each arc is stored only once, and the boundaries of adjacent polygons do not overlap; hence data redundancy is eliminated.
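Contiguity can be sketched from the left and right polygon recorded for each arc. The arc table below is hypothetical: two polygons, A and B, share arc a1, and everything outside the study area is the universe polygon. No coordinates are used, so adjacency comes from the topology alone.

```python
# Each arc records its start node, end node, and the polygons on either side.
arcs = {
    "a1": {"from": "n1", "to": "n2", "left": "A", "right": "B"},
    "a2": {"from": "n2", "to": "n1", "left": "A", "right": "universe"},
    "a3": {"from": "n1", "to": "n2", "left": "B", "right": "universe"},
}

def neighbours_of(polygon, arc_table):
    """Polygons found on the other side of every arc that bounds the polygon."""
    result = set()
    for arc in arc_table.values():
        if arc["left"] == polygon:
            result.add(arc["right"])
        elif arc["right"] == polygon:
            result.add(arc["left"])
    return result

def connected_arcs(node, arc_table):
    """Connectivity: arcs that start or end at the given node."""
    return sorted(arc_id for arc_id, arc in arc_table.items()
                  if node in (arc["from"], arc["to"]))

print(neighbours_of("A", arcs), connected_arcs("n1", arcs))
```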

Figure 1.20 Graphical representation of topology


Topologies are effectively used to model spatial relationships. Since input data do not contain topological information, GIS software has to build topologies. Topologies are used to detect and correct digitizing errors. They are essential for network analysis. Topologies are also important because many GIS applications do not require coordinates, only topologies.

EXERCISES

Question 1
What is GIS? Define the components of GIS.
[Hint: A geographic information system (GIS) is a set of computerized tools for collecting, storing, retrieving, transforming, and displaying spatial data. The potential of GIS is explained by its unique ability to take up data from widely divergent sources, analyse trends over time, and evaluate spatial relationships. GIS is made up of five key components: hardware, software, data, people, and method.]

Question 2
What is information system? What is the difference between information system and expert system?
[Hint: An expert system is a program that uses available information and inferences to suggest solutions to problems in a particular discipline. It has an inference mechanism, that is, the ability to infer and to derive new information from existing information.]

Question 3
Differentiate between spatial and non-spatial data. Explain how spatial data play a vital role in resource management and decision-making in various fields/industries.
[Hint: Spatial data are information about the locations and shapes of geographic features and the relationships between them, usually stored as coordinates and topology (ESRI definition). They are geo-referenced; that is, they have a location component. The location of any object may be relative (for example, the height of a tree with respect to another tree) or absolute (for example, the uniquely defined pin code for an area). Spatial data have four important aspects for processing geographical information with the help of an information system such as GIS: location, direction, distance, and space. Data about attributes of geographical features that are not geo-referenced are called non-spatial data. Non-spatial data are stored as tables in relational databases. Tabular and attribute data are non-spatial but can be linked to the location.]



2

Raster and Vector Data

INTRODUCTION

Data are the essence of a geographic information system (GIS). Once geospatial data are captured, the question that arises is how to represent them digitally. Digital representation of geographical information (see Chapter 1) saves time and cost, and it allows easy access to geographical information for further geospatial operations and analyses. Geographic representation is the technique of representing some part of the earth’s surface or near surface. Geographical data that need to be represented are built up either of atomic elements or of details of geographical phenomena. To represent geographical information, the discrete object view is used, which treats the world as empty space except where it is occupied by objects with well-defined boundaries. Objects are countable and have dimensionality (zero dimensions for points, one for lines, two for areas, and so on). Three basic structures used in the discrete object view to represent geographical objects are the point, the line, and the polygon.

A GIS application uses different data models to represent and manipulate digital spatial data according to their potential use and source. Chapter 1 briefly introduced the two types of spatial data: raster data (grid or cell-structured data) and vector data (data represented as points, lines, and polygons). This chapter studies these two types of data in detail, compares them, and discusses their advantages, disadvantages, and applications.

VECTOR DATA

Vector data represent spatial information as points (zero-dimensional data used to represent discrete and abstract spatial information, for example, buildings and cities), lines (one-dimensional data used for representing linear features such as streets and rivers), and polygons (two-dimensional data used for representing areas such as lakes or cities) (Figure 2.1).

Points are stored as x–y coordinates together with their attribute data. Each point on the map is represented by its longitude and latitude values and stored as a record in a shapefile. The point is the fundamental primitive of vector data. Lines are a series of points with a start node and an end node. They may be smooth curves or a series of connected straight lines. Each line is stored in the database as an ordered sequence of points from its first node to its last node, along with the attributes associated with the line. Polygons are closed sequences of arcs. They are two-dimensional; they have area and boundary (Figure 2.2). Boundaries separate interior areas from exterior areas. In a polygon, the first node is the same as the end node. Polygons are stored as a sequence of nodes along with their attributes. Using polygons, geometric attributes such as area and perimeter can be calculated easily.

Figure 2.1 Vector data

Figure 2.2 Polygon

Whether data are represented as points, lines, or polygons depends on the map scale and on the functional requirements of the analysis for which the GIS study is intended. Features such as roads, rivers, pipelines, and other linear features are naturally represented as lines. Points are simple to represent, store, and analyse. Polygons need more points as input, but geometric attributes such as area can be easily calculated. If such attributes are not needed at the analysis phase, it is convenient to save data as points. Vector data representation using points, lines, areas, and volumes is not always straightforward because it may depend on map scale and, occasionally, on criteria established by government mapping agencies. A city on a 1:1,000,000-scale map is represented as a point, but the same city is shown as an area on a 1:24,000-scale map.
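The claim that area and perimeter follow easily from a polygon's node sequence can be illustrated with a short sketch. It assumes planar (projected) x–y coordinates and a simple, non-self-intersecting ring; the function name `polygon_area_perimeter` is hypothetical and does not come from any GIS package. The area uses the standard shoelace formula.

```python
import math


def polygon_area_perimeter(ring):
    """Return (area, perimeter) for a simple polygon.

    `ring` is a list of (x, y) nodes in planar units. The ring is closed
    automatically if the first node is not repeated at the end.
    """
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # first node equals end node in a polygon

    twice_area = 0.0
    perimeter = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        twice_area += x1 * y2 - x2 * y1             # shoelace term
        perimeter += math.hypot(x2 - x1, y2 - y1)   # edge length
    return abs(twice_area) / 2.0, perimeter


# A 4 x 3 rectangle stored as a closed sequence of nodes.
rectangle = [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]
area, perimeter = polygon_area_perimeter(rectangle)
print(area, perimeter)  # 12.0 14.0
```

For nodes stored as longitude and latitude, the coordinates would first need to be projected to a planar system; that step is outside this sketch.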

Advantages of Vector Data
• Vector data are comparatively compact; hence, they require less storage space.
• Topology can be stored explicitly; hence, vector data are good for network analysis.
• Features can be accurately located.
• Easy retrieval, updating, and generalization of graphics and attributes are possible.
• Data can be represented at their original resolution.
• Vector data delineate boundaries precisely and clearly, which makes them well suited to administrative maps.
• Vector data allow for efficient encoding of topology; as a result, operations that require topological information can be performed more efficiently.

Disadvantages of Vector Data
• Vector data have a complex structure.
• The location of each point needs to be stored explicitly.
• Vector data are not directly compatible with remote sensing data.
• Vector overlay operations are difficult to implement.
• Continuous data, such as elevation data, are not effectively represented in vector form. Usually, substantial data interpolation is required for these data layers.
• Data capture and processing are time-consuming.
• Area analysis within polygons is difficult.
• Vector data are unable to model uncertainty.

Every object in a vector data set is referenced to a coordinate system. Vector data are mainly used for technical drawings, computer-aided design (CAD), network analysis, road and river networks, and cartography.

RASTER DATA

Raster data place spatial information into equally spaced raster cells, where each cell represents a point feature and groups of cells are used to represent line and area features of a geographical area. Raster data have a grid structure in which the geographical area is divided into a series of units, each of which is a cell. Usually, raster cells are square, but other shapes such as triangles, hexagons, or rectangles can also be used to cover the area without leaving any gaps. The matrix (row–column structure) of raster cells is called a grid. The size of the cell is selected based on the required resolution and data accuracy. By knowing the origin of a grid and the cell size, the location of each pixel can be easily calculated.

A raster cell or pixel holds a value within a specified range or colour depth of the raster image. There are two values associated with a cell: one is its location in x–y coordinates and the other is its data value, stored as colour depth. Some basic properties of a raster are spatial resolution, pixel dimensions (number of pixels in the image width × number of pixels in the image height), spectral resolution, temporal resolution, colour depth (available colour ranges), and geo-referencing information (specified by the coordinates of one upper corner cell and the cell size). A two-dimensional array structure is easy to encode in computer programs; hence, a number of analytic operations can be performed on the data, which makes raster data popular in GIS packages. Raster data allow sophisticated mathematical modelling, which provides a basis for quantitative analysis techniques.
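The statement that a cell's location follows from the grid origin and the cell size can be sketched as below. The sketch assumes square cells, an origin at the upper-left corner of the grid, rows counted downward, and planar map units; the function names and the sample origin are illustrative only.

```python
def cell_centre(origin_x, origin_y, cell_size, row, col):
    """Map coordinates of the centre of cell (row, col).

    Assumes the origin is the upper-left corner of the grid, so x grows
    with the column index and y shrinks as the row index grows.
    """
    x = origin_x + (col + 0.5) * cell_size
    y = origin_y - (row + 0.5) * cell_size
    return x, y


def cell_index(origin_x, origin_y, cell_size, x, y):
    """Inverse lookup: the (row, col) of the cell containing point (x, y)."""
    col = int((x - origin_x) // cell_size)
    row = int((origin_y - y) // cell_size)
    return row, col


# Hypothetical grid: upper-left corner at (500000, 4200000), 30-unit cells.
print(cell_centre(500000, 4200000, 30, row=2, col=3))   # (500105.0, 4199925.0)
print(cell_index(500000, 4200000, 30, 500105, 4199925))  # (2, 3)
```

Only the origin and the cell size are stored; every other cell location is derived from its row and column.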


Advantages of Raster Data
• Raster data have a simple structure.
• The geographical location of each cell in the matrix is retrieved from its position in the grid. Only one coordinate, that is, the origin, is stored.
• Several scanning techniques are available to acquire geographical data in raster format, which can provide a large amount of data as input for GIS processing.
• Raster data are suitable for mathematical modelling and quantitative analysis.
• Raster data can represent continuous surfaces, and surface analysis can be performed on them.
• The data set can be compressed with either lossless or lossy compression techniques.
• Remote sensing and image-processing techniques produce data for raster analysis.
• Computers naturally support raster data as arrays and offer many array-handling operations.

Disadvantages of Raster Data
• The volume of data is large.
• Selection of a proper resolution is a challenging task in GIS. If an extremely fine cell size is selected, a large volume of data is generated, and if a large cell size is selected, data may be overly generalized.
• It is difficult to represent topological relationships.
• Linear representation and analysis are difficult with raster data.
• Maps produced from grid structures are very crude and not efficient for most cartographic operations.

RASTER ENCODING METHODS

Raster systems are a result of developments in computer-based imagery systems over the last decades. Huge data storage is a major problem with raster data. Each cell contains only one value, which leads to the decomposition of a data layer into a series of raster maps. For example, a river map may be broken down into different maps, each representing one attribute, such as a depth map, a pollution map, a length map, and so on, where each map is called an attribute map. In each map, the cells containing an attribute are represented by 1; the absence of an attribute is represented by 0. This allows quantitative and statistical analysis of raster data; but at the same time, it also creates a huge amount of data to be stored.

Storage problems with raster data drew considerable attention towards raster data-encoding techniques. In computers, data encoding is the process of putting characters (letters, numbers, symbols, and punctuation) into a particular format for efficient transmission or storage. Data decoding is the reverse process, that is, the conversion of encoded data back into their original format. Raster data in raw form, that is, without compression, are extremely inefficient to store. Therefore, different data-encoding methods are used to encode raster data. These methods reduce data storage considerably. In the following sections, these methods will be discussed in detail.

Box 2.1 Vector versus raster data model


Run Length Encoding

Run length encoding is a data compression technique that encodes a sequence of data units as a single data unit, based on the principle of spatial autocorrelation. This principle means that all things are related, but neighbouring things are more related and, hence, are more likely to hold similar values. It is the simplest raster encoding technique: it stores a single value for a series of consecutive cells with the same attribute value, along with the length of the run (the run length is the number of consecutive cells of the same type). Thus a sequence of pixels is encoded as a pair of numbers (run length, attribute value). As mentioned before, in a simple raster file, 1 represents the presence of an attribute and 0 represents its absence. The encoding is built by scanning the raster file row by row.

Figure 2.3 illustrates an example of the run length encoding technique. In this example, the first row of the matrix is encoded as 2,0 2,1 3,0. The first number (2) represents two consecutive cells with the same attribute. The second number (0) represents the absence of any attribute in those two cells. The third number (2) represents the number of occupied cells, and the fourth number (1) represents the presence of an attribute. The fifth number (3) and sixth number (0) again represent a sequence of three cells with the absence of any attribute. If it is assumed that a numeric value is stored in 1 byte, the row takes 6 bytes as run lengths, against 7 bytes for the seven raw cell values; the saving grows as runs get longer. Run length encoding is not efficient for data such as digital elevation models, where neighbouring pixels seldom hold exactly the same value and runs are therefore short.
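The row-by-row procedure described above can be sketched in a few lines. This is a minimal illustration, not the storage format of any particular GIS package; the function names are hypothetical, and each run is kept as a (run length, attribute value) pair as in the text.

```python
def rle_encode_row(row):
    """Encode one raster row as a list of (run_length, value) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][1] == value:
            # Same value as the current run: extend it by one cell.
            runs[-1] = (runs[-1][0] + 1, value)
        else:
            # Value changed: start a new run of length 1.
            runs.append((1, value))
    return runs


def rle_decode_row(runs):
    """Rebuild the original row from its (run_length, value) pairs."""
    row = []
    for length, value in runs:
        row.extend([value] * length)
    return row


# The seven-cell row encoded in the text as 2,0 2,1 3,0.
row = [0, 0, 1, 1, 0, 0, 0]
encoded = rle_encode_row(row)
print(encoded)  # [(2, 0), (2, 1), (3, 0)]
assert rle_decode_row(encoded) == row
```

A whole raster would be encoded by applying `rle_encode_row` to each row in turn. A row in which every cell differs from its neighbour produces one pair per cell, which is why the method suits categorical layers better than elevation data.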

Figure 2.3 Run length encoding technique

Block Encoding

Block encoding is a two-dimensional generalization of run length encoding in which the raster grid is divided into, and stored as, a sequence of square blocks of cells with the same attribute. In this way, spatial autocorrelation between neighbouring cells is exploited in both directions. For each square, the position, size, and content are stored. Figure 2.4 shows an example of the block encoding technique.

Quad Tree Encoding

A raster encoding method in which the grid is divided into quads (quadrants, or quarters) until each block is homogeneous or no more subdivisions can take place is called quad tree encoding. It is one of the most commonly used raster encoding techniques. Each square region is subdivided into its four quadrants, and the subdivision continues until all the cells in a block have the same value or no more divisions are possible. Basically, the quad tree method uses a 2^n × 2^n array divided into four quadrants. The array is subsequently subdivided into smaller and smaller areas, down to the one-cell level if necessary. This method is harder to understand than run length or block encoding but is useful for complex data sets. In a complex raster data set, a quad tree uses small blocks where the data vary in detail and large blocks for uniform areas. The smallest quad cell size is determined by the pixel resolution (Figure 2.5).
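The recursive subdivision can be sketched as follows. The sketch assumes a square grid whose side is a power of two (a 2^n × 2^n array), stored as a list of rows; a homogeneous block becomes a leaf holding its value, and a mixed block becomes a list of four children in the order north-west, north-east, south-west, south-east. The function names and the child order are choices made for this illustration.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Return a leaf value for a homogeneous block, else [NW, NE, SW, SE]."""
    if size is None:
        size = len(grid)  # assumes a square 2^n x 2^n grid

    first = grid[row][col]
    homogeneous = all(
        grid[r][c] == first
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    if homogeneous or size == 1:
        return first  # leaf: one value stands for the whole block

    half = size // 2
    return [
        build_quadtree(grid, row, col, half),                # NW quadrant
        build_quadtree(grid, row, col + half, half),         # NE quadrant
        build_quadtree(grid, row + half, col, half),         # SW quadrant
        build_quadtree(grid, row + half, col + half, half),  # SE quadrant
    ]


def count_leaves(node):
    """Number of stored blocks in the tree."""
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    return 1


grid = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
tree = build_quadtree(grid)
print(tree)                # [0, 1, 0, [1, 0, 0, 0]]
print(count_leaves(tree))  # 7
```

Here 16 cells are stored as 7 blocks: three uniform quadrants are kept whole, and only the mixed south-east quadrant is split down to single cells.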

Figure 2.4 Block encoding technique

Figure 2.5 Quad tree technique

Chain Encoding Technique

A raster encoding technique that reduces data storage space by defining the boundary of an area as a series of cardinal directions and cell counts is called chain encoding. A chain is a sequence of coordinates that define a complex line or boundary. Chain codes compress the area by starting at an origin and counting the number of cells in each direction. For example, S5 means moving south by 5 cells, and E4 means moving east by 4 cells. However, in actual digital encoding, numbers are assigned to directions, for example, E = 0, N = 1, W = 2, and S = 3. Figure 2.6 illustrates an example of the chain encoding technique.
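A chain code can be expanded back into the boundary cells it describes. The sketch below uses the direction numbers given in the text (E = 0, N = 1, W = 2, S = 3) and assumes cells are addressed as (row, column) with rows increasing southward; the function name and the sample chain are illustrative and are not taken from Figure 2.6.

```python
# (row step, column step) for each direction code: E=0, N=1, W=2, S=3.
STEPS = {0: (0, 1), 1: (-1, 0), 2: (0, -1), 3: (1, 0)}


def trace_chain(origin, chain):
    """Return every cell visited, starting at `origin`.

    `chain` is a list of (direction_code, cell_count) pairs.
    """
    row, col = origin
    path = [(row, col)]
    for direction, count in chain:
        d_row, d_col = STEPS[direction]
        for _ in range(count):
            row += d_row
            col += d_col
            path.append((row, col))
    return path


# E4, S5, W4, N5: the boundary of a rectangle, traced from its corner.
chain = [(0, 4), (3, 5), (2, 4), (1, 5)]
path = trace_chain((0, 0), chain)
print(path[-1])   # (0, 0)  -> the boundary closes on the origin
print(len(path))  # 19      -> origin plus 18 steps
```

Four pairs of numbers plus the origin are enough to store a boundary that passes through 18 cells.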

Figure 2.6 Chain encoding technique

Maps

A map is a two-dimensional representation of the whole or a part of the three-dimensional surface of the earth on a flat surface with the help of signs and symbols. Traditionally, maps are used to represent and communicate spatial relationships between the features that the map is intended to show. Although the globe is a traditional model of the earth’s surface, it does not suit most spatial analysis needs because of its spherical nature. Maps are models of reality representing spatial relationships among real and abstract geographical phenomena. Basically, a map is a mathematical concept that transforms information from one form to another. It is a technique to reduce the world to points, lines, and areas using different visual resources, such as size, shape, texture, pattern, colour, orientation, and value.

Historically, maps were drawn on individual sheets of paper or gathered into an atlas. Now, with the advancement of digital systems, maps have become a widely used and invaluable source of information. A map is valued for presenting information effectively and efficiently. For this, every map follows some formal and informal conventions. For example, many map users assume that north is at the top of the map and east is to the right.

Map scale

Scale is the ratio between distances on a map and the corresponding distances in the real world. Earth is the third planet from the sun and the fifth largest planet of our solar system. To represent the earth’s features, objects, patterns, and residents on a map, they need to be reduced by a constant amount to the best possible size. The “best possible size” depends on the purpose and need of the map. A map scale represents the amount of reduction. For a map of scale 1:25,000, 1 cm on the map represents 0.25 km on the earth’s surface. A map scale can be represented as text or graphics or a combination of both. Three different ways to represent the scale are explained as follows.
• Verbal scale: A verbal scale uses words or a verbal statement to represent the ratio between a map unit and the real-world distance. It states that one map unit equals x land units. Example: 1 inch = 230,000 inches. For convenience, a mixture of units is often used in a verbal scale.
• Scale bar: A scale bar is a graphical representation, usually a ruler drawn at the bottom of a topographic map. One side of a scale bar represents the distance of real objects, while the other side represents the distance on the map. The left end of the bar is subdivided into smaller parts to estimate distances more precisely (Figure 2.7). An important advantage of a scale bar is that it remains accurate when the map is resized: if the map is enlarged or reduced, the bar is enlarged or reduced with it.
• Representative fraction: A representative fraction (RF) represents scale as a mathematical relationship between map distance and real-world distance. It is expressed as a ratio or fraction of map and land distances. An RF can be shown as 1:250,000, which means 1 unit of distance on a map is equal to 250,000 of the same unit of distance on land. Example: 1:250,000. The equivalent land distance of a given map distance can be calculated as follows. 25 cm on a map corresponds to 25 × 250,000 = 6,250,000 cm = 62,500 m or 62.5 km on the ground.

Figure 2.7 A scale bar
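The representative-fraction arithmetic can be checked with a small helper. This is only a sketch: the function name is hypothetical, map distances are assumed to be in centimetres, and the result is returned in kilometres using 1 km = 100,000 cm.

```python
CM_PER_KM = 100_000  # 1 km = 1,000 m = 100,000 cm


def ground_distance_km(map_distance_cm, scale_denominator):
    """Ground distance in km for a map distance in cm at scale 1:denominator."""
    ground_cm = map_distance_cm * scale_denominator
    return ground_cm / CM_PER_KM


print(ground_distance_km(25, 250_000))  # 62.5  (25 cm at 1:250,000)
print(ground_distance_km(1, 25_000))    # 0.25  (1 cm at 1:25,000)
```

Because an RF uses the same unit on both sides, the only unit conversion needed is the final one from centimetres to kilometres.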

Scale conversions
• Verbal scale to representative fraction: A verbal scale is a mixed scale, that is, a verbal expression of a scale contains a mix of different units of distance, while an RF uses a single unit for scale representation. Conversion of a verbal scale to an RF scale is a two-step process.
Step 1: Choose one unit of the verbal scale for conversion. A verbal scale has two different units, and it has to be decided which one should be converted into the other. It is easier to convert the larger unit into the smaller one.
Step 2: Eliminate unwanted units by multiplication.
Example: One inch = two miles. Convert it into an RF scale.
Solution:
1 mile = 5,280 ft, so 2 miles = 2 miles × (5,280 ft/1 mile) = 10,560 ft
1 ft = 12 inches, so 10,560 ft = 10,560 ft × (12 inches/1 ft) = 126,720 inches
1 inch = 126,720 inches, that is, 1:126,720
• Representative fraction to verbal scale: RF to verbal scale conversion is an easier process. A verbal scale may have the same or different units on the two sides of the expression. Both 1 cm = 10 miles and 1 cm = 10 cm are valid verbal scale representations. An RF uses only one unit for representation.
Example: 1:65,000. Convert it into a verbal scale.
Solution: As an RF uses only one unit for representation, let it be in cm. Hence, the verbal scale is 1 cm = 65,000 cm = 0.65 km on the ground.
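The two-step conversion can be sketched with a small unit table. The function names and the table are illustrative; the conversion factors (12 inches per foot, 5,280 feet per mile, 100,000 cm per km) are standard.

```python
# Length of each unit in inches; 1 mile = 5,280 ft = 63,360 inches.
INCHES_PER_UNIT = {"inch": 1, "foot": 12, "mile": 5_280 * 12}


def verbal_to_rf(map_value, map_unit, ground_value, ground_unit):
    """Return the RF denominator for 'map_value map_unit = ground_value ground_unit'."""
    map_inches = map_value * INCHES_PER_UNIT[map_unit]
    ground_inches = ground_value * INCHES_PER_UNIT[ground_unit]
    return ground_inches / map_inches  # both sides now share one unit


def rf_to_km_per_cm(scale_denominator):
    """Ground kilometres represented by 1 cm on a map at 1:denominator."""
    return scale_denominator / 100_000  # 100,000 cm in 1 km


print(verbal_to_rf(1, "inch", 2, "mile"))  # 126720.0, i.e. RF 1:126,720
print(rf_to_km_per_cm(65_000))             # 0.65, i.e. 1 cm = 0.65 km
```

Expressing both sides in the smaller unit first, as Step 1 recommends, keeps the arithmetic in whole numbers until the final division.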

Large Scale and Small Scale

A large scale represents a small area in great detail. A map that depicts small territories in great detail is called a large-scale map. A small scale represents large features. A map depicting a large area, such as a country, is called a small-scale map. Small-scale maps show more territory but are less detailed. To understand the difference between large scale and small scale, consider the following two representations with the ratio method.
1. The ratio 1:25,000 means that the size of objects on the map is 1/25,000 of their size on the ground. For example, 1 cm on a map is equal to 25,000 cm or 0.25 km on the earth’s surface.
2. The ratio 1:250,000 means that the size of objects on the map is 1/250,000 of their size on the ground.

It is clear from these representations that 1/25,000 is a larger fraction than 1/250,000 (just as half a mango is larger than one-eighth of a mango). Hence, 1:25,000 is the larger scale of the two. Large-scale maps show a small territory in greater detail; they are guide maps or topographic maps that show details of cities, towns, and villages. On the other hand, small-scale maps show a larger area in less detail (Figure 2.8). They are wall maps or atlas maps that show important features such as mountains, plateaus, continents, and countries.

Figure 2.8 Large- and small-scale maps

Types of Maps

A map helps to represent real-world scenarios. Based on the purpose, there are different types of maps: thematic maps, topographic maps, and general-purpose maps.

Thematic maps

Thematic maps are used to depict information about a particular topic or theme. The information depicted on a thematic map may be substantial, statistical, and precise. Sometimes, the map user requires domain-specific knowledge to read the map. Population maps, forest coverage maps, and maps showing the watershed of a river are different types of thematic maps (Figure 2.9). The data of a thematic map have to be accurate. There are various ways to use the data, and each must be considered in relation to the map’s theme. The sources of a thematic map’s data are also important and should be carefully considered. Cartographers must find accurate, recent, and reliable sources of information on a wide range of subjects, from environmental features to demographic data, to make the best possible maps.

Topographic maps

Topographic maps are used to depict extensive graphical details of objects present on the earth’s surface, providing preliminary information about terrain details (Figure 2.10). They use a wide variety of symbols to represent human and physical features. Among the most striking features of topographic maps are contour lines, which represent elevation by connecting points of equal elevation. These imaginary lines give a clear picture of the terrain.

Figure 2.9 A thematic map

Figure 2.10 Geologic map of the Mexico Valley Basin

General-purpose maps

General-purpose maps show a variety of information about a place. These maps are used to represent almost all physical features at a location and summarize the properties of the landscape, for example, a city map or a street map. They bring together all the characteristics that indicate the presence of objects at a particular location. When cartography was in its infancy, almost all maps were general-purpose maps. These maps are also called reference maps. They represent both natural and human-made features such as coastlines, lakes, rivers, boundaries, settlements, roads, rail lines, and others (Figure 2.11). General-purpose maps focus on location. Wall maps, most maps found in atlases, and road maps belong to this category.

Figure 2.11 A general-purpose map

Maps are two-dimensional representations of the earth’s surface; therefore, location is another important aspect of maps. Primarily, location is the place of an object, which may be physical or abstract, or of a phenomenon on the earth’s surface. A huge amount of information is associated with place. Name, geographical attributes, statistical variables such as population or density, physical attributes, and many other information sets can be linked to a place. Usually, the phenomena that affect places act comparatively slowly; hence, places are often treated as static, although a place is in fact a dynamic entity. It is difficult to represent changes such as soil erosion or changes in the population density or vegetation pattern of a place over time; however, it is not wise to omit such information from a representation. Location on any map also gives an idea of distance and direction. Hence, a formal definition of location is needed, one that can be mathematically defined and manipulated.

A coordinate system is a reference system that uses coordinates (sets of values that give the exact position of an object relative to a defined origin) to define the unique position of an object on the earth’s surface. A coordinate system is a mathematical system used to define and analyse spatial objects geometrically. It is easier to determine numerically the relationships and properties of a set of points with known coordinates. The most familiar spaces are planes and three-dimensional spaces, where a point P is determined by (x, y) and (x, y, z) coordinates, respectively. Every coordinate system is defined by its origin (datum, meridian), units (for example, metre, radian), and coordinate axes (for example, x, y, z). The common types of coordinate systems used in GIS are as follows.
1. Two-dimensional coordinate system: In a two-dimensional coordinate system, the location of a point is given by coordinates that represent the point’s distance from two perpendicular lines intersecting at the origin (Figure 2.12). There are four subcategories of a two-dimensional coordinate system.
• Plane Cartesian system: x-axis and y-axis
• Polar coordinates: r, θ (theta)
• Raster grid: easting, northing
• Map projection

Figure 2.12 A coordinate system
Note: O is the origin of the reference system. The point at (12, 5) is 12 units along the x-axis and 5 units along the y-axis.

A projected coordinate system is based on map projections and designed for a flat surface. A mathematical transformation is carried out to convert spherical coordinates on a globe to planar coordinates on a flat surface.
2. Three-dimensional coordinate system: In a three-dimensional coordinate system, the location of a point is given by three real values (coordinates) that represent the point’s distances along three mutually perpendicular lines, called axes, intersecting at the origin. The common three-dimensional coordinate systems used in GIS are as follows.
• Three-dimensional Cartesian coordinates
• Geographical coordinate system (latitude and longitude)

A geographical coordinate system (GCS) uses a three-dimensional spherical surface to define locations on the earth’s surface. A GCS uses a datum, an angular unit of measure, and a prime meridian to define locations. A point on the earth’s surface is referenced by its lat–long values. Latitude and longitude are angles measured, usually in degrees or grads, from the earth’s centre to the point on the earth’s surface. Such systems are called global or spherical coordinate systems. A coordinate system, either GCS or projected, provides a framework to define locations on the earth’s surface. All spatial information must be referenced to a coordinate system and have associated coordinate values.
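The relationship between the two three-dimensional systems listed above can be sketched by converting latitude and longitude into Cartesian (x, y, z) coordinates. The sketch makes simplifying assumptions that are not in the text: the earth is treated as a perfect sphere with a mean radius of 6,371 km, the origin is the earth's centre, the x-axis passes through the intersection of the equator and the prime meridian, and the z-axis passes through the North Pole. The function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical earth


def geographic_to_cartesian(lat_deg, lon_deg, radius=EARTH_RADIUS_KM):
    """Convert latitude/longitude in degrees to earth-centred (x, y, z) in km."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = radius * math.cos(lat) * math.cos(lon)
    y = radius * math.cos(lat) * math.sin(lon)
    z = radius * math.sin(lat)
    return x, y, z


# Equator at the prime meridian lies on the x-axis; the North Pole on the z-axis.
print(geographic_to_cartesian(0.0, 0.0))
print(geographic_to_cartesian(90.0, 0.0))
```

A spheroid-based datum would replace the single radius with the spheroid's semi-major axis and flattening; that refinement is left out of this sketch.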

SHAPE OF THE EARTH

The earth is not a perfect sphere; it is a spheroid. A spheroid is a sphere that is slightly flattened at the poles and bulges around the equator. ESRI defines a spheroid as “a three-dimensional shape obtained by rotating an ellipse about its minor axis with dimensions that either approximates the earth as a whole or with a part that approximates the corresponding portion of geoids”. A number of cartographic spheroids have been designed to optimally analyse and study the properties of the earth’s surface. A spheroid designed for one portion of the earth’s surface does not necessarily fit another portion. With the advancement of satellite technologies, a number of deviations in the spheroids representing the earth’s surface have been revealed (Figure 2.13).

Figure 2.13 Representation of earth as sphere or a spheroid

A GCS uses the spheroid to study the earth’s surface. A spheroid does not model the exact folds and other variations on the earth’s surface, and more than one spheroid has to be used to represent it all. A GCS needs a method or framework to support the design and selection of a particular spheroid for modelling the earth’s surface. This is where the datum comes into play. A datum specifies the spheroid that should be used for modelling the earth’s surface and the exact location (a point) at which the spheroid needs to be aligned with the earth’s surface. A datum defines the origin for the GCS. An origin is the point on the surface at which the spheroid perfectly matches the earth’s surface and the lat–long coordinates of the spheroid are true and accurate. All other points in the coordinate system are referenced using the origin.

Usually, there are two basic types of datum: geocentric datum and local datum (Figure 2.14). A geocentric datum uses the earth’s centre of mass as the origin, while a local datum aligns a spheroid to closely match the earth’s surface in a particular region. For a local datum, the point at which the spheroid meets the earth’s surface is known as the origin, and all other points are calculated accordingly.

Figure 2.14 Local datum and geocentric datum
Source Robinson, Morrison, Muehrcke, et al. (1995)

With a defined origin, a datum assigns lat–long values to a feature location on the earth’s surface defined by the GCS. A GCS (Figure 2.15) uses a network of intersecting lines, called graticules (Figure 2.16), to represent locations and features on the curved surface of the earth. The intersecting lines are latitudes and longitudes. The horizontal lines or lines of latitude are called parallels, while vertical lines or lines of longitude are called meridians. The measures of lines of latitude start at the equator with 0° and range from 0° to 90° towards the North Pole and 0° to –90° towards the South Pole. Lines of longitude start at the prime meridian (with zero value) and range from 0° to 180° towards the east and 0° to –180° towards the west. Latitude and longitude values are the angles measured from the centre of the earth to the point on the earth’s surface.

Figure 2.15 Geographical coordinate system

Figure 2.16 Graticules

The globe is the best representation of the earth’s surface, with accurate location, shape, and proportions, but its size makes it impractical for many purposes. If a globe is simply cut into pieces to make flat images, the shape and other information get distorted; hence, a systematic way of producing a map is required. So the question is how to convert a three-dimensional spheroid with location information given in latitude and longitude values to a two-dimensional flat map. The answer is map projection. Map projection combines mathematical transformation and geometry to convert a three-dimensional spheroid into a two-dimensional flat map. In other words, map projection is used to convert a GCS (lat–long system) to a projected (planar) coordinate system. It is not easy to flatten a spheroid onto a flat surface such as a map; it always creates some distortion of the actual spatial information regarding location. There are different types of projections based on the spatial information (shape, size, or direction) they preserve. Two basic criteria to classify different types of projection are as follows.
1. Based on the spatial attributes (shape, size, or direction) preserved by the projection: basic type.
2. Based on the technique used to project the spheroid onto the flat surface: basic technique.

The most common approach to projecting a spheroid onto a flat surface is to use a developable surface, which may be a cylinder, cone, or plane. There are different projection methods, and each aims to minimize distortion and to preserve the spatial characteristics that matter for the map. The selection of a projection type depends on the purpose of the map.

Basic Types of Projections
• Conformal: The conformal projection preserves shape for small areas or angles for large areas. It is often used for navigation maps, topographic maps, and weather maps. A conformal projection shows graticule lines intersecting at 90° angles on the map, as they do on the globe; hence, it preserves shape (Eklundh, Arnberg, Arnborg, et al. 1999). There are four conformal projections in use: Mercator, transverse Mercator, Lambert’s conformal conic with two standard parallels, and the stereographic azimuthal projection (Robinson, Morrison, Muehrcke, et al. 1995).
• Equal area: The equal area projection preserves the area of the projected region. It is often used for dot density maps and thematic maps.
• Equidistant: The equidistant projection preserves distances between features in the projected region. It is used especially for airline route maps and seismic maps. Most equidistant projections have one or more lines that have the same length on the map as on the globe.
• True direction: Projections that preserve the direction between features are called true direction projections. True direction or azimuthal maps can be combined with equidistant, equal area, and conformal projections. These maps are used for navigation routes.

Each of these properties (shape, area, distance, and direction) is preserved at the expense of the others. Selection of a projection type depends on the application, as there is no perfect projection that can preserve all the properties.

Basic Techniques of Projection
The most common technique of projection uses a developable surface such as a plane, cylinder, or cone. A developable surface either touches the spheroid (tangent case) or intersects it (secant case). There are three ways to project a map.
1. Planar projection: In a planar projection (Figure 2.17), the earth's surface is projected onto a plane. A planar projection is most accurate at the centre, where the plane touches the spheroid. The location of this tangent point determines the type of planar projection: polar, equatorial, or oblique.
2. Conical projection: A conical projection uses a cone as the developable surface to project a region of the earth's surface (Figure 2.18). An elementary way of developing a conic projection is to place a cone over the globe. There are two types of conical projections: normal conical and secant normal conical. A normal conical projection is tangent to the globe along a single line of latitude. This line is called the standard parallel, and there is no distortion along it. The lines of longitude are projected onto the conic surface as lines meeting at the apex, and the lines of latitude are projected as rings. This type of projection is best suited to mid-latitude regions. A secant normal conical projection intersects the globe along two lines of latitude; hence, it is defined by two standard parallels (Figure 2.19). A secant conical projection has less overall distortion than a normal conical projection.
3. Cylindrical projection: Another widely used technique is to project the globe onto a flat surface using a cylinder as the developable surface. In a cylindrical projection, the globe is placed inside a cylinder rolled from a flat sheet of paper. In the normal case, the cylinder touches the earth along the equator. When the cylinder is cut open and flattened, the regions near the equator are the most accurate and the regions near the poles are the most distorted.

Figure 2.17 Planar projection

Figure 2.18 Normal conical projection

Figure 2.19 Secant normal conical projection

Cylindrical projections are also of three types: normal cylindrical, transverse cylindrical, and oblique cylindrical (Figure 2.20). In a normal cylindrical projection, the equator is the line of tangency. In a transverse cylindrical projection, the line of tangency is a meridian. In an oblique cylindrical projection, the cylinder is rotated so that it touches the globe along a great circle lying anywhere between the equator and a meridian. Figure 2.21 depicts the steps in projecting the geoid onto a planar map with a coordinate system.
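The growth of distortion away from the equator in a normal cylindrical projection can be illustrated with a short sketch. It assumes the Mercator projection on a sphere rather than a spheroid, and the radius value and function names are illustrative choices, not taken from the text.

```python
import math

R = 6371000.0  # assumed mean earth radius in metres (sphere, not spheroid)

def mercator_forward(lat_deg, lon_deg, radius=R):
    """Project latitude/longitude in degrees to x, y on a normal cylindrical (Mercator) map."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = radius * lon
    y = radius * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y

def mercator_scale_factor(lat_deg):
    """Linear scale relative to the equator (line of tangency): 1 / cos(latitude)."""
    return 1.0 / math.cos(math.radians(lat_deg))

# The scale factor is 1 on the equator and grows towards the poles.
for lat in (0, 30, 60, 80):
    x, y = mercator_forward(lat, 10.0)
    print(f"lat={lat:2d}  y={y / 1000:9.1f} km  scale factor={mercator_scale_factor(lat):.2f}")
```

At 60° latitude the scale factor is about 2, so lengths there are drawn roughly twice as long as at the equator, which is consistent with the statement that regions near the poles are the most distorted.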

Figure 2.20 Cylindrical projection

Figure 2.21 Projection system

TRANSFORMATION
The term transformation has different meanings in different fields. In geometry, a transformation is a mathematical function that moves a shape to a different position; the basic transformations leave its shape, angles, size, and lengths unchanged. Three primary types of transformations are as follows.
1. Rotation: In rotation, the shape turns about a fixed centre, and the distance from the centre to any point on the shape remains unchanged.
2. Reflection: In reflection, every point of the mirror image lies at the same distance from the mirror line as the corresponding original point, and the size is unchanged.
3. Translation: In translation, every point of the shape moves the same distance in the same direction.
A geographic information system can capture data from different sources and combine them into a complete spatial solution for analysis, management, planning, and decision-making problems. To combine data from different sources, a GIS specialist has to bring all spatial data into a common reference system. For that, spatial data in one coordinate system need to be transformed to another coordinate system. This is where geographical transformation comes into play. There are various definitions of transformation. In general terms, transformation is a technique to register data layers to a common coordinate scheme, to which a standard data layer is already registered, for further processing. ESRI defines transformation as “the process of converting coordinates of a map or an image from one system to another system by shifting, rotating, scaling, skewing, or projecting them”. Based on the types of coordinate systems involved, transformations are categorized into two classes: projection transformation and datum transformation.
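As a minimal sketch, the three primary transformations named above (rotation, reflection, and translation) can be applied to a small shape to check that side lengths are preserved. The function names and the sample triangle are illustrative assumptions.

```python
import math

def translate(points, dx, dy):
    """Move every point the same distance in the same direction."""
    return [(x + dx, y + dy) for x, y in points]

def rotate(points, angle_deg, centre=(0.0, 0.0)):
    """Turn every point about a fixed centre by the given angle."""
    a = math.radians(angle_deg)
    cx, cy = centre
    return [
        (cx + (x - cx) * math.cos(a) - (y - cy) * math.sin(a),
         cy + (x - cx) * math.sin(a) + (y - cy) * math.cos(a))
        for x, y in points
    ]

def reflect_x(points):
    """Mirror every point in the x-axis (mirror line y = 0)."""
    return [(x, -y) for x, y in points]

def side_lengths(points):
    """Lengths of the sides of a closed polygon."""
    n = len(points)
    return [math.dist(points[i], points[(i + 1) % n]) for i in range(n)]

triangle = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0)]  # illustrative 3-4-5 triangle
moved = reflect_x(rotate(translate(triangle, 10.0, 5.0), 30.0))

print(side_lengths(triangle))
print(side_lengths(moved))
```

The two printed lists agree up to floating-point rounding: the shape ends up in a different position, but its lengths are unchanged. Scaling and skewing, which appear in the ESRI definition, do not have this property.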

Projection Transformation
Projection transformation is the process of transforming the two-dimensional coordinate system of one map projection into the two-dimensional coordinate (x, y) system of another specified map projection. For example, suppose the reference layer uses a Mercator projection. To add a new data layer that uses the Universal Transverse Mercator (UTM) projection, the specialist has to convert the UTM coordinates into Mercator coordinates. The process is carried out in two steps, each with its own equation (Figure 2.22). First, the coordinates (x, y) of the source projection system are converted into geographical coordinates (latitude, longitude); this is called the inverse transformation. Second, the geographical coordinates are converted into coordinate values (x, y) of the target projection system; this is called the forward transformation.

Figure 2.22 Projection transformation

Projection transformation is an equation-based transformation. It is the simplest form of transformation, in which the Cartesian values of one two-dimensional system are converted into the Cartesian values of another two-dimensional system using mathematical equations. There are two types of such transformations.
1. Affine transformation: The word affine derives from Latin and means “connected with”. An affine transformation between two two-dimensional Cartesian systems is a linear transformation in which a rotation and scaling of the x-axis and y-axis is followed by a translation. The transformation function is expressed as follows.
$$x' = ax - by + x_{origin}$$
$$y' = cx + dy + y_{origin}$$
where a, b, c, d, x_{origin}, and y_{origin} are the transformation parameters. An affine transformation can rotate, scale, and shift a map while keeping straight lines straight and parallel lines parallel, so the basic shape of the original map is preserved.
2. Polynomial transformation: A polynomial transformation is a nonlinear transformation. It relates two two-dimensional Cartesian systems through a translation (which slides an object a fixed distance in a given direction), a rotation (which turns an image about a fixed point called the centre of rotation), and a variable scale change. The transformation function is represented as follows.
$$x' = x_{origin} + a_1 x + a_2 y + a_3 xy + a_4 x^2 + a_5 y^2 + a_6 x^2 y + \dots$$
$$y' = y_{origin} + b_1 x + b_2 y + b_3 xy + b_4 x^2 + b_5 y^2 + b_6 x^2 y + \dots$$
Polynomial transformation is used to geo-reference aerial photographs and satellite imagery.
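The affine equations above can be tried out with a short sketch. The parameter values are an illustrative assumption: a uniform scale of 2 combined with a 90° rotation, followed by a shift of the origin to (100, 200).

```python
import math

def affine(x, y, a, b, c, d, x_origin, y_origin):
    """Apply x' = a*x - b*y + x_origin and y' = c*x + d*y + y_origin."""
    return a * x - b * y + x_origin, c * x + d * y + y_origin

def similarity_parameters(scale, angle_deg):
    """Parameters (a, b, c, d) for a uniform scale combined with a rotation (assumed special case)."""
    t = math.radians(angle_deg)
    return (scale * math.cos(t), scale * math.sin(t),
            scale * math.sin(t), scale * math.cos(t))

a, b, c, d = similarity_parameters(scale=2.0, angle_deg=90.0)
corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # unit square
transformed = [affine(x, y, a, b, c, d, 100.0, 200.0) for x, y in corners]

for src, dst in zip(corners, transformed):
    print(src, "->", tuple(round(v, 6) for v in dst))
```

The unit square comes out as a square of side 2, turned through a right angle and shifted to the new origin; its sides remain straight and parallel. In the general affine case, a, b, c, and d are independent, which also allows different scales along the two axes and skew.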

Datum Transformation
The earth does not have an even surface. Because of its complex structure and uneven surface, specific datums are required to represent distinct zones. For a comprehensive analysis of a geographical phenomenon, a specialist may need to combine spatial information from different sources about different zones. If different datums have been used to represent this spatial information, how can the information be combined? The solution is to transform not only one projection into another but also the underlying datum of one projection into that of the other. Datum transformation is a mathematical procedure that transforms the source datum into the target datum in a three-dimensional coordinate system. A mathematical function maps either the geographical coordinates (φ, λ, h) of one datum into those of another or the geocentric coordinates (x, y, z) of the source datum into those of the target datum (Figure 2.23). Methods for datum transformation include the geocentric transformation and the seven-parameter (Helmert) transformation.

Figure 2.23 Datum transformation

A question arises here. If the coordinates of the projection system of the input data are not known prior to the transformation, how can the transformation be performed? The answer is the ground control point (GCP), a point on the earth's surface with a known location that is used to geo-reference (that is, to define coordinate values for) satellite images and aerial photographs. GCPs are used to establish the relationship between a known set of coordinates and an unknown set of coordinates. Many GIS packages provide functions to transform projection systems and integrate data from different sources. Integrating map sheets that share the same projection allows a single layer of spatial data to be generated from different sources for a GIS application. If a map sheet to be joined with an already registered map is distorted, it should be rubber sheeted. Layers that are to be superimposed must be registered to the same coordinate system.

Geometric transformations are used to assign ground coordinates to a map or data layer within the GIS or to adjust one data layer so that it can be correctly superimposed on another of the same area. The procedure to correct geographical data in this way is called registration. Two approaches are used in registration: the adjustment of absolute positions and the adjustment of relative positions. Absolute position refers to the location of features in relation to a geographic coordinate system, whereas relative position refers to their location in relation to other features. Rubber sheeting is a relative position registration technique. It transforms the geometry of a raster map, or edge-matches one map sheet of a data layer with another map sheet depicting the same location, using common control points. Primarily, it is a GIS function for manipulating geospatial data. Rubber sheeting is necessary because imagery and vector data rarely match up correctly, for reasons such as the angle at which the image was taken, the curvature of the earth's surface, minor movements of the imaging platform (such as a satellite or aircraft), and other errors in the imagery. Rubber sheeting stretches one data layer to meet the predefined ground points.
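The role of GCPs in relating an unknown coordinate set to a known one can be sketched as a least-squares fit. Note the simplification: this fits a single global affine transformation, whereas rubber sheeting usually applies local, piecewise adjustments. The GCP values, the pixel-to-map relationship, and all function names are hypothetical.

```python
def solve3(m, v):
    """Solve the 3x3 linear system m * p = v by Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    if abs(d) < 1e-12:
        raise ValueError("need at least three non-collinear control points")
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = v[r]
        solution.append(det(mc) / d)
    return solution

def fit_affine(gcps):
    """Fit x_map = px[0] + px[1]*x + px[2]*y (and likewise y_map) by least squares.

    gcps is a list of ((x_image, y_image), (x_map, y_map)) pairs.
    """
    rows = [(1.0, xs, ys) for (xs, ys), _ in gcps]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atx = [sum(r[i] * tgt[0] for r, (_, tgt) in zip(rows, gcps)) for i in range(3)]
    aty = [sum(r[i] * tgt[1] for r, (_, tgt) in zip(rows, gcps)) for i in range(3)]
    return solve3(ata, atx), solve3(ata, aty)

def apply_affine(params, x, y):
    """Convert an image coordinate to a map coordinate with fitted parameters."""
    px, py = params
    return px[0] + px[1] * x + px[2] * y, py[0] + py[1] * x + py[2] * y

# Hypothetical GCPs: image (column, row) matched to known map (x, y) positions.
gcps = [
    ((0.0, 0.0), (1000.0, 5000.0)),
    ((100.0, 0.0), (1200.0, 5000.0)),
    ((0.0, 100.0), (1000.0, 4800.0)),
    ((100.0, 100.0), (1200.0, 4800.0)),
]
params = fit_affine(gcps)
print(apply_affine(params, 50.0, 50.0))
```

With more than three GCPs the fit averages out small errors in individual points, which is why extra control points are normally collected.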

DIGITIZATION
Data acquired for spatial analysis need to be stored in a computer; hence, they must be in machine-readable, or digital, form. Computers cannot read analogue data or images. Digitizing is the process of converting analogue information into a digital representation, that is, representing analogue signals or images by a discrete set of values. In a computer these values are stored as bits, and a group of eight bits is known as a byte. Digital signals are mainly represented as a sequence of integers. These integers can be converted back into analogue signals that approximate the original signals. Digitization is one of the most expensive and time-consuming aspects of data input in GIS. The digital capture of data from analogue sources (for example, maps, imagery, and aerial photographs) is carried out by two methods: manual digitization and heads-up digitization.
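The idea of representing an analogue signal as a sequence of integers and converting it back approximately can be sketched as follows. The sine-wave input, the value range, and the choice of 256 levels (one byte per sample) are illustrative assumptions.

```python
import math

def digitize(samples, levels=256, lo=-1.0, hi=1.0):
    """Map analogue values in [lo, hi] to integers 0..levels-1 (one byte each when levels=256)."""
    step = (hi - lo) / (levels - 1)
    return [round((min(max(s, lo), hi) - lo) / step) for s in samples]

def reconstruct(codes, levels=256, lo=-1.0, hi=1.0):
    """Convert the integer codes back to approximate analogue values."""
    step = (hi - lo) / (levels - 1)
    return [lo + c * step for c in codes]

analogue = [math.sin(2 * math.pi * t / 20) for t in range(20)]  # assumed input signal
codes = digitize(analogue)
approx = reconstruct(codes)
max_error = max(abs(a - b) for a, b in zip(analogue, approx))

print(codes[:5])
print(f"largest reconstruction error: {max_error:.5f}")
```

The reconstruction error never exceeds half a quantization step, which is the sense in which the recovered signal is only approximately similar to the original.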

Manual Digitization
Manual digitization is a technique for digitizing a map or aerial photograph by affixing it to a digitizing table. The coordinates of the map features are recorded with the help of a pointing device called a puck, a mouse-like device with cross hairs. The operator manually traces all the lines from the hardcopy map and creates an identical digital map on the computer. The table is rectangular and has a rim around its edges, similar to a drafting table. The digitizing table is a position-sensitive tracing device: it electronically records the positions of points and lines. The most common type of digitizing table has a fine grid of wires embedded in it. The vertical wires record the x-coordinates, and the horizontal ones record the y-coordinates. The operator mounts the map or imagery to be digitized on this surface and fixes it in place so that it cannot be displaced. As the puck is moved over the map, the electronics of the table, working electrostatically in conjunction with the wire grid, pick up the signal from the puck and convert the position of the puck into a digital signal. Software on the computer converts this signal into x-y coordinates giving the position of the puck. Thus the features on the map or imagery are literally traced out by the puck. The range and precision of the digitized coordinates depend on the density of the wires (called the digitizing resolution) and the settings of the digitizing software.

Heads-up Digitizing
Interactive, on-screen digitization of geo-data is another very commonly used method, called heads-up digitization. The major difference between the two methods (manual and heads-up) is that in the heads-up method, the base map or image is already in digital raster form, that is, a digital image. During digitization, the user's attention is focused on the computer screen and not on a digitizing tablet, hence the name “heads-up digitization”. The prime objective is to convert the digital image into a form usable in the GIS environment, that is, a form in which each feature on the map has a geographic coordinate associated with it. In the first stage, paper maps or imagery are converted into a digital image with the help of a scanner. A scanner automatically captures the features, text, and symbols on the map as individual cells, or pixels, and produces a digital image in raster format. However, this raster image lacks geographic information, which has to be entered manually. For this, the digital image is displayed on the screen and zoomed to a comfortable level so that the features can easily be traced on the screen itself to create new layers or themes. Selecting the projection is another important task. Control points with known geographic locations are then identified and marked; from these, the geographic coordinates of all the features on the map are determined. An improvement on on-screen digitization is the interactive tracing method, which automatically traces one line at a time under the guidance of the operator and so reduces the probability of error propagation. Spatial accuracy of the features depicted on the map is very important for a good GIS database. However, the accuracy of a digitized map is affected by several different types of errors.

EXERCISES
Short Answer Questions
1. What are the differences between spatial, temporal, and thematic modes of GIS data?
2. Differentiate between ratio, verbal, and graphic scales.
3. How can you locate points, lines, or polygons using a lat–long or any other coordinate system?
4. Describe how you might combine pre-existing data with new data to make a new map.
5. What are raster data?
6. What are vector data?
7. What is rubber sheeting?
8. What are GCPs?
9. What is digitization?
10. What is scale?

Descriptive Questions
1. What are the different types of projection systems? Discuss why projection systems are important in mapping.
2. What are the different types of data models? Explain the difference.
3. What is the difference between datum, globe, geoid, and ellipsoid? Explain with a diagram.
4. What is transformation? Explain the basic types of transformations.
5. What is scale and what are the different types of scales?
6. What is encoding? Explain the different types of raster encoding methods.
7. Discuss the pros and cons of raster and vector data.
8. Give the chain encoding results of Figure I.
9. Give the block encoding results of Figure II.

Figure I

Figure II

REFERENCES
Eklundh, L., W. Arnberg, S. Arnborg, L. Harrie, H. Hauska, L. Olsson, P. Pilesjö, B. Rystedt, and U. Sandgren. 1999. Geografisk informationsbehandling: metoder och tillämpningar. Borås: Centraltryckeriet. ISBN 91-540-5841-4
Robinson, A. H., J. L. Morrison, P. C. Muehrcke, A. J. Kimerling, and S. C. Guptill. 1995. Elements of Cartography, 6th edn. New York: John Wiley & Sons

3

Attribute Database and Overlay

Map features and attributes associated with these features are two important elements of a geographic information system (GIS). In the previous chapters, spatial data, spatial data models (raster and vector data models), and their representation as data layers and maps were discussed. However, separate data models are required to store and maintain attribute data—another important part of GIS.

ATTRIBUTE DATA
Every geographical object or phenomenon has characteristics associated with it, such as the height of a mountain, the depth of a lake, the area of a forest, or the density of a population. A GIS is a computer-based information system that must be able to store, manipulate, maintain, and query spatial information. However, more specific data models are required to hold the characteristics of spatial features, and attribute data provide the solution. Attribute data are descriptive information about spatial features, linked with the GIS to define geographical features or phenomena. They are tabular, textual, non-spatial information that carries no geographical information of its own. A wide range of data models exists for storing and managing attribute data, including the file structure, hierarchical, network, object-oriented, and relational models. A brief review of these models was provided in Chapter 1. This chapter discusses in detail the most popular one, the relational database. A relational model consists of three primary components.
1. A collection of relations
2. A set of operations and rules that act upon relations
3. Data integrity for accuracy and consistency


RELATIONS
A relational model implies a tabular form of data storage: it stores data in a matrix of rows and columns (Figure 3.1). A relation, or table, is a set of tuples, where a tuple is a row that stores the attributes associated with one physical object. A column is a set of data values of the same type, one for each row. Each cell of a table holds the field value for an individual physical object. For example, in Figure 3.1, the first tuple, Row 1, shows the attributes of Roy (the physical object), who is a student. Column 1 represents a data set, student_id, associated with all tuples. Table cell 1×1 holds the value of student_id for an individual object, namely, Roy. Each table must have a name and store data for a distinct physical entity such as a student.

Figure 3.1 Student record (relational database)

According to Croswell (1991), a database structure commonly used in GIS is one in which data are stored in two-dimensional tables, where multiple relationships between data elements can be defined and established in an ad hoc manner. Geospatial analysis and other GIS activities require database management operations (query and retrieval) on the tabular (attribute) data associated with the geometrical features. Geographic data, which include geographic features and their corresponding attribute information, are entered into a GIS using a technique called digitizing. This process involves digitally encoding geographic features such as cities, streets, buildings, roads, or country boundaries. Attribute data about point, line, and area features can be entered into separate database files. These files can be linked to the spatial database generated by the GIS after digitization by creating an identification key in each data file that is also present in the spatial database. The locational data of the features (coordinates, topology) are generated during digitization, whereas the attribute data of locations are created separately. The GIS must provide the link between locational and attribute data. The relational database model, together with its query language, is well suited to providing this link.
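The link between locational and attribute data through a shared identification key can be sketched with Python's built-in sqlite3 module. The table names, column names, and sample values are hypothetical; a real GIS stores geometry in a richer form than a pair of coordinates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Locational data produced by digitizing (hypothetical schema).
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, geom_type TEXT, x REAL, y REAL);
-- Attribute data entered separately, carrying the same identification key.
CREATE TABLE well_attr (feature_id INTEGER REFERENCES feature(feature_id),
                        name TEXT, depth_m REAL);
INSERT INTO feature VALUES (1, 'point', 77.21, 28.61), (2, 'point', 72.88, 19.08);
INSERT INTO well_attr VALUES (1, 'Well A', 42.0), (2, 'Well B', 65.5);
""")

# Join the two tables on the shared key and select wells deeper than 50 m.
rows = conn.execute("""
    SELECT f.feature_id, f.x, f.y, a.name, a.depth_m
    FROM feature AS f
    JOIN well_attr AS a ON a.feature_id = f.feature_id
    WHERE a.depth_m > 50
""").fetchall()

print(rows)
```

Because the key is the only thing the two tables share, the attribute table can be edited or extended without touching the digitized geometry.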

Scale and Source of Inaccuracy
The outputs delivered by GIS software depend heavily on the quality of the data. Data accuracy is a measure of how close an observed value is to the actual value: the smaller the difference between the observed and the true values, the more accurate the observation. Two types of accuracy determine data quality in GIS: positional accuracy and attribute accuracy. Positional accuracy relates to the spatial location of an object. It may be defined as the deviation between the observed position of an object and its actual position. Positional accuracy is further divided into absolute accuracy and relative accuracy. Absolute accuracy is the deviation of the object's location with respect to a coordinate system, such as Universal Transverse Mercator (UTM). Relative accuracy, in contrast, concerns the position of a map feature with respect to other map features (Figure 3.2). Attribute accuracy concerns how correctly the characteristics associated with object classes are represented and interpreted. Correctly identifying an area as forest, or a boundary as the political boundary of a state, is an example of attribute accuracy.

Figure 3.2 Types of accuracy

Foote and Huebner (2004) define accuracy as the degree to which information on a map or in a digital database matches true or accepted values. According to them, it is an issue pertaining to the quality of data and the number of errors contained in a data set or map. In discussing a GIS database, they say, it is possible to consider horizontal and vertical accuracy with respect to geographic position, as well as attribute, conceptual, and logical accuracy (Figure 3.2).
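The difference between absolute and relative positional accuracy can be made concrete with a small sketch. The coordinates are invented, and the root-mean-square error is used here as one common summary of positional deviation; the text itself does not prescribe a particular measure.

```python
import math

def rmse(observed, actual):
    """Root-mean-square positional error between matching observed and true points."""
    squared = [(ox - ax) ** 2 + (oy - ay) ** 2
               for (ox, oy), (ax, ay) in zip(observed, actual)]
    return math.sqrt(sum(squared) / len(squared))

def relative_error(observed, actual, i, j):
    """Difference between the observed and true distance separating features i and j."""
    return abs(math.dist(observed[i], observed[j]) - math.dist(actual[i], actual[j]))

# Hypothetical true positions and a digitized copy shifted by (3, 4) map units.
actual = [(100.0, 100.0), (200.0, 100.0), (200.0, 250.0)]
observed = [(103.0, 104.0), (203.0, 104.0), (203.0, 254.0)]

print("absolute error (RMSE):", rmse(observed, actual))
print("relative error between features 0 and 2:", relative_error(observed, actual, 0, 2))
```

Every feature is 5 units away from its true position, so absolute accuracy is poor, yet the features keep their true spacing, so relative accuracy is perfect. A uniformly shifted map sheet behaves in exactly this way.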

GIS FUNCTIONALITY
A geographic information system stores spatial data on geographical features and the attribute data associated with these features. The life cycle of a GIS starts with data acquisition, moves through different stages of data processing, and ends with the representation of both spatial and attribute data to support analysis and decision-making for complex spatial problems. To use a GIS to its full potential, it is necessary to understand its core functions. Although a wide range of GIS packages is available in the market, each with its own specifications, some core functions are common to all systems. GIS functionality is a logical sequence of data capture, data processing, data storage, data analysis, and representation that produces solutions for complex spatial problems. The logical flow of data for successful implementation of GIS projects takes the following sequence of processes.

Data Preprocessing
Data preprocessing is a sequence of processes carried out to acquire error-free spatial and attribute data and embed them in the GIS. It is composed of four sequential steps.
1. Data capture: A GIS can capture different types of data from different sources. There are two categories of geo-data sources: primary data sources and secondary data sources (Figure 3.3). Primary data sources are digital sources that acquire data specifically for GIS applications; an example is remote sensing, which acquires data on physical objects on the earth's surface without any direct physical contact. Secondary data sources are data sets, in either digital or analogue form, that were originally gathered for another purpose such as manual cartography. They are then converted for GIS use by techniques such as digitization and projection. The capture technique also depends on the type of geo-data (raster or vector). For example, ground survey is used to capture vector data, while scanning is used to acquire raster data. Digitization is a key step at this stage. It is the process of converting analogue data, such as paper maps, into digital data for direct use in GIS.
2. Data transfer: Data transfer involves moving already acquired digital spatial data into the GIS over an electronic network or on external media such as a pen drive or a magnetic disc. The transferred data may be in any format, either system dependent such as the ARC/INFO format or system independent such as TIGER.
3. Data edit: Data editing is the process of compiling the acquired data by making them error free and mapping the relation between the spatial data and the attribute data of distinct spatial features. GIS software allows both spatial and non-spatial data to be corrected. Textual and graphical data can be copied, moved, deleted, or updated using the functions provided by the GIS software. The resultant digital data files contain all the spatial and attribute data present in the original data set but without distortions. Spatial operations such as rubber sheeting, overlay, and buffering are carried out to manipulate the data and make them error free for further use.
4. Data storage: The compiled data are structured and, because of space constraints, stored efficiently in the GIS. A relational database management system (RDBMS) is used to organize and manage the locational and attribute data separately. The structure of the database facilitates retrieval, analysis, and manipulation of topographical data. Map layers stored in the database need to be analysed, manipulated, and retrieved, so they must be structured before they are stored. Raster and vector data models are techniques for organizing data in a structured form. Structured data are well organized and easy to transfer.

Figure 3.3 GIS data acquisition method


Generalization
A geographic information system is used to model the real world for the analysis of geographical phenomena and features. Real-world features have a complex structure, and large amounts of data are associated with them. However, because of constraints such as time, storage space, and the processing speed of the devices used, it is not advisable to store every data set associated with a feature. This is where generalization comes into play. Generalization is a technique for adapting the information represented on a map to the scale of the display medium. It is the process of deriving more relevant, purposeful, and less detailed data at a smaller scale from a larger set of data at a larger scale. Major operations for generalization are as follows.
• Smoothing: The process of reducing the angularity of a line is called smoothing. Smoothing simplifies map features and may involve other generalization operations such as feature displacement and location shifting. Its purpose is to present line work in a less complicated and less visually jarring way.
• Enhancement: A cartographer uses enhancement to clarify specific elements that aid in map reading. Enhancement can be used to show the true attribute of the feature being represented and is often used to highlight domain-specific knowledge.
• Selection: Selection is a map generalization technique that reduces the complexity of the real world by deliberately leaving out auxiliary and unnecessary details.
• Simplification: Simplification reduces the complexity of geospatial data while retaining its essential character. Converting detailed large-scale data into small-scale data is an example of simplification.
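One widely used way to carry out line simplification is the Douglas-Peucker algorithm. The text does not name a specific method, so the sketch below is an illustrative choice rather than the authors' procedure. The sample line and the tolerance are invented.

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / length

def simplify(line, tolerance):
    """Drop vertices that lie within `tolerance` of the simplified line (Douglas-Peucker)."""
    if len(line) < 3:
        return list(line)
    distances = [point_line_distance(p, line[0], line[-1]) for p in line[1:-1]]
    worst = max(range(len(distances)), key=distances.__getitem__)
    if distances[worst] <= tolerance:
        return [line[0], line[-1]]  # every interior vertex is negligible
    split = worst + 1  # index of the farthest vertex in `line`
    left = simplify(line[:split + 1], tolerance)
    right = simplify(line[split:], tolerance)
    return left[:-1] + right  # the split vertex appears in both halves

line = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0), (3.0, 2.0), (4.0, 0.0), (5.0, 0.1), (6.0, 0.0)]
print(simplify(line, 0.5))
```

The small wiggles at (1.0, 0.1) and (5.0, 0.1) are removed, while the prominent peak at (3.0, 2.0) is kept. A larger tolerance corresponds to a smaller map scale, where less detail can be shown.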

Analysis
Computer-based information systems are used to acquire, store, and manage information to deliver a digital product. However, GIS is special because of its capability to store spatial information and its potential to recognize the interrelationships and hidden patterns that exist among spatial data sets. Spatial analysis is a set of statistical, mathematical, and spatial operations used to determine the patterns present in the spatial data of a given domain. To no small degree, recent quantitative analysis in geography represents a study in depth of patterns of points, lines, areas, and surfaces depicted on maps of some sort or defined by coordinates in two- or three-dimensional space (Wilson and Bennett 1985; Hägerstrand 1973). Spatial analysis establishes the link between traditional cartography and statistical and mathematical models for manipulating spatial and non-spatial data in GIS. GIS software is nothing without its analysis toolbox. Spatial data can be stored as points, lines, or polygons, and the interrelationships between these features are established with different analysis techniques, such as the nearest neighbour method (point); network analysis and autocorrelation (line); and surface analysis and Bayesian techniques (polygon). Analysis techniques support planning, management, and decision-making applications by exploring existing hidden patterns and deriving new spatial patterns from old ones.
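As an illustration of the nearest neighbour method for point data, the sketch below computes the mean nearest-neighbour distance and compares it with the value expected for a random pattern (the Clark-Evans ratio). Choosing this particular ratio is an assumption; the text names only the method in general terms. The point sets and the study area are invented.

```python
import math

def mean_nearest_neighbour_distance(points):
    """Average distance from each point to its closest other point."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)

def nearest_neighbour_index(points, area):
    """Observed mean distance divided by the mean expected for a random pattern.

    Values well below 1 suggest clustering; values well above 1 suggest dispersion.
    """
    observed = mean_nearest_neighbour_distance(points)
    expected = 0.5 / math.sqrt(len(points) / area)
    return observed / expected

area = 100.0 * 100.0  # assumed 100 x 100 study area
clustered = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
dispersed = [(25.0, 25.0), (75.0, 25.0), (25.0, 75.0), (75.0, 75.0)]

print("clustered:", round(nearest_neighbour_index(clustered, area), 3))
print("dispersed:", round(nearest_neighbour_index(dispersed, area), 3))
```

The same four-point layout gives a very different index depending on how tightly the points are packed within the study area, which is the kind of hidden pattern spatial analysis is meant to reveal.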

Representation Geographic information system produces a digital product as an output of processing and manipulating spatial data. Digital interactive maps, reports, graphs, and results of specified queries are distinct methods of representing spatial and non-spatial data (Figure 3.4). Usually, geographical data are represented in the form of maps and graphs, while attribute data are represented as tables and reports. At this stage, GIS works like computer-based cartography. It provides a collaborative and interactive platform for display of spatial data. Results of spatial queries are displayed as charts, tables, or reports on attribute data associated with spatial features. GIS software provides all these methods of data display. Although the representation of data plays a vital role in decision-making, planning, and other objectives of the GIS application, technically it is a less sophisticated stage than the earlier stages of a GIS life cycle.

Figure 3.4 Representation of GIS data

SPATIAL QUERY
A query is a question put to the database that returns meaningful information. An RDBMS supports queries through structured query language (SQL). A spatial query returns locational or attribute data sets. A spatial relation between two spatial features can be described in terms of map distance, topology, and direction. A spatial query is a logical statement or expression that selects geographical features based on such a spatial relation. There are three types of spatial queries, which are as follows.
• Query for attribute data: This query returns attribute data, that is, the characteristics associated with a geographical feature.
• Query for spatial data: This query returns the locational data associated with a geographical feature.
• Generation of new data sets: A GIS application may require data sets that were not originally incorporated in the database; these are generated from the existing data.
SQL, developed by IBM, is used to retrieve information from relational database systems. Since a GIS database has an RDBMS structure, spatial data can be queried in the same way. However, retrieval of spatial data is difficult because of the complex structure of spatial data and the association between spatial and non-spatial data types. The two data models, raster and vector, are each queried by a distinct set of query functions. A query often takes the form of a statement or logical expression. In ArcMap, a query contains a field, an operator, and a value (ESRI 2000). Examples of spatial queries are as follows.
• What is the distance between two points on a map?
• How many districts are there in a state?

Raster Data Queries
• Query by attributes: A raster data model is cell based and can be queried using the values of individual cells. To query a grid, a logical expression such as [city] = 3 AND [elevation] ≥ 3456.34 can be used. It is also possible to query multiple grids by cell value. This type of cell-by-cell analysis cannot be done on vector data.
• Query by location: Graphics are an important feature of GIS software and can be used to query raster data. In GIS software, spatial data are stored in layers, and a simple operation such as a mouse click or scroll presents micro-level details of the spatial data. This interaction between the graphical user interface and the spatial data makes data retrieval easier and less cumbersome.
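A query across two grids by cell value, using the expression [city] = 3 AND [elevation] ≥ 3456.34, can be sketched with plain nested lists. The grid contents are invented for illustration, and real GIS packages provide their own raster query tools.

```python
# Two hypothetical grids covering the same area, cell for cell.
city = [
    [1, 1, 3, 3],
    [1, 3, 3, 3],
    [2, 2, 3, 1],
]
elevation = [
    [3400.0, 3460.5, 3500.0, 3456.34],
    [3300.0, 3456.0, 3600.2, 3470.0],
    [3500.0, 3480.0, 3100.0, 3490.0],
]

def query(city_grid, elevation_grid, city_value, min_elevation):
    """Return a 0/1 grid: 1 where [city] = city_value AND [elevation] >= min_elevation."""
    return [
        [int(c == city_value and e >= min_elevation) for c, e in zip(c_row, e_row)]
        for c_row, e_row in zip(city_grid, elevation_grid)
    ]

selected = query(city, elevation, 3, 3456.34)
for row in selected:
    print(row)
```

The result is itself a grid, so it can be stored as a new layer or combined with further queries.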

Vector Data Queries
Query by attribute: GIS databases store both spatial and non-spatial data. Non-spatial data are attribute data that store values for the characteristics of spatial features, such as the depth of a well or the density of population. Such data can be retrieved from the database or map by working with the attribute data. Data queries in ArcGIS follow Boolean algebra. Logical operators such as =, >,