Handbook of Data Management: 1999 Edition [1 ed.] 9781315893389, 9781351072489, 9781351089388, 9781351097833, 9781351080934

Packed with dozens of no-nonsense chapters written by leading professionals, Handbook of Data Management, 1999 Edition s

103 15 53MB

Pages [1104] Year 1998

Report DMCA / Copyright


Polecaj historie

Handbook of Data Management: 1999 Edition [1 ed.]
 9781315893389, 9781351072489, 9781351089388, 9781351097833, 9781351080934

Table of contents :

Section I: Data Development Methodologies, Definitions, and Strategy 1. Database Development Methodology and Organization 2. A Review of Database System Terminology 3. Developing a Global Information Vision 4. An Approach for Establishing Enterprise Data Standards 5. Completeness, Context, and Proportion: Shaping Data Design to the Business 6. An Enterprise Decision Framework for System Selection the Importance of Data Architecture in a Client/Server Environment 7. Section II: Data Models and Modeling Techniques 8. Making Data Models Readable 9. Component Design for Relational Databases 10. Practices in Enterprise Data Modeling 11. Integrating Process and Data Models in a Horizontal Organization 12. Object-Oriented Aspects of Windows Component-Based Software 13. Data Warehouse Design: Issues in Dimensional Data Modeling Section III: Data Integrity and Quality Programming Data Constraints Using Microsoft SQL Server 14. Referential Integrity for Database Design 15. AR: An Expanded Version of Referential Integrity for Data Base 16. Data Quality: An Architectural Solution 17. Section IV: Data Administration, Security and Operations 18. Managing Database Backup and Recovery 19. Managing NT Domains for Enterprise Architecture 20. DBMS Recovery Procedures 21. Change Management Strategies for Data Administration 22. Database Security Controls 23. E-Mail Security and Privacy 24. Security Models for Object-Oriented Data Bases 25. Security Management for the World Wide Web 26. Section V: Data Migration and Conversion 27. A Practical Example of Data Conversion 28. Legacy Database Conversion 29. Data Conversion: Doing it Right the First Time 30. Migrating Files to Relational Data Bases Section VI: Database Servers and Universal Servers 31. Creating SQL Server Tables Using DDL Scripts 32. Microsoft SQL Server: Key Transact-SQL Features 33. Selecting Universal Data Server Technology 34. Managing Complex Data Types: Middleware, Universal Data, Data Servers, and Object-Oriented Data Servers

Citation preview


Data Management


Sanjiv Purba, Editor

Boca Raton London New York

CRC Press is an imprint of the Taylor & Francis Group, an informa business

First published 1999 by CRC Press Taylor & Francis Group 6000 Broken Sound Parkway NW, Suite 300 Boca Raton, FL 33487-2742 Reissued 2018 by CRC Press © 1999 by CRC Press, LLC CRC Press is an imprint of Taylor & Francis Group, an Informa business No claim to original U.S. Government works This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint. Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers. For permission to photocopy or use material electronically from this work, please access www.copyright.com (http:// www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged. Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. Library of Congress Cataloging-in-Publication Data Handbook of data management / Sanjiv Purba, editor. – 1999 ed.   p. cm. Includes bibliographical references and index. ISBN 0-8493-9976-9 (alk. paper) 1.  Database management.   I. Purba, Sanjiv. QA76.9.D3H3473 1998b 658.4’038’0285–dc21


A Library of Congress record exists under LC control number: 98038833 Publisher’s Note The publisher has gone to great lengths to ensure the quality of this reprint but points out that some imperfections in the original copies may be apparent. Disclaimer The publisher has made every effort to trace copyright holders and welcomes correspondence from those they have been unable to contact. ISBN 13: 978-1-315-89338-9 (hbk) ISBN 13: 978-1-351-07248-9 (ebk) Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Contributors HEDYALBAN,Freelance Writer, Cherry Hill, NJ MARYAYALA-BUSH, Principal, Computer Sciences Corporation, Waltham,MA CHARLES BANYAY, Manager, Deloitte & Touche Consulting Group, Toronto, Ontario, Canada DAVIDBRASSOR,Technology Architect, Deloitte & Touche Consulting Croup, Toronto, Ontario, Canada JEFFBULLER, Consultant, VISUALSystems Development Group, Toronto, Ontario, Canada JAMESCANNADY, Research Scientist, Georgia Tech Research Institute, Atlanta, CA PATRICIA L. CARBONE, Principal Staff,Mih-etek Systems, McLean, VA MARIONG. CERUTI, Scientist, Advanced C41 Systems, Engineering and Integration Group, Command and Intelligence Systems Diuision, Naval Command, Control, and Ocean Surveillance Center, RDT&E Division, San Diego, CA HANNAN CHERVINSKY, Principal and Cofounder,ABS Systems Consultants Ltd., Pickering, Ontario, Canada TIMCHRISTMANN, Senior Consultant, Deloitte & Touche Consulting Group, Toronto, Ontario, Canada CARMINE CUTONE,Senior Manager, Deloitte and Touche Consulting Croup, Toronto, Ontario, Canada STEPHEN D'SILVA,Senior Consultantand Technology Leader, Deloitte & Touche Consulting Croup, Toronto, Ontario, Canada CHARLES DOW,Practice Leader, Object-Oriented Technologies, Deloitte & Touche Consulting Croup, Toronto, Ontario, Canada RICHARD T. DUE,President, Thomsen Due'&Associates,Ltd.,, Edmonton, Alberta, Canada LENDVORKIN, President, Italex Consulting, Inc., Thornhill, Ontario, Canada ELIZABETH N . FONC,Computer Systems Laboratory, National Institute o f Standards and Technology, Gaithersburg, MD JUDITH N. FROSCHER, Principal Investigator, SINTRA Project, Naval Research Laboratory, Washington,DC YONCJIAN Fu, Professor, Computer Science, University o f Missouri, Rolla, MO

FREDERICK GALLEGOS, Adjunct Professor and MSBA-Information Systems Audit Advisor, Computer Information Systems Department, California State Polytechnic University,Pomona, CA IDO GILEADI, Manager, Deloitte & Touche Consulting Group, Toronto, Ontario, Canada NARASIMHAWH GORLA,Associate Professor,Business Administration, Cleveland State University, Cleveland, OH FRITZH. GRUPE,Associate Professor, Computer Information Systems, University o f Nevada, Reno, NV VENKATN. GUDIVADA, DOWJones, Jersey City, NJ BARBARA J. HALEY,College o f Business, University o f Georgia, Athens, CA CANDICE G. HARP,Human Resources Consultant, Atlanta, GA KATHRYNA. HARVILL, Computer Systems Laboratory, National Institute o f Standards and Technology, Gaithersburg, MD DAVIDC. HAY,President, Essential Strategies, Inc., Houston, 7X DEBORAHHENDERSON, Research Information Coordinator, Ontario Hydro, Pickering, Ontario, Canada DAVIDK. HOLTHAUS, Sothare Specialist, Nationwide Insurance Enterprise, Columbus, OH ASHVINIYENGAR, Consultant, Object Technologies, Deloitte & Touche Consulting Group, DRTSystems, Toronto, Ontario, Canada SUSHILJAJODIA,Professor, George Mason University, Fairfax, K4 JOHNJORDAN, Principal, Consulting & Systems Integration, Computer Sciences Corporation, Waltham, MA WOLFGANG KELLER,Senior Manager, Software Design & Management Ltd., Munich, Germany GIGIG. KELLY,Professor, Management and Information Science, College o f William and Mary, Williamsburg, VA KARLKELTON,Senior Manager, Deloitte & Touche Consulting Group, Toronto, Ontario, Canada BONN-OHKIM,Assistant Professor, Information Systems, Department o f Management, University o f Nebraska, Lincoln, NE STEVENKREHBIEL, Doctoral Student, Business Administration, Cleveland State University, Cleveland, OH WALTERKUKETZ, Consulting and Systems Integration, ComputerSciences Corporation, Waltham, MA CAROLL. LARSON,Technical Writer and Desktop Publisher, Hillsboro, OR JAMESA. LARSON,Senior Software Engineer, Intel Architecture Laboratory, Hillsboro, OR &CHARD LEE,Senior Consultant, Operations Reengineering, Deloitte & Touche Consulting Croup, Toronto, Ontario, Canada PIERREJ. LEMAY,Senior Consultant, Document Management, PJL Information Systems, Orleans, Ontario, Canada DAVIDS. LINTHICUM, Technical Manager, EDS, Falls Church, VA GARYLORD,Partner, KPMG Peat Marwick, Palo Alto, CA

PHILLIPQ. MAIER,Member, Secure Network Initiative, Lockheed Martin Corporation, Sunnyvale, CA DANIEL MANSON,Associate Professor, Computer Information Systems Department, California State Polytechnic University,Pomona, CA JACKMCELREATH, Managing Partner, Consulting and Systems Integration, Computer Sciences Corporation, Waltham, MA LYNDAL. MCGHIE,Director, Information Security, Lockheed Martin Corporation, Bethesda, MD MICHAEL K. MILLER,Senior Data Architect, WorldwideData Planning and Architecture, NCR Corp., Dayton, OH STEWARTS. MILLER,President and Owner, Executive Information Services, Carlsbad, CA JAGDISH MIRANI, Senior Product Manager, Data Warehousing, Oracle Corp., Redwood Shores, CA MADHAV MOGANTI,Advanced Technologies, Bell Laboratories, Naperuille, /L BRETTMOLOTSKY, Lotus Notes Product Manager, Omicron Consulting, Philadelphia, PA ALIH. MURTAZA, Senior Consultant, Data WarehousingServiceLine, Deloitte & Touche Consulting Group, Toronto, Ontario, Canada MARGARET T. O'HARA,Assistant Professor, Computer and Information Science, Central Missouri State University, Warrensburg,MO M. MEDHIOWRANG,Associate Professor, Computer Information Systems, University o f Nevada, Reno, NV R. MICHAEL PICKERING, Consultant, Red Brick Systems, Inc., Toronto, Ontario, Canada SRIRAM PIDAPARTI, Cargill Financial Services Corporation, Minnetonka, MN SANJIVPURBA,Senior Manager, Deloitte & Touche Consulting Group, Toronto, Ontario, Canada T.M. W K U M A RAssociate , Professor, Department o f Decision Sciences and MIS, Miami University, Oxford, OH NYAPATHI RAMANA,Department o f Computer Science, University o f Florida, Gainesville,FL Lols RICHARDS, Data WarehousingProject Manager, Dynamic Information Systems Corp., Boulder, CO RAVIS. SANDHU, Professor, Software and Systems Engineering, George Mason University,Fairfax, VA CHARLES L. SHEPPARD, ComputerSystems Laboratory, National Institute o f Standards and Technology, Gaithersburg, MD SANDRA SHUFFLER, Senior Consultant, Deloitte and Touche Consulting Group, Toronto, Ontario, Canada ANTONIOSI,Assistant Professor, Department o f Computing, Hong-Kong Polytechnic University,Hong Kong, China MICHAEL SIMONYI, Independent Consultant, Etobicoke, Ontario, Canada MARTIND. SOLOMON, C.W Costello, Inc., Middletown, CT DANIELL. SPAR,Consultant, Transarc Corp., Falls Church, VA

NANCY STONELAKE, Senior Consultant,Deloitte & Touche Consulting Group, Toronto, Ontario, Canada WINSTON J. SULLIVAN, President, WJS, Toronto, Ontario, Canada MICHAEL J.D. SUTTON, Rockland, Ontario, Canada BHAVANI THURAISINGHAM, Lead Engineer, Center for Integrated Intelligence Systems, The MITRE Corporation, Bedford, MA DAVID WADSWORTH, Java Evangelist, Sun Microsystems, Toronto, Ontario, Canada DAVID C . WALLACE, Professor, Applied Computer Science Department, Illinois State University,Normal, IL HUGHJ . WATSON, Chair, Business Administration, College o f Business, University o f Georgia, Athens, CA JASONWEIR,Technical Writer, DataMirror Corporation, Markham, Ontario, Canada RONALD A. WENCER, Management Consultant, Toronto, Ontario, Canada NANCY M. WILKINSON, Consultant, Object-Oriented Design and C++,Westfield, NJ JAMESWOODS,Independent Consultant, Lewisville, TX MILESAU YEUNG, Manager, Deloitte & Touche Consulting Group, Toronto, Ontario, Canada DAVID A. ZIMMER, President, American Eagle Group, Warrington, PA MICHAEL ZIMMER, Senior Data Administrator, Ministry o f Health, Victoria, British Columbia, Canada


Contents INTRODUCTION ................................................. xv SECTION I DATA DEVELOPMENT METHODOLOGIES, DEFINITIONS, AND STRATEGY 1 2 3 4 5

6 7

Database Development Methodology and Organization.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1 A Review of Database System Terminology. . . . . . . . . . 2-1 Developing a Global Information Vision. . . . . . . . . . . . . 3-1 An Approach for Establishing Enterprise Data Standards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1 Completeness, Context, and Proportion: Shaping Data Design to the Business. . . . . . . . . . . . . . . . . . . . . . . 5-1 An Enterprise Decision Framework for System Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1 The Importance of Data Architecture in a ClientJServer Environment. . . . . . . . . . . . . . . . . . . . . . . . 7-1


12 13

Making Data Models Readable. . . . . . . . . . . . . . . . . . . . . 8 1 Component Design for Relational Databases. . . . . . . . . 9-1 Practices in Enterprise Data Modeling. . . . . . . . . . . . . 10-1 Integrating Process and Data Models in a Horizontal Organization . . . . . . . . . . . . . . . . . . . . . . . . .11-1 Object-Oriented Aspects of Windows Component-Based Software. . . . . . . . . . . . . . . . . . . . . . 12-1 Data Warehouse Design: Issues in Dimensional Data Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-1


Programming Data Constraints using Microsoft SQLServer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-1

15 16

Referential Integrity for Database Design . . . . . . . . . . . 15-1 Data Quality: An Architectural Solution . . . . . . . . . . . . 16-1


Managing Database Backup and Recovery. . . . . . . . . . 17-1 Managing NT Domains for Enterprise Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181 DBMS Recovery Procedures . . . . . . . . . . . . . . . . . . . . . .19-1 Change Management Strategies for Data Administration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20.1 Database Security Controls . . . . . . . . . . . . . . . . . . . . . .21-1 EMail Security and Privacy . . . . . . . . . . . . . . . . . . . . . .22-1 Security Models for Object-Oriented Databases . . . . . 23-1 Security Management for the World Wide Web . . . . . . 24-1


A Practical Example of Data Conversion .......... . 25.1 Legacy Database Conversion . . . . . . . . . . . . . . . . . . . . . 26-1 Data Conversion: Doing It Right the First Time . . . . . . 27-1 Migrating Files to Relational Databases . . . . . . . . . . . . 281


Creating SQL Server Tables Using DDL Scripts . . . . . . 29-1 Microsoft SQL Server: Key Transact-SQL Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-1 Selecting Universal Data Server Technology . . . . . . . . 31-1 Middleware. Universal Data Servers. and Object-Oriented Data Servers . . . . . . . . . . . . . . . . . . . . 32.1 Creating Effective Batch SQL Jobs . . . . . . . . . . . . . . . . . 33-1 The Advancing Art of Indexing. . . . . . . . . . . . . . . . . . . . 34.1 Rule Matching and Indexing Techniques . . . . . . . . . . . 35-1 Operating System Support for Database Servers . . . . 36-1


Data Access Using Microsoft Visual Basic 5.0+ . . . . . . 37-1 Component Based Development . . . . . . . . . . . . . . . . . . 381 Developing ActiveX Internet Server Components with Visual Basic and Rational Rose . . . . . . . . . . . . . . . 39-1

40 41 42 43 44 45 46

A Technical Primer for Getting Started with JavaBeans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0-1 JavaBeans and Java Enterprise Server Platform. . . . . 41-1 Object Technology Essentials . . . . . . . . . . . . . . . . . . . . 42-1 CRC Cards: A Hands-On Approach to Object-Oriented Design. . . . . . . . . . . . . . . . . . . . . . . . . . 43-1 Component Architectures: COM and CORBA . . . . . . . 44-1 Using CORBA to Integrate Database Systems . . . . . . . 45-1 Distributed Objects and Object Wrapping. . . . . . . . . . 46-1


Software Architectures for Distributed DBMSs in a ClientIServer Environment. . . . . . . . . . . . . . . . . . . . . . . 47-1 Distributed Database Design . . . . . . . . . . . . . . . . . . . . . 4 8 1 Database Gateways and Interoperability . . . . . . . . . . . 49-1 Managing Multiple Databases Across Heterogeneous Hardware and Software Systems . . . . 50-1 Mobile and Federated Database Interoperability . . . . 51-1


The Power of Transformational Data Replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52-1 Database Replication: Application Design and Deployment Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53-1 Secure Information Through a Replicated Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54-1


Building Database-Enabled Web Applications withIDC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .55-1 . Intranets: Notes vs. the lnternet . . . . . . . . . . . . . . . . . . 56-1 Web-Enabled Data Warehouses . . . . . . . . . . . . . . . . . . . 57-1 Publishing Database Information on the World Wide Web . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 8 1 Database Management and the Internet: Developments and Challenges. . . . . . . . . . . . . . . . . . . . 59-1


A Framework for Developing an Enterprise Data Warehousing Solution. . . . . . . . . . . . . . . . . . . . . . . 60-1 Why Dimensional Modeling is Right for Decision Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .61-1 Comparison of Relational and Multidimensional . DBMSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .62-1 Architecture Overview of OLAP. . . . . . . . . . . . . . . . . ..63-1 Creating Value-Added EISs with Soft Information . . . . 64-1

SECTION XI1 DATA MINING 65 66 67 68 69

Data Mining: Exploring the Corporate Asset . . . . . . . .65-1 When a Business Abandons a Data Mine: Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66-1 Data Mining: Knowledge Discovery in Databases .... 67-1 Multimedia Databases and Data Mining . . . . . . . . . . . . 681 Database Mining Tools . . . . . . . . . . . . . . . . . . . . . . . . . . 69-1


Terms of Reference: The Foundation for Implementing Document Management Systems . . . . . 70-1 Integrating EDMSs and DBMSs . . . . . . . . . . . . . . . . . . . .71-1 Using Lotus Notes in Executive Information Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72-1 .


75 76

Data Management Challenges of Healthcare Delivery. ......................................73-1 Improving Supply Chain Management for the Mining and Metals Industry . . . . . . . . . . . . . . . . . . . . . .74-1 Pharmaceutical Industry Information Issues and Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .75-1 Integrating Package Processes over Multiple Application Platforms . . . . . . . . . . . . . . . . . . . . . . . . . . .76-1


Helping Network Managers Cope with Enterprise Transformation and Data Demands. . . . . . 77-1

78 79

The World of Electronic Commerce . . . . . . . . . . . . . . . 78-1 Content-Based Retrieval and Indexing in Multimedia Databases. . . . . . . . . . . . . . . . . . . . . . . . . . . 79-1

ABOUT THE CONSULTING EDITOR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-1

INDEX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .[-l


Introduction DATAMANAGEMENT IS A MAJOR DRIVING FORCE in the ongoing computer revolution. Data comes in many formats, it serves many uses, and it passes through many corporate processes. The purpose of this book is to examine the essential art and science of data management. This handbook focuses on commonly asked questions such as the following: What is data? What is information? What is the relationship between data and processes? How does one build a data centric organization? Who owns the data in an organization? How does one connect heterogeneous data sources in distributed organizations? How are data models related to object or component models? The scope of this handbook covers a full breadth of data management categories including strategy, methodologies, standards, techniques, database products, package solutions, data tools (e.g., modeling tools, database servers, conversion tools), data warehousing, data mining, the Internet, the World Wide Web, and future directions. This handbook is written by experienced and acknowledged experts in the Information Technology (IT) field and in data management, and as such is a powerful tool for organizations of any size and industry.

NOTABLE VICTORIES Data management has won significant battles over the past few years. Some of these are described here: Corporate data is widely perceived and protected as a corporate asset. The growth of data specific roles, such as data modeler, data administrator, and data analyst. An imminent release of the next generation of the ANSI SQL standard. Data source connectivity to the Internet and World Wide Web. Relational database servers have not been replaced by object-oriented databases, and in fact, appear to have proven their importance to

the industry, at least into the next century. This has resulted in continued investments in database server technology. Large data warehouses and databases once were defined to be in the hundreds of megabytes range, grew to the gigabyte range, and now are characterized as having entered the terrabyte range. Dramatic increases in storage capacities. Improved system response times due to improved data access methods and tools. Investments in data development methodologies, frameworks, and techniques. Continued popularity of modeling tools, conversion tools, and other data centric tools.

THE FORCES OF CHANGE Data management has changed dramatically over the last few decades. There have been significant improvements in data storage, data access options, data protection, data security, data distribution, and data storage. Where data storage was once at a high premium, s o much so that saving two bytes of data when storing dates caught on and spawned a billion dollar Year 2000 problem, it is becoming increasingly inexpensive and easily available. The original 5'14" floppy disks were replaced by 3.5" floppy disks -first double density, then high density. Hard disks now routinely offer capacities of 3 GB or more. CD-ROMs can store 500 MB of data, and rumor has it that the next generation of CD-ROMs will store at least 10 times that amount. There have been many challenges in the last few years to the traditional data management philosophies. Object-oriented technology threatened to replace the importance of corporate data with objects that bundled processes and data together. Computer viruses started to become a data epidemic. Application programs became so large that traditional diskettes no longer could store them. At one time, talk of a Windows 95 release threatened to consume all the free diskettes available in the world. This created the demand for CD-ROMs (which also has swept the music industry). For the few readers who may not know this already, computers with sound cards running Windows 95+ can play music CDs right out of the box. Large data stores have grown from the megabyte range to the terrabyte range. Most organizations are investing in their own data warehouses or data marts to support executive decision making. This has supported the rapid growth of the data-mining industry. Data stores also are connected to the Internet and the World Wide Web. This popularized multidimensional data stores, a new data modeling technique.


STRATEGIC ISSUES AND METHODOLOGIES Data management is an enterprise-wide concern. Although it is implemented at a technical level, several strategic issues must be resolved before data management initiatives can begin in most organizations. These include data ownership, data tools selection, building centralized data repositories, and defining corporate models. Data continues to be a valuable corporate asset in most organizations. Data methodologies serve a useful function in many organizations. They are generally a part of a full life-cycle development methodology. Data methodologies are available in a number of delivery channels. They can be purchased from third parties, bought as part of a modeling tool set, or built inhouse, based on best practices. Development initiatives involving data management will benefit from leveraging data methodologies.

n-TIER ARCHITECTURE, CLIENT/SERVER TECHNOLOGY, AND COMPONENTS What was once strictly called clientlserver technology has over the last few years been called many other names. Some even describe clientlserver as an older architecture that is being replaced by newer network architecture. But things are never straightforward in this profession. Since the late 1980s, clientlserver technology essentially consisted of two tiers. One tier was a data tier that typically resided on a dedicated platform physically implemented on a database server (e.g., Sybase SQL Server, Microsoft SQL Server, Oracle, Informix). Some business rules could be maintained in this layer. The user interface was the other tier, and it generally ran on a personal computer type of device. Business logic also was bundled with this layer. Fat clientlserver implementations were those that contained most of the business logic in the user interface tier. The data tier in this server only managed the applications data. Thin clientlserver implementations were those in which most of the business logic was moved into the data tier. The two-tier clientlserver model was popular and relatively straightforward to implement. However, it had several drawbacks: (1) the client and the server tiers were coupled tightly, so they essentially were stuck with each other and difficult to port; (2) the architecture did not scale well in term of users and system response time; and (3) system management, which had been well enabled in the mainframe environment, was problematic in two-tier architecture. Three-tier architecture was a response to some of these limitations. In three-tier architecture, the business logic is given its own tier that can run optionally on a dedicated platform. This additional tier addresses

many of the limitations posed by two-tier architecture. In a three-tier deployment, there is generally a thin client and a thin server (data tier). The business logic is encapsulated in the business logic tier. The Web-browser is arguably one implementation of a three-tier architecture. So, arguably, is network computing. n-tier architecture is supported by component-based development, which is expected t o become significantly mainstream in the next few years. This is one area t o watch closely.

DATA WAREHOUSING AND DATA MINING Data warehouses have been around for a long time, but it was Inmon who coined the term and made it a popular discipline. The entire purpose of data warehousing and data mining is t o provide information, support decision making, and alert t h e business of patterns that are detected. Data warehousing also is known a s information warehousing. Data warehouses, which were once in the 100 of MB range, have grown to terrabytes in size. Data warehouses support OLAP-type processing, which offers decision support and what-if analysis. It also is supported by a variety of tools and new modeling techniques (e.g., multidimensional database, metadata). Data mining is a discipline that uses the information stored in a data warehouse to uncover important trends and information for the organization. Data mining can retrieve information that otherwise would go unnoticed.

DOCUMENT MANAGEMENT Document management satisfies an important requirement in data management. Where an organization has a requirement t o store and retrieve complex datatypes, rich text, and blobs with a small set of access keys, document management offers advantages over the traditional relational or indexed databases. Document management is effective in storing large volumes of data that only require access in a limited number of access keys. It also can include such tools a s electronic forms, Lotus notes, and Internetlintranet solutions.

INDUSTRY SPECIFIC SOLUTIONS The next few years are going t o see the growth of industry-based solutions that package data management and processes together. Package solutions h a v e b e e n available for d e c a d e s . However, t h e w i d e s p r e a d acceptance of SAP, Peoplesoft, and Baan are likely t o accentuate this trend. Organizations that are looking for new solutions should consider purchasing such things a s data models from a third party before building them from the ground up. This will result in more industry-specific data solutions, including models, applications, and processes. The growth of objects and components also will create an industry of reusable code.

THE FUTURE The last few years in this industry have taught a central truth about information technology. The pace of change in this industry is staggering and difficult to predict with any degree of certainty beyond two or three years. However, experience also has shown the continuing importance of data in this industry. No matter where it is, no matter how it is packaged, no matter how it is accessed, data will undoubtedly play a central role in the health and competitiveness of organizations well into the future. Database management is in the process of radical changes coming from many different sources, including the following: faster data access, data on the Internet, stronger data protection, larger data stores, combinations of complex datatypes and traditional database servers, component-based development, industry-specific data models and solutions, new tools and dat a b a s e ~ data , mining, data migration, 64-bit processing, new operating systems, and more demanding data conversions.

GETTING THE MOST OUT OF THIS BOOK Many authors have contributed their experiences, knowledge, and ideas in the chapters of this book. Each of them are strong players in their respective niches. Taken as a whole, this handbook sweeps across many data management categories. It is divided into 15 sections, each focusing on a particularly useful aspect of data management. Section I, "Data Development Methodologies, Definitions, and Strategy," focuses on the full life cycle and strategies issues relating to data management. The section defines criteria for selecting a data development methodology, defines an interactive database development methodology, defines an enterprise-wide data organization, defines commonly used database terminology, provides a framework for establishing enterprise-wide data standards, and focuses on several strategic issues relating to data management. Section 11, "Data Models and Modeling Techniques," investigates a variety of modeling techniques, including the traditional approaches to data modeling and process modeling that have created online transaction processing applications (OLTP) all over the world. This chapter focuses on data models, component-based modeling, process and data model integration, dimensional data modeling for data warehouses, and object-oriented aspects. Section 111, "Data Integrity and Quality," offers advice on ensuring the integrity of data and its quality. This section describes how to use database constraints, referential integrity, and data quality.


Section IV, "Data Administration, Security, and Operations," examines the operational aspects of administrating a database environment. This section describes data disaster recovery options, NT domains, DBMS recovery procedures, change management strategies, security controls, Email security, and other security models. Section V, "Data Migration, Conversion, and Legacy Applications," covers a topic that is exceedingly important in today's rapidly merging business environment. This section describes a case study for data conversion, legacy database conversion, and migrating files to the relational model. Section VI, "Database Servers and Universal Servers," provides examples for using Structured Query Language (SQL) Server Transact-SQL. This section also describes universal data servers and complex data types. Performance improvements also are covered through examples of batch-processing improvements and better indexing schemes. This section also examines operating system support for database servers. Section VII, "Object Technology, Object Modeling, and Object Databases," covers the relationship and coexistence of object technology and data management. This section describes Microsoft Visual Basic's hot new release's of improved data access methods. This chapter also covers component-based development, ActiveX, CORBA, Rational Rose, JavaBeans, Java, component architectures, and object wrapping. Section VIII, "Distributed Databases, Portability, and Interoperability," examines issues related to distributed databases, portability, and interoperability. This section describes distributed database design, managing multiple distributed databases, mobile databases, and database gateways. Section IX, "Data Replication," describes data replication through a variety of techniques and tools. The advantages and types of data replication that are available also are described in this section. Section X, "Data and the Internet, Intranet, and the World Wide Web," describes the relationship of data management to the Internet, intranet, and the World Wide Web. This section also contrasts Lotus Notes and intranets as corporate solutions. Other topics that are covered include publishing database information on the WWW, Web-enabling data warehouses, and connecting data sources to the Web with minimal programming. Section XI, "Data Warehousing, Decision Support, and OLAP," describes the different elements of this popular trend. This section defines a framework for Enterprise Data Warehouse projects, and describes dimensional data modeling, contrasts relational models to dimensional models, and focuses on different aspects of OLAP.

Section XII, "Data Mining," focuses on several techniques for mining corporate data for valuable information. This section examines techniques, costs, and tools related to data mining initiatives. Section XIII, "Document Management," describes the terms of reference for a document management project, compares DBMSs to document management applications, and examines the use of Lotus Notes as an executive information systems (EIS) enabler. Section XIV, "Industry-Specific and Package Solutions," examines techniques for leveraging the increasingly popular trend of using package solutions o r industry-specific prints. This section describes how d a t a management technologies can be used to improve several different industries. It also describes how to integrate several different packages, such as SAP, BAAN, and PeopleSoft. Section XV, "Emerging Practices and Directions," examines such topics a s network demand forecasting, electronic commerce, and multimedia.

Sanjiv Purba

Section I

Data Development Methodologies, Definitions, and Strategy EFFECTIVE DATA SOLUTIONS MUST START AT THE STRATEClC LEVEL with an organization's executive. From here they must sweep across the organization touching all relevant departments and incorporating diverse processes and tools. This section contains chapters to define how effective data solutions can be formalized and implemented across the organization. The cornerstone of this involves the selection and implementation of data development methodologies. Data development methodologies are powerful tools that define the activities and the deliverables produced in data-related projects. In some cases, data development methodologies are a component of a larger, full-cycle development methodology. This section contains the following seven chapters: Chapter 1, "Database Development Methodology and Organization," provides a framework for developing databases from the logical to the physical level in enterprises. The discussion in this section is independent of implementation tools. This chapter also discusses roles and responsibilities to support the database development methodology. The methodology is discussed in the context of a full life-cycle development methodology. The chapter also identifies features that should be considered when selecting or building a corporate data development methodology. Chapter 2, "A Review of Database System Terminology," begins with a definition of data and then goes on to define various terms commonly used in data management, including data element, data item, schema, data administrator, repository, relational database system, attribute, record, federated database system, middleware, client/server, legacy data, migration, and integration.

DATA DEVELOPMENT METHODOLOGIES, DEFINITIONS, AND STRATEGY Chapter 3, "Developing a Global Information Vision," describes how to provide a global perspective to information and data resources in an organization. This chapter defines a methodology, derived from actual success stories, for showing how a global vision can be constructed and implemented across the wider organization. Chapter 4, "An Approach for Establishing Enterprise Data Standards," provides practical examples and categories for building a set of enterprise data standards. This chapter is assembled from actual project experiences to describe techniques that work and, as importantly, do not work, when trying to build and implement enterprise data standards. Suggestions for avoiding the common pitfalls experienced by organizations during this initiative also are provided in this chapter. Chapter 5, "Completeness, Context, and Proportion: Shaping Data Design to the Business," focuses on aligning business requirements with data design. The fundamental fact argued by this chapter is that good data design must be rooted by a solid awareness of the business. This chapter describes how this can be achieved. The recommendations in the chapter are drawn from real-world projects. Chapter 6, "An Enterprise Decision Framework for System Selection," describes how a decision framework provides assistance t o organizations in identifying common challenges encountered by project teams when selecting and implementing enterprise information systems. By choosing an appropriate team and the right partners, by choosing the right system and data design, companies can substantially increase the performance of their enterprise system. Chapter 7, "The Importance of Data Architecture in a ClientJServer Environment," focuses on data architecture as one imperative component of the traditional clientlserver model. Data architecture has an impact on system performance, system security, application portability, and application scalability. Data architecture often is developed in parallel with technical architecture, organization architecture, and application architecture.

Chapter 1

Database Development Methodology and Organization

Sanjiv Purba

DATABASE DEVELOPMENT IS ONE OF THE FUNDAMENTAL OBJECTIVES of the data management function and certainly one of the end products of the process. In recent years, several trends have impacted the way that databases are built and the role they play in the overall organization. Some of these trends include data warehousing, object-oriented technology, e-commerce, and the emergence of very large databases (VLDBs). Other changes t o the landscape include the popularity of complex data types (e.g., BLOBS,video), universal databases, and object databases. Despite these changes, the basis of many online transaction processing applications (OLTP) that run the business is still the relational database and the flat files. This fact is not going t o change dramatically over the next few years. If anything, the relational database has proven its value as an enterprise enabler and, like the IBM mainframe, is here to stay for the foreseeable future. This chapter defines a database development methodology and approach that has proven successful on a variety of projects, such a s $100,000 t o $15,000,000 budgets, mainframe, client/server, three-tier with 0 0 , and package implementations. This approach promotes viewing methodologies as flexible frameworks that are customized for every specific instance. It allows data-oriented teams to use their personal insight and experience alongside t h e best practices embedded in t h e methodology. This chapter also defines organizational roles for a dataoriented environment.

0-8493-9976-9/99/$0.00+$.50 0 1999 by CRC Press LLC

DATA DEVELOPMENT METHODOLOGIES, DEFINITIONS, AND STRATEGY BENEFITS The complexity that is inherent in constructing relational database solutions can be reduced by using proven database development methodologies on projects. Methodologies are an excellent example of best practices and project lessons. Use of methodologies, therefore, reduces risk on development projects. Methodologies define activities and deliverables that are constructed in projects that were successful. Following these successful lessons can reduce project development time while increasing product quality. Furthermore, the use of methodologies simplifies the process of tracking project progress because there are clear benchmarks that can be reviewed by the project manager. Methodologies that offer templates/deliverables also allow a quickstart to the development process. SELECTING A DATABASE DEVELOPMENT METHODOLOGY Development methodologies with well-defined database development phases are commonly available in the marketplace. Some are freely available with modeling or project management tools, although others are found on the World Wide Web. Many of the larger consulting firms have d e veloped proprietary methodologies based on their corporate project experiences and proven best practices. These can be purchased separately or they can be bundled with consulting/mentoring services retained from the firm. The following list identifies some of the features that should be included in any database development methodology that is being considered for deployment in an organization. Linkage to a full lifecycle development methodology: A full life-cycle methodology supports more than database development. The database development methodology chosen should either be a component of a larger full lifecycle methodology, or link seamlessly with one. Failure to do this could result in mismatched techniques or the development of deliverables that are not used. Techniques: Many popular development methodologies support a combination of techniques to streamline development of deliverables. The traditional waterfall approach involves producing deliverables in a sequential fashion. Deliverable B is not started until Deliverable A is completed and signed off. This approach, however, historically has proven to be slow on many projects of all sizes. As a result of this experience, a rapid application development (RAD) approach has gained popularity in the past 10 years. RAD produces deliverables in a much smaller timeframe than the older waterfall approach. Iteration and prototyping are cornerstones of most RAD approaches, as are teams that combine technical resources and users during the analysis and design phases of the project lifecycle. RAD has proven to be successful on smaller projects, but has been problematic on the larger

Database Development Methodology and Organization ones due to the complexity of the business requirements. A relatively new approach combines the best elements of both the waterfall and RAD approaches and has proven valuable on larger development projects. Support: A development methodology (or a database development methodology) is a product, whether an organization has paid for it or not. As such, it is important for the methodology to be supported by the vendor into the future. An unsupported methodology becomes obsolete in sort order. Some questions to ask the vendor include: "How much research is being conducted to improve the methodology?" "Is there a hotline for technical support?," and "When is the next release of the methodology being released?" Price: The price of the methodology should be considered in whole and in parts and assessed against the value that is received. Consider the one-time cost, training costs, upgrade costs, yearly licence fees, costs per user, customization costs, hardwarelsoftware support costs, and costs for future releases. Vendor: Consider the stability and market share of the vendor providing the methodology. The vendor's references also should be checked to ascertain their support for clients. Vendors that are more stable and have more market share are more likely to improve their methodology with new techniques in the future. Proven Success: One of the surest ways of selecting a suitable methodology is to check the references of similar organizations that have used it successfully on development projects. Electronic Availability: The methodology should be available electronically through Lotus Notes, the Internet, or CD-ROM. It also should be available on paper. This makes the methodology widely available to those using it across the organization. Templates/Delioerables: Reusable templates and deliverables are a good source of best practices that provide the means for quick starting development projects. Many methodologies are demonstrated with these, but the templates/deliverables are not provided to customers. In such cases, it is valuable to try to negotiate the inclusion of templatesJdeliverables as part of the transaction. If the templatesldeliverables still are not offered by the vendor, but the rest of the methodology is acceptable, a pilot project should be used to create reusable templates and deliverables for future projects to use. Although this may slow the pilot project down in the short term, subsequent projects will run more efficiently. It is also desirable to select a methodology architecture that allows additional templates and delive r a b l e ~to be added to the database on an ongoing basis. Linkages to newer architectures: The methodology also should support linkages with modules that support data warehousing, object technol-

DATA DEVELOPMENT METHODOLOGIES, DEFINITIONS, AND STRATEGY ogy, e-commerce, and Web architectures. Flexibility in expanding the methodology directly or through deliverable linkages is desirable. Ease of Learningand Use: Methodologies that are easy to learn and use are more likely to be used on projects. Some methodologies are packaged with training courses from the vendor or other third parties. It is not unusual to add to this list of features or to assign more weight to a handful1 of them because of their importance to a specific organization. Experience has shown that complicating the selection process does not necessarily improve the quality of the final selection. In fact, this can lead to wasted time and intense team debates or arguments that end in worthless stalemates. It is preferrable to build a short list of candidate methodologies by disqualifying candidates that are weak on one or two key features (e.g., not available electronically or purchase price is greater than $100,000). The short list then can be compared to maybe five or six of the features that are of key importance to the organization. It also is useful to conduct a limited number of pilot projects that test the value of a methodology before making a final selection. It is also not unusual to pilot two different methodologies in a conference room pilot (CRP) to make a final determination. This process can take between 6 weeks and 6 months.

HIGH-LEVEL DATABASE DEVELOPMENT METHODOLOGY This section defines a high-level methodology for database development. This methodology provides a good start for small to medium-size projects; however, a formal third-party methodology should be considered for projects that require more than 6 months of development effort. The activities discussed in this section are mapped to the standard project development framework, which consists of the following main phases: requirements, architecture, design, development, testing, implementation, and post-implementation. These phases can be conducted in parallel or sequentially depending on the exact nature of the methodology, and are restricted to database specific activities. The subprocesses that are described in this section fit into a larger full life-cycle methodology that would address such activities as corporate sponsorship for the project, project plan definition, organization building, team building, user interface development, application design, technology selection, acceptance testing, and deployment. It is assumed that these activities are completed outside the database development methodology phases. Define Business Requirements: Business requirements are captured for any system development effort. The requirements also should be used to build the logical data model. They will feed such things as the number of entities, attribute names, and types of data stored in each attribute. These often are categorized by subject area.

Database Development Methodology and Organization Borrow or Create the Data Model: With a solid understanding of the business requirements, it is a good idea to search the market for a data model that can be purchased from a third party. This subsequently can be customized for the organization. Build Logical Data Model: The logical data model is built iteratively. The first view usually is done at a high level, beginning with a subject area or conceptual data model. Subsequent levels contain more detail. The process of normalization also is applied at this stage. There are many good books on normalization, so normal forms will not be covered. Foreign key fields and potential indexes also can be considered here. It is not necessary to build the logical data model for performance at this time, and physical considerations are left until a later process. Verify the Data Model: The logical data model is validated iteratively with users, the fields of the user interface, and process models. It is not unusual to make changes to the data model during this verification process. New requirements, which need to be fitted into the data model, also may be identified. Build Data Architecture:The data architecture is defined in the context of the physical data environment. Considerations, such as the database server, distribution, components, and partitioning, are considered in this step. Build the Physical Data Model: The logical data model is converted to a physical data model based on the specific database that is used. The physical data model will vary with the choice of database products and tools. The physical data model also contains such objects as indexes, foreign keys, triggers, views, and user-defined datatypes. The physical data model is optimized for performance and usually is denormalized for this reason. Denormalization can result in redundancy, but can improve system performance. Building the physical data model is not a one-stop process. Do not expect to build a final version of the physical data model on the first attempt. Refine the Data Model: The physical data model is refined continuously as more information becomes available, and the results of stress testing and benchmarking become available to the database development team. The logical data model also should be maintained as the physical data model is refined. Complete Transaction Analysis: Transaction analysis is used to review system transactions so that the physical data model can be refined for optimum system performance. Transaction analysis results are only meaningful after the business requirements and systems design are fairly solid. Transaction analysis produces statistics showing the access frequency for the tables in the database, time estimates, and data volumes. Populate the Data: After the database structure is established and the database is created, it is necessary to populate the database. This can 1-5

DATA DEVELOPMENT METHODOLOGIES, DEFINITIONS, AND STRATEGY be done through data scripts, applications, or data conversions. This can be an extensive set of activities that requires substantial data mapping, testing, and parallel activities. It is expected that the details of this are included in the full life-cycle methodology. Complete Testing: Testing a database usually is done in the context of applications and is covered in the full lifecycle methodology. Some specific types of testing, such as stress testing, benchmarking, and regression testing, can be used to refine the performance of the physical data model. These require high volumes of data, testing tools, and distribution tools.

DELIVERABLES Some of the important deliverables that are created from inception to the creation of a physical database are discussed in this section. It is useful to build a reference database that contains samples of each of these delive r a b l e ~s o that project teams know in advance what they are attempting to build. Requirements Document: This is the statement of the business requirements for the application being developed. This deliverable can contain narrative and any number of models or prototypes to capture and represent the business requirements. Conceptual Model/Subject Areas: This is a high-level view of the business subject areas that are within the scope of the data model (e.g., accounting, administration, billing, engineering). Logical Data Model: This contains entities, attributes, and business rules within the subject areas. The model also shows relationships between the entities. Key fields and foreign keys also can be identified in this model. Transaction Analysis: This is a list of transactions supported by the system, the entities (and possibly the fields) that are accessed by the transactions, and the frequency with which they are accessed. A create, read, update, and delete (CRUD) matrix is a useful input for helping with this analysis. Physical Data Model: This is a denormalized version of the logical data model that is optimized for performance under a specific technical environment and refined through the transaction analysis results. The physical data model usually is refined throughout a development cycle and is not finished until implementation. The physical data model contains physical objects such a s tables, fields, indexes, foreign keys, primary keys, views, user-defined data types, and rules. Object Model: An object model supports the logical data model. This often serves as an intermediate layer between an object-based user interface and a relational back-end database.

Database Development Methodology and Organization Validation Model: This is a cross-reference of models, such as process models, to the logical data model to prove its validity. It often includes a mapping between the logical data model with a user interface and reports to identify gaps. Conversion Strategy: This is a statement of the strategy used to convert data into a new application. The level of detail can vary signficantly. This could be anything from high-level principles to detailed conversion scripts.

Modeling tools are critical for the database development process. There are a number of tools with various add-ons that can be used in this process. Modeling tools should offer support for both data models and process models. It also is becoming more useful for modeling tools to support object models or to link to other tools that do. Tools that support reversereengineering from physical databases to generate logical data model or scripts are useful for organizations that require extensive changes to data structures (possibly following a corporate merger). There are many other tools that are useful in the database development process. Some of these include CASE tools, conversion tools, testing tools, and database server tools.

ORGANIZATION When staffing a project that involves a data initiative, it is necessary to fill specific roles. The roles defined in this section are generally specific to the data initiative. These roles often are complimented by other roles in full implementation projects. Projects that have high object-oriented content skew the organization towards object-modeling skillsets. Project Sponsor: Projects should not be initiated or conducted without a senior project sponsor who is positioned to remove obstacles and ensure that the project team has the full support they require to be successful. ProjectManager. The project manager is in charge of the entire project, including the data initiative. Business User: This person provides the business rules for the application, which are used to derive the entities and attributes necessary to save the data. Business Analyst The business analyst provides a critical link between the business user and the data architect by understanding the business requirements and translating them into technical words. Data Architect: This person has the responsibility of defining the data architecture. This could be distributed, central, standalone, or integrated with a sophisticated overall architecture.


Data Analyst: The data analyst works with the business analyst to build a consistent view of each element of the data. This person understands the linkage between the business and the individual items of data. Data Modeler: This person works with the data architect to build a logical relational data model and also may get involved in transforming the logical data model into a physical data model. Object Modeler: The object modeller becomes involved in projects to build an object model, including messages and methods. This person also may be responsible for mapping the object model to the corporate data model. Database Administrator: This person implements the physical database, maintains and optimizes the physical environment, restricts access to the database by controlling privilege levels for users, offers advice to the development team for converting the logical data model to the physical data model, and holds the overall responsibility for running the database environment on a data-to-day basis. Network Administrator: The network administrator maintains the physical network, has the responsibility for maintaining the integrity of the physical environment that supports the data environment, and operates at the operating system level and the hardware level. For example, this person would add more physical disk to support larger databases. Developer: This person uses the database(s) for application development. PITFALLS Misuse or misinterpretation of how methodologies should be executed can result in signficantly negative impacts to project timelines. It is not unusual for organizations to use methodologies as process charts or recipes without streamlining any of the activities. This can result in a considerable amount of wasted time as deliverables or activities are produced without an understanding of how they are leading toward a solution. Methodologies should be adjusted for specific projects. Activities or deliverables that are not necessary should be dropped from the project plan. Methodologies that are too complicated or difficult to learn and use frequently are avoided by project teams. There are some methodologies that may contain information for thousands of project contingencies. However, they require thousands of megabytes of storage or dozens of manuals to store. During tight project timeframes, such methodologies are sidelined quickly.

Database Development Methodology and Organization It is important to update methodologies over time. New project experiences and best practices should be included in the methodology at specific intervals. CONCLUSION Database development methodologies are a subset of full life-cycle methodologies. Project teams can access a third-party database development methodology or follow the high-level framework described in this chapter for database development. Database development methodologies also should support parallel development, iteration, high-user involvement, and be accompanied by a database of reusable templates or sample deliverables. References

Deloitte &Touche Consulting Group Framework for Computing Solutions. Maguire, S., Writing Solid Code. Microsoft Press, Redmond, WA, 1993. Purba, S., Developing Client/Server Systems Using Sybase SQL Server System l l . John Wiley & Sons, New York, 1995. Smith, P. N. Client/Server Computing. 2nd. ed., Sams Publishing, Indianapolis, IN, 1994. Willian, P., Effective Methods for Software Testing. John Wiley &Sons, New York, 1995.

Chapter 2

A Review of Database System Terminology

Marion G. Ceruti

MANYPUBLICATIONS, TECHNICAL MANUALS, AND MARKETING BROCHURES related t o databases originated from sources that exhibit a wide variety of training, background, and experience. Although t h e result has been an expanded technical vocabulary, the growth of standards - particularly with regard t o a comprehensive, uniformly accepted terminology has not kept pace with the growth in the technology itself. Consequently, t h e nomenclature used to describe various aspects of database technology is characterized, in some cases, by confusion and chaos. This is true for both homogeneous databases and for heterogeneous, distributed database systems. The state of imprecision in the nomenclature of this field persists across virtually all data models and their implementations. The purpose of this chapter is t o highlight some areas of conflict and ambiguity and, in some cases, t o suggest a more meaningful use of the terminology.

GENERAL DATABASE TERMS What Does the Word Data Mean? According t o Webster, t h e word data is a noun that refers t o things known o r assumed; facts o r figures from which conclusions can b e inferred; information. Derived from t h e Latin word datum, meaning gift o r present, data can be given, granted, o r admitted, premises upon which something can be argued or inferred. Although the word data is most frequently observed, the singular form, datum, is also a real or assumed thing used a s the basis for calculations.

The Department of Defense defines data a s a representation of facts, concepts, o r instructions in a formalized manner suitable for communication, interpretation, or processing by humans or by automatic means.

0-8493-9976-9/99/$0.00+$.50 B 1999 by CRC Press LLC

DATA DEVELOPMENT METHODOLOGIES, DEFINITIONS, AND STRATEGY The word data is also used as an adjective in terms such as data set, data fill, data resource, data management, or data mining. A data set is an aggregate of data items that are interrelated in some way. Implicit in both definitions of data is the notion that the user can reasonably expect data t o be true and accurate. For example, a data set is assumed to consist of facts given for use in a calculation or an argument, for drawing a conclusion, or as instructions from a superior authority. This also implies that the data management community has a responsibility to ensure the accuracy, consistency, and currency of data. Data Element vs. Data Item In an attempt to define database terms with a view toward practical applications, the Department of Defense POD) defines a data element as a named identifier of each of the entities and their attributes that are represented in a database. As such, data elements must be designed as follows: Representing the attributes (characteristics) of data entities identified in data models. According to functional requirements and logical (as opposed to physical) characteristics. According to the purpose or function of the data element, rather than how, when, where, and by whom it is used. With singularity of purpose, such that it has only one meaning. With well-defined, unambiguous, and separate domains. Other definitions are that a data element is data described at the useful primitive level; a data item is the smallest separable unit recognized by the database representing a real-world entity. What is clear from all these definitions is that there is considerable ambiguity in what these terms mean. The author proposes the following distinction between data element and data item: A data element is a variable associated with a domain (in the relational model) or an object class (in the object-oriented model) characterized by the property of atomicity. A data element represents the smallest unit of information at the finest level of granularity present in the database. An instance of this variable is adata item. A data element in the relational model is simply an attribute (or column) that is filled by data items commonly called the "data fill."

This distinction clarifies but does not preclude any of the other definitions. What Is a Database? The definitions for the term database range from the theoretical and general t o t h e implementation specific. For example, K.S. Brathwaite, H. Darwen, and C.J. Date have offered two different, but not necessarily in2-2

A Review o f Database System Terminology consistent, definitions of a database that are specific to the relational model. Darwen and Date build their definition on fundamental constructs of the relational model, and it is very specific to that model. Brathwaite employs a definition that is based on how databases are constructed in a specific database management system (DBMS). These definitions are discussed in the next section on relational database terms. Actually, the term database can have multiple definitions, depending on the level of abstraction under consideration. For example, A.P. Sheth and J.A. Larson define database in terms of a reference architecture, in which a database is a repository of data structured according to a data model. This definition is more general than that of either Brathwaite or Darwen and Date because it is independent of any specific data model or DBMS. It could apply to hierarchical and object- oriented databases as well as to relational databases; however, it is not as rigorous as Darwen and Date's definition of a relational database because the term repository is not defined. Similarly, P.J. Fortier et al., in a set of DoD conference proceedings, define a database to be a collection of data items that have constraints, relationships, and a schema. Of all the definitions for database considered thus far, this one is the one most similar to that of Sheth and Larson, because the term data model could imply the existence of constraints, relationships, and a schema. Moreover, Fortier et al. define schema as a description of how data, relationships, and constraints are organized for user application program access. A constraint is a predicate that defines all correct states of the database. Implicit in the definition of schema is the idea that different schemata could exist for different user applications. This notion is consistent with the concept of multiple schemata in a federated database system (FDBS). (Terms germane to FDBSs are discussed in a subsequent section.) L.S. Waldron defines database as a collection of interrelated files stored together, where specific data items can be retrieved for various applications. A file is defined as a collection of related records. Similarly, L. Wheeler defines a database as a collection of data arranged in groups for access and storage; a database consists of data, memo, and index files.

Database System vs. Data Repository Both of these terms refer to a more comprehensive environment than a database because they are concerned with the tools necessary for the management of data in addition to the data themselves. These terms are not mutually exclusive. A database system (DBS) includes both the DBMS software and one or more databases. A data repository is the heart of a comprehensive information management system environment. It must include not only data elements, but metadata of interest to the enterprise, data screens, reports, programs, and systems.

DATA DEVELOPMENT METHODOLOGIES, DEFINITIONS, AND STRATEGY A data repository must provide a set of standard entities and allow for the creation of new, unique entities of interest to the organization. A database system can also be a data repository that can include a single database or several databases. A. King et al. describe characteristics of a data repository as including an internal set of software tools, a DBMS, a metamodel, populated metadata, and loading and retrieval software for accessing repository data.

WHAT IS A DATA WAREHOUSE AND WHAT IS DATA MINING? B. Thuraisingham and M. Wysong discussed the importance of the data warehouse in a DoD conference proceeding. A data warehouse is a database system that is optimized for the storage of aggregated and summarized data across the entire range of operational and tactical enterprise activities. The data warehouse brings together several heterogeneous dat a b a s e ~from diverse sources in the same environment. For example, this aggregation could include data from current systems, legacy sources, historical archives, and other external sources. Unlike databases that are optimized for rapid retrieval of information during real-time transaction processing for tactical purposes, data warehouses are not updated, nor is information deleted. Rather, time-stamped versions of various data sets are stored. Data warehouses also contain information such as summary reports and data aggregates tailored for use by specific applications. Thus, the role of metadata is of critical importance in extracting, mapping, and processing data to be included in the warehouse. All of this serves to simplify queries for the users, who query the data warehouse in a read-only, integrated environment. The data warehouse is designed to facilitate the strategic, analytical, and decision-support functions within an organization. One such function is data mining,which is the search for previously unknown information in a data warehouse or database containing large quantities of data. The data warehouse or database is analogous to a mine, and the information desired is analogous to a mineral or precious metal. The concept of data mining implies that the data warehouse in which the search takes place contains a large quantity of unrelated data and probably was not designed to store and support efficient access to the information desired. In data mining, it is reasonable to expect that multiple, welldesigned queries and a certain amount of data analysis and processing will be necessary to summarize and present the information in an acceptable format.

Data Administrator vs. Database Administrator The following discussion is not intended to offer an exhaustive list of tasks performed by either the data administrator @A) or database admin-

A Review of Database System Terminology

istrator (DBA), but rather to highlight the similarities and essential distinctions between these two types of database professionals. Both data administrators and database administrators are concerned with the management of data, but at different levels. The job of a data administrator is to set policy about determining the data an organization requires to support the processes of that organization. The data administrator develops or uses a data model and selects the data sets supported in the database. A data administrator collects, stores, and disseminates data as a globally administered and standardized resource. Data standards on all levels that affect the organization fall under the purview of the data administrator, who is truly an administrator in the managerial sense. By contrast, the technical orientation of the database administrator is at a finer level of granularity than that of a data administrator. For this reason, in very large organizations, DBAs focus solely on a subset of the organization's users. Typically, the database administrator is, like a computer systems manager, charged with day-to-day, hands-on use of the DBS and daily interaction with its users. The database administrator is familiar with the details of implementing and tuning a specific DBMS or a group of DBMSs. For example, the database administrator has the task of creating new user accounts, programming the software to implement a set of access controls, and using audit functions. To illustrate the distinction between a data administrator and a database administrator, the U.S. Navy has a head data administrator whose range of authority extends throughout the entire Navy. It would not be practical or possible for an organization as large as the U.S. Navy to have a database administrator in an analogous role, because of the multiplicity of DBSs and DBMSs in use and the functions that DBAs perform. These conceptual differences notwithstanding, in smaller organizations a single individual can act as both data administrator and database administrator, thus blurring the distinction between these two roles. Moreover, as data models and standards increase in complexity, data administrators will increasingly rely on new technology to accomplish their tasks, just as database administrators do now.

RELATIONAL DATABASE TERMS Because relational technology is a mature technology with many practical applications, it is useful to consider some of the important terms that pertain to the relational model. Many of these terms are straightforward and generally unambiguous, whereas some terms have specific definitions that are not always understood.


A data set represented in the form of a table containing columns and rows is called a relation. The columns are called attributes,and the rows are called tuples. Darwen and Date define a tuple to be a set of ordered triples of the form where A is the name of an attribute, V is the name of a unique domain that corresponds to A, and 1 is a value from domain V called the attribute value for attribute A within the tuple. A domain is a named set of values. Darwen and Date also describe a relation as consisting of a heading and a body, where the heading is a set of ordered pairs, ;and the body consists of tuples, all having the same heading .An attribute value is a data item or a datum. In some respects, a relation is analogous to an array of data created outside a relational DBMS, such as in a third-generation language (3GL) program like C, FORTRAN, or Ada, in which the rows are called records and the columns are called fields. Waldron defines a field as a set of related letters, numbers, or other special characters, and defines a record as a collection of related fields. The interchangeability of the terms record and row has been illustrated by some of the major DBMS vendors in the way in which they report the results of a query to the user. Earlier versions of commercial DBMSs indicated at the end of a query return messages such as "12 records selected." Now, it is more common to see messages such as "12 rows selected" or "12 rows affected" instead. Relation vs. Relation Variable The correct manner in which the term relation should be used is according to the definition given previously, which specifically includes values v, from domain V. However, the term relation has not always been used correctly in the industry. Relation frequently is used as though it could mean either a filled table with data present (correct), or an empty table structure containing only data headers (incorrect). The confusion here stems from a failure to distinguish between a relation, which is a filled table with tuples containing attribute values, and a relation variable (or relvar), which is an empty table structure with only attribute names and domains from which to choose values. The values of a relation variable are the relations per se. This distinction becomes especially important when mapping between the relational and object-oriented data models. Database vs. Database Variable

In a manner similar to the relation-relvar dichotomy, a database variable is different from a database per se. A database variable (or dbvar) is a

A Review of Database System Terminology

named set of relvars. The value of a given dbvar is a set of specific, ordered pairs cR,r>,where R is a relvar and r (a relation) is the current value of that relvar, such that one such ordered pair exists for each relvar in the dbvar and that, taken together, all relvar values satisfy the applicable constraints (in particular, integrity constraints). A value of the dbvar that conforms to this definition is called a database. Some call this a database state, but this term is not used very often.

Database vs. DBMS As all the examples discussed thus far indicate, not all database terminology is as unambiguous as "rows" and "columns." Incorrect understanding of the fundamental concepts in database technology can lead to inconsistent terminology, and vice versa. DBMS Software Does Not Equal a Database. For example, databases frequently are described according to the DBMS that manages them. This is all well and good, as long as one realizes that references to an Oracle database and Sybase database refer to the databases that are managed using Oracle or Sybase software, respectively. Difficulty arises when this nomenclature results in the misconception that DBMS software is actually the database itself. The assumption that Informix, for example, is a database is as illogical as thinking that the glass is the same as the water in it.

Concept vs. Implementation in Relational Databases Darwen and Date's definition of a database, as well as that of other database researchers (some of whom are mentioned by name in this chapter and others who are not), does not require the presence of a DBMS. Conceptually, it is possible to have a database without a DBMS or a DBMS without a database, although obviously the greatest utility is achieved by combining the two. In the context of a specific DBMS environment, Brathwaite defines an IBM DB2 database as "a collection of table and index spaces where each table space can contain one or more physical tables." This definition is inconsistent with Date's definition because it allows for the possibility that the table spaces could be empty, in which case no data would be present. It is not clear that even relvars would be present in this case. That notwithstanding, if physical tables are present, Brathwaite's definition becomes an implementation-specific special case of Date's definition. (Substitute the word "must" for "can" to resolve the problem with Brathwaite's definition.) Except in the case where the vendor has specified default table and index spaces in the DBMS code, the database and index spaces are not actually part of the DBMS per se. The DBA needs to create both the database space and the index space using the DBMS software.



The topic of database normalization, sometimes called data normalization, has received a great deal of attention. As is usually the case, database normalization is discussed in the following section using examples from the relational data model. Here, the terms relation and table are used interchangeably. However, the design guidelines pertaining to database normalization are useful even if a relational database system is not used. For example, B.S. Lee has discussed the need for normalization in the object-oriented data model. Whereas the intent of this section is to introduce the correct usage of normalization terminology as it applies to database technology, it is not meant to be an exhaustive exposition of all aspects of normalization.

What Is Database Normalization? Strictly speaking, database normalization is the arrangement of data into tables. P. Winsberg defines normalization as the process of structuring data into a tabular format, with the implicit assumption that the result must be in at least first normal form. Similarly, Brathwaite defines data normalization as a set of rules and techniques concerned with: Identifying relationships between attributes Combining attributes to form relations (with data fill) Combining relations to form a database The chief advantage of database or data normalization is to avoid modification anomalies that occur when facts about attributes are lost during insert, update, and delete transactions. However, if the normalization process has not progressed beyond first normal form, it is not possible to ensure that these anomalies can be avoided. Therefore, database normalization commonly refers to further non-loss decomposition of the tables into second through fifth normal form. Non-loss decomposition means that information is not lost when a table in lower normal form is divided (according to attributes) into tables that result in the achievement of a higher normal form. This is accomplished by placing primary and foreign keys into the resulting tables so that tables can be joined to retrieve the original information.

What Are Normal Forms? A normal form of a table or database is an arrangement or grouping of data that meets specific requirements of logical design, key structure, modification integrity, and redundancy avoidance, according to the rigorous definition of the normalization level in question. A table is said to be in " X normal form if it is already in "X-l" normal form and it meets the additional constraints that pertain to level "X." In first normal form (lNF), related attributes are organized into separate tables, each with a primary key. A primary key is an attribute or set of

A Review of Database System Terminology attributes that uniquely defines a tuple. Thus, if a table is in lNF, entities within the data model contain no attributes that repeat as groups. W. Kent has explained that in INF, all occurrences of a record must contain the same number of fields. In lNF, each data cell (defined by a specific tuple and attribute) in the table will contain only atomic values. Every table that is in second normal form (2NF) also must be in INF, and every non-key attribute must depend on the entire primary key. Any attributes that do not depend on the entire key are placed in a separate table t o preserve the information they represent. 2NF becomes an issue only for tables with composite keys. A composite key is defined as any key (candidate, primary, alternate, or foreign) that consists of two or more attributes. If only part of the composite key is sufficient to determine the value of a non-key attribute, the table is not in 2NF. Every relation that is in third normal form (3NF) must also be in 2NF, and every non-key attribute must depend directly on the entire primary key. In 2NF, non-key attributes are allowed to depend on each other. This is not allowed in 3NF. If a non-key attribute does not depend on the key directly, or if it depends on another non-key attribute, it is removed and placed in a new table. It is often stated that in 3NF, every non-key attribute is a function of "the key, the whole key, and nothing but the key." In 3NF, every non-key attribute must contribute to the description of the key. However, 3NF does not prevent part of a composite primary key from depending on a non-key attribute, nor does it address the issue of candidate keys. Boyce-Codd normal form (BCNF) is a stronger, improved version of 3NF. Every relation that is in BCNF also must be in 3NF and must meet the additional requirement that each determinant must be a candidate key. A determinant is any attribute, A, of a table that contains unique data values, such that the value of another attribute, B, fully functionally depends on the value of A. If a candidate key also is a composite key, each attribute in the composite key must be necessary and sufficient for uniqueness. Winsberg calls this condition "unique and minimal." Primary keys meet these requirements. An alternate key is any candidate key that is not the primary key. In BCNF, no part of the key is allowed to depend on any key attribute. Compliance with the rules of BCNF forces the database designer to store associations between determinants in a separate table, if these determinants do not qualify as candidate keys. BCNF removes all redundancy due to singular relationships but not redundancy due t o many-to-many relationships. To accomplish this, further normalization is required. Fourth and fifth normal forms (4NF and 5NF) involve the notions of multivalued dependence and cyclic dependence, respectively. A table is in 4NF if it also is in BCNF and does not contain any independent many-to-many relationships.

DATA DEVELOPMENT METHODOLOGIES, DEFINITIONS, AND STRATEGY That notwithstanding, a table could be in 4NF and still contain dependent many-to-many relationships. A table is in 5NF if it is also in 4NF and does not contain any cyclic dependence (except for the trivial one between candidate keys.) In theory, 5NF is necessary to preclude certain join anomalies, such as the introduction of a false tuple. However, in practice, the large majority of tables in operational databases do not contain attributes with cyclical dependence.

What Are Over-Normalization and Denormalization? Over-normalization of a table results in further non-loss decomposition that exceeds the requirements to achieve 5NF. The purpose of this is to improve update performance. However, most operational databases rarely reach a state in which the structure of all tables has been tested according to 5FN criteria, so over-normalization rarely occurs. Over-normalization is the opposite of denormalization, which is the result of intentionally introducing redundancy into a database design to improve retrieval performance. Here, the database design process has progressed to 3NF, BCNF, 4NF, or even to 5NF. However, the database is implemented in a lower normal form to avoid time-consuming joins. Because the efficiency of "select" queries is an issue in operational systems, denormalization is more common than over-normalization. The first six normal forms (including BCNF) are formal structures of tables that eliminate certain kinds of intra-table redundancy. For example, 5NF eliminates all redundancy that can be removed by dividing tables according to attributes. Higher normal forms exist beyond 5NF. They address theoretical issues that are not considered to be of much practical importance. In fact, Date has noted that it is not often necessary or desirable to carry out the normalization process too far because normalization optimizes update performance at the expense of retrieval performance. Most of the time, 3NF is sufficient. This is because tables that have been designed logically and correctly in 3NF are almost automatically in 4NF. Thus, for most databases that support real-time operations, especially for those that have tables with predominantly single-attribute primary keys, 3NF is the practical limit. Note that a two-attribute relation with a singleattribute key is automatically in the higher normal forms.

DISTRIBUTED, HETEROGENEOUS DATABASE NOMENCLATURE What Is a Distributed Database? Date defines a distributed database as a virtual database that has components physically stored in a number of distinct "real" databases at a number of distinct sites.

A Review of Database System Terminology Federated Database Systems vs. Multidatabase Systems. M. Hammer and D. McLeod coined the term federated database system to mean a collection of independent, preexisting databases for which data administrators and database administrators agree to cooperate. Thus, the database administrator for each component database would provide the federation with a schema representing the data from his or her component that can be shared with other members of the federation.

In a landmark paper ("Federated Database Systems for Managing Distributed, Heterogeneous and Autonomous Databases," ACM ComputingSurueys, Vol. 22, No. 3, September 1990), Sheth and Larson define FDBS in a similar but broader architectural sense to mean a collection of cooperating but autonomous component database systems that are possibly heterogeneous. They also define a nonfederated database system as an integration of component DBMSs that is not autonomous with only one level of management, in which local and global users are not distinguished. According to Sheth and Larson's taxonomy, both federated and nonfederated database systems are included in a more general category called multidatabase systems. These multidatabase systems support operations on multiple-component DBSs. Sheth and Larson further divide the subcategory of FDBS into two types: loosely coupled and tightly coupled FDBS, based on who creates and maintains the federation and how the component databases are integrated. If the users themselves manage the federation, they call it a loosely coupled FDBS; whereas, if a global database administrator manages the federation and controls access to the component databases, the FDBS is tightly coupled. Both loosely coupled and tightly coupled FDBSs can support multiple federated schemata. However, if a tightly coupled FDBS is characterized by the presence of only one federated schema, it has a single federation. The term multidatabase has been used by different authors to refer to different things. For example, W. Litwin et al. have used it to mean what Sheth and Larson call a loosely coupled FDBS. By contrast, Y. Breitbart and A. Silberschatz have defined multidatabase to be the tightly coupled FDBS of Sheth and Larson. Sheth and Larson have described additional, conflicting use of the term multidatabase. The terms loosely coupled and tightly coupled FDBSs have also been used to distinguish between the degree to which users can perceive heterogeneity in an FDBS, among other factors. In this system of nomenclature (devised by this author and M.N. Kamel), a tightly coupled FDBS is characterized by the presence of a federated or global schema, which is not present in a loosely coupled FDBS. Instead of a global schema, loosely coupled FDBSs are integrated using other software, such as a user interface with a uniform

DATA DEVELOPMENT METHODOLOGIES, DEFINITIONS, AND STRATEGY "look and feel" or a standard set of queries used throughout the federation, thus contributing to a common operating environment. In this case, the autonomous components of a loosely coupled FDBS are still cooperating to share data, but without a global schema. Thus, the users see only one DBS in a tightly coupled FDBS, whereas they are aware of multiple DBSs in the loosely coupled FDBS. Here, the tightly coupled FDBS obeys Date's rule zero, which states that to a user, a distributed system should look exactly like a nondistributed system. Given this manner in which to characterize an FDBS, a hybrid FDBS is possible for which some of the component DBSs have a global schema that describe the data shared among them (tightly coupled), but other components do not participate in the global schema (loosely coupled). An Expanded Tcuconorny. An expanded taxonomy is proposed to provide a more comprehensive system to describe how databases are integrated, and to account for the perspectives of both the data administrator and the users. Essentially,most aspects of Sheth and Larson's taxonomy are logical and should be retained. However, instead of using Sheth and Larson's terms for tightly coupled federated database and loosely coupled federated database, the terms tightly controlled federated database and loosely controlled federated database, respectively, should be substituted.

This change focuses on the absence or presence of a central, controlling authority as the essential distinction between the two. In this case, the terms tightly coupled and loosely coupled can then be applied to describe how the user, rather than the data administrator, sees the federation. Given this change, the coupling between components in a federated database will describe how seamless and homogeneous the database looks to the users and applications. The expanded taxonomy can accommodate federated databases that differ widely in their characteristics. For example, if a tightly controlled federated database is tightly coupled, the global data administrator and the global database administrator have exercised their authority and expertise to provide a seamless, interoperable environment that allows the federation's users to experience the illusion of a,single database for their applications and ad-hoc queries. A tightly controlled federated database can also be loosely coupled, in which case the global data administrator allows the users of the federation to see some heterogeneity with respect to the component databases. Both conditions are within the realm of possibility. However, a loosely controlled federated database is almost certain to be loosely coupled. This is because a loosely controlled federated database lacks a central authority capable of mediating disputes about data representation in the federat-

A Review o f Database System Terminology

ed schema and enforcing uniformity in the federation's interfaces to user applications. A loosely controlled federated database is not likely to be tightly coupled. Local or Localized Schema vs. Component Schema vs. Export Schema. A l ocal or localized database generally starts as a stand-alone, nonintegrated database. When a local, autonomous database is selected for membership in a federation, a local schema is defined as a conceptual schema of the component DBS that is expressed in the native data model of the component DBMS.

When the local database actually becomes a member of a federated database, it is said to be a component database. The schema associated with a given database component is called a component schema, which is derived by translating a local schema into the common data model of the FDBS. An export schema represents the subset of the component schema that can be shared with the federation and its users. Similarly, Date defines a local schema as the database definition of a component database in a distributed database. Federated Schema vs. Global Schema vs. Global Data Dictionary. Af e d e rated schema is an integration of multiple export schemata. Because the distributed database definition is sometimes called the global schema, federated schema and global schema are used interchangeably. A global data dictionary is the same as a global schema that includes the data element definitions as they are used in the FDBS. A data dictionary is different from a schema, or database structure specification, because a data dictionary contains the definitions of attributes or objects, not just the configuration of tables, attributes, objects, and entities within that structure.

It is especially important to include the data element definitions with the export schemata when forming a federated database in which multiple data representations are likely. Simply having a collection of database structures is insufficient to complete a useful federated schema. It is necessary to know the meaning of each attribute or object and how it is construed in the component database. Middleware vs. Midware. In a three-tier clientiserver architecture designed to connect and manage data exchange between user applications and a variety of data servers, the middle tier that brokers transactions between clients and servers consists of middleware, which is sometimes called midware.

P. Cykana defines middleware as a variety of products and techniques that are used to connect users to data resources. In his view, the middle-

DATA DEVELOPMENT METHODOLOGIES, DEFINITIONS, AND STRATEGY ware solution is usually devoted to locating and finding data rather than to moving data to migration environments. In addition, Cykana describes two options for middleware, depending on the degree of coupling between the user and the data resource. Loosely coupled middleware products allow flexibility in specifying relationships and mappings between data items, whereas tightly coupled middleware products allocate more authority to standard interfaces and database administrators. Each option has its advantages and disadvantages, as follows: Loosely coupled middleware. This type of middleware does not require the migration or legacy data structures to be modified, but it allows users to access multiple equivalent migration systems transparently with one standard interface. Its disadvantage is that it does not prevent multiple semantics and nonstandard structures. Tightly coupled middleware. This option represents a more aggressive strategy that combines applications program interface (API) and graphical user interface (GUI) technologies, data communications, and data dictionary design and development capabilities t o provide distributed data access. Data standardization and reengineering are required. The concept of loose and tight coupling to middleware is somewhat similar to, but also differs slightly from, the loose and tight coupling between data resources as discussed by Sheth and Larson and other researchers. In the case of middleware, the coupling occurs between software at different tiers or layers (between the middle translation layer and the data servers); whereas, in the case of an FDBS, the coupling occurs between data servers that reside at the same tier. (However, this difference does not preclude software that achieves the coupling between data servers from being located in the middle tier.) G.V. Quigley defines middleware as a software layer between the application logic and the underlying networking, security, and distributed computing technology. Middleware provides all of the critical services for managing the execution of applications in a distributed clientlserver environment while hiding the details of distributed computing from the application tier. Thus, midware is seen in a critical role for implementing a tightly coupled FDBS. Similarly, Quigley considers middleware to be the key technology to integrate applications in a heterogeneous network environment. Database Integration vs. Database Homogenization. Many organizations in both industry and government are interested in integrating autonomous (sometimes called "stovepipe") databases into a single distributed, heterogeneous database system. Many terms describe the various aspects of

A Review o f Database System Terminology this integration. The multiplicity of terminology occurs because of the many ways in which databases can be integrated and because of the many simultaneous efforts that are underway to address integration problems. Because the degree to which database integration takes place depends on the requirements of the organization and its users, the term integration, as it is used in various contexts, remains rather vague. For people whose fields of expertise are outside the realm of database technology, it is necessary to hide the specific details of database system implementation behind midware layers and a user interface that together create the illusion of a single, unified database. By contrast, more experienced users with knowledge of multiple DBMS can function efficiently in an environment that preserves some distinctions between the database components. Within all architectural options, database integration, in its broadest sense, refers to the combination and transformation of database components into a database system that is homogeneous on at least one level (e.g., the data level, the schema level, the program interface level, or the user-interface level). Such an integrated database system must satisfy the primary goals of interoperability between database system components, data sharing, consistent data interpretation, and efficient data access for users and applications across multiple platforms.

K. Karlapalem et al. describe the concept of database homogenization as the process of transforming a collection of heterogeneous legacy information systems onto a homogeneous environment. Whereas they do not define what they mean by the term homogeneous environment, they list three goals of database homogenization: To provide the capability to replace legacy component databases efficiently To allow new global applications at different levels of abstraction and scale to be developed on top of the homogenized federated database To provide interoperability between heterogeneous databases so that previously isolated heterogeneous localized databases can be loosely coupled This definition of database integration explicitly includes multiple architecture~and implementations; by contrast, the description of database homogenization is associated with loose rather than tight coupling of localized databases into a homogeneous environment. Sometimes the term database normalization is used incorrectly to mean database integra-


Interoperability vs. Interoperation. The conditions necessary for inter-

operability include:

Interconnectivity via the necessary networking facilities

DATA DEVELOPMENT METHODOLOGIES, DEFINITIONS, AND STRATEGY Resolution of system heterogeneity Resolution of semantic heterogeneity Derivation and integration of schemata and views There are three levels of heterogeneity, including platform heterogeneity, data model heterogeneity, and semantic heterogeneity. Excluding semantic heterogeneity, the term system heterogeneity is seen to be some combination of platform heterogeneity (e.g., different DBMS software and implementation) and data model heterogeneity (e.g., schemata, query languages, integrity constraints, and nullness requirements). Because Karlapalem et al. have already listed the integration of schemata a s an item separate from system heterogeneity, system heterogeneity logically should refer to the differences between DBMS vendors, transaction processing algorithms, query languages, query optimization techniques, integrity constraints, and nullness requirements. If this definition is assumed for system heterogeneity, the necessary conditions for database interoperability listed above become sufficient conditions. Similarly, computer system heterogeneity and data management system heterogeneity must be resolved a s a requirement for interoperability among existing information systems. The achievement of database interoperability simply supplies users and applications with the ability to interoperate in a common data environment. It does not guarantee that interoperation will occur. Database interoperation results when users and applications take advantage of a common, integrated environment t o access, share, and process data across multiple databases. Legacy Information System vs. Migration Information System. A u t o n Omous systems that become candidates for integration into a more modern, global, and distributed system sometimes have been called migration systems. These systems are supported by migration information systems with migration databases.

The term migration databases indicates unambiguously that the database in question has been chosen to be included in some form of a modern database system, especially a distributed system such as an FDBS. By contrast, the term legacy information system has been used in two different ways. At one extreme, some people use legacy information system and legacy database to be synonymous with migration information system and migration database, respectively. Others have referred to a legacy information system as if it were not a migration information system and is therefore deliberately excluded from the final integrated database configuration. This is the opposite extreme.

A Review of Database System Terminology

More commonly than in the extreme cases, a subset of legacy data is deemed important to the users of a shared data resource. This means that some or all of the data in a legacy information system may be migrated during a database integration effort. For example, Cykana describes steps in the data integration process that start with the movement and improvement of data and progress to the shutdown of legacy systems. Karlapalem et al. refer to the difficulty of migrating legacy information systems to a modern computer environment in which some difference is presumed to exist between the legacy system and the modern system. The author recommends that the following terminology be adopted as standard: Legacy data and legacy information system should refer to the original data and original format, as maintained in the original, autonomous information system before any modification or migration to a new environment has occurred. Migration data and migration information system should be used to describe the subset of the legacy data and software that has been chosen to be included into a new (and usually distributed) information resource environment. When data and software are modified to accommodate a new environment, they should be called migration instead of legacy.

TERMS ASSOCIATED WITH SEMANTIC HETEROGENEITY Semantic heterogeneity refers to a disagreement about the meaning, interpretation, or intended use of the same or related data or objects. Semantic heterogeneity can occur either in a single DBS or in a multidatabase system. Its presence in a DBS is also independent of data model or DBMS. Therefore, the terminology associated with this problem is discussed in a separate section.

Semantic Interoperability vs. Database Harmonization The terms database integration and interoperability were discussed previously in a general context. For distributed, heterogeneous database systems to be integrated in every respect, semantic heterogeneity must be resolved. Problems associated with semantic heterogeneity have been difficult to overcome, and the terminology to describe semantic heterogeneity has evolved accordingly. For example, R. Sciore et al. define semantic interoperability as agreement among separately developed systems about the meaning of their exchanged data. Whereas the exact meaning of the term database harmonization is not clear, one can infer that the goal of database harmonization must be related to providing an environment in which conflicts have been resolved be-

DATA DEVELOPMENT METHODOLOGIES, DEFINITIONS, AND STRATEGY tween data representations from previously autonomous systems. This definition further implies that the resolution of semantic heterogeneity is a prerequisite for database harmonization. Although a more precise definition of database harmonization is needed, it appears to be related to the idea of semantic interoperability.

Strong and Weak Synonyms vs. Class One and Class Two Synonyms A synonym is a word that has the same or nearly the same meaning as another word of the same language. Because a metadata representation will include more attributes (e.g., data element name, type, length, range, and domain) than ordinary nouns, it was necessary to consider various levels of similarity and therefore, levels of synonymy. M.W. Bright et al. have described the concept of strong and weak synonyms. Strong synonyms are semantically equivalent to each other and can be used interchangeably in all contexts without a change of meaning, where as weak synonyms are semantically similar and can be substituted for each other in some contexts with only minimal meaning changes. Weak synonyms cannot be used interchangeably in all contexts without a major change in the meaning - a change that could violate the schema specification. This concept is similar to one (introduced by the author and Kamel) that states that there are two classes of synonym abstraction: Class One and Class Two. Class One synonyms occur when different attribute names represent the same, unique real world entity. The only differences between Class One synonyms are the attribute name and possibly the wording of the definition, but not the meaning. By contrast, Class Two synonyms occur when different attribute names have equivalent definitions but are expressed with different data types and dataelement lengths. Class Two synonyms can share the same domain or they can have related domains with a oneto-one mapping between data elements, provided they both refer to the same unique real-world entity. The concept of a strong synonym is actually the same as that of a Class Two synonym because both strong synonyms and Class Two synonyms are semantically equivalent and they can be used interchangeably because they have the same data element type and length. By contrast, the concept of a Class Two synonym includes (but is not limited to) the concept of a weak synonym because the definition of a weak synonym seems to imply a two-way interchange in some contexts. The main difference is that the interchangeability of Class Two synonyms is determined not only by semantic context, but also by the intersection of their respective domains, as well as their data types and lengths.

A Review o f Database System Terminology Class Two synonyms allow for a one-way, as well as a two-way, interchange in some cases, whereas the "each-other" part in the definition of weak synonyms seems to preclude a one-way interchange. For example, a shorter character string can fit into a longer field, but not vice versa.

SUMMARY This chapter presents a review of the rapidly growing vocabulary of database system technology, along with its conflicts and ambiguities. The sol u t i o n s offered a d d r e s s s o m e of t h e p r o b l e m s e n c o u n t e r e d in communicating concepts and ideas in this field. This effort is intended to be a first step toward the development of a more comprehensive, standard set of terms that can be used throughout the industry. More work is needed to identify and resolve the differences in interpretation between the many terms used in data administration, database development, database administration, database research, and marketing as they occur in industry, government, and academia.

ACKNOWLEDGMENTS This work was created by a U.S. government employee in the course of employment and is therefore in the public domain.

Chapter 3

Developing a Global Information Vision Tim Christmann



increasing professional pressures as their organizations strive to become truly global. In an effort to leverage their resources around the world and serve global customers, companies are turning to information technology as a means of achieving these objectives. IT capabilities have evolved to the point where some would argue that IT can, for the first time, be a strategic enabler in helping a company become truly global. In order for IT to be a strategic enabler for a company on a global basis, all major IT investments must be aligned with the business goals and strategies of the organization. One step toward aligning IT investments with business objectives and strategies is to bring key business and IT leaders to a common understanding or vision of how information will enable the company's strategy and future competitive position in the marketplace.

What Does It Mean To Be "Global"? In the age of economic globalization, it is not uncommon for companies to expand beyond domestic boundaries into foreign markets in search of new growth opportuntities. If a company has operations around the world, does this make the company a "global business"? In short, the answer to this question is no. Being truly global involves mobilizing company resoures around the world and presenting a common face for the company's key stakeholders - customers, suppliers, shareholders, and employees. What Is a Global Information Vision? Let's start with what an information vision is not. It is not a statement conceived in the office of the C10 and posted above the door in the IT department. It is not about having the IT function achieve functional excellency through the use of leading edge technologies. 0-8493-9976-9/93/$0.00+S.50 O 1999 by CRC PES LLC



Information Vision


IT Operations


l/ Application



0scope of Visioning

Exhibit 1. Information vision.

An information vision is a clear statement of how an enhanced information base will help the business achieve its strategic objectives. It is stated in business terms and contains a clear link to the competitive positioning of the business. A vision also indicates business outcomes that are recognizable by business and IT leaders alike. As shown in Exhibit l , within the IT decision-making hierarchy, the Information Vision provides a link between the business strategy and objectives and the IT strategy. Bringing the key business stakeholders to agreement on the Information Vision provides direction throughout the company on where IT investments should be focused to bring the most value to the company.

OBJECTIVES - "PAY-OFF IDEA" Why Do You Need an Information Vision i n Your Company? There are a number of objectives that you should achieve as you facilitate the development of your company's information vision: 1. Raise the understanding at all levels in the company of how information and information technology can and will add value for the company. - Often executives see the enormous operational and capital expenditures for IT and wonder what value they are getting for their mon-

Developing a Global Information Vision ey. They fear that deployment of technology solutions may be based on an underlying desire to have the latest technological innovation rather than the pure business value of the investment. Demonstrating t h e value of IT investments does not s t o p once an information vision has been established. However, developing an information vision does put into clear business terms how information does and will add value to the business. - Do your business decision-makers think about information or IT when they are devising their business strategies? Business decision-makers often do not fully understand the capabilities of IT or do not think about how an enhanced information base can enable their business strategies. Engaging key business leaders in the exercise of developing a vision for information in the company will raise their level of awareness. 2. Promote better alignment of IT projects and the business objectives. - How often do you hear about companies that have sizeable investments in state-of-the-art systems t o improve an area or function that is not core t o the business? Afterwards they often question how such an enormous investment of capital and human resources has actually changed the company's competitive position? By engaging key business and IT leaders in developing an information vision linked to clear business outcomes, a common understanding is reached with respect to where the value-adding opportunities exist for IT projects. Any project or opportunity not in line with the vision comes immediately under question. 3. Keep IT personnel focused on achieving an enhanced information base for the company rather than implementing new technologies for technology-sake. - The visioning exercise creates a better focus on the "I" rather than the 'IT" in Information Technology. By focusing on information, it forces people to question technology investments that do not result in a sufficient improvement to the base of information that will enhance the company's competitive position. 4. Develop an understanding of how a common information vision can benefit the global units and the company as a whole - not just another headaffice exercise. - Often the words of a vision or mission mean little to those not directly involved in composing the statements. Especially if your company operates businesses around the world that have a certain amount of local autonomy, these units may offer some resistance if their interests have not been properly represented in developing the vision or they fail t o realize how t h e vision improves the position of their business. For this reason, it can be beneficial to indicate how enhanced use of information will benefit the local units as well. 3-3


Phase 1: Assessment Understand IT Issues & Concerns

Phase 2: Development


Business Objectives

Develop Vision Framework Gain Input From Business & IT Leaders Develop Draft Vlslon

Galn Executive

Phase 3: Change Strategy


Exhibit 2. The process for developing an information vision.

THE PROCESS To develop an information vision effectively, it is important to understand that perhaps the greatest value is in the process of developing the Vision rather than in the words of the Vision itself. In order to gain the full value of this exercise, the right level of people in the organization, from both the business and IT community, must be involved in developing the Vision. Having the right people involved, especially from the business community, helps to ensure relevant content, organizational buy-in and business ownership. To ensure relevancy and ownership, the vision must be articulated using words that the business community understands, therefore t h e words themselves must come from key business leaders. Exhibit 2 outlines the process for developing a Global Information Vision. Exhibit 3 provides the Critical Success Factors for making this a reality. PHASE 1: ASSESSMENT If you already have a clear understanding of the current situation of information and information technology in the company then this phase can be condensed. However, it is important to note that there are distinct benefits

Developing a Global Information Vision Exhibit 3. Critical Success Factors The following points represent critical success factors in ensuring successful development of an information vision: The right level of sponsorship It is important that this initiative be championed at the highest level in the organization in order to have the desired impact. An information vision should be viewed as an extention of the business strategy by stating how information will help the company achieve its business objectives. Therefore, a successful information visioning initiative will have a sponsor positioned at the same level in the company as the sponsor of a business strategy intiative. These intiatives might even be sponsored by the same individual. Involvement from key business leaders The objectives of the visioning process are aggressive given that they often require the thinking and decision-making of management and staff throughout the company to be changed. One of the best ways to achieve buy-in from these people is to involve them as much as possible in the development process. Executive involvement is key. These people need to be involved to ensure that the vision is clearly aligned with the direction of the business. Also, their involvement lends credibility and importance to the intiative. Senior executives are often somewhat removed from decisions made at lower levels in the company, especially in a global company where decision-making is often decentralized. Therefore, it can be beneficial to have management involvement in the development process from various business areas and geographic regions. Choose representatives from the business areas or regions that are seen as critical to the business and individuals who are viewed as leaders in their respective areas. The right language It is critical for the vision to be perceived as business owned and business led if the stated objectives are to be achieved. Therefore, the vision must be stated in business terms. Linked to business outcomes Linking the vision to specific business outcomes helps people to understand exactly how improving the information base in the company results in business value, especially if they are not involved in developing thevision. It also may help to reveal in which areas IT projects may provide the greatest opportunity in terms of positioning information as a strategic enabler for the business.

in conducting a formal assessment t o ensure that your understanding is complete. As part of the assessment, there are three main questions that need t o be answered: 1. What is the Current State of the IT?

The assessment phase provides an opportunity t o gain a clear understanding of the current situation and bring t o the surface any issues or concerns about t h e current state of information o r information technology in t h e company. These issues o r concerns represent not only areas of improvement, they may also represent significant barriers t o changing the way peo-

DATA DEVELOPMENT METHODOLOGIES, DEFINITIONS, AND STRATEGY ple think about the ability of the information and information technology to add value to the business. The business community needs to know that their concerns are being heard. Soliciting input from key business leaders also provides a prime opportunity to begin developing buy-in and ownership for the Vision itself. It is also important to acknowledge past successes or progress that the company has made in IT projects. These successes are often not well known throughout the company and serve to boost the company's collective confidence in its ability t o deliver business value through IT.

2. What are the Major External Forces or Pressures Facing IT? If possible it is beneficial to understand how competitors in the same industry or companies with similar operations are using IT. Presenting this information may serve two purposes. First, if it is determined that competitors are further advanced in the way they are employing IT, it may serve as a strong imperative for change. Exploring how similar companies use IT to gain competitive advantage can also raise the level of education within the company, heightening awareness of what is possible. In addition, it may be beneficial to articulate what the current possibilities are in IT today. People may ask, 'why does the company need a global vision about information?' It is important for people to understand that for the first time, global communications and real-time information sharing are possible and can deliver significant business value. 3. What are the Business Vision, Objectives, and Strategy? This third question is perhaps the most important for a number of reasons. First of all, the information vision must be aligned with the vision of the company as a whole. Often the company vision is not explicitly stated or well understood. Therefore, it is important to conduct interviews with key business leaders to gain a clear understanding of where the business leadership intends to take the company in the future and what the critical success factors are for the company as a whole and its core business areas. Secondly, conducting interviews with key business leaders helps build ownership for the vision within the business community. People are more likely to champion the vision when they are part of its development. Finally, it is critical for the vision itself be be stated in business terms. Exhibit 4 provides an example of typical findings that may result from the Assessment Phase.

The Change Imperative One of the most important outcomes of the Assessment phase is to build a clear understanding of the need for the company t o change. The change imperative must be clearly articulated in terms of issues and challenges

Developing a Global Information Vision



K Issues

~ e c n n @ ~ , cd ab el r s l r y . resulbng in n.gher wslo and ci'fmhes n app ymg osst practoces and g.obal pmcesseo Increasing IT msts IT costs are inaeashg signihnlly yet business value is not clearly urderalood Project redundancy- simllar paects mnducted around the world af one another ~ndependenl Strategic alignment IT investments am Onen noi linked lo business obiectives



New I

.. .


Lowzosl provider Global organtabon Unlls wamng together to ach~evemarkel leadersh~p Cos1reductianthrough: Sharing of 'best pradioes'and expetlise across business units Business fundions shared aaoss business units Devebp a 'learningorganlzalbn' through knowledge sharing Devebp cbser customerlsuppiier relationships




Rdlable ~mununicatanswlth anybody, a w h e r e around the globe OM boln nlenally &sseslfund (across bus noss units) and enernaly (wim rxlstomers/suppliersl shareholddcommun~ty) Real-hme, inlellient techmiogy Ulat enabling optimization of opeatlms Powerful end-user mmpuling capabilily hat can enhance end-user dedsimma~ng

Exhibit 4. Typical findings from the assessment phase.

facing IT, changing business objectives and strategies, and the new IT capabilities that are available. It is important that the executive group understands the outcomes of the Assessment phase and to agree on the need for change. PHASE 2: DEVELOPMENT Different approaches may be used to develop the vision. Perhaps the most effective approach is to facilitate a group of key business leaders, including senior executives and key management personnel through each of the following steps. Although effective, this approach is often very difficult to execute especially if the key business leaders are situated around the world. It important to keep in mind that having active involvement from as many key business leaders as possible is critical to the success of the initiative. Therefore, if it is not possible to conduct steps 1 through 4 as one group, it may be necessary to break the group of targeted participants into smaller focus groups. Step 1: Develop Vision Framework Establishing a framework may not be essential but can be beneficial both in developing the vision and communicating it. A framework can provide an effective structure in which to organize ideas regarding how an enhanced information base can enable the business to achieve it's business




l Low


"Help deliver value to the company as a whole"



Exhibit 5. Information vision framework.

objectives. The framework can also prove very useful in communicating how the vision can be linked to specific business outcomes. Exhibit 5 provides an example of a framework that could be used for a global company that is looking for ways t o articulate how a common vision for an enhanced information base will not only improve the performance of the company a s a whole, but also to improve the performance of the individual business units around the world. Step 2: Gain Input from Key Stakeholders Depending on the audience, different techniques can be used t o gain input from key stakeholders. Often it is difficult t o obtain extensive time with top executives. Therefore it may beneficial t o include specific questions during the executive interviews in the Assessment phase that will provide insight into how enhanced information may better enable the business t o achieve its objectives. The following are examples of specific questions that could be used t o facilitate input from the key business leaders: How can information help the company attain its vision? What information would make a difference in t h e various business units? For customers? For managers? What impediments exist today t o using information to add value t o the organization and t o make a difference t o the bottom line?

Developing a Global Information Vision Exhibit 6. Examples of Business Outcomes from Developing an Enhanced Information Base An enhanced information base will deliver value to the company as a whole by:

An enhanced information base will help to deliver value to the local units by:

enabling optimization of operations between business units enabling a learning organization through experience sharing capturing knowledge about customer needs so that, whenever pertinent, it flows through the company's value chain allowing the internal organization to be transparent to the customer around the world enabling people to be more a part of the company as well as their local groups; people will then act in the common interest of global company rather than only in the interest of their own group at the expense of the others decreasing complexity thereby decreasing costs and improving customer service facilitating organizational change; reducing organizational boundaries; enabling the creation of virtual teams increasing decision-making and implementation speed

enabling optimization of operations within each business unit capturing and exploiting innovative ideas and opportunities more quickly, more easily tapping into global expertise for problem solving tapping into global supplier options to reduce costs improving operational stability, consistency and reliability enabling production capacity optimization increasing decision-making and implementation speed

Using the sample framework presented in Step 1, Exhibit 6 provides some examples of possible ways to add value or enable business strategies. These examples, when linked to the vision, provide clear business outcomes that may result from developing an enhanced information base. Step 3: Develop Draft Information Vision

This step involves developing a simple statement that combines the strategic objectives and critical success factors of the business with the feedback that has been given regarding how the business could change as a result of having an enhanced information base. An example of this is shown in Exhibit 7. How the draft Information Vision itself is assembled depends largely on the development approach that has been chosen. If all key business leaders are in one room then the words can be drafted by the group as a whole. If this approach is not possible, or if there is already general concensus among the participants on the business outcomes that can be achieved through an enhanced information base, the vision can be assembled and presented to the key business leaders for review and approval.


Global Information Vision statement...

Exhibit 7. Assembling the information vision.

Step 4: Gain Executive Acceptance It is important that acceptance b e given for the vision by the various business leaders involved in developing it. Once again, the vision must be business-owned and business-led. Acceptance will help t o ensure that each of the leaders involved will serve a s a champion for building an enhanced information base in their respective business areas or regions.

PHASE 3: CHANGE STRATEGY A successfully executed Information Vision development process will generate a significant amount of awareness among key business leaders of how information can be a strategic enabler for the company. However, it is important t o build on t h e momentum that has been achieved and have a plan for making all company management aware of the vision for how information will be leveraged to create business value. Do not make the mistake of relying on passive or informal communication methods. Instead, a detailed communication plan should b e developed t o ensure that all key stakeholder groups are aware of the vision. Exhibit 8 highlights a simple approach for developing a communication plan. Key Messages First you must identify the key messages that you wish the target audience s t o hear and understand. Given the visioning process that has just been completed, typical messages would include:

Developing a Global Information Vision r


Identify Target Audiences

ldentify Key Messages \


Communication Methods



Develop 'r Communication Plan 4

Exhibit 8. Approach to developing communication plan.

Business Vision and Strategy - pick key phrases that link directly to the Information Vision Information Vision - emphasize that the statement has been developed and approved by key business leaders Business Outcomes - state the business outcomes that can result from developing an enhanced information base are key to making people understand how the vision will change the business

Audiences Generally, all decision-makers in the company should be made aware of the key messages coming out of the visioning exercise. For the purposes of the communication plan, it is important to specifically identify those audiences that will require targeted communications. Once these audiences have been identified, it is a worthwhile exercise to determine the roles of each group with regard t o their use of information and IT and gain an understanding of their specific communication requirements. For example, some groups may simply need to be made aware of the key messages while others may need to incorporate this thinking into their decision-making. The level of communication required for a specific audience will dictate the method used. Communication Methods Typically, there are a variety of communication methods available within a company ranging from media tools, such as e-mail or company newsletters, to face-to-face communications, such as executive presentations. It may be necessary to create some specific communication opportunities to match the objectives of the visioning initiative. Compiling a list of the various methods available will help in developing the communication plan.

DATA DEVELOPMENT METHODOLOGIES, DEFINITIONS, AND STRATEGY Developing the Communication Plan Assembling the communication plan involves determining which methods will be used t o communicate the key messages to the target audiences. Conclusion and Next Steps Once the Information Vision has been developed and the communication plan has been launched, you may ask yourself, "where to from here?" Once again, given the momentum established during the visioning exercise, it is important t o build on this momentum. The following are typical initiatives o r deliverables that may provide further value to your company. Information Strategy. Building on the stated business outcomes from the visioning exercise, an information strategy would provide further detail regarding what the specific information requirements are t o achieve these outcomes. The strategy would answer questions such as: What knowledge is critical t o the success of the company? How can this knowledge be leveraged further? What are the key strategic and operational decisions in t h e company? What information would enable these decisions t o be made more effectively? An information strategy also provides detail regarding where IT investments should be targeted t o deliver the greatest value t o t h e company. Information Management Plan. Once people acknowledge that information is a valuable resource for the company, they will begin t o realize the importance of managing this resource effectively. An information management plan identifies how key information will be defined, managed, delivered, and protected. The plan also identifies who will be responsible for defining, managing, delivering, and protecting key information.

Chapter 4

An Approach for Establishing Enterprise Data Standards



OR OTHER in most organizations and tend to be wide reaching in establishing how work is performed and delivered within the environment. In the strictest sense, standards should thoroughly describe procedures and outcomes for all t h e events in a n organization. The benefit of this is consistent communication within an organization and reliable deliverables. Implementing standards is a tricky business that can lead to an entirely different set of problems that can also jeopardize the health of the organization. Ideally, it would be best to simply borrow standards that have already been used successful1y on similar projects and adapt them with minimal effort. Unfortunately, this is not possible in many real world cases, so another approach is required. As shown in Exhibit 1, simplicity is the key to successfully implementing enterprise data standards in most organizations.

Enterprise data standards are part of the larger, broader "enterprise development standards" category, which includes such topics as development approach, walkthrough procedures, and coding conventions. This paper focuses on establishing enterprise data standards. There are other types of standards in organizations (e.g., development standards), but these are outside the scope of this article. This distinction can sometimes be subjective; for example, consider the case of form naming standards. Is this a data standard or a development standard? For grouping purposes, I prefer to categorize it as a naming convention under data standards. How0-8493-W76-9/991$0.00+6.50 B 1999 by CRC P ~ s LLC s


DATA DEVELOPMENT METHODOLOGIES, DEFINITIONS, AND STRATEGY Standards are only useful if they are used; Standards will only be used if they are understood; Standards will only be understood if they are simple: Standards will only be understood if the organizationhas bought into them; The organization will buy into standards ifthey are simple, easUy understood, and there is a stake in using them.

Exhibit 1. Guiding principles.

ever, some would prefer to stay pure and categorize it under the more general naming conventions in the development standards.

OBJECTIVES AND BENEFITS OF ENTERPRISE DATA STANDARDS Data exists within organizations in a variety of formats, some of which include: documents, databases, flat files, paper documents, binary objects, voice, and video. Data is everywhere, and its value is in a constant state of flux, either being modified, deleted, changed, or manipulated in some way. All of this activity is supported by many human resources, both inside and outside the organization. Enterprise data standards support the objectives of bringing consistency and integrity across the organization based on the categories shown in Exhibit 2. This list is not exhaustive, but the categories that are identified are common across many organizations in a variety of industries and provide a good start for beginning this process. The significant objectives and benefits of each of the categories defined in Exhibit 2 are described in this section, as they pertain to many organizations.

Architecture and Physical Environment Category Objectives. This includes building a consistent technical data architecture including centralized databases, distributed database servers, and mobile databases. This also includes the physical storage of the data and associated data objects (e.g., indexes) and defines the methods for accessing the data in an n-tier environment from thin or fat clients using embedded SQL or remote procedure calls (RPCs). System management, change management, and version control tools are implemented to meet this objective based on procedures defined in the "Procedures Category." Benefits. Provides a standard physical environment that is easier to support and enhance. Maintenance and support costs are also reduced. A standard architecture and physical environment also simplify application portability and interoperability.

Procedures Category Objectives. This includes all the procedures required to support data for operational or development groups. This can include system management,

An Approach for Establishing Enterprise Data Standards

User Interface Reports - Screens


LlI Interface Layer

Data Layer

Business Logic Layer

Proceduresand Tools

Exhibit 2. Categories of enterprise data standards.

change management, and version control procedures. Included in this group are project naming standards (using tools like Microsoft SourceSave), module check in /check out procedures, support for multiple versions, and delta merging procedures. Benefits. A significant benefit supported by this category includes the use of proven, consistent procedures for maintaining the integrity of the data environment. It is also becomes easier to measure compliance with the standards processes.

n-tierllayers Category Objectives. This category contains anywhere from one to n tiers, with three being the most common. The growing popularity of components will continue to increase the number of tiers. The predominant case includes three tiers, namely, user interface tier (or layer), data tier, and the business logic tier. The user interface tier is discussed in the next category. The ntier category includes naming standards in terms of variable names, field names, data object names (e.g., tables, indexes, databases, user-defined datatypes, triggers, stored procedures), forms class libraries, and objects. Benefits. There are significant benefits derived from standardizing in this category. Application modules are easier to read, build, and maintain. There is increased reuse of code and improved application portability and interoperability. This results in faster development and more accurate debugging. There is a reduced learning curve. The end result is improved product quality.


User Interface Category Objectives. This category includes all types of user interfaces, including application screens and reports (both online and printed). The objectives in this category are t o specifiy placement of data on screens o r printed forms. Categories such as screen navigation are outside the scope of this topic.

Benefits. Standards in this category result in consistent look and feel for user screens and printed reports. Users know where to search for specific types of information. An example of this is an error line on a screen or the page number on a report.

CATEGORIES OF ENTERPRISE DATA STANDARDS Enterprise data standards are required to support both operational and development groups within an organization. Operational groups include those who perform the company's business on a day-to-day basis. These groups require access to information, the ability to manipulate information, procedures, and forms of various kinds. Development groups are more project-focused, in that they are restricted by some sort of timelines and deliverables. These groups require many of the data standards required by the operational groups, but their demands go further. Since development groups get right into t h e architecture and plumbing of applications, they are dependent on how data is stored, accessed, and manipulated at the code level. The exhibits that are included in this section provide subcategories for each category defined previously. Each of the subcategories should be considered in your organization and allocated standards. Exhibit 3 shows the subcategories for the Architecture and Physical Environment category. Exhibit 4 shows the subcategories that are included in the Procedures category. Most of these are focused on protecting data (in databases or files) during the development cycle. Exhibit 5 shows the subcategories for the n-tierllayers category. These are divided between the different tiers in the architecture model. This example covers the business logic tier and the data tier. The user interface layer is covered in Exhibit 6. Exhibit 6 shows the subcategories for the User interface category. There are two primary subcategories, namely user screens and reports.

PROCESS FOR DEFINING ENTERPRISE DATA STANDARDS There are many approaches for defining enterprise data standards. Project experience has shown that the best approach for meeting the objectives identified earlier in this article is the early adoption of a simplified set of en4-4

An Approach for Establishing Enterprise Data Standards Exhibit 3. Architecture and Physical Environment Category Subcategory Data Partitioning Physical Architecture Data Access


Comments Position nonvolatile data near the client platforms that access it. Data architecture for development, testing, and production. Call stored procedures from client platforms to update database table data. Use views to select information from database tables. Return minimum number of data rows to the client. Assume optimistic locking approach instead of the pessimistic.

Exhibit 4. Procedures Category Subcategory


Change Management Standards

All database objects should be created with SQL Scripts. The scripts should be saved in tools such as Microsoft Sourcesave. New client platforms will be configured using Microsoft SMS with local data tables for static data values. Check in/Check out, delta management.

System Management Standards Version Control Standards

terprise data standards that are easily understood and used across the organization. This is best achieved by starting with a set of proven standards from previous projects or from external vendors. For example, companies such as Microsoft, Forte, and Sybase all have publications that define standards to one degree or another. These can be readily found by searching the Websites of the vendors or phoning their marketing or technical departments. Exhibit 7 shows a high level view of a process that should be completed in your organization to define Enterprise Data Standards. This process has been defined through experiences on many small to large projects, and based on observations about what worked well and what did not work so well on the projects. As shown in Exhibit 7, the "Borrow Approach" is the simplest to define or implement, which involves borrowing data standards from vendors, other successful projects, or organizations such as the American National Standards Institute (ANSI) or International Standards Organization (ISO) that define industry standards. A more complete list of organizations that define standards is provided at the end of this article. Such standards can be used on a small number of sample projects and finetuned into a consistent set of data standards for a wider audience. A review team consisting of a Senior Sponsor (e.g., VP or CIO) and a combination of business analysts, systems analysts, and developers can then finetune the data standards and expand them for the enterprise before deploying them.



General Naming Conventions

All variables names must be in lowercase Variables names should be mneumonic and reflect the contents of the variable Variable names should not exceed 40 characters in length Variable names should start with an alphabetic letter (a-z) The remainder of the name can be any combination of The underscore 0is letters or digits or the symbol 0. used to separate parts of a name Variable names cannot contain include embedded spaces Maximum length is 20 characters Use the singular form (e.g., customer instead of customers) Do not use abbreviations Do not use restricted words Preface column name with first two letters of the table name Always contain part of the first 5 letters of the table name Number them sequentially starting with '-1' Preface rule names with the table name Do not include more than 4 tables in a view Always comment your Transact-SQL code Use the 'SETNOCOUNT ON' option to minimize data traffic Avoid the use of NULLS Log errors using master error file Stored procedure names should not exceed 30 characters in length Delete triggers prefaced with "dln. Insert triggers are prefaced with "in"; update triggers are prefaced with "up" Rollback any transactions that result in data integrity errors or which violate referential integrity All tables must have a timestamp field Use ANSI-92 compatible datatypes only Minimize the storage required by variables by using the smallest datatype available (e.g., tinyint instead of int) Improve performance by using numerics instead of strings Avoid variable length strings Build ioins on columns that share compatible datatypes Avoid using NULLS

Table Names

Column Names Index Names Rules Views Stored Procedures



Exhibit 7 also shows another approach that involves "creating" the data standards because there are no satisfactory ones to be borrowed from another source. The key for making this approach a success is to begin with a framework that is similar to the exhibits shown previously. You will probably want to modify the subcategories in the tables, or even to merge or expand the number of tables that are shown. A champion in the organization, perhaps a business analyst, should then review other projects and define a standard for each subcategory (e.g., all variable names must be prefaced with their datatype, all databases will have project names). It is unlikely

An Approach for Establishing Enterprise Data Standards Exhibit 6. User Interface Category Subcategory


Screens: Help Screens Menu Screens Drop Down Menus Information Screens Transaction Screens Reports: Project Plans Status Reports Issue Logs Client Requests

Include field sizes, field types, field objects

Include page headings, page footers, date, page number, column headings, total columns Minimize number of SQL requests per transasction


Standards (Draft)



/ Condud Pilot Pmjeci (S)

Customize or Expand Monitor

Data Standards


Deploy A c m s the

Exhibit 7. Process for defining enterprise data standards.

that this first pass will be complete. The primary challenge is to establish a standards baseline and to begin using them. As in the "borrow" approach, one or more pilot projects are used to validate and enhance the data standards. In the "create" approach, it is likely that several projects will be required to define a satisfactory set of data standards.

DATA DEVELOPMENT METHODOLOGIES, DEFINITIONS, AND STRATEGY There is a third approach that involves combining the "borrow" and "create" approaches. In this instance, standards organizations such as ANSI or IS0 can be used to provide a first cut at the data standards. These are then customized as discused in the "create" approach. Exhibit 7 shows that after the enterprise data standards have been prepared and signed off by the corporate sponsor, the next task is wide deployment. The manner in which this is done is critical for the successful adoption of the enterprise data standards. The first step here is to ensure that the data standards are readily available electronically or on paper. The next step is to communicate the importance, location, and procedures for using the data standards by both the operations and development groups. This message must be communicated to the organization by a senior sponsor, such as a C10 or VP, in order to maximize corporate buy in.

COMMON PITFALIS AND TRAPS There are several common pitfalls and traps that can befall an organization trying to establish enterprise data standards. One of the obvious problems springs from the "enterprise" term itself. Many organizations view the use of this term to mean "broad consensus required" or "slow adoption". This can lead to several problems, not the least of which is that the standards require too much attention and time to implement, or are too cumbersome to learn, use, or understand. Such standards are usually put aside during tight project timeframes. Too many standards can be as bad as having too few standards. The surest way of ensuring that standards are not used is to spend a lot of time defining them and publishing them in a hundred plus page manual. It is preferable to spend a few days conducting research, and then having a small team publish a single digit page document defining the suggested data standards which are then immediately passed to project teams to test and refine in actual projects. Another common problem involves the inaccessibility of enterprise standards. To be effective, the standards must be readily accessible to project teams at any time. Furthermore, the standards should also be readily usable. They must also be easy to duplicate. A useful method of achieving these is to use a combination of deployment strategies. A central data repository (e.g., a Website or a Lotus Notes database) can hold the enterprise standards. Project teams throughout the organization should have direct and remote access to the repositories. Reusable templates and deliverable~(e.g., screen images) can also be included in the repositories to provide project teams with the ability to get started on their projects quickly. Paper-based, or CD/ROM-based, delivery of the standards and templates should accompany the electronic delivery modes.

An Approach for Establishing Enterprise Data Standards

MEASURING COMPLIANCE AND ENFORCING STANDARDS During the pressure of development projects or unexpected operational emergencies, it is not uncommon for all standards, including data standards, to be suspended until the immediate problem is resolved. In some instances, this is acceptable in the short term. However, compliance is important in the long term, and several mechanisms should be used to measure and enforce this, as follows: Deliverable walkthroughs. Regular sessions should be scheduled to walkthrough and sign off on project deliverables to ensure that they comply with the published enterprise data standards. DBAs should be involved in this process; Audit. Infrequent audits of projects across the enterprise should be used to ensure ongoing compliance with the published standards. Resources on the audit teams can vary over time. A good size for the audit team is about three resources. It is a good idea to rotate resources on and off the teams over a period of time; Enforcement. Examples of noncompliance should be documented and the appropriate project manager(s) should be mandated to ensure that the standards are adhered to within a specific timeframe (e.g., one month); and Quality plan. Every project plan should be accompanied by a quality plan that includes the activities that will be followed by the project teams to ensure compliance with the standards.

TEN TIPS FOR GE'ITING STARTED Many projects have shown that the following tips greatly simplify the process of establishing enterprise-wide data standards:

1. Keep them simple; 2. Borrow, if possible, avoid creating; 3. Use samples from actual projects; 4. Make the standards readily accessible electronically; 5. Do not hesitate to modify enterprise data standards if a good argument is presented; 6. Use standards as reasonable guidelines; 7. Build early successes as showpieces; 8. Build data standard compliance right into the project plan or quality plan; 9. Never create standards in a vacuum; 10. Enforce the use of the data standards.


CONCLUSION Enterprise data standards should be used by organizations that are building and maintaining nontrivial application systems. The key for establishing enterprise data standards is to keep them as simple as possible and to ensure that they are actually used on projects. It is recommended that standards be borrowed from third parties o r vendors. Where this is not possible, it is recommended that a framework, such as the one included in this article, be used to build a first cut of the enterprise standards. These should then be validated in a few pilot projects before rolling them out to the enterprise.



International Organization for Standardization (Global support) Open Software Foundation's (OSF) Distributed Computing Environment (middleware and enterprise standards) applications operating with OS. Deals with system calls, libraries, tools, interfaces, and testing European vendors and manufacturers IBM, HP Santa Cruz Operation, Sun Microsystems, UNIX - application/OS implementations Object Management Croup's Common Object Request Broker Architecture U.S. Standards body; works to get buy-in from ANSI Call-Level Interface (CLI) and Remote Database Access (RDA) Internet Society. Internet standards and internetworking techniques

Chapter 5

Completeness, Context, and Proportion: Shaping Data Design to the Business

Ronald A. Wencer

THESYSTEMS DEVELOPMENT CYCLE IS FOUNDED IN THE NOTION OF ABSTRACTION -the practice of distilling something complex into a simplified

form that establishes better understanding, clarifies direction, and promotes sound decisions. Over the course of time, various tools have been devised to embody such abstractions including flowcharts, structure diagrams, entity-relation diagrams, data flow diagrams, and object models. With varying degrees of success, methodologies have devised special processes to orchestrate the proper creation of these tools, constructing over a project's life an analytical continuum, through which frontend business abstractions ultimately grow into database tables and objects. Methodology standards address the integrity of this continuum -if one follows both the spirit and the form of the rules, the results will be structurally valid. Nevertheless, the end points of the analytical continuum must be validated against the business environment. Simply stated, requirements either are understood or are not; the delivered system either works properly o r does not. Barring Kismet, information systems can be only as good as the requirements-gathering exercises from which they flow. OVERVIEW OF THE PROBLEM Each of the following recent (and real-world) problems indicates a significant failure of the requirements-gathering process: 0-8493-W76-9/99/$0.M)+$,50 8 1999 by CRC Press LLC

DATA DEVELOPMENT METHODOLOGIES, DEFINITIONS, AND STRATEGY 1. A retailer tries to link its product and marketing data to a powerful desktop publishing environment, hoping to effect dramatic savings in the preparation of advertising material. After months of effort and significant expenditure, the resulting database and application are rejected by the business area, which deems them fundamentally unworkable. 2. A project manager decides to defer consideration of a relatively unimportant business transaction, for which the project team has been able to garner only a weak, inconsistent understanding from various stakeholders. Subsequently, corporate management decrees that the deferred item must be accommodated. The database structure already established by the project must be revisited, and the partially built applications must be redeveloped. 3. After a brisk prototyping exercise, the pace of requirements gathering seems so slow that management fears a project will stall. To sustain momentum, it directs t h e project t o quickly deliver an incomplete but nominally functional application, speeding delivery by simply not addressing certain requirements. As time passes, the boundaries between the in-scope and the omitted requirements blur; there is a continual retrofitting of changes to the data model (with consequent rewriting of application code). By the time a new application is deployed, it is burdened by many elements of legacy applications: obsolete or absent models, layered amendments to documentation, confused perceptions, and unrecorded changes. 4. A company with a single product line redevelops its data environment to support more flexibility in its product pricing. Two years later, the organization wants to introduce additional product lines, but the recently implemented databases and applications only support billing clients for purchases made against a single product line. 5. After years of preparation, a multimillion dollar development project enters acceptance testing. There is immediate confusion and dismay, as different business areas present vastly different views of the requirements it was t o meet. Project documentation is s o poor that it offers no useful guidance. The physical data models bear no discernible relationship to the logical models used in earlier project phases. Eventually, the project is reorganized, a new requirements set is gathered and accepted, and the application is retrofitted to meet the changed requirements. Already late, project implementation is delayed by another year. 6. Comparable situations long have been known in the information systems world, but when they occur today, they are subject to more intense scrutiny. As the synergy of technologies yields more automatically generated and more reliable system components, manage ment becomes more aware that any surprises at the end of the day

Completeness, Context, and Proportion: Shaping Data Design to the Business stem from a flawed understanding of requirements. Those who have identified requirements may be asked the obvious question: If systems are built by professionals who talk with the people who make the decisions that direct business, why should requirements ever be overlooked or misunderstood? What happened to the value added by data analysis and modeling?

To answer all facets of this question would be difficult. The problems encountered while identifying requirements touch upon the fundamental dynamics of corporate structure. Given that corporations respond to the complexity that permeates their markets and society at large, these dynamics are generally too complex to examine exhaustively. As a matter of practicality, it instead may be sufficient to focus on several key aspects of the requirements-gathering process, beginning with some nuances of the assumptions that underlie the question posed above: Systems professionals have skills and experiences that are relevant to gathering complex business requirements in almost any situation. Key business stakeholders understand the process of gathering requirements, understand the role and commitments it imposes upon them, and understand the products they are asked to approve. As requirements are identified, decision makers and systems professionals exchange information directly and effectively. Obviously, to the extent that corporate culture tends to conform to these assumptions, fewer obstacles need to be surmounted. No organization would willingly ignore these considerations, although anomalies occasionally do arise. For example, due to an inadequate screening process, poorly qualified resources may be hired, or their expertise may be misunderstood, so that they eventually are placed inappropriately. When such blatant problems exist, however, they tend to cause immediate, visible damage that cannot be ignored, typically prompting management to resolve them quickly. Of greater concern here are the more subtle difficulties that can occur even in the best corporate environments. These are often pernicious, as they may long go unnoticed -or at least unreported - although they undermine success, as in the cases cited above. It is incumbent upon all concerned to be wary. Although these various difficulties often overlap, for purposes of this chapter they have been grouped into several broad areas of concern. Preparedness Completeness and proportion Context and change Anomalies in the continuum Prototyping's impacts

DATA DEVELOPMENT METHODOLOGIES, DEFINITIONS, AND STRATEGY The following sections outline the risks inherent in each of these areas and offer suggestions for avoiding or limiting that risk. PREPAREDNESS Most business units are not practiced clients of systems development efforts. For many stakeholders, active participation in a significant systems project occurs only once in a career. Even if they have some nominally relevant experience, the novelty introduced by ever-new methodology, tools, and/or technology effectively renders each project a first-time involvement for business participants. Only exceptionally insightful stakeholders can anticipate all the business implications of the difficult systems decisions they must make. Even where the implications are understood, they may be grasped in operational terms that business owners cannot reliably correlate to arcane concepts such as entities, relationships, and keys. Systems professionals also encounter first-time phenomena in most projects. Today's lean but diversified corporate world seeks to tighten narrowly project scope, targeting quick, focused, and certain payback. On successive projects, an analyst may deal in turn with, for example, Accounts Receivable, Inventory Control, Employee Benefits, and Manufacturing, in each case working with different business policies, different decision makers with different personal styles, different jargon, and different unstated business assumptions. Thus, in a very real sense every project team begins as an untried novice, much as a new professional sports franchise enters its first season with a mixed team of veterans and rookies who never before have played as a unit. In both cases, early performance is largely unpredictable and the team's eventual success depends upon finding an effective combination of proper management, good training, and coaching principles. For purposes of gathering business requirements, a fundamentally sound combination usually involves the methodical application of standard techniques and steps, which yield prescribed standard deliverables. Preparation for all team members is a necessity. Although many systems people are trained formally to participate in analytical activities, business people rarely are. Too often they are expected t o contribute on demand, without formal preparation. When faced with a data model for the first time, regardless of the notation used, business stakeholders only can believe it means what they are told it means - not a happy situation. Although used only rarely, training and reference materials for preparing business owners for systems-related roles and responsibilities do exist and taking advantage of these resources is an excellent first step. Once a project is in flight, however, even stakeholders who have received special training will have difficulty simultaneously grappling with

Completeness, Context, and Proportion: Shaping Data Design to the Business abstract concepts (e.g., models) and dealing with the very tactile demands of the business in which they are immersed daily. It is advisable - although rarely feasible - to secure dedicated stakeholder participation throughout the requirements-gathering phase of a project. It may be possible to mitigate this problem by establishing a special project role, which is intended to lever business stakeholders' input. Essentially, the idea is to prepare a team member, generally someone who already has a solid systems-oriented background in gathering requirements and front-end data analysis, to complement and shepherd the input provided by business-area participants. Note that this person is neither a business advocate nor a surrogate user; rather than adopting either a systems or business unit outlook, the role calls for always maintaining a corporate perspective. Prior to the start of the formal requirements definition, this person undergoes an intense familiarization with the business area in question. As requirements definition unfolds, the resource's mandate is To ensure that all relevant parts of the business receive adequate consideration. At appropriate points, to deliberately navigate discussion into, and successfully through, difficult areas that the team might otherwise seek to avoid. To probe tactfully the statements of business participants, allowing stakeholders to test their own conclusions. To foresee the global implications of local decisions. To make sure that stakeholders and systems participants all share a common understanding of the team's work and of the decisions that are reached, especially with respect to data considerations (business area participants often can relate more quickly and effectively to process-oriented notations). To feedback to project leadership objective assessments of the quality of the effort, highlighting areas of success and concern, so that techniques can be tailored to the team's constituency. Candidates for such a role will vary according to organizational setting. Whether they are corporate or consultants, they should be working team members, familiar with an organization's overall direction, and committed to the requirements-gathering process. It sometimes may be acceptable to merge this role with the traditional role of a facilitator, but doing so is not always advisable. If a culture views them primarily as rules-keepers and pace-setters, there will be little opportunity for facilitators to forge the trust-based alliance with business that makes this role work. Indeed, it may be advantageous to conceal this mandate, so that stakeholders watch the role emerge from the natural dynam-

DATA DEVELOPMENT METHODOLOGIES, DEFINITIONS, AND STRATEGY ics of the team, gradually appreciating this resource's value, and buying into the whole process with greater commitment.

COMPLETENESS AND PROPORTION Overall, the requirements-gathering process may not seem exciting. Systems professionals may find more appeal in conceptual analysis or in design activities, and business professionals may prefer their respective operational or executive worlds. The prospect of thoroughly defining requirements is even less attractive, as it suggests painfully dull diligence. Unfortunately, there is some accuracy in this perception; the work must be done carefully and completely. There is, then, always a risk that participants in the process will allocate their time and attention disproportionately: They may limit their attention only t o things that seem important. This is a serious error, for "important" is a subjective term. As was noted in the Introduction, retrofitting a supposedly inconsequential business requirement into later analysis may be arduous from a database or application perspective. Participants may opt to defer real consideration of a poorly-understood area of the business, feeling confident after superficial discussion that details can be resolved later. At a distance, such a decision may seem incomprehensible. If something is too hard t o grasp from the high-altitude perspective of requirements investigation, postponing its consideration to more detailed project phases only will introduce artificial constraints; it will not make the investigation any easier. Deciding to ignore it completely is even worse. An elegant study that addresses the simplest 80%of a knotty problem serves little purpose. Remarkably, such decisions seem pragmatic to people struggling under normal project pressures. All too often, investigations are influenced by the very human tendency to plunge into great detail where something is labeled "important" or appears to be challenging (or is merely unusual, or at least more interesting than other things), s o that teams may find themselves skimming over items that do not seem t o warrant immediate attention. When defining requirements, yielding to this temptation leads to a biased view of business, in which the various elements assume disproportionate magnitude and importance. Perhaps this sort of thinking explains one corporate data model, in which employee gender was identified as an entity - one wonders what the attributes might have been. The concept of treating all requirements objectively, s o that a project's documented deliverables reveal a uniform depth of understanding, is a

Completeness, Context, and Proportion: Shaping Data Design to the Business

Need some kind of background and horizon The smile is critical --

delicate sfumafo rendering of landscape


rock-lined horizon


enigmatic smile /

, folded to arms

serpentine road

Foreatms must appear to be horizontal

Incomplete, Unbalanced Requirements


suggest repose

female subject, gazing directly at viewer

Better Requirements

Exhibit 1. Dealing with too much too soon.

guideline common to many well-established methodologies. Despite its value, however, this guideline is respected only infrequently. Most people intuitively comprehend the risks of gathering and digesting too little information. But there is also a risk in trying to deal with too much too soon, especially if it leads a project to focus only on fragments of the overall problem and to make inadvertent and premature decisions. Exhibit 1 illustrates this risk, in the context of Leonardo da Vinci's well-known Mona Lisa. Imagine the artist is concentrating on his subject's smile and on the repose suggested by the positioning of her arms, then going on to paint these in detail, only to discover afterward that he has left no room in the canvas for her forehead. Instead, Leonardo's requirements definition (regardless of whether he documented it or not) must have been closer in concept to the right half of this example. A close parallel exists in business. Like the artist's canvas, the resources available to a requirements definition effort are finite. Consuming them disproportionately in one or two neighborhoods of concern diminishes the r e sources that can be devoted to other areas. It also introduces more review cycles and overcomplicates them, and it tends to skew the project's overall findings. Moreover, it does not add true value in the long run; later analysis

DATA DEVELOPMENT METHODOLOGIES, DEFINITIONS, AND STRATEGY phases, which are intended to benefit from progressively deeper appreciations of context, will in any event re-examine the same detail.

CONTEXT AND CHANGE Requirements must be understood as part of a greater whole. Looking some distance beyond a project's nominal boundaries is valuable for several reasons. In the long run, most people only understand scope fully when they feel they know both what it includes and what it excludes. Boundaries can be contoured properly only if there is a coherent understanding of what lies along both sides of the demarcation. The impacts of strategies, markets, and other forces upon specific business areas are localized symptoms of broader conditions, and these factors must be accommodated coherently throughout an organization. Provisional local decisions must be tested by seeing what global results they are likely to effect. Significant change on the horizon may be detected earlier. One reliably cannot evaluate priorities out of context. A more generalized awareness makes it easier to free one's understanding from the knowledge of what is, and to instead start conceiving what ought to be. As noted in Exhibit 2, it is vital for participants t o appreciate that the scope of a systems project initially should be fairly broad; it then can be narrowed a s work progresses toward implementation. The business stands on safer ground if what ultimately is built fits within a broader valid design, if what is designed fits into a broader valid analytical context, and if what is analyzed fits into a still larger coherent view of business requirements. Consider some of the cases cited in the Introduction. Clearly, the team charged with enabling greater flexibility in product pricing was at fault no team member could have looked very far beyond its prescribed scope without discovering the corporate groundwork for creating new product lines. Instead of dealing with this coming change in its data analysis, which would have added no substantial burden to its effort, the team either somehow remained oblivious to what the business was doing or chose to be derelict; whatever the cause, the outcome was unfortunate. The two teams that were directed first to ignore, but ultimately to address, certain areas of requirements were obviously victims of poor management decisions, most likely decisions made according t o incomplete (i.e., excessively localized) reconnaissance. Note that this problem in fact may have originated in the way the team gathered its information, and the sources that it used. To perfectlyaccommodate imprecisely known future

Completeness, Context, and Proportion: Shaping Data Design to the Business Exhibit 2. ConBrming/Refining Project Scope Project Phase and Purpose Initiation Scope is set to plan and establish resources for the overall project, based upon project's business objectives.

Requirements Definition Scope for analysis phase is confirmed or refined at a data subject level,using the inventory of relevant information resources it has created. Analysis Scope for design phase is confirmed or refined at an entity/relationship level, working from internal requirements and external logical models.

Design Scope for build phase is confirmed or refined at a physical model level, working from findings of detailed analysis.

Build Delivery consideration may result in temporal phasing of project; scope for each phase must be defined.

RisksIRequired Action Scope as set by corporate decision makers may not recognize data or application boundaries properly. Team must map project's objectives to the corporate information context. Unexplored relationships with other projects may promote overlaps, inconsistencies, and accidental gaps. Team must inventory the information resources (existing, in-progress, and those still only planned) that touch its objectives in any way. Other projects geared to other objectives may be working with an incompatible understanding of data. Team must review other projects' logical data models at a high level, deriving logical models from physical models if necessary. Other projects geared to other objectives may work to different business rules, leading to different understandings of entities and relationships. Business constraints may have led other projects to stop short of, or to cross, natural scope boundaries. Team must conduct a detailed review of the boundaries of other relevant projects. De-normalized physical models used for other projects may not fit the new, expanded corporate context. Team must recognize conflicts, bring them to management attention, and recommend which project(s) can best address them. Past reliance on local standards or diverse technologies may not permit graceful integration of other projects with the new data analysis. Team must address any conflicts with existing corporate data and technical architectures. Phasing cannot be arbitrary nor can it generally be determined only with respect to resourcing constraints. Business dependencies and external constraints also must be recognized. Team must devise a viable migration plan that bridges multiple subprojects as necessary.

DATA DEVELOPMENT METHODOLOGIES, DEFINITIONS, AND STRATEGY requirements in a logical data model would be impossible. On the other hand, to exclude them to the extent that the logical data models needed extensive change was probably a situation that could have been avoided readily. For requirements gathering to begin with a broad investigation, it needs suitable sources of information. If an organization has made a commitment to an overall strategic plan, the context in which to gather requirements is evident, but comprehensively planned business environments are scarce. In many business cultures, formal strategic plans will be incomplete, obsolete, untrusted, or totally lacking. Other sources of strategic direction may be available in the absence of formal plans, such as executive interviews, internal announcements, and marketing communications. If all else fails, one can step back and determine the de facto priorities of various projects and business areas, noting which groups are given the largest budgets, which seem to attract the brightest managers and the most skilled staff, and which teams are the largest. No matter what sources the team uses to sketch its context, it will be betterequipped for doing so, having learned (or deduced) how other business areas affect its primary area of concern, how that area is likely to be impacted by new corporate initiatives, and how important its internal priorities really are. One caveat remains: the gathering of requirements may be polluted by political agenda. Managers may compete more than they cooperate; communication to the project team may be withheld, may not be frank, and occasionally may be deliberately inaccurate. Stated corporate policy may be only paid lip-service by local business units, which choose to deviate from overall corporate direction, perhaps actually hindering some aspects of the business' operation in which they have no direct concern. Ideally, political problems have political solutions. When they are brought to the attention of a project sponsor, some course of action can be found outside the team's activity that either eliminates the difficulty or clarifies the corporation's expectations of the team. In reality, satisfactorily addressing all facets of some political issues may not be always feasible, leaving the team with difficult issues that strain its creativity. If severe problems continue unresolved, the most viable recourse available to the team remains the project sponsor. In some cultures, additional support can be found among the team members' line management, but political sensitivities often make the enlistment of such people difficult.

ANOMALIES IN THE CONTINUUM When systems and business people discuss requirements, they begin the modeling exercise that ought to forge a continuous chain of under-

Completeness, Context, and Proportion: Shaping Data Design to the Business standing that will be extended through later project phases. Despite the rigors of sound practice and the use of standard documentation forms, miscommunication may occur, especially where key participants fail to understand project documents. For example, many systems professionals have seen business owners accept incomprehensible entity-relationship diagrams (even when labels either are so abbreviated or are so small that they are functionally illegible, and when intersecting relationship lines are impossible to trace reliably). Such well-intended but meaningless approval, proffered to allow work to pass officially into a new phase, threatens a project's future by disrupting the continuum of business understanding. One source of miscommunication is team members' confusing correct form with meaning, most notably with regard to industry-standard modeling forms, some of which may be seen for the first time by a given participant during requirements gathering. Although these forms were developed expressly to establish a standard, unambiguous modeling vocabulary, this vocabulary usually is not mastered quickly by first-time business participants; some systems participants may be no better. Where team resources are not experts, their critical judgment may be impeded by the very formalism that was supposed to assist them. To their eyes, if a diagram contains all the right elements and if it is presented for approval by someone who probably understands it far better than the nominal acceptors, it certainly looks very good - especially if it is printed with a handsome font, using a high-resolution laser printer. If it looks like a data model, it must be one. A good analogy (Exhibit 3) might be the toy set designed to allow children to construct things from log-shaped blocks - regardless of whether the pieces are assembled to match the proportions of a skyscraper or a shoebox, the child always sees the end-products as log cabins. One absurd case highlights the risk of such shallow comprehension. When budget cuts sharply reduced funding for systems work, a project director asked his staff to shrink the project's scope with scissors, asking them t o cut segments from printouts of different business-area models, which he hoped they would tape together into a coherent new model. (This idea was afterward celebrated as the Frankenstein methodology.) Such profound lack of understanding on the part of acceptors breeds an overall carelessness in a team's work. Front-end deliverables, including data models, may be assembled only mechanically and without insight, so that models are not underpinned by substantial definitions that ring true t o business owners. Basic guidelines, or even integrity rules, may be ignored. A particular area of risk originates with the convenient (but somewhat artificial) division that is made between process and data. Ultimately, neither makes sense without the other, and models should respect their inherent interdependency. In practice, however, some projects exhibit a strong separation between concerns of data and process, attempting t o


d . .

Resembling a log cabin does not necessarily mean that a model of a log cabin is valid. The same is true for a data model.

Exhibit 3. Resemblance vs. actuality.

deal with each in isolation during separate sessions. In the past, at least one corporation adopted a methodology that had rigorous standards for process and data modeling, yet its systems area established two disconnected, separately managed CASE environments - one for modeling data, the other for modeling process. With such an outlook, the risk of miscommunication is high. Communication risks can be minimized through simple diligence. All models should be accompanied by solid definitions (i.e., something of more value than Client: information about the company's clients). The consistency of a project's approach to understanding process and data must be validated at designated intervals. Note that some methodologies employ special model forms to depict the manner in which data and process interact. Evaluating participants' abilities to see past a data model's formalism entails tact and sensitivity (strangely, most dislike being informed publicly of not knowing what one is doing). By and large, modeling failures can be detected by planned off-line quality reviews, devised to target modeling vi-

Completeness, Context, and Proportion: Shaping Data Design to the Business olations and anomalies (e.g., recursive decompositions, inconsistencies between data and process models, etc.). If problems are detected, they can be brought back openly to the project team, which as a body should be prepared to take responsibility for the quality of its work.

PROTOTYPING'S IMPACTS Many organizations rely on prototyping as part of the discovery process, assembling a combined business/systems team that rapidly evolves a skeletal application. As it provides an excellent forum for creative thinking, a prototyping methodology has undeniable advantages. It moves quickly, and its rapid pace usually motivates the team and promotes management buy-in. It effectively can avoid some problems by concealing abstract data-modeling concerns, letting business people deal directly with screens and tables, setting data issues in a merged process/data context that supports good business validation. One also must recognize its limitations, however. Limitations are inclined to linger over all possible cases. Seeking to maintain momentum, a prototyping effort may choose to leap over phenomena that cannot be treated effectively in a hands-on-the-keyboard setting. Its emphasis on creativity encourages a freedom from discipline; consequently, discovery prototyping is not concerned inherently with completeness. Although it can be made to work in broad contexts, it is adapted more readily to a scale in which a diverse group quickly can grasp the nature of problems and solutions. Real-world systems must be defined so that they can deal in an acceptable fashion with the eventualities (data types, transactions, cases, etc.) that are possible within their contexts. If a prototyping exercise does not address all such eventualities, and if it does not define what course of action is acceptable for each, then it must be followed by a more structured (and more laborious) dialogue, which cannot afford to jettison concerns that are difficult to apprehend. The purpose of this dialogue is to test the prototyping results against more formal high-level models and to expand its concepts so that they fill the sphere of concern, encompassing all requirements at a consistent level. Depending on team dynamics, the models can be drafted as a group by business and systems team members, or they can be prepared offline by specialists and then subjected to careful group review. Again, the prime consideration is one of completeness and balance. It is a mistake to employ prototyping to discover the majority of business requirements, but then merely to trust to the fates that the remainder will be identified without a determined effort. One also must realize that prototyping demands a specialized skill set, which does not coincide fully with the skill-set that is needed for other

DATA DEVELOPMENT METHODOLOGIES, DEFINITIONS, AND STRATEGY methods of requirements gathering. A given team member may not function equally well, or be equally amenable to, all flavors of a requirements dialogue. Pace is a major consideration. A prototyping team does not grind away tediously; instead, it leaps toward solutions. It needs members who quickly can reach communal decisions that are fundamentally sound. They must be able to voice opinions without fear. They must be objective, dropping yesterday's decisions willingly when superior alternatives are suggested today. Team members must be able to link incompletely articulated concepts to the concrete things that they see, following principles almost intuitively: good prototyping analysts simultaneously think in terms of undefined entities and n-normal forms. These strengths differ markedly from the contemplative, patient skiIls of those analysts who sift through business procedures and forms, conduct interviews chiefly by listening critically, and then embody abstracted concepts in the formal models they construct. Such abilities can bring judgement to bear on the outcome of prototyping, validating, and amplifying its results, but they may not be able to add value in the heat of the prototyping process itself. Nonprototyping analysts may need the time to frame definitions and to identify fundamental relationships before they even begin to think of keys. Indeed, subjected to the pressure of a swift pace, their judgement, which normally is based on critical reflection, may s o decay as to become a liability. Not surprisingly, the team dynamics of prototyping are also distinctive. Traditional, nonprototyping settings for gathering requirements are inhibited more easily by cultural divisions (distinctions of rank and differences between the immediate objectives of systems and business units). Such divisions often are rendered meaningless by prototyping's intensity, but analysts who work in traditional settings must deal with them. In contrast, the prototyping dynamic is highly democratic. Ideally, it excludes gurus in favor of peers, who are at ease operating on an even plane. Thus, placed in a traditional setting (e.g., one in which rank matters), an otherwise experienced prototyping analyst may have difficulty establishing effective give-and-take communication. Conversely, a skilled back-room analyst, who has demonstrated insight in traditional settings, may flounder in prototyping's snap-judgement climate, at a loss without sufficient opportunity to probe more deeply what is being proposed. Rather than focusing at a true requirements level, a prototyping team's working dialogue is framed in a postanalytical perspective. Once the process is underway, instead of actually defining what is (or is not) needed, people think "We need something like this" or "I don't like the way that works" or "Yes, something like that should handle most of the situations

Completeness, Context, and Proportion: Shaping Data Design to the Business we run into." The intention is primarily to affirm or reject, and only secondarily to state reasons. Reasons are to be digested and evaluated later, at which time one can evaluate the true meaning of t h e quick decisions reached at the prototyping table. The basic prototyping dialogue, then, inverts the traditional analytical approach, in which products drafted through off-line analysis are understood, validated, and synthesized by a group. Instead, prototyping first uses group thinking to create products, which subsequently are validated and refined off line. In the end, with business and systems people alike proposing quasi-system solutions t o business problems, prototyping environments may lead to a corporate undervaluing of skills for listening, for reflective synthesis, and for integrating ideas into a larger context -the same skills that are vital to postprototyping information gathering. If a climate relies heavily on prototyping, the in-house experience of its systems professionals may not prepare them adequately to gather and synthesize requirements through nonprototyping activities. Obviously, when a team is being assembled, the respective experience, abilities, and personal preparation of the candidate members should be reviewed in the context of both prototyping and traditional activities. If adjustments are indicated, they can be achieved through: Additional training for individual resources. Changes to the team roster. Use of consistent methodology from one project to the next. Taking the time to properly orient the team as a whole at the project's outset. Setting an appropriate project schedule that respects t h e team's strengths.

CONCLUSION It is vital that people working together to define business requirements share a clear understanding of the task at hand. Paradoxically, the widespread use of now-standard tools and techniques that address this issue may open a new door to old problems, for businesses may rely on unverified expertise and shallow understanding where there is only the appearance of sound practice. When this occurs, the problems of the past are found reincarnated in today's world, compromising results, undermining the value of data analysis, and perhaps leading to outright failure. Every project is a new start, and its team is entitled to proper training and management, so that business requirements are collected in context, completely, and with proper regard to coming change.

DATA DEVELOPMENT METHODOLOGIES, DEFINITIONS, AND STRATEGY As in any endeavor, there will be strong business pressures t o move too quickly and t o stray from the mark. From time t o time, projects will b e urged to form too narrow a focus, and to heed only the hot buttons of t h e day. If this happens, the team must still act responsibly, devising a logical data model that later can conform t o predictable requirements with minimal business impact. Regardless of the pressures under which it labors, it is incumbent upon the project team to remain disciplined, t o construct a balanced and objective analytical view, and to make itself understood t o its management, s o that the risk of poor decisions is minimal. The team's sponsor, its fellow employees, the business overall, and the team itself deserve a s much.

Chapter 6

An Enterprise Decision Framework for Information System Selection

Richard Lee


prise resource planning (ERP) package vendors and system integrators (SI) to develop methodologies toward implementation of package solutions. However, there has been little attention to detail prior to implementation of enterprise application packages. There are key considerations that must be resolved prior to implementation to ensure project success. All business procedures in an enterprise must be viewed from the market perspective and optimized throughout all functional areas. To meet the challenges facing large enterprises in today's constantly changing market, enterprises need an effective system for all lines of business. Seldom does an enterprise have the opportunity to implement a corporate-wide integration strategy; particularly relating competitive systems to corporate systems. Enterprises typically make uniform decisions within a functional area such as corporate financial systems, and migrate to a standard solution for all functional areas, provided that the benefits of dealing with one vendor/one solution outweighs any loss in functionality. In today's market, enterprises tend to have a mixed bag of custom solutions and different application packages. Package solutions are being chosen in favor of custom development in business areas such as manufacturing, financials, human resources, and customer service. Enterprises are looking for complete and integrated solutions from one vendor. In addition, enterprises have identified that linking business (e.g., financial) systems with an overall organizational system 0-8493-9976-9/99/$0.OOtS.50 8 1999 by CRC Press LLC

DATA DEVELOPMENT METHODOLOGIES, DEFINITIONS, AND STRATEGY simplifies and accelerates tasks, improves the flow of information, and facilitates scheduling and planning decisions based on consistent data. Data consistency and quality at the strategic level are increasingly important for enterprises that want to remain flexible to meet the changes in the marketplace. To develop a sound enterprise strategy, the data associated with business processes, customers, competitors, and technologies must be relevant, complete, accurate, and timely. In addition, as an enterprise strategy is rolled out, results are tabulated and modified, when specific plans are initiated. The reported results must be accurate and timely, otherwise, execution of the strategy becomes much more difficult. Many enterprises are recognizing the importance of developing an effective enterprise data strategy with their enterprise information system strategy. Enterprises are interested in systems that offer more sophisticated capabilities, with tighter integration of key business areas, i.e., supplychain and sales-force automation. ERP vendors recognize this and now are providing data warehousing capabilities to their software tools. ERP vendors are striving to achieve even tighter integration with data mining and data warehousing capabilities. Enterprises are looking for their new technological solutions to provide them with the ability to stay ahead of their competitors. Vendors have begun offering free extraction and data analysis tools to companies purchasing data warehouses. Enterprises are setting directional standards for the entire organization that reflects the benefit of dealing with one vendor. With the current functional support offered by many packages, enterprises are adopting one vendor, integrated solutions for corporate systems (e.g., financials, human resources, etc.). For other functional areas, enterprises consider the benefit of dealing with one integrated solution versus the cost of missing functionality. Application package vendors are expanding and enhancing their solutions to address more functional breadth and depth. As a result, industry vertical solutions are emerging for the large enterprise packages (e.g., SAP, Peoplesoft, Baan and Oracle). In the near future, it will be much easier for enterprises to follow one standard for the entire organization.

DECISION FRAMEWORK An enterprise-wide strategy provides an organization with a more process-oriented, streamlined information system. Consequently, this allows an enterprise to expand its internal information-processing and communications capabilities. The framework in Exhibit 1 describes the typical decision tree used by management when evaluating large system replacements and deciding on a future direction.

^/Shouldwe >vs replace current S^ systems? ^A

/'What is ou^\ future development Sw direction? ^

' What ^x solution do we move ahead * S^with?^^

'What are the lessons* foamed, successes, failures. & major Issues.

n• Business benefits vs. costs • Support for reengineering results • Competitor initiatives • Availability of practical alternatives

• Business case development • High level current systems review

• Functional gap & costs associated with current systems • Architecturalfitof current systems with future direction • Functionalfitof package alternatives • Vendor viability • Integration of package(s) with other packages/current systems • Acquisition & implementation costs • Time & risk of implementation • Development costs • Timeliness of future development & delivery

• Functional fit (overall and per module) • Enterprise and Departmental needs • Vendor track record • Competitor initiatives • Other department/related party initiatives • Costs • Risks • Integrated versus Best of Breed direction • Implementation time • Future direction of proposed alternative • Support costs

• Availability of resources • Available time for implementation • Risk • Internal & external expertise • New technology • Number of sites

• Detailed current systems analysis • Initial package analysis • Feasibility Study

• Detailed package selection • Detailed Review/Pilot to assess gaps/mods • Technical fit analysis • Best practices analysis/survey • Preliminary implementation planning

• Detailed implementation planning • Contract negotiations • Implementation

Exhibit 1.

Enterprise decision framework.