Introduction to Teradata Warehouse. Teradata Database Release V2R5.1 Teradata Warehouse Release 7.1

English Pages 298 Year 2003

Table of contents :
Introduction to Teradata Warehouse
Preface
Supported Software Release
Changes to This Book
About This Book
List of Acronyms
Technical Information on the Web
Contents
Chapter 1: Teradata Warehouse
What Is a Data Warehouse
The Next Step for the Data Warehouse
Strategic Queries
Tactical Queries
Teradata Warehouse
Section 1: Teradata Warehouse Overview
Chapter 2: Teradata Warehouse Overview
What Is the Teradata Database
Attachment Methods
How to Communicate with the Teradata Database Using SQL
Purpose in Development
Shared Information Architecture
Teradata Database Server Software
Parallel Upgrade Tool
What Are Teradata Tools and Utilities
For More Information
Chapter 3: Teradata Database Architecture
SMP and MPP Machines
The BYNET
Boardless BYNET
Disk Arrays
Logical Units
Pdisks and Vdisks
Cliques
Hot Standby Nodes
Virtual Processors
Parsing Engine
Access Module Processor
AMP Clusters
Parsing Engine Request Processing
The Dispatcher
The AMPs
Example: SQL Statement
Parallel Database Extensions
Trusted Parallel Applications
PDE and MPP Systems
Start and Stop PDE
The Teradata File System
Cylinder Read
Disk I/O Integrity Checking
Workstation Types and Available Platforms
System Console
Administration Workstation
Teradata Database Window
How the Database Window Communicates with Teradata Database
Running DBW
For More Information
Chapter 4: International Language Support
Character Set Overview
What Is a Repertoire
Character Representation
External and Internal Character Sets
Character Data Translation
What Teradata Database Supports
Teradata Database Character Data Storage
Internal Server Character Sets
User Data
System Dictionary Data
Language Support Modes
Default Character Set for User Data
Character Set for System Dictionary Data
Character Set for Dictionary Data Other Than Object Names
Standard Language Support Mode
LATIN Character Set
Compatible Languages
Japanese Language Support Mode
Advantages of Storing System Dictionary Data Using KANJI1
Advantages of Storing User Data Using UNICODE
Extended Support
For More Information
Section 2: The Teradata Database Structure
Chapter 5: Structured Query Language (SQL)
Why SQL
What is SQL
Data Definition Language
Data Control Language
Data Manipulation
SQL Data Types
Teradata and ANSI-Compliant Data Types
Data Type Attributes
Statement Punctuation
SQL Statements and Requests
The SELECT Statement
SELECT Statement and Set Operators
SELECT Statement and Joins
SQL Functions
Scalar Functions
Aggregate Functions
Ordered Analytical Functions
User-Defined Functions
Creating User-Defined Functions
SQL Statements Related to Functions
Cursors
For More Information
Chapter 6: Application Development
Types of SQL Development
Explicit SQL Development
Implicit SQL Development
Embedded SQL Applications
What Is Embedded SQL
How Does an Application Program Use Embedded SQL
Supported Languages and Platforms
Macros as SQL Applications
SQL Used to Create a Macro
Macro Usage
SQL Used to Modify a Macro
SQL Used to Delete a Macro
Teradata Stored Procedures as SQL Applications
SQL Used to Create Stored Procedures
Stored Procedure Example
SQL Used to Execute a Stored Procedure
The EXPLAIN Statement
How Is EXPLAIN Useful
EXPLAIN With Simple Join Index Example
Third-Party Development
TS/API Products
Compatible Third-Party Software Products
Performance Monitor/Application Programming Interface
For More Information
Chapter 7: The Teradata Database Model
What is a Relational Model
What is a Relational Database
Set Theory and Relational Database Terminology
Tables, Rows, and Columns
Table Constraints
Permanent and Temporary Tables
Global Temporary Tables
Volatile Temporary Tables
Derived Tables
Rows and Columns
For More Information
Chapter 8: Data Distribution and Access Methods
Teradata Database Indexes
Primary Indexes
Primary Index Characteristics
How Are Primary Keys and Primary Indexes Related
Partitioned Primary Indexes
Non-partitioned Primary Indexes
How Do Partitioned and Non-Partitioned Primary Indexes Compare
Secondary Indexes
Secondary Index Subtables
How Do Primary and Secondary Indexes Compare
Join Indexes
Single-Table Join Indexes
Multi-Table Join Indexes
Aggregate Join Indexes
Sparse Join Indexes
Hash Indexes
Index Specification
Creating Indexes
Strengths and Weaknesses of Various Types of Indexes
Hashing
Identity Column
For More Information
Chapter 9: Data Dictionary
What is the Data Dictionary
Data Dictionary Content
What Is in a Data Dictionary Table
Teradata Database Data Dictionary Views
What Is in a View
Why Use Views
Who Uses Data Dictionary Views
SQL Access to the Data Dictionary
For More Information
Chapter 10: Teradata Meta Data Services
What Is Metadata
Types of Metadata
Teradata Meta Data Services
Creating the Teradata Meta Data Repository
Connecting to the Teradata Meta Data Repository
For More Information
Chapter 11: Other Database Objects
What Are Views
SQL Statements Related to Views
Restrictions on Using Views
What Are Teradata Stored Procedures
Why Use Stored Procedures
Elements of a Teradata Stored Procedure
What Are Macros
SQL Statements Related to Macros
Single-User and Multi-User Macros
Macro Processing
What Are Triggers
Types of Triggers
When Do Triggers Fire
ANSI-Specified Order
Trigger Functions
SQL Statements Related to Triggers
Elements of a Trigger
Restrictions on Triggers
For More Information
Section 3: Teradata Database System Operation
Chapter 12: Normalization and Referential Integrity
Normalization
Normal Forms
Relational Database Terminology
First, Second, and Third Normal Forms
First Normal Form
Second Normal Form
Third Normal Form
Advantages of Normalization
Boyce-Codd Normal Form and Higher Normal Forms
Boyce-Codd Normal Form
Fourth Normal Form
Fifth Normal Form
Referential Integrity
Referential Integrity in the Teradata Database
Referential Integrity Terminology
Referencing (Child) Table
Referenced (Parent) Table
Why Is Referential Integrity Important
Referential Integrity Constraints
Referential Constraints
Batch Referential Integrity
Rules for Referential Integrity Constraints
Referential Constraint Checks
For More Information
Chapter 13: Data Communication Between Client and Teradata Database
Attachment Methods
CLIv2 for Channel-Attached Systems
What CLIv2 for Channel-Attached Clients Does
Teradata Director Program
Server
CLIv2 for Network-Attached Systems
What CLIv2 for Network-Attached Clients Does
Micro Teradata Director Program
Micro Operating System Interface
Other Types of Data Communications
WinCLI
ODBC
JDBC
For More Information
Chapter 14: Reliability
Software Fault Tolerance
Vproc Migration
Fallback Tables
AMP Clusters
One-Cluster Configuration
Smaller Cluster Configuration
Journaling
Teradata Archive/Recovery
Table Rebuild Utility
Hardware Fault Tolerance
For More Information
Section 4: Management and Monitoring
Chapter 15: Concurrency Control and Transaction Recovery
What is Concurrency Control
What is Recovery
Concept of a Transaction
Definition of a Transaction
Definition of Serializability
Transaction Semantics
ANSI Mode Transactions
BEGIN TRANSACTION/END TRANSACTION Statements
Roll Back an ANSI Transaction
Teradata Mode Transactions
BEGIN TRANSACTION/END TRANSACTION Statements
Roll Back a Teradata Mode Transaction
Concept of a Lock
Overview of Teradata Database Locking
Why Do Database Management Systems Require Locking
Lock Levels
Levels of Lock Types
Automatic Database Lock Levels
Deadlocks and Deadlock Resolution
Host Utility Locks
HUT Lock Types
HUT Lock Characteristics
System and Media Recovery
System Restarts
Transaction Recovery
Down AMP Recovery
Two-Phase Commit Protocol
Definition of Participant
Definition of Coordinator
For More Information
Chapter 16: Database Management and Analysis Tools
Teradata Tools and Utilities - Archive Utilities
Teradata Archive/Recovery Utility
Open Teradata Backup
Teradata Tools and Utilities - Data Load and Export Utilities
Teradata MultiLoad
Teradata FastLoad
Teradata Parallel Data Pump
Teradata FastExport Utility
Database Management Tools
Teradata Database - Active Session and Configuration
System Resource Management
Teradata Database - Ferret Utility
Teradata Database - Priority Scheduler Utility
Teradata Tools and Utilities - Teradata Statistics Wizard
Teradata Database - Teradata Dynamic Query Manager
Teradata Database - Teradata MultiTool
Database Query Analysis Tools
Teradata Tools and Utilities - Teradata Index Wizard
What Can the Teradata Index Wizard Do
Demographics
Teradata Database - Query Capture Facility
QCD Schema Improvement
Teradata Index Wizard Support
Teradata Tools and Utilities - Teradata Visual Explain
Teradata Database - Database Query Log
Teradata Database - Target-Level Emulation
Teradata Tools and Utilities - Teradata System Emulation Tool
Teradata Database - Database Object Use Count
Query Facilities
Teradata Tools and Utilities - Basic Teradata Query Utility
BTEQ Support
BTEQ Communication
Teradata Tools and Utilities - Teradata SQL Assistant
Teradata Tools and Utilities - Preprocessor2
For More Information
Chapter 17: Security and Integrity
Security and Integrity
System Integrity
System Security
Resource Access Control
User Identifiers
Client Identifiers
Logon Policies
TDP Security
Single Sign On
Encryption
Network Data Encryption
Logon Encryption and the Teradata Gateway
Security Features
Password Attributes
User-Level Password Attributes
DBC.DBase Table
SQL Used to Control Logon
Data Access Control
Ownership and Implicit Rights
System Views for Access Information
Security Policies and Physical Access Control
Principal Considerations of a Security Policy
Key Implementation Elements of a Security Policy
Auditing and Accountability
For More Information
Chapter 18: System Administration
Space Allocation for Databases and Users
Databases and Users
How to Create a Finance and Administration Database
How to Create Databases
How to Create Users
Roles and Profiles for Users
Accounting
Session Management
Establishing a Session
Logon Operands
Session Requests
Account String Expansion
Account Performance Groups
Maintenance Utilities
For More Information
Chapter 19: System Monitoring
Teradata Manager
System and Configuration Status
Resource Usage Monitoring
Resource Usage Tables and Views
Resource Usage Data Categories
Resource Usage Data Handling
Resource Usage Macros
How to Control Collection and Logging of Resource Usage Data
Summary Mode
Performance Monitoring
The TDPTMON
System Management Facility
The Performance Monitor/Application Interface
For More Information
Index

Teradata Database

Introduction to Teradata Warehouse Teradata Database Release V2R5.1 Teradata Warehouse Release 7.1

B035-1091-083A November 2003

The product described in this book is a licensed product of NCR Corporation. BYNET is an NCR trademark registered in the U.S. Patent and Trademark Office. CICS, CICS/400, CICS/600, CICS/ESA, CICS/MVS, CICSPLEX, CICSVIEW, CICS/VSE, DB2, DFSMS/MVS, DFSMS/ VM, IBM, NQS/MVS, OPERATING SYSTEM/2, OS/2, PS/2, MVS, QMS, RACF, SQL/400, VM/ESA, and VTAM are trademarks or registered trademarks of International Business Machines Corporation in the U. S. and other countries. DEC, DECNET, MICROVAX, VAX and VMS are registered trademarks of Digital Equipment Corporation. HEWLETT-PACKARD, HP, HP BRIO, HP BRIO PC, and HP-UX are registered trademarks of Hewlett-Packard Co. KBMS is a trademark of Trinzic Corporation. INTERTEST is a registered trademark of Computer Associates International, Inc. MICROSOFT, MS-DOS, MSN, The Microsoft Network, MULTIPLAN, SQLWINDOWS, WIN32, WINDOWS, WINDOWS 2000, and WINDOWS NT are trademarks or registered trademarks of Microsoft Corporation. SAS, SAS/C, SAS/CALC, SAS/CONNECT, and SAS/CPE are registered trademarks of SAS Institute Inc. SOLARIS, SPARC, SUN and SUN OS are trademarks of Sun Microsystems, Inc. TCP/IP protocol is a United States Department of Defense Standard ARPANET protocol. TERADATA and DBC/1012 are registered trademarks of NCR International, Inc. UNICODE is a trademark of Unicode, Inc. UNIX is a registered trademark of The Open Group. X and X/OPEN are registered trademarks of X/Open Company Limited. YNET is a trademark of NCR Corporation. THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN “AS-IS” BASIS, WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OF IMPLIED WARRANTIES, SO THE ABOVE EXCLUSION MAY NOT APPLY TO YOU. 
IN NO EVENT WILL NCR CORPORATION (NCR) BE LIABLE FOR ANY INDIRECT, DIRECT, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS OR LOST SAVINGS, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. The information contained in this document may contain references or cross references to features, functions, products, or services that are not announced or available in your country. Such references do not imply that NCR intends to announce such features, functions, products, or services in your country. Please consult your local NCR representative for those features, functions, products, or services available in your country. Information contained in this document may contain technical inaccuracies or typographical errors. Information may be changed or updated without notice. NCR may also make improvements or changes in the products or services described in this information at any time without notice. To maintain the quality of our products and services, we would like your comments on the accuracy, clarity, organization, and value of this document. Please e-mail: [email protected] or write: Information Engineering NCR Corporation 100 North Sepulveda Boulevard El Segundo, CA 90245-4361 U.S.A. Any comments or materials (collectively referred to as “Feedback”) sent to NCR will be deemed non-confidential. NCR will have no obligation of any kind with respect to Feedback and will be free to use, reproduce, disclose, exhibit, display, transform, create derivative works of and distribute the Feedback and derivative works thereof without limitation on a royalty-free basis. Further, NCR will be free to use any ideas, concepts, know-how or techniques contained in such Feedback for any purpose whatsoever, including developing, manufacturing, or marketing products or services incorporating Feedback. Copyright © 1999–2003, NCR Corporation All Rights Reserved

Preface

Supported Software Release

This book supports Teradata® Database Release V2R5.1 and Teradata® Warehouse Release 7.1.

Changes to This Book

This book includes the following changes to support the current release:

November 2003

Changed the title from Introduction to Teradata RDBMS to Introduction to Teradata Warehouse.
Reorganized the book, placing chapters into sections so that information is easier to find.
Added the following features to the appropriate chapters:
• Hot Standby Nodes
• Disk I/O Integrity Checking
• SQL Functions
• User-Defined Functions
• Encryption
• Database Object Use Count
Added information about:
• Teradata Tools and Utilities
• Teradata Meta Data Services
Updated information in existing sections. This information is marked by change bars in the margins.

December 2002

Chapter 2
• Moved the “Data Communications” section to Chapter 6, and moved the updated features in the “Database Management Tools” section to Chapter 9
• Rewrote the following section: “Disk Arrays”
• Added the Cylinder Read feature to “The Teradata File System” section

Chapter 3
• Added the following new features: “Referential Constraints”, “Batch Referential Integrity”
• Added information about rows and columns to: “Derived Tables”, “Tables, Rows, and Columns”
• Moved the index information to the new Chapter 5: “Data Distribution and Access Methods.”

Chapter 4
• Added the following new sections: “What Are Macros”, “Table Constraints”, “Default Database”, “Identity Column”
• Updated the following sections: “SQL Data Types”, “Data Type Attributes”, “Data Manipulation”, “SELECT Statement and Set Operators”, “What Are Teradata Stored Procedures”
• Moved the following sections to Chapter 9: “Database Management and Analysis Tools”: “Basic Teradata Query Utility”, “Teradata RDBMS Preprocessor2”

Chapter 5
• Created Chapter 5: “Data Distribution and Access Methods” and consolidated index information in this chapter.
• Added the following new features: “Partitioned Primary Indexes”, “Sparse Join Indexes”
• Updated the following sections: “Join Indexes”, “Hashing”, “Index Specification”, “Strengths and Weaknesses of Various Types of Indexes”, “Secondary Indexes”

Chapter 6
• Updated the following section: “Summary of Data Dictionary Views”

Chapter 7
• Updated the following sections: “Stored Procedures as SQL Applications”, “The EXPLAIN Statement”, “Data Communications”, “CLIv2 for Channel-Attached Systems”, “CLIv2 for Network-Attached Systems”, “Other Types of Data Communications”

Chapter 8
• Added new chapter: Chapter 8: “International Language Support”

Chapter 9
• Added a new chapter titled Chapter 9: “Database Management and Analysis Tools.” The chapter contains updated descriptions of the database management tools from Chapter 2 and the new database query analysis tools in release V2R5.0.
• Added the following new tools: “Priority Scheduler Administrator”, “Teradata Index Wizard”, “Database Query Log”
• Added the following sections that were previously in Chapter 4: “Structured Query Language (SQL)”: “Basic Teradata Query Utility”, “Teradata RDBMS Preprocessor2”
• Updated the following sections: “Query Session”, “Teradata Dynamic Query Manager”, “Teradata SQL Assistant”

Chapter 10
• Updated the following sections: “Vproc Migration”, “Fallback Tables”, “Journaling”, “Teradata Archive/Recovery”, “BYNETs”, “RAID Disk Units”

Chapter 11
• Deleted Chapter 11 “System Maximum Capacities”. You can find this information in Teradata RDBMS SQL Reference, Volume 1. Chapter 11 is now titled Chapter 11: “Concurrency Control and Transaction Recovery”
• Updated the following section: “Deadlocks and Deadlock Resolution”

Chapter 12
• Added the following section: “User-Level Password Attributes”

Chapter 13
• Updated the following sections: “How to Create Databases”, “Account String Expansion”, “Account Performance Groups”, “Summary Mode”, “How to Control Collection and Logging of ResUsage Data”
• Added the following new section: “Roles and Profiles”

June 2001

Added the following features and enhancements:
• Single Sign On
• Hash Index
• UTF-8 character set support
• Integrated Database Query Manager
• Resource Check Tool
• 128 K data block support
• Increased number of global and volatile temporary tables
Updated the following features:
• Stored procedures
• Indexes
• Backup utilities

September 2000

• Updated the glossary. Removed references to UNISYS and KBMS/Intellect. Replaced the reference to Nomad2 with NOMAD.

June 2000

• COLLECT STATISTICS: Collected statistics are now stored in a spool table so that you can collect statistics at the same time you execute queries against the table.
• Fallback: You can now define join index subtables with fallback.
• Online Analytical Processing (OLAP): The operation of OLAP sampling is now optimized in the file system by only accessing the data blocks that contain the target row positions instead of scanning all the data blocks.
• PDE Tools Utility (NT Only): A new PDE tools utility allows you to run the ctl, DBS control, DIP, and Vproc Manager utilities from any TPA or non-TPA or from the AWS. You can also start, reset, or stop the PDEs.
• EXPLAIN: The size of the EXPLAIN text is now unlimited.
• Stored Procedures: Teradata has developed the Teradata Stored Procedures (TDSP) feature.

April 1999

• A new virtual BYNET driver has eliminated the need for the vnet driver on systems that have no BYNET hardware.
• Addition of the use of triggers with SQL statements.
• Addition of a join index to improve performance.
• Internationalization of Kanji characters.
• Increase in the maximum number of vprocs to 16k.
• Increase in the row size to 64 K.
• Addition of OLAP features for statistical functions, extended date/calendar capability, and sampling.
• Addition of Timestamping to the Data Dictionary.

About This Book

Purpose

This book provides an introduction to the Teradata Warehouse covering:
• Teradata Database and Teradata Tools and Utilities
• Teradata Database architecture and the relational model
• Applications and data communications
• Data definitions and data manipulation using Structured Query Language (SQL)
• System administration and security

Audience

This book is intended for users who interface with the Teradata Warehouse. Such individuals may include database users or administrators.

How This Book Is Organized

This book contains the following chapters:

Chapter 1: “Teradata Warehouse” provides an introduction to data warehouse concepts.

Section 1: “Teradata Warehouse Overview” contains the following chapters:
Chapter 2: “Teradata Warehouse Overview” describes the components of the Teradata Warehouse.
Chapter 3: “Teradata Database Architecture” describes the Teradata Database system hardware and its design.
Chapter 4: “International Language Support” describes the language support capabilities of the Teradata Database.

Section 2: “The Teradata Database Structure” contains the following chapters:
Chapter 5: “Structured Query Language (SQL)” provides information about how to use SQL to manipulate data in the Teradata Database.
Chapter 6: “Application Development” provides information about developing applications for the Teradata Database.
Chapter 7: “The Teradata Database Model” contains basic information about the tables, rows, and columns that make up the database model.
Chapter 8: “Data Distribution and Access Methods” provides information about distributing data to and retrieving data from the Teradata Database.
Chapter 9: “Data Dictionary” describes the structure and content of the system tables in the Data Dictionary.
Chapter 10: “Teradata Meta Data Services” provides information about Teradata Meta Data Services, which allows you to store, access, and administer metadata on the Teradata Database.
Chapter 11: “Other Database Objects” provides more information about views, macros, stored procedures, and triggers.

Section 3: “Teradata Database System Operation” contains the following chapters:
Chapter 12: “Normalization and Referential Integrity” describes the following:
• How normalization reduces complex data to simpler, stable data structures
• How referential integrity protects data
Chapter 13: “Data Communication Between Client and Teradata Database” describes how the client and the Teradata Database exchange information.
Chapter 14: “Reliability” describes how fault-tolerant hardware and software increase the reliability of the Teradata Database.

Section 4: “Management and Monitoring” contains the following chapters:
Chapter 15: “Concurrency Control and Transaction Recovery” describes the mechanisms that prevent concurrently operating sessions from damaging the data that resides within the Teradata Database.
Chapter 16: “Database Management and Analysis Tools” describes the tools that you can use to manage the hardware and software that make up the Teradata Database.
Chapter 17: “Security and Integrity” describes how to prevent unauthorized access to the information in the Teradata Database.
Chapter 18: “System Administration” describes space allocation, roles and profiles, accounting, and maintenance capabilities on the Teradata Database.
Chapter 19: “System Monitoring” describes the various aspects of monitoring the Teradata Database, including the monitoring tools used to track system performance.

Prerequisites

To gain an understanding of Teradata Warehouse, you should be familiar with the following:
• Basic computer technology
• System hardware
• Teradata Tools and Utilities

List of Acronyms

The following acronyms, listed in alphabetical order, are used in this book:

1NF           First Normal Form
2NF           Second Normal Form
2PC           Two-Phase Commit
3NF           Third Normal Form
4NF           Fourth Normal Form
5NF           Fifth Normal Form
AMP           Access Module Processor
ANSI          American National Standards Institute
API           Application Programming Interface
ARC           Teradata Archive/Recovery Utility
ASCII         American Standard Code for Information Interchange
ASE           Account String Expansion
AWS           Administration Workstation
AWT           AMP Worker Task
BCNF          Boyce-Codd Normal Form
BTEQ          Basic Teradata Query Facility
BYNET         Banyan Network (high-speed interconnect)
CICS          Customer Information Control System
CLIv2         Call-Level Interface, Version 2
CNS           Console Subsystem
DB2           DATABASE 2
DBC           Database Computer
DBQAT         Database Query Analysis Tools
DBQL          Database Query Log
DBS           Database System or Database Software
DBW           Database Window
DDE           Dynamic Data Exchange

DDL           Data Definition Language
DIP           Database Initialization Program
DML           Data Manipulation Language
DNS           Domain Name System
DSS           Decision Support System
EBCDIC        Extended Binary Coded Decimal Interchange Code
FIPS          Federal Information Processing Standards
GDO           Globally Distributed Object
HI            Hash Index
IBM           International Business Machines Corporation
ID            Identification
IMS           Information Management System
I/O           Input/Output
ISV           Independent Software Vendor
JBOD          Just a Bunch Of Disks
JI            Join Index
LAN           Local Area Network
LUN           Logical Unit
MDS           Meta Data Services
MIPS          Millions of Instructions Per Second
MOSI          Micro Operating System Interface
MPP           Massively Parallel Processing
MTDP          Micro Teradata Director Program
MVS           Multiple Virtual Storage
NPPI          Non-Partitioned Primary Index
NUPI          Non-Unique Primary Index
NUSI          Non-Unique Secondary Index
ODBC          Open Database Connectivity
OS/VS         Operating System/Virtual Storage
OTB           Open Teradata Backup
PDE           Parallel Database Extensions

PE            Parsing Engine
PI            Primary Index
PL/I          Programming Language 1
PJ/NF         Projection-Join Normal Form
PP2           Preprocessor2
PPI           Partitioned Primary Index
PUT           Parallel Upgrade Tool
QCD           Query Capture Database
QCF           Query Capture Facility
RAID          Redundant Array of Independent Disks
RCT           Resource Check Tools
RI            Referential Integrity
SCSI          Small Computer System Interface
SIA           Shared Information Architecture
SMP           Symmetric Multi-Processing
SNMP          Simple Network Management Protocol
SQL           Structured Query Language
SR            Single Request
SSO           Single Sign On
TCP/IP        Transmission Control Protocol/Internet Protocol
TDP           Teradata Director Program
TDSP          Teradata Stored Procedures
Teradata DQM  Teradata Dynamic Query Manager
TPA           Trusted Parallel Application
TS/API        Transparency Series/Application Program Interface
UPI           Unique Primary Index
USI           Unique Secondary Index
VM/CMS        Virtual Machine/Conversational Monitor System
VM            Virtual Machine
vproc         Virtual Processor
VS            Virtual Storage

Technical Information on the Web

The NCR home page (http://www.ncr.com) provides links to numerous sources of information about Teradata. Among the links provided are sites that deal with the following subjects:
• Contacting technical support
• Enrolling in customer education courses
• Ordering and downloading product documentation
• Accessing case studies of customer experiences with the Teradata Database
• Accessing third-party industry analyses of Teradata Warehouse products
• Accessing white papers
• Viewing or subscribing to various online periodicals

Introduction to Teradata Warehouse

Contents

Preface Supported Software Release ............................................................................................ i Changes to This Book ....................................................................................................... i About This Book ................................................................................................................vii List of Acronyms ................................................................................................................ ix Technical Information on the Web..................................................................................xii

Chapter 1: Teradata Warehouse What Is a Data Warehouse............................................................................................. 1–2 The Next Step for the Data Warehouse........................................................................ 1–3 Strategic Queries........................................................................................................... 1–3 Tactical Queries............................................................................................................. 1–3 Teradata Warehouse .................................................................................................... 1–3 Section 1:

Teradata Warehouse Overview

Chapter 2: Teradata Warehouse Overview What Is the Teradata Database...................................................................................... 2–2 Attachment Methods ................................................................................................... 2–2 How to Communicate with the Teradata Database Using SQL............................ 2–2 Purpose in Development................................................................................................ 2–3 Shared Information Architecture .................................................................................. 2–4 Teradata Database Server Software.............................................................................. 2–5 Parallel Upgrade Tool..................................................................................................... 2–6 What Are Teradata Tools and Utilities......................................................................... 2–7 For More Information ................................................................................................... 2–14

Chapter 3: Teradata Database Architecture
SMP and MPP Machines
The BYNET
Boardless BYNET
Disk Arrays
Logical Units
Pdisks and Vdisks
Cliques
Hot Standby Nodes
Virtual Processors
Parsing Engine
Access Module Processor
AMP Clusters
Parsing Engine Request Processing
The Dispatcher
The AMPs
Example: SQL Statement
Parallel Database Extensions
Trusted Parallel Applications
PDE and MPP Systems
Start and Stop PDE
The Teradata File System
Cylinder Read
Disk I/O Integrity Checking
Workstation Types and Available Platforms
System Console
Administration Workstation
Teradata Database Window
How the Database Window Communicates with Teradata Database
Running DBW
For More Information

Chapter 4: International Language Support
Character Set Overview
What Is a Repertoire
Character Representation
External and Internal Character Sets
Character Data Translation
What Teradata Database Supports
Teradata Database Character Data Storage
Internal Server Character Sets
User Data
System Dictionary Data
Language Support Modes
Default Character Set for User Data
Character Set for System Dictionary Data
Character Set for Dictionary Data Other Than Object Names
Standard Language Support Mode
LATIN Character Set
Compatible Languages
Japanese Language Support Mode
Advantages of Storing System Dictionary Data Using KANJI1
Advantages of Storing User Data Using UNICODE
Extended Support
For More Information

Section 2: The Teradata Database Structure

Chapter 5: Structured Query Language (SQL)
Why SQL
What is SQL
Data Definition Language
Data Control Language
Data Manipulation
SQL Data Types
Teradata and ANSI-Compliant Data Types
Data Type Attributes
Statement Punctuation
SQL Statements and Requests
The SELECT Statement
SELECT Statement and Set Operators
SELECT Statement and Joins
SQL Functions
Scalar Functions
Aggregate Functions
Ordered Analytical Functions
User-Defined Functions
Creating User-Defined Functions
SQL Statements Related to Functions
Cursors
For More Information

Chapter 6: Application Development
Types of SQL Development
Explicit SQL Development
Implicit SQL Development
Embedded SQL Applications
What Is Embedded SQL
How Does an Application Program Use Embedded SQL
Supported Languages and Platforms
Macros as SQL Applications
SQL Used to Create a Macro
Macro Usage
SQL Used to Modify a Macro
SQL Used to Delete a Macro
Teradata Stored Procedures as SQL Applications
SQL Used to Create Stored Procedures
Stored Procedure Example
SQL Used to Execute a Stored Procedure
The EXPLAIN Statement
How Is EXPLAIN Useful
EXPLAIN With Simple Join Index Example
Third-Party Development
TS/API Products
Compatible Third-Party Software Products
Performance Monitor/Application Programming Interface
For More Information

Chapter 7: The Teradata Database Model
What is a Relational Model
What is a Relational Database
Set Theory and Relational Database Terminology
Tables, Rows, and Columns
Table Constraints
Permanent and Temporary Tables
Global Temporary Tables
Volatile Temporary Tables
Derived Tables
Rows and Columns
For More Information

Chapter 8: Data Distribution and Access Methods
Teradata Database Indexes
Primary Indexes
Primary Index Characteristics
How Are Primary Keys and Primary Indexes Related
Partitioned Primary Indexes
Non-partitioned Primary Indexes
How Do Partitioned and Non-Partitioned Primary Indexes Compare
Secondary Indexes
Secondary Index Subtables
How Do Primary and Secondary Indexes Compare
Join Indexes
Single-Table Join Indexes
Multi-Table Join Indexes
Aggregate Join Indexes
Sparse Join Indexes
Hash Indexes
Index Specification
Creating Indexes
Strengths and Weaknesses of Various Types of Indexes
Hashing
Identity Column
For More Information

Chapter 9: Data Dictionary
What is the Data Dictionary
Data Dictionary Content
What Is in a Data Dictionary Table
Teradata Database Data Dictionary Views
What Is in a View
Why Use Views
Who Uses Data Dictionary Views
SQL Access to the Data Dictionary
For More Information

Chapter 10: Teradata Meta Data Services
What Is Metadata
Types of Metadata
Teradata Meta Data Services
Creating the Teradata Meta Data Repository
Connecting to the Teradata Meta Data Repository
For More Information

Chapter 11: Other Database Objects
What Are Views
SQL Statements Related to Views
Restrictions on Using Views
What Are Teradata Stored Procedures
Why Use Stored Procedures
Elements of a Teradata Stored Procedure
What Are Macros
SQL Statements Related to Macros
Single-User and Multi-User Macros
Macro Processing
What Are Triggers
Types of Triggers
When Do Triggers Fire
ANSI-Specified Order
Trigger Functions
SQL Statements Related to Triggers
Elements of a Trigger
Restrictions on Triggers
For More Information

Section 3: Teradata Database System Operation

Chapter 12: Normalization and Referential Integrity
Normalization
Normal Forms
Relational Database Terminology
First, Second, and Third Normal Forms
First Normal Form
Second Normal Form
Third Normal Form
Advantages of Normalization
Boyce-Codd Normal Form and Higher Normal Forms
Boyce-Codd Normal Form
Fourth Normal Form
Fifth Normal Form
Referential Integrity
Referential Integrity in the Teradata Database
Referential Integrity Terminology
Referencing (Child) Table
Referenced (Parent) Table
Why Is Referential Integrity Important
Referential Integrity Constraints
Referential Constraints
Batch Referential Integrity
Rules for Referential Integrity Constraints
Referential Constraint Checks
For More Information

Chapter 13: Data Communication Between Client and Teradata Database
Attachment Methods
CLIv2 for Channel-Attached Systems
What CLIv2 for Channel-Attached Clients Does
Teradata Director Program
Server
CLIv2 for Network-Attached Systems
What CLIv2 for Network-Attached Clients Does
Micro Teradata Director Program
Micro Operating System Interface
Other Types of Data Communications
WinCLI
ODBC
JDBC
For More Information

Chapter 14: Reliability
Software Fault Tolerance
Vproc Migration
Fallback Tables
AMP Clusters
One-Cluster Configuration
Smaller Cluster Configuration
Journaling
Teradata Archive/Recovery
Table Rebuild Utility
Hardware Fault Tolerance
For More Information

Section 4: Management and Monitoring

Chapter 15: Concurrency Control and Transaction Recovery
What is Concurrency Control
What is Recovery
Concept of a Transaction
Definition of a Transaction
Definition of Serializability
Transaction Semantics
ANSI Mode Transactions
BEGIN TRANSACTION/END TRANSACTION Statements
Roll Back an ANSI Transaction
Teradata Mode Transactions
BEGIN TRANSACTION/END TRANSACTION Statements
Roll Back a Teradata Mode Transaction
Concept of a Lock
Overview of Teradata Database Locking
Why Do Database Management Systems Require Locking
Lock Levels
Levels of Lock Types
Automatic Database Lock Levels
Deadlocks and Deadlock Resolution
Host Utility Locks
HUT Lock Types
HUT Lock Characteristics
System and Media Recovery
System Restarts
Transaction Recovery
Down AMP Recovery
Two-Phase Commit Protocol
Definition of Participant
Definition of Coordinator
For More Information

Chapter 16: Database Management and Analysis Tools
Teradata Tools and Utilities - Archive Utilities
Teradata Archive/Recovery Utility
Open Teradata Backup
Teradata Tools and Utilities - Data Load and Export Utilities
Teradata MultiLoad
Teradata FastLoad
Teradata Parallel Data Pump
Teradata FastExport Utility
Database Management Tools
Teradata Database - Active Session and Configuration
System Resource Management
Teradata Database - Ferret Utility
Teradata Database - Priority Scheduler Utility
Teradata Tools and Utilities - Teradata Statistics Wizard
Teradata Database - Teradata Dynamic Query Manager
Teradata Database - Teradata MultiTool
Database Query Analysis Tools
Teradata Tools and Utilities - Teradata Index Wizard
What Can the Teradata Index Wizard Do
Demographics
Teradata Database - Query Capture Facility
QCD Schema Improvement
Teradata Index Wizard Support
Teradata Tools and Utilities - Teradata Visual Explain
Teradata Database - Database Query Log
Teradata Database - Target-Level Emulation
Teradata Tools and Utilities - Teradata System Emulation Tool
Teradata Database - Database Object Use Count
Query Facilities
Teradata Tools and Utilities - Basic Teradata Query Utility
BTEQ Support
BTEQ Communication
Teradata Tools and Utilities - Teradata SQL Assistant
Teradata Tools and Utilities - Preprocessor2
For More Information


Chapter 17: Security and Integrity
Security and Integrity
System Integrity
System Security
Resource Access Control
User Identifiers
Client Identifiers
Logon Policies
TDP Security
Single Sign On
Encryption
Network Data Encryption
Logon Encryption and the Teradata Gateway
Security Features
Password Attributes
User-Level Password Attributes
DBC.DBase Table
SQL Used to Control Logon
Data Access Control
Ownership and Implicit Rights
System Views for Access Information
Security Policies and Physical Access Control
Principle Considerations of a Security Policy
Key Implementation Elements of a Security Policy
Auditing and Accountability
For More Information

Chapter 18: System Administration
Space Allocation for Databases and Users
Databases and Users
How to Create a Finance and Administration Database
How to Create Databases
How to Create Users
Roles and Profiles for Users
Accounting
Session Management
Establishing a Session
Logon Operands
Session Requests
Account String Expansion
Account Performance Groups
Maintenance Utilities
For More Information

Chapter 19: System Monitoring
Teradata Manager
System and Configuration Status
Resource Usage Monitoring
Resource Usage Tables and Views
Resource Usage Data Categories
Resource Usage Data Handling
Resource Usage Macros
How to Control Collection and Logging of Resource Usage Data
Summary Mode
Performance Monitoring
The TDPTMON
System Management Facility
The Performance Monitor/Application Interface
For More Information

Index


Chapter 1: Teradata Warehouse

This chapter presents an overview of the Teradata® Warehouse. Topics include:

• What is a data warehouse
• What is the next step in the development of the data warehouse

What Is a Data Warehouse

Originally, the data warehouse was a historical database containing data derived from an active operational database. The data in the warehouse was:

• Subject-oriented
• Integrated
• Identified by a timestamp
• Nonvolatile, that is, nothing was added or removed

Rows in the tables supporting the operational database were loaded into a historical database (the data warehouse) after they exceeded some well-defined date. To support this capability, the data in the data warehouse contained a timestamp, which distinguished it from the data in the tables of the operational database.
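The load rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `OperationalRow` and `WarehouseRow` types, the `archive_old_rows` function, and the column names are hypothetical and are not part of any Teradata product. The sketch moves rows older than a cutoff date out of the operational set and stamps each one with a load timestamp.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class OperationalRow:
    """Hypothetical row in an operational table."""
    order_id: int
    order_date: date
    amount: float


@dataclass(frozen=True)
class WarehouseRow:
    """The same row after loading, identified by a timestamp."""
    order_id: int
    order_date: date
    amount: float
    loaded_at: datetime


def archive_old_rows(operational, cutoff, loaded_at):
    """Split rows at the cutoff date.

    Rows dated before the cutoff become timestamped warehouse rows;
    the remaining rows stay in the operational set.
    """
    still_active = [row for row in operational if row.order_date >= cutoff]
    archived = [
        WarehouseRow(row.order_id, row.order_date, row.amount, loaded_at)
        for row in operational
        if row.order_date < cutoff
    ]
    return still_active, archived
```

The timestamp column is what distinguishes an archived row from its operational counterpart, which matches the role the text assigns to it.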

The Next Step for the Data Warehouse

The concept of active data warehousing evolved as part of the data warehouse environment. The data warehouse was an enterprise-wide, centralized database that stored information gathered from operational databases. This data was typically used to make strategic business decisions. The active concept takes the traditional data warehouse one step further by allowing you to ask questions that produce answers that are important not only to strategic decision making but to tactical decision making as well.

Strategic Queries

Strategic queries are used when taking a proactive approach to the future. They can produce information that you can use to develop a cohesive long-term plan or course of action. The stored data that supports strategic queries must be historical in nature so that it provides a fair representation of what has happened in the past. Strategic queries involve processing large volumes of data, and because the end result will provide information that is used in the long term, response time becomes less critical. Queries written to support the strategic decision-making process are typically ad hoc and are generally not repeated.

Tactical Queries

Tactical queries are also useful in preparing for the future, specifically the near future, in that they are reactive and event driven. Tactical queries have some of the data requirements of strategic queries, in that they often act on historical information. Because strategic queries provide information that supports long-term decision making, the data need not be the latest. Because tactical queries support short-term decisions, the data from which tactical queries derive answers must be current or fresh. How fresh the data must be depends on the questions you ask.

Teradata Warehouse

In the active environment of the Teradata Warehouse, data is captured from many sources, for example, customer orders, inventory and shipping applications, direct mail and e-mail, phone calls, and so forth. This data is stored in the Teradata Warehouse along with data from operational systems and other sources. Teradata provides utilities that load data in a timely, continuous fashion or in batch loads. This data provides a single source, or one version of the truth, for all those who seek information from the warehouse.

The Teradata Warehouse encompasses not only the Teradata Database, the information repository, but also Teradata Tools and Utilities, a comprehensive suite of management tools and utilities.

The suite is organized into the following functional categories:

| The following category of utility… | Is used… |
| --- | --- |
| mainframe | in a mainframe environment. |
| Teradata Utility Pak | in a network-attached environment. |
| Teradata preprocessors | to access the Teradata Database by interpreting Teradata SQL statements in C or COBOL programs. |
| load and unload | to load data into and unload data from the Teradata Database. |
| database management utilities | to control the Teradata Database. |
| query analysis tools | to analyze the performance of the Teradata Database and improve the efficiency of the queries run against it. |
| storage management | to back up and restore data on the Teradata Database. |
| Teradata Meta Data Services | to store, administer, and navigate the metadata in a Teradata Warehouse. |

Section 1: Teradata Warehouse Overview

Chapter 2: Teradata Warehouse Overview

This chapter presents an overview of the Teradata Warehouse and its components. Topics include:

• A definition of the Teradata Database
• Purpose in development
• Shared information architecture
• Teradata Database server software
• Parallel Upgrade Tool
• Teradata Tools and Utilities

What Is the Teradata Database

The Teradata Warehouse evolved from the concept of an enterprise-wide, centralized database that was used to store information gathered from operational databases. The Teradata Database hardware and software and Teradata Tools and Utilities provide a complete relational database management system to support the active data warehouse concept.

Attachment Methods

To support its role in the active environment, the Teradata Database can use either of two attachment methods to connect to other operational computer systems as illustrated in the following table:

| This attachment method… | Allows the system to be attached… |
| --- | --- |
| channel | directly to an I/O channel of a mainframe computer. |
| network | to intelligent workstations through a Local Area Network (LAN). |

How to Communicate with the Teradata Database Using SQL

Structured Query Language (SQL) is the language of relational database communication. To manipulate data in the Teradata Database, you issue the appropriate SQL statement. With the Teradata Database, you can access, store, and operate on data using Teradata Structured Query Language (Teradata SQL). Teradata SQL, which is broadly compatible with IBM and ANSI SQL, extends the capabilities of SQL by adding Teradata-specific extensions to the generic SQL statements. For more information about SQL, see Chapter 5: “Structured Query Language (SQL).”

When you develop applications for the Teradata Database, you should use the most current Teradata SQL syntax because it is the most ANSI-compliant. Teradata SQL still supports older applications written in previous non-ANSI-compliant versions of Teradata SQL. You can run transactions in either Teradata or ANSI mode and these modes can be set or changed.

Teradata has an international customer base. To accommodate communications in different languages, Teradata supports non-Latin character sets, for example, Japanese, Chinese, and so forth. For detailed information about international character set support, see Chapter 4: “International Language Support.”

Users of the client systems send requests to the Teradata Database through a choice of supported utilities and interfaces. For information about the interfaces, see “What Are Teradata Tools and Utilities” on page 2-7 and “Teradata Database Server Software” on page 2-5.

Purpose in Development

Teradata has designed a database that allows users to view and manage large amounts of data as a collection of related tables. Some of the capabilities of the Teradata Database are listed in the following table:

| Teradata Database provides… | That… |
| --- | --- |
| capacity | includes terabytes of detailed data stored in billions of rows, and thousands of millions of instructions per second (MIPS) to process data. |
| parallel processing | makes Teradata Database faster than other relational systems. |
| single data store | can be accessed by network-attached and channel-attached systems, and supports the requirements of many diverse clients. |
| fault tolerance | automatically detects and recovers from hardware failures. |
| data integrity | ensures that transactions either complete or roll back to a stable state if a fault occurs. |
| scalable growth | allows expansion without sacrificing performance. |
| SQL | serves as a standard access language that permits users to control data. |

Teradata developers designed the Teradata Database from mostly off-the-shelf hardware components. The result was an inexpensive, high-quality system that exceeded the performance of conventional relational database management systems. The hardware components of the Teradata Database evolved from those of a simple database machine into those of a general-purpose, massively parallel computer running the database software as a trusted parallel application (TPA). The architecture includes both single-node, symmetric multi-processing (SMP) systems and multi-node, massively parallel processing (MPP) systems in which the distributed functions communicate by means of a fast interconnect structure. The interconnect structure in the current architecture is the BYNET for MPP systems and the boardless BYNET for SMP systems.

Shared Information Architecture

A design goal of the Teradata Database was to provide a single data store for a variety of client architectures. This single source approach greatly reduces data duplication and inaccuracies that can creep into data that is maintained in multiple stores. This approach to data storage is known as the single version of the truth, and Teradata used Shared Information Architecture (SIA) to implement the database. SIA eliminates the need for maintaining duplicate databases on multiple platforms.

With the SIA, most mainframe clients, network-attached workstations, and personal computers can access and manipulate the same database simultaneously. The following figure illustrates the principle of the SIA. In this figure the mainframes are attached via channel connections and other systems are attached via network connections.

[Figure: Shared Information Architecture. An IBM MVS mainframe, an IBM VM mainframe, and, over a Local Area Network, a personal computer (running Windows) and a UNIX workstation all access the Teradata Database single data store.]

Teradata Database Server Software

Teradata Database program software resides on the server and implements the relational database environment. The server software includes the following functional modules:

Teradata Database Server Software

| This module… | Provides… |
| --- | --- |
| Database Window | a tool that you can use to control the operation of the Teradata Database. |
| Teradata Gateway | communications support. The server-resident program provides a pathway for applications running on network-attached clients to access the Teradata Database. The Teradata Gateway runs as a separate operating system task. The Gateway software validates messages from clients that generate sessions over the network and it controls encryption. |
| Parallel Database Extensions (PDE) | a software interface layer on top of the operating system that enables the database to operate in a parallel environment. For more information about PDE, see “Parallel Database Extensions” on page 3-15. |
| Teradata Database management software | the following modules: Request dispatcher, Session controller, Access module processor (AMP), and Teradata file system. For more information about the Teradata file system, see “The Teradata File System” on page 3-16. |
| Parsing Engine | the following modules: Parser, Optimizer, Step Generator, and Dispatcher. |

Parallel Upgrade Tool

The Parallel Upgrade Tool (PUT) automates much of the installation process for Teradata Database software. There are two major operational modes for PUT:

| The operational mode… | Does the following… |
| --- | --- |
| Major upgrade | upgrades one or more software products to the next version. |
| Patch upgrade | applies patch packages to one or more software products. |

What Are Teradata Tools and Utilities

Teradata Database runs with or without a channel- or network-attached client. Teradata Tools and Utilities is a comprehensive suite of management tools and utilities designed to operate in the client environment. The information in the following tables describes the available Teradata Tools and Utilities that can be installed on the client, recognizing that the client may be the computer system that runs the Teradata Database program software as well.

The following table contains information about the utilities available for use on channel-attached mainframe clients:

Mainframe Utilities

| This package… | Provides… | For… |
| --- | --- | --- |
| Basic Teradata Query (BTEQ) | an interactive and batch query processor/report generator | channel-attached clients. |
| Customer Information Control System (CICS) | an interface that enables CICS macro or command-level application programs to access Teradata Database resources | channel-attached clients. |
| Host Utility Consoles (HUTCNS) | access to a number of AMP-based utilities | channel-attached clients. |
| IBM IMS/DC | an Information Management System (IMS) interface to the Teradata Database | channel-attached clients. |
| Teradata Archive/Recovery Utility | a means to save and restore data | channel-attached clients. |
| Teradata Call-Level Interface Version 2 (CLIv2) | a collection of callable service routines that provide the interface between applications and the Teradata Gateway. The Gateway is the interface between CLI and the server | channel-attached clients. |
| Teradata Director Program (TDP) | a high-performance interface for messages sent between the client and the Teradata Database | channel-attached clients. |
| Teradata C, COBOL, and PL/I Preprocessor2 (PP2) | a method of accessing data stored in the Teradata Database. Preprocessor2 interprets and expands Teradata SQL statements incorporated into an application program. | channel-attached clients. |
| Teradata Transparency Series/Application (TS/API) | gateway services allowing products that access either DB2 or SQL/DS databases to access data stored on the Teradata Database | channel-attached clients. |

The following table contains information about the Teradata Tools and Utilities available for use by channel- and network-attached clients:

Teradata Utility Pak

| This package… | Provides… | For… |
| --- | --- | --- |
| BTEQ | an interactive and batch query processor/report generator | channel- and network-attached clients. |
| ODBC | access to the Teradata Database from various tools, increasing the portability of access | network-attached clients. |
| OLE DB provider | an interface for accessing and manipulating all types of data | network-attached clients. |
| Teradata Administrator | an interface that you can use to perform database administration tasks | network-attached clients. |
| Teradata Call-Level Interface Version 2 (CLIv2) | callable service routines that provide the interface between applications and the Teradata Gateway. Teradata Gateway is the interface between CLI and the server. | channel- and network-attached clients. |
| Teradata Driver for JDBC Interface | platform-independent, Java-application access to the Teradata Database from various tools, increasing portability of data | network-attached clients. |
| Teradata MultiTool | an interface to various Teradata Database utilities | network-attached clients. |
| Teradata SQL Assistant | a means of retrieving data from any ODBC-compliant database server and of manipulating and storing the data on your desktop PC | network-attached clients. |

Teradata Tools and Utilities provides tools that you can use to develop applications that access the Teradata Database by interpreting SQL statements in C, COBOL, or Programming Language 1 (PL/I) programs. The following table contains information about available preprocessors for use by channel- and network-attached clients:

Teradata Preprocessors - Application Development

| This package… | Provides… | For… |
| --- | --- | --- |
| Teradata COBOL Preprocessor | a mechanism for embedding SQL in COBOL programs | channel-attached clients and some network-attached clients. |
| Teradata C Preprocessor | a mechanism for embedding SQL in C programs | channel- and network-attached clients. |
| Teradata PL/I Preprocessor | a mechanism for embedding SQL in PL/I programs | channel-attached clients. |

The following table contains information about the load and unload utilities available for use by channel- and network-attached clients:

Load and Unload Utilities

| This package… | Provides… | For… |
| --- | --- | --- |
| Data Connector | a block-level I/O interface to one or more access modules that interface to a data storage device | channel- and network-attached clients. |
| Teradata FastExport | a means of extracting large volumes of data from the Teradata Database | channel- and network-attached clients. |
| Teradata FastLoad | high-performance data loading from client files into empty tables | channel- and network-attached clients. |
| Teradata MultiLoad | high-performance data maintenance, including inserts, updates, and deletions to existing tables | channel- and network-attached clients. |
| Teradata Tools and Utilities Access Modules | a block-level I/O interface to data residing on a specific external data storage device | channel- and network-attached clients. |
| Teradata TPump | continuous update of tables; performs insert, update, and delete operations or a combination of these operations on multiple tables using the same source feed | channel- and network-attached clients. |
| Teradata Warehouse Builder | a means to load data into and export data from any accessible database in the Teradata Database or other data store for which an access operator or an access module exists | channel- and network-attached clients. |

The following table contains information about the database management tools available for use by channel- and network-attached clients:

Database Management Utilities

| This utility… | Provides… | For… |
| --- | --- | --- |
| Teradata Dynamic Query Manager | a means to manage access to and use of the Teradata Database resources. | channel- and network-attached clients. |
| Teradata Manager | a graphical-based systems management platform containing a suite of specialized tools and applications for monitoring and controlling Teradata Database resource usage on one or more systems | network-attached clients. |
| Teradata Performance Monitor | an orderly presentation of performance, usage, status, contention, and availability data for Teradata Database at the overall, resource, and session levels | network-attached clients. |

The following table contains information about Teradata Database Query Analysis Tools (DBQAT) for use by network-attached clients:

Database Query Analysis Tools

| This utility… | Provides… | For… |
| --- | --- | --- |
| Teradata Index Wizard | analyses of various SQL query workloads and suggestions of candidate indexes to enhance performance of those queries | network-attached clients. |
| Teradata Statistics Wizard | automation for collecting workload statistics, or selecting recommended indexes or columns for statistics collection or re-collection | network-attached clients. |
| Teradata System Emulation Tool | the capability to examine the query plans generated by the test system optimizer as if the queries were processed on the production system | network-attached clients. |
| Teradata Visual Explain | a simplified depiction of the execution plan of complex SQL statements | network-attached clients. |

The following table contains information about the storage management utilities available for use by channel- and network-attached clients:

Storage Management Utilities

| This utility… | Provides… | For… |
| --- | --- | --- |
| Archive/Recovery | a means of archiving data to tape and restoring tape data to the Teradata Database | channel-attached clients. |
| Open Teradata Backup (OTB), which includes NetVault and NetBackup | open architecture products for backup and restore functions for Microsoft Windows clients | network-attached clients. |

Note: Contact Teradata Global Sales Support for information about the controlled distribution of NetBackup.

For More Information

For more information on the topics presented in this chapter, see the following Teradata Database and Teradata Tools and Utilities books:

| IF you want to learn more about… | THEN see… |
| --- | --- |
| Archive utilities | Teradata Archive/Recovery Utility Reference |
| BTEQ | Basic Teradata Query Reference |
| Communication using CLIv2 | Teradata Call-Level Interface Version 2 Reference for Channel-Attached Systems; Teradata Call-Level Interface Version 2 Reference for Network-Attached Systems |
| Database Query Log | Database Administration; Data Dictionary; Performance Optimization; SQL Reference: Data Definition Statements; SQL Reference: Statement and Transaction Processing |
| Embedded SQL | Teradata Preprocessor2 for Embedded SQL Programmer Guide; SQL Reference: Stored Procedures and Embedded SQL |
| General Teradata Database architecture | Database Design |
| JDBC | Teradata Driver for the JDBC Interface User Guide |
| Load and unload utilities | Teradata FastExport Reference; Teradata FastLoad Reference; Teradata MultiLoad Reference; Teradata Parallel Data Pump Reference |
| ODBC | Teradata ODBC Driver User Guide |
| Parallel Upgrade Tool | Parallel Upgrade Tool (PUT) for MP-RAS User Guide; Parallel Upgrade Tool (PUT) for Windows NT and Windows 2000 User Guide |
| Preprocessor2 | Teradata Preprocessor2 for Embedded SQL Programmer Guide |
| Priority Scheduler | Utilities - Volume 2, G-S |
| Query Capture Database | Database Design; Teradata Manager User Guide; SQL Reference: Data Definition Statements; SQL Reference: Statement and Transaction Processing |
| SQL syntax and lexicon | SQL Reference: Fundamentals |
| Teradata Database utilities | Utilities |
| Teradata Director Program | Teradata Director Program Reference |
| Teradata Dynamic Query Manager | Teradata Dynamic Query Manager Administrator Guide; Teradata Dynamic Query Manager User Guide |
| Teradata Index Wizard | Teradata Index Wizard User Guide |
| Teradata Manager | Teradata Manager User Guide |
| Teradata SQL Assistant | Teradata SQL Assistant for Microsoft Windows User Guide |
| Teradata Statistics Wizard | Teradata Statistics Wizard User Guide |
| Teradata System Level Emulation | Database Design; Teradata System Emulation Tool User Guide |
| Teradata Visual Explain | Teradata Visual Explain User Guide |
| TS/API products | Teradata Transparency Series/Application Programming Interface User Guide |


Chapter 3: Teradata Database Architecture

This chapter briefly describes the Teradata Database hardware components and software architecture. The hardware that supports Teradata Database software is based on off-the-shelf Symmetric Multiprocessing (SMP) technology. The hardware can be combined with a communications network that connects the SMP systems to form Massively Parallel Processing (MPP) systems. Topics include:

• SMP and MPP platforms and the BYNET
• Disk arrays
• Cliques
• Hot standby nodes
• Virtual processors
• Request processing
• Parallel Database Extensions
• Teradata file system
• Workstations
• Teradata Database Window

Introduction to Teradata Warehouse

3–1

SMP and MPP Machines

The components of the SMP and Massively Parallel Processing (MPP) hardware platforms are:

Component: Processor Node
Description: A hardware assembly containing several tightly coupled Central Processing Units (CPUs) in an SMP configuration. A single processor node is connected to one or more disk arrays with the following installed on the node:
• Teradata Database software
• Client interface software
• Operating system
• Multiple processors with shared memory
• Failsafe power provisions
Note: An MPP is a configuration of two or more loosely coupled SMP nodes with shared SCSI access to multiple disk arrays.
Function: Serves as the hardware platform upon which the database software operates.

Component: BYNET
Description: Hardware interprocessor network to link nodes on an MPP system.
Note: Single-node SMP systems use a software-configured virtual BYNET driver to implement BYNET services.
Function: Implements broadcast, multicast, or point-to-point communication between processors, depending on the situation.

These platforms use virtual processors that run a set of software processes on a node under the Parallel Database Extensions (PDE). Virtual processors (vprocs) provide the parallel environment that enables the Teradata Database to run on SMP and MPP systems.

For more information about the PDE and vprocs, see the following sections in this chapter:

• “Parallel Database Extensions” on page 3-15
• “Virtual Processors” on page 3-8

The BYNET

At the most elementary level, you can look at the BYNET as a bus that loosely couples all the SMP nodes in a multinode system. However, this view does an injustice to the BYNET, because the capabilities of the network range far beyond those of a simple system bus. The BYNET also possesses high-speed logic arrays that provide bidirectional broadcast, multicast, and point-to-point communication and merge functions.

A multinode system has at least two BYNETs. This creates a fault-tolerant environment and enhances interprocessor communication. Load-balancing software optimizes the transmission of messages over the BYNETs. If one BYNET should fail, the second can handle the traffic.

The total bandwidth for each network link to a processor node is 10 megabytes. The total throughput available for each node is 20 megabytes, because each node has two network links and the bandwidth is linearly scalable. For example, a 16-node system has 320 megabytes of bandwidth for point-to-point connections. The total available broadcast bandwidth for any size system is 20 megabytes.

The BYNET software also provides a standard TCP/IP interface for communication among the SMP nodes. The following figure shows how the BYNET connects individual SMP nodes to create an MPP system.

[Figure GG01B002: The BYNET interconnect links the SMP nodes, which attach through SCSI buses to the disk arrays.]
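The bandwidth figures quoted for the BYNET can be checked with a few lines of arithmetic. The following is a minimal sketch; the constant and function names are hypothetical and are not part of any Teradata tool, while the numbers themselves (10 megabytes per link, two links per node, 20 megabytes of broadcast bandwidth) are taken from the text above.

```python
# Illustrative arithmetic only. The names are hypothetical; the figures come from the text.
LINK_BANDWIDTH_MB = 10       # bandwidth of one network link to a processor node
LINKS_PER_NODE = 2           # each node has two network links
BROADCAST_BANDWIDTH_MB = 20  # stated as fixed for a system of any size


def node_throughput_mb() -> int:
    """Total throughput available to a single node."""
    return LINK_BANDWIDTH_MB * LINKS_PER_NODE


def point_to_point_bandwidth_mb(node_count: int) -> int:
    """Aggregate point-to-point bandwidth, which scales linearly with node count."""
    return node_count * node_throughput_mb()


def broadcast_bandwidth_mb(node_count: int) -> int:
    """Broadcast bandwidth does not grow with the number of nodes."""
    return BROADCAST_BANDWIDTH_MB


if __name__ == "__main__":
    print(point_to_point_bandwidth_mb(16))  # the 16-node example from the text
    print(broadcast_bandwidth_mb(16))
```

The contrast between the two functions is the point: point-to-point capacity grows with every node added, while broadcast capacity stays constant.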


Boardless BYNET

Single-node SMP systems use Boardless BYNET (or virtual BYNET) software to simulate the BYNET hardware driver. Both the SMP and MPP machines run the set of software processes called vprocs on a node under the Parallel Database Extensions (PDE) software layer. For more information about the PDE, see “Parallel Database Extensions” on page 3-15.

Vprocs come in two types: Access Module Processors (AMPs) and Parsing Engines (PEs). For more detailed information on vprocs, see “Virtual Processors” on page 3-8.


Disk Arrays

Teradata employs Redundant Array of Independent Disks (RAID) storage technology to provide data protection at the disk level. You use the RAID Manager to group disk drives into arrays to ensure that data is available in the event of a disk failure. Each array typically consists of one to four ranks of disks, with up to five disks per rank. Redundant implies that data, functions, or components are duplicated in the architecture of the array.

Logical Units

The RAID Manager uses drive groups. A drive group is a set of drives that have been configured into one or more Logical Units (LUNs). A LUN is a portion of every drive in a drive group. This portion is configured to represent a single disk. Each LUN is uniquely identified and, on NCR UNIX MP-RAS systems, is sliced into one or more UNIX slices.

The operating system recognizes a LUN as a single disk and is not aware that it is actually writing to spaces on multiple disk drives. This technique allows RAID technology to provide data availability without affecting the operating system. The PDE translates LUNs into virtual disks (vdisks) using slices (in NCR UNIX MP-RAS) or partitions (in Microsoft Windows 2000).

Pdisks and Vdisks

A pdisk is the portion of a LUN that is assigned to an AMP. For information about the role that AMPs play in the Teradata Database architecture, see “Virtual Processors” on page 3-8. Each pdisk is uniquely identified and independently addressable. The group of pdisks assigned to an AMP is collectively identified as a vdisk. Using vdisks instead of direct connections to physical disk drives permits the use of RAID technology without affecting Teradata Database.


Cliques

The clique is a feature of multinode systems that physically groups nodes together by multiported access to common disk array units. Inter-node disk array connections are made using SCSI buses. Shared SCSI-II paths enable redundancy to ensure that loss of a processor node or disk controller does not limit data availability. The nodes do not share data. They only share access to the disk arrays.

The following figure illustrates a four-node clique.

[Figure GG01A003: A four-node clique. Node 1, Node 2, Node 3, and Node 4 are connected by SCSI to a shared disk array (labeled DAC).]

A clique is the mechanism that supports the migration of vprocs, the AMPs and PEs under PDE, following a node failure. If a node in a clique fails, then AMP and PE vprocs migrate to other nodes in the clique and continue to operate while recovery occurs on their home node. PEs for channel-attached hardware cannot migrate because they are dependent on the hardware that is physically attached to the node to which they are assigned. PEs for LAN-attached connections do migrate when a node failure occurs, as do all AMPs.
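The migration rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the PDE's actual algorithm: the `Vproc` class, the `migrate_vprocs` function, and the round-robin placement of migrating vprocs are all hypothetical. Only the rule itself (all AMPs and LAN-attached PEs migrate, channel-attached PEs do not) comes from the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Vproc:
    name: str
    kind: str                       # "AMP" or "PE"
    channel_attached: bool = False  # only meaningful for PEs


def migrate_vprocs(clique: Dict[str, List[Vproc]], failed_node: str) -> List[Vproc]:
    """Move vprocs from failed_node to the surviving nodes of the same clique.

    Returns the vprocs that cannot migrate (channel-attached PEs).
    Round-robin placement is an assumption made for this sketch.
    """
    survivors = [node for node in clique if node != failed_node]
    if not survivors:
        raise ValueError("no surviving node in the clique")

    stranded: List[Vproc] = []
    movable: List[Vproc] = []
    for vproc in clique[failed_node]:
        if vproc.kind == "PE" and vproc.channel_attached:
            stranded.append(vproc)  # tied to hardware on the failed node
        else:
            movable.append(vproc)   # AMPs and LAN-attached PEs

    for index, vproc in enumerate(movable):
        clique[survivors[index % len(survivors)]].append(vproc)

    clique[failed_node] = stranded
    return stranded
```

Calling `migrate_vprocs` with a clique in which one node hosts two AMPs, a LAN-attached PE, and a channel-attached PE moves the first three to the surviving nodes and returns the channel-attached PE as unable to migrate.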


Hot Standby Nodes

The Hot Standby Node feature allows spare nodes to be incorporated into the production environment so that the Teradata Database can take advantage of the presence of the spare nodes to improve availability.

A hot standby node is a node that:

• Is a member of a clique
• Does not normally participate in the trusted parallel application (TPA)
• Can be brought into the TPA to compensate for the loss of a node in the clique

[Figure 1091A001: A clique consisting of Node 1, Node 2, Node 3, and a hot standby node, all connected by SCSI to a shared disk array (labeled DAC).]

Configuring a hot standby node can eliminate the system-wide performance degradation associated with the loss of a single node in a single clique. When a node fails, the Hot Standby Node feature migrates all AMP and PE vprocs on the failed node to other nodes in the system, including the node that you have designated as the hot standby. The hot standby node becomes a production node. When the failed node returns to service, it becomes the new hot standby node.

Configuring hot standby nodes eliminates:

• Restarts that are required to bring a failed node back into service
• The degraded service period when vprocs have migrated to other nodes in a clique


Virtual Processors

The versatility of the Teradata Database is based on virtual processors (vprocs) that eliminate dependency on specialized physical processors. Vprocs are a set of software processes that run on a node under the Teradata Parallel Database Extensions (PDE) within the multitasking environment of the operating system.

The following table contains information about the two types of vprocs:

Type | Description
PE | The PE performs session control and dispatching tasks as well as parsing functions.
AMP | The AMP performs database functions to retrieve and update data on the vdisks.

A single system can support a maximum of 16,384 vprocs. The maximum number of vprocs per node can be as high as 128. Each vproc is a separate, independent copy of the processor software, isolated from other vprocs, but sharing some of the physical resources of the node, such as memory and CPUs. Multiple vprocs can run on an SMP platform or a node.

Vprocs and the tasks running under them communicate using unique-address messaging, as if they were physically isolated from one another. This message communication is done using the Boardless BYNET Driver software on single-node platforms or BYNET hardware and BYNET Driver software on multinode platforms.

Parsing Engine

The PE is the vproc that communicates with the client system on one side and with the AMPs (via the BYNET) on the other side. Each PE executes the database software that manages sessions, decomposes SQL statements into steps, possibly parallel, and returns the answer rows to the requesting client.


The PE software consists of the following elements:

Parsing Engine Elements | Process
Parser | Decomposes SQL into relational data management processing steps.
Optimizer | Determines the most efficient path to access data.
Generator | Generates and packages steps.
Dispatcher | Receives processing steps from the parser and sends them to the appropriate AMPs. Monitors the completion of steps and handles errors encountered during processing.
Session Control | Manages session activities, such as logon, password validation, and logoff. Recovers sessions following client or server failures.

Access Module Processor

The AMP is the heart of the Teradata Database. The AMP is a vproc that controls the management of the Teradata Database and the disk subsystem, with each AMP being assigned to a vdisk.

AMP functions include: database management tasks
For example:
• Accounting
• Journaling
• Locking tables, rows, and databases
• During query processing: sorting, joining data rows, and aggregation
• Output data conversion

AMP functions include: file-system management
For example:
• Disk space management

Each AMP, as represented in the following figure, manages a portion of the physical disk space. Each AMP stores its portion of each database table within that space.

[Figure: Two Parsing Engines connect through the BYNET to four AMPs, each of which manages its own disk storage.]

AMP Clusters

AMPs are grouped into logical clusters to enhance the fault-tolerant capabilities of the Teradata Database. For more information on this method of creating additional fault tolerance in a system, see Chapter 14: “Reliability.”


Parsing Engine Request Processing

SQL is the language that you use to make requests of the Teradata Database. The SQL parser handles all incoming SQL requests. It processes an incoming request as follows:

Stage 1: The Parser looks in the Request cache to determine if the request is already there.
• IF the request is in the Request cache, THEN the Parser reuses the plastic steps found in the cache and passes them to gncApply. Go to stage 8 after checking access rights (stage 4). Plastic steps are directives to the database management system that do not contain data values.
• IF the request is not in the Request cache, THEN the Parser begins processing the request with the Syntaxer.

Stage 2: The Syntaxer checks the syntax of an incoming request.
• IF there are no errors, THEN the Syntaxer converts the request to a parse tree and passes it to the Resolver.
• IF there are errors, THEN the Syntaxer passes an error message back to the requestor and stops.

Stage 3: The Resolver adds information from the Data Dictionary (or cached copy of the information) to convert database, table, view, stored procedure, and macro names to internal identifiers.

Stage 4: The security module checks access rights in the Data Dictionary.
• IF the access rights are valid, THEN the security module passes the request to the Optimizer.
• IF the access rights are not valid, THEN the security module aborts the request, returns an error message, and stops.

Stage 5: The Optimizer determines the most effective way to implement the SQL request.

Stage 6: The Optimizer scans the request to determine where locks should be placed, then passes the optimized parse tree to the Generator.

Stage 7: The Generator transforms the optimized parse tree into plastic steps and passes them to gncApply.

Stage 8: gncApply takes the plastic steps produced by the Generator and transforms them into concrete steps. Concrete steps are directives to the AMPs that contain any needed user- or session-specific values and any needed data parcels.

Stage 9: gncApply passes the concrete steps to the Dispatcher.
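The distinction between plastic and concrete steps, and the role of the Request cache, can be sketched as follows. This is a simplified illustration, not Teradata's implementation: every name here (`request_cache`, `generate_plastic_steps`, `gnc_apply`, `process_request`) and the step text format are hypothetical, stages 2 through 7 are collapsed into a single stand-in function, and the access rights check of stage 4 is omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the Request cache: request text -> plastic steps.
request_cache: Dict[str, List[str]] = {}


def generate_plastic_steps(request_text: str) -> List[str]:
    """Stand-in for stages 2 through 7 (Syntaxer through Generator).

    A plastic step carries a placeholder (:value) instead of a data value.
    """
    return [f"READ for [{request_text}] USING :value"]


def gnc_apply(plastic_steps: List[str], value: object) -> List[str]:
    """Stage 8: bind the data value, turning plastic steps into concrete steps."""
    return [step.replace(":value", repr(value)) for step in plastic_steps]


def process_request(request_text: str, value: object) -> Tuple[List[str], bool]:
    """Return the concrete steps and whether the Request cache was hit."""
    cache_hit = request_text in request_cache
    if not cache_hit:
        request_cache[request_text] = generate_plastic_steps(request_text)
    return gnc_apply(request_cache[request_text], value), cache_hit
```

Submitting the same request text twice with different data values produces different concrete steps, but the second submission skips step generation because the plastic steps are already cached. That reuse is possible precisely because plastic steps contain no data values.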

The Dispatcher

The Dispatcher controls the sequence in which steps are executed. It also passes the steps to the BYNET to be distributed to the AMP database management software as follows:

Stage 1: The Dispatcher receives concrete steps from gncApply.

Stage 2: The Dispatcher places the first step on the BYNET; tells the BYNET whether the step is for one AMP, several AMPs, or all AMPs; and waits for a completion response. Whenever possible, the Teradata Database performs steps in parallel to enhance performance. If there are no dependencies between a step and the following step, the following step can be dispatched before the first step completes, and the two execute in parallel. If there is a dependency, for example, the following step requires as input the data produced by the first step, then the following step cannot be dispatched until the first step completes.

Stage 3: The Dispatcher receives a completion response from all expected AMPs and places the next step on the BYNET. It continues to do this until all the AMP steps associated with a request are done.
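The dependency rule in stage 2 can be illustrated by grouping steps into "waves", where every step in a wave has all of its inputs available and could therefore be in flight at the same time. This is a minimal sketch under assumptions: the `dispatch_waves` function and the step names are hypothetical, and the real Dispatcher sequences steps as completion responses arrive rather than in fixed waves.

```python
from typing import Dict, List, Set


def dispatch_waves(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group steps into waves of steps that have no unmet dependencies.

    depends_on maps each step name to the set of steps whose output it needs.
    """
    completed: Set[str] = set()
    remaining = dict(depends_on)
    waves: List[List[str]] = []
    while remaining:
        # A step is ready once everything it depends on has completed.
        ready = sorted(step for step, needs in remaining.items() if needs <= completed)
        if not ready:
            raise ValueError("circular dependency between steps")
        waves.append(ready)
        completed.update(ready)
        for step in ready:
            del remaining[step]
    return waves
```

For example, two independent table scans land in the same wave and can run in parallel, while a join that consumes both results must wait for the next wave.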


The AMPs

AMPs obtain the rows required to process the requests (assuming that the AMPs are processing a SELECT statement). The BYNET transmits messages to and from the AMPs. An AMP step can be sent to one of the following:

• One AMP
• A selected set of AMPs, called a dynamic BYNET group
• All AMPs in the system

The following figure is based on the example in the next section. If access is through a primary index, and a request is for a single row, the PE transmits steps to a single AMP, as shown at PE 1. If the request is for many rows (an all-AMP request), the PE makes the BYNET broadcast the steps to all AMPs, as shown at PE 2. To minimize system overhead, the PE can send a step to a subset of AMPs.

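[Figure HD14A001: PE 1 and PE 2 connect through the BYNET (or Boardless BYNET) to AMP 1 through AMP 4, each with its own disk. The disk of AMP 1 holds rows R1, R5, and R9; AMP 2 holds R2, R6, and R10; AMP 3 holds R3, R7, and R11; AMP 4 holds R4, R8, and R12.]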

Example: SQL Statement

As an example, consider the following Teradata SQL statements using a table containing checking account information. The example assumes that the AcctNo column is the unique primary index for Table_01. For information about the types of indexes used by Teradata, see Chapter 8: “Data Distribution and Access Methods.”

1. SELECT * FROM Table_01 WHERE AcctNo = 129317 ;
2. SELECT * FROM Table_01 WHERE AcctBal > 1000 ;


In this example:

• PEs 1 and 2 receive requests 1 and 2.
• The data for account 129317 is contained in table row R9 and stored on AMP 1.
• Information about all account balances is distributed evenly among the disks of all four AMPs.

The following table lists the steps involved in processing the sample Teradata SQL statements:

Stage 1: PE 1 determines that the request is a primary index retrieval, which calls for the access and return of one specific row.

Stage 2: The Dispatcher in PE 1 issues a message to the BYNET containing an appropriate read step and R9/AMP 1 routing information. After AMP 1 returns the desired row, PE 1 transmits the data to the client.

Stage 3: The PE 2 Parser determines that this is an all-AMPs request, then issues a message to the BYNET containing the appropriate read step to be broadcast to all four AMPs.

Stage 4: After the AMPs return the results, PE 2 transmits the data to the TDP.
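The difference between the two requests comes down to how many AMPs receive the read step. The sketch below reproduces the row layout of the figure (R1, R5, R9 on AMP 1, and so on) with a simple modulo rule. That rule is only a stand-in chosen to match the figure: the way Teradata actually maps a primary index value to an AMP is covered in Chapter 8 and is not modeled here. The names `amp_for_row` and `target_amps` are hypothetical.

```python
from typing import List, Optional

AMP_COUNT = 4  # the example system has four AMPs


def amp_for_row(row_number: int) -> int:
    """Reproduce the figure's layout: R1, R5, R9 on AMP 1; R2, R6, R10 on AMP 2; etc."""
    return (row_number - 1) % AMP_COUNT + 1


def target_amps(row_number: Optional[int] = None) -> List[int]:
    """Return the AMPs that receive the read step.

    A unique primary index retrieval identifies one row, so one AMP is enough.
    Without a usable primary index value, the step is broadcast to all AMPs.
    """
    if row_number is not None:
        return [amp_for_row(row_number)]
    return list(range(1, AMP_COUNT + 1))


if __name__ == "__main__":
    print(target_amps(9))  # request 1: row R9 holds account 129317
    print(target_amps())   # request 2: AcctBal > 1000 can match rows on any AMP
```

Request 1 touches a single AMP, while request 2 involves every AMP in the system, which is why primary index access is the cheapest retrieval path.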

The following table lists the sequence of AMP step processing:

Step 1, Lock: Serializes access in situations where concurrent access would compromise data consistency. For some simple requests using Unique Primary Index (UPI), Non-unique Primary Index (NUPI), or Unique Secondary Index (USI) access, the lock step may be incorporated into step 2. For information about indexes and their uses, see Chapter 8: “Data Distribution and Access Methods.”

Step 2, Operation: Performs the requested task. For complicated queries, there may be hundreds of operation steps.

Step 3, End transaction: Causes the locks acquired in step 1 to be released. The end transaction step tells all AMPs that worked on the request that processing is complete.


Parallel Database Extensions

Parallel Database Extensions (PDE) are a software interface layer on top of the operating system. The operating system can be either UNIX MP-RAS or Windows 2000. PDE provides the Teradata Database with the ability to:

• Run the Teradata Database in a parallel environment
• Execute vprocs
• Apply a flexible priority scheduler to Teradata Database sessions
• Debug the operating system kernel and the Teradata Database using resident debugging facilities

Trusted Parallel Applications

The PDE provide a series of parallel operating system services to a special class of tasks called a trusted parallel application (TPA). On an SMP or MPP system, the TPA is the Teradata Database. TPA services include:

• Facilities to manage parallel execution of the TPA on multiple nodes
• Dynamic distribution of execution processes
• Coordination of all execution threads, whether on the same or on different nodes
• Balancing of the TPA workload within a clique
• Resident debugging facilities in addition to kernel and application debuggers

PDE and MPP Systems

The PDE also enables an MPP system to:

• Take advantage of hardware features such as the BYNET and shared disk arrays
• Process user applications that were written on non-Trusted Parallel Application (non-TPA) nodes and disks

Start and Stop PDE

You can start, reset, and stop the PDE on Windows systems using the Teradata MultiTool utility and on UNIX MP-RAS systems using the xctl utility. For information about the ctl and xctl utilities, see “Maintenance Utilities” on page 18-10.


The Teradata File System

The special-purpose Teradata file system is a layer of software between the Teradata Database layer and the PDE layer. Teradata file system service calls allow the Teradata Database to store and retrieve data efficiently without being concerned about the specific low-level operating system interfaces.

The data block is a disk-resident structure that contains one or more rows from the same table and is the physical I/O unit for the Teradata file system. Data blocks are stored in physical disk space units called sectors, which are logically grouped together in cylinders.

Cylinder Read

Cylinder Read, a capability of the Teradata file system, allows full-file scan operations to run efficiently by reading the cylinder-resident data blocks with a single I/O operation. This means the system incurs I/O overhead once per cylinder, rather than once per data block as it does when blocks are read individually. The system benefits from the reduction in I/O time for operations such as table scans and joins that process most or all of the data blocks of a table.

Block sizes range between 6144 bytes and nearly 128 KB, or from 12 to 255 sectors. You can set the default maximum data block size as follows:

Set this value either… | Using…
as a system default | the DBS Control utility.
for a table | the DATABLOCKSIZE specifier on the CREATE TABLE statement.
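The block size limits and the benefit of Cylinder Read can both be expressed as simple arithmetic. In the sketch below, the 512-byte sector size is inferred from the text (6144 bytes divided by 12 sectors) rather than stated outright, and the function names and the sample cylinder and block counts are hypothetical.

```python
SECTOR_BYTES = 512  # inferred from the text: 6144 bytes / 12 sectors
MIN_SECTORS = 12
MAX_SECTORS = 255


def block_bytes(sectors: int) -> int:
    """Size in bytes of a data block that spans the given number of sectors."""
    if not MIN_SECTORS <= sectors <= MAX_SECTORS:
        raise ValueError("a data block spans 12 to 255 sectors")
    return sectors * SECTOR_BYTES


def read_operations(cylinders: int, blocks_per_cylinder: int, cylinder_read: bool) -> int:
    """I/O operations needed for a full scan, with or without Cylinder Read."""
    if cylinder_read:
        return cylinders                    # one I/O per cylinder
    return cylinders * blocks_per_cylinder  # one I/O per data block


if __name__ == "__main__":
    print(block_bytes(MIN_SECTORS), block_bytes(MAX_SECTORS))
    print(read_operations(10, 40, cylinder_read=False))
    print(read_operations(10, 40, cylinder_read=True))
```

The largest block, 255 sectors, falls one sector short of 128 KB, which is why the text says "nearly 128 KB".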

Disk I/O Integrity Checking

To detect data corruption in the file system metadata, the Teradata Database verifies the following:

• Version numbers
• Segment lengths
• Block types
• Block hole addresses in the data block, cylinder index (CI), and master index (MI) internal file system structures

To help detect corrupt data in these structures, disk I/O integrity checking calculates an end-to-end checksum at various user-selectable data sampling rates.


You can specify the CHECKSUM option as follows:

Set the CHECKSUM option to one of the following levels of checking:

• NONE
• LOW
• MEDIUM
• HIGH
• ALL

Using either:

a. One of the following statements:
   • CREATE TABLE
   • CREATE JOIN INDEX
   • CREATE HASH INDEX
   • ALTER TABLE
b. The DBS Control utility, based on the type of table you want to check. For example, you may want to assign a higher level of checking to a user table than you assign to a temporary table.


Workstation Types and Available Platforms

Workstations provide a window into the inner workings of the Teradata Database. The following types of workstations are available:

• System console
• Administration Workstation

Some of the workstation types are only available on specific platforms. The following table shows which workstations are appropriate for the different platforms and how workstations are connected to the node.

Type of Workstation | Platform | Description
System console | SMP | Connected directly to the SMP node
Administration Workstation | MPP | Network-connected through an Ethernet card on the node
UNIX workstation; PC with X Windows server | SMP and MPP | Connected remotely through the network using an Ethernet card on the node

System Console

The role of the system console is to:

• Provide an input mechanism for the system and database administrators
• Display system status
• Display current system configuration
• Display performance statistics
• Allow you to control various utilities

Administration Workstation

The Administration Workstation (AWS) performs many of the functions of a system console for MPP systems. The AWS is an intelligent workstation whose primary roles are to:

• Provide an input mechanism for the system and database administrator
• Provide a single-system view in the multinode environment
• Monitor system performance


Teradata Database Window

The Teradata Database Window (DBW) allows database or system administrators to control the operation of the Teradata Database. Running in a graphical X Windows or Microsoft Windows 2000 environment, the DBW is also the primary vehicle for starting and controlling the operation of the Teradata Database utilities.

How the Database Window Communicates with Teradata Database

The DBW communicates with the Teradata Database through the console subsystem (CNS), which is part of the PDE software. Because the CNS software manages this communication, you might see CNS messages from the system.

From the DBW main window, you can access the following subwindows:

From this subwindow… | You can…
Applications 1 through 4 | run one Teradata Database utility or program at a time in each of the four subwindows.
DBS I/O | view messages from Teradata Database programs that are not running in DBW application subwindows. For example, some SQL diagnostics appear here.
Supervisor | issue commands and invoke utilities.

Running DBW

You can run DBW from the following locations:

• System Console
• Administration Workstation (AWS)
• Remote workstation or PC


For More Information

For more information on the topics presented in this chapter, see the following Teradata Database books.

IF you want to learn more about… | THEN see…
Database Window | Database Window
General Teradata Database software architecture | Database Design
System process flows | Database Design

Chapter 4: International Language Support

This chapter describes the capabilities of Teradata international language support. Topics include:

• Character set overview
• External and internal character sets
• Teradata Database character data storage
• Language support modes
• Standard language support
• Japanese language support
• Extended support
• Enabling international character support

Character Set Overview

To manipulate data successfully, the Teradata Database must be able to store and retrieve the characters that constitute a given written language. To manage storage and retrieval, the database determines the repertoire of characters required and provides a scheme for representing strings of these characters on a computer.

What Is a Repertoire

Consider English, for example. To write English, you need the alphabetic characters, A–Z, the digits, 0–9, and various punctuation characters. Many applications also commonly require the characters a–z, the lowercase counterparts of A–Z.

If an application is written in French, you need the alphabetic characters that are required for English, plus accented characters, for example, the é. However, some applications may need accented characters for English as well. The word résumé, borrowed from French, is often displayed in its accented form in English text. Similarly, ö may be used in English text to spell coördinate.

You can see that a repertoire comprises the characters that we need to write a language, and clearly, what we include in our repertoire determines what we can write, and how we must write it.

Character Representation

Representing strings of characters is essentially a two-step process:

• Creating a mapping between each character required and an integer.
• Devising an encoding scheme for placing a sequence of numbers into memory.

The simplest systems map the required characters to small integers between 0 and 255, and encode sequences of characters as sequences of bytes with the appropriate numeric values. Representing characters for repertoires that require more than 256 characters, such as Japanese, Chinese, and Korean, requires more complex schemes.
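The two steps can be seen directly in a general-purpose language. The sketch below uses Python's built-in `ord` function and its `latin-1` and `shift_jis` codecs purely as illustrations of the mapping and encoding steps; these codec names belong to Python and are not Teradata character set names.

```python
# Step 1: map each character to an integer.
word = "r\u00e9sum\u00e9"  # "résumé", with each é written as an explicit escape
code_points = [ord(ch) for ch in word]

# Step 2: encode the integers as bytes. With a single-byte scheme such as
# Latin-1, every integer from 0 to 255 becomes exactly one byte.
latin1_bytes = word.encode("latin-1")

# A repertoire with more than 256 characters needs a more complex scheme.
kanji = "\u65e5\u672c"  # the two characters of "Nihon" (Japan)
kanji_code_points = [ord(ch) for ch in kanji]
kanji_bytes = kanji.encode("shift_jis")  # a multibyte encoding

if __name__ == "__main__":
    print(code_points, list(latin1_bytes))
    print(kanji_code_points, len(kanji_bytes))
```

For the accented word, the byte values are identical to the integers from step 1. The Japanese characters map to integers above 255, cannot be encoded as Latin-1 at all, and occupy two bytes each in the multibyte encoding.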


External and Internal Character Sets

Client systems communicate with the Teradata Database using their own external format for numbers and character strings. The Teradata Database converts numbers and strings to its own internal format when importing the data, and converts numbers and strings back to the appropriate form for the client when exporting the data.

This approach allows data to be exchanged between mutually incompatible client data formats. Take, for example, channel-attached clients using EBCDIC-based character sets and network-attached clients using ASCII-based character sets. Both clients can access and modify the same data in the database.

Character Data Translation

Teradata Database translates the characters:

• Received from a client system into a form suitable for storage and processing on the server.
• Returned to a client into a form suitable for storage, display, printing, and processing on that client.

Thus, the server translates data from client form to server form and from server form to client form.
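The round trip between client form and server form can be sketched with two small functions. This is an analogy rather than Teradata's translation code: a Python `str` stands in for the internal server form, the `cp037` (an EBCDIC code page) and `ascii` codec names are Python's, and the function names `import_from_client` and `export_to_client` are hypothetical.

```python
def import_from_client(raw: bytes, client_charset: str) -> str:
    """Translate bytes in a client's external form to the internal form."""
    return raw.decode(client_charset)


def export_to_client(value: str, client_charset: str) -> bytes:
    """Translate the internal form back to the form a given client expects."""
    return value.encode(client_charset)


if __name__ == "__main__":
    # An EBCDIC client sends "ABC" as the bytes C1 C2 C3.
    ebcdic_bytes = bytes([0xC1, 0xC2, 0xC3])
    stored = import_from_client(ebcdic_bytes, "cp037")

    # An ASCII client reads the same stored value in its own form.
    print(stored, export_to_client(stored, "ascii"))
```

Because both clients translate to and from one internal form, neither needs to know the other's byte representation, which is what allows mutually incompatible clients to share the same data.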

What Teradata Database Supports

The Teradata Database supports many external client character sets, and allows each application to choose the internal server character set best suited to each column of character data in the Teradata Database. No matter which server character set you choose, communication with the client is always in the client character set (also known as the session charset).


Teradata Database Character Data Storage

The Teradata Database uses internal server character sets to represent user data and data in the Data Dictionary within the system.

Internal Server Character Sets

Server character sets include:

• LATIN
• UNICODE
• KANJI1
• KANJISJIS
• GRAPHIC

User Data

User data refers to character data that you store in a character data type column on the Teradata Database.

System Dictionary Data

The term system dictionary data refers to the names of the following objects as they are stored in the Data Dictionary on the Database:

• Tables
• Databases
• Users
• Columns
• Views
• Macros
• Triggers
• Join indexes
• Hash indexes
• Stored procedures
• User-defined functions


Language Support Modes

During system initialization (sysinit) the database administrator can optimize the database for one of two language support modes:

• Standard
• Japanese

The language support mode determines the:

• Character set that Teradata Database uses to store system dictionary data.
• Default character set for user data.

IF you enable this language support mode… | THEN Teradata Database stores system dictionary data using this character set… | AND sets the user data default character set to…
Standard | LATIN | LATIN
Japanese | KANJI1 | UNICODE

Default Character Set for User Data

The language support mode sets the default server character set for a user if the DEFAULT CHARACTER SET clause does not appear in the CREATE USER statement. To override the default character set for a user, you can use the DEFAULT CHARACTER SET clause in a CREATE USER statement.


Character Set for System Dictionary Data

The character set that Teradata Database uses to store system dictionary data cannot be changed after you enable the language support mode during the sysinit process.

IF you optimize the database for this language support mode, THEN the names of objects stored in the Data Dictionary can contain the following:

Standard: Only western European characters. Characters outside the ASCII range (all accented characters, for example) cannot appear in a regular identifier. Rather, they can only occur in a delimited identifier (one that is enclosed in double quotes).

Japanese: Japanese characters, but only if you use the Teradata-supplied Japanese client character sets. Japanese characters are stored using the KANJI1 server character set. KANJI1 data cannot necessarily be shared between clients with differing client character sets. If you use other multibyte client character sets, such as UTF8, Korean, or Chinese, only characters in the ASCII range can appear in an object name. Accented characters cannot be used.

Character Set for Dictionary Data Other Than Object Names

Object names are only a small part of the character data that Teradata Database stores in the Data Dictionary. Teradata Database always uses the UNICODE server character set to store character data other than object names in the Data Dictionary, no matter which language support mode you enable.


Standard Language Support Mode If you choose the standard language support mode, then Teradata Database stores system dictionary data and user data using the LATIN character set.

LATIN Character Set Standard language support provides Teradata Database internal coding for the entire set of printable characters from the ISO 8859-1 (Latin1) and ISO 8859-15 (Latin9) standard, including diacritical marks such as ä, ñ, Ÿ, Œ, and œ, though the Z with caron in Latin9 is not supported. ASCII control characters are also supported for the standard language set. Note: The ASCII referred to in this chapter is based on Standard ASCII (X’00’ to X’7F’) with Teradata extensions to cover ISO 8859-1 (Latin1) and ISO 8859-15 (Latin9). ASCII, as used here, represents the characters that can be stored as the LATIN server character set, referred to as Teradata LATIN. The EBCDIC referred to in this chapter is the Teradata extended ASCII mapped to the corresponding EBCDIC code points.

Compatible Languages

The LATIN server character set that Teradata Database uses in standard language support mode is sufficient for you to use client character sets that support the international languages listed in the following table:

International Languages That Are Compatible with Standard Language Support

Albanian      English     Germanic                         Portuguese
Basque        Estonian    Greenlandic                      Rhaeto-Romanic
Breton        Faroese     Icelandic                        Romance
Catalonian    Finnish     Irish Gaelic (new orthography)   Samoan
Celtic        French      Italian                          Scottish Gaelic
Cornish       Frisian     Latin                            Spanish
Danish        Galician    Luxemburgish                     Swahili
Dutch         German      Norwegian                        Swedish


Japanese Language Support Mode

If you enable the Japanese language support mode during the sysinit process, Teradata Database, by default, stores user data using the UNICODE server character set and stores system dictionary data using the KANJI1 server character set.

Advantages of Storing System Dictionary Data Using KANJI1

The KANJI1 server character set is compatible with the Teradata-supplied Japanese client character sets, allowing you to use object names containing Kanji characters, Hiragana, Zenkaku (fullwidth) and Hankaku (halfwidth) Katakana, Zenkaku Romaji (Latin), and various other characters. You can also use the ASCII characters from other client character sets to name objects that are stored in the Data Dictionary.

Advantages of Storing User Data Using UNICODE

Unicode is a 16-bit encoding of virtually all characters in all current languages in the world. The Teradata UNICODE server character set supports Unicode 2.1, and is designed eventually to store all character data on the server. UNICODE may be used to store all characters from all single- and multibyte client character sets. User data stored as UNICODE can be shared among heterogeneous clients.


Extended Support

Extended support allows you to customize the Teradata Database to provide additional support for local character set usage. A sufficiently privileged user can create single-byte and multibyte client character sets that support, with certain constraints, any subset of the Unicode repertoire. Moreover, such a user can customize a collation for the entire Unicode repertoire. Extended support is available on systems that have been enabled with standard language support or Japanese language support.


For More Information

For more information on the topics presented in this chapter, see the following Teradata Database books:

IF you want to learn more about…   THEN see…
Data formatting                    International Character Set Support

Section 2: The Teradata Database Structure

Chapter 5: Structured Query Language (SQL)

This chapter describes SQL, which is the ANSI standard language for relational database management. All application programming facilities ultimately make queries against the Teradata Database using SQL because it is the only language the Teradata Database understands. To enhance the capabilities of SQL, Teradata has added extensions that are unique to Teradata. This comprehensive language is referred to as Teradata SQL.

The first part of this chapter describes the data definition and manipulation capabilities of SQL. This includes basic statements used for describing and defining entities and for manipulating and retrieving data. Topics include:
• SQL statements and related topics
• SQL functions
• User-defined functions
• Cursors


Why SQL

SQL has the advantage of being the most commonly used language for relational database management systems. Because of this, both the data structures in the Teradata Database and the commands for manipulating those structures are controlled using SQL. Additionally, all applications, including those written in a client language with embedded SQL, macros, and ad-hoc SQL queries, are written and executed using the same set of instructions and syntax.

Other database management systems use different languages for data definition and data manipulation and may not permit ad-hoc queries of the database. Teradata Database lets you use one language to define, query, and update your data.


What is SQL

In principle, the SQL language is a combination of at least three subordinate languages and the SELECT statement. The languages allow you to define database objects, to define user access to those objects, and to manipulate the data stored within them. These languages form the principal functional families of SQL:
• Data Definition Language (DDL)
• Data Control Language (DCL)
• Data Manipulation Language (DML)
• SELECT

The following sections contain information about the functional families of Teradata SQL.

Data Definition Language

You use DDL to define the structure and instances of a database. This section describes the data definition capabilities of Teradata SQL, emphasizing the basic definition statements and data types. DDL provides statements for the definition and description of database objects. The following table summarizes the basic DDL statements:

Statement         Action performed
CREATE            Defines a new database object, such as a database, user, table, trigger, index, macro, stored procedure, or view, depending on the object of the CREATE statement
DROP              Removes a table, database, user, trigger, index, macro, stored procedure, or view definition, depending on the object of the DROP statement
ALTER             Changes a table, column, referential constraint, or trigger
ALTER PROCEDURE   Recompiles a stored procedure
MODIFY            Changes a database or user definition
RENAME            Changes the names of tables, triggers, views, stored procedures, and macros
REPLACE           Replaces macros, triggers, stored procedures, or views
SET               Specifies time zones and the collation or character set for a session
COLLECT           Collects statistics on a column, group of columns, or index


Successful execution of a DDL statement automatically creates, updates, or removes entries in the Data Dictionary. For information about the contents of the Data Dictionary, see Chapter 9: “Data Dictionary.”
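The DDL statements summarized in this section might be used together as in the following minimal sketch. All object names (sales_db, orders, recent_orders, orders_2003) and the space values are hypothetical and chosen only for illustration; check the exact options against the SQL Reference for your release.

```sql
/* Hypothetical objects; every name and size below is an assumption. */

/* CREATE: define a database, a table, and a view. */
CREATE DATABASE sales_db AS PERM = 10000000;

CREATE TABLE sales_db.orders (
    order_id    INTEGER NOT NULL,
    cust_id     INTEGER,
    order_date  DATE,
    amount      DECIMAL(10,2)
)
UNIQUE PRIMARY INDEX (order_id);

CREATE VIEW sales_db.recent_orders AS
    SELECT order_id, cust_id, amount
    FROM sales_db.orders
    WHERE order_date >= DATE '2003-01-01';

/* ALTER: add a column to the existing table. */
ALTER TABLE sales_db.orders ADD ship_date DATE;

/* MODIFY: change the database definition. */
MODIFY DATABASE sales_db AS PERM = 20000000;

/* RENAME: give the view a new name. */
RENAME VIEW sales_db.recent_orders TO sales_db.orders_2003;

/* COLLECT: gather statistics on a column. */
COLLECT STATISTICS ON sales_db.orders COLUMN cust_id;

/* DROP: remove the definitions again. */
DROP VIEW sales_db.orders_2003;
DROP TABLE sales_db.orders;
```

Each of these statements, when it succeeds, also changes the corresponding Data Dictionary entries.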

Data Control Language

You use DCL statements to grant and revoke access to database objects and change ownership of those objects from one user or database to another. The results of DCL statement processing also are recorded in the Data Dictionary. The following table summarizes the basic DCL statements:

Statement                  Action
GRANT/REVOKE               Controls access rights of the users on an object
GRANT LOGON/REVOKE LOGON   Controls logon rights to a host (client) or host group (if the special security user is enabled)
GIVE                       Gives a database object to another database object
HELP and SHOW              HELP provides help about object definitions, such as:
                           • HELP DATABASE
                           • HELP TABLE
                           • HELP CONSTRAINT
                           • HELP PROCEDURE
                           • HELP TRIGGER, and so forth
                           HELP also provides help about:
                           • Sessions and statistics
                           • SQL statement syntax
                           SHOW displays the SQL used to create the table, with all defaults explicitly shown
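As a hedged illustration of the statements in this table, the following sketch grants and revokes access rights, transfers ownership, and inspects a definition. The database sales_db, the table orders, the user analyst1, and the user finance_dept are all hypothetical names.

```sql
/* Hypothetical database, table, and user names. */

/* GRANT / REVOKE: control access rights on an object. */
GRANT SELECT, INSERT ON sales_db.orders TO analyst1;
REVOKE INSERT ON sales_db.orders FROM analyst1;

/* GIVE: transfer ownership of a database to another user or database. */
GIVE sales_db TO finance_dept;

/* HELP and SHOW: inspect object definitions. */
HELP DATABASE sales_db;
HELP TABLE sales_db.orders;
SHOW TABLE sales_db.orders;
```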

Data Manipulation Language

You use DML statements to manipulate and process database values. You can insert new rows into a table, update one or more values in stored rows, or delete a row. The following table summarizes the basic DML statements:

Statement           Description
INSERT              Inserts new rows into a table. For more information about a special case of INSERT, see Atomic Upsert later in this table.
UPDATE              Modifies data in one or more rows of a table. For more information about a special case of UPDATE, see Atomic Upsert later in this table.
                    Atomic Upsert: The upsert form of the UPDATE DML statement is a Teradata extension to the ANSI SQL-99 standard designed to enhance the performance of the Teradata TPump utility by allowing the statement to support atomic upsert. For more information about how TPump operates, see “What Are Teradata Tools and Utilities” on page 2-7.
                    This feature allows Teradata TPump and all other CLIv2-, ODBC-, and JDBC-based applications to perform single-row upsert operations using an optimally efficient single-pass strategy. This single-pass upsert is called atomic to emphasize that its component UPDATE and INSERT SQL statements are grouped together and performed as a single, or atomic, SQL statement.
DELETE              Removes a row (or rows) from a table.
COMMENT             Inserts a text comment for a database object.
MERGE               Combines both UPDATE and INSERT in a single SQL statement. Supports primary index operations only, similar to Atomic Upsert but with fewer constraints.
ABORT, ROLLBACK, COMMIT, BEGIN TRANSACTION, END TRANSACTION
                    Allow you to better manage transactions.
CHECKPOINT          Checkpoints a journal. CHECKPOINT is a function that writes records to a restart log table that you can use to restart in case of a hardware or software system failure.
DATABASE            Specifies a default database.
ECHO                Echoes a string or command to a client.
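The following sketch shows how several of these DML statements might look in practice, including the atomic upsert form of UPDATE. It assumes a hypothetical table sales_db.orders with columns order_id (the unique primary index), cust_id, order_date, and amount; none of these names come from this book.

```sql
/* Hypothetical table sales_db.orders (order_id is assumed to be the primary index). */

/* DATABASE: set the default database for the session. */
DATABASE sales_db;

/* INSERT, UPDATE, DELETE on individual rows. */
INSERT INTO sales_db.orders (order_id, cust_id, order_date, amount)
VALUES (1001, 42, DATE '2003-06-15', 250.00);

UPDATE sales_db.orders
SET amount = amount * 1.10
WHERE order_id = 1001;

DELETE FROM sales_db.orders
WHERE order_id = 1001;

/* Atomic upsert: update the row if it exists, otherwise insert it.
   The WHERE clause and the INSERT refer to the same primary index value. */
UPDATE sales_db.orders
SET amount = 300.00
WHERE order_id = 1002
ELSE INSERT INTO sales_db.orders (order_id, cust_id, order_date, amount)
VALUES (1002, 42, DATE '2003-06-16', 300.00);

/* COMMENT: attach descriptive text to an object. */
COMMENT ON TABLE sales_db.orders AS 'Customer orders';

/* Explicit transaction (Teradata session mode): both updates succeed or neither does. */
BEGIN TRANSACTION;
UPDATE sales_db.orders SET amount = amount - 50.00 WHERE order_id = 1002;
UPDATE sales_db.orders SET amount = amount + 50.00 WHERE order_id = 1003;
END TRANSACTION;
```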


SQL Data Types

A data type phrase does the following:
• Determines how data is stored on the Teradata Database
• Specifies how data is presented to the user

You must specify a data type for each column when you use SQL to create a table because Teradata Database does not provide a default data type. You can include a data type to specify data conversions in expressions.

Teradata and ANSI-Compliant Data Types

Teradata Database supports two modes of data types: ANSI and Teradata. ANSI-mode data types adhere to the ANSI SQL standard. Teradata-mode data types were written in older non-ANSI-compliant versions of Teradata SQL. Teradata Database supports the following SQL data types:

Teradata supports…              Including…
Teradata SQL data types         Byte. Graphic.
ANSI-compliant SQL data types   Binary Large Objects (BLOBs). Character. Character Large Objects (CLOBs). DateTime. Interval. Numeric.

Data Type Attributes

You can use Teradata SQL to define the attributes of a data value. Data type attributes control the following:
• Import format (internal representation of stored data)
• Export format (how data is presented for a column or an expression result)

You must define data type attributes when you define a column. You can override the default values of data type attributes. For example, when you create a table, you can use a FORMAT phrase to override the output format of a data type.


The following table summarizes data type attributes:

Data Type Attribute     ANSI    Teradata Extension to ANSI
NOT NULL                X
UPPERCASE                       X
[NOT] CASESPECIFIC              X
FORMAT quote_string             X
TITLE quote_string              X
NAMED name                      X
DEFAULT number          X
DEFAULT USER            X
DEFAULT DATE            X
DEFAULT TIME            X
DEFAULT NULL            X
WITH DEFAULT                    X
CHARACTER SET           X
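To make the data type phrases and attributes concrete, here is a minimal sketch of a table definition that assigns a data type to every column and overrides some default attributes. The table layout is an assumption made for illustration (only the names personnel.employee, deptno, name, and salary appear elsewhere in this chapter), and the FORMAT strings are examples rather than recommendations.

```sql
/* Illustrative definition; the column list and formats are assumptions. */
CREATE TABLE personnel.employee (
    empno    INTEGER NOT NULL,                                   /* ANSI attribute */
    name     VARCHAR(12) CHARACTER SET LATIN NOT CASESPECIFIC,   /* Teradata extension */
    deptno   SMALLINT DEFAULT 100,                               /* DEFAULT number */
    salary   DECIMAL(8,2) FORMAT 'ZZZ,ZZ9.99'                    /* overrides the output format */
                          TITLE 'Annual Salary',                 /* column heading in reports */
    hired    DATE FORMAT 'YYYY-MM-DD' DEFAULT DATE               /* current date if omitted */
)
UNIQUE PRIMARY INDEX (empno);
```

A FORMAT phrase changes only how values are presented to the user; the stored representation is still determined by the data type.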

Statement Punctuation

A typical SQL statement consists of a statement keyword, one or more column names, a database name, a table name, and one or more optional clauses introduced by keywords. You can use the following punctuation to separate or identify the parts of an SQL statement:

This syntax element…   Named…                       Performs this function in an SQL statement…
.                      period                       separates database names from table names and table names from a particular column name (for example, personnel.employee.deptno).
,                      comma                        separates and distinguishes column names in the select list, or column names or parameters in an optional clause.
'                      apostrophe                   delimits the boundaries of character string constants.
( )                    left and right parentheses   groups expressions or defines the limits of a phrase.
;                      semicolon                    separates statements in multi-statement requests and terminates requests submitted via certain utilities such as BTEQ.
"                      quotation marks              identifies user names which might otherwise conflict with SQL keywords, or would not be valid names in the absence of the quotation marks.
:                      colon                        prefixes reference parameters or client system variables.

To include an apostrophe or show possession in a title, double the apostrophes.
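As a small sketch of these punctuation rules, the request below uses periods, commas, parentheses, apostrophes, and a terminating semicolon, and doubles the apostrophe inside the title. It reuses the personnel.employee table named in this chapter; the title text itself is made up.

```sql
/* Period:       personnel.employee (database.table)
   Comma:        separates items in the select list and in the IN list
   Parentheses:  group the TITLE phrase and the IN list
   Apostrophes:  delimit the string; the doubled '' yields one apostrophe
   Semicolon:    terminates the request */
SELECT name (TITLE 'Employee''s Name'), deptno
FROM personnel.employee
WHERE deptno IN (100, 500);
```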


SQL Statements and Requests

A typical SQL statement consists of the following:
• A statement keyword
• One or more column names
• A database name
• A table name
• One or more optional clauses introduced by keywords

For example, in the following single-statement request, the statement keyword is SELECT:

    SELECT deptno, name, salary
    FROM personnel.employee
    WHERE deptno IN(100, 500)
    ORDER BY deptno, name ;

The select list and FROM clause of this statement are made up of the names:
• deptno, name, and salary (the column names in the select list)
• personnel (the database name)
• employee (the table name)

The search condition, or WHERE clause, is introduced by the keyword WHERE:

    WHERE deptno IN(100, 500)

The sort ordering, or ORDER BY clause, is introduced by the keywords ORDER BY:

    ORDER BY deptno, name

Teradata offers the following ways to invoke an executable statement:
• Interactively from a terminal
• Embedded within an application program
• Dynamically created within an embedded application
• Embedded within a stored procedure
• Dynamically created within a stored procedure
• Via a trigger
• Embedded within a macro


The SELECT Statement

The SELECT statement is probably the most frequently used SQL statement. It specifies the table columns from which to obtain the data you want, the corresponding database (if different from the current default database), and the table or tables that you need to reference within that database. The SELECT statement further specifies how, in what format, and in what order the system returns the set of result data.

You can use the following variations with the SELECT statement to request data from the Teradata Database:
• DISTINCT option
• FROM list
• WHERE clause, including subqueries
• SAMPLE clause
• GROUP BY clause
• HAVING clause
• QUALIFY clause
• ORDER BY clause
  • CASESPECIFIC option
  • International sort orders
• WITH clause
• Query expressions and set operators

Another variation is the SELECT INTO statement, which is used in embedded SQL and stored procedures. This statement selects at most one row from a table and assigns the values in that row to host variables in embedded SQL or to local variables or parameters in Teradata stored procedures.
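A few of the SELECT variations listed above are sketched here against the personnel.employee table used in this chapter. The threshold values and the sample size are arbitrary assumptions, and the queries are illustrative rather than taken from this book.

```sql
/* DISTINCT: one row per department number. */
SELECT DISTINCT deptno
FROM personnel.employee;

/* WHERE, GROUP BY, HAVING, ORDER BY: departments whose average salary exceeds 30000. */
SELECT deptno, AVG(salary) AS avg_salary
FROM personnel.employee
WHERE salary > 0
GROUP BY deptno
HAVING AVG(salary) > 30000
ORDER BY avg_salary DESC;

/* QUALIFY: keep only the three highest-paid employees in each department. */
SELECT deptno, name, salary
FROM personnel.employee
QUALIFY RANK() OVER (PARTITION BY deptno ORDER BY salary DESC) <= 3;

/* SAMPLE: return a random sample of 10 rows. */
SELECT name, salary
FROM personnel.employee
SAMPLE 10;
```

QUALIFY plays the same role for ordered analytical functions that HAVING plays for aggregate functions: it filters rows after the function has been computed.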

SELECT Statement and Set Operators

The SELECT statement is the only SQL statement that can use the set operators UNION, INTERSECT, and MINUS/EXCEPT. These set operators allow you to manipulate the answers to two or more queries by combining the results of each query into a single result set. You can use the set operators within the following operations:
• View definitions
• Derived tables
• Subqueries
• INSERT SELECT clauses
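The following sketch shows how set operators combine the results of two queries. The personnel.employee table comes from this chapter; the personnel.department table is an assumption introduced for the second query.

```sql
/* UNION: names of employees in department 100 or department 500,
   with duplicate names removed. ORDER BY refers to the first result column. */
SELECT name FROM personnel.employee WHERE deptno = 100
UNION
SELECT name FROM personnel.employee WHERE deptno = 500
ORDER BY 1;

/* MINUS: department numbers that exist in the (hypothetical) department
   table but have no employees. */
SELECT deptno FROM personnel.department
MINUS
SELECT deptno FROM personnel.employee;
```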


SELECT Statement and Joins

A SELECT statement can reference data in two or more tables, and the relational join combines the data from the referenced tables. In this way, the SELECT statement defines a join of specified tables to retrieve data more efficiently than without defining a join of tables. You can specify both inner joins and outer joins:
• An inner join selects data from two or more tables that meets specific join conditions. Each source must be named and the join condition, that is the common relationship among the tables to be joined, must be specified in a WHERE clause.
• The outer join is an extension of the inner join that includes rows that qualify for a simple inner join, as well as a specified set of rows that do not match the join conditions expressed by the query.
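The two kinds of join can be sketched as follows. The personnel.department table and its deptname column are assumptions made for this illustration; only personnel.employee, name, and deptno appear in this chapter.

```sql
/* Inner join: only employees whose department number matches a department row.
   Both sources are named, and the join condition is in the WHERE clause. */
SELECT e.name, d.deptname
FROM personnel.employee e, personnel.department d
WHERE e.deptno = d.deptno;

/* Left outer join: every employee is returned; deptname is NULL
   for employees whose department number has no matching department row. */
SELECT e.name, d.deptname
FROM personnel.employee e
LEFT OUTER JOIN personnel.department d
    ON e.deptno = d.deptno;
```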


SQL Functions

SQL is a nonprocedural language. That means you use SQL statements to tell the Teradata Database what you want. You do not include instructions about how to get it. In procedural languages, such as C++, BASIC, or COBOL, you write instructions that define how to get what you want. It is a simple, but important, distinction.

Procedural languages contain functions that perform complex operations. The usual SQL statements do not support many functions. However, to reduce the reliance on ancillary application code, SQL does support the following standard functions:
• Scalar
• Aggregate
• Ordered analytical

Scalar Functions

You can use a scalar function in place of a column name in an expression. A scalar function works on input parameters to create a result. When it is part of an expression, the function is invoked in parallel as needed whenever expressions are evaluated for an SQL statement. When a function completes, its result is used by the expression in which the function was referenced.
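As a brief sketch, scalar functions can appear wherever a column name could appear in an expression, and each one produces one result per input row. The query uses the personnel.employee table from this chapter; the constant 40000 is an arbitrary assumption.

```sql
/* Each function is evaluated once for every row that is returned. */
SELECT TRIM(name)           AS trimmed_name,
       ABS(salary - 40000)  AS distance_from_40000,
       SQRT(salary)         AS salary_root
FROM personnel.employee
WHERE deptno = 100;
```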

Aggregate Functions

Sometimes the information you want can only be derived from data in a set of rows, instead of individual rows. Aggregate functions produce results from sets of relational data that you have grouped (optionally) using a GROUP BY clause. Aggregate functions process each set and produce one result for each set. The following table lists a few examples of aggregate functions:

The function…   Returns the…
AVG             arithmetic average of the values in a specified column.
COUNT           number of qualified rows.
MAX             maximum column value for the specified column.
MIN             minimum column value for the specified column.
SUM             arithmetic sum of a specified column.
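The aggregate functions in this table might be combined in a single query like the sketch below, which returns one result row per department in the personnel.employee table used earlier in this chapter.

```sql
/* One output row per deptno; each function summarizes the rows of that group. */
SELECT deptno,
       COUNT(*)     AS headcount,
       AVG(salary)  AS avg_salary,
       MIN(salary)  AS min_salary,
       MAX(salary)  AS max_salary,
       SUM(salary)  AS total_salary
FROM personnel.employee
GROUP BY deptno
ORDER BY deptno;
```

Without the GROUP BY clause, the whole table is treated as a single set and the query returns one row (the deptno column would then have to be removed from the select list).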


Ordered Analytical Functions

Ordered analytical functions are primarily statistical algorithms. They work over a range of data for a particular set of rows in some specific order to produce a result for each row in the set. Like aggregate functions, ordered analytical functions are called for each item in a set. But unlike an aggregate function, an ordered analytical function produces a result for each detail item.

Ordered analytical functions allow you to perform sophisticated data mining on the information in your databases to get the answers to questions that standard SQL alone cannot provide. The following table lists a few examples of ordered analytical functions:

The following function…   Returns the…
MSUM                      sum using the current row and a number of preceding rows that you specify. This is called a moving sum.
RANK                      ordered ranking of rows based on the value of the column being ranked.
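A minimal sketch of the two functions in this table follows, using the Teradata-specific forms of RANK and MSUM. The sales_db.daily_sales table and its columns are hypothetical; personnel.employee is the table used earlier in this chapter.

```sql
/* RANK: one rank value per employee, ordered by salary. */
SELECT name, deptno, salary,
       RANK(salary) AS salary_rank
FROM personnel.employee;

/* MSUM: for each day, the sum of that day's amount and the two preceding
   days (a moving sum of width 3, ordered by sale_date).
   daily_sales is a hypothetical table. */
SELECT sale_date, amount,
       MSUM(amount, 3, sale_date) AS moving_3_day_total
FROM sales_db.daily_sales;
```

Both queries return one row for every detail row, which is the difference from the aggregate functions described above.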


User-Defined Functions

You can create user-defined functions (UDFs) to address your particular data needs and to fill the void where standard SQL functions are lacking. These special functions can translate into time-saving measures by preprocessing data, or by optimizing query processing. You can use UDFs to map and manipulate non-text data, such as images, in a way that is impossible with standard SQL constructs. You can write new:
• Scalar functions, similar to the standard LOG, SQRT, ABS, and TRIM functions
• Aggregate functions, similar to SUM, MAX, MIN, and AVG

Creating User-Defined Functions

You create the source code for UDFs using the C programming language. Then you can simply use the CREATE FUNCTION statement and provide the location of the UDF source code. The Teradata Database will do all of the work, including validating the CREATE FUNCTION statement and compiling the C source. The source may be on the client system or on the server. Any compilation errors are reported. If no errors occur, the Teradata Database links the function object into a dynamically linked library (DLL) and distributes it to all nodes in the system. The UDF is usable as soon as CREATE FUNCTION completes.

Teradata customers can purchase precompiled UDFs from third-party vendors. To protect their intellectual property, vendors may not wish to make their source available. In those instances, they can simply provide a package in the form of a DLL. The DLL code does not have to be written in C, but the code must use C parameter-passing conventions. Teradata customers can use an option in the CREATE FUNCTION statement to provide just the object. The Teradata Database distributes the object automatically to all nodes. Installing just the object is also useful for sites that develop UDFs on a development system and then transfer the object to the production system.
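A CREATE FUNCTION statement for a scalar UDF written in C might look like the following sketch. The function name, parameter names, source file path, and entry-point name are all hypothetical. The EXTERNAL NAME string is assumed to follow the convention in which the leading two letters describe where the file resides and what it contains (here, client-resident source), followed by delimited name and path fields; verify the exact string format in SQL Reference: UDF Programming before relying on it.

```sql
/* Hypothetical UDF: every name and the file path below are assumptions. */
CREATE FUNCTION find_text (searched_string VARCHAR(500),
                           pattern         VARCHAR(500))
RETURNS CHAR
LANGUAGE C
NO SQL
PARAMETER STYLE SQL
EXTERNAL NAME 'CS!find_text!udfsrc/find_text.c!F!find_text';

/* Once CREATE FUNCTION completes, the UDF can be used like a built-in
   scalar function. */
SELECT name
FROM personnel.employee
WHERE find_text(name, 'Gold') = 'T';
```

The comparison value 'T' in the usage query is also an assumption about what the hypothetical C routine returns.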


SQL Statements Related to Functions

In addition to creating new functions, you can replace a function by specifying the REPLACE keyword. The CREATE FUNCTION statement conforms to SQL-99. The REPLACE option is a Teradata extension to the ANSI standard. The following table provides information about the privileges you need to create and replace functions:

IF you want to…                THEN you must have the following privilege…
create a function              CREATE FUNCTION on the database in which you want to create the function.
replace an existing function   DROP FUNCTION on the function or the database containing the function.

The following table contains the SQL statements associated with UDFs:

Use the following statement…   To…
CREATE FUNCTION                originate a new function.
DROP FUNCTION                  remove a function.
REPLACE FUNCTION               change a function.
SHOW FUNCTION                  display the definition of a function, including the CREATE and REPLACE text. Source code appears if the user has DROP FUNCTION privilege on the UDF.
HELP DATABASE                  display the specific name and type of function. Types include:
                               • F for function
                               • A for aggregate function
HELP FUNCTION                  display the function name, list of parameters, their type, and any comment associated with the parameter.
COMMENT                        add a comment about the function.
RENAME FUNCTION                change the name of a function.


Cursors

Traditional application development languages cannot deal with results tables without some kind of intermediary mechanism because SQL is a set-oriented language. The intermediary mechanism is the cursor.

A cursor is a pointer that the application program uses to move through a results table. You declare a cursor for a SELECT statement, and then open the named cursor. The act of opening the cursor executes the SQL statement. You use the FETCH... INTO... statement to individually fetch and write the rows into host variables. The application can then use the host variables to do computations.

Teradata Preprocessor2 uses cursors to mark or tag the first row accessed by an SQL query. Preprocessor2 then increments the cursor as needed.

Stored procedures use cursors to fetch one result row at a time and then execute SQL and SQL control statements as required for each row. Local variables or parameters from the stored procedure can be used for computations.
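The declare, open, and fetch sequence described above can be sketched as the following embedded SQL fragment, written as it might appear in a C host program. The cursor name emp_cur and the host variables dept_no, emp_name, and emp_salary are hypothetical; their declarations, the surrounding program, the loop, and error checking are omitted, so this is a fragment rather than a complete program.

```sql
/* Associate a cursor with a SELECT statement; nothing executes yet. */
EXEC SQL DECLARE emp_cur CURSOR FOR
    SELECT name, salary
    FROM personnel.employee
    WHERE deptno = :dept_no;

/* Opening the cursor executes the SELECT and builds the results table. */
EXEC SQL OPEN emp_cur;

/* Each FETCH moves the cursor to the next row and copies its column
   values into host variables; the host program repeats this in a loop
   until no more rows are returned. */
EXEC SQL FETCH emp_cur INTO :emp_name, :emp_salary;

/* Release the cursor when the application is done with the results. */
EXEC SQL CLOSE emp_cur;
```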


For More Information

For more information on the topics presented in this chapter, see the following Teradata Database books:

IF you want to learn more about…   THEN see…
Large Objects                      SQL Reference: Data Definition Statements
                                   SQL Reference: Data Manipulation Statements
                                   SQL Reference: Data Types and Literals
                                   SQL Reference: Functions and Operators
                                   SQL Reference: Fundamentals
                                   SQL Reference: UDF Programming
Teradata SQL                       Database Design
                                   SQL Reference: Fundamentals
                                   Teradata SQL Assistant for Microsoft Windows User Guide
User-Defined Functions             SQL Reference: UDF Programming


Chapter 6: Application Development

This chapter describes the tools used to develop applications for the Teradata Database and the interfaces used to establish communications between the applications and the Teradata Database. Topics include:
• Types of SQL applications
• The importance of the EXPLAIN statement
• Third-party development


Types of SQL Development

Application development for the Teradata Database falls into one of two categories:
• Explicit SQL
• Implicit SQL

Explicit SQL Development

Under explicit SQL application development you have the following tools:
• Embedded SQL
• Macros
• Stored Procedures
• EXPLAIN statement

More information about each tool is provided in following sections of this chapter.

Implicit SQL Development

Under implicit SQL application development, you have tools, such as Teradata and third-party products, that generate SQL as their output. More information about third-party products is provided in following sections of this chapter.


Embedded SQL Applications

This section describes using embedded SQL in applications.

What Is Embedded SQL

When you write applications using embedded SQL, you insert SQL statements into your native language application program. Because third-generation application development languages do not have facilities for dealing with results sets, embedded SQL contains extensions to executable SQL that permit declarations. Embedded SQL declarations include:
• Code to encapsulate the SQL from the native application language
• Cursor definition and manipulation

A cursor is a pointer device that you use to read through a results table one record/row at a time. For more information about cursors, see “Cursors” on page 5-16.

How Does an Application Program Use Embedded SQL

The client application languages that support embedded SQL are all compiled languages. SQL is not defined for any of them. For this reason, you must precompile your embedded SQL code to translate the SQL into native code before you can compile the source using a native compiler. The precompiler tool is called Preprocessor2, and you use it to:
• Read your application source code to look for the defined SQL code fragments
• Interpret the intent of the code after it isolates all the SQL code in the application, and translate it into Call Level Interface (CLI) calls
• Comment out all the SQL source

The output of the precompiler is native language source code with CLI calls substituted for the SQL source. After the precompiler generates the output, you can process the converted source code with the native language compiler. For information about Call Level Interface communications interface, see Chapter 13: “Data Communication Between Client and Teradata Database.”


Supported Languages and Platforms

Preprocessor2 supports the following application development languages on the specified platforms:

Application Development Language   Platform
C                                  • IBM mainframe clients
                                   • UNIX clients
COBOL                              • IBM mainframe clients
                                   • Some workstation clients
PL/I                               • IBM mainframes


Macros as SQL Applications

Teradata macros are SQL statements that the server stores and executes. Macros provide an easy way to execute frequently used SQL operations. Macros are particularly useful for enforcing data integrity rules, providing data security, and improving performance.

SQL Used to Create a Macro

You use the CREATE MACRO statement to create Teradata macros. The format of CREATE MACRO is similar to CREATE VIEW. For example, suppose you want to define a macro for adding new employees to the Employee table and incrementing the EmpCount field in the Department table. The CREATE MACRO statement looks like this:

    CREATE MACRO NewEmp (name   (VARCHAR(12)),
                         number (INTEGER, NOT NULL),
                         dept   (INTEGER, DEFAULT 100)
                        )
    AS (INSERT INTO Employee (Name, EmpNo, DeptNo)
        VALUES (:name, :number, :dept) ;
        UPDATE Department
        SET EmpCount = EmpCount + 1
        WHERE DeptNo = :dept ;
       ) ;

This macro defines parameters that users must fill in each time they execute the macro. A leading colon (:) indicates a reference to a parameter within the macro.


Macro Usage

The following example shows how to use the NewEmp macro to insert data into the Employee and Department tables. The information to be inserted is the name, employee number, and department number for employee H. Goldsmith. The EXECUTE macro statement looks like this:

    EXECUTE NewEmp ('Goldsmith H', 10015, 600);

SQL Used to Modify a Macro

The following example shows how to modify a macro. Suppose you want to change the NewEmp macro so that the default department number is 300 instead of 100. The REPLACE MACRO statement looks like this:

    REPLACE MACRO NewEmp (name   (VARCHAR(12)),
                          number (INTEGER, NOT NULL),
                          dept   (INTEGER, DEFAULT 300)
                         )
    AS (INSERT INTO Employee (Name, EmpNo, DeptNo)
        VALUES (:name, :number, :dept) ;
        UPDATE Department
        SET EmpCount = EmpCount + 1
        WHERE DeptNo = :dept ;
       ) ;

SQL Used to Delete a Macro

The following example shows how to delete a macro. Suppose you want to drop the NewEmp macro from the database. The DROP MACRO statement looks like this:

    DROP MACRO NewEmp;


Teradata Stored Procedures as SQL Applications

Teradata stored procedures are database applications created by combining SQL control statements with other SQL elements and condition handlers. They provide a procedural interface to the Teradata Database and many of the same benefits as embedded SQL. Teradata stored procedures conform to the ANSI SQL-99 (SQL3) standard with some exceptions.

SQL Used to Create Stored Procedures

Teradata SQL supports creating, modifying, dropping, renaming, and controlling access rights of stored procedures through DDL and DCL statements. You can create or replace a stored procedure through the COMPILE command in Basic Teradata Query Facility (BTEQ) and BTEQ for Microsoft Windows systems (BTEQWIN). You must specify a source file as input for the COMPILE command. You can also create or modify a stored procedure using the CREATE PROCEDURE or REPLACE PROCEDURE statement from CLIv2, ODBC, and JDBC applications, and the Teradata SQL

Stored Procedure Example

Assume you want to create a stored procedure named NewProc that you can use to add new employees to the Employee table and retrieve the department name of the department to which the employee belongs. You can also report an error, in case the row that you are trying to insert already exists, and handle that error condition.

The following stored procedure definition includes nested, labeled compound statements. The compound statement labeled L3 is nested within the outer compound statement L1. Note that the compound statement labeled L2 is the handler action clause of the condition handler.

This stored procedure defines parameters that must be filled in each time it is called (executed). The parameters are indicated with a leading colon (:) character when used in an SQL statement other than a control statement inside the procedure.

    CREATE PROCEDURE NewProc (IN name CHAR(12),
                              IN num INTEGER,
                              IN dept INTEGER,
                              OUT dname CHAR(10),
                              INOUT p1 VARCHAR(30))
    L1: BEGIN
        DECLARE CONTINUE HANDLER FOR SQLSTATE VALUE '23505'
        L2: BEGIN
            SET p1='Duplicate Row';
        END L2;
        L3: BEGIN
            INSERT INTO Employee (Name, EmpNo, DeptNo)
                VALUES (:name, :num, :dept);
            SELECT DeptName
                INTO :dname FROM Department
                WHERE DeptNo = :dept;
            IF SQLCODE <> 0 THEN LEAVE L3;
            ...
        END L3;
    END L1;

SQL Used to Execute a Stored Procedure

After a stored procedure is compiled, it is stored as an object in the Teradata Database. You can execute stored procedures from Teradata client utilities using the SQL CALL statement. Arguments for all input (IN or INOUT) parameters of the stored procedure must be submitted with the CALL statement.

BTEQ and other Teradata client utilities support stored procedure execution and DDL operations. These include:
• CLIv2
• JDBC
• ODBC
• PP2 (DDL statements are not supported from PP2; that is, you cannot create or modify stored procedures from PP2.)
• Teradata SQL Assistant
• BTEQWIN (BTEQ for Windows)
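As an illustration of the CALL statement described above, the following sketch invokes the NewProc procedure from the earlier example in a BTEQ session. The argument values are invented, and the convention of naming the OUT parameter as a placeholder is an assumption about BTEQ usage; ODBC and JDBC applications typically supply parameter markers instead.

```sql
-- Hypothetical BTEQ call of NewProc; all argument values are illustrative.
-- IN parameters (name, num, dept) receive values.
-- The OUT parameter is referenced by its name (dname) as a placeholder.
-- The INOUT parameter (p1) receives an initial value and returns
-- 'Duplicate Row' if the condition handler fires.
CALL NewProc('Jonathan', 1066, 34, dname, 'No error');
```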


You can use the following DDL statements with stored procedures:

Use This Statement… | To…
CREATE PROCEDURE | direct the stored procedure compiler to create a procedure from the SQL statements in the remainder of the statement text.
ALTER PROCEDURE | direct the stored procedure compiler to recompile a stored procedure created in an earlier version of Teradata Database without executing SHOW PROCEDURE and REPLACE PROCEDURE statements.
DROP PROCEDURE | drop a stored procedure.
RENAME PROCEDURE | rename a procedure.
REPLACE PROCEDURE | direct the stored procedure compiler to replace the definition of an existing stored procedure. If the specified stored procedure does not exist, create a new procedure by that name from the SQL statements in the remainder of the source text.
HELP PROCEDURE … ATTRIBUTES | view all the parameters and parameter attributes of a procedure, or the creation time attributes of a procedure.
HELP 'SPL' | display a list of all DDL and control statements associated with stored procedures.
HELP 'SPL command_name' | display help about the command you have named.
SHOW PROCEDURE | view the current definition (source text) of a procedure. The text is returned in the same format as defined by the creator.


The EXPLAIN Statement

Teradata SQL supplies an EXPLAIN request modifier that lets you see the execution plan of a query. Placing EXPLAIN in front of any SQL statement displays the execution plan for that statement, which is parsed and optimized in the usual fashion but is not submitted for execution.

How Is EXPLAIN Useful

The EXPLAIN statement not only explains how a statement will be processed, but provides an estimate of the number of rows involved and the performance impact of the request.

When you perform an EXPLAIN against any SQL statement, that statement is parsed and optimized. The access and join plans generated by the optimizer are returned in the form of a text file that explains the (possibly parallel) steps used in the execution of the statement. Also included is the relative time required to complete the statement given the statistics with which the optimizer had to work. If the statistics are not reasonably accurate, the time estimate may not be accurate.

EXPLAIN helps you to evaluate complex queries and to develop alternative, more efficient, processing strategies. You may be able to get a better plan by collecting more statistics on more columns, or by defining additional secondary indexes. Your knowledge of the actual demographics information may allow you to identify row count estimates that seem badly wrong, and help to pinpoint areas where additional statistics would be helpful.

EXPLAIN With Simple Join Index Example

The EXPLAIN example results from joining tables with the following table definitions.

  CREATE TABLE customer
    (c_custkey INTEGER,
     c_name CHAR(26),
     c_address VARCHAR(41),
     c_nationkey INTEGER,
     c_phone CHAR(16),
     c_acctbal DECIMAL(13,2),
     c_mktsegment CHAR(21),
     c_comment VARCHAR(127))
  UNIQUE PRIMARY INDEX( c_custkey );

  CREATE TABLE orders
    (o_orderkey INTEGER NOT NULL,
     o_custkey INTEGER,
     o_orderstatus CHAR(1),
     o_totalprice DECIMAL(13,2) NOT NULL,
     o_orderdate DATE FORMAT 'yyyy-mm-dd' NOT NULL,
     o_orderpriority CHAR(21),
     o_clerk CHAR(16),
     o_shippriority INTEGER,
     o_comment VARCHAR(79))
  UNIQUE PRIMARY INDEX(o_orderkey);

  CREATE TABLE lineitem
    (l_orderkey INTEGER NOT NULL,
     l_partkey INTEGER NOT NULL,
     l_suppkey INTEGER,
     l_linenumber INTEGER,
     l_quantity INTEGER NOT NULL,
     l_extendedprice DECIMAL(13,2) NOT NULL,
     l_discount DECIMAL(13,2),
     l_tax DECIMAL(13,2),
     l_returnflag CHAR(1),
     l_linestatus CHAR(1),
     l_shipdate DATE FORMAT 'yyyy-mm-dd',
     l_commitdate DATE FORMAT 'yyyy-mm-dd',
     l_receiptdate DATE FORMAT 'yyyy-mm-dd',
     l_shipinstruct VARCHAR(25),
     l_shipmode VARCHAR(10),
     l_comment VARCHAR(44))
  PRIMARY INDEX( l_orderkey );

The following statement defines a join index on these tables.

  CREATE JOIN INDEX order_join_line AS
    SELECT ( l_orderkey, o_orderdate, o_custkey, o_totalprice ),
           ( l_partkey, l_quantity, l_extendedprice, l_shipdate )
    FROM lineitem
    LEFT JOIN orders ON l_orderkey = o_orderkey
    ORDER BY o_orderdate
  PRIMARY INDEX (l_orderkey);

The following EXPLAIN shows that the optimizer used the newly created join index, order_join_line, even though there is no reference to the index in the SQL text.

  EXPLAIN
  SELECT o_orderdate, o_custkey, l_partkey, l_quantity, l_extendedprice
  FROM lineitem, orders
  WHERE l_orderkey = o_orderkey;

  Explanation
  --------------------------------------------------------------
  1) First, we lock a distinct LOUISB."pseudo table" for read on a Row Hash to prevent global deadlock for LOUISB.order_join_line.
  2) Next, we lock LOUISB.order_join_line for read.
  3) We do an all-AMPs RETRIEVE step from join index table LOUISB.order_join_line by way of an all-rows scan with a condition of ("NOT (LOUISB.order_join_line.o_orderdate IS NULL)") into Spool 1, which is built locally on the AMPs. The input table will not be cached in memory, but it is eligible for synchronized scanning. The result spool file will not be cached in memory. The size of Spool 1 is estimated to be 1,000,000 rows. The estimated time for this step is 4 minutes and 27 seconds.
  4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
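Because the row and time estimates in an EXPLAIN depend on the statistics available to the optimizer, one common follow-up is to collect statistics on the join columns and run the EXPLAIN again. The sketch below assumes the orders and lineitem tables defined above; which columns are worth collecting on is a judgment call for a real workload, not something the example settles.

```sql
-- Sketch: refresh statistics on the join columns of the example tables.
COLLECT STATISTICS ON orders COLUMN o_orderkey;
COLLECT STATISTICS ON lineitem COLUMN l_orderkey;

-- Re-run the EXPLAIN and compare the estimated row counts and times
-- with the earlier plan.
EXPLAIN
SELECT o_orderdate, o_custkey, l_partkey, l_quantity, l_extendedprice
FROM lineitem, orders
WHERE l_orderkey = o_orderkey;
```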

For information about the types of indexes that Teradata supports, see Chapter 8: “Data Distribution and Access Methods.”


Third-Party Development

The Teradata Database supports many third-party software products. Supported products fall into two general groups: Transparency Series products and native interface products.

TS/API Products

The Transparency Series/Application Program Interface (TS/API) product provides a gateway between the IBM mainframe relational database products DB2 (MVS/TSO) and SQL/DS (VM/CMS) and the Teradata Database. TS/API translates an SQL statement formulated for either DB2 or SQL/DS into Teradata SQL, so that DB2 or SQL/DS applications can access data stored in a Teradata Database.

Compatible Third-Party Software Products

Many third-party, interactive query products operate in conjunction with the Teradata Database, permitting queries formulated in a native query language to access a Teradata Database. The list of supported third-party products changes frequently. For a current list, contact your NCR sales office.

Performance Monitor/Application Programming Interface

The Performance Monitor/Application Programming Interface (PM/API) provides a way for third-party performance monitoring programs to access Performance Monitor and Production Control (PM and PC) functions resident within Teradata Database. PM and PC data is available using a specialized PM/API subset of the Call-Level Interface Version 2 (CLIv2).


For More Information

For more information on the topics presented in this chapter, see the following Teradata Database and Teradata Tools and Utilities books:

IF you want to learn more about… | THEN see…
BTEQ | Basic Teradata Query Reference
Call-Level interface programming | Teradata Call-Level Interface Version 2 Reference for Channel-Attached Systems; Teradata Call-Level Interface Version 2 Reference for Network-Attached Systems
Embedded SQL | SQL Reference: Stored Procedures and Embedded SQL; Teradata Preprocessor2 for Embedded SQL Programmer Guide
JDBC | Teradata Driver for the JDBC Interface User Guide
ODBC | Teradata ODBC Driver User Guide
Performance | PM/API Reference; Teradata Manager User Guide; Resource Usage Macros and Tables
Teradata Director Program | Teradata Director Program Reference
Teradata SQL data manipulation statements | SQL Reference: Data Manipulation Statements
Teradata SQL preprocessor | Teradata Preprocessor2 for Embedded SQL Programmer Guide
Teradata stored procedures | SQL Reference: Stored Procedures and Embedded SQL
TS/API products | Teradata Transparency Series/Application Programming Interface User Guide


Chapter 7: The Teradata Database Model

This chapter describes the mathematical concepts on which relational databases are modeled and discusses some of the objects that are part of a relational database. Topics include:
• The relational model
• The relational database
• Tables, rows, and columns


What is a Relational Model

The relational model for database management is based on concepts derived from the mathematical theory of sets. Roughly speaking, set theory defines a table as a relation. The number of rows is the cardinality of the relation, and the number of columns is the degree.

Any manipulation of a table in a relational database has a consistent, predictable outcome, because the mathematical operations on relations are well defined. By way of comparison, database-management products based on hierarchical, network, or object-oriented architectures are not built on rigorous theoretical foundations. Therefore, the behavior of such products is not as predictable as that of relational products.

The SQL optimizer in the database uses relational algebra to build the most efficient access path to requested data. The optimizer can readily adapt to changes in system variables by rebuilding access paths without programmer intervention. This adaptability is necessary because database definitions can change from time to time.
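To make cardinality and degree concrete, the following sketch computes both for a table. The Employee table is borrowed from the stored procedure example in the previous chapter, and the database name Personnel is purely an assumption for illustration.

```sql
-- Cardinality: the number of rows in the relation.
SELECT COUNT(*) AS cardinality
FROM Employee;

-- Degree: the number of columns, counted from the Data Dictionary
-- view DBC.Columns. 'Personnel' is a hypothetical database name.
SELECT COUNT(*) AS degree
FROM DBC.Columns
WHERE DatabaseName = 'Personnel'
  AND TableName = 'Employee';
```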


What is a Relational Database

Users perceive a relational database as a collection of objects (for example, tables, views, macros, stored procedures, and triggers) that can be manipulated easily, either directly with SQL or through specifically developed applications.

Set Theory and Relational Database Terminology

Relational databases are a generalization of the mathematics of set theory relations. Thus, the correspondences between set theory and relational databases are not always direct. The following table (relation) notes the correspondences between set theory and relational database terms:

Set Theory Term | Relational Database Term
Relation | Table
Tuple | Row (or record)
Attribute | Column


Tables, Rows, and Columns

Tables are two-dimensional objects consisting of rows and columns. Data is organized in table format and presented to the users of a relational database. References between tables define the relationships and constraints of data inside the tables themselves.

Table Constraints

You can define conditions that must be met before the Teradata Database writes a given value to a column in a table. These conditions are called constraints. Constraints can include value ranges, equality or inequality conditions, and intercolumn dependencies. The Teradata Database supports constraints at both the column and table levels.

During table creation and modification, you can specify constraints on single column values as part of a column definition or on multiple columns using the CREATE and ALTER statements.
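The following sketch shows how column-level and table-level constraints might look in practice. The table, column names, and limits are hypothetical and chosen only to illustrate a value range, an intercolumn dependency, and a constraint added after creation.

```sql
-- Hypothetical table; names and limits are illustrative only.
CREATE TABLE project_assignment
  (emp_no     INTEGER NOT NULL,
   -- Column-level constraint: a value range on a single column.
   hours_week INTEGER CHECK (hours_week BETWEEN 1 AND 60),
   start_date DATE NOT NULL,
   end_date   DATE,
   -- Table-level constraint: an intercolumn dependency.
   CONSTRAINT date_order CHECK (end_date >= start_date))
PRIMARY INDEX (emp_no);

-- A constraint can also be added later with ALTER TABLE.
ALTER TABLE project_assignment
  ADD CONSTRAINT hours_cap CHECK (hours_week <= 48);
```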

Permanent and Temporary Tables

To manipulate tabular data, you must submit a query in a language that the database understands. In the case of the Teradata Database, the language is SQL. You can store the results of multiple SQL queries in tables. Permanent storage of tables is necessary when different sessions and users must share table contents.

When tables are required for only a single session, the system creates temporary tables. Using this type of table, you can save query results for use in subsequent queries within the same session. Also, you can break down complex queries into smaller queries by storing results in a temporary table for use during the same session. When the session ends, the system automatically drops the temporary table.

Global Temporary Tables

Global temporary tables are tables that exist only for the duration of the SQL session in which they are used. The contents of these tables are private to the session, and the system automatically drops the table at the end of that session. However, the system saves the global temporary table definition permanently in the Data Dictionary. The saved definition may be shared by multiple users and sessions with each session getting its own instance of the table.
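A minimal sketch of a global temporary table follows. The table name is hypothetical, and the Salary column on Employee is an assumption, since the earlier example only shows Name, EmpNo, and DeptNo.

```sql
-- The definition is stored once in the Data Dictionary.
CREATE GLOBAL TEMPORARY TABLE gt_dept_totals
  (dept_no      INTEGER,
   total_salary DECIMAL(15,2))
PRIMARY INDEX (dept_no)
ON COMMIT PRESERVE ROWS;   -- keep rows after each transaction ends

-- Each session that references the table gets its own private instance.
-- Salary is an assumed column on the Employee table.
INSERT INTO gt_dept_totals
SELECT DeptNo, SUM(Salary)
FROM Employee
GROUP BY DeptNo;
```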


Volatile Temporary Tables

If you need a temporary table for a single use only, you can define a volatile temporary table. The definition of a volatile temporary table resides in memory and does not survive a system restart.

Using volatile temporary tables improves performance even more than using global temporary tables because the system does not store the definitions of volatile temporary tables in the Data Dictionary. Access-rights checking is not necessary because only the creator can access the volatile temporary table.
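As a sketch, a volatile temporary table can hold an intermediate result for later queries in the same session. The table name and the price threshold are hypothetical; the orders table is the one defined in the EXPLAIN example in the previous chapter.

```sql
-- Definition lives only in memory for the current session.
CREATE VOLATILE TABLE vt_large_orders
  (o_orderkey   INTEGER,
   o_totalprice DECIMAL(13,2))
PRIMARY INDEX (o_orderkey)
ON COMMIT PRESERVE ROWS;

-- Save an intermediate result for use in subsequent queries.
INSERT INTO vt_large_orders
SELECT o_orderkey, o_totalprice
FROM orders
WHERE o_totalprice > 100000;   -- illustrative threshold
```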

Derived Tables

A special type of temporary table is the derived table. You can specify a derived table in an SQL SELECT statement.
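For example, a derived table can be written as a named subquery in the FROM clause. The sketch below assumes the Employee table from the stored procedure example; the derived table name d and its column names are illustrative.

```sql
-- The subquery in the FROM clause is a derived table named d.
-- It exists only for the duration of this SELECT statement.
SELECT e.Name, e.DeptNo, d.headcount
FROM Employee e,
     (SELECT DeptNo, COUNT(*)
      FROM Employee
      GROUP BY DeptNo) AS d (DeptNo, headcount)
WHERE e.DeptNo = d.DeptNo;
```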

Rows and Columns

A column always contains the same kind of information. For example, a table that has information about employees would have columns for the first name and last name, and nothing other than the employee names should be placed in those columns.

A row is one instance of all the columns in a table. For example, each row in the employee table would contain the first name and the last name for that employee, among other things.

The rows and columns in a table represent entities or relationships. An entity is a person, place, or thing about which the table contains information. The table mentioned in the previous paragraphs contains information about the employee entity. Each table holds only one kind of row.

The relational model requires that each row in a table be uniquely identified. To accomplish this, you define a primary key to identify each row in the table. For more information about primary keys, see “How Are Primary Keys and Primary Indexes Related” on page 8-3.


For More Information

For more information on the topics presented in this chapter, see the following Teradata Database books:

IF you want to learn more about… | THEN see…
Relational model | Database Design
Tables, rows, and columns | Database Design; Database Administration; SQL Reference: Fundamentals


Chapter 8: Data Distribution and Access Methods

This chapter describes how the Teradata Database handles data distribution and access. Topics include:
• Indexes
• Hashing
• Identity Column


Teradata Database Indexes

An index is a physical mechanism used to store and access the rows of a table. Indexes on tables in a relational database function much like indexes in books: they speed up information retrieval.

In general, the Teradata Database uses indexes to:
• Distribute data rows.
• Locate data rows.
• Improve performance. (Indexed access is usually more efficient than searching all rows of a table.)
• Ensure uniqueness of the index values. Only one row of a table can have a particular value in the column or columns defined as a unique index.

The Teradata Database supports the following types of indexes:
• Primary index (may be unique or non-unique, and optionally partitioned)
• Secondary index (may be unique or non-unique)
• Join index
• Hash index

These indexes are discussed in the following sections.


Primary Indexes

The Teradata Database requires one, and only one, primary index for each table.

Primary Index Characteristics

The most efficient access method is through the primary index. Choosing a primary index means balancing two sometimes conflicting design goals: a primary index that gives good distribution of data across the AMPs, and a primary index that reflects the most common usage pattern of the table.

Primary indexes:
• Affect the distribution of rows across AMPs
• Do not have subtables
• Can be unique or non-unique
• May or may not be partitioned

For information about partitioned indexes, see “Partitioned Primary Indexes” on page 8-5.

How Are Primary Keys and Primary Indexes Related

The values chosen for the unique index of a table are frequently the same values identified as the primary key during the data modeling process, but no hard and fast rule makes this so. In fact, physical database design considerations often lead to a choice of values other than those of the primary key for the primary index of a table.


The following table describes some of the relationships between primary keys and primary indexes:

Primary Key | Primary Index
Constraint used to ensure referential integrity | Physical access mechanism
Required by the Teradata Database only if referential integrity checks are to be performed | Required by Teradata Database
Column limit is 64 if the Teradata Database performs referential integrity checks; no arbitrary limit if it performs no referential integrity checks | 64-column limit
Defined by CREATE TABLE statement | Defined by CREATE TABLE statement
Must be unique | May be unique or non-unique
Identifies a row uniquely | Distributes rows
Values cannot be changed | Values can be changed
May not be null | May be null
Does not imply access path | Defines most common access path
Causes a unique primary index or unique secondary index to be created | N/A


Partitioned Primary Indexes

Both unique and non-unique primary indexes can be partitioned. A partitioned primary index, like a non-partitioned primary index, provides an access path to the rows in the base table via the primary index values.

Non-partitioned Primary Indexes

You can define a primary index as either partitioned or non-partitioned. The non-partitioned primary index is the standard Teradata Database primary index.

When a table is created with a partitioned primary index, the rows are hashed to the appropriate AMPs and assigned to an appropriate partition based on the value of a partitioning expression that you define when you create or alter the table. Once assigned to a partition, the rows are stored in row hash order.
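The sketch below shows one way a partitioning expression might be written, using the RANGE_N function to place rows in monthly partitions. The sales_history table and its columns are hypothetical, as is the choice of monthly ranges.

```sql
-- Rows are hashed to AMPs on store_id, then assigned to a monthly
-- partition on each AMP according to sale_date.
CREATE TABLE sales_history
  (store_id    INTEGER NOT NULL,
   sale_date   DATE NOT NULL,
   sale_amount DECIMAL(13,2))
PRIMARY INDEX (store_id)
PARTITION BY RANGE_N(sale_date BETWEEN DATE '2003-01-01'
                                   AND DATE '2003-12-31'
                                   EACH INTERVAL '1' MONTH);

-- A range query that can be limited to the March partition.
SELECT store_id, SUM(sale_amount)
FROM sales_history
WHERE sale_date BETWEEN DATE '2003-03-01' AND DATE '2003-03-31'
GROUP BY store_id;
```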

How Do Partitioned and Non-Partitioned Primary Indexes Compare

Partitioned primary indexes are designed to optimize range queries while providing efficient primary index join strategies. A range query requests data that falls within specified boundaries.

The following table provides a comparison of partitioned and non-partitioned primary index capabilities:

Capabilities | Partitioned | Non-Partitioned
Hash partitioned, that is, distributed to the AMPs by the hash of the primary index columns | Yes | Yes
Partitioned on each AMP on some set of columns | Yes | No
Ordered by hash of the primary index columns on each AMP | Yes (within each partition) | Yes


Secondary Indexes

Secondary indexes allow access to information in a table by alternate, less frequently used paths and improve performance by avoiding full table scans. Secondary indexes add to table overhead in terms of disk space and maintenance; however, you can drop and recreate secondary indexes as needed.

Secondary indexes:
• Do not affect the distribution of rows across AMPs
• Can be unique or non-unique
• Are used by the optimizer when the indexes can improve query performance

Secondary Index Subtables

The system builds subtables for all secondary indexes. The subtable contains the rows that associate the secondary index value with one or more rows in the base table. When column values change, the system updates the rows in the subtable. When you drop the secondary index, the system physically removes the subtable.
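Because the subtable is built and removed with the index, a secondary index can be created and dropped as needs change. The following sketch assumes the orders table from the EXPLAIN example in the previous chapter; the choice of o_custkey as the indexed column is illustrative.

```sql
-- Create a non-unique secondary index; the system builds its subtable.
CREATE INDEX (o_custkey) ON orders;

-- Drop the index, for example before a large batch load;
-- the system physically removes the subtable.
DROP INDEX (o_custkey) ON orders;

-- Recreate the index once the load is complete.
CREATE INDEX (o_custkey) ON orders;
```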

How Do Primary and Secondary Indexes Compare

The following table provides a brief comparison of primary and secondary index features:

Feature | Primary | Secondary
Is required | Yes | No
Can be unique or non-unique | Both | Both
Affects row distribution | Yes | No
Create and drop dynamically | No | Yes
Improves access | Yes | Yes
Create using multiple data types | Yes | Yes
Requires separate physical structure | No | Yes, a subtable
Requires extra processing overhead | No | Yes


Join Indexes

A join index is an indexing structure containing columns from one or more base tables.

Some queries can be satisfied by examining only the join index when all referenced columns are stored in the index. Such queries are said to be covered by the join index. Other queries may use the join index to qualify a few rows, then refer to the base tables to obtain requested columns that aren't stored in the join index. Such queries are said to be partially-covered by the index.

Because the Teradata Database supports multi-table, partially-covering join indexes, all types of join indexes, except the aggregate join index, can be joined to their base tables to retrieve columns that are referenced by a query but are not stored in the join index. Aggregate join indexes can be defined for commonly-used aggregation queries.

Much like secondary indexes, join indexes impose additional processing on insert and delete operations, and on update operations that change the value of columns stored in the join index. The performance trade-off considerations are similar to those for secondary indexes.

Single-Table Join Indexes

A single-table join index replicates some or all of the columns of a base table in a separate structure that is hashed on a frequently used join column (usually chosen to match the primary index of the table to which it is most often joined) rather than on the primary index of the original base table.
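As a sketch, a single-table join index on the orders table from the previous chapter could be hashed on o_custkey so that its rows sit on the same AMPs as the matching customer rows. The index name and column selection are hypothetical.

```sql
-- Replicates a subset of orders columns, redistributed on o_custkey
-- (the primary index column of customer) instead of o_orderkey.
CREATE JOIN INDEX orders_by_cust AS
  SELECT o_custkey, o_orderkey, o_orderdate, o_totalprice
  FROM orders
PRIMARY INDEX (o_custkey);
```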

Multi-Table Join Indexes

When queries frequently request a particular join, it may be beneficial to predefine the join with a multi-table join index. The optimizer can use the predefined join instead of performing the same join repetitively.

Aggregate Join Indexes

When query performance is of utmost importance, aggregate join indexes offer an extremely efficient, cost-effective method of resolving queries that frequently specify the same aggregate operations on the same column or columns. When aggregate join indexes are available, the system does not have to repeat aggregate calculations for every query.


You can define an aggregate join index on two or more tables, or on a single table. A single-table aggregate join index includes a summary table with:
• A subset of columns from a base table
• Additional columns for the aggregate summaries of the base table columns

You can create an aggregate join index using:
• SUM function (A SUM aggregate join index contains a hidden column containing the row count, so that AVERAGE can be calculated from the join index.)
• COUNT function
• GROUP BY clause

Sparse Join Indexes

Another capability of the join index allows you to index a portion of the table using the WHERE clause in the CREATE JOIN INDEX statement to limit the rows indexed. You can limit the rows that are included in the join index to a subset of the rows in the table based on an SQL query result. Any join index, whether simple or aggregate, multi-table or single-table, can be sparse.

For example, the following DDL creates J1, which is an aggregate join index containing only the sales records from 2003:

  CREATE JOIN INDEX J1 AS
    SELECT storeid, deptid, SUM(sales_dollars)
    FROM sales
    WHERE EXTRACT(YEAR FROM sales_date) = 2003
    GROUP BY storeid, deptid;

When you enter a query, the optimizer determines whether accessing J1 gives the correct answer and is more efficient than accessing the base tables. The optimizer selects this sparse join index only for queries that restrict themselves to data from the year 2003.


Hash Indexes

The hash index provides a space-efficient index structure that can be hash distributed to AMPs in various ways. The index has characteristics similar to a single-table join index with a row identifier that provides transparent access to the base table. A hash index may be simpler to create than a corresponding join index and takes somewhat less disk storage.

The hash index has been designed to improve query performance in a manner similar to a single-table join index. In particular, you can specify a hash index to:
• Cover columns in a query so that the base table does not need to be accessed
• Serve as an alternate access method to the base table in a join or retrieval operation
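A hash index definition might look like the following sketch, which covers two columns of the orders table from the previous chapter and distributes the index rows by o_custkey. The index name and the column choices are hypothetical.

```sql
-- Covers o_custkey and o_orderdate so that queries touching only
-- these columns need not read the base table.
-- BY names the column used to hash-distribute the index rows;
-- ORDER BY HASH stores them in hash order on each AMP.
CREATE HASH INDEX orders_hidx (o_custkey, o_orderdate)
  ON orders
  BY (o_custkey)
  ORDER BY HASH (o_custkey);
```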


Index Specification

All tables require a primary index. If you do not specify a column or set of columns as the primary index for the table, then CREATE TABLE assigns a primary index by default.

Creating Indexes

The following table provides general information about creating indexes.

To specify a… | Use the following statement… | And the following clause…
unique primary index (UPI) | CREATE TABLE | UNIQUE PRIMARY INDEX.
non-unique primary index (NUPI) | CREATE TABLE | PRIMARY INDEX.
unique secondary index (USI) | CREATE TABLE | UNIQUE INDEX.
non-unique secondary index (NUSI) | CREATE TABLE | INDEX.
non-unique secondary index (NUSI) | CREATE INDEX | N/A.
join index | CREATE JOIN INDEX | N/A. Note: A join index can provide an index across multiple tables.
hash index | CREATE HASH INDEX | N/A. Note: A hash index is defined on a single table.

Indexes are also created when the PRIMARY KEY and UNIQUE constraints are specified.
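The clauses in the table above can be combined in a single CREATE TABLE statement, as in the following sketch. The employee_demo table and its columns are hypothetical.

```sql
-- UPI and USI declared as clauses of CREATE TABLE.
CREATE TABLE employee_demo
  (emp_no    INTEGER NOT NULL,
   badge_no  INTEGER NOT NULL,
   last_name VARCHAR(30),
   dept_no   INTEGER)
UNIQUE PRIMARY INDEX (emp_no)   -- unique primary index (UPI)
UNIQUE INDEX (badge_no);        -- unique secondary index (USI)

-- NUSI added after the table exists, using CREATE INDEX.
CREATE INDEX (last_name) ON employee_demo;
```

Replacing UNIQUE PRIMARY INDEX with PRIMARY INDEX would give the table a non-unique primary index instead.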

Strengths and Weaknesses of Various Types of Indexes

Teradata Database does not require or allow users to explicitly dictate how indexes should be used for a particular query. The Teradata Database optimizer costs all of the reasonable alternatives and selects the least expensive.

The object of any query plan is to return accurate results as quickly as possible. Therefore, the optimizer uses an index or indexes only if the index speeds up query processing. In some cases, the optimizer processes the query without using any index.


Selection of indexes:
• Can have a direct impact on overall Teradata performance
• Is not always a straightforward process
• Is based partly on usage expectations

The following list assumes execution of a simple SELECT statement and explains the strengths and weaknesses of some of the various indexing methods:

Unique Primary Index (UPI)
Strengths:
• is the most efficient access method when the SQL statement contains the primary index value
• involves one AMP and one row
• requires no spool file (for a simple SELECT)
• can obtain the most granular locks
Weaknesses:
• none, provided that the column or columns making up the index are well chosen.

Non-unique Primary Index (NUPI)
Strengths:
• provides efficient access when the SQL statement contains the primary index value
• involves one AMP
• can obtain granular locks, but not as fine as a UPI
• may not require a spool file as long as the number of rows returned is small
Weaknesses:
• may slow down INSERTs.
• may decrease the efficiency of SELECTs containing the primary index value when some values are repeated in many rows.

Unique Secondary Index (USI)
Strengths:
• provides efficient access when the SQL statement contains the USI values, and you do not specify primary index values
• involves two AMPs and one row
• requires no spool file (for a simple SELECT)
Weaknesses:
• requires additional overhead for INSERTs, UPDATEs, MERGEs, and DELETEs.

Non-unique Secondary Index (NUSI)
Strengths:
• provides efficient access when the number of rows per value in the table is small
• involves all AMPs and probably multiple rows
• provides access using information that may be more readily available than a UPI value, such as employee last name, compared to an employee number
• may require a spool file
Weaknesses:
• requires additional overhead for INSERTs, UPDATEs, MERGEs, and DELETEs.
• will not be used by the optimizer if the number of data blocks accessed is a significant percentage of the data blocks in the table because the optimizer will determine that a full table scan is cheaper.

Full table scan
Strengths:
• accesses each row only once
• provides access using any arbitrary set of column conditions
Weaknesses:
• examines every row.
• usually requires a spool file possibly as large as the base table.

Multi-table join index
Strengths:
• can eliminate the need to perform certain joins and aggregates repetitively
• may be able to satisfy a query without referencing the base tables
• can have a different primary index from that of the base table
• can replace an NUSI or a USI
Weaknesses:
• requires additional overhead for INSERTs, UPDATEs, MERGEs, and DELETEs for any of the base tables that contribute to the multi-table join index.
• usually is not suitable for data in tables subjected to a large number of daily INSERTs, UPDATEs, MERGEs, and DELETEs.
• imposes some restrictions on operations performed on the base table.

Single-table join index
Strengths:
• can isolate frequently used columns (or their aggregates) from those that are seldom used
• can reduce number of physical I/Os when only commonly used columns are referenced
• can have a different primary index from that of the base table
Weaknesses:
• requires additional overhead for INSERTs, UPDATEs, MERGEs, and DELETEs.
• imposes some restrictions on operations performed on the base table.

Sparse join index
Strengths:
• can be stored in less space than an ordinary join index
• reduces the additional overhead associated with INSERTs, UPDATEs, MERGEs, and DELETEs to the base table when compared with an ordinary join index
• can exclude common values that occur in many rows to help ensure that the optimizer chooses to use the join index to access them
Weaknesses:
• requires additional overhead for INSERTs, UPDATEs, MERGEs, and DELETEs to the base table.
• imposes some restrictions on operations performed on the base table.


Hashing

The Teradata Database uses hashing to distribute data to disk storage and uses indexes to access the data. Because the architecture of the Teradata Database is massively parallel, it requires an efficient means of distributing and retrieving its data. That efficient method is hashing. All Teradata indexes are based on (or partially based on) row hash values rather than table column values.

For primary indexes, the Teradata Database obtains a row hash by hashing the primary index value. The row hash and a sequence number, which is assigned to distinguish between rows with the same row hash within a table, are collectively called a row identifier and uniquely identify each row in a table. A partition identifier is also part of the row identifier in the case of partitioned primary index tables. For more information on partitioned primary indexes, see “Partitioned Primary Indexes” on page 8-5.

For secondary indexes, the Teradata Database implements the index as a row identifier based on the:
• Hash of the secondary index value
• Actual value of the secondary index
• List of row identifiers for rows with that secondary index value
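The idea of a row hash plus a sequence number can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hash function (SHA-256 truncated to 4 bytes), the number of parallel units (NUM_UNITS), and the function names are hypothetical stand-ins, not the algorithm the Teradata Database actually uses.

```python
import hashlib
from collections import Counter

NUM_UNITS = 4  # hypothetical number of parallel storage units for this sketch


def row_hash(primary_index_value) -> int:
    """Stand-in row hash: first 4 bytes of SHA-256 (not Teradata's real algorithm)."""
    digest = hashlib.sha256(str(primary_index_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def unit_for(primary_index_value) -> int:
    """Pick the storage unit for a row from its row hash."""
    return row_hash(primary_index_value) % NUM_UNITS


def assign_row_ids(primary_index_values):
    """Return one (row_hash, sequence_number) pair per row.

    The sequence number distinguishes rows that share the same row hash.
    """
    next_seq = Counter()
    row_ids = []
    for value in primary_index_values:
        h = row_hash(value)
        next_seq[h] += 1
        row_ids.append((h, next_seq[h]))
    return row_ids


# Distribute 100 hypothetical employee numbers across the units.
distribution = Counter(unit_for(n) for n in range(1000, 1100))

# Two rows with the same primary index value share a row hash
# but receive different sequence numbers.
row_ids = assign_row_ids([1001, 1002, 1001])
```

Because the row hash depends only on the primary index value, rows with equal index values always land on the same unit, which is why the choice of primary index drives how evenly the data is distributed.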

Identity Column

Identity Column is a column attribute option defined in the ANSI standard. When associated with a column, this attribute causes the system to generate a unique, table-level number for every row that is inserted into the table.

Identity columns have many applications, including the automatic generation of UPIs, USIs, and primary keys. For example, an identity column can serve as a UPI to ensure even data distribution when you import data from a system that does not have a primary index. For more information about indexes, see “Teradata Database Indexes” on page 8-2.

For More Information

For more information on the topics presented in this chapter, see the following Teradata Database books:

IF you want to learn more about identity columns, THEN see:
• SQL Reference: Data Definition Statements
• SQL Reference: Data Manipulation Statements

IF you want to learn more about indexes and hashing, THEN see:
• Database Design
• SQL Reference: Data Definition Statements
• SQL Reference: Statement and Transaction Processing

Chapter 9: Data Dictionary

The Data Dictionary is a set of system tables that contain data about user databases and properties of those databases in addition to a great deal of administrative information about the Teradata Database. This chapter provides information about the Data Dictionary. Topics include:
• Definition of the Data Dictionary
• Data Dictionary views
• SQL used to access the Data Dictionary

What Is the Data Dictionary

The Data Dictionary comprises tables and views that reside in the system database called DBC. These tables and views are reserved for use by the system and contain information, called metadata, about the data associated with the Teradata Database.

Data Dictionary Content

Data Dictionary system tables include current definitions, control information, and general information about the following:
• Databases
• Users
• Roles
• Profiles
• Accounts
• Tables
• Views
• Columns
• Indexes
• Constraints
• Sessions and session attributes
• Triggers
• Access rights
• Journal tables
• Disk space
• Events
• Resource usage
• Macros
• Stored procedures
• Logs
• Rules
• Translations
• Character sets
• Statistics
• User-defined functions

What Is in a Data Dictionary Table

The following lists show what is stored in the Data Dictionary when you create some of the most important objects.

WHEN you create a table, THEN the definition of the object is stored along with the following details:
• table name and table location
• database name, creator name, and user names of all owners in the hierarchy
• each column in the table, including column name, data type, length, and phrases
• user/creator access privileges on the table
• indexes defined for the table
• constraints defined for the table
• table backup and protection, including fallback status and permanent journals
• date and time the object was created

WHEN you create a database, THEN the definition of the object is stored along with the following details:
• database name, creator name, owner name, and account name
• space allocation, including permanent, spool, and temporary space
• number of fallback tables
• collation type
• password string and password change date
• creation time stamp
• logon and account logon rules
• the date and time the database was last altered and the name of the user who altered it
• role and profile names
• a unique identifier for the name of the UDF library

WHEN you create a user, THEN the definition of the object is stored along with the following details:
• user name, creator name, and owner name
• the date and time the password was last modified
• space allocation, including permanent, spool, and temporary space
• default account, database, collation, character type, and date form
• creation time stamp
• name and time stamp of the last alteration made to the user
• role and profile name

WHEN you create a view or macro, THEN the following details are entered in the Data Dictionary:
• the text of the view or macro
• creation time attributes
• user and creator access privileges

WHEN you create a stored procedure, THEN the following details are entered in the Data Dictionary:
• creation time attributes
• parameters, including parameter name, parameter type, data type, and default format
• user and creator access privileges

WHEN you create a trigger, THEN the following details are entered in the Data Dictionary:
• the IDs of the table, the trigger, the database and subject table database, the user who created the trigger, and the user who last updated the trigger
• time stamp for the last update
• indexes
• trigger name, whether the trigger is enabled, the event that fires the trigger, and the order in which triggers fire
• default character set
• creation text and time stamp
• overflow text, that is, trigger text that exceeds a specified limit
• fallback tables

WHEN you create a user-defined function, THEN the following details are entered in the Data Dictionary:
• database name, function name, and specific name
• number, data type, and style of parameters
• function ID, function type, and external name
• source file language
• character type
• external file reference
• platform

Teradata Database Data Dictionary Views

You can examine the information in the system tables in database DBC directly or through a series of views. Typically, you use views to obtain information on the objects in the Data Dictionary rather than querying the actual tables, which can be very large. The database administrator controls who has access to views.

What Is in a View

A view is a virtual table that you see as a base table. Think of a view as a dynamic window to the underlying tables in the database. A view is constructed from one or more base tables or views. However, a view usually presents only a subset of the columns and rows in the base table or tables that comprise the view.

Some view columns do not exist in the underlying base tables. For example, a view can present data summaries (such as an average), which you cannot maintain in a base table.

You can create hierarchies of views in which views can be created on views. This can be useful, but you should be aware that deleting any of the lower-level views invalidates the higher-level views that depend on them.
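The notion of a view as a dynamic window that can present a summary column can be tried out with a small Python sketch. It uses the SQLite engine bundled with Python purely as a stand-in, so the SQL dialect is not Teradata SQL; the employee table, the dept_salary view, and all column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, dept_no INTEGER, salary REAL);
    INSERT INTO employee VALUES (1, 100, 40000.0), (2, 100, 50000.0), (3, 200, 60000.0);

    -- The view exposes a summary column (avg_salary) that does not exist
    -- in the base table and hides the individual salaries.
    CREATE VIEW dept_salary AS
        SELECT dept_no, AVG(salary) AS avg_salary
        FROM employee
        GROUP BY dept_no;
""")

query = "SELECT dept_no, avg_salary FROM dept_salary ORDER BY dept_no"
rows_before = conn.execute(query).fetchall()

# The view stores no data of its own, so a change to the base table
# is visible the next time the view is referenced.
conn.execute("INSERT INTO employee VALUES (4, 200, 80000.0)")
rows_after = conn.execute(query).fetchall()
```

Here rows_before holds the averages for the original three employees, and rows_after reflects the newly inserted row without any change to the view definition.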

Why Use Views

There are at least four reasons to use views. Views provide all of the following:
• A simplified user perception of the database
• Security for restricting table access and updates
• Well-defined, well-tested, high-performance access to data
• Logical data independence, which minimizes application modification if base tables require restructuring

Who Uses Data Dictionary Views

Some Data Dictionary views may be restricted to special types of users, while others are accessible by all users. The database administrator controls access to views by granting access rights. The following lists define the information needs of various types of users.

This type of user: End
Needs to know:
• Objects to which the user has access
• Types of access available to the user
• Access rights the user has granted to other users

This type of user: Supervisory
Needs to know:
• How to create and organize databases
• How to monitor space usage
• How to define new users
• How to allocate access privileges
• How to create indexes
• How to perform archiving operations

This type of user: Database administrator
Needs to know:
• Performance
• Status and statistics
• Errors
• Accounting

This type of user: Security administrator
Needs to know:
• Access logging rules generated by the execution of BEGIN LOGGING statements
• Results of access checking events, logged as specified by the access logging rules

This type of user: Operations control
Needs to know:
• Archive and recovery activities

SQL Access to the Data Dictionary

Every time you log on to the Teradata Database, perform an SQL query, or type a password, you are using the Data Dictionary.

For security and data integrity reasons, the only SQL DML command you can use on the Data Dictionary is the SELECT statement. You cannot use the INSERT, UPDATE, MERGE, or DELETE SQL statements to alter the Data Dictionary in any way.

You can use SELECT to examine any view in the Data Dictionary to which your database administrator has granted you access. For example, if you need to access information in the Personnel database, then you can query the DBC.Databases view as shown:

    SELECT Databasename, Creatorname, Ownername, Permspace
    FROM DBC.Databases
    WHERE Databasename='Personnel' ;

The query above produces a report like this:

    Databasename   Creatorname   Ownername   Permspace
    Personnel      Jones         Jones       1,000,000

For More Information

For more information on the topics presented in this chapter, see the following Teradata Database book:

IF you want to learn more about the Data Dictionary, THEN see:
• Data Dictionary


Chapter 10: Teradata Meta Data Services

The Teradata® Meta Data Services product provides a means of storing, administering, and navigating metadata in a Teradata Warehouse. It is the only metadata management system optimized for and integrated with the Teradata Database environment. Topics include:
• What is metadata
• Types of metadata
• Teradata Meta Data Services

What Is Metadata

Metadata is the term applied to the definitions of the data stored in the Teradata Warehouse. Simply put, metadata is data about data. In a transaction processing database environment, a Data Dictionary generally satisfies the need for data about data. In the data warehouse environment, the requirements for a more elaborate metadata storage system can exceed the capabilities of the Data Dictionary.

Metadata plays an important role across the Teradata Warehouse architecture. In the operational database environment, that role is very formal. All development should use metadata as a standard part of the design and development process.

As far as the data warehouse is concerned, metadata is used to locate data. Without it, you cannot interact with the data in the data warehouse because you have no means of knowing how the tables are structured, what the precise definitions of the data are, or where the data originated.

Types of Metadata

Metadata has been around for as long as there have been programs and data. But in the world of data warehouses, metadata takes on a new level of importance. Using metadata, you can make the most effective use of the Teradata Warehouse. Metadata allows the decision support system (DSS) analyst to navigate through the possibilities.

The major component of the DSS environment is archival data, that is, data with a time stamp. Because archival data is time-stamped, it makes sense to store metadata with the actual occurrences of data, which are time-stamped as well.

The following lists contain information about the types of metadata.

For the data model, the following types of metadata are stored:
• description
• specification
• the layout of the physical data model tables
• relation between the data model and the data warehouse

For the data warehouse, the following types of metadata are stored:
• data source (system of record)
• definition of the system of record
• mapping from system of record to the data warehouse and other places defined in the environment
• table structures and attributes
• any relationship or artifacts of relationships
• transformation of data as it passes into the data warehouse
• history of extracts
• extract logging
• common routines for data access

For columns, the following types of metadata are stored:
• columns in a row
• order in which the columns appear
• physical structure of the columns
• any variable-length columns
• any columns with NULL values
• unit of measure of any numeric columns
• any encoding used

For the database design, the following types of metadata are stored:
• description of the layouts used
• structure of data as known to the programmers and analysts

Teradata Meta Data Services

Teradata Meta Data Services (MDS) is software that creates a repository in the Teradata Warehouse in which metadata is stored. MDS also permits the DSS analyst to administer and navigate metadata in the warehouse. Teradata Meta Data Services is the only metadata management system optimized for and integrated with the Teradata Warehouse environment.

The following lists provide information about the benefits of Teradata Meta Data Services to several user groups.

For application developers, Teradata MDS:
• Provides a persistent store for application metadata so that developers can concentrate on developing application functions.
• Allows the developer to manipulate metadata with the same techniques used to manipulate other data.
• Provides security (MDS controls the read and write access).
• Allows metadata to be shared between applications. This allows integration of tools such as ordered analytical functions and data mining tools.
• Allows application data to be modeled around Teradata Database metadata maintained by MDS. MDS maintains the metadata so that the application is kept current with warehouse database changes.

For the database administrator, Teradata MDS:
• Provides a common repository for Teradata Warehouse components.
• Provides a single shared copy of metadata, or a single version of the truth. One copy eliminates multiple islands of redundant metadata that can cause confusion and administrative difficulties.
• Provides the capabilities to browse through data in the repository and to drill down to see successive levels of detail.
• Shows interrelationships between different data definitions.
• Provides impact analysis of proposed changes.

For the business user, Teradata MDS:
• Provides the foundation for a “warehouse view” of enterprise computing.
• Allows business analysts to quickly determine where their data comes from, how it was changed, when it was last updated, and how the answer was determined. This greatly increases the value of the detail data and implicitly the value of the metadata.
• Supports third-party tools that can be used to import metadata into MDS for viewing.
• Supports a web browser that provides general reporting and search capabilities and shows strategic metadata relationships.

Creating the Teradata Meta Data Repository

The Teradata MDS repository is a set of tables that resides in the Teradata Database. You must use MDS program software to create these tables before metadata can be added, stored, or accessed.

Connecting to the Teradata Meta Data Repository

Each system running a Teradata MDS application must have the following:
• The appropriate Teradata ODBC driver
• An ODBC System Data Source Name (DSN) connection to the Teradata Database where the MDS repository resides

For More Information

For more information on the topics presented in this chapter, see the following Teradata Meta Data Services and Teradata Tools and Utilities books:

IF you want to learn more about Teradata Meta Data Services, THEN see:
• Teradata Meta Data Services Installation and Administrator Guide
• Teradata Meta Data Services Programmer Guide

IF you want to learn more about the Teradata ODBC driver, THEN see:
• Teradata ODBC Driver User Guide


Chapter 11: Other Database Objects

This chapter provides more information about a few of the database objects stored in the Teradata Database. Topics include:
• Views
• Stored Procedures
• Macros
• Triggers

What Are Views

View database objects are virtual tables that you can use (as if they were physical tables) to retrieve data. A view defines its columns from underlying views and/or tables. Views are integral to the Data Dictionary because view definitions are stored there. For more information about the role views play in the Data Dictionary, see “What Is in a View” and “Why Use Views” on page 9-6.

A view does not contain data and is not materialized until an SQL statement references it. Views are useful because they can simplify access to information in the Teradata Database.

SQL Statements Related to Views

The following list shows SQL statements that you can use to implement and change views:

Use CREATE VIEW to:
• name the view and the columns contained in the view
• define a SELECT on one or more columns from other tables and/or views

Use REPLACE VIEW to:
• alter the characteristics of an existing view

Restrictions on Using Views

You can use views as if they were tables in SELECT statements. Views are subject to some restrictions regarding the INSERT, UPDATE, MERGE, and DELETE statements. For more information, see “SQL Access to the Data Dictionary” on page 9-8.

What Are Teradata Stored Procedures

The stored procedure is a database object that is executed in Teradata Database server space. It is a combination of procedural control statements, SQL statements, and control declarations that provides a procedural interface to the Teradata Database.

Why Use Stored Procedures

Using stored procedures, you can build large and complex database applications. In addition to a set of SQL control statements and condition handling statements, a stored procedure can contain the following:
• Multiple input and output parameters
• Local variables and cursors
• SQL DDL, DCL, DML, and SELECT statements, including dynamic SQL, with a few exceptions

Dynamic SQL is a method of invoking an SQL statement by creating and submitting it at runtime from within a stored procedure.

Applications based on stored procedures provide the following benefits:
• They reduce network traffic in the client-server environment because stored procedures reside and execute on the server.
• They allow encapsulation and enforcement of business rules on the server, contributing to improved application maintenance.
• They provide better transaction control.
• They provide better security by granting the user access to the procedures rather than to the data tables.
• They provide an exception handling mechanism to handle the runtime conditions generated by the application.
• All the SQL and SQL control statements embedded in a stored procedure are executed by submitting one CALL statement. Nested CALL statements further extend the versatility.

Elements of a Teradata Stored Procedure

A Teradata stored procedure comprises some or all of the following elements:

This element: SQL control statements
Includes: nested or non-nested compound statements

This element: Control declarations
Includes:
• Condition handlers in DECLARE HANDLER statements for completion and exception conditions. Note: Condition handlers can be of CONTINUE or EXIT type, and can be defined for a specific SQLSTATE code, the generic exception condition SQLEXCEPTION, or the generic completion conditions NOT FOUND and SQLWARNING.
• Local variable declarations in DECLARE statements
• Cursor declarations in DECLARE CURSOR statements. Note: Cursors can be either updatable or read-only type. These can also be declared in FOR iteration statements.

This element: SQL transaction statements
Includes: DDL, DCL, DML, and SELECT statements, including dynamic SQL statements, with a few exceptions

This element: LOCKING modifiers
Includes: with all supported SQL statements except CALL

This element: bracketed and simple comments
Includes: Note: Nested bracketed comments are not allowed.

For more information, see “Teradata Stored Procedures as SQL Applications” on page 6-7.

What Are Macros

The macro database object consists of one or more SQL statements that can be executed by performing a single statement. Each time the macro is performed, one or more rows of data can be returned.

SQL Statements Related to Macros

The following list shows the basic SQL statements that you can use with macros:
• Use CREATE MACRO to incorporate a frequently used SQL statement or series of statements into a macro.
• Use EXECUTE to run a macro. Note: A macro can also contain an EXECUTE statement that executes another macro.
• Use DROP MACRO to delete a macro.

Single-User and Multi-User Macros

You can create a macro for your own use, or grant execution authorization to others. For example, your macro might enable a user in another department to perform operations on the data in the Teradata Database. When executing the macro, the user need not be aware of the database access, the tables affected, or even the results.

Macro Processing

Regardless of the number of statements in a macro, the Teradata Database treats it as a single request. When you execute a macro, the system processes either all of the SQL statements or none of them. If a macro fails, the system aborts it, backs out any updates, and returns the database to its original state.
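The all-or-nothing behavior described above can be imitated with an ordinary database transaction. The following Python sketch is an analogy only, not a Teradata macro: it uses the SQLite engine bundled with Python, and the run_as_single_request helper, the account table, and its CHECK constraint are hypothetical names introduced for illustration.

```python
import sqlite3


def run_as_single_request(conn, statements):
    """Run every statement or none of them; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            for sql in statements:
                conn.execute(sql)
    except sqlite3.Error:
        return False
    return True


def balances(conn):
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 100)")
conn.commit()

# The second statement violates the CHECK constraint, so the first
# statement's update is backed out as well.
first_ok = run_as_single_request(conn, [
    "UPDATE account SET balance = balance + 500 WHERE id = 2",
    "UPDATE account SET balance = balance - 500 WHERE id = 1",
])
balances_after_failure = balances(conn)

# Both statements succeed, so both updates are kept.
second_ok = run_as_single_request(conn, [
    "UPDATE account SET balance = balance + 50 WHERE id = 2",
    "UPDATE account SET balance = balance - 50 WHERE id = 1",
])
balances_after_success = balances(conn)
```

After the failed request the table is back in its original state, which mirrors how a failed macro leaves the database unchanged.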

What Are Triggers

The trigger defines events that happen when some other event, called a triggering event, occurs. This database object is essentially a stored SQL statement associated with a table called a subject table. Teradata has ensured that its trigger implementation complies with ANSI SQL3 specifications.

Triggers execute when any of the following modifies a specified column or columns in the subject table:
• DELETE
• INSERT
• UPDATE

Typically, the stored SQL statements perform a DELETE, INSERT, or UPDATE on a table different from the subject table.

Types of Triggers

Teradata Database supports two types of triggers:
• A statement trigger fires for each statement that modifies the subject table.
• A row trigger fires for each row modified in the subject table.

When Do Triggers Fire

You can specify when triggers fire:
• WHEN you specify BEFORE, THEN the triggered action executes before the completion of the triggering event. As specified in the ANSI SQL3 standard, a BEFORE trigger cannot have data-changing statements in the triggered action.
• WHEN you specify AFTER, THEN the triggered action executes after completion of the triggering event.

Sometimes a statement fires a trigger, which, in turn, fires another trigger. Thus the outcome of one triggering event can itself become another trigger. The Teradata Database processes and optimizes the triggered and triggering statements in parallel to maximize system performance.

ANSI-Specified Order

When you specify multiple triggers on a subject table, both BEFORE and AFTER triggers execute in the order in which they were created, as determined by the time stamp of each trigger. Triggers are sorted according to the preceding ANSI rule, unless you use the Teradata extension, ORDER. This extension allows you to specify the order in which the triggers execute, regardless of creation time stamp.

Trigger Functions

You can use triggers to perform various functions:
• Define a trigger on the subject table to ensure that UPDATEs and DELETEs performed on the parent table are propagated to another table.
• Use triggers for auditing. For example, you can define a trigger that causes INSERTs in a log table when an employee receives a raise higher than 10%.
• Use a trigger to disallow massive UPDATEs, INSERTs, or DELETEs during business hours.
• Use a trigger to set a threshold. For example, you can use triggers to set thresholds for inventory of each item by store, to create a purchase order when the inventory drops below a threshold, or to change a price if the daily volume does not meet expectations.
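The auditing use mentioned above (logging raises higher than 10%) can be sketched with an AFTER row trigger. The example below runs on the SQLite engine bundled with Python, so the trigger syntax is SQLite's rather than Teradata's (for instance, SQLite uses OLD and NEW directly instead of a REFERENCING clause); the employee and salary_log tables and the raise_audit trigger are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, salary REAL NOT NULL);
    CREATE TABLE salary_log (emp_no INTEGER, old_salary REAL, new_salary REAL);

    -- Row trigger on the subject table (employee): fires once per updated
    -- row, but only when the WHEN search condition is true.
    CREATE TRIGGER raise_audit
    AFTER UPDATE OF salary ON employee
    FOR EACH ROW
    WHEN NEW.salary > OLD.salary * 1.10
    BEGIN
        INSERT INTO salary_log VALUES (OLD.emp_no, OLD.salary, NEW.salary);
    END;

    INSERT INTO employee VALUES (1, 50000.0), (2, 50000.0);
""")

conn.execute("UPDATE employee SET salary = 52000.0 WHERE emp_no = 1")  # 4% raise
conn.execute("UPDATE employee SET salary = 60000.0 WHERE emp_no = 2")  # 20% raise

log_rows = conn.execute(
    "SELECT emp_no, old_salary, new_salary FROM salary_log"
).fetchall()
```

Only the second update satisfies the search condition, so log_rows contains a single entry for employee 2. The triggered statement writes to a table other than the subject table, which matches the typical pattern described earlier in this chapter.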

SQL Statements Related to Triggers

The following list shows the basic SQL statements that you can use with triggers:
• Use CREATE TRIGGER to create a trigger.
• Use REPLACE TRIGGER to change the definition of a trigger without dropping and recreating it.
• Use DROP TRIGGER to drop a trigger definition from a subject table.
• Use HELP TRIGGER to display the attributes of the specified trigger.
• Use SHOW TRIGGER to display the text used to create the trigger.
• Use ALTER TRIGGER to enable, disable, or modify the creation time stamp of a trigger. Note: ALTER TRIGGER is a Teradata extension that is not included in ANSI specifications.
• Use RENAME TRIGGER to change the name of a trigger.

Elements of a Trigger

The definition of a database trigger resides in the Data Dictionary. The definition contains some or all of the following elements:

Trigger name
The trigger name must be unique within a database, that is, a trigger and any other object in the database cannot have the same name.

Enabled/Disabled
When you disable a trigger, the definition still resides in the Data Dictionary. To enable the disabled trigger, you can execute an ALTER TRIGGER statement with the ENABLED option. Note: The ENABLE/DISABLE option is a Teradata extension to ANSI SQL3 triggers.

Table name
The name of the subject table must be the name of an existing base table, not a view, temporary table, join index, or hash index.

Trigger action time
The trigger fires based on whether you specify BEFORE or AFTER when you create the trigger:
• Use BEFORE to fire the trigger before the triggering statement executes.
• Use AFTER to fire the trigger after the triggering statement executes.

Triggering event
The event is identified by the statement type that causes the trigger to fire.
• IF the statement type is INSERT, THEN the triggering statement can be INSERT, INSERT/SELECT, Atomic Upsert, or MERGE INTO.
• IF the statement type is UPDATE, THEN the triggering statement can be UPDATE, Atomic Upsert, or MERGE INTO.
• IF the statement type is DELETE, THEN the triggering statement can be DELETE.

Column name list
The list contains the column names that appear in the subject table for an UPDATE trigger. The column list applies only when the triggering event is an UPDATE.

Order
When you define multiple triggers, you can specify the order in which the triggers execute. Order values are integers from 1 to 32767.

Transition Table and Transition Rows
The transition table is a temporary table comprising transition rows. The transition rows hold the old and new values for the rows that are modified by a data-modifying statement. The transition table is not stored in the Data Dictionary.

REFERENCING clause
The clause does the following:
• Allows the WHEN condition and triggered actions to reference a set of rows in the transition table
• Permits a row trigger to reference variables representing columns of the current row in the transition table

The rules for BEFORE and AFTER triggers are:
• AFTER statement triggers can reference transition tables only.
• AFTER row triggers can reference both transition rows and transition tables.
• BEFORE row triggers can reference transition rows only.

Triggered action
• You can specify trigger granularity as either ROW or STATEMENT.
• WHEN is the optional search condition.
• The database evaluates the search condition once for each execution of the triggering statement for a statement trigger, and once for each row of the transition table of changed rows for a row trigger.
• Cascading is not itself an element but derives from trigger definitions. Sometimes a statement fires a trigger, which, in turn, fires another trigger. Thus the outcome of one triggering event can itself become another trigger.
• AFTER row and AFTER statement triggers can cascade.
• Backward references to triggering statements are permitted in a chain of cascading triggers. In other words, recursive triggers are allowed.

Triggered SQL statement
Generally, triggered SQL statements comprise a single statement or a block of statements.

Restrictions on Triggers

The following list shows restrictions associated with using triggers:

Restriction: The FastLoad and MultiLoad utilities cannot load data into tables that have triggers defined.
Comment: You must disable triggers before running the FastLoad and MultiLoad utilities.

Restriction: A positioned (updatable cursor) UPDATE or DELETE is not allowed to fire a trigger.
Comment: You will receive an error message.

Restriction: You cannot define triggers together with join indexes or hash indexes on the same table.
Comment: N/A

Restriction: The limit for cascading triggers is 16.
Comment: You will receive an error message when a triggering statement causes the cascading level to exceed 16.

For More Information

For more information on the topics presented in this chapter, see the following Teradata Database books:

IF you want to learn more about Teradata stored procedures, THEN see:
• SQL Reference: Stored Procedures and Embedded SQL

IF you want to learn more about triggers, views, and macros, THEN see:
• Database Design
• SQL Reference: Data Definition Statements
• SQL Reference: Data Manipulation Statements
• SQL Reference: Fundamentals
• SQL Reference: Statement and Transaction Processing
• Security Administration


Section 3:

Teradata Database System Operation


Chapter 12: Normalization and Referential Integrity

This chapter reviews some concepts of the normalization process. The following topics are described in the chapter:
• Normal forms
• Referential integrity

Normalization

Normalization is the process of reducing a complex data structure into a simple, stable one. Generally this process involves removing redundant attributes, keys, and relationships from the conceptual data model.

Normal Forms

Normalization theory is constructed around the concept of normal forms that define a system of constraints. If a relation meets the constraints of a particular normal form, we say that relation is “in normal form.”

Think of normal forms as an onion, with the outermost layer being the set of all relations, including unnormalized relations. As you work your way to the core of the onion, you must pass through each lower normal form. As a result, a relation that has achieved fifth normal form has also achieved first, second, third, and fourth normal forms.

By definition, a relational database is always normalized to some degree, because the column values are always atomic. But to simply leave it at that invites a number of problems, including redundancy and potential update anomalies. The higher normal forms were developed to correct those problems. The following figure illustrates the layers of normalization.

[Figure: nested layers of normalization, from outermost to innermost: all relations, 1NF relations, 2NF relations, 3NF relations, BCNF relations, 4NF relations, 5NF relations]

Relational Database Terminology

The list below defines some important terms that will help you understand the discussion of normal forms:

Primary key
A unique identifier for a relation. Set theory (and relational database theory) does not allow duplicate rows for a relation with a primary key. However, commercially available relational databases often allow duplicate rows in relations. In those cases, the relation does not have a primary key. The Teradata Database permits enforcement of the no duplicates rule even when no primary key is specified.

Candidate key
One of multiple unique identifiers for a relation. Any relation might have multiple unique identifiers. A candidate key must satisfy the properties of uniqueness and minimality. That is, for any attribute, no two rows of the table may have the same value for that attribute, and if it is composite, no component can be eliminated without destroying the uniqueness property.

Alternate key
Any candidate key not chosen as the primary key.

Foreign key
A primary key in another relation that is also a column value in the current relation. Foreign keys are used to join tables and may be part of the primary key.

Functional dependence
Attribute X is functionally dependent on attribute Y if and only if each Y value in the relation has associated with it exactly one X value.

Full functional dependence
Attribute X is fully functionally dependent on attribute Y if and only if it is functionally dependent on Y and not functionally dependent on any proper subset of Y.

Transitive dependence
A state in which an attribute is fully functionally dependent but by means of an intermediate attribute. Transitive dependence is a state that normalization strives to eliminate.

Determinant
Any attribute on which some other attribute is fully functionally dependent.

Multivalued dependence
Given a relation with attributes X, Y, and Z, multivalued dependence holds if and only if the set of Y-values matching a given (X-value, Z-value) pair depends only on the X-value and is independent of the Z-value.

Join
An operation in which data is retrieved from more than one table.

Join dependency
A relation satisfies join dependency if and only if it is equal to the join of its projections on its component attributes.

Constraint
A well-defined physical restriction that can be defined for a table or a column.


First, Second, and Third Normal Forms

This section describes the first three normal forms, including what they are, why we need them, and how to achieve them. These first three normal forms are stepping stones to the Boyce-Codd normal form and, when appropriate, the higher normal forms. The next section discusses Boyce-Codd normal form (BCNF) and the higher normal forms.

First Normal Form

First normal form (1NF) is the defining property of a relational database. If we are to consider a database relational, then all relations in the database must be in 1NF.

We say a relation is in 1NF if all fields within that relation (simple domains in mathematics) are atomic. This means that a field can contain one and only one value. We sometimes refer to this concept as the elimination of repeating groups from a relation. Furthermore, first normal form allows no hierarchies of data values.

The formal definition of first normal form is as follows:

For a relation to be in 1NF, the relationship between the primary key of the relation and each of the other attributes must be one-to-one (in that direction). In other words, all underlying simple domains of the relation may contain atomic values only.

In this way, the non-key attributes are functionally dependent on the key.

Note: A non-key attribute is any attribute that is not part of the primary key for the relation.

Second Normal Form

Second normal form (2NF) deals with the elimination of partial dependencies from a relation, that is, dependencies of a non-key attribute on only part of a composite primary key. We say a relation is in 2NF if it is in 1NF and if every non-key attribute is fully dependent on the entire primary key.

The formal definition of second normal form is as follows:

For a relation to be in 2NF, the relationship between any portion of the primary key of a relation and each of the other columns must not be one-to-one (in that direction). In other words, the non-key columns are fully functionally dependent on the primary key.
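The 2NF rule can be tried out on sample data. The following Python sketch is an illustration only: the relation, the column names (order_no, part_no, quantity, part_name), and the helper functions are hypothetical and are not part of any Teradata tool. It tests whether a functional dependence holds in a set of rows, then lists every non-key attribute that depends on only a portion of a composite key. A check against sample rows can only show that the data is consistent with a dependence; whether the dependence is really a rule of the business is a modeling decision.

```python
from itertools import combinations


def holds(rows, determinant, dependent):
    """Return True if `dependent` is functionally dependent on `determinant`.

    rows is a list of dicts; determinant and dependent are tuples of column names.
    """
    seen = {}
    for row in rows:
        x = tuple(row[c] for c in determinant)
        y = tuple(row[c] for c in dependent)
        # Each determinant value may be associated with exactly one dependent value.
        if seen.setdefault(x, y) != y:
            return False
    return True


def partial_dependencies(rows, key, non_key):
    """List (key_subset, attribute) pairs that violate 2NF in this sample."""
    violations = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for attr in non_key:
                if holds(rows, subset, (attr,)):
                    violations.append((subset, attr))
    return violations


# Hypothetical order-line relation with the composite key (order_no, part_no).
order_part = [
    {"order_no": 1, "part_no": 1, "quantity": 110, "part_name": "bolt"},
    {"order_no": 1, "part_no": 2, "quantity": 275, "part_name": "nut"},
    {"order_no": 2, "part_no": 1, "quantity": 152, "part_name": "bolt"},
]

violations = partial_dependencies(
    order_part, ("order_no", "part_no"), ("quantity", "part_name")
)
print(violations)  # [(('part_no',), 'part_name')]
```

In this sample, quantity depends on the whole key, but part_name depends on part_no alone, so the relation is not in 2NF. Moving part_name to a separate relation keyed by part_no removes the partial dependence.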


Third Normal Form

Third normal form (3NF) deals with the elimination of non-key attributes that do not describe the primary key.

The formal definition of third normal form is as follows:

For a relation to be in 3NF, the relationship between any two non-primary key columns or groups of columns in a relation must not be one-to-one in either direction. In other words, the non-key columns are non-transitively dependent upon each other and the key.

No transitive dependencies implies no mutual dependencies. We say attributes are mutually independent if none of them is functionally dependent on any combination of the others. This mutual independence ensures that we can update individual attributes without any danger of affecting any other attribute in a row.

Advantages of Normalization

The following list summarizes the advantages of implementing a normalized logical model in 3NF:

• Greater number of relations
• More primary index choices
• Optimal distribution of data
• Fewer full table scans
• More joins possible


Boyce-Codd Normal Form and Higher Normal Forms

When the relational model of database management was originally proposed, it addressed only the first three normal forms. Later work with the model showed that 3NF required further refinement to ensure that update anomalies would never occur. This section describes Boyce-Codd normal form and briefly mentions fourth and fifth normal forms for completeness.

Boyce-Codd Normal Form

Third normal form (3NF) does not handle situations in which a relation has multiple composite candidate keys with overlapping attributes. To eliminate these problems, Codd developed the so-called Boyce-Codd normal form (BCNF), which reduces to 3NF whenever the special situation that defines this problem does not apply.

A relation is in BCNF if and only if every determinant is a candidate key. This means that the only determinants are candidate keys.

Fourth Normal Form

We say a relation is in fourth normal form (4NF) if and only if, whenever a multivalued dependency exists in the relation (for example, say X multiply determines Y), then all attributes of the relation are also functionally dependent on X. In practice, we rarely see the need for 4NF.

Fifth Normal Form

So far it has been possible to normalize a relation by decomposing it into two of its projections. On rare occasions, decomposing a non-normal relation into two projections is not sufficient. In these rare instances, we use fifth normal form (5NF) to decompose the unnormalized relation into three or more projections of the original relation.

We say a relation is in fifth normal form (5NF, sometimes called projection-join normal form, or PJ/NF) if and only if every join dependency in the relation is a consequence of the candidate keys of the relation. This makes 5NF the final normal form that can be achieved by taking projections and using joins. It is guaranteed to be free of all anomalies that can be removed by taking projections, but not necessarily of all possible anomalies.
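A join dependency can be checked mechanically on sample data: project the relation onto each component, join the projections, and compare the result with the original rows. The following Python sketch is a minimal illustration under assumptions: the supplier/part/project relation, its column names (s, p, j), and the helper functions are all hypothetical. It shows a relation that cannot be rebuilt from two projections but can be rebuilt from three.

```python
def project(rows, columns):
    """Projection: keep only `columns` and drop duplicate rows."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result


def natural_join(left, right):
    """Natural join of two lists of dicts on their shared column names."""
    if not left or not right:
        return []
    common = [c for c in left[0] if c in right[0]]
    joined = []
    for l_row in left:
        for r_row in right:
            if all(l_row[c] == r_row[c] for c in common):
                joined.append({**l_row, **r_row})
    return joined


def as_set(rows, columns):
    """Order-independent view of a relation, for comparison."""
    return {tuple(row[c] for c in columns) for row in rows}


def satisfies_join_dependency(rows, components):
    """True if `rows` equals the join of its projections on `components`."""
    columns = sorted({c for comp in components for c in comp})
    joined = project(rows, components[0])
    for comp in components[1:]:
        joined = natural_join(joined, project(rows, comp))
    return as_set(joined, columns) == as_set(rows, columns)


# Hypothetical relation: supplier s supplies part p to project j.
spj = [
    {"s": "S1", "p": "P1", "j": "J2"},
    {"s": "S1", "p": "P2", "j": "J1"},
    {"s": "S2", "p": "P1", "j": "J1"},
    {"s": "S1", "p": "P1", "j": "J1"},
]

two_way = satisfies_join_dependency(spj, [("s", "p"), ("p", "j")])
three_way = satisfies_join_dependency(spj, [("s", "p"), ("p", "j"), ("j", "s")])
print(two_way, three_way)  # False True
```

Joining only the (s, p) and (p, j) projections produces a spurious row (S2, P1, J2), so the two-way decomposition loses information. Adding the third projection (j, s) filters that row out again, which is the situation 5NF is designed to handle.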


Referential Integrity

Traditional referential integrity is the concept of relationships between tables, based on the definition of a primary key and a foreign key. The concept states that a row cannot exist in a table with a non-null value for a referencing column if an equal value does not exist in a referenced column.

Using referential integrity, you can specify columns within a referencing table that are foreign keys for columns in some other referenced table. You must define referenced columns as either primary key columns or unique columns.

Referential integrity is a reliable mechanism that prevents accidental database inconsistencies when you perform INSERTs, UPDATEs, and DELETEs.

Referential Integrity in the Teradata Database

To implement referential integrity in the Teradata Database, you have three choices:

• Use the referential integrity constraint checks supplied by the database software
• Write your own, site-specific macros, triggers, or stored procedures to enforce referential integrity
• Enforce constraints through application code

For information about bypassing the standard referential constraint checks, see “Referential Constraints” on page 12-11.

Referential Integrity Terminology

We use the following terms to explain the referential integrity concept:

Term | Definition
Parent Table | The table referred to by a Child table. Also called the "referenced table."
Child Table | A table in which the referential constraints are defined. Also called the "referencing table."
Parent Key | A primary or secondary key in the parent table.
Primary Key | With respect to referential integrity, a primary key is a parent table column set that is referred to by a foreign key column set in a child table.
Foreign Key | With respect to referential integrity, a foreign key is a child table column set that refers to a primary key column set in a parent table.


Referencing (Child) Table

We call the referencing table the Child table, and we call the specified Child table columns the referencing columns. Referencing columns should be of the same number and have the same data type as the referenced table key.

Referenced (Parent) Table

A Child table must have a parent table, and the referenced table is referred to as the Parent table. The parent key columns are the referenced columns.

Why Is Referential Integrity Important

Referential integrity is important because it keeps you from introducing errors into your database. Suppose you have a table like the following:

ORDER PART

Order Number | Part Number | Quantity
PK, FK | PK, FK | Not Null
1 | 1 | 110
1 | 2 | 275
2 | 1 | 152

Part number and order number, each foreign keys in this relation, also form the composite primary key.

Suppose you were to delete the row defined by the primary key value 1 in the PART NUMBER table. The foreign key for the first and third rows in the ORDER PART table would now be inconsistent, because there would be no row in the PART NUMBER table with a primary key of 1 to support it. Such a situation shows a loss of referential integrity.

Teradata provides referential integrity to prevent this from happening. If you have specified referential integrity and you try to delete a row from the PART NUMBER table that is still referenced by the ORDER PART table, the database management system will not allow you to remove the row.
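The scenario above can be reproduced in miniature. The following Python sketch uses the SQLite engine from the Python standard library as a stand-in, not the Teradata Database, so the SQL syntax, the error text, and the table and column names (part, order_part, and so on) are assumptions made for illustration. The foreign key on order number is omitted to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE part (part_number INTEGER PRIMARY KEY);
    CREATE TABLE order_part (
        order_number INTEGER NOT NULL,
        part_number  INTEGER NOT NULL REFERENCES part (part_number),
        quantity     INTEGER NOT NULL,
        PRIMARY KEY (order_number, part_number)
    );
    INSERT INTO part VALUES (1), (2);
    INSERT INTO order_part VALUES (1, 1, 110), (1, 2, 275), (2, 1, 152);
""")


def try_sql(statement):
    """Run one statement; return None on success or the error message."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(statement)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


# Deleting a parent row that is still referenced is rejected.
delete_error = try_sql("DELETE FROM part WHERE part_number = 1")
# Inserting a child row with no matching parent key is rejected as well.
insert_error = try_sql("INSERT INTO order_part VALUES (3, 9, 5)")
remaining_parts = conn.execute("SELECT COUNT(*) FROM part").fetchone()[0]

print(delete_error)
print(insert_error)
print(remaining_parts)  # 2
```

Both statements fail with an integrity error, and both part rows are still present afterward, so the first and third ORDER PART rows keep a valid parent.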


Besides data integrity and data consistency, referential integrity provides these benefits:

Benefit | Description
Increases development productivity | You do not need to code SQL statements to enforce referential integrity constraints, because the Teradata Database automatically enforces referential integrity.
Requires fewer written programs | All update activities are programmed to ensure that referential integrity constraints are not violated, because the Teradata Database enforces referential integrity in all environments. Additional programs are not required.


Referential Integrity Constraints

The combination of the foreign key, the parent key, and the relationship between the two is called the referential integrity constraint. The table containing the parent key is called the parent table, and the table with the foreign key is called the child table.

Teradata provides two other features related to referential integrity constraints:

• Referential constraints
• Batch referential integrity constraints

The following table summarizes the basic differences among these referential constraint types:

Referential Constraint Type | Does This Type Enforce Referential Integrity? | Level of Referential Integrity Enforcement
Referential constraint | No | None
Batch referential integrity constraint | Yes | Transaction
Referential integrity constraint | Yes | Row

Referential Constraints

The referential constraint is a mechanism that allows you to specify a type of constraint that is not enforced by the Teradata Database. This capability avoids the database overhead of enforcing the referential integrity, but at the same time, the optimizer can use the constraint information.

The ability to specify referential constraints, using the CREATE TABLE and ALTER TABLE statements, is particularly helpful in eliminating redundant joins based on parent key and foreign key relationships.

Successful use of referential constraints depends heavily upon your knowledge of the database. To avoid the introduction of inconsistencies, you may choose to use another mechanism to enforce database integrity.

Batch Referential Integrity

Teradata offers batch referential integrity as a middle ground between traditional referential integrity and referential constraints. Batch referential integrity is a reliable mechanism that prevents accidental database inconsistencies when you perform INSERTs, UPDATEs, and DELETEs.

You can use the WITH CHECK OPTION clause to specify batch referential integrity in CREATE TABLE and ALTER TABLE statements. When you specify the WITH CHECK OPTION, the database enforces the referential integrity constraint as all or nothing. This means that all child rows must have a match in the parent table; otherwise, the database aborts the ALTER TABLE, INSERT, DELETE, or UPDATE transaction.

If you specify the WITH NO CHECK OPTION clause in CREATE TABLE and ALTER TABLE statements, the database does not enforce constraints. You should use extreme care when manipulating data within a NO CHECK environment. NO CHECK means that a row having a non-null value in a foreign key column is allowed to exist in a child table when an equal value does not exist in the parent key or alternate key column of the parent table. Operations such as INSERT, DELETE, or UPDATE that cannot be performed on tables that have WITH CHECK OPTION specified are allowed on NO CHECK tables. Data in the parent tables of these relationships can be deleted or corrupted. Depending on the operation, the database may not give a warning if such an error occurs.

Batch referential integrity is less expensive to enforce than standard referential integrity for transactions affecting multiple rows, because the database handles batch referential integrity on a transaction basis rather than on a row-by-row basis.
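The difference between row-by-row checking and all-or-nothing batch checking can be pictured with a toy model. The following Python sketch is not how the Teradata Database implements either mechanism; the function names, the part_number column, and the sample rows are hypothetical. It only contrasts one parent lookup per row with a single set comparison for the whole request, after which either every row is accepted or none is.

```python
def check_row_by_row(parent_keys, child_rows):
    """Row-level check: one parent lookup per row, stopping at the first violation."""
    lookups = 0
    for row in child_rows:
        fk = row["part_number"]
        if fk is None:  # a null foreign key is always allowed
            continue
        lookups += 1
        if fk not in parent_keys:
            return False, lookups
    return True, lookups


def check_batch(parent_keys, child_rows):
    """Batch check: one set comparison covering the whole request."""
    foreign_keys = {r["part_number"] for r in child_rows if r["part_number"] is not None}
    orphans = foreign_keys - set(parent_keys)
    return not orphans, orphans


def insert_batch(parent_keys, child_table, new_rows):
    """All or nothing: accept every row, or reject the whole request."""
    ok, orphans = check_batch(parent_keys, new_rows)
    if ok:
        child_table.extend(new_rows)
    return ok, orphans


parent_keys = {1, 2}
child_table = []
bad_request = [
    {"order_number": 3, "part_number": 1},
    {"order_number": 3, "part_number": 9},  # no such parent key
    {"order_number": 4, "part_number": None},
]
good_request = [r for r in bad_request if r["part_number"] != 9]

bad_result = insert_batch(parent_keys, child_table, bad_request)
rows_after_bad = len(child_table)
good_result = insert_batch(parent_keys, child_table, good_request)

print(bad_result, rows_after_bad)     # (False, {9}) 0
print(good_result, len(child_table))  # (True, set()) 2
```

The rejected request leaves the child table unchanged, even though its first row on its own would have been valid.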

Rules for Referential Integrity Constraints

Referential integrity constraints must meet the following criteria:

To implement referential integrity… | Must…
The parent key columns | exist when the referential constraint is defined; be either a unique primary index (UPI) or a unique secondary index (USI) and not null.
The foreign and parent key | have the same number of columns, and their data types must match; not exceed 64 columns; not be dropped or altered with the ALTER TABLE statement after you have defined a referential integrity constraint on them. To use ALTER TABLE to drop a foreign or parent key after a referential integrity constraint has been defined, first drop the referential constraint and then use ALTER TABLE to drop the foreign or parent key columns.
Foreign key | be equal to the parent key, or it must be null.
When the parent and child tables are the same table, a condition called self-reference, the foreign key and parent keys | not consist of identical columns.
Referential constraints | not be duplicated.
The number of referential constraints defined per table | not exceed 64.

Referential Constraint Checks

The Teradata Database performs referential constraint checks whenever you do any of the following:

• Add a referential constraint to a populated table
• Insert, delete, or update a row
• Modify a parent or foreign key, for example using ALTER TABLE

The following table summarizes how the Teradata Database enforces referential constraint checks:

WHEN performing… | The Teradata Database…
an INSERT into the parent table | does nothing.
an INSERT into the child table | ensures that the parent key contains a matching value if the foreign key is not null.
a DELETE from the parent table | aborts the request if the deleted parent key is referenced by any foreign key.
a DELETE from the child table | does nothing.
an UPDATE of the parent table | aborts the request if the parent key is referenced by any foreign key.
an UPDATE of the child table | ensures that the new value matches the parent key when the foreign key is updated.


For More Information

For more information on the topics presented in this chapter, see the following Teradata Database book:

IF you want to learn more about… | THEN see…
Normalization, referential integrity, or the relational model of database management | Database Design


Chapter 13:

Data Communication Between Client and Teradata Database

This chapter describes various ways client applications can communicate with the Teradata Database. Teradata provides the Call Level Interface (CLI), which supplies the service routines needed by applications. In addition to CLI, Teradata supports other industry-standard communications protocols.

Topics in this chapter include:

• How clients attach to Teradata Database
• CLI for channel- and network-attached clients
• Other standard communications methods


Attachment Methods

Clients can connect to the Teradata Database using one of the following methods:

• Channel attached through an IBM mainframe
• Network attached through a Local Area Network (LAN)

Client applications that manipulate data on the Teradata Database server communicate with the database indirectly by means of communications interfaces:

• Call Level Interface Version 2 (CLIv2) for channel-attached systems
• Call Level Interface Version 2 (CLIv2) for network-attached systems

Both versions provide the same functions. The CLIv2 is a library of service routines that act as subroutines of the application. The modules in the CLIv2 library vary based on whether the client is channel- or network-attached.

Other types of communications interfaces are available, including interfaces for systems running Microsoft Windows 2000 and interfaces for systems running NCR UNIX MP-RAS. The interfaces include:

• Windows Call Level Interface (WinCLI) (Windows-based systems)
• Open Database Connectivity (ODBC) (Windows and UNIX MP-RAS-based systems)
• Java Database Connectivity (JDBC) (Windows and UNIX MP-RAS-based systems)

The data communications interfaces are discussed in the following sections.


CLIv2 for Channel-Attached Systems

CLIv2 is a collection of callable service routines that provide the interface between applications and the Teradata Director Program (TDP) on an IBM mainframe client. TDP is the interface between CLIv2 and the Teradata Database server.

CLIv2 can operate with all versions of IBM operating systems, including Multiple Virtual Storage (MVS), OS/390, Customer Information Control System (CICS), Information Management System (IMS), and Virtual Machine (VM).

What CLIv2 for Channel-Attached Clients Does

By way of TDP, CLIv2 sends requests to the server and provides the application with a response returned from the server by way of TDP. CLIv2 provides support for:

• Managing multiple serially-executed requests in a session
• Managing multiple simultaneous sessions to the same or different servers
• Using cooperative processing so that the application can perform operations on the client and the server at the same time
• Communicating with two-phase commit coordinators for CICS and IMS transactions
• Generally insulating the application from the details of communicating with a server

Teradata Director Program

TDP manages communications between CLIv2 and a server. The program executes on the same mainframe as CLIv2, but runs as a different job or virtual machine.

An individual TDP is associated with one logical server. Note, however, that any number of TDPs may operate and be accessed by CLIv2 simultaneously on the same mainframe. The application refers to each TDP by an identifier called the TDPid (TDP2, for example) that is unique within a mainframe.

Functions of TDP include the following:

• Session initiation and termination
• Logging, verification, recovery, and restart
• Physical input to and output from the server, including session balancing and queue maintenance
• Security


Server

A server implements the actual relational database that processes requests received from CLIv2 by way of TDP.

The following figure illustrates the logical structure of the client-server interface.

[Figure 1091B004: an application program exchanges requests and responses with CLIv2, which communicates through three TDPs, each connected to its own Teradata Database server.]


CLIv2 for Network-Attached Systems

CLIv2 is a collection of callable service routines that provide the interface between applications on a LAN-connected client and the Teradata Database server.

What CLIv2 for Network-Attached Clients Does

CLI is the interface between the application program and the Micro Teradata Director Program (MTDP). CLIv2 can:

• Build parcels that MTDP packages for sending to the Teradata Database using the Micro Operating System Interface (MOSI)
• Provide the application with a pointer to each of the parcels returned from the Teradata Database

Micro Teradata Director Program

The MTDP must be linked to applications that will be network-connected to the Teradata Database. The MTDP performs many of the same functions as the channel-based TDP, including:

• Session initiation and termination
• Physical input to and output from the server
• Logging, verification, recovery, and restart

Unlike TDP, MTDP does not control session balancing.

Micro Operating System Interface

MTDP is the interface between CLI and MOSI. MOSI is a library of service routines that provides operating system independence among the clients that access the Teradata Database. By implementing the MOSI, only one version of MTDP is required to run on all network-connected platforms.


These modules and the relationships among them are illustrated in the following figure:

[Figure 1091B005: an application program exchanges requests and responses with CLI, which passes them through MTDP and then MOSI to the Teradata Database server.]


Other Types of Data Communications

Other types of communications interfaces are available for systems running Windows 2000 or UNIX MP-RAS.

WinCLI

WinCLI is a call-level interface for MS-DOS and Windows-based applications. CLI routines are provided as object modules that have been compiled or assembled according to standard linkage conventions. WinCLI uses the Dynamic Data Exchange (DDE) protocol to communicate with application programs.

ODBC

The Open Database Connectivity (ODBC) Driver for the Teradata Database provides an alternate interface to Teradata Databases using the industry-standard ODBC Application Programming Interface (API). The ODBC Driver for the Teradata Database provides Core-level SQL and Extension-level 1 (with some Extension-level 2) function call capability using the Windows Sockets (WinSock) Transmission Control Protocol/Internet Protocol (TCP/IP) communications software interface. ODBC operates independently of CLI and WinCLI.

JDBC

Teradata developed the Teradata JDBC Driver, which enables you to access the Teradata Database using the Java language. Java Database Connectivity (JDBC) is a specification for an API. The API allows platform-independent Java applications to access database management systems using SQL. The JDBC API provides a standard set of interfaces for:

• Opening connections to databases
• Executing SQL statements
• Processing results

The driver is a set of Java classes that use the TCP/IP communications software to connect to the Teradata JDBC Gateway, which is constantly listening on the network port for connection requests. For each gateway connection, a new session is created. The Java program can select different gateways by using different URLs.

All JDBC function requests are routed to the gateway, which in turn accesses the Teradata Database using Teradata CLIv2. More than one gateway can run on the same host if the gateways are configured to use different network ports.


For More Information

For more information on the topics presented in this chapter, see the following Teradata Tools and Utilities books:

IF you want to learn more about… | THEN see…
Call-Level Interface programming | Teradata Call-Level Interface Version 2 Reference for Channel-Attached Systems; Teradata Call-Level Interface Version 2 Reference for Network-Attached Systems
JDBC | Teradata Driver for the JDBC Interface User Guide
ODBC | Teradata ODBC Driver User Guide
Teradata Director Program | Teradata Director Program Reference
WinCLI | Teradata Call-Level Interface Version 2 Developers Kit for Microsoft Windows


Chapter 14:

Reliability

The Teradata Database addresses the critical requirements of reliability, availability, serviceability, usability, and installability (RASUI) by combining the following elements:

• Multiple microprocessors in a Symmetric Multiprocessing (SMP) arrangement
• RAID disk storage technology
• Protection of the Teradata Database from operating anomalies of the client platform

Both hardware and software provide fault tolerance, some of which is mandatory and some of which is optional.

Topics include:

• Software fault tolerance
• Hardware fault tolerance


Software Fault Tolerance

This section explains the following Teradata Database facilities for software fault tolerance:

• Vproc migration
• Fallback tables
• AMP clusters
• Journaling
• Archive/Recovery
• Table Rebuild utility

Vproc Migration

Because the Parsing Engine (PE) and Access Module Processor (AMP) are vprocs and therefore software entities, they can migrate from their home node to another node within the same hardware clique if the home node fails for any reason. Although the system normally determines which vprocs migrate to which nodes, a user can configure preferred migratory destinations.

Vproc migration permits the system to function completely during a node failure, with some degradation of performance due to the non-functional hardware.


The following figure illustrates vproc migration, where the large X indicates a failed node, and arrows pointing to nodes still running indicate the migration of AMP3, AMP4, and PE2.

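[Figure GG01A027: two panels, Normal and Recovery, each showing three nodes attached to a disk array. In the Normal panel the nodes run PE1, AMP1, AMP2; PE2, AMP3, AMP4; and PE3, AMP5, AMP6. In the Recovery panel the node that ran PE2, AMP3, and AMP4 has failed, and those vprocs run on the two surviving nodes.]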

Note: PEs for channel-attached connections cannot migrate during a node failure, because they depend on the channel hardware physically attached to their node.

Fallback Tables

A fallback table is a duplicate copy of a primary table. Each fallback row in a fallback table is stored on an AMP different from the one to which the primary row hashes. This storage technique maintains availability should the system lose an AMP and its associated disk storage in a cluster. In that event, the system would access data in the fallback rows.

The disadvantage of fallback is that this method doubles the storage space and the I/O (on INSERTs, UPDATEs, and DELETEs) for tables. The advantage is that data is almost never unavailable because of one down AMP. Data is fully available during an AMP or disk outage, and recovery is automatic after repairs have been made.


The Teradata Database permits the definition of fallback for individual tables. As a general rule, you should run all tables critical to your enterprise in fallback mode. You can run other, non-critical tables in non-fallback mode in order to maximize resource usage. Even though RAID disk array technology may provide access to data even when you have not specified fallback, neither RAID1 nor RAID5 provides the same level of protection as fallback does. You specify whether a table is fallback or not using the CREATE TABLE (or ALTER TABLE) statement. The default is not to create tables with fallback.

AMP Clusters

A cluster is a group of 2 to 16 AMPs that provide fallback capability for each other. A copy of each row is stored on a separate AMP in the same cluster. In a large system, you would probably create many AMP clusters. However, whether the system is large or small, the concept of a cluster exists even if all the AMPs are in one cluster.

One-Cluster Configuration

Pictures best explain AMP clustering. The following figure illustrates a situation in which fallback is present with one cluster, which is essentially an unclustered system.

[Figure FG10A001: row numbers held in the primary and fallback copy areas of each AMP]

AMP | Primary copy area | Fallback copy area
AMP1 | 1, 9, 17 | 21, 22, 15
AMP2 | 2, 10, 18 | 1, 23, 8
AMP3 | 3, 11, 19 | 9, 2, 16
AMP4 | 4, 12, 20 | 17, 10, 3
AMP5 | 5, 13, 21 | 18, 11, 4
AMP6 | 6, 14, 22 | 19, 12, 24
AMP7 | 7, 15, 23 | 20, 5, 6
AMP8 | 8, 16, 24 | 13, 14, 7

Note that the fallback copy of any row is always located on an AMP different from the AMP which holds the primary copy. This is an entry-level fault tolerance strategy. In this example which shows only a few rows, the data on AMP3 is fallback protected on AMPs 4, 5, and 6. However, in practice, some of the data on AMP3 would be fallback protected on each of the other AMPs in the system. The system becomes unavailable if two AMPs in a cluster go down.


Smaller Cluster Configuration

The following figure illustrates smaller clusters. Decreasing cluster size reduces the likelihood that two AMP failures will occur in the same cluster. The illustration shows the same 8-AMP configuration now partitioned into two AMP clusters of four AMPs each.

[Figure FG10A002: row numbers held in the primary and fallback copy areas of each AMP, by cluster]

Cluster | AMP | Primary copy area | Fallback copy area
A | AMP1 | 1, 9, 17 | 2, 3, 4
A | AMP2 | 2, 10, 18 | 1, 11, 12
A | AMP3 | 3, 11, 19 | 9, 10, 20
A | AMP4 | 4, 12, 20 | 17, 18, 19
B | AMP5 | 5, 13, 21 | 6, 7, 8
B | AMP6 | 6, 14, 22 | 5, 15, 16
B | AMP7 | 7, 15, 23 | 13, 14, 24
B | AMP8 | 8, 16, 24 | 21, 22, 23

Compare this clustered configuration with the earlier illustration of an unclustered AMP configuration. In the example, the (primary) data on AMP3 is backed up on AMPs 1, 2, and 4 and the data on AMP6 is backed up on AMPs 5, 7, and 8. If AMPs 3 and 6 fail at the same time, the system continues to function normally. Only if two failures occur within the same cluster does the system halt. Performance is the primary factor that determines cluster size. While 2-AMP clusters provide maximum protection against system loss, because the likelihood of both AMPs in a cluster going down simultaneously is very small, this configuration also suffers from a higher workload per AMP in the event of a failure. Typically, a cluster size is four to eight AMPs. For most applications, a cluster size of four provides a good balance between data availability and system performance.
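The availability rule and the workload trade-off described above can be expressed in a few lines. The following Python sketch is a simplified model, not Teradata code: the function names are hypothetical, and the workload figure assumes that the rows of a failed AMP are spread evenly over the remaining AMPs in its cluster.

```python
def system_available(clusters, failed_amps):
    """The system halts only if two or more AMPs fail within the same cluster."""
    failed = set(failed_amps)
    return all(len(failed & set(cluster)) < 2 for cluster in clusters)


def extra_load_per_survivor(cluster_size):
    """Fraction of extra work each surviving AMP absorbs after one failure.

    Assumes the failed AMP's rows are spread evenly over the rest of its cluster.
    """
    return 1 / (cluster_size - 1)


one_cluster = [[1, 2, 3, 4, 5, 6, 7, 8]]
two_clusters = [[1, 2, 3, 4], [5, 6, 7, 8]]

print(system_available(one_cluster, [3, 6]))   # False
print(system_available(two_clusters, [3, 6]))  # True
print(system_available(two_clusters, [3, 4]))  # False

for size in (2, 4, 8):
    print(size, round(extra_load_per_survivor(size) * 100))
```

Under this assumption, the single survivor in a 2-AMP cluster takes on the entire workload of its failed partner, while each survivor in a 4-AMP cluster takes on about one third of it, which is the balance between protection and performance that the text describes.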


Journaling

The Teradata Database supports tables that are devoted to journaling. A journal is a record of some kind of activity. The Teradata Database supports several kinds of journaling. The system does some journaling on its own, while you can specify whether to perform other journaling.

The following table explains the journaling capabilities of the Teradata Database:

This type of journal: Down AMP recovery
Does the following:
• Is active during an AMP failure only
• Journals fallback tables only
• Is used to recover the AMP after the AMP is repaired, then is discarded
And occurs: always.

This type of journal: Transient
Does the following:
• Logs BEFORE images for all transactions
• Is used by the system to roll back failed transactions aborted either by the user or by the system
• Captures:
  – Begin/End Transaction indicators
  – "Before" row images for UPDATE and DELETE statements
  – Row IDs for INSERT statements
  – Control records for CREATE and DROP statements
• Keeps each image on the same AMP as the row it describes
• Discards images when the transaction or rollback completes
And occurs: always.

This type of journal: Permanent
Does the following:
• Is active continuously
• Is available for tables or databases
• Can contain "before" images, which permit rollback, or "after" images, which permit rollforward, or both before and after images
• Provides rollforward recovery
• Provides rollback recovery
• Provides full recovery of nonfallback tables
• Reduces need for frequent, full-table archives
And occurs: as specified by the user.
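The way a transient journal uses before images can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Teradata implementation: the table is a plain dictionary, one journal serves one transaction, and the class name `TransientJournal` and its methods are hypothetical.

```python
class TransientJournal:
    """Toy before-image journal for one transaction on a dict-based table."""

    _MISSING = object()  # marks "row did not exist before the change"

    def __init__(self, table):
        self.table = table
        self.before_images = []  # (key, previous value) in order of change

    def write(self, key, value):
        # Log the before image first, then apply the change.
        self.before_images.append((key, self.table.get(key, self._MISSING)))
        self.table[key] = value

    def delete(self, key):
        self.before_images.append((key, self.table.get(key, self._MISSING)))
        self.table.pop(key, None)

    def commit(self):
        # Images are discarded once the transaction completes.
        self.before_images.clear()

    def rollback(self):
        # Undo the changes in reverse order, then the images are gone.
        while self.before_images:
            key, old = self.before_images.pop()
            if old is self._MISSING:
                self.table.pop(key, None)
            else:
                self.table[key] = old
```

Restoring images in reverse order matters when one transaction changes the same row more than once: the oldest before image must be applied last so that the row returns to its state before the transaction began.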


Teradata Archive/Recovery

The Teradata Archive/Recovery utility backs up and restores data for channel-attached and network-attached clients:

If you want to archive data, then copy all or selected:
• Tables
• Databases
• Data Dictionary tables
Note: If your system is used only for decision support and is updated regularly with data loads, you may not want to archive the data.

If you want to restore data, then copy an archive from the client or server back to the database, and restore data to all AMPs, to clusters of AMPs, or to a specific AMP (as long as the Data Dictionary contains the definitions of the table or database you want to restore).
Note: If the table does not have a definition in the Data Dictionary because of a DROP or RENAME statement, you can still restore data using the COPY statement.

Similar restore and recovery capabilities are available for systems running the Microsoft Windows 2000 operating system using NetVault and NetBackup. For more information, see “Open Teradata Backup” in Chapter 16.

Note: Contact Teradata Global Sales Support for information about the controlled distribution of NetBackup.

Table Rebuild Utility

Use the Table Rebuild utility to recreate a table, database, or entire disk on a single AMP under the following conditions:
• The table structure or data is damaged because of a software problem, head crash, power failure, or other malfunction.
• The affected tables are enabled for fallback protection.

Table rebuild can create all of the following on an AMP-by-AMP basis:
• Primary or fallback portions of a table
• An entire table (both primary and fallback portions)
• All tables in a database
• All tables on an individual AMP

The Table Rebuild utility can also remove inconsistencies in stored procedure tables in a database. An NCR System Engineer, Field Engineer, or System Support Representative usually runs the Table Rebuild utility.


Hardware Fault Tolerance

The Teradata Database provides the following facilities for hardware fault tolerance:

Facility: Multiple BYNETs
Description: Multinode Teradata Database servers are equipped with at least two BYNETs. Interprocessor traffic is never stopped unless both BYNETs fail. Within a BYNET, traffic can often be rerouted around failed components.

Facility: RAID disk units
Description: Teradata Database servers use Redundant Arrays of Independent Disks (RAIDs) configured for use as RAID1, RAID5, or RAIDS. Non-array storage cannot use RAID technology. RAID1 arrays offer mirroring, the method of maintaining identical copies of data. RAID5 or RAIDS protects data from single-disk failures with a 25 percent increase in disk storage to provide parity. RAID1 provides better performance and data protection than RAID5/RAIDS, but is more expensive.

Facility: Multiple-channel and -network connections
Description: In a client-server environment, multiple channel connections between mainframe and network-based clients ensure that most processing continues even if one or several connections between the clients and server are not working. Vproc migration is a software feature that supports this hardware facility.

Facility: Isolation from client hardware defects
Description: In a client-server environment, a server is isolated from many client hardware defects and can continue processing in spite of such defects.

Facility: Battery backup
Description: All cabinets have battery backup in case of building power failures.

Facility: Power supplies and fans
Description: Each cabinet in a configuration has redundant power supplies and fans to ensure fail-safe operation.

Facility: Hot swap capability for node components
Description: The Teradata Database can allow some components to be removed and replaced while the system is running. This process is known as hot swap. Teradata Database offers hot swap capability for the following:
• Disks within RAID arrays
• Fans
• Power supplies

Facility: Cliques
Description: A clique is a group of nodes sharing access to the same disk arrays. The nodes and disks are interconnected through shared SCSI buses and each node can communicate directly to all disks. This architecture provides and balances data availability in the case of a node failure.

A clique supports the migration of vprocs following a node failure. If a node in a clique fails, then its vprocs migrate to another node in the clique and continue to operate while recovery occurs on their home node. Migration minimizes the performance impact on the system.

PEs for channel-attached hardware cannot migrate, because they depend on the hardware that is physically attached to the assigned node. PEs for LAN-attached connections do migrate when a node failure occurs, as do all AMP vprocs.

To ensure maximum fault tolerance, no more than one node in a clique is placed in the same cabinet. Usually the battery backup feature makes this precaution unnecessary, but if you want maximum fault tolerance, then plan your cliques so the nodes are never in the same cabinet.
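The migration rules for a clique can be sketched as follows. This Python fragment is a hypothetical model and not Teradata code: the function name `migrate_vprocs`, the vproc kinds `"AMP"`, `"LAN_PE"`, and `"CHANNEL_PE"`, and the least-loaded placement policy are all assumptions made for illustration. The text states only that AMPs and LAN-attached PEs migrate and that channel-attached PEs do not.

```python
def migrate_vprocs(clique, failed_node):
    """Return (placement, stranded) after one node in a clique fails.

    clique maps a node name to a list of (vproc_name, kind) pairs, where
    kind is "AMP", "LAN_PE", or "CHANNEL_PE". Channel-attached PEs cannot
    migrate; every other vproc moves to a surviving node.
    """
    survivors = sorted(n for n in clique if n != failed_node)
    if not survivors:
        raise ValueError("no surviving node in the clique")
    placement = {n: list(clique[n]) for n in survivors}
    stranded = []
    for vproc in clique[failed_node]:
        _name, kind = vproc
        if kind == "CHANNEL_PE":
            stranded.append(vproc)  # tied to hardware on the failed node
            continue
        # Assumed policy: hand the vproc to the least-loaded survivor.
        target = min(survivors, key=lambda n: (len(placement[n]), n))
        placement[target].append(vproc)
    return placement, stranded
```

Choosing the least-loaded survivor is one simple way to keep the extra work balanced across the remaining nodes of the clique; a real system may use a different policy.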


For More Information

For more information on the topics presented in this chapter, see the following Teradata Database and Teradata Tools and Utilities books:

IF you want to learn more about…   THEN see…
Physical database design           Database Design
Restore/Recovery utilities         Teradata Archive/Recovery Utility Reference
Table Rebuild utility              Utilities

Section 4: Management and Monitoring

Chapter 15: Concurrency Control and Transaction Recovery

This chapter describes the concurrency control in relational database management systems and how to use transaction journaling (permanent journaling) to recover lost data or restore an inconsistent database to a consistent state. The initial sections of this chapter deal with the concepts of transactions and locks. The latter sections describe the closely related topics of concurrency control and recovery.

Topics include:
• Concurrency control
• Recovery
• Transactions
• Locks
• System and media recovery


What is Concurrency Control

Concurrency control involves preventing concurrently running processes from improperly inserting, deleting, or updating the same data. A system maintains concurrency control through two mechanisms:
• Transactions
• Locks

The concepts of transactions and locks are discussed in subsequent sections.


What is Recovery

Recovery is a process by which an inconsistent database is brought back to a consistent state. Transactions play a critical role in this process because they are used to “play back” a series of updates (using the term in its most general sense) to the database, either taking it back to some earlier state or bringing it forward to a current state.


Concept of a Transaction

This section describes the concept of a transaction. Transactions are a mandatory facility for maintaining the integrity of a database while running multiple, concurrent operations.

Definition of a Transaction

A transaction is a logical unit of work and the unit of recovery. The statements nested within a transaction must either all happen or none happen. Transactions are atomic: a partial transaction cannot exist.

Definition of Serializability

A set of transactions is serializable if the set produces the same result as some arbitrary serial execution of those same transactions for arbitrary input. A set of transactions is correct only if it is serializable. Use of a Two-Phase Locking (2PL) protocol is one way to serialize transactions. The two phases are the growing phase and the shrinking phase:

In the…            A transaction must…
growing phase      first acquire a lock on an object before operating on it.
shrinking phase    never acquire any more locks after it has released a lock. Lock release is an all-or-none operation.

Note that 2PL is not the same as the two-phase commit (2PC) protocol. For information on 2PC, see “Two-Phase Commit Protocol” later in this chapter.
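The two phases of 2PL can be sketched as a small Python class. This is a minimal illustration of the rules stated above, not the Teradata lock manager: the class name `TwoPhaseTransaction` and its methods are hypothetical, and the sketch tracks only one transaction's own locks without modeling conflicts between transactions.

```python
class TwoPhaseTransaction:
    """Tracks the growing and shrinking phases of one transaction."""

    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False

    def acquire(self, resource):
        # Growing phase only: no new locks once any lock has been released.
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after release")
        self.held.add(resource)

    def operate(self, resource):
        # A lock must be acquired on an object before operating on it.
        if resource not in self.held:
            raise RuntimeError("must lock a resource before operating on it")
        return f"{self.name} operates on {resource}"

    def release_all(self):
        # Release is all-or-none: every lock is dropped together.
        self.shrinking = True
        released, self.held = self.held, set()
        return released
```

For example, a transaction that locks `Employee` and `Department`, works on both, and then calls `release_all()` follows the protocol; a later call to `acquire()` on the same object raises an error because the shrinking phase has begun.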

Transaction Semantics

The Teradata Database supports both ANSI transaction semantics and Teradata transaction semantics. A system parameter specifies the default transaction mode for a site. However, you can override the default for a session.

The Teradata Database returns an error when a transaction operating in Teradata semantics mode issues a COMMIT statement. The Teradata Database supports the ANSI COMMIT statement in ANSI transaction mode.


ANSI Mode Transactions

All ANSI transactions are implicit. Either of the following events opens an ANSI transaction:
• Execution of the first SQL statement in a session
• Execution of the first statement following the close of a previous transaction

Transactions close when the application performs a COMMIT, ROLLBACK, or ABORT statement. When the transaction contains a DDL statement (including DATABASE and SET SESSION, which are considered DDL statements in this context), the DDL statement must be the last statement in the transaction.

BEGIN TRANSACTION/END TRANSACTION Statements

A session executing under ANSI transaction semantics does not allow the BEGIN TRANSACTION statement, the END TRANSACTION statement, or the two-phase commit protocol. When an application submits these statements in ANSI mode, the database software generates an error.

Roll Back an ANSI Transaction

In ANSI mode, the system rolls back the entire transaction if the current request:
• Results in a deadlock
• Performs a DDL statement that aborts
• Executes an explicit ROLLBACK or ABORT statement

Teradata Database accepts the ABORT and ROLLBACK statements in ANSI mode, including conditional forms of those statements. If the system detects an error for either a single-statement or multistatement request, it rolls back only that request, and the transaction remains open, except in special circumstances. Application-initiated, asynchronous aborts also cause full-transaction rollback in the ANSI environment.


Teradata Mode Transactions

Teradata mode transactions can be either implicit or explicit. Multistatement requests and macros are examples of implicit transactions.

BEGIN TRANSACTION/END TRANSACTION Statements

An explicit, or user-generated, transaction is a single set of BEGIN TRANSACTION/END TRANSACTION statements surrounding one or more requests. All other transactions are implicit. Consider the following transaction:

    BEGIN TRANSACTION;
    DELETE FROM Employee WHERE Name = 'Smith T';
    UPDATE Department SET EmpCount=EmpCount-1 WHERE DeptNo=500;
    END TRANSACTION;

Roll Back a Teradata Mode Transaction

If an error occurs during the processing of either the DELETE or UPDATE statement within the BEGIN TRANSACTION and END TRANSACTION statements, the system restores both the Employee and Department tables to the states they were in before the transaction began. If an error occurs during a Teradata transaction, then the system rolls back the entire transaction.


Concept of a Lock

A lock is a means of claiming usage rights to some resource. The Teradata Database can lock several different types of resources in several different ways.

Overview of Teradata Database Locking

Most locks used on Teradata resources are obtained automatically. Users can override some locks by making certain lock specifications, but the Teradata Database only allows overrides when it can assure data integrity. The data integrity requirement of a request decides the type of lock that the system uses. A request for a locked resource by another user is queued until the process using the resource releases its lock on that resource.

Why Do Database Management Systems Require Locking

The lost update anomaly best explains why database management systems, in which multiple processes are accessing the same database, require locks.


The following figure provides an example of this anomaly.

Figure FG11A001: the lost update anomaly. The database balance starts at $500.00. Transaction T1 reads the balance ($500.00), adds $1,000.00, and writes its result ($1,500.00) to the database. Transaction T2 also reads the balance as $500.00, adds $2,000.00, and writes its result ($2,500.00) to the database.

This example shows a nonserialized set of transactions. Together, the two transactions should add $3,000.00 to the $500.00 balance, giving $3,500.00. Without locking, the database instead produces two different and wrong results, because each transaction computes its result from the same original balance and one write overwrites the other. If locking had been in effect, this could not have happened. This example demonstrates the most common problem encountered in a transaction processing system without locks. Although several other problems arise when locking is not in effect, the lost update problem sufficiently illustrates the need for locking.
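The anomaly is easy to reproduce in a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the interleaving in `run_without_locks` (both reads before either write, with T2 writing last) is one assumed ordering of the steps in the figure.

```python
def run_without_locks(balance):
    """Interleave T1 and T2 so that both read before either writes."""
    t1_read = balance            # T1: READ Balance
    t2_read = balance            # T2: READ Balance
    balance = t1_read + 1000     # T1: add $1,000 and WRITE result
    balance = t2_read + 2000     # T2: add $2,000 and WRITE result (overwrites T1)
    return balance


def run_serialized(balance):
    """Run T1 to completion before T2 starts, as locking would force."""
    balance = balance + 1000     # T1 reads, adds, and writes
    balance = balance + 2000     # T2 reads the updated balance, adds, and writes
    return balance
```

Starting from a balance of 500, the unlocked interleaving returns 2500, so the update made by T1 is lost, while the serialized version returns 3500.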

Lock Levels

The Teradata lock manager implicitly locks the following objects:

Object Locked   Description
Database        Locks rows of all tables in the database
Table           Locks all rows in the table and any index and fallback subtables
Row hash        Locks the primary copy of a row and all rows that share the same hash code within the same table

A user can lock the following resource types in a Teradata Database:
• Database
• Table
• Row Hash

Levels of Lock Types

Users can apply four different types of locking on Teradata Database resources. The following table explains these types:

Lock Type: Exclusive
Description: The requester has exclusive rights to the locked resource. No other process can read from, write to, or access the locked resource in any way.

Lock Type: Write
Description: The requester has exclusive rights to the locked resource except for readers not concerned with data consistency.

Lock Type: Read
Description: Several users can hold Read locks on a resource, during which the system permits no modification of that resource. Read locks ensure consistency during read operations such as those that occur during a SELECT statement.

Lock Type: Access
Description: The requester is willing to accept minor inconsistencies of the data while accessing the database (an approximation is good enough). An access lock permits modifications on the underlying data while the SELECT operation is in progress.

This same information is illustrated in the following table:

                 Lock Type Held
Lock Request     None       Access     Read       Write      Exclusive
Access           Granted    Granted    Granted    Granted    Queued
Read             Granted    Granted    Granted    Queued     Queued
Write            Granted    Granted    Queued     Queued     Queued
Exclusive        Granted    Queued     Queued     Queued     Queued
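The compatibility rules in the table can be encoded directly as a lookup. The following Python sketch is an illustration, not the Teradata lock manager: the names `COMPATIBLE` and `decide` are hypothetical, and the sketch does not model the ordering of requests that are already waiting in the queue.

```python
# For each requested lock type, the set of held lock types under which the
# request is granted, transcribed from the table above.
COMPATIBLE = {
    "Access": {"None", "Access", "Read", "Write"},
    "Read": {"None", "Access", "Read"},
    "Write": {"None", "Access"},
    "Exclusive": {"None"},
}


def decide(requested, held_locks=()):
    """Return "Granted" or "Queued" for a request against all held locks."""
    held = set(held_locks) or {"None"}
    # The request is granted only if it is compatible with every held lock.
    return "Granted" if held <= COMPATIBLE[requested] else "Queued"
```

For instance, `decide("Write", ["Read"])` returns `"Queued"`, while `decide("Access", ["Write"])` returns `"Granted"`, which matches the description of the Access lock as accepting minor inconsistencies.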


Automatic Database Lock Levels

The Teradata Database applies most of its locks automatically. The following table illustrates how the Teradata Database applies different locks for various types of SQL statements:

                                   Locking Level by Access Type
Type of SQL Statement              UPI/NUPI/USI      NUSI/Full Table Scan    Locking Mode
SELECT                             Row Hash          Table                   Read
UPDATE                             Row Hash          Table                   Write
DELETE                             Row Hash          Table                   Write
INSERT                             Row Hash          Not applicable          Write
CREATE DATABASE, DROP DATABASE,    Not applicable    Database                Exclusive
MODIFY DATABASE
CREATE TABLE, DROP TABLE,          Not applicable    Table                   Exclusive
ALTER TABLE

Deadlocks and Deadlock Resolution

A deadlock occurs when transaction 1 places a lock on resource A, and then needs to lock resource B, but resource B has already been locked by transaction 2, which in turn needs to place a lock on resource A. This state of affairs is called a deadlock or a deadly embrace. To resolve a deadlock, Teradata Database aborts one of the transactions and performs a rollback.

If you used BTEQ to submit the transaction, the database reports the deadlock abort to BTEQ. BTEQ resubmits only the statement that caused the error, not the complete transaction. This behavior can result in partially committed transactions. Therefore, you must take care when writing the BTEQ script to ensure that the transaction is one statement.

To illustrate, in BTEQ, a statement ends with a semicolon (;) as the last non-blank character in the line. BTEQ sees the following example as two statements:

    sel * from x;
    sel * from y;

However, if you write the same statements as shown in this example, BTEQ sees them as only one statement:

    sel * from x
    ; sel * from y;
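The line-ending rule described here can be sketched in Python. This is a deliberately simplified model of the rule as stated in the text, not BTEQ's actual parser: the function name `group_requests` is hypothetical, and the sketch ignores BTEQ dot commands, comments, and semicolons inside quoted strings.

```python
def group_requests(lines):
    """Group script lines into requests.

    A request ends at a line whose last non-blank character is a semicolon;
    any other line continues onto the next one.
    """
    requests, current = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        current.append(line.strip())
        if line.rstrip().endswith(";"):
            requests.append(" ".join(current))
            current = []
    if current:
        # Trailing text without a closing semicolon is kept as a last group.
        requests.append(" ".join(current))
    return requests
```

With this rule, the lines `sel * from x;` and `sel * from y;` form two groups, while `sel * from x` followed by `; sel * from y;` forms a single group, because the first line does not end with a semicolon.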


Host Utility Locks

The locking operation that the client-resident Teradata Archive/Recovery utility uses is different from the locking operation that the Teradata Database performs. The Teradata Database documentation and utilities frequently refer to archive locks as HUT (Host Utility) locks.

HUT Lock Types

Teradata Database places HUT locks as follows:

Lock Type     Object Locked
Read          Any object being dumped
Group Read    Rows of a table being dumped if and only if the table is defined for an after-image permanent journal and if you select the appropriate option on the DUMP command
Write         Permanent journal table being restored
Write         All tables in a ROLLFORWARD or ROLLBACKWARD during recovery operations
Write         Journal table being deleted
Exclusive     Any object being restored

HUT Lock Characteristics

HUT locks have the following characteristics:
• Associated with the currently logged-on user who entered the statement rather than with a job or transaction
• Placed only on objects on the AMPs that are participating in a utility operation
• Placed at the cluster level during a CLUSTER dump
• Never conflict with a utility lock at another level that was placed on the same object for the same user
• Remain active until they are released either by the RELEASE LOCK option of the utility command or by the execution of a Teradata SQL RELEASE LOCK statement after a utility operation completes
• Automatically reinstated following a Teradata Database restart if they had not been released


System and Media Recovery

This section describes the conditions under which the Teradata Database performs:
• An unscheduled restart
• A transaction recovery
• Down AMP recovery

System Restarts

Unscheduled restarts occur for one of the following reasons:
• AMP or disk failure
• Software failure
• Parity error

Failures and errors affect all software recovery in the same way. Hardware failures take the affected component offline and it remains offline until repaired or replaced.

Transaction Recovery

Two types of automatic transaction recovery can occur:
• Single transaction recovery
• Database recovery

The following table details what happens when the two automatic recovery mechanisms take place:

This recovery type: single transaction
Happens when the Teradata Database aborts a single transaction because of:
• Transaction deadlock
• User error
• User-initiated abort command
• An inconsistent data table
Single transaction recovery uses the transient journal to effect its data restoration.

This recovery type: database
Happens when the Teradata Database performs a restart for one of the following reasons:
• Hardware failure
• Software failure
• User command


Down AMP Recovery

When an AMP fails to come online during system recovery, the Teradata Database continues to process requests using fallback data. When the down AMP comes back online, down AMP recovery procedures begin to bring the data for the AMP up to date as follows:

IF there are…                             THEN the AMP recovers…
a large number of rows to be processed    offline.
only a few rows to be processed           online.

After all updates are made, the AMP is considered to be fully recovered.


Two-Phase Commit Protocol

Two-phase commit (2PC) is a protocol for assuring the consistency of data in multiple databases, in which each participant votes to either commit or abort the changes. The participants wait before committing the change until they know that all participants can commit. By voting to commit, the participant guarantees that it can either commit or roll back its part of the transaction, even if it crashes before receiving the result of the vote.

The 2PC protocol allows the development of Customer Information Control System (CICS) and Information Management System (IMS) applications that can update one or more Teradata Database databases and/or databases under some other DBMS in a synchronized manner. The result is that the updates requested in a defined unit of work either all succeed or all fail.

Definition of Participant

A participant is a database manager that performs some work on behalf of the transaction, and that commits or aborts changes to the database. A participant can also be a coordinator of participants at a lower level. In such cases, the coordinator/participant relays a vote request to its participants, and sends its vote to the coordinator only after determining the outcome of its participants. Any number of participants can engage in a two-phase commit operation.

A participant is defined as being in doubt from the time it votes to commit or abort until the time it receives a commit or abort instruction from the coordinator, which is the controlling database manager with respect to the distributed transaction. A transaction is in doubt if any of the participants are in doubt.

Definition of Coordinator

The coordinator is never in doubt. Selection of the coordinator is arbitrary. However, with respect to the Teradata Database, it is always either IMS or CICS. There can be only one coordinator per transaction at any given time.
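The voting and decision steps of 2PC can be sketched as follows. This Python fragment is a minimal, single-process illustration under stated assumptions, not the interface used by IMS, CICS, or the Teradata Database: the class `Participant`, the function `two_phase_commit`, and the state names are hypothetical, and the sketch does not model crashes, timeouts, or nested coordinators.

```python
class Participant:
    """A database manager that votes on, then applies, the outcome."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def vote(self):
        # After voting, the participant is in doubt until told the outcome.
        self.state = "in doubt"
        return "commit" if self.can_commit else "abort"

    def finish(self, decision):
        self.state = "committed" if decision == "commit" else "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes to commit."""
    # Phase 1: collect a vote from every participant.
    votes = [p.vote() for p in participants]
    decision = "commit" if all(v == "commit" for v in votes) else "abort"
    # Phase 2: relay the single decision to all participants.
    for p in participants:
        p.finish(decision)
    return decision
```

If one participant is created with `can_commit=False`, `two_phase_commit` returns `"abort"` and every participant ends in the `"aborted"` state, which is the all-or-nothing outcome the protocol is designed to guarantee.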


For More Information

For more information on the topics presented in this chapter, see the following Teradata Database and Teradata Tools and Utilities books:

IF you want to learn more about: Specifying transactions in an embedded SQL program
THEN see:
• SQL Reference: Stored Procedures and Embedded SQL
• Teradata Preprocessor2 for Embedded SQL Programmer Guide

IF you want to learn more about: Transaction processing in general
THEN see:
• SQL Reference: Statement and Transaction Processing

IF you want to learn more about: Two-phase commit
THEN see:
• Teradata Director Program Reference
• IBM CICS Interface for Teradata Reference
• IBM IMS/DC Interface for Teradata Reference


Chapter 16: Database Management and Analysis Tools

An important part of the overall design of Teradata Warehouse is the means to manage the hardware and software that make up the system. Teradata offers a wide variety of utilities, management and analysis tools, and peripherals. Some of these tools are resident on the Teradata Database and others are available in Teradata Tools and Utilities, a management suite available for installation in client environments. With these, you can back up and restore important data, save dumps, and investigate and control the Teradata Database configuration, user sessions, and various aspects of its operation and performance.

This chapter describes the management and analysis tools that you can use to keep the database running at optimum performance levels. These tools fall into several basic categories.

Topics include:
• Data archiving
• Data load and export utilities
• Database management tools
• Query analysis tools
• Query facilities


Teradata Tools and Utilities - Archive Utilities

Storing data for future retrieval is an important part of system administration. Teradata offers the following archive and recovery utilities:
• Teradata Archive/Recovery (for channel-attached and network-attached systems)
• Open Teradata Backup products for Microsoft Windows systems including:
  • NetBackup (network-attached systems)
  • NetVault (network-attached systems)

Teradata Archive/Recovery Utility

The Teradata Archive/Recovery utility (ARC) supports archiving and restoring Teradata Database databases, individual tables, or permanent journals to any of the following media:
• Client tape
• Client file

ARC also includes recovery with rollback and rollforward functions for data tables defined with a journal option. For more information about rollback and rollforward, see Chapter 15: “Concurrency Control and Transaction Recovery.”

Open Teradata Backup

Open Teradata Backup (OTB) supports open architecture products that provide backup and restore functions for Microsoft Windows clients. The following products are available:
• NetVault
  The NetVault Teradata Module is a backup system that allows you to graphically select databases and tables and specify the kinds of backups (distributed, online, and so forth) you want to perform.
• NetBackup
  NetBackup for Teradata supports parallel backups and restores coordinated across multiple hosts connected to a single Teradata Database. The full functional capabilities of the NetBackup server and the multiple media servers are realized in this product. In addition, NetBackup uses an Administrative Host, which contains a Graphical User Interface (GUI) to provide object browsing and selection, automatic script generation, and centralized job monitoring.
  Note: Contact Teradata Global Sales Support for information about the controlled distribution of NetBackup.


Teradata Tools and Utilities - Data Load and Export Utilities

Data load utilities are usually designed following one of two design philosophies:

Utilities operate either: as fast as possible, with little regard for the impact on system users
For example: Teradata MultiLoad, Teradata FastLoad
And are typically used: in a decision support environment where transactions for the day are loaded during a nightly batch window when there are few interactive users.

Utilities operate either: or, in the background and limit the impact on interactive users
For example: Teradata TPump
And are typically used: to process a continuous feed of near-realtime updates while interactive users require rapid responses.

Teradata MultiLoad

The Teradata MultiLoad utility supports bulk INSERTs, UPDATEs, and DELETEs against initially unpopulated or populated tables. Both the client and server environments support Teradata MultiLoad. Teradata MultiLoad can:
• Run against multiple tables
• Perform block transfers with multi-session parallelism
• Load data from multiple input source files
• Pack multiple SQL statements and associated data into a request

Teradata FastLoad

The Teradata FastLoad utility loads data in unpopulated tables only. Both the client and server environments support Teradata FastLoad. Teradata FastLoad can:
• Load data into empty tables
• Perform block transfers with multi-session parallelism


Teradata Parallel Data Pump

The Teradata Parallel Data Pump (TPump) utility uses standard SQL/DML (not block transfers) to maintain data in tables. It also contains a method whereby you can specify the percentage of system resources to be used for the operations on tables. This allows background maintenance for INSERT, DELETE, and UPDATE operations to take place at any time of day while the Teradata Database is in use. Teradata TPump can:
• Maintain up to 60 tables at a time
• Support the same restart, portability, and scalability features as Teradata MultiLoad

Teradata FastExport Utility

To export data, Teradata Tools and Utilities provides the Teradata FastExport utility. The Teradata FastExport utility exports data in parallel. The utility exports large quantities of data from the Teradata Database to a client and is the functional complement of the FastLoad and MultiLoad utilities. Teradata FastExport can:
• Export tables to client files
• Perform block transfers with multi-session parallelism


Database Management Tools

Teradata provides tools for investigating and managing active sessions and configurations. The tools are discussed in the following sections.

Teradata Database - Active Session and Configuration

Database management tools include utilities for investigating active sessions and the state of the Teradata Database configuration, such as:
• Query Session
• Query Configuration
• Gateway Global utility

The following table contains information about the capabilities of each utility:

This utility: Query Session (also known as Sessions States)
Does the following:
• Provides information about active Teradata Database sessions.
• Monitors the state of all or selected sessions on selected logical host IDs attached to the Teradata Database.
• Provides information about the state of each session, including session details for Teradata Index Wizard. For more information about Teradata Index Wizard, see “Teradata Tools and Utilities - Teradata Index Wizard” later in this chapter.

This utility: Query Configuration
Does the following:
• Provides reports on the current Teradata Database configuration, including:
  • Node
  • AMP
  • PE identification and status.

This utility: Gateway Global
Does the following:
• Allows you to monitor and control the sessions of Teradata Database network-connected users. The gateway software runs as a separate operating system task and is the interface between the network and the Teradata Database.
• Supports up to 1200 sessions per gateway, depending on available system resources and the number of allotted PEs.
  Note: At least one PE that can support up to 120 sessions is required for each logical network attachment.
• Allows client programs that communicate through the gateway to the Teradata Database to be installed and running on either:
  • The Teradata Database server
  • Network-attached workstations
  Client programs that run on a channel-attached host bypass the gateway completely.


System Resource Management

Other tools allow you to monitor and manage system resources, such as:
• Ferret utility
• Priority Scheduler
• Teradata Statistics Wizard
• Teradata Dynamic Query Manager
• Teradata MultiTool

The utilities and tools are discussed in the following sections.

Teradata Database - Ferret Utility

The Ferret utility is a tool that you can use to set various disk space utilization attributes associated with the Teradata Database while maintaining the integrity of the data managed by the Teradata Database file system. After you have selected the attributes and functions, Ferret dynamically reconfigures the data on the disks to correspond with the selections. Depending on the functions, Ferret can operate at the vproc, table, subtable, disk, or cylinder level.

Teradata Database - Priority Scheduler Utility

The Priority Scheduler is a resource management tool that oversees the dispersal of system resources based on a blueprint that you construct to satisfy your site-specific requirements. The Priority Scheduler is active in all Teradata Database systems. The Teradata Database itself automatically moves internal jobs into different priority levels, especially when a quick boost to one activity is critical to overall throughput.

Priority Scheduler does the following:
• Keeps resource usage in your data warehouse balanced around your specific needs
• Offers flexibility for prioritizing users differently and specifying scheduling options

The Priority Scheduler controls the allocation and consumption of the computer resources available to the Teradata Database based on the following:
• A session-related priority designation
• The system-level priority strategy that you define

Although the default state of Priority Scheduler assigns the same priority to the jobs of all users, you can take advantage of the capabilities of the Priority Scheduler by doing the following:
• Assigning different priorities to different types of jobs
• Assigning jobs of favored users more CPU and faster I/O than the lower-priority jobs

The Priority Scheduler Administrator available in Teradata Manager enhances the usability of the Priority Scheduler by providing a graphical interface for configuration, management, and monitoring. For information, see “Teradata Manager” on page 19-2.

Teradata Tools and Utilities - Teradata Statistics Wizard

The Teradata Statistics Wizard is a graphical tool that was developed to improve the performance of queries and the entire database. The Statistics Wizard automates the process of collecting statistics for a particular workload or selecting arbitrary indexes or columns for collection or re-collection. Additionally, the Statistics Wizard permits you to validate the proposed statistics on a production system. The validation capability enables you to verify the performance of the proposed statistics before applying the recommendations.

The following table contains information about the capabilities of Teradata Statistics Wizard:

You can…

For…

select a workload

analysis and receive recommendations based on the results

select a database or select several tables, indexes, or columns

analysis and receive recommendations based on the results.

defer

the schedule for the collection or recollection of statistics.

display and modify statistics

a column or index.

receive recommendations

analysis that are based on table demographics and general heuristics.

As changes are made within a database, the Statistics Wizard identifies those changes and recommends which tables should have statistics collected, based on age of data and table growth, and the columns/indexes that would benefit from having statistics defined and collected for a specific workload. The administrator is then given the opportunity to accept or reject the recommendations.


Teradata Database - Teradata Dynamic Query Manager

The Teradata Dynamic Query Manager (DQM) is an application that lets you manage access to and use of the Teradata Database resources. Managed access allows you to use the database efficiently and manipulate workload capacity. The functions for rules processing and SQL validation are integrated in Teradata Database.

Teradata DQM provides two capabilities for managing queries:
• Query Management
• Request Scheduling

The following table provides information about these capabilities:

Teradata DQM provides…

That…

Query Management functions

Examine login and query requests. Check the users who issue requests, the accounts they use to log in, the performance groups they are associated with, and the objects referenced in the requests against criteria that you have previously defined. Reject or delay those requests that fail to meet the defined criteria.

Request Scheduling Tools

Can be used to schedule single- or multistatement query requests for execution at a later time. The Scheduled Request (SR) function comprises both client and Teradata Database server components. The SR client components submit and monitor scheduled requests, and the SR server piece checks, saves, and executes the requests.


The following table provides information about the restrictions you can create in Teradata DQM:

You can create restrictions based on…

Such as…

names

account names. user and group logon IDs. database names. database objects, such as:
• Tables
• Views
• Macros
• Stored procedures

resources involved in a query

processing time. number of rows returned. joins or full-scans.

date and time

N/A.

SQL queries entering the system, regardless of the source of the request, can be blocked, including queries received from:
• Basic Teradata Query (BTEQ)
• Call Level Interface (CLI)
• Open Database Connectivity (ODBC)
• Java Database Connectivity (JDBC)

You can enable or disable query management as desired. When enabled, all login and query requests, regardless of their origin, are managed by Teradata DQM.
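The query management behavior described above (check each request against previously defined criteria, then reject or delay it) can be pictured with a small model. This is a hedged Python sketch of the idea only. It is not the Teradata DQM rule format or API, and the names Request, Rule, and screen are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    user: str
    account: str
    objects: set  # names of tables, views, macros, or stored procedures referenced


@dataclass
class Rule:
    action: str  # "reject" or "delay"
    users: set = field(default_factory=set)
    accounts: set = field(default_factory=set)
    objects: set = field(default_factory=set)

    def matches(self, req: Request) -> bool:
        # A request matches if its user, its account, or any referenced object is restricted.
        return (req.user in self.users
                or req.account in self.accounts
                or bool(req.objects & self.objects))


def screen(req: Request, rules: list, enabled: bool = True) -> str:
    """Return "accept", "reject", or "delay" for a request."""
    if not enabled:
        # Query management can be switched off; every request then passes.
        return "accept"
    for rule in rules:
        if rule.matches(req):
            return rule.action
    return "accept"
```

In this model the first matching rule decides the outcome; how the real product orders or combines rules is not stated in this chapter.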

Teradata Database - Teradata MultiTool

Teradata MultiTool is a Teradata Database utility that offers a graphical user interface (GUI) on Windows systems that Teradata administrators and support personnel can use as an interface to command-line-based Teradata and PDE tasks. You can start specific utilities using the options available in the GUI.

The following table lists the tools accessible from Teradata MultiTool:

The tool …

Is used to …

Control GDO Editor (CTL)

display and modify the fields of the PDE GDO (Globally Distributed Object).

Database Window (DBW)

activate the Supervisor window and subwindows.

Database Initialization Program (DIP)

execute one or more of the standard Database Initialization Program Structured Query Language (SQL) scripts packaged with the database.

Vproc Manager

perform the following functions:
• Obtain the status of vprocs
• Change vproc states
• Initialize and boot a specific vproc
• Initialize the vdisk associated with a specific vproc
• Force a database restart

Database Query Analysis Tools

The Teradata Database Query Analysis Tools (DBQAT) are designed to improve the overall performance analysis capabilities of the Teradata Database. The following Teradata Tools and Utilities and Teradata Database tools are discussed in more detail in the following sections:
• Teradata Index Wizard
• The Query Capture Facility
• Teradata Visual Explain
• Database Query Log
• Target Level Emulation on the Server
• Teradata System Emulation Tool on the Client
• Database Object Use Count

Teradata Tools and Utilities - Teradata Index Wizard

The Teradata Index Wizard is a tool that interfaces with the Teradata Database. This utility analyzes various SQL query workloads and suggests candidate indexes to enhance the performance of those queries in the context of the defined workloads. The workload definitions, supporting statistical and demographic data, and index recommendations are stored in various Query Capture Database (QCD) tables.

What Can the Teradata Index Wizard Do

Using data from a QCD or the Database Query Log (DBQL), the wizard:
• Recommends secondary indexes for the tables based on workload details, including data demographics, that are captured using the Query Capture Facility (QCF)
• Allows you to validate index recommendations before implementing the new indexes
• Allows you to perform what-if analysis on the workload. The Teradata Index Wizard allows you to determine whether your recommendations actually improve query performance
• Interfaces with other Teradata Tools and Utilities, such as Teradata System Emulation Tool (TSET), to perform offline query analysis by importing the workload of a production system to a test system
• Uses the Teradata Visual Explain and Compare (VEComp) tool to provide a comparison of the query plans with and without the index recommendations

Teradata Index Wizard can be started from Teradata Visual Explain, Teradata System Emulation Tool, Teradata Statistics Wizard, and Teradata Manager. Index Wizard can also open these applications (except Teradata Manager) to help in your evaluation of recommended indexes.

Demographics

The Teradata Index Wizard needs demographic information to perform index analysis and to make recommendations. You can collect the following types of data demographics using SQL:
• Query demographics
  Use the INSERT EXPLAIN statement with the WITH STATISTICS and DEMOGRAPHICS clauses to collect table cardinality and column statistics.
• Table demographics
  Use the COLLECT DEMOGRAPHICS statement to collect the row count and the average row size in each of the subtables in each AMP on the system.

Teradata Database - Query Capture Facility

The Query Capture Facility (QCF) is available on the Teradata Database. The QCF captures the data pertaining to an execution plan and stores the data in a set of relational tables in a QCD.

Applications of QCF and QCD:
• Provide the foundation for the Teradata Index Wizard utility.
• Can store all query plans for customer queries. You can then compare and contrast queries as a function of software release, hardware platform, and hardware configuration.
• Provide the foundation for the Visual EXPLAIN tool, which displays EXPLAIN output graphically.
• Provide data so that you can generate your own detailed analyses of captured query steps using standard SQL DML statements and third party query management tools.

You can execute the COLLECT, DROP, and HELP STATISTICS SQL statements against a QCD.

QCD Schema Improvement

QCD schema is designed to:
• Minimize the number of tables required by capturing information in a generic fashion
• Promote usability
• Improve the overall performance of data storage and retrieval

Teradata Index Wizard Support

A QCD is the central repository for the information used in the analyses performed by the Teradata Index Wizard. A QCD supports the Teradata Index Wizard by capturing and storing the data demographics and Index Wizard-related information that you specify. The workload definitions, supporting statistical and demographic data, and index recommendations are stored in various QCD tables. The Teradata Index Wizard analyzes various SQL query workloads and suggests candidate indexes to enhance the performance of those queries in the context of the defined workloads.

Teradata Tools and Utilities - Teradata Visual Explain

Teradata Visual Explain is a tool that visually depicts the execution plan of complex SQL statements in a simplified manner. When you specify the EXPLAIN modifier in the SQL statement, Teradata Visual Explain presents a graphical view of the statement broken down into discrete steps showing the flow of data during execution. Because comparing optimized queries is easier with Teradata Visual Explain, application developers and database administrators can fine-tune the SQL statements so that the Teradata Database can access data in the most effective manner.

To view an execution plan using Teradata Visual Explain, the execution plan information must first be captured into the QCD using the Query Capture Facility (QCF) with the following commands:
• INSERT EXPLAIN
• DUMP EXPLAIN

Teradata Visual Explain reads the execution plan, which has been stored in a QCD, and turns it into a series of icons.

Teradata Database - Database Query Log

The Database Query Log (DBQL) is a Teradata Database tool that provides a series of predefined tables that can store, based on rules you specify, historical records of queries and their duration, performance, and target activity. DBQL is flexible enough to log information on the variety of SQL requests that run on the Teradata Database, from short transactions to longer-running analysis and mining queries.

After implementing DBQL, you use simple SQL statements to control the start, extent, and duration of the logging activity. You can define rules, for instance, that log the first 4000 SQL characters of any query that runs during a session invoked by a specific user under a specific account, if the time to complete that query exceeds the specified time threshold. You can request that DBQL log particular query information or just a count of qualified queries.

You can specify that the recording criteria be a mix of:
• Users and accounts
• Elapsed time, where time can be expressed as:
  • A series of intervals
  • A threshold limit
• Processing detail, including any or all:
  • Objects
  • Steps
  • SQL text
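The example rule described above (log the first 4000 SQL characters for one user and account when a query exceeds a time threshold) can be modeled in a few lines. This Python sketch illustrates the rule logic only, under stated assumptions. It is not DBQL syntax, and the names LoggingRule and log_entry are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoggingRule:
    user: str
    account: str
    threshold_seconds: float
    sql_text_limit: int = 4000  # number of leading SQL characters to keep


def log_entry(rule, user, account, elapsed_seconds, sql_text):
    """Return the record this rule would log, or None if the query does not qualify."""
    if (user, account) != (rule.user, rule.account):
        return None
    # Only queries whose completion time exceeds the threshold are recorded.
    if elapsed_seconds <= rule.threshold_seconds:
        return None
    return {
        "user": user,
        "account": account,
        "elapsed_seconds": elapsed_seconds,
        "sql_text": sql_text[:rule.sql_text_limit],
    }
```

A rule built as LoggingRule("analyst1", "acct_a", 5.0) would ignore a two-second query from that user and keep a truncated record of a nine-second one.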

In addition to the query-related data, DBQL stores the following information to help identify the query:
• User name
• Session number and account information

DBQL data can also be input to Target Level Emulation and to Teradata Tools and Utilities, including Teradata Manager and VEComp. Teradata Tools and Utilities aid in analysis and present the information in a graphic form that is easily manipulated and understood.

Teradata Database - Target-Level Emulation

Teradata Database supports Target-Level Emulation both on the Teradata Database server and in the client as follows:

Teradata supports…

On the…

Target-Level Emulation (TLE)

Teradata Database server.

Teradata System Emulation Tool

client.

The Teradata Database provides the infrastructure for Target-Level Emulation (TLE). You can use the standard SQL interface to capture the system configuration details and table demographics on one system and store them on another. Usually the information is obtained from a production system, then stored on a smaller test or development system. With this capability, the optimizer can generate access plans similar to those that are generated on a production system. You can use the plans to analyze optimizer-related production problems. This information can also be used by the Teradata System Emulation Tool.

Teradata Tools and Utilities - Teradata System Emulation Tool

When TLE information is stored on a test system, Teradata System Emulation Tool (TSET), a Teradata Tools and Utilities tool, allows you to examine the query plans generated by the test system optimizer as if the plans were processed on the production system.

Using TSET you can:
• Change system configuration details and table demographics and model the impact of various changes on SQL statement performance
• Determine the source of various optimizer-based production problems
• Provide an environment in which Teradata Index Wizard can produce recommendations for a production system workload

Teradata Database - Database Object Use Count

The database administrator and application developer can use Database Object Use Count to capture the number of times an application refers to an object. Database Object Use Count captures counts for the following:
• Database
• Table
• Column
• Index
• View
• Macro
• Teradata stored procedure
• Trigger
• User-defined function

Once the counts are captured, you can use the information to identify obsolete or unused database objects, particularly those that occupy significant quantities of valuable disk space. Further, the DBU Count information can be useful to database query analysis tools like Teradata Index Wizard.
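Finding unused objects from captured counts amounts to a simple filter and sort. The following Python sketch assumes the use counts and object sizes have already been exported into two dictionaries. The function name unused_objects and both input shapes are hypothetical and do not reflect how Teradata stores the counts.

```python
def unused_objects(use_counts, sizes_in_bytes):
    """List objects that were never referenced, largest disk footprint first.

    use_counts: mapping of object name to captured reference count.
    sizes_in_bytes: mapping of every known object name to its size on disk.
    """
    # An object missing from use_counts is treated as never referenced.
    idle = [name for name in sizes_in_bytes if use_counts.get(name, 0) == 0]
    return sorted(idle, key=lambda name: sizes_in_bytes[name], reverse=True)
```

Sorting by size puts the objects that free the most disk space at the top of the review list.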

Query Facilities

A request to the Teradata Database consists of one or more SQL statements, and can span any number of input lines. The Teradata Database can receive and execute statements that are:
• Entered interactively, or submitted as a script or a batch job, through the Basic Teradata Query interface
• Entered using Teradata SQL Assistant
• Embedded in an application program that is written in a procedural language

Each facility is discussed in the following sections.

Teradata Tools and Utilities - Basic Teradata Query Utility

The Basic Teradata Query Utility (BTEQ) is an SQL front-end utility that runs on all client platforms. It resides on the client portion of either a channel-attached or network-attached system and communicates with one or more Teradata Database systems residing on the server. BTEQ allows you to create and submit SQL queries either interactively or in batch mode from an interactive terminal.

BTEQ Support

BTEQ supports the following facilities:
• Multiple Teradata SQL statements per request
• Read from and write to client data files
• Management of multiple sessions per job
• Sophisticated report format
• Stored procedure objects in the Teradata Database

BTEQ Communication

The client system communicates with the Database as described in the following table:

IF your client system is…

THEN communication occurs over a…

channel attached

high-speed I/O channel.

network attached

Local Area Network (LAN).

Teradata Tools and Utilities - Teradata SQL Assistant

Teradata SQL Assistant is a Teradata Tools and Utilities tool that provides information discovery capabilities on Windows-based systems. Teradata SQL Assistant retrieves data from any ODBC-compliant database server and allows you to manipulate and store the data on your desktop PC. You can then use this data to produce consolidated results or perform analyses on the data using tools such as Microsoft Excel.

The following table contains information about key features of Teradata SQL Assistant:

This feature…

Allows you to…

Reports

• Create reports from any database that provides an ODBC interface
• Use an imported file to create many similar reports (query results or answer sets), for example, display the DDL (SQL) that was used to create a list of tables

Data manipulation

• Export data from the database to a file on a PC
• Import data from a PC file directly to the database
• Create a historical record of the submitted SQL with timings and status information, such as success or failure
• Use the Database Explorer Tree to easily view database objects

Queries

• Use SQL syntax examples to help compose your SQL statements
• Send queries to any ODBC database or the same query to many different databases
• Limit data returned to prevent runaway queries

Teradata stored procedures

Use a procedure builder that gives you a list of valid statements for building the logic of a stored procedure, using Teradata syntax

Teradata SQL Assistant electronically records your SQL activities with data source identification, timings, row counts, and notes. Having this historical data allows you to build a script of the SQL that produced the data. The script is useful for data mining.

Teradata Tools and Utilities - Preprocessor2

Teradata Tools and Utilities provides a preprocessing facility that lets you include SQL statements in your application programs. Preprocessor2 parses application code for SQL statements, converts the statements to Call-Level Interface (CLI) calls, and comments out the SQL statements. After Preprocessor2 processes the application code, you can submit the processed code to your client application language compiler. For more information about embedded SQL, see “Embedded SQL Applications” on page 6-3.

For More Information

For more information on the topics presented in this chapter, see the following Teradata Database and Teradata Tools and Utilities Foundation books:

IF you want to learn more about…

THEN see…

Archive utilities

Teradata Archive/Recovery Utility Reference

BTEQ

Basic Teradata Query Reference

Database Query Log

• Database Administration
• Data Dictionary
• Performance Optimization
• Teradata Visual Explain User Guide

Embedded SQL

• Teradata Preprocessor2 for Embedded SQL Programmer Guide
• SQL Reference: Data Manipulation Statements

General Teradata Database software architecture

Database Design

Load and unload utilities

• Teradata FastExport Reference
• Teradata FastLoad Reference
• Teradata MultiLoad Reference
• Teradata Parallel Data Pump Reference

Priority Scheduler

Utilities

Query Capture Database

• Database Design
• Teradata Manager User Guide
• SQL Reference: Statement and Transaction Processing

Teradata Database management utilities

Utilities

Teradata Dynamic Query Manager

• Teradata Dynamic Query Manager Administrator Guide
• Teradata Dynamic Query Manager User Guide

Teradata Index Wizard

Teradata Index Wizard User Guide

Teradata Manager

Teradata Manager User Guide

Teradata SQL

• SQL Reference: Fundamentals
• Teradata SQL Assistant for Microsoft Windows User Guide

Teradata SQL Assistant

Teradata SQL Assistant for Microsoft Windows User Guide

Teradata System Level Emulation

• Database Design
• SQL Reference: Data Definition Statements
• SQL Reference: Statement and Transaction Processing
• Teradata System Emulation Tool User Guide

Teradata Visual Explain

Teradata Visual Explain User Guide

Chapter 17: Security and Integrity

This chapter describes security and integrity for the Teradata Database. Topics include:
• Security and integrity
• Resource access control
• Encryption
• Password security features
• SQL used to control logon
• Security policies and physical access control

The descriptions include both client and server security and Teradata Database user privileges.

Security and Integrity

Security is the protection of data against unauthorized access. You can secure programs and data by issuing identification numbers and passwords to authorized users of a computer. The operating system can check passwords to prevent users from logging onto the system in the first place, or the system can check passwords in software, such as in a database, where each user is assigned an individual view of the database. Although you can take precautions to detect an unauthorized user, determining if a valid user is performing unauthorized tasks is extremely difficult.

Integrity is the process of preventing accidental erasure or corruption of data in a database.

System Integrity

The Teradata Database provides support for referential integrity to ensure that every foreign key in a referencing table matches a primary key in a referenced table. Users may also provide their own facilities for monitoring referential integrity in the Teradata Database. For more information about referential integrity, see “Referential Integrity” on page 12-8.

You can also write macros and stored procedures that enforce the referential integrity of each table in your system. For more information about macros and stored procedures, see Chapter 11: “Other Database Objects.”
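The referential integrity condition (every foreign key in a referencing table matches a primary key in a referenced table) can be checked with a set lookup. The Python sketch below models tables as lists of dictionaries purely for illustration. The function name orphaned_rows is hypothetical, and treating a missing (None) foreign key as acceptable is an assumption, not something this chapter states.

```python
def orphaned_rows(referencing_rows, fk_column, referenced_rows, pk_column):
    """Return rows whose foreign key matches no primary key in the referenced table."""
    primary_keys = {row[pk_column] for row in referenced_rows}
    return [
        row for row in referencing_rows
        # Assumption: a row with no foreign key value does not violate integrity.
        if row[fk_column] is not None and row[fk_column] not in primary_keys
    ]
```

An empty result means the referencing table satisfies the condition; any returned rows are the ones a monitoring macro or stored procedure would need to report.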

System Security

The five categories of solutions for system security are:

Category

Description

Resource access control

Software-enforced access restrictions

Physical access control

Restrictions detailed in a formalized security policy

Encryption

Logon and network data encryption

Security policy

A sound, well-enforced data center security policy

Auditing and accountability

System auditing of security-related user actions

These categories are discussed in the following sections.

Resource Access Control

This section introduces the Teradata Database software tools you can use to enforce access restrictions. These tools include:
• User identifiers (user names)
• Channel or network (LAN) identifiers (host or client identifiers)
• Logon policies
• TDP user security interface
• Client security
• Single Sign On

User Identifiers

Teradata access control is based on a user identifier. The security administrator can optionally enforce access control based on a channel- or network-client identifier as well. A user name is the name defined in a CREATE USER statement. The security administrator must perform one CREATE USER statement for each authorized user in order to establish the user name, define its password, and allocate user disk space. The DBase table stores user names and database names and resides in the space allocated to a system user named DBC. You can retrieve information about user names from the DBC.DBase table by querying the system view named DBC.Users.

Client Identifiers

Any number of different client types can connect to the Teradata Database server. Each connection must have its own unique client identifier. You use the Configuration utility to assign each connection a unique value and define the value to the Teradata Database. Each defined value functions as a client identifier or hostid.

Logon Policies

Users must issue a logon request so that the Teradata Database can identify the user and establish a session. The logon string must include a user name that is already established in the system DBase table.

The logon string may also include any combination of the following operands:

Operand

Definition

tdpid

Each copy of the TDP on a given channel-attached client is assigned a unique tdpid to identify it. The tdpid is a client-based operand and is not transmitted to the Teradata Database.

password

A password authenticates a user request to initiate a Teradata session under the supplied user name.

To create a password: The security administrator can use the CREATE USER statement to establish a password for a user. The default is that the password must appear in the user logon string.

To log on without a password: If you enable the security administrator user, the security administrator can issue a GRANT LOGON statement containing the WITH NULL PASSWORD option for the user.

On IBM mainframe clients…
TDP security user exit TDPLGUX must acknowledge that the logon string is valid without a password.

On Microsoft Windows 2000 clients…
Single Sign On provides the ability to use industry standard network authentication to identify users. For information about this feature, see “Single Sign On” on page 17-7.

Note: Because the null password applies only to logging onto the Teradata Database, all other system security measures continue to be enforced.

acctid

The account id can be used for resource accounting. Each user name may have one or more acctids. The logon processor assigns a default value for the acctid if it detects none in the logon string for a user. The acctid can also contain a priority-level prefix that can be used when interactive users are competing for system resources with long-running batch jobs.

TDP Security

IBM mainframe clients running either MVS or VM have the option of enforcing security at the TDP level using tdpids. The TDP provides a user logon exit called TDPLGUX which you can embed in a user-written routine to process logon requests. Using TDPLGUX, you can reject, accept, provide, or modify any logon request to the Teradata Database.

TDPLGUX also permits users to set any of the following options:
• No logon string (implicit logon)
• A user id for which the user routine provides a password
• A user id that can be validated to require no password

You can use TDPLGUX alone or in conjunction with any security package such as:
• RACF
• CA-ACF2
• CA-TOP SECRET

Single Sign On

The Single Sign On feature allows users of the Teradata Database on Microsoft Windows 2000 systems to access Teradata Database based on their authorized network usernames and passwords. This feature simplifies the logon procedure, which otherwise requires users to enter an additional username and password when running client applications that access the database.

For the Single Sign On feature to work, it must be enabled on the Teradata Database server as well as on the Teradata Gateway. The database administrator can turn Single Sign On OFF or ON for the database using one of the following:
• Teradata Database Window (DBW) (DBW is the preferred way.)
• DBS Control utility

To turn the feature ON or OFF on the Teradata Gateway, the database administrator can use the Gateway Global utility.

Note: Single Sign On to the Teradata Database is not available on NCR UNIX MP-RAS systems.

Authentication can be accomplished in several ways. A field in the Gateway Global utility indicates the authentication method the client used to log on to the database. The Authentication field has four values:

IF the field contains…

THEN authentication was provided by…

DATABASE

the database. This was the method used before Single Sign On was implemented.

NEGOTIATE

Windows Negotiate.

NTLM

Windows NTLM.

KERBEROS

Windows Kerberos.

Single Sign On provides the following benefits:
• Enhances site security because authentication mechanisms do not send passwords across the network
• Supports the use of alternative security mechanisms that automate logon by eliminating the need for an application to declare or store a password on the client system
• Saves time

Encryption

Teradata enhances security between the Teradata Database and network-attached clients by implementing encryption. Call-Level Interface version 2 (CLIv2) supports encryption. Other interface products included in Teradata Tools and Utilities, such as ODBC and JDBC type-4, support only logon encryption, which is a subset of network data encryption.

The encryption feature supports the following:
• Network data encryption
• Logon encryption

Network Data Encryption

Teradata Tools and Utilities supports network data encryption between client applications and the following:
• The Teradata Gateway on Microsoft Windows 2000
• NCR UNIX MP-RAS systems

A client application can enable or disable network data encryption for the duration of a request by setting the data_encryption flag in dbcarea. When the flag is set to Y, network traffic is encrypted in both directions between the client application and the Teradata Gateway. Clients that do not support encryption on a request-by-request basis can take advantage of network data encryption by enabling encryption on a global rather than a request basis. To accomplish global encryption, the clispb.dat file associated with the client application must have data_encryption=Y.

Logon Encryption and the Teradata Gateway

The client application does not enable or disable logon encryption. Encryption is determined by the settings of the Teradata Gateway, which is the target security domain. The database administrator (owner of this security domain) can control encryption using an option in the Gateway Control utility.

When operating under default conditions, the Teradata Gateway accepts only encrypted logons and rejects unencrypted ones. For the gateway to accept both encrypted and unencrypted logons, the database administrator must set a Gateway Control option to yes.


Security Features

You can use a number of attributes to enhance Teradata Database password security.

Password Attributes

The following table lists and describes password security attributes:

Password Attribute                                  Description
Expiration                                          Defines a time span during which the password is valid. After that duration, the user must change the password.
Number of characters, digits, special characters    Restricts the number of characters, digits, or special characters permitted in a password.
Maximum logon attempts                              Defines the number of consecutive erroneous logon attempts permitted before the user is locked out from further attempts.
Lockout time                                        Sets the time duration of the user lockout after the user has exceeded the maximum number of erroneous logon attempts.
                                                    Note: An administrator can do the following:
                                                    • Set the lockout duration by specifying a value of up to 32000 minutes (about 22 days)
                                                    • Lock out the user indefinitely
                                                    • End an existing lockout
Reuse                                               Defines the time span that must elapse before you can reassign a previously used password.

The DBC.SysSecDefaults table stores password attributes for the Teradata Database. Teradata Database passwords are encrypted and stored in the PasswordString field of the DBC.DBase table.


User-Level Password Attributes

You can assign the following eight password security attributes in a user profile:

• Password Expiration
• Password MinChar
• Password MaxChar
• Password Digits
• Password SpecChar
• MaxLogonAttempts
• LockedUserExpire
• Password Reuse

The administrator assigns users to the profile, thus effectively implementing password security at the user level. To learn more about simplifying system administration using capabilities in roles and profiles, see “Roles and Profiles for Users” on page 18-6.
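As an illustration of the attributes listed above, the following is a minimal sketch of a profile that sets all eight password attributes and is then assigned to a user. The profile name fa_users and every value shown are hypothetical. The units in the comments (days for expiration and reuse, minutes for the lockout) are assumptions to verify against the SQL Reference for your release.

```sql
-- Hypothetical profile: all names and values are illustrative only.
CREATE PROFILE fa_users AS
   PASSWORD = (EXPIRE = 90,             -- password expiration (assumed: days)
               MINCHAR = 8,             -- minimum number of characters
               MAXCHAR = 30,            -- maximum number of characters
               DIGITS = 'Y',            -- digits permitted
               SPECCHAR = 'N',          -- special characters not permitted
               MAXLOGONATTEMPTS = 3,    -- erroneous logons before lockout
               LOCKEDUSEREXPIRE = 60,   -- lockout duration (assumed: minutes)
               REUSE = 180);            -- time before reuse (assumed: days)

-- Assign the profile to an existing user.
MODIFY USER Jones AS PROFILE = fa_users;
```

Because the attributes live in the profile, changing a value there affects every user assigned to the profile.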

DBC.DBase Table

Teradata stores password information in encrypted form in the DBC.DBase system table. The table contains the date and time a user defined a password, along with the encrypted password. An administrator can modify passwords temporarily when the PasswordLastModDate plus a fixed number has been reached. This allows you to ensure that users change their passwords regularly.

Passwords are always encrypted. The PasswordString column of the DBC.DBase table displays encrypted passwords. The password is never decrypted.


SQL Used to Control Logon

The Teradata Shared Information Architecture (SIA) allows multiple clients to connect to the Teradata Database simultaneously. By default, the system grants logon permission to all users from all connections. However, the Teradata Database provides tools for restricting logons from specific clients.

Use the statements GRANT LOGON and REVOKE LOGON to associate specific user names with specific client (host) ids. You can only grant logons using GRANT LOGON if the user is already created in the Teradata Database and if the client (host) id corresponds to a value assigned to a network- or channel-connection by the Teradata Database. You can retract the privileges granted by a GRANT LOGON statement by using the REVOKE LOGON statement.
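The following sketch shows how GRANT LOGON and REVOKE LOGON might be combined to restrict a user to one client connection. The host id 207 is a hypothetical value; a real host id must match one that the Teradata Database has assigned to a connection, and user Jones must already exist.

```sql
-- Withdraw the system default that lets all users log on from all hosts.
REVOKE LOGON ON ALL AS DEFAULT;

-- Permit user Jones to log on from the client with host id 207
-- (hypothetical host id).
GRANT LOGON ON 207 TO Jones;

-- Later, retract the permission granted above.
REVOKE LOGON ON 207 FROM Jones;
```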

Data Access Control

The first level of access to the Teradata Database is at the level of the user and the database. This section discusses explicit access rights as controlled by the GRANT and REVOKE statements. These statements grant or remove from a user or group of users one or more privileges on a database, user, table, view, stored procedure, or macro.

You must be an owner of the object being controlled, or must have GRANT/REVOKE privileges to the object, before you can submit GRANT or REVOKE statements. If the object is a view, stored procedure, or macro, then the owner must also have the GRANT privilege and any other applicable privileges on the object or objects referenced by the view, stored procedure, or macro. You cannot grant more privileges on an object than you have yourself on that object.

When a user explicitly grants privileges to another user or database, certain rules determine whether, how, and on what object the requested privilege is implemented.
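A minimal sketch of explicit access rights follows. The table Personnel.Employee and the user Smith are hypothetical names introduced only for this example. The submitting user is assumed to own the objects or to hold the corresponding privileges with the right to grant them.

```sql
-- Give Jones read and insert access to a single table.
GRANT SELECT, INSERT ON Personnel.Employee TO Jones;

-- Give Smith read access to everything in the Personnel database and
-- allow Smith to pass that privilege on to others.
GRANT SELECT ON Personnel TO Smith WITH GRANT OPTION;

-- Remove one of the privileges granted to Jones.
REVOKE INSERT ON Personnel.Employee FROM Jones;
```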

Ownership and Implicit Rights

As an owner of an object, you have implicit rights on the object. These rights allow you access to the object in certain cases even when you do not have explicit rights for the object.


System Views for Access Information

The Teradata Database supplies numerous system views for accessing information in the Data Dictionary. These views provide information about users, access rights, and grant, logon, and access activities. For details about views in the Data Dictionary, see “Teradata Database Data Dictionary Views” on page 9-6.


Security Policies and Physical Access Control

You can use the following methods to ensure the security of physical access to your Teradata Database and the hardware on which it runs.

Principal Considerations of a Security Policy

The principal consideration for physical access control is establishing a security policy. The security policy is based on identification of:

• Security needs
• Policies and procedures to meet those needs

Key Implementation Elements of a Security Policy

The security policy for your Teradata Database should include two essential implementation elements:

• System-enforced security features
• Personnel-enforced security features

You should write a set of security policies and procedures to be distributed to all users of the system. Among the topics you should cover in this document are:

• Why security is needed
• Benefits of the security policy for the users and for the company
• Suggested security actions for users to follow
• Required security actions for users to follow


Auditing and Accountability

You can periodically audit events on Teradata Database to detect the following security hazards:

• Potential break-ins
• Attempts to gain unauthorized access to database resources
• Attempts to alter the behavior of Teradata Database auditing facilities

Teradata Database automatically audits all logon and logoff activity. However, you can specify additional audits of attempts to access data by configuring the system to log one or any combination of the following:

• All access requests made (for all or specific users)
• All access requests denied (for all or specific users)
• Specific types of access requests made (for all or specific users)

You can examine or print the audit data during normal system operations, or archive the data to review offline and generate reports. You can use SQL to select data from the audit log during normal operations.

If you identify unauthorized or undesirable activity, you can take one or more of the following remedial actions to address the problem:

• Change the security policy
• Change compromised passwords
• Audit intensively all actions of particular users
• Change access rights
• Deny the offending users any access to Teradata Database (in extreme cases)
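The audit options described in this section can be sketched with the BEGIN LOGGING statement and the Data Dictionary views DBC.AccessLog and DBC.LogOnOff. This is an illustrative sketch only: the table Personnel.Employee is a hypothetical name, and the exact logging options available should be confirmed in Security Administration.

```sql
-- Log every access request that is denied, for all users.
BEGIN LOGGING DENIALS ON EACH ALL;

-- Log each UPDATE and DELETE that user Jones submits against one table,
-- and keep the text of the request.
BEGIN LOGGING WITH TEXT ON EACH UPDATE, DELETE
   BY Jones ON TABLE Personnel.Employee;

-- Review the logged access checks during normal operations.
SELECT * FROM DBC.AccessLog;

-- Review the logon and logoff activity that the system audits automatically.
SELECT * FROM DBC.LogOnOff;
```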


For More Information

For more information on the topics presented in this chapter, see the following Teradata Database and Teradata Tools and Utilities books:

IF you want to learn more about…          THEN see…
Auditing                                  Security Administration
C2 level or equivalent security           Security Administration
Client (TDP) security                     Teradata Director Program Reference
Database Window                           Database Window
DBS Control utility                       Utilities
Gateway Global utility                    Utilities
GRANT logon permissions                   Security Administration
Security administration and security      Security Administration
System views related to security          Data Dictionary
Tables in the Data Dictionary             Data Dictionary


Chapter 18: System Administration

This chapter discusses space allocation, roles and profiles, accounting, and maintenance on the Teradata Database as they relate to system administration. Topics include:

• Space allocation for databases and users
• Roles and profiles for users
• Accounting
• Maintenance utilities


Space Allocation for Databases and Users

Space allocation for the Teradata Database relates not only to the disk space that databases require, but to the space required to define users. In the Teradata Database, a database is a collection of related tables, views, stored procedures, and macros. A database also contains an allotment of space from which users can create and maintain their own tables, views, macros, stored procedures, or other users or databases.

A database and a user are almost the same thing in the Teradata Database. The difference is that a user can log on to the system whereas the database cannot. A user identifies someone who can log on to both the system and a database.

Databases and Users

When the Teradata Database is first installed on a server, only one user exists on the system: user DBC. The database administrator typically manages this user and assigns space from the user DBC to all other organizations. The user DBC owns all other databases and users in the system.

To protect the security of system tables within the Teradata Database, the database administrator typically creates a database administrator user from DBC. The usual procedure is to assign all database disk space that system tables do not require to the new administrator database. The database administrator then uses this database as a resource from which to allocate space to the databases and users of the system.

How to Create a Finance and Administration Database

When you create a new database or allocate space to a user, the system assigns disk space from the space belonging to an existing database or user. The creating database (or user) is the owner of the new database (or user space). The owner permanently grants a specified amount of space to the new database or user, which is then subtracted from the total unused space available to the owner.

Consider the following scenario: the database administrator needs to create a Finance and Administration (F&A) department database with user Jones as a supervisory user, or database administrator within the F&A department. The database administrator first creates the F&A database, then allocates space from it to Jones to act as the F&A database administrator. The database administrator also allocates space from F&A to Jones for his personal use and for creating a Personnel database, as well as other databases, and other user space allocations.


The following figure shows the hierarchy of this relationship.

[Figure HD08B001: ownership hierarchy, from top to bottom: DBC User/Database; System Administrator User/Database; F&A Database; then Personnel Database, User Jones, Other Department Database, and Other Users and Databases for the Department.]

The F&A Database owns Personnel and all the other department databases. F&A also owns User Jones and all other users within the department. Because the user DBC ultimately owns all other databases and users, it is the final owner of all the databases and user space belonging to the organization. This hierarchical ownership structure provides the owner of a database or user space with complete control over the security of owned data. The owner can archive the database or control access to it by granting or revoking privileges on it. For more information on granting and revoking access privileges, see Chapter 17: “Security and Integrity.”
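The ownership hierarchy and the space each owner has handed out can be inspected through Data Dictionary views. The sketch below assumes the DBC.DiskSpace and DBC.Children views with the column names shown, and uses Administration as a hypothetical owner name; check the Data Dictionary book for the exact view definitions in your release.

```sql
-- Permanent space allotted to, and currently used by, each database or
-- user. The view reports one row per AMP vproc, so the values are summed.
SELECT DatabaseName,
       SUM(MaxPerm)     AS AllottedPerm,
       SUM(CurrentPerm) AS UsedPerm
FROM DBC.DiskSpace
GROUP BY DatabaseName
ORDER BY DatabaseName;

-- Every database and user owned, directly or indirectly, by one owner.
SELECT Child
FROM DBC.Children
WHERE Parent = 'Administration'
ORDER BY Child;
```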


How to Create Databases

The previous section explains the concept of databases and users in the Teradata Database environment. This section explains the mechanics of how to create a database from DBC.

Before you can create tables, views, users, stored procedures, or macros, you must first create a database. Use the SQL statement CREATE DATABASE to create a database. The following example shows the SQL statement used to create the Personnel database from the Administration database:

    CREATE DATABASE Personnel FROM Administration AS
       PERMANENT = 5000000 BYTES,
       FALLBACK,
       BEFORE JOURNAL,
       DUAL AFTER JOURNAL,
       DEFAULT JOURNAL TABLE = Personnel.FinCopy;

The Personnel database is created from the space available in Administration. The 5000000 value represents bytes of storage. To create a database, the initiator must have CREATE DATABASE privileges on the FROM entry. In this example, the initiator must have CREATE DATABASE privileges on Administration. The new database receives all privileges that have been granted to the initiator. The FALLBACK keyword specifies that a duplicate copy of each table is stored in addition to the original for each table created in the Personnel database. The JOURNAL option specifies that a single copy of the before change image and dual copies of the after change image are maintained for each data table. A duplicate before change image is maintained automatically for any table in this database that uses both the fallback and the journal defaults. The DEFAULT JOURNAL TABLE clause is required because journaling is requested. This clause specifies that a new journal table named “FinCopy” is to be created in the new database.

How to Create Users

This section explains the mechanics of how to create a user. The SQL statement for creating a user is CREATE USER. The statement authorizes a new user identification (user name) for the database and specifies a password for user authentication. Because the system creates a database for each user, the CREATE USER statement is very similar to the CREATE DATABASE statement.


The following example shows the SQL statement used to create user Jones in the F&A database:

    CREATE USER Jones FROM F&A AS
       PERMANENT = 1000000 BYTES,
       SPOOL = 1000000 BYTES,
       PASSWORD = Jan,
       FALLBACK,
       ACCOUNT = 'Administration',
       STARTUP = 'DATABASE F&A;';

The optional STARTUP clause specifies one or more Teradata SQL statements that the system can execute automatically when the user establishes a session. Any user who submits this statement must have the CREATE USER privilege on the owner database or be its owner. The system automatically grants the new user all privileges on tables, views, and macros created in this space. The new user gets only the DROP PROCEDURE privilege on the stored procedure objects created in this space.

Note: In the Microsoft Windows environment, the Single Sign On feature negates the need for users to enter user names, passwords, and account ids. For more information about this feature, see “SQL Used to Control Logon” on page 17-12.


Roles and Profiles for Users

The task of system administration can be simplified using the features provided by roles and profiles. You may think of a role as a pseudo-user with privileges on a number of database objects. A profile can be viewed as a container that holds a set of parameters, such as database, spool space, temporary space, and accounts, to which the system administrator assigns certain values. After creating roles and profiles, the system administrator assigns them to users.

Roles and profiles simplify system administration by:

Using…                                                                                       Simplifies administration because…
roles to automatically grant rights to database objects to all users assigned to the role    when users change jobs within their organizations, changing roles is far easier than deleting old rights and granting new rights that go along with their new jobs.
profiles to efficiently change the parameter values associated with users                    you change a parameter value once in the profile instead of updating the value for each user.

Teradata allows you to make all roles available to a user in one of the following ways:

• By submitting a SET ROLE ALL statement during the current session
• Upon logon, when the default role of the user was set to ALL through a CREATE USER or MODIFY USER statement

Having access to the privileges in all roles is useful when validating access rights.

To learn more about using profiles to ensure password security, see “Encryption” on page 17-9.
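The following sketch illustrates how a role and a profile might be created, assigned, and later changed in one place. The names fa_analyst and fa_defaults and the space values are hypothetical and serve only to illustrate the statements.

```sql
-- A role collects privileges once; users receive them through the role.
CREATE ROLE fa_analyst;
GRANT SELECT ON Personnel TO fa_analyst;
GRANT fa_analyst TO Jones;

-- Make all roles granted to Jones available at logon ...
MODIFY USER Jones AS DEFAULT ROLE = ALL;
-- ... or enable them only for the current session.
SET ROLE ALL;

-- A profile holds parameter values shared by its users.
CREATE PROFILE fa_defaults AS
   DEFAULT DATABASE = Personnel,
   SPOOL = 2000000 BYTES;
MODIFY USER Jones AS PROFILE = fa_defaults;

-- Changing the profile once changes the value for every assigned user.
MODIFY PROFILE fa_defaults AS SPOOL = 4000000 BYTES;
```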


Accounting

This section describes the accounting options available for the Teradata Database. Among the areas covered are:

• Session management
• Account usage
• Account performance groups

Session Management

Users must log on to the Teradata Database and establish a session before they can do any accounting.

Establishing a Session

To establish a session, the user logs on to the database. The procedure varies depending on the client system, the operating system, and whether the user is an application program, or a user in an interactive terminal session using BTEQ or a third-party query processing product.

Logon Operands

The logon string can include any of the following operands:

• Optional identifier for the database, called a tdpid
• User name
• Password
• Optional account number

Note: In the Windows environment, the Single Sign On feature negates the need for users to enter usernames, passwords, and account ids after they have logged on using their authorized user names and passwords. For more information about this feature, see “SQL Used to Control Logon” on page 17-12.

Session Requests

A session is established after the database accepts the user name, password, and account number and returns a session number to the process.

Subsequent Teradata SQL requests generated by the user and responses returned from the database are identified by:

• Host id
• Session number
• Request number

The database supplies the identification automatically for its own use. The user is unaware that it exists. The context for the session also includes a default database name that is the same as the user name. When the session ends, the system discards the context and accepts no further Teradata SQL statements from the user.
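Although the system manages these identifiers itself, parts of the session context can be viewed with SQL. The sketch below assumes the built-in functions USER, ACCOUNT, SESSION, and DATABASE and the DBC.SessionInfo view with the column names shown; treat the column list as an assumption and confirm it in the Data Dictionary book.

```sql
-- Report the user name, account string, session number, and default
-- database of the current session.
SELECT USER, ACCOUNT, SESSION, DATABASE;

-- List the sessions currently logged on, with host id and session number.
SELECT UserName, AccountName, LogicalHostId, SessionNo, DefaultDataBase
FROM DBC.SessionInfo
ORDER BY LogicalHostId, SessionNo;
```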

Account String Expansion

The principal Teradata Database feature for accounting is the optional Account String Expansion (ASE) capability. You must modify user logon strings in order to use ASE. To enable ASE, you establish one or more account identifiers for users when the users are created or modified. When the users log on, they must supply an account identifier as a part of the logon string. The users may enter the identifier explicitly, or the system will supply an identifier by default.

Each time the system determines that a new account string is in effect, it begins collecting new AMP usage and I/O statistics. The system stores the accumulated statistics for a user/account string pair as a row in the DBC.AMPUsage table in the Data Dictionary. Each user/account string pair results in a new set of statistics and an additional row. You can use this information in capacity planning or in chargeback and accounting software.

At the finest granularity, ASE can generate a summary row for each SQL request. You can also direct ASE to generate a row for each user, each session, or for an aggregation of the daily activity for a user.

ASE permits you to use substitution variables to include date and time information in the account id portion of a user logon string. The system inserts actual values for the variables at Teradata SQL execution time.
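A minimal sketch of ASE follows. It assumes that &D (date) and &H (hour) are among the available substitution variables, uses FIN as a hypothetical account prefix, and assumes the column names shown for DBC.AMPUsage; verify all three in Database Administration and the Data Dictionary book.

```sql
-- Give user Jones an account string that expands to include the date and
-- the hour (assumed variables &D and &H; FIN is a hypothetical prefix).
MODIFY USER Jones AS ACCOUNT = ('FIN&D&H');

-- Summarize accumulated usage for each user/account string pair. With the
-- account string above, this yields one row per user per hour of activity.
SELECT UserName,
       AccountName,
       SUM(CpuTime) AS TotalCpuTime,
       SUM(DiskIO)  AS TotalDiskIO
FROM DBC.AMPUsage
WHERE UserName = 'Jones'
GROUP BY UserName, AccountName
ORDER BY AccountName;
```

Including more variables in the account string produces more rows with finer granularity, so the choice is a trade-off between detail and the size of the accumulated usage data.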

Account Performance Groups

Resource partitions divide system resources for allocation to major user groups. Each session is assigned, either explicitly or implicitly, to a performance group, and each performance group is assigned a proportional resource weight. This allows administrators to control resources of the group rather than individual users based either on time of day or resource consumption.


The Priority Scheduler is used to manage the workload based on the relative priority of the resource weight of each group. This weight does not guarantee system responsiveness in a corresponding proportion because responsiveness is a function of overall system activity. When an account id prefixed with a group code is provided in a LOGON string, the session is assigned to the associated performance group when the logon is successful. If this form of account id is not present, the session is assigned a default value that corresponds to a medium priority for the default performance group.
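As a sketch of account ids prefixed with a group code, the statement below gives a user two accounts. The prefixes $M and $H are assumed here to select medium and high priority performance groups, and Finance is a hypothetical account name; the actual group codes depend on how Priority Scheduler is configured at your site.

```sql
-- Two accounts for user Jones. The first one listed is used by default
-- when the logon string does not name an account.
-- Assumption: $M and $H select medium and high priority groups.
MODIFY USER Jones AS ACCOUNT = ('$MFinance', '$HFinance');

-- After logging on, confirm which account string the session is using.
SELECT ACCOUNT;
```

To run a session in the other performance group, the user supplies that account id, for example '$HFinance', as the account operand of the logon string.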


Maintenance Utilities

A large number of utilities are available to perform maintenance functions on the Teradata Database. Most, but not all, utilities are invoked from the Database Window (DBW). The following table lists the Teradata Database utilities.

The utility …                           Allows you to …
Abort Host                              abort all outstanding transactions running on a failed host until the system restarts the host.
CheckTable                              check for inconsistencies between internal data structures, such as table headers, row identifiers, and secondary indexes.
ampload                                 display the load on all AMP vprocs in a system, including the number of:
                                        • Available AMP worker tasks (AWTs)
                                        • Messages waiting (message queue length)
cnsrun                                  start and run a database utility from a script.
Configuration                           define AMPs, PEs, and hosts and their interrelationships for a Teradata Database.
ctl                                     display and modify the fields of the Parallel Database Extensions (PDE) Control Parameters Globally Distributed Objects (GDOs).
                                        Note: ctl is a Windows 2000 utility.
Database Initialization Program (DIP)   execute one or more of the standard DIP Structured Query Language (SQL) scripts packaged with Teradata Database.
DBS Control                             interactively display and modify the DBS Control Record fields.
Dump Unload/Load (DUL)                  save or restore system dump tables onto tape.
Ferret                                  do the following:
                                        • Define the scope of an action, such as a range of selected tables or vprocs
                                        • Display the parameters and scope of the action
                                        • Perform the action, by either moving data to reconfigure data blocks and cylinders, or displaying disk space and cylinder free space percent in use of the defined scope
Filer                                   find and correct problems within the Teradata File System.

fsgwizard                               manipulate Teradata Database file segments that have been placed in an errored state.
                                        Note: This is a UNIX MP-RAS utility.
Gateway Control                         modify default values in the fields of the Gateway Control Globally Distributed Object (GDO).
Gateway Global                          monitor and control the Teradata network-connected users and their sessions.
Lock Display                            view a snapshot capture of all real-time database locks and their associated currently running sessions.
Locking Logger                          log the following:
                                        • Transaction identifiers
                                        • Session identifiers
                                        • Lock object identifiers
                                        • Lock levels associated with executing SQL statements
modmpplist                              modify the node list file (mpplist).
Priority Scheduler                      prioritize process scheduling. Processes have an externally assigned priority associated with their Teradata Database session. Priority Scheduler uses the priority to allocate CPU and I/O resources.
Query Configuration                     report the current Teradata Database configuration, including the node, AMP, and PE identification and status.
Query Session                           monitor the state of all or selected Teradata Database sessions on all or selected logical host ids.
Reconfiguration                         use the component definition created by Configuration to establish an operational Teradata Database.
Reconfiguration Estimator               estimate an elapsed time for reconfiguration based upon the number and size of tables on your current system, and provide estimates for the following phases:
                                        • Redistribution
                                        • Deletion
                                        • Nonunique secondary index (NUSI) building
Recovery Manager                        display information used to monitor progress of a Teradata Database recovery.
Resource Check Tools                    do the following:
                                        • Identify a slow-down or hang of the Teradata Database
                                        • Display system statistics that could lead to the cause of the slow-down or hang

RSSmon                                  do the following:
                                        • Display PDE real-time resource usage per node
                                        • Select relevant data fields from a specific Resource Sampling Subsystem (RSS) table to be examined for PDE resource usage monitoring purposes
                                        Note: This is a UNIX MP-RAS utility.
Showlocks                               display locks placed by Archive and Recovery and Table Rebuild operations on databases and tables.
System Initializer                      do the following:
                                        • Initialize the Teradata Database
                                        • Create or update the DBS Control Record and other Globally Distributed Objects (GDOs)
                                        • Initialize or update configuration maps
                                        • Set hash function value in the DBS Control Record
Table Rebuild                           rebuild tables that the Teradata Database cannot automatically recover, including the following:
                                        • Primary or fallback portion of a table
                                        • An entire table
                                        • All tables in a database
                                        • All tables in an Access Module Processor (AMP)
                                        Table Rebuild can be run as an interactive or a background task.
tdlocaledef                             convert the Source Specification for Data Formatting (SDF) into an internal form usable by Teradata Database.
tdnstat                                 do the following:
                                        • Perform GetStat/ResetStat operations
                                        • View, get, or clear the Teradata Network Services specific statistics
tdntune                                 perform a read/write of tdn tunables. You can use the interface to view, get, or update the tunable parameters that are specific to Teradata Network Services.
Teradata MultiTool                      use a Windows Graphical User Interface (GUI) to run command-line-based Teradata Database and PDE tasks.
                                        Note: This is a Windows 2000 utility.
TPCCONS                                 perform the following 2PC-related functions:
                                        • Display a list of coordinators that have in-doubt transactions
                                        • Display a list of sessions that have in-doubt transactions
                                        • Resolve in-doubt transactions

tsklist                                 display information about PDE processes and their tasks.
                                        Note: This is a Windows 2000 utility.
Update DBC                              recalculate the PermSpace and SpoolSpace values in the DBASE table for the user DBC and the MaxPermSpace and MaxSpoolSpace values of the DATABASESPACE table for all databases based on the values in the DBASE table.
Update Space                            recalculate the permanent, temporary, or spool space used by a single database or by all databases in a system.
vpacd                                   improve the performance of systems with several CPUs and a high level of concurrency.
                                        Note: This is an NCR UNIX MP-RAS utility.
Vproc Manager                           manage the virtual processors (vprocs), such as obtain status of all or some vprocs, initialize vprocs, force a vproc restart, and force a Teradata Database restart.
xctl                                    display and modify the fields of the Parallel Database Extensions (PDE) Control Parameters Globally Distributed Objects (GDOs).
                                        Note: This is a UNIX MP-RAS utility.
xmppconfig                              manipulate the contents of the node table file, which contains a list of nodes and their configurations. The system configuration information is provided to the Procedural Management Subsystem (PROC) of PDE.
                                        Note: This is a UNIX MP-RAS utility.
xperfstate                              display real-time performance data for a PDE system, including system-wide CPU utilization, system-wide disk utilization, and more.
                                        Note: This is a UNIX MP-RAS utility.
xpsh                                    use a GUI front-end for performing various system-level tasks in an MPP system environment, such as debugging, analyzing, monitoring, system administration, and so forth.
                                        Note: This is a UNIX MP-RAS utility.


For More Information

For more information on the topics presented in this chapter, see the following Teradata Database and Teradata Tools and Utilities books:

IF you want to learn more about…            THEN see…
Accounting                                  Database Administration
Archive and recovery utilities              Teradata Archive/Recovery Utility Reference
CREATE DATABASE statement                   SQL Reference: Data Definition Statements
Maintenance utilities                       Utilities
Roles and Profiles for Users                Database Design
                                            SQL Reference: Data Definition Statements
                                            SQL Reference: Functions and Operators
                                            SQL Reference: Fundamentals
Space Allocation for Databases and Users    Database Administration


Chapter 19: System Monitoring

This chapter discusses various aspects of monitoring the Teradata Database, including the monitoring tools used to track system and performance issues. Topics include:

• Teradata Manager
• System and configuration status through the Database Window
• Resource usage monitoring
• Performance monitoring


Teradata Manager

Teradata Manager is a suite of management tools and applications available in Teradata Tools and Utilities. You can use them to monitor, control, and administer one or more Teradata Database servers. The suite of performance monitoring applications collects, queries, manipulates, and displays performance and usage data. This information allows you to quickly identify and resolve resource usage abnormalities. Teradata Manager can display dynamic and historical data in graphical and tabular formats.

The client/server feature of Teradata Manager replicates performance data in the Teradata Database server for access by any number of clients. Because data is collected once, workload on the Teradata Database remains constant while the number of client applications varies. You can access information from a desktop, laptop, or the Wireless Palm VII.


The information in the following table summarizes Teradata Manager control applications:

Function: Performance applications

Teradata Performance Monitor (PMON)
Provides seven functional areas for monitoring system activity:
• Configuration summary
• Performance summary
• Resource usage (both physical and virtual)
• Session and lock information
• Session history
• Control functions
• Graphic displays of resource and session data

Teradata Priority Scheduler Administrator
• Provides administrative capabilities for Teradata Priority Scheduler.
• Prevents bottlenecks and speeds responses to queries by automatically balancing the database workload.
• Ensures that queries requiring immediate handling are given priority treatment by letting the jobs cut in line ahead of lower priority work.

Centralized Alerts/Event Management
Facilitates the monitoring of performance characteristics and faults. It can automatically send a page or an e-mail when certain events occur.

Alert Policy Editor
Allows you to define actions and specify when action should be taken based on thresholds that you set for the following:
• Teradata Database performance metrics
• Database space utilization
• Messages in the database Event Log

Teradata Manager Function

Performance applications (continued)

Application/Description

The Alert Viewer Allows you to easily view system status for multiple systems. Trend Analysis Allows you to study Teradata Database resource utilization trends from summarized reports displayed as charts. You can do the following:

Database management applications

•

Detect resource usage abnormalities

•

Determine the onset of a problem

•

Analyze the impact of the problem on the system

Teradata Administrator Allows you to perform database administration tasks, such as: •

CREATE, MODIFY and DROP users or databases

•

CREATE tables (using ANSI- or Teradata-mode syntax)

•

GRANT or REVOKE access/monitor rights

•

COPY table, view, or macro definitions to another database or to another system

•

DROP or RENAME tables, views, or macros

•

Move space from one database to another

•

Run an SQL query

•

Display information about a database

•

Display information about a table, view, or macro

Space Usage Monitors disk space utilization and re-allocates permanent space from one database to another. System Maintenance Provides various macros for performing clean-up of system tables.

19 – 4

Introduction to Teradata Warehouse

Teradata Manager Function

Operational control

Application/Description

Session Information Monitors the status of sessions on Teradata. The status information includes: •

Idle

•

Active

•

Blocked

•

Responding

•

Parsing

•

Aborting

•

Details

•

Prolonged idles

Remote Console Allows you to run many of the Teradata console utilities from the Teradata Manager PC. Error Log Analyzer Provides an interface to view the error logs for an associated Teradata Database. LogOnOff Usage Presents daily, weekly, and monthly logon statistics. BTEQ Window (BTEQWIN) Provides a graphical Windows-type interface to BTEQ. Gives Teradata Manager applications a consistent, graphical interface. Access management

Allow you to manage security access to the database using the features of Teradata Administrator and Profile capabilities. Teradata Administrator establishes account and privilege assignments that control access to the Teradata Database. Profile capabilities allow you to create user profiles that define who can access certain Teradata Database and Teradata Manager applications.
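To illustrate the threshold-based alerting described for the Alert Policy Editor, the following is a minimal Python sketch, not Teradata Manager code. The AlertRule class, the evaluate function, and the metric names (cpu_busy_pct, perm_space_used_pct) are hypothetical names chosen for the example. It assumes that an alert fires when a sampled value meets or exceeds the threshold you set.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    """Hypothetical alert policy entry: a metric, a threshold, and an action."""
    metric: str
    threshold: float
    action: str  # for example "page" or "email"


def evaluate(rules, sample):
    """Return (metric, action) pairs for every rule whose threshold is met.

    Assumption: an alert fires when the sampled value is greater than or
    equal to the threshold. Metrics missing from the sample are skipped.
    """
    fired = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is not None and value >= rule.threshold:
            fired.append((rule.metric, rule.action))
    return fired


# Hypothetical policy: page on high CPU, e-mail on high space utilization.
rules = [
    AlertRule("cpu_busy_pct", 90.0, "page"),
    AlertRule("perm_space_used_pct", 85.0, "email"),
]
sample = {"cpu_busy_pct": 95.5, "perm_space_used_pct": 60.0}
print(evaluate(rules, sample))
```

In this sketch only the CPU rule fires, because the sampled space utilization is below its threshold.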


System and Configuration Status

The Database Window (DBW) is the primary vehicle for starting and controlling the operation of the Teradata Database utilities. It runs in a graphical X Window or Microsoft Windows environment. The DBW communicates with the Teradata Database through the console subsystem (CNS), which is part of the Parallel Database Extensions (PDE) software.

By definition, the Teradata Database is always in one of several states. You can monitor these states from the DBW. The following table lists and describes the states:

Status         Description
Offline        Either the processor to which the database console is attached or the entire database has been started offline. The database cannot be accessed from a client or used for processing.
Startup        The system is starting up but is not ready to process requests.
Logoff         No new sessions may log on (logons are disabled), but one or more sessions remain logged on.
Logoff/Quiet   No new sessions may log on, and no sessions are currently logged on. The system is quiescent.
Logon          New sessions may log on (logons are enabled) and one or more sessions are currently logged on.
Logon/Quiet    New sessions may log on (logons are enabled), but no sessions are logged on.
Reconfig       The reconfiguration program is running.
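The four logon-related states in the table differ only in whether logons are enabled and whether any sessions are logged on. The following Python sketch restates that relationship. The function name database_state and its parameters are hypothetical, and the Offline, Startup, and Reconfig states are not modeled.

```python
def database_state(logons_enabled: bool, session_count: int) -> str:
    """Map logon setting and session count to one of the four logon states.

    Illustrative only: covers Logon, Logon/Quiet, Logoff, and Logoff/Quiet
    as described in the table above.
    """
    if logons_enabled:
        return "Logon" if session_count > 0 else "Logon/Quiet"
    return "Logoff" if session_count > 0 else "Logoff/Quiet"


for enabled in (True, False):
    for sessions in (3, 0):
        print(enabled, sessions, database_state(enabled, sessions))
```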


Resource Usage Monitoring

The Teradata Database has facilities that permit you to monitor the use of resources such as:

• CPUs
• AMPs
• Disk activity
• BYNET activity

Resource usage, or ResUsage, is the collection and reporting of statistical information about these resources. You can use resource usage data to:

• Measure system benchmarks
• Measure component performance
• Assist with on-site job scheduling
• Identify potential performance impacts
• Plan installation, upgrade, and migration
• Analyze performance degradation and improvement
• Identify problems such as bottlenecks and parallel inefficiencies

Resource Usage Tables and Views

Resource usage data is stored in Teradata Database tables and views in the DBC database. Macros installed with Teradata Database generate reports that display the data. You can also write your own queries or macros on resource usage data. As with other database data, you can access resource usage data using SQL. You need to decide which kinds of resource usage data you want to collect and the level of detail you want it to cover.

Resource Usage Data Categories

Each row of resource usage data contains two broad categories of information:

• Housekeeping, containing identifying information
• Statistical

Each item of statistical data falls into a defined kind and class. Each kind corresponds to one (or several) different things that may be measured about a resource.


Resource Usage Data Handling

Resource usage data handling is divided into two phases:

Stage   Action
1       Various subsystems gather resource usage data and the Resource Sampling Subsystem (RSS) collects the data in collect buffers.
2       The collected data is logged to ResUsage tables periodically (as determined by user-defined logging intervals).

The logged resource usage data is then available for analysis by the various ResUsage macros.

Resource Usage Macros

A set of ResUsage macros provides the facilities for analyzing resource usage data. The macros are tailored to retrieve information from a set of system views designed to collect and present resource usage information.

How to Control Collection and Logging of Resource Usage Data

Several mechanisms exist within the Teradata Database for setting the collection and logging rates of resource usage data. The control sets allow users to do any of the following:

• Specify data collection rate
• Specify data logging rate
• Enable or disable ResUsage data logging on a table-by-table basis
• Enable or disable summarization of the data

Collection rates control how often resource usage data is made available to applications. Logging rates control how often resource usage data is logged to the ResUsage tables.

You can specify data collection without specifying logging. This capability saves space in system tables while making resource usage data available to applications, such as Teradata Performance Monitor.

You can use the Database Window (DBW) command SET LOG to establish the logging of resource usage information. The system inserts data into ResUsage tables every logging period for the tables that have logging enabled. You can use the statistics collected in the ResUsage tables to analyze system bottlenecks, determine excessive swapping, and detect system load imbalances.
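The difference between the collection rate and the logging rate can be sketched with a toy model in Python. This is not how the RSS is implemented. The function simulate and its parameters are hypothetical, and the rates are simply counted in seconds. The point it shows is that collection alone makes samples available without writing any rows, while logging writes one entry per logging period.

```python
def simulate(duration_s, collect_every_s, log_every_s=None):
    """Toy model of collection versus logging.

    Returns (collected, rows_logged), where collected is the number of
    samples made available to applications and rows_logged lists, for each
    logging period, how many collected samples that period covered.
    Passing log_every_s=None models collection without logging.
    """
    collected = 0
    pending = 0        # samples gathered since the last log write
    rows_logged = []
    for t in range(1, duration_s + 1):
        if t % collect_every_s == 0:
            collected += 1
            pending += 1
        if log_every_s is not None and t % log_every_s == 0:
            rows_logged.append(pending)
            pending = 0
    return collected, rows_logged


# Collect every 60 seconds and log every 300 seconds for 10 minutes.
print(simulate(600, 60, 300))
# Collect every 60 seconds with logging disabled.
print(simulate(600, 60))
```

With logging disabled, the same ten samples are available to applications, but nothing is written, which is the space saving the text describes.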


Summary Mode

You can activate summarization mode for many ResUsage tables independently. This mode reduces database I/O by summarizing data from multiple vprocs and other objects on each node in one representative row. The summarization reduces detail, but the data is very useful for exploratory analysis of performance problems and general resource usage issues.

When the summarization mode is active, the different classes of data are summarized as follows:

• The cnt and cur fields contain the sum of all the summarized values they represent.
• The max fields contain the maximum of all the summarized values they represent.
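The summarization rules above can be sketched in Python as follows. This is an illustration only. The row layout (node, vproc, cnt, cur, max keys) and the function name summarize are assumptions made for the example and do not reflect the actual ResUsage table columns.

```python
def summarize(rows):
    """Collapse per-vproc rows into one representative row per node.

    cnt and cur values are summed; max values keep the maximum,
    following the summary mode rules described above.
    """
    summary = {}
    for row in rows:
        node = row["node"]
        if node not in summary:
            summary[node] = {"node": node, "cnt": 0, "cur": 0, "max": row["max"]}
        s = summary[node]
        s["cnt"] += row["cnt"]
        s["cur"] += row["cur"]
        s["max"] = max(s["max"], row["max"])
    return list(summary.values())


# Hypothetical detail rows: two vprocs on node 1, one vproc on node 2.
rows = [
    {"node": 1, "vproc": 0, "cnt": 10, "cur": 2, "max": 7},
    {"node": 1, "vproc": 1, "cnt": 5, "cur": 1, "max": 9},
    {"node": 2, "vproc": 2, "cnt": 3, "cur": 4, "max": 6},
]
for line in summarize(rows):
    print(line)
```

Three detail rows become two summary rows, which is the reduction in database I/O that summary mode trades against detail.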


Performance Monitoring

Several facilities exist for monitoring and controlling system performance. This section briefly discusses many of these facilities.

The TDPTMON

The Teradata Director Program (TDP) User Transaction Monitor (TDPTMON) is a client routine that enables a system programmer to write code to track TDP elapsed time statistics.

System Management Facility

The System Management Facility (SMF) is available in the Multiple Virtual Storage (MVS) environment only. This facility collects data about Teradata Database performance, accounting, and usage. Data is grouped into the following categories:

• Session information
• Security violations
• PE stops

The Performance Monitor/Application Programming Interface

The Performance Monitor/Application Programming Interface (PM/API) provides hooks into the Performance Monitor and Production Control (PM and PC) functions resident within the Teradata Database. PM and PC data is available through a log-on partition called MONITOR using a specialized PM/API subset of the Call-Level Interface version 2 (CLIv2) routines.

The PM/API uses the Resource Sampling Subsystem (RSS) to collect performance data and to set data sampling and logging rates. Collected data is stored in memory buffers, and is available to the PM/API with little or no performance impact.

Using PM/API commands, you can collect performance data on:

• Current system configuration, status, and utilization
• Resource usage and status of an individual AMP, PE, or node
• Resource usage and status of individual sessions
• Problem SQL requests

PM/API data may be used to show how efficiently the Teradata Database is using its resources, to identify problem sessions and users, and to abort sessions and users having a negative impact on system performance.


For More Information

For more information on the topics presented in this chapter, see the following Teradata Database and Teradata Tools and Utilities books.

IF you want to learn more about…                                    THEN see…
Controlling operation of Teradata Database using Database Window    Database Window
Performance Monitor/Application Interface                           PM/API Reference
Priority Scheduler                                                  Utilities, Teradata Manager User Guide
Resource Usage                                                      Resource Usage Macros and Tables
Teradata Performance Monitor                                        PM/API Reference, Teradata Manager User Guide
Teradata Manager                                                    Teradata Manager User Guide


Index

Numerics 1NF, first normal form 12–5 2NF, second normal form 12–5 2PL 15–14 3NF, third normal form 12–6 4NF, fourth normal form 12–7 5NF, fifth normal form 12–7

A Access lock 15–9 Access Processor Modules. See AMPs Account String Expansion. See ASE Accounting account performance groups 18–8 ASE 18–8 DBC.AMPUsage table 18–8 session management 18–7 Active session management Gateway Global utility 16–6 Query Configuration 16–5 Query Sessions 16–5 Administration Workstation. See AWS Aggregate join indexes 8–7 Alternate key, definition 12–3 AMPs clusters 3–10, 14–4 data distribution using hashing 8–14 data distribution using indexes 8–3 down AMP journal 14–6 down AMP recovery 15–13 functions 3–9 operation 3–13 SELECT statement processing 3–14 step processing 3–14 vproc migration 14–2 vprocs 3–8 ANSI mode transactions 15–5 Application development embedded SQL applications 6–3 explicit 6–2 implicit 6–2 platforms 6–4 Preprocessor2 6–4

Application development languages C 6–4 COBOL 6–4 PL/I 6–4 Architecture BYNET 3–2 cliques 3–6 disk arrays 3–5 hot standby nodes 3–7 MPPs 3–2 processor node 3–2 SMPs 3–2 TPA 2–3 vprocs 3–8 workstations 3–18 Archive utilities NetBackup 16–2 NetVault 16–2 Teradata Archive/Recovery 2–8, 14–7, 16–2 Teradata Tools and Utilities 2–13 ASE account string identifiers 18–8 accounting 18–8 logon string 18–8 Attachment methods channel 2–2 network 2–2 Audits addressing problems 17–15 identification of security hazards 17–15 AWS platform 3–18 purpose 3–18

B Basic Teradata Query Facility. See BTEQ Batch referential integrity constraint definition 12–11 level of enforcement 12–11 Battery backup 14–8 BCNF, Boyce-Codd normal form 12–7 Boardless BYNET 3–4 BTEQ attachment methods 16–22 capabilities 16–22 Teradata Tools and Utilities 2–8, 2–9


BYNET boardless 3–4 inter-network communication 3–2 multiple 14–8 purpose 3–2, 3–3

C C application development language 6–4 Teradata Tools and Utilities 2–10 Call Level Interface Version 2. See CLIv2 Candidate keys Boyce-Codd normal form 12–7 definition 12–3 fifth normal form 12–7 Channel-attached systems mainframe 2–2 multiple connections 14–8 supported operating systems 13–3 TDP 13–3 Checksums 3–17 Child table 12–8 Cliques disk arrays 3–6 hardware fault tolerance 14–9 purpose 3–6 vproc migration 14–9 CLIv2 channel-attached systems 13–3 definition 13–2 network-attached systems 13–5 PM/API 19–10 support for network data encryption 17–9 Teradata Tools and Utilities 2–8, 2–9 Clusters AMPs 3–10 fault tolerance 3–10 COBOL application development language 6–4 Teradata Tools and Utilities 2–10 Columns attributes 7–3 identity 8–15 Communications interfaces CLIv2 13–2 JDBC 13–7 MOSI 13–5


MTDP 13–5 ODBC 13–7 TDP 13–3 WinCLI 13–7 Comparison of partitioned and non-partitioned primary indexes 8–5 Concurrency control definition 15–2 locks 15–7 transactions 15–4 using 2PL 15–14 Constraints and normal forms 12–2 definition 12–4 referential integrity 12–11 rules for referential integrity 12–12 table 7–4 Cursors definition 5–16 Preprocessor2 5–16 SQL statements related to 5–16 stored procedures 5–16 Customer Information Control System 2–8 Cylinder Read maximum default data block size 3–16 purpose 3–16

D Data access control explicit access rights 17–12 implicit rights 17–12 levels of 17–12 views 17–13 Data attributes purpose 5–6 summary of 5–7 Data communications communications interfaces 13–2 for Microsoft Windows and UNIX systems 13–7 Data connector 2–11 Data Control Language. See DCL Data Definition Language. See DDL Data Dictionary DBC.AMPUsage table 18–8 SQL statements related to 9–8 structure 9–6 views 9–6, 9–7 Data distribution hashing 8–14 indexes 8–2


Data load and unload utilities data connector 2–11 Teradata FastExport 16–4 Teradata FastLoad 16–3 Teradata MultiLoad 16–3 Teradata TPump 16–4 Data management active sessions 16–5 archive utilities 16–2 Open Teradata Backup 16–2 system resources 16–7 Data Manipulation Language. See DML Data types ANSI-compliant 5–6 purpose 5–6 Teradata 5–6 Data warehouse active data warehouse 1–3 definition 1–2 Database level locks 15–8 Database object use count 16–20 Database Query Analysis Tools. See DBQAT Database Query Log. See DBQL Database Window. See DBW Databases creation 18–4 database object use count 16–20 DBQAT 16–12 DBQL 16–17 DBW 3–19 space allocation 18–2 DBQAT database object use count 16–20 DBQL 16–17 QCD 16–15 Query Capture Facility 16–15 Teradata Index Wizard 16–13 Teradata Visual Explain 16–16 TSET 16–19 DBQL query information 16–17 TLE support 16–17 user information 16–17 DBW supervisor window 16–11 Teradata MultiTool 16–11 use 3–19, 16–11, 18–10 DCL access control capabilities 5–4 statements 5–4

DDL data definition capabilities 5–3 statements 5–3 Deadlocks resolution 15–10 transaction rollback 15–10 Dependencies full functional 12–3 functional 12–3 multivalued 12–3 Determinant 12–3 DIP Teradata MultiTool 16–11 use 16–11 Disk arrays LUNs 3–5 pdisks 3–5 RAID 3–5 RAID1 14–8 vdisks 3–5 Disk I/O Integrity Checking checksums 3–17 purpose 3–16 SQL statements related to 3–17 Dispatcher operation 3–12 purpose 3–9 DML request processing capabilities 5–4 statements 5–5 Down AMP journal 14–6 recovery 15–13

E Embedded SQL applications 6–3 Encryption CLIv2 17–9 Gateway Control utility 17–9 logon 17–9 network data 17–9 Teradata Gateway 17–9 Exclusive HUT lock 15–11 Exclusive lock 15–9 EXPLAIN statement definition 6–10 use 6–10 Explicit access rights 17–12 Explicit application development 6–2


Extended language support. See International character support

F Fallback table 14–3 Fault tolerance clusters 3–10 hardware 14–8 software 14–2 Ferret 16–7 Foreign keys and referential integrity 12–8 and system integrity 17–3 definition 12–3 Full table scans, strengths and weaknesses 8–12 Functional dependencies definition 12–3 full functional 12–3 Functions aggregate 5–12 definition 5–12 ordered analytical 5–13 scalar 5–12 user-defined 5–14

G Gateway Global utility 16–6, 17–7 Gateway. See Teradata Gateway Generator, purpose 3–9 Global temporary tables 7–4 Group Read HUT lock 15–11

H Hardware fault tolerance battery backup 14–8 cliques 14–9 hot swap 14–9 multiple BYNETS 14–8 multiple channel and network connections 14–8 redundant power supplies and fans 14–8 server isolation 14–8 Hash indexes 8–9


Hashing data distributing 8–14 primary indexes 8–14 secondary indexes 8–14 Host Utility Locks. See HUT locks Hot standby nodes definition 3–7 use 3–7 Hot swap components 14–9 definition 14–9 HUT lock types Exclusive 15–11 Group Read 15–11 Read 15–11 Write 15–11 HUT locks characteristics of 15–11 used by Teradata Archive/Recovery 15–11

I IBM IMS/DC 2–8 Identity column column attribute 8–15 unique row number generator 8–15 Implicit access rights 17–12 Implicit application development 6–2 Indexes comparison of primary and secondary 8–6 hash 8–9 join 8–7 primary 8–3 secondary 8–6 specification 8–10 SQL statements related to 8–10 strengths and weakness 8–11 types of 8–2 uses 8–2 International character set support client character sets 4–3 client character sets overview 4–2 data translation 4–3 diacritical marks 4–7 extended support 4–9 extended support overview 4–9 internal character sets 4–4 Japanese support 4–8 overview 4–1


standard support 4–7 standard support for compatible languages 4–7 system dictionary data 4–4, 4–6

J Japanese language support. See International character set support, Japanese support JDBC driver 13–7 Teradata Tools and Utilities 2–10 Join dependency 12–4 Join indexes aggregate 8–7 covering and partially covering 8–7 multi-table 8–7 multi-table, partially covering 8–7 single table 8–7 sparse 8–8 strengths and weaknesses of types of join indexes 8–12 Joins and the SELECT statement 5–11 definition 12–3 Journals down AMP 14–6 permanent 14–6 transient 14–6

K Keys alternate 12–3 candidate 12–3, 12–7 foreign 12–3, 12–8, 17–3 parent 12–8 primary 8–3, 12–3, 12–8, 17–3

L Lock levels database 15–8 row hash 15–8 table 15–8

Lock types access 15–9 exclusive 15–9 read 15–9 write 15–9 Locks deadlocks 15–10 HUT 15–11 levels 15–8 types 15–9 Logical Units. See LUNs Logon ASE 18–8 logon string 17–5 logon string operands 17–6, 18–7 password security 17–10 sessions 18–7 Single Sign On 17–7 SQL statements related to logon control 17–12 Logon encryption 17–9 LUNs RAID 3–5 vprocs 3–5

M Macros definition 6–5, 11–5 processing 11–5 resource usage 19–8 single and multi-user 11–5 SQL statements related to 6–5, 6–6, 11–5 use 6–6 Mainframe utilities 2–8 Massively Parallel Processing. See MPPs Micro Operating System Interface. See MOSI Micro Teradata Director Program. See MTDP MOSI definition 13–5 network-attached systems 13–5 MPPs architecture 3–2 hardware platform 3–2 workstation connections 3–18 MTDP definition 13–5 interface 13–5 network-attached systems 13–5 Multi-table join indexes strengths and weaknesses 8–12 use 8–7


Multi-table, partially covering join indexes 8–7 Multivalued dependence 12–3

N NetBackup 2–13, 16–2 NetVault 2–13, 16–2 Network data encryption, CLIv2 17–9 Network-attached systems MOSI 13–5 multiple connections 14–8 supported operating systems 13–5 Network-attached systems, LAN 2–2 Non-partitioned primary index 8–5 Non-unique primary index. See NUPI Non-unique secondary index. See NUSI Normal forms 1NF 12–5 2NF 12–5 3NF 12–6 4NF 12–7 5NF 12–7 BCNF 12–7 Boyce-Codd 12–7 definition 12–2 fifth 12–7 first 12–5 fourth 12–7 second 12–5 third 12–6 Normalization normal forms 12–2 purpose 12–2 NUPI, strengths and weaknesses 8–11 NUSI, strengths and weaknesses 8–12

O ODBC communications interface 13–7 Teradata Tools and Utilities 2–9 OLE DB provider 2–9 Open Teradata Backup for Windows clients 2–13, 16–2 NetBackup 2–13 NetVault 2–13 Teradata Tools and Utilities 2–13 Optimizer purpose 3–9 SQL request implementation 3–12

P Parallel Database Extensions. See PDE Parallel Upgrade Tool. See PUT Parent key 12–8 Parent table 12–8 Parser PE element 3–9 purpose 3–9 request processing 3–11 Parsing Engines. See PEs Partitioned primary index 8–5 Passwords attributes 17–10 DBC.SysSecDefaults table 17–10 security 17–10 user-level attributes 17–11 PDE MPP system enabling 3–15 task management with Teradata MultiTool 16–11 TPA and non-TPA 3–15 vprocs 3–15 pdisks 3–5 PE elements dispatcher 3–9 generator 3–9 optimizer 3–9 parser 3–9 session control 3–9 Performance Monitor/Application Programming Interface. See PM/API Performance monitoring. See System performance monitoring Permanent journals 14–6 PEs migration 3–6 purpose 3–8 request processing 3–11 SELECT statement processing 3–14 session control 3–8 vproc migration 14–2 vprocs 3–8 Phases of 2PL 15–4 PL/I application development language 6–4 Teradata Tools and Utilities 2–10


PM/API and resource usage 19–10 CLIv2 19–10 performance monitoring 19–10 third-party software support 6–13 Policies elements 17–14 security 17–14 Preprocessor2 application development 6–4, 16–25 C 2–10 COBOL 2–10 cursors 5–16 PL/I 2–10 Teradata Tools and Utilities 2–9 Primary indexes comparison of partitioned and non-partitioned 8–5 comparison with secondary 8–6 data distribution to AMPs 8–3 partitioned and non-partitioned 8–5 relationship with primary keys 8–3 unique and non-unique 8–3 Primary keys and system integrity 17–3 definition 12–3 first normal form 12–5 relationship with primary indexes 8–3 second normal form 12–5 third normal form 12–6 with respect to referential integrity 8–4, 12–8 Priority Scheduler account performance groups 18–8 Priority Scheduler Administrator 16–8, 19–3 resource management 16–7 Priority Scheduler Administrator Priority Scheduler 16–8 Teradata Manager 16–8, 19–3 Processor node 3–2 PUT and installation 2–6 operational modes 2–6

Q QCD applications 16–15 schema 16–15 Teradata Index Wizard 16–15 Teradata Visual Explain 16–16

Queries BTEQ 16–22 configuration 16–5 management 16–9 Preprocessor2 16–25 sessions 16–5 strategic 1–3 tactical 1–3 Teradata SQL Assistant 16–23 Query Capture Database. See QCD Query Configuration 16–5 Query facilities BTEQ 16–22 Preprocessor2 16–25 Teradata SQL Assistant 16–23 Query management 16–9 Query Sessions 16–5

R RAID LUNs 3–5 RAID1 14–8 storage technology 3–5 vdisks 3–5 Read HUT lock 15–11 Read lock 15–9 Recovery definition 15–3 down AMP 15–13 system and media 15–12 transaction 15–12 Referenced table (parent) 12–9 Referencing table (child) 12–9 Referential constraints checks 12–13 definition 12–11 level of enforcement 12–11 Referential integrity and system integrity 17–3 batch referential integrity constraint 12–11 benefits of 12–10 implementation 12–8 referencing and referenced tables 12–9 referential constraint 12–11 referential integrity constraints 12–11 rules 12–12 terms 12–8 Referential integrity constraints level of enforcement 12–11 types of 12–11


Referential integrity terminology child table 12–8 foreign key 12–8 parent key 12–8 parent table 12–8 primary key 12–8 Relational database terminology alternate key 12–3 candidate key 12–3 constraint 12–4 determinant 12–3 foreign key 12–3 functional dependencies 12–3 join dependency 12–4 joins 12–3 multivalued dependence 12–3 primary key 12–3 transitive dependence 12–3 Relational databases definition 7–3 relational model 7–2 set theory terminology 7–3 Relational model and relational databases 7–2 and theory of sets 7–2 Request processing 3–11 Request scheduling 16–9 Resource access control client identifiers 17–5 logon policies 17–5 Single Sign On 17–7 user identifiers 17–5 Resource usage categories of data 19–7 collection rate control 19–8 definition 19–7 macros 19–8 monitoring 19–7 summary mode 19–9 tables and views 19–7 Roles and profiles definition 18–6 use 18–6 Row hash locks 15–8 Rows row hash locks 15–8 tuples 7–3


S Secondary indexes comparison with primary index 8–6 subtables 8–6 unique and non-unique 8–6 Security audits and accountability 17–15 categories 17–4 data access control 17–12 DBC.SysSecDefaults table 17–10 definition 17–2 logon encryption 17–9 network data encryption 17–9 passwords 17–10 policies 17–14 policy considerations 17–14 policy elements 17–14 resource access control 17–5 SQL statements related to logon control 17–12 Teradata Gateway 17–9 TPD 17–6 Security policies 17–14 considerations 17–14 elements 17–14 SELECT statement and joins 5–11 cursor declaration 5–16 options 5–10 processing 3–13 request data 5–10 set operators 5–10 Session control PE 3–8 purpose 3–9 Sessions how to establish 18–7 logon 18–7 management 18–7 Set operators and the SELECT statement 5–10 Set theory and relational databases 7–3 and the relational model 7–2 Set theory terminology attribute 7–3 relation 7–3 tuple 7–3 Shared Information Architecture 2–4, 17–12 Single Sign On Gateway Global utility 17–7 logon control 17–7


Single-table join indexes strengths and weaknesses 8–12 use 8–7 SMPs architecture 3–2 boardless BYNET 3–4 hardware platform 3–2 workstation connections 3–18 Software fault tolerance AMP clusters 14–4 fallback tables 14–3 Table Rebuild utility 14–7 Teradata Archive/Recovery utility 14–7 vproc migration 14–2 Space allocation databases 18–2 users 18–2 Sparse join indexes strengths and weaknesses 8–13 use 8–8 SQL advantages of 5–2 aggregate function 5–12 cursors 5–16 data types 5–6 EXPLAIN 6–10 ordered analytical function 5–13 scalar function 5–12 SELECT statement 5–10 SELECT statement processing 3–13 statement components 5–9 statement execution 5–9 statement punctuation 5–8 statements related to Data Dictionary 9–8 statements related to disk I/O integrity checking 3–17 statements related to indexes 8–10 statements related to logon control 17–12 statements related to macros 6–5, 6–6, 11–5 statements related to stored procedures 6–7, 6–9 statements related to transactions 15–5, 15–6 statements related to triggers 11–7 statements related to UDFs 5–15 statements related to views 11–2 subordinate languages 5–3 user-functions 5–14 SQL functional families DCL 5–4 DDL 5–3 DML 5–4 Standard language support. See International character set support, standard support Strategic queries 1–3

Subtables in secondary indexes 8–6 Supported operating systems channel-attached systems 13–3 network-attached systems 13–5 Symmetric Multi-Processing. See SMPs System administration accounting 18–7 database creation 18–4 maintenance 18–10 performance monitoring 19–10 roles and profiles 18–6 space allocation 18–2 user creation 18–4 System console DBW 3–19 platform 3–18 purpose 3–18 System integrity and referential integrity 17–3 and tables 17–3 definition 17–2 System Management Facility 19–10 System performance monitoring performance monitoring 19–10 PM/API 19–10 resource usage 19–7 system management facility 19–10 system status 19–6 TDPTMON 19–10 Teradata Manager 19–2 Teradata Performance Monitor 19–3 System resource management Ferret utility 16–7 Priority Scheduler 16–7 Teradata DQM 16–9 Teradata MultiTool 16–11 System status configuration 19–6 states 19–6

T Table level locks 15–8 Table Rebuild utility 14–7 Tables and system integrity 17–3 child 12–8 constraints 7–4 DBC.AMPUsage 18–8 DBC.SysSecDefaults 17–10 fallback 14–3


global temporary 7–4 locks 15–8 parent 12–8 permanent 7–4 referenced (parent) 17–3 referenced table (parent) 12–9 referencing (child) 12–9, 17–3 relations 7–3 resource usage 19–7 temporary 7–4 volatile temporary 7–5 Tactical queries 1–3 Target Level Emulation. See TLE TDP channel-attached systems 13–3 definition 13–3 functions 13–3 Teradata Tools and Utilities 2–8 TDPTMON 19–10 Temporary tables global 7–4 volatile 7–5 Teradata Administrator database administration 2–9, 19–4 Teradata Utility Pack 2–9 Teradata Archive/Recovery utility HUT locks 15–11 software fault tolerance 14–7 use 16–2 Teradata Database ANSI transaction semantics 15–4 ANSI-compliant data types 5–6 architecture 3–1 CLIv2 13–4 communications interfaces 13–2 methods of attachment 2–2, 13–2 purpose 2–3 PUT installation software 2–6 referential integrity 12–8 shared information architecture 2–4, 17–12 status 19–6 Teradata Gateway 2–5 Teradata mode transactions 15–6 third-party 6–13 transaction semantics 15–4 Teradata Director Program. See TDP Teradata DQM managing access 16–9 query management 16–9 request scheduling 16–9 Teradata Tools and Utilities 2–12

Teradata Dynamic Query Manager. See Teradata DQM Teradata FastExport, data export 2–11, 16–4 Teradata FastLoad, client/server load utility 2–11, 16–3

Teradata file system Cylinder Read 3–16 disk I/O integrity checking 3–16 purpose 3–16 Teradata Gateway encryption 17–9 security 17–9 server software 2–5 Teradata Index Wizard and Teradata Visual Explain 16–13 demographics 16–14 QCD 16–15 Teradata Tools and Utilities 2–12 use 16–13 Teradata Manager alerts/events management 19–3 Priority Scheduler Administrator 19–3 system monitoring 19–2 Teradata Administrator 2–9, 19–4 Teradata Performance Monitor 19–3 Teradata Statistics Wizard 16–8 Teradata Tools and Utilities 2–12 Teradata mode transactions 15–6 Teradata MultiLoad, client/server load utility 2–11, 16–3

Teradata MultiTool DIP 16–11 PDE tasks 16–11 Teradata Tools and Utilities 2–10 use 16–11 vproc manager 16–11 Teradata Performance Monitor functions 19–3 system performance monitoring 19–3 Teradata Tools and Utilities 2–12 Teradata SQL data types 5–6 non-ANSI compliant development 2–2 see also SQL Teradata SQL Assistant on Windows PC 16–23 Teradata Tools and Utilities 2–10 use 16–23 Teradata Statistics Wizard statistics collection 16–8 Teradata Tools and Utilities 2–12


Teradata stored procedures benefits 11–3 cursors 5–16 definition 11–3 elements 11–4 SQL statements related to 6–7, 6–9 use 11–3 Teradata System Emulation Tool. See TSET Teradata Tools and Utilities BTEQ 2–8, 2–9 C 2–10 C preprocessor 2–10 CICS 2–8 CLIV2 2–9 CLIv2 2–8 COBOL preprocessor 2–10 data connector 2–11 Host Utility Console 2–8 IBM IMS/DC 2–8 JDBC 2–10 mainframe 2–8 ODBC 2–9 OLE DB provider 2–9 Open Teradata Backup 2–13 PL/I 2–10 Preprocessor2 2–9 TDP 2–8 Teradata Administrator 2–9 Teradata Archive/Recovery 2–8, 2–13, 14–7, 16–2 Teradata DQM 2–12 Teradata FastExport 2–11 Teradata FastLoad 2–11 Teradata Index Wizard 2–12 Teradata Manager 2–12 Teradata MultiLoad 2–11 Teradata MultiTool 2–10 Teradata Performance Monitor 2–12 Teradata SQL Assistant 2–10 Teradata Statistics Wizard 2–12 Teradata Tools and Utilities Access Modules 2–11 Teradata TPump 2–11 Teradata Utility Pack 2–9 Teradata Visual Explain 2–13 Teradata Warehouse Builder 2–11 TS/API 2–9 TSET 2–13 Teradata Tools and Utilities Access Modules 2–11 Teradata TPump continuous data load utility 2–11 data load utility 16–4 Teradata Utility Pack 2–9

Teradata Visual Explain and Teradata Index Wizard 16–13 comparison of execution plans 16–16 QCD 16–16 Teradata Tools and Utilities 2–13 Teradata Warehouse Builder 2–11 Third-party software compatible 6–13 PM/API 6–13 TS/API products 6–13 UDFs 5–14 TLE and TSET 16–18 supported on server 16–18 use 16–18 TPA, services 3–15 TPD security 17–6 tdpids 17–6 Transactions 2PL 15–4 ANSI mode 15–5 control using 2PL 15–4 deadlock resolution 15–10 definition 15–4 recovery 15–12 rollback in ANSI mode 15–5 rollback Teradata mode 15–6 semantics 15–4 SQL statements related to 15–5, 15–6 Teradata mode 15–6 Transient journals 14–6 Transitive dependence 12–3 Triggers definition 11–6 restrictions 11–10 SQL statements related to 11–7 use 11–7 Trusted Parallel Application. See TPA TS/API Teradata Tools and Utilities 2–9 third-party product 6–13 TSET and TLE 16–19 supported on client 16–19 Teradata Tools and Utilities 2–13 use 16–19 Two-Phase Locking. See 2PL


U UDFs creation 5–14 SQL statements related to 5–15 third-party 5–14 Unique primary index. See UPI Unique secondary index. See USI UPI primary index characteristics 8–3 strengths and weaknesses 8–11 User-Defined Functions. See UDFs Users account string identifiers 18–8 creation 18–4 space allocation 18–2 USI characteristics 8–6 strengths and weaknesses 8–11 Utilities Ferret 16–7 Gateway Control 17–9 Gateway Global 16–6 Open Teradata Backup 16–2 overview 18–10 Priority Scheduler 16–7 Teradata Archive/Recovery 16–2 Teradata FastExport 2–11, 16–4 Teradata FastLoad 2–11, 16–3 Teradata MultiLoad 2–11, 16–3 Teradata MultiTool 16–11 Teradata Statistics Wizard 16–8 Teradata TPump 16–4

V vdisks 3–5 Views data control access 17–13 definition 11–2 in Data Dictionary 9–6 resource usage 19–7 restrictions 11–2 SQL statements related to 11–2 users 9–7 Virtual processors. See Vprocs Volatile temporary tables 7–5 Vproc migration cliques 14–9 hardware fault tolerance 14–9 software fault tolerance 14–2 Vprocs AMPs 3–8 definition 3–8 functionality 3–8 LUNs 3–5 maximum per system 3–8 PDE 3–15 PEs 3–8 types of 3–8 vproc manager in Teradata MultiTool 16–11 vproc migration 14–2

W WinCLI 13–7 Workstations AWS 3–18 PC with X Windows 3–18 platform specific 3–18 system console 3–18 UNIX 3–18 Write HUT lock 15–11 Write lock 15–9
