Programming Languages: Structures and Models [Hardcover ed.] 0534129005, 9780534129002

Programming language theory in the context of a language model (paradigm) and one or more example languages for each of


English Pages 413 [440] Year 1990


PROGRAMMING LANGUAGES: STRUCTURES AND MODELS

Herbert L. Dershem
Michael J. Jipping

Department of Computer Science
Hope College

Wadsworth Publishing Company
Belmont, California
A Division of Wadsworth, Inc.

Computer Sciences Editor: Frank Ruggirello
Editorial Assistant: Carol Carreon
Production: Stacey C. Sawyer
Print Buyer: Martha Branch
Designer: James Chadwick
Copy Editor: Elizabeth Judd
Technical Illustrator: Anne Eldridge
Compositor: Graphic Typesetting Service
Cover: Williams/Vargas/Design
Cover Photo: The Image Bank
Signing Representative: Thor McMillen

© 1990 by Wadsworth, Inc. All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transcribed, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher, Wadsworth Publishing Company, Belmont, California 94002, a division of Wadsworth, Inc.

Printed in the United States of America

1 2 3 4 5 6 7 8 9 10-94 93 92 91 90

Library of Congress Cataloging in Publication Data

Dershem, Herbert.
Programming languages.
1. Programming languages (Electronic computers) I. Jipping, Michael J. II. Title.
QA76.7.D465 1990 005.13 89-21465
ISBN 0-534-12900-5

BRIEF CONTENTS

Chapter 1 Introduction and Overview
Chapter 2 Preliminary Concepts
Chapter 3 Information Binding
Chapter 4 Control Structures
Chapter 5 Data Aggregates
Chapter 6 Procedural Abstraction I: Procedures
Chapter 7 Procedural Abstraction II: Exceptions and Concurrency
Chapter 8 Data Abstraction
Chapter 9 Example Imperative Languages
Chapter 10 Functional Model
Chapter 11 Logic Model
Chapter 12 Object-Oriented Model
Index

Digitized by the Internet Archive in 2012: http://archive.org/details/programminglanguOOders

CONTENTS

Preface

Chapter 1 Introduction and Overview
1.1 What Is a Programming Language?
1.2 Why Study Programming Languages?
1.3 A Brief History of Programming Languages

Chapter 2 Preliminary Concepts
2.1 Language Specification
2.2 Language Translation
2.3 Language Design Characteristics
2.4 Choice of Language

Chapter 3 Information Binding
3.1 Data Objects
3.2 Scalar Data Types
3.3 Execution Units
3.4 Scope of Binding

Chapter 4 Control Structures
4.1 Conditional Structures
4.2 Iterative Structures
4.3 Unconstrained Control Statements

Chapter 5 Data Aggregates
5.1 Data Aggregate Models
5.2 Arrays
5.3 Strings
5.4 Records
5.5 Files
5.6 Sets

Chapter 6 Procedural Abstraction I: Procedures
6.1 Procedures as Abstractions
6.2 Procedure Definition and Invocation
6.3 Procedure Environment
6.4 Parameters
6.5 Value-Returning Procedures
6.6 Overloading
6.7 Coroutines
6.8 Procedures in Ada

Chapter 7 Procedural Abstraction II: Exceptions and Concurrency
7.1 Exceptions
7.2 Exceptions in Ada
7.3 Concurrency

Chapter 8 Data Abstraction
8.1 Abstract Data Types
8.2 Encapsulation
8.3 Parameterization
8.4 Monitors
8.5 Data Abstraction in Ada

Chapter 9 Example Imperative Languages
9.1 Description of C
9.2 Description of Modula-2

Chapter 10 Functional Model
10.1 Functions
10.2 FP: A Pure Functional Language
10.3 LISP: A Functional-Oriented Language

Chapter 11 Logic Model
11.1 Introduction to Logic Language Model
11.2 A Pure Logic Language
11.3 Prolog: A Logic-Oriented Language
11.4 Database Query Languages

Chapter 12 Object-Oriented Model
12.1 Object-Oriented Model
12.2 Smalltalk
12.3 C++
12.4 Comparison with Imperative Model

Bibliography
Index

PREFACE

Programming language courses at the undergraduate level can take several different approaches. Frequently they present a survey of major programming languages, giving the students exposure to and experience with a number of different languages. Other courses focus on the underlying theory of programming languages. A third approach is for the course to present the fundamental features and concepts common to all programming languages. This textbook, intended for a first course in programming languages for undergraduates, serves the third purpose. But it is the authors' opinion that a course on the fundamental concepts cannot be adequately taught without the students being exposed to a variety of languages and gaining experience in their use. Furthermore, the students need some understanding of theoretical topics. This is the philosophy represented by Programming Languages: Structures and Models.

Organization

This book is organized into four different computational models, or paradigms, for programming languages. The imperative model is presented first. Within the presentation of this model, we describe the fundamental features of programming languages that are commonly present in imperative languages. This section is followed by descriptions of the functional, logic-oriented, and object-oriented models. Because these three models share many fundamental concepts with imperative languages, their presentation focuses on those features not present in the imperative model and refers back to the imperative chapters for discussion of the common features.

The models presented in this book were chosen for their anticipated importance in the field of computer science in the near future. Other models, such as data flow and pattern matching, although interesting, were not chosen because they appear unlikely to have as great an effect.

Use of model languages

Each of the four programming language models is described through the use of a model language that exhibits the important aspects of that model. For the imperative model, Ada has been chosen as the model language because of the richness of features it contains, especially in data abstraction and concurrency. For those instructors who wish to use a language other than Ada as the empirical model, Modula-2 and C are described in Chapter 9, with the topics appearing there in the same order as they appear in Chapters 3 through 8. Chapter 9 can thus be covered in parallel with Chapters 3 through 8.

Backus' FP is used as the model functional language, whereas hypothetical languages are constructed to represent the logic-oriented and object-oriented models. In all cases, the model languages form a standard against which other languages exhibiting properties of that model are compared. Several such languages are discussed for each model.

Laboratory exercises

Laboratory exercises are included at the end of each chapter. These exercises require the students to write programs, usually to practice using a language feature found in that chapter and to determine how that feature is implemented. Most of these exercises are language-independent: they could be assigned for any appropriate language that is available to the student. It would be very instructive for the student to repeat a given exercise for several different languages. Most of the laboratory exercises are stated very generally to permit the instructor maximum flexibility in adapting them to the local environment.

The prerequisite for this book is completion of the first two courses in a computer science curriculum, including experience in programming in one structured imperative language. Pascal is used in examples throughout the book with the assumption that the reader is familiar with that language. However, students whose first language was Modula-2 will have no trouble understanding the Pascal examples.

Intended audience

The goal of this book is to teach students to become more intelligent users of programming languages. This includes the ability to choose languages appropriate for different applications, the ability to make effective and efficient use of a language in software development, and the ability to learn new languages quickly. Although this book is not intended as a final preparation for programming language implementors, designers, or researchers, it provides appropriate background preparation for more advanced courses in these areas.

Use of this textbook

This textbook contains more material than can be comfortably covered in a single-term undergraduate course. Many different courses can be taught from this book, several of which are outlined here.

1. Imperative Model Only: Chapters 1-9 could be covered in detail, and students could do a significant amount of programming in several imperative languages that are new to them.

2. Imperative Model plus Surveys of Other Models: This course could cover Chapters 1-9, introducing students to only one new imperative language, perhaps Ada. Chapters 10-12 could then be covered with little, if any, programming in the nonimperative models.

3. Imperative Model plus One Other Model: Chapters 1-9 could be covered without asking students to program in any new imperative language; instead, the students could use imperative languages that they already know. One of the three Chapters 10-12 could then be covered in detail, introducing the students to one or more languages illustrating that model and having them program in those languages. The other two nonimperative model chapters could be quickly surveyed without any student programming experience in those models.

4. All Four Models: This course could give approximately equal time to all four models. Chapters 3-6 could be treated quickly, relying heavily on the students' prior experience. Chapters 7 and 8 might require more extensive coverage because students are less likely to have used these features extensively in prior courses. Previously learned languages could be used for programming exercises throughout. Chapters 10-12 could be covered in detail, giving students programming experience with one language representing each model.

Acknowledgments

The following reviewers provided many helpful comments, which greatly improved the accuracy and coverage of this book:

Anthony Aaby, Bucknell University
Boumediene Belkhouche, Tulane University
Bill Buckles, Tulane University
Frank A. Chimenti, Liberty University
Robert Crawford, Western Kentucky University
Al Cripps, Middle Tennessee State University
Thomas Gendreau, University of Wisconsin
David Oakland, Drake University
Prakash Panangaden, Cornell University
Richard Pattis, University of Washington
John Peterson, University of Arizona
John Remmers, Eastern Michigan University
Ken Slonneger, University of Iowa
Victor Terrana, Indiana University Northwest
Barbara Tulley, Elizabethtown College

The students in the Hope College CSCI 383 course also provided many helpful comments during the class testing of preliminary versions of this book.

We are also indebted to Frank Ruggirello for his gentle prodding and expert assistance during this book's preparation. He and the Wadsworth staff made this project possible by their professional assistance throughout.

CHAPTER 1
INTRODUCTION AND OVERVIEW

1.1 What Is a Programming Language?
1.2 Why Study Programming Languages?
1.3 A Brief History of Programming Languages

1.1 What Is a Programming Language?

A language is a systematic set of rules for communicating ideas. With a natural language, like English, this communication is between people, and the language is used in both spoken and written forms. Programming languages differ from natural languages in several important ways. First, the primary communication is between a person and a computer, although programming languages are also useful for communication between people. The second major difference is in the content of the communication, which, in the case of programming languages, is known as a program. Programs are expressions of solutions to problems that are specific enough to give the receiver of the program sufficient information to carry out the solution. A third unique feature of communication via a programming language is the medium used. Since a computer is the intended receiver, programs have traditionally been represented symbolically as strings of characters rather than, for example, as audible sounds. Although modern programming environments have somewhat released programmers from this restriction, all languages that we will discuss in this book have been designed for this mode of communication.

Our working definition for a programming language is:

A programming language is a language intended to be used by a person to express a process by which a computer can solve a problem.

The four key components in this definition of a programming language are:

1. Computer: the machine that will carry out the process described by the program
2. Person: the programmer who serves as the source of the communication
3. Process: the activity being described by the program
4. Problem: the actual system or environment where the problem arises

Four models for programming languages are detailed in this text. Each one corresponds most closely to the point of view of one of the preceding four components.

The imperative model is based on the computer's perspective. This is reflected in the sequential execution of commands and the use of a changeable data store, concepts that are based on the way computers execute programs at the machine-language level. The imperative model has been the predominant paradigm for languages because imperative languages are the easiest to translate into a form suitable for machine execution. A program in this model consists of a sequence of modifications to the computer's storage.

The logic-oriented model is most closely related to the perspective of the person. It looks at the problem from the logical point of view. The program is a logical description of the problem expressed in a formal way, similar to the way a human would reason about the problem.

The functional model focuses on the process of solving the problem. The functional view results in programs that describe the operations that must be performed to solve the problem.

The object-oriented model most closely reflects the actual problem. A program in this model consists of objects that send messages to each other. These objects in the program correspond directly to actual objects, such as people, machines, departments, documents, and so on.

In this book, we will look at all four of these models, or paradigms, and the ways they are expressed in programming languages. We will also find that all programming languages provide some combination of these viewpoints to allow for efficiency in the construction and execution of programs.
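To make the contrast concrete, here is a minimal sketch that totals a list of numbers in three of the four styles. It is written in Python purely for illustration (the book's own examples use Pascal), and the names `total_imperative`, `total_functional`, and `Ledger` are hypothetical. The logic-oriented model is left out because Python has no built-in deduction mechanism.

```python
from functools import reduce


# Imperative model: the program is a sequence of modifications to storage.
def total_imperative(values):
    total = 0                    # a changeable data store
    for v in values:             # commands executed in sequence
        total = total + v        # each step modifies the stored value
    return total


# Functional model: the program describes the operations to perform,
# combining functions instead of updating variables.
def total_functional(values):
    return reduce(lambda acc, v: acc + v, values, 0)


# Object-oriented model: objects that mirror the problem send messages
# to each other. Here a hypothetical Ledger object receives "record" messages.
class Ledger:
    def __init__(self):
        self._entries = []

    def record(self, amount):
        self._entries.append(amount)
        return self              # returning self allows chained messages

    def balance(self):
        return sum(self._entries)


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5]
    print(total_imperative(data))                            # 14
    print(total_functional(data))                            # 14
    print(Ledger().record(3).record(1).record(4).balance())  # 8
```

All three compute the same kind of result; what differs is whether the program is phrased as state changes, as function application, or as messages between objects.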

1.2 Why Study Programming Languages?

There are five major benefits that you will receive through the study of the structure and models of programming languages.

1. You will improve your problem-solving ability. It is believed that facility with and understanding of natural language affects our ability to think and form ideas. Similarly, a thorough understanding of a programming language can increase our ability to think about approaches to problems. This is especially true when we have the ability to think about the problem using the various models of languages described in Section 1.1.

2. You will be able to make better use of a programming language. The study of programming language structures will give you a better understanding of the function and implementation of these structures. Then, when you are programming, you will be better able to use the language to the full degree of its functionality and to do so in an efficient way. Understanding the power of a language will enable you to utilize that power.

3. You will be able to choose an appropriate language more intelligently. In the 1960s and 1970s, programmers could seldom choose the language they used. All programmers working for a given organization were expected to program in the language that was the "standard" for that site. In many cases, there was only one computer and only one language implemented on that computer. Frequently, the local programmers knew how to program in just that one language as well. This situation has changed dramatically in the 1980s. In the first place, the improved technology of computers and language translators has made many languages available on present-day machines, even on the smallest personal computers. In addition, programmers who have completed courses such as the one for which this book is intended have experience with and an understanding of a variety of languages. Therefore, it is common practice today for the programmer to choose a language for a given project from among several possibilities. Whereas in the past this choice was dictated by the language the organization allowed, the one implemented on the computer system, or the knowledge of the programmer, today a programmer with an understanding of programming languages can choose a language that makes the problem solution easier and more efficient.

4. You will find it easier to learn new programming languages. As we study the development of programming languages, we find that new languages and enhancements to present languages are continually being introduced. We also see that throughout this development there are key concepts that remain constant as well as new facilities being added. An important result of the study of programming languages is the ability to learn new languages and new capabilities of existing languages as they are developed. Through a thorough understanding of programming language models, one can quickly assess a new language in comparison with those models and determine the ways the new language is the same and the ways it differs.

5. You will become a better language designer. This benefit is more important than it first appears. Few people ever have the desire or the opportunity to design their own programming language. Although you may be an exception, chances are that you will not be. However, if we hold our view that language is a means of communication between a person and a computer, then every computer system that is developed must have a language incorporated within it to provide for human/machine interaction. A good understanding of programming language principles can greatly assist in this interface design. In addition, many modern languages have the property that they are extensible in a variety of ways. This means that the programmer can enhance the language through the addition of new data types and operators. In these languages, every program actually is a new language design in the sense that the programmer has the power to enhance the original language.
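As an illustration of this kind of extensibility, the following minimal Python sketch defines a new data type and gives the built-in `+` and `*` operators a meaning for it. The `Vector` type is hypothetical and chosen only for the example; Python expresses operator definitions through special methods such as `__add__` and `__mul__`.

```python
class Vector:
    """A hypothetical user-defined 2-D vector type."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    # Defining __add__ gives the built-in "+" operator a meaning for Vector.
    def __add__(self, other):
        return Vector(self.x + other.x, self.y + other.y)

    # Defining __mul__ lets "v * k" scale a vector by a number.
    def __mul__(self, k):
        return Vector(self.x * k, self.y * k)

    # Equality compares components, so vectors can be tested like built-in values.
    def __eq__(self, other):
        return isinstance(other, Vector) and (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return f"Vector({self.x}, {self.y})"


if __name__ == "__main__":
    v = Vector(1, 2) + Vector(3, 4) * 2   # ordinary precedence still applies
    print(v)                              # Vector(7, 10)
```

Once the type is defined, code that uses it reads as though vectors had been part of the original language, which is the sense in which such a program extends the language.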

1.3 A Brief History of Programming Languages

It is helpful for understanding programming languages to have some appreciation of their history. Much has been written on this topic, including Sammet (1969) and Wexelblat (1981). In the historical overview that follows, we have limited our consideration to those languages that are either used extensively today or that originated important concepts for today's languages. Thousands of languages have been implemented, and most of these have made important contributions to the field. We have necessarily limited our consideration to 15 that we feel have had the greatest influence.

We have structured this history into three periods. The first is the period of about a decade beginning in 1955, during which the first higher-level languages were being developed with a wide variety of philosophies and concepts. The second period, 1965-1971, was a time of consolidation around the model of one language, ALGOL 60, with the development of a number of new languages derived from ALGOL 60 but extending it by adding important new features. In the final period, 1972 and after, the results of earlier research on languages were pulled together to introduce new models and approaches for programming languages.

The Early Languages

FORTRAN. FORTRAN has the distinction of being the first widely used higher-level programming language. Prior to its implementation, many people were skeptical about the possibility of a language being compiled successfully and efficiently. FORTRAN quickly erased such skepticism and became a very popular language that is still in heavy use more than 30 years later.

Most early languages were named by acronyms. We will indicate this in a language's name by writing it in all uppercase letters. We will also display the full name in parentheses after the first mention of the language's name.

FORTRAN (FORmula TRANslation) was designed and implemented in 1956 by John Backus at IBM. It was specifically designed for a single machine, the IBM 704, and still bears the marks of some of the idiosyncrasies of that machine. It has evolved through the years into a language that incorporates many modern language facilities while still maintaining its original character.

FORTRAN was designed for solving scientific problems, adopting an algebraic notation. Since it was a pioneer language, it made many contributions to language development. Included among these were (1) variables and assignment statements, (2) the concept of types, (3) modularity through the use of subprograms, and (4) formatted input/output.

COBOL. Although much early programming was scientific in nature, in the 1950s many business applications were being programmed as well, and the requirements of such programming were not well handled by FORTRAN. There was an evident need for a common language suitable for these business applications. In 1959, through the initiative of the U.S. Department of Defense, a committee was formed to develop a language to meet these needs. This committee consisted of representatives from computer manufacturers and the Department of Defense. The resulting language, COBOL (COmmon Business Oriented Language), was first implemented in 1960 and soon became the standard for business data processing applications.

Beyond the difference in their intended applications, there were two major differences in the ways FORTRAN and COBOL were developed. FORTRAN was the effort of one organization, IBM, whereas COBOL resulted from the cooperative effort of many organizations. Furthermore, FORTRAN was designed to be run on a single machine, and in fact the architecture of that machine affected many of the design decisions. COBOL, on the other hand, was designed independently of any specific computer, with the intention that it be implemented on all computers. Like FORTRAN, COBOL has evolved over the years as new standards have been developed. It has also continued to be used extensively over a 30-year period. Its strengths are in the manipulation of files and in handling fixed decimal data.

One of the primary objectives of COBOL was that its code be English-like. For this reason, COBOL programs tend to be very wordy, and many programmers find this cumbersome. However, the English-like property of COBOL was an early attempt at designing a language to facilitate the readability of programs, and thus an important contribution to the development of later languages. The other major contribution of COBOL was the introduction of the record, a heterogeneous data structure that became an important component of later languages.
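A record groups fields of different types under a single name. COBOL writes this as a hierarchy of level-numbered data items; the minimal sketch below shows the same idea as a Python dataclass instead, with hypothetical field names. It uses `Decimal` to echo COBOL's emphasis on fixed decimal data.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class EmployeeRecord:
    # Fields of different types grouped into one heterogeneous structure.
    employee_id: int
    name: str
    hourly_rate: Decimal     # decimal (not binary floating-point) values
    hours_worked: Decimal

    def gross_pay(self) -> Decimal:
        # Decimal arithmetic avoids binary floating-point rounding errors;
        # quantize rounds the result to two decimal places.
        return (self.hourly_rate * self.hours_worked).quantize(Decimal("0.01"))


if __name__ == "__main__":
    rec = EmployeeRecord(1042, "J. SMITH", Decimal("12.50"), Decimal("37.5"))
    print(rec.name, rec.gross_pay())   # J. SMITH 468.75
```

The point is the grouping: an integer, a string, and two decimal quantities travel together as one value, which arrays of a single element type could not express.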





ALGOL 60. The ALGOL (ALGOrithmic Language) 60 language had a European origin and was designed by an international committee in 1960. Although it never enjoyed the commercial popularity of FORTRAN and COBOL, it is the most important language of this era in terms of its influence on later language development. Like FORTRAN, ALGOL 60 was designed for use in scientific problem solving. Unlike FORTRAN, it was designed independently of an implementation. This was both a major asset and a major liability. Its machine independence permitted the designers to be more creative, but it made implementation much more difficult.

One of the greatest impacts ALGOL 60 had was a result of its description as found in Naur (1963). This report became the accepted definition of the language and was a model of clarity and completeness. A major contribution of this report was the introduction of BNF notation for defining the syntax of the language. This notation will be described in Chapter 2 and used throughout this book. ALGOL 60 was only used on a limited basis, mostly by research computer scientists in the United States and by Europeans. Its use in commercial applications was hindered by the absence of standard input/output facilities in its description and the lack of interest in the language by large computer vendors. ALGOL 60 did, however, become the standard for the publication of algorithms and had a profound effect on future language development.
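As a small preview of BNF, the ALGOL 60 report defines unsigned integers with the two rules quoted in the comments below. The Python function is a hypothetical recognizer, a sketch written for this illustration, that mirrors the rules directly: one branch per alternative, with the left-recursive alternative handled by recursion on the shorter prefix.

```python
# BNF rules (as written in the ALGOL 60 report):
#   <digit>            ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
#   <unsigned integer> ::= <digit> | <unsigned integer> <digit>
#
# The second rule is left-recursive: an unsigned integer is either a single
# digit, or a shorter unsigned integer followed by one more digit.

DIGITS = set("0123456789")


def is_digit(ch: str) -> bool:
    """<digit>: exactly one of the ten decimal digit characters."""
    return len(ch) == 1 and ch in DIGITS


def is_unsigned_integer(text: str) -> bool:
    """<unsigned integer>: one branch per BNF alternative."""
    if text == "":
        return False                  # no alternative derives the empty string
    if len(text) == 1:
        return is_digit(text)         # first alternative: <digit>
    # Second alternative: <unsigned integer> <digit>
    return is_unsigned_integer(text[:-1]) and is_digit(text[-1])


if __name__ == "__main__":
    print(is_unsigned_integer("1960"))   # True
    print(is_unsigned_integer("19a0"))   # False
```

The close correspondence between grammar rules and recognizer code is one reason BNF became so influential: a precise syntax definition can be turned almost mechanically into a checking program.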

Some of the major contributions of ALGOL 60 to later languages were (1) block structure: the ability to create blocks of statements that delimit the scope of variables and the extent of influence of control statements; (2) structured control statements: if-then-else and the use of a general condition for iteration control; and (3) recursion: the ability of a procedure to call itself.

LISP. Like FORTRAN and COBOL, LISP (LISt Processing) was developed for a specific application and is a language that is still used extensively today. LISP was developed by John McCarthy in the Artificial Intelligence Group at M.I.T. in the late 1950s as a language to support artificial intelligence research. It was first implemented in 1960 on the IBM 704. It has remained the primary programming language for artificial intelligence through the years. Unlike FORTRAN and COBOL, LISP has never been standardized by a national organization, and many dialects continue to exist. Common LISP was defined in 1981 as an informal standard, and work on a formal standard based on Common LISP is presently underway.

LISP pioneered the idea of nonnumeric or symbolic computing.

introduced as

its

LISP language

is

basic data structure the concept of the linked

functional in nature. This

means

It

also

list.

The

that rather than specify-

ing operations as a sequential set of statements, LISP specifies the invoca-

main device for model of computation will be defined and

tion of a function, using composition of functions as the

specifying multiple actions. This

explored in Chapter 10 along with a description of the LISP language. LISP also used the same basic construct, the S-expression, to represent

both data and program, thus allowing a program to be accessed as data and data to be executed as a program.
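The idea that a LISP program is itself list data can be sketched outside LISP. In the minimal Python illustration below (not LISP itself), a Python list stands in for an S-expression; the function name `evaluate` and the three-entry function table are assumptions of the sketch. The first element of each list is treated as the function, and it is applied to the values of the remaining elements, so nesting plays the role of function composition.

```python
import operator

# Hypothetical function table for this sketch; a real LISP has many more.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(expr):
    """Evaluate a nested-list stand-in for an S-expression."""
    if isinstance(expr, int):        # a numeric atom evaluates to itself
        return expr
    head, *args = expr               # (f a1 a2 ...): apply f to the arguments
    values = [evaluate(arg) for arg in args]   # inner lists are evaluated first
    result = values[0]
    for value in values[1:]:         # fold the function across the arguments
        result = FUNCTIONS[head](result, value)
    return result


program = ["+", 1, ["*", 2, 3]]      # stands for (+ 1 (* 2 3))
print(evaluate(program))             # 7

# The program is ordinary list data, so it can be inspected or rewritten
# before it is executed.
program[0] = "*"                     # now stands for (* 1 (* 2 3))
print(evaluate(program))             # 6
```

The final two lines show the point made above: the same structure is first treated as data (its head is replaced) and then executed as a program.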

APL. Still another language designed in the late 1950s was APL (A Programming Language), which was the creation of Kenneth Iverson. Iverson did his initial work on the language at Harvard and later continued it at IBM. APL was enthusiastically received by a number of programmers. It consisted of many powerful operators and a simple, mathematical notation. The large number of operators resulted in the requirement of a large character set. This latter requirement made implementation of the language difficult. The mathematical nature of APL discouraged programmers who were not adept at mathematics. The definition of APL was specified in Iverson (1962).

The primary data structure of APL is the array, and the language features operators that apply to an entire array. Iterative processing is accomplished by placing the data in an array and applying a single operator to that entire array. The variables of APL are untyped, taking on the type of the objects assigned to them.

APL is especially useful for mathematical and array processing applications. Because of its powerful operators and compact notation, a great pastime among APL programmers is the construction of one-line programs. Such programs actually use APL in a purely functional manner that very closely matches the functional model.

BASIC. The BASIC (Beginner's All-purpose Symbolic Instruction Code) language was developed at Dartmouth College by Thomas Kurtz and John Kemeny in the mid-1960s. Its objectives were to be easy for undergraduate students to learn and to use the interactive programming environment that was also under development at Dartmouth at that time. BASIC was quite popular in academic circles over the next decade, but its greatest popularity came with the arrival of the microcomputer in the mid-1970s. The marketers of microcomputers needed a language that would be useful to the consumer. The two major criteria were that the language be easy to learn and that it exploit the interactive environment provided by the microcomputer. Since BASIC was designed with these same two objectives a decade earlier, it was chosen as the language that was provided with all of the early microcomputers. Although the microcomputer gave BASIC an important place in the history of programming languages, BASIC contributed little to the development of programming language technology. Perhaps its greatest contribution was that it was one of the first languages to provide an interactive programming environment as a part of the language, including the interpretive execution of programs.

1.3 | A brief history of programming languages   7

ALGOL-based languages

The six languages described in the last section represent the first wave of language development. The next wave built on the ideas and concepts of that first wave. The most important languages to appear in the latter half of the 1960s were based on the key concepts of the ALGOL 60 language. Four of these ALGOL-based languages are described in the following paragraphs.

PL/I. The philosophy behind PL/I (Programming Language I), developed at IBM in the mid-1960s, was the replacement of the multitude of languages that were in use for specific applications with one general-purpose language. The approach used was to incorporate features from each of the earlier languages into PL/I. For example, PL/I included the block structure, control structures, and recursion from ALGOL 60; subprograms and formatted input/output from FORTRAN; file manipulation and the record structure from COBOL; dynamic storage allocation and linked structures from LISP; and the array operations from APL. PL/I, though highly promoted by IBM, never became as popular as its designers hoped. The major difficulty was a lack of cohesiveness in the language design, which contained many different features implemented in many different ways. The language was complex, difficult to learn, and difficult to implement. Two possible remedies for these problems were included in the language: the use of many defaults that could remain transparent to the user, and the intention that a programmer needed to learn only a subset of the language for a given application. These remedies proved to be inadequate, however. Two features of PL/I that have significantly impacted later language development are interrupt handling, the ability to execute specified procedures when an exceptional condition occurs, and multitasking, the specification of tasks that can be performed concurrently. These topics are explored in Chapter 7.

Simula 67. Simula 67 was developed by Ole-Johan Dahl and Kristen Nygaard at the Norwegian Computing Center in the early 1960s. The original work was based on ALGOL 60 and was intended to be a language for system description and simulation programming. The first version was called Simula 1. The designers soon discovered that this language had potential beyond simulation, and to realize this potential they extended the original design to Simula 67.

The major contribution of Simula 67 is the concept of class. A class is an encapsulation of data and procedures that can be instantiated in a number of objects. The class of Simula 67 is the forerunner of abstract data types as implemented in Ada and Modula-2 (see Chapter 8) and of classes from the object-oriented languages Smalltalk and C++ (see Chapter 12). The latter two languages also adopted from Simula 67 the hierarchy of classes with inheritance of components.

ALGOL 68. Although its name implies that it is an improved version of ALGOL 60, ALGOL 68 is actually a rather radical departure from its predecessor. It was designed to be a general-purpose language, as opposed to having the scientific orientation of ALGOL 60. ALGOL 68 never gained acceptance even to the limited level attained by ALGOL 60. This was, in part, because the original description (van Wijngaarden and others, 1969) was difficult to understand, using notation and terminology that was foreign to its readers.

The major design philosophy of ALGOL 68 is also its major contribution, namely, orthogonality. A language that is orthogonal has a relatively small number of basic constructs and a set of rules for combining those constructs. It is then possible to combine these constructs using any of the rules with predictable results. This approach is in opposition to that of PL/I, which included a large number of independent constructs.

Pascal. The most popular of this second wave of ALGOL-based languages is Pascal, developed by Niklaus Wirth in 1969 and named for the mathematician Blaise Pascal. Wirth's goal was to provide a language that is simple to learn, supportive of structured programming, and easily implemented. He intended it to be a language suitable for use in the teaching of programming. The defining document is provided by Jensen and Wirth (1978). By the early 1980s, Pascal had become by far the most commonly used language for teaching programming at the college level. By the mid-1980s it also had become popular as a production language on microcomputers. Pascal's flexible control structures, user-defined data types, and file, record, and set data structures have made it a model for many of the languages of the next stage of development.

Languages of the 1980s

Although all five of the languages discussed in this section were actually designed in the 1970s, they have all become major languages in the 1980s. The designers of these languages benefited greatly from experience with earlier languages, and they all include features that take advantage of modern hardware and software technology. The first two languages use entirely different models of computing than the earlier languages, while the last three continue development in the ALGOL line that we call the imperative model.

Prolog. Prolog was developed at the University of Marseilles in France in 1972. It was designed for artificial intelligence applications and is based on formal logic. The logic-oriented model of programming served as a basis for Prolog, but Prolog falls short of the model's ideal of clauses that

describe the problem and can be expressed in an order-independent way. This model and Prolog are described in Chapter 11. Prolog has become a competitor with LISP for artificial intelligence research. It received higher visibility when it was chosen as the language of the Japanese Fifth Generation Project.

Smalltalk. Smalltalk was developed by Alan Kay at the Xerox Palo Alto Research Center in the early 1970s as part of the Dynabook project. The two distinguishing features of Smalltalk are its environment and its strict use of the object-oriented model. The Smalltalk language is embedded within a graphical environment that includes pop-up menus, windows, and the use of a mouse device for input. This environment has served as the prototype for many modern programming environments, the most famous of which is that of the Apple Macintosh. Smalltalk is designed around the Simula 67 class concept and includes encapsulation, inheritance, and instantiation. All operations in Smalltalk consist of objects sending messages to other objects. Smalltalk and the object-oriented model are described in Chapter 12. This highly extensible and interactive language will undoubtedly have a major impact on future language development.

C. The language C was developed at Bell Laboratories in the early 1970s as a language for implementing the UNIX operating system. C is a powerful language with facilities for access to raw data stored in memory as well as access through data types and structures of the language. The standard is defined by Kernighan and Ritchie (1978). The objective of C is to provide a language that has access to low-level data and generates efficient code. The language has an extensive set of operators. As a result, programs are often expressed with compact code at the expense of readability. A description of the C language is given in Chapter 9. C has grown in popularity in conjunction with UNIX's acceptance as an operating system. C is an excellent language for the construction of portable systems programs.


Modula-2. Modula-2 was developed in the late 1970s by Niklaus Wirth, designer of Pascal, as an improvement to Pascal, especially for use in systems programming. Wirth developed the language as a part of the Lilith project, whose goal was the creation of an integrated hardware/software system. The result, as described in Wirth (1982), is an excellent general-purpose language that has replaced Pascal as a teaching language in many universities. Modula-2 is also described in Chapter 9. Modula-2 offers the following improvements over Pascal:

1. Modules can be used to implement abstract data types.
2. All control structures have a termination keyword.
3. Coroutines provide for interleaved execution.
4. Procedure types can be declared.

The modules of Modula-2 make it an excellent language for use in large software development projects.

Ada. In the early 1970s, the Department of Defense initiated a project to obtain a suitable programming language for the development of embedded systems. An embedded system is a computer system that operates as a part of a larger system. A large portion of the work done by the Defense Department is on these embedded systems. After evaluating existing languages against the criteria desired, the Defense Department decided that no language existed that met its needs and that a new language should be designed. In 1977, a competition was initiated among four contractors to design a suitable language. In 1979 the winning design was chosen and the resulting language was given the name Ada, after Ada Augusta, the Countess of Lovelace and daughter of Lord Byron. She was a collaborator with Charles Babbage in his work on the Analytical Engine in the nineteenth century and is considered by many to be the first programmer. The standard for Ada is given by the Reference Manual (American National Standards Institute, 1983). Every compiler that aspires to be called an Ada compiler must be validated by the Department of Defense. Subsets and supersets of the language are not permitted. Ada is primarily based on Pascal, but it uses the class concept of Simula 67 in its abstract data type facility, called a package, adopts the exception-handling features of PL/I, and provides an extensive tasking facility for concurrent processing.

Ada has been chosen as the primary example of an imperative language in this book because of its wide range of facilities and the fact that its structure is representative of the class of imperative languages.


CHAPTER 2

PRELIMINARY CONCEPTS

2.1 Language Specification
2.2 Language Translation
2.3 Language Design Characteristics
2.4 Choice of Language

This chapter presents a number of concepts that are necessary for a complete understanding of the remainder of the text.

Section 2.1 describes several languages that can be used to describe the syntax of other languages. These are called metalanguages and will prove useful throughout the book for expressing language syntax.

The structure of a programming language is frequently affected by the process used to translate programs written in that language into an executable form. Section 2.2 presents an overview of that translation process.

Section 2.3 outlines a number of characteristics of a programming language that are desirable for enhancing its effectiveness. It provides standards against which languages and their various features can be judged.

Criteria useful for choosing the appropriate language for a given application are listed in Section 2.4.

2.1 Language specification

The description of a language is commonly broken into two parts: syntax and semantics. The syntax of a language is the set of rules that determines which constructs are correctly formed programs and which are not. The semantics of a language is the description of the way a syntactically correct program is interpreted or carried out. For example, the syntax of Pascal tells us that

    a := b;

forms a correct assignment statement, while the interpretation of the statement as "replace the value of a with the current value of b" is a result of Pascal's semantics.

In our discussions in this text, we will use a specific formal tool, known as EBNF, to describe language syntax. Language semantics will be described in a much less formal way, usually in English. The reason for this is that although there are tools for describing semantics, these tools are rather complex and abstract, and their use is beyond the scope of this book. Our formal language specifications will therefore concentrate on syntax.

Language specification is important for two reasons. First, it is useful in describing a language, and this is the way that we will use it throughout the text to describe both real and theoretical programming language constructs. But language specifications are also useful in the verification of the validity of programs, since they give us a set of rules against which an allegedly legal program can be tested. Such syntactic verification is typically carried out by language translation programs such as compilers.

Grammars

The syntax of a language is described by a grammar, which is a set of rules that defines all of the valid constructs that can be accepted in the language. The basic elements of a grammar are:

1. Set of terminal symbols. These are the atomic (nondivisible) symbols that can be combined to form valid constructs in the language. Terminal symbols most commonly are a set of characters, though some languages may consider certain character strings to be symbols as well.
2. Set of nonterminal symbols. These symbols are not included in the language itself, but are symbols used to represent intermediate definitions within the language as defined by productions. These nonterminal symbols represent syntactic classes or categories.
3. Set of productions. A production is a definition of a nonterminal symbol. It is of the form

      x ::= y

   where x is a nonterminal symbol and y is a sequence of symbols, each of which can be either terminal or nonterminal.
4. Goal symbol. One of the set of nonterminal symbols is specified as the goal symbol. This is also sometimes called the distinguished symbol or the start symbol.

Two rules must be obeyed for the above components to form a grammar. They are

1. Every nonterminal symbol must appear to the left of the ::= of at least one production.
2. The goal symbol must not appear to the right of the ::= of any production.

14   Chapter 2 | Preliminary Concepts

To illustrate this concept of a grammar, let us construct a simple grammar for describing a calculator language. This grammar is given in Figure 2.1. Note that this grammar has 16 terminal symbols and 8 nonterminals.
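The four components and the two rules lend themselves to a mechanical check. The sketch below encodes the calculator grammar of Figure 2.1 as Python data, confirms the symbol counts just stated, and tests both rules; the variable names and the function `is_grammar` are assumptions of this illustration. It also checks that every production uses only declared symbols, a sanity condition the definition implies rather than states.

```python
# Hypothetical encoding of the calculator grammar of Figure 2.1. Each
# production is (left-hand nonterminal, tuple of right-hand symbols);
# list position i holds production i + 1 of the figure.
TERMINALS = set("0123456789+-*/=.")
NONTERMINALS = {"calculation", "expression", "value", "number",
                "unsigned", "digit", "sign", "operator"}
GOAL = "calculation"
PRODUCTIONS = [
    ("calculation", ("expression", "=")),
    ("expression", ("value",)),
    ("expression", ("value", "operator", "expression")),
    ("value", ("number",)),
    ("value", ("sign", "number")),
    ("number", ("unsigned",)),
    ("number", ("unsigned", ".", "unsigned")),
    ("unsigned", ("digit",)),
    ("unsigned", ("digit", "unsigned")),
    *[("digit", (d,)) for d in "0123456789"],      # productions 10-19
    ("sign", ("+",)), ("sign", ("-",)),             # productions 20-21
    *[("operator", (op,)) for op in "+-*/"],        # productions 22-25
]


def is_grammar(terminals, nonterminals, productions, goal):
    """Check that the components obey the two rules for forming a grammar."""
    symbols = terminals | nonterminals
    well_formed = all(lhs in nonterminals and all(s in symbols for s in rhs)
                      for lhs, rhs in productions)
    # Rule 1: every nonterminal is on the left of at least one production.
    rule1 = nonterminals <= {lhs for lhs, _ in productions}
    # Rule 2: the goal symbol is on the right of no production.
    rule2 = all(goal not in rhs for _, rhs in productions)
    return goal in nonterminals and well_formed and rule1 and rule2


print(len(TERMINALS), len(NONTERMINALS), len(PRODUCTIONS))    # 16 8 25
print(is_grammar(TERMINALS, NONTERMINALS, PRODUCTIONS, GOAL))  # True
```

Adding a production such as `value ::= calculation` would break rule 2, and declaring a nonterminal with no production would break rule 1; either change makes `is_grammar` return `False`.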

Also note that every nonterminal except for the goal symbol has multiple productions defining it. Although such multiple productions are not required, they are common and represent alternate definitions. You can also see that recursive productions are permitted, that is, productions where the nonterminal being defined is also found in its own definition on the right-hand side of the production.

There are two ways that a grammar such as the one found in Figure 2.1 can be used. The first is to generate valid programs in the language. If we begin with the goal symbol and at each step substitute some definition for a nonterminal, proceeding until all remaining symbols are terminal symbols, we have generated a valid program. Examine the generation sequence found in Figure 2.2. At each step, the left-most nonterminal was replaced by a definition from one of the productions. Since multiple productions might apply to any nonterminal, there are multiple choices that can be made. The choice of a production to apply is arbitrary in these cases. The choice of the left-most nonterminal for expansion was also arbitrary. An infinite number of valid calculations can be generated in this way.
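The generation procedure of Figure 2.2 can be replayed in a few lines. This sketch re-encodes the Figure 2.1 productions, applies a given list of production numbers, always to the left-most nonterminal, and records each intermediate string; the function name `leftmost_derivation` is illustrative, and the production numbers are the ones listed in Figure 2.2.

```python
# Productions of Figure 2.1 as (left side, right side); index i is production i + 1.
PRODUCTIONS = (
    [("calculation", ["expression", "="]),
     ("expression", ["value"]),
     ("expression", ["value", "operator", "expression"]),
     ("value", ["number"]),
     ("value", ["sign", "number"]),
     ("number", ["unsigned"]),
     ("number", ["unsigned", ".", "unsigned"]),
     ("unsigned", ["digit"]),
     ("unsigned", ["digit", "unsigned"])]
    + [("digit", [d]) for d in "0123456789"]       # productions 10-19
    + [("sign", ["+"]), ("sign", ["-"])]           # productions 20-21
    + [("operator", [op]) for op in "+-*/"]        # productions 22-25
)
NONTERMINALS = {lhs for lhs, _ in PRODUCTIONS}


def leftmost_derivation(goal, choices):
    """Expand the left-most nonterminal with each chosen production in turn."""
    current = [goal]
    forms = [list(current)]
    for number in choices:
        lhs, rhs = PRODUCTIONS[number - 1]
        i = next((k for k, sym in enumerate(current) if sym in NONTERMINALS), None)
        if i is None:
            raise ValueError("no nonterminal left to expand")
        if current[i] != lhs:
            raise ValueError(f"production {number} does not define {current[i]}")
        current[i:i + 1] = rhs                     # replace the nonterminal
        forms.append(list(current))
    return forms


steps = [1, 3, 4, 6, 9, 12, 8, 15, 24, 2, 4, 7, 8, 11, 8, 15]  # from Figure 2.2
for form in leftmost_derivation("calculation", steps):
    print(" ".join(form))
```

The last line printed is `2 5 * 1 . 5 =`, the calculation 25*1.5= of Figure 2.2. Choosing a different production number at any step yields a different valid calculation, or a `ValueError` if the chosen production does not define the left-most nonterminal.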

The second way a grammar can be used is in the reduction of a valid program back to the goal symbol through the reverse application of productions. This verifies that a string of terminal symbols is indeed a program in the language defined by the grammar. This derivation is only possible if a prudent choice is made of the sequence of productions to be applied, as in Figure 2.3. For example, an initial choice of sign for + would have led to an early dead end in our reduction. Furthermore, if this process is attempted on a string that is not a valid program, the goal symbol can never be reached. Figure 2.4 shows this through an example where every step is uniquely determined and the final string matches none of the productions' right-hand sides.
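Figure 2.3 answers the membership question bottom-up, by reduction. For comparison, the sketch below answers the same question top-down with recursive descent, a different technique swapped in here because, for this particular grammar, one character of lookahead is enough to choose every alternative, so no trial-and-error is needed. It assumes calculations are written without spaces, folds the two expression alternatives and the recursion on unsigned into an optional part and a loop, and uses illustrative names (`Recognizer`, `ParseError`, `is_calculation`).

```python
class ParseError(Exception):
    """Raised when the input cannot be derived from the grammar."""


DIGITS = set("0123456789")
SIGNS = {"+", "-"}
OPERATORS = {"+", "-", "*", "/"}


class Recognizer:
    """Top-down (recursive-descent) recognizer for the calculator grammar."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        # The next unread character, or "" at the end of the input.
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, allowed):
        ch = self.peek()
        if ch == "" or ch not in allowed:
            raise ParseError(f"unexpected {ch!r} at position {self.pos}")
        self.pos += 1

    def calculation(self):   # production 1: expression =
        self.expression()
        self.expect({"="})

    def expression(self):    # productions 2-3: value [operator expression]
        self.value()
        if self.peek() in OPERATORS:
            self.pos += 1
            self.expression()

    def value(self):         # productions 4-5: [sign] number
        if self.peek() in SIGNS:
            self.pos += 1
        self.number()

    def number(self):        # productions 6-7: unsigned [. unsigned]
        self.unsigned()
        if self.peek() == ".":
            self.pos += 1
            self.unsigned()

    def unsigned(self):      # productions 8-9: digit {digit}
        self.expect(DIGITS)
        while self.peek() in DIGITS:
            self.pos += 1


def is_calculation(text):
    """True if `text` (written without spaces) is a calculation."""
    recognizer = Recognizer(text)
    try:
        recognizer.calculation()
    except ParseError:
        return False
    return recognizer.pos == len(text)   # nothing may follow the final "="


print(is_calculation("6+3/12="))   # True
print(is_calculation("6+="))       # False
```

As with the reduction in Figure 2.4, an invalid string such as `6+=` never reaches the goal symbol: the recognizer stops at the first character no production can account for.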

2.1 | Language specification   15

FIGURE 2.1 Grammar for calculator language

Terminal Symbols:     0 1 2 3 4 5 6 7 8 9 + - * / = .
Nonterminal Symbols:  calculation expression value number unsigned digit sign operator

Productions:
 1. calculation ::= expression =
 2. expression  ::= value
 3. expression  ::= value operator expression
 4. value       ::= number
 5. value       ::= sign number
 6. number      ::= unsigned
 7. number      ::= unsigned . unsigned
 8. unsigned    ::= digit
 9. unsigned    ::= digit unsigned
10. digit       ::= 0
11. digit       ::= 1
12. digit       ::= 2
13. digit       ::= 3
14. digit       ::= 4
15. digit       ::= 5
16. digit       ::= 6
17. digit       ::= 7
18. digit       ::= 8
19. digit       ::= 9
20. sign        ::= +
21. sign        ::= -
22. operator    ::= +
23. operator    ::= -
24. operator    ::= *
25. operator    ::= /

FIGURE 2.2 Generation of a calculation using the grammar in Figure 2.1

Production Applied   Current String
                     calculation
 1                   expression =
 3                   value operator expression =
 4                   number operator expression =
 6                   unsigned operator expression =
 9                   digit unsigned operator expression =
12                   2 unsigned operator expression =
 8                   2 digit operator expression =
15                   25 operator expression =
24                   25 * expression =
 2                   25 * value =
 4                   25 * number =
 7                   25 * unsigned . unsigned =
 8                   25 * digit . unsigned =
11                   25 * 1 . unsigned =
 8                   25 * 1 . digit =
15                   25 * 1 . 5 =

FIGURE 2.3 Verification that 6 + 3/12 = is a calculation

FIGURE 2.4 Reduction of invalid string

BNF and EBNF

In this section we will describe a language for expressing grammars. Since this is a language for describing languages, we call it a metalanguage. The metalanguage we describe is BNF, which stands for Backus-Naur Form. This language was created to express the syntax of ALGOL 60 and has become the standard metalanguage. We will use BNF along with three extensions to describe syntax throughout this text. The extended language will be called EBNF, for Extended BNF.

The metalanguage described in the previous section is what we will define as BNF, with one notational addition. In BNF, nonterminals will have their names enclosed in angle brackets (< >) to allow a string of characters in a nonterminal name to be distinguished from the corresponding string of terminal characters. For example, the nonterminal symbol <id> can be distinguished from id, a string of two terminal symbols. See Figure 2.5 for a BNF definition of the calculator grammar.

FIGURE 2.5 Calculator grammar in BNF

<calculation> ::= <expression> =
<expression> ::= <value>
<expression> ::= <value> <operator> <expression>
<value> ::= <number>
<value> ::= <sign> <number>
<number> ::= <unsigned>
<number> ::= <unsigned> . <unsigned>
<unsigned> ::= <digit>
<unsigned> ::= <digit> <unsigned>
<digit> ::= 0
<digit> ::= 1
<digit> ::= 2
<digit> ::= 3
<digit> ::= 4
<digit> ::= 5
<digit> ::= 6
<digit> ::= 7
<digit> ::= 8
<digit> ::= 9
<sign> ::= +
<sign> ::= -
<operator> ::= +
<operator> ::= -
<operator> ::= *
<operator> ::= /

We are now ready to add three features to BNF to transform it into our EBNF. These features do not add any capabilities to the language, but rather enhance the compactness of the expression. The first new feature, alternation, is the use of the or symbol to express alternate definitions for the same nonterminal within a single production. This symbol is a vertical bar (|). For example, the set of productions

   <a> ::= 1
   <a> ::= 3
   <a> ::= 5
   <a> ::= 7
   <a> ::= 9

can be compressed using this notation to

   <a> ::= 1|3|5|7|9

We also add optionality, by use of the brackets ([ ]) to specify an optional item, and repetition, by use of the braces ({ }) to specify a repeated item. The brackets indicate zero or one occurrence of the enclosed specification, while the braces indicate zero or more repetitions of the enclosed specification.


For example, the production

   <goal> ::= [a] b {c}

specifies the following strings as valid goals:

   b  ab  bc  abc  bcc  abcc  bccc  abccc  ...

This indicates that there are zero or one a's followed by one b followed by zero or more c's. Although these brackets and braces are simply defined, their interpretation can be quite complicated when they are nested and/or include the or metasymbol. Consider, for example, the production

   <goal> ::= [{a} b] {d | e}

Although we see that this expresses a production compactly, some clarity has been lost in the process, since the definition is rather complex.

When used with care, these extensions can improve not only the compactness but also the clarity of grammar definitions. Figure 2.6 shows how these extensions can be used in defining our calculator grammar. As you can see, this notation frequently eliminates the need for recursion and new alternatives.
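One way to see exactly what a bracketed or braced production allows is to translate it by hand into a regular expression. That works here only because the two example productions contain nothing but terminal symbols; it is not a general EBNF converter. The sketch below assumes the translation [x] to (?:x)?, {x} to (?:x)*, and x | y to (?:x|y), then enumerates the short strings each production accepts; the names `SIMPLE`, `NESTED`, and `strings_up_to` are illustrative.

```python
import re
from itertools import product

# Hand translations of the two example productions (terminal symbols only):
#   [x] -> (?:x)?    {x} -> (?:x)*    x | y -> (?:x|y)
SIMPLE = re.compile(r"a?bc*")              # <goal> ::= [a] b {c}
NESTED = re.compile(r"(?:a*b)?(?:d|e)*")   # <goal> ::= [{a} b] {d | e}


def strings_up_to(pattern, alphabet, max_len):
    """Every string over `alphabet` of length <= max_len matched in full."""
    found = []
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if pattern.fullmatch(candidate):
                found.append(candidate)
    return found


print(strings_up_to(SIMPLE, "abc", 3))    # ['b', 'ab', 'bc', 'abc', 'bcc']
print(strings_up_to(NESTED, "abde", 2))   # the list begins with '' (empty string)
```

The first result reproduces the start of the list given for [a] b {c}. The second shows one reason the nested production is hard to read: because both of its outer parts are optional, it accepts the empty string, which is easy to overlook on inspection.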

FIGURE 2.6 Calculator grammar in EBNF

<calculation> ::= <expression> =
<expression> ::= <value> [<operator> <expression>]
<value> ::= [<sign>] <number>
<number> ::= <unsigned> [. <unsigned>]
<unsigned> ::= <digit> {<digit>}
<digit> ::= 0|1|2|3|4|5|6|7|8|9
<operator> ::= + | - | * | /
<sign> ::= + | -

One purpose of EBNF is to enhance the clarity of expression for grammars. There is one difficulty that arises in its use when a symbol used in EBNF (a metasymbol) is also to be a terminal symbol in a grammar. For example, if | were a terminal symbol in the grammar being defined, then the production

   <x> ::= a | b

could be interpreted in either of two ways. First, the nonterminal x might have two alternative definitions, a or b, if | is interpreted as a metasymbol. If, however, | is interpreted as a terminal symbol of the grammar, this specifies a single definition consisting of three terminal symbols.

We will avoid this confusion by using the following convention: When a metasymbol is also a terminal symbol of the grammar being defined, the symbol will be underlined when it is to represent the terminal symbol and will represent the metasymbol otherwise. Using this convention, the preceding production would represent two alternative single-symbol definitions of x. If we wished to express the three-symbol definition, it would be written with the | underlined:

   <x> ::= a |̲ b

Syntax diagrams

A somewhat different approach to expressing grammars was used by Wirth in his definition of Pascal. This tool is called the syntax diagram. It expresses productions as two-dimensional directed graphs whose nodes are symbols. The possible paths through the graph represent the possible sequences of symbols that define the nonterminal of the production. Terminal symbols are represented by ovals and nonterminals by rectangular nodes. Syntax diagrams have the advantage of using two dimensions to enhance understandability. Their disadvantage is the difficulty in generating the diagrams using a linear input device such as a keyboard. Figure 2.7 shows the syntax diagram for our calculator grammar.
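Because a syntax diagram is a directed graph, recognizing a string amounts to finding a path through the graph whose nodes match the input. The sketch below hand-encodes three of the calculator diagrams (digit, unsigned, number) as successor lists; the node names, the encoding, and the functions `walk` and `accepts` are assumptions of this illustration. A label that names another diagram plays the role of a rectangle (nonterminal) and is followed recursively; any other label is an oval (terminal) that must match the input text.

```python
DIGITS = "0123456789"

# Hypothetical encoding: each diagram maps node ids to (label, successor ids).
# "START" lists the entry nodes and "END" is the exit. Labels naming another
# diagram are rectangles (nonterminals); all other labels are ovals (terminals).
DIAGRAMS = {
    "digit": {"START": list(DIGITS), **{d: (d, ["END"]) for d in DIGITS}},
    "unsigned": {"START": ["d"], "d": ("digit", ["d", "END"])},  # loop back
    "number": {"START": ["u1"],
               "u1": ("unsigned", ["dot", "END"]),
               "dot": (".", ["u2"]),
               "u2": ("unsigned", ["END"])},
}


def walk(name, text, pos):
    """Return every input position at which a path through `name` can exit."""
    diagram = DIAGRAMS[name]
    exits = set()
    pending = [(node, pos) for node in diagram["START"]]
    visited = set()
    while pending:
        node, p = pending.pop()
        if (node, p) in visited:
            continue
        visited.add((node, p))
        if node == "END":
            exits.add(p)
            continue
        label, successors = diagram[node]
        if label in DIAGRAMS:                  # rectangle: follow the sub-diagram
            reached = walk(label, text, p)
        elif text.startswith(label, p):        # oval: match the terminal
            reached = {p + len(label)}
        else:
            reached = set()
        for q in reached:
            pending.extend((succ, q) for succ in successors)
    return exits


def accepts(name, text):
    """True if some path through diagram `name` consumes all of `text`."""
    return len(text) in walk(name, text, 0)


print(accepts("number", "25"), accepts("number", "1.5"), accepts("number", "1."))
# True True False
```

The loop on node "d" is how the diagram expresses the repetition that BNF needs recursion (productions 8 and 9) to state.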

Problems with specifications

The grammars that can be described by BNF, and equivalently by EBNF, are known as context-free grammars. This means that the valid definitions of a nonterminal symbol are independent of the context in which the symbol is found. Most programming languages cannot be completely specified by a context-free grammar because they contain some rules that are context-sensitive, meaning that their nonterminal definition depends on context. For example, the common requirement that a variable must be declared before it is used cannot be expressed in a context-free grammar, since the validity of a variable depends on whether its declaration is in the context of its use or not. Although formal tools do exist to express context-sensitive

FIGURE 2.7 Syntax diagram for calculator grammar
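The declare-before-use rule discussed earlier in this section depends on information a context-free production cannot carry: whether an earlier part of the program declared the name. Translators commonly enforce such rules outside the grammar, for example with a symbol table. In the sketch below, a program is assumed to have already been reduced to a list of hypothetical ("declare", name) and ("use", name) pairs, and a single set records the context seen so far; the function name `undeclared_uses` is illustrative.

```python
def undeclared_uses(statements):
    """Return (index, name) for each use that precedes any declaration of name."""
    declared = set()          # the context: names declared so far
    problems = []
    for index, (kind, name) in enumerate(statements):
        if kind == "declare":
            declared.add(name)
        elif kind == "use" and name not in declared:
            problems.append((index, name))
    return problems


program = [("declare", "x"), ("use", "x"), ("use", "y"), ("declare", "y")]
print(undeclared_uses(program))   # [(2, 'y')]
```

Each statement in `program` is well formed on its own; only the order in which the statements appear makes the third one invalid, which is exactly the kind of dependence on context that a context-free grammar cannot express.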

z tthen = z + 1 z else = x + u; X u

:

:

:

:

:

end;

7. A certain magazine distributor stores each subscription it distributes in a database and assigns each subscription a code number. That code number looks like:

   JIP081#301870189018901900190019178

Here the first three letters of the code are the first three letters of the subscriber's last name. The next three characters are the first three digits in the street address. This field is omitted if there is no street address (for example, if the address is a post office box). The next character is always a number sign (#). The next character(s) indicates the number of subscription terms, which can be an integer from 1 to 99. The following four characters indicate the starting date of the first subscription term in the format mmyy. A pair of start/end dates is given for each subscription term. The last two characters in the code are a two-digit ID code for the magazine.

In the preceding sample code, the fields would be interpreted as follows:

   JIP    First three letters of subscriber's last name
   081    First three digits of street address
   #      Number sign
   3      Number of subscription terms
   0187   Start date of term 1
   0189   End date of term 1
   0189   Start date of term 2
   0190   End date of term 2
   0190   Start date of term 3
   0191   End date of term 3
   78     Magazine ID code

Give a specification for a language that recognizes all forms of subscription code numbers using BNF and syntax diagrams (not EBNF). Be sure your language accepts exactly what is specified earlier, nothing more or less. You may have to invent a few names for nonterminal symbols. You may assume that <letter> and <digit> are nonterminal symbols and are predefined for your use.



8. Consider your specification for Exercise 7. For each of the following strings, give a derivation and a parse tree.

   a. JIP081#301870189058906900591129278
   b. SMI#10719128733

9. The following EBNF specification is ambiguous:

   <statement> ::= if <cond> then <statement>
                 | if <cond> then <statement> else <statement>
                 | <other>

Assume that <cond> and <other> are defined elsewhere and that they describe a boolean expression and other statements, respectively.

   a. Show that the specification is ambiguous.
   b. Give a different specification that describes the identical language but that is unambiguous. Do not give further definitions to <cond> and <other>.

10. Consider this language specification:

: : : :

:

=

:

=

:

:



* *

A



= {} = C

Exercises

41

symbols is {c,*, A }, and E is the only nonterminal symbol. The start symbol is obviously E. Give a derivation and parse tree for the string a.

The

set of terminal

*

{c

b.

c

A

c

A

*

The preceding

c}

specification

is

ambiguous. Give a proof of

its

ambiguity.

11. Give a BNF specification for the language that consists of an even number of a's followed by an odd number of b's.

Laboratory exercises

1. Write an interpreter for the calculator language described in this chapter.
2. Write a program that translates a grammar written in EBNF into a BNF grammar.
3. Write a program that accepts as input an expression in the calculator language and generates the parse tree for that expression.

CHAPTER 3

INFORMATION BINDING

3.1 Data Objects
3.2 Scalar Data Types
3.3 Execution Units and Scope of Binding
3.4 Scope of Binding

The imperative model of programming languages was created to mimic, as closely as possible, the actions of computers at the machine language level. At that level, computers operate with two major units: the central processing unit (CPU), where computations are performed, and the memory, where data are stored. The typical unit of execution in machine language, which may or may not be a single instruction, consists of the following four steps:

1. Obtain the addresses of locations for a result and one or more operands.
2. Obtain the operand data from the operand location(s).
3. Compute result data from the operand data.
4. Store the result data in the result location.

For example, a simple assignment statement such as

A := B + C;

would be executed as follows:

1. Obtain the addresses of A, B, and C.
2. Obtain the data from the addresses of B and C.
3. Compute the result of B + C.
4. Store the result in the location of A.
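The four-step unit of execution can be mimicked in a few lines of Python. This is only a sketch: the dictionaries `symbol_table` (name-to-address bindings) and `memory` (address-to-value bindings), the particular addresses, and the function `execute_add` are hypothetical stand-ins for the bookkeeping a compiler and loader would do.

```python
# Hypothetical name-to-address bindings (fixed, for example, at load time).
symbol_table = {"A": 100, "B": 104, "C": 108}
# Hypothetical storage: address-to-value bindings.
memory = {100: 0, 104: 7, 108: 5}

def execute_add(result, left, right):
    """Run `result := left + right` as the four-step unit of execution."""
    # Step 1: obtain the addresses of the result and operand locations.
    result_addr = symbol_table[result]
    left_addr, right_addr = symbol_table[left], symbol_table[right]
    # Step 2: obtain the operand data from the operand locations.
    left_val, right_val = memory[left_addr], memory[right_addr]
    # Step 3: compute the result data from the operand data.
    value = left_val + right_val
    # Step 4: store the result data in the result location.
    memory[result_addr] = value

execute_add("A", "B", "C")   # A := B + C;
print(memory[symbol_table["A"]])  # prints 12
```

A language with names instead of addresses simply hides steps 1 and 2 behind the symbol table, which is the abstraction described next.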

Imperative programming languages have abstracted away the use of addresses in favor of names, but otherwise have retained the preceding four steps as a standard program unit. This unit of execution has become the fundamental execution unit of imperative languages and is called the assignment statement. The BNF form for such a statement is given by

<assignment statement> ::= <name> <assignment operator> <expression>

where the first component, name, represents the result location, the second component is an operator that signifies assignment (:= in Pascal and Ada, for example), and the third component is an expression that specifies the names of the operands and the computation to be performed.

Fundamental to the performance of this assignment is the establishment and use of a number of bindings. In step 1 of the preceding execution model, a binding between names and locations must be used to obtain the locations of the operands and result. In step 2, a binding between location and value must be referenced to establish operand values. And finally, in step 4, a new binding between the result location and its computed value must be established. A view of programming languages from the perspective of bindings is presented in Section 3.1.

In step 3 of our model unit of execution, the computations depend on the interpretation of the data and of the operations that are defined. Programming languages relate data and operations through the mechanism known as type. In this chapter we discuss the role of type in the binding and computation process and review the properties of several fundamental types. A final consideration related to the binding of names and locations is the ability to establish and change these bindings within the program. The part of the program in which a name-location binding is preserved is called the scope of the binding. In Section 3.3 we examine ways in which scope of binding can be defined within a language.

3.1 Data objects

The abstraction that we will use to express the assignment statement activity in imperative languages is the data object. A data object is defined as a four-tuple (L, N, V, T), where L is the location, N the name, V the value, and T the type of the object.

Bindings

We call the assignment to one of these components a binding. The implication here is that these bindings can be changed at certain times. Figure 3.1 shows a visualization of a data object and its bindings. The four bindings are all represented here as lines from the data object to the corresponding objects to which it is bound.

FIGURE 3.1 Data object and bindings (diagram showing the type, value, location, and name bindings of a data object, together with the type space, storage space, and identifier space)

The storage space, from which location bindings are selected, is the set of virtual storage locations available within the computer system on which the program will be executed. This space is completely invisible to the programmer, who only needs to know that the binding takes place and when, and does not need to know the specific location to which the data object is bound.

The time at which a binding takes place is often an important consideration. There are three times when bindings can occur:

1. Compile time: when the program is being translated into machine language
2. Load time: when the machine language program generated by the compiler is being assigned to particular locations within the storage space of the computer
3. Run time: when the program is being executed

For convenience, the location binding commonly occurs at load time. We will see later that bindings to locations can occur during run time as well.
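A data object and its four bindings can be modeled as a small record. The sketch below is an illustration under stated assumptions, not part of any language definition: the class `DataObject`, its method `bind_value`, the address used, and the use of Python types to stand in for the type space are all hypothetical. It also shows the type binding restricting which values may later be bound.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class DataObject:
    """The four-tuple (L, N, V, T) of a data object."""
    location: Optional[int] = None        # L: typically bound at load time
    name: Optional[str] = None            # N: bound at compile time by a declaration
    value: Any = None                     # V: bound and rebound at run time
    data_type: Optional[type] = None      # T: usually bound at compile time

    def bind_value(self, new_value: Any) -> None:
        # The type binding restricts the values the object may be bound to.
        if self.data_type is not None and not isinstance(new_value, self.data_type):
            raise TypeError(
                f"{self.name}: {new_value!r} is not of type {self.data_type.__name__}")
        self.value = new_value

# Compile time: name and type bindings from a declaration.
count = DataObject(name="COUNT", data_type=int)
# Load time: a location binding (hypothetical address).
count.location = 0x2000
# Run time: value bindings, which may change repeatedly.
count.bind_value(1)
count.bind_value(2)
```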


Identifiers

The identifier space of a language is the collection of all possible names that can be given to data objects. In addition, program units are also bound to names selected from the same identifier space. The definition of the identifier space is an important component of a language. In this text we will highlight language components in boxes that will include the description of the component for some example languages, frequently Ada or Pascal. This will also provide you with a template for expressing these components for new languages that you might encounter. The laboratory exercises will frequently ask you to do this.

Identifier Space
Language: Ada
Definition: <identifier> ::= <letter> { [underline] <letter or digit> }
Usage: The domain of names available for data objects and program units is expressed in this way. Uppercase and lowercase are not distinguishable.

The accompanying box defines the identifier space as the set of all character strings of length one or longer beginning with a letter, containing only letters, digits, or underscores (with no two underscores adjacent), and ending with a letter or digit. We will use BNF notation to express language components, where possible, to avoid ambiguity.
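The BNF rule in the box translates directly into a regular expression. The following is a minimal sketch, assuming ASCII letters only (as in the Ada 83 rule); the names `ADA_IDENTIFIER`, `is_identifier`, and `same_identifier` are illustrative, not part of any Ada tool.

```python
import re

# <letter> { [underline] <letter or digit> }, with ASCII letters assumed.
ADA_IDENTIFIER = re.compile(r"[A-Za-z](?:_?[A-Za-z0-9])*")

def is_identifier(text: str) -> bool:
    """True if text belongs to the Ada identifier space."""
    return ADA_IDENTIFIER.fullmatch(text) is not None

def same_identifier(a: str, b: str) -> bool:
    """Uppercase and lowercase are not distinguishable, so compare case-folded."""
    return a.upper() == b.upper()

print(is_identifier("ITEM_ID"))   # True
print(is_identifier("ITEM__ID"))  # False: adjacent underscores
```

Because the optional underline must always be followed by a letter or digit, the pattern rejects leading, trailing, and doubled underscores, exactly as the BNF does.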

Name binding for a data object typically occurs at compile time at the point where a declaration is encountered by the compiler. Note that name binding becomes a more complex issue when one is dealing with aggregate data objects such as arrays and records. Such aggregate data objects, although bound to a single name, are bound to multiple locations. In addition, while they are data objects themselves with type, value, and location bindings, the individual components of aggregate structures do not have a binding to a simple name in the identifier space. Rather, each component is identified by a compound binding of some form. For example, if ITEM is a record variable with one of its components being ID, then the simple name ITEM is bound to several locations corresponding to the separate component data objects of the aggregate object ITEM. Moreover, the compound name ITEM.ID is bound to a single data object. We will ignore such complications in our present discussion by limiting our attention to simple, or scalar, data objects with simple name bindings.

Types

The type space of a language is the set of all possible types that can be bound to a simple data object. In Section 3.2, the most common types available in imperative languages will be discussed. Each type is itself a space of possible values to which a data object of that type may be bound and a set of operations that apply to objects of that type. Thus the type and value bindings are two phases of the same binding, with the type binding restricting the possible values to which an object can be bound and defining the set of operations that can be applied to the data object. We can therefore think of a type as a set of values and a set of operations.

The type is usually bound to a data object at compile time through a type declaration. Declarations are statements in a programming language that create a data object by binding it to a name and type (see box).

Data Object Declaration
Language: Ada
Definition: <object declaration> ::= <identifier list> : [constant] <type> [:= <expression>];
Usage: One or more data objects are created at compile time from binding each identifier in the identifier list to the type indicated. If := <expression> is present, a binding of the data object to the value indicated by the expression occurs at compile time. If constant is present, that value binding will remain in effect for the life of the data object.

To illustrate the effect of a declaration, consider the following sample declarations:

A : integer;
B : integer := 0;
C : constant integer := 0;

The effects of these three declarations on bindings at compile time are indicated in Figure 3.2. Double lines indicate bindings that will hold for the life of the data object. Single lines indicate bindings that may be modified at run time.

This Ada declaration form permits not only the declaration of variables, but also their initialization and the declaration of constants, as shown in the three cases of Figure 3.2.

FIGURE 3.2 Sample declarations (a), (b), and (c) and their bindings (diagrams showing the type, location, value, and name bindings of each declared data object into the type space, storage space, and identifier space)

Operators, functions, and expressions

Another important consideration is the syntax a language uses to specify the operations performed when computing results. In general, operators are of two types, monadic and dyadic. Monadic operators have one operand, whereas dyadic operators have two. The standard format is for monadic operators to be expressed in a prefix form with the operator preceding the operand:

<monadic operation> ::= <operator> <operand>

The dyadic operations are commonly expressed in infix form with the operator between the two operands:

<dyadic operation> ::= <operand> <operator> <operand>

Functions are another form of operation, but can have an arbitrary number of operands. The form of function calls in an imperative language is typically the prefix form using parentheses. This is expressed by

<function call> ::= <function name> ( <operand> {, <operand>} )

A further consideration in the evaluation of expressions is the order in which operations are performed. For example, the expression

6 + 2 * 3

could be evaluated as 24 if the addition is performed first. Or it could be evaluated as 12 if the multiplication is performed first. The most common rules for determining order of evaluation in an imperative language with, for example, operations +, -, *, /, and **, are

1. Operations inside parentheses are performed first.
2. Next, operations are performed in the following order:
   first: **
   second: * and /
   third: + and -
3. Operations at the same level in step 2 are performed from left to right.

These rules, which are summarized in the accompanying box, can be represented in BNF as previously described in Chapter 2.

Expression Evaluation
Language: Ada
Operators: Order determined by:
   1. parentheses
   2. precedence
   3. left to right if equal precedence
Functions: <function name> ( <operand> {, <operand>} )

Although these precedence definitions are common among imperative languages, it should be noted that some languages use other strategies for expression evaluation, such as prefix notation for dyadic operations, no precedence levels, or right-to-left evaluation among operators of equal precedence. The precedence rules used by any given language are arbitrary, but they are necessary and important in ensuring the clarity of expressions.
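The three evaluation rules can be made concrete with a small precedence-climbing evaluator. This is a sketch under assumptions: it handles only unsigned integer literals, the dyadic operators +, -, *, /, and **, and parentheses (no monadic operators), and the names `PRECEDENCE`, `tokenize`, `apply_op`, and `evaluate` are hypothetical. It follows the generic rule 3 literally, so `2 ** 3 ** 2` groups left to right; Ada itself rejects that expression without parentheses.

```python
import re

# Precedence levels from the rules above; a higher number binds tighter.
PRECEDENCE = {"**": 3, "*": 2, "/": 2, "+": 1, "-": 1}

def tokenize(text):
    # "**" is listed before "*" so it is read as a single token.
    return re.findall(r"\d+|\*\*|[-+*/()]", text)

def apply_op(op, a, b):
    if op == "**":
        return a ** b
    if op == "*":
        return a * b
    if op == "/":
        return a / b  # true division, for simplicity
    if op == "+":
        return a + b
    return a - b

def evaluate(text):
    tokens = tokenize(text)
    pos = 0

    def primary():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = expression(1)  # rule 1: parentheses first
            pos += 1               # skip the closing ")"
            return value
        return int(tok)

    def expression(min_prec):
        nonlocal pos
        left = primary()
        # Rule 2: keep absorbing operators whose level is at least min_prec.
        while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
            op = tokens[pos]
            pos += 1
            # Rule 3: the right operand may contain only tighter-binding
            # operators, so equal-precedence operators group left to right.
            right = expression(PRECEDENCE[op] + 1)
            left = apply_op(op, left, right)
        return left

    return expression(1)

print(evaluate("6 + 2 * 3"))    # 12
print(evaluate("(6 + 2) * 3"))  # 24
print(evaluate("8 - 3 - 2"))    # 3
```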

Type binding and type checking

Although types are bound to data objects at compile time in most languages, it is possible for this binding to occur at run time as well. The languages APL and Smalltalk implement such dynamic binding. Whereas declarations are used to bind data objects to types at compile time, dynamically typed languages need no declarations. The type of a data object is determined by the type of its value. Therefore, whenever a data object is bound to a different value, it takes on the type of that newly bound value.

Type checking is the process of determining the type of a specified data object. The performance of such checking by the computer can be of great assistance in the detection and prevention of errors. For example, if the expression A * B appears in a program, type checking can determine the types of data objects A and B and whether the operator * applies to two objects of these types. Type checking can occur at compile time, at run time, or not at all. Languages that perform all type checks at compile time are called strongly typed. Ada is an example of a strongly typed language. Pascal, on the other hand, is not strongly typed, although it can check almost all types at compile time. Two exceptions to static type checking in Pascal are the checking of subrange types and the use of variant records. Consider, for example, the following Pascal program:

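program VariantDemo;   { header added so the fragment compiles; the name is illustrative }
type
   soft = record
      case test : boolean of
         true  : (first : 1..20);
         false : (second : char)
   end;
var
   x, y : soft;
   c    : char;
begin
   c := x.second;           { is the false variant of x in effect? }
   y.first := 2 * x.first;  { out of range if x.first > 10 }
end.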

It is impossible for the Pascal compiler to determine if typing is correct in this program, for two reasons. First, when x.second is used, it is not possible to check at compile time that the second variant part of x is in effect. Second, at the assignment statement to y.first, there will be a type violation if x.first is greater than 10, and it is not possible to determine this at compile time.

In a language that is not strongly typed, there are two possible alternatives for those situations where types cannot be checked at compile time. First, the types might not be checked at all. This is what typically happens in the preceding cases for Pascal. This places the burden of type checking on the programmer and may lead to serious undetected errors. The second possibility is that the type may be checked at run time. Such dynamic type checking can be expensive in terms of execution time because a check must be performed every time a data object is referenced. It is also expensive in its use of memory since a type indicator must be stored as a part of every data value.
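Both costs of dynamic type checking show up in a sketch where every value carries a type indicator. The names `Tagged` and `checked_multiply` are hypothetical, and the string tags stand in for whatever type codes a real implementation would store.

```python
from dataclasses import dataclass

@dataclass
class Tagged:
    """A run-time value that carries its own type indicator."""
    tag: str       # stored alongside every value: the memory cost
    value: object

def checked_multiply(a: Tagged, b: Tagged) -> Tagged:
    # This test runs on every evaluation of A * B: the execution-time cost.
    if a.tag != "integer" or b.tag != "integer":
        raise TypeError(f"* is not defined for {a.tag} and {b.tag}")
    return Tagged("integer", a.value * b.value)

product = checked_multiply(Tagged("integer", 6), Tagged("integer", 7))
print(product)  # Tagged(tag='integer', value=42)
```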

Dynamically typed languages, where types are bound at run time, can only check types at run time. Statically typed languages, where types are bound at compile time, can check types at either compile time or run time.

Another major issue in dealing with types is the way in which a data object of one type is converted to another type if the two types are mixed in the evaluation of an expression. Operators usually require both operands to be of the same type. This includes the assignment operator whose left operand is the result data object and whose right operand is an expression that represents a value to be stored in the location bound to the result data object.

The two common strategies for converting an operand to a consistent type are implicit and explicit conversion. Implicit type conversion is often called type coercion. Such coercions may occur automatically when certain type mixtures occur. For example, real and integer operand mixtures might result in the integer operand being converted to a real of equivalent value, if possible. This often is indeed possible because many integers have an equivalent real representation. The only exception to this case is when the integer is the left operand of an assignment. In this case, the real is truncated to an integer before assignment occurs, since the opposite coercion would require changing the type binding of the target data object, a change that is not normally permitted. Languages that permit implicit coercions require a list of all pairs of types for which implied coercion is permitted. Nonpermissible pairs are flagged as errors.
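A coercion table and the assignment exception can be sketched as follows. The assumptions are labeled: Python's `int` and `float` stand in for integer and real, `PERMITTED` is a hypothetical table for an imaginary language, truncation toward zero (`math.trunc`) is assumed for the real-to-integer case, and `coerce_operands` and `coerce_for_assignment` are illustrative names.

```python
import math

# Hypothetical table of permitted implicit coercions, as (from, to) pairs.
# Python's int and float stand in for integer and real.
PERMITTED = {(int, float)}

def coerce_operands(a, b):
    """Bring the operands of a dyadic operation to a consistent type."""
    ta, tb = type(a), type(b)
    if ta is tb:
        return a, b
    if (ta, tb) in PERMITTED:
        return tb(a), b
    if (tb, ta) in PERMITTED:
        return a, ta(b)
    # Nonpermissible pairs are flagged as errors.
    raise TypeError(f"illegal mixture of {ta.__name__} and {tb.__name__}")

def coerce_for_assignment(target_type, value):
    """The target's type binding is fixed, so only the value may change type."""
    source = type(value)
    if source is target_type:
        return value
    if (source, target_type) in PERMITTED:
        return target_type(value)
    if (source, target_type) == (float, int):
        return math.trunc(value)  # the real is truncated to an integer
    raise TypeError(f"cannot assign {source.__name__} to {target_type.__name__}")

print(coerce_operands(2, 3.5))          # (2.0, 3.5)
print(coerce_for_assignment(int, 7.9))  # 7
```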

Type conversion

The second option, explicit type conversion, makes the mixing of types illegal. Instead, explicit functions are required to specify the conversion from one type to another. These functions may be given unique names for each conversion, like INTEGER_TO_REAL, or, as in Ada, may permit conversion by a function whose name matches the name of the resulting type and that will accept operands of any allowable type. For example, in Ada, the function FLOAT converts from any allowable type to float type. Such conversions are normally allowed between derived types and their parent types or between various numeric types. Pascal permits some explicit and some implicit type conversions. For instance, integer-to-real conversions are implicit, though the explicit functions round and trunc must be used to convert from real to integer.

A question arises as to what actually constitutes different types and when there is a need for type conversion. Consider the following declarations in Ada:

type T1 is range 0..10;
type T2 is range 0..10;
A : T1;
B, C : T1;
D : T2;

The question is, which of the variables A,B,C,D are considered to be of the same type? Another way of stating this is to ask, which of the following assignments are legal?

=B

=C

Three possible definitions of type equivalence are given here in order of increasing restrictiveness:

1. Domain equivalence: Two data objects are of equivalent type if they have the same domain of possible values associated with their types. This is also known as structural equivalence.
2. Name equivalence: Two data objects are of equivalent type if they are typed by the same name.
3. Declaration equivalence: Two data objects are of equivalent type if they are bound to their type in the same declaration.

Under domain equivalence, A, B, C, and D are all of equivalent type and can be operands that share the same operator without any type conversion. Under name equivalence, A, B, and C are all of equivalent type, but D is not, since it was bound to a type T2, which differs from the name, T1, of the type of the other three variables. Under declaration equivalence, only B and C are of equivalent type, since they are bound to a type in the same declaration.
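The three definitions differ only in which part of a variable's type binding they compare. In the sketch below, `Binding`, `BINDINGS`, and the three predicate functions are hypothetical. Each variable records the type name used, the domain of that type (a Python `range` standing in for 0..10), and which declaration bound it, following the Ada declarations above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Binding:
    """How one variable was bound to its type."""
    type_name: str   # the type name used in the declaration
    domain: range    # the set of values of that type
    decl_no: int     # which declaration statement performed the binding

# Hypothetical encoding of:  A : T1;  B, C : T1;  D : T2;
BINDINGS = {
    "A": Binding("T1", range(0, 11), 1),
    "B": Binding("T1", range(0, 11), 2),
    "C": Binding("T1", range(0, 11), 2),
    "D": Binding("T2", range(0, 11), 3),
}

def domain_equivalent(x, y):
    return BINDINGS[x].domain == BINDINGS[y].domain

def name_equivalent(x, y):
    return BINDINGS[x].type_name == BINDINGS[y].type_name

def declaration_equivalent(x, y):
    return BINDINGS[x].decl_no == BINDINGS[y].decl_no

print(domain_equivalent("A", "D"), name_equivalent("A", "D"))  # True False
```

Each predicate accepts a subset of the pairs the previous one accepts, which is the "increasing restrictiveness" ordering in the list above.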

The previous discussion has ignored the issue of anonymous types, that is, those types associated directly with variables through declaration without being given a name. For example, in Ada we might declare

E, F : INTEGER range 0..10;
G : INTEGER range 0..10;

The interpretations of these declarations using domain and declaration equivalence are obvious. However, the interpretation of name equivalence for anonymous types needs more careful definition. The rule used by Ada is that no two objects of anonymous type are name-equivalent. This means that E, F, and G are all considered to be of different types under name equivalence.

A difficulty arises when the domain of one type is a subset of the domain of another. The operations between the two types might have a logical definition without type conversion, even though type conversion will be required under any of the definitions of type equivalence. For example, consider the following situation in a hypothetical language:

A : INTEGER;
B : 0..100;

The assignment statement A := B;, though obviously meaningful, will be illegal in a language that requires explicit type conversion. One way of dealing with this situation is through the definition of subtypes. A subtype of a given parent type includes the same operators as the parent type, although its domain of values may be a subset of the parent type's domain. Normally, operations, including assignment, are permitted between an operand of the subtype and an operand of the parent type. For example, the preceding assignment could be legalized by the following Ada declarations:

subtype T is INTEGER range 0..100;
A : INTEGER;
B : T;

This Ada construct enforces operational equivalence of data objects bound to the subtype with data objects bound to the parent type.

Subtypes and derived types

Another problem can arise with subtypes, as illustrated by the preceding declarations followed by the assignment

B := A;

In this case, the assignment may be illegal since A is bound to the parent type and may have a value outside the domain of the subtype. For example, if A is bound to 101, the assignment is not legal. A language might handle this in three possible ways:

1. Check that the value of A is in the subtype domain at run time and flag the assignment as an error if not. This is the policy that Ada follows.
2. Flag such an assignment as an error at compile time even though it might execute correctly. This would avoid extra run-time checks.
3. Ignore the issue altogether by not checking subtypes.
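Strategy 1, the run-time check, can be sketched as a checker that every assignment to a subtype variable passes through. The names `ConstraintError` and `subtype` are hypothetical; `ConstraintError` merely plays the role of Ada's predefined CONSTRAINT_ERROR exception.

```python
class ConstraintError(Exception):
    """Plays the role of Ada's predefined CONSTRAINT_ERROR."""

def subtype(low, high):
    """Return a checker for a subtype whose domain is low..high."""
    def check(value):
        # Strategy 1: test the value at run time, when the assignment happens.
        if not low <= value <= high:
            raise ConstraintError(f"{value} is outside {low}..{high}")
        return value
    return check

T = subtype(0, 100)   # subtype T is INTEGER range 0..100;

A = 101               # A : INTEGER, currently bound to 101
try:
    B = T(A)          # B := A;  where B : T
except ConstraintError as err:
    print("assignment rejected:", err)  # assignment rejected: 101 is outside 0..100
```

Strategy 2 would instead reject every parent-to-subtype assignment before the program runs, and strategy 3 would simply omit the call to `check`.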

An opposite situation to that addressed by subtypes occurs in languages that employ domain equivalence. Occasionally a programmer wishes to define two types with identical domains but make them operationally incompatible. For example, suppose we have

type TIME is new FLOAT;
type LENGTH is new FLOAT;
DURATION : TIME;
DISTANCE : LENGTH;

Although DURATION and DISTANCE share the same domain, it would not make sense to add them together. The reason for using different types TIME and LENGTH is to enforce this incompatibility. A language with domain equivalence should, therefore, permit an override of its basic mechanism to permit the declaration of separate, incompatible types with the same domain. One type will have the same properties as the other type but will not be equivalent in the sense of permitting a mixture of the two types without type conversion. This is called a derived type and is implemented in Ada as shown in the preceding declarations of TIME and LENGTH. A derived type may also define a subtype that is not operationally equivalent to its containing type.
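The effect of TIME and LENGTH can be imitated with float subclasses that refuse to be mixed. This is a sketch, not how Ada implements derived types: the classes `Derived`, `Time`, and `Length` are hypothetical, and only addition is guarded here.

```python
class Derived(float):
    """A float-valued type that refuses to mix with any other type in +."""
    def __add__(self, other):
        if type(other) is not type(self):
            raise TypeError(
                f"cannot add {type(self).__name__} and {type(other).__name__}")
        return type(self)(float(self) + float(other))

    __radd__ = __add__

class Time(Derived):      # type TIME is new FLOAT;
    pass

class Length(Derived):    # type LENGTH is new FLOAT;
    pass

duration = Time(2.5)
distance = Length(2.5)
total = duration + Time(1.0)       # same derived type: allowed, gives Time(3.5)
converted = Time(float(distance))  # explicit conversion is still possible
```

Evaluating `duration + distance` raises `TypeError`, even though both values come from the same domain.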

The type issues just discussed, as they are addressed in Ada, are summarized in the following box.

Type Issues
Language: Ada
Type conversion: explicit, using the name of the target type as the conversion function
Type equivalence: name
Subtypes:

p4 -> p2 -> p3 -> p1

11. Operators that are defined on more than one type are called overloaded operators. Identify the operators that are overloaded in Pascal.

Laboratory exercises

In Laboratory Exercises 1-10, you are to work with a language or languages whose implementation you have available to you, and determine the answer to the following questions by constructing a sample program or programs and observing the results.

1. What is the definition of the identifier space for your language? What characters are allowed, and in which positions? Is there a maximum identifier length? Are uppercase and lowercase letters distinguishable?
2. What are the expression evaluation rules for your language? What are the operator precedences? Is the direction among operators of equal precedence left to right or right to left? Write the BNF representation for these rules.
3. Does your language perform any implicit type conversions?
4. What does your language use for type equivalence: domain, name, or declaration?
5. If your language has a subtype capability, which of the three strategies is implemented when an expression of the parent type is assigned to a variable of the subtype and the value of the expression is not within the domain of the subtype?
6. What is the domain of allowable integers in your language? What is the domain of allowable reals?
7. Does your language evaluate AND and OR with or without short circuiting?
8. What happens when a pointer data object is deallocated while another data object is pointing to the same location?
9. What is the effect of assigning the same constant to two different enumerated types? Is it permitted? Are ambiguities discovered at compile time or run time?
10. Do blocks have a run-time location binding?

In Laboratory Exercises 11-12, you are to work with an implementation of the Pascal programming language that is available to you.

11. There are at least two ways to violate type checking in Pascal. Find them and give at least one example of each. Clue: They can be found in variant records and for loops.
12. Pascal's version of type equivalence might be called "pseudostructural equivalence."
   a. Construct at least three tests that demonstrate Pascal's use of structural equivalence. Make the tests different (for example, using different data structures).
   b. Pascal does not use true structural equivalence, however. Find where it does not use structural equivalence to check for compatible types and provide at least two examples (again with different data structures).

CHAPTER 4: CONTROL STRUCTURES

4.1 Conditional Structures
4.2 Iterative Structures
4.3 Unconstrained Control Statements

A programming language must do more than specify the actions that a computer should take. It must also specify the order in which those actions should occur. Because of the sequential nature of machine language execution, virtually all imperative languages follow the same pattern by adopting the sequential execution of statements. This means that after a statement has completed execution, the default for the choice of the next statement to be executed is the next physical statement in the program. One of the obvious benefits of a programming language is its ability to modify the order of statement execution by presenting alternatives to the sequential mode. Facilities of a language that permit this are called control structures. Although the choice of control structures has remained remarkably consistent from one imperative language to another, there are many subtleties in their definition and use that can cause confusion as a programmer moves between languages. Furthermore, control structures have been the subject of considerable controversy, both in language design and language usage. In this chapter we will pursue the important issues involving these control structures.

The conditional control structure is examined in Section 4.1. It determines the next unit to be executed based on the result of a test or set of tests. Iteration is the repetitive execution of a program unit, and we study its many forms in Section 4.2. The most powerful control structure is the statement that directly specifies the next unit to be executed. This form has a direct counterpart in machine language and is called a transfer, branch, or goto statement. It will be discussed in Section 4.3.


4.1 Conditional structures

Conditional control structures determine the next block of statements to be executed based on the result of a test or a sequence of tests. Such structures are usually implemented through an if statement. In this section we will examine four forms of such statements.

Simple conditional

The simplest form of conditional control structure performs a single test whose result it uses to determine whether or not to execute a specified block of statements. The two parts of the structure, therefore, are the boolean expression that defines the condition, and the block of statements that will be performed if the expression evaluates to true. The typical form of the simple conditional is

if <condition> then <block>

The most frequent variation from language to language is the method of blocking the statements. Pascal and ALGOL 60, for example, consider the block of statements to be a single statement that can be made into a compound statement by enclosing multiple statements between begin and end. Ada and Modula-2, however, have specific keywords for ending the block of statements, Ada terminating with end if, Modula-2 with end. Neither of these languages needs a beginning-of-block marker, since then functions in this capacity.

Every modern imperative programming language permits the extension of the simple conditional if statement to a two-alternative structure. This direct extension takes the form

if <condition> then <block> else <block>

It is the addition of the else that makes apparent one disadvantage of the Pascal method of blocking statements in conditionals. It leads to the well-known dangling else problem in the case of nested conditionals. To illustrate this, consider the following Pascal fragment.

if x>0 then if x0 then

if x>0 then

begin if x

when -> when -> fi

This construct differs in several ways from the if-elsif construct. The above form evaluates all conditions, while the if-elsif evaluates only until a true condition occurs. Furthermore, in the case where more than one of the conditions is true, the alternative whose statement sequence is executed is chosen nondeterministically. This means that there is no rule for choosing among several possibilities, and any of them could be chosen. If none of the conditions is true, the statement is considered to be in error. The conditions are often called guards, and the entire construct is called a guarded command.

Although Ada does not implement this construct in the generality of Dijkstra's definition, it does have a specialized version of it that is used to implement concurrency control. We will study this in Chapter 7.
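The three distinguishing properties of the guarded command (all guards evaluated, nondeterministic choice among the true ones, error when none is true) fit in a short sketch. The function names `guarded_if` and `maximum` are hypothetical, and `random.choice` is an assumption standing in for "any of them could be chosen"; a real implementation could use any selection rule.

```python
import random

def guarded_if(alternatives, rng=random):
    """Guarded 'if': alternatives is a list of (guard, action) pairs."""
    # Unlike if-elsif, every guard is evaluated before any choice is made.
    ready = [action for guard, action in alternatives if guard()]
    if not ready:
        # No guard is true: the statement is in error.
        raise RuntimeError("guarded if: no guard is true")
    # Any true alternative may be chosen; here one is picked at random.
    return rng.choice(ready)()

def maximum(x, y):
    # When x = y both guards are true, and either choice gives the right answer.
    return guarded_if([
        (lambda: x >= y, lambda: x),
        (lambda: y >= x, lambda: y),
    ])

print(maximum(3, 7))  # 7
```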

4.2 Iterative structures

One of the most powerful features of an imperative language is its capability of specifying the repetition of a block of statements. Structures for doing this are called iteration structures. These are extremely important and raise many interesting issues. For convenience, we will refer to the block of statements to be repeated as the body of the iteration.

Nonterminating iteration

The simplest form of iterative structure is a nonterminating iteration that specifies the indefinite repetition of its body. Usually, student programmers are cautioned to avoid such iterations because they will never terminate and hence execute forever. In practice, however, there are frequently blocks of statements that execute in just this way. A communications program, for example, may have the following nonterminating structure:

do forever
   check for character sent
   if character is sent then process character
end do

Perhaps due to the bad reputation of nonterminating iterations, many languages have no direct means of expressing them. Nevertheless, they can be simulated very easily. In Pascal, for example, one might write

while true do begin
   <body>
end;

Other languages, such as Ada, have a direct form that expresses a nonterminating iteration. In Ada, one writes

loop
   <body>
end loop;

Pretest iteration

One fundamental capability of iterations is the ability to terminate based on the result of a test. Two factors can vary in the specification of this test: its placement and its logical direction. The placement of the test can be before, after, or in the middle of the body of the iteration. We will call these three choices pretest, posttest, and in-test iterations and devote this and the next two sections to their description. The logical direction of an iteration specifies whether the test is a termination test, where a true condition indicates the iteration should halt, or a continuation test, where a false condition completes the iteration. In Pascal, for example, the termination condition follows the keyword until, and the continuation test is found in the while clause.

while loop

This iteration

then

tests

first tests

the condition.

If

it

is

true,

executes the body.

it

It

the condition again and repeats the process until the condition

becomes false. Note that, for the iteration to terminate, it must be possible body to change the state of the condition. Also note that it is possible the for condition to be initially false, in which case the body is not executed for the

at all.

Once

again,

ment block

an important issue

that serves as the

is

the syntax for delimiting the state-

body of the iteration. As with the i f state-

ment, Pascal assumes that the body

is

one statement

that

can be expanded

compound statement using begin and end. The technique of specifying a block of statements by special delimiters is used by Ada, whose syntax into a

for the pretest iteration

is

while loop

end loop; The word loop

signifies the

beginning of the body and end loop marks

the end.
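As a small illustration, here is the pretest behavior described above, written in Python. Python's while statement is a pretest iteration with a continuation test. The function name halvings is hypothetical and is not taken from the text. The sketch only shows that the condition is checked before every pass: the body must be able to falsify the condition, and it may run zero times.

```python
# Illustrative sketch; the function name is hypothetical.
def halvings(n: int) -> int:
    """Count the passes needed to reduce n to 0 by integer halving.

    The while loop is a pretest iteration with a continuation test:
    the condition is evaluated before each pass of the body.
    """
    count = 0
    while n >= 1:      # continuation test, evaluated before the body
        n //= 2        # the body changes the state of the condition
        count += 1
    return count


print(halvings(8))   # 4 passes: 8 -> 4 -> 2 -> 1 -> 0
print(halvings(0))   # 0 passes: the condition is false initially
```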

Posttest iteration

Whereas a pretest iteration provides a test before entry to the body of the iteration, a posttest iteration places the test after the body. Generally, the pretest iteration is preferred over the posttest form because posttesting forces one execution of the body before the test is first performed. In Pascal, the available posttest structure is the repeat-until construct, which uses a termination condition for its test. The repeat-until construct also presents an inconsistency in Pascal in its method of delimiting the iteration body. This is the only control structure for which Pascal abandons the compound statement convention in favor of the use of keyword delimiters. In this case, the words repeat and until are used to delimit the iteration body.

The equivalence of the pretest and posttest iteration constructs is obvious. For example, in Pascal the general pretest iteration

    while <condition> do begin
        <body>
    end;

can be replaced by the equivalent, though more awkward,

    if <condition> then
        repeat
            <body>
        until not <condition>;

Similarly, the general posttest iteration

    repeat
        <body>
    until <condition>;

is equivalent to

    <body>;
    while not <condition> do begin
        <body>
    end;
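Python, like Ada, has no posttest construct. The usual idiom is a while True loop whose last statement is the termination test. The sketch below writes one summation both ways and then applies the two rewritings given above. First, a pretest loop is expressed as a guarded posttest loop. Second, a posttest loop is expressed as one copy of the body followed by a pretest loop on the negated condition. All function names are hypothetical. The loop at the end checks that each pair agrees over a small range of inputs.

```python
# Illustrative sketch; all function names are hypothetical.
def pretest_sum(n: int) -> int:
    # while n > 0 do begin total := total + n; n := n - 1 end
    total = 0
    while n > 0:
        total += n
        n -= 1
    return total


def pretest_as_posttest(n: int) -> int:
    # if n > 0 then repeat <body> until not (n > 0)
    total = 0
    if n > 0:
        while True:
            total += n
            n -= 1
            if not (n > 0):    # termination test placed after the body
                break
    return total


def posttest_sum(n: int) -> int:
    # repeat total := total + n; n := n - 1 until n <= 0
    total = 0
    while True:
        total += n
        n -= 1
        if n <= 0:             # termination test placed after the body
            break
    return total


def posttest_as_pretest(n: int) -> int:
    # <body>; while not (n <= 0) do <body>
    total = 0
    total += n                 # one copy of the body before any test
    n -= 1
    while not (n <= 0):
        total += n
        n -= 1
    return total


for k in range(-3, 6):
    assert pretest_sum(k) == pretest_as_posttest(k)
    assert posttest_sum(k) == posttest_as_pretest(k)

print(pretest_sum(-2), posttest_sum(-2))   # 0 -2: the posttest body ran once
```

The last line shows the practical difference: when the condition already holds on entry, only the posttest form executes its body.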

The inclusion of both constructs in a language is for the convenience of the programmer, who may choose the one most natural for any given iterative structure. Ada provides no posttest iteration construct, but one can be simulated by placing the in-test conditional at the end of the iteration body. This will be illustrated in the next section.

In-test iteration

Occasionally it is desirable to perform the test for terminating an iteration neither before nor after the execution of the body, but rather somewhere in the middle. This is called an in-test iteration, and it is a situation where it is often argued that the use of a goto statement is justified. A much more restrictive construct than the goto can be used for this purpose, however: one whose only purpose is to exit from an iteration in the middle of its body. This has the advantage of permitting a flexible exit from a loop while not allowing indiscriminate branching within the program. This approach also avoids the need to use statement labels. We will examine the capabilities of such an in-test construct by discussing its Ada implementation, the exit statement. The format of this exit statement is

    exit [when <condition>];

The condition, in this case, is a termination condition. The general form of the in-test iteration in Ada is therefore

    loop
        <top body>
        exit when <condition>;
        <bottom body>
    end loop;

where the top and bottom bodies are those statements executed respectively before and after the test is performed.
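The general in-test form maps directly onto a Python while True loop with a break in the middle; the break plays the role of Ada's exit when. In this sketch, the function and its sentinel convention are hypothetical. The top body reads the next value, the test checks for the sentinel, and the bottom body accumulates. As a result, the top body runs one more time than the bottom body.

```python
from typing import Iterable


# Illustrative sketch; the function name and sentinel convention are hypothetical.
def sum_until_sentinel(values: Iterable[int], sentinel: int = -1) -> int:
    """Sum values up to, but not including, the first sentinel.

    An in-test iteration: top body, exit test, bottom body.
    Exhausting the input is treated as reaching the sentinel.
    """
    source = iter(values)
    total = 0
    while True:
        value = next(source, sentinel)   # top body: runs before the test
        if value == sentinel:            # like Ada's "exit when <condition>;"
            break
        total += value                   # bottom body: runs after the test
    return total


print(sum_until_sentinel([3, 4, -1, 10]))   # 7: values after the sentinel are never read
```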

A further extension of this construct is available to permit the termination of more than one nested iteration at the same time. For example, consider

    loop
        loop
            exit when <condition>;
        end loop;
    end loop;

In this situation, the exit statement is contained in both the outer and inner iterations. The definition of the Ada exit is that it exits from only the immediate containing iteration. A facility is also provided in Ada, however, to permit the exit from several layers of iterations at the same time. This is made possible by labeling iterations and specifying the outermost iteration to be exited by naming its label in the exit statement. The preceding example could exit from the outer iteration in the following way:

    OUTER:
    loop
        loop
            exit OUTER when <condition>;
        end loop;
    end loop OUTER;

Note that the label is attached to the loop statement with a colon separator and, in addition, must be appended to the end loop statement. This form of label is used for reference by exit statements only and may only be attached to loop statements. A separate syntax is used for general statement labels, and this syntax will be described in Section 4.3.

The exit statement of Ada gives the programmer power beyond the simple in-test iteration. It provides the capability of multiple-exit iterations, since there is no limit to the number of exits that can occur within an iteration. This practice is discouraged as being counter to the goal of writing understandable programs, however.
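Python's break, like an unlabeled Ada exit, leaves only the innermost loop, and Python has no labeled form of break. The sketch below shows two common substitutes for exit OUTER when searching a grid; the function names are hypothetical. The first uses a boolean flag tested in the outer loop, which costs an extra variable and a second test. The second returns from an enclosing function, which leaves every loop at once.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]


# Illustrative sketch; both function names are hypothetical.
def find_with_flag(grid: Grid, target: int) -> Optional[Tuple[int, int]]:
    """Leave both loops using an extra boolean variable."""
    found = None
    done = False                  # extra variable standing in for a labeled exit
    for i, row in enumerate(grid):
        for j, value in enumerate(row):
            if value == target:
                found = (i, j)
                done = True
                break             # leaves only the inner loop
        if done:
            break                 # a second test leaves the outer loop
    return found


def find_with_return(grid: Grid, target: int) -> Optional[Tuple[int, int]]:
    """Leave both loops at once by returning from the function."""
    for i, row in enumerate(grid):
        for j, value in enumerate(row):
            if value == target:
                return (i, j)     # plays the role of "exit OUTER"
    return None


grid = [[1, 2], [3, 4], [5, 3]]
print(find_with_flag(grid, 3), find_with_return(grid, 3))   # (1, 0) (1, 0)
```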

The exit statement also permits the programmer to simulate the action of posttest iterations, a feature not directly included in Ada. An Ada posttest iteration can be written

    loop
        <body>
        exit when <condition>;
    end loop;

Another construct, often confused with the exit, is one that terminates the present pass of an iteration and begins the next pass, rather than terminating the entire iteration. Ada does not include such a statement, but the language C provides a continue statement for this purpose.

Fixed count iteration

The oldest of the iteration structures is the fixed count iteration, which traces its roots back to the do loop of FORTRAN. This iteration is terminated after executing a specified number of times rather than when a specified condition occurs.

Fixed count iterations are controlled by a variable known as the iteration control variable (ICV). The general form of such an iteration is

    for <ICV> := <initial> to <final> step <increment> do
        <body>

Here, ICV is a variable, and initial, final, and increment are expressions whose values have the same type as ICV. Based on this general form, we will address a number of important variations among languages in forming the fixed count iteration.

1. What types are permitted for the ICV? In some languages only integers are permitted; in others only numerics, including integers and reals. Pascal and Ada both permit integer, character, and enumerated types, the same as those permitted in the control of a case structure. These types are permitted because they possess a built-in stepping function. In other words, each element has a natural successor. Real types do not possess this property and require explicit values for the increment if they are allowed.

2. What is the scope of the ICV? Most imperative languages require that the ICV be a variable that is bound in the execution unit containing the iteration. Ada, however, takes a different approach. The scope of an ICV in Ada is the body of the iteration for which it is declared. This means that its appearance in the iteration statement is equivalent to its declaration and binds it locally to a location. On completion of the iteration, the ICV is no longer bound to that location.

3. Can the ICV be modified within the iteration body? The modification of the ICV within the iteration body is dangerous in that it disrupts the sequence of values specified at the beginning of the iteration. For this reason, some languages, such as Ada, disallow modification of the ICV, through either assignment or use as a modifiable argument to a procedure. Other languages place no restrictions on changing the ICV.

4. What is the value of the ICV after termination of the iteration? There are four different responses that languages give to this question. If the scope of the ICV is the iteration, as in Ada, the answer is obviously that the ICV is no longer bound to a location, and hence has no value binding either. If the ICV maintains its location binding after termination of the iteration, it could be bound either to the value it had during the last iteration, to one increment beyond its value during the last iteration, or to an unspecified value. This last option means that, unlike with Ada, the ICV will have some value, but no guarantees are made as to what that value might be.

5. When are the final and increment expressions evaluated? This becomes an important issue when variables in these two expressions are modified inside the iteration. For example, the following Pascal program fragment would raise this issue:

    for i := 1 to n do
        n := n + 1;

If the final value n is reevaluated each time the iteration body is executed, this iteration will run forever for n initially positive. It turns out that for Pascal, as with most imperative languages, both the final and the increment expressions are evaluated only once, prior to the initial entry into the iteration. The changing of n in the preceding iteration body would therefore have no effect on the number of times the body is executed, and for positive n the previous fragment is equivalent to the simple statement

    n := 2*n;

6. Is an increment other than successor permitted? Although an increment expression was specified in the general form, some languages do not allow such a specification. Pascal and Ada, for example, permit only an increment of one for a numeric ICV. Nonnumeric ICVs are required to be of types where each element has a defined successor, making the implied effect of an increment setting the ICV to the successor element within the type.

7. How is iteration backward through a range specified? In languages that permit increment specification, a negative increment is ordinarily used to indicate this type of iteration. Pascal, which has no explicit increment, replaces the keyword to with downto, as in

    for i := 6 downto 1 do ...

Ada expresses the initial and final expressions as a range, which is always written from its lower bound to its upper bound. For the ICV to proceed backward through this range, the keyword reverse must be placed before the range. For example,

    for I in reverse 1..6 loop ...

8. Is transfer into the iteration body permitted? Because the parameters for a fixed count iteration are evaluated and fixed when it is initially entered, branching to the interior of such an iteration without executing these initial evaluations can be very dangerous. Therefore, some languages, like Ada, disallow such transfers. Pascal and many others allow these transfers to occur, though the results will be highly unpredictable.

9. How is the iteration body delimited? As with other iterations, the two approaches are (1) to allow the body to be a compound statement, or (2) to use keywords to delimit the block of statements forming the body. Pascal, as usual, follows the compound statement philosophy. Ada uses the keywords loop and end loop as delimiters, as before.

The general form of the iteration statement in Ada includes all of the types of iteration that can be specified, including nonterminating, pretest, in-test, and fixed count iterations.
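Several of these questions can be checked directly in Python, whose for statement is a fixed count iteration over a range object. The sketch below illustrates Python's own answers only, not Pascal's or Ada's, and its function names are hypothetical:

- The range is built once, on entry (question 5), so the analogue of the Pascal fragment above doubles n.
- The ICV keeps the last value it took (one of the four responses to question 4).
- Rebinding the ICV inside the body does not disturb the sequence of values (question 3).
- A negative step gives backward iteration (question 7).

```python
# Illustrative sketch of Python's semantics; all function names are hypothetical.
def grow_n(n: int) -> int:
    # Analogue of "for i := 1 to n do n := n + 1".
    # range(1, n + 1) is evaluated once, on entry, so changing n
    # inside the body does not change the number of passes.
    for i in range(1, n + 1):
        n = n + 1
    return n


def icv_after_loop() -> int:
    for i in range(1, 4):
        pass
    # Python leaves i bound to the last value it took. (If the range
    # were empty, this loop would never bind i at all.)
    return i


def rebind_icv() -> list:
    seen = []
    for i in range(3):
        seen.append(i)
        i = 100            # rebinding i does not alter the next value
    return seen


def countdown() -> list:
    # Backward iteration via a negative increment (a step of -1).
    return list(range(6, 0, -1))


print(grow_n(5))          # 10, i.e., 2 * 5
print(icv_after_loop())   # 3
print(rebind_icv())       # [0, 1, 2]
print(countdown())        # [6, 5, 4, 3, 2, 1]
```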

Iteration
Language: Ada

    loop-statement ::= [<loop-name>:]
                       [for <ICV> in [reverse] <range> | while <condition>]
                       loop
                           <body>
                       end loop [<loop-name>];

Pretest: while option is used, with a continuation test
Posttest: no facility is provided for posttest
In-test: exit statement of the form
    exit [loop-name] [when <condition>]
  exits from the named iteration
Fixed count: for option implements fixed count iteration
Types for ICV: integer, character, enumerated
Scope of ICV: iteration body only
ICV modification permitted inside iteration?: no
Value of ICV after iteration: not applicable
When is range evaluated?: once, upon initial entry
Permissible increments: successor and predecessor
Backward iteration: keyword reverse
Transfer into iteration: not permitted

Nondeterministic iteration

The nondeterministic conditional can be extended to form a nondeterministic iteration of the following form:

    do
        when <guard 1> -> <statement sequence 1>
        when <guard 2> -> <statement sequence 2>
        when <guard 3> -> <statement sequence 3>
    od

Once again, the statement sequence to be executed will be chosen nondeterministically from among those whose guard conditions are true. The iterative form repeats as long as at least one guard condition is true and terminates as soon as none are true. Again, we will see this form implemented for concurrent control in Chapter 7.
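A sequential program can only imitate nondeterminism. Still, the semantics just described can be sketched in Python by choosing at random among the commands whose guards are true and stopping when none is true. The guarded_do helper below is hypothetical, and random choice is only one way to model an arbitrary selection. The example is a standard guarded-command sort: each guard tests one adjacent pair for being out of order. Whichever enabled swap is chosen on each pass, the loop ends only when every guard is false, that is, when the list is sorted.

```python
import random
from typing import Callable, List, Tuple

State = List[int]
# A guarded command: a guard (a test on the state) paired with an action.
Guarded = Tuple[Callable[[State], bool], Callable[[State], None]]


# Illustrative sketch; guarded_do and swap_if_out_of_order are hypothetical helpers.
def guarded_do(state: State, commands: List[Guarded], rng: random.Random) -> State:
    """Repeat while at least one guard is true, each time executing a
    command chosen arbitrarily (here, at random) among those enabled."""
    while True:
        enabled = [action for guard, action in commands if guard(state)]
        if not enabled:            # no guard is true: the iteration ends
            return state
        rng.choice(enabled)(state)


def swap_if_out_of_order(i: int) -> Guarded:
    # Guard: q[i] > q[i + 1].  Action: exchange the two elements.
    def guard(q: State) -> bool:
        return q[i] > q[i + 1]

    def action(q: State) -> None:
        q[i], q[i + 1] = q[i + 1], q[i]

    return guard, action


commands = [swap_if_out_of_order(i) for i in range(3)]
print(guarded_do([4, 1, 3, 2], commands, random.Random(7)))   # [1, 2, 3, 4]
```

Every swap removes one out-of-order pair, so the iteration must terminate, and the final result does not depend on which enabled command is picked on each pass.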

4.3 Unconstrained control statements

The most controversial of all control structures are those that are unconstrained, that is, those that permit branching to any program unit without restriction. These are generally known as goto constructs, and their use is of questionable value, as we will discuss shortly. Nevertheless, all popular imperative languages, with the exception of Modula-2, provide a goto statement. In this section we will examine this simple, yet powerful, control construct.

The general format of the goto statement in almost every language in which it is included is

    goto <label>

FORTRAN contains alternate forms of the goto, but these are really just other ways of expressing a multialternative, case-like structure. SNOBOL permits a goto or two to be attached to every statement as a suffix. The interesting issues and variations with goto constructs arise when the format of labels and the impact of scope are considered.

Statement labels

There is great variation in the way statement labels are formed. BASIC, for example, requires a label to be present for every statement. Other languages make labeling a statement optional. Optional statement labels are frequently separated from the statement by a colon. Ada is an exception, requiring that the label be enclosed between << and >>. This is the case because the colon is used in Ada to label iterations, as described earlier.

In order to better understand the use of labels and different languages' approaches, we return to the data object model that we used in Chapter 3. For our discussion here, we consider the data object to be a statement of the program that is bound at load time to the location in memory where the executable instruction is stored. The approach most languages use toward statement labels is that illustrated by Figure 4.1. Here, the value is the element of the storage space where the instruction is located. This binding occurs at load time. The name binding occurs at compile time and binds the data object to an element of the label identifier space. This space may or may not be the same as the variable identifier space. In C and Ada, the label and variable identifier spaces are the same. But languages often select label identifiers from the space of integers, as in Pascal and FORTRAN. Another variation is whether the label type binding is made explicitly through a label declaration (Pascal) or implicitly through attachment of the label identifier to a statement (C, Ada, and FORTRAN). When a label is declared, there are implications for the scope of the label, which we will discuss later.

FIGURE 4.1 Statement labels as constants (diagram: type space containing Label, identifier space, and storage space, joined by type, name, and value bindings)

Figure 4.2 illustrates the use of label variables. The best example of this is found in PL/I, which permits the declaration of an identifier to be a label variable. That variable can be bound to any legal label constant as its value. This permits such interesting activities as passing labels as parameters and forming arrays of labels. The languages SNOBOL and APL extend this idea even further by permitting calculated expressions to have their values assigned to labels. There is a price to be paid for this interesting extension, namely, the loss of program readability. We know that goto statements themselves can be detrimental to program readability, but when a single goto statement can branch to virtually any labeled statement, with the choice of statement dependent on some nonlocal action, the readability factor sinks to new lows. For this reason, label variables are not included in most modern imperative languages.

FIGURE 4.2 Statement labels as variables (diagram: type space containing Label, identifier space, and storage space, joined by name binding)



Scope issues

There are several important issues related to the scope of labels. The first is the scope of the name binding to a labeled statement. In general, this follows the scoping rules for variables; that is, the binding holds in the present block and all contained blocks. Redefining a label identifier inside of a nested block, if allowed by a language, could result in the "hole-in-scope" problem, as with variables.

To illustrate the above point, consider the following Ada fragment.

    OUTER: begin
        INNER: begin
            <<INSIDE>> null;
            goto OUTSIDE;    -- this is legal
        end INNER;
        <<OUTSIDE>> null;
        goto INSIDE;         -- this is not legal
    end OUTER;

The goto in the INNER block is legal because OUTSIDE is bound from the containing block. The goto in the OUTER block is not permitted because INSIDE is bound only in the context of the INNER block. By the way, note that OUTER and INNER are block labels, while INSIDE and OUTSIDE are statement labels. It is illegal to use block labels in a goto statement.

One further type of scoping block can logically be defined for labels beyond those that apply to variables. This is the block of statements making up the body of a control structure. It is necessary to make these blocks scoping blocks for labels to prevent branching to the interior of a control structure's body without executing the test condition. For example, the following Ada fragment is illegal:

    loop
        <<INSIDE>> null;
    end loop;
    goto INSIDE;    -- illegal to branch into structure body

Similarly, branching to a statement inside the body of a conditional or any other iteration structure from outside that body is strictly prohibited. In defining the scope of labels, the bodies of control structures are thus treated the same as any other scoping block.

Another important observation can be made about the situation where a goto statement branches from inside a block to a statement in a containing block. This is perfectly legal in most languages, but its implementation is not as simple as it might appear. Branching out of a block is actually a termination of that block and requires the removal of that block's activation record from the run-time stack. This may require popping several activation records if the branch is out of several layers of nested blocks.

We are now able to summarize the unconstrained control structure of Ada.

Unconstrained Control Structure
Language: Ada

    goto-statement ::= goto <label-name>;
    labeled-statement ::= {<<label-name>>} <statement>

Declaration of label: implicit, by its occurrence
Scope of label: the unit in which it occurs and all units that unit contains, where a unit is a block or a structure body

The goto controversy

The goto statement has been the subject of a major controversy in the field of computer science, initiated by Dijkstra (1968) and rekindled by Rubin (1987). This controversy is actually about programming practices rather than programming languages, but inasmuch as language has an impact on practice, programming languages have become a part of this little discussion. We will limit our discussion to the impact that the presence of the goto statement has on the capabilities of a language.

Three facts about control structures are important considerations here.

1. Simple conditionals and goto statements are sufficient to replace any control structure. Each control structure we discussed in this chapter can be replaced by a construct using only simple conditionals and goto statements. For example, the Pascal while construct of the form

    while <condition> do <body>;

can be replaced by

    10: if <condition> then
        begin
            <body>;
            goto 10;
        end;

Several exercises at the end of this chapter require you to replace other control structures with these two simple constructs.

2. The two-alternative conditional and pretest loop constructs are sufficient to replace any control structure. This result is far less obvious than the first, but has been proved by Boehm and Jacopini (1966). One consequence of this result is that a language without a goto could duplicate the programs written in a language containing the goto. In other words, the goto is not required for the construction of any program.

3. The goto is the most powerful control structure. This result has meaning only if the word powerful is defined. Kosaraju (1974) has proved that the goto is the most powerful in the sense that replacing a goto with other structures might require additional variables, while replacing other structures with a goto will never require additional variables. In this sense, programs expressed without the goto can be more complex than those expressed with it.

What are the implications of these three results for programming languages? First, a programming language without a goto statement can express all the programs that can be expressed by the same language with a goto added. Second, there are situations where programs can be more simply represented by the use of a goto. On the other hand, in the same way that powerful automobiles or powerful weapons give greater capability but are accompanied by greater danger, there is an increased danger with the use of the goto. This danger lies in an increased ability to generate unreadable programs. Many computer scientists believe that the dangers of using the goto far outweigh the advantages inherent in its power.

Programming language designers have reacted to this controversy by continuing to provide a goto statement while also providing a sufficiently rich set of weaker control structures to make the use of the goto unnecessary. This presents the programmer with the final choice of whether to use the goto or not.
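Python has no goto, which makes it a convenient setting for a small illustration of the second and third facts. The sketch below takes a hypothetical goto program for Euclid's subtraction algorithm, reproduced in the docstring. It runs that program with one pretest loop and conditionals. This label-dispatch construction is a standard way to illustrate the idea; it is not claimed to be Boehm and Jacopini's own proof. Note the extra variable, label, which holds the number of the statement to execute next. It is an example of the kind of additional variable that, by Kosaraju's result, eliminating gotos may require.

```python
# Illustrative sketch; the function and the goto program it encodes are hypothetical.
def gcd_goto_style(a: int, b: int) -> int:
    """Euclid's subtraction algorithm for positive integers, written as the
    labeled goto program below and executed by a single pretest loop.

        10: if a = b then goto 40;
        20: if a > b then begin a := a - b; goto 10 end;
        30: b := b - a; goto 10;
        40: { halt }
    """
    label = 10                  # extra variable: the statement to run next
    while label != 40:          # one pretest loop replaces every goto
        if label == 10:
            label = 40 if a == b else 20
        elif label == 20:
            if a > b:
                a = a - b
                label = 10      # goto 10
            else:
                label = 30      # fall through to statement 30
        elif label == 30:
            b = b - a
            label = 10          # goto 10
    return a


print(gcd_goto_style(48, 18))   # 6
```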

Terms

control structure
conditional
if statement
dangling else
case statement
guard
guarded command
nonterminating iteration
termination test
continuation test
pretest iteration
posttest iteration
in-test iteration
fixed count iteration
iteration control variable (ICV)
nondeterministic iteration
goto

Discussion questions

1. How would the usefulness of a language be limited if it contained no control structures?
2. Why do Pascal and Ada not permit the use of real and string types in a case statement?
3. Why do you think Pascal's repeat-until uses an approach to blocking that differs from that of all other control structures in the language? What might be some negative consequences of this?
4. What are the advantages and disadvantages of Pascal's compound statement philosophy of blocking control structures?
5. Give an argument for permitting the modification of the ICV inside a loop body.
6. What are some reasons for requiring that labels be declared? What are some reasons for not doing so?
7. Consider the following Pascal code.

    program GotoQuestion (input, output);
    label 99;

    procedure ReadUntil (match: integer);
    var potential: integer;
    begin
        while true do begin
            read(potential);
            if potential = match then goto 99;
        end;
    end;

    begin
        ReadUntil(42);
    99: writeln('Got a 42!!');
    end.

Not only does the goto above "break" a loop, it also "breaks" a procedure by jumping outside of that procedure. Discuss the legality, advisability, and ramifications of using mechanisms such as this.

Exercises

1. Show how a simple conditional and a goto can be used to simulate the actions of
   a. a two-alternative conditional (if-then-else)
   b. a multialternative conditional (case)
   c. a pretest iteration
   d. a posttest iteration
   e. a fixed count iteration
2. For each of the following Pascal structures, solve the dangling else problem by indicating what will be printed when
   a. if x>10 then
   b. if x>10 then begin x:=x+2; if x process2; when 2 9 => process3; when 6 when others => error; end case;
4. Ada has no posttest iteration form directly built into the language. Design one that would remain consistent with Ada's other iteration constructs.
5. Design the syntax for a completely general iteration structure that permits either pretest or posttest and continuation or termination logic. Write the BNF for your construct. Does your answer permit more than one test for the same iteration?
6. Give examples of situations where each of the following are natural iteration forms:
   a. Nonterminating
   b. Termination, pretest
   c. Continuation, pretest
   d. Termination, posttest
   e. Continuation, posttest
   f. Termination, in-test
   g. Continuation, in-test
7. If neither an exit nor a goto statement were available in Ada, how might you simulate the action of an exit statement?
8. Give an illustration of the "hole-in-scope" problem for Ada labels.
9. Rewrite the following for loop as a repeat loop.

    for index := 10 downto -5 do something_or_other;

10. In BASIC, the on ... goto statement evaluates an expression in much the same way a case statement does. For example, executing the statement

    25 ON (X) GOTO 100, 200, 300

will cause the computer to jump to line 100, 200, or 300 if X is 1, 2, or 3, respectively. If X is none of these, no jump is done.
   a. Rewrite the preceding statement as a case statement in Ada.
   b. Rewrite the preceding statement as an if statement in Pascal.

Laboratory exercises

In each case you are to work with a language or languages whose implementation you have available to you, and determine the answer to the following questions by constructing a sample program or programs and observing the results.

1. Does your language permit an unconditional branch into the block of statements executed under control of a conditional?
2. What types are permitted for the control expression of a case statement in your language?
3. How does your language react when an unspecified choice is evaluated for the expression controlling a case statement?
4. Does your language permit unconditional branching into or out of the body of an iteration?
5. What is the value of the ICV after completion of an iteration?
6. When is the final expression evaluated in a fixed count iteration?
7. Does your language permit an ICV to be of a real type?
8. Does your language permit an iteration to have multiple exit tests?
9. Does your language permit modification of the ICV in the body of a fixed count iteration?


CHAPTER 5
DATA AGGREGATES

5.1 Data Aggregate Models
5.2 Arrays
5.3 Strings
5.4 Records
5.5 Files
5.6 Sets

In addition to the fundamental data types introduced in Chapter 3, imperative programming languages have facilities for types that are made up of aggregates of other types, called data aggregates. This chapter will examine these language capabilities.

In Section 5.1 we look at the general structural models used to aggregate types into new types. The remaining sections each study a specific aggregate type in light of its relationship to these models. The aggregate types we will study are arrays (Section 5.2), strings (Section 5.3), records (Section 5.4), files (Section 5.5), and sets (Section 5.6).

Each of these aggregate types will be examined from the following points of view:

1. Declaration and binding: In contrast to the data object bindings that we have seen previously, the bindings of interest in this chapter are the bindings of the aggregate type to its constituent types.
2. Manipulation: The fundamental operators on aggregate types are comparison and assignment. In addition, aggregate types need operators known as selectors, which convert from aggregate to constituent values, and constructor operators, which convert from constituent to aggregate values.
3. Implementation: The implementation of an aggregate type refers to special considerations given to the representation of aggregate structures in storage. Data compression, data organization, and indirect storage using pointers are important options in implementation.

Data aggregate types provided by a programming language are distinct from abstract data types constructed by the programmer from the simple and aggregate types of the language. We address language features related to data abstraction in Chapter 8.

5.1 Data aggregate models

In this section, we introduce five abstract models for representing data aggregation. These models will serve as useful tools in describing specific data aggregates. This classification of models is taken from Hoare (1972).

For the purpose of this section we use the notation that $T_1, T_2, \ldots$ represent types, either simple or aggregate. These types are not necessarily distinct, so that $T_1$ and $T_2$ may represent the same type or different types.

When an aggregate type $T$ is defined in terms of other types, these other types are called constituent types. The values of an aggregate type are structures built from values of the constituent types.

We are also interested in the cardinality of a type. Simply stated, the cardinality of type $T$ is the number of possible values of that type. Data types may have either finite cardinality or a denumerably infinite cardinality. The cardinality of an aggregate type is easily computed from the cardinality of its constituent types. We represent the cardinality of a type $T$ by $C(T)$.

The Cartesian product type is constructed from a finite set of types and is defined as follows:

$$T_1 \times T_2 \times \cdots \times T_n = \{(t_1, t_2, \ldots, t_n) \mid t_1 \in T_1,\ t_2 \in T_2,\ \ldots,\ t_n \in T_n\}$$

In words, the Cartesian product is the set of all possible tuples that can be formed by choosing one element from each of the $n$ types participating in the product.

Consider the following example with $n = 3$ and finite types $T_1$, $T_2$, $T_3$:

$$T_1 = \{1, 2, 3, 4\} \qquad T_2 = \{\text{'A'}, \text{'B'}, \text{'C'}\} \qquad T_3 = \{\text{true}, \text{false}\}$$

By our definition, the possible elements of type $T_1 \times T_2 \times T_3$ are

    (1, 'A', true)  (1, 'A', false)  (1, 'B', true)  (1, 'B', false)  (1, 'C', true)  (1, 'C', false)
    (2, 'A', true)  (2, 'A', false)  (2, 'B', true)  (2, 'B', false)  (2, 'C', true)  (2, 'C', false)
    (3, 'A', true)  (3, 'A', false)  (3, 'B', true)  (3, 'B', false)  (3, 'C', true)  (3, 'C', false)
    (4, 'A', true)  (4, 'A', false)  (4, 'B', true)  (4, 'B', false)  (4, 'C', true)  (4, 'C', false)
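The example can be checked mechanically. In this Python sketch, itertools.product builds $T_1 \times T_2 \times T_3$ with the last component varying fastest. Python's True and False stand in for true and false, and the names T1, T2, T3, and elements are illustrative only. The count confirms that the cardinality of this product is the product of the cardinalities of its constituent types, $C(T_1) \cdot C(T_2) \cdot C(T_3) = 4 \cdot 3 \cdot 2 = 24$.

```python
from itertools import product
from math import prod

# Illustrative names; True/False stand in for the text's true/false.
T1 = [1, 2, 3, 4]
T2 = ['A', 'B', 'C']
T3 = [True, False]

# All tuples (t1, t2, t3) with t1 in T1, t2 in T2, t3 in T3.
elements = list(product(T1, T2, T3))

print(len(elements))                                   # 24
print(len(elements) == prod(map(len, (T1, T2, T3))))   # True: C(T1) * C(T2) * C(T3)
print(elements[:3])   # [(1, 'A', True), (1, 'A', False), (1, 'B', True)]
```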