Programming language theory in the context of a language model (paradigm) and one or more example languages for each of
English Pages 413 [440] Year 1990
PROGRAMMING LANGUAGES: STRUCTURES AND MODELS
Herbert L. Dershem
Michael J. Jipping
Department of Computer Science
Hope College
Wadsworth Publishing Company
Belmont, California
A Division of Wadsworth, Inc.
Computer Sciences Editor: Frank Ruggirello
Editorial Assistant: Carol Carreon
Production: Stacey C. Sawyer
Print Buyer: Martha Branch
Designer: James Chadwick
Copy Editor: Elizabeth Judd
Technical Illustrator: Anne Eldridge
Compositor: Graphic Typesetting Service
Cover: Williams/Vargas/Design
Cover Photo: The Image Bank
Signing Representative: Thor McMillen
© 1990 by Wadsworth, Inc. All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transcribed, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher, Wadsworth Publishing Company, Belmont, California 94002, a division of Wadsworth, Inc.
Printed in the United States of America
123456789
10—94
93 92 91 90
Library of Congress Cataloging in Publication Data
Dershem, Herbert.
Programming languages.
1. Programming languages (Electronic computers) I. Jipping, Michael J. II. Title.
QA76.7.D465 1990 005.13 89-21465
ISBN 0-534-12900-5
BRIEF CONTENTS

Chapter 1 Introduction and Overview
Chapter 2 Preliminary Concepts
Chapter 3 Information Binding
Chapter 4 Control Structures
Chapter 5 Data Aggregates
Chapter 6 Procedural Abstraction I: Procedures
Chapter 7 Procedural Abstraction II: Exceptions and Concurrency
Chapter 8 Data Abstraction
Chapter 9 Example Imperative Languages
Chapter 10 Functional Model
Chapter 11 Logic Model
Chapter 12 Object-Oriented Model
Index
CONTENTS

Preface

Chapter 1 Introduction and Overview
1.1 What Is a Programming Language?
1.2 Why Study Programming Languages?
1.3 A Brief History of Programming Languages
  The Early Languages
  ALGOL-Based Languages
  Languages of the 1980s

Chapter 2 Preliminary Concepts
2.1 Language Specification
  Grammars
  BNF and EBNF
  Syntax Diagrams
  Problems with Specifications
2.2 Language Translation
  Compilers and Interpreters
  Overview of the Compilation Process
  Lexical Analysis
  Syntactic Analysis
  Semantic Analysis
  Code Generation
  Optimization
2.3 Language Design Characteristics
  Simplicity
  Abstraction
  Expressiveness
  Orthogonality
  Portability
2.4 Choice of Language
  Implementation
  Programmer Knowledge
  Portability
  Syntax
  Semantics
  Programming Environment
  Model of Computation
  An Example
Terms
Discussion Questions
Exercises
Laboratory Exercises

Chapter 3 Information Binding
3.1 Data Objects
  Identifiers
  Types
  Bindings
  Type Binding
3.2 Scalar Data Types
  Numeric Types
  Boolean Type
  Pointer Type
  User-Defined Types
  Subtypes and Derived Types
  Type Checking
  Type Conversion
3.3 Execution Units and Scope of Binding
  Operators, Functions, and Expressions
  Statements
  Blocks
3.4 Scope of Binding
  Scope of Name Binding
  Scope of Location Binding
Terms
Discussion Questions
Exercises
Laboratory Exercises

Chapter 4 Control Structures
4.1 Conditional Structures
  Simple Conditional
  Two-Alternative Conditional
  Multialternative Conditional
  Nondeterministic Conditional
4.2 Iterative Structures
  Nonterminating Iteration
  Pretest Iteration
  Posttest Iteration
  In-Test Iteration
  Fixed Count Iteration
  Nondeterministic Iteration
4.3 Unconstrained Control Statements
  Statement Labels
  Scope Issues
  The goto Controversy
Terms
Discussion Questions
Exercises
Laboratory Exercises

Chapter 5 Data Aggregates
5.1 Data Aggregate Models
  Cartesian Product
  Mapping
  Sequence
  Powerset
  Discriminated Union
5.2 Arrays
  Declaration and Binding
  Manipulation
  Implementation
  Multi-Indexed Arrays
  Arrays in Ada
5.3 Strings
  Declaration and Binding
  Manipulation
  Implementation
  Strings in Ada
5.4 Records
  Declaration and Binding
  Manipulation
  Implementation
  Variant Records
  Records in Ada
5.5 Files
5.6 Sets
Terms
Discussion Questions
Exercises
Laboratory Exercises

Chapter 6 Procedural Abstraction I: Procedures
6.1 Procedures as Abstractions
6.2 Procedure Definition and Invocation
6.3 Procedure Environment
  Local Environment
  Global Environment
  Activation Record
6.4 Parameters
  Parameter Association
  IN Parameters
  OUT Parameters
  IN OUT Parameters
  Name Parameters
  Procedures as Parameters
  Aliasing
6.5 Value-Returning Procedures
6.6 Overloading
6.7 Coroutines
6.8 Procedures in Ada
Terms
Discussion Questions
Exercises
Laboratory Exercises

Chapter 7 Procedural Abstraction II: Exceptions and Concurrency
7.1 Exceptions
  Definition
  Raising Exceptions
  Handling Exceptions
  Implementation
7.2 Exceptions in Ada
7.3 Concurrency
  Definition and Invocation
  Data Sharing
  Synchronization
  Interprocess Communication
7.4 Concurrency in Ada
  Definition and Invocation
  Data Sharing
  Selective Waits
7.5 Ada Examples
  Sieve of Eratosthenes
  ATM Management
Terms
Discussion Questions
Exercises
Laboratory Exercises

Chapter 8 Data Abstraction
8.1 Abstract Data Types
8.2 Encapsulation
8.3 Parameterization
8.4 Monitors
8.5 Data Abstraction in Ada
  Package Definition
  Using Packages
  Private Types
  Generic Packages
Terms
Discussion Questions
Exercises
Laboratory Exercises

Chapter 9 Example Imperative Languages
9.1 Description of C
  Philosophy and Approach
  Information Binding
  Control Structures
  Data Aggregates
  Procedural Abstraction
  Data Abstraction
9.2 Description of Modula-2
  Philosophy and Approach
  Information Binding
  Control Structures
  Data Aggregates
  Procedural Abstraction
  Data Abstraction
Terms
Discussion Questions
Exercises
Laboratory Exercises

Chapter 10 Functional Model
10.1 Functions
10.2 FP: A Pure Functional Language
  Introduction
  Basic Components
  Function Definition
  Examples
10.3 LISP: A Functional-Oriented Language
  Basic Components
  Examples
  Implementation Considerations
A Comparison of LISP to FP
Terms
Discussion Questions
Exercises
Laboratory Exercises

Chapter 11 Logic Model
11.1 Introduction to Logic Language Model
11.2 A Pure Logic Language
  Basic Components
  Example Program in DP
  The Process of Deduction
11.3 Prolog: A Logic-Oriented Language
  Syntax of Prolog
  Nonlogic Model Features of Prolog
  Example Programs in Prolog
11.4 Database Query Languages
  Relational Database Management Systems
  Query Language SQL
  SQL as a Logic Language
Terms
Discussion Questions
Exercises
Laboratory Exercises

Chapter 12 Object-Oriented Model
12.1 Object-Oriented Model
  Components of the Object-Oriented Model
  Properties of the Object-Oriented Model
  An Example
12.2 Smalltalk
  Overview
  Smalltalk Syntax
  Class Hierarchy
  Inheritance
  Polymorphism
  An Example
12.3 C++
  Components of C++
  An Example
12.4 Comparison with Imperative Model
Terms
Discussion Questions
Exercises
Laboratory Exercises

Bibliography
Index
PREFACE

Programming language courses at the undergraduate level can take several different approaches. Frequently they present a survey of major programming languages, giving the students exposure to and experience with a number of different languages. Other courses focus on the underlying theory of programming languages. A third approach is for the course to present the fundamental features and concepts common to all programming languages. This textbook, intended for a first course in programming languages for undergraduates, serves the third purpose. But it is the authors' opinion that a course on the fundamental concepts cannot be adequately taught without the students being exposed to a variety of languages and gaining experience in their use. Furthermore, the students need some understanding of theoretical topics. This is the philosophy represented by Programming Languages: Structures and Models.
Organization

This book is organized into four different computational models, or paradigms, for programming languages. The imperative model is presented first. Within the presentation of this model, those fundamental features of programming languages that are commonly present in imperative languages are described. This section is followed by descriptions of the functional, logic-oriented, and object-oriented models. Because these three models share many fundamental concepts with imperative languages, their presentation focuses on those features not present in the imperative model and references the imperative chapters for discussion of the common features. The models presented in this book were chosen for their anticipated importance in the field of computer science in the near future. Other models, such as data flow and pattern matching, although interesting, were not chosen because they appear unlikely to have as great an effect.
Use of model languages

Each of the four programming language models is described through the use of a model language that exhibits the important aspects of that model. For the imperative model, Ada has been chosen as the model language because of the richness of features it contains, especially in data abstraction and concurrency. For those instructors who wish to use a language other than Ada as the empirical model, Modula-2 and C are described in Chapter 9 with the topics appearing there in the same order as they appear in Chapters 3 through 8. Chapter 9 can thus be covered in parallel with Chapters 3 through 8.

Backus' FP is used as the model functional language, whereas hypothetical languages are constructed to represent the logic-oriented and object-oriented models. In all cases, the model languages form a standard against which other languages exhibiting properties of that model are compared. Several such languages are discussed for each model.
Laboratory exercises

Laboratory exercises are included at the end of each chapter. These exercises require the students to write programs, usually to practice using a language feature found in that chapter and to determine how that feature is implemented. Most of these exercises are language-independent in that they could be assigned for any appropriate language that is available to the student. It would be very instructive for the student to repeat a given exercise for several different languages. Most of the laboratory exercises are very generally stated to permit the instructor maximum flexibility in adapting them to the local environment.
Intended audience

The prerequisite for this book is completion of the first two courses in a computer science curriculum, including experience in programming in one structured imperative language. Pascal is used in examples throughout the book with the assumption that the reader is familiar with that language. However, students whose first language was Modula-2 will have no trouble in understanding the Pascal examples.

The goal of this book is to teach students to become intelligent users of programming languages. This includes the ability to choose languages appropriate for different applications, the ability to make effective and efficient use of a language in software development, and the ability to quickly learn new languages. This book is not intended as a final preparation for programming language implementors, designers, or researchers, although it provides appropriate background preparation for more advanced courses in these areas.
Use of this textbook

This textbook contains more material than can be comfortably covered in a single-term undergraduate course. Many different courses can be taught from this book, several of which are outlined here.

1. Imperative Model Only: Chapters 1-9 could be covered in detail, and students could do a significant amount of programming in several imperative languages that are new to them.
2. Imperative Model plus Surveys of Other Models: This course could cover Chapters 1-9, introducing students to only one new imperative language, perhaps Ada. Chapters 10-12 could then be covered with little, if any, programming in the nonimperative models.
3. Imperative Model plus One Other Model: Chapters 1-9 could be covered with students introduced to no new imperative languages to the extent that they would program in them. Rather, the students could use imperative languages that they already know. One of the three chapters 10-12 could then be covered in detail, introducing the students to one or more languages illustrating that model and having them program in those languages. The other two nonimperative model chapters could be quickly surveyed without any student programming experience in those models.
4. All Four Models: This course could give approximately equal time to all four models. Chapters 3-6 could be quickly treated, relying heavily on the students' prior experience. Chapters 7 and 8 might require more extensive coverage because students are less likely to have used these features extensively in prior courses. Previously learned languages could be used for programming exercises throughout. Chapters 10-12 could be covered in detail, giving students programming experience with one language representing each model.
Acknowledgments

The following reviewers provided many helpful comments, which greatly improved the accuracy and coverage of this book:

Anthony Aaby, Bucknell University
Boumediene Belkhouche, Tulane University
Bill Buckles, Tulane University
Frank A. Chimenti, Liberty University
Robert Crawford, Western Kentucky University
Al Cripps, Middle Tennessee State University
Thomas Gendreau, University of Wisconsin
David Oakland, Drake University
Prakash Panangaden, Cornell University
Richard Pattis, University of Washington
John Peterson, University of Arizona
John Remmers, Eastern Michigan University
Ken Slonneger, University of Iowa
Victor Terrana, Indiana University Northwest
Barbara Tulley, Elizabethtown College

The students in the Hope College CSCI 383 course also provided many helpful comments during the class testing of preliminary versions of this book. We are also indebted to Frank Ruggirello for his gentle prodding and expert assistance during this book's preparation. He and the Wadsworth staff made this project possible by their professional assistance throughout.
CHAPTER 1
INTRODUCTION AND OVERVIEW

1.1 What Is a Programming Language?
1.2 Why Study Programming Languages?
1.3 A Brief History of Programming Languages

1.1 What is a programming language?

A language is a systematic set of rules for communicating ideas. With a natural language, like English, this communication is between people and the language is used in both spoken and written forms. Programming languages differ from natural languages in several important ways. First, the primary communication is between a person and a computer, although programming languages are also useful for communication between people. The second major difference is in the content of the communication, which, in the case of programming languages, is known as a program. Programs are expressions of solutions to problems that are specific enough to give the receiver of the program sufficient information to carry out the solution. A third unique feature of communication via a programming language is the medium used. Since a computer is the intended receiver, this has traditionally meant that programs are represented symbolically as strings of characters as opposed, for example, to audible sounds. Although modern programming environments have somewhat released programmers from this restriction, all languages that we will discuss in this book have been designed for this mode of communication.

Our working definition for a programming language is:

A programming language is a language intended to be used by a person to express a process by which a computer can solve a problem.

The four key components in this definition of a programming language are:

1. Computer: the machine that will carry out the process described by the program
2. Person: the programmer who serves as the source of the communication
3. Process: the activity being described by the program
4. Problem: the actual system or environment where the problem arises
Four models for programming languages are detailed in this text. Each one of these corresponds most closely to the point of view of one of the preceding four components. The imperative model is based on the computer's perspective. This is reflected in the sequential execution of commands and the use of a changeable data store, concepts that are based on the way computers execute programs at the machine-language level. Imperative has been the predominant paradigm for languages because such languages are easiest to translate into a form suitable for machine execution. The program of this model consists of a sequence of modifications to the computer's storage.

The logic-oriented model is most closely related to the perspective of the person. It looks at the problem from the logical point of view. The program is a logical description of the problem expressed in a formal way, similar to the manner that a human brain would reason about the problem.

The functional model focuses on the process of solving the problem. The functional view results in programs that describe the operations that must be performed to solve the problem.

The object-oriented model most closely reflects the actual problem. A program in this model consists of objects that send messages to each other. These objects in the program correspond directly to actual objects, such as people, machines, departments, documents, and so on.

In this book, we will look at all four of these models or paradigms and the ways they are expressed in programming languages. We will also find that all programming languages provide some combination of these viewpoints to allow for efficiency in the construction and execution of programs.
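To make the contrast among these viewpoints concrete, the following is a minimal sketch in Python, a language not covered in this text and chosen here only for brevity. It expresses one small task in the imperative, functional, and object-oriented styles. The task (totaling a few account balances) and every name in it (`BALANCES`, `Account`, `Ledger`, and the three functions) are hypothetical illustrations rather than examples from the book. The logic-oriented view is omitted because Python has no direct counterpart for it.

```python
from functools import reduce

# Hypothetical data for illustration only.
BALANCES = [120, 75, 300]


def total_imperative(balances):
    """Imperative view: a sequence of modifications to a changeable store."""
    total = 0
    for amount in balances:
        total = total + amount  # each step overwrites the stored value
    return total


def total_functional(balances):
    """Functional view: the result is described by applying functions, with no reassignment."""
    return reduce(lambda accumulated, amount: accumulated + amount, balances, 0)


class Account:
    """Object-oriented view: an object standing for a real-world account."""

    def __init__(self, balance):
        self._balance = balance

    def report_balance(self):
        # Responds to the "report your balance" message.
        return self._balance


class Ledger:
    """An object that obtains its answer by sending messages to other objects."""

    def __init__(self, accounts):
        self._accounts = accounts

    def total(self):
        return sum(account.report_balance() for account in self._accounts)


def total_object_oriented(balances):
    ledger = Ledger([Account(balance) for balance in balances])
    return ledger.total()
```

All three functions compute the same total. What differs is the point of view from which the solution is written, which is the distinction the four models are meant to capture.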
1.2 Why study programming languages?

There are five major benefits that you will receive through the study of the structure and models of programming languages.

1. You will improve your problem-solving ability. It is believed that facility with and understanding of natural language affects our ability to think and form ideas. Similarly, a thorough understanding of a programming language can increase our ability to think about approaches to problems. This is especially true when we have the ability to think about the problem using the various models of languages described in Section 1.1.

2. You will be able to make better use of a programming language. The study of programming language structures will give you a better understanding of the function and implementation of these structures. Then, when you are programming, you will be better able to use the language to the full degree of its functionality and to do so in an efficient way. Understanding the power of a language will enable you to utilize that power.

3. You will be able to more intelligently choose an appropriate language. In the 1960s and 1970s, programmers could seldom choose the language they used. All programmers working for a given organization were expected to program in the language that was the "standard" for that site. In many cases, there was only one computer and only one language implemented on that computer. Frequently, the local programmers knew how to program in just that one language as well. This situation has changed dramatically in the 1980s. In the first place, the improved technology of computers and language translators has made many languages available on present-day machines, even on the smallest personal computers. In addition, programmers who have completed courses such as the one for which this book is intended have experience with and an understanding of a variety of languages. Therefore, it is common practice today for the programmer to choose a language for a given project from among several possibilities. Whereas in the past this choice was determined by the language allowed, the one implemented on the computer system, or the knowledge of the programmer, today a programmer with an understanding of programming languages can choose a language that makes the problem solution easier and more efficient.

4. You will find it easier to learn new programming languages. As we study the development of programming languages, we find that new languages and enhancements to present languages are continually being introduced. We also see that throughout this development there are key concepts that remain constant as well as new facilities being added. An important result of the study of programming languages is the ability to learn new languages and new capabilities of existing languages as they are developed. Through a thorough understanding of programming language models, one can quickly assess a new language in comparison with those models and determine the ways the new language is the same and the ways it differs.

5. You will become a better language designer. This benefit is more important than it first appears. Few people ever have the desire or the opportunity to design their own programming language. Although you may be an exception, chances are that you will not be. However, if we hold to our view that language is a means of communication between a person and a computer, then every computer system that is developed must have a language incorporated within it to provide for human/machine interaction. A good understanding of programming language principles can greatly assist in this interface design.

In addition to this, many modern languages have the property that they are extensible in a variety of ways. This means that the programmer can enhance the language through the addition of new data types and operators. In these languages, every program actually is a new language design in the sense that the programmer has the power to enhance the original language.
1.3 A brief history of programming languages

It is helpful for understanding programming languages to have some appreciation of their history. Much has been written on this topic, including Sammet (1969) and Wexelblat (1981). In the historical overview that follows, we have limited our consideration to those languages that are either used extensively today or that originated important concepts for today's languages. Thousands of languages have been implemented and most of these have made important contributions to the field. We have necessarily limited our consideration to 15 that we feel have had the greatest influence.

We have structured this history into three periods. The first is the period for about a decade beginning in 1955, during which the first higher-level languages were being developed with a wide variety of philosophies and concepts. The second period (1965-1971) was a time of consolidation around the model of one language, ALGOL 60, with the development of a number of new languages derived from ALGOL 60 but extending it by adding important new features. In the final period, 1972 and after, the results of earlier research on languages were pulled together to introduce new models and approaches for programming languages.
The early languages

FORTRAN. FORTRAN has the distinction of being the first widely used higher-level programming language. Prior to its implementation, many people were skeptical about the possibility of a language being compiled successfully and efficiently. FORTRAN quickly erased such skepticism and became a very popular language that is still in heavy use more than 30 years later.

Most early languages were named by acronyms. We will indicate this in a language's name by writing it in all uppercase letters. We will also display the full name in parentheses after the first mention of the language's name.

FORTRAN (FORmula TRANslation) was designed and implemented in 1956 by John Backus at IBM. It was specifically designed for a single machine, the IBM 704, and still bears the marks of some of the idiosyncrasies of that machine. It has evolved through the years into a language that incorporates many modern language facilities while still maintaining its original character.

FORTRAN was designed for solving scientific problems, adopting an algebraic notation. Since it was a pioneer language, it made many contributions to language development. Included among these were (1) variables and assignment statements, (2) the concept of types, (3) modularity through the use of subprograms, and (4) formatted input/output.
COBOL. Although much early programming was scientific in nature, in the 1950s many business applications were being programmed as well and the requirements of such programming were not well handled by FORTRAN. There was an evident need for a common language suitable for these business applications. In 1959, through the initiative of the U.S. Department of Defense, a committee was formed to develop a language to meet these needs. This committee consisted of representatives from computer manufacturers and the Department of Defense. The resulting language, COBOL (COmmon Business Oriented Language), was first implemented in 1960 and soon became the standard for business data processing applications.

Beyond the difference in their intended applications, there were two major differences in the ways FORTRAN and COBOL were developed. FORTRAN was the effort of one organization, IBM, whereas COBOL resulted from the cooperative effort of many organizations. Furthermore, FORTRAN was designed to be run on a single machine and in fact the architecture of that machine affected many of the design decisions. COBOL, on the other hand, was designed independently of any specific computer with the intention that it be implemented on all computers.

Like FORTRAN, COBOL has evolved over the years as new standards have been developed. It has also continued to be used extensively over a 30-year period. Its strengths are in the manipulation of files and in handling fixed decimal data.

One of the primary objectives of COBOL was that its code be English-like. For this reason, COBOL programs tend to be very wordy, and many programmers find this cumbersome. However, the English-like property of COBOL was an early attempt at designing a language to facilitate the readability of programs, and thus an important contribution to the development of later languages. The other major contribution of COBOL was the introduction of the heterogeneous data structure (the record), which became an important component of later languages.
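The record and fixed decimal data credited to COBOL above can be illustrated with a minimal sketch. It is written in Python rather than COBOL, and the record layout (`EmployeeRecord`), its field names, and the `weekly_pay` helper are hypothetical choices made only for this illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class EmployeeRecord:
    """A heterogeneous data structure: each field may have a different type."""
    name: str              # character data
    department_code: int   # integer data
    hourly_rate: Decimal   # fixed decimal data, exact to the cent


def weekly_pay(record, hours):
    """Compute pay in exact decimal arithmetic, rounded to two decimal places."""
    return (record.hourly_rate * hours).quantize(Decimal("0.01"))


# A hypothetical record used only to exercise the sketch.
clerk = EmployeeRecord(name="A. Clerk", department_code=12, hourly_rate=Decimal("7.25"))
```

Grouping unlike fields under one name and keeping money in decimal rather than binary floating point are the two ideas the sketch is meant to show. It makes no claim about how COBOL itself lays out or stores such data.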
ALGOL 60. The ALGOL (ALGOrithmic Language) 60 language had a European origin and was designed by an international committee in 1960. Although it never enjoyed the commercial popularity of FORTRAN and COBOL, it is the most important language of this era in terms of its influence on later language development.

Like FORTRAN, ALGOL 60 was designed for use in scientific problem solving. Unlike FORTRAN, it was designed independently of an implementation. This was both a major asset and a major liability. Its machine independence permitted the designers to be more creative, but it made implementation much more difficult.

One of the greatest impacts ALGOL 60 had was a result of its description as found in Naur (1963). This report became the accepted definition of the language and was a model of clarity and completeness. A major contribution of this report was the introduction of BNF notation for defining the syntax of the language. This notation will be described in Chapter 2 and used throughout this book.

ALGOL 60 was only used on a limited basis, mostly by research computer scientists in the United States and by Europeans. Its use in commercial applications was hindered by the absence of standard input/output facilities in its description and the lack of interest in the language by large computer vendors. ALGOL 60 did, however, become the standard for the publication of algorithms and had a profound effect on future language development.

Some of the major contributions of ALGOL 60 to later languages were (1) block structure: the ability to create blocks of statements for the scope of variables and the extent of influence of control statements; (2) structured control statements: if-then-else and the use of a general condition for iteration control; and (3) recursion: the ability of a procedure to call itself.

LISP. Like FORTRAN and COBOL, LISP (LISt Processing) is a language that was developed for a specific application and is still extensively used today. LISP was developed by John McCarthy in the Artificial Intelligence Group at M.I.T. in the late 1950s as a language to support artificial intelligence research. It was first implemented in 1960 on the IBM 704. It has remained the primary programming language for artificial intelligence through the years. Unlike FORTRAN and COBOL, LISP has never been standardized by a national organization and many dialects continue to exist. Common LISP was defined in 1981 as an informal standard, and work on a formal standard based on Common LISP is presently underway.

LISP pioneered the idea of nonnumeric or symbolic computing. It also introduced as its basic data structure the concept of the linked list. The LISP language is functional in nature. This means that rather than specifying operations as a sequential set of statements, LISP specifies the invocation of a function, using composition of functions as the main device for specifying multiple actions. This model of computation will be defined and explored in Chapter 10 along with a description of the LISP language. LISP also used the same basic construct, the S-expression, to represent both data and program, thus allowing a program to be accessed as data and data to be executed as a program.
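The idea that one construct can serve as both data and program can be sketched outside LISP. The following minimal Python illustration stands in for an S-expression with a nested list whose first element names an operator. The `OPS` table and the `evaluate` function are assumptions made for this sketch and do not describe how any LISP system is implemented.

```python
import operator

# Hypothetical operator table for this sketch; a real LISP has a far richer set.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(expr):
    """Evaluate a nested-list expression such as ["+", 1, ["*", 2, 3]]."""
    if isinstance(expr, (int, float)):
        return expr                       # a number evaluates to itself
    op, *args = expr                      # first element names the function
    values = [evaluate(arg) for arg in args]
    result = values[0]
    for value in values[1:]:
        result = OPS[op](result, value)   # apply the operator left to right
    return result


program = ["+", 1, ["*", 2, 3]]   # an ordinary list that is also a program
element_count = len(program)      # accessed as data: it has three elements
rewritten = ["*"] + program[1:]   # manipulated as data: swap the outer operator
```

Here `evaluate(program)` runs the list as a program, while `len` and slicing treat the same list as plain data. The rewritten list can be evaluated in turn, which is the sense in which data is executed as a program.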
APL. Still another language designed in the late 1950s was APL (A Programming Language), which was the creation of Kenneth Iverson. Iverson did his initial work on the language at Harvard and later continued it at IBM. APL was enthusiastically received by a number of programmers. It consisted of many powerful operators and a simple, mathematical notation. The large number of operators resulted in the requirement of a large character set. This latter requirement made implementation of the language difficult. The mathematical nature of APL discouraged programmers who were not adept at mathematics. The definition of APL was specified in Iverson (1962).
The primary data structure of APL is the array and the language features operators that apply to an entire array. Iterative processing is accomplished by placing the data in an array and applying a single operator to that entire array. The variables of APL are untyped, taking on the type of the objects assigned to them. APL is especially useful for mathematical and array processing applications. Because of its powerful operators and compact notation, a great pastime among APL programmers is the construction of one-line programs. Such programs actually use APL in a purely functional manner that very closely matches the functional model.
BASIC. The BASIC (Beginners All-purpose Symbolic Instruction Code) language was developed at Dartmouth College by Thomas Kurtz and John Kemeny in the mid-1960s. Its objectives were to be easy for undergraduate students to learn and to use the interactive programming environment that was also under development at Dartmouth at that time. BASIC was quite popular in academic circles over the next decade, but its greatest popularity came with the arrival of the microcomputer in the mid-1970s. The marketers of microcomputers needed a language that would be useful to the consumer. The two major criteria were that the language be easy to learn and that it exploit the interactive environment
provided by the microcomputer. Since BASIC was designed with these same two objectives a decade earlier, it was chosen as the language that
was provided with all of the early microcomputers. Although the microcomputer gave BASIC an important place in the history of programming languages, BASIC contributed little to the development of programming language technology. Perhaps its greatest contribution was that it was one of the first languages to provide an interactive programming environment as a part of the language, including the interpretive execution of programs.
ALGOL-based languages

The six languages described in the last section represent the first wave of language development. The next wave built on the ideas and concepts of that first wave. The most important languages to appear in the latter half of the 1960s were based on the key concepts of the ALGOL 60 language. Four of these ALGOL-based languages are described in the following paragraphs.

PL/I. The philosophy behind PL/I (Programming Language I), developed at IBM in the mid-1960s, was the replacement of the multitude of languages that were in use for specific applications with one general-purpose language. The approach used was to incorporate features from each of the earlier languages into PL/I. For example, PL/I included the block structure, control structures, and recursion from ALGOL 60, subprograms and formatted input/output from FORTRAN, file manipulation and the record structure from COBOL, dynamic storage allocation and linked structures from LISP, and the array operations from APL. PL/I, though highly promoted by IBM, never became as popular as its designers hoped. The major difficulty was a lack of cohesiveness in the language design, which contained many different features implemented in many different ways. The language was complex, difficult to learn, and difficult to implement. Two possible remedies for these problems were included in the language: the use of many defaults that could remain transparent to the user, and the intention that a programmer needed to learn only a subset of the language for a given application. These remedies proved to be inadequate, however. Two features of PL/I that have significantly impacted later language development are interrupt handling (the ability to execute specified procedures when an exceptional condition occurs) and multitasking, the specification of tasks that can be performed concurrently. These topics are explored in Chapter 7.

Simula 67. Simula 67 was developed by Ole-Johan Dahl and Kristen Nygaard at the Norwegian Computing Center in the early 1960s. The original work was based on ALGOL 60 and was intended to be a language for system description and simulation programming. The first version was called Simula 1. The designers soon discovered that this language had potential beyond simulation, and to realize this potential they extended the original design to Simula 67.
The major contribution of Simula 67 is the concept of class. A class is an encapsulation of data and procedures that can be instantiated in a number of objects. The class of Simula 67 is the forerunner of abstract data types as implemented in Ada and Modula-2 (see Chapter 8) and of classes from the object-oriented languages Smalltalk and C++ (see Chapter 12). The latter two languages also adopted from Simula 67 the hierarchy of classes with inheritance of its components.

ALGOL 68. Although its name implies that it is an improved version of ALGOL 60, ALGOL 68 is actually a rather radical departure from its predecessor. It was designed to be a general-purpose language, as opposed to having the scientific orientation of ALGOL 60.
ALGOL 68 never gained acceptance even to the limited level attained by ALGOL 60. This was, in part, because the original description (van Wijngaarden and others, 1969) was difficult to understand, using notation and terminology that was foreign to its readers.
The major design philosophy of ALGOL 68 is also its major contribution, namely, orthogonality. A language that is orthogonal has a relatively small number of basic constructs and a set of rules for combining those constructs. It is then possible to combine these constructs using any of the rules with predictable results. This approach is in opposition to that of PL/I, which included a large number of independent constructs.
Pascal. The most popular of this second wave of ALGOL-based languages is Pascal, developed by Niklaus Wirth in 1969 and named for the mathematician Blaise Pascal. Wirth's goal was to provide a language that is simple to learn, supportive of structured programming, and easily implemented. He intended it to be a language suitable for use in the teaching of programming. The defining document is provided by Jensen and Wirth (1978). By the early 1980s, Pascal had become by far the most commonly used language for teaching programming at the college level. By the mid-1980s it also had become popular as a production language on microcomputers. Pascal's flexible control structures, user-defined data types, and file, record, and set data structures have made it a model for many of the languages of the next stage of development.
Languages of the 1980s

Although all of the five languages discussed in this section were actually designed in the 1970s, they have all become major languages in the 1980s. The designers of these languages benefited greatly from experience with earlier languages, and they all include features that take advantage of modern hardware and software technology. The first two languages use entirely different models of computing than the earlier languages, while the last three continue development in the ALGOL line that we call the imperative model.
Prolog. Prolog was developed at the University of Marseilles in France in 1972. It was designed for artificial intelligence applications and is based on formal logic. The logic-oriented model of programming served as a basis for Prolog, but Prolog falls short of the model's ideal of clauses that describe the problem and can be expressed in an order-independent way. This model and Prolog are described in Chapter 11.

Prolog has become a competitor with LISP for artificial intelligence research. It received higher visibility when it was chosen as the language of the Japanese Fifth Generation Project.
Smalltalk. Smalltalk was developed by Alan Kay at the Xerox Palo Alto Research Center in the early 1970s as part of the Dynabook project. The two distinguishing features of Smalltalk are its environment and the strict use of the object-oriented model. The Smalltalk language is embedded within a graphical environment that includes pop-up menus, windows, and the use of a mouse device for input. This environment has served as the prototype for many modern programming environments, the most famous of which is that of the Apple Macintosh.

Smalltalk is designed around the Simula 67 class concept and includes encapsulation, inheritance, and instantiation. All operations in Smalltalk consist of objects sending messages to other objects. Smalltalk and the object-oriented model are described in Chapter 12. This highly extensible and interactive language will undoubtedly have a major impact on future language development.
C. The language C was developed at Bell Laboratories in the early 1970s as a language for implementing the UNIX operating system. C is a powerful language with facilities for access to raw data stored in memory as well as access through data types and structures of the language. The standard is defined by Kernighan and Ritchie (1978). The objective of C is to provide a language that has access to low-level data and generates efficient code. The language has an extensive set of operators. As a result, programs are often expressed with compact code at the expense of readability. A description of the C language is given in Chapter 9.

C has grown in popularity in conjunction with UNIX's acceptance as an operating system. C is an excellent language for the construction of portable systems programs.
Modula-2. Modula-2 was developed in the late 1970s by Niklaus Wirth, designer of Pascal, as an improvement to Pascal, especially for use in systems programming. Wirth developed the language as a part of the Lilith project, whose goal was the creation of an integrated hardware/software system. The result, as described in Wirth (1982), is an excellent general-purpose language that has replaced Pascal as a teaching language in many universities. Modula-2 is also described in Chapter 9. Modula-2 offers the following improvements over Pascal:

1. Modules can be used to implement abstract data types.
2. All control structures have a termination keyword.
3. Coroutines provide for interleaved execution.
4. Procedure types can be declared.

The modules of Modula-2 make it an excellent language for use in large software development projects.

Ada. In the early 1970s, the Department of Defense initiated a project to obtain a suitable programming language for the development of embedded systems. An embedded system is a computer system that operates as a part of a larger system. A large portion of the work done by the Defense Department is on these embedded systems.
After evaluating existing languages against the criteria desired, the Defense Department decided that no language existed that met its needs and that a new language should be designed.

In 1977, a competition was initiated among four contractors to design a suitable language. In 1979 the winning design was chosen and the resulting language was given the name Ada, after Ada Augusta, the Countess of Lovelace and daughter of Lord Byron. She was a collaborator with Charles Babbage in his work on the Analytical Engine in the nineteenth century and is considered by many to be the first programmer. The standard for Ada is given by the Reference Manual (American National Standards Institute, 1983). Every compiler that aspires to be called an Ada compiler must be validated by the Department of Defense. Subsets and supersets of the language are not permitted. Ada is primarily based on Pascal, but it uses the class concept of Simula 67 in its abstract data type facility called a package, adopts the exception-handling features of PL/I, and provides an extensive tasking facility for concurrent processing.

Ada has been chosen as the primary example of an imperative language in this book because of its wide range of facilities and the fact that its structure is representative of the class of imperative languages.
CHAPTER 2
PRELIMINARY CONCEPTS

2.1 Language Specification
2.2 Language Translation
2.3 Language Design Characteristics
2.4 Choice of Language

This chapter presents a number of concepts that are necessary for a complete understanding of the remainder of the text.

Section 2.1 describes several languages that can be used to describe the syntax of other languages. These are called metalanguages and will prove useful throughout the book for expressing language syntax.

The structure of a programming language is frequently affected by the process used to translate programs written in that language into an executable form. Section 2.2 presents an overview of that translation process.

Section 2.3 outlines a number of characteristics of a programming language that are desirable for enhancing its effectiveness. It provides standards against which languages and their various features can be judged. Criteria useful for choosing the appropriate language for a given application are listed in Section 2.4.
2.1 Language specification

The description of a language is commonly broken into two parts: syntax and semantics. The syntax of a language is the set of rules that determines which constructs are correctly formed programs and which are not. The semantics of a language is the description of the way a syntactically correct program is interpreted or carried out. For example, the syntax of Pascal tells us that

a := b;

forms a correct assignment statement, while the interpretation of the statement as "replace the value of a with the current value of b" is a result of Pascal's semantics.
In our discussions in this text, we will use a specific formal tool, known as EBNF, to describe language syntax. Language semantics will be described in a much less formal way, usually in English. The reason for this is that although there are tools for describing semantics, these tools are rather complex and abstract, and their use is beyond the scope of this book. Our formal language specifications will therefore concentrate on syntax.

Language specification is important for two reasons. First, it is useful in describing a language, and this is the way that we will use it throughout the text to describe both real and theoretical programming language constructs. But language specifications are also useful in the verification of the validity of programs, since they give us a set of rules against which an allegedly legal program can be tested. Such syntactic verification is typically carried out by language translation programs such as compilers.
Grammars

The syntax of a language is described by a grammar, which is a set of rules that defines all of the valid constructs that can be accepted in the language. The basic elements of a grammar are:

1. Set of terminal symbols. These are the atomic (nondivisible) symbols that can be combined to form valid constructs in the language. Terminal symbols most commonly are a set of characters, though some languages may consider certain character strings to be symbols as well.
2. Set of nonterminal symbols. These symbols are not included in the language itself, but are symbols used to represent intermediate definitions within the language as defined by productions. These nonterminal symbols represent syntactic classes or categories.
3. Set of productions. A production is a definition of a nonterminal symbol. It is of the form

   x ::= y

   where x is a nonterminal symbol and y is a sequence of symbols each of which can be either terminal or nonterminal.
4. Goal symbol. One of the set of nonterminal symbols is specified as the goal symbol. This is also sometimes called the distinguished symbol or the start symbol.

Two rules must be obeyed for the above components to form a grammar. They are

1. Every nonterminal symbol must appear to the left of the ::= of at least one production.
2. The goal symbol must not appear to the right of the ::= of any production.
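The four components of a grammar and the two rules they must obey can be sketched in a few lines of Python. This is only an illustration: the representation (sets of symbol names plus a list of left/right pairs), the function name check_grammar, and the tiny example grammar are all assumptions made here, not part of the text.

```python
# Hypothetical representation: a grammar is a set of terminals, a set of
# nonterminals, a list of (left, right) productions, and a goal symbol.
def check_grammar(terminals, nonterminals, productions, goal):
    """Return a list of rule violations; an empty list means the rules hold."""
    problems = []
    defined = {left for left, _ in productions}
    # Rule 1: every nonterminal appears on the left of at least one production.
    for symbol in sorted(nonterminals):
        if symbol not in defined:
            problems.append(f"{symbol} has no production")
    for left, right in productions:
        # Rule 2: the goal symbol never appears on the right of a production.
        if goal in right:
            problems.append(f"goal {goal} appears on the right of {left}")
        # Sanity check: every symbol used must be declared.
        for symbol in right:
            if symbol not in terminals and symbol not in nonterminals:
                problems.append(f"unknown symbol {symbol!r} in {left}")
    return problems


# An illustrative grammar (not from the text): sums of the digit 1.
terminals = {"1", "+"}
nonterminals = {"program", "sum", "term"}
productions = [
    ("program", ["sum"]),
    ("sum", ["term"]),
    ("sum", ["term", "+", "sum"]),   # a recursive production
    ("term", ["1"]),
]
good = check_grammar(terminals, nonterminals, productions, "program")
bad = check_grammar(terminals, nonterminals,
                    productions + [("term", ["program"])], "program")
```

Here good is an empty list, while bad reports the added production, because it places the goal symbol on a right-hand side.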
To illustrate this concept of a grammar, let us construct a simple grammar for describing a calculator language. This grammar is given in Figure 2.1. Note that this grammar has 16 terminal symbols and 8 nonterminals. Also note that every nonterminal except for the goal symbol has multiple productions defining it. Although such multiple productions are not required, they are common and represent alternate definitions. You can also see that recursive productions are permitted, that is, productions where the nonterminal being defined is also found in its own definition on the right-hand side of the production.

There are two ways that a grammar such as the one found in Figure 2.1 can be used. The first is to generate valid programs in the language. If we begin with the goal symbol and at each step substitute some definition for a nonterminal, proceeding until all remaining symbols are terminal symbols, we have generated a valid program. Examine the generation sequence found in Figure 2.2. At each step, the left-most nonterminal was replaced by a definition from one of the productions. Since multiple productions might apply to any nonterminal, there are multiple choices that can be made. The choice of a production to apply is arbitrary in these cases. The choice of the left-most nonterminal for expansion was also arbitrary. An infinite number of valid calculations can be generated in this way.
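The generation process just described can be sketched as a short Python program. The dictionary below encodes the productions of the calculator grammar; the names GRAMMAR and generate, the random choice of production, and the max_steps cutoff (which forces the first, nonrecursive alternative so that the loop always ends) are assumptions of this sketch rather than anything prescribed by the text.

```python
import random

# Productions of the calculator grammar, keyed by nonterminal.
# The first alternative listed for each nonterminal is nonrecursive.
GRAMMAR = {
    "calculation": [["expression", "="]],
    "expression": [["value"], ["value", "operator", "expression"]],
    "value": [["number"], ["sign", "number"]],
    "number": [["unsigned"], ["unsigned", ".", "unsigned"]],
    "unsigned": [["digit"], ["digit", "unsigned"]],
    "digit": [[d] for d in "0123456789"],
    "sign": [["+"], ["-"]],
    "operator": [["+"], ["-"], ["*"], ["/"]],
}


def generate(goal="calculation", rng=None, max_steps=200):
    """Generate one valid string by repeatedly expanding the left-most nonterminal."""
    rng = rng or random.Random()
    string = [goal]
    steps = 0
    while True:
        # Locate the left-most nonterminal, if any remain.
        index = next((i for i, s in enumerate(string) if s in GRAMMAR), None)
        if index is None:
            return "".join(string)      # only terminal symbols are left
        alternatives = GRAMMAR[string[index]]
        if steps < max_steps:
            choice = rng.choice(alternatives)   # the choice is arbitrary
        else:
            choice = alternatives[0]            # force termination
        string[index:index + 1] = choice
        steps += 1


sample = generate(rng=random.Random(1))
```

Each call produces a different valid calculation depending on the random choices; every result consists only of terminal symbols and ends with the = sign required by the production for calculation.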
The second way a grammar can be used is in the reduction of a valid program back to the goal symbol through the reverse application of productions. This verifies that a string of terminal symbols is indeed a program in the language defined by the grammar. This derivation is only possible if a prudent choice is made of the sequence of productions to be applied, as in Figure 2.3. For example, an initial choice of sign for + would have led to an early dead end in our reduction. Furthermore, if this process is attempted on a string that is not a valid program, the goal symbol can never be reached. Figure 2.4 shows this through an example where every step is uniquely determined and the final string matches none of the productions' right-hand sides.
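The verification use of a grammar can also be sketched in code. Note that the sketch below does not perform the bottom-up reduction described above; it swaps in a different technique, top-down recursive descent, with one small function per nonterminal of the calculator grammar. The function name is_calculation and the decision to accept no blanks inside a calculation (the grammar has no blank terminal) are assumptions of this example.

```python
DIGITS = set("0123456789")
SIGNS = set("+-")
OPERATORS = set("+-*/")


def is_calculation(text):
    """Return True if text is a valid calculation of the calculator grammar."""
    pos = 0

    def peek():
        # Current character, or "" when the input is exhausted.
        return text[pos] if pos < len(text) else ""

    def unsigned():
        # unsigned: one or more digits
        nonlocal pos
        start = pos
        while peek() in DIGITS:
            pos += 1
        return pos > start

    def number():
        # number: unsigned, optionally followed by . unsigned
        nonlocal pos
        if not unsigned():
            return False
        if peek() == ".":
            pos += 1
            return unsigned()
        return True

    def value():
        # value: optional sign followed by number
        nonlocal pos
        if peek() in SIGNS:
            pos += 1
        return number()

    def expression():
        # expression: value, optionally followed by operator expression
        nonlocal pos
        if not value():
            return False
        if peek() in OPERATORS:
            pos += 1
            return expression()
        return True

    # calculation: expression followed by = as the final character
    return expression() and peek() == "=" and pos == len(text) - 1
```

For instance, is_calculation("6+3/12=") succeeds, while is_calculation("6+=") fails because no value follows the operator.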
BNF and EBNF

In this section we will describe a language for expressing grammars. Since this is a language for describing languages, we call it a metalanguage.

The metalanguage we describe is BNF, which stands for Backus-Naur Form. This language was created to express the syntax of ALGOL 60 and has become the standard metalanguage. We will use BNF along with three extensions to describe syntax throughout this text. The extended language will be called EBNF, for Extended BNF. The metalanguage described in the previous section is what we will define as BNF, with one notational addition. In BNF, nonterminals will have their names enclosed in angle brackets (< >) to allow a string of characters in a nonterminal name to be distinguished from the corresponding string of terminal characters. For example, the nonterminal symbol <id> can be distinguished from id, a string of two terminal symbols. See Figure 2.5 for a BNF definition of the calculator grammar.

FIGURE 2.1 Grammar for calculator language

Terminal Symbols: 0 1 2 3 4 5 6 7 8 9 + - * / = .
Nonterminal Symbols: calculation expression value number unsigned digit sign operator
Productions:
 1. calculation ::= expression =
 2. expression ::= value
 3. expression ::= value operator expression
 4. value ::= number
 5. value ::= sign number
 6. number ::= unsigned
 7. number ::= unsigned . unsigned
 8. unsigned ::= digit
 9. unsigned ::= digit unsigned
10. digit ::= 0
11. digit ::= 1
12. digit ::= 2
13. digit ::= 3
14. digit ::= 4
15. digit ::= 5
16. digit ::= 6
17. digit ::= 7
18. digit ::= 8
19. digit ::= 9
20. sign ::= +
21. sign ::= -
22. operator ::= +
23. operator ::= -
24. operator ::= *
25. operator ::= /

FIGURE 2.2 Generation of a calculation using the grammar of Figure 2.1

Production Applied    Current String
                      calculation
 1                    expression =
 3                    value operator expression =
 4                    number operator expression =
 6                    unsigned operator expression =
 9                    digit unsigned operator expression =
12                    2 unsigned operator expression =
 8                    2 digit operator expression =
15                    25 operator expression =
24                    25* expression =
 2                    25* value =
 4                    25* number =
 7                    25* unsigned . unsigned =
 8                    25* digit . unsigned =
11                    25*1. unsigned =
 8                    25*1. digit =
15                    25*1.5=

We are now ready to add three features to BNF to transform it into our EBNF. These features do not add any capabilities to the language, but rather enhance the compactness of the expression. The first new feature, alternation, is the use of the or symbol to express alternate definitions for the same nonterminal within a single production. This symbol is a vertical bar. For example, the set of productions

<odd> ::= 1
<odd> ::= 3
<odd> ::= 5
<odd> ::= 7
<odd> ::= 9
can be compressed using this notation to

<odd> ::= 1|3|5|7|9

We also add optionality by use of the brackets ([ ]) to specify an optional item and repetition by use of the braces ({ }) to specify a repeated item. The brackets indicate zero or one occurrence of the enclosed specification, while the braces indicate zero or more repetitions of the enclosed specification.

FIGURE 2.3 Verification that 6 + 3/12 = is a calculation (reduction tree not reproduced)

FIGURE 2.4 Reduction of invalid string (reduction tree not reproduced)

FIGURE 2.5 Calculator grammar in BNF

<calculation> ::= <expression> =
<expression> ::= <value>
<expression> ::= <value> <operator> <expression>
<value> ::= <number>
<value> ::= <sign> <number>
<number> ::= <unsigned>
<number> ::= <unsigned> . <unsigned>
<unsigned> ::= <digit>
<unsigned> ::= <digit> <unsigned>
<digit> ::= 0
<digit> ::= 1
<digit> ::= 2
<digit> ::= 3
<digit> ::= 4
<digit> ::= 5
<digit> ::= 6
<digit> ::= 7
<digit> ::= 8
<digit> ::= 9
<sign> ::= +
<sign> ::= -
<operator> ::= +
<operator> ::= -
<operator> ::= *
<operator> ::= /
For example, the production

<goal> ::= [a] b {c}

specifies the following strings as valid goals:

b   ab   bc   abc   bcc   abcc   bccc   abccc

This indicates that there are zero or one a's followed by one b followed by zero or more c's. Although these brackets and braces are simply defined, their interpretation can be quite complicated when they are nested and/or include the or metasymbol. Consider, for example, the production

<goal> ::= [{a} b] {d | e}

Although this expresses a production compactly, we see that some clarity has been lost in the process, since the definition is rather complex. When used with care, these extensions can improve not only the compactness but also the clarity of grammar definitions. Figure 2.6 shows how these extensions can be used in defining our calculator grammar. As you can see, this notation frequently eliminates the need for recursion and new alternatives.
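The meaning of the optional and repetition extensions can be checked with a small sketch. For a production as simple as an optional a, one b, and zero or more c's, the EBNF brackets and braces correspond directly to the ? and * operators of regular expressions; the pattern and the names GOAL and is_goal below are assumptions made for this illustration using Python's standard re module.

```python
import re

# [a] b {c}: an optional a, exactly one b, then zero or more c's.
GOAL = re.compile(r"a?bc*")


def is_goal(text):
    """Return True if the whole of text matches the production [a] b {c}."""
    return GOAL.fullmatch(text) is not None


valid = ["b", "ab", "bc", "abc", "bcc", "abcc", "bccc", "abccc"]
invalid = ["", "a", "aab", "abb", "cb"]
accepted = [s for s in valid + invalid if is_goal(s)]
```

After this runs, accepted holds exactly the strings of valid. This correspondence is limited to simple cases; recursive productions such as those of the calculator grammar cannot in general be replaced by a single pattern of this kind.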
One purpose of EBNF is to enhance the clarity of expression for grammars. There is one difficulty that arises in its use when a symbol used in EBNF (a metasymbol) is also to be a terminal symbol in a grammar. For example, if | were a terminal symbol in the grammar being defined, then the production

<x> ::= a | b

could be interpreted in either of two ways. First, the nonterminal x might have two alternative definitions, a or b, if | is interpreted as a metasymbol. If, however, | is interpreted as a terminal symbol of the grammar, this specifies a single definition consisting of three terminal symbols.

We will avoid this confusion by using the following convention: When a metasymbol is also a terminal symbol of the grammar being defined, the symbol will be underlined when it is to represent the terminal symbol and will represent the metasymbol otherwise. Using this convention, the preceding production would represent two alternative single symbol definitions of x. If we wished to express the three-symbol definition, it would be written

<x> ::= a | b   (with the | underlined)

FIGURE 2.6 Calculator grammar in EBNF

<calculation> ::= <expression> =
<expression> ::= <value> [<operator> <expression>]
<value> ::= [<sign>] <number>
<number> ::= <unsigned> [. <unsigned>]
<unsigned> ::= <digit> {<digit>}
<digit> ::= 0|1|2|3|4|5|6|7|8|9
<sign> ::= +|-
<operator> ::= +|-|*|/
Syntax diagrams

A somewhat different approach to expressing grammars was used by Wirth in his definition of Pascal. This tool is called the syntax diagram. It expresses productions as two-dimensional directed graphs whose nodes are symbols. The possible paths through the graph represent the possible sequences of symbols that define the nonterminal of the production. Terminal symbols are represented by ovals and nonterminals by rectangular nodes. Syntax diagrams have the advantage of using two dimensions to enhance understandability. Their disadvantage is the difficulty in generating the diagrams using a linear input device such as a keyboard. Figure 2.7 shows the syntax diagram for our calculator grammar.
FIGURE 2.7 Syntax diagram for calculator grammar (diagram not reproduced)

Problems with specifications

The grammars that can be described by BNF and equivalently by EBNF are known as context-free grammars. This means that the valid definitions of a nonterminal symbol are independent of the context in which the symbol is found. Most programming languages cannot be completely specified by a context-free grammar because they contain some rules that are context-sensitive, meaning that their nonterminal definition depends on context. For example, the common requirement that a variable must be declared before it is used cannot be expressed in a context-free grammar, since the validity of a variable depends on whether its declaration is in the context of its use or not. Although formal tools do exist to express context-sensitive
z tthen = z + 1 z else = x + u; X u
:
:
:
:
:
end;

7. A certain magazine distributor stores each subscription it distributes in a database and assigns each subscription a code number. That code number looks like:

JIP081#301870189018901900190019178

Here the first three letters of the code are the first three letters of the subscriber's last name. The next three characters are the first three numbers in the street address. This field is omitted if there is no street address (for example, the address is a post office box). The next character is always a number sign (#). The next character(s) indicates the number of subscription terms, which can be an integer from 1 to 99. The following four characters indicate the starting date of the first subscription term in the format mmyy. A pair of start/end dates is given for each subscription term. The last two characters in the code are a two-digit ID code for the magazine.

In the preceding sample code, the fields would be interpreted as follows:

JIP     First three letters of subscriber's last name
081     First three digits of street address
#       Number sign
3       Number of subscription terms
0187    Start date of term 1
0189    End date of term 1
0189    Start date of term 2
0190    End date of term 2
0190    Start date of term 3
0191    End date of term 3
78      Magazine ID code

Give a specification for a language that recognizes all forms of subscription code numbers using BNF and syntax diagrams (not EBNF). Be sure your language accepts exactly what is specified earlier, nothing more or less. You may have to invent a few names for nonterminal symbols. You may assume that <letter> and <digit> are nonterminal symbols and are predefined for your use.
8. Consider your specification for Exercise 7. For each of the following strings, give a derivation and a parse tree.
a. JIP081#301870189058906900591129278
b. SMI#10719128733
9. The following EBNF specification is ambiguous:

<if-statement> ::= if <condition> then <statement>
                 | if <condition> then <statement> else <statement>

Assume that <condition> and <statement> are defined elsewhere and that they describe a boolean expression and other statements, respectively.
a. Show that the specification is ambiguous.
b. Give a different specification that describes the identical language but that is unambiguous. Do not give further definitions to <condition> and <statement>.

10. Consider this language specification:
: : : :
:
=
:
=
:
:
* *
A
= {} = C
symbols is {c,*, A }, and E is the only nonterminal symbol. The start symbol is obviously E. Give a derivation and parse tree for the string a.
The
set of terminal
*
{c
b.
c
A
c
A
*
The preceding
c}
specification
is
ambiguous. Give a proof of
its
ambiguity.
11. Give a BNF specification for the language that consists of an even number of a's followed by an odd number of b's.
Laboratory exercises

1. Write an interpreter for the calculator language described in this chapter.
2. Write a program that translates a grammar written in EBNF into a BNF grammar.
3. Write a program that accepts as input an expression in the calculator language and generates the parse tree for that expression.
CHAPTER 3
INFORMATION BINDING

3.1 Data Objects
3.2 Scalar Data Types
3.3 Execution Units and Scope of Binding
3.4 Scope of Binding
The imperative model of programming languages was created to mimic, as closely as possible, the actions of computers at the machine language level. At that level, computers operate with two major units: the central processing unit (CPU), where computations are performed, and the memory, where data are stored. The typical unit of execution in machine language, which may or may not be a single instruction, consists of the following four steps:

1. Obtain the addresses of locations for a result and one or more operands.
2. Obtain the operand data from the operand location(s).
3. Compute result data from the operand data.
4. Store the result data in the result location.

For example, a simple assignment statement such as

A := B + C;

would be executed as follows:

1. Obtain the addresses of A, B, and C.
2. Obtain the data from the addresses of B and C.
3. Compute the result of B + C.
4. Store the result in the location of A.
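The four steps can be made concrete with a small simulation. In the sketch below, a dictionary named symbols stands in for the binding of names to addresses and a dictionary named memory stands in for the storage locations; these names, the addresses 100, 104, and 108, and the function assign_sum are all hypothetical choices made for this example.

```python
def assign_sum(symbols, memory, result, left, right):
    """Simulate result := left + right using the four-step execution unit."""
    # Step 1: obtain the addresses of the result and the operands.
    result_addr = symbols[result]
    left_addr = symbols[left]
    right_addr = symbols[right]
    # Step 2: obtain the operand data from the operand locations.
    left_value = memory[left_addr]
    right_value = memory[right_addr]
    # Step 3: compute the result data from the operand data.
    total = left_value + right_value
    # Step 4: store the result data in the result location.
    memory[result_addr] = total


# Hypothetical bindings: names to addresses, addresses to values.
symbols = {"A": 100, "B": 104, "C": 108}
memory = {100: 0, 104: 2, 108: 3}
assign_sum(symbols, memory, "A", "B", "C")   # A := B + C
```

After the call, the location bound to A holds the sum of the values stored at the locations of B and C, and the operand locations are unchanged.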
Imperative programming languages have abstracted away the use of
addresses in favor of names, but otherwise have retained the preceding four steps as a standard program unit. This unit of execution has
become
45
the fundamental execution unit of imperative languages and is called the assignment statement. The BNF form for such a statement is given by
where the first component, name, represents the result location, the second component is an operator that signifies assignment ( = in Pascal and Ada, for example), and the third component is an expression that specifies the names of the operands and the computation to be performed. :
Fundamental to the performance of
this
assignment
is
the establish-
ment and use of a number of bindings. In step 1 of the preceding execution model, a binding between names and locations must be used to obtain the locations of the operands and result. In step 2, a binding between location and value must be referenced to establish operand values. And finally, in step 4, a new binding between the result location and its computed value must be established. A view of programming languages from the perspective
of bindings
is
presented in Section
3.1.
our model unit of execution, the computations depend on the interpretation of the data and of the operations that are defined. Programming languages relate data and operations through the mechanism known as type. In this chapter we discuss the role of type in the binding and computation process and review the properties of several fundamental types. A final consideration related to the binding of names and locations is the ability to establish and change these bindings within the program. The part of the program in which a name-location binding is preserved is called the scope of the binding. In Section 33 we examine ways in which scope of binding can be defined within a language. In step 3 of
3.1 Data objects

The abstraction that we will use to express the assignment statement activity in imperative languages is the data object. A data object is defined as a four-tuple (L,N,V,T), where L is the location, N the name, V the value, and T the type of the object. We call the assignment to one of these components a binding. The implication here is that these bindings can be changed at certain times. Figure 3.1 shows a visualization of a data object and its bindings. The four bindings are all represented here as lines from the data object to the corresponding objects to which it is bound. The storage space, from which location bindings are selected, is the set of virtual storage locations available within the computer system on which the program will be executed. This space is completely invisible to the programmer, who only needs to know that the binding takes place and when, and does not need to know the specific location to which the data object is bound.

FIGURE 3.1 Data object and bindings. The data object is joined by a type binding to the type space, a location binding to the storage space, a name binding to the identifier space, and a value binding to its value.

Bindings

The time at which a binding takes place is often an important consideration. There are three times when bindings can occur:

1. Compile time: when the program is being translated into machine language
2. Load time: when the machine language program generated by the compiler is being assigned to particular locations within the storage space of the computer
3. Run time: when the program is being executed

For convenience, the location binding commonly occurs at load time. We will see later that bindings to locations can occur during run time as well.
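The four-tuple view of a data object and the three binding times can be made concrete with a small model. The following is only a sketch in Python, not part of any language discussed in the text; the names DataObject, declare, load, and assign are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DataObject:
    """A data object as the four-tuple (L, N, V, T) of bindings."""
    location: Optional[int] = None   # L: bound at load time (or run time)
    name: Optional[str] = None       # N: bound at compile time
    value: Any = None                # V: may be rebound at run time
    type_: Optional[type] = None     # T: usually bound at compile time


def declare(name: str, type_: type) -> DataObject:
    """Compile-time step: bind the name and the type only."""
    return DataObject(name=name, type_=type_)


def load(obj: DataObject, address: int) -> None:
    """Load-time step: bind the object to a storage location."""
    obj.location = address


def assign(obj: DataObject, value: Any) -> None:
    """Run-time step: rebind the value, respecting the type binding."""
    if not isinstance(value, obj.type_):
        raise TypeError(f"{value!r} is not of type {obj.type_.__name__}")
    obj.value = value


a = declare("A", int)   # name and type bindings
load(a, 0x1000)         # location binding (the address is made up)
assign(a, 7)            # value binding
```

Only the value binding is changed by assign; the name, type, and location bindings established earlier are left untouched, which mirrors the usual situation for a simple variable.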
Identifiers

The identifier space of a language is the collection of all possible names that can be given to data objects. In addition, program units are also bound to names selected from the same identifier space. The definition of the identifier space is an important component of a language. In this text we will highlight language components in boxes that will include the description of the component for some example languages, frequently Ada or Pascal. This will also provide you with a template for expressing these components for new languages that you might encounter. The laboratory exercises will frequently ask you to do this.
Identifier Space
Language: Ada
Definition: <identifier> ::= <letter> { [<underline>] <letter or digit> }
Usage: The domain of names available for data objects and program units is expressed in this way. Uppercase and lowercase are not distinguishable.
The accompanying box defines the identifier space as the set of all character strings of length one or longer beginning with a letter, containing only letters, digits, or underscores, and ending with a letter or digit. We will use BNF notation to express language components, where possible, to avoid ambiguity.
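A definition of this kind translates directly into a membership test. The sketch below, written in Python for illustration, assumes the Ada rule that an underline may appear only between two letters or digits; the function names are hypothetical, and reserved words are not excluded.

```python
import re

# <letter> { [<underline>] <letter or digit> }
IDENTIFIER = re.compile(r"[A-Za-z](_?[A-Za-z0-9])*")


def is_identifier(text: str) -> bool:
    """True if text belongs to the identifier space defined above."""
    return IDENTIFIER.fullmatch(text) is not None


def same_identifier(a: str, b: str) -> bool:
    """Uppercase and lowercase are not distinguishable."""
    return is_identifier(a) and is_identifier(b) and a.lower() == b.lower()
```

Under this rule ITEM_ID is a member of the identifier space, while _ITEM, ITEM_, ITEM__ID, and 2ITEM are not.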
Name binding for a data object typically occurs at compile time at the point where a declaration is encountered by the compiler. Note that name binding becomes a more complex issue when one is dealing with aggregate data objects such as arrays and records. Such aggregate data objects, although bound to a single name, are bound to multiple locations. In addition, while they are data objects themselves with type, value, and location bindings, the individual components of aggregate structures do not have a binding to a simple name in the identifier space. Rather, each component is identified by a compound binding of some form. For example, if ITEM is a record variable with one of its components being ID, then the simple name ITEM is bound to several locations corresponding to the separate component data objects of the aggregate object ITEM. Moreover, the compound name ITEM.ID is bound to a single data object. We will ignore such complications in our present discussion by limiting our attention to simple, or scalar, data objects with simple name bindings.
Types

The type space of a language is the set of all possible types that can be bound to a simple data object. In Section 3.2, the most common types available in imperative languages will be discussed. Each type is itself a space of possible values to which a data object of that type may be bound and a set of operations that apply to objects of that type. Thus the type and value bindings are two phases of the same binding, with the type binding restricting the possible values to which an object can be bound and defining the set of operations that can be applied to the data object. We can therefore think of a type as a set of values and a set of operations. The type is usually bound to a data object at compile time through a type declaration. Declarations are statements in a programming language that create a data object by binding it to a name and type (see box).
Data Object Declaration
Language: Ada
Definition: <object declaration> ::= <identifier list> : [constant] <type> [ := <expression> ];
Usage: One or more data objects are created at compile time. The objects result from binding each identifier in the identifier list to the type indicated. If := is present, a binding of the data object to the value indicated by the expression occurs at compile time. If constant is present, that value binding will remain in effect for the life of the data object.

To illustrate the effect of a declaration, consider the following sample declarations:

A : integer;
B : integer := 0;
C : constant integer := 0;

The effects of these three declarations on bindings at compile time are indicated in Figure 3.2. Double lines indicate bindings that will hold for the life of the data object. Single lines indicate bindings that may be modified at run time.

This Ada declaration form permits not only the declaration of variables, but also their initialization and the declaration of constants, as shown in the three cases of Figure 3.2.
FIGURE 3.2 Sample declarations and their bindings, in three panels (a) to (c), for the declarations A : integer; B : integer := 0; and C : constant integer := 0; each panel shows the type, location, name, and value bindings among the type space, storage space, and identifier space.
Operators, functions, and expressions

Another important consideration is the syntax a language uses to specify the operations performed when computing results. In general, operators are of two types, monadic and dyadic. Monadic operators have one operand, whereas dyadic operators have two. The standard format is for monadic operators to be expressed in a prefix form with the operator preceding the operand:

<monadic expression> ::= <operator> <operand>

The dyadic operations are commonly expressed in infix form with the operator between the two operands:

<dyadic expression> ::= <operand> <operator> <operand>
Functions are another form of operation, but can have an unlimited number of operands. The form of function calls in an imperative language is typically the prefix form using parentheses. This is expressed by

<function call> ::= <function name> ( <operand> { , <operand> } )
A further consideration in the evaluation of expressions is the order in which operations are performed. For example, the expression

6 + 2*3

could be evaluated as 24 if the addition is performed first. Or it could be evaluated as 12 if the multiplication is performed first. The most common rules for determining order of evaluation in an imperative language with, for example, operations +, -, *, /, and **, are:

1. Operations inside parentheses are performed first.
2. Next, operations are performed in the following order:
   first: **
   second: * and /
   third: + and -
3. Operations at the same level in step 2 are performed from left to right.

These rules, which are summarized in the accompanying box, can be represented in BNF as previously described in Chapter 2.
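The three rules can be turned into a small evaluator. The following Python sketch is one possible realization (a precedence-climbing parser), offered under simplifying assumptions: operands are unsigned integer literals, only dyadic operators are handled, and the input is well formed. The names PRECEDENCE, tokenize, and evaluate are hypothetical.

```python
import re

# Rule 2: a higher level is performed earlier.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "**": 3}

APPLY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
    "**": lambda a, b: a ** b,
}


def tokenize(text):
    """Split the text into integer literals, operators, and parentheses."""
    return re.findall(r"\d+|\*\*|[-+*/()]", text)


def evaluate(text):
    tokens = tokenize(text)
    pos = 0

    def primary():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            # Rule 1: a parenthesized expression is evaluated first.
            value = expression(1)
            pos += 1  # skip the closing parenthesis
            return value
        return int(token)

    def expression(min_level):
        nonlocal pos
        left = primary()
        while (pos < len(tokens) and tokens[pos] in PRECEDENCE
               and PRECEDENCE[tokens[pos]] >= min_level):
            op = tokens[pos]
            pos += 1
            # Rule 3: the right operand may contain only operators of a
            # strictly higher level, so equal levels group left to right.
            right = expression(PRECEDENCE[op] + 1)
            left = APPLY[op](left, right)
        return left

    return expression(1)
```

With these rules evaluate("6 + 2*3") yields 12 and evaluate("(6 + 2)*3") yields 24. Because rule 3 is applied to every level, evaluate("2 ** 3 ** 2") groups as (2 ** 3) ** 2; a language that evaluates ** from right to left would group it as 2 ** (3 ** 2) instead.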
Expression Evaluation
Language: Ada
Operators: Order determined by:
1. parentheses
2. precedence
3. left to right if equal precedence
Functions: <function name> ( <operand> {, <operand>} )
Although these precedence definitions are common among imperative languages, it should be noted that some languages use other strategies for expression evaluation, such as prefix notation for dyadic operations, no precedence levels, or right-to-left evaluation among operators of equal precedence. The precedence rules used by any given language are arbitrary, but they are necessary and important in ensuring the clarity of expressions.
Type binding and type checking

Although types are bound to data objects at compile time in most languages, it is possible for this binding to occur at run time as well. The languages APL and Smalltalk implement such dynamic binding. Whereas declarations are used to bind data objects to types at compile time, dynamically typed languages need no declarations. The type of a data object is determined by the type of its value. Therefore, whenever a data object is bound to a different value, it takes on the type of that newly bound value.
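Python, although not one of the languages named above, is also dynamically typed and can be used to observe this behavior. In the short sketch below the name x needs no declaration, and its type follows whatever value it is currently bound to.

```python
x = 42                       # x is bound to an integer value
first = type(x).__name__

x = "forty-two"              # rebinding the value rebinds the type as well
second = type(x).__name__

x = 4.2
third = type(x).__name__

print(first, second, third)
```

The three recorded type names differ even though the same name is used throughout.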
Type checking is the process of determining the type of a specified data object. The performance of such checking by the computer can be of great assistance in the detection and prevention of errors. For example, if the expression

A * B

appears in a program, type checking can determine the types of data objects A and B and whether the operator * applies to two objects of these types. Type checking can occur at compile time, at run time, or not at all. Languages that perform all type checks at compile time are called strongly typed. Ada is an example of a strongly typed language. Pascal, on the other hand, is not strongly typed, although it can check almost all types at compile time. Two exceptions to static type checking in Pascal are the checking of subrange types and the use of variant records. Consider, for example, the following Pascal program:

type soft = record
       case test: boolean of
         true: (first: 1..20);
         false: (second: char)
     end;
var x, y: soft;
    c: char;
begin
  c := x.second;
  y.first := 2 * x.first;
end.

It is impossible for the Pascal compiler to determine if typing is correct in this program, for two reasons. First, when x.second is used, it is not possible to check at compile time that the second variant part of x is in effect. Second, at the assignment statement to y.first, there will be a type violation if x.first is greater than 10, and it is not possible to determine this at compile time.

In a language that is not strongly typed, there are two possible alternatives for those situations where types cannot be checked at compile time. First, the types might not be checked at all. This is what typically happens in the preceding cases for Pascal. This places the burden of type checking on the programmer and may lead to serious undetected errors. The second possibility is that the type may be checked at run time. Such dynamic type checking can be expensive in terms of execution time because a check must be performed every time a data object is referenced. It is also expensive in its use of memory since a type indicator must be stored as a part of every data value.

Dynamically typed languages, where types are bound at run time, can only check types at run time. Statically typed languages, where types are bound at compile time, can check types at either compile time or run time.
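To make the cost of run-time checking concrete, the following Python sketch models the variant record soft with both checks performed dynamically. It is only an illustration of the idea, not a description of how any Pascal implementation behaves; the class Soft and its helper _require are hypothetical.

```python
class Soft:
    """Run-time model of the variant record soft; test is the tag field."""

    def __init__(self, test):
        self.test = test      # the tag is stored with the value (memory cost)
        self._first = None
        self._second = None

    def _require(self, tag, field):
        # Performed on every reference to a variant field (time cost).
        if self.test is not tag:
            raise TypeError(f"variant field '{field}' is not in effect")

    @property
    def first(self):
        self._require(True, "first")
        return self._first

    @first.setter
    def first(self, value):
        self._require(True, "first")
        if not 1 <= value <= 20:          # subrange 1..20
            raise ValueError(f"{value} is outside the subrange 1..20")
        self._first = value

    @property
    def second(self):
        self._require(False, "second")
        return self._second

    @second.setter
    def second(self, value):
        self._require(False, "second")
        self._second = value


x = Soft(True)
y = Soft(True)
x.first = 10
y.first = 2 * x.first      # 20 is still inside 1..20
```

With x.first bound to 11, the same assignment to y.first is rejected by the subrange check, and any reference to x.second is rejected because the first variant is in effect. Both errors surface only when the offending statement executes.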
Type conversion

Another major issue in dealing with types is the way in which a data object of one type is converted to another type if the two types are mixed in the evaluation of an expression. Operators usually require both operands to be of the same type. This includes the assignment operator whose left operand is the result data object and whose right operand is an expression that represents a value to be stored in the location bound to the result data object.

The two common strategies for converting an operand to a consistent type are implicit and explicit conversion. Implicit type conversion is often called type coercion. Such coercions may occur automatically when certain type mixtures occur. For example, real and integer operand mixtures might result in the integer operand being converted to a real of equivalent value, if possible. This often is indeed possible because many integers have an equivalent real representation. The only exception to this case is when the integer is the left operand of an assignment. In this case, the real is truncated to an integer before assignment occurs since an opposite coercion would require changing the type binding of the target data object, a change that is not normally permitted. Languages that permit implicit coercions require a list of all pairs of types for which implied coercion is permitted. Nonpermissible pairs are flagged as errors.

The second option, explicit type conversion, makes the mixing of types illegal. Instead, explicit functions are required to specify the conversion from one type to another. These functions may be given unique names for each conversion, like INTEGER_TO_REAL, or, as in Ada, may permit conversion by a function name that matches the name of the resulting type and that will accept operands of any allowable type. For example, in Ada, the function FLOAT converts from any allowable type to float type. Such conversions are normally allowed between derived types and their parent types or between various numeric types.

Pascal permits some explicit and some implicit type conversions. For instance, integer-to-real conversions are implicit, though explicit functions round and trunc must be used to convert from real to integer.

A question arises as to what actually constitutes different types and when there is a need for type conversion. Consider the following declarations in Ada:
type T1 is INTEGER range 0..10;
type T2 is INTEGER range 0..10;
A : T1;
B,C : T1;
D : T2;
The question is, which of the variables A,B,C,D are considered to be of the same type? Another way of stating this is to ask, which of the following assignments are legal?
=B
=C
Three possible definitions of type equivalence are given here in order of increasing restrictiveness:

1. Domain equivalence: Two data objects are of equivalent type if they have the same domain of possible values associated with their types. This is also known as structural equivalence.
2. Name equivalence: Two data objects are of equivalent type if they are typed by the same name.
3. Declaration equivalence: Two data objects are of equivalent type if they are bound to their type in the same declaration.

Under domain equivalence, A,B,C,D are all of equivalent type and can be operands that share the same operator without any type conversion. Under name equivalence, A,B,C are all of equivalent type, but D is not since it was bound to a type T2, which differs from the name, T1, of the type of the other three variables. Under declaration equivalence, only B and C are of equivalent type since they are bound to a type in the same declaration.
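The three definitions can be compared side by side with a small model of the declarations of A, B, C, and D. In the Python sketch below each variable records the name of its type, the domain of that type, and the number of the declaration that introduced it; the class Variable and the three predicate functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str
    type_name: str      # name of the type the variable was declared with
    domain: range       # set of values the type allows
    declaration: int    # which declaration introduced the variable


# A:T1;  B,C:T1;  D:T2;  with T1 and T2 both INTEGER range 0..10
A = Variable("A", "T1", range(0, 11), 1)
B = Variable("B", "T1", range(0, 11), 2)
C = Variable("C", "T1", range(0, 11), 2)
D = Variable("D", "T2", range(0, 11), 3)


def domain_equivalent(x, y):
    return x.domain == y.domain


def name_equivalent(x, y):
    return x.type_name == y.type_name


def declaration_equivalent(x, y):
    return x.declaration == y.declaration
```

In this model A and D are domain equivalent but not name equivalent, A and B are name equivalent but not declaration equivalent, and only B and C satisfy all three definitions, which is the increasing restrictiveness described above.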
The previous discussion has ignored the issue of anonymous types, that is, those types associated directly with variables through declaration without being given a name. For example, in Ada we might declare

E,F : INTEGER range 0..10;
G : INTEGER range 0..10;

The interpretations of these declarations using domain and declaration equivalence are obvious. However, the interpretation of name equivalence for anonymous types needs more careful definition. The rule used by Ada is that no two objects of anonymous type are name-equivalent. This means that E, F, and G are all considered to be of different types under name equivalence.
A difficulty arises when the domain of one type is a subset of the domain of another. The operations between the two types might have a logical definition without type conversion, even though type conversion will be required under any of the definitions of type equivalence. For example, consider the following situation in a hypothetical language:

A: INTEGER;
B: 0..100;

The assignment statement

A:=B;

though obviously meaningful, will be illegal in a language that requires explicit type conversion. One way of dealing with this situation is through the definition of subtypes. A subtype of a given parent type includes the same operators as the parent type, although its domain of values may be a subset of the parent type's domain. Normally, operations, including assignment, are permitted between an operand of the subtype and an operand of the parent type. For example, the preceding assignment could be legalized by the following Ada declarations:

subtype T is INTEGER range 0..100;
A: INTEGER;
B: T;

This Ada construct enforces operational equivalence of data objects bound to the subtype with data objects bound to the parent type.
Subtypes and derived types

Another problem can arise with subtypes, as illustrated by the preceding declarations followed by the assignment

B:=A;

In this case, the assignment may possibly be illegal since A is bound to the parent type and may have a value outside the domain of the subtype. For example, if A is bound to 101, the assignment is not legal. A language might handle this in three possible ways:

1. Check that the value of A is in the subtype domain at run time and flag the assignment as an error if not. This is the policy that Ada follows.
2. Flag such an assignment as an error at compile time even though it might execute correctly. This would avoid extra run-time checks.
3. Ignore the issue altogether by not checking subtypes.
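The first of these policies, the run-time check, might be sketched as follows. The Python below is illustrative only; the class Subtype, its method checked, and the function assign_b are hypothetical, and the exception is named after Ada's CONSTRAINT_ERROR.

```python
class ConstraintError(Exception):
    """Raised when a value lies outside a subtype's domain."""


class Subtype:
    def __init__(self, name, low, high):
        self.name, self.low, self.high = name, low, high

    def checked(self, value):
        """Return value unchanged if it is in the domain, else raise."""
        if not self.low <= value <= self.high:
            raise ConstraintError(
                f"{value} is not in {self.name} ({self.low}..{self.high})")
        return value


T = Subtype("T", 0, 100)   # subtype T is INTEGER range 0..100


def assign_b(a):
    """Model B := A, where B is of subtype T and A is of the parent type."""
    return T.checked(a)
```

An assignment with A bound to 100 succeeds, while one with A bound to 101 raises the error at the moment of assignment. Under the second policy the assignment would be rejected before execution regardless of the value, and under the third it would always proceed.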
An opposite situation to that addressed by subtypes occurs in languages that employ domain equivalence. Occasionally a programmer wishes to define two types with identical domains but make them operationally incompatible. For example, suppose we have

type TIME is new FLOAT;
type LENGTH is new FLOAT;
DURATION : TIME;
DISTANCE : LENGTH;

Although DURATION and DISTANCE share the same domain, it would not make sense to add them together. The reason for using different types TIME and LENGTH is to enforce this incompatibility. A language with domain equivalence should, therefore, permit an override of its basic mechanism to permit the declaration of separate, incompatible types with the same domain. One type will have the same properties as the other type but will not be equivalent in the sense of permitting a mixture of the two types without type conversion. This is called a derived type and is implemented in Ada as shown in the preceding declarations of TIME and LENGTH. A derived type may also define a subtype that is not operationally equivalent to its containing type.
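The effect of derived types can be imitated in a language without them by wrapping the shared domain in distinct classes. The Python sketch below is an approximation under that assumption; the base class Derived is hypothetical, and the check happens at run time rather than at compile time as it would in Ada.

```python
class Derived:
    """Types that share float's domain but are not mutually compatible."""

    def __init__(self, value):
        self.value = float(value)

    def __add__(self, other):
        # Addition is defined only between two objects of the same type.
        if type(other) is not type(self):
            raise TypeError(
                f"cannot add {type(self).__name__} and {type(other).__name__}")
        return type(self)(self.value + other.value)


class Time(Derived):
    pass


class Length(Derived):
    pass


duration = Time(1.5)
distance = Length(3.0)

total = duration + Time(2.0)          # same type: permitted
converted = Time(distance.value)      # explicit conversion: permitted
```

Adding duration and distance directly raises an error even though both hold values from the same domain; the explicit conversion shown on the last line is the only way to mix them.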
The type issues just discussed, as they are addressed in Ada, are summarized in the following box.
Type Issues
Language: Ada
Type conversion: explicit, using the name of the target type as the conversion function
Type equivalence: name
Subtypes:
p4 -> p2 -> p3 -> p1

11. Operators that are defined on more than one type are called overloaded operators. Identify the operators that are overloaded in Pascal.
Laboratory exercises

In Laboratory Exercises 1-10, you are to work with a language or languages whose implementation you have available to you, and determine the answer to the following questions by constructing a sample program or programs and observing the results.
1. What is the definition of the identifier space for your language? What characters are allowed and in which positions? Is there a maximum identifier length? Are uppercase and lowercase letters distinguishable?
2. What are the expression evaluation rules for your language? What are the operator precedences? Is left to right or right to left the direction among operators of equal precedence? Write the BNF representation for these rules.
3. Does your language perform any implicit type conversions?
4. What does your language use for type equivalence: domain, name, or declaration?
5. If your language has a subtype capability, which of the three strategies is implemented when an expression of the parent type is assigned to a variable of the subtype and the value of the expression is not within the domain of the subtype?
6. What is the domain of allowable integers in your language? What is the domain of allowable reals?
7. Does your language evaluate AND and OR with or without short circuiting?
8. What happens when a pointer data object is deallocated while another data object is pointing to the same location?
9. What is the effect of assigning the same constant to two different enumerated types? Is it permitted? Are ambiguities discovered at compile time or run time?
10. Do blocks have a run-time location binding?
In Laboratory Exercises 11-12, you are to work with an implementation of the Pascal programming language that is available to you.

11. There are at least two ways to violate type checking in Pascal. Find them and give at least one example of each. Clue: They can be found in variant records and for loops.
12. Pascal's version of type equivalence might be called "pseudostructural equivalence."
    a. Construct at least three tests that demonstrate Pascal's use of structural equivalence. Make the tests different (for example, using different data structures).
    b. Pascal does not use true structural equivalence, however. Find where it does not use structural equivalence to check for compatible types and provide at least two examples (again with different data structures).
CHAPTER 4
CONTROL STRUCTURES

4.1 Conditional Structures
4.2 Iterative Structures
4.3 Unconstrained Control Statements
A programming language must do more than specify the actions that a computer should take. It must also specify the order in which those actions should occur. Because of the sequential nature of machine language execution, virtually all imperative languages follow the same pattern by adopting the sequential execution of statements. This means that after a statement has completed execution, the default for the choice of the next statement to be executed is the next physical statement in the program.

One of the obvious benefits of a programming language is its ability to modify the order of statement execution by presenting alternatives to the sequential mode. Facilities of a language that permit this are called control structures. Although the choice of control structures has remained remarkably consistent from one imperative language to another, there are many subtleties in their definition and use that can cause confusion as a programmer moves between languages. Furthermore, control structures have been the subject of considerable controversy, both in language design and language usage. In this chapter we will pursue the important issues involving these control structures.

The conditional control structure is examined in Section 4.1. It determines the next unit to be executed based on the result of a test or set of tests. Iteration is the repetitive execution of a program unit, and we study its many forms in Section 4.2. The most powerful control structure is the statement that directly specifies the next unit to be executed. This form has a direct counterpart in machine language and is called a transfer, branch, or goto statement. It will be discussed in Section 4.3.
4.1 Conditional structures

Conditional control structures determine the next block of statements to be executed based on the result of a test or a sequence of tests. Such structures are usually implemented through an if statement. In this section we will examine four forms of such statements.

Simple conditional

The simplest form of conditional control structure performs a single test whose result it uses to determine whether or not to execute a specified block of statements. The two parts of the structure, therefore, are the boolean expression that defines the condition, and the block of statements that will be performed if the expression evaluates to true. The typical form of the simple conditional is

if <boolean expression> then <block of statements>
The most frequent variation from language to language is the method of blocking the statements. Pascal and ALGOL 60, for example, consider the then block of statements to be a single statement that can be made into a compound statement by enclosing multiple statements between begin and end. Ada and Modula-2, however, have specific keywords for ending the block of statements, Ada terminating with end if, Modula-2 with end. Neither of these languages needs a beginning-of-block marker since then functions in this capacity.
Every modern imperative programming language permits the extension of the simple conditional if statement to a two-alternative structure. This direct extension takes the form

if <boolean expression> then <block of statements> else <block of statements>
It is the addition of the else that makes apparent one disadvantage of the Pascal method of blocking statements in conditionals. It leads to the well-known dangling else problem in the case of nested conditionals. To illustrate this, consider the following Pascal fragment.

if x>0 then if x0 then
if x>0 then
begin if x
when -> when -> fi
This construct differs in several ways from the if-elsif construct. The above form evaluates all conditions, while the if-elsif evaluates only until a true condition occurs. Furthermore, in the case where more than one of the conditions is true, the alternative whose statement sequence is executed is chosen nondeterministically. This means that there is no rule for choosing among several possibilities, and any of them could be chosen. If none of the conditions is true, the statement is considered to be in error. The conditions are often called guards, and the entire construct is called a guarded command.

Although Ada does not implement this construct in the generality of Dijkstra's definition, it does have a specialized version of it that is used to implement concurrency control. We will study this in Chapter 7.
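The behavior just described (every guard evaluated, a nondeterministic choice among the true ones, and an error when none is true) can be simulated as in the Python sketch below. A pseudorandom choice stands in for nondeterminism here, which is an assumption of the sketch; the names guarded_if and maximum are hypothetical.

```python
import random


def guarded_if(alternatives, rng=random):
    """alternatives: list of (guard, action) pairs of zero-argument callables."""
    # Every guard is evaluated, unlike if-elsif, which stops at the first
    # condition that is true.
    open_actions = [action for guard, action in alternatives if guard()]
    if not open_actions:
        raise RuntimeError("no guard is true")
    # Any alternative with a true guard may be chosen.
    return rng.choice(open_actions)()


def maximum(x, y):
    """Both guards are true when x equals y; either choice is correct."""
    return guarded_if([
        (lambda: x >= y, lambda: x),
        (lambda: y >= x, lambda: y),
    ])
```

The function maximum illustrates why the lack of a selection rule is harmless in a well-designed guarded command: whenever more than one guard is true, every eligible alternative produces an acceptable result.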
4.2 Iterative structures

One of the most powerful features of an imperative language is its capability of specifying the repetition of a block of statements. Structures for doing this are called iteration structures. These are extremely important and raise many interesting issues. For convenience, we will refer to the block of statements to be repeated as the body of the iteration.
Nonterminating iteration

The simplest form of iterative structure is a nonterminating iteration that specifies the indefinite repetition of its body. Usually, student programmers are cautioned to avoid such iterations because they will never terminate and hence execute forever. In practice, however, there are frequently blocks of statements that execute in just this way. A communications program, for example, may have the following nonterminating structure:

do forever
  check for character sent
  if character is sent then
    process character
end do

Perhaps due to the bad reputation of nonterminating iterations, many languages have no direct means of expressing them. Nevertheless, they can be simulated very easily. In Pascal, for example, one might write

while true do
begin
  <body>
end;

Other languages, such as Ada, have a direct form that expresses a nonterminating iteration. In Ada, one writes

loop
  <body>
end loop;
Pretest iteration

One fundamental capability of iterations is the ability to terminate based on the result of a test. Two factors can vary in the specification of this test: its placement and its logical direction. The placement of the test can either be before, after, or in the middle of the body of the iteration. We will call these three choices pretest, posttest, and in-test iterations and devote this and the next two sections to their description. The logical direction of an iteration specifies whether the test is a termination test, where a true condition indicates the iteration should halt, or a continuation test, where a false condition completes the iteration. In Pascal, for example, the termination condition is preceded by the until clause and the continuation test is found in the while clause.

while loop

This iteration first tests the condition. If it is true, it executes the body. It then tests the condition again and repeats the process until the condition becomes false. Note that, for the iteration to terminate, it must be possible for the body to change the state of the condition. Also note that it is possible for the condition to be initially false, in which case the body is not executed at all.
Once again, an important issue is the syntax for delimiting the statement block that serves as the body of the iteration. As with the if statement, Pascal assumes that the body is one statement that can be expanded into a compound statement using begin and end. The technique of specifying a block of statements by special delimiters is used by Ada, whose syntax for the pretest iteration is

while <condition> loop
  <body>
end loop;

The word loop signifies the beginning of the body and end loop marks the end.
Posttest iteration

Whereas a pretest iteration provides a test before entry to the body of the iteration, a posttest iteration places the test after the body. Generally, pretest iteration is preferred over the posttest form because posttesting permits one execution of the loop before the test is first performed. In Pascal, a posttest structure is available, the repeat-until construct. This construct also uses a termination condition for its test. The repeat-until presents an inconsistency in Pascal in its method of delimiting the iteration body. This is the only control structure for which Pascal abandons the compound statement convention in favor of the use of keyword delimiters. In this case, the words repeat and until are used to delimit the iteration body.
The equivalence of the pretest and posttest iteration constructs is obvious. For example, in Pascal the general pretest iteration

while <condition> do
begin
  <body>
end;

can be replaced by the equivalent, though more awkward,

if <condition> then
  repeat
    <body>
  until not <condition>;

Similarly, the general posttest iteration

repeat
  <body>
until <condition>;

is equivalent to

<body>
while not <condition> do
begin
  <body>
end;

The inclusion of both constructs in a language is for the convenience of the programmer, who may choose the one most natural for any given iterative structure. Ada provides no posttest iteration construct, but can simulate it by placing the in-test conditional at the end of the iteration body. This will be illustrated in the next section.

In-test iteration

Occasionally it is desirable to perform the test for terminating an iteration neither before nor after the execution of the body, but rather somewhere in the middle. This is called an in-test iteration and is a situation where it is often argued that the use of a goto statement is justified. A much more restrictive construct than the goto can be used for this purpose, however: one whose only purpose is to exit from an iteration in the middle of its body. This has the advantage of permitting a flexible exit from a loop while not allowing indiscriminate branching within the program. This approach also avoids the need to use statement labels. We will examine the capabilities of such an in-test construct by discussing its Ada implementation, the exit statement. The format of this statement is

exit [when <condition>]

The condition, in this case, is a termination condition. The general form of the in-test iteration in Ada is therefore

loop
  <top body>
  exit when <condition>;
  <bottom body>
end loop;

where the top and bottom bodies are those statements executed respectively before and after the test is performed.
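The three placements of the test can be written as three small functions. The Python sketch below builds all of them from a nonterminating loop and a conditional exit, in the spirit of the Ada loop and exit statements; the function names and the sentinel-reading example are hypothetical.

```python
def pretest(continue_test, body):
    """while <condition> do <body>: the body may run zero times."""
    while continue_test():
        body()


def posttest(body, termination_test):
    """repeat <body> until <condition>: the body runs at least once."""
    while True:
        body()
        if termination_test():
            break


def in_test(top_body, termination_test, bottom_body):
    """loop <top body> exit when <condition>; <bottom body> end loop."""
    while True:
        top_body()
        if termination_test():
            break
        bottom_body()


def read_until(items, sentinel):
    """In-test example: read an item, stop on the sentinel, else record it."""
    source = iter(items)
    state = {"current": None}
    seen = []

    def read():
        state["current"] = next(source)

    in_test(read, lambda: state["current"] == sentinel,
            lambda: seen.append(state["current"]))
    return seen


runs = []
pretest(lambda: False, lambda: runs.append("pretest"))
posttest(lambda: runs.append("posttest"), lambda: True)
```

With a condition that would stop the iteration immediately, the pretest body is never executed while the posttest body is executed once, so runs ends up holding only the posttest entry. The sentinel example shows the typical in-test situation: the item must be read before it can be tested, and processed only after the test fails.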
A further extension of this construct is available to permit the termination of more than one nested iteration at the same time. For example,

loop
  loop
    exit when <condition>;
  end loop;
end loop;

In this situation, the exit statement is contained in both the outer and inner iterations. The definition of the Ada exit is that it exits from only the immediate containing iteration. A facility is also provided in Ada, however, to permit the exit from several layers of iterations at the same time. This is made possible by the labeling of iterations and specifying the outermost iteration to be exited by naming the label in the exit statement. The preceding example could exit from the outer iteration in the following way:

OUTER:
loop
  loop
    exit OUTER when <condition>;
  end loop;
end loop OUTER;
Note that the label is attached to the loop statement with a colon separator and, in addition, must be appended to the end loop statement. This form of label is used for reference by exit statements only and may only be attached to loop statements. A separate syntax is used for general statement labels, and this syntax will be described in Section 4.3.

The exit statement of Ada gives the programmer power beyond the simple in-test iteration. It provides the capability of multiple-exit iterations, since there is no limit to the number of exits that can occur within an iteration. This practice is discouraged, however, as being counter to the goal of writing understandable programs.

The exit statement also permits the programmer to simulate the action of posttest iterations, a feature not directly included in Ada. An Ada posttest iteration can be written

    loop
      <body>
      exit when <condition>;
    end loop;

Another construct, often confused with the exit, is one that terminates the present pass of an iteration and begins the next pass, rather than terminating the entire iteration. Ada does not include such a statement, but the language C provides a continue statement for this purpose.

Fixed count iteration

The oldest of the iteration structures is the fixed count iteration, which traces its roots back to the do loop of FORTRAN. This iteration is terminated after executing a specified number of times rather than until a specified condition occurs.

Fixed count iterations are controlled by a variable known as the iteration control variable (ICV). The general form of such an iteration is

    for <ICV> := <initial> to <final> step <increment> do <body>

Here, ICV is a variable, and initial, final, and increment are expressions whose values have the same type as ICV. Based on this general form, we will address a number of important variations among languages in forming the fixed count iteration.
1. What types are permitted for the ICV? In some languages only integers are permitted; in others only numerics, including integers and reals. Pascal and Ada both permit integer, character, and enumerated types, the same as those permitted in the control of a case structure. These types are permitted because they possess a built-in stepping function. In other words, each element has a natural successor. Real types do not possess this property and require explicit values for the increment if they are allowed.
2. What is the scope of the ICV? Most imperative languages require that the ICV be a variable that is bound in the execution unit containing the iteration. Ada, however, takes a different approach. The scope of an ICV in Ada is the body of the iteration for which it is declared. This means that its appearance in the iteration statement is equivalent to its declaration and binds it locally to a location. On completion of the iteration, the ICV is no longer bound to that location.

3. Can the ICV be modified within the iteration body? The modification of the ICV within the iteration body is dangerous in that it disrupts the sequence of values specified at the beginning of the iteration. For this reason, some languages, such as Ada, disallow modification of the ICV, through either assignment or use as a modifiable argument to a procedure. Other languages place no restrictions on changing the ICV.

4. What is the value of the ICV after termination of the iteration? There are four different responses that languages give to this question. If the scope of the ICV is the iteration, as in Ada, the answer is obviously that the ICV no longer is bound to a location, and hence has no value binding either. If the ICV maintains its location binding after termination of the iteration, it could be bound either to the value it had during the last iteration, to one increment beyond its value during the last iteration, or to an unspecified value. This last option means that, unlike with Ada, the ICV will have some value, but no guarantees are made as to what that value might be.

5. When are the final and increment expressions evaluated? This becomes an important issue when variables in these two expressions are modified inside the iteration. For example, the following Pascal program fragment would raise this issue:

    for i := 1 to n do n := n + 1;

If the final value n is reevaluated each time the iteration body is executed, this iteration will run forever for n initially positive. It turns out that for Pascal, as with most imperative languages, both the final and the increment expressions are only evaluated once, prior to the initial entry into the iteration. The changing of n in the preceding iteration body would therefore have no effect on the number of times the body is executed, and for positive n the previous fragment is equivalent to the simple statement

    n := 2*n;

6. Is an increment other than successor permitted? Although an increment expression was specified in the general form, some languages do not allow such a specification. Pascal and Ada, for example, permit only an increment of one for a numeric ICV. Nonnumeric ICVs are required to be of types where each element has a defined successor, making the implied effect of an increment setting the ICV to the successor element within the type.

7. How is iteration backward through a range specified? In languages that permit increment specification, a negative increment is ordinarily used to indicate this type of iteration. Pascal, which has no explicit increment, replaces the keyword to with downto as in

    for i := 6 downto 1 do ...

Ada expresses the initial and final expressions as a range in which initial must always be less than or equal to final. For the ICV to proceed backward through this range, the keyword reverse must be appended. For example,

    for I in reverse 1..6 loop ...
8. Is transfer into the iteration body permitted? Because the parameters for a fixed count iteration are evaluated and fixed when it is initially entered, branching to the interior of such an iteration without executing these initial evaluations can be very dangerous. Therefore, many languages, like Ada, disallow such transfers. Pascal and some others allow these transfers to occur, though the results will be highly unpredictable.
9. How is the iteration body delimited? As with other iterations, the two approaches are (1) to allow the body to be a compound statement, or (2) to use keywords to delimit the block of statements forming the body. Pascal, as usual, follows the compound statement philosophy. Ada utilizes the keywords loop and end loop to delimit as before.

The general form of the iteration statement in Ada includes all of the types of iteration that can be specified, including nonterminating, pretest, in-test, and fixed count iterations.
Iteration
Language: Ada

    loop-statement ::=
        [<loop-name>:]
        [while <condition> | for <ICV> in [reverse] <range>] loop
            <body>
        end loop [<loop-name>];

Pretest: while option is used with continuation test
Posttest: no facility is provided for posttest iteration
In-test: statement of the form exit [loop-name] [when <condition>] exits from the named iteration
Fixed count: for option implements fixed count iteration
Types for ICV: integer, character, enumerated
Scope of ICV: iteration body only
ICV modification permitted inside iteration?: no
Value of ICV after iteration: not applicable
When is range evaluated?: once, upon initial entry
Permissible increments: successor and predecessor
Backward iteration: keyword reverse
Transfer into iteration: not permitted
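Several of the questions raised for fixed count iteration can be answered for a particular language by experiment. The sketch below does so for Python, which is used here only as a convenient test bed; it says nothing about how a given Pascal or Ada implementation behaves, and the function names are hypothetical.

```python
def double_by_counting(n):
    """Mirror of the fragment: for i := 1 to n do n := n + 1."""
    passes = 0
    for i in range(1, n + 1):   # the bound n + 1 is evaluated once, on entry
        n = n + 1               # changing n does not change the pass count
        passes += 1
    return n, passes


def backward(first, last):
    """Mirror of: for i := last downto first (a negative increment)."""
    return [i for i in range(last, first - 1, -1)]


def icv_after_loop(final):
    """Return the value the ICV holds once the iteration has finished."""
    i = None
    for i in range(1, final + 1):
        pass
    return i                    # Python keeps the value from the last pass
```

For `double_by_counting(5)` the body runs five times and `n` ends at 10, matching the observation that the fragment is equivalent to doubling `n` when the final expression is evaluated only once.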
Nondeterministic iteration

The nondeterministic conditional can be extended to form a nondeterministic iteration of the following form:

    do
      when <condition 1> -> <statement sequence 1>
      when <condition 2> -> <statement sequence 2>
      ...
      when <condition n> -> <statement sequence n>
    od

Once again, the choice of statement sequence to be executed will be made nondeterministically from among those whose guard conditions are true. The iterative form repeats as long as at least one guard condition is true and terminates whenever none are true. Again, we will see this form implemented for concurrent control in Chapter 7.
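The behavior of the do ... od form can be sketched in Python. This is only an illustration under stated assumptions: guards and statement sequences are modeled as pairs of functions acting on a state tuple, the nondeterministic choice is approximated with a pseudo-random pick, and the names `guarded_do`, `gcd`, and `sort3` are hypothetical.

```python
import random


def guarded_do(state, guarded_commands, rng=random):
    """Repeat: run any one command whose guard is true; stop when none is."""
    while True:
        ready = [action for guard, action in guarded_commands if guard(state)]
        if not ready:                       # no guard is true: terminate
            return state
        state = rng.choice(ready)(state)    # arbitrary choice among ready ones


def gcd(x, y):
    """Greatest common divisor of two positive integers."""
    commands = [
        (lambda s: s[0] > s[1], lambda s: (s[0] - s[1], s[1])),
        (lambda s: s[1] > s[0], lambda s: (s[0], s[1] - s[0])),
    ]
    return guarded_do((x, y), commands)[0]


def sort3(triple):
    """Sort three values; both guards may be true at the same time."""
    commands = [
        (lambda s: s[0] > s[1], lambda s: (s[1], s[0], s[2])),
        (lambda s: s[1] > s[2], lambda s: (s[0], s[2], s[1])),
    ]
    return guarded_do(tuple(triple), commands)
```

In `sort3` more than one guard can hold at once, yet every order of choices ends in the same sorted state, which is the property a well-formed nondeterministic iteration is expected to have.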
4.3 Unconstrained control statements

The most controversial of all control structures are those that are unconstrained, that is, those that permit branching to any program unit without restriction. These are generally known as goto constructs, and their use is of questionable value, as we will discuss shortly. Nevertheless, all popular imperative languages, with the exception of Modula-2, provide a goto statement. In this section we will examine this simple, yet powerful, control construct.
The general format of the goto statement in almost every language in which it is included is

    goto <label>

FORTRAN contains alternate forms of the goto, but these are really just other ways of expressing a multialternative, case-like structure. SNOBOL permits a goto or two to be attached to every statement as a suffix.

The interesting issues and variations with goto constructs arise when the format of labels and the impact of scope are considered.
Statement labels

There is great variation in the way statement labels are formed. BASIC, for example, requires a label to be present for every statement. Other languages make labeling a statement optional. Optional statement labels are frequently separated from the statement by a colon. Ada is an exception, requiring that the label be enclosed between << and >>. This is the case because the colon is used in Ada to label iterations, as described earlier.

In order to better understand the use of labels and different languages' approaches, we return to the data object model that we used in Chapter 3. For our discussion here, we consider the data object to be a statement of the program that is bound at load time to the location in memory where the executable instruction is stored.

The approach most languages use toward statement labels is that illustrated by Figure 4.1. Here, the value is the element of the storage space where the instruction is located. This binding occurs at load time. The name binding occurs at compile time and binds the data object to an element of the label identifier space. This space may or may not be the same as the variable identifier space. In C and Ada, the label and variable identifier spaces are the same. But languages often select label identifiers from the space of integers, as in Pascal and FORTRAN. Another variation is whether the label type binding is made explicitly through a label declaration (Pascal) or implicitly through attachment of the label identifier to a statement (C, Ada, and FORTRAN). When a label is declared, there are implications for the scope of the label, which we will discuss later.

[Figure 4.1: Statement labels as constants]

Figure 4.2 illustrates the use of label variables. The best example of this is found in PL/I, which permits the declaration of an identifier to be a label variable. That variable can be bound to any legal label constant as its value. This permits such interesting activities as passing labels as parameters and forming arrays of labels. The languages SNOBOL and APL extend this idea even further by permitting calculated expressions to have their values assigned to labels.

[Figure 4.2: Statement labels as variables]

There is a price to be paid for this interesting extension, namely, the loss of program readability. We know that goto statements themselves can be detrimental to program readability, but when a single goto statement can branch to virtually any labeled statement, with the choice of statement dependent on some nonlocal action, the readability factor sinks to new lows. For this reason, the implementation of label variables is not included in most modern imperative languages.
Scope

There are several important issues related to the scope of labels. The first is the scope of the name binding to a labeled statement. In general, this follows the scoping rules for variables, that is, the binding holds in the present block and all contained blocks. Redefining a label identifier inside of a nested block, if allowed by a language, could result in the "hole-in-scope" problem as with variables.

To illustrate the above point, consider the following Ada fragment.

    OUTER: begin
      INNER: begin
        <<INSIDE>> ...
        goto OUTSIDE;   -- this is legal
      end INNER;
      <<OUTSIDE>> ...
      goto INSIDE;      -- this is not legal
    end OUTER;
The goto in the INNER block is legal because OUTSIDE is bound from the containing block. The goto in the OUTER block is not permitted because INSIDE is bound only in the context of the INNER block. By the way, note that OUTER and INNER are block labels, while INSIDE and OUTSIDE are statement labels. It is illegal to use block labels in a goto statement.

One further type of scoping block can logically be defined for labels beyond those that apply to variables. This is the block of statements making up the body of a control structure. It is necessary to make these blocks scoping blocks for labels to prevent branching to the interior of a control structure's body without executing the test condition. For example, the following Ada fragment is illegal:

    loop
      <<INSIDE>> ...
    end loop;
    goto INSIDE;   -- illegal to branch into structure body

Similarly, branching to a statement inside the body of a conditional or any other iteration structure from outside that body is strictly prohibited. In defining the scope of labels, the bodies of control structures are thus treated the same as any other scoping block.

Another important observation can be made about the situation where a goto statement branches from inside a block to a statement in a containing block. This is perfectly legal in most languages, but its implementation is not as simple as it might appear. Branching out of a block is actually a termination of that block and requires the removal of that block's activation record from the run-time stack. This may require popping several activation records if the branch is out of several layers of nested blocks.

We are now able to summarize the unconstrained control structure of Ada.
Unconstrained Control Structure
Language: Ada

    goto-statement ::= goto <label-name>;
    labeled-statement ::= {<<label-name>>} <statement>

Declaration of label: implicit by its occurrence
Scope of label: the unit in which it occurs and all contained units, where a unit is a block or a structure body
The goto controversy

The goto statement has been the subject of a major controversy in the field of computer science, initiated by Dijkstra (1968) and rekindled by Rubin (1987). This controversy is actually about programming practices rather than programming languages, but inasmuch as language has an impact on practice, programming languages have become a part of this discussion. We will limit our discussions to the impact that the presence of the goto statement has on the capabilities of a language.

Three facts about control structures are important considerations here.

1. Simple conditionals and goto statements are sufficient to replace any control structures. Each control structure we discussed in this chapter can be replaced by a construct using only simple conditionals and goto statements. For example, the Pascal while construct of the form

    while <condition> do <statement>;

can be replaced by

    10: if <condition> then
        begin
          <statement>;
          goto 10;
        end;

Several exercises at the end of this chapter require you to replace other control structures with these two simple constructs.

2. The two-alternative conditional and pretest loop constructs are sufficient to replace any control structure. This result is far less obvious than the first, but has been proved by Boehm and Jacopini (1966). One consequence of this result is that a language without a goto could duplicate the programs written in a language containing the goto. In other words, the goto is not required for the construction of any program.

3. The goto is the most powerful control structure. This result has meaning only if the word powerful is defined. Kosaraju (1974) has proved that the goto is the most powerful in the sense that replacing a goto with other structures might require additional variables, while replacing other structures with a goto will never require additional variables. In this sense, programs expressed without the goto are more complex than those expressed with it.

What are the implications of these three results for programming languages? First, a programming language without a goto statement can express all the programs that can be expressed by the same language with a goto added. Second, there are situations where programs can be more simply represented by the use of a goto.

On the other hand, in the same way that powerful automobiles or powerful weapons give greater capability but are accompanied by greater danger, there is an increased danger with the use of the goto. This danger is in an increased ability to generate unreadable programs. Many computer scientists believe that the dangers of using the goto far outweigh the advantages inherent in its power.

Programming language designers have reacted to this controversy by continuing to provide a goto statement while also providing a sufficiently rich set of weaker control structures to make the use of the goto unnecessary. This presents the programmer with the final choice of whether to use the goto or not.
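The second and third results can be made concrete with a small sketch. Python has no goto, so the fragment below expresses a hypothetical labeled program (jump to TEST; BODY adds i to a total and increments i; TEST jumps back to BODY while i <= n) using only a pretest loop and conditionals. It is written in the spirit of the Boehm and Jacopini result and is not their construction; the labels and the name `sum_to` are assumptions made for illustration.

```python
def sum_to(n):
    """Sum 1..n, written as labeled steps driven by a program counter."""
    total, i = 0, 1
    pc = "TEST"                   # extra variable: which label runs next
    while pc != "DONE":           # a single pretest loop
        if pc == "BODY":          # BODY: total := total + i; i := i + 1
            total += i
            i += 1
            pc = "TEST"           # control falls through to TEST
        elif pc == "TEST":        # TEST: if i <= n then goto BODY
            pc = "BODY" if i <= n else "DONE"
    return total
```

The variable `pc` does not appear in the labeled version of the program. It is an instance of the additional variable that, in the sense of the third result, may be needed when a goto is replaced by other structures.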
Terms

control structure
conditional
if statement
dangling else
case statement
nondeterministic
guard
guarded command
iteration
nonterminating iteration
termination test
continuation test
pretest iteration
posttest iteration
in-test iteration
fixed count iteration
iteration control variable (ICV)
goto
Discussion questions

1. How would the usefulness of a language be limited if it contained no control structures?

2. Why do Pascal and Ada not permit the use of reals and string types in a case statement?

3. Why do you think Pascal's repeat-until uses an approach to blocking that differs from that of all other control structures in the language? What might be some negative consequences of this?

4. What are the advantages and disadvantages of Pascal's compound statement philosophy of blocking control structures?

5. Give an argument for permitting the modification of the ICV inside a loop body.

6. What are some reasons for requiring that labels be declared? What are some reasons for not doing so?

7. Consider the following Pascal code.

    program GotoQuestion (input, output);
    label 99;

    procedure ReadUntil (match: integer);
    var potential: integer;
    begin
      while true do
      begin
        read(potential);
        if potential = match then goto 99;
      end;
    end;

    begin
      ReadUntil(42);
      99: writeln('Got a 42!!');
    end.

Not only does the goto above "break" a loop, it also "breaks" a procedure by jumping outside of that procedure. Discuss the legality, advisability, and ramifications of using mechanisms such as this.
Exercises

1. Show how a simple conditional and a goto can be used to simulate the actions of
   a. a two-alternative conditional (if-then-else)
   b. a multialternative conditional (case)
   c. a pretest iteration
   d. a posttest iteration
   e. a fixed count iteration

2. For each of the following Pascal structures, solve the dangling else problem by indicating what will be printed when
a.
if x>10 then
b.
if x>10 then
begin x:=x+2; if x process2; when 2 9 => process3; when 6 when others => error; end case; |
4. Ada has no posttest iteration form directly built into the language. Design one that would remain consistent with Ada's other iteration constructs.

5. Design the syntax for a completely general iteration structure that permits either pretest or posttest and continuation or termination logic. Write the BNF for your construct. Does your answer permit more than one test for the same iteration?

6. Give examples of situations where each of the following are natural iteration forms:
   a. Nonterminating
   b. Termination, pretest
   c. Continuation, pretest
   d. Termination, posttest
   e. Continuation, posttest
   f. Termination, in-test
   g. Continuation, in-test

7. If neither an exit nor a goto statement were available in Ada, how might you simulate the action of an exit statement?

8. Give an illustration of the "hole-in-scope" problem for Ada labels.

9. Rewrite the following for loop as a repeat loop.

    for index := 10 downto -5 do something_or_other;
10. In BASIC, the on ... goto statement evaluates an expression in much the same way a case statement does. For example, executing the statement

    25 ON (X) GOTO 100, 200, 300

will cause the computer to jump to lines 100, 200 or 300 if X is either 1, 2, or 3. If X is none of these, no jump is done.
   a. Rewrite the preceding statement as a case statement in Ada.
   b. Rewrite the preceding statement as an if statement in Pascal.
Laboratory exercises

In each case you are to work with a language or languages whose implementation you have available to you, and determine the answer to the following questions by constructing a sample program or programs and observing the results.

1. Does your language permit an unconditional branch into the block of statements executed under control of a conditional?

2. What types are permitted for the control expression of a case statement in your language?

3. How does your language react when an unspecified choice is evaluated for the expression controlling a case statement?

4. Does your language permit unconditional branching into or out of the body of an iteration?

5. What is the value of the ICV after completion of an iteration?

6. When is the final expression evaluated in a fixed count iteration?

7. Does your language permit an ICV to be of a real type?

8. Does your language permit an iteration to have multiple exit tests?

9. Does your language permit modification of the ICV in the body of a fixed count iteration?
CHAPTER 5
DATA AGGREGATES

5.1 Data Aggregate Models
5.2 Arrays
5.3 Strings
5.4 Records
5.5 Files
5.6 Sets

In addition to the fundamental data types introduced in Chapter 3, imperative programming languages have facilities for types that are made up of aggregates of other types called data aggregates. This chapter will examine these language capabilities.

In Section 5.1 we look at the general structural models used to aggregate types into new types. The remaining sections each study a specific aggregate type in light of its relationship to these models. The aggregate types we will study are arrays (Section 5.2), strings (Section 5.3), records (Section 5.4), files (Section 5.5), and sets (Section 5.6).

Each of these aggregate types will be examined from the following points of view:

1. Declaration and binding: In contrast to the data object bindings that we have seen previously, the bindings of interest in this chapter are the bindings of the aggregate type to its constituent types.

2. Manipulation: The fundamental operators on aggregate types are comparison and assignment. In addition, aggregate types need operators known as selectors that convert from aggregate to constituent values, and constructor operators that convert from constituent to aggregate values.

3. Implementation: The implementation of an aggregate type refers to special considerations given to the representation of aggregate structures in storage. Data compression, data organization, and indirect storage using pointers are important options in implementation.

Data aggregate types provided by a programming language are distinct from abstract data types constructed by the programmer from the simple and aggregate types of the language. We address language features related to data abstraction in Chapter 8.
5.1 Data aggregate models

In this section, we introduce five abstract models for representing data aggregation. These models will serve as useful tools in describing specific data aggregates. This classification of models is taken from Hoare (1972).

For the purpose of this section we use the notation that $T_1, T_2, \ldots$ represent types, either simple or aggregate. These types are not necessarily distinct, so that $T_1$ and $T_2$ may represent the same type or different types. When an aggregate type $T$ is defined in terms of other types, these other types are called constituent types. The values of an aggregate type are structures built from values of the constituent types.

We are also interested in the cardinality of a type. Simply stated, the cardinality of type $T$ is the number of possible values of that type. Data types may have either finite cardinality or a denumerably infinite cardinality. The cardinality of an aggregate type is easily computed from the cardinality of its constituent types. We represent the cardinality of a type $T$ by $C(T)$.

The Cartesian product type is constructed from a set of finite types and is defined as follows:

$$T_1 \times T_2 \times \cdots \times T_n = \{(t_1, t_2, \ldots, t_n) \mid t_1 \in T_1,\ t_2 \in T_2,\ \ldots,\ t_n \in T_n\}$$

In words, the Cartesian product is the set of all possible tuples that can be formed by choosing one element from each of the $n$ types participating in the product.
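The definition translates almost directly into code. The sketch below uses Python's standard `itertools.product`; the two small types `SIZE` and `FLAG` and the helper names are hypothetical and chosen only for illustration. It also computes the cardinality of the product as the product of the constituent cardinalities, which is the standard counting rule for tuples.

```python
from itertools import product

# Two small finite types, chosen only for illustration.
SIZE = ("small", "large")
FLAG = (True, False)


def cartesian(*types):
    """All tuples formed by choosing one element from each type."""
    return list(product(*types))


def cardinality(*types):
    """C(T1 x ... x Tn), computed as the product of the C(Ti)."""
    result = 1
    for t in types:
        result *= len(t)
    return result
```

Enumerating `cartesian(SIZE, FLAG)` yields four tuples, which agrees with `cardinality(SIZE, FLAG)`.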
Consider the following example with $n = 3$ and finite types $T_1$, $T_2$, $T_3$.

    $T_1 = \{1, 2, 3, 4\}$
    $T_2 = \{\text{'A'}, \text{'B'}, \text{'C'}\}$
    $T_3 = \{\text{true}, \text{false}\}$

By our definition, the possible
(1, 'A' ,true)
false) (1, 'B' .true) (1, 'B' false) (1, 'C ,true) (1, 'C false) (1, 'A',
,
,
elements of type 1\ x
true) false) 2, 'B', true) 2, 'B', false) 2, 'C, true) 2, 'C, false)
T2 x T3
are
true) false) 'B .true) 'B' false) 'C\ true) false)
2, 'A',
(3, 'A', true)
(4, 'A*,
2, 'A',
(3, 'A'
false) (3, 'B\ true) (3, 'B' false) (3, •