Language, Mathematics, and Linguistics


English Pages [252] Year 1967


Digitized by the Internet Archive in 2019 with funding from Kahle/Austin Foundation

















© Copyright 1962 by Mouton & Co., Publishers, The Hague, The Netherlands. No part of this book may be translated or reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publishers.

Printed in The Netherlands by Mouton & Co., Printers, The Hague.


This essay has two aims.¹ The first, a subsidiary one sought mainly in §1, is to introduce some of my fellow linguists to mathematics. Of course, some linguists know much more mathematics than I do; §1 is not for them. But many know almost none; a few, strangely enough, boast of this ignorance. This is an undesirable state of affairs, for they are thereby seriously hampered in following certain current developments in our field. No linguist will acquire a practical control of mathematics by studying this essay, any more than one can become fluent in a foreign language merely by reading a description of it. But, hopefully, the ice will be broken, and he can proceed to learn more mathematics from the many excellent standard texts.² Learning mathematics is like learning any subject, in that one must acquire a new vocabulary. It is like learning a foreign language rather than, say, history, in that one must also acquire alien grammatical habits. And it is like no other subject in that one must also learn how to invent new grammatical devices as they are needed.

¹ This is a revision and enlargement of my ‘Four Lectures on Theoretical Linguistics’, delivered at the Linguistic Institute, Indiana University, Summer 1964. Except for the Preface, which replaces an earlier Introduction, and for minor alterations of format, the present version is identical with that included in Current Trends in Linguistics, edited by Thomas A. Sebeok, vol. III, Theoretical Foundations (The Hague, Mouton & Co., 1966), pp. 155-304.

² The following bibliography is representative rather than highly selective; it begins with three recent texts intended for accelerated American high school students. Allendoerfer and Oakley 1963; Glicksman and Ruderman 1964; Richardson 1958; Breuer 1958; Lipschutz 1964; Tarski 1950; Courant and Robbins 1941; Wilder 1952; Polya 1954; Kemeny, Snell, and Thompson 1956; Birkhoff and MacLane 1944; Davis 1958; Chevalley 1956. Full information will be found in the References at the end of the essay.




For all this, experience in traditional linguistics should afford about as useful a background as one could hope for. I have tried to help the linguist reader by emphasizing the language-like attributes of mathematics, most of which are not recognized as such by mathematicians because they have no specialist’s knowledge of language. This ignorance on the part of the average mathematician has no bearing on the quality of his mathematics; but ultimately the language-like nature of mathematics is of basic importance, since it is the most critical clue to an understanding of the place of mathematics in the physical universe of which mathematics, mathematicians, language, linguists, and all of us are a part.³

The second aim, pursued in the rest of the essay, now needs more extended comment than it received in the earlier printed version.⁴ It was there set forth as follows:

‘... to explore certain properties of certain grammatical systems. The investigation is conducted at a fairly abstract level, so that the conclusions have nothing in particular to do with one human language rather than another: hence examples, on the rare occasions when they are given, have been drawn as a matter of convenience from languages of which I happen to have some knowledge. However, the argument is not intended to be completely abstract and formal. I assume that even in the most ethereally theoretical linguistics we are still concerned with real languages,⁵ so that the tie between theory and empirical data, though it may become exceedingly tenuous, must not be broken. The point of departure for the investigation, developed at the beginning of §2, is similar to the usual point d’appui of the algebraic grammarians, but the direction the investigation then takes is different and, if I am correct, new.’

The question that must now be raised is whether the ‘tie between theory and empirical data’ has not, indeed, been severed, not only in the bulk of the present essay⁶ but also in all algebraic grammar, in most of Chomsky’s work, in a good deal of Lamb’s, and, indeed, in a sizable proportion of linguistic theory since Bloomfield.

³ For this view, see Bloomfield 1935, 1936, 1939; Hockett 1948b; Hockett and Ascher 1964.
⁴ See footnote 1 above.
⁵ Compare the unbreakable, though almost indefinitely stretchable, tie between theoretical and experimental physics, described beautifully in Born 1962.
⁶ But not in §4 and some parts of §3 (and, of course, not in §1).

Nor do I mean entirely to exonerate Bloomfield: in synchronic matters he followed Saussure quite closely, and thus set a whole generation off on what I now believe was the wrong track.

A language, viewed synchronically, was a ‘rigid system’⁷; individual speakers may in practice violate almost any feature of the system, without modifying the system itself unless ‘whole groups of speakers ..., for some reason unknown to us, coincide in a deviation’.⁸ Bloomfield’s own work was too thoroughly infused with historical good sense for this view of synchronic language design to do much damage, though, to be sure, the descriptive chapters of his textbook⁹ set us off in the direction of a long-unchallenged ‘item and arrangement’ model of grammar,¹⁰ ultimately formalized as ‘phrase-structure grammar’ by the transformationalists.¹¹

After Bloomfield, even this check on spurious formalism was lost, as a whole generation of American descriptive linguists were trained in virtual ignorance of the findings of historical linguistics. It is greatly to the credit of Harris and of Chomsky that they uncovered inadequacies in ‘item and arrangement’ grammar and were thus led to develop a more realistic ‘item and process’ model featuring transformations. The contexts in which they set forth these new (or revived) proposals were unfortunate. Harris’s context was that of ‘game-playing’.¹² Perhaps partly in reaction to this, Chomsky spoke from the beginning as though phrase structures, transformations, and the like are in the language, rather than merely useful descriptive devices.¹³ It is easy to miss this point in reading Chomsky’s essays, since he is quite insistent that a generative grammar is not supposed to be any sort of a picture of how speakers produce (or receive) utterances; thus, merely because phrase structures, transformations, and the like are in the language does not mean that they are in the speaker. To achieve this peculiar state of affairs, he takes the language itself out of its speakers: that is, a language cannot, in his view, be regarded as a set or system of habits of real people.¹⁴ Yet obviously a language has something to do with people and their behavior, and Chomsky puts the language back into its users in a different way, in the form of the user’s competence, or knowledge of his language, which is somehow to be distinguished from his actual performance and cannot even be identified with his regularities (habits) of actual performance.¹⁵

The main trouble, however, is not merely an innovating terminology; doubtless we could say about language what needs to be said in Chomsky’s terminology about as well as in Bloomfield’s or Paul’s. Once the vocabulary has been mastered, Chomsky’s system of views has a persuasive coherence and almost has to be accepted or rejected as a whole. It takes quite a bit of exegesis to discern that the whole structure rests, for better or for worse, on one unchallenged key assumption.¹⁶

⁷ Bloomfield 1924, 1927.
⁸ Bloomfield 1927.
⁹ Bloomfield 1933.
¹⁰ Hockett 1954. Peculiarly, this article, which introduced the term ‘item and arrangement’ and the contrasting ‘item and process’, has often been interpreted as a defense of the former, whereas in fact it was intended as a challenge to the former and a proposal that we investigate the latter more thoroughly.
¹¹ Postal 1964a is the most explicit discussion of this.
¹² This is clear from the tone of Harris’s many articles in Language during the 1940’s and early 1950’s, and of Harris 1951. For criticism at the time, see Hockett 1948c, 1952.
¹³ Chomsky 1957, and repeatedly since.
¹⁴ Chomsky 1964, fn. 3, p. 10.
¹⁵ Chomsky 1965, ch. 1.
¹⁶ For me, this fact emerged clearly only from a careful reading of Chomsky 1965, ch. 1. This chapter is a reductio ad incredibile of the mistakes we have been making in linguistics for the last thirty or forty years; my study of it, after the present essay was completed, was responsible for the radical change of view reported in this Preface.

Bloomfield, following Saussure, called a language a ‘rigid’ system, but this use of the term ‘rigid’ was metaphorical. In the post-Bloomfieldian descriptive work of the 1940’s, we sought to match the rigidity of languages by the rigor of our methods, without any very clear notion of what rigor was. It was in this atmosphere that Chomsky received his training. He was the first (as far as I know) to refine the notion of rigidity-rigor by bringing to bear on this the



relevant segment of modern mathematical logic: to wit, the theory of computability and unsolvability.¹⁷ In this exact logical frame of reference, the obvious translation of the imprecise ‘rigid’ is the term well-defined. We cannot take the space here to expound this term (the reader can consult the footnote reference), but will merely give a few examples. All or almost all mathematical systems (see §1.8 below) are well-defined. At the opposite extreme, no physical system is well-defined. Of human institutions, there are some of each: thus, chess is well-defined; baseball may be; American football is definitely not. It seems that indisputable instances of well-defined systems are all products of the human intellect.

Chomsky’s views are based on the assumption that a language, at any given moment, is a well-defined system. All known versions of algebraic grammar rest on this same axiom (§2.0 below, where, I am happy to say, even a year ago I expressed some doubts). It is surely not Chomsky’s fault that he should never have challenged this assumption, in view of its honorable pedigree. But it must be challenged. For, in fact, there seems to be not one shred of empirical evidence that supports the view. In particular, the assumption that a language is well-defined proves to be incompatible with all that we know of linguistic change. It is therefore not surprising that some of Chomsky’s followers, whose training in traditional historical linguistics was deficient or wholly missing, are now inventing historical linguistics all over again, and, of course, are repeating all the same old mistakes that were overcome after decades of toil by our predecessors of a century ago.¹⁸

¹⁷ Davis 1958. A system is well-defined (this particular term does not appear in Davis) if it can be completely characterized by deterministic functions; a deterministic function is either a computable function or one specified with sufficient explicitness that its noncomputability can be proved.
¹⁸ Closs 1965 represents this new development; see especially her first paragraph, which refers to other examples. See also the diachronic remarks in Halle 1962; and Postal (forthcoming).

All this requires much more extended exposition than is possible here; put thus briefly, my remarks will probably seem gratuitous. It is in no sense my purpose to be cryptically derogatory towards



some of my colleagues. But I must be frank about a radical shift in point of view between the time this essay was written (the original manuscript was transmitted in February 1965) and the present. For, if a language is in fact not a well-defined system, then what is the point of this or any other elaboration of algebraic grammar?

It is easy to say that, even if the basic assumption of algebraic grammar is false, it may nevertheless afford us a useful approximation. Since this is easy to say, I hereby say it, thus offering some justification for the republication of this essay.

But there is a much more important point. I now believe that any approximation we can achieve on the assumption that a language is well-defined is obtained by leaving out of account just those properties of real languages that are most important. For, at bottom, the productivity and power of language (our casual ability to say new things) would seem to stem exactly from the fact that languages are not well-defined, but merely characterized by certain degrees and kinds of stability. This view allows us to understand how language works, how language changes, and how humans, using language, have created the well-defined systems of mathematics, for well-definition is born of stability through certain tricks of which only a speaking animal seems to be capable.

Current algebraic grammar is good fun. But preoccupation with it should not blind us to the possibility of discovering some mathematicization of language design that would exploit, instead of ignoring, this basic empirical fact.

From here on, for the sake of brevity, the bare term ‘language’ will be used only of real human ‘natural’ languages, spoken or written, never in transferred or metaphorical senses.

Cornell University
Ithaca, New York, U.S.A.
January 1966










1. Mathematical Background.
Some Cautions to Linguists.
1.1. Ordered and Unordered Sets.
1.2. Elements and Sets.
1.3. Abstraction, Notation, Abbreviation.
1.4. Variables and Domains.
1.5. Relations among Sets.
1.6. Associations, Functions, Correspondences.
1.8. Systems.
1.9. Properties of Binary Relations and Operations.
1.10. Isomorphism.
1.11. Recursive and Recursively Enumerable Sets.
1.12. Model and Exemplification.

2. Linear Generative Grammars.
2.0. Semigroups, Monoids, Harps, and Grammars.
2.1. Linear Generative Grammars.
2.2. Kinds of Rules.
2.3. Source and Transducer; Effective Rule Chains.
Generation and Duality.

The First Inadequacy of Linear Grammars.
Some Empirical Considerations.
The Three Formats for Problem Two.
The Rewrite Format.
Rewrite Rules for Potawatomi Morphophonemics.
The Rewrite Format: Discussion.
The Realizational Format.
Realizational Rules for Potawatomi Morphophonemics.
3.9. Stepmatricial Grammars and the Stepmatricial Format.
3.10. Comparison and Summary.

4. From Phonons to the Speech Signal.
Non-Probabilistic Approximation.
Introducing Probability.
4.3. Paralinguistic and Idiosyncratic Effects.
4.4. Distinctive Features; Sound Change.
Phonons and Distinctive Features.

5. Binary Tree Grammars.
The Second Inadequacy of Linear Grammars.
The Ordered Pair and Unordered Pair Procedures.
Binary Tree Grammars.
Linearizing Input to a Tree Grammar.
The Time Bomb Method.

6. Conversion Grammars.
Nonlinear Inputs.
Finite Networks.
Conversion Grammars.
Generalized Rewrite Rules.
6.4. An Application of Generalized Rewrite Rules.
6.5. The Stratificational Model.
Architectonic Comparison of the Models.
Semons and Semon Networks.
From Semons to Lexons.

7. Ensembles of Grammars.
Introductory; The Problems of Scope and Relationship.
7.3. Categories of Conversion Grammars.
7.4. Grammars from Silence to Silence.
Other Applications.






Some Cautions to Linguists. Mathematics is derived from everyday language by the introduction of various special conventions that permit a precision of statement and of inference not otherwise attainable. Although mathematics can become extremely complex and difficult, there is no mystery in it, save for such mystery as it may inherit from everyday language or from life itself.

Several very elementary points should be remembered by any linguist who is seriously undertaking to learn about mathematics.

The first is that mathematicians do not care whether the symbols and equations they write can be distinguished in pronunciation. Although they communicate orally, they sometimes get into trouble unless paper and pencil, or blackboard and chalk, are handy; lacking these, they have been known to deface the luncheon tablecloth. Two different symbols that have the same name (or the same pronunciation) in spoken English, say ‘K’ and ‘𝒦’, may be quite freely used in totally unrelated senses. I do not know why this feature of mathematical practice should be disturbing to linguists, who in general can make visual distinctions as well as anybody, but in fact it sometimes is. Perhaps it is due to the linguist’s traditional preoccupation with spoken language, which may lead him to feel that there is something not quite cricket about purely graphic distinctions. Cricket or not, this is the way mathematicians behave, and we have to accommodate.

A second point is that the denotations of many of the mathematician’s symbols change quite kaleidoscopically. When a symbol is no longer needed in one sense, it may be used in another. In elegant mathematical exposition (there is bad writing here as everywhere) due warning is always given. This variability is easiest to take if we remember the similar behavior of certain everyday words, such as this or it. The mathematician needs many terms of the this and it variety, so he makes them up as he goes along, drawing on various alphabets and type fonts for symbols and modifying them as he finds convenient by diacritics, subscripts, and the like.

The mathematical vocabulary of ordinary words is rather more stable, but here we encounter another point worth mentioning. Some technical terms are Latin- or Greek-based neologisms. Many, however, are picked up off the street, and the everyday senses of the words thus introduced into mathematics are at best only mnemonically helpful, at worst quite misleading. Thus, set has nothing to do with bridge, concrete, or hair, field nothing to do with agriculture, nor group with sociology, ring with matrimony, ideal with ethics, imaginary with imagination, lattice with architecture, tree with forestry. If and only if has a precise meaning, which is the same as that of just if or just in case. One might expect almost to be vague; but almost everywhere is in fact defined with absolute precision.

Three words that are not technical terms turn up constantly in the talk of mathematicians: obvious, elegant, and trivial. Something is obvious if anyone (that is, anyone with proper training!) would agree that it is so. A proof or argument is elegant if what it demonstrates was not obvious before the demonstration but is thereafter. Something is trivial (or in some current usage uninteresting) if it is already obvious to the person who calls it so at the time he pronounces the judgment. Obviously, the triviality, obviousness, or elegance of something in mathematics has little to do with its validity or utility. The amateur or novice might as well face the fact that almost any bit of mathematics with which he concerns himself is going to be trivial to most professional mathematicians. One learns to shrug off such adverse judgments and go about one’s business. In this, one has support from Einstein, who said ‘If you are out to describe the truth, leave elegance to the tailor.’



1.1. Ordered and Unordered Sets. A year after their marriage, Mr. and Mrs. Jones had a son Paul; a year later, a son John; a year still later, Luke. If we pay attention to their ages, the Jones boys constitute an ordered set (or sequence) (Paul, John, Luke). Paul suffered a lengthy childhood illness, so that he graduated from high school after his brothers. This defines a different ordered set, denoted by ‘(John, Luke, Paul)’. The two ordered sets are different even though their elements (namely, the Jones boys) are the same.

The notion of order, just illustrated, is for mathematics an empirical primitive, not to be defined in terms of something simpler (though many have tried to do so), but to be accepted as familiar to all of us, perhaps to all organisms, because of the nature of life, time, and the physical universe.

But for many purposes order is irrelevant. The notion of an unordered set, which is usually just called a set, is slightly less obvious. We represent the (unordered) set of Jones boys by ‘{John, Paul, Luke}’, or by ‘{Paul, John, Luke}’, or in any of several other ways that can easily be figured out. In either speaking or writing, the names of the elements have to be presented in one order or another, but, in writing, by enclosing the list of names in braces we indicate that the order is non-distinctive and is to be ignored. Thus, {John, Luke, Paul} = {Paul, John, Luke}, because the order is irrelevant; but (John, Luke, Paul) ≠ (Paul, John, Luke) because, as indicated by the curved parentheses, the order of naming is to be considered distinctive.
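The contrast between the two kinds of set can be tried out on a computer. The following is a minimal Python sketch, not part of the original text: it assumes that a tuple is an acceptable stand-in for an ordered set and a frozenset for an unordered one, and the variable names are illustrative only.

```python
# Ordered sets (sequences): tuples compare element by element, so order matters.
by_age = ("Paul", "John", "Luke")
by_graduation = ("John", "Luke", "Paul")

# Unordered sets: frozensets ignore the order in which the names are listed.
boys_one_listing = frozenset(["John", "Paul", "Luke"])
boys_other_listing = frozenset(["Paul", "John", "Luke"])

sequences_differ = by_age != by_graduation
sets_agree = boys_one_listing == boys_other_listing
same_elements = frozenset(by_age) == frozenset(by_graduation)

print(sequences_differ, sets_agree, same_elements)
```

All three values come out true: the two sequences differ, the two listings name one and the same set, and the two sequences have the same elements.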

1.2. Elements and Sets. The notation ‘a ∈ A’ means that a is an element (which can be anything whatsoever), that A is an (unordered) set (or class, collection, aggregate, ensemble), and that the element belongs to (or is a member of, or is contained in, or is in) the set. (Of course, we could replace ‘a’ and ‘A’ by any pair of distinguishable symbols and still mean the same thing.) If a is Leonard Bloomfield and A is American linguists, then it is true that a ∈ A. The notation

[Diagram D5 and matrix M5: an injective but not surjective function from the a's into B = {b_1, ..., b_5}.]

We may think of a monogamous community in which all the adult males (the a’s) are married, but not necessarily all the adult females (the b’s).

A surjective function (but not just any surjective association) is also called a surjection; and an injective function is also called an injection. A function which is both a surjection and an injection is a bijective function or a bijection. Since D4 is surjective but not injective, it is not a bijection; since D5 is injective but not surjective, it is also not a bijection. Here is a bijection:

[Diagram D6 and matrix M6: a bijection from A = {a_1, a_2, a_3, a_4} to B = {b_1, b_2, b_3, b_4}.]

Think of a monogamous society at a moment when all adults are married. We see that: (1) in the diagram, each element of the range is the target of exactly one arrow and each element of the domain the source of exactly one; (2) in the matrix, each row and each column must contain exactly one 1; and (3) the domain and the range must have exactly the same number of elements.

As to the inverses of various sorts of associations, we note the following:

(1) The inverse of an association is not necessarily an association;
(2) The inverse of a surjective association is a surjective association;
(3) The inverse of a surjection (a surjective function) is a surjective association but not necessarily a function;
(4) The inverse of a bijection is a bijection.

Since a bijection f: A → B defines, as its inverse, a unique bijection f⁻¹: B → A, it often does not matter whether we think of passing from A to B or from B to A. Whenever it does not matter, the bijection (or its inverse) is called a one-to-one correspondence between the two sets.⁷
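The properties discussed in this section can be checked mechanically for small finite cases. The sketch below is an illustration in Python, not anything given in the original: it assumes a function is stored as a dictionary from domain elements to their images, and the two sample mappings, like all the function names, are hypothetical.

```python
def is_injective(f):
    """True if no two domain elements share the same image."""
    images = list(f.values())
    return len(images) == len(set(images))


def is_surjective(f, target):
    """True if every element of the target set is the image of something."""
    return set(f.values()) == set(target)


def is_bijective(f, target):
    """A bijection is both an injection and a surjection."""
    return is_injective(f) and is_surjective(f, target)


def inverse(f):
    """Inverse association: each image is mapped to the set of its sources."""
    inv = {}
    for source, image in f.items():
        inv.setdefault(image, set()).add(source)
    return inv


def is_function(association):
    """True if every element has exactly one associate."""
    return all(len(associates) == 1 for associates in association.values())


B = {"b1", "b2", "b3", "b4"}

# Hypothetical surjection that is not an injection (a2 and a3 share b2).
surjection = {"a1": "b1", "a2": "b2", "a3": "b2", "a4": "b4", "a5": "b3"}

# Hypothetical bijection between two four-element sets.
bijection = {"a1": "b2", "a2": "b1", "a3": "b4", "a4": "b3"}

print(is_surjective(surjection, B), is_injective(surjection))  # True False
print(is_function(inverse(surjection)))                        # False
print(is_bijective(bijection, B))                              # True
print(is_function(inverse(bijection)))                         # True
```

The second and fourth lines of output illustrate points (3) and (4): the inverse of the sample surjection gives b2 two associates and so is not a function, while the inverse of the sample bijection is again single-valued.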


⁷ Except for ‘association’, the terminology of this section is that which has now become standard: see Chevalley 1956, Hu 1964, ch. 1. Our sole use of ‘function’ corresponds to the older ‘single-valued function’.

1.7. If a one-to-one correspondence (§1.6) can be established between two sets A and B, then A and B have the same cardinality. For many sets, called finite sets, the cardinality is simply the number of elements. To say that the cardinality of a set A is m, where m is some nonnegative integer, means that a one-to-one correspondence can be established between A and the set {1, 2, ..., m}. This is exactly what we mean when we say that there are m sheep in a flock or that we have m dollars in our bank account. Clearly, the cardinality of the null class is 0. A set whose cardinality is 1 is a unit class.

Now suppose that A is a proper subset of B (§1.5), and yet that A and B have the same cardinality. For example, let A be all positive even integers and let B be all positive integers. It is clear that A is a proper subset of B, since A is not empty, B is not empty, and every element of A is also an element of B, while there are elements of B that are not in A, namely, the odd positive integers. The required one-to-one correspondence between A and B associates, with each positive even integer k, the positive integer k/2; or, working from B to A, associates with each positive integer m the positive even integer 2m. We may display the beginning of this correspondence as follows:

A = {2, 4, 6, 8, ...}
     ↕  ↕  ↕  ↕
B = {1, 2, 3, 4, ...}

(We put heads at both ends of the arrows because it does not matter in which direction one goes.) We say, when a pair of sets A and B meet the conditions just described, that both are infinite sets. This is the formal definition of ‘infinite’. It obviously agrees, however, with the informal remark in §1.3 about infinity and terminal etcetera symbols: in our display, just above, of the correspondence between A and B we use terminal etcetera symbols for both sets. A set that is not infinite is finite. This is the formal definition of ‘finite’, but it agrees with the comment on finiteness made two paragraphs above. An infinite set that can be put into one-to-one correspondence with the set of all positive integers is denumerable. Thus, the set of all positive integers is itself denumerably infinite; so, as we have just seen, is the set of all positive even integers.
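The correspondence between the positive even integers and the positive integers can be sampled, though of course never listed in full. The Python sketch below is an illustration only; it assumes the pairing is doubling in one direction and halving in the other, and the function names are invented for the purpose.

```python
from itertools import count, islice


def to_positive(k):
    """From A to B: the positive even integer k is paired with k/2."""
    return k // 2


def to_even(m):
    """From B to A: the positive integer m is paired with 2m."""
    return 2 * m


# Only a finite beginning of either set can ever be written down.
first_evens = list(islice((2 * m for m in count(1)), 8))
paired = [(k, to_positive(k)) for k in first_evens]

# Going there and back returns the starting element, in both directions.
round_trip_a = all(to_even(to_positive(k)) == k for k in first_evens)
round_trip_b = all(to_positive(to_even(m)) == m for m in range(1, 9))

print(paired)
print(round_trip_a, round_trip_b)
```

The generator built on `count(1)` never terminates by itself, which is the programming counterpart of a terminal etcetera symbol; `islice` cuts off a finite beginning for inspection.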

The notation ‘{x_1, x_2, ...}’ (without any terminal name ‘x_m’) or ‘{x_i}’ (without any subscript after the closing brace) denotes a denumerably infinite set, since it tells us that if i is any positive integer, ‘x_i’ is a legal name for some element of the set. We can thus establish a one-to-one correspondence between the set and the set of all positive integers by associating x_i with i for all i. An infinite set that is not denumerable is nondenumerable.


1.8. Systems. A mathematical (or formal) system involves one or more sets of various kinds of elements (to which special names may be given for mnemonic convenience), together with one or more relations or operations that tie the sets and elements together. Certain assertions will hold for any system of a certain type: some of these assertions are given as postulates, which define the type of system, while others follow from the postulates as theorems; but there is typically some choice as to just which assertions are selected as postulates and which are left to be theorems.

Here is a very simple type of mathematical system. A system S(K, ≤) is defined in terms of one set, K, and one binary relation, ≤.

(1) Let x and y each be any element of K. Then it is to be always the case that either x ≤ y or y ≤ x, and both of these are to be the case if and only if x = y (that is, if ‘x’ and ‘y’ are names for the same element).
(2) Now let x, y, and z be any elements of K. If x ≤ y and y ≤ z, then it must also be the case that x ≤ z.

Any system that meets the specifications just given is a simply (or linearly) ordered set. There are an endless number of systems of this type. For example, let K be the set of all positive integers, and let ≤ mean ‘is less than or equal to’. The Jones boys of §1.1 form a system of this type if we interpret ≤ as ‘is not younger than’ (or, indeed, as ‘is not older than’ or as ‘graduated from high school not later than’ or in various other ways; but, of course, only one way at a time).

In a general way, we can symbolize a binary relation by ‘xRy’: this means that x and y are elements of some set or sets with which we are concerned, and that the relation R holds between them. Vaguely, ‘xRy’ is like a (declarative) sentence, in which R is like a finite transitive verb or verb phrase. Thus, ‘xRy’ asserts something about x and y. If the relation does not hold (if the assertion is false) we can write ‘~(xRy)’. For ordinary numbers, as we have just seen, ≤ is a binary relation: 2 ≤ 3, 2 ≤ 2, but ~(3 ≤ 2).
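For a finite set, the two postulates that define a simply ordered set can be tested by brute force. What follows is a rough Python sketch under stated assumptions, not a procedure from the text: the relation is passed in as a two-argument function, the name `is_simply_ordered` is invented, and the boys' ages are made up so that ‘is not younger than’ can be computed.

```python
from itertools import product


def is_simply_ordered(K, leq):
    """Check postulates (1) and (2) for the system S(K, leq) on a finite K."""
    elements = list(K)
    for x, y in product(elements, repeat=2):
        forward, backward = leq(x, y), leq(y, x)
        # Postulate (1): at least one direction holds ...
        if not (forward or backward):
            return False
        # ... and both hold exactly when x and y are the same element.
        if (forward and backward) != (x == y):
            return False
    for x, y, z in product(elements, repeat=3):
        # Postulate (2): x <= y and y <= z must give x <= z.
        if leq(x, y) and leq(y, z) and not leq(x, z):
            return False
    return True


# A finite sample of the positive integers under 'is less than or equal to'.
integers_ok = is_simply_ordered(range(1, 8), lambda x, y: x <= y)

# The Jones boys under 'is not younger than'; the ages are hypothetical.
age = {"Paul": 12, "John": 11, "Luke": 10}
boys_ok = is_simply_ordered(age, lambda x, y: age[x] >= age[y])

# Divisibility is not a simple ordering: 2 and 3 are not comparable.
divides_ok = is_simply_ordered([1, 2, 3, 6], lambda x, y: y % x == 0)

print(integers_ok, boys_ok, divides_ok)  # True True False
```

The third case is included as a contrast: divisibility satisfies postulate (2) but fails postulate (1), so it does not yield a system of this type.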

The symbol ‘e’ of §1.2 stands for a binary relation; in this particular case we adopted the convention of writing ‘xe/ instead of ‘ ~ (x e y)\ though it would not really matter. The symbols ‘ £ ’, ‘ 3 ‘c’, and ‘z>’ of §1.5 represent binary relations: if A and B are sets, then either A £ B or ~ (A £ B), and so on. An equally general notation for a binary operation is ‘xOj = z\ this means that if the ordered pair of elements (x, y) is subjected to the operation O, the result is the element z. If a relation can be vaguely compared to a finite verb, then an operation is rather like a preposition, prepositional phrase, or gerund: ‘x Oy is not a sentence, but a subject requiring a verb ('=’) and a predicate complement

(V) to complete it. In everyday arithmetic, addition and multiplica¬ tion are operations (basically binary, though by extension «-ary for n ^ 2 because of a property we shall point out later): if x and y are



numbers, then x+y and xXy are also numbers. The symbols ‘n’ and ‘u’ of §1.5 represent binary operations. That is, if A and B are any two sets, then A n B is also a set (possibly A), as is A u B. Neither a relation nor an operation need be binary. A relation can be w-ary for any value of n greater than 1; an operation can be n-ary for any value of n greater than 0. But if fewer or more than two elements are involved, notation of the form ‘xRy and ‘xOy = z is obviously impossible. Instead, we use functional notation, which, it will be noticed, has already been slyly slipped in: the general n-ary relation can be symbolized as R(xx, x2, ..., xn)

(n ^ 2)

and the general n-ary operation by 0(*i, *2, ..., xn) = y

(n ^ 1).

For example, let R be the ternary relation of betweenness: then i?(window, chair, door) just in case the chair is between the window and the door. Let

O be the singulary operation (on numbers) of

multiplying by — 1: then, for any number x, 0(*) = —x, and x + O(x) = 0. Or let

O be the ternary operation on numbers defined

as follows: 0(*, y, z) — xxyz. Then, for example, 0(1, 2, 3) — 8; 0(2, 1, 3) = 2; 0(4, 3, 2) = 36. Operations, relations, and sets are close kin. To show this, let us first note that an n-ary operation can alternatively be regarded as an (x+l)-ary relation. The general symbolization for a binary operation presented above involves three variable names for elements: ‘x’, ‘y\ and ‘z’. Now suppose we have a particular binary operation O- We can define a ternary relation R0, by saying that this relation holds for a particular ordered triple of elements (x, y, z) just in case xOT = z. Suppose the operation is ordinary arithmetical addition. Then, since 2 + 5 = 7, we assert that R+(2, 5, 7); simi¬ larly, R+(5, 2, 7), R+(3, 81, 84); but ~i?+(2, 5, 8), since 2 + 5^8. In a sense, all we have here is a change of notation; but that is just the point. Whether we speak of an ‘operation’ or of a ‘relation’ depends on notation and on attitude, rather than on the abstract mathematical nature of what we are dealing with. In general, given



an n-ary operation O(x1, x2, ..., xn) = y, we can define an equivalent (n+1)-ary relation R_O(x1, x2, ..., xn, y).

Next, we note that an n-ary relation can be viewed as a set whose members are ordered n-ads of elements rather than single elements. A binary relation, in this view, is a set of ordered pairs of elements. Let R be the relation ‘less than or equal to’ for numbers. This relation holds for the ordered pairs (1,3), (2,3), (2,2), (1,2), (99,3000), and so on, but not, say, for (2,1). Or, since R is a class, we can say the same thing with class-membership notation: (1,3) ∈ R, (2,3) ∈ R, ..., (2,1) ∉ R. We can say that a particular ordered n-ad belongs to or is a member of a particular relation (or operation) just as we say that an element belongs to a set.

A function (or, indeed, any association) can always be reinterpreted as an operation and hence, indirectly, as a set. Suppose we consider the function of D4 (or M4) in §1.6. We have f(a1) = b1, f(a2) = b2, f(a3) = b2, f(a4) = b4, and f(a5) = b3. Not only is f a function; it is also, with no change of notation, a singulary operation. Hence, by the procedure described just above, we can reinterpret it as a set of ordered pairs: (a1, b1) ∈ f, (a2, b2) ∈ f, (a3, b2) ∈ f, (a4, b4) ∈ f, and (a5, b3) ∈ f.

By an argument that is approximately the reverse of the first part of this, one can show that an n-ary operation can alternatively be viewed as an association on n variables which may, under certain circumstances, be a function of n variables. Thus, clearly, instead of writing ‘3 + 2 = 5’ and the like, we could use functional notation and write ‘+(3, 2) = 5’.

It would now seem that our definition of ‘mathematical system’, given at the beginning of §1.8, is more complicated than necessary: instead of referring to one or more classes together with one or more relations or operations, we need only refer to one or more classes.
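As a concrete illustration of these reinterpretations, the following Python sketch builds the ternary relation R+ from addition and treats ‘less than or equal to’ and the function f as sets of ordered pairs. The names `operation_as_relation`, `R_plus`, `LE`, and `f_as_pairs`, and the restriction to small finite ranges of integers, are assumptions made only for the sake of the example.

```python
from itertools import product

def operation_as_relation(op, domain):
    """Turn a binary operation O into the ternary relation {(x, y, z) : x O y = z}."""
    return {(x, y, op(x, y)) for x, y in product(domain, repeat=2)}

# Addition restricted to a finite domain (an assumption; the text's domain is infinite).
R_plus = operation_as_relation(lambda x, y: x + y, range(100))

print((2, 5, 7) in R_plus)    # R+(2, 5, 7) holds
print((3, 81, 84) in R_plus)  # R+(3, 81, 84) holds
print((2, 5, 8) in R_plus)    # ~R+(2, 5, 8), since 2 + 5 != 8

# A binary relation viewed as a set of ordered pairs: 'less than or equal to'.
LE = {(x, y) for x, y in product(range(1, 6), repeat=2) if x <= y}
print((1, 3) in LE, (2, 1) in LE)

# A function on a finite domain is likewise a set of ordered pairs.
f = {"a1": "b1", "a2": "b2", "a3": "b2", "a4": "b4", "a5": "b3"}
f_as_pairs = set(f.items())
print(("a3", "b2") in f_as_pairs)
```

In each case the only primitive the program needs is the membership test `in`, which is the computational counterpart of the relation ∈.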

But this is not quite true. There is one relation that resists the reduction. The relation in question is that denoted by ‘∈’: the relation that holds between an element and a class to which the element belongs. Surely, we could rewrite ‘a ∈ A’ in functional notation as ‘∈(a, A)’. Either notation means the same thing. But now try to take the next step. Note the parallelism:




x ≤ y    same as    ≤(x, y)    same as    (x, y) ∈ ≤
a ∈ A    same as    ∈(a, A)    same as    (a, A) ∈ ∈.

In trying to eliminate the relation ∈ we find that we must use that very relation. Hence the elimination is impossible. The most we could say, then, is that any mathematical system is definable in terms of one or more sets of elements, the relation ∈, and (for reasons spelled out in §1.1) the notion of order. This is logically so; in practice, however, it is much more convenient, and more stimulating to the mathematical imagination, to use relations, operations, functions, and so on, reducing them to appropriate classes of ordered n-ads only under special circumstances.

I wish now to add a point on which perhaps very few mathematicians would agree; obviously, therefore, it should not be taken too seriously. To me, a mathematical system in which the primary emphasis is on relations feels like a geometry, whereas when the major emphasis is on operations it is, instead, an algebra. Formally, this difference clearly amounts to no difference at all. But there is such a thing as the ‘psychology’ of mathematics (though I am not sure exactly what it is), and unless the difference between geometry and algebra resides here it ceases to have any reality at all. And mathematicians persistently continue to use both of these words, in ways that seem to fit my impressionistic definition.8

1.9. Properties of Binary Relations and Operations. Relations and operations can be classed in terms of properties of a quite abstract sort. A binary relation R is reflexive if, for any element x, xRx. The relation is symmetric if, for any x and y such that xRy, then also yRx. It is transitive if, for any x, y, and z, xRy and yRz imply xRz. A relation that has all three of the properties just defined is an equivalence relation.

8A more elegant approach, which I believe is approximately equivalent, is to say that geometry (as over against algebra) deals with spaces, and to define a space as a set furnished (at least) with a topology; see Hu 1964, p. 16. This view is the most recent descendant of the brilliant suggestion of Felix Klein, Erlanger Programm (1872).

Let A be a set over which an equivalence relation ≡ is defined. Then A consists of pairwise disjunct (§1.5) subclasses {Bi}, such that x and y belong to the same subclass Bi if and only if x ≡ y. The subclasses {Bi} are called equivalence classes. For example, let A be the set of all ordered pairs (m, n) of positive integers, and let (m1, n1) ≡ (m2, n2) just if m1 + n1 = m2 + n2. Then one of the equivalence classes, say B6, contains the ordered pairs (1, 5), (2, 4), (3, 3), (4, 2), and (5, 1), and no others; nor does any of these belong to any other equivalence class. This is so because 1 + 5 = 2 + 4 = 3 + 3 = 4 + 2 = 5 + 1 = 6, and there are no other ordered pairs of positive integers whose sum is 6. Or let P be the set of all people in a certain village, and let xRy just if x and y are children of exactly the same two parents.


An equivalence class is then a set of full siblings. No equivalence class is empty, but it can be a unit class if a particular individual has no full brothers or sisters.

A binary relation R is irreflexive if there is no x such that xRx. The relation ‘is brother of’ is irreflexive. The relation is nonreflexive if it is neither reflexive nor irreflexive. The relation ‘is best friend of’ is nonreflexive if we think that some people are their own best friends and some are not.

A relation R is unsymmetric if, for any pair of elements x and y for which xRy, it is necessarily the case that ~(yRx). ‘Is father of’ is unsymmetric. A relation R is antisymmetric if xRy and yRx can both be true just when x = y. The relation ‘is less than or equal to’, for numbers, is antisymmetric. A relation which is not symmetric, not unsymmetric, and not antisymmetric is nonsymmetric. ‘Is best friend of’ seems to be nonsymmetric as well as nonreflexive.

A relation R is intransitive if, for any three elements x, y, and z such that xRy and yRz, it is necessarily the case that ~(xRz). The relation ‘is father of’ is intransitive. A relation that is neither transitive nor intransitive is nontransitive. Once again, ‘is best friend of’ seems to be an example.

An irreflexive, unsymmetric, and transitive relation is a proper inequality relation. Such a relation holds among the members of any simply ordered set (§1.8), although it is not the relation used in the definition of that class of mathematical systems. Let S(K, ≤)



be any simply ordered set, and define x < y to mean that x ≤ y but that x ≠ y (or, what amounts to the same thing, define x < y to mean that it is false that y ≤ x). Then < is a proper inequality relation, which either holds or does not hold for any pair of elements x, y ∈ K.9 If we did not already know what ordering is, we could use a proper inequality relation to define it; except that we have to know what ordering is before we can define relations in the first place.

A reflexive, antisymmetric, and transitive relation is an improper inequality relation. For numbers, ‘less than’ is a proper inequality relation, while ‘less than or equal to’ is an improper one. It is a relation of this sort that we used in the definition of a simply ordered set (§1.8).

The relation ∈, it is interesting to note, is the exact ‘opposite’ of an equivalence relation. Whereas an equivalence relation is reflexive, ∈ is by definition irreflexive: for any x whatsoever, x ∉ x. Whereas an equivalence relation is symmetric, ∈ is unsymmetric: for any x and y whatsoever, if x ∈ y then y ∉ x. Whereas an equivalence relation is transitive, ∈ is intransitive: for any x, y, and z whatsoever, if x ∈ y and y ∈ z then x ∉ z.

Since a binary relation can be of any of three sorts relative to reflexivity, any of four relative to symmetry, and any of three relative to transitivity, it would seem that there should be 3 × 4 × 3 = 36 abstractly different kinds of relations. It is interesting to list all 36 combinations and to look for, or to invent, examples.


Many of them turn out to be rather dull mathematically: for example, a nonreflexive, nonsymmetric, and nontransitive relation such as ‘is best friend of’ is dull because it affords one no toehold for drawing any conclusions from any given facts. A few combinations are not possible. For example, if a relation R is symmetric and transitive, it cannot be irreflexive or nonreflexive, but must be reflexive, provided that every element x stands in the relation to at least one element y. For, since the relation is symmetric, from xRy we can infer yRx; but, since the relation is also transitive, from xRy and yRx we can infer xRx, so that it is also reflexive.

Now let us turn to binary operations.

9This notation, ‘x, y ∈ K’, is shorthand for ‘x ∈ K and y ∈ K’.
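The properties of binary relations defined in this section can be checked mechanically whenever the relation is finite. The following Python sketch is one way to do so; the function names `classify` and `equivalence_classes` are assumptions of the example, and so is the decision to test the symmetry labels in a fixed order, since a degenerate relation (the empty one, say) can satisfy more than one of the definitions at once.

```python
from itertools import product

def classify(pairs, elements):
    """Label a finite binary relation (a set of ordered pairs) on three scales."""
    pairs = set(pairs)
    loops = [(x, x) in pairs for x in elements]
    if all(loops):
        reflexivity = "reflexive"
    elif not any(loops):
        reflexivity = "irreflexive"
    else:
        reflexivity = "nonreflexive"

    # Assumed precedence: symmetric, then unsymmetric, then antisymmetric.
    if all((y, x) in pairs for x, y in pairs):
        symmetry = "symmetric"
    elif all((y, x) not in pairs for x, y in pairs):
        symmetry = "unsymmetric"
    elif all(x == y for x, y in pairs if (y, x) in pairs):
        symmetry = "antisymmetric"
    else:
        symmetry = "nonsymmetric"

    # Every (x, z) for which some y gives xRy and yRz.
    chains = [(x, z) for x, y in pairs for y2, z in pairs if y == y2]
    if all(c in pairs for c in chains):
        transitivity = "transitive"
    elif all(c not in pairs for c in chains):
        transitivity = "intransitive"
    else:
        transitivity = "nontransitive"
    return reflexivity, symmetry, transitivity

def equivalence_classes(elements, pairs):
    """Partition the elements, assuming 'pairs' is an equivalence relation."""
    classes = []
    for x in elements:
        for c in classes:
            if (x, next(iter(c))) in pairs:
                c.add(x)
                break
        else:
            classes.append({x})
    return classes

nums = [1, 2, 3]
less_equal = {(x, y) for x, y in product(nums, repeat=2) if x <= y}
less_than = {(x, y) for x, y in product(nums, repeat=2) if x < y}
print(classify(less_equal, nums))  # an improper inequality relation
print(classify(less_than, nums))   # a proper inequality relation

# Ordered pairs of positive integers, equivalent just if their sums agree.
A = list(product(range(1, 6), repeat=2))
same_sum = {(p, q) for p in A for q in A if sum(p) == sum(q)}
B6 = next(c for c in equivalence_classes(A, same_sum) if (1, 5) in c)
print(sorted(B6))
```

Restricting the components of the pairs to 1 through 5 loses nothing for the class B6, since no pair of positive integers with sum 6 has a component larger than 5.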


A binary operation O is commutative (or Abelian) if, for any x and y, xOy = yOx. Ordinary arithmetical addition and multiplication are commutative. Subtraction, however, is noncommutative, since x − y = y − x only in the special case where x = y.

A binary operation O is associative if, for any x, y, and z, (xOy)Oz = xO(yOz). Here the parentheses indicate the order in which the operation is to be applied.

Thus, in the left-hand expression we first compute xOy; let us say that xOy = u. Then we compute uOz. That is, (xOy)Oz = uOz. In the right-hand expression, we first compute yOz; let us say that yOz = v. Then we compute xOv. That is, xO(yOz) = xOv. The operation is associative if, for every choice of x, y, and z, uOz = xOv. Ordinary arithmetical addition and multiplication are associative; hence we need not bother to include parentheses to indicate the order in which the operation is applied, and can, if we wish, think of the operations as n-ary rather than merely binary. We can freely write ‘3 × 4 × 3 = 36’ (as we did a few paragraphs back), because either (3 × 4) × 3 or 3 × (4 × 3) will give the same result. Arithmetical subtraction, on the other hand, is not associative, and parentheses cannot be omitted: (9 − 4) − 3 = 2, but 9 − (4 − 3) = 8.

A set is closed under a binary operation O if, for every choice of elements x, y in the set, xOy is also in the set. Arithmetical addition and multiplication of positive integers always yield positive integers: the set of all positive integers is closed under these two operations. But the positive integers are not closed under subtraction (for example, 7 − 9 is undefined for the positive integers) nor under division (7/9 is similarly undefined). The definition of the closure of a set under a singulary (or n-ary) operation is entirely analogous.

1.10.


Let S be a mathematical system; let K be any set of elements involved in S; let R be any relation involved in S; and let O be any operation involved in S. Then let S' be another mathematical system, with K', R', and O' defined in the same way. For simplicity of notation we shall pretend that the relations and operations are binary, but the conditions we are about to set forth must in fact be met by all n-ary relations and operations, for any n, and must hold for all paired sets K and K' in the two systems.

Suppose a one-to-one correspondence can be established between K and K', so that to any element x in K there corresponds a unique element x' in K' and conversely. Suppose, furthermore, that under this correspondence: (1) if, in S, xRy, then, in S', x'R'y', and conversely; (2) if, in S, xOy = z, then, in S', x'O'y' = z', and conversely. Under these conditions, systems S and S' are isomorphic.

The relation ‘is isomorphic to’ is an equivalence relation (§1.9): obviously, any system is isomorphic to itself; if S is isomorphic to S' then S' is isomorphic to S; and if S is isomorphic to S' and S' to S", then S is isomorphic to S".

Thus isomorphism assigns all possible mathematical systems to equivalence classes.

Let S be the simply ordered set of three elements (Paul, John, Luke) of §1.1, where the ordering principle is priority of birth. Let S' be the simply ordered set (John, Luke, Paul) where the ordering principle is priority of graduation from high school. Here is the one-to-one correspondence under which these two systems are isomorphic:

(Paul, John, Luke)
  ↕     ↕     ↕
(John, Luke, Paul)













Again, let S involve the familiar elements {1, 2, ...}, for which are defined such operations as addition (e.g., 3 + 79 = 82) and multiplication (3 × 79 = 237). Let S' involve the elements {I, II, III, ...}, for which are defined what we will call ‘Roman addition’ (III & LXXIX = LXXXII) and ‘Roman multiplication’ (III @ LXXIX = CCXXXVII). The isomorphism is obvious:

+   ×   {1,  2,   3,    4,   5,  ...}
↕   ↕    ↕   ↕    ↕     ↕    ↕
&   @   {I,  II,  III,  IV,  V,  ...}
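The two conditions for isomorphism can be tested directly on small cases. The Python sketch below checks condition (1) for the two three-element ordered sets and condition (2) for Arabic and Roman numerals over a finite range. All function names are assumptions of the example. Note also that `roman_add` and `roman_mul` are implemented here by decoding to integers and re-encoding, which already presupposes the correspondence; a native procedure for adding Roman numerals would make the check less circular.

```python
ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
         (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
         (5, "V"), (4, "IV"), (1, "I")]

def to_roman(n):
    """The correspondence x -> x' from Arabic to Roman numerals (1..3999)."""
    out = []
    for value, symbol in ROMAN:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)

def from_roman(s):
    """The converse correspondence, for numerals written in the standard form."""
    total, i = 0, 0
    for value, symbol in ROMAN:
        while s.startswith(symbol, i):
            total += value
            i += len(symbol)
    return total

def roman_add(a, b):   # the operation written '&' in the text
    return to_roman(from_roman(a) + from_roman(b))

def roman_mul(a, b):   # the operation written '@' in the text
    return to_roman(from_roman(a) * from_roman(b))

def preserves_operation(op, op_prime, mapping, elements):
    """Condition (2): x O y = z in S just when x' O' y' = z' in S'."""
    return all(mapping(op(x, y)) == op_prime(mapping(x), mapping(y))
               for x in elements for y in elements)

def is_order_isomorphism(order, order_prime, mapping):
    """Condition (1) for two finite simply ordered sets given as lists."""
    return all((order.index(x) <= order.index(y)) ==
               (order_prime.index(mapping[x]) <= order_prime.index(mapping[y]))
               for x in order for y in order)

by_birth = ["Paul", "John", "Luke"]
by_graduation = ["John", "Luke", "Paul"]
pairing = {"Paul": "John", "John": "Luke", "Luke": "Paul"}
print(is_order_isomorphism(by_birth, by_graduation, pairing))

print(roman_add("III", "LXXIX"), roman_mul("III", "LXXIX"))
print(preserves_operation(lambda x, y: x + y, roman_add, to_roman, range(1, 40)))
print(preserves_operation(lambda x, y: x * y, roman_mul, to_roman, range(1, 40)))
```

Pairing each name with itself, rather than by position, fails condition (1), which shows that the choice of correspondence matters and not merely the two sets of elements.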




Indeed, most people would say that the Arabic numerals and the Roman numerals are just two different sets of symbols for the same meanings. Of course this is true; but how do we know that it is? We know precisely because of the isomorphism. And this is exactly the point whenever an isomorphism is discovered. If two systems are isomorphic, then, abstractly speaking, they are the same system. This might seem to vitiate the notion of isomorphism, but it does not, for the following reason. Two mathematical systems may have been developed from totally different points of departure, with differing symbols and terminology, and perhaps with totally different applications in view. Their similarity may thus be obscured, and a proof of their isomorphism may come as a surprise.

In order to demonstrate an isomorphism, it is always necessary to ignore something. Whatever is ignored is, by definition, being classed as what linguists would call nondistinctive, relative to the given abstract system. Thus, in our second example, we ignore the different appearances of Arabic and Roman numerals, as well as the differing mechanics of construction of composite numerals (the place notation for Arabic numerals, the subtractive versus additive significance of the relative order of an ‘I’ and a ‘V’, and the like, for Roman numerals). It would seem that by ignoring different things we might well obtain different and cross-cutting isomorphisms. This is true, but becomes more significant if we turn the statement around and say that a specified isomorphism tells us exactly what to ignore about any particular notation or exemplification that may confront us. That is, isomorphisms define irrelevance, rather than the reverse.10

10There is, of course, no metamathematical principle requiring the mathematician to ‘identify’ (that is, to decide to ignore the differences between) two systems that have been shown to be isomorphic. He is free merely to say that the systems are indeed isomorphic (relative to certain criteria), and to search for inferences from that fact, but also to seek crosscutting isomorphisms for either system obtained by distinguishing differently between the distinctive and the nondistinctive. Isomorphism is more powerful and more general than more naive principles of ‘equivalence’ or ‘identity’: isomorphism leaves the mathematician in command.

A formal definition of a mathematical system never does more than to define an indefinitely large class of isomorphic systems, but we then proceed to ignore anything that the definition sets aside as irrelevant. Sometimes a formal definition does not do that much. For example, our postulates for a simply ordered set (§1.8) only define a class of systems many pairs of which are not isomorphic. If we add, to the postulates already given, a specification that K is to contain exactly three elements, then any two systems that meet the specifications are isomorphic and, hence, differ only as to irrelevancies. Abstractly, it does not matter whether the system is (Paul, John, Luke) or (John, Luke, Paul) or (1, 2, 3) or (x1, x2, x3) or (x, y, z) or something else; it is still the same system in every respect that, by definition, can interest us.

As another example, let us consider Peano’s postulates for the natural numbers (or ‘positive integers’; for convenience just called ‘numbers’ in the statements). These postulates make use of a singulary operation s:

Postulate N1. 1 is a number.
Postulate N2. If n is a number, then s(n) is a unique number called the successor of n.
Postulate N3. There is no number n such that s(n) = 1.
Postulate N4. If m and n are numbers and if s(m) = s(n), then m = n.
Postulate N5. Let P(n) be any statement about a number n. If P(1) is true, and if the truth of P(s(m)) follows from the truth of P(m) for any number m, then P(n) is true of all numbers. (By ‘any statement about a number n’ we mean just that: for example, ‘n is blue’, or ‘n² ≥ n’, or ‘n is prime’.)

Any set of elements that satisfies these five postulates is just the set of natural numbers.

If we find two different sets, they are isomorphic and hence differ at most in nondistinctive ways, such as appearance. Any notation for the natural numbers is a set of numerals, one for each natural number. Two numeral systems are isomorphic just if they satisfy postulates N1-5. Beyond this, choice of notation is purely a matter of mnemonic and typographical convenience. If Mr. and Mrs. Jones would guarantee to go on having and naming sons forever, we could use {‘Paul’, ‘John’, ‘Luke’, ...} as numerals.
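A small Python sketch may make the point about numerals concrete: once a first numeral and a successor operation are given, the correspondence between two numeral systems is generated mechanically by stepping through both in parallel. The generator `numerals` and the choice of tally strings as a second numeral system are assumptions of this illustration, not anything prescribed by the postulates.

```python
from itertools import islice

def numerals(one, successor):
    """Yield the numerals of a system, given its first numeral and successor operation."""
    current = one
    while True:
        yield current
        current = successor(current)

arabic = numerals(1, lambda n: n + 1)
tally = numerals("|", lambda s: s + "|")   # an assumed second notation

# Pair the two systems term by term; only the first five pairs are listed.
correspondence = list(islice(zip(arabic, tally), 5))
print(correspondence)
```

Nothing about the shapes of the symbols enters into the pairing; only the first element and the successor operation do.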



1.11. Recursive and Recursively Enumerable Sets. We return, as promised, to the topic of §1.2, since we now have the background for some rather crucial points having to do with the conditions under which we can proclaim that a set is defined.11

11For this section see particularly Davis 1958; I have also been helped by Gold 1964.

First let us define an algorithm (in a somewhat stricter sense than is often given that term). An algorithm is a mechanical and determinate way of computing something. For example, given any positive integer n, there is an algorithm for computing the nth even number. We learn a form of this algorithm appropriate for Arabic numerals in grade school. Suppose n = 3,281,956,481. We all know how to compute 2n:

  3,281,956,481
×             2
  6,563,912,962

The 3,281,956,481st even number, then, is 6,563,912,962. It is easy to imagine an n so large that the application of the algorithm would lie beyond all available resources: e.g., one so great that, in Arabic notation, with digits the size of those on this page, it would stretch across the galaxy. This does not matter: the algorithm still fits our specifications. (Recall the principle ‘If you know exactly how to, you don’t have to.’)

Now let us consider any denumerably infinite set X. From the definition of ‘denumerably infinite’, there exists a one-to-one correspondence between the set X and the set {1, 2, ...} of all positive integers. We are therefore entitled to imagine that we have arranged the elements of X in the order implied by this correspondence, and have labelled them according to their position. Having done so, we have an ordered set or sequence (x1, x2, ...), where xi is the particular element of set X that corresponds to the positive integer i under the one-to-one correspondence. Thus, by definition, any denumerably infinite set can be ordered; we can therefore continue our discussion in terms of infinite sequences instead of directly in terms of denumerably infinite sets.

A sequence (finite or denumerable) differs from a mere set in that the members of a set must all be different, whereas two terms of a sequence can be occurrences of one and the same element. The set {John, Bill, John} is the same as the set {John, Bill}, since in the former notation we have merely, as though by oversight, named one element twice. But the ordered set (John, Bill, John) is not the same as the ordered set (John, Bill); there must be some reason for the repetition. (For example, perhaps we are tallying who passes through a certain door: John comes in, then Bill comes in, and then John, having meanwhile climbed out the window, comes in again.) Thus (1, 0, 1, 0, ...), in which the terms are alternately 1 and 0, or (1, 1, 1, ...), in which all terms are 1, is just as good an infinite sequence as is (1, 2, 3, ...) or (1, 4, 9, 16, 25, ...).

We want to consider only infinite sequences in which all terms are nonnegative integers. There is really no loss of generality in this restriction, because, as is easily shown, any infinite sequence, whatever the nature of its elements, is isomorphic to one in which the elements meet our requirements. We merely need number the elements, perhaps using the number 0 or 1 first, and repeating a number whenever we come to a recurrence of the element to which that number has already been assigned. For example, (John, Bill, John, Bill, ...), in which the terms John and Bill alternate, is isomorphic to (1, 0, 1, 0, ...), with 1 ↔ John and 0 ↔ Bill. Since the set of all elements that occur as terms in an infinite sequence is, by definition, either finite or denumerable, it is clear that in this process of substitution we will never run out of nonnegative numbers.

For our further consideration, then, we have a sequence (p1, p2, ...) = (pi), in which every term is some nonnegative integer.
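The renumbering procedure just described can be sketched in a few lines of Python. The function name `relabel` and the convention of assigning numbers in order of first occurrence, starting from 0, are assumptions of the example; the text leaves the choice of numbering open (it pairs 1 with John and 0 with Bill, which is an equally good choice).

```python
def relabel(terms, first_label=0):
    """Replace each term by a nonnegative integer, reusing the integer on recurrences."""
    labels = {}
    sequence = []
    for term in terms:
        if term not in labels:
            # A new element receives the next unused number.
            labels[term] = first_label + len(labels)
        sequence.append(labels[term])
    return sequence, labels

sequence, labels = relabel(["John", "Bill", "John", "Bill", "John"])
print(sequence)
print(labels)
```

Because a new number is consumed only when a new element appears, the supply of nonnegative integers can never be exhausted by a finite or denumerable set of elements.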
The set of elements that occur as terms in this sequence is clearly the image of some surjective function f (§1.6) whose domain is the set of all positive integers: that is, for any positive integer i, f(i) = pi. In the first example of this section, we have f(i) = 2i. In the case of the sequence (1, 0, 1, 0, ...), we have

f(i) = 1 if i is odd,
     = 0 if i is even.


A set {pi} is said to be recursively enumerable if (1) it is the image of some surjective function f whose domain is the set of all positive integers, and (2) there exists an algorithm by which f(i) can be computed for any choice of i. A recursively enumerable set may be either finite or (denumerably) infinite, and we have already had an example of each. The function that generates the sequence (1, 0, 1, 0, ...) has as its image the finite set {1, 0}; the function f(i) = 2i has as its image the denumerable set {2, 4, ...}.

An algorithm may be such that, in order to compute f(i) for i > 1, we must first know f(i−1), hence also f(i−2) and so on all the way back to f(1). It is still an algorithm, and the image of the function is still recursively enumerable.

Suppose, however, we are supplied with a mechanical procedure for trying to compute f(i) for any positive integer i, but have no guarantee that, for every choice of i, the computation will eventually be completed. For example, imagine that the procedure consists of several steps. The first step gives us, for any i, a number x which is not necessarily an integer. The next step requires us to determine the square root of x. Now there is a mechanical procedure for extracting square roots, one which we all learn in high school and which most of us then forget. But for a great many numbers this mechanical procedure never ends: we can carry it on as far as we wish, but whenever we get tired and stop we still have only an approximation to the square root. If, in the computation of f(i) for a given i, the x we get is of this sort, then we can never move on to the remaining steps of the computation. The whole mechanical procedure for the computation of f(i) is then not an algorithm in our sense, and the image of f is not a recursively enumerable set.

Now, again, consider any denumerably infinite set X whose members are nonnegative integers. Suppose that, when confronted by any positive integer n, there exists an algorithm by which one can decide in a finite amount of time whether n ∈ X or n ∉ X. As before, we may imagine (because of the definition of ‘denumerably



infinite’) that the elements of X are ordered and appropriately labelled, so that from the set X we get the infinite sequence (p1, p2, ...). Then the supposition just stated can be rephrased as that of being able to decide, in a finite amount of time, whether or not there is a positive integer i such that pi = n. If we can, then X is a recursive set.

Referring back to the discussion of §1.2, we see that a set is recursive just if (1) its elements are nonnegative integers, (2) it is denumerably infinite, and (3) it is ‘defined’, in the sense given to that term in the earlier discussion. We remove the first of these three restrictions, however, by asserting that a set is recursive, regardless of the nature of its elements, if it stands in one-to-one correspondence with a set of nonnegative integers which is recursive by the definition given above. Thus, as in the case of recursive enumerability, the restriction to nonnegative integers actually loses no generality.

We shall

illustrate with the set of all primes, and then generalize. There is a simple algorithm for determining whether a given integer n is prime or not: one attempts to divide it by 2, by 3, and so on, and if it is not divisible by any integer (except 1) up to and including n—1, then it is prime. Hence the set of all primes is recursive. Now suppose that we know that n is the zth prime. To find the (z'+ljst prime, we apply the algorithm just described first to n + l, then to n+ 2, and so on, until we find the next integer after n that is a prime; that integer is the (/+l)st prime. This mechanical procedure will always work. It will never go on forever without yielding a result, because (as has been known since Euclid) there is no largest prime. Hence it is an algorithm, and the set of all primes is recur¬ sively enumerable. The generalization is immediate. Let X be a recursive set. Then there is an algorithm by which we can decide whether any given nonnegative integer is in X or not. Using that algorithm, we test in succession the nonnegative integers 0, 1, and so on, until we find the first integer that meets the test. Call that integer px. We proceed to test /?! + !, /h+2, and so on until we encounter the next integer



that meets the test; that integer is p2. In this way, we can determinate Pi for any positive integer i. Since X is denumerably infinite, there can be no largest element in it—our procedure will never go on forever without discovering the next larger element. Thus, X is recursively enumerable. Although, as we have just shown, all recursive sets are recursively enumerable, the reverse is not true; there are recursively enumerable sets that are not recursive. To prove this we merely need exhibit one case.
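The passage from a decision procedure to an enumeration, as described above for the primes and then for recursive sets generally, can be sketched in Python as follows. The names `is_prime` and `enumerate_members` are assumptions of the example, and the trial division runs all the way to n−1, as in the text, rather than using any faster test.

```python
from itertools import count, islice

def is_prime(n):
    """Decide membership in the set of primes by trial division up to n - 1."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, n))

def enumerate_members(decide):
    """Test 0, 1, 2, ... in succession and yield each integer that meets the test."""
    for n in count(0):
        if decide(n):
            yield n

# The first ten terms p1, ..., p10 of the enumeration of the primes.
first_primes = list(islice(enumerate_members(is_prime), 10))
print(first_primes)
```

The generator produces the next member only if there is one; for a decidable but finite set it would search forever after the last member, which is why the argument in the text relies on X being denumerably infinite.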

For our example we shall make use of the decimal expansion of π, which can be computed to any desired number of decimal places by a mechanical procedure: to five places, π = 3.14159. It is known that, no matter how far we carry the computation of π, the decimal portion will never end and will never repeat itself. Yet we can carry it as far as we want to. Now we shall throw away the digit to the left of the decimal place, and use the digits to the right as the source of the successive integers pi of an infinite sequence (p1, p2, ...), as suggested by this display:

.14159...
p1 = 1
p2 = 4
p3 = 14
p4 = 1
p5 = 41
p6 = 141

The next four terms, all of which end in the fourth digit of the decimal part of π, are 5, 15, 415, and 1415; the next five are 9, 59, 159, 4159, and 14159. It is clear that the sequence (pi) is infinite, and that pi can be computed for any integer i by the algorithm we have described, since the decimal expansion of π can be carried out as far as we wish. Some integers will occur more than once: we have already seen p1 = p4 = 1. The sequence cannot contain any largest term, since an integer of n digits is larger than one of m < n digits, and the sequence contains integers of every finite number of



digits. Therefore the set P of all nonnegative integers that occur as terms in the sequence (pi) is denumerably infinite.

The set P is recursively enumerable, but it is not recursive. Suppose we are given some particular integer, say 3,827,561,422, and are asked whether that integer is in P. All we can do is proceed to compute successive terms of the sequence (pi) and watch for the given number. If it turns up, we know that it is in P. But if, after any given amount of computation, it has not yet turned up, we do not know whether or not it will turn up later. Thus, for an arbitrary integer n, we can sometimes prove that n is in P, but can never prove that it is not in P.

Are there denumerably infinite sets of nonnegative integers that are not recursive and not even recursively enumerable? Suppose that from the sequence (pi) just described we generate a sequence (qi) by the following procedure. We consider in turn each term of (pi), and decide whether or not to assign it to (qi) by tossing a coin. If the coin comes up heads, then we let qi = pi; if it comes up tails, then, regardless of the value of pi, we set qi = 0. Clearly, a number (other than 0) cannot occur in (qi) unless it occurs in (pi). But since we cannot know, except as it were by accident, that a given number belongs to P, we certainly cannot know whether or not it is in Q, the set of all integers that occur as terms in (qi). So Q is not recursive. Nor is it even recursively enumerable. Two different people (or machines), independently following the instructions for generating the members of a recursively enumerable set, will, after the same amounts of computation, have generated exactly the same members. But the instructions we have given for (qi), though simple to follow, guarantee no such identity of result. The procedure is not determinate, because of the coin-tossing requirement, and hence is not an algorithm in our sense.
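The enumeration of the sequence (pi), and the one-sided test for membership in P that it affords, can be sketched in Python. The decimal digits of π are obtained here from Machin's arctangent formula in integer arithmetic, which is merely one convenient mechanical procedure among many; the function names, the ten guard digits, and the decision to read a digit string with leading zeros as an ordinary integer are all assumptions of the example.

```python
def arctan_inverse(x, unity):
    """Approximate arctan(1/x) * unity with integer arithmetic (alternating series)."""
    total = term = unity // x
    x_squared = x * x
    divisor, sign = 3, -1
    while term:
        term //= x_squared
        total += sign * (term // divisor)
        divisor += 2
        sign = -sign
    return total

def pi_decimals(places, guard=10):
    """Return the first 'places' digits of pi after the decimal point."""
    unity = 10 ** (places + guard)
    # Machin's formula: pi = 4 * (4 * arctan(1/5) - arctan(1/239)).
    scaled = 4 * (4 * arctan_inverse(5, unity) - arctan_inverse(239, unity))
    digits = str(scaled // 10 ** guard)
    return [int(ch) for ch in digits[1:places + 1]]  # drop the leading 3

def p_terms(places):
    """List the terms of (pi) that end in each of the first 'places' decimal digits."""
    digits = pi_decimals(places)
    terms = []
    for end in range(1, places + 1):
        for start in range(end - 1, -1, -1):   # shortest term first, as in the display
            terms.append(int("".join(map(str, digits[start:end]))))
    return terms

def seen_within(n, places):
    """True proves that n is in P; False only means 'not found so far'."""
    return n in p_terms(places)

print(p_terms(5))
print(seen_within(14159, 5))
print(seen_within(3827561422, 8))   # inconclusive, not a proof of non-membership
```

Raising `places` extends the search, but no finite value of it turns a negative answer into a proof; this is the asymmetry on which the argument in the text rests.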
However, we cannot simply conclude from the foregoing that Q shows the existence of a denumerably infinite set that is not even recursively enumerable. In order to reach that conclusion, we should have to agree that our description of Q defines a set; and that issue is problematic. To be sure, if we follow the instructions and list n terms of the sequence (qi), we have generated a perfectly



respectable finite set: the set of all numbers that occur as members of the finite sequence. But the instructions we have given constitute a sort of etcetera symbol, and the problem is whether that etcetera symbol can be viewed as clearly enough defined to count as naming a set. There is disagreement as to the answer. In any case, we see that there are reasonably respectable and manipulable sets (namely, those that are recursively enumerable but not recursive) that are not defined in the sense of §1.2. The implication is that our definition of ‘defined’ was not sufficiently precise. For denumerably infinite sets, we must replace the simple notion of ‘defined’ by the two notions developed in this section. Rather obviously, wherever in pure or applied mathematics we encounter what purports to be a denumerably infinite set, it is of importance to determine whether it is recursive or only recursively enumerable, or perhaps neither; if it is neither, then we may not even have a set in the mathematical sense; if it is only recursively enumerable, there are strict limitations on possible mathematical manipulations. The example we shall confront in the sequel is the so-called ‘set of all sentences’ of some language.

Some of the assertions often casually made about this ‘set’ are not at all obviously true empirically. Yet we cannot put mathematics to work unless we make some precise assumption about the nature of this ‘set’, even if the assumption goes against some of the empirical evidence and requires a willful suspension of disbelief.

1.12. Model and Exemplification. When a particular mathematical system is used to talk about, or to make inferences and computations that bear on, something outside itself, the system is called a model of what it is being used to discuss, and the latter is called an exemplification of the system. Children learn model and exemplification together, and only gradually pull the model out of context so that it can be manipulated independently of any particular exemplification. Thus they start by noting that two oranges and two more oranges makes four oranges, that two pencils and two more pencils makes four pencils, and come to the conclusion that two and two are four whether the exemplification involve



oranges, pencils, or anything else. A mathematical system, such as arithmetic, is a model of anything it can be a model of. In manipulating a system, we do not care what its exemplification is; but this (as we have insisted before) is very different from saying that it has no exemplification. A formal system that by definition could have no exemplification, if it can even be imagined, would not be mathematics but nonsense. However, it is perfectly acceptable for a formal system to find its exemplification in some other formal system. For example, one exemplification of simply ordered sets (§1.8) is the natural numbers as defined by Peano’s postulates (§1.11). To show that systems of type S1 find exemplification in systems of another type, S2, one must show that the class of all systems that meet the postulates for S2 is a non-null subclass of the class of all systems that meet the postulates for S1.

In the case mentioned, we define a relation

Sa for natural numbers by saying that, for any natural numbers m, n, and p,m 1.

We shall allow ourselves to speak of a character as a simultaneous bundle of components although, formally, the system we are developing has nothing to do with time so that the term ‘simultaneous’ is out of place. Despite this usage, the mathematics of individual characters is just finite set theory. Thus, we define a subcharacter K to be any subset of some character K, including K itself and the null subcharacter 0.6 It will be convenient to display the components of a subcharacter (or character) in column rather than row form; since the symbols in a single column represent members of a set, and the vertical alignment will not be used for anything else, we can omit the braces that usually enclose the names of members of an unordered set, but I use enclosing vertical lines in a way that will be clear in a moment:

\[ K_j = \left| \begin{array}{c} e_{j1} \\ e_{j2} \\ \vdots \\ e_{jn_j} \end{array} \right| \]

6 At this point we have two symbols, the inverted ‘V’ and the one introduced here, both of which denote simply the null set. Strictly speaking this is redundant; but it is useful because of what might be called connotations: we use the new symbol instead of the old one just when we are dealing with subcharacters of a componential alphabet.

In the ordinary mathematical representation of a term of a matrix, it is customary for the first subscript to denote the row, the second the column. But in a stepmatrix there are no rows; the first subscript therefore represents the column, the second the (arbitrary but fixed) position within the column.
A stepmatrix is a string over a componential alphabet Q. The term hints at two-dimensionality; yet a stepmatrix is no more a matrix than a stepmother is a mother. For example, suppose

\[ A = \left| \begin{array}{cc} a & b \\ c & d \end{array} \right| \quad \text{and} \quad B = \left| \begin{array}{cc} a & d \\ c & b \end{array} \right| . \]

If A and B are matrices, then $A \neq B$ except just in case $b = d$; but if they are stepmatrices they are not only equal but identical. Also,

\[ \left| \begin{array}{ccc} a & d & f \\ b & e & \\ c & & \end{array} \right| \]

is no matrix, but it might be a stepmatrix. Every stepmatrix over Q, then, is of the form

\[ \left| \begin{array}{cccc} e_{11} & e_{21} & \cdots & e_{m1} \\ e_{12} & e_{22} & \cdots & e_{m2} \\ \vdots & \vdots & & \vdots \\ e_{1n_1} & e_{2n_2} & \cdots & e_{mn_m} \end{array} \right| \]

where $m \geq 0$ (to allow for the null stepmatrix 0 if $m = 0$), $n_j \geq 1$ for $j = 1, 2, \ldots, m$, and the values of $n_j$ for different $j$ are independent. In an equivalent linear notation, this stepmatrix is

\[ s = | K_1 K_2 \ldots K_m | \]

where each $K_j$ is some $K_i$ of Q.

A stepmatricial harp is any subset of the free monoid F(Q) over a componential alphabet Q. A harp defined in terms of an ordinary (noncomponential) alphabet will henceforth be called a linear harp; the term ‘harp’, without modifier, will be used ambiguously for either kind.

A stepstring is a string of subcharacters; hence every stepmatrix is a stepstring, but not vice versa. Let $t = |K_j|_p$ and $t' = |K'_j|_p$ be two stepstrings of the same length $p$ such that, for all $j$, $K_j \cap K'_j = 0$. Then the join of $t$ and $t'$ is by definition the stepstring $t \cup t'$ whose $j$th term is $K_j \cup K'_j$. A stepstring $t$ belongs to (or is in) a stepmatricial harp just in case there is some stepstring $t'$ such that $t \cup t'$ is a stepmatrix of the harp. If $t$ is itself a stepmatrix of the harp, of length $p$, then the requirement is met by $t' = |0|_p$ (the stepstring of length $p$ all of whose terms are the null subcharacter).

It is evident that a stepmatrix $s$ can in general be decomposed in either of two ways. (1) If $s$ is of length at least 2, then we may deconcatenate it by a vertical cut into shorter non-null stepmatrices $s'$ and $s''$ such that $s's'' = s$. This is exactly the same as the deconcatenation of ordinary strings. (2) If $s$ is non-null, then it can be decomposed by a horizontal slice into stepstrings $t'$ and $t''$, both of the same length as $s$, such that $t' \cup t'' = s$.
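The definitions above can be tried out in a few lines of Python. This is only a sketch under stated assumptions: a subcharacter is modelled as a `frozenset` of component names, a stepstring as a tuple of such sets, and a harp as a finite set of stepmatrices (real harps may be infinite). The helper names `stepstring`, `join`, and `belongs` are illustrative and not part of the text.

```python
from typing import FrozenSet, Iterable, Tuple

Subcharacter = FrozenSet[str]
Stepstring = Tuple[Subcharacter, ...]


def stepstring(*columns: str) -> Stepstring:
    """Build a stepstring; each argument names one column's components, space-separated."""
    return tuple(frozenset(column.split()) for column in columns)


def join(t: Stepstring, u: Stepstring) -> Stepstring:
    """Join two stepstrings of equal length whose corresponding terms are disjoint."""
    if len(t) != len(u):
        raise ValueError("stepstrings must have the same length")
    if any(a & b for a, b in zip(t, u)):
        raise ValueError("corresponding subcharacters must be disjoint")
    return tuple(a | b for a, b in zip(t, u))


def belongs(t: Stepstring, harp: Iterable[Stepstring]) -> bool:
    """True if some t' exists such that join(t, t') is a stepmatrix of the harp.

    Such a t' exists exactly when t is termwise a subset of some stepmatrix s
    of the harp (take t' to be the termwise difference).
    """
    return any(
        len(s) == len(t) and all(a <= b for a, b in zip(t, s))
        for s in harp
    )


# As matrices (ordered rows) A and B differ; as stepmatrices they are identical,
# because each column is an unordered set.
A_matrix = (("a", "b"), ("c", "d"))
B_matrix = (("a", "d"), ("c", "b"))
A_step = stepstring("a c", "b d")
B_step = stepstring("a c", "d b")
print(A_matrix == B_matrix, A_step == B_step)  # False True

# A toy harp with one stepmatrix; the stepstring |a d 0| belongs to it.
harp = {stepstring("a b c", "d e", "f")}
print(belongs(stepstring("a", "d", ""), harp))  # True
```

The empty string passed as a column stands for the null subcharacter, so a stepstring that is itself a stepmatrix of the harp belongs to it trivially.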

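Except for the requirement of identical length, slicing is somewhat more general than cutting. For suppose

\[ s = \left| \begin{array}{ccc} a & d & f \\ b & e & \\ c & & \end{array} \right| . \]

There is an obvious similarity between cutting this into

\[ s' = \left| \begin{array}{cc} a & d \\ b & e \\ c & \end{array} \right| \quad \text{and} \quad s'' = | f | , \]

and slicing it into

\[ t' = \left| \begin{array}{ccc} a & d & 0 \\ b & e & \\ c & & \end{array} \right| \quad \text{and} \quad t'' = | 0 \; 0 \; f | . \]

On the other hand, there is no way to match, by cutting, a slice into

\[ t' = | a \; d \; 0 | \quad \text{and} \quad t'' = \left| \begin{array}{ccc} b & e & f \\ c & & \end{array} \right| . \]

If Q is any componential alphabet, we define Q to be the set of all stepstrings that belong to the free monoid F(Q).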
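The contrast between cutting and slicing can be made concrete by enumeration. The sketch below again assumes, purely for illustration, that a stepstring is a tuple of `frozenset`s; the functions `cuts`, `slices`, and `matches_a_cut` are hypothetical helpers. It enumerates every vertical cut and every horizontal slice (trivial ones included) of the three-column stepmatrix with columns {a, b, c}, {d, e}, {f}.

```python
from itertools import combinations, product
from typing import FrozenSet, List, Tuple

Stepstring = Tuple[FrozenSet[str], ...]


def stepstring(*columns: str) -> Stepstring:
    """Build a stepstring; each argument names one column's components, space-separated."""
    return tuple(frozenset(column.split()) for column in columns)


def cuts(s: Stepstring) -> List[Tuple[Stepstring, Stepstring]]:
    """All vertical cuts of s into two non-null parts s', s'' with s's'' = s."""
    return [(s[:i], s[i:]) for i in range(1, len(s))]


def subsets(k: FrozenSet[str]) -> List[FrozenSet[str]]:
    """All subsets of one subcharacter, including the null subcharacter."""
    items = sorted(k)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def slices(s: Stepstring) -> List[Tuple[Stepstring, Stepstring]]:
    """All horizontal slices of s into t', t'' of the same length with t' joined to t'' = s."""
    result = []
    for upper in product(*(subsets(k) for k in s)):
        lower = tuple(k - u for k, u in zip(s, upper))
        result.append((tuple(upper), lower))
    return result


def matches_a_cut(upper: Stepstring, lower: Stepstring) -> bool:
    """True if the slice mirrors a cut: whole leading columns in upper, whole trailing columns in lower."""
    n = len(upper)
    return any(
        all(not lower[j] for j in range(i)) and all(not upper[j] for j in range(i, n))
        for i in range(1, n)
    )


s = stepstring("a b c", "d e", "f")
print(len(cuts(s)), len(slices(s)))

# The slice that mirrors the cut into |abc de| and |f|:
print(matches_a_cut(stepstring("a b c", "d e", ""), stepstring("", "", "f")))
# A slice with no counterpart among the cuts:
print(matches_a_cut(stepstring("a", "d", ""), stepstring("b c", "e", "f")))
```

Under this encoding only the slices that keep whole columns together correspond to cuts; every other slice separates components that a vertical cut cannot separate.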




Some Empirical Considerations. We are now in a position to be more precise about the kind of grammar we need for a spoken language. The set of all sentences of a spoken language can be more closely matched by a stepmatricial harp than by a linear harp. This tells us what we want the outputs of our partial grammar G" to be. We want them to be stepmatrices over a componential alphabet, the components of whose characters can be identified with the phonons of the language.

Of course, none of the above tells us how to discover the phonons of a particular language. This is an empirical problem rather than a theoretical one, but it has certain theoretical aspects that we must discuss. We shall do so in terms of two possible stepmatricial systems for Potawatomi, arbitrarily labelled ‘Aleph’ and ‘Gimel’. Relative to these two systems, the Potawatomi words /ntept:an/ ‘I hear it’ and /čiman/ ‘canoe’ are portrayed as shown in Figure 1.

Figure 1. The words /ntept:an/ and /čiman/ as stepmatrices in the Aleph system and in the Gimel system.

Af = affrication; Al = alveolarity; Ap = apicality; B = backness; Cn = consonantality; Do = dorsality; F = frontness; Ft = fortisness; H = highness; L = lowness; Lb = labiality; Lm = laminality; Na = nasality; Ob = obstruence; Sn = sonorance; St = stopness; Sy = syllabicity; UL = involvement of upper lip; Ur = unroundedness; Vd = voicedness; Vl = vocality; Vn = voicelessness.

The theoretical bias in the Aleph system might be called that of phonetic realism: it seeks to acknowledge the occurrence of any definable phonetic feature wherever it is found, undisturbed by the resulting redundancy, if that feature helps to identify sentences. The bias of the Gimel system favors economy: it officially recognizes the smallest possible number of phonons, and of phonon-occurrences, from which everything else is predictable.

A highly plausible case can be made out against either system by accepting the bias of the other. By detesting arbitrariness, we can argue against the Gimel system. By abhorring redundancy, we can tear the Aleph system to pieces. Thus two linguists, in complete agreement about the empirical facts of Potawatomi, could get into a long, heated, and inevitably futile argument. But this is just what we want to circumvent. Our task is not to take sides. It is to spell out the basic constraints within which choice, being purely a matter of taste, is by definition unarguable. There are four considerations.

(1) The basic empirical constraint is that a stepmatricial system is unacceptable unless it distinguishes sentences just as speakers of the language do. There may be reasons, as hinted in §2.0, why the most we can hope for is an approximation, but that is here beside the point. If either the Aleph or the Gimel system meets this requirement, then so does the other. For the two systems are mutually convertible, to use Bloch’s term.7 Either system provides for an infinite number of stepmatrices. A finite set of rules can be assembled that will rewrite any stepmatrix of the Aleph system as the corresponding one of the Gimel system; furthermore, a finite set of rules can be assembled that will rewrite any stepmatrix of the Gimel system as the corresponding one of the Aleph system. This is what is meant by ‘mutual convertibility’: there exists a one-to-one correspondence between the stepmatrices of the two systems, so that the two systems are isomorphic down to the size-level of the whole stepmatrix. That a pair of corresponding stepmatrices look quite different is irrelevant, just as it is irrelevant for the isomorphism between Arabic and Roman numerals that ‘32’ and ‘XXXII’ look different (§1.10). Of course, all this means that the Aleph and Gimel systems are merely two selected from an indefinitely large set of stepmatricial systems that are pairwise mutually convertible and hence all equally ‘correct’ as far as the first criterion is concerned.

7 Bernard Bloch apparently used this expression only orally: neither he nor I were able to find it in his published works.

(2) The second requirement is that phonons may not be arbitrary counters; each must be describable in articulatory or in articulatory-acoustic terms. The amount of leeway this allows is difficult to define. In the Gimel system, the difference between Potawatomi /e/ and /i/ is taken to be the same as that between /t/ and /č/: the presence of something called ‘apicality’ in the first of each pair, versus the presence of ‘laminality’ in the second. Since the blade of the tongue is raised for Potawatomi /i/ and not (or not very much) for /e/, I find this phonetically realistic. Whoever devised the Aleph system apparently did not, since for vowels he posits phonons called ‘high’ and ‘low’ that do not appear for consonants. We see that mutual convertibility can hold not merely between different systems but also between differing prejudices.

What I mean to exclude altogether by the second requirement, even if the first might allow it, is a system invented more or less as follows: one lists the ‘segmental phonemes’ or even the ‘allophones’ of a language in some arbitrary order, and then encodes each into a simultaneous bundle of symbols drawn from the smallest stock of symbols that will do the job of keeping the bundles apart. For Potawatomi, Figure 2 shows one such unacceptable system. Here phonon e is said to recur in /p o i č: ? e p:/. One need have no special knowledge to recognize that an articulatory-acoustic description of such a so-called ‘phonon’ is impossible.

Figure 2. An arbitrary encoding of the Potawatomi segmental phonemes as simultaneous bundles of the symbols a, b, c, d, and e.
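The recipe for such an unacceptable system is mechanical enough to be written out. The following Python sketch is an illustration only: the function name `arbitrary_encoding` is hypothetical, the phoneme list is the 22-symbol inventory as read from the surrounding text, and the particular bundles it assigns are not those of Figure 2. It finds the smallest stock of k symbols with at least as many non-empty bundles (2^k - 1) as there are phonemes, then hands out bundles in an arbitrary order.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def arbitrary_encoding(phonemes: List[str]) -> Dict[str, FrozenSet[str]]:
    """Encode each phoneme as a distinct non-empty bundle over the smallest symbol stock."""
    k = 1
    while 2 ** k - 1 < len(phonemes):  # number of non-empty subsets of k symbols
        k += 1
    stock = "abcdefghijklmnopqrstuvwxyz"[:k]
    bundles = [
        frozenset(combo)
        for size in range(1, k + 1)
        for combo in combinations(stock, size)
    ]
    return dict(zip(phonemes, bundles))


# Assumed inventory: the 22 segmental phonemes, in an arbitrary order.
phonemes = ["p", "t", "č", "k", "p:", "t:", "č:", "k:", "s", "š", "s:", "š:",
            "m", "n", "w", "y", "?", "u", "o", "e", "i", "a"]
code = arbitrary_encoding(phonemes)
for phoneme in phonemes:
    print(phoneme, "".join(sorted(code[phoneme])))
```

With 22 phonemes the stock comes out at five symbols, since four symbols give only 15 non-empty bundles. The encoding keeps the bundles apart, which is all the first requirement asks, but nothing in it ties a symbol such as e to any articulatory or acoustic property, which is exactly what the second requirement forbids.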

Even if we keep well within the loose bounds established by the second requirement, we tend to have rather strong feelings about the degrees of appropriateness of different systems that meet the first requirement. I think these feelings are the manifestations of an undesirable philosophical bias that might be called ‘elementalism’ or ‘atomism’. Having found a set of phonons that will do what must be done for a particular language, we tend to reify those particular elements, assigning them an independent reality that they need not actually have.

The following analogy may help. There is an age-old sophomoric dispute as to whether a triangle is equal to, or greater than, the sum of its parts. Either view is incorrect, because the word ‘sum’ is out of place. We are obliged to recognize that a triangle is not a ‘sum’ of anything (unless that term be drastically redefined), but an arrangement of parts: a matter of geometry, not of arithmetic. Once this is recognized, it is rather easy to see that a triangle can be decomposed in more than one way, with a concomitant difference in the rules of assembly. We can view a triangle as composed of three line segments arranged in a certain way, or, equally well, as composed of three wedges (Figure 3). Either approach should yield a consistent treatment of the geometry of triangles, and any geometric truth about triangles statable in either treatment should also be statable in the other. Perhaps one treatment would prove easier or simpler than the other. But it would be pointless to contend that line segments are more ‘real’ or ‘natural’ than wedges, or vice versa.

Figure 3

(3) A third empirical requirement for stepmatricial systems is that the stepmatrices of the system serve as suitable points of departure for the functions that map phonological material into the speech signal. If there is a core of empirical truth in the phonemic theory, it is that a speaker’s articulatory motions map a discrete array of all-or-none elements into a continuous signal, and that, to understand what is said, a hearer must requantize the incoming continuous signal, thus recovering or reconstituting the discrete array (even though doubtless there are many situations in which the reconstitution need not be complete). The continuizing functions that generate the speech signal, then, have stepmatrices of phonons as their arguments. We could think of these functions as ‘rewrite rules’, if we wanted to, but they differ from any set of rules within what we ordinarily think of as a grammar for a language in two crucial ways: first, in that there are necessarily a nondenumerable infinity of minimal rules, so that they can be formulated only via composite rules; and, second, in that they are stochastic rather than determinate. The general structure of the phonon-to-speech-signal functions is discussed in §4, where the points just made are spelled out in greater detail.

Because any two stepmatricial systems that meet the first two requirements are mutually convertible, it is obvious that any system meeting those two requirements will also satisfy the third, though not necessarily with great efficiency. The third requirement is useful in that it does away with one argument that a proponent of a ‘phonetically realistic’ system of the Aleph type might otherwise bring to bear against a supporter of economy.

The Aleph-type phonologist wants to include redundant features because they play a part in identifying sentences, especially in the presence of noise. A phonologist of the Gimel persuasion can argue that it is not necessary to set up phonons for materials that will be added by the phonon-to-speech-signal functions, so that one might as well seek the simplest stepmatricial system that meets the other requirements. After all, he can argue, what reaches the ears of a hearer is the speech signal, not a phonon stepmatrix of either the Gimel or the Aleph type.

However, the seeker of economy is not completely disenthralled by the point just made. One should like to minimize the complexity of the phonon-to-speech-signal functions, and there is no guarantee that this is accomplished by the stepmatricial system that might be judged simplest from some other point of view. This goal is discussed in §4, especially §4.5. Furthermore, there is still a fourth requirement, one that, unfortunately, may be in conflict with the aim just mentioned, so that some compromise must be sought:

(4) The stepmatricial system must be a convenient target for the operation of the rules of the partial grammar G", the part of the grammar G that delinearizes. There is here, again, no a-priori reason why the portrayal of the phonological system that best meets this requirement should be the simplest of all those that satisfy the other requirements. The rules of G" may be easier to formulate if the stepmatricial system is a bit fuller than a supporter of the Gimel system would prefer. If so, there seems to be no reason for adding special rules to prune out the redundancy.

3.3. The Three Formats for Problem Two. In order to investigate the second of the three problems set forth at the end of §3.0, we must establish some at least tentative answer to the third. I shall base my tentative answer on the kind of phenomena we used to treat, in pregenerative days, under the rubric ‘morphophonemics’. I shall assume that the terminal subalphabet of G' is a finite set of characters called morphophonemes or, using the convenient short term proposed by Lamb, morphons; and I shall assume that these morphons are very much like what we used to call ‘morphophonemes’. A terminal string from G' is then a morphon string.

I shall further assume that, with one sort of exception to be described in a moment, the effective rule chains of G (§2.3) terminate within G'. That is, whatever format be possible for G", its workings can be provided for (to revert to the computing machine example) by permanent wiring and interlocks, never requiring independent switch-setting as part of input.

There are certain trivial cases (and perhaps some not so trivial, to be considered later) in which the second assumption meets with difficulties. The trivial ones are the cases of what we used to call ‘free alternation’. In Potawatomi, glottal catch is clearly distinctive: /m?we/ is ‘wolf’; /mwe/ is not a word at all. But within the phrase, after a word ending in a consonant, it does not seem to matter whether the next word begins with glottal catch followed by a vowel or just with the vowel. By ‘it does not seem to matter’, I mean that hearers ignore this difference as they decode what has been said, and that, accordingly, speakers do not use the difference to distinguish between meanings but, as it were, toss a coin each time to decide whether to pronounce or omit the glottal catch. In a pregenerative frame of reference such a state of affairs is embarrassing: one would like to regard the glottal catch in the particular environment as nondistinctive, but the environment in question cannot be described without mentioning things (such as words) that have no status within the phonological system.

In a generative model, such a free alternation can be provided for by a nonobligatory rule. Yet we do not wish this particular sort of nonobligatory rule to occur in effective rule chains. Accordingly, we specify that any ‘free alternation’ rule is to be assigned to G" rather than to G', and that its inclusion in or exclusion from any rule chain is to be controlled by an interlock device that, like a speaker, tosses a coin.

G", then, is to be activated by any morphon string received from G', and is to generate a stepmatrix; further, the mapping of morphon strings into stepmatrices is to be a function, save just for instances of free alternation.

There are at least three distinct formats for a G" that will meet these specifications. I shall call them respectively the rewrite format, the realizational format, and the stepmatricial format.

3.4.

The Rewrite Format. In this, G" involves a finite set of rewrite rules much like those of a linear grammar, but with some crucial differences. The instrings and outstrings of the rules are stepstrings of a componential alphabet Q"(G"). Terminal strings are stepmatrices of a componential alphabet T", where T" ⊂ Q". G" must provide a rule chain for any possible morphon string generated by G', and that rule chain must be unique except for cases of free alternation. An ‘initial’ rule of G" is defined not in terms of an arbitrary instring I but as one which will accept a morphon string from G' as instring.

At this point there may seem to be a difficulty. Relative to G', a morphon string is not a stepstring but merely a string. Yet the rules of G" are supposed to accept only stepstrings. The solution lies in the fact that in switching from G' to G" we switch alphabets. Relative to the terminal subalphabet T'(G'), a morphon string is a simple (linear) string. But relative to the componential alphabet Q"(G"), each morphon string must be a stepstring. To guarantee this, we merely require that T'(G') ⊆ Q". Relative to Q", then, a morphon string is a stepstring whose constituent subcharacters consist each of a single component. These components can be quite arbitrary: they need not recur in any of the other characters or subcharacters of Q".

Since the special task of G" is to delinearize, it is also necessary that the very first rule of any rule chain in G" begin the delinearization. That is, the very first rule must rewrite at least one morphon of the instring as a subcharacter with more than one component. If this were not the case, then the rule would be rewriting one simple string as another simple string, a purely linear manipulation, which by definition is to be taken care of by the partial grammar G', not by G".

As soon as delinearization has begun, this format for G" allows the Hallean type of context-sensitive rule, not available in a linear grammar. (See §2.2 (4) and Halle 1962.) A rule of the form $a \to b$ in the environment $x\,\underset{z}{\underline{\quad}}\,y$ can be interpreted as follows. Suppose a stepstring $t$ can be cut into $x$, $a'$, and $y$, and that $a'$ can be sliced into $a$ and $z$. Then the stepstring $t$ is acceptable to the rule, and the corresponding yield (the ‘outstepstring’ or ‘stepoutstring’) is $xb'y$, where $b' = b \cup z$.

3.5.

Rewrite Rules for Potawatomi Morphophonemics. We shall now illustrate the rewrite format for a partial grammar G", using Potawatomi as the language. Potawatomi is useful for this because its morphophonemic behavior is rather complicated. Since I know nothing of Potawatomi intonation, we simply leave it out; it is doubtful that taking it into account would render matters any simpler or neater. I assume (perhaps incorrectly) that internal open juncture is inaudible and hence not phonologically distinctive. It will be phonetically helpful to associate the letter ‘u’ in the transcriptions with the vowel of English cup rather than with a high back rounded vowel.8

We need 31 morphons: {p t T č k p: t: T: č: k: s š s: S: š: m n N w y ? # U O u o e i a - +}. The number could be reduced to 25 by viewing ‘:’ as a separate morphon, but our rules here will treat p:, t:, and so on as units. In addition, we shall use the symbol ‘©’ to mean ‘boundary of string’ in the specification of environments. That is, ‘after ©’ means initially, and ‘before ©’ means finally.

We divide the rules into two sets, C-rules and R-rules. Every rule, when applied to a string (or stepstring), is to rewrite all suitably environed occurrences of its operand (§2.2 (1)). The C-rules apply first, and rewrite all morphons except + and - as simultaneous bundles of components; the R-rules then adjust the components in terms of simultaneous and successive environments. There are sixteen components:

| St | stopness
| Sp |
| Ob | obstruence
| Na | nasality
| Sm |
| Sn | sonorance
| Gl |
| Ft | fortisness
| Cn | consonantality
| Vl | vocality
| Lb | labiality
| Ap | apicality
| Pa | palatality
| Do | dorsality
*| W | weakness
*| Pp | palatalizability

Of these, the two marked with an asterisk do not appear in terminal stepmatrices. The other fourteen, all of which do, are phonons. After all relevant rules have been applied to a morphon string, the result is a phonon stepmatrix. The fourteen phonons occur only in certain simultaneous bundles, and it will be convenient to represent stepmatrices linearly by sequences of symbols that represent the bundles. The symbols to be used in this way (enclosed between slant lines), and the bundles they represent, are as follows:

8 Hockett 1948a.



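/p/   | St Lb Ob Cn |
/t/   | St Ap Ob Cn |
/č/   | St Pa Ob Cn |
/k/   | St Do Ob Cn |
/p:/  | St Lb Ft Ob Cn |
/t:/  | St Ap Ft Ob Cn |
/č:/  | St Pa Ft Ob Cn |
/k:/  | St Do Ft Ob Cn |
/s/   | Sp Ap Ob Cn |
/š/   | Sp Pa Ob Cn |
/s:/  | Sp Ap Ft Ob Cn |
/š:/  | Sp Pa Ft Ob Cn |
/m/   | Na Lb Sn Cn |
/n/   | Na Ap Sn Cn |
/w/   | Sm Lb Sn Cn |
/y/   | Sm Pa Sn Cn |
/?/   | Gl Cn |
/u/   | Vl |
/o/   | Vl Lb |
/e/   | Vl Ap |
/i/   | Vl Pa |
/a/   | Vl Do |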

The first 26 C-rules can be applied in any order. We list, with them but unnumbered, three bogus rules that merely replace one symbol for a component by another. The change of notation is typographically and mnemonically convenient, but is not a true rewriting because the substitution is strictly one-to-one. Cl.

p —*






Lb C3.

T -►







St Pa

Pp C5.



p: ->

St Lb


Ft C7.

t: -*



T: -

Ap Ft C9.

5: ->

cn. s -»


St PP Ft








Sp Ap




Sp Pa




s: —»



S: —»

Sp Pp

Ap Ft C15.

s: —>




m —>

Na Lb

Pa Ft C17.

n —>







Ap C19.

w —»








? —>




U —►



# -





Lb W


u —»





Lb C24.






Ap C26.

a —►

VI Do •

The next two C-rules may be applied in either order, but must follow the first 26: C27.

| 0 | —>

Ob | in env

or St



Sn | in env

Sp or



The last C-rule must apply after the preceding two.


provide economically for the addition of redundant components: C29.

0 | -> | Cn | in env

or Ob

or Sn












©OkUma©. The rules that apply are C5, C16, C21, C22, C26, C27, C28, and C29; the result is

\[ \left| \begin{array}{ccccc} Vl & Cn & Vl & Cn & Vl \\ Lb & Ob & W & Sn & Do \\ W & St & & Na & \\ & Do & & Lb & \end{array} \right| \]
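The two-step character of the C-rules for this example (first a morphon-by-morphon mapping, then the economical addition of redundant components) can be sketched in Python. Everything specific here is an assumption reconstructed from the displayed result and from the bundle /k/ = | Cn Ob St Do | given in the text, not a transcription of the rules themselves: the bundles in `BASIC`, and the environments used for C27 (Ob added to St or Sp), C28 (Sn added to Na or Sm), and C29 (Cn added to Ob, Sn, or Gl). The names `BASIC`, `add_redundant`, and `apply_c_rules` are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

# Assumed outputs of the first-stage C-rules for the five morphons of OkUma.
BASIC: Dict[str, FrozenSet[str]] = {
    "O": frozenset({"Vl", "Lb", "W"}),
    "k": frozenset({"St", "Do"}),
    "U": frozenset({"Vl", "W"}),
    "m": frozenset({"Na", "Lb"}),
    "a": frozenset({"Vl", "Do"}),
}


def add_redundant(bundle: FrozenSet[str]) -> FrozenSet[str]:
    """Add predictable components in the order C27, C28, then C29 (assumed environments)."""
    result = set(bundle)
    if result & {"St", "Sp"}:        # C27: add Ob in the environment St or Sp
        result.add("Ob")
    if result & {"Na", "Sm"}:        # C28: add Sn in the environment Na or Sm
        result.add("Sn")
    if result & {"Ob", "Sn", "Gl"}:  # C29: add Cn; must follow C27 and C28
        result.add("Cn")
    return frozenset(result)


def apply_c_rules(morphons: str) -> Tuple[FrozenSet[str], ...]:
    """Map a morphon string to a stepstring, one bundle per morphon."""
    return tuple(add_redundant(BASIC[m]) for m in morphons)


for morphon, bundle in zip("OkUma", apply_c_rules("OkUma")):
    print(morphon, " ".join(sorted(bundle)))
```

The ordering constraint stated for the last C-rule shows up directly: if the test for Cn ran before Ob and Sn had been added, the consonants would never receive Cn.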

Since the C-rules (plus the three bogus rules) completely eliminate all the symbols used for morphons, except - and +, we are now free to reintroduce any or all of those symbols in new values. We do so merely as a matter of convenience in notation, to achieve compactness in the statement of the R-rules: If X is any symbol used earlier for a morphon, we shall now use X to denote exactly the simultaneous bundle of components into which the C-rules map that morphon. For example, before the application of the C-rules the symbol ‘k’ denotes a morphon. After the application of the C-rules, the same symbol ‘k’ is defined as linear shorthand for the simultaneous bundle

| Cn Ob St Do |

Thus, the array displayed above, which results from the application of the C-rules to the morphon-string ©OkUma©, can be represented exactly by the notation ‘©OkUma©’. The conventions by which we allow such notations at this point are not a new set of ‘rewrite rules’ that undo what has just been done, but merely a matter of convenience.

Further conventions for the stating of the R-rules are the following: If X is the symbol for some component, we shall continue to enclose X in vertical lines when we refer to the component; if the vertical lines are omitted, then X represents ambiguously and indifferently any simultaneous bundle that contains the component. Thus:

| St | is a component;
St is any of the set p t T č k p: t: T: č: k: .

‘C’ will denote any sequence of one or more Cn. ‘V’ will denote any Vl that does not contain the component | W |. ‘+’ will denote either itself or ©.

The R-rules are strictly ordered. That is, given any output from the C-rules, each R-rule must be considered in turn to see whether or not it applies. Three of the R-rules are optional: that is, if the conditions for their application are met, they can nevertheless be applied or skipped at will. These three are marked by a plus-or-minus sign (±). One rule, marked with an asterisk (*), while not optional, yields optionally either of two results from either of two operands. Parentheses enclose what may be present or absent.

R1.


| Pa




Pp 1










in env Pa


Na R3.


in env _i


1 Ap I-


in env +__C .


in env

_C .

in env C__c.


- —»• u -> 0.



in env VI.



or R9. RIO.

w 1w 1


in env_VI.

0 —>


in env -f(C)WC_+ or

Rl 1.



_ o.

0 |

C__c+ •

in env Q_-




where i is even, in X(C1)WC2W...CnWCn+i7, in which X ends with +, or with V if Cx is present, and 7 begins with V, +, or W+. R12.




0. _o

in env or

in env_l


or R15.




_y .

in env C(+)_C or

R16. *R17. ±R18.









in env +_V .



in env



Ft R19.





in env _ _




X where | X | is | Ap | or | Pa |. R20.

| 0 |


| Ft |

in env

SpOb Ft_


| Ft |

| 0 |

in env

Ob Ob(Ob(Ob))


(Ob)ObOb Ob



St V




? St

or or


St |(+) +Ob .



±R22. R23.




in env C+_.. 0.

We are now ready for examples. Each example is given as a morphon string and as a phonon stepmatrix, the latter in the linearized notation. The R-rules involved in each example are listed. The C-rules are not, since such listing would hardly be helpful.

E1.

n-nUkUtUN-we /nnuktuswe/.


kwUtUmoT-ke—# ‘he is out fishing’. 15,16. /ktumocke/.


n-pUk:UT-wep-n-a‘I release him’. Rl,6,7,11,12,21. /npukcuwepna/.


nUt-pUk:UT-wep-n-a (same). k:ucwepna/.




?esU-S:-k ‘the way it lies’. Rl,6,7,10,12,19./?es:uk/.


n-nUkUtUN-a ‘I beat him in a race’. /nnuktuna/.


OkUma ‘chief’. R4,11,12. /wkuma/.


nUt OkUma Um‘my chief’. R7,8,11,12: /ntokmam/; or R5,7,8,11,12,16: /ntokumam/.


‘I win in a race5.

Rl,2,6,ll,12. Rl,6,7,l 1,12,

Rl,6,7,l 1,12./ntup-

S:—n ‘it lies thus’. Rl,6,7,10,11,12. /sus:un/. R3,6,7,ll,12.



sees his wife’. R6,7,8,10,11,12,13,15,17,23. /(P)wapmantuk:weyomun/. El 1.

wUt — Uk:weyo — Um — Un + w — wap — Um — a — Un (same). R6,7,8,10,11,12,13,17,23. /wtuk:weyomun(?)wapman/.


kUt—sya—mUn+Potan ‘we’re going to town’. R6,7,10, 11,12,22,23: /ktusyamunotan/; or R6,7,10,11,12,23: /ktusyamunPotan/.


pUm—y—ik /pmuPik/.

‘they are




nUt—kUk:—?w—a ‘I choose him’. k:u?wa/.


R6,7,ll,12. /ntuk-




n—kUk:—?w—a (same). R6,7,ll,12./nkukPwa/.


n—wap—Um—a+UmUk:0 ‘I see the beaver’. R6,7,ll, 12.21.23. /nwapmamuk/.




/mukmwapma/. Uk:we+n—wap—Um—a ‘I see the woman’.


21.23. /kwenwapma/. n—wap—Um—a+Uk:we (same). R6,7,12,23. /nwapma-


k:we/. n—mUsUnUPUkUn ‘my paper’.


nuPkun/. nUt—Uk:U—im ‘my land’. R7,9,ll,12. /ntuk:im/.


UnUnU#w-Uk ‘men’. R7,10,l 1,12,16. /nunwuk/.

E23. E24.

w—os:—Un ‘his father’. R7,10,13. /?os:un/. Po+UtU+UnUnU#w‘that man’. RIO,11,12,15,16,23./


Potununu/. UmUk:0 +?o +UtU /muk:otu/;





R6,10,11,12. /nmus-



R6,7,l 1,12,23.




21,23: /mukPotu/. Pes:UpUn ‘raccoon’. RIO,12,20,21. /?esp:un/. Index:









R5: R6: R7:




R9: R10: Rl 1: R12:


E2,3,4,5,6,7,9,10,11,12,13,14,15,16,17,18,19,21,22,23. E21. E5,6,10,11,12,20,22,23,24,25,26.

El,2,3,4,5,7,8,9,10,11,12,13,14,15,16,17,20,21,22,24,25. El,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22, 24,25,26.


























The Rewrite Format: Discussion. The treatment of Potawatomi morphophonemics in the preceding section is admittedly not easy to read; the following discussion should help. (We use ‘string’ to mean either string or stepstring.)

The system involves 52 rules in all. The way to ‘turn the crank’ and make the system work is as follows. Take a large sheet of paper with 53 horizontal ruled spaces. Insert a morphon string (e.g., one of the 26 given as examples) from left to right in the topmost space. List the 52 rules down a column on the left, one to each space from the second to the last, in just the order in which they are given in §3.5. Now inspect the morphon string and the first rule. If the rule will accept the string, apply the rule and write the outstring to the right of the rule, under the original string. If the rule will not accept the string, check the next rule. Consider each rule in turn, always checking its applicability to the last string already entered on the sheet and ignoring any strings that appear above the last one. Since a rule preceded by the sign ‘±’ is a free-alternation rule, it can be skipped even if a string is acceptable to it; toss a coin to decide. Remember that each rule that is applied is to operate on all suitably environed occurrences of its operand. When all the rules have been checked, the last string that has been entered on the sheet is the desired terminal stepmatrix. The rule chain that has been used can also be read from the sheet: it consists of all those rules next to which a string has been entered.

This shows that the rules of the rewrite format are ordered. If Rule R precedes rule R' in the list, then there may be a rule chain that includes R and R' in that order, but there can be none in which R' precedes R. It is important to note, however, that this ordering is only partly functional. The position of two rules relative to each other is functional just in case one of them leaves untouched, or may generate, environment to which the other may refer. On this basis, the first 26 of the Potawatomi rules must precede all the others, but could be permuted in any way among themselves without modifying in the slightest the mapping of morphon strings into phonon stepmatrices. The next two rules must follow the first 26, and must precede the rest, but could be reversed in order. The remaining 24 rules must have just the order in which they appear.
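The crank-turning procedure is an algorithm, and a small Python sketch may make the bookkeeping easier to follow. It is deliberately a toy: the rules operate on plain character strings rather than stepstrings, an environment is reduced to simple substring matching, and the two rules T1 and T2 are invented for illustration and have nothing to do with Potawatomi. The function name `turn_crank` and the `coin` parameter (standing in for the coin-tossing interlock) are likewise assumptions.

```python
import random
from typing import Callable, List, Tuple

# A toy rule: (name, operand, replacement, optional?).
Rule = Tuple[str, str, str, bool]


def turn_crank(
    instring: str,
    rules: List[Rule],
    coin: Callable[[], bool] = lambda: random.random() < 0.5,
) -> Tuple[str, List[str]]:
    """Consider each rule once, in list order, against the last string on the sheet.

    Returns the terminal string and the rule chain actually used.
    """
    sheet = [instring]      # the strings entered so far, top to bottom
    chain: List[str] = []   # names of the rules next to which a string was entered
    for name, operand, replacement, optional in rules:
        current = sheet[-1]
        if operand not in current:
            continue        # the rule will not accept the string
        if optional and not coin():
            continue        # a free-alternation rule, skipped on a coin toss
        # Rewrite every (non-overlapping) occurrence of the operand.
        sheet.append(current.replace(operand, replacement))
        chain.append(name)
    return sheet[-1], chain


T1: Rule = ("T1", "a", "b", False)
T2: Rule = ("T2", "bb", "c", False)

print(turn_crank("ab", [T1, T2]))  # ('c', ['T1', 'T2'])
print(turn_crank("ab", [T2, T1]))  # ('bb', ['T1'])
```

The two runs illustrate what the text calls a functional ordering: T1 generates the environment that T2 refers to, so exchanging them changes the mapping. Rules whose operands and outputs never interact could be permuted freely without changing the result.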


The Realizational Format. We can best introduce this by discussing the rewrite format for a moment more. Recall that the first 26 of the Potawatomi rewrite format rules have no inherently necessary order. Suppose we think of ‘turning the crank’ of the system in two stages instead of one. The second stage, using the last 26 rules, will be just like the single-step crank-turning described above, except that it will operate on the output of the first stage instead of on morphon strings. The first stage will involve only the first 26 rules plus, for convenience, the three bogus rules listed with them in §3.5 and two further bogus rules that rewrite — as —, + as +. That gives us one rule for each morphon. We take a sheet of paper ruled into two horizontal strips. We write a morphon string down in the upper strip. We have the rules listed in some convenient place for ready reference, but not on the sheet of paper and not in any particular order. We now select any term of the morphon string (not necessarily the first), find the rule that will operate on that term, and write down, in the lower strip of the paper, directly below the term in question, the subcharacter required by the rule. We then select any other term of the morphon string and do the same thing. We proceed in this way until all the terms of the morphon string have been dealt with. We have then generated, in the lower strip of the paper, a stepstring, in accordance with the morphon string and the 31 rules.

In the first stage, as described, the rules being used are strictly unordered. Also, the order in which we consider different terms of the morphon string is entirely arbitrary. Since this is so, we are certainly free, as a matter of convenience, to move along the morphon string from left to right, operating on each term in succession; but we do not have to do this in order to guarantee that our results will be correct. We also do not have to be sure that we check all the rules to see if they will apply. As the workings of the rewrite format system were described in §3.5, each of the C-rules was to rewrite all occurrences of its operand in the instring. That requirement does not hold in this new way of applying them: instead, a rule is applied to one term at a time of the morphon string. Indeed, in this way of working the system we do not rewrite anything. The morphon string is not erased, even metaphorically, as we apply the rules; it is still there when we are done. We have added something: the stepstring that accords with the given morphon string under the requirements of the rules.

These are the crucial clues to the realizational format. In this format for a partial grammar G", all the rules of G" are like the first 26 of §3.5: the order in which they are applied is determined entirely by the order in which one chooses to consider the terms of the morphon string.
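The claim that the order of consulting terms is immaterial in the first stage can be checked on a small scale. The sketch below assumes a made-up table `REALIZATION` with one entry per morphon (the subcharacter labels are placeholders, not the values of §3.5); the upper strip is never altered, and the lower strip is filled in whatever order is requested.

```python
import random

# Hypothetical one-rule-per-morphon table; the values are placeholders.
REALIZATION = {"p": "St/Lb", "t": "St/Ap", "n": "Na/Ap", "-": "0"}


def realize(morphons, order=None):
    """Fill the lower strip, visiting the terms in the given order."""
    lower = [None] * len(morphons)  # the lower strip, initially blank
    positions = range(len(morphons)) if order is None else order
    for i in positions:
        # The upper strip (morphons) is only read, never erased or rewritten.
        lower[i] = REALIZATION[morphons[i]]
    return lower


def realize_shuffled(morphons, seed):
    """Same as realize, but visit the terms in a pseudo-random order."""
    order = list(range(len(morphons)))
    random.Random(seed).shuffle(order)
    return realize(morphons, order)
```

Because each entry of the lower strip depends only on the term directly above it, left-to-right traversal is merely a convenience; any other traversal gives the same stepstring.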

There is, however, this crucial difference: realizational rules are context-sensitive, so that a given morphon may be realized by different subcharacters in different environments. The environments for a particular realization must be described with reference to the morphon string in which the operand occurs. This is necessary, since the morphon string is the only string involved in the application of the system except for the stepstring that one is in the process of generating. It is also possible, since the morphon string is not erased or altered in any way by the process, but is there as long as one needs it.

Formally, the realizational format involves two alphabets: the alphabet T'(G') and a componential alphabet Q"(G"). There is one composite rule for each morphon of the alphabet T'(G'); it maps that morphon into one or another subcharacter of Q"(G"), depending on the morphon environment in which the given morphon occurs. The composite realizational rules are strictly unordered; however, the constituent minimal rules of a single composite rule may be ordered so as to render as simple as possible the specification of environments.

3.8. Realizational Rules for Potawatomi Morphophonemics. There are a few minor discrepancies between this treatment of Potawatomi morphophonemics and that of §3.5. Morphon strings look the same for both, but here : is treated as a separate morphon rather than as part of several others. The phonons are fewer for this treatment: |Ob|, |Cn|, and |Sn|, recognized in §3.5, are here ignored. A few rare situations not encountered in the examples given at the end of §3.5 are not provided for here, though they were in the rewrite format. The 25 morphons for this treatment, then, are: {ptTcksSs: mnNwy?#UOuoeia — +}. We use the symbol © as before. The following cover symbols are used in specifying environments:

St = p t T c k
Sp = s S s
Ob = St Sp
Sm = w y
Ct = Ob Sm m n N ? #
C = any string of one or more Ct
C' = any C except one beginning with s
V = u o e i a
W = —  U  —U  U—  U—U  O  —O  O—  U—O  O—U  O—O
W' = any W except —

Parentheses around something indicate that it may be present or absent. The specification of environments also involves two subscripts, 1 and 2. To interpret the specification of environments in applying the rules to a given morphon string, it is first necessary to scan the string and index certain occurrences of C with these subscripts. The procedure for indexing refers separately to each substring bounded by successive occurrences of + or ©: we use ‘+’ to represent both.

First: (a) If the substring is +(C)WCW+, index the last C with 2, yielding +(C)WC2W+. (b) If the substring ends with ...CW+ but does not as a whole conform to the specification of (a), index the last C with 1, yielding ...C1W+. (c) If the substring ends with ...CWC+, index the last two C’s respectively with 2 and 1, yielding ...C2WC1+. (d) If the substring ends with ...CXC+, where X is not W, index as ...CXC1+.

Second: Scan the substring from left to right, looking for sequences of any of the following three shapes in which C’s have not yet been indexed:

+W(CW...CW)
+CW(CW...CW)
VCW(CW...CW)

What immediately follows, however, must not be another CW with unindexed C. In any such sequence, index with 1 and 2 alternately beginning at the left:

+1W(C2WC1W...)
+C1W(C2WC1W...)
VC1W(C2WC1W...)

Note that in a sequence of the first type the initial + is indexed. However, in the second step: if any W is —O, U—O, or O—O, and the immediately preceding C has been indexed with 2, then the immediately following C may optionally be indexed with either 1 or 2; thereafter the indices alternate as before.

Third: if any C’s remain unindexed, index with 2.

In the specifications of environments in the rules, ‘C’ with no index refers to any C, but ‘C1’ and ‘C2’ refer only to C’s that have been indexed that way by the above procedure.9

In the statement of the realizational rules we use an arrow to mean ‘is realized as’. The rules are strictly unordered, but within each rule there are subrules for different environments, and these subrules must be considered in the order in which they are listed. Thus, when consulting rule 13 for the morphon w, one must first (as indicated by the cross-reference) check to see whether the w fits the environment given for the spaced-out combination :...w in rule 9; if it does not, then one proceeds with the subrules of rule 13, in order, until the proper environment is found.
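The arrangement just described (unordered composite rules, each with ordered subrules that consult only the morphon string) can be sketched as follows. Everything named here is an assumption made for illustration: the morphons, environments, and subcharacter labels in `RULES` are toy values, not the Potawatomi rules tabulated below, and the indexing of C’s is left out entirely.

```python
# Hypothetical toy rules. An environment is a predicate over the WHOLE
# morphon string and a position in it; the output strip is never consulted.
def before(symbols):
    """Environment: the next morphon is one of `symbols`."""
    return lambda s, i: i + 1 < len(s) and s[i + 1] in symbols


def all_others(s, i):
    """The catch-all environment listed last in a composite rule."""
    return True


# One composite rule per morphon (dict order is irrelevant); the subrules
# inside each composite rule are tried in the order listed.
RULES = {
    "w": [(before("o"), "0"), (all_others, "Sm/Lb")],
    "o": [(all_others, "Vl/o")],
    "k": [(before("w"), "St/Do/Lb"), (all_others, "St/Do")],
}


def realize_term(morphons, i):
    """Realize the i-th morphon by the first subrule whose environment fits."""
    for environment, subcharacter in RULES[morphons[i]]:
        if environment(morphons, i):
            return subcharacter
    raise ValueError("no subrule applies to term %d" % i)


def realize(morphons):
    return [realize_term(morphons, i) for i in range(len(morphons))]
```

Putting ‘all others’ last is what lets the earlier environments stay short; if the subrules were unordered, each environment would have to exclude the others explicitly.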







realization —►

St Lb





St Ap




_(0 -c 1 _(0(—)i J


all others


Pa St Ap





Pa 5.




St Do

9It might seem that this complex procedure of indexing consonantal morphons constitutes a ‘rewriting’, so that the whole conversion from morphons to phonons is not being accomplished, as claimed, by realizational rules. I think this is not so. The indices introduced by the procedure do not appear in the realizational rules as items to be realized, but only in the descriptions of environments. Thus the indexing procedure is logically a part of the description of morphon environment; it could be replaced by a very much more complicated statement of environments for certain of the individual rules, but the loss in clarity would be great, with no compensating gain.











all others


Sp Ap



—h-s _:X-N-C —:i-N(-)i _fV's



—:2—s _:-C'




all others

Sp Ap






—(:)i^N(-)i all others

Sp Pa



:...? :...w

St_(S,m)1(lF)+(PP)_ S/_(Sm)1(lF)+_fVw 1


S't__(5m)1(W0+_(_)oj St_0Sm)1(lF)+_w


0... | G1 | ■ | Ft J...0

0...0 alternation)

Ob_xWOb ObObxWOb_ St_XW? ? xWSt_ St__(Sm\(W)@ St_0Sm\(W)+Ob ®(W)Stz_


all others

| Ft |


(in free










realization Na Ap









_—c ■



Pa Na

all others

Ap 13.

:...w w

(first see rule 9)

_1lVw ->

—(-)o _2WO

C{Sm\{W)+_fFw C(5m)1(lF)+__(-)o C(SmUlV)+W__z(W)0 ©_Ww



C(SmUW)+_iWC\ all others

I G1

| G1 | 0


(in free alternation)

0 Sm Lb



C_ilL —iWy 1

i G1

_(-)i ] Sm all others


(first see rule 9) J | G1 1

C(Sm\(W)+(W)_ all others

|0 | G1 |

(in free alternation)

rule





16. #






I V1 I

















u -u u u u —


o O -o O -u u -o o o

(see rule 9)





c2_c J C2_+j u o

all others

(Note how this rule is to be applied. When one encounters, in the morphon string, an occurrence of U, O, or —, one is first to see if that occurrence is part of an occurrence of one of the sequences of two or three morphons of those tabulated in the ‘morphon’ column above. If it is, the appropriate rule for the whole sequence is to be applied. If not, the appropriate rule for the single occurrence is applied.)

rule









C ->

| Ft


N!-N_2-C N1-N_2(-)i Sp^WOb_2 all others

- 0

(Since a morphon string includes as many 0’s as one wishes before or after every morphon, to apply this rule one must obviously look not for nulls but for the environments. The rule could be discarded by recasting rules 6, 8, and 12 for s, s, and N, and rules 1-8 for Ob.)

24.



0 •

In testing this set of realizational rules, it should be noted that they will, in the first instance, write the phonon | Ft | as a separate bundle: e.g., the morphon string nUt—pUk:UT—wep—n—a (example E4 of §3.5) will come out as

[Stepmatrix display: the morphons n U t — p U k : U T — w e p — n — a, each written above its bundle of phonons, with | Ft | standing as a separate bundle; 0’s omitted]

for which the linearized notation (by the table given near the beginning of §3.5, but ignoring the three notations ‘Ob’, ‘Cn’, and ‘Sn’) is /ntupk:ucwepna/. However, this means that we are adding a trivial readjustment by which the bundle St Do followed by a separate bundle Ft is changed to the single bundle St Do Ft. We can make the rules themselves yield this result without changing their formulation, merely by adding a convention whereby, whenever a realization rule yields | Ft |, that is to be interpreted as simultaneous with the immediately preceding bundle rather than as a separate bundle in its own right. It is only the notation for this that presents any difficulty.

3.9. Stepmatricial Grammars and the Stepmatricial Format. The rewrite and realizational formats discussed above differ in the nature of the coupling of G' and G". In both cases, the coupling is via the terminal strings of G'. But in the rewrite format these terminal strings turn out, as one passes to G", to be stepstrings of a componential alphabet Q"(G"), so that they are acceptable as instrings to the rules of G". In the realizational format, there is no sudden conversion of alphabet, but merely the conversion worked by the realizational rules themselves, which map strings in the alphabet T'(G') directly into terminal stepmatrices in the componential alphabet Q"(G"). In either case, of course, the terminal morphon strings from G' function, as required, as control input to G".

A third possible format is suggested by a third way of coupling G' and G". Before we can describe this coupling, we must define what will be meant by a stepmatricial grammar. If Q is a componential alphabet, then Q is the set of all stepstrings that belong to F(Q). A stepmatricial grammar is a system Gs(Q, I, Ts, R), where: Q is a componential alphabet; I is a unique stepstring of Q; Ts is a proper subset of Q called its terminal (componential) subalphabet, and is such that I is not one of its stepstrings; and R is a finite set of rules. The postulates for a system Gs are such obvious analogs of those for a linear grammar (§2.1) that we shall not present them in detail but merely specify the crucial difference: namely, that the terminal strings of a system Gs must be not only stepstrings but stepmatrices over the componential subalphabet Ts. Note that if we were to discard requirement (3) in the definition of a componential alphabet (§3.1), then a linear grammar would turn out to be a stepmatricial grammar for which max(n) = 1, so that Q = Q and all stepstrings, including I, would themselves be stepmatrices. A stepmatricial grammar is therefore like a linear grammar in that inputs are rule chains, and in that outputs are terminal strings; it differs in that terminal strings are over a componential alphabet instead of a simple alphabet, and in that context-sensitive rules can refer to simultaneous environment.

Now, since the output harp of a linear grammar is linear, and the input harp to a stepmatricial grammar is also linear, let us propose coupling G' and G" by letting the former be a linear grammar, the latter a stepmatricial grammar, and having the output harp from G' be the input harp for G". To provide for this, it is merely necessary that the set of rules R of the stepmatricial grammar G" have one rule R(μ) for each morphon μ of the terminal subalphabet T'(G'). Obviously, R(μ) must be an initial rule (one that accepts the arbitrary stepstring I as instring) just in case μ occurs initially in some morphon string from G'.

To turn the crank on a G" of this stepmatricial format (as we somewhat arbitrarily call it), take a large sheet of paper, write the arbitrary initial stepstring I at the top, and write the successive terms of a morphon string down the left-hand side in a column. Since each morphon (relative to G') is a rule (relative to G"), one may think of the entries in the column on the left as cross-references to the detailed rules, which are written out in some other convenient place.

No testing is needed to see whether the successive rules apply, for by definition they all do. Therefore, one follows the first rule in the column and rewrites I as it requires; then one applies the second rule of the column to the outstring from the first; and so on, until one reaches the end of the rule chain. The last rewriting is the required terminal stepmatrix.

There is a potential source of waste in this format that may be only apparent. Either a linear or a stepmatricial grammar characterizes exactly both its input harp and its output harp. With the coupling specified for the stepmatricial format, the output harp from G' is characterized by G', but since it is the same as the input harp to G", it is also characterized by G". Obviously it is pointless to make both partial grammars do this one job. However, there is a simple modification that might render the discovery of an actual grammar G = G'G" (for some given language) simpler. Suppose we let the linkage between the two partial grammars be via the set H(G') ∩ C(G"), where this may be a proper subset of H(G') and also of C(G"). That is, in designing the partial grammar G', we can allow it to generate some terminal strings that are not acceptable as inputs to G", if this freedom makes the design simpler and if it indeed generates all terminal strings that are needed as inputs to G". Similarly, in designing the partial grammar G", we can allow it to accept some input rule chains that would actually never come to it from G', if that makes the design easier, provided it does accept and respond properly to all inputs from G' empirically needed for the characterization of the language.10
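The crank-turning for the stepmatricial format, in which each morphon of the chain is itself a rule applied to the outstring of the one before, can be sketched in a few lines. The sketch is illustrative only: the three rules in `RULES`, the phonon labels, and the representation of a stepstring as a list of bundles are all assumptions, not anything taken from the Potawatomi material.

```python
# Hypothetical toy G": a stepstring is a list of bundles (frozensets of
# phonon labels); the arbitrary initial stepstring is the one-term list ["I"].
INITIAL = ["I"]


def append_bundle(*phonons):
    """Build a rule that drops the initial I, if present, and adds a bundle."""
    def rule(stepstring):
        kept = [b for b in stepstring if b != "I"]
        return kept + [frozenset(phonons)]
    return rule


def add_fortis(stepstring):
    """Rule for ':' : make Ft simultaneous with the preceding bundle."""
    return stepstring[:-1] + [stepstring[-1] | {"Ft"}]


# One rule R(mu) per morphon mu; only "n" and "t" are usable as initial rules
# here, since add_fortis needs a preceding bundle to attach to.
RULES = {"n": append_bundle("Na", "Ap"), "t": append_bundle("St", "Ap"),
         ":": add_fortis}


def turn_crank(morphon_string):
    """Apply the rules named by successive morphons; no testing is needed."""
    current = list(INITIAL)
    sheet = [current]  # what would be written down the sheet of paper
    for morphon in morphon_string:
        current = RULES[morphon](current)
        sheet.append(current)
    return current, sheet
```

Note that the rule for ‘:’ refers to what has already been generated, not to the neighbouring morphons; this is the practical difference from the realizational format, where environments are stated over the morphon string alone.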


Comparison and Summary. We must now compare the three proposed formats for a delinearizing partial grammar G" (§3.3), and also consider further the third problem of §3.0, to which, so far, we have given only a tentative answer (§3.3).

First I must offer a frank statement of my own experiences in trying to formulate examples of the three formats. I began with a prejudice in favor of Lamb's realizational approach, and after hitting on the cute notion of letting terminal strings from one partial grammar be rule chains for the other, I began to hope that the stepmatricial format would work best. Things did not come out that way. Working with Potawatomi, I found the rewrite format surprisingly easy to set up. The realizational format was rather more difficult. The stepmatricial format was so hard that I gave up. Thinking that all this might stem from the nature of Potawatomi, or from my long-standing habits of handling that language, I turned to the Yawelmani dialect of Yokuts.11 Here, again, the rewrite format was fairly straightforward, but my patience gave out before I could cast the data into either of the other formats.

10At the moment, only one version of this proposal strikes me as possibly advantageous: that of letting H(G') ⊂ C(G"). This was actually done in the sample formulations of §§3.5,8.

11Newman 1944, 1946.

Understandably, then, my current prejudice (for this particular portion of the whole grammar of a language) is for the rewrite format. However, no one should take this too seriously. The report has to do only with relative ease of discovery; simplicity of operation, once the partial grammar G" has actually been formulated in one or another format, is far more important. The issue is not closed.12

12Of course, there are other formats (or further variants of the ones described and illustrated in §3), on which we have not touched. Among these is a rewrite format that includes a transformational cycle: the input is a bracketed string with labelled brackets, and the transformations of the cycle are applied (in order) first to adjust the materials within the innermost pair of brackets and to erase that pair and its label, then for the next-to-innermost, and so on, until all brackets have disappeared. The examples I have seen of this are unconvincing. For English, they do not yield, as terminal strings, anything I am able to recognize as English; also, unless held in check in some manner not yet described they can yield more distinctions along some scales (such as stress) than can possibly be functional. But here, also, further exploration is assuredly in order. See Chomsky and Miller 1963 and their references.

Nor is the issue as to the proper answer of our third problem (§3.0) closed by the tentative answer we gave: that is, our proposal to regard the break between G' and G" as falling exactly where delinearization is to begin, to think of the characters of the subalphabet T'(G') as much like old-fashioned morphophonemes, and to allow no optional rules in G" except in the case of free alternation. Any one, or all three, of the parts of that tentative answer may be wrong: some alternative may be formally more elegant, empirically more effective, or both. Three points must be made.

(1) It might be argued that there should be no break: that is, that G cannot profitably be decomposed into two coupled partial grammars. For a spoken language, this is tantamount to asserting that G should itself be a stepmatricial grammar (or, allowing for sources of difficulty to which we shall come later, some other kind of grammar that yields stepmatrices as its outputs). To posit the desirability of decomposition is to subscribe to a sort of stratificational view of language design, involving at least three strata: that of (original or absolute) input (arrays of rules, inputs to G'); the morphon stratum (linear: outputs from G', inputs to G"); and the phonon stratum (stepmatricial). The term ‘stratum’ is Lamb’s and my own; but the transformationalists, using a different vocabulary, seem to be in agreement on this particular point (see §6.6).

Two types of evidence render the decomposition plausible. The first is that it has been possible to do so much, in a fairly efficient way, within the constraint of assumed linearity of output; the great bulk of the transformational literature dealing with specific languages attests to this. Decomposition makes it possible to keep almost everything that was worked out within that constraint, assigning it to G', as one adds the empirically necessary delinearization in the form of a separate partial grammar G". Actually, our reference should not be just to the transformational literature: most of what has been discovered about languages in the whole history of linguistics has been set forth linearly, and very little requires reworking merely because we recognize that a stepmatricial harp matches the sentences of a spoken language more closely than does a linear harp.

The second type of evidence lies in the nature of the linkage between paired spoken and written languages, say English. It is clear that the grammars of spoken and of written English are very largely identical. Yet spoken sentences are more closely matched by stepmatrices, whereas written sentences are ordinary strings.13 We may perhaps think of a single generative grammar for English which bifurcates somewhere near the output end, one fork generating terminal strings of letters, the other yielding spoken sentences. The bifurcation is then perhaps in the general vicinity of our break between G' and G".

13This does not imply that stepmatrices might not be useful for the generative grammar of written languages (say, for capital versus lower case and the like). However, their role would be very different.

Let us make this clearer. A common lay view of the relation between speech and writing assigns priority of some sort to writing, of which speech is merely a fleeting reflection. To formalize this view, let M be the set of all (legal) morphon strings, P the set of all phonon stepmatrices, and W the set of all written sentences (for a language like English; not, say, for Chinese); then we can draw the following diagram:

M --g--> W --f'--> P        (f = gf')

Here g is the association from M to W (§1.6), f' that from W to P, and f = gf' that from M to P.14 Now there are all sorts of reasons why the lay view cannot be correct, but in the present context we need only consider one: in the frame of reference of generative grammar, g, f', and f must all be not simply associations but functions, and not merely functions but, quite precisely, surjections. But f' cannot even be a function. This is shown by such written sentences as ‘Take a bow’ or ‘That isn’t arithmetic’, wherein there is no clue as to whether the pronunciation should be /bow/ or /baw/, ‘aríthmetic’ or ‘arithmétic’.

14If f is an association from a set A to a set B, and g an association from B to a set C, then there is a resultant association from A to C that we choose here (and in §7) to denote as fg, rather than as gf; the latter convention is perhaps slightly commoner.

The traditional view of linguistics, based on the undisputed priority of speech over writing in both ontogeny and phylogeny, gives rise to a different diagram:

M --f--> P --g'--> W        (g = fg')

But formally (and in a strict synchronic sense) this is also wrong, since such spoken sentences as ‘It isn’t /miyt/’ or ‘That hasn’t been /red/ yet’ do not tell us whether the spelling should be ‘meet’ or ‘meat’, ‘red’ or ‘read’. The correct diagram, then, showing the bifurcation mentioned earlier, must be like this:

W <--g-- M --f--> P

Here f and g are both surjections; but g' = f⁻¹g and f' = g⁻¹f are merely associations. This seems to accord with the evidence: writing down what someone says aloud is not merely a mechanical encoding, but involves a (trial-and-error) decoding and an encoding; reading aloud what one sees written similarly involves trial-and-error rather than mere mechanical conversion.

(2) In at least one way, it may prove unwise to keep G" free of communicatively significant optional rules. In our sample handling of Potawatomi, we confronted a case of free alternation and decided that we could allow it to be provided for within G" rather than within G' because the difference seems not to be used by the speakers of the language. However, if I had practical control of Potawatomi I might feel that the alternation is not so much free as stylistic. Certainly stylistic differences are ‘used’ by the speakers of a language, though not in the same way that other differences are. What is the difference between slow formal spoken English ‘I do not know’ and rapid informal /ayow+now/? They are clearly not identical. Perhaps the answer, in such cases, is that the rule strings for the two are identical within G' and differ only in G". Such a way of separating style (or one category of style) from other types of meaning would be very neat. But it would mean that effective rule chains are not, after all, constrained to G' (§3.3). Instead, we should have to recognize that each effective rule chain C is the concatenate of two subchains C' and C", the former confined to G', the latter to G".15

15See Klima 1964 for a fairly extensive attempt to handle stylistic differences; a brief example, rather easier to follow, is Komai 1963.



What I believe is a better frame of reference for problems of this sort will be developed in §7, after another brief mention in §6.6.

(3) I consider it a defect in the morphophonemic formats explored in this chapter, and in all other proposed formats for morphophonemics in a generative grammar, that there is no machinery (at least, no obvious machinery) with which to distinguish between irreducible automatic alternation and other sorts of alternation. We have alternation whenever a single morphon string is mapped into different phonon arrays in different (morphon) environments. Alternation was classed as automatic, in the old frame of reference of linear phonemes, if one alternant, called the base form, was replaced by some other alternant just in those environments in which the base form would be phonologically (or ‘phonemically’) impossible. For instance, the base form of the regular noun plural suffix in English is /z/; it is phonemically impossible for /z/ to occur after, say, ‘cat’ or ‘fox’, so that the appearance of /s/ after ‘cat’, of /iz/ after ‘fox’, is automatic alternation. Now, as we discard linearity, some instances formerly treated as automatic alternation cease to be alternation of any sort. For example, if English /dz/ and /ts/ are treated, in terms of phonons, as something like

/dz/:   Ap  Ap        /ts/:   Ap  Ap
        St  Fr                St  Fr
                              Vn

(‘Fr’ = ‘fricativity’, ‘Vn’ = ‘voicelessness’), where Vn has an effect that stretches through all successive bundles of phonons until a bundle of a certain sort is reached (see §4), then the noun plural suffix is represented in both ‘cads’ and ‘cats’ by /z/ = Ap Fr. However, there are also instances of automatic alternation that remain upon the abandonment of linearity. German ‘bunde’ : ‘bund’ (with /t/), versus ‘bunte’ : ‘bunt’ is one such instance. Cases of this sort are what I mean by irreducible automatic alternation.

Irreducible automatic alternation differs from all other kinds of alternation in that it is clearly forced on the rest of the language by the phonological system: by the fact that phonons occur in only certain arrangements, to the exclusion of others. When the rest of the morphophonemics tries, as it were, to generate an illegal array of phonons, irreducible automatic alternation takes over and enforces legality. This, it seems to me, is an empirical fact about the design of some languages (perhaps all). Thus, the grammar we write for a language should not obscure the distinction between this particular sort of alternation and other sorts. We must experiment with the various formats that have been suggested for morphophonemics until we discover some modification that provides for this distinction neatly and efficiently.




As promised in §3.2 (3), and rendered obvious by the above title, this section describes the functional connection between phonon stepmatrices and the speech signal. The issue is tangential to our main inquiry. It is necessary here to use kinds of mathematics not touched in the survey of §1; but the chapter can be skipped without loss of continuity.1

4.1. Non-Probabilistic Approximation. The speech signal can be viewed as a continuous time-dependent v-dimensional vector y(t), y = (y_k)_v. The dimensionality v is the number of independent parameters that must be specified to characterize the speech signal at any given instant. We do not care what value v has as long as it is finite; actually, it is probably fairly small, say a dozen at the most. We assume that each parameter is bounded, and specify choice of units so that for each the bounds are −1 and 1. Thus, for all k and at any time t, −1 ≤ y_k(t) ≤ 1.

[Figure 12: rule array, R1 → R2 → ...]

array shown in that figure is not a string. If we feel that doublebased transformations are empirically useful, then we are forced to conclude that the input geometry provided by a linear (or stepmatricial) grammar is inadequate: a kind of grammar must be found that allows more complex input geometry. This remark defines the direction our inquiry will take in most of the present section.

However, it should be noted that the

statement is conditional. One might, alternatively, seek to preserve linearity of input by devising some other way to accomplish what double-based transformations do. Recently Chomsky has under¬ taken just this; we shall briefly outline his alternative in §5.5. 5.1.

The Ordered Pair and Unordered Pair Procedures. The most

obvious way to set up a double-based rewrite rule R2 is to specify that such a rule requires, as operand, an ordered pair (slf s2) of instrings rather than a single instring, each of the pair meeting explicit specifications. Each of the instrings, as well as the outstring s' = R2(sy, s2), will be a bracketed string with labelled brackets, in each case with I as the label of the outermost pair of brackets. The specifications for acceptability can be formulated for each instring as illustrated for a single-based transformation in §2.2 (3). If we accept this procedure (hereafter referred to as the ordered pair procedure), we get a certain sort of input geometry. Suppose



we consider the two simple sentences ‘John said so’ and ‘Bill thought so5.3 Nonterminal strings underlying these two can ob¬ viously be conjoined so as to yield several different complex sentences, including in particular ‘John said Bill thought so’ and ‘Bill thought John said so’. Now what is the difference between the rule arrays for these two different complex sentences? Either different rules are involved, or else the same rules appear but with differing interconnections.

In some instances of such pairs of

complex sentences, if not in this particular example, it seems likely that we will want to ascribe the difference to the interconnections rather than to the choice of rules. If so, the difference must be provided for as shown in Figure 13. The point is that in this case ‘John said Bill thought so.’ ‘Bill thought John said so.’ Figure 13

the ordered pair (si, s2) and the ordered pair (s2,

are both

acceptable as operands for the double-based rule R'[. The ordering is indicated in the rule arrays by appropriate numbering of the two arrows converging on R'[. Input geometry thus requires not merely rules and arrows, but rules and numbered arrows (or some equivalent).

There is an alternative, that we shall call the unordered pair procedure; it yields a different input geometry. We can eliminate the ordering of the two instrings for a double-based rule by, as it were, assigning the control of ordering to the instrings themselves. We formulate our rules in such a way that no string is acceptable by a double-based rule as the one to be ‘subordinated’ to the other (that is, to be embedded in the case of an embedding rule, to be represented second in the outstring in the case of a conjoining rule) unless it is explicitly marked for subordination; and we add a special single-based rule R_S to provide for the marking. We may think of R_S as a rule that rewrites any acceptable instring merely by adding the special character ‘S’, say at the right-hand end. The operand of a double-based rule is now not an ordered pair of instrings (s_1, s_2) but an unordered pair {s_1, s_2}, of which (in addition to other specifications) exactly one must end with ‘S’. Whether outstrings from R_S are also acceptable as instrings for some of the single-based rules does not have to be decided here. The need for the numbering of arrows is obviated, and the arrays for our two complex sentences can now appear as in Figure 14.

[Figure 14: rule arrays for ‘John said Bill thought so.’ and ‘Bill thought John said so.’ under the unordered pair procedure]

3 Perhaps the kernel sentences should have ‘it’ instead of ‘so’; for the immediate issue this is not important.
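The marking device can be sketched in a few lines of Python. This is only an illustration under stated assumptions: strings are modelled as tuples of words, the embedding rule (replace a final ‘so’ of the main string by the subordinated string) is hypothetical, and the names r_s, embed_ordered, and embed_unordered are invented here rather than taken from the text.

```python
MARK = "S"  # the special character added by the marking rule


def r_s(s):
    """Marking rule: add the special character at the right-hand end."""
    return s + (MARK,)


def embed_ordered(main, sub):
    """Hypothetical embedding rule on an ORDERED pair (main, sub):
    the final 'so' of `main` is replaced by the whole of `sub`."""
    if not main or main[-1] != "so":
        raise ValueError("main string is not acceptable to this rule")
    return main[:-1] + sub


def embed_unordered(pair):
    """The same rule on an UNORDERED pair: exactly one member must end
    with the mark, and that member is the one to be subordinated."""
    members = list(pair)
    if len(members) != 2:
        raise ValueError("operand must be a pair of distinct strings")
    marked = [s for s in members if s and s[-1] == MARK]
    if len(marked) != 1:
        raise ValueError("exactly one instring must be marked")
    sub = marked[0]
    main = members[0] if members[1] is sub else members[1]
    return embed_ordered(main, sub[:-1])  # strip the mark before embedding


john = ("John", "said", "so")
bill = ("Bill", "thought", "so")

# The choice of which string to subordinate is now a choice of rule (r_s),
# not a property of the pair's ordering.
print(" ".join(embed_unordered(frozenset({john, r_s(bill)}))))
print(" ".join(embed_unordered(frozenset({r_s(john), bill}))))
```

The operand is passed as a frozenset so that the code cannot rely on any ordering of the two instrings; only the mark decides which one is embedded.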

Empirically, there seems to be little basis for choice between these procedures. We noted in §2.4 that in a good grammar it should be choice of rules (and of rule arrays) that correlates with meaning. It seems entirely reasonable to propose that in the case of embedding or conjoining there are two sometimes independent choices to be made: the choice of which instring is to be embedded or conjoined (‘subordinated’), and the precise kind of embedding or conjoining. The unordered pair procedure renders both of these choices a matter of selecting rules; the ordered pair procedure handles one of them by an additional geometric property of rule arrays.

But since rule arrays perforce have some geometric properties, we cannot casually conclude that the simpler geometry is necessarily the better.

5.2. The nature of the input geometry required for double-based rewrite rules should be emerging from our discussion, particularly from the diagrams. We assume that every rule of an effective rule array is either single-based or double-based (never, for example, triple-based), that initial rules are single-based, and that the whole array yields a single outstring (as terminal string or stepmatrix). Then a rule array is a binary converging tree. This is true under either the ordered pair or the unordered pair procedure; the difference will appear shortly.4

There are various ways to characterize a binary converging tree abstractly. A simple way is to tell how to construct one. The materials needed are n elements called nodes (e.g. dots of the sort any pen or pencil can supply) and n - 1 directed connections (arrows). Scatter the dots on a piece of paper in any way desired. Select one dot and mark it terminal. Choose an integer m such that 1 ≤ m ≤ (n + 1)/2, and mark m other dots initial. Then insert the arrows, in any way whatsoever subject to the following four requirements:

(1) each arrow connects two dots, and no pair of dots is connected by more than one arrow;
(2) exactly one arrow must lead away from each node except the terminal one (from which no arrow leads);
(3) either exactly one or exactly two arrows must lead to each node except an initial one (to which no arrow leads);
(4) it must be possible to move from any node to the terminal node by passing from node to node along arrows, traversing each arrow from nock to point.

All that is abstractly relevant in a binary converging tree is the nodes and arrows.
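The four requirements can be checked mechanically. The following Python sketch is one possible reading of them, not a definitive formalization; the function name and the representation of arrows as (nock, point) pairs are assumptions made for the illustration.

```python
from collections import Counter


def is_binary_converging_tree(nodes, arrows, terminal, initials):
    """Check requirements (1)-(4) for n nodes and n - 1 arrows.
    `arrows` is a list of (nock_node, point_node) pairs."""
    nodes, initials = set(nodes), set(initials)
    if terminal not in nodes or not initials <= nodes:
        return False
    if len(arrows) != len(nodes) - 1:
        return False
    # (1) each arrow joins two distinct dots; at most one arrow per pair
    if any(a == b or a not in nodes or b not in nodes for a, b in arrows):
        return False
    if len({frozenset(p) for p in arrows}) != len(arrows):
        return False
    out_deg = Counter(a for a, _ in arrows)
    in_deg = Counter(b for _, b in arrows)
    for v in nodes:
        # (2) one arrow leads away from every node except the terminal one
        if out_deg[v] != (0 if v == terminal else 1):
            return False
        # (3) one or two arrows lead to every node except an initial one
        if v in initials:
            if in_deg[v] != 0:
                return False
        elif in_deg[v] not in (1, 2):
            return False
    # (4) every node can reach the terminal node along arrows
    successor = dict(arrows)
    for v in nodes:
        seen = set()
        while v != terminal:
            if v in seen or v not in successor:
                return False
            seen.add(v)
            v = successor[v]
    return True


good = [("I1", "D"), ("I2", "D"), ("D", "T")]
print(is_binary_converging_tree(["I1", "I2", "D", "T"], good, "T", ["I1", "I2"]))
```

In the small example there are n = 4 nodes and m = 2 initial nodes, which respects the bound m ≤ (n + 1)/2.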

Once the structure has been assembled, the nodes can be moved about in any way one wishes without in the slightest changing its geometry, provided that the arrows stretch and bend so that no connections are broken or changed. The two parts of Figure 15 are thus abstractly identical: the second is obtained from the first merely by deforming the nodes and arrows until all the arrows point in the same general direction and no two arrows intersect. Obviously, no generality is lost by requiring that all binary converging trees be in this neat form.

[Figure 15: two drawings of the same binary converging tree]

One may now optionally add a further specification: that the initial nodes be simply ordered relative to one another. With the tree in neat form, this can be indicated by numbering the initial nodes, as in Figure 16.

[Figure 16]

Or one can just say that the order in which the nodes appear on the paper (for example, reading from top to bottom in the second part of Figure 15) is the structurally distinctive order. If the former procedure is used, or if one labels initial nodes when the tree is not in the neat ‘untangled’ form, then care must be exercised: the numbering must be such that the tree can be deformed until the initial nodes are actually in the order indicated by the numbering, without requiring any arrows to intersect. If we add this optional specification, we have a binary converging tree with ordered initial nodes. If we do not add the specification, we have merely a binary converging tree.

4 More is done with trees here than in any source known to me. As background, see, for example, McKinsey 1952, pp. 101 ff., Flament 1963, Berge 1959.

The ordered pair procedure for double-based rules requires the former; the unordered pair procedure requires the latter. A little inspection suffices to show that ordering initial nodes is equivalent to numbering the arrows of each pair that converge on a single node.

A subtree of a binary converging tree is any subset of the nodes of the tree, along with all arrows that connect those nodes, that is itself a binary converging tree. In Figure 17, the closed curves 1 through 6 enclose subsets, of which the first four are subtrees. An initial subtree is a subtree that includes, along with any given node, all those nodes (of the original tree) from which one can move to the given node by following one or more arrows from nock to head. In the figure, 1 and 2 set off initial subtrees; 3 and 4 mark subtrees that are not initial.

Trees are familiar in grammatical theory from their use in the specification of phrase structure (§2.2). The differences between those trees and these should be pointed out. The most important difference has nothing to do with geometry: the nodes of a phrase-structure tree are, or are labelled with, characters of the alphabet A, whereas those of a rule tree are, or are labelled with, rules.

To indicate the geometric differences, we must first devote a little discussion to trees in general. If we reverse the direction of all the arrows of a converging tree we have, instead, a diverging tree, with a single initial node or root (which was the terminal node of the converging tree) and one or more terminal nodes (corresponding to the initial nodes of the converging tree). The terminal nodes of a diverging tree, like the initial nodes of a converging tree, can be simply ordered among themselves or not.

If we erase the points of the arrows (or, alternatively, put a point at each end of each arrow) so that connections are not directed, we have an unoriented or unrooted tree, in which it is not possible to distinguish between initial and terminal nodes.

If we allow a maximum of n arrows to lead to a single node, we have an n-ary converging tree; the tree is binary when n = 2. Since n is merely a permitted maximum number, rather than an exact required number of arrows leading to every non-initial node, it is clear that the set of all n-ary trees includes, as a proper subset, the set of all m-ary trees for m < n. A finite string is a singulary converging tree and vice versa; a finite string is also an n-ary converging tree for any larger value of n. Of specific concern to us, every finite string is a binary converging tree, though clearly not vice versa.

The kind of tree familiar from phrase-structure diagrams (see the one displayed in §2.2 (3)) is usually thought of as diverging, with the single initial node at the top. Although no arrowheads are put on the connecting lines, it is tacitly assumed that they point downwards: the bottommost nodes are called ‘terminal’.

This is a difference between phrase-structure trees and rule trees, since the latter are converging. Abstractly, this difference is truly trivial, since diverging and converging trees are intimately related in their properties. Indeed, any theorem that holds for diverging trees corresponds to a theorem true of converging trees, the one theorem obtained from the other merely by interchanging certain paired technical terms.5

The terminal nodes of a phrase-structure tree (diverging) are ordered. The initial nodes of a rule tree (converging), as we have seen, are ordered or not, depending on our option of the ordered pair or unordered pair procedure. It is the ordering of the terminal nodes of a phrase-structure tree that renders natural and unique a representation of the same structure as a bracketed string (§2.2 (3)). For a tree with unordered terminal (initial) nodes, such representation is possible but highly arbitrary. We shall find that this difference is of crucial importance in choosing between the unordered pair and ordered pair procedures.

5 See Birkhoff and MacLane 1944, p. 327, the ‘duality principle’.



5.3. Binary Tree Grammars. Now that we know what input geometry is required for a grammar that will incorporate double-based rewrite rules along with single-based ones, we can easily formulate the postulates of the grammar itself. The formulation follows very closely that for a linear generative grammar (§2.1). For the ordered pair procedure, Postulate T1(b) is to be omitted; for the unordered pair procedure, one omits Postulate T1(d).

A (binary) tree grammar is a system G(A, I, T, R^1, R^2), characterized by the following definitions and by Postulates T1-4:

A, I, and T are defined as for a linear grammar (§2.1). R^1 is a non-null finite set of single-based rules. R^2 is a finite set of double-based rules {R^2_j}_{m_2}. Each rule in R^1 is a function whose domain is the free monoid F(A) over the alphabet A and whose image is some subset of F(A). Each rule in R^2 is a function whose domain is the set of all ordered pairs whose terms are elements of F(A), and whose image is some subset of F(A).

Postulate T1.
(a) For every rule R^1, R^1(0) = 0.
(b) (Omit for the ordered pair procedure) For every rule R^2, R^2(s_1, s_2) = R^2(s_2, s_1).
(c) For any rule R^2 and any string s over A, R^2(0, s) = 0.
(d) (Omit for the unordered pair procedure) For any rule R^2 and any string s over A, R^2(s, 0) = 0.

Terminal string is defined as for a linear grammar (§2.1).

Postulate T2. If s is any string, t any terminal string, and R^1 and R^2 any rules, then R^1(t) = R^2(s, t) = R^2(t, s) = 0.

Instead of the rule row of §2.1, we require here a rule subtree. The formal definition is tricky because of complexities of notation, but is entirely analogous to that of a rule row. A rule subtree that contains only single-based rules is, indeed, merely a rule row, with a single instring and a single outstring as for a linear grammar. If a rule subtree S contains n double-based rules, then its operand is an ordered set of n + 1 instrings (s_i)_{n+1}, which it processes to yield a single outstring; that is, S(s_i)_{n+1} = s'.

Postulate T3. Given any rule subtree S that contains n double-based rules (n ≥ 0), and any string s_j that is any one of an ordered set (s_i)_{n+1} acceptable as operand for S, then S(s_i)_{n+1} ≠ s_j.
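Postulate T1 can be spot-checked on sample strings. The sketch below is only an illustration under several assumptions: the symbol 0 of the postulates is taken to be the null string and is modelled as the empty Python string, rules are ordinary Python functions, and the three sample rules (mark, conjoin, conjoin_marked) are invented for the purpose. A check over finitely many samples can refute the postulate for a rule but cannot prove it.

```python
NULL = ""  # stands for the null string written 0 in the postulates


def check_t1(single_rules, double_rules, samples, ordered_pair=True):
    """Test T1(a)-(d) on the given sample strings only."""
    for r in single_rules:
        if r(NULL) != NULL:                               # (a)
            return False
    for r in double_rules:
        for s in samples:
            if r(NULL, s) != NULL:                        # (c)
                return False
            if ordered_pair and r(s, NULL) != NULL:       # (d)
                return False
            if not ordered_pair:
                for t in samples:
                    if r(s, t) != r(t, s):                # (b)
                        return False
    return True


def mark(s):
    """Single-based marking rule: append ' S' to a non-null string."""
    return NULL if s == NULL else s + " S"


def conjoin(s1, s2):
    """Ordered-pair conjoining rule: the second term is represented second."""
    if s1 == NULL or s2 == NULL:
        return NULL
    return s1 + " and " + s2


def conjoin_marked(s1, s2):
    """Unordered-pair version: whichever term ends in ' S' goes second."""
    if s1 == NULL or s2 == NULL:
        return NULL
    first_marked, second_marked = s1.endswith(" S"), s2.endswith(" S")
    if second_marked and not first_marked:
        return s1 + " and " + s2[:-2]
    if first_marked and not second_marked:
        return s2 + " and " + s1[:-2]
    return NULL  # zero or two marked terms: not acceptable


samples = ["John said so", "Bill is coming", "Bill is coming S"]
print(check_t1([mark], [conjoin], samples, ordered_pair=True))
print(check_t1([mark], [conjoin], samples, ordered_pair=False))
print(check_t1([mark], [conjoin_marked], samples, ordered_pair=False))
```

The second call fails on (b) because conjoin depends on the order of its terms; the marked version satisfies (b) because the mark, not the order, decides which term is subordinated.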



An initial rule subtree is a rule subtree that is an initial subtree (§5.2): that is, one all of whose first rules are initial rules, so that its operand is merely (I, I, ..., I)_{n+1} with n + 1 identical terms. The outstring is thus independent of the operand, and we can allow ourselves to write merely S(I), since even the number of I's in the operand depends wholly on S. A rule tree C is an initial rule subtree whose outstring C(I) is a terminal string.

Postulate T4. Every rule of R^1 and of R^2 appears in at least one rule tree.

5.4. Linearizing Input to a Tree Grammar. From the foregoing definitions and postulates, it is clear that an input to a tree grammar is a rule tree. Now, as we saw in §5.2, although every string is a converging tree, not every converging tree is a string. It would therefore appear that C(G), the set of all inputs to a tree grammar G, cannot be regarded as a harp.

There is no particular reason to be disturbed by this. If inputs are not strings, then so be it. Of course, our computing machine exemplification of §2 would now face some difficulties. A long row of switches would not do for inputs. We should require a two-dimensional bank of switches, on which we could produce a pattern of settings like the converging tree of Figure 16 or 17, with all other switches in the bank turned to the out-of-line position. We hardly need spell out the complexities this would entail.

Doubtless the problems could be solved in a practical way, but I think it would be simpler if we could bypass them altogether.

And it is possible to bypass them. It is worthy of note that with two simple provisos it is possible to treat all inputs to a tree grammar as strings. The first proviso is that we accept the ordered pair procedure rather than the unordered pair procedure. The second proviso takes the form of an additional postulate which is independent of T1-T4:

Postulate T5. An initial rule accepts as instring no string except I.

There seems to be no empirical reason for objecting to either of these provisos, and certainly there is no formal reason. We may have to have more single-based rules in order to conform to Postulate T5, but the total number is, of course, still finite. If a grammar that does not conform to T5 has p rules that are initial but that also accept strings other than I, the system can be adjusted to conform to the added postulate merely by replacing each such rule by a pair of rules, one of which accepts only I, the other of which accepts only the strings other than I acceptable to the original rule.

Using ‘I’ for an initial rule, ‘S’ for any other single-based rule, and ‘D’ for any double-based rule, we can display a typical rule tree as shown in Figure 18.

[Figure 18]

I have subscripted all the rules in this tree for purposes of cross-reference; the subscripts on the I’s also supplement the vertical arrangement on the page to indicate their simple ordering. To underscore that we are accepting the ordered pair procedure, the two arrows converging on each D are also numbered, though this is obviously redundant because it is determined by the ordering of the initial nodes: the superordinate input string to a D is the one generated by the subtree that starts with the I bearing the smaller subscript. In the tree, two or more of the I’s might be occurrences of the same rule; or two or more of the S’s; or two or more of the D’s; but by virtue of the added postulate T5 no I is the same as any S (and, of course, no I or S is the same as any D).

Now the tree of Figure 18 can be represented unambiguously by a single row of symbols, given certain conventions to be spelled out in a moment. The row is presented as Figure 19.

I_1 S_1 S_2 I_2 S_7 D_1 S_3 I_3 S_8 S_9 I_4 S_10 S_11 D_3 D_2 S_4 S_5 S_6
Figure 19

The row of symbols represents the tree unambiguously because the orderings and interconnections shown in the tree, although not overtly represented in the row, can be completely inferred from the distribution in the row of I’s, S’s, and D’s.

To convert any rule tree into a simple row of symbols, draw a curve starting under the symbol at the upper left-hand corner of the tree, passing to the right under successive symbols until a D is encountered, then doubling back to pass under the next branch, and so on until the curve has passed under the terminal node. This is shown for our sample tree in Figure 20. Then follow along the curve and copy down each symbol as the curve passes under it for the first (or only) time.
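Copying each symbol the first time the curve passes under it amounts to listing, for every rule, the rows of its input branches (superordinate branch first) and then the rule itself. A minimal Python sketch follows. It rests on assumptions: a tree is modelled as a nested tuple (rule, branch, ...), the names chain and tree_to_row are invented here, and the sample tree is a reconstruction from the row of Figure 19, since the drawing of Figure 18 itself is not reproduced.

```python
def tree_to_row(tree):
    """Return the row of rule symbols for a rule tree given as a nested
    tuple: (rule,) for an initial rule, (rule, branch) for a single-based
    rule, (rule, first_branch, second_branch) for a double-based rule."""
    rule, *branches = tree
    row = []
    for branch in branches:          # superordinate branch first
        row.extend(tree_to_row(branch))
    row.append(rule)                 # the rule itself comes after its inputs
    return row


def chain(tree, *rules):
    """Apply a succession of single-based rules on top of `tree`."""
    for rule in rules:
        tree = (rule, tree)
    return tree


# Reconstruction of the sample tree from the row of Figure 19.
a = chain(("I1",), "S1", "S2")
b = chain(("I2",), "S7")
c = chain(("I3",), "S8", "S9")
d = chain(("I4",), "S10", "S11")
upper = chain(("D1", a, b), "S3")
lower = ("D3", c, d)
sample_tree = chain(("D2", upper, lower), "S4", "S5", "S6")

print(" ".join(tree_to_row(sample_tree)))
```

The printed row has four I’s and three D’s, in keeping with the observation that a binary converging tree has one more initial node than it has double-based nodes.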

To convert any row of symbols of this sort back into the corresponding rule tree, one need only add arrows according to the following instructions:

(1) Draw an arrow from each symbol in the row to the next, except when the next one is an I. In our example (Figure 19), the result is as shown in Figure 21.

I_1 → S_1 → S_2   I_2 → S_7 → D_1 → S_3   I_3 → S_8 → S_9   I_4 → S_10 → S_11 → D_3 → D_2 → S_4 → S_5 → S_6
Figure 21

(2) Draw an arrowhead pointing in from the northwest towards each D, and an arrow nock pointing out towards the northeast from each symbol in the row, except the last, from which no arrow yet leads. There will be as many arrowheads as there are D’s. The number of arrow nocks will be one less than the number of I’s. This means the same number of heads and nocks, since in a binary converging tree the number of I’s is necessarily one more than the number of D’s. Furthermore, from the way in which a tree is converted into a row of symbols, the nth arrowhead will be preceded by at least n nocks. In our example, the second step yields the result shown in Figure 22.

[Figure 22: the row of Figure 21 with arrowheads and arrow nocks added]

(3) Connect the nocks and heads by arcs that stay above the row of symbols and do not intersect, the nock of each resulting arrow being to the left of its head. There will be only one way to do this.6 In our example, the result is as in Figure 23. This completes the conversion.

[Figure 23: the row of Figure 22 with nocks and heads connected by arcs]


One can imagine, if one wishes, pulling downwards and to the left on the medial I's, until the figure is deformed to look like the original tree with all the I's in a column.

Our procedure for converting a tree into a row of symbols rests on the fact that the tree is displayed on a flat surface with arrows pointing generally from left to right. A tree can be so displayed whether its initial nodes are ordered or not, but if they are not (that is, if we choose the unordered pair procedure) then what is structurally a single tree can typically be displayed in several different ways, giving rise to several different rows of symbols. Display in a plane forces arbitrary choices if initial nodes are not ordered.

For a moment, let us consider the tree of Figure 18 to be one with unordered initial nodes. The numbers on the arrows are to be deleted; the subscripting can be retained, but merely as an indication of rule identity, not to show ordering. Now imagine that the nodes are beads and the arrows rigid wires. Suspend the tree from its terminal node, like a mobile, so that everything hanging from a D can rotate freely around the vertical axis through that D. All that is formally relevant in the tree with unordered initial nodes is invariant under any such rotations. For example, we might twist the tree into a single plane in such a way that, when set down on a table, it would look like Figure 24.

[Figure 24: the tree of Figure 18 after rotation of its branches]

With unordered initial nodes, this is exactly the same tree as that of Figure 18. But this one, if we follow our instructions, gives rise to the row of symbols shown in Figure 25. In fact, our sample input tree, under the unordered pair procedure, gives rise to eight different rows of rules, all of which correspond to exactly the same input array.

This shows why we insist on the first proviso. We want the row representation of an input tree to be unique.

The reason for our second proviso rests in the instructions for converting a row of symbols back into a tree. The first instruction requires us to insert an arrow between each pair of symbols in the row, except when the second of the pair is an I. If some initial rules might also occur noninitially in a tree, we should have no way of knowing whether or not an arrow should be inserted just before such a rule. If initial rules are exclusively initial, there is no such uncertainty.

6 The proof is given in §2 fn.7: merely replace the terms ‘opening bracket’ and ‘closing bracket’ respectively by ‘arrow nock’ and ‘arrow head’.
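Both provisos can be seen at work in a short Python sketch. It swaps in a stack procedure for the drawing of arrows, nocks, and heads; this is an assumed equivalent, not the procedure of the text. The kind of each rule (I, S, or D) is read off the first letter of its name, which is legitimate only because Postulate T5 keeps initial rules exclusively initial. The function names and the nested-tuple representation of trees are invented for the illustration.

```python
def row_to_tree(row):
    """Rebuild a rule tree (nested tuples) from a row of rule symbols."""
    stack = []
    for symbol in row:
        kind = symbol[0]
        if kind == "I":                       # starts a new branch
            stack.append((symbol,))
        elif kind == "S":                     # continues the latest branch
            if not stack:
                raise ValueError("S with nothing to operate on")
            stack.append((symbol, stack.pop()))
        elif kind == "D":                     # joins the two latest branches
            if len(stack) < 2:
                raise ValueError("D needs two instrings")
            second = stack.pop()
            first = stack.pop()
            stack.append((symbol, first, second))
        else:
            raise ValueError("unknown kind of rule: " + symbol)
    if len(stack) != 1:
        raise ValueError("row does not converge on a single terminal node")
    return stack[0]


def rows_if_unordered(tree):
    """All rows obtainable when the two branches entering each D may be
    displayed in either order (the unordered pair procedure)."""
    rule, *branches = tree
    if not branches:
        return {(rule,)}
    if len(branches) == 1:
        return {r + (rule,) for r in rows_if_unordered(branches[0])}
    left = rows_if_unordered(branches[0])
    right = rows_if_unordered(branches[1])
    return ({x + y + (rule,) for x in left for y in right}
            | {y + x + (rule,) for x in left for y in right})


row = "I1 S1 S2 I2 S7 D1 S3 I3 S8 S9 I4 S10 S11 D3 D2 S4 S5 S6".split()
tree = row_to_tree(row)
print(tree[0])                       # the terminal node
print(len(rows_if_unordered(tree)))  # number of distinct rows without ordering
```

With three double-based rules, each of whose two branches may be displayed in either order, the count of distinct rows for the sample is 2 x 2 x 2, in agreement with the figure of eight given above.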



Now, what we have been calling informally a ‘row of symbols’ is, of course, simply a finite string over an alphabet whose characters are the rules of the grammar. We have thus shown that the set of all inputs to a binary tree grammar can, after all, be viewed as a harp. The more complex input geometry seemingly required to accommodate double-based rewrite rules turns out to be unnecessary: the complexities can be provided for otherwise. For our computing machine, a single long row of switches will do. An input is a linear sequence of switch settings. Interlocks can take care of the interconnections represented by the arrows in the tree and by the ordering of initial nodes.

What is more, just as for a linear grammar we can replace C(G), the set of all ‘rule chains’ (here, rather, linearized rule trees), by E(G), the set of all ‘effective rule chains’ (here effective linearized rule trees). And E(G), like C(G), is here, as for a linear grammar, a harp.

Let us be clear about this. We have not concluded that tree grammars are, after all, linear grammars. They are not. The class of linear grammars is a proper subset of the class of binary tree grammars, consisting just of those for which R^2 is null. We have shown only that, given our two provisos, the inputs to any tree grammar can be viewed as finite strings of rules.

5.5. The Time Bomb Method.

Chomsky has recently developed a procedure that provides for embedding and conjoining within the bounds of a linear grammar.7 It also does away with single-based transformations; at least, they appear in a greatly altered guise, not as rules but as a special sort of auxiliary character.

We can begin our illustration of the procedure with a reformulation of Simple Sample Two (§2.2 (3)). Some of the rules of the earlier formulation remain unchanged, but the old T_0 is deleted and two new rules, R_X and R_T, are added in its place:

R_X:  I → IX                              in the environment _.0
R_1:  I → I(BD)                           (see comment 1)
R_2:  B → B(c)                            (see comments 1, 2)
R_3:  D → D(EG)                           (see comment 1)
R_4:  E → E(f)                            (see comments 1, 2)
R_5:  G → G(h)                            (see comments 1, 2)
R_T:  I(BD(EG))X → I(K(GB)L(M(j)E))       (see comment 3)
R_E:  (erases all nonterminal characters; see comment 4)

Comment 1: These rules (that is, those with the cross-reference to this comment) will accept a string s only if the operand of the rule in the string is not preceded in s by any bracketless labels.

Comment 2: We assume that each of these three is one of a set of two or more that operate, respectively, on B, E, and G; we do not care what the other rules are, and make the assumption merely so that these three rules will not inevitably be locally obligatory.

Comment 3: A string s is acceptable to this rule if it can be deconcatenated into I(Bx_1D(Ex_2Gx_3))X, where each x_i has the form (y_i) in which the brackets are paired and y_i is any non-null string over A. R_T(s) is then I(K(Gx_3Bx_1)L(M(j)Ex_2)).

Comment 4: A string is acceptable to rule R_E only if it contains no bracketless labels.

In this new formulation, as in the earlier version of Simple Sample Two, the rules that are overtly stated allow just two rule chains, generating just two terminal strings:

rule chain, old version        rule chain, new version            terminal string
R_1 R_2 R_3 R_4 R_5 R_E        R_1 R_2 R_3 R_4 R_5 R_E            cfh
R_1 R_2 R_3 R_4 R_5 T_0 R_E    R_X R_1 R_2 R_3 R_4 R_5 R_T R_E    hcjf

7 When this was worked out I had not had direct access to Chomsky’s discussion of the technique, and had gathered it from indirect sources, notably some of the papers delivered at the 1964 Summer Meeting of the Linguistic Society of America. The formalism is not superficially the same as his in Chomsky 1965, but I still think he should be credited for the basis of this approach.
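The effect of the trigger and erasure rules on the sample can be traced in Python. This is a sketch under assumptions, not the formalism itself: strings are ordinary Python strings, labels are single upper-case letters, terminal characters are lower-case letters, and the helper names (read_group, expect, has_bracketless_label, trigger, erase) are invented. The deconcatenation follows comment 3 and the acceptance test for erasure follows comment 4.

```python
def read_group(s, i):
    """Return the index just past the paired-bracket group opening at s[i]."""
    if i >= len(s) or s[i] != "(":
        raise ValueError("expected an opening bracket")
    depth = 0
    for j in range(i, len(s)):
        if s[j] == "(":
            depth += 1
        elif s[j] == ")":
            depth -= 1
            if depth == 0:
                if j == i + 1:
                    raise ValueError("bracketed part must be non-null")
                return j + 1
    raise ValueError("brackets are not paired")


def expect(s, i, literal):
    """Require `literal` at position i and return the index just past it."""
    if not s.startswith(literal, i):
        raise ValueError("string is not acceptable to the trigger rule")
    return i + len(literal)


def has_bracketless_label(s):
    """A label (upper-case letter) not immediately followed by '('."""
    return any(ch.isupper() and not s.startswith("(", k + 1)
               for k, ch in enumerate(s))


def trigger(s):
    """Trigger rule per comment 3: I(B x1 D(E x2 G x3))X is rewritten
    as I(K(G x3 B x1)L(M(j)E x2))."""
    i = expect(s, 0, "I(B")
    j = read_group(s, i)
    x1 = s[i:j]
    i = expect(s, j, "D(E")
    j = read_group(s, i)
    x2 = s[i:j]
    i = expect(s, j, "G")
    j = read_group(s, i)
    x3 = s[i:j]
    if expect(s, j, "))X") != len(s):
        raise ValueError("string is not acceptable to the trigger rule")
    return "I(K(G" + x3 + "B" + x1 + ")L(M(j)E" + x2 + "))"


def erase(s):
    """Erasure rule per comment 4: refuses bracketless labels, otherwise
    erases every nonterminal character (labels and brackets)."""
    if has_bracketless_label(s):
        raise ValueError("string still contains a bracketless label")
    return "".join(ch for ch in s if ch.islower())


without_bomb = "I(B(c)D(E(f)G(h)))"
with_bomb = "I(B(c)D(E(f)G(h)))X"
print(erase(without_bomb))
print(erase(trigger(with_bomb)))
```

Calling erase directly on the string that still carries X raises an error, which is the sense in which a planted bomb must explode before a terminal string can be reached.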

We must also compare the effective rule chains of the two versions, remembering comment 2:

[Table: effective rule chains in the old and new versions for the two terminal strings]

We see that it is only the terminal string with a ‘transformation’ that is handled differently. The second rule chain is longer in the new version than in the old, but the greater length is due entirely to an additional locally obligatory rule; the effective rule chains are of the same length in both versions.

In the old version, the transformation T_0 is an optional rule that can be applied or skipped for any string acceptable as instring. In the new version, the option is made at the outset, where one can either use rule R_X or skip it. If used, this rule plants a ‘time bomb’ X in the string, at a point where it will be out of the way and play no part, for a while, in the successive rewritings of the string. However, once planted, the bomb is bound to explode in due time, since there is no way of reaching a terminal string except via the ‘trigger’ rule R_T. Every rule chain of the system must end with the erasure rule R_E; but that rule will not accept a string with a time bomb in it because X always remains (until it is exploded) a bracketless label. When it is triggered, its explosion redistributes the pieces of the preceding string in a specified way (see the formulation of rule R_T and comment 3), the bomb itself disappearing completely.

We must think of the redistribution of pieces brought about by an explosion not as dependent on the triggering rule but rather as dependent on the time bomb itself. Early rules might allow for the planting of different time bombs, each of them, as it were, a ‘shaped charge’ that will have an explicit effect on the string when it is triggered. There must be a different (minimal) rule for the planting of each time bomb; but all can then be triggered by the same triggering rule R_T.

Now let us speak, in very general terms, of a grammar G intended to be a grammar for English. We shall refer only to the partial grammar G', whose output is morphon strings, but shall indicate those strings via ordinary written English, as though they had been processed by a suitable partial grammar G". We need, first, a number of rules of each of the following types:

Type 1:  I → I(IX_i)    (i = 1, 2, ..., r)
Type 2:  I → I(IIY_j)   (j = 1, 2, ...)

These are the bomb-planting rules; there must be one of type 1 for each bomb X_i and one of type 2 for each bomb Y_j.

We need, next, phrase structure rules of the ‘expansion’ type described in §2.2 (3). Since these rules can operate only on a bracketless label, they cannot affect the I outside the initial opening bracket of a string generated by a rule of type 1 or 2. The rules are context-sensitive, and it is important that the context may include one or more time bombs. For example, the first I inside the bracket in I(IX_1) or I(IIY_1) might be expandable in ways not possible for the first I inside the bracket in I(IX_2) or I(IIY_2) or for the second I inside the bracket in I(IIY_1).

A rule chain in G has the following general structure:

(1) First comes an initial rule row of zero or more bomb-planting rules, the outstring from which consists of I’s, brackets, and bombs; each bomb is immediately followed by a closing bracket. Whatever I in this outstring is not immediately followed by an opening bracket is expandable or developable.

(2) Next comes a sequence of rows of phrase structure rules, one row for each developable I in the outstring from the first stage. The outstring from this second stage consists of labelled brackets, bombs, and terminal characters, with no bracketless labels except the bombs. Effective rule chains end at this stage; the remaining stages involve only obligatory rules.

(3) Next is the triggering rule R_T, which we may think of as exploding all the bombs in the string in a single application, but necessarily in a certain order: the innermost bomb (that is, the one enclosed within the string by the largest number of pairs of brackets) explodes first, affecting only the part of the string enclosed within the same innermost brackets; then the bomb that is now innermost, and so on.

(4) Finally comes the erasure rule R_E. The outstring is a morphon string.
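Two bookkeeping points of this scheme, which I’s are developable after stage 1 and in what order the bombs explode at stage 3, can be sketched in Python. The sketch assumes that a bomb is written as X or Y followed by a lower-case or numeric subscript, and that bombs at equal depth explode from left to right; the second assumption is not stated in the text. The function names are invented for the illustration.

```python
import re


def tokens(s):
    """Split a stage-1 outstring such as 'I(II(IIYa)Yb)' into
    I's, brackets, and bombs."""
    return re.findall(r"[XY][a-z0-9]*|I|[()]", s)


def developable(toks):
    """Positions of every I that is not immediately followed by '('."""
    return [k for k, t in enumerate(toks)
            if t == "I" and (k + 1 == len(toks) or toks[k + 1] != "(")]


def explosion_order(toks):
    """Bombs listed innermost first (largest bracket depth first)."""
    depth, found = 0, []
    for k, t in enumerate(toks):
        if t == "(":
            depth += 1
        elif t == ")":
            depth -= 1
        elif t[0] in "XY":
            found.append((-depth, k, t))
    return [t for _, _, t in sorted(found)]


toks = tokens("I(II(IIYa)Yb)")
print(len(developable(toks)))   # number of kernel sentences to be developed
print(explosion_order(toks))
```

For the sample outstring the three developable I’s correspond to three kernel sentences, and the bomb inside the inner brackets is listed before the one in the outer brackets.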



To generate ‘John sees Bill’, we skip the first stage, and develop I purely by phrase-structure rules.8 To generate ‘Bill is seen by John’ we first use a rule of type 1, rewriting I as I(IX), where X is defined to be the proper bomb. Then we use exactly the same phrase-structure rules as for ‘John sees Bill’; but they now operate on the I within the brackets.

To generate ‘John said so’ or ‘Bill is coming’ we proceed as for ‘John sees Bill’. Let us say that the outstring from stage 2 for ‘John said so’ is I(s_1) and that that for ‘Bill is coming’ is I(s_2), where s_1 and s_2 are the appropriate strings with no bracketless labels. Now suppose we begin with a bomb-planting rule of type 2, say I → I(IIY_q), and then develop this in stage 2 until we have the outstring I(I(s_1)I(s_2)Y_q). The terminal string will be ‘John said that Bill is coming’. Or, if we choose a different bomb-planting rule of type 2, say Y_v, and proceed in the same way, we get ‘John said so and Bill is coming’. These examples, of course, are supposed to define (informally) the bombs Y_q and Y_v. Note that the context-sensitivity of the rules must allow also the stage-2 outstring I(I(s_2)I(s_1)Y_v), since ‘Bill is coming and John said so’ is a perfectly good English sentence, but that they must preclude *I(I(s_2)I(s_1)Y_q).

In a more complicated way, a stage 1 outstring of the form I(II(IIY_a)Y_b) can be developed to yield a sentence like ‘I heard Bill say you were coming’, involving three kernel sentences; one of the form I(I(IIY_e)X_a) could yield ‘John was elected president by the club’; and so on. It is clear that the subset of R consisting just of bomb-planting rules of types 1 and 2 must be open (§2.1), since there is no most complex sentence and hence no longest outstring for this stage.

5.6.


The point of departure for this chapter was the assumption that a grammar for a language cannot be very satisfactory (to express it as mildly as possible) unless it allows double-based transformations or some other method for handling complex and compound sentences. We have found three ways of doing this: G (or the partial grammar G') can be

(1) a binary tree grammar under the unordered pair procedure;
(2) a binary tree grammar under the ordered pair procedure and with added Postulate T5;
(3) a linear grammar using time bombs.

The first requires inputs to be binary converging trees; the third requires them to be strings; the second permits either.

A reasonable next step might be to seek empirical reasons for preferring one of the three alternatives, or, at least, for favoring one of the two types of input geometry. That step will not be taken within this monograph, however, because something else has turned up that takes precedence. We have discovered that there need not be, from the formal angle, any unalterable commitment to linear inputs. But if we can accommodate formal grammars to either linear or binary tree inputs, how about even more complex types of input geometry? Why search for empirical evidence favoring one of two alternatives if those two are in fact drawn from a larger range of possibilities? This explains the direction our inquiry will take in the next chapter.

8 Alternatively, of course, we could insist that at least one bomb be planted, and provide a ‘zero bomb’ to take care of kernel sentences.




Nonlinear Inputs. For our purposes in this section (as suggested in §5.6), we must define a new and more inclusive class of grammars; for that, in turn, we need the formalization of possible input geometries presented first below.

6.1. Finite Networks.1 A finite network is a very general sort of array of elements and interconnections. To construct one, proceed as follows:

(1) Obtain a finite number of indistinguishable elements to be called nodes. If desired, attach labels so that they become at least partly distinguishable (wholly so only if no two nodes bear the same label). Scatter them on a sheet of paper (Figure 26).

[Figure 26]

(2) Draw arcs connecting pairs of nodes, not more than one arc per pair; do this in such a way that one can pass from any node to any other by moving along one or more arcs (Figure 27). (It does not matter if arcs intersect; one may imagine them passing around each other above and below the plane of the paper.)

[Figure 27]

1 ‘Network’ is what we might call a ‘semifree’ mathematical term: that is, it has been used in a number of related technical senses, but has not been precisely standardized, so that we can here give it a new precise sense without apology to anyone. See Flament 1963, Berge 1959.

(3) Attach an arrowhead to at least one end of every arc (Figure 28). Then, optionally, as a matter of convenience, remove both arrowheads from any arc that bears one at each end; that is, regard a no-headed arc as equivalent to a double-headed one.

[Figure 28]

For a finite network over an alphabet, one does attach labels to the nodes, each label being some character of the alphabet.2

We shall show that all the types of array mentioned so far in this essay (strings, converging and diverging trees, stepmatrices, unordered sets), as well as many other types, are networks. Some of the types of system to be mentioned below also have infinite varieties: the restriction ‘finite’ is to be understood throughout.

If every arc in a network bears arrowheads at both ends, the network is unoriented (Figure 27). A network that is not unoriented is locally oriented (Figure 28).

If every pair of nodes of an unoriented network is connected by an arc, the network is merely an unordered set (Figure 29). If the nodes are labelled, a different label for each node, the labels name the members of the set. If they are not labelled, the network corresponds merely to a positive integer (or finite cardinality).

[Figure 29]

If we ignore the proviso after the semicolon in step 2, but put arrowheads at both ends of all arcs drawn, the resulting figure is not a network but a more general structure called a simplicial complex3 (Figure 30, but also 27, 29). Any connected subset of a simplicial complex (any subset that meets the proviso of step 2) is a component; thus, there are three components in Figure 30. An unoriented network is therefore a simplicial complex of one component.

[Figure 30]

If, instead, we ignore the first part of step 2, allowing as many as s arcs between a pair of nodes, but then heed the proviso of step 2 and put arrowheads at both ends of all arcs, the resulting figure is not a network but an s-graph. Structural diagrams in chemistry are s-graphs.

2 An equivalent formal definition: a finite network is a finite set N on which is defined an irreflexive relation R whose transitive closure R^t is the universal relation on N. See Clifford and Preston 1961 §1.4.

Thus an unoriented network is a 1-graph—an 5-graph with 5=1. A loop is a subset of a network within which one can travel from a given node via one or more arcs back to the same node, following arcs only in the directions indicated by the arrowheads. In an unoriented network every connected subset of two or more nodes is a loop: two is enough, since one can move from one node to 3The term is from topology. The meaning given it here was normal in the point-set topology of a quarter of a century ago, when it was explained to me by a graduate student in mathematics at the University of Michigan. Topology has changed so much that current definitions of the term seem (and perhaps are) totally different; e.g., Hu 1964, §§4.1-2.



the other and then back again along the same arc. An unoriented network that contains no loops except those that require the same arcs to be traversed twice is an unoriented or unrooted tree (Figure 31).

Figure 31
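The three construction steps can be sketched in code. What follows is only an illustration under stated assumptions: the representation (a dictionary from unordered node pairs to the set of ends bearing an arrowhead) and all function names are hypothetical, chosen here for convenience rather than taken from the text.

```python
from itertools import combinations

# Hypothetical representation: `arcs` maps an unordered pair of nodes
# (a frozenset) to the set of ends that carry an arrowhead. Using the pair
# as a dictionary key enforces "not more than one arc per pair".

def is_connected(nodes, arcs):
    """Step 2 proviso: any node can be reached from any other along arcs."""
    nodes = list(nodes)
    neighbours = {n: set() for n in nodes}
    for pair in arcs:
        a, b = tuple(pair)
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for m in neighbours[stack.pop()]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return len(seen) == len(nodes)

def is_network(nodes, arcs):
    """Steps 1-3: well-formed arcs, at least one arrowhead each, connected."""
    node_set = set(nodes)
    arcs_ok = all(len(pair) == 2 and pair <= node_set and heads and heads <= pair
                  for pair, heads in arcs.items())
    return arcs_ok and is_connected(nodes, arcs)

def is_unoriented(arcs):
    """Every arc bears arrowheads at both ends."""
    return all(heads == pair for pair, heads in arcs.items())

def is_unordered_set(nodes, arcs):
    """Unoriented, and every pair of nodes is joined by an arc."""
    complete = all(frozenset(p) in arcs for p in combinations(nodes, 2))
    return is_unoriented(arcs) and complete

# A locally oriented network: a -> b, and b <-> c.
arcs = {frozenset("ab"): {"b"}, frozenset("bc"): {"b", "c"}}
print(is_network("abc", arcs), is_unoriented(arcs))
```

Connectivity is tested without regard to arrowheads, since step 2 draws the arcs before any arrowheads are attached.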

An oriented network is a locally oriented network that includes no loops: thus, every arc must bear a single arrowhead. Figure 28 is only locally oriented, for it contains loops. Figure 32 shows an oriented network. Apart from the trivial case of a network of one node, the nodes of an oriented network are of three kinds. There must be at least one initial node, defined as one to which no arrows lead. There must be at least one terminal node, defined as one from which no arrows lead. There may also be medial nodes, to and from each of which at least one arrow leads.

Figure 32

Consider an oriented network for which the following holds: if one can pass from a node x to a node y by some indirect path, involving two or more arcs, then there is no arc leading directly from x to y. Such an oriented network is a partially ordered set. Figure 32 is not a partially ordered set; Figures 33, 34, and 35 are. A partially ordered set can also be defined as a system $S(K, \le)$, where K is a set of elements and $\le$ is an improper inequality relation, and for which the following holds: if x and y are elements of K, then either $x \le y$ or $y \le x$, or both, or neither; if both, then (from the definition of an improper inequality relation, §1.9), $x = y$; if neither, then there exists at least one z such that $x \le z$ and $y \le z$ or else such that $z \le x$ and $z \le y$. To represent a (finite) partially ordered set by an oriented network, we interpret the elements of K as nodes, and draw an arc with an arrowhead from x to y just in case $x < y$ and there is no z such that $x < z < y$.
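The definitions of this paragraph (no loops; initial, terminal, and medial nodes; no arc duplicating an indirect path) lend themselves to a short sketch. It assumes a hypothetical representation in which a network is given as a set of (from, to) arrows, a double-headed arc being a pair of opposite arrows, and it takes connectedness for granted.

```python
def successors(arrows):
    """Map each node to the set of nodes its arrows lead to."""
    succ = {}
    for x, y in arrows:
        succ.setdefault(x, set()).add(y)
        succ.setdefault(y, set())
    return succ

def reachable(succ, start):
    """Nodes reachable from `start` by following one or more arrows."""
    seen, stack = set(), [start]
    while stack:
        for m in succ[stack.pop()]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen

def is_oriented(arrows):
    """No loops: no node can be reached from itself."""
    succ = successors(arrows)
    return all(n not in reachable(succ, n) for n in succ)

def classify(arrows):
    """Return the sets of initial, terminal, and medial nodes."""
    succ = successors(arrows)
    has_incoming = {y for _, y in arrows}
    initial = {n for n in succ if n not in has_incoming}
    terminal = {n for n in succ if not succ[n]}
    return initial, terminal, set(succ) - initial - terminal

def is_partially_ordered(arrows):
    """Oriented, and no arc x -> y duplicates an indirect path from x to y."""
    if not is_oriented(arrows):
        return False
    succ = successors(arrows)
    return not any(y in reachable(succ, z)
                   for x, y in arrows for z in succ[x] if z != y)

diamond = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}
print(classify(diamond), is_partially_ordered(diamond))
```

Adding the shortcut arrow a → d to the example leaves the network oriented but destroys the partial-order property, since d can already be reached from a indirectly.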

An upper bound of a subset A of a partially ordered set K is an element b such that, for any element $x \in A$, $x \le b$. A universal upper bound is an upper bound for the whole set K. Lower bound and universal lower bound are defined in an obvious parallel way. If an oriented network has a unique terminal (initial) node, that is the universal upper (lower) bound of the network viewed as a partially ordered set. Figure 32 has a universal lower bound but no universal upper bound. In an oriented network (or partially ordered set) a least upper bound of two elements x and y is an upper bound b of the set $\{x, y\}$ such that, if $b'$ is any upper bound of $\{x, y\}$, $b \le b'$. A greatest lower bound is defined similarly. An upper semi-lattice is an oriented network in which every pair of elements has a least upper bound. A lower semi-lattice is an oriented network in which every pair of elements has a greatest lower bound.

A lattice is an upper semi-lattice which is also a lower semi-lattice. Figure 33 shows an upper semi-lattice; Figure 34 shows a lattice.

Figure 33

Figure 34

If, in any oriented network, $x < y$ and there is no element z such that $x < z < y$, then y is an immediate successor of x and x is an immediate predecessor of y. A converging tree (Figure 35) is an upper semi-lattice in which immediate successors, when they exist, are unique (the terminal node, of course, has no successors at all). A diverging tree is a lower semi-lattice in which immediate predecessors, when they exist, are unique. For a diverging tree, look at Figure 35 holding the page upside down and imagining the arrowheads at the opposite ends of the arcs. A simply ordered set (or a string) is a converging tree that is also a diverging tree. There are many interesting special kinds of lattices; for example, any Boolean algebra is a lattice.4 All of these, of course, are finite networks.

4 Birkhoff and MacLane 1944, ch. 11.
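Least upper bounds, semi-lattices, and the tree types just defined can be checked mechanically. The sketch below assumes, hypothetically, that a partially ordered set is supplied as the set of arrows of its oriented network, so that each arrow joins a node to an immediate successor; the function names are illustrative only.

```python
def successors(arrows):
    succ = {}
    for x, y in arrows:
        succ.setdefault(x, set()).add(y)
        succ.setdefault(y, set())
    return succ

def at_or_above(succ, n):
    """All b with n <= b: n itself plus everything reachable from n."""
    seen, stack = {n}, [n]
    while stack:
        for m in succ[stack.pop()]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen

def least_upper_bound(arrows, x, y):
    succ = successors(arrows)
    common = at_or_above(succ, x) & at_or_above(succ, y)   # upper bounds of {x, y}
    least = [b for b in common if common <= at_or_above(succ, b)]
    return least[0] if least else None

def is_upper_semilattice(arrows):
    nodes = list(successors(arrows))
    return all(least_upper_bound(arrows, x, y) is not None
               for x in nodes for y in nodes)

def reverse(arrows):
    return {(y, x) for x, y in arrows}

def is_lower_semilattice(arrows):
    return is_upper_semilattice(reverse(arrows))

def is_lattice(arrows):
    return is_upper_semilattice(arrows) and is_lower_semilattice(arrows)

def is_converging_tree(arrows):
    """Upper semi-lattice in which immediate successors are unique."""
    unique = all(len(s) <= 1 for s in successors(arrows).values())
    return unique and is_upper_semilattice(arrows)

def is_diverging_tree(arrows):
    return is_converging_tree(reverse(arrows))

def is_string(arrows):
    return is_converging_tree(arrows) and is_diverging_tree(arrows)

tree = {("a", "c"), ("b", "c"), ("c", "e"), ("d", "e")}
print(least_upper_bound(tree, "a", "d"), is_converging_tree(tree))
```

Reversing every arrow plays the part of holding the page upside down: the lower-bound and diverging-tree tests are the upper-bound and converging-tree tests applied to the reversed network.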



Consider, now, a network whose nodes fall into pairwise disjunct subsets $K_i$, $i = 1, 2, \ldots, m$, where: every pair of nodes of a single subset is connected by a double-headed arc; an arrow leads from every node of subset $K_i$ to each node of subset $K_{i+1}$, for $i = 1, 2, \ldots, m-1$. This could be very messy to draw, so we introduce some conventions: we put all the nodes of a subset in a column, and we let the columns come one after another from left to right; we omit the arcs and arrows, which are predictable from the arrangement. Then we omit the nodes too, just retaining the labels. We then have a stepmatrix. Thus, a stepmatrix is (or can be represented as) a network of a certain kind over an appropriate alphabet.
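The claim that the arcs and arrows of a stepmatrix are predictable from the arrangement can be made concrete. In this sketch a stepmatrix is assumed to be a list of columns of labels, and a node is identified by its (column, row) position; both conventions and both function names are hypothetical.

```python
def stepmatrix_to_network(columns):
    """Expand a stepmatrix into labelled nodes and arrows."""
    nodes = {(i, k): label
             for i, col in enumerate(columns) for k, label in enumerate(col)}
    arrows = set()
    for (i, k) in nodes:
        for (j, l) in nodes:
            if i == j and k != l:
                arrows.add(((i, k), (j, l)))   # same column: double-headed arc
            elif j == i + 1:
                arrows.add(((i, k), (j, l)))   # arrow into the next column
    return nodes, arrows

def network_to_stepmatrix(nodes):
    """Drop arcs and nodes again, keeping only the labels in columns."""
    width = 1 + max(i for i, _ in nodes)
    columns = [[] for _ in range(width)]
    for (i, k) in sorted(nodes):
        columns[i].append(nodes[(i, k)])
    return columns

columns = [["a", "b"], ["c"], ["d", "e"]]
nodes, arrows = stepmatrix_to_network(columns)
print(len(nodes), len(arrows))
```

A double-headed arc is stored as two opposite arrows, so the three-column example has five nodes and eight arrows; converting back recovers the original columns.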


6.2. Conversion Grammars.

A conversion grammar is a system $G(\mathbf{C}, \mathbf{L}, g)$, where:

(1) $\mathbf{C}$ is a set, at most denumerable, of inputs $C_i$. Each $C_i$ is a finite network over a finite alphabet R of characters $R_i$.

(2) $\mathbf{L}$ is a set, non-null but at most denumerable, of outputs $L_i$. Each $L_i$ is a finite network over a finite alphabet T of characters $T_i$.

(3) g is a surjection with domain $\mathbf{C}$ and range $\mathbf{L}$; that is, for any C, g(C) is a unique L. Furthermore, for a fixed L, the set of inputs $\{C_j\}$ such that $g(C_j) = L$ is finite.

In addition, any conversion grammar must be either input-monitored or output-monitored (not both):

(3a) G is input-monitored if, when confronted by any finite network C over the alphabet R, one can tell without computing g(C) whether or not $C \in \mathbf{C}$.

(3b) G is output-monitored if, when confronted by a finite network C, one must compute g(C) = L and inspect L in order to tell whether or not $C \in \mathbf{C}$. This means also that, when confronted by a finite network L over the alphabet T, one can tell whether or not $L \in \mathbf{L}$ without searching for a C such that g(C) = L.

From the definition of g, the cardinality of $\mathbf{C}$ must be at least as great as that of $\mathbf{L}$; in the cases that interest us, both sets are denumerably infinite. By virtue of the definition of g and the fact that the grammar is either input-monitored or output-monitored, the grammar specifies exactly what networks over R belong to $\mathbf{C}$ and also exactly what networks over T belong to $\mathbf{L}$. The inverse of a conversion grammar $G(\mathbf{C}, \mathbf{L}, g)$ is a system $G^{-1}(\mathbf{L}, \mathbf{C}, g^{-1})$ such that, if g(C) = L, then $g^{-1}(L) = C$.
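A toy instance may help fix the definition. Everything in the following sketch is hypothetical: the alphabet, the particular conversion g, and the length limit are invented for illustration, and the inputs are simple strings (a special kind of network). The grammar is input-monitored because membership in the input set is decided by is_input without computing g.

```python
from itertools import product

ALPHABET_R = "ab"   # hypothetical input alphabet

def g(c):
    """Toy conversion: forget the order of the input characters."""
    return "".join(sorted(c)).upper()

def is_input(c):
    """Input monitoring: decide membership without computing g(c)."""
    return 1 <= len(c) <= 3 and set(c) <= set(ALPHABET_R)

def preimages(inputs):
    """For each output L, collect the finite set of inputs C with g(C) = L."""
    inverse = {}
    for c in inputs:
        inverse.setdefault(g(c), set()).add(c)
    return inverse

inputs = ["".join(p) for n in (1, 2, 3) for p in product(ALPHABET_R, repeat=n)]
inverse = preimages(inputs)
outputs = set(inverse)   # g is a surjection onto this set by construction

print(len(inputs), len(outputs), sorted(inverse["AB"]))
```

Here two distinct inputs, ab and ba, are converted into the same output AB, so the inverse relates one output to several inputs.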


The inverse of a grammar is not in general a grammar, since g need not be injective; if g is in fact not injective, then $g^{-1}$ is not a function. We see that the inverse of an input-monitored grammar is output-monitored, and that the inverse of an output-monitored grammar is input-monitored.

The class of conversion grammars is clearly not empty. We take the following examples to be input-monitored. A linear generative grammar as defined in §2.1 is a conversion grammar in which all inputs and all outputs are (simple) strings. A linear grammar with stepmatricial output, or a stepmatricial grammar (§3), has strings as inputs, stepmatrices as outputs. A binary tree grammar allows simple strings (under the ordered pair procedure) or binary converging trees (under either procedure) as inputs, and can be adjusted for either simple strings or stepmatrices as outputs.

Some valid examples of conversion grammars are not what we would ordinarily call ‘grammars’. The procedure set forth in §5.4 for the reversible conversion of a certain class of rule trees into rule strings fits our definition of a conversion grammar; in this case, by exception, the inverse of the grammar is itself a grammar (output-monitored), since the function g is bijective.

There is no particular reason to doubt the existence of many other types of conversion grammar, and these should be investigated, not only for the abstract pleasure of the exploration but because we might well find a type that will fit languages more neatly than any so far proposed. For this investigation we have a few guidelines. From §3, we can conclude that outputs should be stepmatrices, except that for at least some written languages simple strings may do as well. For applications in linguistics, then, there is no reason to explore more bizarre types of output geometry.

Input is another matter. That speech carries meaning, at least sometimes, is not worth arguing. That a language quantizes the world that is being talked about is true almost by definition. But beyond this, our empirical information does not point clearly to strings, nor to converging trees, nor to any other specific type of network as the appropriate type for input geometry. For that matter, despite the great generality of networks, it is an act of faith to assume that inputs can be viewed as networks of any sort. But we shall assume this, in order to carry through two specific investigations.

6.3. Generalized Rewrite Rules.

The first investigation has to do with rewrite rules in the enlarged frame of reference. Rewrite rules can obviously be used in a conversion grammar, but if they are then there are certain very specific constraints on input geometry.

We should know what these constraints are, since if it should turn out that empirical data suggest a type of input geometry that does not satisfy the constraints, it would follow that rewrite rules would have to be abandoned. After all, rewrite rules were not invented with a view to their use as input. We have found that they can be so used, but we have no guarantee, only the hope, that they can be made to correlate with meanings in the way we wish.

So far, we have defined two kinds of rewrite rules (in terms of their geometric properties in rule arrays: §5.3). An $R^1$ operates on a single instring and yields a single outstring. An $R^2$ operates on a pair of instrings but, like an $R^1$, yields a single outstring. There is a simple generalization. For fixed positive integers m and n, we define a (generalized) rewrite rule $R^{ij}$ ($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$) as one whose operand is an ordered i-ad of strings and whose yield is an ordered j-ad. Each of the j outstrings, of course, must be a specified function of the i instrings.

A network of such generalized rewrite rules must yield a single terminal string; this means that there must be a single terminal rule in the network, and that it must be of the type $R^{i1}$. There might be any finite number of initial rules in the network. We lose no generality by assuming Postulate T5 (§5.4), so that a rule that occurs initially in a network can occur nowhere else. Further, an initial rule must be of the type $R^{1j}$. j arrows lead from each initial rule to some non-initial rule, i arrows lead to the single terminal rule. Otherwise, i arrows lead to and j arrows lead from each participating rule $R^{ij}$. Figure 36 shows a rule network that conforms to the specifications just outlined.

Figure 36. The subscripts on the nodes indicate the rule-type.
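The arrow-count conditions on a network of generalized rewrite rules can be stated as a small checker. This is a sketch under assumptions: each rule is given with a hypothetical type pair (i, j) standing for $R^{ij}$, arrows are (from, to) pairs, and a network consisting of a single rule is taken to require type (1, 1).

```python
from collections import Counter

def check_rule_network(rule_types, arrows):
    """rule_types: rule name -> (i, j); arrows: iterable of (source, target)."""
    indeg, outdeg = Counter(), Counter()
    for src, dst in arrows:
        outdeg[src] += 1
        indeg[dst] += 1
    terminals = [r for r in rule_types if outdeg[r] == 0]
    if len(terminals) != 1:
        return False                       # exactly one terminal rule is required
    for rule, (i, j) in rule_types.items():
        if indeg[rule] == 0 and outdeg[rule] == 0:
            ok = (i, j) == (1, 1)          # a network of one rule (assumed case)
        elif outdeg[rule] == 0:
            ok = j == 1 and indeg[rule] == i     # terminal rule: i arrows lead in
        elif indeg[rule] == 0:
            ok = i == 1 and outdeg[rule] == j    # initial rule: j arrows lead out
        else:
            ok = indeg[rule] == i and outdeg[rule] == j
        if not ok:
            return False
    return True

# One initial rule of type (1, 2), two medial rules of type (1, 1),
# and a terminal rule of type (2, 1).
types = {"fork": (1, 2), "left": (1, 1), "right": (1, 1), "join": (2, 1)}
arrows = [("fork", "left"), ("fork", "right"), ("left", "join"), ("right", "join")]
print(check_rule_network(types, arrows))
```

The check covers arrow counts only; it does not verify the absence of loops or that each outstring is a proper function of the instrings.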

It will be noted that all arrows are numbered. This is necessary in the generalized rewrite-rule case, since multiple outstrings from a rule are by definition ordered and there must be some device to control traffic from one rule to another. The numbers on the arrows are a kind of apparatus not mentioned in our definition of a network (§6.1). But each arrow bearing a number can be replaced by a pair of arrows and a node, the node bearing the number as label. The nodes thus labelled can refer to ‘bogus’ rules that do no true rewriting but merely direct traffic properly. Thus, a network with numbered arrows can be replaced by one without any such ancillary apparatus.5 The class of networks admissible if the system uses generalized rewrite rules is not more inclusive than the class of all (finite) partially ordered sets with a universal upper bound.

Now let us say that a set of networks is linearizable if one can map any network of the set into a string whose labels are just those of the nodes of the network, in a way that permits unique recoverability. In §5.4 the set of all binary converging trees with ordered initial nodes (and with initial nodes bearing labels that cannot label non-initial nodes) was shown to be linearizable. Now, of the whole class of networks allowed by generalized rewrite rules, only a certain subclass is linearizable by the procedure described in §5.4, which seems to be the only obvious procedure.6 That subclass is exactly the set of all m-ary converging trees. Such a network admits only rewrite rules of the types $R^{i1}$. Figure 37 shows such an m-ary converging tree with m > 2, together with the ‘linearization curve’, the string into which it can be mapped, and the steps for recovery. If we try to draw a linearization curve in the network of Figure 36, we find no way to reach some of the nodes because arrow shafts are in the way; such an obstacle is never encountered in a converging tree. It is obvious that every m-ary converging tree is a partially ordered set with a universal upper bound, and equally obvious that the converse of this statement is false.

We have, then, these results: (1) If a conversion grammar is to use rewrite rules as input elements, an input must be a partially ordered set with a universal upper bound; no more general type of network will serve. (2) If, in addition, inputs are to be linearizable, they must be m-ary converging trees.

5 When this was first published I thought that the point made in the text was a simple generalization of the unordered pair procedure of §5.1. It turns out to be somewhat more complicated: see the Appendix.

6.4. An Application of Generalized Rewrite Rules.


Although the substance of the preceding section was developed with no practical application in view, one turned up almost immediately. As is well known, one of the dilemmas of transformational theory has been how to handle the phenomenon of multiple cross-subclassification of the forms of some major class.7

6 Any finite network, of course, permits what we may call a linear description: (1) list the node labels as row-headings and as column-headings of a square table, using the same order for both listings and repeating each node label as often as it occurs in the network; (2) if an arrow leads from node x to node y, put a ‘1’ in the intersection of the xth row and the yth column, otherwise a ‘0’; (3) copy the headings and entries out following successive northeast-to-southwest diagonals. For example:

N a b c
a 0 1 1
b 1 0 0
c 0 0 0

yields: Naab0bc11c100000. Recoverability is obvious. But this procedure (like various more or less transparent alternatives) does not conform to our definition of linearizability, since the string includes labels that do not occur as node labels.

Figure 37. An m-ary converging tree with its linearization curve, the string into which it can be mapped, and the steps in the recovery of the tree (all arrows point to the right; heads and nocks are omitted for clarity). ‘I’ = initial $R^{11}$; ‘S’ = noninitial $R^{11}$; ‘D’ = $R^{21}$; ‘T’ = $R^{31}$; ‘Q’ = $R^{41}$.
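The linear description of footnote 6 is easily mechanized. The sketch below assumes single-character node labels and represents arrows by the positions of their end nodes in the label list (so that repeated labels cause no trouble); the function names and the corner character argument are hypothetical.

```python
import math

def linear_description(labels, arrows, corner="N"):
    """Build the bordered table and read it off along NE-to-SW diagonals."""
    n = len(labels)
    table = [[corner] + list(labels)]
    for x in range(n):
        row = ["1" if (x, y) in arrows else "0" for y in range(n)]
        table.append([labels[x]] + row)
    out = []
    for k in range(2 * n + 1):                        # k = row index + column index
        for r in range(max(0, k - n), min(k, n) + 1): # top right to bottom left
            out.append(table[r][k - r])
    return "".join(out)

def recover(description):
    """Invert linear_description, returning the labels and the arrows."""
    n = math.isqrt(len(description)) - 1
    table = [[None] * (n + 1) for _ in range(n + 1)]
    chars = iter(description)
    for k in range(2 * n + 1):
        for r in range(max(0, k - n), min(k, n) + 1):
            table[r][k - r] = next(chars)
    labels = table[0][1:]
    arrows = {(x, y) for x in range(n) for y in range(n)
              if table[x + 1][y + 1] == "1"}
    return labels, arrows

labels = ["a", "b", "c"]
arrows = {(0, 1), (0, 2), (1, 0)}     # a -> b, a -> c, b -> a
print(linear_description(labels, arrows))   # Naab0bc11c100000
```

Applied to the footnote's table, the function reproduces the string given there, and recover returns the original labels and arrows.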

A simple example is Potawatomi noun stems (N), which are either animate (An) or inanimate (In) in gender (Gn), and also either independent (Ind) or dependent (Dep) in dependency (Dp). If rules can be only of types $R^{11}$ and $R^{21}$, as in a linear or binary tree grammar, the situation is as follows. One could provide for the whole situation in a single composite rule subsuming four elementary rules:8

PR1. N → NAnInd, NAnDep, NInInd, NInDep.

This is undesirable because it conceals structure: two different elementary rules correlate, as it were, with An, two with In, two with Ind, and two with Dep. An alternative, more often chosen in current practice, is as follows:

PR2. N → NAn, NIn
PR3. NAn → NAnInd, NAnDep
PR4. NIn → NInInd, NInDep.

Here each listed composite rule subsumes two elementary rules, giving a total of six. There is now a single elementary rule to correlate with An, and a single one for In. But there are still two each for Ind and Dep. Furthermore, the ordering is arbitrary. There is no way to choose between PR2-PR4 and PR5-PR7:

PR5. N → NInd, NDep
PR6. NInd → NAnInd, NInInd
PR7. NDep → NAnDep, NInDep.

7 The procedure developed here stems from R. P. Stockwell’s (unpublished) notion of a fork rule.

8 I am not sure how the procedure developed here would fit into the rest of the grammar. I am not sure whether my notations such as NAnInd are single characters or strings of characters; perhaps, indeed, one needs to use a componential alphabet so that NAnInd (and the like) can be a single character but with components susceptible to separate manipulation. Compare Chomsky’s ‘syntactic features’, Chomsky 1965.



With generalized rewrite rules, we can handle this situation more neatly. First, we need a single (elementary) rule of type $R^{12}$:

PR8. N → (N Gn, N Dp).

Note that the parentheses to the right of the arrow are not italicized. They are not brackets included in the alphabet A as auxiliary characters (§2.2 (3)), but the curved parentheses that signal that what appears between denotes the elements of an ordered set (§1.1). That is, the yield of rule PR8 is an ordered pair of outstrings: in the first, N has been expanded into N Gn, while in the second the same N has been expanded into N Dp. Next, we need four rules of type $R^{11}$, which can be formulated as two composite rules:

PR9. Gn → An, In in the env N_
PR10. Dp → Ind, Dep in the env N_.

Finally, we need a single rule of type $R^{21}$; note, again, the nonitalicized parentheses, here to the left of the arrow:

PR11. (N X, N Y) → N X Y

where X and Y are variables, X being An or In, Y being Ind or Dep. Instrings acceptable to PR9 and PR10 are generated, we assume, by no rule except PR8. Paired instrings acceptable to PR11 are generated by no rules except PR9 and PR10. PR8 is obligatory: there is (presumably) no way to eliminate the nonterminal character N except via PR9-PR11. PR11 is obligatory in a similar way. But the four elementary rules subsumed by PR9-PR10 are not individually obligatory. Hence we have four non-obligatory elementary rules matching four alternatives: one each for An, In, Ind, and Dep. The selection of gender is not given any artificial priority over that of dependency, nor vice versa.

In a more general way, suppose that there is a major class of forms N in some language, subject to r intersecting subclassifications $\{G_k\}_r$ (‘G’ for ‘generic category’), and that there are $n_k$ classes (‘specific categories’) $\{S_{k i_k}\}_{n_k}$ in generic category $G_k$.


Without generalized rewrite rules, we should need $n_1$ elementary rules for the introduction of the first generic category, $n_1 n_2$ for the introduction of the second, and so on, ending with $n_1 n_2 \cdots n_r$ for the last. The total number is thus

$n_1 + n_1 n_2 + \cdots + n_1 n_2 \cdots n_r = n.$

This total depends on the order in which the generic categories are introduced, and can be minimized by ordering them (with appropriate relabeling) so that $n_1 \le n_2 \le \cdots \le n_r$. In the Potawatomi sample, where r = 2 and $n_1 = n_2 = 2$, either order minimizes, and the total number required is 6. Whichever generic category is put last, in the general case, choice of any specific category from that generic category will correlate with $n_1 n_2 \cdots n_{r-1}$ elementary rules.

With generalized rewrite rules, we need, first, a single rule of type $R^{1r}$:

GR1. $N \to (N\,G_1, N\,G_2, \ldots, N\,G_r)$.

Next, there must be r sets of rules, all of type $R^{11}$; the kth of these sets can be formulated as a single composite rule that subsumes $n_k$ elementary rules:

GR2. $G_k \to S_{k1}, S_{k2}, \ldots, S_{k n_k}$ in the env N_.

The total number of elementary rules in these sets is clearly $n_1 + n_2 + \cdots + n_r = n'$. Finally, we must have a single rule of type $R^{r1}$:

GR3. $(N\,X_1, N\,X_2, \ldots, N\,X_r) \to N\,X_1 X_2 \cdots X_r$,

where $X_j$, $j = 1, 2, \ldots, r$, is a variable with domain $\{S_{j i_j}\}_{n_j}$. Thus the total number of elementary rules required is $n' + 2$. It is easy to see that for almost all possible values of the $n_k$, n is very much greater than $n' + 2$. This shows the greater efficiency of the generalized rewrite rule procedure. In addition, the use of generalized rewrite rules has the advantage that there is just one non-obligatory elementary rule for each specific category of each generic category.
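The two rule counts are easy to compare numerically. The sketch below computes n and n' + 2 from a list of category sizes, and also enumerates the forms obtained by one choice per generic category; the function names are hypothetical, and the three-category size list [2, 3, 4] is an invented example.

```python
from itertools import product
from math import prod

def rules_without_generalized(sizes):
    """n = n_1 + n_1 n_2 + ... + n_1 n_2 ... n_r, for the given order."""
    return sum(prod(sizes[:k]) for k in range(1, len(sizes) + 1))

def rules_with_generalized(sizes):
    """n' + 2, where n' = n_1 + n_2 + ... + n_r."""
    return sum(sizes) + 2

def forms(major, categories):
    """All outputs N X_1 ... X_r: one specific category per generic category."""
    return [major + "".join(choice) for choice in product(*categories.values())]

potawatomi = {"Gn": ["An", "In"], "Dp": ["Ind", "Dep"]}
sizes = [len(v) for v in potawatomi.values()]
print(rules_without_generalized(sizes), rules_with_generalized(sizes))  # 6 6
print(forms("N", potawatomi))

print(rules_without_generalized([2, 3, 4]), rules_with_generalized([2, 3, 4]))  # 32 11
print(rules_without_generalized([4, 3, 2]))                                     # 40
```

For the Potawatomi sample the two procedures happen to need the same number of elementary rules (six), so the gain there lies in the one-to-one match between non-obligatory rules and specific categories; with three categories of sizes 2, 3, and 4 the counts are already 32 against 11, and introducing the largest category first raises the former to 40.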




6.5. The Stratificational Model.

Our next investigation (§6.2, end) has to do with Lamb’s stratificational model.9 Figure 38 shows one of Lamb’s networks of semons. In the stratificational model, such networks are proposed as inputs. Ignoring, for the moment, the labels on the nodes, we note that although the network is a partially ordered set, there is no universal upper bound. This is perhaps clearer in Figure 39, where the same network is deformed so that all the arrows point in the same general direction.

Figure 38. A network of semons for ‘The man caught the tiger.’

Figure 39. The network of Figure 38 redrawn so that all arrows point in the same general direction. Node labels: the, adult, human, male, being, decl, agt, past, (catch), do, gl, (tiger), (cat), (mammal), being, the.

9 Lamb 1961, 1964a, 1964b, 1965?, forthcoming; Gleason 1964; Sgall 1964, White 1964.

If my own explorations of the stratificational model (§§6.7-8) can be relied on, then semon networks are always partially ordered sets but rarely, if ever, have universal upper bounds. In the light of the discussion of §6.3, the inference is clear. A conversion grammar designed to accept semon networks cannot use rewrite rules. Stratificational grammars are not rewrite grammars in disguise, as some have suspected, but an essentially different genus.

We can best get at the essential differences if we first consider a little more closely just how the rewrite format manages to meet conversion grammar requirements. A conversion grammar must have inputs: in the rewrite format, inputs are arrays of rewrite rules. A conversion grammar must have outputs: in the rewrite format, outputs are arrays of characters in the terminal subalphabet T. A conversion grammar must have the function g: in the rewrite format, that function is specified by the detailed structure of the individual rewrite rules. A conversion grammar must be monitored: we take a rewrite grammar to be input-monitored, in that the detailed structure of the individual rewrite rules specifies exactly what networks of rules are acceptable as inputs.

The detailed structure of the individual rules, be it noted, is the only locus of occurrence of the nonterminal characters of the alphabet A (or Q). What, then, is the role of these nonterminal characters? They do not (or, at least, need not) match anything in the language of which the grammar purports to be a grammar. They are accounting symbols, arbitrary constants, dummy variables, traffic signals, descriptive conveniences, whose sole role is to specify how elements of input can be assembled and how such assemblies lead to outputs.
A rewrite rule is two things at once: it is a character of an input alphabet, and it is a specification of a mapping of strings (or stepstrings) into strings (or stepstrings). As a specification of a mapping, a rewrite rule tells us two things: (1) the possible positions of the particular rule in input arrays, and (2) the effect of the rule, in any position where it can occur, on the terminal string being generated. As an input unit, a rewrite rule should also (3) correlate in some sensible way with meaning.

Here, then, are three considerations. Let us try to associate them in a different way.

First, let us stop using the term ‘rule’ when we mean an element of input, and instead use Lamb’s term ‘semon’, even with reference to the rewrite rule format. In this format, then, an input is an array of semons, and each semon refers to a specific rule. This change is so purely terminological that, if we were not intending to move outside the rewrite format, it would be laughable. For there are the same number of semons and of rules: we are just using different names for a coin depending on which side is up. But this will cease to be so with our other proposals.

Second, let us try to make semons match meanings.

Third, let us try to make input geometry a function of the participating semons themselves, not something determined by the rules to which the semons refer. Suppose we have a finite collection of semons, not necessarily all different. The semons of the collection have inherent valences, by virtue of which, if the collection is compatible, they will ‘spontaneously’ assemble themselves into one or another valid network; for a given compatible collection, the number of distinct valid networks must be finite. If the collection permits no valid network, it is incompatible. The analog is obviously chemistry: given two carbon atoms and six hydrogen atoms, where atoms of the same element are indistinguishable, the only valid network is that for ethane, in which the two carbon atoms are bonded to each other and each carries three of the hydrogen atoms. However, the analog is vague: semons must have valences that differ not only in number but also qualitatively, having specific affinities for semons of certain other types.

Fourth, instead of using rewrite rules as the rules to which semons refer, let us try to use context-sensitive realization rules, like those discussed in §§3.7-8. That is: any semon in a given environment of other semons has an unambiguous realization; there are approximately10 as many composite realizational rules as there are semons; and each such composite rule consists of an ordered set of simpler realizational rules to provide for environmentally conditioned differences in the realization.

These alterations take us from the rewrite format to the stratificational format.
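The chemical analog of valence-driven assembly can be worked out by brute force. The following sketch assumes purely numerical valences (four for carbon, one for hydrogen), allows multiple bonds between a pair of atoms as in an s-graph, and counts valid networks up to renaming of atoms of the same element; it does not model the qualitative affinities that semons would need, and every name in it is hypothetical.

```python
from itertools import permutations, product

def valid_networks(atoms, valence):
    """Connected bond networks (up to relabeling) that saturate every valence."""
    n = len(atoms)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]

    def assignments(idx, remaining, bonds):
        if idx == len(pairs):
            if remaining[-1] == 0:
                yield dict(bonds)
            return
        i, j = pairs[idx]
        for m in range(min(remaining[i], remaining[j]) + 1):
            if j == n - 1 and remaining[i] != m:
                continue      # atom i has no later pairs: it must be saturated now
            remaining[i] -= m
            remaining[j] -= m
            bonds[(i, j)] = m
            yield from assignments(idx + 1, remaining, bonds)
            remaining[i] += m
            remaining[j] += m

    def connected(bonds):
        adj = {i: set() for i in range(n)}
        for (i, j), m in bonds.items():
            if m:
                adj[i].add(j)
                adj[j].add(i)
        seen, stack = {0}, [0]
        while stack:
            for v in adj[stack.pop()]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return len(seen) == n

    groups = {}
    for i, a in enumerate(atoms):
        groups.setdefault(a, []).append(i)

    def canonical(bonds):
        """Smallest bond list over all renamings within each element."""
        best = None
        for perms in product(*(permutations(g) for g in groups.values())):
            f = {}
            for g, p in zip(groups.values(), perms):
                f.update(zip(g, p))
            key = tuple(sorted((min(f[i], f[j]), max(f[i], f[j]), m)
                               for (i, j), m in bonds.items() if m))
            if best is None or key < best:
                best = key
        return best

    start = [valence[a] for a in atoms]
    return {canonical(b) for b in assignments(0, start, {}) if connected(b)}

ethane = valid_networks(["C", "C"] + ["H"] * 6, {"C": 4, "H": 1})
print(len(ethane))   # 1
```

A collection of one carbon atom and three hydrogen atoms yields no network at all, which corresponds to an incompatible collection in the sense of the text.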

At the risk of becoming tiresome, we must summarize and underscore the differences, which are widely misunderstood. It seems to me that the misunderstanding stems largely from the fact that the same word ‘rule’ is used in such strikingly different ways. To show this, we shall temporarily eject the term ‘semon’ from the terminology of the rewrite model, into which we introduced it just a moment ago; the proper parallelism between ingredients in these two kinds of grammars is then as follows:

input elements: in rewrite grammars, RULES (nonobligatory rewrite rules); in stratificational grammars, semons.

internal machinery for specification of g: in rewrite grammars, auxiliary characters and obligatory rewrite rules; in stratificational grammars, RULES (realizational) and intervening strata.

outputs: simple strings, or stepmatrices, in both.

Now let us restore the term ‘semon’ to the terminology for any conversion grammar intended to be a grammar for a language. Recall the three considerations set forth earlier in this section. In either a rewrite grammar or a stratificational grammar we have semons, and the semons refer to rules. In a rewrite grammar, a semon correlates with meaning (consideration 3); the associated rewrite rule specifies input geometry (consideration 1) and the effect of the semon on output (consideration 2).

In a stratificational grammar, a semon correlates with meaning, but it also specifies input geometry, so that the associated realizational rule need only provide for the effect of the semon on output.

10 I say ‘approximately’ because it may be convenient to allow some rules to operate on small sets of semons in fixed arrangement rather than on individual semons. Compare §3.8, where there are 25 morphons but 24 composite rules.

6.6. Architectonic Comparison of the Models.

The preceding section covers the fine-detail differences between the stratificational model and rewrite-rule grammars, but there are also some larger-scale comparisons to be made. In Lamb’s view (but setting it forth in our terminology), we should describe a stratificational grammar for a language as a conversion grammar consisting of three partial grammars $G_1 G_2 G_3$ in tandem. Inputs for $G_1$ are networks of semons; outputs are strings of elements called lexons.11 These latter are in turn inputs for $G_2$, whose outputs are strings of morphons; $G_3$ is then like our $G''$ of §3.0, converting morphon strings into phonon stepmatrices. $G_1$, the coupled pair $G_1 G_2$, and the coupled system $G = G_1 G_2 G_3$ are input-monitored conversion grammars. But $G_2$, $G_3$, and the coupled system $G_2 G_3$ need not be grammars at all, since they do not have to be either input-monitored or output-monitored: the specification of inputs to $G_2$ is provided for by $G_1$.

Let us now consider the most recent proposal by the transformationalists.12

The grammar proper takes the form of two partial grammars $G'_2 G'_3$ (the choice of subscripts is intentional); $G'_2$ and the coupled pair $G'_2 G'_3$ are conversion grammars, while $G'_3$, taken alone, need not be. $G'_2$ is input-monitored: inputs for $G'_2$ are networks of rewrite rules; outputs are strings of morphons; and from our extended discussion of §3 we might as well say that $G'_3 = G_3$. It is not assumed that there can be any very simple or direct correlation between either the input to or the output from $G'_2$ and meanings. Therefore, one associates with the grammar proper an interpretive system S which ties together sentences and meanings. It is customary to view S as attached to the grammar proper via outputs from $G'_2$. But this view is a consequence of the particular loose-limbed conception of $G'_2$, which allows distinct sentences to have the same ‘structural description’ except for alternative choices from a lexical choice rule. As we have developed rewrite grammars in this essay, no two distinct sentences can have the same ‘structural description’ (that is, the same input to $G'_2$), though two different inputs may lead to the same output. Therefore the proper place for the attachment of S is at the input end of $G'_2$. Furthermore, a ‘meaning’ is simply a network of semons. Hence we may say that, if C is any legal input to $G'_2$, then S associates C with one or more (but at most a finite number of) networks of semons. Inputs to S are not determined by S itself, but by $G'_2$. However, we may detach from $G'_2$ just exactly all those features which define the class of inputs acceptable to it (and hence also to S), and, instead, assign them to both S and $G'_2$. With this modification, $G'_2$ remains an input-monitored conversion grammar, while S becomes the inverse of an output-monitored conversion grammar, whose own inverse $S^{-1} = G'_1$ is therefore an output-monitored conversion grammar. We have, then, the following:

stratificational grammar: $G = {\to}\, G_1 G_2 G_3$,

transformational grammar: $G' = G'_1 \,{\to}\, G'_2 G'_3$,

where $G_1$ is an input-monitored conversion grammar, $G'_1$ an output-monitored conversion grammar, and the arrow means something like start here. The difference between an input-monitored and an output-monitored conversion grammar is that, although both map inputs into outputs algorithmically, in the former one knows whether a would-be input is legal or not without reference to output, whereas in the latter the acceptability of an input can be determined only by seeing whether the corresponding output is valid or not. This difference in the first partial grammar of the whole system is, then, the crucial architectonic difference between the stratificational model and the transformational.

11 Lexon strings may be bracketed; this is not clear.

12 See Chomsky and Miller 1963, Katz and Fodor 1963, and, especially, Katz and Postal 1964.

At least one other key difference stems from this first one. In the stratificational model, the ‘interpretive system’ corresponding to the grammar G is not merely the inverse $G_1^{-1}$ of the first partial grammar, but the whole inverse $G^{-1} = G_3^{-1} G_2^{-1} G_1^{-1}$; and this whole system $G^{-1}$ allows only trial-and-error determination of the acceptability of an input. In a moment we shall see the advantage of this.

In characterizing a stratificational grammar G of a language, we said that the partial grammars $G_2$ and $G_3$ need not themselves be conversion grammars, since inputs to $G_2 G_3$ are determined by $G_1$. But we only said ‘need not be’; we did not say that they could not be. Suppose we let $G_3$ be a conversion grammar such that the set of outputs from $G_2$ is a proper subset of the set of inputs acceptable to $G_3$. The grammar, as a theory of a language, can then hope to account for utterances that conform to the phonology of the language but are both ungrammatical and meaningless (or, more often, only in part grammatical or meaningful). Such utterances occur: ‘pocketa-pocketa-pocketa queep’ is nonsense, but English nonsense, not French or Swahili. We say that such utterances are generated within (or by) $G_3$ without input from, or with only partial control from, $G_2$. Similarly, let $G_2$ be a conversion grammar such that the set of outputs from $G_1$ is a proper subset of the set of inputs acceptable to $G_2$. The theory then accounts also








Chomsky’s famous example ‘Colorless green ideas sleep furiously’, or genuine (rather than especially coined) cases like ‘I want some hot-water juice and a lemon’. Furthermore, the theory has the requisite machinery with which to account for all manner of mixtures of these two types of nonsense. I think it is clear from our exposition of the very close architectonic similarity of transformational and stratificational grammars that the former could easily make these distinctions too.

We saw in §3.10 (2) that it may not be advisable to keep the partial grammar $G_3$ (we were then calling it G'') completely free of optional rules; instead, we may regard choices made within $G_3$ as characterizing stylistic differences. But now, in the stratificational model, we have not merely the original input to $G_1$ but also three successive conversions: from semons to lexons by $G_1$, from lexons to morphons by $G_2$, and from morphons to phonons by $G_3$.



If each of these three partial grammars is exactly a conversion grammar, then there is no room for slippage: stylistic variations have to be accounted for just by the machinery available for the explication of nonsense, namely by partially independent inputs to $G_2$ or to $G_3$. It is perhaps preferable to think of the three partial grammars not as conversion grammars but as almost conversion grammars. That is, the mapping $g_i$ for each partial grammar $G_i$ ($i = 1, 2, 3$) is almost a surjection, in the sense that it is very nearly many-to-one rather than one-to-many or many-to-many, but occasionally (for some inputs) not quite. This way of putting it is not very elegant, abstractly speaking; but I believe it can be ‘fixed up’ along lines of formalism touched on in §7.

However we may choose to formalize the informal loosening-up just described, it is clear that we have three distinct varieties of stylistic variation built into the theory, instead of just one. These are essentially available to both models, not just to the stratificational. Perhaps the proposed trichotomy of style will prove to be a theoretical embarras de richesse, but I doubt it.

There is one further possible distinction between the stratificational and transformational models that we must mention. This has to do with the apportionment of matters between $G'_1$ and $G'_2$ in the transformational model, between $G_1$ and $G_2$ in the stratificational.

Since $G'_2$ is, by definition, a rewrite-rule grammar, it presumably includes single-based and double-based transformations among its rules, or else bomb-planting rules as set forth in §5.5. $G_2$, on the other hand, operates with realizational rules which may not serve to distinguish valid from invalid inputs. An input to $G_2$ (a string of lexons) is going to look much like a string over the alphabet A generated by some partial network of rewrite-rules of $G'_2$, and the realizational rules of $G_2$ are not going to bring about any major reordering of the terms of that string comparable to that achieved by a transformation or by the details of the bomb-triggering rule. Thus, some of what we imagine being done by transformations or bomb-planting rules in the transformational model, within $G'_2$, will probably be done by the input geometry for and the realizational rules of $G_1$ in the stratificational model.
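The contrast drawn above between input-monitored and output-monitored conversion grammars, and the trial-and-error character of an inverse, can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the class names, the two-letter vocabulary, and the uppercase 'conversion' are hypothetical stand-ins, not anything proposed in either model.

```python
from typing import Callable, Iterable, List


class InputMonitored:
    """Legality of an input is decided without reference to any output."""

    def __init__(self, is_legal: Callable[[str], bool], convert: Callable[[str], str]):
        self._is_legal = is_legal
        self._convert = convert

    def accepts(self, x: str) -> bool:
        return self._is_legal(x)  # no conversion is attempted

    def __call__(self, x: str) -> str:
        if not self.accepts(x):
            raise ValueError(f"illegal input: {x!r}")
        return self._convert(x)


class OutputMonitored:
    """An input is acceptable only if the output it yields is valid."""

    def __init__(self, convert: Callable[[str], str], is_valid_output: Callable[[str], bool]):
        self._convert = convert
        self._is_valid_output = is_valid_output

    def accepts(self, x: str) -> bool:
        return self._is_valid_output(self._convert(x))  # must convert first

    def __call__(self, x: str) -> str:
        y = self._convert(x)
        if not self._is_valid_output(y):
            raise ValueError(f"input {x!r} yields invalid output {y!r}")
        return y


def interpret(grammar, candidates: Iterable[str], observed: str) -> List[str]:
    """Trial-and-error inverse: try each candidate input and keep the hits."""
    return [c for c in candidates if grammar.accepts(c) and grammar(c) == observed]


# Hypothetical toy data (assumption): a two-letter input vocabulary.
VOCAB = {"a", "b"}
g_in = InputMonitored(lambda s: set(s) <= VOCAB, str.upper)
g_out = OutputMonitored(str.upper, lambda y: y in {"AB", "BA"})
```

With these toy definitions, `g_in.accepts("ax")` can be answered by inspecting the input alone, whereas `g_out.accepts("aa")` requires running the conversion and inspecting the result; `interpret` has no better strategy than testing candidates one at a time.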




Semons and Semon Networks.

In §§6.5-6 I spoke of the stratificational model in somewhat glowing terms, yet tried to make clear the absence, so far, of any formal guarantee that stratificational grammars exist. There are, as a matter of fact, two by no means trivial problems in the stratificational approach. The partial grammar $G_1$ must be a conversion grammar of a certain species. Its input elements must be geometrically self-organizing: inputs, as we have seen (§6.5), may well be networks of some class more general than partially ordered sets with a universal upper bound. And $g_1$ must map such networks uniquely into strings of lexons. The problems, then, are these: (1) Can semons be set up in such a way that, in addition to correlating with meaning, they define the geometry of their own interconnections in acceptable networks? (2) Can realizational rules be devised that will map such general networks into bracketed strings? We shall discuss these problems in this section and the next.

There is a heuristic strategy in the search for semons. Consider the following three assertions: (a) X belongs to class Y; (b) X has property Y; (c) X consists of components, one of which is Y. The close kinship of (a) and (b) has long been known, and is reflected by the two possible ways of defining a set (§1.2): one may enumerate the members (if the set is finite), or one may state a membership criterion. The latter procedure defines a class Y in terms of a property Y. The former procedure defines a property in terms of a class, since property Y can be defined as the property of belonging to class Y. Now if we define a component Y, or perhaps merely the fact of containing the component Y, as a property Y, the three-way kinship is immediate.
A working principle in the search for semons and their potential interconnections is to examine sentences with full regard to their meanings and with clues in the form of all the traditional grammatical statements that might be made about them; whenever such an assertion is in form (a) or (b), it is converted to an equivalent assertion in form (c). For example, where the traditional grammarian would say that ‘man’ is a noun, which is to use form (a), we say that ‘man’ consists of components, one of which is noun (or, as Lamb has it in his networks, being). Where tradition says that ‘The man caught the tiger’ is a declarative sentence, we say that the sentence contains a component declarative.

Consider, similarly, the following two assertions: (d) X bears the relation R to Y; (e) X and Y are the first and third terms of an ordered triad (X, R, Y). Where tradition would use form (d), in the search for semons we convert to form (e). Tradition says that, in the sentence ‘The man caught the tiger’, ‘the man’ bears to ‘caught the tiger’ the relation of subject to predicate; we say, instead, that the sentence contains the triad ‘the man’, agent, and ‘caught the tiger’.

The components turned up by these working principles are what Lamb calls sememes. A semon is then a sememe that does not consist of smaller components. In Figure 38, the node-labels that are not enclosed in parentheses are supposed to name semons; those enclosed in parentheses name sememes the semonic structure of which is uncertain or, for the moment, unimportant.

Whenever analysis dissects a sememe into two smaller sememes, we represent each smaller sememe by a (labelled) node, and connect the nodes by an arc. (We also orient the arc by adding an arrowhead, but the criteria for this cannot be discussed until later in our exposition.) For example, recognizing that ‘The man caught the tiger’ contains a sememe declarative (= decl) implies this network:

    decl ──> (rest of sentence).

The network for ‘Did the man catch the tiger?’ would be the same except that the left-hand node would be labelled inter (= interrogative) instead of decl. Again, since ‘cat’ is a noun we have

[Network diagram; surviving node label: (cat)]

and since ‘black cat’ is a noun phrase we have

[Network diagram; surviving node label: (black)]

Calling on dictionary information (which is just as valid for our purposes as the sorts already mentioned), we observe that in some contexts the noun ‘man’ refers only to some adult male human; hence we have

[Network diagram for ‘man’; node labels include human and being]

The phrase ‘adult male human being’, on the other hand, is this:

[Network diagram for ‘adult male human being’; node labels include (being)]

where ‘being’ denotes an identified semon while ‘(being)’ denotes a sememe that remains unanalyzed. Likewise, when a sememe is recognized as being composed of X, R, and Y, we posit a node for each of the three, with the R in the middle:

    (the man) ── agt ── (killed the tiger);   (killed) ── gl ── (the tiger).
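The two conversions described above, from a class or property statement to a componency statement and from a relation statement to an ordered triad, amount to adding labelled nodes and arcs to a network. A minimal Python sketch follows. The names `Network`, `add_component`, and `add_triad` are hypothetical, and the arc orientation (away from the component or relator node) is an assumption made for the sake of having a definite data structure, since the criteria for arrowheads are discussed only later.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Network:
    labels: List[str] = field(default_factory=list)            # index = node id
    arcs: List[Tuple[int, int]] = field(default_factory=list)  # (source, target)

    def node(self, label: str) -> int:
        """Add a labelled node and return its id."""
        self.labels.append(label)
        return len(self.labels) - 1

    def arc(self, src: int, dst: int) -> None:
        """Add an arc from src to dst."""
        self.arcs.append((src, dst))


def add_component(net: Network, component: str, rest: str) -> Tuple[int, int]:
    """Form (c): a sememe consisting of `component` plus everything else."""
    c = net.node(component)
    r = net.node(rest)
    net.arc(c, r)  # orientation assumed: from the component toward the rest
    return c, r


def add_triad(net: Network, x: str, relation: str, y: str) -> Tuple[int, int, int]:
    """Form (e): an ordered triad (X, R, Y), with R as the middle node."""
    a = net.node(x)
    r = net.node(relation)
    b = net.node(y)
    net.arc(r, a)  # orientation assumed: from the relator toward each term
    net.arc(r, b)
    return a, r, b


net = Network()
add_component(net, "decl", "(rest of sentence)")
add_triad(net, "(the man)", "agt", "(killed the tiger)")
```

Parenthesized labels follow the convention of the text: they stand for sememes left unanalyzed, so a fuller analysis would replace such a node by a subnetwork of its own.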

Figures 40-61 give further and more elaborate examples; in every case, some sememes are left unresolved into semons, but that will not affect the argument. Note that the difference between declarative and interrogative is taken to be the presence at the same place in the network of two different semons (compare Figures 40 and 42), but that that between active and passive is interpreted as a difference of arrangement of exactly the same semons (Figures 40 and 41). Note also the handling of reflexive sentences, in Figures 52-53.

As was suggested in §6.5, semons cannot be self-organizing into networks unless they are of various valence types; this, in part, explains the arrowheads on the arcs in our examples. An arrow leading from the node for a semon constitutes a (positive) valence of that semon. We see that, purely as to the number of positive valences, the semons in our examples fall into three types:

I. Links: positive valence 2: agt, gl.
II. Kernels: positive valence 0: being, do.
III. Modifiers: positive valence 1: the, past, decl, sg, adult.

It is also necessary to indicate the specificity of each positive valence: that is, what sort of semon can appear at the arrowhead end of the arc. Thus, one valence from a link necessarily attaches itself to the being semon, and the other to the do semon, except that either of these may be replaced by another link (e.g., in Figure 50 there is an occurrence of gl with one arrow attached to an occurrence of agt instead of to an occurrence of being). Modifiers fall into a number of subclasses in this regard:

IIIA. d-modifiers (or adverbials), which can attach to a do or, perhaps, to another d-modifier, but to nothing else: past.
IIIB. b-modifiers (or adjectivals), which can attach to a being or to another b-modifier: adult.
IIIC. l-modifiers (or concorders), which can attach to a link: decl, sg.
IIID. Universal modifiers, which can attach to a being, to a do, or to a link: inter.

Of course, this is by no means sufficiently refined; but we are concerned only with how the system might be made to work, not with all the details for a specific actual language.

A semon, then, has a set of valences. A sememe larger than a semon also has a valence, consisting of the valences of the constituent semons insofar as they are not saturated (or ‘satisfied’) within the sememe. A sentence is then a sememe with a valence of zero, all valences of participating semons being saturated within the sentence.13
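The valence types and specificities just listed can be read as a small checking procedure: a network qualifies as a sentence only if every semon has exactly as many outgoing arcs as its positive valence, each ending on a permitted kind of semon. The Python sketch below is one way to spell that out, not a definitive formalization. The function names are hypothetical; the tables use only the semons named in the text, except that the attachment of the to being is an assumption, since the text does not assign the to a subclass.

```python
from typing import Dict, List, Tuple

KERNELS = {"being", "do"}   # positive valence 0
LINKS = {"agt", "gl"}       # positive valence 2
D_MODS = {"past"}           # adverbials
B_MODS = {"adult"}          # adjectivals

# Modifier -> kinds of semon permitted at the arrowhead end of its one arc.
MODIFIER_TARGETS = {
    "past": {"do", "d-mod"},
    "adult": {"being", "b-mod"},
    "decl": {"link"},
    "sg": {"link"},
    "inter": {"being", "do", "link"},
    "the": {"being"},  # assumption: not stated in the text
}


def kind(label: str) -> str:
    """Classify a semon label for the purpose of specificity checks."""
    if label in KERNELS:
        return label
    if label in LINKS:
        return "link"
    if label in D_MODS:
        return "d-mod"
    if label in B_MODS:
        return "b-mod"
    return "other-mod"


def positive_valence(label: str) -> int:
    if label in LINKS:
        return 2
    if label in KERNELS:
        return 0
    return 1


def all_valences_saturated(nodes: Dict[int, str], arcs: List[Tuple[int, int]]) -> bool:
    """True if every semon's positive valences are filled by permitted targets."""
    out: Dict[int, List[int]] = {n: [] for n in nodes}
    for src, dst in arcs:
        out[src].append(dst)
    for n, label in nodes.items():
        targets = [kind(nodes[t]) for t in out[n]]
        if len(targets) != positive_valence(label):
            return False
        if label in LINKS:
            a, b = targets
            # One arc to a being, the other to a do; either may be another link.
            ok = (a in {"being", "link"} and b in {"do", "link"}) or \
                 (b in {"being", "link"} and a in {"do", "link"})
            if not ok:
                return False
        elif label not in KERNELS:
            if targets[0] not in MODIFIER_TARGETS.get(label, set()):
                return False
    return True


# A deliberately simplified network (assumption), loosely after the examples.
nodes = {0: "decl", 1: "agt", 2: "being", 3: "do", 4: "the", 5: "past"}
arcs = [(0, 1), (1, 2), (1, 3), (4, 2), (5, 3)]
```

The check is purely local, node by node, which is the sense in which the semons are self-organizing; it does not test whether the network is connected, a condition a fuller treatment would presumably add.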

Any network into which any set of semons is allowed to organize itself by virtue of the inherent valences of the semons themselves must be an acceptable input for the grammar $G_1$.

Is valence avoidable? By a repeated application of the heuristic principles described at the beginning of this section, one might hope to bypass the need for recognizing sememes or semons of different valence types. It turns out, however, that any such effort leads to an infinite regress. Suppose, thus, that sememe X is of valence type Y. This is an assertion of type (a): it says that X belongs to class Y. We convert it into a componency statement: sememe X consists of components, one of which is Y. We have thus decomposed X into

    X' ── Y

where I have omitted the arrowhead because I don’t know where to put it, and where X' is everything in X except for Y. But now we have a sememe X', which must be of some valence type, say Z, defined by the fact that it can be linked to the semon Y. The threatening infinite regress is obvious. We stop the process, therefore, just when it has given us all the structure we need.

[Figure 44. Semon network for ‘The man shot the tiger.’]

[Figure 45. Semon network for ‘What did the man shoot?’]

[Figure 46. Semon network for ‘Who shot whom?’]

[Semon network for ‘Who did what to whom?’]




the(→ being)    the → dist(→ being)



Now suppose that, in inspecting a semon network, we find the expanded nominal [03]:

[Network diagram of the expanded nominal; surviving node label: human]

The realizational rules cited above tell us that the lexon string into which this sememe maps will involve the lexons λ/s/, λ/the/, and λ/man/. Obviously, we must also know the proper order for these three. We can make the specificities of valence yield this information: the modifiers in the expanded nominal all have a positive valence of 1, but the can be of one subtype, adult, male, and human of another; decl and pl are already of a separate type (concorders), whose influence on the nominal filters through the link agt. Thus the valence-controlled output will be λ/the man s/. (This is what we want: the mapping of λ/man s/ into μ/men/ is done by the partial grammar $G_2$.) In some cases the realizational rules appropriate for a particular expanded nominal or verbal must map it into an ordered pair of lexon strings instead of a single one; it is in this connection that lexon strings turn out to be bracketed (at least in the process of deriving from semon networks).
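The idea that valence subtypes fix the order of the lexons can be illustrated with a toy sketch in Python. Everything here is hypothetical and labelled as such: the subtype names, their ranking, and the three realizational rules are invented for the illustration and are not the rules referred to in the text, which are not reproduced in this passage.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical valence subtypes (assumption) and the order they impose.
SUBTYPE_RANK = {"determiner": 0, "descriptive": 1, "concorder": 2}
SUBTYPE = {
    "the": "determiner",
    "adult": "descriptive",
    "male": "descriptive",
    "human": "descriptive",
    "pl": "concorder",
}

# Hypothetical realizational rules (assumption): bundle of semons -> lexon.
REALIZE: Dict[FrozenSet[str], str] = {
    frozenset({"the"}): "the",
    frozenset({"adult", "male", "human", "being"}): "man",
    frozenset({"pl"}): "s",
}


def realize_nominal(semons: Set[str]) -> List[str]:
    """Map the semons of an expanded nominal to an ordered lexon string."""
    remaining = set(semons)
    pieces = []
    for bundle, lexon in REALIZE.items():
        if bundle <= remaining:
            remaining -= bundle
            # A lexon is placed according to the subtype of its modifiers.
            rank = min(SUBTYPE_RANK[SUBTYPE[s]] for s in bundle if s in SUBTYPE)
            pieces.append((rank, lexon))
    if remaining:
        raise ValueError(f"no realizational rule covers: {sorted(remaining)}")
    return [lexon for _, lexon in sorted(pieces)]


lexons = realize_nominal({"the", "adult", "male", "human", "being", "pl"})
```

Under these assumptions the result is the list `["the", "man", "s"]`, matching the order λ/the man s/. The later replacement of λ/man s/ by μ/men/ is left to the next partial grammar and is not modelled here.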

Consider, from Figure 42, the expanded verbal

[Network diagram of the expanded verbal]

The output must be not λ/shoot ed/, as for the corresponding verbal [A1] of Figure 40, and not λ/do ed shoot/, but (λ/do ed/, λ/shoot/), because these two parts are destined to be separated when the lexon outputs for the participating expanded nominals and verbals are appropriately arranged. As another example, consider the network of Figure 52, where the reduced representation takes this form:


[Diagram of the reduced representation]

Here [03] is

[Network diagram; surviving node labels: sg, the, adult, male, human]

and the representation must be (λ/the man/, λ/himself/), in two parts also due to be separated in the completed lexon string. If a nominal is dominated by two relators, as in Figure 52, we may think of a step in which the two successive strings into which the nominal will be mapped are ‘detached’ and separated. Thus, letting


be that portion of the expanded nominal which is responsible for λ/the man/, and

be the portion responsible for λ/himself/, we can modify the reduced representation to 03