A Theory of Meter [Reprint 2016 ed.] 9783111352268, 9783110997316


English Pages 229 [236] Year 1965

Table of Contents
I. Introduction
II. The Nature of Rhythm
III. Phonological Backgrounds to Metrical Analysis
IV. Objective Analyses of Metrical Properties: A Survey
V. The Components of English Meter
VI. Shakespeare's Eighteenth Sonnet: An Experiment in Metrical Analysis
VII. The Function of Meter
Appendix. The Stress Systems of Kenneth Pike, and George L. Trager and Henry Lee Smith, Jr.














1965




© Copyright 1964 by Mouton & Co., Publishers, The Hague, The Netherlands. No part of this book may be translated or reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publishers.

Printed in The Netherlands by Mouton & Co., Printers, The Hague.


I wish to express my thanks to the following individuals with whom I have discussed metrics and related matters: Robert Beloof, W. Nelson Francis, Dwight Bolinger, Susan Ervin, Don Geiger, Martin Halpern, Einar Haugen, Peter Ladefoged, Ilse Lehiste, Samuel Levin, David Reed (and his students), Robert Stockwell (and his students), Francis Utley, and W. K. Wimsatt. I am also grateful to the twenty-one colleagues who served as guinea-pigs for the experiment reported in Chapter VI. My profoundest thanks go to Elizabeth Closs and Arthur and Sylvia Carson for helping me in many research capacities, and to Eleanor McGuffey, Ann Moskowitz, Fern Skellings, and Nancy Richard for helping with the manuscript. This research was supported by grants from the University of Pennsylvania and the University of California.







The recent revival of interest in English meter is a gratifying instance of how scholars will resume questions which have been answered in unsatisfactory ways when fresh evidence and techniques become available. Interest in metrics began to decline in the nineteen-thirties and was practically moribund in the forties. In the meantime, some structural linguists had begun to realize that English was at least as complex as Blackfoot or Zulu and as worthy of descriptive analysis. They were particularly successful in describing its sound system, or phonology, both the individual sounds represented (imperfectly) by our alphabet, and sound modifications, those features traditionally referred to as "stress", "speech tune", "accent", "pause", etc. So impressive were their results that it occurred to those colleagues who were also interested in literary problems to apply phonology to the analysis of meter. The attempt aroused interest, not all of it friendly, in the larger community of literary scholarship, and lively discussion ensued. Metrics had not witnessed such a flurry of activity in many years, and although the discussion has sometimes generated more heat than light, it has been on the whole profitable, particularly where fundamental principles were asserted or attacked. The present book is an attempt to demonstrate the utility of structural linguistics in developing a theory of English meter. The time seems ripe for an extended treatment of the subject, even though definitions of features crucial to the metrical construct like "stress" and "syllable" are by no means universally agreed upon among linguists. The point is that more sophisticated definitions exist today than we have ever had before, and the fact that still better analysis can be expected in the future should not blind us to the possibility of present gains. Actually it is by no means inevitable



that the use of linguistic method will disturb all our traditional ideas about English meter; if anything, it strengthens many of them by providing a stronger empirical proof for the judgments of generations of metrists. In addition to contributing accurate terminology and information about the basic elements of the metrical system, structural linguistics can also function as a model for metrical research. It has shown itself to be, par excellence, a most efficacious way of analyzing languages at fundamental levels, having developed a good deal of sophistication of technique in the few years of its history. Its concern with hypothesis-formation, accuracy of description as a goal, and simplicity, elegance and consistency as criteria for evaluating formulations - all of these deserve emulation in metrics. I wish to clarify at the outset the sense in which I mean the title "theory of meter". I am primarily concerned with systematizing the set of ideas which seem to me fundamental to metrical analysis and scansion. (I separate these terms for reasons which are discussed in Chapter V.) This set of ideas is quite small, but it has proved most troublesome in traditional metrics. My concern is with the elements of meter, where "elements" is meant not in the sense of "simplest" parts but in the sense of "most basic" properties. An analogy can be drawn with mathematics: the fundamental elements of mathematics are explained by set theory, but set theory is far from being "elementary". My reasons for not wanting to go beyond a discussion of meter's elements are two-fold. 1) These elements are linguistic entities and therefore explicable in terms of knowledge gained recently in linguistics. And 2) larger metrical phenomena, like line structure or stanza pattern, have been satisfactorily described and classified in the past; no new linguistic information will significantly alter, for example, the definition of the heroic couplet or the sonnet. 
Thus, "theory of meter", in the present context, means a theory of metrical elements, and I apologize in advance to anyone who has hoped to find the sort of exhaustive listing of metrical types available in Saintsbury's History of English Prosody or Enid Hamer's The Metres of English



Poetry.1 Very succinctly put, this book concerns the nature, arrangement and prominence (ictus) of the metrical event feature (the syllable), as the transmitter of the rhythmic impulse in verse. (At least this statement is true of Chapters II through VI; Chapter VII concerns a somewhat different topic, as will be explained below.) It is unfortunate but necessary that much of what I have to say in attempting to be precise about the nature of the metrical elements must be said in a language strange and perhaps offensive to students of literature. I wish it were not so, because I am anxious to help develop closer ties of interest and sympathy between linguistics and literary criticism. I can only echo the consternation expressed but accepted as inevitable by I. A. Richards in his Principles of Literary Criticism: ... the more obvious remarks are couched [in language which] may seem unnecessarily repellant. The explanation of much of the turgid uncouthness of its terminology is the desire to link even the commonplaces of criticism to a systematic exposition of psychology. The reader who appreciates the advantages so gained will be forgiving.2 I have done my best to keep barbarisms out of the discussion and to redefine old and honored terms (like "ictus") where possible. But linguistics is a technical subject, and so is (or should be) metrics; we must use technical terminology where definitional accuracy is necessary, because the consequences of not doing so are the breakdown of whatever rigor we may have built up in other ways. Metrics owes it to its own self-respect to define its terms and to stick to them even at the risk of losing stylistic polish. I must also explain why I have felt it necessary to synthesize fairly elaborately research in metrics and in various impinging fields like rhythmics, articulatory and acoustic phonetics, and suprasegmental phonology. 
In the case of metrics itself, I have outlined in detail the classic studies of the "objective" approach, that is, research which has attempted to measure in some way, either mechanically or linguistically, the elements of meter. To my knowledge, a survey of these materials has not recently been made;

1 (London, 1910) and (New York, 1930).
2 I. A. Richards, Principles of Literary Criticism (New York, 1930), p. 3.



there is a need to evaluate the methods and results of what is a substantial body of research. As for ancillary fields, my thought is simply to provide the interested reader with the kinds of information which should be incorporated in a viable theory of English meter. For those who ultimately disagree with my theory, I hope the discussion offered in Chapters II and III will be useful, at least as a point of departure. There has been no major summary of metrical research since T. S. Omond's great book of 1907,3 and it is time that we consolidate the gains that the twentieth century has made in this field. Like most metrists, I consider it axiomatic that meter is a species of rhythm, and so I start in Chapter II with a definition of rhythm based on psychological research. By establishing the nature of the rhythmic components and then inquiring into the sorts of linguistic elements that can function as rhythmic components, one preserves this fundamental axiom. In too many discussions of meter one finds confusion between rhythm and meter, and I felt it necessary to go to some lengths to define rhythm carefully so that it could effectively underpin the metrical definitions. Perhaps the most important observations reported in Chapter II concern modern views of "perception" and "conception", for various fallacies that have plagued metrics (like the assumption that feet are actually equal in duration) derive basically from a failure to keep these notions clear. Actually the mind is very elastic in interpreting sensory data, which it regularly over- and underestimates to suit its own predispositions. Perceiving itself turns out to be a categorizing process. The necessity of considering rhythmic fundamentals stems from the necessity of relating meter to non-literary phenomena, so that we may better understand it. Kenneth Burke has stated the point very clearly: The forms of art are not exclusively 'aesthetic'. 
They can be said to have a prior existence in the experiences of the person hearing or reading the

3 English Metrists... (London, 1907). There is, of course, Karl Shapiro's A Bibliography of Modern Prosody (Baltimore, 1948), which contains short descriptions of titles listed.



work of art. They parallel processes which characterize his experiences outside of art.4 Chapter III amounts to a short course in English phonology for metrists. (Professional linguists will hopefully excuse what is elementary in this account.) A detailed attempt is made to find and delineate the linguistic features which correspond to the two metrical elements, the event (i.e., the recurring unit) and its prominence (which I refer to as "ictus" to prevent confusion with linguistic terms). The metrical events correspond to linguistic syllables, and these are considered from a number of different points of view: articulatory, acoustic, and phonological (as a function of the combining property of segmental phonemes). I proceed to a consideration of the problem of the nature of ictus, the most difficult, I think, in metrical theory. Here it is necessary to examine very carefully terms like "stress" and "accent", to discover, if we can, their linguistic nature and to ascertain whether they are to be identified, in simple fact, as the signals for metrical ictus. I take up the physiological explanations of syllable prominence, the acoustic correlates (both physical concepts like amplitude and frequency and psychological percepts like loudness and pitch), and finally the way in which these sound features operate linguistically, as parts of phonemic systems. I present in Chapter III and in Appendix A accounts of three metrically significant formulations of English stress and related phenomena, one of which forms the phonological basis for my metrical theory. Chapter IV is a survey of scientifically oriented metrical research. It follows a twofold division. The first part evaluates instrumental research on meter from the turn of the century to 1935, the date of the latest study that I have been able to find. The second part sketches developments in metrical theory based on structural linguistics from early statements by the Russian Formalists to the present day. 
My purpose in the survey is to summarize the positive findings of earlier workers and to indicate the nature of remaining

4 Kenneth Burke, Counter-Statement, second edition (Los Altos, 1953), p. 143. Francis Utley has pointed out to me the Richardsian origin of such ideas.



problems. What emerges very clearly is that machines alone will not solve the problems of metrical theory. In conjunction with accurate linguistic orientations, however, they can provide very useful data for analysis, data that one can happily rely upon to be free of the categorizing prejudices that always cling to our auditory impressions. The sound spectrograph, a machine that is particularly useful in phonetics, is described in some detail as a practical device for metrical research. Chapter V presents the theory of meter per se. It must be stressed, again, that this chapter does not attempt to provide a complete prosody but to explain the metrical elements in structuralist terms. My intention is not to provide an essentially new ideology but to measure the consequences of current assumptions against principles of simplicity and economy of description. If subjects like stanza structure are passed over lightly or not discussed at all, it is not that they are unimportant to metrics, but simply that I find the traditional accounts to be sufficiently accurate. My main concern is to try to specify the mechanism by which we decide how to scan a line, so that we may replace intuition (no matter how perceptive) with procedure. It all comes down finally to this question: What are the cues by which the scanner determines how many syllables a line contains and which syllable carries ictus? Having answered this, we may ask another question: Why is it that readers vary in their scansions of a line? After making a distinction between scansion and metrical analysis (a token-type relationship), I consider the syllable count, including the various devices for adjusting syllables to the standard enumeration. I next take up the foot, which I presume to be a pure metrical convention with no relation to English or to the sense of the poem. 
The conventional nature of temporal equality is particularly stressed in terms of the mind's inveterate tendency to take disparate things as "amounting to the same thing". And since the foot is so completely an artificial device for making metrical description simpler, I take the position that the metrist's function is not to find out how many kinds of feet there are, but rather to insure that there aren't any more kinds than necessary.



The consideration of ictus starts out with the recognition that its linguistic cue is only partly phonemic stress, that other features, like phrase-accent or the occurrence of a full vowel may also perform that function. (These terms are explained at length in Chapter III.) But, additionally, it is shown that in some cases there may be no overt linguistic cue at all and that one simply depends upon the ongoing metrical pattern (which I call "set") to know which syllable is ictic. I list a variety of foot-types as various combinations of syllable weights, and consider the problems involved in recognizing the "level feet", spondee and pyrrhic. Finally, a discussion of metrical ambiguity, foot-reversals, stress shifts, and substitutions, both monosyllabic and trisyllabic, concludes the chapter. Chapter VI amounts to an extended illustration of the application of metrical principles adduced in Chapter V. A well-known and metrically simple poem was picked as the object of this demonstration; at this stage I wish only to discover if I can what we do when we scan a poem. If we can determine that, then we can go on to formulate a procedure for clearly resolving hard lines. The analysis proceeds by considering a number of professional readings of the poem. These are scanned on the basis of criteria proposed in Chapter V. To get some idea of the exact nature of sound patterns which stimulate decisions about the location of ictus, all the recitations were analyzed spectrographically, and also phonemically. Furthermore, some of the feet were isolated on tape and played to a jury of people with metrical competence. I then compared the variant scansions and made a metrical analysis (the analysis being simply the sum of the scansions). Throughout, the criterion of semantic reasonableness was used to judge whether a given scansion was possible or not. 
Such a criterion requires interpretation of the meaning of the poem and is a point at which literary criticism and metrics necessarily intersect. At the same time, I have tried to keep my metrics from being value-dependent; thus, I want to consider as temporarily relevant the scansion of any reasonable interpretation, even one which is less preferable than others on literary grounds. I make no suggestion that metrical relevance implies



literary excellence, or that metrics supersedes literary criticism. Thus, Chapter VI reverses the usual procedure; instead of relying upon my own production of a poem, which necessarily commits me to one reading and blinds me to the possibility of other scansions, I have chosen to study the process of scanning verse read by others. With performances given, I can reasonably expect to eliminate one source of indeterminacy in metrical analysis, and by using a panel of judges I can reduce somewhat another source, namely the variance in perception that derives from my own idiosyncratic way of hearing things. Regardless of the success or failure of my theory, I would recommend this method for achieving metrical objectivity. Nothing deceives us more than our predisposition to hear things in our mind's ear as if they could only be said in one way. It would be foolish of me to suggest, of course, that ordinary metrical analysis needs to be as finicky and exhaustive as the experiment of Chapter VI. But I do think the effort represented there was worth the time and energy it required as a demonstration of the variety of factors in metrical judgments, so that when we resume more impressionistic analysis, we will recognize the complexities we are smoothing over and the possibility of arbitrariness in some of our decisions. Finally, it is necessary to comment on the importance of strictly limiting our purview when we embark on metrical experiments. There are all sorts of interesting problems that arise in the analysis of performances of poetry, and many of them have genuine literary significance. What is the "tone of voice" in which Shakespeare's 18th sonnet should be read? Should the voice be eager or grave? Light and ironic or serious? Severely masculine or somewhat effete? Vigorous or tender? 
All the matters imperfectly summed up under the term "dramatic propriety" have their vocal implications.5 But these are not metrical, and they must be rigorously excluded if we are to keep clear on what is metrical. The final chapter concerns a somewhat different matter, namely the role that meter plays in poetry. During the course of my reading I have been repeatedly made conscious of the fact that few metrists

5 This is properly the province of oral interpretation. See Don Geiger, The Sound, Sense, and Performance of Poetry (Chicago, 1963).



have attempted to collect and categorize the various theories of metrical function. It is surprising that the average prosody is so preoccupied with details of analysis that it finds no time to discuss what meter is for, why poets elect to use it and what they hope to derive from it. I have no vital new explanation of metrical function to offer, but I do think the matter important enough to merit systematic discussion. In this area, as in meter proper, there have been many excellent ideas, and it seems to me very useful to sift the tradition to help preserve what is worth preserving.



If meter is a species of rhythm, let us consider the genus. The idea of rhythm has interested men for thousands of years. Aristoxenus seems to have been first to define it: he called it "an ordering of times".1 This phrase posits 1) time-intervals in some sort of proportionate sequence, separated by 2) events of some duration. Aristoxenus' definition has been elaborated but not radically changed by modern psychological investigation. A typical modern definition is "The serial recurrence of a given time interval or group of time intervals, marked off by sounds, organic movements, etc."2 (Most writers would limit rhythm to the time dimension; the analogous spatial phenomenon they would call symmetry. A row of equidistant fence palings seen at a single glance is symmetrical, not rhythmical.3) The terms of the definition need specification: 1. ...serial...: How many events are needed to constitute a series? Common sense answers that a single sequence cannot make rhythm but that two or more are needed.4 We must also distinguish

1 Τάξις χρόνων.
2 H. C. Warren, Dictionary of Psychology (Boston, 1934), p. 234. E. A. Sonnenschein, What Is Rhythm? (Oxford, 1925), p. 16, defines rhythm as "that property of a sequence of events in time which produces on the mind of the observer the impression of proportion between the durations of the several events or groups of events of which the sequence is composed". Paul Fraisse, Les Structures Rythmiques (Louvain, 1956), pp. 3-4, quotes a group of interesting definitions.
3 Sonnenschein, p. 14. The problem, of course, is only semantic. The esthetician who prefers to think of rhythm as "proportioned arrangement" in general can distinguish two kinds of rhythm, temporal and spatial. The essential temporality of rhythm is also asserted by Fraisse, p. 3, and the Russian Formalists: see V. Erlich, Russian Formalism ('s-Gravenhage, 1955), p. 182, fn. 5.
4 R. MacDougall, "The Relation of Auditory Rhythm to Nervous Discharge", Psychological Review, IX (1902), 461-462.



between the mere occurrence of rhythm and its establishment and stabilization; naturally the more frequent the repetition, the firmer the rhythmic perception.5 2. ...given time interval...: Two important questions arise: 1) What are the minimal and maximal sizes of the interval? 2) Must the interval between each event or group of events be precisely equal? The minimal limits have been easier to determine than the maximal. It is obvious that the events must recur at a speed not greater than the speed at which the mind is able to perceive their discreteness. Beyond a certain speed a sequence of events will sound less like a series than a vibration. The smallest interval which the mind tolerates before it begins to perceive vibration is about one tenth of a second. This is also the smallest span of time that can be distinguished by finger-tapping. The maximal limits of the interval are more difficult to determine because they involve the vexed problem of the so-called "span of consciousness". Early psychologists like Wundt thought that a temporally fixed span of consciousness could be determined by finding the longest perception that a human being might identify as a whole without needing to specify the parts. For example, the largest number of beats that one could perceive as a unit without actually keeping count, he thought, was six. Larger numbers of beats could only be perceived in groups; thus, one could perceive as many as sixteen beats, but only as four groups of four beats each. On the basis of such assumptions, psychologists concluded that something like five or ten seconds would delimit the maximal attention span. Discussions of the attention span became suspect when the basic concept itself was called into question,6 thus posing a serious problem for the rhythmist. For if there is no such thing as the average human attention span, there is no way of knowing what the longest conceivable interval between rhythmic events could be. 
Still, there must be some point at which the perception

5 Fraisse, p. 21.
6 Edwin G. Boring, Sensation and Perception in the History of Experimental Psychology (N.Y., 1942), p. 584. But see Fraisse's verification of attention-span as a discontinuous phenomenon, p. 20.



of rhythm - the actual short-timed experience of it - shades off into the more abstract or metaphorical conception expressed in such phrases as the "rhythm of the seasons" or the "rhythm of life". (Clearly, to speak of the latter is really to draw an inference from short-lived perceptions.7) However, despite this difficulty, some estimates of the temporal limit have been made, and they are worth mentioning. First, however, we must make an important distinction between two kinds of rhythm. These can be referred to as "primary" rhythm and "secondary" rhythm (although other terms like "cardiac" and "iambic" are perhaps more vivid8). Primary (cardiac) rhythm is the simple periodic return of a given stimulus. It can be represented graphically by a sequence of asterisks separated by equal spaces: * * * * * *. These represent the regular recurrence of events precisely equal in weight or emphasis. Secondary rhythm, on the other hand, is recurrence which groups the elements into a secondary pattern: * * * * ** ** ** **. There is not only regular return of sensory stimuli but also periodic differentiation of these stimuli. Secondary rhythm has a kind of internal structure that does not exist in primary rhythm. Grouping may be based on a recurrent objective difference among the stimuli - for example, the first, third, fifth, seventh, etc., may be louder, or higher, or longer than the second, fourth, sixth and eighth. Or there may be an objective difference between the intervals: the interval between the second and the third may be greater than that between the first and second, the interval between the fourth and fifth greater than that between the third and fourth, and so on. Or grouping may be purely subjective. (Subjective grouping effects will be discussed below.) Thus, common definitions of rhythm like the following are not complete since they do not include the possibility of ungrouped, primary rhythm:

7 Fraisse, p. 5.
8 MacDougall, 460; Fraisse, p. 1ff. Fraisse calls secondary rhythm "iambique" but he uses the word in a more general sense than it usually takes, i.e. any rhythm whose foot has one prominence and one nonprominence; thus trochaic rhythm would also be a "rythme iambique".



When a succession of discrete auditory, tactual, or visual stimuli is perceived as a succession of groups we have the perception of rhythm. In this experience the elements so combine that each group appears as a unit, a meaningful whole... To return to the temporal limits of rhythm, let us first consider the durations of the silences. What is the maximal duration of the silence interval within the rhythmic groups? The answer seems to be about 1.5 seconds; by 2 seconds the perception of secondary rhythm (i.e. the sense of grouping) is pretty much lost.9 As for the durational limits of the silence interval between the groups (which one psychologist calls the "dead" interval, another the "interval-pause", but which might more neutrally be termed the "external" interval), the average interval utilized by subjects tapping out rhythms was about .7 seconds, a period often referred to as the "indifference" interval. The maximal limit seems to be the same as for the maximal internal interval, namely two seconds; but this does not imply that there is a constant correlation between external and internal intervals. We may now consider the second question posed above: Must the silence intervals between the events or groups of events be precisely equal? The answer is no; according to experiments,10 subjects feel that rhythms continue to be fairly good (i.e. stable) with as much as 14.5% displacement of temporal regularity. In other words, people perceive as roughly the same, intervals in rhythm which are as different in time as 1/7. There is apparently a process of mental equalization at work: "rhythmical grouping is determined by the duration of the subjective intervals, not by the objectively measurable intervals, but by the subject's consciousness of these intervals, that is, by the intervals considered as mental

9 Fraisse's method of determining the limit was as follows: subjects were asked to accompany with finger taps a metronome set at different speeds. The metronome was turned off, and the accuracy of their rhythmic sense in continuing the taps at proper intervals was measured. Fraisse, pp. 13-15, noted that the subject's estimates of the intervals became increasingly inaccurate from 1.2 seconds to 3 seconds.
10 J. E. Wallace Wallin, "Experimental Studies of Rhythm and Time", Psychological Review, XVIII (1911), 108.



magnitudes".11 Indeed, absolutely identical rhythmical repetitions rarely occur in nature; the perception of rhythm is almost always based upon the mental approximations of slightly divergent recurrences. What is important is the impression of proportion or equivalence, not mathematically exact proportion or equivalence itself. "A rough measurement or estimate of relative durations may be effected by the unaided ear; and it is with these estimates of the auris sibi permissa that we are primarily concerned in rhythm".12 Indeed, so powerful is the tendency to perceive rhythm in a reasonably constant sequence of events that even complexly overlapping recurrences are easily ordered in perception. One psychologist reports that ten metronomes tapping at different speeds may rather quickly be joined into a regular rhythm.13 3. ...or group of time intervals...: It is necessary to clarify the two senses of the word "group" in this context. We have already distinguished "primary" or "cardiac" rhythm - the simple repetition of single events between equal intervals ( * * * * ) - from "secondary" or "grouped" rhythm - the repetition of events such that there is a sequence of structures, rather than of single events ( ** ** ** ** ). We must further distinguish between two kinds of secondary rhythm. In simple secondary rhythm, there is only one internal interval: ** ** ** **. In complex secondary rhythm, there are two or more internal intervals which may or may not be equal; thus:

1. *** | *** | *** | *** (two equal intervals)
2. * * * | * * * | * * * | * * * (two unequal intervals)
3. * * * | * * * | * * * | * * * (two unequal intervals)
4. * ** * | * ** * | * ** * (three unequal intervals)
5. * ** * | * ** * | * ** * (three unequal intervals)

and so forth.

11 H. Woodrow, "A Quantitative Study of Rhythm", Archives of Psychology (1909), p. 66.
12 Sonnenschein, p. 18.
13 C. A. Ruckmick, "The Rhythmical Experience from the Systematic Point of View", American Journal of Psychology, XXXIX (1927), 362.
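The figures and distinctions reported so far can be gathered into a small sketch. The Python fragment below is an illustration only, not a procedure taken from the psychological literature cited. It assumes that events are given as onset times in seconds, that "roughly the same" can be modelled as a difference of at most 14.5% of the larger interval, and that grouping is signalled by the intervals alone (grouping by loudness, pitch, or purely subjective grouping is left out). The names `intervals`, `roughly_equal` and `classify` are hypothetical.

```python
from typing import List, Sequence

# Approximate figures quoted in the text; treat them as rough guides only.
MIN_INTERVAL_S = 0.1   # faster than this, a series is heard as vibration
MAX_INTERVAL_S = 2.0   # by about two seconds the sense of grouping is lost
TOLERANCE = 0.145      # displacement still felt as a fairly stable rhythm


def intervals(onsets: Sequence[float]) -> List[float]:
    """Return the time in seconds between successive event onsets."""
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]


def roughly_equal(a: float, b: float, tol: float = TOLERANCE) -> bool:
    """Assumed equivalence test: a and b differ by at most tol of the larger."""
    return abs(a - b) <= tol * max(a, b)


def classify(onsets: Sequence[float]) -> str:
    """Label a series of onsets by its interval pattern (hypothetical helper)."""
    gaps = intervals(onsets)
    if len(gaps) < 2:
        return "no rhythm"  # assumption: at least two intervals make a series
    if any(g < MIN_INTERVAL_S or g > MAX_INTERVAL_S for g in gaps):
        return "outside limits"
    if all(roughly_equal(g, gaps[0]) for g in gaps):
        return "primary"
    # Look for the shortest repeating cycle of intervals that covers the series
    # at least twice: a cycle of two means one internal interval per group.
    for period in range(2, len(gaps) // 2 + 1):
        if all(roughly_equal(gaps[i], gaps[i + period])
               for i in range(len(gaps) - period)):
            return "simple secondary" if period == 2 else "complex secondary"
    return "irregular"


print(classify([0.0, 0.5, 1.05, 1.5, 2.02]))     # uneven but tolerated spacing
print(classify([0.0, 0.3, 1.0, 1.3, 2.0, 2.3]))  # events arriving in pairs
```

The relative tolerance is one of several possible readings of "14.5% displacement"; a reader who prefers to measure displacement against the mean interval need only change `roughly_equal`.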



4. ...marked off by...: The marking-off process, of course, is perceptual, but a detailed description of the mechanism of perception would go beyond the purposes of the present study.14 It is enough to observe that "marking-off" implies the existence of two essential components without which rhythm cannot be said to exist: a time-continuum, and a series of events perceived by the senses to be in a proportional relationship. In this regard, those who try to establish some sort of priority of time measurement over time division obscure the issue. It is a mere quibble, for example, to deny that the events or prominences which divide the time continuum are "factors" in rhythm because that would imply that they are "ingredients" of rhythm.15 This argument seems to impute some special meanings to "factor" and "ingredient" which they do not ordinarily have. If rhythm is a proportional division of the time continuum, surely the things which are doing the dividing are parts of the rhythm and in that sense may be called indifferently "factors" or "ingredients". The

14 For a current view of perception as a species of mental categorization, see J. Bruner, J. Goodnow and G. Austin, A Study of Thinking (New York, 1956), p. 9: "Logically speaking, there is no distinction between them [percepts and concepts] save in the sense that the materials categorized differ. Categorization at the perceptual level consists of the process of identification, literally an act of placing a stimulus input by virtue of its defining attributes into a certain class. An object of a certain color, size, shape, and texture is seen as an apple. The act of identification involves a 'fit' between the properties of a stimulus input and the specifications of a category. Categorization of 'conceptual objects' also involves the fit of a set of objects or instances to the specifications of a category. We categorize, say, Whig and Tory statesmen of the first half of the 19th century in terms of whether each instance of the class had certain characteristics of allegiance, belief, etc... One of the principal differences between the two forms of categorization - the 'perceptual' on the one hand and the 'conceptual' on the other - is the immediacy to experience of the attributes by which their fitness to a category is determined. In the perceptual case, the relevant attributes are more immediately given by which we judge the categorial identity of an object, at least in simple perceptual situations." It must be noted that Fraisse, the most recent student of rhythm, points out (p. 5) that not only perception but also affective and motor systems are involved in the experience of rhythm.
15 Sonnenschein, p. 18. Fraisse argues emphatically for the essential unity of time-span and time-marking; see pp. 21-22.



perception of that which divides is as necessary to the perception of the fact of division as is the thing divided. By the same token, the sense of recurrence does not depend upon any actual estimate of the length of the time spans. Just as a good cook knows how much a "pinch" of salt is, without needing to measure it, the perceiver of rhythm knows what are the limits of "acceptable equality" between events.16 What is the mechanism of the perception? An interesting speculation is that rhythmic perception is due to a kind of "disassociated reflex response of characteristic organs"; these organs are thought to vibrate at their own constant or "pendulum" rates, regardless of the rate of the stimulus. For example, if one were to imitate the rhythm of a woodpecker by tapping his fingers, the sensory and motor organs would produce a rhythm whose sequence was not literally that of the woodpecker's but consistent with the natural pulse-movements of fingers and arms and related organs. Thus, regularity may be imputed to an objective stimulus17 itself where it does not really exist. Metrical "timers" - those who assert the existence of exact temporal equalities in verse - seem to be victims of this particular assumption. 5.... sounds, organic movements, etc...: The question early arose in the psychological study of rhythm "Is the perception of rhythm limited to one sense organ, or can any of the organs perceive it?" The answer, after careful experimentation, was that the rhythmic perception is possible with any organ.18 It is for this reason that the word "event" is used in the present study since it seems to refer with least confusion to an occurrence of any kind, perceptible to any of the senses. But in the general context of meter, of course, the event is always aural in nature. In addition to a definition, psychology provides us with other interesting observations about the rhythmic experience. 
There is, for example, the disposition toward "grouping", the tendency to

16 E. Isaacs, "The Nature of the Rhythm Experience", Psychological Review, XXVII (1920), 275-276. 17 Isaacs, p. 287. 18 Isaacs, p. 272.



perceive an internal structure among rhythmic events, to think of them as forming more or less uniform clusters in the time continuum. When a series of sounds precisely equal in loudness, pitch, and length, and occurring at precisely equal intervals is presented to a subject, the chances are that he will not hear the series as the cardiac rhythm it really is, but as grouped rhythm, that is, he will overestimate every other interval, thus creating a purely subjective distinction between external and internal intervals. He may also begin to perceive a regular difference in prominence (either loudness or pitch or length) among alternating events. For example, a clock with a perfectly steady tick will after a brief time be perceived as: tick tick tick tick tick tick tick tick. This subjective distortion can be conveniently called "the highlighting effect". Grouping and highlighting are in most instances correlative. As the first member of a perceived group is made more intense than the second, or as the external interval preceding the group is progressively lengthened, the strength of the grouping is increased. The question then arises, if the mind insists upon grouping and highlighting, how is it possible to speak of a primary or cardiac rhythm at all? The answer is that primary rhythm does exist, but that it is not to be perceived in objectively equal events and intervals. To create primary rhythm, one actually has to distort mathematically exact rhythm: he must shorten the interval before the subjectively (not really) more prominent event until a point is reached where the subject will no longer be able to decide consistently whether the rhythm is iambic or trochaic. Only when this "indifference point" is reached can the rhythm be truly called primary. For example, a rhythm like * * * * * *, although objectively regular in prominence and interval, is liable to be perceived as ? * * * ** **. 
If the experimenter decreases the interval before events 3, 5, 7, and so forth, a point will be reached where the rhythm, although objectively something like * ** ** ** *, will appear to the perceiver as * * * * * * *. In fact, the psychologist's clever way of measuring the strength or stability of a given grouping is to



determine the amount of time which must be added to its internal interval (or deducted from its external interval) to reach the indifference point.

What are the time limits of grouped rhythms? It has been observed that the perception of grouping begins to disappear when the "measure" (the distance from one prominence to the next) is slower than 7.0-10.0 seconds. 19 Also, the maximally acceptable size of the perceptual group itself (from the beginning of the first event to the end of the second in rhythm by two, from the beginning of the first to the end of the third in rhythm by three, etc.) is four to five seconds. As measures become more rapid than one second the tendency is for subjects to group more than two events into the same rhythmic group. In experiments, subjects have grasped as many as eight beats in a single group. 20 The usual number is five or six. Although in some experiments, subjects have been known to hold many more than this in mind, it is felt that greater numbers tend to break down into sub-groups: twenty events can be perceived but only as five recurrences of four-member groups (or four recurrences of five-member groups). Conversely, the number of events perceptibly diminishes proportionately to the length of the interval separating them - the longer the interval the fewer the number of events that can be grouped together. 21

It is not enough simply to say that prominence causes grouping, for it appears that different kinds of prominence cause different kinds of grouping. The effects of prominence of three kinds have been studied: length, "stress" (i.e., perceived loudness), and tone (pitch). As between length and "stress", the following can be said: when the prominent event is louder, it tends to begin the group, thus entailing trochaic movement; when the prominent event is considerably longer and when both events are reasonably

19 Woodrow, p. 36. That is, beyond this limit, intensity or duration prominences do not exert rhythmical effect because they cannot elicit the sense of temporal segregation. 20 W. B. Pillsbury, Fundamentals of Psychology, revised edition (N.Y., 1922), p. 350. 21 Fraisse, p. 16: 5.7 elements can be grouped at an interval of 0.37 seconds, 5.4 at 0.63 seconds, 4 at 1.2 seconds, 3.3 at 1.8 seconds and so forth.



long, the longer event tends to end the group, thus an iambic movement; when the prominent event is barely longer and when both events are rather short, the longer event tends to begin the group, thus a trochaic movement. 22 These three possibilities can be symbolized as follows (using accent marks to indicate prominence):

1. Prominent event louder: [diagram: trochaic grouping, the louder event beginning the group]

2. Prominent event considerably longer, both events long: [diagram: iambic grouping, the longer event ending the group]

3. Prominent event barely longer, both events short: [diagram: trochaic grouping, the longer event beginning the group]


As for rhythms marked by tone, the conclusions are less certain; tone seems to have neither a predominately group-ending nor a predominately group-beginning effect, although grouping, of course, does occur. 23 As one might suspect, subjects often confuse one kind of prominence with another. Thus, it has been found that louder sounds seemed longer than quieter ones which were objectively of the same length, 24 and also that louder sounds seemed higher in pitch than quieter sounds which were objectively at the same frequency. 25 It has also been observed that in the initial stages of a rhythmic sequence, the subject tends to be biassed by what comes first, regardless of how the ultimate pattern works out. Thus, for example, although a stressed rhythm will soon develop into a strong trochaic movement, it may for a few moments be interpreted as iambic if the unstressed event occurs first; for example, [diagram].

The most difficult area in the study of rhythm concerns the

22

Woodrow, p. 65. 23 H. Woodrow, "The Role of Pitch in Rhythm", Psychological Review, XVIII (1911), 77. 24 Woodrow (1909), p. 6. 25 Ruckmick, p. 363.




psychological origin of rhythm. When the "faculty of attention" was an acceptable doctrine, it was thought that the perception of rhythm was one of its specialized functions. The rhythmic group was felt to be equal to one pulse of attention, and rhythmic sequences corresponded to attention pulse sequences. One psychologist went so far as to state categorically that rhythm corresponded to an alternation of attending and not attending. 26 A much more popular theory of the origin of rhythm was to ascribe it to kinaesthesia. Grouping, according to this view, was thought to be induced by "strain sensations" arising from tensions in muscles stimulated by external rhythmic impulses. 27 Almost all of the early theorists noted the significance of motor sympathy with manifested rhythms - movements of the head and toes, taps of the fingers, etc. It was even suggested that in the absence of visible movement accentual prominence was provided by strain sensations in the ear. 28 Some believed that rhythmic motor activity like walking was the basis of the rhythmic sense.29 Others attributed it to breathing or the beat of the heart, or even to some special "subjective temporal measuring scale" which rattles "along in the consciousness of an aggressively rhythmic person . . . as a subjective foot-rule with which to correlate all experience". 30 But the more authoritative accounts warn against overestimating the role of kinaesthesia, observing how frequent it is that the sense of rhythm is quite abstract, being conceived by sophisticates (for example, musicians and composers) as an idea rather than a percept. 31 For some skilled musical performers, sensory factors may completely disappear or become unconscious, so that all that is left is a kind of pure form or Gestalt of rhythms. The modern student of rhythm does not believe that kinesthesis is a sine qua non,32 28

Isaacs, p. 291 f. 27 J. B. Miner, "Motor, Visual and Applied Rhythms", Psychological Monographs, supp. V, 4 (No. 21), June 1903, 2-3. 28 Boring, p. 586. 29 Loc. cit. 30 William Patterson, The Rhythm of Prose (N.Y., 1917), p. 47. 31 Ruckmick, p. 360. 32 Boring, p. 587.




On the basis of the above observations, one can venture a tentative definition of meter as a skeletal hypothesis to be fleshed out by linguistic specification. Let us assume that meter is basically linguistically determined "secondary rhythm" - linguistic events grouped regularly in time, such that each group has unity in its internal composition and in its external relations. A "foot" can be defined as one of these groups of events. The strategy of the present argument is to assume as psychologically valid the perception of rhythm and then to investigate metered language as an instance of it, or, to put it in slightly different terms, to assume a concurrence of rhythmic system and linguistic system. The justification for such assumptions is inherent in the fact that measurements show metered language to be well within the valid psychological time limits of rhythm.



Nobody would deny that English meter, whatever it may be, utilizes the sounds and sound sequences of the English language. Therefore, any attempt at a theory of meter is obliged to consider all features of the language which might have metrical relevance. We have tentatively defined meter as delimited secondary rhythm, linguistically signalled. Now we need to identify the linguistic features which can perform as components of secondary rhythm. It will be recalled that rhythm contains three essential features: the time continuum; sets of events recurring at perceptibly equal intervals; and an incidence of prominence (either objectively or subjectively perceived) among the events in some regular arrangement. What are the events and prominences in meter, and how are they disposed in time?


The "event" in meter is the syllable: it functions as the essential rhythmic integer. There has been much discussion in recent years about the nature of the syllable, both generally and with specific reference to English; but, although many definitions have been proposed, none is totally defensible in all respects. It will be useful to survey the field briefly. First, some elementary distinctions. Linguists usually divide the study of vocal sounds into two major areas: phonetics and phonology (or phonemics). 1 Phonetics studies vocal sounds as articulatory or acoustic events without reference to meaning or

1 I use the word "phonology" in its usual American sense, that is, the study of sounds as units in phonemic systems, or the systems themselves.



larger linguistic relations. Articulatory phonetics describes the minute and subtle physiological processes involved in manufacturing vocal sounds. 2 Acoustic phonetics considers vocal sounds as physical (rather than physiological) phenomena, measurable in terms appropriate to other kinds of sounds: amplitude, frequency, rate, etc. 3 Phonology, on the other hand, is the study of linguistically relevant sounds, i.e., sounds as entities which serve to distinguish meanings in specific languages and which operate together to form larger linguistic units, like word-elements (morphemes), words and clauses. 4 The basic unit of phonology is the phoneme, defined as a class of phonetically related sounds in contrast with all other classes of sounds in the sound system of a given language. "Contrast" means that the phoneme alone may serve to distinguish a word-element containing it from a word-element containing another phoneme, even though the other phonemes in both words are identical. For example, /p/ is a phoneme in English because it is in contrast with /b/, /k/, /n/, /l/, etc. Phonemes combine to form morphemes; a morpheme consists of a phoneme or sequence of phonemes to which a meaning is attached by the culture. Morphemes may be words (dog, man) or word-elements (-s "plural", -tion). Phonemes, it must be remembered, do not carry meanings; they only mark differences in meaning between morphemes. Thus, the meaning of cat differs from rat by virtue of the difference between /k/ and /r/, but it would be incorrect to say that either /k/ or /r/ "has" a meaning. Sometimes, a morpheme may be only one phoneme long, as in the indefinite article "a", phonemically /i/; but it is theoretically important to remember that it is a morpheme, not a "meaning-carrying phoneme" - a phrase which would be a contradiction in terms. To speak more precisely, /p/ is a class or set of similar sounds in a

2 An excellent book on the subject is Kenneth Pike, Phonetics (Ann Arbor, 1943). 3 See Peter Ladefoged, Elements of Acoustic Phonetics (Edinburgh, 1962). 4 The relation between phonemes and larger linguistic units is currently believed to be somewhat more complex than mere inclusion. See Charles Hockett, Manual of Phonology (Baltimore, 1955) and "Linguistic Elements and Their Relations", Language, XXXVII (1961), 32.



contrast with the sets /b/, /k/, etc. Phonemes recur in slightly different articulations, and consequently have varying acoustic effects, depending on the phonological context in which they occur. When /p/, for example, occurs as the first phoneme in a morpheme, it is frequently followed by a rather strong puff of air, phonetically transcribed as a superscript [h]; when it occurs after an /s/, the air-puff is strongly reduced or even absent; and when it occurs at the very end of a word, it may have no release at all (that is, the lips may simply remain closed, allowing no air to escape). These three forms of /p/ are nonsignificant variants or allophones of the phoneme /p/ containing phonetic differences which do not serve to distinguish meanings. Allophones are conventionally set in brackets, to distinguish them from phonemes. One could say, if he were careless of his articulation, [ripʰ] or [rip] or [rip°], and it would still amount to the same morpheme; in no case would a meaning difference of the sort signalled by /rib/ be implied (unless, say, [p°] were misunderstood as /b/). The pattern formed by all the phonemes in a language constitutes its phonemic system.

Phonemes like /p/, /b/, /r/, and /i/ are usually called "segmental"; that is, any utterance is interpreted as a sequence or string of such phonemes which can be figuratively (and in tape recordings, literally) cut out of the larger context. 5 Some phonologists extrapolate another set of phonemes, frequently called "suprasegmental", presumed to co-occur with and give additional meanings to the string of segmental phonemes. For example, the same sequence of segmental phonemes takes on two different meanings according to whether the voice falls - as in He's coming. - or rises - as in He's coming?

The delineation of the syllable is more important for phonology than for phonetics.
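The dependence of the variants of /p/ on their context, as described above, can be illustrated with a short Python sketch. It is only an illustration under simplifying assumptions: the function name `p_allophone`, the representation of a word as a list of phoneme symbols, and the reduction of "phonological context" to the three cases mentioned in the text are hypothetical conveniences, not part of any of the phonological theories cited here.

```python
def p_allophone(phonemes, index):
    """Return a bracketed allophone label for the /p/ at phonemes[index].

    `phonemes` is a list of phoneme symbols for a single word (an assumed,
    simplified representation). Only the three contexts named in the text
    are modelled; any other position falls back to plain [p].
    """
    if phonemes[index] != "p":
        raise ValueError("expected /p/ at the given position")
    if index > 0 and phonemes[index - 1] == "s":
        return "[p]"     # after /s/: the air-puff is reduced or absent
    if index == 0:
        return "[pʰ]"    # initial: followed by a rather strong puff of air
    if index == len(phonemes) - 1:
        return "[p°]"    # final: may have no release at all
    return "[p]"


# Three illustrative words: pin, spin, rip.
for word in (["p", "i", "n"], ["s", "p", "i", "n"], ["r", "i", "p"]):
    position = word.index("p")
    print("".join(word), p_allophone(word, position))
```

Whatever label the function returns, the phoneme is the same /p/ and the morpheme is unchanged; the choice among variants carries no difference of meaning.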
Insofar as one is concerned only with the physiological or physical properties of a sound, there is no particular need to know in which syllabic positions it occurs. Any sound can be described in purely articulatory terms: [æ], for instance, the vowel sound in cat, is low, front, and unrounded (the tongue is low

5 The fact that "segmental" phonemes are not actually discrete but flow into each other can be ignored in this elementary account.



in the mouth, pushed up against the bottom teeth, and the lips are spread); [t] is a voiceless alveolar stop (the vocal cords do not vibrate, and the tongue arrests the flow of air by pressing up against the alveolar ridge - the hard rippled part of the roof of the mouth immediately behind the teeth). Because [æ] is made without any stoppage or friction in the vocal canal, it is sometimes called a vocoid, whereas [t] is a contoid, or contact sound. 6 The words vowel and consonant, then, represent a purely phonemic distinction within the framework of the syllable: vowels, in phonology, are vocoids which tend to form the centers of syllables, while consonants are contoids which tend to form peripheral elements. A center of a syllable is called a syllabic or syllable nucleus. Not all syllabics are vocoids, however; in a "word" like pst, for instance, the contoid [s] is syllabic. Conversely, vocoids may not be syllabics: in water and yes, [w] and [y] are phonetically vocoids (no friction or stoppage), yet function as consonants.

But what is a syllable? Are its boundaries determinable? Does it have a "real" (i.e., physical), or a physiological, or a psychological mode of existence? Let us consider some of the definitions that have been proposed. One school of phonetics - the "motor phoneticians" - asserted the real existence of the syllable by correlating it with a specific physiological movement, namely, the chest pulse. 7 This motion was described as "ballistic", a kind of sudden clenching and letting go, in which the muscular contraction is unopposed by any other set of muscles. (Ballistic movements are the opposite of controlled or tensional movements, which involve the mutual contraction of opposing muscles.) The "pulse" was said to be "a puff of air forced upward through the vocal canal by a compression stroke of the intercostal muscles" (small muscles along the rib cage), modulated by the action of the vocal cords, and accompanied by accessory

6

Kenneth L. Pike, Phonemics: A Technique for Reducing Languages to Writing (Ann Arbor, 1947), Chapter I. 7 R. H. Stetson, "The Relation of the Phoneme to the Syllable", Proceedings of the Second International Congress of Phonetic Sciences (Cambridge, 1936), pp. 245-252. See also Bases of Phonology (Oberlin, 1945); and Motor Phonetics, second edition (Amsterdam, 1951).



movements ("syllable factors") which characterized it: the release (by action either of the chest muscles or of the organs creating the releasing consonant), the vowel shaping movements of the mouth, and the arrest of chest muscles or syllable-ending consonants. 8 Four different kinds of syllables were distinguished: chest-released + chest-arrested (OVO), as in the exclamations ah, oh; chest-released + consonant-arrested (OVC), as in at, up; consonant-released + chest-arrested (CVO), as in for, too; and consonant-released + consonant-arrested (CVC), as in top, cook. The physiological view took the syllable to be a product of pressure differences between the air inside and outside the vocal apparatus. For some years the chest-pulse theory had considerable support but recently has been subjected to serious criticisms on acoustic grounds. 9

Other phoneticians 10 based their definition of the syllable on the concept of sonority. Although some advocates of the concept equated it with loudness (the psychological impression equivalent but not equal to acoustic amplitude or intensity), 11 sonority was usually defined as "fullness" of voice and said to be related to the

8

Motor Phonetics, p. 200. 9 Stetson's theories were quoted approvingly by Alf Sommerfelt, "Can Syllable Divisions Have Phonological Importance?" Proceedings of the Second International Congress of Phonetic Sciences, p. 31 and K. L. Pike, Phonemics, p. 91. W. F. Twaddell, in "Stetson's Model and the 'Supra-Segmental Phonemes'", Language, XXIX (1953), 415-453, tried to reconcile Stetson's position with that of structural linguistics. An important criticism of Stetson's work appears in Peter Ladefoged, "Syllables and Stress", Miscellanea Phonetica, III (1958), 1-14. The French phonetician L. Roudet (Eléments de Phonétique Générale, Paris, 1910, Ch. XVI) suggested a three-way definition; in addition to definitions from the sub-articulatory (pulmonary) and auditory points of view, he added a definition from the articulatory point of view: a syllable is a "system of articulatory movements whose centre corresponds to a maximum opening of the vocal canal, and whose limits correspond to sudden variations of the aperture of this canal." (As quoted in A. Classe, The Rhythm of English Prose, Oxford, 1939, p. 40.) 10 For example, Henry Sweet, Primer of Phonetics (Oxford, 1902), p. 65, and Paul Passy, Petite Phonétique Comparée (Leipzig, 1906), pp. 42-3. 11 For example, Leonard Bloomfield, Language (New York, 1933), pp. 120, 125, although he also stated that sonority is a function of resonance (the capacity of a cavity or organ to vibrate sympathetically with a sound-issuing organ at certain frequencies).



general audibility of the sound. 12 Although held in disrepute by most modern phoneticians, 13 the term still appears occasionally in phonetic handbooks. 14 Sonorists defined the syllable as a unit consisting of a single sonority, and called its most sonorous part its "center" or "crest" - a "syllabic". 15 Several degrees of sonority were hypothesized: (1) The least sonorous speech sounds are voiceless sounds like [p], [t], [k], or [f], [s], [h]. (2) Only slightly more sonorous are voiced sounds like [b], [d], [g], or [v], [z], [ʒ]. (3) Quite clearly more sonorous are the nasal sounds like [m], [n], [ŋ], and the lateral or "L" sounds. Here also belong the "sh" sounds [ʃ]. (4) Stronger than these are the "r" sounds. (5) Next come sounds like [u] or [i]; and then (6) the sounds like [o] and [e]; and (7) the sounds like [ɔ], [æ], and [a]. 16 What constituted a syllabic, in this view, was a relative matter, depending not only on the degree of sonority of the phoneme, but also on the sonority of the phonemes on either side of it. If we indicate relative sonority by numerals from 4, least sonorous, to 1, most sonorous, we have:

Jack    caught    a    red    bird
/Jæk    ko:t      i    red    brd/ 17
314     414       1    213    323
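The relative character of the syllabic in this view can be illustrated with a short Python sketch. It treats a syllabic as any phoneme whose numeral is lower (that is, more sonorous) than the numerals of both its neighbours, and applies that rule to the numerals of the example just given, run together as one utterance. The function name `syllabic_positions`, the ASCII phoneme labels, and the strict neighbour-comparison rule are assumptions made for illustration; the sonorists themselves stated no such procedure.

```python
def syllabic_positions(ranks):
    """Return the indices of phonemes more sonorous than both neighbours.

    `ranks` uses the numerals of the example: 1 = most sonorous,
    4 = least sonorous. The edges of the utterance are treated as
    infinitely unsonorous (an assumption of this sketch).
    """
    peaks = []
    for i, rank in enumerate(ranks):
        left = ranks[i - 1] if i > 0 else float("inf")
        right = ranks[i + 1] if i < len(ranks) - 1 else float("inf")
        if rank < left and rank < right:
            peaks.append(i)
    return peaks


# "Jack caught a red bird", with the numerals given in the example.
phonemes = ["J", "ae", "k", "k", "o:", "t", "a", "r", "e", "d", "b", "r", "d"]
ranks = [3, 1, 4, 4, 1, 4, 1, 2, 1, 3, 3, 2, 3]

peaks = syllabic_positions(ranks)
print(peaks)                         # -> [1, 4, 6, 8, 11]
print([phonemes[i] for i in peaks])  # -> ['ae', 'o:', 'a', 'e', 'r']
```

The rule yields five syllabics, one for each syllable of the sentence.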

Here, /r/ would be a nonsyllabic or "glide" in red because it is less sonorous than /e/ and a syllabic or "peak" in bird, because it is more sonorous than either /b/ or /d/. The distinction between glide 12

R-M. S. Heffner, General Phonetics (Madison, 1949), p. 74. 13 Martin Joos, Acoustic Phonetics (Baltimore, 1948), p. 6n writes: "'Talking nonsense' here does not mean making false statements, nor does it mean using terms unfamiliar to physicists; it means making statements which cannot be converted into meaningful statements (true or false) by any translation of terminology. Discussions of SONORITY in phonetic literature generally belong in this category." 14 Heffner, p. 74, referring to the measurements made by electronic methods at the Bell Telephone Laboratories and reported in H. Fletcher, Speech and Hearing (New York, 1929), p. 74. 15 Bloomfield, p. 125; Pike, Phonemics, p. 90a. 16 Heffner, p. 74. 17 The example is from Bloomfield, p. 120. The phonemicization is modernized, except for syllabic /r/ which must be used to make the example meaningful.



and peak was thought to involve a difference in timing (the glide being more rapid), force of articulation (the glide occurring in a "half-strong" chest pulse), and less clearly audible acoustic effects.18 Some experimental phoneticians became quite skeptical of ever finding adequate articulatory definitions of the syllable. They believed the syllable to be a convenient fiction, but one which could not be verified by acoustical recordings.19 Syllables do not exist, they said, only syllabicity; that is, one can perceive a series of peaks and valleys of prominence - like a mountain chain - but the assignment of boundaries is necessarily arbitrary and difficult to justify. 20 As machines develop which can trace more precisely the various parameters of speech - for example, the sound spectrograph (discussed in Chapter IV) - new hope has arisen about finding genuine acoustic correlates for syllables. One acoustician, for example, has shown 21 that vowel traces seem to have different shapes according to whether consonants follow in the same syllable or not 18

Hockett, Manual, p. 41. Hockett elsewhere (p. 52) distinguished the peak of a syllable from the "onset" (the pre-peak phonemes) and the "coda" (the post-peak phonemes). 19 The negative views of the phoneticians Scripture, Panconcelli-Calzia, and Schramm are cited in J. Milton Cowan, Pitch and Intensity Characteristics of Stage Speech, Supp. to Archives of Speech (Iowa City, 1936), p. 10. Cf. the denials of Rousselot, and Gemelli and Pastori, as reported in A. Rosetti, Sur la théorie de la syllabe ('s-Gravenhage, 1959), pp. 11-12. 20 Wilbur Schramm, Approaches to a Science of English Verse (Iowa City, 1935), pp. 33-34, writes of the difficulty of finding suitable points on the graph for measurement: should one measure the phonation, or from the beginning of one phonation to the next, or from the intensity peak of one syllable to the next? Different results will occur with different methods of measuring. 21 Bertil Malmberg, "The Phonetic Basis for Syllable Division", Studia Linguistica, IX (1955), 80-87. Malmberg uses synthetic (hand-painted) spectrograms to show that syllables whose vowel formants are modified by and blend into characteristic consonant configurations include these consonants within their precincts; for example, [spectrogram] is perceived as ag | a, while [spectrogram] is perceived as a | ga.



(a/ga vs. ag/a) and that these differences correspond to slight but real differences in articulation.

Phonological theories of syllable structure seek to define the syllable in phonemic, rather than phonetic terms. A basic assumption is that each phoneme has certain privileges of occurrence, that is, that it can occur in some positions and not others. For example, both /b/ and /v/ can occur by themselves before vowels at the beginning of words, but they cannot occur together as a cluster: */bv-/ and */vb-/ are not allowable combinations. Contrarily, /br-/ is allowable, but */vr-/ is not (although it is possible in other languages, for example, Dutch). Some attempts have been made to work out the design of the English syllable in these terms, that is, a formula to account for every syllabic position in a monosyllabic unit and the range of phonemes which could occur therein. 22 The formula is too long and complicated to present here, but a display of the first three positions will demonstrate the general principle:

O ~ C - ŋ ~ (g, k, f, b, p) + l, (g, k, š, d, θ, f, b, t, p) + r

What this chart means is that any monosyllabic word in English can begin in at least three ways (actually there would be seven if we printed the whole original chart). The sign ~ means "or"; that is, a word may begin with no consonant (= O, "zero"), or with any

22

Benjamin Lee Whorf, "Linguistics as an Exact Science", in Language, Thought and Reality (New York, 1956), p. 223.



single consonant except (i.e., "minus") /ŋ/ (the sound usually spelled "ng" in English and occurring only finally in syllables, as in sing and rung), or with any one of the following combinations: /gl-/, /kl-/, /fl-/, /bl-/, /pl-/, /gr-/, /kr-/, /šr-/ (/š/ is the phoneme usually spelled "sh" in English), /dr-/, /θr-/ (one of the phonemes spelled "th"), /fr-/, /br-/, /tr-/, and /pr-/. Examples are: gleam, clean, flee, bleed, greed, creed, shriek, drink, thrill, frill, bread. Other possibilities listed in the columns not presented here are single consonants before /-w-/, as in twist; single consonants before /-y-/, as in cute; three-part clusters beginning with /s/, as in spray, skew, and straight; two-part clusters beginning with /s/, as in stand, sweat, smooth, etc.

Such formulas allow us to delimit syllable boundaries in terms of permissible phoneme clusterings. One of the most successful interpretations of this general theory defines the syllable as "a structural unit most economically expressing the combinatory latitudes of vowels and consonants within a given language", where "combinatory latitude" means "a minimal pattern of phoneme combination with a vowel unit as nucleus, preceded and followed by a consonant unit or permitted consonant combination". 23 Longer stretches are interpreted as syllable sequences. Where syllable-division is unclear, the relative frequencies of occurrence of syllable-initial and syllable-final consonant combinations provide a basis for decision. For example, in the word anger /æŋgir/ we know that the syllable division must be /æŋ/ + /gir/, rather than /æ/ + /ŋgir/ because */ŋg-/ is not a permissible beginning cluster in English. And although it is possible to divide a word like aster /æstir/ in more than one way - /æ/ + /stir/ or

23 J. D. O'Connor and J. L. M. Trim, "Vowel, Consonant, and Syllable - A Phonological Definition", Word, IX (1953), 105. Cf. similar conclusions independently drawn by Einar Haugen, "The Syllable in Linguistic Description", in For Roman Jakobson (The Hague, 1956), pp. 213-221. Haugen defines a syllable as "a sequence of phonemes which together constitute a unit", "... the smallest unit of recurrent sequences", "... that stretch of phonemes which makes it possible to state [... the]... relative distribution [of stress, pitch, length and juncture] most economically" (p. 216). His rules are similar to O'Connor's and Trim's: "... whenever possible, no new position or members shall be introduced. In nitrate a division /naytr.eyt/ would introduce a non-existent final cluster -tr..." (p. 219.)



/æs/ + /tɨr/ or /æst/ + /ɨr/ - we may choose one division over another by considering the relative probabilities of each division. The totality of possible phonemes in the combination V/CCV is 38, whereas in VC/CV it is 698, and in VCC/V, 71. Thus, the division /æs/ + /tɨr/ is statistically more probable than the other two. 24 It is suggested that many of the intuitions that English speakers have about syllabification, like the dictum, "If possible a syllable should begin with a consonant", may be unconsciously developed perceptions of clustering probabilities. 25 Although there are differences of opinion about the constitution of the syllable, the problem does not seem serious from the point of view of metrics. Metrics is concerned mostly with the number of syllables-as-events; syllables are easily recognized, and the problem of identifying their boundaries rarely matters. In most cases, the metrist can rely on some minimally acceptable phonological view, namely that syllabicity exists, and that the centers of syllables can be found, even if precise divisions are not clear. For example, in Wordsworth's Ethereal minstrel! pilgrim of the sky! or in Gray's And drowsy tinklings lull the distant folds, it is metrically unimportant whether minstrel is to be divided min-strel or mins-trel or minst-rel, or whether tinklings is to be divided tin-klings or tink-lings. The answer seems often to be more a matter of poetic tradition than linguistic analysis. The only important question for metrics is "How many syllables are there?" Is fire one syllable or two, and is rambling two syllables or three? As one

24 O'Connor and Trim, p. 121. 25 Loc. cit., and Haugen, p. 219. Hockett, Manual, p. 52, calls phoneme combinations whose syllable-divisions are indeterminate "interludes". Syllable divisions, of course, are not to be identified with "junctures" - "phenomena relating to the way sounds are joined together". For example, both night-rate and nitrate are disyllabic, although only night-rate has an occurrence of juncture internally. See also Daniel Jones, "The Hyphen as a Phonetic Sign. A Contribution to the Theory of Syllable Division and Juncture", Zeitschrift für Phonetik, IX (1956), 99-107.



linguistically oriented metrist puts it, syllables involve "maxima and boundaries; of these two only the maxima are relevant for the meter". 26

METRICAL PROMINENCE, OR ICTUS: THE PROBLEM OF "STRESS"

Having considered the metrical "event", the syllable, we turn to the more difficult component, namely prominence. I shall use the venerable term "ictus", hoping to keep the discussion free of confusing associations. Ictus is a feature of the metrical, not the linguistic system; our problem is to determine the way in which the English language actualizes it. Metrists have said that ictus is marked by "stress", which is identified as articulatory force, or emphasis, or loudness, or any one of a number of other things. But "stress" needs careful consideration, and it is best to keep it wrapped in quotation marks until we are prepared to define it. 27 "Stress" is controversial but efforts to define it seem quite necessary if real progress is to be made toward an objective metrics. And we must also keep in mind the possibility that ictus may only be partially signalled by "stress". One metrical school maintains that prominence is not marked by "stress" but by syllable length. It is asserted that metrically prominent syllables in English verse are perceptibly and regularly longer than the unprominent ones, as they were in Latin and Greek poetry, and that it is this difference which the reader grasps when he identifies the meter.28 Many "timers" feel that a regular a

26 John Lotz, "A Notation for the Germanic Verse Line", Lingua, VI (1956), 2. 27 Perhaps I should use the symbols suggested by I. A. Richards in Speculative Instruments (Chicago, 1955), p. 30: "? ? to show that how the word or phrase is to be comprehended is the question. It may be read as query, - These ?'s should carry no derogatory suggestion; their work is to locate and orientate inquiry". 28 For metrists in the quantitative tradition, see T. Omond, English Metrists in the Eighteenth and Nineteenth Centuries (London, 1907); a recent statement is Sheridan Baker, "English Meter Is Quantitative", College English, XXI (1960), 309-315. Omond points out that for centuries metrists have confused accent and length.



mathematical ratio exists between prominent and unprominent syllables, say two to one, and that since syllabic lengths are temporally negotiable, so to speak, certain equivalences can take place; for example, two unprominent syllables can replace one prominent one, or a pause of a certain time span can function as a substitute for a syllable. Some have scanned with musical notations, implying that syllabic relations are as precisely computable as musical notes.29 The evidence shows that although length is an element in "stress", and hence contributes to ictus, it is only one of several features, and that the timer confuses a part for the whole when he claims special priority for time differences. Since there is an extensive experimental literature which disproves the temporal thesis, arguments need not be gone into in detail here.30 Suffice it to say that the timer cannot provide objective measurements which will confirm his views; on the contrary, the actual lengths of syllables vary so considerably that prominent syllables are occasionally shorter than adjacent unprominent syllables. The central question is whether readers of poetry have internalized a kind of temporal yardstick by which they can automatically measure the length of syllables and the proportions which exist between them. There has been no demonstration that they have, and some that they have not, so that it would seem wise to search elsewhere for an explanation of how we are almost "intuitively" 29

29 Sidney Lanier, The Science of English Verse (Boston, 1880), was the modern founder of the so-called musical school, although meter was being marked by musical notes as early as Joshua Steele, Prosodia Rationalis (London, 1779). A recent use of the musical analogy is that of Northrop Frye: see "Lexis and Melos", in Sound and Poetry, English Institute Essays (1956) (New York, 1957) and Anatomy of Criticism (Princeton, 1957). See also Karl Shapiro's A Bibliography of Modern Prosody (Baltimore, 1948), where recent "timers" and "stressers" are distinguished. R. Wellek and A. Warren, Theory of Literature (New York, 1955), p. 169, are surely exaggerating when they say that "In America, at least among teachers of English, it [Lanier's system] seems the accepted theory." 30 W. K. Wimsatt and Monroe C. Beardsley, "The Concept of Meter: An Exercise in Abstraction", PMLA, LXXIV (1959), provide a list of references in footnote four on p. 589. See particularly Warner Brown, Time in English Verse Rhythm (New York, 1908), and Ada L. F. Snell, "An Objective Study of Syllabic Quantity in English Verse", PMLA, XXXIII (1918), 396-408 and XXXIV (1919), 416-435, discussed in Ch. IV.



able to scan. For a theory of meter must not only provide a systematic method of scansion, but it must also explain how it is that people have always been able to scan with some degree of success, even without a system, or with faulty systems. The linguist feels that in the structure of the language itself exist automatically learned and systematic features which may be used to signal ictus. But unalloyed syllabic length is not such a feature, i.e., it is not phonemically significant in the structure of English. The main trouble with the timers is that their introspections have been mislabeled. We respond to several kinds of cues when we recognize "stress", of which time difference is only one. Time is the medium through which rhythm flows and is necessarily a "component" of meter; however, this does not say that duration is the only or even an essential component of ictus. Naturally, time is what is being cut up by the metrical events - i.e., syllables - and the condition for the experience of rhythm involves some kind of perception of temporal equality. Graphically, one might represent this phenomenon as follows: The horizontal lines are the syllables, and the braces represent "temporal equalities". It must be emphasized that these are assumed, not objectively demonstrable equalities; in the graph, the periods marked by distances between the points of the braces are taken to be roughly equivalent, although in physical reality the elapsed time between two pairs of syllables may be as much as seven times as long as between two others. 31 But timers assume not only that these equalities are real but that the events themselves are mathematically proportional. In the graph l



[Graph: labels 1-5, a-e, and I-V]

31 See the measurements of J. E. W. Wallin, conveniently summarized in A. R. Chandler, Beauty and Human Nature (New York, 1934), 250. Wallin came to the interesting conclusion that even "routine scansion" or "sing-song" reading, designed explicitly to reveal the meter, contains as much as 7% variation (in comparison to 17% for "reading scansion" and 19% for normal recitation).



they would maintain both that there are temporal equalities among the groups (I, II, III), and that the prominent syllables themselves (a, b, c) are equally long. The simplest way to disprove this is to measure the durations. One learns that by no stretch of the word can any of the relations be called proportionate. 32 One can simply extend the observations made by the psychologists about the role of time in rhythm in general (p. 21 above) to the special case of meter. Or take to heart the commonsense observations of a metrist like Lascelles Abercrombie: Since... [English] verse does not merely consist of a certain number of accents, but also of those accents in a certain disposition, the effect of rhythm is in an important respect dependent on the time-values of its utterance; namely, in respect of the intervals at which the accents come. But when time-values are thus asserted to be necessary to accentual rhythm, there is no need to understand from this that accentual rhythm is measured in time - i.e., consists in a certain duration of utterance, or of accents embedded in a certain duration... to recognize the intervals as a scheme need imply no recognition of measurement of the intervals... In a word, nothing need be demanded of the sense of time in accentual rhythms but perception of the order in which accents and non-accents occur. 33 Since metrical history has shown that it is easy to be misunderstood when speaking about time, the point may be restated. I do not deny that time is the medium through which meter flows, or even that length itself is a component of "stress"; what I do deny is that the mind has some elaborate faculty of measuring and identifying time spans and that this is what it does in meter, rather than noting comparative length as one of several cues for ictus while simply assuming as a convention the temporal equality of feet. Let us return to our first question. How is ictus marked? To initiate our inquiry let us consult our impressions.
In a very

32 Warner Brown, p. 72 and passim. 33 Lascelles Abercrombie, Principles of English Prosody (London, 1923), pp. 131-2. Schramm, however, goes a little too far, in discounting time entirely: "When we say that a line is iambic pentameter, we do not mean that it consists of five approximately equal units, which may - like links in a chain - be lifted out at will. No. We mean that the line has five principal stresses, or their equivalent, and that the syllables are so arranged that a heavy stress regularly follows a light one" (p. 71).



obvious way, there is something about the second, fourth, sixth, eighth, and tenth syllables in the following lines which makes them stand out: Extremes in Nature equal ends produce, In Man they join to some mysterious use. But the "something" is fluid and uncertain if we have not analyzed it scientifically. It might be a matter of our voice rising, or an increase in loudness or force or energy in pronunciation, or a greater length or a greater clarity in the articulation of the syllable, or a general increase in the muscular activity of a part or the whole of the vocal apparatus. Our impression is, in short, that although we feel we know which syllables are prominent, or could be prominent if we insisted, we find it difficult to explain how we know and why we can decide so readily.34 We need to turn to professional students of language to find the answers. And they, again, are either phoneticians or phonologists. They cannot directly answer our question, but they can give us a great deal of useful information. From the phoneticians we learn that the question amounts to two questions, according to whether we consider it from the point of view of the speaker or of the hearer. What does the speaker do when he makes one syllable more prominent than another? Or, what does the auditor hear that cues syllable prominence? The first is a problem in articulatory phonetics, the second in auditory phonetics. On their side, articulatory phoneticians have long considered "stress" to be the product of an increase in the expenditure of energy of production. (For the moment, I shall refer to phonetic syllable prominence as "stress", reserving a narrower and more rigorous definition of the word, without quotation marks, for the phonemic section of the chapter, p. 52.) They have said that "stress" is produced by the "force of breath-impulse initiating 34

34 Shortly after having written this passage, I came across the following remark by an 18th century metrist, quoted in Omond, p. 72: " 'It is to this hour disputed what accent is,' yet no one who sees an accent-mark placed on a syllable 'feels any doubt in regulating his voice according to that accent-mark'." - R. Nares, Elements of Orthoepy (London, 1784), pp. 140-141.



syllables",35 or by an increase in tension and expended energy throughout the whole vocal apparatus, 36 involving more vigorous vibrations of the vocal cords and a greater exertion of the muscles of the upper vocal tract. 37 Some have spoken of a general increase of pressure in the entire speech-canal,38 and of the kinesthetic sense that muscle and pressure changes are strongly involved.39 But simple articulatory explanations by no means solve our problem. Even if "stress" be explained in terms of the musculature, we still have no satisfactory explanation of how we perceive its incidence when someone else speaks. A knowledge of articulatory causes does not necessarily bring us closer to an understanding of acoustic effects. Measuring muscle activity may tell us what the speaker was doing when he created " stress", but it will not identify which sound elements, emerging through that effort, managed to communicate the phenomenon of "stress" to his auditor. Different organs perform different activities. When we speak, the mouth and the other parts of the vocal apparatus are engaged in productive activity; when we listen, the ear is engaged in perceptual activity. These activities, of course, are physiologically different. Any full phonetic account must show what the ear does; that is, one must isolate and describe the phonetic signals or "cues" in the sound itself to which an auditor responds upon perceiving "stress". This task has not yet been accomplished, although phoneticians are learning more and more about the process. It is now generally believed that there is no simple correlation between the output of articulatory effort and "stress" perception, that the relationship is complex and full of variables. Before considering this relationship in detail, we should establish a certain precision of terminology. 
We must distinguish between psychological terms - like "loudness" - referring to the auditor's perception, and terms from acoustic physics - like "intensity", "power",

35 Abercrombie, p. 19. 36 B. Bloch and G. Trager, Outline of Linguistic Analysis (Baltimore, 1942), p. 35. 37 Bloomfield, pp. 110-1. 38 Classe, p. 37. 39 Heffner, pp. 224-5.




"amplitude", "energy", "sound pressure" - which pertain to properties of sound waves as natural events. Linguists have sometimes used these terms as if they were interchangeable,40 but psychologists and acoustic phoneticians warn against making this sort of simplified identification. Loudness is a psychological term for the auditory sensation which allows us to judge whether a sound is 'soft' or 'loud'. 41 Intensity, on the other hand, is a concept in acoustics; it is not what the ear hears, but rather the name for the actual physical event, i.e., the transmission of sound energy, where energy, in the scientific sense of "what a system consumes when it does work", refers to the movement of air-particles set in motion by the sound-issuing agent. Strictly defined, intensity is "the sound energy transmitted per unit of time in the specified direction through a unit area normal to this direction at the point". 42 Intensity is a difficult element to isolate in the sound complex. Phoneticians sometimes define it negatively, for example, simply as "the other variable", additional to frequency and time.43 Many phoneticians prefer to use the term power instead of intensity; power is the rate of doing work. The amount of power used in speech is so small that five hundred speakers would have to speak for a whole year to generate enough power to heat water for a cup of tea. 44 Amplitude, too, is a concept in acoustic theory rather than a perception. It is the simple to-fro (graphically, up-down) motion of a sound wave, reflecting the excursions in space made by the 40

40 Bloomfield, pp. 110-1, wrote of intensity and loudness as if they were synonyms, although elsewhere (p. 90), he defined stress as loudness only. On the other hand, he, like Pike, Phonemics, p. 250b, may have been using the term in some physiological sense rather than in the acoustic sense, perhaps in reference to relative pressure of breath force. But this would be a confusion of articulatory with perceptual terms. See H. Mol and E. M. Uhlenbeck, "The Linguistic Relevance of Intensity in Stress", Lingua, V (1956), 207. 41 S. S. Stevens, Hearing (New York, 1934), p. 452. Another percept which is sometimes encountered in phonetic discussions is "volume" - "that aspect of auditory sensation in terms of which sounds may be ordered on a scale running from 'small' to 'large'..." 42 Loc. cit. 43 Stevens, p. 25. 44 Harvey Fletcher, Speech and Hearing in Communication (New York, 1953), p. 68.



generating sound-emitter. When one strikes a tuning-fork, a sound will be emitted which can be represented as follows:

This is the simplest sort of sound: a pure tone. The distance between the farthest reach in either direction from the norm (the medial line) is the amplitude of the sound. This span represents the size of the maximum deviation of air pressure from its normal level. There is no necessary correlation between the property of a sound wave called power and the amount of energy which the speech mechanism requires to produce that sound wave. And although there is some relationship between acoustic power and perceptual loudness, it is not a simple one. There are several reasons why correlations are complex: for one thing, auditors seem to vary in perceptiveness. Further, frequency affects the perception of loudness; "for a given intensity, the loudness is greater in the middle range of the frequency scale than it is at either extreme". 45 A final source of complexity is the qualitative differences among sounds. Stressed high vowels like [i] may register less energy than unstressed low vowels like [a] because of the different ways in which the sounds are formed. 46 46
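The distinction between amplitude and power drawn above can be illustrated with a small sketch. It assumes an idealized pure tone sampled at regular instants; the function names, the 400-cycle tone, and the sampling rate are hypothetical choices made only for illustration, and the mean-square value is used here as a stand-in for acoustic power under idealized conditions.

```python
import math

def pure_tone(frequency_hz, peak, sample_rate=8000, duration_s=0.01):
    """Sample an idealized pure tone (a sine wave) at regular instants.

    All parameter values are illustrative assumptions, not measurements.
    """
    n = round(sample_rate * duration_s)
    return [peak * math.sin(2 * math.pi * frequency_hz * t / sample_rate)
            for t in range(n)]

def amplitude(samples):
    """Largest deviation from the resting level (the medial line)."""
    return max(abs(s) for s in samples)

def mean_power(samples):
    """Mean-square value, taken here as proportional to acoustic power."""
    return sum(s * s for s in samples) / len(samples)

soft = pure_tone(400, peak=0.5)
loud = pure_tone(400, peak=1.0)

# Doubling the amplitude multiplies the mean-square value by four,
# so the two measures are related but not interchangeable.
print(amplitude(soft), amplitude(loud))
print(mean_power(loud) / mean_power(soft))
```

Neither quantity says anything about perceived loudness, which on the account above also depends on the listener, the frequency, and the quality of the sound.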

45 J. C. R. Licklider, "Basic Correlates of the Auditory Stimulus", Handbook of Experimental Psychology, ed. by S. S. Stevens (New York, 1951), p. 1003. 46 Heffner, p. 225. I do not mean to imply that the expenditure of energy varies with each vowel. "Since the human vocal tract is a variable acoustical tube, with a variable radiating orifice, one would not expect to obtain the same



Pitch must be distinguished carefully from frequency. Pitch, like loudness, is a perceptual term: it is the impression of how high or low a sound is. Frequency, on the other hand, is the number of cycles per unit of time which compose the sound wave. (Since sound waves are often complex, it is usual to speak of the frequency of the sound in terms of its fundamental, i.e., the lowest component frequency of the wave.) Thus, the greater the frequency, the higher the perceived pitch. 47 The highest frequency which the human voice can reach is around 2100 cycles per second and the lowest around 40 cycles per second. Average voices have much narrower ranges; female voices average about 220 cycles per second and male voices about 120 cycles per second. 48 If articulatory and auditory processes are so different, how does one manage both to create "stress" and to recognize it when somebody else creates it? The connection seems to be that of muscular sympathy, a sort of unconscious judgment of the physiological effort one would need oneself to produce the sounds. What we hear as a more prominent syllable is in reference to the memory of similar articulatory efforts which we ourselves have made in the past; the memory is automatized, and instantly and unconsciously we are able to reconstruct the speaker's effort in our own minds. 49

pressure or power outputs for identical physiological-input energies" - I. Lehiste and G. Peterson, "Vowel Amplitude and Phonemic Stress in American English", Journal of the Acoustical Society of America, XXXI (1959), 429. 47 The correlation between pitch perception and the actual facts of frequency is much greater than between loudness and intensity. See Licklider, loc. cit. There are some disturbing factors, however; for example, extremely short sounds tend to be judged as sharper than they really are. See Don Lewis, "Pitch: Its Definition and Physical Determinants", University of Iowa Studies in the Psychology of Music, IV (1936), 346-373. 48 E. Pulgram, Introduction to the Spectrography of Speech ('s-Gravenhage, 1959), p. 131. 49 Lehiste and Peterson, p. 428. Cf. Peter Ladefoged, "Syllables and Stress", 1-14. Lehiste and Peterson further state: "... the listener interprets speech according to the properties of the speech production mechanism rather than according to the psychophysical principles of the perception of abstract sounds" (p. 428). One assumes "the acoustical signal to be a representation of the positions and movements of the physiological mechanism relative to the distribution of air pressures within the mechanism" (p. 429).



We rely upon certain typical cues: pitch differences, loudness differences, length differences, and vowel quality differences, yet not abstractly, not as psychoacoustic concepts, but in terms of how much effort it would take us to imitate them. Isolating these cues for study is no easy matter. Although the human ear is sensitive to changes in pitch, loudness and length, occurrences in speech are so closely intertwined that even trained phoneticians often cannot trust their ears to disentangle the various threads. Fortunately, it is becoming increasingly possible not only to analyze human speech mechanically and to present the components in the form of visual, measurable displays, but even to "zero out" certain variables or to create or imitate variables artificially. How do the variables of pitch, loudness, length, and quality operate together to signal "stress"? We do not entirely know, but it is clear that their use is mixed and redundant in any act of speech, that is, more than one cue almost always occurs, even though one would be enough to signal "stress". Redundancy is normal, rather than exceptional in speech: ... it is to be found at every level of speech activity and as a consequence there is scarcely any feature which can be said to be essential for speech communication. A system that is common to the speaker and the listener and a time pattern of change in the medium of communication are indeed the only two factors that can be regarded as essential. For the rest, speech consists of features that sub-serve these requirements and operate in combinations that depend upon the conditions of the moment. 50

These observations must be utilized by a viable theory of meter. Many metrists of the past (although not all51) have mistakenly assumed that one must fix ictus in a single phonetic feature - length or loudness or pitch change, or whatever names these have been given over the years. Experimental phonetics has shown the truth to be more complex. The question then arises, Granted their redundancy, which of

50 Dennis Fry, "Experiments in the Perception of Stress", Language and Speech, I (1958), 128-129. 51 Omond mentions several metrists who felt that more than one phonetic feature might be involved in "stress"; see the discussion of Guest, p. 128, Patmore, p. 151, Ellis, p. 169, Bright, p. 221, and Scripture, p. 230.



the variables is most important as a cue to "stress"? Is there a relative order of importance among its components? Experiments have shown that in controlled situations there is such a priority of importance. Where speakers were asked to decide between cognate forms in isolation, pitch changes seemed to have been more important than even massive increases in intensity. 52 In one experiment the words pérmit (noun) : permít (verb) were recorded with normal citational pitch contours, i.e.,

    (a) per                (to)    mit
           mit     vs.          per

(Relative position of type-alignment indicates relative height of the voice.) Then the normal intensity ratio of the syllables in permit was disturbed by artificially amplifying the syllable per- very strongly and decreasing the syllable -mit to bare audibility. Yet, no matter how intense per- became, -mit was judged to be more prominent, demonstrating that pitch change tends to override loudness changes as a cue to prominence, at least in isolated words. By pitch change is meant either rise or fall in the voice; what seems to be significant for "stress", as shall be pointed out in greater detail below, is "pitch isolation" - "a rapid and relatively wide departure from a smooth and undulating contour". 53 Although "stress" is most often identified by a rise, it has long been recognized that a syllable may also be "stressed" by considerably lowering it. 54 This is obviously true, for example, where a word with prominence on the first syllable forms a one-word question. A conversation might proceed as follows:

"Did you say 'forbears' or 'forebears'?" ('disease' or 'dizzies'?) ('defer' or 'differ'?)
"Forebears." ('Dizzies' or 'Differ')

52 Mol and Uhlenbeck, 205-213. 53 Dwight Bolinger, "A Theory of Pitch Accent in English", Word, XIV (1958), 112. 54 A. C. Gimson, "The Linguistic Relevance of Stress in English", Zeitschrift für Phonetik und Allgemeine Sprachwissenschaft, IX (1956), 143-149.



"Forebears?" ('Dizzies?' or 'Differ?')

"Yes." In the third sentence, fore- (diz-, dif-) is "stressed" just as it is in the second, but it is clearly lower in pitch than -bears (-zies, -fer). Pitch change is not the only feature enabling the hearer to make correct stress attributions; on occasion, other features may be more important. Length, for example, seems to become especially important where the "stressed" syllable is lower in pitch. In the above, fore- in the question "Forebears?" tends to be longer than for- in the question "Forbears?" In one experiment, the effects of intensity vs. duration were compared with that of duration vs. frequency. 55 The experimenter used a recently developed technique for artificially creating speech: a device known as the "pattern playback" allows one to control speech variables in experimental ways. The product sounds enough like human speech for a listener to understand what is being "said". In one part of this experiment, frequency was kept constant while duration and intensity were manipulated; in another, intensity was kept constant and duration and frequency were manipulated. The investigator concluded that durational differences were more influential than intensity differences as cues to "stress", and that frequency differences were more influential yet. 56 66
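The ordering of cues reported in these experiments (frequency above duration, duration above intensity) can be pictured with a deliberately crude sketch. It is only a toy under stated assumptions: the experiments describe tendencies among listeners, not a strict rule, and the class, the field names, the units, and the thresholds below are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Syllable:
    text: str
    pitch_departure: float  # size of the step up or down from the contour (hypothetical units)
    duration: float         # seconds
    intensity: float        # arbitrary units

# Cue order follows the experimental findings summarized above;
# the thresholds for a "noticeable" difference are invented for illustration.
CUES = (("pitch_departure", 0.1), ("duration", 0.02), ("intensity", 1.0))

def judged_stress(first: Syllable, second: Syllable) -> Optional[Syllable]:
    """Return the syllable a listener would be expected to hear as stressed.

    The first cue, in priority order, that differs noticeably decides the
    outcome; None is returned when no cue separates the two syllables.
    """
    for name, threshold in CUES:
        difference = getattr(first, name) - getattr(second, name)
        if abs(difference) > threshold:
            return first if difference > 0 else second
    return None

# An artificially amplified per- still loses to a -mit marked by pitch.
per = Syllable("per", pitch_departure=0.0, duration=0.20, intensity=20.0)
mit = Syllable("mit", pitch_departure=1.0, duration=0.20, intensity=1.0)
print(judged_stress(per, mit).text)
```

A strict priority order overstates the case, since the figures reported for these experiments show the cues trading off against one another; the sketch only makes the ranking concrete.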

55 Dennis Fry, "Duration and Intensity as Physical Correlates of Linguistic Stress", Journal of the Acoustical Society of America, XXVII (1955), 765-768 and "Experiments in the Perception of Stress", 126-152. 56 In the duration-intensity experiment, it was discovered that 28 per cent more people interpreted a syllable as "stressed" when it carried minimal intensity (regardless of the duration) than when it carried minimal duration (regardless of the intensity); similarly, 10 per cent more people interpreted it as stressed where it had maximal duration (regardless of its intensity) than when it carried maximal intensity. So at both extremes, duration seems more effective. Fry writes that his data "show no case in which change of intensity ratio caused a complete shift of the stress judgment from first to second syllable" (p. 151). In the frequency-duration experiment, it was discovered that as many as 40 per cent of the listeners interpreted subject as being stressed on sub- where it had higher pitch (i.e., with sub- set higher than -ject), even though sub- was only one-quarter the length of -ject, and that over half the listeners made the same judgment where sub- was only 2/5 the length of -ject. (As one might expect, it was observed that a



A fourth variable, vowel quality, has not as yet been measured very closely, but it is also clearly a factor in recognizing "stress". It is well known that many cognate pairs have different vowel-sets corresponding to different stress patterns: for example, the first vowel in convict (noun) is usually /a/ (or /a/), but in convict (verb) it is /ɨ/; relay (noun) has /i:/ but relay (verb) has /i/ or /a/, etc. 57 Most commonly, the vowels of unstressed syllables are "degraded", i.e., tend to approximate the articulatory position of the high-central vowel /ɨ/.

THE PHONEMIC ANALYSIS OF WORD-STRESS AND OTHER FEATURES

It is quite clear that there is a connection between "stress" and meaning and, therefore, that stress is phonemic in English. Consider the following simple "minimal pairs" - i.e., words 58 with different meanings distinguished by a single feature of sound: dizzies vs. disease, insight vs. incite, pervert (noun) vs. pervert (verb). 59 Clearly, if the words are pronounced in isolation, i.e.,

step down (sub- set higher than -ject) was more likely to effect a judgment of stress on the first syllable than a step up (-ject set higher than sub-).) 57 For an attempt to derive the English stress system from vowel quality differences by calling them effects of stress differences, see Marshall Berger, "Vowel Distribution and Accentual Prominence in Modern English", Word, XI (1955), 361-376. 58 The concept of "word" is hotly debated in linguistics. Some linguists go so far as to deny its existence, writing it off as an orthographic convention of no real structural significance. For the present simple purposes, we may accept the definition of H. Marchand, The Categories and Types of Present-Day English Word-Formation (Wiesbaden, 1960), p. 1, which represents an eclectic summary: A word is "the smallest independent, indivisible unit of speech, susceptible of being used in isolation". So-called "phonological" definitions of the word - e.g., "a word is that linguistic entity which is marked by a heavy stress" - could not be used in this formulation because circularity would obviously result. 59 At least in some pronunciations. It is, of course, true that many cognate pairs may involve differences in segmental phonemes as well: thus, the noun subject is often pronounced /səbjikt/ whereas the verb is /sɨbjekt/. There are, however, a considerable number of words which do not have much of this kind of vowel degradation. Consider the following (at least in my dialect): pérmit:



in their "citation" forms, anyone who speaks English will certainly recognize one or the other by reference to certain sound differences. Those differences may be indicated by the mark '; thus, dizzies: disease; insight: incite-, pervert: pervert. The phonetic complexity o f ' was discussed in the preceding section; here we are concerned with its recognizability and semantic differentiating function. Because it can, by itself, serve to distinguish two words, we are entitled to call' a phoneme: hence /'/. Since the other, "segmental" phonemes ("segmental" because they can be analyzed as discrete and separable entities occurring in the stream of speech, like beads on a string) are identical, we must conclude that it is the placement of /'/ which is significant. We may now drop the burden of quotation marks by providing a partial phonemic definition that applies at least to polysyllabic words: stress is the capacity of one syllable of a polysyllabic word to be pronounced with a suprasegmental feature or combination of features which make it more prominent than any other syllable in the word. This capacity is a potential for prominence, which may or may not be filled. In some contexts, the prominence of the stressed syllable is just as marked as when the word occurs in isolation. For example, a usual way of saying I found no one of insight vs. Ifound no one to incite is I found no one of m sight and

I found no one to in



But often, where emphasis is on other words, the stress may have a weak or null phonetic signal: His desire for insight is re m a r ^able His desire to incite is re m a r ' c able. Stress distinctions continue to operate even where their phonetic manifestations are somewhat flattened or even disappear entirely, permit, digest: digest, increase: incredse; billow: beldw; import: impdrt; ¿xport: expdrt; incline: incline; differ: defer; insult: insult; prdceeds: proceeds; Into it: intuit; discount: discount; imprint: imprint; discharge: dischdrge; fdrebears: forbears; dutcry: outcfy; (as in outshdut).



because the context almost always provides sufficient information: the word for above clearly causes us to anticipate a noun, while to signals the likelihood of a verb. But we have not accounted for all the syllables that may be called stressed, and we must expand our definition. Clearly, monosyllabic words can also be pronounced in isolation, or with other words, and be marked by the same sorts of suprasegmental features as are used by the stressed syllables of polysyllabic words. One can say the word sight in such a way that it has precisely the same prominence contour as -cite in incite: i/'te sgh


(as a response to the question "Which sense is he lacking?").

Thus, we may broaden our definition to include monosyllabic words: stress is the capacity or potential of one syllable in a word to be given suprasegmental prominence (or upon which such prominence is understood and can be elicited on demand) when the word is pronounced in isolation or in a larger stretch of speech. A characteristic of stressed syllables is that they always contain "full vowels" whereas unstressed syllables typically are "weakened" (or "reduced" or "degraded"):

... the vowels of unstressed words and syllables appear in a 'weakened' form: they are shorter and formed with looser muscles, the voice is sometimes reduced to a murmur, and the tongue-positions tend toward a uniform placing, somewhere near higher mid position... The unstressed vowel is a shorter, looser, less extremely formed variant of the stressed vowel.60

Full vowels need not be stressed (compare the second vowels in context, highway, purport, etc.); and reduced vowels cannot be stressed. Reduced vowels may be emphasized, but through the instrumentality of another linguistic entity which will be explained below, namely accent. Some monosyllabic words, like the indefinite article "a", have a phonological variant in some positions (technically a sandhi61 variant). Comparatively few monosyllabic words have reduced variants, and in most cases the vowel is like the one occurring in unstressed syllables of polysyllabic words, /ɨ/. Examples are you /yɨ/, but /bɨt/, shall /šɨl/, etc. Thus, one syllable in each polysyllabic word and all full-vowel monosyllables possess stress in English, while those syllables are by definition unstressed which cannot normally be given suprasegmental prominence because they contain degraded vowels or, if they occur in polysyllabic words, are not in the lexically assigned positions.

Let us consider another aspect of the pair

His desire for insight is reMARKable
His desire to incite is reMARKable.

The syllable -mark- in the above sentences seems not only more prominent than re- and -able, but also more prominent than in- in insight and -cite in incite. And it is quite common to give special prominence to other syllables which we might wish to "emphasize":

HIS desire (for insight / to incite) is remarkable (not hers)
His deSIRE (for insight / to incite) is remarkable (not his capacity).

We know that it is the lexically stressed syllable which will be made more prominent when this "emphasis" falls on the given word:

His desire for INsight is remarkable (not for knowledge)
His desire to inCITE is remarkable (not to inflame).

But this observation presents a problem: shall we say that the most prominent, or "emphasized" syllable has a greater degree of stress than the stressed syllables of "unemphasized" words? This would imply three different stress levels: emphasized stress, unemphasized stress, and lack of stress. Or shall we say that this additional prominence on the "emphasized" syllable is something else, not a matter of stress, but some other entity operating at another level of linguistic structure? Both of these positions have been advanced, the latter by Kenneth Pike, and the former by George Trager and Henry Lee Smith, Jr. Their systems are sketched in detail in Appendix A.

The notion of stress used in the present theory of meter is based rather on the work of Dwight Bolinger.62 Bolinger distinguishes between stress and accent. He shows in interesting experimental ways that it is largely pitch change, or more precisely, pitch obtrusion that causes one syllable in a phrase to seem most prominent, although he allows that greater loudness and greater length are co-occurrent factors. For example, if we were pointing out some recent additions to the Spanish Department, we might say

There are the recently-arrived SPANish teachers.

The prominence of Span- is linguistically significant and needs to be included in our system because the utterance contrasts semantically with

There are the recently-arrived Spanish TEACHers

60 Bloomfield, p. 112.

61 Bloomfield, pp. 186-188.



in which the expression means "teachers from Spain", not "teachers of Spanish". The phonological difference between the two phrases is not one of stress, however, since both Span- and teach- already carry lexical stress. (We would say SPANish or TEACHer in isolation as responses to the questions "What nationality is she?" and "What's her profession?" respectively.) Bolinger suggests that this relative phrase prominence is different linguistically from stress and names it accent. Accent is to the phrase what stress is to the word. Marking stresses with angular strokes and accent with type realignment (capitals here), we may distinguish:

The SPÁNish téacher
from
The Spánish TÉACHer.

The use of type alignment suggests the importance of pitch change as a signal for accent, but it should not be interpreted to mean that loudness and length differences are of no significance in these syllables. These are not separate factors but part of the phonetic complex that constitutes accent. As in stress, the cues of accent are mixed. Although Span- and teach- carry lexical stress in both utterances, only Span- carries phrase accent in the first utterance, and only teach- carries phrase accent in the second. The phrase is a single intonational unit, and the accent forms its center. In this formulation (as in that of Trager and Smith), it follows that every utterance - including single-word utterances - must be pronounced with an intonation pattern and hence must contain an accent. We were really simplifying the facts in assuming that stress alone was operating in citation forms since the citation form itself is a minimal phrase and therefore must contain an accent. That is, the citation form subject when actually pronounced alone possesses both lexical stress and phrase accent on its first syllable, hence SÚBject.

62 "A Theory of Pitch Accent..." and references cited therein.


Monosyllabic words also carry phrase accent. Their accentual contour is characteristic. Whereas polysyllabic words may often occur in step-like contours, each syllable at a different level,

He's a GENtleman ("He's a gentleman")

monosyllables are often spoken with a glide through the syllable:

He's a gentle MAN ("He's a gentle man").



Any stressed syllable bears a potential for accent; whether that potential is fulfilled will depend on broader conditions of meaning and emphasis. Unstressed syllables of polysyllabic words, however, do not ordinarily bear the potential for accent in normal English constructions.63

Let us recapitulate. Stress is a fundamental property of full-vowel monosyllabic words, and of one syllable in polysyllabic words, which in any environment, accented or not, can serve to distinguish them from what are otherwise homonyms. The actualization of stress is not uniform; its phonetic cues will vary according to the phonological context in which the word finds itself. Nevertheless, it is real; speakers will not ordinarily differ in their sense of where it occurs, and can always make it more prominent on demand. Accent, on the other hand, is the prominence which one syllable in an uttered phrase receives when it is the center of the pitch contour; it is not fixed to the word but to the phrase. Thus, a word like very always has its stress on the first syllable, but that syllable may or may not receive phrase-accent. It does, for instance, when it forms the entire utterance by itself: an answer to the question Was he nice? is Very. But stress and accent do not coincide on many syllables in larger utterances. Lexical stress on syllables is usually signalled by changes of pitch, loudness, and/or length slighter than those that mark accent. Indeed, the stress may not be overtly signalled at all, but its position is well known, and the listener has no difficulty in reconstructing it from the context. In other words, stress is a purely lexical function whose phonetic externalization will vary in the phrase according to position, emphasis, and other factors. Precisely as do segmental phonemes, stress identifies and distinguishes words. Accent is a function of the phrase unit and a marker of relative emphasis and of larger semantic distinctions. Compare, for example,

HE'S a very nice fellow (not someone else)
He's a VERy nice fellow (not just ordinarily nice)
He's a very NICE fellow (as well as being rich)
He's a very nice FELlow (normal way of saying it).

63 They take accent only in a few restricted and easily recognizable situations, for example, where the word itself is under consideration, as a word, not as a unit in the construction. If the word improve were not clearly understood, for example, one might say I said IMprove, not DISprove. See Dwight Bolinger, "Contrastive Accent and Contrastive Stress", Language, XXXVII (1961), 83-96.
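The recapitulation above can be restated as a small data model. The following is only an illustrative sketch, not part of the analysis itself: the names `Word`, `is_stress_minimal_pair`, and `render` are hypothetical, syllables are given orthographically rather than phonemically, and capitals stand in for phrase accent. Stress is stored once per word as a lexical potential; accent is chosen once per phrase and lands on the stressed syllable of the chosen word.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Word:
    """A word as orthographic syllables plus one lexical stress position."""
    syllables: Tuple[str, ...]
    stress: Optional[int]  # index of the stressed syllable; None for a reduced monosyllable


def is_stress_minimal_pair(a: Word, b: Word) -> bool:
    """True when two words share their syllables and differ only in stress placement."""
    return a.syllables == b.syllables and a.stress != b.stress


def render(phrase: Sequence[Word], accented: int) -> str:
    """Spell out a phrase, capitalising the stressed syllable of the accented word."""
    target = phrase[accented]
    if target.stress is None:
        # Assumption taken from the text: unstressed syllables do not ordinarily bear accent.
        raise ValueError("a word without lexical stress does not ordinarily take accent")
    out = []
    for i, word in enumerate(phrase):
        syls = list(word.syllables)
        if i == accented:
            syls[word.stress] = syls[word.stress].upper()
        out.append("".join(syls))
    return " ".join(out)


permit_noun = Word(("per", "mit"), stress=0)
permit_verb = Word(("per", "mit"), stress=1)

phrase = [
    Word(("He's",), 0),
    Word(("a",), None),
    Word(("ver", "y"), 0),
    Word(("nice",), 0),
    Word(("fel", "low"), 0),
]

if __name__ == "__main__":
    print(is_stress_minimal_pair(permit_noun, permit_verb))
    for position in (0, 2, 3, 4):
        print(render(phrase, position))
```

The lexical entries never change from one rendering to the next; only the phrase-level choice of accent does, which is the distinction the four readings of He's a very nice fellow are meant to show.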

The minimal contrast between the words billow and below will provide another example of the distinction. Below (/biló:/) and billow (/bílo:/) are distinguished (for some speakers, at least) by lexical stress alone. The actualized stress may coincide with phrase accent, as in

(Where was it?) BeLOW. (or) We saw it beLOW.
(Was it wave or billow?) BILlow. (or) We saw the BILlow.

Or it may not:

One below was SOFT.
One billow was SOFT.

The advantage of saying that stress in below and in billow is a matter of different potentials is that the actualization of stress is not always the same. It may be very clear, as when the stressed syllable also carries phrase accent; or it may be so weakly marked that if the word were isolated - say by cutting it out of a tape recording - auditors might be unable to determine where the stress falls and hence which word is being pronounced.64 Such reductions rarely seem to cause ambiguity, however, since there are all sorts of other, contextual, cues available to the auditor. For example, the fact that nouns generally occur after "the" tells us that exports is a noun in the phrase The exports are here, even when the stress on ex- is completely flattened. If the context is not clear, greater prominence can always be placed on the appropriate syllable in a clarifying repetition, even to the point of having two phrase accents:

One beLOW was SOFT.
One BILlow was SOFT.

Bolinger calls phrase accent "pitch accent" because he is convinced by his experimentation that pitch change is its most important (although not its only) component. It is useful to describe a few of his experiments. He presented subjects with an artificial reading of the sentence Break both apart.65 The pitch patterns and relative intensities were synthesized by the pattern playback techniques described above. The following utterances were constructed (relative frequency was shown in cycles per second (cps) on the vertical dimension and relative intensity in decibels (db) below each syllable):

1. Pitch: Break falls from 110 to 100 cps; both and a- are level at 100 cps; -part is obtruded upward from a-. Intensity: Break 0 db, both 10.5 db, apart 2.5 db. The majority of listeners heard the accent on -part because it had slightly greater intensity than that of Break, and because its pitch pattern represented a rapid obtrusion from a- (whereas both, for all its intensity, was on the same pitch level as the last part of Break).

2. Pitch: scale of 100 to 120 cps, with Break gliding through it [remainder of the contour not legible]. Intensity: Break 0 db, both 11 db, apart 2 db. Both was recognized as accented by a majority of subjects, proving that a massive increase in intensity is required to overcome a comparatively small inflection of pitch.

3. Pitch: Break at 120 cps; both and apart at 100 cps. Intensity: Break 0 db, both 2 db, apart 1 db. Here, although both and -part were more intense than Break, the wide separation in pitch between Break and both signalled accent to the large majority of subjects. Smaller differences of intensity apparently are cancelled by noticeable frequency jumps.

4. Pitch: Break, both, and apart all level at 80 cps. Intensity: Break 10.5 db, both 0 db, apart 0 db. Naturally Break was heard as accented since the pitch was perfectly level and the only distinction was intensity; however, the massive increase in intensity was not as effective as comparatively small rises in pitch in other experiments.

64 I performed a small experiment to check this assumption, whose results may be of interest. Eighteen sentences were read into a tape recorder by two speakers, a man and a woman. These sentences contained nine sets of minimal cognate pairs: pérmit: permít; dígest: digést; pérvert: pervért; súbject: subjéct; íncrease: incréase; bíllow: belów; éxports: expórts; ímports: impórts; íncline: inclíne. The sentences were designed so that the word in question occurred early in the sentence and therefore would be unlikely to receive phrase accent, which is normally placed on the last stressed syllable; thus: "His driving permit is expired", "He can't really digest his food", etc. The eighteen sentences were presented to the speakers in random order and they were given simple instructions to avoid emphasizing any word except the last one in the sentence. The various words were then separated from their contexts by literally cutting them out of the tape. The isolated words were played to two groups of subjects, one, a beginning class in freshman composition and the other, a group of fairly sophisticated graduate students in linguistics. These were the reactions to the randomly arranged words: 1. Among the freshmen, roughly 50 percent could not distinguish the noun pervert from the verb pervert extracted from "They'll pervert everything you say." Two-thirds thought that permit extracted from the sentence "They won't permit him to work" was a noun. Eleven out of twenty-five thought the fragment extracted from the sentence "Everybody below suffered from it" was "billow." 2. Among the graduate students, eleven out of eighteen identified pervert as a verb in the sentence "The pervert was captured." Seven out of nineteen thought that the word was below in the sentence "The billow was soft." The group divided evenly between making noun and verb assignments to incline in the sentence "The incline was icy."

65 "A Theory of Pitch Accent..." p. 121.



5. Pitch: Break and apart at 80 cps; both is obtruded upward toward 90 cps. Intensity: Break 6 db, both 0 db, apart 6 db. Both was heard as accented. This is another instance of what is proved in (3): a slight change in pitch will be more effective than considerable increases in intensity.

Bolinger concludes that accent is most clearly perceived where there is a relatively wide departure from a smooth or undulating contour - that is to say, a pitch obtrusion. Not that length and intensity do not operate in marking accent but simply that they are not as effective in that function as is pitch obtrusion. The results of other experiments seem to confirm this conclusion. For example, when Bolinger asked subjects to judge the lifelikeness of artificially created accents, they distinctly preferred the addition of small rather than large quantities of intensity, that is, they thought those accents most lifelike which were the product of little or no additional intensity and which relied mostly on pitch obtrusion. Bolinger's conclusions are supported by other experiments mentioned above (pp. 49-51).

The pattern [contour diagram] is not the only kind of accentually significant pitch obtrusion in English, although it is probably the most common. Bolinger has isolated two other types. His definitions are as follows:

Accent A: A relative leveling off of the accentable syllable followed by a relatively abrupt drop, either within the accentable syllable (which is prolonged for the purpose) or in the immediately following syllable. Thus [contour diagram].

Accent B: The characteristic of this accent is up-motion. It is neither skipped down to nor skipped down from. It may be [1] approached from below and skipped up to, with the following motion continuing level, or rising (the usual thing), or falling slightly (an abrupt drop would create an A). Or [2] it may be approached from a relative level and skipped up from, after which the movement usually continues upward slightly or levels off. Thus [contour diagram].

Accent C: This is a kind of anti-accent A, both in form and in meaning. The accentable syllable is approached from above, and skipped down to. What follows may level off or rise, but a further fall seems to be avoided: [contour diagram].66

66 "A Theory of Pitch Accent..." p. 143f. In working in experimental conditions with Bolinger's system, I have noted the possibility of confusing Accents A and C in certain situations. Since accents are primarily established in relation to adjacent syllables, one might wonder whether, in an utterance like Drink YOUR beer (your high, beer low), one were dealing with an Accent A on your or an Accent C on beer. That is, the pattern can be either [contour diagram] or [contour diagram]. It seems to me that only loudness and length can resolve this ambiguity. Where your is longer and/or louder it carries Accent A, and where beer is longer and/or louder it carries Accent C. It is conceivable also that both syllables might carry an unusual degree of loudness or length, in which case we must recognize that both are accented. This would be a very unusual way of achieving double accent, however; the normal way would seem rather to be to repeat the same accent, as in Drink YOUR BEER (A on your, A on beer), or at least to use accents which worked through the syllable without relying on the state of surrounding syllables, as in Drink your beer with an Accent B on your [remainder of the example not legible].
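As a rough illustration of how the three definitions partition pitch movements, the sketch below labels a syllable from the pitch of its neighbours. It is a simplification under stated assumptions, not Bolinger's procedure: the function names `classify_accent` and `label_contour` are hypothetical, each syllable is reduced to one pitch value in cps, the 10 cps threshold for a "skip" is arbitrary, and drops within a prolonged syllable, loudness, and length are all ignored.

```python
from typing import List, Optional, Sequence


def classify_accent(prev: Optional[float], syl: float, nxt: Optional[float],
                    skip: float = 10.0) -> Optional[str]:
    """Label one syllable 'A', 'B', 'C', or None from three pitch levels.

    `skip` is an assumed threshold (in cps) for an abrupt pitch movement.
    """
    drop_after = nxt is not None and syl - nxt >= skip
    rise_after = nxt is not None and nxt - syl >= skip
    skipped_up_to = prev is not None and syl - prev >= skip
    skipped_down_to = prev is not None and prev - syl >= skip

    if drop_after:
        return "A"  # followed by a relatively abrupt drop
    if skipped_down_to:
        return "C"  # approached from above; what follows levels off or rises
    if skipped_up_to or rise_after:
        return "B"  # up-motion: skipped up to, or skipped up from
    return None


def label_contour(pitches: Sequence[float], skip: float = 10.0) -> List[Optional[str]]:
    """Apply classify_accent to every syllable of a one-value-per-syllable contour."""
    labels = []
    for i, pitch in enumerate(pitches):
        prev = pitches[i - 1] if i > 0 else None
        nxt = pitches[i + 1] if i + 1 < len(pitches) else None
        labels.append(classify_accent(prev, pitch, nxt, skip))
    return labels


if __name__ == "__main__":
    # "Drink your beer" with your high and beer low (pitch values invented for illustration).
    print(label_contour([115.0, 120.0, 100.0]))
```

On the invented contour for Drink your beer, the sketch labels your as A and beer as C at once, which is the ambiguity raised in the footnote: pitch relations alone license both readings, and only loudness and length (not modelled here) could decide between them.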




These accents are meaningful units, and hence "morphemes", although their meanings are very general.67 Accent A is described as "assertive": "It is used with items that are separately important, contrastive, and/or new to the discourse". Accent B has the meanings of "connectedness" or "incompleteness". Accent C is "anti-assertive", something like "lackadaisical", or "restrained". Here are some examples:

1. Do you really hate your BROther? (A on bro-)


This is a normal, unemphatic way of asking the question; the level syllable bro- is followed by an abrupt drop in the immediately *' The meanings of intonation contours are often so general as to be difficult to formulate. Pike writes in The Intonation of American English (Ann Arbor, 1945): "Once a particular intonation contour has been isolated (by studying its contrasts) its meaning is determined by finding the least common denominator of the linguistic contexts or physical and emotional situations within which that contour occurs. If, for example, a low slightly rising contour occurs in utterances which are variously statements, queries, dependent clauses, and also occurs in the discussion of trees, children, algebra, atoms, and cancer, while in each utterance the speaker is deliberating carefully on these items, then it is precisely the speaker's attitude of deliberation which constitutes the only contextual characteristic common to all of them. In this case, the low slightly rising intonation contour must be defined as meaning a deliberate attitude of the speaker. As with words which may have two or more related [or unrelated] meanings, however, so with intonation contours one must sometimes indicate a central meaning with marginal variations from it." (23) And later: "In analyzing the meanings of intonation contours the chief danger of error -an error which has vitiated much work in the past - lies in the failure to get the common meaning from a large enough number of contexts. 
By abstracting the meaning of a particular contour just from a single context, or from contexts which are all grammatically or physically similar even although that contour actually occurs elsewhere in grammatically and physically diverse contexts, one tends to assume that the meaning is much more concrete than it actually is; this takes place when one includes in the definition of a contour the characteristics of the local context selected, whereas these characteristics would not universally appear with that contour if the sampling had been wider." (23) An interesting attempt to apply a sophisticated psychological technique (Osgood's "semantic differential") to the determination of these meanings is Elizabeth Uldall, "Attitudinal Meanings Conveyed by Intonation Contours", Language and Speech, III (1960), 223-234.



following syllable. Accent A falling on some other part of the sentence,

2a. Do you REALly hate your brother? (A on real-)
2b. Do YOU really hate your brother? (A on you)

gives special emphasis to the accented words, in this case really and you, with special implications ("Are you certain you do?" or "I'm not asking about other people, I'm asking about you"). We must distinguish between 2a above, where really is marked for special emphasis as an Accent A by the sharp drop on -ly, and an utterance in which the glide down from real- is so gradual as to signal an Accent B with the meaning "connectedness":

3a. Do you REALly hate your brother? (B on real-)

The implication here is a kind of sad incredulity: "I find it difficult to believe that anyone could hate his brother". We could also have an Accent B on really with a normal prominence Accent A on brother with very little difference in meaning:

3b. Do you REALly hate your BROther? (B on real-, A on bro-)

Two Accent A's, on the other hand, amount to an utterance whose meaning is more like 2a:

2c. Do you REALly hate your brother? (A on real-, with a second A later in the utterance)

3a is an example of the first type of Accent B, where the accented syllable is skipped up to. An example of the same utterance with the second type of Accent B - where the accented syllable is skipped up from - would be:

3c. Do you really HATE your brother? (B on hate)
(or Do you REALly hate your brother? with B on real-)

Finally, Accent C has (in American English, at least) a lackadaisical or bored implication, as in:

4a. Do you really hate your brother? (C on you)

When Accent C is added to Accent A in this utterance, the result is an impatient repetition of the question:

4b. Do you really hate your brother? (A combined with C)

("What's the matter with you? Can't you hear what I'm saying?")

A good example of how de-emphasis may blend into or be taken for boredom is in the common greeting "How are you?" Normally pronounced with Accent A,

How are you? (Accent A)

it has a range of de-emphasizing implications with Accent C

How are you? (C on are)

depending on the context in which it is spoken. Its de-emphasis may be in the direction of graciousness, as when one speaks to a woman



or a child. Or, accompanied by a turning of the head, it may signal lack of interest or curtness. It is also an appropriate accent when one wants to tone down an overly enthusiastic introduction. If, for example, the introducer has attached an excessive importance to the meeting ("I know how much you've wanted to meet each other") or to the speaker ("This is the great Mr. So-and-so"), Accent C, in this situation, seems to say "The introducer is making too much of a fuss, but nice to meet you anyway".

It is to be noted that phrase accents are marked by pitch obtrusions which are of relatively great magnitude. Obtrusions of lesser magnitude may mark lexical stresses but not accents. Thus, slight obtrusion on -mits

The company permíts no smoking on the JOB

is simply the cue for indicating that it is the verb and not the noun; but considerable obtrusion would give the word accent with meanings of special or contrastive emphasis:

The company perMITS no smoking on the job

(perhaps with the implication "and I'm tired of warning you!"). The precise point at which pitch obtrusion begins to signal accent as well as lexical stress is not always clear, and informants may differ in their interpretation. This should not disillusion strict phonemic formalists, since the boundaries of segmental phonemes too are sometimes indeterminate; for example, there are certain tongue postures where it is not clear whether pin or pen is being said if contextual cues are not present. Phonemes divide up a continuous stretch of speech sound, and precise "fixes" may be difficult or impossible to make in isolation. But a no-man's land never disproved the existence of two opposing armies.

Another question may have arisen about this analysis of stress and accent: "How do we account for such apparent differences in prominence as that between the syllables veg- and -tar in vegetarian? Or, similarly, between op- and -a- in óperator vs. operátion. Surely we ought to identify these lesser prominences as 'secondary' stresses of some sort: vègetárian, and óperàtor vs. òperátion."



For several reasons it is awkward to introduce different "levels" of stress. For one thing, stress in the limited view we have adopted operates in an "on-off" fashion; either it is there, or it is not, and its presence or non-presence is known by every speaker on clear lexical criteria. The differences between the syllables veg- vs. -eand -an in vegetarian and op- vs. -er and -tion in operation depend more upon the qualitative nature of these vowels than upon suprasegmental characteristics.68 The vowels in the syllables -e- and -an in vegetarian and -er and -tion in operation tend to have the centralized or neutral tongue position typical of reduced vowels. A convenient representation of the reduced vowel is /i/ ("barred i"); the phoneme occurs in the second syllable of roses (ro:z?z/ (as opposed to the /a/ or "schwa" of Rosa's /ro:z9z/). Veg- and op-, on the other hand, have the full quality of stressable vowels, namely /E/ and /a/, even though in this instance, they do not bear stress. It is true that veg- and op- are usually pronounced a little louder and a little longer and at a slightly different pitch than the reduced vowels, but we assume these to be allophonic. To secure evidence of the nonphonemic character of these lesser prominences, one need only scan the dictionaries to see how greatly opinions vary about whether syllables contain "intermediate stresses" or not. In a random glance through three popular American college dictionaries - The American College Dictionary, Webster's New Collegiate Dictionary, and Webster's New World Dictionary - we find one placing intermediate stress marks on the -ism in Congregationalism (ACD), while two do not ( WNCD, WNWD). Similarly, confiscate (ACD, WNWD) vs. cdnfiscdte (WNCD) and 68 See Robert Stockwell's review of intonation research, International Journal of American Linguistics, XXVII (1961), 278-283. 
Evidence for the existence of a reduced vowel phoneme is to be found in Allen Hubbell, "The Phonemic Analysis of Unstressed Vowels", American Speech, XXV (1950): "The phonetic facts are far better explained and more simply set forth if we conceive of a separate phonemic category in which all stressed vowel oppositions are suspended", p. 110. The position is supported by Nathaniel Caffee, "The Phonemic Structure of Unstressed Vowels in English", American Speech, XXVI (1951), 103109, who observes that illiterates do not know how to "reconstruct" the full vowel character of an unstressed vowel like that of the second syllable of "stomach". The naive spelling "could of" is another example of the same thing.



patriarch (ACD, WNWD) vs. pdtriarch (WNCD). Differences of opinion are so widespread among these dictionaries, which practically always agree on the position of lexical stress, as to suggest that the distinctions are structurally insignificant by-products of differences between vowel phonemes.69 The notion of "intermediate stress" reflects a genuine impression that such syllables have some sort of prominence, but this prominence is not part of the language's system of signalling meaning differences among words. It depends rather on the quality of the vowel - its full or reduced character - and upon the relative position of the syllable in the word. It is customary, when pronouncing polysyllabic words which have their lexical stress on a late syllable, to give one of the earlier syllables a slight pitch obtrusion, thus g4 con







But these are not essential or fixed parts of the citation forms of these words, which could just as easily be pronounced: Congregationalism

intention. It may be that this non-significant vacillation is what suggests different notations to the dictionary's editors.70 These nonpho69 This is recognized by John Kenyon and Thomas Knott, A Pronouncing Dictionary of American English (Springfield, Mass., 1944), p. xxiv: "Syllables more prominent than one or more others in a word, but less prominent than the strongest one in the word, have some degree of subordinate accent, which may or may not be indicated by the secondary accent mark (,), as hesitate hezo'tet, hesitation iheza'tejan. Since subordinate accent varies from nearly the weakest to nearly the strongest in words where it can occur, and varies moreover with innumerable styles of speech, no dictionary can accurately mark all instances. Hence the use of secondary accent marks here is largely conventional." [Their term "accent", of course, is my "stress".] ,0 The precise nature of the allophonic conditioning is too complicated to consider here, but there is an interesting attempt to work it out in N. Chomsky,



nemic prominences may be manifested by loudness and length as well as by pitch obtrusion. In a sense, the syllables which carry them may be said to have prominence potential because they have full vowels. But this is allophonic, nonsignificant potential and not the same as lexical stress potential. The important fact about syllables with nonphonemic prominence is that they do not carry the potential for accent except in the special case of "elucidational contrast". They cannot ordinarily be more prominent than the main lexical stress in the word. One cannot normally say: I don't like your a t t i r e or He colMbo rat ed with them. Where two words are united in a compound, a loss of lexical stress always occurs in one, although it may maintain allophonic prominence. For example, in adding ¿1 evator to Aerator one may get 61 61 evator operator (or) evator °Perator. It is also possible to say Elevator °^erator for instance, in contrast to elevator starter. The form does not contradict the observation that unstressed syllables may not carry phrase accent, however, since it is clearly "elucidational" and therefore outside of the usual lexical function. The pronunciation with accent on op- is parallel to M. Halle, and F. Lukoff. "On Accent and Juncture in English", For Roman Jakobson (The Hague, 1956), pp. 65-80.



I didn't say "Worm", I said " u n form".

Another reason for declining to admit intermediate degrees of phonemic stress is the fact that if one were to recognize them in some words, it would be hard to know where to stop. If disestablish is said to have, say, four levels of stress, what is to prevent us from saying that antidisestablishmentarianism has eight?

[Transcription: antidisestablishmentarianism marked with eight degrees of stress: primary, secondary, tertiary, quaternary, quintary, sextary, septenary, and weak.]

Stress and accent distinctions do not suffice to account for all the meaningful "suprasegmental" differences in English. For example, it is clear that we can turn two utterances which we have recognized as stress- and accent-distinguished into questions by raising our voice at the end; consider citation forms of the words insight and incite as statements and questions. The question forms clearly have a different sort of ending than the statement forms. We can represent these endings as follows:

[Pitch diagrams: "Insight." vs. "Insight?" and "Incite." vs. "Incite?"]

Pitch fall at the very end, then, after accent, must have the meaning "statement" in these forms, and pitch rise the meaning "question". Since they occur at the very end of the utterance, it is customary to call these rises and falls "terminals". Three sorts of terminals are ordinarily distinguished: falling (or fade), rising, and level (or sustentional). They can be represented graphically as follows: \, /, and →.71

71 These are different symbols for the same phenomena originally marked by G. L. Trager and H. L. Smith Jr., Outline of English Structure (Norman, Okla., 1951), as /#/, /||/, and /|/.

Authorities differ as to their precise constitution. Originally it was felt that \ was marked by a simple fall; later it was variously considered a combination of falling pitch and


"fade" in volume, or only a fade in volume, or fall and fade plus additional features (e.g., "drawling" of the final sounds over which it operates, or a "pre-inhalation posture").72 There has been less difference of opinion about the phonetic nature of /, which is generally defined as a rise in pitch on the last syllable of the final word in the phrase, no matter at what level the preceding syllable was pronounced, although some question exists about whether the voice ceases more or less abruptly than with \.73 → is "marked by the absence of positive features for either \ or /"; it is, thus, a sustention of the previously occurring pitch. Here, too, there is some question about whether the voice does or does not fade, and if it does, at what speed.74 → is the hardest terminal to distinguish, because, by definition, it has no distinctive contour, so that its difference from normal transition or plus juncture (see citations fn. 72) is a matter of timing.

Without becoming deeply embroiled in the technical difficulties of the analysis of terminals, we can define them simply as phrase-ending pitch movements or sustentions which occur before actual or potential pauses (i.e., voice cessations). It is also useful to observe that terminals often coincide with major syntactic breaks. Here are some additional examples of how terminals combine with accents:

[Pitch diagrams: (Speaker A) "Do you really hate your brother" and (Speaker B) "Do I really hate my brother", each analyzed as a sequence of accents (B + A + B) plus a terminal.]

72 Trager and Smith, p. 42; W. Nelson Francis, The Structure of American English (New York, 1958), p. 157; James Sledd, A Short Introduction to English Grammar (Chicago, 1959), p. 29; C. F. Hockett, A Course in Modern Linguistics (New York, 1958), p. 37; and Twaddell, pp. 419-420. See also M. Joos, "The Definition of Juncture and Terminals", Second Texas Conference on Problems of Linguistic Analysis in English (Austin, 1962), 4-38, who provides time specifications.
73 Francis, p. 157 ("somewhat less abrupt"), and Twaddell, p. 422 ("either abrupt crescendo plus cut-off or abrupt cut-off...").
74 Francis, p. 157, and Sledd, p. 30; Hockett, Course, p. 37.



The first is a straightforward question, but the second is an "echo-question," i.e., the original question is repeated to make sure that it is understood. The same accents in both sentences are on the same syllables - real- and bro-; but the terminal pitch contour on the unaccented syllable -ther by itself conveys the meaning difference. This illustrates an important distinction between accents and terminals: accents fall normally on stressed syllables; terminals fall on unstressed syllables if these are final. The meaning contrast for level contour

[Pitch diagram: "Do you really hate your brother" ending in a level terminal (= B + A + →)]


is not quite so clear: it might function as a less strongly marked echo-question or as an indication of a certain lack of interest. It is conceivable that almost any combination of accent and terminal might occur.75

75 Pike (Intonation) speaks not of terminals but of pauses. There are two significant kinds of pause in his system. Their linguistic status is not exactly clear: they are either phonemes or morphemes (p. 31). Nor is the precise phonetic distinction between them entirely fixed: the pause called tentative (and written /) is "usually shorter in length than the final one" (written //), "but it is not always so" (31). The chief distinction seems to be that the tentative pause may on occasion contain no actual cessation of speech at all, but rather a lengthening of the last sound of the preceding word; thus a tentative pause could occur between "man" and "is" in the sentence "The man is here" in the form of an extra-long sounding of /n/, approximately equal in time to that of an actual pause in speech. The elongation "is accompanied by a considerable weakening of the strength of the sounds" (31). Further, tentative and final pauses have different effects on the sounds which immediately precede them: "The tentative pause tends (1) to sustain the height of the final pitch of the contour. A 2-4 contour, for example, before a tentative pause tends to end on one or more syllables on pitch four without drifting downward; there may prove to be occasional slight drift upward, although never as much as is found in a rise from significant level four to significant level three. In addition, (2) the tentative pause often affects the quantity of the preceding contour in various ways not as yet clearly defined. The syllable preceding a tentative pause is often longer than usual, sustained on a level pitch. At other times, it is the beginning point of the [next] primary contour that carries length and so gives the clue to the presence of the tentative pause. On the other hand, the departure from the undefined norm may be in the opposite direction, and yet give related results: a very short ending often indicates that a tentative pause follows... In general, it may be that any departure from the normal length of the elements of a primary contour contributes to the recognition of a following pause as tentative, provided that the full height of the pitch is sustained at the end of the contour." (31). The final pause, on the other hand, "modifies the preceding contour (or contours) by lowering in some way the normal height of the end of the contour. If the contour itself ends in pitch four, then preceding a final pause it will tend to fade into silence while drifting downward; this is considerably different from the pitch of the same contour which has a somewhat level, possibly sustained, ending when it occurs in the middle of a sentence without pause, or when it occurs before a tentative pause". That is, one must contrast utterances such as I'm going// and I'm going/ when I'm through//. In the first sentence the voice sinks lower on -ing than it does in the second, and also tends to fade away more rapidly and more dramatically. These can occur as minimal pairs where tentative pause signals uncertainty or non-finality:

I'm going. 4- 2-4// (Implying, possibly, and that's that)
I'm going... 4- 2-4/ (Implying, possibly, if you do not dissuade me).

It will be observed that certain distinctions which have been made in the Pike and Trager-Smith analyses are not included in this formulation; perhaps the most important is the analysis of English intonation into four pitch levels. See Dwight Bolinger, "Intonation: Levels Vs. Configurations", Word, VII (1951), 199-210. To give the reader a complete account of relevant phonemic study, Pike's and Trager-Smith's systems are described in detail in Appendix A.

This chapter must end with something of a disclaimer. The above formulation, based mostly on Bolinger's research, does not pretend to provide a complete analysis of English suprasegmental phonology. No one yet has the answers to all the complex questions which an air-tight analysis would require. The theory of meter presented in the fifth chapter, however, does rest upon structural linguistic orientations, and it is hoped that some increase in accuracy and reliability is thereby achieved. Improved analyses of English stress and related phenomena are to be expected, and alterations of metrical statement will then be necessary. In the meantime we



must make the neatest statements we can with the systems at our disposal. In any case, the desiderata of suprasegmental analysis are clear. A systematic notation needs to satisfy three requirements:

1) Consistency. The system must be consistent, that is, there must be little variation between two transcriptions by different transcribers or between two transcriptions made at different times by the same transcriber. Anyone who learns the system should be able to read his transcription aloud with demonstrably the same features that he originally heard.

2) Simplicity. The system must be both easy to learn and easy to teach if it truly represents what everybody knows about the language by virtue of being a native speaker. Linguistic science, like other sciences, always strives for the simplest formulation to account for all the facts. The system should have neither too many nor too few symbols to represent the significant differences in the language.

3) Convertibility. The system should account for "stress flattening," the phenomenon which occurs when syllables in longer utterances lose part or all of the suprasegmental prominence they would have in isolated citations. If, for example, one pronounces the words very, decent and fellow alone, as separate utterances (say, as answers to the questions "Was it very nice?" "How was it?" and "Did you say fallow or fellow?"), the first vowel of each word will most likely be stressed by the usual combination of features discussed above: higher pitch, greater length, greater loudness, and clearer articulation. But if these words are spoken normally (without special emphasis) as parts of a sentence - say, He's a very decent fellow - the pitch on ver- and de- will flatten out, and their length and loudness will be reduced. Often, in rapid speech, only a single rise in the sentence will occur, on the last stressed syllable:

[Pitch diagram: "He's a very decent fellow," with a single rise on fel-.]

and that syllable will sound regularly longer and louder than it would in some other sentence where it was not final, for instance, in He's a fellow citizen.

How to account systematically and with maximal simplicity for flattening and reduction of word-stress in sentences is part of the broader question of how to incorporate all suprasegmental elements into a coherent system. The fact that no system has completely succeeded in satisfying the minimal requirements of consistency, simplicity, and convertibility demonstrates the severity of the problem, not the lack of ingenuity or competence of the linguists concerned with it. But metrics need not wait for the creation of an air-tight system; some of its purposes will be served if it acquires from linguistics nothing more than a greater sophistication about these matters than has been its custom. It is very instructive for metrists to consider the difficulties which such concepts as "stress" entail. Careful study of the schemes described in this chapter and in Appendix A will dramatize the problems facing any analysis. The justification for presenting a theory of meter on linguistic principles before there is unanimous agreement among linguists is that even contingent views will teach us more that is relevant to the study of meter than we would otherwise know.
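The consistency requirement stated earlier in this chapter can be made concrete with a small calculation. The following is only a sketch, not a procedure taken from any of the systems discussed: it assumes that a transcription is reduced to one mark per syllable, and both the function name `agreement_rate` and the two sample transcriptions are invented for illustration.

```python
def agreement_rate(first, second):
    """Share of syllables that two transcriptions mark identically."""
    if len(first) != len(second):
        raise ValueError("transcriptions must cover the same syllables")
    if not first:
        raise ValueError("transcriptions must not be empty")
    matches = sum(1 for a, b in zip(first, second) if a == b)
    return matches / len(first)


# Hypothetical markings of the eight syllables of "He's a very decent fellow".
# "S" = prominent, "w" = weak; the labels are assumptions, not a real notation.
transcriber_a = ["w", "w", "S", "w", "S", "w", "S", "w"]
# The second transcriber hears ver- as flattened in connected speech.
transcriber_b = ["w", "w", "w", "w", "S", "w", "S", "w"]

rate = agreement_rate(transcriber_a, transcriber_b)
print(f"agreement: {rate:.3f}")
```

A system meets the requirement to the degree that this figure stays high, both across transcribers and across repeated transcriptions by one person. Simple agreement of this kind does not correct for chance matches, so it should be read as a rough indicator only.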



The mechanical analysis of verse first became feasible with the development of machines for measuring speech properties. The first phonetic machine was the Marey tambour, usually referred to as the "kymograph", which measured air pressure variations. (Strictly speaking, the kymograph was only one part of the device, the clockwork drum upon which the trace was recorded.) One spoke into a rubber tube which conveyed pressure changes to the drum, upon which a stylus made appropriate marks.1 Kymograph registrations are long, single, intricately wavy lines. The trace of a reading of the "To be or not to be" line from Hamlet appears on page 78. Recognizing significant phonetic patterns in this sort of trace was by no means easy; one needed sharp eyes or even magnifying instruments, and one had to make close measurements to perceive relevancies. The amplitude of the sound was indicated by the relative height of the wave. In the example, the [i:] in be has relatively more amplitude than the [u:] in to because it is higher from the base line. The length of the sound was indicated by the length of the line measured from left to right: here [i:] is more than twice as long as [u:]. The frequency of the sound was indicated by the number of waves in the line occurring per second: here, [u:] in to lasts approximately 0.2 seconds and contains 18 waves; hence, it must have been pronounced at approximately 90 cycles per second (18 waves divided by 0.2 seconds).

1 For complete descriptions of the kymograph, see E. W. Scripture, Elements of Experimental Phonetics (New York, 1902), pp. 195-211, and P. Verrier, Essai sur les principes de la métrique anglaise (Paris, 1910), III, 1-22. Summaries of kymographic and other mechanical research in English metrics can be found in Cary Jacob, The Foundations and Nature of Verse (New York, 1917), W. Schramm, Approaches to a Science of English Verse (Iowa City, 1935), and Albert Chandler, Beauty and Human Nature (New York, 1934).

The differences in segmental phonemes were particularly difficult to determine. About all one could do without extensive training was to recognize the differences between vowels, which clearly show voice vibration, and voiceless stops (like [t]), which are long, flat, and at the zero line, since they correspond to actual cessation of sound. Those who used the kymograph clearly recognized its shortcomings:

Graphic experimentation yields but poor results... in the case of pitch. The pitch of clear vowels when sung can be measured, but the vowels of ordinary speech are so complex that no system of analysis into simple components is adequate...



On the whole I think it sensible frankly to admit the impossibility of any exact analysis of verse with respect to either loudness or intensity...2

The kymograph, however, did make it possible for the first time actually to measure speech durations and hence to lay some temporalist ghosts once and for all. The dissertation of Warner Brown, Time in English Verse Rhythm (1908), was one of the first effective investigations of syllable duration in performed verse. Brown limited himself to time measurements; he was aware of the importance of the other variables, pitch, loudness and quality, but felt that recording techniques had not sufficiently advanced to provide accurate traces. He demonstrated clearly that such attempts as Lanier's to scan by using musical notes were unwarranted because of the inexactness and incommensurabilities of actual durations. ("It is seldom that the method of mere observation and of introspection can be so plainly convicted."3) He discovered that constancies do exist in the recitation of metrical dummies made up of nonsense syllables (pa pa pa pa). Feet tended to be approximately the same length, and a constant ratio (varying from one to three) occurred between unprominent and prominent syllables. But no temporal constancies could be found in either respect in the recitation of actual verse. In several cases, the "long" syllable (i.e. ictus) was actually briefer than the "short". Brown also discovered an interesting difference between iambic and trochaic nonsense lines; ictus in iambic meter was from 2.1 to 2.9 times as long as non-ictus, whereas in trochaic meter, the ratio was profoundly different, the ictus ranging from only .46 to 1.04 times the length of the non-ictus. The temporalist is hard put to explain why, in the metrically pure situation involving nonsense syllables, English speakers would elect to give the ictus as little as half the duration of the non-ictus.
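The ratio Brown reports can be computed directly from a list of syllable durations. The sketch below is an illustration only, not Brown's own procedure. It uses the durations given in the author's note on his spectrograph replication of the pa pa experiment (iambic line, in centiseconds) and, as that note directs, leaves out the line-final syllable by dropping the last foot. The function name `foot_ratios` and its `ictus_second` parameter are invented for the example.

```python
def foot_ratios(durations, ictus_second=True):
    """Return ictus/non-ictus duration ratios for consecutive two-syllable feet.

    `ictus_second=True` treats each pair as iambic (weak, strong);
    `ictus_second=False` treats it as trochaic (strong, weak).
    """
    if len(durations) % 2 != 0:
        raise ValueError("expected an even number of syllable durations")
    ratios = []
    for i in range(0, len(durations), 2):
        first, second = durations[i], durations[i + 1]
        ictus, non_ictus = (second, first) if ictus_second else (first, second)
        ratios.append(ictus / non_ictus)
    return ratios


# Iambic "pa pa" durations in centiseconds, from the author's replication note.
iambic_cs = [15, 27, 16, 30, 12, 27, 15.5, 28, 19, 39]

# The final syllable is lengthened by position, so the last foot is set aside.
ratios = foot_ratios(iambic_cs[:-2])
mean_ratio = sum(ratios) / len(ratios)

for number, ratio in enumerate(ratios, start=1):
    print(f"foot {number}: ictus is {ratio:.2f} times the non-ictus")
print(f"mean ratio: {mean_ratio:.2f}")
```

Calling `foot_ratios(durations, ictus_second=False)` on a trochaic recitation would give the corresponding figures for comparison with the .46 to 1.04 range Brown found.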
This research on the components of nonsense recitations was important because it gave some indication of the norms assumed by English speakers. The absolute syllable durations varied considerably from speaker to speaker, but the ratios of relative duration between the syllables in a foot seemed to be fairly constant. Thus, readers do assume a norm and in fact can demonstrate it in an artificial situation if called upon to do so.4 Brown's conclusions were supported by the work of Ada Snell (1918).5 Her statistics too demonstrated clearly that syllable length was not necessarily an indication of metrical ictus, that indeed unstressed syllables could last longer than adjacent stressed syllables. For example, one of her readers took .22 seconds to pronounce mo- in moment but .34 seconds to pronounce -ment. Likewise, the ter- in terror was shorter than the -ror, liv- in living shorter than the -ing, -ut- in unutterable shorter than the un-, and fa- in fables shorter than the -bles. To deny the metrical ictus to these syllables because they were actually shorter would be to contradict the stress pattern of the language itself. Miss Snell suggested that in 90 percent of the cases, length was significant. But significance meant something different to her than to the metrical "timers"; she felt that length usually coincided with other features as markers of ictus, not that it alone was the ictic cue. In 1918, she seemed to have understood the mixed character of ictus quite well; she wrote "... in a rhythmic series already clearly established in the mind, the slightest change of pitch, quantity, or stress would satisfy the rhythmic feeling."

2 Warner Brown, Time in English Verse Rhythm (New York, 1908), pp. 21, 24. See also a variety of difficulties listed by Verrier, III, 9-13.
3 Brown, p. 27.
4 I have performed Brown's experiment with pa pa on the sound spectrograph and can generally corroborate his results. For the iambic line my durations were pa (15 centiseconds) pa (27) pa (16) pa (30) pa (12) pa (27) pa (15-1/2) pa (28) pa (19) pa (39). The last syllable is, of course, longer than the others by position and should not be considered in the tabulation.
5 Ada Snell, Pause (Ann Arbor, 1918), and an article in PMLA, XXXIII (1918), 396-408 and PMLA, XXXIV (1919), 416-435, entitled "An Objective Study of Syllabic Quantity in English Verse."

A contradictory view about time constancies seems, at first glance at least, to characterize the experimental portion of Paul Verrier's Essai sur les Principes de la Métrique Anglaise (1910), also based on kymographic studies. Verrier held that English feet



were constant in duration, provided one measured "correctly", i.e. from the very first vibration of each stressed vowel rather than from the beginning of the stressed syllable. He carefully explained, however, that he was not espousing a quantitative theory of meter, that feet are not absolutely equivalent, but more or less perceptually (sensiblement) equivalent. He suggested parallels with other phenomena: the edge of a razor is commonly held (i.e. ordinarily perceived) to be straight, yet under a microscope it looks as jagged as the peaks of the Andes. Even with this concession, however, Verrier's position is not totally defensible. Although his statistics were sophisticated, he was willing to take as equal feet which differed by as much as 40%.6 Another weakness was the very small size of his sample. There certainly exist lines of English verse which would present greater temporal discrepancies in normal recitations than 40%. Unlike Brown, Verrier studied the frequency information in his traces, though he admitted the uncertainty of his measurements.7 From these, he composed intonational scores in musical-staff notation, complete with key registration and time signature. But little connection was made with the metrical analysis. He did not consider the scores as intonation contours, but rather as real musical sequences, referring to "cadences", "dominant and tonic chords", etc. The "melody" was taken as a separate aspect of the performance, indirectly aiding the rhythm proper but not always coinciding with it.

6 One foot in a line was 97 cs. and another 69 cs. Verrier, III, 283.
7 Verrier, III, 253.

Amos R. Morris published mechanical traces of verse as part of his doctoral dissertation, The Orchestration of the Metrical Line (1923). Using a kymograph which registered both nasal and vocal air emission (an advance over Brown's), he presented graphs and measurements of regular lines by R. L. Stevenson, Tennyson, Emerson, Frost, and Shakespeare, and free verse lines by Whitman, Jeanne D'Orge, and Alfred Kreymborg. He also studied short prose passages by Pater, Lincoln, Bryce, and Emerson. (An appendix contains extensive tables of time and pitch measurements and



"cadence graphs" displaying pitch contours as a function of time. Amplitude, however, was not measured.) Morris' purpose seems to have been to determine the dominant rhythmic factor in the "composite of patterns" which he discovered in the performance of lines. This composite is the "orchestration" of the title. He concluded, in respect to some genres at least, that the most stable relationship was time equality, although he recognized that the determining factor could be some other element, presumably pitch or stress. The biggest weakness in Morris' work is a failure to explain what he meant by "stress". He used the term freely, inserting stress strokes widely, but it is impossible to know precisely what the strokes correspond to. They are clearly not derived from the kymographic record. Furthermore, Morris offered no rationale for making foot divisions, and his practice was not consistent; for example, a line from Stevenson was divided as trisyllabic How do you / like to go / up in a / swing? but the next as bisyllabic Up in / the air / so blue. But why not Up in the / air / so blue or Up / in the air / so blue? And if foot determination is not itself justified, how can one present convincing measurements of foot duration? Morris did not seem to have really discovered anything new about meter through his instrumentation. He was the first to provide usable intonation graphs, and interesting documentation of performances, but he did not face fundamental questions about the nature of the metrical elements. The most famous of the experimental phoneticians to investigate English metrics was E. W. Scripture. Scripture wrote on English



meter for over thirty years.8 His book Grundzüge der englischen Verswissenschaft (1929) summarized his methods of investigation and conclusions. He early recognized the phonetic complexity of metrical prominence. Scripture did not identify it with traditional concepts like "stress" and "accent". Rather, he invented the concept of "centroid" - a point in the stream of speech where a peak of "energy" occurred. Energy, or "auditory impressiveness", was the sum of a number of features: intenser loudness (Lautstärke), slower sound-transition (Lautdauer), higher or lower pitch (Tonhöhe), and more precise enunciation (Genauigkeit).9 Kymographic measurements were converted into energy-tables, in which each sound was adjudged plus, neutral or minus in respect to each feature:

[Energietabelle for "To be or not to be, that is the question": each sound of the line scored under Genauigkeit, Lautstärke, Dauer, Tonhöhe, and Energie.]



Data from the energy-table were then reconverted into graphic form representing rises and falls in energy (see table on p. 84).

[Graph: energy curve for "To be or not to be, that is the question," with time in seconds marked below the base line and centroids dotted above the letters.]

8 Among his publications are "The Nature of Verse", British Journal of Psychology, XI (1921), 225-235; "Investigations on the Nature of Verse", Vox, XXXII (1922), 4-14; "Whence Does the Poet Get the Form of his Verse?" Modern Languages, V (1923-1924), 163-172; "The Physical Nature of Verse", Nature, CXIV (1924), 534-535; "The Biology of Verse", Nature, CXIV (1924), 825-826; "Die neue Metrik", Archiv f. d. ges. Psychol., LXIV (1928), 463-474; "Experimentalphonetische Studien über die englische Verszeile", Archiv f. d. ges. Psychol., LXV (1928), 203-215; "The Choriambus in English Verse", PMLA, XLIII (1928), 316-322.
9 See Elements of Experimental Phonetics, pp. 447-448.

The numbers below the base line represent seconds; the dots above the letters represent centroids. The interval between centroids was called the "rhythmical period". Rhythmical periods tended to be equal, but they were not feet in the ordinary sense. Scripture discarded the foot because he discarded the possibility


of finding syllable boundaries in speech, admitting only the existence of syllabicity (Silbigkeit).10 Thus syllables could be counted but not separated. For Scripture, only the line counted ("A stretch of the verse-stream that coincides with the printed line"). He proposed a complete accounting of it in terms of number of centroids ("Z"), total number of vowels ("G"), number of strong ("energetic") vowels ("S"), number of weak vowels ("W"), number of vowels between each centroid ("v"), the rhythmic stroke ("q") equivalent to the ratio of strong versus weak vowels, and, finally, the disposition of first and last vowels ("k"): weak-strong (/), strong-weak (\), strong-strong (→s), and weak-weak (→w). (It is not entirely clear why all these tallies were necessary.) The formula for the line from Hamlet graphed above was Z = 5, G = 11, S = 6, W = 5, v = 0-1-2, q = 6:5, k = /.

10 Grundzüge der englischen Verswissenschaft (Marburg, 1929), pp. 25-26.

Since only lines were recognized, Scripture had to classify dozens and dozens of linear kinds according to the number of centroids and of intervening weak vowels, proliferating terminology endlessly. For instance, among "duocentroidal" lines (Zweischläger), there were:

one, two
The stream flows
My flocks feed not
'Tis the world's winter


and so forth. One has the strong feeling of losing the metrical woods for the trees.

Other difficulties in Scripture's method suggest themselves too. For one thing, the parameters are suspect. The Lautstärke of a sound was represented by the distance away from the base-line, but a glance at the graph will show how difficult it is to interpret this feature as loudness. Since what was measured was force of outgoing breath pressure, the line does not merely represent the acoustic intensity of the sound (from the impact of the vocal cord vibration), but also the force of the air-current that was emitted. But, of course, articulatory force is not at all the same thing as loudness. The aspiration after a voiceless stop like [t] may involve comparatively great pulmonary effort, but it certainly cannot be louder than the vowels on either side of it. Therefore Scripture's display cannot reasonably be read as a record of acoustic intensity. Duration seems to have been more fairly marked, but the lack of clear-cut distinctions between segmental phonemes makes questionable the time-measurements of Scripture (and other kymograph users). Since there was no way of distinguishing between voiced sounds, particularly vowels, it was impossible to tell where one left off and the other began. In the kymogram above, [i:] in be is completely continuous with the vowel in the next word, or, and the insertion of any dividing mark would be arbitrary. (This problem is partly but not totally solved by the sound spectrograph which shows differences between vowel phonemes by means of characteristic formant contours.) Because of confusion between loudness and expiratory breath pressure, Scripture was forced by his statistical method to find the locus of metrical prominence in peculiar places, for example, between two syllables interrupted by actual silence; in the Hamlet line, between not and to. This judgment surely goes against common sense. Other prominence positions in the line seem equally strange: between be and or, between be and that, on the [t] of that, and between the [tʃ] and [ə] of question.

Furthermore, Scripture's component of "articulatory precision" could only be impressionistic. The intensity, frequency, and duration of different sounds can be measured and fairly easily compared, but it is difficult to know how to measure precision of articulation. If one uses precision as a feature, he necessarily penalizes the vowels in favor of the consonants, since degree of precision is not an important factor in vowel articulation. Notice that none of the vowels in the energy table reproduced above was marked "precise".

Scripture's method of deciding how to group sums of features into weak or strong also seems arbitrary. It is not clear, for example, why the vowel in or is grouped together with the vowel in be, while the [b] itself is grouped with to. Why not connect the final [t] in not with [na], in which case the strong syllable not (four pluses) would be more energy-laden than to (two pluses)? Similar questions could be raised about the analysis of other syllables.

Even more fundamental questions must be asked about the general method: why assume that "energy" is the significant metrical feature of verse at all, or that it consists of the simple sum of the four features selected by Scripture? It would be strange if everything were quite so neatly additive, since redundancy characterizes language in so many other respects. Furthermore, what was added together was disparate, some things being purely physiological and others purely acoustic. Since the loci of actual energy expenditure need not necessarily coincide with the loci of perceived loudness, why add the two together?

Finally, there are objections to be raised on the score of economy of method. Scripture's rejection of the foot and categorization solely in terms of lines required a vast proliferation of line-types: for example, he needed 279 kinds of four-centroid lines. From the point of view of system design, such large categories are very inelegant indeed. The labor required to learn Scripture's vast terminology does not seem worth the minute distinctions which it enables one to make.

Wilbur Schramm, in Approaches to a Science of English Verse



(1935), was the first to conduct large-scale metrical investigations with electronic equipment.11 He employed an oscillograph for measuring pitch and quality, a strobo-photographic camera for pitch, and a high-speed output-level recorder for intensity. These devices collected data which were then converted into scansional graphs or "scores" in a staff-like display. Intonation contours, calibrated to the musical scale, were marked on the upper staff, while intensity appeared, in decibels, on the lower. Time was expressed by vertical bars representing one-second intervals. A recitation of the first two lines of Gray's "Elegy Written in a Country Churchyard" was transcribed as follows:

[Score: the first two lines of Gray's "Elegy," with the intonation contour on the upper staff, intensity in decibels on the lower staff, and vertical bars at one-second intervals.]

11 Oscillograms of lines of verse had, however, been printed in Henry Lanz, The Physical Basis of Rime (Stanford, 1931), and metrical implications briefly discussed (pp. 208-211).

Schramm distinguished between stress and accent but used these terms in senses opposite to those offered in Ch. III. "Accent" for him was the "verbal, dictionary emphasis which is the property of one syllable of a word when that word is pronounced by itself" (p. 25). To determine what the phonetic nature of accent "really is", Schramm measured the properties of the "accented" syllables in a list of polysyllabic words pronounced by A. Lloyd James for the BBC Advisory Committee on Spoken English. He concluded that



accent was a mixed phonetic phenomenon: "In 99 per cent of the cases, at least two of the three elements - intensity, pitch, duration join to make an accent, and in at least half of the cases all three elements joined" (29). The "typical" accented syllable was one in which the pitch was about a musical third higher than the surrounding syllables or "rising from their level", the intensity about four decibels greater and the duration about ten per cent longer. Stress, on the other hand, was defined as "rhythmical emphasis" or (in a somewhat unfortunate metaphor) that which "beats the drum" for the rhythm. Therefore, although Schramm's "accent" corresponds roughly to what I have called "lexical stress", his "stress" does not correspond to my "phrase accent" - his is a rhythmic definition, mine is linguistic. Schramm's work includes an analysis of the stress patterns in ten recitations of nine poems representing a wide range of English poetry, from Shakespeare to Vachel Lindsay. His approach was empirical: he studied those syllables which "in the judgment of five able students of verse and speech... unquestionably receive a rhythmical stress" (32). His measurements showed the "stressed" syllable to be more intense in 76 % of the cases (an average of 4 decibels louder), longer in 88% of the cases, and significantly different in pitch (i.e., at least a semi-tone higher or lower) in only 45 % of the cases. It is not clear why he decided to exclude pitch variations smaller than a semi-tone. There is evidence that phonetically untrained individuals can perceive much smaller differences in pitch without difficulty. After some observations upon "melody" in verse, Schramm ended with a treatment of rhythm and the nature of the verse foot, discussing the intersections of rhythms feet, phrases, lines, couplets - and pointing out that in many respects the traditional foot was not a real phonetic entity. 
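Schramm's thresholds are easier to compare when they are converted to plain ratios. The sketch below is an illustration only and is not part of Schramm's procedure. It assumes the standard decibel definition and equal temperament, with a "musical third" taken as a major third of four semitones; none of these assumptions is stated in the passages quoted above, and the function names are hypothetical.

```python
def intensity_ratio(db):
    """Intensity (power) ratio for a level difference given in decibels."""
    return 10 ** (db / 10)


def amplitude_ratio(db):
    """Sound-pressure (amplitude) ratio for the same level difference."""
    return 10 ** (db / 20)


def frequency_ratio(semitones):
    """Frequency ratio for an interval in equal-tempered semitones (assumed tuning)."""
    return 2 ** (semitones / 12)


# Four decibels, the average excess reported for stressed syllables.
print(round(intensity_ratio(4), 2))   # 2.51
print(round(amplitude_ratio(4), 2))   # 1.58

# One semitone (the pitch threshold) and a major third (assumed four semitones).
print(round(frequency_ratio(1), 3))   # 1.059
print(round(frequency_ratio(4), 3))   # 1.26
```

On these assumptions the semi-tone threshold corresponds to a frequency change of about six per cent, which gives a sense of how much pitch movement the threshold left out of account.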
The foot was not set apart by sense, nor by pauses, nor by the psychological grouping effect, nor by equal time spans. Neither did it "express the rhythm of the line" (69). What then was its function? Schramm considered it an empty category existing only to indicate the sequence of stressed and unstressed syllables: "Each foot contains one stress, or represents a part of the line which would contain a stress if the pattern were normal" (70). The domain of the foot was the interval between stressed syllables; its boundaries were the intensity peaks of these syllables, rather than the syllable boundaries. Thus, Schramm's formulation resembled Scripture's centroid-rhythmical interval system; and like Scripture, Schramm did not make entirely clear how much intensity was required to qualify a syllable as stressed.

A close study of Schramm's work is essential to any objective theory of meter. The limitations of a purely phonetic metrics become obvious, however, when we examine the results of his investigation. In a sense, the very matter for analysis (from a theoretical point of view) was what Schramm took for granted, namely how "able students of verse and speech" decide that certain syllables receive a "rhythmical stress" (32). A theory of metrics must show what it is in the sequence of words in a verse line which prompts a scanner to say "Here is an ictus and here a non-ictus." Schramm was content to let the traces speak for themselves, but if we examine them, we will see that they are not unequivocal. Consider, for example, the lines from Gray's "Elegy" reproduced above. Notice that the word homeward, in line three, shows greater intensity and length in the second syllable; -ward is about three decibels louder and about 28% longer. The pitch-level is about the same (although home- has a rising inflection, while -ward is level). But the scansion can only be homeward. The word must then be an exception to the norms for intensity and pitch, if not length. But how is it that every one of the five judges knew that homeward was to be scanned with ictus on the first syllable? How did they know that it was necessary in this case to "disbelieve" their ears? To put it more accurately, what linguistic information did they possess which prompted them to discount the actual acoustic prominence of -ward? Chapter V will attempt to provide an answer.
Developments in electronic research have made possible more exact analysis of speech. The cathode-ray oscilloscope, a device for displaying the flow of electrons, when attached to a microphone, permitted the first clear display of sound waves by electrical means.



The simplest sound wave, or pure tone, for example, the sound made by a tuning fork, is an oscillation of energy in the air. It can be represented by an oscilloscope as a moving dot of light, as follows:

The oscilloscope screen is laid out in checkerboard fashion, and the frequency, or the number of oscillations per second, and the amplitude, or size of the maximum deviation of air pressure from its normal level, can be read easily.12 The oscilloscope is not very useful in analyzing human speech, however, for two reasons. First of all, even the simplest speech sounds, pure vowel sounds pronounced at monotone pitch, form very complex waves. Like most natural sound-issuing agents, the voice makes sounds rich in overtones. An overtone is a sound wave which is higher (i.e., of greater frequency) than some basic tone and co-occurrent with it. When overtones occur as exact whole-number multiples of the fundamental frequency (the frequency of repetition of the whole complex wave), they are called harmonics. What makes a complex wave complex is precisely the fact that it combines two or more sound waves together. If three sound waves are added together whose frequencies are 100, 200 and 300 cycles per second:

12 The design and capacities of oscilloscopes are discussed in many basic handbooks on electronics and radio. See, for example, Abraham and William Marcus, Elements of Radio (New York, 1943), II, 625-638. Diagrams throughout are taken from Peter Ladefoged, Elements of Acoustic Phonetics (Edinburgh, 1962).



the resulting complex wave looks like this:
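The addition can also be sketched numerically. The following is a minimal illustration, not a reproduction of Ladefoged's diagram: the three frequencies come from the text, but the equal amplitudes, the zero starting phase, and the names used are assumptions made for the example. It sums the three sine waves and checks that the composite repeats once every hundredth of a second, i.e., at the rate of the 100-cycle component.

```python
import math

FREQS = (100.0, 200.0, 300.0)   # cycles per second, as in the text
AMPS = (1.0, 1.0, 1.0)          # assumed equal amplitudes (not given in the text)


def complex_wave(t):
    """Value at time t (in seconds) of the sum of the three sine waves."""
    return sum(a * math.sin(2 * math.pi * f * t) for f, a in zip(FREQS, AMPS))


PERIOD = 1.0 / FREQS[0]         # 0.01 s, the period of the 100-cycle tone

# Sample two consecutive periods, twenty points each, and compare them.
samples = [complex_wave(n * PERIOD / 20) for n in range(40)]
first, second = samples[:20], samples[20:]
repeats = all(abs(a - b) < 1e-9 for a, b in zip(first, second))
print(repeats)                  # True
```

Changing the assumed amplitudes alters the shape of the composite wave but not its rate of repetition.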

The 100 cycle tone is the fundamental frequency, of which the other tones are harmonics (the first and second, or the second and third, depending on how one counts). The tone we perceive is the fundamental; that is to say, a complex 100 cycle tone is heard as occurring



at 100 cps. The harmonics add a kind of musical depth or richness, usually called timbre. The different tonal qualities of musical instruments are products of different harmonic structures. The basic sound emitted by the human vocal cords possesses a very large number of amplitudes. When the sound passes through the upper vocal tract and mouth it is modified or resonated by the natural tendency of the oral cavity to vibrate sympathetically according to its varying shape. This resonating process increases the amplitude of some of the waves and decreases the amplitude of others. All the vowel sounds and some of the consonants are characterized by different resonances, so that to represent speech sounds graphically, one would have to analyze them into their separate harmonic components, showing the amplitude of each. The second reason for the difficulty of using the oscilloscope is that although the oscillographic record shows clearly the time and the varying summed amplitude of the harmonics of the complex wave as a whole, it cannot display separately the individual harmonics. Long and painstaking mathematical analysis is required to measure and extract the component waves of the complex wave. To correct the deficiencies of the oscillograph as a practical speech measuring device, an instrument to separate the components of the "sound spectrum" was developed in the Bell Laboratories during World War II, largely by Ralph K. Potter. The particulars of the machine, called the speech spectrograph, were first published in 1946 13 and a text on its use entitled Visible Speech appeared in 1947.14 The sound spectrograph is essentially a filter which can be tuned, like the tuning knob on a radio, to varying frequencies. The amount of sound energy of each frequency 15 is registered in varying degrees of black by a stylus on a piece of chemically treated

13 See vol. 17, pp. 1-89 of the Journal of the Acoustical Society of America (July, 1946). The commercial name of the machine is "Sonograph".
14 Among the practical uses beyond phonetic research that the inventors envisaged was the development of a visual telephone for the deaf. Potter, Kopp, and Green, Visible Speech (N.Y., 1947) describe a program to teach people how to read spectrographic output as they would regular print.
15 Actually each frequency band. The two bands used on the commercial machine are 50 cycles and 300 cycles wide.



paper. The frequency components are laid out on the vertical axis, one on top of another, with the fundamental at the bottom, while time is measured along the horizontal axis. The chief advantage of the sound spectrograph over the kymograph and the oscilloscope for metrical analysis is the comparative ease with which frequency can be read.16 Speech changes so rapidly that frequency displays requiring the counting of curve elements are tedious. The spectrograph displays very clearly and exactly the component frequencies of each sound at each instant according to the relative height of the mark on the paper. Furthermore, the earlier machines did not provide an easy way of distinguishing the various vocables. The ability of the spectrograph to provide a clear formulation, not only of harmonic components but also of their relative amplitudes, makes it possible to identify different sounds by characteristic bands, or formants.17 One disadvantage of the spectrogram is that it does not give a clear picture of the sum of the amplitudes of the component frequencies. Amplitude information is conveyed by the relative darkness of the registration, but the human eye is much less sensitive to differences in shading than to relative position on a graph. For this reason, a supplementary amplitude display device has been developed which provides a clear picture of the summed amplitude on the vertical axis versus time on the horizontal. To demonstrate the kind of information that the sound spectrograph provides, we may compare spectrograms of the words insight and incite:

16 The difference between spectrographs and kymographic records is the difference between the pattern of a rug spread out and the threads of the rug unravelled and bundled together (Visible Speech, p. 315). Good records of frequency are also made by the Grützmacher-Lottermoser pitch-meter; see Akustische Zeitschrift, 1937, 1938.
17 Not all speech sounds, of course, are complex tones. Phonemes like /t/ and /p/, for instance, are mostly absence of sound, and leave appropriately blank stretches on the spectrogram. Others, like /s/ and /z/, are really noises, i.e., collections of random, non-harmonic sound waves. These show up on spectrograms as irregular vertical striations. But even silence-gaps and noises can be identified by considering their effects on contiguous tones. See Visible Speech, Chapters VIII and IX.



The tenth harmonic is traced in white ink and set off by two dots. The amplitude is traced above. The syllable-time measurements are displayed below. The machine conveniently and accurately accounts for all three variables involved in metrical analysis.
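The separation of a complex wave into its components, which the spectrograph performs with tuned filters, can be imitated arithmetically. The sketch below is an illustration under stated assumptions and does not describe the machine's circuitry: it builds a hypothetical test wave from harmonics at 100, 200 and 300 cycles per second with assumed amplitudes, then recovers the amplitude at each probed frequency by a discrete Fourier sum. The sample rate, the amplitudes, and all names are invented for the example.

```python
import cmath
import math

SAMPLE_RATE = 1000   # samples per second (assumed)
N = 100              # 0.1 s of signal, so every probed frequency fits a whole number of cycles

# Hypothetical test signal: frequency in cycles per second -> assumed amplitude.
COMPONENTS = {100: 1.0, 200: 0.5, 300: 0.25}

signal = [
    sum(a * math.sin(2 * math.pi * f * n / SAMPLE_RATE) for f, a in COMPONENTS.items())
    for n in range(N)
]


def component_amplitude(samples, freq, sample_rate):
    """Amplitude of the sinusoidal component at `freq`, from one discrete Fourier sum."""
    total = sum(
        x * cmath.exp(-2j * math.pi * freq * n / sample_rate)
        for n, x in enumerate(samples)
    )
    return 2 * abs(total) / len(samples)


# Probe every 50 cycles from 50 to 350, as a tuned filter might be stepped.
spectrum = {
    f: round(component_amplitude(signal, f, SAMPLE_RATE), 3)
    for f in range(50, 351, 50)
}
print(spectrum)
```

Only the three frequencies actually present return non-zero amplitudes. A spectrogram records the same kind of information as degrees of darkness at the corresponding heights on the paper.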


The limitations of a purely phonetic approach to metrics should be quite clear from discussions in chapter III and the earlier part of this chapter. Neither the event feature, the syllable, nor the prominence feature, the ictus, can be totally accounted for by acoustic traces. For such traces are continuous, and the metrical features are discrete. The sound spectrograph cannot tell us how English speakers interpret data; it provides physical, not psychological information, and its results are only meaningful when combined with linguistic interpretations by native speakers. There is nothing in mechanical curves of frequency and amplitude to suggest the two-valued structure we have assumed meter, as a rhythmic phenomenon, to be. For all their minute registration, mechanical data lack the vital human elements of perception and linguistic meaning. We need to turn again to the study of sounds in their linguistic function to interpret the raw acoustic profiles gathered from the machines.

Among the first to espouse a structural linguistic approach to metrics were the Russian and Czech Formalist linguists and literary scholars, particularly Roman Jakobson.18 Jakobson ridicules the idea that meter could best be studied without reference to meaning, as if it were a purely phonetic object. Even if recitations are ultimate phonetic realities, meter itself is a phonological phenomenon. "Not the phone, but the phoneme as such is utilized as the cornerstone of verse."19 Jakobson, along with other structuralists like Lotz and de Groot, came to meter as a specifically linguistic problem ("Linguista sum; linguistici nihil a me alienum puto"). Although primarily concerned with Slavic, he has had occasion to discuss English meter, although all too briefly. He sees English meter as an accentual metrical system marked by contrasts between more and less prominent syllables. As for the composition of prominence, he has anticipated the present theory in one important aspect, recognizing the possibility of ictus as a product of either phonemic word stress or "phrasal stress" (accent). Jakobson has expressed in very clear terms the important distinction between performance and meter:

The intention 'to describe the verse line as it is actually performed' is of lesser use for the synchronic and historical analysis of poetry than it is for the study of its recitation in the present and the past.20

Cross-cutting this distinction is that between design (the abstract meter, the "type") and instance (the actual line, the "token"):

... meter - or in more explicit terms, verse design - underlies the structure of any single line - or, in logical terminology, any single verse instance.

18 It is difficult for the non-reader of Slavic languages to learn much about Formalist metrics because little has been translated, and even when discussions are in more familiar languages, examples are usually not. But there are the following: Jan Mukařovský, "Intonation comme facteur de rythme poétique", Archives néerlandaises de phonétique expérimentale, VIII-IX (1933), 153-165; and "The Connection between the Prosodic Line and Word Order in Czech Verse" (translated by Paul Garvin), in A Prague School Reader on Esthetics, Literary Structure and Style (Washington, 1958), pp. 131-154 (which volume contains an extensive bibliography of Prague School writings); Roman Jakobson, "Über den Versbau der Serbokroatischen Volksepen", Archives néerlandaises de phonétique expérimentale, VIII-IX (1933), 135-144; and "Axioms of a Versification System Exemplified by the Mordvinian Folksong", Acta Instituti Hungarici Universitatis Holmiensis Series B Linguistica, I (1952) 5-13 (with John Lotz); and "Studies in Comparative Slavic Metrics", Oxford Slavonic Papers, III (1952), 21-66. Jakobson's latest views on meter were expressed in Style in Language (New York, 1960), 359-377, and passim. See also the work of John Lotz, "Notes on Structural Analysis in Metrics", Helicon, IV (1942), 119-146, "A Notation for the Germanic Verse Line", Lingua, VI (1956), 1ff., "Metrics and Linguistic Analysis", Report of the Tenth Annual Round Table Meeting on Linguistics and Language Studies (Washington, 1960), pp. 129-137, and "Metric Typology", Style in Language (New York, 1960), pp. 135-148; and the book by A. de Groot entitled Algemene Versleer (The Hague, 1946). There is a good account of Russian, Czech, and Polish metrics by Victor Erlich in Russian Formalism ('s-Gravenhage, 1955).
19 "Über den Versbau..." p. 136.
20 Style in Language, p. 365.



Design and instance are correlative concepts. The verse design determines the invariant features of the verse instances and sets up the limits of variations.21

But, he warns,

A variation of verse instances within a given poem must be strictly distinguished from the variable delivery instances.22

It is but one logical step from this position to recognize the verse instance as a sum or common denominator of all meaningful delivery instances, a hypothesis which underlies much of my own theory of meter. The sum or common denominator is part of the poem itself, the "enduring object" in contradistinction to the many performances of it, which are merely "events".23 Jakobson's views have been totally supported by recent structuralist metrists. Three ideas now seem clearly established: 1) it is the linguistically relevant, not the unanalyzed speech sounds which signal metrical features; 2) meter itself is a system, parallel to and actualized by, but not to be confused with, the linguistic system; and 3) there is an essential difference between performance (recitation, realization) and abstract meter.

The difference between performance and meter has been briefly but profoundly analyzed by Rulon Wells, using the technique of "logical construction".24 Wells distinguishes the abstract meter from 1) the orthographic record (the original and its copies, the "text"), 2) a recitation of the poem (a more adequate record) inferred from the necessarily inadequate orthographic record and, 3) the phonemic system of the language itself. Even more suggestive is Wells' conception of metrics as a derivational or extractional process based upon operational rules. These are not "rules" in the prescriptive sense; they do not level a finger at the poet, warning him to conform or else. Rather, they propose to systematize the procedure by which the metrist makes his decisions so that these are not arbitrary and impressionistic. Wells'

21 Style in Language, p. 364.
22 Style in Language, p. 365.
23 See Wimsatt and Beardsley, PMLA, LXXIV (1959), pp. 587-588.
24 Style in Language, pp. 197-200.



rules concern the determination of ictus by reference to the Trager-Smith analysis of English stress. Primary stress is always ictic, weak stress never ictic. Secondary and tertiary stresses are indeterminate, depending on what makes the poem most uniform. This latter consideration falls under the "maximization principle": "among the possible interpretations of the ambiguous written record... [the interpreter] picks that one (if there is just one) with the most regular meter (in other words, one that maximizes the regularity of the meter)."

Possible metrical implications of the Trager-Smith analysis of stress and related phenomena were first recognized by Harold Whitehall in 1951.25 Since that time there have appeared a number of studies utilizing the Trager-Smith system. A Kenyon Review symposium on metrics in 1956 26 reprinted Whitehall's views with amplifications, along with an analysis of several recitations of "Mowing" by Robert Frost in terms of the Trager-Smith system, a controversy about the metrical interpretation of a line from Donne's "Elegy X", and a defense of traditional metrics by the editor. Trager-Smith oriented metrics assumes as its fundamental task the explanation of how the four-levelled stress system it ascribes to English is adjusted to the simpler, binary matrix of English meter. Meter as performed must represent some greater complexity than traditionalists have assumed. The adjustment has been described by some as an accommodation, a squeezing or expanding or "tension" between the sound as articulated and the abstract metrical pattern.

The stress phonemes of English are four, though it is probable that there were only three in Old English. Metrical stress, however, recognizes only two stresses, a stronger and a weaker. It therefore follows, as we think we have been able to demonstrate, that there is a permissible range of stress variation within which variety is felt as one of the conventional ornaments of English, though variation beyond these limits is a blemish. The two extreme stresses, primary or weak, are poetically fixed, the first being necessarily always a poetic strong, the second always

25 Harold Whitehall, "From Linguistics to Criticism", Kenyon Review, XIII (1951), 710-714.
26 Kenyon Review, XVIII (1956), 411-477.



a poetic weak. The two middle stresses, secondary and tertiary, may be poetic strongs, or poetic weaks. The principle of poetic stress is that a syllable is strong if it is stronger than those which surround it, so that as indicated above, a tertiary stress followed by a weak may count as a poetic strong, while if followed by a secondary or primary stress, it may count as a poetic weak.27

... the traditional 'ideal' metrical patterns of much English verse - patterns based on the two-level contrast of stressed versus unstressed syllables - have been 'orchestrated' since Marlowe by a poetic adaptation of the actual four-level contrast of speech.28

... The present method... tries to account for the phonological complexity of verse by envisaging a tension between two systems: the abstract metrical pattern, as historical product of the English verse tradition, and the ordinary stress-pitch-juncture system of spoken English.29

Others, including Smith himself, have preferred a different approach, not treating meter as a separate system at all, but simply distinguishing between "stronger" and "weaker" among actually uttered stresses.

... the foot... is the profile level. The unit level then must consist of the two kinds of relative stress that a foot may have - stronger and weaker. It is to be emphasized that we are here dealing with the units of prosody; these are based on linguistic units which are much more varied, and our discussion will consist in part of showing how the four linguistic stresses, in sequences of two with some one of the junctures between, pattern as stronger or weaker prosodically.

This quotation appears in the prosody of Epstein and Hawkes,30 which is the fullest working-out of the Trager-Smith metrical hypothesis to be published. It differs from previous treatments in extending the concept of "relative strength" - the relation "weaker-stronger" - to include feet containing syllables with the same phonemic stresses but differing only allophonically; unfortunately, the assertion is made, but not completely demonstrated:

27 H. Whitehall and A. A. Hill, "A Report on the Language-Literature Seminar", in Readings in Applied English Linguistics, ed., Harold Allen (New York, 1958), p. 395.
28 Whitehall, Kenyon Review, XVIII (1956), p. 418.
29 Seymour Chatman, "Robert Frost's 'Mowing': An Inquiry into Prosodic Structure", Kenyon Review, XVIII (1956), 422.
30 Edmund Epstein and Terence Hawkes, Linguistics and English Prosody, in Studies in Linguistics Occasional Papers, 7 (Buffalo, 1959), p. 16.



... when two instances of the same stress phoneme occur on syllables immediately following each other, the occurrence of the second in the sequence will be phonetically more 'prominent' than the first. So in animal the weak stress over the final syllable is phonetically stronger - perceived as 'louder' - than that of the second syllable... ... not only the stress phonemes but the allophones of these phonemes are utilized by English prosody in determining where the ictus falls. In a predominantly iambic line, not only will we get / \ / / , / a / / , /^a/, / \ a / , W , but M , / \ \ / , /AA/, and / / | / / .31

Epstein and Hawkes give an inventory of all the actual phonemic possibilities for the four feet (only four, since "equal feet" - spondees and pyrrhics - are declared impossible, one of the two syllables always being considered allophonically louder than the other). Including junctures, they postulate 6,236 possible kinds of iambs, and 2,376 kinds of trochees; anapests and dactyls are not computed, but "the possibilities soar up into the thousands".32 However, trisyllabic feet are portrayed as a mere sub-class of bisyllabic feet, on the principle of "isometrism".33 Epstein and Hawkes assert the reality of the foot and make the important observation that the foot must be the simplest recurring unit in the line. What they mean by "simplicity" seems to be something like maximal homogeneity and regularity, i.e., the fact that of two scansions that which has the greater number of identical feet is

31 Epstein and Hawkes, op. cit., p. 7.
32 These figures are astronomical because they represent the multiplication of all the suprasegmental variables in the Trager-Smith system. For example, the possibilities for metrical ^ ^ are, for "normal transition" (transcribed as w ) : II' XJ/ +> #»; for plus juncture: ||, w-f ^ etc. It is difficult to believe that any person could hear all these differences, let alone make semantic or value associations with them. Hawkes makes some effort to reduce the number in a recent article: "The Problems of Prosody", A Review of English Literature, III (1962), 32-49.
33 Epstein and Hawkes, p. 38. By isometrism they mean "roughly" what Pike meant by "isochronism" - the tendency for stresses to occur, regardless of the number of intervening syllables - at approximately equal time intervals. (See Pike, Intonation of American English, p. 35.) However, since in the next paragraph they reject the concept of time that Pike utilizes, it is difficult to know precisely what they mean. H.L. Smith, Jr., in "Toward Redefining English Prosody", Studies in Linguistics, XIV (1959), 68, agrees with Epstein and Hawkes' reduction: "all of English 'metered' verse is predominately iambic, with always the possibility of some trochees alternating with the iambs."



to be preferred. This principle might also be called "commonality". Following Trager,34 they organize English metrics into three levels, the "profile" level, the "unit" level and the "system" level. The profile is the "totality of the contents of substance of a field": in meter, "The substance of verse ... must be the foot." The units are the elements which compose the profile: here the two kinds of relative stress, stronger and weaker. Finally, there is the system of relationships of the parts of the profile: the system level of metrics is twofold, consisting of the line and the strophe. The higher levels define the lower: "the strophe defines the line, and the line units."35 To demonstrate, they analyze the second line of "Full fathom five", "Of thy bones are coral made", in which the meter is identified as iambic tetrameter with a monosyllabic first foot, rather than iambic trimeter with anapestic substitution for the first foot or trochaic tetrameter brachycatalectic, on the basis of global, i.e., strophic, considerations. This practice, of course, is not new; it has been utilized by generations of metrists. What is new is formalization into an actual computation, so that recourse is not made to general impressions.

No one as yet has used the Bolinger accent analysis as a basis for metrical theory. Chapter V represents in part an attempt to do so.

34 George L. Trager, "Linguistics", Encyclopaedia Britannica (New York, 1956), and Epstein and Hawkes, p. 13.
35 Epstein and Hawkes, p. 47.
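The rules attributed to Wells earlier in this chapter lend themselves to a small computational sketch. What follows is an illustration under assumptions, not Wells' own formulation: primary stress is fixed as ictic and weak stress as non-ictic, as the rules state, while secondary and tertiary stresses are tried both ways and the reading that agrees most often with an expected pattern is kept. Measuring "regularity" by position-by-position agreement with a template, the letters S and W, the function names, and the sample stress sequence are all hypothetical.

```python
from itertools import product

# Trager-Smith stress levels as named in the text.
FIXED = {"primary": "S", "weak": "W"}    # always ictic / never ictic
AMBIGUOUS = ("secondary", "tertiary")    # left open by the rules


def regularity(reading, template):
    """Number of positions at which a reading agrees with the template (assumed measure)."""
    return sum(r == t for r, t in zip(reading, template))


def most_regular_reading(stresses, template):
    """Return the permitted reading that agrees most often with the template."""
    if len(stresses) != len(template):
        raise ValueError("stresses and template must be the same length")
    options = []
    for level in stresses:
        if level in FIXED:
            options.append(FIXED[level])     # one possibility only
        elif level in AMBIGUOUS:
            options.append("SW")             # either strong or weak
        else:
            raise ValueError(f"unknown stress level: {level}")
    readings = ("".join(choice) for choice in product(*options))
    return max(readings, key=lambda r: regularity(r, template))


# Hypothetical ten-syllable line, scanned against an iambic template.
line = ["weak", "primary", "tertiary", "primary", "weak",
        "secondary", "weak", "tertiary", "weak", "primary"]
print(most_regular_reading(line, "WSWSWSWSWS"))   # WSWSWSWSWS
```

Under this measure a single best reading always exists, since each open syllable simply takes the value the template expects. Wells' qualification "if there is just one" would matter under a looser notion of regularity, or when the template itself is in doubt.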




V. The Components of English Meter

Why do we need metrics at all?1 The answer seems obvious. The phenomenon of verse exists. It is in part a product of the constraining of language by a supervening system we call meter. The science of metrics has arisen to specify and categorize the elements of the system. Metrics thus exemplifies the general human need to categorize, which, psychologists assure us, we possess for five good reasons: to reduce the complexity of our environment, to identify the objects in the world about us, to reduce the necessity of constantly treating things as if they were new occurrences, to aid in problem-solving, and to discover (or invent) orders and relations among events.2 But other, more specifically literary, motivations may be adduced, too. Through metrical analysis, we are able to distinguish the effects of phonetic surface in poems, a matter of some critical consequence. Since meters are complex organizations, they can be handled by poets in idiosyncratic ways, and so are appropriate subjects for stylistic analysis. Furthermore, metrics aids in esthetic evaluation to the extent that meter and meaning are mutually informative or mutually appropriate. Too easy assumptions of "expressive form" need close examination, but if such appropriateness exists or is even merely useful as a metaphor for something

1 "Why do we want to measure verse? For the same reason that we study laws of colour, or laws of musical harmony. In each case we seek to analyze results which have pleased us in the work of poet, painter, or musician. By measuring, by dividing this into its units, we hope to throw light on its architecture. Such knowledge is not necessary to the artist, nor even to his intelligent admirer. It will not make a genius, nor teach us infallibly to detect one; we can but judge of results, not lay down laws for the future." T. S. Omond, A Study of Metre (London, 1903), p. 1.
2 J. Bruner, J. Goodnow, and G. Austin, A Study of Thinking (New York, 1956), pp. 12-13.



less expressible, metrical analysis qualifies as an important preoccupation of literary criticism. In any case, the difference between prose and verse must be accounted for, and meter is obviously central to that distinction. Success and persuasiveness in metrical analysis depend upon the adequacy of the theory behind it. Too often metrists are arbitrary and unwilling to justify or even disclose their tacit assumptions. Many propose and debate scansions on ad hoc grounds, without considering basic premises or even acknowledging the need for their consideration. One thing is clear: people do learn how to scan poems. The ability cannot be intuitive, but must develop by virtue of their native command of their language and certain simple rhythmic assumptions. Further, it seems clear that scansions can only derive from recitations - whether actually vocalized or "silent", that is, the scanner cannot but proceed by actually reading the words and coming to some decision about their metrical status. A metrist's proper task, then, is to try to discover by observation 3 what people do when they perform metrical analysis. If he were to run experiments by tabulating the responses of skilled metrists to a standard passage, he would discover that metrical judgment varies widely. A simple search through the standard handbooks shows the same thing. Why is there variation? Let us recall the two essential features of meter as a rhythmical phenomenon, the metrical event and the metrical prominence. We have equated the former with the syllable but were unable to find any single unequivocal linguistic feature to represent the latter. Three reasons for variation in metrical practice suggest themselves:


3 The need for inductive, empirical metrics was clearly asserted by Elder Olson, General Prosody (Chicago, 1938), p. 98: "... prosody is inductive; the prosodic hypothesis is established by particulars, and is finally discarded, if it ever is discarded, because of its inability to analyze particulars. The theorist who cannot or will not illustrate his hypotheses with examples showing that what he asserts as actual or as possible does indeed exist, as possibility or fact, does not deserve serious hearing. The fabrication of a theory which has only logical excellences can be admired by logicians alone."


1. Metrists do not agree upon the number of syllables in a given word or line;
2. Metrists do not agree upon whether a given syllable is prominent or not;
3. Metrists do not agree upon how the syllables are grouped.
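The three sources of disagreement can be told apart mechanically once scansions are written down in a fixed form. The following is a minimal sketch under stated assumptions: a scansion is modeled as a list of feet, each foot a string over "^" (non-prominent syllable) and "-" (prominent syllable). The encoding and the names `flatten`, `boundaries`, and `disagreements` are hypothetical illustrations, not part of the theory itself.

```python
# Sketch only: a scansion is modeled as a list of feet, and each foot as a
# string over "^" (non-prominent syllable) and "-" (prominent syllable).
# This encoding and all function names are illustrative assumptions.

def flatten(scansion):
    """Join the feet into one string of syllable marks."""
    return "".join(scansion)

def boundaries(scansion):
    """Return the syllable positions at which each foot ends."""
    ends, total = [], 0
    for foot in scansion:
        total += len(foot)
        ends.append(total)
    return ends

def disagreements(a, b):
    """List which of the three kinds of disagreement separate scansions a and b."""
    flat_a, flat_b = flatten(a), flatten(b)
    if len(flat_a) != len(flat_b):
        # With different syllable counts the positions no longer line up,
        # so prominence and grouping are not compared.
        return ["syllable count"]
    found = []
    if flat_a != flat_b:
        found.append("prominence")
    if boundaries(a) != boundaries(b):
        found.append("grouping")
    return found

iambic = ["^-", "^-", "^-", "^-", "^-"]
inverted = ["-^", "^-", "^-", "^-", "^-"]        # same count, different prominence
regrouped = ["^", "-^", "-^", "-^", "-^", "-"]   # same marks, different grouping
extra = ["^-", "^-", "^-", "^-", "^-^"]          # one syllable more

print(disagreements(iambic, inverted))
print(disagreements(iambic, regrouped))
print(disagreements(iambic, extra))
```

In this sketch a difference in syllable count is reported alone, since two scansions of unequal length cannot be compared position by position; that ordering is a design choice of the illustration.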

Disagreements stemming from the second and third reasons are complex and more profound than those stemming from the first, since the number of syllables in English words is usually easy to agree upon. However, the phenomenon of metrical elision does exist as a historical fact and deserves theoretical consideration. I propose that the scansion of a recitation is not the same thing as the meter, but merely one version of it.4 The scansion is not a sheer record, i.e., a full phonetic or phonemic transcription. It is, rather, a conventionalized or formulaic reduction of the phonetic complex of the performance to the simple distinctions implicit in such terms as "ictus" and "non-ictus". My view of the mechanism of this reduction will be described in detail below. The meter of a poem is not some fixed and unequivocal characteristic, but rather a structure or matrix of possibilities which may emerge in different ways as different vocal renditions. Obviously, these will not be of equal merit; but value judgments should not obscure the range of linguistic possibility even before inquiry begins. It is a mistake in method to confuse the metrical abstraction (in the sense of "derivation of common features") with any of its

4 This distinction is implied in L. Abercrombie, Principles of English Prosody (London, 1923), p. 79: scansion is "the exhibition of the natural speech-rhythm of verse in its metrical form. Scansion does not establish the verse-rhythm as metre; that is done, if at all, in the hearing of the verse; and scansion has to show how it is done. Since what is actually heard in verse is the natural sound of the words, but, since, if there is metre, this is heard with reference to a constant schematic pattern of rhythm, the problem of scansion is to show, precisely and unmistakably, the manner of the reference." ("The constant schematic pattern of rhythm" - what Abercrombie elsewhere calls "the rhythm of the Base" - is defined in this book as the meter itself, as a structure of possibilities.) The point was repeated by Abercrombie on p. 116: "... the fact that very many lines may be metrically spoken in several ways does not belong to scansion; scansion merely has to exhibit the speech-rhythm natural to the speaker."



actualizations. Evaluations which assert that one rendition is more valuable than another ("better", "richer", "more meaningful", "more profound") must ultimately be made. But one can hardly proceed to grasp the meter of a poem in the fullest sense (the history of metrics shows) if he is so committed at the outset to his own rendition that other possibilities are not even conceivable, let alone acceptable. Who has not read with surprise the claim that a certain scansion is the meter when it seems so clearly only one way of reading the verse? It has long been clear that descriptive orientations are more appropriate to metrics than prescriptions. It is hard to express the need more clearly than did Lascelles Abercrombie more than thirty-five years ago: The art of Prosody is ... not a matter of rules and prescriptions, but of the empirical use of certain laws which are themselves no more than general statements of the methods that actually have proved capable of being used for expressive purposes. Any 'art of prosody' that professes to explain how prosodic expression is to be obtained, beyond a description of the proved means available to obtain it, is to be profoundly mistrusted.5 We shall extend Abercrombie's phrase "general statements" to mean not only that meter is what poets do, not what metrists say to do, but also that the meter of any poem is best described as the matrix of all meaningful scansions.

5 Ibid., pp. 77-78. Cf. John Lotz, "Notes on Structural Analysis in Metrics", Helicon, IV (1942), 125-126: "Verse, in contrast with prose, might be described as bound by some restrictions aiming at a greater regularity of construction, and a repetition of some of its parts (rhythm). These restrictions may be formulated in rules. These rules will have the nature of norms, as in all branches of normative social science like linguistics, law, ethics, etc. Of course the validity of norms cannot be compared with that of a natural law; norms describe a given (eventually individual) phenomenon and have only the validity of description, not of control. Thus, on a metrical basis, it can never be decided whether a verse be objectively good or bad in a metric sense, only, how far it conforms to a given norm accepted by the community. These norms need not at all be conscious, the knowledge consciously formulated of them may even be erroneous (as in the case of popular etymology, or, the naive attitude some poets assume towards verse)."
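The idea of the meter as a "matrix of all meaningful scansions" can be made concrete with a small sketch. It assumes, purely for illustration, that each scansion of a line is a string of "^" (non-ictus) and "-" (ictus) marks, that all scansions agree on syllable count, and that a position on which they differ is recorded as "?". The names `meter_matrix` and `admits` and the sample readings are hypothetical.

```python
# Sketch only: the meter of a line as a position-wise summary of several
# scansions. The "^" / "-" / "?" encoding is an illustrative assumption.

def meter_matrix(scansions):
    """Keep a mark where every scansion agrees; write '?' where they vary."""
    lengths = {len(s) for s in scansions}
    if len(lengths) != 1:
        raise ValueError("scansions must share one syllable count")
    return "".join(
        marks[0] if len(set(marks)) == 1 else "?"
        for marks in zip(*scansions)
    )

def admits(matrix, scansion):
    """True if the scansion is one of the possibilities the matrix leaves open."""
    return len(matrix) == len(scansion) and all(
        m == "?" or m == s for m, s in zip(matrix, scansion)
    )

# Three hypothetical scansions of one ten-syllable line.
readings = [
    "^-^-^-^-^-",   # regular reading
    "-^^-^-^-^-",   # first foot reversed
    "^-^--^^-^-",   # third foot reversed
]

matrix = meter_matrix(readings)
print(matrix)
print(all(admits(matrix, r) for r in readings))
```

A position-wise summary of this kind is deliberately coarse: it admits every scansion it was built from, but it also admits combinations that no reciter produced, so it should be read as an upper bound on the matrix rather than as the matrix itself.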



Two terms need to be distinguished at the outset. By "scanning" I shall mean identifying aspects of recitations as rudimentary rhythmic patterns (according to some convention to be described) in order to show how secondary rhythm is being verbally manifested, by perceived word-stress, phrase-accent, or in other ways.6 By "metrical analysis", on the other hand, I mean the process of summing the scansions of all intelligible recitations. Thus, the meter of a poem is at once more and less than any feasible scansion; more because it presupposes not only that scansion but others as well, and less by virtue of its very inclusiveness, for at points it must be indeterminate to include the possibility of variation. The meter in this sense is a consensus, not a normative formulation.7 Implicit in this view is an appreciation of the complexity of meter. It is not like rhythm, which can be defined in the simplest perceptual terms since the experience is easily discoverable, in experiment and real life, in sets of temporal recurrences of only one or two standard events. It is wiser to think of meter as a concept, rather than a percept, even though it is based on the percept of rhythm; it is the mind, not the senses, which performs the task of reducing disparate linguistic phenomena to simple distinctions, learning to measure and to equate things which are very different indeed in their absolute physical nature. Rhythms can be perceived by any human

6 The schoolboy's accentual distortion [scansion of "The boy stood on the burning deck / Whence all but he had fled"; stress marks not recoverable] is a reduction in recitation to conform to scansion - accommodating the voice in an elemental fashion to the rhythmic simplicities of finger-tapping. A description of this sort of scansional parsing appeared as early as 1775 in an anonymous elocutionary treatise called The Art of Delivering Written Language (see T. S. Omond, English Metrists... (London, 1907), p. 65).

7 The problem of what constitutes the meter - whether there is one meter necessarily common to all acceptable performances of a poem - was discussed by W. K. Wimsatt and M. Beardsley in "The Concept of Meter: An Exercise in Abstraction", PMLA, LXXIV (1959), pp. 590-591, and by Rulon Wells, Style in Language, p. 198. Wells seems to agree that all the different possible interpretations of a line need not have the same meter, although they may.



being with ears, but one must know English before he can grasp its metrical pattern.

SYLLABLE COUNT

The numerical aspect of meter - the count of syllables per line - does not vary extensively from recitation to recitation, and metrical indeterminacy is ordinarily the result of other factors. Furthermore, when questions arise, the metrical context usually shows us how many syllables to attribute to a word. For example, it is not difficult to see that "oblivious" is metrically trisyllabic in "And with some sweet oblivious antidote" (Shakespeare, Macbeth) but "oblivion" is quadrisyllabic in "Wherein he puts alms for oblivion" (Shakespeare, Troilus and Cressida). Saying, in the first example, that "oblivious" is metrically trisyllabic, does not necessarily prescribe elisions to reciters (say, the actual pronunciation /iblivyis/ instead of /ibliviyis/). By distinguishing between scansion and metrical analysis, we may conceive of either performance, with or without scansional elision, without denying the metrical existence of elision at this point. Devices have been used by poets and editors to make the numerical intention clear: spelling modifications (preterite "-ed" becoming "-t"), apostrophes ("-ed" becoming "-'d" or "-'t"), grave accent marks ("-èd"), etc. In the view of a metrist like Robert Bridges, elisions can often be assumed even where diacritics do not appear in the text. Phonology can help explain the mechanism by which elisions - adjustments of syllable-count by omission8 - and their opposites, insertions, may be manifested in speech. Consider the example cited by Bridges:

8 For a discussion of the propriety of the term to cover all cases of omission, see Robert Bridges, Milton's Prosody (Oxford, 1921), pp. 9-18.



In English verse when there is poetic elision of the terminal vowel of one word before the initial vowel of the next word, the sound of it is not lost, the two vowels are glided together, and the conditions may be called synaloepha. For instance, the first example of terminal synaloepha in P.L. is Above th'Aonian Mount, while it pursues,

i. 15

where the final vowel of the is glided into the A of Aonian, it is still heard in the glide, though prosodically asyllabic.9 Bridges is implying that there is necessarily a disparity between the meter of this line and any recitation of it: that this elision is a paper matter only, prosodic bookkeeping so to speak, since the elided vowel is still heard, although it doesn't quite "count". (In another place, he speaks of "the fiction of elision".) But this accounts for only one recitational style. It is quite possible to say /ðe:onyin/, in which the whole vowel of "the" is deleted, even though modern readers would probably prefer /ði:+e:onyin/. It is also possible to convert the vowel to the consonant /y/: /ðye:onyin/, a pronunciation which Bridges might have liked. The scansional status of these pronunciations could be described as follows:

                      Meter      Recitation
1) /ðe:onyin/         elision    elision
2) /ði:+e:onyin/      elision    non-elision (recitational exception)
3) /ðye:onyin/        elision    "pseudo-elision" (see below).10

The point is that regardless of current recitational practice, many of the elisions indicated in older texts can actually be articulated.

9 Bridges, p. 9.
10 In two sample pronunciations which I measured spectrographically, /ðe:onyin/ was 18 centiseconds long and /ði:+e:onyin/ was 27 cs. The latter had a double vowel movement in the second and third formant, corresponding to /ði:+e:-/.



The metrist should understand their phonological nature, even if he prefers to ignore them in his own recitations. And who is to say whether we may not turn again to the widespread articulation of elisions in some future era of formalistic recitation? The consideration of syllable-adjustment - elision and insertion - requires a review of some purely linguistic notions. In Chapter III, distinctions were made between phonetic types - "contoids" (contact sounds) and "vocoids" (unobstructed sounds) - and their phonemic counterparts, "consonants" and "vowels" (see p. 33). The term "syllabic" was also introduced as a name for the centermost part, the "crest" or "peak" of syllables, and it was pointed out that certain contoids, like [n], [m], [l], and [r] may function as syllables.11 Their peculiar phonetic character allows these contoids to function syllabically. Contoids occur as two basic types: stops and continuants. Stops are single brief articulatory acts: our lips can explode only one [p] or one [b] at a time. It is obvious, therefore, that a stop cannot be sustained long enough to be a syllabic. Continuants, on the other hand, like vowels, can be sustained for as long as our breath holds out: we can say [mmmmmm] or [ffffff] or [zzzzzz] indefinitely, just as we can say [aaaaa] or [iiiii]. Phoneticians subdivide continuants into fricatives and resonants. Fricatives are formed by forcing air through small apertures in the mouth - between the lips and the teeth ([f] and [v]), between the teeth and the tongue ([θ] and [ð]), and between the tongue and the roof of the mouth ([s] and [z]). The resultant friction makes these sounds "noisy" (in the physicist's sense), that is, anharmonic, or composed of random sound waves. Fricatives can function as syllabic crests but do so very infrequently in English, precisely because they are so unlike harmonically regular vocoids. They usually occur in marginal words like "pst", "tsk", "fft", etc.
But the resonants are much more vocoid-like and hence more easily syllabicized. We form the resonants by causing large areas of the

11 Leonard Bloomfield, Language (New York, 1933), p. 121. For a discussion of these phones, see James W. Abel, "Syllabic /n, l/", Quarterly Journal of Speech, XLVIII (1962), 151-156.



vocal canal above the glottis to resonate or vibrate sympathetically with vocal cord vibration, just as they do in the utterance of vowels. Resonants differ from vowels only in that the mouth is partially or totally closed. Unlike fricatives, resonants are not made noisy by being squeezed through small openings, but emerge with comparatively little obstruction ([m] and [n] through the nose, and [l] down the sides of the tongue). [r] in many English dialects is even more vocoid-like since it does not involve closure at all: the tongue is held in approximately the same position as for [ə] but its tip is rolled back ("retroflexed"). Some phoneticians even represent the sound as a vocoid [ɚ], where the "tail" stands for retroflexion. Resonants frequently occur as crests in unstressed syllables: "help 'em" [helpm], "button" [botn], "bottle" [batl], "water" [wotr] (or [watr]). Although, as we have seen, the contoids [m], [n], [l], and [r] do occur as syllabics in English, it is preferable from the phonemic point of view to analyze them as combinations of non-syllabic consonants with a vowel, thus /im/, /in/, /il/, and /ir/. The reduced vowel phoneme /i/ is inserted because the syllabic phones [m], [n], [l], and [r] are in complementary distribution with syllables formed by the vowel /i/ before the nonsyllabic phones [m], [n], [l], and [r]. Although there are phonetic differences between these sets of syllables, the differences do not correlate with differences of meaning, and so are linguistically non-significant. It is more economical to treat the syllabic variants [m], [n], [l], and [r] as allophones of /im/, /in/, /il/ and /ir/ than to set them up as separate phonemes, even though the phonemicizations /batin/, /batil/, etc., may seem misleading as representations of the pronunciations [batn], [batl], etc.12 In the metrical adjustment of syllables we can recognize three basic categories of elision: losses, transformations, and pseudo-elisions.
These can additionally be distinguished according to their position: within words (syncope) or between words (apocope).

12 See W. Nelson Francis, The Structure of American English (New York, 1958), pp. 149-150.



Either consonants or vowels may be elided. 1) In simple consonant loss, consonants may be elided such that the syllables on either side are fused, often by the loss or change to a glide of the second vowel (for example, "by his" /bai+hiz/ becomes "by's" /baiz/, "power" /pauir/ becomes "pow'r" /pa:r/, "being" /bi:iŋ/ becomes "be'ng" /bi:ŋ/). 2) In simple vowel loss, vowels are elided without alteration of contiguous consonants ("medicine" /medisin/ becomes "med'cine" /medsin/). 3) In complex vowel loss, vowels are elided in such a way that accompanying consonants are shifted into other syllables; for example, a preceding consonant may become initial in a following syllable ("the army" /ði:+armi:/ becomes "th'army" /ðarmi:/ and "to write" /ti+rait/ becomes "t'write" /trait/), or the shifted consonant may have been originally syllable-final ("it is" /it+iz/ becomes "'tis" /tiz/). The latter is particularly common in apocope where resonant consonants are involved ("amorous" /æmiris/ becomes "am'rous" /æmris/, "groveling" /graviliŋ/ becomes "grov'ling" /gravliŋ/, "countenance" /kauntinins/ becomes "count'nance" /kauntnins/). One of the chief elisional transformations is consonantization: the desyllabization of the /i:/ and /u:/ into /y/ and /w/ respectively (for example, "many a" /meni:i/ becomes "many'a" /menyi/ and "shadowy" /šædu:i:/ becomes "shad'wy" /šædwi:/). Finally, the term "pseudo-elision" may be used to refer to the assumption of elision between two consonants that cannot be clustered, i.e., that cannot really occur in English without an intervening syllabic (for example, the scansion of words like "prism" /prizim/ or "heaven" /hevin/ as if they could actually be pronounced as monosyllables - */prizm/ and */hevn/); or the postulation of consonant clusters that go against native clustering habits (for example, writing "th'sea" or "th'loss" or "th'grave" even though /ðs-/, /ðl-/, and /ðg-/ are not possible initial clusters in English).
The opposite of elision is insertion; for example, giving full syllabic value to a vowel-letter not currently pronounced; thus "marked" /markt/ becomes "markèd" /markid/, etc. Another common sort of insertion is the syllabification of resonant consonants. For example, "evenings" /i:vniŋz/ becomes /i:viniŋz/, "assembly" /isembli:/ becomes /isembili:/, and "entrance" /entrins/ becomes /entirins/.13 The logical status of elisions and insertions depends upon the extent to which the syllable adjustment has become a part of the language.14 If the adjusted word has become totally accepted in the language as a word, then its occurrence is metrically irrelevant. An example is 'tis; presumably no one, even in our metrically naturalistic age, would read Where ignorance is bliss It is folly to be wise. On the other hand, if the syllable adjustment is not totally incorporated into ordinary usage, its actualization will vary with prevailing recitational habit. In the eighteenth century, indications for syllable adjustments were apparently rigidly adhered to in performance;15 in later times, the common view was that the sounds are not really lost but "heard in the glide, though prosodically asyllabic". Furthermore, the degree of acceptability of an elision has varied from word to word and from sound to sound. Many elisions occur as variants in everyday speech, and presumably these would be acceptable even to naturalistic reciters. For example, few modern readers would question the propriety of either the bisyllabic or trisyllabic pronunciations of champion. On the other hand, most of us would object to the monosyllabic pronunciation /dait/ for diet. In the line

13 See Bridges, p. 28.
14 I am grateful to Professor Monroe Beardsley for his views on this matter.
15 See Paul Fussell, Theory of Prosody in 18th Century England (Connecticut College Monographs #5, 1955), p. 71: "Unpleasant as it may be to recognize the fact, contemporary evidence leads to the conclusion that almost all readers in the early 18th century both scanned and pronounced 'am'rous', 'om'nous', etc." And George R. Stewart, Jr., Modern Metrical Technique As Illustrated By Ballad Meter (1700-1920) (New York, 1922), p. 30, after reading 18th century elocutionists like Mason, Rice, Sheridan and Beattie, concluded: "There can be little doubt, however, that general practice in the eighteenth century read such lines as commonly printed. One of the best proofs of this is the fact that about the middle of the century a series of rhetoricians began to protest against such a conventionalized reading of verse. If the practice had not been general, such a protest would not have been necessary."



And rapture so oft beheld? those heav'nly shapes (Milton, PL) we might not like a vowel elision like "rapture s'oft", because it is too easily confused with "rapture soft". Even where lexical confusion is not at issue, many indicated elisions seem too extreme for modern taste, for example, where punctuation intervenes or an initial /h/ must be dropped: To a fell Adversary, his hate or shame (Milton, PL). Or where a whole syllable is deleted: /yu:/ from the contracted form "pop' lar", for "popular", or from "artic' late", "cred' lous", etc. Thus, whether a syllable is elided or preserved in pronunciation is a matter of the reciter's taste and the taste of his age, and all arguments pro or con are essentially scansional, not metrical. 16 Elisional marks like the apostrophe resemble musical ornaments in a score - grace notes, turns, and mordents - which may or may not be performed, depending upon our feeling of their appropriateness. Most decisions about elision are up to the performer, but a few elisions may not be possible to articulate at all, existing merely as orthographic conventions, concessions to metrical norms with no relation to the linguistic facts. This is the category of "pseudoelision". We cannot admit the recitational reality of forms which are not in fact pronounceable within normal definitions of the English phonological syllable; so that apostrophes appearing in expressions like "th'loss" and "th'sea" must be considered purely conventional, artificial (i.e., nonphonemic) metrical ligatures introduced to maintain numerical decorum. Here we may speak genuinely of "metrical fictions". Thus, syllable-count qualifies as a genuine metrical abstraction. The poet may select his words in part by considering the numbers of syllables they contain, and ordinarily he can be sure that this feature will be conveyed to his reader. 
The selective feature of number becomes a part of the poem's mode of existence and is not ordinarily subject to interpretational variation. The feature "decasyllabicity" can be said to exist in a poem and can be abstracted from it by the metrist, since it is not merely a property of the printed line, nor of any single reading, but of all conceivable readings.

16 See, for example, arguments discussed by Egerton Smith, The Principles of English Metre (London, 1923), pp. 37-38.

Syllable-count or metrical numeration has meaning only by reference to constancies of line-length. Lines are metrically formal units (that is, in the central isosyllabic verse tradition) printed separately to help us understand that they contain standard numbers of syllables; and where exceptions occur, these are of a predictable sort, usually one syllable more or less, although occasionally there are greater deviations. In this respect, syllable-count adds a third dimension to metrical structure. In addition to the components of event (syllable) and prominence (yet to be discussed) by which foot-grouping exists, meter includes the factor of numerical regularity. Syllable-count as a structural feature is a form of "secondary grouping" (to distinguish it from the primary grouping of the foot itself). Meter, then, is an instance of complex secondary rhythm, since it contains not only grouped events, but also groups of grouped events (lines) and even groups of groups of grouped events (stanzas). And it is in metrical numbers that poets can count on the least discrepancy between the meter of the poem and any reader's scansion of it. The theory presently offered describes the meters of poems in the "isosyllabic" or syllable-counting tradition of English verse - the tradition of Chaucer, Spenser, Shakespeare, Milton, Pope, Wordsworth, Keats and Tennyson. There exists another tradition - the "isoaccentual" or "isochronic" tradition of Old English poetry, some Middle English poetry and a variety of modern revivals. In the isochronic tradition, the number of syllables in the line is irrelevant; all that counts is the number of ictuses, their relative distribution, and possibly other unifying features, like alliteration.
Some special problems occur in describing isochronic meter, but it is the simpler and less important metrical type in later English poetry, and discussion may be saved for another occasion. Since syllable-count is the most relatively constant feature of meter, on the principle of simplicity we may give it priority of application in the analysis of isosyllabic verse. That is, the computation of line-length should be the first step in scansion and metrical analysis. After counting, the metrist can decide on the basis of simplicity of patterning which sum constitutes the normal line (or lines). Then, having determined the normal line, he may proceed to determine foot divisions.

THE FOOT

In Chapter III temporal equalities among feet were characterized as putative and conventional. These adjectives need elaboration. For generations metrists saw no reason to doubt that the assumption of equality of measure in verse corresponded to reality, that periods that seemed equal were literally and exactly equal. Coventry Patmore, for example, wrote: These are two indispensable conditions of metre, - first, that the sequence of vocal utterance, represented by written verse, shall be divided into equal or proportionate spaces; secondly, that the fact of that division shall be made manifest by an 'ictus' or 'beat', actual or mental, which, like a post in a chain railing, shall mark the end of one space, and the commencement of another.17 For Patmore, temporal equality was an accepted fact, "recognized with more or less distinctness by all critics who have written on the subject to any purpose".18 But the evidence from machines has long since demonstrated clearly that feet are not physically equal or even mathematically proportionate. Later metrists like Sonnenschein grasped this fact and explained the foot as a purely psychological concept: ... the durations found in these [kymograph] records do not stand to one another in exact ratios. This, however, can only be understood when the phenomena of verse are regarded from a psychological or aesthetic, as distinct from a purely physical, point of view. What we are concerned with in all manifestations of rhythm is not so much a physical fact as a psychological fact - i.e., the impression made by the physical fact upon the mind of man through the organs of sense. And here two important

17 Coventry Patmore, "English Metrical Criticism", North British XXVII (1857), 136.
18 Patmore, p. 134.




points have to be considered, which may be briefly summed up as follows: (1) The mind of man acting through the organs of sense is not an exact chronometer, and therefore fails to distinguish ratios which differ from one another in only a relatively small degree; (2) ratios which are recognized by the mind as distinctly different may, nevertheless, be unified or identified as representative of or intended for one and the same ratio.

Just as the eye may accept a line that is actually far from straight as representing a straight line, or a figure that is far from circular as standing for a circle, so the ear may accept as equal two ratios which actually differ perceptibly from one another. This 'acceptance' involves a certain rapport between the mind of the observer and the mind of the creator of the rhythm. The observer, knowing the intention of the poet to produce a rhythmical scheme of verse, recognizes that certain syllables or groups of syllables are meant to stand in a certain ratio to other syllables or groups of syllables, and accepts their actual durations as representing these intended durations.19 In short, we conceptualize metrical time in terms of rough equalities; we treat syllables of varying durations as if they were identical in duration. That we should do so is not remarkable from the psychological point of view; it is simply one more instance of our propensity to categorize experience, to consider things in terms of structural groups rather than fragmentarily, as legions of disparates. The motive is a simple principle of human economy: Were we to utilize fully our capacity for registering the differences in things and to respond to each event encountered as unique, we would soon be overwhelmed by the complexity of our environment.20 Psychologists have a nice distinction between equivalence categorization and identity categorization: We speak of an equivalence class when an individual responds to a set of discriminably different things as the same kind of thing or as amounting to the same thing.21

19 E. A. Sonnenschein, What is Rhythm? (Oxford, 1925), pp. 35-36.
20 J. Bruner, et al, p. 1.
21 Bruner, p. 4.



The perception of metrical equality is clearly an equivalence rather than an identity categorization. Scanning is learned behavior, and precisely what is learned is how to take complexly different entities - syllables of varying phonetic shapes and weights - as instances of the same thing. The scanner who feels that he controls the process in a completely intuitive manner is simply unconscious of how he really learned it. The difficulty in metrics is not a deficiency of scanning ability among modern readers but rather an inability to come to agreement about which metrical categorizations make most sense; behind that lies the question of how one decides such things. It has been tacitly assumed in this book that scansion and metrical analysis may be reduced to something like a scientific method. If this is presumptuous, at least unsuccessful attempts may suggest some external criteria by which success may be measured. Let us assume that scansion and that metrical analysis to be best which most simply, economically and consistently account for all the facts, that describe with least complexity a mechanism by which language may create secondary rhythms. Let us start by considering the metrical foot. Not the least important justification for metrical feet is that they permit a simpler and more elegant descriptive statement of metrical and scansional facts. It is simpler to assume that the series ^ - ^ - v - o _ w _ consists of five recurrences of one event, ^ - , than that it constitutes some single homogeneous event. For then we would have to concede that the patterns w — w — —• •u KJ — «w> — j

and also constitute separate single events. It is simpler and more elegant to interpret the above three sets of events as variations of the first set with the substitution of - ^ for ^ - in a different position in each case. Another reason for recognizing the metrical foot is psychological. From the outset, the present theory has sought its foundation in the general human perception of rhythm. It is an established psycho-



logical fact that people tend to "group" rhythmic phenomena, even where there is little or no justification in the actual temporal reality. Since these groups are so clearly a part of general rhythmic activity, they must also exist in each species of rhythmic activity, for example, meter. Metrical feet are conventions for analyzing the grouping behavior of syllables which in their metrical aspect are not treated as sound complexes but as mere rhythmic counters of one of two values. Feet have nothing else to do with language: they are nongrammatical and non-lexical, and so do not bear any relation to word-integrity, phonological juncture, or any other real linguistic feature. Foot boundaries may split words, and two words separated by even the strongest juncture (say the one represented by a period) may occur within the same foot. Feet, in short, are purely "notional". 22 Determining the composition of the normal foot logically comes after counting the syllables of the normal line. Its composition is assumed to be the smallest submultiple of the normal line. For example, in most sonnets, the normal line, by inspection, has ten syllables. The submultiples of ten are five and two. But a twosyllabled foot is intrinsically simpler to assume than a five-syllabled foot. It allows only four possibilities, namely | - - |, or | ^ ^ |, whereas the five-syllabled foot allows many more possibilities, thirty-two to be exact:


22 Abercrombie, pp. 103-104: "... the foot-division of words, of actual sound, is notional. The boundaries of the feet may fall in the middle of words, where not even the most infinitesimal pause can be supposed; and, on the contrary, strong sense-pauses may occur in the middle of feet, so that the syllables composing the feet, so far from attracting, repel one another. The foot-division of speech-rhythm takes no notice of these things. It conveys no suggestion that the syllables within a foot necessarily cohere more than they adhere to syllables outside of it; it conveys no suggestion that poetry is heard, spoken, or composed in feet." "Feet, that is to say, belong wholly to scansion; they are a formality used in the investigation of metre, and have nothing to do with its composition or with its real nature."



^ —

j — — l^/

| W

I I—

W — W

V-/ |

| >_/ V - /

— — J

— — — I J—


| ^



— j

j —

—I I—

W — —


W |


| — • ^

^ —



—I I—
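The arithmetic behind these counts can be checked mechanically. The following is only a minimal sketch, assuming that a foot is nothing more than a string of positions each filled by ictus (-) or non-ictus (˘); the function names foot_patterns and candidate_foot_lengths are hypothetical and introduced here purely for illustration.

```python
from itertools import product

# Assumed notation: "-" marks an ictic position, "˘" a non-ictic one.
ICTUS, NON_ICTUS = "-", "˘"


def foot_patterns(n):
    """Return every arrangement of ictus / non-ictus over n positions."""
    return [" ".join(p) for p in product((NON_ICTUS, ICTUS), repeat=n)]


def candidate_foot_lengths(line_length):
    """Submultiples of the line length (excluding 1 and the line itself)."""
    return [d for d in range(2, line_length) if line_length % d == 0]


# For a ten-syllable line, list each candidate foot length with the
# number of distinct patterns that a foot of that length would permit.
for n in candidate_foot_lengths(10):
    print(n, len(foot_patterns(n)))
```

Under these assumptions the ten-syllable line yields two candidate foot lengths, two and five, and the number of patterns doubles with every added position, which is why the shorter foot is the more economical assumption.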

Obviously such a formulation is too complex to be acceptable. Most metrists have found it possible to restrict the constituency of the metrical foot in English to two and three syllables. If we allow residues, all lines can be rather simply explained in terms of feet of one of these two lengths. Consider as a concrete example of the problem of determining foot-composition the relative merits of these analyses:

[...] vs. (or) [...]

(where A = "metrical pause" equivalent to a syllable). One metrist23 has written that the first division is "meaningless for metrical analysis", that there can be only one kind of metrical variation, namely variation in the number of syllables, and that the metrical inversion (foot-reversal) does not exist. But this denial entails a sacrifice in economy, for by abandoning inversion the number of exceptional feet is increased from three to four: the perfectly normal second foot ˘ - must be re-interpreted as trisyllabic. To wield Occam's razor metrically is to assume that the best analysis is the one which most effectively, efficiently and reasonably accounts for the largest variety of verses with the fewest units, rules, and exceptions.

Usually what happens in a poem is that one foot-type comes to dominate over the others because of its recurrence in linguistically unequivocal settings. Lanier described the process clearly:

When a poet puts forth his verse in print, he indicates the manner of grouping the verse-sounds for secondary rhythm by arranging words whose accent is known in such a manner that the ordinary pronunciation-accent [i.e. lexical stress] falls where the rhythmic accent [ictus] is intended to fall. For example if the poet wishes the rhythmic accent to fall upon the first sound, and upon every third sound after, so as to group the whole series into threes... he may indicate such a grouping by beginning with a couple of three-syllabled words whose pronunciation-accent falls on the first syllable, thus initiating the type of the rhythm which the reader is intended to carry on through the poem; as, for example

    Wistfully | wandering | over the | waters, she
    Sought for the | land of the | blessed.24

23 Egerton Smith, p. 13, and see p. 56f. where the concept of reversal is rejected.





Of course, the syllable -ing in wandering may just as easily bear ictus, if it occurs in a line whose constraint is duple:

    Art thou pale for weariness
    Of climbing heaven, and gazing on the earth,
    Wandering companionless
    Among the stars that have a different birth
        (Shelley, "To the Moon").

ICTUS

Analyzing ictus25 is perhaps the most critical problem in English metrical theory. Chapter III has demonstrated that there is no simple linguistic entity - say "stress" - which can be equated with it. The act of assigning ictus actually involves a number of variables which tradition has not always been able to keep clearly apart. Even the best metrists have often been too simplistic:

... it is evident that any notation which professes to give the complete range of accentuation or anything near it will be not only unnecessary but misleading. Unnecessary, because the nature of metre requires no such thing as the reproduction in speech-rhythm of the uniform accentuation supposed in the base; misleading, because it would tend to make the nature of metre rest in speech-sound as actually heard, instead of in

24 Sidney Lanier, The Science of English Verse (New York, 1880), p. 110.
25 The old-fashioned word "ictus" seems best for the purely metrical phenomenon and avoids confusion with linguistic terms like "stress" and "accent". Martin Halpern, "On the Two Chief Metrical Modes in English", PMLA, LXXVII (1962), 180, has coined the terms "major" for ictus and "minor" for non-ictus.



speech-sound heard with reference to an ideal constancy; and for the purpose of that reference the mere fact of accent being perceptible is sufficient, without regard to the strength of the accent.26

And more recently, as a reaction to linguistic metrics:

We wish in the main to avoid the cumbersome grammar of the new linguists. For all we know there may be, not four, but five degrees of English stress, or eight. How can one be sure? What one can nearly always be sure of is that a given syllable in a sequence is more or less stressed than the preceding or the following.27

Asserting that a notational system should be simple and binary28 to reflect the simple binary dimensions of rhythm, however correct from the metrical point of view, does not disprove the usefulness of considering the means by which the language fleshes the rhythmical base out. At issue is the difference between applied and theoretical metrics. Applied metrics is content to scan; it does not trouble itself with the nature of the phonetic perceptions underlying its judgments. Theoretical metrics wants to know what it is to scan, that is, how and which elements in and out of the language are interpretable as metrical features.

The expression "in and out of the language" needs some explanation. Let us recall the distinction made in Chapter III between stress and accent: stress is a phonemic constituent of full-vowelled monosyllabic words, and of one syllable in each polysyllabic word, which enables us to distinguish the word from other words. It is learned as part of the word, is marked in dictionaries, and specifies the syllable which may be accented in actual utterance. Accent, on the other hand, is not a part of words as words but rather a function

26 Abercrombie, p. 91.
27 Wimsatt and Beardsley, p. 593.
28 The term "binary" is an oversimplification, since there exists a kind of verse with two degrees of ictus, often called "dipodic". See George Stewart, pp. 95-113. An example is:






    Rocks, caves, | lakes, fens, | bogs, dens, | and shades | of death


in which the first three feet are clearly spondees. Spondaic variation, of course, need not occur in a foot which is divided by a caesura; and in any case, it is not the caesura which causes the spondee, but rather the weight of the syllables themselves.

How may we preserve these possibilities in our metrical analysis without sacrificing the criterion of simplicity? We could classify spondee not as a separate foot but rather as a subclass of iambus or trochee, whichever the prevailing mode,36 graphically, | - - | = | ˘ - |

36 This is the opinion of Wimsatt and Beardsley, p. 594, who claim that "it is impossible to pronounce any two successive syllables in English without some rise or fall of stress - and some rise or fall of stress is all that is needed for a



or | - ˘ |; or we might not use the combination | - - | at all, but simply include in our definition of ˘ all levels of unprominence up to and including equal prominence. The analysis

    Rocks, caves, | lakes, fens, | bogs, dens, | and shades | of death
    ˘ - | ˘ - | ˘ - | ˘ - | ˘ -

then would be said to include both scansions discussed above. But such a system seems not only uninformative but actually misleading. Consider again the line

    Above his e | quals. Fare | well hap | py fields.

If we were to mark the second foot | -quals Fare- | and the third | -well hap- | both as | ˘ - |, we would be suggesting that the metrical relations of the two syllables in farewell have precisely reversed their normal linguistic stress disposition.37

The metrical analysis of syllables which may be either ictic or non-ictic is best shown by double marks:




    Rocks, caves, lakes, fens, bogs, dens, and shades of death.

Spondee reflects a degree of emphasis, but not the overriding emphasis of actual foot-reversal. In many cases, there is little more than an added deliberation, a weighting of the word somewhat beyond its normal implication. Whether the additional freight is added depends upon the semantic context. We are more inclined to make spondaic

    To make | bad good, | and good provoke to harm

than we are to say

    And ever a | gainst eat | ing cares.

The latter makes much more of the preposition than the context will bear.

metrical ictus." The term "some" here will not quite do; there must not only be a difference in syllable weight but that difference must be perceptible to the common scanner. And experiments can easily be set up to show that in many cases judges cannot reach agreement upon the place of ictus.
37 I must admit that this prospect does not disturb all metrists, even conservative ones like Sir George Young, p. 38: "The relation of contrast which exists between the [ictic] and [non-ictic] places of the foot does not extend beyond the wave of the [ictic syllable's] influence to the places of another foot. As between two feet, there is nothing to prevent [an ictic] syllable ending the first from being lighter in utterance than [a non-ictic] syllable beginning the next."

What of the pyrrhic, the assumption of two non-ictic syllables within the same foot? It is clear that even the weakest of syllables, ə, may be ictic; we have had such instances as

    When he finish | es re | fection

and

    And prophesying with accents terr | ible |.

Even though one might object to actualizations as extreme as finishes refection and terrible with -es and -ble at full prominence, he would probably allow some slight increase in weight on -es and -ble under the influence of metrical set. The tendency is not evident, however, when the weak syllable precedes (or follows, in the case of trochee) a heavy syllable which is ictic. This is the so-called "ionic" effect:

    Upon | the su | preme theme | of art and song
    (the su | preme theme: ˘ ˘ | - -)

    To a green thought in a green shade
    (˘ ˘ - - ˘ ˘ - -)


Although stress and accent relations do not ordinarily operate across foot boundaries, in such cases it is simpler to assume that they do, or we introduce artificial and rather awkward conflicts between metrical and linguistic systems.



Sometimes the sense clearly prescribes the ionic effect. A good example occurs in Thomson's "Britannia":

    ... while hot Disease,
    And Sloth distemp | er'd swept | off burn | ing crowds.

If one does not read

    swept off burning crowds (˘ - - ˘ -)

or the like, and gives ictus to swept instead of off,

    swept off burning crowds (- ˘ - ˘ -)

a clearly false sense is conveyed: disease and sloth are not destroying the crowds, they are pouring down from them.

Level feet must be distinguished from reversed and ambiguous feet. Reversed feet displace the ictus within the foot; they are either trochees amid iambs or iambs amid trochees. Two sorts of pressures counter metrical set to effect reversal, stress pressure and accent pressure. Stress pressure results simply when a stressed syllable falls into a position set for non-ictus, the adjacent ictic position being occupied by an unstressed syllable. When this happens, it is clear that ictus itself has shifted within the foot, since by definition it is nothing other than syllabic prominence. Some examples in iambic lines are:

    ...whose top
    | Brightness | had made invisible, thus spake (Milton, PL)
    My uncombed locks, | matted | like mistletoe, (Dryden, All for Love)
    ...like a man
    | Flying | from something that he dreads... (Wordsworth, "Lines...")
    From the keen ice | shielding | our linked sleep (Shelley, Prometheus Unbound)



    Much have I seen and known, - | cities | of men (Tennyson, "Ulysses").

Stress-marked iambs amid trochees are much rarer: few can be found, even in the trochaic poems of an uninhibited metrist like Browning:

    Fear I naturally look for - | unless | of all men alive (Browning, "Clive").

The second sort of pressure that may operate to reverse metrical set is accentual. Foot-reversals may derive from accents produced by the same conditions that help fix ictus in normal feet: 1) the accentability of major as opposed to minor words; 2) contextual demands for special emphasis; 3) the occurrences of syllables as centers of intonational phrases; and 4) the need to avoid syntactic or lexical confusion. Examples of these conditions in iambic verse are:

1)  A mind | not to | be changed by Place or Time (Milton, PL)
    Illumine, what is low | raise and | support (Milton, PL)
    The grey | trunks, and, | as gamesome infants' eyes (Shelley, "Alastor")
    | Strove to | buffet to land in vain. A tree (Tennyson, The Princess)

2)  If lecherous goats, if serpents envious
    Cannot be damned, Alas! why should | I be | (Donne, "If Poisonous Minerals")
    Exposed a matron, to avoid | worse rape | (Milton, PL)
    More glorious and more dread than from | no fall | (Milton, PL)

3)  Nay, answer me: | stand, and | unfold yourself (Shakespeare, Hamlet)
    | 'Boys!' shriek'd | the old king, but vainlier than a hen (Tennyson, The Princess)
    The Sea of Faith
    Was once, | too, at | the full, and round earth's shore (Arnold, "Dover Beach")

4)  | Cast in | the same poetic mold with mine (Dryden, "To the Memory of Mr. Oldham").

Reversals in trochaic lines which rely upon accent are, again, harder, but not impossible, to find:

    'Ticket up in one's museum, Mind-Freaks, | Lord Clive's | Fear, Unique!' (Browning, "Clive")
    'There's no standing him and Hell and God all three against me, - so,
    | I did | cheat!' (Browning, "Clive").

And supported by italics:

    | Ah! some | power exists there which is ours? (Arnold, "Self-Deception")
    'Then, more kisses!' - | Did I | stop them, when a million seemed so few? (Browning, "A Toccata of Galuppi's").

We may call feet ambiguous38 which can be scanned with either the normal disposition of ictus and non-ictus or a reversal. They are

38 The use of this term is not really a relinquishment of the point I was trying to make in "Mr. Stein on Donne", Kenyon Review, XVIII (1956), pp. 443-451. There I asserted (somewhat too earnestly, it now strikes me) that the voice had no mechanism for the "hovering" that had been attributed to the last line of Donne's "Elegy X": So if I dream I have you, I have you. I felt that the foot was most meaningful as a trochee. But from the analytical point of view, there are two scansions of have you (- ˘ and ˘ -), and the foot is ambiguous. "Ambiguous", in short, does not mean "vague" or "uncertain" or "uncommitted", as "hovering" would seem to imply, but simply "capable of two actualizations". The possibility of a metaphorical function for meter will be taken up in detail in Chapter VII.



most appropriately marked with the double marks ˘ - and - ˘ together (in the reverse order if trochee dominates). The symbols are staggered to avoid the suggestion that both syllables can have ictus at the same time. Feet are usually ambiguous when there are two interpretations possible, two points of emphasis which can be accommodated. At the same time, the validity of metrical ambiguity does not need to depend upon a general theory of literary value. Metrists should, I think, be able to work out a system with sufficient flexibility to incorporate legitimately the differing interpretations of poems that sometimes arise and one which does not depend at the outset upon given critical opinions. Not that scanners do not owe it to their readers to specify the semantic reasons for their choice; but metrists cannot hope to achieve a satisfying theory of meter if it needs to originate in a literary value system. It is not unreasonable to expect metrics to be pre-critical; the metrist (as opposed to the literary critic) would do better to err on the side of phonetic inclusiveness than on that of critical exclusiveness.

As an example of ambiguous meter, we may consider a recent discussion of Pope's famous line

    Flies o'er th' unbending corn, and skims along the main.



A metrist writes: "The present scansion [flies o'er (- ˘) instead of flies o'er (˘ -)] takes the grammatical force of the parallelism in this passage to play flies against skims: Camilla flies over the corn and skims across the sea. If the first foot is taken as an iamb, then one must argue that she flies o'er and skims along - a reading one may not be able to defend."39 The rendition flies o'er (- ˘) is perfectly justifiable, but so is the alternative; flies o'er (˘ -) does not necessarily imply skims along and gives no special prominence to the preposition. Many verb-preposition and verb-adverb combinations operate in this way: run into, work against, put out, etc. The point is that both recitations have something to be said for them; the foot is ambiguous and should be marked flies o'er with double marks.

39 See John Ciardi, How Does a Poem Mean? (Boston, 1959), p. 927.



Another example occurs in Shelley's "Adonais":

    | Where wert | thou, might | y Mother, when he lay.

The alternatives imply different emphases: if one reads Where wert thou with the emphasis on Where, he expresses some real curiosity about Urania's whereabouts. The other reading of Where wert thou, on the other hand, introduces a plaintive tone ("You were missed mightily") and a greater sense of the inevitability of doom ("Couldn't you have been there and done something? No, I don't suppose you could").

Sometimes the ambiguity crosses foot boundaries; in that case both feet permit normal or level interpretations:

    | The same | heart beats | in every breast (Arnold, "The Buried Life").

Same heart, with the emphasis on same, implies "There is an identity of feeling among men, despite widespread differences and the ubiquitous fear of giving oneself away"; while same heart, with the emphasis on heart, implies "There's a faculty shared by all members of the human race, namely the emotions; and there, if anywhere, we may find community." A third possibility is the level reading (The same heart, or the like), where emphasis is equally divided between the fact of emotional community and the identification of what it is that enables us to commune.

Feet composed of monosyllabic words tend to be the most ambiguous, particularly where they are not major words. I can conceive of many possible scansions of the line from "Lycidas":

    Who would not sing | for Lycidas?