Corpus-based Studies of Diachronic English 9783035102697, 3035102694

573 62 2MB

English Pages [310] Year 2011

Polecaj historie

English on Croker Island: The Synchronic and Diachronic Dynamics of Contact and Variation 9783110707854, 9783110707755

Existing accounts of Australian Aboriginal English do not investigate the significant degree of variation found across t

173 24 4MB Read more

Diachrony: Diachronic Studies of Ancient Greek Literature and Culture 3110425378, 9783110425376

Not a few of the more prominent and persistent controversies among classical scholars about approaches and methods arise

208 39 2MB Read more

Ghanaian Pidgin English: In search of diachronic, synchronic, and sociolinguistic evidence

505 80 175MB Read more

English for Economic Studies

Учебное пособие знакомит студентов с базовыми темами делового английского языка. Каждый юнит содержит уникальную структу

254 63 15MB Read more

English on Croker Island: The Synchronic and Diachronic Dynamics of Contact and Variation 9783110707854, 9783110707755

Existing accounts of Australian Aboriginal English do not investigate the significant degree of variation found across t

192 85 2MB Read more

Handbook of Diachronic Narratology 9783110617481, 9783110616439

This handbook brings together 42 contributions by leading narratologists devoted to the study of narrative devices in Eu

190 57 2MB Read more

Diachronic Studies in Romance Linguistics: Papers presented at a Conference on Diachronic Romance Linguistics, University of Illinois, April 1972 9783110811827, 9789027934734

181 56 56MB Read more

Diachronic Studies in Romance Linguistics. Papers presented at a Conference on Diachronic Romance Linguistics, University of Illinois, April 1972 [Reprint 2010 ed.] 9027934738, 9789027934734

202 26 54MB Read more

Diachrony: Diachronic Studies of Ancient Greek Literature and Culture 3110425378, 9783110425376

Not a few of the more prominent and persistent controversies among classical scholars about approaches and methods arise

163 107 3MB Read more

ENGLISH FOR SOCIAL STUDIES

1,078 90 1MB Read more

Corpus-based Studies of Diachronic English
9783035102697, 3035102694

Author / Uploaded
Matti Rissanen (editor)
Roberta Facchinetti (editor)

Table of contents :
Contents
Introduction
The Importance of Historical Corpora, Reliability, and Reading • Anne Curzan / Chris C. Palmer
Old English and Middle English
More on the Ancestors of Need • Johan Van Der Auwera / Martine Taeymans
Spotting Spoken Historical English: The Role of Alliteration in Middle English Fixed Expressions • Manfred Markus
Towards a Corpus-Based History of Specialized Languages: Middle English Medical Texts • Irma Taavitsainen / Päivi Pahta / Martti Mäkinen
Towards the Automatic Identification of Directive Speech Acts • Barry Morley/ Patricia Sift
Modern English
Leaders of Linguistic Change in Early Modern England • Helena Raumolin-Brunberg
ZEN Corpus 1.0 • Hans Martin Lehmann / Caren auf dem Keller / Beni Ruef
Death Notices: The Birth of a Genre • Udo Fries
The Contribution of Computer-Searchable Diachronic Corpora to the Study of Word Stress Variation • Franck Zumstein
19th-Century and 20th-Century English
19th-Century English: An Age of Stability or a Period of Change? • Merja Kytö / Erik Smitterberg
The Conventions’ Spelling Conventions: Regional Variation in 19th-Century Australian Spelling • Clemens Fritz
The Grammaticalization of the English Adjectives of Comparison: A Diachronic Case Study • Tine Breban
Panchrony in Linguistic Change: The Case of Courtesy • Göran Kjellmer

Citation preview

UG LI 31.p65

1

○○○○○○○○○○○○○○○○○○○○○○○○○○○○

Corpus-based Studies of Diachronic English

Corpus-based studies of diachronic English have been thriving over the last three decades to such an extent that the validity of corpora in the enrichment of historical linguistic research is now undeniable. The present book is a collection of papers illustrating the state of the art in corpus-based research on diachronic English, by means of case-study expositions, software presentations, and theoretical discussions on the topic. The majority of these papers were delivered at the 25th Conference of the International Computer Archive of Modern and Medieval English” (ICAME), held at the University of Verona on 18-23 May 2004. A number of typological and geographical varieties of English are tackled in the book: from general to specialized English, from British to Australian English, from written to speech-related registers. In order to discuss their tenets, the contributors draw on corpora and dictionaries from different centuries, including the most recent ones; hence, they testify to the fact that past and present are so strongly interlocked and so inextricably entwined that it proves hard – if not preposterous – to fully understand Present-day English structure and features without turning back to the previous centuries for an indepth knowledge of the ‘whys’ and ‘hows’ of the current state of the art.

li31

Linguistic Insights Studies in Language and Communication

Roberta Facchinetti & Matti Rissanen (eds)

Corpus-based Studies of Diachronic English

Peter Lang

Roberta Facchinetti is Professor of English at the University of Verona, Italy. Her research field and publications are mainly concerned with language description, textual analysis and pragmatics. This is done mostly by means of computerized corpora of both synchronic and diachronic English. Matti Rissanen is Emeritus Professor of English Philology at the University of Helsinki and a team leader in the Research Unit for the Study of Variation, Contacts and Change in English, at the same university. His research interests include long-term diachronic development of English syntax and grammatical vocabulary and the compilation of historical corpora.

○○○○○○○○○○○○○○○○○○

ISBN 3-03910-851-4

31

Roberta Facchinetti & Matti Rissanen (eds) •

○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○

li31

li

16.02.2006, 13:13

Verbal Constructions in Prescriptive Texts

Corpus-based Studies of Diachronic English

Linguistic Insights Studies in Language and Communication Edited by Maurizio Gotti, University of Bergamo

Volume 31

PETER LANG Bern • Berlin • Bruxelles • Frankfurt am Main • New York • Oxford • Wien

Roberta Facchinetti & Matti Rissanen (eds)

Corpus-based Studies of Diachronic English

Verbal Constructions in Prescriptive Texts

PETER LANG Bern • Berlin • Bruxelles • Frankfurt am Main • New York • Oxford • Wien

Bibliographic information published by Die Deutsche Bibliothek Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available on the Internet at ‹http://dnb.ddb.de›. British Library and Library of Congress Cataloguing-in-Publication Data: A catalogue record for this book is available from The British Library, Great Britain, and from The Library of Congress, USA

Published with a grant from Università degli Studi di Bergamo (Italy), Dipartimento di Lingue, Letterature e Culture Comparate

ISSN 1424-8689 ISBN 9783035102697 US-ISBN 0-8204-8040-1

© Peter Lang AG, European Academic Publishers, Bern 2006 Hochfeldstrasse 32, Postfach 746, CH-3000 Bern 9, Switzerland [email protected], www.peterlang.com, www.peterlang.net All rights reserved. All parts of this publication are protected by copyright. Any utilisation outside the strict limits of the copyright law, without the permission of the publisher, is forbidden and liable to prosecution. This applies in particular to reproductions, translations, microfilming, and storage and processing in electronic retrieval systems. Printed in Germany

Contents

ROBERTA FACCHINETTI / MATTI RISSANEN Introduction .......................................................................................... 7 ANNE CURZAN / CHRIS C. PALMER The Importance of Historical Corpora, Reliability, and Reading....... 17

Old English and Middle English JOHAN VAN DER AUWERA / MARTINE TAEYMANS More on the Ancestors of Need .......................................................... 37 MANFRED MARKUS Spotting Spoken Historical English: The Role of Alliteration in Middle English Fixed Expressions ................................................. 53 IRMA TAAVITSAINEN / PÄIVI PAHTA / MARTTI MÄKINEN Towards a Corpus-Based History of Specialized Languages: Middle English Medical Texts ........................................................... 79 BARRY MORLEY / PATRICIA SIFT Towards the Automatic Identification of Directive Speech Acts ....... 95

6

Contents

Modern English HELENA RAUMOLIN-BRUNBERG Leaders of Linguistic Change in Early Modern England ................. 115 HANS MARTIN LEHMANN / CAREN AUF DEM KELLER / BENI RUEF ZEN Corpus 1.0................................................................................ 135 UDO FRIES Death Notices: The Birth of a Genre ................................................ 157 FRANCK ZUMSTEIN The Contribution of Computer-Searchable Diachronic Corpora to the Study of Word Stress Variation .............................................. 171

19th-Century and 20th-Century English MERJA KYTÖ / ERIK SMITTERBERG 19th-Century English: An Age of Stability or a Period of Change? .... 199 CLEMENS FRITZ The Conventions’ Spelling Conventions: Regional Variation in 19th-Century Australian Spelling .................................................. 231 TINE BREBAN The Grammaticalization of the English Adjectives of Comparison: A Diachronic Case Study.................................................................. 253 GÖRAN KJELLMER Panchrony in Linguistic Change: The Case of Courtesy .................. 289

ROBERTA FACCHINETTI / MATTI RISSANEN

Introduction

Each year brings new problems of Form and Content. (W.H. Auden, ‘Shorts’) The great source of pleasure is variety. Uniformity must tire at last, though it be uniformity of excellence. We love to expect; and, when expectation is disappointed or gratified, we want to be again expecting. (Samuel Johnson, Lives of the English Poets, ‘Butler’)

The present book is a collection of papers illustrating the state of the art in corpus-based research on diachronic English, by means of casestudy expositions, software presentations, and theoretical discussions of the topic. The majority of these papers were delivered at the 25th Conference of the International Computer Archive of Modern and Medieval English (ICAME), held at the University of Verona on 18-23 May 2004. Corpus-based studies of diachronic English have been thriving over the last three decades to such an extent that the validity of corpora in the enrichment of historical linguistic research is now undeniable. Bearing this in mind, scholars are now pondering how far diachronic corpus linguistics may be improved in order to further enhance our knowledge of the kaleidoscopic shifts and turns of the English language through the centuries. This is the issue tackled in the very first paper of the collection (‘The Importance of Historical Corpora, Reliability, and Reading’), by ANNE CURZAN and CHRIS C. PALMER; the authors provide a detailed overview of ways to widen the rich contribution of historical corpus linguistic studies to the broader field of linguistics, while recognizing the limitations inherent to corpus-based methodologies. The argument of the authors has two complementary strands, one focussed on the objects of study and one on the linguistic analysis. First, historical

8

Roberta Facchinetti / Matti Rissanen

corpus linguistics would benefit from embracing a wider definition of a useful diachronic corpus, given the understanding that principled and unprincipled corpora can prove productive and limiting for different studies with different research goals. Second, studies in historical corpus linguistics should involve complementary methodologies and engage current linguistic theories in ways that both enrich the analysis of the corpus data and inform the development of linguistic theory. This is true for all the main periods of the historical development of English, though Old and Middle English appear to be the most troublesome, due to scanty original documents and to the different versions of transcripts available particularly for the most ancient documents. Though scanty as the original data may be, yet these two periods yield interesting food for thought, as testified to by the four papers focussing on them in the present volume. The very first one, by JOHAN VAN DER AUWERA and MARTINE TAEYMANS (‘More on the Ancestors of Need’) deals with the Old English origins of the Present-day English verb need; the discussion is largely based on a scrutiny of the research literature, on the entries in the standard dictionaries, and the Old English and Middle English parts of the Helsinki Corpus of Diachronic English. Bearing on these data, the authors argue that Present-day need replaces at least four earlier constructions: (i) a personal need verb meaning ‘compel’, (ii) an impersonal need verb meaning ‘it is necessary’, (iii) the verb þurfan meaning ‘need’ in negative polarity contexts, and finally (iv) a set of polarity neutral nominal constructions with the nouns nedþearf, þearf, and ned, all meaning ‘need’. Focussing more on speech-related varieties of English, MANFRED MARKUS convincingly comments on the affinity of alliteration to spokenness; to do so, he analyzes Middle English alliterative verse (‘Spotting Spoken Historical English: The Role of Alliteration in Middle English Fixed Expressions’), which has lately aroused new interest (cf. Brinton/Akimoto 1999; Hartle 1999; Minkova 2003). While the dominant understanding seems to be that alliteration was a specifically poetic metrical device, drawing on the Innsbruck Prose Corpus of ICAMET (Innsbruck Computer Archive of Machine-Readable English Texts), Markus demonstrates that alliteration played a considerable role in common Middle English

Introduction

9

phraseology. To illustrate his tenet, he analyzes noun-headed phrases (N’s N, N x N, Adj N) and verb-headed phrases of composite predicates (V (Prep) N). Both on statistical and on linguistic grounds, the paper provides the necessary background to the motivation of many fixed expressions partly still in use today, such as to do sports (< disport) and to make merry. Moving from general English to typologically specialized varieties, IRMA TAAVITSAINEN, PÄIVI PAHTA and MARTTI MÄKINEN illustrate the corpus of Middle English Medical Texts (MEMT), compiled by the authors themselves (‘Towards a Corpus-Based History of Specialized Languages: Middle English Medical Texts’). MEMT forms the first part of the Corpus of Early English Medical Writing (1375-1750) and contains texts from c. 1375 to c. 1500 together with a small appendix of recipes from c. 1330. The authors discuss in great detail the principles of corpus compilation, data selection criteria and database structure; moreover, they report on pilot studies on earlier versions of the corpus, and indicate areas in need of future research. Indeed, MEMT proves to be very useful for several kinds of analyses on linguistic developments and systematic historical accounts of specialized and professional languages; moreover, it provides a new window to the late medieval medical register and to genre-based language history, as medicine was the forerunner in vernacularization processes. The final contribution pertaining to the early periods of English development is by BARRY MORLEY and PATRICIA SIFT on the expression of speech acts in Late Middle English (‘Towards the Automatic Identification of Directive Speech Acts’). According to the authors, a challenge for Historical Pragmatics is an automated ‘function-to-form’ study (Jucker 1995) for data identification and quantification. For example, speech acts are realized in a large variety of syntactic patterns that are difficult to trace electronically; in this study, the computerized identification of speech acts is shown to be viable, given well-defined research parameters. The paper investigates directive speech acts (manually identified) in a small corpus of Late Middle English prose sermons (1350-1500) taken from the PennHelsinki Parsed Corpus of Middle English. In such religious instruction, directive speech acts are typically issued by preachers to their audiences and occur in recurring syntactic forms or formulae

10

Roberta Facchinetti / Matti Rissanen

qualitatively depending on audience composition (laity, educated audience, presence of a monarch, etc.). An interesting attempt has been made by the authors to quantify these formulae in terms of a set of computer filtering rules. The following four papers exploit corpora to tackle issues pertaining to different typological and regional varieties of Modern English. HELENA RAUMOLIN-BRUNBERG studies people who led morphological changes in Early Modern England (‘Leaders of Linguistic Change in Early Modern England’); to do so she analyzes the Corpus of Early English Correspondence (CEEC, c. 1410-1680), compiled at the University of Helsinki and covering the time-span of c. 1410-1680 for a total of 2.7 million running words, representing over 6,000 letters. Three changes are discussed: the replacement of subject YE by YOU, the change from the third-person singular suffix -TH to -S, and the loss of the final nasal from the possessive determiners MINE and THINE. The leaders are singled out for two different phases of each change: the incipient (the overall proportion of the new form below 15%) and the new and vigorous (15-35%). The individual leaders can be found among the groups that are known to have led the changes on the basis of previous research. Interestingly, the incipient and the new and vigorous leaders had different social backgrounds and networks; specifically, the incipient leaders were geographically mobile middle-ranking people with a large number of weak links, whereas the new and vigorous leaders – who had a higher social status – were central people in their social networks. Moving to 17th- and 18th-century journalistic writing, HANS MARTIN LEHMANN, CAREN AUF DEM KELLER and BENI RUEF illustrate the first public release of ZEN, the Zurich English Newspaper corpus, consisting of early English newspapers published in London between 1661 and 1791 (‘ZEN Corpus 1.0’). The authors describe the selection, transcription and format of the material and discuss the methodological decisions taken at the stages of sampling, transcribing and structuring/formatting, with specific reference to the conversion into XML format and its advantages. Finally, they provide an interesting overview of ZEN Online, a web-based search interface of the corpus, analyzing pattern-based data retrieval, the use of annotation and the integration of visual representations of the original newspaper material.

Introduction

11

An interesting application of the ZEN corpus is provided by UDO FRIES, who investigates the genre of obituaries (‘Death Notices: The Birth of a Genre’). By the end of the 18th century – when The Times was first published – death notices introduced by an appropriate header, usually Died or Deaths, were a well-established genre. The earliest examples in the ZEN Corpus date from the early 1730s. When one extends the definition of the text class to include brief death reports without any headers, we come across many earlier examples, right from the beginning of the collection (in 1671). These show a greater variety of expression than the later ones with a header. In his well-balanced discussion, the author also warns about the limits of using computer corpora to answer text linguistic questions, such as the rise of a new text class or genre; indeed, a complete text linguistic study must always consider the possibility of texts outside the corpus, since even large corpora cannot contain all the texts of a period. Thus, the type of death notices found in newspapers should also be sought in monthly magazines, which so far have been disregarded in linguistic studies. The final paper in this section is by FRANCK ZUMSTEIN (‘The Contribution of Computer-Searchable Diachronic Corpora to the Study of Word Stress Variation’), who presents the early results of a study carried out by the research group in Linguistics at the University of Poitiers. Such results include the making and use of computersearchable lexico-phonetic corpora of 18th- and 19th-century English. These two centuries saw an upsurge in dictionary-making activity and were marked by the lexicographical works of famous authors such as Samuel Johnson and John Walker. In most dictionaries, the authors meticulously endeavoured to represent the pronunciation of entry words. The Poitiers research group has undertaken the task of digitizing these documents and turning them into computer-searchable corpora, in order to help the linguist to retrieve quickly and exhaustively the data she/he is looking for. Hopefully, these new tools will be of help in accounting for pronunciation variation of word entries in relation to their spelling, stress pattern and morphological structure; for word-formation tendencies and related phonological issues; and finally for the pronunciation peculiarities of learned and specialized words.

12

Roberta Facchinetti / Matti Rissanen

The last four contributions in the volume provide a composite, insightful overview of 19th- and 20th-century English; the former is discussed as a period of linguistic stability, variation, and change by MERJA KYTÖ and ERIK SMITTERBERG (‘19th-Century English: An Age of Stability or a Period of Change?’). The authors analyze lexical bundles, multal quantifiers, and the progressive and phrasal verbs as they occur in CONCE (A Corpus of Nineteenth-century English) and ARCHER (A Representative Corpus of Historical English Registers). Indeed, their survey of lexical bundles reveals a notable degree of stability across time in e.g. Letters and Science; trials display features familiar from conversation as well as from the courtroom situation; while certain bundles are genre-specific. With reference to multal quantifiers, open-class options such as a lot of are shown to increase at the expense of closed-class quantifiers like much, but only in certain linguistic contexts (especially uncountable and assertive contexts). Finally, as for progressives and phrasal verbs, they both increase greatly in frequency across the 19th century, but genre and gender are important conditioning factors. Taken together, these three case studies demonstrate that stability, variation, and change are multifaceted notions that may apply on many levels of language use. From a methodological perspective, the results illustrated by the authors are related to previous multi-feature/multi-dimensional analyses and their study testifies to the fact that such analyses can be of great help when the communicative functions of other linguistic features are interpreted. CLEMENS FRITZ (‘The Conventions’ Spelling Conventions: Regional Variation in 19th-Century Australian Spelling’) draws attention to Australian English spelling by analyzing the minutes of the Federation Debates of the 1890s, comprising ca. four million words transcribed in Adelaide (1897), Melbourne (1890, 1898) and Sydney (1891, 1897). Despite its comparative uniformity, Australian English appears to exhibit a number of regionalisms, which are not restricted to lexis but cover the area of spelling as well. Currently, spelling variables are typically associated with differences between British and American English, while in the 19th century they were not yet codified to the same extent. Therefore, English in Australia could still choose e.g. between -our and -or, and choices differed by writer, period and region. The investigation of the Federation Debates shows

Introduction

13

that the state parliaments of Adelaide, Melbourne and Sydney had fixed certain spelling variables differently. On account of this, regional standards were established which can sometimes still be found today. The last two papers start from the 20th century but go well beyond the mere illustration of a “short-term change in diachrony” (Kytö/Rudanko/Smitterberg 2000: 85), which is supposed to focus only on the language of the previous century; indeed, both authors embrace also the older periods of the history of English. TINE BREBAN does so to test the hypothesis that adjectives of comparison – such as other, identical and similar – display a polysemy explainable as the result of an ongoing process of grammaticalization (‘The Grammaticalization of the English Adjectives of Comparison: A Diachronic Case Study’). Specifically, the author lays the results of her study on the diachronic development of six adjectives of comparison from 750 to 1920, representing three semantic subgroups – other and different for ‘difference’, same and identical for ‘identity’, and similar and comparable for ‘similarity’ – and compares their development to the situation in Present-day English. The reference corpora consist of eight random samples, covering the periods 7501050, 1050-1250, 1250-1500, 1500-1710, 1710-1780, 1780-1850, 1850-1920 and a Present-day English sample containing material from 1990 onwards, for each of the six adjectives. The first four samples (750-1710) are from the Helsinki Corpus of English Texts, the next three (1710-1920) from the Corpus of Late Modern English Texts, while the Present-day English texts are extracted from the COBUILD Corpus (Bank of English). The resulting analysis encompasses both a qualitative and a quantitative perspective, since it thoroughly investigates both the development of new meanings and the changing distribution of the different meanings for each of the adjectives. The final article is by GÖRAN KJELLMER (‘Panchrony in Linguistic Change: The Case of Courtesy’) and focuses on the development of the word courtesy. The author remarks that, in a normal pattern of linguistic change, one phase is succeeded by another one, which in turn is succeeded by a third, etc.; the older stages vanish successively as the later ones establish themselves. However, with the help of data from the Oxford English Dictionary and from the CobuildDirect corpus, it can be observed that this is not the only

14

Roberta Facchinetti / Matti Rissanen

pattern in language change. The case of Present-day English courtesy illustrates a pattern where it is possible to trace a number of consecutive stages – logically developed from an original stage – which have all remained in the language, none of them showing any signs of being supplanted by the others. Moreover, the final phase is well on its way to becoming grammaticalized. The author concludes that the simultaneous existence in the current language of all the stages of the development of courtesy is seen, perhaps paradoxically, as an instance of panchrony in linguistic change. This collection terminates with the slightly provocative claim of panchrony in linguistic change and the insightful, perceptive exploitation of Present-day English corpora for the study of diachronic issues. With such premise, the ground is open for a second book of corpus-based studies to be published out of the conference ‘Corpus linguistics: The state of the art twenty-five years on’; in the book, currently in preparation, theoretical issues and individual case studies will be dealt with reference to contemporary English. The two books are intended as complementary since, as we hope to have testified to in the present collection, past and present are so strongly interlocked and so inextricably entwined that it proves hard – if not preposterous – to fully understand present-day English structure and peculiarities without turning back to the previous centuries for an in-depth knowledge of the ‘whys’ and ‘hows’ of the current state of the art.

References Brinton, Laurel J./Minoji, Akimoto (eds) 1999. Collocational and Idiomatic Aspects of Composite Predicates in the History of English. Amsterdam: Benjamins. Hartle, Paul 1999. Hunting in the Letter. Middle English Alliterative Verse and the Formulaic Theory. Berlin: Lang. Jucker, Andreas H. (ed.) 1995. Historical Pragmatics: Pragmatic Developments in the History of English. Amsterdam: Benjamins.

Introduction

15

Kytö, Merja/Rudanko, Juhani/Smitterberg, Erik 2000. Building a Bridge between the Present and the Past: A Corpus of 19thCentury English. ICAME Journal 24, 85-97. Minkova, Donka 2003. Alliteration and Sound Change in Early English. Cambridge: Cambridge University Press.

ANNE CURZAN / CHRIS C. PALMER

The Importance of Historical Corpora, Reliability, and Reading

1. Introduction Many publications about historical corpus linguistics over the past ten to fifteen years have stressed the rich possibilities that electronic databases open for both research and teaching on the history of the English language. In fact, one of the authors of this study wrote, as the opening sentence to an abstract for the MLA Annual Convention in December 2004: “New electronic databases or corpora have created exciting possibilities for research on the history of the English language, of a scale unimaginable even a few decades ago”. This study is not intended to challenge this statement or others of its ilk. Instead, it aims to make explicit the perils that endanger corpus-based historical linguistic research and the objectives that are critical to the field’s success. It is the metaphorical caution label for how to use these exciting new resources in ways that are as responsible and reliable as possible. It is also the goals statement for how to make historical corpus linguistics studies as interesting as possible to a wider audience of linguists and historians of English. The fact that historical linguists can now do searches in three seconds that would previously have taken three decades creates a new kind of pressure for us to come to brilliant conclusions based on our three-seconds worth of research. (This is an exaggeration, but it captures the idea). It leads to the temptation to overgeneralize from the numbers that we can now get so easily, if only to fulfill the expectations for what is ‘now possible’ with these vast amounts of searchable text. It also can emphasize speed and numbers over traditional close reading of texts and smaller, more qualitative studies.

18

Anne Curzan / Chris C. Palmer

Our argument has two strands. First, historical corpus linguistics will benefit from embracing a wider definition of a useful historical corpus while recognizing the limits of any corpus and its search engines. Second, studies in historical corpus linguistics must involve complementary methodologies and engage current linguistic theories (e.g. sociolinguistics, language acquisition, cognitive linguistics, pragmatics).

2. Expanding the definition of a corpus (cautiously) One of the central questions in corpus linguistic research is the use of principled versus unprincipled (or non-systematic) corpora. The focus of primary concern – and interest – has been the World Wide Web. But for historical linguistic studies, there are many other unprincipled databases of (primarily literary) texts that should be part of the discussion. In principle, any collection of texts can be called a corpus (corpus < Latin corpus ‘body’). But a corpus in modern empirical linguistics has typically been more specialized. It is a large and principled collection of texts, characterized by the following features (McEnery / Wilson 1996: 21-24): x

x x x

sampling and representativeness (i.e., language variety through a range of authors and genres – dialect, time period, register, etc.); finite size; machine-readable form; and a standard reference for the language variety it represents.

This kind of principled corpus allows an important set of assumptions for statistical studies of language variation and change. At the same time, unprincipled corpora also offer rich possibilities for historical language studies. These often large text collections are not designed for systematic linguistic study, but they

The Importance of Historical Corpora, Reliability, and Reading

19

prove to be valuable resources for researchers to investigate how particular linguistic features and changes manifest themselves in actual texts from a given period in the history of language. They allow informative qualitative, if not straight quantitative, studies. Current debates about unprincipled corpora have focused primarily on massive databases such as Lexis-Nexis1 as well as on the World Wide Web, both of which are relevant to scholars working on the history of English. No linguist denies that search engines such as Google offer tantalizing access to vast amounts of information about recent and ongoing changes in the language. The problem is that these resources grow every day (hence the colloquial label ‘big bangs’).2 The information from one moment to the next moment is not statistically comparable given the lack of stability of the size or composition of the web and the ever-expanding text selection in Lexis-Nexis; and it would be impossible to do any kind of overall frequency count. New software such as WebCorp is now making it possible to create a mini-corpus drawn from the Web, which can be kept stable; and through careful use of parameters, researchers can create a stable mini-corpus in Lexis-Nexis. An unstable resource like the Web can also be very helpful for proving that a supposedly nonexistent construction does, in fact, exist (e.g. socksless).3 Even

1

2

3

Lexis-Nexis Academic is an electronic, primarily full-text database of news, legal, business, medical, and reference resources. For further information see http://www.lexisnexis.com/academic/. The ‘bang’ just got bigger at the University of Michigan (UM), as well as several other universities: the seven million volumes in the UM libraries will be added to the Google search engine over the next decade. Google will digitally scan almost everything available in the UM libraries and make it searchable, including full access to texts out of copyright. This endeavor, if successful, will turn much of the UM libraries into a vast, searchable, unprincipled corpus. For those of us interested in searching systematically for historical examples of language use, we will need to figure out how to exploit these new possibilities in ways that allow a search to be replicated and verified. Some theories of morphology hold that derivational suffixes cannot be attached to a word that already ends in an inflectional suffix. The example of socksless, posted by kombuchakid at the blog , is a possible counter-example to this assumption.

20

Anne Curzan / Chris C. Palmer

these searches, however, pose the problem of identifying the reliability of the source, including whether the author is a native English speaker. Recognizing the diversity of our research goals in historical corpus-based linguistics allows us to exploit the possibilities offered by unprincipled corpora, particularly when used in conjunction with principled ones. A study that aims to test for a construction’s frequency or the effect of sociolinguistic factors requires a systematically compiled, finite, representative corpus. A study that aims to prove the potential grammaticality of a construction or to collect lexical evidence may benefit from the use of a larger, unprincipled corpus. The key is to develop methodologies that exploit the historical corpora available appropriately, given the research goals and the nature of the corpus. Many large unprincipled databases of historical texts are already available and ready to be explored using innovative corpus linguistic methodologies. Examples of these databases include the Middle English Compendium and the Chadwyck-Healey Collections, which contain a variety of (untagged and unparsed) whole-text literary specimens from different periods such as Early English Prose Fiction, Eighteenth-Century Fiction, and Twentieth-Century African American Poetry. All these collections of literary texts feature search engines that allow historical linguists to pursue word-based searches. This point actually needs to be stated and perhaps repeated, as one can still encounter resistance to these kinds of studies of literary databases: for example, using fiction databases for research on gerundial expressions containing ‘accusative’ and ‘genitive’ subjects, e.g. I witnessed him losing the race vs. I witnessed his losing the race.4 Linguists should not shy away from collections that appear to be more literarily oriented; they must simply recognize their limitations and strive to supplement them with complementary databases. Searches can be usefully applied to such databases as Chadwyck-Healey – which even provide the benefit of full-text searching – to draw some interesting

4

When one of the authors of this chapter presented such a study before a fellow linguist, he initially scoffed at the idea, saying: “One should never rely on literature to gain a better understanding of naturally occurring syntax”.

The Importance of Historical Corpora, Reliability, and Reading

21

observations about language change.5 That said, we still must be careful not to extrapolate too far from the particular database/genre under analysis, since its examples tell us not about language generally and entirely but about language in that specific discourse or genre, in those available texts.

3. Recognizing the limits of corpora and search engines Quantitative research – from lists of frequency statistics to lists of collocational patterns – does little without context and interpretation. As they say in Helsinki (one of the centers of research on the history of English using electronic databases): “Research begins where counting ends”. Qualitative research relies on the intuitions of the researcher. At this point, no corpus linguists are arguing for relying on corpora without the benefit of introspection; corpus data must always be interpreted with the insights of the researcher’s introspection. But corpus data, unlike purely introspective methods, allow for verifiability and require accountability. As three well-known linguists have put it, computers do not “change their mind or become tired during an analysis” (Biber / Conrad / Reppen 1998: 4). What is less often recognized is how deeply implicated intuition is in the collection of corpus data. We will argue here that for much historical research, quantitative results rely on research intuitions as well. The construction of corpus searches, not to mention of corpora themselves, is guided at least in part by research intuitions. And almost all computer-generated results need to be sorted through by hand, no matter how reliable the search engine. The design – and often redesign – of any more complex search requires researchers’ manipulation of what is possible as well as their 5

It is an unfortunate consequence that the expense of these resources limits their availability to the public. Many of them are available only at libraries that can afford to pay the substantial fees required by the companies that own them.

22

Anne Curzan / Chris C. Palmer

intuitions about what is plausible. This is particularly critical for ‘word-based searches’, Susan Hunston’s label for searches which retrieve and organize data that correspond to strings of characters specified by the user without reading for any tagging in the corpus (Hunson 2002). The search may contain such features as Boolean searches (return X AND Y, X OR Y, etc.), proximity searching (return all occurrences of X when it is within 10 words of Y), or wild card searching (pr*y returns pry, pray, prey, privy, etc., since * is a common operator that means ‘zero or any number of continuous characters’). The kind of functionality provided by a search indisputably affects the kind of research questions one can ask, and the researcher must find creative ways to employ allowable searches to generate the appropriate data to answer his or her questions. This dialectic between researcher and research tool – each influencing and being influenced by the other – manifests most clearly for corpus linguists when searches must be redefined or redesigned to resolve a number of data problems: specifically overgeneration, undergeneration, and over-reliance on canonical spellings. We briefly address each of these concerns to emphasize how researchers must use their intuition to acquire the data they want when designing wordbased searches for untagged corpora. Overgeneration: One of the largest pitfalls in searching large, untagged databases is the retrieval of far more data than the researcher initially intended. Such overgeneration of forms commonly occurs when one employs wild-card searches, e.g. in a gerundial search such as see/saw him *ing vs. see/saw his *ing. In these examples, a number of gerundial phrases with either an accusative or genitive subject would be retrieved, along with a number of irrelevant forms including (perhaps) I saw him sing or Did you see his ring? These searchgenerated data can skew the results if a researcher fails to analyze the results carefully and throw out such irrelevant forms. Furthermore, if a search generates far too many extraneous results to pick through, the researcher must reconfigure the original search – using negative Boolean operators, a different wild-card pattern, etc. – to minimize the irrelevant forms that are most likely to clog up the data. Undergeneration: In many cases the problem with a search is not overgeneration but rather a dearth of relevant data. The researcher has two related but distinct concerns when it comes to

The Importance of Historical Corpora, Reliability, and Reading

23

undergeneration: (1) Does the search in its current form fail to retrieve important forms?; (2) Is the search generating enough data for the researcher to draw any meaningful conclusions? Regarding (1), if a search string is too narrowly defined, the researcher could miss significant forms. Depending on how a wild card is defined in different search interfaces, the example above (see/saw him *ing) might miss gerunds with premodification (e.g. saw him furtively glancing) and would need to be revised. Question (2) is a particularly critical one for researchers exploring phenomena that manifest relatively infrequently in corpora, such as derivational affixation. Consider Christiane Dalton-Puffer’s study (1996) of Middle English morphological data and her use of the parameter code settings of the Helsinki Corpus, which mark each text for such features as author age/rank/sex, dialect, etc. Every time she adds a code to one of her searches, the additional parameter shrinks the already small number of hits for any given suffix. The limited size of her results makes it difficult for her to provide strong conclusions about the influence of social, geographic, and textual variables on the productivity of different affixes.6 And her conclusions about the structure of Middle English morphology could be bolstered by more data from the period. In many ways, the problems with undergeneration here are the result of Dalton-Puffer’s strict reliance on a small, principled corpus. When researchers suspect their searches are providing too little data, we strongly recommend they consider expanding the corpus size to increase the number and types of hits of the linguistic forms under investigation. By searching larger, often unprincipled databases alongside smaller, principled corpora, researchers may be able to strengthen and/or broaden their conclusions about phenomena that would otherwise yield too few hits. Canonical Spellings: Historical linguists have to rely on canonical sources when designing a search. When looking for particular forms within a historical period, they usually cull information from older philological studies, historical dictionaries, synchronic accounts of the grammar at different periods – anything 6

This problem is sometimes referred to as ‘the phenomenon of vanishing reliability’: the more variables a researcher adds to a search, the less data is generated, and the less representative the data becomes.

24

Anne Curzan / Chris C. Palmer

that provides a list of the various forms of a word or item in a certain period. But a more exhaustive search cannot depend solely on canon, since many non-standard forms continue to go unrecognized in standard materials. This particular problem tests the limits of linguistic intuition, since it is difficult for us to imagine and search for all plausible, possible, though possibly unattested forms of a linguistic item. In fact, sometimes our corpus work is designed to answer this very question: what grammatical forms are possible in a historical period, in this genre, within this community? Dalton-Puffer gets around the canon problem somewhat by using Wordcruncher to sort through items in the Helsinki Corpus so that she can identify orthographic variations of the same word-ending, such as for the suffix -able. Even though she expands the canonical list of forms, she still manages to miss some forms such as in feelably, which Gary Miller (1997) finds in a few Middle English texts not excerpted in the Helsinki Corpus. When it comes to identifying possible orthographic forms in earlier periods, it turns out that not only our intuitions but also our corpora are limited. Historical linguists may need to search databases of unexcerpted texts – or perhaps even read through and analyze original manuscripts and full texts themselves – to establish more comprehensive lists of orthographic variants. As all these examples make clear, historical linguists must rely on their judgment over the computer’s in order to sort through the data before proceeding to the analysis. This hand-sorting, in addition to addressing the concerns raised above, can locate ‘outliers’ in the data: often a particular text or writer whose results potentially skew the data. For example, in a study of which versus the which in the Corpus of Early English Correspondence, the calculation of overall statistics explicitly excludes Sabine Johnson because, with her absolutely consistent use of the which, she so deviates from all her contemporaries that to include her would distort the overall picture of use in the period (Nevalainen/Raumolin-Brunberg 2003: 129-130).7 In historical corpora without tagging, researchers must also rely on handsorting and their judgment to catch ‘red herring’ results, be they the 7

Importantly, the researchers present the individual statistics so any reader could recalculate the statistics including Sabine Johnson.

The Importance of Historical Corpora, Reliability, and Reading

25

result of spelling variation, homonymy, computer misreading of scanned text, or other causes.8 With some written Old and Middle English, for example, researchers may need to weed out irrelevant hits given the lack of distinction between third-person masculine singular him and plural him or between third-person singular feminine hie and plural hie. Computers can search texts much faster and much more reliably than humans and locate valuable linguistic data. In the end, however, computers cannot read the way that humans can.

4. Preserving reading in historical corpus linguistics Susanne Schmid echoes a common argument when she notes that “the computer has been accused of being instrumental in destroying the culture of reading” (1999: 372). Although most scholars in language and textual studies realize that computers are merely a different technology that has had a significant impact on the way we read in ways different from manuscripts or printed books, outright complaints about the use of corpora still persist. Coming from a more philological background, Gary Miller protests the use of corpora for diachronic morphological research: “the most fundamental problem [with such studies] involves methodology, reliance on modern technology, incomplete electronic samples of edited texts with no critical apparatus” (1997: 252). However, computers do not destroy reading; the use of a search on a computerized text is in fact a kind of reading of that text. Once we acknowledge this view of the search, we can better address the various problems that arise alongside these new types of reading. There are (at least) three types of reading that researchers rely upon in every corpus analysis: (a) the computer’s recognition, retrieval, and presentation of forms input into a search (as well as its inclusion and exclusion of various levels of context for each 8

Syntactic tagging, of course, relies heavily on researchers’ intuitions about grammatical structure and/or lexical meaning in any ambiguous construction.

26

Anne Curzan / Chris C. Palmer

occurrence); (b) the researcher’s analysis and interpretation of these results; and (c) the editors’/transcribers’ versioning of the texts that constitute the corpus. The kind of reading computers provide for corpus linguists in (a) is vastly different from the methods employed in other sorts of textual studies, but it is equally powerful. Key-WordIn-Context (KWIC) lists refashion texts into selective, partial readings; they make certain language patterns, which might otherwise remain unknowable, immediately visible to the human eye. Much of the theoretical and practical import of corpus studies occurs (or should occur) during the reading characterized in (b), and we elaborate on the larger relevance of this type of reading in Section 5. But currently the most underacknowledged reading in corpus studies is that described in (c). In this section we argue that researchers should acknowledge the impact of editors on the texts within their corpora and should consider reading full, original texts to supplement their interpretive reading of corpus data. One problem with corpora is that a corpus (if built along the typical trends in corpus design) can represent only one choice among variants in different manuscripts or possible readings of a minim sequence. Consider this brief example from two different manuscripts of Julian of Norwich (the variants are italicized):9 (1)

The which kirtle is the flesh; the syngulhede is that there was ryte now atwix the godhod and manhede. (MS Sloane 2499)

(2)

The wyth [white] kirtle is the flesh; the syngulhede is that there was ryte nought atwix the godhod and manhede. (MS Fonds anglais 40)

The syntax and semantics of these examples are completely different; moreover, the statistical counts and analysis of relative pronouns or negators in Julian of Norwich (or in mystical texts, or in Middle English, etc.) would differ – perhaps significantly – depending on

9

These examples of manuscript variation are drawn from the TEAMS Middle English Texts Series edition of The Shewings of Julian of Norwich, edited by Georgia Ronan Crampton (1993). The example below of minim ambiguity in an original manuscript comes from the 1601 account of Richard Warwick, one of many fully transcribed accounts in the collection Everyday English 15001700, edited by Bridget Cusack (1998).

The Importance of Historical Corpora, Reliability, and Reading

27

which manuscript is used in the corpus. But even when there is only one manuscript of a text, there may be ambiguous forms within it which will ultimately appear as unambiguous forms in a printed text. When a 1601 tailor’s receipt by Richard Warwick presents the word ounces in a minim sequence, the minims could symbolize either ounces or onnces. This small orthographic difference could have major significance for a historical phonologist looking at diphthongs in such texts. But the researcher would know of the possible ambiguity here only if the original manuscripts were checked. In the end, there is no replacement for working with the actual texts that make up historical corpora. Several years ago, Matti Rissanen (1989) wrote about his concern that corpus linguistics could damage the discipline of philology: students still need to be able to work comfortably with medieval texts – both the language and the handwriting. The bottom line is that the texts themselves hold the answers, and corpora must often rely on editions, which are not always reliable. A case in point: Henry Machyn’s Diary, from the middle of the 16th century. The work that has gone into creating a new electronic edition of the Diary at the University of Michigan, using scanned images of the original manuscript, has demonstrated how many errors occur in the available editions. And the kinds of small errors that have been found are exactly the kinds of subtle distinctions in word forms that we often use to build cases for historical variation. The work on this Diary, to create a corpus of one text, also speaks to the difficult question of ‘representativeness’ in corpus design. The construction of most corpora assumes that an excerpt of a text represents the language of not only that text but also of the relevant demographics: a writer/speaker of that gender and age in that location and of that socioeconomic status. As Richard W. Bailey and Colette Moore point out in their recent work on the Diary (Bailey/Moore forthcoming), there can actually be enormous variability within one text, both in terms of spelling and in terms of morphological and syntactic use. It can also be difficult to pinpoint the origins/dialect of a speaker as well as his/her class status. Just as we balance the quantitative and qualitative, we must complement more general studies of excerpts of many texts with studies of single texts (which can also be electronically facilitated, it is just that ‘the corpus’ is only one text).

28

Anne Curzan / Chris C. Palmer

Complementary studies – studies that combine database searches with analyses of fully read complete texts – provide the best way to address all of the above concerns. Even though full-text (i.e., old school) reading is far slower than database searching, it has many advantages. First, a careful reader can account for a larger range of available variants of a certain form than the more canonical sources provide; these variants can then be entered into later searches of much larger corpora. In this way, the full texts can become supplemental to the corpus study, adding to the total number of texts under analysis. Second, any generalizations proposed for a linguistic process from a database analysis – say, the productivity of a certain suffix – can be checked against the statistics derived from the full-text analysis. Such a contrastive analysis can evaluate the representativeness of databases with excerpted material. And third, many lesser known texts are still in print or even manuscript form, and are not available digitally. Until someone takes the time to put them online, these texts must be read line by line. Generally speaking, such texts (like the records of the London Grocers’ Company) tend to be less canonical and may provide surprising bits of data when compared to more canonical texts. For all these reasons, we should think of corpus-based studies as enhancing and complementing more traditional full-text manuscript studies rather than surpassing or replacing them.

5. Enriching the interpretation of corpus-based results Debates about whether corpus linguistics is best viewed as a methodology or a field raise two fundamental concerns. If corpus linguistics is described as a separate field, it runs the risk of being marginalized by linguists in other fields who can seemingly safely say they “don’t do that”. If corpus linguistics is described as a methodology, however, it risks losing some of the credibility that comes with the label ‘a (sub)field of linguistics’. These stakes are very real, but we approach them here from a different angle. For the value of corpus-based studies to be more widely recognized and corpus-

The Importance of Historical Corpora, Reliability, and Reading

29

based methodologies adopted in the field of linguistics (both synchronic and diachronic), corpus linguistics must engage current linguistic theory from a range of linguistic subfields. There is a rich conversation to be had, and corpus linguists should take responsibility for initiating and furthering it, at least in these relatively early stages. In a funny and telling anecdote about armchair linguists (rational linguists) and corpus linguists, Charles Fillmore (1992: 35) concludes: These two don’t speak to each other very often, but when they do, the corpus linguist says to the armchair linguist, “Why should I think that what you tell me is true?”, and the armchair linguist says to the corpus linguist, “Why should I think that what you tell me is interesting?”

Much has changed since Noam Chomsky’s initial devastating critique (or some might call it dismissal) of corpus studies. But corpus linguistics, at least in the United States, remains outside most definitions of ‘mainstream’ linguistics. To capture their relevance for a broader audience, corpus linguistic studies should stretch the interpretations of data into dialogue with current linguistic theorizing in sociolinguistics, historical linguistics, syntax, cognitive linguistics, and pragmatics – just to name a few fields. As part of this dialogue, corpus studies should not only apply current theory but also assert their implications for linguistic theory generally as well as for more specialized areas such as the history of English. Recent historical corpus-based work provides many models for the application of current linguistic theory to the interpretation of data, as well as to the construction of corpora. For example, sociolinguistic theory has informed the design of corpora (e.g. the tagging of the Helsinki Corpus, the motivation and tagging of the Corpus of Early English Correspondence), and studies based on these corpora exploit the available parameters to examine the effects of factors such as age, gender, rank, and register on the progression of specific linguistic developments in English. Susan Fitzmaurice (2000) has employed social network theory to investigate a specific coalition of male writers in the late 18th century and to examine the development of particular models for Standard English. In that work, Fitzmaurice recognizes the inherent difficulties and limitations of applying this

30

Anne Curzan / Chris C. Palmer

kind of sociolinguistic model historically, given that we have only written records available and limited knowledge of social ties. While theory has been productively applied to the design and analysis of corpora, the key complementary aim for corpus-based historical studies is to inform the development of current linguistic theory. Sinclair (2004: 10) puts out a call to corpus linguists to explain how their findings require a revision and an enrichment of a range of linguistic theories: The axioms and apparatus and traditions of linguistic theories that were devised before corpus evidence became available need to be revised in the light of that evidence, because it shows that a language in use does not behave in the ways predicted by the theories.

Sinclair even comes close to arguing for a theory-free approach to corpora, so that the corpus linguist can avoid any theoretical impositions on the categorization of the data from prior assumptions about syntax, semantics, or other linguistic domains. While we agree with Sinclair that the interpretation of corpus data should engage theory, we also believe that theory plays a valuable role in formulating initial research questions, in devising searches, and even in designing a corpus itself.10 However, at present historical corpus studies do not always engage theory successfully on several fronts: sometimes a framework is invisibly incorporated into a project design and not fully articulated; other times a framework is adopted and articulated but applied wholesale, with no critique of the assumptions of the theory; and at times the results of a study, no matter which framework was initially adopted, are not brought to bear on any relevant language theories. Historical corpus studies will benefit from our efforts to articulate the theories that inform our research questions, to evaluate those frameworks, and to communicate the relevance of our conclusions to the theories of appropriate linguistic sub-fields. 10

Moreover, it does not seem possible to devise a theory-free approach to corpora. Whenever a search is designed, one must always decide what linguistic units matter – be they graphemes, morphemes, words, phrases, bundles, etc. – as well as which variables (social, dialectal, generic, etc.). The use of any of these items already presumes some sort of theoretical influence in defining which units and variables are worth investigating.

The Importance of Historical Corpora, Reliability, and Reading

31

Terttu Nevalainen and Helena Raumolin-Brunberg’s pioneering work on historical sociolinguistics provides one model. Their book Historical Sociolinguistics (2003) includes many detailed studies of particular linguistic developments (e.g. the replacement of ye by you, the decline of multiple negation), examined within a sociolinguistic framework that considers factors such as a speaker’s age, rank, and gender; the relationship of the speaker to the addressee; and the genre of the text. In the opening chapter, they set the theoretical bar higher than these individual studies: “As the work progresses, as many sociolinguistic ‘universals’ will be put to the historical test as possible” (Nevalainen/Ramoulin-Brunberg 2003: 11). The individual studies not only provide new insights about specific historical developments in English; they also confirm, complicate, and question broader sociolinguistic conclusions. For example, some historical studies of changes in the Tudor and Stewart periods support the broad conclusion that women often lead in linguistic changes; however, women’s historical access to education complicates any such generalization and a change such as the decline of multiple negation, which seems to have been a change from above promoted by legal language, was led by men. This finding can usefully be applied to sociolinguistic studies of contemporary societies as it points to social factors that must be considered in conjunction with broad categories such as gender. In some cases the potential theoretical conversations between diachronic and synchronic corpus-based studies can lead to surprising connections, compelling us to ask new questions about the organization of the mental lexicon, of grammar, and of language in a variety of historical periods. Compare the recent synchronic work by John Sinclair with historical studies of lexical bundles by scholars such as Merja Kytö and Jonathan Culpeper (2002, 2005). Sinclair (2004) demonstrates how corpus work has problematized the common assumption that every word in every language can be consistently classified into a limited number of parts of speech. He suggests that, for any large-scale classification to be possible, some words and wordtypes must be excluded altogether – that is, they must be deemed unclassifiable in terms of parts of speech. The implications of this proposition go far beyond the need to retool part-of-speech taggers; it suggests that the mind may organize language in ways that are far

32

Anne Curzan / Chris C. Palmer

more complicated than what theories of word classes currently allow. For this reason, Sinclair (2004: 5) rejects the pre-tagged individual token as a unit of analysis and instead relies on what he calls a lexical item, “one or more words which together realise a single unit of meaning”. Kytö and Culpeper’s research on lexical bundles similarly investigates units of analysis other than the tagged word or phrase. Their work raises provocative questions about whether the bundle, instead of the word, should be a fundamental element in historical linguistic studies investigating both semantic and grammatical developments. It should also make us consider the role these bundles could play in the mental organization of the lexicon and the grammar. Working with corpora in any period offers linguists the opportunity to rethink their theoretical assumptions about language. As Sinclair (2004: 4) reminds us, “it is now a priority to model language structure as revealed by insights directly derived from data” instead of simply allowing “external analytical schema [to be] imposed on the data”. Historical corpus linguistics is richer through the application of relevant linguistic theory and promises to make theory answerable to its findings. Through genuine conversation between synchronic and diachronic studies and between theory and corpus analysis, the impact will be multi-directional (theory will inform data, and data will improve theory) and the benefits mutual.

6. Conclusion These are exciting times for those of us who study change in the English language. As historical linguists who employ corpus-based methodologies, we must, however, use judgment and caution in how we conduct our searches and how we reach our conclusions – whether we are using small, principled corpora or large, untagged databases. The strength of corpus-based research is verifiability and accountability. We must try to ensure that we carry out our research in ways that can be verified if possible; this may require us to check original sources and to read full texts to corroborate our interpretation

The Importance of Historical Corpora, Reliability, and Reading

33

of the textual portions retrieved in our searches. We must then document and articulate the decisions we have made as we leap over each pitfall and skirt each obstacle along the way. At the same time, we must stretch our analyses to bring them into dialogue with the theoretical developments in other areas of linguistics. Linguistic theory shapes the research questions we ask, and it must also guide the objectives we set for our analysis.

References Bailey, Richard W./Moore, Colette Forthcoming. Henry Machyn’s English. Studies in the History of the English Language III. Berlin/New York: Mouton de Gruyter. Biber, Douglas/Conrad, Susan/Reppen, Randi 1998. Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press. Crampton, Georgia Ronan 1993. The Shewings of Julian of Norwich. Kalamazoo, Michigan: Published for TEAMS in Association with the University of Rochester by Medieval Institute Publications, Western Michigan University. Cusack, Bridget (ed.) 1998. Everyday English 1500-1700: A Reader. Ann Arbor: University of Michigan Press. Dalton-Puffer, Christiane 1996. The French Influence on Middle English Morphology: A Corpus-Based Study of Derivation. New York: Mouton De Gruyter. Fillmore, Charles J. 1992. ‘Corpus Linguistics’ or ‘Computer-Aided Armchair Linguistics’. In Svartvik, Jan (ed.) Directions in Corpus Linguistics: Proceedings of Nobel Symposium 82. Berlin/New York: Mouton, 35-60. Fitzmaurice, Susan 2000. The Spectator, the Politics of Social Networks, and Language Standardization in Eighteenth-Century English. In Wright, Laura (ed.) The Development of Standard English, 1300-1800. Cambridge: Cambridge University Press, 195-218.

34

Anne Curzan / Chris C. Palmer

Hunston, Susan 2002. Corpora in Applied Linguistics. Cambridge: Cambridge University Press. Kytö, Merja/Culpeper, Jonathan 2002. Lexical Bundles in Early Modern English Dialogues: A Window into the Speech-Related Language of the Past. In Fanego, Teresa/López-Couso, María José/Pérez-Guerra, Javier/Méndez-Naya, Belén/Seoane, Elena (eds) Sounds, Words, Texts and Change: Selected Papers from the 11th ICEHL, Santiago de Compostela, 7-11 September 2000. Current Issues in Linguistic Theory 224. Amsterdam/Philadelphia: Benjamins, 45-63. Kytö, Merja/Culpeper, Jonathan 2005. Exploring Speech-Related Early Modern English Texts: Lexical Bundles Revisited. Paper presented at ICAME-26/AAACL-6. Ann Arbor, Michigan. May 12-15. McEnery, Tony/Wilson, Andrew 1996. Corpus Linguistics. Edinburgh: Edinburgh University Press. Miller, Gary D. 1997. The Morphological Legacy of French: Borrowed Suffixes on Native Bases in Middle English. Diachronica 14/2, 233-264. Nevalainen, Terttu/Raumolin-Brunberg, Helena 2003. Historical Sociolinguistics. London: Pearson. Rissanen, Matti 1989. Three Problems Connected with the Use of Diachronic Corpora. ICAME Journal 13, 16-19. Schmid, Susanne 1999. ‘Alps Piled on Alps’ – The Romantic Sublime and the Keyword Search Function in Chadwyck-Healey’s English Poetry Full-Text Database. Anglistik & Englischunterricht 62, 357-374. Sinclair, John 2004. Language and Computing, Past and Present. In Proceedings of the 14th European Symposium on Language for Special Purposes. Available at .

Old English and Middle English

JOHAN VAN DER AUWERA / MARTINE TAEYMANS

More on the Ancestors of Need1

1. Introduction The present-day English verb need has attracted a lot of attention, especially because need comes in two versions, (i) a full verb with a third person indicative present -s, do for negatives and questions, and a to infinitive, and (ii) an auxiliary, without the -s, without do, without infinitival to, and also without a positive affirmative use. (1a) (1b) (1c)

Does he need to see this? He does not need to see this. He needs to see this.

(2a) (2b) (2c)

Need he see this? He need not see this. *He need see this.

Studies hail from the partially overlapping fields of English modals (Duffley 1994) or modals in general (van der Auwera 2001), grammaticalization theory (Taeymans 2004), and negative polarity semantics (van der Wouden 2001). In this chapter we turn to the origin and the early development of this verb. We will see that there are puzzles there too, and that some are relevant for understanding the present-day problems. The old 1

An earlier version of this chapter appeared as van der Auwera, Johan/Taeymans, Martine (2004). Thanks are due to the Research Council of the University of Antwerp for supporting this work with a GOA grant (20032006). Special thanks are also due to Louis Goossens and to Mike Hannay. The glosses use the following abbreviations: ACC ‘accusative’, DAT ‘dative’, DEF ‘definite’, F ‘feminine’, GEN ‘genitive’, IND ‘indicative’, M ‘masculine’, PRS ‘present, ‘PTR’ ‘preterite’, SUBJ ‘subjunctive’, 2 ‘second person’, and 3 ‘third person’.

Johan van der Auwera / Martine Taeymans

38

puzzles have received far less attention, though there is a recent quickening of interest (Loureiro Porto 2002, 2003; Molencki 2002, 2005). We will discuss the early history of need in terms of two double replacements. Each time we will start out from Visser (1969) as a representative of what may be called the ‘accepted view’. The accepted view treats need as being involved in two replacements, and we subscribe to it, but we will add that they are actually double replacements. In section 2 we will briefly present the sources used for this study. In section 3 we will analyse how personal need replaced an impersonal need. In the final and fourth section we will discuss how need replaced the verb þurfan.2

2. The corpora used The present chapter is based on the entries in the standard dictionaries and data retrieved from three diachronic corpora. The frequency data presented in this study are based on the Helsinki Corpus of English Texts (henceforth HC) for Old English (henceforth OE), and for Middle English (henceforth ME) the Penn-Helsinki Corpus of Middle English Texts (henceforth PPCME2), which is further subdivided into Early Middle English (henceforth EME) (1150-1350) and Late Middle English (henceforth LME) (1350-1500) parts. The total number of words in the OE and EME part of the Helsinki and Penn-Helsinki corpora are roughly the same (413,250 vs. 352,089 words respectively), which allows us to draw comparisons between the two. In order to be able to compare findings of the LME period (consisting of 803,876 words in total) with the other two periods, we divided the number of occurrences by 2.28. In addition, we also consulted the Dictionary of Old English Corpus (henceforth DOEC) to expand the rather limited data set provided by the HC. It comprises almost all 2

Strictly speaking, the replacements did not involve need but rather its OE and ME ancestors neodian, nedan, etc. If the context is clear, we prefer the easier wording. Of course, in the case of þurfan there is no easy wording, for the verb did not survive.

More on the Ancestors of Need

39

extant OE manuscripts and allows to track changes in different versions of the same text, but is not suitable for quantitative analyses.

3. A double replacement A good starting point for the historical syntax of English is Visser (1969). Visser (1969: 1424-1425) points out that the modern verb need derives from an impersonal verb that says that something is necessary for someone. This change happened in Early ME, and the change from an impersonal to a personal use affected many verbs. The change is represented in (3), both for need and for like. (3a)

me nedeth Ļ I need

(3b)

me lyketh Ļ I like

The OE ancestor was identified by Visser as neodan,3 and he cites two examples, both from a text known as the Monasterialia Indicia (http://www2.sjsu.edu/depts/english/Indicia.htm). (4) is one of them. (4)

Gyf þe smælre candelle if 2.DAT.SG small.GEN.F.SG lamp.GEN.SG geneodige need.SUBJ.PRS.3SG ‘If a small candle is necessary for you …’ (Visser 1969: 1424)

Interestingly, the form neodige is strictly speaking a subjunctive prefixed form of a so-called ‘class 2’ verb in -ian. So one should 3

Visser (1969) actually refers to (ge)neodan, i.e., he makes clear that the verb may carry the prefix ge-. We have no indication that the presence or absence of this prefix is relevant for what follows. So we will only refer to unprefixed forms, unless we are forced by an example.

40

Johan van der Auwera / Martine Taeymans

associate it with a verb neodian and not with neodan, which would be ‘class 1’. Indeed, other classical sources – the lemmas in Bosworth/Toller’s Anglo-Saxon Dictionary (henceforth BTD), the Supplement to Bosworth/Toller’s Anglo-Saxon Dictionary (henceforth BTS), Clark Hall’s Concise Anglo-Saxon Dictionary (henceforth Clark Hall), Kurath/Kuhn/Lewis’s Middle English Dictionary (henceforth MED), and the Oxford English Dictionary (henceforth OED) – associate the impersonal meaning verb with neodian. The BTD and Clark Hall further point out that this verb is also attested as neadian. None of the dictionaries lists any neodan verb. The OED also mentions that the OE impersonal neodian/neadian was rare. The fact that Visser and the BTD each supply only two OE examples and the OED just one (which is furthermore identical with one of the BTD examples) points in the same direction. The MED further states that impersonal neodian was late OE, a view also found in Van der Gaaf (1904: 20). As a partial check of both the rarity and the lateness of impersonal neodian, we investigated the OE part of the HC, and the DOEC. On a total of 47 relevant uses, there was not a single example of an impersonal neodian or neadian in the HC. The DOEC renders only 6 uncontroversial occurrences with neodian expressing ‘necessity’, all in 11th-century manuscripts. This result accords well with the claims about the rare and late appearance of the impersonal use. What we do find – in all of the 47 cases (see also Loureiro Porto 2002, 2003) – is a personal verb meaning ‘compel’, illustrated in (5) with a Bede example. (5)

Eft se papa afterwards DEF.NOM.M.SG pope.NOM.SG nedde þone abbud compel.IND.PTR.3SG DEF.ACC.M.SG abbot.ACC.SG Adrianus þæt he Adrianus that 3.NOM.M.SG Biscophade onfenge office.of.bishop.ACC.SG take.up.SUBJ.PTR.3SG ‘Afterwards the pope forced the abbot Adrianus to take up the office of bishop.’

More on the Ancestors of Need

41

Is this nedde form also a form of the same neodian or neadian verb? This is not that clear and the literature suggests three points of view: (a)

According to the OED, supported also by the MED, the ‘compel’ and the ‘it is necessary’ verb are not the same. Whereas the ‘it is necessary’ verb would be neodian, the ‘compel’ verb would be neadian or niedan, nidan, nydan and nedan.

(b)

The lemmas in the BTD, followed by Clark Hall, imply a partial identity. Neodian would again be the ‘it is necessary’ verb, nidan and nydan are listed for ‘compel’, but neadian would be possible with both uses. Curiously, the two examples listed for the neadian meaning ‘it is necessary’ have neod- forms, and in this way, the BTD treatment could also be put under (a).

(c)

According to Van der Gaaf (1904: 20-21), Molencki (2002), and implicitly also Loureiro Porto (2002, 2003), there is only one verb. It is attested with various forms and it has both the ‘compel’ and the ‘it is necessary’ use.

None of the analyses mentioned is accompanied by any discussion. Presumably, all authors faced a fairly high degree of formal and orthographic variation, with manifestations for no less than five (infinitival) forms, i.e. neodian, neadian, niedan, nidan and nydan. This variation is not random: it may well be due to there having been two ancient roots, neod- and nead-, and for the verbs, umlaut produced some further variation. Then there might have been additional temporal, dialectal, idiolectal, and scribal variation. Our issue here is merely to question whether there was a discriminatory role for semantics. Did the forms go for a division of labour according to whether the sense was ‘compel’ rather than ‘it is necessary’? Circumstantial evidence makes us inclined to accept the correctness of the polysemous account, and thus also the analysis in (c), i.e. that there was a personal need verb meaning ‘force, compel’ that gave rise to an impersonal verb expressing ‘necessity’. First, there is a remarkable difference in the way the dictionaries treat the relevant verbal forms vs. the associated nominal

42

Johan van der Auwera / Martine Taeymans

ones. The clearest expression of the hypothesis that the personal and the impersonal use have their own verb is found in the OED. The OED further claims that the impersonal verb (neodian) is based on a noun neod, and the personal one (neadian) on a noun nead. However, one will look in vain for two nominal need lemmas. There is only one, but it has a panoply of both meanings, ‘compulsion, force, necessity, need’, and forms, nead, neod, nied, nid, nyd and ned. If there is only one need noun, why should there then be two need verbs? At least, the inconsistency makes the analysis of the verbs suspicious and perhaps the lexicographers have been misled by the fact that the difference between the ‘compulsion’ and ‘necessity’ uses is bound to be more visible in the syntax (valency patterning) of the verbs than in that of the nouns. Nevertheless, a one noun – two verbs analysis is possible. We could be dealing with a polysemous noun associated with two formally similar but not quite identical verbs, with similar but not quite identical uses. The treatment in the BTD and Clark Hall is also suspicious, though there is no true inconsistency now: both the impersonal and the personal verb are associated with one noun. But even for the BTD and Clark Hall the fact remains that the lexicographers treat the verbs and their related nouns in a different way: two lemmas for the verbs and only one for the nouns, even when there is a form (neadian) that is taken to be polyfunctional. This disparity even holds for the MED: it confronts the one lemma of a noun lemmatized as nede with the two lemmas of two verbs lemmatized as neden. Second, it is noteworthy that for OE the ‘it is necessary’ use is both rare and late. Surely, when a verb might be around already and one faces a new and infrequent use, one’s first hypothesis should be that it is a development of this verb’s earlier use. In fact, a scenario developing the ‘it is necessary’ use from the ‘compel’ use seems possible. Consider the late ME Richard Rolle example in (6). (6)

it nedis to hym to do many gud werkis. (Visser 1969: 1425)

It is offered as an example of impersonal ‘it is necessary’. It differs from example (4) in that the need verb is construed with a clausal complement, more particularly, an infinitive. Suppose that the type with

More on the Ancestors of Need

43

the clausal complement is older than the one without. This hypothesis cannot be verified in the OE part of the HC, as it does not have examples for either, but the oldest impersonal example with a clausal complement cited in the OED (also found in Van der Gaaf 1904: 20) is dated at c. 960, with the oldest example of type (4) at 1362. What now prevents one from taking nedis of the ‘impersonal use’ in (6) as a form of the ‘personal’ use, the ‘compel’ use (which definitely was around until late ME – see the entries in the OED and the MED, and the instances in the ME part of the HC), and to derive the impersonal effect from the fact that the verb has an impersonal subject? From this perspective the appropriate gloss would be ‘it compels him to do many good works’. From a semantic perspective, therefore, it is possible to view the ‘it is necessary’ use as deriving from the ‘compel’ use. Note that it is not denied that the polysemy of the noun triggered or facilitated such development. Also, the possibility of a semantic derivation still does not prove that it really happened this way. This semantic derivability is also compatible with a scenario in which the impersonal use was not a direct development of the personal use, but rather a new derivation of the noun, as Van der Gaaf (1904: 20) suggested. The semantic derivability is still relevant here. What we would then claim is that once the polyfunctional need verb with both ‘compel’ and ‘it is necessary’ uses is in place, it is easy to interpret the latter as a use of the former. The two scenarios are represented in (7). (7a) starts off with forms that first only have a ‘compel’ use that later give rise to an ‘it is necessary’ use, all while continuing the ‘compel’ use. For simplicity’s sake we represent the forms with only neadian. (7b) also starts off with the ‘compel’ forms but in this case they are later joined by ‘it is necessary’ forms. We use neadian for ‘compel’ and neodian for ‘it is necessary’. (7a) neadian

neadian

‘compel’

‘compel’

neadian ‘it is necessary’

44

Johan van der Auwera / Martine Taeymans

(7b)

As to the later history, in ME the question whether or not the two senses are to be paired with different forms does not exist. All the forms end up as neoden, neden, nede or neede. The ME impersonal ‘it is necessary’ use will also spawn the modern personal ‘need’ use. By the end of the ME period, the ‘compel’ use will disappear – the latest attestation in the OED is 1496. The impersonal use will also disappear, but only in the Modern period. For the sub-pattern of an impersonal need with only a dative of person, for instance, the OED gives sentence (8) from 1691 as the last example. (8)

What need us some Instances abroad.

In (9) we complete the two scenarios sketched in (7). It goes without saying that the schema is a simplification. It does not, for instance, describe the split between auxiliary and full verb. (9a)

More on the Ancestors of Need

45

(9b)

Note that ME will have two personal uses, the new ‘need’ and the old ‘compel’. Whether one treats the need verb as homonymous or polysemous, it is clear that their coexistence could damage communication. Consider (10). (10)

I needed him.

Its ME wording allowed both the ‘I forced him’ use and the modern ‘I needed him’ use. Apparently, the forces promoting the new use were stronger than the ones keeping the old one, probably because the new construction arose in a grand personalization reanalysis, affecting a large number of verbs. In any case, under analysis (9), one can say that modern personal need replaced two earlier constructions: an impersonal ‘it is necessary’ need and a personal ‘compel’ need. Yet one could also say that the old need verbs did not disappear completely. The new need is a personal verb, just like the old need: it shares much of its syntax. With respect to meaning, however, the new need is more like the impersonal need. A quantitative summary evidencing the double replacement of (i) a personal by an impersonal need verb in ME, and (ii) the impersonal by a new personal need verb is provided in Figure 1. While the old personal need meaning ‘force, compel’ is lost from the

Johan van der Auwera / Martine Taeymans

46

language in late ME, the new personal need verb will only oust the impersonal verb in Early Modern English.

47

need ‘force, compel’ (pers.)

18 4

need ‘need’ (imp.)

OE

8

EME

22

LME

need ‘need’ (pers.)

1 15 0

20

40

60

Figure 1. The verb need in OE (HC), EME and LME (PPCME2).

4. Another double replacement In section 3 we analyzed the replacement of need verbs by other need verbs. The change primarily concerned the meaning and the syntax of the verbs. In this section we will consider the way the modern personal need replaced a personal þurfan verb. A good part of the story is well known. Thus Visser (1969: 1423) points out that the OE counterpart to modern personal need was the OE preterite present þurfan. Then, with the rise of impersonal need meaning ‘it is necessary’ in EME, and the later personal need ‘need’ in LME, þurfan and need became competitors in the semantic field of necessity. The bar graph in Figure 2 shows that the use of þurfan sharply declined from OE till LME, while the more recent impersonal and personal need verbs steadily gained in frequency and eventually replaced the older þurfan in LME. In the ME period þurfan furthermore became homonymous with the descendant of another OE preterite present, viz. durran ‘dare’. By the end of the 15th century, the homonymy had disappeared: the resultant form survived with the ‘dare’ meaning, while the meaning of þurfan was taken over by need. Dare ‘dare’ survives up to this day,

More on the Ancestors of Need

47

and, surprisingly, it can function both as an auxiliary and as a main verb, not unlike need.

9

need ‘need’

37 OE EME 86

þurfan ‘need’

LME

23 5

0

20

40

60

80

100

Figure 2. The verbs þurfan and need (pers. + imp.) in OE (HC), EME and LME (PPCME2).

In (11) we repeat the part of the scenarios of (9) that is common to both the a. and the b. version, and we add the replacement story concerning þurfan and durran. (11)

neadian

nede

‘compel’

‘compel’

neadian

nede

need

‘it is

‘it is necessary’

‘it is necessary’

necessary’

nede

need

need

‘need’

‘need’

‘need’

þurfan

‘need’ durran

dare

dare

dare

‘dare’

‘dare’

‘dare’

‘dare’

48

Johan van der Auwera / Martine Taeymans

The homonymy appeared in (the) other Germanic languages as well (see also Birkmann 1987). In German, the winning form dürfen continues the ‘need’ meaning, and it seems that Dutch continues both: there is both a derven ‘need’ and a durven ‘dare’. Why is it that the English homonymy was settled in favour of the ‘dare’ meaning? In search for an answer, we will stumble on yet another replacement. Like the modern auxiliary need, OE þurfan seems to have been a negative polarity item (Visser 1969: 1423-1424). This begs the question of how OE expressed ‘need’ in a positive affirmative sentence, i.e., the equivalents to modern (1c). As Visser (1969: 14301431) points out, we find constructions with be or have, of the type it is (him) need to go and he has/had need to go. Visser supplies examples with the noun neod ‘need’, but as the OE part of the HC will show and as has been pointed out by Molencki (2002: 375-378), the neod pattern alternated with patterns involving the noun þearf ‘need’ and even a complex noun nedþearf ‘need’. Schematically: (12a) Us is þearf þæt … (12b) Us is nedþearf þæt … (12c) Us is ned þæt ‘We need …’ (13a) We habbaþ þearfe… (13b) We habbaþ nedþearfe… (13c) We habbaþ neode… ‘We need…’

These constructions were not restricted to positive affirmative sentences. One should consider them neutral with respect to polarity. The finer details of the alternation remain to be studied, but this much seems clear from the data. Already in OE and thus before the ancestor of the verb need varied with þurfan, a neod etymon was in alternation with a þearf etymon in the nominal domain, and the two even combined in nedþearf (cf. Figure 3). However, except for a single example of be þearf in the Lambeth Homilies, which is also the last example quoted by the OED and the MED, constructions with þearf and the mixed nedþearf were no longer attested in EME (cf. Figure 4). The noun need on the other hand, was fully operational in impersonal constructions with be (58 occurrences), but also in

More on the Ancestors of Need

49

personal constructions with have (36 occurrences). It had almost fully replaced þearf, and was found in contexts where þearf would have appeared earlier.

1

habban neode

8

beon neod 5

habban nedÞearfe

OE

16

beon nedÞearf

21

habban Þearfe

89

beon Þearf 0

20

40

60

80

100

Figure 3. Personal and impersonal constructions with þearf, nedþearf and neod in OE (HC).

1 have need

36

8 be need

OE

58

EME 89 be þearf

1 0

20

40

60

80

100

Figure 4. Personal and impersonal constructions with þearf and neod in OE (HC) and EME (PPCME2).

It seems plausible to assume that the replacement in the nominal domain facilitated the later association and eventual replacement in the verbal domain. It is also clear that the early nominal constructions with a need noun with be or have either did not survive to the present

50

Johan van der Auwera / Martine Taeymans

day or have become marginal, and that at least one of the replacements is the modern verb need, this time in its full verb form. In sum, modern need replaced a negatively polar þurfan and a set of polarity neutral nominal constructions. Modern need inherits features of both, though. The auxiliary need is a polarity negative need and the full verb is polarity neutral.

5. Conclusion We have argued that the present-day English verb need replaces at least four earlier constructions: (i) a personal need verb meaning ‘compel’, (ii) an impersonal need verb meaning ‘it is necessary’, (iii) a non-need verb meaning ‘need’ in negative polarity contexts and (iv) a set of polarity neutral nominal constructions meaning ‘need’. With the help of a set of historical corpora and dictionaries, we further argued that the impersonal need is just an impersonal use of the personal need verb meaning ‘compel’. Then the impersonal uses were reanalyzed giving the modern personal ‘need’ meaning, and the latter ousted the earlier ‘compel’ verb. Finally, the new personal need verb further replaced an old þurfan verb, which had the same meaning, and which suffered from a homonymy with durran. This replacement was facilitated by an earlier replacement of nominal constructions using þearf and neod.

References Birkmann, Thomas 1987. Präteritopräsentia. Morphologische Entwicklungen einer Sonderklasse in den Altgermanischen Sprachen. Tübingen: Niemeyer. BTD: Bosworth, Joseph/Toller, Thomas Northcote 1989 (1898). An Anglo-Saxon Dictionary. Oxford: Oxford University Press.

More on the Ancestors of Need

51

BTS: Toller, Thomas Northcote 1921. Supplement of BTD. Oxford: Oxford University Press. Clark Hall, John Richard 1960. A Concise Anglo-Saxon Dictionary. With a supplement by Herbert D. Merritt. Toronto: University of Toronto Press. DOEC: Healey, Antonette diPaolo/Haines, Dorothy/Holland, Joan/McDougall, Ian/Xin, Xiang (eds) 2004. The Dictionary of Old English Corpus in Electronic Form. Toronto: DOE Project 2004 (on CD-ROM). Duffley, Patrick J. 1994. Need and Dare: The Black Sheep of the Modal Family. Lingua 94, 213-243. HC: The Helsinki Corpus of English Text: Diachronic and Dialectal. Helsinki: University of Helsinki. Loureiro Porto, Lucia 2002. Gramaticalización de Algunos Modales de Necesidad en la Historia del Inglés: Un Estudio de Corpus. In Interlingüística 13 (Actas del XVII Congreso Internacional de Jóvenes Lingüistas, Alicante, 18-20 abril de 2002). Alicante: Universidad de Alicante, 393-404. Loureiro Porto, Lucia 2003. Semantics in the Old English Predecessors of Present-day English Need: Gradience in Root Necessity. In Palacios Martinez, Ignacio Miguel/López Couso, Maria José/Fra Lopez, Patricia/Seoane Posse, Elena (eds) Fifty Years of English Studies in Spain (1952-2002). A Commemorative Volume. Santiago de Compostela: Servicio de Publicacións e Intercambio Científico, 321-327. MED: Kurath, Hans/Kuhn, Sherman/Lewis, Robert E. (eds) 19522001. Middle English Dictionary. Ann Arbor, Michigan: University of Michigan Press. Molencki, Rafaá 2002. The Status of Dearr and Þearf in Old English. Studia Anglica Posnaniensia 38, 363-380. Molencki, Rafaá 2005. The Confusion between Tharf and Dare in Middle English. In Schendl, Herbert/Kastovsky, Dieter/Ritt, Nicholas (eds) Rediscovering Middle English Philology. Bern: Peter Lang, 147-160. OED: The Oxford English Dictionary on CD-ROM 2002. Oxford: Oxford University Press. PPCME2: The Penn-Helsinki Parsed Corpus of Middle English. University of Pennsylvania.

52

Johan van der Auwera / Martine Taeymans

Taeymans, Martine 2004. An Investigation into the Marginal Modals Dare and Need in British Present-day English: A Corpus-based Approach. In Fischer, Olga/Norde, Muriel/Perridon, Harry (eds) Up and Down the Cline – The Nature of Grammaticalization. Amsterdam: Benjamins, 97-114. Van der Auwera, Johan 2001. On the Typology of Negative Modals. In Hoeksema, Jack/Rullmann, Hotze/Sánchez-Valencia, Victor/van der Wouden, Ton (eds) Perspectives on Negation and Polarity Items. Amsterdam: Benjamins, 23-48. Van der Auwera, Johan/Taeymans, Martine 2004. On the Origin of the Modal Verb Need. In Aertsen, Henk/Hannay, Mike/Lyall, Rod (eds) Words in their Places: A Festschrift for J. Lachlan Mackenzie. Amsterdam: Vrije Universiteit Amsterdam, Faculty of Arts, 323-331. Van der Gaaf, Willem 1904. The Transition from the Impersonal to the Personal Construction in Middle English. Heidelberg: Winter. Van der Wouden, Ton 2001. Three Modal Verbs. In Watts, Sheila/West, Johnatan/Solms, Hans Joachim (eds) Zur Verbmorphologie Germanischer Sprachen. Tübingen: Niemeyer, 189-210. Visser, Frederick Th. 1969. An Historical Syntax of the English Language. Part Three, First Half. Syntactic Units with Two Verbs. Leiden: Brill.

MANFRED MARKUS

Spotting Spoken Historical English: The Role of Alliteration in Middle English Fixed Expressions

But trusteth wel, I am a Southren man, I kan nat geeste ‘rum, ram, ruff’ bi lettre. (Chaucer, ‘The Parson’s Prologue’, Canterbury Tales)

1. Introduction Alliteration, i.e. the marking of the stressed, mostly word-initial syllables and morphemes, by the repetition of identical or similar sounds or sound combinations (cf. Knowles 1987: 84), is well-known in quite a number of contexts: from occasional literary titles (Love’s Labour’s Lost, Of Mice and Men) to stage names, such as Kevin Costner, and from the poetry and opera texts of alliterative afficionados, such as Gerard Manley Hopkins and Richard Wagner respectively, to everyday idiomatic phrases of Present-day English (henceforth PrE), such as to beat about the bush. But apart from such examples, alliteration, in the shape of the alliterative long line, is a metrical feature limited to the Middle Ages, to Old English (henceforth OE) poetry and the Middle English (henceforth ME) literature of the so-called Alliterative Revival. From the 13th century on, alliteration was generally replaced by end rhyme in most European countries, including Britain. The impulse for this development came from the Romance languages, in particular Italian and French, with their generally final or penultimate word stress. The common opinion about alliteration in English today seems to be that it

54

Manfred Markus

is, if not as dead as a doornail, still a marginal and merely decorative formal device. This may be the case as far as Modern English is concerned. Yet it seems fair to say that historical English and, in particular, ME (apart from OE, of course) show more evidence of alliteration as an important factor of language motivation than has generally been observed. The historical role of alliteration in Germanic poetry is well-known and has, of course, been studied in some depth. 1 Yet alliteration was also an essential feature in West Scandinavian, particularly Norwegian prose of the Middle Ages (cf. Lexikon des Mittelalters, I, 431, ‘Alliteration’), and one would like to claim that it played a more than marginal role in ME prose as well and, it may be concluded, in spoken everyday language. What at first looks like mere decoration in phrases and idioms today seems to have been an important factor in ME word formation and phraseology. Research literature on alliteration in spoken ME, beyond verse literature, is sparse to non-existent.2 This study therefore tries to trace the role and mechanisms of alliteration in non-metrical ME, using the 130 texts of the Innsbruck Middle English Prose Corpus3 (henceforth ICAMET) as a text basis. While prose does not fully represent spoken English, it allows us to get as close as we possibly can. With the lack of verse lines, alliteration in prose marks cohesion within other units of speech, namely within words, phrases and clauses. Compounds such as busybody and ME love longing (bedesbydding, ber-bayting) fall into the first category. Alliterative coordinative pairs, such as better and better, represent the main subgroup within phrases, but since this type has been dealt with elsewhere (Markus Forthcoming), the present study will focus on ‘subordinate constructions’ (Bloomfield 1933: 195), headed by nouns, 1 2

3

In Markus (Forthcoming) I have given a short survey of the research situation, with reference to further literature. The most extended collection of alliterative formulae in ME still is Oakden, but it refers to prose only sparingly (Oakden 1935: 15-20 and 361-363). Also cf. Minkova (2003: 81). Available on a CD-ROM via Manfred Markus, English Department, University of Innsbruck, Austria, and from the ICAME HIT Centre in Bergen, Norway. The full corpus comprises 159 files from ca. 1100 to 1500 (some texts have been split), with a total of nearly 5 million words. The Sampler, which is copyright-free, comprises 119 works (in 131 files).

Spotting Spoken Historical English: The Role of Alliteration

55

adjectives or verbs. The third type of alliteration, alliterative cohesion within clauses (type: veni, vidi, vici)4 is difficult to trace by computer and is therefore excluded from the present discussion. The study will topicalise the following phrase types: first, it will concern itself with noun and adjective-headed constructions (section 2), then with verb-headed ones (section 3), with the sub-chapters focusing on ‘complex predicates’ based on some of the main English verbs (to make/do/take/get/have/bear/cast). A complex predicate is a verb phrase which has a verb idiomatically complemented by an object or prepositional object (type: to make money). Section 4 will then widen the horizon by discussing the role of ME alliteration in the feature of spokenness and within the historical context. In the conclusion, I will dispute the traditional reliance of historical English linguists on verse poetry and give reasons why we should need a grammar of spoken historical English.

2. Noun- and adjective-headed phrases Type (a), N’s N, applied to the letter b, is poorly represented in the corpus, namely by only 27 text passages, seven of which refer to Belial’s brother (4) and Belial’s beadle (3). Some other examples of this type refer to proper names. Otherwise the Saxon genitive does not seem to encourage fixed expressions, perhaps because the possessive of the type ‘John his father’ was also common in later ME (cf. Bøgholm 1939: 206), for which see (b). Type (b): In this type (N x N), x stands for all kinds of interceding word classes between the nouns, though the pattern mainly dealt with here is N prep N. Modern expressions such as tit for tat and tea for two suggest a high frequency of this type in ME. Indeed, b* for b* provides four occurrences of blood for blood (meaning something like ‘a tooth for a tooth’) and fifteen occurrences of body for body, the equivalent of modern ‘man against man’. A query b* his b* provides 4

This type also includes comparative subordinate clauses introduced by as: as bold as brass (cf. Quirk et al. 1985: 1137).

Manfred Markus

56

an output of 35 cases of b* his brother. To avoid this output to be merely accidental, I checked the same query pattern with the letter d (d* his brother), and received only five occurrences; the letter f provided eleven co-occurrences.5 Obviously alliterative cohesion was a driving force behind phrasal forms. The best candidate for checking this hypothesis is, of course, the ‘Romance’ genitive N of N. And, indeed, 281 alliterative b* of b* occurrences confirm the vitality of the pattern, from Bacham of Bliberg and other proper names to bath of blood (2), the bishop of Bristo, the bastard of Burgogne and bushes of briars. Crossing out even the samples with only one proper name (such as bishop of Bangor), there is still an output of 143. The most frequent phrases are (with the numbers of occurrences added): beauty of body (8) blood of beasts (3) broth of beef (19) bone of bitterness (6) business of body (4) breaking of bread (4) bodies of bondemen (3) burthen of barn (5) blessing of b* (4) Some of the collocations turn out to be fairly flexible. Thus, blood often co-occurs with beast, but also with bath, brain and bread (ca. 10x altogether). Type (c): The third type of an NP with alliteration is the one with an adjectival attribute (type: friendly fire). Epithets are so common a feature in classical rhetoric that alliteration as a decorative or emphatic means of expression can be expected in this syntagm. And, indeed, the computer provides more than 300 phrases of this type, from baleful beast (3x) to brown bread (12). Baleful beast, 5

For the sake of simplification I have included cases of apposition (such as Frederik, his brother), which are of course syntactically different from the possessive construction Frederik his brother; but syntactic ambiguity is of no concern in our context.

Spotting Spoken Historical English: The Role of Alliteration

57

which is a metaphor for the devil, is also referred to as the bitter beast (2) and, in one case, the bitter baleful beast. Bitter otherwise also cooccurs with besmen (normalised besom, = ‘bundle of rods used as an instrument of punishment’, see OED), behreowsing (‘repentance’) and bale, all of them suggesting the lexical field of sin and evil. Other prominent collocates of the output list are beloved brother (31x; mainly stemming from the formula of address in 15thcentury letters), combinations with blessed or blissful, moreover with bodily, best and black. Blessed and blissful frequently combine with birth (9), blood (9), body (80) and bread (3), and in one case even tautologically with bliss (‘blessed bliss’). All these expressions have the familiar Christian meaning, with blessed body, for example, suggesting the body of Christ and the ceremony of the Eucharist. Given this background one may also feel encouraged to interpret best blood (3) and best body (4) as fixed expressions. By contrast, the use of this common-or-garden word best in combination with nonsuggestive words like beast (6) or blanket (1) does not allow the assumption of fixed expressions. Along these theoretical lines, it seems fair to split the remaining alliterative syntagms into fixed expressions and apparently accidental coinings. In the first group one could include the following: blind beholding (7) bloy Bretaigne (13) busy beholding (3) bodily behaviour/bemeaning/bearing (7) bodily business/busiship (3) bodily bread (5) black blood (5) black bile (4) black bread (1)/brown bread (12) black bays (= ‘black berries’) (3) bright(est) bower (5) brenning brond (3) blind bayard (1) bodily battle (1)

58

Manfred Markus

The criteria for inclusion in this list are mainly quantitative (the threeor-more rule), but in the case of the last two samples also qualitative. When the adjective tends to be redundant, as in bodily battle (or brenning brond) or cannot be interpreted literally (as in black blood), the alliterative form seems to have dominated over meaning. The same is true when the whole expression shows metaphorical shift or is closely connected with a concept based on a metaphor. Blind bayard is not only the bay (= ‘brown’) horse that bayard literally refers to, but is also applied to people (cf. OED, see 3). And bodily bread only makes sense as opposed to the blessed bread, with its Christian connotations. Finally, black vs. brown (and white) bread is also a stylised concept of the facts involved, a concept that has survived to the present day, since some non-white yellowish bread referred to as ‘brown’ in Britain is not really that colour, and its counterpart is called ‘grey’ by some nations on the European continent (cf. German Graubrot). Alliteration seems to have influenced the concept of reality. On the other hand, ICAMET contains a number of b-initial adjective-noun phrases which cannot claim to be fixed expressions, at least not at first sight. Black box, black boy and black buck are such collocations, so are best blanket (mentioned above) and broken bones. Alliteration plays a decorative or coincidental role in these phrases, but there is no evidence for them to have been fixed in the Middle Ages. Black box developed its special modern meaning much later (according to the OED, it started as Royal Airforce slang), black boy in the sense of a ‘Negro servant’ (definition of the OED) was only coined after the beginning of Western slavery, and black buck in medieval contexts cannot possibly have signified an antelope yet (cf. OED); as regards best blanket and broken bones, they mean, in their contexts, exactly what they suggest. In other words, these and many other such findings have to be considered mere syntactic groups as long as their character as a fixed expression has not been proved individually. As regards adjective-headed alliterative phrases, the frequent formulaic phrase best beloved (brother, or friend) 6 suggests that an 6

Hyphenated best-beloved, which occurs only once, is a compound rather than a phrase, but the distinction does not matter in the present context and is

Spotting Spoken Historical English: The Role of Alliteration

59

adjective complemented by an adverb also contributed to the medieval wealth of alliterative phrases. There are, in fact, some twenty occurrences of best beloved (in its different spellings) in the corpus. To these it seems fair to add four cases of better beloved. And while searching for better and best, one finds further adjectives, mainly participles with better or best: best-bitornen (1), better/best biseyn (‘beseen’) (3), best betrusted (2), better beset (‘in a better position’) (4), better believed (3) and a number of cases of better or best coupled with burning/borne/breathed or bestowed. Apart from better/best there are hardly any other adverbs complementing an adjective (including participles) with initial b, probably as a result of the relatively low frequency of the syntagm adv + adj. Bitterliche breokinde (i.e. ‘bitterly breaking’), however, seems a good candidate for a fixed expression, since bitterly also occurs headed by the verb to break (see below). Otherwise, only brightly, partly in the unmarked version bright, occurs six times in an alliterative pattern, four times in brightly burning (e.g. bryZte brennynge swerd), and twice in the phrase bright blikinde (‘brightly looking’). Coming now to the occurrences of noun-headed alliterative clusters based on other letters than b, the following tables provide a survey: syntactic categ. N* of N*

Phrase c*/k*/ch* of c*/k*/ch* d* of d* f* of f* g*/3* of g*/3* L M N P R S t/th (excl. to/the/they/them) W

Total 1175 480 218 851 252 742 102 476 78 923 2056 328

occur. 3 or more 548 310 488 599 89 222 5 27 15 315 17 52

Table 1. Frequencies of occurrence of alliterative N* of N*. difficult to draw in ME (cf. his best be loved fareth well in one of the texts of the corpus). Separate, hyphenated and one-word spellings have therefore been included.

Manfred Markus

60 syntactic categ. Adj N

phrase (a-z)*-ful N (a-z)*-ly N (a-z)*-ous N

Total 307 5092 96

occur. 3 or more 140 630 53

Table 2. Alliteration in NPs with attributive adjectives. syntactic categ. many/much N N=m* N=b* N=d* N=f* N=g* N=k* N=l*

phrase 4685 870 255 308 339 382 108 189

Total

occur. 3 or more

N=n* N=p* N=r* N=s* N=t* N=th* N=w*

124 369 120 555 326 315 425

Table 3. Many/much as attributes alliterating with the heading noun.

The tables show that in the syntagms N of N and Adj N alliteration occurs very frequently. Gleaning out the phrases with at least three occurrences (column 4) still results in long lists which cannot be quoted here in full. They are, however, provided in the Appendix. The queries for adjectives were, for reasons of retrievability, limited to a subset, namely adjectives marked by -ful, -ly and -ous, and in view of their generally high frequency, the quantifiers much and many were also considered. While quantity of occurrence is not the only criterion of idiomaticity, contrasting output figures with those of alternative options can be revealing. For example, the phrase cleanness of conscience, which occurs eight times, could equally well have been expressed by synonymous purity of conscience, but the corpus has this alternative option only once, though the word purity as such occurs 23 times. By the same token, much and many are clearly more often combined with an m-initial word than with words which begin with any other letter of the alphabet. The dimensions of the results are: much/many m* (870), much/many b* (255), much/many d* (308), etc. (Table 3). Obviously alliteration was a motivating factor in the formation of attributive clusters, whether formed with the adjective or the genitive.

Spotting Spoken Historical English: The Role of Alliteration

61

Beyond such quantitative findings a few qualitative remarks concerning special cases seem appropriate here. Some of the phrases suggest their character as fixed expressions by the metaphorical quality of at least one of the units, as in cleanness of conscience, cleanness of charity (for /k/ alliterating with /tS/ see below), darkness of death, lamb of life (‘Jesus’), mantle of meekness, wells of wisdom, etc. Some other phrases reveal their quality as fixed expressions by lexical archaism, as in father of frumschaft (‘first creation’) and willnung of worship – both frumschaft and willnung are OE words which, according to the OED, last occurred a1225. In some rare cases the phrases in the Appendix lists show aphetic word formation, i.e. words were used in the form of ‘clippings’ by omitting unstressed prefixes: pistle of prayer (rather than epistle), pain of prisonment (rather than imprisonment); in either case the OED reveals that the full and the clipped forms coexisted side by side during ME. Obviously the author preferred the aphetic forms for the sake of alliteration. The main qualitative feature of the phrases at issue, however, is a striking redundancy of meaning, obviously accepted for the sake of emphasis, and/or the high degree of recognition or esteem attributed to the referent, particularly apparent in the case of the perpetual truths of religion. Redundancy prevails in keiser/king of kings, doom of doomsday, flame of fire, lord of lords, man of man etc. Well-known religious formulae can be seen in phrases such as church of Christ, cross/crown of Christ, lamb of life, lichom (‘body’) of lamb, lusty life (in the special sense of the theory of the seven deadly sins), fleshly father (as opposed to God as spiritual father) etc. (for further examples see the lists in the Appendix).

3. Verb-headed phrases The collocation of adverbs with verbs confirms some of the results of the last section: brightly (ME brihte) collocates with to burn, and better/best with to bear, bring, build and a number of other verbs. It can also be gathered from the context of the retrieved text passages

Manfred Markus

62

that to beseech and to behold, combined with the adverbial complement busily, mean something like ‘to take care of’. While the OED confirms this assumption, the MED does not seem aware of the collocation (though a few of the quotations listed under ‘bisili adv.’, p. 902, give evidence of it). However, all in all, it is obvious that adverbs are generally infrequent in ME. Given this drawback and, moreover, the fact that adverbs were often morphologically unmarked, an investigation of alliteration in the so-called ‘composite predicates’, i.e. verbal phrases of the type to make a mistake, now seems preferable. There have been some recent studies on composite predicates in ME (for example, by Matsumoto/Tanabe/Closs-Traugott, all in an edition by Brinton/Akimoto, 1999 7 ). It is uncontested that these composite or ‘complex’ predicates played an increasing role in ME, moreover that they consist of what Jespersen (1942: 117) called a ‘light verb’, i.e. a verb with little concrete meaning, followed by a noun which carries the essential meaning of the whole phrase (cf. Matsumoto 1999: 60). The light verbs that have been identified by researchers are, above all, make, do, give, get, have and take. Nickel (1978: 82-83) in this context also mentioned to bear and to cast, which he traced to be important in Sir Gawain and the Green Knight. Since this is one of the main works of the Alliterative Revival, there is good reason for including them in the following analysis. The ME composite predicates have evoked some interesting questions concerning syntax, for example, in view of the variably acceptable modification of the phrases using adjectives (such as to make great mourning) and articles (make [a] noise) (cf. Matsumoto 1999: 83-88). It has also been discussed whether certain frequencies such as that of make are the result of French influence (Prins 1952, cf. Matsumoto 1999: 61). The option that alliteration could also play a motivating role for the high frequency of the composite predicates has never been considered. In the present context I will focus on this aspect, ranking the issue of the degree of ‘frozenness’ and of the influence of French as secondary.

7

Cf. my review of the book in Anglia 122 (2004: 491-494).

Spotting Spoken Historical English: The Role of Alliteration

63

3.1. To make The phrases striking us by their high frequency are combinations of to make with mention (ca. 30), memory (11), mind (20), moan (21), melody (9), marvel (6), marriage (4), mirth (15), mourning (8), mumming (‘pretense’, 5) and mend (‘reparation’, 5). Some of these phrases have been traced by the OED or MED; the OED, for example, defines to make a mumming (‘to treat with levity or contempt’. Obs.) and gives 1523 as the date of first occurence. To make mend is not mentioned at all. The reason for such mistakes may well be the fact that the ‘partners’ of alliterating couples, the so-called ‘conjoins’, are frequently positioned at a distance. To make followed by manner, for example, while probably a calque of French faire manière de (cf. OED), occurs more often than not after intervening parts of text: to make a manner/some manner/no manner etc. (cf. Table 4).

Table 4. WordSmith results file of make…manner.

Only in some of the 19 output passages do we still find manner with the syntactic function of an object case, as in line 18: he maade a

64

Manfred Markus

maner of thonking to þe man. But in several other cases manner is obviously on its way to a grammaticalised adverbial: make all maner confectyons (l. 5, cf. 8, 10, 11, 13, 14; cf. kind of in ModE). This is also true of in manere of (17), which is, of course, not an object-case, but an adjunct. It has obviously ‘lost’ its article (in [the] manner of), similar to all maner confectyons (cf. 5 and 8), which functions without the plural marker (maneres) and the cohesive marker of. The example of make…manner shows that, while such grammatical changes and inconsistencies are typical of idiomatisation, the general motivation of all the phrases listed is the alliterative pattern. This is also true of the idiom to make merry, which, though still common today, is strange in that it makes use of the adjective merry, rather than the noun mirth, which would seem more grammatical. Mirth is an i-umlauted derivation of merry. There are about 30 occurrences of to make merry in the corpus, either in this very form or in a more explicit form, such as to make him/thee (etc.) merry. The ungrammatical form to make merry is obviously a rudimentary version of the explicit phrase with the object to make sb merry (as was also claimed by Nevanlinna 1980: 38). It seems fair to assume that the idiomatisation of to make merry, with its present archaic meaning, came about as a concession to the regular (trochaic) scansion. Moreover, there are usages of the adjective for us to conclude that it was confused with the noun mirth: for example, more mery (with as an allograph of thorn). Both points are mentioned by Nevanlinna (1980: 35f.). But the heart of the matter is that alliteration encouraged both phrases and has prevented the one with the adjective, which was more common, from becoming obsolescent. Given that alliteration helped to coin phrases or keep them alive, the reader may now be ready to trace it in other constellations as well. Extending, for example, the scope of the WordSmith query routine to four words (rather than two) to the right of make, we find even more examples of to make plus manner, mostly with manner as adjunct, for example, make my testament in maner (…). The query provides 119 output samples. Even though only a subset of these are of the syntactic type (alliterative) V + N, the frequency of the occurrence lends further evidence to the validity of the collocational pattern make…manner.

Spotting Spoken Historical English: The Role of Alliteration

65

Another way of increasing the output of the query is by including the strikingly large number of quantifying and intensifying adjectives before the object noun in phrases such as make mekyl/much mone (‘to make great lamentation’) and make much/more/most joy/sorrow. The first type, of triple alliteration, also manifests itself in make mekyl mourning. However, given the artificiality of the three successive alliterative words and also the fact that the form mekyl (‘mickle’) was restricted to the Northern and North Midland dialects (cf. OED), the pattern is fairly rare. We do, however, have quite a number of manifestations of the pattern with substitutional great: make great mone (13), make great mirth/marvel/mourning/mangeries/ mention (ca. 10). The attribute great seems to owe its flourishing in such cases to the fact that it was the main lexical heir of mekyl. On the other hand, much, which was increasingly used as a short form of mickle, is so frequent after make and before an object noun that one tends to assume the alliterative pattern to be working here. Make much joy and ~ sorrow occur 27 times in the corpus, and there are also cases of making marvellous joy and making mervellous great sorrow; moreover, we have some occurrences of make maunde or make maundement, which show clipping of the m-nouns from French-derived commaunde and commaundement respectively. All this suggests that the choice of words, and partly even word formation, were occasionally influenced by the trend to use alliterative patterns.

3.2. To do To do preferably collocates with alliterative deeds, damage, diligence, desire and dread. Including adjectival attributes (as in the case of make), one also has to mention do deadly sin(s)/due service/divine service/due penance. On the other hand, the alliterative echo sometimes follows the verb do only after a modifier, i.e. in the second or third position to the right. One thus finds: do great deeds, do my/your devoir, etc. While devoir is extremely common within this pattern (ca. 40 occurrences), it does not occur without the possessive modifier. There are also eight occurrences in the corpus with damage modified by some adjective such as great. To do deed likewise usually

Manfred Markus

66

occurs in the complemented version with a modifier, such as marvellous (some 40 occurences altogether). Moreover, to do is followed by debt (2), devotion (2), duty (1), disease (3), dancing (1), destruction (1), derogation (1), as well as disportes (‘sports’). It now comes to light why PrE has idiomatic to do sports, rather than *to act sports or whatever else (cf. German Sport treiben) – to do sports is a relic of an alliterative idiom from the time when disport was not yet clipped.8 Expanding the computer query to the third position to the right of to do provides further results, such as to do him great discomfort. But rather than pursue this pattern, examples of which have to be gleaned out of hundreds of lines, mention should be made of the obviously very idiomatic usage of what looks like a redundant do: he did do; this also works with other verbs: do/did draw, etc. Our concordance provides dozens of cases of this pattern (including the variant do do), both with an additional d-word after do (dyd do digge) and with some other infinitive. The first twenty, out of 80, samples are provided in Table 5.

Table 5. Did do and do do (excerpt).

8

According to the OED, sport had its first occurrence c. 1440 (disport 1303).

Spotting Spoken Historical English: The Role of Alliteration

67

As regards the meaning of the construction, it can be gathered from the contexts of the passages and from sparse remarks in the OED (cf. 23 and 25a)9 that the construction is either causative or periphrastic; in other words, the first quotation The King did do cry this feast means either ‘The King had this feast announced’ or ‘The King announced this feast’.10 There is no need in the context of this study to decide in individual cases which of the two semantic interpretations is the better one. Ellegård, in his detailed study on auxiliary do (1953: 110-115), gives evidence of his opinion that duplicative did do + verb – he does not mention do do – was meant to render simple did or let or made + verb and that the phrase was a loan translation from French faire + verb, particularly in the early Caxton translations and in Melusine. The duplication of do, he thinks, is “accepted as a controversy on the issue” (in Visser 1970. III, § 1411-1417). This is a tempting argument, but in the face of the greater number of sources now available for this study, it is not totally conclusive since did do occurred, even if rarely, in some texts before 1400, when periphrastic to do did not really play an important role yet (cf. Ellegård 1953: 103, and his diagram p. 162). Later comments, such as those by Visser (1970), have not brought definite clarity either.11 Fact is that did do/do do + infinitive was quite a popular pattern, which became obsolescent in Modern English as a result of the grammaticalised use of to do in negative and interrogative clauses: causative do + verb was only in use until the early 16th century, and the periphrastic pattern until the 19th century (OED). It seems fair to conclude, therefore, that the popularity of the 9

10

11

Strangely enough, the MED does not even mention the construction did do, but only refers to “unstressed don plus inf., used as the equivalent of the simple verb” (p. 1235: 11b). As it happens, the example has been taken by Visser (1970: III, § 1213) to demonstrate the causative use of do, but Visser then adds that the causative character was “often obscure” and mixed up with the periphrastic pattern. Visser critically questions the validity of Ellegård’s statement by referring to his (Ellegård’s) own quotations from prose texts which were written before 1400 and in which periphrastic do occurs (III, § 1418). On the other hand, Visser also argues the opposite, suggesting that the main motivation for the use of periphrastic do was the fact that it gave an author metrical flexibility within the line, enabling him to move the infinitive to the end of the line into the rhyme position. Did do + inf. as a special pattern is not mentioned at all by Visser.

Manfred Markus

68

construction in the Middle Ages, no matter what caused its genesis, was partly a result of its jingling alliteration.

3.3. To take, to get and to have The verb to take is followed by several striking alliterative nouns, among these town (11x; sometimes with a determinator: e.g. many towns), truce (7), tournement (2), teaching, tables, time, tribute, tribulations, testament (2), treasure (3), translation, trewage (‘toll’), and tolle (‘toll’). Tent occurs even ten times. The word is a ‘false friend’ of PrE tent and, in fact, means ‘intent’ (< Anglo-Norman entent, see OED n2). As in various earlier cases mentioned above, we are again faced with an aphetic word, and again it seems fair to assume that its coining was encouraged by the love of alliteration. To get, like to take a word of Scandinavian origin, likewise produced a number of complex predicates, among these to get good(s) (ca. 30), ~ grace (8), grant (6), ground, gear and glory. Grace and good(s) allow some flexibility as to modification of the noun: to get God’s grace, to get more good, etc. Finally, to have produced quite a number of alliterative fixed expressions used in prose, for example: have harm (‘be hurt’) (7), have x harm (26) have (hasty) help (24), have x help (11) have (high) honour (11), have x honour (4) have hope (15), have (x) hope (3) have x horse (8)/harness (5), have (x) heart (3)/husband (4), have heaven (11)/have hunger (12)/have health (8) Have hwile (‘have time’) (2) nicely shows a case where alliteration only worked in early ME, i.e. as long as still occurred as . The fixed expression have the hour and the time (1) suggests that the semantically redundant hour is mainly motivated by the alliteration with have. A few other cases should be seen against the background of h-dropping, which was a common phenomenon in ME, just as the insertion of h by hypercorrection (cf. Markus 2002: 21). Given the general instability of word-initial h’s, the complex predicates have

Spotting Spoken Historical English: The Role of Alliteration

69

habundance (6) (= abundance), have habilite (ability) (4) and have hauoyr (< Fr avoir ‘property, money’) are not surprising. In all these cases an h was inserted in h-less French words, thus allowing alliteration. Seen against this background of unstable h, the syntagm to have abundance (2), non-alliterative on the surface, is also part of the alliterative pattern. So are the forms [h]aue are of (‘have ear of’ = ‘lend me your ear’) and [h]ave an answare, which both show original h-dropping emended by editors irrespective of the alliterative pattern. Given the unstable status of word-initial h’s in ME, the combination of to have (i.e. to ’ave) with a vowel-initial noun could be seen as a further case of an alliterative cluster. The first letter of the alphabet provides a large number of candidates: have auctorite (6), have akyng (‘pains’) (3), have ado (56)/answer (7)/acquaintance/ apparence/appetite/affection/acquittance and to have advice & council, with the typically synonymous doublet, are the main examples. Yet, as is well known, vowels were generally seen to alliterate with one another. So this query should not be restricted to the letter a, but include the other vowels as well, i.e. complex predicates such as have ende (20)/example (21)/envy (13). This exploratory study, however, does not aim at providing complete lists, and trusting that the role of alliteration in the syntagm have + N has been presented conclusively, we may ‘have ende’ with this point.

3.4. To bear and to cast To bear is very common in Early ME, since within its lexical field it also covered the meaning of to carry, which is the synonym that was added in Late ME (cf. OED). It therefore comes as no surprise that to bear alliterates frequently. The main collocating nouns as objects are: burden (16) blame (6) body (14) banner (13) bag (3) battle (6) bliss/blessed x (13)

70

Manfred Markus

To cast, borrowed from Scandinavian casta during the OE period (OED), in ME shared its lexical field with both to throw and to warp (< OE weorpan). Accordingly, alliterative collocates of to cast are not very frequent, and ICAMET, with its more than 6 million words, may well turn out to be too small for such rare collocates. Nevertheless, the results available allow for a few interesting observations. Certain items of clothing, at least if they are cloaks, clothes and kirtles (‘tunicas’), are cast (on or off), rather than put on or whatever else. Cast also collocates with cry (4), in the sense of ‘to utter a cry’, and with the terms for two spices, clove (8) and canel (= ‘cinnamon’) (9). While the normal word for adding ingredients in ME cookery books is the imperative take, cast has occupied a few semantic niches obviously under the influence of alliteration. This is true beyond the narrow domain of cookery recipes. The OED (32, 64, 69) gives evidence of to cast cantraips (‘spell of witchcraft’), to cast cavel (‘to do magic’), to cast a clod between (Sc.) (= ‘to widen the breach between’) and some others, but there is no sufficient evidence for the quantitative role of these phrases in ME. In ICAMET, cast also collocates once or twice with clarity, club, cunning, calx (‘metallic oxyde’), calion (‘pebble’), comade (‘mixture’) and conscience. There is, unfortunately, slightly more quantitative evidence for cwarterne (= Lat. carcer ‘prison’) and cwalhus (cf. OE cwellan ‘to kill’, thus, ‘place of execution’), and in one case the two nouns are combined: kasten in cwarterne & i cwal-hus. Finally, there are some collocations of casten with charm (‘magic’, 4), church (4), and chere (‘face’, 1). It should be explained here that – contrary to our modern expectations – phonemic /k/ alliterated with the sibilans /tS/ at least in early ME – the reasons are given in a detailed discussion by Minkova (2003: 72-107). Her arguments are closely connected with the phonological process of the palatalisation and assibilation of OE /k/ into /tS/. The rule affected OE cirice, but of course not the ME French-based words chere and charm, and yet these were accepted as eligible for alliteration. This means that the rule of alliterating originally identical but now divergent sounds was up to a point generalised to apply even when these sounds had never been identical in their previous history.

Spotting Spoken Historical English: The Role of Alliteration

71

4. The historical context: towards a description of the parameters of spoken idiomaticity Alliteration, whether in verse or in prose, is an essentially spoken feature of language (Minkova 2003: V). Like rhyme, assonance and consonance it is a type of sound parallelism (cf. Knowles 1987: 84). In PrE all these forms of sound parallelism are stylistic devices, highlighting two or more items which are “substantially similar but [...] differ at some point” (Knowles 1987: 84). Their role in non-poetic language and, thus, in prose is marginal and eclectic. The reason for this is obvious: prose is usually read, i.e. neither recited nor listened to. ME prose was different in this respect, namely widely written to be recited and listened to. This simply follows from manuscripts being an expensive medium, moreover from the high rate of illiteracy and the fact that prose was widely intended for the cultural lower and middle classes as final target groups. Homilies, even when read by lower clergy as mediators, were intended for laymen as recipients. This is all the more obvious in the cases of other genres: cookery recipes, herbals, tracts about horses. Even in the case of genres which were somewhat stylised, such as charters and letters, prose served more the interest of the common people and reflected more their needs. Alliteration in this prose then was more than just a decorative feature. It was a mnemonic device since the common people had hardly any written sources available to them and were dependent on their good memories. It was, moreover, a feature of marking phrases as idiomatic phrases, as well-known, suggestive, metaphorical or topical concepts. The phraseological types within which these concepts materialised were manifold, from coordinate clusters to the various types of subordinate phrases discussed in this study and to clausecohesive alliteration. In view of the large amount of material on subordinate phrases alone, the argumentative line in this analysis has been selective. There has been a focus on the affinity to spokenness as the main motivation

Manfred Markus

72

for the high frequency of alliteration in ME prose. Since speech can be seen “as an imperfect version of the written language” (Knowles 1987: 5), the analysis was bound to forward some evidence of ‘imperfection’ – dropping h’s; aphetic forms and high frequency of apocope; semantic vagueness (do do, a manner of); emphatic repetitiveness on the margin to redundancy. The general prevalence of dialect in ME, which made the concordances less easily accessible, also give evidence of this ‘imperfection’ in the spoken language. However, ME should, at least to some extent, be seen as a spoken culture, whereas the written culture was widely in the hands of the French and remained under their linguistic influence. The historical dimension comes in as an additional factor. While the two cultures, English and French, merged from the latter half of the 14th century onwards, the common people who were concerned with ME prose up till then and after – the lower clergy, the non-aristocrats – were fond of alliteration also as a nostalgic bulwark against foreign infiltration. Whatever the courtiers – and Chaucer was (partly) one of them (see motto) – wrote, people concerned with prose wanted to have their own say.

5. Conclusion In English linguistics the picture of historical English including ME is almost totally based on verse poetry, at least up to the literature of the 18th century. The literary interest in Chaucer’s works, Shakespeare, Dryden, Milton, Pope, etc. motivated our philological ancestors, the pioneers of our academic field of the 19th century, to ignore the difference between literary English and spoken English. Practically all historical English grammars12 and dictionaries (such as the MED and the OED) have relied too much on the stylised language of verse. Not 12

Including a study book on ME by myself (Markus 1990). An exception is the pioneering book by Bøgholm, but from a present point of view it is too rudimentary in its results and non-transparent in its source evidence; cf., for example, the chapter on ME to do (Bøgholm 1939: 261-263).

Spotting Spoken Historical English: The Role of Alliteration

73

that prose has been totally neglected. Yet verse has dominated the picture, and prose has at best been included eclectically in text-bases used for linguistic analysis. While the written tradition is the only one we have up to the early 20th century, it stands to reason that the development of English can better be traced on the basis of its spoken varieties. It is true, the written sources of Old and ME poetry fulfilled their purpose of allowing linguistic conclusions on the level of spelling, phonology and morphology, which are the fields that have dominated historical English linguistics until recently. But in phraseology, syntax, semantics, pragmatics, sociolinguistics – to mention only the main new branches that have gained ground in historical English linguistics recently – there is good reason for us to investigate spoken features of historical English, i.e. the type of language that was in daily use. How can this be achieved for ME? One way of getting close to spoken texts is the decided preference of text types which throw some light on ME everyday language, such as recipes and homilies. Another way of coming to terms with spokenness is the reflection of its parameters. One such parameter is the alliterative jingling of sounds in phrases. Given that people do not speak in words, but in phrases (and clauses), alliterative phrases throw some light on everyday spoken linguistic practice.

References Bloomfield, Leonard 11933, 21947. Language. New York: Holt, Rhinehart & Winston. Bøgholm, Niels 1939. English Speech from an Historical Point of View. London: George Allen & Unwin. Brinton, Laurel J./Minoji, Akimoto (eds) 1999. Collocational and Idiomatic Aspects of Composite Predicates in the History of English. Amsterdam: Benjamins. Closs Traugott, Elisabeth 1999. A Historical Overview of Complex Predicate Types. In Brinton, Laurel J./Akimoto, Minoji (eds)

74

Manfred Markus

Collocational and Idiomatic Aspects of Composite Predicates in the History of English. Amsterdam: Benjamins, 239-260. Ellegård, Alvar 1953. The Auxiliary Do. The Establishment and Regulation of its Use in English. Stockholm: Almqvist & Wiksell. ICAMET: The Innsbruck Middle English Prose Corpus on CDROM 2004. Innsbruck: University of Innsbruck, English Department; http://www2.uibk.ac.at/fakultaeten/c6/c609/ projects/icamet/. Jespersen, Otto 1942. A Modern English Grammar on Historical Principles. Part VI: Morphology. London: George Allen & Unwin. Knowles, Gerald 1987. Patterns of Spoken English. An Introduction to English Phonetics. London/New York: Longman. Lexikon des Mittelalters 1999. 9 Vols. Stuttgart/Weimar: Verlag J.B. Metzler. Markus, Manfred 1990. Mittelenglisches Studienbuch. UTB für Wissenschaft: Grosse Reihe. Tübingen: Francke. Markus, Manfred 2002. The Genesis of h-Dropping Revisited: An Empirical Analysis. In Lenz, Katja/Möhlig, Ruth (eds) Of Dyuersitie & Chaunge of Langage. Essays Presented to Manfred Görlach on the Occasion of his 65th Birthday. Heidelberg: Universitätsverlag Carl Winter, 6-26. Markus, Manfred Forthcoming. B & B: The Role of Alliteration in Twin Formulas of ME Prose. Folia Linguistica Historica. Matsumoto, Meiko 1999. Composite Predicates in ME. In Brinton, Laurel J./Akimoto, Minoji (eds) Collocational and Idiomatic Aspects of Composite Predicates in the History of English. Amsterdam: Benjamins, 59-95. MED: Kurath, Hans/Kuhn, Sherman/Lewis, Robert E. (eds) 19522001. Middle English Dictionary. Ann Arbor, Michigan: University of Michigan Press. Minkova, Donka 2003. Alliteration and Sound Change in Early English. Cambridge: Cambridge University Press. Nevanlinna, Saara 1980. To Make Merry. Neuphilologische Mitteilungen 81, 34-41.

Spotting Spoken Historical English: The Role of Alliteration

75

Nickel, Gerhard 1978. Complex Verbal Structures in English. In Nehls Dieter (ed.) Studies in Descriptive Linguistics. Heidelberg: Julius Groos, 63-83. Oakden, James Parker, with assistance from Elizabeth E. Innes 1935. Alliterative Poetry in Middle English. A Survey of the Traditions. Manchester: Manchester University Press. OED: The Oxford English Dictionary 1992. Second Edition on CDROM (Version 1.13). Oxford: Oxford University Press. Prins, Anton Adriaan 1952. French Influence in English Phrasing. Leiden: Universitaire Pers Leiden. Quirk, Randolph/Greenbaum, Sidney/Leech, Geoffrey/Svartvik, Jan 1985. A Comprehensive Grammar of the English Language. London/New York: Longman. Tanabe, Harumi 1999. Composite Predicates and Phrasal Verbs in The Paston Letters. In Brinton, Laurel J./Akimoto, Minoji (eds) Collocational and Idiomatic Aspects of Composite Predicates in the History of English. Amsterdam: Benjamins, 97-132. Visser, Frederikus Theodorus 1963-1973. An Historical Syntax of the English Language. Leiden: E.J. Brill.

Manfred Markus

76

Appendix (occurrences of 3 or more) C*/k*/ch* of c*/k*/ch* ~548 charity of Christ/Christenmen 23 charter of covenant 9 charter of king 64 changing of colour 6 changing of countenance 6 church of Canterbury 31 church of Christ 9 church of c*/k*/ch* >100 child(hood) of Christ 12 city of c*/k*/ch* >100 cleanness of charity 4 cleanness of conscience 8 commandment of charity 7 coming of Christ 11 company of knights 11 Confirmation of king 25 course of kind 13 court of King (mainly Arthur) 49 cross of Christ 12 craft of clergy 3 crown of Christ 3 keiser of kings 4 king of kings 25 knight of king 13 D* of d*: 310 day of doom 225 day of death 8 Darkness of death 8 doctor of divinity 9 lamb of life 3 length of life 5 letters of love 5 leave of lord 3 licome (‘body’) of lamb 3 liking of love 3 lord of lords 21 love of life 3

doom of doomsday 9 doom of damnation 5 doubt of death 6 dread of death 40 f* of f*: 488 father of frumschaft (‘first creation’) 3 for fear of f* 441 flame(s) of fire 26 filth of flesh 2 flesh of folks 6 furnice of fire 3 Fowling of flesh 3 frailty/frailness of flesh 4 g* of g*: 609 grace of God (+ v) 499 grant of God (+v) 10 goodness of God 7 (the) good of grace 5 gift of God 44 (given of God 10) gird (‘rod’) of God 6 year (Zere) of grace 12 gift of grace 10 young of years 6 l* of l*: 89 law of love 5 labourage (‘labouring’) of lands 9 lady of love 5 lack of leasure 5 part of payment 8 pain of purgatory 55 pain of penance 4 perfection of priesthood 4 pain of prisonment 4 pine of purgatory 4 pine of penance 3 pistle of prayer 3

Spotting Spoken Historical English: The Role of Alliteration lover of lechery 19 m* of m*: 222 master of meekness 7 making of man/men 3 malice of man 5 man of man/men 75 man of might 4 manner of man 8 manner of merchandise 5 manner of measure 8 manner of meat (‘food’) 9 mantle of meekness 3 marks of money 6 medicine of man 3 might of maidenhood 13 might of men 4 mind of man 6 mother of mercy 33 month of may 12 morsel of meat (‘food’) 4 multitude of men 14 n* of n*: 5 need of nothing 5 p* of p*: 270 prince of peace 9 prince of philosophers 12 prince of priests 8 principles of philosophers 4 part of penance 9 part of penitence 8 part of prudence 5 sleep of sin 4 sum of silver 13 sorrow of sin 6 soul of sin 3 souls of saints 3 spice of science 4 spoon of silver 13 spot of sin 7 step of sobriety 5

place of prayer 3 plea of profession 3 plenty of people 14 powder of pepper 81 power of people 7 puisance of people 3 point of pride 3 prees of people 9 provost of paradise 7 q* of q*: r* of r*: 15 rule of religion 4 rule of reason 3 root of raddish 8 s* of s*: 314 salvation of soul 14 song of songs 3 savour of sweetness 3 shame of sin 4 sick(ness) of sin 18 secret of secrets 4 sect of stoikis 3 sickness of soul 5 servage of sin 3 servant of servants 3 service/servitude of sin 6 service of sisters 3 showing of sin 4 shilling of silver 33 shilling of sterlings 61 sides of silver 6 *-ful N (allit.): 140 dreadful doom 9 dreadful day 24 dreadful dream 3 faithful friend 12 faithful fellow 5 grurefulliche/grimfulliche god 4 manful man 6 painful passion 11

77

Manfred Markus

78 state of spousehood 3 steering (styring) of sin 10 stink of sin 3 stroke of spear 7 stroke of sword 14 succession of sins 11 sword of sorrow 8 sword of steel 3 sweetness of song 4 simony of silver 3 sin of sloth 10 sin of sodomy 4 sin of simony 4 t* of t*: 17 telling of tales 2 tempest of thunder 3 tour of tree 3 travail of temptation 3 truth of things 3 time of temptation 3 w* of w*: 54 wanting of wit 4 world of worlds 15 wells of water 5 wells of wisdom 4 well of wit 3 willnung of worship 5 woh of word 4 wone of witness 4 word of wisdom 4 wind of words 5

sinful soul 41 sorrowful sickes (‘sick people’) 5 woeful wretch 11 wonderful working 3 worshipful women 6 *-ly N (allit.): ~ 660 fleshly father (‘natural father’) 12 fleshly friends 12 ghostly good (‘spiritual good’) 18 ghostly gift 6 goodly gift 3 ghostly gladness 7 costly clothes 5 lovely lord 3 lovely loving/love 3 lusty life/living 6 manly man 3 mighty man 55 many manners/matters/men ~500 privy place 9 sely souls 7 worldly winning 3 worthy worship 4 worthy women 4 *-ous N: 53 delicious drinks 5 despitous death 3 gracious God 4 lecherous lust 4 lecherous lorelis 3 malicious men 5

IRMA TAAVITSAINEN / PÄIVI PAHTA / MARTTI MÄKINEN

Towards a Corpus-Based History of Specialized Languages: Middle English Medical Texts

In this chapter we introduce a new electronic research tool that facilitates systematic corpus-based analysis of the development of English as a language of science and medicine in the medieval period: the corpus of Middle English Medical Texts (MEMT hereafter), compiled by us at the University of Helsinki.1 The corpus is available on a CD-ROM (published by John Benjamins in 2005) with MEMT Presenter software designed by Raymond Hickey (Essen University). In what follows, we shall describe the characteristics of the corpus and define its place among other corpora available for historical linguists. We shall discuss our principles of corpus compilation, including data selection criteria and database structure, basic preliminaries that have a crucial impact on the results of empirical studies and also determine the usefulness of the database to a large extent. We shall also briefly report on our pilot studies on earlier versions of the corpus, and discuss the areas where the potential for future studies is greatest. MEMT will be useful for several kinds of study providing new evidence for linguistic developments and change. It offers a new window to language history in general, and an entirely new opening for systematic historical accounts of specialized and professional languages that have recently attracted growing interest in their own right. In English, the language of medicine lends itself to diachronic studies in an ideal way as medicine was the forerunner in vernacularization processes of the fourteenth century, and remedy book material already existed in Old English. The focus of this chapter is on the first phase of vernacularization, which seems to have been 1

The work reported in this chapter has been supported by the Academy of Finland, and the Research Unit for Variation and Change in English and the Helsinki Collegium for Advanced Studies, both at the University of Helsinki.

80

Irma Taavitsainen / Päivi Pahta / Martti Mäkinen

largely completed by the end of the medieval period. At the beginning of the new era in c. 1500, the array of vernacular medical material was wide, ranging from learned treatises and institutional texts to practical advice and private writings. Results obtained on the basis of MEMT should be applicable more widely as it can be assumed that the same patterns are manifested in other fields of scientific writing.

1. MEMT MEMT is a purpose-built electronic corpus, a resource to be used on its own or together with other modern research tools. It was originally designed to serve as material for our project Scientific Thought-styles: The Evolution of Early English Medical Writing (see below). MEMT contains texts of the scholastic period, from c. 1375 to c. 1500, with a small appendix of trilingual material from c. 1330, and it forms the first part of the Corpus of Early English Medical Writing (1375-1750). MEMT covers the Late Middle English period of medical writing as comprehensively as possible. It includes surgical texts used as university textbooks in Latin but now vernacularized, sophisticated treatises on natural philosophy rendered in English for the first time, compendia with a variety of different and heterogeneous components, the first institutional medical texts in English in a barber-surgeons’ guild book, various types of astrological and prognostic medical texts, recipes and practical advice of various sorts, charms, and even medical rules jotted down in commonplace books for private use. The corpus is primarily based on published editions but we have complemented the selection by including several editions previously available only in unpublished theses, some additional texts transcribed directly from manuscripts, and extracts of early printed books; more than one-fourth of the material in MEMT is published for the first time. The guiding principle in our corpus compilation has been to include material from all extant editions of medical texts we know of. It took a great deal of work to track down and get hold of the editions already published, and in this task modern reference tools

Towards a Corpus-Based History of Specialized Languages

81

were invaluable.2 Tracing copyright holders and acquiring copyright permission proved fairly easy in some cases, but very difficult and even impossible in others. For copyright reasons, some texts are represented by short extracts; only a few had to be left out altogether. All in all, the corpus includes 86 text files, containing material from over 70 different texts and 77 manuscripts. The total word count in MEMT is around half a million words.

2. MEMT among corpora A computerized corpus can be defined as a collection of linguistic data designed to represent a particular language variety in machinereadable form. General and special corpora have different aims, and historical corpora are different from present-day databanks. Historical corpus compilation is both labour-intensive and time-consuming and the project has gone through several steps of development. Historical corpora like ours, covering a remote period over five hundred years ago, belonging to the age of manuscript culture, have certain limitations as they are dependent on extant data, representing skewed reality of materials. The fact that the corpus is based on editions poses some restrictions, but the same applies to most historical corpora. Yet in some respects MEMT differs from other historical corpora that are available. From the onset, it was intended for interdisciplinary use, as a source of information for a variety of research questions extending beyond linguistics. For this purpose, the corpus comes with a catalogue containing background information about the texts. It is also accompanied by a flexible purpose-designed search engine. In corpus linguistic terms, MEMT is a ‘second-generation’ corpus focusing on one register of writing. The continuous line of 2

These include Scientific and Medical Writings in Old and Middle English: An Electronic Reference compiled by Linda Ehrsam Voigts and Patrica Deery Kurtz (eVK, 2000) and A Manual of the Writings in Middle English 10501500, Vol. 10: Works of Science and Information, revised by George Keiser (1998).

Irma Taavitsainen / Päivi Pahta / Martti Mäkinen

82

English medical writing starts with the vernacularization processes of the fourteenth century. Some scientific writings are extant from the Old English period and are accessible in the Helsinki Corpus of English Texts (released in 1991; HC hereafter), a general-purpose historical corpus aimed at giving an overall picture of extant language data from Old English up to 1710 (see e.g. Rissanen et al. 1993). The Late Middle English part of HC contains 2000-word samples of a few medical texts, which are also included in our corpus.3 The desideratum for a larger corpus became evident with pilot studies on scientific writing in HC (Taavitsainen 1994) and MEMT grew out of this need. It provides a much larger database for the medieval period and a solid basis for tracing longer diachronic lines of development. The second part of our corpus, Early Modern English Medical Texts (EMEMT), is under compilation and will build a bridge to A Representative Corpus of Historical English Registers (ARCHER hereafter), which contains medical materials from 1650 up to the 1990s (see e.g. Biber et al. 1994). HC is available on ICAME CD-ROM; the ARCHER corpus is not publicly available but can be accessed by special permission of the compilers. Helsinki Corpus c. 750-1700 ARCHER 1650-1990

MEMT 1375-1500 EMEMT 1500-1750 Figure 1. The time span of MEMT and some other historical corpora containing medical and scientific texts.

3

Taavitsainen was responsible for the compilation of this part of HC.

Towards a Corpus-Based History of Specialized Languages

83

3. Contextualization of MEMT texts For validity, historical accounts of specialized and professional languages need contextualizing. Historical texts need sociohistorical anchoring; their meanings are elusive and can only be accessed in relation to their authors/writers and users. Formal assessments on morphology and syntax do not necessarily need a larger context, but as soon as we enter the fields of stylistics, sociolinguistics or pragmatics we cannot do without background facts. A new feature in historical corpora is that we have included a catalogue of background information on each text in MEMT, making our knowledge of the material readily available to MEMT users, and thus facilitating contextualized analyses of linguistic aspects. HC, the predecessor of our corpus, includes sociolinguistic annotation of texts. The level of the author’s education is given whenever known, but for the early periods the value ‘unknown’ is frequent. For MEMT, this parameter is perhaps not so important, as most materials in our corpus ultimately derive from learned sources. Instead, with vernacular material from the period when Latin prevailed as the language of science and the use of English was significant in itself, the audience parameter seems more important. Authors struggled to find means of expression in the vernacular, and they also strove to make their texts understandable to meet the needs of the target audience, so that the level of the audience and the addressee is likely to be an important factor that causes variation in scientific and professional language use. Discourse communities within which and for whose use texts were written have increasingly been used to explain variation in scientific writing, and it is generally agreed that genre conventions are created to ensure smooth and efficient communication. Practitioners of medicine were from heterogeneous backgrounds and only a very small proportion received university education; others were trained in monasteries, households, and guilds through apprenticeship and practical experience. Vernacularization of scientific writings reflects the dissemination of knowledge from the highest institutes of learning through various layers and stages to laymen and popular readership of almanac users.

84

Irma Taavitsainen / Päivi Pahta / Martti Mäkinen

4. MEMT and the dissemination of knowledge in the late Middle Ages Dissemination of learning and diffusion of knowledge in society from the learned elite in the monolingual Latin discourse world of science and medicine at universities to lay people is one of the guiding principles that runs through MEMT. University curricula provided for the transmission of authoritative medical texts that formed the basis of medical knowledge and practice in society at large. By the fourteenth century, texts originally derived from the university world had begun to be translated, adapted, abridged, and quoted in several fields, including natural philosophy, science and medicine, legal writing, and theology, and in addition to the conscious efforts of translation and abridgement, more accidental features of textual transmission like merges, blends and mixtures are found. Academic and surgical treatises were totally new in the vernacular. Most texts follow their originals, only a few compositions are more independent. The practices of free borrowing and common sources seem to apply to learned medical writing as well as to remedy books in this period, and the transmission of texts needs further study. This network of texts with various sociolinguistic anchorage points, though deplorably few in number, provides researchers with a challenge. We believe that it is possible to establish a linguistic ‘taxonomy’ to place texts in their proper place in the elaborate hierarchy from learned to popular. Corpus methodology is a much more reliable and efficient tool in showing the links and giving evidence than qualitative analysis of representative texts alone. Earlier versions of MEMT have been the starting point of our pilot studies to sketch this hierarchy and arrive at a linguistic description of thought-styles. We shall continue our efforts with the full version of MEMT.4 4

For descriptions of the Scientific Thought-styles project, see e.g. Taavitsainen/Pahta (1997, 1998), and Taavitsainen et al. (2002); for publications, see the Introduction on MEMT CD-ROM or the project website at http://www.eng.helsinki.fi/varieng/team4/1_4_4_projects.htm). The overall aim of the project is to trace changing scientific thought-styles in a long diachronic perspective. The late medieval period belongs to the scholastic age;

Towards a Corpus-Based History of Specialized Languages

85

The diffusion of knowledge can be verified in the general outline of changes from Latin dominance to a multilingual situation: at first, medical texts in English occurred in company with Latin and/or Anglo-Norman French in bi- or trilingual manuscripts. These factors are well known from philological studies. In England, like elsewhere in Europe, the vernacularization process gathered momentum during the latter half of the fourteenth century, when texts of all kinds started to be translated, compiled or composed in English and other vernacular languages. The process reached a peak in the fifteenth century, and continued in the early modern period, the proportions remaining in favour of Latin till the end of the seventeenth century. The rise in the numbers of English texts is surprising: the last quarter of the fourteenth century produced some two hundred texts, whereas nearly 8,000 items are recorded from the fifteenth century (eVK). Monolingual English manuscripts became more common only towards the end of the fifteenth century, though multilingualism lingered on. The multilingual context of early medical and scientific writing is reflected in the frequent use of different languages side by side, and texts belonging to different varieties and traditions of writing show different structural and functional patterns of code-switching (Pahta 2004). In MEMT texts the multilingual context is specifically reflected in the texts included in the Appendix, predating the others by almost fifty years, although many later MEMT texts in fact come from polyglot manuscripts containing material in Latin and/or French as well. Electronic data makes it possible to analyse the data with a greater precision and acquire new knowledge of e.g. the distribution of foreign elements in lexis. Early medical texts provide fruitful material science was logocentric and the mode of knowing was “that someone said so”. We have identified linguistic features typical of scientific styles of writing and paid attention to argumentation patterns and evidentiality. Our pilot studies have shown that some features are diagnostic of the level of writing and serve as indicators of the learnedness of the text, e.g. a hierarchy of authorities can be detected and set scholastic phrases occur more frequently in learned writing. Linguistic features indicating e.g. involvement and emotionality, evidentiality and modality show different distributions in different layers of writing, and argumentation patterns and metadiscursive practices vary. We have also observed that ‘specificity’, typical of modern scientific writing, is an important indicator of the level of writing. For results of research on the medieval part of the project, see also Taavitsainen/Pahta (2004).

86

Irma Taavitsainen / Päivi Pahta / Martti Mäkinen

for studying the influence of the underlying source languages on the development of English syntax and lexicon as well as discourse patterns.

5. MEMT design MEMT contains about half a million words of running text from medical treatises representing different varieties of writing. In the corpus design, the texts have been divided into four main categories. Three categories are based on a well-known tripartite classification, which divides medieval medical texts into surgical, academic and remedy texts on the basis of their tradition of writing. In MEMT these categories have been redefined, and are labelled ‘Surgical texts’, ‘Specialized texts’ and ‘Remedies and materia medica’. In addition, a small group of texts in verse form have been placed in a category of their own, as it is important for some research purposes to distinguish between prose and verse formats. This fourth category is simply labelled ‘Verse’. Similarly, the early fourteenth-century trilingual texts (recipes and a herb glossary) are grouped separately and placed in the ‘Appendix’. Assigning individual texts, often with complicated and layered transmission histories, into the three categories reflecting different traditions of writing has not been straightforward and our classification should only be taken as indicative. Several texts contain components that show an affiliation with more than one category and new connections will certainly be found with new studies on MEMT texts. In cases like these our categorizations have been made according to the background literature and our own knowledge of the text contents. An example is provided by Bartholomaeus Anglicus’ encyclopaedic work De proprietatibus rerum, which was among the first learned scholastic writings to be rendered in the vernacular. In MEMT, extracts of the medical sections from John Trevisa’s late fourteenth-century translation appear in two categories: a sample of the book on human anatomy is placed among surgical texts, while an

Towards a Corpus-Based History of Specialized Languages

87

extract of the book on human physiology appears in the category of specialized texts.

5.1. Surgical texts The texts in the first category belong to the learned tradition and a number of them represent the highest academic level of writing, being derived from university texts. Rather than actual discussions of surgical procedures, some texts in this category are sophisticated and theoretical descriptions of the human anatomy. Fourteenth- and early fifteenth-century translations include surgical writings by famous surgeons like John Arderne, Guy de Chauliac and Lanfranc of Milan. Some compilations in English border on original compositions. We have made an attempt to include various versions of multiple translations and display the whole variety of texts in the field in English in the late medieval period. Our pilot studies in a longer diachronic perspective indicate that medieval surgical texts formed the solid basis on which later writings were built; the ideas presented in them were not abandoned, but they acquired a different position in text books and manuals of instruction. MEMT makes discoveries like this possible as it shows the transmission of ideas and use of stylistic features. The continuation of MEMT will be needed for longer diachronic lines, but even the medieval part can provide a great deal of new evidence on the development of the special language of medicine. Another linguistic area of research well worth attention is register variation and incipient standardization. According to the present view, standardization took place at different rates in different genres of writing, and besides spelling forms the processes involved generic patterns and styles of writing.

5.2. Specialized texts This category contains translations of learned theoretical ancient treatises and scholastic texts on physiology and natural philosophy, as well as tracts focusing on a specific illness or field of specialization, or particular method of prognosis or treatment. The topics comprise

88

Irma Taavitsainen / Päivi Pahta / Martti Mäkinen

ophthalmology, reproduction, gynecology and obstetrics, urinoscopy, phlebotomy, epilepsy, syphilis, and the plague. The samples include an extract from Galen’s De ingenio sanitatis, one of the rare Middle English texts connected with the renowned Greek physician, and Caxton’s Ars moriendi, a popular handbook providing spiritual and practical guiding at one’s deathbed, printed in 1491. Several texts in this category illustrate the challenge and the problems that the first translators of medical texts faced in their attempt to vernacularize medical science. English had yet to develop the lexical resources and syntactic conventions to address learned topics rich in technical and theoretical details for which the GrecoRoman learned tradition had developed its own means of expression over a millennium. A logical place for the early vernacular translators was to look for a model in Latin, and many texts in this category are characterized by undue reliance on the source language. The difficulties of expressing abstract concepts and complicated causal and spatial relations in English have been pointed out by earlier studies, and a great deal remains to be done in e.g. tracing the advent of new syntactic constructions.

5.3. Remedies and materia medica This category contains texts belonging to the remedy-book tradition, e.g. recipes and prognostications, and herbals represent materia medica. Remedy-books pertain to a large and complicated field of research with several intertwining traits, from components originally derived from learned classical sources to materials bordering on occult and magic. The common denominator of the texts included in this section is practical applications of a medical theory, usually in the form of recipes. In addition to that, many of the texts offer elaborate presentations of a medical theory with respect to a particular malady, which are then applied in the recipes that round off the texts or entries in them (Mäkinen forthcoming). The tradition of remedy-book writing in English is longer than in the other fields of medical writing, which shows e.g. in the standardization of recipe collections. Recipes have received a great deal of scholarly attention lately. The focus is on the standardization of generic features and repertoires of language use, but

Towards a Corpus-Based History of Specialized Languages

89

transmission patterns and intertextuality is also emerging as an important area of study. Appropriation provides another perspective on transmission, as it focuses on the readers and users of texts, how scientific ideas spread and how meaning was made.

5.4. Verse The relation between verse and prose was different from modern times: prose was a more sophisticated and elegant means of conveying ideas. Verse was employed for more practical purposes, as the meter and rhyme scheme provided a mnemonic aid, and standard stock phrases were developed for the same purpose. MEMT includes a selection of verse texts and offers material for comparison. In terms of contents, the verse texts in this category range from instructions on bloodletting and diet to accounts of fetal development and physiological theory.

6. MEMT Presenter The MEMT Presenter program has been specially designed for the MEMT corpus. It was developed by Raymond Hickey, in co-operation with MEMT compilers, and is partly based on his Corpus Presenter software (Hickey 2003). The MEMT Presenter can only be installed in a Windows environment, but the CD-ROM also contains an HTMLbased Java-version, which makes it possible to view the data and the information package also on Mac. In MEMT Presenter, the data has been arranged hierarchically according to the categories discussed above. There are two basic means for displaying the structure of the corpus. The default view shows a multi-level tree, where the main categories form the firstlevel branches and the individual texts within the traditions form the second level (see Figure 2). This structure allows retrieval of information from the whole corpus, any one branch of the corpus tree,

90

Irma Taavitsainen / Päivi Pahta / Martti Mäkinen

or an individual text. The other option, an indented list display, allows flexible information retrieval from any other data groupings that the user might want to make; here the texts can be selected for processing by simply ticking a box in front of the text labels. In both displays a click on a text label shows the relevant text on the screen.

Figure 2. MEMT Presenter multi-level tree.

MEMT Presenter contains various kinds of search functions, ranging from a simple search of any string in a text currently on display to wildcard searches and searches with multiple keywords with the help of input word lists through the whole corpus or any part of it. The finds can be collected in a KWIC list with a definable amount of context, and copied to the Windows clipboard. MEMT Presenter also generates a list of unique words from a single file or any number of selected files. This function is vital for coping with the spelling variation of lexemes in the data; a lemma of a lexeme generated using this function can then be used as an input word list in the search. It is also possible to generate a list of unique words in reverse order, which makes it possible to study suffixes. All text files in MEMT Presenter are rtf files. They can be lifted from the program and analyzed with other tools, such as Corpus Presenter or WordSmith, which allow more complicated and refined corpus linguistic searches than MEMT Presenter. In addition to texts and data retrieval functions, the MEMT Presenter provides the following: a general introduction to the corpus, a discussion of the editorial policy and text mark-up applied in compilation, a text catalogue with socio-historical background information and bibliographical information, and a manual for the program.

Towards a Corpus-Based History of Specialized Languages

91

7. Future potential The potential of MEMT is considerable. Its use may be limited for phonology, but it will certainly prove useful for morphological or syntactic studies; these fields are as yet largely unexplored. Lexical semantic and terminological studies will also find MEMT useful; Norri’s comprehensive lexical analyses (1992, 1998) are based on a corpus partly overlapping with ours, and we are able to offer some additional materials. The pilot studies described in this chapter on scientific thoughtstyles, vernacularization strategies, and linguistic and stylistic features of texts have revealed that the field is rich and that there is plenty to explore. What we have achieved so far is more information about register variation in the early periods and more details about overall development. More can certainly be accomplished and we are happy to open up the possibility to scholars, and it is exciting to see what new research it will inspire. We believe that our contribution to the scholarly world in the form of a new database of historical medical texts will further encourage research activities in this field.

References Biber, Douglas/Finegan, Edward/Atkinson, Dwight 1994. ARCHER and its Challenges: Compiling and Exploring A Representative Corpus of Historical English Registers. In Fries, Udo/Tottie, Gunnel/Schneider, Peter (eds) Creating and Using English Language Corpora. Amsterdam: Rodopi, 1-14. eVK = Scientific and Medical Writings in Old and Middle English: An Electronic Reference 2000. Voigts Ehrsam, Linda/Deery Kurtz, Patricia (compilers). CD-ROM. Ann Arbor: University of Michigan Press. Hickey, Raymond 2003. Corpus Presenter: Software for Language Analysis. Amsterdam/Philadelphia: Benjamins.

92

Irma Taavitsainen / Päivi Pahta / Martti Mäkinen

Keiser, George R. 1998. A Manual of the Writings in Middle English 1050-1500, Vol. 10: Works of Science and Information. New Haven: The Connecticut Academy of Arts and Sciences. Mäkinen, Martti forthcoming. Between Herbals et alia: A Study of Intertextuality in Medieval English Herbals and Other Contemporary Medicine (Dissertation). Norri, Juhani 1992. Names of Sicknesses in English, 1400-1550: An Exploration of the Lexical Field. (Dissertation) (Annales Academiae Scientiarum Fennicae, Dissertationes Humanarum Litterarum 63.) Helsinki: Academia Scientiarum Fennica. Norri, Juhani 1998. Names of Body Parts in English, 1400-1550. (Annales Academiae Scientiarum Fennicae, Humaniora 291). Helsinki: Academia Scientiarum Fennica. Pahta, Päivi 2004. Code-switching in Medieval Medical Writing. In Taavitsainen, Irma/Päivi, Pahta (eds) Medical and Scientific Writing in Late Medieval English. Studies in English Language. Cambridge: Cambridge University Press, 73-99. Rissanen, Matti/Kytö, Merja/Palander-Collin, Minna (eds) 1993. Early English in the Computer Age: Explorations through the Helsinki Corpus. Berlin/New York: Mouton de Gruyter. Taavitsainen, Irma 1994. On the Evolution of Scientific Writings from 1375 to 1675: Repertoire of Emotive Features. In Fernandez, Francisco/Fuster, Miguel/Calvo, Juan José (eds) English Historical Linguistics 1992. Amsterdam/Philadelphia: Benjamins, 329-342. Taavitsainen, Irma/Päivi, Pahta 1997. The Corpus of Early English Medical Writing: Linguistic Variation and Prescriptive Collocations in Scholastic Style. In Nevalainen, Terttu/KahlasTarkka, Leena (eds) To Explain the Present: Studies in Changing English Language in Honour of Matti Rissanen. Mémoires de la Société Néophilologique de Helsinki 52. Helsinki: Société Néophilologique, 209-225. Taavitsainen, Irma/Päivi, Pahta 1998. Vernacularization of Medical Writing in English: A Corpus-based Study of Scholasticism. Early Science and Medicine 3, 157-85. Taavitsainen, Irma/Päivi, Pahta (eds) 2004. Medical and Scientific Writing in Late Medieval English. Studies in English Language. Cambridge: Cambridge University Press.

Towards a Corpus-Based History of Specialized Languages

93

Taavitsainen, Irma/Päivi, Pahta/Leskinen, Noora/Ratia, Maura/Suhr, Carla 2002. Analysing Scientific Thought-styles: What Can Linguistic Research Reveal about the History of Science? In Raumolin-Brunberg, Helena/Nevala, Minna/Nurmi, Arja/Rissanen, Matti (eds) Variation Past and Present: VARIENG Studies on English for Terttu Nevalainen. Mémoires de la Société Néophilologique de Helsinki 61. Helsinki: Société Néophilologique, 251-270.

BARRY MORLEY / PATRICIA SIFT

Towards the Automatic Identification of Directive Speech Acts

1. Introduction A challenge for pragmatics, and even more so for historical pragmatics, is an automated ‘function-to-form’ study (Jucker/Jacobs 1995) for data identification and quantification. For instance, speech acts are realized in a large variety of syntactic patterns, which are difficult to trace electronically. In this chapter, we show that computerized identification of speech acts is indeed viable given welldefined research parameters. ‘Function-to-form mapping’ in historical pragmatics, or, more precisely, in diachronic pragmatics (cf. Jucker/Jacobs 1995), has so far been confined to time-consuming manual analysis on account of the nature of this subfield. In this kind of approach, linguistic items with a particular function (such as speech acts) are taken as the starting point and thus as a constant factor, while their linguistic realizations (or ‘forms’) at different times are the variables (in the case of speech acts, the syntactic patterns in which they occur). By contrast, in diachronic ‘form-to-function’ mapping, one particular linguistic form (e.g. a particular modal verb) is the constant and its functions across time are subject to change. This kind of study can be carried out electronically because a single linguistic form, including its variant spellings, is fairly easily retrievable by a computer even from a large corpus. This is not so in the case of linguistic items with a particular function. Directive speech acts (or any kind of speech acts, for that matter) are almost impossible to retrieve electronically, as they can be syntactically and lexically realized in a virtually infinite number of different ways.

Barry Morley / Patricia Sift

96

Therefore, it seems a viable starting point to narrow down an investigation into the electronic identification of speech acts to a relatively restricted domain, i.e. one type of speech act in one particular text type in which that speech act is typically expected to occur. Thus, we confined our study to directive speech acts as found in Late Middle English prose sermons. Our small corpus will be presented in section 2. As a first step, we identified the different realizations, i.e. syntactic patterns, of the directive speech acts, by means of hand analysis. These patterns are discussed in section 3. Next, part of speech (POS) grams have been analysed electronically from the corpus text files to reveal significant patterns with regard to directive speech acts. The method for this analysis is outlined in section 4 and primary results are given in section 5. All returns from the corpus matching the significant patterns were then used to determine a set of filtering rules that would leave only the directive speech acts as returned data. The analysis of these returns is elaborated upon in section 6 and the results presented in section 7.

2. The data Sermons are a text type of religious instruction. These texts are typically records of the lessons given by a preacher to his congregation. Therefore, it is characteristic of sermons to contain directive speech acts. A directive speech act, according to John Searle’s classic theory, is an attempt by a speaker or writer to get his 1 addressee to carry out an act (Searle 1969: 66 and 1976: 11). Applied to a sermon, the speaker or writer of the directive speech act would be the preacher, author or compiler of the sermon, and the addressees 1

There are, of course, more recent classifications of speech acts, based on various criteria, e.g. Bach/Harnish (1979), Ballmer/Brennenstuhl (1981), Sadock (1994), Allan (1998); for a recent overview see also Sadock (2004). Searle’s taxonomy, however, is still regarded as the classic one and is the one most widely used (cf. Sbisà 1995: 502), not only within historical pragmatics (cf. Kohnen 2002 and 2004).

Towards the Automatic Identification of Directive Speech Acts

97

would be the whole or parts of the congregation benefiting from the sermon. The directive speech acts found in the sermons investigated are realized in the form of more or less readily predictable syntactic patterns, which – because of their regular recurrence – can be called formulae (cf. section 3). The automatic identification of such syntactic patterns and their subsequent filtering in order to result in a properly extracted set of directive speech acts require texts tagged for parts of speech. Therefore, the only suitable source of sermons available at present is the Penn-Helsinki Parsed Corpus of Middle English (2nd edition) hereafter referred to as PPCME2.2 For this study, the larger groups of sermons from the Late Middle English sections were selected, corresponding to the subperiods ME3 (1350-1420) and ME4 (1420-1500) in PPCME2, yielding a small corpus of 121,813 words: x

File cmwycser.m3, which contains a selection from the English Wycliffite Sermons, namely the first 45 sermons from the Sunday Gospels and 4 from the Sunday Epistles. These provide 57,067 words to the study;

x

file cmroyal.m34, comprising two complete sermons (2 and 4) and the second half of a third (41) from the volume Middle English Sermons and containing 6,405 words;

x

file cmmirk.m34, comprising 33 of the 74 sermons presented in John Mirk’s Festial and containing 58,341 words.

More tagged historical corpora are eagerly awaited in the research community.

2

The use of tagged versions of sermons is a disadvantage in the respect that relatively little material to choose from is available at present. However, the great advantage is that the varying orthographic realizations (due to the lack of standardization in earlier stages of English) of the individual words that the formulae consist of pose no problem at all for automatic identification.

98

Barry Morley / Patricia Sift

3. Lexicogrammatical patterns found in directive speech acts As was said before (see section 2), sermons characteristically contain directive speech acts which are instances of religious instruction given by a preacher to his audience. Therefore, it is only these directives that we included in our study, that is, directives in which the addressees are explicitly mentioned; directive speech acts that do not explicitly address the audience of the sermon have been excluded. Such directives can be, for instance, part of narratives, i.e. expository sections of a sermon, and issued by one of the characters of the narration. Another example of excluded directives is that type which the sermon compiler introduces as quotations, mostly from Scripture, such as: (1)

þen when Thomas was comen out of his chapell, þe abbot felle downe to þe grownde and sayde: “Syr, ¤e mowe blesse þe tyme þat ¤e wer borne, forto haue suche vysitacion, as I now haue herde.” þen sayde Thomas: “Yf þou haue oght herde, I charche þe þat þou neuer telle hit, whyll I am on lyue.” (cmmirk, 10: 41)

The manifestations of directives found in the sermons investigated are for the most part regularly recurring (formulaic) syntactic patterns, of which some are more and some less predictable. 3 The most prominent and most frequent patterns and their relative levels of predictability are shown in Table 1.4 3

4

Among the latter we find the group of indirect directives, which have been excluded for the very reason that they appear to be too unpredictable, as in: And þanne me þinkuþ þat we schulden preye þat Godis wille be don, as hit is in heuene so here in erþe […]. (cmwycser, G36: 376) and singly occurring impersonal modal constructions with non-pronominal subject, such as: And so eche man by þis lawe is holdon ay to loue eche broþur. (cmwycser, E11: 521) Examples given in this section are taken from the Helsinki Corpus of English Texts and other electronic sources and are not restricted solely to the examples of religious instruction used in the computer analysis. Examples from other sources have been added to show that the syntactic patterns under

Towards the Automatic Identification of Directive Speech Acts

Predictable patterns

99

Unpredictable patterns

A) Directive performative I + VB + you B) 1st-person plural pronominal subject + a) subjunctive (hortative subjunctive) b) let (let-paraphrase) c) modal verb + lexical verb C) 2 nd-person singular/plural pronominal subject + modal verb + lexical verb

E) (2nd-person) Imperative

D) 3 rd-person singular/plural pronominal subject + a) subjunctive b) let-paraphrase c) modal verb + lexical verb

F) 3rd-person singular/plural nonpronominal subject+ a) subjunctive b) let-paraphrase c) modal verb + lexical verb

Table 1. Summary of directive patterns.

The first thing to note is that it is encouraging that the number of predictable patterns far outweighs those that are unpredictable. Examples of each type are outlined in the ensuing examples. The first type of predictable pattern is the directive performative (Table 1: A). It typically consists of a directive performative verb (that is, a directive speech-act verb with a subject in the 1st-person singular or plural indicative active), an object referring to the addressee, and the act requested of the addressee. The directive performative is issued in order to express a varying level of obligation (depending on the verb used) attached to the requested act. Examples include: (2)

Wherfor, good men and woymen, I charch you heyly in Godys byhalue þat non of you to-day com to Godys bord, but he be in full charyte to all Godis pepull; and also þat ¤e be clene schryuen and yn full wyll to leue your synne. (cmmirk, 30: 131)

(3)

In þis prayour is conteyned more witt þan anny erthly man can tell, and þer-fore I concell you þat ¤e loue to vse þis prayour a-boven all prayours. (cmroyal, 2: 10)

consideration also occur in other Late Middle English as well as in Early Modern English sermons. This supports the claim that they are indeed formulae (cf. Sift forthcoming).

Barry Morley / Patricia Sift

100

Hortative subjunctives (Table 1: B.a) consist of a 1 st-person plural pronominal subject and a verb in the subjunctive, mostly inverted, that is, in the form VB + we: (4)

þerfor dwelle we stidfast in bileue, and cleue we faste to þis foure wheelid chare of vertues, þat þorou¤ Cristis help we moun delyuere soule, þat is bitokened bi þe dou¤tir, fro þe fendis [boond] of helle. (Lenten Sermons, 3: no page indication)

(5)

We, þan, þat be not ryghtwis as Seynt Basile was, take we ensampull of Theophile þe synner and pray we with hym […]. (cmroyal, 41: 261)

The let-paraphrase designates a periphrastic imperative construction with let. A predictable formula found in the sermons comprises let and a 1st-person plural pronominal subject, i.e. let us VB (Table 1: B.b): (6)

But now let vs pause here a whyle [...]. But now let vs retourne to our instruccyon. (John Fisher, Sermon against Luther: 1) rd

There is an alternative version to this pattern, using the 3 -person singular/plural, i.e. let him/them VB (Table 1: D.b), as exemplified by: (7)

Let them be sure to abstain from all those things, which by experience and observation they find to be contrary to each other. (Taylor, The Marriage Ring: 16)5

Corresponding to this predictable pattern, there is an unpredictable rd version in the form let + 3 -person non-pronominal subject (Table 1: F.b): (8)

Lette euerie man do his owne busines, and folow his callying. Let the priest preache, and the noble men handle the temporal matters. (Hugh Latimer, Sermon on the Ploughers: 29)

5

Our filtering rules do recognize this pattern as well. However, in our rather limited data set, the only example of this pattern that was found had to be excluded on account of being a false alarm: þen Symeon toke hym yn his armes wyth all þe reuerens þat he cowþe and cussed hym and þonked hym heghly þat he let hym lyue to þe tyme for to see hym bodely wyth his een (cmmirk, 14: 58).

Towards the Automatic Identification of Directive Speech Acts

101

It is clear that the variability in the non-pronominal subject is the factor that leads to the unpredictability of this pattern. However, it should be noted that the subject is always, obviously, in the form of a noun phrase. Therefore, the development of a noun phrase identifier would bring such patterns into the predictable group. The next pattern listed in Table 1 forms one of a group that will be reported together here (B.c, C, D.c). What they have in common is the occurrence of a modal verb followed by a lexical verb. Only the subject is different: It can be a 1 st-person, 2nd-person or 3rd-person pronominal subject: x 1 st-person plural pronominal subject + modal verb + lexical verb (B.c): (9)

Then sith þis mercifull kinge Criste Ihesu commyth to visitte vs taking þe nature of mankinde, we must receyue hym on iij manere wise as þe trewe legeman receyvith a temperall kynge, þat is to say, with honest aray in cloþing, the second to be commendable in porte and shewyng, the third to gife a precious present þat is plesing. (Festial Revision, 3: 67-68)

x 2 nd-person singular/plural pronominal subject + modal verb + lexical verb (C): (10)

But here þou schalt vnderstonde þat not eche man haþ illiche myche blisse, but after hire loue was here, [so schal her blisse þere be] more or less proportioned thereafter. (Lollard Sermons, 8: 164-166)

(11)

By þis preyoure ¤ou shuldeste grett Oure Lady, þat she be goode mene to hure Sonne Criste Ihesu to haue mercy on þe þat þou my¤the at Domesday com to þat ioye þat euer shall laste. (cmroyal, 2: 12-15)

x 3 rd-person singular/plural pronominal subject + modal verb + lexical verb (D.c): (12)

Wherfor he þat wyll scape þe dome þat he wyll come to at þe second comyng, he most lay downe all maner of pride and heynes of hert, and know hymselfe þat he ys not but a wryche and slyme of erth, and soo hold mekenes yn his hert. (cmmirk, 1: 2)

102

Barry Morley / Patricia Sift

These three patterns can therefore be summed up by the pattern pronominal subject + modal verb + verb, the strength of the directive being indicated by the modal verb used. It is, however, possible to find an example of this general form using a non-pronominal 3 rdperson subject: (13)

By þys ensampull ych crysten mon and woman schuld lerne to do reuerence, and seruyce, and honor þys day to þys child. (cmmirk, 6: 25)

Again, variability and hence unpredictability are enveloped within the ‘uncertain’ noun phrase being used as subject here instead of a simple pronoun.The final predictable pattern is 3rd-person singular/plural pronominal subject + subjunctive (Table 1: D.a). This is a pattern that is structurally parallel to B.a., except for the person of the subject: (14)

Whoso wole lengur dilate þis mater of preyoure, loke he more þerof in þe sermoun of Ephiphanie. (Lollard, 15: 190)

This pattern also has an unpredictable equivalent, namely with a 3 rdperson singular/plural non-pronominal subject (Table 1, F.a): (15)

And whoso likiþ to trete lenger of þis mater, loke in þe sermoun of Corpus þirsday. (Lollard, 15: 193)

It should be noted that the most obvious form of directive, the imperative, though clear to the human mind as an instruction, is difficult to isolate using a computer given the fact that the only identifying feature of such a speech act is that it contains a verb. Of course, it goes without question that there are a very large number of verbs in any given text.

Towards the Automatic Identification of Directive Speech Acts

103

4. The methodology of automatic directive speech act identification In order to begin the study in a more focussed direction, the major predictable patterns were identified by computer from a subset of the complete small corpus. This subset consisted of 9 sermons in which all the directive speech acts had been hand-identified. The sentences in which these speech acts were to be found were taken and reduced to their Part of Speech (POS) tags alone. From these, ‘POSgrams’ were extracted. These POSgrams consisted of a string of n POS tags where n was a positive integer. Given the inconsistency of grammar and syntax in Middle English, it was suggested that these POSgrams be padded in order to allow a certain flexibility in the matter of grammatical pattern identification. As such, up to 2 wildcards were allowed between each of the specified parts of speech. Therefore, for a POSgram of length n, the length of the resulting search pattern could vary from any between n and n+2(n-1) = 3n-2. Thus, the first thing to be measured was the viable range of n. In order to achieve this, a Perl code analysis of the subset of the small corpus was developed returning frequency values for all POSgrams of n=2 and upwards until the returned frequency became too low to be of any significance. A lower cut off was applied to these frequencies based on counting statistics, i.e. the square root of the maximum count whereby smaller counts would fall within the noise threshold of the highest count. From these counts, a significant return was determined as being at least double the noise threshold whilst not exceeding hand-identified counts already established from within the subset of the small corpus (such patterns could only be deemed as irrelevant). Then POSgrams satisfying these conditions were taken and assembled into a ‘minimum set’, i.e. dependent pairs were only represented by their more general pattern. For example, the frequency of the pattern PRO MD VB * * P is contained within the counts for PRO MD VB. This minimum set was taken and searched for in the corpus as a whole, in order to determine which patterns could be fully automatically extracted. This determination was made by looking for patterns which displayed a

104

Barry Morley / Patricia Sift

similar frequency behaviour to those hand-identified in the small corpus subset, but on a correspondingly larger scale. The second phase in the study was to improve the quality of automatically returned directive speech acts by hand-analysing the initial returns and developing a series of suitable filters, i.e. patterns (lexical or grammatical) that occurred in false alarms, but not in correctly identified speech acts. In addition to the analysis of bulk patterns found in the first stage of the study, other predictable patterns, as outlined in section 3, were searched for and filtering rules sought in order to further refine the procedure. Finally, directive speech acts from throughout the small corpus were identified by hand and sorted into the various predictable and unpredictable categories outlined in section 3. A Perl program was developed which returned complete sentences containing the sought POSgrams after all filtering had been applied. The output of the computer program was checked against this hand-verified output. The counts from the automatic extraction were separated into good returns and those that are false alarms (FAs), i.e. returned information that ought not to have been returned. Standard statistical measures were then employed. These were recall = a/(a+b) and precision = a/(a+c), where a is the number of correctly returned hits, b is the number of misses (i.e. correct information not returned) and c is the number of false alarms. These statistics therefore provide a quantitative measure as to the success of the study.

5. The determination of significant POSgrams Table 2 shows the various counting statistics returned from the determination of suitable lengths, n for POSgrams from the 9 sermon subset of the small corpus. 56 directive speech acts were identified by hand from the subset of the small corpus. This would indicate that there are too many superfluous returns from n=2 in which a single POSgram alone almost satisfies this count. At the other end of the

Towards the Automatic Identification of Directive Speech Acts

105

scale, there is a large reduction in returned frequency where n>4. Therefore, using only the patterns with a count exceeding double the noise threshold, patterns with n=3 and a count of 10 or more were taken alongside those of n=4 with a count of at least 6. The choice to exclude n=5 was made significantly easier by the fact that there were no counts exceeding 6 for POSgrams of this length. These patterns were searched for in the full corpus, the resulting frequencies being binned into the histogram shown in Figure 1. N 2 3 4 5 6

Maximum count from a single pattern, c 52 23 10 4 3

Noise threshold (¥c rounded) 7 5 3 2 2

Frequency of patterns exceeding noise 1483 801 597 95 6

Table 2. Counting statistics for the determination of suitable POSgrams.

Figure 1. Counts for significant POSgrams in the small corpus.

Barry Morley / Patricia Sift

106

Given the POSgram counts of directive speech acts found in the subset of the small corpus, 400-500 returns were expected from the corpus as a whole. Patterns for which this number of returns occurs are highlighted in Figure 1. Two other groups of returns can be seen with clear gaps between the count levels. Those over 700 form common patterns in the general text of this period which show no special qualities in the determination of directives. Those around 200 and below form the pattern VB * and will be considered further in section 6. Of the highlighted returns, only PRO MD VB displays structure outside a random grammatical pattern and corresponds to one of the known predictable patterns outlined in section 3.

6. POS pattern filters for Middle English 477 returns were found for the pattern PRO MD VB.6 Pronoun-modal verb inversion was also accounted for, resulting in a further 70 returns, and certain conditions allowing the pattern to be spread further than the originally anticipated 2 wildcards provided a further 19 output matches. These returns were reported as whole sentences and, after the application of filtering rules outlined below, had repetitions removed. The output was analysed by hand in order to verify correct extraction. Patterns were sought in those incorrectly returned sentences. These patterns were: x Removal of marked narratives i.e. those returned patterns preceded by some part of said, bade, quod, etc.; x L1 (‘left one’) collocates as follows: 6

As the aim of the study is to automate the process of directive speech act extraction as much as possible, all the different patterns identified in section 3 which are contained within the pattern PRO MD VB will be dealt with together so as to establish the simplest and most efficient general rules possible. Therefore this pattern corresponds to the combination of patterns B.c, C and D.c from Table 1.

Towards the Automatic Identification of Directive Speech Acts

107

relative pronoun – provides additional information only (99 cases); the conjunction how – as above (11 cases); o as – coincides entirely with unmarked narratives (further data required to establish a solid link; 9 cases); o than, though, why, what – low count of only 7 cases requires further data. x might/may – provides suggestions and possibilities. Here, they have been considered not strong enough to be directive. (51 cases); x 1 st/3rd person + will/shall/would/should – This is included for statistical purposes as the vast majority of returned cases (80%+) were found to be part of an unmarked narrative. It is possible for directives to occur with the shall/should modal elements of this pattern, for instance in a structure such as “Every good Christian man, he should pray…”. However, here we opted for omission of the pattern for the betterment of the statistics. The other predictable patterns outlined in section 3 were then searched for. The returned counts are shown in Table 3 together with the pattern or group of patterns they correspond to from Table 1. Performative (A)

I + VB + you

14

Let-paraphrase (B.b + D.b)

Let + PRO/DET

7

Hortative (B.a)

VB + we

49

PRO BE TO VB

9

Other subjunctive (D.a)

VB + he/she/they

99

Imperative (E)

VB + filtering

Table 3. Returned pattern counts for other predictable directive forms.

It should first be noted that, as expected, these counts are significantly smaller than those for the pattern PRO MD VB. Of all the cases of letparaphrase and PRO BE TO VB, all fall within narrative in the small corpus and so are omitted here. Imperatives are included here solely for the purposes of illustration and have not been included further in

Barry Morley / Patricia Sift

108

the study. However, it has been noted that their automatic identification can only take place through the identification of a verb (of which there are many!) followed by suitable filtering schemes. The remaining patterns (including imperatives) all contain elements which correspond to the VB * form noted as the low count groups shown in Figure 1. Performatives are filtered by removing marked narrative and inappropriate verbs located in the central slot of the pattern. Hortative subjunctives provide excellent results (as given in section 7) when filtered using the rules: x Remove marked narrative; x Remove L1 collocate: but/whether – conditionals; (t)herefore – following information only. Other subjunctives consist of a very high proportion of sentences involved in narrative. Therefore, following the removal of a variety of narrative markers (said, bade, quod, etc.) as well as other temporal discourse markers (then, after this, so, etc.) results are greatly improved without the loss of acceptable directive speech acts.

7. Comparison of automatic extraction and hand analysis Counts of hand-identified directive speech acts (of various categories) from throughout the small corpus are given in Table 4.

Cmroyal Cmmirk Cmwycser

TOTAL

PRO MD VB (B.c, C, D.c)

Perf. (A)

Hort. (B.a)

1 85 88 174

4 8 1 13

1 7 29 37

Other subj. (D.a) 0 2 1 3

Unpredictable (E, F) 4 21 0 25

Table 4. Hand-counted categorization of speech acts from the small corpus.

TOTAL

10 123 119 252

Towards the Automatic Identification of Directive Speech Acts

109

Having been checked against the hand-verified output, the counts resulting from automatic extraction using the Perl computer program are given in Table 5. This program, developed especially for this study employs the filters outlined in section 6 in order to remove false alarms. These counts have been separated into positive returns (‘hits’) and negative returns (‘false alarms’ or FAs). Note that, as mentioned above, patterns of types B.b and D.b are all filtered out as being contained within identifiable narrative structures and so are not reported here as they bear no significance on the results.

cmroyal cmmirk cmwycser TOTAL

PRO MD VB (B.c, C, D.c) HITS FAs 1 5 81 12 88 15 170 32

Performatives (A) HITS FAs 2 1 8 1 0 0 10 2

Hortatives (B.a) HITS FAs 1 0 6 0 28 1 35 1

Other subj. (D.a) HITS FAs 0 0 2 1 1 3 3 4

Table 5. Categorized counts of automatically extracted directives (with false alarms).

Using the analysis statistics outlined in section 4, the above information yields a recall of 87% and a precision of 85% for this study. Given the variability inherent within Middle English, this is a reasonable result. Further confidence can be gained from the fact that the false alarms returned here are almost uniformly located in unmarked narratives and the misses in the unpredictable patterns identified in section 3. Precision could be increased in future by considering the effects of filtering on narrative discourse markers such as in tyme and aftyr as can be found in two of the four false alarms in category D.a. The application of such filters would have to be such that they have little or no effect on the recall statistics. The unpredictable patterns that lower the recall value stem from the replacement of a pronoun in a predictable pattern by a noun phrase (as illustrated in the examples in section 3). Therefore, it can be said with confidence that the already reasonable statistics presented here can be improved further by the creation of narrative and noun phrase identification schemes for

110

Barry Morley / Patricia Sift

Middle English. It may also be possible to overcome the latter of these two problems using a reductive technique, i.e. searching for X MD VB and removing cases where X is a pronoun, and then considering patterns in the replacement element.

8. Conclusion Following the identification of statistically significant POSgram patterns associated with directive speech acts in a selection of Middle English sermons, a series of filtering rules have been developed that allow 87% recall and 85% precision in the automatic identification of these directive structures when measured against the manually verified equivalent. The only remaining failures of this filtering system are restricted to the identification of unmarked narratives and noun phrases as a replacement for the more predictable simple pronoun in various well-known directive forms. These two elements provide well directed areas for study into the improvement of the system in the future. In addition to this, the system will be useful in augmenting manual searches for pilot studies aimed at establishing research hypotheses. This study has established the filtering rules for one text type known to have contained directive speech act formulae. In the future, it will be possible to widen the search to further text types of religious instruction (e.g. religious treatises) and periods (Early Modern English and Modern English) as well as general instruction.

References Allan, Keith 1998. Meaning and Speech Acts. Online at www. arts.monash.edu.au/ling/staff/allan/papers/speech_acts.html, last accessed on 31/07/2005.

Towards the Automatic Identification of Directive Speech Acts

111

Bach, Kent/Harnish, Robert M. 1979. Linguistic Communication and Speech Acts. Cambridge Mass: MIT. Ballmer, Thomas T./Brennenstuhl, Waltraud 1981. Speech Act Classification: A Study in the Lexical Analysis of English Speech Activity Verbs. Berlin: Springer. Erbe, Theodor (ed.) 1905. Mirk’s Festial. A Collection of Homilies, by Johannes Mirkus (John Mirk), Part I, Early English Text Society, Extra Series 96. Helsinki Corpus of English Texts (diachronic part): file ceserm1a (John Fisher, Sermon against Luther) file ceserm1b (Hugh Latimer, Sermon of the Ploughers) file ceserm3b (Jeremy Taylor, The Marriage Ring) Hudson, Anne (ed.) 1983. English Wycliffite Sermons, Vol. 1. Oxford: Clarendon. Jucker, Andreas H./Jacobs, Andreas 1995. The Historical Perspective in Pragmatics. In Jucker, Andreas H. (ed.) Historical Pragmatics: Pragmatic Developments in the History of English. Amsterdam: Benjamins, 3-35. Kohnen, Thomas 2002. Towards a History of English Directives. In Fischer, Andreas/Tottie, Gunnel/Lehmann, Hans Martin (eds) Text Types and Corpora. Studies in Honour of Udo Fries. Tübingen: Narr, 165-175. Kohnen, Thomas 2004. Methodological Problems in Corpus-based Historical Pragmatics. The Case of English Directives. In: Aijmer, Karin/Altenberg, Bengt (eds) Advances in Corpus Linguistics. Papers from the 23rd International Conference on English Language Research on Computerized Corpora (ICAME 23) Göteborg 22-26 May 2002. Amsterdam: Rodopi, 237-247. Lenten Sermons, downloaded from the Oxford Text Archive (www.ota.ahds.ac.uk). Lollard Sermons, downloaded from the Oxford Text Archive. Penn-Helsinki Parsed Corpus of Middle English (PPCME2): file cmmirk.m34 (John Mirk’s Festial) file cmroyal.m34 (Middle English Sermons, from MS Royal) file cmwycser.m3 (English Wycliffite Sermons) Powell, Susan (ed.) 1981. The Advent and Nativity Sermons from a Fifteenth-Century Revision of John Mirk’s Festial. Edited from B.L. MSS Harley 2247, Royal 18 B XXV and Gloucester

112

Barry Morley / Patricia Sift

Cathedral Library 22. Middle English Texts 13. Heidelberg: Carl Winter. Ross, Woodburn O. (ed.) 1940. Middle English Sermons, Edited from British Museum MS Royal 18 B. XXIII, Early English Text Society, Original Series 209. London: Oxford University Press. Sadock, Jerrold M. 1994. Toward a Grammatically Realistic Typology of Speech Acts. In Tsohatzidis, Savas L. (ed.) Foundations of Speech Act Theory. London: Routledge, 393-406. Sadock, Jerrold M. 2004. Speech Acts. In Horn, Laurence R./Ward, Gregory (eds) The Handbook of Pragmatics. Oxford: Blackwell, 53-73. Sbisà, Marina 1995. Speech Act Theory. In Verschueren, Jef/Östman, Jan-Ola/Blommaert, Jan (eds) Handbook of Pragmatics. Amsterdam: Benjamins, 495-506. Searle, John R. 1969. Speech Acts. Cambridge: Cambridge University Press. Searle, John R. 1976. The Classification of Illocutionary Acts. Language in Society 5, 1-24. Sift, Patricia Forthcoming. Face-Work in Early English Sermons: A Corpus-Based Study of Directives in Late Middle English and Early Modern English Prose Sermons (1350-1500). Doctoral Dissertation, University of Duisburg-Essen.

Modern English

HELENA RAUMOLIN-BRUNBERG

Leaders of Linguistic Change in Early Modern England1

1. Introduction This chapter is part of my research project ‘Language change and the individual’, which focuses on the different ways people behave under ongoing linguistic change. My research topics include general issues such as the longitudinal study of the linguistic behaviour of individuals (Raumolin-Brunberg 2005a) and more specific questions like the leadership of linguistic changes. My material consists of Late Middle and Early Modern English personal correspondence. Despite problems such as limited quantity and uneven representativeness, diachronic data have an advantage over synchronic material in offering the depth of time that longitudinal analyses require. The leadership of linguistic change is a complex phenomenon (Labov 2001: 323-411), and I will restrict my analysis to the following three questions: (1) Is it possible to trace the leaders of a particular change on the basis of what we know about the social trajectory of this shift? (2) Were there people who could be characterized as innovative in their linguistic behaviour in general; in other words, did the same people lead several changes? (3) In what types of social networks did the leaders live? This study deals with three morphological changes: the introduction of the object pronoun YOU2 into the subject function, the replacement of the suffix -TH by -S in the indicative third person 1

2

The research reported here was supported in part by the Academy of Finland Centre of Excellence funding for the Research Unit for Variation and Change in English at the Department of English, University of Helsinki. Small capitals are used to refer to the linguistic items studied at an abstract level, covering all spelling variants; e.g., YOU for you, yow, 85%

Table 1. The phases of a linguistic change.

In the analysis I compared individual scores with the corpus aggregates. The informants with a personal score over 30 during the incipient phase were singled out as incipient leaders. In other words, at a time when the corpus aggregate was below 15 per cent, those individuals who used the new variant in over 30 per cent of the cases were identified as leaders. This analysis includes all informants with ten or more occurrences of the variable. For the new and vigorous phase, when the corpus aggregate of the incoming variant was between 15 and 35 per cent, those people whose personal scores exceeded 50 were identified as the leaders.

126

Helena Raumolin-Brunberg

This division may be clarified by Figure 1, which shows that the period from 1410 to 1499 represents the incipient phase, since the proportion of YOU is below 15 per cent, whereas the timespan from 1500 to 1539 designates the new and vigorous period, with the proportion of YOU between 15 and 35 per cent. As far as MY/THY versus MINE/THINE is concerned, the incipient phase spans the 1410s to 1459, while the period 1460-1539 represents the new and vigorous stage (Figure 3). The rise-fall-rise model of -S, illustrated in Figure 2, makes the treatment of this change more complicated. As already mentioned, there are good reasons to see this change as two developments, at least in the South, and so this shift was divided into two waves with two incipient and two new and vigorous phases. It is also possible that the first new and vigorous period is more of a corpus artefact than a real development, since the relatively high frequency of -S largely stems from London merchants, whose contribution to the corpus is relatively large during this period. Nevertheless, I decided to regard the period 1410-1479 as incipient and 1480-1499 as new and vigorous for the first wave, and 1500-1579 as incipient and 1580-1599 as new and vigorous for the second wave.

4.2. The findings Tables 2-5 introduce the people identified as leaders according to the above-mentioned criteria. Three incipient and seven new and vigorous leaders were singled out for YOU v. YE. The corresponding figures for the third person suffix -S were three and two for the first wave and four and four for the second.

Leaders of Linguistic Change in Early Modern England NAME

SCORE

Incipient leaders 1410-1499 Kesten Thomas 93 Dalton John 61 Paston Walter 60

SOCIAL STATUS

REGION

Merchant Merchant Gentry

London, Calais Leicestershire, London, Calais Norfolk, Oxford

New and vigorous leaders 1500-1539 Henry VIII 100 Royalty (autograph letters) De la Pole Edmund, 100 Nobility Duke of Suffolk Grey Henry, Duke 92 Nobility of Suffolk More Thomas 74 Upper Gentry Willoughby Edward 67 Upper Gentry Plantagenet Honor, 65 Nobility Lady Lisle Pole Germayn 65 Gentry

127

London, Court Oxford, Continent, London Leicestershire, Court London, Court Dorset, Devon, Chester, Hertforshire, Essex Hampshire, London, Calais, Court Derbyshire

Table 2. Adoption of subject YOU: the leaders of the change.

NAME

SCORE

Incipient leaders 1410-1479 Greene Godfrey 100 Cely Richard jr Rocliffe Brian

96 46

SOCIAL STATUS

REGION

Gentry professional Merchant Gentry professional

North, London

New and vigorous leaders 1480-1499 Cely Richard jr 96 Merchant Page Richard 93 Professional

London, Calais North, London

London, Calais London, Kent

Table 3. Adoption of third-person sg suffix -S, first wave: the leaders of the change.

Helena Raumolin-Brunberg

128 NAME

SCORE

Incipient leaders 1500-1579 Preston Richard 100 67 Percy Henry, 5th Duke of Northumberland Harvey Gabriel 39 Johnson Sabine 38

SOCIAL STATUS

REGION

Non-gentry Nobility

Northamptonshire North

Professional? Merchant

Essex, Cambridge Northamptonshire

New and vigorous leaders 1580-1599 Chamberlain 100 Gentry John Henslowe Philip 55 Non-gentry, theatrical manager Dudley Robert, 52 Nobility Earl of Leicester Elizabeth I 50 Royalty

London London

Court Court

Table 4. Adoption of third-person sg suffix -S, second wave: the leaders of the change.

As Table 5 shows, the pattern was different in the case of MY/THY v. MINE/THINE, as no incipient leaders appeared, while there were three people who fulfilled the criteria for new and vigorous leadership. There were a few individual occurrences of the short form before 1460, but the individual scores did not reach the required 30. NAME

SCORE

SOCIAL STATUS

REGION

North, London

91

Gentry professional Nobility

50

Nobility

Hampshire, London, Calais, Court

Incipient leaders 1460-1539 Plumpton Edward 94 Stanley Edward, Earl of Darby Plantagenet Honor, Lady Lisle

Court

Table 5. Loss of -n- in first and second person possessives: the leaders of the change.

Leaders of Linguistic Change in Early Modern England

129

Let us now turn to the three questions that were posed in the introduction. Had the leaders the same characteristics as the leading groups of each change? In other words, could the leaders be traced on the basis of what we know about the general trajectories of the changes? This should be the case if Chambers’s argument (2003: 93) that sociolinguistic research has uncovered very few idiosyncratic cases holds true. Indeed, it seems that the leading positions can to a large extent be explained by what we know about the general social embedding of the changes. The social status and regional backgrounds of the leaders often comply with the general corpus findings. As regards the social status, the incipient leaders of YE/YOU and the first wave of -S represent the middle ranks, among whom these changes originated according to the CEEC (Nevalainen/RaumolinBrunberg 2003). The second-wave incipient leaders of -S came from the lower ranks, as expected. Here, however, as a reflection of the fact that this change was in the first place a region-driven development, region actually overrides social status in one case. Henry Percy, the fifth Duke of Northumberland, in other words, a northern nobleman, is found among the incipient leaders during a period when -S was mostly used by the lower or middling ranks. As regards Sabine Johnson and Richard Preston, the former was the wife of a London/Calais merchant and the latter a servant of the Johnson family. Although Johnson and Preston mostly stayed on their estate in Northamptonshire, their contacts with the capital were frequent and close. The new and vigorous leaders represent the upper ranks and most of them spent at least some time at court or in London. As regards the new and vigorous leaders of the loss of the nasal in the possessives, two people were attached to the court and the third leader came from the north. The first wave of new and vigorous leaders of the use of the -S suffix were middle-ranking Londoners, while the corresponding second-wave leaders came from the capital area, representing a broad spectrum of social ranks. In broad terms this is what we might expect, even the fact that the leaders of the use of -S had varying social backgrounds. This becomes understandable if we consider that the suffix -S diffused from the lower social echelons upwards during our period. As women were found to be ahead of men in all our changes, the low number of female leaders is not quite expected. This is at least

130

Helena Raumolin-Brunberg

partly a consequence of the corpus structure, as there is so little material by many of the female informants that the requirement of ten occurrences is not fulfilled. Only three female informants are among the leaders: Honor Plantagenet, Lady Lisle, Queen Elizabeth I, and the merchant’s wife Sabine Johnson.5 The second question deals with the possibility of the same people leading several concurrent changes. As our changes shared many external constraints and, as shown above, the leaders emerged from the leading groups, it would not have been surprising to find the same individuals as leaders, but this is the case with one person only. Honor Plantagenet was a new and vigorous leader in the introduction of both YOU and short possessives. The fact that no more than one person was leading more than one change suggests that there may not be much justification for generally characterizing people as advanced or conservative in their language use. Individuals may be advanced in one change but conservative in another. My third finding deals with the social networks. Although the historical reconstruction of the social networks is a demanding task and caution is needed here, it seems to me that some observations can be made on the basis of my material. The incipient leaders of YOU and the first-wave leaders of -S appear to have been geographically mobile people, most likely with a great many weak links, which according to James and Lesley Milroy (1985), promote the diffusion of linguistic changes. For example, the wool merchants Thomas Kesten, John Dalton and Richard Cely bought wool in the English countryside, looked after their businesses in London and Calais and travelled in the Low Countries. Godfrey Greene and Brian Rocliffe were lawyers travelling between Yorkshire and London. The new and vigorous leaders, on the other hand, seem to have been individuals with an influential position, in other words, the kind of people Labov (2001: 385-411) speaks of as leaders of linguistic changes. It is hardly wrong to suggest, for instance, that the rulers of 5

A further reason for the limited number of woman leaders is the fact that I have excluded all letters written by scribes. Most medieval women had to rely on scribal assistance because of their illiteracy. Unlike this study, Nevalainen/Raumolin-Brunberg (2003) included some scribal data for the calculation of the corpus aggregates.

Leaders of Linguistic Change in Early Modern England

131

the country like Henry VIII, leading the use of YOU, and Elizabeth I, an advanced user of third-person -S, were central figures in their social networks, as was also the case with the courtiers and noblemen among the new and vigorous leaders. As Tables 2-5 show, the sphere of activities of these people covered various parts of the country. It seems that in order to be diffused to all ranks and all over the country a change needed to be adopted by the topmost social ranks. Perhaps it is not only the social status that is at issue here but also the type of social networks.

5. Discussion and conclusion Apart from the findings introduced above, there may be some more general conclusions to be drawn from this study. My findings seem to corroborate the curvilinear hypothesis; that is, the argument that linguistic changes originate in the interior social groups, not in the highest or the lowest. As already pointed out, the first occurrences of both YOU and -S in the CEEC were found in the language of middleranking people; such as merchants and lawyers. Tables 2-4 show that most of the early incipient leaders belonged to these middle strata. My findings may also have some general relevance for the role of social networks in the diffusion of linguistic changes. It seems to me that my study helps to realize that the two well-known network approaches are not necessarily in conflict, although it might seem to be so. James and Lesley Milroy (1985) have argued that weak links in loose-knit social networks typically promote the diffusion of linguistic changes, while multiplex dense social networks tend to hinder change. On this view, one would expect people with weak links to lead changes. William Labov (2001: 325-365) claims that, at least in Philadelphia, linguistic leaders are influential central people in their social networks, with a high degree of interaction within the block they live in and a large proportion of wider contacts off the block. It is, of course, quite possible that the differences between the arguments go back to diverging linguistic and social embeddings of the

132

Helena Raumolin-Brunberg

phenomena studied, as Labov has claimed, but it may just be a question of focus. The Milroys seem to concentrate on the incipient phase, while Labov’s arguments explicitly deal with new and vigorous changes (2001: 385). I hope to have shown that the linguistic behaviour of individuals can be investigated in a historical context. However, I think that this is only possible if there is sufficient baseline data against which the individual usage can be compared and analysed. The baseline data and its analysis are only possible with representative electronic corpora.

References Bailey, Guy/Maynor, Natalie/Cukor-Avila, Patricia 1989. Variation in Subject-verb Concord in Early Modern English. Language Variation and Change 1, 258-300. Busse, Ulrich 2002. Linguistic Variation in the Shakespeare Corpus: Morpho-syntactic Variability of Second Person Pronouns. Amsterdam/Philadelphia: Benjamins. Chambers, Jack 22003. Sociolinguistic Theory. Linguistic Variation and its Social Significance. Oxford: Blackwell. Corpus of Early English Correspondence (CEEC) 1998. Compiled by Terttu Nevalainen/Helena Raumolin-Brunberg/ Jukka Keränen/Minna Nevala/Arja Nurmi/Minna PalanderCollin at the Department of English, University of Helsinki. Ferguson, Charles 1996. Variation and Drift: Loss of Agreement in Germanic. In Guy, Gregory R./Feagin, Crawford/Schiffrin, Deborah/Baugh, John (eds) Towards a Social Science of Language: Papers in Honor of William Labov. Vol. 1. Variation and Change in Language and Society. Amsterdam/ Philadelphia: Benjamins, 173-198. Holmqvist, Bengt 1922. On the History of the English Present Inflections, Particularly -th and -s. Heidelberg: Carl Winters Universitätsbuchhandlung.

Leaders of Linguistic Change in Early Modern England

133

Kytö, Merja 1993. Third-person Present Singular Verb Inflection in Early British and American English. Language Variation and Change 5, 113-139. Labov, William 1994. Principles of Linguistic Change. Vol. I: Internal Factors. Language in Society 20. Oxford, UK/ Cambridge, USA: Blackwell. Labov, William 2001. Principles of Linguistic Change. Vol. II: Social Factors. Language in Society 29. Oxford, UK/Cambridge, USA: Blackwell. Lutz, Angelica 1998. The Interplay of External and Internal Factors in Morphological Restructuring: The Case of you. In Fisiak, Jacek/Krygier, Marcin (eds), Advances in English Historical Linguistics (1996). Berlin/New York: Mouton de Gruyter, 189210. Milroy, James/Milroy, Lesley 1985. Linguistic Change, Social Network and Speaker Innovation. Journal of Linguistics 21, 339-384. Mustanoja, Tauno F. 1960. A Middle English Syntax. Part I. Mémoires de la Société Néophilologique de Helsinki 23. Helsinki: Société Néophilologique. Nevalainen, Terttu/Raumolin-Brunberg, Helena 2000. The ThirdPerson Singular -(E)S and -(E)TH Revisited: The Morphophonemic Hypothesis. In Dalton-Puffer, Christiane/Ritt, Nikolaus (eds) Words: Structure, Meaning, Function. A Festschrift for Dieter Kastovsky. Trends in Linguistics. Studies and Monographs 130. Berlin: Mouton de Gruyter, 235-248. Nevalainen, Terttu/Raumolin-Brunberg, Helena 2003. Historical Sociolinguistics: Language Change in Tudor and Stuart England. Longman Linguistics Library. London: Longman. Nevalainen, Terttu/Raumolin-Brunberg, Helena/Trudgill, Peter 2001. Chapters in the Social History of East Anglian English: The Case of Third Person Singular. In Fisiak, Jacek/Trudgill, Peter (eds) East Anglian English. Woodbridge: Boydell & Brewer, 187-204. Ogura, Mieko/Wang, William S.Y. 1996. Snowball Effect in Lexical Diffusion: The Development of -s in the Third Person Singular Present Indicative in English. In Britton, Derek (ed.) English Historical Linguistics 1994: Papers from the 8th International

134

Helena Raumolin-Brunberg

Conference on English Historical Linguistics. Amsterdam/ Philadelphia: Benjamins, 119-141. Raumolin-Brunberg, Helena 2005a. Language Change in Adulthood: Historical Letters as Evidence. European Journal of English Studies (Thematic issue on Letters and Letter Writing, ed. by Nevala, Minna/Palander-Collin, Minna) 9/1, 37-51 Raumolin-Brunberg, Helena 2005b. The Diffusion of YOU: A Case Study in Historical Sociolinguistics. Language Variation and Change 17/1, 55-73. Raumolin-Brunberg, Helena/Nevalainen, Terttu Forthcoming. Historical Sociolinguistics: The Corpus of Early English Correspondence. In Beal, Joan C./Corrigan, Karen/Moisl, Hermann (eds) Models and Methods in the Handling of Unconventional Digital Corpora. Volume 2: Diachronic Corpora. New York: Palgrave. Schendl, Herbert 1997. Morphological Variation and Change in Early Modern English: my/mine, thy/thine. In Hickey, Raymond/ Puppel, Stanislaw (eds) Language History and Linguistic Modelling: A Festschrift for Jacek Fisiak on his 60th Birthday. Vol. I. Language History. Berlin/New York: Mouton de Gruyter, 179-191. Stein, Dieter 1987. At the Crossroads of Philology, Linguistics and Semiotics: Notes on the Replacement of th by s in Third Person Singular in English. English Studies 5, 406-431. Wyld, Henry Cecil 31936. A History of Modern Colloquial English. Oxford: Blackwell.

HANS MARTIN LEHMANN / CAREN AUF DEM KELLER / BENI RUEF

ZEN Corpus 1.0

1. Introduction The Zurich English Newspaper Corpus (henceforth ZEN Corpus) consists of early English newspapers published in London between 1661 and 1791. It documents newspapers as an emerging genre, from the early issues of The London Gazette up to the period of the first publication of The Times. Regular newspapers (for example, twiceweekly papers) appeared from the middle of the 17th century onwards, with the first publication of The London Gazette in 1665. The corpus ends six years after The Times was founded in 1785. The ZEN project was initiated in the early 1990s by Udo Fries. What began as a small project for an exclusive circle of students interested in media language gradually expanded into a 1.6-millionword corpus of pre-19th-century London newspapers. Over the years, many collaborators have made the release of the ZEN Corpus possible and have published on various aspects of the corpus, among them Fries (1994, 1997a/b, 2001a/b, 2003), Fries/Schneider (2000), Xekalakis (1999), Fischer/Schneider (2002), Studer (2003) and auf dem Keller (2004). This chapter focuses on the advances that led up to the first public release on CD-Rom. For more detailed documentation of the earlier stages of the project, we refer to Fries (1994) and Fries/Schneider (2000).

136

Hans Martin Lehmann / Caren auf dem Keller / Beni Ruef

2. The ZEN Corpus: Source Materials The ZEN Corpus consists of 349 newspaper issues covering 130 years. It contains 1.6 million words that were keyed in manually. It includes ten newspaper types that have multiple issues: The London Gazette 1671-1761, 1781-91; The Flying Post 1701-21; The Post Boy 1701-21; The Post Man 1711-21; The Daily Courant 1711, 1731; The Country Journal 1731, 1741; The Daily Post 1731, 1741; The London Chronicle 1761, 1781, 1791; The London Evening Paper 1761, 1781; and The Morning Chronicle 1781-91. The London Gazette was the first official, regular newspaper in London. It was founded in Oxford and moved to London when the Great Plague was finally overcome after the Great Fire of 1665. The London Gazette was published twice weekly, on Mondays and Thursdays. It was first issued as a single leaf, printed on both sides. The layout was in two columns divided by a rule. Some newspapers followed the style of The London Gazette, for example The Currant Intelligence or Smith’s Protestant Intelligence: Domestick & Foreign (Sampson 1974: 46). Between 1695 and 1702, the first thrice-weekly Posts appeared, of which The Post Boy, The Post Man, and The Flying Post are included in the ZEN Corpus. The first daily newspaper, The Daily Courant, appeared in 1702 (cf. Morison 1932: 73). Regarding the typographical layout, The Daily Courant was modelled on the style of The London Gazette, with the exception of the rule dividing the two columns. This is illustrated in Figure 1.

ZEN Corpus 1.0

137

Figure 1. Front pages of The London Gazette and The Daily Courant.

The 18th century saw the following developments: The Evening Post was the first evening paper, published in 1706; The St. James’s Post, With the best Occurrences Foreign and Domestick, published in 1715, was the first morning paper; weeklies included, for example, The Weekly Journal, founded in 1713, or Read’s Journal, probably founded in 1715; The Daily Post appeared from 1719 onwards, and The Daily Journal was published for the first time in 1720; the first Daily Advertiser appeared as early as 1730 (cf. Morison 1932). With the exception of St. James’s Post, all these papers are included in the ZEN Corpus.

Hans Martin Lehmann / Caren auf dem Keller / Beni Ruef

138

Figure 2. Front pages of The General Evening Post and The London Chronicle.

This brief overview of the development of newspapers shows that an extraordinary diversification of newspapers took place within only a single century. Starting with pamphlets (i.e. papers that only appeared to mark particular events), regular newspapers appeared at the beginning of the 18th century. These regular publications further developed into monthly journals, weeklies, twice-weeklies, thriceweeklies, morning and evening papers, and even advertisers. This diversification of newspapers is represented in the ZEN Corpus, which comprises 52 different types of newspapers. Table 1 presents the 52 papers according to their frequency of publication. Daily 11

Thriceweekly 13

Twice weekly 6

Weekly

Monthly

Unclear

17

1

4

Table 1. Overview of the frequency of publication.

ZEN Corpus 1.0

139

The original material for the ZEN Corpus was sampled from newspaper issues copied from microfilms from the British Library in London. Some issues were of poor quality; for example, individual words or whole passages were illegible. Due to the limited quality of the extant material and the irregular typefaces used at that time, optical character recognition was not a viable option. For this reason, students and staff of the University of Zurich English Department, more than a hundred people in all, keyed in the material manually, which proved to be extremely time-consuming. The transcription covers the entire content of the newspapers, with the exception of lists of names, places, addresses, stocks, prices and goods, as well as poems and Latin texts of more than one line.

3. XML format of the ZEN Corpus The first issues of the ZEN Corpus were transcribed in 1991. At that time, the files were transcribed in Microsoft Word format and transcribers used the formatting capabilities of the word processor to represent much of the original formatting. From today’s perspective, this was a questionable choice. However, more structured approaches were not within the scope of the project at that time. XML did not yet exist, and SGML-aware tools were not readily available in the early nineties, at least not on personal computers. For the later stages, transcription guidelines were established and the transcriptions were undertaken in text editors. The annotation was added with cocoa bracketing. Early texts were automatically converted to this format. The resulting version of the corpus was then proofread for typing errors. It is with that version of the corpus that we started our project for a public release of the corpus formatted in XML. The heterogeneous origin and history of the corpus files presented us with a wide range of problems. The various word processors and text editors used on at least three different operating systems left us with a variety of different character encodings. In some cases, individual files contained three different encodings happily

140

Hans Martin Lehmann / Caren auf dem Keller / Beni Ruef

mixed and matched. The automatic conversion of the early files transcribed in Microsoft Word format introduced a new set of problems of its own. One of them was end-of-line hyphenation that was left behind by the removal of the end-of-line information and resulted in a three-way ambiguity between hyphenated compounds, end-of-line hyphenation and dashes. Another source of problems was errors and inconsistencies in the mark-up. On the one hand, adding the mark-up manually inevitably resulted in missing, incomplete and mistyped tags. On the other hand, the many transcribers and – at times – even their supervisors interpreted the transcription guidelines differently. One of the most irritating problems resulting from misinterpretation of the guidelines was the use of

elements intended for paragraph endings, which were often erroneously used to indicate the end of a line. This basic confusion of text structure and text formatting proved to be the root of many inconsistencies. The conversion of the original format into a TEI-conformant version was undertaken in two steps. First, the corpus was converted into well-formed XML; second, the well-formed XML version was transformed by means of an XSLT stylesheet into a TEI-conformant version. This transformation primarily involved the renaming of tags. For instance, the original elements, marking italic type, had to be replaced by the TEI-conformant . The conversion routines that transformed the original into wellformed XML had to make certain minimal assumptions regarding the original format. In those cases where these assumptions turned out to be wrong, we either adapted the conversion routine to account for the original version, or we changed the original version to meet the assumptions. Generally, manual changes were performed on the original version, whereas automatic and heuristic changes were incorporated into the conversion routine. This strategy leaves open the possibility of correction and further development of the conversion routines. At the same time, it also safeguards the manual work incorporated in the original version, which in turn can serve as a fallback position for future work on the corpus. Finally, the TEI-conformant version was validated against an XML schema that was formulated as strictly as possible. For example, p elements cannot directly contain character data, i.e. all text must be

ZEN Corpus 1.0

141

within an s, head or dateline element. This schema validation uncovered further markup errors, which could then be corrected. As a result of this process, the ZEN Corpus in its release version is TEI-conformant, meaning its encoding follows the current version (TEI P4) of the Guidelines for Electronic Text Encoding and Interchange as set out by the Text Encoding Initiative (TEI) in Sperberg-McQueen/Burnard (2002). However, the TEI Guidelines are just that: guidelines, i.e. a framework within which the compiler of a corpus is left with many choices concerning the annotation and segmentation of the data. In the rest of this section, we illustrate some of these choices. See http//:es-zen.unizh.ch for a detailed description of the coding scheme. The files in the ZEN Corpus correspond to individual newspaper issues. They are structured as follows. The body element is divided into div elements. The div element contains head, dateline and p elements. The p element is divided into s elements. According to the TEI Guidelines, head elements may contain s elements. Because most head elements are very short (‘LONDON’, ‘To be SOLD by AUCTION’), the encoding scheme of the ZEN Corpus does not allow s elements to occur within head elements. The div element marks the basic text unit and categorizes it according to text type. The text class of a div element is referred to by its decls attribute. Each div element is referenced by the value of its n attribute, which is unique within the whole corpus. At present, there are a total of 4714 div elements. head, dateline, and p elements are numbered per corpus file and are treated alike. For example, the first p element in a corpus file, if preceded by one head element, becomes

. s elements are numbered per p element, i.e. the third s-unit in the example above becomes . This makes it possible to refer unambiguously to any s element within the ZEN Corpus. Quotation marks are not retained but are replaced by qb (start of quotation) and qe (end of quotation) elements. These empty elements, so-called “milestone” tags, were chosen instead of q elements because many quotations cross the boundaries of s or even p elements. The texts, i.e. technically speaking the div elements, are classified in three dimensions: decade, newspaper, and text class.

142

Hans Martin Lehmann / Caren auf dem Keller / Beni Ruef

The decade, i.e. the publication date of a newspaper issue, is indicated in the creation element within the corresponding TEI header, e.g.: 1721

The newspaper classification is handled with the textClass element of the TEI header:

The target attribute in the catRef element above points to an entry in the newspaper taxonomy within the corpus header file:

The Athenian Mercury Applebee's Original Weekly Journal . .

The text class of a div element is indicated by its decls attribute, e.g.:

The decls attribute above refers to a textClass element in the corpus header file. The textClass definitions for the text.class scheme must be in the corpus header file to keep the values of the id attributes unique.

The target attribute above points, in turn, to an entry in the text class taxonomy within the corpus header file:

ZEN Corpus 1.0

143

. . Foreign News. . .

The double referencing (from ‘foreign.news’ to ‘FOR’ to ‘Foreign News’) is a technical necessity: because one TEI document within the ZEN Corpus contains one newspaper issue, which in turn contains several div elements belonging to different text classes, the div elements must be linked explicitly by means of their decls attributes with the corresponding textClass elements in the corpus header file. The choice of a TEI-conformant format has many advantages. On the one hand, it ensures a high consistency of linguistically relevant annotations. For example, each qb element is matched by a corresponding qe element. On the other hand, misleading typographical markup is removed (e.g. end-of-line hyphenation) or corrected (cf. the replacement of p elements by lb elements). In addition, the TEI Guidelines force the corpus compiler to provide the corpus with a minimum amount of metadata in a well-structured manner. However, the use of the TEI format not only improves the overall quality of a corpus; at least as important are the advantages for corpus compilers and corpus users. Corpus compilers with new projects can profit from TEI-aware editors and other XML tools, which allow the compilation of corpora whose markup is correct, i.e. TEI-conformant, in the first place. Many of the problems described above could have been avoided from the outset. Another advantage is that XML allows for very easy conversion to other formats by means of XSLT stylesheets. For instance, the creation of the text-only version of the ZEN Corpus was a matter of a few hours. Last but not least, the advent of XML-based tools will allow for complex queries in a standard way, i.e. without programming the user’s proprietary query tool. For example, XAIRA – the successor of the SARA software used for searching the British National Corpus – will operate on any corpus of well-formed XML documents.

144

Hans Martin Lehmann / Caren auf dem Keller / Beni Ruef

4. Components and Word-counts of the ZEN Corpus The ZEN Corpus is annotated for two main dimensions. The classification by decades is directly linked to the sampling criteria. For every decade, papers from year one were sampled. If there was not enough extant material, papers from the years zero and two were also included. There are, however, exceptions to this sampling strategy in the early decades, where relevant material is particularly difficult to find. These exceptions are limited to issues 3, 4, 6 and 13 of The Current Intelligence dating from 1666, which are all subsumed under the decade labelled 1671. Table 2 shows a cross-tabulation of the word-counts according to decades and text-types. The horizontal totals indicate the number of words per decade. From 1701 to 1791, the decades are each represented with some 100,000 words or more. For earlier decades, the material is not as abundant. Especially the period 1661 contains only 4,412 words. For many purposes, it may make sense to collapse the first two decades and treat the resulting class as pre-1681. The ZEN Corpus is also annotated for domain or text-class. This classification began as a system of ad-hoc decisions taken by the transcribers; a classification of text-types based on external criteria is bound to reflect specific research interests. Over the years, different collaborators changed and shaped the classification system, which, as a result, is far from concise. It is astonishing to see classes like crime and deaths at the same classificatory level as home news and foreign news; crime obviously being a subclass of either foreign news or home news. Besides reflecting special research interests, some text-classes were created because they seemed formulaic and repetitive, and it was thus found to be desirable to treat them separately. Such classes are births, deaths, weddings and ship news. Fries (2001a: 178) suggests including accidents, births, crime, deaths, lost and found, ship news and weddings in home news. This may be specifically advisable for research, where formulaic language is not a concern. However, we decided not to collapse these classes in the present release of the corpus, since the corpus user can easily collapse sub-classes.

ZEN Corpus 1.0

145

The word-counts in Table 2 are based on a simple algorithm for tokenisation. Tokens were derived by a space-and-punctuationdelimited approach. As a possible difference from tokenization strategies for present-day language, apostrophes are not treated as word boundaries. The majority of cases are genitives, as in majesty’s, or simple past forms of verbs, as in publish’d, where such a treatment is justified. As a matter of course, due to the differences in word-counts, normalization is compulsory in the comparison of different corpus components. Table 2 provides the necessary information for calculating relative frequencies for the most salient components of the corpus.

5. ZEN Online ZEN Online is a web-based search interface for the ZEN Corpus. Besides the obvious advantage of being server-based and platformindependent, its strengths are regular expression-based searches, the possibility of distributing search results according to the various dimensions of textual classification, and the direct link to graphical representations of the original newspapers. ZEN Online is implemented in PERL and makes extensive use of an SQL database for storing search results and textual classifications. The search strategy is based on classic regular expressionsearches over flat-files. Instead of parsing the XML-structured corpus for every search, ZEN Online optimizes searches by making use of a bridge version of the ZEN material, which allows for a flat-file search strategy based on regular expressions. Instead of searching the text in XML, ZEN Online searches the bridge version of the corpus. (1a) and (1b) show a sample of the ZEN Corpus in XML format, and the corresponding part of the bridge version.

146

Hans Martin Lehmann / Caren auf dem Keller / Beni Ruef

ZEN Corpus 1.0 (1a)

147

I am, Servant, AMB. GODFREY.

Sir,

May 15, 1761. street,Covent-Garden. To POST.

the

EDITOR

of

Your

humble

Southampton-

LLOYD'S

EVENING

SIR, In reading your last, I met with an account of the death of the Earl of Shelburne. (1b)

I am, Sir, Your humble Servant, AMB. GODFREY. May 15, 1761. Southampton-street, Covent-Garden. To the EDITOR of LLOYD'S EVENING POST. Sir, In reading your last, I met with an account of the death of the Earl of Shelburne.

The regular expression engine provided by the PERL programming language is a powerful tool for defining and locating patterns in unstructured text. Unlike index-based approaches, this strategy does not impose a specific tokenization and thereby a predefined view of the basic entities contained in the corpus. As a consequence, it is possible to formulate patterns based on parts of words and patterns containing optional elements. For example, the expression \S+ing\b can be used to retrieve all words ending in -ing, or the pattern \bmusick?\b can be used to retrieve the spelling variants music and musick. The bridge version is searched character-by-character. In the search patterns, alpha-numeric characters are interpreted literally, except if they are preceded by a backslash character as in \b, which stands for a word-boundary, or \S, which stands for any character appearing on screen. Non-alphanumeric characters often have a nonliteral interpretation, for example ?, which, in the pattern \bmusick?\b, specifies that the character to its left may be present or not. Regular expressions are ideal for explorative work, because they make it possible to formulate loose-fitting patterns that impose as few assumptions as possible on the phenomenon to be retrieved. In the case of the spelling variants of the word music, it is advantageous to formulate a looser pattern that makes fewer assumptions about the

148

Hans Martin Lehmann / Caren auf dem Keller / Beni Ruef

possible variants. The expression \bmus[icky]+\b requires the word to begin with mus, followed by any combination of the characters i, y, c and k. In the ZEN Corpus, this pattern reports not only the instances of musick and music, but also musik, musk and musky. In this way, it is possible to find variants of a phenomenon not previously thought of, like musik, at the possible expense of also retrieving other irrelevant instances. Once the variants are established empirically, it is possible to formulate a narrow pattern like \b(musik|music|musick)\b to retrieve the empirically established set of variants. Regular expressions allow the corpus linguist to cast the net wide and bootstrap the corpus, without making too many armchair assumptions. A thorough description of regular expressions is beyond the scope of this paper. See http://es-zen.unizh.ch for more examples specific to the ZEN material, and Wall et al. (1996: 57-76) for a detailed description of regular expression syntax. One of the main disadvantages of using a stripped version of a corpus is the loss of annotation. ZEN Online avoids this problem. Instead of simply deleting annotation, it keeps track of it in a database. As a result, the bridge version is indexed with the original corpus and its annotation. In this way, the results found in the bridge version are automatically linked to the original corpus and its annotation, like year, paper and issue number. In the following, we will use the example of words ending in -ic or -ick to showcase the functionality of ZEN Online. The pattern \b\S+((i|y)(c|k)+)\b retrieves any word ending in i or y, followed by any combination of the letters c and/or k. Entering this pattern in the search window of ZEN Online returns 3,944 results. A sample result window can be seen in Figure 3. The result window in Figure 3 displays pages of 20 results. The controls at the top left of the window offer easy access to the different pages of the result set. The page size is configurable by the user. The button labelled Sentence View switches between the KWIC view seen in Figure A and a sentence view in which whole s-units are displayed. The highlighted search results are links to the original XML version of the corpus. The link in the second column displays a scanned version of the original newspaper. The scanned newspaper material is available in DjVu format at a resolution of 600dpi. The choice of DjVu, a highly compressed format, permits fast transfer via the internet and direct

ZEN Corpus 1.0

149

display in most web-browsers like Internet Explorer or Netscape via a plug-in. See Bottou et al. (1998), http://djvulibre.djvuzone.org/ and http://www.lizardtech.com/ for more information on the DjVu format.

Figure 3. Result window for the query \b\S+((i|y)(c|k)+)\b in ZEN 1.0.

The pop-up menu on the left of the toolbar offers several additional functions. There is a Query History function that allows access to previously executed queries, as well as a help page with instructions and sample queries. There are also several functions that offer further processing of the query result. Given the flexibility of the search patterns, it is possible to retrieve many different result strings with a single query. As can be seen in Figure 3, the pattern \b\S+((i|y)(c|k)+)\b retrieves a broad variety of word-forms. The menu item Frequency List offers a list of the different items retrieved by the pattern, ordered by absolute frequency or alphabet. In the case of the set of different endings considered in our example, it would be interesting to produce a frequency list on the ending, disregarding the rest of the instances retrieved. This is possible, thanks to the back-referencing built into PERL. The round brackets in the regular expression \b\S+((i|y)(c|k)+)\b not only group items for alternation or quantification; they also produce a reference to what was matched by

150

Hans Martin Lehmann / Caren auf dem Keller / Beni Ruef

the expression in brackets. By selecting the option reference 1 in the search dialogue, it is possible to produce a frequency list of only the ending, because – in our case – the first set of brackets refers to the different endings. By reformulating our pattern as \b(\S+)(i|y)(c|k)+\b, we are still referring to the identical result set, but the first reference set by bracketing now refers to the word without the ending. Figure 4 illustrates these options with frequency information for the whole expression, for the first reference marking the ending, and for the first reference marking the match without the ending. The grouping is case-insensitive.

Figure 4. Frequency list based on matches of the whole pattern and on partial references.

Such frequency lists can be of great help in explorative work. The results are hot-linked to the result set, and by clicking on a link, all the instances subsumed under that row are opened in a new, reduced result set. For example, clicking on Mus in the third frequency list will open a new result window with all the instances beginning in Mus or

ZEN Corpus 1.0

151

mus, like musik, musick and music. Exploring the second frequency list shows that the -ik variants are marginal, with only the two relevant instances catholik and musik. The variants -yck, -yc and -yk may be irrelevant for our phenomenon, as they refer to proper names and place names. This firmly establishes -ic and -ick as the main variants to be considered. Given such flexibility in terms of focusing on the whole search expression or only on a specific part, the frequency list function can be used for a whole range of different purposes. As an extreme case, it is even possible to produce a frequency list of the whole ZEN Corpus by simply searching for \w+. As pointed out above, the bridge version of the ZEN material is linked to the XML release version of the corpus and its annotation. The Distribution feature makes it possible to distribute a result set over information contained in the headers of the original XML files. By default, selecting the Distribution menu from the pop-up menu results in a distribution based on the classification by decades, which provides an immediate overview of the diachronic development of a phenomenon. Other classifications like text-types and newspaper are available from a pop-up menu. Figure 5 shows a distribution of the result set according to the classification by text-types.

Figure 5. Distribution of the query \b\S+((i|y)(c|k)+)\b according to text-type.

152

Hans Martin Lehmann / Caren auf dem Keller / Beni Ruef

Distributions over a single category provide a breakdown into subcategories. As can be seen in Figure 5, distributions provide the absolute frequency per category, the number of words in that category, and the relative frequency per 1,000 words. The row for absolute frequency also functions as a link into the result set. For example, clicking on 1,652 in the advertisement row opens a result set with all the 1,652 instances that occur in advertisements. It is also possible to produce cross-tabulations by selecting categories for the X- and Yaxes of the table in the pop-up menus. In the case of our -ic and -ick spelling variants, a distribution according to the decade and the ending chosen is probably the most interesting option for a cross-tabulation. This is shown in Figure 6.

Figure 6. Cross-tabulation of results according to categories ‘year’ and ‘type’ found.

A quick comparison of the frequency of -ic versus -ick over the decades shows -ick as the main variant, with -ic becoming more and more frequent. From 1761 onwards, -ic becomes the main variant. The results in Figure 6 are based on the entire set of results, irrespective of whether a variant is relevant or not. In any thorough attempt at linguistic description, however, cases like sick, stick, Patrick or Warwick should be treated as separate cases, as there is no -ic variant of these. Again, the links into the various cells are useful for exploring the data and finding reasonable working hypotheses for further research. A precursory glance at the -ick variants in 1791 shows a vast majority of irrelevant place names, proper names and monosyllabic words, and the virtual absence of adjectives. A proper analysis would have to be based on manual exclusion of garbage and further classifications like proper nouns and adjectives. The most efficient means of manual post-processing are databases (see Kirk

ZEN Corpus 1.0

153

1994). For this purpose, ZEN Online provides the Download menu, which permits the user to save the results of a query and its associated annotation in a tab-delimited file on the local computer. Tab-delimited files can easily be imported into standard database software. In addition, every instance is provided with a link that points back at the original corpus view in ZEN Online. As a consequence, the search result remains linked to the original corpus, even after its incorporation into a database system. The download feature also offers the possibility of randomsorting result sets. This can be particularly useful in dealing with large result sets, where manual analysis of the whole result set may be impractical, and where statistical significance can often be reached through the manual analysis of a fairly small random sample. In addition, a random ordering permits the corpus linguist to perform a pilot study on a first set of instances and – if the results are encouraging – to increase the number of results analyzed manually until statistical significance is reached. Thanks to the ubiquity of web-browsers, ZEN Online is ideal for classroom use. It also provides direct and operating-systemindependent access to the ZEN Corpus for researchers worldwide. In addition, its ability to display the original documents graphically has been vital in incrementally finding and verifying transcription errors with minimal effort for the corpus compilers.

6. Conclusion In this chapter, we have given a brief overview of the release version of the ZEN Corpus and the main decisions made in its coding scheme. We have shown the advantages of coding a corpus in the TEI format in the areas of corpus consistency, corpus compilation and corpusbased applications. The overview of the various components of the corpus and their word-frequencies are an important reference for future studies based on the ZEN Corpus. The capabilities of ZEN Online have been demonstrated by means of a brief investigation into

154

Hans Martin Lehmann / Caren auf dem Keller / Beni Ruef

variants of -ic and -ick endings, demonstrating the possibility of bootstrapping the corpus and dealing with spelling variation. The ZEN Corpus documents English newspapers as an emerging genre. Finally, we have shown that the ZEN Corpus with its annotation is a rich source of data for the documentation of language change in the 18th century.

References Bottou, Léon/Haffner, Patrick/Howard, Paul G./Simard, Patrice/ Bengio, Yoshua/Le Cun, Yann 1998. High Quality Document Image Compression with DjVu. Journal of Electronic Imaging 7/3, 410-425. Fischer, Andreas/Schneider, Peter 2002. The Dramatick Disappearance of the Spelling, Researched with Authentic Material from the Zurich English Newspaper Corpus. In Fischer, Andreas,/Tottie, Gunnel/Lehmann, Hans Martin (eds) Text Types and Corpora. Studies in Honour of Udo Fries. Tübingen: Gunter Narr, 139-150. Fries, Udo 1994. ZEN – Zurich English Newspaper Corpus. In Kytö, Merja/Rissanen, Matti/Wright, Susan (eds) Corpora Across the Centuries: International Colloquium on English Diachronic Corpora. Amsterdam: Rodopi, 17-18. Fries, Udo 1997a. Electuarium Mirabile: Praise in 18th-Century Medical Advertisments. In Aarts, Jan/de Mönnink, Inge/Wekker, Herman (eds) Studies in English Language and Teaching. Amsterdam/Atlanta: Rodopi, 57-73. Fries, Udo 1997b. The Vocabulary of ZEN: Implications for the Compilation of a Corpus. In Hickey, Raymond/Kytö, Merja/Lancashire, Ian/Rissanen, Matti (eds) Tracing the Trail of Time: Proceedings from the Second Diachronic Corpora Workshop. Amsterdam: Rodopi, 153-166. Fries, Udo 2001a. Text Classes in Early English Newspapers. European Journal of English Studies 5/2, 167-180.

ZEN Corpus 1.0

155

Fries, Udo 2001b. Foreign Place Names in the ZEN Corpus. Language Contact in the History of English. In Kastovsky, Dieter/Mettinger, Arthur (eds) Language Contact in the History of English. Studies in English Medieval Language and Literature 1. Frankfurt am Main: Peter Lang, 117-129. Fries, Udo 2003. Korpuslinguistik und die Ersten Englischen Zeitungen. Zürich: Züricher Universitätsschriften. Festrede anlässlich des Dies Academicus 2003. Fries, Udo/Schneider, Peter 2000. ZEN: Preparing the Zurich English Newspaper Corpus. In Ungerer, Friedrich (ed.) English Media Texts – Past and Present; Language and Textual Structure. Pragmatics and Beyond 80. Amsterdam/Philadelphia: Benjamins, 3-24. auf dem Keller, Caren 2004. Textual Structures in Eighteenth-century Newspaper Advertising. A Corpus-based Study of Medical Advertisements and Book Advertisements. Aachen: Shaker. Kirk, John 1994. Concordances or Databases? In Fries, Udo/Tottie, Gunnel/Schneider, Peter (eds) Creating and Using English Language Corpora. Amsterdam: Rodopi, 107-115. Morison, Stanley 1932. The English Newspaper. Some Accounts of the Physical Development of Journals Printed in London Between 1622 and Present Day. Cambridge: Cambridge University Press. Sampson, Henry 21974 11874. A History of Advertising from the Earliest Times. Detroit: Gale Research. Sperberg-McQueen, C.M./Burnard, Lou (eds) 2002. Guidelines for Text Encoding and Interchange. Published for the TEI Consortium by the Humanities Computing Unit. Oxford: University of Oxford. Studer, Patrick 2003. Textual Structures in Eighteenth-century Newspapers: A Corpus-based Study of Headlines. Journal of Historical Pragmatics 4/1, 19-44. Wall, Larry/Christiansen, Tom/Schwartz, Randal L. 1996. Programming Perl. Sebastopol: O’Reilly & Associates. Xekalakis, Elefteria 1999. Newspapers through the Times: Foreign Reports from the 18th to the 20th Centuries. Zurich: Zurich University (PhD Thesis).

UDO FRIES

Death Notices: The Birth of a Genre

1. Introduction In the first issue of The Times, of January 1, 1785, then still called The Daily Universal Register, under the heading Deaths, there were two notices: (1)

DEATHS. Died a few years ago, at his house in Greenwich, Capt. Robert Walter, of the Royal Navy. In Dublin, the Honourable Miss Isabella Howard, second daughter to the Right Hon. Lord Clonmore.

In Fries (1990: 60) I pointed out that the initial use of the word died was exceptional: it occurred only in this first entry, was hardly ever used in subsequent death notices in The Times, and if it occurred at all, it did not occur at the beginning of an entry. Its omission can be interpreted as an indicator that the genre or text class1 of death notices was well established by 1785. In The Times this text class was separated from other parts of the newspaper by a headline, either Deaths or Died, and a reader would have known that in all the individual entries the verb died had to be substituted. The individual death notice consisted usually of one sentence only, which was composed of a number of elements that a reader would expect in this text class. The most important element was the name or some other expression for the identification of the deceased, without which such an announcement would hardly make sense. All other elements were optional: strictly speaking they depended on 1

I use the terms genre and text class synonymously, corresponding to the German term Textsorte. There is a long discussion about a useful definition of genre, cf. e.g. Ljung (2000: 139).

Udo Fries

158

conventions, both of the period and of the individual newspaper. Among the frequently occurring items in early death notices we find the date and the place of death, the age of the deceased, his or her previous and last occupation, and the names of his or her relatives. Following Enkvist (1987: 211), I used the concept of template texts to describe such items as slots to be filled (Fries 1990: 60). Computer corpus studies do not often consider text linguistic questions. In the following I want to show in which ways the Zurich English Newspaper (henceforth ZEN) Corpus can enhance our better knowledge of the birth of the text class Death Notices, but also where we must see the limitations of any corpus.

2. The text class Death Notices in the ZEN Corpus The text class Death Notices is here defined as a section separated from other text classes by a header, which is either Dead, Deaths or Died, and which is almost invariably preceded by the text class Wedding Announcements, introduced by the headers Marriage, Marriages (until 1761), or Married (from 1771 onwards). These headers are either centrally positioned (in the examples of 1761 and 1791), or they occur at the beginning of the first line of the text class (in the examples of 1731, 1741 and 1771). YEAR 1731 1741 1741 (January) 1741 (July) 1761 1771 1771 1771 1771 1771 1771

HEADER Dead Dead Dead Deaths Deaths Died Died Died Died Died Died

NEWSPAPER The Country Journal: or, The Craftsman The Country Journal: or, The Craftsman The Champion; or, The Evening Advertiser The Champion; or, The Evening Advertiser Lloyd’s Evening Post, and British Chronicle Bingley’s Journal The London Evening Post Middlesex Journal: or, Chronicle of Liberty The Craftsman; or SAY’S Weekly Journal The General Evening Post The Westminster Journal: and London Political Miscellany

Death Notices: The Birth of a Genre 1791 1791 1791 1791

Died Died Died Deaths

159 Evening Mail The Morning Chronicle The Morning Post, and Daily Advertiser Public Advertiser

Table 1. Death notices and their headers in the ZEN Corpus.

These death notices are of two kinds, which are not always strictly separated. The earliest examples in the ZEN Corpus, from the Country Journal of 1731, give us a good idea of the varied nature of this text class. In some instances we have full sentences including the verb died, while other entries look like the elements of a list from which the verb of dying has been deleted and frequently there is little more information provided than the name of the deceased. (2)

Dead. On the 17th of last Month dy’d at Paris, the Right Hon. Lucius Henry Lord Viscount Falkland, and is succeeded in Honour and Estate by his eldest Son, Lucius Charles, now Lord Viscount Falkland; his Lordship is the first Viscount in North Britain. – Will. Northey, Esq; elder Brother to Sir Edward Halstead, Esq.; an eminent Attorney at Law. – Yesterday Se’nnight Mr. Joseph da Costa, Villa Reall, was taken ill of the dead Palsy, and died on Sunday Night at his House on College Hill: He is reputed to have died worth above 20,000 l. which he brought with him not long since from Portugal, where he was Contractor for supplying the Portuguese Army with Provisions, &c. but was forced to fly for Fear of the Inquisition. – The Hon. Mrs. Smith, Governess to his Royal Highness the Duke, and Daughter to Thomas Smith, Esq; who had been twice Speaker to the House of Commons. – Mr. Francis Chamberlaine, a Spanish Merchant of this City. (1731cjl00235:s:49.2)

In (2) the individual entries are separated by a dash. The first entry is a full sentence with the word died (in an old spelling variant). The second entry (William Northey) has no verb, whereas the third (Joseph da Costa) initially gives an account of the deceased’s illness, then tells the place and time of his death, using the word died (here in its modern spelling), and finally adds a sentence about the deceased’s wealth. The last two entries contain no verb.

160

Udo Fries

In The Champion; or, The Evening Advertiser, of January 1741 the telegraphic style is much clearer (cf. 3), but by July of the same year, the paper had completely changed the format of its death notices: each entry is a separate sentence using the verb died, and each one forms a new paragraph (cf. 4). (3)

DEAD. The Lady of Thomas Apreece, Esq; Mr. Robert Humphreys, an eminent Brewer. – Mr. Salvo, a Jew-Merchant. – Mr. Peter Harvey, of the Six Clerk’s Office. – Lieutenant General Kirk. – Capt. William Frazer. – Mr. Bernard Lens, Miniature-Painter. – Thomas Allen, of Durham, Esq; – And Robert Vernon, Bart. one of the Gentlemen-Porters to His Majesty. (1741cea00179:s:53.2)

(4)

On Monday last died Mrs. Montescute, Wife of Mr. Montescute, DEATHS an eminent Merchant in Cannon-street. Tuesday died at his House in Abchurch-Lane, after a short Illness, Mr. Wright, a Merchant of large Concerns in the New-England and Jamaica Trades, in Partnership with Mr. Sidebothom in Birching Lane. A few Days ago died Miss Hoppe, Daughter of Mr. Hoppe, an eminent Hamburgh Merchant, in Lime-street. Yesterday about Noon died Mr. Thomas Rivers, an ingenious Chaser, one of the Court Assistants of the Cutlers Company, and Colonel of the Lumber Troop: This unhappy Gentleman had the Misfortune to fall into Fleet-Ditch about three Weeks ago, which occasion’d his Death. A few Days since died, in an advanced Age, at his Seat at Compton in Wiltshire, Thomas Penruddock, Esq. (1741cea00265:s:83.1)

Lloyd’s Evening Post, and British Chronicle of 1761 returns to the telegraphic style. The date is regularly given as a numeral at the beginning of each line, with no reference to the month. Lloyd’s Evening Post appeared three times a week, and the section given in (5) is from the edition of Wednesday, February 11, to Friday, February 13, 1761. Therefore, the dates refer to the previous day (which was February 12) and the 8th and 9th of February. Secondly, the word died is omitted in the individual entries. There is a new line for each death notice.

Death Notices: The Birth of a Genre (5)

161

DEATHS. 8. Captain Lloyd, Deputy-Governor of Greenwich Hospital. 9. Mr. Woolley, Chief Porter to Greenwich-hospital. 12. Mrs. Colly, the Owner of great part of Brick-street, Hyde-Park-Corner. (1761lep00559:s:113.1 )

Bingley’s Journal, a weekly paper in the 1770s, is one of the first to use the header Died, which was common form in all papers by 1771. Furthermore, the individual entries in Bingley’s Journal no longer begin on a new line. (6)

Died.] A few days ago, at Erme, in Cornwall, the Rev. William Stackhouse, D. D. rector of that parish, A few days since, at Bristol, Mr. Clemens Patterson, late an apothecary in Hounsditch. Tuesday, in Nassau-street, Soho, Mrs. Craufurd, relict of Lieut. Col. Craufurd. Wednesday, at West Horsley, in Surry, aged near 80, John Paston, Esq. The same day, at Hertford, by a fall from his horse, in returning from Harlow Bush fair, Mr. Pym, maltster. Thursday, in Bloomsbury-square, Mrs. Kirkman. The same day, at Isleworth, the Rev. John Huckle. The same day, in Tooley-street, Mr. Hosea Miller, timber-merchant. Yesterday, in Clarges-street, Piccadilly, John Miller, Esq. (1771bug00068:s:53.1)

The lists which appeared in The London Evening Post and the Middlesex Journal: or, Chronicle of Liberty, look exactly the same. In The Craftsman; or SAY’S Weekly Journal, The General Evening Post, the Westminster Journal: and London Political Miscellany the date of death is omitted altogether. (7)

DIED.] In New Bond-street, aged 96, James Nelson, Esq; At Kentish Town, Mr. Havers, many years chief clerk to Lord Chief-Justice De Grey. In Fenchurch-street, Mrs. Mary Moore. At Deptford, Mrs. Marbyn, wife of ---Marbyn, Esq; At Brussels, aged 97, General Macartney, a native of Ireland, and many years in the Hungarian service. In Bow-street, Covent-garden, Mendes Da Costa, Esq; In New Bond-street, Thomas Hooper, Esq; Mrs. Roberts, wife of Mr. John Roberts, late of Russel-street, Covent-garden. Mrs. Lockwood, Lady of Thomas Lockwood, Esq; of Mortimer-street. At Streatham in Surry, Mrs. Fuller, lady of Thomas Fuller, Esq; Mrs. Holcombe, wife of Mr. Holcombe, merchant, in Crutched-friars. (1771csw00657:h:227)

Udo Fries

162

The Evening Mail, The Morning Chronicle, and The Morning Post, and Daily Advertiser for 1791 carry lists similar to the one shown in (6) with the date of death mentioned in words or phrases, but the word died omitted. The death notices in the Public Advertiser of 1791, however, show a complete reversal of this convention: the headline is Deaths as it was fifty years before and the verb died is used for each entry. (8)

DEATHS Tuesday died at the Bank-side, Southwark, Miss Bates, daughter of Mr. Bates, banker, at Bridgenorth. Monday last died at her house in the square at Kensington, Mrs. Torriano, widow of the late Samuel Torriano, Esq; The same day died at Chichester, after a short illness, Mrs. Smyth, relict of the late Dr. Smyth, Rector of St. Gileses in the Fields. Yesterday se’nnight died at Morpeth, Joseph Roberts, Esq; Collector of his Majesty’s Stamp Duties there. The same day died at Hockham Hall, in Norfolk, Frances Catharina Dover, wife of James Dover, Esq; of the same place. A few days since died, in Prince’s-street, Hanover-square, John Lawson, Esq; brother to the late Sir Henry Lawson, of Broughhall, in the county of York, Bart. (1791pad17660:h:235)

If we want to learn more about the history of the genre Death Notices, we must proceed in two directions. On the one hand, we must look into the telegraphic style of lists, which were very popular in the 18th century, and on the other hand, we must study the full sentences that could stand by themselves in any news report.

3. Lists in magazines The ZEN Corpus is a collection of newspapers, some of which were weekly papers, others appeared twice or three times a week. The Daily Courant was the first daily paper. The corpus does not contain any monthly publications. However, such publications may have had a direct influence on the text classes published in newspapers. Monthly

Death Notices: The Birth of a Genre

163

publications were the ideal place for the publication of a wide variety of lists, and some of the newspapers collected in the ZEN Corpus carry advertisements for these monthly magazines, which, among many other news items, contain lists of births, marriages and deaths. The Post Angel or Universal Entertainment is advertised for in several newspapers of 1701. This is the advertisement found in The Post Man of May 1701: (9)

ADVERTISMENTS. † † † The Post Angel or Universal Entertainment. In 5 distinct parts. viz. I. The remarkable Providences (of Judgement and Mercy) that happened in May, &c. 2. The Lives and Deaths of the most Eminent Persons that Died in that Month, &c. 3. A New Athenian Mercury; resolving the most Nice and Curious Questions proposed by the Ingenious of either Sex. 4. The Publick News at Home and Abroad. 5. An Account of the Books lately publish’d and now going to the Press. With a spiritual Observator upon each head. To be continu’d monthly. This for May. Printed, and are to be sold by A. Baldwin, near the Oxford-Arms in Warwick Lane. 1701. Where are to be had those for January, February, March and April. Price 1 Shilling. (1701pmn00856:s:33.1)

The second part of The Post Angel consisted of the “Lives and Deaths of the most Eminent Persons that Died in that Month”, which must have been a list very similar to that which appeared some 30 years later in the Gentleman’s Magazine (cf. 10), which carried a page header referring to the “Deaths of Eminent Persons”. The advertisement for The Post Angel shows us that it contained a whole month’s news, and that back numbers from previous months were still available. The advertisement for the first number of the Gentleman’s Magazine also has a variety of lists, some of which could also be found in newspapers, particularly in The London Gazette: bankrupts, and the lists of the Sheriffs and the Assizes. (10)

This Day is publish’d, N° I. for JANUARY 1731, THE GENTLEMAN’S MAGAZINE; or, Trader’s Monthly Intelligencer. Containing, 1. A View of the Weekly Essays and Controversies.

164

Udo Fries 2. Poetry, viz. the Ode for the New Year, by Colly Cibber, Esq; Remarks upon it; Imitations of it, by Way of Burlesque; Verses relating to the same Subject; with ingenious Epitaphs and Epigrams. 3. Domestic Occurrences, viz. Births, Deaths, Marriages, Preferments, Casualties, Burials and Christenings in London. 4. Melancholy Effects of Credulity in Witchcraft. 5. Prices of Goods and Stocks, and a List of Bankrupts. 6. A correct List of the Sheriffs for the current Year, and the Circuits for the Lent Assizes. 7. Remarkable Advertisements. 8. Foreign Affairs, with an Introduction to this Year’s History. 9. Catalogue of Books and Pamphlets published. 10. Observations in Gardening for the Season, and a List of Fairs till the 12th of March. By SYLVANUS URBAN of Aldermanbury, Gent. Prodesse & Delectare Printed for the Author, and sold by A. Dodd without Temple-Bar, and M. Smith at the Royal Exchange. (1731dpt03558:s:166.1)

At the bottom of page 32 of this first number of the Gentleman’s Magazine, under the header DEATHS, there is the entry given in (11):2 (11)

Jan. I. William Willoughby, of West Knoyle in Wiltshire, Esq; and 700l. per Annum fell to his Brother Richard Willoughby of Southampton Buildings, Esq;

An exact date as well as the sum of money is given “that fell to his brother”. On the following page, with the page header “VOL. I. DEATHs of Eminent Persons, JANUARY, 1731” the list is continued. The first entries are the following: (12)

Sir Peter Verdoen, Kt. late Lord Mayor of Dublin. Casper White, Alderman of the same City, and Dutch Merchant. 2. Capt. John Turner, at his Seat at Tilford, near Farnham, formerly a Wholesale Mercer in Bucklersbury. 3. Mr. Morris, Coach-maker to his R. Highness the Prince of Wales. Mr. Oliver Savigny, Cutler to his Majesty. Dr. Morton, of the College of Physicians. Mr. Dobbyns, Lithotomist and Senior Surgeon of St. Bartholomew’s Hospital.

2

The text is taken from the online version provided by the Bodleian, Oxford: pages 32 and 33. Page 34 could not be accessed (5/20/2005): http://www.bodley.ox.ac.uk/cgi-bin/ilej/image1.pl?item=page&seq=4&size=1 &id= gm.1731.1.x.1.x.x.32.

Death Notices: The Birth of a Genre

165

Mr. Boheme of Lincolns-Inn-Fields Play-house. (http://www.bodley.ox.ac.uk/cgi-bin/ilej/image1.pl?item=page&seq=1&size= 1&id=gm.1731.1.x.1.x.x.33)

The month is not repeated, but the day is given as a numeral. This type of death notice is taken up by Lloyd’s Evening Post, given in (5), where the month is not even mentioned in the first entry. Although there are many entries, there are quite a few days for which no death was to be reported. But even in the 19th century, death notices in The Times did not appear every day. Besides the two mentioned above, 18th-century magazines containing lists of births, marriages and deaths were The London Magazine; or, Gentleman‘s Monthly Intelligencer, The Universal Magazine, and The Lady’s Magazine; or Entertaining Companion for the Fair Sex.

4. Death reports and death notices Early newspapers contain many reports about deaths, which have nothing to do with a death notice. Many of these reports are really about an accident resulting in the death of a person. All these instances have been disregarded here. Two examples must suffice. (13a)

The Prince of Monaco, Ambassador of France, had Audience of the Pope the first Instant, and before he came away was seized with a fainting Fit, and was carried home very ill, and died the 3d Instant. (1701lgz03673:s:2.3)

(13b) On the 23d Instant one Thomas Gifford of Gainsborough in Lincolnshire, ript up his own Belly with a Pen-knife, and afterwards cut out two or three Yards of his Guts, which he put into a Chamber-Pot, and died soon after. (1731rwj00328:s:62.1)

However, other reports should be seen as a source for death notices and have therefore been categorised as such in the ZEN Corpus. Many of these have the prototypical form of a death notice, as given in (8), and they occur from the early numbers of The London Gazette

Udo Fries

166

onwards. They may be found on any page and following any other news report. In (14), from The London Gazette of 1671, the death of Jacques Charles Amelot is reported. This notice is preceded by a notice about the declining health of two archbishops, and, followed by another one on the promotion of several bishops. The whole section is headed by the dateline Paris, Jan. 10. (14)

The Archbishop of Sens, who for some time hath lain dangerously ill, is at present somewhat better, but he of Narbonne, is past all hopes of recovery. The sixth instant died here Jacques Charles Amelot, First President of the Court of Aydes, in the Thirty seventh year of his age, much lamented for his great Parts and Abilities, which he had given long testimony of in the discharge of that place. The King hath been pleased of late to make several promotions of Bishops, having nominated the Sieur de Rosmades, Bishop of Vannes, to the Archbishoprick of Tours; and the Sieur de Vantorte, Bishop of Leytoure to succeed him in that See. (1671lgz00537:s:16.1)

Two paragraphs further down, the next death report follows: (15)

Monsieur Caillet, who hath been several times in Poland, employed in the affairs of the Prince de Conde, died here some days ago, and was buried by Order of his Highness with much pomp and magnificence. (1671lgz00537:s:19.1)

The two examples in (14) and (15) illustrate the most common formats of death reports, both of which also occur in the lists. The first has the temporal element in initial position, which causes the inversion of the predicate (died) and the subject (the name of the deceased). It is used with both short and long subject phrases, but it is particularly useful whenever there is much information about the deceased, as this can then directly follow the name. (16)

The 24th instant dyed here suddenly Don Nicolo Fernando di Castro, Knight of the Order of St. Iago, and President Extraordinary of the Courts of Justice here, being generally lamented for his great worth and exemplary integrity, of which he had given eminent testimony in the discharge of several great Employments, in which he hath served for many years. (1671lgz00540:s:21.1)

Death Notices: The Birth of a Genre

167

The second type begins with the name of the deceased, the date usually follows the verb died. (17a)

Monsieur Rose, Lieutenant-General, died the 7th instant, in the 88th year of his age. (1701ept00037:s:14.1)

(17b) John South, Esq; one of the Commissioners of her Majesty’s Revenues of this Kingdom, died on Sunday last. (1711evp00272:s:50.1) (17c) Cardinal Laurentius Altieri died the 3d Instant. (1741dpt06847:s:18.1)

Another way of introducing news was by mentioning the letters or ‘advices’ from abroad, but also referring to unspecified sources (we learn, we hear, they write). These are normal introductions for news reports, and would not, strictly speaking, belong to the category of death notices. They are given a separate column in Table 2 below. (18a) Letters from Venice of the 26th December say the Duke of Mirandola died at Bologna the 21st. (1711dct02869:s:11.1) (18b) We have Advice that the Cardinal de Mailly Archbishop of Rheims died there on the 13th Instant at Night, by whose Death several fine Benefices are vacant and a great number of Candidates are already making Application for them to the Duke of Orleans. (1721fpt04474:s:13.4) (18c) They write from Cambridge, that last Week died there Rev. Mr. Lowcock, Fellow of Trinity College. (1741dpt06807:s:46.1) (18d) From Antigua we hear, that Capt. Soon of Col. Dalzell’s Regiment, died before the Regiment left that Island. (1741dpt06807:s:47.1) (18e) They write from Dublin, that Francis Lake Esq; Secretary to the Lord Chancellor, died there the 29th past, and is succeeded by the Hon. Mr. Hill, Brother to the Lord Hillsborough. (1721pby04978:s:29.1)

Udo Fries

168

As Table 2 illustrates, there are 237 instances in the ZEN Corpus that begin with the date and consequently have an inversion of subject and predicate, 37 that open with the name of the deceased, and 20 that start with a reference to the source. YEAR 1671 1681 1691 1701 1711 1721 1731 1741 1751 1761 1771 1781 1791 TOTAL

DATE FIRST 5 1 1 1 1 16 24 36 52 30 15 15 40 237

NAME FIRST 2 1

SOURCE

10 5 7 3 2 4 2 1

1 1 3 4 3 7

37

1

20

Table 2. Three types of death reports.

Whereas in the early decades there were only a few of these death reports, their number increases in the course of the 18th century. Clearly, around the middle of the century this was, next to the columns explicitly marked as death notices, a popular way for English newspapers to report a death. Death reports were no longer isolated instances tucked away among other news from abroad, but were collected and presented as a batch in the reports from London. In all the examples analysed here, the verb of dying used is died. It occurs in 371 instances. Passed away is not used at all, but departed this life or departed this world is found in a handful of examples and with surprising frequency at the beginning of the 18th century in four different newspapers. The earliest example is even from a newsbook of 1661. (19)

Last night being the 9. Instant, Cardinal Mazarin after a lingring sickness, departed this world at Boys de Vincennes: Not many dayes before he dyed, he made his Will, and received the last Unction. (1661kin00010:s:33.1)

Death Notices: The Birth of a Genre

169

Two papers report about the death of King William III in 1701. (20a)

On the 8th Instant, about 8 a Clock in the Morning, King William III of ever blessed Memory, departed this Life at his Palace of Kensington, after a Fortnight’s Indisposition. (1701fpt01067:s:2.1)

(20b) Yesterday morning His Most Sacred Majesty of Great Brittain departed this Life, much Lamented by his Subjects, and all that wish well to the Protestant Interest, and the Liberty and Property of the Subject. (1701lpt00434:s:32.1)

Finally, instead of died, we find is (lately) dead, but only in the early decades, between 1671 and 1721. The corpus yields 9 examples. This manner of expression is never used in death notices proper. (21a)

Here is lately dead the Bishop of Dromore, and its [sic!] thought that Doctor Essex Digby may succed him in that See. (1671lgz00537:s:3.1)

(21b) The Baron de Neveu, Imperial Envoy, is dead here. (1701lpt00268:s:12.1) (21c) Madame Raisin, the late Dauphin’s Mistris, is dead at her Country-Seat in Normandie. (1721pby05024:s:14.6)

5. Concluding remarks Computer corpora can be profitably used for text linguistic questions, providing the individual text passages are long enough and contain both the beginning and the end of the genre to be investigated. With the help of the ZEN Corpus we have been able to shed some light on the early history of English death notices. These occurred as individual death reports from the end of the 17th century, and from the early decades of the 18th century onwards, they were collected in lists and presented under changing headers. We must not forget, however, that there is a world of texts beyond any corpus.

170

Udo Fries

Even a large corpus cannot contain all the texts of a period, but only a representative selection. Nevertheless, more data would not radically alter the story of the birth of death notices. We may expect some earlier lists, and some earlier reports, possibly among the corantos and newsbooks of the 17th century, and almost certainly among the monthly magazines of the 18th century, which have not been systematically collected in any corpus. The text class Death Notices as defined in the ZEN Corpus, which combines both the lists and the individual death reports, provides a useful starting point for further studies of early news reports.

References Enkvist, Nils Erik 1987. Text Strategies: Single, Dual, Multiple. In Steele, Ross/Threadgold, Terry (eds) Language Topics: Essays in Honour of Michael Halliday. Amsterdam: John Benjamins, 203-211. Fries, Udo 1990. Two Hundred Years of English Death Notices. In Bridges, Margaret (ed.) On Strangeness. SPELL, Swiss Papers in English Language and Literature 5. Tübingen: Gunter Narr, 57-71. The Gentleman’s Magazine. http://www.bodley.ox.ac.uk/cgi-bin/ilej/ image1.pl?item=page&seq=4&size=1&id=gm.1731.1.x.1.x.x.3 2. Ljung, Magnus 2000. Newspaper Genres and Newspaper English. In Ungerer, Friedrich (ed.) English Media Texts Past and Present. Language and Textual Structure. Pragmatics & Beyond new series 80, Amsterdam and Philadelphia: John Benjamins. ZEN Corpus. Zurich English Newspaper Corpus 2004. http://eszen.unizh.ch.

FRANCK ZUMSTEIN

The Contribution of Computer-Searchable Diachronic Corpora to the Study of Word Stress Variation

1. Outline of the study In this chapter, within the framework of a study of English word stress-patterns – taking into account morphological internal structures, phonological processes, and grapho-phonemic correspondences in individual words1 – I first discuss the issue of stress pattern variations, as it appears more often than not in pronunciation dictionaries. The second part proceeds with a description of an existing large lexicophonetic computer-searchable corpora of contemporary English, as well as of an electronic version of an 18th-century pronunciation dictionary. It is shown how useful these corpora are when it comes to exhaustively collect specific data for analysis. The final part develops a detailed account of stress placement variation of words ending in -ate as it appears in contemporary English pronunciation dictionaries, with the help of exhaustive lists of words retrieved from the electronic corpus of present-day English. Then, the conclusions drawn from this study are confronted to the data collected from the diachronic corpus in order to present a complete picture of such stress variation.

1

Guierre (1979) devised this framework in his study of English word-stress.

Franck Zumstein

172

2. Word stress variation in English 2.1. What kind of variation? Any change within a linguistic system may fall under the notion of variation. Yet it must be pointed out first that variation is a process which opposes two co-existing forms: a base form and a variant form, and variants may be defined as follows: Variant […] (1) Different in some aspects while the same or similar in others […] (2) A similar but distinct form […]. (McArthur 1996: 988)

This definition stresses the fact that the base form and the variant are strongly related forms, the latter being a slightly modified replica of the first one. But how strong is the link between the two forms? Let us consider word pairs such as the following: (1)

import (n. ["ImpO;t]) / import (v. [Im"pO;t]) and estimate (v. ["estImeIt]) / estimate (adj ["estIm@t])

Are the above examples some type of stress and phonemic variations? They certainly are, but in each case, complementary distribution accounts for the alternation, that is to say that “it is possible to state a rule specifying which of the pair is found in any particular environment” (Wells 1982: 45). The rule for verb/noun stress alternations of prefixed disyllabic words is that the noun is front stressed, and the verb is stressed on its last syllable. As for verb/noun and/or adjective phonemic alternation in verbs ending in -ate, the rule states that the unstressed vowel of the last syllable is reduced in nouns and adjectives, but it is tense in verbs. Thus, both forms in each case are orthographically the same and semantically related, but they are two distinct words as regards their respective grammatical categories. It is not possible to substitute one for another in each pair. Wells mentions another type of variation which he defines as follows: “Two or more sound types are said to be freely variant if the occurrence of one or another is a matter of random chance” (1982: 45).

Word Stress Variation and Electronic Diachronic Corpora

173

Free variation may be exemplified by the vowel alternation [i;]/[e] in the first syllable with secondary stress in British English in words such as: (2)

ecological, economic, ecumenical, egocentric, egocentricity, egocentrism, egoistic, egomania, egomaniac, egomaniacal, elasticity, emendation, emissivity, equidistant, equilibrate, equilibration, equilibrium, equinoctial, equipollent, eructation, estivation, evangelic, evangelical, evocation, evolution, evolutionary.

Fowler (1996: 237) comments on this type of variation as follows: economic, economical […] 2 I have been unable to establish a consensus of any kind about the pronunciation of the first syllable of the two words. It seems to be a clear case of pleasing oneself whether to say /i;k-/ (my own preference) or /ek-/.

Here, Fowler is more specific when writing that it is a matter of one’s own preference. Fowler’s favourite pronunciation actually echoes the results of a poll panel preference mentioned in Wells’s Pronunciation Dictionary (1990: 234) in the entry of the word economic: economic […] — BrE poll panel preference: %i;k- 62%, %ek- 38%

Then, it comes as no surprise that Wells gives the [%i;k-] pronunciation as the main pronunciation (i.e. the base form) and the [%ek-] pronunciation as a variant pronunciation for all the words listed in (2). Such ordering of the different pronunciations for the same word is thus significant. In the preface of his dictionary, Wells (1990: x) clearly states that: For them [teachers and learners of ESL/EFL] one main pronunciation, printed in colour, is given at each entry. […] Other users of LPD, especially those who are native speakers of English, will be interested not only to see what form is recommended but also what variants are recognized. Where pronunciations other than the main one are in common educated use, they are too included, but as secondary pronunciations, printed in black.

Actually, Wells’s ‘random chance’ refers to the fact that it seems difficult to have a clear-cut picture of the distribution of both forms

174

Franck Zumstein

among Received Pronunciation speakers. There has been no sociolinguistic investigation which would determine such a distribution.

2.2. RP and variation Studies on variation in languages have indeed mainly been conducted by sociolinguists. They implemented fieldwork methods (questionnaires, recordings, samplings) to collect data which made up their corpora. Their aim was to account for linguistic change in progress through the study of linguistic variables, generally phonological in nature, in connection to linguistic contexts and social criteria. Besides, these surveys were carried out in geographically localized linguistic communities. For instance, Labov (1972, quoted in Hudson 1980: 148-152) studied post-vocalic r in New York City and Trudgill (1974, quoted in Hudson 1980: 152-155) accounted for the alternative pronunciation [N] or [n] of the final consonantal sequence -ng of the suffix -ing in Norwich. Thus, their works do not focus on variability within a standard pronunciation which may only be a system of reference to assess variability across social boundaries. Jones (1957), Gimson (1975) and Wells (1982) have described at length the standard pronunciation in Britain which is referred to as Received Pronunciation (henceforth RP). Wells (1982: 117) comments on this appellation as follows: “This name is less than happy, relying as it does on an outmoded meaning of received (‘generally accepted’)”. Yet it is possible to reconsider the meaning of received as an adjectival use of the past participle of the verb to receive, in which case this would mean ‘which can be heard’. Thus, the alternative appellation ‘BBC English’ would be fully justified, as well as its status as a standard, as Gimson writes: “[…] it [RP] has become more widely known and accepted through the advent of radio” (1975: 89). The advent of television has also played a considerable role in this process, not to mention the blooming business of ESL and EFL teaching. Like many other types of accents, RP encompasses variability, even though it is regarded as a standard. In fact, Wells considers that RP is not a “homogeneous monolith invariant” (1982: 279) and

Word Stress Variation and Electronic Diachronic Corpora

175

Gimson backs up this view by adding that: “RP must be regarded as an evolving mode of pronunciation in its phonological system, its phonetic realization and the incidence of its phonemes” (1975: 302). Both authors and sociolinguists give many instances of variation in the phonemic structure of individual words in English, as well as variation due to phonological processes in connected speech (assimilation, elision, yod-coalescence, r-sandhi, etc.). Most linguistic variables studied here are vowels and consonants. Yet any work on stress variation in individual words is hardly to be found.

2.3. Accounts of alternating stress patterns Gimson (1975: 230-232) devotes only two pages of his book to the variation of individual word stress patterns. In order to account for stress variation he puts forward two reasons: Hesitancy and variation of pattern occurring at the present time are the result of rhythmic and analogical pressures, both of which entail in addition considerable changes of sound pattern in the word.

Gimson defines ‘rhythmic changes’ as “a tendency to avoid a succession of weak syllables” (1975: 231). He illustrates his point with words in which stress variation is at work such as acumen ["&kjUm@n]/[@"kju;m@n], sonorous ["sQn@r@s]/[s@"nO;r@s] and precedence ["presId@ns]/[prI"si;d@ns] among others. The problem is that these words are actually counterexamples because the new stress-variant forms are front stressed, and thus allow a succession of weak syllables. As regards the definition of ‘analogical changes’, Gimson states that “a word’s accentual pattern is influenced not only by rhythmic pressures, but also by the accentual structure of a similar word of frequent occurrence” (1975: 231). Examples like distribution and contribution, with secondary stress on the first syllable, are given to explain stress retraction on the first syllable in distribute and contribute which tend to replace the older forms where stress is on the second syllable. The question is why such analogical changes have not played the same role (i.e. stress retraction) in words such as the following:

Franck Zumstein

176 (3)

ex"hibit (%exhi"bition), pro"hibit (%prohi"bition) and in"hibit (%inhi"bition) or even com"pete (%compe"tition), de"fine (%defi"nition), de"molish (%demo"lition), a"bolish (%abo"lition), etc.

It is then obvious that his explanations tend to show that stress variation is erratic in nature. Nevertheless, some sort of ‘logic’ may be found in what seems to be a chaotic situation (Deschamps 2000: 93107). Kingdon (1958), Chomsky/Halle (1968), Guierre (1979) and Fudge (1984) have shown in their own terms that stress placement can be derived by rules. In Guierre’s view these stress rules are determined according to three criteria: the word’s grammatical category, its morphological encoding and its graphemic constituents. In this framework, stress placement variation is the result of unresolved conflicting rules, each of which usually pertains to these criteria. Gimson’s examples such as distribute and contribute for which stress variation occurs would then be best explained. On one hand, penultimate stress in these words is regular when considering the rule whereby primary stress falls on the root of the vast majority of prefixed verbs, whether the prefix is a true prefix, or an etymological one. On the other hand, most trisyllabic prefixed verbs ending in -ute are irregularly front-stressed, as shown below: (4)

"substitute, "institute, "constitute, "prostitute, etc.

These words of romance origins have certainly been subject to the ‘iambic reversal’ (see section 4.1. below for more details), which was a purely rhythmic change, thus disregarding the words’ morphological structure. Why is it then that contribute and distribute are stressed on the second syllable as for their main pronunciation in the dictionary? The existing autonomous form tribute, stressed on the first syllable, has certainly played a role which concurs with the analysis based on the word’s morphological structure in explaining penultimate stressing. Today’s new front-stressed variants of both words are the result of Gimson’s analogical pressures, but it is doubtful that the derivatives distribution and contribution have contributed to the appearance of the new variants. The paradigm which must be considered here for analogy is the list of front-stressed words ending in -ute as listed in (4). The new variants fit in the picture so that this

Word Stress Variation and Electronic Diachronic Corpora

177

class, determined by its graphemic ending -ute, is becoming homogeneous as regards stress placement. As for the front stressed variant of the verb to attribute, Wells considers it to be a non-RP form in his dictionary. In fact, the stress differentiation attribute (v.) [@"trIbju;t]/attribute (n.) ["&trIbju;t] certainly makes things more difficult for the front-stressed variant of the verb to settle in the RP system. Guierre used a computer-searchable version of Jones’s Pronouncing Dictionary to assess the efficiency of each rule under scrutiny. It is then the contention of this chapter that large computer searchable lexico-phonetic corpora of contemporary English should be used to account for stress variations in individual words. It is also necessary to retrieve data from diachronic corpora to conclude on the status of variants and to assess the relative weight of stress rules because, as Gimson pointed out, RP is “an evolving mode of pronunciation” (1975: 302).

3. Corpora and data retrieval 3.1. Synchronic corpora In France, Guierre initiated the use of large computer searchable lexico-phonetic corpora of contemporary English when he studied word stress rules in the 1960s. He set up a team to digitize and tag Jones’s twelfth edition of Jones’ Pronouncing Dictionary. First, they used punch cards, but with the rapid evolution of computers the corpora were exported to a single text file format. Guierre used these corpora in the 1970s and 1980s for his research. Then he had access to the digitized version of Wells’s first edition of the Longman Pronunciation Dictionary (LPD1). The examples below are extracts from the latter corpus:

Franck Zumstein

178

combustion ƒN k@m÷"bVs…tS’@’nŸ ß¥(%)kQm-ßSƒ£3.010noitsubmoc despot ƒN "desp…QtŸ Ç-@tÇ ! -A:t Ç-@tßSì ~s ƒæsSƒ£2.10topsed

Each paragraph of the corpus corresponds to one word entry of the paper version of the dictionary. Table 1 shows how the data are organized in each entry: orthographical form part-of-speech main pronunciation in British English variant pronunciation in British English main pronunciation in American English variant pronunciation in American English orthographical form of suffixes pronunciation of suffixes Syllable count stress pattern reverse spelling

combustion ƒ N k@m÷"bVs…tS’@’nŸ

despot ƒ N "desp…QtŸ

ß¥(%)kQm-ßSƒ

Ç-@tÇ ! -A:t Ç-@tßSì ~s ƒæsSƒ

£3 .010 noitsubmoc

£2 .10 topsed

Table 1. The different data fields in the electronic version of LPD1.

Guierre automatically added new fields of data to the original file. It is the case of the last three lines which do not appear in the paper version of the dictionary. The original file2 also contained different symbols which are actually field separators, as described below: x ƒ (only) indicates the end of the orthographical form; x Ÿ indicates the end of the main pronunciation; x Ç and ß combine as start tags or end tags to delimit variant pronunciations, whether it is British English or American English; x ! is followed by all American English pronunciations;

2

The electronic text file was sent to Guierre by Longman.

Word Stress Variation and Electronic Diachronic Corpora

179

x combined Sƒ indicates the end of any pronunciation information in the entry. Yet, in the original file, it indicated the end of an entry. Guierre added the following symbols which correspond to the three new fields of data: x £ is followed by the syllable count of the words, based on the syllabification operated by Wells in the phonetic transcriptions; x . is followed by the stress pattern of the word where 1 stands for main stress, 2 for secondary stress, 3 for tertiary stress and 0 for unstressed syllables; x is followed by the reverse spelling of the head word. Symbols which stand for syllable separators were also added in the main pronunciation field. A list of these symbols is given below: x … between the last and penultimate; x ÷ between the penultimate and the antepenultimate; x z between the 3rd syllable and the 4th syllable from the word; x ¡ between the 4th syllable and the 5th syllable from the word; x # between the 5th syllable and the 6th syllable from the word; x = between the 6th syllable and the 7th syllable from the word; x ^ between the 7th syllable and the 8th syllable from the word; x ö between the 8th syllable and the 9th syllable from the word.3

end of the end of the end of the end of the end of the end of the

The phonemic transcriptions of the headwords in each entry are coded with the help of the Speech Assessment Methods Phonetic Alphabet

3

The three longest words in the dictionary have nine syllables.

Franck Zumstein

180

(henceforth SAMPA) devised by Wells.4 Table 2 presents the main correspondences between the SAMPA codes and the IPA symbols: CONSONANTS SAMPA Corresponding code IPA symbol tS tS

VOWELS SAMPA code

Corresponding IPA symbol

I

I

dZ

dZ

&

&

T

T

Q

Q

D

D

V

V

U

U

S

S

Z

Z

i:

i;

N

N

u:

u;

X

x

A:

A;

L

L

O:

O;

}

t} (AmE)

o:

O; (AmE)

Suprasegmentals

Q:

Q; (AmE)

3:

3;

SAMPA code " % º

Corresponding IPA symbol " primary stress % secondary stress tertiary stress °

Diphtongs SAMPA code

Corresponding IPA symbol

OI

OI

@U

@U

oU

oU (AmE)

aU

aU

Vowels SAMPA code $:

Corresponding IPA symbol 3r (AmE)

I@

I@

e@

e@

@

@

U@

U@

Table 2. The SAMPA correspondences in the electronic version of LPD1.

4

For more details, see http://www.phon.ucl.ac.uk/home/sampa/home.htm.

Word Stress Variation and Electronic Diachronic Corpora

181

Guierre could retrieve data from this corpus with the help of a piece of software called Macintosh Programmer’s Workshop (henceforth MPW).5 It is a tool which includes a set of built-in commands enabling the user to perform different types of operations (search, replace, sort, catenate, etc.) with large corpora files. The following example is a set of lines of commands which were edited to retrieve all the words of nine syllables from the electronic version of LPD1: Open 'A_Z9CD;2' Search -s -q /£9/ 'A_Z9CD;2' > 'LPD_9syll' Open 'LPD_9syll' Count -l 'LPD_9syll'

The first line indicates which corpus will be investigated and that it will appear on screen after the whole process. The second line will launch the search. The -s and -q attributes attached to the search command respectively indicate that the search is not case sensitive and that the file name and line numbers will not appear in the output file. The regular expression is located between the two slashes. Then, the repetition of the file name means that it is the input file and the symbol > indicates that an output file will be created. The output file contains all the entries in which the searched items appear. The third line indicates that the output file will be opened. Finally, the last line indicates that searched items included in the output file will be counted. The -l attribute stands for ‘write line counts only’, which actually means ‘write paragraph counts only’ so that the entries are counted in the output file. This tool and the electronic version of LPD1 were used to retrieve lists of words which are analysed in the following section.

3.2. Diachronic corpora Buchanan’s dictionary, first published in 1766, has been digitized and turned into a computer searchable file by Frédéric Duchesne at the University of Poitiers. Buchanan published his dictionary in the mid 5

For more information, see http://developer.apple.com/tools/mpw-tools/.

Franck Zumstein

182

18th century, and interestingly enough, it may be considered as one of the ancestors of modern pronunciation dictionaries. It consists in lists of words alphabetically ordered, and, for each word, a representation of the word’s pronunciation is given. Buchanan devised his own way of writing down the pronunciation, as shown in the image below:

Figure 1. An extract from Buchanan’s dictionary.

Duchesne added different fields of data to the corpus. Thus it provides the numeric pattern of the word’s stressing, using 1 for primary stress and 0 for unstressed syllables. Duchesne also encoded the orthographic, but not graphemic, layout of each word, using C for consonants and V for vowels. Stressed vowels are immediately followed by the digit 1. It is possible to retrieve data from the dictionary corpus, via the internet.6 The user interface, still in French, is made up of different fields in which it is possible to type data, or list-menus with different choices made available. The user sets all the parameters that make up the request and sends it through the internet. The image below is an example of parameter settings to retrieve all words ending in -ate where a consonant cluster immediately precedes the ending:

6

See http://www2.mshs.univ-poitiers.fr/Forell/PHONDICT/index.html (click on the picture of the title page of the dictionary).

Word Stress Variation and Electronic Diachronic Corpora

183

Figure 2. The parameters set to retrieve all words ending in -ate with pre-final consonant cluster.

The results are returned within seconds in the form of a table with eight columns, as exemplified in the image below:

Figure 3. The table of results.

The columns’ headings are described below: x No: ordered numeric key from 1 to the last item; x Graphie: the orthographical form of the word; x Accentuation: orthographical form of the word repeated, plus a stress mark on the stressed syllable; x Prononciation: representation of the word’s pronunciation.

Franck Zumstein

184

When the SILDoulos IPA font in this column is set with a word processor, it displays Buchanan’s representation of the pronunciation as it appears in the dictionary:

x Schéma accentuel: stress pattern (see above); x Nb syllables: syllable count. Syllabation was carried out with regard to Walker’s syllabation in his pronunciation dictionary; x Schéma graphique: orthographic pattern (see above). In order to support the analysis of alternating stress patterns of words of at least three syllables, ending in -ate in the following section of this paper, several requests were made to obtain lists of -ate words, -ate words with pre-final consonant clusters, -ate words with final /-100/ stress pattern and no pre-final consonant clusters, -ate words with final /-100/ stress pattern and pre-final consonant clusters. These results from Buchanan’s dictionary were then compared to the results obtained from LPD1 with the help of data retrieval procedures using MPW, as described in 3.1.

4. Stress variation in words ending in -ate 4.1. The situation in present-day English In British English, phonologists such as Danielsson (1948), Kingdon (1958), Gimson (1975), Guierre (1984); Duchet (1994) and Deschamps (1994) have shown that English verbs and nouns and/or adjectives of at least three syllables, ending in -ate, prefixed or not, are regularly stressed on the antepenultimate syllable, as in (5): (5)

"abdicate (v.), "acurate (adj.), di"rectorate (n.), %excom"municate (n., v. adj.), etc.

Yet in Wells’s dictionary some words are stressed on the penultimate syllable as for their main stressing, as in (6):

Word Stress Variation and Electronic Diachronic Corpora (6)

185

fe"nestrate (v.), con"summate (adj.), re"tardate (n.), al"ternate (n., adj.), etc.

Using MPW, data retrieval procedures from the computerized version of LPD1 yield a list of 846 -ate words. This number goes down to 819 as the 27 words listed in (7) are mainly compounds, foreign words or proper nouns of foreign origins: (7)

Aldersgate, antedate, Billingsgate, Bishopsgate, boilerplate, city-state, copperplate, Cripplegate, fingerplate, Harrogate, Hecate, interstate (2 instances), Irangate, Jubilate, karate, Marprelate, Newdigate, nickel-plate, numberplate, out-of-date, roller-skate, second-rate, solid-state, stablemate, up-to-date, Watergate

Then, words stressed on the last syllable or on the pre-antepenultimate syllable were also suppressed from the list, as in (8) and (9): (8)

18 words with preproparoxytone stressing: alienate, ameliorate, deoxygenate, deteriorate, disorientate, etiolate, hydrogenate, lanceolate, mandarinate, meliorate, orientate, oxygenate, patriarchate, peregrinate, pomegranate, propionate, tergiversate, variolate.

(9)

10 words with oxytone stressing: interrelate, mistranslate, overrate, overstate, recreate, reinstate, relocate, transmigrate, underrate, understate.

The stressing of such words will not be discussed here. The final number of -ate words of three or more syllables has thus gone down to 791. Words with antepenultimate stress as their main pronunciation in the dictionary represent approximately 96% of the list (760 out of 791). Some 31 words only have their primary stress on the penultimate syllable. Antepenultimate stress of the overwhelming majority of -ate words may be accounted for as the result of a phonological rule which transformed the stress pattern of these words of romance origin. This rule was called the ‘iambic reversal’, or as Danielsson (1948: 26) would put it, ‘the counter-tonic principle’. It is a process whereby late primary stress has been retracted to the beginning of the word with a one-syllable ‘jump’. This process is still occurring through the variant stressing of words listed in (10): (10)

"espionage/%espio"nage, %espla"nade/"esplanade, "fricassee/%frica"see, "kerosene/%kero"sene, "nicotine/%nico"tine, %espla"nade/"esplanade, etc.

Franck Zumstein

186

Words ending in -age or -ade are interesting in this respect because some items do not have stress variants, thus showing that the process is complete, as in (11) and (12): (11)

"acreage, "advantage, "brigandage, "cartilage, con"cubinage, "encourage, "personage, etc. (with vowel reduction occurring in the last syllable of most words)

(12)

"alidade, "marmalade, "renegade.

As regards the 31 -ate words with penultimate stress, those listed in (13) have also been suppressed from the list because they are cases of compounding, or considered as such, with stressed prefixes or combining forms: (13)

%bi"chromate, %bi"furcate, %carbo"hydrate, %de"hydrate, %in"quorate (in-(=not)+quorate), %interco"ligiate, %mepro"bamate, %super"phosphate.

The stress pattern of the noun %arti"sanate is closely connected to the stress pattern of %arti"san. The words listed in (14) also show such strong connections to base forms where semantic motivation between a base form and a derivative has overridden the -ate words stress rule: (14)

"oxygen > "oxygenate, "hydrogen > "hydrogenate.

Finally, four other -ate words should be set apart from the list: %equi"librate, per"borate, in"choate and "microclimate. The stress pattern of those words has most certainly been influenced by very similar words. The noun %equi"librium must have set a stress model for equilibrate and the noun "borate for perborate, just like the stressing of the verb "relate has influenced the stress pattern of cor"relative, despite the existing base "correlate. The stress pattern of "microclimate is modelled on that of other similar words such as "microchip, "microcomputer, "microfiche, "microfilm, etc. where micro- is a combining form which is stressed on the first syllable. The adjective in"choate seems to be the only exception to antepenultimate stress for which no apparent explanation may be given. Beside their irregular stress pattern, many of the above listed words and the words

Word Stress Variation and Electronic Diachronic Corpora

187

in (13) have stress variants with regular antepenultimate stress, as shown in (15): (15)

"bifurcate, "dehydrate, e"quilibrate, "incohate, me"probamate, "perborate.

Thus, the list of -ate words with penultimate stress amounts to 18 items, listed in (16): (16)

al"ternate, a"postate, ap"pellate, con"summate, de"cussate (v.), de"cussate (adj.), e"dentate, fe"nestrate, in"carnate, in"cruvate, in"sensate, in"spissate, in"testate, mo"lybdate, %pari"pinnate, %rein"carnate (v.), %rein"carnate (adj.), re"tardate.

Six of them are alternately stressed on the antepenultimate syllable, that is to say that they also have a regular stress pattern. They are gathered in Table 3 below: MAIN STRESS PATTERN con"summate de"cussate (v. & adj.) fe"nestrate in"spissate %rein"carnate (v.)

VARIANT STRESS PATTERN "consummate "decussate (v. & adj.) "fenestrate "inspissate %re"incarnate (v.)

Table 3. Stress variation in -ate words with pre-final consonant cluster: penultimate stress vs. antepenultimate stress.

All the words listed in (16) have a common orthographic feature: the ending -ate is always immediately preceded by a consonant cluster. Pre-final consonant clusters must be taken into account in the stressing of many English words. For example, the rules in (17) determine stress placement in adjectives ending in -al: (17a)

If the ending -al is immediately preceded by zero or one and only one consonant, primary stress is on the antepenultimate syllable of the word, whether a deriving autonomous base exists in present-day English or not. Examples: "origin > o"riginal, "adverb > ad"verbial, ? > do"minical

(17b) If the ending -al is immediately preceded by two or more consonants, except consonant + l and consonant + r, primary stress is on the penultimate syllable

Franck Zumstein

188

of the word, whether a deriving autonomous base exists in present-day English or not. Example: "dialect > %dia"lectal, ? > su"pernal, but "arbitral, "cerebral, "integral, etc. (stressed as in (a)).

Rule (17b) is then at work as for the stress placement of the -ate words in (16), but for a relatively small number compared to all the other -ate words with pre-final consonant clusters (around 50 of them) with a regular stress on the antepenultimate, as the examples in (18): (18)

ad"ministrate (v.), "magistrate (n.), "designate (adj.), etc.

Still, ten of them, listed in Table 4 below have penultimate stress variants: MAIN STRESS PATTERN "adumbrate "condensate "exculpate "impregnate "incarnate "inculcate "inculpate "insufflate "remonstrate "sequestrate

VARIANT STRESS PATTERN a"dumbrate con"densate ex"culpate im"pregnate in"carnate in"culcate in"culpate in"sufflate re"monstrate se"questrate

Table 4. Stress variation in -ate words with pre-final consonant cluster: antepenultimate stress vs. penultimate stress.

A comparison of the two tables shows then that all the words listed there have either antepenultimate stress according to the -ate word stress rule, which is the dominant one in this word class, or penultimate stress according to the pre-final consonant-cluster rule (rule 17b), which accounts for a small number of items. Stress variation here is thus the result of a conflict between both rules for stress assignment. Undeniably, the -ate word stress rule is the main one, but how is it possible to account for the other rule synchronically? Is it a new

Word Stress Variation and Electronic Diachronic Corpora

189

rule appearing in the English system, or is it a disappearing relic? It is interesting first to consider Trevian’s point of view on these questions: Conversely, positive evolution is undeniable in the way British English has resolved the conflict between consonant cluster stress assignment and antepenultimate stressing in verbs in -ate by shifting the penultimate stress of the former type to initial position as in consummate, condensate, demonstrate, defalcate, elongate, exculpate, fecundate, illustrate, impregnate, incarnate, inculcate, inculpate, inundate, promulgate and today remonstrate and sequestrate whereas the /010/ stress pattern still lingers on for some of them in American English. (Trevian 2000: 89)

In Wells’s dictionary, indeed, Table 5 below shows that more -ate words with pre-final consonant cluster have penultimate stress as a main stress pattern, or, at least, as variant stress pattern in American English: MAIN STRESS PATTERN de"falcate e"longate im"pregnate in"culcate in"culpate in"filtrate in"nervate %inter"pellate "incurvate "obfuscate "promulgate

VARIANT STRESS PATTERN "defalcate "elongate "impregnate "inculcate "inculpate "infiltrate "innervate in"terpellate in"curvate ob"fuscate pro"mulgate

Table 5. Stress variation in -ate words with pre-final consonant cluster in American English: penultimate stress vs. antepenultimate stress and vice versa.

When using the word ‘positive’, Trevian means that all -ate words tend to have a homogeneous antepenultimate stress pattern, and stress variation in present-day English is a ‘snapshot’ of a process of ‘regularization’ which Wells (1982: 101) defines as follows: Some sound changes can be explained on the grounds that they lead to greater simplicity in the grammar (in the widest sense of this term, i.e. including phonology). This involves simplifying not the physical movements of the articulators but the abstract mental plan of the language which underlies our

Franck Zumstein

190

ability to speak it. There is always a pressure to remove irregularities by bringing irregular forms under the general rule.

It may be said here that regularization is analogical in Gimson’s terms because words with pre-final consonant cluster are largely outnumbered by those with no pre-final consonant cluster. Thus, penultimate stress pattern in -ate words may be considered as irregular nowadays, but this was not the case centuries ago.

4.2. The need for diachronic corpora The digitized version of Buchanan’s dictionary yielded interesting results as regards the stressing of -ate words. On one hand, there are 393 words where final -ate is immediately preceded by one and only one consonant. All words of this class are stressed on the antepenultimate syllable. To this list must be added the 86 words in which final -ate is immediately preceded by a vowel, and all stressed on the antepenultimate. Where final stress pattern is /-10/, vowel syneresis has occurred, as, for example, in the words listed in (19): (19)

im"mediate -yait, i"nitiate -shait, ne"gotiate -shait, re"taliate -yait, sub"stantiate -shait, tra"lineate -yait.

On the other hand, out of 51 words with pre-final consonant clusters, only 9 words have antepenultimate stress, as listed in (20): (20)

ad"ministrate, "cucullate, "desiccate, "magistrate, "potentate, "scintillate, sub"ministrate, "titillate.

In words like arbitrate, consecrate, denigrate, etc. the sequence consonant + r immediately preceding the ending -ate is not a functional consonant cluster so that these words are stressed on the antepenultimate syllable. In Buchanan’s time, the iambic reversal had already taken place for -ate words, and a tiny proportion of words (under 20%) with pre-final consonant clusters had such a stressing. Yet the vast majority of these words were stressed on the penultimate. This means that the -ate word stress rule and the consonant-cluster rule were co-existing,

Word Stress Variation and Electronic Diachronic Corpora

191

but apparently not conflicting. Stress assignment was then in complementary distribution, and the words listed in (16) could be considered as exceptions. Nevertheless, it is possible to state now that they certainly were precursors of a trend of ‘analogical regularization’ which developed later. However, most pronunciation dictionaries were made by orthoepists at the time so that each dictionary content is very prescriptive in nature.7 Indeed, the title of Buchanan’s dictionary clearly states that it intended to establish “a Standard for an Elegant and Uniform Pronunciation of the English Language”, an aim which allowed little room for variation. Walker (1825: 3) also refers to such a project in the preface of his dictionary: The importance of a consistent and regular pronunciation was too obvious to be overlooked; and the want of this consistency and regularity has induced several ingenious men to endeavour at a reformation; who by exhibiting the irregularities of pronunciation, and pointing out its analogies, have reclaimed some words that were irrecoverably fixed in a wrong sound, and prevented others from being perverted by ignorance or caprice.

It is thus necessary to look up in other dictionaries in order to confirm the conclusions drawn from the study of the data provided by Buchanan’s dictionary. Yet no other pronunciation dictionary has been digitized and turned into computer-searchable version so that words listed in Table 6 below are just samples retrieved from paper versions of several dictionaries. The words with pre-final consonant clusters listed in Table 6 all have penultimate stress. Besides, it also shows that today’s stress variation described in section 3.1. has appeared after 1825, certainly in the second half of the 19th century. This observation supports earlier conclusions, but only exhaustive lists retrieved from many different electronic lexico-phonetic corpora of 18th- and 19th-century pronunciation dictionaries would be really conclusive. The results would yield a clearer picture of the evolution of the stressing of such words. It would also be possible to compare the different recordings of stress placement, and thus account for some sort of variation. 7

Wakelin (1988: 155) refers to the 18th century as “the age of prescriptivism” when mentioning works on pronunciation and grammar.

Franck Zumstein

192 WORDS

Adumbrate Alternate Compensate Consummate Demonstrate edentate extirpate illustrate impregnate inculcate

STRESS PATTERNS (1 = accented syllable, 0 = unaccented syllable) Bailey (1727) Entick (1798) Jones (1809) Walker (1825) /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/ /010/

Table 6. Different recordings in old dictionaries of -ate words with pre-final consonant cluster.

Walker occasionally mentions such variation in his dictionary by quoting other works, as in the following word entry:8 ABSOLUTORY, a4b-so4l’u1-tu2r-re1. a. That which absolves. In the first edition of this Dictionary [1791] I followed the accentuation of Johnson and Ash in this word, and placed the stress upon the first syllable, contrary to what I had done some years before in the Rhyming Dictionary [first published in 1775], where I had placed the accent on the second, and which was the accentuation adopted by Mr. Sheridan. Upon a nearer inspection of the analogies of the language, I find this the preferable mode of marking it, as words in this termination, though very irregular, generally follow the stress of the corresponding noun or verb; and consequently this word ought to have the same accent as absolve, which is the most immediate relation of the word in question, and not the accent of absolute, which is the most distant. 512. Kenrick, W. Johnston, Entick and Nares, have not inserted this word; and Mr. Perry very improperly accents it upon the third syllable.

Such comments in different editions of Walker’s dictionary are invaluable sources of information as regards the evolution of word stress placement. These comments include cross-references which point to other dictionary makers’ works and to principles of pronunciations listed at the beginning of Walker’s dictionary. For example, number 512 appears in the word entry above and refers to a 8

In Walker’s notational system for the pronunciation, a4 stands for [&], o4 stands for [Q], u1 stands for [(j)u;], u2 stands for [V] and e1 stands for [i;].

Word Stress Variation and Electronic Diachronic Corpora

193

principle in which Walker gives a very detailed account of the stressing of words ending in -arous, -erous, -orous, -ative, -atory, -etive, etc. A hypertext edition of Walker’s dictionary would do justice to the accurate reference to the general principles provided in almost every entry. Conversely, each dictionary entry which illustrates a given principle should be retrieved from the corresponding paragraph of the listed principles. The regular pattern which Walker has encapsulated in the formulation of a principle could then be evidenced by all the relevant entries. The resulting word list could then be compared to those of the recent dictionaries whose electronic text can be searched in the way described in section 3.1.

5. Conclusion Synchronic analysis of stress variation in English can be accounted for in terms of unresolved conflicting rules of stress assignment in individual words. Stress placement alternations in -ate words with prefinal consonant cluster (i.e. antepenultimate stress vs. penultimate stress) are examples of such an on-going process. Data retrieved from an electronic diachronic corpus, that is to say an 18th-century pronunciation dictionary, show that, in this case, antepenultimate stress has gained ground as it has imposed itself in most words with pre-final consonant cluster. Penultimate stress, which used to be a regular stress pattern for such words, is now a disappearing relic, and is considered an irregular variant stress pattern. Yet only one computer-searchable version of these old dictionaries was used. It is the contention of this paper that electronic versions of others are needed to retrieve exhaustive word lists so that it would be possible to firmly conclude on stress alternations. This would pave the way towards a panchronic dictionary of English pronunciation in the electronic form of a searchable database. To this aim, a newly funded

Franck Zumstein

194

research group has been set up at the University of Poitiers9 aimed at developing this database and digitizing pronunciation dictionaries of past centuries whose pronouncements should be analysed and compared.

References Bailey, Nathaniel 1727. An Orthographical Dictionary, Shewing both the Orthography and the Orthoepia of the English Tongue. London: Printed for T. Cox at the Lamb under the RoyalExchange. Buchanan, James 1766. An Essay towards Establishing a Standard for an Elegant and Uniform Pronunciation of the English Language. London: W. Teggs & Co. Chomsky, Noam/Halle, Morris 1968. The Sound Pattern of English. New York: Harper & Row. Danielsson, Bror 1948. Studies on the Accentuation of Polysyllabic Latin, Greek, and Romance Loan-Words in English, with Special Reference to those Ending in -able, -ate, -ator, -ible, -ic, -ical and -ize. Stockholm: Almqvist & Wiskell. Deschamps, Alain 1994. De l’Écrit à l’Oral et de l’Oral à l’Écrit, Phonétique et Orthographe de l’Anglais. Paris: Ophrys. Deschamps, Alain 2000. La Logique des Variantes Accentuelles de l’Anglais. In Busuttil, Pierre (ed.) Points d’Interrogation, Phonétique et Phonologie de l’Anglais. Pau: Université de Pau et des Pays de l’Adour, 93-107. Duchet, Jean-Louis 21994. Code de l’Anglais Oral. Paris: Ophrys. Entick, John 1798. The New Spelling Dictionary Teaching to Write and Pronounce the English Tongue with Ease and Propriety. By William Crakelt, first published in 1764. London: Printed for C. Dilly. 9

The research group is FORELL (FOrmes et REprésentations en Linguistique et Littérature), Equipe d’Accueil 3816, Equipe A: Linguistique Interlangues et Traitement des Textes (http://www.mshs.univ-poitiers.fr/Forell/forell.htm).

Word Stress Variation and Electronic Diachronic Corpora

195

Fowler, Henry Watson 31996. The New Fowler’s Modern English Usage. Oxford: Oxford University Press. Fudge, Eric 1984. English Word Stress. London: Allen & Unwin. Gimson, Alfred Charles 31975. An Introduction to the Pronunciation of English. London: Arnold. Guierre, Lionel 1979. Essai sur l’Accentuation en Anglais Contemporain. Paris: Université de Paris VII. Guierre, Lionel 41984. Drills in English Stress Patterns. Paris: Armand Colin-Longman. Hudson, Richard 1980. Sociolinguistics. Cambridge: Cambridge University Press. Jones, Daniel 21957. An Outline of English Phonetics. Cambridge: Heffer. Jones, Daniel 151997. English Pronouncing Dictionary. Roach, Peter/Hartman, James (eds). Cambridge: Cambridge University Press. Jones, Stephen 21809. A General Pronouncing and Explanatory Dictionary of the English Language. Philadelphia: Bennett & Walton. Kingdon, Roger 1958. The Groundwork of English Stress. London: Longmans, Green & Co. Labov, William 1972. Sociolinguistic Patterns. Philadelphia: University of Pennsylvania Press. McArthur, Tom (ed.) 1996. The Oxford Companion to the English Language. Abridged edition. Oxford: Oxford University Press. Trevian, Ives 2000. Variants and Phonetic Changes in Lexemes with Irregular Realisations: Is the English Language Overcoming its Phonological Conflicts? In Busuttil, Pierre (ed.) Points d’Interrogation, Phonétique et Phonologie de l’Anglais. Pau: Université de Pau et des Pays de l’Adour, 72-90. Trudgill, Peter 1974. The Social Differentiation of English in Norwich. Cambridge: Cambridge University Press. Wakelin, Martyn 1988. The Archaeology of English. London: Batsford. Walker, John 21825. A Critical Pronouncing Dictionary and Expositor of the English Language. Stereotyped edition. New York: Collins & Hannay.

196

Franck Zumstein

Wells, John 1982. Accents of English. Cambridge: Cambridge University Press. Wells, John 1990. Longman Pronunciation Dictionary. Harlow: Longman.

19th-Century and 20 th-Century English

MERJA KYTÖ / ERIK SMITTERBERG

19th-Century English: An Age of Stability or a Period of Change?

1. Introduction The 19th century has so far been comparatively neglected in diachronic studies of the English language.1 One reason for this state of affairs is the view that there are few conspicuous differences between the syntax of late Modern English and that of the present day. By and large, this view is based on the fact that few qualitative changes take place in this period: the inventory of syntactic variants has remained largely the same since 1800 (Rydén 1979: 34; see also Beal 2004: 66). The English of this period also formed the basis for statements in the famous grammars by authors such as Jespersen (1909-1949) and Poutsma (1914-1929), which may have led scholars to see 19thcentury English in particular as an extension of Present-day English backwards in time. Finally, the abundance of printed sources with standardized language, and the dearth of linguistic studies based on manuscript sources, may give the impression that 19th-century English is a fairly homogeneous entity (see, however, Fairman forthcoming for a study of manuscript documents). But despite this apparent similarity, 19th-century English is not characterized by linguistic stability alone. If we apply a quantitative perspective and study the relative distribution of variants, lexical items, etc., it becomes clear that the language of the period rather exhibits a tension between stability and change. As the present study will show, linguistic variation, which is a prerequisite for language 1

In recognition of his significant contributions to the empirical study of the English language, it gives us great pleasure to dedicate this study to Professor Günter Rohdenburg, on the occasion of his 65th birthday.

200

Merja Kytö / Erik Smitterberg

change, is in evidence both within idiolects, between genres, and in the form of gender differences in usage. Moreover, when variation does lead to change, the result is often that colloquial features, which previously occurred mostly in speech, increase in frequency in written genres as well. Dekeyser (1975) has shown that prescriptive statements became less strict during the course of the 1800s, perhaps reflecting a gradual acceptance of oral features in writing. Many of these changes continued into the 20th century. This study aims at investigating aspects of stability, variation, and change in 19th-century English, with some attention paid to 20thcentury developments as well. It will also outline some linguistic and extralinguistic factors that seem to be influential in this regard. These factors may, for instance, create the potential for change by causing stylistic variation between genres. Finally, the results of this study will show that combining several methodological approaches can improve our understanding of 19th-century English, and that certain issues in corpus linguistics and historical linguistics methodology must be considered when the results are interpreted. After a section describing the material used, three case studies are presented. The topics of the case studies are lexical bundles, or multi-word expressions (section 3), multal quantifiers (section 4), and the distribution of the progressive compared with that of phrasal verbs (section 5). The occurrence of lexical bundles in Present-day English has received a great deal of attention in recent years, but not much is known about their distribution in historical texts. As regards multal quantifiers, the progressive, and phrasal verbs, previous research indicates that all of these features undergo change in 19th-century English. The case studies will, among other things, compare results from different corpora, consider the results of multi-feature/multidimensional analyses, and examine variation within idiolects and across the extralinguistic parameters of time, genre, and gender.

19th-Century English: An Age of Stability or a Period of Change?

201

2. Material The present study is based on two corpora. Our 19th-century data come from the one-million-word corpus of British English, CONCE (Corpus of Nineteenth-Century English; see Kytö/Rudanko/Smitterberg 2000). In order to contrast 19th-century and 20th-century English, we will compare the results for CONCE with those for roughly 200,000 words taken from the 20th-century British English sections of the 1.7-millionword ARCHER corpus (A Representative Corpus of Historical English Registers; see Biber et al. 1994 for a description of the ARCHER corpus). The texts in the CONCE corpus are divided into three periods: 1800-1830, 1850-1870, and 1870-1900.2 The material is representative of seven genres (Debates, Drama, Fiction, History, Letters, Science, and Trials) that fall in the categories of speechrelated versus written texts, and expository versus non-expository texts. The Letters genre is represented extensively, stratified into letters written by women and by men, which enables a gender perspective on language variation and change. (For the word counts used, see Kytö/Rudanko/Smitterberg 2000: 89 [Table 4], 90 [Table 5].) The ARCHER corpus comprises both written and speechrelated genres from 1650 to 1990. ARCHER contains four genres that can be considered roughly parallel to the corresponding genres in CONCE, and only these genres will be included in comparisons with results based on CONCE. The genres included are Letters (c. 26,000 words), Fiction (c. 112,000 words), Drama (c. 64,000 words), and Science (c. 44,000 words). The case study of multal quantifiers draws for data on all of these texts; for lexical bundles, Letters, Science, History, and Trials (of which the latter two are not represented in ARCHER) were used; and progressives and phrasal verbs were extracted from parts of the 2

The periodization of the CONCE corpus was influenced both by important extralinguistic developments, such as the Reform Bills, and by the availability of suitable texts in the libraries consulted; this explains the 20-year gap between periods 1 and 2.

202

Merja Kytö / Erik Smitterberg

CONCE corpus. There are several caveats regarding comparisons of results based on CONCE and ARCHER. First, each subperiod covers a larger stretch of time in ARCHER than in CONCE, whereas the period samples are bigger in CONCE than in ARCHER. Secondly, the most extensively sampled genre is Letters in CONCE but Fiction in ARCHER. Thirdly, the genre descriptions and sampling principles of the two corpora are not identical.

3. Lexical bundles 3.1. Introduction Over the past two decades or so, linguists have begun to show an interest in multi-word combinations such as in a nutshell and I don’t think I. These are referred to by a plethora of terms, among them lexical bundles, prefabs (i.e. prefabricated or fixed expressions or patterns), clusters, and lexical phrases. There is some evidence that it is not accurate to assume that grammar is purely compositional, i.e. that larger units (e.g. phrases) are built by smaller units (e.g. words). Instead, grammar appears to comprise prefabricated expressions stored in our linguistic resources as single units. The study of these expressions enables us to better analyse and understand various aspects of language use than would be the case if the merely compositional model were adopted (Biber et al. 2003: 71). We approach this topic from the historical perspective, something that has not been done to any greater extent so far, using a systematic corpusbased approach (for a pilot study, see Culpeper/Kytö 2002). Our research questions can be summed up as follows: (i) how do three-word and four-word combinations vary across the genres included in the study; (ii) what are the grammatical characteristics of these combinations; (iii) how do these word-combinations compare with those revealed in present-day studies for spoken and written English?

19th-Century English: An Age of Stability or a Period of Change?

203

We refer to our word-combinations by the term ‘lexical bundle’, along the lines introduced in the Longman Grammar of Spoken and Written English (1999), by Biber et al. According to Biber et al. (1999: 990), lexical bundles can be defined as “recurrent expressions, regardless of their idiomaticity, and regardless of their structural status. That is, lexical bundles are simply sequences of word forms that commonly go together in natural discourse”. Consequently, lexical bundles need not be structurally complete (cf. the four-word bundle I don’t think I above), nor is it often possible to use a single word to substitute for a bundle (Biber et al. 1999: 989). Using computerized techniques, Biber et al. (1999, 2003) derived the most common three-word and four-word lexical bundles in both conversation and academic prose. Among other things, they showed that, in conversation, most lexical bundles are parts of interrogatives or declarative clauses, some 90% including part of a verb phrase. In academic prose, on the other hand, over 60% of lexical bundles are parts of noun phrases or prepositional phrases. In the present historical survey, the key genres investigated are Letters, Science, History, and Trials, and the focus is on three-word and fourword combinations. Results for three-word bundles are given in section 3.2 and those for four-word bundles in section 3.3, with the exception that three- and four-word bundles found in Trials are discussed together in section 3.3. In our study, to be included in the analysis, a lexical bundle had to occur at least ten times, and instances had to appear in at least four different texts; only the most frequent lexical bundles in each data set are commented on in what follows. In addition to the ranking numbers and raw figures, we give incidence figures per 10,000 words.

3.2. Three-word bundles Table 1 shows the occurrence of three-word lexical bundles in the letters by male and female writers in the CONCE corpus; in all tables in section 3, NF stands for ‘normalized frequency per 10,000 words’.

204

1 2 3 4 5 6 7 8 9 10

Letters by men Freq. I do not 102 I have been 101 I shall be 81 I am very 69 I have not 56 I have no; that I am 45 I did not 44 it is a; it will be; that I have 43 I am not 42 And I am; I am glad 40

Merja Kytö / Erik Smitterberg NF Letters by women Freq. 6.3 1 I do not 101 6.2 2 I have been 99 5.0 3 I have not 75 4.3 4 I did not 61 3.5 5 I am sure; I am very 60 2.8 6 it would be 54 2.7 7 it is a 53 2.7 8 I shall be 51 2.6 9 God bless you 47 2.5 10 I could not; 45 it will be

NF 5.6 5.4 4.1 3.4 3.3 3.0 2.9 2.8 2.6 2.5

Table 1. Three-word lexical bundles in CONCE, letters by men and women.

By and large, the incidence figures are slightly higher for men for the top-five bundles and there is also some variation in the ranking order of the items. However, six of the top-ten ranked bundles are common to both men’s and women’s writing (among the top-five bundles, four appear in both lists). As regards the grammatical characteristics of the expressions, we mostly find parts of verb phrases, with first person singular subject followed by an auxiliary, or BE, or HAVE, for instance, I do not, I have been, I have not. Significantly, these are parts of verb phrases that Biber et al. (1999, 2003) have shown are characteristic of Present-day English conversation; note, however, that no contracted forms appear in these lists (they do occur in the data but are rare in 19th-century writing, overall). Letters also display a clear instance of a genre-specific bundle, the expression a letter from. In addition, the idiom God bless you ranks high in both men’s and women’s letters in period 1, and remains in use in women’s letters across the century (this may be due to the fact that the women letter-writers sampled mostly wrote to their family members, and not to colleagues or acquaintances as the men did). In Science texts (see Table 2), the top-ranked three-word bundles clearly differ from those found in Letters: they are mostly parts of noun phrases or prepositional phrases, for instance, part of the, of the same, the case of. These, as we already mentioned, have been shown to be characteristic of academic prose in Present-day

19th-Century English: An Age of Stability or a Period of Change?

205

English. Only two verb phrases appear among the top-ranked items in science texts, that is, it may be and it will be.

1 2 3 4 5 6 7 8 9 10

Science part of the of the same the case of the rate of at the same; in the same parts of the it may be; of the country; the number of on the other; some of the it will be the same time

Freq. 60 46 40 38 37 33 32 31 29 28

NF 6.0 4.6 4.0 3.8 3.7 3.3 3.2 3.1 2.9 2.8

Table 2. Three-word lexical bundles in CONCE, Science.

The three-word bundles in the 20th-century Science section of the ARCHER corpus also comprise similar parts of noun phrases and prepositional phrases: in the case (25x / 5.6), the case of (24x / 5.4), the presence of (22x / 5.0), part of the (20x / 4.5), it will be (17x / 3.8), in order to, the effect of, the values of (16 x), the value of (14x), in the presence, that of the, the rate of (13x), the number of (12x), due to the, the ratio of (11x). In fact, five of the expressions appear in both 19th and 20th-century lists. The only verb phrase is it will be, which also occurred in the CONCE data. The History genre presents more variation (Table 3). History 1 2 3 4 5 6 7 8

as well as the house of Would have been Which he had he had been one of the it was not; of the king on the other

Freq. 44 37 36 28 27 26 25 24

NF 4.8 4.0 3.9 3.0 2.9 2.8 2.7 2.6

Merja Kytö / Erik Smitterberg

206 9 10

and it was; the end of which had been

20 19

2.2 2.1

Table 3. Three-word lexical bundles in CONCE, History.

Parts of noun phrases and prepositional phrases appear in the top-ten list, accompanied by verb phrases in the past tense or pluperfect (for instance, would have been and it was not). The latter emerge as genre markers typical of this genre. An interesting bundle is the top-ranked item as well as: this conjunction provides an efficient way of packing information, well suited for the purposes of descriptive narration, which is characteristic of History texts.

3.3. Four-word bundles Compared with three-word bundles, the figures for four-word bundles are generally low in Present-day English (see Biber et al. 1999: 994). This also holds for our 19th-century data, but some trends can still be observed. Table 4 shows that in Letters, as was the case with threeword bundles, verb phrases or parts of them dominate in texts by both men and women. Letters by men 1 I am very sorry 2 I am very glad; I have no doubt 3 but I do not 4 I am glad to; I do not know; I do not think 5 I am going to 6 at the end of; I am sorry to 7 a day or two; a great deal of

Freq. NF 18 1.1 17 1.1

Letters by women 1 I am going to 2 I should like to

Freq. NF 25 1.4 22 1.2

16 15

1.0 0.9

3 I hope you will 4 at the same time

19 18

1.0 1.0

14

0.9

16

0.9

13

0.8

5 I feel as if; I have no doubt 6 I do not know

15

0.8

12

0.7

14

0.8

7 a great deal of; but I do not; had a letter from; I am very glad; I have had a; it would be a; to write to you; we are going to

19th-Century English: An Age of Stability or a Period of Change? 8 I have written to; I shall be glad; I should like to; I was very glad; in the course of

11

0.7

207

8 at the end of; I am so glad; I do not think; I have not yet; to be able to

13

0.7

Table 4. Four-word lexical bundles in CONCE, letters by men and women.

Among the bundles that are common to both men’s and women’s writing are again frames with first person singular subject followed by an auxiliary, or BE, or HAVE, for instance, I am going to, I am very glad, I have no doubt, and I do not know. With regard to similarities and differences between men’s and women’s Letters texts, of the nearly 20 items in the men’s list and nearly 30 items in the women’s list, about ten appear in both. We also find a number of bundles that are clearly genre-specific, for instance, I have written to, had a letter from, and to write to you. Also, the multal quantifier a great deal of (see section 4) occurs as a lexical bundle, ranked as number seven, in both men’s and women’s writing. In Science texts (see Table 5), we mostly find prepositional phrases which function as cohesive or organizational devices structuring the text, for instance, in the case of and on the other hand. Science 1 at the same time 2 in the case of; on the other hand 3 in proportion to the; that is to say 4 in the course of 5 in consequence of the 6 in a state of

Freq. NF 26 2.6 24 2.4

History 1 on the other hand 2 of the house of

Freq. 17 16

NF 1.8 1.7

16

1.6

3 at the same time

15

1.6

15 11 10

1.5 1.1 1.0

4 at the head of 5 it would have been

14 11

1.5 1.2

Table 5. Four-word lexical bundles in CONCE, Science and History.

In the case of is also the top-ranked bundle that emerges in the 20thcentury Science section of the ARCHER corpus (23x / 5.2); this and another item in the 19th-century list, namely the item on the other hand, are also the two top-ranked expressions in present-day academic prose, as shown in the Longman Grammar.

Merja Kytö / Erik Smitterberg

208

Table 5 shows, further, that two of the top-ranked lexical bundles in the 19th-century Science texts also appear in History, that is, on the other hand and at the same time. History also exhibits the occurrence of genre-specific bundles such as of the house of. Finally, Table 6 gives the results for three-word and four-word lexical bundles in 19th-century Trials (the results for the three-word and four-word bundles are presented in the same table for this genre, as no comparative data were readily available for this investigation with regard to the parameters of time and gender).

1 2 3 4 5 6 7

Three-word lexical bundles I do not I did not did you see Do you remember Do not know At that time did he say

8

Do you know

9 10

what did you In the morning

Freq. NF Four-word lexical bundles Freq. NF 334 17.5 1 I do not know 136 7.1 237 12.4 2 did you see him 50 2.6 179 9.4 3 what did he say 46 2.4 160 8.4 4 what did you say 44 2.3 154 8.1 5 I do not think 39 2.0 142 7.5 6 I do not remember 33 1.7 136 7.1 7 how long have you; in 31 1.6 the course of; in the habit of 114 6.0 8 do not know whether; I 29 1.5 did not see; I think it was 112 5.9 9 I did not know 28 1.5 105 5.5 10 did he say anything; 26 1.4 what time did you

Table 6. Three-word and four-word lexical bundles in CONCE, Trials.

The incidence figures are the highest so far; this reflects the formulaic discourse situation in the courtroom. Both lists comprise items familiar from present-day conversation; for instance, by far the most frequent four-word bundle I do not know is essentially the same as the most common three-word bundle I don’t know in present-day conversation (over 1,000 occurrences per a million words, according to Biber et al. 1999). Interestingly, some of the expressions are also found in Letters, for instance, the three-word bundles I do not, I did not, and, in the group of four-word bundles, I do not know. This

19th-Century English: An Age of Stability or a Period of Change?

209

overlap indicates that Letters texts incorporate features of spoken communication. Some genre-specific expressions are also represented in both lists, for instance, parts of or full interrogatives, did you see, do you remember, did he say, do you know, did you see him, what did he say, and what did you say. Moreover, there are answers that seem to be specific to the courtroom situation, for instance, I do not remember, I think it was, and I did not know.

3.4. Summary and discussion In sum, our survey of three-word and four-word lexical bundles in 19th-century English has revealed a surprising degree of stability across time, as indicated by the similarities observed in the findings for both 19th-century and Present-day English. The lexical bundles characteristic of Letters and Science in CONCE are also typical of present-day conversation and academic prose respectively (see Biber et al. 1999: 993–995). Trials displayed features familiar from presentday conversation but were also characterised by language use typical of the courtroom situation. In contrast to the stability across time that was found in the data, there were notable differences between the genres studied: some bundles were clearly genre-specific. As regards the gender parameter, men and women letter writers made use of similar bundles to a great extent but also exhibited differences in their usage. Finally, the fact that many lexical bundles in Letters contained first-person pronouns – while most top-listed bundles in Science and History contained nouns and prepositions – enables interesting comparisons with multi-feature/multi-dimensional analyses. In Geisler’s factor score analysis of CONCE, Letters emerge as an involved genre and Science as an informational genre on Dimension 1; while first-person pronouns load as an involved feature on this dimension, nouns and prepositions load as informational features (Geisler 2002). The linguistic differentiation between these two genres is thus clear both in the factor score analysis and in the lexical-bundle analysis. As will appear below, the other linguistic features included

210

Merja Kytö / Erik Smitterberg

in this investigation also display interesting parallels with the results of multi-feature/multi-dimensional analyses.

4. Multal quantifiers 4.1. Introduction Multal quantifiers denote a large quantity or degree, e.g. a lot of in We have sold a lot of horses this year. Their development through the history of English is of interest for several reasons. First, the paradigm of multal quantifiers has changed across time, with some variants falling out of use, such as Old English fela, and other variants being introduced, e.g. plenty of (see Dekeyser 1994 for a diachronic account). In addition, the distribution of variants has changed; for instance, the multal quantifier much has been restricted to uncountable contexts (Dekeyser 1994: 291). The 19th and 20th centuries are relevant to the development of multal quantification in English with regard to both the make-up of the variant field and the distribution of variants. The informal quantifiers a lot (of) and lots (of), which are common in Present-day English, appear to have entered the written language during the late Modern period (Dekeyser 1994: 294); Dekeyser’s earliest example of lots of is from the 19th century. As regards the distribution of variants, it will be shown that late Modern English was important to the process of change by which much and, to a lesser extent, many came to be associated with non-assertive contexts. Previous quantitative research on multal quantifiers in late Modern English has focused on the language of fiction (e.g. Behre 1967, 1969). However, as stylistic factors influence the distribution of variants, it is relevant to take genre differences into account. Moreover, since late Modern English is characterized by increasing linguistic diversification of written genres (see Biber/Finegan 1997), a cross-genre study of multal quantifiers during this period can clarify the connection between genre development and language change.

19th-Century English: An Age of Stability or a Period of Change?

211

In this case study, we will examine the use of multal quantifiers in English in the 19th and 20th centuries. We will take into account linguistic factors such as countable vs. uncountable contexts and the syntactic function of the quantifier (determiner or pronoun). The extralinguistic factors considered are time and genre. 4.2. Data Seven main types of multal quantifiers were included in the study. These types will be referred to as MUCH, MANY, DEAL, X MANY, PLENTY, LOT, and LOTS in what follows; see Table 7. Type MUCH MANY DEAL X MANY PLENTY LOT LOTS

Typical realizations much many a good/great deal (of) a good/great many plenty (of) a lot (of) lots (of)

Table 7. Realizations of the seven types of multal quantifiers included in the study.

The seven types listed in Table 7 can be seen as variants of a variant field, if a wide definition of semantic equivalence is adopted (see Raumolin-Brunberg 1988: 140f. for a discussion). Examples (1)-(7) illustrate the seven types. (1)

Of course one never had much faith in the report, but I had rather it had been circulated by her. (CONCE, Letters, Butler, May, 1870-1900, p. 229)

(2)

However, important though resin canals undoubtedly are, too little is known with regard to them to warrant many of the prevailing conclusions. (ARCHER, Science, 1900-1950, 1925thom.s8)

(3)

She was quite satisfied that a good deal was effected by this make-belief of housekeeping; and was as merry as if we had been keeping a baby-house, for a joke. (CONCE, Fiction, Dickens, 1850-1870, p. 459)

212

Merja Kytö / Erik Smitterberg

(4)

[$Q.$] Now your wife has said, that she has a great many children? (CONCE, Trials, Bowditch, 1800-1830, p. 93)

(5)

But that has not lasted long, for God knows I have plenty to cheer me in the long run. (CONCE, Letters, Dickens, 1850-1870, p. 348)

(6)

But there are a lot of unconsidered trifles about, and if you get a good telescope and watch, you will have a glimpse as they hover between sand and rooks’ beaks. (CONCE, Letters, Huxley, 1870-1900, p. 310)

(7)

[$Chodd Sen.$] I shan’t, – I ain’t ashamed of what I was, nor what I am; it never was my way. Well, sir, I have lots of brass. (CONCE, Drama, Robertson, 1850-1870, p. 8)

In addition to these seven types, previous research sometimes mentions other options, such as a large amount of and a large number of (see, for instance, Quirk et al. 1985: 264). However, these options do not seem to have the same status as the seven types included in the study. First, their form is not as fixed as that of the types in Table 7: it is even possible to reverse the connotation of multeity by substituting a paucal premodifier (e.g. small) for large. Secondly, previous quantitative research on the distribution of multal quantifiers, e.g. Behre (1967, 1969) and Dekeyser (1994), appears to focus on the seven types included in Table 7 (Dekeyser 1994 does not specifically mention the type X MANY, but may have intended for it to be included as a subtype of MANY). For these reasons, we exclude patterns such as a large amount/number of; nor are pure adjectives that denote multeity, like numerous, included. The multal quantifiers listed in (1)-(7) all function as pronouns or determiners. For instance, MUCH in (1) functions as a determiner, while DEAL in (3) was classified as having pronoun function, because MUCH would be a pronoun if it filled the same slot. However, some of the seven types can also have adverbial function, as in (8): (8)

I do like your verses very much, and almost know them by heart. (CONCE, Letters, Wilson, 1850-1870, p. 427)

The present study covers pronouns and determiners only; instances such as (8) were excluded from the counts. Multal quantifiers

19th-Century English: An Age of Stability or a Period of Change?

213

functioning as pronouns and determiners will be referred to as ‘pronouns’ and ‘determiners’, respectively. Multal quantifiers can be subdivided into two main groups: on the one hand, MUCH and MANY, and, on the other hand, the multiword quantifiers DEAL, X MANY, PLENTY, LOT, and LOTS. Within these groups, factors such as the countability of the head of the noun phrase and the stylistic level of the text then affect the choice of the variant. We will therefore often present conflated results for these two groups. MUCH and MANY will be referred to as ‘closed-class’ quantifiers, and DEAL, X MANY, PLENTY, LOT, and LOTS as ‘openclass’ quantifiers, since members of the latter group all contain morphemes that can also occur as words belonging to open word classes in Present-day English.3 Instances of the seven types were retrieved automatically from the corpus files; the concordances were then screened manually in order to exclude irrelevant instances. In order to focus on variation between the open-class and closed-class groups, only multal quantifiers that occurred in linguistic contexts where both open-class and closed-class variants were possible were included in the counts (thus, for instance, multal quantifiers preceded by central determiners, as, how(ever), so, such, and too, or followed by as, were excluded). In late Modern English, choice between open-class and closed-class quantifiers seems to exist chiefly in assertive contexts, such as (2) above. In contrast, in nonassertive contexts (see Quirk et al. 1985: 775-785), like the clause negated by never in (1), closed-class quantification is the norm (in Smitterberg 2003, open-class quantifiers accounted for less than five per cent of all relevant instances in nonassertive contexts). Only quantifiers occurring in assertive contexts, such as MANY in (2), were therefore included in the counts. Declarative questions such as that in (4) were also excluded, as it was not certain whether the choice of quantifier would be influenced 3

The term ‘open-class’ may not be entirely appropriate, as these types have undergone grammaticalization and developed into closed-class constructions that may be termed complex determiners and pronouns (we are grateful to Anne Curzan for drawing our attention further to this issue). However, we will use the term ‘open-class’ in the present study, as the open-class words are still recognizable in writing, and as this is the most common term for these types in previous research (see, for instance, Quirk et al. 1985: 264).

Merja Kytö / Erik Smitterberg

214

chiefly by the interrogative meaning or the statement form. In addition, a number of more or less set phrases and constructions, such as many thanks, much obliged, many + a(n) + noun, and in plenty, were excluded from the counts. After the manual post-processing round, 928 multal quantifiers were included in the counts. 4.3. Results The cross-genre distribution of open-class and closed-class quantifiers in CONCE and ARCHER is given in Tables 8 and 9. Genre Debates Drama Fiction History Letters Science Trials TOTAL

Open 13 30 34 1 115 3 48 244

% 21 70 52 1 40 2 49 31

Closed 50 13 32 82 174 133 49 533

% 79 30 48 99 60 98 51 69

Total 63 43 66 83 289 136 97 777

Table 8. Open-class and closed-class multal quantifiers in CONCE by genre. Genre Drama Fiction Letters Science TOTAL

Open 17 31 14 1 63

% 81 43 52 3 42

Closed 4 41 13 30 88

% 19 57 48 97 58

Total 21 72 27 31 151

Table 9. Open-class and closed-class multal quantifiers in ARCHER (British English, 1900-1990; Drama, Fiction, Letters, and Science) by genre.

Tables 8 and 9 show that stylistic differences between genres influence the distribution of multal quantifiers. The Drama genre, which is characterized by informal, speech-related language, contains the highest proportion of open-class quantification, followed by genres that are influenced by spoken language. In contrast, closedclass quantification prevails in the written expository genres History and Science. A look at the occurrence of the most informal open-class

19th-Century English: An Age of Stability or a Period of Change?

215

variants in the material, LOT and LOTS, further strengthens this impression: these quantifiers do not occur in Debates, History, and Science in CONCE, and are absent from Science in ARCHER. LOT and LOTS appear to have entered written English in the late Modern period (see section 4.1); they are not attested until period 2 in CONCE. However, Behre’s (1967) investigation of Agatha Christie’s fiction revealed that LOT especially was common in dialogue. Against this background, an increase in open-class quantification between CONCE and ARCHER might be hypothesized. The results support this hypothesis as regards Drama and Letters, while no change was attested in Science; Fiction even seems to develop towards more closed-class quantification. However, the overall figures hide important developments in subcategories of the data. As Tables 10 and 11 show, determiners and pronouns are not sharply differentiated in the CONCE data, whereas they diverge in ARCHER. Function Period 1800-1830 1850-1870 1870-1900 TOTAL Table 10. Function Period 1900-1950 1950-1990 TOTAL Table 11.

Open 35 57 77 169

% 20 30 41 31

Determiners Closed % 140 80 131 70 112 59 383 69

Total 175 188 189 552

Open 14 26 35 75

% 19 35 46 33

Pronouns Closed 60 49 41 150

% 81 65 54 67

Total 74 75 76 225

Open-class and closed-class multal quantifiers in CONCE by function and period.

Open % 18 36 19 31 37 33

Determiners Closed % 32 64 43 69 75 67

Total 50 62 112

Open 13 13 26

% 62 72 67

Pronouns Closed 8 5 13

% 38 28 33

Total 21 18 39

Open-class and closed-class multal quantifiers in ARCHER (British English, 1900-1990; Drama, Fiction, Letters, and Science) by function and period.

As shown in Tables 10 and 11, the proportion of open-class pronouns increases steadily in the 19th and 20th centuries, while this trend is

Merja Kytö / Erik Smitterberg

216

reversed for determiners between CONCE and ARCHER (however, comparisons across these corpora need not be reliable, as the genre make-up of CONCE and ARCHER is not identical – see section 2). In CONCE, the overall development of both functions is statistically significant at the 0.05 confidence level. Within ARCHER, in contrast, neither function exhibits a significant change. On the other hand, determiners and pronouns are not statistically distinct in CONCE, but they are in ARCHER.4 The low percentage of open-class determiners in ARCHER chiefly concerns countable contexts, where MANY is used instead of LOT, LOTS, PLENTY, or X MANY (in uncountable contexts, the choice is between closed-class MUCH and open-class DEAL, LOT, LOTS, and PLENTY). The influence of the countability parameter on determiners and pronouns taken together can be seen in Table 12. In order to facilitate comparisons between CONCE and ARCHER, Table 12 only includes data from Drama, Fiction, Letters, and Science in CONCE; nor are quantifiers that resisted classification on the countability parameter included. Context Period 1800-1830 1850-1870 1870-1900 1900-1950 1950-1990 TOTAL Table 12.

4

Open 8 19 34 7 10 78

% 7 19 33 19 21 20

Countable Closed 103 79 68 29 38 317

% 93 81 67 81 79 80

Total 111 98 102 36 48 395

Open 23 49 43 22 22 159

Uncountable % Closed % 50 23 50 53 44 47 61 28 39 69 10 31 71 9 29 58 114 42

Total 46 93 71 32 31 273

Open-class and closed-class multal quantifiers by context and period in CONCE (1800-1900; Drama, Fiction, Letters, and Science) and ARCHER (British English, 1900-1990; Drama, Fiction, Letters, and Science).

Determiners, 1800-1900: d.f. = 2, Ȥ2 = 18.413, p < 0.001; pronouns, 18001900: d.f. = 2, Ȥ2 = 12.512, p = 0.002; determiners, 1900-1990: d.f. = 1, Ȥ2 = 0.359, p = 0.549; pronouns, 1900-1990: d.f. = 1, Ȥ2 = 0.464, p = 0.496; determiners vs. pronouns, CONCE: d.f. = 1, Ȥ2 = 0.548, p = 0.460; determiners vs. pronouns, ARCHER: d.f. = 1, Ȥ2 = 13.456, p < 0.001.

19th-Century English: An Age of Stability or a Period of Change?

217

The results in Table 12 indicate that uncountable contexts display a continuous increase in open-class quantification through the 19th and 20th centuries. In countable contexts, there is, instead, an apparent decrease in open-class quantification between the 19th and 20th centuries. (As the relative proportions of the sample sizes differ between the corpora, the results were not tested for significance.) This decrease is mainly due to the Fiction genre, which affects the overall figures more than the other ARCHER genres do, because it is sampled more extensively. The language of Fiction texts is influenced by factors such as the proportion of narrative to dialogue, and genre development in the form of literary trends, which may affect the results. However, Behre’s (1969) results, based on fiction by five 19thcentury and 20th-century authors, imply an increase in the percentage of open-class determiners in both countable and uncountable contexts. It is also possible that the occurrence of other ways of expressing multeity is influential (see section 4.2 for examples). More data are needed in order to make clear what other factors may be relevant in this regard. A look at gender differences in the Letters genre in CONCE, and a comparison of the results with Geisler’s (2003) factor score analysis of the same texts, strengthens the impression that open-class quantifiers tend to occur in ‘oral’ rather than ‘literate’ genres. Table 13 compares women and men letter-writers’ use of open-class and closed-class quantification in CONCE. Gender Period 1800-1830 1850-1870 1870-1900 TOTAL

Open 11 20 20 51

% 31 44 44 41

Women Closed 24 25 25 74

% 69 56 56 59

Total 35 45 45 125

Open 8 24 32 64

% 17 34 67 39

Men Closed 38 46 16 100

% 83 66 33 61

Total 46 70 48 164

Table 13. Open-class and closed-class multal quantifiers in CONCE (Letters) by gender and period.

Merja Kytö / Erik Smitterberg

218

10 8 6 4 2 0

80 60 40

%

Dim. score

As Table 13 shows, women letter-writers have higher percentages of open-class quantification than men until period 3. Only men’s letters display a significant increase in open-class quantification.5 Geisler (2003) has calculated factor scores for the women and men letter-writers in CONCE on some of Biber’s (1988) dimensions of variation. The most powerful dimension in this analysis is Dimension 1, ‘Involved versus Informational Production’. On this dimension, women’s letters are more involved in periods 1 and 2, but men’s letters are more involved in period 3. Men’s and women’s letters thus display similar trends with regard to dimension scores and the percentage of open-class quantification. This similarity is illustrated in Figure 1, which plots Geisler’s (2003: 94) dimension scores for Dimension 1, ‘Involved versus Informational Production’ (the left Y-axis), and the percentage of open-class quantification for women and men letter-writers (the right Y-axis). These results point to a connection between the distribution of multal quantifiers and that of features that load on Dimension 1. Open-class quantifiers may thus share communicative functions with the ‘involved’ features on Dimension 1, while closed-class quantification is indicative of ‘informational’ texts.

20 0 1800-1830

1850-1870

1870-1900

Period Women Dim.1

Men Dim.1

Women Open-class

Men Open-class

Figure 1. The percentage of open-class quantification and dimension scores on Dimension 1 by period and gender for women and men letter-writers in CONCE (dimension scores from Geisler 2003). 5

Women: d.f. = 2, Ȥ2 = 1.768, p = 0.414; men: d.f. = 2, Ȥ2 = 25.121, p < 0.001.

19th-Century English: An Age of Stability or a Period of Change?

219

As shown in Figure 1, women’s letters are more involved on Dimension 1 in periods 1 and 2, but men’s letters are more involved in period 3. Men’s and women’s letters thus display similar trends with regard to dimension scores and the percentage of open-class quantification. This similarity points to a connection between the distribution of multal quantifiers and that of features that load on Dimension 1. Open-class quantifiers may thus share communicative functions with the ‘involved’ features on Dimension 1, while closedclass quantification is indicative of ‘informational’ texts.

5. Phrasal verbs and progressives 5.1. Introduction Our final case study concerns two linguistic features that undergo similar developments in 19th-century English. The progressive (e.g. am reading in I am reading a book) and phrasal verbs (e.g. was put off in The meeting was put off until next week) are both more frequent in conversation than in formal writing in Present-day English (Biber et al. 1999: 409, 462). Moreover, Strang (1970: 276) claims that verbparticle combinations, a term that covers both phrasal and prepositional verbs, have always had an “air of colloquiality that still often clings to them”, and Görlach (1999: 82) argues that the progressive “may have been a feature of spoken, non-formal English”. Like the increase in the proportion of open-class multal quantifiers (see section 4), a rise in the frequency of these features in written texts would thus be a further indication of the colloquialization of some genres in late Modern English. Previous research (e.g. Pelli 1976; Arnaud 1998; Denison 1998; Hundt 2004; Smitterberg 2005) indicates that both features became more frequent during the course of the 19th century. However, the present study will examine their development in the same texts, and thus identify possible connections between the occurrence of these two linguistic features on the level of genre, or

Merja Kytö / Erik Smitterberg

220

even idiolect. The idiolectal analysis will also incorporate concurring data on passive verb phrases from Gustafsson (forthcoming). 5.2. Data The present investigation focuses on periods 1 and 3 in CONCE. However, because idiolectal cross-genre variation is relevant to our study, Charles Darwin’s Letters and Science texts from period 2 were also included in the counts; results for these samples will be considered separately. We study stylistic differences in the use of the constructions by considering three genres that differ with respect to formality and medium: Debates (formal, spoken), Letters (informal, written), and Science (formal, written). Progressives in the CONCE corpus were retrieved by searches for combinations of the verb BE and words ending in -ing; manual post-processing of the data excluded irrelevant instances (see Smitterberg 2005 for a detailed account of this procedure). For phrasal verbs, we based the search on lists of adverbial particles given in previous research, such as Claridge (2000), Pelli (1976), and Fraser (1976). A tagged version of CONCE was used to retrieve instances of these particles classified as certain or possible adverbs. In the output, we manually identified the adverbs that formed part of phrasal verbs. Both idiomatic combinations, such as put off meaning ‘postpone’, and non-idiomatic combinations, such as go back meaning ‘return’, were accepted as phrasal verbs. 5.3. Results Table 14 shows that progressives and phrasal verbs display similar patterns with regard to their frequency texts. Both features increase in frequency in Letters. Debates form a middle ground, with a huge increase in the frequency of the progressive and a modest change in that of phrasal verbs. In Science, progressives change marginally, and phrasal verbs not at all. Thus we have a case of simultaneous stability and change.6 6

Frequencies of the progressive and of phrasal verbs have not been tested for significance in the present study as they can only occur in verb phrases, which

19th-Century English: An Age of Stability or a Period of Change? Genre Feature Period 1800-1830 1870-1900 TOTAL

221

Debates Letters Science Prog. Phrasal v. Prog. Phrasal v. Prog. Phrasal v. Freq. NF Freq. NF Freq. NF Freq. NF Freq. NF Freq. NF 12 0.6 70 3.5 246 2.0 618 5.1 26 0.7 135 3.5 30 1.5 84 4.2 386 4.2 748 8.2 35 1.1 110 3.6 42 1.1 154 3.9 632 3.0 1366 6.4 61 0.9 245 3.6

Table 14. Progressives and phrasal verbs in CONCE (Debates, Letters, and Science) by genre and period: raw frequencies and normalized frequencies (=NF) per 1,000 words.

As mentioned above, previous research indicates that both features increase in frequency in late Modern English as a whole. The development in Letters may thus be part of a general tendency for features that are characteristic of spoken interaction to become more common in writing. However, Görlach (1999: 150) claims that scientific texts became more objective and impersonal across the 19th century, and such a genre development may counteract the tendency for progressives and phrasal verbs to increase in this genre: the apparent stability in Science may be due to these two forces cancelling each other out. As regards Debates, some of the differences between the periods may be due to the change in speech presentation from indirect to direct speech. The combination of speech-related language and a formal situational context may also help to explain the intermediate position of this genre in relation to Letters and Science. Developments such as those shown in Table 14 increase the linguistic differentiation between ‘oral’ and ‘literate’ genres, a Modern English trend that has previously been noted by Biber and Finegan (1997: 273). Moreover, in Biber’s (2003) factor analysis of spoken and written present-day academic English, both the progressive and phrasal verbs load as ‘oral’ features on the dimension ‘Oral vs. literate discourse’, the most powerful dimension in the analysis. These results further point to a connection between (i) an increase in the frequency of progressives and phrasal verbs and (ii) a

may make tests based on the frequency of a linguistic feature and the number of words in a text unreliable.

Merja Kytö / Erik Smitterberg

222

gradual colloquialization of informal written genres in late Modern English. The gender parameter is also relevant to the distribution of phrasal verbs and progressives; the results are given in Table 15. Gender Feature Period 1800-1830 1870-1900 TOTAL

Women Prog. Phrasal v. Freq. NF Freq. NF 158 2.3 399 5.8 245 4.9 456 9.1 403 3.4 855 7.2

Men Prog. Freq. NF 88 1.7 141 3.5 229 2.5

Phrasal v. Freq. N 219 4.2 293 7.2 512 5.5

Table 15. Progressives and phrasal verbs in CONCE (Letters) by gender and period: raw frequencies and normalized frequencies (= NF) per 1,000 words.

Again, phrasal verbs and progressives follow similar patterns. Both features become more frequent in both women’s and men’s letters, and women consistently use these features more than men do. According to Labov (2001: 292), when overt prescription is not involved, “[i]n linguistic change from below, women use higher frequencies of innovative forms than men do”. Against this background, the results imply that the progressive and phrasal verbs increased in frequency as a result of change from below, and that women were leaders in this linguistic change. However, the frequency increases are likely to have several causes, both linguistic and extralinguistic. As mentioned above, the increases may be indicative of a gradual colloquialization of some written English genres. Such a process has previously been suggested for the 20th century by, for instance, Hundt and Mair (1999) and Westin (2002); the results of the present study imply that the process may have been underway in the 19th century. Extralinguistic developments such as universal education, the gradual increase in literacy, and the enlargement of the franchise are likely to be relevant to such a process. As Beal (2004: 13), among others, has pointed out, the diffusion of many phonological and syntactic changes in the late Modern period “from spoken to written English and from ‘vulgar’ to ‘educated’ English, respectively” is dependent on factors such as education and social mobility. When new groups of speakers become

19th-Century English: An Age of Stability or a Period of Change?

223

literate and/or obtain political influence, their linguistic habits are also likely to become better represented in the standard language. Moreover, progressives and phrasal verbs are comparatively analytical features, in the broad sense that they use a larger number of free morphemes to express a given meaning than do their simplexverb counterparts. In addition, both features can express grammatical and lexical aspect, respectively. The changes attested here may thus be indicative of long-term trends in the English language that favour analytical constructions and constructions that make aspectual distinctions explicit (see e.g. Claridge 2000: 41). The difference across time between the Letters and Science genres discussed above is also apparent on the synchronic, idiolectal level. Both academic texts and private letters by Charles Darwin are included in CONCE, which enables a comparison between different types of text produced by the same author. Table 16 presents the results of this comparison; the table also includes results for passive verb phrases taken from Gustafsson (forthcoming). Feature Genre Letters Science TOTAL

Progressives Freq. NF 51 2.6 5 0.5 56 1.9

Phrasal verbs Freq. NF 120 6.2 20 1.9 140 4.7

Passives Freq. NF 137 7.1 168 15.7 305 10.2

Table 16. Progressives, phrasal verbs, and passives in Darwin’s CONCE texts (period 2; Letters and Science): raw frequencies and normalized frequencies (= NF) per 1,000 words; figures for passives from Gustafsson (forthcoming).

As Table 16 shows, phrasal verbs and progressives are both clearly more common in Letters than in Science, whereas passives are much more frequent in Darwin’s scientific text than they are in his private letters. As mentioned above, in Biber’s (2003: 55ff.) analysis of present-day academic genres, progressives and phrasal verbs emerge as ‘oral’ features; in contrast, many types of passives load as ‘literate’ features. The genre differences in Darwin’s use of ‘oral’ and ‘literate’ features illustrate the widening split between ‘oral’ and ‘literate’ language in 19th-century English.

224

Merja Kytö / Erik Smitterberg

6. Concluding discussion As mentioned in section 1, the 19th century has been characterized as a period of relative stability. However, our results show that there is no simple answer to the question of stability versus change in 19thcentury English. In all of the three case studies reported on in this investigation, evidence of both stability and change was found. The complexity of the issue of stability versus change is partly due to the fact that linguistic as well as extralinguistic parameters affect the results. For this reason, several levels of analysis must be considered. For instance, progressives and phrasal verbs displayed change in Letters but stability in Science; the extralinguistic feature of genre thus was found to be influential. As regards multal quantifiers, linguistic factors proved important: change could only be observed in assertive contexts, and the change was more clear in uncountable contexts than in countable contexts. Lexical bundles displayed stability across time while they also exhibited genre-specific variation, which shows that various extralinguistic parameters may affect the distribution of data in different ways. As regards methodology, the case studies have shown that the results of multi-feature/multi-dimensional analyses can form important interpretive tools when the communicative functions of other features, such as open-class quantifiers, are interpreted. Biber (1988) used previous research on individual linguistic features in order to select features for inclusion in his factor analysis; such previous research was also used to interpret the dimensions of variation underlying the co-occurrence patterns produced by the factor analysis, for example ‘Involved versus informational production’. We argue that macroscopic analyses such as those carried out by Biber (1988, 2003) and Geisler (2002, 2003) can be equally important in the interpretation of results obtained on the microscopic level, for individual linguistic features. In this way, several levels of analysis can complement each other and give a fuller picture of stability, variation, and change. For instance, open-class multal quantifiers were shown to broadly share their occurrence pattern in Letters with features characteristic of involved production. This similarity lends

19th-Century English: An Age of Stability or a Period of Change?

225

support to the hypothesis that open-class multal quantifiers are indeed indicative of informal and emotional texts. Also, the fact that progressives and phrasal verbs loaded as oral features in Biber’s analysis of present-day academic language helps to explain the similarity of their occurrence patterns in the data provided by CONCE. It cannot be claimed with certainty that the results of a factor analysis based on 20th-century English are valid for the language of the 19th century; nevertheless, such analyses are important interpretive tools in the study of language change. Moreover, the study has demonstrated the methodological importance of considering the sources of data. In the absence of largescale historical corpora on a par with the 100-million-word British National Corpus, one promising option is to combine several corpora, or sections drawn from them, in order to reach more reliable results. However, issues of corpus comparability must then be considered, as the validity of the study will decrease if the corpora used are not fully comparable. For instance, for a study of a linguistic feature such as lexical bundles, where keeping the genre parameter constant is of utmost importance, the fact that the 20th-century section of the ARCHER corpus includes comparatively few Letters texts precludes a valid comparison with CONCE on this parameter. Even when the genres in question are well represented in all of the corpora being compared, it is important to consider whether an unexpected difference between two corpora, such as that for open-class determiners in CONCE and ARCHER, might be due to factors such as sampling setup, sampling frames, or genre development. Finally, the results show that the language of a single author may exhibit considerable situational variation, as was the case with phrasal verbs and progressives in texts written by Darwin. The importance of the individual writer’s idiolect must not be underestimated as a potential source of bias. However, it can also represent an individual locus of linguistic variation and change. As Nevalainen and Raumolin-Brunberg (2003: 92-98) show, tracking the linguistic behaviour of individuals across time in the early Modern English period can reveal ongoing language change. Similarly, examining the language of a single author at the same point in time, but in different spheres of usage, can make apparent stylistic variation;

226

Merja Kytö / Erik Smitterberg

as Aitchison (2001: 40-42) points out, such stylistic variation may indicate language change in progress. However, regardless of whether individuals or groups of individuals are considered, it is important to bear in mind that, even in the 19th century, we chiefly have access to texts from the upper socioeconomic groups. Consequently, the results must be questioned carefully before the claim is made that they are valid for 19th-century English as a whole. There is a need for studies based on textual evidence – whether in manuscript or printed form – of the English that was spoken and written by representatives of the lower echelons of 19th-century society. In addition, our knowledge of 19th-century English chiefly derives from texts produced in London and, more generally, England, while many other regional varieties remain more or less unexplored within the empirical framework. In the light of the three case studies presented, stability, variation and change emerge as multifaceted notions that may apply across large-scale parameters such as genre and gender as well as within a single idiolect. Although they raise important questions about comparability and representativeness, the case studies nevertheless present valuable empirical evidence of 19th-century usage. Moreover, the genres considered have continued to be important in the 20th century, and belong to the set of texts that have been relevant to the formation of Present-day Standard English. It is hoped that the results of the case studies presented here will stimulate further research on other features and varieties of 19th-century English.

References Aitchison, Jean 32001. Language Change: Progress or Decay? Cambridge: Cambridge University Press. ARCHER = A Representative Corpus of Historical English Registers, compiled by Douglas Biber and Edward Finegan (see Biber, Douglas/Finegan, Edward/Atkinson, Dwight 1994).

19th-Century English: An Age of Stability or a Period of Change?

227

Arnaud, René 1998. The Development of the Progressive in 19thCentury English: A Quantitative Survey. Language Variation and Change 10, 123-152. Beal, Joan C. 2004. English in Modern Times: 1700-1945. London: Arnold. Behre, Frank 1967. Studies in Agatha Christie’s Writings: The Behaviour of A GOOD (GREAT) DEAL, A LOT LOTS, MUCH, PLENTY, MANY, A GOOD (GREAT) MANY. Stockholm/Gothenburg/Uppsala: Almqvist & Wiksell. Behre, Frank 1969. Variation and Change in the Distribution of Lot(s), Deal, Much, Many, etc. English Studies 50, 435-451. Biber, Douglas 1988. Variation across Speech and Writing. Cambridge: Cambridge University Press. Biber, Douglas 2003. Variation among University Spoken and Written Registers: A New Multi-dimensional Analysis. In Leistyna, Pepi/Meyer, Charles F. (eds) Corpus Analysis: Language Structure and Language Use. Amsterdam/New York: Rodopi, 47-70. Biber, Douglas/Conrad, Susan/Cortes, Viviana 2003. Lexical Bundles in Speech and Writing: An Initial Taxonomy. In Wilson, Andrew/Rayson, Paul/McEnery, Tony (eds) Corpus Linguistics by the Lune: A Festschrift for Geoffrey Leech. àódĨ Studies in Language 8. Frankfurt am Main: Peter Lang, 71-92. Biber, Douglas/Finegan Edward 1997. Diachronic Relations among Speech-based and Written Registers in English. In Nevalainen, Terttu/Kahlas-Tarkka, Leena (eds) To Explain the Present: Studies in the Changing English Language in Honour of Matti Rissanen. Mémoires de la Société Néophilologique de Helsinki 52. Helsinki: Société Néophilologique, 253-275. Biber, Douglas/Finegan, Edward/Atkinson, Dwight 1994. ARCHER and its Challenges: Compiling and Exploring a Representative Corpus of Historical English Registers. In Fries, Udo/Tottie, Gunnel/Schneider, Peter (eds) Creating and Using English Language Corpora: Papers from the Fourteenth International Conference on English Language Research on Computerized Corpora, Zürich 1993. Language and Computers: Studies in Practical Linguistics 13. Amsterdam/Atlanta, GA: Rodopi, 113.

228

Merja Kytö / Erik Smitterberg

Biber, Douglas/Johansson, Stig/Leech, Geoffrey/Conrad, Susan/ Finegan, Edward 1999. Longman Grammar of Spoken and Written English. Harlow: Pearson Education. Claridge, Claudia 2000. Multi-Word Verbs in Early Modern English: A Corpus-based Study. Language and Computers: Studies in Practical Linguistics 32. Amsterdam/Atlanta, GA: Rodopi. CONCE = A Corpus of Nineteenth-Century English, compiled by Kytö Merja and Rudanko Juhani (see Kytö, Merja/Rudanko, Juhani/Smitterberg, Erik 2000). Culpeper, Jonathan/Kytö Merja 2002. Lexical Bundles in Early Modern English: A Window into the Speech-Related Language of the Past. In Fanego, Teresa/Méndez-Naya, Belen/Seoane, Elena (eds) Sounds, Words, Texts and Change: Selected Papers from 11 ICEHL (Santiago de Compostela, 7-11 September 2000). Amsterdam/Philadelphia, PA: Benjamins, 45-63. Dekeyser, Xavier 1975. Number and Case Relations in 19th-Century British English: A Comparative Study of Grammar and Usage. Antwerpen/Amsterdam: Uitgeverij De Nederlandsche Boekhandel. Dekeyser, Xavier 1994. The Multal Quantifiers Much/Many and their Analogues: A Historical Lexico-Semantic Analysis. Leuvense Bijdragen: Leuven Contributions in Linguistics and Philology 83, 289-299. Denison, David 1998. Syntax. In Romaine, Suzanne (ed.) The Cambridge History of the English Language. Vol. IV: 17761997. Cambridge: Cambridge University Press, 92–329. Fairman, Tony Forthcoming. Words in English Record Office Documents of the Early 1800s. In Kytö, Merja/Rydén, Mats/Smitterberg, Erik (eds) Nineteenth-Century English: Stability and Change. Cambridge: Cambridge University Press. Fraser, Bruce 1976. The Verb-Particle Combination in English. New York/San Francisco/London: Academic Press. Geisler, Christer 2002. Investigating Register Variation in NineteenthCentury English: A Multi-Dimensional Comparison. In Reppen, Randi/Fitzmaurice, Susan M./Biber, Douglas (eds) Using Corpora to Explore Linguistic Variation. Studies in Corpus Linguistics 9. Amsterdam/Philadelphia: Benjamins, 249-271.

19th-Century English: An Age of Stability or a Period of Change?

229

Geisler, Christer 2003. Gender-based Variation in Nineteenth-Century English Letter Writing. In Leistyna, Pepi/Meyer, Charles F. (eds.) Corpus Analysis: Language Structure and Language Use. Amsterdam/New York: Rodopi, 87-106. Görlach, Manfred 1999. English in Nineteenth-Century England: An Introduction. Cambridge: Cambridge University Press. Gustafsson, Larisa Oldireva Forthcoming. The Passive in NineteenthCentury Scientific Writing. In: Kytö, Merja/Rydén, Mats/Smitterberg, Erik (eds) Nineteenth-century English: Stability and Change. Cambridge: Cambridge University Press. Hundt, Marianne 2004. The Passival and the Progressive Passive: A Case Study of Layering in the English Aspect and Voice Systems. In Lindquist, Hans/Mair, Christian (eds) Corpus Approaches to Grammaticalization in English. Amsterdam/ Philadelphia: Benjamins, 79-120. Hundt, Marianne/Mair, Christian 1999. ‘Agile’ and ‘Uptight’ Genres: The Corpus-based Approach to Language Change in Progress. International Journal of Corpus Linguistics 4, 221-242. Jespersen, Otto 1909-1949. A Modern English Grammar on Historical Principles. 7 Vols. Heidelberg: Carl Winter. Kytö, Merja/Rudanko, Juhani/Smitterberg, Erik 2000. Building a Bridge between the Present and the Past: A Corpus of 19thCentury English. ICAME Journal 24, 85-97. Labov, William 2001. Principles of Linguistic Change. Vol. II: Social Factors. Language in Society 29. Oxford, UK/Cambridge, USA: Blackwell. Nevalainen, Terttu/Raumolin-Brunberg, Helena 2003. Historical Sociolinguistics: Language Change in Tudor and Stuart England. London: Pearson Education. Pelli, Mario G. 1976. Verb-Particle Constructions in American English: A Study Based on American Plays from the End of the 18th Century to the Present. Swiss Studies in English 89. Bern: Francke. Poutsma, Hendrik 1914-1929. A Grammar of Late Modern English. Groningen: Noordhoff. Quirk, Randolph/Greenbaum, Sydney/Leech, Geoffrey/Svartvik, Jan 1985. A Comprehensive Grammar of the English Language. London/New York: Longman.

230

Merja Kytö / Erik Smitterberg

Raumolin-Brunberg, Helena 1988. Variation and Historical Linguistics: A Survey of Methods and Concepts. Neuphilologische Mitteilungen 89, 136-154. Rydén, Mats 1979. An Introduction to the Historical Study of English Syntax. Stockholm Studies in English 51. Stockholm: Almqvist & Wiksell. Smitterberg, Erik 2003. Multal Quantifiers in 19th-Century English. Paper Presented at the 24th ICAME Conference, Guernsey, 2327 April, 2003. Smitterberg, Erik 2005. The Progressive in 19th-Century English: A Process of Integration. Language and Computers: Studies in Practical Linguistics 54. Amsterdam/New York: Rodopi. Strang, Barbara M. H. 1970. A History of English. London: Methuen & Co. Westin, Ingrid 2002. Language Change in English Newspaper Editorials. Language and Computers: Studies in Practical Linguistics 44. Amsterdam/New York: Rodopi.

CLEMENS FRITZ

The Conventions’ Spelling Conventions: Regional Variation in 19th-Century Australian Spelling

1. Introduction The uniformity of Australian English is well established. A.G. Mitchell noted, in 1960, the possibility of ‘pockets of distinctive usage’, but the MitchellDelbridge survey of Australian speech confirmed the absence of any clearly defined regional difference. (Fielding/Ramson 1971: 165)

Received wisdom has it that Australian English (henceforth AusE) is a remarkably homogeneous variety spanning a whole continent without regional influences. This is so remarkable that to this day several theories are competing to explain this fascinating fact. Some claim that proto-AusE had already originated in England before transportation, others that the outcome of a Sydney ‘mixing bowl’ spread all over the country and a minority even claims that every port produced the same mix from the same ingredients. All point to the overwhelming mobility of 19th-century Australian society as the prime cause for the uniformity. Trudgill (1986: 145) emphasizes that “the extreme uniformity of Australian English […] appears to be quite typical of the initial stages of mixed colonial varieties […], with degree of uniformity being in inverse proportion to historical depth.” Since AusE is often believed to exist only in accent, and of course in some swearing slang and other abominations, often sweeping claims about the monolithic nature of AusE are made. One prominent example comes from the eminent linguists Peters and Delbridge (1989):

232

Clemens Fritz Whether one looks at variation in accent […], in local lexicon […], or even the patterns of swearing at an ordinary football match […], there is little evidence of independent language development. There are perhaps historical reasons for this, though the factors invoked […] have largely been eclipsed in the twentieth century […]. How is it that Australian English, both as spoken and written, remains relatively uniform or varies within the same parameters, from Perth to Cooktown? What regulatory forces are Australians responding to in their use of language that they seem to resist in other areas of public conduct? (Peters/Delbridge 1989: 127)

That AusE is remarkably uniform is not questioned. But such a claim should not preclude linguistic research into possibly interesting regional variation. Therefore Bernard (1989: 255) rightly cautions: But the smoothest surface reveals its crevices when the power of the microscope is increased and this statement is not to be taken to mean that there is no regional variation at all. There is, and further, there is every possibility that it is increasing, as the years of white settlement in particular places grow more and more numerous and as the influence of the recent and substantial but regionally non-uniform, non-British migration begins to be felt.

Bryant (1989: 303) explains why regionalisms are so hard to identify in AusE: Demographic and geographical conditions in Australia hinder the recognition of regional usage areas. Given the uneven distribution of the population, the geographical vastness of the country, and the large size of the usage regions […], it is possible to travel hundreds of kilometres, even in the more closely settled parts of the country, without encountering any regional changes in the lexicon. This of course makes the language seem very uniform when speakers are surrounded on all sides by people whose lexicon is the same as their own.

The usage regions defined by Bryant are (1) Western Australia, (2) Southeast South Australia, (3) Victorian language usage area, (4) New South Wales and Queensland, (5) Australian Capital Territory and (6) Northern Territory (Bryant 1989: 311-313). Some efforts have been taken to find regionalisms, mostly in the lexicon (cf. Brooks/Ritchie 1994, Bryant 1985, 1989, 1997; Flint 1965; Jauncey 2004, Ramson 1988), but also in phonology (cf. Bradley 1989, 1991; Horvath/Horvath 2001) and morphosyntax (cf. Taylor 2000). Another area where regionalisms are suspected, but as

Regional Variation in 19th-Century Australian Spelling

233

yet unproven, is spelling. Leitner (1984: 56), Peters (1995: 546f) and Görlach (1991: 158) all hint at the possibility of regional variation but without backing up their observations with corpora or valid statistical analyses. This study attempts to investigate the crevices in the apparently smooth surface that is AusE. It does so by looking at possible regional spelling standards in 19th-century Australia, something never attempted so far. The investigation is empirical, using millions of words as a database.

2. The data 2.1. COOEE The Corpus of Oz Early English (COOEE) is a two million word corpus of early English in Australia (1788-1900). It was collected and edited by the author in order to form the basis of a doctoral thesis. The corpus is divided into four time periods (ca. 500,000 words in each period: 1788-1825, 1826-50, 1851-75, 1876-1900), which correspond to major sociolinguistic boundaries in Australian history. Sociological information about the authors (e.g. gender, region/country of origin, social status, year of arrival, etc.) and about the texts (year and place of writing, register, and text types) were collected and stored in a database. The corpus is described in greater detail in Fritz (2004). For the present study the regional distribution of the texts is most relevant and is therefore given here. All of the states of Australia are represented in the places of writing. Naturally, New South Wales takes the lead, followed by Victoria, South Australia, Western Australia and Van Diemen’s Land (today’s Tasmania). For a text to be assigned to a state, today’s political borders were used, even if this state was historically not in existence at that time. Otherwise the regional distribution would have been skewed by historical names, e.g. if a text written at Port Phillip would be counted as coming from New South Wales. Texts written in Great Britain, at Sea or in other

234

Clemens Fritz

places outside Australia were included in the corpus if their author was a native Australian or had lived there for a considerable time.

Figure 1. Place of Writing.

2.2. The Federation Debates 1890-1898 The late 19th century has often been described as Australia’s nationalist period. In the 1890s the movement towards a union of the Australian colonies gained ever more momentum and culminated in the proclamation of the Commonwealth of Australia in 1901. Five conventions (Melbourne 1890, Sydney 1891, Adelaide 1897, Sydney 1897, Melbourne 1898) were necessary in order to draft a constitution acceptable to the individual delegates and their colonies. Queensland and New Zealand took only part in the first two conventions, the Northern Territory, still a part of South Australia at that time, was not accorded any delegates. Queensland later joined the Commonwealth of Australia on the basis of a popular vote. The conventions were held in the three most important cities of the day, Melbourne, Sydney and Adelaide, each representing a usage region as defined by Bryant (1989). Historically important conferences were also held in Corowa in 1893 and in Bathurst in 1896. These were not included in the investigation. The debates offer a wealth of historical and linguistic information and therefore we have to be grateful to the SETIS project (http://setis.library.usyd.edu.au/), set up by the University of Sydney

Regional Variation in 19th-Century Australian Spelling

235

Library, since it published the entire debates and many a document pertaining thereto in electronic form, and made them thus accessible. PLACE

YEAR

Melbourne

1890

Sydney

1891

Adelaide

1897

Sydney

1897

Melbourne

1898

TITLE OF CONFERENCE Australasian Federation Conference National Australasian Convention National Australasian Convention (1st Session) National Australasian Convention (2nd Session) National Australasian Convention (3rd Session)

WORDS

DELEGATES

100,284

14

638,952

48

840,913

50

738,760

50

1,732,810

50

Table 1. Data about the conventions.

The official Victorian website dedicated to the centenary celebration of the Australian Commonwealth (http://home.vicnet.net.au/~centfed/) provides twenty-three biographies of the conventions’ delegates. Of course this is only a selection, but the sociological data can be assumed to be comparable with those of the other delegates. Ten of the twenty-three delegates covered were Australian-born, five came from England, four from Scotland, two from Ireland, and one each from Wales and Portugal. All of the immigrants had already lived in Australia for decades and some had come when very young, e.g. Quick at the age of two. The English-born Braddon was the oldest (*1829) and the native Australian Peacock (*1861) the youngest. It is worthwhile noting that an examination of two Australians and two Englishmen did not show the two pairs using a different lexis. Only two Australianisms were found, one used by an Australian, the other by an Englishman. The delegates were linguistically united by their background and their purpose rather than divided by their different origins. The proceedings were taken down by the Hansard scribes of the respective state parliaments. Each colonial parliament had taken on and trained its own staff. In the case of Victoria, the parliament had employed a Hansard team of its own since 1866. There are many references to Hansard in the proceedings. For example, Mr Deakin

236

Clemens Fritz

comments on Hansard during the Sydney 1891 debates on March 18th. He is praising the accuracy of the Sydney Hansard team, only deploring that they do not record all the private conversations going on between the delegates: The task on which we have been engaged for the last six weeks has been onerous and arduous to an almost unparalleled degree. Critics who look to the record of our debates, admirably rendered as they have been by the Hansard staff of this colony, will not derive even from that excellent statement a full view of all the circumstances which have been operating upon the minds of hon. members. There is much unstated in that record, because the delegates to this Convention have practically lived together for six weeks in private as well as in public intercourse, and from the natural action and reaction of mind upon mind have been gradually shaping their thoughts upon this great question. (Sydney, 18.03.1891)

Mr Reid seems afraid of the stress put on the delegates and the press due to the verbatim quality of Hansard in Adelaide: The strain on members of this Convention […] will be sufficient to tax the resources of any ordinary individual […]. I also cannot forget the labors which the press will have upon them, and ‘Hansard’, in taking a verbatim account of a sitting which is to last a very considerable time each day. (Adelaide, 22.03.1897)

The website of the Department of Parliamentary Reporting Staff (http: //www.aph.gov.au/dprs/history.htm) states that the Hansard protocols of the first parliament in 1901 were taken down by a team of “nine gentlemen staff”. Something like that number should also have been present during the previous federation debates. The participants could complain if their contributions were recorded wrongly. This is remarked upon by the Hon. Sir Joseph Abbott: I should like to say with regard to the official report of speeches that the practice in this colony has always been to issue a proof copy of the Debates to members. If anything wrong is found in this proof copy the member complaining has the right to go to the Principal Shorthand-writer and have it rectified. With our Hansard staff we have never adopted the practice of submitting proofs of speeches to members for the purpose of correction. If any inaccuracy is found in the proof copy of the Debates now issued the member complaining can have it rectified by directing the attention of the Principal Shorthandwriter to the matter. (Sydney, 10.09.1897)

Regional Variation in 19th-Century Australian Spelling

237

The discussions were lively and the language used in the debates quickly changes from informal to formal and vice versa. Most of this is faithfully recorded in Hansard, although it is very likely that some minor changes in grammar were ‘corrected’, thereby making the accounts less reliable linguistically. Interruptions were frequent, sometimes even in the middle of a word, and long, prepared speeches were hardly ever produced. Again we have the Hon. Sir Abbott who complains about the lack of discipline in the convention: Mr. SYMON. No; it was published after the last debate on the subject. Sir JOSEPH ABBOTT. Very well; I will get over the difficulty. Mr. SYMON. I want to hear it. Sir JOSEPH ABBOTT. I can quote from the Hansard debates. I do not speak so very often that I should be invariably met with interruptions. The Chairman, of course, has a perfect right to call my attention to any irregularities, and I am not above learning, but I object to these continual interruptions by honorable members who have occupied a very large amount of time in this Convention, and who are always exhorting other honorable members not to waste time. (Melbourne, 31.01.1898)

The following exchange shows that even the president’s words were sometimes cut short: Mr. REID: I regret to Say that I have not had sufficient leisure to study the rules of the South Australian Parliament yet. May I ask if there is a time limit. The PRESIDENT: There is no time limit, but there is a powe >sic@ Mr. HIGGINS: I think we ought to have a copy of the Standing Orders before we adopt them in ignorance of what arrangements are made in South Australia. (Adelaide, 23.03.1897)

Even stylistically inadequate language was recorded and not ‘emended’. An example is the colloquialism used by Mr. Barton: Mr. BARTON: They are proposals which should never be in one Bill together. […] One of them – the income tax – comes from the earnings or profits of the people, or of that portion of the people who, I was almost guilty of saying, are to ‘hump the swag’ – at any rate they are to bear the burden. (Adelaide, 23.03.1897)

One area of language the Hansard staff did, however, control entirely was that of spelling and punctuation. And this power they used

Clemens Fritz

238

extensively. The study will show that each local Hansard staff had its own conventions, probably the same as the ones used for the colonial parliaments, and that these were applied rigorously.

3. Spelling conventions In some instances Australian usage aligns itself with the norms of American English […]. (Romaine 1998: 30)

Spelling is an area of language where English differs world-wide. Since spelling variants often come in pairs, they are frequently associated with differences between AmE and BrE. All other varieties are then judged, in a rather Manichean, black and white only, perception of the world, as following either one or the other of the major varieties. In this vein spellings like honor, center and apologize are perceived as ‘Americanisms’ in AusE. But this is wrong. Proper orthography came to be regarded as an indication of refinement rather late, a legacy of the 18th century where logic was considered the liberating panacea. The breakthrough came with Samuel Johnson’s Dictionary of 1755. Since then the British have followed an established set of norms even in their private writings. Only the uneducated still differed and were more and more scorned for this. Scragg (1974: 90) provides a famous example in a letter by Lord Chesterfield to his son in 1750: I come now to another part of your letter, which is the orthography, if I may call bad spelling orthography. You spell induce, enduce; and grandeur, you spell grandure; two faults of which few of my house-maids would have been guilty. I must tell you, that orthography, in the true sense of the word, is so absolutely necessary for a man of letters, or a gentleman, that one false spelling may fix a ridicule upon him for the rest of his life; and I know a man of quality, who never recovered the ridicule of having spelled wholesome without the w.

Only a small number of words have retained variable spellings and these were codified in BrE, in AmE by Noah Webster, and in AusE to

Regional Variation in 19th-Century Australian Spelling

239

greater or lesser extent. The following table shows some of the variables in question. VARIABLE ae reduction DG(E) EINquire EINquiry EINsure ENSCE nouns ER/RE grAEy homophone mergers ISZE JAIL L-doubling LOG(UE) LYSZ O(U)L O(U)R OE reduction practiSCe (verb) prograM(ME) S-doubling diSCK SCKeptic sulPHFur whisk(E)y

AME VARIANT E (e.g. anemia) -DG- (e.g. judgment) inquire inquiry insure -ENSE(e.g. defense) -ER (e.g. center) gray check, curb, draft, story -IZE (e.g. criticize) jail -L- (e.g. traveler) -LOG (e.g. dialog) -LYZ (e.g. analyze) -OL- (e.g. mold) -OR (e.g. color) E (e.g. fetus) practice program -S- (e.g. focused) disk skeptic sulfur whiskey

BRE VARIANT AE (e.g. anaemia) -DGE- (e.g. judgement) enquire enquiry ensure -ENCE(e.g. defence) -RE (e.g. centre) grey cheque, kerb, draught, storey -ISE (e.g. criticise) gaol -LL- (e.g. traveller) -LOGUE (e.g. dialogue) -LYS (e.g. analyse) -OUL- (e.g. mould) -OUR (e.g. colour) OE (e.g. foetus) practise programme -SS- (e.g. focussed) disc sceptic sulphur whisky

Table 2. Spelling variables distinguishing AmE from BrE (adapted from Sigley 1999: 7)

Not all of these variables are fully opposed standards. Some are standardized in Britain, but variable in the US, e.g. whiskey, ae/oe digraph retention and disCK. Others are standardized in the US, but variable in Britain, like -ise/-ize, jail and practiSCe (verb). Of these the variables -re/-er, -our/-or and -ise/-ize are investigated in this study in order to establish whether there were indeed different spelling standards in the Australian colonies in the late 19th century.

Clemens Fritz

240

4. The conventions’ conventions 4.1. -re vs.-er The first variable to look at is -re/-er. It is attached to a very limited number of words. Almost all of them are of Latin origin, but most took a detour via French, -re obviously following the French spelling of the word. The suffix is mostly attached to words whose original form contained the letter sequence r + vowel, e.g. centrum, fibra, lustrum, mitra, sepulchrum, etc. In Early Modern English, the spelling -er, which is in line with a very common letter sequence denoting the schwa sound, predominates. But this changed when the enlightened scholars of the 18th century took issue with this regularized spelling and changed it into -re where they believed this was true to the word’s ancestor. Contrary to today’s relatively clear-cut positions, the 19thcentury varieties of English were still more open to variability. In an edition of Johnson’s Dictionary from 1836 calibre and meagre are given -er suffixes. Webster’s edition of 1828 has three words where it allows both spellings, -er and -re, viz. sabre, sombre and theatre. Australia was different in that it had standardized -re/-er to -re from very early on. In COOEE there are 256 instances of words where variation is possible. Only three of these were written with -er. The first example comes from the 1794 speech by the Reverend Richard Johnson, the second from the 1822 ship diary of Lachlan Macquarie on his way to England after thirteen years in Australia. The third was not counted because it was found in the Federation Debates, some parts of which are also included in COOEE. Sin is such a horrid evil, that unless it is forgiven, and blotted out, by the blood of Jesus, it will sink your souls lower than the center of the earth, even into the very depths of hell, never, never, never more to rise. (1794) […] being now about Ninety Miles to the Eastward of the Center of the group of Falkland Islands. (1822)

The Hansard staff of all three state parliaments, Melbourne, Sydney and Adelaide, agree that -re should be used consistently. This can be

Regional Variation in 19th-Century Australian Spelling

241

deduced from the fact that out of 128 possible slots, only one is taken up by -er. Again the word is center. […] so that after all, the whole process that is proposed has nothing unEnglish about it, because it is an attempt to center a full measure of representation, instead of taking any of it away. ( Sydney, 17.03.1891)

This is certainly a slip of the pen, and of the watchful eyes of the proof-readers. The fact that 127 instances are uniform shows that all the members of the different staffs were looking out for -re/-er in order to spell it consistently. However, it is debatable whether this was influenced by prescribed spelling conventions or whether the Hansard staff simply spelled like most people in Australia at that time.

4.2. -our vs.-or The question of French-derived -our vs. Latin-derived -or was hotly debated in the 18th century. This resulted in thoroughly mixed spellings. And not always was there scholarly agreement on a word’s history. Therefore honor could be seen next to honour in a single text. Even three Old English words were erroneously assigned -our, namely harbour, behaviour and neighbour. In the US the move towards -or was greatly furthered by Webster’s publications. In Britain, on the other hand, this trend was arrested by the fleet of reprints of Dr Johnson’s Dictionary. During the early formative years of AusE there was no accepted American or British standard and -our/-or was certainly not considered to be distinctive for either of these. The Melbourne Age decided as early as 1854 that -or is ‘better’ and that it therefore should be used in all articles, a policy it has not changed since then! In a 19th-century Australian context -our/-or is much more rewarding than -re/-er. COOEE has a total of 2579 instances where variation is possible. 443 of these are spelled -or and 2136 -our, so -our comes up in 83 per cent of all possible instances. Fritz (forthcoming) has shown that the choice was influenced by the origin (colonial-borns used -or much more frequently), education (the higher

Clemens Fritz

242

the more -or) and gender (-our was more preferred by women) of the author and the register of the text (speech-based texts and Government documents favoured -or). As regards the use in different periods, it can be said that -or continued its minority existence throughout. This is all the more surprising since -our/-or, unlike -re/-er, had become fully opposed standards already in 19th-century British and American English. Comparing the data from COOEE with the spelling found on Australian websites, a remarkable congruence can be established. The former has -our in 83 per cent of all cases, the latter in 80 per cent. VARIABLE ADELAIDE MELBOURNE MELBOURNE SYDNEY SYDNEY 1897 1890 1898 1891 1897

-our -or % -our

14 662 2.1

103 0 100

953 39 96.1

359 63 85.1

538 49 91.7

Table 3. -our/-or in the Federation Debates.

The degree of difference between the Hansard reports of the three Hansard staffs is astonishing. Victoria seems to have the strictest spelling policy, but the other parliaments also make definite choices. Melbourne 1890 is uniform, but the total number of instances is comparatively low. The same Hansard staff produced 39 instances of -or eight years later, which seems, at first, surprising, despite the fact that this still only constitutes 3.9 per cent. A look at individual words helps to solve the riddle. Thirty-six instances come from harbor, which is only once spelled harbour. It is clear that harbor was singled out as an exception by the shorthand writers. If harbor is not counted, the percentage of -our rises to 99.7 in the Melbourne 1898 proceedings. Then the only exceptions left are candor and honor (2). There are several explanations which can account for these. Possible factors are: x mistakes in proof-reading by the principal; x preferences of individual members of the staff who were flouting the rules; x editing mistakes by the SETIS staff.

Regional Variation in 19th-Century Australian Spelling

243

The preference for -our is very high in colonial Victoria ranging from 85-88 per cent, despite the newspaper The Age which had implemented an -or policy. This margin of doubt is further diminished by the Melbourne Parliament staff which achieve almost 100 per cent in 1890 and in 1898. A colonial trend can thus be said to have been picked up and ‘perfected’. Interestingly, harbor, the only word spelled consistently -or in the debates, is never found in other Victorian writings with that variant. The Sydney staff, on the other hand, appears rather lenient in comparison, accepting a much greater degree of variability. Again pure numbers are misleading. Only a single word is responsible, namely honor. It accounts for 62 instances in 1891 and for 49 in 1897. It is never spelled with -our. Apart from that there is only a single example of -or in any other of the 898 slots where -or is possible. This is favor in 1891. This one instance is balanced by 166 cases where it is spelled with -our in 1891. Leaving out honor, the percentage of -our is 99.7 and 100 per cent in Sydney. This speaks for a very strict policy here together with a consistently applied exception to that rule. Results from COOEE show that the decline of -or in New South Wales, which had almost become extinct between 1850 and 1875, was not only arrested after 1876. It had even become stronger than ever, making up 26% of all instances. This is not reflected in the Sydney debates from 1891 and 1897, which shows a growing marginalization of -or from 14 to 8 per cent. In fact, the only word spelled -or in Hansard reports, honor, is a clear minority option in New South Wales as a whole at that time. The parliament of South Australia had implemented the opposite policy, strictly enforcing -or, although in some rare cases, -our was retained. Contrary to the Melbourne and Sydney data, this is not due to exceptions for specific words. Indeed many different words are found with -our, viz. ardour (1), behaviour (2), endeavour (2), favour (3), honour (2), labour (1), neighbour (2) and valour (1). Only for the last one an exception can be surmised since valour never appears in the Adelaide debates. Sessions took place on twenty-five days, but -our occurs only in the proceedings of four of them, March 25th (2), 30th (3) and 31st (6) and April 12th (3). All of these were days during the week which meant that the proof-reading and the typing had to take place between

244

Clemens Fritz

the closing of the session and the following morning. All four of them closed late, the one on March 30th as late as 22:31, but not exceptionally so. It is possible that this contributed to slacker proofreading or more careless typing. Whatever the reason, the 97.9 per cent of -or surely compare favourably with most of today’s Hansard results, though the Melbourne and the Sydney staff were better at this. Comparing the convention data with the South Australian texts in COOEE a great difference can be established. COOEE shows an overwhelming majority of 103:4 of -our whereas the Adelaide Hansard has a ratio of 14:662. This indicates a carefully protected policy.

4.3. -ise vs.-ize This variant is standardized only in AmE but variable in BrE and in AusE. The Australian Government Style Manual favours -ise since the 1970s and it is standard in Australia’s press today. There is a regional difference in that the education departments in Victoria and South Australia in 1987 prescribe -ise consistently, whereas New South Wales and Queensland also have -ize (Peters 1995: 406f). With respect to etymology, -ize in verbs is derived from Greek -izein, Latin -izare, but -ise comes up in words borrowed from French -iser. Again BrE chose the French over the Latin variant, with the notable exception of Cambridge and Oxford University Press, both of which prefer -ize. On the other hand, the Australian branch of Oxford University Press discontinued use of -ize after 1991 (Peters 1995: 407). The -ise/-ize variation in COOEE is even more pronounced than the -our/-or one. There is a total of 761 cases, 427 of which have -ise (56.1%). The date of the text is the most important factor determining use of -ise/-ize in COOEE, with -ise on a steady rise from 36.9 in the first period (1788-1825) to 74.4 in the last (1876-1900). The register of a text (-ise in speech-based texts and -ize in government documents) and the origin (Australians and Irish were more prone to use -ise, the British slightly favoured -ize) and status (the highest classes used -ise in only 48.2 per cent, the lowest in 90.9 per cent) of an author were also influential factors. Throughout the 19th century

Regional Variation in 19th-Century Australian Spelling

245

-ise/-ize variation was considerable in all varieties of English and only today, AmE has reached an almost consistent use of one variable, -ize. VARIABLE ADELAIDE MELBOURNE MELBOURNE 1897 1890 1898

-ise -ize % -ise

239 8 96.8

4 52 7.1

4 544 0.7

SYDNEY SYDNEY 1891 1897

179 1 99.4

164 1 99.4

Table 4. -ise/-ize in the Federation Debates.

Again the conventions’ conventions can be shown to differ regionally with the Sydney staff achieving an almost impeccable consistency of 99.4 per cent during both conventions. One can almost picture the stern face of the ‘Principal Shorthand-writer’ moodily pondering the sole example of -ize that he had not discovered before publication. The fact that it happened in 1891 (naturalize) and in 1897 (authorize) may have made him a very unhappy man. We can take solace in the thought that it is possible that these ‘blatant errors’ could also have been committed by the SETIS staff. The New South Wales part of COOEE shows that -ise was growing in the course of the 19th century from the minority (34.3%) to the majority (59.8%) variant. This trend is taken up by the Sydney Hansard staff. Probably uncomfortable with the variability, they eliminated all except one instance of -ize in the 1891 and 1897 records. The fact that they chose -ise and not -ize could be a reflection of the dominant spelling in this colony. This preference of -ise is not matched by the present-day policy of the New South Wales Department of Education. The South Australian staff has a more human looking record of 96.8 per cent. The eight instances come from seven days and involve five lexical items: civilize (3), harmonize (1), naturalize (1), organize (2) and scrutinize (1). This pattern looks more accidental than deliberate. There seems no policy behind it, only human fallibility. In the South Australian COOEE texts, -ize is the majority choice, although it is declining in the course of the century. In this case the parliament in Adelaide was in accordance with its surroundings, only being much more consistent in its choices. Unlike

246

Clemens Fritz

in New South Wales, -ise is still recommended by the South Australian Department of Education. This time the Hansard staff of Victoria is the one who follows a policy opposite to the other two and -ize is the spelling of choice. It is important to note that the level of conformity grows significantly between 1890 and 1898. In the first convention the words in question were authorise (1), civilise, (1) and organise (2). All of these are kept in check by more frequent occurrences of the same word spelled -ize, so a lexicalized exception cannot have been a contributing factor. In 1898 there is civilize (1), again, and advertise (3). Since there is no instance of advertize in either 1891 or 1898 it is likely that this word was not included in the -ise/-ize policy of the Melbourne Hansard staff. Perhaps the Victorian Parliament had employed a new principal shorthand-writer in 1898 who continued the spelling policies of his predecessor, but enforced them more strictly, in other words he was more pedantic. If advertise is left out, the record of 99.8 per cent -ize is truly impressive. Here the Victorian Hansard is totally at odds with local traditions. Whereas COOEE shows an overall majority of -ise for Victoria, the 1891 and 1898 conventions’ records strongly favour -ize. These local traditions seem to have prevailed if the choices of the Victorian Department of Education are taken as a guide. A word not considered in this context was recognise. The spelling recognize can be found quite frequently in COOEE (63), but mostly in the earlier periods. It is therefore not surprising that recognize was only found in the Melbourne 1898 proceedings, but even there the four instances are overwhelmed by 336 counterexamples.

4.4. Other spelling variables Most other areas of spelling variability investigated proved to be unrewarding in the sense that regional differences could not be established. All the parliaments’ reports unanimously, and almost consistently, used -dgment and not -dgement in words like acknowledgment, judgment and lodgment. This is in line with findings from COOEE. However, the level of consistency was much higher,

Regional Variation in 19th-Century Australian Spelling

247

though a statistical significance could only be established for the Sydney 1897 proceedings. If this is a result of a prescribed spelling or simply the continuation of everyday spelling habits in Hansard cannot be judged on this basis. Two other words are of some interest in this context, namely endorse/indorse and ensure/insure. In COOEE indorse occurs almost exclusively in the first period. In Sydney (1891 = 9:1, 1897 = 19:3) indorse outnumbers endorse, defying the common trend discernible in COOEE. In this case parliamentary conventions openly contradicted the spelling choices made by most people in New South Wales in the late nineteen hundreds. Unlike the Sydney conventions, Melbourne used endorse exclusively in 1890, whereas it was the minority option in 1898 (43:8). The fact that Melbourne changes from endorse only to a great majority of indorse suggests a conscious editorial decision. Numbers are, however, low. In Adelaide the endorse:indorse ratio is 20:8 making it the only convention which is in line with the preferred spelling in its colony. A similar picture emerges when looking at ensure/insure. All instances where insure carried a financial meaning were excluded so that the difference in the vowel grapheme did not correspond to a difference in meaning. This ensured that the semantics could not skew the data. VARIABLE ADELAIDE MELBOURNE MELBOURNE SYDNEY 1897 1890 1898 1891

Ensure Insure % ensure

29 3 90.6

3 0 100

0 65 0

16 2 88.9

SYDNEY 1897

24 16 60

Table 5. Ensure/insure in the Federation Debates.

Ensure is the favoured spelling in COOEE and in the Federation Debates. The Sydney 1897 records are remarkable in that they allow for a much greater variability than all the others. They are also much more variable than the 1891 data, the change being statistically significant. Change is going on inside the Sydney Hansard staff, but the low levels of consistency suggest a change from below rather than one from above. It is noteworthy that the direction of the change is away from local spelling customs! Adelaide shows a strong preference for ensure. However, the percentage is much lower than for -re/-er,

Clemens Fritz

248

-our/-or and -ise/-ize. This could mean that the South Australian Parliament did not care about the spelling of ensure/insure but left it to the individual Hansard reporter. Two of the three instances of insure come from a single day which further supports this hypothesis. The Victorian data again show a complete shift. In 1890 there is only ensure and in 1898 only insure. This strongly hints at a conscious change of policy. The difference is highly significant at a level of confidence of less than 0.001. It also suggests that the Melbourne Hansard staff indeed had a policy on ensure/insure, whereas the Adelaide and the Sydney ones did not. As in Sydney, the shift from ensure to insure is a shift away from local traditions. All in all, a definite trend towards use of i rather than e is discernible in all three parliaments, which is always at odds with local spelling preferences.

4.5. Summary The regional variation of the choices made by the three Hansard staffs are best captured by tables and figures which are supplied and discussed here. VARIABLE

-re/-er -our/-or -ise/-ize endorse/ indorse ensure/ insure

ADELAIDE MELBOURNE MELBOURNE SYDNEY SYDNEY 1897 1890 1898 1891 1897

-re (100%) -or (97.9%) -ise (96.8%) endorse (71.4%) ensure (90.6%)

-re (100%) -our (100%) -ize (92.9%) endorse (100%) ensure (100%)

-re (100%) -our (96.1%) -ize (99.3%) indorse (84.3%) insure (100%)

-re (95.7%) -our (85.1%) -ise (99.4%) indorse (90%) ensure (88.9%)

-re (100%) -our (91.7%) -ise (99.4%) indorse (86.3%) ensure (60%)

Table 6. The conventions’ conventions.

The very high levels of consistency achieved in texts containing millions of words written down in shorthand and proof-read and published in haste can only be called admirable. Given this consistency, the number of 90% can be taken as an arbitrary measure

Regional Variation in 19th-Century Australian Spelling

249

for a presumed spelling policy. Some of the percentages would be even higher if items which evidently were considered exceptions are not included (e.g. honor in New South Wales and advertise in Victoria). This means that most of the areas investigated indeed had an established spelling policy, but some were open to individual choices. The Victorian parliament is the only one with visible changes in policies, as well as a significant rise in the percentage of -ize. This change could be due to a new editor-in-chief, but this is, of course, only a conjecture.

Figure 2: The conventions’ conventions.

250

Clemens Fritz

5. Conclusions This study has found evidence of regional spelling standards never before investigated. The monolithic character of AusE has, once again, been successfully challenged. Indeed, it was shown that each colonial parliament had spelling policies and that these were strictly followed. The level of consistency achieved is truly astounding and surely would do honour to any parliament’s staff today. Surprisingly, the editorial decisions taken differed from state to state, e.g. with -our/-or and -ise/-ize, and are sometimes the opposite of the majority choices within that colony, e.g. -ise in Victoria and -or in South Australia. Although there is no direct proof, it is very likely that there were style manuals for the Hansard staffs of the three colonial parliaments. These were used to achieve consistency in spelling and punctuation in the reports. However, not all parliaments chose to regulate all matters of divided usage. Some allowed for variability, e.g. ensure/insure in South Australia and New South Wales, leaving the ultimate choice to the individual scribe. It would be intriguing to unearth these manuals which divided AusE at the point when the Australian colonies had decided to form an inseparable union.

References Bernard, John R.L. 1989. Regional Variation in Australian English: A Survey. In Collins, Peter/Blair, David (eds) Australian English: The Language of a New Society. St. Lucia: University of Queensland Press, 255-259. Bradley, David 1989. Regional Dialects in Australian English Phonology. In Collins, Peter/Blair, David (eds) Australian English: The Language of a New Society. St. Lucia: University of Queensland Press), 261-270.

Regional Variation in 19th-Century Australian Spelling

251

Bradley, David 1991. /Ae/ and /a:/ in Australian English. In Cheshire, Jenny (ed.) English around the World. Cambridge: Cambridge University Press, 227-34. Brooks, Maureen/Ritchie, Joan 1994. Words from the West. Melbourne: Oxford University Press. Bryant, Pauline 1985. Regional Variation in the Australian English Lexicon. Australian Journal of Linguistics 5, 55-66. Bryant, Pauline 1989. The South-east Lexical Usage Region of Australian English. Australian Journal of Linguistics 9/1, 85134. Bryant, Pauline 1997. A Dialect Survey of the Lexicon of Australian English. English World Wide 18/2, 211-241. Collins, Peter/Blair, David (eds) 1989. Australian English: The Language of a New Society. St. Lucia: University of Queensland Press. Fielding, Jean/Ramson, William S. 1971. The English of Australia’s ‘Little Cornwall’. AUMLA 36, 165-173. Flint, Elwyn Henry 1965. The Question of Language, Dialect, Idiolect and Style in Queensland English. Linguistic Circle of Canberra Publications, Bulletin 2, 1-21. Fritz, Clemens 2004. From Plato to Aristotle – Investigating Early Australian English. Australian Journal of Linguistics 24/1, 5798. Fritz, Clemens Forthcoming. Favoring Americanisms? -or/-our Spellings in Early English in Australia. Proceedings of the 24th ICAME Conference, Guernsey, April 24-27, 2003. Görlach, Manfred 1991. Australian English: Standards, Stigmata, Stereotypes and Statistics. In Görlach, Manfred (ed.) Englishes: Studies in Varieties of English 1984-1988. Amsterdam: Benjamins, 144-173. Horvath, Barbara M./Horvath, Ronald J. 2001. Short A in Australian English: A Geolinguistic Study. In Blair, David/Collins, Peters (eds) English in Australia. Amsterdam: Benjamins, 341-355. Jauncey, Dorothy 2004. South Australian Words: From Bardi-grubs to Frog-cakes. Melbourne: Oxford University Press. Leitner, Gerhard 1984. A Diachronic Study of Broadcast Communication. Australia Journal of Communication 5/6, 57-64.

252

Clemens Fritz

Peters, Pam 1995. The Cambridge Australian English Style Guide. Cambridge: Cambridge University Press. Peters, Pam/Delbridge, Arthur 1989. Standardization in Australian English. In Collins, Peter/Blair, David (eds) Australian English: The Language of a New Society. St. Lucia: University of Queensland Press, 127-137. Ramson, William S. 1988. Some South Australian Words. In Burton, Tim L. / Burton, Jill (eds) Lexicographical and Linguistic Studies: Essays in Honour of G.W. Turner. Woodbridge: Boydell and Brewer, 145-149. Romaine, Suzanne 1998. Cambridge History of the English Language, vol. IV: 1776-1997. Cambridge: Cambridge University Press. Scragg, Donald G. 1974. A History of English Spelling. Manchester: Manchester University Press. Sigley, Robert 1999. Are we Still under England’s Spell? Te Reo 42, 3-19. Taylor, Brian 2000. Syntactic, Lexical and Other Transfers from Celtic in (Australian) English. Lecture delivered at the University of Sydney. Trudgill, Peter 1986. Dialects in Contact. Oxford: Oxford University Press.

TINE BREBAN

The Grammaticalization of the English Adjectives of Comparison: A Diachronic Case Study1

1. Introduction The surge of interest in semantic change and particularly grammaticalization in recent decades is founded on the growing belief that, in order to fully grasp the synchronic behaviour and make-up of linguistic items, we need to take the diachronic evolution of these items into account. The present-day appearance of language is seen as the result of the historical interaction of linguistic, social, cultural and other factors directing the ways in which language can be used. As we have no direct access to historical stages of languages, only to their synchronic reflexes, we are ultimately dependent on historical corpora to make well-founded claims about the relation between synchronic behaviour and diachronic explanations. One group of language items for which a diachronic approach can be expected to shed light on their synchronic behaviour are the English adjectives of general comparison. These are adjectives such as same, other, similar, etc. which express comparison “in terms of likeness and unlikeness without respect to any particular property” (Halliday/Hasan 1976: 76-77). Although they are a rather neglected area of research, the adjectives of comparison are a semantically fascinating group. In Present-day English, they can be used in several different ways, as attribute, postdeterminer or classifier in the noun phrase (henceforth NP) and as predicative adjective. Semantically, 1

Many thanks are due to Kristin Davidse for the accurate and much appreciated comments she made with respect to this chapter as well as in the discussions on adjectives of comparison in general. I would also like to thank Keith Carlon for his careful reading of the chapter and his unobtrusive changes and corrections of the text.

254

Tine Breban

they are polysemous between a fully lexical meaning, associated with the function of quality-attribution,2 and textual meanings, associated with the functions of postdeterminer and classifier in the NP. In example (1), for instance, different displays a fully lexical meaning: it indicates (gradable) likeness or how many qualitative features different entities share. In this example different conveys the idea that Haifa is as a city not very much like Tel Aviv as it does not share the features ‘flat’ and ‘open’. Example (2) shows the other use of different. Here, different does not indicate that the two houses are not like each other, but simply signals that another instance of house is being referred to. Hence it has a determiner-like function indicating that a new instance of a known type is being introduced into the discourse. (1)

Again, the weather report in Haifa is not my expertise at this exact moment, but Haifa’s a very different city from Tel Aviv which is very flat and open and the dissipation of chemical agents will be much swifter. (CB)3

(2)

If you have problems once you arrive at the cottage, the agency may be able to move you to a different house or solve the difficulty. (CB)

In previous studies (Breban 2002/2003; Breban/Davidse 2003), it has been argued on the basis of synchronic corpus material that this polysemy can be explained as the simultaneous presence of different stages of a process of grammaticalization in the same stage of a language, also known as ‘layering’ (Hopper 1991: 22-24). The thesis is that the adjectives of comparison have been going through a process of grammaticalization which develops a new secondary determiner meaning from the original lexical attribute use. For two of the adjectives, this analysis has obvious intuitive appeal, as the 2

3

The concrete realization of the attribution of quality includes both the use of the adjective as attribute in the NP and the predicative use in combination with a copular verb. The source of the examples is indicated between brackets. The examples marked CB are taken from the COBUILD Corpus, the ones marked HC from the Helsinki Corpus of English Texts, and those marked CLMET from the Corpus of Late Modern English Texts. In the few cases where they are taken from other material, full information on the source is given. In each of the examples, the relevant NP has been put in bold.

The English Adjectives of Comparison: A Diachronic Case Study

255

combinations another and the same function in Present-day English as real determiner units. But there are also synchronic indications that other semantically related adjectives such as identical, different, similar and comparable undergo the same semantic development. They display the same polysemy between fully lexical and textual meanings and the corpus data also contain a few transition examples or ‘bridging contexts’ (Evans/Wilkins 2000) in which both types of meaning are available and licensed by different elements from the context. Although they are indications of a similar development, these adjectives manifest this semantic change only to a lesser extent than other and same, and their semantic development is not accompanied by the same formal reflexes of grammaticalization, such as bonding or ‘coalescence’ (Lehmann 1985: 308), displayed for example by other in the fused form another. This study will further explore the promising synchronic hypothesis that this current polysemy is the result of a diachronic grammaticalization development. More specifically, it presents the results of a search for the necessary diachronic evidence supporting the grammaticalization hypothesis. This diachronic investigation takes the form of separate case studies looking into the historical evolution of six adjectives representing the three semantic subgroups of comparison: other and different for difference, same and identical for identity, and similar and comparable for similarity. The corpus extractions that will be used for these case studies consist of eight random samples, covering the periods 750-1050, 1050-1250, 1250-1500, 1500-1710, 1710-1780, 1780-1850, 18501920 and a Present-day English sample containing material from 1990 onwards, for each of the six adjectives. The first four samples (7501710) are taken from the Helsinki Corpus, the next three (1710-1920) from the Corpus of Late Modern English Texts.4 The Present-day English samples are extracted from the COBUILD Corpus (Bank of English) via the Collins wordbanks online service. The historical samples contain (if possible) 100 instantiations of the respective 4

This corpus was recently compiled by Hendrik De Smet (cf. De Smet Forthcoming) on the basis of texts drawn from the Project Gutenberg and the Oxford Text Archives. It consists of almost ten million words and covers the period 1710-1920.

Tine Breban

256

adjectives, but are enlarged to 200 instantiations if necessary, for example to collect a sufficiently large number of postdeterminer data. The Present-day English data always consist of 200 examples.5 The composition of the corpora reflects the intention to study the development of the six adjectives not only from a qualitative point of view, viz. which meanings occur in a particular period and how they develop, but also from a quantitative point of view, drawing a quantitative profile of the distribution of the different meanings for each of the adjectives in the different periods. These quantitative results have, of course, to be handled with the necessary precautions, as they are biased by the material that is used: the data samples consist of a small proportion of written language only, from a limited range of genres and contexts. Nonetheless, they will provide us with a general picture of the development of the different meanings of the adjectives in question. So, the purpose of this chapter is twofold. It first and foremost sets out to provide diachronic support, both qualitative and quantitative, for the grammaticalization hypothesis. Secondly, it also aims to study the historical development of the individual adjectives of this neglected group in more detail. The chapter will consist of the following sections. The second section will briefly summarize the grammaticalization analysis as it was formulated on the basis of synchronic material by Breban (2002/2003) and Breban and Davidse (2003). The third and main part of the study will present the results of the six case studies and sketch the historical development of each of the six adjectives of comparison. 5

The exact sizes of the different samples are as follows:

750-1050 1050-1250 1250-1500 1500-1710 1710-1780 1780-1850 1850-1920 1990-

same 21 0 100 100 100 100 100 200

identical 0 0 0 0 23 33 73 200

Table i. Sizes of the different samples.

other 100 100 100 100 200 200 200 200

different 0 0 0 13 200 200 200 200

similar 0 0 0 0 110 200 200 200

comparable 0 0 0 1 5 17 18 200

The English Adjectives of Comparison: A Diachronic Case Study

257

The fourth and final section will summarize the results of the diachronic investigation in the light of the grammaticalization hypothesis and point out some interesting directions for future research.

2. The grammaticalization of the English adjectives of comparison: synchronic argumentation 2.1. General path of grammaticalization The point of departure for the present investigation was the recognition of the polysemous semantics of the English adjectives of comparison in synchronic corpus data. On the one hand, these adjectives can express gradable likeness, or more concretely, how many qualitative features different entities share. In this case, they function as qualitative adjective, viz. as attribute in the NP or as predicative complement in copular clauses. In (3), for instance, different functions as an attribute and conveys the idea that the way coming of age is looked at in the film Rambling Rose has very little in common with the way it is presented in the earlier film Valley Girl. The submodifier very furthermore explicitly grades the unlikeness as high. (3)

Film director Martha Coolidge is best known for her teen comedy, Valley Girl. She takes a very different look at coming of age in her latest movie, Rambling Rose, which stars Laura Dern, Diane Ladd, and Robert Duvall. (CB)

On the other hand, adjectives of comparison also manifest textuallyoriented meanings in the functions of postdeterminer and classifier in the NP.6 In these functions they convey the meaning that the instance or subtype denoted by the NP is the same one as or a different one 6

Breban and Davidse (2003: 312) argue that the development of textual classifiers is a later development in the process of grammaticalization, created by analogy with the textual postdeterminer uses of the adjectives.

258

Tine Breban

from another instance or subtype that is available in the context. In example (4), other is functioning as postdeterminer and indicates that the boy in question is not the same boy that is talked about in the previous sentence. More specifically, the determiner combination another introduces a new instance (indefinite reference realized by an-) of a known type (marked by -other). Example (5) illustrates the classifier use of other, which conveys the meaning that a different subtype from previously mentioned subtypes is derived from the general type denoted by the head noun. In example (5), other indicates that different subtypes of problems besides the lexically indicated subtype ‘political’ problems are included. (4)

A PREGNANT woman, 27, and a boy, 4, were killed in a house fire at a Calliope housing estate, 20 km south-west of Gladstone, at 6.30 pm yesterday. Gladstone police said another boy, 2, was in the Gladstone Hospital but was not expected to live. (CB)

(5)

Foreign trade strengthens co-operation between nations, eases mutual understanding, makes the solution of political and other problems easier, and creates an atmosphere of trust, security and peace. Trade is after all considered to be the harbinger of peace. (CB)

From this brief presentation it becomes clear that the different meanings involved are in fact two different types of meanings: the quality-attributing meaning is a lexical meaning with propositional content, whereas the textual meanings can be characterized as functional, grammatical meanings. These two types correspond to the types of meaning involved in Traugott’s (1989, 1995) definition of grammaticalization as “the tendency to recruit lexical (propositional) material for purposes of creating text and indicating attitudes in discourse situations” (Traugott 1995: 47). Table 1 illustrates this point on the basis of the different uses of other. As will be discussed in greater detail in section 3, other could originally be used as a lexical attribute equivalent to different. On the basis of this lexical meaning, it then developed textual postdeterminer and classifier meanings. In the course of this process of semantic change, the original lexical meaning of other was lost (as indicated by the asterisk). In Present-day English, this meaning has to be expressed by different.

The English Adjectives of Comparison: A Diachronic Case Study determiner

postdeterminer

a(n) an +

attribute

classifier

*other ‘different’

259 head noun look

other

boy political and other

problems

Table 1. The semantics of other in the NP.

The corpus material also provided further evidence suggesting that the two meanings involved were not simply separate meanings happening to fit in with the types of meanings involved in grammaticalization, but indeed semantically related meanings. The data contained a number of examples that can be characterized as ‘bridging contexts’ (Evans/Wilkins 2000). These are examples in which two meanings are not only available for one form, but are each in their own way evoked and supported by different elements from the context. Although the two meanings are clearly distinct in a number of ways, the basis for the dual semantics is the fact that the two meanings still share some semantic features. Evans and Wilkins have argued that these bridging contexts constitute the semantic stage preceding polysemy. Example (6) is an illustration of such a bridging context. In this example, different can either be interpreted as expressing that the Arab standards are qualitatively different from those of the rest of the world, or as simply signalling that the two ‘worlds’ have distinct sets of standards. (6)

Prince Saud declined to mention Yemen by name, but referring to catastrophe of Iraqi aggression, he said: One of the saddest elements of crisis was that there were voices in the Arab world trying to justify premise that Arabs lived by different standards from the rest of international community. (CB)

the the the the

Thus from a semantic point of view the synchronic data seem to support an analysis in terms of grammaticalization. As already indicated, the different adjectives display this process of grammaticalization to different degrees. Same and other

260

Tine Breban

only have textual uses and are hence fully grammaticalized in Presentday English.7 They function as models for the grammaticalization to textual uses signalling relations of identity and non-identity in the discourse respectively. They represent the two main poles of comparison, viz. identity and difference. Adjectives of the third subgroup of comparison, i.e. similarity, have fewer textual uses. This was expected, as their semantic development involves a more complex shift from the middle ground in the scale of gradable likeness to either identity or non-identity. So, whereas the change is a straightforward abstraction process for the adjectives of identity and difference, it presents more difficulties for the adjectives of similarity which, as will be illustrated in section 2.2., divide into the two meanings of identity and non-identity. The prototype status of same and other as grammaticalized postdeterminers is also formally reflected in their bondedness or ‘coalesence’ (Lehmann 1985: 308) with the primary determiner. In the case of other, this bonding is recognized in the conventional orthography of the combination with the indefinite article, another. This combination has according to the Oxford English Dictionary (henceforth OED) (Vol. 7: 229) been written as one word since the 17th century. For same, the OED (Vol. 9: 74) remarks that, although it is not orthographically bonded with the definite article, “the prefixed article is functionally part of the word”. They hence form a single functional unit. From a synchronic point of view, finally, additional support for the grammaticalization hypothesis was provided by a comparative analysis of the corresponding Dutch adjectives of identity and difference (Breban 2002/2003), which manifest the same general semantic and formal characteristics associated with the grammaticalization analysis. Similar to their English counterparts – same, identical, other and different – the Dutch adjectives zelfde, identiek, ander and verschillend/verscheiden have lexical attribute and predicate uses as well as textual postdeterminer and classifier uses. Moreover, the Present-day Dutch data also contain several examples 7

As we will see in section 3, same was in fact never used in English as quality attribute. It was introduced into the language as postdeterminer strengthening the relation of identity marked by the definite article.

The English Adjectives of Comparison: A Diachronic Case Study

261

of bridging contexts indicative of ongoing semantic change. In addition to these semantic similarities, the Dutch adjective zelfde (‘same’) displays the same formal reflex of grammaticalization as the English adjective other. It can be bonded orthographically to the article, either definite or indefinite, resulting in the complex determiners dezelfde, hetzelfde (‘the same’) and eenzelfde (‘a same’).

2.2. Specific patterns of grammaticalization In the previous section, it was shown how the semantic polysemy which characterizes the adjectives of comparison in Present-day English fits in with the general semantic characterization of the point of origin and result of a grammaticalization process. In this section, it will be explained how this general semantic characterization, from lexical, propositional semantics to grammatical, textual semantics, is realized, on a more specific level, by two different grammaticalization patterns determined by the two distinct constructions in which the adjectives of comparison are used.8 From a formal point of view, the synchronic corpus data show that the adjectives of comparison can operate in two different constructions in the NP, realizing either external or internal comparison (cf. Halliday/Hasan 1976: 78). In the former construction, the second element of the comparison is expressed separately and not by the same NP as the first element of the comparison which is always expressed by the NP containing the adjective of comparison. The second element can either be referred to by another NP in the context, as in (7), or in a prepositional phrase attached to the NP with the adjective of comparison, as in (1) and (6). In the latter, internal, construction, both elements of the comparison are expressed by the same NP which also contains the adjective of comparison and the comparison is hence NP-internal, e.g. (8).

8

In recent contributions to grammaticalization research (e.g. Heine 2003; Traugott 2003), strong emphasis is placed on the context-induced nature of grammaticalization and closer attention is given to the role of the specific constructions in which lexical items occur as they grammaticalize.

262

Tine Breban

(7)

In the open-plan Sport office – all grey carpet tiles and yellowing back issues – sales and marketing director Karren Brady is on the telephone to her boss, publisher David Sullivan. […] For our interview, Brady chooses a rather different environment, an airy Italian restaurant in Knightsbridge. (CB)

(8)

Neither partner should be comparing the man’s touching style with the style the woman used in part A. There’s no reason they should be taking the same approach or using the same touches or sequence; they are two different people with individual feelings and perceptions. (CB)

Although the grammatical uses that develop in both constructions can be characterized as textual uses designating identity or non-identity (the two elements of comparison are either signalled to be the same instance or distinct instances), they realize very different functional values in the various constructions. In the external construction, the qualitative adjective develops into a marker of text-cohesive relations (i.e. anaphoric and cataphoric relations) in the discourse. The textual adjective, either postdeterminer or classifier, signals that the instance or subtype referred to by the NP is the same one as or a different one from another instance or subtype that is mentioned in the discourse context (i.e. the second element of the comparison). As we saw with respect to a different house in example (2) for instance, the postdeterminer of non-identity, different, indicates that a new instance of the type ‘house’, and not the previously mentioned cottage, is concerned. When used in an internal construction, the qualitative adjective expresses the idea that the different entities denoted by the NP share no to few characteristics, as in (8). The grammatical uses that develop in this construction are functional elements which specify the number of the NP. The postdeterminer uses of non-identity, e.g. (9), say that the NP refers to distinct, and hence more instances than one. The postdeterminer uses of identity, as in (10), indicate that the same instance, which is often a generalization rather than a concrete spatiotemporal instance, is associated with different contextually specified situations. (9)

The IT department is faced with the task of successfully integrating large numbers of PC based client systems with the central server computers. The design of software to run simultaneously on different computers linked by a network is an essential aspect of the implementation of these systems. (CB)

The English Adjectives of Comparison: A Diachronic Case Study (10)

263

In Europe, we work gradually on our mise-en-place, which means doing all the basic preparation for the service of a meal, such as chopping the herbs, making the basic sauces and generally insuring that all the ingredients are ready for immediate use as soon as the order is received in the kitchen. But in Japan they are so competitive it becomes a race to cut the chives the fastest and make sure that each piece of herb is the identical size. (CB)

It should be noted that the second type of grammaticalized uses, those originating in the internal construction, are in the synchronic corpus material limited to postdeterminer uses only. This implies that there were no instances of non-phoric classifier uses in the data used here. The recognition of these two grammaticalization patterns provides the necessary background to take a closer look at the semantics and uses of the third group of adjectives of comparison, the adjectives of similarity. In section 2.1., the observation was made that their semantics do not allow a straightforward semantic shift, but induce a split, causing some lexical uses to develop into grammatical uses expressing identity and others expressing non-identity. For the postdeterminer uses of the adjectives of similarity, the split is in the first place determined by the opposition between the distinct constructions of external versus internal comparison. In examples such as (11), the comparison is internal. Like adjectives of identity such as same in (10), similar in (11) indicates that one and the same thing, viz. the same socio-economic group, is associated with different situations implied in the surrounding discourse, in this case, the different persons talked about in the book. When the comparison is external, the value of the adjective of similarity is further determined by the type of instance that is designated by the NP. In the framework of Cognitive Grammar, Langacker (1991) proposes to distinguish different levels of instances denoted by NPs: the prototypical instance is of course a concrete spatio-temporal entity, such as the houses in (12), but it is also possible for a NP to denote a generalized entity, such as a quality or a generalization, as illustrated in (13). The actual combination of the adjectives of similarity with these two types of instances gives rise to the following two patterns. In combination with a concrete spatio-temporal instance, as in example (12), the adjective of similarity has the same textual semantics as other in (4) or different in (2): it indicates that a different instance of the same type is referred

264

Tine Breban

to.9 In a NP designating a generalized instance, as in (13), by contrast, the adjective of similarity functions in the same way as a postdeterminer of identity, and signals that the same generalization talked about earlier is being referred to again. (11)

Close friend Peter Murray, executor of the millionaire’s estate, said Wright’s book called the Uncommon Thread was a thriller based on real events. That uncommon thread was that they were all members of a similar socioeconomic group, of very wealthy parents and there are certainly some very high ranking people amongst them, he said. (CB)

(12)

I’d love to visit the house in Scotland or, even better, love to read about similar stately homes cared for in such a way all over Britain. (CB)

(13)

It has become common for young blacks to greet each other as ‘nigga’, Prof. Kelley said. He has heard white youths in New York’s Greenwich Village cheerfully greeting their black friends in a similar fashion. (CB)

With respect to the classifier uses of the adjectives of similarity, the split is dependent on a different factor, as the only attested classifier uses are phoric ones. The split between the schemata of identity and non-identity depends on the type of head noun that the adjective modifies and especially its hierarchical relation with the head noun describing the second element of the comparison. As examples (14) and (15) illustrate, the two head nouns can either be of the same level of generality, viz. “African Americans” and “Whites” in (14), or the head noun of the second element of the comparison can refer to a more specific type, “greenfood” in (15), while the head noun of the comparative NP refers to a more general supertype, viz. “fresh foods”. (14)

Indeed, studies, even with children, show that when the self-images of middle-class or affluent African Americans are measured, their feelings of self-esteem are more positive than those of comparable Whites. (CB)

9

The semantics of the two types of adjectives, difference versus similarity, is not entirely the same. They highlight different aspects of the general phoric semantics of non-identity in accordance with their original lexical semantics: the fact that a new instance is involved for the adjectives of difference and the fact that the type specification is shared for the adjectives of similarity.

The English Adjectives of Comparison: A Diachronic Case Study (15)

265

The most common condition encountered is Vitamin A deficiency. This is because this vitamin is only present in seed at low levels: much richer sources are present in greenfood and similar fresh foods. (CB)

In examples of the former type, the classifier expresses identity; in (14) for example, comparable conveys the meaning that the subtype concerned is the same subtype that was derived from the general type “African Americans” earlier in the discourse, viz. “middle-class or affluent”. In examples such as (15), by contrast, the classifier follows the schema of non-identity, and similar indicates that other types of “fresh foods” besides greenfood are being referred to. So, again, the different semantic options, identity versus non-identity, are determined by two distinct lexicogrammatical constructions, which Breban and Davidse (2003: 304) refer to as subclassification and superclassification respectively. These two sections have briefly presented the synchronic situation for the English adjectives of comparison. The next section zooms in on the main topic of this study, the actual diachronic development of the six aforementioned adjectives of comparison, and correlates it with different aspects of the grammaticalization hypothesis that are mentioned or implied in the synchronic analysis given here.

3. Six diachronic case studies This section discusses the results of six diachronic corpus studies, supplemented with information and examples found in standard reference works such as the OED, the Anglo-Saxon Dictionary (henceforth ASD) and the Middle English Dictionary (henceforth MED). Each section starts with a qualitative historical overview: that is, which meanings the individual adjectives developed at which period in history. The main part of each section will present and discuss the quantitative profiles of the distribution of the different meanings in each of the subcorpora. Because of their parallel semantic evolution, the two adjectives of similarity, similar and comparable,

266

Tine Breban

will be treated together in section 3.5. The other adjectives, same, identical, other and different, will be dealt with separately in sections 3.1. to 3.4.

3.1. Same Although same functions in Present-day English as a model for the grammaticalization of the adjectives of identity, the diachronic data show that there are no instances of lexical uses of same in the history of English. From its earliest uses in the English NP on, same is used with a textual meaning as a postdeterminer. Therefore, on the basis of the absence of lexical NP data in the corpora consulted, it can be concluded that same did not grammaticalize in English. The etymological information available in the OED (Vol. 9: 74) reveals that same was originally an Indo-Germanic word, reconstructed as *somo. This word is related to the Sanskrit word samá (‘level’, ‘equal’, ‘same’), which is evidence of an earlier lexical semantics of same. In addition, in the Old English data (750-1050), same is present in a special fixed construction which went out of use after this period. The Old English variant of same, sama, is always part of the adverbial construction swa sama (swa) ‘so same (as)’ equivalent to Present-day English ‘in the same way’. Same here seems to have a lexical value with a propositional content comparable to Present-day English equal. So, although same always has grammatical semantics in the English NP, there are indications of an earlier lexical meaning associated with same. As shown in Table 2, the quantitative distribution of the same remains identical throughout the different subcorpora. Same is, except for the Old English data, always used with textual semantics as a postdeterminer. The remainder of this section sketches the semantic development of same in the corpus data. In the subcorpus 1250-1500, the adjective same comes into use in the English NP, where it functions as an emphatic marker of the text-cohesive relation of identity as it is expressed by the definite determiner, e.g. (16) and (17). This emphatic use has a similar function to the text-cohesive postdeterminer of identity: it stresses that the instance denoted by the

The English Adjectives of Comparison: A Diachronic Case Study

267

NP can be identified with a previously mentioned instance of the type and hence realizes a phoric relation of identity. SAME

Size of sample

attr

postdet

Attr or postdet

class

pred

combination swa same

7501050 10501250 12501500 15001710 17101780 17801850 18501920 1990-

21 100% 0

0

0

0

0

0

21 100%

100 100% 100 100% 100 100% 100 100% 100 100% 200 100%

0

100 100% 100 100% 100 100% 100 100% 100 100% 200 100%

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0 0 0 0 0

Table 2. The historical distribution of the different meanings of same.10 (16)

For Salomon seith that ‘ydelnesse techeth a man to do manye yveles.’ And the same Salomon seith that ‘he that travailleth and bisieth hym to tilien his land shal eten breed, but he that is ydel and casteth hym to no bisynesse ne occupacioun shal falle into poverte and dye for hunger.’ (HC 1350-1420)

(17)

(a1420) Lydg. TB 3.3281: þe same nexte niÑt..it sempte in þe hiÑe hevene þe cataractis hadde bene vn-do (MED Vol. 18: 67)

10

The tables that are included for each of the adjectives focus on the distribution of the adjectives over the main types of uses that these adjectives can realize. The different uses in question are the attribute use (abbreviated as attr), the postdeterminer use (abbreviated as postdet), bridging examples that allow both an attribute and a postdeterminer reading (abbreviated as attr or postdet), the classifier use (abbreviated as class), with special mention of the lexical classifier examples (abbreviated as lex class) for the adjectives identical and other, and the predicative use (abbreviated as pred). Quantificational information on more specific ‘intrafunction’ types of uses is included in the text itself, as it is specific to each adjective and inclusion in the general tables is felt to render comparison with respect to the distribution of the different adjectives studied more difficult.

268

Tine Breban

In this capacity, same replaces the Old English emphatic combinations definite determiner + self or ilca (cf. ASD: 860 for self and 587 for ilca). Because of their emphatic nature, these markers relatively easily lose their strengthening effect and hence their usefulness, and are subject to frequent ‘renewal’ (Hopper/Traugott 2003: 122-124) by other semantically similar words. Same itself will, in the same way, later lose ground to new emphatic markers such as very and identical (see 3.2.). In the period 1500-1710, the phoric combination the same develops a special use. As illustrated in (18) and (19), it can be used as a sort of pronoun signalling anaphoric relations of identity within the text, and is then similar to the endophoric uses of it, them, and to a lesser extent he. In later stages of the language, this proform use is lost again. (18)

This Jaff was Sumtyme a grett Citee, as it appereth by the Ruyne of the same, but nowe ther standeth never an howse but oonly ij towers, And Certeyne Caves vnder the grounde. (HC 1500-1570)

(19)

There was a man of the Pharisees, named Nicodemus, a ruler of the Iewes: The same came to Iesus by night, and said vnto him, Rabbi, wee know that thou art a teacher come from God: for no man can doe these miracles that thou doest, except God be with him. (HC 1570-1640)

A second, less frequent, postdeterminer use of same, which is not attested in the corpus data, but illustrated in the MED, is the nonphoric postdeterminer use.11 When used as a non-phoric postdeterminer, the same expresses the meaning that one and the same instance is associated with distinct situations. In the earliest examples, this non-phoric postdeterminer use is often part of the combination one and the same, as in (20). The first examples of non-phoric use of same in the corpus data date from the period 1710-1780, e.g. (21). From then on, the non-phoric postdeterminer use accounts for a 11

The examples of non-phoric postdeterminer use are however restricted to the MED. The OED contains early examples of phoric postdeterminer use only. Since the earliest non-phoric example dates from 1384, while the earliest examples of phoric use date back to 1200, it can be speculated that the nonphoric postdeterminer use is a later development caused by the spread of the same from the external to the internal comparison construction.

The English Adjectives of Comparison: A Diachronic Case Study

269

substantial part of the postdeterminer data, viz. 17 out of 100 postdeterminer data for the period 1710-1780, 18 out of 100 for 17801850, 13 out of 100 for 1850-1920 and 41 out of 200 for the Presentday English postdeterminer data. (20)

(c1384) WBible (1) 1 Cor.12.11: Alle thes thingis oon and the same spirit worchith. (MED Vol. 18: 66)

(21)

The trade of the corn merchant is composed of four different branches, which, though they may sometimes be all carried on by the same person, are, in their own nature, four separate and distinct trades. (CLMET 1710-1780)

From 1710, the data also contain a large number of fixed expressions with the same, e.g. at the same time, which display processes of lexicalization into one functional unit. The prepositional phrase at the same time develops first into a temporal conjunction equivalent to while, as in (22), and later into a coordinating or concessive conjunction equivalent to and, also, e.g. (23), or but, however, e.g. (24). (22)

If only I could in any way manage to pin him against the wall till help came! Once more I dashed my hardest angle against him, at the same time alarming the whole household by my cries for aid. (CLMET 1850-1920)

(23)

When my brother left us yesterday, he imagined that the business which took him to London might be concluded in three or four days; but as we are certain it cannot be so, and at the same time convinced that when Charles gets to town he will be in no hurry to leave it again, we have determined on following him thither, that he may not be obliged to spend his vacant hours in a comfortless hotel. (CLMET 1780-1850)

(24)

“Mentioning to your Papa that I thought Miss Tox and myself might now go home (in which he quite agreed), I inquired if he had any objection to your accepting this invitation. He said, ‘No, Louisa, not the least!’” Florence raised her tearful eyes. “At the same time, if you would prefer staying here, Florence, to paying this visit at present, or to going home with me…” (CLMET 1780-1850)

This development from prepositional phrase to temporal conjunction to coordinating/concessive conjunction is analogous to the development of other phrases such as þa hwile þe which developed into while. Traugott and König (1991) have analysed this type of

Tine Breban

270

development as a common path of grammaticalization and subjectification. The total number of examples in which the same is part of a fixed expression is 7 out of 100 for the period 1710-1780, 8 out of 100 for 1780-1850, 10 out of 100 for 1850-1920 and 23 out of 200 for the COBUILD data. A second type of use which occurs from 1710 is the occurrence of the same as a predicate-like proform in combination with a copular verb, e.g. in the combination to remain the same in (25). (25)

If the existing areas are to remain the same, then, on the whole, my vote is against municipal trading, and on the whole, with regard to light, to tramways and communications, to telephones, and indeed to nearly all such public services, I would prefer to see these things in the hands of companies, and I would stipulate only for the maximum publicity for their accounts and the fullest provision for detailed regulation through the Board of Trade. (CLMET 1780-1850)

This proform use is found in 4 out of 100 examples of the 1710-1780 data, 2 out of 100 for the 1780-1850 data, 4 out of 100 for the 18501920 data and 10 out of 200 for the COBUILD data.

3.2. Identical With the adjective identical, the diachronic material seems to require a more nuanced grammaticalization analysis: in the earliest stages, the postdeterminer uses predominate, while the lexical uses come to be more frequent in the later stages. According to the OED (Vol. 5: 18), the adjective was recruited in the 17th century from French and/or Latin in the forms identical and identic, the latter of which went out of use. On its entry in English, the adjective seems to have been used especially as an emphatic marker for the relation of identity, similar to same in the thirteenth century (see 3.1.). This use is illustrated by example (26), in which identic is added to the definite article to further emphasize the relation of identity. Similar to same, identical thus functions as postdeterminer indicating a phoric relation of identity. In addition to this phoric postdeterminer use, the early examples in the OED also contain a few non-phoric postdeterminer examples of identical, often in combination with same. In (27), for

The English Adjectives of Comparison: A Diachronic Case Study

271

instance, identical is added to reinforce the idea that only one path is referred to. (26)

1664 BUTLER Hud. II. i. 149 The Beard’s th’ Identick Beard you knew. (OED Vol. 5: 18)

(27)

1633 AUSTIN Medit. (1635) 36 The Spirit..leades not every man in the same identicall path. (OED Vol. 5: 18)

As indicated in Table 3, the earliest attestations of identical in the corpus material date from the period 1710-1780 and consist of postdeterminer as well as lexical attribute and predicative uses of identical. Based on the chronology of the examples cited in the OED, these lexical uses seem to emerge later (18th and 19th centuries) than the postdeterminer use, which dates back to the early 17th century, as already mentioned. IDENTICAL

size of sample

750-1050

0

1050-1250

0

1250-1500

0

1500-1710

0

1710-1780

23 100% 33 100% 73 100% 200 100%

1780-1850 1850-1920 1990-

attr

postdet

attr or postdet

class

lex class12

Pred

5 22% 2 6% 12 16.5% 51 25.5%

16 69.5% 15 45.5% 18 24.5% 23 11.5%

0

0

0

0

0

0

0

0

0

11 5.5%

3 1.5%

45 22.5%

2 8.5% 16 48.5% 43 59% 67 33.5%

Table 3. The historical distribution of the different meanings of identical.

12

The lexical classifier uses, such as identical in identical twins, include classifier uses that determine subtypes on the basis of a lexical value. In identical twins for instance, identical indicates that the two persons involved look identical or have the same genetic material.

272

Tine Breban

The quantitative analysis of the corpus data represented here allows us to explain the evolution of identical in the following way. In the period 1710-1780, identical is mainly used (16 out of 23 instances) as a postdeterminer emphasizing the relation of identity, often in combination with other emphatic markers such as the demonstrative determiner, and very or same, e.g. (28). (28)

Susan, from the account she had received of Mrs Waters, made not the least doubt but that she was the very identical stray whom the right owner pursued. (CLMET 1710-1780)

By contrast, in the next period (1780-1850), identical is much more often used as a predicate (16 out of a total of 33 examples as opposed to 2 out of 23 in the period 1710-1780), often in combination with a prepositional phrase introduced by with (8 out of 16 predicative data), as in (29). A possible factor of influence in the emergence of this predicative use is the formally similar use of identical expressing the very specific concept of identity in logical treatises, which in the OED dates back to the early 17th century, and which is illustrated in (30). (29)

We will now turn to the order of reptiles, which gives the most striking character to the zoology of these islands. […] There is one snake which is numerous; it is identical, as I am informed by M. Bibron, with the Psammophis Temminckii from Chile. (CLMET 1780-1850)

(30)

1644 DIGBY Two Treat. II. ii. 18 The greatest assurance and the most eminent knowledge we can have of any thing is, of such Propositions as in the Schooles are called Identicall; as if one should say, Iohn is Iohn, or a man is a man. (OED Vol. 5: 18)

The data from 1850-1920 show an increase in the number of attribute uses of identical (12 out of 73 data as opposed to 2 out of 33 data for the preceding period). The lexical use of identical, which now manifests all the characteristics associated with qualitative adjectives, such as regular occurrence with submodifiers (e.g. almost and nearly), has become the most attested use of identical in the corpus material. As shown in Table 2, it accounts for 55 out of a total of 73 instantiations of identical or 75.5%. The postdeterminer use of identical becomes numerically less important as the combination the identical as an emphatic marker loses its strength; it covers only

The English Adjectives of Comparison: A Diachronic Case Study

273

24.5% of the data. However, the data also contain certain new postdeterminer uses of identical in new contexts. Similar to same, the non-phoric postdeterminer use of identical in examples with internal comparison, such as (31), becomes more frequent (4 of the 18 postdeterminer examples). However, in this period, in contrast to same, this non-phoric postdeterminer use is no longer limited to the combination the identical, but also manifests itself in the indefinite constructions an identical + count noun and ø identical + uncount noun, as illustrated by (32). Thus, the indefinite construction, found in 2 of the 4 non-phoric examples, becomes a new context in which identical allows for grammatical semantics, similar to the combination eenzelfde [a same] in Dutch. In fact, this new grammatical reading of identical in indefinite NPs is not restricted to constructions with internal comparison, i.e. the non-phoric postdeterminer use, but also becomes available in NPs with external comparison, generating phoric uses (encountered in 1 of the 14 phoric postdeterminer data). The resulting combination of indefinite article and phoric postdeterminer signalling identity is, however, only appropriate in very specific contexts, as identical conveys the idea that the instance denoted by the NP is the same as another instance that is present in the discourse context, while the indefinite article indicates that the instance denoted by the NP cannot/should not be identified. Contexts which satisfy both requirements are contexts in which the particular instance is only available by implication, and hence not yet identifiable for the hearer. For instance, in (33), the combination of the indefinite article with a postdeterminer signalling identity is licensed because in the context it is not relevant which specific language is being talked about; the only important point is that it is the same language that is shared by the foreigner and the speaker. (31)

Thus, this animistic belief in samsara in sense allegorico claims that the identical vijnana as a subject of samsara can be literally reincarnated as different lives – the heavenly, the human, the animal, the ghostly and the purgatorial – one after the other, without cease, all during its lives of prenirvana. (CB)

(32)

It is clear that in employing vernacular languages for translation, missionaries saw these languages as more than arbitrary devices. On the contrary, they saw them as endowed with divine significance, so that they may substitute

Tine Breban

274

completely for the language of revelation. The fact that all languages are, for the purposes of Christian translation, interchangeable, makes them ‘instrumental’, so that in their very differences they all serve an identical purpose. (CB) (33)

How can we heartily obey one who is but a foreigner with the accident of an identical language? (CLMET 1850-1920)

It is this indefinite construction an/ø identical that becomes the main postdeterminer use in Present-day English (accounting for 7 out of 9 non-phoric postdeterminer uses and 11 out of 13 phoric postdeterminer uses), while the original, definite phoric combination the identical is, in the corpus used here, restricted to NPs with a restrictive relative clause conveying the second element of the comparison, as in (34).13 (34)

Ironically, One Man’s fatal fall came at the identical fence which caused the retirement of another great grey, Desert Orchid. (CB)

With respect to identical, we can conclude that, similar to same, the adjective did not grammaticalize in English for the combination the identical, as this combination was borrowed from Romance languages as a renewing emphatic marker. The borrowing of the lexical uses of identical by English language appears to be of a later date and can be hypothesized to be influenced by the availability of the word form in logical treatises and as an emphatic marker. However, the combination an identical with postdeterminer meaning may have developed from the same combination with identical functioning as lexical attribute.

3.3. Other Other is the second adjective of comparison that has, as Table 4 shows, fully grammaticalized in Present-day English: the only uses encountered in the synchronic data are phoric, text-cohesive, post13

However, the larger corpus used in Breban/Davidse (2003), consisting of 400 instantiations of identical, did contain one example of an anaphoric postdeterminer use.

The English Adjectives of Comparison: A Diachronic Case Study

275

determiner and classifier uses. But in contrast to same, the diachronic data up until 1920 still contain a few examples of lexical (attribute and predicative) uses, which in Present-day English have to be expressed by different. This lexical meaning is also recognized as a separate meaning by the main reference works, OED, ASD and MED, which, moreover, also contain examples of lexical uses for periods that did not contain any lexical uses in the corpus data. Some examples of lexical uses from the different periods are given in (35)-(38). (35)

Ic ÿa ælfred cyning þas togædere gegaderode & awritan het, monege þara þe ure foregengan heoldon, ÿa ÿe me licodon; & manege þara þe me ne licodon ic awearp mid minra witena geÿeahte, & on oÿre wisan bebead to healdanne. (‘and commanded to behave in a different way’) (HC 850-950)

(36)

a1450(a1338) Mannyng Chron.Pt.13954: Eumaneus was Morganes broþer, Bot his maners were alle oþer. (MED Vol. 14: 333)

(37)

“You should be friends with your cousin, Mr Hareton,” I interrupted, “since she repents of her sauciness. It would do you a great deal of good; it would make you another man to have her for a companion.” (CLMET 1780-1850)

(38)

Mr Reardon, it was true, did not impress one as a man likely to push forward where the battle called for rude vigour, but Amy soon assured herself that he would have a reputation far other than that of the average successful storyteller. (CLMET 1850-1920)

In the Present-day English data, there are no more examples of other as a lexical qualitative adjective. In this respect, the diachronic material of other confirms the grammaticalization analysis. As shown in Table 4, the distribution of the grammatical uses of other remains stable over the different subcorpora. The vast majority (always over 90%) are (phoric) postdeterminer uses, but there is always a smaller set of (phoric) classifier uses attested (ranging from 1.5% to 9%). In Present-day English, however, the data manifest two new constructions (usually restricted to qualitative predicative adjectives) in which grammatical other in combination with the preposition than comes to be used: firstly, as a postmodifier of the form an N other than, both with postdeterminer and classifier meaning, as in examples (39) and (40) respectively, and secondly, in combination with a copular verb with a meaning equivalent to not, e.g. (41).

Tine Breban

276 OTHER

750-1050 1050-1250 1250-1500 1500-1710 1710-1780 1780-1850 1850-1920 1990-

size of sample

attr

postdet

attr or postdet

class

lex class14

pred

100 100% 100 100% 100 100% 100 100% 200 100% 200 100% 200 100% 200 100%

4 4% 4 4% 0

92 92% 93 93% 91 91% 98 98% 197 98.5% 193 96.5% 194 97% 190 95%

0

3 3% 3 3% 9 9% 2 2% 3 1.5% 3 1.5% 4 2% 9 4.5%

0 0

1 1% 0

0

0

0

0

0

0

0

1 0.5% 1 0.5% 0

0 0 2 1% 1 0.5% 0

0 0 0 0 1 0.5% 0 0

0 1 0.5%

Table 4. The historical distribution of the different meanings of other. (39)

Erm I’ve been arguing against the notion of crediting children with full blown schemas like containment and support as the basis for learning spatial words on the grounds that these schemas don’t help if you happen to be learning a language er other than English or closely-related languages. (CB)

(40)

The wise men seemed annoyed when they realized that all three of them were approaching the same people. Each must have assumed the others were there on business other than pastoral and had rudely chosen that moment to deal with it. (Martel 2003: 87)

14

The corpus contains one (Present-day English) example in which other is used as a lexical classifier, viz. Every holiday he spent with his wife, so Hannah was left alone at important times like Christmas. He had no intention of leaving his wife and there were none of those gifts or little niceties which an ‘other woman’ usually receives. (CB) Although the semantic value of other is based on its grammaticalized meaning, this attestation is categorized as a lexical classifier use because the adjective has attained the status of a lexical compound in combination with the head noun woman, which is signalled by the addition of quotation marks and the separation of other and the indefinite article.

The English Adjectives of Comparison: A Diachronic Case Study (41)

277

That is to say, the attack on religion by denying there could be a god of goodness and purpose, is actually a backhanded way of holding the universe to an inviolate moral standard the source of which is other than the universe we are in the act of challenging. (CB)

The use of other than in postmodifier position is, moreover, involved in a process of lexicalization and further grammaticalization into a preposition similar to except (for), besides, as illustrated in (42) and (43) respectively. Consequences of this further grammaticalization are that other than can be separated from the noun it modifies and occur at the beginning of the sentence, as in (43), or even modify other elements than nouns, as in (42). (42)

Tunisian officials are refusing to expand on the communique, other than to indicate that Mr. Mahjoubi is suspected of having used his former position of police commissioner for personal ends. (CB)

(43)

And you know, people are often saying, well, chimpanzees and humans are very alike, we share 98 of our DNA, of our genetic material. We have so much in common in our behaviour, and what is the real difference? Well other than that other 1.8 genetic difference, to me, the real key thing about humans is that we have developed a spoken language. (CB)

3.4. Different Although the earliest examples of different cited in the OED (Vol. 3: 341) and the MED (Vol. 4: 1077) date back to 1400 and 1384 respectively, the first attestations of different in the Helsinki Corpus are two examples dating from the period 1570-1640; more specifically, there is in this period one example of a predicative use and one postdeterminer example. So, the diachronic corpus material does not contain a stage in which different only displays lexical uses. All the examples dating from before 1570 quoted in the OED and MED are however strictly lexical uses (both attribute as in (44) and predicate as in (45)), which seems to confirm that different underwent the predicted grammaticalization process. (44)

(c1449) Pecock Repr.438: Petir..was heed in a dyuers and different maner fro ech other Apostle. (MED Vol. 4: 1077)

278

Tine Breban

(45)

c1450 De CMulieribus 391: Voyce and stature wass lytell different Twyx hirr and hym. (MED Vol. 4: 1077)

The distribution of the different meanings in the corpus data, or more concretely the proportion lexical versus textual uses, shown in Table 5, does not straightforwardly support the grammaticalization hypothesis, as the number of lexical uses increases over the period 1710-1920 at the expense of the postdeterminer uses, going from 66 lexical uses versus 118 postdeterminer uses for the period 1710-1780, and 100 versus 83 for 1780-1850, to 123 versus 69 for 1850-1920. But as will become clear from the following discussion, several secondary factors can be shown to be responsible for this decrease in postdeterminer uses.

750-1050

Size of sample 0

1050-1250

0

1250-1500

0

1500-1710

13 100% 200 100% 200 100% 200 100% 200 100%

DIFFERENT

1710-1780 1780-1850 1850-1920 1990-

attr

postdet

attr or postdet

class

pred

3 23% 25 12.5% 32 16% 51 25.5% 45 22.5%

5 38.5% 118 59% 100 50% 69 34.5% 88 44%

0

0

16 8% 17 8.5% 8 4% 11 5.5%

0

5 38.5% 41 20.5% 51 25.5% 72 36% 56 28%

0 0 0

Table 5. The historical distribution of the different meanings of different.

Other aspects of the corpus material, nevertheless, do provide readily available additional support for the grammaticalization analysis. In contrast to other, the postdeterminer uses of different consist, throughout the entire diachronic corpus, of two types: a majority of non-phoric postdeterminer uses (here different emphasizes the fact that the NP refers to distinct instances) and a small group of phoric postdeterminer uses (equivalent to the postdeterminer uses of other).

The English Adjectives of Comparison: A Diachronic Case Study

279

The precise distribution of these two types of postdeterminer uses is given in Table 6. For both types of postdeterminers, the data reveal, through the different historical periods, further processes of semantic change. More specifically, the data attest some cases of further grammaticalization for the non-phoric postdeterminers and some instances of specialization for the phoric postdeterminers. postdet. uses of DIFFERENT 1500-1710 1710-1780 1780-1850 1850-1920 1990-

total number of postdet 5 100% 118 100% 100 100% 69 100% 88 100%

non-phoric postdet 3 60% 99 83.9% 84 84% 52 75.4% 70 79.5%

phoric postdet 1 20% 18 15.3% 16 16% 16 23.2% 16 18.2%

non-phoric + phoric interpretation 1 20% 1 0.8% 0 1 1.4% 2 2.3%

Table 6. Non-phoric versus phoric postdeterminer uses of different.

When different is used non-phorically in a plural NP, it manifests an analogous process of grammaticalization to other semantically similar adjectives such as various, sundry, divers and several. As noted in the OED, these adjectives, which all originally express difference or dissimilarity of some kind, display a development from expressing dissimilarity, via indicating that separate instances are referred to (i.e. a non-phoric postdeterminer meaning), to a weak quantifier sense in indefinite NPs that can be paraphrased as ‘several, more than one’. The different diachronic subcorpora show the further grammaticalization of different along this path, going from non-phoric postdeterminer use to the preparatory stages of a weak quantifier use.15 In the examples dating up to 1780, different occurs as a nonphoric postdeterminer equally in definite and indefinite plural NPs. 15

In Breban (2002/2003: 188-190), the same process of further grammaticalization was shown to happen to the Dutch counterparts of different, verschillend and verscheiden, which in contrast to different did develop a full quantifier use.

280

Tine Breban

But in the period 1780-1850, the number of indefinite NPs containing different (which is the only context in which a quantifier use is possible) crosses the fifty percent mark (43 out of 84 examples), and from 1850, it is the main context in which non-phoric different is found, accounting for 40 of the 52 examples. The data from this period, 1780-1920, also contain a few examples in which different can only be paraphrased by the quantifier meaning ‘several’, ‘more than one’ (3 out of 84 non-phoric examples for 1780-1850 and 2 out of 52 for 1850-1920), e.g. (46). (46)

A criminal was branded, during my stay here, for the third offence; but the relief he received made him declare that the judge was one of the best men in the world. I sent this wretch a trifle, at different times, to take with him into slavery. (CLMET 1780-1850)

The Present-day English situation is comparable to the period 18501920. In contrast to divers and especially several, the process of grammaticalization has not run its full course, but seems to have reached an equilibrium. The main uses of different remain the lexical uses, and the non-phoric uses are postdeterminers rather than fullyfledged quantifiers. With respect to the phoric postdeterminer use, the diachronic data show a process of specialization, in the sense of a restriction of the contexts in which different can be used as a phoric postdeterminer. The phoric postdeterminer use of different has always been restricted, compared to the possibilities of phoric other, as it has always been limited to indefinite NPs. But, especially in the Present-day English material, the contexts in which different occurs as a phoric postdeterminer are drastically reduced, leaving only contexts in which the antecedent (second element of the comparison) is either a more extended text referent or is part of the extra-verbal context of situation (different then realizes ‘exophoric reference’ (Halliday/Hasan 1976: 18; Martin 1992: 122)), as illustrated in (47) and (48) respectively. (47)

What you doing in science at the moment? Erm well we’re doing compounds and what makes up compounds and stuff like and water and doing experiments on water and how to make water into hydrogen and oxygen and hydrogen and oxygen into water. Mhm. Erm we’re now doing what was it erm we were doing a couple of weeks ago about erm nails and how they rust and

The English Adjectives of Comparison: A Diachronic Case Study

281

we were setting up experiments and that. Mhm. What did you do at the beginning of term er was it that as well or have you or have you covered a bit more? Well we’re doing we’re on a different topic now. (CB) (48)

My husband speaks very well but his job involves my answering the phone on his behalf quite a bit of the time. Yesterday he said, very nicely, that he wished I had a different accent – I’m pure Scouse – because he didn’t think it sounded good to his clients. (CB)

The Present-day English data for other show that these are exactly the contexts in which other loses ground. The language system thus seems to resolve a situation in which two forms have a similar meaning by developing a situation of complementary distribution. In the light of the formal characteristics of grammaticalization proposed by Lehmann (1985: 308), this development can be interpreted as evidence for a certain ‘paradigmaticization’ of the phoric postdeterminer uses of other and different. With respect to the quantitative diachronic analysis of different, it can be argued now that precisely this restriction of the use of phoric different and the blocking of the further grammaticalization of non-phoric different by several and various, is to a certain extent responsible for the decrease in postdeterminer uses manifested in the corpus data.

3.5. Similar and comparable As Tables 7 and 8 show, the most noticeable difference between the comparative adjectives of similarity and the adjectives of difference and identity discussed above is that the development of the textual use of the former starts at a later date. For comparable, the postdeterminer use only occurs in Present-day English and for similar, it is in the same period that we see a great increase in postdeterminer examples. A second element suggesting a more recent grammaticalization process is the comparatively large number of bridging contexts found in the Present-day English data, as can also be seen in Tables 7 and 8. This late development poses a practical problem, as the lack of corpus data between 1920 and 1990 makes it impossible to look at the actual process at work in the development and increase of the grammatical uses.

Tine Breban

282

750-1050

size of sample 0

1050-1250

0

1250-1500

0

1500-1710

0

1710-1780

110 100% 200 100% 200 100% 200 100%

SIMILAR

1780-1850 1850-1920 1990-

Attr

postdet

attr or postdet

class

pred

55 50% 99 49.5% 106 53% 57 28.5%

7 6.5% 17 8.5% 17 8.5% 60 30%

12 10.5% 30 15% 22 11% 18 9%

0

36 33% 53 26.5% 55 27.5% 51 25.5%

1 0.5% 0 14 7%

Table 7. The historical distribution of the different meanings of similar. COMPARABLE

Size of sample

750-1050

0

1050-1250

0

1250-1500

0

1500-1710

1 100% 5 100% 17 100% 18 100% 200 100%

1710-1780 1780-1850 1850-1920 1990-

Attr

postdet

attr or postdet

class

pred

0

0

0

0

0

0

0

0

0

0

0

0

1 5.5% 42 21%

0

0

0

37 18.5%

14 7%

6 3%

1 100% 5 100% 17 100% 17 94.5% 101 50.5%

Table 8. The historical distribution of the different meanings of comparable.

On the basis of the figures presented here, the historical development of the semantics of similar and comparable can be summarized along the following lines. During the period 1710-1920, similar shows a slow increase in postdeterminer uses, going from 7 out of a total of

The English Adjectives of Comparison: A Diachronic Case Study

283

110 available data for 1710-1780 to 17 out of 200 for the periods 1780-1850 and 1850-1920. Analogously to the data for different, the corpus gives no evidence for a stage at which similar could be used only as a lexical qualitative adjective. With regard to the data for comparable, by contrast, the only examples available up until 1920 are lexical uses, consisting for the period 1500-1850 solely of predicative uses, such as (49). The subcorpus 1850-1920 contains the first attestation of comparable in a NP, in the lexical function of attribute. This example is reproduced here as (50). (49)

So this Citie had it beene built but one mile lower on the Sea side, I doubt not but it had long before this beene comparable to many a one of our greatest Townes and Cities in Europe, both for spaciousnesse of bounds, Port, state, and riches. (HC 1570-1640)

(50)

But now the real power is not in the Sovereign, it is in the Prime Minister and in the Cabinet – that is, in the hands of a committee appointed by Parliament, and of the chairman of that committee. Now, beforehand, no one would have ventured to suggest that a committee of Parliament on foreign relations should be able to commit the country to the greatest international obligations without consulting either Parliament or the country. No other select committee has any comparable power… (CLMET 1850-1920)

The Present-day English data contain the first textual, both postdeterminer and classifier, uses of comparable, illustrated by (51) and (52) respectively, and a considerably larger amount of postdeterminer examples for similar (60 out of a total of 200 examples as opposed to 17 out of 200 for the period 1850-1920). (51)

The climax was 72 hours of non-stop talks, with Mr Blair tackling problems that threatened to kill the deal right up to the last moment. The talks overshot their Thursday midnight deadline by more than 17 hours. But at 5.36 pm yesterday, exhausted politicians – many of them sworn enemies – announced that agreement had been reached. […] Senator Mitchell said of Mr Blair and Mr Ahern: “I cannot think of a comparable instance when two leaders participated in a round-the-clock, hands-on basis for several days as they did.” (CB)

(52)

While pirates and comparable free-lance operators on land were active in the capture of people for enslavement, the actual trading of slaves in the marketplace was often done by merchant peoples who treated slaves as simply an additional form of merchandise. (CB)

284

Tine Breban

The classifier use of similar also seems to fully develop as late as Present-day English, as only one example was attested in the historical data (in the period 1850-1920). This seems to suggest that the development of the classifier use is indeed, as was hypothesized by Breban and Davidse (2003: 312), a later development than the postdeterminer use. A similar remark applies to the location of the development of non-phoric versus phoric postdeterminer use of the adjectives of similarity in time. Only very few instances of non-phoric post-determiner uses are found in the corpus material, viz. only one Present-day English example for comparable, reproduced as (53) and a small number of examples for similar (two examples for the period 1850-1920 and another two for the Present-day English data), e.g. (54). (53)

Another problem is that not all IVF clinics have comparable results: in fact, some have never had a successful pregnancy resulting in a live birth. (CB)

(54)

The dinner was thus a series of emotional crises for the diners, who knew that full dishes and clean plates came endlessly through the same door. They were all eating similar food simultaneously; they began together and they finished together. (CLMET 1850-1920)

Example (54), which is the earliest non-phoric use of similar, dates from the subcorpus 1850-1920. This observation, taken together with the restricted number of instances, seems to suggest that the development of the non-phoric postdeterminer use occurs later than the phoric postdeterminer use. A similar temporal order could be deduced on the basis of the diachronic data for same (see 3.1.). Thus, although the present corpus data provide only limited information about the semantic development of similar and comparable, the information proves to be important for establishing the order of the different changes.

The English Adjectives of Comparison: A Diachronic Case Study

285

4. Conclusion The aim of this study was to put to the test the grammaticalization hypothesis proposed to account for the polysemy displayed by the English adjectives of comparison in earlier studies and to check it against actual diachronic material. As each adjective was discussed in a separate section in this chapter, it seems useful by way of conclusion to bring the findings of the distinct corpus studies together in order to formulate a final evaluation of the grammaticalization hypothesis. For some of the adjectives, the diachronic data provided straightforward corroboration of the grammaticalization hypothesis. The historical data on other showed the availability and loss of the adjective’s lexical semantics. With respect to the two adjectives of similarity, the absence of data for the period 1920-1990 made it impossible to trace their grammaticalization as a process, but the data that were available nevertheless showed two very different semantic situations for the two adjectives before 1920 and after 1990. For comparable, the data before 1920 contained only lexical uses, while the Present-day English data consisted of a fair number of grammatical uses, both postdeterminer and classifier, and a considerable number of bridging examples. The Present-day English data for similar showed a large increase in postdeterminer and especially classifier uses and a decrease of lexical uses in comparison with the data from 1850-1920. The development of the adjectives of identity, same and identical, proved more complex. In the construction with a definite article, the same and the identical, they did not grammaticalize in English but were introduced into the language, at distinct times, as pre-made emphatic markers of the relation of identity, a function which is subject to frequent renewal. In the course of time, however, identical in particular was shown to develop grammatical meaning in new contexts, viz. both phoric and non-phoric postdeterminer meaning in the indefinite constructions an identical + count noun and ø identical + uncount noun. Finally, with regard to different, the quantitative results were at first sight at odds with the grammaticalization hypothesis. But a closer

286

Tine Breban

look at the processes of semantic change that were going on in the diachronic period concerned, revealed that processes of further grammaticalization in fact brought about constraints on the numerical development of the grammatical uses. The further grammaticalization of the non-phoric postdeterminer use into a weak quantifier use was stopped by a stronger similar development of the semantically related adjectives various and several. Secondly, the phoric postdeterminer use became restricted to a limited number of special contexts in more or less complementary distribution with the postdeterminer use of other, which has the same functional meaning. All in all, notwithstanding the limitations of the present material of course, this diachronic investigation seems to confirm the grammaticalization hypothesis in general. Moreover, it provides new information concerning the semantics of the adjectives, allowing us to finetune the grammaticalization analysis and to arrive at a more accurate description of the semantic development of the six adjectives of comparison. In the first place, it revealed the existence of other factors influencing the historical development of different adjectives, e.g. the original use of same and identical as emphatic markers. Secondly, it brought to light several secondary processes of semantic change, such as further grammaticalization, specialization and lexicalization. Finally, it helped to establish the order of the different subprocesses of grammaticalization. The evidence suggested that the phoric classifiers constitute a later development than the corresponding postdeterminer uses, and that the non-phoric use, for the adjectives of identity and similarity, developed later than their phoric use. The exact relation between these different types of grammatical uses, the classifier use and the postdeterminer use, on the one hand, and the phoric and the non-phoric postdeterminer use, on the other, needs of course to be looked at in more detail in the future. To conclude on a more programmatic point, the present investigation seems to prove the added value of diachronic corpus research, not only for the corroboration of synchronic hypotheses, but more importantly for gaining a better insight into the semantics of complex, but rich lexical items such as the English adjectives of comparison.

The English Adjectives of Comparison: A Diachronic Case Study

287

References ASD: Bosworth, Joseph/Toller, T. Northcote (eds) 1898. An AngloSaxon Dictionary. Oxford: Oxford University Press. Breban, Tine 2002/2003. The Grammaticalization of the Adjectives of Identity and Difference in English and Dutch. Languages in Contrast 4/1, 165-199. Breban, Tine/Davidse, Kristin 2003. Adjectives of Comparison: The Grammaticalization of their Attribute Uses into Postdeterminer and Classifier Uses. Folia Linguistica 37/3-4, 269-317. De Smet, Hendrik Forthcoming. A Corpus of Late Modern English Texts. To appear in ICAME Journal. Evans, Nicholas/Wilkins, David 2000. In the Mind’s Ear: The Semantic Extensions of Perception Verbs in Australian Languages. Language 76, 546-592. Halliday, Michael A.K./Hasan, Ruqaiya 1976. Cohesion in English. London: Longman. Heine, Bernd 2003. Grammaticalization. In Joseph, Brian D./Janda, Richard D. (eds) The Handbook of Historical Linguistics. Oxford: Blackwell, 575-601. Hopper, Paul J. 1991. On Some Principles of Grammaticization. In Traugott, Elizabeth Closs/Heine, Bernd (eds) Approaches to Grammaticalization. Volume I: Focus on Theoretical and Methodological Issues. Amsterdam: Benjamins, 17-35. Hopper, Paul J./Traugott, Elizabeth Closs 22003. Grammaticaliza-tion. Cambridge: Cambridge University Press. Langacker, Ronald W. 1991. Foundations of Cognitive Grammar. Volume II: Descriptive Application. Stanford: Stanford University Press. Lehmann, Christian 1985. Grammaticalization: Synchronic Variation and Diachronic Change. Lingua e Stile 20, 303-318. Martel, Yann. 2003. Life of Pi. Edinburgh: Canongate Books. Martin, James R. 1992. English Text. System and Structure. Amsterdam: Benjamins.

288

Tine Breban

MED: Kurath, Hans/Kuhn, Sherman M./Reidy, John/Lewis, Robert E. (eds) 1952-2001. Middle English Dictionary. Ann Arbor: University of Michigan Press. OED: 1989. The Oxford English Dictionary. Oxford: Oxford University Press. Traugott, Elizabeth Closs 1989. On the Rise of Epistemic Meanings in English: An Example of Subjectification in Semantic Change. Language 65, 31-55. Traugott, Elizabeth Closs 1995. Subjectification in Grammaticalisation. In Stein, Dieter/Wright, Susan (eds) Subjectivity and Subjectivisation: Linguistic Perspectives. Cambridge: Cambridge University Press, 31-54. Traugott, Elizabeth Closs 2003. Constructions in Grammaticalization. In Joseph, Brian D./Janda, Richard D. (eds) The Handbook of Historical Linguistics. Oxford: Blackwell, 624-647. Traugott, Elizabeth Closs/König, Ekkehard 1991. The SemanticsPragmatics of Grammaticalization Revisited. In Traugott, Elizabeth Closs/Heine, Bernd (eds) Approaches to Grammaticalization. Volume I: Focus on Theoretical and Methodological Issues. Amsterdam: Benjamins, 189-218.

GÖRAN KJELLMER

Panchrony in Linguistic Change: The Case of Courtesy

1. Pattern of linguistic change It is self-evident that linguistic change takes place in real time. One stage of a language, whether at the phonological, morphological, lexical or syntactic level, is followed by another stage as time moves on. Stage A is followed by stage B, which is followed by stage C, in turn followed by stage D. If an illustration is needed, take the simple example of classical Old English bindan, which becomes late Old English binden, changes into binde in the Middle English period and eventually ends up as bind in Modern English. This is the kind of description we usually get in etymological dictionaries and in the etymological sections of ordinary desk dictionaries: A

B

C

D

Figure 1. Stages of linguistic development.

Such a description is to some extent idealised, in that the different stages normally do not succeed each other in an orderly fashion. There is typically a varying amount of overlap between them, so that an innovation only gradually establishes itself as the standard while the previous standard is being phased out. Stages A and B coexist for a limited but variable time, like, later, B and C, and, later again, C and D. This might be illustrated in the following way:

Göran Kjellmer

290 A B C D Figure 2. Partly overlapping stages of linguistic development.

This situation is normal in any given variant of a language, and it becomes strikingly obvious when the whole of the language with its different variants is considered. Aitchison (2004: 4) describes it in the following way: Sounds and words do not gradually ‘turn into’ one another, as had been assumed. Instead, a new sound or meaning creeps in alongside the old, and coexists, sometimes for centuries. Eventually, the intruder takes over, like a young cuckoo pushing an existing occupant out of the nest. Yet even the young cuckoo idea is now recognized as over-simple. Multiple births – several new forms – may arise, and co-exist for a long time. Then eventually, one is likely to win out.

In any case, linguistic change is clearly essentially diachronic.

2. Panchrony Despite what was remarked in the previous section, there are complications. If stages A, B, C and D, having developed in the language, all show signs of staying on and of being used independently of each other, a new situation arises, one where both diachrony and synchrony are involved. Here we may be entitled to speak of a panchronic situation of linguistic variability and change.1

1

Panchronic is here being used without the pretensions implicit in its definition in the OED: “Pertaining to or designating linguistic study applied to all languages at all stages of their development”. It rather refers to the

Panchrony in Linguistic Change: The Case of Courtesy

291

In this context the case of the English word courtesy could be of some interest. In order to study courtesy in Modern English I made use of the CobuildDirect corpus, which comprises c. 57 million words. There are 542 occurrences of the singular form courtesy and 22 of the plural form courtesies in the corpus when duplicates have been removed.

3. Development of courtesy In the following, four stages in the development of courtesy will be distinguished. In the first stage, where courtesy is a regular common noun, countable and uncountable with a wide range of collocations and often used as a premodifier, it has the positive core meaning of ‘(sign of) consideration, politeness, respect, generosity’. In the second, it tends to occur without a determiner as a frozen phrase, by courtesy (of), where the positive element of the core meaning often disappears. In the third, by is omitted, and the remaining phrase, courtesy of, becomes a compound preposition neutrally indicating the source or cause of something. The last phase of its development shows courtesy as a one-word preposition with a neutral referencing function.

3.1. First stage Courtesy is a French loanword (< Old French corteisie), which was introduced into English in Middle English times. Its meaning was ‘Courteous behaviour; courtly elegance and politeness of manners; graceful politeness or considerateness in intercourse with others.’ (OED, courtesy 1.a), or (as a quality) ‘Courteous disposition; courteousness; also nobleness, generosity, benevolence, goodness’ (obs.) (OED, courtesy 2.a). The word is current in Modern English, and its core meaning has changed very little from Middle English simultaneous application of a synchronic and a diachronic approach, as used in Persson (1993), for example.

Göran Kjellmer

292

times, ‘(sign of) consideration, politeness, respect, generosity’. The Corpus material shows it to be used both as an uncountable and as a countable noun, as in the following sentences:2 Uncountable noun (1)

They listened with courtesy and an air of mild interest ... (ukbooks/08 Text: B0000000026)

Countable noun, singular and plural (2)

In Middle East heat, foot washing was a physical necessity and a common courtesy offered to guests at feasts, usually carried out by a servant on both men and women. (times/10. Text: N2000960405)

(3)

It’s the little courtesies that keep a marriage going. (times/10. Text: N2000951224)

It is much more frequent as an uncountable than as a countable noun (statistics will be given later), and it occurs in a variety of collocations, of which the most notable may be: out of courtesy common courtesy have the courtesy to treat sb. with courtesy extend the courtesy to sb. do/give/grant/show/offer sb. the courtesy to/of. Like most other nouns, courtesy can also be used in premodifier function: courtesy call, courtesy visit, courtesy title; in this function its meaning becomes slightly extended as ‘(supplied, esp. for use) free of charge, as a courtesy’ (OED s.v. courtesy 13, “chiefly U.S”), as in courtesy car, courtesy coach, courtesy phone, courtesy van service, etc. A fairly frequent pattern in which courtesy occurs is DETERMINER + courtesy + PREPOSITION:

2

In all the examples, the underlining is mine.

Panchrony in Linguistic Change: The Case of Courtesy

293

(4)

But, again, while stressing his courtesy towards the Queen, he does say that the question of Prince Charles’ succession ‘must be more of an open question than it has ever been’ [...] (ukmags/03. Text: N0000000444)

(5)

We are privileged to present the accompanying block by the courtesy and kindness of the Editor and proprietors of the Glasgow News, and also the particulars in this paragraph. (ukephem/02. Text: E0000002202)

3.2. Second stage A certain blocklike quality begins to emerge when the determiner is omitted, whether or not the head has a prepositional postmodifier; the OED records such articleless phrases from the 17th century. Modern examples are: (6)

People in Afghanistan, after they retired, always retained, by courtesy, the highest title they had held during their careers. (ukbooks/08. Text: B0000000124)

(7)

Access to the lochside can be gained by courtesy of the Loch Arthur Community. (ukbooks/08. Text: B0000000702)

(8)

The Ayatollah Khomeini, having settled at Neauphle-le-Chateau, outside Paris, by courtesy of President Giscard d’Estaing, was sending ever more inflammatory messages to the religious masses in Iran. (ukbooks/08. Text: B0000000888)

The meaning is essentially the same in these last examples as we have seen before (‘politeness, consideration’). However, the positive interpretation of by courtesy of becomes less obvious in cases like (9): (9)

Darkie, sorry Darcus, claims he is qualified to share a platform with Brother Zinzun having watched the entire uprising by courtesy of CNN reports relayed to the TV set in his hotel room [...] (ukmags/03. Text: N0000000571)

We do not expect CNN reports to show signs of consideration or generosity. The phrase is rather a polite way of referring to the source

Göran Kjellmer

294

of the material where much of the positive connotation of the phrase is gone. By courtesy of seems to display a semantic change not unlike that of thanks to; in both cases their original positive connotations can disappear and be replaced by neutral or even negative connotations, as in the following: Positive (10)

A Sunday roast wouldn’t be the same without a hearty helping of Yorkshire pudding. And now, thanks to Findus, you don’t have to wait for Sunday to come around. [‘with the help of’] (ukmags/03. Text: N0000000103)

Neutral or negative (11)

Their success faded after another two years, thanks to waning public interest in silver space bellbottoms and problems caused by jealousies within the band. [‘owing to’] (ukmags/03. Text: N0000000387)

3.3. Third stage As the next stage in the development of courtesy, a more reduced phrase then appears in the form of courtesy of. This phrase is different both formally and semantically from the courtesy of, where courtesy is the head of a noun phrase as in (12a), whereas in courtesy of it has become part of a compound preposition introducing an adverbial phrase as in (12b): (12a)

the problem here is that if you are going to show somebody the courtesy of listening to their advice they ain’t half going to be pissed off afterwards if you choose not to follow it. (ukspok/04. Text: S9000001538)

(12b) Thanks to Hughes’ tireless work in the toughest of environments, though, thousands of youngsters have been saved from a life of crime and learned to be disciplined courtesy of boxing. (sunnow/17. Text: N9119980501)

It is significant that courtesy of, unlike the courtesy of, is often preceded by a punctuation mark in the corpus: a comma, a dash or a parenthesis, signalling the beginning of the adverbial constituent. It is

Panchrony in Linguistic Change: The Case of Courtesy

295

also worth noticing that a great many of the courtesy of phrases are followed by the name of a company or an organisation. Courtesy of often has positive connotations: (13)

Meeting Venus is a splendid movie, punctuated by some of the sweetest singing, courtesy of Dame Kiri Te Kanawa’s voice-overs. (today/11. Text: N6000920327)

(14)

“[...] They were organised, disciplined and deserved the victory.” It came courtesy of a 36th-minute goal. (sunnow/17. Text: N9119980406)

The clause preceding courtesy of frequently expresses something positive whose source or cause is identified by means of the phrase courtesy of. There is thus a semantic element remaining from the original positive meaning (if someone has the courtesy to do something, that something is perceived as something positive). But that this element has become bleached and is giving way to a more neutral meaning is shown by the fact that courtesy of can also refer to something negative in the context:3 (15)

An Everton defeat today – courtesy of arch-enemies Arsenal – will mean Klinsmann can indulge himself in a celebration party against Southampton next Sunday. (sunnow/17. Text: N9119980503)

(16)

But he was hampered at Edgbaston by a groin strain and [...] by a badly bruised big toe, courtesy of Pakistani pace ace Waqar Younis in England’s second innings. (today/11. Text: N6000920704)

(17)

Set in Melbourne’s suburban wastelands this film involves some desperate young adults, portrayed by Aden Young (disfigured by a harelip courtesy of make-up) [...] (oznews/01. Text: N5000950525)

3

In the British National Corpus the following sentence occurs: “The headache came courtesy of a gash in the scalp suffered in the thrilling 1 - 0 win over Aston Villa and from the celebrations that followed Norwich going back on top of the Premier League” (HJ3 4521).

Göran Kjellmer

296

In such cases – admittedly fewer than the ones with positive connotations – courtesy of just refers neutrally to the cause or source of the event or circumstance just mentioned.

3.4. Fourth stage The last step in its development comes when courtesy of disposes of the preposition and itself becomes a one-word preposition denoting source or cause.4 (18)

TAKE it easy with a pleasant Sunday afternoon of music, song and fun, courtesy Caloundra Chorale and Theatre Company on August 6 at 2pm (oznews/01. Text: N5000950730)

(19)

MARLBORO invest in the infrastructure of motor racing and, courtesy of mclaren, they gave three promising young drivers a taste at Silverstone that summer of 1983: Brundle, Stefan Bellof, and Senna. (ukbooks/08. Text: B0000000807)

(20)

Cover and map photos courtesy Santa Barbara Conference and Visitors Bureau (usephem/05. Text: E9000000287)

(21)

Elsewhere, Barri licks over a standard ‘Murderer’ lyric, [...] speeds up the bogle and adds a world music edge courtesy Sly Dunbar [...] (ukmags/03. Text: N0000000861)

As used here, courtesy has become a fully-fledged one-word preposition. Quirk et al. (1985: 658) give three negative criteria for central prepositions. Such prepositions “cannot have as a complement (i) a that-clause (ii) an infinitive clause (iii) a subjective case form of a personal pronoun”

4

It is unlikely that the same result could have come about by the loss of the preposition in by courtesy, as that phrase, far less frequent than courtesy of, never seems to be used prepositionally (*by courtesy President Giscard d’Estaing).

Panchrony in Linguistic Change: The Case of Courtesy

297

Applying those criteria to courtesy, we do not find examples like: (i) *She passed the exam courtesy that she had studied intensely. (ii) *She passed the exam courtesy to have studied intensely. (iii) *She passed the exam courtesy I. It may be noted that courtesy is not the only noun to have changed into a preposition; the Old English noun dún ‘hill’ was used adverbially in ofdúne ‘from the hill, downwards’, which via Middle English adoun became Modern English down, adverb and also preposition (‘down the hill’). The parallelism with courtesy is striking.

4. Grammaticalisation This last stage in the development of courtesy, to become a one-word preposition, is represented by fewer cases in the Corpus than the corresponding pluri- or multi-word phrases. As a preposition courtesy also represents the last stage in its grammaticalisation process. From being an ordinary common noun it has changed semantically, having had restrictions placed upon it both formally and functionally, with courtesy of as a stepping-stone on the way. Semantically, the central politeness/respect/generosity element is almost gone and in any case no longer obligatory; the typical meaning it conveys is a neutral but polite one of reference. Formally, it is condensed to the one invariable form, courtesy. And functionally, it has been transformed into a preposition with one single task, that of introducing a postposed reference – it can no longer be shifted around freely (“fixation”). Its grammaticalisation has thus involved a radical change from a variable, adaptable, many-faceted noun into an invariable preposition with just one fixed function (cf. Lehmann 1995: 164). Whether it has also been affected phonologically is difficult to tell from its written manifestations, but as prepositions are normally unstressed, it seems likely that courtesy in its role as a preposition undergoes stress reduction, which may be viewed as yet another restriction.

Göran Kjellmer

298

5. Statistics The occurrences of courtesy in the CobuildDirect corpus are distributed as follows: Type of courtesy Common noun, countable Common noun, uncountable Common noun, countable/uncountable Courtesy as premodifier by courtesy by courtesy of + N Courtesy of + N Courtesy + N TOTAL

N 12 103 51 49 2 8 309 8 542

% 2,2 19 9,4 9 0,3 1,4 57 1,4 100

Table 1. Courtesy in CobuildDirect.

The statistics show that courtesy is fairly frequent as a common noun: 166 cases, 215 if we include its use as a premodifier. What is striking, however, is that its most frequent use is in the prepositional phrase courtesy of, 309 cases, out of which we have good reason to believe the pure preposition courtesy has developed. Its prepositional use is thus by far the most frequent, which is worth noting, if only because it is not even mentioned by the OED. The corpus shows that the most frequent use of courtesy, in a compound preposition, is well established in the variants of English represented in the Corpus, British, American and Australian; it is common in the quality press as well as in the popular press.

6. Panchronic situation If we regard the different variants of courtesy as subsequent stages in a chain of development, as has been done, it is noteworthy that all the links in that chain are still with us in Modern English. It is obvious

Panchrony in Linguistic Change: The Case of Courtesy

299

that Figure 2 will not serve as a schematic representation of that development; it will have to be amended to look something like Figure 3.

A. Common noun

B. By courtesy (of)

C. Courtesy of

D. Courtesy

Figure 3. Completely overlapping stages of linguistic development.

If the steps in the development of courtesy are described in this schematic way, the letters A, B, C and D are used as follows: A: Common noun, positive connotations (“his courtesy towards the Queen”, “old-fashioned courtesies”); B: By courtesy (of), mostly positive connotations (“retained, by courtesy, the highest title they had held”, “by courtesy of CNN reports”); C. Courtesy of, positive > neutral connotations, compound preposition (“Cover photograph courtesy of Gerry Ellis”); D. Courtesy, neutral connotations, one-word preposition (“music, song and fun, courtesy Caloundra Chorale”). It should be noted that although the variants resulting from this development are clearly related historically, they are nevertheless completely independent items in the present-day language. Moreover,

300

Göran Kjellmer

none of them shows signs of obsolescence; they all belong to the current lexical inventory of today’s language. In being intimately related diachronically and fully independent synchronically they might be seen as a good Janus-faced illustration of panchrony in linguistic change.

References Aitchison, Jean 2004. Absolute Disasters: The Problems of Layering. In Gottlieb, Henrik/Mogensen, Jens Erik/Zettersten, Arne (eds) Symposium on Lexicography XI. Proceedings of the Eleventh International Symposium on Lexicography (May 2-4, 2002, University of Copenhagen). Lexicographica, Series Maior. Tübingen: Niemeyer, 1-15. Aston, Guy,/Burnard, Lou 1998. The BNC Handbook. Edinburgh: Edinburgh University Press. British National Corpus, see Aston and Burnard (1998). CobuildDirect Corpus, see Sinclair (1987). Lehmann, Christian 1995. Thoughts on Grammaticalization. München/Newcastle: Lincom Europa. OED = The Oxford English Dictionary, On-line version. Oxford: Clarendon. Persson, Gunnar 1993. Think in a Panchronic Perspective. Studia Neophilologica 65, 3-18. Quirk, Randolph/Greenbaum, Sidney/Leech, Geoffrey/Svartvik, Jan 1985. A Comprehensive Grammar of the English Language. London/New York: Longman. Sinclair, John MacHardy (ed.) 1987. Looking up. An Account of the COBUILD Project in Lexical Computing. London/Glasgow: Collins.

Linguistic Insights Studies in Language and Communication

This series aims to promote specialist language studies in the fields of linguistic theory and applied linguistics, by publishing volumes that focus on specific aspects of language use in one or several languages and provide valuable insights into language and communication research. A cross-disciplinary approach is favoured and most European languages are accepted. The series includes two types of books: – Monographs – featuring in-depth studies on special aspects of language theory, language analysis or language teaching. – Collected papers – assembling papers from workshops, conferences or symposia.

Vol. 1 Maurizio Gotti & Marina Dossena (eds) Modality in Specialized Texts. Selected Papers of the 1st CERLIS Conference. 421 pp. 2001. ISBN 3-906767-10-8. US-ISBN 0-8204-5340-4 Vol. 2 Giuseppina Cortese & Philip Riley (eds) Domain-specific English. Textual Practices across Communities and Classrooms. 420 pp. 2002. ISBN 3-906768-98-8. US-ISBN 0-8204-5884-8 Vol. 3 Maurizio Gotti, Dorothee Heller & Marina Dossena (eds) Conflict and Negotiation in Specialized Texts. Selected Papers of the 2nd CERLIS Conference. 470 pp. 2002. ISBN 3-906769-12-7. US-ISBN 0-8204-5887-2

Vol. 4 Maurizio Gotti, Marina Dossena, Richard Dury, Roberta Facchinetti & Maria Lima Variation in Central Modals. A Repertoire of Forms and Types of Usage in Middle English and Early Modern English. 364 pp. 2002. ISBN 3-906769-84-4. US-ISBN 0-8204-5898-8 Vol. 5 Stefania Nuccorini (ed.) Phrases and Phraseology. Data and Descriptions. 187 pp. 2002. ISBN 3-906770-08-7. US-ISBN 0-8204-5933-X Vol. 6 Vijay Bhatia, Christopher N. Candlin & Maurizio Gotti (eds) Legal Discourse in Multilingual and Multicultural Contexts. Arbitration Texts in Europe. 385 pp. 2003. ISBN 3-906770-85-0. US-ISBN 0-8204-6254-3 Vol. 7 Marina Dossena & Charles Jones (eds) Insights into Late Modern English. 378 pp. 2003. ISBN 3-906770-97-4. US-ISBN 0-8204-6258-6 Vol. 8 Maurizio Gotti Specialized Discourse. Linguistic Features and Changing Conventions. 351 pp. 2003, 2005. ISBN 3-03910-606-6. US-ISBN 0-8204-7000-7 Vol. 9 Alan Partington, John Morley & Louann Haarman (eds) Corpora and Discourse. 420 pp. 2004. ISBN 3-03910-026-2. US-ISBN 0-8204-6262-4 Vol.10 Martina Möllering The Acquisition of German Modal Particles. A Corpus-Based Approach. 290 pp. 2004. ISBN 3-03910-043-2. US-ISBN 0-8204-6273-X

Vol. 11 David Hart (ed.) English Modality in Context. Diachronic Perspectives. 261 pp. 2003. ISBN 3-03910-046-7. US-ISBN 0-8204-6852-5 Vol.12 Wendy Swanson Modes of Co-reference as an Indicator of Genre. 430 pp. 2003. ISBN 3-03910-052-1. US-ISBN 0-8204-6855-X Vol.13 Gina Poncini Discursive Strategies in Multicultural Business Meetings. 338 pp. 2004. ISBN 3-03910-222-2. US-ISBN 0-8204-7003-1 Vol.14 Christopher N. Candlin & Maurizio Gotti (eds) Intercultural Aspects of Specialized Communication. 369 pp. 2004. ISBN 3-03910-352-0. US-ISBN 0-8204-7015-5 Vol.15 Gabriella Del Lungo Camiciotti & Elena Tognini Bonelli (eds) Academic Discourse. New Insights into Evaluation. 234 pp. 2004. ISBN 3-03910-353-9. US-ISBN 0-8204-7016-3 Vol.16 Marina Dossena & Roger Lass (eds) Methods and Data in English Historical Dialectology. 405 pp. 2004. ISBN 3-03910-362-8. US-ISBN 0-8204-7018-X Vol.17 Judy Noguchi The Science Review Article. An Opportune Genre in the Construction of Science. Forthcoming. ISBN 3-03910-426-8. US-ISBN 0-8204-7034-1 Vol.18 Giuseppina Cortese & Anna Duszak (eds) Identity, Community, Discourse. English in Intercultural Settings. 495 pp. 2005. ISBN 3-03910-632-5. US-ISBN 0-8204-7163-1

Vol. 19 Anna Trosborg & Poul Erik Flyvholm Jørgensen (eds) Business Discourse. Texts and Contexts. 250 pp. 2005. ISBN 3-03910-606-6. US-ISBN 0-8204-7000-7 Vol. 20 Christopher Williams Tradition and Change in Legal English. Verbal Constructions in Prescriptive Texts. 216 pp. 2005. ISBN 3-03910-644-9. US-ISBN 0-8204-7166-6 Vol. 21 Katarzyna Dziubalska-Kolaczyk & Joanna Przedlacka (eds) English Pronunciation Models: A Changing Scene. 476 pp. 2005. ISBN 3-03910-662-7. US-ISBN 0-8204-7173-9 Vol. 22 Christián Abello-Contesse, Rubén Chacón-Beltrán, M. Dolores López-Jiménez & M. Mar Torreblanca-López (eds) Age in L2 Acquisition and Teaching. 214 pp. 2006. ISBN 3-03910-668-6. US-ISBN 0-8204-7174-7 Vol. 23 Vijay K. Bhatia, Maurizio Gotti, Jan Engberg & Dorothee Heller (eds) Vagueness in Normative Texts. 474 pp. 2005. ISBN 3-03910-653-8. US-ISBN 0-8204-7169-0 Vol. 24 Paul Gillaerts & Maurizio Gotti (eds) Genre Variation in Business Letters. 407 pp. 2005. ISBN 3-03910-674-0. US-ISBN 0-8204-7552-1 Vol. 25 Ana María Hornero, María José Luzón & Silvia Murillo (eds) Corpus Linguistics. Applications for the Study of English. 526 pp. 2006. ISBN 3-03910-675-9 / US-ISBN 0-8204-7554-8 Vol. 26 J. Lachlan Mackenzie & María de los Ángeles Gómez-González (eds) Studies in Functional Discourse Grammar. 259 pp. 2005. ISBN 3-03910-696-1 / US-ISBN 0-8204-7558-0

Vol. 27 Debbie Guan Eng Ho Classroom Talk. Exploring the Sociocultural Structure of Formal ESL Learning. --- pp. 2006. ISBN 3-03910-761-5 / US-ISBN 0-8204-7561-0 Vol. 28 Forthcoming. Vol. 29 Francesca Bargiela-Chiappini & Maurizio Gotti (eds) Asian Business Discourse(s). 350 pp. 2005. ISBN 3-03910-804-2 / US-ISBN 0-8204-7574-2 Vol. 30 Nicholas Brownlees (ed.) News Discourse in Early Modern Britain. Selected Papers of CHINED 2004. 300 pp. 2006. ISBN 3-03910-805-0 / US-ISBN 0-8204-8025-8 Vol. 31 Roberta Facchinetti & Matti Rissanen (eds) Corpus-based Studies of Diachronic English. 300 pp. 2006. ISBN 3-03910-851-4 / US-ISBN 0-8204-8040-1 Vol. 32 Marina Dossena & Susan M. Fitzmaurice (eds) Business and Official Correspondence: Historical Investigations. 209 pp. 2006. ISBN 3-03910-880-8 / US-ISBN 0-8204-8352-4

Editorial address: Prof. Maurizio Gotti

Università di Bergamo, Facoltà di Lingue e Letterature Straniere, Via Salvecchio 19, 24129 Bergamo, Italy Fax: 0039 035 2052789, E-Mail: [email protected]

David Hart (ed.)

English Modality in Context Diachronic Perspectives Bern, Berlin, Bruxelles, Frankfurt am Main, New York, Oxford, Wien, 2004. 261 pp., num. ill. and tables Linguistic Insights. Studies in Language and Communication. Vol. 11 Edited by Maurizio Gotti ISBN 3-03910-046-7 / US-ISBN 0-8204-6852-5 pb. sFr. 70.– / €* 48.30 / €** 45.10 / £ 29.– / US-$ 53.95 * includes VAT – only valid for Germany and Austria ** does not include VAT

This volume presents a collection of papers which consider the phenomenon of modality in the context of English historical linguistics, in particular as a consequence of changes taking place at the beginning of the Early Modern period. The contributions, representing post-Lightfoot thinking, consider semantic and pragmatic approaches to the question in a generally corpus-based approach. It is essentially a review of modal forms in use, whether they be central or marginal verbal forms or the non-verbal forms which are available in English. Contents: David Hart: Introduction – Olga Fischer: The Development of the Modals in English: Radical Versus Gradual Changes – Debra Ziegeler: On the Generic Origins of Modality in English – Rafał Molencki: What Must Needs Be Explained About Must Needs – Arja Nurmi: Youe shall see I will conclude in it: Sociolinguistic Variation of WILL/WOULD and SHALL/SHOULD in the Sixteenth Century – Maurizio Gotti: Pragmatic Uses of Shall and Will for Future Time Reference in Early Modern English – Gabriella Mazzon: Modality in Middle English Directive/ Normative Texts – Marina Dossena: Hedging in Late Middle English, Older Scots and Early Modern English: the Case of SHOULD and WOULD – Vanda Polese: Semantic and Pragmatic Shades of Modal Meaning in Utopia. The Editor: David Hart is Associate Professor of English in the Department of Linguistics at the University of Rome Three. He has coordinated the research into aspects of Early Modern English modality, supported by the Italian Ministry of Education. He teaches and researches in the History of the English Language, and is particularly interested in questions relating to word formation and to pragmatic aspects of the sixteenth century theatre in England.

PETER LANG Bern · Berlin · Bruxelles · Frankfurt am Main · New York · Oxford · Wien

Ana María Hornero / María José Luzón / Silvia Murillo (eds)

Corpus Linguistics Applications for the Study of English Bern, Berlin, Bruxelles, Frankfurt am Main, New York, Oxford, Wien, 2006. 526 pp. Linguistic Insights. Studies in Language and Communication. Vol. 25 Edited by Maurizio Gotti ISBN 3-03910-675-9 / US-ISBN 0-8204-7554-8 pb. sFr. 118.– / €* 81.30 / €** 76.– / £ 53.20 / US-$ 90.95 * includes VAT – only valid for Germany and Austria ** does not include VAT

The aim of this volume is to present a state-of-the-art view on corpus studies. This collection of papers, presented at the XII Susanne Hübner Seminar in November 2003 at the University of Zaragoza, comprises both quantitative and qualitative analyses and studies on both written and oral corpora. Structured in seven sections, the book covers a wide range of approaches and methodologies and reflects current linguistic research. The papers have been written by scholars from a large number of universities, mainly from Europe, but also from the USA and Asia. The volume offers contributions on diachronic studies, pragmatic analyses and cognitive linguistics, as well as on translation and English for Specific Purposes. The book includes several papers on corpus design and reports on research on oral corpora. At a more specific level, the papers analyse aspects such as politeness issues, dialectology, comparable corpora, discourse markers, the expression of evidentiality and writer stance, metaphor and metonymy, conditional sentences, evaluative adjectives, delexicalised verbs and nominalization. With contributions by: Ana María Hornero – María José Luzón – Silvia Murillo – Terttu Nevalainen – Laurel Smith Stvan – Keiko Abe – Antonio Pinna – Carmen Santamaría-García – Laura Hidalgo – Juana I. Marín – Elena Martínez – Silvia Molina – Olga Isabel Díez – Carlos Inchaurralde – Josep Marco – Brian Mott – Ma Pilar Navarro – Noelia Ramón – Patricia Rodríguez – Rosa Lorés – Sonia Oliver del Olmo – Carmen Pérez-Llantada – Ignacio Vázquez – Isabel Verdaguer – Natalia Judith Lasso – Esther Asprey – Lourdes Burbano – Kate Wallace – Carmen Valero – Paula García – Nancy Drescher – Javier Pérez-Guerra – Ma Dolores Ramirez.

PETER LANG Bern · Berlin · Bruxelles · Frankfurt am Main · New York · Oxford · Wien

Maurizio Gotti / Marina Dossena / Richard Dury / Roberta Facchinetti / Maria Lima

Variation in Central Modals A Repertoire of Forms and Types of Usage in Middle English and Early Modern English Bern, Berlin, Bruxelles, Frankfurt am Main, New York, Oxford, Wien, 2002. 364 pp. Linguistic Insights. Studies in Language and Communication. Vol. 4 General Editor: Maurizio Gotti ISBN 3-906769-84-4 / US-ISBN 0-8204-5898-8 pb. sFr. 89.– / €* 61.40 / €** 57.40 / £ 41.– / US-$ 68.95 * includes VAT – only valid for Germany and Austria ** does not include VAT

This volume presents the results of a research team of the University of Bergamo, whose aim was the analysis of verbal modality in the Helsinki corpus. This corpus includes a large selection of texts compiled in Middle English and Early Modern English and offers a good diatypic coverage, as it contains a wide range of text-types, genres and registers. Within a common methodological framework, individual chapters measure and analyze the occurrence and semantic values of central modal verbs, relating them to such parameters as text type, speech-relatedness and pragmatic function. This research project is part of a wider national project aiming to register and comment on the formal variety of modal manifestations and their relative frequency in a range of texts covering approximately four centuries, from about 1300 to 1700. Contents: English Linguistics – History of the English Language – Modal Verbs – Dynamic, Epistemic, Deontic Modality and Meanings – Verb Phrase Structure. The Authors: Maurizio Gotti is Professor of English at the University of Bergamo (Faculty of Foreign Languages and Literatures). Marina Dossena is Associate Professor of English at the University of Bergamo (Faculty of Foreign Languages and Literatures). Richard Dury is Associate Professor of English at the University of Bergamo (Faculty of Literature and Philosophy). Roberta Facchinetti is Associate Professor of English at the University of Verona (Faculty of Education). Maria Lima is Associate Professor of English at the University of Salerno (Faculty of Political Science).

PETER LANG Bern · Berlin · Bruxelles · Frankfurt am Main · New York · Oxford · Wien

UG LI 31.p65

1

○○○○○○○○○○○○○○○○○○○○○○○○○○○○

Corpus-based Studies of Diachronic English

Corpus-based studies of diachronic English have been thriving over the last three decades to such an extent that the validity of corpora in the enrichment of historical linguistic research is now undeniable. The present book is a collection of papers illustrating the state of the art in corpus-based research on diachronic English, by means of case-study expositions, software presentations, and theoretical discussions on the topic. The majority of these papers were delivered at the 25th Conference of the International Computer Archive of Modern and Medieval English” (ICAME), held at the University of Verona on 18-23 May 2004. A number of typological and geographical varieties of English are tackled in the book: from general to specialized English, from British to Australian English, from written to speech-related registers. In order to discuss their tenets, the contributors draw on corpora and dictionaries from different centuries, including the most recent ones; hence, they testify to the fact that past and present are so strongly interlocked and so inextricably entwined that it proves hard – if not preposterous – to fully understand Present-day English structure and features without turning back to the previous centuries for an indepth knowledge of the ‘whys’ and ‘hows’ of the current state of the art.

li31

Linguistic Insights Studies in Language and Communication

Roberta Facchinetti & Matti Rissanen (eds)

Corpus-based Studies of Diachronic English

Peter Lang

Roberta Facchinetti is Professor of English at the University of Verona, Italy. Her research field and publications are mainly concerned with language description, textual analysis and pragmatics. This is done mostly by means of computerized corpora of both synchronic and diachronic English. Matti Rissanen is Emeritus Professor of English Philology at the University of Helsinki and a team leader in the Research Unit for the Study of Variation, Contacts and Change in English, at the same university. His research interests include long-term diachronic development of English syntax and grammatical vocabulary and the compilation of historical corpora.

○○○○○○○○○○○○○○○○○○

ISBN 3-03910-851-4

31

Roberta Facchinetti & Matti Rissanen (eds) •

○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○○

li31

li

16.02.2006, 13:13

Verbal Constructions in Prescriptive Texts