Voices Past and Present - Studies of Involved, Speech-related and Spoken Texts: In Honor of Merja Kytö 9027207658, 9789027207654


English Pages 362 [364] Year 2020


Table of contents
List of contributors
Foreword • Jonathan Culpeper
1 Voices of English: Tapping into records past and present • Ewa Jonsson and Tove Larsson
Part I. Early Modern English
2 Pragmatic noise in Shakespeare’s plays • Jonathan Culpeper and Samuel J. Oliver
3 Keywords that characterise Shakespeare’s (anti)heroes and villains • Dawn Archer and Alison Findlay
4 Revealing speech: Agentivity in Iago’s and Othello’s soliloquies • Juhani Rudanko
5 Saying, crying, replying, and continuing: Speech reporting expressions in Early Modern English • Terry Walker and Peter J. Grund
6 Interjections in early popular literature: Stereotypes and innovation • Irma Taavitsainen
7 Godly vocabulary in Early Modern English religious debate • Jeremy J. Smith
8 Patterns of reader involvement on sixteenth-century English title pages, with special reference to second-person pronouns • Matti Peikola
Part II. Late Modern English
9 Epistemic adverbs in the Old Bailey Corpus • Claudia Claridge
10 Question strategies in the Old Bailey Corpus • Patricia Ronan
11 Sure in Irish English: The diachrony of a pragmatic marker • Raymond Hickey
12 American English gotten: Historical retention, change from below, or something else? • Lieselotte Anderwald
Part III. Present-day English
13 Explaining explanatory so • David Denison
14 Return to the future: Exploring spoken language in the BNC and BNC2014 • Ylva Berglund Prytz
15 Sort of and kind of from an English-Swedish perspective • Karin Aijmer
16 From yes to innit: Origin, development and general characteristics of pragmatic markers • Anna-Brita Stenström
17 “If anyone would have told me, I would have not believed it”: Using corpora to question assumptions about spoken vs. written grammar in EFL grammars and other normative works • Sarah Schwarz and Erik Smitterberg
18 Intensification in dialogue vs. narrative in a corpus of present-day English fiction • Signe Oksefjell Ebeling and Hilde Hasselgård
19 Orality on the searchable web: A comparison of involved web registers and face-to-face conversation • Douglas Biber and Jesse Egbert
Select list of publications by Merja Kytö
Index


Voices Past and Present – Studies of Involved, Speech-related and Spoken Texts
In honor of Merja Kytö

Edited by Ewa Jonsson and Tove Larsson

Studies in Corpus Linguistics 97
John Benjamins Publishing Company


Studies in Corpus Linguistics (SCL) issn 1388-0373

SCL focuses on the use of corpora throughout language study, the development of a quantitative approach to linguistics, the design and use of new tools for processing language texts, and the theoretical implications of a data-rich discipline. For an overview of all books published in this series, please see benjamins.com/catalog/scl

General Editor
Ute Römer, Georgia State University

Founding Editor
Elena Tognini-Bonelli, The Tuscan Word Centre/University of Siena

Advisory Board
Laurence Anthony, Waseda University
Antti Arppe, University of Alberta
Michael Barlow, University of Auckland
Monika Bednarek, University of Sydney
Tony Berber Sardinha, Catholic University of São Paulo
Douglas Biber, Northern Arizona University
Marina Bondi, University of Modena and Reggio Emilia
Jonathan Culpeper, Lancaster University
Sylviane Granger, University of Louvain
Stefan Th. Gries, University of California, Santa Barbara
Susan Hunston, University of Birmingham
Michaela Mahlberg, University of Birmingham
Anna Mauranen, University of Helsinki
Andrea Sand, University of Trier
Benedikt Szmrecsanyi, Catholic University of Leuven
Elena Tognini-Bonelli, The Tuscan Word Centre/University of Siena
Yukio Tono, Tokyo University of Foreign Studies
Martin Warren, The Hong Kong Polytechnic University
Stefanie Wulff, University of Florida


Edited by
Ewa Jonsson, Uppsala University
Tove Larsson, Uppsala University / University of Louvain

John Benjamins Publishing Company
Amsterdam / Philadelphia


The paper used in this publication meets the minimum requirements of the American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ansi z39.48-1984.

Cover design: Françoise Berserik Cover illustration from original painting Random Order by Lorenzo Pezzatini, Florence, 1996.

doi 10.1075/scl.97
Cataloging-in-Publication Data available from Library of Congress: lccn 2020030886 (print) / 2020030887 (e-book)
isbn 978 90 272 0765 4 (Hb)
isbn 978 90 272 6064 2 (e-book)

© 2020 – John Benjamins B.V. No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher. John Benjamins Publishing Company · https://benjamins.com


List of contributors

Karin Aijmer, University of Gothenburg, Sweden
Lieselotte Anderwald, University of Kiel, Germany
Dawn Archer, Manchester Metropolitan University, United Kingdom
Ylva Berglund Prytz, University of Oxford, United Kingdom
Douglas Biber, Northern Arizona University, United States of America
Claudia Claridge, University of Augsburg, Germany
Jonathan Culpeper, Lancaster University, United Kingdom
David Denison, University of Manchester, United Kingdom
Jesse Egbert, Northern Arizona University, United States of America
Alison Findlay, Lancaster University, United Kingdom
Peter Grund, University of Kansas, United States of America
Hilde Hasselgård, University of Oslo, Norway
Raymond Hickey, University of Duisburg and Essen, Germany
Ewa Jonsson, Uppsala University, Sweden
Tove Larsson, Uppsala University, Sweden / University of Louvain, Belgium
Signe Oksefjell Ebeling, University of Oslo, Norway
Samuel J. Oliver, Lancaster University, United Kingdom
Matti Peikola, University of Turku, Finland
Patricia Ronan, Technical University of Dortmund, Germany
Juhani Rudanko, Tampere University, Finland
Sarah Schwarz, Uppsala University, Sweden
Jeremy J. Smith, University of Glasgow, United Kingdom
Erik Smitterberg, Uppsala University, Sweden
Anna-Brita Stenström, University of Bergen, Norway
Irma Taavitsainen, University of Helsinki, Finland
Terry Walker, Mid Sweden University, Sweden

Foreword Jonathan Culpeper

Merja Kytö, professor at Uppsala University (Sweden), where she has been since 1995, is unassuming. She does not adopt the airs and graces of an academic of considerable international repute: of somebody who is the author or editor of over 20 books (see the select list of publications at the end of this volume); who received the Torgny Segerstedt Medal in 2012 and the Gustaf Adolf Gold Medal in 2019 for outstanding work in English linguistics; who has served as Head of her Department; or, above all, who has pioneered so many areas of English linguistics. It is difficult to overestimate the impact of the corpus-based approach on linguistics generally, and on the history of English in particular. Merja was an early pioneer, as we see from her co-edited collection, Corpus Linguistics, Hard and Soft (1988). She was part of the team led by the late (and great) Matti Rissanen which created the Helsinki Corpus of English Texts, the first fully computer-searchable historical corpus of note. Amounting to some 1.6 million words in total, it represents a range of registers from Old and Middle English to Early Modern English. Research such as the 1993 collection edited by Rissanen, Merja and Palander-Collin flowed from this corpus, establishing Merja as one of the founders of historical corpus linguistics. The fact that the texts in this corpus were grouped according to register is important: register reflects one of Merja’s enduring research interests (2019 saw the publication of her journal article on register in historical linguistics). It was in the early period of her research that Merja became a leading authority on the English of the early North American colonies (she remains the go-to authority on Early American English, writing the entry on that topic in the 2020 Cambridge Handbook of World Englishes). She spent a period at Yale University, studying and collating early letters, diaries and other documents.
Historical sociolinguistics had just started to gain ground, and Merja’s contribution to it was her 1991 monograph, Variation and Diachrony, with Early American English in Focus, which has become a touchstone for anybody writing about the development of the modal verbs in English. Note that this book rests on a foundation of manual labour: a computer cannot collect data by itself. Much of Merja’s career has been spent engaging with manuscripts, doing the legwork of painstaking scholarship. Notable manuscript projects include her work with Terry Walker and Peter Grund investigating early modern witness depositions: the nature of this genre, its socio-historical and legal background, and its language. They not only released the data to the scholarly community but also published a book on it in 2011: Testifying to Language and Life in Early Modern England: Including a CD-ROM Containing An Electronic Text Edition of Depositions 1560–1760 (ETED).

Merja’s work on witness depositions was partly driven by her interest in the spoken interaction of the past. That is where there is a particular intersection between Merja’s interests and mine. We first met in 1996 at a hotel in Lancaster, UK. That meeting had been triggered by the late (and also great) Geoffrey Leech, who knew that we had this interest in common. In the meeting, we determined to work together to build a corpus of speech-related material, a necessary first step, as there was no single collection of such material (or even much thought given to what might constitute speech-related material). Eventually, in 2006, A Corpus of English Dialogues 1560–1760 (CED) emerged. It is a 1.2-million-word computerized corpus of Early Modern English speech-related texts (see , or for even more detail, Kytö & Walker 2006). That corpus led to our 2010 monograph, Early Modern English Dialogues: Spoken Interaction as Writing. Interestingly, on the whole, few features turned out to be distinctive of spoken interaction alone; register was almost always the more powerful factor. Merja’s work on early modern spoken interaction spurred her to develop further lines of inquiry. Speech-related texts also feature in A Corpus of Nineteenth-Century English (CONCE), which she created with Juhani Rudanko and Erik Smitterberg (see Kytö, Rudanko & Smitterberg 2000). Furthermore, her experience with dialogue heralded a move into historical pragmatics. One might note, for example, her 2020 book, co-edited with Claudia Claridge, Punctuation in Context – Past and Present Perspectives, which fills a gaping hole in research.

The picture I have painted of Merja’s research is far from comprehensive. Aside from specific research avenues, Merja has contributed to English language scholarship generally, a noteworthy example being the excellent Cambridge Handbook of English Historical Linguistics (Kytö & Pahta 2016). And all these accomplishments have been achieved with good humour and zest for life. I still chuckle when I think, for instance, of Merja discovering that one of our early candidate texts for the dialogues corpus was, one might say, ‘X-rated’, or of her reaction when a conference chair introducing one of our papers muddled the word ‘landmarks’ and instead referred to our works as great ‘landmines’ in the history of English scholarship (we refrained from laughing during the paper, but were mightily amused afterwards). Merja shows us by her own example that monumental achievements in academia are possible through dedication, yet without loss of fun. This volume rightly celebrates her pioneering, inspiring achievements, the end of which we certainly have not seen.

https://doi.org/10.1075/scl.97.for © 2020 John Benjamins Publishing Company

References
Claridge, C. & Kytö, M. (eds). 2020. Punctuation in Context – Past and Present Perspectives. Bern: Peter Lang.
Culpeper, J. & Kytö, M. 2010. Early Modern English Dialogues: Spoken Interaction as Writing. Cambridge: CUP.
Kytö, M. 1991. Variation and Diachrony, with Early American English in Focus. Frankfurt: Peter Lang.
Kytö, M. 2019. Register in historical linguistics. Register Studies 1(1): 136–167.
Kytö, M. 2020. English in North America. In The Cambridge Handbook of World Englishes, D. Schreier, M. Hundt & E. W. Schneider (eds), 160–184. Cambridge: CUP.
Kytö, M. & Pahta, P. (eds). 2016. The Cambridge Handbook of English Historical Linguistics. Cambridge: CUP.
Kytö, M. & Walker, T. 2006. Guide to A Corpus of English Dialogues 1560–1760 [Studia Anglistica Upsaliensia 130]. Uppsala: Acta Universitatis Upsaliensis.
Kytö, M., Grund, P. J. & Walker, T. 2011. Testifying to Language and Life in Early Modern England: Including a CD-ROM Containing An Electronic Text Edition of Depositions 1560–1760 (ETED). Amsterdam: John Benjamins.
Kytö, M., Ihalainen, O. & Rissanen, M. (eds). 1988. Corpus Linguistics, Hard and Soft: Proceedings of the Eighth International Conference on English Language Research on Computerised Corpora. Amsterdam: Rodopi.
Kytö, M., Rudanko, J. & Smitterberg, E. 2000. Building a bridge between the present and the past: A corpus of 19th-century English. ICAME Journal 24: 85–98.
Rissanen, M., Kytö, M. & Palander-Collin, M. (eds). 1993. Early English in the Computer Age: Explorations through the Helsinki Corpus [Topics in English Linguistics 11]. Berlin: Mouton de Gruyter.

Chapter 1

Voices of English: Tapping into records past and present

Ewa Jonsson and Tove Larsson

Uppsala University / University of Louvain

This edited volume offers a selection of diachronic and synchronic corpus-linguistic studies of features of involvement in spoken, speech-related and interactive written texts in English. Broadly expressive of the state of the art in the field, the volume is intended to recognize and celebrate the groundbreaking contributions of Professor Merja Kytö in making available speech-related and interactive written material for corpus-linguistic research and leading the way in its exploration. The volume spans the time period between Early Modern English and Present-day English and is made up of original empirical investigations exploring variation and change in ‘involved’ texts (i.e. texts in which speakers’ or authors’ voices are evident), from pragmatic noise (such as mhm, ah, ha) in Shakespearean drama dialogues to features of orality in texts on the searchable web. All chapters present the voice of the speakers/authors by ‘tapping into’ the material to discern and describe oral or involved features. The volume thus enables voices long gone to be heard, along with voices of personal involvement in contemporary texts. As pointed out by Stubbs (1996), Linell (2005) and Culpeper and Kytö (2010), there has generally been a written-language bias in linguistic research, particularly in historical accounts, despite the near-unanimous agreement on the primacy of speech as the driving force in the evolution of language(s). In addition, the kind of writing studied has traditionally been ‘authoritative texts’ that are highly valued in society and controlled by individuals and groups with power, such as legal texts, academic texts and imaginative literature (Stubbs 1996; Culpeper & Kytö 2010). 
As Culpeper and Kytö (2010: 403, original emphasis) point out, this is unfortunate, as it cannot safely be assumed that such authoritative texts represent language as a whole: “[a]fter all, the bulk of language – even more so historically – is comprised of spoken interaction, and spoken interaction is generally not well (re)presented in authoritative texts”. Rather, spoken and speech-related data are integral to our understanding of language, and these kinds of data deserve more attention in language description.

https://doi.org/10.1075/scl.97.01jon © 2020 John Benjamins Publishing Company

In order to counteract the written-language bias and to promote register awareness in historical linguistics as well, linguists have worked tirelessly to compile and make available ‘involved’ and speech-related data for corpus-linguistic research. Examples include the depositions and private letters of the Helsinki Corpus of English Texts and the trial proceedings constituting the Old Bailey Corpus. In addition to her pioneering work on the Helsinki Corpus of English Texts, Merja Kytö has been a driving force behind a number of other significant corpus projects involving speech-related texts, including A Corpus of Nineteenth-Century English, A Corpus of English Dialogues 1560–1760, A Representative Corpus of Historical English Registers and An Electronic Text Edition of Depositions 1560–1760. In a vast number of publications, a select list of which is appended to this volume, she has put these and other corpora to excellent use and, in doing so, demonstrated the significance of the register parameter. The increased availability of speech-related texts and more diverse historical language data has brought a number of advantages. It enables language historians, just like linguists studying contemporary texts, to carry out stratified, register-based sampling from the corpora. It also allows us to track language change more accurately, as many new features first appear in spoken language (Smitterberg 2016, forthcoming). Moreover, it helps us counteract the underlying bias of ‘authoritative’ historical texts, namely their focus on the language of literate, mostly male, authors from the upper echelons of society.
Many speech-related texts, such as scribal records of witness depositions and trial proceedings, represent language from a cross-section of society. They thus provide access to speaker groups generally underrepresented in the authoritative texts, namely women, illiterate speakers and lower-class speakers, as several of the contributions in this volume will show. Studies of contemporary English have also benefited greatly from the increased availability of carefully compiled corpora, and interest in spoken, speech-like and involved features of English has been growing (e.g. Biber 2015; Aijmer 2018). However, the line between written and spoken production is blurred in Present-day English as well. This parallels the written-to-be-spoken dialogues of Early Modern English plays and the records of speech in Late Modern English court transcriptions. For example, with the emergence of new means of communication, certain types of written material, in particular online, exhibit a higher degree of ‘orality’ than many spoken registers (cf. Jonsson 2015). This goes to show that involvement is determined by the presence of a(n imagined) discourse participant rather than by the mode of production itself, a point this volume serves to underscore.




In light of this, the volume offers a collection of original articles representing corpus-based research on involvement in historical and present-day data, reflecting the growing interest in speech and speech-related language enabled by the increased availability of such material, thereby enabling ‘voices past and present’ to be heard. The volume brings together the perspectives of international experts from a range of related fields, including historical (socio)pragmatics, historical manuscript studies, contemporary discourse pragmatics and register studies, exploring aspects such as grammaticalization, lexical, syntactic and stylistic variation, semantic roles and metadiscourse. Following this introductory chapter, the volume is divided into three parts by language period: Early Modern English, Late Modern English and Present-day English. The summaries below introduce the chapters by period. Part I, Early Modern English, opens with three chapters addressing properties of dialogues and soliloquies in Shakespearean plays. Jonathan Culpeper and Samuel J. Oliver explore the distribution and functions of pragmatic noise items such as mhm, ah, ha, oh and hum across character groups of different gender and social rank, describing the implications of these features for our conception of the characters. Using the same resource as these authors, a fully searchable, regularized version of Shakespeare’s dramatic works, Dawn Archer and Alison Findlay carry out a keyword analysis of the speech of seven characters traditionally classified as villains or anti-heroes. Their examination of the keywords (e.g. ha, mine, they/them, would, your) and collocates reveals traits and motives of the characters that provide new insights into the characters’ involvement with others. 
Adopting instead a perspective of semantic roles, Juhani Rudanko also offers a window into the minds of Shakespearean characters, more specifically by comparing the agentivity of two speakers, Iago and Othello, as revealed by the predicates they use. Among other comparisons, Rudanko explores the extent to which the two characters are volitionally involved in the event or state of affairs denoted by the predicates. The part on Early Modern English continues with four chapters drawing on speech-related data from A Corpus of English Dialogues 1560–1760 (CED), the Helsinki Corpus of English Texts (HC) and Early English Books Online (EEBO). Terry Walker and Peter J. Grund explore the form, frequency and function of speech-reporting expressions such as said and answered in said Evangelist and he answered in the dialogues of CED. Among other things, they identify what the reporting clauses reveal about the speakers as they reflect on the voices and words of others. Using the same resource (CED) together with the HC, Irma Taavitsainen focuses on the pragmatic functions of interjections such as alas, lo and O, and of other deictic devices that signal involvement, humor and comic effects of surprise in early modern drama dialogue and popular literature. Jeremy J. Smith subsequently reconstructs the voice of English puritans by tracking their affective deployment of vocabulary (e.g. the keyword godly) in comparison with that of authors with other confessional orientations. He draws on texts from EEBO that constitute these writers’ involved, public dialogue with their contemporaries. In the concluding chapter on Early Modern English, Matti Peikola uses the EEBO corpus to investigate involved features used by sixteenth-century English printers on their title pages to engage prospective readers/purchasers, for instance first- and second-person pronouns, imperatives and terms of address. By using these linguistic means in their personal metadiscourse, the printers explicitly signal the presence of a(n imagined) discourse participant.

In Part II of the volume, which is devoted to Late Modern English, the first two authors approach the spoken language of the time by ‘tapping into’ scribal records of voices heard in London’s central criminal court (known as the Old Bailey) between 1720 and 1913. Drawing on the records constituting the Old Bailey Corpus (OBC), Claudia Claridge investigates epistemic adverbs (such as probably, likely, apparently), which are generally known to be more common in spoken and interactive contexts. She sheds light on their characteristics, frequency development, preferred user groups and contexts. In the ensuing chapter, Patricia Ronan focuses on the diachronic development of question strategies used in the courtroom, correlating legal interaction representing the social practices of the time with societal changes. She also explores differences between the strategies used by lay litigants and those used by professional trial participants, finding that the latter ask more varied questions with broader scope. Moving away from London’s central criminal court to other areas where English is spoken, the next two authors study Late Modern English as represented in Ireland and the United States.
Comparing data from A Corpus of Irish English and the latter part of A Corpus of English Dialogues 1560–1760, Raymond Hickey investigates a key pragmatic feature in the dialogues of nineteenth-century Irish drama, namely the pragmatic marker sure in sentence- or clause-initial position. He also reflects on how cultural norms have influenced the pragmatics of Irish English. Shifting the focus to the early United States, Lieselotte Anderwald challenges the primacy of speech for the development of the past participle gotten in American English. She finds that, after having almost died out in speech, the form was promoted and revived through more formal text types, as revealed by, for example, the Corpus of Historical American English (COHA).

Part III, on Present-day English, begins with four chapters tracking the development of expressions that speakers use to position themselves in relation to their interlocutors in ongoing discourse. David Denison looks at ‘turn-initial so’ when used to accept an invitation to take the floor and offer an explanation. Using data from academic discourse and media language, he maps out its many discourse functions in spoken discourse to show its versatile nature. Ylva Berglund Prytz then turns to future expressions (e.g. will and going to + infinitive) to see how the use of these has changed in the years that separate the British National Corpus (BNC) from the Spoken BNC2014. She also looks at their immediate context, for example exploring co-occurrence patterns between different expressions and personal pronouns. In the subsequent chapter, Karin Aijmer maps out the discourse functions of two other expressions used by speakers to position themselves, namely sort of and kind of, using the English-Swedish Parallel Corpus. She shows how the translations can help us tease apart different meanings and functions of the two expressions. Anna-Brita Stenström then studies the use of innit in teenage speech from The Bergen Corpus of London Teenage Language (COLT), as compared to spoken production from the Multicultural London English Corpus (MLE), the BNC and the Spoken BNC2014. She shows how the new pragmatic marker has come to serve functions similar to those of words like yes, yeah and okay in British English. Following these chapters on development over time in native-speaker data, the section turns to the development of the authorial voice of learners of English. Specifically, Sarah Schwarz and Erik Smitterberg look at the apparent mismatch between the advice on certain constructions provided in grammars and usage in naturally occurring language. They test whether grammatical features labeled as ‘spoken-like’ match actual usage in the Corpus of Contemporary American English (COCA). The final two chapters of Part III deal with orality across registers and subregisters. Signe Oksefjell Ebeling and Hilde Hasselgård investigate intra-register differences in fiction data. More specifically, they look at the use of adverbial intensifiers across narrative and dialogue, which enables them to compare the voice of the narrator to the voices of the characters of the novels.
Using a corpus of online registers, Douglas Biber and Jesse Egbert then tackle the question of the extent to which involved written texts from the web can be seen to represent the linguistic characteristics of spoken registers.

As is clear from their content, the chapters of the present volume have been written with a view to honoring Merja Kytö’s leading contributions to the fields of historical corpus linguistics, sociolinguistics and pragmatics. They also recognize her generous support of colleagues working in related fields and on contemporary data. The large number of prominent scholars who have offered their contributions to the project highlights not only the impact that Merja Kytö’s research has had on the field, but also how many people owe her their gratitude for all the effort she has put into the numerous corpus-compilation projects she has been involved in. In her work, Merja Kytö seeks to advance corpus-linguistic research into involved texts, particularly spoken and speech-related material. In doing so, she aims to counteract the written-language bias in linguistic theory and to promote register awareness in historical linguistics as well. In this volume, renowned experts from all over the world have come together to honor her contribution to the field by following her example.

We wish to extend our heartfelt thanks to the chapter authors for their generous contributions and their enthusiastic commitment to this volume, and for offering each other valuable feedback. We are also deeply grateful to the editorial team at Benjamins and to the external referees who generously took time out of their busy schedules to share their expertise in honor of Merja Kytö: Erika Berglind Söderqvist, Theresa Fanego, Solveig Granath, Henrik Kaatari, John Kirk, Samantha Laporte, Marie-Aude Lefer, Ursula Lutzky, Christian Mair, Belén Mendez Naya, Ilka Mindt, Sean Murphy, Minna Palander-Collin, Eva Pettersson, Hanna Salmi, Gerold Schneider, Lucia Siebers, Heli Tissari, Gunnel Tottie, Jukka Tyrkkö, Kristel Van Goethem and Peter Wikström. In addition, we wish to thank all our colleagues in the Department of English at Uppsala University for supporting our editorial enterprise, as well as colleagues from around the world for their advice along the way. In particular, we would like to express our gratitude to Erika Berglind Söderqvist, Saqueb Kathon and Donald MacQueen for their hands-on, and very much appreciated, editorial assistance. A patron at the concept stage of the project, Nils-Lennart Johannesson, long-time friend of Merja Kytö and professor of English linguistics at Stockholm University, sadly passed away before the completion of the volume. We are most grateful for his support and encouragement.

Kept secret during its production, this book was officially presented to Merja Kytö as a gift on the occasion of her 25 years as professor at Uppsala University.
In her 1996 inaugural speech, she specified as one of her particular research interests “the spoken language of the past, the core of most of language change” (Rehn 1996: 67, our translation). She also presented some of her corpus-compilation plans and talked about how they were motivated by a strong wish to study the spontaneous expressions “of everyday people, which clearly reveal the spoken-language characteristics of the time” (Rehn 1996: 68, our translation). A quarter of a century later, there is no question that the field has benefited greatly and advanced significantly thanks to her work. Having first gotten to know Merja in her role as supervisor of our own respective doctoral projects, we were immediately struck by her unpretentiousness, approachability and contagious optimism. Her communication is unparalleled in its considerateness, and her advice is always kind and constructive. As collaborators and colleagues around the world will testify, Merja sees opportunities and solutions around every corner, always a strong proponent of going the extra mile



Chapter 1.  Voices of English

for the sake of furthering knowledge. Although she studiously avoids the limelight at conferences and prefers to give the floor to collaborators, her international network is larger than that of most scholars. Many travel far to inform themselves of her latest findings and return home with fresh ideas that she has shared in a most respectful manner. With this book, we wish to pay her our respect in return.

References
Aijmer, K. 2018. “That’s well bad”: Some new intensifiers in spoken British English. In Corpus Approaches to Contemporary British Speech: Sociolinguistic Studies of the Spoken BNC2014, V. Brezina, R. Love & K. Aijmer (eds), 60–95. London: Routledge. https://doi.org/10.4324/9781315268323-6
Biber, D. 2015. Stance and grammatical complexity in conversation: An unlikely partnership discovered through corpus analysis. Corpus Linguistics Research 1: 1–19. https://doi.org/10.18659/CLR.2015.1.0.01
Culpeper, J. & Kytö, M. 2010. Early Modern English Dialogues: Spoken Interaction as Writing. Cambridge: CUP.
Jonsson, E. 2015. Conversational Writing: A Multidimensional Study of Synchronous and Supersynchronous Computer-mediated Communication. Frankfurt: Peter Lang.
Linell, P. 2005. The Written Language Bias in Linguistics: Its Nature, Origins and Transformations. London: Routledge.
Rehn, J. (ed.). 1996. Nya professorer vid Uppsala universitet: Installationer våren 1996. Uppsala: Uppsala University.
Smitterberg, E. 2016. Extracting data from historical material. In The Cambridge Handbook of English Historical Linguistics, M. Kytö & P. Pahta (eds), 181–199. Cambridge: CUP. https://doi.org/10.1017/CBO9781139600231.012
Smitterberg, E. Forthcoming. Language Change in Late Modern English: Colloquialization and Densification. Cambridge: CUP.
Stubbs, M. 1996. Text and Corpus Analysis: Computer-assisted Studies of Language and Culture. Oxford: Basil Blackwell.


Part I

Early Modern English

Chapter 2

Pragmatic noise in Shakespeare’s plays

Jonathan Culpeper and Samuel J. Oliver
Lancaster University

Pragmatic noise, a term first coined in Culpeper and Kytö (2010), refers to the semi-natural noises, such as ah, oh, and ha, that have evolved to express a range of pragmatic and discoursal functions. Taking advantage of the regularised spellings and grammatically tagged texts of the Enhanced Shakespearean Corpus (Culpeper 2019), this study considers the frequency, distribution and functions of pragmatic noise across Shakespeare’s plays and characters. It reveals and discusses, for example, that particular types of pragmatic noise maintain a steady presence across all the plays, although token density varies; that female characters use pragmatic noise tokens much more densely than male characters; and that characters in the middle of the social hierarchy use pragmatic noise particularly often.

Keywords: characterisation, corpus-based methods, pragmatic noise, Shakespeare, social groups

1. Introduction

Compared with the voluminous literary critical literature, linguistic research on Shakespeare’s language is somewhat lacking; compared with the voluminous linguistics literature on items that comprise the main or matrix clause, research on interjections is distinctly lacking. This chapter makes a contribution to both areas. Our study focusses on “pragmatic noise”. Overlapping to an extent with interjections, pragmatic noise comprises the semi-natural noises – ah, oh, ha, mhm, ugh – that people make to express angst, anger, pain, surprise, pity, amusement, encouragement, listenership, and so on. Pragmatic noise is intimately connected with spoken interaction, and thus with the work of Merja Kytö, who at multiple points in her career has led research in historical speech-related phenomena, as this volume testifies. Merja is also one of the pioneers of historical corpus linguistics, and this chapter will adopt corpus-based methods to interrogate historical data. Furthermore, the notion at the heart of this chapter, pragmatic noise, was coined by Culpeper and Kytö (2010) in their work on Early Modern English.

https://doi.org/10.1075/scl.97.02cul
© 2020 John Benjamins Publishing Company


In this chapter, we investigate the frequency, distribution and functions of pragmatic noise across Shakespeare’s plays, and especially across particular social groups – those constituted by sex and social status. Do particular types of pragmatic noise cluster in particular plays? Does a play’s genre (tragedy, comedy or history) influence its use of pragmatic noise items? Do female characters use more and/or different pragmatic noise items compared with male characters? Is the status of the character reflected in the use of pragmatic noise? These are the key research questions we will address. Our study utilises the resources made available by the AHRC-funded Encyclopedia of Shakespeare’s Language project, based at Lancaster University. Not only do we have access to a fully searchable, regularised version of Shakespeare’s plays, but we have access to a version that has been annotated for various social categories. The following section expands a little more on pragmatic noise, and then Section 3 describes the Shakespeare play data and elaborates on how we extracted pragmatic noise items. Sections 4 and 5 report and then discuss the pragmatic noise in the plays and social groups. Finally, we offer brief concluding remarks.

2. Pragmatic noise

Culpeper and Kytö (2010) devote four chapters of their book on Early Modern English dialogues to pragmatic noise, the first of which introduces the notion (see in particular Section 9.2). Pragmatic noise concerns material that lies outside the main syntax and is pregnant with pragmatic meaning. Unlike many discourse markers or pragmatic markers discussed in the literature, it concerns items that do not have homonyms in other word classes (e.g. well can act as a discourse marker in addition to, for example, an adjective), and are almost always monosyllabic and sometimes phonologically unusual (consider mhm) (the use of small caps throughout this chapter signals the inclusion of spelling variants).
They are formed of semi-natural or instinctive noises, including not only single types like ah, ha, oh, hum, but also reduplicative forms such as ha ha. Pragmatic noise overlaps with two other notions in the linguistics literature. It overlaps to a great extent with what have been called primary interjections (e.g. Ameka 1992), but there are a number of differences. Unlike the label ‘interjections’, the label ‘pragmatic noise’ emphasises their pragmatic importance. Interjections are traditionally thought of as performing expressive or emotive functions. More recently, scholars have recognized that they perform a wider range of functions. This is particularly true of pragmatic noise. For example, hey and oi typically have directive functions, whilst mhm is typically phatic. Scholars have also noted that the items that constitute pragmatic noise perform discoursal functions. Person




(2009), studying oh in Romeo and Juliet, notes the range of contexts it appears in and the nuances of function it has, including discourse functions, such as marking a change of addressee (2009: 88) or prefacing requests (2009: 97). Jucker (2002), also studying oh, observes a general shift from an exclamatory (i.e. emotive) function to a discourse function. Furthermore, pragmatic noise includes types that are not classified as interjections at all, examples being laughter, pause-fillers, hesitation markers and listenership devices (e.g. mhm). Pragmatic noise also overlaps with what Biber et al. (1999) refer to as ‘inserts’, but again there are differences. Pragmatic noise overlaps with central members of inserts, present-day examples including: oh, oi, oops, ah, ha, aha, uh, um, eh, erm, mhm, tut, whoa, and whoops (all examples drawn from Biber et al. 1999: 1082–98). However, as Culpeper and Kytö (2010: 203) point out, unlike inserts, pragmatic noise sometimes has unusual phonetic or phonological characteristics, as noted at the beginning of this section. It is this that helps underpin the term “noise” in their label pragmatic noise. Furthermore, Culpeper and Kytö (2010: 199) claim that pragmatic noise items “have less arbitrary meanings compared with most words (they are sound symbolic to a degree)”. They are relatively natural noises, evolved as spontaneous reactions to particular cognitive states (see Culpeper & Kytö 2010: Chapter 12, for a detailed elaboration of that evolution). No such claim is made of inserts. In short, pragmatic noise comprises the semi-natural noises that have evolved to express a range of pragmatic and discoursal functions. Turning to their presence in written texts in particular, Culpeper and Kytö (2010), examining speech-related late early modern texts, demonstrated that particular sets of pragmatic noise types tended to have particular functions and were distributed across genres in particular ways.
Five types were common to all their speech-related genres: o, oh, alas, ah and fie (Culpeper & Kytö 2010: 268–270). However, there was variation in the density of occurrence across the genres, and also in the rank order of the particular types of pragmatic noise. Play-texts are particularly dense in pragmatic noise, which occurred in their data with a density of 5.5 per thousand words, compared with 1.7 per thousand words in the next most densely populated genre, prose fiction (Culpeper & Kytö 2010: 269). The rank order for particular pragmatic noise types reported for plays is: o, ha, oh, fie, ah, he, alas, ay, pshaw and tush (Culpeper & Kytö 2010: 269). This differed for other genres. For example, alas occurs in seventh position in play-texts but second position in prose fiction (Culpeper & Kytö 2010: 269). What was not considered, however, were potential differences amongst individual play-texts, and examining these is one of the goals of this chapter. Pragmatic noise items have much to do with personal affect, a term which in linguistics has been used to encompass people’s feelings, emotions, moods and attitudes, as well as personality (Caffi & Janney 1994: 328). Taavitsainen (1999)


examined the role of features of personal affect – features which she termed ‘surge features’ – in literary characterisation, specifically in The Canterbury Tales, and these features include pragmatic noise. She states that personal affect is “a component of participant relations and finds outlets in various forms, thus it gives us a picture of the person’s behavioural patterns and mental characteristics” (1999: 219–20). Examples include oh expressing surprise or fie expressing disgust. What researchers have not done, however, is to consider, as we will do in this chapter, pragmatic noise (or overlapping categories such as surge features) across entire social groups of characters.

3. Data and method

3.1 The Enhanced Shakespearean Corpus

The largest single body of Shakespeare’s works and the earliest publication of a large group of his works is that constituted by the First Folio (1623). This was the obvious choice for the Encyclopedia of Shakespeare’s Language project to have as its core data. Needless to say, scholars have recognised the presence of other hands in plays listed in the First Folio; collaborative works were common at the time. To the First Folio, we added two further plays: Pericles (Quarto 1) and The Two Noble Kinsmen (Quarto 1), believed to be collaborations with George Wilkins and John Fletcher, respectively. The resultant 38 plays, totalling 1,038,509 words, represent what is generally thought of as Shakespeare’s canon. The Enhanced Shakespearean Corpus (ESC) is termed ‘enhanced’ because of its tagging/annotation. Original spelling texts were kindly supplied by Internet Shakespeare Editions. Pragmatic noise is not immune to spelling variation: without regularisation, a search for alas, for example, would not retrieve instances spelt alasse. Spelling was regularised with the program Variant Detector (VARD), developed by scholars at Lancaster University over more than 15 years, and most significantly by Alistair Baron (see ). This program regularises spelling by matching variants to “normalised” equivalents, using a search-and-replace script together with contextual information to tackle ambiguities, and an additional lexicon to handle word forms that are specific to the early modern period or have undergone semantic change since then. The program does not delete the original spelling but places it in a specific XML element, making it easily available for inspection. Because the project demanded a high level of accuracy, we did not run the program in fully automatic (whole-text) mode. Instead, we used its manual (word-by-word) mode, which on most occasions can suggest regularisation




options in order of likelihood, from which the human operator approves a selection. We made no attempt to “correct” the spelling, with very rare exceptions made for obvious printer errors, such as aud for and. The ESC is tagged for parts of speech. As we explain in Section 3.2, the grammatical category of interjections played an important role in our method for retrieving instances of pragmatic noise. Part-of-speech tagging was partly achieved through the program CLAWS (the Constituent Likelihood Automatic Word-tagging System: see ). In a nutshell, CLAWS works on the basis of (1) a lexicon, including words (or multi-word units) and suffixes and their possible parts of speech, and (2) a matrix containing sequencing probabilities (e.g. the likelihood that the word following an adjective will be a noun), which is applied to each sentence to disambiguate words that could potentially belong to several parts of speech. However, Early Modern English presents a range of problems aside from spelling variation. These include vocabulary change over time: some words have disappeared from English over the last 400 years (e.g. iwis, meaning ‘certainly’ or ‘assuredly’) and are thus not in the tagger’s lexicon, whilst others still exist but behave differently in grammatical terms (e.g. the word fee could equally well be a verb as a noun). In addition, CLAWS overlooks many grammatical features of Early Modern English, for instance the existence of thou and thee as regular forms distinct from you, rather than the marginal phenomena they are today, or the regular use of inflected second-person verb forms. Our solution to both these problems was to make adjustments to CLAWS and to check all texts in the ESC manually. The corpus itself has also been annotated for speakers’ sex and social status, and other speaker characteristics.
This annotation scheme was only applied to characters whose talk makes up at least five per cent of the total word count of the play in which they appear (this excludes very minor characters – messengers, for example – whose nature can be difficult to determine). Categorising characters as male or female is relatively straightforward, though it was necessary to develop separate categories for characters with an assumed identity (e.g. a female character playing a male character). The social status categorisation scheme drew upon the approach developed by Archer and Culpeper (2003), where further detail can be found. The social hierarchy is as follows: monarchy > nobility > gentry > professional > other middling groups > ordinary commoners > lowest groups. The project was sensitive to the fact that it was working with fictional data. Hence, for example, we added a “supernatural beings” category accounting for more than 40 ghosts, gods, fairies, etc. in the 38 plays. Partly for reasons of space, in this chapter, we will confine ourselves to groups on the social hierarchy.
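The two-part tagging design attributed to CLAWS earlier in this section (a lexicon of possible tags plus a matrix of sequencing probabilities) can be illustrated with a small Viterbi-style decoder in Python. This is a minimal sketch under stated assumptions, not CLAWS itself: the tag labels, the toy lexicon, the fallback for unknown words and all probabilities below are hypothetical, chosen only to show how sequence probabilities can resolve a word such as fee, which could be either a noun or a verb.

```python
from math import log

# Hypothetical toy lexicon: candidate tags for each word with P(tag | word).
# Tag labels are simplified stand-ins, not the real CLAWS tagset.
LEXICON = {
    "the": {"AT": 1.0},             # article
    "they": {"PP": 1.0},            # personal pronoun
    "fee": {"NN": 0.5, "VV": 0.5},  # ambiguous: noun or verb
    "alas": {"UH": 1.0},            # interjection
}

# Hypothetical sequencing probabilities P(next tag | previous tag); "<s>" = sentence start.
TRANSITIONS = {
    ("<s>", "AT"): 0.4, ("<s>", "PP"): 0.4, ("<s>", "UH"): 0.2,
    ("AT", "NN"): 0.9, ("AT", "VV"): 0.1,
    ("PP", "VV"): 0.8, ("PP", "NN"): 0.2,
    ("UH", "AT"): 0.5, ("UH", "PP"): 0.5,
}
UNSEEN = 1e-6               # small probability for tag pairs missing from the matrix
UNKNOWN_WORD = {"NN": 1.0}  # crude fallback for words absent from the lexicon


def tag(words):
    """Return the most probable tag sequence for a tokenised sentence."""
    # best maps each tag to (log probability, tag path) of the best path ending in it.
    best = {"<s>": (0.0, [])}
    for word in words:
        candidates = LEXICON.get(word.lower(), UNKNOWN_WORD)
        step = {}
        for candidate, p_word in candidates.items():
            # Extend every surviving path and keep the highest-scoring one.
            step[candidate] = max(
                (score + log(TRANSITIONS.get((prev, candidate), UNSEEN)) + log(p_word),
                 path + [candidate])
                for prev, (score, path) in best.items()
            )
        best = step
    return max(best.values())[1]


print(tag(["the", "fee"]))          # ['AT', 'NN']
print(tag(["they", "fee"]))         # ['PP', 'VV']
print(tag(["alas", "the", "fee"]))  # ['UH', 'AT', 'NN']
```

In the first call the preceding article favours the noun reading of fee; in the second the preceding pronoun favours the verb reading. Words missing from the lexicon (the iwis problem) simply fall back to one default tag here, which is only one crude option and says nothing about the adjustments actually made to CLAWS for the ESC.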

3.2 A method for pragmatic noise extraction

Pragmatic noise instances had been treated by the tagger as interjections, the relevant label being UH. A search for ‘_UH’ in the ESC produced 92 types with a combined total of 8,179 tokens. Not all of these interjections, of course, counted as pragmatic noise, and the statuses of some were difficult to determine. Consequently, the following were removed from this list:

1. Affirmative/negative items: no, nay, yes, aye, yea.
2. Morphologically complex items and/or items with homonyms in other word classes: e.g. gramercy, farewell, weladay. Alas and alack are borderline cases, as their second elements have homonyms in other word classes, but their first elements, a or ah, are more clearly pragmatic noise. For this reason, alas and alack remain in this study.
3. Highly restricted items. For example, nonino and nonny occur only in song and in phrases following on from hey.
4. Items with fewer than three instances. We instituted this to ensure that we had a sufficient number of occurrences of each item to interpret its function.
5. Items lacking clarity. There were two examples here: sessa, which occurs three times, each with a different original spelling, but no clear function or meaning, and ugh, all three instances of which appear consecutively in a single line.

A total of 4,524 tokens were eliminated, or 55.19% of all tokens tagged as interjections. The remaining 21 types in order of frequency are as follows: o, oh, alas, ho, ha, fie, ah, alack, lo, tut, tush, la, holla, hum, foh, sola, hem, hey, hush, pish and mum. These amounted to 3,649 tokens in total.

4. Distribution of pragmatic noise across Shakespeare’s plays

4.1 Overview of distribution by play

Table 1 displays our frequency results across Shakespeare’s plays. Play genre refers to whether the play is a tragedy, comedy or history. For these classifications, which are notoriously controversial, we follow the classification given in the First Folio, with the exception of Cymbeline, which the Folio places among the tragedies but which we reclassify as a comedy. The plays vary considerably in length, so their raw token counts cannot be compared directly. The middle columns of the table contain the number of different pragmatic noise types and the number of pragmatic noise tokens. Finally, the table displays the normalized frequencies of pragmatic noise (per thousand words), and the rows of the table are ordered according to these results.




Table 1.  The number of pragmatic noise (PN) types, the number of tokens and the relative frequencies for each play in the Enhanced Shakespearean Corpus, rank-ordered by relative frequency

Play | Play genre | Total words | No. of different PN types | No. of PN tokens | Normalized frequency of PN tokens (per 1,000 words)
Romeo and Juliet | T | 29,556 | 11 | 218 | 7.4
Othello | T | 32,668 | 14 | 240 | 7.3
Hamlet | T | 34,761 | 15 | 171 | 4.9
Troilus and Cressida | T | 32,060 | 14 | 157 | 4.9
King Lear | T | 29,188 | 14 | 137 | 4.7
Antony and Cleopatra | T | 30,277 | 11 | 141 | 4.7
Titus Andronicus | T | 24,584 | 10 | 113 | 4.6
Julius Caesar | T | 24,037 | 7 | 105 | 4.4
Twelfth Night | C | 24,033 | 11 | 100 | 4.2
The Tempest | C | 20,482 | 10 | 81 | 4.0
Love’s Labour’s Lost | C | 25,867 | 12 | 99 | 3.8
A Midsummer Night’s Dream | C | 20,126 | 8 | 77 | 3.8
Measure for Measure | C | 26,380 | 8 | 98 | 3.7
The Two Noble Kinsmen | C | 29,393 | 10 | 108 | 3.7
The Merry Wives of Windsor | C | 26,663 | 11 | 94 | 3.5
King John | H | 24,768 | 8 | 87 | 3.5
Richard III | H | 35,401 | 10 | 123 | 3.5
As You Like It | C | 25,954 | 12 | 90 | 3.5
The Taming of the Shrew | C | 25,344 | 11 | 84 | 3.3
Cymbeline | C | 33,819 | 11 | 111 | 3.3
Two Gentlemen of Verona | C | 21,212 | 9 | 65 | 3.1
Much Ado about Nothing | C | 25,203 | 10 | 77 | 3.1
Timon of Athens | T | 22,510 | 10 | 68 | 3.0
Henry VI, part 3 | H | 29,779 | 8 | 88 | 3.0
Henry IV, part 2 | H | 31,977 | 11 | 94 | 2.9
Pericles | C | 22,073 | 8 | 63 | 2.9
The Winter’s Tale | C | 31,026 | 11 | 85 | 2.7
Henry VI, part 2 | H | 30,763 | 7 | 80 | 2.6
The Merchant of Venice | C | 25,065 | 7 | 65 | 2.6
Richard II | H | 26,495 | 9 | 66 | 2.5
The Comedy of Errors | C | 17,587 | 7 | 43 | 2.4
Henry IV, part 1 | H | 29,724 | 12 | 71 | 2.4
Henry VI, part 1 | H | 26,083 | 9 | 62 | 2.4
Coriolanus | T | 33,722 | 11 | 78 | 2.3
Henry VIII | H | 30,022 | 9 | 67 | 2.2
Henry V | H | 31,366 | 8 | 60 | 1.9
Macbeth | T | 21,118 | 9 | 40 | 1.9
All’s Well that Ends Well | C | 27,423 | 6 | 43 | 1.6

Play genre: T = tragedy, C = comedy, H = history.
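The normalized frequencies in Table 1 are raw token counts divided by play length and scaled to 1,000 words. The following minimal Python sketch of that calculation uses figures taken from Table 1 and the corpus totals given in Section 3; the function name per_thousand is ours, not part of any project tool.

```python
def per_thousand(tokens, words):
    """Normalized frequency: tokens per 1,000 running words."""
    return tokens / words * 1000


# (PN tokens, total words) taken from Table 1.
plays = {
    "Romeo and Juliet": (218, 29_556),
    "Othello": (240, 32_668),
    "All's Well that Ends Well": (43, 27_423),
}
for play, (tokens, words) in plays.items():
    print(f"{play}: {per_thousand(tokens, words):.1f}")  # 7.4, 7.3 and 1.6

# Whole corpus: 3,649 PN tokens in 1,038,509 words (Section 3).
print(f"All 38 plays: {per_thousand(3_649, 1_038_509):.1f}")  # 3.5
```

Normalizing in this way is what allows The Comedy of Errors (17,587 words) and Richard III (35,401 words), the shortest and longest plays in the table, to be compared on an equal footing.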


Table 2 displays the actual pragmatic noise items (types and tokens) for each play, along with their raw frequencies of occurrence in that play. The plays are listed according to their overall normalized frequency of pragmatic noise tokens; in other words, the order of plays here matches that of Table 1.

Table 2.  The pragmatic noise (PN) items in each play in the Enhanced Shakespearean Corpus (the plays are rank-ordered by relative frequency of PN tokens per 1,000 words)

Play | Play genre | Pragmatic noise items and their raw frequencies
Romeo and Juliet | T | o (135), oh (16), ah (15), ho (13), alack (11), alas (8), fie (6), ha (5), tut (4), lo (3), tush (2)
Othello | T | oh (117), o (34), alas (28), ho (26), ha (17), fie (8), lo (2), pish (2), ah (1), alack (1), foh (1), hem (1), holla (1), hum (1)
Hamlet | T | oh (79), o (39), ho (12), alas (9), fie (7), ha (7), alack (3), holla (3), lo (3), ah (2), hey (2), tush (2), foh (1), hum (1), la (1)
Troilus and Cressida | T | o (70), ha (20), oh (17), ho (11), alas (9), fie (7), lo (5), ah (4), foh (4), holla (4), hum (2), la (2), hem (1), hey (1)
King Lear | T | o (63), oh (17), alack (13), ho (13), ha (12), fie (7), alas (2), holla (2), hum (2), mum (2), ah (1), foh (1), hey (1), hush (1)
Antony and Cleopatra | T | oh (83), ho (15), o (13), ah (10), fie (5), ha (5), alack (4), hush (2), lo (2), alas (1), la (1)
Titus Andronicus | T | oh (39), o (30), ah (13), alas (10), fie (7), ha (5), lo (4), tut (3), ho (1), holla (1)
Julius Caesar | T | o (69), ho (18), alas (9), ha (5), lo (2), oh (1), tut (1)
Twelfth Night | C | o (42), alas (14), ho (12), oh (9), fie (6), ha (6), hey (6), ah (2), la (1), lo (1), tut (1)
The Tempest | C | o (42), ha (9), oh (9), ho (5), lo (5), alack (3), alas (3), hey (3), fie (1), hush (1)
Love’s Labour’s Lost | C | o (75), ah (5), alack (5), alas (2), ha (2), lo (2), oh (2), sola (2), fie (1), ho (1), holla (1), tush (1)
A Midsummer Night’s Dream | C | o (55), oh (6), ho (5), alack (4), fie (3), ah (2), hey (1), lo (1)
Measure for Measure | C | oh (56), alas (9), fie (9), ha (7), ho (7), o (7), alack (2), foh (1)
The Two Noble Kinsmen | C | o (53), alas (17), oh (14), ho (5), lo (5), ha (4), hey (4), fie (3), alack (2), ah (1)
The Merry Wives of Windsor | C | o (25), fie (14), alas (13), ha (13), oh (9), ho (7), la (7), ah (2), tut (2), foh (1), hum (1)
King John | H | o (46), oh (26), lo (4), ah (3), alack (2), alas (2), ha (2), ho (2)
Richard III | H | o (67), ah (17), oh (10), alas (9), ha (5), lo (5), tut (5), alack (2), fie (2), ho (1)
As You Like It | C | o (38), hey (14), oh (12), ho (10), alas (7), holla (3), ah (1), alack (1), fie (1), ha (1), hem (1), lo (1)
The Taming of the Shrew | C | oh (47), fie (13), ho (5), o (4), tut (4), tush (3), alas (2), ha (2), ah (1), lo (1)
Cymbeline | C | oh (60), o (22), alack (7), ho (7), alas (6), ah (2), fie (2), ha (2), hum (1), hush (1), lo (1)
Two Gentlemen of Verona | C | oh (33), o (13), alas (9), fie (5), ah (1), ha (1), ho (1), lo (1), tut (1)
Much Ado about Nothing | C | o (46), ha (9), alas (4), tush (4), ah (3), fie (3), hey (3), ho (3), hem (1), mum (1)
Timon of Athens | T | o (24), oh (13), fie (6), ho (6), ha (5), alas (4), la (4), hum (3), ah (2), alack (1)
Henry VI, part 3 | H | ah (29), oh (24), o (19), alas (9), lo (3), tut (2), ha (1), tush (1)
Henry IV, part 2 | H | o (42), ha (12), oh (12), alas (7), ah (5), fie (5), ho (3), lo (3), alack (2), hem (2), la (1)
Pericles | C | o (37), oh (9), alas (4), ha (4), ho (4), alack (2), fie (2), hum (1)
The Winter’s Tale | C | o (31), oh (27), alas (7), fie (4), ha (3), hey (3), ho (3), alack (2), holla (2), lo (2), la (1)
Henry VI, part 2 | H | o (26), ah (19), oh (18), alas (9), fie (4), tut (3), lo (1)
The Merchant of Venice | C | o (33), sola (8), fie (6), ha (6), ho (5), alas (4), alack (3)
Richard II | H | oh (35), alack (6), o (6), ah (5), alas (5), ha (3), ho (2), lo (2), tut (2)
The Comedy of Errors | C | oh (25), fie (5), alas (4), o (4), ah (3), ho (1), lo (1)
Henry IV, part 1 | H | o (42), tut (6), oh (5), fie (3), ha (3), hey (3), ho (3), ah (2), alas (1), hem (1), hum (1), tush (1)
Henry VI, part 1 | H | o (21), oh (18), fie (4), tush (4), ah (3), alas (3), ha (3), lo (3), tut (3)
Coriolanus | T | oh (40), o (12), fie (7), ho (6), ha (4), alack (2), la (2), tush (2), ah (1), alas (1), lo (1)
Henry VIII | H | o (29), ha (12), alas (10), oh (5), ah (3), fie (3), lo (3), ho (1), tush (1)
Henry V | H | o (47), alas (3), ha (3), oh (2), pish (2), ah (1), fie (1), tut (1)
Macbeth | T | o (17), oh (7), alas (4), fie (3), ho (3), ha (2), lo (2), alack (1), hum (1)
All’s Well that Ends Well | C | o (31), oh (6), alas (2), ho (2), ah (1), foh (1)

4.2 Discussion of distribution by play

A striking feature of Table 1 is the dramatic difference in the normalized frequencies of pragmatic noise tokens. They vary from 7.4 tokens per thousand words in Romeo and Juliet to 1.6 in All’s Well that Ends Well. It is noteworthy that the top eight plays, rank-ordered by normalized frequency, are all tragedies; the histories inhabit the lower echelons, mixed in with some comedies, which take up some of the centre ground. There is no easy way of interpreting this, because on the one hand the notion of a tragedy is not a particularly coherent and consistent one, and on the other hand pragmatic noise covers such a range of functions. Nevertheless, one might speculate that plays that feature high emotions, such as tragedies, tend to contain somewhat more pragmatic noise. We write “somewhat” because much seems to depend on the particular kind of tragedy: the tragedy Macbeth is conspicuous by its position almost at the bottom of Table 1. The spread in relative frequencies across the plays is not quite even. In particular, Romeo and Juliet and Othello score 7.4 and 7.3 per thousand words respectively, but then there is a drop to 4.9 with Hamlet. Why are these two plays particularly dense in pragmatic noise? Both plays are love tragedies. Antony and Cleopatra, another notable love tragedy, ranks not far behind in sixth place, and Troilus and Cressida, in fourth place, has a claim to being a love tragedy. Perhaps this particular type of tragedy, with its extreme emotional ups and downs, results in a high density of pragmatic noise. Let us probe exactly what lies behind the high numbers for these plays. Two factors seem to be at play. First, both Romeo and Juliet and Othello contain a high number of different pragmatic noise items. Othello has 14 types, a number that is only exceeded by Hamlet with 15 (and Hamlet is in third position), and Romeo and Juliet has 11.
The other factor, as can be seen from Table 2, is that both plays have an exceptionally high number of tokens of their most frequent pragmatic noise types. In Romeo and Juliet, o occurs 135 times, which is almost twice the frequency of its next most frequent occurrence (in Love’s Labour’s Lost, where it has 75 occurrences). In Othello, oh occurs 117 times, substantially more than its next most frequent occurrence, in Antony and Cleopatra, with 83 instances. It should be noted that, aside from the fact that o has a particular tendency to accompany terms of address as part of a vocative, it overlaps considerably in functional terms with oh (see Culpeper & Kytö 2010: Section 11.5). Also, the choice between o and oh was probably influenced by compositors, who may have used the shorter form when they needed to save space in a printed line. Let us examine the use of o and oh in Romeo and Juliet and Othello a little more closely. In Romeo and Juliet, the characters Juliet, Romeo and the Nurse each use approximately three times as many pragmatic noise tokens as any other character.




Of those, the Nurse uses them most densely (10.2 times per thousand words, compared with 7.2 for Juliet and 6.3 for Romeo). One of the Nurse’s main functions in the play is to act as an emotional mirror for various happenings, particularly in relation to Juliet. In addition, she seems to have been constructed as having a general propensity for pragmatic noise, and o in particular. She uses it for a wide range of functions, including to express the pain she feels (pragmatic noise instances are emboldened):

(1) My back a’ the other side: o my back, my back: 

(Romeo and Juliet 2.5)

To attract attention and reinforce a request:

(2) O holy Friar, O tell me holy Friar, Where ‘s my Lady ‘s Lord?

(Romeo and Juliet 3.3)

To swoon (or pretend to) at the thought of Paris:

(3) O he’s a Lovely Gentleman: Romeo’s a dishclout to him

(Romeo and Juliet 3.5)

However, above all, the Nurse uses it for lamentation:

(4) O woe, O woeful, woeful, woeful day, Most lamentable day, most woeful day, That ever, ever, I did yet behold. O day, O day, O day, O hateful day, Never was seen so black a day as this: O woeful day, O woeful day.

(Romeo and Juliet 4.5)

Pragmatic noise was used in the expression of particular rhetorical figures. Here, the relevant figure is lamentatio, which in plays particularly accounts for alas, but also for ah and alack. As in example (4), lamentatio is the conventional dramatic reaction to death. (See also Taavitsainen’s discussion of alas in this volume, which echoes some of our points about emotion and tragedy.) Regarding the particular pragmatic noise items displayed in Table 2, note that most of the rank-ordered pragmatic noise types for play-texts reported in Culpeper & Kytö (2010: 269) – o, ha, oh, fie, ah, he, alas, ay, pshaw and tush – are in evidence here, with a few exceptions. Ay is not included because of our method (see Section 3.2). He has a long history in English, but was a relatively minor form until later in the seventeenth century, when it became regularly used to represent laughter. Similarly, pshaw only took off in the 1670s. Both developments largely postdate Shakespeare’s plays. The remaining types, o, ha, oh, fie, ah, alas and tush, appear regularly across the plays in Table 2, perhaps reflecting the fact that plays are a fairly cohesive genre. However, there are some interesting differences in the raw frequency rank orders.


In the plays The Merry Wives of Windsor, The Taming of the Shrew and The Comedy of Errors, fie makes it to second position. Fie is regularly used to cast shame or pour scorn on something or someone, or to express anger or exasperation:

(5) Fie upon thee! art not ashamed?

(Much Ado about Nothing 3.4)

These are indeed plays which are notable for such emotions, The Taming of the Shrew being the best exemplar. Another interesting example concerns the positioning of alas. It occurs in second position in Twelfth Night, Measure for Measure, The Two Noble Kinsmen and Henry V. Alas, as noted above, is key to the expression of lamentation, but its use extends beyond that. It is used to express regret for something, and sometimes sympathy or empathy with others: (6) Alas, ‘tis a sore life they have I’ th’ other place – such burning, frying, boiling, hissing, howling, chatt’ring, cursing  (The Two Noble Kinsmen 4.3)

Measure for Measure and The Two Noble Kinsmen are both classified as comedies in the First Folio. However, for over a century, this classification has often been viewed as problematic, and they have been described as tragicomedies (for possibly the earliest comment on problematic plays, see Boas 1910). The threat of death via execution is a key theme in both. Henry V is certainly laced with dark moments and death. The odd one out, then, seems to be Twelfth Night. But here too, although the play is a comedy, there are dark moments. “Alas the day!” says Antonio in act two, scene one, when Sebastian tells him of his sister’s death by drowning. Eight of the fourteen instances of alas are spoken by Olivia and Feste, four each. Olivia most often uses it to express regret for or sympathy with another’s situation (e.g. “Poor Malvolio”, 5.1). Because Feste is the Fool or Clown, the modern reader might assume that he is all levity. In fact, he provides insightful and sometimes somewhat melancholic reflections on himself and the behaviours of others. Example (7) is from the last song he sings: (7) But when I came alas to wive, with hey ho, etc. By swaggering could I never thrive, for the rain, etc.  (Twelfth Night 5.1)

5. Distribution of pragmatic noise across Shakespeare’s social groups of characters

5.1 Overview of distribution by social groups

Table 3 displays the frequencies of pragmatic noise across the sex of the characters in Shakespeare’s plays, and Table 4 displays the rank order of pragmatic noise items for each sex, along with their raw frequencies.

Chapter 2.  Pragmatic noise in Shakespeare’s plays 23



Table 3.  The distribution of pragmatic noise (PN) in male and female characters in the Enhanced Shakespearean Corpus

Sex | No. of characters | No. of words | No. of PNs | Average (mean) no. of PNs per character (standard deviation in round brackets) | Normalized frequency of PN tokens (per 1,000 words)
Male | 1,235 | 811,531 | 2,620 | 2.1 (7.8) | 3.2
Female | 166 | 171,132 | 937 | 5.6 (9.2) | 5.5

Table 4.  The rank-ordered pragmatic noise (PN) items by sex of characters in the Enhanced Shakespearean Corpus

Sex | Pragmatic noise types and their raw frequencies
Male | o (999), oh (639), ha (188), ho (181), alas (130), fie (115), ah (105), lo (54), alack (45), tut (36), tush (20), holla (16), hum (13), la (11), sola (10), foh (9), hem (5), hush (4), pish (4), mum (2)
Female | o (371), oh (249), alas (109), ah (53), fie (43), ho (31), alack (29), lo (15), ha (12), la (9), hey (6), holla (3), hem (2), tut (2), hush (1), mum (1), tush (1)

Table 5 displays the frequencies of pragmatic noise across the social status of the characters in Shakespeare’s plays. Figures 1 and 2 reproduce some of the same information in graphic form. Figure 1 displays the average (mean) number of pragmatic noise tokens per character of a particular status, and Figure 2 displays the normalized frequency of pragmatic noise tokens (per thousand words). Finally, Table 6 displays the rank order of pragmatic noise items for each social status, along with their raw frequencies.

Table 5.  The distribution of pragmatic noise (PN) across social status of characters in the Enhanced Shakespearean Corpus

Status | No. of characters | No. of words of that status | No. of PNs | Average (mean) no. of PNs per character of that status (standard deviation in round brackets) | Normalized frequency of PN tokens (per 1,000 words)
Monarchy | 78 | 164,427 | 511 | 6.6 (9.1) | 3.1
Nobility | 379 | 404,284 | 1,547 | 4.1 (9.6) | 3.8
Gentry | 263 | 199,731 | 786 | 3.0 (9.2) | 3.9
Professional | 102 | 42,626 | 184 | 1.8 (4.8) | 4.3
Other middling | 71 | 34,557 | 83 | 1.2 (2.7) | 2.4
Ordinary commoners | 90 | 35,497 | 144 | 1.6 (4.0) | 4.1
Lowest groups | 324 | 50,946 | 168 | 0.5 (2.8) | 3.3
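The two derived columns in Table 5 are simple ratios of the raw counts: the mean is PN tokens divided by the number of characters, and the normalized frequency is PN tokens divided by the number of words, multiplied by 1,000. As a minimal sketch (in Python, for illustration only), the following recomputes both columns from the raw counts in Table 5 and sorts the status groups by each measure. The dictionary and function names are our own hypothetical labels, and the standard deviations are omitted because they cannot be recovered from these totals.

```python
# Raw counts from Table 5: (no. of characters, no. of words, no. of PN tokens)
TABLE_5 = {
    "Monarchy": (78, 164_427, 511),
    "Nobility": (379, 404_284, 1_547),
    "Gentry": (263, 199_731, 786),
    "Professional": (102, 42_626, 184),
    "Other middling": (71, 34_557, 83),
    "Ordinary commoners": (90, 35_497, 144),
    "Lowest groups": (324, 50_946, 168),
}


def mean_per_character(pn_tokens: int, characters: int) -> float:
    """Average (mean) number of PN tokens per character."""
    return pn_tokens / characters


def per_thousand_words(pn_tokens: int, words: int) -> float:
    """Normalized frequency of PN tokens per 1,000 words."""
    return pn_tokens / words * 1000


for status, (chars, words, pns) in TABLE_5.items():
    print(f"{status:<20} mean {mean_per_character(pns, chars):.1f}   "
          f"per 1,000 words {per_thousand_words(pns, words):.1f}")

# Rank orders by each measure, highest first
by_mean = sorted(
    TABLE_5,
    key=lambda s: mean_per_character(TABLE_5[s][2], TABLE_5[s][0]),
    reverse=True,
)
by_density = sorted(
    TABLE_5,
    key=lambda s: per_thousand_words(TABLE_5[s][2], TABLE_5[s][1]),
    reverse=True,
)
print(" > ".join(by_mean))
print(" > ".join(by_density))
```

Rounded to one decimal place, the recomputed values agree with Table 5, and the two sorted lists give the same rank orders as those discussed in Section 5.2. The contrast between them shows why the per-character mean mostly tracks how much each group speaks, whereas the per-1,000-words figure controls for it.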


Figure 1.  Average (mean) number of pragmatic noise (PN) tokens across social status of characters in the Enhanced Shakespearean Corpus [chart not reproduced; horizontal axis: Monarchy, Nobility, Gentry, Professional, Other middling, Ordinary commoners, Lowest groups; vertical axis scale 0–7]

Figure 2.  Normalized frequencies of pragmatic noise (PN) tokens (per 1,000 words) across social status of characters in the Enhanced Shakespearean Corpus [chart not reproduced; horizontal axis: Monarchy, Nobility, Gentry, Professional, Other middling, Ordinary commoners, Lowest groups; vertical axis scale 0–5]




Table 6.  Rank-ordered pragmatic noise (PN) items by social status of characters in the Enhanced Shakespearean Corpus

Status | Pragmatic noise types and their raw frequencies
Monarchy | o (183), oh (159), ah (43), ha (34), alas (26), ho (23), fie (20), alack (14), lo (6), hey (2), hum (1), hush (1), la (1)
Nobility | o (673), oh (360), alas (102), ah (73), ha (72), ho (70), fie (57), alack (40), lo (31), tut (18), tush (13), hey (9), hum (8), la (8), holla (5), hem (4), hush (2), foh (1), pish (1)
Gentry | o (241), oh (240), ha (57), ho (56), alas (48), fie (42), ah (25), lo (16), tut (14), alack (12), tush (8), la (7), foh (6), holla (5), hey (4), hem (2), hush (1), mum (1), pish (1)
Professional | o (68), oh (25), fie (24), ho (18), alas (14), ha (14), hey (9), lo (3), ah (2), alack (2), hum (2), foh (1), holla (1), la (1)
Other middling | o (29), oh (15), alas (13), fie (6), ho (6), ah (5), la (3), ha (2), alack (1), hey (1), holla (1), tut (1)
Ordinary commoners | o (57), oh (20), ho (17), alas (13), sola (8), ha (7), lo (5), ah (4), holla (4), fie (3), hey (3), hum (2), pish (1)
Lowest groups | o (56), oh (44), alas (18), ho (14), ah (7), ha (7), lo (6), tut (5), alack (4), hey (2), holla (2), fie (1), hem (1), pish (1)

5.2 Discussion of distribution by social groups

Male characters speak many more words than female characters in the plays (811,531 versus 171,132; see Table 3). Consequently, weighted or normalized frequencies are crucial when comparing the two groups’ use of pragmatic noise. In Table 3, the average number of pragmatic noise tokens per female character is 5.6, whereas it is 2.1 per male character. This tendency for pragmatic noise to be more frequently uttered by female characters is also reflected in the overall frequencies relative to the number of words spoken: female characters speak 5.5 pragmatic noise tokens per thousand words, compared with 3.2 for male characters. We should of course remember that female characters here are constructed by a male author (or male authors, for plays which are thought to involve collaborations). These representations in Shakespeare’s plays may partly reflect a wider male stereotype of women as being more emotional. But they could also partly reflect the frequent role ascribed to female characters in the plays – Desdemona in Othello being a good example – namely, to demonstrate the emotional consequences of what the male characters are doing. Certainly, at the time, women were associated with two specific areas of emotion, grief and fear, as these quotations from texts of that period make clear: “Yet he doth not with womanly weping bewaile his departure” (Rudolph Gwalther & John Bridges, 1572, An hundred, threescore and fiftene homelyes or sermons […]); “but somewhat moved with her too womanly tymerousnes and fear” (Thomas Bentley, 1582, The sixt lampe of virginitie conteining a mirrour for maidens and matrons […]). The rank order of the raw frequencies of pragmatic noise tokens displayed in Table 4 provides evidence for the first of these areas of emotion. Alas appears in third position for female characters, but in fifth for male characters. Furthermore, other pragmatic noise items point in the same direction. Both alack, in seventh position for female characters and ninth for male, and ah, in fourth position for female characters and seventh for male, overlap to an extent with the core grief and sorrow functions of alas, though they encompass a somewhat wider array of states relating to emotional distress. Conversely, pragmatic noise types with some kind of directive function play a larger role in male character talk than in female character talk, reflecting the key role male characters are given in the direction of events, and also the patriarchal society of the time. Tush, for example, is in 11th position for male characters, with 20 occurrences, whereas it is at the end of the rank-ordered list for female characters, with a single instance. That single instance is spoken by Queen Margaret in Henry VI, part 1: (8) Suffolk: Sweet Madam, give me hearing in a cause. Margaret: Tush, women have been captivate ere now.  (Henry VI, part 1, 5.3)

This is significant because Margaret is characterized, not least by other characters, as an unnatural woman on account of her masculine behaviours (e.g. leading an army). Holla and hush show a similar pattern to tush.

Turning to pragmatic noise across characters of different social status, Table 5 and Figure 1 tell a story made complex by the fact that there are widely varying numbers of characters at each status level, who speak widely varying numbers of words. Focusing on the average number of pragmatic noise occurrences per character, we see a relationship which almost exactly follows social status level: the higher the status, the more items used. The rank order is: monarchy > nobility > gentry > professional > ordinary commoners > other middling > lowest groups. The most likely reason for this is that characters of higher social status have larger parts (as is apparent from comparing the number of words spoken with the number of characters in Table 5), and thus have more opportunities to use pragmatic noise. More illuminating, perhaps, are the frequencies of pragmatic noise tokens relative to the total number of words spoken by the different social groups of characters, as displayed in Table 5 and Figure 2. Here, the rank order is professional > ordinary commoners > gentry > nobility > lowest groups > monarchy > other middling groups. The last of these, the other middling groups, is poorly represented, so not much can be concluded from it. Broadly speaking, and as can be clearly seen in Figure 2, these results suggest that the middle groups in the social hierarchy are rather more densely populated by pragmatic noise than the groups at the extremities. It is difficult to explain the reasons for this without significant further study, including a consideration of dispersion, particularly as the differences between the groups are quite small. However, we might observe that it is the characters of middling groups that are often engaged in colloquial interactions, acting as foils for the main characters (witness the Nurse, as discussed in the previous section). Some clues to the functional characteristics of particular social status groups can be found in the rank-ordered lists in Table 6. Perhaps the most striking feature is the position of fie. It is dominant in the middle of the social hierarchy, especially the professional group, but also other middling and to some extent gentry. It is present in reasonable quantities in the nobility group and also monarchy, but barely exists at the bottom of the hierarchy in the ordinary commoners and lowest groups. The two plays in which it occurs most densely are The Merry Wives of Windsor and The Taming of the Shrew. For example, in the former, Mistress Page, who is of gentry status, pours scorn on Master Ford: (9) Fy, fy, M Ford, are you not asham’d?  (The Merry Wives of Windsor 3.3)

In the latter, Grumio, a professional groom, habitually scorns others: (10) Fie, fie on all tired Iades, on all mad Masters […]  (The Taming of the Shrew 4.1)

Note that in both these cases we see the reduplicative form (we take a reduplicative form to involve repeated adjacent constituents which together form a single conventionalized unit). Whilst we have not systematically investigated this, it does seem that the groups in the middle of the hierarchy use reduplicative or repeated forms (as in the Nurse’s speech; see example (4)) more than the groups at the extremities of the social hierarchy. This could obviously be a factor in their relatively high frequencies.

6. Conclusion

One overall finding of our work is that there seems to be more variation in the frequency of pragmatic noise tokens than in pragmatic noise types. Love tragedies seem to attract particular densities of pragmatic noise, probably as a consequence of their extreme emotional ups and downs. In contrast, the pragmatic noise items o, ha, oh, fie, ah, alas and tush maintain a fairly steady presence across all of the plays. We noted how some pragmatic noise items are part of rhetorical figures. Alas, for example, is part of lamentatio. These figures are played out in particular contexts. Thus, death would trigger lamentatio, and hence the utterance of alas (and to some extent ah and alack). Plays that had these contexts would see an increase in the density of pragmatic noise. We found that female characters have a greater density of pragmatic noise tokens, perhaps partly as a consequence of their frequent function in the plays as an emotional mirror, but also perhaps as a reflection of the male stereotype of women as more emotional. More specifically, there is evidence that female characters are constructed to perform the emotions of grief and sorrow, through alas, alack and ah, whereas male characters are constructed as exercising power through the direction of events with items such as tush, holla and hush. The distribution of pragmatic noise across social status groups proves complex. Whilst overall the higher a character is in terms of social status the more pragmatic noise they use, that could simply reflect the fact that higher status characters get to speak so much more than lower status ones; in other words, they have more opportunities to use pragmatic noise. However, if we factor in the amount that each character speaks overall (i.e. we consider normalized frequencies), we are more likely to hear characters in the middle of the social hierarchy using pragmatic noise than those at the extremes. This may be because those characters also tend to function as emotional foils for the main characters. These middle characters gravitate in particular to the form fie. We also noted that they seem to tend towards reduplicative or repeated forms, which would help account for their relatively high densities of use. We have of course only scraped the surface of what is possible. Future studies would benefit from more probing of specific forms in their contexts.
In addition, a more sophisticated statistical model, one that combines a number of variables and examines dispersion, would help shed further light on these patterns. Nevertheless, we hope to have shown that studying pragmatic noise and how it varies across genres and groups of characters is a fascinating and worthwhile enterprise.

Funding

The research presented in this chapter was supported by the UK’s Arts and Humanities Research Council (AHRC), grant reference AH/N002415/1.




References

Ameka, F. 1992. Interjections: The universal yet neglected part of speech. Journal of Pragmatics 18(2–3): 101–118. https://doi.org/10.1016/0378-2166(92)90048-G
Archer, D. & Culpeper, J. 2003. Sociopragmatic annotation: New directions and possibilities in historical corpus linguistics. In Corpus Linguistics by the Lune: A Festschrift for Geoffrey Leech [Łódź Studies in Language 8], A. Wilson, P. Rayson & T. McEnery (eds), 37–58. Frankfurt: Peter Lang.
Bentley, T. 1582. The sixt lampe of virginitie conteining a mirrour for maidens and matrons […]. At the three cranes in the vintree, by Thomas Dawson and Henry Denham, for the assignes of William Seres. London. (STC 1894).
Biber, D., Johansson, S., Leech, G., Conrad, S. & Finegan, E. 1999. Longman Grammar of Spoken and Written English. Harlow: Pearson Education.
Boas, F. S. 1910. Shakspere and his Predecessors. London: John Murray.
Caffi, C. & Janney, R. W. 1994. Toward a pragmatics of emotive communication. Journal of Pragmatics 22(3–4): 325–373. https://doi.org/10.1016/0378-2166(94)90115-5
Culpeper, J. 2019. Lead compiler of the Enhanced Shakespearean Corpus, part of the Encyclopedia of Shakespeare’s Language Project (AHRC grant reference AH/N002415/1). Lancaster: Lancaster University.
Culpeper, J. & Kytö, M. 2010. Early Modern English Dialogues: Spoken Interaction as Writing. Cambridge: CUP.
Gwalther, R. & Bridges, J. 1572. An hundred, threescore and fiftene homelyes or sermons […]. By Henrie Denham, dwelling in Pater noster rowe, at the sign of the starre. London. (STC 25013).
Jucker, A. H. 2002. Discourse markers in Early Modern English. In Alternative Histories of English, R. Watts & P. Trudgill (eds), 210–230. London: Routledge.
Person, R. F. 2009. “Oh” in Shakespeare: A conversation analytic approach. Journal of Historical Pragmatics 10(1): 84–107. https://doi.org/10.1075/jhp.10.1.05per
Taavitsainen, I. 1999. Personality and styles of affect in The Canterbury Tales. In Chaucer in Perspective: Middle English Essays in Honour of Norman Blake, G. Lester (ed.), 218–234. Sheffield: Sheffield Academic Press.

Chapter 3

Keywords that characterise Shakespeare’s (anti)heroes and villains

Dawn Archer and Alison Findlay

Manchester Metropolitan University / Lancaster University

This chapter undertakes a keyword analysis of seven Shakespearean characters: Titus, Tamora, Aaron, Lear, Edmund, Macbeth and Lady Macbeth. It discusses how, once contextualised, these keywords provide useful insights into the characters’ feelings and thoughts towards others and events, their motivations to act, and so on. In terms of findings, only Aaron denotes his “villainy” directly. Tamora, in contrast, draws upon a keyword that is denotatively positive; in context, though, “sweet” reveals her womanly wiles. “Weep”, for Lear, and “legitimate” and “base”, for Edmund, problematize their status as (one-dimensional) villains. Macbeth and Lady Macbeth draw upon the grammatical keywords “if” and “would”, respectively, in ways that signal something about their (deteriorating) emotional and social positions as much as their villainous intentions.

Keywords: context, keywords, log ratio, Shakespeare and villainy

1. Introduction

Professor Merja Kytö is well known both for her interest in “involved” texts – spoken/speech-related, historical and contemporary – and for her work in ensuring others have access to rich resources that can be interrogated using corpus linguistic techniques. In line with these interests, this chapter draws upon a new resource, developed as part of Lancaster University’s AHRC-funded Encyclopedia of Shakespeare’s Language project, which allows researchers to explore Shakespeare’s plays using statistical keyword methods. We will demonstrate how this technique can benefit – by confirming, refuting or advancing – existing literary understandings of Shakespearean depictions of villainy (Sections 4–7.2). Three of the seven Shakespearean characters under analysis – Titus, Tamora and Aaron – are taken from an early play: Titus Andronicus (1594). The remaining four – Lear, Edmund, Macbeth and Lady Macbeth – appear in two of the later tragedies: King Lear (c.1605 but revised for the Folio) and The Tragedy of Macbeth (c.1606).

https://doi.org/10.1075/scl.97.03arc © 2020 John Benjamins Publishing Company


Even quick guides like Quennell and Johnson’s Who’s Who in Shakespeare (2013: 1) indicate the complexity of these particular characters. They note of Aaron, “the forebear of other Shakespearean villains” (ibid.), that he is simultaneously:

[…] a heartless Machiavel, an advocate of ‘policy and stratagem’, and ‘chief architect and plotter’ of the tragic events [of the play]; the evil Moor of Christian tradition… distinguished by…cruelty; [and] above all…the direct descendent of the figure of Vice in the medieval morality plays. (Quennell & Johnson 2013: 1)

However, they go on to point out how his behaviour towards his son humanizes him beyond the personified figure of Vice (ibid.). Quennell and Johnson (ibid: 64) describe Edmund as “a witty and attractive villain” who is nonetheless less guilty than the figure on which Shakespeare based him. Longer studies such as Charney (2012: 100) are less flattering, though, describing Edmund as being “without much scruple” and “cunning like Iago”. Charney spares Lady Macbeth the villain label, in spite of conceding that she shares her husband’s “murderous and savage thoughts” (ibid: 86). Macbeth, in contrast, is argued to have “a special status in Shakespeare as a villain-hero”, in part because of how he “agonizes…over his ill-doing” (ibid.). Following our description of the resource used in this study (Section 3) and the keyword methodology adopted (Section 4), we discuss the extent to which the keyword results for these characters confirm them as (anti)heroes or villains. We begin, however, with an outline of similar Shakespeare-focussed keyword studies within the pragma-stylistic tradition, as a means of situating our work (Section 2).

2. Background

The use of corpus linguistic approaches to analyse Shakespeare is now well established. Studies range from fine-grained investigations of particular characters (Archer & Bousfield 2010; Culpeper 2002, 2009) to investigations across the whole body of plays, exploring themes (Archer et al. 2009 on love) or specific language features (Beatrix Busse 2006 on vocatives, Ulrich Busse 2002 on second-person pronouns, Culpeper & Oliver this volume on pragmatic noise). As well as adding some much-needed empirically based findings to the large and long-established body of qualitative literary critical work, these quantitative studies have provided useful insights into the way Shakespeare used language to construct different types of individuals, settings and plots.
Culpeper (2002: 21) has suggested, for example, that the Nurse in Romeo and Juliet is “dispositionally emotional” (i.e. affected by and reacting to the traumatic events of the play) based on her use of the keywords “god”, “warrant”, “faith”, “marry” and “ah”. Two of Juliet’s grammatical keywords – “if” and “yet” – are similar in that they are occasioned by the unfolding events. This study builds on such work in the pragma-stylistic tradition. Our aim is to analyse the words spoken by each character, in context, paying particular attention to why they are spoken and to whom. While this reduces each ‘character’ to a collection of words, we demonstrate how the keyword approach can nonetheless be used to explore the dramatized expression of their feelings and thoughts, and from this, their potential motivations (Archer & Lansley 2015). We explain our methodology in more detail in Section 4, after describing the resource drawn upon in this study.

3. Resource drawn upon

The AHRC-funded Encyclopedia of Shakespeare’s Language project (Grant Ref: AH/N002415/1) uses “computers to identify patterns of use across Shakespeare’s works” (Culpeper, forthcoming) that can be difficult to detect otherwise. This chapter draws on the project’s core dataset, namely, electronic versions of the 36 plays of the First Folio (1623) plus Pericles and The Two Noble Kinsmen from Quarto 1 (downloadable from the Internet Shakespeare Editions [ISE] website). Project-focussed enhancements made to the electronic plays are explained in detail in Culpeper and Oliver (this volume) and so will not be discussed here beyond highlighting two that are specifically designed to improve the accuracy of results derived from corpus linguistic techniques like keyword analysis (Baron & Rayson 2008). First, every original spelling within each play has been checked and, when relevant, regularised manually, according to the criteria in Culpeper (forthcoming), aided by VARD. Second, each play text has been annotated using a customised instance of the CLAWS4 part-of-speech tagging software (Garside & Smith 1997), based on a variant of the C6 tagset designed to account for the language of the period.
In brief, seventeenth-century vocabulary has been added to the tagger’s twentieth-century lexicon, and tags have been added to the tagset so that we can capture the second-person singular pronouns thou, thee, etc., and thus achieve better verb agreement (with dost, didst, etc.). This is necessary because the default CLAWS tagger achieves only 89% accuracy on Shakespearean text even with spelling regularisation (Rayson et al. 2007). Although the aforementioned enhancements have raised this figure, it still falls short of CLAWS’s 95–97% success rate for Present-day English. In consequence, the part-of-speech annotation has also been manually post-edited to correct all errors at the major word-class (verb/noun/adjective/etc.) level. Whilst the possibility of human error remains, accuracy is thus “as close to 100%” as can be achieved at this time (Culpeper forthcoming).


4. Keyword approach adopted

Previous studies like Culpeper (2002, 2009) have tended to draw on one statistical measure when determining keywords and what they might tell us about a particular character (and/or their relationships with others). We follow the approach of the Shakespeare Language project, however, and draw on a cutting-edge three-step process. The first measure, Log-Likelihood (LL), is an indicator of statistical significance, that is, how much evidence we have for a given difference between two wordlists (in our case, the wordlist for a character compared with the wordlist for the play in which they appear, minus their turns). The results presented here are filtered using an LL cut-off of 6.63, meaning each keyword has a minimum confidence level of 99% (p < 0.01). The second measure, Log Ratio (LR), is an effect size statistic used to sort the keyword list such that the quantitatively largest differences are at the top of that list (http://cass.lancs.ac.uk/log-ratio-an-informal-introduction/). Because LR is the binary logarithm of the ratio of relative frequencies, each increase of 1 indicates a doubling of how many times more frequent the word is in a particular character’s wordlist than in the full play wordlist (minus their turns). In order to build the analysis on only the most prominent differences among those shown to have a sufficient evidence base (using the LL filter), this chapter reports on keywords with an LR between 1 and 7 (making them between two and 128 times more frequent in the relevant character’s wordlist, relatively speaking). Finally, we have restricted the analysis to keywords with a minimum frequency of 5, on the understanding that a qualitative interpretation of a word’s use in a character’s speech is difficult when it occurs fewer than five times.
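To make the three filters concrete, the following is a minimal Python sketch of how candidate words might be scored, filtered and sorted. It assumes the standard two-corpus log-likelihood (G2) formula and computes Log Ratio as the binary logarithm of the ratio of relative frequencies, replacing a zero frequency with 0.5 (a common adjustment, assumed here rather than taken from the project). The function names, constants and word counts are hypothetical illustrations, not the Encyclopedia of Shakespeare’s Language project’s own code.

```python
import math

LL_CUTOFF = 6.63          # chi-square critical value for p < 0.01 at 1 degree of freedom
LR_MIN, LR_MAX = 1.0, 7.0  # LR range reported in this chapter
MIN_FREQ = 5               # minimum frequency in the character's wordlist


def log_likelihood(freq_char: int, size_char: int, freq_rest: int, size_rest: int) -> float:
    """Two-corpus log-likelihood (G2): a character's words vs. the rest of the play."""
    total = freq_char + freq_rest
    expected_char = size_char * total / (size_char + size_rest)
    expected_rest = size_rest * total / (size_char + size_rest)
    ll = 0.0
    # An observed frequency of 0 contributes nothing (0 * ln 0 is taken as 0).
    if freq_char > 0:
        ll += freq_char * math.log(freq_char / expected_char)
    if freq_rest > 0:
        ll += freq_rest * math.log(freq_rest / expected_rest)
    return 2 * ll


def log_ratio(freq_char: int, size_char: int, freq_rest: int, size_rest: int) -> float:
    """Binary log of the ratio of relative frequencies; zeros replaced by 0.5 (assumption)."""
    f_char = freq_char if freq_char > 0 else 0.5
    f_rest = freq_rest if freq_rest > 0 else 0.5
    return math.log2((f_char / size_char) / (f_rest / size_rest))


def keywords(char_counts: dict, size_char: int, rest_counts: dict, size_rest: int) -> list:
    """Apply the frequency, LL and LR filters, then sort by LR (largest first)."""
    results = []
    for word, freq_char in char_counts.items():
        freq_rest = rest_counts.get(word, 0)
        if freq_char < MIN_FREQ:
            continue
        if log_likelihood(freq_char, size_char, freq_rest, size_rest) < LL_CUTOFF:
            continue
        lr = log_ratio(freq_char, size_char, freq_rest, size_rest)
        if LR_MIN <= lr <= LR_MAX:
            results.append((word, freq_char, round(lr, 2)))
    return sorted(results, key=lambda item: item[2], reverse=True)


# Hypothetical counts: a character speaking 5,000 words vs. 20,000 words in the rest of the play.
char_counts = {"gold": 8, "the": 150, "moor": 4}
rest_counts = {"gold": 2, "the": 600, "moor": 10}
print(keywords(char_counts, 5_000, rest_counts, 20_000))  # [('gold', 8, 4.0)]
```

In this invented example, “gold” is 16 times more frequent (relatively) in the character’s speech, so its LR is log2(16) = 4, and its LL of about 16.6 clears the 6.63 cut-off. “the” is used at the same relative rate by the character and the rest of the play (LL = 0), and “moor” falls below the minimum frequency of 5. Because LL on its own is non-directional, the lower LR bound of 1 is what restricts the list to words the character overuses.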
Taking significance and effect size into account, as well as a minimum frequency filter, tends to generate fewer keywords for consideration but also ensures both qualitative and quantitative robustness. This is important, given that keyword generation is the first stage of a two-stage approach, the second of which involves checking the concordance lines of each keyword as a means of determining its use in context. In our case, we are particularly interested in keywords that divulge something about the seven characters’ villain vs. hero status. Section 5 therefore first reports all generated keywords; we then select specific keywords from Table 1 to discuss in their context-of-use.

5. Keyword results for the seven characters

Table 1 shows those keywords (LL ≥ 6.63) that occur five times or more in the seven characters’ wordlists, ordered according to their LR values (provided in brackets, following each keyword’s frequency).




Table 1.  Keywords for the seven selected Shakespearean characters (frequency/LR in brackets)

Character | Keywords
AARON | gold (8/6.6), villainy (5/4.92), black (6/3.6), keep (7/2.82), set (5/2.6), empress (14/1.65)
TAMORA | ear (5/2.7), fear (5/2.51), Andronicus (15/2.09), sweet (11/1.79), revenge (9/1.77), at (13/1.46)
TITUS | sea (7/5.09), eat (5/4.61), ha (5/4.61), girl (5/3.61), service (5/3.61), drink (5/3.61), sirrah (5/3.61), Publius (8/3.28), read (7/3.09), get (6/2.87), Marcus (29/2.68), hold (8/2.28), mine (10/2.02), earth (10/2.28), tribune (12/1.87), tear(s) (25/1.68), come (44/1.25), these (28/1.5), they/them (63/1.03)
KING LEAR | dower (5/5), ha (10/4), weep (8/3.67), Regan (13/3.67), cause (7/2.9), kill (7/2.48), boy (10/2.41), daughter (29/1.89), her (26/1.57), she (42/1.23)
EDMUND | legitimate (5/6.55), base (5/6.55), brother (11/2.88), sword (6/2.28), business (5/2.38), by (17/1.34), father (14/1.23)
MACBETH | tomorrow (8/5.09), born (7/3.9), till (14/2.31), &/and (10/2.09), blood (14/1.73), fear (23/1.22), if (23/1.15)
LADY MACBETH | without (6/2.52), bed (5/2.45), would (15/1.38), your (27/1.14)


As is clear from these results, very few of the keywords (which are spoken by the characters rather than about them) see characters self-identify as villains. Only Aaron has “villainy” as a keyword, although Edmund’s keyword “base” draws on the original, opprobrious meaning of “villain” as a “low-born, base-minded rustic” (OED 1a). Only a few of the other keywords – “blood”, “kill”, “fear”, “revenge” – are associated with villainous behaviour. The second stage of our methodology – qualitative analysis of the context-of-use for a selection of keywords from Table 1 – is thus crucial to establish how each character’s usage of a word helps define them in relation to heroism and villainy. Viewed collectively, the results include both keywords to do with aboutness (i.e. content relating to the plots) and keywords that are more grammatical in nature but reveal potential character traits. Pronouns feature as keywords for three of the seven characters (Titus, Lear and Lady Macbeth) and, as the following sections reveal, allude to their (often tumultuous) relationships with others. The characters from King Lear also have kinship terms as keywords (“daughter”, “brother”), providing our first signal that family divisions act as a driver for the villainy of this particular play (see Section 7). Other “grammatical keywords” (Culpeper 2009) of note (“if”, “till”, “without”, “would”) highlight the anxious state of the two Macbeth characters (see Sections 8.1 and 8.2). Macbeth also uses the emotion-related term “fear” more than other characters in his play (statistically speaking). “Fear” is also a keyword for Tamora, but (contra Macbeth) it is something she deliberately causes others to feel (cf. Sections 6.2 and 8.1). The patriarchs, Titus and Lear, also draw on emotion-related words similar to each other’s: “tear(s)” and “weep”.
In each case, they allude to their heartfelt despair, occasioned by the treatment of others, but it is Lear alone who comes to recognise his part in precipitating that treatment and to demonstrate remorse in consequence (cf. Sections 6.3 and 7.1).

6. Discussion of the Titus Andronicus characters

Titus Andronicus foregrounds villainy through its reworking of the genre of revenge tragedy. A victorious Titus returns to Rome, having taken Tamora, Queen of the Goths, her sons, and her Moorish lover prisoner. When Titus offers Tamora’s eldest son as a sacrifice, Tamora schemes to marry Saturninus, new Emperor of Rome, so that, aided by Aaron, she can take revenge on Titus and his family.

Chapter 3.  Shakespeare’s (anti)heroes and villains 37



6.1 Aaron

Multiple instances of two keywords – “black” and “villainy” – confirm Quennell and Johnson’s (2013: 1) depiction of Aaron as “the evil Moor of Christian tradition”. Aaron chooses wickedness over goodness to the point of delighting in it (like the Vice figure of medieval drama). Having tricked Titus into chopping off his hand, for example, he informs spectators: “how this villainy / Doth fat me with the very thoughts of it, / Let fools do good, and fair men call for grace, / Aaron will have his soul black like his face” (3.1). Although “villainy” has no statistical collocates, he uses the keyword alongside other negative terms, for example, “rape and villainy” (2.1), “villainy and vengeance” (2.1) and “Mischief, Treason, Villainies” (5.1). Aaron’s identity as a “slave” (5.1 and 5.3), moreover, leads him to self-identify with the opprobrious associations of “villain” in its original sense of low-born. His social aspirations are demonstrated by the co-occurrence of his keywords “villainy” and “gold”. Assuming kinship with the spectators, Aaron informs them that “To bury so much Gold under a tree” (2.3) may seem illogical but that, in fact, “this gold must coin a stratagem, / Which cunningly effected, will beget / A very excellent piece of villainy” (2.3). He affectionately addresses the “sweet Gold” (2.3) as an accomplice in his plot to implicate two of Titus’s sons in the murder of Bassianus, revealing the “bag of gold” as evidence (2.3). Aaron’s social ambition is further evidenced in his use of the keywords “villainy” and “gold” with reference to Tamora. Aaron concedes that his “Empress” has a “sacred wit / To villainy and vengeance” that is a match for his (2.1). He likens her to the “golden sun”, but goes on to claim she is a slave to his love, allowing him to “mount her pitch” (2.1). Although Aaron’s legacy (as a figure of Vice) explains his delight in plotting, his rationale is more complex than the superficial “ambition and vague desire for revenge” that Quennell and Johnson suggest.
His paternal affection shows that he is more nuanced than a stereotypical character denoting evil. Indeed, his use of the terms “black” and “slave” (4.2 and 5.1) to address his newborn son challenges early modern cultural assumptions about blackness and villainy. He asks “is black so base a hue?” when his son is referred to as a devil and threatened with death (see also White 1997). He also suggests “Coal-black is better than another hue” when challenging Chiron and Demetrius to recognise their half-brother as their equal. Aaron’s keywords, when considered collectively, thus suggest that even his refusal to repent for the “thousand dreadful things” he has done against the Romans, and his wish to do “a thousand more” (5.1), may express a race-specific desire for revenge against the racist hegemony that labels him a villain. Put simply, it is more than a reversion to a Vice-like role as “the personification of evil” (Quennell & Johnson ibid: 1).

38

Dawn Archer and Alison Findlay

6.2 Tamora

When viewed in their context-of-use, three of Tamora’s keywords – “sweet”, “ear” and “fear” – hint at her feminine wiles. After being made Empress, she appeals to her “sweet” Emperor to “pardon what is past” before encouraging him and Titus to come together (1.1). Spectators’ suspicions about her are confirmed, however, when she confides to Saturninus that she will “find a day to massacre…all” the Andronici. Her next public display of sweet-talking, when she advises her “sweet emperor” that they “must all be friends”, assuring her “sweet heart” she “will not be denied” (1.1), is thus blatantly deceptive. The “revenge of the villains”, as Bowers (2015: 112) calls it, is not a single act on Tamora’s part. It requires, instead, her complicity in her “sweet” Moor’s maiming of Titus and his kin. That Tamora is as deliberately villainous as Aaron becomes evident in turns where she brags about her sweet-talking abilities to Saturninus, telling him she “will enchant the old Andronicus, / With words more sweet, and yet more dangerous / Than baits to fish, or honey-stalks to sheep” (4.4). “Ear” is significant in this regard too, with Tamora claiming she “can smooth and fill his aged ear, / With golden promises, that were his heart / Almost Impregnable, his old ears deaf, / Yet … both ear and heart” would “obey her tongue” (4.4). Whilst the keyword “fear” might seem appropriate to a prisoner of war, its contexts of use reveal a fearless rather than fearful Tamora. Three of its five instances collocate with “not”. Two “fear not” instances occur at points where Tamora is attempting to deceive others (in 1.1 and 2.3) by putting them at their ease. The third admonishes Saturninus at the point when he fears a public uprising (4.4).
Tamora also draws upon the keywords “fear” and “ear” menacingly when, disguised as Revenge, she tells Titus: “There’s not a hollow Cave or lurking place … / Where bloody Murder or detested Rape, / Can couch for fear, but I will find them out, / And in their ears tell them my dreadful name” (5.2).

6.3 Titus

Titus’s keywords point to him being a character of extremes: a hero and an anti-hero/villain. His absolute loyalty to Roman tradition and “service” (a keyword for Titus) helps to explain his belief that sacrifices – like slaying Alarbus – have to be made regardless of the cost if they ensure his own slain “brethren” can “rest” in “eternal sleep” (1.1). The keyword “mine” captures both Titus’s (at times competing) attempts to control and protect the Andronicus family of which he is head, and his inseparable link to Rome, however badly it treats him. He draws on “mine” when swearing allegiance to the newly elected Emperor, Saturninus: “I hold me Highly Honoured of your Grace, / And here in sight of Rome to Saturnine / …do I Consecrate, / My Sword my Chariot… / Mine Honour’s Ensigns humble at thy feet” (1.1). Such loyalty perpetuates additional sacrifices for Titus, including rejecting his disobedient son Mutius as “no son of mine” before slaying him. Titus’s keyword “tears”, which occurs 25 times, is a prime example of how keywords can allude to characters’ emotional states. In Titus’s case, we see him forgo the Roman stoicism that meant he “never wept” (3.1), to the point of being overwhelmed on learning of Lavinia’s mutilation. He likens his “girl” (a keyword that Titus only uses in reference to Lavinia) to “the weeping welkin” and himself to “the earth”, before lamenting how “earth with her continual tears” has “Become a deluge, overflow’d and drown’d” (3.1). The keywords related to weeping most clearly exemplify Titus as tragic hero rather than villain. Receiving the heads of Martius and Quintus, alongside his own severed hand, marks a change for him, however, and leads to “our fearless hero brutally exact[ing] revenge upon [his] equally vicious opponents”, to quote McDonald (2000: 5). Laughing hysterically, Titus declares “I have not another tear to shed” (3.1), hence the keyword “ha”. The following scenes then see him pass from grief through madness into single-minded revenge, as he comes to appreciate the full extent of Tamora and Aaron’s plot against him. The tribal nature of this revenge is signalled through the grammatical keywords “they” and “them”, which Titus uses to objectify his enemies. After promising to “o’er reach them in their own devices” (5.2), he instructs a mutilated Lavinia to get “them [Chiron and Demetrius] ready” for the banquet (5.2). The banquet amounts to a grotesque perversion of a Eucharistic feast, with Tamora’s sons served up to her in a pie (hence the keyword “eat”). Titus appears to believe he was, albeit violently, righting wrongs as opposed to acting villainously: “They ravished [Lavinia], and cut away her tongue, / And they, t’was they that did her all this wrong” (5.3).
This inability to see himself as a wrong-doer or to show remorse (cf. Charney 2012) is what ultimately problematizes Titus’s hero status, in our view.

7. Discussion of the King Lear characters

The villainy in King Lear comes about because of the family divisions Lear triggers when dividing his kingdom. Lear wants his daughters – Goneril, Regan and Cordelia – to make public declarations of their love in return for portions of his kingdom. Goneril and Regan comply and receive land for themselves and their husbands. Cordelia’s refusal to engage in the same way sees her disowned. Gloucester’s bastard son, Edmund, meanwhile, deceives his father into believing that Edgar (Edmund’s older, legitimate brother) is trying to kill Gloucester for his land. Edmund is named heir in consequence. Edmund then betrays his father to Regan (when Gloucester offers help first to Lear and then to Cordelia) and pursues romantic relationships with both Regan and Goneril (with the intention of cementing his power further).


7.1 Lear

When studied in their context-of-use, Lear’s keywords confirm that the villain/hero dichotomy is far too simplistic for a protagonist who claims he is “a man / More sinned against than sinning” (3.2). Lear’s statistical overuse of the keywords “Daughter”, “she” and “her” is unsurprising given the plot. Lear draws upon “she/her” to emphasize Cordelia’s (decreased) transactional worth after she fails to flatter him as her sisters had done: “When she was dear to us, we did hold her so, / But now her price is fallen” (1.1). It is here, too, that Lear makes his final use of the keyword “dower”, warning Cordelia’s suitor, Burgundy, that she is now “Unfriended, new adopted to our hate, / Dowered with our curse,” before telling him to “Take her or, leave her”. These utterances, together with Lear’s disowning of Cordelia by calling her his “sometime Daughter” (1.1), are so negatively loaded that they problematize Ray’s (2007: 98) assessment of Lear as the “undisputed hero of the play”. Lear’s uses of the keyword “cause” can be understood as self-centered markers of his egotistical nature: a characteristic he displays for most of the play. It is only when he is reunited with Cordelia that he begins to (be able to) see from the perspective of others. He appreciates that Cordelia has “some cause” (4.6) for wanting to do him wrong, for example, while his other daughters “have not.” Lear’s uses of “she/her” in his attempt to revive Cordelia’s dead body are further indications of his remorse at being ultimately responsible for her demise. He desperately hopes “she lives” while acknowledging “she’s dead as earth” and “I might have saved her, now she’s gone for ever” (5.3). Lear’s use of “she/her” in reference to Goneril is always negative, often to an extreme: “If she must teem, / Create her child of Spleen” (1.4). When Goneril reduces Lear’s retinue of knights, he regards this particular “Daughter” as “a disease” which contaminates his “flesh” (2.2).
He admits, in addition, to being “ashamed” that Goneril had the “power to shake [his] manhood”, causing him to shed “hot tears” that, from his perspective, she did not deserve. He then draws on the keyword “ha”, but as a surge feature designed to heighten his rejection of her: “Ha? Let it be so. I have another daughter” (1.4). “Regan” is the only named daughter to be a keyword for Lear. Most mentions occur in Act 2, Scene 2, where Lear is (still) hopeful she will be more sympathetic towards him than Goneril: “Beloved Regan, / Thy Sister’s naught: oh Regan … thou wilt not believe … O Regan” (2.2). The eight instances of the keyword “weep” again reveal something about Lear’s transformation (or development) as a character. Six instances are accompanied by negation and/or “I”, and demonstrate Lear’s egotistical self-preoccupation for much of the play. He states “I’ll not weep”, in spite of having “full cause of weeping”, for example (2.2; see also 3.4, 4.5 and 4.6). Once reunited with Cordelia, though, he focuses on the grief he has caused her: “I pray weep not” (4.6) and determines that when they are imprisoned together, their captors will starve “ere they shall make us weep” (5.1). The keyword is replaced by Lear’s howls and tears when he enters with the corpse of Cordelia, who has been hanged in the prison, thus marking his maturation into a tragic hero.

7.2 Edmund

Edmund’s keywords in context modify Quennell and Johnson’s view of Edmund’s “self-awareness and delight in his own villainy” (2013: 63). Four of Edmund’s keywords – “base”, “legitimate”, “brother” and “father” – point to his obsession with social status and, crucially, his desire to achieve legitimacy, or to get “to the Legitimate” (1.2), in order to escape from the role of “base” villain. He muses over why he is considered “base”, unworthy and inferior, such that he cannot inherit his father’s estate (“Why Bastard? Wherefore base?” (1.2); “With baseness Bastardy? Base, Base?” (1.2)), before going on to plot against his brother, Edgar: “Well then, / Legitimate Edgar, I must have your land…fine word: Legitimate. / Well, my Legitimate, if this Letter speed, / And my invention thrive, Edmund the base / Shall to the Legitimate” (1.2). As this turn reveals, Edmund’s plan to “have lands by wit” “if not by birth” (1.2) depends on projecting his own villainy (and social inferiority) onto “a Brother Noble” (1.2) who does not suspect Edmund’s own “villainous” plotting, thereby transforming Edgar into a base outcast. Edmund’s success depends, in turn, upon “A Credulous Father,” Gloucester, who is ready to believe Edmund’s tale that Edgar is the “villain” (1.2) trying to seize his father’s estate. Edmund subsequently plots to have Gloucester apprehended as a “villain” to the state (3.7) in order to advance his own rise to legitimate power. “Sword” is also a keyword for Edmund, because he uses it as a key prop in the play. He persuades his father of Edgar’s supposed treachery by cutting himself with a sword (1.2); he attempts to defend his usurped position as Gloucester’s noble, legitimate heir by duelling with the “villain-like” Edgar (5.3); and, prior to dying, he attempts (in vain) to reprieve Lear and Cordelia (5.3) using his sword as a symbol.
This last endeavour and his pathetic claim that “Edmund was beloved” (5.3) are marks of the nobility/legitimacy that is his real goal, and they save him from being a one-dimensional villain who can only gloat about his evil “practices” over his brother and father (cf. Charney 2012).


8. Discussion of the Macbeth characters

Villainy in Macbeth is triggered by prophecies given by the three “weird sisters”, who tell Macbeth he will become Thane of Cawdor and then King of Scotland. Macbeth conspires with Lady Macbeth to murder King Duncan in consequence, but is unable to prevent Banquo’s descendants from becoming the country’s future kings (as the sisters foretold).

8.1 Macbeth

Macbeth’s tentative claim to the throne throughout the play is indicated by his grammatical keyword “if”, first seen when he contemplates assassinating Duncan:

If it were done when ’tis done, then ’twere well
It were done quickly: if the assassination
Could trammel up the consequence, and catch
With his surcease success […]
We’d jump the life to come.
(1.7)

When the conditional is drawn upon again, it is as part of the fateful question “If we should fail?” (1.7) and, hence, serves to indicate Macbeth’s emerging insecurity about sovereignty. This, in conjunction with the weight of Macbeth’s growing guilt, accounts for the keyword “fear”. Macbeth confesses to “fear[ing]…Banquo”, in particular. When Banquo’s son Fleance escapes Macbeth’s murder plot, his “doubts and fears” (3.4) intensify further. The keyword “blood”, a verbal equivalent to Lady Macbeth’s incessant washing of her hands, signifies Macbeth’s “agoniz[ing] with himself over his ill-doing” (Charney 2012: 86). He acknowledges “I am in blood / Stepped in so far that, should I wade no more / Returning were as tedious as go o’er” (3.4), for example. Such guilt gives Macbeth “a special status in Shakespeare as a villain-hero”, according to Charney (ibid). His haunting by an interminable line of Banquo’s descendants means he is doomed to continue his murderous course. The keyword “&/and” is repeated when Macbeth recalls a “seventh”, an “eighth” and “many more” kings in a line stretching out to the “crack of doom”, while “Banquo smiles upon [him] / And points to them” (4.1). It echoes alongside the keyword “tomorrow” in Macbeth’s nihilistic vision of the future following his wife’s suicide: “Tomorrow, and tomorrow, and tomorrow… And all our yesterdays have lighted fools / The way to dusty death” (5.5). The six instances of the keyword “born” are all used as part of a phrase denoting a caesarean section (5.3, 5.7, 5.8) and, as Macbeth learns, ultimately refer to Macduff, the thane who slays him at the play’s end, thus fulfilling the final part of the weird sisters’ prophecy.




8.2 Lady Macbeth

Charney (2012: 86) is confident that Lady Macbeth is not a villain, in spite of her having “murderous and savage thoughts”. The keywords suggest a slightly different interpretation: that of a villain who is then haunted by the error of her ways. The keywords “would” and “without” capture her manipulative disposition, for example. “Would” collocates with “thou” (x6) and “yet” (x3), the latter of which is also used with “without” when Lady Macbeth considers the likelihood of her husband taking the necessary steps to make himself king. She thinks he “would’st be great” and is “not without ambition”, and “wouldst [aim] highly”, for example, but also feels he “wouldst not play false” and “wouldst” “win” “holily” because he is “too full o’th milk of human kindness” (1.5). Lady Macbeth’s symbolically feminine keyword “without” expresses her own overwhelming sense of lack. She longs for the spirits to “unsex me here”, and lambasts Macbeth for any sign of feminine weakness: “My hands are of your colour; but I shame / To wear a heart so white” (2.2). In the banquet scene, she scolds him for being “unmanned by folly” (3.4) when he sees the ghost of Banquo. Like her husband, though, Lady Macbeth ultimately realizes they are trapped by lack: “Naught’s had; all’s spent / When our desire is got without content”, and can only live in negative terms: “things without all remedy / Should be without regard” (3.2). Her sleepwalking scene (5.1) rhythmically repeats the keyword “bed” amongst other negatives: “To bed, to bed … What’s done cannot be undone. To bed, to bed, to bed” (5.1). It inverts the earlier murder scene, where she urges Macbeth to wash his hands and get to bed, even though he can “sleep no more” (2.2). Ironically, it is now the sleeping Lady Macbeth who is without (as in outside) the bed and cannot rest.
With hindsight, she inverts the ambitious impulse that guided her earlier use of the conditional “would” to lament “who / Would have thought the old man to have had so much / Blood in him” (5.1). This unintended confession ultimately leads to a public condemnation of the Macbeths as villains: “this dead butcher and his fiend-like queen”. Yet such a judgement seems hugely reductive, based on the keywords in context.

9. The seven Shakespearean characters: Hero, anti-hero or villain?

Our aim in this chapter was to demonstrate how a corpus linguistic technique like keyword analysis can benefit existing literary understandings of Shakespearean depictions of behaviour deemed villainous, by confirming, refuting or advancing them. Our keyword results were gleaned using a cutting-edge, three-step filtering process that leads to fewer but arguably more robust results, quantitatively and qualitatively speaking, than would be achieved by relying on LL alone. Each of the generated keywords was then checked (via concordance lines) to reveal those that, when studied in context, tell us something about the characters’ malicious/heroic qualities. Aaron proves to be the only character of the seven to have “villainy” as a keyword, though Edmund cleverly projects his own villainy onto others, driven by the patrilineal system’s exclusion of him as base-born (the original meaning of villain). Our analysis of keywords-in-context shows that both base-born villains are more complex than the Vice figures of medieval morality plays. Aaron’s use of “villainy” along with the keywords “black” and “gold” reveals his ambition to challenge the stereotypical connections between race and villainy (in the sense of both social inferiority and evil behaviour) for his son, if not for himself. Edmund’s keywords “base” and “legitimate” reveal that he is a villain driven by a desire to attack, albeit with the ultimate intention of inserting himself “to the legitimate” (1.2), to win a place in the patrilineal system which excludes him on the basis of his birth. Our keyword analyses likewise uncover much about the tragic heroines Tamora and Lady Macbeth. Tamora begins as a tragic victim but grows into a towering figure of revenge, modelled on classical feminine anti-heroes like Medea (albeit aided and abetted by Aaron’s villainous plots). We have focussed on Tamora’s use of “sweet” in order to highlight her use of speech to flatter and manipulate. Such a depiction makes her an early example of a characteristically feminine villainy (Tassi 2011; Pollard 2017). Lady Macbeth is as manipulative as Tamora, but her chosen methods involve “unsex[ing] herself” (Charney, 2012: 86) and emasculating Macbeth with the aid of keywords such as “would” and “without”.
As the play progresses, the “would” and “without” keywords also provide us with a window into this character’s insecurity: in particular, the sense of lack that ultimately leads to her undoing through confession and suicide. Titus is a military hero turned villain by tragic circumstances (that he inadvertently set in motion himself). We see him as a villain more than a hero (or even a villain-hero), in spite of his ultimately shedding “tears”, due to his inability to see himself as a wrong-doer or to show remorse (cf. Charney 2012). King Lear also sheds tears. Unlike Titus, however, Lear’s early cruelty to Cordelia and egotistical indifference to others are slowly transformed, as signalled by his keywords “she”/“her”, “cause” and “weep”, together with the collocates of “weep”, “I” and “not”. The suffering he endures at the hands of others, along with his increasing acknowledgement of his folly and responsibility, makes Lear “a man more sinned against than sinning” (3.2), meaning that even the villain-hero label is too simplistic to account for his growth in tragic awareness. Macbeth is arguably a victim as well as an agent of villainy. Charney (2012: xvii) describes him as a “villain-hero” because he is perpetually tormented by his own guilt, and this overwhelming sense of anxiety/dread (if not guilt) is evidenced by the keywords “if” and “fear”.




Taken collectively, these results, we believe, provide a convincing argument that keyword analyses can illuminate the linguistic patterns that give nuance to characters whose actions are morally reprehensible or questionable, while also grounding previous (literary) understandings of these characters’ thoughts, feelings and intentions in (empirical) linguistic analysis.
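For readers less familiar with the LL statistic against which the three-step filtering process is compared, the core keyness calculation can be sketched in a few lines of Python. This is a minimal, hedged illustration rather than the procedure used in this study. It assumes, as is usual in keyword analysis, that LL denotes the two-corpus log-likelihood statistic; it treats one character’s speech as the target and the other characters’ speech in the same play as the reference; and it applies a cut-off of 6.63 (the chi-square critical value for p < 0.01 at one degree of freedom). The function names, the threshold and the toy token lists are illustrative assumptions, and neither the further filtering steps nor the concordance checking described above is reproduced.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Two-corpus log-likelihood (LL) keyness score for a single word.

    a: frequency of the word in the target (e.g. one character's lines)
    b: frequency of the word in the reference (e.g. the other characters' lines)
    c: total number of tokens in the target
    d: total number of tokens in the reference
    """
    total = a + b
    expected_target = c * total / (c + d)
    expected_reference = d * total / (c + d)
    ll = 0.0
    # A zero observed frequency contributes nothing (0 * ln 0 is taken as 0).
    if a > 0:
        ll += a * math.log(a / expected_target)
    if b > 0:
        ll += b * math.log(b / expected_reference)
    return 2 * ll


def keywords(target_tokens, reference_tokens, threshold=6.63):
    """Return (word, LL) pairs for words overused in the target, highest first.

    Assumes both token lists are non-empty. threshold=6.63 is an assumed
    cut-off (p < 0.01, 1 d.f.), not necessarily the one used in this chapter.
    """
    target_counts = Counter(target_tokens)
    reference_counts = Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    results = []
    for word, a in target_counts.items():
        b = reference_counts.get(word, 0)
        # Keep positive keywords only: relatively more frequent in the target.
        if a / c <= b / d:
            continue
        score = log_likelihood(a, b, c, d)
        if score >= threshold:
            results.append((word, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy token lists, invented purely for illustration.
    target = ["fear"] * 5 + ["the"] * 5
    reference = ["the"] * 10
    print(keywords(target, reference))
```

In the toy example, “fear” occurs only in the target and passes the threshold, whereas “the” is used at the same relative rate in both lists and is discarded by the overuse check. A real study would also lemmatise or normalise spelling before counting, which this sketch does not attempt.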

References
Archer, D. & Bousfield, D. 2010. ‘See better, Lear’? See Lear better! A corpus-based pragmastylistic investigation of Shakespeare’s King Lear. In Language and Style, D. McIntyre & B. Busse (eds), 183–203. Houndmills: Palgrave Macmillan. https://doi.org/10.1007/978-1-137-06574-2_12
Archer, D. & Lansley, C. 2015. Public appeals, news interviews and crocodile tears: An argument for multi-channel analysis. Corpora 10(2): 231–258. https://doi.org/10.3366/cor.2015.0075
Archer, D., Culpeper, J. & Rayson, P. 2009. Love – “a familiar or a devil”? An exploration of key domains in Shakespeare’s comedies and tragedies. In What’s in a Word-list? Investigating Word Frequency and Keyword Extraction, D. Archer (ed.), 137–158. Farnham: Ashgate.
Baron, A. & Rayson, P. 2008. VARD2: A tool for dealing with spelling variation in historical corpora. Presented at the Postgraduate Conference in Corpus Linguistics, Aston University, 22 May 2008.
Bowers, F. T. [1959] 2015. Elizabethan Revenge Tragedy, 1587–1642. Princeton NJ: Princeton University Press.
Busse, B. 2006. Vocative Constructions in the Language of Shakespeare [Pragmatics & Beyond New Series 150]. Amsterdam: John Benjamins. https://doi.org/10.1075/pbns.150
Busse, U. 2002. Linguistic Variation in the Shakespeare Corpus [Pragmatics & Beyond New Series 106]. Amsterdam: John Benjamins. https://doi.org/10.1075/pbns.106
Charney, M. 2012. Shakespeare’s Villains. Madison NJ: Fairleigh Dickinson University Press.
Culpeper, J. 2002. Computers, language and characterisation: An analysis of six characters in Romeo and Juliet. In Conversation in Life and in Literature: Papers from the ASLA Symposium [Association Suédoise de Linguistique Appliquée 15], U. Melander-Marttala, C. Ostman & M. Kytö (eds), 11–30. Uppsala: Universitetstryckeriet.
Culpeper, J. 2009. Keyness: Words, parts-of-speech and semantic categories in the character-talk of Shakespeare’s Romeo and Juliet. International Journal of Corpus Linguistics 14(1): 29–59. https://doi.org/10.1075/ijcl.14.1.03cul
Culpeper, J. Forthcoming. General introduction. Encyclopedia of Shakespeare’s Language [Arden Shakespeare]. London: Bloomsbury.
Garside, R. & Smith, N. 1997. A hybrid grammatical tagger: CLAWS4. In Corpus Annotation: Linguistic Information from Computer Text Corpora, R. Garside, G. Leech & A. McEnery (eds), 102–121. London: Longman.
McDonald, R. (ed.). 2000. Titus Andronicus. New York: Pelican Shakespeare.
Oxford English Dictionary, 2nd rev. ed. (nd). Oxford: OUP.
Pollard, T. 2017. Greek Tragic Women on Shakespearean Stages. Oxford: OUP. https://doi.org/10.1093/oso/9780198793113.001.0001
Quennell, P. & Johnson, H. 2013. Who’s Who in Shakespeare. London: Routledge.
Ray, R. 2007. William Shakespeare’s King Lear [Atlantic Critical Studies]. New Delhi: Atlantic Publishers.
Rayson, P., Archer, D., Baron, A., Culpeper, J. & Smith, N. 2007. Tagging the bard: Evaluating the accuracy of a modern POS tagger on Early Modern English corpora. In Proceedings of the Corpus Linguistics Conference: CL2007, M. Davies, P. Rayson, S. Hunston & P. Danielsson (eds). Birmingham: University of Birmingham.
Tassi, M. A. 2011. Women and Revenge in Shakespeare: Gender, Genre and Ethics. Selinsgrove PA: Susquehanna University Press.
White, J. S. 1997. “Is black so base a hue?” Shakespeare’s Aaron and the politics and poetics of race. College Language Association Journal 40: 336–366.

Chapter 4

Revealing speech: Agentivity in Iago’s and Othello’s soliloquies

Juhani Rudanko

Tampere University

In several of Shakespeare’s plays, soliloquies serve as a window into the speaker’s mind and offer a view of the world at the time of the soliloquy. The framework of analysis in this chapter is that of semantic roles, with the focus on the Agent. The author develops a view of the Agent based on a cluster of selected semantic features, and applies it to thematically linked soliloquies in Othello. Each subject in the set of soliloquies is considered with respect to its agentivity, or lack of it, on the basis of the nature of the predicate in question. Iago’s soliloquies are seen to be higher in agentivity than Othello’s, revealing Iago as a “doer” and Othello as someone being acted upon.

Keywords: agent, agentivity, Shakespeare’s soliloquies, Shakespeare’s Othello, Iago in Othello

1. Introduction

This chapter seeks to shed light on an aspect of the nature of soliloquies by employing a linguistic method of analysis. A further general purpose is to define, and to refine, the method of analysis itself by applying it to selected soliloquies in a major Shakespearean tragedy. To outline the structure of the chapter: in this section, brief comments are offered on the nature of soliloquies in Shakespeare’s plays, and the soliloquies to be considered are identified. Section 2 is devoted to a discussion of the Agent and agentivity, which define the particular linguistic method used in this chapter. In Section 3, the linguistic method is applied to the selected data, and in a concluding section the findings are summarized and research questions for further work are outlined. Previous work on soliloquies in Shakespeare includes both specialized studies, for instance by Clemen (1964, 1987), Muir (1964) and Gingrich (1978), and comments on them in publications with a more general orientation, for instance by A. C. Bradley ([1904] 1978), Ellis-Fermor (1948), Hussey (1982), and Skiffington (1985). A point of departure is provided by the OED definition of the term “soliloquy,” which makes no specific reference to Shakespeare:

An instance of talking to or conversing with oneself, or of uttering one’s thoughts aloud without addressing any person. (OED, s. v., sense 1)

https://doi.org/10.1075/scl.97.04rud © 2020 John Benjamins Publishing Company

With more specific reference to Shakespeare, Clemen begins his 1964 article by emphasizing the variety of Shakespeare’s soliloquies:

The first thing that strikes us about Shakespeare’s soliloquies [note omitted], if we compare them with those of any other dramatist, is their extraordinary variety. (Clemen 1964: 1)

Despite the variety of Shakespeare’s soliloquies stressed by Clemen, some properties can be identified in them, at least in the case of Shakespeare’s mature plays, including Othello. Gingrich formulates one such important property concisely. She notes that the “soliloquy is delivered to the audience while a character is alone on stage” (Gingrich 1978: 8). Hirsch (2003), a recent critic who has written about soliloquies, might disagree with Gingrich’s stress on the audience, suggested by “delivered to the audience,” because he argues that Shakespearean soliloquies represent “self-addressed speech” (Hirsch 2003: 26). However, it would hardly be appropriate for a Shakespearean character to whisper a soliloquy to himself or herself on the stage. The role of the audience should not be underestimated, and Gingrich’s view is adopted here. In some other Shakespearean plays there may be some scope for the speaker to suspect the presence of one or more eavesdroppers on the stage, with differing degrees of certainty, but in the case of the soliloquies examined here, this complication does not arise. Another property of soliloquies is also worth noting. It is a general assumption in scholarship on Shakespeare that “by convention … what a character says in a soliloquy is to be taken as sincere,” at least within the limits of what he or she knows (Hussey 1982: 182). Many years earlier, the famous critic A. C. Bradley said in a similar vein that “with Shakespeare the soliloquy generally gives information regarding the secret springs as well as the outward course of the plot” (Bradley [1904] 2007: 166). Hirsch writes that in a soliloquy a “character may give voice to schemes, hopes, regrets, fears, and so on, that the character would not confide even to his closest friend” (2003: 27).
Hirsch stresses the novelty of his approach, but making such revelations in a soliloquy is hardly different from giving “information regarding the secret springs … of the plot,” as pointed out by Bradley over a century ago. Several critics have also drawn attention to the physical aspect of the Shakespearean stage: the actors were surrounded by spectators when Shakespeare’s plays were performed, and as a consequence the physical openness of the stage promoted a sense of immediacy in the case of a soliloquy (cf. Styan [1967] 1988: 73). A Shakespearean character engaging in a soliloquy generally does not refer to the presence of an audience, but the physical set-up was conducive to a sense of intimacy between the character and the audience. Some commentators do not make a distinction between soliloquies and asides. The present author recognizes that it may sometimes be difficult to distinguish them, but he still prefers to follow the tradition in which they are not lumped together. It should also be borne in mind that asides are of different types. One type is the so-called conversational aside. Such asides are addressed to some subset of those on the stage, and as such they are easy to separate from soliloquies. Another type of aside may be illustrated with an example from Othello. The textual extracts in this chapter are from the 1974 Riverside edition, edited by Evans (1974), chosen as a standard edition of the plays.

Iago.

Kind gentlemen, let’s go see poor Cassio dress’d;
Come, mistress, you must tell ’s another tale.
Emilia, run you to the citadel,
And tell my lord and lady what hath happ’d.–
Will you go on afore? [Aside.] This is the night
That either makes me, or foredoes me quite.
Exeunt. (V.i.123–128)

Exeunt, the stage direction for leaving the stage, comes only after Iago’s last sentence in both the Folio (Hinman 1968: 335) and the Quarto (Allen & Muir 1981: 89), and neither of these sources designates Iago’s preceding remark “This is the night / That either makes me, or foredoes me quite” as an aside. It is designated as an aside in the 1974 Riverside edition (Evans 1974) used here. In any event, it is clear enough from the content of the remark that it cannot have been meant to be heard by the others on stage. This type of aside has been called a “solo aside” by Beckerman (1962: 186), and it is “spoken while other characters are present – and known to be present by the speaker – but unheard by them” (Sprague 1935: 68). A solo aside is typically short, and it seems appropriate to keep it separate from a soliloquy. As most critics do, the present study keeps soliloquies separate from asides. Even so, it is worth bearing in mind that the “Elizabethan playwright did not necessarily think in terms of soliloquy or aside, but more simply of audience address,” as Gingrich (1978: 13) puts it. When asides are kept separate from soliloquies in Othello, Iago has six soliloquies and Othello three. As regards the selection of soliloquies to be considered, in an ideal world the investigator would examine soliloquies in the same act and the same scene, preferably adjacent to each other. This would ensure that the soliloquies

50

Juhani Rudanko

occur in circumstances that are identical for both speakers. However, such an idealized state of affairs is unrealistic in the study of an actual play, and it seems sufficient to select soliloquies that are fairly close to each other and that are linked because they relate to the same thematic concerns in the unfolding events of the play. To achieve thematic coherence in this sense, the focus is on Iago’s soliloquies in II.iii.336–362, II.iii.382–388, and III.iii.321–329, which are his fourth, fifth, and sixth soliloquies, and on Othello’s first soliloquy in III.iii.258–279. These three soliloquies by Iago focus on his main plot, which is also a concern of Othello’s first soliloquy; this ensures a degree of thematic coherence.
2. The analytic framework: Agentivity and the Agent role
A key question in this study concerns the notion of an Agent. That notion can be seen against the broad background of language as providing the “communicative resources for the definition and enactment of (past, present, and future) realities” (Duranti 2004: 451), with the Agent role being one facet of these communicative resources. In more narrowly semantic terms, the notion of Agent and the related notion of agentivity are based on work on semantic roles, also known as theta roles, or case roles, as they were called in early work, for instance in Fillmore (1968). Two early definitions give us a point of departure for understanding agentivity. Fillmore (1968) used the term Agentive for what is here called an Agent, and defined it as follows:
Agentive (A), the case of the typically animate perceived instigator of the action identified by the verb. (Fillmore 1968: 24)

Gruber’s much less famous definition, published a year earlier than Fillmore’s, approached the issue from the perspective of what he called an agentive verb, defining it as follows:
An Agentive verb is one whose subject refers to an animate object which is thought of as the willful source or agent of the activity described in the sentence. (Gruber 1967: 943)

Gruber considered and compared the verbs look and see. The former is normally agentive and the latter normally non-agentive, and Gruber notes insightfully that several linguistic criteria may bear on the identification of agentive verbs. In particular, he identified three such criteria: substitution by do something, compatibility with manner adverbials such as carefully, and compatibility with purpose clauses (1967: 943; for the purpose clause criterion, see also Gruber 1976: 160). As regards the first of these, substituting a phrasal category such as do something for a verb, which is a lexical category, may not be the optimal way of phrasing the criterion, but Gruber’s examples show clearly enough what he had in mind. For instance, consider (1a) to (1c), from Gruber’s article.
(1) a. What John did was look through the glass.
    b. John looked through the glass carefully.
    c. John looked into the room to learn who was present.

In each case, replacing the form of the verb look with the corresponding form of the verb see would result in a less natural sentence. (Gruber (1967: 943) in fact stars the variants with see.) Gruber’s contribution (see also Gruber 1976) was, and still remains, valuable because of the linguistic correlates or reflexes of agentivity that he looked for and proposed. Later contributions to work on agentivity include Cruse (1973). An important point that he makes concerns the naturalness of happen paraphrases with non-agentive constructions. For instance, consider the non-agentive nature of (2a) and the felicity of the happen paraphrase in (2b).
(2) a. John died in a car accident.
    b. What happened to John was that he died in a car accident.

A do paraphrase would be less likely in this case: ?What John did was die in a car accident. Cruse (1973) speaks of nouns as being Agents, but to bring the analysis of semantic roles in line with argument structure considerations, it is more appropriate to say that noun phrases (NPs) are Agents, simply because the subject argument of a sentence is always a phrasal category. In considering the agentivity of a subject NP, it is also helpful to take into account the whole verb phrase (VP) of the sentence, and not only the verb. This is because the object of a verb may have a significant effect on the interpretation of the verb and of the subject, as noted by Marantz (1984: 25–26) and Chomsky (1986: 59–60), who considered the verb break and pointed to examples such as He broke a window and He broke his arm. Of these, the predicate of the former is much more likely to be agentive than that of the latter. These refinements are taken into account in the remainder of the discussion. At the same time, these examples illustrate that the analyst needs to pay attention to the context of use when assessing whether a predicate is agentive. Lakoff (1977: 244) is a further important contribution. He raised the question of what properties define prototypical agent–patient sentences and proposed a list of 14 such properties. Some of them are narrow in scope, for instance number 11, “the agent uses his hands, body, or some instrument,” and number 14, “the agent is looking at the patient” (Lakoff 1977: 244), and they are less useful for that reason. Overall, however, the article is important here because it offers an approach to defining an Agent on the basis of a set, or cluster, of properties or features. This feature-based methodology is the one the present author has adopted in his own and co-authored work on English grammar and its evolution, including Rudanko (2017) and Gentens and Rudanko (2019), and a set of features is also used in this study to define Agents.1 As in Rudanko (2017) and Gentens and Rudanko (2019), three features are singled out here from Lakoff’s list. They are given below with Lakoff’s (1977: 244) numbering, but with the initial letter of the semantic role capitalized and the personal pronouns in the second and third features expressed in a more gender-neutral way:
4. the Agent’s action is volitional.
5. the Agent is in control of what he or she does.
6. the Agent is primarily responsible for what happens (his or her action and the resulting change).
The first criterion might also be expressed with the concept of “volitional involvement in the event or state,” which is prominent in Dowty’s (1991: 572) analysis of what he terms the Proto-Agent. Regarding the notion of control, the second feature, see also Berman (1970). The three features are semantic in nature and can be summed up as volitionality, control and responsibility. They are also prominent in Hundt’s (2004: 49) comments on agentivity. They are considered here in conjunction with the three syntactic criteria from Gruber (1967, 1976) referred to above. For instance, the predicate of the sentence John looked through the glass, cited from Gruber (1967: 943) above, is agentive because it encodes a conceptualization of an event in which the referent of the subject acts volitionally (of his free will), is in control of the action and can be regarded as responsible for it.
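To make the feature-based definition concrete, the three semantic features can be encoded as a small record with a decision function. The Python sketch below is illustrative only and is not the procedure used in this study, where every judgment is made by the analyst in context. It assumes a conjunctive reading (a predicate counts as agentive only when volitionality, control and responsibility all hold), the names `Predicate` and `is_agentive` are hypothetical, and the feature values for the examples from this section are hand-assigned for a typical context of use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """A predicate token with hand-assigned values for the three semantic features."""
    text: str
    volitional: bool   # feature 4: the subject referent acts volitionally
    in_control: bool   # feature 5: the subject referent controls the action
    responsible: bool  # feature 6: the subject referent is primarily responsible


def is_agentive(pred: Predicate) -> bool:
    """Assumed conjunctive reading: the subject is an Agent only if all three features hold."""
    return pred.volitional and pred.in_control and pred.responsible


# Hand-assigned judgments for examples discussed above, in a typical context of use.
examples = [
    Predicate("John looked through the glass", True, True, True),
    Predicate("John died in a car accident", False, False, False),
    Predicate("He broke a window", True, True, True),
    Predicate("He broke his arm", False, False, False),
]

for p in examples:
    label = "A" if is_agentive(p) else "-A"
    print(f"{label:>2}  {p.text}")
```

Because the feature values are supplied by hand, the sketch only records the outcome of the contextual judgment; it does not replace the syntactic diagnostics (do something substitution, manner adverbials, purpose clauses) on which such judgments draw.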
The term “agentive” is applied here both to the predicate and to its subject, which has the semantic role of Agent assigned by the predicate. That is, an agentive predicate has an Agent as its subject. By contrast, predicates that do not assign the Agent role to their subjects are non-agentive. The three semantic properties listed, supplemented with the syntactic criteria from Gruber (1967, 1976) referred to above, enable the investigator to decide on agentivity or its absence with a fair degree of confidence, and also make it possible, with the help of context, to reach a decision for predicates that may be ambiguous when taken out of context. Perhaps the most famous example of such a predicate was brought up by Jackendoff (1972), who pointed to a sentence such as Max rolled down the hill and made this comment:
On one reading Max may be asleep and not even be aware of his motion. On the other reading he is rolling under his own volition; for this reading he must be an Agent. (Jackendoff 1972: 34)
1. Rudanko (1993: Chapter 4) also considers the application of semantic roles to Othello. However, the present study differs from the earlier one in that essential use is made here of a cluster of features, which was not central to the earlier study. A second major difference is that the present study focuses on the dichotomy between Agents, on the one hand, and non-Agents, on the other, which was not a feature of the earlier study. As a consequence, the present investigation proceeds along different lines from the earlier study.

Jackendoff’s sentence has an artificial flavor, as invented sentences often do, but it serves to emphasize the importance of attending to the context of use in assessing whether a predicate is agentive. (For illustrations of verbs that may be used agentively or non-agentively, see also Perlmutter 1978: 162.) To recall one of the syntactic criteria identified by Gruber (1967: 943; 1976: 160), the addition of a purpose clause would resolve the ambiguity, as in Max rolled down the hill in order to attract attention. When an addition of this type is compatible with the context of the sentence, the subject is an Agent. The three semantic features of agentivity listed above, volitionality, control and responsibility, may be compared with those offered by Duranti (2004) in what he puts forward as a “working definition of agency”:
Agency is here understood as the property of those entities (i) that have some degree of control over their own behavior, (ii) whose actions in the world affect other entities’ (and sometimes their own), and (iii) whose actions are the object of evaluation (e.g. in terms of their responsibility for a given outcome). (Duranti 2004: 453)

Those working in syntax and semantics often prefer to speak of agentivity rather than agency, but there is still considerable overlap between the present approach and Duranti’s. In particular, the element of control is important in both, and responsibility is a precondition for evaluation. As for Duranti’s second point, agentive actions often do affect other entities, and this is especially clear in sentences with an Agent and a Patient (or Undergoer). In intransitive sentences such as Max stood up, however, this aspect is less prominent (even though the subject NP would normally be interpreted as an Agent), and affecting other entities is therefore not used as a criterion in the present analysis. A further feature typically shared by agentive predicates is that they readily permit imperative uses. Taylor expresses this well:
Prototypically, an imperative instructs a person to do something, and is therefore only acceptable if a person has a choice between carrying out the instruction or not. (Taylor 2003: 31)


An imperative such as Look through the glass! is entirely natural, confirming the agentive nature of the predicate in question. It can also be taken for granted that an imperative has an understood agentive subject.
3. Analysis of selected soliloquies by Iago and Othello
The three soliloquies by Iago selected in Section 1 for consideration in this chapter are given in (3), (4) and (5), with their predicates analyzed with respect to agentivity on the basis of the discussion in Section 2. In (3), as in later representations, verbs of agentive predicates are marked in bold and those of non-agentive predicates are underlined. (Only the verbs are marked, for presentational convenience; it must still be borne in mind that semantic roles are assigned by predicates, not by the verbs alone.) The markings in the margin are A and -A, standing for agentive and non-agentive predicates, and each marking is placed on the same line as the verb of the predicate it pertains to. The line numbers in Evans (1974) are given in the rightmost column.
(3) Iago.

And what’s he then that says I play the villain,    A, A        -A      336
When this advice is free I give, and honest,        A           -A
Probal to thinking, and indeed the course
To win the Moor again? For ’tis most easy           A           -A
Th’ inclining Desdemona to subdue                   A                   340
In any honest suit; she’s fram’d as fruitful                    -A
As the free elements. And then for her
To win the Moor, were[’t] to renounce his baptism,  A, A        -A
All seals and symbols of redeemed sin,
His soul is so enfetter’d to her love,                          -A      345
That she may make, unmake, do what she list,        A, A, A, A
Even as her appetite shall play the god             A
With his weak function. How am I then a villain,                -A
To counsel Cassio to this parallel course,          A
Directly to his good? Divinity of hell!                                 350
When devils will the blackest sins put on,          A
They do suggest at first with heavenly shows,       A
As I do now; for whiles this honest fool            A
Plies Desdemona to repair his fortune,              A, A
And she for him pleads strongly to the Moor,        A                   355
I’ll pour this pestilence into his ear –            A
That she repeals him for her body’s lust;           A
And by how much she strives to do him good,         A, A
She shall undo her credit with the Moor.            A
So will I turn her virtue into pitch,               A                   360
And out of her own goodness make the net            A
That shall enmesh them all. (II.iii.336–362)        A
Totals                                              27          7

One frequently recurring feature of language deserves comment at this point: the analysis of non-finite subordinate clauses, as in “… she strives to do him good” in line 358. The non-finite subordinate predicate “do him good” has no expressed subject, but the assumption is made here that non-finite subordinate clauses have their own understood subjects. This assumption was made by major traditional grammarians, including Jespersen, who put the point as follows:
Very often a gerund stands alone without any subject, but as in other nexuses (nexus-substantives, infinitives, etc.) the connexion of a subject with the verbal idea is always implied. (Jespersen [1940] 1960: 140)

The same principle is also taken for granted in much current work (see for instance Postal 1970; Landau 2013) for several reasons, including the consideration that postulating understood subjects makes it possible to represent the argument structures of the verbs involved. The principle is accepted here. This is why, for instance, “… she strives to do him good” contains two Agent roles, as marked in the margin: the first is the role of the overt NP she and the second that of the understood, or covert, subject of the subordinate predicate. Similar considerations apply to other non-finite clauses in the extract above and in the other extracts. The two passives “is framed as fruitful as …” and “is so enfettered to her love …” also deserve comment. In English grammar it is customary to distinguish between dynamic (or verbal) passives, describing an event, on the one hand, and adjectival passives, describing a state resulting from some prior event (Huddleston & Pullum 2002: 1436), on the other (see also Wasow 1977). To illustrate the distinction with an example from Huddleston and Pullum (2002), the sentence They were married may have a verbal interpretation, as in They were married last week in London, or an adjectival interpretation, as in Hardly anyone knew they were married – that they had been for over ten years (Huddleston & Pullum 2002: 1436). Sentences that are ambiguous out of context can normally be disambiguated in their contexts. With respect to semantic roles, understood or implicit Agents can be postulated for dynamic passives, but not for adjectival passives.


As far as the two passives in the extract, “is framed as fruitful as …” and “is so enfettered to her love …,” are concerned, both are adjectival because they are stative. They are about a state that has resulted from “some prior event,” to hark back to Huddleston and Pullum (2002: 1436), and not about the person or entity that brought about the event or is responsible for it, to recall one of the properties of agentivity highlighted in Section 2. (On the backgrounding of the responsibility of the person or entity that brought about the event in passives more generally, see also van Oosten 1984: 231 and Wanner 2009: 186.) As a consequence, the two predicates in question are analyzed as non-agentive. Overall, however, agentive predicates are remarkably frequent in the soliloquy: 27 of the 34 predicates marked are agentive. Iago’s fifth soliloquy is given in (4):
(4) Iago.

Two things are to be done:                          A
My wife must move for Cassio to her mistress –      A
I’ll set her on –                                   A
Myself a while to draw the Moor apart,              A                   385
And bring him jump when he may Cassio find          A           -A
Soliciting his wife. Ay, that’s the way;            A           -A
Dull not device by coldness and delay.              A
Exit. (II.iii.382–388)
Totals                                              7           2

In the first line of the fifth soliloquy, the passive (are to be done) is a dynamic passive, and for such a passive, unlike an adjectival passive, an understood Agent can be postulated. In the last line, the analysis adopts the reading based on the gloss in Honigmann ([1997] 2016: 208), where the imperative is taken to be “addressed to himself,” with the sense “don’t let the plot lose its momentum.” Iago’s sixth and final soliloquy is given in (5):
(5) Iago.

I will in Cassio’s lodging lose this napkin,        A
And let him find it. Trifles light as air           A           -A
Are to the jealous confirmations strong                         -A
As proofs of holy writ; this may do something.      A
The Moor already changes with my poison:                        -A      325
Dangerous conceits are in their natures poisons,                -A
Which at the first are scarce found to distaste,                -A, -A
But with a little act upon the blood
Burn like the mines of sulphur. I did say so.       A           -A
(III.iii.321–329)
Totals                                              4           7
Grand totals for Iago                               38          16




In the first line the verb is lose, part of the VP lose this napkin. Lose often forms part of a non-agentive predicate designating an event that is non-volitional, not under the control of the referent of the subject, and not something he or she is responsible for, as in In spite of his efforts, the candidate lost the election. Even a VP of the type lose one’s napkin might often be used non-agentively, to designate something that happens to the person in question rather than something he or she undertakes volitionally or controls. In Iago’s sentence, however, the VP lose this napkin is clearly used agentively. It expresses a fully deliberate action, part of Iago’s plan and fully under his control, and it could easily be followed by a purpose clause, to recall one of the criteria invoked in Section 2. The subject is therefore classed as an Agent. The example is a good illustration of how each token needs to be considered in its context of use. Overall, the three soliloquies by Iago contain 38 agentive and 16 non-agentive predicates. Turning to Othello, his first soliloquy is given in (6).
(6) Oth.

This fellow’s of exceeding honesty,                                 -A
And knows all [qualities], with a learned spirit,                   -A
Of human dealings. If I do prove her haggard,           A                   260
Though that her jesses were my dear heart-strings,                  -A
I’ld whistle her off, and let her down the wind         A, A
To prey at fortune. Haply, for I am black,              A           -A
And have not those soft parts of conversation                       -A
That chamberers have, or for I am declin’d                          -A, -A  265
Into the vale of years (yet that’s not much),                       -A
She’s gone. I am abus’d, and my relief                              -A, -A
Must be to loathe her: O curse of marriage!             A           -A
That we can call these delicate creatures ours,         A
And not their appetites! I had rather be a toad                     -A      270
And live upon the vapor of a dungeon,                               -A
Than keep a corner in a thing I love                    A           -A
For others’ uses. Yet ’tis the plague [of] great ones,              -A
Prerogativ’d are they less than the base;                           -A
’Tis destiny unshunnable, like death.                               -A      275
Even then this forked plague is fated to us                         -A
When we do quicken. Look where she comes,               A, A        -A
If she be false, [O then] heaven [mocks] itself,        A           -A
I’ll not believe’t.                                                 -A
(III.iii.258–279)
Totals                                                  10          21


There are some Agents in Othello’s soliloquy, but overall it abounds in non-agentive predications, with subjects that are non-Agents. These include simple adjectival predicates, as in I am black (line 263), and stative passives, as in Prerogativ’d are they less … (line 274) and Even then this forked plague is fated to us (line 276). They also include the predicates with the verbs live (line 271), love (line 272) and believe (line 279), taking account of the markings in Cook (1989: 211, 213). The numerical findings are represented in Table 1.

Table 1.  Numerical findings on agentive and non-agentive predicates in the soliloquies investigated (raw frequencies, relative frequencies and frequencies normalized per 100 words)2

                          Iago’s soliloquies   Othello’s soliloquy
+Agentive   freq.         38                   10
            rel. freq.    70%                  32%
            norm.         11.0                  5.4
−Agentive   freq.         16                   21
            rel. freq.    30%                  68%
            norm.          4.6                 11.4
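The figures in Table 1 and the test statistics reported in the next paragraph can be rechecked with a few lines of standard-library Python. The per-soliloquy totals are those given under extracts (3) to (6), and the word counts are those in note 2. The formula choices are assumptions of this sketch rather than statements in the study: the chi-squared value uses the textbook Yates-corrected formula for a 2 × 2 table, the p-value uses the fact that a chi-squared variable with one degree of freedom exceeds x with probability erfc(√(x/2)), and phi is taken as √(χ²/N) from the corrected statistic, which reproduces the reported 0.35 (the uncorrected phi would be about 0.37).

```python
import math

# Per-soliloquy totals (agentive, non-agentive) from extracts (3)-(6).
iago = {"II.iii.336-362": (27, 7), "II.iii.382-388": (7, 2), "III.iii.321-329": (4, 7)}
othello = {"III.iii.258-279": (10, 21)}

# Word counts from note 2 (contractions lemmatized).
WORDS = {"Iago": 346, "Othello": 185}


def totals(soliloquies):
    """Sum (agentive, non-agentive) counts over a speaker's soliloquies."""
    agentive = sum(a for a, _ in soliloquies.values())
    non_agentive = sum(n for _, n in soliloquies.values())
    return agentive, non_agentive


def yates_chi2(a, b, c, d):
    """Pearson's chi-squared for the 2x2 table [[a, b], [c, d]] with Yates' continuity correction."""
    n = a + b + c + d
    numerator = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


ia, ina = totals(iago)
oa, ona = totals(othello)

# Raw, relative and per-100-word frequencies, as in Table 1.
for speaker, (agent, non_agent) in {"Iago": (ia, ina), "Othello": (oa, ona)}.items():
    total = agent + non_agent
    per100 = 100 / WORDS[speaker]
    print(f"{speaker}: +A {agent} ({agent / total:.0%}, {agent * per100:.1f}/100 words); "
          f"-A {non_agent} ({non_agent / total:.0%}, {non_agent * per100:.1f}/100 words)")

chi2 = yates_chi2(ia, ina, oa, ona)
n = ia + ina + oa + ona
p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
phi = math.sqrt(chi2 / n)                 # phi derived from the corrected statistic
print(f"chi2 = {chi2:.2f}, df = 1, p = {p_value:.4f}, phi = {phi:.2f}")
```

Run as a script, this prints the raw, relative and normalized frequencies for both speakers, followed by chi2 = 10.14 and phi = 0.35; the p-value it prints is well below 0.01.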

The difference in agentive vs. non-agentive predicates between Iago’s and Othello’s soliloquies is statistically significant at the 0.01 level (Chi-squared value: 10.14; df: 1; Pearson’s Chi-squared test with Yates’ continuity correction), but with a relatively small effect size (Phi coefficient: 0.35). There is a clear difference in the normalized frequencies of the two types of predicates. Naturally, some caution should be exercised when evaluating the results: there may be hesitancy in some cases about the application of one or more of the criteria of agentivity, because semantic roles “should be thought of as points delimiting a space where predicates are located,” and “not every predicate is equidistant from the point nearest to it” (Rudanko 1989: 59). The notion of gradience should therefore be recognized in the assessment of agentivity, and the most clearly agentive predicates in all the soliloquies considered are probably the first three in the line “That she may make, unmake, do what she list” in Iago’s soliloquy (II.iii.346), with the contrast between make and unmake emphasizing Iago’s conception of Desdemona’s power. Nevertheless, agentivity is a linguistically defined notion, and as such independently defined, and it can be applied to a literary text to draw conclusions about aspects of the text. The present study of agentivity reveals a significant difference between Iago’s and Othello’s language in their soliloquies, also seen in the normalized frequencies in Table 1, and this finding suggests a difference in their world-views at this point in the play. As Fowler put it some four decades ago, “[c]umulatively, consistent structural options, agreeing in cutting the presented world to one pattern or another, give rise to an impression of a world-view, what I shall call a ‘mind-style’” (Fowler 1977: 76; for a recent full-scale discussion of mind style, see Nuttall 2018). Fowler’s focus was on novels, but the point also holds for drama and for the language of major characters in Shakespearean soliloquies.
2. Normalization based on the following word counts (incl. lemmatized contractions): Iago total: 346 words; Othello total: 185 words.
4. Conclusion
Soliloquies in Shakespeare’s major tragedies offer unique glimpses into the minds of speakers, into their mindsets or world-views and into how they perceive the situation and circumstances in which they find themselves. The notion of agentivity, as understood here, is founded on the semantic notions of volitionality, control and responsibility, and the analysis also draws on three syntactic criteria originally identified by Gruber (1967, 1976). The notion makes it possible to provide a linguistic foundation for judgments that have sometimes been based on intuition alone. In the present case, the study of agentivity shows that in a thematically coherent part of the play, the proportion of Agents is considerably higher in Iago’s soliloquies than in Othello’s soliloquy (70% vs. 32% of predicates). A speaker whose language has a high degree of agentivity conceives of his or her role in the world as that of an actor, someone who sees the world as being “molded and shaped by the deliberate actions of human beings” (Rudanko 1993: 103). Such a person is presented as ready to shape the world around him or her, and as capable of doing so. By contrast, a person whose speech has a low degree of agentivity sees himself or herself as less of an agent, as someone who is acted upon and who yields the initiative to others.
The soliloquy is peculiarly well suited to the study of such issues, because a speaker engaging in a soliloquy is alone and is not affected by the constraints of conversational interaction, in which speakers take turns and react to, and almost always pay attention to, what the previous speaker has said. By viewing the world around him as something that can be shaped by his deliberate actions, Iago exerts, or at least aspires to exert, a noteworthy degree of power and dominance over those around him, and a study of his soliloquies sheds light on his mindset. The present study invites further work on the notion of agentivity and the Agent role, and on the application of these notions, and of semantic roles more generally, to canonical literary and other texts. One avenue for such work would be to investigate whether agentivity should be viewed as a graded notion for the purposes of literary study. A graded notion would imply degrees of agentivity, and one possible way to set up such degrees would be to give each of the features of agentivity used in this study a different weight in assessing agentivity. Alternatively, one could contemplate a system of features different from the one espoused in this chapter, for instance one with an added causativity feature. Such elaborations are left to future work. In any event, since the system of semantic roles is based on linguistic considerations, its application to literary analysis offers one type of bridge between linguistic and more literary branches of study.
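The weighting idea floated above can be sketched as a normalized weighted sum over the three features. Everything specific in the Python block below is hypothetical: the chapter proposes weighting but gives no weights, so the values 2, 1 and 1 (favoring volitionality) are placeholders, and the names `FEATURE_WEIGHTS`, `agentivity_degree` and `is_agentive_binary` are illustrative. With any positive weights, a predicate showing all three features scores 1.0, so a strictly conjunctive binary reading falls out as the special case in which only a full score counts as agentive.

```python
from typing import Mapping

# Hypothetical weights: placeholders only, not values proposed in the study.
FEATURE_WEIGHTS = {"volitional": 2, "in_control": 1, "responsible": 1}


def agentivity_degree(features: Mapping[str, bool],
                      weights: Mapping[str, int] = FEATURE_WEIGHTS) -> float:
    """Weighted share of agentivity features present, from 0.0 (none) to 1.0 (all)."""
    total = sum(weights.values())
    present = sum(w for name, w in weights.items() if features.get(name, False))
    return present / total


def is_agentive_binary(features: Mapping[str, bool]) -> bool:
    """Conjunctive binary reading recovered as the special case of a full score."""
    return agentivity_degree(features) == 1.0


cases = {
    "all three features": {"volitional": True, "in_control": True, "responsible": True},
    "volitional only": {"volitional": True},
    "control and responsibility, not volition": {"in_control": True, "responsible": True},
    "none": {},
}
for label, feats in cases.items():
    print(f"{label}: degree {agentivity_degree(feats):.2f}, "
          f"agentive (binary) {is_agentive_binary(feats)}")
```

For the four sample cases the degrees are 1.00, 0.50, 0.50 and 0.00; only the first is agentive on the binary reading. The tie between the second and third cases shows that a single score can conflate different feature profiles, which is one reason the choice of weights would need linguistic justification.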

References
Allen, M. J. B. & Muir, K. (eds). 1981. Shakespeare’s Plays in Quarto: A Facsimile Edition of Copies Primarily from the Henry E. Huntington Library. Berkeley CA: University of California Press.
Beckerman, B. 1962. Shakespeare at the Globe, 1599–1609. New York NY: The Macmillan Company.
Berman, A. 1970. Agent, experiencer, and controllability. In Mathematical Linguistics and Automatic Translation [Report NSF–24], S. Kuno (ed.), 203–237. Cambridge MA: Harvard University.
Bradley, A. C. [1904] 2007. Shakespearean Tragedy, 4th edn. Houndmills: Palgrave Macmillan.
Chomsky, N. 1986. Knowledge of Language. New York NY: Praeger.
Clemen, W. 1964. Shakespeare’s Soliloquies [The presidential address of the Modern Humanities Research Association]. Cambridge: CUP.
Clemen, W. 1987. Shakespeare’s Soliloquies. London: Methuen.
Cook, W. S. J. 1989. Case Grammar Theory. Washington DC: Georgetown University Press.
Cruse, D. A. 1973. Some thoughts on agentivity. Journal of Linguistics 9(1): 11–23.
Dowty, D. 1991. Thematic proto-roles and argument selection. Language 67(3): 547–619.
Duranti, A. 2004. Agency in language. In A Companion to Linguistic Anthropology, A. Duranti (ed.), 451–474. New York NY: Blackwell.
Ellis-Fermor, U. 1948. The Frontiers of Drama. London: Methuen.
Evans, G. B. (ed.). 1974. The Riverside Shakespeare. Boston MA: Houghton Mifflin.
Fillmore, C. 1968. The case for case. In Universals in Linguistic Theory, E. Bach & R. Harms (eds), 1–88. New York NY: Holt, Rinehart and Winston.
Fowler, R. 1977. Linguistics and the Novel. London: Methuen.
Gentens, C. & Rudanko, J. 2019. The great complement shift and the role of understood subjects. Folia Linguistica 53: 51–87.
Gingrich, M. C. 1978. Soliloquies, Asides, and Audience in English Renaissance Drama. PhD dissertation, Rutgers University.
Gruber, J. S. 1967. Look and see. Language 43(4): 937–947.
Gruber, J. S. 1976. Lexical Structures in Syntax and Semantics. Amsterdam: North-Holland.
Hinman, C. 1968. The Norton Facsimile: The First Folio of Shakespeare. New York NY: W.W. Norton.
Hirsch, J. 2003. Shakespeare and the History of Soliloquies. Madison NJ: Fairleigh Dickinson University Press.
Honigmann, E. A. J. (ed.). [1997] 2016. The Arden Shakespeare: Othello. London: Bloomsbury.
Huddleston, R. & Pullum, G. K. 2002. The Cambridge Grammar of the English Language. Cambridge: CUP.
Hundt, M. 2004. Animacy, agentivity, and the spread of the progressive in Modern English. English Language and Linguistics 8: 47–69.
Hussey, S. S. 1982. The Literary Language of Shakespeare. London: Longman.
Jackendoff, R. 1972. Semantic Interpretation in Generative Grammar. Cambridge MA: The MIT Press.
Jespersen, O. [1940] 1961. A Modern English Grammar on Historical Principles, Part V: Syntax (Vol. 4). London: Allen and Unwin.
Lakoff, G. 1977. Linguistic gestalts. In Papers from the Thirteenth Regional Meeting of the Chicago Linguistic Society, W. A. Beach, S. E. Fox & S. Philosoph (eds), 236–287. Chicago IL: Chicago Linguistic Society.
Landau, I. 2013. Control in Generative Grammar: A Research Companion. Cambridge: CUP.
Marantz, A. 1984. On the Nature of Grammatical Relations. Cambridge MA: The MIT Press.
Muir, K. 1964. Shakespeare’s soliloquies. Ocidente LXVII: 45–58.
Nuttall, L. 2018. Mind Style and Cognitive Grammar: Language and World View in Speculative Fiction. London: Bloomsbury Academic.
Perlmutter, D. 1978. Impersonal passives and the unaccusative hypothesis. In Proceedings of the Annual Meeting of the Berkeley Linguistics Society, Vol. 4, 157–190. Berkeley CA: BLS.
Postal, P. 1970. On coreferential complement subject deletion. Linguistic Inquiry 1: 439–500.
Rudanko, J. 1989. Complementation and Case Grammar. Albany NY: State University of New York Press.
Rudanko, J. 1993. Pragmatic Approaches to Shakespeare. Lanham MD: University Press of America.
Rudanko, J. 2017. Infinitives and Gerunds in Recent English. London: Palgrave Macmillan.
Skiffington, L. 1985. The History of English Soliloquy: Aeschylus to Shakespeare. Lanham MD: University Press of America.
Sprague, A. C. 1935. Shakespeare and the Audience: A Study in the Technique of Exposition. Cambridge MA: Harvard University Press.
Styan, J. L. [1967] 1988. Shakespeare’s Stagecraft. Cambridge: CUP.
Taylor, J. R. 2003. Meaning and context. In Motivation in Language: Studies in Honor of Günter Radden [Current Issues in Linguistic Theory 243], H. Cuyckens, T. Berg, R. Dirven & K. Panther (eds), 27–48. Amsterdam: John Benjamins.
Van Oosten, J. 1984. The Nature of Subjects, Topics and Agents: A Cognitive Explanation. PhD dissertation, University of California at Berkeley.
Wanner, A. 2009. Deconstructing the English Passive. Berlin: Mouton de Gruyter.
Wasow, T. 1977. Transformations and the lexicon. In Formal Syntax, P. E. Culicover, T. Wasow & A. Akmajian (eds), 327–360. New York NY: Academic Press.

Chapter 5

Saying, crying, replying, and continuing: Speech reporting expressions in Early Modern English

Terry Walker and Peter J. Grund

Mid Sweden University / University of Kansas

This chapter investigates the form, frequency, and function of speech reporting expressions in Early Modern English, such as quod she in “I perceiue now [$ (quod she) $] how mishap doth follow me” (CED, D1FGASCO, 1573). We focus on their use in the prose fiction texts of Periods 1 and 3 in A Corpus of English Dialogues 1560–1760 (CED). The study points to developments over time in the distribution of individual verbs as well as of groups of verbs with similar functions. Variation is also evident in the position of the speech reporting expression in relation to the represented speech and in the internal word order of the speech reporting expression (subject+verb or verb+subject).

Keywords: speech representation verbs, direct speech, Early Modern English fiction, semantic-functional categories, word order

1. Introduction

The characteristics of the spoken language of the past have attracted much interest from English historical linguists, an interest that has been deepened by the development and increasing sophistication of historical pragmatics and historical sociolinguistics (see Kytö 2010). At the same time, the absence of actual recordings of spoken language means that, in order to study historical speech, researchers have to contend with the inevitable filter that written representation of spoken language imposes. The filtering mechanisms of written language are many and varied (see Grund & Walker 2020a). While these may seem like a distraction from recreating what historical speech was like, they are in themselves of interest as indicators of how language users signal, manage, and even reflect on the speech they represent (whether actual or fictional).

https://doi.org/10.1075/scl.97.05wal © 2020 John Benjamins Publishing Company

64 Terry Walker and Peter J. Grund

One integral part of representing what someone else has said is the speech reporting expression that “introduces” the represented speech as a representation, as in (1).

(1) [$Then said (^Evangelist^) ,$] If this be thy condition, why standest thou still? [$He answered,$] Because I know not whither to go.  (CED, D3FBUNYA, 1678)1

In the example, said Evangelist and he answered indicate the identity of the speakers and frame what follows as a representation of their (fictional) speech. The nature of these reporting structures has received considerable attention in previous research on Present-day English, especially the type of verbs used in speech representation. Studies have revealed that these verbs are much more than a small set of straightforward, “neutral” markers; instead, they are varied in their semantic and structural makeup, and they are deployed strategically for a number of communicative purposes. However, just like research on speech representation in general, work on speech reporting expressions in the history of English lags behind (see Section 2). We thus know little about the historical characteristics of these speech-marking mechanisms and diachronic trends in their use. This chapter contributes to scholarship on speech reporting expressions (such as said Evangelist and he answered in (1)) by exploring their form, frequency, and function in Early Modern English, focusing on their use in the prose fiction texts from A Corpus of English Dialogues 1560–1760 (CED). In Section 4.1, we chart what verbs (and other expressions) signal direct speech representation, concentrating on Periods 1 and 3 of the CED (spanning the 1560s to 1670s). To gain greater insight into the motivations behind writers’ choices of expression, we use the framework proposed by Caldas-Coulthard (1987, 1994) to classify the markers according to their functions. In 4.2, we also consider structural characteristics of the reporting expressions, such as word order of the subject and reporting verb (where applicable) and the placement of the reporting clause in relation to the represented speech. Overall, our work contributes to the burgeoning field of speech representation in the history of English (see Grund & Walker 2020b). 
As such, and utilizing the CED, this chapter draws on resources and research findings from Merja Kytö’s longstanding and influential work on the dynamics and representation of the spoken language of the past (see Kytö 2010; Culpeper & Kytö 2010).

1. For the coding conventions, including the use of [$…$], which denotes text other than direct speech, and (^…^), which signals that the enclosed item appears in a font other than the usual font in the text, see Kytö & Walker (2006: 32–40).



Chapter 5.  Speech reporting expressions in Early Modern English 65

2. Background

The representation and negotiation of what other people have said plays a significant role in communication in a number of present-day contexts. Not only do we find representation of speech in obvious quoting circumstances such as newspaper reporting and legal testimony, but people frequently repeat, allude to, and comment on others’ speech in conversation, and they use speech representation in creative works for purposes of plot development and characterization (see e.g. Philips 1986; Tannen 1989; Semino & Short 2004). The scholarly interest in all aspects of this phenomenon has created a rich field of speech representation research, encompassing various methodological and theoretical perspectives (for an overview, see Grund & Walker 2020a). Most relevant for our purposes is the work focused on speech reporting expressions, also known as speech tags, speech reporting clauses, quotatives, and inquit clauses, among other terms. These usually take the form of a clause such as They said, but can also appear as a prepositional phrase (e.g. according to them…) or in other, less common, forms. Studies have outlined the grammatical, semantic, functional, and sociolinguistic characteristics of such expressions and their varying deployment in a range of genres and contexts (e.g. Quirk et al. 1985: 1024, note a; Thompson & Yiyun 1991; Urban & Ruppenhofer 2001; Semino & Short 2004; Bevitori 2006). Among other interesting characteristics, and importantly for our focus on the CED prose fiction texts, the range of speech reporting verbs appears to be particularly large and varied in present-day fiction compared to other contexts (e.g. Oostdijk 1990: 238; de Haan 1996: 31–37). It is also clear that the expressions carry functional loads beyond simply drawing readers’ and hearers’ attention to the representation of speech: they often signal the speech reporter’s interpretation of and stance towards the reported speech and possibly the original speaker (e.g. Caldas-Coulthard 1987, 1994; Bevitori 2006).

Systematic historical research in this area is so far limited. However, it is clear that some of the same dynamics of speech reporting expressions existed historically. The verb say appears to have remained the most prominent speech reporting expression across most of the history of English, with the exception of Old English (e.g. Moore 2011; Aijmer 2015; Cichosz 2019). However, significant changes have also occurred, as some expressions have disappeared (such as quoth/quod; Moore 2015; Aijmer 2015; Cichosz 2019) and others have been introduced, including be like (e.g. Urban & Ruppenhofer 2001; D’Arcy 2017). As in present-day language, the inventory of expressions seems to have been particularly rich in literary texts historically, especially in the nineteenth century, when they could play a significant role in characterization (e.g. San Segundo 2016, 2017). The present study adds to our understanding of the diachronic developments of these expressions by giving a detailed picture of their use in fiction before the nineteenth century.


Although we are interested in charting trends for specific expressions, we also want to see how writers deployed groups of expressions with similar meanings and whether changes took place in the distribution of such groups over time. In fictional texts in particular, such groups can be important as they may be implicated in larger dynamics of the creative process, such as casting characters in a particular light or advancing particular themes, as noted above. We here draw on the taxonomy suggested by Caldas-Coulthard (1987, 1994), which has also been used in the study of literary texts historically by, for example, San Segundo (2017) (for another framework, see e.g. Urban & Ruppenhofer 2001: 89). Caldas-Coulthard (1987, 1994) outlines a number of semantic-functional categories and subcategories into which speech reporting verbs in particular fall, from Neutral (say, tell) and Structuring (e.g. ask, answer) to Metapropositional (e.g. agree, complain, direct) and Descriptive (e.g. shout, giggle, murmur) (see Caldas-Coulthard 1987: 162–164). Since not all of the categories proved useful for our analysis (which is a finding in itself), we will not elaborate on the full system here, but we will return to the categories as relevant in our subsequent analysis.

3. Material and methodology

Our data is drawn from A Corpus of English Dialogues 1560–1760 (CED), compiled by Merja Kytö and Jonathan Culpeper. We focus on the CED samples of prose fiction texts, and, in order to begin investigating changes over time in the early modern period, we elected to examine only the first and third periods of the corpus (1560–1599 and 1640–1679) for this initial foray (see Table 1). We chose the first period to get a sense of the conventions in the earliest fictional texts in the CED; the third period, which includes texts written at a time of different forms and aesthetics but is not too far separated in time, seemed to offer a good contrast. We will expand the study in the future to cover one or more further periods. The CED samples of these fiction texts are predominantly stretches of dialogue, with longer narrative stretches summarized by the corpus compilers. The texts fall into a number of subgenres, including chapbooks in the sixteenth century and, in the seventeenth century, a political satire, heroic romances, and a fictionalized criminal biography, among others (Culpeper & Kytö 2010: 39). This genre diversity means that apparent diachronic developments may in fact be attributable to subgenre (or even individual text) differences rather than to overall genre development, as also noted in Grund (2018: 273).




Table 1.  CED Prose Fiction Texts 1560–1599 and 1640–1679 (Kytö & Walker 2006: 19)

File name   Publication date   Short text title              Author             Word count
D1FBOORD    1565               Mad Men of Gotam              Andrew Boorde           4,720
D1FTALES    1567               Merie Tales                   Anonymous               5,550
D1FGASCO    1573               Sundrie Flowres               George Gascoigne        8,170
D1FCOBLE    1590               The Cobler of Caunterburie    Anonymous               9,860
D1FSHARP    1597               Discouerie of the Knights     Edward Sharpham        11,080
D3FMARIA    1641               Marianvs                      Anonymous               8,400
D3FPARLI    1646               The Parliament of VVomen      Anonymous               5,240
D3FFIDGE    1652               The English Gusman            George Fidge            9,270
D3FCRISP    1660               Don Samuel Crispe             Anonymous               3,030
D3FDAUNC    1662               The English Lovers            John Dauncey           11,460
D3FNEWES    1673               The Sack-Full of Newes        Anonymous               2,070
D3FBUNYA    1678               Pilgrim’s Progress            John Bunyan             9,820
Total                                                                                88,670

We chose to focus on speech reporting expressions for direct speech given the limitations of time and space for the current study (excluding e.g. indirect speech examples such as “They answered that they would come”); much of previous research has also focused on direct speech, which facilitates comparisons (e.g. Aijmer 2015; San Segundo 2016, 2017; Cichosz 2019). The speech reporting expressions were extracted by close reading of the texts, supported by the “running text other than direct speech” coding ([$…$]; see Kytö & Walker 2006: 37), as no automated search could be devised that would identify all (and only) examples of direct speech reporting expressions without the need for excessive manual processing. Identifying the direct speech reporting expressions proved relatively straightforward, although there were a few examples where the speech could be interpreted as either direct speech or indirect speech: such ambiguous instances were excluded from the data. We coded the speech reporting expressions for text, period, publication year, lemma of the speech reporting expression, form (past tense, present participle, modal, etc.), nature of the subject (pronominal or nominal), word order (SV, VS) of the speech reporting clause subject and verb (when applicable), and placement of the speech reporting expression in relation to the represented speech (initial, medial, or final). We also coded the expressions according to Caldas-Coulthard’s (1987, 1994) categories (see Section 2), as well as noting any “speech descriptors” (as in “said softly”; see Grund 2018).
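The percentages in Table 2 are each expression’s share of its period’s total. Purely as an illustration, a minimal Python sketch of how coded tokens of the kind described above could be tallied is given here; the record layout (ReportingExpression) and the distribution helper are hypothetical and are not part of the CED or of its coding conventions (Kytö & Walker 2006). Fed the Period 1 lemma counts reported in Table 2, with all other coded fields left empty, it reproduces that column’s percentages.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReportingExpression:
    """One coded direct speech reporting expression (hypothetical record layout)."""
    period: int                      # CED period, 1 or 3
    lemma: str                       # e.g. "say", "quoth/quod", "answer"
    text: Optional[str] = None       # CED file name, e.g. "D1FGASCO"
    subject: Optional[str] = None    # "nominal" or "pronominal"
    order: Optional[str] = None      # "SV" or "VS"
    position: Optional[str] = None   # "initial", "medial" or "final"


def distribution(records, period, field="lemma"):
    """Count the values of one coded field within a period and express each
    count as a percentage of that period's total, rounded to one decimal.
    An empty period yields an empty dict (no division is attempted)."""
    counts = Counter(getattr(r, field) for r in records if r.period == period)
    total = sum(counts.values())
    return {value: (n, round(100 * n / total, 1)) for value, n in counts.items()}


# Period 1 lemma counts as reported in Table 2; "other" stands for the
# aggregated Other expressions row. Only period and lemma are filled in.
TABLE2_PERIOD1 = {"say": 395, "quoth/quod": 406, "answer": 24,
                  "reply": 8, "cry": 5, "other": 28}
records = [ReportingExpression(period=1, lemma=lemma)
           for lemma, n in TABLE2_PERIOD1.items() for _ in range(n)]

if __name__ == "__main__":
    for lemma, (n, pct) in distribution(records, period=1).items():
        print(f"{lemma:12} {n:4} ({pct}%)")
```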


4. Results

4.1 Individual speech reporting expressions and functional categories

Table 2 shows the overall results of our study. Of the 1,497 examples of direct speech reporting expressions from the two periods investigated, only answer, continue, cry, cry out, quoth/quod, reply, and say have a frequency of 1% or more in one or both periods; a large number of other expressions are used very infrequently (see below). Perhaps not surprisingly, the speech reporting expressions almost exclusively involve verbal constructions, usually of the verb+subject or subject+verb kind (see 4.2); indeed, only one noun construction initiates direct speech in our texts: farewell in (2).

Table 2.  Speech reporting expressions in CED prose fiction, Periods 1 and 3

Speech reporting expression   Period 1 (1560–1599)   Period 3 (1640–1679)   Total
say                           395 (45.6%)            402 (63.7%)              797 (53.2%)
quoth/quod                    406 (46.9%)             67 (10.6%)              473 (31.6%)
answer                         24 (2.8%)              53 (8.4%)                77 (5.1%)
reply                           8 (0.9%)              54 (8.6%)                62 (4.1%)
cry                             5 (0.6%)               7 (1.1%)                12 (0.8%)
cry out                       –                       11 (1.7%)                11 (0.7%)
continue                      –                        6 (1.0%)                 6 (0.4%)
Other expressions              28 (3.2%)              31 (4.9%)                59 (3.9%)
Total                         866 (100%)             631 (100%)             1,497 (100%)

(2) [$Whereunto (^F. I.^) made none answere, but departed with this farewel.$] (^My losse is mine owne, and your gayne is none of yours, […]  (CED, D1FGASCO, 1573)

The near-exclusive reliance on verbal constructions mirrors results in studies of present-day fiction (e.g. Semino & Short 2004: 96). The overall picture of the individual verbs is also not dissimilar from present-day patterns in fictional texts, where say is the most common verb in both Oostdijk’s (1990: 238) and de Haan’s (1996: 31) studies; reply, cry, and answer make Oostdijk’s (1990) top-15 list, and said, replied, and continued make de Haan’s (1996) top-20 list.2 There thus seems to be some continuity in fictional representation, although Oostdijk (1990) and de Haan (1996) also list verbs that are not recorded in our texts or are barely used (and hence included in the Other expressions category).

2. It is unclear whether Oostdijk (1990) considered only direct speech representation, and, as seen by the forms, de Haan (1996) reports figures for specific forms rather than lemmas. Neither study cites actual frequencies for the verbs.




Indeed, the overall pattern in our dataset (a few frequent expressions and a number of infrequent ones) is found in both Oostdijk (1990) and de Haan (1996). Several trends are evident in a comparison of the two periods. Overall, the number of lemmas is fairly similar in the two periods: Period 1 has 22 different expressions, while Period 3 has 28. Of course, the expressions in the two periods overlap only partially, but it is difficult to see an overall diachronic development in Early Modern English prose fiction as represented in the CED. There is also little variation in the frequency distribution: only a limited number of expressions are used with any great frequency in either period. But we do see some changes in the proportions of these frequent expressions. Whereas in the late sixteenth century quoth/quod is the most frequent expression (46.9%, narrowly ahead of say at 45.6%), by the mid-seventeenth century it has declined drastically, to only 10.6%. In contrast, say has increased substantially in proportion, from 45.6% to as much as 63.7%. It probably takes over contexts where quoth/quod was used, but notably it does not completely make up for the decline. Instead, answer (8.4%) and reply (8.6%) have also become much more frequent, cry has increased slightly in frequency (1.1%), and cry out (1.7%) and continue (1.0%) have also appeared. The interpretation of these patterns is not straightforward. The larger proportion of additional verbs may indicate a further integration of alternative means of marking speech representation into the repertoire of fictional language. This also involves signalling more often, through the reporting expression, the nature of the represented speech or how it should be interpreted (as an answer, as a continuation, as a cry, etc.; see below).
Interestingly, Aijmer (2015: 236) records very similar patterns for the CED depositions texts in terms of the overall distribution of the most common speech reporting expression, the decline of quoth/quod, and the existence of various other reporting expressions (although it is not clear how many there are or what the majority of the markers are). These similarities may point to more general patterns of speech representation conventions in Early Modern English. On the other hand, subgenre or textual differences may also play a role. While there is clearly a decline in quoth/quod over time, the texts reveal considerable variation in its use, even in Period 1 (where the texts are more uniform in terms of subgenre): three texts contain 100+ instances, while one text has zero and another has four. In Period 3, quoth/quod is absent from four texts and represented in three, but at much lower frequencies than in the three Period 1 texts (31, 28, and 8 instances, respectively). Another notable pattern is found with reply: 45 of its 62 examples occur in The English Lovers (1662, Period 3). This is not likely to be a subgenre trend, as other heroic romances in Period 3 do not show the same pattern. This kind of variation is not wholly unexpected, as speech representation mechanisms appear to be sensitive to more general textual dynamics in fictional texts (see below; see also San Segundo 2017; Grund 2018, 2020).


Looking at individual verbs gives us insight into the vicissitudes of particular lexical items and the possible expansion of the inventory of verbs used for speech representation. A broader view is gained by investigating the semantic-functional similarities among the verbs. As indicated in Section 2, we used the system developed by Caldas-Coulthard (1987, 1994) to chart such characteristics: Table 3 and Figure 1 show our results, followed by a discussion and examples that illustrate the categories.3

Table 3.  Frequency of Caldas-Coulthard’s (1987, 1994) functional categories

Caldas-Coulthard category   Period 1 (1560–1599)   Period 3 (1640–1679)   Total
Neutral                     803 (92.7%)            479 (76.0%)            1,282 (85.7%)
Structuring                  42 (4.8%)             114 (18.1%)              156 (10.4%)
Descriptive                  11 (1.3%)              31 (4.9%)                42 (2.8%)
Metapropositional             9 (1.0%)               6 (1.0%)                15 (1.0%)
Metalinguistic                1 (0.1%)             –                          1 (0.1%)
Total                       866 (100%)             630 (100%)             1,496 (100%)

Figure 1.  Development of functional categories (percentage of Neutral, Structuring, Descriptive, Metapropositional, and Metalinguistic expressions in Period 1 and Period 3)

The Neutral category, which is represented primarily by quoth/quod and say (as in (3) and (4), respectively), dominates the picture in both periods (85.7% overall), although its proportion is somewhat lower in Period 3. It is important to note that, in Caldas-Coulthard’s (1987, 1994) system, the verbs are classified irrespective of what speech act the actual speech representation performs. Said in (4) is thus considered Neutral, although it introduces a question.

3. One example from Period 3 is excluded as it did not clearly fit into any of the categories.

(3) Nay [$quoth diuers of the gentlemen,$] wee will put in our verdict with you:  (CED, D1FCOBLE, 1590)

(4) [$ (^Hind^) knowing they were his Companions, said,$] (^did they leave thee any money^)  (CED, D3FFIDGE, 1652)

This means that, when we see differences in functional categories over time or across texts, we are concerned with the way the author or narrator of the fictional text highlights how the speech representation should be seen, or how the author/ narrator perceives the delivery of the speech or indeed structures the overall dialogue or narrative for the listener/reader. By this token, the increased proportion of Structuring expressions over time is unlikely to mean an increase in questions and answers, which are the main constituents of this category as defined by Caldas-Coulthard (1987, 1994). It instead likely indicates that narrators/authors highlight more often that what follows should be seen as a question or answer, or the reciprocal nature of the exchange more broadly, which helps organize complex dialogues (see 4.2). In our texts, the Structuring expressions are dominated by answer and reply (as in (5)), while ask is extremely infrequent (2 instances).

(5) Thou most worthy of thy Nation [$ (replyed (^Spencer^) ) $] do not so much mistake me, to think that wounds, blood, death, or all tortures imaginable, could force one drop from hence  (CED, D3FDAUNC, 1662)

The remaining categories are marginal in frequency in our material but are represented by a range of different verbs. Descriptive expressions are illustrated in (6) and (7), Metapropositional in (8) and (9), and Metalinguistic in (10).

(6) Then know, [$continued she,$] that I cannot believe that this Bumbast Captain is any thing but an empty vessel, […]  (CED, D3FDAUNC, 1662)

(7) one cryed out,$] let not maides stay from marriage till they are troubled with the greene-sicknesse:  (CED, D3FPARLI, 1646)

(8) in came Sir (^Pemmel^) , and laying the Fire-Fork on their shoulders, bid them,$] (^Rise up Sir Toby, and Sir Samuel Crispe, Knights of the Order of Fond Love^)  (CED, D3FCRISP, 1660)

(9) [$At length, impatient of delay, he uttered his woes in this sort:$] Oh most unfortunate of men, and most wretched of young men!  (CED, D3FCRISP, 1660)

(10) [$Skelton dyd reade:$] Drynke: more Drynke:  (CED, D1FTALES, 1567)


Metalinguistic expressions (as in (10)), which are limited to this one example in our dataset, focus on marking a “linguistic act” (Caldas-Coulthard 1987: 161). These are related to the Metapropositional expressions, which “name the utterance they refer to” (Caldas-Coulthard 1987: 161). That is, they label the speech act that the following reported speech is supposed to convey, in (8) a directive and in (9) an expressive. As we noted above in conjunction with example (4), speech representations that perform a speech act of course need not be indicated as such by a corresponding speech act verb. Instead, these verbs could be seen as making sure that a particular interpretation or speaker intention is understood (Caldas-Coulthard 1987: 157). However, they may also have characterization functions, as shown by San Segundo (2017: 117–188). It is difficult to identify any such characterization strategies deployed consistently in our data, since the verbs are infrequent and scattered across a number of texts. This does not of course deny the possibility that they have more local functions along these lines. The Descriptive category contains several subcategories. The most straightforward is “discourse progress”, as in continued in (6); these constitute ten of the 42 examples in the category. The more frequent subcategory is “prosodic” (with 32 instances). As the label suggests, these verbs comment on the manner of delivery as regards mostly pitch and strength of voice. The vast majority of our examples involve different constructions of cry, as in (7). We see a marginal rise in Descriptive features across the two periods, but, even with an increase, there appear to be major differences here compared to later fictional texts (although it is difficult to determine the quantitative strength of the differences as previous research does not provide frequency breakdowns). 
Caldas-Coulthard (1987: 164) records several subcategories that do not appear in our dataset, especially “paralinguistic” markers such as whisper, murmur, and groan. Such verbs are amply attested in San Segundo’s (2016, 2017) studies of Dickens’s novels as well as in Oostdijk (1990: 238) and de Haan (1996: 31), which cover present-day fiction. As San Segundo (2017: 120) notes, Descriptive verbs of this kind can play a major role in the “creation of fictional personalities”. However, our authors do not seem to take advantage of such opportunities. It is thus likely that such uses are a later development, perhaps tied to the development of the novel. (We hope the expansion of our project to include CED Period 5, which encompasses (proto-)novels such as Defoe’s Moll Flanders, will give us more insight into this question.) Nor do our texts make much use of Neutral verbs such as say combined with adverbial modifiers (or speech descriptors), combinations that may be seen as performing similar functions (Caldas-Coulthard 1987: 165). Examples such as said… with a trembling voice in (11) are extremely rare in our data (see Grund 2018, 2020).

(11) Mistres [$sayd (^F. J.^) (and that with a trembling voyce) $] assure your self,  (CED, D1FGASCO, 1573)




4.2 Structural characteristics of speech reporting expressions

Previous research (e.g. Moore 2011, 2015; Aijmer 2015; Cichosz 2019) has shown that speech reporting expressions vary in terms of word order, position relative to the direct speech, and type of subject (nominal or pronominal). This is to some extent reflected in our data as well. We focus here on the patterns in the dataset as a whole; diachronic patterns cannot be discussed in detail within the scope of this chapter, although we do include a brief overview later in this section. Table 4 presents the results as regards word order (speech reporting expressions with a frequency of less than six are grouped together under Other expressions). The “Other” column comprises examples where the subject is understood but not stated, as in (12), or where the speech reporting expression is in the passive voice.

(12) His time being short, spoke as followeth  (CED, D3FFIDGE, 1652)

Table 4.  Speech reporting expressions and word order4

Speech reporting expression   SV            VS              Other         Total
answer                         40 (51.9%)     37 (48.1%)    –               77 (100%)
continue                        1 (16.7%)      5 (83.3%)    –                6 (100%)
cry                             8 (66.7%)      3 (25.0%)     1 (8.3%)       12 (100%)
cry out                        10 (90.9%)   –                1 (9.1%)       11 (100%)
quoth/quod                    –              473 (100%)     –              473 (100%)
reply                          15 (24.2%)     47 (75.8%)    –               62 (100%)
say                           262 (32.9%)    527 (66.1%)     8 (1.0%)      797 (100%)
Other expressions              51 (87.9%)   –                7 (12.1%)      58 (100%)
Total                         387 (25.9%)  1,092 (73.0%)    17 (1.1%)    1,496 (100%)

Strikingly, the less frequent expressions (included under Other expressions), as well as cry out, never occur with VS word order: all their instances with an expressed active subject are SV. Quoth/quod, in contrast, occurs only with inverted (VS) word order, which tallies with Cichosz (2019: 189–194, 200), who demonstrates that this verb was restricted to VS word order already in the Middle English period. All the remaining speech reporting expressions occur with both SV and VS word order. The great majority of speech reporting expressions occur with VS word order (1,092 instances), all accounted for by just six speech reporting verbs. There is also a correlation between word order and the position of the expression relative to the direct speech, as shown in Table 5.

4. The non-verbal speech expression with this farewell (see 4.1) is excluded from Table 4.


Table 5.  Word order and position of speech reporting expressions relative to the direct speech5

Speech reporting expression   SV / Initial   VS / Initial   VS / Medial   VS / Final    Total
answer                         40 (51.9%)    –               33 (42.9%)     4 (5.2%)       77 (100%)
continue                        1 (16.7%)    –                5 (83.3%)   –                 6 (100%)
cry                             8 (72.7%)    –                3 (27.3%)   –                11 (100%)
cry out                        10 (100%)     –              –             –                10 (100%)
quoth/quod                    –               3 (0.6%)      403 (85.2%)    67 (14.2%)     473 (100%)
reply                          15 (24.2%)    –               43 (69.3%)     4 (6.5%)       62 (100%)
say                           262 (33.3%)    56 (7.1%)      397 (50.4%)    72 (9.1%)      787 (100%)
Other expressions              51 (100%)     –              –             –                51 (100%)
Total                         387 (26.2%)    59 (4.0%)      884 (59.9%)   147 (10.0%)   1,477 (100%)

When the word order is SV, the speech reporting expressions in our data invariably take initial position, as in (13).

(13) [$another sayde$] I haue thus many red hearings,  (CED, D1FBOORD, 1565)

The most common of the speech reporting expressions show variation in position (initial, medial, and final; see (14) to (16)) in relation to the direct speech when the word order is VS.

(14) [$Then said (^Christian^) ,$] I rejoyce and tremble.  (CED, D3FBUNYA, 1678)

(15) I perceiue now [$ (quod she) $] how mishap doth follow me,  (CED, D1FGASCO, 1573)

(16) I am afraid, (^you will finde it so^) , [$replyed the Gentleman:$]  (CED, D3FFIDGE, 1652)

Although four speech reporting verbs exhibit some variation in position, all verbs occurring with VS word order show a clear preference for medial position (59.9%), and initial position is the least frequent with VS word order (4.0%). As regards quoth/quod, this pattern follows the line of development shown by previous research: Moore (2015: 259) found that, both in Late Middle English and in her survey of American English in the late modern period, quoth/quod occurs with direct speech, with inverted word order, and in medial position.

5. Excluded from Table 5 are 17 examples in which there was no expressed subject, the abovementioned noun construction, and two examples of say in which the position was unclear.




A development can be detected from Period 1 to Period 3 in the five expressions that occur in both periods. In Period 3, VS final position becomes rare with quoth/quod (4 of 67 instances) and also declines with say (22 of 398 instances), but the development in VS medial position is more interesting: the preference for this position with say becomes greater in Period 3, increasing from 43.3% to 57.3%. Moreover, answer is rare (2 of 24 instances) with VS medial position in Period 1, but it is the favoured position in Period 3 (31 of 53 instances); similarly, reply does not occur with VS at all in Period 1, while in Period 3 medial position with VS is dominant (43 of 54 instances). Cry is infrequent, but it, too, does not occur with VS word order in medial position until Period 3. However, as noted in 4.1, individual texts can exert a strong influence on the patterns: for example, The English Lovers (1662) accounts for 24 of the 31 instances of answer and 40 of the 43 instances of reply in VS medial position. In line with Moore (2011: 57; 2015: 259), who found that Late Middle English direct speech reporting expressions (primarily say and quoth/quod) with VS word order tended to have pronominal subjects, Cichosz (2019) showed that the type of subject was a factor that distinguished the two most frequent verbs, say and quoth/quod. Table 6 presents our results for specific expressions with VS word order in medial and final position. Initial position is excluded, as it rarely occurs with pronominal subjects (8 instances, with say and quoth/quod).

Table 6.  Individual speech reporting expressions occurring with VS word order in medial or final position according to type of subject6

Speech reporting expression   Nominal       Pronominal    Total
answer                         31 (83.8%)     6 (16.2%)      37 (100%)
continue                        1 (20.0%)     4 (80.0%)       5 (100%)
cry                             2 (66.7%)     1 (33.3%)       3 (100%)
quoth/quod                    262 (55.7%)   208 (44.3%)     470 (100%)
reply                          25 (53.2%)    22 (46.8%)      47 (100%)
say                           265 (56.5%)   204 (43.5%)     469 (100%)
Total                         586 (56.8%)   445 (43.2%)   1,031 (100%)
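As a worked check on Table 6, the short Python sketch below recomputes each row's nominal and pronominal shares, and the Total row, from the raw counts alone. It is an illustrative aid, not part of the study's method: the dictionary counts and the helpers row_percentages, table_with_totals and prefers_pronominal are hypothetical names introduced here, and the only data assumed are the raw figures printed in the table.

```python
# Illustrative sketch only: hypothetical helper names; raw counts copied
# from Table 6 (VS word order, medial or final position).
# Each value is (nominal subjects, pronominal subjects).
counts: dict[str, tuple[int, int]] = {
    "answer": (31, 6),
    "continue": (1, 4),
    "cry": (2, 1),
    "quoth/quod": (262, 208),
    "reply": (25, 22),
    "say": (265, 204),
}


def row_percentages(nominal: int, pronominal: int) -> tuple[float, float]:
    """Return the nominal and pronominal shares of a row, rounded to one decimal."""
    total = nominal + pronominal
    return round(100 * nominal / total, 1), round(100 * pronominal / total, 1)


def table_with_totals(
    data: dict[str, tuple[int, int]],
) -> dict[str, tuple[int, int, float, float]]:
    """Rebuild the table rows (counts plus row percentages) and add a Total row."""
    rows = {}
    for verb, (nom, pron) in data.items():
        rows[verb] = (nom, pron, *row_percentages(nom, pron))
    nom_sum = sum(nom for nom, _ in data.values())
    pron_sum = sum(pron for _, pron in data.values())
    rows["Total"] = (nom_sum, pron_sum, *row_percentages(nom_sum, pron_sum))
    return rows


def prefers_pronominal(data: dict[str, tuple[int, int]]) -> list[str]:
    """List the expressions with more pronominal than nominal subjects."""
    return [verb for verb, (nom, pron) in data.items() if pron > nom]


if __name__ == "__main__":
    for verb, (nom, pron, nom_pct, pron_pct) in table_with_totals(counts).items():
        print(f"{verb:<12}{nom:>5} ({nom_pct}%){pron:>5} ({pron_pct}%){nom + pron:>7}")
    print(prefers_pronominal(counts))
```

Rounding each share to one decimal place reproduces the percentages in the table, and prefers_pronominal returns only continue, in line with the observation that this is the only verb favouring pronominal subjects (on very low raw figures).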

In our data, only continue shows a preference for pronominal subjects, and the raw figures for this verb are extremely low. In her study of Early Modern English, Cichosz (2019: 193) found that say also occurred with SV word order, with both pronominal and nominal subjects, in medial and final position, but that other verbs (except tell) occurred only with VS word order. However, as already seen, our CED prose fiction data offers no such examples of SV in positions other than initial position.

6. Table 6 is thus based on the same data as Table 5, excluding the data for Initial examples.

Terry Walker and Peter J. Grund

It has been argued that say and quoth/quod are used in medial position, with VS word order, and with pronominal subjects, as they act as semantically empty inserts to identify who is speaking in stretches of dialogue (Moore 2011: 60; Aijmer 2015: 245–249). We find plenty of examples of this usage in our data, as in (17).

(17) At what time I bad him good morrow,$] good morrow [$ (quoth he:) $] God graunt it proue so, but as I speede, so will I praise the day: why sir [$ (said I) $] then well ouertaken, I trust my greeting deserues noe grudge.  (CED, D1FSHARP, 1597)

However, in our data we also find nominal subjects used in a similar way, as in (18).

(18) What knaue, [$sayd Skelton,$] art thou a cowarde, hauyng so great Bones? No [$sayde the Cobler.$] I am not a fearde: it is good to slepe in a whole skinne.  (CED, D1FTALES, 1567)

Moreover, this type of exchange is not limited to the verbs quoth/quod or say, and one can argue that reply in (19) is equally semantically empty (Caldas-Coulthard 1987: 155).

(19) and what is it [$ (said I in some passion) $] can make me so miserable that you should thus have cause to grieve for me? is (^Mariana^) turn’d inconstant? and hath she now I am come home fraught with full hopes to enjoy her, plighted her faith unto another? It cannot sure be, oh no, [$ (replyed he) $] she continued constant to you even to her last; to her last? [$ (said I) $] and is she then dead?  (CED, D3FDAUNC, 1662)

It is this style of exchange that accounts for the preference for VS medial position in our data, predominantly with the verbs say and quoth/quod but also with other verbs. The ratio of pronominal to nominal subjects varies from text to text, regardless of subgenre, in both periods examined (with pronominal subjects accounting for as little as 2.1%), and only The English Lovers (1662; see (20) and (21)) actually favours pronominal subjects (59.2%) over nominal subjects. The VS medial position is thus reserved for a small set of speech reporting expressions, for the most part those labelled as Neutral and Structuring in terms of Caldas-Coulthard’s (1987, 1994) semantic-functional categories (see 4.1). The infrequent expressions, including what may be seen as “incoming” expressions and especially the Metapropositional verbs, occur in initial position (and with SV word order), presumably because writers want to let readers know directly how the speech is to be interpreted.




5. Conclusion

Our study has revealed several trends in speech reporting expressions in Early Modern English fiction. Those involving quoth/quod and say proved to be by far the most common, although the former decreased drastically from Period 1 to 3. Answer, reply, and to a lesser extent cry became increasingly common over time, while cry out and continue appeared only in Period 3. This also means that, semantic-functionally, Neutral expressions predominate, but other functions, especially Structuring and Descriptive, increase over time. There appear to be complex interactions here with word order, position, and overall dialogic function. For example, it is in initial position with SV that we see Descriptive and Metapropositional verbs, as writers/narrators signal how an unfolding speech representation or dialogue should be interpreted. Medial position, on the other hand, is more or less reserved for semantically “empty” Neutral and Structuring expressions with VS word order, which highlight the shifts between different interlocutors. This split in functional load deserves further attention. Notably, unlike in studies of Present-day and Late Modern English, we did not find Metapropositional expressions used for clear characterization purposes or any Descriptive “paralinguistic” expressions in our data. Such usage is thus likely to have developed later, perhaps in conjunction with the formation of the novel. Examining the final period of the CED, in which (proto)novels are represented, will help us chart the path of development in this and other regards across the whole Early Modern English period. In a broader perspective, these results underline the picture that has emerged from Merja Kytö’s many publications: recreating the spoken language of the past is a complex puzzle that must be approached from various angles and with various methodologies.

References
Aijmer, K. 2015. Quotative markers in A Corpus of English Dialogues 1560–1760. In The Pragmatics of Quoting Now and Then, J. Arendholz, W. Bublitz & M. Kirner Ludwig (eds), 231–254. Berlin: Mouton de Gruyter.
Bevitori, C. 2006. Speech representation in parliamentary discourse: Rhetorical strategies in a heteroglossic perspective: A corpus-based study. In Studies in Specialized Discourse, J. Flowerdew & M. Gotti (eds), 155–179. Bern: Peter Lang.
Caldas-Coulthard, C. R. 1987. Reporting speech in written narrative texts. In Discussing Discourse, M. Coulthard (ed.), 149–167. Birmingham: University of Birmingham.
Caldas-Coulthard, C. R. 1994. On reporting reporting: The representation of speech in factual and factional narrative. In Advances in Written Text Analysis, M. Coulthard (ed.), 295–308. London: Routledge.
CED = A Corpus of English Dialogues 1560–1760. 2006. Compiled under the supervision of M. Kytö (Uppsala University) and J. Culpeper (Lancaster University).
Cichosz, A. 2019. Parenthetical reporting clauses in the history of English: The development of quotative inversion. English Language and Linguistics 23(1): 183–214.
Culpeper, J. & Kytö, M. 2010. Early Modern English Dialogues: Spoken Interaction as Writing. Cambridge: CUP.
D’Arcy, A. 2017. Discourse Pragmatic Variation in Context: Eight Hundred Years of LIKE. Amsterdam: John Benjamins.
Grund, P. J. 2018. Beyond speech representation: Describing and evaluating speech in Early Modern English prose fiction. Journal of Historical Pragmatics 19(2): 265–285.
Grund, P. J. 2020. The metalinguistic description of speech and fictional language: Exploring speech reporting verbs and speech descriptors in Late Modern English. In Speech Representation in the History of English: Topics and Approaches, P. J. Grund & T. Walker (eds), 102–130. Oxford: OUP.
Grund, P. J. & Walker, T. 2020a. Speech representation in the history of English: Introduction. In Speech Representation in the History of English: Topics and Approaches, P. J. Grund & T. Walker (eds), 1–28. Oxford: OUP.
Grund, P. J. & Walker, T. (eds). 2020b. Speech Representation in the History of English: Topics and Approaches. Oxford: OUP.
de Haan, P. 1996. More on the language of dialogue in fiction. ICAME Journal 20: 23–40.
Kytö, M. 2010. Explorations into ‘spoken’ interaction of the past: Evidence from early English texts. In Anglistentag 2009 Klagenfurt: Proceedings, J. Helbig & R. Schallegger (eds), 9–20. Trier: Wissenschaftlicher Verlag.
Kytö, M. & Walker, T. 2006. Guide to A Corpus of English Dialogues 1560–1760. Uppsala: Acta Universitatis Upsaliensis.
Moore, C. 2011. Quoting Speech in Early English. Cambridge: CUP.
Moore, C. 2015. Histories of talking about talk: Quethen, quoth, and quote. In The Pragmatics of Quoting Now and Then, J. Arendholz, W. Bublitz & M. Kirner Ludwig (eds), 255–270. Berlin: Mouton de Gruyter.
Oostdijk, N. 1990. The language of dialogue in fiction. Literary and Linguistic Computing 5(3): 235–241.
Philips, S. U. 1986. Reported speech as evidence in an American trial. In Languages and Linguistics: The Interdependence of Theory, Data and Application, D. Tannen & J. E. Alatis (eds), 154–170. Washington DC: Georgetown University Press.
Quirk, R., Greenbaum, S., Leech, G. & Svartvik, J. 1985. A Comprehensive Grammar of the English Language. London: Longman.
San Segundo, P. R. 2016. A corpus-stylistic approach to Dickens’ use of speech verbs: Beyond mere reporting. Language and Literature 25(2): 113–129.
San Segundo, P. R. 2017. Reporting verbs as a stylistic device in the creation of fictional personalities in literary texts. Atlantis 39(2): 105–124.
Semino, E. & Short, M. 2004. Corpus Stylistics: Speech, Writing and Thought Presentation in a Corpus of English Writing. London: Routledge.
Tannen, D. 1989. Talking Voices: Repetition, Dialogue, and Imagery in Conversational Discourse. Cambridge: CUP.
Thompson, G. & Yiyun, Y. 1991. Evaluation in the reporting verbs used in academic papers. Applied Linguistics 12(4): 365–382.
Urban, M. & Ruppenhofer, J. 2001. Shouting and screaming: Manner and noise verbs in communication. Literary and Linguistic Computing 16(1): 77–97.

Chapter 6

Interjections in early popular literature: Stereotypes and innovation

Irma Taavitsainen

University of Helsinki

Early modern jests and drama provide excellent materials for studies on speech-based language. This article focuses on a core group of interjections, alas, lo and O, and assesses their use from a diachronic perspective. The method of study is qualitative stylistic analysis and the data comes mostly from the popular genres of the Helsinki Corpus (HC) and the Corpus of English Dialogues (CED). Genuine feelings are depicted in romances and tragedies, but in popular genres these items express stereotypical reactions to awkward situations, contributing to audience involvement. Innovative uses emerge with novel stylistic effects in the early seventeenth century.

Keywords: emotions, genre, context, convention, humour

1. Preliminaries and research questions

Linguistic analysis of texts has given us new knowledge of the ways in which language features work as literary devices in historical texts. The interjections alas, lo and O show interesting diachronic developments that have their roots in the late medieval period. Their use became conventionalized early and shows a dual pattern: on the one hand, interjections convey genuine feelings and, on the other hand, they have been reduced to stereotypical outcries as reactions to awkward situations. In the latter function they mark comical scenes in popular literature. The tradition goes back to Chaucer’s genre of fabliaux, that is, antisocial short narratives with a special kind of humour. Their plots turn power hierarchies upside down: for example, clever students play tricks on old men who marry young wives, and “fabliaux justice” wins in the end. The importance of interjections is attested in Early Modern English, too, and in combination with other linguistic features they contribute to merriment and laughter in the audience. Thanks to the detailed account of pragmatic noise with interjections in CED (Culpeper & Kytö 2010), we already know some aspects of their diachronic development, and I shall not repeat their research questions. Instead, my aim is to shed new light on the tradition of using these features for various effects and to probe into their working mechanisms, including irony and other humorous touches. The ironical mode is prominent in Chaucer, and his skillful manipulation of audience reactions poses new research questions: is this feature preserved or does it become simplified in the early modern short narratives? And further, are new shades of meaning and innovative uses added to the repertoire of functions in the course of time? Variability according to genre and audience parameters is a well-established fact, but does the widening readership in a largely illiterate society show in the present data? What do the features attributed to popular literature look like? I shall first trace the beginnings of the tradition in the late medieval period and then extend the assessment to the sixteenth and early seventeenth centuries. The empirical part discusses examples that are meant to serve as eclectic fireworks of the kind of humour that these interjections can create.

https://doi.org/10.1075/scl.97.06taa © 2020 John Benjamins Publishing Company

2. Data and method of the study

Early modern jests represent comical tales that follow well-established conventions and circulated widely in different languages (see Davies 1976: 19). The tradition goes back to Chaucer.1 The data to be investigated in the present study comes from HC and CED: A Hundred Merry Tales (HC) was published in 1526 and Penny Merriments (HC) in 1690, in the last period of the corpus; Merie Tales (CED) is from 1567 and the Cobbler of Canterbury (CED) from 1590. The situations in which the fictional protagonists interact are prone to inversions and reversals of all sorts. Their humour builds on “jesting […] in flirtation with disorder” (Holcomb 2001: 3).
The tradition is continued, with modification, in the early years of the seventeenth century by Robert Armin, who was Shakespeare’s Fool; his A Nest of Ninnies (1608) shows the influence and continuity of Chaucer’s tradition (see Felver 1961). But Armin was also able to renew the conventions and introduce a novel application (see below). Besides jests, drama comedies also feature spoken language in reconstructed speech turns. Thomas Heywood’s How a Man May Chuse (CED 1602) is also important for my argumentation, as it contains some innovative uses of interjections. In addition to corpus data, Shakespeare’s Othello (1604)2 is quoted to illustrate genuine emotion in a tragic scene.

1. Examples from Chaucer come from the Riverside edition by Benson (1987).
2. Example (1) comes from the Riverside edition by Evans (1973).




Dictionary definitions of interjections render their basic meanings with short illustrative examples. The language of emotion in jests and drama comedy largely consists of phrases that belong to the stock-in-trade of these genres in Early Modern English. Strings of exclamations, swearing and mild oaths are often found at the turning points of plots. The power words in these passages continue the tradition of swearing by using religious terms (see Hughes 1991). Longer discourse contexts are, however, needed for a more profound examination of the functional profiles of these linguistic features. The main method of this study is qualitative discourse analysis, but the linguistic features under scrutiny are located by corpus linguistic searches. The multilayered contexts of the interjections alas, lo and O are scrutinized from their narrow linguistic cotexts to discourse contexts. The overarching cultural context of the period is also considered together with literacy practices. Orality features with interjections and other emotive and expressive components were important to the reception of popular texts in a society that was still largely illiterate.

3. Beginnings of the tradition and genre continuity

Anonymous jests represent afterlives of Chaucer’s fabliaux with numerous interjections that serve as humorous components and contribute to the comical effects, but at the same time they work as structural devices to mark turns of plot and facilitate audience reception. The storylines are fairly simple and build on practical jokes. This kind of literature was delivered either by reading aloud or by word of mouth and targeted at a wide audience including the middle ranks (Brown 2003: 21–22). The surviving literature gives us access to written versions of the jests and records speech-based elements especially in direct quotations; often they reveal the gist of the story allegedly in the protagonist’s own words, thus adding to reader involvement.
Comical narratives build on stereotypical character types who react to unexpected situations with emotive outbursts and exclamations. As deictic elements, interjections are dependent on their contexts and receive different shades of meaning accordingly. The tradition goes back to Chaucer, and the opposition of the two uses, genuine and stereotypical, is already found in The Canterbury Tales. The Knight’s Tale (KnT) is a romance set in aristocratic surroundings and targeted at courtly circles, whereas its reverse counterpart The Miller’s Tale (MiT) has a rural setting among peasants and the lower social classes, so that it was easy for a wide and heterogeneous audience to identify with the protagonists. The use of alas, lo and O shows different patterns for different target groups. The Knight sighs “Allas, thou felle Mars! Allas, Juno!” (KnT 1559), invoking pagan gods with eloquence; lo is used to draw readers’ attention with proximal deictic pronouns to bring the events closer to auditors by appealing to imagination in “Lo here this Arcite and this Palamoun” (KnT 1791). O occurs in vocative use and in a wish with an imperative form in “O Cupide, out of alle charitee!/ O regne…” (KnT 1623–4). In MiT the carpenter, led by the lecherous student to believe that the Second Flood is coming, wails “Allas, my wyf! And shal she drenche? Allas, myn Alisoun!”, “Allas and weylawey” (MiT 2522–3). The audience is well aware of Nicholas’s hidden agenda, and the interjections are used in a mocking tone to emphasize the carpenter’s simplicity in preparation for the fabliaux justice ending. Similarly, in another comic episode, Absolon’s reaction to the misplaced kiss culminates in “Fy! Allas! What have I do?” (MiT 3739). The pattern is found in other fabliaux as well, for example in the string “Out! Help! Allas! Harrow!” (Merchant’s Tale 2366), where the interjections contribute to the racy pace and trigger merriment and laughter in the audience. Likewise, a meaning change occurs when lo is transferred to fabliaux: didactic connotations become enhanced and echo biblical certainty. In addition, lo serves in a textual function to foreground the story climax, reversing accepted social norms. Irony is implied in the narrator’s metacomment “Lo, which a greet thing is affeccioun!” (MiT 3611).3 The stories under scrutiny in this article are comical and build on different types of humour. Irony gets its force from an unexpected context and intertextuality: words evoke earlier experience of texts and genres so that something more is said than what the surface might indicate. The words convey more to the real audience than to the naïve listeners in the fictional world. Intertextuality and the shades of meaning are often extremely subtle.
Parody has been defined as the emotional counterpoint of a tragic theme, and laughter is often created through exaggerated imitation of salient features of a writer’s style. Interjections serve this purpose, as their use is easy to exaggerate and the emotions depicted seriously in tragic literature may be reduced to stereotypical reactions in popular genres. Carnivalism in the Bakhtinian sense means a special kind of parody that exposes dogmatic norms to ridicule (Morson 1981: ix). This type of humour belongs to antisocial literature that mocks society with its own laws of justice and reverses established norms. Fabliaux and jest books belong to this type of literature, and carnivalism is present in Armin’s works, too.

3. For more details on interjections in Chaucer’s narrative techniques, see Taavitsainen (1995a).




4. Definitions and previous studies

Culpeper and Kytö (2010) launched the term pragmatic noise and discussed the earlier literature. The three items, alas, lo and O, belong to the core group of interjections, whereas pragmatic noise also includes more ad hoc creations as well as hesitations and disfluencies like stammering or onomatopoetic repetitive sounds in imitation of the spoken mode. Culpeper and Oliver survey the definitions in the present volume, starting with Quirk et al. (1985), who classify interjections as emotive words. Ameka (1992) goes into detail about their pragmatic uses in Present-day English, with their expressive, conative and phatic functions in spoken language. Biber et al. (1999: 1083) place them in the category of inserts of spoken grammar, with Oh as its most frequent item (Aijmer 1987: 61, 81, 83; see also Aijmer 1996; Heritage 2019). Historical pragmatic studies on speech-based written data have been conducted especially on the early modern period, but also on the late medieval and late modern periods. My pilot study (Taavitsainen 1995b) provided an overview and assessment of the occurrences and functions of written interjections up to 1710. I came to the conclusion that interjections are genre-specific and their colourings are apt to change across genre boundaries; in addition, they have textual functions. Culpeper and Kytö (2010) base their study on early modern speech-based genres in CED and present a meticulous diachronic corpus-linguistic analysis including both quantitative and qualitative evidence. The first period of their corpus (starting from 1560) has a fairly high incidence of pragmatic noise, and the rising tendency continues until 1639.4 Yet the different genres need to be considered separately, as the distributions are uneven. In conclusion, Culpeper and Kytö (2010: 271) state that “spontaneous” and “expressive” uses are corollaries of oral styles.
The audience factor is also important, as the texts were intended as oral entertainment for the middle classes. A more recent study (Murphy 2015) applied statistical keyword analysis with WordSmith Tools to Shakespeare’s soliloquies in 37 plays, including comedies, histories and tragedies, with the dialogues in the same plays as the reference data. The aim was to identify key language forms. Pragmatic noise proved to be significant. Expressions where the typically interpersonal O occurs as an interjection expressing strong emotion are frequent, and it is found in “self-reflexive” vocatives and hypothetical wishes in soliloquies. Interestingly from the present viewpoint, alas is quoted in an example of comic soliloquies: “Alas, how love can trifle with itself!” (Murphy 2015: 350).

4. Play texts show the highest incidence of pragmatic noise: 5.5 items per 1,000 words. Fiction holds the second place with 1.7 occurrences per 1,000 words (Culpeper & Kytö 2010: 224, 267).


5. Interjections with genuine feelings versus to “[f]lout & mock & Iest”5

In this section I shall present the results of my analysis, contrasting conventional and innovative uses. The empirical evidence comes from both Chaucer and Shakespeare, as stated above.

5.1 Alas

The dual nature of interjections as expressions of genuine versus stereotypical feelings in the data is especially pronounced with alas, conveying the emotions of grief, pity, regret, disappointment, or concern (OED). Originally it comes from Old French Ha las or A las (< Latin Lassum ‘weary’), with some spelling variation: Halas!, Eylace! and Alaas! Short exclamations are quoted from Shakespeare and Milton in the OED examples. OED entry number 2 deals with the collocation alas the day and its variants expressing sorrow, concern, or regret at the events of a particular day; later, more generally, alas-a-day (with variants alack the day, alackaday) expresses surprise or dismay, often in response to a particular event; OED examples come from Chaucer and Shakespeare. The third main entry gives alas as a regretful or sorrowful cry with expressions that are stereotypical. This meaning is frequently found in the present data of early modern popular literature.

5.1.1 Genuine feelings

To illustrate the full scale of alas as an expressive interjection, it is necessary to step outside the CED and HC corpora. Shakespeare’s tragedies provide reliable examples, which can be considered canonical, of how it is used to convey genuine emotions. A case in point with distress and agony can be found in Othello (1604). The scene in example (1) contains several instances of extreme feelings in a deeply tragic monologue with two instances of alas. The first has become a common-stock collocation and a set phrase in later use. However, in this passage it is not a commonplace but reflects genuine feeling enforced by a question that expresses compassion and empathy in Desdemona’s speech, in response to Othello’s command “Ah, Desdemon! Away, away, away!”: “Alas, the heavy day! Why do you weep?” (Act IV, scene ii, lines 41–42; italics mine).
The second instance of the interjection, shown in example (1), occurs a couple of lines below and is preceded by a cumulative list of misfortunes (sores, shames) and miserable conditions (poverty, captivity) in powerful images and expressive verbs connected with body parts (“rained […] on my bare head”; “Steeped me […] to the very lips”).6 Such misfortunes can be tolerated, but a contrast is provided by things that are unbearable. These are introduced by but alas and “being a perpetual figure of scorn”, which can be interpreted as a public laughing stock. It is, however, immediately cancelled by yet. This fate can also be faced, but the ultimate cannot: it is death connected with the loss of love, expressed in the figurative language “where I have garnered up my heart”, and what follows provides the culmination of Othello’s monologue:

(1) Had it pleas’d heaven To try me with affliction, had they rain’d All kinds of sores and shames on my bare head, Steep’d me in poverty to the very lips, Given to captivity me and my utmost hopes, I should have found in some place of my soul A drop of patience; but, alas, to make me The fixed figure for the time of scorn To point his slow [unmoving] finger at! Yet could I bear that too, well, very well; But there where I have garner’d up my heart; Where either I must live or bear no life …  (Act IV, scene ii, lines 48–58; emphases in all examples mine)

5. The citation comes from Heywood 1602, see example (11) below.

5.1.2 Stereotypical reactions

Genuine feelings show sincerity of attitude and stand in contrast with the mockery found especially in popular jest books. In them, alas is not an expression of feeling at all but is used as a standard reaction to a surprising turn in the plot. It condenses the jest and flouts sincerity by mocking the original sentiment. The situations given in example (2) below are comical: running around naked, for example, depicts an unusual and vivid action, appealing to the comic imagination and likely to release laughter in the contemporary audience. Interjections are employed to highlight the course of events: they mark such unexpected turns and elicit laughter in anticipation of the final punch line of the story (cf. Norrick 2010). In the following passage, the first occurrence expresses a false pretense of emotion; the second is the victim’s stereotypical lamentation, which repeats both the interjection and the keyword stolen in imitation of the way news was spread by street cries in the contemporary world. The repetitions are not accidental but imitate unpremeditated natural speech. The scene grows into a parody:

6. References to body parts provide another key feature of Shakespeare’s tragedies (see Murphy 2015).




(2) Sayd the maltman I I haue let my boget in to ye water & there is .xl. li. of money threin yf thou wilt wade in to ye water & go and seke it & get it me agayne I shall gyue ye .xii. pence for thy labour … The maltman within a whyle after with great payne & depe wadynge founde ye boget & came out of the water & sawe not his felowe there & sawe that his his clothys & money were not there as he left them suspected ye mater and openyd the boget and than founde nothing therein But stonys cryed out lyke a mad man and ran all nakyd to London agayne and sayde Alas alas helpe or I shall be stolen. For my capons be stolen. My hors is stolen. My money and clothys be stolen and I shall be stolen myself …  (HC, Merry Tales, 1526: 148–149)

Proximal deictic expressions are often found in the same passages. Indexical features of proximity include, for instance, the deictics this and here, which help to bring the events and characters into the immediate experience of the readers, thus contributing to reader involvement.7 They contribute to the overall style of writing and become highlighted with combinations that release merriment and laughter in these stories. Some of Robert Armin’s comical turns of the plot are marked with the interjection alas as well, reflecting mock feelings. The tales are told with present tense verbs and short sentences, using parallel structures that give the passage a special tone, as in example (3). The historical present brings the events close to the audience and makes the story more vivid:

(3) Night comes, nine o’clock strikes, Iemy and his horse comes riding forward, sets Him vp and knockes at the doore, she bids him welcome bonny Man: to bed hee goes … who no sooner was in bed, but she herself knockt at the doore, and herself asked who was there, which Iemy hearing was afraide of her Mother; alas Sir says she, creepe vnder the bed, my mother comes …  (HC, Armin, A Fat Foole, 1608: 3c)

Alas is also found in other short fiction of the seventeenth century, as well as in direct speech quotations in unhappy situations, for instance in an anonymous text that belongs to the genre of popular fiction (example 4).

(4) The gentleman at this looked as pale as ashes, and Meg coming in asked, “What’s the matter?” “Oh, Meg”, quoth he, and fetched a great sigh, “I am arrested and, alas, utterly undone”  (HC, Anon., The Long Meg of Westminster 1590: 95)

7. The use of proximal deictic features served as a distinctive feature between fiction and non-fiction in my earlier study (Taavitsainen 1997).




5.2 Lo

The OED derives lo either from Old English la, an expression of surprise, grief or joy, or, as another possible origin, from look ‘see, behold’. Lo came into English from Hebrew via Vulgate Latin and originally belonged to Biblical style, occurring, for example, in sermons. The original use can be found in biblical quotations and Wycliffite texts. It is described as an expression of surprise, grief or joy (OED), employed to attract attention.

5.2.1 Comical overtones

The stereotypical uses of lo in fabliaux convey overtones of absolute certainty, deriving from its original biblical context. In EModE, the same kind of use is found in Merry Tales, and one of the stories employs lo to mark the climax (example 5). The presence of the Latin ergo works in the same direction and enhances the didactic tone; it is another device that derives from classical argumentation patterns emphasizing absolute certainty. In the following joke, the boy thinks himself very witty, although in reality there are only two chickens, not three, as he concludes. Lo can well be interpreted as look:

(5) Lo, here is one chicken … Here is two chickens – and one and two maketh three. Ergo, here is three chickens!” Then the father took one of the chickens to himself and gave another to his wife, and said thus: “Lo, I will have one of the chickens to my part, and thy mother shall have another. And because of thy good argument, thou shalt have the third to thy supply.  (Merry Tales, 1526: 26)

5.3 O

The original use of O can be found in a pious wish in rhetorical style in the Coverdale version of the Bible from 1535 (Psalms liv. [lv.] OED) “O that I had wynges like a doue”. O is conspicuously present in philosophical writing and texts about saints’ lives in which it expresses the speaker’s emotional commitment, not only in vocatives like “O thow makere of the wheel that bereth the sterres” (Boece I, metrum 5), but also in other bursts of emotion: “O how weleful artow, if thow knowe thy goodes” (Book II, Prosa 4). This tradition is present in a conventional form for example in an anthem with invocations “O Sapientia, O Adonai” traditionally sung on the days preceding Christmas Eve (OED). 5.3.1 Conventional uses The interjection O typically marks the vocative case. In Culpeper & Kytö’s (2010) data, it is the most common item with 631 occurrences and is said to be the least independent interjection (2010: 277). Meanings vary from appeal to surprise and


Irma Taavitsainen

lament, and the interjection lends a poetic and rhetorical tone to the style. The vocative function of O makes the interjection itself somewhat void of meaning, but the stylistic connotations remain: it belongs to the sophisticated Latinate tradition. Busse (2006: 27) notes that interjections such as O and Ah repeatedly introduce vocatives. These include not only human referents and personal names like “O Romeo” (Romeo and Juliet 3.2.33) but also forms such as “O thou weed” (Othello 4.2.66).8 When O is transferred to comical literature, the reversal of meaning is clear, but, like lo, it keeps its textual position and grammatical function as a vocative. Collocated with nouns of a special kind, it becomes ironical. O has gained a special function in literary works intended to be read aloud, as interjections in initial position serve to distinguish speech turns in direct speech quotations.

5.3.2 Innovative uses

Novel functions of O emerge when the interjection is put to new uses, with several effects, in the written mode. In some other passages of Armin’s text from 1608, O is repeated several times. As a rule, it occurs in a series of short sentences, marking the climax of the story. Deictic elements are often present, sometimes shifting from proximal to distal, for instance from come to go. Present participle forms emphasize the simultaneous course of actions, which contributes to the vividness of the narration.9 In example (6), body parts are listed after the interjection O, which takes on an enhanced deictic function.

(6) By and by the footeman comes sweating, with water powred on his face and head: o my heart says he: O my legs says Iemy, I will not doe so much …  (CED, Armin, A Fat Foole, 1608: C 3)

Another special use, a further development of the above, appears in Armin’s A Nest of Ninnies (1608). The passage in example (7) adopts an outsider’s angle, with third-person pronouns and the interjection O in a sequence of direct speech quotations. In these quotations the deictic interjection precedes nouns denoting body parts, or objects that for musicians are nearly equivalent to body parts. As a result, the angle shifts several times at a quick pace, turning from one protagonist to the other. This brings about an effective conflict of viewpoints.

8. Some scholars have even read special meanings into O as a marker of sexual desire or grief (Busse 2006: 27), but have not discussed these cases further.
9. When combined with proximal or distal deictic expressions such as this–that, here–(yonder)–there, now–then or personal pronouns of the first and second or third person, the effect of movement from proximal to distal can be effectively described.



Chapter 6.  Interjections in early popular literature 89

(7) The pyper and the minstrel, being in bed together, one cryed O! his backe and face: the other O his face and eye: the one cryed O! his pype! The other O! his fiddle! Good mussicke or broken consorts, they agree well together.  (HC, Armin, A Nest of Ninnies, 1608: 11–12)

But O occurs in narrative passages as well. In the following, the point of view shifts in the middle of the passage, and the end imitates the character’s stammering speech by graphic means.10 Examples (8) and (9) show typical items of pragmatic noise:

(8) Buy any flawne, pastie, pudding pyes, plumbe pottage, or pescods; O it ws death to lack to do it, but like a willing foole he felte it: buy any, buy any, fla flawne, ppp pasties, a ppp puddling pppp pyes, ppp etc.  (HC, Armin, A Cleane Foole, 1608: C)

A related example, (9), comes from Thomas Heywood’s comedy How a Man May Chuse, written in rhyming couplets in 1602. It has a racy pace, with several turns beginning with O in sequence:

(9) [$ (^Pip.^) $] O my mistris, my mistris, shees dead, shees gone, shees dead, shees gone. [$ (^Hu.^) $] O M. (^Pipkin^) , what do you meane, what do you meane M. (^Pipkin^) ? [$ (^Pip.^) $] O (^Hue^) , o’ Mistris, o’ Mistris, o’ (^Hue^) . [$ (^Hu.^) $] O (^Pipkin^) , o’ God, o’ God, o’ (^Pipkin^) . [$ (^Pip.^) $] O (^Hue^) , I am mad, beare with me, I cannot chuse, o’ death, o’ Mistris, o’ Mistris, o’ death.11 [$ (^Exit.^) $]  (CED, D2CHeyv)12

O is repeated at a swift pace, shifting the attention from one speaker to the other. The context is also of interest because of its abundance of other emotive and expressive features. A particularly rich passage in this respect (example 10) contains a curse (“Death, hell, and Limbo be his share”), an oath (“by my beard”), an insult (“That Rat, that shrimp, that spindleshanck …”) and a threat (“hang him up”). The vocative use is marked with O and proximal deictic expressions (hither, here). The culmination point is marked by O in a vain wish. The insult is an expressive speech act made up of a string of pejorative attributes (cf. contemporary insults with similar lists by Shakespeare and Marlowe; see Jucker & Taavitsainen 2000).

10. For this kind of pragmatic noise, see also Tottie (e.g. 2015) and Jucker (e.g. 2015).
11. According to Culpeper and Kytö (2010: 261, 230), O is typical in death scenes, which are usually faked in comedies.
12. D2C stands for the genre of drama comedy and the second period of the corpus.


(10) (^ (\Amo, amas, amaui\) ^) still. (^ (\Qui audet\) ^) let him come that dare, Death, hell, and Limbo be his share. [$ (^Enter Brabo.^) $] [$ (^Bra.^) $] Wheres mistris (^Mary^) , neuer a post here, A bar of Iron gainst which to trie my sword? Now by my beard a daintie peece of steele. [$ (^Ami.^) $] O (^Ioue^) what a qualme is this I feele? [$ (^Bra.^) $] Come hither (^Mall^) , is none here but we two? When didst thou see the starueling Schoole-maister? That Rat, that shrimp, that spindleshanck, that Wren, that sheep-biter, that leane chittiface, that famine, that leane Enuy, that all bones, that bare Anatomy, that Iack a Lent, that ghost, that shadow, that Moone in the waine. [$ (^Ami.^) $] I waile in woe, I plunge in paine. [$ (^Bra.^) $] When next I finde him here Ile hang him vp Like a dried Sawsedge, in the Chimnies top: That Stock-fish, that poore Iohn, that gut of men. [$ (^Ami.^) $] O that I were at home againe. [$ (^Bra.^) $] When he comes next turne him into the streets …  (CED, D2CHeyv)

Heywood’s repertoire is, however, much wider, as another scene from the same play (example 11) shows. Here the interjection contributes to an intense emotional tone preceding a wish for death: (11)

[$ (^Ami.^) $] O death come with thy dart, come death whe~ I bid # thee, (^ (\Mors vem veni mors\) ^) , and from this misery rid mee. She whom I lou’d, whom I lou’d, eue~ she my sweet pretie (^Mary^) , Doth but flout & mock, & Iest, and dissimulary. [$ (^Ful.^) $] Ile fit him finely:  (CED, D2CHeyv)

This use is very different from the stereotypical examples: here we have a generic modulation that goes back to the original emotive content, yet the effect is achieved by entirely different means. The above examples show the stability of O and Alas in expressions of emotion; both persist across the centuries in speech-related texts. Some shifts occur in the paradigm, as O is often found in the early periods where Oh would be employed today (Culpeper & Kytö 2010: 282). As we have seen in this chapter, sudden turns of the plot and proclamations of fabliau justice are evidently present in antisocial literature beyond the Middle Ages. The presentation of emotions and humour in narratives by means of written interjections had already become conventionalized by the early modern period. Besides their semantic meanings (which may have been bleached), interjections carry specific stylistic connotations in popular literature. Interjections and proximal deictic expressions bring events and characters close to the immediate experience of readers. Comic turns of the plot are often marked with strings of interjections and bundles of related features that belong to the common stock of stylistic effects in popular fiction. Such passages may also signal specific points in the narrative with comical effects, where interjections play an essential part in releasing laughter. The continuum stretches into later centuries, with modulations and changes over time (cf. Fowler 1982). Interjections as stylistic devices can be encountered much later, in eighteenth-century Gothic novels, but with a new function: Jane Austen exploits them in character descriptions (Taavitsainen 1998).

6. Conclusions

Reductions of meaning become evident when the use of interjections in the most highly regarded romances and tragedies of literary history is compared with their use in popular fiction: the original emotive meanings tend to become stereotypical reactions or ironical comments. Anonymous jests are much simpler. In both, highlights of the plot are often marked with interjections. Unlike Chaucer, however, most later works lack the ironical tone, and often his verbal mastery as well. Alas occurs in passages of regret and lamentation. It is accompanied by brisk action such as running round, and either depicts stereotypical reactions to situations or points out the climax of the story. A more elaborate use is encountered in Armin’s works (1608), which combine interjections with other deictic elements. The effect is genuinely new, and the edge of humour has shifted from irony to parody to carnivalism. The above analysis shows that interjections may be used for very specific purposes in literature. They are genre-specific devices that act as signals to the audience. In these comic stories they point out amusing turns of the plot by sharpening the contrasts between scenes. This is the stereotypical use, in which the emotive content is levelled to reactions and set patterns of behaviour. The analysis also shows that skillful authors employ stylistic devices in their own creative ways, modifying the common uses of these linguistic items by adding new shades of meaning. An interplay of proximal and distal pronouns, vocatives with address terms, and emotive speech acts contributes to subtle mechanisms that produce special stylistic effects and provides a novel viewpoint on the issue. Thus, it is no exaggeration to state that Early Modern English jests build on medieval comical short fiction in imitation of Chaucer’s fabliaux, but renew that tradition through novel applications for special effects.


References

Aijmer, K. 1987. Oh and ah in English conversation. In Corpus Linguistics and Beyond, W. Meijs (ed.), 61–86. Amsterdam: Rodopi.
Aijmer, K. 1996. Conversational Routines in English: Convention and Creativity. London: Longman.
Ameka, F. 1992. Interjections: The universal yet neglected part of speech. Journal of Pragmatics 18(2–3): 101–118.
Benson, L. (ed.). 1987. The Riverside Chaucer, new edn. Oxford: OUP.
Biber, D., Johansson, S., Leech, G., Conrad, S. & Finegan, E. 1999. Longman Grammar of Spoken and Written English. Harlow: Pearson Education.
Brown, P. A. 2003. Better Shrew Than a Sheep: Women, Drama and the Culture of Jest in Early Modern England. Ithaca NY: Cornell University Press.
Busse, B. 2006. Vocative Constructions in the Language of Shakespeare [Pragmatics & Beyond New Series 150]. Amsterdam: John Benjamins.
CED = A Corpus of English Dialogues 1560–1760. 2006. Compiled under the supervision of M. Kytö (Uppsala University) and J. Culpeper (Lancaster University).
Culpeper, J. & Kytö, M. 2010. Early Modern English Dialogues: Spoken Interaction as Writing. Cambridge: CUP.
Davies, H. N. 1976. The Cobbler of Canterbury: Frederic Ouvry’s Edition of 1862 with a New Introduction by H. Neville Davies. Cambridge: D.S. Brewer.
Evans, G. B. (ed.). 1973. The Riverside Shakespeare. Boston MA: Houghton Mifflin Company.
Felver, C. S. 1961. Robert Armin, Shakespeare’s fool: A biographical essay. Kent State University Bulletin (Kent, Ohio) XLIX(1).
Fowler, A. 1982. Kinds of Literature: An Introduction to the Theory of Genres and Modes. Oxford: Clarendon Press.
HC = The Helsinki Corpus of English Texts. 1991. Department of Modern Languages, University of Helsinki. Compiled by M. Rissanen (Project leader), M. Kytö (Project secretary); L. Kahlas-Tarkka, M. Kilpiö (Old English); S. Nevanlinna, I. Taavitsainen (Middle English); T. Nevalainen, H. Raumolin-Brunberg (Early Modern English).
Heritage, J. 2019. From case-marking to interjection: Speculations on the passage of English oh and its pathways. Guest lecture on the 20th of September at the University of Helsinki.
Holcomb, C. 2001. Mirth Making: The Rhetorical Discourse on Jesting in Early Modern England. Columbia SC: University of South Carolina Press.
Hughes, G. 1991. Swearing: A Social History of Foul Language, Oaths and Profanity in English. Oxford: Blackwell.
Jucker, A. H. 2015. Uh and Um as planners in the Corpus of Historical American English. In Developments in English: Expanding Electronic Evidence, I. Taavitsainen, M. Kytö, C. Claridge & J. Smith (eds), 162–177. Cambridge: CUP.
Jucker, A. H. & Taavitsainen, I. 2000. Diachronic speech act analysis: Insults from flyting to flaming. Journal of Historical Pragmatics 1(1): 67–95.
Morson, G. S. (ed.). 1981. Preface: Perhaps Bakhtin. In Bakhtin: Essays and Dialogues on His Works, vii–xiii. Chicago IL: University of Chicago Press.
Murphy, S. 2015. I will proclaim myself what I am: Corpus stylistics and the language of Shakespeare’s soliloquies. Language and Literature 24(4): 338–354.
Norrick, N. R. 2010. Laughter before the punch line during the performance of narrative jokes in conversation. Text & Talk 30(1): 75–95.
OED = Oxford English Dictionary Online, 2nd edn with additions. Oxford: OUP.
Quirk, R., Greenbaum, S., Leech, G. & Svartvik, J. 1985. A Comprehensive Grammar of the English Language. London: Longman.
Taavitsainen, I. 1995a. Narrative patterns of affect in four genres of The Canterbury Tales. The Chaucer Review 30(2): 82–101.
Taavitsainen, I. 1995b. Interjections in Early Modern English: From imitations of spoken to conventions of written language. In Historical Pragmatics: Pragmatic Developments in the History of English [Pragmatics & Beyond New Series 35], A. H. Jucker (ed.), 419–45. Amsterdam: John Benjamins.
Taavitsainen, I. 1997. Genre conventions: Personal affect in fiction and non-fiction in Early Modern English. In English in Transition: Corpus-based Studies in Linguistic Variation and Genre Styles, M. Rissanen, M. Kytö & K. Heikkonen (eds), 185–266. Berlin: Mouton de Gruyter.
Taavitsainen, I. 1998. Emphatic language and romantic prose: Changing functions of interjections in a sociocultural perspective. In Linguistic Theory and Practice in Current Literary Scholarship, M. Fludernik (ed.). Special issue of European Journal of English Studies 2: 195–214.
Tottie, G. 2015. Turn management and the fillers uh and um. In Corpus Pragmatics: A Handbook, K. Aijmer & C. Rühlemann (eds), 381–407. Cambridge: CUP.

Chapter 7

Godly vocabulary in Early Modern English religious debate

Jeremy J. Smith

University of Glasgow

The English Reformation of the mid-sixteenth century was characterised by a vigorous public discourse of controversy, mediated by the still-novel printing press. On the one side were those – the godly – who favoured reformed religion; on the other were those – generally exiles – who held to increasingly embattled Roman Catholicism. This chapter compares the outputs of two communities of practice – one Protestant, one Catholic – from a key period in the Reformation’s history: the 1560s. It demonstrates how both sides developed distinctive, ideologically-charged lexicons of theology and insult. It also shows how reformers in particular deployed a coded English vocabulary, including words not usually seen as part of the semantic field of religion, to mark their distinctive discourse community.

Keywords: codes, keywords, religion, community of practice, discourse community

1. Godly folk in the 1560s

The “middle way” Elizabethan Settlement of 1559 was a compromise made by the English authorities to end the social divisions that had emerged since Henry VIII’s break from Rome and Queen Mary’s attempt (1553–1558) to reinstate papal authority: Elizabeth’s Church of England was to be both reformed and Catholic (if not Roman Catholic). However, the Settlement did not end religious controversy within the English polity. Radical evangelicals, Protestants who identified strongly with varieties of reformed religion, strove against conservatives who favoured a resurgent Roman Catholicism. These two extremes produced a large body of “involved”, controversial material. In ways that eerily prefigure the Twitter storms and social-media offensives of the present day, they looked to the contemporary media to carry their voices: the still-novel, but increasingly active, printing presses.

https://doi.org/10.1075/scl.97.07smi © 2020 John Benjamins Publishing Company

96 Jeremy J. Smith

In so doing, the two sides appear to have developed distinct, ideologically-charged lexicons: sets of “keywords”, in Williams’s famous formulation (1983). This chapter argues that we can reconstruct with some confidence the linguistic currency with which these writers attacked each other. It further argues that we can track, through the vigorous “pamphlet wars” that characterised the period, the interactive nature of their discourse. We can reconstruct the vocabulary they used as a badge of their own religious identity, and how they chose to insult those who belonged to the opposing side. In the case of at least one group, the reformers, this usage seems to have been “coded”, not necessarily carrying an explicit theological denotation. Although we do not have direct access to the spoken equivalents of these texts, we can presume that such vocabulary was part of their speech as well, given the “speech-like” nature of so much early modern discourse.1 Sixteenth- and seventeenth-century English evangelicals traditionally called themselves godly. Early English Books Online (EEBO)2 records some 633 texts with godly in their titles. A simple word-search of Semantic EEBO, the online concordance containing much of EEBO’s content, shows a marked increase in the usage of godly, from 7.2 occurrences per million words in the 1520s to 55.5 per million in the 1540s.3 Such discourses increasingly interest linguists, for example those working on the Linguistic DNA project:

… we are interested in the interaction between traditional thesaurus-style semantic categories and the discursive concepts which are being identified by Linguistic DNA. In particular, we look for areas of vocabulary which have undergone rapid expansion or contraction, and seek patterns in these changes, especially whether the emergence of new words appears to drive or be driven by pressure from the discursive concept sets. (Fitzmaurice et al. 2017: 28–9)

And if our goal is to account for such phenomena, as surely it should be in any historical discipline, historical linguistics included, then linking linguistic and socio-cultural patterns has to be a major concern.

1. For “speech-like”, see inter alia Culpeper & Kytö (2010).
2. EEBO = Early English Books Online
3. For Semantic EEBO, see , last consulted 19 August 2019. Raw figures provided by this concordance need to be used carefully, given changing patterns and growth in publishing in English during the sixteenth and seventeenth centuries, and indeed the distinct skew – as will be argued – between traditional/conservative/Catholic and reformed/evangelical/Protestant discourses; nevertheless, for this stage of the argument such figures are at least suggestive. On EEBO’s background and context, see , last consulted 22 August 2019, and also Gadd (2009).
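The per-million figures cited above are most naturally read through the usual normalisation formula. The block below is a worked sketch under that assumption, not a description of how Semantic EEBO itself computes its figures. Here c(w) is the raw count of a word form w and N the number of running words in the text or sub-corpus. For illustration it uses the God/Gods counts and word totals that Section 3.1 reports for Batman and Becon.

```latex
\begin{align*}
f_{\mathrm{pm}}(w) &= \frac{c(w)}{N} \times 10^{6} \\
f_{\mathrm{pm}}(\textit{God/Gods};\ \text{Batman}) &= \frac{311}{34\,533} \times 10^{6} \approx 9\,005.9 \\
f_{\mathrm{pm}}(\textit{God/Gods};\ \text{Becon}) &= \frac{1\,330}{83\,613} \times 10^{6} \approx 15\,906.6
\end{align*}
```

Normalised this way, Becon uses God/Gods roughly 1.8 times as often as Batman. The raw counts (1,330 against 311) suggest a ratio above four, but this overstates the difference, because The sycke mans salue is about 2.4 times as long as A christall glasse.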



Chapter 7.  Godly vocabulary in Early Modern English religious debate 97

Cultural historians have widely adopted godly – obviously with nuance – as an appropriate modifying adjective to describe persons holding reformist religious ideologies during this period. Important studies include Lamont’s Godly Rule (1969), Collinson’s Godly People (1983), Morgan’s Godly Learning (1988), Webster’s Godly Clergy (2003), and Cambers’s Godly Reading (2011). Much work has also been undertaken on the “purist” coinages of Protestant writers such as John Cheke, Miles Coverdale and Edmund Spenser (e.g. hundreder ‘centurion’ in Cheke’s translation of the New Testament).4 But, although such usages have been identified as ideologically significant, little organised investigation has been undertaken of other specialised linguistic usages associated with this discourse community. There is, for instance, nothing comparable to the research undertaken on the usages of the reformers’ late medieval precursors, the Lollards (see e.g. Hudson 1981). However, the availability of historical corpora and other electronic resources allows for much enhanced methodological approaches. Extensive reconstruction of special vocabularies has become far more feasible, as numerous corpus-based investigations demonstrate.5 The following study examines the 1560s, a key period in the formation of evangelical, Protestant identity in England. It compares the lexicon of a small group of texts from that decade with that of a control group of contemporary texts by controversialists writing in the conservative, Roman Catholic tradition. The latter formed a close-knit and, because in exile, necessarily embattled community of practice.6 It will be demonstrated how these two communities developed distinctive lexicons (of theology, of insult, and indeed of subtler kinds) that reflected their differing ideologies.
Given – as we shall see – their close connexions and engagement in common apologetic enterprises, these two groups of people clearly formed what are generally referred to as communities of practice. First developed by anthropologists and educationalists, this notion has recently received a great deal of attention from linguists. Eckert and McConnell-Ginet (1992) have defined it as follows:

A community of practice is an aggregate of people who come together in mutual engagement in an endeavor. Ways of doing things, ways of talking, beliefs, power relations – in short practices – emerge in the course of this mutual endeavor. As a social construct, a community of practice is different from the traditional community, primarily because it is defined simultaneously by its membership and by the practice in which that membership engages.  (Eckert & McConnell-Ginet 1992: 464)

4. For Cheke’s purism, see for example Norton (2000: 26–27) and references there cited.
5. Representative projects within English historical linguistics include the Helsinki VARIENG research-programme (see , last consulted 22 August 2019), and the exciting Sheffield/Glasgow Linguistic DNA project (see , last consulted 22 August 2019); see further Fitzmaurice et al. (2017).
6. For the terminology “evangelical”/“conservative”, now widely accepted by historians of the English reformation, see for instance MacCulloch (1996). The alternative terms “Protestant”/“Catholic” are sometimes used as synonyms, although it is important to recall, as already flagged, that the Church of England regarded – and still regards – itself as a Catholic communion, while not accepting the pope’s magisterium. I thus generally refer to “reformed” and “Roman Catholic” practices in what follows. Reformed religion followed a different trajectory in Scotland, and a parallel study of Scottish evangelical vocabulary is in progress.

Communities of practice are to be distinguished from, though they overlap with, two other notions in widespread use in pragmatics: social networks and discourse communities. In social network research, as widely practised in several disciplines (e.g. history, sociology and anthropology, politics and economics), links between groups and individuals may be mapped in terms of strong or weak social ties.7 Perhaps even more relevant for the current project, however, is the notion of discourse communities, that is, communicative networks that engage with a common world-view and express their ideologies (however conflicting) in mutually comprehensible ways. Communities of practice differ from discourse communities in that the members of the latter share a common language but do not share a mutual endeavour.8

7. For the notions of “weak” and “strong” social ties, see, famously, Milroy (1992); for the importance of such ties for linguistic change in the history of English, see for example Smith (1996).
8. For the notions discourse community and community of practice, see notably the papers in Kopaczyk & Jucker (2013), including references there cited.
9. For bibliographical details of most books referred to, and printed before 1700, see EEBO. Full bibliographical details, however, are given for the six works forming the two corpora. For an account of the publishing context, see most importantly Collinson et al. (2002).

2. Materials and methods

Perhaps the most important evangelical printer of the first Elizabethan decade was John Day (1521/2–1584). After Elizabeth’s accession in 1558, Day rapidly became a leader in his profession. Best-sellers on Day’s list included The whole book of psalmes (1562), an English metrical psalter at the heart of new forms of worship; a career highlight was John Foxe’s Acts and monuments (1563), better known as the Book of Martyrs.9 Day’s output in the 1560s included many works with reformed agendas, and his editions thus make obvious starting-points for investigating English evangelical vocabulary. Three works by English evangelical writers are analysed here:

Evangelical texts
– Stephen Batman, A christall glasse of christian reformation wherein the godly maye beholde the coloured abuses vsed in this our present tyme (London, 1569) (34,533 words)
– Thomas Becon, The sycke mans salue vvherin the faithfull christians may learne both how to behaue them selues paciently and thankefully, in the tyme of sickenes, and also vertuously to dispose their temporall goodes, and finally to prepare them selues gladly and godly to die (London, 1561) (83,613 words)
– Edward Dering, A sermon preached at the Tower of London (London, 1569) (7,044 words)

All three men were well-known evangelicals. Batman (d.1584) was an early member of Anglican Archbishop Matthew Parker’s antiquarian “research team”; his translation of Bartholomaeus Anglicus’s thirteenth-century encyclopedia, Batman vppon Bartholome (1582), influenced Shakespeare. Becon (1512/13–1567) had been chaplain inter alia to the first Anglican Archbishop, Thomas Cranmer, and on returning from exile had significant roles in Parker’s archdiocese. Dering (d.1576) had a short yet abrasive career as a preacher and pamphleteer. He owed his first preferment in 1567 to Parker, whose chaplain he may have been, and in whose household he certainly spent time. Batman, Becon and Dering, all published by Day, were thus all part of Parker’s circle, and they clearly form a community of practice as defined above. As a control group, three other contemporary texts have been selected: works composed in English but published by another community of practice, viz. Roman Catholic exiles. As with the evangelical texts, two are long-form works while one is shorter:

Roman Catholic texts
– Thomas Dorman, A proufe of certeyne articles in religion (Antwerp, 1564) (86,744 words)
– John Rastell, A replie against an ansvver (falslie intitled) in defence of the truth (Antwerp, 1565) (81,377 words)
– Thomas Stapleton, A returne of vntruthes vpon M. Jewelles replie (Antwerp, 1566) (9,998 words)

Dorman (c.1534–c.1577) and Stapleton (1535–1598) were two Oxford graduates who fled to Louvain soon after Elizabeth’s accession; both later moved to Douai’s new English College for Jesuits in 1568/1569. Rastell (1530–1577) was at first one of the Louvain group, but in 1566 he took orders, becoming a Jesuit novice in Rome in 1568, and later still vice-rector of the University of Ingolstadt. Dorman’s and Stapleton’s books were both printed by Joannes de Laet. These men thus formed another distinctive community of practice. They published their writings in the great trading and university cities of the then Spanish Netherlands, and all were clearly engaged on a common enterprise. They focused on attacking perceived Anglican heresies, and on preparing to return to an England restored to Roman Catholicism.

In what follows, the lexicon of each of the texts above is analysed to identify commonalities and differences. The focus is on open-class categories, viz. nouns, adjectives and lexical verbs; pronouns, prepositions and conjunctions are set aside for future study. It is acknowledged that generic distinctions, for instance between Dering’s sermon and Rastell’s attempted refutation of a reformist opponent, need further investigation. More generally, the two corpora are small in comparison with those commonly analysed in such research. The evangelical corpus (just over 125,000 words) is also rather smaller than the conservative one (just over 178,000 words), though some normalising has been undertaken to underline key points. Nevertheless, the corpora differ sufficiently to justify the exercise. The methodology adopted is straightforward. Machine-readable versions of each text were concordanced, both on their own and as parts of the two larger corpora under investigation. The resulting wordlists were then screened to identify common open-class forms, excluding “grammar words” such as the, of and and. The Oxford English Dictionary (OED)10 was then checked to identify other texts in which the open-class words thus distinguished were also deployed. As a control, the simple word-search function of Semantic EEBO was also used in places to offer a picture of more general changes in vocabulary.11 It is hoped that a “proof of concept” has been established for future investigations.12
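The screening step described above can be approximated in a few lines of code. The sketch below is a minimal illustration, not the AntConc workflow actually used in this study. It assumes a simple regular-expression tokenizer and a small, hypothetical stop list of grammar words: only the, of and and are named in the text, and the other entries are illustrative assumptions. The code builds a frequency list with those words removed and counts distinct word forms, which is the kind of figure behind the vocabulary-size comparison in Section 3.2. Spelling variants such as God and Gods are counted separately, so grouping them, as in the figures reported in this chapter, remains a manual step.

```python
import re
from collections import Counter

# Hypothetical stop list: the chapter names only "the", "of" and "and";
# the remaining entries are illustrative assumptions, not the study's list.
GRAMMAR_WORDS = frozenset({
    "the", "of", "and", "a", "an", "to", "in", "that", "is", "it", "be", "for", "with",
})

# Title of Batman's A christall glasse, as given in the chapter, used as sample input.
BATMAN_TITLE = (
    "A christall glasse of christian reformation wherein the godly maye "
    "beholde the coloured abuses vsed in this our present tyme"
)


def tokenize(text: str) -> list[str]:
    """Split text into lower-case alphabetic word forms.

    Early modern spellings (vsed, tyme, vngodly) are kept exactly as written;
    no normalisation of spelling variants is attempted.
    """
    return re.findall(r"[a-z]+", text.lower())


def open_class_wordlist(text: str, stop: frozenset[str] = GRAMMAR_WORDS) -> Counter:
    """Frequency list of word forms with stop-listed grammar words removed.

    A stop list only approximates open-class screening: words absent from the
    list (e.g. 'this', 'our') survive and would still need manual checking.
    """
    return Counter(tok for tok in tokenize(text) if tok not in stop)


def type_count(text: str) -> int:
    """Number of distinct word forms (types) in the text."""
    return len(set(tokenize(text)))


if __name__ == "__main__":
    wordlist = open_class_wordlist(BATMAN_TITLE)
    print(wordlist.most_common(3))
    print(sum(wordlist.values()), "open-class tokens;", type_count(BATMAN_TITLE), "types")
```

On the title alone the script reports 15 open-class tokens and 19 types. As the comments note, a stop list is a cruder filter than part-of-speech screening, which is one reason the chapter checks its wordlists by hand.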

10. OED = Oxford English Dictionary.
11. Semantic EEBO, as its name suggests, also offers access to semantic tagging in line with the categories developed for the Historical Thesaurus of English (HTE), for which see , last consulted on 23 August 2019. However, HTE’s semantic tagging (e.g. 03.06 Faith) relates to explicitly-defined terms; one of the goals of this enquiry is the subtler discrimination of a confessional vocabulary that is not explicitly religious.
12. All frequencies of forms were calculated using a simple freeware concordancer (AntConc 3.5.7 for Windows), obtainable from <http://www.laurenceanthony.net/software/antconc/>. The texts analysed were downloaded from those provided freely under the Early English Books Online Text Creation Partnership (EEBO-TCP) initiative, for which see . Both resources were last accessed on 29 April 2019.




3. Textual analysis

3.1

Reformed texts

Perhaps unsurprisingly, God/Gods is the most frequently-attested noun in Batman’s A christall glasse (311 occurrences). Other common forms include the following: man/men (200), lord(e) (159), come/came (99), truth (98), loue (79), Christ (75), thing/thinges (71), tyme/time (69), good (62), sonne (59), death (57), hope (50), faith (44), people (42), peace (41), life (38), sinne (36), world (36), saying (35), geue (33), children, iudge and wisdome (31), euill, father, king and made (30), light, mercy, and shewed (29), fayth, iesus, and iudgement (28), put (27), body, Dauid, hand, and know (26). Interestingly, godly – the term traditionally associated with evangelicals – occurs only 16 times; however, the form is roughly matched in frequency by its antonym vngodly, which occurs 15 times along with three instances of its derivative vngodlines(se).13 By contrast, Becon’s The sycke mans salue – admittedly a text over twice the length of Batman’s – contains no fewer than 108 instances of godly/godliness, alongside 18 of vngodly. Otherwise, however, the best-attested vocabulary closely resembles Batman’s, albeit with – naturally, given the text’s greater length – many more attestations. By far the most commonly witnessed open-class lexeme is God/Gods (1330 instances), followed at some distance by the following items (some high scores are clearly associated with the book’s reformed approach to the inherited medieval ars moriendi genre, e.g. comfort, death/die/dead, euerlasting, soule): lord/lorde (589), Christ/Christe/Christes (534), life/lyfe (321), come/came (253), death and man (237), good (224), sinne/sinnes (213), holy (201), saith (189), father (180), faith (170), world (132), blessed (131), sonne (130), geue (124), body (116), made (114), euerlasting (110), great (106), glory (103), beleue (101), die (92), liue and soule (91), pray and worlde (89), faithfull (87), mercy (85), know (82), name (80), heauen (77), true (75), end (70), heart and take (69), sake (67),

13. There are also usages that reflect the peculiar contents of the text; Batman’s A christall glasse interprets the signification of a series of vigorously presented woodcut images. It is thus no coincidence that the terms signification (36 occurrences) and signifieth (46) appear so frequently; the arguably quirky contents of the book also account for some less frequent forms, for example the single-occurrence forms foreskin, Troyans, Zorababel and puissaunt; the last word, referring to Christ’s knighthood, is an interesting medieval carry-over.

102 Jeremy J. Smith

put (66), sauiour (63), crosse (61), day (58), doubt (58), lawe and pleasure (57), earth (54), saying (54), glorious (53), grace (53), time (53), trust (53), people, power and trouble (52), comfort, dead and ioy (51). Dering’s shorter Sermon naturally offers fewer attestations than the others. The most commonly deployed open-class lexemes are Christ/Christe (105 occurrences) and God/Gods (48), followed by bread (31). This last form’s frequency derives from the sermon’s regularly quoted biblical text, John 6.35 (“And Jesus said unto them, I am the bread of life: he that cometh to me shall never hunger; and he that believeth on me shall never thirst”). Other common semantically related forms include the following: body/bodye (29), eate (23), bloud (15), drinke (12), tasted (11), feede and flesh (9), drinketh and eateth (6), and – once – wyne in the chalice. As these last examples suggest, Dering directly confronts Roman Catholic views on transubstantiation. He calls the pope (8) “a sicke head of an ill disposed sinagoge” – a sadly traditional antisemitic trope – and his attacks on the adoration of sayntes, aungels and archangeles (one occurrence each), and on the romish (1) doctrine of purgatory (1), all reflect his strong reformist ideology. Vngodlye appears only once, and godly is not used.

3.2 Roman Catholic texts

The Roman Catholic corpus differs lexically in several ways from reformed usage. Although roughly the same size as Becon’s The sycke mans salue, Dorman’s A proufe has a markedly larger vocabulary, with nearly 10,000 distinct lexemes compared with Becon’s 8,000. Some forms are – unsurprisingly – common to both traditions: God/Goddes/Gods appear 243 times, while Christ/Christes appears 339 times. But the most common open-class form in A proufe is churche (366). Other common items include: man/men (344), wordes (230), priest/priestes (208), first (189), bishop/bishoppes (186), good (183), man/men (182), place (164), saie (159), time (144), priest/priestes (114), true (112), Rome (100), head (91), councell (90), sacrament (90), holie (80), auctoritie (79), religion (77). By contrast, godly/godlye/godlie is rare, occurring only five times. Some items (saie, time) are comparatively neutral, but others are clearly skewed towards Roman Catholic concerns with church authority and/or ritual.



Chapter 7.  Godly vocabulary in Early Modern English religious debate 103

Rastell’s A reply offers a very similar pattern. The most frequent forms are again Christ (412) and church (306), followed by saie/saye (293), God (252), sacrament (215) and sacrifice (164).14 Other frequently occurring terms include priest (125), masse (111), receiue (105), bread (104), bodye (97), bloud (91) and institution (85). Such terms plausibly relate to the Roman Catholic interpretation of the eucharist.15 Godly appears a mere three times. Stapleton’s much shorter A returne offers a somewhat different pattern. Here the most common open-class form by far is vntruthe/vntruthes (81), followed by replie (39), truthe and first (33), and good (26): these terms may reflect the fact that the work is an attack on John Jewel (d.1571), a prominent reformist whom Stapleton considered especially deceitful. However, terms referring to church institutions are not far behind: bishop/bishops (21), followed by Pope (18), churche (15), doctrine (13), and Catholike (12). Godly appears once only (in the phrase “good, godly and true”).

3.3 Theological differences

So far, then, the findings seem predictable. The Louvain texts deploy terms associated with the Roman Catholic church, its institutions, its doctrines, and its sacraments, while evangelical/reformist writings assert different priorities. It is no coincidence that conservatives commonly – and reformists much more rarely – use terms such as the following:

14. Other texts cited by Semantic EEBO with collocations of sacrament and sacrifice, alongside Rastell’s A reply, include A reioindre to M. Iewels replie against the sacrifice of the Masse (1567) by another Louvain exile, Thomas Harding (d.1572), and The supper of our Lord set foorth according to the truth of the gospell and Catholike faith (1566) by Nicholas Sander (d.1581). Sander was a major figure in the Louvain circle of exiles, and was to die in Ireland in 1581, whither he had accompanied a small invasion force to assist the Irish against the English army. In this context, one of the few collocations of these forms cited from reformed writings in Semantic EEBO is significant, and clearly opposed to the language adopted by Rastell, Harding and Sander: Norton’s translation of Calvin’s Institutes emphasises how “so much doth the sacrifice differ from the sacrament of the supper”.

15. According to Semantic EEBO, however, body and blood regularly collocate in both reformed and Roman Catholic texts in the 1560s; this was a period of great importance in the doctrinal tussle over transubstantiation. The works containing this collocation that are most frequently cited in Semantic EEBO include Norton’s English translation of Calvin’s Institutes and, by contrast, Sander’s The supper of our Lord.


authority, bishop, celebration, ceremonies, church, clergy, communicant, consecrate, deacon, dispensation, doctor, doctrine, ecclesiastical, excommunication, interpretation, martyr, mystery, pope, presence, priest, reverence, rite, rule, sacrifice, and sacrament. Frequencies of forms are indicative here, and the contrasts are pointed up even more when the figures are normalised per 100,000 words, since the evangelical corpus is smaller than the Roman Catholic one (see Table 1). The term sacrament(e) appears only ten times in the entire evangelical corpus, compared with 90 instances in Dorman’s A proufe alone, and 408 in the Roman Catholic corpus overall; church(e), which appears 815 times in the whole conservative corpus, occurs a mere 30 times in the evangelical material; the equivalent figures for priest and its derivatives are 471 (conservative) and 20 (evangelical); and the conservative corpus supplies 162 instances of authority and its derivatives compared with 8 from the evangelical witnesses.16

Table 1.  Theological terminology in the two corpora (raw counts and counts normalised per 100,000 words)

ITEM          Roman Catholic          Evangelical
              (raw)   (per 100,000)   (raw)   (per 100,000)
authority       162            91.0       8             6.4
church          815           457.6      30            24.0
priest          471           264.4      20            16.0
sacrament       408           229.1      10             8.0
repentance        6             3.4      87            69.5
salvation         9             5.1      66            52.7
saviour          46            25.8     120            95.9
sin              35            19.6     421           336.3
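The normalised figures in Table 1 can be checked with a short sketch. The chapter does not state the word counts of the two corpora, so the sketch back-calculates them from the most frequent item in each column. This is an assumption, not a figure given by the author. On that basis the Roman Catholic corpus comes to about 178,100 words and the evangelical one to about 125,200. The function names below are illustrative only.

```python
# Minimal sketch: normalising raw counts per 100,000 words and checking Table 1.
# Corpus sizes are NOT given in the chapter; they are back-calculated here (an assumption).

# item -> (raw count, printed value per 100,000 words), copied from Table 1
ROMAN_CATHOLIC = {
    "authority": (162, 91.0), "church": (815, 457.6),
    "priest": (471, 264.4), "sacrament": (408, 229.1),
    "repentance": (6, 3.4), "salvation": (9, 5.1),
    "saviour": (46, 25.8), "sin": (35, 19.6),
}
EVANGELICAL = {
    "authority": (8, 6.4), "church": (30, 24.0),
    "priest": (20, 16.0), "sacrament": (10, 8.0),
    "repentance": (87, 69.5), "salvation": (66, 52.7),
    "saviour": (120, 95.9), "sin": (421, 336.3),
}


def per_100k(raw: int, corpus_words: float) -> float:
    """Normalise a raw frequency to occurrences per 100,000 words."""
    return raw / corpus_words * 100_000


def implied_corpus_size(table: dict[str, tuple[int, float]]) -> float:
    """Estimate corpus size from the row with the largest raw count.

    Tuples compare by their first element, so max() picks the most frequent
    item, whose printed value is least distorted by one-decimal rounding.
    """
    raw, printed = max(table.values())
    return raw / printed * 100_000


def worst_discrepancy(table: dict[str, tuple[int, float]]) -> float:
    """Largest gap between a recomputed normalised value and the printed one."""
    size = implied_corpus_size(table)
    return max(abs(per_100k(raw, size) - printed) for raw, printed in table.values())


if __name__ == "__main__":
    for name, table in (("Roman Catholic", ROMAN_CATHOLIC), ("Evangelical", EVANGELICAL)):
        print(f"{name}: ~{implied_corpus_size(table):,.0f} words, "
              f"max discrepancy {worst_discrepancy(table):.2f}")
```

Every printed value in the table is reproduced to within rounding error. Applied to the raw counts for lord (829 evangelical, 122 conservative), the same estimated sizes give 662.2 and 68.5, the normalised figures reported in the text.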

Similarly, a reformed theological ideology clearly underpins the frequent evangelical deployment of such terms as intercessor, mercy, repentance, resurrection, revelation, saviour, superstition, testimony, and visitation. Examples appear in Table 1. It is, it would appear, no coincidence that Semantic EEBO shows a marked increase in the occurrence of repentance, salvation and saviour from the 1560s onwards; salvation, for instance, usually spelt saluation, appears 2637 times in the 1560s compared with 384 times in the 1550s. Semantic EEBO also records close collocations of repentance and salvation from the 1560s onwards, significantly in Thomas Norton’s translation of John Calvin’s The Institutes of Christian Religion (1561), and in another work by Becon: A new postill conteinyng most godly and learned sermons (1566). Reformists refer to Satan (often spelt Sathan) much more commonly than do conservatives, and the same goes for sins, of which evangelicals seem to be more conscious.17 More subtly, evangelical emphasis on Christ’s lordship doubtless underpins the more frequent appearance of lord etc. in that corpus: 829 times, compared with 122 times in the three conservative texts, a skew that is emphasised even more strongly when the figures are normalised to 662.2 and 68.5 respectively.18

16. Given spelling-variation between witnesses, modern usage is henceforth adopted when citing complete corpora.

3.4 The vocabulary of insult

Given the fiercely partisan nature of the two communities, it is to be expected that – as in present-day social media – abusive language is deployed. The two corpora differ in this category of language too, although the distinctions in frequency are less clear-cut and deserve further research. Although all six writers refer with shock to their radical contemporaries the anabaptistes, only the Roman Catholic corpus has a reference to the Arrian heresy. Evangelicals refer to their opponents as papists (alongside papistical, papistry), a term used by conservatives only when they quote their enemies (as when Dorman quotes Jewel).19 Other forms occasionally used by evangelicals include fornication and its derivatives (12 occurrences), and idolatry (7). Other sporadic forms include massmonger, mumbling and mumming (with reference to the priestly celebration of the mass), romish, sodomical, synagogue, beaders (i.e. tellers of beads, viz. the rosary) and superbious.20 With the exception of beaders, which it does not cite,21 the Oxford English Dictionary (henceforth OED) offers useful attestations for all such insults.22

17. Some distinctions are even more subtle, relating to a different linguistic level: in evangelical texts ‘devil’ is spelt deuill, while Roman Catholics favour diuill. Roman Catholics apparently prefer Latin-derived sa(i)nct (cf. sanctus); reformers prefer saints.

18. Interestingly, kingdom is repeatedly used by reformists (82), but rarely by conservatives (3) (the latter, however, refer 21 times to Christendom, a form not found in the evangelical corpus). The emphasis on Christ’s kingship seems, like his lordship, to have been an early reformed trope, even though Roman Catholics deployed it in later centuries. It may be relevant that the earliest non-Biblical collocations of kingship and lord recorded in Semantic EEBO are in writings by other major reformers, such as An answer to a great nomber of blasphemous cauillations written by an anabaptist (1560) by the prominent Scottish religious leader John Knox.

19. Semantic EEBO records an increase in occurrences of papist from 2 in the 1530s to 163 in the 1560s, rising to 3191 in the 1680s – this last perhaps reflecting the controversies before and after the Glorious Revolution of 1688. The form Romish follows a similar trajectory.

Idolatry*
OED’s citations for idolatry nearest in date to those in the evangelical corpus are from translations of the Bible published by early reformers: one from 1526 by the Protestant martyr William Tyndale (d.1536), and the other by William Whittingham (d.1579), printed in Geneva in 1557. In later years Whittingham described his conversion to reformed religion as being “called from the blindness of idolatry and superstition” (ODNB).23

Massmonger
OED’s earliest attestation of massmonger – a term of contempt for a Roman Catholic priest – dates from 1551, cited from writings by the renegade Carmelite friar John Bale (1495–1563).

Romish
Romish became an official term of contempt, not only in the records of Parliament – OED cites an Act of 1585, condemning “Priests .. made .. according to the Order and Rites of the Romish Church” – but also, famously, in Article XXII, Of Purgatory, of the Anglican communion’s Thirty-Nine Articles (1571), still in force: “The Romish Doctrine concerning Purgatory, Pardons, Worshipping and Adoration, as well of Images as of Relics, and also Invocation of Saints, is a fond thing vainly invented, and … repugnant to the Word of God” (1969: 702).

20. Many forms of insult are shared between the two corpora, of course, such as hypocrisy and related forms.

21. Although OED offers no citation for beader, there is a relevant citation for the word from which it derives. According to an Act of the Elizabethan parliament (13 Eliz. ii para 7), dated 1570, “Crosses, Pictures, Beads and such like superstitious Things” are to be destroyed when encountered.

22. OED is of course under continual revision; OED entries not yet fully updated are marked here and below with an asterisk, thus *. Discussion of OED entries is based on the entries as they appeared on 22 August 2019.

23. ODNB = Oxford Dictionary of National Biography.




Sodomical24
OED’s sole sixteenth-century (‘?1556’) citation for sodomical is from “E.P.’s” A confutation of unwritten verities … made by Thomas Cranmer, a collection of Cranmer’s polemical writings: “If these … lawes wer throughly executed … the realme of England should not swarme so ful of runnagates, adulterous and sodomicall Pryestes”.

Superbious
Superbious appears in Samuel Harsnett’s attack on the Jesuits, A declaration of egregious popish impostures (1603). The usage, though from the early seventeenth century, may be inherited from earlier discourse; Harsnett’s parents “were noted among the twelve Colchester Protestants indicted for heresy in 1556” (ODNB), though Harsnett himself, from 1628 Archbishop of York, eventually supported Charles I’s royal absolutism.

Synagogue*
Finally, synagogue as a term of abuse – when not qualified by “of Satan” (following Revelation II.9) – was already current among evangelicals before Mary Tudor’s accession. OED cites The Booke of Marchauntes, the 1547 English version of the French Protestant pastor Antoine de Marcourt’s Le livre des marchans (1533): “To be slayne and murdred of them, or at the least excommunicate in their sinagog”.25

Unlike their opponents, Roman Catholics used the terms Calvinist (beside Calvinian, now extinct), Lutheran and Huguenot to refer to reformist groupings, confirming that these lexemes, now neutral, were originally insults. Other insults occasionally occurring in the conservative corpus include bloodsuckers, bolters, renegades, preachments and snaphances.26 The last two forms also occur in other OED citations from Roman Catholic writings. OED cites one of Thomas Stapleton’s other works, A fortresse of the faith first planted amonge vs Englishmen (Antwerp, 1565): “To folow the preachments of a few apostat friers and monkes”. Similarly, the earliest OED citation for snaphance (‘an armed robber … a freebooter … a desperate fellow or thief’) is from a 1539 Palm Sunday sermon preached by the conservative-leaning Bishop Cuthbert Tunstall (1474–1559): “To make this realme a praye to al venturers, al spoylers, al snaphanses, all forlornehopes”. Comparison with other OED citations, however, suggests that most insults in the Catholic corpus were not – in contrast with those just cited from the evangelical texts – confessionally distinctive. Bloodsuckers is cited in OED from Gracious Menewe’s A plaine subuersyon or turnyng vp syde down of all the argumentes, that the Popecatholykes can make for the maintenaunce of auricular confession (1555). Little seems to be known of Menewe, whose curious name may be a pseudonym;27 but the book’s title flags hostility to Roman Catholicism. The lawyer-poet William Warner (d.1609), a wholly orthodox Anglican, referred to “tedious Preachments” in Albion’s England (1596); and, in 1548, snaphance appeared in the English version of Erasmus’s Paraphrase of the New Testament – a translation patronised, significantly, by Katherine Parr, King Henry’s sixth and most sincerely evangelical queen.

24. A new entry in the 2018 edition of OED.

25. For Marcourt, see , last consulted 14 May 2019. A Confutation’s authorship is discussed in MacCulloch (1996: 633–636 and passim), who ascribes the work to Stephen Nevinson, a member of Cranmer’s household. For a critique of Cranmer’s biographers, see MacCulloch (2017).

26. The forms are admittedly few. The most frequently occurring form is renegade, which appears (in such spellings as rennegat(e)) only three times.

3.5 Some further differences

There are, however, other, subtler differences where the distinctions become more intriguing, since the words under review are not obviously theological or abusive. On the evidence of a comparison of the two corpora, presented as Table 2, the following lexemes might seem, initially, prototypical of Roman Catholic rather than evangelical discourse, since they occur comparatively more often in the former corpus than in the latter: effect, high, honest, impudent, malice, memory, plainly, and vain.

Table 2.  Distribution of eight lexemes in the two corpora (raw counts and counts normalised per 100,000 words)

ITEM          Roman Catholic          Evangelical
              (raw)   (per 100,000)   (raw)   (per 100,000)
effect           28            15.7       3             2.4
high             44            24.7      16            12.8
honest           60            33.7      20            16.0
impudent         22            12.4       0               0
malice           21            11.8       5             4.0
memory           17             9.5       7             5.6
plainly         130            73.0      25            20.0
vain             93            52.2      34            27.2

27. EEBO and the National Library of Scotland’s catalogue entry suggest that “Menewe” was actually Thomas Becon, already cited above, although Becon’s ODNB entry makes no reference to the pseudonym.




However, both comparison with Table 1 – where the contrasts are much stronger – and further research drawing on the rich corpus of OED citations suggest that this conclusion needs nuance. Impudent, indeed, though cited in OED from John Foxe’s reformed martyrology Acts and monuments (1563), is from a speech placed in the mouth of the prominent conservative Stephen Gardiner, Bishop of Winchester. Honest, however, appears in the royal servant – and moderate reformer – Edward Hall’s The union of the two noble and illustre families of Lancaster and York (1548), while OED citations for memory include one from the vigorous reformer Robert Crowley’s The voyce of the laste trumpet (1549). By contrast, analysis of differences in vocabulary suggests that reformed, evangelical discourse was in the 1560s becoming rather more distinctive. Table 3 lists frequencies that are relevant for distinguishing the two corpora, and arguably much more convincing than those in Table 2. Some forms (e.g. comfortable) seem much more prototypical of evangelical discourse than others, but the general pattern seems clear. Again, the normalised figures point up the contrast.

Table 3.  A possible evangelical code (raw counts and counts normalised per 100,000 words)

ITEM          Roman Catholic          Evangelical
              (raw)   (per 100,000)   (raw)   (per 100,000)
beget             2             1.1      24            19.2
blessed          92            51.7     200           159.8
comfortable       1             0.6      98            78.3
elect             5             2.8      17            13.6
enemy            11             6.2      43            34.3
fire              2             1.1      30            24.0
friend           15             8.4      77            61.5
gift             14             7.9      40            32.0
glory            23            12.9     211           168.5
grievous          6             3.4      44            35.1
happy             6             3.4      24            19.2
heart            57            32.0     144           115.0
heal              1             0.6      33            26.4
hope             17             9.5     108            86.3
humble            7             3.9      44            35.1
iniquity          4             2.2      23            18.4
joy               5             2.8     162           129.4
light            37            20.8      82            65.5
love             41            23.0     283           226.1
meek              1             0.6      10             8.0
mercy            20            11.2     240           191.7
neighbour         8             4.5     115            91.9
oppress           2             1.1      15            12.0
patience          4             2.2      46            36.7
promise          25            14.0      97            77.5
righteous         3             1.7     125            99.8
soul             33            18.5     223           178.1

Analysis of OED citations would seem to confirm that the forms cited in Table 3 are part of a distinctively evangelical lexicon. Thus, although the Roman Catholic Sir Thomas More seems to have used comfortable* in 1535, in the same year it also appeared in Coverdale’s translation of the Bible (Psalm liii. 6). The most culturally influential use of the form seems to be in the 1549 reformed Prayer Book: “The moste confortable Sacrament of the bodye and bloude of Christe”. Certainly, according to Semantic EEBO, the term seems to increase in general usage after that date, reaching a peak in attestations in the 1630s. Sixteenth- and early seventeenth-century citations in OED for neighbour and righteous are similarly restricted to reformist sources. OED cites, for instance, Thomas Norton’s translation of Nowell’s A catechisme, or first instruction and learning of Christian religion (1570): “The name of Neighbour conteineth … also those whom we know not, yea and our enemies”. Norton, who died in 1584, was inter alia Cranmer’s son-in-law. Most OED citations for righteous are from the writings of well-known reformers: Miles Coverdale, William Turner (d.1568), John Daus/Dawes (d.1602), John Frith (d.1533) and John Rastell (d.1536).28 Frith was burned at the stake for denying transubstantiation; Rastell became a reformer through Frith’s influence, and died in the Tower of London; Turner and Daus translated writings by the leading Swiss reformer Heinrich Bullinger. Occurrences of righteous in Semantic EEBO rise from 130 in the 1550s to 1226 in the 1560s. Semantic EEBO offers further evidence for this discourse, showing (e.g.) that hope and glory collocate regularly in texts from the 1560s onwards, almost always in the writings of reformers.29 And no fewer than 94 texts containing comfortable are recorded in EEBO-TCP for the period 1560–1570, almost all written or translated by significant evangelicals or produced by prominent reformist printers like Day. It seems clear that the two groups – though both writing in English – were developing distinctive vocabularies that characterised each community of practice.

28. This John Rastell is not the same man as the later John Rastell who contributed to the conservative corpus.

29. Works most frequently cited in Semantic EEBO as containing collocations of hope and glory include John Barthlet’s The pedegrewe of heretiques (1560), Barnabe Googe’s The zodiake of life (1565), and another work by Becon: The pomaunder of prayer (1561). Both Barthlet and Googe were “hardline Protestants” (ODNB).

4. Conclusions

Several conclusions arise from this study. First, special vocabularies of theology and insult clearly characterised both evangelical and conservative discourses during the 1560s. Roman Catholics prototypically referred to the institutions, teachings and ritual practices of their church, while reformists drew on their distinctive theology. And although their terms of insult overlap, there are nevertheless notable differences. Secondly, however, the two discourses differed in that, during the 1560s, a distinctive, “coded” lexicon – in addition to the oft-cited term godly – seems to have been emerging within the evangelical/reformist echo-chamber: a marker of a distinctive community of discourse, mediated by reformist printers. Prototypical evangelical texts, therefore, deployed, along with (un)godly, such expressions as repentance and sin, insulted enemies by calling them papists, and – very interestingly – also used less obviously theologically marked lexemes such as comfortable, joy, and righteous. All such usages remind us of “the relationship between ‘English’ in a purely linguistic sense and the environment and social and other conditions that pertained at the time when particular forms of that language were used” (McIntosh 1994: 135). More generally, it may be observed that investigations of the kind undertaken here have brought into articulation not only large electronic corpora and accompanying resources such as Semantic EEBO, but also the OED and ODNB.
Such reimaginings of philology, deploying sophisticated techniques of data-analysis and bringing quantitative and qualitative approaches together, have been an exciting hallmark of Merja Kytö’s research.30

30. I am grateful to the editors and two anonymous reviewers who helped tighten this chapter’s arguments. I am responsible for any remaining flaws.


References

1969. The Book of Common Prayer. Oxford: OUP.
Batman, S. 1569. A christall glasse of Christian reformation. London: Day (= ESTC S115367).31
Becon, T. 1561. The sycke mans salue. London: Day (= ESTC S114654).
Cambers, A. 2011. Godly Reading: Print, Manuscript and Puritanism in England, 1580–1720. Cambridge: CUP.
Collinson, P. 1983. Godly People. London: Hambledon.
Collinson, P., Hunt, A. & Walsham, A. 2002. Religious publishing in England 1557–1640. In The Cambridge History of the Book in Britain IV: 1557–1695, J. Barnard, D. F. McKenzie & M. Bell (eds), 29–66. Cambridge: CUP.
Culpeper, J. & Kytö, M. 2010. Early Modern English Dialogues. Cambridge: CUP.
Dering, E. 1569. A sermon preached at the Tower of London. London: Day (= ESTC S113566).
Dorman, T. 1564. A proufe of certeyne articles in religion. Antwerp: de Laet (= ESTC S110087).
Eckert, P. & McConnell-Ginet, S. 1992. Think practically and look locally: Language and gender as community-based practice. Annual Review of Anthropology 21: 461–490.
Fitzmaurice, S., Robinson, J., Alexander, M., Hine, I., Mehl, S. & Dallachy, F. 2017. Linguistic DNA: Investigating conceptual change in Early Modern English discourse. Studia Neophilologica 89: 21–38.
Gadd, I. 2009. The use and misuse of Early English Books Online. Literature Compass 6: 680–692.
Hudson, A. 1981. A Lollard sect vocabulary? In So Meny People Longages and Tonges: Philological Essays in Scots and Mediaeval English Presented to Angus McIntosh, M. Benskin & M. L. Samuels (eds), 15–30. Edinburgh: Middle English Dialect Project.
Kopaczyk, J. & Jucker, A. H. (eds). 2013. Communities of Practice in the History of English [Pragmatics & Beyond New Series 235]. Amsterdam: John Benjamins.
Lamont, W. 1969. Godly Rule. London: Palgrave Macmillan.
MacCulloch, D. 1996. Thomas Cranmer. New Haven CT: Yale University Press.
MacCulloch, D. 2017. Thomas Cranmer’s biographers. In All Things Made New: Writings on the Reformation, D. MacCulloch (ed.), 256–278. Harmondsworth: Penguin.
McIntosh, A. 1994. Codes and cultures. In Speaking in our Tongues, M. Laing & K. Williamson (eds), 135–137. Cambridge: Brewer.
Milroy, J. 1992. Linguistic Variation and Change. Oxford: Blackwell.
Morgan, J. 1988. Godly Learning. Cambridge: CUP.
Norton, D. 2000. A History of the English Bible as Literature. Cambridge: CUP.
Rastell, J. 1565. A replie against an ansvver (falslie intitled) in defence of the truth. Antwerp: Diest (= ESTC S121762).
Smith, J. 1996. An Historical Study of English: Function, Form and Change. London: Routledge.
Stapleton, T. 1566. A returne of vntruthes vpon M. Jewelles replie. Antwerp: de Laet (= ESTC S105218).
Webster, T. 2003. Godly Clergy. Cambridge: CUP.
Williams, R. 1983. Keywords: A Vocabulary of Culture and Society, 2nd edn. Oxford: OUP.

31. ESTC = English Short-Title Catalogue.

Chapter 8

Patterns of reader involvement on sixteenth-century English title pages, with special reference to second-person pronouns

Matti Peikola

University of Turku

Title pages may be viewed as early forms of advertisement, intended to make the potential reader purchase the book and attach a high value to its contents. In research into the consumer psychology of present-day advertisements, second-person pronouns have been found to be an effective means of persuasion. Drawing on a comprehensive dataset of sixteenth-century title page texts, this study shows that early English book producers made versatile and creative promotional use of the second-person pronouns you and thou so as to involve the potential reader (purchaser). The core of the analysis consists of a qualitative contextual analysis of the pronoun forms.

Keywords: advertising, Early Modern English, involvement, pronouns, title pages

1. Introduction

In the language of advertising, pronouns play an important role in constructing relationships between the sender (manufacturer, retailer etc.), the product, and the receiver (consumer, purchaser). In fact, according to Cook (1992: 155), “[o]ne of the most distinctive features of advertising is its use of pronouns”. Second-person pronouns are of specific interest in advertising because of their deictic potential to induce receivers to “become personally involved with the message and product and to make them feel that the communication is meant for them as individuals” (Debevec & Romeo 1992: 86). In advertising research, the strategy of employing second-person pronouns for this purpose is one of the means available for what consumer psychologists call self-referencing, that is “relating a message to one’s

https://doi.org/10.1075/scl.97.08pei © 2020 John Benjamins Publishing Company

114 Matti Peikola

personal experience” (Chang 2011: 147).1 Second-person pronouns are also a hallmark feature of internal requests, which cognitive reader/listener response studies have shown to reduce the psychological distance between the message and its receiver, thereby potentially enhancing the persuasiveness of advertisements, in addition to other factors, such as the motivation of the receiver and the quality of the message (see e.g. Debevec & Iyer 1988; Burnkrant & Unnava 1995; Chang 2011).

Historical linguistic studies of English advertising texts have largely focused on the period from ca. 1700 onwards, when advertisements became an important feature of periodical publications (e.g. Gieszinger 2001; Görlach 2002; Brownlees 2017). However, while the late modern period is crucial for the development of advertising genres and the context of mass media communication, the history of promotional texts in English actually goes further back. Owing to the ephemeral status of separately circulating advertisements, only tiny fragments survive of what must have been a prolific promotional culture based on posters (Hirsch 1967: 63–65). A rare survival of this kind is Caxton’s 1477 broadside advertisement for the Sarum Ordinal; both extant copies were probably salvaged from binder’s waste (Needham 1986: 82). Promotional texts that formed part of some larger publication, however, had a better chance of survival. In the emerging print culture of the early modern period, front matter developed into a promotional system, enabling book producers to communicate with their consumers; Saenger (2005: 197) suggests that it might “constitute something like the birth of modern advertising”. Hirsch (1967: 72) describes front matter as an innovation that helped “the producer to make his merchandise attractive and useful” (see also Genette 1997: 33–34). Within front matter, the title page came to play a key role in this innovative process.
Smith (2000) shows how the early modern title page developed from a blank to a space crowded with textual and visual elements geared towards promoting the book. By making various ‘claims’ for the value of their product on the title page, book producers used it as an interactive site for engaging the potential purchaser/reader. Such claims would often focus on the merits of the work at hand or on those of its author, or highlight some novelty in the material book or its production process, such as the presence of specific visual features or corrections made to a new edition (Smith 2000: 102–108; Olson 2016; Varila & Peikola 2019: 74–77). The promotional potential and interactional nature of title pages are also manifest in early booksellers’ custom of hanging them outside their shops to attract customers

1. The use of second-person pronouns as a characteristic feature of self-referencing in consumer psychology may seem confusing or even counterintuitive in linguistics, where self-referencing tends to be prototypically associated with the use of first-person pronouns, interpreted from the deictic centre of the speaker/language user (see e.g. Fetzer & Bull 2008: 275–276).



Chapter 8.  Reader involvement on sixteenth-century English title pages 115

(Shevlin 1999: 49; Olson 2016: 620–623), reflecting the fuzzy borderline between ephemeral book advertising through posters and that based on front matter. The purpose of this chapter is to shed more light onto English sixteenth-century book producers’ use of the promotional potential of the title page, exploring their use of second-person pronouns as a ‘self-referencing’ strategy (cf. Chang 2011, above). The study thus joins the ranks of recent pragmaphilological and book-historical research into the communicative properties of Early Modern English (EModE) title pages (e.g. Tyrkkö et al. 2013; Olson 2016; Ratia & Suhr 2017; Varila & Peikola 2019). The core of the analysis consists of a qualitative contextual analysis of second-person pronouns (Section 4.2). 2. Second-person pronouns The strategic potential of second-person pronouns in promotional texts can be linked to the general discourse properties of personal pronouns. As Fetzer and Bull (2008: 275) observe, “[b]ecause of an individual’s multiple social, discursive and interactional roles, a personal pronoun can refer to more than one identity and therefore can express multiple meanings” (see also Tyrkkö 2016). According to Myers (2008: 359), it is precisely due to this “slippery” quality that “the you of ads is powerful”. In addition to its potential for addressing actual (anticipated) readers individually or collectively, as in the famous World War I enlistment poster “Your Country Needs You”, the textual you may also be used without a clear addressing purpose, in an impersonal or generalized referential function of the kind typical of recipes or proverbs, as in the proverb “You can lead a horse to water but you can’t make it drink” (Herman 2002: 340–341; Fetzer & Bull 2008: 276–280; Bell & Ensslin 2011: 314–315). 
Furthermore, you has the potential to index entities in the fictional world(s) of the text, for example when the pronoun is used in communication between the characters in a novel (Herman 2002: 341, 345; Bell & Ensslin 2011: 316). In modern advertisements and early modern title pages alike, there is also the possibility of what Herman (2002: 341–342), discussing modern narrative fiction, calls actualized address, in which addressing a character in the text-internal fictional world is extended to the ‘real’ (external) world. In a British World War I enlistment poster discussed by Myers (2008: 359–360), for example, the question “Daddy, what did YOU do in the Great War?”, asked by a girl sitting on her father’s lap in an armchair, clearly extends the you from the father to the reader of the poster. Ultimately, in double deixis, the internal and external references may become blurred and wholly superimposed upon one another (Herman 2002: 342–343; Macrae 2015: 111–115). In advertising, the persuasive power of the second-person reference essentially requires the reader to recognize themselves in the general or fictional you (Myers 2008: 368). Cook (1992: 156–157) characterises this deictic quality of you in advertisements as a double exophora. He maintains that its function differs from that in literary narrative in the sense that advertisements practically ‘force’ their readers to project themselves onto the you, whereas literary readers have the option of detaching themselves from it and identifying the you with some other referent(s) instead (Cook 1992: 157). Macrae (2015: 110), however, reports studies by narratologists who doubt readers’ ability to detach themselves completely from the you even when they are aware of its pragmatic fluidity.

116 Matti Peikola

In their use for indexing the addressee, second-person pronouns belong to the linguistic markers of what Chafe calls “involvement of the speaker with the hearer” (Chafe 1985: 116–117; see also Biber 1988: 105, 225). In a factor-analytical study of EModE genre conventions, Taavitsainen (1997: 210) found second-person pronouns among the most salient of the features contributing to the factor she interpreted as “Interaction”. She identified this factor as typical of the genre of fiction, particularly of passages that emulate natural dialogue and show a strong emotional stance (ibid. 213–217). In a genre study of English medical writing between 1375 and 1550, Taavitsainen (2000) found the use of second-person pronouns characteristic of instructive texts that focus on the actions expected of the reader (cf. Werlich 1976: 136). Another frequent context for these pronouns consisted of metatextual passages that “bring a reader-centred point-of-view to the text” (Taavitsainen 2000: 200; see also Taavitsainen 2006). In that context, second-person pronouns may usefully be approached as instantiations of addressee-related metadiscourse, which “typically make use of 2nd-person pronouns or imperative constructions” (Boggel 2009: 28; see also Chaemsaithong 2013: 173–174).
Owing to a systemic difference, the deictic potential of you in sixteenth-century English is not identical with that of you in Present-day English (PDE); in this study, we also needed to take into account the singular pronoun thou. Although thou was on the decrease, it was still commonly used at the close of the sixteenth century (and well beyond), especially in fiction and speech-related genres (e.g. Finkenstaedt 1963: 172–173; Taavitsainen 1997: 239–244; Walker 2003). The complex linguistic and extralinguistic factors affecting the choice of singular you vs. thou in EModE have been the subject of extensive discussion and debate (for useful summaries, see Walker 2007: 39–63; Buyle & De Smet 2018). The previously dominant interpretation of thou as invariably the “marked” form, signalling social asymmetry between the interactants or their adoption of an affective/agitated style (e.g. Calvo 1992; Wales 1996: 75–76; Lass 1999: 149–150), appears to have given way to pragmatically more nuanced approaches, which seek to interpret individual choices of you/thou in terms of the flexible social roles and shifting interactional statuses of the participants in a specific communicative situation (e.g. Jucker 2000; Walker 2007: 48). Thus, Buyle and De Smet (2018: 53) argue that “semantic neutrality in EModE you is only apparent”.



3. Research design

This study was based on a digital dataset of English title pages from 1501 to 1600. The dataset was compiled by the Framing Text research team at the University of Turku from the bibliographic metadata available through ProQuest’s Early English Books Online (EEBO) database.2 The txt-formatted metadata, arranged into 100 annual files, comprise the title-page texts of almost 15,000 publications (editions), that is, a large majority of all surviving publications from the sixteenth century.3 While the metadata systematically include the titles of the publications and information about the printer/bookseller when these appear on the title page, there is some variation in the extent to which other text segments are recorded. This applies especially to epigrams and other short verses, which have sometimes been excluded by the EEBO compilers. AntConc was used to find occurrences of the second-person pronouns you (EModE nominative form ye) and thou in the title-page metadata. For both pronouns, their object (you, thee) and genitive (your/yours, thy/thine) forms were also included. As the metadata were not lemmatised, spelling variants had to be worked into the queries. Table 1 shows the spelling variants used in the queries. For practical reasons, the possible spelling variant “the” for thee had to be left out of the query, as its homography with the definite article resulted in 49,167 hits in the dataset. A close reading of all the retrieved title pages containing forms of thou (using the variants listed in Table 1) revealed five title pages in which “the” was used for thee, four of them in editions of the same work from the 1530s and 1540s. As a ‘safety check’, all hits for “the” in 1530–1560 (N = 12,624) were then browsed using the keyword-in-context (KWIC) display in AntConc.
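The variant-based query and the KWIC check described above were carried out in AntConc. Purely as an illustration of the logic, the same kind of search can be sketched in Python: the spelling variants of Table 1 are combined into one case-insensitive regular expression with word-boundary anchors, and each hit is returned with some left and right context, as in a KWIC display. The names `VARIANTS`, `build_pattern` and `kwic`, the context width and the sample sentences are assumptions made for this sketch, not part of the study's workflow. As in the study, “the” is omitted, and hits such as “ye” standing for the article would still need to be weeded out by hand.

```python
import re

# Spelling variants from Table 1, keyed by pronoun. As in the study's query,
# "the" (a possible spelling of thee) is left out because it is homographic
# with the definite article.
VARIANTS = {
    "you": ["ye", "yee", "you", "yow", "your", "yowr", "youre", "yowre",
            "yours", "yowrs"],
    "thou": ["thou", "thow", "thee", "thy", "thi", "thine", "thyne"],
}


def build_pattern(variants):
    """Compile one case-insensitive, whole-word regex for all variants."""
    forms = sorted({f for fs in variants.values() for f in fs},
                   key=len, reverse=True)
    # \b on both sides keeps "you" from matching inside e.g. "youth";
    # longest-first ordering is a readability choice, not a requirement.
    return re.compile(r"\b(?:" + "|".join(map(re.escape, forms)) + r")\b",
                      re.IGNORECASE)


PATTERN = build_pattern(VARIANTS)


def kwic(text, pattern=PATTERN, width=25):
    """Return (left context, hit, right context) triples for every match."""
    hits = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits


# Hits still need manual checking: "ye" can stand for the article "the".
sample = ("In this Impression you shall find these Additions. "
          "A table wherof thou shalt finde in the next leafe.")
for left, hit, right in kwic(sample):
    print(f"{left:>25} [{hit}] {right}")
```

Anchoring every alternative with `\b` is what prevents partial matches (“this” does not yield “thi”, “youth” does not yield “you”); it cannot, however, resolve genuine homographs, which is why the manual KWIC check remains necessary.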
No other occurrences of “the” as a variant of thee were spotted, suggesting that the exclusion of “the” from the query did not lead to the loss of any substantial amount of relevant data for the century as a whole. The results generated by AntConc were checked individually using the KWIC display, and all irrelevant hits were removed (such as instances of “ye” standing for the definite article the). All relevant hits (i.e. those representing the pronouns shown in Table 1) were then checked against the corresponding image in EEBO. A few instances of pronouns were discarded from the data at this stage, as they occurred in broadside publications (proclamations, ballads) that have no actual title page, and were not part of the typographically highlighted title or first paragraph of these broadsides. Pronouns occurring in the visually highlighted title or first paragraph of broadsides were included in the data. On some title pages, additional second-person pronouns were found in epigrams or very long subtitles that had not been included in the EEBO metadata in their entirety. These instances were included in the qualitative analysis of the pronouns, but owing to the incidental nature of their discovery they were excluded from the quantitative overview. A few transcription errors found in the EEBO database were corrected in the present data during the checking procedure.

2. The dataset was compiled from EEBO in 2016–2018 using the old Chadwyck-Healey interface before its replacement with the new ProQuest platform in August 2019.
3. Barnard & Bell (2002: 780–782) list 15,367 titles (editions) for the years 1501–1600.

Table 1.  Case forms and spelling variants of the pronouns you and thou used in the query

Pronoun   Case form    Spellings
you       ye           “ye”, “yee”
          you          “you”, “yow”
          your/yours   “your”, “yowr”, “youre”, “yowre”, “yours”, “yowrs”
thou      thou         “thou”, “thow”
          thee         “thee”
          thy/thine    “thy”, “thi”, “thine”, “thyne”

4. Second-person pronouns on sixteenth-century title pages

The findings reported in this section comprise a quantitative decade-by-decade overview of the frequencies of second-person pronouns on sixteenth-century title pages (Section 4.1), followed by a qualitative contextual analysis (Section 4.2). The contextual analysis of the use of the pronouns to signal reader involvement focused on such features as their referents and their collocation with imperatives and auxiliary verbs.

4.1 Quantitative overview

Interpreting the nuances of pragmatic meaning of the pronouns requires a context-sensitive close reading of the title pages located by means of corpus searches. It is helpful, nonetheless, to begin with a quantitative overview of the frequency and chronological distribution of sixteenth-century English title pages containing second-person pronouns. The search turned up a total of 231 such publications (editions/issues furnished with different Short-Title Catalogue (STC) numbers). Their title pages (or, in the case of broadsides, their visually highlighted titles or first paragraphs) contain a total of 321 instances of second-person pronouns, indicating a certain tendency for the pronouns to co-occur on individual title pages (see examples 1, 8, 14, and 16 below). Table 2 shows, decade by decade, the absolute frequencies of publications with second-person pronouns on their title pages, and the absolute frequencies of the second-person pronouns themselves on these title pages.

Table 2.  Decennial absolute frequencies of sixteenth-century publications with second-person pronouns on their title page, and absolute frequencies of second-person pronouns on these pages

Decade      Publications   Pronouns
1500s       0              0
1510s       1              4
1520s       10             19
1530s       13             21
1540s       39             52
1550s       36             46
1560s       27             42
1570s       29             40
1580s       37             54
1590s       39             43
1501–1600   231            321

To interpret these frequencies meaningfully, it is crucial to take into account that the overall number of publications (editions) increased steadily throughout the sixteenth century, from 475 publications in the 1500s to 2,987 in the 1590s (as reported by Barnard & Bell 2002: 780–782).4 Figure 1 shows, decade by decade, the 231 publications with second-person pronouns on their title pages as a percentage of all publications that appeared in that decade. As Figure 1 indicates, when the frequencies of publications containing second-person pronouns on their title pages (cf. Table 2) are viewed in relation to overall publication numbers, the 1540s and 1550s stand out as the decades in which book producers most favoured you and thou on the title pages of their publications. Thereafter, the relative frequency of second-person pronouns on title pages declined gradually towards the end of the century. Figure 1 also shows that, with the decennial proportions ranging between 0 (1500s) and 2.7 per cent (1540s) of all publications appearing in each decade, the use of second-person pronouns on title pages remained relatively infrequent throughout the century. The 1540s and 1550s also emerge as the top decades in terms of the relative frequency of second-person pronouns when the chronological distribution of all 321 occurrences of you and thou (cf. Table 2) is viewed in relation to the total number of words used on the title pages in each decade (see Figure 2; frequencies are normalised to 1,000 words).

4. The decennial numbers of published titles (editions) from the 1500s to the 1590s, calculated from the annual figures provided by Barnard and Bell (2002: 780–782) from the STC (2nd ed.), are as follows: 1500s (N = 475), 1510s (585), 1520s (831), 1530s (1043), 1540s (1470), 1550s (1501), 1560s (1634), 1570s (2117), 1580s (2724), 1590s (2987). The data in Barnard and Bell (2002) include all known publications issued in the British Isles, including those in languages other than English (notably Latin); the proportions of English-language publications with second-person pronouns on their title pages would therefore be slightly higher than those shown in Figure 1, especially for the early decades of the century, from the 1500s to the 1520s, when more Latin publications came out than English ones (for decennial counts of Latin vs. English title pages in 1500–1550, based on EEBO, see Varila 2018; no similar counts are available for the latter half of the century).

120 Matti Peikola

[Bar chart, per cent by decade: 1500s 0; 1510s 0.2; 1520s 1.2; 1530s 1.2; 1540s 2.7; 1550s 2.4; 1560s 1.7; 1570s 1.4; 1580s 1.4; 1590s 1.3]
Figure 1.  Decennial proportions (in percentages) of publications with second-person pronouns on their title pages out of all publications (editions) of each decade

[Bar chart, occurrences per 1,000 words by decade: 1500s 0.00; 1510s 0.17; 1520s 0.49; 1530s 0.41; 1540s 0.63; 1550s 0.54; 1560s 0.44; 1570s 0.30; 1580s 0.30; 1590s 0.22]
Figure 2.  Normalised frequencies (per 1,000 words) of occurrences of you and thou on sixteenth-century title pages, by decade5

5. The decennial word counts of the title pages used as the basis of the normalisation are calculated from the annual (1501–1600) files of the title page dataset compiled from the bibliographic metadata in EEBO (1500s: N = 16,725 words; 1510s: 23,056; 1520s: 39,108; 1530s: 50,886; 1540s: 83,198; 1550s: 84,886; 1560s: 95,138; 1570s: 133,773; 1580s: 178,347; 1590s: 194,844). The word counts include the titles of all publications as they appear in the EEBO metadata, regardless of language. The counts also include the date and bibliographic reference name/number for each publication as part of their metadata, which adds approximately 10–20 words to the word count of each title page. Since these additions apply to all publications, however, their presence does not change the overall pattern of the results.
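The proportions in Figure 1 and the normalised frequencies in Figure 2 are simple ratios: publications with second-person pronouns on their title pages divided by all publications of the decade (footnote 4), and pronoun occurrences per 1,000 title-page words (footnote 5). The following Python sketch recomputes both series from the counts in Table 2 and footnotes 4 and 5, on the assumption that the figures were derived directly from these counts; the dictionary and function names are hypothetical. Rounded to one and two decimal places respectively, its output matches the values plotted in the two figures.

```python
# Table 2: decade -> (publications with 2nd-person pronouns on the title page,
#                     number of 2nd-person pronoun occurrences on those pages)
TABLE_2 = {
    "1500s": (0, 0), "1510s": (1, 4), "1520s": (10, 19), "1530s": (13, 21),
    "1540s": (39, 52), "1550s": (36, 46), "1560s": (27, 42),
    "1570s": (29, 40), "1580s": (37, 54), "1590s": (39, 43),
}

# Footnote 4: all publications (editions) per decade (Barnard & Bell 2002).
EDITIONS = {
    "1500s": 475, "1510s": 585, "1520s": 831, "1530s": 1043, "1540s": 1470,
    "1550s": 1501, "1560s": 1634, "1570s": 2117, "1580s": 2724, "1590s": 2987,
}

# Footnote 5: title-page word counts per decade (EEBO metadata).
WORDS = {
    "1500s": 16725, "1510s": 23056, "1520s": 39108, "1530s": 50886,
    "1540s": 83198, "1550s": 84886, "1560s": 95138, "1570s": 133773,
    "1580s": 178347, "1590s": 194844,
}


def percent_of_editions(decade):
    """Figure 1: share (%) of a decade's publications with you/thou on the title page."""
    publications, _ = TABLE_2[decade]
    return 100 * publications / EDITIONS[decade]


def per_thousand_words(decade):
    """Figure 2: occurrences of you/thou per 1,000 title-page words."""
    _, pronouns = TABLE_2[decade]
    return 1000 * pronouns / WORDS[decade]


for d in TABLE_2:
    print(f"{d}: {percent_of_editions(d):.1f}%  "
          f"{per_thousand_words(d):.2f} per 1,000 words")
```

Both measures peak in the 1540s, which is why that decade stands out in Figures 1 and 2 alike, even though the absolute counts in Table 2 are as high or higher in the 1580s and 1590s.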

4.2 Second-person pronouns in context

As observed in Section 2, the deictic fluidity and ambiguity of second-person pronouns can make it difficult to interpret the interaction expressed through them. In interpreting the use of these pronouns on sixteenth-century title pages, it therefore makes sense to start with those instances in which a referent for the pronoun can be explicitly identified. While such uses are sporadic, they facilitate our understanding of those instances in which no such explicit reference is present. There are several instances in which a second-person pronoun occurs in the immediate context of an explicit address to the reader, in a noun phrase whose head is the noun reader. In examples (1) to (7), boldface highlights these noun phrases and the second-person pronouns collocating with them. Examples (1) to (6) show book producers directly guiding the individual (singular) reader in various ways; the plural address to readers in (7) is exceptional. The reader(s) addressed may be specified as Christian (1, 2), good (3), good Christian (4) or (most) gentle (5) to (7), in a manner that resembles reader address in early modern prefaces. To make these reader-oriented expressions visually more salient, they are often presented in a separate paragraph on the title page.

(1) Rede bothe o Christen Reader, truthe is comynge home, longe afore beynge in captyuytye, steppe forth and meete her by the waye: yf thou see her presente, embrace hir, and shewe thy selfe gladde of her retourne.  (1538, STC 13081)6

(2) To the reader. In thys boke shal you fynde Christian Reader the ryght probation of the righte Olde Catholyke Churche, and of the newe false Churche, whereby eyther of them is to be knowen. Reade and iudge.  (1548, STC 16964)

(3) Reade with iudgement, and conferre with diligence, [la]iying aside all affection on eyther partie, and you shall easily perceaue (good Reader) how slender and weake the allegations and persuasions of the Pa[pi]stes are  (1551, STC 5991)

(4) Thou hast heer in this little book good christian Reader moste plain, sure and substantiall reasons to establish mutuall consent and christian concord  (1575, STC 4055.5)

(5) Therefore beware (gentle Reader) you catch not the hicket with laughing.  (1589, STC 534)

(6) Thow shalt also fynde here (most gentle Reader) of the reasons wher wyth a firme and sure concorde and peace in the Churche [...] and of other certen thynges moste worthy truly to be red and consydered.  (1542, STC 3047)

6. Boldface and underlining have been added to examples to highlight the items discussed.

(7) The complaynte and testament of a Popiniay Which lyeth sore wounded and maye not dye, tyll euery man hathe herd what he sayth: Wherfore gentyll readers haste you yt he were oute of his payne.  (1538, STC 15671)

In most of these examples, book producers use directives when addressing the reader. Directives are among the characteristic interactional devices whereby readers are constructed in texts (Chaemsaithong 2013: 177). Taavitsainen (2006: 449) found the co-occurrence of second-person singular pronouns with imperatives of cognitive verbs to be a prototypical feature of audience guidance in Late Middle English learned medical writing. The imperatives underlined in the above examples likewise include cognitive verbs that characterise the reader’s engagement with the book/text: confer in example (3), judge (2), read (1–3). The actions that readers are exhorted to perform, however, extend well beyond the conventional cognitive field, suggesting that book producers may have sought to attract potential purchasers with linguistically creative title-page texts. This is especially evident in (1), where the reader is told to step forth and meet the personified Truth, now freed from her long captivity, to embrace her, and to show themselves glad. In (7), readers are urged to make haste to hear (by reading the book) what the dying popinjay has to say, and thus to end his sufferings. As instances of promotional language, these figurative examples might be viewed in terms of what Leech (1966: 199–200) calls ‘literary’ copy, that is, an advertisement “in which imaginatively unconventional use of language is itself one of the major appeals to the reader’s interest” (see also ibid., 45 for imperatives as one characteristic of direct address advertising, and McQuarrie & Mick 1996 for the use of rhetorical figures in present-day advertising language).
Since, however, metaphorical language was a stock feature of Renaissance English front matter, to the extent that “one encounters metaphors and personifications on virtually every page” (Saenger 2006: 95), the unconventionality of figurative exhortations formed with non-cognitive verbs cannot be automatically assumed (see also example 16 below).

Imperatives are found on fewer than 20 per cent of the title pages containing second-person pronouns.7 Unlike in the examples above, however, they are quite often used in contexts in which either (a) the person(s) directed through them cannot be explicitly identified with the referent of the second-person pronoun (see example 8), or (b) both the second-person pronoun and the imperative at least primarily index entities in the (fictional) world of the text at hand or of some other text (example 9).

7. The imperatives were located through a close reading of the title pages that contain second-person pronouns.



(8) Rede me and be nott wrothe for I saye no thynge but trothe. I will ascende makyng my state so hye/ that my pompous honoure shall never dye. O Caytyfe when thou thynkest least of all/ [with] confusion thou shalt have a fall.  (1528, STC 1462.7)

(9) A most excellent and heavenly sermon: Vpon the 23. Chapter of the Gospell by Saint Luke. The text. Luke 23.28. Weepe not for me, but weepe for your selues.  (1595, STC 20014)

In (8), a satire against Cardinal Wolsey, the imperatives occur in the main title of the work, in which the personified book addresses the reader. The two instances of the second-person pronoun, however, occur in a rhyming couplet addressing Wolsey as a “Caytyfe” (‘wretch, villain’; see Parker 1992: 159, 212). It appears unlikely that readers would have wished to identify themselves with the ridiculed cardinal; this is thus one of the few instances in the data in which the readers’/consumers’ projection of themselves onto the you/thou does not seem to work in the way characteristic of the double exophora in the language of advertising. In (9), both the second-person pronoun and the imperative occur in the biblical locus on which the sermon is based (Luke 23:28), where Jesus, on his way to Calvary, addresses the “Daughters of Jerusalem” in the words cited on the title page. It may be argued, however, that the text on which the sermon is based is also inherently intended to apply to any Christian reader who recognises the biblical quotation and identifies with those addressed by Jesus. The effect is similar to that of biblical epigrams containing second-person pronouns on some sixteenth-century title pages (e.g. “Let the worde of Chryst dwell in you plenteously in all wysdome” from Colossians 3:16 in the New Testament edition STC 2843). Such uses resemble Herman’s (2002) actualized address or double deixis.

In addition to the use of imperatives, some of the examples above in which the second-person pronoun explicitly indexes the reader illustrate how the reader (you/thou) can also be guided to engage with the book by means of periphrastic future expressions formed with shall (examples 2, 3, 6), which may sometimes convey a degree of obligation in addition to marking futurity (cf. Mustanoja 1960: 491–492; Kytö 1991: 261–263; Rissanen 1999: 210–212).
Such expressions are found quite frequently on sixteenth-century title pages containing second-person pronouns; in fact, approximately 33 per cent (77 out of 234) of all instances of you/thou in grammatical subject positions occur in them. Of the verbs denoting the future actions projected onto the you/thou in these constructions, find is by far the most common, present in 43 of the 77 occurrences. In addition to book producers’ promotional claims that the reader will find in the book some specific topic or theme, such as “the ryght probation of the righte Olde Catholyke Churche, and of the newe false Churche” in example (2), future expressions with find are also often used to advertise a specific (para)textual element in the physical book, such as a perpetual almanack (10), a table (11), or additions appended to a new edition (12).

(10) And in the ende ye shal finde an almanack for euer  (1547, STC 20423)

(11) The cōplaint of Roderyck Mors […] For the redresse of certein wycked lawes, euell custumes & cruell decrees. A table wherof thou shalt finde in the next leafe.  (1548, STC 3760)

(12) The Workes of our Antient and lerned English Poet, Geffrey Chavcer, newly Printed. In this Impression you shall find these Additions: 1 His Portraiture and Progenie shewed. 2 His Life collected. 3 Arguments to euery Booke gathered. 4 Old and obscure Words explaned. 5 Authors by him cited, declared. 6 Difficulties opened. 7 Two bookes of his neuer before printed.  (1598, STC 5077)

Another common pattern evoking reader involvement through second-person pronouns consists of modal expressions formed with may. Approximately 26 per cent (62 of 234) of all title-page occurrences of you/thou in grammatical subject positions represent this pattern. The three most frequent main verbs used in these constructions are all cognitive: see (N = 19), learn (N = 11) and read (N = 11). These are illustrated in examples (13) to (15) respectively:8

(13) A new Almanacke and Prognostication, for the yere of our Lorde. M. D. lxviii beyng leape yere, wherin is set forthe and shewed the chaūge of the Moone […] and many other necessarie notes, as in the table of the contentes, you maie aesely and euidently see and perceiue.  (1568, STC 422.3)

(14) The Antidotharius, in the whiche thou mayst lerne howe thou shalte make many, and dyuers noble plasters […] & wounde drynkes/ the whiche be very necessary, and behouefull/ vtyle/ & profytable, for euery Surgyan, therin to be expert/ and redy at all tymes of nede.  (1535, STC 675.7)

(15) Two wunderfull and rare Examples. Of the vndeferred and present approching iudgement, of the Lord our God: the one vpon a wicked and pernitious blasphemer […] The other vpon a vvoman […] to whome the Deuill verie straungely appeared, as in the discourse following, you may reade.  (1581, STC 23399.7)

8. These frequencies also include all new editions of one and the same work, in which book producers may have retained the wording of the title as it appeared in a previous edition.



As seen in (13) to (15), the modality expressed through may in the present data as a rule represents what Rissanen (1999: 237) calls “neutral possibility”, that is, possibility that does not primarily seem to indicate ability or permission. These expressions too (cf. 10–12) can be used to focus the reader’s attention on the benefits of a specific (para)textual element in the book, like the table of contents mentioned in (13). The promotional quality of the claim in (13) is evident in how the adverbs easily and evidently are used to modify the acts of seeing and perceiving.

In addition to the constructions using shall or may, a third relatively frequent means of marking the involvement of the you/thou as the reader of the book consists of non-periphrastic constructions formed with have. This pattern occurs in approximately 9 per cent (22 out of 234) of the instances of you/thou in grammatical subject positions in the title-page data. In such cases, producers plainly state what the book has to offer the reader. This strategy is exemplified in (4) above: “Thou hast heer in this little book […] moste plain, sure and substantiall reasons” (STC 4055.5). As in (4), these expressions often include spatial deictic adverbs (here, herein, wherein) or locative prepositional phrases (formed with after or before) to focus the reader’s attention on the physical artefact and the ‘navigation’ of its structure. As seen in (16), have can also appear in periphrastic future expressions formed with shall. Here the book producers make jocular promotional use of a commercial metaphor by presenting the book, with its five treatises, as a shop of five windows for readers to “cheapen and copen” (‘bargain for and buy’) for their profit (see OED Online s.vv. cheap, v., cope, v.3):

(16) The key to vnknowne knovvledge. Or, A shop of fiue Windowes, Which if you doe open, to cheapen and copen, You will be vnwilling, for many a shilling, To part with the profit, that you shall haue of it. Consisting of fiue necessarie Treatises:  (1599, STC 14946)

While the second-person pronouns marking involvement on sixteenth-century title pages typically seem to identify the you/thou with an unspecified generic reader who is expected to navigate the book and/or benefit from its contents in various ways, we also find instances where the intended audience is specified more narrowly. This may be seen in (14) above, where the medicines discussed in the book are said to be “behouefull/ vtyle/ & profytable, for euery Surgyan” (STC 675.7). Addressing members of this readership with the singular thou helps render the alleged multiple benefits of the book more immediate to the individual surgeon, and thus potentially persuade him more readily to purchase the volume.
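The proportions reported in this section for the 234 instances of you/thou in grammatical subject position are plain shares of that total. A minimal Python sketch of the arithmetic follows; it uses only the counts given in the text, and the names `SUBJECT_TOTAL`, `CONSTRUCTIONS` and `share` are hypothetical. It rounds to one decimal place, which is finer than the ‘approximately’ figures used in the prose.

```python
# Counts reported in Section 4.2 for instances of you/thou in grammatical
# subject position (construction type -> number of instances).
SUBJECT_TOTAL = 234
CONSTRUCTIONS = {
    "shall (periphrastic future)": 77,
    "may (modal)": 62,
    "have (non-periphrastic)": 22,
}


def share(count, total=SUBJECT_TOTAL):
    """Percentage of `count` out of `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)


for name, n in CONSTRUCTIONS.items():
    print(f"{name}: {n}/{SUBJECT_TOTAL} = {share(n)}%")

# Within the shall-constructions, find is the most common verb (43 of 77).
print(f"find within shall-constructions: {share(43, 77)}%")
```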

5. Conclusion

As indicated at the beginning of this chapter, title pages may be viewed as early forms of advertising, intended to make the potential reader purchase the book and attach a high value to its content. In studies of the consumer psychology of present-day advertisements, second-person pronouns have been found to be an effective means of persuasion. Based on a comprehensive dataset of sixteenth-century title-page texts, the present study shows that early English book producers made versatile promotional use of the second-person pronouns you and thou to engage the potential reader (purchaser). This type of reader involvement was typically manifested in linguistic patterns formed with imperatives and periphrastic future/modal expressions with shall and may. By means of these patterns, potential readers, often addressed in the singular, were persuaded to engage cognitively with the content of the book at hand, or alerted to the value of some specific (para)textual element in it that would help them use and navigate the volume more effectively. In these expressions, second-person pronouns were mostly found to index the reader in a relatively unambiguous way. The study, however, also revealed cases in which book producers exploited the deictic potential of second-person pronouns to index the reader simultaneously through personae in the fictional world.

References

Barnard, J. & Bell, M. 2002. Appendix 1. Statistical tables. In The Cambridge History of the Book in Britain, Vol. 4, J. Barnard & D. F. McKenzie (eds), 779–801. Cambridge: CUP.
Bell, A. & Ensslin, A. 2011. ‘I know what it was. You know what it was’: Second-person narration in hypertext fiction. Narrative 19(3): 311–329. https://doi.org/10.1353/nar.2011.0020
Biber, D. 1988. Variation across Speech and Writing. Cambridge: CUP.
Boggel, S. 2009. Metadiscourse in Middle English and Early Modern English Religious Texts. Frankfurt: Peter Lang.
Brownlees, N. 2017. Contemporary observations on the attention value and selling power of English print advertisements (1700–1760). In Diachronic Developments in English News Discourse [Advances in Historical Sociolinguistics 6], M. Palander-Collin, M. Ratia & I. Taavitsainen (eds), 61–79. Amsterdam: John Benjamins.
Burnkrant, R. E. & Unnava, H. R. 1995. Effects of self-referencing on persuasion. Journal of Consumer Research 22(1): 17–26.
Buyle, A. & De Smet, H. 2018. Meaning in a changing paradigm: The semantics of you and the pragmatics of thou. Language Sciences 68: 42–55. https://doi.org/10.1016/j.langsci.2017.12.004
Calvo, C. 1992. Pronouns of address and social negotiation in As You Like It. Language and Literature 1(1): 5–27.
Chaemsaithong, K. 2013. Interaction in early modern news discourse: The case of English witchcraft pamphlets and their prefaces (1566–1621). Text & Talk 33(2): 167–188. https://doi.org/10.1515/text-2013-0008



Chapter 8.  Reader involvement on sixteenth-century English title pages 127

Chafe, W. L. 1985. Linguistic differences produced by differences between speaking and writing. In Literacy, Language, and Learning: The Nature and Consequences of Reading and Writing, D. R. Olson, N. Torrance & A. Hildyard (eds), 105–123. Cambridge: CUP.
Chang, C. 2011. Enhancing self-referencing to health messages. The Journal of Consumer Affairs 45(1): 147–164. https://doi.org/10.1111/j.1745-6606.2010.01196.x
Cook, G. 1992. The Discourse of Advertising. London: Routledge.
Debevec, K. & Iyer, E. 1988. Self-referencing as a mediator of the effectiveness of sex-role portrayals in advertising. Psychology and Marketing 5(1): 71–84.
Debevec, K. & Romeo, J. B. 1992. Self-referent processing in perceptions of verbal and visual commercial information. Journal of Consumer Psychology 1(1): 83–102. https://doi.org/10.1016/S1057-7408(08)80046-0
Fetzer, A. & Bull, P. 2008. ‘Well, I answer it by simply inviting you to look at the evidence’: The strategic use of pronouns in political interviews. Journal of Language and Politics 7(2): 271–289. https://doi.org/10.1075/jlp.7.2.05fet
Finkenstaedt, T. 1963. You und thou. Studien zur Anrede im Englischen (mit einem Exkurs über die Anrede im Deutschen). Berlin: Walter de Gruyter.
Genette, G. 1997. Paratexts: Thresholds of Interpretation. Cambridge: CUP.
Gieszinger, S. 2001. The History of Advertising Language: The Advertisements in The Times from 1788 to 1996. Frankfurt: Peter Lang.
Görlach, M. 2002. A linguistic history of advertising, 1700–1890. In Selected Papers from 11 ICEHL, Santiago de Compostela, 7–11 September 2000, Vol. 2: Sounds, Words, Texts and Change [Current Issues in Linguistic Theory 224], T. Fanego, B. Méndez-Naya & E. Seoane (eds), 83–104. Amsterdam: John Benjamins.
Herman, D. 2002. Story Logic: Problems and Possibilities of Narrative. Lincoln NE: University of Nebraska Press.
Hirsch, R. 1967. Printing, Selling and Reading 1450–1550. Wiesbaden: Otto Harrassowitz.
Jucker, A. H. 2000. Thou in the history of English: A case for historical semantics or pragmatics? In Words: Structure, Meaning, Function. A Festschrift for Dieter Kastovsky, C. Dalton-Puffer & N. Ritt (eds), 153–163. Berlin: Mouton de Gruyter.
Kytö, M. 1991. Variation and Diachrony, with Early American English in Focus. Studies on CAN/MAY and SHALL/WILL. Frankfurt: Peter Lang.
Lass, R. 1999. Phonology and morphology. In The Cambridge History of the English Language, Vol. 3, R. Lass (ed.), 56–186. Cambridge: CUP.
Leech, G. N. 1966. English in Advertising: A Linguistic Study of Advertising in Great Britain. London: Longmans.
Macrae, A. 2015. ‘You’ and ‘I’ in charity fundraising appeals. In The Pragmatics of Personal Pronouns [Studies in Language Companion Series 171], L. Gardelle & S. Sorlin (eds), 105–124. Amsterdam: John Benjamins.
McQuarrie, E. F. & Mick, D. G. 1996. Figures of rhetoric in advertising language. Journal of Consumer Research 22(4): 424–443.
Mustanoja, T. F. 1960. A Middle English Syntax. Helsinki: Société Néophilologique.
Myers, G. 2008. ‘You in the shocking pink shellsuit’: Pronouns and address. In The Language of Advertising: Major Themes in English Studies, Vol. 1, G. Cook (ed.), 357–369. London: Routledge.
Needham, P. 1986. The Printer & the Pardoner: An Unrecorded Indulgence Printed by William Caxton for the Hospital of St. Mary Rounceval, Charing Cross. Washington DC: Library of Congress.

128 Matti Peikola

OED Online. Retrieved from . Oxford: OUP.
Olson, J. R. 2016. ‘Newly amended and much enlarged’: Claims of novelty and enlargement on the title pages of reprints in the early modern English book trade. History of European Ideas 42(5): 618–628. https://doi.org/10.1080/01916599.2016.1152753
Parker, D. H. (ed.). 1992. Jerome Barlowe and William Roye, Rede me and be nott wrothe. Toronto: University of Toronto Press.
Ratia, M. & Suhr, C. 2017. Verbal and visual communication in title pages of Early Modern English medical texts. In Verbal and Visual Communication in Early English Texts, M. Peikola, A. Mäkilähde, H. Salmi, M.-L. Varila & J. Skaffari (eds), 67–93. Turnhout: Brepols.
Rissanen, M. 1999. Syntax. In The Cambridge History of the English Language, Vol. 3, R. Lass (ed.), 187–331. Cambridge: CUP.
Saenger, M. B. 2005. The birth of advertising. In Printing and Parenting in Early Modern England, D. A. Brooks (ed.), 197–219. Aldershot: Ashgate.
Saenger, M. B. 2006. The Commodification of Textual Engagements in the English Renaissance. Aldershot: Ashgate.
Shevlin, E. F. 1999. ‘To reconcile book and title, and make ‘em kin to one another’: The evolution of the title’s contractual functions. Book History 2: 42–77. https://doi.org/10.1353/bh.1999.0011
Smith, M. M. 2000. The Title-page: Its Early Development 1460–1510. London & New Castle DE: The British Library & Oak Knoll Press.
STC = Pollard, A. W. & Redgrave, G. R. 1976–1991. A Short-title Catalogue of Books Printed in England, Scotland, and Ireland, and of English Books Printed Abroad, 1475–1640 (2nd ed., rev. by K. F. Pantzer, W. A. Jackson & F. S. Ferguson, 3 Vols.). London: Bibliographical Society.
Taavitsainen, I. 1997. Genre conventions: Personal affect in fiction and non-fiction. In English in Transition: Corpus-based Studies in Linguistic Variation and Genre Styles, M. Rissanen, M. Kytö & K. Heikkonen (eds), 185–266. Berlin: Mouton de Gruyter.
Taavitsainen, I. 2000. Metadiscursive practices and the evolution of early English medical writing 1375–1550. In Corpora Galore: Analyses and Techniques in Describing English, J. M. Kirk (ed.), 191–207. Amsterdam: Rodopi.
Taavitsainen, I. 2006. Audience guidance and learned medical writing in late medieval English. In Advances in Medical Discourse Analysis: Oral and Written Contexts, M. Gotti & F. Salager-Meyer (eds), 431–455. Bern: Peter Lang.
Tyrkkö, J. 2016. Looking for rhetorical thresholds: Pronoun frequencies in political speeches. In The Pragmatics and Stylistics of Identity Construction and Characterisation, M. Nevala, U. Lutzky, G. Mazzon & C. Suhr (eds). Helsinki: Varieng. (14 May 2020).
Tyrkkö, J., Marttila, V. & Suhr, C. 2013. The Culpeper Project: Digital editing of title-pages. In Principles and Practices for the Digital Editing of Diachronic Data, A. Meurman-Solin & J. Tyrkkö (eds). Helsinki: Varieng. Retrieved from (14 May 2020).
Varila, M.-L. 2018. Compiling practices in printed English paratexts 1500–1550. Journal of the Early Book Society for the Study of Manuscripts and Printing History 21: 27–51.
Varila, M.-L. & Peikola, M. 2019. Promotional conventions on English title-pages up to 1550: Modifiers of time, scope, and quality. In Norms and Conventions in the History of English [Current Issues in Linguistic Theory 347], B. Bös & C. Claridge (eds), 73–97. Amsterdam: John Benjamins.




Wales, K. 1996. Personal Pronouns in Present-day English. Cambridge: CUP.
Walker, T. 2003. You and thou in Early Modern English dialogues: Patterns of usage. In Diachronic Perspectives on Address Term Systems [Pragmatics & Beyond New Series 107], I. Taavitsainen & A. H. Jucker (eds), 308–342. Amsterdam: John Benjamins.
Walker, T. 2007. Thou and You in Early Modern English Dialogues: Trials, Depositions, and Drama Comedy [Pragmatics & Beyond New Series 158]. Amsterdam: John Benjamins.
Werlich, E. 1976. A Text Grammar of English. Heidelberg: Quelle & Meyer.

Part II

Late Modern English

Chapter 9

Epistemic adverbs in the Old Bailey Corpus

Claudia Claridge

University of Augsburg

This study investigates selected epistemic adverbs in the courtroom discourse of the Old Bailey Corpus. Over time, more epistemic types are used in court and the frequencies of individual items rise, with probably standing out as the most frequent item. All items are overwhelmingly used as sentence adverbs, mostly in clause-medial positions. Additionally, the adverbs are used with medium frequency as focalizers modifying words and phrases, and rarely as response items. All social groups show increasing usage, with higher-class males apparently leading the development. Witnesses are the most prolific user group, followed by defendants and judges. While all groups use probably frequently, witnesses show a preference for evidently and apparently, and lawyers/judges for undoubtedly.

Keywords: epistemic adverbs, legal register, Late Modern English, sociopragmatic approach, language change

1. Introduction

And who could it be who was her confederate? A lover evidently, for who else could outweigh the love and gratitude which she must feel to you?  (Arthur Conan Doyle 1892, The Adventure of the Beryl Coronet, italics added)

This statement occurs within a long crime reconstruction by Sherlock Holmes, with evidently marking one of his many inferred conclusions. Evidently indicates his epistemic process of assuming and thereby also places the conclusion on the certainty scale. Browsing through the late 19th-century fiction of Conan Doyle, one encounters a considerable density of such epistemic markers, among them also probably and apparently. Wierzbicka (2006: 247–249, 265) has identified the formal range and high frequency of such epistemic (sentential) adverbs as something specifically English, a peculiarity that she moreover links to the time from the early 18th century onwards. The expansion of modal sentence adverbials over the history of English has

https://doi.org/10.1075/scl.97.09cla © 2020 John Benjamins Publishing Company

134 Claudia Claridge

also been noted by Swan (1988: 514). She further noticed a qualitative change in their function: early (that is, Old and Middle English) uses marked truth, whereas Modern English uses are much more likely to indicate doubt (Swan 1988: 399–400). Holmes’ use above thus marks a high degree of conviction, and hence of confidence in his conclusion, with relatively little doubt, but it definitely does not mark truth. However, the usage of an outstanding fictional character is of course not indicative of what is going on in the language as a whole. A context that yields a wider and more representative picture of epistemic usage in the Late Modern English period is the courtroom of the Old Bailey in London. There, a wide range of speakers from many social backgrounds, crucially including the lower classes, and in various situational roles have occasion and need to indicate degrees of certainty and doubt. The Old Bailey proceedings also cover the time that Wierzbicka singled out as important for the development of such forms, namely the period from 1720 onwards. The trial data will thus be used to answer questions on the characteristics and users of epistemic adverbs in Late Modern English. Characteristics cover their general syntactic and semantic profiles, their status as sentence adverbials, and their frequencies. User patterns concern their socio-pragmatic distribution according to speakers’ gender, class, or role in the courtroom. This chapter will first discuss epistemic adverbs in theory and previous research (Section 2), and then the data and methodology employed in the investigation (Section 3). Section 4 will present the results of the analysis, after which Section 5 will conclude.

2. Epistemic adverbs

If one assumes that an unmodified declarative sentence indicates complete speaker confidence, then the addition of any epistemic marker signals a deviation from an “ideal knowledge status” (DeLancey 2001: 380); probably in they will probably be back tomorrow, for instance, signals some speaker uncertainty. The epistemic adverbs in focus here have speaker-oriented semantics in the sense that they concern the degree of speaker knowledge and confidence as opposed to factual truth; that is, they are marked by subject-orientation (Wierzbicka 2006; Palmer 2001; Swan 1988). The adverbs mark a given speaker’s subjective assessment of their certainty regarding the matter at hand and the degree of commitment they want to communicate to interlocutors. In doing this, speakers can convey further semantic nuances through their precise choice of adverb (Swan 1988, 1991). They may want to express their evaluation of the logical possibility of something, with either a higher (e.g. necessarily) or a lower (e.g. possibly) indication of probability. They



Chapter 9.  Epistemic adverbs in the Old Bailey Corpus 135

can also indicate an evidential basis for their assumption (e.g. manifestly). Finally, speakers may use distancing types (e.g. supposedly) in order to downplay their own responsibility for the statement made.1 Even strong-sounding forms such as certainly or undoubtedly do not express full conviction. According to Wierzbicka (2006: 259, 261), these items are not completely subjective but partly objectified, that is, they indicate that other people hold a similar view. If this is part of the meaning of these items, or is at least implied by them, they would of course be particularly useful forms in the courtroom, as the speaker’s meaning would be ‘I say so, because I believe so – and so may you’. These adverbs may act either as sentence adverbs (1) or as focalizers (2), depending on whether they have scope over an entire sentence or over a smaller element within it (Swan 1988; Simon-Vandenbergen & Aijmer 2007: 87).

(1) Probably he handed it to some sailor customer of his, who forgot all about it for some days.  (Doyle 1892, The Man with the Twisted Lip)

(2) He had foresight, but has less now than formerly, pointing to a moral retrogression, which, when taken with the decline of his fortunes, seems to indicate some evil influence, probably drink, at work upon him.  (Doyle 1892, The Adventure of the Blue Carbuncle)

As sentence adverbs, they are commonly sentence-initial, but may also occur in medial positions (Swan 1988). Initial position serves to signal up front, and to emphasize, both the not fully assertive and the more subjective nature of what follows (Wierzbicka 2006: 261). In other words, the initial adverb works like a frame or a contextualization cue. Simon-Vandenbergen and Aijmer (2007: 301) add a third use of epistemic adverbs, namely as emphatic, in fact non-epistemic, responses in answers (3), which they call the discourse marker use.

(3) It must be those wretched gipsies in the plantation. – Very likely. […]  (Doyle 1892, The Adventure of the Speckled Band)

There are substantial frequency differences between individual adverbs, ranging from very frequent probably, via medium-frequency presumably and apparently, to low-frequency undoubtedly, likely, conceivably, supposedly, allegedly and reportedly (Wierzbicka 2006: 262, based on the Cobuild Corpus). Furthermore, these adverbs tend to occur in interactive contexts and thus more commonly in speech, although individual forms may diverge from this pattern, for example undoubtedly, which is more common in writing (Simon-Vandenbergen & Aijmer 2007: 196–197; González-Álvarez 1996: 246–247; Wierzbicka 2006: 262). Socially higher-ranking speakers have been found to use these adverbs more frequently than lower-ranking ones (González-Álvarez 1996: 252). Historically, both the inventory of types and the token frequencies of epistemic adverbs have increased. Wierzbicka’s (2006: 293) investigation of twelve items in Literature Online found only one of them (undoubtedly) in the 16th and 17th centuries, whereas from the 18th century onwards the type range kept expanding: to five in the 18th century (+ evidently, probably, unquestionably, possibly), to eleven in the 19th (+ presumably, obviously, apparently, arguably, allegedly, conceivably) and finally to all twelve in the 20th century (+ reportedly). Some items are, however, attested earlier in the OED, for instance conceivably and presumably from the 17th century onwards. With regard to the context of interest here, historical trials have been found to exhibit a steady increase of epistemic adverbs (González-Álvarez 1996: 249).

1. The performative type (e.g. assuredly) is left out of consideration here, as it does not have one clear illocutionary force.

3. Data and methodology

The material for this investigation comes from the Old Bailey Corpus, version 2.0 (henceforth OBC), a corpus of c. 24.4 million words drawn from trial transcripts of London’s central criminal court (Huber 2007; Huber et al. 2012). The corpus documents spoken English in the courtroom from 1720 through 1913. The transcripts are arguably as near as we can get to the spoken language of the time, but the data must nevertheless be characterized as merely speech-related, as it has been filtered through various processes, from scribal shorthand transcription via expansion, editing and proof-reading to the printed material on which the corpus is based (Huber 2010). Departures from the speech actually produced may have occurred at many of these steps, and items of interest for this chapter may have been omitted or changed.
One of the scribes, Thomas Gurney, said he could not guarantee that “these are the very words she made use of, or that she made use of no more words” (Huber 2007), and different printed versions of trials indeed show interesting divergences, for example “I can’t particularly say that” vs. “I cannot positively say” or “That is Mr Ayliffe’s hand-writing” vs. “I believe it to be Mr Ayliffe’s hand” (Huber 2007, my emphasis). Apart from individual segments of speech being altered, there are also large-scale omissions, especially of the language of lawyers and judges, so that the most typical legal genres, such as openings, closings and summations, are usually excluded entirely (Emsley et al. n.d.). The extant proceedings contain questions, answers and brief narratives, and focus mainly on the speech of witnesses, victims, and defendants. Despite the possible interventions mentioned above, the language recorded retains a fairly high degree




of interactivity and sometimes even a colloquial flair. Nevertheless, the gravity and formality of the courtroom will also have made an impression on the speakers, who will have felt the need to be careful in their phrasing and to present themselves in a good and credible light. To trace developments over time, the corpus has been divided here into five subperiods of approximately forty years each. The corpus is encoded for speakers’ gender, social class (in the HISCLASS higher- and lower-class system)2 and role in the courtroom (victim, witness, defendant, lawyer or judge). Table 1 presents a summary of the word counts of the corpus. The speech proportions of the various groups are very skewed. Male participants produced about 84% of attributable words, whereas female speakers are represented by only 16%. Regarding courtroom speaker roles, witnesses contribute most speech (66%), followed by victims and defendants, while the professionals, lawyers and judges, are represented rather meagrely, with 6% and 2% respectively.

Table 1.  Word counts of the Old Bailey Corpus, version 2.0 (OBC 2.0)

                    1720–59    1760–99    1800–39    1840–79  1880–1913       Total
Subperiods        3,420,347  4,684,667  5,529,139  5,859,601  4,949,834  24,443,588
GENDER
  Females           662,385    732,948    860,608    932,755    691,050   3,879,746
  Males           2,579,819  3,839,168  4,518,760  4,724,540  4,184,195  19,846,482
CLASS
  Lower classes     564,538    973,895  1,281,435  1,203,219    920,810   4,943,897
  Higher classes    888,477  1,751,316  2,358,375  2,895,354  2,815,307  10,708,829
ROLE
  Judges            101,912    218,083    133,246     70,238      7,146     530,625
  Lawyers           243,700    598,032    353,129    117,168     14,941   1,326,970
  Defendants        209,993    347,587    349,242    237,600    377,674   1,522,096
  Victims           587,243    884,888  1,252,800    852,129    639,494   4,216,554
  Witnesses       1,401,308  1,895,277  2,912,865  4,370,239  3,857,501  14,437,190
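The percentages given before Table 1 are shares of attributable words rather than of the full 24.4 million: the gender rows sum to 23,726,228 words and the role rows to 22,033,435, presumably because not all speech could be attributed to a speaker with the relevant annotation. As a reader's check (not part of the original study), the following Python sketch recomputes the shares from the Total column; the dictionary and function names are illustrative only.

```python
# Totals from Table 1 (OBC 2.0), summed over all five subperiods.
GENDER = {"female": 3_879_746, "male": 19_846_482}
ROLE = {
    "judges": 530_625,
    "lawyers": 1_326_970,
    "defendants": 1_522_096,
    "victims": 4_216_554,
    "witnesses": 14_437_190,
}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of words per group, relative to the words attributable on that variable."""
    total = sum(counts.values())
    return {group: 100 * n / total for group, n in counts.items()}


if __name__ == "__main__":
    for label, table in (("gender", GENDER), ("role", ROLE)):
        for group, pct in shares(table).items():
            print(f"{label:6} {group:10} {pct:5.1f}%")
```

Rounded to whole numbers, the output agrees with the figures in the text (16% and 84% by gender; 66%, 6% and 2% for witnesses, lawyers and judges).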

The searches were conducted in the annotated version of the corpus and its accompanying concordancer. Thirteen items, partly inspired by Wierzbicka’s listing (see Section 2), were searched for, with attention paid to possible orthographical variants. The largest group concerns logical possibility, namely undoubtedly, indubitably (high possibility), probably, presumably, likely (medium), and conceivably (low). Evidently and apparently make up the evidential group, while supposedly, allegedly and reportedly represent distancing types. The last three, as well as indubitably, were not attested at all in the data. The hits for all types were screened in order to exclude unsuitable instances. Only genuinely spoken uses are of interest here, not instances of quoting from writing, as in (4). Non-adverbial uses, like the adjectival likely in (5), were also discarded. Among the adverbial instances, only those with relevant epistemic semantics (speaker induction or deduction) were retained. This excludes the emphatic use (with the meaning ‘definitely’) in (6), and the more literal meanings seen in (7) ‘by visual evidence’ and (8) ‘as appears’ (cf. Traugott 1989: 46–47). The semantic decisions were made by two independent raters; in cases of divergence (ca. 20% of cases) a final decision was reached by discussion.

2. As suggested by the corpus compilers (Huber et al. 2016: 9), the OBC’s HISCLASS scheme of 13 social classes is here simplified into two class denotations: ‘higher class’ for HISCLASS 1–5 (non-manual professions) and ‘lower class’ for HISCLASS 6–13 (manual professions).

(4) I quote from Caspar. [“]Marc has collected eight cases of so-called homicidal mania; there is however not one among them in which general mental disease did not indubitably exist”  (t18720108-117)3



(5) Do you think him likely to go out a robbing?  (t17540116-33)

(6) she has been a trustworthy servant – I would undoubtedly employ her again.  (t18341205-299)

(7) I can now evidently trace that the “y” has been added–the “0” is not so evident  (t18360713-1435)

(8) apparently by the documents the promissory note was given on 6th December, instead of in November,  (t18910504-431)

Response types like (3) above were included, as they are considered here to be cases of contextual ellipsis. This left 2,112 relevant epistemic instances, which were then further analysed with regard to syntactic position, scope, clause types and moves.

4. Results

This section will first present an overview of the general frequencies of items across time (Section 4.1), followed by a description of the syntactic behaviour of the items, which correlates with the functions they fulfil (Section 4.2). Finally, Section 4.3 will present the sociopragmatic distribution across the gender, social class and courtroom role of speakers.

3. OBC file references have the structure tYearMonthDay-identifier; that is, (4) is taken from a trial that took place on January 8, 1872.
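Because the references follow this fixed tYearMonthDay-identifier shape, the trial date, and with it the subperiod an example belongs to, can be read off mechanically. The following Python sketch assumes that the identifier is always numeric (as in every reference cited in this chapter) and uses the five subperiods of Table 1; the function names are hypothetical.

```python
import re
from datetime import date

# Assumed shape: "t" + YYYYMMDD + "-" + numeric identifier, e.g. "t18720108-117".
TRIAL_REF = re.compile(r"t(\d{4})(\d{2})(\d{2})-(\d+)")

# The five subperiods used in this chapter (first and last year inclusive).
SUBPERIODS = [
    (1720, 1759, "1720–59"),
    (1760, 1799, "1760–99"),
    (1800, 1839, "1800–39"),
    (1840, 1879, "1840–79"),
    (1880, 1913, "1880–1913"),
]


def parse_trial_ref(ref: str) -> tuple[date, str]:
    """Split an OBC trial reference into the trial date and the identifier."""
    m = TRIAL_REF.fullmatch(ref)
    if m is None:
        raise ValueError(f"not a trial reference: {ref!r}")
    year, month, day = (int(g) for g in m.group(1, 2, 3))
    return date(year, month, day), m.group(4)


def subperiod(year: int) -> str:
    """Return the label of the subperiod containing the given year."""
    for start, end, label in SUBPERIODS:
        if start <= year <= end:
            return label
    raise ValueError(f"{year} lies outside 1720–1913")


if __name__ == "__main__":
    trial_date, ident = parse_trial_ref("t18720108-117")
    print(trial_date.isoformat(), ident, subperiod(trial_date.year))
```

For example (4), the reference t18720108-117 yields the date 1872-01-08 and identifier 117, which places the trial in the 1840–79 subperiod.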




4.1 Overview of findings

Table 2 gives an overview of all occurrences of the search items. The three types probably, presumably and conceivably were found only in epistemic uses, while all the others exhibited many of the excluded uses illustrated in (4) to (8). Two extremely low-frequency items (presumably, conceivably) contrast with five items of sizeable frequency, which in descending order are dominant probably, apparently, likely, undoubtedly and evidently. With the exception of likely, this parallels Swan’s (1988) frequency ranking for Late Modern English (LModE).

Table 2.  Occurrences of epistemic adverbs (raw frequencies)

              1720–59  1760–99  1800–39  1840–79  1880–1913  Total epistemic  Ambiguous & other
apparently          0       18      109      181        251              559                130
conceivably         0        0        0        0          1                1                  0
evidently           0        4       15       50         77              146                 55
likely             12       33       87      148        139              419                908
presumably          0        0        0        0          6                6                  0
probably           16       70       83      194        411              774                  0
undoubtedly         6       72       27       38         64              207                 34
Total              34      197      321      611        949            2,112

The figures in Table 2 point to increasing use, and this impression is indeed corroborated by the normalized frequencies in Figure 1.

[Figure 1: frequencies per one million words (scale 0–90) of apparently, conceivably, evidently, likely, presumably, probably and undoubtedly, by subperiod from 1720–59 to 1880–1913]

Figure 1.  The development of epistemic adverbs in the OBC (per one million words)
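Figure 1 relates the raw counts of Table 2 to the subperiod word counts of Table 1. Assuming the usual normalization (raw count divided by the number of words in the subperiod, multiplied by one million; the chapter does not spell out the formula), the following Python sketch reproduces the scale of the figure: probably in 1880–1913, for instance, comes to about 83 per million words. The two near-zero items, conceivably and presumably, are left out for brevity, and the names are illustrative.

```python
# Words per subperiod (Table 1); the order must match the count lists below.
WORDS = {
    "1720–59": 3_420_347,
    "1760–99": 4_684_667,
    "1800–39": 5_529_139,
    "1840–79": 5_859_601,
    "1880–1913": 4_949_834,
}

# Raw epistemic counts per subperiod (Table 2).
RAW = {
    "apparently": [0, 18, 109, 181, 251],
    "evidently": [0, 4, 15, 50, 77],
    "likely": [12, 33, 87, 148, 139],
    "probably": [16, 70, 83, 194, 411],
    "undoubtedly": [6, 72, 27, 38, 64],
}


def per_million(count: int, words: int) -> float:
    """Normalized frequency: occurrences per one million words."""
    return count * 1_000_000 / words


def normalize(raw: dict[str, list[int]], words: dict[str, int]) -> dict[str, list[float]]:
    """Normalize each item's subperiod counts by the word count of that subperiod."""
    sizes = list(words.values())
    return {item: [per_million(c, w) for c, w in zip(counts, sizes)]
            for item, counts in raw.items()}


if __name__ == "__main__":
    for item, rates in normalize(RAW, WORDS).items():
        print(f"{item:12}", " ".join(f"{r:6.2f}" for r in rates))
```

On these figures every series rises from one subperiod to the next except undoubtedly, whose rate falls sharply in 1800–39 and stays below its 1760–99 level thereafter.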


All items show an uninterrupted rising pattern, except for undoubtedly, which wanes in the third and fourth subperiods. The inventory also increases, from three items at the beginning, via five in the following three subperiods, to seven items in the last subperiod. The very late occurrence (in the fifth subperiod) and the rarity of conceivably and presumably4 here are in line with Wierzbicka’s (2006) Literature Online results, but are still somewhat surprising given the Oxford English Dictionary (OED) attestations; this may point to an original preference of these items for written contexts. If we focus on the whole group of adverbs treated here and their users, we also see a clearly rising pattern (Figure 2). All social groups, male or female, lower or higher classes, adopt these items more frequently over time, which shows their increasing integration into common speech styles.

[Figure 2: frequencies per one million words of all epistemic adverbs for higher-class, lower-class, male and female speakers, by subperiod from 1720–59 to 1880–1913]

Figure 2.  The development of epistemic adverbs (per one million words)

4.2 Functional distributions

As mentioned above, epistemic adverbs may be used as sentence adverbs, focalizers or response items. As the first two types occur within sentences, they are combined in Figure 3 (n = 2,071). It shows that the majority of them have scope over the entire sentence or clause, that is, they are used as sentence adverbs. This means that speakers overwhelmingly modalize entire propositions, thus highlighting their certainty and commitment status.

4. There is one more occurrence of conceivably, from 1910, in the complete 120-million-word Old Bailey proceedings (cf. Hitchcock et al. n.d.), while searches for presumably still yield zero results there.




    ()

      

focalizer sentence scope

–  

–  

–  

–  

–  

Figure 3.  Scope of sentence-internal epistemic adverbs across subperiods (%)

In sentence-adverbial use (n = 1,647), the adverbs can occur in initial, medial or final positions, as the examples in (9), (10), and (11) respectively illustrate. Note that in (10 b, d, e) it is the (relative or complement) clause, not the entire sentence, that is modified.

(9) a. From the visitors’ book it appears that she had visits from her father, brother, sister, and friend. Apparently her husband did not visit her.  (t19130107-63, witness, m, higher)
    b. Q. Look at these scissars [the potential murder weapon, CC]. – Saul: Very likely it was done with something like this: perhaps the very same.  (t17601204-13, m, higher)

(10) a. he most likely sells to other people  (t18390513-1554, victim, m, lower)
     b. I do not know any Sergeant Nichols, who was presumably a sergeant belonging to his own regiment.  (t19060430-14, witness, m)
     c. Reichberg had evidently done some work there  (t18620407-428, witness, m, higher)
     d. when I told him he would probably be summonsed he made a complete laugh of it  (t18620616-689, witness, m, higher)
     e. he said the money had been taken from him: which was, very probably, the case.  (t18170521-140, victim, m, lower)

(11) a. He had that opportunity undoubtedly.  (t18000917-46, witness, m, higher)
     b. [about a cheque] November, 1893, for £100, which had been cashed and returned apparently  (t18950722-598, witness, m, higher)


As the number of examples provided suggests and Figure 4 shows, it is the medial position that dominates overall. The tendency toward initial position (Leech & Svartvik 1975: 202; Swan 1988: 46) is thus not borne out in this data, but Swan also noted a fairly common occurrence in medial slots (ibid.: 443).

[Figure 4: percentages of initial, medial and final position for apparently, evidently, likely, presumably, probably and undoubtedly with sentence scope]

Figure 4.  Positions of epistemic adverbs with sentence scope (%)

Conceivably and presumably are included here (and below) for the sake of completeness, but owing to their minimal frequencies they will be ignored in the discussion. Individual adverbs show different positional patterns: while evidently and apparently overwhelmingly prefer medial positions (close to 90%), this preference is less pronounced for undoubtedly and probably (around 70%). Only likely shows a more balanced distribution, with about 50% medial positioning. It is also the item with the highest share of initial placement (close to 50%), while the other adverbs range from less than 10% to almost 30% in initial position.5 Compared to present-day data, there are both similarities and differences: probably, likely, and undoubtedly are similar to Swan’s (1988: 452, 456, 464) present-day data with regard to initial position (11.5%, 28.8% and 40%), while apparently and evidently differ (33.3% and 41.6% placed initially). On the whole, Wierzbicka’s (2006) notion of advance signalling of non-assertiveness does not seem to play a major role in the present data, although it is present in cases like (12). It is doubtful, however, whether initial position generally highlights non-assertiveness in the courtroom data. The overall effect in (9b) above, for example, with two initial epistemics (very likely, perhaps) in sequence, is clearly intended to show commitment rather than hedging. Also, we find cases like (13), from the opening statement of a counsel of state in a coining offence trial, where the lawyer certainly intends his statement to be uncontroversial and definitive.

(12) Levi might have left the room for a few minutes; I can’t say that exactly–probably he might have gone down for a few minutes into the parlour.  (t18650227-339, witness, m, lower)

(13) Gentlemen, undoubtedly it is the Prerogative of the Crown to take Care of the current Coin of the Kingdom.  (t17451204-32, lawyer, m, higher)

5. For all items, initial positions decline over time.

The final position, though rare overall (less than 10%), is of interest. In spoken contexts such as these, it could of course signal an afterthought, that is, speakers realising while speaking that some explicit epistemic marking is necessary. This produces an unusual structure that makes the adverbs stand out more prominently, an effect enhanced by the fact that they occupy the position associated with new information. In (11) above, the effect seems to be to emphasize their (semantic) contribution, with undoubtedly increasing and apparently decreasing assertiveness and commitment. Interestingly, Simon-Vandenbergen and Aijmer (2007: 133–134) state that undoubtedly is never clause-final in Present-day English. Zooming in on the focalizer uses (the darker shade in Figure 3 above), we can look in more detail at how common this use is with individual items and which kinds of elements are modalized by these adverbs. The most common modification targets are noun phrases, as in (14), and prepositional phrases, cf. (15) (targets are underlined). Adjective phrases, for instance (16), are fairly common with apparently and evidently, reflecting the fact that described characteristics are based on perceivable evidence. Adverb phrases, as in (17), are moderately common only with probably. Numerals, such as (18), have been given a separate category as they may be important in court, but they have turned out to be infrequent.

(14) I do not know who prepared it, most likely Goodheart  (t18901215-91, witness, f)

(15) I paid him on the market on February 22, presumably in cash.  (t19070108-34, defendant, m, higher)

(16) I went up to him; he stood against the wall, apparently speechless  (t18261026-34, witness, f, lower)

(17) death was three days; the minimum period five or six hours, conceivably less;  (t19120227-48, witness, m, higher)

(18) that the child had been dead probably three or four days.  (t19111205-55, witness, m, higher)

144 Claudia Claridge

Examples (14) to (18) show that the focalized elements are usually short but nevertheless often important, for instance the time of death in the murder cases in (17) and (18). Like the final markers, they can take on the character of an afterthought, although they produce less marked structures. Their main function is to allow the speaker to modalize only part of the content, for example conveying in (17) and (18) that the fact of death is certain, but not its exact timing. They also allow the speaker to speculate openly and give various options, as in (17). The overall context of (15) illustrates the importance of epistemic marking in general: the defendant is making his statement under oath, where any kind of untruth or misunderstanding may lead to unwanted consequences. If he did not remember the form of payment precisely, the phrasing he used is a safe option. Figure 5 shows the modified targets for the individual items, both as raw frequencies and proportionally.

[Figure 5: bar chart showing, for apparently, conceivably, evidently, likely, presumably, probably and undoubtedly, the scope targets of focalizer uses: NP, PP, AdjP, AdvP and Num (numerals)]

Figure 5.  Scope targets of focalizer uses (raw figures [n = 424] and %)

Probably and apparently show higher raw frequencies than the other items in Figure 5, and this indeed correlates with relatively higher proportions of focalizer uses, which make up about 20% and 40% of all sentence-based uses, respectively, while focalizer uses tend to remain below 10% with the other items.

Epistemic adverbs employed as response items also occur in the data, albeit not very frequently (41 occurrences). They can stand in for an affirmative answer, as in (19) to (21), or function as denials in conjunction with not, as in (22), and they appear either completely on their own, as in (19) and (20), or with further explanation added, as in (21) and (22).

Chapter 9.  Epistemic adverbs in the Old Bailey Corpus 145



(19) If any body comes in to ask for a lock, you sell them one? – Undoubtedly.  (t17830430-42, witness, m, higher)
(20) The person who is employed as charge-taker is not likely to be much employed in sorting letters for the particular division, but if any letters are sent from the inland office into his division through mistake, it would come through his hands? – Most likely.  (t17950114-36, m, higher)
(21) Q. If I understand you right, if a person was to send 30 boxes to me, I should end with No. 30? – Most probably, and the next box perhaps would begin No. 31  (t18601126-52, witness, m, higher)
(22) Q. Are you able to tell whether the prisoner was one? A. Apparently not, by his height; the three men did not appear much taller than me; he is shorter than me.  (t18271206-43, witness, m, lower)

Response uses give the speaker (or possibly just the transcriber) the chance to save time (or space) by not repeating content already contained in the question. This use is found most frequently with undoubtedly (10%), followed by likely (3%); for all other items it is negligible.

4.3 Sociopragmatic patterns

Figure 2 above already gave a glimpse of the social embedding of epistemic adverbs, showing male and higher-ranking speakers to be leading in their use. Figure 6 presents the social-class and gender preferences regarding specific items. The most favoured item of male and higher-ranking speakers is probably, while for female and lower-ranking speakers the front-runner is likely, the only native item in the group.6 Higher-ranking speakers use the strongest, most assertive item, undoubtedly, more than any other speaker group, perhaps indicating a more confident speech style.

6. A reviewer raised the question of how perhaps (a foreign-native hybrid) and maybe (native) behave here in comparison with likely. The very infrequent maybe (22 occurrences) is clearly favoured by lower-class and male speakers. Perhaps (2,034 occurrences), which is more frequent than any of the items treated here, is used most by higher-class and male speakers. Thus, there is no common pattern for these three stylistically similar items. Both perhaps and maybe were excluded from this study because their semantics concern general contingency rather than speaker knowledge (cf. also Wierzbicka 2006: 248–251).


      

[Figure 6: bar chart of apparently, conceivably, evidently, likely, presumably, probably and undoubtedly for male, female, higher-class and lower-class speakers]

Figure 6.  Distribution of individual epistemic adverbs based on social characteristics (per one million words).

Even more important for the courtroom context are the speaker groups defined by their function in the trial, namely judges and lawyers on the professional side, and defendants, witnesses and victims on the lay side. These are presented in Figure 7. Focusing first on the overall totals for each speaker group (the rightmost set of bars in Figure 7), we see that witnesses use such adverbs the most.

            apparently  conceivably  evidently  likely  presumably  probably  undoubtedly  Total
judges             9.4            0        1.9     1.9           0      30.1         24.5   67.8
lawyers            6.8            0        2.3     3.8           0      22.6         21.8   57.3
defendants        10.5            0        1.3    19.7         0.6      29.6          1.3   63.1
victims           13.3            0        2.8    16.6         0.2      19.0          2.6   54.5
witnesses         31.8          0.1        8.6    19.9         0.3      39.4          9.3  109.4

Figure 7.  Distribution of epistemic adverbs according to functional courtroom speaker roles (per one million words)
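Figures 6 and 7 report normalized frequencies per one million words, so that speaker groups contributing different amounts of text can be compared. A minimal Python sketch of that standard calculation follows; the function names are hypothetical, and the raw count and word total in the usage lines are invented for illustration, not taken from the study. Because all values for one speaker group share the same denominator (that group's word count), the per-item values should add up to the group total, which the sketch checks against the judges' values in Figure 7, allowing for rounding to one decimal place.

```python
def per_million(count: int, corpus_words: int) -> float:
    """Normalize a raw frequency to occurrences per one million words."""
    if corpus_words <= 0:
        raise ValueError("corpus_words must be positive")
    return count * 1_000_000 / corpus_words


def totals_agree(item_values: list[float], reported_total: float,
                 tolerance: float = 0.35) -> bool:
    """Check that per-item normalized values add up to a reported group total.

    Each value is rounded to one decimal place, so with seven items the sum
    may drift from the reported total by up to 7 * 0.05 = 0.35.
    """
    return abs(sum(item_values) - reported_total) <= tolerance


# Judges in Figure 7, in the order apparently, conceivably, evidently,
# likely, presumably, probably, undoubtedly (per one million words).
judges = [9.4, 0.0, 1.9, 1.9, 0.0, 30.1, 24.5]

if __name__ == "__main__":
    # Hypothetical raw count and word total, for illustration only.
    print(per_million(50, 2_500_000))
    print(totals_agree(judges, 67.8))
```

The same check applied to the other rows of Figure 7 shows whether a reconstructed set of values is internally consistent.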




This is because their knowledge and commitment status are frequently in focus while they are being (cross-)examined. The usage levels of all other speaker groups are fairly similar. The fact that neither defendants nor victims pattern more closely with witnesses is of interest. Defendants are very rarely examined like witnesses; instead, especially in earlier corpus periods, they may act as their own ‘lawyer’, examining witnesses themselves. Although victims did play a witness role in the courtroom, they show much lower use than other witnesses. A tentative explanation may be that, being closer to the crime, they have first-hand and more certain knowledge and thus prefer unmodalized statements. Investigating all victim utterances for degrees of modalization is, however, beyond the scope of this study.

Focusing now on individual adverbs, we note that probably is the most common form for all groups, making it a core item. Its usefulness rests on its indicating medium possibility, which is both sufficient and inconspicuous in most contexts, and on its semantic neutrality (unlike, e.g., evidently). Witnesses stand out by using the two evidential types (evidently, apparently) more than other groups. These can be argued to indicate the closest link to direct experience, that is, explicitly expressing or at least still implying that something has indeed been directly perceived by the witness. The examples in (23) illustrate this clearly: the nature of the wound (23a), the size of the footmarks (23b), the looks and behaviour of the person (23c) and the flashing event (23d) all indicate the (material) evidence that functions as the basis of the assumption or the induction (cf. Chafe 1986: 270).

(23) a. Q. Could this wound have arisen from a fall? A. No – it was evidently a blow by a blunt instrument  (t18421128-56, witness, m)
     b. noticed some footmarks on the border in the garden, that were evidently the footmarks of a female  (t18701121-2, witness, m, higher)
     c. The second man, who I should say was evidently a German, looked behind him several times, …  (t19130107-79, witness, m, higher)
     d. Evidently something was in your hand, or it would not have flashed  (t19040725-615, witness, m, lower)

Examples (23a–c) were spoken by expert or professional witnesses, namely a medical doctor (23a) and policemen (23b–c), who are indeed prominent users of evidently, accounting for 26% and 29%, respectively, of all occurrences of the item. Like evidently, apparently also has clear connections to ‘appearance’ in the witnesses’ usage and thus to physical evidence. In (24a–c) something in the situation will have led the witness to make the subsequent assumption, usually with somewhat more inference than in the case of evidently; in (24b) this is even explicitly verbalized (see underlined passage). This is different in (24d): truthfulness does not necessarily have clear outward signs, so that this represents a more abstract epistemic use.


(24) a. still he continued to pursue us, apparently with a design to reach the coach door,  (t17741019-3, witness, m, higher)
     b. she came very fast; she apparently came from Mrs. Dimmock’s room; Dimmock’s room is on the ground floor, to the left; she seemed to turn from the left.  (t18090517-9, witness, f)
     c. with her head lying in a pool of blood; she was apparently unconscious  (t19060205-231, witness, f)
     d. he stated where he resided, he had a loan-book, and, apparently speaking the truth, he was discharged  (t18650227-335, witness, m, higher)

Overall, these items, with their link to material circumstances, serve the witness’s role of introducing evidence into the courtroom well. In contrast to Wierzbicka’s (2006: 279) characterisation of a noncommittal speaker attitude produced by apparently, this item and evidently seem to enhance credibility in the Old Bailey courtroom.

The professional participants, judges and lawyers, also use one item much more than the other speaker groups, namely the high-possibility type undoubtedly. Judges and lawyers use this form in procedural contexts and in the argumentation of their opening and closing statements. (25a–b) illustrate judges making decisions about the legal applicability of a course of action, with undoubtedly highlighting the seriousness of the situation in (25a) and marking the degree of commitment to the conclusion drawn in (25b).

(25) a. ‘Tis undoubtedly a very heinous Offence, and deserves a severe Punishment. But the greater the Crime is, the clearer the Proof ought to be – If the Letter is not prov’d, it cannot be read in Court. (t17340424-60, judge, m, higher)
     b. if any person sees them taken, it is not a capital offence, if Mrs. Cave saw them taken, most undoubtedly it is not within the act of parliament.  (t17820109-32, judge, m, higher)
     c. for it is not necessary that the hand only which gave the blow, should be considered as the murderer; because, if more people are concerned, aiding, abetting, and comforting; they all of them contribute to fortifying the mind of the man who gives the blow; all of them are included in the same degree of guilt; all of them in the eye of the law, and of common sense and morality too, all of them are undoubtedly murderers.  (t17900416-1, m, judge, higher)

Example (25c) is found in the judge’s closing address to the jury, where a legal argument is laid out for them and marked as a logical necessity by undoubtedly. Lawyers also use the form in addressing the jury: in the opening statement in (26a), to impress on them the gravity of the situation and ultimately to ask for leniency, and in the closing statement in (26b), to present a conclusion to them as inevitable.




(26) a. Gentlemen, undoubtedly when any body contemplates such a wretch as that is, of so tender years, and recollects that he stands at this moment, one may certainly state on the brink of eternity, to be rescued from inevitable ruin only by your verdict; the sensation of such a person must be undoubtedly very painful.  (t17910720-32, lawyer, m, higher)
     b. You cannot, as wise and reasonable men, form your judgment upon any one part of this transaction, as distinct from the rest; because, undoubtedly, all concur to support one proposition, that the prisoner was employed in a traiterous design to communicate intelligence to the enemy; and which he did communicate by the instruments, and in the particular manner which has been proved.  (t17810711-1, lawyer, m, higher)
     c. That letter disposes of the last point; with reference to the first, my friend has ingeniously mixed up the affidavit which might have been made with that which was in fact made; the licence was undoubtedly issued in pursuance of the defendant’s statement, falsely made, that he had the father’s consent;  (t18620407-382, lawyer, m, higher)

Example (26c) is found in the context of a larger discussion between legal professionals, with undoubtedly marking a conclusion drawn by the lawyer after a piece of evidence had been read out. Interestingly, undoubtedly is also found in the professionals’ interrogatives, as in (27), where it may be said to aim at eliciting an answer with a high degree of certainty. However, lower-possibility forms are also found in questions, as (28) shows.

(27) a. Are not Waltrond, Roger, Ratcliffe, and De la Motte, by this evidence, undoubtedly joined together in the transaction?  (t17810711-1, lawyer)7
     b. It is in the power of the sorters to conceal a letter undoubtedly?  (t17920912-85, lawyer)
(28) a. do you think that his Death might not probably have been prevented?  (t17320906-25, judge)
     b. You have lived there some time probably?  (t17870523-98, lawyer)
     c. The child had evidently been born without professional assistance?  (t18610408-344, judge)
     d. You have apparently regarded the female as the man’s wife?  (t18660709-652, judge)

7. This speaker is a prominent user of undoubtedly: all six of his uses occur in the argumentative context of his closing speech. Individual user profiles may thus be worth investigating in this field.


These cases are of special interest,8 as most epistemic adverbs have been considered inadmissible in questions and imperatives (Wierzbicka 2006: 249; Simon-Vandenbergen & Aijmer 2007: 286), exceptions being, among others, likely and conceivably. Such uses may nevertheless be acceptable under certain circumstances, as Swan (1988: 47–48) remarked. Example (27a) does not have interrogative force but is a rhetorical question used in a closing statement. In (28a), the adverb is found in the embedded clause, not the interrogative matrix clause; perhaps more importantly, the clause also contains a negative particle. Finally, (27b) and (28b–d) are questions but not interrogatives: they are conducive and confirmation-seeking. The adverb is geared either to elicit a fully committed answer (27b) or, through the mitigated form, to make it easier for the witness to go along with the questioner’s intended direction (28b–d).

5. Conclusion

Based on the OBC evidence and the set of items chosen here, epistemic adverbs can indeed be said to have established themselves in LModE as resources increasingly used by the speech community. Over time, more epistemic types are used in court and the frequencies of individual items rise. With regard to overall frequency, probably dominates the set, which must be due to its semantically fairly unspecific nature and its intermediate probability marking. All items are overwhelmingly used as sentence adverbs, thus modifying larger units or whole propositions. This usually goes along with clause-medial placement; initial position is infrequent and does not gain ground over time. All social groups show increasing usage, with higher-class males apparently leading the development. While individual social groups show certain preferences regarding types, the one item they agree on, albeit only with moderate frequency, is the native term likely.
Across the different speaker groups in the courtroom, probably is the most commonly used form. The legal professionals additionally exhibit a fondness for exploiting the high-probability type undoubtedly for argumentative and persuasive purposes. Witnesses furthermore show a preference for the evidential types evidently and apparently, thus in some sense highlighting their eyewitness status. The fact that witnesses stand out as the most frequent users of epistemic adverbs in the courtroom shows how important it was for them to qualify their statements appropriately, so as not to be pinned down to content they might not want to stand by completely. This attitude is nicely embodied in a quote containing an emphatic double probably: “I don’t remember who sent me to Eldred and Co., probably Mr. Smith, I only say probably” (t18901215-91, italics added). Thus, to a certain extent, the speaker groups in the courtroom exhibit distinctive epistemic voices, which mirror the ways in which they are involved in the trial.

8. There are 26 such occurrences (i.e. 14 per one million words), most commonly with probably (16), followed by undoubtedly (4), likely, evidently, and apparently (2 each).

References

Chafe, W. 1986. Evidentiality in English conversation and academic writing. In Evidentiality: The Linguistic Coding of Epistemology, W. Chafe & J. Nichols (eds), 261–272. Norwood NJ: Ablex.
DeLancey, S. 2001. The mirative and evidentiality. Journal of Pragmatics 33(3): 369–382.
Doyle, A. C. 1892. The Adventures of Sherlock Holmes. Accessed at Project Gutenberg, 10 April, 2019.
Emsley, C., Hitchcock, T. & Shoemaker, R. n.d. The proceedings – The value of the proceedings as a historical source. Old Bailey Proceedings Online, version 7.0 (9 June, 2019).
González-Álvarez, M. D. 1996. Epistemic disjuncts in Early Modern English. International Journal of Corpus Linguistics 1(2): 219–256.
Hitchcock, T., Shoemaker, R., Emsley, C., Howard, S. & McLaughlin, J. n.d. The Old Bailey Proceedings Online, 1674–1913, version 7.0. (24 March 2012).
Huber, M. 2007. The Old Bailey proceedings, 1674–1834. Evaluating and annotating a corpus of 18th- and 19th-century spoken English. In Annotating Variation and Change [Studies in Variation, Contacts and Change in English 1], A. Meurman-Solin & A. Nurmi (eds). Helsinki: University of Helsinki. (12 May 2020)
Huber, M. 2010. Trial proceedings as a source of spoken English: A critical evaluation based on the Proceedings of the Old Bailey, 1674–1913. Anglistentag 2009 Klagenfurt: Proceedings, 65–78. Trier: Wissenschaftlicher Verlag Trier.
Huber, M., Nissel, M., Maiwald, P. & Widlitzki, B. 2012. The Old Bailey Corpus. Spoken English in the 18th and 19th centuries.
Huber, M., Nissel, M. & Puga, K. 2016. The Old Bailey Corpus 2.0, 1720–1913, Manual. (12 May 2020).
Leech, G. & Svartvik, J. 1975. A Communicative Grammar of English. London: Longman.
Palmer, F. R. 2001. Mood and Modality, 2nd edn. Cambridge: CUP.
Simon-Vandenbergen, A.-M. & Aijmer, K. 2007. The Semantic Field of Modal Certainty. Berlin: Mouton de Gruyter.
Swan, T. 1988. Sentence Adverbials in English: A Synchronic and Diachronic Investigation. Oslo: Novus.
Swan, T. 1991. Adverbial shifts: Evidence from Norwegian and English. In Historical English Syntax, D. Kastovsky (ed.), 409–438. Berlin: Mouton de Gruyter.
Traugott, E. 1989. On the rise of epistemic meanings in English: An example of subjectification in semantic change. Language 65(1): 31–55.
Wierzbicka, A. 2006. English: Meaning and Culture. Oxford: OUP.

Chapter 10

Question strategies in the Old Bailey Corpus

Patricia Ronan

Technical University of Dortmund

This qualitative and quantitative pilot study investigates the use of question strategies of varying coerciveness in four different periods in the Old Bailey Corpus. Using Woodbury’s (1984) continuum of control, it asks which question strategies were used by which trial participants at what time in the late Early Modern and Late Modern English periods. The data stem from the Old Bailey Corpus 2.0 and are investigated manually. Results show that, compared with the defendants, who asked the questions in earlier periods, the legal practitioners asked more varied questions with broader scope. These drove the discourse of the court proceedings forward more successfully than the narrower questions asked by the defendants in the early trials under consideration.

Keywords: Late Modern English, Old Bailey Corpus, courtroom discourse, question strategies, continuum of control

1. Introduction

Present-day legal practitioners are constrained by legal guidelines as well as societal practices in their interaction with defendants and witnesses in the courtroom. However, there is a clear power structure in the courtroom, both in terms of who receives the right to speak and in terms of who determines topics (e.g. Gibbons 2003; Holt & Johnson 2010). This is even more obvious in the disadvantaged role that defendants played in the legal process in the pre-modern period, for example in the use of questions, which may have prevented defendants from defending themselves successfully. Previous research on legal interaction in Early Modern English includes Archer’s (2005) extensive study of courtroom questions and answers, which investigates courtroom interaction from 1640 to 1760. Trial data is particularly relevant for the study of historical discourse, as trials, together with witness depositions, “represent ‘authentic’ language use” (Culpeper & Kytö 2010: 60). The current study tracks the use of legal questions in the Old Bailey Corpus into the Late Modern English period to outline the use of more or less coercive question strategies in court. This pilot study shows a) how the participants in the legal process interacted at different stages represented in the Old Bailey Corpus, b) which question strategies were used by judges and other legal representatives, and c) how the legal interaction mirrors changes in courtroom practice. The data for the study stem from the Old Bailey Corpus 2.0 (Huber et al. 2016). This corpus of London courtroom exchanges comprises 24.4 million words recorded between 1720 and 1913. The data are culled from early, middle and late periods of the corpus by searching specifically for questions and by examining what kinds of questions were asked by whom. After this introduction, the theoretical background to the study is presented in Section 2, the data and methodology are explained in Section 3, and the findings are discussed in Section 4, before a conclusion is offered.

https://doi.org/10.1075/scl.97.10ron © 2020 John Benjamins Publishing Company

154 Patricia Ronan

2. Discourse in contemporary and earlier legal systems

2.1 Questions in the present-day courtroom

In present-day court cases, the prosecutors are opposed by the defendants and/or their legal representatives. Procedures are highly regulated, and the types of questions that can be asked are clearly delimited as well. Generally, the questions that counsel ask may not influence the witnesses, that is, they may not be leading questions. By contrast, in cross-examinations, in which the witnesses of the opposing, ‘hostile’ side are questioned, leading questions are allowed. The witnesses’ answers should be restricted to the truth, without partial replies and without incorrect or irrelevant information (Gibbons 2003: 108–109). At the earliest stage of the legal process, interviewers try to obtain as much information as possible about the situation, and questions of different types can be used for this. To extract broad overview information, open, information-seeking questions are used (Gibbons 2003: 96; Newbury & Johnson 2006: 224–225). Investigators particularly use wh-questions, as well as open invitations for narratives and either-or questions (Newbury & Johnson 2006: 225). The questions that leave respondents the most freedom in answering are broad requests for a narrative (see example 1). Wh-questions then narrow down the subject area of the reply, for example questions starting with why or how (example 2).

(1) Would you like to make any comment on that – that finding?  (Newbury & Johnson 2006: 225)

(2) You tell me why you needed to do that?  (Newbury & Johnson 2006: 227)

Chapter 10.  Question strategies in the Old Bailey Corpus 155



Woodbury (1984: 204–206) divides wh-questions into broad wh-questions, which allow an open range of answers (such as What did you do?), on the one hand, and narrow wh-questions, which ask for a specific variable and are thus more constraining (When/where did you go?), on the other. Gibbons (2003: 103) also presents statements with rising intonation, which he argues are neutral in terms of information control. Other authors, however, disagree (Woodbury 1984; Archer 2005: 79) and consider bare declarative statements that prompt the respondents to be more coercive. Information-seeking questions can also restrict the scope of answers. This holds for either/or questions (example 3), which offer a choice between two options (Newbury & Johnson 2006: 229).

(3) […] did you alter it before you left the surgery, […], or as soon as you got back did you cover up your tracks and start altering this lady’s medical records?  (Newbury & Johnson 2006: 229)

Once the interviewers have formed a theory of the events that have taken place, they set out to confirm their expectations by using confirmation-seeking questions (Newbury & Johnson 2006: 219). For this, grammatical yes-no questions can be used. Even higher control over the respondent is exercised by negative grammatical yes-no questions (Woodbury 1984: 204–206), such as Didn’t you just tell us that created some doubt? Further, highly controlling confirmation-seeking forms such as declaratives with a tag, tag questions, or bare declaratives are used (Newbury & Johnson 2006: 219). Declaratives with tags are similar to tag questions in that they consist of a declarative plus a question form, as in Mrs. Mellor’s body was buried on the 18th May 1998 at Highfield Cemetery, Stockport. Would you accept that from me? (Newbury & Johnson 2006: 220). Tag questions likewise consist of a declarative followed by a question tag, as in you were the person who administered that lady with the drug, aren’t you?, which expects a positive or negative response (loc. cit.: 221). According to Newbury and Johnson, the most coercive confirmation-seeking strategy is the use of bare declaratives, which try to force the interviewee either to confirm a proposition or to use an opposition strategy (example 4).

(4) Interviewer: The levels were such that this woman actually died from toxicity of morphine, not as you wrongly diagnosed – in plain speaking you murdered her.
    Interviewee: No.  (Newbury & Johnson 2006: 223)
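The continuum of control outlined in this section can be pictured as an ordinal scale on which each question type constrains the respondent more than the one before it. The Python sketch below encodes that ordering as an IntEnum so that coded questions could be compared or summarized. It is an illustration under stated assumptions, not the study’s coding scheme: the class and function names are hypothetical, the ranks are ordinal only, tag questions and declaratives with tags share one rank, and the slot for either/or questions between narrow wh-questions and yes-no questions is assumed, since the passage does not fix it. Placing bare declaratives at the top follows Woodbury, Archer and Newbury & Johnson rather than Gibbons, and averaging ranks, as mean_rank does, is only a rough simplification.

```python
from enum import IntEnum


class QuestionType(IntEnum):
    """Question types ranked from least to most controlling (ordinal only)."""
    NARRATIVE_REQUEST = 1  # broad invitation to narrate, as in example (1)
    BROAD_WH = 2           # What did you do?
    NARROW_WH = 3          # When/where did you go?
    EITHER_OR = 4          # choice between two options; placement assumed
    YES_NO = 5             # grammatical yes-no question
    NEGATIVE_YES_NO = 6    # Didn't you just tell us ...?
    TAGGED = 7             # declaratives with a tag and tag questions (grouped)
    BARE_DECLARATIVE = 8   # most coercive, per Newbury & Johnson (2006)


def most_coercive(types: list[QuestionType]) -> QuestionType:
    """Return the most controlling question type in a non-empty list."""
    return max(types)


def mean_rank(types: list[QuestionType]) -> float:
    """Average rank of a non-empty list of coded questions (rough summary only)."""
    if not types:
        raise ValueError("need at least one coded question")
    return sum(int(t) for t in types) / len(types)
```

Because IntEnum members compare as integers, a sequence of coded questions can be sorted or scanned for its most controlling member without any extra lookup table.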

While professional trial participants are likely to use question strategies expertly, inexperienced lay litigants are less successful in their use, which may have negative effects on the lay participants’ cases (Tkačuková 2010).


In her study of questions and answers in the early modern courtroom, Archer (2005: 132) uses the above subdivisions but distinguishes among the following 17 categories (Table 1).

Table 1.  Archer’s question types in the historical courtroom (2005: 32)

Question type               Definition and examples from Archer’s corpus
Wh-interrogative            Starting with a wh-element; What is your name, Sir?
Negative wh-                With a wh-element in a negative sentence; Why did you not name Coleman at that time?
Indirect wh-interrogative   With a reported wh-clause; I asked her why she had done it.
Polar interrogative         Questions that expect either yes or no answers; Have you any more to say?
Negatively framed polar     Yes-no questions with negative framing; shall I not know by what Law I am tried?
Indirect polar              Indirect yes-no questions; I humbly ask whether it was a reasonable thing […]
Indirect negative polar     Indirect negatively framed yes-no question; The Question is, Whether she has never owned and confessed to any Body, that […]
Disjunctive interrogative   Questions offering a choice between either-or elements; Are you Elder or Younger than he?
Indirect disjunctive        Indirect either-or questions; tell us whether it is your own handwriting, or not?
Declarative question        Declarative clause used as a question; And you don’t remember that your Father and Mother came to England in that time?
Tag question                Tag-question added to a declarative clause; You dwell there sometimes, don’t you?
Echo question               A question echoing something that has been said previously.
Elliptical question         Question involving an ellipsis; Well, Sir, the second Year?
Rhetorical question         Question not expecting an answer.
Question(s) in narrative    Questions appearing in speakers’ narratives; Says I, if you would have me, I will go to him, and desire him to come. When would you speak with him? At any time, says he, …
Multiple interrogatives     Utterances containing more than one question.
Problematic                 Question interpretable in more than one way.

Not all of Archer’s categories are relevant to the considerably smaller data set in this study. Thus, those categories that are only marginally represented in her data are not considered here.




The answers that questions elicit may be more or less appropriate in terms of the Gricean maxims of quality, quantity, relation and manner (cf. Archer 2005: 57–59). A respondent’s answer may fulfill all those requirements. If it is insincere, it violates the maxim of quality. Too much information, or indeed too little, violates the maxim of quantity. Contributions that are off the topic violate the maxim of relation. Confused or unstructured answers violate the maxim of manner. In addition, Archer identifies ambiguous examples, which reflect genuine uncertainties that lead to the maxims being flouted (loc. cit.: 58).

2.2 The situation in the Early Modern English courtroom

The earlier English courtroom was structured differently from the contemporary one. Archer (2005: 1; 2010: 185–186) shows that defendants in the early part of the Early Modern English period typically had to organize their own defence. At that time, the presiding judge would present evidence that tied the defendant to the crime, while the defendant tried to make it plausible that they were not in fact culpable (Archer 2010: 186). As the judge or magistrate tried to connect the defendant to the crime, they would also ask the questions aimed at determining the defendant’s guilt. While the state was represented by the attorney general and the solicitor general from the Tudor period onwards, and prosecution counsel became common from the 1720s onwards, defence counsel were only introduced later (Archer 2005: 85–90). We thus find that the prisoners also questioned the witnesses. Having to organize and conduct their defence without knowing what evidence would be presented against them would obviously have put the defendants at a grave disadvantage (Archer 2005: 89). Archer points out that the handling of evidence in criminal trials becomes a more important issue as the role of the defence lawyer stabilizes (Archer 2005: 94). Concerning the role of defence lawyers, Rama-Martínez (2013) investigates to what extent the stabilization of their role affects turn-taking in the English courtroom. She finds that during the 19th century, defendants use decreasing numbers of turn transitions, reporting moves and informing acts. Concerning the question types used in the Early Modern English courtroom, Archer (2005: 132) provides frequency ranks for the different question strategies used in court for the overall period of her investigation (1640–1760), as well as for the earliest (1640–1679), the middle (1680–1719) and the latest subperiod (1720–1760).
In the last subperiod of her study, which corresponds to the first decades investigated in the current study, Archer finds polar yes-no questions to be the most frequent question strategy, at 46.3% of the questions in her data from that period. Wh-questions account for 36.2% of her questions from that period. Negative polar (yes-no) questions (5.5%), declaratives (4.3%) and either-or questions (3.7%) show low frequencies. Indirect polar yes-no questions amount to 2.1% of her data. Archer further presents rhetorical questions (1.6% overall) and problematic cases (1.6%). Other question types account for 0.3% or less of her data.

3. Data and method

The purpose of this study is to give an overview of the development of coerciveness in question use from the Early to the Late Modern English courtroom. As Archer’s (2005) study covers the period from the mid-17th to the mid-18th century in great detail, my investigation starts in the early 18th century and extends to the latest period covered by the data in the Old Bailey Corpus. The Old Bailey Corpus contains 24.4 million words of text ranging from 1720 to 1913, comprising 637 proceedings with 518,000 utterances and an average of 1.2 million speech-related words per decade (Huber et al. 2016: 2). Each proceedings transcript covers a number of different cases heard on a given day. A manual investigation of this vast amount of data would have been impossible given the scope of the current study. Thus, dates roughly 50 years apart were chosen. For the start of the investigation, a year was selected that also formed part of the time span investigated by Archer (2005), so that the current results could be aligned with her findings. Here the year 1732 suggested itself. This is the first year attested in the Old Bailey Corpus 2.0 with a sizeable number of attestations of lawyers’ participation in the trials. With 274 references to lawyers, the first day of trial proceedings in this year, January 14, yielded by far the highest count in this year and in the early 1730s in general. This was followed by data points from the years 1780 and 1830.
In the late period covered by the corpus, a change in the recording of the proceedings appears to have taken place. From the 1870s onwards, questions were recorded more sporadically, and the emphasis was placed on witness statements and depositions instead. To still draw on a file with a high frequency of recorded lawyers’ questions, the last file chosen for analysis was that of October 24, 1870, the last one to show high numbers of lawyers’ questions. For each year studied except 1870, the first day of proceedings was investigated. Each of these days still contained many trials, for instance 169 on February 18, the first day in 1830. The data set was therefore further restricted by classifying only the first 100 questions from each chosen day. For 1870, the questions were taken from this late file with a large number of questions, bringing the total to 400 examples. An overview of the data is given in Table 2.

Chapter 10.  Question strategies in the Old Bailey Corpus 159



Table 2.  Old Bailey trial sources used for the study

Year   From trial       To trial        Total questions
1732   t17320114-1      t17320114-41    100
1780   t17800112-1      t17800112-4     100
1830   t18300218-1      t18300218-27    100
1870   t18701024-787    t18701024-835   100

All questions asked by legal practitioners (judges or lawyers), by defendants or, where relevant, by jury members were classified manually into the question categories presented in Section 2.1 above. Differences in attestation between the different trial participants and between the different periods were tested for statistical significance with chi-square contingency tests. The findings are presented in Section 4 below.

4. Results

This section provides an overview of the question strategies found in the investigated trial proceedings from the early and late 18th and 19th centuries. It focuses on changes in attestation across the different periods, compares the use of the strategies and discusses the changes involved.

4.1

Questions in the early eighteenth century

As mentioned in Section 3, the questions from the early eighteenth century are taken from the trials in the first sitting of the court in 1732. Particularly active questioning was recorded for trials t17320114-9 and t17320114-41. The 1730s are a particularly interesting period, as it is only during this time that defence lawyers become increasingly established in court (Archer 2005: 185–186). Correspondingly, at this early stage we find lawyers for the crown as well as the defendants and, to a lesser extent, judges asking questions at the trials. For the social class of participants, the Old Bailey Corpus makes an overall distinction between two groups (Huber et al. 2016: 9). Manual labourers are classed as lower class, and those in non-manual professions are classed as higher class. At the earliest stages of the trial proceedings, little information is given on class. The defendants for whom such information is available belong to the lower classes. Apart from representatives of the law, only three participants are marked as higher class: a midwife, a medical doctor and a town official, all of whom act as witnesses.


The most frequently asked question type overall is the yes-no question (46% of all questions; see Figure 1). This question type exercises strong control over the respondent’s answer. The lawyers use it significantly more frequently than the defendants (p = 0.003 according to a chi-square contingency test with Yates’ continuity correction, 1 df; small effect size, Cramér’s V = 0.30). As in contemporary courtrooms, the crown lawyers use yes-no questions to obtain confirmation or denial of the information given in the question (cf. Gibbons 2003: 104). This may lead either to bare yes-no answers, as in example (5), or to more extensive cooperative answers, as in (6).

(5) [Lawyer:] Did he take the Poker out himself? [Witness:] Yes.

(t17320114-41)



(6) [Lawyer:] Before the Pistol went off, did you hear any Noise in the Room like struggling? [Witness:] I heard a Noise in the Room just before like walking about, but not like struggling. Immediately after the Pistol went off, the Prisoner came out of the Room, and pulled the Door to, twice, to shut it; He look’d as white and as pale as Death, and came hastily out at the Street-Door.  (t17320114-41)

In example (6) it is notable that the lawyer leads the witness by suggesting that he might have heard a struggle. The witness provides a cooperative answer but does not follow this suggestion. By contrast, when the witness is uncooperative, a yes-no question may prove a dead end for a questioner who hopes for further information (7):

(7) [Prisoner:] If his Face had been towards the Window when he received the Wound could he have turned himself about afterwards? [Witness:] No.  (t17320114-41)

Thus, the yes-no question controls the type of answer the questionee provides, but it is also likely to be too coercive to elicit new information. For that purpose, narrow wh-questions are often used. They are intended to give the questioner, as well as the jury listening to the reply, detailed information on the circumstances of the crime (examples 8 and 9). The defendants use this question type more rarely, though not significantly so (p = 0.36 according to a chi-square contingency test with Yates’ continuity correction, 1 df; no clear effect size, Cramér’s V = 0.1).

(8) [Lawyer:] Where was Mr. Falkingham then? [Witness:] In the Country, for his Health. The Deceased was still very uneasy; he swore and made a Noise, and beat himself several Blows on the Breast, and was very angry. […]  (t17320114-41)




(9) [Lawyer:] Where is he now? [Witness:] In Derby 

(t17320114-41)

Narrow wh-questions ask for specific information and do not usually leave informants room to provide a broad narrative of their own. However, some questionees may flout the maxim of quantity here (as in example 8 above) and provide more information than was asked for. Wh-questions serve to confirm the facts of the event and to present the case to the jury. This requires good strategic planning. Legal practitioners are more likely to succeed at this than defendants (cf. Tkačuková 2010), especially defendants with little education. By contrast, the question type the defendants use most frequently in this sample is the negative grammatical question (examples 10 and 11), which anticipates a particular response (Archer 2005: 79). Overall, 19 (19%) of the questions in the data set are negative grammatical questions, and the majority of these, 14 (74%), are asked by defendants.

(10) [Prisoner:] Did not you take the Mallet out of our Vestry? Tis the very Thing that Alderman Parsons knocks with. And did not you borrow the Chissel of your Brother-in-Law? [Witness:] No, you brought them both to me  (t17320114-1)
(11) [Prisoner:] Did not you bid me send for a Smelling-Bottle, because she was in a Fit? [Witness:] Yes; and I bid you go for a Doctor; for tho’ I thought she was dead, I was not willing to trust to my own Judgment, because I have no great Skill in the Dead; but I told you that she did not stir, and in my Opinion never would any more.  (t17320114-9)

In these questions, the defendants clearly, albeit unsuccessfully, try to show that they are innocent of the crime they are accused of. Of the defendants’ questions, 33% are negative grammatical questions. By contrast, negative grammatical questions account for less than 10% of the questions asked by lawyers. The difference is statistically significant (p = 0.01 according to a chi-square contingency test with Yates’ continuity correction, 1 df; small effect size, Cramér’s V = 0.26). The defendants also seem to use the either-or question more frequently than the legal practitioners do. This question type allows only the selection of one of two answers (Gibbons 2003: 104; Archer 2005: 79). However, the difference in use between defendants and lawyers is not statistically significant.1 Where it is used, the questioners may already know the answer, or may try to get the answer that they want. The questionee may collaborate (example 12), or may not be overtly cooperative but exact (example 13).

(12) [Lawyer:] Was he standing or sitting? [Witness:] He sat in a Chair, and taking the Poker, he said, I’ll rip my self up with this. His Coat was open, and I think he run the the Poker against his Waistcoat.  (t17320114-9)
(13) [Prisoner:] Was you or Mr. Furnell nearest my Wife when she fell? [Witness:] Mr. Furnell and his Friend were coming by before she fell out. I begg’d ‘em for God’s Sake to stop, for here was a Man that would murder his Wife. And one of ‘em said, What have we to do between a Man and his Wife? But for God’s Sake stay, says I, for fear he should kill his ‘Prentice too, who is my Sister’s Son.  (t17320114-9)

1. p = 0.12 at 1 df, according to a chi-square contingency test with Yates’ continuity correction, but due to the low number of attestations this result is not entirely reliable.

Other question strategies are little used in the early data set. Defendants occasionally prompt witnesses with declarative statements (example 14). Broad wh-questions to elicit narrative are used by defendants and legal practitioners (example 15).

(14) [Defendant:] You could hear things very plain! [Witness:] It was a very still Night, I was a Stranger to both you and your Wife.  (t17320114-9)
(15) [Defendant:] How do you know but it was your Mistress that came up Stairs? for you said you thought it was me, only because you heard my Voice after I was come up. [Witness:] I thought it was not possible that my Mistress shou’d come up Stairs so readily, after the Blows that she received, and the Groans and Cries that she made.  (t17320114-9)
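The chi-square contingency tests reported in this section (Yates’ continuity correction, 1 df) and the accompanying Cramér’s V values can be recomputed from the raw counts in Figure 1. The following Python sketch illustrates this and is not the author’s own procedure. It rests on three assumptions. First, each test compares the lawyers’ 53 questions with the defendants’ 43, leaving out the judges. Second, each table contrasts one strategy with all other questions. Third, Cramér’s V is derived from the Yates-corrected statistic. The helper name `yates_chi2_2x2` is hypothetical.

```python
import math


def yates_chi2_2x2(a, b, c, d):
    """Chi-square test of independence for the 2x2 table [[a, b], [c, d]]
    with Yates' continuity correction (1 degree of freedom).

    Returns (chi2, p, v). Assumption: Cramer's V is computed from the
    corrected statistic, which is what reproduces the reported values.
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column total must be non-zero")
    # Yates: shrink |ad - bc| by n/2, but never below zero.
    diff = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * diff ** 2 / margins
    # With 1 df, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    # For a 2x2 table, Cramer's V reduces to sqrt(chi2 / n).
    v = math.sqrt(chi2 / n)
    return chi2, p, v


# Raw counts read from Figure 1 (1732); judges' questions are left out.
LAWYERS_TOTAL, DEFENDANTS_TOTAL = 53, 43
counts = {  # strategy: (lawyers, defendants)
    "yes/no": (32, 12),
    "wh-narrow": (11, 5),
    "neg. grammatical": (5, 14),
    "either/or": (1, 5),
}

for strategy, (law, dfd) in counts.items():
    # Each table contrasts "this strategy" with "all other questions".
    chi2, p, v = yates_chi2_2x2(law, LAWYERS_TOTAL - law,
                                dfd, DEFENDANTS_TOTAL - dfd)
    print(f"{strategy:<17} chi2 = {chi2:5.2f}  p = {p:.3f}  V = {v:.2f}")
```

Under these assumptions, the printed p and V values match those reported above to the precision given. Computing V from the uncorrected statistic would give slightly larger values. SciPy’s `scipy.stats.chi2_contingency` applies the same continuity correction to 2x2 tables by default.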

Negative declaratives and tagged declaratives are absent from this data set, so the most coercive question strategies are hardly in evidence, if at all. Figure 1 gives an overview of the raw frequencies of the question strategies used by the various participants in the first 100 questions in the court trials of this early period. For the lawyers, yes-no questions are the most frequently used question strategy in this early data set, followed by narrow wh-questions. This observation ties in well with Archer’s (2005: 138) finding for her 1720–1760 data set, where yes-no questions are the most frequent, followed by wh-questions. Declaratives account for around 5% of the questions in her data set, and alternative, or disjunctive (either-or), questions for around 3%. Even though the results are not an exact match, they are broadly comparable and may serve for cross-validation. Archer (2005: 243) argues that in the early modern period, in contrast to the present day, defendants acted as initiators of actions in the courtroom. Even though they tried to obtain responses from the witnesses, they often failed in these attempts. Archer attributes this partly to the defendants’ lack of coercive power and partly to their lack of knowledge and expertise in these matters. In the data here, the defendants also show a lack of narrative control through their frequent use of negative questions. These questions do not achieve their communicative goal of eliciting support for the defendants’ positions.

Figure 1.  Question strategies used in the 1732 trials investigated

Question strategy     Judges  Defendants  Lawyers
prompt                     0           3        0
wh-broad                   0           3        4
wh-narrow                  2           5       11
either/or                  0           5        1
yes/no                     2          12       32
neg. grammatical           0          14        5
declarative                0           1        0
neg. declarative           0           0        0
tagged declarative         0           0        0

4.2

Questions in the late eighteenth century

The most notable difference between the data sets from 1732 and 1780 is that in 1780 only a fraction of the questions are asked by defendants. Instead, we now increasingly find lawyers acting for the prisoners and asking questions (example 16). However, some defendants still seem to have defended themselves, as example (17) shows. This seems likely to apply particularly to those prisoners who could not afford to employ a lawyer. It is noteworthy that, where information on social class is given, all the defendants in the trials under investigation belonged to the lower class. In the late-18th-century data set investigated here, witnesses from the higher classes are represented only by one aggrieved party and by policemen.

(16) [Counsel for the prisoner:] You said Herring thought that he had the money in his pocket the next morning?  (t17800112-4)


(17) [Prisoner:] Whether my mistress did not tell me that night, that if I could discover the persons who had been guilty of the offence, and they would restore the money, she would give them twenty pounds, and would not prosecute them?  (t17800112-3)

Of the 100 questions investigated for 1780, only one was asked by the defendant; all others were asked by lawyers. The question strategies that the lawyers used in this subset are very similar to those in the earlier period. The majority of questions are yes-no questions (52%), again followed by narrow wh-questions. A slight, but statistically non-significant, increase is observable in the use of broad wh-questions (9%) (example 18).

(18) [Lawyer:] What passed at Sir John Fielding’s? [Witness:] We were before Sir John Fielding on the Tuesday afternoon; the prisoner was then questioned as to the fact; his account of it was that he had been out into the yard on the Sunday evening to a pantry, that returning from that pantry he saw a man coming from the back door of the house. […]  (t17800112-3)

The broad wh-question in example (18) serves as an introductory question for a new witness, who answers with a free narrative. Declaratives used as questions show a notable increase and are used more frequently than in the 1732 data (example 19). The difference is statistically significant (p = 0.03 according to a chi-square contingency test with Yates’ continuity correction, 1 df; small effect size, Cramér’s V = 0.18).

(19) [Lawyer:] There was a writing table in the study which was broke open this night? [Aggrieved:] Yes.  (t17800112-2)

In contrast to broad wh-questions, the declarative anticipates a certain response, be it positive or negative. Archer (2005: 79) sees it as exercising rather high control. Gibbons (2003: 179), by contrast, considers it neutral in terms of information control, as it can also be used as an opening to obtain new information. Here, nine (82%) of the declaratives elicit a short response, typically yes or no; the remaining two elicit a more extended narrative (example 20).

(20) [Lawyer:] The linen was not in her apron? [Witness:] No, it was on the ground; when she got up I saw the linen; I went round the table to meet her at the door, and asked her what she wanted; she said a pint of purl, and if she could not get it there, she desired I would let her go where she could get it; I said she should not go till my master or mistress came down stairs; I called out to my mistress who came down and lifted up her hat and looked at her; the prisoner told her she wanted a pint of purl; we asked her if that looked like a public-house; she said she did not know.  (t17800112-1)



However, the strictly cooperative response to the question would have been No, it was on the ground. What follows this clause is not relevant to the question and violates the maxim of quantity. It is also noteworthy that this data set contains, albeit in low numbers, the more coercive question strategies. These are negative declaratives, as in example (21; three cases), and tagged declaratives, as in example (22; one case).

(21) [Lawyer:] Nor did you hear her say the money is all my own? [Witness:] No.  (t17800112-4)
(22) [Lawyer:] You are a watchman of the parish are you not? [Witness:] Yes.

(t17800112-2)

An overview of the different question strategies used in the first 100 questions in the investigated trials from 1780 is given in Figure 2.

Figure 2.  Question strategies used in the 1780 trials investigated

Question strategy     Judges  Defendants  Lawyers
prompt                     0           0        0
wh-broad                   0           0        9
wh-narrow                  0           0       17
either/or                  0           0        2
yes/no                     0           1       51
neg. grammatical           0           0        5
declarative                0           0       11
neg. declarative           0           0        3
tagged declarative         0           0        1


4.3

Questions in the early nineteenth century

From the early 19th century, the proceedings from the first court sitting in 1830 have been investigated. In this data set, defendants are still involved: they ask questions in three of the 27 trials investigated for this period. Class information is somewhat more extensive than in the earlier data sets. While the victims belong to both higher and lower classes, all the defendants (who do not necessarily ask questions) still belong to the lower classes. Various members of the higher classes are also recorded as witnesses. These include members of the police force and other witnesses, who are often tradespeople and businesspeople. The most notable development since the 1780 data set is a further increase in the lawyers’ use of a variety of more coercive question strategies, though this increase is not yet statistically significant. This holds for negative grammatical questions (13%) as well as for different declarative sentence types used in questioning. Ordinary declaratives are very well represented (22%), as are negative declaratives (5%) and tagged declaratives (1%). Many of the ordinary declaratives here do indeed prompt further narrative (examples 23 and 24; cf. Gibbons 2003: 104). Of the 20 examples, 14 (70%) elicit narratives, and only 6 (30%) elicit short confirmations or denials.

(23) [Lawyer:] I believe there is no water-mark on it? [Aggrieved:] There is on some of it – here is, “Simmons, 1827;” he is an extensive maker – it is the same make, and the same stamp as mine; I always have mine stamped “Superfine Bath.”  (t18300218-2)
(24) [Lawyer:] You paid more attention to him than to the prisoner? [Witness:] He came up and spoke to me; I observed him the most – I do not think I could swear to him if I saw him, but I saw the prisoner at the watch-house, and saw him running away; he could have got away if he liked while I was going for the officer – I am turned thirteen years old.  (t18300218-2)

Examples (23) and (24) illustrate that a simple affirmation or negation would not be enough for the questionee to fulfil the Gricean maxim of quantity and thus to be cooperative here. A detailed narrative answer is needed. Negative declaratives are used particularly to challenge statements (five examples, such as example 25), but also to clarify points. A declarative sentence with a tag question may be used to cast doubt on statements, and negatively tagged declaratives are considered highly coercive (Gibbons 2003: 102). In example (26), however, the witness resists the effort.




(25) [Lawyer:] You would not have claimed it if you had seen it on the table of an inn ten miles off? [Aggrieved:] No, certainly – I might have bought it six months before; I have taken stock since, and have not sold four quires and a half of this altogether; I value it at 1s.  (t18300218-2)
(26) [Prisoner:] I could have got away, could not I? [Witness:] Not from me – you made no resistance.

(t18300218-17)

In contrast to the rise in declarative strategies, the use of yes-no questions has decreased to 40% in the data from this period, though not significantly. Yes-no questions continue to be addressed to both aggrieved parties and witnesses, and their function continues to be confirmation-seeking (Gibbons 2003: 104). As before, the defendants in this period use question types that trigger narrow answers: narrow wh-questions, negative grammatical questions and negative declaratives. The judges, too, use narrow questions, possibly to clarify points in the statements, as does the single narrow wh-question asked by the jury. The overall use of question strategies in the first 100 questions in the 1830 proceedings is given in Figure 3. The lawyers in the data from 1830 use comparatively fewer yes-no questions, but significantly more declaratives of all types than in the two earlier periods (p […]

[…] Agreement > Reassurance Unstressed

The movement from the situation on the left in (5) to that on the right would appear to have involved a shift from lexical to contextual, pragmatic meaning for sure. This is much along the lines discussed by Aijmer (2012: 202–203) in her assessment of the views in Norén and Linell (2007). In this development one can recognize a progression in the hearer’s vantage point in a discourse, as perceived by the hearer. It goes from confirmation of knowledge to pointing out agreement and finally to offering reassurance to the hearer regarding issues which might cause doubt. To substantiate this interpretation, a variety of examples from the author’s own data collections are given here (see 6a–f). They illustrate the hearer perspective in the use of sure.6 Note that the use of PM sure in tag questions (6f) is directly related to the appeal to agreement with the speaker or to the search for confirmation of shared knowledge; see (6c) and (6a) respectively.

(6)
a. Confirmation of shared knowledge:
   Sure there wasn’t any cheap flights then.  (DER, M60+)
   Sure there’s no children around anymore on the streets much  (WER, F85+)
b. Affirming new information:
   Sure we all have to pay these fierce Euro prices, don’t we now?  (WER, M50+)
c. Appeals to agreement with speaker:
   And they calls them small, sure what can you do?  (WER, F45+)
   Sure we all know that!  (WER, F55+)
d. Inevitability of situation:
   Sure that the way it is.  (WER, F85+)
   Sure we all have to go some time.  (WER, F85+)
e. Reassurance (sharing new information):
   Sure they was only trying to help, that’s all.  (DER, M60+)
   Sure you’re grand where you are.  (WER, F85+)
f. Appeal to agreement:
   It’s not worth your while, sure it isn’t?  (WER, F55+)

5. The pragmatic element of reassurance in (4a+b) is further reinforced in the Irish use of grand (see Hickey 2017 for a full discussion). In his brief remarks on PM sure, Taniguchi (1956: 40) interestingly observes that sure “always anticipates agreement on the part of the listener”. He would appear to have had an intuition about the use of PM sure in Irish English literature. However, Taniguchi did not pursue the matter further.
6. The abbreviations in the examples refer to informant data collected for Hickey (2007). DER = ‘Dublin English Recordings’; WER = ‘Waterford English Recordings’; M = ‘male’; F = ‘female’; the digits refer to the approximate age of the informant.

Chapter 11.  Sure in Irish English: The diachrony of a pragmatic marker 177

5. Pragmatic marker sure: The question of vintage

As noted by Amador-Moreno and McCafferty (2015: 282), quoting Blake (1981: 15), PM sure was already regarded by the nineteenth century as a strong signal of Irishness and was used as such in English literature. But one of the earliest uses of PM sure would appear to be from an English author without any obvious Irish connection. It occurs in a letter from Winefrid Thimelby to Herbert Aston, probably written in 1659 (Clifford ed. 1815). The letter is contained in the Corpus of Early English Correspondence compiled by Terttu Nevalainen and Helena Raumolin-Brunberg, University of Helsinki. It contains an example of sure which appears to appeal to shared knowledge, as does the later Irish use; see (7).

178 Raymond Hickey

(7) Sure you thincke me so hardned by affliction, that I have lost both sence of ill.

As the seventeenth century proceeded, documented instances of PM sure increased in number. Compare the following two instances from the play The Relapse, or, Virtue in Danger, being the Sequel of The Fool in Fashion. This comedy by Sir John Vanbrugh (1664–1726) was performed in the Theatre Royal in Drury Lane and printed in 1697.

(8)
a. Sure none has deserved more to be damned than I.
b. Sure never no body was us’d as I am.

A Restoration playwright of special interest in the present context is Thomas Shadwell (?1642–1692). He wrote a play entitled The Lancashire Witches, and Tegue o Divelly the Irish Priest, produced in 1681 and inspired by the animosity generated against Catholics by the ‘Popish Plot’.7 The Irish priest referred to in the title is supposed to exorcise the witches of the play, who are tormenting the English aristocracy. His speech offers renderings of what Shadwell conceived of as Irish English.8 There are 29 instances of sure in The Lancashire Witches, which has a total of 31,961 words. This yields a value of 9.07 for n/10,000;9 ten of these instances would seem to allow an interpretation as a pragmatic marker.

(9)
a. Thou saist right, sure the world would be almost depopulated.
b. sure they would never trust this Fool.
c. sure his Patron knows him not.
d. sure he is bewitched.
e. sure this is some mistake, you told me you were willing to marry
f. sure I was blind, she is a beauty
g. sure I am bewitch’d.
h. sure this is roguery, and Confederacy.
i. hah! here’s nobody, sure all’s clear now!
j. I tremble to think on’t; sure the surprise the Ladies were in before, has frightned ‘em from attempting again.

7. This was fabricated in 1678 by Titus Oates. He claimed that there was a plot to murder Charles II and place the Catholic James on the throne, thus re-introducing Catholicism into reformed England.
8. Shadwell was an Englishman born in Norfolk. With the restoration of Charles II to the throne in 1660, his father, John Shadwell, was appointed Recorder of Galway. While this fact does not seem to have affected his son’s life, it may have led to his gaining experience of Irish English accents.
9. This statistic is given throughout this chapter and is a means of normalising the occurrences in texts per 10,000 words. It allows one to compare occurrences using a fixed number of words and hence makes it easier to appreciate the relative number of instances per text. The figures for n/10,000 are given with two decimal places.
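Footnote 9 defines the n/10,000 measure. The raw count is divided by the word total of the text and multiplied by 10,000. The following minimal Python sketch applies this to counts and word totals cited in this chapter. The helper name `per_10k` is hypothetical.

```python
def per_10k(count, total_words):
    """Normalise a raw count to occurrences per 10,000 words (n/10,000)."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return count / total_words * 10_000


# (text and item, raw count, word total), as cited in this chapter
cited = [
    ("The Lancashire Witches, sure", 29, 31_961),
    ("Castle Rackrent, sure", 54, 26_826),
    ("Castle Rackrent, to be sure", 19, 26_826),
    ("The Playboy of the Western World, surely", 33, 21_539),
    ("The Workhouse Ward, sure", 9, 3_611),
]

for label, n, words in cited:
    # Two decimal places, as in footnote 9.
    print(f"{label}: {per_10k(n, words):.2f}")
```

Rounded to two decimal places, these results agree with the n/10,000 values given in the chapter for the same texts.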

During the Restoration period, plays by Irish authors or by English writers with Irish connections were common, especially comedies and satires. This literary orientation triggered the use of PM sure. William Congreve (1670–1729) has at least five clear instances of PM sure in his play The Way of the World (1700).

(10)
a. Sure never anything was so Unbred as that odious Man!
b. Sure I was born with budding Antlers like a young Satyr …
c. That hurts none here, sure here are none of those.
d. And sure he must have more than mortal Skill …
e. Well, sure if I should live to be rid of my Wife, I should be a miserable Man.

The playwright George Farquhar (1678–1707) also has cases of PM sure, for instance the following four from The Beaux’ Stratagem (1707).

(11)
a. Sure he hears me not, and I could almost wish he – did not.
b. Sure I have passed the gulf of silent death.
c. Sure if I gave him an opportunity he durst not offer it.
d. Sure I have had the dream of some poor mariner …

Throughout the eighteenth century, comedies by Irish writers were in vogue. The most popular of these were probably by Richard Brinsley Sheridan (1751–1816). His play The School for Scandal (1777) has instances of sure which appear to be transitional between emphatic and confirmation/agreement uses.

(12)
a. Sure Lady Sneerwell I am the greatest Sufferer.
b. Sure never were seen two such beautiful Ponies. Other Horses are Clowns.
c. Sure I must know better than you whether He’s come or not.
d. Sure Fortune never play’d a man of my Policy – such a Trick – before.

By the middle of the nineteenth century, the use of PM sure was clearly established. The popular Dublin writer of comedies Dion Boucicault (1820–1890) has instances of PM sure that are clearly recognizable in sentence-initial position.

(13)
a. Sure I do be ashamed, sir. I do be afraid to go near some girls …  (Dion Boucicault, Arrah na Pogue, 1864)
b. Sure he does be always telling me my heart is too near my mouth.  (Dion Boucicault, The Shaughraun, 1875)


6. Pragmatic marker sure and Irishness

In his 1981 monograph on non-standard language in English literature, Norman Blake notes the use of PM sure in the portrayal of Irish characters (Blake 1981: 135). He sees it as a key feature used by writers presenting such figures. To test the hypothesis that PM sure was indexical of Irishness, Maria Edgeworth’s Castle Rackrent was taken as a starting point. This work is commonly regarded as the first regional novel in English literature and specifically aims at portraying Anglo-Irish life around 1800. There are 54 instances of sure in the novel, which has a total of 26,826 words, that is, 20.13 per 10,000 words. Of these instances, 19 occur in the phrase ‘to be sure’ (7.08 per 10,000 words). A full 13 of the 54 occurrences (4.84 per 10,000 words) are pragmatic markers, that is, instances of unstressed sure as discussed above. There are only three occurrences of surely, as in Something has surely happened, thought I, which illustrate the emphatic use of the adverb.

(14)
a. ‘Sure I could not get the glazier, Ma’am,’ says I.
b. ‘Sure can’t you sell, though at a loss?
c. ‘Sure it’s time for me, (says she) …
d. ‘Sure you wouldn’t refuse to be my lady Rackrent …
e. … sure I was as careful as possible all the time you were away …
f. … sure you can sell, and I’ve a purchaser ready for you,’ says Jason.
g. ‘Oh, murder, Jason! sure you won’t put this in?
h. … sure I might without any great trouble have the satisfaction of seeing a bit of my own funeral.
i. … sure I remember you very well – but you’re greatly altered, Judy.
j. ‘You can’t see him yet, (says I) sure he is not awake.’
k. … hark, sure Sir Condy is drinking her health.
l. – sure what good is the car and no horse to draw it?

The frequency of occurrence in Castle Rackrent confirms the use of PM sure as a distinctly Irish English feature. If this remained true of representations of Irish English throughout the nineteenth and into the twentieth century, then one would expect a high occurrence in the works of playwrights like Lady Augusta Gregory (1852–1932), John Millington Synge (1871–1909) and Sean O’Casey (1880–1964), as well as in those of the novelist James Joyce (1882–1941). In Synge’s most famous play, The Playboy of the Western World, there are 33 occurrences of surely (n/10,000 = 15.32; word total = 21,539) but only two of sure, only one of which functions as a pragmatic marker: Sure he cannot hurt you if you keep your distance from his teeth alone. There are even fewer instances in his other plays: The Well of the Saints contains one, while The Tinker’s Wedding has none. This very low occurrence across Synge’s plays is unexpected, and two possible reasons can be offered. The first is that PM sure was rare in the English spoken in the west of

Chapter 11.  Sure in Irish English: The diachrony of a pragmatic marker 181

Ireland by vernacular speakers, the group whose language Synge strove to represent. The second is simply that PM sure did not register with Synge, and hence he did not make use of it in his representations of vernacular speech from rural Ireland. The second reason gains plausibility when one considers the work of Lady Gregory. Her plays, written in an idiom which she regarded as typical of language shifters from Irish in the west of the country, show a high incidence of sure. For instance, The Workhouse Ward (1907) is a short one-act play of 3,611 words with nine instances of sure (n/10,000 = 24.92), five of which are pragmatic markers (15a–e).

(15) a. Sure amn’t I your sister, Honor McInerney that was …
     b. Sure we must go through our crosses.
     c. Sure you could be keeping the fire in, and stirring the pot …
     d. Sure his anger rises fast and goes away like the wind.
     e. Sure I am saying nothing at all to displease you.

Sean O’Casey represented vernacular Dublin English in his plays, written in the 1920s. In one of these, Juno and the Paycock,10 there are 22 instances of sure (n/10,000 = 9.69; word total = 22,691), but only three of them are pragmatic markers. In another play, The Plough and the Stars, there are 14 instances of sure, three of which are pragmatic markers. A third play, The Shadow of a Gunman, has 18 instances (n/10,000 = 10.53; word total = 17,088), four of which are pragmatic markers. Turning to the novelist James Joyce, one finds instances of both emphatic sure and PM sure. The latter type is found in his earlier novel, A Portrait of the Artist as a Young Man (1916), confirming that the feature was already well established before Joyce started writing (16a+b).

(16) a. Ulysses (1922) – emphatic sure: I’m sure it’ll be grand if I can only get in with a handsome young poet at my age.  (Molly Bloom’s soliloquy)
     b. A Portrait of the Artist as a Young Man (1916) – PM sure, affirming new information adding to shared knowledge: … and sure we thought we were grand fellows because we had pipes stuck in the corners of our mouths.

The conclusion from the attestations in Sections 5 and 6 is that PM sure was well established by the beginning of the nineteenth century and would appear to have been used to signal Irishness in regional plays and novels, albeit to an extent which varied by author depending on how salient PM sure was for them, either consciously or unconsciously.

10. This play is one of the Dublin trilogy dealing with events around the struggle for independence in Ireland after 1916.


7. Possible transfer from Irish

The development of Irish English over the past three centuries or more has taken place under the influence of Irish, since the majority of Ireland’s population were Irish-speaking and shifted to English as their first language, particularly in the course of the nineteenth century. By 1900, over 90% of the Irish population no longer spoke the heritage language natively. Given this situation, it is worth considering whether the specifically Irish use of PM sure could be the result of transfer during the language shift process. When considering this possibility, one needs to recall that Irish is a VSO language, and a sentence-initial element would require the finite verb of the sentence to be placed in the relative form at the head of a relative clause after the word meaning ‘sure’. The sentence in example (17) shows an instance of such usage.

(17) Cinnte go bhfuil mé ag dul ann amárach.
     [sure that is-rel me at going there tomorrow]
     ‘Sure, I am going there tomorrow.’

However, the Irish sentence in (17) shows the emphatic function which (stressed) sure has in English (see gloss). In fact, such sentences in Irish can occur with a double emphatic, as in (18).

(18) Cinnte dearfa go bhfuil mé ag dul ann amárach.
     [sure positive that is-rel me at going there tomorrow]
     ‘Sure, I am definitely going there tomorrow.’

The view of older writers that non-standard features of Irish English can usually be traced to Irish influence does not hold in this case. For instance, Hayden and Hartog (1909: 784) simply assume that sure as used in Ireland is the result of transfer from Irish: “‘sure’ and ‘well’ and ‘entirely’ are indeed true English words; but their abuse as interjections and expletives is a Gaelic use transferred”. The two authors do not give any Irish words or structures which they think may have been operative here, so their standpoint that a specifically Irish use of sure (here: PM sure) is due to transfer from the Irish language remains unsubstantiated.

7.1 Sure in English and Irish corpora

As a reference corpus for A Corpus of Irish English, A Corpus of English Dialogues 1560–1760 (main investigators and compilers Merja Kytö, Uppsala University, and Jonathan Culpeper, Lancaster University; see Culpeper & Kytö 2000, 2010) was consulted. The texts of this corpus are of British provenance and cover the period from 1560 to 1760, divided into chronological subperiods. The last three subperiods, 1640–1679, 1680–1719 and 1720–1760, are of interest in the present context as they overlap to some extent with the periods for which the first attestations of PM sure are available in Ireland or in texts having to do with Ireland. Note that the texts of the third subperiod of A Corpus of English Dialogues (the first subperiod listed in Table 1) consist largely of comedy, a genre in which vernacular speech prevails.

Table 1.  A Corpus of English Dialogues: occurrences of sure (pragmatic marker usage in brackets)

Subperiod                   Tokens (as PM)   Texts with finds   Word total   n/10,000
1640–1679                   130 (10)         35                 243,371      5.3 (0.4)
1680–1719                   215 (9)          35                 325,803      6.7 (0.3)
1720–1760                   149 (6)          26                 238,472      6.3 (0.3)
Three subperiods together   494 (25)         96                 807,646      6.1 (0.3)

The overall incidence of sure is much lower in A Corpus of English Dialogues (Table 1) than in the texts of A Corpus of Irish English (Table 2). Even allowing for the difficulty of determining with certainty the semantics/pragmatics of sure in every instance, the Irish corpus shows nearly twice the overall rate of sure found in the English dialogues corpus (11.3 vs. 6.1 per 10,000 words) and roughly nine times the rate of PM sure (2.7 vs. 0.3). This fact is significant when one bears in mind that PM sure is not a transfer feature from Irish and must thus have been present in embryonic form in the input varieties of English to Ireland (Amador-Moreno & McCafferty 2015: 282–283). The expansion to a pragmatic marker with an intersubjective function in discourse would seem to be a development which took place in Ireland after renewed input of English to the country in the seventeenth century. Compare the occurrences of PM sure (indicated in brackets in columns 2 and 5 of Tables 1 and 2) in the English and Irish corpora consulted here.

Table 2.  A Corpus of Irish English: occurrences of sure in dramatic texts (pragmatic marker usage in brackets)

Period          Tokens (as PM)   Texts with finds   Word total   n/10,000
18th century    209 (26)          8                 122,037      17.1 (2.1)
19th century    199 (57)          9                 134,545      14.8 (4.2)
20th century    147 (48)         10                 235,493       6.2 (2.0)
Three periods   555 (131)        27                 492,075      11.3 (2.7)

In addition, it is clear that the use of PM sure in dramatic texts from Ireland peaked in the nineteenth century and dropped off in the twentieth century (see the bracketed n/10,000 figures in the rightmost column of Table 2). The question is why such a decrease is observable in twentieth-century literature. The indexical character of PM sure may well have triggered an unconscious reaction against it among writers who saw it as too stereotypical of Irishness. This interpretation receives support from an examination of sentence-initial sure,11 that is, sure in a position of high syntactic salience (see Table 3): its normalized frequency falls from 5.8 to 3.4 per 10,000 words, a drop of about 41% for the twentieth century. The decline of PM sure in literary representations of Irish English does not, however, imply a corresponding decrease in its use in spoken Irish English.

Table 3.  A Corpus of Irish English: occurrences of sentence-initial sure in dramatic texts

Period         Tokens   Texts with finds   Word total   n/10,000
19th century   66       8                  113,564      5.8
20th century   59       7                  175,658      3.4
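The ‘drop of about 41%’ for sentence-initial sure is a relative change between the two normalized rates in Table 3, computed as (new − old) / old. The short Python sketch below computes this change twice, once from the rounded table values and once from the raw counts and word totals. The function names are hypothetical illustrations. The rounded values give a change of about −41%, whereas the unrounded rates give about −42%, so the figure in the text is best read as approximate.

```python
def per_10k(count: int, word_total: int) -> float:
    """Occurrences per 10,000 running words."""
    return count / word_total * 10_000


def relative_change(old: float, new: float) -> float:
    """Relative change from old to new; negative values indicate a decline."""
    return (new - old) / old


# Sentence-initial sure in dramatic texts (Table 3).
rate_19c = per_10k(66, 113_564)   # about 5.81 per 10,000 words
rate_20c = per_10k(59, 175_658)   # about 3.36 per 10,000 words

drop_from_rounded = relative_change(5.8, 3.4)          # uses the table's rounded rates
drop_from_counts = relative_change(rate_19c, rate_20c)  # uses unrounded rates

print(f"from rounded rates: {drop_from_rounded:.1%}")
print(f"from raw counts:    {drop_from_counts:.1%}")
```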

8. Conclusion

In keeping with the call for “further diachronic analysis of IrE” by Amador-Moreno and McCafferty (2015: 287), the present study has examined the occurrence of PM sure in historical texts from the seventeenth to the twentieth century. In this way, the two authors’ recent investigation of letters in the Corpus of Irish English Correspondence has been complemented by the dimension of literary texts. The examination of historical texts, especially pieces by English authors concerned with Ireland and the Irish, has shown that already in the late seventeenth century PM sure had become sufficiently enregistered for English authors to use it when satirically portraying Irish characters. The most likely reason for the quick recognition of PM sure as a specifically Irish feature (Blake 1981: 135) was the identifiable syntactic context in which it occurred and still occurs, namely an unstressed sentence-initial or clause-initial position before a full verb phrase. The interpretation of PM sure shows a clear trajectory from emphatic to affirming/reassuring particle, which bridges the gap between a subjective use and an intersubjective one. This parallels the trajectory which the Irish use of grand took during the nineteenth century (Hickey 2017). There is also a wider question here: why did Irish English pragmatics12 develop along these lines? One suggestion concerns the essentially consensual nature of exchanges (Hickey 2007: 371–374) in rural communities in Ireland, which then spread to include the towns. Consensuality is based on broad areas of agreement and is maintained in discourse by confirming information shared by the participants. In this sense, PM sure can be interpreted as an integral part of the larger field of Irish English pragmatics and indeed as a sign of Irishness.

11. Sentence-initial position includes both emphatic uses and uses as a pragmatic marker.
12. For further studies of this area, see Barron & Schneider (eds. 2005), Vaughan & Clancy (2011), Clancy & Vaughan (2012), Hickey (2015), Kirk (2015), Barron & Pandarova (2016).

References

Aijmer, K. 2009. The pragmatics of adverbs. In One Language, Two Grammars? Differences between British and American English, G. Rohdenburg & J. Schlüter (eds), 324–340. Cambridge: CUP.
Aijmer, K. 2012. Understanding Pragmatic Markers: A Variational Pragmatic Approach. Edinburgh: EUP.
Aijmer, K. 2014. Pragmatic markers. In A Handbook of Corpus Pragmatics, K. Aijmer & C. Rühlemann (eds), 195–218. Cambridge: CUP.
Amador-Moreno, C. P. & McCafferty, K. 2015. “Sure this is a great country for drink and rowing at elections”: Discourse markers in the Corpus of Irish English Correspondence, 1750–1940. In Amador-Moreno, McCafferty & Vaughan (eds), 270–292.
Amador-Moreno, C. P., McCafferty, K. & Vaughan, E. (eds) 2015. Pragmatic Markers in Irish English [Pragmatics & Beyond New Series 258]. Amsterdam: John Benjamins.
Barron, A. & Pandarova, I. 2016. The sociolinguistics of language use in Ireland. In Hickey (ed.), 107–130.
Barron, A. & Schneider, K. P. 2008. Variational pragmatics: Studying the impact of social factors on language use in interaction. Intercultural Pragmatics 6(4): 425–442.
Barron, A. & Schneider, K. P. (eds). 2005. The Pragmatics of Irish English. Berlin: Mouton de Gruyter.
Blake, Norman F. 1981. Non-standard Language in English Literature. London: André Deutsch.
Brinton, L. 1996. Pragmatic Markers in English: Grammaticalization and Discourse Functions. Berlin: Mouton de Gruyter.
Clancy, B. & Vaughan, E. 2012. “It’s lunacy now”: A corpus-based pragmatic analysis of the use of ‘now’ in contemporary Irish English. In New Perspectives on Irish English [Varieties of English around the World G44], B. Migge & M. Ní Chiosáin (eds), 225–246. Amsterdam: John Benjamins.
Culpeper, J. & Kytö, M. 2000. Data in historical pragmatics: Spoken interaction (re)cast as writing. Journal of Historical Pragmatics 1(2): 175–199.
Culpeper, J. & Kytö, M. 2010. Early Modern English Dialogues: Spoken Interaction as Writing. Cambridge: CUP.
Davidse, K., Vandelanotte, L. & Cuyckens, H. (eds) 2010. Subjectification, Intersubjectification and Grammaticalization. Berlin: De Gruyter Mouton.
Defour, T. 2010. The semantic-pragmatic development of well from the viewpoint of (inter)subjectification. In Davidse, Vandelanotte & Cuyckens (eds), 155–196.
Dolan, T. 2012[1998]. A Dictionary of Hiberno-English: The Irish Use of English, 3rd edn. Dublin: Gill and Macmillan.
Hickey, R. 2003. Corpus Presenter: Software for Language Analysis. With a Manual and A Corpus of Irish English as Sample Data. Amsterdam: John Benjamins.
Hickey, R. 2007. Irish English: History and Present-day Forms. Cambridge: CUP.
Hickey, R. 2015. The pragmatics of Irish English and Irish. In Amador-Moreno, McCafferty & Vaughan (eds), 17–36.
Hickey, R. 2017. The pragmatics of grand in Irish English. Journal of Historical Pragmatics 18(1): 82–102.
Hickey, R. (ed.) 2016. Sociolinguistics in Ireland. Basingstoke: Palgrave Macmillan.
Joyce, P. W. 1910. English as we Speak it in Ireland. London: Longmans, Green, & Co. Reprinted 1979 by the Wolfhound Press, Dublin.
Kallen, J. L. 2013. Irish English, Vol. 2: The Republic of Ireland. Berlin: De Gruyter Mouton.
Kirk, J. M. 2015. Kind of and sort of: Pragmatic discourse markers in the SPICE-Ireland Corpus. In Amador-Moreno, McCafferty & Vaughan (eds), 89–114.
Kytö, M. & Culpeper, J. 2006. A Corpus of English Dialogues 1560–1760. Uppsala: Department of English, Uppsala University.
Norén, K. & Linell, P. 2007. Meaning potentials and the interaction between lexis and contexts. Pragmatics 17(3): 387–416.
Pandarova, I. (In preparation). Revisiting sentence adverbials in relevance theory: The semantics and pragmatics of sure across varieties of English. Lüneburg: Leuphana University.
Schiffrin, D. 1990. The principle of intersubjectivity in communication and conversation. Semiotica 80: 121–151.
Taniguchi, J. 1972[1956]. A Grammatical Analysis of Artistic Representation of Irish English with a Brief Discussion of Sounds and Spelling, revised and enlarged edn. Tokyo: Shinozaki Shorin.
Traugott, E. C. 2003. From subjectification to intersubjectification. In Motives for Language Change, R. Hickey (ed.), 124–142. Cambridge: CUP.
Traugott, E. C. 2010. (Inter)subjectivity and (inter)subjectification: A reassessment. In Davidse, Vandelanotte & Cuyckens (eds), 29–71.
Vaughan, E. & Clancy, B. 2011. The pragmatics of Irish English. In Irish English in Today’s World, R. Hickey (ed.). Special issue of English Today 106: 47–52.

Chapter 12

American English gotten: Historical retention, change from below, or something else?
Lieselotte Anderwald
University of Kiel

I trace the history of the American English (AmE) past participle gotten, widely (but wrongly) regarded as a historical retention of an earlier BrE form. As corpus data shows, gotten almost died out in American English as well, but was then revived. Although get is found mainly in speech-related text types, the revival of gotten is not an innovation from below – contrary to linguistic intuition. Instead, its rise was promoted by careful writers who deliberately avoided the highly stigmatized stative have got. This explains why the perfect form gotten appears in more formal text types first, and how gotten became specialized to dynamic contexts only. AmE gotten is thus a curious case of an unintended side-effect of marginally successful prescriptivism.

Keywords: American English, language change, COHA, AHN, past participle gotten

1. Introduction: Gotten was lost, then revived in American English

The use of AmE past participle gotten, as in (1), instead of BrE got, as in (2), is widely (but wrongly) regarded as a historical retention of an earlier BrE form.

(1) a. I had gotten some money from Houghton Mifflin  (COHA, 2002, Fiction)
    b. I know if one of the kids has gotten sick during the day  (COHA, 2002, Magazine)
(2) a. Her mother had never understood that, either. ‘You’ve got Daddy’s money,’ she would say.  (BNC, 1990, Fiction)
    b. When Jenny had got ill, she had been willful again  (BNC, 1991, Fiction)

In this chapter, I will show that the present-day American use of (have) gotten is not a straightforward retention of the older British English form, and I will present an alternative history of the decline and rise of gotten in American English.

https://doi.org/10.1075/scl.97.12and © 2020 John Benjamins Publishing Company

188 Lieselotte Anderwald

The myth of gotten as a straightforward historical retention has been thoroughly exploded, busted and buried by corpus linguists (Hundt 2009; Anderwald 2020, forthcoming), although in public discourse it still survives unscathed. As corpus evidence can quickly show, gotten is a clear case of an older (British English) form that was presumably transported to the new colonies, almost died out, but was then revived in American English in the nineteenth century. Based on Hundt 2009, Figure 1 shows the decline of (have) gotten (as opposed to (have) got) in British and American English before and around 1800, followed by an increase in American English after 1800.

[Chart, 1600–1950; y-axis: share of gotten (vs. got, %), 0–12; series: AmE gotten (ARCHER), BrE gotten (ARCHER), AmE gotten (EAF)]
Figure 1.  Gotten in historical corpora (based on figures in Hundt 2009, who has looked at ARCHER and an additional corpus of Early American Fiction (cf. Kytö 2004), here recalculated as percentages of gotten of all perfect forms)

Hundt therefore correctly calls AmE gotten a “postcolonial” revival (Hundt 2009). However, since ARCHER (A Representative Corpus of Historical English Registers, cf. Biber et al. 1994) is comparatively small and is divided into rather large periods of half a century at best, the exact turning point of the decline and revival of gotten is still unclear. With the help of COHA (Corpus of Historical American English, cf. Davies 2010–), we can be more specific and date the revival to the middle of the nineteenth century, more precisely to the decades following the 1850s. Figure 2 shows the slow decline of have gotten1 until the 1850s, followed by a slow and then rapid increase in the form.

1. For the sake of comparability, the corpus was searched only for the lexeme have followed by the forms got vs. gotten. This clearly does not find all instances of the perfect form (e.g. inversion, negation, split forms), but it ensures comparability across decades.

Chapter 12.  American English gotten 189



[Chart by decade, 1810–2000; y-axis: occ. per 1 million words, 0–50; series: have gotten]
Figure 2.  The rise of have gotten in COHA (text frequency per 1 million words)

If we concentrate on just the nineteenth-century development, the scale of the decline and subsequent reversal becomes even clearer, as the zoomed-in Figure 3 illustrates.

[Chart by decade, 1810–1900; y-axis: occ. per 1 million words, 0–6; series: have gotten]
Figure 3.  The decline and rise of have gotten in the nineteenth century in COHA (text frequency per 1 million words)

This more detailed picture then opens up the new, perhaps more interesting question: who revived gotten, and why? The following sections will investigate whether gotten arose as a result of the development of an endonormative American standard, whether it can be linked to Scottish/Irish immigration, and whether it can with justification be called a change from below, before presenting an alternative scenario. An additional question that has not been raised before also follows from the rejection of gotten as a straightforward retention. Since AmE gotten is used in dynamic contexts only, the question of where this functional differentiation comes from will also have to be answered.

2. Who revived gotten?

2.1 Gotten as a result of the development of an endonormative American standard

At least three possible answers have been proposed (so far) to explain the revival of gotten: gotten can be interpreted (a) as part of deliberate Americanization in the course of developing an endonormative American standard English, (b) as Scots-Irish influence, or (c) as change from below. For deliberate Americanization, I will refer to Schneider’s (2007) Dynamic Model. Although Schneider does not explicitly mention gotten, given its present-day iconic status as a much-noticed morphological Americanism (at least since Mencken 1921 and Marckwardt 1958, up to Gowers 2016; for more details cf. Anderwald 2020, forthcoming), we could assume that the revival of gotten was a change from above, actively promoted by prescriptivist writers of American grammars and schoolbooks in order to differentiate American English from British English. According to Schneider, although morphological features may lag behind phonological and orthographic ones, Americanization happened during, after, or as a consequence of political independence. In Schneider’s detailed investigation of the emergence of American English, Phase 4 (Schneider 2007: 282–91), called “endonormative stabilization”, is explicitly headed by a famous quotation from Noah Webster: “Our honor requires us to have a system of our own, …” (Webster 1789: 20; in fact, the citation continues even more pertinently: “… in language as well as government. Great Britain, whose children we are, and whose language we speak, should no longer be our standard”). An emerging sense of nationhood and political independence is thus clearly linked to linguistic independence (at least by Webster, but presumably he would have been representative of wider discourses at the time; cf. Finegan 1980, 1998). It should be noted at this point, however, that the timing is slightly off: Schneider’s Phase 4 spans the second half of the nineteenth century, 1848–1898, but Webster wrote more than half a century before its start. Presumably one would have to concede the possibility of a delay between theoretical positions and actual changes. Whether there is any evidence for a deliberate dissociation from British English for this specific feature is a question that can now be answered on the basis of empirical data.




Presumably, an act of ‘deliberate dissociation’ would involve official institutions like government agencies, language academies and/or schools (cf. the focus on institutions in Gloy 1998). In the absence of official language agencies and language academies in North America, schools would play an important part in instilling official doctrines in schoolchildren, and thus in the population. We would expect a consensus to emerge over time about what it was appropriate to teach in an American context, and if this included a (perhaps implicit) understanding that American English should rightfully become quite different from British English, this discourse should also show up in grammar books designed for the American market. I thus looked for any information on gotten as a past participle form in my collection of 125 nineteenth-century grammar books written by Americans for speakers of American English, especially for use in schools or for home teaching (the American part of the Collection of Nineteenth-Century Grammars CNG, cf. Anderwald 2016 for a full list). Strikingly, in American grammar books of the time, gotten is only rarely preferred as a past participle in the long lists of irregular verbs. Although 94 out of 125 American grammars mention the verb get, only 17 grammars overall prefer gotten as a possible participle form. These 17 grammars are found from the 1830s onwards, meaning that some authors were writing at a time when gotten was actually still dying out. Figure 4 shows this development. At the bottom of the bars can be found the stance that gotten is rare (or even obsolete) – a stance that declines from half of all grammars in the 1800s to between 20% and 10% at the end of the century. Clearly, what becomes dominant after the 1850s is the opinion that got is to be preferred. At the top of the bars, we find the opposite view that gotten is preferred, or that gotten is the only possible participle form – together these stances decline from about 30% in the 1840s to 15% at the end of the century, and they only ever constitute a minority opinion.2 Given that many grammar writers in the nineteenth century were linguistically extremely conservative (thus they still routinely include thou as the second person pronoun, recommend negation without do-support, and/or list many obsolete verb forms, amongst others past tense gat instead of got), it is perhaps not surprising that they would include a participle that is still found in Shakespeare or the Bible. However, there are no explicit comments as to why gotten should be preferred to got as the past participle – if any remarks on gotten are made, it is called obsolete, nearly obsolete, obsolescent, archaic, seldom used, less in use or little used (e.g. by Webster 1822; Greenleaf 1821; Fisk 1822; Putnam 1828; Hamlin 1832; Felton 1843; Hallock 1849; York 1862, or Welsh 1889). Presumably then, a preference for gotten is linked to the perception of gotten as the older form, but this is never explicitly said – in particular, it is never mentioned as a specifically American form (cf. Anderwald forthcoming). In sum, we can with some certainty conclude that the revival of gotten was not promoted through an act of deliberate dissociation from British English in or before the 1850s, not even in grammars that were explicitly marketed as grammars of the “American language”. In fact, as I have shown elsewhere (Anderwald 2020), a wider public discourse of gotten as an Americanism does not start until the 1870s, and it starts in publications on Americanisms and in newspapers, not in strictly prescriptive grammar books.

[Bar chart by decade, 1800–1890; categories: (for)gotten, gotten, gotten preferred, got preferred, got, gotten obsolete/rare]
Figure 4.  Past participle got vs. gotten in the CNG (only American grammars that mention get)

2. Some grammars do not mention get separately, but conflate get and forget (or beget). I have marked these cases separately because it is possible that for these grammar writers, non-variation in the paradigm of forget (forget–forgot–forgotten) may have obscured the possibility that the participle of get may have been variable.

2.2 Gotten as a Scotticism

Another notion that sometimes links an older use of gotten with gotten as an Americanism centres on the fact that gotten is still used in Scottish English (cf. OED: s.v. get v.). Of course, it is well known that many Scots (and Scots-Irish) immigrated to the US in the nineteenth century, and it is thus highly plausible that they would have brought gotten with them. Jespersen already mentions this in passing, observing that “In Scotland gotten seems to be frequent (Burns, Scott), and it is possible that Americans have taken this form from Scottish” (Jespersen 1931: 54).




However, it also has to be noted that there is no functional differentiation in Scottish English gotten, where it is used indiscriminately for stative as well as dynamic uses of get, that is, both in the sense of ‘own’ (stative: I have gotten a car = ‘I have/own a car’) and in the sense of ‘obtain’ (dynamic: I have gotten a car = ‘I have obtained/bought/… a car’). In English English, gotten died out in both the more traditional dynamic use and the new stative use. Even though it lingered on in dynamic uses slightly longer, it was eventually replaced by got in both senses, as Lorenz (2016) has shown on the basis of the Parsed Corpus of Early English Correspondence (Taylor et al. 2006) and the Corpus of English Dialogues (Kytö & Walker 2006). In Figure 5, his “initial” contexts are the older dynamic contexts, while “bridging” contexts are ambiguous between a dynamic and a stative reading; the decline of gotten (vs. got) in both contexts is clearly visible. Indeed, in his last period (1701–1760) gotten is no longer used in either context. In Scottish English, functionally undifferentiated gotten is thus a true historical retention. However, in American English we find a striking functional differentiation that we do not observe in Scottish English even today (cf. Durham p.c. for Hebridean English): in American English, have gotten has specialized to dynamic uses, whereas have got has a purely stative sense (and is thus equivalent to stative have, cf. Mencken 1919: 206; Robertson 1931; Jespersen 1931: 54; Mencken 1936: 257). I know of no studies that empirically test the claim that the recessive form gotten was bolstered in American English by mid-century Scots-Irish immigration, but presumably, as a substitute, one would need at least evidence for parallel Scottish English influence in other linguistic features – perhaps rhoticity, or the lexemes pinkie or wee could be adduced. To my knowledge, extant emigrant letter corpora have not yet been investigated for their use of gotten vs. got, although this would constitute a relatively straightforward endeavour. At least in terms of chronology, an initial impetus by Scots-Irish immigrants reviving latent uses of past participle gotten in the middle of the nineteenth century seems a possibility, and should not be dismissed out of hand. However, this still leaves the question of a subsequent functional differentiation to be answered. I will come back to this functional differentiation in Section 3.

[Chart by period (1460–1580, 1581–1640, 1641–1700, 1701–1760); y-axis: share of gotten (vs. got, %), 0–100; series: initial, bridging]
Figure 5.  The decline of have gotten versus have got (based on figures in Lorenz 2016: 500), percentage of gotten

2.3 Gotten as a change from below

The third argument is intuitively the most convincing: as we know from a host of corpus studies (Hundt 2001; Biber et al. 1999: 367–376 et passim; Anderwald 2017, forthcoming), all forms of get are a clear indicator of colloquial, informal language, and therefore the rise of gotten may very well have been a change from below that spread from spoken to written language in the nineteenth century. This seems to be the line of thought in Hundt when she says that “gotten … may thus be described as a low-frequency, colloquial variant that has been gaining ground again rather lately in written AmE” (Hundt 2009: 22) and that this development “would fit with the observed colloquialization of the written norm” (Hundt 2009: 35). Based on the text types contained in the Corpus of Historical American English, Figure 6 illustrates that all forms of get are indeed at all times strongly preferred in Fiction in COHA, the text type that is closest to spoken interaction. In fact, get is so much more frequent in Fiction that it skews the average figures (depicted by the broken line in Figure 6), which essentially follow the pattern of get-use in that text type. However, measuring the text frequency of get (though highly indicative of its informal status) is the wrong measure when we are interested in the competition between have got and have gotten, and in determining the factors that have affected the shifting ratio between the two strategies over time. Figure 6 just shows that, overall, get is used most in more informal texts. Incidentally, this is true for all constructions involving get: not just the lexeme get, but also the (much rarer) get-passive and the perfect form have gotten per text type in COHA, as Figures 7 and 8 illustrate. Because the get-perfect is so much rarer, occurrences in Figure 8 are calculated per million words, to keep the development visible.3

3. Searches were for the lexeme get followed by a past participle (for the get-passive), and lexeme have followed by gotten vs. got (for the perfect forms); for more details and a discussion cf. Anderwald (2016: 217–236, 2017, 2018).
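Footnote 3 describes the searches in terms of lexemes on a lemmatised corpus. Purely as an illustration, the perfect search can be approximated on plain, untagged text with a regular expression. This is a minimal sketch under stated assumptions, not the procedure actually used for COHA: the pattern is hypothetical, matches only the full forms have, has, had and having immediately followed by got or gotten, and ignores contractions and intervening adverbs (e.g. have recently gotten), so it would undercount. Such a search also cannot tell stative from dynamic have got, which still requires manual reading. The sample sentences are adapted from the White and Mencken quotations in this chapter.

```python
import re
from collections import Counter

# Hypothetical plain-text approximation of "lexeme have followed by gotten vs. got":
# full forms of HAVE immediately followed by "got" or "gotten".
# Contractions ('ve, 's, 'd) and intervening words ("have recently gotten")
# are deliberately not matched, so real counts would be undercounted.
PERFECT_GET = re.compile(r"\b(?:have|has|had|having)\s+(got|gotten)\b", re.IGNORECASE)


def count_perfect_get(text: str) -> Counter:
    """Count 'got' vs. 'gotten' in have-perfect contexts of a plain-text string."""
    return Counter(m.group(1).lower() for m in PERFECT_GET.finditer(text))


sample = ("He has got a large estate. By such a manoeuvre the man has gotten "
          "a large estate. I have got the measles.")
counts = count_perfect_get(sample)
print(counts["got"], counts["gotten"])  # prints: 2 1
```

The trailing `\b` matters: without it, the alternation could accept the got inside gotten and miscount it.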

Chapter 12.  American English gotten 195





[Figure 6: series: Fiction, Magazines, Newspapers, Non-Fiction, Average GET (broken line); y-axis: occurrences per 100,000 words]

Figure 6.  The text-type specific rise of get in COHA (text frequency per 100,000 words)

[Figure 7: series: Fiction, Magazines, Newspapers, Non-Fiction, Average GET passive; y-axis: occurrences per 100,000 words]

Figure 7.  The rise of the get-passive in COHA text types (text frequency per 100,000 words)

The general informality of the lexeme get (in all its constructions) is thus undisputed, and get in all its forms is indeed a clear indicator of speech-based styles. However, if we are interested in the shifting fates of the two variants have gotten vs. have got, we have to measure this competition by comparing the two variants directly in their respective environments; looking at the percentages of the two variants per text type is thus the method of choice. This is displayed in Figure 9, which shows a strikingly different profile, since Fiction is no longer the dominant text type.
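The difference between the two measures can be made concrete with a small sketch. The counts below are invented toy numbers chosen only to show the arithmetic; they are not COHA figures, and the function names are hypothetical. A text type can show the higher text frequency of have gotten (normalised per million words) and yet the lower share of gotten among all have got/have gotten tokens.

```python
def per_n_words(count: int, corpus_words: int, n: int = 100_000) -> float:
    """Text frequency: occurrences normalised per n words of running text."""
    return count * n / corpus_words


def share_of_variant(variant: int, competitor: int) -> float:
    """Percentage of one variant among all tokens of the two competing variants."""
    total = variant + competitor
    return 100 * variant / total if total else float("nan")


# Invented toy counts for illustration only (NOT COHA data):
# two text types of equal size, tokens of have gotten vs. have got.
toy_counts = {
    "Fiction":     {"words": 2_000_000, "gotten": 40, "got": 360},
    "Non-Fiction": {"words": 2_000_000, "gotten": 20, "got": 20},
}

for text_type, c in toy_counts.items():
    freq = per_n_words(c["gotten"], c["words"], n=1_000_000)
    pct = share_of_variant(c["gotten"], c["got"])
    print(f"{text_type}: {freq:.1f} per million words, {pct:.0f}% gotten")
```

With these toy numbers, Fiction has twice the text frequency of have gotten (20.0 vs. 10.0 per million words) but only a fifth of the Non-Fiction share (10% vs. 50%), so the two measures can rank text types in opposite orders.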

196 Lieselotte Anderwald



[Figure 8: series: Fiction, Magazines, Newspapers, Non-Fiction; y-axis: occurrences per 1 million words]

Figure 8.  The rise of have gotten in COHA text types (text frequency per 1 million words)

[Figure 9: series: Fiction, Magazines, Newspapers, Non-Fiction; y-axis: share of gotten (vs. got, %)]

Figure 9.  Have gotten in COHA per text type, percentages4

Investigating the percentages rather than the text frequencies of the new, rising form have gotten (compared to have got) in COHA thus leads to rather different and completely unexpected results. Although in terms of text frequency gotten and all other forms of get were mostly used in Fiction texts, in relative terms the rise of have gotten was promoted by expository prose, especially Non-Fiction texts and News writing. By contrast, Fiction, and thus presumably other speech-based genres, kept preferring the perfect form have got over have gotten. The relative use of have gotten (instead of have got) in perfect environments was thus promoted not by Fiction texts, but by much more carefully constructed and edited Non-Fiction writing and by Newspaper texts. In contrast to the overall use of get or the get-passive, have gotten thus presumably does not originate in spoken colloquial language, and so is presumably not a change from below.

4. Occurrences in the 1810s have been left out of this diagram since they are below 5 in Magazines and Non-Fiction, skewing percentages for those text types.

3. How did the functional differentiation of AmE gotten develop?

As briefly mentioned at the beginning, AmE have gotten is today restricted to dynamic contexts. Stative have got (as opposed to older dynamic have gotten/have got) is a relative newcomer in the field of possession marking. According to Lorenz, first attestations are usually dated to around 1600, and the seventeenth century is “the relevant period for its early usage and variation with bare have” (Lorenz 2016: 489), although have got(ten) is notoriously difficult to disambiguate between stative and dynamic uses: I have got a car can mean both ‘have obtained’ and ‘own now’, because possession is usually a consequence of obtaining. As Lorenz has shown (cf. Figure 5 above), during its decline gotten showed a slight preference for remaining in the older (dynamic) function, but it was taken up in, and then disappeared completely from, both the new (stative) function and the dynamic function. This makes it all the more startling that the revival and subsequent rise of gotten in American English are intimately tied up with the emergence of the functional differentiation we observe in American English today, and it raises the question of when this functional differentiation can first be observed. As I will argue, the text-type specific preference for gotten in expository prose illustrated in Figure 9 also holds the clue to the emergence of its functional specialization.

3.1 The functional differentiation of gotten in prescriptive sources

As with the ‘American-ness’ of gotten, nineteenth-century grammars and handbooks of Americanisms show no awareness of a functional differentiation between gotten and got. What we do find in prescriptive sources, however, is massive criticism of stative have got. This is hardly surprising, given that get in general is a marker of informality, that stative have got was indeed a relatively new use of get, and that this change was in all likelihood a change from below, and thus presumably promoted by the ‘wrong’ sort of speakers. Based on my American subsection of the CNG, Figure 10 illustrates how many grammar writers per decade mentioned get descriptively (i.e. without evaluation), how many criticized get, and how many did not mention get at all, apart from in their lists of irregular verbs.

[Figure 10: y-axis: No. of grammars (0–30); x-axis: decades 1800–1890; categories: not mentioned, descriptive, criticized]

Figure 10.  Grammars mentioning get-constructions outside lists of irregular verbs (CNG, American grammars)

As we can see, the vast majority of American grammar writers do not mention get beyond their lists of irregular verbs. In fact, only 28 grammars, that is about one in five, mention get in the remaining text. When get is mentioned, however, it is almost always criticized (by 25 out of 28 grammars, i.e. almost 90 per cent). In fact, it is the most criticized feature of those I have investigated so far (Anderwald 2016: Chapter 8), and stative have got in particular is always criticized. I will argue that the massive criticism of have got laid the ground for a functional differentiation, because careful or insecure writers would have tried to avoid have got at all costs. Although proscribing stative have got implies that dynamic have got should be permitted, criticism of stative have got was so vicious that it may have affected all forms of have got. As an extreme example, the popular writer Richard Grant White remarks in 1870:

Get, one of the most willing and serviceable of our vocal servants, is one of the most ill used and imposed upon – is, indeed, made a servant of all work, even by those who have the greatest retinue of words at their command. They use the word get – the radical, essential, and inexpugnable meaning of which is the attainment of possession by voluntary exertion – to express the idea of possessing, of receiving, of suffering, and even of doing. In all these cases the word is misused. … The most common misuse of this word, however, is to express simple possession. It is said of a man that he has got this, that, or the other thing, or that he has not got it; what is meant being simply that he has it, or has it not – the use of the word got being not only wrong, but, if right, superfluous. If we mean to say that a man is substantially wealthy, our meaning is completely expressed by saying that he is rich, has a large estate, or has a handsome property. We do not express that fact a whit better by saying that he has got rich, or has got a large estate; we only pervert a word which, in that case, is entirely needless, and is probably somewhat more than needless. For it is quite correct to say, in the very same words, that by such and such a business or manoeuvre the man has gotten a large estate. Possession is completely expressed by have; get expresses attainment by exertion. … to say of a vagrant that he has got no home is bad.  (White 1870: 116–117, my emphasis in bold)

It would probably have taken considerable courage to stand up to the rhetoric of an influential writer like White and to continue to use have got in print, even in dynamic contexts. White himself makes it clear that, in his opinion, dynamic uses are much better served by marking them distinctly with have gotten instead. Proscription of stative have got thus seems to have had a “bulldozer effect” (Chapman 2012), razing everything in its path without distinction and so extending to dynamic have got as well. Dynamic have got was therefore avoided to escape censure by writers like White and perhaps like-minded editors, but this indirectly promoted the use of have gotten in dynamic contexts.5 Proscription of stative have got thus affected the use of dynamic have got too; it demonstrably created widespread linguistic insecurity as to whether get in general, have got, or stative or dynamic get was licensed by the authorities. It was this insecurity which then laid the ground for a functional differentiation. Add to this the acceptance of gotten as a legitimate, conservative form, frequently described since the 1870s as “simply an old form” (i.e. the myth of gotten as a conservative Americanism), and you have the makings of a functional specialization of this perfect form, as exemplified in White’s text above.

5. In addition, it is objectively very difficult to distinguish dynamic and stative contexts: because there is so much overlap, Lorenz (2016), for example, speaks only of ambiguous “bridging” contexts, that is, contexts that could potentially be read as stative. This widespread ambiguity also makes disambiguating the corpus data above extremely difficult, if not impossible. Writers must have faced the same dilemma: even if they intended have got to have a clearly dynamic use, they ran the risk of being read as employing it in its stative sense. Even White’s constructed example above suffers from this ambiguity, because his much-maligned “he has got a large estate” could very well have a dynamic meaning, as he concedes a sentence later (“he has gotten a large estate”).


3.2 Gotten in wider societal discourses

The linguistic insecurity and metapragmatic awareness of got/gotten as a ‘problem’ are mirrored in wider societal discourses, some of which are documented, for example, in newspapers of the time. Large databases of digitized historical newspapers now allow us to reconstruct some of these discourses.6 Thus, in 1915 an anonymous “subscriber” asks the editor of the Morning Oregonian: “(To the Editor.) – Please tell me the past participle of ‘get’” (Morning Oregonian, Jan 15, 1915, p. 10). Only a year later, The State (Columbia) heads an anecdote about the use of got/gotten with the headline “A Question of Usage”, and starts: “‘Can you say “gotten”?’ the maiden asked” (The State, Aug 27, 1916, p. 4). The article later summarizes: “The debate continues, then, over ‘got’ and ‘gotten’”. Many other articles around the turn of the century show an awareness of widespread uncertainty as to whether to use have got at all and, more specifically, whether to use got or gotten. As early as 1820, the Vermont Daily Intelligencer declares that “Gotten and got are both unnecessary. He has, for example, is quite sufficient, and appears sufficiently succinct without the aid of those awkward auxiliaries” (Vermont Intelligencer, Dec 4, 1820, p. 4; auxiliaries here should perhaps be read as participles), and this sentiment continues throughout the century, as the quote from White also demonstrated. In 1900, the Philadelphia Inquirer, in its section Night School: How to Do Things, writes: … “XVIII. How to Avoid Common Blunders … Gotten is an old form of ‘got’ … It should be remembered … that ‘I have got’ means, ‘I have acquired,’ and that, therefore, it should not be used when the meaning is simply ‘I have’” (Philadelphia Inquirer, December 11, 1900, p. 9, my emphasis in bold). At the same time, the widespread tendency to see gotten as the older form, and as a specifically American form from the 1880s onwards (cf. Anderwald forthcoming for more details on the temporal development), legitimizes it for use, but only in dynamic contexts, conveniently disambiguating potentially ambiguous forms of have got and avoiding proscriptive censure by White and like-minded sticklers. There are even some hints that speakers and writers were aware that have gotten was promoted by insecure writers: in a book review in 1909 the author remarks that, although not altogether incorrect, “had gotten is perhaps more common with writers who have more fear of the grammarian than knowledge of literature” (Springfield Republican, Massachusetts, Sep 5, 1909, p. 23). As the first linguist to note the functional differentiation of gotten,7 H. L. Mencken also explicitly mentions “polite speech” as the locus of this innovation, thus also hinting at a change from above: “In the polite speech gotten indicates a distinction between a completed action and a continuing action, – between obtaining and possessing. ‘I have gotten what I came for’ is correct, and so is ‘I have got the measles’” (Mencken 1919: 206). It remains to be investigated why and when gotten was taken over into everyday speech. The formal and functional differentiation between stative have got, which was already widespread, and the new dynamic have gotten, used by overly careful writers, presumably proved useful to speakers. In evolutionary terms, we could see this as a case of remorphologization (parallel, e.g., to Wolfram & Schilling-Estes 1996) or of exaptation (Lass 1990), that is, the use of a nearly obsolete form (gotten) that was revived and given a new purpose, namely indicating dynamic meaning. How this written form came to be accepted so widely in spoken language that it is now felt to be a marker of spoken discourse or even a ‘colloquialism’ is a story that remains to be investigated by North American sociolinguists, but first results from apparent-time studies seem to offer intriguing insights in that direction (Tagliamonte p. c.); we eagerly await more detailed regional surveys here.

6. All examples to be presented are taken from the databases America’s Historical Newspapers (AHN, available at) and Nineteenth-Century U.S. Newspapers (NCNP, available at ), databases that together contain over 120 million articles.

4. Conclusion

To conclude, the emergence of the morphological Americanism gotten in the nineteenth and twentieth centuries seems to illustrate the unusual case of a successful change from above.
However, if we take the massive criticism of stative have got as one of the main causes of this development, even proponents of the efficacy of prescriptivism will have to concede that the rise of AmE gotten is a case of prescriptivism having consequences quite different from those intended. Instead of using stative have in place of stative, ‘tautologous’ have got, as every prescriptive source advised, speakers, or rather writers, developed a specialized form have gotten as a formally distinct alternative for dynamic contexts to escape the censure of have got. Instead of the hoped-for eradication of have got, the new form have gotten spread very quickly, and so did stative have got. These outcomes were obviously not directly intended by the actors involved in proscribing have got. Like water that is dammed in one place only to break through in unexpected ones, the proscription of have got had unintended consequences, among them the functional specialization and rise of dynamic have gotten. Have gotten must have been available throughout the nineteenth century in the low-frequency pool of variants used in spoken language (in the sense of Kretzschmar 2009), presumably bolstered by massive immigration from Scotland and Ireland in the nineteenth century. This variant gotten, originally a fully equivalent form of got, was remorphologized under prescriptive pressure against stative have got to mark a dynamic–stative distinction that had originally not been indicated grammatically. In this sense, the emergence of American English dynamic have gotten is not just a case of remorphologization, but even of complexification in the sense of Trudgill (2011) or McWhorter (2001), since it introduces a new grammaticalized distinction that was not present before. The recent revival of have gotten is thus not an instance of ‘colloquialization’, a straightforward change from below, or the rise to prominence of a traditional feature of spoken language, but instead a very curious case of an unintended by-product of prescriptivism: the revitalization of an older, almost obsolete form, remorphologized for a new functional distinction and promoted by overly careful, linguistically insecure writers.

7. This is predated by the same article quoted above from The State in 1916, whose author explicitly states: “No one would use the words interchangeably, saying, ‘I have gotten my pen in my pocket,’ for ‘I have got my pen’” (The State, August 27, 1916).

References

Anderwald, L. 2016. Language between Description and Prescription: Verb Categories in Nineteenth-century Grammars of English. Oxford: OUP.

Anderwald, L. 2017. Get, get-constructions and the get-passive in 19th-century English: Corpus analysis and prescriptive comments. In Exploring Recent Diachrony: Corpus Studies of Lexicogrammar and Language Practices in Late Modern English [Varieng], S. Hoffmann, A. Sand & S. Arndt-Lappe (eds), n.p. Helsinki: Helsinki University.

Anderwald, L. 2018. Language change and cultural change: The grammaticalization of the get-passive in context. Language & Communication 62: 1–14.

Anderwald, L. 2020. The myth of AmE gotten as a historical retention. In Late Modern English: Novel Encounters [Studies in Language Companion Series 214], M. Kytö & E. Smitterberg (eds), 67–90. Amsterdam: John Benjamins.

Anderwald, L. Forthcoming. Historical retention, progressive nation or the eye of the beholder? The evolution of morphological Americanisms. In Early North-American Englishes, M. Kytö & L. Siebers (eds). Amsterdam: John Benjamins.

Biber, D., Finegan, E. & Atkinson, D. 1994. ARCHER and its challenges: Compiling and exploring a representative corpus of historical English registers. In Creating and Using English Language Corpora, U. Fries, G. Tottie & P. Schneider (eds), 1–14. Amsterdam: Rodopi.

Biber, D., Johansson, S., Leech, G., Conrad, S. & Finegan, E. 1999. Longman Grammar of Spoken and Written English. Harlow: Longman.

Chapman, D. 2012. Enforcing or effacing useful distinctions? Infer vs. imply. Paper presented at ICEHL 17, August 2012, Zurich, Switzerland.

Davies, M. 2010–. The Corpus of Historical American English: 400 Million Words, 1810–2009. (14 May 2020).




Felton, O. C. 1843. The analytic and practical grammar. A concise manual of English grammar, arranged on the principle of analysis: containing the first principles and rules, fully illustrated by examples; directions for constructing, analyzing and transposing sentences; a system of parsing, in some respects new and attractive; alternate exercises in correct and false syntax, arranged under most of the rules of syntax; and a series of parsing lessons in regular gradation from the simplest to the most abstruse. Designed for the use of common schools. Salem: W. & S.B. Ives; and Boston: B.B. Muzzey.

Finegan, E. 1980. Attitudes towards English Usage: The History of a War of Words. New York NY: Teachers College Press.

Finegan, E. 1998. English grammar and usage. In The Cambridge History of the English Language, Vol. IV: 1776–1997, S. Romaine (ed.), 536–588. Cambridge: CUP.

Fisk, A. [1821]1822. Murray’s English grammar simplified; designed to facilitate the study of the English language; comprehending the principles and rules of English grammar, illustrated by appropriate exercises; to which is added a series of questions for examination. Abridged for the use of schools. Troy, New York: Z. Clark.

Gloy, K. 1998. Sprachnormierung und Sprachkritik in ihrer gesellschaftlichen Verflechtung. In Sprachgeschichte: Ein Handbuch zur Geschichte der deutschen Sprache und ihrer Erforschung, Vol. HSK 2.1, W. Besch, A. Betten, O. Reichmann & S. Sonderegger (eds), 396–406. Berlin: Walter de Gruyter.

Gowers, R. 2016. Horrible Words: A Guide to the Misuse of English. London: Penguin.

Greenleaf, J. 1821[1819]. Grammar simplified; or, an ocular analysis of the English language, 3rd edn. New York NY: Charles Starr.

Hallock, E. J. 1849. A grammar of the English language; for the use of common schools, academies and seminaries. New York NY: Mark H. Newman & Co.

Hamlin, L. F. 1832[1831]. English grammar in lectures: Designed to render its principles easily adapted to the mind of the young learner, and its study entertaining. Stereotype edn. Brattleboro: Peck, Steen and Company.

Hundt, M. 2001. What corpora can tell us about the grammaticalisation of voice in get-constructions. Studies in Language 25: 49–88.

Hundt, M. 2009. Colonial lag, colonial innovation or simply language change? In One Language – Two Grammars? Differences between British and American English, G. Rohdenburg & J. Schlüter (eds), 13–37. Cambridge: CUP.

Jespersen, O. 1931. A Modern English Grammar on Historical Principles. Vol. IV: Syntax. Third Volume. Time and Tense. London: Allen & Unwin, and Copenhagen: Munksgaard.

Kretzschmar Jr., W. A. 2009. The Linguistics of Speech. Cambridge: CUP.

Kytö, M. 2004. The emergence of American English: Evidence from seventeenth-century records in New England. In Legacies of Colonial English: Studies in Transported Dialects, R. Hickey (ed.), 121–157. Cambridge: CUP.

Kytö, M. & Walker, T. 2006. Guide to A Corpus of English Dialogues, 1560–1760. Studia Anglistica Upsaliensia, Vol. 130. Uppsala: Acta Universitatis Upsaliensis.

Lass, R. 1990. How to do things with junk: Exaptation in language change. Journal of Linguistics 26(1): 79–102.

Lorenz, D. 2016. Form does not follow function, but variation does: The origin and early usage of possessive have got in English. English Language and Linguistics 20(3): 487–510.

Marckwardt, A. H. 1958. American English. New York NY: OUP.

McWhorter, J. H. 2001. The world’s simplest grammars are creole grammars. Linguistic Typology 5: 125–166.


Mencken, H. L. 1919. The American Language: A Preliminary Inquiry into the Development of English in the United States. New York NY: Alfred A. Knopf.

Mencken, H. L. 1921. The American Language, 2nd edn. New York NY: Alfred A. Knopf.

Mencken, H. L. 1936. The American Language, 4th corrected, enlarged, and rewritten edn. New York NY: Alfred A. Knopf.

OED. 2011–. Oxford English Dictionary Online. Oxford: OUP. <http://www.oed.com> (14 May 2020).

Putnam, J. M. 1828[1825]. English grammar, with an improved syntax. Part I: Comprehending at one view what is necessary to be committed to memory. Part II: Containing a recapitulation, with various illustrations and critical remarks. Designed for the use of schools, 2nd edn. Concord: Jacob B. Moore.

Robertson, S. 1931. A British misconception. American Speech 6(4): 314–316.

Schneider, E. W. 2007. Postcolonial English: Varieties around the World. Cambridge: CUP.

Taylor, A., Nurmi, A., Warner, A., Pintzuk, S. & Nevalainen, T. 2006. Parsed Corpus of Early English Correspondence. York & Helsinki: Oxford Text Archive.

Trudgill, P. 2011. Sociolinguistic Typology: Social Determinants of Linguistic Complexity. Oxford: OUP.

Webster, N. 1789. Dissertations on the English language: With notes, historical and critical. To which is added, by way of appendix, an essay on a reformed mode of spelling, with Dr. Franklin’s arguments on that subject. Boston: For the author.

Webster, N. 1822[1807]. A philosophical and practical grammar of the English language, 2nd edn. New Haven: Howe & Spalding.

Welsh, J. P. 1889. A practical English grammar, with lessons in composition and letter-writing. Philadelphia PA: Christopher Sower Company.

White, R. G. 1870. Words and their uses, past and present: A study of the English language. New York NY: Sheldon and Company.

Wolfram, W. & Schilling-Estes, N. 1996. Dialect change and maintenance in a post-insular island community. In Focus on the USA [Varieties of English around the World G16], E. W. Schneider (ed.), 103–148. Amsterdam: John Benjamins.

York, B. 1862[1854]. An analytical, illustrative, and constructive grammar of the English language. Accompanied by several original diagrams, exhibiting an occular illustration of some of the most difficult principles of the science of language; also, an extensive glossary of the derivation of the principal scientific terms used in this work, in two parts, for the use of every one who may wish to adopt it, 3rd edn. Raleigh: W. L. Pomercy.

Part III

Present-day English

Chapter 13

Explaining explanatory so

David Denison

University of Manchester

This chapter examines a recent use of so in spoken British English, namely as a discourse marker conveying acceptance of an invitation to take the floor and give an explanation. I demonstrate a long-term increase in turn-initial so, dating the specifically ‘explanatory so’ to the 2010s in Britain. Evidence comes from corpora of academic discourse, of media language and especially of conversation. I argue that the usage is a coalescence of several well-attested discourse uses of so, perhaps strengthened by transatlantic influence. I explain the often hostile public reaction in terms of the sentence grammar of so, and also offer a general hypothesis about what makes an innovation salient and objectionable to conservative speakers.

Keywords: discourse markers, initial so, current change, language attitudes, result clauses

1. Introduction

A young academic, asked over a beer why he lived in Manchester when his post-doctoral fellowship was in Bristol, began a relaxed account of the reason as follows:

(1) So my girlfriend lives here. […]

(27 March 2019, attested DD)

Such uses of so are my topic. They tend to involve explanations, hence the title of this chapter, but this describes only their principal context of occurrence, not their function. To anticipate my findings, explanatory so is a discourse particle in turn-initial position with such functions as accepting an invitation to take the floor and prefacing an explanation. This chapter offers the first corpus-based approach to the recent development of explanatory so in British English. Section 2 sketches those uses of so in clause-initial position which can broadly be classed as belonging to sentence grammar; of these, result clauses in particular will be relevant to both discourse functions and metalinguistic attitudes further on. Section 3 takes the discussion from the grammatical to the discourse level and reviews so as a discourse marker. Section 4 shows the rise of discourse so by means of a corpus investigation of conversation in the British National Corpus (BNC) and the Spoken BNC2014, and of academic English in the British Academic Spoken English (BASE) corpus; I also look briefly at broadcasting. Section 5 traces the origins of explanatory so in British English in the light of the corpus data. Section 6 explains the extraordinary reaction it has provoked in the media, and more generally why certain linguistic innovations but not others draw the (f)ire of the complaint tradition. Section 7 is a brief envoi.

https://doi.org/10.1075/scl.97.13den © 2020 John Benjamins Publishing Company

208 David Denison

In surveying the distribution of such a common word,1 I confine my discussion to so at the start of a clause and with clausal scope. The function of so is largely unchanged when initial position is shared with an agreement marker. Throughout the chapter, therefore, the label ‘initial’ also covers so preceded by an interjection of up to two words, usually of assent or disagreement, as in the interviewee’s response in (2).

(2) Interviewer: It feels a bit out of sync with public policy, doesn’t it, though, I mean if we’re trying to move to zero carbon by 2050, in 30 years’ time, how does this play into (pause) to that goal? Interviewee:  Yes, so, there is a broader context within UK energy policy, we […]  (BBC Radio 4 Today, 17 June 2019)
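The working definition of ‘initial’ so given above (so at the start of a turn, optionally preceded by an interjection of up to two words, as in Yes, so in (2)) can be approximated as a plain string search, in line with the observation later in this chapter that part-of-speech tags are unreliable for so. The following Python sketch is illustrative only: the interjection inventory is a hypothetical assumption, not a list taken from the corpora, and every match still has to be checked by hand, since a fronted purpose clause such as (3c) also begins with so.

```python
import re

# Hypothetical inventory of short interjections (assent, disagreement, etc.).
# This list is an illustrative assumption, not taken from the chapter.
INTERJECTIONS = r"(?:yes|yeah|no|okay|ok|right|well|oh)"

# Turn-initial "so", optionally preceded by one or two interjection words,
# each of which may be followed by a comma. The \b after "so" rules out
# words such as "soon" or "sort".
INITIAL_SO = re.compile(
    rf"^\s*(?:{INTERJECTIONS}\s*,?\s+){{0,2}}so\b",
    re.IGNORECASE,
)


def has_initial_so(turn: str) -> bool:
    """Return True if a speaker turn begins with (interjection +) so."""
    return INITIAL_SO.match(turn) is not None


turns = [
    "Yes, so, there is a broader context within UK energy policy",
    "So my girlfriend lives here.",
    "Soon we left.",
    "I think so.",
]
for t in turns:
    print(has_initial_so(t), "|", t)
```

Capping the interjection group at `{0,2}` mirrors the "up to two words" limit; a turn such as Well, well, well, so … is therefore not counted.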

2. So in sentence grammar

In sentence grammar, each item has a syntactic slot and makes some contribution to meaning, albeit one not always amenable to glossing as we move down the cline from lexical to function words. The following sketch is based on the standard modern grammars of English (Quirk et al. 1985; Biber et al. 1999; Huddleston & Pullum 2002). As a conjunction (at least in traditional terminology), so introduces a clause. Huddleston and Pullum (2002: 726) observe that “[s]o is used to indicate either purpose or result”, and they discuss the differences between the two functions, not just in relation to so, on p. 733. (See also Quirk et al. 1985: 1108–9.) If asked to give an example of clause-initial so, many people would probably come up with either purpose or result. I take them in turn.2 In purpose clauses like (3) (the first example is from Huddleston & Pullum 2002: 725), that is optional:

(3) a. We booked early so that we could be sure of getting good seats.
b. We booked early so we could be sure of getting good seats.
c. So (that) we could be sure of getting good seats, we booked early.

1. The BNC recognises several so lexemes with different part-of-speech tags. Taken together, they constitute the 31st most frequent form in its spoken portion.

Chapter 13.  Explaining explanatory so 209

The whole subordinate clause can be placed either before or after its main clause. In result clauses like (4) (examples from Huddleston & Pullum 2002: 725 and 1320, respectively), the so-clause can only follow the main clause. However, Huddleston and Pullum (2002: 1320–21) argue that so in (4b) is close to being a coordinator. For Quirk et al. (1985: 442) it is a conjunct adverb that resembles a conjunction but differs from one in that its clause cannot be fronted without a change of meaning; moreover, so can be preceded by a coordinating conjunction, but not by a subordinating one (Quirk et al. 1985: 645–646).

(4) a. The airline had overbooked, so that two of us couldn’t get on the plane.
b. There was a bus strike on, so we had to go by taxi.

Finally, Quirk et al. (1985: 1070) state that a result clause like (5) is a disjunct rather than an adjunct – that is, a speaker comment on the preceding main clause.

(5) We know her well, so that we can speak to her on your behalf.

A disjunct can be part of discourse structure. I have given their example verbatim, though in my own usage so would be more natural here without that. So also serves as an intensifier, as part of a VP- or other substitute, and as part of more complex adverbs and conjunctions, illustrated respectively in (6a–c), all with so in initial position:

(6) a. So cruelly did he treat them, that …
b. So I believe.
c. So long as you insist on …

So can also be an adverb, as in this example from Huddleston & Pullum (2002: 1319):

(7) The mill could be sold off, so providing much-needed capital.

2. A third variant, the manner clause, is described only in Huddleston and Pullum (2002: 968). It is said to be “comparatively rare […] and usually interpreted as” result or purpose, in fact obscuring the distinction between result and purpose. Their examples are:
(i) He’d arranged the programme so that we had lots of time to discuss the papers.
(ii) I apply the hay so that only the tops of the plants show above it.

In (6) and (7), so does not both introduce the clause and have scope over it. Corpus investigation of so requires string searches, as part-of-speech tags are not reliable, and any of the sentence grammar patterns in (3) to (7) may turn up. Only result clauses like (4b) play any part in the story of explanatory so; the rest are irrelevant and must be filtered out manually.

3. So as discourse marker

Linguistic items may have discourse functions, helping to organise different sections of a text or, at a lower level, to make connections between adjacent utterances. They can also express speaker attitudes and help to negotiate interpersonal relations between speaker and hearer (see, for example, the references in Brinton 2006). Unfortunately, items often serve more than one of these functions at the same time, in proportions that cannot be measured systematically, and according to classification schemes that vary from scholar to scholar. Initial position is usually regarded as crucial for discourse markers. In spoken discourse speakers generally take turns, and what is crucial about explanatory so is that it is turn-initial, as for example in (1). Quirk et al. (1985: 633) observe that a conjunct such as so can even be discourse-initial:

(8) So you’re leaving, then! [intonation marking not reproduced here – DD]

[…] Discourse-initiating items can be less easy to account for plausibly, but it seems significant that such items are usually those that have a well-established conjunctive role in mid-discourse use. […] It would seem that, in discourse-initial use, these items seek to enforce by implication some continuity with what might have gone before. Silence is difficult to break without some such convention.  (Quirk et al. 1985: 633–4)

This observation neatly captures the behaviour of so, in (8) and more widely. The conjunct so appears under the semantic heading ‘resultive’ (label from Quirk et al. 1985: 635; cf. also 638) and is labelled ‘informal’. What are the discourse functions of so? Among the different but often overlapping classifications available, Buysse’s survey is helpful for our purposes. Drawing on Halliday and Matthiessen (2004), he lists ten functions in three categories (Buysse 2012: 1767, Table 1), exemplified in my Table 1 below. If more than one can be discerned in the same example, he argues that one function is always more



Chapter 13.  Explaining explanatory so 211

salient than the others, and in classifying his data, he allows an ideational or an interpersonal function to trump a textual one. ‘Ideational’ would include what I have called a result clause in sentence grammar, which Buysse defends including among discourse functions as well “because it is both syntactically and semantically optional in this context” (2012: 1768), whereas purpose is not a discourse function (2012: 1767). If the dual status of result in both sentence grammar and discourse grammar seems analytically messy, I would contend in response that discourse markers have generally grown out of sentential functions, so it is only to be expected that there should be some overlap between the two.3 I adopt Buysse’s list of functions, adapting it slightly for my data.4 Can it accommodate explanatory so? Two of the textual functions on the list bear a close relationship to explanatory so without quite being suitable descriptions. The closest is ‘Introduce a section of the discourse’, where so can open an interviewee’s first turn, perhaps preceded by a short prompt from the interviewer (Buysse 2012: 1771–2). The other is ‘Introduce a new sequence’. In discussing that function, Buysse writes that so “can indeed introduce […] a new step in an explanation” (2012: 1768). I suggest that explanatory so goes further, in that the speaker actually begins an explanation at that point, and that the usage has the further function of accepting an invitation to take the floor. Hence explanatory so must be located among the interpersonal discourse uses. Note furthermore that it must be distinct from so in purpose or result clauses: finite purpose clauses require a modal auxiliary, unlike explanatory so, while result clauses have different semantics and can begin with and, again unlike explanatory so. Explanatory so is therefore neither ideational nor merely textual in function. I have added it to Buysse’s classification under the heading “Introduce an invited explanation”.
My Table 1 reproduces Buysse’s Table 1 faithfully, apart from that one additional interpersonal function (indicated by italics), and with abbreviated examples taken from my corpora.

3. The difficulty of assigning a word class to certain uses of so, noted by all three major grammars, testifies to its often liminal status.

4. I considered adding a function such as ‘Introduce a question’, a context that is frequent in my data but unlikely to come from Buysse’s interviewees. So-prefaced questions cannot be assigned to his ‘Prompt’ function as a way for a speaker to yield the floor, since Prompt is said to occur at the end of a segment. I have tentatively decided not to include introducing questions as a separate function in my tabulations, since most such examples can also be assigned to one of Buysse’s ten functions. However, I have annotated so-questions as such in my data in case this proves useful for further research. The decision does not affect my main line of argument.


Table 1.  Discourse marker functions of so, adapted from Buysse (2012: 1767)

Type of relation | Discourse marker function | Abbreviated example (from Spoken BNC2014 texts unless otherwise noted)
Ideational | Indicate a result | [Cos mummy made the tea …] so daddy does the washing up. (BNC, KBW)
Interpersonal | Draw a conclusion | yeah so she was born in forty-three (SKDX)
Interpersonal | Prompt | but erm so (SM6B)
Interpersonal | Hold the floor | so so that ’s that so but not much else has really happened (S26N)
Interpersonal | Introduce an invited explanation [explanatory so] | So, I did a little research on what the conditions are like in the mines (TV Corpus, Big Bang Theory)
Textual | Introduce a summary | so they ’re not free […] (SHTM)
Textual | Introduce a section of the discourse | so did you guys have a good week ? (SKHW)
Textual | Indicate a shift back to a higher unit of the discourse | [out of the blue] so what time is the flight tomorrow ? (SBB2)
Textual | Introduce a new sequence | [discussed earlier] so I was like oh my god I have to get this for her like (S23A)
Textual | Introduce elaboration | [continuing a narrative] so that was brilliant you know cos […] (S2T6)
Textual | Mark self-correction | So I had to […] yeah, so I did have to guess a great deal (BNC, KC0)

The classification of Table 1, if sufficiently comprehensive, should permit us to explore corpus data systematically in search of explanatory so and its precursors.

4. Corpus data

Explanatory so belongs to recent spoken English. The most important reservoir of speech patterns is everyday conversational usage. Anecdotally, explanatory so is prevalent in academic speech and in radio or TV interviews (see e.g. Liberman 2010; Creighton 2015). Accordingly I have tried to find British corpus data that represent conversation, academic speech and broadcasting. Ideally we would have searchable diachronic corpora for all three domains that straddle the advent of explanatory so, or else comparable synchronic corpora from before and after it. Not all these desiderata can be met. For conversation ‘before’, I took the demographically sampled spoken component of the British National Corpus, henceforth ‘Spoken BNC1994DS’. Recordings were mostly made in 1992, and the text-type is similar to that of the purely conversational Spoken BNC2014,5 recorded 2012–16, permitting a comparison of usage about a generation apart. These corpora have the advantage of generous markup (BNC 2007; Love et al. 2017) and excellent search engines.6 Compared to academic speech and media interviews, relaxed conversation among friends and family is not a context where explanatory so would be especially likely to occur, but we can at least expect information about the frequency of turn-initial so, and about which uses might allow an indigenous development of explanatory so. For academic speech I used the British Academic Spoken English corpus (BASE; Nesi & Thompson 2000–2005), a corpus of transcriptions of lectures and seminars recorded at two UK universities between 1998 and 2005.7 For broadcast speech it is easy to illustrate explanatory so with anecdotal data, for example from BBC Radio 4’s daily Today programme. Unfortunately the BBC does not offer recordings of this programme beyond a rolling four-week catch-up period, and I have not located a suitable corpus. Some historical information on broadcast usage can be gleaned from the TV Corpus (Davies 2019) and from the work of Schlegl (2018).

4.1 Conversation: Spoken BNC1994DS and Spoken BNC2014

I searched within a single turn for the start of the turn, followed by up to two optional words, an optional piece of punctuation, and finally so.8 The same CQP query was run on each corpus, with the results shown in Table 2. Strictly, the searches find turn-initial so, not necessarily discourse so. Recall should have been very high, and precision in the samples extracted from each set of hits was surprisingly high too, at 82–83%.9 This means that the more-than-doubling in frequency shown in Table 2 is probably a safe indication of a steep rise in turn-initial discourse so.

Table 2.  Turn-initial so in two conversational BNC corpora

Measure | Spoken BNC1994DS | Spoken BNC2014
Total tokens including punctuation (CQPweb) | 5,014,655 | 11,422,617
Total ‘words’ excluding punctuation* | 4,233,962 | 11,209,172
Turn-initial so | 8,743 | 42,007
Frequency per million tokens (CQPweb) | 1,743.49 | 3,677.53

* The figure for Spoken BNC1994DS is given by the BNCweb interface. The figure for Spoken BNC2014 is calculated by subtracting the number of tokens tagged as _Y*, namely 213,438, of which all but 7 are question marks, virtually the only punctuation used in that corpus.

The hits were randomly thinned down to 200 from each corpus, then classified according to the scheme of Table 1 (or as non-discourse or non-initial so).10 Examination proceeded until 100 examples of turn-initial discourse so had been found. The results are shown in Table 3. The two directly causal functions, indicating a result and drawing a conclusion, have between them dropped by 40%, though the numbers are too small for precise claims. However, a fall is consistent with a continued shift of so away from marking logical inference and towards a more general core meaning, something like ‘in the light of what has been said previously’.11 As already noted, overall use of turn-initial so has more than doubled between the two BNC corpora. An interesting example in my Spoken BNC1994DS sample is (9), involving overlapping speech (signalled by ) between mother and son:

(9) Jane   1661  Which reminds me you still haven’t written to Geoffrey and Jean to thank them for your birthday money have we?
    David  1662  So mum, I know I haven’t.
    Jane   1663  Or have you, which is more to the point.  (KCH 1659)

5. I am grateful to an anonymous referee for this point.

6. BNC is available with BNCweb search software (Hoffmann et al. 1996–2010) at Manchester and Lancaster Universities. ‘Word counts’ are in fact token counts, as is normal with tagged corpora (e.g. don’t is split into a stem and a clitic token). The software CQPweb (Hardie 2013) was developed from BNCweb, and the CQPweb server at Lancaster hosts a wide range of corpora, including both BNC and Spoken BNC2014, but note that its token counts include punctuation. Search syntax and results are identical in both packages; only the frequencies differ.

7. The corpus was developed at the Universities of Warwick and Reading under the directorship of Hilary Nesi and Paul Thompson. Corpus development was assisted by funding from BALEAP, EURALEX, the British Academy and the Arts and Humanities Research Council. The file pssem009.xml is missing from the claimed 200 files, never having been transcribed (Hilary Nesi, pers. comm., 1 September 2019).

8. Search term “ ([] ){0,2} ([pos="PUN"]?)? [word="so"%c] within u”.

9. The precision figures taken from Table 3 below are simply the ratio of genuine examples of turn-initial discourse so in each sample (100) to the number of hits that had to be examined before 100 genuine examples had been found (120 and 122, respectively). The percentages serve as an estimate of the precision of the full search results reported in Table 2.

10. Hits were discounted as being non-initial if something other than an agreement marker or similar preceded so, or if the turn was in fact a continuation interrupted by overlap with another speaker.

11. Or perhaps sometimes ‘in the light of our shared contextual knowledge’, given the occasional use of discourse-initial so, as in (8). This is probably uncommon, though, to judge from the low figures for introducing a section of the discourse in my samples.




Table 3.  Comparison of turn-initial discourse so in Spoken BNC1994DS and Spoken BNC2014

Discourse marker function | Spoken BNC1994DS | Spoken BNC2014
Indicate a result | 2 | 1
Draw a conclusion | 18 | 11
Prompt | 0 | 0
Hold the floor | 10 | 20
Introduce an invited explanation [explanatory so] | 0 | 0
Introduce a summary | 6 | 4
Introduce a section of the discourse | 3 | 1
Indicate a shift back to a higher unit of the discourse | 11 | 3
Introduce a new sequence | 30 | 16
Introduce elaboration | 14 | 42
Mark self-correction | 2 | 0
Not classified | 4 | 2
Subtotal | 100 | 100
Non-discourse or non-initial so | 20 | 22
Total hits examined | 120 | 122
Precision in sample examined | 83% | 82%

David’s so seems to carry something of the interpersonal emotive charge seen in (8), and it does not follow in any obvious way from the previous discourse. As its function is at present unclear to me, I marked it as ‘not classified’. There were no examples of explanatory so in my thinned set of 200 Spoken BNC1994DS hits, nor did I notice any in other, relatively cursory searches of the BNC. There were none in my thinned set of 200 Spoken BNC2014 hits either, but explanatory so does occur in the full set of hits, for example in (10).

(10) S0018: but do you put the water inside the ball ?
     S0146: yeah so you put all the potatoey stuff in and then you just fill up the rest of it with the tamarind water […]  (S4L9)
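The turn-initial search used for both conversational corpora (start of turn, up to two optional tokens, an optional punctuation token, then so; see note 8) can be approximated outside CQP by readers who want to try it on their own transcripts. The Python sketch below is illustrative only and is not the query actually run. It assumes each turn is already available as a plain string, and it uses a simplified tokeniser and an assumed set of punctuation tokens in place of the corpora's own tagging. Like the original query, it finds turn-initial so rather than discourse so as such, so its hits would still need manual classification.

```python
import re

# Assumed stand-in for CQP's punctuation tag (pos="PUN"); not the corpora's tagset.
PUNCT = {",", ".", "?", "!", ";", ":"}


def tokenize(turn: str) -> list[str]:
    """Simplified tokeniser: words (keeping straight-apostrophe clitics) and single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", turn)


def is_turn_initial_so(turn: str, max_prefix: int = 2) -> bool:
    """Approximate the query: up to `max_prefix` tokens of any kind,
    then an optional punctuation token, then 'so' (case-insensitive)."""
    tokens = [t.lower() for t in tokenize(turn)]
    for k in range(max_prefix + 1):  # k = number of leading tokens skipped
        if k < len(tokens) and tokens[k] == "so":
            return True
        if k + 1 < len(tokens) and tokens[k] in PUNCT and tokens[k + 1] == "so":
            return True
    return False


# Hypothetical turns, adapted from examples quoted in this chapter
turns = [
    "yeah so you put all the potatoey stuff in",
    "So mum, I know I haven't.",
    "okay, so basically we decided",
    "but do you put the water inside the ball ?",
    "oh well right so",  # three ordinary tokens precede so, so no match
]
hits = [t for t in turns if is_turn_initial_so(t)]
print(len(hits), "of", len(turns), "turns match")
```

Here the first three turns match. The fourth contains no so at all, and in the fifth so is preceded by three ordinary words rather than at most two words plus punctuation.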

4.2 Academic speech: BASE

The whole BASE corpus of 1,644,942 tokens12 was loaded into the MonoConc Pro concordancer, and a regex search was run for so as the first word of a turn, possibly preceded by one or two other words and/or pauses.13 A word preceding so was typically a marker of assent such as okay, right, yeah or yes, but also sorry or possibly, or else so was repeated. There were 313 hits in the whole corpus, a rate of 190.28 pmw, which is far lower than in either of the conversation corpora. Hits were examined and filtered until 100 examples of turn-initial discourse so had been found, and these were classified as before. The results are shown in Table 4.

12. MonoConc Pro gives a total of 1,633,617 words, including filled pauses; the difference is unimportant.

Table 4.  Turn-initial discourse so in BASE

Discourse marker function | BASE
Indicate a result | 2
Draw a conclusion | 10
Prompt | 0
Hold the floor | 6
Introduce an invited explanation [explanatory so] | 0
Introduce a summary | 22
Introduce a section of the discourse | 1
Indicate a shift back to a higher unit of the discourse | 9
Introduce a new sequence | 19
Introduce elaboration | 29
Mark self-correction | 0
Not classified | 2
Subtotal | 100
Non-discourse or non-initial so | 6
Total hits examined | 106
Precision in sample examined | 94%
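The normalised frequencies and sample-precision figures reported for the three corpora are simple ratios: hits per million tokens, and genuine examples divided by hits examined. The Python sketch below recomputes them from the counts given in Tables 2–4 and Section 4.2, together with the 40% fall in the two causal functions in Table 3. The function names are illustrative only. For BASE, the denominator is the 1,644,942-token figure used in the text, not the MonoConc Pro word count in note 12.

```python
def per_million(hits: int, tokens: int) -> float:
    """Normalised frequency: hits per million tokens."""
    return hits / tokens * 1_000_000


def sample_precision(genuine: int, examined: int) -> float:
    """Share of examined hits that proved to be turn-initial discourse so."""
    return genuine / examined


# Counts as reported in Tables 2-4 and Section 4.2 (100 genuine examples per sample)
corpora = {
    "Spoken BNC1994DS": {"hits": 8_743, "tokens": 5_014_655, "examined": 120},
    "Spoken BNC2014": {"hits": 42_007, "tokens": 11_422_617, "examined": 122},
    "BASE": {"hits": 313, "tokens": 1_644_942, "examined": 106},
}

for name, c in corpora.items():
    pmw = per_million(c["hits"], c["tokens"])
    prec = sample_precision(100, c["examined"])
    print(f"{name}: {pmw:,.2f} per million tokens, precision {prec:.0%}")

# Table 3: 'Indicate a result' + 'Draw a conclusion' in each 100-example sample
causal_1994 = 2 + 18
causal_2014 = 1 + 11
print(f"Fall in causal functions: {1 - causal_2014 / causal_1994:.0%}")
```

Run as is, the loop reproduces the published values. These are 1,743.49, 3,677.53 and 190.28 per million tokens, with precision of 83%, 82% and 94%. The final line gives the 40% fall. The ratio of the two conversational rates, about 2.1, is the more-than-doubling noted in Section 4.1.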

There were some difficulties of classification. For example, although (11) did indicate a shift to a higher unit, repeated hesitation by staff member nm0285 led me to code his so as primarily holding the floor. The trickiest case was (12), in the middle of a group presentation, where student sf5092’s so was open to several different analyses: introducing a new sequence, a summary of preceding utterances, or elaboration. I even considered explanatory so, given that she received an invitation to take over at the end of the preceding turn. However, this was not her opening contribution, and introducing a new sequence was the clear function of a fellow-student’s turn-initial so later in the presentation, so that was the primary function recorded for (12).

(11) yes but you are advising
     so so we we we tell you what our opinion is and you can modify your proposal if you wish […]  (ahsem008)

(12) yeah we all decided in one voice one feeling one soul we decided that we can’t judge yet the the Cuban Revolution because # it hasn’t finished yet it’s still running Castro will wake up this morning he will have his cigar and everything so it hasn’t finished that’s the point so continue okay so basically we decided that according to the initial aims that we thought the revolutionaries had when they first came into power in nineteen-fifty-nine # seems to have been a failure # one of the main aims they seem to have had was to restore democracy to restore the constitution of nineteen-forty nineteen-forty  (ahsem003)

13. Search term “]+?> ((#|]+?>)[ ])?([a-z]+[ ]((#|]+?>)[ ])?){0,2} so\W”. Note that ‘’ (for ‘utterance’) marks the start of a speaker turn, and ‘#’ represents a filled pause. In BASE there can be consecutive turns by the same speaker, resuming after an audio or video demonstration or a long pause.

So was hardly ever used to introduce a response (4%), and no examples were classified as explanatory so, whether in the sample of 100 or in the remaining 213 hits. Absence from a modest-sized corpus is not proof of non-existence, of course, but it is suggestive.

4.3 Broadcasting

A recently released historical corpus of TV episodes from six English-speaking regions (Davies 2019) offers a promising 325 million words. The left-hand portion of Figure 1 plots the frequency over time of the string ‘? So’ across all TV shows in all regions. This string serves as a very crude proxy for a context where explanatory so might appear. There is a fall after the 1950s, followed by a steady increase from the 1960s to a frequency well over three times higher in the 2010s. The last two columns compare hits over the whole period in two of the three regional groupings: the overall frequency is 19% higher in USA/Canada than in UK/Ireland. Inspection suggests that most of the 54,472 hits are of discourse so (a higher proportion in the last decade of the corpus than in the first). However, the overwhelming majority are not explanatory so, and indeed many are themselves questions. It is not practical to do any statistical work on discourse types. Search tools are limited and, crucially, the corpus does not mark speakers, turns or even scene changes. The context of a sentence-initial so can therefore only be inferred, and the so-sentence may not even be a response to the preceding question.


[Figure 1 (bar chart): frequency per million words (y-axis, 0–250) of initial so after a question mark, by decade from the 1950s to the 2010s, followed by overall figures for US/CA and UK/IE]

Figure 1.  Initial so after question mark in the TV Corpus; figure adapted from Davies (2019)

A ‘virtual corpus’ of UK documentaries from the TV Corpus attests to the currency of explanatory so:

(13) A. And you’re getting these things up to what sort of height?
     B. So, we’re interested at the height of about 10km, but they continue on up to about 25–30km.  (2016 BBC TV Horizon, episode “Ice Station Antarctica”)

There are about a dozen such examples, all recent. For Canadian (and to some extent US) English, Schlegl (2018: 14–16) assembled a corpus of broadcast material spanning five time periods from 1951 to 2018 (size not stated).14 She tracks a number of discourse markers in a sample from the corpus, including well and so, and applies a mixed-effects regression model to the data. Her observations include two of particular interest. The first is a striking decrease from the 1970s to 2018 in well as a marker of spontaneous new topics (50% to 16%), with a corresponding increase in so (25% to 66%) (Schlegl 2018: 30). The second is that utterance-initial so,15 although present in her oldest data to mark ongoing speech, is being extended “primarily by young women” into marking new turns and new topics, and that “the same uses […] are apparently metalinguistically salient to members of the [Canadian] speech community” (2018: 36–7).

14. I am grateful to Lisa Schlegl for letting me see her dissertation, and to Heike Pichler (pers. comm. 14 March 2019) for putting me in touch with her. Note too that Pichler was interviewed on BBC Radio 4’s Feedback programme on 3 November and 10 November 2017 about listeners’ complaints about language annoyances in broadcasts, among them the use of so (Dixon 2017).

15. ‘Utterance’ is not defined but is not necessarily a whole turn by one speaker.

5. Origins of explanatory so

Our analyses of recent British corpora help to date the advent of explanatory so and give evidence of its precursor functions. Explanatory so is probably absent from Spoken BNC1994DS and from BASE (despite example 12). It is present in Spoken BNC2014, in examples that fall outside the sample counted for Table 3. I briefly review some secondary literature. Explanatory so is not described in Schiffrin’s (1987) path-breaking study of discourse markers, which includes other uses of so. Nor is it recognised in any of the standard reference grammars (Quirk et al. 1985; Biber et al. 1999; Huddleston & Pullum 2002), and precisely that usage is not yet in the Oxford English Dictionary (but then it is not typically a written usage). Shortly after the turn of the millennium, Bolden (2003a, 2003b, 2008, 2009, etc.) began a series of conference papers and publications on so in American conversation, though not specifically on explanatory so. Blevins (2015), apparently American, offers a level-headed survey of all kinds of “sentence-initial so” in history. She presents the following as a “commonly touted theory” about its recent spread, a theory which apparently targets something close to explanatory so:

This was first noted by Michael Lewis in The New New Thing (1999) – “When a computer programmer answers a question, he often begins with the word ‘so’.” As to how this came about, it is thought that given the international composition of the typical Silicon Valley work site, where a large number did not speak English as their first language, it became the simple “catchall” word of transition. Over time and frequent usage, it eventually became like a tic and just part of the common speech pattern of those in that industry and then spreading beyond.  (Blevins 2015: n.p.)

All of this suggests that explanatory so is largely a twenty-first-century development in North America. Public reaction in the UK does not start until well into the 2010s. I have no data of my own to test two suggested origins. One is a computer programmers’ lingua franca in California as the starting point of the usage. The other is Schlegl’s better-supported suggestion (Section 4.3 above) of a spread led by young women, at least in Canada. My concern in this chapter is with British English. It is quite possible that explanatory so in Britain was introduced or reinforced by North American influence, whether in academic usage or via the media. The alternative explanation, not incompatible with the first, is that it arose naturally within British English. Indeed a dual source seems likely.


For a possible internal origin I suggest the following scenario. Turn-initial so has been on a long-term trajectory of increase at the expense of well and other discourse particles in speech. For a speaker conveying expertise, so has a particular advantage over well: rather than sounding vague or tentative, it briskly suggests a logical basis for what follows. In my discourse so data, the function ‘Introduce a new sequence’ is quite common (16 to 30% in the three corpora), whereas ‘Introduce a section’ is not (1 to 3%). Explanatory so seems to combine elements of these two textual functions: from the former, a step within an exposition; from the latter, an opening gambit. Its interpersonal function of ‘Accept an invitation to take the floor’ is an extension of the interpersonal function ‘Hold the floor’ (see Table 1). That function itself appears to have become more common, rising from 10% in Spoken BNC1994DS to 20% in Spoken BNC2014. This whole complex of functions has come to be associated with the giving of explanations. Whatever the precise source of explanatory so, my strong impression is that it has become common in academic circles for explanations and research presentations, which may partly explain its appearance in broadcast interviews.

6. Public reaction to explanatory so

There is an interesting interaction between language change and language attitudes. Explanatory so hits a nerve in some conservative speakers, whereas other innovations pass unnoticed or at least unchallenged. The usage has attracted mostly hostile media discussion. Examples include an interview with the radio presenter John Humphrys (Creighton 2015), a leader in The Times (anon. 2017), and the radio/podcast appearances by Pichler (Dixon 2017) and Gopnik (2018);16 compare Shariatmadari (2017) and Liberman (2010).
Dr Bernard Lamb, President of the Queen’s English Society, is quoted as follows:

I think [the use of ‘so’ at the beginning of a sentence] is a sign of someone who is not particularly fluent, it’s fulfilling the function of ‘ummm’ and ‘errrr’ and giving the person a bit longer. It’s not being used as a conjunction to join things up, which is how it should be used. I think someone started doing it and then other people have begun slavishly copying it, it becomes fashionable. It’s just carelessness, it doesn’t have any meaning when used this way. (Creighton 2015: n.p.)

The quotation implies that in so-called ‘good’ or ‘correct’ usage, so is always and only a subordinating conjunction, which is not true. Indeed in everyday speech, conjunction uses are only a small minority of turn-initial so.17 Nor is it sufficient to refer only to the word’s part of speech and syntax: its semantics, pragmatics and discourse functions must be considered as well. As for “giving the person a bit longer”, that is what a discourse grammarian would call “holding the floor”. Why, though, is there such a reaction? I offer the following explanation, framed somewhat simplistically in terms of ‘conservative’ and ‘advanced’ users. The conservative user does not themself have explanatory so in their repertoire. When they hear it, it is highly salient to them, since it typically comes as the very first word uttered by an advanced speaker.18 The Recency and Frequency Illusions (Zwicky 2005) kick in to amplify the salience (and possible annoyance), leading to the assumption that the usage is more frequent, and perhaps more recent, than it actually is. Language peevers rarely distinguish between the grammars of spoken and written English, and the quotation above unrealistically demands that speakers ‘talk like a book’. In an informal setting it would actually be rather unnatural to launch into an explanation without an introductory discourse particle. Note that the conservative user is unlikely to be riled by hearing so introducing a question from an interviewer, as in this example from the BNC:

(14) So are we are we talking er do you see this as a as a as a launch pad?  (J9X 460)

16. Reference due to Stefan Krasowski in a comment on Liberman (2019).

That usage has long been common, either with a separate intonation contour and rising intonation, or unstressed as part of the intonation contour of the following clause. What seems to annoy the conservatives is so before the answer. Now, explanatory utterances can involve clauses of reason, result or purpose. Among the many possible forms of expression, so and because can act as approximate converses:

(15) He went for a walk because he was bored.
(16) He was bored, so he went for a walk.

See also Quirk et al. (1985: 1108–9) and Schiffrin (1987: Chapter 7). Increasingly, however, a whole explanation can be prefaced by so, as we have seen. Recall example (1), repeated and amplified below as (17):

(17) A. Why do you live in Manchester if your job is in Bristol?
     B. So my girlfriend lives here. […]

17. Among the hits examined there are 5 (4.2%) result or purpose clauses in Spoken BNC1994DS and 3 (2.5%) in Spoken BNC2014. The proportion might be higher if so-clauses in mid-turn were included.

18. Compare a use of because as a discourse marker by several UK radio presenters without any real causal link to the preceding clause:

(i) And um is that piece typical of what’s on the album, because it’s the biggest piece, it’s the centre of the album, around which the smaller pieces are in orbit, she says. um Is the whole thing effectively a single piece?  (Andrew McGregor, in conversation with Sara Mohr-Pietsch, “Record Review”, BBC Radio 3, 2 March 2019, 1:59:08)

If there is any causal sense, it is metatextual (‘the reason I am asking is’). The usage aids fluency and continuity, and in mid-turn and indeed mid-sentence position it is far less salient than explanatory so. It does not, as far as I know, provoke listener complaints.

Contrast the reply with what were previously more conventional alternatives, all still in use and in variation with so, such as:

(18) a. B. Because my girlfriend lives here.
     b. B. Well/You see/Actually/The reason is/… , my girlfriend lives here.
     c. B. My girlfriend lives here.

To the conservative language user, so in (17) is being misused. The sentence pattern is like that of (18a): a subordinate clause whose main clause is understood from the preceding turn. The lexical choice of clause-introducer, however, is apparently like that of (16). But in (16) the second clause expresses result, not reason. I speculate that one or both of the following conditions must obtain for people to have a hostile reaction to a linguistic innovation, given, of course, that they have noticed it at all:

(19) a. They (think they) remember having been taught that it’s an error.
     b. They can use those words themselves, but not with that meaning or function.

For explanatory so, it is condition (19b) that is satisfied. Note that an unfamiliar word, even though noticeable, satisfies neither condition and is probably less likely to be complained about. Some conservative speakers have gone some way towards recognising the new use of so. A telling passage occurs in the following online advice, recommending the avoidance of initial so in business presentations (Thurman 2014: n.p.):

That little head cock, slight furrowing of the brow, and set-up with “so” says to your audience, “I’m trying to dumb this down so someone like you may have at least a chance of comprehending the importance of what I do.”

In other words, explanatory so is said to come over as patronising and condescending. Gopnik (2018) has the same interpretation. What I observe, however, is that many people use turn-initial so quite freely when giving explanations. That is, after all, one of the main jobs of someone in academic life, and they do so without the condescension that Thurman and Gopnik detect. For these speakers it is unstressed and apparently unremarkable (to them), a polite but crisply efficient signal that they accept an invitation to take the floor. I don’t think it necessarily comes with a pause (though cf. example 2 above). Nor do I find what Bob Ladd reported some years ago (comment dated 25 August 2010 below Liberman 2010):

The pause mentioned by a couple of people is also relevant – the usage in question usually has a prolonged vowel, on a fairly steady level pitch, followed by a pause before the rest of the discourse begins.

The discourse semantics of explanatory so conveys the acceptance of an invitation to speak, and it signals that what follows is an explanation. It occurs frequently now without any sense on either side that a requested answer requires specialist knowledge or training. No doubt there are cases where a speaker feels they have a superior right to offer an explanation and perhaps also that their expertise should be acknowledged, but that would be situationally contingent – what we might call discourse pragmatics. I suggest that it is not a function of explanatory so.

7. Envoi

Explanatory so has become widespread in North America, and in the last decade or so in Britain, where it is likely to have developed at least in part out of existing uses of discourse so, as explored in Section 5. However, those other uses remain available and in many cases frequent. The same speakers may use so, for instance, to resume a narrative, to give a summary, to hold the floor. The explanatory function appears in a context that is particularly noticeable to non-users, but it is not far removed from other functions of longer standing. Explanatory so is a distinct, identifiable usage that deserves a label of its own – in the modern parlance, ‘explanatory so is a thing’ – but it is after all neither an isolated nor a surprising development.

Acknowledgment

I hope Merja Kytö, indefatigably generous contributor to the world of English linguistics, will appreciate this sketch, with its various dialogues: between speakers, between syntax and discourse or pragmatics, between usage and attitudes. I am grateful to reviewers and editors for pushing me in the right, or at least a better, direction. I wish also to thank audiences at the British Library (‘English Grammar Day’, July 2019) and Université Paris 3 (‘New perspectives on language change and variation in the history of English’, October 2019) for comments on different oral versions of the material.

224 David Denison

References

anon. 2017. So what? Well, it’s time to brush up (and vary) your verbal tics. The Times, 15 November 2017.
Biber, D., Johansson, S., Leech, G., Conrad, S. & Finegan, E. 1999. Longman Grammar of Spoken and Written English. Harlow: Pearson.
Blevins, M. 2015. So, when did we start introducing sentences with so? Today I found out: Feed your brain. (15 May 2020).
BNC. 2007. The British National Corpus, version 3, BNC XML edn. Oxford University Computing Services on behalf of the BNC Consortium.
Bolden, G. B. 2003a. Self and the other: The use of “oh” and “so” in sequence-initial position. Paper presented at Annual Conference on Language, Interaction, and Social Organization, Santa Barbara CA.
Bolden, G. B. 2003b. The use of “so” and “oh” in sequence-initial position. Paper presented at Annual Conference of the National Communication Association, Miami FL.
Bolden, G. B. 2008. “So what’s up?”: Using the discourse marker so to launch conversational business [Jul–Sep]. Research on Language and Social Interaction 41(3): 302–337.
Bolden, G. B. 2009. Implementing incipient actions: The discourse marker ‘so’ in English conversation. Journal of Pragmatics 41(5): 974–998.
Brinton, L. J. 2006. Pathways in the development of pragmatic markers in English. In The Handbook of the History of English, A. van Kemenade & B. Los (eds), 307–334. Oxford: Blackwell.
Buysse, L. 2012. So as a multifunctional discourse marker in native and learner speech. Journal of Pragmatics 44(13): 1764–1782.
Creighton, S. 2015. SO wrong! Why John Humphrys is in a rage at such a little word after it invades everyday speech. Mail Online, 20 June 2015. (30 March 2019).
Davies, M. 2019. TV Corpus. 325 Million Words, 1950–2018. Brigham Young University. (15 May 2020).
Dixon, K. 2017. Feedback, presented by Roger Bolton, with guest Heike Pichler. In Feedback podcast, BBC Radio.
Gopnik, A. 2018. On prefixes. In A Point of View, BBC Radio 4.
Halliday, M. A. K. & Matthiessen, C. M. I. M. 2004. An Introduction to Functional Grammar. London: Edward Arnold.
Hardie, A. 2013. CQPweb – Combining power, flexibility and usability in a corpus analysis tool. International Journal of Corpus Linguistics 17(3): 380–409.
Hoffmann, S., Evert, S., Lehmann, H.-M. & Schneider, P. 1996–2010. BNCweb (CQP-Edition): A web-based interface to the British National Corpus (2009). (15 May 2020).
Huddleston, R. & Pullum, G. K. 2002. The Cambridge Grammar of the English Language. Cambridge: CUP.
Lewis, M. 1999[2012]. The New New Thing: A Silicon Valley Story. New York NY: Norton.
Liberman, M. 2010. So new? Language Log. (13 April 2019).
Liberman, M. 2019. So. Language Log. (26 December 2019).
Love, R., Dembry, C., Hardie, A., Brezina, V. & McEnery, T. 2017. The Spoken BNC2014: Designing and building a spoken corpus of everyday conversations. International Journal of Corpus Linguistics 22(3): 319–344.
Nesi, H. & Thompson, P. 2000–2005. BASE: British Academic Spoken English. Warwick: University of Warwick.
Quirk, R., Greenbaum, S., Leech, G. & Svartvik, J. 1985. A Comprehensive Grammar of the English Language. London: Longman.
Schiffrin, D. 1987. Discourse Markers. Cambridge: CUP.
Schlegl, L. 2018. Tracking Change in Canadian English Utterance-initial Discourse Markers. MA dissertation, University of Toronto.
Shariatmadari, D. 2017. So, what’s the problem with ‘so’? (15 November). The Guardian.
Thurman, H. 2014. How a popular two-letter word is undermining your credibility. Fast Company. (26 April 2019).
Zwicky, A. 2005. Just between Dr. Language and I. Language Log. (4 May 2006).

Chapter 14

Return to the future: Exploring spoken language in the BNC and BNC2014

Ylva Berglund Prytz
University of Oxford

This study explores the use of expressions of future in two spoken corpora: BNC XML, from 1994, and BNC2014. It is a first, explorative venture aiming to identify features that merit further study. To that end, the initial focus is on two features shown to vary between corpora: the frequency and proportions of the different expressions, and their collocations with personal pronouns. The results lend support to previous suggestions about changes in how future reference is expressed in English. They also show that the new BNC2014 offers exciting opportunities for studies of Present-day British English. In combination with the demographically sampled component of the original BNC, it becomes a useful resource for exploring very recent language change.

Keywords: expressions of future, spoken data, diachronic change, pronouns, BNC1994 and BNC2014

Que será, será
Whatever will be, will be
The future’s not ours to see
Que será, será
What will be, will be

(Livingston & Evans 1955)

1. Introduction

“The future’s not ours to see” sang Doris Day in 1956 (Livingston & Evans 1955). It may well be that we cannot look into the future to determine our fate, but there are other ways we can look at the future. This chapter aims to do so in two ways. Firstly, it will explore the use of certain means employed to express reference to the future in English (here referred to as ‘expressions of future’). Secondly, by doing so, it will identify a number of future opportunities in the use of old and new corpora of spoken English.

https://doi.org/10.1075/scl.97.14ber © 2020 John Benjamins Publishing Company

228 Ylva Berglund Prytz

The study is an introductory exploration of data from the Spoken BNC2014 and BNC corpora, looking at the use of expressions of future across the last 20 years (1994–2014). It focuses on discovering similarities and differences between the two datasets with regard to the overall distribution of expressions of future and to collocational patterns with personal pronouns. It is a first, explorative venture focussing on outlining quantitative patterns and identifying features that merit further study. The chapter contains an introduction, a methods and conventions section, and two sections presenting the results of the introductory investigations, followed by a summary and conclusion.

1.1 Background

Speakers of English can refer to the future in a number of ways. Biber et al. (1999: 456) note that “there is no formal future tense in English” and suggest that “future time is typically marked in the verb phrase by modal or semi-modal verbs such as will, shall, and be going to”. Studies of the English future usually focus on these expressions, though some add the present progressive and simple present to the list, as well as other modal verbs (for examples and discussion, see Palmer 1965, 1974; Quirk, Greenbaum, Leech & Svartvik 1985; Crystal 1995; Huddleston & Pullum 2002; Berglund 2005). The present study focuses on the modal and semi-modal constructions listed by Biber et al. (1999). In studies of the different means of future reference in English, it is a well-documented observation that going to is more common in spoken than in written language. It has also been argued that going to is becoming more frequent over time and that the use of shall is increasingly marginal. Until now, however, studies of use across time have by necessity focussed mainly on developments in written language. Without access to comparable spoken data, it has been difficult to examine how usage in spoken language has changed over time.1 Fortunately, recent developments in the creation of spoken corpora mean that it is now possible to explore the use of these expressions in comparable spoken material from different times in the near past. When the British National Corpus (BNC) was first released in 1994, it changed the character and scope of corpus-based studies of contemporary English.

1. It is recognised that a number of initiatives have compiled valuable corpora of speech-related and interactive written material that can serve as a useful means to explore non-written language across time, for example A Corpus of English Dialogues 1560–1760 (CED) compiled under the supervision of Merja Kytö (Uppsala University) and Jonathan Culpeper (Lancaster University). See Kytö et al. for details.



Chapter 14.  Return to the future 229

Comprising 100 million words, its size was perhaps the most immediately noticed new feature. Less noticeable but equally important were the carefully considered compilation methods and sampling strategies used to create this generally available, very large, balanced, reference corpus. Another prominent and appreciated feature of the BNC is that it contains a considerable amount of spoken language, over 10 million words in total. Since its release, the BNC, and not least its spoken component, has been a major feature of research on contemporary British English (see, for example, the discussion in Love, Dembry, Hardie, Brezina & McEnery 2017). Although other large corpus compilation projects have followed that of the BNC, none has had the same spread or impact. It is increasingly realised, though, that however large or balanced the corpus may be, a corpus like the BNC, which contains material from 1960 to 1993, is not necessarily a representative sample of the language used today. It is obvious that certain types of text are not included in the corpus (such as social media material, email, online discussions) and that particular vocabulary will not be found (for example new terms like blog), while certain terms are now used in a different sense (such as chat, like, and web). It is difficult to say to what extent the BNC can still be considered ‘contemporary’ or be useful as a source for studies of present-day English without knowing what the language looks like today. The emergence of the BNC2014 makes it possible to find out. The BNC2014 is a joint initiative between Cambridge University Press and the ESRC Centre for Corpus Approaches to Social Science at Lancaster University.
The project is still ongoing but the spoken component of the BNC2014 is now available to researchers across the world, offering a tempting opportunity to explore very recent, spoken British English and get an insight into how the language has changed, or not, since the creation of the BNC. The Spoken BNC2014 is a corpus of over 11 million words, orthographically transcribed from audio recordings. It matches the size of the spoken component of the original BNC, but the methods used to gather the data are slightly different. The spoken component of the BNC consists of two kinds of spoken material: spontaneous conversations recorded by demographically sampled volunteers carrying a portable audio recorder to capture their spoken interactions, and material recorded in more formal contexts, such as meetings and talks (Aston & Burnard 1998; Burnard 2007). By contrast, BNC2014 only contains material recorded by volunteers who capture their conversations, similar to the demographically sampled part of the original BNC. For this study, only the conversational material will be considered, about 4 million words from the original BNC and 11 million words from the newer BNC2014. These samples will here be referred to as BNC1994 and BNC2014 respectively.


2. Method and conventions

For this study, the data was explored using the CQPweb platform at Lancaster.2 References and frequencies are as extracted through the platform and may differ slightly from those given in the corpus manuals. Unless otherwise stated, frequencies are relative and normalised, usually showing instances per million words (pmw). Corpus references are in the format “Corpus text: sentence”, for example BNC2014 S23A: 1264, denoting the BNC2014 corpus, text S23A, sentence 1264. Data for this study was retrieved through separate searches for the following constructions: will, wo n’t, ’ll, shall, sha n’t, going to, gon na.3 The results for will only include instances where the verb is tagged as a modal, plus the occurrences of won’t.4 The form ’ll is included as a separate expression, primarily motivated by its different distribution patterns (see Berglund 2005 for further discussion and justification). All instances of shall (including shan’t) are considered, irrespective of the level of futurity expressed. For going to, results only include instances where to is tagged as an infinitival marker. Gonna is included as a separate expression, in parallel to the treatment of ’ll. It has been argued that will and going to “… are often interchangeable, or can be substituted for each other with only the faintest change of meaning” (Coates 1983: 201). To be able to compare the use of going to and gonna with the other expressions of future, it was necessary to exclude instances of the semi-modal that are not comparable in their temporal reference, such as example (1). In the interest of time and efficiency, this was done by only including instances of going to and gonna that are preceded by a present-tense form of the auxiliary verb BE within 1–3 words to the left (examples 2–4).

(1) you were gonna set bloody fire to me (BNC1994 KB1: 209)
 Cannot be substituted. (Not included)

(2) But are we going to inflict that on you? (BNC1994 KB0: 1703)
 Can be substituted: But will we inflict that on you? (Included)

2. I am indebted and grateful to the Lancaster crew for providing access to the data.
3. In the corpora, some orthographic units have been split into two to allow the word-class tagging of the separate constituents (for example can’t = modal verb ca(n) + negation n’t). In such cases, it is necessary to adapt the search and retrieve wo n’t, sha n’t, and gon + na. These have, however, been rendered in this presentation as the orthographic units won’t, shan’t, and gonna. See Burnard (2007: 6.2) or Leech & Smith (2000) for further description, explanation and examples.
4. Will is also found tagged as a singular noun, proper noun and main verb.




(3) I’m gonna check it out I think (BNC2014 S23A: 1264)
 Can be substituted: I will check it out I think (Included)

(4) When are we gonna take the tree down? (BNC2014 S2EF: 1822)
 Can be substituted: When will we take the tree down? (Included)
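The inclusion rule for going to and gonna (a present-tense form of BE within one to three words to the left) can be sketched as a simple token-window check. This is a minimal illustration, not the CQPweb query used in the study. It assumes the corpus tokenisation splits contractions as described in footnote 3 (I ’m, gon na, written here with straight apostrophes), and the set of present-tense BE forms is a hypothetical choice for illustration. The study’s further requirement that to be tagged as an infinitival marker is not modelled, and ’s will also match possessive or has, which is one way a filter like this can over-count.

```python
# Sketch of the "present-tense BE within 1-3 words to the left" filter.
# Assumptions (not from the study): input is a list of tokens with
# contractions already split ("I 'm", "gon na"), and PRESENT_BE is a
# hypothetical list of present-tense BE forms. The POS-tag check on "to"
# used in the study is not modelled here.

PRESENT_BE = {"am", "is", "are", "'m", "'s", "'re"}
SEMI_MODALS = {("going", "to"), ("gon", "na")}


def semi_modal_hits(tokens: list[str], window: int = 3) -> list[int]:
    """Return the start index of each 'going to' / 'gon na' that has a
    present-tense BE form within `window` tokens to its left."""
    toks = [t.lower() for t in tokens]
    hits = []
    for i in range(len(toks) - 1):
        if (toks[i], toks[i + 1]) not in SEMI_MODALS:
            continue
        left_context = toks[max(0, i - window):i]
        if any(t in PRESENT_BE for t in left_context):
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Examples (1)-(4), re-tokenised by hand for illustration.
    examples = {
        "(1)": "you were gon na set bloody fire to me",
        "(2)": "But are we going to inflict that on you ?",
        "(3)": "I 'm gon na check it out I think",
        "(4)": "When are we gon na take the tree down ?",
    }
    for label, sentence in examples.items():
        included = bool(semi_modal_hits(sentence.split()))
        print(label, "included" if included else "not included")
```

Run on examples (1) to (4), the sketch reproduces their Included / Not included labels. A BE form four or more tokens away, or a missing auxiliary, is not found, which corresponds to the under-counting acknowledged for the method.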

It is recognised that this somewhat crude method may either over-estimate the exact frequencies (by including instances where the form of BE is not the auxiliary of the going to construction) or fail to find all instances (by ignoring instances where the auxiliary is missing or not found in any of the three positions to the left). However, manual inspection of a sample set suggests the error rate is low and the method is good enough for the present purposes. As contracted forms are included as separate expressions, it must be noted that frequencies may be affected by the way the conversations have been interpreted and transcribed. It cannot be assumed that the distinction is absolute or that a different transcriber would have made the same choice. One way to examine to what extent variation in transcription practices has caused the difference in proportions of the full and contracted forms is to refer to the sound files of the recorded conversations. Although some of them are available, for example via the CQPweb interface to BNC1994, it is far from straightforward to get a consistent measurement of the variation in practice. The pronunciation of gonna, for example, is not binary, with a clear distinction between a full and a contracted form, as pointed out by Poplack and Tagliamonte (1999: 328), who suggest that “[g]oing to actually subsumes a number of phonetically distinct forms, variously realized as goin(g)ta, gonna, gon, go”. Lorenz (2013) finds that gonna and other contracted semi-modals “are quite reliably identified” in the transcription of the American English corpus he is using, but he still found that the transcription needed to be corrected in just over 7% of the cases (2012: 43). For the current study, it has been considered sufficient to take the variants as transcribed, as no detailed study of the variation between the full and contracted forms is made.
The extent to which variation in transcription practices can be influential has been discussed previously, for example by Berglund (2005: 51–54), Krug (2000), Aston and Burnard (1998), and Crowdy (1995). The transcription of the new BNC2014 is described by Love et al. (2017).


3. Quantitative findings

3.1 Future reference in the corpora

The frequencies of the five expressions of future explored here can be found in Figure 1. As shown in the figure, the most immediate difference between the two corpora is that the overall frequency of the expressions is lower in the newer BNC2014: 6,439 instances per million words compared to 7,591 in BNC1994.

Expression   BNC1994   BNC2014
gonna          1,200     1,539
going to         419       533
shall            328       190
'll            3,824     2,837
will           1,820     1,340

Figure 1.  Expressions of future in the two corpora (frequency pmw)
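The totals quoted above, and the semi-modal shares discussed in Section 3.3, can be recomputed from the Figure 1 values. Because all the pmw figures for one corpus share the same normaliser, a proportion of pmw values equals the corresponding proportion of raw counts. The Python sketch below does that arithmetic; the helper names are illustrative only.

```python
# Figure 1 values, in instances per million words (pmw).
FIG1 = {
    "gonna":    {"BNC1994": 1200, "BNC2014": 1539},
    "going to": {"BNC1994": 419,  "BNC2014": 533},
    "shall":    {"BNC1994": 328,  "BNC2014": 190},
    "'ll":      {"BNC1994": 3824, "BNC2014": 2837},
    "will":     {"BNC1994": 1820, "BNC2014": 1340},
}
SEMI_MODAL = ("going to", "gonna")


def total(corpus: str) -> int:
    """Combined pmw of all five expressions (the 100% base for proportions)."""
    return sum(freqs[corpus] for freqs in FIG1.values())


def share(expressions: tuple[str, ...], corpus: str) -> float:
    """Proportion of the combined total taken by `expressions`."""
    return sum(FIG1[e][corpus] for e in expressions) / total(corpus)


def relative_difference(new: float, old: float) -> float:
    """Change from `old` to `new`, expressed as a fraction of `old`."""
    return (new - old) / old


if __name__ == "__main__":
    for corpus in ("BNC1994", "BNC2014"):
        print(corpus, total(corpus), f"{share(SEMI_MODAL, corpus):.1%}")
    old = share(SEMI_MODAL, "BNC1994")
    new = share(SEMI_MODAL, "BNC2014")
    print(f"relative increase: {relative_difference(new, old):.1%}")
```

This yields totals of 7,591 and 6,439 pmw and going to/gonna shares of about 21.3% and 32.2%, a relative increase of roughly 51%, consistent with the “about 50%” reported for the proportions.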

There are two potential explanations for this frequency difference. It could be that the language has changed so that reference to the future is expressed using other means. Alternatively, the material in the two corpora may differ so that the newer corpus offers less opportunity for using these expressions.

3.2 Opportunity for future reference

The extent to which reference is made to the future naturally varies with the topic of conversation, the context of the discourse and the people involved. If two corpora contain different numbers of expressions, this could indicate a difference in the composition of the datasets. If one set contains conversations about something that happened in the past and the other focuses on speculations about what is to come, the number of past tense verbs would be expected to differ, and we would expect to find more expressions of future in the latter set. As noted above, reference to the future can be made in different ways. In addition to the expressions of future examined here, other means used to refer to the future include constructions with the present progressive and simple present, as well as other modal verbs. While the expressions included in this study can be argued to contain an inherent reference to the future (no further indication of future reference is needed), other constructions usually require an explicit or implicit reference to a point in the future for this interpretation to be made. Biber et al. (1999) suggest that in such cases, the future reference is not inherent in the verb but lies in the “grammatical contexts”; the reference to the future stems from something other than the verb as such, for example a time adverbial or an adverbial clause with future time reference (1999: 455). Examples (5) to (11) show future-referring utterances from the corpora, together with versions where the adverbial is removed. The future reference remains obvious in the examples with will and going to (examples 5 to 8), but not in examples (9) to (11), where a progressive, another modal verb or a simple present is used.

(5) so this will come out later in the year (BNC1994 KDW: 6310)
 Without adverbial: so this will come out (future reference)

(6) so we will come to that later (BNC2014 SY5K: 317)
 Without adverbial: so we will come to that (future reference)

(7) your school’s going to be a polling station next week. (BNC1994 KCH: 249)
 Without adverbial: your school’s going to be a polling station (future reference)

(8) I’m going to hand it in next week (BNC1994 KE3: 315)
 Without adverbial: I’m going to hand it in (future reference)

(9) I’m not preaching anywhere tomorrow (BNC1994 KB0: 2453)
 Without adverbial: I’m not preaching anywhere (no future reference)

(10) they can have it another time (BNC2014 S47C: 339)
 Without adverbial: they can have it (no future reference)

(11) it’s half term next week isn’t it (BNC2014 SCG9: 606)
 Without adverbial: it’s half term isn’t it (no future reference)

If the difference between the corpora is due to speakers preferring not to use the will/shall/going to expressions when talking about the future, this would be expected to be mirrored in a higher relative frequency of the words used to create the grammatical contexts needed for the future reference. Figure 2 illustrates the frequencies of a selection of words and phrases used to mark future reference. It is recognised that the list is far from complete and that any conclusions are but tentative at this stage, but it is nevertheless interesting to briefly examine the results.


[Bar chart, 0–350 pmw, BNC1994 vs BNC2014: tomorrow, later, another time, another day, soon, next week, in the future, shortly]

Figure 2.  Future-referring expressions in the BNC1994 and BNC2014 corpora (pmw)

Looking at Figure 2, it is clear that the BNC1994 corpus, which had the higher frequency of expressions of future, also has a generally higher relative frequency of the future-referring terms. Although some of the terms are used more frequently in the newer corpus, these are less frequent items, like in the future and shortly, which means that they would not influence the overall frequency to any great degree. Altogether, this suggests that the older corpus contains more future-related content and that the difference in the frequency of expressions of future could be explained by the topics covered in the recorded conversations. It is not possible to say without further study whether this difference is simply due to chance selection of topics or if it is an indication of a change in the language. Can it be that we today simply prefer to talk less about the future? Or is it the case that when we choose to do so, we use other constructions than those used by speakers in 1994, forgoing also the future-referring terms explored here? Further studies of the composition of the corpora could shed more light on this phenomenon.

3.3 Proportions of expressions of future

Frequencies are of interest when studying the use, and possible change, of a language feature. Looking at frequencies is, however, not the only way to reveal interesting patterns of usage. If the expressions are interchangeable, as suggested by Coates (1983: 201), it may be assumed that, all things being equal, the increased use of one expression should be mirrored in a corresponding decrease in the use of the others. It is therefore useful to explore not only the frequencies of the expressions but also the proportions in which they are used. Figure 3 presents the proportions




   ()    gonna going to shall

















'll





will





Figure 3.  Proportions of different expressions of future in the two corpora

of the five expressions examined here, where 100% is the combined number of instances of will (+won’t), ’ll, shall (+shan’t), going to, and gonna. It is easy to see that the corpora are similar in that ’ll is the most frequent expression while shall is very infrequent in both datasets. The contracted forms ’ll and gonna are used more than the full forms will and going to, something that is not surprising in spoken language.5 More interesting, perhaps, is that the semi-modal going to/gonna accounts for a proportionately larger share in the newer corpus: 32% of the total number of expressions of future compared to just over 21% in the older BNC1994, a relative increase of about 50%. This suggests an increase in the use of the expression over time, tallying with the results presented by Mair (1997). In his study of the press categories of the LOB, Brown, FLOB, and Frown corpora, he finds that going to is used more frequently in the later corpora, FLOB and Frown. He is not alone in suggesting an increase with time in the use of going to. Aitchison (1981) comments on the rising use of going to over time, and already nearly 40 years ago she suggested that “[t]his is a construction whose progress is likely to be interesting in the next twenty or so years”, a suggestion that is also repeated in the later editions of her well-known book on language change (Aitchison 1991: 100, 2001: 110, 2013: 110). Even before that, Danchev et al. state that “[t]he construction to be going to + inf. has spread considerably during the last 50–60 years in modern English, and … this process continues” (1965: 375). The development and spread of the construction has been discussed in a number of works, including Danchev and Kytö (1994) and Krug (2000). Interestingly, none of these publications connects an increase in the use of going to with a corresponding decrease in the use of other expressions of future, nor do they mention the possibility that the increase could reflect a greater inclination by English speakers/writers to refer to the future. Szmrecsanyi (2003) compares the use of going to and will in spoken language and identifies four factors which seem to influence the choice of expression. It will be useful to explore his findings in the context of change in spoken language, something that is now possible with the availability of the comparable BNC1994 and BNC2014 data.

5. It is worth noting, once again, that transcription practices may affect the number of full and contracted forms that are given.

4. Collocations with personal pronouns

Having found that the use of the expressions of future differs between the two corpora both in overall frequency and proportion, it is useful next to look at how the expressions are used. Berglund (2005) explores linguistic association patterns of the expressions of future. It is shown that the use of expressions of future differs more between the two spoken components of the original BNC than between the written LOB and FLOB corpora from 1961 and 1991 respectively. This has been interpreted as indicating that usage varies more with type of text (conversational data vs. material recorded in more formal contexts) than with time. As it has not previously been possible to compare large amounts of similar kinds of spoken language from different periods, it has been difficult to explore how the linguistic association patterns of the expressions of future in spoken language have changed with time.
With access to BNC1994 and BNC2014, it is now possible to revisit this question. As a starting point, one specific pattern will be explored, namely the use of expressions of future with personal pronouns in the nominative: I, you, he, she, it, we, they. Berglund (2005) shows that the collocational patterns with personal pronouns vary between different expressions. Looking first at the overall distribution of pronouns in the two corpora, as shown in Figure 4, it can be seen that the newer corpus has a somewhat higher overall frequency of the pronouns, but that the frequency of the gendered third person singular pronouns he and she is lower. The greatest proportional difference between the corpora is found in the distribution of the third person plural pronoun they, which is used considerably more in the newer corpus: just over 11,000 instances per million words in BNC2014 compared to under 8,800 in BNC1994, a difference of 26%. This raises interesting




[Bar chart, 0–40,000 pmw, BNC1994 vs BNC2014: I, you, he, she, it, we, they]

Figure 4.  Personal pronouns in the BNC1994 and BNC2014 corpora (pmw)

questions relating to a potential pattern of change where the gendered third person singular pronouns are replaced by the ungendered plural pronoun, as in example (12) (for a discussion of the use of singular they, see Baron 2018).

(12) need to go downstairs and speak to the librarian downstairs and they will take it and they’ll like (.) destroy it but like they do destroy documents (BNC2014 SUVQ: 14827)

Turning to the co-occurrence of expressions of future and personal pronouns, the exploration here focuses on overall patterns, and the function of the pronouns is not considered at this stage. This frequency-based exercise finds personal pronouns that function as subjects, as objects, and at times as neither, as illustrated by examples (13) and (14).

(13) I’m her best friend I will do it (BNC2014 S632: 6393)

(14) if people want to do it they will do it (BNC2014 SDR9: 333)

Although this does not necessarily identify syntactic relationships between the constituents, it is considered sufficient for the current purpose of identifying and comparing overall patterns in the corpora, in preparation for a more detailed examination of the syntactic patterns of the different expressions, following the process recommended by Leech (2004: 75): “Once the gross frequency changes have been plotted, the next step is to investigate factors internal to the corpora that might help to explain these changes”. Looking at the co-occurrence of personal pronouns and expressions of future, it is interesting to note that although there are more pronouns in the newer BNC2014 corpus, the frequency of these pronouns co-occurring with expressions of future is lower. That could suggest that the subject is expressed by means other than personal pronouns (examples (15)–(16)) or that the expressions are used without a subject at all, for example in false starts and with implied subjects as in example (17).


(15) the true flavour of the wine will be more aired (BNC2014 S23A: 478)

(16) I think the house market is going to crash further (BNC2014 S2B5: 235)

(17) Gonna say that’s they’re all on word processor (BNC1994 KB1: 4255)

The two corpora appear to be similar in how the expressions of future co-occur with personal pronouns, as shown in Figure 5 and Figure 6. Unsurprisingly, the pronouns are most frequent in the position immediately preceding the expression of future. One exception is the infrequent expression shall, as further discussed below.

[Chart: frequency (pmw, 0–2,500) of I, you, he, she, it, we and they in positions −3 to 3 relative to the expression of future]

Figure 5.  BNC1994: Pronouns co-occurring with expressions of future in positions −3 to 3 (pmw)
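The frequencies behind Figures 5 and 6 are positional co-occurrence counts. For each occurrence of an expression of future, the personal pronouns found one to three words before or after it are tallied, and the totals are normalised per million words (pmw). The following Python sketch shows this kind of window count under stated assumptions:

- The corpus is already tokenised and lower-cased.
- Each target expression is a single token. Multi-word going to would need the window measured from its edges.
- Only the seven subject-form pronouns shown in the figures are counted.

The function names and the toy sentence are hypothetical. This is not the query procedure used for the chapter.

```python
from collections import Counter

# Pronouns shown in Figures 5-8 (this sketch deliberately ignores object
# forms such as "them")
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}


def window_counts(tokens, targets, window=3):
    """Count pronouns at each offset -window..window (0 excluded) around
    every token in `targets`. Returns (counts_by_offset, n_targets)."""
    counts = {off: Counter() for off in range(-window, window + 1) if off != 0}
    n_targets = 0
    for i, tok in enumerate(tokens):
        if tok not in targets:
            continue
        n_targets += 1
        for off, counter in counts.items():
            j = i + off
            # stay inside the token list at corpus edges
            if 0 <= j < len(tokens) and tokens[j] in PRONOUNS:
                counter[tokens[j]] += 1
    return counts, n_targets


def share_with_pronoun_at(tokens, target, offset):
    """Proportion of `target` tokens with a pronoun at `offset`
    (e.g. +1 for 'shall we', -1 for 'I will')."""
    hits = total = 0
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        total += 1
        j = i + offset
        if 0 <= j < len(tokens) and tokens[j] in PRONOUNS:
            hits += 1
    return hits / total if total else 0.0


def per_million(count, corpus_size):
    """Normalise a raw count to a frequency per million words (pmw)."""
    return count * 1_000_000 / corpus_size


# Toy example (hypothetical sentence, not corpus data)
tokens = [t.lower() for t in "Shall we go or shall I ask them if you will come".split()]
counts, n_shall = window_counts(tokens, {"shall"})
print(n_shall, dict(counts[1]), dict(counts[-3]))
print(share_with_pronoun_at(tokens, "shall", 1))
```

In the toy sentence, both instances of shall have a pronoun in position +1 (we and i), so `share_with_pronoun_at` returns 1.0 for that offset. On real data, dividing the per-offset counts by corpus size with `per_million` gives values of the kind plotted in the figures. Like the approach described in the text, this records position only, not syntactic function.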

Some strong similarities are also found between the datasets as regards patterns involving the individual expressions and pronouns. In both corpora, for example, seven out of ten instances of going to are immediately preceded by a personal pronoun6 (68% in BNC1994 and 69% in BNC2014). Similarly, just over half of all instances of ’ll co-occur with the first person singular pronoun I within three words before or after (55% and 54% in BNC1994 and BNC2014 respectively). Shall differs from the other expressions in that it is often found with a pronoun in the position immediately following it, that is, in questions, as in (18).

(18) Christmas wrapping paper or shall we just use the owl one?  (BNC2014 S2EF: 1851)

6. ‘Immediately preceding’ here means, for going to, the position before the auxiliary (be) that precedes going to.

Chapter 14.  Return to the future 239



[Chart: frequency (pmw, 0–2,500) of I, you, he, she, it, we and they in positions −3 to 3 relative to the expression of future]

Figure 6.  BNC2014: Pronouns co-occurring with expressions of future in positions −3 to 3 (pmw)

Although similarities are found, it may be more interesting to look at the patterns that differ between the two corpora, and to compare the patterns across the different expressions of future.

4.1

Will

The proportion of will co-occurring with a personal pronoun is lower in the newer corpus, even though personal pronouns are more frequent overall there. In the older corpus, about 23% of all instances of will have a personal pronoun immediately following them. The corresponding figure for the newer corpus is only 15%, indicating that will is used less with inverted word order in the newer data. The proportion of pronouns in the position immediately preceding will is also lower in the newer corpus, although the difference is less marked (45% vs. 49% in BNC1994). This suggests that the expression occurs more often with non-pronoun subjects in the newer corpus.

4.2

’ll

It has been shown (for example by Axelsson 1998) that contracted forms such as ’ll are primarily used with pronoun subjects. This also appears to be the case in the two spoken corpora examined here. The proportion of ’ll immediately preceded by a personal pronoun is higher than for any of the other expressions and very similar in the two datasets: 93% in BNC1994 and 92% in BNC2014. Fewer than 2% of the instances of ’ll are immediately preceded by a noun. In most cases these are proper nouns or words similar to proper nouns, such as mum and daddy, as in example (19).

(19) I bet you anything my dad ’ll have that sandwich when we get there  (BNC2014 SVH7: 5)

4.3

Going to

The uses of going to with personal pronouns are remarkably similar in the two corpora. The proportion of she used with the expression is slightly lower in the newer corpus. However, as this matches the overall frequency distribution of the pronoun, there is no obvious reason to explore it in more detail.

4.4

Gonna

Personal pronouns co-occur with gonna to a similar extent in both corpora, and the patterns are very similar as regards both the proportions and the positions of the pronouns. Like going to, gonna is used mainly with I and you, as in (20). Instances of a personal pronoun in the position immediately following the expression are rare, apart from cases such as examples (20) and (21). In these, the pronoun results from a false start and is not syntactically linked to the preceding expression of future.

(20) I’m gonna I’m gonna be a captain for Robin Hood 

(BNC2014 S46J: 496)

(21) Christmas cancelled for the rest of your life darling but I’m going to I’m I’m going to try  (BNC2014 SDQG: 44)

4.5

Shall

It has been suggested that shall is falling out of use, and that it mainly expresses obligation, except with a first person subject, where it can be used for future reference. It has also been suggested that shall used with second or third person pronouns “imports compulsion and obligation” (Bell 2016). As early as 1653, Wallis proposed that “The rule is […] for emphasis, willfulness, or insistence, one should say I/we will, but you/he/she/they shall” (Wallis 1653).

Shall is less frequent in BNC2014 than in the earlier BNC1994. Proportionately, the difference is noticeable (30% lower in the newer corpus), but the actual proportions are very low: in BNC2014, only 3% of the expressions of future are shall, compared to 4.3% in BNC1994. Even so, the size of the datasets means that there are still over 2,000 occurrences of shall in BNC2014. These can be compared with the c. 1,500 instances in BNC1994, offering a basis for some indications of current use. An initial exploration of the examples suggests that the similarities between the two datasets are considerable. As illustrated in Figure 7 and Figure 8, the use of the expression with he, she and they is very rare: only about 40 examples in each corpus have one of these pronouns within the −3 to 3 window of shall. Although sentences with shall + he/she/they often carry a sense of obligation or necessity, they may at the same time convey a sense of future reference, as shown in examples (22) and (23).

(22) Tony says oh we want this and we want that he shall have to employ somebody to do  (BNC1994 KCT: 10476)

(23) he shall walk in and daddy will say oh what is that beautiful scent  (BNC1994 KBW: 12642)

Frequency (pmw) by position relative to shall:

Pronoun   −3     −2     −1      1       2      3
I         10.0   11.0   106.1   102.1   4.4    6.6
you       7.0    4.8    2.4     1.0     5.6    12.0
he        0.2    1.4    0.6     0.8     0.0    0.0
she       1.0    0.6    0.0     0.0     0.2    0.2
it        6.6    4.2    1.6     0.0     7.6    18.3
we        2.8    1.8    20.7    78.8    1.6    1.8
they      0.6    0.4    1.0     0.0     0.0    0.4

Figure 7.  BNC1994: Personal pronouns collocating with shall in position −3 to 3 (pmw)

Frequency (pmw) by position relative to shall:

Pronoun   −3     −2     −1      1       2      3
I         6.0    4.1    15.1    82.1    1.8    3.4
you       3.9    3.1    1.1     0.1     0.9    8.6
he        0.7    0.4    0.0     0.1     0.0    0.1
she       1.0    0.3    0.0     0.0     0.2    0.1
it        4.6    5.4    2.2     0.2     0.9    14.3
we        2.9    2.2    6.4     84.7    1.0    1.7
they      0.5    0.1    0.0     0.0     0.0    0.1

Figure 8.  BNC2014: Personal pronouns collocating with shall in position −3 to 3 (pmw)
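The shift towards post-posed first person pronouns can be illustrated with the pmw values reported in Figures 7 and 8. The following Python sketch computes, for I and we, the share of their occurrences in the two positions adjacent to shall that falls in position +1 (pronoun after shall, as in shall we). This is a different measure from the 88% and 55% cited for all instances of shall, since it uses only the first person values in the figures. The variable and function names are illustrative.

```python
# pmw values for positions -1 and +1, copied from Figure 7 (BNC1994)
# and Figure 8 (BNC2014)
FIRST_PERSON = {
    "BNC1994": {"I": {-1: 106.1, 1: 102.1}, "we": {-1: 20.7, 1: 78.8}},
    "BNC2014": {"I": {-1: 15.1, 1: 82.1}, "we": {-1: 6.4, 1: 84.7}},
}


def following_share(values):
    """Share of a pronoun's adjacent occurrences (-1 or +1) that follow shall."""
    return values[1] / (values[-1] + values[1])


for corpus, pronouns in FIRST_PERSON.items():
    for pronoun, values in pronouns.items():
        print(f"{corpus} {pronoun}: {following_share(values):.0%} at +1")
```

Run as is, this reports 49% (I) and 79% (we) for BNC1994, against 84% and 93% for BNC2014. This is in line with the observation that shall with a following first person pronoun, and thus in questions, has become more prominent in the newer data.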

You and it are more commonly used with shall than he, she and they, but these occurrences are mainly found further away from the expression within the search window. This may indicate that you and it are not normally the subject of the expression but have other functions, for example as an indirect object, as in example (24).

(24) Shall I tell you something? I think I’ll be glad when.  (BNC1994 KB8: 10321)

As expected, by far the most frequent personal pronouns collocating with shall are the first person pronouns I and we. It is for these that the most marked difference between the corpora can be seen. As illustrated in Figure 7, shall frequently occurs with the first person singular pronoun immediately preceding or following it. The first person plural pronoun, however, mainly follows the expression, in questions such as (25) or tag questions such as (26).

(25) Do you hear the music. Shall we have a waltz? 

(BNC1994 KB811:705)

(26) We’ll have a crack at it shall we? 

(BNC1994 KB7: 4001)




As shown in Figure 8, the first person pronouns are still the most frequently used with shall in the newer BNC2014. The same pattern is found for both the singular and the plural first person pronoun, although even more markedly for the plural. The proportion of instances with inverted word order is much higher than in the older data: 88% of all instances of shall in BNC2014 are immediately followed by a personal pronoun, compared with 55% in BNC1994. This suggests that shall is today almost exclusively used in questions, including tag questions, which is a change from the patterns found in BNC1994. The higher proportion of plural pronouns used in questions, for example, may suggest that shall is associated with polite inclusivity. On this reading, shall we is used when the speaker wants to suggest an action that also involves others. In the newer corpus, the higher proportion of shall with inverted word order coincides with less frequent use of will with inverted word order. This could indicate an emerging pattern where speakers prefer shall in contexts where will was used before. The development of shall + first person pronouns could be an interesting topic for further investigation, starting before 1994 and continuing beyond 2014, to see whether there is a shift in function. Is shall becoming the preferred expression in certain contexts, replacing will, possibly to the extent that this increased use counters the overall decrease in frequency? A related question is to what extent shall will remain an expression of future to be included in studies like the present one (for a discussion of reasons for or against including shall among the expressions of future, see Berglund 2005).

5. Summary and concluding remarks

This study has examined the use of expressions of future and found that it varies between the BNC1994 and BNC2014 corpora. The overall frequency of the expressions is lower in the newer BNC2014 material. 
As the frequency of other future-referring expressions is also lower, it cannot be ruled out that the difference is a result of differences in the topics covered in the two corpora rather than a change in the language. A possible, if unlikely, explanation for the change could be that people now refer to the future less often in their conversations. If this were the case, it would be an interesting change in language use and speaker behaviour, but one that is far beyond the scope of the present brief exploration. Another explanation that would warrant further investigation is the extent to which other means of expressing future reference are used. These include simple present or progressive constructions without the future-referring adverbials considered here, for example where other adverbials are used or where the future reference is provided by the wider context of a conversation.


When the relative use of the different expressions of future is examined, the proportion of the semi-auxiliary going to is found to be higher in the newer material, both in its full form and in its contracted variant gonna. It has been suggested in a number of contexts that the use of going to is increasing over time, and the findings presented here lend support to that. The reduced use of shall is another possible ongoing change. This study finds that the already infrequent expression is used even less in the newer data. However, a potentially emerging pattern can also be seen, in which the expression is used more in questions and tags. This suggests a change not only in frequency but also in patterns of usage. To what extent this change will continue remains to be seen.

This brief overview has shown that the new BNC2014 offers exciting opportunities for studies of Present-day British English. In combination with the demographically sampled component of the original BNC (BNC1994), it becomes a useful resource for exploring very recent language change. The chapter has only scratched the surface of the possibilities made available by the new Spoken BNC2014. With the prospect of a new written corpus also to appear, there will no doubt be reason to return to the future.

References

Aitchison, J. 1981. Language Change: Progress or Decay? London: Fontana.
Aitchison, J. 1991. Language Change: Progress or Decay? Second edition. Cambridge: Cambridge University Press.
Aitchison, J. 2001. Language Change: Progress or Decay? Third edition. Cambridge: Cambridge University Press.
Aitchison, J. 2013. Language Change: Progress or Decay? Fourth edition. Cambridge: Cambridge University Press.
Aston, G. & Burnard, L. 1998. The BNC Handbook: Exploring the British National Corpus with SARA. Edinburgh: Edinburgh University Press.
Axelsson, M. W. 1998. Contractions in British Newspapers in the Late 20th Century, Studia Anglistica Upsaliensia 102. Uppsala: Acta Universitatis Upsaliensis.
Baron, D. 2018. A brief history of singular ‘they’. Oxford University Press, 4 September 2018, (15 September 2019).
Bell, E. 2016. Using ‘will’, ‘shall’ and ‘must’ in commercial contracts. Blake Morgan, 24 March 2016, (8 January 2020).
Berglund, Y. 2005. Expressions of Future in Present-day English: A Corpus-based Approach, Studia Anglistica Upsaliensia 126. Uppsala: Acta Universitatis Upsaliensis.
Biber, D., Johansson, S., Leech, G., Conrad, S. & Finegan, E. 1999. The Longman Grammar. Harlow: Pearson Education Limited.
British National Corpus (XML edition). Retrieved from CQPweb at Lancaster, (15 September 2019).
The British National Corpus 2014: User Manual and Reference Guide (Version 1.1). (15 September 2019).
Burnard, L. 2007. Reference Guide for the British National Corpus (XML Edition). (15 September 2019).
Coates, J. 1983. The Semantics of the Modal Auxiliaries. London and Canberra: Croom Helm.
Crowdy, S. 1995. The BNC spoken corpus. In Spoken English on Computer: Transcription, Mark-up and Application, G. Leech, G. Myers & J. Thomas (eds), 224–234. Harlow: Longman Group Limited.
Crystal, D. 1995. The Cambridge Encyclopedia of the English Language. Cambridge: Cambridge University Press.
Danchev, A. & Kytö, M. 1994. The construction be going to + infinitive in Early Modern English. In Studies in Early Modern English, D. Kastovsky (ed.), 59–77. Berlin and New York: Mouton de Gruyter.
Danchev, A., Pavlova, A., Nalchadjan, M. & Zlatareva, O. 1965. The construction going to + inf. in Modern English. Zeitschrift für Anglistik und Amerikanistik 13: 375–386.
Huddleston, R. D. & Pullum, G. K. 2002. The Cambridge Grammar of the English Language. Cambridge: Cambridge University Press.
Krug, M. G. 2000. Emerging English Modals: A Corpus-based Study of Grammaticalization. Berlin: Mouton de Gruyter.
Kytö, M., Culpeper, J., Walker, T. & Archer, D. A Corpus of English Dialogues 1560–1760. Department of English, Uppsala University, (15 September 2019).
Leech, G. & Smith, N. 2000. Manual to Accompany The British National Corpus (Version 2) with Improved Word-class Tagging. (15 September 2019).
Leech, G. 2004. Recent grammatical change in English: Data, description, theory. In Advances in Corpus Linguistics: Papers from the 23rd International Conference on English Language Research on Computerized Corpora (ICAME 23), K. Aijmer & B. Altenberg (eds), 61–81. Amsterdam: Rodopi.
Livingston, J. & Evans, R. 1955. Whatever will be, will be: Que será, será. In Que será, será. London: Melcher-Toff Music.
Lorenz, D. 2013. Contractions of English Semi-modals: The Emancipating Effect of Frequency. NIHN Studies. Freiburg: Rombach.
Love, R., Dembry, C., Hardie, A., Brezina, V. & McEnery, T. 2017. The spoken BNC2014: Designing and building a spoken corpus of everyday conversations. International Journal of Corpus Linguistics 22(3): 319–344. https://doi.org/10.1075/ijcl.22.3.02lov
Mair, C. 1997. The spread of the going-to-future in written English: A corpus-based investigation into language change in progress. In Language History and Linguistic Modelling: A Festschrift for Jacek Fisiak on his 60th Birthday, R. Hickey & S. Puppel (eds), 1537–1543. Berlin/New York: Mouton de Gruyter.
Palmer, F. R. 1965. A Linguistic Study of the English Verb. London: Longman.
Palmer, F. R. 1974. The English Verb. London: Longman.
Poplack, S. & Tagliamonte, S. 1999. The grammaticization of going to in (African American) English. Language Variation and Change 11(3): 315–342.
Quirk, R., Greenbaum, S., Leech, G. & Svartvik, J. 1985. A Comprehensive Grammar of the English Language. London: Longman.
Spoken British National Corpus 2014. Retrieved from CQPweb at Lancaster, (15 September 2019).
Szmrecsanyi, B. 2003. Be going to versus will/shall: Does syntax matter? Journal of English Linguistics 31(4): 295–323. https://doi.org/10.1177/0075424203257830
Wallis, J. 1653. Grammatica linguæ anglicanæ. Cui præfigitur, De loquela sive sonorum formatione, tractatus grammatico-physicus. Oxoniæ: Excudebat Leon. Lichfield. Veneunt apud Tho. Robinson.

Chapter 15

Sort of and kind of from an English-Swedish perspective

Karin Aijmer

University of Gothenburg

The aim of this study has been to contribute to the discussion of type nouns by investigating the different meanings and functions of sort of and kind of from an English-Swedish contrastive perspective. The analysis is based on their correspondences (translations or sources) in the English-Swedish Parallel Corpus. The results confirm analyses of the functions of sort of and kind of suggested in earlier work and point to cases where sort/kind of are ambivalent between different functions. The contrastive analysis supports both conventional and less conventional or semi-conventionalised meanings, implicatures, pragmatic effects and semantic (negative) connotations associated with the type nouns.

Keywords: type noun, parallel corpus, sort/kind of, meaning potential

1. Introduction

A number of recent articles and books have been devoted to the study of type nouns (sort, kind, type) in English from both a synchronic and a diachronic perspective (Denison 2002, 2005; Margerie 2010; Brems 2011; Mihatsch 2016). However, the grammatical patterns with sort/kind/type of are not always easy to analyse precisely, since “they are characterised by construction-specific constraints and peculiarities” (Brems 2011: 61). Moreover, sort/kind of have different interpretations depending on the syntactic and semantic context. The aim of my chapter is to contribute to this discussion about the properties of type nouns by investigating sort (of) and kind (of) from an English-Swedish contrastive perspective. A contrastive perspective is particularly interesting when languages have markers that seem to have followed the same development. The Swedish type nouns sort-s (‘sort+genitive -s’) and slag-s (‘slag+genitive -s’) resemble English sort and kind since they are used in many different patterns with different functions

https://doi.org/10.1075/scl.97.15aij © 2020 John Benjamins Publishing Company

248 Karin Aijmer

(Teleman et al. 1999, Volume 3: 83). The hypothesis is therefore that they have developed in similar ways. Another aim is to account for the multifunctionality of the two type nouns. The hypothesis here is that sort of and kind of have a meaning potential rather than a fixed meaning (Norén & Linell 2007). The theory of meaning potentials explains, for example, how speakers negotiate meanings in interaction and use lexical items in specific ways according to the strategic needs of the communication. The meaning potential model further implies that the meanings of sort of and kind of are organized in a way that is compatible with their diachronic development from a core meaning. The research questions are:

– What functions are served by the type nouns sort and kind, and how can they be identified syntactically and semantically in the translations?
– What can the translations tell us about the meanings and functions of sort of and kind of and how they are organized?

The study is structured as follows. I will first describe the parallel corpus and how it is used to study sort of and kind of (Section 2). Section 3 analyses the results of the contrastive analysis in the light of how well they support previous analyses of the type nouns. The concluding section (Section 4) summarises the findings. It also discusses how the contrastive analysis can support an analysis of sort of and kind of as having a meaning potential organized around one or several core aspects.

2. Description of the data

The contrastive perspective is essential in the study of pragmatic markers such as sort of and kind of, which seem to have followed the same route of evolution from taxonomic or type nouns to new uses in many different languages. Mihatsch (2007: 240) showed that the equivalents of sort and kind in Romance languages (French, Italian, Portuguese, Spanish) and some Germanic languages seemed to have the potential to develop from taxonomic nouns to approximators. 
Moreover, type is used in new ways as a discourse marker in many languages (Swedish typ: Rosenkvist & Skärlund 2013; Italian tipo: Voghera 2013; Spanish tipo: Marques 2015). Other studies have used parallel corpora to compare the status of type nouns in different languages. Janebová and Martinková (2017), for example, compared type nouns in the ‘NP1 of NP2’ structure in English and Czech through their translations in a parallel translation corpus. In another innovative study, Davidse et al. (2013) used comparable corpus data from English and French teenage forums to study the grammaticalization of sort of in English and its French counterpart genre de.

Chapter 15  Sort of and kind of from an English-Swedish perspective 249

Contrastive analysis can nowadays be carried out on the basis of parallel corpora, in which texts in different languages are presented in parallel. The use of parallel translation corpora is interesting from a methodological point of view, since they provide access to the linguistic practice and knowledge of translators engaged in the professional activity of translating from one language to another. In this way, translation corpora make it possible to derive semantic information or to test hypotheses about the meaning of words in their contexts (Dyvik 1998; Noël 2003). The occurrences of sort of and kind of and their correspondences have been taken from the English-Swedish Parallel Corpus (ESPC) (Altenberg & Aijmer 2000; Altenberg et al. 2001). The corpus, which contains almost three million words, consists of matching texts in English and Swedish. All the texts are fairly recent and come from both fiction and non-fiction. The fiction text samples, which are fairly short (c. 15,000 words), are taken from works by 25 different authors and represent children’s fiction, crime and mystery, and general fiction. All the major varieties of English are represented. An attempt has been made to include only translations of good quality and to have as many different translators as possible in order to avoid translator bias. There were 22 different translators for the English books and 20 different translators of Swedish into English. The search tool is the Translation Corpus Explorer (Ebeling 1988). With a corpus organised like the English-Swedish Parallel Corpus, two kinds of search are possible. One retrieves the set of Swedish forms corresponding to sort of and kind of in the English originals. The other retrieves the set of forms in the Swedish source texts that correspond to sort of or kind of in the English target texts. It is also possible to limit the search to either fiction or non-fiction texts. 
Only the fiction texts have been used for this study, since this is where features characteristic of spoken involvement and interaction are most likely to be found. Table 1 shows the total number of occurrences of sort of and kind of in translations and sources in the English-Swedish Parallel Corpus. Kind of was more frequent than sort of in the corpus.

Table 1.  The frequencies of sort of and kind of in translations and sources in the ESPC

                sort of   kind of   Total
Translations     88        103       191
Sources          65        116       181
Total           153        219       372
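The search procedure described above maps English sort of and kind of to their Swedish correspondences in each translation direction. It can be sketched in Python as a tally over sentence-aligned pairs. Everything here is an assumption for illustration:

- The tuple format, the direction labels, the list of Swedish candidate forms and the function name are hypothetical.
- The pairs are lightly shortened versions of examples (5), (6), (8) and (11) from this chapter.
- Matching the first candidate form found anywhere in the aligned sentence is a crude heuristic. It does not reproduce the Translation Corpus Explorer or the analysis reported here.

```python
import re
from collections import Counter

# Hypothetical sentence-aligned pairs: (direction, English, Swedish).
# "sv>en" = Swedish original with English translation; "en>sv" = the reverse.
PAIRS = [
    ("sv>en", "It wasn't easy to feel in the dark what sort of tiles they were",
     "Det var inte lätt att känna i mörkret vad slags kakel det var"),
    ("sv>en", "But this was the kind of information that military intelligence wanted to keep secret",
     "Det sista var en typ av uppgifter som den militära underrättelsetjänsten "
     "i sin helhet ville hemlighålla"),
    ("sv>en", "She seems like a stable sort of person", "Hon verkar stabil"),
    ("en>sv", "angular, like certain kinds of apples",
     "den var kantig, som en viss sorts äpplen"),
]

# Matches sort of / sorts of / kind of / kinds of as whole words
EN_PATTERN = re.compile(r"\b(?:sort|kind)s? of\b", re.IGNORECASE)
# Assumed list of Swedish type-noun forms, checked in this order
SV_CANDIDATES = ["slags", "sorts", "sorters", "slag", "typ"]


def correspondences(pairs):
    """Tally (direction, English form, Swedish correspondence) triples."""
    tally = Counter()
    for direction, english, swedish in pairs:
        # \w is Unicode-aware for str patterns, so å/ä/ö stay inside words
        sv_words = re.findall(r"\w+", swedish.lower())
        for match in EN_PATTERN.finditer(english):
            found = next((c for c in SV_CANDIDATES if c in sv_words), "zero/other")
            tally[(direction, match.group(0).lower(), found)] += 1
    return tally


for key, n in sorted(correspondences(PAIRS).items()):
    print(key, n)
```

Sentence-level alignment, as in this sketch, only shows that a Swedish form co-occurs with the English hit. Deciding whether it is the actual correspondence, or whether sort of has no counterpart at all (as in Hon verkar stabil), still requires inspecting each pair.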


3. Syntactic, semantic and pragmatic properties of sort of and kind of

A broad range of uses of sort of and kind of can be distinguished. Brems (2011),1 for example, found evidence for seven different functional categories on the basis of syntactic, semantic, pragmatic and prosodic criteria. Some of the patterns (from the ESPC) are illustrated below.

(1) But one needs a certain kind of energy to go on like like that.  (BR1)2

(2) He has a sort of alibi for the seventeenth  (SW1T)

(3) We sort of got talking.  (JB1)

(4) that’s the sort of story my brother tells me  (RJ1T)

The criteria are both syntactic and semantic. In (1), kind is the lexical head of the noun phrase, post-modified by ‘of energy’. The function of the construction is to categorize by singling out a subclass of the superordinate category ‘energy’. When sort of and kind of are not used to categorize, they have meanings associated with approximation. A number of different sub-categories can be distinguished on syntactic and semantic grounds. In (2), where sort (of) is preceded by the indefinite article, it has a hedging or approximating function. In (3), where sort of is adverbial, it has a downtoning or approximating function. Examples such as (4) are more difficult to analyse both syntactically and semantically. The sort of seems to be similar to such, and it is used cataphorically to refer to the following relative clause (see Section 3.4).

The current surge of interest in type nouns also has to do with their uses as discourse markers. Sort of and kind of have been analysed as discourse markers when they do not contribute to the propositional content. Instead, they have the procedural function of signalling to the hearer how the utterance should be understood (Aijmer 2002; Miskovic-Lukovic 2009; Beeching 2016). Diachronic studies also provide evidence for the nature and number of patterns. Denison (2002) has, for example, shown that type noun patterns (or constructions) with qualifying meaning can be explained as the result of grammaticalization processes from structures in which the type noun was the head word. He has also shown that these developments are reflected in synchronic layering (Denison 2002; cf. also Margerie 2010; Brems 2011). In this section, I will provide additional evidence from the contrastive corpus study for the number and kinds of constructions that have been proposed in diachronic studies of type nouns.

1. The data from the synchronic part of Brems’ study of type nouns were extracted from the Times sub-corpus of the Collins Wordbanks corpus as well as from the COLT corpus (the Bergen Corpus of London Teenage Language) (Brems 2011: 273).
2. BR1 refers to the title of the text from which the example is taken. BR1T indicates that the text is a translation.

3.1

Type nouns with a classifying use

It is generally agreed that “the fact that TNs [type nouns] can function as the head noun of a NP is not in question” (Brems 2011: 275). In the head noun construction, sort or kind has a taxonomic relationship to N2. A sort of apple refers, for example, to a particular sub-category of the superordinate category ‘apple’. Both sort and kind and the plural forms sorts and kinds can have taxonomic meaning. The corpus contained 12 examples of kind of with a classifying function (and three examples with kinds of) and eight examples with sort of (and three examples with sorts of). In (5), the superordinate category is ‘tiles’ and sort of refers to different types of tiles. The default correspondence is a Swedish type noun (slags, sorts).3

(5) It was n’t easy to feel in the dark what sort of tiles they were; in any case not a type he recognised.  (LG1T)
Det var inte lätt att känna i mörkret vad slags kakel det var; i varje fall ingen sort som han kände igen.  (LG1)

In (6), information is the superordinate category, which has different sub-categories: information that military intelligence wants to keep secret, and information that can be made public. The Swedish source text contains typ (‘type’), which is closely associated with the classifying function.

(6) But this was the kind of information that military intelligence wanted to keep secret.  (JG1T)
Det sista var en typ av uppgifter som den militära underrättelsetjänsten i sin helhet ville hemlighålla.  (JG1)

In (7), all kinds of is a construction with a head noun, as shown by its Swedish correspondence alla sorters. It refers to all types of people, including those who think that aristocracy and wealth are most important in life.

(7) Det finns himlar för alla sorters människor, inklusive dem som tycker att aristokrati och rikedom är viktigast i livet.  (RDA1T)
There are Heavens for all kinds of people, including those who think chiefly in terms of aristocracy and wealth.  (RDA1)

3. The Swedish slags (and sorts) contains an original genitive -s corresponding to English of (Teleman et al. 1999, Volume 3: 86).


The head noun could have an adjective restricting the reference to a special subcategory of a superordinate species or category, as in example 8.

(8) She had painted her nails dark red, Macon saw, and put on a blackish lipstick that showed her mouth to be an unusually complicated shape – angular, like certain kinds of apples.  (AT1) Macon såg att hon just hade målat naglarna mörkröda och lagt på nästan svart läppstift, vilket avslöjade att hennes mun hade en mycket ovanlig form – den var kantig, som en viss sorts äpplen.  (AT1T) (Lit. ‘a certain sort-s apples’)

‘Certain kinds of apples’ refers to apples with an unusually complicated form. Other adjectives or predeterminers constraining the reference to a type of people or things were same, different, right, wrong, similar, special and only. In (9), the restriction is to only one type of the larger category referred to.

(9) According to Raymond Chandler, there is only one kind of person who wears shoes and socks like that.  (RJ1T)
Enligt Raymond Chandler finns det bara en sorts människor som har sådana skodon och sådana sockor.  (RJ1)

3.2

Type nouns with a characterising use

Sort/kind of can also be a modifier preceded by an adjective, with the second noun in its scope. In this attributive modifier use, sort/kind of applies to an instance and not to a type (subcategory) of the category named. The pattern with an attributive adjective occurred in seven examples with sort of (in either the English original or the translation) and in five examples with kind of. In example (10), the qualities named by the adjective are attributed to the second noun (N2) and not to sort. This explains why sort of (chap, person, way) was added in the translations from Swedish into English in (10) to (12), and could be omitted in the translation of an English original into Swedish in (13).

(10) But anyway Pentti, that plumber who could hardly speak Swedish, was a decent and helpful sort of chap  (LG1T)
Men hur som helst var nog Pentti, den där rörmokarkillen som knappast kunde svenska, en hjälpsam och anständig människa.  (LG1) (Lit. ‘a helpful and decent chap’)

The construction is typically used with adjectives describing a person’s characteristic properties:


(11) She seems like a stable sort of person.  (SW1T)
Hon verkar stabil.  (SW1) (Lit. ‘She seems stable’)

The attributive adjective could also be followed by sort of way or sort of thing, as in (12).

(12) The grown-ups pitied me in a respectful sort of way.  (RJ1T)
De vuxna tyckte synd om mig på ett respektfullt sätt.  (RJ1)

In example (13), sort of has been omitted in the Swedish translation of the English original.

(13) An artist is a tawdry, lazy sort of thing to be, as most people in this country will tell you.  (MA1)
Att vara konstnär är pråligt, lättjefullt, vilket de flesta här i landet kan tala om.  (MA1T) (Lit. ‘is tawdry, lazy’)

3.3

The quantifier construction

Constructions in which sorts/kinds of are premodified by all4 can refer to a type or subcategory of N2. In (14), the speaker is referring to distinct types of climate.

(14) Like certain plants, he tolerated all kinds of climates.  (MR1T)
Precis som vissa växter, tålde han alla olika slag av klimat.  (MR1) (Lit. ‘all different kinds of climate’)

However, the type nouns sorts and kinds can lose their head word status. In some examples with the same surface structure, the type noun shifts from functioning as a categorizing head noun to forming part of a quantifier. According to Brems and Davidse (2010: 188), “the argumentation for this reanalysis claim hinges on the shift from the universal quantifier sense of all in the binomial construction to the ‘many’ sense of the quantifier construction”. In example (15) the translation shows that the quantifier meaning of the construction is foregrounded.

(15) He’d go mad, shout and scream, pull off your boots and do all sorts of things to them which made you look away, and then you had to get rid of them. (RJ1T)
Då blev han tokig, vrålade och slet av stöveln och gjorde en massa saker i den så att man var tvungen att blunda och sen fick slänga bort den. (RJ1) (Lit. ‘a lot of things’)

4. In addition there was one example of no kind of.

254 Karin Aijmer

In (16), the Swedish original has mycket ‘much’, which corresponds to all kinds of things in the translation. We are told about a person who had obtained insight into many things rather than ‘all different types of things’.

(16) She had thus obtained insight into all kinds of things that were intended to be secret. (KOB1T)
Därigenom hade hon fått insikt i mycket som var ämnat att förbli hemligt. (KOB1)

In (17) the categorizing meaning and the implicature of a great quantity are both present in the translation.

(17) She saw them come riding home in the evening with goods on their horses’ backs, all kinds of goods in sacks and leather bags and chests and boxes. (AL1T)
Hon såg dem komma hemridande om kvällarna med varor på hästryggen, många olika slags varor i säckar och skinnpåsar och lådor och skrin. (AL1) (Lit. ‘many different kinds of goods’)

According to Brems and Davidse (2010: 189), all sorts of and all kinds of can acquire additional meanings or implicatures: “In addition to ‘large quantity’, these do add the notion of ‘variety’ making it similar in meaning to quantifier various”. The Swedish correspondences allt möjligt (‘all possible things’), alla möjliga människor (‘all possible people’), alla möjliga slags (‘all possible kinds’), alla möjliga sorters (‘all possible sorts’), av alla möjliga slags (‘of all possible kinds’) and alla möjliga former (‘all possible forms’) focus on the variety of people or things (all possible kinds) rather than on quantity, as in (18) and (19).

(18) “These past two years I’ve been introducing you to all sorts of people. (BR1)
“De här två sista åren har jag presenterat dig för alla möjliga människor. (BR1T)

(19) You see a familiar name and it sets off all sorts of memories. (JB1)
Man får syn på ett välbekant namn i spalterna och det ger upphov till alla möjliga sorters minnen. (JB1T)

There was also one example of every kind of with a weaker quantifier reading (example 20).

(20) Johnny produced from his tool bag every kind of spanner and screwdriver you can think of and then a few more besides. (PP1T)
Janne hyvade fram alla möjliga och omöjliga nycklar och mejslar ur verktygsväskan. (PP1) (Lit. ‘all possible and impossible spanners and screwdrivers’)


The variety meaning can also be rendered by an adjective (example 21) or in some other way (22).

(21) Discomfort can be caused by all kinds of things. (KOB1T)
Olust kan uppkomma av allehanda skäl. (KOB1) (Lit. ‘for a variety of reasons’)

(22) “Yes,” said Lovis, “that’s true of all sorts of things. (AL1T)
“Ja”, sa Lovis, “det stämmer på lite av varje. (AL1) (Lit. ‘a little of everything’)

We can conclude that all sorts of and all kinds of can be regarded as a special construction with a strong or weak quantifier meaning. In the corpus, the meaning of quantity or variety was even more frequent than the categorizing meaning. Table 2 shows the correspondences of all sorts of and all kinds of in Swedish translations and Swedish sources, classified according to whether they have categorizing meaning or refer to quantity (many instances) or variety (diverse instances).

Table 2.  The correspondences of all sorts/kinds of in Swedish sources and translations

                all sorts of   all kinds of*   Total
Categorizing          4              4            8
Quantity              1              5            6
Variety               7              5           12
Other                 1**            –            1
Total                13             14           27

* Including one example of every kind of
** The example is: Grown-ups could avoid all sorts of accidents if only they listened to my grandmother, but nobody cares. (RJ1T) Vuxna skulle kunna undvika olyckor de får reda på tack vare min mormor, men ingen bryr sig om henne. (Lit. ‘accidents they find out about’) (RJ1)

3.4 Type nouns with an identifying use

The type noun sort/kind of can also be a post-determiner (a complex determiner), forming a determiner complex together with the preceding determiner (Brems 2011: 294). In this use, the meaning of the sort/kind of is identifying rather than classifying. Table 3 shows the Swedish correspondences of the sort/kind of (and of other post-determiners with a demonstrative before the type noun).


Table 3.  The Swedish correspondences of sort of/kind of as a post-determiner construction

                                           (DET) sort of   (DET) kind of*   Total
sådan (sån, sådant, sånt, sådana, såna)         19              18            37
den (här) sortens/det slags, den sorts           9              20            29**
                                                 –               1             1
other                                            5              12            17
Total                                           33              51            84

* There was also one example of ‘these kind of’ translated by a form of such.
** 22 examples used ‘den sortens’ (the sort-GENs) as a correspondence. One example used av den sorten (‘of that sort’).

A frequent translation of the post-determiner is a form of sådan ‘such’, as in (23).

(23) I thought: All right, this is the sort of thing one must expect. (BR1)
Jag tänkte: Det är klart, sådant här måste man vara beredd på. (BR1T) (Lit. ‘such here’)

The type noun can also be preceded by a demonstrative determiner, as illustrated in (24).

(24) She knew that these sort of wonders do pop up in the world from time to time, but only once or twice in a hundred years. (RD1)
Hon visste att sådana mirakel inträffar då och då här i världen, men bara en eller två gånger vart hundrade år. (RD1T) (Lit. ‘such miracles’)

The phrase a wound of that sort in (25) illustrates a variant of the construction, in which the type noun follows N2.

(25) It does occasionally happen that a wound of that sort has difficulty in healing, and leaves an ugly scar as a constant reminder of the insult. (KOB1T)
Ibland händer det att ett sådant sår har svårt för att läkas och efterlämnar ett missprydande ärr som ständigt påminner om skymfen. (KOB1) (Lit. ‘a such wound’)

In (26) the original Swedish text has a form of sådan followed by the type noun (‘sådana slags’).

(26) Big white sheets hang beside undershirts and the kind of crocheted pads women use for their monthlies. (JM1T)
Där hänger stora vita lakan bredvid undertröjor och sådana slags virkade bindor som kvinnor brukar ha till måntan. (JM1) (Lit. ‘such kind of’)


We can conclude that the post-determiner construction and sådan (‘such’) are rather similar. The translation correspondences with a form of sådan also confirm Brems’ (2011) observation that “in many of the determiner complex constructions, the referential meaning of the determiner complex can be substituted by a determiner with predetermining such” (Brems 2011: 298). The meaning of the sort/the kind of also corresponds to ‘den sortens’ (‘the sort-GENs’) with an explicit deictic meaning. The kind of in (27) points forward to a relative clause containing a description intended to characterize the kind of wife the speaker has in mind.

(27) She was the kind of wife who looks out of her front door in the morning and, if it’s raining, apologizes. (FW1)
Hon var den sortens hustru som tittar ut genom ytterdörren på morgonen och ber om ursäkt om det regnar. (FW1T)

A form of sådan could also have been used in the translation. However, when the translation refers to the type (‘den sortens’), the cohesive function of the determiner is more explicit. The reference of den sortens (det slags), ‘the sort-GENs’, ‘the kind-GEN-s’, is typically cataphoric. In (28) den sortens (småaktighet) connects the noun phrase to the more elaborate description in the following restrictive relative clause.

(28) Det var just den sortens småaktighet som man kunde vänta sig av bittra och omedgörliga män som Con. (JC1T)
It was the sort of petty rebuff he’d expect from bitter, unforgiving men like Con. (JC1)

Sådan as a translation of the/that sort/kind of mainly has an identifying function. However, the step from the identifying to the intensifying function is a short one, as shown by the translation with the intensifying adverb så (‘so’) in (29).

(29) There was nothing wrong with his request on the face of it. I chased the proposition around in my head with caution, wondering what Tony Gahan had done for Limardo that would net him this kind of payoff. (SG1)
Ytligt sett fanns det inget fel i hans begäran. Omsorgsfullt gick jag i tankarna igenom uppdraget och undrade vad Tony Gahan gjort för Limardo som var värt en så furstlig belöning. (SG1T) (Lit. ‘one so royal reward’)

En så (‘one so’) seems to be both identifying and intensifying.


3.5 Type nouns with an approximating function

Sort of and kind of were frequently used with the indefinite article or with some in original texts and in translations (a sort of: 27 examples, some sort of: 23 examples, a kind of: 66 examples, some kind of: 44 examples). A sort of/kind of are functionally similar to discourse markers. Their general meaning is approximating or hedging, as in (30).

(30) Efter en stund kände jag ett slags svindel av att bara titta på allt som rörde sig. (BO1T)
After a while I felt a sort of vertigo just looking at anything that moved. (BO1)

In Swedish the corresponding structure consists of a head noun modified by the indefinite article (‘ett slags’), but the function of the construction is qualifying or approximating rather than classifying. The translations can also serve as a contextual clue to the interpretation. In (31) the translator’s choice of nästan (‘almost’) suggests that a kind of has a downtoning function (‘it was almost as if’, ‘like’).

(31) Some places the walls gave off a kind of echo (AT1)
I somliga vrår av huset nästan ekade det (AT1T) (Lit. ‘almost echoed’)

In (32), liksom in the Swedish source text conveys that the speaker is uncertain and that the following noun should not be interpreted literally.

(32) He knew no one else would dare employ him, so he felt a kind of responsibility for him. (MR1T)
Men han var duktig, och mormors far tyckte om honom, och han visste att ingen annan skulle våga ha karlen i sin tjänst, så han kände liksom ett ansvar för honom. (MR1)

Some sort/kind of can also express approximation. In (33) the adjectival suffix -aktig (brunaktig ‘brownish’) conveys the idea of an approximate colour.

(33) He was at least thirty-five and dressed similarly to the boy except that his jeans were of some dark colour, dark grey or dark brown, and he wore some sort of brown pullover. (RR1)
Han var åtminstone trettiofem och klädd ungefär som pojken, sånär som på att hans jeans var i någon mörk färg, mörkgrå eller mörkbrun, och han haft på sig en brunaktig pullover. (RR1T)


The correspondence between girlish and a sort of in (34) shows that the translator has interpreted sort of as having approximating meaning.

(34) They drank a toast, and there was no trace of unrest in her eyes, rather a sort of girlish mischief. (KOB1T)
De skålade och det fanns ingen oro i hennes blick, snarare ett ungflickaktigt okynne. (KOB1)

På något vis (‘in some way’) (two examples) and någotsånär (‘more or less’, ‘rather’) are other correspondences with approximating meaning, as in (35).

(35) There seemed little doubt that the man felt some kind of jealousy. (RD1)
Det rådde ingen som helst tvekan om att karln var svartsjuk på något vis. (RD1T) (Lit. ‘jealous in some way’)

Kind of can also express the speaker’s negative attitude to what is talked about, as in (36).

(36) I guess he thinks we’re some kind of pimp service or something. (SW1T)
Han tror att vi är någon jävla horförmedling, va. (SW1) (Lit. ‘some bloody pimp service’)

In addition to the approximating function, a sort/kind of can be identifying, as in (37), where the Swedish source text has en sådan (‘a such’).

(37) There’s also a kind of sundial to put in the south window. (JMY1T)
Det är en sådan apparat man skall ställa i söderfönster och den är som ett solur. (JMY1)

3.6 Adverbial or discourse marker uses

In my data, sort of/kind of were also used to qualify verbs (and, in a single example, an adjective). The Swedish type nouns cannot be used in this function; instead, the translations contain adverbials with a downtoning or approximating function (with correspondences such as liksom ‘like’, nästan ‘almost’, mer eller mindre ‘more or less’, vara som ‘be as’, på något vis ‘in some way’, ungefär ‘approximately’). These adverbials signal that the following word should be interpreted loosely, as imprecise or as expressing only a moderate degree of a property. Some examples are shown in (38) to (44).


(38) I waited till Henno had gone out – he was always going out; he said he knew what we got up to when his back was turned so not to try anything, and we kind of believed him – and I got the bin from beside his desk and brought it down to my desk. (RD01)
Jag väntade tills Henno hade gått ut – han gick alltid ut; han sa att han visste vad vi hade för oss bakom ryggen på honom så försök inte, och vi trodde honom nästan – och sen hämtade jag papperskorgen bredvid katedern och tog den till min bänk. (RD01T)

In (38) kind of has been translated as nästan (‘almost’), indicating that the meaning conveyed by the verb does not fully apply. Example (39) illustrates sort of expressing imprecision. The translation shows that zooms off should not be understood literally but only as close to or approaching reality.

(39) I can’t reproduce the way he talks – you’ll have to listen to him for yourself – but he just sort of zooms off. (JB1)
Jag kan inte återge hans sätt att tala – ni får själva lyssna på honom – men han är som en zoom: går igång fortare än kvickt. (JB1T) (Lit. ‘he is like a zoom: starts off more quickly than quick’)

Liksom was the most frequent Swedish correspondence in the corpus (six examples, including (40)).

(40) We sort of got talking (JB1)
Vi kom liksom att talas vid (JB1T)

Mer eller mindre (‘more or less’) conveys the speaker’s shift to more figurative language, as in (41).

(41) When I meet people I like, instead of saying more and showing I like them and asking questions, I sort of clam up, as if I don’t expect them to like me, or as if I’m not interesting enough for them. (JB1)
Träffar jag sympatiska människor så tiger jag mer eller mindre som en mussla, istället för att visa dem mitt gillande och bidra till konversationen; det är som om jag utgår från att de inte ska gilla mig eller att jag inte är intressant nog. (JB1T) (Lit. ‘I shut up more or less like a clam’)

In (42) sort of is followed by an adjective. The translator has chosen a fairly free rendering.


(42) As it is, I study the moustache and think: That looks sort of good. (MA1)
I stället betraktar jag mustaschen och tänker: Det där är inte så tokigt. (MA1T) (Lit. ‘is not so bad’)

It is evident from the translation that sort of expresses the speaker’s wish to be cautious. The adverbial sort of/kind of is also found at the end of sentences, functioning as a hedge with scope over the preceding clause, as in (43).

(43) Allwright picked up the receiver. “Allwright?” he said. Whoever it was apparently made some amusing remark. “Yes, I am, sort of.” (SW1T)
Nöjd tog luren och sa: - Nöjd?) Den andre sa antagligen någon lustighet. - Ja, det är jag, tämligen. (SW1) (Lit. ‘Yes, that am I, rather’)

Tämligen (‘rather’) is compatible with an ironic interpretation of sort of in this context. The person answering the telephone gives his surname (Nöjd), which is identical in form to the adjective nöjd (‘all right’) and can therefore be modified by sort of. In (44) the Swedish original employs the hedge så där (‘like that’) to convey that the description is not precise (‘in all directions like’).

(44) “Yes, but it stands straight up like straw – in all directions, sort of,” maintained Henry. (ARP1T)
– Ja, men det står rätt upp som halm – åt alla håll så där, framhärdade Henry. (ARP1)

4. Conclusion

As can be expected from their association with spoken involvement and interaction, sort of and kind of do not have a fixed meaning; their meanings depend on the context, in particular the syntactic co-text. Sort of and kind of have a broad range of meanings that language users can select to meet their needs in the communicative situation. At the same time, some meanings seem more central than others. In the case of sort of and kind of, an obvious core aspect of their semantic potential is the categorizing meaning. This meaning is, for example, basic from a diachronic perspective, and it can be expected to be relevant for the synchronic interpretation of the marker. Thus, the head noun meaning never completely disappears, but many contexts are ambiguous between the head noun use and other meanings (e.g. between head noun and qualifier). There also appears to be another core aspect of sort of and kind of, which I have referred to as ‘approximating’. When sort of and kind of have approximating meaning, they have sub-functions such as imprecision, downtoning and hedging.

Another aim of the study has been to contribute to the discussion of type nouns by using a contrastive analysis to describe the different meanings of sort of and kind of. As hypothesized, the translations (and sources) confirm analyses of sort of and kind of suggested in earlier work and point to cases where sort/kind of is ambivalent between different functions. The correspondences give support for both conventional and less conventional or semi-conventionalised meanings, implicatures, pragmatic effects and semantic (negative) connotations associated with the type noun. The translations and sources showed, for example, that all sorts/kinds of correspond to a form of mycket (‘much’) or to lexical items with the meaning ‘diversity’. When sort/kind of was preceded by a (descriptive) adjective, it was generally omitted in the translation, which was taken to show that the adjective no longer had scope over the head noun. Moreover, the contrastive analysis provided evidence that the post-determiners the sort/kind of (also with a demonstrative pronoun, for example this sort of (thing)) had a function similar to such. The translations also make it possible to discover pragmatic effects of the type nouns that are associated with more established or grammaticalized meanings (such as intensification associated with the post-determiner sort of/kind of).

References

Aijmer, K. 2002. English Discourse Particles: Evidence from a Corpus. Amsterdam: John Benjamins.
Altenberg, B. & Aijmer, K. 2000. The English-Swedish Parallel Corpus: A resource for contrastive research and translation studies. In Corpus Linguistics and Linguistic Theory. Papers from the 20th International Conference on English Language Research on Computerized Corpora (ICAME 20) Freiburg im Breisgau 1999, C. Mair & M. Hundt (eds), 15–33. Amsterdam: Rodopi.
Altenberg, B., Aijmer, K. & Svensson, M. 2001. The English-Swedish Parallel Corpus (ESPC): Manual. Department of English, Lund University.
Beeching, K. 2016. Pragmatic Markers in British English. Cambridge: CUP.
Brems, L. 2011. Layering of Size and Type Noun Constructions in English. Berlin: Mouton de Gruyter.
Brems, L. & Davidse, K. 2010. The grammaticalization of nominal type noun constructions with kind/sort of: Chronology and paths of change. English Studies 91(2): 180–202.


Davidse, K., Brems, L., Willemse, P., Doyen, E., Kiermeer, J. & Thoelen, E. 2013. A comparative study of the grammaticalized uses of English ‘sort (of)’ and French ‘genre (de)’. In Teenage Forum Data: Standard and Non-standard Languages on the Internet, E. Miola (ed.), 41–66. Alessandria: Edizioni dell’Orso.
Denison, D. 2002. History of the sort of construction family. Paper presented at the second International Conference on Construction Grammar, University of Helsinki, 7 September 2002.
Denison, D. 2005. The grammaticalization of sort of, kind of and type of in English. Paper presented at NRG 3 (New Reflections on Grammaticalization 3), Santiago de Compostela, 17–20 July 2005.
Dyvik, H. 1998. A translational basis for semantics. In Corpora and Cross-linguistic Research: Theory, Method and Case Studies, S. Johansson & S. Oksefjell (eds), 51–86. Amsterdam: Rodopi.
Ebeling, J. 1998. The translation corpus explorer: A browser for parallel texts. In Corpora and Cross-linguistic Research: Theory, Method and Case Studies, S. Johansson & S. Oksefjell (eds), 101–112. Amsterdam: Rodopi.
Janebová, M. & Martinková, M. 2017. NP-Internal kind of and sort of: Evidence from a parallel translation corpus. In Contrasting English and Other Languages through Corpora, M. Janebová, E. Lapshinova-Koltunski & M. Martinková (eds), 164–217. Newcastle upon Tyne: Cambridge Scholars.
Margerie, H. 2010. On the rise of (inter)subjective meaning in the grammaticalization of kind of/kinda. In Subjectification, Intersubjectification and Grammaticalization, K. Davidse, L. Vandelanotte & H. Cuyckens (eds), 315–346. Berlin: Mouton de Gruyter.
Marques, M. A. 2015. Tipo. Référentiation et modalisation dans des interactions verbales orales. In Travaux et documents 60 (Faits de langue et de discours pour l’expression des modalités dans les langues romanes), M. H. Carreira (ed.), 249–260. Paris: Université Paris 8.
Mihatsch, W. 2007. The construction of vagueness: Sort-of expressions in Romance languages. In Aspects of Meaning Construction: From Concepts to Utterance, G. Radden, K. M. Köpke, T. Berg & P. Siemund (eds), 225–245. Amsterdam: John Benjamins.
Mihatsch, W. 2016. Type-noun nominals in four Romance languages. Language Sciences 53(B): 136–159.
Miskovic-Lukovic, M. 2009. Is there a chance that I might kinda sort of take you to dinner?: The role of the pragmatic particles kind of and sort of in utterance interpretation. Journal of Pragmatics 41(3): 602–625.
Noël, D. 2003. Translations as evidence for semantics: An illustration. Linguistics 41(4): 757–785.
Norén, K. & Linell, P. 2007. Meaning potentials and the interaction between lexis and contexts: An empirical substantiation. Pragmatics 17(3): 387–416.
Rosenkvist, H. & Skärlund, S. 2013. Grammaticalization in the present – the changes of modern Swedish typ. In Synchrony and Diachrony: A Dynamic Interface, A. G. Ramat, C. Mauri & P. Molinelli (eds), 313–338. Amsterdam: John Benjamins.
Teleman, U., Hellberg, S. & Andersson, E. 1999. Svenska Akademiens grammatik. Stockholm: Norstedts.
Voghera, M. 2013. A case study on the relationship between grammatical change and synchronic variation: The emergence of tipo[-N] in Italian. In Synchrony and Diachrony: A Dynamic Interface, A. G. Ramat, C. Mauri & P. Molinelli (eds), 283–312. Amsterdam: John Benjamins.

Chapter 16

From yes to innit: Origin, development and general characteristics of pragmatic markers
Anna-Brita Stenström
University of Bergen

London teenagers use the form innit (from isn’t it) as a pragmatic marker in the same way as yes, yeah and okay. This chapter discusses how this usage has developed by comparing The Bergen Corpus of London Teenage Language (COLT) with the more recent Multicultural London English Corpus (MLE). It includes a comparison with the use of the markers by adults in the spoken part of the British National Corpus (here ‘BNC Old’, for convenience),1 and in the more recent BNC2014. The comparison gives an indication of how the use of the markers has developed: for instance, it remains open whether okay will out-manoeuver yes and yeah, while innit has become an established marker.

Keywords: corpora, teenage talk, pragmatic markers

1. Introduction

What aroused my interest in the present topic was Metcalf’s introduction to his book Ok: The Improbable Story of America’s Greatest Word (2011), where okay is said to be “the most frequently spoken (or typed) word on the planet” (2011: 1). This led to the topic of this chapter, the aim of which has been to examine the spread and development over time in Britain not only of okay but also of three other pragmatic markers: yes, yeah and, not least, innit, which is discussed in detail in Pichler (ed.) (2016).

1. According to the BNC homepage “[w]ork on building the corpus … was completed in 1994” with “no new texts added after the completion”.

https://doi.org/10.1075/scl.97.16ste © 2020 John Benjamins Publishing Company

266 Anna-Brita Stenström

1.1 The markers

Yes, yeah, okay and innit will be regarded as pragmatic markers in the sense of Romero-Trillo (2012: 1), who uses the term for “constructions, such as you know, I mean, you see, well, yeah, that are present in speech to support interaction but do not generally add any specific semantic meaning to the message”. To some extent, yes, yeah, okay and innit serve the same functions in conversation, while other functions are served by only one of the markers (see e.g. Stenström et al. 2002). This will be demonstrated by means of corpus extracts. The following questions are of particular interest:
– Do yes, yeah and okay fulfil the same functions, and to what extent do the functions of the relative newcomer innit match those performed by the already established forms yes, yeah and okay?
– How has the use of the markers developed in terms of frequency since the end of the 1990s?
– Is innit gaining ground?
– Are there any signs that okay is out-manoeuvering yes and yeah?

1.2 Outline

Section 2, a brief data and methods section, is followed in Section 3 by a short summary of the origin and general characteristics of yes, yeah, okay and innit. Section 4, which discusses the functions of the markers in relation to their position in the discourse, is richly illustrated by corpus extracts and ends with a summarising discussion. Section 5 surveys the spread of the markers in relation to the teenagers’ gender, socioeconomic background and regional origin, and ends with a survey of the development of the markers over time. Section 6 concludes with a summary of the main results and some potential generalizations.

2. Materials and methods

2.1 The corpora

The study is based on four corpora of spoken English: two representing youth language, The Bergen Corpus of London Teenage Language (COLT) and The Multicultural London English Corpus (MLE), and two representing adult language, BNC Old and BNC2014. COLT was collected in 1992 by students in five areas: the London boroughs of Barnet, Camden, Hackney and Tower Hamlets, and Hertfordshire. It consists of spontaneous everyday conversations involving students aged 13 to 19, who recorded their own conversations with portable recorders in various surroundings. The total number of words is 431,528 (cf. Stenström et al. 2002). The MLE corpus, which encompasses 2,391,040 words, represents new English varieties from the late 20th century. It is made up of transcripts of informal conversation-like interviews with some 120–130 young speakers, as well as some self-recordings (cf. Cheshire et al. 2011). The spoken part of BNC Old comprises ten million words recorded from the 1980s until 1994 and is made up of informal conversations from the whole country, collected in different contexts among speakers aged 20 and upwards. Since COLT is currently part of BNC Old, the COLT component has been removed for the purpose of this study. The new version, BNC2014, or the “Spoken British National Corpus 2014”, consists of a large collection of samples of spoken language from the entire country, with speakers representing a similar age range (cf. and ).

Chapter 16. From yes to innit 267

2.2 Methods

COLT, which is easily accessible on the internet, has been used to study the spread of the markers with respect to gender, family background and school district, and it was a rich source of dialogues illustrating the use of the markers. The program that accompanies COLT on the internet allows users to search by gender, family background and school district at the press of a button. A comparison of all four corpora is intended to illustrate the use of the markers in youth as well as adult conversation at the end of the twentieth and the beginning of the twenty-first century.

3. Origin, development and general characteristics of the markers

3.1 Yes and yeah

According to the Online Etymology Dictionary, yes (adv.) derives from OE gea (West Saxon), ge (Anglian) ‘so, yes’, from Proto-Germanic *ja, *jai (a word for affirmation that is also the source of German, Danish, Norwegian and Swedish ja) and ultimately from PIE *yam. The form yeah reflects the American drawling pronunciation of yes; by 1863 it had become a colloquial form used in American English to answer a question (Online Etymology Dictionary).


Green (2000: 1303) defines yeah simply as a 20th-century adverb, while Stenström et al. (2002: 172–176) show that it can be used as a tag with an interactional, checking and chunking function, which sometimes overlaps with that of innit, as in The twelve hundred’s good, yeah (Stenström et al. 2002: 174). Carter and McCarthy (2006: 213), too, illustrate the use of yeah as a tag, as well as its use as a topic transition indicator. Other functions are to initiate a response, to act as a backchannel showing that the interlocutor is listening, and to serve as a filler when one does not know what to say (Stenström et al. 2002: 174–176).

3.2 Okay

The origin of okay has been the subject of a great deal of speculation (cf. Green 2000; Metcalf 2011), but scholars generally agree with Reid (1963–1964) that it first appeared in the Boston Morning Post on March 23, 1839, in the form o.k., an abbreviation of ‘oll korrect’, a jocular misspelling of all correct. The form soon appeared in other newspapers. Initially, it was regarded as slang and not accepted. Today it is a fully accepted expression, which came to be spelt okay in the twentieth century (cf. Metcalf 2011) and which has spread all over the world and become a common word in many different languages. It is still regarded as slang in slang dictionaries such as Green’s Cassell’s Dictionary of Slang (2000: 863) and Green’s Chambers Slang Dictionary (2005). From a pragmatic point of view, okay expresses acceptance, agreement and well-being, and acts as a device for checking an argument, ending an argument or signalling the end of a topic. Collins English Dictionary (1998: 1082), for instance, describes it as an informal sentence substitute expressing approval, agreement and so on, and Macmillan English Dictionary for Advanced Learners (2007: 1038–1039) as an interjection with various functions. Metcalf (2011: 17–18) uses the term ‘structural marker’, describing okay as a marker that signals a new stretch of discourse, indicates a change of topic, serves as a contact check, and acts as a punctuator, a filler or a closer of conversation. Perhaps its most typical feature, he says, is that it “affirms without evaluating” (2011: 14); more precisely, by itself it is “value neutral”, but like other words it acquires values from its linguistic environment.2

2. This echoes Firth’s well-known dictum “You shall know a word by the company it keeps” (Firth 1957: 112).

Chapter 16. From yes to innit 269



3.3 Innit

Innit, the contracted form of isn’t it, is not used only as the tag question isn’t it. It has developed into an ‘invariant tag’ that can refer to (i.e. mirror) any verb form, so that we hear utterances such as just wear the wig innit, they just get it off innit and after that he moved in with me innit. As an invariant tag it is used not only to ask for agreement but also to agree emphatically with what has been said. Innit represents a recent development: it is not mentioned in Green’s dictionary of slang (2000), and Carter & McCarthy’s English grammar mentions it only as a spelling representing speech in writing (2006: 237), but it is discussed in more detail in Biber et al. (1999) and in Pichler (2016), while Cambridge Advanced Learners’ Dictionary (2013: 1065–66) describes innit as a non-standard form of isn’t it “used at the end of sentence for emphasis”. As reflected in COLT, it was used by London teenagers as early as the beginning of the 1990s, and by teenagers with a multicultural background in particular (cf. Stenström et al. 2002). Searches on the internet reveal that Your Dictionary regards the use of innit as US, UK and Australian slang, while Urban Dictionary characterizes it as British slang (esp. Asian, Indian, Pakistani), emphasizing that speakers’ ethnicity is an important factor for its use. It has been argued that innit derives from the chav subculture, with ‘chav’ understood as a pejorative epithet that became popular in the first decade of the 21st century with reference to young, lower-class boys and girls who were said to behave badly, lack education and resist learning English.
That innit was not yet a fully accepted expression as late as 2013 was demonstrated by an article in The Guardian of October 15, 2013, which reported that a school in south London had banned pupils from using certain slang words in classrooms and corridors, among them innit, and had also forbidden them to begin and end sentences with yeah, in order to make them aware of proper language use.3

4. Position and function

4.1 Factors contributing to the functions of the markers

Several factors contribute to the functions of yes, yeah, okay and innit in discourse, such as who is talking to whom, what the conversation is about, the context in which the pragmatic marker occurs, and not least its position in the utterance (turn), which has been my point of departure. In order to find out how the markers are used, I checked their position by means of concordance lines (cf. Biber et al. 1998: 26 ff.), which show the markers in context. Examples (1a) to (1d) from COLT show the different positions that a marker can occupy.

270 Anna-Brita Stenström

(1) a. A: … And, he also had another video.
       B: Yes I know he had another video. Don’t bug me.
    b. A: …’s dad yeah, he walks through the bathrooms yeah
       B: Yeah.
       A: and he’s in the bath yeah, I’m …
    c. A: I think oh stupid what, what a waste. Okay you’ve gotta have certain rules
    d. A: … a year behind us in fashion.
       B: Innit man. Erm, There’re erm sushi bars…  (38201)4

Yes uttered by speaker B in (1a) occurs at the beginning of an utterance (Initial); in (1b) the first and fourth instances of yeah occur within the utterance (Medial), the second at the end (Final), while the third, uttered by B, makes up a turn of its own (Alone). Okay in (1c) occurs medially and Innit in (1d) initially. Example (1b) is particularly interesting; it shows that the instances of yeah perform different functions depending on their position (cf. Table 1, to be presented in Section 4.3). The concordance lines show that yes and yeah typically occur in initial position, okay in mid position and innit in end position. The least common position for yes, yeah and okay is at the end and for innit at the beginning of an utterance. In other words, innit stands out as being different from the other markers. Yes, yeah and okay are frequent stand-alones, while innit is seldom found in this position in the corpus. What the markers do have in common is that they all occur in all four positions.

4.2 Illustrations

The functions that the pragmatic markers perform in their different positions, summarized in Table 1 in Section 4.3, are illustrated below.

4.2.1 Functions in initial position

The uptake connects with the previous speaker’s utterance and is followed by the response proper, as in (2a) and (2b).

(2) a. A: Are you going out with Warren?
       B: Yes I am  (32601)
    b. A: She knows about it
       B: Yeah I know  (41707)

4. The corpus citations in Section 4 refer to the number of the individual texts in COLT.




In (2a) and (2b) Yes and Yeah link up with the preceding utterance and introduce the response realized by I am and I know, with stress on the verb. The response can either answer a question or serve as a reply more generally, as in (3a) to (3d).

(3) a. A: Was it good?
       B: Yes I enjoyed it immensely.  (41201)
    b. A: Were you attracted to him then?
       B: Yeah I was really attracted …  (42704)
    c. A: I’m turning this off.
       B: Okay okay no don’t don’t  (41107)
    d. A: Mm he’s a dickhead in he?
       B: Innit Dad’s a pig.  (39506)

In (3a) to (3d) the response is realized by yes, yeah, okay and innit, followed by an amplification, objection or explanation.

4.2.2 Functions in medial position

The check establishes or maintains contact between speaker and interlocutor, in a way equivalent to the function of the pragmatic marker ‘you know’; see (4a) to (4c).

(4) a. A: … me up and stuff yeah? for a joke yeah? and my mum, came and she’s yeah and …  (40504)
    b. A: … had to walk past James okay cos everyone else was away  (40703)
    c. A: I wasn’t talking about her innit I was talking about the girl behind  (35207)

In examples (4a) to (4c), yeah, okay and innit serve one and the same function, equal to that of ‘you know’, ‘see what I mean’, ‘right’, each speaker establishing or maintaining rapport with the addressee by involving him/her in the discourse. Notice the question mark after yeah, which highlights the checking function.

The query has a questioning effect similar to ‘d’you see what I mean’, as in (5a) and (5b).

(5) a. A: … is directly in the way of the sun yes? so when you try and look at the sun you …  (51303)
    b. A: That all you can say innit can’t say nothing else  (34804)

In (5a), a teacher informs pupils about the sun, expecting them to agree in silence. In (5b), on the other hand, innit emphasizes a reprimand.

The punctuator is equivalent to a punctuation mark in writing, as illustrated in (6).




(6) …yeah and I can’t , I can’t get myself pushed cos if they push me too hard I usually ache . It’s like going {nv} sound effects {/}, cos I ain’t got asthma or shit like that but I can’t breathe , that well okay . so I tell this to that , lady yeah , she goes , {mimicking} well I’ll get , the teachers to calm down on you a bit {/} well I hadn’t even done games once {laughing} so you know , okay {/}. So that was a bit eh strange that . calm down . I don’t realise they’ve been doing anything to me yet okay . and . it’s that simple okay , and mm the day I have games mister was asking me to do fifty pick-ups run around the , circuit five hundred  (40504)

Here, okay marks the end of a chunk of speech similar to what comes between commas and periods in writing (cf. Briz 1998). Notice that this function is very similar to checking.

The reorienter indicates a change of direction in mid-turn, as in (7).

(7) A: … and services available in all areas Okay turn the page please  (35907)

Okay indicates that the conversation is taking a different direction, that ‘services available’ is a finished topic (see 8).

The closer suggests/initiates the end of a conversation:

(8) A: Okay Mum, bye . Bye people . Bye everyone , see you some other time  (37904)

Okay indicates the end of the conversation. Only okay can be used for this purpose.

4.2.3 Function in final position

The trigger invites a response, as in (9a) to (9c).

(9) a. A: You know crumble yeah?
       B: Yeah  (40806)
    b. A: Next time you give it to me! okay?
       B: No  (35207)
    c. A: You got visitors innit
       B: What?
       A: You got visitors  (34602)

In examples (9a) to (9c), the markers serve to ask for or invite a response. In this function they can be replaced by ‘don’t you’, ‘won’t you’ or ‘haven’t you’, all having a response-triggering effect.




4.2.4 Functions in stand-alone position

The reactor agrees, confirms, etc., as in (10a) to (10d).

(10) a. A: Can you take the rest?
        B: Yes.  (41203)
     b. A: It’s like that Bulger thing , yeah?
        B: Yeah  (42706)
     c. A: I’ll hold the Walkman
        B: Okay
        A: Keep talking  (40802)
     d. A: These batteries are crap
        B: Innit
        A: Buy rechargeable ones  (40503)

As demonstrated in (10a) to (10d), the marker can be a direct answer to a yes/no question, an agreement, or a confirmation. Yes is equivalent to ‘yes I will’, yeah to ‘I agree’, and okay is confirming. Innit differs from the other markers by adding emphasis.

The backchannel signals that the hearer is following; see (11a) and (11b).

(11) a. A: They still care a bit though
        B: Yes
        A: cos like , on Friday …  (40504)
     b. A: Oi he did have one break
        B: Yeah
        A: he came and watched the football  (42105)

As backchannels, yes and yeah signify ‘go on I’m listening’ without interrupting the current speaker. This is particularly obvious in (11a), where cos like indicates that speaker A finishes his sentence without paying (audible) attention to B’s Yes.

4.3 Positions and functions in COLT

Extracts (2a) to (11b) show that some of the markers perform one and the same function, while others are more versatile and perform several functions. An overview of the functions of the markers in the various positions attested in the material is presented in Table 1, based on a scrutiny of the COLT concordances. An ‘x’ indicates that a marker fulfils a particular function.


Table 1.  Positions and functions in COLT

Position   Function      Yes   Yeah   Okay   Innit
Initial    Uptake        x     x
           Response      x     x      x      x
Medial     Check         x     x      x      x
           Query         x     x      x      x
           Punctuator          x      x
           Reorienter          x      x
           Closer                     x
Final      Trigger             x      x      x
Alone      Reactor       x     x      x      x
           Backchannel   x     x

As Table 1 illustrates, the markers are not entirely interchangeable. They agree with regard to pragmatic function except in five cases:
– Only yes and yeah serve as uptakes
– Only yeah and okay serve as punctuators and reorienters
– Only okay serves as a closer
– Only yeah, okay and innit serve as triggers
– Only yes and yeah serve as backchannels
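The comparisons drawn from Table 1, both here and in Section 6, come down to set operations on a small function-by-marker grid. The Python sketch below encodes the table as a mapping from function to the markers that perform it, and derives three things from it: the functions not shared by all four markers, a per-marker count of functions, and the yes/innit contrast. The cell values are one reading of the extracted table layout. That reading agrees with examples (11a) and (11b), which show yes and yeah as backchannels, and with the summary in Section 6. The helper names are hypothetical and are not part of the author's method, which relied on manual scrutiny of concordance lines.

```python
# Sketch: Table 1 encoded as "function -> set of markers that perform it".
# Cell values are an interpretation of the table layout, not the author's data
# file; the helper names below are hypothetical.
MARKERS = ("yes", "yeah", "okay", "innit")

TABLE_1 = {
    "uptake": {"yes", "yeah"},                      # initial
    "response": {"yes", "yeah", "okay", "innit"},   # initial
    "check": {"yes", "yeah", "okay", "innit"},      # medial
    "query": {"yes", "yeah", "okay", "innit"},      # medial
    "punctuator": {"yeah", "okay"},                 # medial
    "reorienter": {"yeah", "okay"},                 # medial
    "closer": {"okay"},                             # medial
    "trigger": {"yeah", "okay", "innit"},           # final
    "reactor": {"yes", "yeah", "okay", "innit"},    # alone
    "backchannel": {"yes", "yeah"},                 # alone
}


def restricted_functions(table):
    """Functions NOT shared by all four markers, mapped to a sorted marker list."""
    return {
        function: sorted(markers)
        for function, markers in table.items()
        if len(markers) < len(MARKERS)
    }


def versatility(table):
    """Number of functions each marker performs."""
    return {
        marker: sum(marker in markers for markers in table.values())
        for marker in MARKERS
    }


def contrast(table, a, b):
    """Functions performed by marker a but not b, and by b but not a."""
    only_a = sorted(f for f, m in table.items() if a in m and b not in m)
    only_b = sorted(f for f, m in table.items() if b in m and a not in m)
    return only_a, only_b


if __name__ == "__main__":
    for function, markers in restricted_functions(TABLE_1).items():
        print(f"{function}: {', '.join(markers)}")
    print(versatility(TABLE_1))
    print(contrast(TABLE_1, "yes", "innit"))
```

Six functions come out as restricted. Punctuator and reorienter are performed by the same two markers, which is why the text counts five cases. The counts rank yeah first, then okay, yes and innit. The yes/innit contrast returns uptake and backchannel for yes only and trigger for innit only, matching the three exceptions mentioned in Section 6.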

4.4 Summing up

4.4.1 Functions

A check in the COLT concordances confirms that yes and yeah are typically used as responses, which is reflected in their frequent occurrence at the beginning of an utterance or standing alone. Okay, on the other hand, dominates as a contact check in mid-utterance position and is often found utterance-finally with a response-triggering effect. Because it originated as a tag question, innit seldom serves as a response. Consequently, it seldom occurs at the beginning of an utterance and rarely stands alone, but is typically found in mid-utterance position with a checking effect or at the end with a triggering effect. Okay and yeah are the most versatile markers, with a number of functions in common, but while yeah, unlike okay, is used as an uptake, only okay has a closing function. Unlike yeah and okay, yes does not seem to occur as a mid-utterance check, and it is not found as an utterance-final trigger like yeah and innit. In quite a few cases okay serves as a kind of non-committal ‘first resort’, filling a space while the speaker is making up his mind how to go on, and the same goes for yeah, as in (12).



(12) A: it’s the seeds of the poppy
     B: yeah . no . that’s not , cos you, you eat them  (40202)

Innit is the marker that adds most emphasis. At the beginning of an utterance, it reflects strong agreement (‘you’re absolutely right’) or surprise (‘oh’, ‘really?’), and when occurring alone, it expresses strong agreement (‘indeed’) or disbelief (‘are you sure?’).

4.4.2 Structures

In their role as an uptake introducing a response to a question, yes and yeah are followed by a brief clause that constitutes the response proper, a clause that is generally worded differently depending on which of the two pragmatic markers realizes the uptake. Consequently, we get yes I am/have/know, yes it is/was, yes you can/did/have and so on, and yeah I know/mean/suppose/think and yeah that’s right/true/enough/cool. By contrast, neither okay nor innit is found as an uptake introducing a response. As a response, yes is typically followed by a clause beginning with the conjunctions and or but or by the initiator well, as in (13), while yeah is more often followed by a clause beginning with and or cos (‘because’), as in (14).

(13) A: Sir d’ya have to round it up?
     B: Yes well round it up to what?  (40702)

(14) A: was it like an old Dallas?
     B: yeah cos they’ve …  (33905)

Okay tends to be followed by a conjunction, in particular and, but, cos, if, or so, by the adverbs now and then, a personal pronoun (I, you, we, etc.), by let’s, or another pragmatic marker, e.g. well, yeah, or a filled pause (erm). Innit is extremely rare as a response/answer. Consider (15), where innit acts as a confirming response to a statement rather than as an answer to a question.

(15) A: They’re about a year behind us in fashion
     B: Innit man  (38201)

5. The distribution of the markers in the corpora

This section provides an overview of the distribution of the markers in the corpora. It starts with their frequency of use in COLT in terms of gender, socioeconomic background and spread across five London districts, and ends with their development across time, as reflected in all four corpora. The frequencies are normalized to occurrences per thousand words in all four figures.
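Normalization matters here because the four corpora differ greatly in size, so raw counts cannot be compared directly. A minimal Python sketch of the per-thousand-words normalization used in the figures follows. The function names are hypothetical, and the counts in the usage lines are invented toy numbers, not data from BNC Old, COLT, MLE or BNC2014.

```python
def per_thousand(hits: int, corpus_words: int) -> float:
    """Normalize a raw count to occurrences per thousand words of running text."""
    if corpus_words <= 0:
        raise ValueError("corpus size must be a positive number of words")
    return hits / corpus_words * 1_000


def per_million(hits: int, corpus_words: int) -> float:
    """The same rate per million words (1,000 times the per-thousand rate)."""
    return per_thousand(hits, corpus_words) * 1_000


if __name__ == "__main__":
    # Hypothetical toy counts, NOT taken from any of the four corpora:
    # the larger corpus has more raw hits but the lower normalized rate.
    print(per_thousand(40, 200_000))  # about 0.2
    print(per_thousand(60, 600_000))  # about 0.1
```

The toy case shows why normalized rates, not raw hits, are plotted. The larger corpus yields more tokens of a marker but a lower rate per thousand words.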

5.1 Distribution across genders in COLT

The distribution of the markers in COLT displayed in Figures 1 to 3 is accounted for in relation to the students’ gender, socioeconomic background and area of residence, which generally coincides with their school district.5

Figure 1.  Distribution of the pragmatic markers across genders in COLT (frequencies per thousand words; yes, yeah, okay and innit for boys and girls)

As Figure 1 shows, yeah and okay dominate among the boys and are used slightly less by the girls. The figures for yes and innit show that the boys are the most frequent users of yes and the girls the most frequent users of the less common innit. That innit6 is most frequent in the girls’ conversations is not surprising. Many of the female recruits and their friends have a multi-ethnic background, which in turn is related to their working-class background (Figure 2) as well as to the area where they live, Hackney in particular (cf. Stenström et al. 2002: 190). A comparison of COLT and the more recent Linguistic Innovators Corpus (LIC) has pointed to a decline in the use of okay and has shown that girls were more frequent users of okay than boys at the beginning of the 21st century (see Torgersen et al. 2011).

5.2 Distribution across socioeconomic classes in COLT

Figure 2 shows the distribution of the markers in relation to socioeconomic class. Apart from yeah, the raw figures were low because adequate information about the speakers was often lacking. Information about the students’ socioeconomic background was often missing in the log books accompanying the recordings, since the student who recorded the conversation had left the box for class empty. The fact that the students recorded their own conversations did not really favour exact annotations about the co-speakers’ background. Even so, the figures do seem to indicate three things about the beginning of the 1990s. First, yes and okay were more frequent among the upper- and middle-class students. Second, yeah, with the highest figures, dominated among the middle-class students, with the upper-class students slightly behind. Third, innit was typically a lower-class marker and was most frequent in Hackney (cf. Figure 3).

5. In order to obtain the COLT data, a number of school districts in London were approached, where students volunteered to record conversations they were engaged in with their friends for a few days.
6. The pragmatic use and the tag question use have not been separated.

Figure 2.  Distribution of the pragmatic markers across socioeconomic classes in COLT (frequencies per thousand words; upper, middle and lower class)

Figure 3.  Distribution of the pragmatic markers across the five London boroughs represented in COLT (frequencies per thousand words; Hertfordshire, Barnet, Camden, Hackney and Tower Hamlets)

5.3 Distribution across London boroughs in COLT

The students taking part in the recordings that make up COLT came from the five areas shown in Figure 3. Hertfordshire and Barnet are considered upper/middle-class areas, Camden middle/lower class, and Hackney and Tower Hamlets lower-class areas (cf. Key statistics for local authorities: Office of population censuses and surveys 1994; for a discussion of borough and social class, see Stenström et al. 2002: 20 f.). As Figure 3 shows, yeah is the most common marker overall. Among the other markers, yes is most common in Camden, followed by okay, whereas the order is reversed in all other areas except Hackney. In Hackney, which has a large proportion of teenagers with a multi-ethnic background, innit takes precedence over both yes and okay. Innit is least often used by students in upper-class Hertfordshire. As regards frequency, yeah is favoured in all areas, while according to the COLT findings yes and okay are largely upper/middle-class words and innit is largely a middle/lower-class word.

5.4 Frequencies compared: BNC Old, COLT, MLE and BNC2014

Figure 4 shows the spread of the markers in adult and in youth conversation at the end of the twentieth century and the beginning of the twenty-first century, as represented in the BNC Old, COLT, MLE and BNC2014 corpora.

Figure 4.  Distribution of the pragmatic markers in the four corpora investigated (frequencies per thousand words; BNC Old, COLT, MLE and BNC2014)

Judging by Figure 4, yes was comparatively frequent in BNC Old and has decreased slightly in frequency among the adult population since the end of the 20th century. It has nonetheless remained more frequent among adults than among young speakers, as the lower bars for the youth language corpora show, and it is still comparatively frequent in BNC2014. Yeah, the most common marker overall, is getting more and more frequent as time goes by. Okay has risen in frequency among the adult population since the end of the 20th century, although it is not as common as in MLE, nor as frequent today as yeah. Innit, which is less frequent in adult language than in youth language, appears to be even less frequent in BNC2014 (at 0.10 per thousand words) than it was in BNC Old (0.16 per thousand words). The development across time seen in Figure 4 is reflected in Table 2.

Table 2.  Rank order: From BNC Old to BNC2014 (for each marker, the four corpora are ranked by normalized frequency; 1 = the corpus in which the marker is most frequent)

         BNC Old   COLT   MLE   BNC2014
yes      1         3      4     2
yeah     4         3      1     2
okay     4         2      1     3
innit    3         2      1     4
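The diachronic conclusions below can be checked mechanically against Table 2. The sketch assumes that Table 2 ranks the four corpora for each marker, with 1 as the corpus where the marker is most frequent. This is the only reading of the extracted grid under which every marker's ranks form a permutation of 1 to 4, and it agrees with the prose: yes peaks in BNC Old, and innit is lowest in BNC2014. Because the ranks compare corpora only within one marker, the sketch compares the two adult corpora marker by marker. It also uses the two innit rates quoted in the text to show the size of the change. The helper names are hypothetical.

```python
CORPORA = ("BNC Old", "COLT", "MLE", "BNC2014")

# Table 2 read as: rank of each corpus for a given marker
# (1 = corpus in which the marker is most frequent). An interpretation.
TABLE_2 = {
    "yes": (1, 3, 4, 2),
    "yeah": (4, 3, 1, 2),
    "okay": (4, 2, 1, 3),
    "innit": (3, 2, 1, 4),
}

# Normalized frequencies of innit quoted in the text (per thousand words).
INNIT_PER_THOUSAND = {"BNC Old": 0.16, "BNC2014": 0.10}


def ranks(marker):
    """Map each corpus to its rank for the given marker."""
    return dict(zip(CORPORA, TABLE_2[marker]))


def adult_trend(marker):
    """'rise' if the marker ranks higher in BNC2014 than in BNC Old, else 'decline'."""
    r = ranks(marker)
    return "rise" if r["BNC2014"] < r["BNC Old"] else "decline"


def top_corpus(marker):
    """Corpus in which the marker is most frequent (rank 1)."""
    r = ranks(marker)
    return min(r, key=r.get)


def relative_change(old, new):
    """Proportional change from old to new (negative means a decrease)."""
    return (new - old) / old


if __name__ == "__main__":
    for marker in TABLE_2:
        print(marker, adult_trend(marker), top_corpus(marker))
    change = relative_change(INNIT_PER_THOUSAND["BNC Old"], INNIT_PER_THOUSAND["BNC2014"])
    print(f"innit, BNC Old -> BNC2014: {change:+.1%}")
```

Ranks hide magnitudes. They show that innit fell in adult speech, but only the quoted rates show the size of the fall, roughly 37.5 per cent from 0.16 to 0.10 per thousand words.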

Four conclusions can be drawn from the diachronic comparison of the four corpora in Table 2. First, yes has become less common overall, in youth language in particular. Second, yeah and okay have become more common overall. Third, innit has remained in adult speech since the 1980s/90s. Fourth, innit has not seen an increase in use, despite its popularity among young people before the turn of the millennium.

6. Conclusion

The aim of this study was to trace the distribution of yes, yeah, okay and innit from the last decade of the 20th century to the second decade of the 21st century by means of two corpora representing adult talk, BNC Old and BNC2014, and two representing youth talk, COLT and MLE. After the introduction and a brief overview of the origin of the markers, Section 4 demonstrated how the functions of the markers depend on their position in the discourse, illustrated by extracts from COLT conversations. COLT also gave an idea of the spread of the markers among London adolescents in terms of gender, socioeconomic and regional background at the end of the 20th century. It showed that innit was favoured by girls and yes, yeah and okay by boys, and that yes and okay were more frequently used by upper- and middle-class students. It also showed that yeah was least used by the lower-class students, who were the most frequent users of innit. While yeah was the dominant marker in all five areas investigated, innit was the least common marker in all areas except Hackney, where it dominated over both yes and okay.


The first question posed in the introduction was whether the established forms yes, yeah and okay, which are all positive reaction signals, have identical functions in conversation. It also asked whether the functions of the newcomer innit match those performed by the already established forms. A study of their position in the utterance – initial, medial, final or standing alone – pointed to differences in more than one respect. The most versatile marker, yeah, served all functions but one, closer. It was followed by okay, which was not used as an uptake or a backchannel. Yes and innit served fewer functions and shared them with three exceptions: yes, but not innit, was used as an uptake and a backchannel, and innit, but not yes, was used as a trigger (Table 1). The answer to the question regarding frequency of use since the end of the 1990s is tricky, since two age groups are involved: BNC Old and BNC2014 represent adult speakers, and COLT and MLE represent young speakers. If the four corpora are seen as a continuum, the end result is that yeah is the winner and innit the loser. If the youth and adult corpora are considered as separate groups, however, Figure 4 shows that the winners are yeah, okay and innit in the youth corpora versus yeah and okay in the adult corpora. It also shows that yeah and okay, but not innit, follow a rising trend into the 21st century. The follow-up questions are whether innit is gaining ground and whether okay is out-manoeuvring yes and yeah. As Figure 4 indicates, innit remains in use in BNC2014. This is partly a logical consequence of its appearance in youth speech before the turn of the century and of those young speakers coming of age, but it also shows that innit continues to be accepted by the adult population. The answer to the question whether there are any signs that okay is out-manoeuvring yes and yeah is more problematic.
All corpora considered, yeah has remained the dominant marker since the 1990s, but there are indeed some signs that okay, which is on the rise, is catching up with yes, which is in decline. Consequently, judging by this study, okay does not seem to have out-manoeuvred yeah in the language of today’s adult population. However, the popularity of okay among young people, paired with their dispreference for yes, seems to have had a contagious effect on adults, in that okay is on the rise whereas yes is on the decline. Obviously, a small-scale study like the present one can only give a hint of the development. Firmer conclusions require more, and larger, spoken-language corpora, and not least youth language corpora in order to trace new tendencies. What would be particularly interesting to investigate is the development of innit as a positive response in line with yes, yeah and okay, which has already been observed in COLT.




References
Biber, D., Conrad, S. & Reppen, R. 1998. Corpus Linguistics: Investigating Language Structure and Use. Cambridge: CUP.
Biber, D., Johansson, S., Leech, G., Conrad, S. & Finegan, E. 1999. Longman Grammar of Spoken and Written English. Harlow: Pearson Education.
BNC Old = British National Corpus. Retrieved from BNCweb (CQP edition) at Lancaster. (15 May 2020).
BNC2014 = Spoken British National Corpus 2014. Retrieved from CQPweb at Lancaster. (15 May 2020).
Briz, A. 1998. El español coloquial en la conversación. Madrid: Ariel.
Cambridge Advanced Learners’ Dictionary. 2013. Cambridge: CUP.
Carter, R. & McCarthy, M. 2006. Cambridge Grammar of English. Cambridge: CUP.
Cheshire, J., Kerswill, P., Fox, S. & Torgersen, E. N. 2011. Contact, the feature pool and the speech community: The emergence of multicultural London English. Journal of Sociolinguistics 15(2): 151–196.
Collins English Dictionary. 1998. Glasgow: Collins.
COLT = Bergen Corpus of London Teenage Language. Retrieved from (15 May 2020).
Firth, J. 1957. Papers in Linguistics, 1934–1951. London: OUP.
Green, J. 2000. Cassell’s Dictionary of Slang. London: Cassell & Co.
Green, J. 2005. Chambers Slang Dictionary. Edinburgh: Chambers.
Key Statistics for Local Authorities: Office of Population Censuses and Surveys 1994.
Longman Dictionary of Contemporary English. 1987. London: Longman.
Macmillan English Dictionary for Advanced Learners. 2007. Basingstoke: Macmillan.
Metcalf, A. 2011. Ok: The Improbable Story of America’s Greatest Word. Oxford: OUP.
MLE = Multicultural London English Corpus. Retrieved via Sketch Engine at (15 May 2020).
Online Etymology Dictionary. (15 May 2020).
Pichler, H. 2016. Uncovering discourse-pragmatic innovations: Innit in multicultural London English. In Discourse-pragmatic Variation and Change in English: New Methods and Insights, H. Pichler (ed.), 59–85. Cambridge: CUP.
Reid, L. A. W. 1963–64. American Speech.
Romero-Trillo, J. 2012. Pragmatic Markers. The Encyclopedia of Applied Linguistics. Amsterdam: Elsevier.
Stenström, A-B., Andersen, G. & Hasund, K. 2002. Trends in Teenage Talk [Studies in Corpus Linguistics 49]. Amsterdam: John Benjamins.
Torgersen, E. N., Gabrielatos, C., Hoffmann, S. & Fox, S. 2011. A corpus-based study of pragmatic markers in London English. Corpus Linguistics and Linguistic Theory 7(1): 93–118.
Urban Dictionary. (15 May 2020).
Your Dictionary. (15 May 2020).

Chapter 17

“If anyone would have told me, I would have not believed it”
Using corpora to question assumptions about spoken vs. written grammar in EFL grammars and other normative works
Sarah Schwarz and Erik Smitterberg
Uppsala University

The aim of this chapter is to critically examine the accuracy of the advice on formal, written grammar offered in EFL teaching materials and other normative works. We use corpus data to investigate the use of four grammatical features often labelled as “informal” or “spoken” and find that the guidelines presented in university textbooks and other normative sources do not always match actual usage. Our findings call for a re-examination of the role of prescriptivism in teaching materials and in the language classroom, especially regarding the use of the split infinitive, like meaning ‘such as,’ and the conjunct though.

Keywords: normative works, prescriptivism, corpus studies, EFL instruction

1. Introduction

English as a Foreign Language (EFL) textbooks at university level tend to place a great deal of emphasis on the differences between spoken and written English, as well they should: study after study (e.g. the seminal Biber 1988; Biber et al. 1999) has shown that formal, written English has a particular set of stylistic restrictions not relevant to casual conversation. However, written academic English is not immune to change. There is a well-documented tendency for spoken-language features to become increasingly acceptable in writing, a tendency that, over time, may affect even the most formal genres. This tendency towards orality in writing is known as colloquialization (Siemund 1995; Mair 1997), and it is an important process for language teachers to be aware of.

https://doi.org/10.1075/scl.97.17sch © 2020 John Benjamins Publishing Company

284 Sarah Schwarz and Erik Smitterberg

Ideally, textbook instructions on formal writing are based on descriptive findings. That is, features are not proscribed due to subjective prejudice, but because they are rarely used in academic writing. But there is always a danger that the advice that a student is given is prescriptive. That is, the teaching materials may have a bias that is not necessarily supported by evidence, which may then be adopted by teachers. This kind of bias against linguistic features that are deemed to be inappropriate for a certain genre is termed stylistic prescriptivism by Curzan (2014: 24). Many EFL teachers are unaware of their own prejudice against the use of certain linguistic features in formal writing (Connors & Lunsford 1988). Teachers may not reflect on their preference for a particular native-speaker variety of English (assuming that a native-speaker variety is the target), or whether a feature proscribed as “too informal” in a grammar may actually be rather prevalent in writing. There is reason to question our own intuition as teachers and to examine the teaching materials that we are using, especially as regards what counts as acceptable written (vs. spoken) English. Should teachers really be warning students against split infinitives, or commenting on the use of like to mean ‘as if’ in their essays? One way of empirically substantiating such comments is to investigate the use of such features in corpora that include both spoken and written language. In this study, we pose the following main research question: To what extent does the stylistic advice in EFL textbooks (in particular Estling Vannestål 2015, the most popular EFL university grammar in Sweden) and in other normative works mirror actual usage, especially across medium (spoken vs. written)?

In order to have sufficient space to discuss our findings, we limited this study to four features, which are further discussed and exemplified in Sections 3.1 to 3.4:
– would have in conditional clauses (e.g. If the book would have contained a chapter on phonology, it would have been more suitable)
– like used as a conjunction to mean ‘as’ (e.g. Like you mentioned, this is a good argument) or ‘as if’ (e.g. It seemed like the narrator was unreliable) and like used as a preposition in exemplification (e.g. I prefer authors like Austen and Dickens)
– the conjunct though (e.g. The author’s second argument is more important, though)
– the so-called split infinitive (e.g. It was bold to even suggest changes to the syllabus)

These four features all meet two important criteria. First, they are often given labels such as “informal,” “spoken,” or “colloquial” in textbooks and reference works (see Sections 3.1–3.4), although they occur in native-speaker output and thus cannot be considered ungrammatical. Second, they all have competing expressions that are used freely in formal texts (in the examples above, had for would have; as, as if, and such as, respectively, for like; however for though; and even to suggest for to even suggest). In other words, if they are rare in writing, it is not because their functions are not needed in the written medium; they are rather proscribed in favor of alternatives that carry out the same work but are considered more suitable stylistically.

Chapter 17.  If anyone would have told me 285

2. Material and method

We investigate the use of the four features in the 560-million-word Corpus of Contemporary American English (COCA) (Davies 2008), which at the time of writing (2019) includes material from 1990–2017. The choice of American English is not a given: EFL teaching at Swedish universities normally focuses on both Standard British English (BrE) and Standard American English (AmE) as the target varieties. It would therefore also be of interest to use a British corpus of spoken and written English (such as the 2014 update of the British National Corpus) to answer our research questions. However, AmE was selected here for two reasons: (i) our students in Sweden are inarguably exposed to a great deal of AmE, especially spoken AmE in film, television,1 and video games; (ii) colloquialization has been shown to be more advanced in AmE as regards several linguistic features (Siemund 1995; Mair 1997; Mair 2006: 189f). COCA’s 560 million words are evenly divided between five genres: spoken, fiction, popular magazines, newspapers, and academic texts. For this study, we were interested in features which are claimed to be speech-related, so it was important to search for them in spoken language as well as in at least one written genre.2 The most formal genre (texts from academic peer-reviewed journals) was the preferred choice, as this is the genre that university students are most often encouraged to master.
However, other written genres have also been studied in order to nuance the picture or, in one case, to replace the academic genre owing to a lack of relevant data.

1. Sweden is a subtitling country rather than a dubbing country. Rubio and Martínez Lirola (2010: 32f) find a correlation between subtitling and higher EFL proficiency in their investigation of English proficiency in EU member states. They also find that American programming predominates on European airwaves.

2. The spoken language section of COCA consists mainly of broadcast speech and does not represent the “linguistic baseline” of casual, face-to-face spoken interaction identified by Mair (2006: 183). Findings based on this genre are thus not necessarily generalizable to conversation. For our purposes, of course, findings are even more interesting if there is a difference between less-casual speech and formal writing.

286 Sarah Schwarz and Erik Smitterberg

Although COCA’s enormous size is an obvious advantage, it is also a corpus that requires a degree of methodological rigor to use responsibly. For one thing, the corpus tags are not always reliable: our search for though as a conjunct had to be revised because of tagging errors (see Section 3.3 for details). For another, search results must sometimes be manually examined for relevant tokens, as described throughout Section 3. Sometimes it is possible to recover every example of a feature in the corpus, but often the large size of the corpus prohibits total recall. For frequently occurring features, we therefore gathered large, randomized samples of possible candidates of each feature (normally 500 per genre). These subsets were manually examined for valid tokens. The percentage of valid tokens in the data set was multiplied by the total number of tokens returned by that search in COCA, and this extrapolated raw frequency was used to estimate the per-million-word (pmw) frequency of the feature in the corpus section. Our search for like in the sense ‘as if’ exemplifies the procedure:

a. the word like is entered into COCA’s search field
b. this search returns 347,142 tokens in the Spoken genre of COCA
c. 500 randomized tokens are collected and manually examined
d. 53 of these 500, or 10.6% (c. 11%), are valid tokens where like has the sense ‘as if’
e. this percentage (10.6%) is multiplied by the total tokens returned by the search (347,142) to extrapolate the raw frequency of valid tokens in the genre
f. the extrapolated raw frequency (36,797.05) is divided by the word count of the Spoken genre (118,167,133) and multiplied by 1 million
g. the extrapolated per-million-word frequency of like meaning ‘as if’ in the Spoken genre of COCA is 311.4
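Steps a–g amount to scaling a sample proportion up to the full hit count and then normalizing by the size of the genre. The Python sketch below reproduces the worked example; it is an illustration, not the authors' own tooling, the function name `extrapolate_pmw` is hypothetical, and it assumes the hand-coded sample is a simple random sample of the hits. Note that the unrounded proportion (53/500 = 10.6%) is what yields the figures in steps f and g.

```python
def extrapolate_pmw(hits: int, sample_size: int, valid_in_sample: int,
                    genre_word_count: int) -> tuple[float, float]:
    """Estimate raw and per-million-word (pmw) frequency from a coded sample.

    hits: total tokens returned by the corpus query in the genre
    sample_size: number of randomly sampled hits examined by hand
    valid_in_sample: number of sampled hits that are valid tokens
    genre_word_count: size of the genre section in words
    """
    proportion_valid = valid_in_sample / sample_size   # 53 / 500 = 0.106
    est_raw = proportion_valid * hits                  # extrapolated raw frequency
    est_pmw = est_raw / genre_word_count * 1_000_000   # normalized per million words
    return est_raw, est_pmw


# Figures from steps b-g for like 'as if' in the Spoken genre of COCA
raw, pmw = extrapolate_pmw(hits=347_142, sample_size=500,
                           valid_in_sample=53, genre_word_count=118_167_133)
print(f"{raw:,.2f} tokens, {pmw:.1f} pmw")  # prints: 36,797.05 tokens, 311.4 pmw
```

Rounding the proportion to 11% before multiplying would inflate the raw estimate by roughly 1,400 tokens, which is why the calculation carries the unrounded value through.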

3. Results and discussion

In this section, we present the results of our analyses. The features are addressed in order of how prevalent they turned out to be in academic texts as compared with broadcast dialogue, beginning with the least frequent feature in the academic genre: conditional would have.

3.1 Conditional would have

In a hypothetical conditional construction with past reference, as in (1), the speaker does not believe that the condition – which may be implicit or explicit – was fulfilled; the construction thus conveys “the probable or certain falsity of the proposition expressed by the matrix clause” (Quirk et al. 1985: 1091).


(1) Instead of ending Vine, it would have been better if Twitter had learned how to integrate video and community into its main platform, […]  (COCA, Magazine, 2017)

When such a conditional is described in university grammars, the conditional-clause verb phrase features had + past participle, while the verb phrase in the matrix clause consists of would have + past participle (Collins 2017: 388–389; Estling Vannestål 2015: 207–208; Svartvik & Sager 1996: 97). However, especially in speech, native-speaker usage is frequently more varied than the typology of conditional constructions in learner grammars allows for (Gabrielatos 2005: 6; Römer 2007: 361). In addition to the pattern in (1), native speakers use a variant structure where both clauses feature would have + past participle, as in (2).

(2) I hate to think what would have happened if I would have used the knife.  (COCA, Spoken, 1992)

Quirk et al. (1985: 1011n.) claim that the variant in (2), which will be referred to as “conditional would have” below, is restricted to informal spoken American English.3 Estling Vannestål (2015: 208) argues that it is characteristic of informal English in general but not appropriate for academic writing. Ishihara (2003: 22–23) reports that 28 of the 34 sources she checked either proscribed or excluded conditional would have, which is typically considered acceptable only when would has the volitional meaning ‘would be willing to,’ as in If they would help us, we could finish early (Quirk et al. 1985: 1011).4 However, in Ishihara’s (2003) analysis of spoken American English, conditional would have was used in 41% of spoken tokens; in addition, more than 75% of native speakers – though only a third of native speakers who were ESL professionals! – accepted conditional would have as correct in written representation of colloquial dialogue (Ishihara 2003: 30–34). Against this background, an examination of the occurrence of conditional would have in spoken as well as written Present-day American English is clearly warranted.

3. Accounting for British usage falls outside the scope of this study, but as Ishihara (2003) notes, Denison (1998: 300) and Molencki (2000: 318–319) cite British occurrences of conditional would have from Late Modern and Present-day English, respectively. Quirk et al.’s (1985: 1011n.) statement may thus to some extent be part of the well-attested tendency to associate informality in usage with American English by default. Hancock (1993: 241), however, appears to regard it as an American development.

4. No convincing tokens of conditional would have with volitional would were found in our data.


In our investigation of COCA, we searched for would have preceded by if, with a window of up to four words.5 Conditional would have seems marginal in academic texts: the first 50 of the 139 hits returned by the search were checked, and only four of those 50 were relevant tokens of conditional would have, the other 46 either being examples of other constructions or occurring in quoted speech. For this reason, we instead considered Magazines, a written but less formal genre, to see whether conditional would have had made at least some inroads into written Standard AmE. The Spoken genre was used as a basis for comparison. As only 174 potential tokens were attested in Magazines, all hits returned by the search were considered. The Spoken genre returned 809 hits, a random sample of 200 of which were examined manually. While 145 of the 200 tokens in Spoken were relevant (72.5%), only 40 relevant tokens (23%) were attested in Magazines. Normalized frequencies (estimated for Spoken; exact for Magazines) were calculated; the results are given in Figure 1.

[Bar chart: y-axis pmw (0–6); bars for Spoken and Magazines]
Figure 1.  Normalized frequencies of conditional would have in Spoken and Magazines in COCA (see Table A1 in the Appendix)
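Applying the extrapolation procedure from Section 2 to the counts above shows where the Spoken estimate of c. 5 pmw comes from. This is a minimal sketch, assuming that the Spoken word count reported in Section 2 (118,167,133) also applies to this search; the variable names are illustrative only.

```python
SPOKEN_WORDS = 118_167_133  # Spoken word count reported in Section 2 (assumed to apply here)

# Spoken: 809 hits, a random sample of 200 coded by hand, 145 of them valid
spoken_share = 145 / 200                      # proportion of valid tokens in the sample
spoken_est_raw = spoken_share * 809           # extrapolated raw frequency (about 586.5)
spoken_pmw = spoken_est_raw / SPOKEN_WORDS * 1_000_000

# Magazines: all 174 hits were coded, so the 40 valid tokens are an exact count
magazines_share = 40 / 174

print(f"Spoken: {spoken_share:.1%} valid, ~{spoken_pmw:.2f} pmw")  # Spoken: 72.5% valid, ~4.96 pmw
print(f"Magazines: {magazines_share:.0%} valid (exact count)")     # Magazines: 23% valid (exact count)
```

Only the Spoken figure is an estimate; because every Magazines hit was examined, no extrapolation is needed there.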

As Figure 1 shows, frequencies are very low compared with those reported in Sections 3.2 to 3.4; however, this is partly because conditional would have is a complex clausal construction, which is likely to have a low frequency overall. If that factor is taken into account, conditional would have seems to be a reasonably established feature of spoken American English, with a normalized frequency of c. 5 pmw. In contrast, the very low frequency in Magazines (0.34 pmw) indicates that the construction remains largely confined to speech. This hypothesis is strengthened by a look at the distribution of the 40 tokens in this genre: 28 (70%) occur in quoted direct speech, as indicated by quotation marks. The connection to spoken usage thus remains strong.6

Against this background, two interrelated questions arise: (i) what the future of conditional would have may be and (ii) whether and how conditional would have should be taught to learners. As Denison (1998: 300) notes, had + past participle used to be possible in both the conditional and the matrix clause; as a modal is now required in the matrix clause, modal usage may spread to the conditional clause and restore parallelism. If this construction in turn spreads to the written language, most likely as an instance of colloquialization, teachers and textbooks will need to take that development into account. For instance, Ishihara (2003: 40–43) suggests that conditional would have should be taught as an alternative to had + past participle in informal settings.

5. The contracted form ’d have was excluded for two reasons: unlike would have, it proved impossible to retrieve ’d have with if as a context word in the COCA interface (search carried out on July 4, 2019); and ’d have is sometimes perceived by native speakers as a contraction of had have rather than would have in conditional clauses (see Ishihara 2003: 28–29).

3.2 Three uses of like

As noted by D’Arcy (2006: 339–340), the form like is highly multi-functional. Several of its functions have become the target of prescriptive criticism, for instance quotative like in speech, as in They’re like, “Okay we’ll pay you now” (from D’Arcy 2006: 240). This section focuses on three other uses of like which, unlike quotative like, occur in writing but are nevertheless sometimes objected to: the conjunction like ‘as,’ as in (3); the conjunction like ‘as if,’ as in (4); and the preposition like ‘such as,’ as in (5).

(3) Like my dad used to say, if you’re such a genius, why can’t you get three lines on “Law & Order?”  (COCA, Spoken, 2016)

(4) I felt like I was a Venetian blind.  (COCA, Spoken, 2013)
(5) It rapidly became clear that the EEG was utterly unable to identify even major cerebral events like delusions, hallucinations, or obsessions.  (COCA, Academic, 2001)

Like Estling Vannestål (2015: 426), Quirk et al. (1985: 1110) label both uses of the conjunction “informal”; they also argue that the conjunction like is especially characteristic of American English and that its ‘as’ function is “widely criticized but common in informal style,” while the ‘as if’ function is “widely regarded as nonstandard” (Quirk et al. 1985: 662). The Guardian and Observer Style Guide suggests that the conjunction like “will annoy many readers.” Huddleston and Pullum (2002: 1158) argue that, while there is a “tradition of prescriptive opposition” to the conjunction like, it is quite frequent; they also regard it as more popular, and less associated with informality, in American than in British English. The prepositional ‘such as’ function is sometimes claimed to be characteristic of spoken English (Longman Dictionary of Contemporary English, s.v. like) or of informal English, especially when more than one example is mentioned (Concise Oxford Dictionary of Current English, s.v. like), as in example (5). Huddleston and Pullum (2002: 1157) claim that, although this use of like is proscribed in some style manuals, it is “very common and in no way restricted to informal style.” Some sources also indicate a semantic difference between like and such as; for instance, The Guardian and Observer Style Guide suggests that like excludes while such as includes, so that in “cities like Manchester” like would mean ‘that are similar to’ rather than ‘such as.’ However, it is doubtful whether such prescriptions mirror current usage; in (5), for example, delusions, hallucinations, and obsessions seem clearly intended to be included in the class of major cerebral events (see also Huddleston & Pullum 2002: 1157, who argue that “there is no requirement that the resemblance stop short of inclusion or identity”).

To gauge the extent to which these three functions of like are used in formal writing, we sampled 500 random instances of like from each of the two genres Spoken and Academic in COCA. These two random samples were examined manually to identify tokens of the ‘as,’ ‘as if,’ and ‘such as’ functions, and estimated raw and normalized frequencies of these functions in the two genres were computed according to the methodological set-up outlined in Section 2. The normalized frequencies are presented in Figure 2.

[Bar chart: y-axis pmw (0–350); function groups ‘as,’ ‘as if,’ and ‘such as’; series Spoken and Academic]
Figure 2.  Estimated normalized frequencies of the ‘as,’ ‘as if,’ and ‘such as’ functions of like in COCA (Spoken and Academic; see Table A2 in the Appendix)

6. To some extent, the low frequency in Magazines may also be due to a paucity of past hypothetical conditionals in general. When 100 tokens of if from each of the Spoken and Magazines genres were examined manually, the analysis revealed that not a single token in Magazines was a past hypothetical conditional, while there were five relevant tokens in the Spoken genre.


As Figure 2 shows, there are differences across functions as well as genres. As regards the conjunction like, the ‘as’ function is rare in Spoken and almost non-existent in Academic, while the ‘as if’ function is quite frequent in Spoken but also very rare in Academic; for both functions, estimated frequencies for Academic should be taken with a grain of salt since raw frequencies are very low (three and twelve tokens, respectively). The ‘such as’ function is almost as common as the ‘as if’ function in Spoken, but, unlike the others, this function of like is also established in formal writing. Moreover, the tokens of like as a conjunction meaning ‘as’ or ‘as if’ in Academic turn out to be even less indicative of occurrence in formal writing than the figures imply. When these tokens were examined in detail, only three instances of the ‘as if’ function in our Academic dataset appeared to be non-speech-related; the remaining tokens were from quotations and thus not original to the academic text examined (typically from quoted speech, but also from documents such as survey questions and student responses). One of the three tokens is given in (6).

(6) Olafson (2002) conducted individual interviews and focus group interviews with adolescent girls about their experiences in school. The girls indicated that physical education is often embarrassing for them. Many of the girls felt like the boys were staring at them.  (COCA, Academic, 2007)

Although (6) is not part of a quote, even here there is an indirect relationship with a number of speech events (interviews). As also illustrated in (6), most occurrences of like ‘as if’ in both Spoken and Academic follow one of the perception verbs feel, look, and sound (41 of 65 tokens). Other fairly common verbs in this position are be and seem, typically in the construction it is/seems like (15 occurrences).

In sum, these three functions of like behave very differently. Since the ‘as’ function is quite rare overall and the ‘as if’ function is rare in Academic, it still seems advisable to instruct students to avoid using like as a conjunction in academic writing. The ‘such as’ function of like, in contrast, seems established even in formal writing.

3.3 Though as a conjunct

Though has two related uses in Present-day English: it can function as a subordinator, alone or after words such as even; and it can be a conjunct or linking adverbial meaning ‘however,’ as in (7), or ‘indeed.’7

(7) It’s only recently, though, that people who’ve lived their lives as conservatives have actually begun voting that way.  (COCA, Spoken, 1995)

7. Barth-Weingarten and Couper-Kuhlen (2002) suggest that final though is undergoing further development into a discourse marker. For the purposes of the present study, however, all non-subordinator uses were classified as conjuncts.


The adverbial use is in focus here. Adverbial though is labelled “colloquial” by the OED Online (s.v. though) and referred to as informal by Svartvik and Sager (1996: 418). Estling Vannestål (2015: 294) even suggests that “it may be a good idea to avoid it in academic writing” owing to its informality. Shaw (2009: 226) finds that students of literature overuse adverbial though compared with authors of research articles in the same field. To see whether the conjunct though is restricted to informal native-speaker production, we extracted 500 random tokens of though from each of the Spoken and Academic genres in COCA. In order to increase the proportion of valid tokens, the search was restricted to though followed immediately by punctuation, as it was deemed likely that virtually all conjunct tokens would precede a punctuation mark.8 However, it turned out that the punctuation tag was not wholly reliable: the search engine did not return any hits for though followed by punctuation after 2012, even though such tokens clearly occurred in the material when a search for though without the restriction was carried out. For this reason, only the subperiods 1990–1994, 1995–1999, 2000–2004, and 2005–2009 were included in the dataset and in the calculation of extrapolated frequencies; this limitation reduced the number of randomly selected tokens to 459 from Academic and 441 from Spoken, which were reviewed manually to identify conjunct tokens. The results of the calculations are given in Figure 3.

[Bar chart: y-axis pmw (0–160); bars for Spoken and Academic]
Figure 3.  Estimated pmw frequencies of though as a conjunct in Spoken and Academic (COCA; see Table A3 in the Appendix)

8. To ensure that punctuation had been inserted in the transcriptions of speech, 100 random tokens of though in Spoken were checked manually. Of these 100 tokens, 51 were conjuncts, and 48 of these were immediately followed by punctuation (49 if an instance with a following dash is counted). Limiting retrieval to tokens followed by punctuation may thus under-report the frequency of the conjunct though in Spoken by some 4–6%. This rate was considered acceptable.
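The 4–6% range in note 8 follows directly from the counts given there: three (or, counting the dash, two) of the 51 conjuncts would be missed by a punctuation-restricted search. The sketch below makes that arithmetic explicit; the helper name `missed_share` is hypothetical.

```python
conjuncts = 51          # conjunct tokens among 100 random Spoken hits of though
followed_by_punct = 48  # of these, immediately followed by punctuation
with_dash = 49          # if the single token followed by a dash is also counted


def missed_share(retrieved: int, total: int) -> float:
    """Share of conjunct tokens that a punctuation-restricted search would miss."""
    return (total - retrieved) / total


low = missed_share(with_dash, conjuncts)           # 2 / 51
high = missed_share(followed_by_punct, conjuncts)  # 3 / 51
print(f"under-reporting between {low:.1%} and {high:.1%}")  # under-reporting between 3.9% and 5.9%
```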


The most conspicuous finding in Figure 3 is arguably the occurrence of though as a conjunct in academic writing. While 22 tokens pmw is certainly not a high frequency, though can nevertheless be said to be fairly established in academic texts. Moreover, a quick look at diachronic tendencies suggests that this frequency may be a result of colloquialization. Between 1990 and 2009, the frequency of though followed by punctuation in the Academic genre rose from 25.25 to 39.89 occurrences pmw, an increase of nearly 60% in two decades. During the same time span, the frequency of its competitor however followed by punctuation decreased from 917.07 to 799.51 pmw. While however thus clearly remains dominant, there seems to be no reason to teach advanced learners of English to avoid though altogether in academic writing. Learners may instead be taught to use though as a far less frequent alternative to however, and to use it primarily in non-final position in academic writing: only 14% of the conjunct tokens of though in the Academic genre occurred in what Quirk et al. (1985: 490–501) call “end position,” as against 43% of conjuncts in the Spoken genre.

3.4 The “split” infinitive

The split infinitive is one of the most infamous bugbears of conservative English style guides. Perales-Escudero (2011: 314) finds “evidence that few other prescriptive rules have occupied the attention of prescriptive grammarians and speakers of English to the extent that the split infinitive has.” EFL learners, just like native speakers, are often cautioned to avoid using split infinitives, particularly in formal writing, even though, as noted by Huddleston and Pullum (2002: 581), there is no rational basis for the avoidance. Estling Vannestål (2015: 300) warns her student readers about the stylistic proscription on the split infinitive, explaining that although “in informal language, the split infinitive is very frequent,” it “is still often avoided in writing.” She encourages English learners to avoid splitting infinitives with not, never and only in particular. Perales-Escudero (2011: 320) finds that most style manuals today encourage writers to avoid splitting infinitives “wherever possible.” This suggests that splitting infinitives sometimes cannot be avoided, but style manuals normally do not offer any linguistic criteria for determining when this is the case. Quirk et al. (1985: 497), on the other hand, describe two linguistic environments where infinitives may be more likely to be split: (i) when avoidance of the split infinitive would produce ambiguity (see also Huddleston & Pullum 2002: 581–582), and (ii) when the splitting adverb is a subjunct of “narrow orientation” (e.g. to really understand; see Quirk et al. 1985: 572–612 for further discussion of this category of adverbials).

Our research aims for this case study were (i) to investigate the occurrence of split infinitives across genres in COCA to check for an association of split infinitives with informality and (ii) to conduct a separate search for infinitives split with not, never, and only in order to see whether those are particularly rare, as Estling Vannestål (2015: 300) suggests. A secondary goal of the corpus investigation was to examine the collected tokens to try to determine whether there were linguistic motivations for the split. The split infinitive was particularly easy to search for in COCA, as nearly all results of a search for to + adverb + infinitive verb appeared to be valid tokens of split infinitives, as in examples (8) and (9).

(8) So it’s a little bit surprising to suddenly see his own name appear on the list that presumably he was keeping.  (COCA, Spoken, 2000)



(9) To date, China has steadfastly refused to even contemplate binding constraints on its greenhouse gas emissions.  (COCA, Academic, 2008)
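The to + adverb + infinitive verb query can be approximated outside the COCA interface as a pattern over part-of-speech-tagged text. The sketch below is a hypothetical illustration, not the query the authors ran: it assumes word_TAG input with CLAWS7-style tags (TO for infinitival to, tags beginning with R for adverbs, VVI/VBI/VDI/VHI for infinitive verb forms), the tags in the sample string are illustrative, and the function name `find_split_infinitives` is invented for the example.

```python
import re

# Assumed CLAWS7-style tags: TO = infinitive marker, R* = adverb,
# VVI/VBI/VDI/VHI = infinitive of a lexical verb / be / do / have.
SPLIT_INF = re.compile(r"\b(to)_TO\s+(\w+)_R[A-Z]+\s+(\w+)_V[VBDH]I\b")


def find_split_infinitives(tagged: str) -> list[tuple[str, str, str]]:
    """Return (to, adverb, verb) triples matching to + adverb + infinitive verb."""
    return SPLIT_INF.findall(tagged)


# Hypothetical tagged version of part of example (9)
sample = ("China_NP1 has_VHZ steadfastly_RR refused_VVN "
          "to_TO even_RR contemplate_VVI binding_JJ constraints_NN2")
print(find_split_infinitives(sample))  # [('to', 'even', 'contemplate')]
```

Because the pattern allows exactly one adverb between to and the verb, it would miss multi-word splits such as to more fully understand; how the authors' search treated such cases is not stated.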

Figure 4 shows the results of the search for to + adverb + infinitive verb in the entirety of COCA.

[Bar chart: y-axis pmw (0–250); bars for Spoken, Fiction, Magazines, Newspapers, and Academic]
Figure 4.  Split infinitives (to + adverb + infinitive verb) in COCA, per million words (see Table A4 in the Appendix)

Figure 4 does not support Estling Vannestål’s (2015: 300) assertion that split infinitives are especially frequent in informal language. The most formal written genre, Academic, contains a higher pmw frequency of split infinitives than the other, less formal, written genres, and approaches the Spoken frequency. Even more interestingly, the Academic genre actually contains the lowest frequency pmw of non-split infinitives of any genre (13,974.92 pmw). Infinitives are simply more likely to be split in the academic texts than in the other written genres. This result challenges the common perception that split infinitives are an informal feature that tends to be avoided in academic writing.


For many of the sentences from the Academic genre, moving the adverb outside the infinitive construction would make its scope ambiguous: if the adverb preceded to, it could be understood as modifying a preceding verb rather than the infinitive, as in (10) and (11). This is the first of the two environments identified by Quirk et al. (1985: 497). Preliminarily, it would appear that, at least in formal writing, avoiding ambiguity may take precedence over avoiding split infinitives.

(10) As pressure mounts to rapidly improve levels of adolescent literacy, teachers need suggestions about how to reach learners that relate to existing standards.  (COCA, Academic, 2015)
Non-split version: As pressure mounts rapidly to improve… (rapidly may be understood as modifying mounts)

(11) The reader is advised to cautiously apply these findings to Latino families residing in different parts of the country.  (COCA, Academic, 2007)
Non-split version: The reader is advised cautiously to apply… (cautiously may be understood as modifying advised)

However, our results do not seem to support Quirk et al.’s (1985: 496) assertion that subjuncts are the most frequent splitters of infinitives. The splitting adverbs in our sentences from the Academic genre mostly appear to be adjuncts, as in (10) and (11). A separate search for infinitives split by the restricting adverbs not, never, and only was carried out in order to test whether Estling Vannestål’s (2015: 300) focus on avoiding these split infinitives in particular seemed justified. The results are shown in Figure 5.

[Bar chart: y-axis pmw (0–20); series not, never, and only; genres Spoken, Fiction, Magazine, Newspaper, and Academic]
Figure 5.  Per-million-word frequencies of infinitives split with not/never/only in COCA (see Table A5 in the Appendix)


Split infinitives with not, never, and only are certainly very infrequent in the Academic genre of COCA. However, split infinitives with never and only are very infrequent in the Spoken genre as well, so the distinction “formal vs. informal” may be misleading for never and only. Only not seems to split infinitives far more readily in spoken language than in formal writing: the pmw frequency of infinitives split by not is 341% higher in Spoken than in Academic, while the equivalent differences for never and only are 172% and 76%, respectively.

4. Conclusion

Our analysis in Section 3 shows that not all features flagged as “informal” in normative works have the expected distribution pattern over oral and written genres. In our investigation, conditional would have and the conjunction like remain infrequent in writing compared to speech, and many of the written tokens that are attested are in fact related to quoted speech. In contrast, like in the sense ‘such as,’ the conjunct though, and many uses of the split infinitive appear to be relatively established even in academic texts. Moreover, the conjunct though seems to be undergoing short-term diachronic change indicative of colloquialization. In addition, the results demonstrate that close readings of individual tokens are often necessary to shed light on the reasons that underlie usage; for instance, many of the split infinitives in the Academic genre would have been ambiguous if the adverb had preceded to, and medial position appears to make though more acceptable in academic writing.

An important question indirectly raised by the results concerns whether learners should be made to avoid features claimed and/or found to be “oral” or “informal” in the first place. Advice of that kind may of course be helpful in that the learners’ texts will potentially be evaluated more favorably by some readers. Such an approach is also problematic, though.
Lack of genre awareness – often manifesting itself as the occurrence of informal features in formal texts – is frequently cited as a characteristic of non-native-speaker production (see, for instance, the discussion in Larsson & Kaatari 2019). However, that characteristic would to some extent cease to be problematic if educated native-speaker output was no longer considered the uncontroversial norm for written English-language production, as suggested by Aijmer (2003: 16). According to Crystal (2003: 69), non-native speakers of English outnumbered native speakers by a factor of roughly three to one at the beginning of the twenty-first century, and the gap is increasing steadily owing to uneven population growth and improved opportunities for language education. Statistically, the genre awareness exhibited by the idealized native speaker is fast becoming a highly atypical norm.


This argument should not be taken to mean that linguistic genre differences should not be taught in EFL classrooms; on the contrary, raising awareness of differences across text categories is an important part of the education of advanced learners (see Section 1). But two distinctions need to be made here. First, some features of academic English are part of the linguistic make-up of texts because they are useful vehicles for accomplishing the aims of those texts; these features include the extensive use of phrasal modification (see, for instance, Biber & Gray 2011), and they should be taught to any speaker who needs to produce academic English. However, the features considered in this study are simply alternative ways of fulfilling functions that are necessary in texts in general (see Section 1); in other words, they are non-academic only because they tend to occur primarily in non-academic texts (or, in the case of at least the split infinitive, because they are claimed to do so). Second, there is a difference between proscribing certain features and making learners aware of perceived differences and empowering them to make their own choices (see the discussion in Swales 2001: 53–54). In the extralinguistic context in which English is used globally today, promulgating the norms of a subset of native speakers without problematizing this practice is becoming an increasingly ideological choice.

Acknowledgment

We are grateful to two anonymous reviewers and to the editors for their constructive feedback.

References

Aijmer, K. 2003. Engelska – ett språk utan gränser. In Gränser: Humanistdag-boken 16, E. Ahlstedt, M.-L. Follér, A. Lundqvist, M. Nyman, T. Magnusson, T. Olsson & B. Ryder Liljegren (eds), 13–20. Gothenburg: Faculty of the Humanities, Gothenburg University.
Barth-Weingarten, D. & Couper-Kuhlen, E. 2002. On the development of final though: A case of grammaticalization? In New Reflections on Grammaticalization [Typological Studies in Language 49], I. Wischer & G. Diewald (eds), 345–361. Amsterdam: John Benjamins.
Biber, D. 1988. Variation across Speech and Writing. Cambridge: CUP.
Biber, D. & Gray, B. 2011. Grammatical change in the noun phrase: The influence of written language use. English Language and Linguistics 15(2): 223–250.
Biber, D., Johansson, S., Leech, G., Conrad, S. & Finegan, E. 1999. Longman Grammar of Spoken and Written English. Harlow: Pearson.
Collins = Collins COBUILD English Grammar, 4th edn. 2017. Glasgow: Collins.
The Concise Oxford Dictionary of Current English, 9th edn. 1995. Oxford: Clarendon Press.
Connors, R. J. & Lunsford, A. A. 1988. Frequency of formal errors in current college writing, or Ma and Pa Kettle do research. College Composition and Communication 39(4): 395–409.


Crystal, D. 2003. English as a Global Language, 2nd edn. Cambridge: CUP.
Curzan, A. 2014. Fixing English: Prescriptivism and Language History. Cambridge: CUP.
D’Arcy, A. 2006. Lexical replacement and the like(s). American Speech 81(4): 339–357.
Davies, M. 2008. The Corpus of Contemporary American English: 450 million words, 1990–present. (10 July 2019).
Denison, D. 1998. Syntax. In The Cambridge History of the English Language, Vol. IV: 1776–1997, S. Romaine (ed.), 92–329. Cambridge: CUP.
Estling Vannestål, M. 2015. A University Grammar of English: With a Swedish Perspective, 2nd edn. Lund: Studentlitteratur.
Gabrielatos, C. 2005. Corpora and language teaching: Just a fling or wedding bells? Teaching English as a Second or Foreign Language 8(4): n.p.
The Guardian and Observer Style Guide. (10 July 2019).
Hancock, C. R. 1993. If he would have and if he didn’t. American Speech 68(3): 241–252.
Huddleston, R. & Pullum, G. K. 2002. The Cambridge Grammar of the English Language. Cambridge: CUP.
Hundt, M. & Mair, C. 1999. ‘Agile’ and ‘uptight’ genres: The corpus-based approach to language change in progress. The International Journal of Corpus Linguistics 4(2): 221–242.
Hyland, K. & Jiang, F. (K.). 2017. Is academic writing becoming more informal? English for Specific Purposes 45(1): 40–51.
Ishihara, N. 2003. “I wish I would have known!”: The usage of would have in past counterfactual if- and wish-clauses. Issues in Applied Linguistics 14(1): 21–48.
Larsson, T. & Kaatari, H. 2019. Extraposition in learner and expert writing: Exploring (in)formality and the impact of register. International Journal of Learner Corpus Research 5(1): 33–62.
Longman Dictionary of Contemporary English, 5th edn. 2009. Harlow: Pearson.
Mair, C. 1997. Parallel corpora: A real-time approach to the study of language change in progress. In Corpus-based Studies in English, M. Ljung (ed.), 195–209. Amsterdam: Rodopi.
Mair, C. 2006. Twentieth-century English: History, Variation and Standardization. Cambridge: CUP.
Molencki, R. 2000. Parallelism vs. asymmetry: The case of English counterfactual conditionals. In Pathways of Change: Grammaticalization in English [Studies in Language Companion Series 53], O. Fischer, A. Rosenbach & D. Stein (eds), 311–328. Amsterdam: John Benjamins.
OED = Oxford English Dictionary Online. OUP. (15 May 2020).
Perales-Escudero, M. D. 2011. To split or not to split: The split infinitive past and present. Journal of English Linguistics 39(4): 313–334.
Quirk, R., Greenbaum, S., Leech, G. & Svartvik, J. 1985. A Comprehensive Grammar of the English Language. London: Longman.
Römer, U. 2007. Learner language and the norms in native corpora and EFL teaching materials: A case study of English conditionals. In Anglistentag 2006 Halle: Proceedings, S. Volk-Birke & J. Lippert (eds), 355–363. Trier: Wissenschaftlicher Verlag Trier.
Rubio, F. D. & Martínez Lirola, M. 2010. English as a foreign language in the EU: Preliminary analysis of the difference in proficiency levels among the member states. European Journal of Language Policy 2(1): 23–40.

Chapter 17.  If anyone would have told me 299

Shaw, P. 2009. Chapter 11. Linking adverbials in student and professional writing in literary studies: What makes writing mature. In Academic Writing: At the Interface of Corpus and Discourse, M. Charles, D. Pecorari & S. Hunston (eds), 215–235. London: Continuum.
Siemund, R. 1995. ‘For who the bell tolls’: Or why corpus linguistics should carry the bell in the study of language change in present-day English. Arbeiten aus Anglistik und Amerikanistik 20: 351–376.
Svartvik, J. & Sager, O. 1996. Engelsk universitetsgrammatik, 2nd edn. Stockholm: Liber.
Swales, J. M. 2001. EAP-related linguistic research: An intellectual history. In Research Perspectives on English for Academic Purposes, J. Flowerdew & M. Peacock (eds), 42–54. Cambridge: CUP.

Appendix

Table A1.  Corresponds to Figure 1. Raw and normalized frequencies of conditional would have in two genres of COCA (estimated for Spoken tokens on the basis of 200 random examples; see Section 2 for a detailed description of how frequencies were calculated)

                        Spoken   Magazines   Total (raw) / Avg. (pmw)
Estimated raw freq.        587          40        627
Estimated freq. pmw       4.96        0.34       2.65

Table A2.  Corresponds to Figure 2. Estimated raw and normalized frequencies of three uses of like in two genres of COCA (see Section 2 for a detailed description of how frequencies were calculated)

                     Spoken   Academic   Total/Avg.
‘as’         raw     12,497        457       12,954
             pmw     105.76       4.10        54.93
‘as if’      raw     36,797      1,828       38,625
             pmw     311.40      16.39       163.89
‘such as’    raw     36,103     17,059       53,162
             pmw     305.52     152.95       229.23
Total/Avg.   raw     85,397     19,344      104,741
             pmw     240.89      57.81       149.35

Table A3.  Corresponds to Figure 3. Estimated raw and normalized frequencies of though as a conjunct in two genres of COCA, 1990–2009 (see Section 2 for a detailed description of how frequencies were calculated; number of tokens analysed = 903)

                        Spoken   Academic   Total (raw) / Avg. (pmw)
Estimated raw freq.     16,707      2,407       19,114
Estimated freq. pmw     141.38      21.58        81.48


Table A4.  Corresponds to Figure 4. Raw and normalized frequencies of [to + adverb + infinitive verb] in COCA

                  Spoken   Fiction   Magazines   Newspapers   Academic   Total (raw) / Avg. (pmw)
Raw frequency     25,894     7,739      15,981       15,028     20,944     85,586
Frequency pmw     221.79     69.19      136.18       133.00     187.99     149.63

Table A5.  Corresponds to Figure 5. Raw and normalized frequencies of [to + not/never/only + infinitive verb] in COCA

                    Spoken   Fiction   Magazines   Newspapers   Academic   Total (raw) / Avg. (pmw)
not          raw     2,140       562         717          996        464      4,879
             pmw     18.33      5.02        6.11         8.81       4.16       8.49
never        raw       124       221         185          160         43        733
             pmw      1.06      1.98        1.58         1.42       0.39       1.29
only         raw       127        45          63           82         69        386
             pmw      1.09      0.40        0.54         0.73       0.62       0.68
Total/Avg.   raw     2,391       828         965        1,238        576      5,998
             pmw      6.83      2.47        2.74         3.65       1.72       3.48
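In all five appendix tables, the raw Total cells are sums of the genre (or row) counts, while the pmw cells in the Total/Avg. column match an unweighted mean of the component rates (in Table A1, for instance, (4.96 + 0.34) / 2 = 2.65). The minimal Python sketch below checks this reading against the Table A4 figures. It is our inference from the numbers, not the authors' documented procedure; Section 2 of the chapter remains the authoritative description of how the frequencies were calculated.

```python
# Genre columns of Table A4 ([to + adverb + infinitive verb] in COCA):
# Spoken, Fiction, Magazines, Newspapers, Academic.
raw = [25894, 7739, 15981, 15028, 20944]
pmw = [221.79, 69.19, 136.18, 133.00, 187.99]

def total_raw(counts):
    """'Total (raw)': plain sum of the genre counts."""
    return sum(counts)

def avg_pmw(rates):
    """'Avg. (pmw)': unweighted mean of the genre rates, to 2 decimals."""
    return round(sum(rates) / len(rates), 2)

print(total_raw(raw))  # 85586
print(avg_pmw(pmw))    # 149.63
```

A pooled rate (all hits divided by all words) would match this mean only if every genre sub-corpus were the same size; the genre word counts needed to compute a pooled rate are not given in the tables.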

Chapter 18

Intensification in dialogue vs. narrative in a corpus of present-day English fiction

Signe Oksefjell Ebeling and Hilde Hasselgård
University of Oslo

This chapter examines adverbial intensification of adjectives in present-day English fiction, with the aim of establishing whether dialogic and narrative passages differ in this regard. Although the corpus is relatively small, the study indicates that the frequency of intensification differs across these two subregisters of fiction. Findings of a more qualitative nature include slight differences in the preferred choice of intensifiers in dialogue (very > so > too) vs. narrative (so > too > very). Moreover, dialogue shows a preference for evaluative adjectives, while narrative shows more variation. A comparison with authentic speech uncovered few clear patterns except the predominance of some high-frequency adverb-adjective combinations. Nevertheless, the differences between fictional dialogue and narrative demonstrate that linguistic phenomena may vary both within and across registers.

Keywords: intensifiers, adverb-adjective combinations, amplifiers, fictional dialogue vs. narrative, attributive vs. predicative

https://doi.org/10.1075/scl.97.18ebe © 2020 John Benjamins Publishing Company

1. Introduction and aims

Fiction can be regarded as a hybrid register in that it often has conversation-like passages embedded in what may be termed the narrative (cf. Biber et al. 1999: 16). Nevertheless, most previous corpus studies of the language of fiction tend to treat fiction as a monolithic register. In this exploratory study, we investigate potential intra-register differences in the use of intensifying adverbs in the dialogue and narrative parts of a corpus of present-day English fiction. As pointed out by Biber et al. (1999: 564 ff.), “[e]ven for similar degree adverbs there are differing preferences across registers and associations with different adjectives”. Against this backdrop, we specifically aim to compare the two subregisters as to the frequency of adverbial intensifiers, the lexical choice of intensifiers and the typical intensifier-adjective combinations. Our corpus for this undertaking is the English original fiction part of the English-Norwegian Parallel Corpus (ENPC), in which direct speech has been marked up so that the dialogic passages can be studied separately (Ebeling & Ebeling forthcoming).

Following the framework of Quirk et al. (1985: 445–446), we classify intensifiers into amplifiers and downtoners. The former type “scale[s] upwards from an assumed norm” (Quirk et al. 1985: 445) and includes, for example, very, extremely and highly. The latter type has “a generally lowering effect” (ibid.), thus scaling downwards; downtoners include almost, a bit and fairly. Furthermore, we distinguish between attributive and predicative position of the intensified adjective phrase (e.g. a very enjoyable holiday vs. it was so dark). This enables us to investigate whether the two subregisters differ in the syntax as well as in the general value of the intensifiers.

Since previous studies have shown intensifiers to vary across registers (e.g. Biber et al. 1999; Xiao & Tao 2007), we may expect to find differences between the two subregisters of fiction, with the dialogue being closer to the patterns reported for conversational speech (Aijmer 2018; Biber et al. 1999; Tagliamonte 2008). Thus, in addition to comparing the dialogic and narrative portions of the same literary corpus, we also compare our results to those of Aijmer (2018) in order to find out how adverb-adjective combinations in dialogue compare to authentic speech.

The chapter is organized as follows: In Section 2, we give a brief outline of previous research on intensification in English. Section 3 describes the corpus and method used. In Section 4, we start by making some descriptive and statistical observations of the material, before analysing the patterns of intensification of adjectives with particular emphasis on the most frequently occurring pattern in our data. Furthermore, we compare three of the most frequent amplifiers and their collocational patterns in dialogue vs. narrative vs.
British and American speech (Aijmer 2018). Finally, in Section 5, we summarize our findings and offer some concluding remarks.

2. Previous research

Much has been written on adverbial intensifiers, and this brief review cannot do justice to it all.1 See, however, Méndez-Naya’s (2008) introduction to a special issue on intensifiers in English and the papers therein for a good general overview.

1. Two notable omissions here are studies of intensifier use in learner language (e.g. Granger 1998; Lorenz 1999; Hendrikx 2019) and across languages (e.g. Napoli & Ravetto (eds) 2017; Wilhelmsen 2019).



Chapter 18.  Intensification in dialogue vs. narrative 303

Bolinger discusses the open-ended nature of such ‘degree words’, suggesting that “virtually any adverb modifying an adjective tends to have or to develop an intensifying meaning” (1972: 23). However, as various corpus-based studies have demonstrated, different lexical items used as intensifiers vary greatly in frequency, and the frequencies also vary across registers (Biber et al. 1999: 564; Xiao & Tao 2007), regional varieties (Tagliamonte 2008; Aijmer 2018), age groups (Tagliamonte 2008), and time (Ito & Tagliamonte 2003; Claridge & Kytö 2014). Xiao and Tao (2007) take advantage of the sociolinguistic information in the British National Corpus to identify relevant variables for the use of 33 different adverbial amplifiers. For our purposes, it is interesting to note that amplifiers are found to be twice as frequent in spoken as in written English (including all registers of either mode), and that the lexical preferences vary across the two modes.

There are several studies of intensifiers in spoken English. Paradis (1997) investigates degree modifiers in the London-Lund Corpus (spoken British English) with regard to their frequency, collocability with adjectives, and intonation. She also provides a semantic analysis of the modifiers in context and in the light of the interplay of lexical meaning with attitude, intonation and discourse (Paradis 1997: 29). An important finding regarding adverb-adjective collocations is the requirement of ‘semantic harmony’. For example, adjectives denoting a scalar property combine with intensifiers indicating a point on the scale (very good), while adjectives associated with a limit (e.g. sufficient, identical) combine with ‘totality modifiers’ (e.g. almost, entirely) rather than with scalar intensifiers such as very (Paradis 1997: 159–160). As Paradis puts it, “the adjective and the degree modifier exert semantic pressure on one another” (1997: 162).
Tagliamonte (2008) investigates intensifiers in a corpus of Toronto speech, in which speaker age is identified. Interestingly, she finds that intensification of adjectives occurs in about 36% of possible contexts and is thus a pervasive phenomenon (see also Paradis 1997: 11). The most frequent intensifier by far in Tagliamonte’s corpus is really, followed by very, so and pretty. The top three intensifiers are found to differ in frequency in British (very > really > so), American (so > really > very) and Canadian English (really > very > so), partly explained by a clear preference for really among the younger Canadian speakers. Aijmer (2018) investigates the intensifiers very, really and so in four varieties of spoken English based on material from the International Corpus of English (ICE) for British, New Zealand and Singapore English and the Santa Barbara Corpus for American English. She discovers differences among the varieties in the preferred intensifiers (for example, really is more frequent than very in NZ English, so is the most frequent intensifier in American English, and very is much more frequent in ICE-SIN than in the other corpora). The adverb-adjective collocations also differ across varieties, with Singapore English having the most idiosyncratic behaviour.


It should also be mentioned that some early studies of adverb-adjective collocations in spoken and written English (Altenberg 1991; Johansson 1993) have noted that certain adverb-adjective combinations are more likely to occur than others (see also Paradis 1997). For example, Altenberg (1991: 137) observes that “perfectly tends to collocate with words referring to positive or commendable qualities”, such as perfectly capable, while absolutely colligates with “inherently superlative adjectives” (e.g. absolutely marvellous). In sum, previous research into the intensification of adjectives gives a very clear picture of variation along a great number of linguistic and sociolinguistic variables, which makes it all the more interesting to study this feature in two subvarieties of the same register.

3. Material and method

For the purpose of this study, we use the English fiction component (EngFic) of the English-Norwegian Parallel Corpus, which contains 30 text extracts of between 10,000 and 15,000 words each from novels written in present-day English (from the 1980s and 90s). All the texts were originally written in English, mainly by British or American authors (see Johansson et al. 1999/2002 for more information on the corpus design). We chose this corpus because we needed one that enabled separate searches in the two subregisters of fiction: dialogue and narrative. In EngFic, passages marked by the writers as instances of direct speech have received special mark-up (Johansson et al. 1999/2002: 19), which makes it possible to split the material into two sub-corpora, one containing dialogue only and one containing narrative only (see Ebeling & Ebeling forthcoming). In terms of size, the two subregisters differ greatly, with more than 3.5 times as many running words in narrative (330,168) as in dialogue (91,995). This proportion of words in narrative vs. dialogue in the novels is illustrated in Figure 1.
Figure 1.  Proportion of words in dialogue vs. narrative (categories: Narrative, Dialogue)

There is also great variation among individual corpus texts as to the representation of the two subregisters: whereas the narrative portions vary between roughly 4,500 and 14,000 words per text, the dialogic portions vary between 158 and 6,500 words. Figure 2 displays this variance in dialogue, where the proportion of words in dialogue per text ranges from 1.3% to 59.4% (median = 20.6%).

Figure 2.  Proportion of number of words in dialogue per text
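Because direct speech is marked up in EngFic, each text can be divided mechanically into a dialogue portion and a narrative portion, and per-text proportions of the kind plotted in Figure 2 follow from the word counts of the two portions. The Python sketch below illustrates the idea under a loudly stated assumption: that direct speech is wrapped in hypothetical `<q>…</q>` elements. The actual ENPC encoding is documented in Johansson et al. (1999/2002) and may well differ, and the whitespace word count is cruder than a proper tokenizer.

```python
import re

# ASSUMPTION: direct speech is wrapped in <q>...</q>. This tag name is
# hypothetical; the real ENPC mark-up may use a different element.
SPEECH = re.compile(r"<q>(.*?)</q>", re.DOTALL)

def split_subcorpora(text):
    """Return (dialogue, narrative) strings for one marked-up text."""
    dialogue = " ".join(SPEECH.findall(text))
    narrative = SPEECH.sub(" ", text)
    return dialogue, narrative

def word_count(text):
    """Crude running-word count: whitespace-separated tokens."""
    return len(text.split())

def dialogue_share(text):
    """Percentage of a text's words that fall inside direct speech."""
    dialogue, narrative = split_subcorpora(text)
    d, n = word_count(dialogue), word_count(narrative)
    return 100 * d / (d + n)

# Invented sample text, for illustration only.
sample = "She looked up. <q>That's nice of you.</q> He shrugged."
dialogue, narrative = split_subcorpora(sample)
print(word_count(dialogue), word_count(narrative))  # 4 5
print(round(dialogue_share(sample), 1))             # 44.4
```

Applied text by text, `dialogue_share` would give one percentage per extract; pooling the two portions across all 30 extracts would instead give the overall split shown in Figure 1.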

The corpus was tagged with the CLAWS7 tagger, so that searches for unspecified combinations of adverbs immediately followed by adjectives could be made. Although CLAWS7 conveniently has a degree adverb tag (RG), some degree adverbs have received a general adverb (RR) tag instead, mostly due to ambiguity. Really is a case in point: in CLAWS7 it is tagged as a general adverb, but its function in example (1) is clearly that of a degree adverb, or indeed intensifier.2

(1) … he wanted to buy a really good shirt he’d seen …  (JB1)

Using AntConc (Anthony 2018), we searched for all combinations of either *_RG or *_RR + *_JJ in both dialogue and narrative. The resulting concordance lines were analysed manually to remove irrelevant hits, e.g. accidental sequences of adverb and adjective, adverbs describing manner rather than degree (see Johansson 1993: 42), and non-intensifying uses of polysemous adverbs such as really and too. Furthermore, each intensifier was classified according to value (amplifier vs. downtoner) and the syntactic position of the adjective phrase (attributive vs. predicative). The classification was made on the basis of meaning in context rather than lexical form, allowing, for instance, quite to be either a downtoner (quite good) or an amplifier (quite different); see Paradis (1997: 17 f.). Some of the material was analysed by both authors, and problematic cases were discussed to ensure consistency of analysis.
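The search step can be approximated outside AntConc with a regular expression over tagged text. The Python sketch below assumes CLAWS-style horizontal output (word_TAG tokens separated by whitespace) and matches the JJ tag exactly, as a literal reading of the *_JJ query would; the tags on the sample sentence, built from example (1), were assigned by hand for illustration and are not taken from the corpus. Like the AntConc query, it only collects candidates: the manual filtering described above is still needed.

```python
import re

# Candidate pattern: a token tagged RG (degree adverb) or RR (general
# adverb) immediately followed by a token tagged exactly JJ (general
# adjective). Assumes "word_TAG" tokens separated by whitespace; tags
# such as JJR, JJT or RR21 do not match.
ADV_ADJ = re.compile(r"(\S+)_(RG|RR)\s+(\S+)_JJ(?=\s|$)")

def adv_adj_candidates(tagged_text):
    """Return (adverb, tag, adjective) triples for later manual checking."""
    return [m.groups() for m in ADV_ADJ.finditer(tagged_text)]

# Example (1), tagged by hand for illustration (not corpus output).
sample = "he_PPHS1 wanted_VVD to_TO buy_VVI a_AT1 really_RR good_JJ shirt_NN1"
print(adv_adj_candidates(sample))  # [('really', 'RR', 'good')]
```

Each triple is only a candidate: whether really here is an intensifier, a manner adverb or a non-intensifying use still has to be decided in context, which is why the hits were checked by hand.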

2. The CLAWS7 documentation concedes that “[i]n some cases, specific usage is not tagged as such (e.g. really can only be an RR, even when it is an intensifier). In other cases there are specific tags available (e.g. bloody can be an RR or an RG). This is an area of inconsistency that needs to be addressed” (: Section 2.2).


4. Analysis and discussion

4.1 Some descriptive and statistical observations of the data

Following manual scrutiny of the concordance lines, we identified 862 intensifiers immediately followed by an adjective in narrative and 354 in dialogue, amounting to 26.1 and 38.5 per 10,000 words, respectively. These overall frequencies show that intensification of adjectives is considerably more frequent in dialogue than in narrative; in fact, this holds for 22 of the 30 texts in the corpus. The use of intensifier + ADJ is normally distributed across the texts in narrative but not in dialogue, which calls for a non-parametric test of the hypothesis that there is no difference in the use of intensifier + ADJ between the two subregisters. A Wilcoxon signed-rank test (wilcox.test in R)3 shows that the difference between dialogue and narrative in the use of intensifier + ADJ is statistically significant and not trivial, as indicated by the p-value (p […] too > very. These three account for about 74% of the amplifiers in the dialogue and about 67% in the narrative.5 Thus, it is fair to say that the subregisters behave similarly in their lexical choices of amplifiers, but with some differences in terms of the overall and ranked frequencies. It may be noted that the most common amplifiers in Table 1 are also among those found by Biber et al. (1999: 565, 567) to be most frequent in conversation and academic prose. The rank order of amplifiers in dialogue in Table 1 (very > so > too > quite > really) resembles the British rather than the American conversation data reported in Biber et al. (1999: 565, 567): the rank order in British conversation is very > so > really, quite > too, while American conversation shows so > really/very > too/real, with quite being rather infrequent.
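The per-10,000-word rates reported at the start of Section 4.1 can be reproduced from the raw counts given there and the sub-corpus sizes given in Section 3. The short Python check below does only that arithmetic; it does not reproduce the Wilcoxon signed-rank test, which was run in R on per-text figures not listed in the chapter.

```python
def per_10k(hits, words):
    """Normalize a raw count to a rate per 10,000 running words."""
    return hits / words * 10_000

# Raw counts (Section 4.1) and sub-corpus sizes (Section 3) for EngFic.
narrative_rate = per_10k(862, 330_168)
dialogue_rate = per_10k(354, 91_995)

print(round(narrative_rate, 1))                  # 26.1
print(round(dialogue_rate, 1))                   # 38.5
print(round(330_168 / 91_995, 2))                # 3.59 (narrative vs. dialogue words)
print(round(dialogue_rate / narrative_rate, 2))  # 1.47
```

Normalizing is what makes the two sub-corpora comparable: narrative has almost 3.6 times as many words, so its larger raw count (862 vs. 354) still corresponds to a lower rate, with dialogue showing roughly 1.5 times the narrative rate.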
Comparing our narrative material to Biber et al.’s academic prose (since they do not give frequency information for fiction), we may note much overlap in lexical choice, apart from the absence of really in Biber et al.’s list (1999: 565). However, the rank frequencies of amplifiers in our narrative material (so > too > very > quite > really) differ from those reported for academic prose in Biber et al. (1999: 565, 567), i.e. very > so > quite > too. Thus, unsurprisingly, there seems to be a greater register difference between fictional narrative and academic prose than between fictional dialogue and real conversation.

The amplified adjectives that occur in predicative function are less recurrent than the top three amplifiers discussed above. In dialogue, the most frequent adjectives are good and bad, with 15 and nine occurrences, respectively. In narrative, the most frequent adjective in this function is young (22), followed by different (15) and good (15). Despite the relatively low numbers, there seems to be a slight difference between the subregisters in the preferred type of adjective. While the two most prominent adjectives in dialogue are clearly evaluative (appreciation: desirable/non-desirable), two of the three most frequent ones in narrative are outside the evaluative domain (young and different). To expand on this, it may be noted that, of the ten most frequent adjectives in dialogue (ranging from four to 15 occurrences each), six can be said to be evaluative (good, bad, kind, neat, clever, busy), following Martin and White’s (2005) classification. Of the non-evaluative ones, important can be classified as an adjective of relevance, while easy, hard and difficult are better classified as feasibility adjectives (Lorenz 1999). For the top ten in narrative, on the other hand, only three are clearly evaluative (good, nice, sure). The remaining seven are non-evaluative and may be said to belong to the following categories, with reference to Lorenz (1999) and Tagliamonte (2008) in particular: dimensional adjectives (big, small, thin), relevance (different),6 age (young and old), and physical property (full).

5. The 256 tokens of amplifiers with adjectives in predicative position in dialogue are distributed across 29 types, and the 561 tokens in narrative across 82 types.

Some of the most recurrent predicative adjectives seem to have different lexical preferences with regard to amplifiers. The most frequent adjective in dialogue, good, occurs most frequently with very (seven out of 15 instances), while bad prefers (not) too in four out of nine cases. This reflects the lexicalized nature of the collocation (not) too bad, which has a separate entry in, for example, the Macmillan Dictionary. In narrative, the most frequent adjective, young, occurs with very in 13 out of 22 instances, as in (5), followed by too (six examples) and so (two examples). Curiously, old is found most frequently with too (five out of eight). Note, however, that too old is usually followed by a clause or phrase that complements too, as in example (6).

(5) I was still very young when in a daze I saw Dad swallowed up by a hole in the road.  (BO1)

(6) … I’m too old to be kissed unless it’s a birthday or some other occasion…  (NG1)

Preceding different, we find so in seven out of 15 instances. Good and bad in narrative select intensifiers in a similar fashion to dialogue, though with a slightly smaller proportion of very good. With adjectives denoting size (big, large, small, thin), the intensifier too is quite frequent, especially with big, which occurs with too in six out of eight instances. The predicative adjective nice occurs exclusively with very as an amplifier (eight instances). In contrast, the least consistent picture was observed with the adjective pleased, which occurs in predicative position five times, with five different intensifiers (bloody, masochistically, pretty, so, very). All of the adjectives discussed here may be characterized as scalar (Paradis 1997: 160), and the modifiers identify a point or range on the scale. Not surprisingly, the evaluative (and arguably emotional) nature of intensification in dialogue seems to be in line with earlier studies of colloquial, spoken language, for example Tagliamonte (2008) and Aijmer (2018). In the next section, we therefore compare some of our findings with previous research, notably Aijmer’s, to see to what extent writers of fictional dialogue manage to mimic authentic spoken language.

6. Lorenz (1999: 53) argues that, “[w]ith a bit of leeway, these adjectives [including different] can all be associated with ‘relevance’, with topics that are novel, unusual and hence worth noting or writing about (or markedly not, cf. boring and dated).”


4.3 Collocations of amplifier + ADJ in EngFic compared to authentic speech

In this section, we take a closer look at the four most recurrent sequences of amplifier plus adjective in our material and compare them to the most frequent collocations identified for spoken English in Aijmer (2018). Biber et al. observe that “[c]onversation […] has higher frequencies of adjective with modifying adverb combinations than academic prose does” (1999: 545). At the same time, conversation is found to have “less diversity in word choice” (ibid.). Unfortunately, Biber et al. (1999) compare conversation only to academic prose, not to fiction or any other written register. However, their findings lead us to hypothesize that dialogue, along with authentic speech, may have less variation in the use of adverb-adjective combinations than narrative does. Moreover, Quaglio (2008: 203 f.) observes that the use of intensifiers in scripted television dialogue differs from that of authentic speech regarding frequency as well as lexical preferences. This makes it likely that there are differences also between fictional dialogue and real conversation. As seen above, the most frequent amplifiers in our material are very, so and too. In contrast, both Aijmer (2018) and Tagliamonte (2008) find that really is more frequent than very in North American varieties.7 Because of this, and because Aijmer’s study does not include too, this section focuses on collocations with very, so and really. Table 2 shows the most recurrent collocations in our material juxtaposed with results from Aijmer’s analysis of British and American speech (2018).8 Like Aijmer’s (2018) dataset, the EngFic data reported in Table 2 include phrases in both attributive and predicative position.
Table 2.  Adjective collocates of very, so and really in EngFic and authentic speech

EngFic

  Speech (Aijmer 2018)

Dialogue

Narrative

British

American

very

good (13) clever (3) hard (3) neat (3)

young (15) nice (8) good (7) small (7)

  good (59) nice (33) difficult (5) different (3)

so

good (4)

different (7) full (5) easy (4) good (4)

funny (8) good (5) difficult (3) sweet (3)

good (20) funny (10) tired (7) upset (5)

really

good (3)

good (4) black (3)

good (32) nice (20) funny (5) interesting (4)

good (13) nice (7) bad (5) hard (4)

good (24) nice (6) important (5) true (4)

Only the five most frequent adjectives occurring at least three times have been included.9 The main purpose of the overview in Table 2 is to compare ranked frequencies; however, raw frequencies are given in brackets to give a better impression of the distance between more and less frequent collocations within each corpus. Johansson (1993: 46) notes that his material, the LOB corpus, is “too small for a proper study of collocations”. The same can certainly be said about our material, in which only a few very common adjectives are found to recur with each intensifier, and where it is hard to see any patterns. The only clear tendency across dialogue and the authentic speech in Table 2 is the top-ranked position of good with each modifier except so in British speech. This is, however, most likely due to the generally high frequency of good, although the ranked frequency of very good is lower in narrative than in both dialogue and authentic speech. In contrast, we may note that the combination very nice is the second most frequent in narrative as well as in British and American speech. In our dialogue material, very nice occurs only once; in fact, the adjective tends to be used without modification in the dialogues: 34 out of the 39 occurrences of nice in dialogue have no intensifier, as in example (7).10

(7) That’s nice of you.  (MD1)

7. Biber et al. (1999: 565) report that very is about twice as frequent as really in British conversation, while they are equally frequent in American conversation. Ito and Tagliamonte, however, find an increasing frequency of really among young speakers of British English (2003: 270, 276).

8. Aijmer’s study also includes Singapore and New Zealand English, but these varieties are less relevant to compare with EngFic.

9. Two of the collocations in Table 2 occur in only one text each: really black (from a text by Nadine Gordimer) and very neat (in a text by Minette Walters). They can be linked to a specific topic and the speech of a particular character, respectively.

The intensifier really is underrepresented in the dialogue part of our corpus compared to natural conversation (as seen from the frequency of really good in relation to very good), although it is comparatively more frequent in dialogue than in narrative. The texts in EngFic are not much older than the spoken material in ICE-GB, so the difference is unlikely to reflect a real diachronic change (but see Ito & Tagliamonte 2003), although one may speculate that fictional dialogue is more conservative than authentic conversation.

10. In the narrative material, the adjective nice occurs only once with an intensifier other than very (viz. the downtoner rather).


5. Summary of findings and concluding remarks

The study started out with a general overview of the primary material used, which showed that the proportion of dialogue in the 30 present-day English fiction text extracts varies greatly from text to text, with a median of roughly 20% (see Section 3). It is therefore important to bear in mind that the results presented here are based on relatively limited material, particularly for dialogue. Our main aim was to investigate to what extent and in what way the use of intensification and intensification patterns differs between the two subregisters of fiction. The quantitative observations of our material suggest that both dialogue and narrative make frequent use of intensifier + ADJ combinations, but that these are significantly more common in dialogue. It was also found that the preferred type of intensification pattern in our data consists of an amplifier with an adjective in predicative function. This finding led us to focus our further analysis on the most frequently attested amplifiers and predicative adjectives. A study of downtoners as well as intensification of attributive adjectives could be an interesting avenue of further research, but this would require a far bigger corpus. While the most frequent amplifiers are the same in dialogue and narrative, their order of preference differs (very > so > too in dialogue and so > too > very in narrative). The fact that these three amplifiers account for 74% and 67% of all amplifiers in dialogue and narrative, respectively, is in line with Altenberg’s observation, based on spoken data from the London-Lund Corpus, that there is a “strong concentration to a few highly exploited amplifiers” (1991: 145). As far as the choice of adjective is concerned, our corpus data uncovered further differences, with evaluative adjectives being predominant in dialogue and non-evaluative ones in narrative.
The adjectives also seem to have different lexical preferences with regard to their selection of amplifiers; for instance, good prefers very, while bad prefers (not) too. Generally, recurrence of identical sequences was found to apply to a very limited set of amplifiers and adjectives. Thus, again, our findings tally well with Altenberg’s (1991): his study “revealed interesting quantitative tendencies, such as the limited range of [amplifiers] used in recurrent combinations” (1991: 131). Although it has been suggested that “intensification is a field of highly individual preference and self-expression” (Lorenz 1999: 24), the material investigated here also underlines the fact that the most frequent combinations tend to be shared among most writers/speakers (e.g. very good, very nice). This may be related to Sinclair’s (1991: 109) idiom principle, in that “a language user has available to him or her a large number of semi-preconstructed phrases that constitute single choices”.




In the final step of our investigation, we compared the sequences formed by three of our most recurrent amplifiers (very, so, really) plus adjective with the most frequent sequences identified for authentic spoken English (Aijmer 2018). However, the only clear tendency that emerged from this comparison, most likely due to the small size of our corpus, was that good is among the top collocates both in fictional dialogue and in authentic British and American speech (see Table 2). On the basis of the material at hand, it is hard to draw any further conclusions as to how successful fictional dialogue is in mimicking authentic spoken English. The main conclusion to be drawn from this small-scale study is that, the frequency differences notwithstanding, the two subregisters of fiction make fairly similar lexical choices in terms of their preferred amplifiers, while more lexico-grammatical differences emerge in terms of preferred adjectives. We have thus shown that there are “differing preferences” (Biber et al. 1999: 564) not only across registers but also within them. Future studies of the language of fiction should therefore pay more attention to the hybrid nature of this register in order to shed more light on the areas in which dialogue and narrative may differ.

References
Aijmer, K. 2018. Intensification with very, really and so in selected varieties of English. In Corpora and Lexis, S. Hoffmann, A. Sand, S. Arndt-Lappe & L. M. Dillmann (eds), 106–139. Leiden: Brill.
Altenberg, B. 1991. Amplifier collocations in spoken English. In English Computer Corpora: Selected Papers and Research Guide, S. Johansson & A. Stenström (eds), 127–147. Berlin: Mouton de Gruyter.
Anthony, L. 2018. AntConc (Version 3.5.7) [Computer Software]. Tokyo, Japan: Waseda University. (8 February 2019).
Biber, D., Johansson, S., Leech, G., Conrad, S. & Finegan, E. 1999. Longman Grammar of Spoken and Written English. Harlow: Longman.
Bolinger, D. 1972. Degree Words. The Hague: Mouton.
Claridge, C. & Kytö, M. 2014. I had lost sight of them then for a bit, but I went on pretty fast: Two degree modifiers in the Old Bailey Corpus. In Diachronic Corpus Pragmatics, I. Taavitsainen, A. H. Jucker & J. Tuominen (eds), 29–52. Amsterdam: John Benjamins.
CLAWS7. 1996. A Post-editor’s Guide to CLAWS7 Tagging. Written by the UCREL team, University of Lancaster. (8 February 2019).
Ebeling, S. O. & Ebeling, J. Forthcoming. Dialogue vs. narrative in fiction: A cross-linguistic comparison. To appear in Languages in Contrast 20(2).
Granger, S. 1998. Prefabricated patterns in advanced EFL writing: Collocations and formulae. In Phraseology: Theory, Analysis, and Applications, A. P. Cowie (ed.), 145–160. Oxford: OUP.

314 Signe Oksefjell Ebeling and Hilde Hasselgård

Hendrikx, I. 2019. The Acquisition of Intensifying Constructions in Dutch and English by French-speaking CLIL and non-CLIL Students: Cross-linguistic Influence and Exposure Effects. PhD dissertation. Université catholique de Louvain, Louvain-la-Neuve.
Ito, R. & Tagliamonte, S. 2003. Well weird, right dodgy, very strange, really cool: Layering and recycling in English intensifiers. Language in Society 32(2): 257–279.
Johansson, S. 1993. ‘Sweetly oblivious’: Some aspects of adverb-adjective combinations in present-day English. In Data, Description, Discourse: Papers on the English Language in Honour of John McH Sinclair on his Sixtieth Birthday, M. Hoey (ed.), 39–49. London: HarperCollins.
Johansson, S., Ebeling, J. & Oksefjell, S. 1999/2002. English-Norwegian Parallel Corpus: Manual. University of Oslo. (13 February 2019).
Lorenz, G. R. 1999. Adjective Intensification – Learners Versus Native Speakers: A Corpus Study of Argumentative Writing. Amsterdam: Rodopi.
Macmillan Dictionary, online. (17 June 2019).
Martin, J. & White, P. R. R. 2005. The Language of Evaluation: Appraisal in English. Houndmills: Palgrave Macmillan.
Méndez-Naya, B. 2008. Special issue on English intensifiers, Introduction. English Language and Linguistics 12(2): 213–219. https://doi.org/10.1017/S1360674308002591
Napoli, M. & Revetto, M. (eds). 2017. Exploring Intensification: Synchronic, Diachronic and Cross-linguistic Perspectives. Amsterdam: John Benjamins.
Paradis, C. 1997. Degree Modifiers of Adjectives in Spoken British English. Lund: Lund University Press.
Quaglio, P. 2008. Television dialogue and natural conversation: Linguistic similarities and functional differences. In Corpora and Discourse: The Challenges of Different Settings, A. Ädel & R. Reppen (eds), 189–210. Amsterdam: John Benjamins.
Quirk, R., Greenbaum, S., Leech, G. & Svartvik, J. 1985. A Comprehensive Grammar of the English Language. London: Longman.
R Core Team. 2018. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. (8 February 2019).
Sinclair, J. 1991. Corpus, Concordance, Collocation. Oxford: OUP.
Tagliamonte, S. 2008. So different and pretty cool! Recycling intensifiers in Toronto, Canada. English Language and Linguistics 12(2): 361–394. https://doi.org/10.1017/S1360674308002669
Wilhelmsen, A. 2019. Pretty complete or completely pretty? Investigating Degree Modifiers in English and Norwegian Original and Translated Text. MA thesis, Université catholique de Louvain / University of Oslo. (15 May 2020).
Xiao, R. & Tao, H. 2007. A corpus-based sociolinguistic study of amplifiers in British English. Sociolinguistic Studies 1(2): 241–273. https://doi.org/10.1558/sols.v1i2.241

Appendix

Table A1. Amplifiers in dialogue vs. narrative

Dialogue
Amplifier | Raw frequencies (percentages) | Number of texts in which attested
very | 76 (29.7%) | 20
so | 64 (25%) | 22
too | 50 (19.5%) | 20
quite | 12 | 9
really | 11 | 7
perfectly | 6 | 6
bloody | 4 | 3
awfully | 3 | 2
highly | 3 | 3
terribly | 3 | 3
completely | 1 | 1
entirely | 2 | 2
extremely | 2 | 2
Other – two occurrences each (absolutely, dead, frightfully) | 6 | –
Other (one occurrence each)* | 13 | –
Total | 256 | –

Narrative
Amplifier | Raw frequencies (percentages) | Number of texts in which attested
very | 110 (19.6%) | 20
so | 150 (26.7%) | 30
too | 118 (21%) | 29
quite | 18 | 16
really | 11 | 8
perfectly | 4 | 4
bloody | 1 | 1
awfully | 1 | 1
highly | 3 | 2
completely | 3 | 3
deeply | 4 | 4
entirely | 8 | 6
extremely | 4 | 4
fully | 3 | 3
immensely | 3 | 3
increasingly | 4 | 4
much | 5 | 5
particularly | 9 | 8
pretty | 9 | 6
surprisingly | 4 | 1
totally | 6 | 5
truly | 3 | 3
unusually | 4 | 4
utterly | 3 | 2
wholly | 3 | 2
Other – two occurrences each (absolutely, amazingly, dead, especially, exceedingly, extraordinarily, genuinely, incredibly, invariably, peculiarly, positively, that, thoroughly) | 26 | –
Other (one occurrence each)** | 44 | –
Total | 561 | –

* Dialogue: Abso-bloody-lutely, damned, exceptionally, firmly, horribly, incredibly, overly, ridiculously, seriously, that, thoroughly, well, wonderfully.
** Narrative: Abnormally, absurdly, all, all that, altogether, bizarrely, cadaverously, chronically, desperately, disastrously, essentially, exceptionally, exuberantly, fiercely, gravely, hopelessly, horribly, impressively, jolly, largely, ludicrously, magnificently, marvellously, masochistically, miserably, monstrously, painfully, passionately, preposterously, provocatively, relentlessly, remarkably, seriously, significantly, strikingly, ultimately, uncompromisingly, undeniably, unexpectedly, unfailingly, unnaturally, unreasonably, violently, vividly.

Chapter 19

Orality on the searchable web
A comparison of involved web registers and face-to-face conversation
Douglas Biber and Jesse Egbert
Northern Arizona University

As Culpeper and Kytö (2010) discuss, one challenge of historical linguistics is determining the extent to which written texts represent the linguistic characteristics of speech. Synchronic linguists face similar challenges, leading to the practice of using a web corpus to represent the spectrum of oral–literate registers. However, there has been little research that tests the validity of this practice. The present chapter begins by summarizing the patterns of register variation on the searchable web documented in Biber and Egbert (2018). While that study documents the importance of oral–literate linguistic dimensions, it does not investigate whether involved web registers represent the linguistic characteristics of spoken registers. We explore that research question here, comparing the multi-dimensional profiles of online registers and spoken conversation.

Keywords: orality, web registers, involvement, conversation, Multi-Dimensional analysis

1. Introduction

Linguists from several different subdisciplines have been interested in the nature of colloquial written registers, and the extent to which the linguistic characteristics of speech can be represented in written discourse. For example, historical linguists like Culpeper and Kytö (2010) discuss the extent to which written representations of spoken interactions (e.g. trial proceedings or dramatic plays) can be used to document the linguistic characteristics of spoken discourse from earlier historical periods. Other researchers, like Chafe (1982), Tannen (1982), and Biber (1988), are interested in the broader question of the extent to which interactive or involved written registers (such as personal letters) employ the same linguistic features as spoken registers like face-to-face conversation.

https://doi.org/10.1075/scl.97.19bib © 2020 John Benjamins Publishing Company

318 Douglas Biber and Jesse Egbert

With the advent and rapid expansion of the web, such questions have become of even greater interest (see Crystal 2001; Baron 2008). Some researchers argue for using the ‘web as corpus’ (WAC), sometimes even suggesting that no other corpus resources are required, due to the mind-boggling number of documents on the web that are readily available for linguistic searches. Of course, such claims are offset by questions of the ‘representativeness’ of the web used as a corpus, in large part because this enterprise usually relies on the results of standard search engines. Three problems in particular have been noted: (1) the specific set of documents included in any given web search is not known to the end-user; (2) that set of documents varies from one search to the next; and (3) the text categories of documents included in a search are also not known. The advantages and disadvantages of using the web-as-corpus have been discussed in numerous publications (see, e.g., Kilgarriff & Grefenstette 2003; Leech 2007; Gatto 2014).

Advocates of the WAC approach do not specifically claim that searching the web captures the linguistic characteristics of oral discourse. However, the underlying assumption of the approach is that the web, because it is so large and diverse, represents the full range of linguistic variation in a language – with the implication that this includes the linguistic characteristics of both literate and oral discourse.

An additional consideration is the large number of new registers that have emerged with the historical development of the Internet. In particular, certain computer-mediated communication (CMC) registers closely emulate spoken interaction with respect to interactivity and communicative purposes, even though they are produced in writing rather than speech. Jonsson (2015) describes the linguistic characteristics of two major CMC registers: ‘synchronous’ and ‘supersynchronous’ written interactions on the Internet.
In one major respect, synchronous interactions (such as online ‘chat’ or IM) are similar to face-to-face conversations: they occur in real-time, and so the addressee can read a text as soon as it is written, and can then immediately respond. Text messaging on cell phones can be a type of synchronous interaction if both participants are actively reading and responding at the same time. However, supersynchronous written interactions are even more similar to face-to-face conversation, because all participants can be producing text at the same time, and they can all immediately see the text being produced by other participants. Thus, similar to face-to-face conversation, supersynchronous written interactions include false starts, overlapping turns, and all of the other production phenomena typical of true spoken interactions. Interestingly, Jonsson (2015) found that some CMC registers are actually more ‘oral’ in their linguistic characteristics than spoken face-to-face conversations. The most dramatic findings of this type came from consideration of Dimension 1 from the Biber (1988) Multi-Dimensional (MD) analysis. That dimension, labelled ‘Involved versus Informational Production’, opposes two sets of co-occurring
linguistic features: an ‘interactive/involved’ set of features versus an ‘informational’ set of features. The ‘interactive/involved’ set of features is associated with interpersonal interaction, a focus on personal stance, and real-time production circumstances. These features include first and second person pronouns, WH questions, emphatics, amplifiers, and sentence relatives, which can all be interpreted as reflecting interpersonal interaction and the involved expression of personal stance (feelings and attitudes). Other positive features are associated with the constraints of real-time production, resulting in a reduced surface form, a generalized or uncertain presentation of information, and a generally ‘fragmented’ production of text; these include that deletions, contractions, pro verb DO, the pronominal forms, and final (stranded) prepositions. This collection of features is associated with personal interaction, involved communicative purposes, real-time production circumstances, and ‘oral’ discourse. These features are especially common in face-to-face conversation and telephone conversations. The opposing ‘informational’ set of linguistic features that define Dimension 1 include nouns, long words, prepositional phrases, and attributive adjectives. This collection of features is interpreted as reflecting an informational focus, a careful integration of information in a text, and precise lexical choice – discourse characteristics that are associated with careful production circumstances and the opportunity to revise and edit a text. It turns out that these features are especially common in ‘literate’ written registers, like academic prose and official documents. Jonsson (2015) showed that CMC interactive registers are similar to spoken conversation in having large, positive Dimension 1 scores, reflecting a frequent use of present tense verbs, possibility modals, pronouns, wh-questions, and the other ‘oral’ features grouped on that dimension. 
But the most surprising finding is that supersynchronous CMC discourse actually has a more ‘oral’ characterization than face-to-face conversation! Thus, supersynchronous CMC discourse, produced in writing, is more ‘conversational’ than ‘genuine’ conversations that have been produced in speech.

Jonsson’s (2015) study of CMC interactive registers raises the question of whether there are written registers on the searchable web, readily accessible to researchers and practitioners, that provide reasonable representations of the typical discourse style found in spoken interactions. The motivation for this question comes from the fact that not all documents on the web are equally accessible to end-users. That is, although the distinction is not always emphasized, the web consists of two major discourse domains: documents that are publicly accessible and therefore part of the ‘searchable web’, and documents that are private and not accessible to the general public. The former consists of all documents that have been indexed by search engines such as Google. In contrast, the latter includes two major types of texts: documents on websites that are protected by passwords (e.g.
government, corporations, academic research journals) as well as documents on social media sites such as Twitter and Facebook. The new synchronous written registers that have emerged online are associated with such private sites; synchronous interactions do not (normally) occur on the public searchable web. In contrast, documents on the searchable web are simply posted to a public website and not addressed to a specific individual. If a second user responds to a document, they do so at some later time, rather than as part of a synchronous interaction. This type of asynchronous interaction is found even in cases where two individuals are directly responding to one another: for example, when a user on a review website writes a comment that is directly addressed to a previous commenter, or when a user of a discussion forum posts a question and a second user directly responds to that question several hours later. For our purposes here, the key characteristic of these interactions is that they are asynchronous, in contrast to the synchronous interactions that are common on the private Internet.

In summary, the technology of the private Internet facilitates synchronous interactions, but the public searchable web differs in two major respects: (1) interactive discourse is in general not common at all, and (2) when it does occur, interactive discourse on the public web is almost always asynchronous. As a result of these differences, it is not clear whether registers on the public searchable web employ ‘oral’ linguistic features in similar ways to face-to-face conversation and/or CMC registers. This is the central research question that we take up in the present chapter.

The study here builds on a multi-year project to document the patterns of register variation on the searchable web (see, e.g., Egbert et al. 2015; Biber et al. 2015; Biber & Egbert 2016, 2018).
We therefore begin in Section 2 by summarizing the research design and major findings from that project, focusing especially on the most colloquial registers found on the searchable web. The discussion in this section includes consideration of the situational characteristics of these colloquial registers, as well as the results of an MD analysis comparing the linguistic characteristics of colloquial web registers to other registers found on the searchable web. Building on the previously published results from that project, Section 3 then directly compares the situational and linguistic characteristics of colloquial registers found on the searchable web to the characteristics of spoken conversation and private CMC. That section first describes the situational characteristics of colloquial web registers, focusing especially on whether they are interactive and personally involved to the same extent as spoken and computer-mediated conversations. It then moves on to a direct linguistic comparison of registers, employing the MD framework of Biber (1988). Finally, in Section 4, we summarize the results and discuss the implications for students and researchers hoping to represent ‘oral’ discourse through sub-corpora collected from the searchable web.

2. Background: Describing the patterns of register variation on the searchable web

The searchable web differs from other discourse domains in that the population of searchable web documents has been itemized and indexed by search engines. As a result, at least in theory, it is possible to obtain a random sample of documents from the entire public web. The corpus constructed for the present study attempts to achieve that goal. The corpus was extracted from the ‘General’ component of the Corpus of Global Web-based English (GloWbE; see ). The GloWbE corpus contains c. 1.9 billion words in c. 1.8 million web documents. The corpus was collected in November–December 2012 by using the results of Google searches of highly frequent English 3-grams (i.e. the most common 3-grams occurring in COCA, e.g. is not the, and from the). Between 800 and 1,000 links were saved for each n-gram (i.e. 80–100 Google results pages), minimizing the bias from the preferences built into Google searches. To create a representative corpus of web pages for the detailed analyses in the current project, we randomly extracted 43,685 documents from the GloWbE corpus, focusing on web documents from five geographic regions (United States, United Kingdom, Canada, Australia, and New Zealand).

The first methodological challenge of the project was determining the register category of each document in the corpus. Because texts were randomly collected from across the entire spectrum of the searchable web, the register category of documents was initially unknown. To address this challenge, we employed a bottom-up approach to identify the register category (shown in CAPS below) and the sub-register (shown in ITALIC CAPS) of each document. We developed and piloted a coding rubric for this purpose, and we then employed Mechanical Turk to have four independent raters code the situational register characteristics of each of the 43,685 documents in our corpus. Egbert et al. (2015) and Biber et al.
(2015) provide full details of this procedure and the distribution of registers and sub-registers in the corpus. After texts were coded, we were able to analyze the composition of the searchable web. Table 1 presents the breakdown of documents across the major register categories used for our descriptions. Narrative is the most common of these general registers, while Informational Description and Opinion are also prevalent registers on the web. In addition, we asked coders to select a specific sub-register for each document. For example, interviews and TV transcripts were possible choices under the Spoken general category, while news reports and travel blogs were possible choices of specific registers under the Narrative general category. Table 2 summarizes the major sub-registers identified in our corpus.
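The two corpus-construction steps described above can be sketched in a few lines. The first is random sampling restricted to five regions. The second is register assignment by rater agreement. This is a minimal illustration under stated assumptions, not the project's actual pipeline, which Egbert et al. (2015) and Biber et al. (2015) document. Several details are hypothetical: the `(doc_id, region)` record format, the region codes, and the function names `sample_documents` and `agreed_label`. The three-of-four agreement threshold comes from the heading of Table 2.

```python
import random
from collections import Counter

# Hypothetical codes standing in for the five regions named in the text.
TARGET_REGIONS = {"US", "GB", "CA", "AU", "NZ"}


def sample_documents(docs, n, regions=TARGET_REGIONS, seed=0):
    """Draw a simple random sample of n documents from the target regions.

    docs is a list of (doc_id, region) pairs, a hypothetical record format.
    """
    eligible = [doc for doc in docs if doc[1] in regions]
    rng = random.Random(seed)       # fixed seed: the same draw on every run
    return rng.sample(eligible, n)  # sampling without replacement


def agreed_label(labels, min_agree=3):
    """Return the label chosen by at least min_agree raters, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agree else None


docs = list(enumerate(["US", "GB", "IN", "CA", "AU", "NZ", "PK", "US"]))
print(sample_documents(docs, 3))
print(agreed_label(["Opinion", "Opinion", "Opinion", "Narrative"]))    # Opinion
print(agreed_label(["Opinion", "Opinion", "Narrative", "Narrative"]))  # None
```

With four raters and a threshold of three, at most one label can qualify, so ties never need to be broken. A 2–2 split simply leaves the document without an agreed label.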

Table 1. Composition of the corpus across register categories

General register | Number of documents | Percentage of total documents in the corpus | Number of words
Narrative | 13,688 | 31.3 | 13,797,504
Informational Description (or Explanation) | 6,338 | 14.5 | 8,664,046
Opinion | 4,936 | 11.3 | 7,754,456
Interactive Discussion | 2,835 | 6.5 | 2,690,415
How-To / Instructional | 1,030 | 2.4 | 1,027,940
Informational Persuasion | 751 | 1.7 | 684,912
Lyrical (Songs/Poems) | 571 | 1.3 | 250,669
Spoken | 414 | 0.9 | 829,656
Other | 13,122 | 30.0 | 16,965,781
TOTAL | 43,685 | 100.0% | 52,665,379
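The percentage column in Table 1 is each register's document count divided by the corpus total of 43,685 documents. The short check below recomputes that column from the counts. It is only an arithmetic illustration. The dictionary keys reuse the table's register labels, and `percentage_shares` is a hypothetical helper name.

```python
# Document counts per general register, as reported in Table 1.
COUNTS = {
    "Narrative": 13_688,
    "Informational Description (or Explanation)": 6_338,
    "Opinion": 4_936,
    "Interactive Discussion": 2_835,
    "How-To / Instructional": 1_030,
    "Informational Persuasion": 751,
    "Lyrical (Songs/Poems)": 571,
    "Spoken": 414,
    "Other": 13_122,
}


def percentage_shares(counts):
    """Return the corpus total and each register's share, rounded to 0.1%."""
    total = sum(counts.values())
    return total, {reg: round(100 * n / total, 1) for reg, n in counts.items()}


total, shares = percentage_shares(COUNTS)
print(total)  # 43685
for register, pct in shares.items():
    print(f"{register:<45} {pct:5.1f}")
```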

Table 2. Major sub-registers in the corpus

Register / sub-register | Number of documents (i.e. at least 3 of the 4 raters agreed on the coding)
Narrative
  News reports / News blogs | 7,168
  Sports reports | 2,202
  Personal blogs | 1,534
  Historical articles | 181
  Fiction | 162
  Travel blogs | 113
Informational Description (or Explanation)
  Description of a thing | 1,425
  Encyclopedia articles | 428
  Research articles | 371
  Description of a person | 325
  Information blogs | 303
  FAQs | 95
Opinion
  Reviews | 1,024
  Personal opinion blogs | 1,857
  Religious blogs/sermons | 421
  Advice | 289
Interactive Discussion
  Discussion forums | 1,626
  Question-Answer forums | 821
How-To
  How-to/instructions | 757
  Recipes | 110
Informational Persuasion
  Description with intent to sell | 622
  News+Opinion blogs / Editorials | 380
Lyrical
  Songs | 476
  Poems | 45
Spoken
  Interviews | 251
  Formal speeches | 20
  TV transcripts | 11

2.1 The multi-dimensional analysis of web registers

In addition to keyword analyses and key grammatical feature analyses (see Biber & Egbert 2018, Chapters 5–8), the project employed MD analysis to provide a comprehensive linguistic description of register variation on the searchable web (see Biber & Egbert 2016; 2018, Chapter 4). We began with the 150+ specific grammatical features identified by the Biber tagger. Principal component analysis (Proc Factor in SAS) was used for the analysis, and variables with low communalities (reflecting low shared variance with the overall factor structure) were eliminated. Fifty-seven linguistic variables were retained for the final analysis (see Biber & Egbert 2018, Appendix A), and the ten-factor solution was selected as optimal. (See Biber & Egbert 2016, 2018 for a more detailed description of the factor analysis. Cf. Berber-Sardinha 2014 for another MD study with similar research goals.)

The similarities and differences among texts with respect to each dimension are analyzed in a second quantitative step: computing a dimension score for each text by summing the individual scores of the co-occurring linguistic features (see Biber 1988: 93–97). Once a dimension score is computed for each text, the mean dimension score for each register can be computed. Plots of these mean dimension scores allow linguistic characterization of any given register, comparison of the relations between any two registers, and a fuller functional interpretation of the underlying dimension. After the statistical analysis is completed, dimensions are interpreted functionally, based on the assumption that linguistic co-occurrence reflects underlying communicative functions.
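The second quantitative step can be sketched briefly. The sketch assumes the general procedure of Biber (1988: 93–97) as we understand it. Each feature's rate (e.g. per 1,000 words) is standardized to a z-score across all texts. A text's dimension score is then the sum of its z-scores for the positively loading features minus the sum for the negatively loading features. Several elements are illustrative assumptions, not details reported for this project: the feature names, the toy rates, the population standard deviation, and the helper names.

```python
from statistics import mean, pstdev


def standardize(rates):
    """Turn one feature's per-text rates into z-scores across the corpus."""
    mu, sd = mean(rates), pstdev(rates)
    return [(r - mu) / sd if sd else 0.0 for r in rates]


def dimension_scores(feature_rates, positive, negative):
    """Score every text on one dimension.

    feature_rates maps a feature name to its rates, one value per text,
    in the same text order for every feature.
    """
    z = {f: standardize(rates) for f, rates in feature_rates.items()}
    n_texts = len(next(iter(feature_rates.values())))
    return [
        sum(z[f][i] for f in positive) - sum(z[f][i] for f in negative)
        for i in range(n_texts)
    ]


def register_means(scores, registers):
    """Average the text-level dimension scores within each register."""
    grouped = {}
    for score, register in zip(scores, registers):
        grouped.setdefault(register, []).append(score)
    return {register: mean(vals) for register, vals in grouped.items()}


rates = {
    "first_person_pronouns": [10, 20],  # a positive ('involved') feature
    "nouns": [30, 10],                  # a negative ('informational') feature
}
scores = dimension_scores(rates, positive=["first_person_pronouns"], negative=["nouns"])
print(scores)  # [-2.0, 2.0]
print(register_means(scores, ["Informational Description", "Interactive Discussion"]))
```

Standardizing before summing keeps very frequent features, such as nouns, from swamping rarer ones, such as WH-questions. Every feature then contributes on the same scale.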

For our purposes here, the most interesting findings from the MD description relate to the first three factors extracted from the statistical analysis, which all relate to oral–literate distinctions (cf. the discussion of oral vs. literate dimensions in Biber 2014). Table 3 below summarizes the main patterns for those three dimensions, organized into five columns:
– Column 1 briefly summarizes the functional interpretation of the dimension.
– Column 2 lists the important co-occurring linguistic features that comprise the dimension (i.e. all features with loadings > ±.3; features with loadings between .2 and .3 are listed in parentheses).
– Column 3 lists the eight general registers, arranged vertically according to their dimension scores.
– Column 4 lists selected sub-registers, arranged vertically according to their dimension scores.
– Column 5 contains the + or – symbols to signify the mean dimension score for the registers and sub-registers in Columns 3 and 4.

Six levels are distinguished for the dimension scores:
– > ±1.5 marked by ++++++ / −−−−−−
– > ±1.2 marked by +++++ / −−−−−
– > ±0.9 marked by ++++ / −−−−
– > ±0.6 marked by +++ / −−−
– > ±0.3 marked by ++ / −−
– > ±0.15 marked by + / −

The linguistic features with important positive and negative loadings on each dimension are listed in Column 2. Each dimension can have ‘positive’ and ‘negative’ features. Rather than reflecting importance, positive and negative signs identify two groupings of features that occur in a complementary pattern as part of the same dimension. That is, when the positive features occur together frequently in a text, the negative features are markedly less frequent in that text, and vice versa. Column 3 gives a semi-graphical display of the eight general registers, indicating the dimension score for each register. Column 5 provides the key for the semi-graphical listing of registers in Column 3: the magnitude of the dimension score is indicated by the number of + or – signs. The display is semi-graphical because the registers in Column 3 line up vertically with the Dimension Score Level in Column 5. Finally, Column 4 lists specific sub-registers that are especially marked for a given dimension, often contrasting several sub-registers within a single general category. The format for Column 4 is the same as Column 3, with sub-registers having large positive dimension scores listed at the top, and sub-registers having large negative dimension scores listed at the bottom. The key to the dimension scores presented in Column 5 applies to the sub-registers in Column 4 in exactly the same way as the general registers in Column 3.
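Two rules in the description of Table 3 are simple thresholds. One maps mean dimension scores to the + and − symbols of Column 5. The other sets the loading cut-offs for Column 2. A minimal sketch follows, built on several assumptions. Each score cut-off is treated as a strict "greater than", as the text's "> ±" suggests. An ASCII hyphen stands in for the minus sign. Scores at or below ±0.15 return "0", since the text defines no level below that cut-off. For loadings, an absolute value above .3 counts as listed, and a value from .2 to .3 counts as parenthesized. Whether .2 and .3 themselves fall inside that band is also our assumption. The function names are hypothetical.

```python
# Cut-offs for the six score levels described in the text: (threshold, signs).
LEVELS = [(1.5, 6), (1.2, 5), (0.9, 4), (0.6, 3), (0.3, 2), (0.15, 1)]


def score_symbol(score):
    """Render a mean dimension score as a run of + or - signs."""
    for threshold, n_signs in LEVELS:
        if abs(score) > threshold:
            return ("+" if score > 0 else "-") * n_signs
    return "0"  # assumption: scores within +/-0.15 are shown as 0


def feature_status(loading):
    """Classify a factor loading by the Column 2 cut-offs."""
    if abs(loading) > 0.3:
        return "listed"
    if abs(loading) >= 0.2:
        return "listed in parentheses"
    return "not listed"


for s in (1.6, 1.5, -1.0, 0.2, 0.1):
    print(s, score_symbol(s))
print(feature_status(0.45), feature_status(-0.25), feature_status(0.1))
```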

Table 3. Summary of results for Dimensions 1–3 from the Biber & Egbert (2016) MD analysis of web registers

Dimension 1. Oral-involved versus Literate-informational
Positive features (+): Verbs: progressive aspect, non-past tense, activity verbs; Pronouns: 1st and 2nd person pronouns; Stance features: desire verb + to-clause, mental verbs, attitudinal adjectives (not controlling a complement clause), (stance adverbs); (type-token ratio, verb + WH-clause)
Negative features (−): […] (… nouns)

Dimension 2. Oral elaboration
Positive features (+): Verbs: existence verbs, mental verbs, epistemic verbs (not controlling a complement clause); Verb + complement clause: likelihood verb + that-clause, certainty verb + that-clause, (verb + to-clause (excl. desire verbs), verb + WH-clause); complementizer deletion
Negative features (−): (activity verbs, proper nouns, type-token ratio)

Dimension 3. Oral narrative versus Written information
Positive features (+): Verbs: past tense verbs, perfect aspect verbs, activity verbs; Adverbs: time adverbs, place adverbs, (total other adverbs); Adverbial clauses (excl. conditional); Pronouns: 1st person pronouns, (it, 3rd person pronouns); […] (… concrete nouns)
Negative features (−): Long words; Nouns: common nouns, process nouns; attributive adjectives; pre-modifying nouns

Dimensions 1–3 are similar to one another in distinguishing among oral versus literate web registers. Lyrical documents are the most marked on all three dimensions, while spoken discourse and interactive discussions also have large positive scores on the three dimensions. Written informational registers have negative scores on all three dimensions. Beyond those similarities, though, there are some less noticeable differences in the register patterns for the three dimensions.

Informational description has a large negative score on Dimension 1, reflecting the absence of the positive features on this dimension, combined with high frequencies for the negative Dimension 1 features. Narrative has a small negative score on Dimension 1, while how-to and opinion have small to moderate positive Dimension 1 scores. Informational description and narrative are also the only two registers with negative scores on Dimension 2, although the magnitude of those scores is small. One major difference between the two dimensions is that Dimension 1 defines a clear opposition between two discourse styles – oral-involved versus literate-informational – whereas Dimension 2 primarily identifies a single discourse style: the style of elaboration found in oral discourse (which is to a large extent absent in informational description and narrative).

Finally, Dimension 3 combines two major functional influences: an oral vs. literate contrast, and a narrative vs. non-narrative contrast. Apparently, all of the oral general registers in our corpus tend to have narrative communicative purposes, as do written narrative documents. At the other extreme, both informational description and informational persuasion are marked by the absence of positive Dimension 3 features, together with high frequencies of the negative features on this dimension.

In addition, despite the similarities in register patterns, the three dimensions are distinct in their linguistic composition.
Dimension 1 is composed of dynamic activity verbs, progressive aspect verb phrases, and present tense, combined with pronouns and stance features; Dimension 2 is composed of stative verb classes and complement clause constructions; Dimension 3 is composed of past tense verbs and perfect aspect verb phrases, combined with adverbs and pronouns (opposed to long words and features associated with complex noun phrases). These linguistic differences reflect the different functional underpinnings of the dimensions. Thus, the positive features on Dimension 1 include dynamic verbs and verb tenses (progressive aspect, non-past tense, and activity verbs), first and second person pronouns, and some stance features. These are stereotypically ‘oral’ features, but they also reflect a high degree of interactivity and personal involvement. Such features are common not only in spoken discourse and written interactive discussions; they are also common in written how-to documents, which reflect high personal involvement.



Chapter 19.  Orality on the searchable web 329

Table 3 shows that these personal involvement features are rare in informational description/explanation (which has a large negative score on Dimension 1). Instead, we find frequent use of the negative Dimension 1 features in these registers: noun phrases with definite articles, prepositional phrases, and passive non-finite relative clauses. These are features used to convey information, but they also reflect a kind of impersonal ‘detachment’, in opposition to the dynamic ‘involved’ functions of the positive features. The Dimension 1 pattern of variation for general registers shows that it defines a fundamental opposition between spoken/oral/involved and written/literate/informational registers. The Dimension 1 scores for specific sub-registers further help with the functional interpretation. For example, Table 3 shows that there are important differences among opinion sub-registers with respect to these linguistic features: Advice documents are extremely ‘involved’, while news-opinion blogs (and editorials) and religious blogs are detached and ‘informational’ rather than ‘involved’. Similarly, there is a large range of variation among sub-registers within the general category of narrative with respect to Dimension 1 scores. For example, personal blogs are highly ‘involved’ on this dimension; fiction is intermediate; and historical articles are extremely ‘informational’ in their Dimension 1 characteristics. Even spoken sub-registers and informational-written sub-registers vary to some extent along this dimension: Within the general register of spoken discourse, TV transcripts are extremely ‘involved’, while formal speeches have an intermediate Dimension 1 score. And within the general register of informational description, informational blogs are intermediate along Dimension 1, while sub-registers like research articles and encyclopedia articles have some of the largest negative Dimension 1 scores. 
Dimension 2 can similarly be interpreted as relating to a general oral–literate opposition, but the linguistic basis of that contrast relates to structural elaboration rather than personal involvement. That is, the co-occurring linguistic features on Dimension 2 include stative verbs (existence verbs and mental verbs) and three types of dependent clauses (complement clause constructions: that-clauses, to-clauses, and WH-clauses). Lyrical discourse – especially songs – is extremely marked for the use of these elaborated features, and these features are common in all spoken sub-registers of our corpus. They are also quite common in ‘oral’ written registers, such as interactive discussions (discussion forums or question-answer forums), as well as in fictional narratives and personal blog narratives. Finally, Dimension 3 comprises stereotypically narrative linguistic features, such as past tense verbs, perfect aspect verbs, activity verbs, place and time adverbs, and most types of adverbial clauses (except conditional clauses). The narrative style captured by Dimension 3 is mostly first-person (shown by the high loading

330 Douglas Biber and Jesse Egbert

for first person pronouns), although third person pronouns also co-occur with these features. The most marked general web registers with respect to Dimension 3 are lyrical texts and interactive discussions. However, consideration of the specific sub-registers (see Table 3) shows that fictional narrative is actually the most extreme in its reliance on these features. Personal blogs are also extremely marked for the use of these features. Interestingly, though, historical articles have only an intermediate ‘narrative’ score on Dimension 3, while news reports are actually ‘informational’ rather than ‘narrative’ in their Dimension 3 score. Somewhat surprisingly, spoken TV interactions are also often narrative with respect to Dimension 3 features, while spoken formal speeches are actually marked for the absence of these narrative features. Linguistically, the negative pole of Dimension 3 is composed of long words and several features related to complex noun phrases: common nouns, process nouns, and three types of nominal modifiers (attributive adjectives, pre-modifying nouns, and finite relative clauses). These features are especially prevalent in informational registers, including both informational persuasion and informational description/explanation. (At the same time, Dimension 3 shows that these informational registers tend to be linguistically non-narrative.) However, there is some variation among informational sub-registers. For example, descriptions of a person (including biographies) and encyclopedia articles are intermediate along Dimension 3, employing both ‘narrative’ and ‘informational’ linguistic features. In contrast, research articles are extremely marked at the negative pole of Dimension 3, making dense use of the negative noun phrase features coupled with rare use of the positive ‘narrative’ features. 
Overall, the findings from the descriptions in Biber and Egbert (2018) clearly show that some registers on the searchable web are considerably more ‘oral’ in their linguistic characteristics than other registers. Four registers stand out as being especially ‘oral’ in their MD profiles: song lyrics, transcribed interviews, TV transcripts, and discussion forums. However, these previous analyses do not directly compare the ‘oral’ registers from the searchable web to genuine spoken conversation (or to private CMC interactions). This is the research question that we take up in the following section.




3. Analyzing the extent to which ‘oral’ registers from the searchable web represent the characteristics of face-to-face conversation and online conversations

In terms of their situational characteristics, the ‘oral’ registers from the searchable web differ in several key respects from spoken conversations and (super)synchronous CMC interactions. In the first place, the web oral registers are interactive to a lesser extent than conversation and (super)synchronous CMC. TV transcripts are purported to represent genuine conversations, and should therefore be interactive in similar ways to conversation. However, TV dialogues are normally pre-scripted, and the dialogue serves dual functions: carrying the narrative forward in addition to performing the interpersonal functions of normal conversations. Interviews and discussion forums are interactive and co-constructed by multiple participants, but in a much more constrained way than everyday conversations. In particular, both of these web registers are usually structured in terms of question–answer adjacency pairs, with one participant asking a relatively short question, and the other participant providing long responses, including narratives, opinionated views on a topic, or explanations of a concept or procedure. Finally, song lyrics printed on the web are not interactive in the normal sense, because they represent the discourse produced by a songwriter with no direct interaction with readers of the lyrics (apart from the possibility of comment posts).

In addition, these searchable web ‘oral’ registers differ from spoken conversation and (super)synchronous CMC in their production circumstances. Interviews are produced in speech but usually with pre-planning and relatively long pauses between turns; in some cases, the transcripts of interviews have been edited to remove dysfluencies. The other three ‘oral’ web registers are all produced in writing, with ample opportunity for revising and editing the text. 
These circumstances contrast with the real-time production of both face-to-face conversation and CMC. Thus, although song lyrics, transcribed interviews, TV transcripts, and discussion forums are the most ‘oral’ of the registers found on the public searchable web, their situational characteristics differ in several key respects from both spoken conversation and (super)synchronous CMC. It turns out that these situational differences correspond to systematic linguistic differences. To document those patterns, we follow the methodology used in Jonsson (2015), comparing the dimension scores for all registers with respect to the three ‘oral’–‘literate’ dimensions from Biber (1988): ‘Involved versus Informational Production’ (Dimension 1), ‘Situation-dependent versus Elaborated Reference’ (Dimension 3), and ‘Non-abstract versus Abstract/Impersonal Style’ (Dimension 5).


Figures 1–3 plot the dimension scores of all four ‘oral’ web registers compared to spoken conversation and CMC along these three dimensions. Academic prose, newspaper reportage, general fiction, and spoken interviews are also shown on these plots for reference. The quantitative dimension scores for the ‘oral’ web registers are taken from Biber and Egbert (2018); dimension scores for the CMC registers are taken from Jonsson (2015); and scores for the other registers are taken from Biber (1988).¹

Figure 1.  Mean scores of registers along 1988 Dimension 1: ‘Involved versus Informational Production’ (Bold marks registers from the searchable web)

1. The polarity of the 1988 Dimensions 3 and 5 has been reversed from the original analysis.




Figure 2.  Mean scores of registers along 1988 Dimension 3: ‘Situation-dependent versus Elaborated Reference’ (Bold marks registers from the searchable web) [Polarity reversed]

Figure 3.  Mean scores of registers along 1988 Dimension 5: ‘Non-abstract versus Abstract/Impersonal Style’ (Bold marks registers from the searchable web) [Polarity reversed]


Figure 1, which plots the scores for the most important oral/literate dimension (1988 Dimension 1), shows that all web ‘oral’ registers are involved and interactive in their linguistic characteristics, but to a lesser extent than spoken conversation. Web interviews have nearly the same Dimension 1 score as the 1988 spoken interviews, and Web TV transcripts and Web song lyrics (as well as synchronous CMC) have very similar Dimension 1 scores to one another. These scores reflect frequent use of the involved and interactive features associated with Dimension 1 (e.g. first and second person pronouns, contractions, present tense verbs and adverbs, finite complement clauses and adverbial clauses, stance features), combined with the relative rarity of nominal and adjectival features. However, spoken conversations (both face-to-face and telephone) are considerably more involved and interactive than any of the web registers, and as noted in the introduction above, split-window supersynchronous CMC is even more marked for Dimension 1 than conversation. Thus, while web ‘oral’ registers use involved and interactive features to a greater extent than informational written registers, they are not as marked for these features as actual spoken conversations (or some CMC registers).

In contrast, web registers like song lyrics and TV transcripts are very similar to spoken conversation with respect to Dimension 3 ‘situation-dependent’ (versus ‘elaborated’) features (Figure 2). Thus, these registers employ frequent place and time adverbs, while WH-relative clause constructions are rare (see also Figure 8.3 in Biber & Egbert 2018: 189). Web discussion forums are relatively similar in their Dimension 3 characteristics, while Web interviews have an intermediate score on Dimension 3 (similar to the 1988 spoken interviews).

Finally, Figure 3 plots the dimension scores for Dimension 5, ‘Non-abstract versus Abstract/Impersonal Style’. The major linguistic features associated with this dimension are passive constructions, including agentless passives and BY-passives, in both finite and non-finite clauses. Figure 3 shows a distributional pattern similar to that in Figure 1: On the one hand, compared to informational written registers like academic prose, all ‘oral’ web registers are notably ‘non-abstract’, reflecting the fact that passive constructions are relatively rare in them. At the same time, though, none of the ‘oral’ web registers is nearly as marked for the absence of abstract/impersonal Dimension 5 features as spoken conversation or CMC.




4. Conclusion

The general issue explored here has been the extent to which the public searchable web represents the linguistic characteristics of spoken conversational discourse. The underlying assumption is that the web represents virtually all types of discourse in English, because it is so vast and diverse. However, we have shown here that the representation of ‘oral’ registers on the searchable web is relatively constrained. The earlier MD analysis by Biber and Egbert (2016, 2018) indicates that four web registers – interviews, TV transcripts, song lyrics and discussion forums – are especially ‘oral’ relative to other written registers on the searchable web. However, the analyses reported in Section 3 above indicate that those registers are not as marked for the use of interactive/involved/non-abstract linguistic features as genuine spoken conversation. In addition, the results here show that the kinds of interactive ‘oral’ discourse found on the private Internet – such as synchronous and supersynchronous CMC – are considerably more interactive/involved/non-abstract than ‘oral’ registers on the searchable web. Descriptive linguistic research and language pedagogy based on the web as corpus (WAC) usually rely on the public searchable web, because it is readily accessible to users. However, the results here indicate that WAC is not a substitute for direct analysis of genuine spoken conversation if the goal is to describe the linguistic patterns of that register. In contrast, the results in Jonsson (2015) indicate that (super)synchronous CMC registers do a good job of representing the linguistic characteristics of spoken conversation. To a large extent, this difference reflects the synchronous (and supersynchronous) technologies associated with the private Internet (which permit real-time production and interaction), as opposed to the asynchronous technologies associated with the public web. 
That is, interaction on the searchable web usually does not occur in real-time, and as a result, web documents tend to be more carefully produced and edited than (super)synchronous CMC posts. Those situational differences correspond to the more moderate linguistic characterizations seen with respect to the 1988 Dimensions 1 and 3. Thus, while it is the case that the searchable web represents an incredible range of linguistic variation, the descriptions here indicate that the discourse styles of fully interactive registers are not well-represented in that discourse domain.


References

Baron, N. 2008. Always On: Language in an Online and Mobile World. Oxford: OUP.
Berber-Sardinha, T. 2014. 25 years later: Comparing Internet and pre-Internet registers. In Multi-dimensional Analysis, 25 Years On: A Tribute to Douglas Biber [Studies in Corpus Linguistics 60], T. Berber-Sardinha & M. Veirano Pinto (eds), 81–105. Amsterdam: John Benjamins.
Biber, D. 1988. Variation across Speech and Writing. Cambridge: CUP.
Biber, D. 2014. Using multi-dimensional analysis to explore cross-linguistic universals of register variation. Languages in Contrast 14(1): 7–34.
Biber, D. & Egbert, J. 2016. Register variation on the searchable web: A multi-dimensional analysis. Journal of English Linguistics 44(2): 95–137.
Biber, D. & Egbert, J. 2018. Register Variation Online. Cambridge: CUP.
Biber, D., Egbert, J. & Davies, M. 2015. Exploring the composition of the searchable web: A corpus-based taxonomy of web registers. Corpora 10(1): 11–45.
Chafe, W. L. 1982. Integration and involvement in speaking, writing, and oral literature. In Spoken and Written Language: Exploring Orality and Literacy, D. Tannen (ed.), 35–54. Norwood NJ: Ablex.
Crystal, D. 2001. Language and the Internet. Cambridge: CUP.
Culpeper, J. & Kytö, M. 2010. Early Modern English Dialogues: Spoken Interaction as Writing. Cambridge: CUP.
Egbert, J., Biber, D. & Davies, M. 2015. Developing a bottom-up, user-based method of web register classification. Journal of the Association for Information Science and Technology 66(9): 1817–1831.
Gatto, M. 2014. Web as Corpus: Theory and Practice. London: Bloomsbury.
Jonsson, E. 2015. Conversational Writing: A Multidimensional Study of Synchronous and Supersynchronous Computer-mediated Communication. Frankfurt: Peter Lang. (15 May 2020).
Kilgarriff, A. & Grefenstette, G. 2003. Introduction to the special issue on the web as corpus. Computational Linguistics 29(3): 333–347.
Leech, G. 2007. New resources or just better old ones? The holy grail of representativeness. In Corpus Linguistics and the Web, M. Hundt, N. Nesselhauf & C. Biewer (eds), 133–150. Amsterdam: Rodopi.
Tannen, D. 1982. Oral and literate strategies in spoken and written narratives. Language 58(1): 1–21.

Select list of publications by Merja Kytö

1. Books and special issues

Early North-American Englishes (M. Kytö & L. Siebers eds). Amsterdam: John Benjamins, forthcoming.
Intensifiers in English: A Sociopragmatic Analysis, 1700–1900 (C. Claridge, E. Jonsson & M. Kytö). Cambridge: Cambridge University Press, forthcoming.
ICAME Journal 44 (M. Kytö, A.-B. Stenström & I. Mindt eds). Berlin: Mouton de Gruyter, 2020.
Late Modern English: Novel Encounters (M. Kytö & E. Smitterberg eds). Amsterdam & Philadelphia: John Benjamins, 2020. https://doi.org/10.1075/slcs.214
Punctuation in Context – Past and Present Perspectives (C. Claridge & M. Kytö eds). Bern: Peter Lang, 2020. https://doi.org/10.3726/b16021
ICAME Journal 43 (M. Kytö, A.-B. Stenström & I. Mindt eds). Berlin: Mouton de Gruyter, 2019.
Dialogues in Diachrony: Celebrating Historical Corpora of Speech-Related Texts (M. Kytö & T. Walker eds). Amsterdam: John Benjamins, 2018.
ICAME Journal 42 (M. Kytö, A.-B. Stenström & I. Mindt eds). Berlin: Mouton de Gruyter, 2018.
Punctuation: Past and Present. Special issue of Studia Neophilologica (B. Andersson & M. Kytö eds). London: Routledge, 2018.
ICAME Journal 41 (M. Kytö, A.-B. Stenström & I. Mindt eds). Berlin: Mouton de Gruyter, 2017.
Interfacing Individuality and Collaboration in English Language Research World. Studia Neophilologica, Vol. 89, supplement 1, special issue (M. Kytö, J. Smith & I. Taavitsainen eds). London: Routledge, 2017.
Texts from Speech and Speech in Texts. Special issue of the Nordic Journal of English Studies. Vol. 16, no. 1 (T. Walker & M. Kytö eds). Gothenburg: University of Gothenburg, Department of Languages and Literature, 2017.
Årsbok 2015 (M. Kytö ed.). Uppsala: Kungl. Humanistiska Vetenskaps-Samfundet i Uppsala, 2016.
ICAME Journal 40 (M. Kytö, A.-B. Stenström & I. Mindt eds). Berlin: Mouton de Gruyter, 2016.
The Cambridge Handbook of English Historical Linguistics (M. Kytö & P. Pahta eds). Cambridge: Cambridge University Press, 2016. https://doi.org/10.1017/CBO9781139600231
Årsbok 2014 (M. Kytö ed.). Uppsala: Kungl. Humanistiska Vetenskaps-Samfundet i Uppsala, 2015.
Developments in English: Expanding Electronic Evidence (I. Taavitsainen, M. Kytö, C. Claridge & J. Smith eds). Cambridge: Cambridge University Press, 2015.
Årsbok 2013 (M. Kytö ed.). Uppsala: Kungl. Humanistiska Vetenskaps-Samfundet i Uppsala, 2014.
Manuscript Studies and Codicology: Theory and Practice. Special issue of Studia Neophilologica (M. Peikola & M. Kytö eds). London: Routledge, 2014.

338 Voices Past and Present – Studies of Involved, Speech-related and Spoken Texts

Confess if you be Guilty: Witchcraft Records in their Linguistic and Socio-cultural Setting. Special issue of Studia Neophilologica (M. Kytö ed.). London: Routledge, 2012.
English Corpus Linguistics: Crossing Paths (M. Kytö ed.). Amsterdam: Rodopi, 2012.
Testifying to Language and Life in Early Modern England. Including a CD-ROM containing An Electronic Text Edition of Depositions 1560–1760 (ETED) (M. Kytö, P. J. Grund & T. Walker eds). Amsterdam: John Benjamins, 2011.
Early Modern English Dialogues: Spoken Interaction as Writing (J. Culpeper & M. Kytö eds). Cambridge: Cambridge University Press, 2010.
Language Change and Variation from Old English to Late Modern English. A Festschrift for Minoji Akimoto (Linguistic Insights 114) (M. Kytö, J. Scahill & H. Tanabe eds). Bern: Peter Lang, 2010. https://doi.org/10.3726/978-3-0351-0092-1
Records of the Salem Witch-Hunt (B. Rosenthal, G. A. Adams, M. Burns, P. Grund, R. Hiltunen, L. Kahlas-Tarkka, M. Kytö, M. Peikola, B. C. Ray, M. Rissanen, M. K. Roach & R. Trask eds). Cambridge: Cambridge University Press, 2009. https://doi.org/10.1017/9781107589766
Corpus Linguistics: An International Handbook (HSK / Handbücher zur Sprach- und Kommunikationswissenschaft / Handbooks of Linguistics and Communication Science 29.1–2) (A. Lüdeling & M. Kytö eds). Berlin: Walter de Gruyter, 2008–2009. https://doi.org/10.1515/9783110211429
Guide to A Corpus of English Dialogues 1560–1760 (Studia Anglistica Upsaliensia 130) (M. Kytö & T. Walker eds). Uppsala: Acta Universitatis Upsaliensis, 2006.
Nineteenth-century English. Stability and Change (M. Kytö, M. Rydén & E. Smitterberg eds). Cambridge: Cambridge University Press, 2006. https://doi.org/10.1017/CBO9780511486944
Samtal i livet och i litteraturen / Conversation in Life and Literature. Papers from the ASLA Symposium, Uppsala, 8–9 November 2001 (U. Melander Marttala, C. Östman & M. Kytö eds). Uppsala: ASLA, 2002.
A Reader in Early Modern English (Bamberger Beiträge zur Englischen Sprachwissenschaft 43) (M. Rydén, I. Tieken-Boon van Ostade & M. Kytö eds). Bern: Peter Lang, 1998.
English in Transition: Corpus-Based Studies in Linguistic Variation and Genre Styles (Topics in English Linguistics 23) (M. Rissanen, M. Kytö & K. Heikkonen eds). Berlin: Mouton de Gruyter, 1997. https://doi.org/10.1515/9783110811148
Grammaticalization at Work: Studies of Long-Term Developments in English (Topics in English Linguistics 24) (M. Rissanen, M. Kytö & K. Heikkonen eds). Berlin: Mouton de Gruyter, 1997. https://doi.org/10.1515/9783110810745
Tracing the Trail of Time. Proceedings from the Second Diachronic Corpora Workshop, New College, University of Toronto, May 1995 (R. Hickey, M. Kytö, I. Lancashire & M. Rissanen eds). Amsterdam: Rodopi, 1997.
Speech Past and Present. Studies in English Dialectology in Memory of Ossi Ihalainen (Bamberger Beiträge zur Englischen Sprachwissenschaft / University of Bamberg Studies in English Linguistics 38) (J. Klemola, M. Kytö & M. Rissanen eds). Bern: Peter Lang, 1996.
Corpora across the Centuries. Proceedings of the First International Colloquium on English Diachronic Corpora, St Catharine’s College Cambridge, 25–27 March, 1993 (M. Kytö, M. Rissanen & S. Wright eds). Amsterdam: Rodopi, 1994.
Early English in the Computer Age: Explorations through the Helsinki Corpus (Topics in English Linguistics 11) (M. Rissanen, M. Kytö & M. Palander-Collin eds). Berlin: Mouton de Gruyter, 1993.




Variation and Diachrony, with Early American English in Focus: Studies on CAN/MAY and SHALL/WILL (Bamberger Beiträge zur Englischen Sprachwissenschaft 28 / University of Bamberg Studies in English Linguistics 28) (M. Kytö ed.). Bern: Peter Lang, 1991. [A cumulative PhD dissertation, University of Helsinki.]
Corpus Linguistics, Hard and Soft. Proceedings of the Eighth International Conference on English Language Research on Computerized Corpora (Language and Computers: Studies in Practical Linguistics 2) (M. Kytö, O. Ihalainen & M. Rissanen eds). Amsterdam: Rodopi, 1988.

2. Articles

A little something goes a long way: Little in the Old Bailey Corpus (C. Claridge, E. Jonsson & M. Kytö). Forthcoming.
Coordination in the courtroom: The uses of AND in the records of the Salem Witchcraft trials (M. Kytö). In M. Kytö & L. Siebers (eds), Early North-American Englishes. Amsterdam: John Benjamins, forthcoming.
Entirely innocent: A historical sociopragmatic analysis of maximizers in the Old Bailey Corpus (C. Claridge, E. Jonsson & M. Kytö). Forthcoming.
Migration, localities, and discourse: A century of community cookbook data and language contact (M. Kytö & A. Hoffman). Forthcoming.
English in North America (M. Kytö). In D. Schreier, M. Hundt & E. W. Schneider (eds), The Cambridge Handbook of World Englishes. Cambridge: Cambridge University Press, 2020, 160–184.
Introduction: Late Modern English studies into the twenty-first century (M. Kytö & E. Smitterberg). In M. Kytö & E. Smitterberg (eds), Late Modern English: Novel Encounters. Amsterdam & Philadelphia: John Benjamins, 2020, 1–17. https://doi.org/10.1075/slcs.214.int
Introduction: Multiple functions and contexts of punctuation (C. Claridge & M. Kytö). In C. Claridge & M. Kytö (eds), Punctuation in Context – Past and Present Perspectives. Bern: Peter Lang, 2020, 9–20. https://doi.org/10.3726/b16021
L’interaction orale du passé: A Corpus of English Dialogues 1560–1760 (M. Kytö & T. Walker). Langages 217(1), 2020, 55–69. https://doi.org/10.3917/lang.217.0055
A (great) deal of: Developments in 19th-century British and Australian English (C. Claridge & M. Kytö). In S. Jansen & L. Siebers (eds), Processes of Change: Studies in Late Modern and Present-Day English. Amsterdam: John Benjamins, 2019, 49–71. https://doi.org/10.1075/silv.21.04cla
Register in historical linguistics (M. Kytö). Register Studies 1(1), 2019, 136–167. https://doi.org/10.1075/rs.18011.kyt
The conjunction ‘and’ in phrasal and clausal structures in the Old Bailey Corpus (M. Kytö & E. Smitterberg). In N. Yáñez-Bouza, E. Moore, L. van Bergen & W. B. Hollmann (eds), Categories, Constructions, and Change in English Syntax. Cambridge: Cambridge University Press, 2019, 234–250. https://doi.org/10.1017/9781108303576
Varying social roles and networks on a family farm: Evidence from Swedish immigrant letters, 1880s to 1930s (A. Hoffman & M. Kytö). Journal of Historical Sociolinguistics 5(2), 2019, 1–31. https://doi.org/10.1515/jhsl-2018-0031


Heritage Swedish, English, and textual space in rural communities of practice (A. Hoffman & M. Kytö). In J. Heegård Petersen & K. Kühl (eds), Selected Proceedings of the Eighth Workshop on Immigrant Languages in the Americas (WILA 8). Somerville, MA: Cascadilla Press, 2018, 44–54.
Introduction (M. Kytö & T. Walker). In M. Kytö & T. Walker (eds), Dialogues in Diachrony: Celebrating Historical Corpora of Speech-Related Texts. Amsterdam: John Benjamins, 2018, 161–166.
Introduction: Exploring the multifaceted faces of punctuation (B. Andersson & M. Kytö). In B. Andersson & M. Kytö (eds), Punctuation: Past and Present. Special issue of Studia Neophilologica. London: Routledge, 2018, 1–4.
Breaking boundaries: Current research trends in English linguistics and philology (M. Kytö, J. Smith & I. Taavitsainen). In M. Kytö, J. Smith & I. Taavitsainen (eds), Interfacing Individuality and Collaboration in English Language Research World (Studia Neophilologica, Special Issue), 2017, 1–4.
The linguistic landscapes of Swedish heritage cookbooks in the American Midwest, 1895–2005 (A. Hoffman & M. Kytö). Studia Neophilologica 89(2), 2017, 261–286. https://doi.org/10.1080/00393274.2017.1301783
Introduction (M. Kytö & P. Pahta). In M. Kytö & P. Pahta (eds), The Cambridge Handbook of English Historical Linguistics. Cambridge: Cambridge University Press, 2016, 1–15. https://doi.org/10.1017/CBO9781139600231.001
Well! Burn me, or hang me, I will stand in the truth of Christ: Investigating early spoken English (M. Kytö). In E. Eggert & J. Kilian (eds), Historische Mündlichkeit. Bern: Peter Lang, 2016, 163–180.
Diachronic registers (M. Kytö & E. Smitterberg). In D. Biber & R. Reppen (eds), The Cambridge Handbook of English Corpus Linguistics. Cambridge: Cambridge University Press, 2015, 330–345. https://doi.org/10.1017/CBO9781139764377.019
English genres in diachronic corpus linguistics (E. Smitterberg & M. Kytö). In P. Shaw, B. Erman, G. Melchers & P. Sundkvist (eds), From Clerks to Corpora: Essays on the English Language Yesterday and Today. Stockholm: Stockholm University, 2015, 117–133. https://doi.org/10.16993/bab.g
English in the digital age: A general introduction (I. Taavitsainen, M. Kytö, C. Claridge & J. Smith). In I. Taavitsainen, M. Kytö, C. Claridge & J. Smith (eds), Developments in English: Expanding Electronic Evidence. Cambridge: Cambridge University Press, 2015, 1–8.
Guidelines for normalising Early Modern English corpora: Decisions and justifications (D. Archer, M. Kytö, A. Baron & P. Rayson). ICAME Journal 39, 2015, 5–24. https://doi.org/10.1515/icame-2015-0001
I had lost sight of them then for a bit, but I went on pretty fast: Two degree modifiers in the Old Bailey Corpus (C. Claridge & M. Kytö). In I. Taavitsainen, A. H. Jucker & J. Tuominen (eds), Diachronic Corpus Pragmatics. Amsterdam: John Benjamins, 2014, 29–52. https://doi.org/10.1075/pbns.243.05cla
Philology on the move: Manuscript studies at the dawn of the 21st century (M. Kytö & M. Peikola). In M. Peikola & M. Kytö (eds), Manuscript Studies and Codicology: Theory and Practice (Studia Neophilologica, Special Issue), 2014, 1–8.
You are a bit of a sneak: Exploring a degree modifier in the Old Bailey Corpus (C. Claridge & M. Kytö). In M. Hundt (ed.), Late Modern English Syntax. Cambridge: Cambridge University Press, 2014, 239–268. https://doi.org/10.1017/CBO9781139507226.018




Features of layout and other visual effects in the source manuscripts of An Electronic Text Edition of Depositions 1560–1760 (ETED) (T. Walker & M. Kytö). In A. Meurman-Solin & J. Tyrkkö (eds), Principles and Practices for the Digital Editing and Annotation of Diachronic Data (Studies in Variation, Contacts and Change in English 14). Helsinki: VARIENG, 2013, available at
Evidence from Historical Corpora up to the Twentieth Century (M. Kytö & P. Pahta). In T. Nevalainen & E. Traugott (eds), The Oxford Handbook of the History of English. Oxford: Oxford University Press, 2012, 123–133.
Introduction (M. Kytö). In M. Kytö (ed.), Confess if you be Guilty: Witchcraft Records in their Linguistic and Socio-Cultural Setting (Studia Neophilologica, Special Issue), 2012, 1–5.
New perspectives, theories and methods: Corpus linguistics (M. Kytö). In L. Brinton & A. Bergs (eds), English Historical Linguistics. An International Handbook, Vol. 2 (Handbooks of Linguistics and Communication Science 34.2). Berlin: Mouton de Gruyter, 2012, 1509–1531.
Corpora and historical linguistics (M. Kytö). In S. Th. Gries (ed.), Corpus Linguistics (Brazilian Journal of Applied Linguistics/Revista Brasileira de Linguística Aplicada (BJAL), Special Issue 11(2)), 2010, 417–457.
Data in historical pragmatics (M. Kytö). In A. Jucker & I. Taavitsainen (eds), Historical Pragmatics (Handbooks of Pragmatics 8). Berlin: Walter de Gruyter, 2010, 33–67.
Explorations into ‘spoken’ interaction of the past: Evidence from early English texts (M. Kytö). In J. Helbig & R. Schallegger (eds), Anglistentag 2009 Klagenfurt – Proceedings (Proceedings of the Conference of the German Association of University Teachers of English 31). Trier: Wissenschaftlicher Verlag Trier, 2010, 9–20.
Non-standard language in earlier English (M. Kytö & C. Claridge). In R. Hickey (ed.), Varieties in English Writing: The Written Word as Linguistic Evidence (Varieties of English Around the World G41). Amsterdam: John Benjamins, 2010, 15–41.
Linguistic introduction (P. Grund, R. Hiltunen, L. Kahlas-Tarkka, M. Kytö, M. Peikola & M. Rissanen). In B. Rosenthal, G. A. Adams, M. Burns, P. Grund, R. Hiltunen, L. Kahlas-Tarkka, M. Kytö, M. Peikola, B. C. Ray, M. Rissanen, M. K. Roach & R. Trask (eds), Records of the Salem Witch-Hunt. Cambridge: Cambridge University Press, 2009, 64–90. https://doi.org/10.1017/9781107589766.004
Engelskans historia speglad i sina texter (M. Kytö). Årsbok 2008. Stockholm: Kungl. Vitterhets Historie och Antikvitets Akademien, 2008, 65–75.
My dearest Minnykins: Style, gender and affect in 19th century English letters (M. Kytö & S. Romaine). In G. Watson (ed.), The State of Stylistics: PALA 26. Amsterdam: Rodopi, 2008, 229–263. https://doi.org/10.1163/9789401206082_014
Collocational and idiomatic aspects of verbs in Early Modern English: A corpus-based study of MAKE, HAVE, GIVE, TAKE and DO. A reprint of the previous 1999 publication (M. Kytö). In W. Teubert & R. Krishnamurthy (eds), Corpus Linguistics: Critical Concepts in Linguistics. London: Routledge, 2007, 349–385.
Engelska (M. Kytö). In E. Strangert (ed.), Databaser och digitalisering inom humaniora – existerande resurser och framtida behov. Bilaga 4. Stockholm: Vetenskapsrådet, Database Infrastructure Committee (DISC), 2007, no pagination.
English witness depositions 1560–1760: An electronic text edition (M. Kytö, T. Walker & P. Grund). ICAME Journal 31, 2007, 65–85.
Historisk dialoganalys: Lexikala upprepningar i äldre nyengelska (M. Kytö). Kungliga Vetenskapssamhällets i Uppsala Årsbok 36 (2005–2006), 2007, 41–46.

342 Voices Past and Present – Studies of Involved, Speech-related and Spoken Texts

Regional variation and the language of English witness depositions 1560–1760: Constructing a ‘linguistic’ edition in electronic form (M. Kytö, P. Grund & T. Walker). In P. Pahta, I. Taavitsainen, T. Nevalainen & J. Tyrkkö (eds), VARIENG E-Series special issue: Towards Multimedia in Corpus Studies, 2007, available at
Adjective comparison in nineteenth-century English (M. Kytö & S. Romaine). In M. Kytö, M. Rydén & E. Smitterberg (eds), Nineteenth-century English: Stability and Change. Cambridge: Cambridge University Press, 2006, 194–214. https://doi.org/10.1017/CBO9780511486944.008
Good, good indeed, the best that ere I heard: Exploring lexical repetitions in the Corpus of English Dialogues 1560–1760 (J. Culpeper & M. Kytö). In I. Taavitsainen, J. Härmä & J. Korhonen (eds), Dialogic Language Use / Dimensions du dialogisme / Dialogischer Sprachgebrauch (Mémoires de la Société Néophilologique LXVI). Helsinki: Société Néophilologique, 2006, 69–85.
Introduction: Exploring nineteenth-century English – past and present perspectives (M. Kytö, M. Rydén & E. Smitterberg). In M. Kytö, M. Rydén & E. Smitterberg (eds), Nineteenth-century English: Stability and Change. Cambridge: Cambridge University Press, 2006, 1–16. https://doi.org/10.1017/CBO9780511486944.001
Nineteenth-century English: An age of stability or a period of change? (M. Kytö & E. Smitterberg). In R. Facchinetti & M. Rissanen (eds), Corpus Linguistic Studies in Diachronic English (Linguistic Insights: Studies in Language and Communication 31). Bern: Peter Lang, 2006, 199–230.
We had like to have been killed by thunder & lightning: The semantic and pragmatic history of a construction that like to disappeared (M. Kytö & S. Romaine). Journal of Historical Pragmatics 6(1), 2005, 1–35. https://doi.org/10.1075/jhp.6.1.02kyt
Editing the documents from the Salem witchcraft trials: An exploration of a linguistic treasury (P. Grund, M. Kytö & M. Rissanen). American Speech 79(2), 2004, 146–166. https://doi.org/10.1215/00031283-79-2-146
The emergence of American English: Evidence from seventeenth-century records in New England (M. Kytö). In R. Hickey (ed.), Legacies of Colonial English: Studies in Transported Dialects (Studies in English Language). Cambridge: Cambridge University Press, 2004, 121–157.
The linguistic study of Early Modern English speech-related texts: How ‘bad’ can ‘bad’ data be? (M. Kytö & T. Walker). Journal of English Linguistics 31(3), 2003, 221–248. https://doi.org/10.1177/0075424203257260
Lexical bundles in Early Modern English dialogues: A window into the speech-related language of the past (J. Culpeper & M. Kytö). In T. Fanego, B. Méndez-Naya & E. Seoane (eds), Sounds, Words, Texts and Change. Selected Papers from the 11 ICEHL, Santiago de Compostela, 7–11 September 2000 (Current Issues in Linguistic Theory 224). Amsterdam: John Benjamins, 2002, 45–65.
The go-futures in English and French viewed as an areal feature (A. Danchev & M. Kytö). NOWELE 40, 2002, 29–60. https://doi.org/10.1075/nowele.40.02dan
The Middle English for to + infinitive construction: A twofold contact phenomenon? (A. Danchev & M. Kytö). In D. Kastovsky & A. Mettinger (eds), Language Contact in the History of English (Studies in English Medieval Language and Literature 1). Bern: Peter Lang, 2001, 35–55.
Adjective comparison and standardisation processes in American and British English from 1620 to the present (M. Kytö & S. Romaine). In L. Wright (ed.), The Development of Standard English 1300–1800. Theories, Descriptions, Conflicts. Cambridge: Cambridge University Press, 2000, 171–194. https://doi.org/10.1017/CBO9780511551758.011



Building a bridge between the present and the past: A corpus of 19th-century English (M. Kytö, J. Rudanko & E. Smitterberg). ICAME Journal 24, 2000, 85–97.
Data in historical pragmatics: Spoken interaction (re)cast as writing (J. Culpeper & M. Kytö). Journal of Historical Pragmatics 1(2), 2000, 175–199. https://doi.org/10.1075/jhp.1.2.03cul
English historical corpora: Report on developments in 1999 (M. Kytö & M. Rissanen). ICAME Journal 24, 2000, 159–175.
Gender voices in the spoken interaction of the past: A pilot study based on Early Modern English trial proceedings (J. Culpeper & M. Kytö). In D. Kastovsky & A. Mettinger (eds), The History of English in Social Context. A Contribution to Historical Sociolinguistics (Trends in Linguistics. Studies and Monographs 129). Berlin: Mouton de Gruyter, 2000, 53–89. https://doi.org/10.1515/9783110810301.53
Robert Keayne’s Notebooks: A verbatim record of spoken English in early Boston? (M. Kytö). In S. C. Herring, P. van Reenen & L. Schøsler (eds), Textual Parameters in Older Languages (Current Issues in Linguistic Theory 195). Amsterdam: John Benjamins, 2000, 273–308.
The conjunction and in Early Modern English: Frequencies and uses in speech-related writing and other texts (J. Culpeper & M. Kytö). In R. Bermúdez-Otero, D. Denison, R. M. Hogg & C. B. McCully (eds), Generative Theory and Corpus Studies. A Dialogue from 10 ICEHL (Topics in English Linguistics 31). Berlin: Mouton de Gruyter, 2000, 299–326. https://doi.org/10.1515/9783110814699.299
Collocational and idiomatic aspects of verbs in Early Modern English: A corpus-based study of MAKE, HAVE, GIVE, TAKE, and DO (M. Kytö). In L. J. Brinton & M. Akimoto (eds), Collocational and Idiomatic Aspects of Composite Predicates in the History of English. Amsterdam: John Benjamins, 1999, 167–206. https://doi.org/10.1075/slcs.47.53kyt
English historical corpora: Report on developments in 1998 (M. Kytö & M. Rissanen). ICAME Journal 23, 1999, 175–188.
Investigating nonstandard language in a corpus of Early Modern English dialogues: Methodological considerations and problems (J. Culpeper & M. Kytö). In I. Taavitsainen, G. Melchers & P. Pahta (eds), Writing in Nonstandard English (Pragmatics & Beyond New Series 67). Amsterdam: John Benjamins, 1999, 171–187.
Modifying pragmatic force: Hedges in Early Modern English dialogues (J. Culpeper & M. Kytö). In A. H. Jucker, G. Fritz & F. Lebsanft (eds), Historical Dialogue Analysis (Pragmatics & Beyond New Series 66). Amsterdam: John Benjamins, 1999, 293–312. https://doi.org/10.1075/pbns.66.12cul
Pragmatik i språkets förflutna: Nya insikter från talspråksbaserad äldre nyengelska (M. Kytö). Årsbok 1998. Uppsala: Kungl. Humanistiska Vetenskaps-Samfundet i Uppsala (Annales Societatis Litterarum Humaniorum Regiae Upsaliensis), 1999, 41–56.
Backdating the English constraint grammar parser for the analysis of English historical texts (M. Kytö & A. Voutilainen). In R. Hogg & L. van Bergen (eds), Historical Linguistics 1995: Selected papers from the twelfth International Conference on Historical Linguistics, Manchester, August 1995. Vol. 2. Amsterdam: John Benjamins, 1998, 149–166. https://doi.org/10.1075/cilt.162.12kyt
English historical corpora: Report on developments in 1997 (M. Kytö & M. Rissanen). ICAME Journal 22, 1998, 113–120.
BE/HAVE + past participle: The choice of the auxiliary with intransitives from Late Middle to Modern English (M. Kytö). In M. Rissanen, M. Kytö & K. Heikkonen (eds), English in Transition: Corpus-Based Studies in Linguistic Variation and Genre Styles (Topics in English Linguistics 23). Berlin: Mouton de Gruyter, 1997, 17–85. https://doi.org/10.1515/9783110811148.17

Competing forms of adjective comparison in Modern English: What could be more quicker and easier and more effective? (M. Kytö & S. Romaine). In T. Nevalainen & L. Kahlas-Tarkka (eds), To Explain the Present. Studies in the Changing English Language in Honour of Matti Rissanen (Mémoires de la Société Néophilologique 52). Helsinki: Société Néophilologique, 1997, 329–352.
English historical corpora: Report on developments in 1996 (M. Kytö & M. Rissanen). ICAME Journal 21, 1997, 109–120.
Language analysis and diachronic corpora (M. Kytö & M. Rissanen). In R. Hickey, M. Kytö, I. Lancashire & M. Rissanen (eds), Tracing the Trail of Time. Proceedings from the Second Diachronic Corpora Workshop, New College, University of Toronto, Toronto, May 1995. Amsterdam: Rodopi, 1997, 9–22.
Therfor speke playnly to the poynt: Punctuation in Robert Keayne’s notes of church meetings from early Boston, New England (M. Kytö). In R. Hickey & S. Puppel (eds), Language History and Linguistic Modelling. A Festschrift for Jacek Fisiak on his 60th Birthday (Trends in Linguistics, Studies and Monographs 101). Berlin: Mouton de Gruyter, 1997, 323–342. https://doi.org/10.1515/9783110820751.323
Towards a corpus of dialogues, 1550–1750 (J. Culpeper & M. Kytö). In H. Ramisch & K. Wynne (eds), Language in Time and Space. Studies in Honour of Wolfgang Viereck on the Occasion of his 60th Birthday. Stuttgart: Franz Steiner Verlag, 1997, 60–71.
A corpus of English for specific purposes: Work in progress at the University of Tampere (J. Norri & M. Kytö). In C. E. Percy, C. F. Meyer & I. Lancashire (eds), Synchronic Corpus Linguistics. Papers from the Sixteenth International Conference on English Language Research on Computerized Corpora (ICAME 16). Amsterdam: Rodopi, 1996, 159–169.
English historical corpora: Report on developments in 1995 (M. Kytö & M. Rissanen). ICAME Journal 20, 1996, 117–132.
The best and most excellentest way: The rivalling forms of adjective comparison in Late Middle and Early Modern English (M. Kytö). In J. Svartvik (ed.), Words. Proceedings of an International Symposium, Lund, 25–26 August 1995 (Konferenser 36). Stockholm: Kungl. Vitterhets Historie och Antikvitets Akademien, 1996, 123–144.
Applying the constraint grammar parser of English to the Helsinki Corpus (M. Kytö & A. Voutilainen). ICAME Journal 19, 1995, 23–48.
English historical corpora: Report on developments in 1993–94 (M. Kytö & M. Rissanen). ICAME Journal 19, 1995, 145–158.
BE vs. HAVE with intransitives in Early Modern English (M. Kytö). In F. Fernández, M. Fuster & J. J. Calvo (eds), English Historical Linguistics 1992 (Current Issues in Linguistic Theory 113). Amsterdam: John Benjamins, 1994, 179–190. https://doi.org/10.1075/cilt.113.19kyt
The construction be going to + infinitive in Early Modern English (A. Danchev & M. Kytö). In D. Kastovsky (ed.), Studies in Early Modern English (Topics in English Linguistics 13). Berlin: Mouton de Gruyter, 1994, 59–77. https://doi.org/10.1515/9783110879599.59
Towards a corpus of early American English (M. Kytö). In M. Kytö, M. Rissanen & S. Wright (eds), Corpora across the Centuries. Proceedings of the First International Colloquium on English Diachronic Corpora, St Catharine’s College Cambridge, 25–27 March 1993. Amsterdam: Rodopi, 1994, 33–39.
A supplement to the Helsinki Corpus of English texts: The Corpus of Early American English (M. Kytö). In J. Aarts, P. de Haan & N. Oostdijk (eds), English Language Corpora: Design, Analysis and Exploitation. Papers from the Thirteenth International Conference on English Language Research on Computerized Corpora, Nijmegen 1992. Amsterdam: Rodopi, 1993, 3–10.



‘By and by enters [this] my artificiall foole … who, when Jack beheld, sodainely he flew at him’: Searching for syntactic constructions in the Helsinki Corpus (M. Kytö & M. Rissanen). In M. Rissanen, M. Kytö & M. Palander-Collin (eds), Early English in the Computer Age: Explorations through the Helsinki Corpus (Topics in English Linguistics 11). Berlin: Mouton de Gruyter, 1993, 253–266.
Early American English [a period introduction to the Helsinki Corpus] (M. Kytö). In M. Rissanen, M. Kytö & M. Palander-Collin (eds), Early English in the Computer Age: Explorations through the Helsinki Corpus (Topics in English Linguistics 11). Berlin: Mouton de Gruyter, 1993, 83–91.
General introduction [to the Helsinki Corpus] (M. Kytö & M. Rissanen). In M. Rissanen, M. Kytö & M. Palander-Collin (eds), Early English in the Computer Age: Explorations through the Helsinki Corpus (Topics in English Linguistics 11). Berlin: Mouton de Gruyter, 1993, 1–17.
The first international colloquium on English diachronic corpora (St Catharine’s College Cambridge, 25–27 March, 1993) (M. Kytö, M. Rissanen & S. Wright). ICAME Journal 17, 1993, 132–137.
Third-person present singular verb inflection in early British and American English (M. Kytö). Language Variation and Change 5, 1993, 113–139. https://doi.org/10.1017/S0954394500001447
A language in transition: The Helsinki Corpus of English Texts (M. Kytö & M. Rissanen). ICAME Journal 16, 1992, 7–27.
On the arrival of English to American shores (M. Kytö). In P. Pahta, I. Taavitsainen & L. Kahlas-Tarkka (eds), “As who say”—Many Happy Returns: Essays in Honour of Saara Nevanlinna. Helsinki: Privately printed, 1992, 41–50.
SHALL (SHOULD) vs. WILL (WOULD) in early British and American English: A variational study of change (M. Kytö). NOWELE 19, 1992, 3–73. https://doi.org/10.1075/nowele.19.01kyt
Can (could) vs. may (might): Regional variation in Early Modern English? (M. Kytö). In D. Kastovsky (ed.), Historical English Syntax (Topics in English Linguistics 2). Berlin: Mouton de Gruyter, 1991, 233–289. https://doi.org/10.1515/9783110863314.233
Empirical evidence for the study of the structure of English: Helsinki Corpus of English Texts: Diachronic and dialectal (M. Kytö & M. Rissanen). The European English Messenger, Zero Issue, 1990, 22–25.
Introduction to the use of the Helsinki Corpus of English Texts: Diachronic and dialectal (M. Kytö). In M. Ljung (ed.), Proceedings from the Stockholm Conference on the Use of Computers in Language Research and Teaching, September 7–9, 1989 (Stockholm Papers in English Language and Literature 6). Stockholm: University of Stockholm, English Department, 1990, 41–56.
SHALL or WILL? Choice of the variant form in Early Modern English, British and American (M. Kytö). In H. Andersen & K. Koerner (eds), Historical Linguistics 1987. Papers from the Eighth International Conference on Historical Linguistics (8. ICHL) (Lille, 31 August–4 September, 1987) (Current Issues in Linguistic Theory 66). Amsterdam: John Benjamins, 1990, 275–288. https://doi.org/10.1075/cilt.66.18kyt
The Helsinki Corpus of English Texts: Diachronic and dialectal (M. Kytö & M. Rissanen). Medieval English Studies Newsletter 23, 1990, 11–14.
The use of SHALL and WILL from Middle to Early Modern English (M. Kytö). In G. Caie, K. Haastrup, A. L. Jakobsen, J. E. Nielsen, J. Sevaldsen, H. Specht & A. Zettersten (eds), Proceedings from the Fourth Nordic Conference for English Studies, Helsingør, May 11–13, 1989 (1–2). Helsingør: University of Copenhagen, Department of English, 1990, 71–85.

CAN or MAY? Choice of the variant form in Early Modern English, British and American (M. Kytö). In T. J. Walsh (ed.), Georgetown University Round Table on Languages and Linguistics 1988: Synchronic and Diachronic Approaches to Linguistic Variation and Change. Washington D. C.: Georgetown University Press, 1989, 163–178.
Progress report on the diachronic part of the Helsinki Corpus (M. Kytö). ICAME Journal 13, 1989, 12–15.
Recording early American English: Robert Keayne’s Note-books (M. Kytö). M.H.S. Miscellany 36, 1988, 4–5.
The Helsinki Corpus of English Texts: Classifying and coding the diachronic part (M. Kytö & M. Rissanen). In M. Kytö, O. Ihalainen & M. Rissanen (eds), Corpus Linguistics, Hard and Soft. Proceedings of the Eighth International Conference on English Language Research on Computerized Corpora (Language and Computers: Studies in Practical Linguistics 2). Amsterdam: Rodopi, 1988, 169–179.
CAN (COULD) vs. MAY (MIGHT) in Old and Middle English: Testing a diachronic corpus (M. Kytö). In L. Kahlas-Tarkka (ed.), Neophilologica Fennica: Société Néophilologique 100 ans (Mémoires de la Société Néophilologique de Helsinki 45). Helsinki: Société Néophilologique, 1987, 205–240.
In search of the roots of American English (M. Kytö & M. Rissanen). In M. Henriksson (ed.), Ten Years of American Studies: The Helsinki Experience. Helsinki: Suomen Historiallinen Seura, 1987, 215–233.
On the use of the modal auxiliaries indicating ‘possibility’ in early American English (M. Kytö). In M. Harris & P. Ramat (eds), Historical Development of Auxiliaries (Trends in Linguistics, Studies and Monographs 35). Berlin: Mouton de Gruyter, 1987, 145–170. https://doi.org/10.1515/9783110856910.145
The Helsinki Corpus of English Texts: Diachronic and dialectal. Report on work in progress (O. Ihalainen, M. Kytö & M. Rissanen). In W. Meijs (ed.), Corpus Linguistics and Beyond. Proceedings of the Seventh International Conference on English Language Research on Computerized Corpora. Amsterdam: Rodopi, 1987, 21–32.
May and might indicating ‘epistemic possibility’ in early American English (M. Kytö). In S. Jacobson (ed.), Papers from the Third Scandinavian Symposium on Syntactic Variation, Stockholm, May 11–12, 1985 (Stockholm Studies in English 65). Stockholm: Almqvist & Wiksell International, 1986, 131–142.
On the use of the modal auxiliaries can and may in early American English (M. Kytö). In D. Sankoff (ed.), Diversity and Diachrony (Current Issues in Linguistic Theory 53). Amsterdam: John Benjamins, 1986, 123–138. https://doi.org/10.1075/cilt.53.13kyt
The syntactic study of early American English: The variationist at the mercy of his corpus? (M. Kytö & M. Rissanen). Neuphilologische Mitteilungen 84, 1983, 470–490.

Index

A
advertising  113–115, 122–123, 126
agentivity  3, 47, 50–54, 56, 58–60
AmE  188, 190, 194, 197, 201, 285, 288
American English  xi, 4–5, 74, 187–194, 197, 202, 231, 267, 285, 287–289, 303
Americanism  190, 192, 199, 201
amplifier(s)  301–303, 305–310, 312–313, 315–316, 319
B
BNC  5, 187, 208, 212–215, 219–221, 227–234, 236–244, 265–267, 278–280, 303
BNC1994  212–215, 219, 221, 227, 229–234, 236–244
BNC2014  5, 208, 212–215, 219–221, 227–234, 236–244, 265–267, 278–280
C
CED  xii, 2–3, 63–69, 71–74, 76–77, 79–80, 83–84, 88–90, 182–183, 193, 228
co-occurrence  5, 37, 122, 237
coercive question  154, 162, 165–166, 168, 170
coerciveness  153, 158
COHA  4, 187–189, 194–196
COLT  5, 250, 265–267, 269–270, 273–280
community of practice  95, 97–100, 111
courtroom discourse  33, 153

D
direct speech  63–64, 67–68, 73–75, 86, 88, 289, 302, 304
discourse community  95, 97–98
discourse marker(s)  12, 135, 207–208, 210–212, 218–219, 221, 248, 250, 258–259, 291
E
EEBO  3–4, 96, 98, 100, 103–105, 110–111, 117–120
EFL  283–285, 293, 297
emotion(s)  13, 20–22, 25–26, 28, 36, 79–85, 87, 90
English-Swedish Parallel Corpus  5, 247, 249
epistemic adverbs  4, 133–136, 139–142, 144–146, 150
explanatory so  207–208, 210–213, 215–223
expression(s) of future  4, 227–228, 230, 232–241, 243–244
F
fabliaux  79, 81–82, 87, 90–91
fiction  5, 13, 63–69, 72, 76–77, 83, 86, 91, 115–116, 133, 187–188, 194–197, 249, 285, 300–302, 304, 308–313, 322, 325–327, 329, 332
functional categories  63, 66, 68, 70–71, 76, 250
G
gotten  4, 187–197, 199–202
H
HC  xi, 2–3, 79–80, 84, 86, 89
humour  3, 79–80, 82, 91

I
Iago  3, 32, 47, 49–50, 54, 56–59
innit  5, 265–266, 268–280
inserts  13, 76, 83
intensifier(s)  5, 209, 301–303, 305–306, 309–312
interjection(s)  3, 11–13, 79–88, 90–91, 182, 208, 268
intersubjectification  176
involvement  1–3, 52, 79, 81, 86, 113, 116, 118, 124–126, 249, 261, 317, 328–329
Irish English  4, 173–176, 178, 180, 182–185
Irishness  173, 177, 180–181, 184–185
K
keyword analysis  3, 31, 33, 43, 83
kind of  5, 247–262
King Lear  17–18, 31, 35–36, 39, 44
L
language change  2, 6, 197, 220, 227, 235–236, 244
legal practitioners  153, 159, 161–162, 170
log ratio  34
M
Macbeth  17, 19–20, 31–32, 35–36, 42–44
metadata  117–118, 120
MLE  5, 265–267, 278–280
multi-dimensional analysis  323
N
normative works  283–284, 296

O
OBC  4, 133, 136–139, 150, 153–154, 158–159, 170
orality  1–2, 5, 81, 283, 317
Othello  3, 17–18, 20, 25, 47–50, 52, 54, 57–59, 80, 84–85, 88
P
personal pronoun(s)  5, 88, 115, 227–228, 236–243, 275
pragmatic marker(s)  4–5, 12, 173, 175–178, 180–181, 183–184, 248, 265–266, 269–271, 275–278
pragmatic noise  1, 3, 11–28, 32, 79, 83, 89
prescriptivism  187, 201–202, 283–284

Q
question strategies  4, 153–155, 157, 159, 162–170
R
regularisation  14, 33
religious debate  95
result clause(s)  207, 209–211
Romeo and Juliet  13, 17–18, 20–21, 32, 88
S
second-person pronoun(s)  4, 32, 113–126
Semantic EEBO  96, 100, 103–105, 110–111
semantic features  47, 53
social media  95, 105, 229, 320
soliloquy(-ies)  3, 47–50, 54, 56–59, 83, 181
sort of  5, 247–253, 256–262
speech reporting expression(s)  3, 63–65, 67–69, 73–77
speech representation  63–65, 68–71, 77
sure  4, 173–185, 309
T
teaching materials  283–284
Titus Andronicus  17–18, 31, 36
type noun(s)  247–248, 250–253, 255–259, 262
W
web registers  317, 320, 323, 325, 328, 330–332, 334–335
wh-question(s)  154–155, 157, 160–162, 164, 167–168, 170, 319

This volume provides a diachronic and synchronic overview of linguistic variability and change in involved, speech-related and spoken texts in English. While previous works on the topic have focused on more limited time periods, this book covers data from the 16th century up to the present day. The studies offer new insights into historical and present-day corpus pragmatics by identifying and exploring features of orality in a variety of registers. For readers who are new to the field, the range of approaches will provide a helpful overview; for readers who are already familiar with the field, the volume will shed light on the complexity of factors such as register, sociolinguistic variability and language attitude, thus making it a useful resource and stepping stone for further exploration. The volume celebrates the groundbreaking contributions of Professor Merja Kytö in making speech-related corpus material accessible and leading the way in its exploration.

isbn 978 90 272 0765 4

JOHN BENJAMINS PUBLISHING COMPANY