Spoken Corpus Linguistics: From Monomodal to Multimodal 0415888298, 9780415888295


English Pages 216 Year 2012


Spoken Corpus Linguistics

In this book Adolphs and Carter explore key approaches to work in spoken corpus linguistics. The book discusses some of the pioneering challenges faced in designing, building and utilising insights from the analysis of spoken corpora, arguing that, even though written text is heavily privileged in corpus research, the spoken language can reveal patterns of language use that are both different and distinctive and that this has important implications for the way in which language is described, for the study of human communication and for the field of applied linguistics as a whole. Spoken Corpus Linguistics is divided into two main parts. The first part sets the scene by discussing traditional and new approaches to monomodal spoken corpus analysis, with a focus on discourse organisation and conversational interaction and with particular attention to forms of language such as discourse markers and multi-word units, areas of language not conventionally described but which are argued to be of importance to spoken language description and to spoken language learning and teaching research within the field of applied linguistics. The second part of the book moves into the multimodal domain and focuses on alignments between language and gesture in a spoken corpus, with particular reference to gestural movements of the head and the hand and to the different ways in which prosody might be used to enhance communication. A brief final chapter discusses new developments in the area of spoken corpus research, including the relationship between language and context, emerging research methods as well as discussing possible shifts in scope and emphasis in spoken corpus research in the future. Svenja Adolphs is Professor of English Language and Linguistics at the University of Nottingham. Ronald Carter is Professor of Modern English Language at the University of Nottingham.

Routledge Advances in Corpus Linguistics Edited by Tony McEnery, Lancaster University, UK Michael Hoey, Liverpool University, UK

1. Swearing in English: Bad Language, Purity and Power from 1586 to the Present. Tony McEnery
2. Antonymy: A Corpus-Based Perspective. Steven Jones
3. Modelling Variation in Spoken and Written English. David Y. W. Lee
4. The Linguistics of Political Argument: The Spin-Doctor and the Wolf-Pack at the White House. Alan Partington
5. Corpus Stylistics: Speech, Writing and Thought Presentation in a Corpus of English Writing. Elena Semino and Mick Short
6. Discourse Markers Across Languages: A Contrastive Study of Second-Level Discourse Markers in Native and Non-Native Text with Implications for General and Pedagogic Lexicography. Dirk Siepmann
7. Grammaticalization and English Complex Prepositions: A Corpus-Based Study. Sebastian Hoffmann
8. Public Discourses of Gay Men. Paul Baker
9. Semantic Prosody: A Critical Evaluation. Dominic Stewart
10. Corpus Assisted Discourse Studies on the Iraq Conflict: Wording the War. Edited by John Morley and Paul Bayley
11. Corpus-Based Contrastive Studies of English and Chinese. Richard Xiao and Tony McEnery
12. The Discourse of Teaching Practice Feedback: A Corpus-Based Investigation of Spoken and Written Modes. Fiona Farr
13. Corpus Approaches to Evaluation. Susan Hunston
14. Corpus Stylistics and Dickens’s Fiction. Michaela Mahlberg
15. Spoken Corpus Linguistics: From Monomodal to Multimodal. Svenja Adolphs and Ronald Carter

Spoken Corpus Linguistics From Monomodal to Multimodal Svenja Adolphs and Ronald Carter

First published 2013 by Routledge 711 Third Avenue, New York, NY 10017 Simultaneously published in the UK by Routledge 2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN Routledge is an imprint of the Taylor & Francis Group, an informa business © 2013 Taylor & Francis The right of Svenja Adolphs and Ronald Carter to be identified as author of this work has been asserted by them in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988. All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers. Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. Library of Congress Cataloging in Publication Data Adolphs, Svenja, author. Spoken Corpus Linguistics : From monomodal to multimodal / Svenja Adolphs and Ronald Carter. pages cm. — (Routledge Advances in Corpus Linguistics ; 15) Includes bibliographical references and index. 1. Corpora (Linguistics) 2. Discourse analysis. 3. Speech acts (Linguistics) 4. Linguistic analysis (Linguistics) 5. Grammar, Comparative and general. 6. Computational linguistics. I. Carter, Ronald, 1947– author. II. Title. P128.C68A36 2013 410.1'88—dc23 2012042037 ISBN: 978-0-415-88829-5 (hbk) ISBN: 978-0-203-52614-9 (ebk) Typeset in Sabon by Apex CoVantage, LLC


Acknowledgments
Introduction

PART I Monomodal Spoken Corpus Analysis
1 Making a Start: Building and Analyzing a Spoken Corpus
2 Corpus and Spoken Interaction: Multi-Word Units in Spoken English
3 From Concordance to Discourse: Responses to Speakers
4 Case Studies in Applied Spoken Corpus Linguistics

PART II Multimodal Spoken Corpus Analysis
5 Sound Evidence: Prosody and Spoken Corpora
6 Moving Beyond the Text
7 Developing a Framework for Analyzing ‘Headtalk’ and ‘Handtalk’: First Steps
8 Future Directions

Appendix
Notes
References
Index

Acknowledgments


Thanks to our series editors, Tony McEnery and Michael Hoey, for their patience, support, always helpful critical comment and always good guidance at all stages in the writing of this book. This book draws on research that we have undertaken in the field of spoken corpus linguistics over the course of the past 15 years or so and in that sense is a summary collection of research material that has involved collaborative work on many fronts as part of funded research projects, coauthorship of books and articles, research reports and conference papers. Thanks are due to many colleagues and friends, all of whom have kindly granted us permission to use material that we have co-written, co-constructed, and co-designed or previously published with them both as part of our overall narrative and argument and as part of specific case studies. In this sense the book is also fundamentally a compilation with many voices, and we offer therefore warmest thanks to Irina Dahlmann, Dave Evans, Loretta Fung, Daniel Hunt, Anne O’Keeffe, Dawn Knight, Mike McCarthy, Ron Martinez, Phoebe Ming-sum Lin, Pawel Szudarski and Catherine Smith. Thanks, too, to Peter Stockwell, Sarah Atkins and Dave Evans who have also generously allowed us to use video images of them. Special thanks are due to Dawn Knight who worked as a researcher with us on several ESRC-funded projects and who has been an irreplaceable source of advice, guidance and support to us during the course of these and other projects. In the writing of this book, we have also used material from previously published papers and chapters in books. In all cases the material has been rewritten, or recast, revised and updated for the purposes of this book. 
These are:
• ‘Discourse markers and spoken English: Native and learner use in pedagogic settings’, Applied Linguistics, 28 (3) (2007), 410–439 (Fung and Carter);
• ‘“This, that and the other”: Multi-word clusters in spoken English as visible patterns of interaction’, TEANGA 21, 30–52 (2004) (McCarthy and Carter);
• ‘Beyond the text: Construction and analysis of multi-modal linguistic corpora’, 2nd Annual International e-Social Science conference, http://www.ncess.ac.uk/research/sgp/headtalk/, June 2006, University of Manchester (multiauthored);
• ‘“HeadTalk”, “HandTalk” and the corpus: Towards a framework for multi-modal, multi-media corpus development’, Corpora (2009), 4 (1), 1–32 (Knight, Evans, Carter and Adolphs);
• ‘Listening to lectures: thinking smaller’, European Journal of Applied Linguistics and TEFL, 1 (1) (2012) (Carter, Martinez, Adolphs and Smith);
• ‘Linking the verbal and visual: New directions for Corpus Linguistics’, Language and Computers, 64, 275–291 (Carter and Adolphs);
• From Corpus to Classroom: Language Use and Language Teaching (CUP: Cambridge, 2007) (O’Keeffe, McCarthy and Carter);
• ‘Building a spoken corpus: what are the basics?’ In: O’Keeffe, A. and McCarthy, M. (eds) The Routledge Handbook of Corpus Linguistics. London: Routledge, 2010: 38–52 (Adolphs and Knight);
• ‘Pauses as an Indicator of Psycholinguistically Valid Multi-Word Expressions (MWUs)?’ In: Proceedings of Association of Computational Linguistics (ACL) 2007 workshop ‘A Broader Perspective on Multiword Expressions’: 49–56 (Dahlmann and Adolphs);
• ‘Beyond the Word: New challenges in analysing corpora of spoken English’, European Journal of English Studies, 11 (2) (2007): 133–146 (Carter and Adolphs);
• ‘Corpus Linguistics’ In: Simpson, J. (ed) The Routledge Handbook of Applied Linguistics (London: Routledge, 2010) (Adolphs and Lin);
• ‘Response tokens in British and Irish discourse: Corpus, context and variational pragmatics’ In: Barron, A. and Schneider, K. (eds) Variational Pragmatics. Amsterdam: John Benjamins (2008) (O’Keeffe and Adolphs);
• ‘Sound evidence: A multimodal corpus-based study into the notion of holistic processing of multiword units’ In: Barfield, A. and Gyllstad, H. (eds) Collocating in Another Language: Multiple Interpretations (Palgrave Macmillan, 2009) (Lin and Adolphs);
• ‘Multi-modal spoken corpus analysis and language description: the case of multi-word expressions’ In: Baker, P. (ed) Contemporary Approaches to Corpus Linguistics (London: Continuum Press, 2009) (Dahlmann and Adolphs);
• ‘Using a corpus to study spoken language’ In: Hunston, S. and Oakey, D. (eds) Doing Applied Linguistics: Key concepts and skills for postgraduate study (London: Routledge) (Adolphs).

Our research on monomodal corpora has been funded by Cambridge University Press, and we thank the Press for allowing us to use sample extracts from the CANCODE corpus (now part of the over one-billion word Cambridge English Corpus) with the following citation: “CANCODE means Cambridge and Nottingham Corpus of Discourse in English. The corpus consists of five million words of informal conversations recorded across the islands of Britain and Ireland. Cambridge University Press is the sole copyright holder.”

Where indicated, BNC data cited herein has been extracted from the British National Corpus Online service, managed by Oxford University Computing Services on behalf of the BNC Consortium. All rights in the texts cited are reserved. Some examples of usage taken from the British National Corpus were obtained under the terms of the BNC End User License. Copyright in the individual texts cited resides with the original IPR holders. For information and licensing conditions relating to the BNC, please see the website at http://www.natcorp.ox.ac.uk.

This book also draws on previously published research material produced in connection with three Economic and Social Research Council (ESRC) funded research projects. All previous publications and ESRC research reports have been re-accented, restructured, revised and updated, and new data has been added, where appropriate. Our research on multimodal corpora has been funded by ESRC (grant numbers RES-149–25–1067, RES-149–25–0035, RES-149–25–1016), EPSRC (grant number EP/C548191/1) and again partly in collaboration with Cambridge University Press.

Svenja Adolphs and Ronald Carter, Nottingham, August 2012

Introduction


. . . we speak with our vocal organs, but we converse with our whole body. (Abercrombie, 1963: 55) . . . the reflexivity of gesture, movement and setting is difficult to express in a transcript. (Saferstein, 2004: 213)

There are now a very large number of books and studies devoted to corpus linguistics and the written language. Databases for the study of written language run into millions of words. Yet there are still relatively few projects devoted to spoken corpus linguistics. In order to develop a more balanced view and to represent more fully the important part the spoken language plays, this book looks at some key approaches that are devoted to the insights and understandings offered by work in spoken corpus linguistics.

The first part of this book discusses some of the pioneering challenges faced in designing, building and utilizing insights from the analysis of spoken corpora, arguing that, even though writing is heavily privileged, the spoken language can reveal patterns of language use that are both different and distinctive, and that this has important implications for the way in which language is described, for the study of human communication and for the field of applied linguistics as a whole.

However, while the analysis of spoken corpora can provide important insights into language patterning and help establish linguistic profiles of particular social contexts, such analysis is often limited to the more monomodal and textual dimension of communication. Communication processes are multimodal in nature, and there is now a distinct need for the development of theories, analytical frameworks, and resources that enable the user to begin to carry out analyses of both the speech and gestures of the participants in a conversation; and to explore how the verbal and nonverbal complement one another. The second part of this book illustrates some starting points for this new direction in corpus linguistics, embracing the development of multimodal spoken corpora by integrating textual, prosodic and gestural representations. While spoken corpus linguistics is a necessary development of written corpus linguistics, multimodal spoken corpus linguistics is a necessary development of monomodal spoken corpus linguistics.

Spoken Corpus Linguistics is divided into two main parts. The first part sets the scene by presenting and discussing traditional and new approaches to monomodal spoken corpus analysis, with a focus on discourse organization and conversational interaction and with particular attention to forms of language such as discourse markers and multi-word units, areas of language not conventionally described but which are argued to be of importance to spoken language description and to spoken language learning and teaching. The second part of the book moves into the multimodal domain and focuses on alignments between language and gesture in a spoken corpus, with particular reference to gestural movements of the head and the hand and to the different ways in which prosody might be used to enhance communication. A brief final chapter discusses new developments in the area of spoken corpus research, including the relationship between language and context and emerging research methods as well as discussing possible shifts in scope and emphasis in spoken corpus research in the future.

There are currently few spoken corpus projects which embrace the painstaking annotation of spoken data, and there are even fewer that explore the multimodal dimension and attempt to align audio and video streams with the transcript. The benefits of going through this process are substantial not only in terms of discovering new patterns between the different modes but also in terms of adding to the description and identification of patterns that have been derived on the basis of textual analysis of the transcripts. Inevitably, the research reported at this stage in the development of multimodal spoken corpus research is incipient and is, when compared with the norms that apply to many corpus linguistic studies, based on case studies using limited data sets. However, in this book, opportunities are taken to describe new research processes and practices in some detail, and in this way the book makes a positive contribution to ongoing discussions and debates about corpus methods and the centrality of spoken language to that enterprise.

Part I

Monomodal Spoken Corpus Analysis



1 Making a Start: Building and Analyzing a Spoken Corpus



The aim of this chapter is to provide an overview of some of the main approaches to spoken corpus research and to explore the contributions that may be made by corpus analysis toward the study of spoken language. Among the topics discussed are the design of spoken corpora and how they differ from written corpus design; issues involved in the transcription and coding of spoken data; the role of metadata that accounts for the participants and contexts for the recorded speech; and particular questions of research ethics that intersect with the collection, coding, representation and storage of the data. By reviewing previous work in this way, the chapter also aims to lay a basis for discussions in the following chapters of the ways in which recent monomodal and multimodal spoken corpus research has extended and enriched these foundations.

What Is a Spoken Corpus?

Spoken corpora provide a unique resource for the exploration of naturally occurring discourse; and the growing interest in the development of spoken corpora is testament to the value they provide to a diverse number of research communities. Following the early developments of relatively small spoken corpora in the 1960s, such as the London-Lund corpus (Svartvik, 1990), the past two decades have seen major advances in the collection and development of spoken corpora, particularly but not exclusively in the English language. Some examples of spoken corpora are the Cambridge and Nottingham Corpus of Discourse in English (CANCODE) (McCarthy, 1998), a five-million word corpus collected mainly in Britain and Ireland; the Limerick Corpus of Irish English (LCIE) (Farr et al., 2004); the Hong Kong Corpus of Spoken English (HKCSE) (Cheng and Warren, 1999, 2000, 2002); the Michigan Corpus of Academic Spoken English (MICASE) (Simpson et al., 2000) and the spoken components of the British National Corpus (BNC) (www.natcorp.ox.ac.uk) and the COBUILD Bank of English (www.mycobuild.com/about-collinscorpus.aspx). In addition, there is a growing interest in the development of spoken corpora of international varieties of English and other languages: for example, the International Corpus of English (ICE corpus: ice-corpora.net/ice/index.htm), as well as of corpora of learner language aimed at the development and assessment of competencies on the part of learners of English as a second or foreign language (Bolton et al., 2003; De Cock et al., 1998; the English Profile Cambridge Learner Corpus www.englishprofile.org).

These corpora provide researchers with rich samples of spoken language-in-use, which form the basis of new and emerging descriptions of naturally occurring discourse. Research outputs based on the analysis of spoken corpora are wide ranging and include, for example, descriptions of lexis and grammar (Biber and Conrad, 1999; Carter and McCarthy, 2006), discourse particles (Aijmer, 2002), courtroom talk (Cotterill, 2004), media discourse (O’Keeffe, 2006), language teaching and learning (O’Keeffe et al., 2007) and health-care communication (Adolphs et al., 2004, 2007). This research covers phenomena at utterance level as well as at the level of discourse. A number of studies (e.g., McCarthy, 1998) start with the exploration of concordance outputs and frequency information as a point of entry into the data and carry out subsequent analyses at the level of discourse, while others start with a discourse analytical approach followed by subsequent analyses of concordance data.

Before a spoken corpus can be subjected to this kind of analysis, the data has to be collected, transcribed, and categorized in a way that allows the researcher to address specific research questions. This chapter deals with the basic steps that need to be taken when assembling a spoken corpus for research purposes. We will discuss the different considerations behind corpus design and data collection as well as associated issues of permission and ethics, transcription and representation of spoken discourse.
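To make the notions of frequency information and concordance output more concrete, the following is a minimal sketch in Python. It is not the software used in any of the studies cited; the function names, the simple tokenization rule and the sample utterance are all assumptions made purely for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(tokens):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


def concordance(tokens, node, width=3):
    """Return key-word-in-context lines: up to `width` tokens either side of `node`."""
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{node}] {right}".strip())
    return lines


# Invented sample utterances, not taken from any real corpus.
sample = "yeah I see what you mean yeah that's great oh I see"
tokens = tokenize(sample)
top_words = frequency_list(tokens)[:3]
kwic_lines = concordance(tokens, "see", width=2)
```

A researcher might start from `top_words` to decide which items merit closer inspection and then read the `kwic_lines` for a chosen item before moving on to analysis at the level of discourse.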



Issues of basic corpus design need to be addressed prior to any discussion of the content of the corpus, or of the methods used to organize the data. Often, design and construction principles are locally determined (Conrad, 2002: 77); however, some principles articulated in relation to corpus design by Sinclair (Sinclair, 2005) can be seen as general guidelines for both spoken and written corpora. Sinclair sets out the following guidelines:

1. The contents of a corpus should be selected without regard for the language they contain, but according to their communicative functions in the contexts and communities in which the data is collected.
2. The corpus should be as representative as possible of the target language.
3. While still maintaining coverage of the language as a whole, a corpus should aim to be homogeneous in all its different components.
4. Criteria for determining the structure of a corpus should be small in number, clearly separate from each other, and be efficient as a group in delineating a corpus that is representative of the language or variety under examination.
5. Any information about a text (for example, information about the age, gender, social class of speakers) should be stored separately from the plain text and only merged when required in applications.
6. Samples of language for a corpus should, wherever possible, consist of entire documents or transcriptions of complete speech events, or should get as close to this target as possible. This means that samples will differ substantially in size.
7. The design and composition of a corpus should be documented fully, together with information about the corpus contents as well as the reasons that support all the decisions taken.
8. The corpus builder should retain, as target notions, representativeness and balance. While these are not precisely definable and attainable goals, they must be used to guide the design of a corpus and the selection of its components.
9. Any control of subject matter in a corpus should be imposed by the use of external, and not internal, criteria.

(For similar prescriptions, see Biber et al., 1998; Reppen and Simpson, 2002: 93; Stuart, 2005: 185; Wynne, 2005.)

Some of these guidelines can be difficult to uphold due to the nature of language itself that has been described elsewhere by Sinclair as a ‘. . . population without limits’, even though ‘a corpus is necessarily finite at any one point’ (Sinclair, 2008: 30). The requirements for ensuring representativeness, balance and homogeneity in the design process are thus to some degree idealistic. They are also specific and relative to individual research aims, and thus have to be judged in relation to the different questions that are asked of the data. These requirements are, however, fundamental.

With regard to the guidelines above, there are a number of issues that pertain specifically to the construction of spoken corpora. These are best described in relation to the fundamental stages of construction:

• Recording
• Transcribing, Coding and Mark-up
• Management and Analysis

These issues are not, of course, separate or independent, and in many ways they interact and influence each other. For example, the stage of recording data is determined by the type of analysis that is planned, which in turn determines the granularity and detail of transcription, coding and markup. It is therefore important to plan the development of a corpus carefully and to consider all practical and ethical issues that may arise. While the planning phase is an important stage in the design process, the approach to corpus construction needs to be continually reassessed (see also Knight et al., 2006; Lapadat and Lindsay, 1999; Leech et al., 1995; Psathas and Anderson, 1990; Thompson, 2005, for further discussion of issues of design and corpus planning). Without such constant reassessment, there is a real danger that the corpus that emerges may not be homogeneous, balanced or representative. The use of a checklist or log to chart progress can be helpful and act as an invaluable point of reference for discussing anomalies or absences that may occur in the data. We will now discuss in greater detail the different stages in the construction of spoken corpora.

Recording

At the data recording stage, care needs to be taken that the recordings made are of a sufficient quality to be used and reused over a period of time and are sufficiently rich and varied in the language they capture to support subsequent analysis. This involves documenting information about the participants, the location, and the overall context in which the speech events occur. With authentic spoken communication, any loss or omission of data cannot be subsequently retrieved or repaired. It is of course not feasible to establish rules for data collection, not least because they inevitably depend on what is being investigated. Nonetheless it is important to guarantee that the corpus construction process is as systematic, principled and replicable as possible.

With regard to the equipment used for recording data, there are now a number of high-quality voice recorders available for recording spoken interaction. Most are now digital, and recordings can be easily transferred to a PC or other device. Video recordings of spoken interactions are becoming an increasingly important alternative to pure sound recordings, as the resulting data offer further scope for analysis. Video recording equipment is thus starting to offer a useful alternative to sound recorders, and, as we shall see in subsequent chapters of this book, the availability of very small and unobtrusive video recording equipment means that this procedure can now be used in a variety of different contexts (see Crabtree et al., 2006).

As part of the planning process for the recording stage, it is essential that decisions are made determining the design of the recording process (i.e., what kind of data needs to be recorded and how much), as well as the physical conditions (i.e., the when and where) under which the recordings take place, all with particular reference to the nature of the equipment that is being used.

How Long Does It Take?

Spoken corpus development is very expensive in both time and real costs. Thompson (2005) highlights that there is a need to decide between the ‘breadth’ and the ‘depth’ of what is to be recorded. The cost-benefit consideration assesses the relative advantages of capturing large amounts of data (in terms of time, number of encounters and different discourse contexts), the amount of detail added during the transcription and annotation phase, and the nature of analyses that might be generated using the recordings.

In terms of the number of hours of recording needed to achieve a particular word count, in previous corpus development projects, such as the CANCODE project, one hour of recorded casual conversation accounted for approximately 10,000 words of transcribed data. This is only a very broad estimate as the number of words recorded per hour depends on a range of different factors, including the clarity of the data, the rate of speech of the participants and the quality of the recording equipment.

How much is enough data? At the heart of this question is the variable under investigation and a key ‘factor that affects how many different encounters you may have to record is how frequently the variable you are interested in occurs in talk’ (Cameron, 2001: 28). Thus, the amount of data we need to record to analyze words or phrases that occur very frequently is less than we would need to study less frequent items. A study of minimal response tokens in discourse of the kind undertaken in chapter 3 (that is, tokens such as yeah, mmm, etc., which are very frequent in certain types of interaction) requires less data to be collected than the study of more lexicalized patterns which function as non-minimal response tokens and which may be less frequent, such as that’s great, oh, I see, or brilliant.
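The arithmetic behind these planning decisions can be sketched in a few lines of Python. The figure of 10,000 words per hour is the broad CANCODE-based estimate quoted above; the function names and the per-million frequencies used in the example are hypothetical and serve only to illustrate the reasoning.

```python
import math

# Broad estimate from the text: roughly 10,000 transcribed words per recorded hour.
WORDS_PER_HOUR = 10_000


def hours_needed(target_words, words_per_hour=WORDS_PER_HOUR):
    """Recording hours needed to reach a target word count (rounded up)."""
    return math.ceil(target_words / words_per_hour)


def words_needed(min_occurrences, freq_per_million):
    """Corpus size in words expected to yield `min_occurrences` of an item
    that occurs `freq_per_million` times per million words."""
    return math.ceil(min_occurrences * 1_000_000 / freq_per_million)


# A five-million-word corpus at the quoted rate.
hours_for_five_million = hours_needed(5_000_000)

# Hypothetical frequencies: a very frequent item (5,000 per million)
# versus a rarer multi-word pattern (50 per million), 100 examples of each.
words_for_frequent_item = words_needed(100, 5_000)
words_for_rare_item = words_needed(100, 50)
hours_for_rare_item = hours_needed(words_for_rare_item)
```

Under these assumed frequencies, 100 examples of the frequent item would be expected within about 20,000 words (two hours of recording), whereas the rarer pattern would call for about two million words (some 200 hours). The estimate ignores variation between speakers and contexts, so it is best read as a rough lower bound for planning.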



Metadata is critical to a corpus and helps to achieve the standards for representativeness, balance and homogeneity outlined by Sinclair above. Burnard (2005) uses the term ‘metadata’ as an umbrella term which includes editorial, analytic, descriptive and administrative categories:

• Editorial metadata: providing information about the relationship between corpus components and their original source.
• Analytic metadata: providing information about the way in which corpus components have been interpreted and analyzed.
• Descriptive metadata: providing classificatory information derived from internal or external properties of the corpus components.
• Administrative metadata: providing documentary information about the corpus itself, such as its title, its availability, its revision status etc.

Metadata is valuable when the corpus is shared and reused by different research communities and can be kept in a separate database or included as a ‘header’ at the start of each document (usually encoded through markup language). A separate database with this information makes it easier to compare different types of documents and has the distinct advantage that it can be further extended by other users of the same data. The documentation of the design rationale, as well as the various editorial processes that an individual text has been subjected to during the collection and archiving stages, allows other researchers to assess its suitability for their own research purposes, while at the same time enabling a critical evaluation of other studies that have drawn on this particular text to be carried out.
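One way of keeping metadata separate from the plain text, and merging the two only when an application requires it, is sketched below in Python. The record structure follows Burnard's four categories, but the field contents, the identifier `conv001` and the sample transcript are invented for illustration and do not reflect the format of any existing corpus.

```python
from dataclasses import dataclass, asdict


@dataclass
class TextMetadata:
    """One metadata record per corpus text, grouped by Burnard's categories."""
    text_id: str
    editorial: dict       # relationship to the original source
    analytic: dict        # how the text has been interpreted and analyzed
    descriptive: dict     # classificatory information (speakers, context)
    administrative: dict  # title, availability, revision status


# Plain transcripts and metadata live in separate stores, linked by text_id.
transcripts = {"conv001": "<S1> yeah <S2> that's great"}
metadata = {
    "conv001": TextMetadata(
        text_id="conv001",
        editorial={"source": "audio recording", "transcriber": "T01"},
        analytic={"annotation": "response tokens"},
        descriptive={"context": "casual conversation", "speakers": 2},
        administrative={"revision": 1, "availability": "restricted"},
    )
}


def merge(text_id):
    """Combine transcript and metadata on demand, leaving both stores unchanged."""
    record = asdict(metadata[text_id])
    record["text"] = transcripts[text_id]
    return record


merged = merge("conv001")
```

Because the merge produces a new record, other users can extend the metadata store with further fields without touching the transcripts themselves.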

Ethics: Issues of Consent and Anonymization Typical practice in addressing ethical issues on a professional or institutional level suggests that corpus developers should ensure that, in advance of undertaking recording, informed consent is documented and received from all participants. Typically, such consent describes how recordings are to take place, how data is presented and outlines the research purposes for which it is used. While a participant’s consent to record may be relatively easy to obtain, and commonly involves a signature on a consent form, it is important to ensure that this consent holds true at every stage of the corpus compilation process with participants returned to for further permission if the requirements of data use change. For example, consent to distribute recorded material is different from consent to record and should include reference to the way in which recorded data is accessed and by whom, as well as the channel by which it may be distributed. Participants should be informed that once the data is distributed, it can be difficult or impossible to deal with requests at a later stage for retraction of consent. The maintenance of anonymity in relation to the social identity of the participants is also a key factor. In the case of some corpus projects, the names of participants and third parties are often modified or completely omitted, along with any other details which may make the identity of referents obvious (see Du Bois et al., 1992, 1993). The quest for anonymity can also extend to the use of specific words or phrases used, as well as topics of discussion or particular opinions deemed as ‘sensitive’ or ‘in any way compromising to the subject’ (Wray et al., 1998: 10–11). Issues of anonymity are more easily addressed when constructing written text-based, monomodal corpora. If the data used is already in the public domain and freely available, no alterations to the texts included are usually required. 
Otherwise, permission needs to be obtained from the relevant authors or publishers of texts or the relevant copyright holders. Similar procedures are involved when constructing spoken corpora which are based solely on transcripts of the recorded events. It is also important to recognize that anonymization is not simply a matter of protecting individual speakers. Modifications in relation to names, places and other ‘identifiers’ may also need to be made at the transcription stage, or as a next step following initial transcription of data. As we shall see in chapters 2 and 3, anonymity is more problematic when it comes to audio or video recordings of conversations in corpora. Audio data is ‘raw’ as it captures the vocalizations of a person. Such very individual ‘fingerprints’ make it relatively easy to identify participants when audio files are replayed. Alteration of vocal output for the purpose of anonymization

Making a Start


can make for an inauthentic record and render the data unsuitable for naturalistic phonetic and prosodic analysis. A similar problem arises with the use of video data. Although it is possible to shadow, blur or pixellate video data in order to conceal the identity of speakers (see Newton et al., 2005 for a method for pixellating video), such procedures can be difficult to apply in practice (especially with large data sets). In addition, practices such as this may obscure key facial features of the individual, blurring distinctions between gestures and language forms, with the result that data sets may again become impractical for most forms of multimodal corpus analysis. Considering the difficulties involved in the anonymization of audiovisual data, it is important to discuss these issues fully with the participants prior to the recording, and to ensure that participants understand the nature of the recording and the format of distribution and access. And while those who agree for their day-to-day activities to be recorded for research purposes may not be concerned about anonymity, the issue of protecting the identity of third parties in the interactions remains an ethical challenge with such data, as does the issue of reusing and sharing contextually sensitive data recorded as part of multimodal corpora. Thus, challenges are raised that are central to the development of any corpus; namely, how can multisite, multiuser, multisource, multimedia data sets protect the rights of participants as an integral part of the way in which they are constructed and used? Reconciling the desire for the traceability and probity of corpus data with the need for confidentiality and data protection, while at the same time ensuring that the data is genuinely naturally occurring, requires serious consideration and should be addressed at the outset of any corpus development project.
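For transcript-only data, the modification of names, places and other identifiers described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of the procedure used in any particular corpus project: the function name `anonymize`, the placeholder format `[NAME1]` and the hand-compiled identifier list are all hypothetical.

```python
import re


def anonymize(transcript, identifiers):
    """Replace each listed identifier with a stable placeholder such as [NAME1].

    `identifiers` maps the text to conceal to a category label, for example
    {"Jane": "NAME"}. The list is assumed to be compiled by hand by the
    transcriber; nothing here detects identifiers automatically.
    """
    mapping = {}
    counters = {}
    result = transcript
    # Longer identifiers are replaced first so that a full name is handled
    # before any shorter name it contains.
    for text, kind in sorted(identifiers.items(), key=lambda kv: -len(kv[0])):
        counters[kind] = counters.get(kind, 0) + 1
        placeholder = f"[{kind}{counters[kind]}]"
        mapping[text] = placeholder
        pattern = r"\b" + re.escape(text) + r"\b"
        result = re.sub(pattern, placeholder, result)
    return result, mapping


anonymized, key = anonymize(
    "Jane met Benny in Leeds. Jane's friend came too.",
    {"Jane": "NAME", "Benny": "NAME", "Leeds": "PLACE"},
)
```

Because each identifier always maps to the same placeholder, references to the same person remain traceable within the transcript. The returned mapping would itself need to be stored securely or discarded, and none of this addresses the audio and video streams, where the difficulties discussed above remain.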



One of the paradoxes of spoken language capture is that the way in which the capture is represented is often by means of the written language. Recording unscripted, naturally occurring conversations is necessary if we are to explore the differences and distinctions between spoken and written language and is necessary both for what are the relatively new domains in language study of spoken grammar and lexis, and for the analysis of the construction of meaning in interaction (Cameron, 2001; Carter, 2004; Halliday, 2004; McCarthy, 1998). However, in the representation of spoken data, the recorded conversations have to undergo a transition from the spoken mode to the written mode before they can be included in a corpus. In transcribing spoken discourse, researchers need to make decisions concerning the amount of detail to be included in the written record. Since there are in spoken interaction so many layers of linguistic (and nonlinguistic) detail that carry meaning, this task can easily become a seemingly endlessly regressive



one with a potentially infinite amount of contextual information to record (Cook, 1990). One of the main reasons for this is that spoken interaction is essentially multimodal in nature, featuring an interplay between textual, prosodic, gestural and environmental elements in the construction of meaning, a main focus in the second part of this book. In order to determine what type of transcription is needed, it is therefore important to decide what the main purpose of the study is. It is generally advisable to identify the spoken features of interest at the outset and to tailor the focus of the transcription accordingly. For example, a study of discourse structure might require the transcription to include overlaps but not detailed prosodic information. Although transcription may appear on the surface to be a relatively straightforward or even mechanical process, it is, of course, one which requires constant decision making and is best viewed as being ‘both interpretative and constructive’ (Lapadat and Lindsay, 1999: 77; see also Cameron, 2001 and O’Connell and Kowal, 1999: 104). Transcribers cannot, however, be too idiosyncratic: there is a need to follow certain transcription guidelines in order to make transcripts reusable by the research community. This, in turn, allows both the size and quality of corpus data available for linguistic research to be enhanced, without individuals or teams of researchers expending large amounts of time and resources or having to start from scratch each time a spoken corpus or part of a spoken corpus is required. Inevitably, however, there are now a number of different types of transcription conventions available, including those adopted by the Network of European Reference Corpora (NERC), which were used for the spoken component of the COBUILD project (Sinclair, 1987).
Another set of guidelines for transcribing spoken data has been recommended by the Text Encoding Initiative (TEI) and has been applied, for example, to the British National Corpus (BNC) (see Sperberg-McQueen and Burnard, 1994). These guidelines include the representation of structural, contextual, prosodic, temporal and kinesic elements of spoken interactions and provide a useful resource for the transcription of the different levels of detail required to meet particular research goals. What to include in a transcription, and at what level of detail, varies, as the level of detail reflects the needs of the type of research that the transcription is intended to inform. One corpus that has been transcribed to a particularly advanced level of detail is the London-Lund corpus (Svartvik, 1990). Alongside the standard encoding of textual structure, speaker turns and overlaps, this corpus also includes prosodic information and has remained a valuable resource for a wide range of researchers over the years. What is vital is that the transcription system adopted is transparent and replicable so that other researchers can apply it consistently and, where appropriate, use the system for purposes of comparison and contrast with their own data. Key features that most users of spoken data will want to see marked up include who is speaking when and where and to whom; interruptions, overlaps, back channels, hesitations, pauses, and laughter as



they occur in the discourse; as well as some distinct pronunciation and prosodic variations.

Laying Out the Transcript

Once decisions have been taken as to the features that are to be transcribed, and the level of granularity and detail of the information to be included is formulated, the next step is to decide on an appropriate layout of the transcription. There are many different possibilities for laying out a transcript, but it is important to acknowledge that ‘there will always be something of a tension between validity and ease of reading’ (Graddol et al., 1994: 185). A linear representation of turns with varying degrees of detail in terms of overlapping speech, prosody, and extralinguistic information remains the most usable format. The following example is taken from a subcomponent of the Nottingham Multi-Modal Corpus (NMMC), a corpus described in more detail in the second part of this book. This corpus is fully aligned with audio and video streams, and while these particular features of the data will be explored in subsequent chapters, the basic transcription formats are outlined here:

In this figure, speakers are denoted by and tags, gender (M male; F female), false starts are framed by and , while interruptions are indicated by the presence of the + tag. See Appendix 1 for further information on transcription conventions. The linear representation of the transcript in Figure 1.1 structures the speech in the manner of a conventional drama script. Using a linear format

Figure 1.1 An example of transcribed speech, taken from the NMMC



Figure 1.2 A column-based transcript

of transcription makes it particularly difficult to show speaker overlap, and for this reason some researchers prefer to use different columns and thus separate transcripts according to who is speaking (see Thompson, 2005). Since speech is rarely ‘orderly’ in the sense that one speaker speaks at a time, linear transcription has been criticized as a misrepresentation of discourse structure (Graddol et al., 1994: 182). Such criticism is particularly pertinent if, for example, four or five speakers are present in a conversation or where there is a high level of simultaneous speech. Another paradigm is to arrange the data in columns. The use of column transcripts (Figure 1.2) allows for a better representation of overlapping speech, presenting contributions from each speaker on the same line rather than having one speaker turn positioned after the other: A final, alternative method of representing speech in a transcript can be seen in Figure 1.3, below (based on Dahlmann and Adolphs, 2007). Here the speech is presented as a musical score, with the talk of each speaker arranged on an individual line (or track) on the score or with speech arranged according to the time at which it occurred. As with Figure 1.2, overlapping contributions are indicated as text which is positioned at the same point along the score and across each individual speaker track. The contributions of multiple speakers can also be represented using this method of transcription which is based on a similar principle to that used in transcription and coding software, such as Anvil (Kipp, 2001) and DRS (French et al., 2006). Can the different features of ongoing talk be integrated into a more holistic representation? Ongoing advances in the representation and alignment of different data streams have started to provide possibilities for studying spoken discourse in an integrated framework including textual, prosodic and video data. 
The alignment of the different elements and the software needed to analyze such a multimodal resource are still in the early stages of


Figure 1.3 Line-aligned transcription in a musical score-type format

development, and at the present time it is probably beyond the scope of the majority of individual corpus projects to develop a searchable resource that includes the kind of dynamic representation that would address the need for a less linear transcription layout.
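Column-based and score-type layouts both depend on knowing when each contribution starts and ends. As a rough sketch of that underlying idea, the Python below finds simultaneous speech in a list of time-stamped utterances. The `Utterance` record and the `find_overlaps` function are hypothetical names chosen for illustration; they are not the data model of Anvil, DRS or the NMMC.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str
    start: float  # seconds from the beginning of the recording
    end: float
    text: str


def find_overlaps(utterances):
    """Return pairs of utterances by different speakers whose time spans intersect."""
    ordered = sorted(utterances, key=lambda u: u.start)
    overlaps = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.start >= first.end:
                break  # later utterances start even later, so none can overlap
            if second.speaker != first.speaker:
                overlaps.append((first, second))
    return overlaps


turns = [
    Utterance("A", 0.0, 2.0, "so she was like"),
    Utterance("B", 1.5, 2.5, "yeah"),
    Utterance("A", 3.0, 4.0, "and that was that"),
]
overlapping = find_overlaps(turns)
```

With time stamps of this kind, a linear, column or score layout becomes a choice about how to display the same record rather than a different transcription.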

Coding Spoken Data

Current corpora are manually or automatically marked up and tagged according to a large range of discourse features such as information on speakers (demographic), context (extralinguistic information), P-O-S (part of speech, a form of grammatical tagging), prosodic features (marking stress in spoken corpora), phonetic features (marking speech sounds), or a combination of these (for more information, see Leech, 2004; McEnery and Xiao, 2004; McEnery and Hardie, 2011). Annotated corpora are sometimes described as being tagged in so far as every single token in a given corpus has been assigned grammatical word-class labels. However, not all corpora are tagged, coded or marked up, as it is possible to have both annotated and unannotated corpora, although an annotated corpus can be used for a wider range of research purposes than an unannotated corpus (Knight and Adolphs, 2008). The coding stage refers to ‘the assignment of events to stipulated symbolic categories’ (Bird and Liberman, 2001: 26). This is the stage where qualitative records of events start to become quantifiable, as specific items that are relevant to the variables under consideration are marked up for future analyses (Scholfield, 1995: 46). The coding stage is essentially a development of the transcription stage, providing further detail to supplement the basic systems of annotation and mark-up that are applied through the use of transcription notation. The coding stage thus operates at a higher level of abstraction compared to the transcription stage, and may include, among other factors, annotation of grammatical, semantic, pragmatic or discoursal features. Such coding is a key part of the process of annotating language resources, and the process is often undertaken with the use of coding software. The majority of corpora include some type of annotation, as it allows corpora to be navigated in an automated way.
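The step from qualitative record to quantifiable data can be illustrated with a very small sketch. The records and category labels below are hypothetical and hand-assigned for the purpose of the example; they are not drawn from any published coding scheme or corpus.

```python
from collections import Counter

# Hypothetical coded records: each pairs a transcribed item with the
# symbolic category a coder has assigned to it.
coded_items = [
    ("yeah", "BACKCHANNEL"),
    ("erm", "HESITATION"),
    ("mm", "BACKCHANNEL"),
    ("er", "HESITATION"),
    ("yeah", "BACKCHANNEL"),
]


def tally_codes(records):
    """Count how often each category was assigned."""
    return Counter(code for _, code in records)


code_counts = tally_codes(coded_items)
```

Once items carry explicit codes of this kind, counts can be compared across speakers, contexts or corpora, which is what makes an annotated corpus usable for a wider range of research purposes.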



In this chapter so far we have outlined some of the issues that surround these three main stages of spoken corpus development and analysis. We have tried to emphasize that in order to be able to share resources across these diverse research communities, it is important that spoken corpora are developed in a way that enables reusability. We have also underlined that one of the best ways in which this reusability can be brought about is through the use of guidelines and frameworks for recording, representing and replaying spoken discourse. As advances in technology allow us to develop new kinds of spoken corpora, which include audiovisual data streams, as well as a much richer description of contextual variables, it will become increasingly important to agree on conventions for recording and representing this kind of data and the associated metadata. Similarly, advances in voice-to-text software may ease the burden of transcription, but will also rely heavily on the ability to follow clearly articulated conventions for coding and transcribing communicative events. Adherence to agreed conventions of this kind, especially when developing new kinds of multimodal and contextually enhanced spoken corpora, will significantly extend the scope of spoken corpus linguistics in the future.



One of the key contributions that monomodal spoken corpus analysis can make in relation to language description is to help us to identify features of spoken English which have been largely neglected and which remained underexplored due to a long tradition of using exclusively written data as the basis for descriptions of the English language. The analysis below explores some examples of the word like which are more typical of its usage in spoken discourse.

Word Frequencies

An analysis of word frequencies in different corpora can be a useful first step to get a general idea of the texts in the corpus. Adolphs (2006: 41) shows the 10 most frequent words in the written part of the BNC compared with the spoken CANCODE corpus. While there is a certain amount of overlap between the two corpora in terms of the most frequent items, there are a number of differences which highlight the fact that one corpus contains written language samples while the other consists solely of spoken language. In particular, the personal pronouns ‘I’ and ‘you’ and the signal of active listenership ‘yeah’, which feature among the most frequent items in the CANCODE corpus but are absent from the written data, indicate key differences in mode. The word like is the 23rd most frequent item in the CANCODE corpus and occurs with an overall frequency of 31,743 instances. It is five times



more frequent in spoken than in written English. The high frequency of occurrence in the spoken language suggests that like carries a number of additional functions in speech compared with its uses in writing. In order to fully illustrate the meaning of items such as this, it is important to go beyond the sentence and the individual speaking turn which still remain the main units in descriptions of lexis and grammar in reference material about the English language. When we study the word like in the CANCODE corpus, it is important, therefore, to consider stretches of discourse which capture the nature of the interaction across the boundaries of interpersonal speaking turns. In addition to the traditional grammatical roles of like as a preposition, conjunction, common verb and suffix, an analysis of spoken corpus data illustrates that it often fulfills the function of a discourse marker. When like is used as a discourse marker, it often functions to mark direct speech in speech reporting episodes. In addition, like is often used to suggest points of comparison or exemplification even if those comparisons and examples are not actually drawn upon. These two functions are examined in more detail below.
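Frequency comparisons of the kind described at the start of this section rest on two simple operations: building a ranked word list and normalizing raw counts so that corpora of different sizes can be compared. The sketch below shows one way of doing both in Python. The tokenizer is deliberately crude, the function names are illustrative, and the five-million-word corpus size used in the final line is an assumption made for the sake of the arithmetic rather than a figure taken from this chapter.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lower-case word tokens, keeping forms such as don't."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)?", text.lower())


def frequency_list(text, top=10):
    """Return the `top` most frequent tokens with their raw counts."""
    return Counter(tokenize(text)).most_common(top)


def per_million(count, corpus_size):
    """Normalize a raw count to occurrences per million words."""
    return count / corpus_size * 1_000_000


ranked = frequency_list("yeah yeah yeah you you I", top=2)

# 31,743 instances of like, in a corpus ASSUMED here to hold five million words.
like_rate = round(per_million(31_743, 5_000_000), 1)
```

Normalized rates, rather than raw counts, are what allow a statement such as "five times more frequent in spoken than in written English" to be made across corpora of very different sizes.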

Speech Reporting

One of the more frequent uses of like in spoken English is to mark direct speech. This is a relatively recent phenomenon, but it is extensive, the corpus reveals, in the speech of younger speakers. The use of like in this way is not quite (yet) an example of spoken standard English. Like stands in the place of ‘said that plus quoted speech’. As such, it often introduces speech reports. In his study of CANCODE data, McCarthy (1998: 161) finds that ‘[. . .] in the narrative texts in the CANCODE corpus, speech reports are mainly conveyed in direct speech, and with reporting verbs in past simple (said, told) or historical present (says) tenses’. Representing the speech as a direct quotation adds to the authenticity of a narrative. The extracts below, all drawn from the CANCODE corpus, illustrate the use of like in place of reporting verbs, and this adds to the quotative ‘vividness’ and ‘real-time staging’ of the discourse. Extract 1 is drawn from a conversation between three female friends.

[A group of three women in their twenties are discussing previous events. The conversation centers on an inflatable ‘blow-up’ chair.]

Extract 1

I was having this hideous party last weekend and there was a blow-up chair so I sat in it for a bit. I was feeling really antisocial and just really wanted to go home. And Jane and Benny had made me come cos it’s this Denise and oh er she’s a hairdresser and she had a lot of hairdressery friends. All dressed really smartly and standing round not saying anything.



Jane is? No Jane’s friend Denise. Oh right. So Jane made me come because she she she’d agreed to go and so she was like “I don’t want to go there and there are all these hairdressers and me and Benny.” [laughter] And that was that. It was really shit and I wish I hadn’t agreed to do it. Sat in the chair. After five minutes I was like “Yeah. Party” and singing. Making suggestions [laughter] Suddenly became the life and soul after sitting there. Barmy. [laughter] An anti anti social chair. Maybe I’ll get one. [laughs] They’re just so ugly. They are hideous.

(CANCODE data)

The word like is used here to report the speech of other people, as well as that of the speakers themselves. The goal of the conversation is to entertain the other speakers and to keep the conversation flowing. There is, as has been noted, almost a display or performance element to what is quoted by means of like, so that like, certainly much more than a plain ‘said’, serves to dramatically highlight what follows and to set the stage for a speech report which is marked by its quotability and especially by its intensity and by the very prosodic contours which are reproduced. Other elements that add to this goal and to the vividness of the conversation are the use of strong evaluative statements (‘It was really shit’, ‘they are hideous’) and the embedding of creativity in the narrative (‘An anti anti social chair’). The latter is achieved through the use and part repetition of ‘anti social’ in relation to an inanimate object.

Comparison and Exemplification

The next extract from the CANCODE corpus illustrates another function of the word like which we commonly find in the spoken corpus data. The item like here introduces a new element into the discourse flow which prompts further elaboration. Its function is to compare the situation that is being discussed with a similar situation. This, in turn, extends the discussion in a particular direction.



[The conversation below takes place between two female cleaners in a university hall of residence.] Extract 2 Ah it’s a bit dicey cos there’s a lot of people going round at the minute. You know what with Adam checking the lights and+ Yeah. + checking curtains. Mind you I had to laugh cos I went over there and Lynsey and bloody Agnes are sitting in the bloody porter’s lodge talking. I thought You’ve got a good sodding job. [unintelligible] find you summat to do. Oh like what Alice’s doing? Well she’s wip = she’s re-cleaning what she cleaned the other week+ [unintelligible] +from what I saw of her yesterday. I can’t see the point can you. No. It’s so boring. (CANCODE data) In the next extract like is used in a similar way and is again preceded by ‘oh’ and refers back to the information presented in the preceding turn, marking the collaborative nature of the discourse at hand. In this conversation two members of a family are chatting. The first use of like in this extract functions to express the similarity between two objects in order to establish a closer description of a particular point of reference. This is done collaboratively with speaker extending and refining the description advanced by speaker . The second use of like again expresses similarity and suggests a point of comparison, although this time the point of reference is contained within the speaker’s utterance and does not extend the previous discourse. Extract 3 [A couple in their early twenties female; male] This was just There was a nice little c = little like a little house cafe erm of home-made things and we had to have a lunch there you know. Just er They were doing these toasties you know where you put it in a toaster and put it down and put a filling in the toaster. Erm Oh like waffle type things. Yes. And we had our lunch there and you could walk through this cafe and then into this big mill. And erm I got two two packs. One was three hundred grams and it was two ninety nine and the other



one was three hundred grams and it was one ninety nine but it wasn’t wool it was silky. It was a silky thing. Erm I I’ve Oh Chrissy got some stuff. It was like ribbon that she got. Oh yes this is the up and coming thing this ribbon.

(CANCODE data)

One of the reasons why the word like cannot easily be described with reference to single speaking turns or isolated utterances is that there are a number of features in the extended stretch of conversation which are important markers of spoken language and which reinforce the function of like in this context. For example, like co-occurs here with a number of markers of vague language and other discourse markers, including just, home-made things, you know, waffle type things and some stuff. Vague language (see Adolphs et al., 2007; Channell, 1994) softens expressions so that they do not appear too direct or unduly authoritative and assertive. When we interact with others, there are times when it is necessary to give accurate and precise information; in many informal contexts, however, speakers prefer to convey information which is softened in some way or which is purposefully imprecise, although such vagueness is often wrongly taken as a sign of careless thinking or sloppy expression. Table 1.1 shows the 15 most frequent collocates of the word like in the CANCODE corpus. Using the C-Score, a measure to calculate the level of attraction between the word like and other words in the corpus, the output thus generated illustrates a strong patterning with the kinds of vague language and discourse markers discussed above (e.g., things, just, stuff, anything, something, (you) know). Deictic reference markers which underline the shared context and shared knowledge of the participants also abound in extract 3 above (e.g., these toasties, this cafe, this ribbon). It is interesting to note that like shares the same communicative territory as vague language markers, discourse markers and other reference markers.
The corpus also reveals that in terms of social interaction like has a particular provenance in more informal encounters. In CANCODE there is a significantly lower count of uses of like as a discourse marker in more formal contexts.

Table 1.1 Words and C-scores
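The first step in any collocation analysis is to collect the words that occur near the node word. The sketch below does this with plain co-occurrence counts inside a fixed window. It is not the C-Score used for Table 1.1, whose formula is not given in this chapter; the function name and the four-word default window are assumptions chosen for illustration.

```python
from collections import Counter


def collocates(tokens, node, window=4):
    """Count the words occurring within `window` tokens either side of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        left = max(0, i - window)
        # Words before the node plus words after it, excluding the node itself.
        span = tokens[left:i] + tokens[i + 1:i + 1 + window]
        counts.update(span)
    return counts


sample = "oh like waffle type things it was like ribbon that she got".split()
near_like = collocates(sample, "like", window=2)
```

Raw counts of this kind favour words that are frequent everywhere, which is why association measures are normally applied on top of them to estimate how strongly two words attract each other.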



The analysis in this chapter has illustrated some of the main issues in the use of a corpus to study spoken language. Two main functions of the word like have also been discussed, both of which tend to appear more frequently in spoken discourse than in written discourse. A corpus of spoken English can provide evidence of frequency distribution, as well as discourse level features that co-occur with individual words that are being investigated. Spoken corpora that are carefully categorized in terms of different contextual configurations can also provide evidence for word distribution across different contexts. The specific uses of like discussed above are associated mainly with casual conversation between people who know each other well. The discussion of like, together with a review of the differences and distinctions between spoken and written language and the different processes and practices in the assembly of spoken and written corpora, lays a foundation for exploration in subsequent chapters of this book.

Note: All CANCODE data in this chapter is © Cambridge University Press.


Corpus and Spoken Interaction Multi-Word Units in Spoken English



Throughout the development of corpus linguistics there has been a noticeable focus on analyzing written language and, with some written corpora now exceeding the one-billion word mark, the possibilities for generating new insights into the way in which language is structured and used are both exciting and unprecedented. Spoken corpora, on the other hand, tend to be much smaller in size and are thus often unable to offer the same level of recurrence of individual items and phrases when compared to their written counterparts. In addition, the analysis of spoken discourse as recorded in spoken corpora requires specific attention not just to single words but also to the patterns made by words in larger units. It is to the construction, design, development and corpus analysis of such features that we now turn. Within corpus linguistics, the single word has served the analysis of vocabulary well and will continue to do so, not least in such fields as lexicography. It also remains very much the popular conception of how vocabulary is constituted and learned. And just as vocabulary is represented for many people as single words, so too is it often the unusual or exceptional words that commonly attract attention. This chapter focuses, however, on very common everyday words and on the patterns that go beyond the confines of a single word.1 Units consisting of more than one word, such as phrasal verbs, compounds and idioms, have become the subject of increasing attention, and the last decade has seen even more attention to formulaic vocabulary patterns or ‘multi-word units’, which, we shall argue, are also key components in spoken interaction and in corpus linguistic analysis of such patterns.

Collocational Patterns

The most important of these developments can be seen in the Neo-Firthian approach to word meaning. Firth (1935) famously proposed that the meaning of a word was as much a matter of how the word combined in context



with other words (i.e., its collocations) as any inherent properties of meaning it possessed of itself: lean is part of the meaning of meat, and vice versa, through their high probability of co-occurrence in texts (Firth, 1957). Collocations are the probabilistic outcomes of these repeated combinations. We normally talk of an amicable divorce rather than a friendly divorce; tea is described as weak but an excuse is feeble not weak (whereas tea is not normally described as feeble). In this sense words are known by the close associations they have with other words. Key discussions of the implications of Firth’s theory of collocation appear in Halliday (1966) and Sinclair (1966). Some 40 years ago, both Halliday and Sinclair foresaw the development of computational analysis of texts as a way of getting at the common collocations of a language, and both, in different ways, have fulfilled that vision, especially Sinclair (1991, 2004) (see also Hoey, 2005; Hoey et al., 2011; and Stubbs, 2001). The advent of corpus linguistics has enabled linguists to verify these earlier, mainly intuition-based notions in actual, attested language use on a large scale. The notion of collocation alters the emphasis from the single word to pairs of words as integrated chunks of meaning, and the term ‘collocation’ is now an accepted element in language description, particularly so in corpus-based descriptions of language. Studies of large corpora by linguists such as Sinclair (1991, 2004) have shown that vocabulary is a far more powerful influence in the basic organization of language and of meaning than was ever previously conceived. Corpora reveal the regular, patterned preferences for modes of expression of language users in given contexts, and show how large numbers of users separated in time and space repeatedly orient towards the same language patterns when involved in comparable social activities.
However, corpora also reveal that much of our lexical output consists of multi-word units; language occurs in formulaic patterns much more commonly than a description of language that looks at vocabulary and grammar as separate entities can account for.

Formulaic Patterns of Language

Although a considerable amount of work has been done on formulaic language in the last decade, research has tended to be widely distributed across a number of fields (child L1 acquisition, psychology, corpus linguistics). This diversity is illustrated by the wide variety of terminology (see Carter, 1998: chapter 4; Schmitt, 2010; Schmitt and Carter, 2004; Wray, 2002: 9) that is found for the various sorts of formulaic language:

chunks
collocations
conventionalized forms
clusters
formulaic speech
formulas
holophrases
lexical bundles
multi-word units
prefabricated routines
ready-made utterances



Schmitt and Carter (2004: 3) illustrate the extent of terminological differences: . . . formulaic sequences can be long (You can lead a horse to water, but you can’t make him drink) or short (Oh no!), or anything in between. They are commonly used for different purposes. They can be used to express a message or idea (The early bird gets the worm = do not procrastinate), functions ([I’m] just looking [thanks] = declining an offer of assistance from a shopkeeper), social solidarity (I know what you mean = agreeing with an interlocutor), and to transact specific information in a precise and understandable way (Wind 28 at 7 = in aviation language this formula is used to state that the wind is 7 knots per hour from 280 degrees). They realize many other purposes as well, as formulaic sequences can be used for most things society requires of communication through language. These sequences can be totally fixed (Ladies and Gentlemen) or have a number of ‘slots’ which can be filled with appropriate words or strings of words, for example, [someone/thing, usually with authority] made it plain that [something as yet unrealized was intended or desired]. What descriptions of these various forms reveal is that fixedness is a key feature of formulaic language, allowing such language to be memorized and used as wholes, rather than being newly created each time it is used. On first inspection, all formulaic language looks as if it is completely fixed, but this is not necessarily the case. Of course, idioms are usually cited as examples. Thus, corpus evidence shows that to have forty winks occurs almost exclusively as that exact phrase, and not as variations such as * to have thirty-nine winks, *to have a short wink. In other words, if we want to use an idiom to express the notion ‘to have a short sleep’, we can only use the intact idiom to have forty winks and not some variation. 
However, much formulaic language is not quite so fixed in this way, and in fact allows for a surprising amount of flexibility (see also Biber et al., 1999; Carter, 2006; De Cock, 2000).

(British vs. American English)
not touch someone/something with a bargepole
not touch someone/something with a ten-foot pole

(varying a lexical component)
burn your boats
burn your bridges

(verb variation)
cost an arm and a leg
pay an arm and a leg
spend an arm and a leg
charge an arm and a leg

(truncation)
every cloud has a silver lining
silver lining

break the ice
ice-breaker
ice-breaking



In fact, it also seems that once a piece of formulaic language becomes well-known in a speech community, it can be creatively adapted and still be comprehensible (see Carter, 2004: chapters 3 and 4 and section 3 below for further discussion). Although in several studies the term ‘cluster’ captures the way words combine and coalesce, the terms chunk, formulaic language and multi-word unit are used relatively interchangeably in this chapter, with the term multi-word unit the most salient. The main examples here will be drawn from spoken English, not least because issues of fluent production and reception are foregrounded in talk. As with most high-frequency phenomena, their core contribution to language use is subliminal and not immediately accessible to the intuition of the native speaker or fluent user. This is once again where a spoken corpus comes in.

Spoken Corpus Data

Using a representative, almost five-million word sample of North American English conversation from the over one-billion word Cambridge English Corpus (CEC), research by O’Keeffe, McCarthy and Carter (2007; see Table 2.1) underlines that certain multi-word units occur with greater frequency than some common single words.

Table 2.1 High-frequency chunks and single words (from O’Keeffe et al., 2007)

you know
I think
kind of
and then
I don’t know
something like that
I don’t know if
a lot of people

The table suggests that many high-frequency chunks (I don’t know; something like that) are more frequent and more central to communication than even very frequent lexical words (friend) or very frequent function words such as their, where or under. Several patterns isolated in this way are, of course, syntactic fragments, and their occurrence is probably due to the regularity of the content world itself; for example, a fragment such as and then generates a common temporal sequence. For more extended studies of such patterning, see O’Keeffe et al. (2007: chapters 2 and 3). However, it is in the domain of pragmatics rather than in those of syntax or semantics that we are likely to find the reasons why many of these units are so frequent. Pragmatic categories refer to the creation of speaker meanings in context (of the kind that we began to highlight in chapter 1 with respect to the word like); these include such functions as discourse marking and the expression of politeness, hedging, and purposive vagueness, which create a world of speakers and listeners interacting in real time rather than a purely propositional world, where the main emphasis is on the content of what is said. Some of the most frequent multi-word units are discourse markers, for example, You know, I mean, I guess, (Do) You know what I mean. You know, the most frequent unit, is an important token of projected shared knowledge between speaker and listener. I mean is also commonly used when speakers need to paraphrase or elaborate. Speakers also use indirect forms to perform speech acts such as directives (e.g., commands, requests, suggestions, etc.) to protect the face of their addressees, and the multi-word units reveal common frames for such acts. Indirectness is also important in the polite and non-face-threatening negotiation of attitude and stance. Units in this category include Do you think, I don’t know if, What do you think, I was going to say.
For example:

I don’t know if you’ve seen my magazine article yet, but I was going to say “can I have an appointment with you tomorrow”? (NMMC corpus)

Some of the most frequent multi-word units also have a related hedging function, that is, they modify propositions to make them less assertive and more negotiable. These include I think, kind of, I don’t know, I don’t think, a little bit. Also relevant here is work on listener language and the uses of feedback or response tokens by listeners (items such as yeah; I’ve got you; really; exactly; oh, I see; right; that’s interesting, etc.) to indicate their involvement and engagement with the ongoing discourse (see McCarthy and Carter, 1994, 2006: 188ff; and O’Keeffe et al., 2007: 140–58, for a range of examples). This latter range of ‘listener’ multi-word units is a main topic of discussion in chapter 3 and in subsequent parts of this book such as in chapter 6. Such patterns of language show how extensive ordinary interactive meaning making is in everyday conversation and simply underline the considerable degree to which speakers engage continuously with one another in an interpersonal dimension. The addition of these ‘words’ to the lexicon of any language should not be seen as an optional extra, since the meanings they create are extremely frequent, are necessary in discourse and are fundamental to successful interaction. The present chapter attempts to shift the balance away from the more semantically opaque multi-word expressions such as idioms and seeks to tease out some of the most common sequences in everyday talk. As with most high-frequency phenomena, their recurrence is typically subliminal and not immediately accessible to the intuition of the native speaker. This chapter therefore allows the first steps in the process of examining recurrent everyday multi-word strings to be effected automatically, by a computer count of recurring characters and spaces. This has both advantages and disadvantages, as the next section will show.



The case study in this chapter uses as its data source the 10-million word spoken English component of the British National Corpus (BNC). The analytical software used for the present chapter (WordSmith Tools 5.0, Scott, 2008) is capable of automatically retrieving words and counting how frequently they occur. The user sets the number of words for the recurrent strings (e.g., two-word clusters, three-word clusters) and any cut-off points for frequency (e.g., minimum 10/50/100 occurrences). This necessarily means that the software will retrieve strings which in many cases lack any syntactic or semantic integrity, as well as strings that display integrity of one or both kinds. Computers in their present state cannot distinguish between strings which recur but which have no psychological status as units of meaning (e.g., the fragmentary string to me and occurs more than 100 times in the BNC corpus) and those units which have a clearer meaning and function, even though they may be less frequent (for example, the discourse marker phrase as far as I know occurs with less than half the frequency of to me and). This difficulty has led some researchers to incorporate fragmentary strings (e.g., Altenberg, 1998; De Cock, 2000) into their definition of multi-word units. We wish to focus here on those items in the automatically extracted strings which display pragmatic integrity regardless of their syntax or lack of semantic wholeness, a task which necessitates a more qualitative scrutiny of the examples. The procedure for extracting the recurrent strings was to generate rank-order frequency lists of two-, three-, four-, five- and six-word sequences for the entire 10-million word corpus. For practical reasons, a frequency cut-off point had to be established, and for the present purposes, an occurrence of at least four times per million words was the criterion for inclusion (in other words 40 times in the 10-million word corpus).
This compares with Biber et al.’s (1999) figure of 10 times per million and Cortes’s (2002) figure of 20 per million. Our figure is more expansive on account of the low occurrence of six-word clusters (only 43 being generated at the necessary 40 or more occurrences in 10 million words). Six-word recurrent clusters are of very low frequency in the BNC, and it does seem that six is a practical cut-off point beyond which recurrent clusters are extremely rare. The lists for the smaller combinations were, predictably, much longer. Figure 2.1 shows the comparative distribution of two-, three-, four-, five- and six-word clusters in excess of 40 occurrences, and it can be seen that there is a very sharp fall-off between three- and four-word clusters, and an even sharper drop between four- and five-word clusters. It should be noted that, in these counts, contracted forms such as it’s and don’t are considered as one ‘word’, since the computer is counting characters and spaces only.

Figure 2.1 Distribution of clusters in excess of 40 occurrences (BNC)
[Bar chart by cluster length, starting from two-word clusters; the only bar values legible in the source are 36,877 and 24,691.]
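The extraction procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the WordSmith Tools implementation: the function names (`min_count_for`, `tokenize`, `recurrent_clusters`) and the toy input are hypothetical, and tokens are split on whitespace only, so that contracted forms such as don’t count as one ‘word’, as in the counts reported here.

```python
import math
from collections import Counter


def min_count_for(per_million, corpus_size):
    """Convert a per-million-words threshold into a raw occurrence count."""
    return math.ceil(per_million * corpus_size / 1_000_000)


def tokenize(text):
    """Split on whitespace only, so contractions such as "don't" stay one 'word'."""
    return text.lower().split()


def recurrent_clusters(tokens, n, min_count):
    """Rank-order list of n-word clusters occurring at least min_count times."""
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return [(" ".join(gram), freq)
            for gram, freq in counts.most_common()
            if freq >= min_count]


# Hypothetical toy input, not corpus data.
toy = tokenize("I don't know if you know what I mean you know")
print(min_count_for(4, 10_000_000))   # raw cut-off for a 10-million-word corpus
print(recurrent_clusters(toy, 2, 2))  # two-word clusters occurring at least twice
```

A sketch of this kind retrieves every recurring string, fragmentary or not; deciding which strings have pragmatic integrity still requires the qualitative scrutiny discussed in this chapter.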



Tables 2.2–2.6 show the top items in each cluster list for two- to six-word clusters. The tables exclude repetitions such as you, you, you, which often occur as stuttered starts (although we recognize that these may indeed have importance in some kinds of analysis) and nonlexical phenomena such as hesitation markers (e.g., er, erm). The lists were then used as the basis for analysis and interpretation, first in terms of identifying integrated units, and then in terms of what such units reveal about conversational interaction.

Table 2.2 Top 20 two-word clusters (BNC)
[table body missing]

Table 2.3 Top 20 three-word clusters (BNC)
[table body missing]

Table 2.4 Top 20 four-word clusters (BNC) (frequencies by rank)

Rank  Frequency
1     1,163
2     1,103
3     1,031
4     868
5     667
6     628
7     625
8     601
9     600
10    589
11    572
12    562
13    550
14    545
15    528
16    507
17    501
18    475
19    474
20    471

Table 2.5 Top 20 five-word clusters (BNC) (frequencies by rank)

Rank  Frequency
1     710
2     489
3     346
4     231
5     174
6     167
7     166
8     158
9     149
10    149
11    138
12    138
13    134
14    129
15    127
16    125
17    122
18    119
19    119
20    107

Table 2.6 Top six-word clusters (BNC)
[table body missing]

Clusters and Single Words

It is useful to gain a perspective on how the high-frequency clusters relate to the distribution of single words in the corpus. An exhaustive count is beyond the scope of this case study, but some indicative examples are offered to assist the overall understanding of the place of such clusters in a corpus-based description of the spoken lexicon. Fifty-one single-word items in the BNC occur more frequently than the most frequent cluster ‘of the’ which occurs 35,933 times. Only 33 items in the single-word rank-order frequency list from BNC occur more frequently than the most frequent cluster (that is, more frequently than the number 3 item you know, which occurs 30,703 times). Clearly then, you know, a cluster or multi-word unit, is one of the most frequent items in the English lexicon. The tables suggest that word lists which focus only on single words risk losing sight of the fact that many high-frequency clusters are more frequent and central to communication than even very frequent words. However, the question remains whether the clusters in the tables and figures above should be considered as units of any kind or simply as statistical phenomena reflecting inevitable recurrence of a finite number of words in the vocabulary. To what extent do such co-occurrences reveal anything about how everyday spoken interaction takes place?
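The comparison between clusters and single words made in this section amounts to counting how many single-word types are more frequent than a given cluster. The following is a minimal sketch of that count, assuming a tokenized corpus held in memory; the function names and the toy token list are hypothetical and the figures it produces are not corpus data.

```python
from collections import Counter


def cluster_frequency(tokens, cluster):
    """Count occurrences of a multi-word cluster in a token list."""
    words = cluster.split()
    n = len(words)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words)


def words_outranking(tokens, cluster):
    """Number of single-word types that are more frequent than the cluster."""
    freq = cluster_frequency(tokens, cluster)
    return sum(1 for count in Counter(tokens).values() if count > freq)


# Hypothetical toy input, not corpus data.
toy = "you know the the the i think you know of".split()
print(cluster_frequency(toy, "you know"), words_outranking(toy, "you know"))
```

Adding one to the second value gives the position the cluster would occupy if it were entered in the single-word rank-order list.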



Many of the recurrent clusters present in the tables and graphs above are syntactic fragments. This means that they are not complete syntactic elements at phrasal or clausal levels. These include in the, on the, at the, to do, do you in the two-word list in Table 2.2, one of the and I think it’s in the three-word list in Table 2.3, the end of the and I think it was in the four-word list in Table 2.4. Conventional grammars would certainly label these as incomplete in terms of their syntactic structure. That is not to say that all models of grammar would reject such phenomena: emergent grammar, as epitomized in the work of Hopper (1998), considers fragments to be important clues as to how interaction unfolds and how meaning emerges rather than being predetermined in linguistic units. And there is no obvious reason why one should exclude syntactically fragmentary strings from consideration when evaluating their interactive role. For instance, I think it’s (number 15 in Table 2.3) shows the frequency of I think as a hedge prefacing evaluations of situations likely to be referred to by pro-form it. I think is number 4 in the two-word list (Table 2.2), occurring more than 25,000 times. Other multi-word units seem less pragmatically motivated (e.g., it was, what do you, in the middle of the) and their occurrence is probably due to the regularity and stability of the content world itself. We would argue, then, that it is in pragmatic categories rather than syntactic or semantic ones that we are likely to find the reasons why many of the strings of words occur so frequently. By pragmatic categories here, as we have already begun to elaborate, we mean those which embrace the creation in context of speaker meanings as part of conversational interaction. Such categories include discourse marking, the preservation of face and the expression of politeness, the acts of hedging and deliberate, purposeful vagueness, all of which create the speaker-listener world rather than the content or propositional world.

Discourse Markers

Some of the most frequent clusters have discourse-marking functions. These include:

You know
I mean
And then
But I mean
What I want you to do
Do you know what I mean
At the end of the day

You know, as the most frequent cluster of all, is an important token of projected shared knowledge between speaker and listener, as well as being a topic launcher (Erman, 1987; Östman, 1981) and is pervasive in everyday informal talk, as the following extract shows:

Well one of the things we did do immediately after the er youth consultation is that we erm, we erected a paid for a skate board ramp at erm one of the youth centers in, in Harlow, so you know, we we creating that facility, but erm, what, you, so you your question was more directed at providing more of those kind of facilities. Well, providing, mainly because you see a lot of the youngsters with their skate boards, but they’re skating through the town, all over the place. (BNC File D96)



The extended multi-word clusters (do) you know what I mean have a similar function of checking shared knowledge. Separately, I mean is used when shared knowledge is not inferred or when the speaker needs to reformulate (Erman, 1987):

Well this, this was a letter of the, granted it was 89, but I mean this paragraph just said, you know, if we, does say that, you know, if we established a, a demand, and I don’t see why we shouldn’t have a bash at. Well then I have the inspector. I mean, if we have a petition like they do on the counter in Welfare Rights in the Town Hall, I don’t think there be any opposition to that. . . (BNC file D95)

The overlap of components within (do) you know (what) (I mean) partly accounts for the extremely high frequency of you know and I mean, but above all it is their core function in the monitoring of the state of shared knowledge which gives them the pragmatic integrity which qualifies them for consideration as multi-word units. Likewise, and then is extremely frequent in narrative as a marker of temporal sequence, while at the end of the day typically has a summarizing function.

Protecting Face

Speakers use indirect forms to perform speech acts such as directives and requests in order to protect the face of their receivers, and the multi-word units indicate the kinds of everyday frames that enable us to do this. Indirectness is also important in the polite and non-face-threatening expression of attitude, opinion and stance. Speakers work hard to protect the face of their interlocutors, most commonly by not wishing to place too great a set of demands or impositions on them (see Brown and Levinson, 1987). Multi-word units from the above tables include the following:

I think that (Table 2.3)
I don’t know if/whether (Table 2.4)
It seems to me that (Table 2.5)
What do you think (Table 2.4)
I was going to say (Table 2.5)
Do you think (Table 2.3)

The following examples illustrate such pragmatic functions:

How long have we got to? Erm about ten minutes? Piece of paper. Yes, but that wouldn’t tell us, would it? Would it tell us? Would it. What questions do you think we need to know? And you want to know whether it would be more expensive. Mhm. Yes ah. (BNC File G3U)

They er they just had their blacklists and er and er that was it, you you you were out, and you weren’t going do er, you know, you were not allowed to have another job. Now I don’t know if you remember anything about the nine days of the General Strike, as opposed to the s sort of the whole miners’ strike in that year.
Yeah. (BNC file FYJ)

The utterances containing the multi-word units can be perfectly well formed with more direct assertions (e.g., But that wouldn’t tell us, would it. . .?) but the presence of the units plays a significant role in the collaborative protection of face and the listener-sensitive movement of the talk. Once again, it is pragmatic integrity rather than syntactic or semantic completeness which is most relevant. Hedging is another key feature of face protection and politeness. Some of the most frequent clusters have a hedging function. The hedges subtly alter propositions to make them less direct (or possibly even less rude) and therefore less open to challenge or refutation. These include:

I think
Sort of
A bit (of a)
I don’t know
I don’t think

The following BNC corpus extracts illustrate these functions.

Actually I’ve got a, I’ve got a brilliant picture that I took outside the train. Erm we were travelling through Malaysia and erm it’s just one canopy tree standing on its own in the middle of nowhere. And there’s all this sort of undergrowth. (BNC file D97)

Purposeful Vagueness

Equally apparent in the high-frequency clusters are markers of strategic, purposeful vagueness and approximation. Vagueness is central to informal conversation, and its absence can make utterances too direct and assertive, especially in such domains as references to number and quantity, where approximations are the norm in conversation. Vagueness also enables speakers to refer to the world in an open-ended way which calls on shared points of reference but which fills what is referred to only indirectly (see Chafe, 1982; Channell, 1994; Powell, 1985). Examples, not all of which are in the above frequency tables but that are nonetheless frequent in spoken interaction, include the following:

A couple of
And things like that
Or something like that
(And) that sort of thing
(And) this that and the other
All the rest of it
(And) all this/that sort of thing

For example:

Do we do we ad did we advertise for one before village meeting? Yes we have done yes. What recently? No, twenty year ago. No no no I mean since this. We we’ve used it for things like the street clean street cleaning and things like that. (BNC file KM8)

In such an example, it would be clearly conversationally inappropriate to list all the items implied by the vague tokens; speakers need only allude to the shared cultural knowledge and may assume their listeners can fill in the detail. Once again, the vague tokens exhibit pragmatic integrity and play central interactive and interpersonally sensitive roles, even though their syntax is incomplete and dependent.





Not all of the multi-word units can or need to be accounted for in terms of pragmatic integrity. From exploring the uses of the clusters in the corpus, it does seem that among the most frequent (the top 20 in each case) there is a considerable number which achieve wholeness as units when their pragmatic functions are adduced. What such multi-word units show is the extent to which meanings in everyday conversational interaction are mutually constructed and the degree to which speakers constantly engage on the interactive plane as well as the transactional or content plane. The meanings they create are extremely frequent and necessary in discourse, and are fundamental to successful interaction. The units support Sinclair’s notion of the idiom principle at work, with the clusters best viewed as being evidence of single linguistic choices rather than as being assembled at the moment of speaking (Sinclair, 1987; Stubbs, 2009). A final word needs to be said about the status of such units vis-à-vis the more opaque idiomatic units that have traditionally been studied. In the absence of corpus evidence, it is difficult to introspect on what one says. It is much easier to introspect on what one writes, and additionally, introspection is more likely to light upon the colorful, the curious, the rare, precisely because such items are psychologically salient. Hence it should not surprise us that, with few exceptions, pre-corpus studies of multi-word units focused on idioms, phrasal verbs, compounds and so on, either as colorful curiosities or, in the pedagogic domain, as a difficult characteristic of English for learners to struggle with. Meanwhile the banal, hidden, subliminal patterns of the everyday lexicon stubbornly resisted exposure.
Corpus analysis enables us to circumvent our difficulties in retrieving such patterned occurrences, but the automatic retrieval of recurrent strings is only the beginning, and a good deal of qualitative inferential analysis is still necessary to see meaning in the more quantitative figures generated by the computer.

Mono and Multimodal

What cannot be denied, however, is that multi-word strings of the kind described in this chapter can be identified by corpus methods, and the results reinforce their significance in our descriptions of spoken language and of conversational interaction. In chapter 4, two further case studies take the pedagogical dimension further and look closely at speaking and listening skills development involving such multi-word patterns. In subsequent chapters, these essentially corpus-generated insights will be built upon further as crucial for multimodal communication, for we do not just utter these words and phrases monomodally but with our voice and body we enunciate, accent and color them multimodally, too. First, however, in the following chapter we look at interaction from the point of view of listeners as well as speakers and at how we mark our listening attentiveness as a key component of spoken interaction. And, as we shall illustrate in part 2 of this book, we mark such listenership both monomodally and multimodally.


From Concordance to Discourse Responses to Speakers



In the previous chapters we have looked at what a corpus can reveal about key frequencies and patterns in spoken language. We have explored some differences and distinctions between spoken and written words and looked at what a corpus such as the British National Corpus (BNC) can tell us about some key functions of formulaic patterns or multi-word units as they operate in spoken discourse and with particular reference to pragmatic meanings. In this chapter,1 we move more markedly across speaking turns and look at the kinds of patterns adopted by speakers when they respond to utterances. Speakers are always both speakers and listeners in interaction, of course, but this next step takes us more directly into a realm of language investigation that can be said to be listener language. Looking ahead, too, to the second part of this book and to more multimodal dimensions of corpus analysis, the discussion here provides a platform for subsequent discussion of the ways in which response in language is not simply or exclusively a verbal phenomenon. Chapters 6 and 7, for example, look at how we talk with our heads in interaction and especially markedly in response to other speakers. The frameworks for the corpus analysis of listener response tokens developed in this chapter form therefore the basis for subsequent discussion and multimodal analysis.



Here we are interested in looking at a discourse feature across two language varieties, using data from two language corpora which have been assembled with the study of spoken discourse in mind, namely, the Cambridge and Nottingham Corpus of Discourse in English (CANCODE) and the Limerick Corpus of Irish English (LCIE). Both CANCODE and LCIE have been designed using the same data collection and categorization matrix (for an extensive description of CANCODE, see McCarthy, 1998; and for LCIE, see Farr et al., 2004). These corpora are suited to variational research as they have been designed using the same principles, which are to be sensitive to speaker relationship, context and speech genre, but their data come from distinct sociocultural settings, Britain and Ireland respectively. In this chapter, we will focus on a common feature of spoken interaction, namely, listener response tokens, in the two corpora so as to examine the degree of variation, if any, between their form and use in British and Irish English.



The issue of how best to represent a language is a key concern to corpus designers (see Atkins, Clear and Ostler, 1992; Biber, 1993; Crowdy, 1994; Farr et al., 2004; Tognini-Bonelli, 2001). In the case of variation-sensitive spoken corpora, there are two core concerns:

1. How to best represent a language variety
2. How to best represent a spoken language

The first is a question of geographical and demographic coverage and sampling. For example, as we saw in the previous chapter, the BNC 10-million word spoken component consists of unscripted informal conversation. It is divided into a demographically sampled component (4 million words), recorded from speakers of different ages, regions and social classes, and a more general component consisting of scripted and unscripted speeches (Crowdy, 1994). The second concern is a more complex matter relating to how spoken language itself is represented. Because most spoken corpora started out as appendages to much larger written corpora, many of them were based around written text typologies. CANCODE, which was mainly recorded in the late 1990s, is one of the few corpora that has been designed to represent both a language variety and the genres of casual conversation. Recorded in a wide range of areas across Britain and Ireland, the CANCODE corpus is carefully categorized according to the relationship that holds between the speakers and according to a broad discourse goal that the speakers pursue (see McCarthy, 1998, for a detailed discussion of the design rationale for CANCODE).

In terms of speaker relationships, CANCODE is divided into five broad categories which reflect the degree of familiarity between the speakers. The relationship categories are as follows: intimate, sociocultural, professional, transactional and pedagogic. Conversations that have been assigned to the intimate category tend to take place between members of the same family or partners while the sociocultural category encompasses interactions between friends. The professional category captures discourse that is related to professional interactions. The transactional category refers to situations in which the speakers do not know one another prior to the conversation that is being recorded. A typical example would be an interaction between a customer and a waitress at a restaurant. The pedagogic category includes interactions that take place between students and lecturers or pupils and teachers in the given institutional context.

In terms of goal types, there are three broad categories in the CANCODE corpus: information provision, which is characterized by unidirectional interactions; collaborative idea, which refers to bidirectional discourse; and collaborative task, which includes interactions in which the participants are engaged in a task, such as assembling flat-packed furniture. Since discourse is dynamic in nature, the goal-type categories sometimes change within individual conversations. Where this happened, the interaction was assigned to the category which reflected the dominant goal type in the interaction. The LCIE has been assembled to mirror the different relationship and goal-type categories in the CANCODE corpus, in order to facilitate inter-varietal research between British and Irish English in the given genres. Because the CANCODE and LCIE corpora have been designed to represent spoken discourse, they are suited to our purpose of looking at the discourse feature of listener response. We will now survey the existing research into this discourse feature.
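The categorization matrix described above can be written down as a small data structure. The sketch below is illustrative only: the class and function names are hypothetical, and measuring the ‘dominant’ goal type by the number of words each goal type covers is an assumption made for the example, since the source does not state how dominance was judged.

```python
from collections import Counter
from enum import Enum


class Relationship(Enum):
    INTIMATE = "intimate"
    SOCIOCULTURAL = "sociocultural"
    PROFESSIONAL = "professional"
    TRANSACTIONAL = "transactional"
    PEDAGOGIC = "pedagogic"


class GoalType(Enum):
    INFORMATION_PROVISION = "information provision"
    COLLABORATIVE_IDEA = "collaborative idea"
    COLLABORATIVE_TASK = "collaborative task"


def dominant_goal(segments):
    """Pick the goal type covering the most words.

    `segments` is a list of (GoalType, word_count) pairs; using word count
    as the measure of dominance is an assumption made for illustration.
    """
    totals = Counter()
    for goal, words in segments:
        totals[goal] += words
    return totals.most_common(1)[0][0]


# Hypothetical conversation whose goal type shifts part-way through.
segments = [
    (GoalType.INFORMATION_PROVISION, 300),
    (GoalType.COLLABORATIVE_IDEA, 900),
    (GoalType.INFORMATION_PROVISION, 200),
]
label = (Relationship.SOCIOCULTURAL, dominant_goal(segments))
print(label)
```

Each conversation thus receives one cell of a five-by-three matrix, which is what allows like-for-like comparison between the two corpora.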



Researchers from a variety of perspectives have long recognized that conversations contain listener responses, that is, short utterances and nonverbal surrogates (e.g., head nods) (see Drummond and Hopper 1993a, 1993b; Fries, 1952; Gardner, 2002; Kendon, 1967; McCarthy, 2002; Maynard, 1989, 1990, 1997; Tottie, 1991; Yngve, 1970). These signals are produced by the listener, according to Kendon (1967), as an accompaniment to a speaker, and he suggests that there is some evidence that the speaker relies upon these for guidance as to how the message is being received. Examine, for example, how the word yeah functions in this extract from a radio phone-in (taken from the LCIE). Here an elderly caller to a radio phone-in is explaining how, when she was young, a local woman used to do home ear piercing, using a thick darning needle, olive oil, some string and a cork.

(1)
Caller: The way this was done was a Scottish lady who lived across the road from us.
Presenter: Yeah.
Caller: And she would soak some gray wool. A length of gray wool in a saucer with olive oil.
Presenter: Yeah.
Caller: And then she’d thread it through an extremely large darning needle.
Presenter: Yeah.
Caller: Then there was a cork held together. . . and she just threaded the needle with the wool straight through your ear and into the cork.

[Extracts are referred to by a number (1), (2), etc.]

In extract (1), we see that the presenter wants to signal that she is listening and that she wants the caller to continue telling her story, but she does not want to take over the speaking turn (or the ‘floor’). To achieve this, she uses short response tokens that keep the conversation going (in this case, yeah). Tottie (1991: 225) provides an apt metaphor for this phenomenon saying that these tokens ‘grease the wheels of the conversation but constitute no claim to take over the turn’. Many terms exist for this phenomenon in the research literature, often depending on discipline and definition. Yngve (1970) introduced the term back channel to refer to the ‘short messages’ that a speaker receives while holding the floor (1970: 568), and this term is widely used by many researchers. Fellegy (1995) uses the term minimal response which comes from the body of research into language and gender (see Coates, 1986; Fishman, 1978; Zimmerman and West, 1975), while in another study Roger, Bull and Smyth (1988) use the broader term listener response. In this chapter, we will use the term listener response as an umbrella term to refer to the activity involving vocal, verbal and nonverbal non-floor-holding responses when a listener responds to the floor-holding message in a conversation. We will also refer to items which are used in this activity as response tokens. It is worth noting that we refer here to the discourse function of these lexical items, rather than their word-class identity as adjectives or adverbs or other parts of speech. Duncan and Niederehe (1974) note that such tokens project an understanding between speaker and listener that the turn has not been yielded, but they also note that it is often difficult to identify the boundary between brief utterances and proper turns where the ‘listener’ becomes the ‘speaker’.
This problem, however, is more for the analyst than the actual conversational participants, who, in real-time conversation, will draw on prosodic features, facial expressions, gestures, and so on, to interpret whether an interlocutor is trying to take the floor or display listenership in a given context.



In this study, we will compare and contrast the distribution forms and functions of such listener response tokens in two varieties of spoken English, British and Irish, using data from two corpora, CANCODE and LCIE, which have been designed for the study of spoken discourse, both using the same design matrix as detailed above. The existing research on forms shows that response tokens can be divided into minimal and non-minimal tokens (Fellegy, 1995; Fishman, 1978; Gardner, 1997, 1998, 2002; Maynard, 1989,

1990, 1997; McCarthy, 2002; McCarthy and Carter, 2000; Schegloff, 1982; Tottie, 1991; Zimmerman and West, 1975). The distinction is not necessarily clear cut, especially when using a corpus of transcribed audio cassette recordings, as they usually fail to capture nonverbal response tokens such as head nods and shoulder shrugs. Usually, minimal responses are defined as short utterances (for example, yeah) or non-word vocalizations (such as mm, umhum), while nonminimal response tokens are mostly adverbs or adjectives functioning as pragmatic markers (for example good, really great, absolutely) or short phrases/minimal clauses (such as you’re not serious, Is that so? by all means, fair enough, that’s true, not at all).

Minimal Response Tokens

(2) Tis a lovely day but tis cold isn’t it? Ah the days are grand shure well yesterday was a bad bad evening. Mm. It turned black. [LCIE]

(3) Her hair is fab isn’t it? Fab? It’s so cool though. Yeah it’s cool all right. Do you know it’s so natural. Mm. It’s a real nice shade like it’s not you know. [LCIE]

Non-Minimal Response Tokens

(4) I wouldn’t have minded giving an apprenticeship to that lad here on the site cos he was a good strong worker so he was. . . . he was a polite young fella too. Is that right? She had a tough job with them she brought up those two kids herself. Her marriage broke down there a long time ago. [LCIE]


(5) . . . isn’t that nice now. Blue sky. Lovely. A bit of a breeze. [LCIE]

As noted by McCarthy (2002), non-minimal response tokens may be premodified by intensifying adverbs that add further emphasis:

(6) [Woman talking about giving birth] Dick was very excited cos at one point they asked for hot towels. Oh. Just like the movies. So he skipped off down the corridor to get the hot towels. Oh jolly good. [CANCODE, McCarthy, 2002: 65]

(7) [Discussing tenancy problems in rented accommodation] Isn’t there something in your tenancy agreement about it? You have a written agreement don’t you? Most definitely. [CANCODE, McCarthy, 2002: 65]

McCarthy (2002) notes that both minimal and non-minimal response tokens can occur in pairs or clusters, as in this example from LCIE: (8)

. . . you know it reminds me of am the play and ah. Mm. And the character in the play is not+ I don’t know. +someone I’d kind of identify with+ Yeah that’s true that’s true but I wonder if that’s a cultural sort of+ Yeah mm +I don’t know I had the same question for Rosemary . . . [LCIE]

Carter and McCarthy (2006) suggest that response token pairings are particularly evident when a topic is being closed down or at a boundary in the talk when another topic is being introduced. These function both to signal a boundary and pragmatically to add satisfaction or agreement or simply to express friendly social support. Occasionally, triple response tokens occur. (9) [Couple asking permission to look at a disused railway line] It went through, it goes through. Straight, straight on. Right. Wonderful. Great. Can we look round then? Yes certainly. Thank you. [CANCODE] McCarthy (2002) and Carter and McCarthy (2006) also point out that the tokens absolutely, certainly and definitely may be negated as response tokens by adding not. (10) [Speaker A is considering buying a CD player for the first time] . . . but then I’d have to go out and buy lots of CD’s wouldn’t I. Well yes. I suppose you would. There’s no point in having a thing if you can’t play them. Haven’t got any. Absolutely not. Absolutely not. [CANCODE]



In comparison to the volume of research on forms, relatively few studies address the micro-functions of response tokens in conversation. However, there is enough research available to assert that they have more than one macro-discourse function. Yngve (1970), for example, notes from his observations of laboratory conversations, recorded audiovisually, that there is an apparent link between the use of certain forms and the marking of known or common information. Mott and Petrie (1995), in line with Bilous and Krauss (1988) and Fishman (1978), point out that listener responses signal support for, or attention to, what the speaker is saying. Fellegy (1995) concludes that the functions of minimal responses can be


considered both grammatical and social. Schegloff (1982) identifies the ‘continuer’ function of response tokens. This function will be discussed further below. Building on this, Maynard’s (1989) cross-cultural study of Japanese students conversing with American counterparts identified five further functions: display of understanding of content; support towards the speaker’s judgment; agreement; strong emotional response; and minor addition, correction or request for information. Gardner (1997), who looks at minimal responses, points out that each has a distinctive role and interactional function. He profiles the functions of certain minimal response forms, such as mm hm as a continuer, mm as a weak acknowledgment and yeah as a stronger, aligning acknowledgment. Gardner (2002) goes into substantial detail on listener responses that have previously been put together as minimal types of responses such as Mm, Mm hm and yeah. One of the few studies to look at listener response tokens in a specific social context is Antaki et al. (2000). They use the term high-grade assessment to refer to what other studies call non-minimal responses in the context of interviews (for example, tokens such as brilliant, excellent, smashing; see also Antaki, 2002). Antaki et al. (2000) argue that high-grade assessments function in a task-oriented rather than content-oriented manner within such institutional interactions to mark successful completion of the interactional objective. Though expressed differently, this parallels the findings of McCarthy (2003) that these items function over and above the transactional domain of an interaction.

3.6 Data and Methodology

The following study looks therefore at the discourse feature of response tokens in two varieties, in terms of both forms and functions. First, in relation to forms: we use two databases of one million words, each extracted from CANCODE and LCIE, as described above. Each comprises only casual conversation from intimate contexts (that is, friends and families). Word lists and cluster analyses were generated to identify and compare the forms used in the data sets. Word-list generation is a core corpus software function which facilitates the rank ordering of all the words in order of frequency. As we have seen in the previous chapter, cluster analysis is similar to this process, except that it looks for clusters of words as opposed to single-word items, for example, two-word clusters (you know), three-word clusters (Are you sure?), four-word clusters (know what I mean?) and so on. Word and cluster lists for both corpora were generated, and from these lists response token forms were identified manually by cross-checking qualitatively with transcripts using concordancing. A cut-off of the first 500 items

was used (in some cases, there were fewer than 500 overall occurrences). In this selection process, a response token was defined as an item that fills a response slot but which does not take over the speaker turn. They are seen as turn yielding. In our analysis, response tokens that form part of a turn were not included as response tokens. For example, really in this extract was not counted as a response token: (11) I heard John’s lost his job. Really. That’s terrible. Has he any chance of finding something else? Whereas really in this example does count as a response token because it does not take over the speaker turn: (12) I heard John’s lost his job. Really. He’d only just started, six months ago. We limited our focus of forms to lexicalized items (e.g., really, right, absolutely, no way, oh my God etc.) in the single-word count. Vocalizations (e.g., mm, umhum, etc) or other minimal nonlexicalized forms (such as yep, oooh etc.) were not included as single-word tokens. In order to examine response token functions, we extracted from CANCODE and LCIE two small, highly comparable corpora. Both these corpora consisted of 20,000 words of casual conversations, between British and Irish females all around 20 years of age. All participants were students and close friends, who, in most cases, shared living accommodation. This data was looked at qualitatively, in terms of all the response tokens that occurred, so as to identify and compare their functions.
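The word-list and cluster-list steps described in this section can be sketched in a few lines of Python. This is a minimal illustration only, not the software used in the study: the function names `tokenize`, `word_list` and `cluster_list`, the simple regular-expression tokenizer and the sample string are all assumptions made for the example, and the sketch lets clusters run across utterance boundaries, which dedicated corpus tools may handle differently.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return word tokens.

    Assumption: a word is a run of letters, optionally with internal
    apostrophes (e.g. "don't"). Punctuation is discarded.
    """
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())


def word_list(tokens, cutoff=500):
    """Rank single words by frequency and keep the first `cutoff` items."""
    return Counter(tokens).most_common(cutoff)


def cluster_list(tokens, n, cutoff=500):
    """Rank n-word clusters by frequency and keep the first `cutoff` items.

    Simplification: clusters are counted over the whole token stream,
    so they may span utterance boundaries.
    """
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams).most_common(cutoff)


if __name__ == "__main__":
    # Hypothetical sample text, invented for illustration.
    sample = "Oh yeah I know. Yeah yeah. I don't know. Oh yeah."
    tokens = tokenize(sample)
    print(word_list(tokens, cutoff=5))
    print(cluster_list(tokens, 2, cutoff=5))
    print(cluster_list(tokens, 3, cutoff=5))
```

Lists of this kind only supply candidate forms. Deciding whether a given occurrence fills a response slot without taking over the turn still requires the manual cross-checking against transcripts described above.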

Results

Within the cut-off range of 500 occurrences of a word or cluster, only items that occurred at least 5 times as a response token were counted. This yielded 87 forms in all: 36 in LCIE and 51 in CANCODE. No five- or six-word clusters occurring with a frequency greater than five were found in either CANCODE or LCIE (see Tables 3.1–3.4 below). In terms of comparison at the level of forms, the corpus search has brought to light a number of points. First, we see that there is a broader range of forms used by British English speakers, particularly at the single and two-word level. Some of the variation in single-word forms is attributable to language varieties, for example, the Irish English form grand, and quite, yes, and aye in the British English (the latter may be


Table 3.1 LCIE and CANCODE single-word response tokens within the first 500 words which occurred more than five times

Table 3.2 LCIE and CANCODE two-word clusters that occurred more than five times within the 500 most frequent forms LCIE


oh yeah

oh yeah

oh right

oh right

no no

yeah yeah

that’s right

very nice

no no

oh yes

don’t you?

oh God

I know

very good

is she?

oh God

I see

oh no

oh dear

all right

is it?

right yeah

all right

Jesus Christ

did you?

my God

do you? oh well do they? isn't it?

as a result of a regional variety). We also see a broader range of forms in the British English single-word items which are also found in American English: right, absolutely, sure, good, lovely, exactly, great, definitely, absolutely, true, really (as noted by McCarthy, 2002). In contrast, the

Table 3.3 LCIE and CANCODE three-word clusters that occurred more than five times within the 500 most frequent forms LCIE


I don't know

I don’t know

Oh my God

oh I see

Yeah yeah yeah

oh my god

yeah I know

something like that

Are you serious?

do you reckon

I know yeah

I can’t remember

oh yeah yeah

it doesn’t matter

Not at all Oh right yeah no no no I know that

Table 3.4 LCIE and CANCODE four-word clusters that occurred more than five times within the 500 most frequent forms LCIE


oh yeah yeah yeah no no no no yeah yeah yeah yeah I don't think so oh I don’t know Erm I don't know

Irish single-word forms only have really, sure and right in common with McCarthy’s findings for single-word non-minimal responses in American English. At a pragmatic level, we note that there are a number of differences. First, the use of yes and quite in British English has no corresponding occurrence in the Irish data. For example: [In the conversation below, two women in their 40s and 50s talk about the speaker A’s chiropractor] (13) But he’s very nice and what he does is erm he doesn’t he tries to do the minimum+


Mm. + to get you right. He doesn’t believe in doing everything all the time. No. If things are going well he tries to leave it alone you see.

Er that’s just a sort of m = minor thing. Mm. Oh er er the the the w = the like full works all the time or the. Well the trouble with who was just so brilliant+ Yeah. +such such a wonderful man+ Yes. [CANCODE]

(14) Well I do hope this Hoover thing is gonna be sorted out. Cos I am not having my flight out to Orlando if if poss ruined by+ Mm. +a bunch of Hoover-swinging Scotsmen. Quite. [CANCODE] We posit that they index a higher level of formality in British English. McCarthy (2002) also found occurrences of quite in his study of non-minimal forms in British data, and he comments that ‘Intuition and subjective impressions suggest that quite as a single-word response token is at the very least rather formal in contemporary British speech, and may be on the verge of being perceived as an archaism’ (2002: 60). Other form-related observations which account for the broader spread of two-word items in the British English data include the use of tag questions, such as is it?, did you?, do you?, do they?, isn’t it?, don’t you? We also found evidence of their use in the Irish data, but only in the form of is she? Carter and McCarthy (2006) and Carter et al. (2011) use the term follow-up question to refer to these forms which they say can function as a signal of engagement and attention by the listener. Their function, they note, is often very similar to that of back channel responses such as yeah and really. They confirm their response token function, saying that follow-up tag questions in informal spoken language often simply function to keep the conversation going by inviting further responses from the listener.

(15) [talking about a recent visit to the dentist] But it’s all right. . . . She’s done the drilling and she just put this thing in just to keep it quiet for a week. It’s one one thing I used to dread. Did you? As a kid I loved it. Yeah. My mother used to give me a shilling for toothache. Yeah. I wouldn’t Did they give you gas and knock you out? No. [CANCODE] (16) Maybe wants to clear away my napkin. [laughing] [laughing] Feels like blow your nose in it. [laughing] That’s polenta. They have polenta all the time on Ready Steady Cook. Do they? It’s like this like flour stuff and you just put it into water and then it gets really thick and stodgy so it looks like ever ever so thick soup. And then you leave it to cool and slice it. Yeah. Sounds tasty. Doesn’t look very nice. [CANCODE] Religious references and swear words appear in both the British and Irish data. However, their use in the British data is limited to God and oh God, while the Irish data comprises God, oh God and oh my God and the swear words Jesus and Jesus Christ. Here is an example from LCIE: (17) Friends are looking at an old school team photo and are trying to identify the people in it: Ryan the oldest guy Tom Hartnett John Rodgers+


Oh yeah. +Brian Fitz. Paul Regan. [laughing]. Jesus Christ What year is this? The late nineties [LCIE] In the differing use and frequency of religious references, we again see a pragmatic variation that points to greater informality within the Irish data, as the pragmatic impact of God is more neutral compared with Jesus. However, there is a paradox here that is best understood socioculturally. The Irish speakers seem to accept swearing as a normal and frequent response token. It seems to have reached semantic neutrality. However, Ireland is still a predominantly Catholic country, and so one might expect the opposite to be the case. The explanation may be found in Andersson and Trudgill’s (1990) work; they note that swearing is associated with the areas that are taboo or significant in a particular culture. Hence, it is because Jesus has more significance in Irish society that it is being used as a swear word in everyday conversation. There is also contrast in the reduplication of forms. The Irish data displays more reduplication: yeah yeah, no no, yeah yeah yeah, oh yeah yeah yeah, no, no, no, no and yeah yeah yeah yeah. For example: (18) [Three friends are discussing surrogate reproduction] That would kill me seeing someone else having my child. Ah no no no no no no. I had this conversation with my mother now. No no no no no. No if Caitríona couldn’t have kids or one of my friends or someone and they asked me to have their kid I’d have no problem having it for them. I wouldn’t have a problem doing it but I would have a problem with someone else having it. Imagine having your mother carrying your baby like. [laughing] My baby would be my sister like. [laughing] [LCIE]

The British data, while it has less reduplication, contains more clusters with the vocalization oh: oh yeah, oh right, oh God, oh dear, oh well, oh yeah yeah, oh right yeah, oh I don’t know. For example: (19)

They’d been cleaned and put in. Oh they’d put them back in a bag or something had they. No. They weren’t in a bag. They were just inside the chicken. Oh God. I just chucked them away and said nowt. [laughing] The meat was all right you know. That’s . They erm they wouldn’t affect the taste that much anyway+ Oh didn’t+ +would they. I wouldn’t have thought so no. [CANCODE]

Finally, we note that the Irish form Are you serious? could (after Thomas, 1983) lead to cross-cultural pragmatic failure because it could be misunderstood in terms of how the listener orients towards the propositional content of the message. The form, which is used in Irish English as a non-minimal response token, is not found in the more dominant variety of British English, and therefore we propose that it has potential for pragmatic confusion or even face threat. Here is an example of its use in Irish English:

(20) [Three speakers are gossiping about two young men in their locality who have built a house together] . . . but uh we went out in last night and we went back to Morgan’s house. Morgan is Trevor’s best friend since they were knee high and the two of them+ And are the two of them? Morgan Morgan who cabinet makers do we know him? Am Morgan. Murphy. It’s not Morgan Murphy. Is it Morgan Murphy? I’d know him all right. I don’t know. People keep on saying that. Is he a tall chap? Tall chap dark hair? Dark hair and he’s also pretty pretty broad yeah yeah I’d say it’s Morgan Murphy by the sound of it.


He’s twenty. He’d be twenty-nine as well a year older than Niamh and he’s from down there. Yeah his mum used to be a national school teacher. . . . [nine turns later after a digression about Morgan’s relations] . . . Well anyway he’s got he’s just built his house it was built in the last six months my god it’s a massive yoke the two lads living on their own. Are you serious? Yes you would be afraid to touch anything. Aren’t they marvellous. Yeah really like it doesn’t look like a home at all cause everything is just perfect. Like a showhouse. [LCIE]



Up to now, we have looked at response token forms using relatively large corpus samples (though one million words would be considered a small corpus by contemporary norms). This has allowed us to see lexical patterns using our software within these data. There is no automatic means of extracting and comparing the discourse functions of response tokens, so, in order to overcome this, we have constructed two very small and comparable data sets which we will examine qualitatively in terms of how response tokens function within them. The data sets are again sub-corpora of LCIE and CANCODE (see tables 3.5–3.7). Both comprise 20,000 words of data and are matched in terms of gender, age, social relationship, two-party to multiparty ratio of interactions, socioeconomic class and genre of conversation: All of the data was read exhaustively so as to manually identify and classify all response tokens. These functional classifications were devised by two raters and cross-checked by a third rater. In terms of frequency of response tokens, we found that there were considerably more in the British data. This is in line with the finding above that fewer forms were used in Irish English. This result allows us to speculate that there is more response token use in British English than in Irish English. However, this hypothesis would merit a separate investigation. In all, four functions were identified in the data sets. As Figures 3.1 and 3.2 and Table 3.5 illustrate, the pattern of their distribution in the Irish and British young women data is reasonably similar, with the function of convergence tokens being the most frequent followed by engagement tokens. When the results were compared across multi- versus two-party interactions, we found them to be inversely proportional. The functional pattern of distribution remained the same, with

Table 3.5

Sub-corpus: YW20a—Irish English, Young women of 20 (LCIE)
Number of words: 20,000
Description: Two sets of 10,000 words: 1) a two-party conversation between close Irish female friends; 2) a multiparty conversation between four close Irish female friends. In all cases the women were students around the age of 20 years. Topics covered include gossip about friends and boyfriends, anecdotes and stories. All data is taken from LCIE.

Sub-corpus: YW20b—British English, Young women of 20 (CANCODE)
Description: In parallel with sub-corpus YW20a, these data comprise 10,000 words of a two-party conversation between close British female friends and 10,000 of a multiparty conversation between five close British female friends. In all cases the women were students around the age of 20 years. Topics covered include gossip about friends and boyfriends, anecdotes and stories. All data is taken from CANCODE.

Table 3.6 Frequency of forms in YW20a (Irish) and YW20b (British) data sets

Corpus          Frequency/20,000 words
YW20a (Irish)
YW20b (British)


convergence tokens remaining the most frequent type of response tokens used in all categories. Let us look qualitatively now at the functions which we have identified in these data.
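Because the two sub-corpora contain only 20,000 words each while the figures for these data are scaled to occurrences per million, raw counts need to be normalized before they can be compared with other corpora. The arithmetic can be sketched as below. The function name `per_million` and the count of 100 are hypothetical and chosen only for illustration; they are not results from the study.

```python
def per_million(count, corpus_size):
    """Scale a raw frequency to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    # Multiply before dividing to keep the arithmetic exact for whole numbers.
    return count * 1_000_000 / corpus_size


if __name__ == "__main__":
    # Hypothetical raw count in a 20,000-word sub-corpus (not a study result).
    raw_count = 100
    print(per_million(raw_count, 20_000))
```

For a 20,000-word sub-corpus the scaling factor is 50, so each additional raw occurrence adds 50 occurrences per million. Small differences in raw counts therefore look large once normalized, which is one reason to treat comparisons based on such small data sets with caution.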

Convergence Response Tokens

When we scrutinize the corpus examples more closely, we find that response tokens are most frequently found at points of convergence in conversations, that is, where participants agree, or simply converge on opinions or mundane topics


Table 3.7 Functions of response tokens in casual conversation

Continuer tokens
Maintain the flow of the discourse.
Typical examples: Minimal forms such as Yeah, mm.

Convergence tokens
Markers of agreement/convergence. They are linked to points in the discourse: 1) where there is a topic boundary or closure; 2) where there is a need to converge on an understanding of what is common ground or shared knowledge between participants.
Typical examples: Many forms can perform this function such as:
• single word items yeah
• follow-up questions such as did you? is she?
• short statements, e.g., agreeing statements yeah it’s pretty sad.

Engagement tokens
Markers of high engagement where addressee(s) respond on an affective level to the content of the message. These back channels express genuine emotional responses such as surprise, shock, horror, sympathy, empathy and so on.
Typical examples: They manifest in many forms for example:
• single word forms such as excellent, absolutely
• short statements and repetitions that’s nice, oh wow, oh really
• follow-up questions did you?

Information receipt tokens
Markers of points in the discourse where adequate information has been received. These responses can impose a boundary in the discourse and can signal a point of topic transition or closure, and they can be indicative of asymmetrical discourse.
Typical examples: Right and okay

and this leads them collaboratively to negotiate topic boundary points, where a topic can be shifted or changed. Convergence can also be followed by a conversational closure point. In this way, response tokens have a pragmatic function in that they help bring about agreement and convergence leading sometimes to topic shifts. In the example from YW20b between female roommates, we see that the topic (a great night out that the friends had together) has run its course, and it is collaboratively rounded off with the non-minimal response token you never know. Notice also how this phrase is a recycling of a phrase from the

[Figure 3.1 Functional distribution of response tokens in British and Irish young woman data. Vertical axis: occurrences per million. Series: British YW20b, Irish YW20a.]

[Figure 3.2 Profile of functions in YW20a and YW20b two-party and multiparty conversations. Vertical axis: occurrences per million.]

previous turn, which makes for a very symmetrical ending point at which participants converge topically and lexically before moving on to a new topic: (21)

Yeah. We haven’t had a night like that for a while have we? No. Must have another one. Silly night. [laughing] What? Must have another one. Well I think we will.


Wednesday. Mm. Lifts the spirits. Mm. You never know we might be able to get a new recruit. [laughing] You never know. [laughing] [CANCODE—YW20b] After this point in the conversation, the topic shifts. In excerpt (22) below, two close friends are chatting about a former classmate who committed suicide. We see how one phase of the narrative ends with an evaluative formulation phased over two turns: it just goes to show you can’t take people at face value and And you don’t know what’s going on. This evaluation is unchallenged by the addressee and convergence is signaled after each phase of the evaluation by the response tokens no and exactly. This registers the addressee’s agreement, and it allows the conversation to move to a side sequence to this tragic story to which both participants contribute: (22) . . . it just goes to show you can’t take people at face value. No. And you don’t know what’s going on either. Exactly. But am seemingly she knew what she was doing as well because she brought the+ Oh she had it all planned out. She brought the little brother into get a present inside in Galway . . . [LCIE—YW20a] Adolphs and O’Keeffe (2002) noted that as well as helping to bring about topic shifts, these tokens are often found in closings as they allow conversations to come to a collaborative end. They illustrate how in an Irish radio phone-in show, Liveline, the presenter uses them and other markers of agreement in the closing of the call: (23) The presenter and the caller are chatting about the merits of clip-on earrings: Presenter:

And aren’t they grand?


Yes they’re very very handy.



Caller: Presenter: Caller: Presenter: Caller: Presenter:


But they’re not as secure as having them in your ear. This is true. This is true. You know you could lose them easily. That’s true. O.K. Tess well thanks for talking to us thanks very much. Right thanks very much. Bye All the best. Thank you indeed bye bye bye bye. [LCIE]

McCarthy (2003) notes that non-minimal response tokens sometimes cluster in consecutive series across speakers, providing multiple signals that a conversation is about to be terminated, while at the same time consolidating interpersonal relationships. He observes that they often occur together with other markers of closure such as thanks, checks, confirmations and greetings and that clustering is especially frequent in telephone conversations where there are often preclosing and closing routines: (24) [Telephone call concerning a printing order] Do you think it needs editing? Erm I shouldn’t think so. Good. Brilliant. Okay, well I’ll be round to pop it up. Okay. Pick it up today. Okay Jack. Have you got the compliment slips? Yes. On all er = They they look very good. Great. Yes. Fabulous. All right. [laughing] Okay. Thanks for that. Okay Len. Cheers. Bye. [CANCODE]


In a pragmatic sense, the affective value of convergence response tokens is worth noting. These tokens are of higher relational value than continuer tokens (see below). They do more than just signal turn yielding, listenership and a desire for the narrative to continue. Signaling agreement or converging on mundane topics is a form of interactional bonding between speaker and addressee, and convergence response tokens help maintain good relations between speakers by reinforcing commonality between them.

Engagement Tokens

Engagement tokens were the second-most frequent type of response items in our analysis. This type of response token again functions very much at an affective level. They signal the addressee’s enthusiasm, empathy, sympathy, surprise, shock, disgust etc. at what the speaker is saying, without taking over the turn. It is also indicative of the addressee’s high level of engagement with the content of the speaker’s message. These tokens are typically non-minimal responses, and common items include brilliant, absolutely, wow, cool, gosh, really and short phrases such as that’s tough, that’s true, you’re not serious, Is that so? In example (25), an engagement token is used to express the addressee’s delight at what her friend is saying. Speaker is talking about how she will spend the summer with her boyfriend in Edinburgh (note: Debenhams is a well-known British department store; CV refers to curriculum vitae or résumé): (25) What are you going to do about a job? I don’t know. He says that it’s going to be like Killarney and that I should get one easily enough and I’ve been in contact with Debenhams and they told me to send over my CV. Brilliant Mary brilliant. [LCIE—YW20a] In Example (26), we see an engagement token signaling the addressee’s sympathy with the speaker’s message using a vocalization. (26) Speaker A has just told a story of how she and her boyfriend had a row a few days earlier. Were you out last night? I was. Where were you? Am you see we had to reconcile last night and get it all back on.

Aaahhhh He says to me “I forgive you anyway” he says “for what you did to me” and I says “I was only testing the waters.” [LCIE—YW20a] Examples from the British YW20b data include: (27) He was singing to me. [laughing] And then he come over and he gave me one. And he gave me a peck on the he kissed me on the forehead and gave me a hug. Ah. That’s nice. [laughing] And then walked me home. It was hand in hand skipping up the road. [laughing] And gave me a hug goodbye. I’ve had flowers given to me. That was nice. [CANCODE—YW20b] (28) I ate almost a whole jar of Roses this weekend. Did you? [laughing] [CANCODE—YW20b] (29) . . . I went to a craft fair. Mm. . . . I went to a craft fair in Cambridge and they had erm this stained glass stall and it was all mobiles made out of stained glass. Oh wow. And they were superb. . . . And the mirrors with all different colours . . . all different size bits of coloured glass on it. Oh wow. It was superb. Massive. This type of response token functions at a much higher relational level than continuer tokens. They not only signal a desire for the speaker to continue, they also communicate the addressee’s affective response to the speaker’s message.


Continuer Response Tokens

Continuer response tokens are facilitative in that they maintain the flow of talk. As their name suggests, they encourage the current speaker to continue. As mentioned above, many researchers have identified this function of listener response, and usually minimal response tokens are associated with it (see Gardner, 1997, 1998, 2002; Maynard, 1989; Schegloff, 1982). Speakers perceive them as floor-yielding signals that mark the addressee’s desire for the talk to continue. By looking at concordance lines for a minimal response token such as mm, we find that it is surrounded by ongoing utterances rather than being part of a turn itself. In extract (30) taken from the LCIE YW20a corpus, a friend is telling of a text message ‘conversation’ she had with her boyfriend (note ‘messing’ is Irish English slang for joking). Yeah signals that the listener is eager for the story to continue: (30) And he sent one back saying “ah come on now Sinead are you messing or are you serious like?” Yeah. And ah he sent one saying “no I’m deadly serious am I’m going to kill you when I catch you” so the next thing your man was pure upset over this like and . . . [LCIE—YW20a] We can observe from examples such as this that continuer tokens maintain the flow of talk. They may be perceived by the speaker as floor-yielding signals that mark the addressee’s desire for the narrative to continue. In example (31) from CANCODE YW20b, friends are talking about buying a pair of shoes, and the minimal response token mm facilitates the flow of the conversation. (31)

I didn’t even know they sold erm shoes in there. No. I didn’t know they sold shoes. Didn’t know that. But erm. They’re really nice. Cos like it’s really weird cos I had erm you know when you think of something you want to have. Mm. And you haven’t seen them in the shops. Mm.



I sort of thought oh I really want you know. And I sort of visualised what I wanted and then erm I went down Superdrug with Rachel and we popped in and I thought Ooh. They’re the ones I want. [CANCODE YW20b]
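The concordance view mentioned above can be illustrated with a minimal keyword-in-context sketch. It is not the retrieval software used for LCIE or CANCODE; the function name `kwic`, the `span` parameter and the simple tokenizer are assumptions made for illustration, and the sample text is a contiguous stretch of extract (31), in which speaker turns are not marked.

```python
import re

def kwic(tokens, node, span=4):
    """Return keyword-in-context lines for every occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            # Right-align the left context so the node words line up.
            lines.append(f"{left:>30}  [{tok}]  {right}")
    return lines

# A contiguous stretch of extract (31); speaker turns are not marked.
text = ("you know when you think of something you want to have. Mm. "
        "And you haven't seen them in the shops. Mm. "
        "I sort of thought oh I really want you know.")

# Hypothetical tokenizer: runs of letters and apostrophes only.
tokens = re.findall(r"[A-Za-z']+", text)

for line in kwic(tokens, "mm"):
    print(line)
```

Each hit shows mm flanked on both sides by the storyteller's ongoing talk, which is the pattern the paragraph describes. A real study would also need speaker codes to confirm that the token belongs to the listener.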

Information Receipt Tokens

We found a small number of response tokens, in both data sets, which did not fit any of the above categories. While the previous three types of tokens seemed to serve relational functions, these tokens seemed to have a more organizational function. These back channels are usually also marked by falling pitch. In the few examples that we found, they seemed to serve a global discourse-marking function (see Lenk, 1998) within the orientation stage of a narrative. The response token is used as a ‘self-imposed’ pragmatic marker at which the storyteller marks a boundary where the narrative can begin now that the contingent details are clear for the participants. In example (32), taken from the Irish casual conversation data (YW20a), we see that when the storyteller uses an information receipt token at the point where she assumes all of the contingent details are in place to continue with the story, the listener is not ready and still needs more details (or at least confirmation of an assumed piece of information), which the storyteller provides before continuing.

(32) He’s been in Wexford for years right. I told you he’s separated didn’t I? And that he has a child. Yeah. Right. But he’s only young isn’t he? He’s only 29. [LCIE—YW20a]

In extract (33) from the British data (YW20b), we see another instance of an information receipt token being ‘self-imposed’ so as to organize information in the preamble to an anecdote. Here we see that one speaker signals that she is au fait with the contingent details, but this is not the case for her interlocutor, and so there is a prolonged stage of inquiry about the character of the forthcoming anecdote:

(33) . . . I just saw this person I thought was quite nice but I can’t remember what he looks like . . . [laughing] Is that Trinity?



Mm. Oh right. Is he tall? Short? Don’t know really?: Distorted. [laughing] No. He’s not like really short and he’s not really tall. He’s sort of. Average height. Average. Yeah. [laughing] A normal sort of guy. [laughing] [CANCODE—YW20b] Adolphs and O’Keeffe (2002) and O’Keeffe (2003) found this type of response token, particularly in the form of right, to be very prevalent in their analyses of Irish radio phone-in data. The presenter used it in an organizational manner, and they propose that this token is strongly associated with asymmetrical interaction where one of the participants is a power role holder (see Figure 3.3 below for a functional comparison). McCarthy (2003) has also noted that some response tokens are strongly associated with particular contexts. Fine, he suggests, most typically occurs in the making of arrangements and reaching decisions, and that certainly most typically occurs in reply to a request for a service or favor: (34) Okay. I’ll see you a bit later then. Fine. In the morning, whenever. [CANCODE] (35) [To a waiter] Can I have the bill please? Yes, certainly. [CANCODE] McCarthy (2003) notes that adjectives such as excellent, fine, great, good, lovely, right, perfect offer positive feedback to the speaker and often mark the



boundaries of topics, where speakers express their satisfaction with phases of business such as making arrangements, agreeing on courses of action, and marking the satisfactory exchange of information, goods and services. (36) [At a travel agent’s] Assistant: There you go. There’s your ticket. And your accommodation there. Insurance, and just some general information. Customer: Excellent. Right. [CANCODE] (37) [Dealer (A) and customer (B) in a car spare parts depot] I’ll get one of the lads in to come and do it for you. Lovely. The YW20a and YW20b data sets were closely matched in terms of age, gender, social relationship, and socioeconomic class, but we posit that the type of conversation (everyday conversations about friends, shopping, boyfriends etc.) is the most influential factor resulting in the homogeneity of functional distribution. This is perhaps substantiated by a comparison with our earlier findings (Adolphs and O’Keeffe, 2002; O’Keeffe, 2003), when we conducted a similar functional analysis of 20,000 words of interactions from an Irish radio phone-in show, Liveline. Figure 3.3 compares the functions of response tokens in the young women’s data and the radio phone-in corpus.

[Figure 3.3: bar chart. Vertical axis: occurrences per million. Series: British, Irish, Radio. Functions compared: convergence, engaged, continuers, information receipt.]

Figure 3.3 Comparison of functions in YW20a and b with radio phone-in functions across 20,000 words (results presented per million words)



Because the genre of conversation differs, we find a different functional pattern. Most noticeable is the substantially higher frequency of response tokens which function as continuers. This is attributable also to the mode of communication as radio conversations take place in sound-only mode.
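Figure 3.3 reports results per million words although each sample holds only 20,000 words, so it may help to make the scaling explicit. The sketch below shows the arithmetic only: the function name `per_million` is an assumption, and the raw counts are hypothetical placeholders, not figures from the study.

```python
def per_million(raw_count, corpus_words):
    """Scale a raw count to occurrences per million words."""
    if corpus_words <= 0:
        raise ValueError("corpus_words must be positive")
    return raw_count * 1_000_000 / corpus_words

SAMPLE_WORDS = 20_000  # size of each sample, as stated in the text

# Hypothetical raw counts for illustration only; not figures from the study.
hypothetical_counts = {"continuers": 40, "information receipt": 4}

for function, count in hypothetical_counts.items():
    print(function, per_million(count, SAMPLE_WORDS))
```

With samples of this size, every single token corresponds to 50 occurrences per million, so small raw differences look large once normalized. This is worth keeping in mind when reading the rarer categories in the figure.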



In this chapter we set out to address the dearth of comparative pragmatic research in terms of how spoken varieties of English differ. Within the paradigm of variational pragmatics, we also wanted to test corpus linguistics as a methodological tool. The focus of our study was the discourse feature of listener response. First, we can say that the area of variational pragmatics, from the perspective of our study, has great potential for development, even within this one area of listener response tokens. Further studies could look at how they differ pragmatically in other varieties of English, as well as in languages other than English. The paradigm of variational pragmatics, therefore, serves us well and will form the basis of much more of our future comparative work. From this short study alone, it is clear that even with two neighboring varieties of English, within which there is frequent contact, they are not monolithic entities. They vary in terms of forms, and these forms reflect sociocultural norms and subtleties that differ in Irish and British society. We also found considerable difference in the frequency of listener response token activity. British English conversations contained far more response tokens than Irish English conversations. This is something that merits further investigation and raises questions such as Are British people better listeners? Do Irish people talk more and respond less? Do Irish people yield turns less and interrupt more? Our study of forms also raises issues of variation in terms of positive politeness. The forms used in British English displayed greater positive politeness: for example, the use of quite and the form yes, neither of which occurred in the Irish data. Other indicators of greater positive politeness were the use and variation in forms which involved religious reference, or which were swear words. These prevailed more in the Irish data. In our quest to compare the data functionally, we undertook a qualitative study. 
The results from this manual analysis of two sub-corpora of 20,000 words (each of which contains closely matched data with respect to gender, age, social relationship, socioeconomic class and genre of discourse) pointed to three main functions of response tokens in this context, with the minor function of information receipt marking. At the level of overall frequency, we again found a discrepancy between British and Irish English, where the British data contained 59% more response tokens. However, at the level of response token function, we did not find there to be any difference between their use in British and Irish English in these cohorts. In addition, the overall functional pattern held constant for two-party and multiparty interactions in both data sets. This leads us to assert, therefore, that while we have observed differences in the forms, frequencies and sociocultural subtleties of response tokens in British and Irish English, the pragmatics of the discourse function itself appears to be constant. We note also that this manual phase of the study only looked at female data (in order to control the variable of gender). In the future, we hope to replicate this study using male speakers, where all other variables are controlled. In terms of general statements about response tokens that come from this study, we can say:

• Response tokens are core fluency items which function pragmatically to show listenership.
• The tokens are discourse tokens rather than adverbs or adjectives.
• A vocabulary of non-minimal response tokens probably exists in all languages but seems to vary within and between languages.
• Even between language varieties, there is potential for cross-cultural pragmatic failure.

Finally, let us consider corpus linguistics as a methodological tool for the study of spoken language variation from this kind of pragmatic perspective. We hope that we have demonstrated that it has proved useful to us as a means of accessing large amounts of spoken language samples, which we have easily been able to control for a number of variables. It allowed us to automatically retrieve results and compare forms in the data sets (in total, amounting to two million words, approximately 170 hours of talk). Where there was need to disambiguate forms or identify only those forms which functioned as response tokens, we had computerized access to the source files and the exact location in the original conversations in which the items occurred (as well as all of the speaker information for that conversation). In this sense, it is undoubtedly a tool of considerable merit. In terms of its limitations, first, it is only a tool and requires other frameworks for the interpretation of data. 
For example, discourse analysis, conversation analysis and pragmatics aided our understanding of the forms and functions of response tokens, and our work is based on a long lineage of research in these areas. Second, spoken corpora are only beginning to be extensively digitized. We are working with transcriptions of audio recordings. They have gone from the moment of recording to the person who transcribed them before our research. This raises a number of issues: (1) they have been extracted from their audio-visual situational context and transposed into the written word; (2) in so doing, they have lost much of their prosodic integrity; as well as (3) visual clues such as head nods (which could operate as surrogate response tokens) and facial expressions; and (4) though we can do so much automatically, corpus data still requires manual and qualitative work to offset and interpret purely quantitative data. The future of spoken corpora is, as we shall argue and as we hope to demonstrate in the



second part of this book, with digital audiovisual recording, where sound and image can be aligned to transcriptions, and we are now at a stage where technology can allow for this (see chapters 6 and 7, in particular). With this in mind, the potential of corpus linguistics as a tool to aid our understanding of variational pragmatics is very promising. This chapter does, however, confirm the value of developing corpus approaches to spoken discourse, for without such approaches a significant proportion of the language would not be amenable to analysis of the kind that corpus techniques supply.

Note: All CANCODE data in this chapter is © Cambridge University Press.


Case Studies in Applied Spoken Corpus Linguistics

Case Study 1: Discourse Markers, Spoken English and Pedagogic Settings 4.0


The first case study in this chapter continues the concern with different meanings of very frequently spoken words developed in the previous three chapters. It examines and compares the production of discourse markers by native and non-native speakers of English based on a pedagogic sub-corpus from CANCODE, a corpus of spoken British English which was drawn on for discussion in chapters 1 and 3. It is contrasted with a spoken corpus of interactive classroom discourse of English-speaking secondary pupils in Hong Kong. It will be noted that many of the most frequently spoken words in the language (as indicated in chapter 1 and in part in chapter 2) come into the category of discourse marker. Language development has been undoubtedly aided by corpus-informed studies and by the applications of corpus linguistics to pedagogic contexts, but it has tended to be confined to the development of writing and reading. The main applied linguistic purpose here is to underline that the teaching and learning of spoken English can be advanced by spoken corpus development and use. The second case study in the chapter draws more fully on chapter 2 and uses corpus insights into multi-word patterns with a discourse-marking function to underline their importance to listening skills development. Both case studies illustrate how spoken corpora can be utilized for speaking and listening enhancement and to assist in the development of interactional competence but also, fundamentally, continue to lay the ground for increased understandings of those lexical features that are central in their pragmatic integrity to spoken communication. Both studies illustrate how spoken corpora can be used to aid more precise definition of the function of words and of patterns that include but also go beyond the limits of the individual words.



Discourse markers are, however, complex in form and function and difficult to define. Before we commence specific corpus analysis and before we draw on relevant corpus data, there needs to be a preamble in which different categories and forms are discussed and refined. As in the previous two chapters, however, the process does illustrate how the isolation of word forms in a corpus, valuable though that has increasingly come to be, is only part of the story when it comes to spoken corpus analysis. Those same word forms can have numerous different functions and meanings and can also be part of phrases which in themselves add incrementally to our semantic and pragmatic repertoires of communication. Discourse markers (henceforth DMs) are pervasive in conversation and play a fundamental role in spoken interaction. Recent analyses of corpora of spoken interaction show that they are represented among the top 10 word forms (Allwood, 1996) and that an ‘utterance particle’ is found in continuous talk on average every 1.5 seconds (Luke, 1987). Discourse markers may be both single words and comprise multi-word units. There are studies of DMs which deal with individual markers in English (Aijmer, 1987; Andersen, 1998; Östman, 1981; Schiffrin, 1986; Stenström, 1998; Svartvik, 1980; Watts, 1987) and small sets of English DMs (Aijmer, 1996, 2002; Erman, 1987; Schiffrin, 1987; Schourup, 1985). However, relatively limited research has been undertaken on the range and variety of DMs used in spoken English by second or foreign language speakers. Comparative usage between native and nonnative speakers and the pedagogical significance they have in an ESL/EFL classroom has been studied even less, although Hays (1992), Romero Trillo (1997, 2002) and Müller (2004, 2005) are notable exceptions. 
The research reported here takes a step in this direction by reviewing definitions of core discourse markers, by classifying them according to a specific categorial framework and by undertaking a corpus-based analysis of their use in two different pedagogic contexts in the United Kingdom and Hong Kong. This study does not claim to offer detailed exploration of individual discourse markers and acknowledges in several places the richness of work in this regard within the research paradigm of traditional pragmatics (see Archer et al., 2012); the main aim here is to offer a broader and wider-angled corpus-driven account that points ultimately to the need for further cross-cultural and cross-linguistic research.1

Terminology

Different terminology has been applied to DMs. They have been labeled sentence connectives (Halliday and Hasan, 1976), discourse particles (Goldberg, 1980; Schourup, 1985), utterance particles (Luke, 1987, 1990), semantic conjuncts (Quirk et al., 1985), pragmatic expressions (Erman, 1987), discourse operators (Redeker, 1991), continuatives (Romero Trillo, 1997), etc. The multiplicity of terminology surrounding DMs reflects diverse research interests and analytical categories, as well as the difficulties of accounting for them adequately in theoretical terms.



Schiffrin’s (1987) analysis of DMs is based on a theory of discourse coherence. She defines DMs as ‘sequentially dependent elements which bracket units of talk’ (Schiffrin, 1987: 31). They are ‘sequentially dependent’ in that the units of talk prior to and following a discourse marker determine the type of marker to be used and are indicative of the kinds of social and pragmatic meaning a speaker communicates or infers. Schiffrin proposes that the markers in her study serve as contextual coordinates for utterances by locating them on one or more planes of talk (ideational structure, action structure, exchange structure, participation framework, information state) and maintains that coherence is constructed through relations between adjacent units in discourse by virtue of their semantic and syntactic properties and, most important, by virtue of their sequential position as the initial or terminal brackets that demarcate discourse units (Schiffrin, 1987: 35–40). Further influential research has been undertaken by Fraser (1990, 1996, 1999) who approaches DMs (labeled ‘pragmatic markers’ in Fraser, 1996) from a grammatical-pragmatic perspective. Slightly different from Schiffrin’s (1987) definition, which includes vocalizations such as oh, Fraser limits DMs to linguistic expressions which signal a relationship that the speaker intends between the utterance a DM introduces and the foregoing utterance. In these definitions, DMs have a core meaning, which is procedural, not conceptual. While taking into account the indexical potential of DMs, a more recent work by Aijmer (2002) emphasizes the lack of semantic or propositional content in DMs. In Aijmer’s account, DMs are indexed to attitudes, to participants and to the text; therefore, they have discourse and pragmatic functions both on the textual and interpersonal level (Aijmer, 2002) and must be described in terms of larger discourse contexts than one or two single utterances. 4.1


Definition

In most studies, DMs are therefore defined as intra-sentential and suprasentential linguistic units which fulfill a largely non-propositional and connective function at the level of discourse. As useful contextual coordinates, they signal a transition in the evolving process of the conversation, index the relation of an utterance to the preceding context and indicate an interactive relationship between speaker, hearer and message. In respect of discourse markers, our aims in the present chapter are twofold:

1. To offer a broad description of discourse markers in pedagogic settings, using data from the CANCODE corpus (indicated by CANCODE) and a Hong Kong student corpus (indicated by student corpus) (see section 4.3 for details of the corpora).
2. Based on these corpora, to compare and contrast differences in the use of discourse markers in pedagogic settings between British native speakers and Hong Kong nonnative speakers of English.



The following criteria are used for a linguistic item or expression to be defined as a DM.

Position

Most DMs occur in turn or utterance initial position. Initiality highlights many functions denoted by DMs, such as marking boundaries of talk (Okay in a.), topic initiation (Now in a.), topic closure or as an attention-seeking signal (Right in b.), and so on.

a. Okay so you’re all happy with it. Now how are we going to approach it would anyone like to suggest a method? (CANCODE)
b. Right. That’s covered the nineteenth century. Next week it’s the early part of the twentieth century, starting with James Joyce. (CANCODE)

The position of some DMs is, however, very flexible. DMs can also be inserted in utterance medial position for a floor-holding purpose or to clarify meaning. Less frequent is the utterance final position where DMs are understood as comments (I think in c.), clarification (I mean in d.) or as an afterthought (actually in e.).

c. He’s going to be changing jobs, I think. (CANCODE)
d. But ah since it’s for children, this can’t be too high, the price, I mean. (student corpus)
e. I think he may be bringing some pizza actually. (CANCODE)
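The positional criterion lends itself to a simple first-pass check over transcribed utterances. The following is a minimal sketch rather than a procedure taken from the study: the function name `dm_position` and its whole-word matching rule are assumptions, and position alone cannot establish DM status, which also depends on prosody, cohesion and context.

```python
import re

def dm_position(utterance, marker):
    """Classify where `marker` falls in `utterance`: initial, medial or final.

    Returns None if the marker does not occur. Only the first occurrence
    is considered, and matching is case-insensitive on whole words.
    """
    words = re.findall(r"[a-z']+", utterance.lower())
    target = marker.lower().split()
    n = len(target)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == target:
            if i == 0:
                return "initial"
            if i + n == len(words):
                return "final"
            return "medial"
    return None

# Utterances adapted from examples b., c. and e. (straight apostrophes).
print(dm_position("Right. That's covered the nineteenth century.", "right"))
print(dm_position("He's going to be changing jobs, I think.", "I think"))
print(dm_position("I think he may be bringing some pizza actually.", "actually"))
```

A check of this kind only narrows the set of candidates. Each hit would still need manual inspection, since the same word form in the same slot may or may not be functioning as a DM.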

Prosody

DMs should be prosodically independent and have a fairly high separability from the utterances they introduce. Schiffrin (1987: 328) notes that a discourse particle ‘has to have a range of prosodic contours’. The prosodic clues that go with DMs include pauses, phonological reductions and separate tone units which are distinguished from other linguistic items in the discourse units. For instance, So, in f. below, followed by the vocalization er which functions like a pause, is a DM. Also, Right in g. can be distinguished prosodically as a DM because it marks a separate tone unit from the utterances that follow.

f. So, er, what time is going to be best for the first meeting . . . (CANCODE)
g. Right. Very good. What do you think might have happened since he left hospital that caused this ulcer to break down yet again? (CANCODE)

Issues of prosody are not covered in detail here but corpus analysis of the separability of prosodic phenomena in discourse is covered in the next chapter (chapter 5).



Varied Grammatical Class

DMs are distinctive in terms of their categorical heterogeneity (Bazzanella and Morra, 2000; Carter et al., 2011). They do not constitute a single well-defined grammatical class but are drawn from different grammatical and lexical inventories. The following, though not an exhaustive list, indicates the range of classes that they primarily come from: coordinate conjunctions (e.g., and, but, or); subordinate conjunctions (e.g., since, because, so); prepositional phrases (e.g., as a consequence, in particular, by the way, at the end of the day); adverbs (e.g., now, actually, anyway, obviously, really, certainly, absolutely); minor clauses (e.g., you see, I mean, you know); response tokens (e.g., yeah, yes, no); interjections (e.g., oh, ah, well); metaexpressions (e.g., this is the point, what I mean is, that is to say, in other words). It should, however, be noted that not all linguistic items from the above grammatical classes are DMs according to the criteria developed here. Such varied grammatical classes and functions of DMs can be exemplified by the words so and now and their use as a flexible interactional resource in summarizing, marking boundaries of talk, switching topic, establishing consequences etc. In this respect, the primary grammatical class of now and so as, respectively, adverb and conjunction is suspended. The grammatical status of a DM needs therefore to be continually contextually referenced.

Cohesion

A DM has to function to signal the cohesive relation of an utterance to the preceding context and to assign the discourse units a coherent link. In terms of the conceptual meanings denoted by a DM, they exist in a cline which can be conceptually empty (well, OK, hey, oh), partly conceptual (so—with the semantic meaning ‘cause’) and conceptually rich (I guess, I think, first, second, obviously, frankly). Lexical words that have become DMs, as argued by Aijmer (2002), have often undergone a process of grammaticalization which leads to a change of function from an originally propositional meaning to a mainly textual or interpersonal function.

Propositional Meaning

DMs are semantically and grammatically optional so that their existence does not affect the truth condition of the propositions. This means they can be omitted from a discourse without syntactic and semantic consequences. But listeners are then left without clues as to how the propositions can best be interpreted in relation to the rest of the message. Consider the effect on the following utterance h. if all the highlighted DMs are omitted. The stance and attitude of the speaker would not be properly signaled.

h. Well actually there’s a couple of things really . . . I need to ask you about one draft of my medieval my em history of English. (CANCODE)



Serving principally as a set of guidelines for determining whether a linguistic item carries the status of a DM, any of the above criteria alone is a necessary but not a sufficient condition for the verification of DM status. Instead, a combination of criteria and sociolinguistic variables such as the contextual situation of the interaction, the participants and the nature of the relationships between them (Montolío, Durán and Unamuno, 2001) also need to be taken into consideration. 4.2


What needs to be borne in mind at this point in the trajectory of a more developed description of discourse markers is that access to spoken corpus data can enrich our frameworks of description. In this section we adopt a corpus-driven approach that emphasizes the descriptive value for classroom discourse of recurrent patterns and of frequency distribution. Accepting that an exclusive emphasis on the level of textual coherence and relevance is insufficient, both textual and interpersonal dimensions of DMs are included. DMs are thus viewed as both pragmatically significant and socially sensitive. Research has indicated that social or pragmatic problems of language use, both in children and in adults, may issue in an inappropriate use of DMs. Children learn to use DMs from an early age to convey social meaning (Andersen et al., 1999) and Hong Kong Chinese bilinguals intermingle Cantonese DMs in online chat exchanges—which are usually conducted in English—to signify their membership and identity in a social group (Fung and Carter, 2006). Also within some social groups the use of these expressions is stigmatized (Watts, 1989) and attributed to speakers’ incompetence. Therefore, our theoretical framework embraces a functionally based account and is grounded both on Schiffrin’s (1987) notion of a multidimensional model of coherence and on Aijmer’s (2002) interpersonal perspective. This functional orientation is framed in terms of interpersonal, cognitive, topical and textual categories so as to systematically classify the various roles DMs play in the pedagogic register. Before proceeding to a more precise corpus analysis, a provisional framework is first established to enable, we argue, more systematic, subsequent use of our corpus data in relation to the target pedagogic questions.
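The four-way functional orientation just named (interpersonal, textual, topical, cognitive) can be pictured as a lookup from marker to category. The sketch below is illustrative only: the names `FRAMEWORK` and `categories_for` are assumptions, the marker sets are a subset of the examples the chapter lists under each category, and spellings such as OK/okay are normalized to one form.

```python
# Example markers per category, a subset of those listed in the chapter.
FRAMEWORK = {
    "interpersonal": {"you know", "you see", "okay", "right", "yeah",
                      "well", "i think", "actually"},
    "textual": {"because", "cos", "so", "but", "and", "or", "anyway"},
    "topical": {"now", "okay", "right", "well", "by the way", "so",
                "and", "yeah", "cos"},
    "cognitive": {"well", "i think", "i see", "and", "i mean", "like",
                  "sort of", "you know"},
}

def categories_for(marker):
    """Return every category in which `marker` is listed, in sorted order."""
    key = marker.lower()
    return sorted(cat for cat, markers in FRAMEWORK.items() if key in markers)

for m in ("well", "so", "I mean"):
    print(m, categories_for(m))
```

The lookup returns several categories for forms such as well and so, which reflects the point that a single DM can carry more than one function. Deciding which function is primary in a given utterance remains a matter of contextual analysis that a table of this kind cannot settle.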

Interpersonal Category

DMs are one of the mechanisms that mark the affective and social functions of spoken grammar (Carter and McCarthy, 2006; Carter et al., 2011). In this category, DMs are used to mark shared knowledge (you know, you see, see, listen) and to indicate responses like agreement, confirmation and acknowledgment (OK/okay, oh, right/alright, yeah, yes, I see, great, oh great, sure). Markers that function on this level also serve to mark the attitudes of the speaker (well, I think, you know, sort/kind of, like, just, to be frank, etc.) and to indicate a stance towards propositional meanings (basically, actually, really, obviously, absolutely, exactly).



Textual Category

In the textual category, DMs work on a textual level and mark relationships between verbal activities preceding and following a DM. Relationships of various kinds are indicated mainly by conjunctions: cause (because/cos), consequence (so), contrast (but, and, yet, however, nevertheless), coordination (and), disjunction (or), digression (anyway) and comparison (likewise, similarly). This category echoes most of the distinctions suggested in Halliday and Hasan (1976), Quirk et al. (1985) and Fraser (1990).

Topical Category

In the topical category, DMs indicate the discourse in progress, the presence of which may affect the subject under discussion or even the distribution of turn taking. On the textual level, DMs in this category signal links and transitions between topics, for instance, signposting opening and closing of topics (now, OK, right, well, by the way, let’s start, let me conclude the discussion), indicating sequential relationships (first, firstly, second, next, then, finally), and marking topic shifts (so, now, and what about, how about) which may be returns to a previous topic or projections to a new topic. On the interactional level, DMs serve as a structural device to mark continuation of the current topic (yeah, and, cos, so), to summarize opinions (so), to regain control over the talk, to hold the floor, etc.

Cognitive Category

This category comprises markers which provide information about the cognitive state of speakers in spoken exchanges. In unplanned speech, coherence or continuity in utterances may break down if the speaker has covert or unsignaled topic shifts, or if the hearer is required to go through inferential procedures in order to understand the discourse. Cognitive discourse markers instruct the hearer to identify the relevant phenomenon and to construct a mental representation of the discourse. DMs in this category serve to denote the thinking process (well, I think, I see, and), reformulate (I mean, that is, in other words), elaborate (like, I mean), mark hesitation (well, sort of) and assess the listener’s knowledge about the utterances (you know), allowing a co-construction of meaning. As there are multidimensional meanings or functions in a DM, primary and secondary functions can co-occur that change according to both the context and other variables. 4.3


The present case study makes use of two sources of data for a comparative study: a student corpus from Hong Kong (Lam and Wong, 1996) and the pedagogic sub-corpus from CANCODE (© Cambridge University Press). The use of corpora enables a relatively large number of texts to be dealt with



alongside empirical analyses of the actual patterns of DM use in a pedagogic discourse environment.

Hong Kong English Data

Participants

The student corpus contains data totaling 14,157 words from group discussions of 49 (20 males and 29 females) intermediate–advanced learners of English in a secondary school in Hong Kong. The subjects were all Form Six students between 17 and 19 years of age. The participants were divided into 12 groups and were given a task at the beginning of a lesson. The task specified that they are the staff of a toy company and need to submit a proposal to their boss suggesting the type of toy they intend to manufacture. In order to reach a conclusion as to what to manufacture, they have to make suggestions, comment and negotiate in their discussion. The activity involves students in an interactive environment where opinions are exchanged, ideas discussed and organized, and social relationships expressed. The interaction involves speech acts such as explanation, clarification, persuasion, agreement and disagreement.

Transcription

The transcription of the student data is represented in standard orthographic form. Each line in the transcription represents either a continuing or completed intonation unit, but sometimes the turns may be taken or completed by other interlocutors, which has been clearly indicated in the transcription (for further information on transcription conventions, see Appendix 1). Pseudonyms are used to conceal the identity of the speakers, and transcription symbols parallel to those used for CANCODE data are adopted. As far as prosodic information is concerned, there is no detailed transcription of intonation contours except pauses. Information on pauses is especially important to the analysis because it can distinguish forms used as DMs from their other usages.

Data Selection

Transcripts of classroom recordings provide an excellent record of ‘naturally occurring’ interaction (Silverman, 1993). We chose to use senior form students’ discussions as the basis for analysis because classroom discussion is an important aspect of speech that is greatly valued by teachers, and because oral proficiency and competence are essential tools for further academic and career advancement in Hong Kong. Such discussion can also illuminate the extent to which intermediate–advanced Chinese ESL learners are capable of incorporating DMs in their discussion in the same way that native speakers do. The degree to which the language used in role play, as in the present setting, can reflect actual usage might also be questioned. We would argue that
this context involves students in a range of naturalistic speech acts that are of general utility within and outside a classroom and can generate language choices that distinguish students’ abilities both to convey ideational meanings and to express social relationships. In this respect, the case study may be of particular value to teachers, especially in Hong Kong, while conveying broader pedagogic significance for the teaching of spoken language in context.

British English Data: CANCODE

A comparison was made with examples randomly selected from the pedagogic sub-corpus (460,055 words in size) in CANCODE, the five-million-word spoken corpus developed by Nottingham University and Cambridge University Press (see McCarthy, 1998: chapter 2 and chapters 1 and 3 in this book for further details). As we have seen in previous chapters, the project was established with the aim of developing a description of spoken English. The corpus is organized generically to capture the speakers, environments and contexts in which spoken language is produced. Five broad contexts for data collection based on the type of relationship of speakers (transactional, professional, pedagogical, socializing and intimate) were added to three typical goal types (information provision, collaborative ideas and collaborative tasks) to capture the wide varieties of generic activities in everyday spoken English. CANCODE is a well-established corpus, and findings from CANCODE take a more central role in this study, with the Hong Kong data treated as more exploratory and indicative.

Design and Method: Quantitative and Qualitative

Since DMs operate inside and outside a clause, a traditional semantic and syntactic analytical approach cannot fully explain this aspect of spoken grammar. Such a focus can be more adequately elucidated by a method of analysis which moves from lexical or sentential levels to discoursal or contextual usage. Therefore, a combination of quantitative and qualitative methods is used in this case study, which ranges from a macro-investigation by WordSmith Tools 4.0 (Scott, 2004) to a micro-discourse analytic examination through qualitative observation. As a starting point (and as an extension to the frequency analysis explored in chapter 1), the top 100 most frequent words in the pedagogic sub-corpus in CANCODE were identified. Among these top 100 words, 23 lexical items, with roles commensurate with those of DMs, were selected. However, words such as know, think, sort, mean do not occur as DMs on their own. Another frequency list for multi-word units such as you know, I think, sort of and I mean was therefore run retrospectively in order to ensure comparability with the student data. Further investigation of specific words was achieved by studying the co-text qualitatively to distinguish regularities and recurring patterns in the data.
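The frequency-list step described above can be illustrated with a short Python sketch. It is not WordSmith Tools and makes no claim to reproduce its output: the tokenizer, the function names (tokenize, top_words, count_units, percent) and the sample utterance are assumptions made purely for illustration. Only the four multi-word units and the sub-corpus size of 460,055 words are taken from the text.

```python
import re
from collections import Counter

# Multi-word units named in the text; counted as contiguous token sequences.
MULTI_WORD_UNITS = ("you know", "i think", "sort of", "i mean")


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer).

    Only straight apostrophes are kept inside words, e.g. "it's".
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def top_words(tokens, n=100):
    """Return the n most frequent single words with their raw counts."""
    return Counter(tokens).most_common(n)


def count_units(tokens, units=MULTI_WORD_UNITS):
    """Count each multi-word unit wherever its words occur side by side."""
    counts = Counter()
    for unit in units:
        parts = unit.split()
        k = len(parts)
        counts[unit] = sum(
            1 for i in range(len(tokens) - k + 1) if tokens[i:i + k] == parts
        )
    return counts


def percent(count, corpus_size):
    """Express a raw count as a percentage of all tokens in the corpus."""
    return round(100 * count / corpus_size, 2)


if __name__ == "__main__":
    # Invented sample utterance, not corpus data.
    sample = "Well I think you know it's sort of okay. I mean, you know, I think so."
    tokens = tokenize(sample)
    print(top_words(tokens, 5))
    print(dict(count_units(tokens)))
    print(percent(11736, 460055))  # raw count and corpus size as cited for "and"
```

Counting the multi-word units separately matters because a single-word list would credit know, think, sort and mean with occurrences that are not DM uses. Even so, counts of this kind cannot separate discoursal from other grammatical uses of a form, which is why the co-text is then examined qualitatively.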



General Analysis

The first results section is essentially descriptive. It details the core functional paradigm of DMs in pedagogic discourse based on the multi-categorial model proposed, with examples from both CANCODE and the Hong Kong data (Table 4.1).

Table 4.1 A core functional paradigm of discourse markers in pedagogic discourse

Interpersonal
  Marking shared knowledge: See, you see, you know, listen
  Indicating attitudes: Well, really, I think, obviously, absolutely, basically, actually, exactly, sort of, kind of, like, to be frank, to be honest, just, oh
  Showing responses: OK/okay, oh, right/alright, yeah, yes, I see, great, oh great, sure, yeah

Textual
  Cause: Because, cos
  Contrast: But, and, yet, however, nevertheless
  Coordination: And
  Disjunction: Or
  Consequence: So
  Digression: Anyway
  Comparison: Likewise, similarly

Organisational Structural
  Opening and closing of topics: Now, OK/okay, right/alright, well, let’s start, let’s discuss, let me conclude the discussion
  Sequence: First, firstly, second, secondly, next, then, finally
  Topic shifts: So, now, well, and what about, how about
  Summarising opinions: So
  Continuation of topics: Yeah, and, cos, so

Cognitive
  Denoting thinking process: Well, I think, I see, and
  Reformulation/Self-correction: I mean, that is, in other words, what I mean is, to put it in another way
  Elaboration: Like, I mean
  Hesitation: Well, sort of
  Assessment of the listener’s knowledge about the utterances: You know

Interpersonal: Marking Shared Knowledge

On the interpersonal dimension, verbs of perception like see, listen, know are often used as DMs for marking shared knowledge between the speakers.

(1) I mean her main work’s on Pinter really. See that was the problem because I thought Yeah just and then sort of the idea will have to be thrown out. She said Well you can risk it. But obviously it wasn’t a very good idea you know the ex = she said the external examiner could query with it being too close. (CANCODE)

The verb see acts as a DM here. It occurs in initial position and is an utterance launcher to orientate and draw the attention of the listener to the upcoming utterance. See qualifies as a DM in this example, as it is not followed by any complementizer. The that in See that was the problem (line 2) is a deictic item referring to the problem Speaker 2 is addressing. In spoken discourse, you know is often used to appeal to the assumed shared knowledge or experience of the speaker for the acceptance of information. In the above excerpt (1), Speaker 2 is talking about the reaction of an external examiner concerning the work and is appealing to Speaker 1’s shared understanding where you know occurs in the medial position.

Indicating Attitudes

Many adverbs such as basically, actually, really, absolutely, sort of, etc. frequently occur in the CANCODE data to mark pragmatically the attitudes or stance of the speaker.

(1) I mean her main work’s on Pinter really. See that was the problem because I thought Yeah just and then sort of the idea will have to be thrown out. She said Well you can risk it. But obviously it wasn’t a very good idea you know the ex = she said the external examiner could query it with it being too close (CANCODE)

An examination of the role of really and obviously in (1) reveals that they enable the speakers to express certainty towards the propositional meanings of the utterances. Having the interactive effect of softening the tone through an element of vagueness, sort of is further used as a hedge or a weakener of illocutionary force to reduce the face-threatening act in . . . sort of the idea will have to be thrown out.

Another common DM used to express attitude in many conversational exchanges is well in (2) below. In the CANCODE data, it almost always occurs in turn initial position. Speakers use well to show that they are thinking about something. It can be used to concede points or to indicate a change in topic away from what might have been expected (Carter et al., 2011; Schiffrin, 1987).


(2) Are you going to buy a new car? Well, we’ve stopped looking for the moment. . . . (CANCODE)

I thought I might come but well a = as I say I’m I’m a bit tired and think I might give it a miss . . . (CANCODE)

As an attitude-marking device (also called a ‘stance marker’ in Biber et al., 1999), well prefaces the above responses to indicate a degree of reservation or hesitation on the part of the speaker while at the same time implying a contrary view.

Many DMs in both corpora are used to provide responses or feedback to the speaker. These DMs indicate active participation and positive listenership, making the communication more interactive, involving and informal. Widely used markers, especially in CANCODE, are okay, great, right, yeah, and I mean. Such markers also commonly entail an additional component of gesture (a topic to be explored more fully in chapters 6 and 7).

Textual: Indicating Relationships between Utterances

On the textual level, conjunctions frequently used in written language are exploited in spoken discourse to signpost the textual connectedness of an existing utterance with a preceding one. They provide indexical direction to various semantic relationships such as causal (cos/because), consequential (so), contrastive (but), disjunctive (or), coordinative (and), digressive (anyway) and comparative relationships (likewise, similarly). Discoursal links can be expressed using this type of textual marker.

Topical

Markers in the topical category provide information about the ways in which successive units of talk are linked to each other and how a sequence of verbal activities, the opening, closing, transition and continuation of topics, are organized and managed.

Opening and Closing of Topics

DMs are useful in signaling the opening and closing of a conversation in which the listener is oriented to the end of one discourse boundary and the beginning of the next. This has been noted by Sinclair and Coulthard (1975) who observed the frequent recurrence of a small set of words like right, well, good, OK, now in their analysis of classroom discourse (3). More explicit
markers like Let’s start now, Let’s discuss . . ., let me conclude . . . (4) are used in the student corpus to perform a similar function.

(3) OK. Now is there any clinical evidence that he might have occluded his graft? What sort of symptoms would he have had? (CANCODE)

(4) A-a let’s start : OK. Today we are going to discuss that what kind of toys is best < = > is for new best selling. Eh what do you think Eric? (student corpus)

Besides functioning on the interpersonal dimension to mark responses, all right/right (Example 5) is also used to signal a discourse boundary where a topic ends and another begins.

(5) All right to to start from here. Right. Erm could you just tell me a little bit about the er the name of the inst = er er the department that you work in. Er erm Oh right. (CANCODE)

Sometimes, a combination of different DMs with vocalization erm, as in Example (6), is used to frame the beginning of the topic.

(6) Right. Erm okay. Well let’s see if we can get our heads around the unfair Contract Terms Act then . . . (CANCODE)

Sequence

DMs are frequently used to signal the sequence of talk and signpost to the listener the logical sequence of segments of talk, for example firstly, secondly, thirdly, then, and then, etc. In Example (7), Firstly, then (line 6), thirdly, and and then are used to signal how the sequence of debate is organized and presented.

(7) Erm okay this is the basic structure. And we’ve got thirteen points. Mhm.
So < = > this is this is what we’ll do. Firstly introduce the speakers. Yes. Then introduce the topics of the debate and < = > the main the main topics. Er thirdly we’ll give the reasons for actually having the debate in the first place. part actually. Yeah. Yeah. Yeah. A = and then say why these Right. (CANCODE)

Transition of Topics

From the corpora, DMs like so, and what about, how about are found to signal the transition of a topic, marking the end of a topic and the beginning of another. As indicated in the conversation in excerpt (8), So, preceded by another DM, Okay, signals the end of the prior utterance regarding the findings on an angiogram of a patient, and it is used to project the discourse forward to a change of topic regarding the treatment of the stenosis. The tutor in excerpt (9) is inviting suggestions from the students for the kind of mock exam questions to be discussed. Here Now marks the transition of topic to how the questions they have previously raised can be answered.

(8) And on these views over here you can see that all three vessels go on down+ Mm. +but only one of them seems to cross the ankle joint and into the dorsal arch of the foot. Okay. So now the next thing we really want to know is is there any way we could treat that stenosis? (CANCODE)

(9) OK so you’re all happy with it. Now how are we going to approach it would anyone like to suggest a method? (CANCODE)


Summarizing Topics

Besides functioning on a referential plane to indicate a causal relationship and on a structural plane to indicate a transition of topic, so is also found to signal that the conversation has come to an end, prefacing a summary of the opinions offered as a conclusion. Very often it occurs in turn initial position as in (10).

(10) : So we < = > all we have discussed all the things. (student corpus)
: So have a conclusion? We’ll have the eh eh new game called Planet and the< = > it the method of playing is just like other Monopoly.
: Mm.
: And it’s suitable to all people. And then the price of toys is about $200 because we (student corpus)

Continuation of Topics

DMs are also found to be frequently exploited as continuers to provide the prior speaker with a conversational space to expand upon. Typical markers used by the native speakers for this purpose are and, then (above) and cos. Besides being commonly used to signal a causal relationship on a referential plane, cos (11) also functions as a continuer to indicate one’s intention to hold the floor as illustrated below.

(11) So what we were talking about last time was that can you can you how did you formulate your hypothesis? Where’s the hypothesis? Have you got it? Er Cos it’s not to do with transitive and intransitive. Oh God. (CANCODE)

Cognitive: Denoting Formulation and Reformulation

DMs provide information concerning cognitive processes. For example, well is frequently used as a delaying tactic (Svartvik, 1980: 171) to denote the thinking process when an answer is not immediately available in the speaker’s mind and to buy time for processes such as word searching and syntactic
completion. It also performs a local coherence function as a qualifier to patch up slight mishaps in the conversational flow (Tsui, 1994: 46). In Example (12), Speaker 2 seeks to answer the student’s question by framing the turn with Well, which is followed by a false start and pauses. The use of well can be viewed as a hesitation device to give the speaker time to plan and keep a turn in an interaction, a point that has been raised by Altenberg (1990). Similarly, I think, as another DM, also serves to denote the thinking process. In Example (13), the turn is initiated by a cluster of DMs, well and I think, which are surrounded by the vocalizations er erm and er er. It is observed that well occurs mostly in initial position, whereas I think is more flexible in its position.

(12) Well < = > I it looks like em pause “They are actually one specific senior positions” pause Er I mean I think that yes I would regard it as being existential there which < = > is in a sense em er it works the same way. (CANCODE)

(13) Er erm well I think that er er from a linguistic point of view there is er an interest in all kinds of what linguists call formal properties of words. (CANCODE)

Speakers in real speech are under time constraints to structure and formulate their ideas. DMs are therefore exploited to allow sufficient time for speakers to reformulate, rephrase, self-correct or repair their utterance. One common DM used for this purpose is I mean. The motivation for this device is to clarify reference or to indicate one’s stance retrospectively. It marks the speaker’s reformulation or modification of his/her prior ideas or intentions (Schiffrin, 1987).

Elaboration

Similarly, the DMs like and I mean are used to elaborate and modify the existing propositional meaning to make clear the intention of the speaker or to supplement the inadequacies of the meanings. Schiffrin (1987) claims that I mean is used to modify the speaker’s own ideas and intentions. Jucker and Smith (1998) categorize like as an information-centered presentation marker, and Müller (2005) observes like functioning on the textual level to mark an approximate number, introduce an exemplifier or an explanation, search for the appropriate expression and serve as a lexical focuser. In the following Example (14), like functions as an exemplifier to indicate Speaker A’s proposal to produce a Monopoly game.


(14) What what type of game+ Like the < = > Mono Monopoly game is popular in Hong Kong nowadays. +do you (student corpus)

Similarly, the following instances of I mean (15) are used to qualify or elaborate the information in the utterances they frame.

(15) Yes the last university I was at was very very free. Em you had to do some literary theory and you had to do the first year courses but er er I mean you could miss out Shakespeare you could do nothing but novels. Mm. I mean actually students didn’t but the course allowed you to do that I mean it was quite bizarre. (CANCODE)

Summary So Far

To summarize, the first section of analyses suggests that DMs, though they may appear as small words in conversations, can fulfill diverse discourse functions in the four multi-categorial realms as intimacy signals, boundary markers, connectors, confirmation seekers, turn takers, topic switchers, hesitation markers, repair markers and attitude markers. We are fully aware of the fact that a broad corpus-based study offers little scope for detailed analyses of individual markers; yet, it is hoped that the suggested distinctions underline the core functional paradigm that this range of discourse markers has in spoken discourse, which can in turn suggest a range of possibilities for metapragmatic instruction in the L2 classroom (see section 4.6 below).



The second part of the results section indicates a discrepancy in the production of DMs by the two groups of speakers, at least in relation to a British English norm. Indeed, it will be seen that many of the examples used for definition are drawn from CANCODE. The list of 23 DMs and the data are presented in Table 4.2. (These 23 words, which are closely associated with DMs, are labeled as DMs throughout the results section, though they obviously perform other grammatical functions.) Owing to the limitations of the present computer software in discriminating the discoursal role of
individual words, it is stressed that the words cited may carry other grammatical functions than those of DMs, and it is accepted that the overlapping functions of DMs, as discussed previously, sometimes make their classification difficult. Moreover, the list presented serves only as an illustration rather than as an exhaustive list of all those DMs that appear in CANCODE and the student corpus. However, the items are consistent with the observations of Carter and McCarthy (2006: chapter 3) and Carter et al. (2011) who, using CANCODE, identify the following DMs as ‘very common’ in spoken British English contexts: cos, like, right, so, I mean, I think, OK, you know and well. Based on the information suggested in Table 4.2, the frequencies of the same 23 DMs used in the student data were run (Table 4.3), and a simple mathematical subtraction was performed on the two columns in order to obtain a contrastive frequency of the two sets of markers.

Table 4.2 Frequency of discourse markers among the top 100 most frequent words [pedagogic sub-corpus in CANCODE] (460,055 words)

No.  Discourse Markers  Frequency     %
1.   And                   11,736  2.55
2.   So                     4,424  0.96
3.   Yeah                   4,118  0.90
4.   Right                  3,262  0.71
5.   But                    3,152  0.69
6.   Or                     2,133  0.46
7.   Just                   1,988  0.43
8.   Okay                   1,865  0.41
9.   Like                   1,822  0.40
10.  You know               1,659  0.38
11.  Well                   1,637  0.36
12.  Because                1,496  0.33
13.  Now                    1,471  0.32
14.  Yes                    1,250  0.27
15.  Sort of                1,172  0.25
16.  See                    1,141  0.25
17.  I think                1,060  0.23
18.  I mean                   922  0.20
19.  Say                      920  0.20
20.  Actually                 867  0.19
21.  Oh                       819  0.18
22.  Really                   745  0.16
23.  Cos                      627  0.14

Table 4.3 Frequency of discourse markers in student corpus (14,157 words)
[Columns: Discourse Markers; Student Data; %. Legible entries: I think; You know; I mean; Sort of; 67 (60/7); 55 (7/48).]

After a careful
study of the figures, a contrastive frequency of ±0.14 was chosen as the cut-off point for the three categories: if the contrastive frequency is +0.15 or above, the representation is regarded as more frequent. If the contrastive frequency is -0.15 or below, then the representation is regarded as less frequent. Since there is no occurrence of cos in the student data, it can be categorized in the ‘less frequent’ column, although its contrastive frequency is only -0.14. Hence, it was decided that if the figure falls within the range between -0.14 and +0.14, the representation of DMs is regarded as comparable. A positive difference in contrastive frequency means the DM is used more frequently in the student data, whereas a negative difference in contrastive frequency means it is used less frequently in the student data. Additionally, log likelihood and chi-square tests were undertaken (Table 4.4) in order to provide a further statistical account of significance in the use of DMs. As Table 4.4 indicates, items that are in bold were found to be


Table 4.4 Further statistical measures

Discourse Markers 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

And But I think Yes So Like Because Yeah/yeh Or Okay/OK Just Oh You know I mean Now See Really Say Sort of Well Right Actually Cos

CANCODE Corpus (460,055 words)

Student corpus (14,157 words)

Chi-square Df=1

Log likelihood



X2=42.48; p