The Prosody of Formulaic Sequences: A Corpus and Discourse Approach 9781441181152, 9781474205627, 9781441132512

To apply the same approaches to analysing spoken and written formulaic language is problematic; to do so masks the fact


English Pages [248] Year 2018


Table of contents :
Cover
Half-title
Title
Copyright
Contents
List of Illustrations
Acknowledgements
1. Introduction
2. Formulaic Language: An Overview
3. Can We Identify Formulaic Language Based on Prosodic Cues?
4. Study One: Do Formulaic Sequences Align with Intonation Units?
5. Study Two: A Comprehensive Profile of the Intonation, Stress and Rhythm of Formulaic Language
6. Study Three: A Multimodal Approach to the Identification of Formulaic Language by Native Speaker Judgement
7. Conclusions: The Prosody of Formulaic Language
Appendix 1: Information on the Pilot Study
Appendix 2: The Instructions Used in Various Sessions of the Pilot Study
Appendix 3: Screenshots of the Interactive PowerPoint Interface with Integrated Audio Playback
Appendix 4: The Task Booklet for the Native Speaker Judgement Process
Appendix 5: Instructions for Prosodic Transcribers
Appendix 6: Prosodic Transcription of Text W
Appendix 7: Formulaic Sequences Assigned the Maximum Confidence Score in Text W
Appendix 8: The Duration of the Task for each Native Speaker Judge
Notes
Bibliography
Author Index
Subject Index


The Prosody of Formulaic Sequences

Corpus and Discourse

Series editors: Wolfgang Teubert, University of Birmingham, and Michaela Mahlberg, University of Nottingham

Editorial Board: Paul Baker (Lancaster), Frantisek Čermák (Prague), Susan Conrad (Portland), Dominique Maingueneau (Paris XII), Christian Mair (Freiburg), Alan Partington (Bologna), Elena Tognini-Bonelli (Siena and TWC), Ruth Wodak (Lancaster), Feng Zhiwei (Beijing).

Corpus linguistics provides the methodology to extract meaning from texts. Taking as its starting point the fact that language is not a mirror of reality but lets us share what we know, believe and think about reality, it focuses on language as a social phenomenon, and makes visible the attitudes and beliefs expressed by the members of a discourse community. Consisting of both spoken and written language, discourse always has historical, social, functional and regional dimensions. Discourse can be monolingual or multilingual, interconnected by translations. Discourse is where language and social studies meet.

The Corpus and Discourse series consists of two strands. The first, Research in Corpus and Discourse, features innovative contributions to various aspects of corpus linguistics and a wide range of applications, from language technology via the teaching of a second language to a history of mentalities. The second strand, Studies in Corpus and Discourse, is comprised of key texts bridging the gap between social studies and linguistics. Although equally academically rigorous, this strand will be aimed at a wider audience of academics and postgraduate students working in both disciplines.

Research in Corpus and Discourse

Conversation in Context: A Corpus-driven Approach. With a preface by Michael McCarthy. Christoph Rühlemann
Corpus-Based Approaches to English Language Teaching. Edited by Mari Carmen Campoy, Begona Bellés-Fortuno and Ma Lluïsa Gea-Valor
Corpus Linguistics and World Englishes: An Analysis of Xhosa English. Vivian de Klerk
Evaluation and Stance in War News: A Linguistic Analysis of American, British and Italian television news reporting of the 2003 Iraqi war. Edited by Louann Haarman and Linda Lombardo
Evaluation in Media Discourse: Analysis of a Newspaper Corpus. Monika Bednarek
Historical Corpus Stylistics: Media, Technology and Change. Patrick Studer
Idioms and Collocations: Corpus-based Linguistic and Lexicographic Studies. Edited by Christiane Fellbaum
Investigating Adolescent Health Communication: A Corpus Linguistics Approach. Kevin Harvey
Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora. Edited by Geoff Barnbrook, Pernilla Danielsson and Michaela Mahlberg
Multimodality and Active Listenership: A Corpus Approach. Dawn Knight
New Trends in Corpora and Language Learning. Edited by Ana Frankenberg-Garcia, Lynne Flowerdew and Guy Aston
Representation of the British Suffrage Movement. Kat Gupta
Rethinking Idiomaticity: A Usage-based Approach. Stefanie Wulff
Corpus Linguistic Contrastive Semantic Analysis. Ruihua Zhang
Working with Spanish Corpora. Edited by Giovanni Parodi

Studies in Corpus and Discourse

Corpus Linguistics in Literary Analysis: Jane Austen and Her Contemporaries. Bettina Fischer-Starcke
English Collocation Studies: The OSTI Report. John Sinclair, Susan Jones and Robert Daley. Edited by Ramesh Krishnamurthy. With an introduction by Wolfgang Teubert
Text, Discourse, and Corpora. Theory and Analysis. Michael Hoey, Michaela Mahlberg, Michael Stubbs and Wolfgang Teubert. With an introduction by John Sinclair
Web As Corpus: Theory and Practice. Maristella Gatto

The Prosody of Formulaic Sequences A Corpus and Discourse Approach Phoebe M. S. Lin

BLOOMSBURY ACADEMIC
Bloomsbury Publishing Plc
50 Bedford Square, London, WC1B 3DP, UK
1385 Broadway, New York, NY 10018, USA

BLOOMSBURY, BLOOMSBURY ACADEMIC and the Diana logo are trademarks of Bloomsbury Publishing Plc

First published in Great Britain 2018
Paperback edition published 2020

Copyright © Phoebe M. S. Lin, 2018

Phoebe M. S. Lin has asserted her right under the Copyright, Designs and Patents Act, 1988, to be identified as Author of this work. For legal purposes the Acknowledgements on p. x constitute an extension of this copyright page.

Cover image © iStock/RadomanDurkovic

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage or retrieval system, without prior permission in writing from the publishers.

Bloomsbury Publishing Plc does not have any control over, or responsibility for, any third-party websites referred to or in this book. All internet addresses given in this book were correct at the time of going to press. The author and publisher regret any inconvenience caused if addresses have changed or sites have ceased to exist, but can accept no responsibility for any such changes.

A catalogue record for this book is available from the British Library.

Library of Congress Cataloging-in-Publication Data
Names: Lin, Phoebe M. S. (Phoebe Min sum Lin), author.
Title: The prosody of formulaic sequences: corpus and discourse / Phoebe M.S. Lin.
Description: London: Bloomsbury Academic, [2018] | Series: Corpus and discourse | Includes bibliographical references and index.
Identifiers: LCCN 2017055772 (print) | LCCN 2017060063 (ebook) | ISBN 9781441132512 (ePDF) | ISBN 9781441100856 (ePub) | ISBN 9781441181152 (hardcover)
Subjects: LCSH: Prosodic analysis (Linguistics) | Oral-formulaic analysis. | Linguistic analysis (Linguistics) | Linguistic models. | BISAC: Language arts & discipline / linguistics / general.
Classification: LCC P224 (ebook) | LCC P224 .L36 2018 (print) | DDC 414/.6–dc23
LC record available at https://lccn.loc.gov/2017055772

ISBN: HB: 978-1-4411-8115-2
PB: 978-1-3501-5530-5
ePDF: 978-1-4411-3251-2
eBook: 978-1-4411-0085-6

Series: Corpus and Discourse

Typeset by Deanta Global Publishing Services, Chennai, India

To find out more about our authors and books visit www.bloomsbury.com and sign up for our newsletters.


List of Illustrations

Figure 2.1 Altenberg’s (1998) categorization of spoken English phraseology based on the London-Lund Corpus of Spoken English
Figure 2.2 The results of an attempt to take the exhaustive approach with corpus-based automatic extraction
Figure 4.1 The four possible outcomes concerning the alignment of formulaic sequence boundaries with IU boundaries
Figure 4.2 The waveform and pitch changes of I don’t know why in Example 4
Figure 4.3 The waveform and pitch changes of I don’t know why in Example 5
Figure 4.4 An overview of the discourse structure of Text W
Figure 5.1 Measures of the fundamental frequency over time of Text W
Figure 6.1 Comparison of the judges’ formulaicity judgements using an Excel spreadsheet

Table 2.1 Size of the datasets of formulaic language studies that use native speaker judgement
Table 2.2 Definitions of formulaic language in empirical studies that involve native speaker judgement as a formulaic language identification method
Table 3.1 An overview of the studies on the prosody of formulaic language since the 1970s
Table 3.2 A hierarchy of prosodic discontinuities (Knowles, 1991, p. 153)
Table 3.3 A summary of relevant quotes about the claim that formulaic language forms single intonation units
Table 3.4 A summary of selected observations in Wells (2006) which show the peculiarity of stress placement in idiomatic language
Table 4.1 The most frequent clusters in NICLEs-CHN and their frequencies
Table 4.2 A summary of acoustic markers associated with intonation group boundaries
Table 4.3 Examples of I don’t know why as sentence stem, comment clause and disclaimer in NICLEs-CHN
Table 4.4 The alignment between intonation unit boundaries and instances of I don’t know why of different function categories
Table 4.5 Distribution of the sixty-two formulaic sequences assigned the maximum confidence score
Table 4.6 The distribution of formulaic sequences of confidence scores 1 to 5 in Text W
Table 4.7 An overview of the alignment of formulaic sequences with intonation unit boundaries in Text W
Table 5.1 The distribution of pauses around selected formulaic sequences
Table 5.2 The distribution of stress in the target formulaic sequences, Text W and SEC sub-corpus
Table 5.3 The distribution of stress in formulaic sequences, Text W and SEC lecture sub-corpus
Table 5.4 Stressed/unstressed ratios of lexical and function words for the three groups: formulaic sequences, Text W and SEC lectures
Table 6.1 Landis and Koch’s (1977, p. 165) guidelines on the interpretation of the Cohen’s Kappa coefficient
Table 6.2 Inter-judge reliability (Cohen’s Kappa) of the judgement data BEFORE the thirty judges listen to the audio recordings
Table 6.3 Inter-judge reliability (Cohen’s Kappa) of the judgement data AFTER the thirty judges listen to the audio recordings
Table 6.4 The number of judges each individual judge agrees with at κ ≥ 0.2, κ ≥ 0.3 and κ ≥ 0.4 levels

Acknowledgements

The research that led to this book began in 2007 when, after reading Wray’s Formulaic Language and the Lexicon, I became interested in the notion that formulaic sequences could be validated based on prosodic cues. Like many other researchers of formulaic language, I felt frustrated by the lack of robust methods for identifying and extracting formulaic language from naturally occurring texts. As an umbrella term, formulaic language refers to many types of lexicalized word combinations that display, to varying extents, features such as formal fixedness, semantic transparency, familiarity to the discourse community and frequency of occurrence in language corpora. To find a method capable of extracting or validating all types of formulaic language seemed impossible, until a group of researchers (e.g. Aijmer, 1996; Peters, 1977, 1983; Baker and McCarthy, 1988; Wray, 2002, 2004) proposed the idea that it might be possible to validate the formulaicity of word sequences based on their prosodic features. The suggestion that formulaicity can be validated using prosodic evidence is very attractive because this method could be applicable to all subtypes of spoken formulaic sequences. Some researchers have gone even further, suggesting that prosodic cues may reveal the degree of fixedness of word sequences. If this suggestion is found to be valid, then the fundamental issue of extracting or validating formulaic sequences, which has long hindered the development of formulaic language research, could finally be overcome.

This book presents three original studies which empirically assess the hypothesis that formulaic language can be identified based on prosodic cues. These three studies examined the hypothesis from different angles. They also reflect a progression in the depth of our understanding of how this method, known as the phonological method, can be applied to the analysis of naturally occurring data.
Study one examined whether formulaic language could be identified by tracking intonation unit boundaries. The results suggested that, although the tracking of intonation unit boundaries alone is not sufficient to identify formulaic language in the spontaneous speech of native speakers and proficient learners, it may give some indication as to the level of formulaicity of word sequences.


Study two considered whether formulaic language can be identified by prosodic cues, particularly tempo and stress placement. As a first step in this direction, the study aimed to establish, empirically, whether formulaic language demonstrates unique temporal and stress patterns. Samples of formulaic sequences taken from an academic lecture extract were analysed in terms of their temporal and stress patterns. Among other observations, it was found that words within formulaic sequences were markedly less likely to attract stress than words in general lecture speech. Furthermore, speakers may slow down their articulation rate in order to draw attention to the meaning of individual formulaic sequences.

Study three adopted an alternative interpretation of the phonological method in the identification of formulaic language. It asked whether allowing judges to listen to the prosody of formulaic sequences could reduce the subjectivity of their formulaicity judgements and increase the level of agreement between judges. The results of this study provided an affirmative answer to this question and, at the same time, revealed the mechanism through which hearing the audio recording improved the use of collective native speaker judgement as a formulaic language identification method. These results show that, while the search for the prosodic cues unique to formulaic language should continue, the replacement of textual speech transcripts with multimodal transcripts in the process of formulaic language identification by collective native speaker judgement is an important alternative means of realizing the phonological method.

In this book, the three studies are arranged chronologically according to the order in which they were conducted. As will become clear, the findings and observations made in the first study influenced the approach and methodological decisions taken in the second. A similar link exists between the second and the third study.
This progression in approach and methodology throughout these three studies reflects how my own ongoing interest in the exploration and understanding of the prosody of formulaic language has progressed over the past decade. Therefore, this book may be considered as an interim report of my findings on formulaic language prosody so far. It is hoped that this book will stimulate a greater interest in the topic among researchers of formulaic language and applied linguistics. This is important, as further empirical studies will be necessary to develop a more accurate, comprehensive and in-depth understanding of the prosody of formulaic language. Finally, I would like to thank the great number of organizations and people who have offered their generous support for the research that has formed the basis of this book. Preparation of this book was supported, in part, by a research grant from the Hong Kong Research Grants Council (Project number: 25612116) and


an internal research grant from the Hong Kong Polytechnic University (Project code: 1-ZVET). Thanks are due also to the University of Nottingham School of English Studies for giving me access to the corpora used in studies one and two. During the preparation of this book, I have benefited tremendously from important discussions with many helpful colleagues and friends. I am indebted to Svenja Adolphs, who I was very privileged to have as my PhD supervisor. Her determination, drive and energy have been infectious. In fact, it was Svenja’s passion for applying innovative technologies to applied linguistics that inspired my own interest and my latest approach to the prosody of formulaic sequences, which taps into the web itself as a dynamic multimodal corpus. I am grateful to Ronald Carter, whose wise words about the fundamental importance of spoken versus written communication and the prosodic salience of formulaic sequences have had a profound impact on all of my research projects. His guidance and advice have shaped my research, and his wisdom has been a constant influence on my teaching. Thanks also go to Jean Hudson, Alison Wray, Diana Van Lancker Sidtis, Norbert Schmitt, Peter Skehan, Zoltán Dörnyei, William S-Y Wang, Bit-Chee Kwok, Philip Durant, Anna Siyanova-Chanturia, Keiko Tsuchiya, Steve Kirk, Ronald Martinez, Ana Pellicer-Sanchez, Penny Ding, Tatsuya Taguchi, Irina Dahlmann, Christine Yu and Letty Chan for their invaluable comments and suggestions relating to various aspects of my research on formulaic language. I should emphasize, however, that they do not necessarily endorse the views expressed in this book, and any inaccuracies or misunderstandings found within it are my responsibility. I am indebted to the series editors Wolfgang Teubert and Michaela Mahlberg and the editorial team at Bloomsbury for their continuous and valuable support.
Special thanks go to Georgina Phillips who has been a great help in preparing the final version of this manuscript for publication. This book is dedicated to my parents whose unfailing support has been essential in allowing me to pursue my studies in applied linguistics. They have given up so much so that I could indulge myself and be free to pursue my own interests in the academic world. I hope they will be proud of my work.

1. Introduction

Formulaic language, which is ‘any sequence of two or more words that are perceived to be more constrained than usual in their co-occurrence’ (Hudson and Wiktorsson, 2009, p. 81), is ubiquitous in everyday language. In the past two decades, many corpus-based studies have been conducted to reveal the use of formulaic language in spontaneous speech (e.g. Altenberg, 1998; Altenberg and Eeg-Olofsson, 1990; Biber, 2006; Biber and Barbieri, 2007; De Cock, 1998, 2000, 2007; Simpson, 2004). A very popular approach has been to examine highly recurrent, contiguous sequences of word forms extracted from a spoken corpus using automatic extraction tools like WordSmith (Scott, 2012). This approach has revealed many important findings; however, treating spoken and written data in the same way may mask the fact that meaning in spoken language is, to a great extent, encoded in speech prosody. Words only hold part of the overall contextual meaning of an utterance. To fully understand meaning in context, one must also consider the prosody of the utterance. The English cliché ‘It’s not what you say, but how you say it’ captures the essence of this argument. Nonetheless, many spoken corpora (with the notable exceptions of the Spoken English Corpus and the London-Lund Corpus of Spoken English) still do not provide prosodic information either through prosodic transcriptions or through audio streams synchronized with the transcripts. This explains, in part, why researchers in the past often treated spoken formulaic language and written formulaic language in the same fashion. If it is found that there is a particular way in which formulaic language should be spoken, the impact of such a discovery could be tremendous since it concerns not only the teaching and learning of formulaic language, but also formulaic language research methodology.
In relation to language teaching, researchers including Pawley and Syder (1983), Wray (2002) and Wood (2012) have long argued that if learners want to achieve native-like fluency and proficiency in English as a second or foreign language they need to have a similarly sized


store of formulaic language as native speakers. A very important point that has been overlooked so far is that every entry of formulaic language in the mental lexicon must contain the phonological form of the formulaic language in addition to its orthographic, structural and semantic information (see also Lin, 2012, 2018). Therefore, to be able to use formulaic language effectively, English language learners must also master the prosody of formulaic language. If there is a particular prosody with which formulaic language is said, English language learners should be aware of it and teachers should likewise consider teaching it. English prosody, however, has not, until now, been a priority in the English language teaching (ELT) syllabus. This reflects a lack of awareness of the importance of prosody in conveying (and contradicting) meaning. The work of phonologists including Bolinger (1989), Crystal (1969, 2003), Halliday (1967) and Wennerstrom (1994, 1998, 2001, 2006) has shown that prosody is a complex system that reflects emotion, contextual meaning, grammatical structure and cohesion. Aijmer (1996), for example, discussed the distribution of nuclear tones in the multi-word formula thank you and how tone choice makes a difference to the emotional weight of the phrase (see also Wells, 2006, Section 2.19). One can imagine learners learning to use thank you but without an understanding of the tones required to add their implied meaning to the formula, thus impeding their ability to communicate the meaning fully. Likewise, Cowie (1988, p. 134) used the example of do you know as in the utterance Do you know, he was still in bed! to illustrate the argument that intonation is an indispensable part of knowledge in the successful use of formulaic language:

In order to use this formula successfully, the speaker requires knowledge of invariant form, syntactic position (initial rather than final) and intonation (fall-rise on know).

In the field of formulaic language research, investigations into the prosody of formulaic language are also likely to have a profound impact on research methodology. Nowadays corpus-based automatic extraction is the most commonly used identification method of formulaic language. This identification method makes it possible for researchers to look at patterns of formulaic language use in a language corpus on a large scale. However, the key problem, as mentioned earlier, is that it treats spoken and written formulaic language in the same fashion, due to the neglect of phonological evidence (see also Section 2.3.1). In fact, many researchers (e.g. Aijmer, 1996; Altenberg and Eeg-Olofsson, 1990; Baker and McCarthy, 1988; Bloom, 1973; Hickey, 1993; Peters, 1977, 1983; Plunkett, 1990; Weinert, 1995; Wray, 2002, 2004; Wray and Namba, 2003)


have suggested that formulaic language can be identified by its phonological and prosodic features. These prosodic features that are particularly associated with formulaic language will be reviewed in Chapter 3. The key point to be noted here however is that, so far, no empirical study has been conducted that has comprehensively investigated the hypothesis that formulaic language can be identified by its unique prosodic features (see Section 3.3). If phonological and prosodic cues are to assist in the identification of formulaic language in adult speech, an investigation into the prosodic features of formulaic language is a prerequisite. It is hoped that this will be achieved through study one and study two.
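The limitation of corpus-based automatic extraction noted above can be made concrete with a minimal sketch, which is an illustration only and not the procedure used in this book or in WordSmith. It counts recurrent contiguous word sequences in a transcript; the function name `recurrent_sequences`, the frequency threshold and the toy transcript are all hypothetical. Note that the only input is a list of orthographic words, so no prosodic evidence can enter the result.

```python
from collections import Counter


def recurrent_sequences(tokens, n=3, min_freq=2):
    """Count contiguous n-word sequences and keep those occurring at least min_freq times."""
    counts = Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    return {" ".join(gram): freq for gram, freq in counts.items() if freq >= min_freq}


# Invented toy transcript (hypothetical; not drawn from any corpus discussed here).
transcript = (
    "i don't know why he left but i don't know why she stayed "
    "the thing is i don't know"
).split()

clusters = recurrent_sequences(transcript, n=3, min_freq=2)
print(clusters)
```

A spoken and a written version of the same word string would yield identical clusters under this approach, which is the sense in which the two are treated in the same fashion.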

1.1 Is there a prosody of formulaic language?

The previous section presented the potential impact on ELT and on formulaic language research methodology of discovering that formulaic language has a distinctive prosody. The key question being asked here is whether there is a prosody specific to formulaic language that is different from the prosody of spoken English. Phonologists including Ashby (2006) and Wells (2006) have provided an affirmative answer to this question. Ashby (2006) gives the example of to have eyes in the back of one’s head to illustrate how the prosody of idioms differs from the prosodic rules governing general spoken language. When this idiom is used in context as an utterance to describe a person who is very vigilant about everything around him or her (see (1) below), it must be delivered with stress or pitch prominence on the word head as in (1a). However, general prosodic rules should predict (1b) or (1c) instead because, given that everybody has eyes in the front of their head, the ‘surprising’ and ‘new’ information in the utterance that should be highlighted prosodically is eyes or back. However, it is clear that neither (1b) nor (1c) is appropriate because they convey to the hearer that a woman ‘literally’ has eyes in the back of her head.

(1) She has eyes in the back of her head.
(1a) She has eyes in the back of her HEAD.
*(1b) She has EYES in the back of her head.
*(1c) She has eyes in the BACK of her head.

In the book English Intonation: An Introduction, Wells (2006) comprehensively presents the rules governing the prosody of spoken English. In presenting the


prosodic rules, he notes some exceptional cases which cannot be explained by the overarching rules. Because no logical explanation can be found for these cases, he declares these to be ‘idiomatic cases’, as the two quotes below clearly show:

Some instances of a speaker accenting repeated words do not seem to have a logical explanation, and must be regarded as idiomatic. (Wells, 2006, p. 179)

and

Rather than seek a logical explanation for this tonicity, perhaps we should regard such cases as merely idiomatic. (Wells, 2006, p. 181)

Examples of idiomatic cases include accenting the ‘empty word’ some in what Wells calls ‘idiomatic expressions’ such as for SOME reason, in SOME cases and SOME days (see Wells, 2006, p. 150), and the accenting of the verb to be in what he calls ‘intonational idioms’ such as the trouble IS, the thing IS, the difficulty IS and the snag IS (see Wells, 2006, p. 146). Other examples of idiomatic cases provided by Wells (2006) are summarized in Table 3.4. It is clear that from the phonologists’ (i.e. Ashby, 2006; Wells, 2006) perspective, a prosody of formulaic language exists that cannot be explained by general English prosodic rules. This is true at least in terms of stress placement. However, going beyond the work of Ashby (2006) and Wells (2006), it can be seen that the distinctive prosody of formulaic language in fact also extends to the assignment of intonation breaks or pauses. Utterance (2), below, is a concordance line taken from the British National Corpus. Any proficient speaker of English will recognize that there are only two options as to where to assign a break or a pause within the utterance, as in (2a) and (2b). In both cases, the ‘legitimate’ locations for the breaks and pauses are right at the boundaries of formulaic sequences, that is, ‘the thing is’, ‘organic produce’ and ‘isn’t cheap’, but not elsewhere. In fact, the break after ‘the thing is’ appears to be obligatory, as (2c), even if it exists, would be extremely rare. The break after ‘the thing is’ is interesting because, if syntactic rules governed the prosody of formulaic language just as they govern the prosody of general spoken English, we would expect a break between the subject and the predicate, as in (2d), where ‘The thing’ is the subject and ‘is organic produce isn’t cheap’ is the predicate.

(2) The thing is organic produce isn’t cheap.
(2a) The thing is, organic produce isn’t cheap.
(2b) The thing is, organic produce, isn’t cheap.
*(2c) The thing is organic produce, isn’t cheap.
*(2d) The thing, is organic produce isn’t cheap.
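The observation that breaks fall at the boundaries of formulaic sequences can be expressed as a simple check. The following is a minimal sketch under stated assumptions, not a method from this book: the function name `breaks_inside_sequences`, the token-index representation of breaks and the hand-assigned spans are all hypothetical. It flags only breaks that interrupt a sequence, as in (2d); the missing obligatory break in (2c) would need a separate rule and is left out.

```python
def breaks_inside_sequences(breaks, spans):
    """Return the breaks that fall strictly inside a formulaic sequence.

    A break is given as the index of the token it follows; a span is an
    inclusive (start, end) pair of token indices.
    """
    return [b for b in breaks if any(start <= b < end for start, end in spans)]


tokens = "the thing is organic produce isn't cheap".split()
# Hand-assigned spans following the text: the thing is / organic produce / isn't cheap
spans = [(0, 2), (3, 4), (5, 6)]

variants = {
    "2a": [2],     # The thing is, organic produce isn't cheap.
    "2b": [2, 4],  # The thing is, organic produce, isn't cheap.
    "2d": [1],     # The thing, is organic produce isn't cheap.
}
violations = {
    label: breaks_inside_sequences(positions, spans)
    for label, positions in variants.items()
}
print(violations)
```

Representing a break by the token it follows keeps the comparison with span boundaries to a single inequality; a break after the last token of a span counts as a boundary break, not an interruption.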


Based on the accounts of the phonologists, it is clear that there is no shortage of examples in which no logical explanation can be found for the prosodic features observed other than to regard such cases as ‘idiomatic’. However, attention must be drawn to the fact that the discussions provided by phonologists have often been based on anecdotal and introspective data. Instead of putting the prosody of formulaic language at the centre of study, phonologists have a tendency to leave idiomaticity at the periphery and use the concept only as an explanation for ‘exceptional cases’ which general English prosodic rules cannot explain. From the perspective of formulaic language researchers, the phonologists’ approach is insufficient. First, formulaic language research requires an objective definition or a system by which the formulaicity of word sequences can be validated. However, the phonologists’ identification of formulaic language may appear ad hoc. Secondly, phonologists’ introspection cannot replace the empirical observations of the prosodic features of formulaic language based on real, spontaneous speech data. Thirdly, the introspective approach makes it difficult to establish the scale of the phenomenon of prosodic fixedness. On this basis, there is a great need for empirical investigations into the prosodic features of formulaic language which adopt a systematic approach not only to the prosodic description but also to the identification of formulaic language. This need is fulfilled in the three empirical studies presented here.

1.2 The three empirical studies

This book reports three empirical studies into the prosody of formulaic language. Before introducing the three studies, it is necessary to point out that they represent how my thinking about the overarching question, can we identify formulaic language based on prosodic cues?, has changed over time. To address this overarching research question, the original plan was to examine empirically whether formulaic language can be identified based on tracking intonation unit boundaries. This was what study one of the book accomplished. However, in examining the literature and the results of study one, it became obvious to me that the idea that formulaic language could be identified by tracking intonation unit boundaries alone might be too simplistic and one-dimensional in two respects. First, it is possible that formulaic language could be identified on the basis of other prosodic cues such as stress and rhythmic patterns in addition to intonation. Secondly, the division of spontaneous speech into intonation units is affected by a series of globally and locally managed


contextual factors such as stylistic effects, cognitive resources and semantic focus; formulaicity is only one of the variables involved. With the aim of expanding the scope of the exploration of the overarching research question, study two was designed as an extension to study one. The aim of this study was to determine more precisely if formulaic language demonstrates any unique patterns in terms of stress placement and rhythm that may ultimately lead to the identification of formulaic language in spontaneous speech. In considering the use of prosodic cues to identify formulaic language in studies one and two, I realized that my exploration of the overarching research question had been particularly influenced by an unstated assumption in the literature about how the phonological method would work. This unstated assumption is that the tracking of a finite and unique set of prosodic cues should reveal the extent to which a word sequence is formulaic. This assumption influenced the design of studies one and two. However, this is only one way of interpreting the significance of the phonological method in formulaic language research. An alternative interpretation exists, inspired by the recent trend towards multimodality in corpus linguistics. In thinking about the practicalities of data collection for this book, I discovered that in the past the native speaker judgement method of identifying formulaic language was conducted on textual transcripts without the provision of accompanying audio recordings. Omitting the audio component in the process of native speaker judgement is problematic because the native speaker judges have no access to meaning that is encoded in prosody. Furthermore, if formulaic language can be identified by tracking a finite and unique set of prosodic cues, it is necessary for the native speaker judges to be able to listen to the prosody of formulaic language in the identification process.
That is why this alternative interpretation of the phonological method adopts a multimodal approach in the process of native speaker judgement, under which native speaker judges are allowed to listen to the original recordings of the spoken texts alongside the transcripts in which they are required to identify formulaic language. This alternative phonological method was what study three aimed to test. All in all, the three empirical studies were designed to explore, from different angles, the question of whether and how formulaic language can be identified using prosodic cues. Furthermore, studies one to three represent how my consideration of the overarching research question developed over time. The starting point, in study one, is the simple belief that formulaic language can be identified by tracking intonation units; then study two expands the exploration


by considering the stress and rhythmic patterns of formulaic language; finally, study three reflects the realization that an alternative interpretation of the phonological method can be adopted, in which native speaker judges are allowed to listen to the audio recordings of the transcript in the process of formulaic language identification. This alternative interpretation of the phonological method, however, should not be considered a replacement for the original interpretation reflected in studies one and two. Both are necessary.

While it is essential to discuss the overarching question that links together the three empirical studies, each study addressed very specific research questions about the use of prosodic cues in the identification of formulaic language. To be precise, they were designed to answer research questions one, two and three respectively.

1. To what extent do formulaic sequences align with intonation unit boundaries
   a) in the fluent, spontaneous speech of native speakers and proficient English-as-a-Foreign-Language (EFL) learners;
   b) when the formulaic sequences are identified by automatic extraction or by collective native speaker judgement; and
   c) when the formulaic sequences are classified according to how confident the native speaker judges are of their formulaicity judgements?
2. Does formulaic language demonstrate any unique patterns in terms of tempo/rhythm and stress placement?
3. Will listening to the audio recordings in the process of native speaker judgement increase the level of agreement among the judges in their formulaicity judgements?

Study one investigated the extent to which formulaic sequences aligned with intonation unit boundaries. This investigation was motivated by the prospect that formulaic language could be identified based on its prosodic cues.
Dechert (1983) and Raupach (1984) observe that formulaic sequences in the speech of language learners are delineated by pauses and hesitation phenomena. However, this observation is likely to apply only to the dysfluent speech of novice language learners; it cannot be extended to the fluent speech of adult native speakers and proficient language learners (see Sections 3.3.2 and 4.1 for an elaboration). Instead, the literature suggests that,


in the fluent speech of adult native speakers and proficient language learners, intonation unit boundaries would be found more readily at the boundaries of formulaic sequences. Considering the impact of the identification method on the prosodic description of formulaic language (see Section 2.3 for further discussion), both of the mainstream formulaic language identification methods, namely corpus-based automatic extraction and native speaker judgement, were used in study one. Because there were two sets of factors affecting this study (automatic extraction/native speaker judgement and native speaker speech/proficient English learner speech), a 2 × 2 design would have been expected. However, such a research design was not adopted due to the nature of native speaker judgement as an identification method. As will be discussed in Section 2.3.2, the process of native speaker judgement emphasizes the use of native speakers because these people have had extensive exposure to their native language and the formulaic language particular to their speech community. Based on this logic, it is only appropriate to ask a native speaker of English to judge the formulaicity of spoken data from fellow native speakers of English, but not data from the proficient English learners (see Section 2.3.2). Due to this restriction concerning the use of native speaker judgement, the only option was to match the method of native speaker judgement with the native speaker spoken data, and the method of corpus-based automatic extraction with the proficient English learner spoken data. The English learner spoken data that informed study one came from a 230,000-word sub-corpus of the Nottingham International Corpus of Learner English compiled by the University of Nottingham (see Dahlmann, 2009 and Section 4.2 for further information about this corpus).
The native speaker spoken data came from an academic lecture collected in the Nottingham Multimodal Corpus that was also compiled by the University of Nottingham (Knight, 2009; Knight, Evans, Carter and Adolphs, 2009). Many corpora were considered but these two corpora in particular were chosen because of the availability of the original audio recordings (see Section 4.6 for further information). In these studies, the potential of native speaker judgement was fully exploited as the native speakers were asked not only to identify formulaic language but also to indicate, on a scale of 1 to 5, how confident they were in their formulaicity judgement for each individual formulaic sequence (see Section 4.7.2 for details). The main goals of this set-up were as follows: (1) to explore whether the level of alignment between formulaic language and intonation unit boundaries increased with the level of confidence the judges had in their judgement (for study one); and (2) to help select formulaic sequences for detailed prosodic analysis (for


study two). This formulaicity judgement scoring system also served as a measure for improving the method of native speaker judgement by addressing ‘the human factor’. It was thought that the judges might find the formulaicity judgement task easier if they were given the opportunity to express differences in the level of confidence they had with the formulaicity judgement of certain word sequences (see Section 2.3.3 for a detailed discussion). Study two explored whether formulaic language in adult native speaker speech demonstrated any unique patterns in terms of rhythm and stress placement. It examined the rhythm and stress placement within sequences identified in the university lecture data using native speaker judgement in study one. As mentioned above, this study was motivated by the need to broaden the basis of the phonological method by considering whether formulaic language could be identified by tracking other unique rhythm and stress placement patterns in addition to intonation units. To increase the credibility of the prosodic investigation, external English prosody experts were asked to provide a prosodic transcription of the lecture extract independently without knowing which formulaic sequences would be analysed in study two. The formulaic sequences that had previously been assigned the highest formulaicity judgement scores were then mapped onto the prosodic transcriptions provided by the experts. To put the prosodic analysis of the formulaic language in the lecture data into perspective, an academic lecture sub-corpus from the Spoken English Corpus (SEC) compiled by Lancaster University (Knowles, Wichmann and Alderson, 1996; Knowles, Williams and Taylor, 1996) was introduced to enable comparison. The aim of this comparison was to reveal whether the formulaic language in the lecture data demonstrated any rhythm and stress placement patterns that were not found in the SEC academic lecture sub-corpus.
Together, the three empirical studies provide a deeper understanding of whether and how formulaic language can be identified using prosodic cues. In the following chapters, the details of these studies, their significance and their implications for formulaic language and other research disciplines will be provided.
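Research question one, including its confidence-score component, can be made concrete with a small sketch. The Python below is illustrative only and is not the procedure used in the studies. It assumes a hypothetical representation in which each formulaic sequence is a (start, end, confidence) tuple of word indices, and in which intonation unit boundaries are stored as the set of word indices where a unit begins or ends. The function names and the toy data are invented for the example, and treating a sequence as aligned only when both of its edges coincide with a boundary is one possible operationalization among several.

```python
from collections import defaultdict


def is_aligned(start, end, boundaries):
    """True if a sequence spanning words[start:end] begins and ends at a boundary."""
    return start in boundaries and end in boundaries


def alignment_by_confidence(sequences, boundaries):
    """Return {confidence score: proportion of sequences aligned at both edges}."""
    totals = defaultdict(int)
    aligned = defaultdict(int)
    for start, end, confidence in sequences:
        totals[confidence] += 1
        if is_aligned(start, end, boundaries):
            aligned[confidence] += 1
    return {score: aligned[score] / totals[score] for score in sorted(totals)}


# Toy data, invented for illustration: a 10-word stretch of speech whose
# intonation units start at words 0, 4 and 7 (index 10 marks the final edge).
boundaries = {0, 4, 7, 10}

# Hypothetical formulaic sequences as (start, end, confidence on a 1-5 scale).
sequences = [(0, 4, 5), (4, 7, 5), (2, 6, 1), (7, 10, 3), (5, 7, 3)]

rates = alignment_by_confidence(sequences, boundaries)
print(rates)
```

Grouping the proportions by confidence score makes it possible to see at a glance whether alignment rises with the judges' confidence, which is the pattern that part (c) of research question one asks about.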

1.3 The structure of the book

The book consists of seven chapters. Chapters 2 and 3 cover the theoretical ground and review previous research on formulaic language and the prosody


of formulaic language respectively. The review of formulaic language research in Chapter 2 looks at fundamental issues concerning the definition and the identification of formulaic language. Of particular importance is this chapter’s discussion of the characteristics of the two mainstream methods of formulaic language identification, namely corpus-based automatic extraction and native speaker judgement, as they underlay the methodological considerations in the three empirical studies presented here. Chapter 3 systematically reviews the literature on the prosody of formulaic language and the idea of identifying formulaic language on the basis of prosodic evidence. It begins by considering the great potential of the phonological method in formulaic language research. It then examines the existing literature to see how the idea has evolved over time. After that, the attention shifts to the question of what prosodic cues may reveal about the psycholinguistic processing of formulaic language. The chapter ends with a review of all the prosodic cues associated with formulaicity in the literature. Chapters 4, 5 and 6 present, in turn, the three main studies, including discussions of the rationales, methodologies and results. Chapter 4 explores the extent to which formulaic language is delineated by intonation unit boundaries in the spontaneous speech of native speakers and proficient EFL learners. Chapter 5 presents an investigation of rhythm and stress placement within formulaic language. Chapter 6 examines whether listening to the prosody of formulaic language can increase the level of agreement between judges in their formulaicity judgements. Finally, Chapter 7 brings the book together by looking at the findings, limitations and implications of the book. The first part reviews and highlights how the literature review and the results of the three empirical studies answer the overarching research question.
It also reflects on the approaches taken to address the research question including their merits and limitations. The second part looks ahead and presents how the significance of this book extends beyond addressing a single, specific methodological issue. More specifically, this part will discuss the implications and applications of the book in the areas of formulaic language research, language teaching and Natural Language Processing. The book ends with a reiteration of the importance of continuing research into the prosody of formulaic language.

2

Formulaic Language: An Overview

This chapter focuses on the fundamental issues concerning the definition and the identification of formulaic language, which underlay the methodological considerations of the present research. Section 2.1 begins with the basics, by defining formulaic language. Section 2.2 looks at the criteria for identifying formulaic language, with a particular focus on the important role that the phonology of formulaic language has played in the identification of child formulaic language. Section 2.3 presents an in-depth exploration of the characteristics of corpus-based automatic extraction and native speaker judgement and how these characteristics have affected the methodological decisions taken in studies one to three. Particularly noteworthy is the discussion of the implementation of the native speaker judgement study in Section 2.3.3. In that section, attention will be drawn to the way that lapses of concentration, fatigue, uncertainty and internal inconsistency in judgement can undermine the credibility and robustness of formulaic language data collected using collective native speaker judgements. This discussion supports the implementation of measures to justify the robust use of native speaker judgements as a method for sampling formulaic language in the present investigations.

2.1 What is formulaic language?

In the history of linguistic research, researchers have often focused on single words as the units of meaning and linguistic analysis. While Firth’s (1957) words ‘You shall know a word by the company it keeps’ (p. 11) shed initial light on the co-selection of words, the true extent and significance of this relationship was revealed only when Sinclair (1966) and Halliday (1966) published their works on lexical grammar and lexicogrammar respectively. Outside of the lexicogrammatical tradition, the notion of formulaic language has also been


approached from the perspectives of psycholinguistics (Wray, 2002), usage-based models (Bybee, 2002; Ellis, 2002), construction grammar (Hudson and Wiktorsson, 2009; Wulff, 2008), first language acquisition (Peters, 1977, 1983) and second language acquisition (Dechert, 1983; Pawley and Syder, 1983; Raupach, 1984). It is fair to say that the rise of formulaic language research is the combined result of decades of investigations and observations based on these diverse perspectives. The definition of formulaic language that this book adopts as a working definition, that is ‘any sequence of two or more words that are perceived to be more constrained than usual in their co-occurrence’ (Hudson and Wiktorsson, 2009, p. 81), reflects, most strongly, the joint influence of the collocation research tradition represented by Firth (1957), Sinclair (1966) and Halliday (1966) and the construction grammar paradigm represented by Fillmore, Kay and O’Connor (1998), Goldberg (1995) and Kay and Fillmore (1999). Although the concept of multi-word items as units of meaning and linguistic analysis remains central to the notion of formulaicity, some psycholinguists see the significance of formulaic language in a different light. Wray and Perkins (2000), for instance, suggest that at the heart of formulaic language is the concept of holistic storage and retrieval. They believe that formulaic sequences are ‘stored and retrieved whole from memory at the time of use, rather than being subject to generation or analysis by the language grammar’ (p. 1). This concept of holistic storage and retrieval is central to the study of formulaicity because of the sense that all the properties of formulaic language originate from it, including its formal fixedness, phonological coherence and semantic non-compositionality. Despite the fact that my interest in the prosodic cues of formulaic language had its origin in psycholinguistics, the Wray and Perkins (2000) definition was not adopted.
This was primarily because the existence of holistic storage and retrieval remains highly debatable (see Lin, 2010). It is true that a series of empirical studies has been implemented in recent years to explore the psycholinguistic processing of formulaic sequences (e.g. Conklin and Schmitt, 2007; Jiang and Nekrasova, 2007; Schmitt, Grandage and Adolphs, 2004; Schmitt and Underwood, 2004; Underwood, Schmitt and Galpin, 2004). However, whether the psycholinguistic experimental techniques used in these studies are sufficient to demonstrate holistic storage and retrieval in action directly is questionable. As will be discussed in Section 3.4, the claim that prosodic cues reveal holistic storage and retrieval probably reflects a misinterpretation of the significance of prosodic cues as a window into the psycholinguistic processing of language. With regard to the definition of formulaic language, it was important for this book to build upon a firm foundation; therefore, Hudson and Wiktorsson’s suggestion


(2009), cited above, was used as a working definition. This definition provides more flexibility in its exploration of the prosody of formulaic language as it removes the restriction of equating prosodic cues such as single intonation units with holistic storage and retrieval. In doing so, it permits alternative perspectives on the significance of prosodic cues in revealing the psycholinguistic processing of formulaic language. One such alternative view will be presented in Section 3.4. The concept of formulaic language presented by Hudson and Wiktorsson (2009) is intentionally both inclusive and vague. Influenced by construction grammar, their argument is that ‘all language is at some level and to some extent formulaic’ (p. 78). Therefore, they prefer to deal with formulaicity in relative terms, rating formulaicity on a scale of less to more formulaic. Importantly, this concept of relative formulaicity is in line with the philosophy, as well as the eventual findings, of this book, in two essential ways. First, the book introduces a formulaicity judgement scoring system to standardize the process of formulaic language identification by native speaker judgement. This scoring system allowed native speaker judges to indicate how confident they were that the sequences they identified were formulaic. The results produced by this system demonstrated that word sequences can be placed along a sliding scale of formulaicity. Secondly, based on the collated responses of thirty native speaker judges, 96.3 per cent of a text was deemed to be, to some extent, formulaic, with varying scores along the formulaicity judgement scale. This again lends support to Hudson and Wiktorsson’s (2009) argument that all language is, at some level and to some extent, formulaic. The discussion so far has presented formulaic language as a unitary category.
However, it is important to highlight the fact that formulaic language is an umbrella term covering a wide range of subtypes. For instance, Van Lancker (1987) categorizes non-propositional speech according to its conceptual level of novelty. These categories include sentence stems, schemata, frozen metaphors, formulas (greetings), pause fillers and so on. Also working on spoken English, Altenberg (1998) provides a structural/functional categorization of spoken English phraseology based on his empirical study of highly recurrent contiguous sequences of word forms in the London-Lund Corpus of Spoken English (see Figure 2.1 for my graphical summary of Altenberg’s categories). Like Altenberg (1998), Biber, Conrad and Cortes (2003) have provided a corpus-based and functional taxonomy of highly recurrent contiguous sequences of word forms. However, the categories in the Biber, Conrad and Cortes (2003) taxonomy are rather different from those of Altenberg (1998) as they do not make structural completeness the primary distinguishing factor. Therefore, both


Figure 2.1 Altenberg’s (1998) categorization of spoken English phraseology based on the London-Lund Corpus of Spoken English.


structurally complete and incomplete word sequences can be found under the four major functional categories, namely ‘referential bundles’, ‘text organisers’, ‘stance bundles’ and ‘interactional bundles’ (see also Biber and Barbieri, 2007). It should be emphasized that the choice of terminology used and the categorization of formulaic language into subtypes are by no means uncontroversial decisions. In practice there is no current consensus over the use and the meaning of terms referring to formulaic language. For this reason, formulaic language has been variably termed (to name a few) non-propositional speech, chunks, fixed expressions, formulaic language, formulas, multi-word units, prefabricated routines and patterns, conversational routines, ready-made expressions, lexical phrases, clichés, collocations, idioms and idiomatic language (see Wray, 2002 for a discussion of the issue of terminology in formulaic language research). The position of this book is that each subtype of formulaic sequence probably contains a particular property or a set of properties of formulaic language. However, researchers often follow different practices when naming and categorizing the subtypes of formulaic language to correspond to their specific linguistic sub-disciplines. For instance, conversational routines have a particular, pragmatic role in spoken dialogue. Greetings like how are you? mark the beginning of conversations and phrases such as see you later mark the end. The term multi-word units is used widely in computational linguistics and it focuses on the orthographic form of the sequence. As the name suggests, multi-word units must consist of more than one word, thus excluding items such as so and however.
Fixed expressions are relatively unchangeable in form, while idioms used to refer to semantically opaque items such as spill the beans and kick the bucket, although modern use of the term also includes more semantically transparent items such as I am writing to X or let’s think about X. Less frequently mentioned in the literature, however, is the fact that the range of formulaic language used can also be examined from an age/population perspective. While people of all ages use formulaic language extensively, the nature of the formulaic language that a child uses may be very different from that used by native speaker adults or foreign language learners (see Erman, 2007 for a study that compares the use of formulaic language in adults to that of adolescents). This clarification concerning the diverse nature of formulaic language in children, language learners and adult native speakers is important background knowledge for the discussion in Section 3.3. These issues with terminology and categorization are not easy to deal with and, unfortunately, this book can only afford an oversimplified account of both issues. This is because the focus is on the prosody of formulaic language, with formulaic language identified in the empirical studies on the basis of


clearly stated corpus-based automatic extraction or collective native speaker judgement procedures. Rather than presenting a prolonged discussion of the various practices noted in the literature for defining and categorizing formulaic language, this book accepts the fact that there are many facets of formulaicity and that the practice researchers adopt largely depends on their research focus.

2.2 The criteria for formulaic language

Over the past decades, many researchers (e.g. Hickey, 1993; Langlotz, 2006; Moon, 1997; Peters, 1983; Wray and Namba, 2003; Zgusta, 1967) have developed criteria for the identification of formulaic language. These criteria have been developed over years of careful observations of the features of formulaic language. Therefore, by discussing selected lists of identification criteria, some understanding can be gained about the features of formulaic language. The main focus of this section, however, is to highlight the fundamental role of the phonological criterion in the identification of formulaic language. This will be achieved through an examination of the relevant literature. The most comprehensive set of criteria for the identification of formulaic language was generated by child language researchers, including Hickey (1993), Peters (1983) and Wray and Namba (2003).1 Their criteria cover a range of linguistic aspects belonging to formulaic language, including the form, grammar, phonology, lexical structure, sociolinguistics and pragmatics and, in doing so, they demonstrate the multidisciplinary nature of formulaic language. The following is a simplified list of the key features of child formulaic language based on the work of Hickey (1993), Peters (1983) and Wray and Namba (2003):

(1) formulaic language demonstrates a sense of unity in terms of its phonology and grammatical and lexical structure;
(2) formulaic language is familiar among the speech community;
(3) formulaic language is relatively fixed in form;
(4) formulaic language is grammatically more advanced than the rest of a child’s speech;
(5) formulaic language is associated with certain functions in specific communicative contexts; and
(6) formulaic language may be used inappropriately in context.

The comprehensiveness of the above list of criteria for child formulaic language is not the only reason their work deserves attention. More importantly, these


criteria, provided by child language researchers, demonstrate best the important role played by phonology in the identification of formulaic language (see Lin, 2018). In fact, it was the peculiar intonation pattern of formulaic language that led Peters (1977) to discover the existence of formulaic language in child speech in the first place. This discovery, without doubt, informed the first-ever mention in the literature of the possibility of identifying formulaic language using prosodic cues. Since Peters’ (1977, 1983) discovery, the peculiar intonation pattern of formulaic language has been one of the most crucial criteria for its identification in children’s language. The significance of the phonological criterion is further exemplified in the subsequent studies by Hickey (1993) and Plunkett (1990). In Hickey’s (1993) examination of her nine criteria for the identification of formulaic language in child language, the phonological criterion is the only one that is deemed both ‘essential’ and ‘non-graded’. In other words, the satisfaction of this phonological criterion is compulsory and categorical if a word sequence is to be counted as formulaic. Furthermore, Plunkett’s (1990) project, investigating the first language acquisition strategies of children between one and two years old, demonstrated how the phonological criterion alone is sufficient for the identification of formulaic language. In his study, he compared the then new articulatory/fluency criterion with the distributional/frequency criterion and concluded that the phonological criterion was a superior methodology for analysing the speech of very young children. The work of the aforementioned child language researchers has laid the foundation for the recent hypothesis that formulaic language in adult speech can also be identified using the phonological method. 
In light of the reported failure of formal and semantic criteria as a means of identifying formulaic language in adult speech (Wray, 2004), some researchers (e.g. Read and Nation, 2004; Wray, 2002, 2004) have expressed optimism over the potential application of the phonological method to solve this problem. Whether the phonological method can be as effective and successful in the identification of adult formulaic language as it is in the case of children is not immediately apparent. However, the discussions in the following chapters will reveal a clearer picture.

2.3 The two main methods of identifying formulaic language

Section 2.2 examined the criteria developed to aid the identification of formulaic language. Nevertheless, even with these criteria in place, the identification of formulaic language is still a great challenge because formulaicity is a multi-faceted and variable construct. For instance, each formulaic sequence has its own characteristics, which means that each needs to be assessed on a case-by-case basis.2 Some word sequences are perceived to be formulaic because their meanings are relatively opaque (e.g. kick the bucket), while some are perceived to be formulaic because they have irregular grammatical structures (e.g. by and large). Others are taken as formulaic because they seem to be used frequently by a particular speech community (e.g. how can I help you?). Such sequences do not necessarily satisfy all of the identification criteria (see those outlined in the previous section), or satisfy them to the same extent (see Moon, 1997; Wray and Namba, 2003). Recent advances in technology have made fully automatic (or computer-assisted) identification of formulaic language possible. Many computer programs (e.g. Scott’s 2012 WordSmith, Rayson’s 2003 Wmatrix, and Greaves’ 2009 ConcGram) have been developed to extract formulaic language based on user-defined criteria. These automatic tools have allowed researchers to explore formulaicity in naturally occurring language on a scale that would have been impossible if formulaic language had to be identified manually. However, as the development of these automatic tools is still in its infancy, they are not yet sophisticated enough to consider a balance of criteria (like those discussed in Section 2.2) in the process of automatic extraction. Due to the lack of flexibility in terms of orthographic, form-based extraction and the failure to include multiple identification criteria in the extraction algorithm, the products of corpus-based automatic extraction are of a very special type, which may be far from the expectations of researchers. This is especially true of lexical bundles or clusters, so named by Biber et al. (1999) and Scott (2012), respectively, to reflect their distinctive nature as highly recurrent, contiguous and fixed sequences of (unlemmatized) word forms.
The decision about whether to use native speaker judgement or automatic tools to identify formulaic language could potentially affect the results of the studies presented in this book. The formulaic language produced by these two methods belongs to different types because the principles for selection are markedly different. For the prosodic descriptions of this book to cover a broad definition of formulaic language, it was necessary that descriptions were available for the products of both identification methods. This need was fulfilled by study one in its investigation into the extent to which formulaic language aligns with intonation unit boundaries. The review below aims to provide background information covering these two mainstream identification methods so that, when interpreting the results of study one, readers are also aware of the influence of the sampling method.

As mentioned above, this section (2.3) also highlights the importance of introducing countermeasures against issues with lapses of concentration, fatigue, uncertainty and internal inconsistency in judgement in the manual formulaic language identification process. It will be argued that attention to details in this area can potentially improve the robustness of native speaker judgement as an identification method in formulaic language research.

2.3.1 Corpus-based automatic extraction

Corpus-based automatic extraction has seen a marked increase in popularity since Sinclair (1991) first provided corpus evidence to show that idiomaticity is fundamental to the language production process and prevalent in our everyday language.3 In the past, researchers had to program their own tools to extract formulaic language from corpora, but now there are a number of ready-made program packages on the market for researchers to choose from. These packages, such as WordSmith (Scott, 2012), Wmatrix (Rayson, 2003) and ConcGram (Greaves, 2009), identify recurrent word combinations based on different principles and algorithms. WordSmith (Scott, 2012), for example, can extract all highly recurrent, contiguous and fixed sequences of unlemmatized word forms from any corpus and rank the results according to frequency of occurrence. The minimum frequency threshold and required length of the word sequences can be defined by the user.

Despite being so widely used in formulaic language research, WordSmith is known to have some limitations, which are common among all automatic extraction tools. The most notable limitation is its lack of flexibility and discretion in terms of judgements of formulaicity. A number of researchers (e.g. De Cock, 1998; Simpson, 2004) have raised doubts about the formulaicity of some of the word sequences extracted by WordSmith (or other tools that operate using similar principles and algorithms), reporting their need to manually filter the results of the extraction for their research. Another problem with automatic tools that operate on similar principles and algorithms to WordSmith is the ‘arbitrariness’ of the boundaries of the sequences identified. This problem is seldom mentioned in the literature (e.g. Kennedy, 2008), but definitely impacted the way the prosodic analysis was carried out on the formulaic language in study one (see Chapter 4).
WordSmith operates purely on the basis of frequency of occurrence; this reliance on a single criterion (as opposed to a balance of criteria) is exactly the reason why the boundaries of formulaic sequences can appear ‘arbitrary’. Examples of this can be found in data extracted from CANCODE, a five million-word corpus of British spoken
English. Adolphs (2006, p. 42) provides a table listing the ten most frequent two-word, three-word and four-word clusters in the corpus.4 This table indicates considerable overlaps in the results. For instance, the four-word unit I don’t know what actually contains the most frequent three-word unit I don’t know and the sixth most frequent two-word unit I don’t. While the decision of WordSmith is to count I don’t, I don’t know and I don’t know what as three different items, it is doubtful whether they actually constitute three different formulaic sequences. A further example is you know what I mean. Based on its operating principles, WordSmith has to cut you know what I mean after I and before know to produce the two most frequent four-word units which are you know what I and know what I mean, even though common sense reveals that you know what I mean is probably the real formulaic sequence in people’s mental lexicon. As will be discussed in Section 3.5.1, researchers, including Aijmer (1996), Altenberg and Eeg-Olofsson (1990), Baker and McCarthy (1988) and Moon (1997), have asserted that formulaic language forms single intonation units (i.e. there is complete alignment of formulaic language with the boundaries of intonation units). Similarly, Wray (2004) suggests that the tracking of pauses can help to identify formulaicity, a suggestion which hints that formulaic language is delineated by pauses. For this book to investigate the validity of these predictions about the alignment of formulaic language with intonation unit boundaries and pauses, certainty about the placement of the boundaries of formulaic language is crucial. Unfortunately, as discussed above, automatic tools cannot provide this, largely because of the rigidity of the operating principles. 
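The frequency-only principle discussed above can be illustrated with a minimal sketch. It is not WordSmith's actual implementation: the function name `extract_clusters`, its parameters and the toy utterances are all assumptions introduced here for illustration, and the data are invented rather than drawn from CANCODE. The sketch counts contiguous sequences of unlemmatized word forms, keeps those that reach a user-defined minimum frequency, and ranks them by frequency.

```python
from collections import Counter


def extract_clusters(utterances, min_len=2, max_len=4, min_freq=2):
    """Count contiguous n-grams of raw word forms; keep those at or above min_freq."""
    counts = Counter()
    for tokens in utterances:
        for n in range(min_len, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    kept = [(gram, freq) for gram, freq in counts.items() if freq >= min_freq]
    # Rank by descending frequency, then alphabetically for a stable order.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Invented toy data (not CANCODE): four short utterances, already tokenized.
toy_utterances = [
    "you know what i mean".split(),
    "i don't know what you mean".split(),
    "you know what i mean".split(),
    "i don't know".split(),
]

clusters = extract_clusters(toy_utterances)
four_word_clusters = [" ".join(gram) for gram, _ in clusters if len(gram) == 4]
print(four_word_clusters)
```

With the maximum length set at four, the five-word sequence you know what i mean can only surface as the two overlapping units you know what i and know what i mean, while i don't and i don't know are counted as separate items. Because frequency is the only criterion applied, nothing in the procedure can decide which of the overlapping candidates corresponds to a unit in the mental lexicon.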
With the newer automatic extraction tools like ConcGram (Greaves, 2009) and Wmatrix (Rayson, 2003), such rigidity is lessened through the adoption of different operating principles (see Cheng, Greaves and Warren, 2006; Rayson, 2003 and the Wmatrix website http://ucrel.lancs.ac.uk/wmatrix/ for details). However, there is still some way to go before the problem with the apparent ‘arbitrariness’ of the boundary placement can be satisfactorily resolved.

As mentioned above, the purpose of developing automatic tools to identify recurrent word combinations is that researchers can study formulaicity on a large scale, in a corpus of perhaps several hundred million words. Automatic extraction is designed to reveal the general patterning of recurrent word combinations. It presents frequency lists so that researchers can see which sequences are the most and the least frequent in a particular genre. Researchers can also see concordances showing all the instances of the same word combinations in different texts. However, a weakness of corpus-based automatic extraction is that it is not suitable
for investigations that take what I call ‘the exhaustive approach’ to the identification of formulaic language. This approach, which was taken in Erman and Warren’s (2000) study, requires that all formulaic sequences in a text be identified. By way of demonstration, I attempted to use the exhaustive approach with the corpus-based automatic extraction method. However, the results were very complex, as can be seen in Figure 2.2. To obtain these results, I combined The British Academic Spoken English (BASE) Corpus of 1.6 million words and The Michigan Corpus of Academic Spoken English (MICASE) of 1.8 million words to produce a sizeable Academic English corpus from which WordSmith could extract contiguous sequences of unlemmatized word forms (or clusters) of two to six words, with the minimum frequency threshold set at five.5 Using Microsoft Excel, the resulting 114,768 clusters were then searched for one by one in the short, randomly chosen sample text of fifty-four words below, taken from a university lecture.

erm so the as you can see this this session is is being filmed so I’m going to have to try and discipline myself to stand here within range of this light so if I start wandering over there I’ll get sent back. so that that should be the only difference in the lecture.
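The lookup described above, in which every extracted cluster is searched for in the sample text, can be sketched as follows. This is an illustration under stated assumptions, not a record of the Excel procedure: the function names are hypothetical, and the short cluster list is invented for the example rather than taken from the 114,768 clusters extracted from BASE and MICASE. The sketch records where each cluster occurs in the opening of the sample text and then merges occurrences that share at least one word into a single chain.

```python
def find_cluster_spans(tokens, clusters):
    """Return sorted (start, end) token spans, end exclusive, where a cluster occurs."""
    spans = []
    for cluster in clusters:
        words = cluster.split()
        n = len(words)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == words:
                spans.append((i, i + n))
    return sorted(spans)


def merge_into_chains(spans):
    """Merge spans that share at least one token into maximal chains."""
    chains = []
    for start, end in sorted(spans):
        if chains and start < chains[-1][1]:
            # Overlaps the current chain, so extend it.
            chains[-1] = (chains[-1][0], max(chains[-1][1], end))
        else:
            chains.append((start, end))
    return chains


# Opening of the sample text quoted above.
tokens = "erm so the as you can see this this session is is being filmed".split()

# Hypothetical cluster list, invented for illustration only.
hypothetical_clusters = [
    "so the", "the as", "as you can see", "can see this",
    "this this", "this session is", "is is", "is being filmed",
]

spans = find_cluster_spans(tokens, hypothetical_clusters)
chains = merge_into_chains(spans)
for start, end in chains:
    print(" ".join(tokens[start:end]))
```

In this toy case, eight short clusters interlock into one chain running from so to filmed, so that no individual cluster boundary can be read off as the boundary of a formulaic sequence.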

Figure 2.2 The results of an attempt to take the exhaustive approach with corpus-based automatic extraction.

Altogether fifty-two clusters, which have been underlined in Figure 2.2, were found in this sample text. These results were considered ‘complex’ because, as seen in Figure 2.2, the clusters and their boundaries overlapped to a considerable extent. In one case, for instance, where one lexical bundle linked to another, the result was a long chain of words: so the as you can see this this session is is being filmed, which contained seventeen interlocking clusters. The way in which automatic extraction produced these interlocking chains, seen in Figure 2.2, helps to demonstrate why Erman and Warren’s study, which traces
of the alternation between the idiom principle and the open-choice principle in naturally-occurring language, could not have been possible had automatic extraction been used. Nonetheless, the ambiguity of the cluster boundaries shown in Figure 2.2 again demonstrates why, for the present studies, which aim to assess the validity of the phonological method, the use of automatic extraction as an identification method could have been problematic. Corpus-based automatic extraction relies on the recurrence of unlemmatized word forms, but it does not take account of how those word forms are used, formulaically, in context. This reliance on form is potentially problematic because, as Bahns, Burmeister and Vogel (1986) point out, fixedness in form and lexical structure should not be the sole basis for the identification of formulaic language. The context must also be incorporated when considering the formulaic status of a wordstring: We soon came to the conclusion that the pertinent utterances could not just be regarded as formulas because of their fixed and rigid formal and lexical structure; instead, the function of some of them and their special formulaic status could only be discovered when related to the semantic-pragmatic background and situative context. (p. 699)

Issues arising from the failure of corpus-based automatic extraction tools to consider the context of use have long attracted attention in the empirical literature. De Cock (1998) gives the example of you know to illustrate how a form-based automatic extraction tool failed to consider the context and, therefore, extracted instances of you know whose formulaicity was ambiguous (see Section 6.2), along with other convincingly formulaic instances of the same sequence. Another observation which more closely corroborates the beliefs presented in this book is made by Lin and Adolphs (2009).6 Based on their corpus data, they observed that the same formulaic sequence can indeed serve more than one function and that the extent to which an instance of formulaic language is aligned with intonation units is related to the function served by that instance in its context.

As corpus-based automatic extraction tools, including WordSmith, do not distinguish between function and meaning, the expectation is that automatic extraction from a corpus may not be compatible with prosodic analysis because prosodic analysis must be situated in context. Prosodic features are managed at a local level in discourse and so are conditioned by many aspects of context, including lexical and syntactic structure, semantic focus and emphasis, emotion and other psycholinguistic processing factors. Prosodic analysis, supported by corpus-based automatic extraction, allows researchers to see the prosodic
shape of many instances of the same formulaic sequences and then generate a description of the prosody of those particular sequences. However, to develop a robust model of the prosody of formulaic language, researchers also need to investigate the prosody of formulaic language identified by collective native speaker judgements. This is because this method considers meaning and function of word sequences in context. It can therefore prevent the problem, noted by De Cock (1998), that some instances of you know in her data might not actually have been used formulaically in context even though they shared the same word form with the well-recognized formulaic sequence you know (see Section 6.2). Considering the characteristics of corpus-based automatic extraction as an identification method and their implications for prosodic investigations of formulaic language, study one included both automatic extraction and native speaker judgement as identification methods so as to ensure that the results would not be considered as artefacts due to the choice of formulaic language sampling methods (see Chapter 4 for further discussion).

2.3.2 Native speaker intuitive formulaicity judgement and its application in the identification of formulaic language

The Oxford English Dictionary (OED) defines intuition as ‘the action of looking upon or into; contemplation; inspection; a sight or view’. While people commonly take intuition to mean the same as hunch, gut reaction or instinct, the word ‘contemplation’ in this OED definition indicates that intuition is not really the random, ungrounded opinion that some people may believe it to be. The fact that people may not always be able to verbalize the logical reasons that lead to their intuitive formulaicity judgements has given researchers the impression that intuition is unscientific (see the discussion in Wray, 2002) and therefore of limited use in formulaic language research. This view, however, was quickly challenged by Wray and Namba (2003), who point out that native speakers’ intuition about formulaicity is more than just their ‘hunches’; it is, in fact, the product of the internal organization of their linguistic knowledge from their sociolinguistic experience. Wray (2002, p. 22) criticizes the way Bahns, Burmeister and Vogel (1986) used native speaker intuition to identify formulaic language in their study:

There is a strong temptation to be unashamedly unscientific; for example, ‘we eventually listed a number of expressions that we intuitively regarded as formulas’. (Bahns, Burmeister and Vogel, 1986, p. 700)

Wray’s (2002) criticism is probably not to discourage the use of native speaker intuitive judgement but to highlight the need for a more careful and thoughtful approach to its use on the part of the researchers. For instance, Wray and Namba (2003, p. 27) suggest that if intuition is to be used as a replacement for firm external evidence, it must itself be externally validated. … Clearly, even if intuition is the starting point, the researcher will not fare well by simply asserting that the former ‘feels’ more formulaic than the latter.

On this basis, Wray and Namba (2003, p. 27) propose eleven criteria that ‘should capture most of the features that are likely to underlie an intuitive judgement’. Their idea is that these criteria will ‘enable the researcher to explore why he or she feels that a particular wordstring is formulaic, by establishing reliable justifications for the intuitive judgement’ (Wray and Namba, 2003, p. 27). Objections to intuition and its use in linguistic research do exist, and the general impression is that corpus linguists, in particular, are against the use of intuition. For instance, citing Sinclair (1991), Wray (2002, p. 21) notes, However, corpus research has revealed that ‘human intuition about language is highly specific, and not at all a good guide to what actually happens when the same people actually use the language’. (Sinclair, 1991: 4)

However, it could not be further from the truth to say that intuition has no role to play in corpus linguistics at all, a fact which is highlighted by numerous discussions in the works of many corpus linguists such as McEnery and Gabrielatos (2006), Sampson (2001), Sinclair (2004) and Stubbs (1996). Sinclair (2004), for example, states clearly that intuition is not a gut reaction to events, but instead represents, in various ways, an educated guess. This view echoes what Wray and Namba (2003) suggest about the nature of intuition.

A recent study by Wulff (2008) provides precisely the concrete empirical evidence needed to show that intuitive formulaicity judgements involve informed decisions, even though the knowledge that underlies the judgement is implicit and unconscious. Working within the field of corpus linguistics, Wulff (2008) was interested in exploring the extent to which native speakers’ intuitive judgements could be modelled using a quantitative corpus linguistic methodology (Gries and Stefanowitsch, 2004; Stefanowitsch and Gries, 2003, 2005). Using multiple regression analysis, she successfully isolated the factors (including (non-)compositionality and flexibility and other frequency-based variables) that could statistically account for intuitive idiomaticity judgements (see Lin, 2009
for a review of this study). In this way, Wulff developed a quantitative model of native speaker intuitive idiomaticity judgements. In the context of the current studies, the significance of this model is that it shows how, when making intuitive judgements regarding formulaicity, native speakers are, in fact, unconsciously assessing a variety of variables (e.g. the passivizability of a construction and its flexibility with adverbial modification, the frequency of occurrence and the non-compositionality of a construction). Put simply, Wulff’s (2008) study demonstrates that native speakers’ ‘intuitive’ formulaicity judgements represent anything but a gut reaction.

The implications of Wulff’s (2008) study and the contribution it has made to deepening understanding of native speaker formulaicity judgement should not be underestimated. From the perspective of formulaic language research, in addition to its status as the first study to reveal the factors underlying native speaker intuitive formulaicity judgements, the Wulff study was also the first to ask laypeople (instead of linguists) to contribute their own intuitive judgements. Despite the acknowledged risks (see Wulff, 2008, p. 32), the attempt to use laypeople was based on the researcher’s aim of proving that making formulaicity judgements is not an expertise exclusive to linguists. If linguists alone are able to judge the idiomaticity of constructions, then people could counter that idiomaticity and idiomaticity judgements are not generalizable because they are only ‘driven through conscious initiatives on the side of the expert community’ (Wulff, 2008). Wulff’s study, however, highlighted the fact that the ability to make formulaicity judgements is common to everyone, be they laypeople or linguistic experts.
It is true that laypeople may not be able to explain their formulaicity judgements as easily or accurately as linguistic experts, but that is because laypeople do not have the training or the linguistic metalanguage to do so. Their implicit, intuitive knowledge of formulaicity, however, is perfectly adequate.

Earlier, in Section 1.2, it was argued that if it was necessary for study one to control for the influence of the identification methods used while investigating the extent to which formulaic language aligns with intonation unit boundaries in the speech of native speakers and proficient language learners, the 2 × 2 design was not feasible because it is inappropriate to apply native speaker judgements to spoken data from proficient learners. The explanation provided here is in relation to the discussion of the nature of native speaker judgement as a methodology. In the case of native speaker judgement, there is an emphasis on using native speakers as opposed to non-native speakers.7 That is to say, it is only appropriate to ask native speakers of British English to judge the formulaicity in the speech
(and writing) of British English, and learners of English to judge formulaicity in the speech (and writing) of learners from the same first language background. This necessarily narrow definition of the ‘native speakers’ in the native speaker judgement could be mistaken for negligence of the fact that English language learners with near-native proficiency can also develop native-like intuitive formulaicity judgements about English. After all, it could be argued that, given their bilingual background, non-native speakers (or learners) of English might be more sensitive than native speakers to peculiarities in structures and non-compositionality. However, the counter-argument here is that the native speakers’ linguistic experience and their implicit knowledge about formulaicity in their native language is the result of their extensive sociolinguistic experience and exposure to the language. This level of experience and exposure cannot be matched by EFL learners, who may not have much chance to use their foreign language outside the language classroom. To summarize, this section has examined the nature of native speakers’ intuitive formulaicity judgements. Evidence has been provided which shows that intuitive judgement is anything but a random, gut reaction. It has also considered the meaning of the term native speakers in preparation for the challenge of selecting judges to participate in the task of identifying formulaic language for studies one, two and three. Furthermore, it was noted that the native speakers used for the native speaker formulaicity judgement procedure do not need to be linguistic experts; the intuitive formulaicity judgements of laypeople are just as valid as those of linguists. Finally, an emphasis was placed on inviting judges from the same speech community as the contributors of the spoken data (e.g. 
native speakers of British English should judge the formulaicity of British English texts) due to these people’s extensive sociolinguistic experience of their native language.

2.3.3 Native speaker judgement as an identification method: The issue with implementation

Unlike corpus-based automatic extraction, native speaker judgement relies on people. In using this method to support formulaic language research, managing the human factor is the key to success. Section 2.3.2 provided a discussion of the nature of native speakers’ intuitive knowledge of formulaicity and its application in the identification of formulaic language. This section deepens that discussion by considering some of the issues pertaining to the implementation of native speaker judgement in formulaic language research.

2.3.3.1 The human factor and preventive measures against its negative influences

In her study, which involved the use of native speaker judgement to identify formulaic language, Foster (2001, p. 84) reports the considerable impact of the human factor on the quality of the formulaicity judgement data:

According to the written comments of all seven informants, theirs was not an easy task. Lapses of concentration with reading meant missing even obvious examples of prefabricated language, so progress was slow and exhausting. All seven reported difficulty in knowing where exactly to mark boundaries of some lexical chunks and stems as one could overlap or even envelop another. Nevertheless, after a certain amount of self-imposed revision, each reported feeling reasonably confident with their coding.

In the above report, at least five issues confronting the human judges can be identified: lapses of concentration, missing obvious examples, slow progress, exhaustion and uncertainty concerning decisions about boundaries and overlapping formulaic sequences. However, these issues have rarely been reported in the literature (with the exception of Foster, 2001) probably because (1) the human factor is not often considered to be worth additional attention; and (2) if researchers are not careful about the measures taken to counter the issues, drawing attention to these problems could, potentially, affect the credibility of the research findings. For instance, in discussing the impact of the five issues, Foster (2001) also adds that the judges in her study took the initiative to go over and revise their work. This note has helped to increase readers’ confidence in the quality of the formulaicity judgement data.

Although self-imposed revisions can address some of the problems, in order to gain better control of the situation, researchers who use collective native speaker judgement to identify formulaic language should have preventive and precautionary measures in place before the identification procedures begin, rather than waiting and hoping that the judges will be self-disciplined enough to implement self-imposed revisions. Doing so requires careful planning and attention to detail (as will be demonstrated in Sections 4.7.1 and 4.7.2). Unfortunately, in previous research, the necessity of this kind of planning has often been overlooked. From the perspective of the researchers, collecting formulaicity judgement data is a straightforward task. As long as the instructions are clear and the judges understand what they need to identify, the identification procedure will run smoothly. However, human judges are not like automatic extraction tools such as WordSmith, Wmatrix and ConcGram,
which certainly do not have problems with concentration, exhaustion and uncertainty. As Erman and Warren (2000) note, judges who fail to pay very close attention can easily overlook some formulaic sequences that appear, at first sight, to be completely transparent combinations of words. For this reason, I argue here for greater control over the negative influences contributed by the human factor because, when judges miss or misjudge obvious items, they can potentially undermine the quality of the judgement data.

If researchers consider the perspective of the native speaker judges, they will understand that the identification task is a challenge. First of all, it is a time-consuming process. From the experience of study one, it takes a native speaker judge thirty to forty-five minutes to identify formulaic language in a text of about 850 words. Based on this rate, Foster’s (2001) native speaker judges could have spent at least an estimated fifteen hours going over the 20,000-word speech transcripts (see Table 2.1). Moreover, from the perspective of the judges, the task can also seem quite onerous and unappealing. For the whole fifteen hours, the judges had to read silently and think. Given the monotonous nature of the task, it is hardly surprising that Foster’s (2001) judges reported having lapses of concentration and feeling exhausted. These judges were applied linguists who had extensive experience teaching English as a foreign language; one can imagine that the situation is likely to be far more difficult for judges who have had no prior experience of text analysis and who are not used to analysing their own implicit knowledge about formulaicity. In fact, it is also very likely that the judges could become confused about their understanding of formulaicity during and after the task.

Table 2.1 Size of the datasets of formulaic language studies that use native speaker judgement

Study                      Size of datasets                                             Estimated duration of task9
Bahns et al. (1986)        3,000 sheets of written notes from observing child speech
Erman and Warren (2000)    Spoken: 4,200–5,600 words; Written: 1,800–4,800 words        8 hours
Foster (2001)              Spoken: about 20,000 words                                   15 hours
Wood (2006)                Spoken: 26,000 words                                         20 hours

This is exactly what Foster’s (2001) judges meant when they ‘reported difficulty in knowing where exactly to mark boundaries of some lexical chunks and stems, as one could overlap or even envelop another’ (p. 84). This confusion simply means that the judges began to realize and share the formulaic language
researchers’ dilemma regarding the identification of formulaic language. That is, while it is easy to give prototypical examples of formulaic language, there is a considerable grey area when it comes to judging formulaicity. A number of measures can be taken to reduce the negative influences of human factors on the quality of formulaicity judgement data. These measures, which have been applied in all the studies presented in the following chapters (see Sections 4.7.1 and 4.7.2 for details about these measures), have one aim which is to make the identification task as brief, interesting, easy and efficient as possible for the judges. Concerning the instructions given to the judges about what to identify in the research text, it is tempting to provide them with as many tips and identification criteria as possible (such as the lists of criteria given in Section 2.2). However, researchers should also strike a balance between the level of detail given and the amount of new information the judges are able to process at the beginning of the identification task (see Section 2.3.3.2). With this in mind, it was important that only information essential to the identification procedure should remain in the instructions. The tricky question is, of course, how to separate the essential information from the non-essential information. The only way to answer this question is to carry out pilot testing, observe the performance of the pilot study participants and hear their feedback. This is exactly the approach taken within the studies presented here (see Appendix 1 for details about the purposes and the findings of the piloting sessions in these studies). To summarize, this section has highlighted the importance of taking measures to prevent the negative impact of lapses of concentration, exhaustion and confusion on the quality of the formulaicity judgement data. Some researchers (e.g. Foster, 2001) have noted these negative impacts in the literature. 
The next step, however, is to prevent these negative factors at the research design stage, by incorporating strategies to make the identification task as pleasant as possible for the native speaker judges. On the part of researchers, this requires not only good management of the formulaicity judgement data collection process and attention to detail, but also a consideration of the identification task from the perspective of the judges. This discussion will be continued in Sections 4.7.1 and 4.7.2 where details can be found about the measures taken by studies one, two and three to address the issues that human judges experience with lapses of concentration, exhaustion, confusion and so on.
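The scale of the identification task discussed in this section can be made concrete with a small worked calculation. The sketch below assumes that judging time scales linearly with text length at the rate reported for study one (thirty to forty-five minutes per text of about 850 words). That linearity is an assumption made here for illustration, the function names are hypothetical, and the published estimates in Table 2.1 may have been derived differently.

```python
def estimated_hours(n_words, minutes_per_text, words_per_text=850):
    """Scale a per-text judging time linearly to a dataset of n_words."""
    return n_words / words_per_text * minutes_per_text / 60


def duration_range(n_words, fastest=30, slowest=45):
    """Return (lower, upper) estimated hours for the 30 to 45 minute rate."""
    return (estimated_hours(n_words, fastest), estimated_hours(n_words, slowest))


# Dataset sizes as reported in Table 2.1.
for label, n_words in [("Foster (2001)", 20_000), ("Wood (2006)", 26_000)]:
    low, high = duration_range(n_words)
    print(f"{label}: {low:.1f} to {high:.1f} hours")
```

Under this assumption, the fifteen-hour and twenty-hour figures given in Table 2.1 for these two datasets fall inside the computed ranges, which underlines how much sustained concentration such tasks demand of a single judge.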

2.3.3.2 Defining formulaic language for native speaker judges

The previous section (2.3.3.1) highlighted the need to take preventive measures against the negative influences that can occur due to lapses of concentration,
exhaustion and confusion, which are inevitable in all studies that involve human participants. This section continues to examine another issue with the implementation of native speaker judgement as an identification method: defining formulaic language for native speaker judges. It is obvious why providing an accurate definition of formulaic language for the judges is very important in the use of native speaker judgement, as they have to identify formulaic language based on the specific definition provided. However, accuracy is just one of many aspects that need to be considered in the case of the definition. There are also issues about the audience (e.g. laypeople versus linguistic experts) and about whether the definition is both the simplest and most effective possible. It is necessary to consider all these issues at the stage of research design. To do this, a number of pilot studies are often needed to achieve optimal results in forming a definition. Even so, it is true that there is no definition of formulaic language that suits every example of native speaker judgement. Nonetheless, the definitions used in previous studies that use native speaker judgement have provided an important starting point for the present search for a suitable definition for these empirical studies. The aim of this section is to review previous approaches to the definition of formulaic language and to the provision of instructions for the native speaker judges in the formulaic language identification task. This section provides a theoretical basis for the methodological decisions concerning the implementation of native speaker judgement as an identification method in study one, which will be presented in Section 4.7.

In the context of the native speaker judgement task, the audience for the definition of formulaic language is obviously the native speaker judges. A distinction can be made between two types of judges, internal judges (i.e. researchers who are judges of formulaicity in their own studies) and external judges (i.e. other people who are invited to be independent judges of formulaicity),8 and between linguistic expert native speakers and layperson native speakers. The requirements for the definition of formulaic language for each of these groups can be very different, as a review of the literature shows (see Table 2.2).

The definitions provided by Bahns et al. (1986), Erman and Warren (2000), Foster (2001) and Wood (2006) share a common characteristic: they target linguistic expert judges. For this reason, they contain a lot of linguistic jargon and assume that the judges have a thorough understanding of linguistic theories and concepts such as speech acts, form–meaning mappings and word-by-word construction. These definitions also require the judges to possess a certain

Table 2.2 Definitions of formulaic language in empirical studies that involve native speaker judgement as a formulaic language identification method

Bahns et al. (1986), citing Coulmas (1981, p. 2): Routines are tools ‘which individuals employ in order to relate to others in an accepted way’ (p. 695). ‘After a further examination of the utterances (under semantic-pragmatic aspects) we eventually listed a number of expressions which we intuitively regarded as formulas’ (p. 700).

Erman and Warren (2000, p. 31): ‘A prefab is a combination of at least two words favoured by native speakers in preference to an alternative combination which could have been equivalent had there been no conventionalization.’

Foster (2001, p. 83): ‘Without consulting anyone else, any documentation or corpus, mark any language which you feel has not been constructed word by word but has been produced as a fixed “chunk,” or as part of a sentence “stem” to which some morphological adjustments or lexical additions have been required.’

Wood (2006, p. 21): ‘Five general criteria were applied in deciding whether a sequence was a formula. … It is important to stress that no particular criterion or combination of criteria was deemed as essential for a word combination to be marked as formulaic, and judgements were made on the basis of one, several, or all of these (criteria).’ (1) Phonological coherence and reduction; (2) Nattinger and DeCarrico’s (1992) taxonomy; (3) greater length/complexity than other output; (4) semantic irregularity as in idioms and metaphors; (5) syntactic irregularity.

Wulff (2008, p. 172): The present questionnaire is concerned with so-called idiomatic sentences. Idiomatic sentences are the kind of sentences you typically find in dictionaries or phrase books.

level of linguistic and text analysis skill including discernment of phonological phenomena and morphological modifications. This level of linguistic knowledge cannot be expected from layperson judges because, from the experience of the pilot study sessions of this book (see Appendix 1), it is clear that even basic linguistic concepts, such as those mentioned above, are a challenge to laypeople. Because of the challenge of recognizing even basic linguistic concepts, Wulff’s (2008) study, which used layperson native speakers as judges in the formulaic language identification procedure, had to take a completely different approach to the definition of formulaic language (see Table 2.2). Wulff defined idiomatic ‘sentences’ as ‘the kind of sentences you typically find in dictionaries or phrase

books’ (p. 172). This definition was free of technical jargon and used everyday language so that laypeople could relate to it relatively easily. However, to cater to the needs of layperson native speaker judges, the simplification of the definition was achieved at the price of precision. For instance, Wulff used the term ‘idiomatic sentences’ even though it is clear that formulaic language does not need to be a complete sentence (or clause); it can be longer or shorter than a clause. Moreover, ‘the type of language typically found in dictionaries’ is indeed a very vague description of formulaic language. Obviously, the type of language found depends on the type of dictionary referred to. To instruct native speaker judges to find sentences that can be found in dictionaries might not be specific enough because these ‘sentences’ could be single-word or multi-word items. In fact, there is also confusion in this definition about whether the researcher wants the judges to judge only the target single-word or multi-word items or the whole sentences containing these items. In comparison, the reference to ‘the kind of language found in phrase books’ might be clearer because ‘phrases’ suggest multi-word units with specific pragmatic functions. Because phrase books are often associated with translation between two languages, the reference to the language found in phrase books indirectly encouraged judges to consider the use of a target wordstring from a crosslinguistic perspective, which could aid in the identification of formulaic language. One of Zgusta’s (1967) formulaic language identification criteria specifies that the existence of a one-word equivalent of the target wordstring in a foreign language is an indicator of the status of the wordstring as a multi-word lexical unit. Therefore, the reference to the type of language found in phrase books is simpler and more effective in communicating certain aspects of formulaic language.
From a methodological standpoint, it is easy to be critical about the lack of precision and details in the definition provided in the Wulff study. In fact, Wulff (2008, p. 32) noted the risks of her approach to the definition of formulaic language. However, her decisions not to provide a more specific definition and to deviate from the tradition to use linguist native speaker judges were conscious, based on a number of considerations:

It might be argued that eliciting idiomaticity data without providing a detailed definition of this concept from non-experts is a risky undertaking. In the absence of a more informative explanation of idiomaticity, naïve speakers may resort to an understanding of idiomaticity as their familiarity with that phrase. Such a strategy may be further motivated by the instructions to judge the constructions depending on how relevant they are for inclusion in a dictionary or phrase book. However, it was not opted to instead provide a more specific definition of idiomaticity to the participants or to take trained linguists as subjects

instead because this most likely would have distorted the results even more. Expert opinions on what is grammatically acceptable, typical of a language, or stylistically appropriate in certain text types and registers often deviate starkly from how the general public puts language to use. Similarly, it is plausible to assume that language experts will have had considerable exposure to theoretical approaches to idiomaticity, so their judgements will hardly be unfiltered (the widely established equation of idiomaticity with non-compositionality is very likely to be particularly problematic here). The scope of the present study, in contrast, is to carve out the understanding of idiomaticity in naïve speakers’ heads. The line of reasoning here is that idiomaticity is a daughter process of language change, which, from the usage-based perspective adopted here, is not primarily driven through conscious initiatives on the side of the expert community. Rather, it is unconsciously perpetuated by the speaker community at large. Therefore, to ask naïve native speakers for their judgements was a conscious decision.

In the instructions for her formulaic language identification task, Wulff gave the three examples below to complement the simple definition she provided (although there was no specific discussion on why these three examples were chosen).

The government got its fingers burnt.
Vincent has spilled his guts.
The knives are out for me at the moment.

As well as the simplification of the definition and the provision of examples, the Wulff study was markedly different from other studies involving the use of native speaker judgement (i.e. Bahns et al., 1986; Erman and Warren, 2000; Foster, 2001; Wood, 2006), in that it did not take an ‘exhaustive approach’ to the identification of formulaic language (see Section 2.3.1). Instead, the judges were asked to judge the idiomaticity of pre-selected sentences ranging from three to eight words in length. Because each pre-selected sentence contained only one target idiomatic construction, the layperson judges’ attention could be focused on that construction. This way, the problem that Erman and Warren (2000) highlight, that the judges sometimes missed formulaic sequences in the identification because they appeared to be completely semantically transparent at first sight, was bypassed. However, strictly speaking, the layperson native speaker judges in the Wulff study were not asked to identify idiomatic language as such, but to judge the level of idiomaticity of pre-selected idiomatic constructions embedded in sentences (see Lin, 2009 for a review of the Wulff study).

Nonetheless, the Wulff study has played an important role in the development of studies one to three. The findings of that study provide concrete evidence for the logical basis of native speakers’ intuitive formulaicity judgements and show that the ability to form formulaicity judgements is not exclusive to linguistic experts. Of particular relevance to the present research is Wulff’s approach to the definition of formulaic language provided to the layperson native speaker judges in the instructions for the identification task. To suit their level of linguistic knowledge, this definition used simple, everyday language and appropriate examples to convey the requirements of the task. As will be seen later, Wulff’s approach shares some important similarities with the three empirical studies mentioned here because they also involved the use of layperson native speaker judges. What is different, however, is that Wulff’s judges were helped by the research design to focus on only one target idiomatic construction at a time in each sentence. On the other hand, the judges in studies one, two and three were asked to identify all the formulaic sequences in a text without any hint of how many were present (i.e. taking the exhaustive approach). From this perspective, it can be said that the requirements of the empirical studies in this book were more demanding for the layperson judges than those of the Wulff study.

2.4 Chapter summary

To summarize, Chapter 2 has discussed issues concerning the definition of formulaic language (Section 2.1), the criteria for the identification of formulaic language (Section 2.2) and the characteristics of the identification methods used within empirical investigations into the prosody of formulaic language (Section 2.3). These issues were fundamental to the next stage of the investigations into the prosody of formulaic language (studies one and two) and led to improvements in the final study (study three) in which a multimodal approach to native speaker judgement was taken, by allowing judges to listen to the prosody of the formulaic language presented.

3

Can We Identify Formulaic Language Based on Prosodic Cues?

3.1 Introduction

Chapter 1 pointed out that the prosody of formulaic language warrants detailed investigation. This is because a greater understanding of this research area could potentially contribute to the discovery of a solution to the vexing problem of identifying formulaic language in adult speech. Section 2.2 developed this argument by highlighting the role of phonology in the criteria for the identification of formulaic language. Prosodic cues have previously been used to aid the identification of child formulaic language (see Plunkett, 1990). On this basis, a recent hypothesis has been proposed that the prosodic cues should aid the identification of adult formulaic language. This hypothesis will be examined in the following chapters because, at present, sufficient and directly related empirical support is still lacking (see Section 3.3 for details). The aim of Chapter 3, as its title suggests, is to address the question of whether formulaic language can be identified based on prosodic cues. This will be achieved through a careful examination of the speculations and research findings from the relevant literature on the speech of children, language learners and native speakers. This examination of the literature will also highlight the original contribution this book has made in tackling the knowledge gap concerning the use of prosodic cues in the identification of formulaic language. The research questions addressed by all three empirical studies presented here, therefore, are built upon the foundation of the critical review provided in this chapter.

This chapter is divided into five sections. From the first to the final sections, there is a gradual progression in the depth of the discussion presented about the reality of using the phonological method in the identification of formulaic language in adult speech. Section 3.2 presents the implications of the

phonological method for our understanding of the psycholinguistic processing of formulaic language and explains why researchers are particularly attracted by the prospect of the phonological method. Section 3.4.1 then critically examines the assumption based upon which researchers (including Altenberg and Eeg-Olofsson, 1990; Baker and McCarthy, 1988; Wray, 2002, 2004) have derived the hypothesis that formulaic language in adult speech can be identified using prosodic cues. On closer examination, the arguments underlying this hypothesis could be problematic. The three studies in this book should shed empirical light on this hypothesis. Clearly, the successful application of the phonological method to adult speech builds on the foundation of an in-depth understanding of the prosodic features of formulaic language. As mentioned in Chapter 1, anecdotal observations concerning these prosodic features have been made in the literature. Section 3.5 combines these observations with the aim of gaining a more comprehensive picture of the prosody of formulaic language. These observations are classified according to the three components of prosody (intonation, rhythm/tempo and stress) and are discussed in Sections 3.5.1, 3.5.2 and 3.5.3 respectively. Finally, Section 3.6 summarizes this chapter, which provides a theoretical basis for the empirical investigations in studies one (Chapter 4), two (Chapter 5) and three (Chapter 6).

3.2 The phonological method as a window into the psycholinguistic processing of formulaic language

The search for ‘better’ identification methods for formulaic language has been a key issue in formulaic language research (see Wray, 2002). As discussed in Section 2.3, researchers have long recognized the limitations of the two mainstream formulaic language identification methods: the lack of psycholinguistic validity of some of the products of automatic extraction (see also De Cock, 1998; Simpson, 2004) and the inconsistencies between judges in the use of native speaker judgement (see also Erman and Warren, 2000; Wood, 2006; Wray, 2002). Therefore, researchers have been excited by the prospect that the phonological method can aid in the identification of formulaic language (see Wray, 2004). To Wray (2004), the phonological method is, in a sense, ‘superior’ to other identification methods which rely on the orthographic form, meaning or function of word sequences because prosodic cues can provide solid evidence

of the psycholinguistic reality of spoken formulaic language. This optimism concerning the phonological method can be detected in the following quotation from Wray (2004, p. 250), which clearly suggests that the tracking of pauses and intonation is a more reliable criterion for formulaicity than formal fixedness, semantic non-compositionality or situation dependence:1

It holds that there is nothing to prevent any wordstring from being treated formulaically. … If any wordstring can become formulaic, it follows that one can neither guarantee to spot formulaic strings by looking at their form, meaning or usage, nor compile a complete list of them. The identification of formulaicity in this definition is not, however, impossible. Various techniques have been applied, including the tracking of pauses, eye-gaze, intonation and, for written text, fluency in typing. (see Wray, 2002a, Chap. 2 for a review)

There will be a detailed examination of Wray’s (2004) claim that the tracking of pauses and intonation has been applied to the identification of formulaic language in Sections 3.3 and 3.4. If, for now, this claim is assumed to be valid, then the significance of using the phonological method as a test of the psycholinguistic validity of formulaic language will be considerable. The reason for this is that, although there are many existing experimental techniques for revealing the psycholinguistic reality of formulaic language (e.g. eye-tracking, self-paced reading, event-related potential and reaction time measurement), they only deal with written formulaic language, have to be conducted in an experimental setting and involve the use of pre-designed and controlled stimuli. The phonological method, however, makes it possible for researchers to begin to investigate spoken formulaic language in a natural, spontaneous setting. This release from the constraints of the experimental setting is particularly important as it allows a wider range of language genres to be explored and researchers are able to observe the spontaneous use of formulaic language as a dynamic, multi-faceted phenomenon subject to the influence of the interaction of many contextual factors. For this reason, the development of the phonological method could provide an important complement to existing psycholinguistic methods, such as eye-tracking and self-paced reading.

In addition to complementing the existing experimental techniques available for the examination of the psycholinguistic validity of formulaic language, the phonological method should also complement the two mainstream methods of identifying formulaic language: corpus-based automatic extraction and native speaker judgement. As suggested in Section 2.3, both methods face issues with determining the boundaries of formulaic sequences. In Section 2.3.1, ‘I don’t know what’ was given as an example to illustrate how the corpus-based automatic extraction tool WordSmith counts ‘I don’t’, ‘I don’t know’ and ‘I don’t know what’ as three different items, thus failing to make a clear distinction about where the cutoff point lies in this word sequence. With native speaker judgement, the issue is with the judges’ uncertainty about where to place the boundaries in formulaic sequences (see Section 2.3.2). This point is well illustrated by Foster (2001): ‘All seven [judges] reported difficulty in knowing where exactly to mark boundaries of some lexical chunks and stems as one could overlap or even envelop another’ (p. 84). The idea here, with regard to the decision concerning boundaries, is that if native speaker judges are able to take into consideration the phonological evidence, the intonation contours in the original audio recordings may guide them to place the boundaries of formulaic sequences alongside the intonation unit boundaries. As for the issues confronting automatic extraction, the solution suggested is that researchers can apply the phonological method as a filter so that only word sequences that satisfy the predetermined prosodic qualities will count as formulaic sequences. This technique has already been demonstrated in Dahlmann’s (2009) study.
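The boundary problem in automatic extraction described above can be illustrated with a minimal sketch. It assumes a plain count of contiguous word sequences, which is not necessarily how WordSmith works internally; the function name `count_ngrams` and the three toy utterances are invented for illustration and are not data from this book.

```python
from collections import Counter


def count_ngrams(tokens, min_n=2, max_n=4):
    """Count every contiguous word sequence of length min_n..max_n."""
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Invented toy utterances (hypothetical, for illustration only).
utterances = [
    "i don't know what he said",
    "i don't know what to do",
    "i don't know",
]

# Count within each utterance so that no sequence crosses an utterance boundary.
counts = Counter()
for utterance in utterances:
    counts.update(count_ngrams(utterance.split()))

# The three nested sequences are all reported as separate items.
for sequence in ["i don't", "i don't know", "i don't know what"]:
    print(sequence, counts[sequence])
```

Frequency alone ranks all three nested strings as candidates and gives no indication of where the sequence ends, which is the gap that a prosodic filter is meant to address.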

3.3 The evolution of the phonological method

Researchers who suggest the use of the phonological method in formulaic language research (e.g. Aijmer, 1996; Altenberg and Eeg-Olofsson, 1990; Baker and McCarthy, 1988; Moon, 1997; Weinert, 1995; Wood, 2006; Wray, 2002, 2004) often cite a number of empirical studies to provide supporting evidence, such as Pawley and Syder (2000), Peters (1983) and Van Lancker, Canter and Terbeek (1981). There has also been a tendency for some researchers to cross-reference others’ theoretical suggestions as if they provide empirical evidence supporting the validity of their arguments. Gradually, some confusion has developed over the original sources providing empirical support for the use of the phonological method. Many researchers have simply assumed the feasibility of the phonological method without the necessary examination of the empirical evidence from the literature. If this examination is carried out, it should reveal the distinction between the unattested theoretical speculations about the phonological method and the empirical findings. The aim of this section is to trace the origins of the supposed empirical support for the use of the phonological method as a test of the psycholinguistic validity of formulaic sequences and the use of prosodic cues as indicators of

formulaic language boundaries. This exercise reviews the literature on child language (e.g. Engel, 1973; Hickey, 1993; Peters, 1977, 1983; Plunkett, 1990), the speech of dysfluent foreign language learners (e.g. Dechert, 1983; Raupach, 1984) and, finally, the speech of adult native speakers (e.g. Ashby, 2006; Van Lancker, Canter and Terbeek, 1981). The fundamental purpose of this section is to highlight the necessity for empirical research into the extent to which the boundaries of formulaic sequences are indicated by intonation unit boundaries (see study one in Chapter 4), and into the prosodic features of formulaic language in general (see study two in Chapter 5). This necessity for empirical research is presented in Table 3.1, which summarizes, to my knowledge, all of the relevant research pertaining to the phonological method and the prosody of formulaic language. Detailed discussion of these studies will appear in the following subsections. However, as this book is primarily interested in adult native speaker speech, it is important to point out that Table 3.1 clearly shows the deficit in empirical research into all three components of prosody in the formulaic language of adult native speakers, namely intonation, rhythm/tempo and stress. Although both the Erman (2007) and Wray (2004) studies in Table 3.1 presented empirical research into formulaic language in the naturally occurring speech of adult native speakers, their focus was only the distribution of pauses around formulaic sequences and excluded aspects of intonation and stress placement. With this in mind, it seems there is a great need for a study that comprehensively addresses all three components of prosody of formulaic language, a need which the combination of studies one and two aims to fulfil.

3.3.1 Child language

The earliest mention of using prosodic cues to identify formulaic language can be traced back to Peters (1977, 1983), Bloom (1973) and Engel (1973). Peters (1977) observes that her child participant was able to utter sentence-like sequences of sounds with distinctive intonation approximating that of adult speech. Based on this observation, Peters (1983) included phonological coherence as one of the criteria for the identification of formulaic language in child speech. This phonological criterion was further developed in Hickey (1993) and successfully applied by Plunkett (1990). Furthermore, the importance of phonological coherence as a criterion was later cited by Weinert (1995) in her discussion of formulaic language in the speech of second language learners to address the question of how to identify formulaic language (see also Section 3.3.2).

Table 3.1 An overview of the studies on the prosody of formulaic language since the 1970s

Column headings: Empirical research (Experimental data; Naturally occurring data); Theoretical discussions (Introspective data; Speculations; Assumptions)

Child language
Naturally occurring data: Peters (1977) and Engel (1973)

Foreign learner speech
Naturally occurring data: Dechert (1983), Raupach (1984) and Lin and Adolphs (2009)

Adult speech
Experimental data: Van Lancker, Canter and Terbeek (1981)
Naturally occurring data: Erman (2007) and Wray (2004)
Introspective data: Ashby (2006)

Speculations and assumptions
Peters (1983), Hickey (1993), Plunkett (1990) and Wray and Namba (2003); Read and Nation (2004) and Moon (1997); Wood (2006) and Dahlmann (2009)

It is obvious why phonological cues and the phonological criterion are fundamental to the identification of formulaic language in child speech, given that all child language is in the spoken mode. Children segment spoken language input from their carers, memorize useful chunks and reproduce them with the intact prosodic form that they first heard their carers use. This mechanism explains why Peters (1977) observed that her child participant was already producing utterances like ‘open the door’ with distinctive intonation. Peters (1977, pp. 563–4) notes that each target phrase has a very characteristic intonation contour … a ‘melody’ unique enough so that it can be recognized even if rather badly mumbled … although the segmental fidelity was not very great, the combination of number of syllables, stress, intonation, and such segments as could be distinguished combined to give a very good impression of sentencehood.

Peters (1977) noticed that, in fact, her child participant began producing utterances months before he could accurately pronounce the phonemes constituting the utterances. This same observation was also documented by Engel (1973) who reported that her son was humming with sentence-like intonation even before he began babbling. Therefore, Peters (1977) proposed the idea that children are ‘learning the tune before the words’ (p. 563). This idea has found support in the latest research, which has demonstrated that, long before babbling begins, infants are already sensitive to the prosodic features of their first language (Boysson-Bardies, 1999; Mampe et al., 2009; Vihman, 1996). In fact, Hepper (1997) and Shahidullah and Hepper (1994) even suggest that auditory acquisition of prosody starts as early as the third trimester of gestation. All these studies of children’s acquisition of language unanimously point to the fact that child language learners are very sensitive to prosodic cues and are somehow able to exploit them in their language acquisition process. Owing to this book’s special interest in the prosodic form(s) of formulaic language, the speculation is that children’s acquisition of formulaic language is very much facilitated by the unique prosody of formulaic language. It may well be that words constituting formulaic sequences are often grouped together as single prosodic units. Therefore, these prosodic cues guide children to segment words into formulaic sequences as holistic chunks (see Section 7.3 for further discussion). Nonetheless, in relation to the prosodic cues associated with child formulaic language, Bloom (1973, p. 41) observes that the prosodic pattern that distinguished such words said in succession as single word utterances was unmistakable. Each word occurred with terminal falling

pitch contour, and relatively equal stress, and there was a variable but distinct pause between them so that utterance boundaries were clearly marked.

The evidence from Bloom (1973), Engel (1973) and Peters (1977) together provided a concrete empirical basis for Peters (1983) to include a phonological criterion among her six criteria for the identification of child formulaic language. In her original elaboration on the phonological criterion,2 she made a number of key points about its use and constraints. These key points, summarized below, are very important to the understanding of how to apply the phonological criterion to adult native speaker speech:

1. If a formulaic sequence coheres phonologically, it is always produced fluently as a unit with an unbroken intonation contour and no hesitations for encoding;
2. Hesitation pauses are not reliable indicators of the size and nature of encoding units in adult speech (Rosenberg, 1977); and
3. The absence of hesitation (especially pauses) and single intonation contours are two criteria often used in child language research to identify multi-word constructions (e.g. Branigan and Stokes, 1982; Scollon, 1976). Together they provide a ‘good’ clue to some kind of pre-planned psycholinguistic unit, at least for very young children.

Clearly, based on Peters’ original elaboration of the phonological criterion, the applicability of (some parts of) the criterion does not extend beyond child language. Citing Rosenberg (1977), she points out that the tracking of pauses may not be a reliable way of discerning the formulaic status of word sequences in adult speech. Peters’ (1983) argument here contrasts with the approach of Dahlmann (2009) and Erman (2007), who used the distribution of pauses as an indicator of formulaic language in adult speech. In response to these conflicting arguments, the findings of study two of this book should reveal whether pauses can be reliable indicators of formulaic sequence boundaries in adult speech (see Chapter 5).
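The two cues named in Peters’ elaboration, a single intonation contour and the absence of internal pauses, can be expressed as a simple check on a prosodically annotated transcript. The following is only a sketch under stated assumptions: the `Token` fields, the function name `is_phonologically_coherent`, the 0.2-second pause threshold and the toy transcript are all hypothetical and are not taken from Peters (1983) or from the studies in this book.

```python
from dataclasses import dataclass


@dataclass
class Token:
    word: str
    pause_before: float  # silence before the word, in seconds (assumed annotation)
    unit: int            # index of the intonation unit containing the word


def is_phonologically_coherent(tokens, start, end, max_pause=0.2):
    """Return True if tokens[start:end] lie in one intonation unit
    and no word after the first is preceded by a pause >= max_pause."""
    span = tokens[start:end]
    if not span:
        return False
    same_unit = all(t.unit == span[0].unit for t in span)
    no_internal_pause = all(t.pause_before < max_pause for t in span[1:])
    return same_unit and no_internal_pause


# Invented toy transcript (hypothetical values, for illustration only).
tokens = [
    Token("well", 0.0, 0),
    Token("i", 0.45, 1),
    Token("don't", 0.0, 1),
    Token("know", 0.0, 1),
    Token("what", 0.6, 2),
    Token("happened", 0.0, 2),
]

coherent = is_phonologically_coherent(tokens, 1, 4)  # "i don't know"
broken = is_phonologically_coherent(tokens, 1, 5)    # "i don't know what"
print(coherent, broken)
```

A check of this kind only operationalizes the criterion; whether its result is a reliable indicator of formulaicity in adult speech is exactly the empirical question raised above.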

3.3.2 Dysfluent foreign language learners

Regarding the production of formulaic language by foreign language learners, Weinert (1995) represents one of the most important publications because it is believed to be the first to introduce the phonological method for the identification of formulaic language in the speech production of foreign language learners. To

support her argument for the feasibility of this idea, Weinert (1995) cites the work of Peters (1983) and Raupach (1984) as empirical evidence that prosodic cues provide boundary markers for formulaic units. Although both studies are relevant to the discussion of the phonological criterion, their findings were clearly over-interpreted in the Weinert (1995) paper. First, it failed to draw attention to the fact that Peters’ (1983) phonological criterion was designed for the context of child language based on empirical observations of child language data (see Peters, 1977, and also Section 3.3.1); therefore, it is possible that the phonological criterion cannot be applied to the context of second language learners. Secondly, it failed to adequately highlight the fact that Raupach’s (1984) empirical observation did not extend beyond the speech of dysfluent foreign language learners (see discussion below for an elaboration). Although Weinert (1995) cited only Raupach (1984) as evidence that pauses are an indicator of formulaic sequence boundaries, Dechert (1983) has also contributed to the development of the idea that formulaic language in second language learners’ speech can be identified by prosodic cues. Strictly speaking, Dechert (1983) did not suggest that pauses are a reliable indicator of formulaic sequence boundaries. Instead, he only indicated that, after dividing samples of speech from a foreign language learner using non-verbal boundary markers (i.e. falling intonation contours, pauses, speech errors and their corrections, see p. 180), the presence of ‘two types of stretches of speech’ could be observed, ‘those that are marked by hesitations, fillers, drawls, and corrections, and others which run smoothly and fluently’ (p. 183). Included within the type that run smoothly and fluently are ‘native-like utterances’ such as ‘shorter and shorter’ and ‘and in the end’, which Dechert (1983) terms islands of reliability.
Because all of Dechert’s examples of islands of reliability are followed by long pauses, they give the impression that such islands are delineated by long pauses. Yet Dechert (1983) did not assert that pauses could be used in the formulaic language identification process. Raupach’s (1984) approach to the treatment of data was similar to Dechert’s (1983) in that both divided the samples of speech from a foreign language learner into units based on pauses. As Raupach (1984) points out, ‘A formal approach to identifying formula units in spontaneous speech must, as a first step, list the strings which are not interrupted by unfilled pauses’ (p. 116). However, the sequential order in which the data were treated (i.e. formulaic language identification followed the division procedure) potentially reduced the chance of finding formulaic sequences with internal dysfluency phenomena and increased the salience of the notion that formulaic sequences are delineated

The Prosody of Formulaic Sequences

by prosodic boundary markers. In fact, it is possible that the ad hoc definition of formulaic language in that study may, to a certain extent, have driven the finding that formulaic language is readily delineated by prosodic boundary markers. If an investigation is to determine whether prosodic boundary markers do readily indicate the boundaries of formulaic sequences, it is crucial that a policy is established for identifying formulaic language independently before any treatment is given to the raw data. This is essential in order to avoid giving the impression of bias in the analysis. Finally, a close reading of Raupach (1984) makes it clear that Weinert (1995) was only making a selective interpretation of Raupach’s findings relating to the reliability of pauses as a valid indicator of formulaic sequence boundaries. This becomes clear when it is noted that two subsequent points made by Raupach (1984, p. 119) were not presented:

Not all segments produced within the boundaries of hesitation phenomena can be regarded as candidates for formula units as we conceive them. In some cases, the segments can easily be broken up into smaller units; this is sometimes signalled by rising intonation contours within the sequences and becomes even more evident if the cut-off point of the unfilled pauses is lowered to 0.20 sec.

The above quotation highlights two important points: first, not all stretches of speech enclosed by two consecutive pauses are formulaic and, second, temporal features and intonation should be considered together when identifying formulaic language based on prosodic cues because intonation will complement where pauses provide insufficient information. Raupach’s (1984) emphasis on the need to consider both types of prosodic cues indeed echoes Peters’ (1983) comment about the dual criteria necessary in the identification of formulaic language in child speech (see Section 3.3.1). It also shows that Weinert (1995) had offered only half of the real picture when she cited Raupach (1984) in saying that pauses are a reliable indicator of formulaic language. As will be seen later, the importance of considering Peters’ (1983) dual criteria was the motivating factor for study one’s exploration of the extent to which formulaic sequences are delineated by intonation units.
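The dependence of pause-defined units on the chosen cut-off point can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the word timings are invented, the function name is hypothetical, and the higher cut-off of 0.30 sec is an arbitrary comparison value, not a figure reported by Raupach (1984). Only the 0.20 sec value comes from the quotation above.

```python
from typing import List, Tuple

# Each token is (word, start_time_s, end_time_s). The timings are invented.
Token = Tuple[str, float, float]

TOKENS: List[Token] = [
    ("and", 0.00, 0.20), ("in", 0.22, 0.35), ("the", 0.37, 0.45),
    ("end", 0.47, 0.80), ("he", 1.05, 1.15), ("went", 1.17, 1.40),
    ("home", 1.95, 2.30),
]

def pause_defined_units(tokens: List[Token], cutoff: float) -> List[List[str]]:
    """Split the tokens wherever the silence between two words is >= cutoff seconds."""
    units: List[List[str]] = []
    current: List[str] = []
    prev_end = None
    for word, start, end in tokens:
        # A silence of at least `cutoff` closes the unit built so far.
        if prev_end is not None and (start - prev_end) >= cutoff:
            units.append(current)
            current = []
        current.append(word)
        prev_end = end
    if current:
        units.append(current)
    return units

# Assumed higher cut-off: the 0.25 s silence after "end" is ignored.
print(pause_defined_units(TOKENS, 0.30))
# Cut-off lowered to 0.20 s: the same stretch breaks into smaller units.
print(pause_defined_units(TOKENS, 0.20))
```

With the lower cut-off, the first unit splits into two, which is one way of seeing why a stretch of speech enclosed by two pauses need not correspond to a single formulaic sequence.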

3.3.3 Adult speech

Regarding the suggestion that phonological cues can aid in the identification of formulaic language in adult speech, researchers (e.g. Wray, 2002; Wray and Namba, 2003) often cite Van Lancker, Canter and Terbeek (1981). This

is because they misinterpret the empirical findings of this study as indirect evidence supporting the use of the phonological method (see Ashby, 2006). In fact, Van Lancker and her colleagues never suggested that their findings could be interpreted as support for the use of the phonological method. By the same token, Van Lancker, Canter and Terbeek (1981) never suggested that the list of prosodic differences they provided as a result of their investigation represented the prosody of idioms (see endnote 3 for the list of prosodic differences). In fact, it should be made clear that their findings show nothing but how speakers can manipulate prosodic cues in order to signal to listeners that the (marked) literal interpretation of an idiom is intended rather than the (unmarked) idiomatic interpretation. If anything, Van Lancker, Canter and Terbeek’s findings, instead, provided concrete support for Ashby’s (2006) theory of ‘prosodic hand-waving’, concerning the prosody of idioms (see Section 3.5.3 for an elaboration of Ashby’s theory). To cite Van Lancker, Canter and Terbeek (1981) as evidence for the feasibility of the phonological method, as many researchers have, is also problematic because the subject of Van Lancker, Canter and Terbeek’s (1981) study is read-aloud idioms, which are narrowly defined (e.g. semantically non-compositional idioms). Therefore, the extent to which the findings can be generalized to formulaic language in spontaneous speech when broadly defined is questionable. As Chafe’s (2006) study shows, read-aloud speech and spontaneous speech have two different prosodies. That is why, in order to determine the feasibility of the phonological method in the identification of formulaic language, empirical studies that focus directly on formulaic language when broadly defined and in spontaneous speech are needed. This need has been fulfilled in the present studies.
Apart from the Van Lancker, Canter and Terbeek (1981) study, other research which is often cited as support for the use of the phonological method in adult speech includes Peters (1983), Baker and McCarthy (1988), Altenberg and Eeg-Olofsson (1990) and Pawley and Syder (2000). From a close examination of these researchers’ own words (see below), it is clear that both the Baker and McCarthy (1988) study and the Altenberg and Eeg-Olofsson (1990) study were only speculating about using prosodic cues to identify formulaic language rather than providing empirical evidence to highlight the validity of their speculation.

MWUs may be tested by examining whether or not they are amenable to crossing the boundaries of tone-units. … MWUs up to clause-length normally occupy one tone-unit and only one. … Some long MWUs, such as sayings and proverbs may span more than one clause, and thus, characteristically, are realised with more than one tone-unit: … Generally, MWUs will display fixed intonational contours over a wide range of occurrences. (Baker and McCarthy, 1988, pp. 14–15)

and

Throughout Phase 3 [of the research project reported], the prosody of the combinations will play an important part in the analysis. Phenomena like pausing and hesitation will also be considered, since these are likely to provide further information about the degree of lexicalization of the different collocations and about their psycholinguistic role in the speech process (hesitations and tone-unit boundaries, for example, can be expected to delimit rather than interrupt a prefabricated expression retrieved from memory). (Altenberg and Eeg-Olofsson, 1990, p. 18)

Pawley and Syder (2000) also commented on prosodic features, although their comments (on p. 173) concern the prosodic features associated with intonation unit boundaries rather than the prosodic features associated with formulaic language in spontaneous adult speech. In the end, the reference to the Pawley and Syder (2000) study in Wray (2002, p. 35) may be a misleading citation.4

So far Section 3.3 has offered a critical review of the origins and the development of suggestions concerning the use of the phonological method in the literature on child language, foreign language learners and adult native speakers. The aim of this discussion has been to establish which parts of these suggestions are firmly grounded in empirical research and which are merely speculations or hypotheses. The clarification of this confusion concerning the feasibility of the phonological method in spontaneous adult speech is important because it demonstrates the need for the three empirical studies of this book in order for real progress to be made in investigating this phenomenon. As discussed in this section, there are indeed certain misinterpretations and overgeneralizations concerning the subject, which involve:

1. The failure to recognize that the empirical findings supporting the use of the phonological method to identify formulaic language in child language may not be transferable to adult speech;
2. The confusion of speculations/predictions with empirically validated findings;
3. The misinterpretation of the findings/observations of the Van Lancker, Canter and Terbeek (1981) study and the Pawley and Syder (2000) study as evidence supporting the use of the phonological method in identifying formulaic language in adult speech; and
4. The over-generalization of empirical findings based on idioms when narrowly defined and/or in a read-aloud speech setting to the case of formulaic language when broadly defined and/or in a spontaneous speech setting.

3.4 The relationship between prosodic cues and holistic processing

The hypothesis that formulaic language demonstrates certain prosodic features, including alignment with intonation units and an absence of internal dysfluency phenomena, is often based on two unstated assumptions: 1) formulaic language is processed as holistic units in the mental lexicon, and 2) psycholinguistic processes can be deduced from the prosodic cues produced in the output of that processing (see Section 3.4.1). A certain circularity in this argument is noticeable, however. On the one hand, researchers believe that formulaic language should demonstrate certain prosodic features in accordance with its assumed nature of holistic storage and retrieval; yet, on the other hand, the prosodic features of formulaic language are highlighted as proof of the existence of holistic storage and retrieval (see Section 3.2). Formulaic language researchers often have to choose to side with one of two opposing positions: they need to either interpret the prosodic cues as a result of the holistic processing of formulaic language (as in Altenberg and Eeg-Olofsson, 1990; Baker and McCarthy, 1988; Moon, 1997), or use the prosodic cues to qualify or validate the formulaicity of wordstrings (as in Dahlmann, 2009; Wray, 2004). The position of this book is somewhere between these two positions. It supports the notion that prosodic cues can provide some indication of whether formulaic language is processed as chunks, but, alone, they are not sensitive enough to indicate whether formulaic language is holistically stored and retrieved. Moreover, when it comes to the use of prosodic cues as evidence for psycholinguistic processing, a distinction should be made between holistic storage and holistic processing (see Section 3.4.1 for an elaboration).
If a certain formulaic sequence is found to form a single intonation unit and is resistant to internal dysfluency phenomena, then prosodic evidence adds to the validity of the claim that the sequence is holistically processed. However, the formulaicity

of a wordstring should not be dismissed on the basis that it does not always form a single intonation unit and is not always resistant to internal dysfluency phenomena. That said, it should be pointed out that the validity of these predictions concerning the formation of single intonation units and resistance to internal dysfluency phenomena as applied to adult speech is still the subject of empirical investigation. Therefore, until a more comprehensive picture of the prosody of formulaic language has been developed, it is unwise to qualify or dismiss the formulaicity of wordstrings on the sole basis of prosodic cues.

3.4.1 The theory of holistic storage as the psycholinguistic underpinning of the prosodic features of formulaic language

In recent years, the theory of holistic storage has become a widely accepted explanation for all of the formal, structural, semantic and syntactic behaviour of formulaic language, with support for its validity found in the cases of very young child first language learners and aphasic patients (see Wray, 2002 for a review). In the case of very young child language learners, their ability to produce complete sentences which are clearly beyond their linguistic competence cannot be explained by any other theory than that they have memorized these sentences as chunks from the spoken input of their adult carers (see e.g. Peters, 1977). In the case of some patients who became aphasic after sustaining brain damage due to a stroke, accident or disease, the fact that they retain storage of a number of holistic phrases in the absence of any other language behaviour also provides incontrovertible evidence for the idea that holistic phrases are stored holistically and separately in the mental lexicon (see Van Lancker and Canter, 1981). The concern here, however, is not to argue about whether holistic storage exists, but whether prosodic cues are sufficient to provide evidence for the holistic storage of wordstrings. This concern is obviously relevant to the discussion in this chapter because it is necessary to establish exactly what level of information can be provided by the phonological method if it is used to aid in the identification or validation of formulaicity in spontaneous speech. In fact, this book argues that showing the existence of holistic storage requires a highly precise level of information which is beyond what the phonological method can provide.
All that the phonological method (in particular, by tracking intonation units) can show is whether a wordstring has been processed as a chunk, a position which is consistent with the way prosodic evidence has been interpreted in the psycholinguistic literature.

To illustrate why proof of holistic storage requires a higher level of precision of method than holistic processing, the analogy of a stack of paper is useful. Imagine a colleague bringing in a stack of journal papers which you have requested. If he delivers the whole stack of papers in one pile, then ‘a stack’ is a processing unit. Within this stack, there are stapled sets of sheets and loose single sheets. In this case, each detached item, be it a sheet or a stapled set of sheets, is a storage unit. Obviously, because a processing unit can be made up of one or more storage units, it can be said that a storage unit is a more precise unit than a processing unit. If the information required is at the level of holistic processing, to say that the stack is delivered as one pile will suffice. However, if the information required is at the level of holistic storage, unless the stack of paper can be turned over and searched, it will be impossible to accurately establish exactly how many storage units there are within the stack. In the case of prosodic cues, the fact that a wordstring forms a single intonation unit tells us that the wordstring is a processing unit. The question of how many single-word and multi-word items together make up a processing unit remains unanswered. An intonation unit as a processing unit may contain one or more (single or) multi-word items which are storage units. However, this level of information does not become available by considering the prosodic cues alone. In psycholinguistics, prosodic cues provide evidence of the process of chunking (see Cutler, Dahan and van Donselaar, 1997; Warren, 1996, 1999 for reviews of the link between prosody and language processing). Because there is a limit to the online processing capacity of the working memory at one time (Miller, 1956), language information has to be processed in chunks. 
Researchers such as Chafe (1987) associate the processing capacity of the working memory with the length of an intonation unit by suggesting that an intonation unit is ‘the expression of a single focus of consciousness’ (Chafe, 1987, p. 32). This concept is what he terms ‘the cognitive basis of an intonation unit’ (Chafe, 1987). In psycholinguistic literature, the concept of chunking concerns the holistic processing of language rather than holistic storage. By the same token, the prosodic cues which are used as indicators of chunking also concern the holistic processing (not the holistic storage) of language. Unless formulaic language researchers are proposing a different, more advanced way of using prosodic cues as indicators of language processing, it is simply not accurate to suggest that the holistic storage of formulaic language can be examined via the tracking of prosodic cues alone. It is also crucial that we recognize the basic fact that an intonation unit, which, according to Chafe (1987), reflects a processing unit, can

be equivalent to one or more formulaic sequences and/or single-word units (cf. Raupach, 1984). For that reason, expectations should be adjusted regarding the application of the phonological method to adult speech. The reality is that, given the complex situation of language processing in the adult linguistic system, the formulaicity of a wordstring cannot be categorically established on the sole basis of its forming single intonation units or its resistance to internal dysfluency phenomena.5 Put simply, before applying the phonological method to a target wordstring, other logical reasons or criteria (see Section 2.2) should exist to persuade researchers that the wordstring could be formulaic. If the application of the phonological method is not supported by other evidence (e.g. semantic non-compositionality and formal fixedness), even if the wordstring possesses all of the necessary prosodic cues associated with formulaic language, it cannot be guaranteed that the wordstring is formulaic (see Section 3.5 and Chapter 5 for a discussion of these prosodic cues).
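The point that an intonation unit may be equivalent to one formulaic sequence, contain several items, or cut across a sequence can be made concrete with a small sketch. It is only an illustration under stated assumptions: the spans are invented word-index ranges, and the labels ‘coextensive’, ‘contained’ and ‘split’ are hypothetical names introduced here, not categories used in the studies of this book.

```python
from typing import List, Tuple

# A span is a half-open range of word indices: (start, end).
Span = Tuple[int, int]

def classify(sequence: Span, intonation_units: List[Span]) -> str:
    """Describe how a candidate formulaic sequence sits relative to intonation units."""
    start, end = sequence
    for unit_start, unit_end in intonation_units:
        if (start, end) == (unit_start, unit_end):
            return "coextensive"  # the sequence is exactly one intonation unit
        if unit_start <= start and end <= unit_end:
            return "contained"    # the unit holds the sequence plus other material
    return "split"                # the sequence crosses an intonation unit boundary

# Invented example: eleven words divided into two intonation units.
UNITS: List[Span] = [(0, 5), (5, 11)]

print(classify((0, 5), UNITS))   # coextensive
print(classify((8, 11), UNITS))  # contained
print(classify((3, 7), UNITS))   # split
```

Note that the classification presupposes that the candidate sequence has already been identified on independent grounds. Given the intonation units alone, nothing in the sketch can say which of the words inside a unit belong to a formulaic sequence.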

3.4.2 The difference between pauses and intonation unit boundaries

The discussion so far might give the impression that all prosodic cues are the same. In fact, Knowles (1991) has long suggested that pauses and intonation unit boundaries, as two distinct forms of prosodic discontinuities, hold different positions in the hierarchy of prosodic discontinuities (see Table 3.2). In the context of the present discussion, this concept of prosodic discontinuities helps us to see the varying levels of psycholinguistic significance that prosodic cues demonstrate as potential indicators of formulaic sequence boundaries. Put explicitly, the fact that intonation unit boundaries are lower in the hierarchy than

Table 3.2 A hierarchy of prosodic discontinuities (Knowles, 1991, p. 153)
5 pause accompanied by audible breathing
4 pause
3 pitch discontinuity
2 segmental separation features
1 segmental run-on cancelled
0 nothing measurable

pauses means that intonation unit boundaries should occur more frequently than pauses in spontaneous speech, and that an intonation unit is a smaller unit than a pause-defined unit. To understand the concept of the hierarchy of prosodic discontinuities, one has to first appreciate that spontaneous speech is just a long stream of sounds. It is necessary to divide the continuous stream of sounds into smaller chunks because of the limited capacity of the working memory. Between two neighbouring chunks is a juncture (see Hockett, 1958; Lehiste, 1960; Trager and Bloch, 1941 for the original definition of juncture in phonology) which is indicated by various prosodic cues to signal prosodic discontinuities. The most well-known prosodic discontinuity is certainly the unfilled pause, but there are also other types that can contribute to a listener’s perception of a break in prosody. These include, for example, the pitch reset, syllable-lengthening, anacrusis (i.e. the increased rate of articulation which spans the first few words of an intonation unit) and the removal of possible elision between two adjacent words. Knowles (1991) groups these prosodic discontinuities into three categories: temporal, pitch and segmental.6 The remarkable thing about Knowles’s (1991) suggestion, however, is the idea that there is a hierarchical relationship between the three categories of prosodic discontinuities (see Table 3.2). The weakest prosodic discontinuity is the segmental, which concerns the cancellation or removal of features normal in connected speech (level 1). The strongest prosodic discontinuity is a breathing pause (level 5). In this hierarchy, there is an implicit assumption that when one perceives a level-3 discontinuity, segmental separation features (level-2 discontinuity) and the cancellation of segmental connected speech features (level-1 discontinuity) will be present as well.
By the same token, when one perceives a level-5 discontinuity, all five levels of prosodic discontinuities will be present. Knowles’s (1991) purpose in devising this hierarchy of prosodic discontinuities was to allow an even more detailed transcription of speech to indicate the ‘strength’ of the prosodic breaks that transcribers perceive. The idea of introducing a hierarchy of prosodic breaks and that of transcribing the strength of breaks is also captured in the Tones and Break Index (ToBI) prosody analysis system (Beckman and Elam, 1997; Beckman, Hirschberg and Shattuck-Hufnagel, 2005; Pierrehumbert and Hirschberg, 1990), which is widely used among American phonologists. This discussion of the hierarchical relationship between the types of prosodic discontinuities was the motivation behind study one, which set out to investigate

the extent to which formulaic language in the fluent speech of adult native speakers and proficient foreign language learners aligned with intonation unit boundaries instead of pauses. Based on Knowles’s theory about the hierarchy of prosodic discontinuities, it can be deduced that intonation unit boundaries should occur more frequently than pauses, and intonation units should be smaller and more precise than pause-defined units.7 That is why it is more likely that formulaic language in fluent spontaneous speech will be delineated by intonation unit boundaries than pauses. Nevertheless, there are other, more important, reasons why study one focused on intonation units instead of pauses. These reasons will be presented, in detail, in Chapter 4.
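The deduction drawn from Knowles’s hierarchy can be sketched as follows. The level names follow Table 3.2, but the rest is assumed for illustration: the example words and juncture strengths are invented, the function names are hypothetical, and treating an intonation unit boundary as any juncture of level 3 or above is a simplifying assumption rather than Knowles’s own definition.

```python
from enum import IntEnum
from typing import List

class Discontinuity(IntEnum):
    """Levels of prosodic discontinuity, numbered as in Table 3.2."""
    NOTHING_MEASURABLE = 0
    RUN_ON_CANCELLED = 1
    SEGMENTAL_SEPARATION = 2
    PITCH_DISCONTINUITY = 3
    PAUSE = 4
    BREATHING_PAUSE = 5

def implied_levels(level: int) -> List[Discontinuity]:
    """Perceiving a level-n break implies that levels 1 to n are all present."""
    return [Discontinuity(n) for n in range(1, level + 1)]

def segment(words: List[str], strengths: List[int], threshold: int) -> List[List[str]]:
    """Cut after words[i] whenever the juncture strengths[i] reaches the threshold.

    strengths must have one entry per gap, i.e. len(words) - 1 entries.
    """
    units: List[List[str]] = []
    current: List[str] = []
    for i, word in enumerate(words):
        current.append(word)
        is_last = i == len(words) - 1
        if is_last or strengths[i] >= threshold:
            units.append(current)
            current = []
    return units

# Invented example: juncture strength after each word except the last.
WORDS = "you know what I mean it was getting shorter and shorter".split()
STRENGTHS = [0, 3, 0, 0, 4, 0, 2, 3, 0, 0]

intonation_units = segment(WORDS, STRENGTHS, Discontinuity.PITCH_DISCONTINUITY)
pause_units = segment(WORDS, STRENGTHS, Discontinuity.PAUSE)
print(len(intonation_units), len(pause_units))  # 4 2
```

Because every juncture that counts as a pause also counts as a pitch discontinuity under this assumption, lowering the threshold can only add boundaries. Each pause-defined unit is therefore made up of one or more whole intonation units, never the reverse.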

3.5 Prosodic cues associated with formulaic language

This section reviews the prosodic cues associated with formulaic language with a particular emphasis on adult speech. Such a literature review is useful particularly for study two, which aimed to reveal whether the hypotheses highlighted in the literature concerning the prosody of formulaic language were valid, and whether observations based on read-aloud idioms can be generalized to formulaic language broadly defined in spontaneous speech. English prosody consists of three components: intonation, tempo/rhythm and stress (see Cruttenden, 1997 for a review of the fundamentals of English prosody). Some phonologists (e.g. Lindström, 1978; Tench, 1996; Wells, 2006) prefer to approach the suprasegmental aspect of the English language from the perspective of ‘intonation’ broadly defined. In this case, the components to study become ‘tonicity’, ‘tonality’ and ‘tones’ (see Tench, 1996 for an elaboration of tonicity, tonality and tones).8 Although the two systems represent two different perspectives on the suprasegmental aspect of the English language, there is considerable overlap in their content. The following review of the prosodic cues associated with formulaic language is divided into three parts according to the three components of English prosody (i.e. intonation, tempo/rhythm and stress).

3.5.1 Intonation

As mentioned earlier, many researchers (e.g. Aijmer, 1996; Altenberg and Eeg-Olofsson, 1990; Baker and McCarthy, 1988; Moon, 1997; Wray, 2004) speculate

Table 3.3 A summary of relevant quotes about the claim that formulaic language forms single intonation units

Aijmer (1996, p. 14): Prosodic fixedness. Prosodically, conversational routines correspond to tone units. Tone unit boundaries and hesitation pauses within the tone unit therefore provide additional clues about the degree of fixedness of a conversational routine (see Altenberg and Eeg-Olofsson, 1990, p. 18; Peters, 1983, p. 10).

Altenberg and Eeg-Olofsson (1990, p. 18): Throughout Phase 3 [of their project discussed in the paper], the prosody of the combinations will play an important part in the analysis. Phenomena like pausing and hesitation will also be considered, since these are likely to provide further information about the degree of lexicalization of the different collocations and about their psycholinguistic role in the speech process (hesitations and tone-unit boundaries, for example, can be expected to delimit rather than interrupt a prefabricated expression retrieved from memory).

Baker and McCarthy (1988, pp. 14–15): MWUs may be tested by examining whether or not they are amenable to crossing the boundaries of tone-units. … MWUs up to clause-length normally occupy one tone-unit and only one. … Some long MWUs, such as sayings and proverbs may span more than one clause, and thus, characteristically, are realised with more than one tone-unit: … Generally, MWUs will display fixed intonational contours over a wide range of occurrences.

Moon (1997, p. 44): These three criteria [institutionalization, fixedness and non-compositionality] operate together – in spoken English, in conjunction with a phonological criterion where multiword items often form single tone units. The criteria are not absolutes but variables, and they are present in differing degrees in each multi-word unit.

Wray (2004, p. 250): It holds that there is nothing to prevent any wordstring from being treated formulaically. … If any wordstring can become formulaic, it follows that one can neither guarantee to spot formulaic strings by looking at their form, meaning or usage, nor compile a complete list of them. The identification of formulaicity in this definition is not, however, impossible. Various techniques have been applied, including the tracking of pauses, eye-gaze, intonation and, for written text, fluency in typing (see Wray, 2002a, Chap. 2 for a review).

that adult formulaic language forms single intonation units. Table 3.3 summarizes excerpts from the original publications that discuss this claim (some of these excerpts have also appeared in previous sections). The juxtaposition of these original quotes allows a comparison of the various elaborations of the suggestion that formulaic language forms single intonation units. In this regard, there are four main points that particularly stand out.

First, both Aijmer (1996) and Altenberg and Eeg-Olofsson (1990) mention ‘the degree of fixedness’ and ‘the degree of lexicalization’ and suggest that prosodic cues can provide additional clues about them. However, they do not specify the details about how this would work and how researchers might objectively measure the degree of lexicalization or fixedness. Although Van Lancker (1987, p. 56) has provided an intuitive and speculative ranking of the types of formulaic language according to their degree of formulaicity, no researcher has yet proposed a convincing and objective way of measuring and comparing the degree of fixedness of actual formulaic sequences (not types of formulaic language) which can vary to a great extent in terms of grammatical structure, semantic transparency, context of use, pragmatic function and so on.9 The topic of the measurement of the degree of fixedness will be revisited in Sections 4.7.2 and 4.11 in relation to the third sub-question addressed in study one. Secondly, as mentioned in Section 3.2, Wray’s (2004) explanation (see Table 3.3) suggests that the method of tracking pauses and intonation is a more reliable criterion for formulaicity than formal fixedness, semantic non-compositionality or situation dependence. According to Aijmer (1996, p. 15), Keller (1981) also highlights the superiority of the phonological method. Thirdly, both Baker and McCarthy (1988) and Moon (1997) made allowances, in their predictions, for cases in which formulaic language did not form single intonation units. Moon (1997) specifies that the phonological criterion is not absolute: formulaic sequences can vary in the degree to which they observe the phonological criterion, and she was careful to suggest that multi-word units ‘often’ (instead of ‘always’) form single intonation units. Baker and McCarthy (1988) also made allowance for the case of ‘long MWUs, such as sayings and proverbs [which] may span more than one clause’ (Baker and McCarthy, 1988, p. 15) and suggest that they are ‘characteristically … realised with more than one tone-unit’ (Baker and McCarthy, 1988). Finally, both Aijmer (1996) and Baker and McCarthy (1988) raise an argument about ‘prosodic fixedness’, claiming that ‘MWUs will display fixed intonational contours over a wide range of occurrences’ (Baker and McCarthy, 1988, p. 15).

3.5.2 Pauses and other temporal features

Compared with intonation, pauses receive more attention in the formulaic language literature. Researchers believe that formulaic language, at least in

the case of learner speech, is delineated by pauses (Dechert, 1983; Raupach, 1984; Weinert, 1995) and contains fewer internal pauses or other dysfluency phenomena (Pawley, 1985; Wray, 2002, 2004). The motivation for linking pauses and intonation with formulaic language is the same. As Dechert (1983) states, non-verbal boundary markers including falling intonation contours and pauses, as well as speech errors and their corrections, are taken as ‘indicators of planning’ (p. 180). This belief is found not only in the Dechert (1980, 1983) studies, but also in the works of Butterworth (1975), Raupach (1984), Chafe (1987, 1988a,b) and Butler (1997). Many researchers of spoken language who have followed this approach first break down the texts to be analysed into chunks that are not interrupted by unfilled pauses (as Raupach, 1984, and Dechert, 1983, did) or by breaks in intonation contour (as Chafe, 1987, 1988a,b, did).10 This treatment of the data before the analysis, as Dechert’s (1983) study demonstrates, assists researchers in claiming that there are two types of stretches of speech: those that are marked by hesitations, fillers, drawls (i.e. unnatural lengthening of syllables) and corrections, and others which run smoothly and fluently. This latter type of stretch, which plays an important role in learners’ development of fluency in their second language, is what Dechert (1983) calls an ‘island of reliability’ (however, see Section 3.3.2 for a discussion about the limitations of this treatment). The point of the above observations is clear: formulaic language is linked to the location of pauses. For a speaker who struggles to produce fluent speech and has to pause or hesitate frequently, if he or she uses a memorized chunk, pauses and hesitation will be markedly fewer. However, immediately after the fluent stretch of speech constituted by formulaic language, dysfluent speech with frequent pauses and hesitation phenomena will be resumed.
That is the reason why, at the switching points between formulaic language and rule-based language, pauses and hesitations are expected. This is the underlying logic of the speculation by Dechert (1983), Raupach (1984), Weinert (1995) and Wray (2004) that formulaic language will be delineated by pauses and there will be fewer pauses or dysfluency phenomena within formulaic language. Apart from pause distribution, in the literature, formulaic language is often associated with high speech rate and articulation rate. To explain the motivation for such an association, the discourse (also the macro) perspective and the psycholinguistic (also the micro) perspective will be useful. From a discourse perspective, formulaic language promotes fluency not only in the speech of very dysfluent foreign language learners but also in highly fluent speech such as that of disc jockeys, sports commentators and auctioneers.


Kuiper and his colleagues (Haggo and Kuiper, 1985; Kuiper and Austin, 1990; Kuiper and Haggo, 1984, 1985) observed that the speech fluency of live sports commentators and livestock auctioneers could be attributed to their adept use of formulaic speech. The nature of these genres puts considerable pressure on the working memory of the performers; as Fillmore (1979) observes, disc jockeys and sports commentators have ‘the ability to talk at length with few pauses, the ability to fill time with talk [and do] not have to stop many times to think of what to say next or how to phrase it’ (p. 93). The key to easing the load on working memory and sustaining such fluent performance is to rely on formulaic language.

From a psycholinguistic perspective, formulaic language is associated with not only fewer occurrences of pauses and other dysfluency phenomena, but also a higher rate of articulation. Some formulaic sequences are highly recurrent in our everyday speech, and the research of Bybee and her colleagues (see e.g. Bush, 2001; Bybee, 2001, 2002, 2006; Bybee and Scheibman, 1999) has shown that frequent repetition of certain word pairs can lead to assimilation, reduction, deletion or linking of the phonemes that constitute the words. These phonemic phenomena explain why the rate of articulation of formulaic language is expected to be higher than that of general speech. There are other, less obvious explanations for the assumed higher articulation rate of formulaic language. For instance, the time it takes to retrieve and process formulaic language is shorter because of the processing advantage of formulaic language (see Conklin and Schmitt, 2007; Gibbs et al., 1997; Underwood, Schmitt and Galpin, 2004). In addition, it is speculated here that when formulaic language acts as a sentence stem, it will have a higher articulation rate.
This theory, which awaits empirical examination, begins with the assumption that many common ways of starting an utterance are actually formulaic. As the beginning of an utterance is also the beginning of an intonation unit, the class of formulaic sequences which are sentence stems is expected to occupy the beginning of an intonation unit. In the prosody literature, it is well documented that the first few words of intonation units tend to be articulated faster and the last few words tend to be articulated more slowly (Cruttenden, 1997; Pawley and Syder, 2000); Pawley and Syder (2000) called this the ‘surge-and-fade’ pattern of intonation units. Therefore, it is believed that the cause of the higher rate of articulation in this class of formulaic sequences is the inherent bias of their position in utterances (see Cortes and Csomay, 2007, for a discussion about the positioning of lexical bundles in university lectures). Nonetheless, the prediction that formulaic sequences functioning as sentence stems will have a higher rate of articulation implies that different types of formulaic sequences may vary in their prosodic features. This is an intriguing suggestion that awaits empirical examination.
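The difference between speech rate and articulation rate, on which this discussion depends, can be made concrete with a small sketch. The definitions used here are assumptions, since they are not spelled out in the text: speech rate is taken as syllables per second over the whole stretch including pauses, and articulation rate as syllables per second of phonation time only. The `Stretch` class and the two sample stretches are hypothetical illustrations, not data from any study.

```python
from dataclasses import dataclass


@dataclass
class Stretch:
    """A timed stretch of speech (all values are hypothetical inputs)."""
    syllables: int
    phonation_s: float  # seconds actually spent articulating
    pause_s: float      # seconds of silent or filled pauses


def speech_rate(stretch: Stretch) -> float:
    # Assumed definition: syllables per second, pause time included.
    return stretch.syllables / (stretch.phonation_s + stretch.pause_s)


def articulation_rate(stretch: Stretch) -> float:
    # Assumed definition: syllables per second, pause time excluded.
    return stretch.syllables / stretch.phonation_s


# Two invented stretches with the same number of syllables.
formulaic = Stretch(syllables=12, phonation_s=2.0, pause_s=0.0)
non_formulaic = Stretch(syllables=12, phonation_s=3.0, pause_s=1.0)

for label, stretch in (("formulaic", formulaic), ("non-formulaic", non_formulaic)):
    print(label, speech_rate(stretch), articulation_rate(stretch))
```

Under these assumptions, fewer internal pauses raise the speech rate only, whereas phonemic reduction shortens phonation time and so raises both measures. This is why the two rates are treated as separate cues.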

3.5.3 Stress placement

The placement of stress or nuclear accent within an intonation unit is a topic that has attracted considerable attention among phonologists. As discussed in Section 1.1, phonologists including Ashby (2006) and Wells (2006) have documented various observations about the peculiarity of stress placement within idiomatic/formulaic language (see Table 3.4 for selected observations in Wells, 2006). Wells (2006) in particular asserts that ‘some instances of a speaker accenting repeated words do not seem to have a logical explanation, and must be regarded as idiomatic’ (p. 179). While Wells’s (2006) observations about stress placement in idiomatic language may be rather anecdotal, Ashby’s (2006) observations represent a more systematic approach to the issue. Therefore, this section covering the prosodic cues associated with formulaic language focuses on Ashby’s (2006) observations on idioms narrowly defined and how they can be extended to formulaic language broadly defined.

Ashby’s (2006) aim was to describe the rules governing stress placement in semantically opaque idioms. Among the observations he made based on introspective data, two points are particularly worth noting. First, he suggested that stress placement in idioms can be summed up in a single rule: the avoidance of narrow focus within what he terms ‘the non-compositional part of an idiom’.11 Deviation from this rule, that is, the introduction of narrow focus within the non-compositional part of an idiom, is a signal to the listener that a meaning other than the obvious is intended (i.e. prosodic handwaving). This rule convincingly explains the findings of the seminal Van Lancker, Canter and Terbeek (1981) study and the mechanism of some English jokes which involve the literal interpretation of idioms.
The second important contribution of the Ashby (2006) study concerns the identification of three classes of idioms, each of which has a different prosodic pattern. Case (i) idioms have the same stress placement pattern as their literal counterparts, which are formed by replacing constituent words of the original idioms. Examples are to have a CHIP on one’s shoulder versus to have a BOW on one’s shoulder, and to have eyes in the back of one’s HEAD versus to have dirt in the back of one’s HEAD. Case (ii) idioms, on the other hand, have a stress pattern different from that of the literal reading of the same idioms, for example POUR down (idiomatic) versus pour DOWN (literal), and be ROLLing in money (idiomatic) versus be rolling in MONEY (literal). Case (iii) idioms have a highly fixed prosodic pattern: even the tone choice in the idiom is fixed, or at least highly constrained. Examples given by Ashby are the unmarked version (3a) and the marked version (3b).

(3a) I could eat a \HORSE (falling tone)
(3b) I could eat a \/HORSE (falling–rising tone)

Among the three classes of idioms that Ashby identified, Case (ii) idioms are the least surprising because the common belief is that the prosody of formulaic language is unique and distinct from that of general spoken English. This common belief has found support in the many examples discussed in Section 1.1 as well as the list of observations from Wells (2006) given in Table 3.4. However, the existence of Case (i) idioms shows that there is some overlap in the rules governing stress placement in idioms and in general spoken English. In other words, certain stress placement behaviour of formulaic language may also be explained by general rules. Wells (2006, p. 172) gave the examples below to demonstrate that the fixed tonicity of idiomatic language can sometimes be explained simply by the general English prosodic rule which states that the nucleus is placed on a noun wherever possible in preference to other word classes.

'Onions make my 'eyes water. (= make me shed tears)
You’re 'going to get your 'fingers burnt. (= suffer unpleasant consequences)
She’s 'got a 'screw loose. (= is crazy)
Let’s 'wait for the 'dust to settle. (= till things calm down)
'Wait and see which way the 'wind is blowing. (= what’s going to happen)
She looked like 'something the 'cat had brought in. (= very untidy)
'Keep your 'fingers crossed! (= let’s hope something good happens)
We can 'go on 'asking | till the 'cows come home. (= for ever)
It 'made my 'hair stand on end. (= frightened me)
They 'got on like a 'house on fire. (= quickly established a good relationship)
He’ll 'have his 'work cut out! (= it will be difficult for him to do)

Table 3.4 A summary of selected observations in Wells (2006) which show the peculiarity of stress placement in idiomatic language

Pronouns
General prosodic rule: We do not usually accent a personal pronoun except: 1) if it is placed in contrastive focus; 2) if it is subject and still in focus; 3) if it is the complement of the verb to be; 4) if it as a pronoun has an ellipted verb; 5) if it is a possessive pronoun in clause-final position
Prosodic pattern of idiomatic language: 1) There are various idiomatic usages in which the accent is put on the pronoun despite there being no obvious contrast with other items 2) A number of idioms have fixed tonicity
In Wells (2006): Section 3.11

Empty words
General prosodic rule: Vague general usages of nouns or noun phrases such as things, people, the man and that woman are usually not accented
Prosodic pattern of idiomatic language: Vague general usages of some in several idiomatic expressions are accented (often with a fall–rise tone) while the following noun is not
In Wells (2006): Section 3.20

Contrastive focus
General prosodic rule: We accent items placed in contrastive focus
Prosodic pattern of idiomatic language: Repeated focus is placed on said although what and way are contrastive: It’s 'not what he \/said, | it’s the 'way that he \said it
In Wells (2006): Section 3.32

Adverbials
General prosodic rule: 1) Adverbs and adverbial phrases that qualify a whole clause or sentence (i.e. disjuncts) often form single intonation units 2) A distinction is made between two types of tones on the disjuncts: the limiting non-fall and the reinforcing fall
Prosodic pattern of idiomatic language: 1) Some disjuncts (e.g. at least, at any rate, by the way and incidentally) regularly take a falling tone even though they are not obviously reinforcing 2) These disjuncts do not form a separate intonation unit when they are put after the main clause
In Wells (2006): Section 2.23

While Ashby (2006) and Wells (2006) together have provided an innovative account of stress placement in idiomatic language, there are issues with applying their observations to formulaic language broadly defined. For instance, Ashby suggests that if a narrow focus is introduced within the non-compositional part of an idiom, the deviant interpretation of the idiom may be intended. However, many formulaic sequences are semantically compositional, so for them no ‘non-compositional part’ exists. If Ashby’s theory is to be applied to formulaic language broadly defined, it needs to undergo significant modifications. Because there is very little empirical work on the stress pattern of formulaic language (except study two), only speculations can be made at this stage until the empirical studies of this book have been presented. Nevertheless, it is clear that stress placement in formulaic sequences is not random, just as that of general spoken English is not random. The following examples (4a), (4b), (5a) and (5b) illustrate this point:

(4a) It rains a LOT in the UK.
*(4b) It rains A lot in the UK.
(5) I haven’t seen you for ages.
(5a) I haven’t seen you for AGES.
?(5b) I haven’t seen YOU for AGES.

As in general spoken English, contrastive meanings can result from stressing different words within a formulaic sequence. When lot is stressed, as in (4a), a lot means a large amount; when the indefinite article a is stressed, as in (4b), a lot would have to mean a physical lot of rain, which is semantically nonsensical. (5) is a common opening line in English conversation. By stressing ages in (5a), broad focus is introduced to the whole utterance (see Cruttenden, 1997, for the definition of broad focus). This reading of the formulaic sequence with broad focus conveys the typical, unmarked meaning of the utterance. By stressing you and ages in (5b), narrow focus is introduced to the utterance. This emphasis on you will result in the utterance being marked. Between (4a) and (4b), and between (5a) and (5b), it is obvious what the ‘right’ choices of stress pattern are.
The point here is that there is a certain way of placing stress so that the unmarked meaning of the formulaic sequence can be communicated. Deviations from this pattern will lead to marked interpretations of the formulaic sequences and the whole utterances, as in (4b) and (5b). To take Ashby’s observation about broad/narrow focus further, we can speculate that there should be fewer stresses within formulaic language. The assignment of broad focus to an utterance means that only the final lexical word of the utterance is stressed. If another lexical word in the utterance is stressed, it means that narrow focus is introduced, and we know that this is not acceptable in the case of semantically opaque idioms as in (6b). In order to avoid narrow focus, cats cannot be stressed even though it is a lexical word. This explains why there are fewer stresses within a semantically opaque idiom compared to general spoken English. An alternative interpretation of Ashby’s observation that narrow focus is not permissible in idioms is that idioms are holistic units, or, as Ellis (1996, p. 111) proposes, ‘big words’. Intuitively, one unit can only be assigned one stress. Example (6b) violates this rule: cats and dogs, as one holistic unit, cannot be assigned two stresses without the construction becoming marked. Based on these two arguments, if Ashby’s theory can be modified to suit formulaic language broadly defined, fewer stresses should be found within formulaic sequences than in general spoken English.12 This is exactly what study two of this book explored and was able to confirm after a detailed analysis of spontaneous speech (see Section 5.3.3).

(6a) It was raining cats and DOGS
*(6b) It was raining CATS and DOGS

Another way to consider stress placement in formulaic sequences is to focus on the function words. If the trend in general spoken English is to stress lexical words but to leave function words unstressed (see Section 3.3 of Wells, 2006), it is possible that this trend might also be observed in formulaic sequences. These examples were generated by personal introspection:

(7) What are you on about
(7a) What are you ON about
(8) I just want to show that I deserve to be here
(8a) I just want to show that I deserve to BE here

The unmarked articulation of (7) and (8) requires stress to be put on on and be as in (7a) and (8a). Therefore, at first sight, these examples seem to suggest that prepositions and the auxiliary be can be stressed if they are within formulaic sequences.
However, on further consideration, it becomes clear that these examples do not actually work because on here functions as a lexical word even though it shares the same orthographic form as the function word on. By the same token, the word be is used as a lexical word, not a function word. This point can be illustrated if on and be are substituted, as in (9) and (10).

(9) What are you COMPLAINING about
(10) I just want to show that I deserve to STAY here


From examples (9) and (10) it can be seen that although the utterances are much less formulaic than (7a) and (8a), stress still has to be assigned to the slots that on and be occupy. Put simply, the case of (7) and (8) cannot prove that there is any irregularity in the stress placement on the function words in formulaic sequences. However, note example (11):

(11) What’s going on
(11a) What’s going ON

The word on is a function word in (11) but, for the unmarked articulation, the stress has to fall on this word to achieve broad focus. Based on this example, one may speculate that while broad focus in general spoken English means introducing stress on the final lexical word of an intonation unit, formulaic language shows a distinct pattern in which broad focus can be achieved by stressing the final function word. However, this speculation awaits empirical validation. Looking again at (11), it appears that stressing the word on is not a stylistic consideration. Nor does it appear to be an effect of preposition stranding, as examples (12), (12a), (13) and (13a) show.

(12) This is the course I’m interested in
*(12a) This is the COURSE I’m INterested IN
(13) It’s this topic that I’d like to write a book on
*(13a) It’s THIS topic that I’d like to write a book ON

Perhaps it can be speculated that the formulaicity of (11) is the key reason that the function word on has to be stressed. The function words in and on in (12) and (13) remain unstressed because these two utterances are (relatively) unformulaic. Formulaic language researchers, including Aijmer (1996), have suggested that formulaic language has a fixed prosody which makes storage and retrieval of these sequences easier (Kuiper and Haggo, 1984). However, the discussion so far shows that, on the basis of introspective data alone, it is not easy to discover the rules governing stress placement that are unique to formulaic language.
Further investigations must be undertaken with a more careful research design and using a spoken corpus in order that a fuller account of the stress patterns can be obtained. However, the discussion in this section should shed some light on such investigations, such as study two (see Chapter 5).
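The general rule invoked throughout this section, that broad focus places a single stress on the final lexical word, can be written out as a small check against which the introspective examples may be compared. This is only a sketch under stated assumptions: the `FUNCTION_WORDS` set is a hypothetical, hand-picked list covering the examples, stress is supplied as a set of word positions rather than read from capital letters, and the rule itself is a simplification of the account given in this section.

```python
# Hypothetical closed list of function words, just large enough for the examples.
FUNCTION_WORDS = {
    "a", "and", "are", "for", "i", "in", "it", "on",
    "the", "to", "was", "what's", "you",
}


def default_nucleus(words):
    """Return the position of the final lexical word, or None if there is none."""
    for index in range(len(words) - 1, -1, -1):
        if words[index].lower() not in FUNCTION_WORDS:
            return index
    return None


def follows_default(words, stressed):
    """True if the only stressed position is the final lexical word."""
    return set(stressed) == {default_nucleus(words)}


raining = ["it", "was", "raining", "cats", "and", "dogs"]
going_on = ["what's", "going", "on"]

print(follows_default(raining, {5}))     # (6a): stress on DOGS only
print(follows_default(raining, {3, 5}))  # (6b): stress on CATS and DOGS
print(follows_default(going_on, {2}))    # (11a): stress on the function word ON
```

On this simplified rule, (6a) conforms and (6b) does not, whereas the unmarked (11a) fails the general rule. That is the irregularity the speculation about (11) sets out to explain.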


3.6 Chapter summary

Chapter 3 has highlighted a number of important theoretical arguments that represent the foundation of the empirical studies presented in the next few chapters. Section 3.2 explained why the prospect of identifying formulaic language by prosodic cues has attracted the attention of many researchers. Section 3.3 reviewed the evolution of the phonological method in the literature on the speech of child language learners, foreign language learners and adult native speakers. The purpose of this discussion was to reveal the lack of direct, empirical evidence to support the idea that formulaic language broadly defined and in spontaneous adult speech can be identified based on prosodic cues alone. This lack represents the gap that studies one and two attempt to tackle. Section 3.4 explored the relationship between prosodic cues and the holistic processing of formulaic language. It provided important background information about the arguments and the interpretation of study one. In particular, it explained why study one focused on intonation unit boundaries instead of pauses, and why study one was careful when interpreting whether the results of the prosodic investigation provided evidence for the holistic storage and retrieval of formulaic language. Section 3.5 provided a review of the prosodic cues that are said to be associated with formulaic language. The prosodic cues suggested in the literature have informed the direction of the investigations in studies one and two.

4

Study One: Do Formulaic Sequences Align with Intonation Units?

4.1 Introduction

This chapter reports the first of a series of empirical studies that examine the possibility of identifying formulaic language on the basis of prosodic cues alone. This study was motivated by the suggestion that prosodic cues can aid in the identification of formulaic language in adult speech (see Sections 3.1 and 3.2). In Sections 3.3.1 and 3.3.2, it was established that the phonological method finds some empirical support when applied to child speech and the speech of dysfluent foreign language learners. However, despite much speculation, comprehensive empirical evidence is yet to be found to support the application of the phonological method to the speech of adult native speakers. There is reason to believe that the application of the phonological method to the speech of adult native speakers and fluent foreign-language learners will be less straightforward. The success of the application of this method to the language of children and dysfluent foreign language learners is due to the simplicity of these two types of spontaneous speech (see Sections 3.3.1 and 3.3.2).1 However, the speech of native speakers and fluent language learners is more complex. The expectation is that reliance on pauses to locate the boundaries of formulaic sequences will no longer be sufficient because the fluent speech of these two groups of people is unlikely to contain as many pauses. If, as Erman and Warren (2000) suggest, 58.6 per cent of adult native speaker speech is formulaic and if every formulaic sequence has to be delineated by pauses on both sides, we should expect to find very extensive pausing and hesitation phenomena in the spoken discourse of adult native speakers. The reality is, instead, that given the advanced linguistic ability of native speakers and fluent language learners, they are able to construct relatively long, complex stretches of speech containing several
formulaic sequences, combined with other single-word units. There is also a lesser need for these fluent speakers to pause for speech planning in comparison with novice language learners. This ability to construct long stretches of speech between two consecutive pauses is an indicator of the language proficiency of native speakers and learners alike (Wood, 2001). The solution to this problem of the unreliability of pauses in fluent speech, which is to rely on intonation contours, as mentioned in Chapter 3, has been offered by Raupach (1984) (see Section 3.3.2). Many researchers (e.g. Aijmer, 1996; Altenberg and Eeg-Olofsson, 1990; Baker and McCarthy, 1988; Moon, 1997) have, indeed, predicted that formulaic sequences form single intonation units (see Section 3.5.1). Baker and McCarthy (1988), Altenberg and Eeg-Olofsson (1990) and Moon (1997), in particular, hypothesized that this prosodic feature would help to identify formulaic language. Since the differing selection principles of the two mainstream formulaic language identification methods produce formulaic sequences of different character (see Section 2.3), it was important to ensure that the results of study one were not methodological artefacts but the true effects of formulaicity. Sections 2.3.1 and 2.3.2 revealed the characteristics of the two methods in their definition of the boundaries of formulaic sequences. Study one aimed to determine the extent to which formulaic sequence boundaries align with those of intonation units. With this aim in mind, it is possible that the definition of formulaic sequence boundaries used could have a direct impact on the results. Therefore, the decision was made to use both corpus-based automatic extraction and collective native speaker judgement in this study. In terms of research design, although this study controlled for two variables (i.e. native speaker speech vs. proficient EFL learner speech, and corpus-based automatic extraction vs. 
native speaker judgement), a 2 × 2 design was not adopted because of the constraints on using native speaker judgement as a methodology for the identification of formulaic language (see Sections 1.2 and 2.3.2). The consequence of this design (i.e. applying corpus-based automatic extraction only to the proficient EFL learner data and the native speaker judgement method only to the native speaker data) was that comparisons between the results of the two groups were not possible. Regarding the use of native speaker judgement in the identification of formulaic language in spontaneous adult speech, this book is distinctive in two respects. First, in recognizing the need to increase the robustness of the
formulaicity judgement data through an improved implementation of the native speaker method (see Section 2.3.3), study one introduced a number of preventive measures to address the negative influences of the human factor on the quality of the formulaicity judgement data (see Sections 2.3.3 and 4.7.1). Secondly, the present study introduced a five-point formulaicity judgement scoring system, which enabled an investigation into the effect of confidence scores on the extent to which formulaic language aligns with intonation unit boundaries. This sub-research question was motivated by a previous suggestion that the degree of fixedness of a formulaic sequence could be revealed through an examination of the extent to which the sequence formed single intonation units (see Section 3.5.1). Further information concerning these arguments and the formulaicity judgement scoring system will be provided in Section 4.7.2. This chapter is organized into two parts. Part A (Sections 4.2 to 4.5) reports on the first investigation into the extent to which formulaic sequences extracted automatically from the Nottingham International Corpus of Learner English (NICLEs-CHN), which is a corpus of conversations with proficient EFL speakers (see Dahlmann, 2009; see also Section 4.2), align with intonation unit boundaries. Part B (Sections 4.6 to 4.10) reports on the second investigation, which focused on the extent of the alignment between intonation unit boundaries and formulaic sequences identified by native speaker judgement in an academic lecture taken from the Nottingham Multimodal Corpus (NMMC), a corpus compiled by researchers at the University of Nottingham (see Section 4.6). Finally, the chapter summary (Section 4.11) brings together the findings of both investigations and concludes with the prospect of using intonation unit boundaries as indicators of formulaic sequence boundaries in the fluent speech of adult native speakers and EFL learners.

Part A: The case of proficient EFL learners2

Part A reports on the first investigation, which aimed to examine the extent to which instances of a formulaic sequence identified using WordSmith 6.0 (Scott, 2012) from a spoken corpus of proficient EFL learners align with intonation unit boundaries. This investigation was motivated by the need to examine the validity of the suggestion that prosodic cues are reliable indicators of formulaic language boundaries and can help to identify formulaic language (see Sections 3.2 and 3.5.1). In tracing the origins of this suggestion (see Section 3.3), it became clear that the notion of identifying formulaic language by tracing pause locations might not be viable if the speech was fluent. The hypothesis of study
one, therefore, was that intonation unit boundaries (instead of pauses) can be a reliable indicator of formulaic sequence boundaries when it comes to the identification of formulaic language in fluent speech. The aim was to test this hypothesis by investigating the extent to which formulaic sequence boundaries align with intonation unit boundaries. Part A focuses on the fluent speech of EFL learners.
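What it means for a formulaic sequence to ‘align’ with intonation unit boundaries can be sketched as follows. The representation is an assumption made for illustration only: both the sequence and the intonation units are given as hypothetical (start, end) token positions with the end exclusive, and the three labels returned are not the coding scheme of the study itself.

```python
def alignment(sequence_span, intonation_units):
    """Compare one formulaic sequence with a list of intonation unit spans.

    Spans are (start, end) token positions, end exclusive (an assumed format).
    """
    boundaries = set()
    for unit_start, unit_end in intonation_units:
        boundaries.add(unit_start)
        boundaries.add(unit_end)

    start, end = sequence_span
    return {
        # Does the sequence begin exactly where an intonation unit begins or ends?
        "left": start in boundaries,
        # Does the sequence end exactly at an intonation unit boundary?
        "right": end in boundaries,
        # Does an intonation unit boundary fall inside the sequence?
        "internal": any(start < boundary < end for boundary in boundaries),
    }


# Invented example: two intonation units covering tokens 0-4 and 5-8.
units = [(0, 5), (5, 9)]
print(alignment((0, 5), units))  # sequence coextensive with the first unit
print(alignment((3, 7), units))  # sequence straddling the unit boundary
```

A sequence that is aligned on both edges and has no internal boundary forms a single intonation unit in the sense predicted in Section 3.5.1; the other combinations correspond to partial alignment or to a sequence split across units.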

4.2 Corpus data

The learner corpus used in this study was the 230,000-word sub-corpus NICLEs-CHN (Dahlmann, 2009) which is made up of interview data collected, longitudinally, from seventeen Chinese EFL learners who were studying at a British university.3 When their course started, their IELTS test scores ranged from 5.0 to 7.5. The test score distribution is as follows: 5.0 (one participant), 5.5 (five participants), 6.0 (four participants), 6.5 (four participants), 7.0 (two participants) and 7.5 (one participant). The participants were interviewed three to five times on a regular basis by a native speaker of British English with topics including their life, study and English language-learning experience in the UK. The longitudinal design of this corpus was effective in capturing the learners’ development of phraseological knowledge over time and their idiosyncratic phraseological preferences. To generate a corpus which included only learner language, the native speaker interviewer’s turns were removed.

4.3 Method

There are two parts to the methodology of this study: first, identifying potential formulaic sequences in the corpus and, secondly, performing the phonological analyses.

4.3.1 The automatic extraction of potential formulaic sequences

The identification of potential formulaic sequences was carried out by corpus-based automatic extraction using WordSmith 6.0 (Scott, 2012). Some caution should be exercised here because not all products of WordSmith can be deemed formulaic (see Section 2.3.1). The computer tool is only designed to extract
highly recurrent, contiguous sequences of words which have invariant word (orthographic) forms.4 In WordSmith, the products of the tool are called clusters, and their formulaicity has to be verified separately. The intention of the identification procedures was to obtain enough samples of the same formulaic sequences for prosodic analysis. As prosodic analysis is very labour intensive, the number of samples to be subjected to prosodic analysis had to be manageable and reasonable. In this study, WordSmith 6.0 was set to extract clusters ranging from two to eight words with a minimum frequency threshold of five occurrences. Table 4.1 shows the most frequent clusters for each specific length. As can be seen in the table, frequency fell sharply as cluster length increased. The most frequent two-word cluster, I think, occurred 3,028 times in the learner corpus NICLEs-CHN, but the most frequent seven-word item, I don’t know how to say, appeared only nine times. In order to have enough samples for the prosodic analysis, the most frequent five-word cluster, I don’t know why, was chosen for detailed investigation because its fifty-six instances constituted a manageable sample. Furthermore, this cluster was used by about half of the participants (n = 8). Based on the frequency of this five-word cluster and the fact that it appeared to be widely known and used among this community of Chinese learners of English, there was reason to believe that I don’t know why was a psycholinguistically real formulaic sequence. Therefore, I don’t know why was regarded as a formulaic sequence in this study. To investigate whether the five-word formulaic sequence, I don’t know why, aligned with intonation units, the method chosen for the identification of the intonation unit boundaries was important. For this investigation, techniques from both the auditory and the acoustic analyses were used (see Cruttenden, 1997, for a discussion of these two types of analysis); details are provided in Section 4.3.2.

Table 4.1 The most frequent clusters in NICLEs-CHN and their frequencies

Category      Most frequent item in the category   Frequency
Two-word      I think                              3,028
Three-word    I don’t                              800
Four-word     I don’t know                         338
Five-word     I don’t know why                     56
Six-word      I don’t know how to                  35
Seven-word    I don’t know how to say              9
Eight-word    /                                    0
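The extraction settings described above (contiguous sequences of two to eight words, minimum frequency of five) can be illustrated with a short sketch. It is not a description of how WordSmith works internally. The function names are hypothetical, the tokenizer splits at apostrophes on the assumption, suggested by Table 4.1 counting I don’t as a three-word cluster, that contracted forms count as two words, and clusters are not allowed to cross turn boundaries.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters only, so "don't" becomes ["don", "t"].
    return re.findall(r"[a-z]+", text.lower())


def extract_clusters(turns, min_len=2, max_len=8, min_freq=5):
    """Count contiguous word sequences and keep those at or above min_freq."""
    counts = Counter()
    for turn in turns:  # clusters never span two turns (an assumption)
        tokens = tokenize(turn)
        for length in range(min_len, max_len + 1):
            for start in range(len(tokens) - length + 1):
                cluster = " ".join(tokens[start:start + length])
                counts[(length, cluster)] += 1
    return {key: freq for key, freq in counts.items() if freq >= min_freq}


# Invented learner turns, for illustration only.
turns = ["I don't know why"] * 5 + ["I think so"] * 2
clusters = extract_clusters(turns)
for (length, cluster), freq in sorted(clusters.items()):
    print(length, cluster, freq)
```

As with the output of any frequency-based tool, the resulting clusters are only candidates: recurrent fragments such as i don pass the threshold alongside complete sequences, which is why formulaicity has to be verified separately.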

4.3.2 The identification of intonation units

Intonation unit (IU) is a term used by Chafe (1979, 1987, 1994) to refer to what other researchers call a tone unit (Quirk et al., 1964), intonation group (Cruttenden, 1997), tone group (Halliday, 1967), intonation phrase (Pierrehumbert, 1980) or intonational phrase (Nespor and Vogel, 1986; Selkirk, 1984). An intonation unit is a unit of speech covered by a single intonation contour. According to the literature, intonation unit boundaries are associated with a number of acoustic markers (see Table 4.2 for a summary of these markers) although their identification in spontaneous speech is by no means uncontroversial (Cruttenden, 1997). As Wennerstrom (2001, p. vii) notes, ‘Phonologists’ conceptions of prosody are far from settled.’ Unfortunately, even among linguists, there is a failure to understand the complexity of the problem of identifying intonation units. In fact, as Cruttenden (1997) points out, linguists often tend to oversimplify the issue: ‘It is undoubtedly also true that the majority of linguists assume that the phonetic correlates of boundaries between intonation-groups are far more straightforward than they actually are’ (p. 29). For instance, there is the assumption that intonation units should coincide with syntactic boundaries. Therefore, researchers often rely on syntactic structure to identify intonation unit boundaries. However, from their experience in an empirical investigation into the prosody of Scottish English, Brown, Currie and Kenworthy (1980, p. 47) note that the coincidence of syntactic and intonation unit boundaries cannot be taken for granted, especially in spontaneous speech:

In the task of reading texts aloud, readers usually produce fluent and coherent chunks of speech, readily relatable to coherent syntactic/semantic structures, which is hardly surprising since the reader is not confronted by most of the planning decisions that a speaker speaking in the here and now and interacting with an interlocutor is confronted with (see Brown, 1978 for an extended discussion of this). … in read texts a syntactic boundary usually coincides with an intonation boundary and often coincides with a pause. In nonfluent spontaneous speech it is very common to find these boundaries not coinciding. This may occur for many reasons – because the speaker is having planning difficulties, because he thinks his interlocutor may jump in and take away his turn, because he wants to create a special effect.

Table 4.2 A summary of acoustic markers associated with intonation group boundaries

| Marker | Type (temporal/intonational) | Descriptions | Support |
| --- | --- | --- | --- |
| Anacrusis | T | The first few syllables of an intonation group should be uttered faster such that it seems like an initial burst | Cruttenden (1997) |
| Syllable-lengthening | T | The syllable before the end of the tone group tends to be lengthened | Cruttenden (1997) |
| Pause | T | This is only occasional. Cannot work alone | Van Lancker, Canter and Terbeek (1981), Cruttenden (1997) |
| A global declination in pitch over the tone group (aka pitch reset) | I | Pitch is highest at the beginning of a tone group and drops gradually until it is lowest at the end | Brown, Currie and Kenworthy (1980), Cruttenden (1997), DuBois et al. (1993), Roelof de Pijper and Sanderman (1994), Wichmann (2000) |
| Falling tone at the boundary | I | Wichmann (2000) found that 90% of tone groups end with a falling tone | Wichmann (2000), Pickering (1996) |
| Change in direction of pitch in unaccented syllables | I | Normally, change in direction means the syllable is pitch accented. But Cruttenden says if this is happening, it is an indicator of a tone group boundary | Cruttenden (1997) |

What Brown, Currie and Kenworthy (1980) suggest is that the alignment between intonation unit boundaries and the boundaries of syntactic/semantic structures is all very orderly in read-aloud speech because this kind of speech does not require planning; however, this is unlikely to be the same in spontaneous speech. The essence of this suggestion echoes Chafe’s (2006) ideas exactly. Considering the fact that the acoustic markers in Table 4.2 may not represent completely reliable indicators of intonation unit boundaries in spontaneous speech, the identification of intonation unit boundaries in study one could not rely solely on the acoustic/instrumental approach. Instead, it used both the auditory/perceptual and the acoustic/instrumental approach (see Cruttenden, 1997 for the characteristics of these two approaches to prosodic analysis). In fact, it can be argued that a purely acoustic/instrumental approach does not exist because phonologists must, necessarily, listen to and analyse speech samples aurally before they can make sense of the rises and falls in the pitch graph or spectrograms from the speech analysis machines. Obviously, the advantage of the acoustic approach is that the observations can be visualized using the graphs of various, measurable acoustic parameters. These graphs can be very useful in complementing the illustration of the prosodic phenomena perceived by the researcher in the auditory analysis. For this reason, pitch graphs from the acoustic analysis using Praat (Boersma, 2001) are also included in the report of the results in part A, Section 4.4. To investigate whether the five-word formulaic sequence, I don’t know why, forms a single intonation unit, the audio clips of all fifty-six instances were manually extracted from the corpus. These clips, containing approximately 4 seconds to the left and to the right of the target sequence were then analysed. 
Following this, the intonation unit boundaries were auditorily identified, guided by the prosodic cues listed in Table 4.2.

4.4 Results

There are four possible outcomes concerning the alignment of formulaic sequence boundaries with intonation unit boundaries (see Figure 4.1). Case 1 showed complete alignment of formulaic sequence boundaries with intonation unit boundaries. Through analysis of fifty-six instances of I don’t know why in the corpus NICLEs-CHN, 55 per cent of the instances were found to belong to Case 1. Case 4 demonstrated no alignment between the two types of boundaries; 4 per cent belonged to this case. In the middle were Case 2, which showed alignment of the boundaries to the left-hand side only, and Case 3, which showed alignment of the boundaries to the right only. Fourteen per cent belonged to Case 2, while the remaining sixteen per cent belonged to Case 3. Eleven per cent of the instances were placed in the Miscellaneous group because, upon listening to the audio recording, despite the fact that they were identical in form on the orthographic transcript, these seemed to be examples of I don’t know. Why … instead of the target sequence, I don’t know why.

[Figure 4.1 The four possible outcomes concerning the alignment of formulaic sequence boundaries with intonation unit boundaries. Case 1: both boundaries align with IU boundaries. Case 2: only left boundary aligns with IU boundaries. Case 3: only right boundary aligns with IU boundaries. Case 4: neither boundary aligns with IU boundaries. Legend: an utterance; a formulaic sequence; an intonation unit (IU).]

The above percentages can also be seen in terms of probabilities. In other words, an instance of I don’t know why in a spoken corpus has a 55 per cent chance of aligning completely with intonation unit boundaries on both sides, an 85 per cent chance (= 55% + 14% + 16%) of aligning on at least one side, and a 4 per cent chance of showing no alignment at all.

As the corpus was made up of interview data, natural turn-taking between the interlocutors also played a role in the overall picture of intonation unit alignment. When I don’t know why began a speaker’s turn, the intonation unit boundary alignment found on the left side was necessitated by turn-taking. It was unclear whether formulaicity had an important role to play in the distribution of intonation unit boundaries in this circumstance. The situation was the same when I don’t know why occurred at the end of the turn. The turn-taking effect actually affected 41 per cent of the cases in this investigation, which is quite high.

So far the general results have been presented. For a more detailed analysis, the next section will investigate whether the grammatical structure or function of an instance of a formulaic sequence could have an effect on the extent to which the instance aligned with intonation unit boundaries. However, before the prosodic investigation, there must be a consideration of the grammatical structures or functions of I don’t know why in the corpus data.
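The reading of the alignment percentages as probabilities can be illustrated with a short calculation. The following is only a minimal sketch in Python: the raw counts are those given in brackets in Table 4.4, while the dictionary labels and the helper name `share` are illustrative choices, not part of the study.

```python
# Raw counts of "I don't know why" per alignment outcome (from Table 4.4).
counts = {
    "Case 1 (both boundaries)": 31,
    "Case 2 (left only)": 8,
    "Case 3 (right only)": 9,
    "Case 4 (neither)": 2,
    "Miscellaneous": 6,
}

total = sum(counts.values())  # all instances in the corpus


def share(n, total):
    """Return n as a percentage of total."""
    return 100 * n / total


percentages = {label: share(n, total) for label, n in counts.items()}

# Alignment on at least one side = Cases 1, 2 and 3 combined.
at_least_one_side = sum(
    n
    for label, n in counts.items()
    if label.startswith(("Case 1", "Case 2", "Case 3"))
)

for label, pct in percentages.items():
    print(f"{label}: {pct:.1f}%")
print(f"At least one side: {share(at_least_one_side, total):.1f}%")
```

Working from the raw counts rather than from already rounded percentages means that a combined figure may differ by a fraction of a percentage point from the sum of the rounded values.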


4.4.1 The effect of grammatical structure/function categories

In written English, I don’t know why is often used to begin a question to which the speaker does not know the answer. That is why this type of formulaic language is called a sentence stem (Nattinger and DeCarrico, 1992). In the NICLEs-CHN corpus, however, it was found that only eight out of the total of fifty-six instances functioned as sentence stems. The majority of the instances (k = 38) appeared as complete clauses. According to Altenberg’s (1998) grammar-based categorization of spoken multi-word units, these complete clauses can be further broken down into ‘comment clauses’ (i.e. parentheticals, k = 6) and ‘disclaimers’ (k = 32). Table 4.3 shows examples taken from the corpus.

Table 4.3 Examples of I don’t know why as sentence stem, comment clause and disclaimer in NICLEs-CHN

| Category | Example |
| --- | --- |
| Sentence stem | S2: since the people around me are laughing but I don’t know why [laughs] they laugh (Participant 004, interview 5) |
| Comment clause | S2: yeah sometimes we you know hang out to the pub or some party yeah S1: and do you go to the cinema or to S2: well actually I don’t like you know go to the cinema but most of my friends I don’t know why most of my friends they like you know go the cinema quite often and every weekend call me to go outside to watching them. (Participant 002, interview 5) |
| Disclaimer | S2: but it’s really right decision. S1: mm mmm S2: yeah I don’t know why but after that I think the world is more better than before here (Participant 003, interview 5) |

The task of categorizing all fifty-six instances of I don’t know why into these three functional categories was not as straightforward as it may appear to be. Sometimes there was simply not enough information available within the concordance lines. In fact, in 18 per cent of the cases, it was difficult to determine to which category the instance should be assigned. Therefore, these cases were placed within the Unanalysable category. Example 1 below from the corpus illustrates this point:

Example 1
S2: I had changed my address but I don’t know why they also sent the package to China
Participant 009, interview 3

This example is unanalysable because it is ambiguous whether I don’t know why is functioning as a sentence stem or a disclaimer. If the instance was delivered at a fast pace and contained no break in rhythm or intonation, example 1 would be more likely to be categorized as a sentence stem. Alternatively, if it was delivered at a slow pace, along the lines of I had changed my address but … I don’t know why … they also sent the package to China, it is more likely the instance would be categorized as a disclaimer. Without listening to the prosodic information at the stage of categorization, it was difficult to gain a full picture of the pattern of use of the formulaic sequence. This observation was also made by De Cock (1998, see also Section 2.3.1). However, the decision, in this study, was to keep prosodic information away from the categorization of the instances of the formulaic sequence I don’t know why and to base the process on textual data only. This was to avoid the problem of circularity that would arise if the prosodic description was based on instances of the formulaic sequence that had been categorized, in the first place, according to prosodic information.

Table 4.4 shows the percentages and raw frequencies when the formulaic sequences are classified according to Altenberg’s (1998) categories, as well as the level of alignment with intonation unit boundaries. As the table highlights, only instances of I don’t know why that were complete clauses (which function as either comment clauses or disclaimers) aligned completely with intonation unit boundaries (Case 1). On the other hand, instances of I don’t know why that represented sentence stems never aligned completely with intonation unit boundaries on both sides. It has long been suggested that intonation unit boundaries will coincide with clause boundaries (see Croft, 1995; Crystal, 1975).
Therefore, it is hardly surprising that nearly three quarters (74 per cent) of the instances that represented full clauses aligned completely with intonation unit boundaries (i.e. Case 1).5 However, it is interesting to investigate why 25 per cent of the complete clauses did not fall under Case 1 as expected. An analysis reveals two main factors for the non-alignment in this situation: conjunctions and repetitions. In these cases, the intonation unit boundary fell either on the left or right side of conjunctions or repetitions that preceded or succeeded these instances of I don’t know why. Example 2 below shows a case in which the intonation unit ended after the conjunction but that succeeded I don’t know why, thus resulting in Case 2 alignment, rather than Case 1. Example 3, on the other hand, shows a case in which the intonation unit began before the repetition I don’t that preceded I don’t know why, thus resulting in Case 3 alignment.

Table 4.4 The alignment between intonation unit boundaries and instances of I don’t know why of different function categories (raw frequencies in brackets)

| Alignment | Sentence stem, % | Comment clause, % | Disclaimer, % | Unanalysable, % | TOTAL, % |
| --- | --- | --- | --- | --- | --- |
| Case 1: Both boundaries align with IU boundaries | | 5 (3) | 41 (23) | 9 (5) | 55 (31) |
| Case 2: Only left boundary aligns with IU boundaries | 5 (3) | 5 (3) | 4 (2) | | 14 (8) |
| Case 3: Only right boundary aligns with IU boundaries | 2 (1) | | 5 (3) | 9 (5) | 16 (9) |
| Case 4: Neither boundary aligns with IU boundaries | 2 (1) | | 2 (1) | | 4 (2) |
| Miscellaneous | 5 (3) | – | 5 (3) | – | 10 (6) |
| TOTAL | 14 (8) | 10 (6) | 57 (32) | 18 (10) | 99* (56) |

*The total does not add up to 100 per cent due to fractions being rounded up or down.

Example 2
S2: but it’s really right decision
S1: Mm mmm
S2: yeah I don’t know why but after that I think the world is more better than before here
Participant 003, interview 5

Example 3
S2: because I th= I don’t like USA at all I don’t I don’t know why but I don’t like it
Participant 056, interview 2
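Whether the function categories and the alignment cases in Table 4.4 are statistically related can be examined with a chi-square test of independence on the raw frequencies. The sketch below computes Pearson's statistic by hand in Python. It is offered only as an illustration: the source does not say which software or correction was used, empty cells in the table are entered here as 0 (an assumption), and the function name `chi_square` is hypothetical, so the output need not match the reported value exactly.

```python
# Raw frequencies from Table 4.4.
# Rows: Case 1, Case 2, Case 3, Case 4, Miscellaneous.
# Columns: sentence stem, comment clause, disclaimer, unanalysable.
# Assumption: empty cells in the table are treated as zero counts.
observed = [
    [0, 3, 23, 5],
    [3, 3, 2, 0],
    [1, 0, 3, 5],
    [1, 0, 1, 0],
    [3, 0, 3, 0],
]


def chi_square(table):
    """Return (Pearson chi-square statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


statistic, dof = chi_square(observed)
print(f"chi-square = {statistic:.2f}, df = {dof}")
```

With five rows and four columns the test has (5 - 1) × (4 - 1) = 12 degrees of freedom. Because many of the expected counts in a table of only fifty-six instances are small, the chi-square approximation should be read with some caution.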

The purpose of dividing the instances according to their grammatical structure/functional categories was to investigate whether this had any effect on the alignment of formulaic sequences with intonation unit boundaries. The descriptive statistics above provide an affirmative answer to this question. A further 5 × 4 chi-square analysis based on the raw frequencies in Table 4.4 echoes this finding and reveals that there is a significant relationship between the type of alignment with intonation unit boundaries (i.e. Cases one to four) and the functional categories, with χ² (12, 56) = 35.1, p […]

[…] κ ≥ 0.3 grey, κ ≥ 0.4 light grey

The colour-coding function was very useful because, as can be seen in Table 6.2, it is immediately clear which of the thirty native speaker judges were relatively idiosyncratic in their formulaicity judgements (i.e. those who showed the fewest colours in their row/column) and which of them had agreed with most of the other judges (i.e. those with a number of cells coloured in the same row/column). For instance, judges KHU, RMA and NSE appear to have the smallest number of coloured cells within their columns, meaning that their formulaicity judgements differed the most from the rest of the group. The reason for their apparently idiosyncratic formulaicity judgements can be uncovered by referring to their identification task booklets and looking at their post-task interview data. Further explorations into their perception of formulaicity were made, for instance, questioning their own interpretation of formulaicity, their memory of their experience in the training and the identification task and their markings on the task booklet. Based on an initial investigation, the most obvious distinction between these three judges and the rest of the panel was that these three seemed to have taken a much more liberal view of formulaicity.
While most judges had marked only 20 per cent of Text W as formulaic, these three judges – NSE, KHU and RMA – had marked 56.3 per cent, 65.6 per cent and 84.8 per cent, respectively, of Text W as formulaic (before listening to the audio recordings). Furthermore, the time these judges spent on the identification task was almost double that of the rest of the judges. As shown in Appendix 8, NSE and KHU spent 57 minutes 30 seconds and 51 minutes 16 seconds, respectively, on the task, significantly longer than the 30 minutes spent by the other judges.2

The purpose of highlighting the potentially different perspectives on formulaicity of these three judges is simply to show that this method of tabulating and colouring the results of the inter-judge reliability test enabled the exploration of individual differences in the formulaicity judgements. While there is very little (if any) empirical research in the literature that has examined the notion of formulaicity judgements, the method introduced here can be replicated in future research. Moreover, the system developed in the present study can be applied to future studies as a technique for selecting native speakers to take part in the formulaic language identification process. In other words, judges who have been shown to make similar judgements in the trial sessions using this system can be chosen to proceed to the next stage, in which these ‘qualified’ judges will continue to analyse formulaic language on a larger scale.

Table 6.2 Inter-judge reliability (Cohen’s Kappa) of the judgement data BEFORE the thirty judges listen to the audio recordings

Judge MMA NSE STO MCO NBO JPR KHU SAT DHU KHA NSA ECO EAD RDO MFO ADO JKI RMA BGR RNE WVI JPI LCH BPA SLY CBR VWA CME CDD JCH
MMA   1.0 0.1 0.2 0.5 0.3 0.3 0.1 0.3 0.4 0.4 0.2 0.3 0.2 0.2 0.4 0.3 0.3 0.0 0.2 0.1 0.2 0.3 0.2 0.2 0.3 0.2 0.3 0.3 0.3 0.3
NSE   0.1 1.0 0.1 0.2 0.1 0.2 0.3 0.1 0.2 0.2 0.2 0.1 0.1 0.2 0.1 0.1 0.2 0.2 0.1 0.1 0.2 0.3 0.1 0.1 0.1 0.1 0.2 0.2 0.2 0.1
STO   0.2 0.1 1.0 0.3 0.2 0.2 0.1 0.3 0.2 0.3 0.3 0.3 0.1 0.3 0.2 0.2 0.3 0.1 0.2 0.1 0.3 0.3 0.2 0.2 0.2 0.1 0.3 0.2 0.3 0.4
MCO   0.5 0.2 0.3 1.0 0.4 0.4 0.1 0.4 0.4 0.5 0.3 0.4 0.2 0.3 0.4 0.3 0.3 0.0 0.2 0.1 0.2 0.4 0.4 0.2 0.3 0.3 0.4 0.4 0.3 0.4
NBO   0.3 0.1 0.2 0.4 1.0 0.3 0.1 0.3 0.3 0.4 0.2 0.2 0.2 0.3 0.4 0.3 0.4 0.0 0.3 0.2 0.2 0.3 0.3 0.2 0.3 0.2 0.3 0.3 0.3 0.3
JPR   0.3 0.2 0.2 0.4 0.3 1.0 0.1 0.4 0.3 0.4 0.3 0.3 0.2 0.2 0.2 0.4 0.4 0.1 0.3 0.1 0.3 0.4 0.3 0.2 0.3 0.3 0.3 0.4 0.5 0.2
KHU   0.1 0.3 0.1 0.1 0.1 0.1 1.0 0.2 0.1 0.1 0.2 0.0 0.1 0.2 0.1 0.1 0.2 0.3 0.1 0.1 0.3 0.2 0.1 0.1 0.1 0.1 0.1 0.1 0.2 0.2
SAT   0.3 0.1 0.3 0.4 0.3 0.4 0.2 1.0 0.3 0.5 0.4 0.3 0.3 0.3 0.2 0.3 0.4 0.1 0.3 0.2 0.3 0.4 0.3 0.3 0.3 0.3 0.4 0.4 0.4 0.4
DHU   0.4 0.2 0.2 0.4 0.3 0.3 0.1 0.3 1.0 0.3 0.3 0.3 0.2 0.3 0.4 0.3 0.3 0.0 0.3 0.1 0.3 0.4 0.4 0.2 0.2 0.3 0.3 0.4 0.3 0.3
KHA   0.4 0.2 0.3 0.5 0.4 0.4 0.1 0.5 0.3 1.0 0.4 0.3 0.3 0.4 0.3 0.3 0.4 0.0 0.4 0.2 0.3 0.5 0.4 0.3 0.3 0.2 0.3 0.5 0.4 0.3
NSA   0.2 0.2 0.3 0.3 0.2 0.3 0.2 0.4 0.3 0.4 1.0 0.2 0.3 0.3 0.3 0.3 0.4 0.1 0.4 0.1 0.3 0.4 0.4 0.2 0.3 0.2 0.4 0.4 0.4 0.4
ECO   0.3 0.1 0.3 0.4 0.2 0.3 0.0 0.3 0.3 0.3 0.2 1.0 0.4 0.2 0.3 0.3 0.2 0.0 0.3 0.1 0.2 0.3 0.3 0.3 0.2 0.3 0.3 0.2 0.2 0.3
EAD   0.2 0.1 0.1 0.2 0.2 0.2 0.1 0.3 0.2 0.3 0.3 0.4 1.0 0.1 0.2 0.3 0.3 0.0 0.4 0.3 0.2 0.2 0.3 0.2 0.3 0.4 0.2 0.3 0.3 0.3
RDO   0.2 0.2 0.3 0.3 0.3 0.2 0.2 0.3 0.3 0.4 0.3 0.2 0.1 1.0 0.3 0.2 0.3 0.1 0.2 0.1 0.3 0.4 0.2 0.2 0.3 0.0 0.3 0.3 0.3 0.3
MFO   0.4 0.1 0.2 0.4 0.4 0.2 0.1 0.2 0.4 0.3 0.3 0.3 0.2 0.3 1.0 0.3 0.3 0.0 0.2 0.1 0.2 0.3 0.2 0.2 0.3 0.1 0.2 0.3 0.3 0.3
ADO   0.3 0.1 0.2 0.3 0.3 0.4 0.1 0.3 0.3 0.3 0.3 0.3 0.3 0.2 0.3 1.0 0.3 0.0 0.3 0.2 0.3 0.3 0.3 0.2 0.2 0.3 0.4 0.3 0.2 0.3
JKI   0.3 0.2 0.3 0.3 0.4 0.4 0.2 0.4 0.3 0.4 0.4 0.2 0.3 0.3 0.3 0.3 1.0 0.1 0.3 0.2 0.3 0.4 0.4 0.2 0.4 0.3 0.3 0.4 0.4 0.3
RMA   0.0 0.2 0.1 0.0 0.0 0.1 0.3 0.1 0.0 0.0 0.1 0.0 0.0 0.1 0.0 0.0 0.1 1.0 0.0 0.0 0.1 0.1 0.0 0.1 0.0 0.0 0.0 0.0 0.1 0.1
BGR   0.2 0.1 0.2 0.2 0.3 0.3 0.1 0.3 0.3 0.4 0.4 0.3 0.4 0.2 0.2 0.3 0.3 0.0 1.0 0.1 0.2 0.2 0.3 0.2 0.2 0.2 0.3 0.3 0.2 0.2
RNE   0.1 0.1 0.1 0.1 0.2 0.1 0.1 0.2 0.1 0.2 0.1 0.1 0.3 0.1 0.1 0.2 0.2 0.0 0.1 1.0 0.2 0.1 0.3 0.3 0.3 0.2 0.1 0.2 0.2 0.1
WVI   0.2 0.2 0.3 0.2 0.2 0.3 0.3 0.3 0.3 0.3 0.3 0.2 0.2 0.3 0.2 0.3 0.3 0.1 0.2 0.2 1.0 0.3 0.3 0.3 0.3 0.2 0.3 0.3 0.5 0.3
JPI   0.3 0.3 0.3 0.4 0.3 0.4 0.2 0.4 0.4 0.5 0.4 0.3 0.2 0.4 0.3 0.3 0.4 0.1 0.2 0.1 0.3 1.0 0.3 0.2 0.2 0.2 0.4 0.4 0.5 0.4
LCH   0.2 0.1 0.2 0.4 0.3 0.3 0.1 0.3 0.4 0.4 0.4 0.3 0.3 0.2 0.2 0.3 0.4 0.0 0.3 0.3 0.3 0.3 1.0 0.2 0.3 0.3 0.4 0.4 0.2 0.3
BPA   0.2 0.1 0.2 0.2 0.2 0.2 0.1 0.3 0.2 0.3 0.2 0.3 0.2 0.2 0.2 0.2 0.2 0.1 0.2 0.3 0.3 0.2 0.2 1.0 0.2 0.3 0.2 0.3 0.3 0.2
SLY   0.3 0.1 0.2 0.3 0.3 0.3 0.1 0.3 0.2 0.3 0.3 0.2 0.3 0.3 0.3 0.2 0.4 0.0 0.2 0.3 0.3 0.2 0.3 0.2 1.0 0.2 0.2 0.4 0.3 0.3
CBR   0.2 0.1 0.1 0.3 0.2 0.3 0.1 0.3 0.3 0.2 0.2 0.3 0.4 0.0 0.1 0.3 0.3 0.0 0.2 0.2 0.2 0.2 0.3 0.3 0.2 1.0 0.2 0.4 0.2 0.2
VWA   0.3 0.2 0.3 0.4 0.3 0.3 0.1 0.4 0.3 0.3 0.4 0.3 0.2 0.3 0.2 0.4 0.3 0.0 0.3 0.1 0.3 0.4 0.4 0.2 0.2 0.2 1.0 0.3 0.4 0.4
CME   0.3 0.2 0.2 0.4 0.3 0.4 0.1 0.4 0.4 0.5 0.4 0.2 0.3 0.3 0.3 0.3 0.4 0.0 0.3 0.2 0.3 0.4 0.4 0.3 0.4 0.4 0.3 1.0 0.4 0.2
CDD   0.3 0.2 0.3 0.3 0.3 0.5 0.2 0.4 0.3 0.4 0.4 0.2 0.3 0.3 0.3 0.2 0.4 0.1 0.2 0.2 0.5 0.5 0.2 0.3 0.3 0.2 0.4 0.4 1.0 0.3
JCH   0.3 0.1 0.4 0.4 0.3 0.2 0.2 0.4 0.3 0.3 0.4 0.3 0.3 0.3 0.3 0.3 0.3 0.1 0.2 0.1 0.3 0.4 0.3 0.2 0.3 0.2 0.4 0.2 0.3 1.0

So far the discussion has covered the formulaicity judgement data from before the introduction of the original audio recording of Text W. As for the judgement data after the judges had listened to the recording, the same procedures were followed to produce the matrix shown in Table 6.3.

Table 6.3 Inter-judge reliability (Cohen’s Kappa) of the judgement data AFTER the thirty judges listen to the audio recordings

Judge MMA NSE STO MCO NBO JPR KHU SAT DHU KHA NSA ECO EAD RDO MFO ADO JKI RMA BGR RNE WVI JPI LCH BPA SLY CBR VWA CME CDD JCH
MMA   1.0 0.1 0.3 0.5 0.3 0.2 0.2 0.3 0.3 0.5 0.3 0.2 0.2 0.4 0.4 0.4 0.3 0.1 0.3 0.1 0.2 0.4 0.3 0.2 0.4 0.2 0.3 0.3 0.3 0.3
NSE   0.1 1.0 0.1 0.2 0.1 0.2 0.3 0.2 0.2 0.2 0.2 0.1 0.1 0.2 0.1 0.2 0.1 0.2 0.1 0.1 0.2 0.2 0.1 0.1 0.1 0.1 0.1 0.2 0.2 0.2
STO   0.3 0.1 1.0 0.3 0.2 0.2 0.1 0.4 0.2 0.3 0.3 0.3 0.2 0.3 0.2 0.3 0.3 0.1 0.2 0.1 0.2 0.3 0.2 0.2 0.2 0.1 0.3 0.2 0.4 0.4
MCO   0.5 0.2 0.3 1.0 0.3 0.4 0.2 0.4 0.4 0.5 0.4 0.4 0.3 0.5 0.5 0.5 0.4 0.1 0.3 0.1 0.3 0.5 0.4 0.2 0.3 0.3 0.4 0.4 0.3 0.4
NBO   0.3 0.1 0.2 0.3 1.0 0.3 0.1 0.2 0.3 0.4 0.2 0.2 0.3 0.4 0.4 0.3 0.4 0.0 0.3 0.2 0.2 0.3 0.3 0.2 0.3 0.2 0.3 0.4 0.2 0.2
JPR   0.2 0.2 0.2 0.4 0.3 1.0 0.2 0.4 0.3 0.4 0.3 0.2 0.3 0.3 0.2 0.4 0.4 0.1 0.3 0.1 0.4 0.4 0.3 0.3 0.3 0.4 0.3 0.3 0.4 0.3
KHU   0.2 0.3 0.1 0.2 0.1 0.2 1.0 0.3 0.1 0.2 0.2 0.0 0.1 0.1 0.1 0.2 0.2 0.3 0.1 0.1 0.3 0.2 0.1 0.1 0.1 0.1 0.1 0.2 0.2 0.1
SAT   0.3 0.2 0.4 0.4 0.2 0.4 0.3 1.0 0.2 0.5 0.4 0.3 0.3 0.3 0.2 0.4 0.3 0.1 0.3 0.2 0.4 0.4 0.3 0.3 0.3 0.3 0.4 0.4 0.5 0.4
DHU   0.3 0.2 0.2 0.4 0.3 0.3 0.1 0.2 1.0 0.4 0.3 0.2 0.2 0.3 0.3 0.3 0.3 0.0 0.2 0.2 0.3 0.3 0.4 0.2 0.3 0.3 0.3 0.3 0.2 0.2
KHA   0.5 0.2 0.3 0.5 0.4 0.4 0.2 0.5 0.4 1.0 0.4 0.3 0.4 0.4 0.3 0.4 0.4 0.1 0.4 0.3 0.3 0.5 0.5 0.3 0.4 0.3 0.3 0.5 0.4 0.3
NSA   0.3 0.2 0.3 0.4 0.2 0.3 0.2 0.4 0.3 0.4 1.0 0.2 0.4 0.4 0.3 0.5 0.4 0.1 0.4 0.1 0.3 0.4 0.4 0.2 0.4 0.3 0.3 0.5 0.4 0.3
ECO   0.2 0.1 0.3 0.4 0.2 0.2 0.0 0.3 0.2 0.3 0.2 1.0 0.3 0.3 0.3 0.3 0.2 0.0 0.3 0.1 0.2 0.3 0.2 0.3 0.3 0.3 0.3 0.2 0.3 0.3
EAD   0.2 0.1 0.2 0.3 0.3 0.3 0.1 0.3 0.2 0.4 0.4 0.3 1.0 0.3 0.2 0.5 0.3 0.1 0.4 0.3 0.4 0.3 0.4 0.3 0.4 0.4 0.2 0.4 0.4 0.4
RDO   0.4 0.2 0.3 0.5 0.4 0.3 0.1 0.3 0.3 0.4 0.4 0.3 0.3 1.0 0.4 0.5 0.4 0.1 0.3 0.1 0.3 0.4 0.3 0.2 0.4 0.2 0.4 0.3 0.3 0.4
MFO   0.4 0.1 0.2 0.5 0.4 0.2 0.1 0.2 0.3 0.3 0.3 0.3 0.2 0.4 1.0 0.4 0.3 0.0 0.2 0.1 0.2 0.3 0.2 0.2 0.3 0.1 0.3 0.4 0.3 0.3
ADO   0.4 0.2 0.3 0.5 0.3 0.4 0.2 0.4 0.3 0.4 0.5 0.3 0.5 0.5 0.4 1.0 0.4 0.1 0.4 0.2 0.3 0.5 0.5 0.3 0.3 0.4 0.3 0.5 0.4 0.4
JKI   0.3 0.1 0.3 0.4 0.4 0.4 0.2 0.3 0.3 0.4 0.4 0.2 0.3 0.4 0.3 0.4 1.0 0.1 0.3 0.2 0.3 0.4 0.4 0.2 0.4 0.4 0.3 0.4 0.3 0.3
RMA   0.1 0.2 0.1 0.1 0.0 0.1 0.3 0.1 0.0 0.1 0.1 0.0 0.1 0.1 0.0 0.1 0.1 1.0 0.1 0.0 0.1 0.1 0.0 0.1 0.0 0.0 0.1 0.1 0.1 0.1
BGR   0.3 0.1 0.2 0.3 0.3 0.3 0.1 0.3 0.2 0.4 0.4 0.3 0.4 0.3 0.2 0.4 0.3 0.1 1.0 0.1 0.3 0.3 0.3 0.3 0.2 0.3 0.3 0.4 0.3 0.2
RNE   0.1 0.1 0.1 0.1 0.2 0.1 0.1 0.2 0.2 0.3 0.1 0.1 0.3 0.1 0.1 0.2 0.2 0.0 0.1 1.0 0.2 0.1 0.3 0.2 0.3 0.3 0.1 0.2 0.2 0.2
WVI   0.2 0.2 0.2 0.3 0.2 0.4 0.3 0.4 0.3 0.3 0.3 0.2 0.4 0.3 0.2 0.3 0.3 0.1 0.3 0.2 1.0 0.3 0.3 0.4 0.3 0.3 0.3 0.3 0.5 0.3
JPI   0.4 0.2 0.3 0.5 0.3 0.4 0.2 0.4 0.3 0.5 0.4 0.3 0.3 0.4 0.3 0.5 0.4 0.1 0.3 0.1 0.3 1.0 0.4 0.3 0.3 0.2 0.3 0.5 0.5 0.4
LCH   0.3 0.1 0.2 0.4 0.3 0.3 0.1 0.3 0.4 0.5 0.4 0.2 0.4 0.3 0.2 0.5 0.4 0.0 0.3 0.3 0.3 0.4 1.0 0.2 0.3 0.3 0.3 0.4 0.3 0.3
BPA   0.2 0.1 0.2 0.2 0.2 0.3 0.1 0.3 0.2 0.3 0.2 0.3 0.3 0.2 0.2 0.3 0.2 0.1 0.3 0.2 0.4 0.3 0.2 1.0 0.2 0.3 0.2 0.3 0.4 0.3
SLY   0.4 0.1 0.2 0.3 0.3 0.3 0.1 0.3 0.3 0.4 0.4 0.3 0.4 0.4 0.3 0.3 0.4 0.0 0.2 0.3 0.3 0.3 0.3 0.2 1.0 0.2 0.3 0.3 0.3 0.3
CBR   0.2 0.1 0.1 0.3 0.2 0.4 0.1 0.3 0.3 0.3 0.3 0.3 0.4 0.2 0.1 0.4 0.4 0.0 0.3 0.3 0.3 0.2 0.3 0.3 0.2 1.0 0.2 0.4 0.3 0.3
VWA   0.3 0.1 0.3 0.4 0.3 0.3 0.1 0.4 0.3 0.3 0.3 0.3 0.2 0.4 0.3 0.3 0.3 0.1 0.3 0.1 0.3 0.3 0.3 0.2 0.3 0.2 1.0 0.3 0.4 0.4
CME   0.3 0.2 0.2 0.4 0.4 0.3 0.2 0.4 0.3 0.5 0.5 0.2 0.4 0.3 0.4 0.5 0.4 0.1 0.4 0.2 0.3 0.5 0.4 0.3 0.3 0.4 0.3 1.0 0.4 0.2
CDD   0.3 0.2 0.4 0.3 0.2 0.4 0.2 0.5 0.2 0.4 0.4 0.3 0.4 0.3 0.3 0.4 0.3 0.1 0.3 0.2 0.5 0.5 0.3 0.4 0.3 0.3 0.4 0.4 1.0 0.5
JCH   0.3 0.2 0.4 0.4 0.2 0.3 0.1 0.4 0.2 0.3 0.3 0.3 0.4 0.4 0.3 0.4 0.3 0.1 0.2 0.2 0.3 0.4 0.3 0.3 0.3 0.3 0.4 0.2 0.5 1.0

6.6 Results and discussion

To determine if there was an increase in inter-judge reliability when the judges were able to listen to the original audio recordings, the numbers of Cohen’s Kappa coefficients (κ) that had reached each of three thresholds (i.e. κ ≥ 0.2, κ ≥ 0.3 and κ ≥ 0.4) in Tables 6.2 and 6.3 were tallied. The results are summarized in Table 6.4. To illustrate how Table 6.4 works, an examination of the example of MMA can be made. Before MMA listened to the audio recording, he could only base his formulaicity judgements on the information provided on the textual transcript (i.e. the identification task booklet). At this stage, MMA’s judgements agreed with twenty-four other judges at the κ ≥ 0.2 level, ten judges at the κ ≥ 0.3 level and one judge at the κ ≥ 0.4 level (see Table 6.4). The fact that MMA agreed with twenty-four judges at the κ ≥ 0.2 level is impressive because there were only thirty judges in total. After MMA had listened to the audio recordings, his judgements were even closer to the other judges. This is shown in the increase from ten to thirteen other judges with whom MMA agreed at the κ ≥ 0.3 level and the increase from one to three judges with whom he agreed at the κ ≥ 0.4 level (see Table 6.4).

Table 6.4 The number of judges each individual judge agrees with at κ ≥ 0.2, κ ≥ 0.3 and κ ≥ 0.4 levels

| Judge | κ ≥ 0.2 Before | κ ≥ 0.2 After | κ ≥ 0.2 Change | κ ≥ 0.3 Before | κ ≥ 0.3 After | κ ≥ 0.3 Change | κ ≥ 0.4 Before | κ ≥ 0.4 After | κ ≥ 0.4 Change |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MMA | 24 | 23 | −1 | 10 | 13 | 3 | 1 | 3 | 2 |
| NSE | 7 | 5 | −2 | 0 | 0 | 0 | 0 | 0 | 0 |
| STO | 18 | 21 | 3 | 4 | 8 | 4 | 0 | 0 | 0 |
| MCO | 25 | 25 | 0 | 17 | 20 | 3 | 7 | 11 | 4 |
| NBO | 24 | 23 | −1 | 9 | 11 | 2 | 1 | 1 | 0 |
| JPR | 25 | 24 | −1 | 12 | 13 | 1 | 3 | 3 | 0 |
| KHU | 6 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| SAT | 27 | 27 | 0 | 17 | 17 | 0 | 5 | 4 | −1 |
| DHU | 22 | 24 | 2 | 9 | 9 | 0 | 2 | 0 | −2 |
| KHA | 26 | 26 | 0 | 20 | 21 | 1 | 8 | 9 | 1 |
| NSA | 26 | 27 | 1 | 16 | 19 | 3 | 4 | 10 | 6 |
| ECO | 23 | 24 | 1 | 7 | 7 | 0 | 0 | 0 | 0 |
| EAD | 22 | 26 | 4 | 6 | 16 | 10 | 0 | 5 | 5 |
| RDO | 23 | 24 | 1 | 11 | 21 | 10 | 0 | 5 | 5 |
| MFO | 22 | 22 | 0 | 11 | 11 | 0 | 1 | 2 | 1 |
| ADO | 25 | 26 | 1 | 18 | 24 | 6 | 0 | 9 | 9 |
| JKI | 25 | 26 | 1 | 17 | 19 | 2 | 3 | 4 | 1 |
| RMA | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| BGR | 20 | 25 | 5 | 6 | 11 | 5 | 0 | 3 | 3 |
| RNE | 11 | 11 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| WVI | 26 | 26 | 0 | 7 | 10 | 3 | 1 | 1 | 0 |
| JPI | 25 | 26 | 1 | 17 | 18 | 1 | 7 | 9 | 2 |
| LCH | 26 | 26 | 0 | 14 | 16 | 2 | 4 | 5 | 1 |
| BPA | 21 | 21 | 0 | 4 | 3 | −1 | 0 | 0 | 0 |
| SLY | 22 | 26 | 4 | 8 | 15 | 7 | 0 | 1 | 1 |
| CBR | 20 | 22 | 2 | 6 | 9 | 3 | 0 | 2 | 2 |
| VWA | 24 | 24 | 0 | 15 | 16 | 1 | 3 | 2 | −1 |
| CME | 23 | 25 | 2 | 17 | 20 | 3 | 5 | 8 | 3 |
| CDD | 27 | 28 | 1 | 13 | 17 | 4 | 6 | 6 | 0 |
| JCH | 23 | 25 | 2 | 13 | 15 | 2 | 1 | 5 | 4 |

Table 6.4 clearly demonstrates an increase in inter-rater reliability within the group of judges as a whole after the exposure to the audio recordings. Before the exposure to the audio recordings, only nine judges agreed with at least fifteen other judges (i.e. half of the group) at the κ ≥ 0.3 level. However, after exposure to the audio recording, there was an increase to fifteen judges who agreed with at least fifteen other judges at the κ ≥ 0.3 level. At the κ ≥ 0.2 level, there was also an increase from twenty-five to twenty-seven judges agreeing with at least twenty of the other judges. This increase in the number of judges who agreed with at least half or two-thirds of the members in the group means that, following their exposure to the audio recording, the judges became a more homogeneous group and their perspectives on formulaicity and their formulaicity judgements had become closer. In other words, there was a notable increase in the reliability of the study. The figures in Table 6.4 show that, on average, after the exposure to the audio recordings, each judge agreed with 0.87 more judges at the κ ≥ 0.2 level, 2.53 more judges at the κ ≥ 0.3 level and 1.53 more judges at the κ ≥ 0.4 level. This again is concrete evidence for the positive effect of exposure to the audio recordings on the level of agreement between the judges. This positive effect is reflected in the increase in the reliability of the judgement data.

Table 6.4 also establishes that the exposure to the audio recordings benefited each judge to different extents. While some judges demonstrated marked gains (e.g. EAD, RDO, ADO, BGR and SLY), other judges did not seem to benefit from the exposure at all (e.g. RMA), and some even dropped very slightly in their level of agreement with the rest of the judges (e.g. NSE, SAT and BPA).

One of the questions the judges were asked in the post-task interview was whether and how (they thought) they had benefited from the exposure to the audio recordings. Clearly, as in all qualitative research, the participants’ self-reporting may not have been fully reflective of reality. This is not because the judges intended to withhold any information, but because they might have found it difficult to verbalize their implicit and unconscious knowledge of formulaicity fully and accurately. Nonetheless, summarizing the responses of the judges in the interview, there were four main ways in which the judges thought the exposure to the audio recordings had benefited their formulaicity judgements. Surprisingly, the first point that many judges mentioned is that the exposure to the audio recordings gave them additional time to consider Text W. This extra time on the task facilitated their noticing of the presence of more formulaic sequences in Text W than they had noted before. Furthermore, listening to Text W also helped them to see it from an additional and alternative perspective, so that they noticed new formulaic sequences that they had missed before. The third way is that the judges reported a better grasp of the speaker’s meaning in context after exposure to the recordings, which allowed them to assess whether word sequences in utterances had been used formulaically or not. The final way is that they would delete their previous choices of formulaic sequences if, upon listening to the audio recordings, they found the formulaic sequences to be articulated in an unformulaic way. That is to say, some of these judges had expected formulaic sequences to be articulated fluently, so, if the actual phonological realization did not match their expectations, some judges changed their minds and revised their previous choices.

Whichever of these four benefits aided the judges most, what really mattered in the present study was that the provision of original audio recordings did indeed help to increase the inter-judge reliability and, therefore, the robustness of the formulaicity judgement data. In previous formulaic language research involving the use of native speaker judgement as an identification method, researchers often provided only textual speech transcripts for the judges. In the light of the findings of this study, future researchers should consider providing the native speaker judges with the original audio recordings, along with the textual speech transcripts. Although there are more issues to discuss regarding how these procedures should be implemented, there is no space to do them justice here. Issues of copyright and anonymity of the data notwithstanding, researchers still need to consider what instructions to give to the judges on the use of the audio data. For instance, some judges might not bother spending extra time listening to the audio recordings but just get on with the identification of formulaic language in the textual transcripts. This situation was carefully prevented in this study because the interactive interface (see Appendix 3) allowed the researcher to monitor exactly how much time each judge spent listening to the audio data. Otherwise, if the issue of potentially unequal exposure to the audio recordings is not addressed, it can become an extraneous variable. There are also technical issues concerning how to integrate the audio data into the identification process so that native speaker judges with limited computer skills can engage in the process of constantly switching between the aural and textual information with ease.

6.7 Chapter summary

This chapter reported the findings of a study that investigated whether listening to the original audio recordings would improve the inter-judge reliability in research involving the use of collective native speaker judgement to identify formulaic language. The empirical evidence from the judgement data of thirty

native speaker judges confirmed that this method does represent an improvement on previous attempts to identify formulaic language without audio assistance. The idea of listening to the original audio recordings was inspired by the concern in the literature regarding the crucial role of intonation in the understanding of meaning in context. For many reasons, previous formulaic language research only provided judges with the textual transcripts. Therefore, this study asked whether providing judges with concurrent audio recordings would help researchers to improve the reliability of their studies. In attempting to answer that question, a mathematical formula was applied to the calculation of inter-rater reliability (i.e. Cohen’s Kappa). Moreover, a way to convert the qualitative formulaicity judgement data into numerical data was introduced, along with the use of a matrix to present inter-judge reliability data. To my knowledge, this is the first study to apply these techniques in formulaic language research. As for future implications, these techniques should be transferable to future formulaic language research using native speaker judgement. There is, however, still room for improvement regarding the technique of comparing the judgement data using Microsoft Excel. For instance, this system does not yet allow for embedded formulaic sequences or overlaps in formulaic sequence boundaries. Furthermore, to keep the operations simple and manageable, the system so far can handle only the binary distinction between the formulaic and the unformulaic. In future research, the goal should be to advance the Excel comparison technique so that it can manage the complexity of making comparisons between the five confidence score categories used for the formulaicity judgement scoring system.
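As an illustration of the inter-judge reliability matrix described above, the following Python sketch computes Cohen's Kappa for every pair of judges from binary word-level judgements (1 = formulaic, 0 = unformulaic). It is a minimal sketch only: the judge labels, the function names and the toy data are invented for illustration and are not taken from the study, in which the comparisons were carried out in Microsoft Excel.

```python
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's Kappa for two equal-length sequences of binary labels (0 or 1)."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: proportion of words labelled identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each judge's marginal rate of 'formulaic' labels.
    pa1, pb1 = sum(a) / n, sum(b) / n
    p_e = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    if p_e == 1.0:
        return 1.0  # both judges used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def kappa_matrix(judgements):
    """Return {(judge_i, judge_j): kappa} for every unordered pair of judges."""
    return {
        (i, j): cohen_kappa(judgements[i], judgements[j])
        for i, j in combinations(sorted(judgements), 2)
    }


# Hypothetical data: one label per word of a ten-word transcript.
judgements = {
    "J1": [1, 1, 1, 0, 0, 0, 1, 1, 0, 0],
    "J2": [1, 1, 1, 0, 0, 0, 1, 1, 0, 0],
    "J3": [1, 1, 0, 0, 0, 0, 1, 1, 1, 0],
}

for pair, kappa in kappa_matrix(judgements).items():
    print(pair, round(kappa, 2))
```

Because each word carries a single binary label, this representation shares the limitation noted above: it cannot express embedded or overlapping formulaic sequences, nor the five confidence score categories.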

7

Conclusions: The Prosody of Formulaic Language

From Chapters 1 to 6, this book has presented an assessment of the possibility of identifying formulaic language using ‘the phonological method’. This final chapter of the book brings together the discussions from the previous chapters to address the overarching question: Can formulaic language be identified based on prosodic cues? Based on the evidence presented, it is clear that an interpretation of the phonological method should be multi-dimensional. That is, any vision for the use of the phonological method should extend beyond the tracking of intonation units or pauses, even though this has been the main focus of previous research. Study one began with the basics by considering the tracking of intonation unit boundaries as indicators of formulaic language. Study two expanded on the interpretation of the phonological method to take account of the stress and rhythmic patterns of formulaic language. The belief was that if unique stress and rhythmic patterns could be found in samples of word sequences which had confidently been judged as formulaic, then tracking these patterns could help to establish the formulaicity of an unknown word sequence. It is fair to say that the prosodic features found to be associated with formulaic language in study two represent only the first steps in the attempt to widen the basis of the phonological method by increasing the range of prosodic cues considered. Future research should aim to broaden the basis even further. In an attempt to achieve this, study three explored an alternative interpretation of the phonological method that focused on the fact that native speaker judges seldom take prosody into account during the process of formulaic language identification. This interpretation sought to improve the process of formulaic language identification in the light of recommendations highlighting the importance of prosody in assisting the understanding of meaning in context. In comparison to this, the original

interpretation offered by studies one and two was radically different since they simply aimed to uncover specific prosodic indicators of formulaicity. Despite the differences between these two interpretations of the phonological method, both should be developed simultaneously in future research in order to achieve optimal results in the identification of formulaic language in spontaneous speech. The remaining sections of this final chapter have two main goals: 1) to review the findings of these three studies, consider to what extent they answer the book’s overarching research question and signal the gaps for further research; and 2) to look ahead at the implications of the work presented in this book in terms of formulaic language research methodology, language teaching and Natural Language Processing, and, finally, to highlight where further research is needed in relation to the prosody of formulaic language.

7.1 Looking back: The findings, the limitations and the overarching research question

7.1.1 The overall findings

After the introduction to the book, Chapter 2 provided an overview of the literature on formulaic language with a particular focus on issues connected to the identification of formulaic language. This discussion on the characteristics of formulaic language identified using a variety of methodologies informed the methodological decisions made regarding the empirical studies presented here. Chapter 3 marked the first consideration of the possibility of identifying formulaic language on the basis of prosodic cues by presenting a review of the relevant literature, published since the 1970s, on child, learner and adult speech. Based on this review, it became clear first that the feasibility of the method of using prosodic cues to identify formulaic language has been validated in the case of child language and dysfluent foreign language learners. However, such empirical validation has not been emulated in the case of adult native speakers and fluent language learners. As such, it appears that many studies which are frequently cited as supporting evidence for the feasibility of the phonological method in the case of adult native speakers and fluent language learners actually represent only hypotheses or assumptions. The very few empirical studies that have addressed certain prosodic aspects of formulaic language in adult speech have not provided sufficient or direct enough evidence because of their narrow focus on read-aloud, semantically opaque idioms or on the

temporal aspect of semantically transparent formulaic sequences. Secondly, the review in Chapter 3 argued that if prosodic cues really can be used as indicators of formulaic language in the spontaneous speech of native speakers and fluent language learners, intonation unit boundaries are likely to be the most reliable indicators. This hypothesis was shown to be true in the subsequent empirical studies. The three empirical studies presented here built on the theoretical foundation presented in Chapter 3 and took steps to increase the depth of the exploration of the potential of using prosodic cues to identify formulaic language. Study one set out to test the hypothesis that intonation unit boundaries can be reliable indicators of formulaic language in the fluent, spontaneous speech of adult native speakers and proficient EFL learners. Having recognized the potential influence the identification method might have on the nature of the formulaic language identified (see Chapter 2), study one controlled for this influence by using both of the two mainstream formulaic language identification methods (i.e. corpus-based automatic extraction and native speaker judgement) to sample formulaic sequences for prosodic analysis. The results show that, regardless of the identification method used or the type of fluent spontaneous speech presented (i.e. that of native speakers or that of proficient EFL learners), formulaic language aligned completely with intonation unit boundaries more often than not. This finding supports some researchers’ predictions that formulaic language often forms single intonation units and sheds light on the psycholinguistic processing of formulaic language. However, since the highest level of alignment between formulaic language and intonation unit boundaries was only 55 per cent, the idea that formulaic language can be identified solely by tracking intonation unit boundaries is clearly inaccurate.
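To make the notion of an alignment level concrete, the following Python sketch counts how many formulaic sequences are delimited on both sides by an intonation unit (IU) boundary. It rests on a simplifying assumption that is not taken from the studies themselves: sequences and boundaries are represented as word indices, and a sequence counts as completely aligned when both its start and its end coincide with a boundary. The function names and the toy transcript are hypothetical.

```python
def is_fully_aligned(seq, boundaries):
    """True if the sequence starts and ends at intonation unit boundaries.

    seq is (start, end) in word indices, end exclusive; boundaries is a set of
    word indices at which an intonation unit boundary falls.
    """
    start, end = seq
    return start in boundaries and end in boundaries


def alignment_rate(sequences, boundaries):
    """Proportion of sequences delimited on both sides by IU boundaries."""
    if not sequences:
        raise ValueError("no sequences to evaluate")
    aligned = sum(is_fully_aligned(s, boundaries) for s in sequences)
    return aligned / len(sequences)


# Hypothetical 20-word transcript with IU boundaries before words 0, 4, 9, 15
# and after the final word (index 20).
iu_boundaries = {0, 4, 9, 15, 20}
formulaic_sequences = [(0, 4), (4, 7), (9, 15), (16, 20)]

rate = alignment_rate(formulaic_sequences, iu_boundaries)
print(f"{rate:.0%} of sequences align with IU boundaries on both sides")
```

A rate well below 100 per cent on real data would illustrate the point made above: boundary tracking on its own leaves many formulaic sequences undetected.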
This outcome is congruent with the concern raised towards the end of the review in Chapter 3 that, because of the complexity of the adult linguistic system and in contrast to the speech of children and novice learners, it is not possible to identify adult formulaic language based on the phonological criterion alone. The focus of study one was intonation. However, there is no reason to assume that formulaic language in adult speech cannot be identified by other prosodic cues. The aim of study two, therefore, was to explore aspects of tempo/rhythm and stress placement in formulaic language in the speech of an adult native speaker. Again, this exploration was guided by relevant research regarding these two aspects in the literature. The results suggested that formulaic language is not often interrupted by internal pauses, nor does it often align completely with pauses. This finding adds further support to the conclusion of study one,

that formulaic language is more likely to be delineated on both sides by IU boundaries than by pauses. A surprising finding of study two, however, concerns the rate of articulation of formulaic language. Contrary to common belief, the articulation rate of formulaic language did not appear to be distinct from the speaker’s mean articulation rate. Several hypotheses were proposed to explain this finding, including the speculation that the speaker may slow down their articulation rate in order to call attention to the meaning of individual formulaic sequences, and that the distribution of higher and lower articulation rate formulaic sequences may have cancelled out the influence of formulaicity on the mean articulation rate of the formulaic language. Nonetheless, this finding potentially challenges the observation, found in other studies, regarding the mental processing advantage enjoyed by formulaic language. Finally, the results of the stress placement analysis revealed that words within the formulaic sequences were markedly less likely to attract stress than words in general lecture speech (as represented by lecture data from the SEC). These three major findings from study two suggest that the prosody of formulaic language has some unique aspects, such as stress placement, but that others, such as pause locations and rate of articulation, follow the patterns of general English. Taken together, the results of studies one and two have shed light on the use of prosodic cues as a means of identifying formulaic language in fluent, spontaneous speech. Based on the empirical investigations presented here, the initial conclusion is that prosodic cues alone, whether intonation, rhythm or stress placement, do not provide conclusive evidence of the formulaicity of wordstrings in the speech of adult native speakers (unlike in the speech of children and dysfluent language learners).
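The cancelling-out explanation mentioned earlier in this section can be made concrete with a small numerical sketch. It assumes a common definition of articulation rate, namely syllables per second of phonation time with pause time excluded; the function name and every figure below are invented for illustration and are not the study's measurements.

```python
from statistics import mean


def articulation_rate(syllables, duration_s, pause_s=0.0):
    """Syllables per second of phonation time (total duration minus pauses)."""
    phonation = duration_s - pause_s
    if phonation <= 0:
        raise ValueError("phonation time must be positive")
    return syllables / phonation


# Hypothetical speaker-level figures for a whole lecture extract.
speaker_rate = articulation_rate(syllables=1200, duration_s=300.0, pause_s=60.0)

# Hypothetical formulaic sequences: (syllables, duration in s, pause time in s).
sequences = [
    (4, 0.8, 0.0),
    (5, 1.25, 0.25),
    (3, 0.5, 0.0),   # faster than the speaker's mean
    (6, 1.5, 0.0),   # slower than the speaker's mean
]
sequence_rates = [articulation_rate(*s) for s in sequences]

print("speaker mean rate:", speaker_rate)
print("individual sequence rates:", sequence_rates)
print("mean rate of formulaic sequences:", mean(sequence_rates))
```

In this toy data the faster and slower sequences offset one another, so the mean for the formulaic sequences coincides with the speaker's overall rate even though individual sequences differ from it. Inspecting the distribution of rates, rather than the mean alone, would be one way of testing that explanation.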
In the case of adult native speakers, a multi-criteria approach to the validation of the formulaicity of wordstrings should be taken. This means that before applying the phonological method to a target wordstring, there should first be other logical reasons or criteria suggesting that the wordstring might be formulaic (see Section 2.2). If the application of the phonological method to a particular wordstring is not supported by other evidence, even if the wordstring exhibits all of the prosodic cues associated with formulaic sequences, it cannot be guaranteed that the wordstring is formulaic. Although researchers have long suggested that formulaic language can be identified based on the phonological criterion, the original audio recordings of the spoken texts from which formulaic language is to be identified have never before been provided during the identification procedure (Wood’s 2006 study being the exception). This was an issue that this book aimed to redress. Therefore,

a key argument here was that the robustness of the formulaicity judgement data would be improved if the native speaker judges were allowed to listen to the original audio recordings during the judgement procedure. Study three set out to examine this hypothesis empirically by comparing the formulaicity judgement data of thirty native speaker judges before and after they had listened to the original recordings. As expected, the results showed that listening to the audio recordings increased the level of agreement among judges in their formulaicity judgements. One explanation for this increase in inter-judge reliability is that listening to the audio recordings deepened their understanding of the contextual meaning of the spoken discourse, and, consequently, reduced the margin of error that tended to occur when judges had to guess the contextual meaning subjectively without the necessary audio information. Study three provided the necessary empirical evidence to show the validity of this suggestion and the mechanism by which listening benefited the formulaicity judgement data. In short, it has been shown here that prosodic cues, to some extent, provide a window into the mental processing of formulaic language. Nevertheless, it is not possible to ascertain the formulaicity of wordstrings on the sole basis of the prosodic cues often associated with formulaicity. However, providing native speaker judges with the opportunity to listen to the audio recordings of the sample text during the native speaker judgement procedure has been shown to increase the robustness of the formulaicity judgement data. This should provide another means for improving the formulaic language identification procedure in studies of the prosody of formulaic language.

7.1.2 Reflections on the process of investigation and the limitations of the present investigations

The three empirical studies presented here have demonstrated ways of addressing the overarching research question concerning the identification of formulaic language on the basis of prosodic evidence. While the previous section provided a concise summary of the main arguments and findings of the studies, it is also important to reflect on the process of the investigation and draw attention to the potential limitations of these studies. On reflection, the investigation throughout these studies has raised some interesting issues: first, the need to adapt the phonological method according to the data type, and, secondly, the long-standing problem of identifying and validating formulaic language. The first issue refers to the varying methods of

implementation and levels of success achieved by the phonological method depending on whether it is applied to child data, learner data or adult native speaker data. This is not least because of the distinctive nature of formulaic language in child, learner and adult language (see Wray, 2002). However, as demonstrated here, the implementation of the phonological method can also depend on whether the speech data is monologic or dialogic. When applying the phonological method to dialogic data, it is necessary to isolate the prosodic effects of formulaicity from the effects of conversation prosody (see Hughes and Szczepek Reed, 2006; Szczepek Reed, 2004, for a discussion on conversation prosody). For instance, the level of alignment between formulaic language and intonation units in study one part A could be challenged because conversation prosody and turn-taking may have influenced the results. The example of study one part A shows that, perhaps, in future research, the interpretation and assessment of the effectiveness of the phonological method may need to be adjusted according to the data type. The second issue to highlight regarding the investigation process concerns the sampling of formulaic language. Obviously, this issue is of vital importance because if the samples of formulaic language investigated are not valid, then the results and conclusions of all three studies could be seriously challenged. Even if mainstream methods such as corpus-based automatic extraction and native speaker judgement are used to sample formulaic language, because of the very different nature of their products, the results could still be challenged as being artefacts of methodology. For this reason, the issue with sampling formulaic language was addressed carefully in these studies. 
In the context of this book, formulaic language was sampled using both corpus-based automatic extraction and native speaker judgement, for which the procedures followed have been clearly explained in previous chapters. Recognizing the doubts over the formulaicity of the products of automatic extraction, the clusters identified by WordSmith were not all automatically assumed to be formulaic. As an example, justification was given for why the automatically extracted four-word sequence I don’t know why should be regarded as formulaic in the particular learner corpus investigated. Similarly, the products of native speaker judgement were presented with careful descriptions regarding their nature. This level of caution is necessary to ensure the quality of formulaic language research. In terms of the limitations of this book, a notable weakness is the small amount of data investigated: study one part A only examined fifty-six instances of one formulaic sequence, while study two was based on just sixty-two

different formulaic sequences. Nonetheless, there were reasons for the small data size. In the context of study one part A, given the resources available, the intention was to limit the number of instances investigated to a manageable size. Therefore, the 3,000 instances of I think were not chosen. The nine instances of I don’t know how to say were also not considered because there were too few instances for a reasonable observation to be made. In the context of study two, the number of formulaic sequences available for prosodic analysis was restricted by the amount of text data the native speaker judges could explore without compromising their attention to the task or the consistency of their formulaicity judgements. Based on the results of the pilot study sessions, it was found that the native speaker judges could only manage five minutes of spoken data within the one-hour time frame allowed for each data collection session. This was partly because the pre-task training and post-task interview had to be counted towards this time frame as well. Therefore, the samples of formulaic sequences to which formulaicity could fairly confidently be assigned and thus submitted to prosodic analysis consisted of only sixty-two instances. This small dataset means that only tentative conclusions can be drawn based on the original findings outlined in the previous section (7.1.1). Nevertheless, future research can build on these tentative conclusions and investigate the possibility of identifying formulaic language on the basis of prosodic cues using significantly more data. Another issue concerns the sources of spoken data from which formulaic language was sampled.
The investigation in study one part A sampled fifty-six instances of a formulaic sequence from interview data, but subsequently discovered that in 41 per cent of cases it was unclear whether the level of alignment observed between the formulaic language and intonation units was due to the effect of turn-taking, formulaicity or both. As explained previously in Chapter 4, this observation highlights the problem with using conversational data in investigations into the prosody of formulaic language; the dynamics of dyadic interaction can introduce a variety of changes to speech prosody (see Couper-Kuhlen and Selting, 1996; Szczepek Reed, 2004, for discussions on the prosody of English conversations). In order to isolate the effect of formulaicity from the effect of turn-taking, study one part B began to use monologic data. Spoken monologues from lectures suited the requirement of extended monologic speech with few interruptions and were, therefore, also used in studies two and three. The issue with using lecture data, however, is that academic lectures as an individual genre have their own unique prosodic structure. Therefore, it may

be argued that the findings from study two and study one part B may not all be directly applicable to the case of everyday conversational speech. The choice of data constitutes a methodological quandary for these studies. Nonetheless, this dilemma can hopefully be resolved in future research if it is conducted using a variety of spoken data. Comparison of these results will then allow judgement of whether the findings of this study, based on lecture data, are consistent with other speech genres. Finally, due to space limitations, only a preliminary analysis of the prosodic features of the sixty-two formulaic sequences was provided in studies one part B, two and three. In the future, further analysis should be carried out on each of the sixty-two formulaic sequences. One particularly interesting direction for future research would be the relationship between the various categories of formulaic language and their prosodic features. In fact, the results of study one part A have already shed light on this relationship, so it will be exciting to see whether the findings are replicated when the dataset is expanded from one formulaic sequence (i.e. I don’t know why) to sixty-two formulaic sequences. The scale of this future project should not be underestimated because it may involve the examination of several thousand instances of formulaic sequences in the Nottingham Multimodal Corpus or any other equivalent lecture corpus. Based on my experience with I don’t know why, a formulaic sequence can serve several functions. Moreover, the function served depends on the context in which it is used. Therefore, it is necessary to carefully examine the concordance lines one by one in context. Even so, the categorization of the sixty-two formulaic sequences would still be a highly controversial issue because consensus is yet to be reached concerning the categorization of formulaic sequences.

7.1.3 Can we identify formulaic language based on prosodic cues? An answer to the overarching research question based on the three empirical studies

This book aimed to determine whether formulaic language can be identified on the basis of prosodic cues. Realizing the huge implications of a positive answer to this question, I set out to explore, empirically, whether the use of the phonological method to achieve this goal is realistic and feasible. I considered different interpretations of the phonological method and finally concluded that the idea of identifying formulaic language using the phonological method is not as straightforward as is often assumed. It is true that prosodic features may shed some light on the degree of formulaicity of individual formulaic sequences

(see Section 4.10), and that, in some respects, formulaic language demonstrates unique prosodic patterns (see Chapter 5). However, in native speaker and proficient learner speech, formulaic language cannot be identified by tracking intonation unit boundaries or pauses alone. The assumption that formulaic language aligns readily with intonation units or pauses simply does not hold true in the light of corpus evidence. Therefore, instead of concluding that formulaic language cannot be identified by prosodic cues, this book sought alternative approaches to the phonological method with the aim of enabling the maximum utilization of prosodic evidence to help identify formulaic language. An attempt was made to expand the scope of the phonological method to consider not only intonation but also temporal and stress placement cues. It also tried re-interpreting the phonological method as a multimodal approach to the identification of formulaic language using the method of native speaker judgement. It is clear that further research is needed in order to explore a range of approaches to the phonological method. Nonetheless, some initial success has already been achieved in these studies in terms of the multimodal approach to formulaic language identification using native speaker judgement, and in terms of discovering some of the unique stress placement patterns of formulaic language. The experience of authoring this book has shown that it is important to consider the phonological method from multiple perspectives. Consideration of the prosodic evidence in the process of formulaic language identification should not be limited to the tracking of prosodic cues, whether they are intonation, temporal or stress placement cues. Instead, these studies have demonstrated how a creative approach to the phonological method can be adopted to improve the process of formulaic language identification. 
This alternative and creative approach was inspired by the observation that, in the past, native speaker judgement as a formulaic language identification method was often conducted using only a textual transcript. Speech transcripts with synchronized audio streams have never before been provided to native speaker judges in investigations on the use of formulaic language in speech (with the exception of Wood, 2006). Therefore, in the light of growing awareness of the role of multimodality in linguistic analysis (Adolphs, 2008), these studies have successfully demonstrated how the phonological method can be reinterpreted as a multimodal approach to the use of native speaker judgement. Compared to the traditional approach to the phonological method represented by studies one and two, this alternative approach appears to have produced by far the most satisfactory results.

7.2 Looking ahead: Implications, applications and directions for future research

This book has taken a fundamental interest in a very specific methodological issue in formulaic language research. However, the significance of this book reaches far beyond addressing this issue. This section argues that these studies can be approached from many different angles. As a piece of basic research, this book has provided an empirical evaluation of the phonological method for the identification of formulaic language. However, this evaluation has not only generated a multi-dimensional approach to the phonological method, but also a deeper understanding of the significance of the prosodic features of formulaic language. The discussion below will consider the implications and applications of the findings of these studies in future formulaic language research and methodology, language teaching and learning and natural language processing.

7.2.1 Implications for formulaic language research and methodology

At the heart of these studies is the issue of whether and how it is possible to identify formulaic language on the basis of prosodic cues. While this is the most fundamental contribution of this book to formulaic language research and methodology, it has also provided other new and important insights into various aspects of formulaic language. First of all, the findings presented here belong to only a handful of studies that have addressed the phonological aspect of formulaic language. In the past two to three decades, most research on formulaic language has explored the phenomenon from the lexical, psycholinguistic and language teaching and learning perspectives. Researchers within the emergentist tradition (e.g. Bush, 2001; Bybee, 2001, 2002, 2006; Bybee and Scheibman, 1999) have produced a considerable amount of work on phonetic reduction at word boundaries. However, this line of research has often focused on the segmental aspect of formulaic language, rather than the suprasegmental (i.e. prosodic) aspect. The discussions in this book have covered all three aspects of the prosody of formulaic language (i.e. intonation, stress and rhythm), providing the comprehensive approach which sets this piece of work apart from most existing work in the field. Nevertheless, there is still some distance to go before research on the phonology of formulaic language is equivalent to that on other aspects of

formulaic language, which have been fast developing in recent years. However, this book has hopefully provided some directions as to how research on the prosody of formulaic language can be implemented and which topics are worth further investigation. Therefore, this book could represent the first step towards another decade of research on the phonological aspects of formulaic language. Secondly, this book has demonstrated some original techniques concerning the implementation of collective native speaker judgement as a formulaic language identification method. These transferable techniques include 1) using a multimodal approach to native speaker judgement to increase the inter-rater reliability of the judgement data; 2) drastically increasing the number of judges involved in the native speaker judgement process to further improve the robustness of the judgement data; 3) collating and analysing formulaicity judgement data by means of an advanced method implemented in Microsoft Excel; 4) reporting inter-rater reliability statistics as a measure and an indicator of the robustness of the native speaker formulaicity judgement data; and 5) tabulating the inter-rater reliability results to create a matrix (with colour-coding) which enables the exploration of inter-judge differences in terms of formulaicity judgements. As mentioned in Chapter 6, these techniques should make a great improvement to the robustness of native speaker judgement as a method for formulaic language identification in future research. In recent formulaic language research, native speaker judgement has not been very popular, probably because it has been regarded as subjective and unscientific. However, the techniques proposed in these studies represent significant steps towards increasing the scientific rigour and reducing the subjectivity of the native speaker judgement method.
In the long term, it is hoped that these studies will inspire greater attention to be paid to native speaker judgement so that the application of this method in research can be improved even further. It is strongly believed that the benefit of increased understanding of the logical basis of native speaker judgement will go far beyond the native speaker judgement method itself. If, in the future, researchers can isolate all possible factors that can affect native speaker judgement, these factors could be used to develop more sophisticated automatic tools for the extraction of formulaic language (as opposed to just clusters) in language corpora. This could be one of the most significant advances in the future of formulaic language research.

Although the ambition of using prosodic cues as a method of identifying formulaic sequences has proved difficult to realize, the fundamental role of prosodic cues in the acquisition and memory of formulaic language is incontrovertible. Further research needs to be carried out along the lines of exploring, for instance, how child first language learners are able to exploit these prosodic cues to segment and acquire formulaic language from the input of their adult carers (an idea which was briefly explained in Chapter 3). If any phonemic or prosodic cues exist that signal to child first language learners that an utterance is formulaic, then, theoretically, foreign language learners could be taught these cues to facilitate more effective acquisition of formulaic language from purely spoken input. Grounded in the domains of child language research and the psycholinguistics of second language acquisition, these two future research directions should offer an alternative perspective on the prosody of formulaic language.

7.2.2 Applications in language teaching

This book began with the anticipation that the significance of the search for the prosodic patterns of formulaic language would be far greater than merely providing a solution to a methodological issue. Chapter 1 proposed that, if formulaic language has a unique prosody, and this prosody can result in contrastive meanings, then prosody should be part of the formulaic language teaching syllabus. To support this argument, selected examples from Ashby (2006) and Wells (2006) were presented. Cowie’s (1998) assertion that intonation is also an important component to include if learners are to master the use of speech formulas was also cited. Now that the empirical findings of this book have been reviewed, the present discussion can be more specific about how the observations presented here could be useful in English language teaching (ELT) and learning. These studies have provided empirical evidence supporting the idea that, at least in some respects, formulaic language demonstrates a unique prosody. Formulaic language in native speaker data tends to align with intonation units, and tends not to contain internal pauses or receive sentence stress. Before this book, these prosodic features had seldom attracted the attention of formulaic language researchers, let alone language teachers. But, as empirical evidence has now become available, teachers may want to consider drawing learners’ attention to the fact that formulaic language must be delivered prosodically in a certain manner, and, moreover, that the restrictions
on the prosodic delivery of some types of formulaic language should be stricter than others. For instance, stress placement within idioms like raining cats and dogs and have eyes in the back of one’s head is fixed – any deviation from the conventional pattern could result in the loss of the original idiomatic meaning (Ashby, 2006; Wells, 2006). This kind of awareness-raising activity could have subtle but long-term influences on enhancing the comprehensibility and fluency of EFL learners’ speech. Much psycholinguistic research (e.g. Bock and Mazzella, 1983; Cutler, 1976; Terken and Nooteboom, 1987; van Donselaar and Lentz, 1994) has shown that hearers’ processing and comprehension of speech signals can be facilitated when the prosodic patterns of speech are used appropriately. Of course, a separate investigation is needed to establish whether prosodic patterns of formulaic language that deviate from those observed in this book could actually lead to a perception of low comprehensibility and poor fluency of speech. However, based on this book’s analysis of the prosody of the formulaic language in a fluently delivered lecture speech extract, a bold hypothesis can be made that there is a link between comprehensibility, the perceived fluency of speech and the prosodic delivery of formulaic language. In fact, various discussions related to this idea can be found in the literature. For instance, Wray (2002), citing Goldman-Eisler (1968), suggests that formulaic language affects pause locations, which, in turn, contributes to perceived second language speech fluency. Similarly, Wennerstrom (2006) recommends that EFL learners with halting speech problems carry out speaking drills using formulaic sequences so that they can practise making fluent stretches of speech. Both Wennerstrom’s (2006) and Wray’s (2002) claims have found some empirical support in the present research.
With reference to Wennerstrom’s (2006) suggestion regarding the use of formulaic sequences as practice resources in prosodic training, this book has already shown that in the native speaker speech data formulaic sequences were often delivered in a single intonation contour with very few internal pauses. In other words, the argument that formulaic sequences represent fluent stretches of speech appears to be valid. However, with regard to Wray’s (2002) and Goldman-Eisler’s (1968) suggestion, the real picture is slightly more complicated and the evidence of this book cannot, straightforwardly, confirm its validity. In essence, Goldman-Eisler’s (1968) idea is that there are ‘grammatical’ and ‘ungrammatical’ pausing places in speech; pausing in grammatical places helps the hearer to decode the speech, but pausing in ungrammatical places can impede decoding. To add to Goldman-

Eisler’s theory, Wray (2002) argues that these (un)grammatical pausing places are closely linked to the boundaries of formulaic language. If pauses occur outside the boundaries of formulaic language (and the boundaries of syntactic units), then they may be ungrammatical. If this argument is true, then, pedagogically, teachers could increase the comprehensibility of EFL learners’ speech simply by telling them to pause only at the boundaries of formulaic language (and the boundaries of syntactic units) when they talk. However, the reality is more complicated because, as the findings of study three indicate, it is intonation unit boundaries, rather than pauses, that are commonly found at the boundaries of formulaic sequences. Further research is needed to investigate how Wray’s (2002) theory can be developed to accommodate the findings of this book. To summarize, this section has argued that the observations from this book concerning the prosody of formulaic language can be applied in language teaching to improve the comprehensibility and perceived fluency of the speech of EFL learners. The pedagogical applications follow two major routes. The first application is to exploit the quality of formulaic sequences as fluent stretches of speech in prosodic training and conduct drills with learners using formulaic sequences as practice materials. The second application is to raise learners’ awareness of the fact that formulaic language tends to be delivered, prosodically, in a certain manner. The purpose of this activity, as mentioned earlier, is to improve the comprehensibility of learners’ speech through more effective manipulation of prosody. It is true that further research needs to be carried out to empirically investigate the link between speech comprehensibility and the prosody of formulaic language. 
However, the present discussion has shed light on how teaching EFL learners about the prosody of formulaic language and using formulaic language in prosodic training may help to improve the comprehensibility and perceived fluency of EFL learners’ speech. Given the lack of attention to English prosody (and more generally to speaking) in the EFL classroom (Wennerstrom, 2006), prosodic training is an area that requires urgent improvements and solutions. To this end, it seems that using formulaic language is likely to represent one of these solutions.

7.2.3 Applications in Natural Language Processing

The applications of n-grams (which are highly recurrent and contiguous word combinations) in the area of natural language processing (NLP) have been growing rapidly over the past decade. N-grams are now widely found in technologies
including automatic speech recognition (T’sou et al., 2000), computer-aided conversation (Todman, Rankin and File, 1999), automatic plagiarism detection (Barrón-Cedeño and Rosso, 2009; Lyon, Barrett and Malcolm, 2004; Lyon, Malcolm and Dickerson, 2001) and semantic disambiguation (Nakov and Hearst, 2005). For NLP researchers, n-grams are very useful because they can be extracted automatically and easily from large language databases. While all these applications in different areas of NLP exemplify the value of research into n-grams, one particular area of application focused on in this section is automatic speech synthesis. In the literature, there have been two notable attempts by linguists (i.e. Altenberg, 1990a, 1990b; Knowles and Lawrence, 1987) to develop automatic speech synthesis technology, in particular text-to-speech (TTS) conversion. In TTS, while converting orthographic words into phonemes is rather easy, the more challenging procedure is to assign prosodic structure to the generated speech stream, the first step of which is to tackle the segmentation of the stream into intonation units. To tackle intonation unit segmentation, Altenberg (1990a,b) adopted a top–down solution based on Crystal’s (1975) theoretical model. This model is grammar-based and operates on three levels: ‘between clauses’, ‘between clause elements’ and ‘between phrase constituents’. While this solution has produced impressive results at the ‘between clauses’ level with successful prediction of intonation units for 95 per cent of the training data, and with 93 per cent success rate, it fails to predict at the ‘between clause elements’ and ‘between phrase constituents’ levels. 
In this regard, Altenberg (1990b) comments that ‘the prosodic breaks at clause boundaries can generally be captured in grammatical terms, whereas the segmentation of phrases and clause elements is less regular and often easier to define in text linguistic or discourse-functional terms than in strictly grammatical terms’ (p. 286). Another weakness of the system noted by Altenberg (1990b) is its dependency on high-level grammatical analysis which may not be practically viable due to the strain on processing efficiency and resources. Considering the failure of intonation unit segmentation for smaller units and the problem of practical viability, Altenberg (1990b) urges the need for further investigations into alternative models that require only a shallow level of grammatical analysis and can account for the segmentation of smaller units. This is where I believe a model of intonation unit segmentation based on formulaic language could, potentially, be useful. Evidence for the place of formulaic language in intonation unit segmentation is derived from the finding, in these studies, that 40.3 per cent of formulaic

language aligned completely with intonation units in the native speaker data. Certainly, intonation unit segmentation for TTS conversion cannot rely on formulaic language alone; but the idea is that if formulaic language can be added to the algorithm of intonation unit segmentation, the success rate and effectiveness of the segmentation may be improved. This suggestion is potentially feasible because, as mentioned earlier, the automatic extraction of recurrent word combinations is straightforward from an NLP point of view. The process does not necessarily require high-level grammatical analysis, meaning that the problem Altenberg (1990b) notes with his model can be bypassed.1 Furthermore, highly recurrent word combinations tend to be two to five words in length, a size which roughly corresponds to Chafe’s (1988) observation that the mean length of an intonation unit is 5.2–5.7 words. This correspondence suggests that there is the chance that formulaic language may help to predict the segmentation of the smaller intonation units which constitute the ‘between clause elements’ and ‘between phrase constituents’ levels. This, in turn, addresses the other issue facing Altenberg’s (1990a,b) model. Formulaic language can also be incorporated into Knowles and Lawrence’s (1987) grammar-based, bottom–up solution to the intonation unit segmentation problem. Unlike Altenberg’s (1990a,b) top–down approach, this solution achieves word segmentation by devising rules to govern when to group adjacent words under one intonation unit. For instance, the grammatical words should be attached to the lexical words (e.g. the man), and adjective–noun collocations (e.g. old man) and verb–adverb collocations (e.g. walked slowly) should not be interrupted by internal intonation breaks. Although these rules are plausible, the concern is that they are not sophisticated enough. 
One reason for this is that the intonation units generated by this bottom–up system may well be only two to three words in length. To this end, formulaic language may be useful in the construction of longer intonation units because a rule can be added to restrict the insertion of internal intonation breaks within formulaic language. The empirical evidence for the feasibility of this rule again comes from the results of these studies. In addition, as my example from the BNC at the beginning of this book, ‘The thing is, organic produce isn’t cheap’, shows, grammar sometimes simply fails to explain intonation unit segmentation in spontaneous speech, and formulaic language may, in some cases, explain it more effectively. This section has discussed how formulaic language may help tackle existing problems with intonation unit segmentation in TTS conversion. Given the complexity of the issue, it is clear that neither Altenberg’s top–down approach,
Knowles and Lawrence’s bottom–up approach, nor formulaic language alone can tackle the problem sufficiently. However, if the insights from the three approaches could be combined, the complementary effects may help to increase the effectiveness of automatic intonation unit segmentation in NLP technology. Nonetheless, this idea is still at a primitive stage and further research is needed to explore how formulaic language can be incorporated into other, existing intonation unit segmentation solutions in practice. Based on the findings of this book, there is a chance that formulaic language may provide new insights into the problem of intonation unit segmentation in NLP.
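The rule suggested in this section, that intonation breaks should not be inserted inside formulaic language, can be sketched in a few lines of code. What follows is an illustration under stated assumptions rather than an implementation of Altenberg’s or Knowles and Lawrence’s systems: recurrent n-grams of two to five words stand in for formulaic sequences (a rough proxy, since clusters are not necessarily formulaic), the frequency threshold is arbitrary, the candidate break positions are invented as if they came from some grammar-based segmenter, and all function names are hypothetical.

```python
from collections import Counter


def recurrent_ngrams(sentences, min_n=2, max_n=5, min_freq=2):
    """Collect contiguous word combinations that recur at least min_freq times."""
    counts = Counter()
    for tokens in sentences:
        for n in range(min_n, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return {gram for gram, freq in counts.items() if freq >= min_freq}


def protected_spans(tokens, sequences, max_n=5):
    """Find non-overlapping (start, end) spans, end exclusive, that match a
    known sequence, scanning left to right and preferring the longest match."""
    spans = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in sequences:
                spans.append((i, i + n))
                i += n
                break
        else:
            i += 1  # no sequence starts here; move on one word
    return spans


def filter_breaks(tokens, candidate_breaks, sequences):
    """Drop candidate breaks that fall strictly inside a protected span.
    A break at index b lies between tokens[b - 1] and tokens[b]."""
    spans = protected_spans(tokens, sequences)
    return [b for b in candidate_breaks
            if not any(start < b < end for start, end in spans)]


# A toy corpus, invented for illustration, in which 'the thing is' recurs.
corpus = [
    ["the", "thing", "is", "it", "rains", "a", "lot"],
    ["the", "thing", "is", "we", "left", "early"],
]
sequences = recurrent_ngrams(corpus)

tokens = ["the", "thing", "is", "organic", "produce", "isn't", "cheap"]
# Hypothetical output of a grammar-based segmenter: one break after
# 'the thing' and one after 'the thing is'.
candidate_breaks = [2, 3]
print(filter_breaks(tokens, candidate_breaks, sequences))
```

In this sketch the break after ‘the thing’ is suppressed because it would split a recurrent sequence, while the break at the edge of the sequence is left untouched. The rule only removes breaks; it does not propose new ones, so it is meant to complement an existing segmenter rather than replace it.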

7.3 Conclusion

This book set out to determine whether formulaic language could be identified by prosodic cues. To address this question, I considered the phonological method from different perspectives and finally arrived at the conclusion that phonological information can facilitate the identification of formulaic language in spontaneous speech. There are two ways in which phonological information can contribute to this identification. First, certain phonological cues such as alignment with intonation units, distribution of pauses, articulation rate and stress placement patterns may be used as rough indicators of formulaicity. Research into the prosodic cues that may be indicators of formulaicity is still in its infancy, and further study in this area is expected in future. Secondly, it has been shown that, during the process of formulaic language identification using native speaker judgement, allowing judges to listen to the original audio recordings of the texts in which formulaic language is to be identified can significantly increase the validity of the results. As study three shows, if judges are given the opportunity to listen to the original recordings, the level of agreement between them is likely to increase. Although the starting point of the investigations presented here was to discover, specifically, how the phonological method could aid in the identification of formulaic language, the findings have proved to have important implications and applications for formulaic language research, language teaching and Natural Language Processing as well. These research implications require further study, but at least they demonstrate the great value and influence of research into the prosody of formulaic language. The next decade should see many more researchers beginning to consider the prosody of formulaic language. This is
not only because the prosody of formulaic language is a considerably underdeveloped area of research, but also because knowledge of the prosody is the most fundamental aspect in the acquisition and storage of formulaic language (see Lin, 2012, 2018). Inspired by Wray’s (2002) discussion of the notion of typographic salience and various studies on how prosodic cues aid children’s language acquisition (Fisher and Tokura, 1996; Gerken, Jusczyk and Mandel, 1994; Jusczyk et al., 1992; Kemler Nelson, Hirsh-Pasek, Jusczyk and Wright Cassidy, 1989; Morgan, 1986), my belief is that the acquisition of formulaic language happens primarily, and more naturally, through speech rather than through writing. Furthermore, I  hypothesize that it is the prosodic cues of formulaic language that help learners to notice, in the first place, that series of adjacent words form chunks, and it is according to their phonological form (as opposed to their orthographic form) that entries of formulaic language are stored in the mental lexicon and accessed when needed. If these hypotheses are valid, the study of the prosody of formulaic language could be one of the most influential areas in formulaic language research in the decades to come.

Appendix 1: Information on the Pilot Study

Session 1
Participants: CF (psychology)
Purposes:
  Test instructions version 1a (see 0) which uses Foster’s (2001) definition of formulaic language
  Determine duration of the task
  Test whether the audio playback system (iTunes) is easy enough to use for the participants
Feedback:
  It took CF 2 hours to go through 10 minutes of transcripts
  CF found the task difficult and quite confusing though she completed it
  CF found the examples much more useful than the formal definition provided in guiding her judgements
  CF was misled by the last line of the instructions and thought that she would be asked to recall the examples she was presented on the instructions page
  CF thought the iTunes system was easy to control
Follow-up actions:
  Shorten the transcripts to 7 minutes to allow participants to finish the task within an hour
  Rewritten the last line of the instructions and emphasized that the task is not a memory test so the participant does not need to burden themselves to memorize the examples

Session 2
Participants: HG (linguistics)
Purposes:
  Test instructions version 1b (see 0) which still uses Foster’s (2001) definition
  Determine the duration of the task
  Determine whether the problems CF reported were because of her lack of experience in analysing texts
Feedback:
  HG took 1.5 hours to go through 7 minutes of transcripts
  HG found that the task was harder than he first thought
  HG reported being uncertain about some of his judgements. Therefore, he had to take more time to think during the task
  HG also relied on the examples (not the definition) to make his judgements
Follow-up actions:
  Shorten the transcripts to 5 minutes to allow participants to finish the task within an hour
  Replaced Foster’s (2001) definition with a simpler definition that covers different types of formulaic language
  Reduced the number of examples to save participants’ time to read the instructions and give them greater flexibility in their judgements
  Added a formulaicity judgement scoring system to allow participants to indicate how confident they are with their judgements. There are several benefits of adding this system (see discussion in Section 4.7.2).

Session 3
Participants: LB (nursing)
Purposes:
  Test instructions (version 2a, see Appendix 2) and see whether they are clear enough
Feedback:
  LB found the instructions easy to understand
  But she expressed difficulty in grasping the meaning of ‘phrases with a cohesive meaning or function’
Follow-up actions:
  Replaced the explanation about ‘phrases with a cohesive meaning or function’

Session 4
Participants: MD (physics)
Purposes:
  Test instructions (version 2b, see Appendix 2)
Feedback:
  MD found the instructions easy to understand
  He also understood the elaborations about different types of formulaic expressions

Session 5
Participants: MF (literature), CW (literature)
Purposes:
  Test instructions (version 3, see Appendix 2)
  Test the new interactive PowerPoint interface which integrates the audio playback (using QuickTime player) and guide the participants step by step (see Appendix 2) – this is necessary because the tasks become more complicated with the addition of the before/after listening comparison for research question 3 of the book.
  Test if the duration of the task is suitable after correction
Feedback:
  MF took 45 minutes and CW 38 minutes to finish the task
  Both thought the task was not too difficult
Follow-up actions:
  The keyboard shortcut keys were removed for the sake of simplicity. Participants could use the mouse to control the audio playback anyway
  Windows Media Player replaces QuickTime as the audio player in the integrated interface because most people are familiar with the former than the latter. Windows Media Player also seems to operate better with PowerPoint

Session 6
Participants: JR (pharmacy)
Purposes:
  Test instructions (version 4, see Appendix 2) which removes the elaborations but leaves the examples. This is to compare whether the presence of the elaborations make any difference to the task (since pilot study session one participants have been saying that the examples were more useful than the elaborations in helping them make decisions)
Feedback:
  There was no indication that JR did the task differently because the elaborations were taken away
Follow-up actions:
  The setup of the final version for the main study:
  5 minutes of Text W (to allow participants to finish the task within an hour)
  Instructions version 4
  The interactive PowerPoint with integrated audio playback using Windows Media Player

Appendix 2: The Instructions Used in Various Sessions of the Pilot Study

Version 1a

Linguistic patterns in spoken English

This study focuses on the use of multi-word units in spoken English. These fixed phrases are known by many different names. You might have come across the concepts closely associated with multi-word units. These include collocation, phrasal verbs, formulaic language, sentence frames, multi-word units, chunks, clichés, phrases, fixed phrases, idiomatic expressions, sayings, proverbs and idioms.

Your task is, without consulting anyone else, any documentation or corpus, mark any language which you feel has not been constructed word by word but has been produced as a fixed chunk, or as part of a sentence stem to which some morphological adjustments or lexical additions have been required.

Here are some examples

Please familiarize yourselves with these examples so that you will feel effortless when marking these multi-word units on a new text without this instruction sheet and the assistance of the researcher. Please ask the researcher any questions you are not sure about now.


Version 1b

Linguistic patterns in spoken English

This study focuses on the use of multi-word units in spoken English. These fixed phrases are known by many different names. You might have come across the concepts closely associated with multi-word units. These include collocation, phrasal verbs, formulaic language, sentence frames, multi-word units, chunks, clichés, phrases, fixed phrases, idiomatic expressions, sayings, proverbs and idioms.

Your task is, without consulting anyone else, any documentation or corpus, mark any language which you feel has not been constructed word by word but has been produced as a fixed chunk, or as part of a sentence stem to which some morphological adjustments or lexical additions have been required.

Here are some examples

Please go through these examples carefully to give yourself a general familiarity with what multi-word units are. However, there is no need to memorize these examples as this is not a memory test. You should feel effortless when marking these multi-word units on a new text without this instruction sheet and the assistance of the researcher. Please ask the researcher any questions you are not sure about now.

Version 2a

Formulaic expressions, fixed phrases and chunks in English

It is believed that native speakers of English use a lot of expressions, fixed phrases and chunks in everyday talk. These include:

● phrases with a cohesive meaning or function, for example don’t worry about, we’ll move on to and it’s up to you
● common sayings, for example bits and pieces, that sort of thing and a piece of work
● idiomatic language, for example at the end of the day, finding my feet and off the top of my head etc.


Version 2b

Formulaic expressions, fixed phrases and chunks in English

It is believed that native speakers of English use a lot of formulaic expressions, fixed phrases and chunks in everyday talk. These include:

● phrases which are associated with a specific situation, for example don’t worry about and it’s up to you
● phrases which are routinely employed for a specific act, for example we’ll move on to, I was wondering if and it seems to me
● common sayings, for example bits and pieces, that sort of thing and a piece of work
● idiomatic language, for example at the end of the day, finding my feet and off the top of my head etc.

Version 3

Formulaic expressions, fixed phrases and chunks in English

It is believed that native speakers of English use a lot of formulaic expressions, fixed phrases, and chunks in everyday talk. These include:

● phrases which are associated with a specific situation, for example don’t worry about, it’s up to you and tell me about it
● phrases which are routinely employed for a specific act, for example we’ll move on to, I was wondering if and it seems to me
● common sayings, for example bits and pieces, that sort of thing and moment of truth
● idiomatic language, for example at the end of the day, finding my feet and off the top of my head etc.

…

Your task

You will be presented with twenty-four extracts taken from a lecture on strategic planning in business management. For each extract,

Step 1: Mark by highlighting the formulaic expressions, phrases or chunks and indicate on a scale of 1–5 how confident you are that each of these items is formulaic (five means you are most confident)
Step 2: Play the audio file and decide you would like to revise anything upon listening to the audio file and then make changes with the blue pen. Write notes to clarify if necessary.

…

Please note

● This task is about what you feel are formulaic or fixed. So there really are no “right” or “wrong” answers
● Please mark as many formulaic or fixed units as possible, even if you may not be very sure. Remember you can make use of the confidence scale to reflect your uncertainty
● This exercise is not to test your English. I am interested in gathering your opinion about formulaic expressions and the process of your judgement
● In cases when you think there are optional or changeable elements within an expression or a chunk, you might put them in brackets, for example it’s (absolutely) up to you

Tips: Using the audio player

Play/pause: Spacebar
Go to the previous or next section: UP or DOWN arrow keys
Rewind within the same section: CTRL + ALT + LEFT arrow key
Fast forward within the same section: CTRL + ALT + RIGHT arrow key

Version 4

Formulaic expressions, fixed phrases and chunks in English

It is believed that native speakers of English use a lot of formulaic expressions, fixed phrases and chunks in everyday talk.

Here are some examples to give you an idea

Appendix 3: Screenshots of the Interactive PowerPoint Interface with Integrated Audio Playback

[Screenshots not reproduced. The slide text that survives is listed here in presentation order.]

Opening slide: ‘Please start by reading page 1 carefully for instructions on the task.’ Title shown: ‘Formulaic expressions, fixed phrases and chunks in English’.

Practice session (‘A quick practice before we start’): ‘Please turn to page 2 now where you will find two examples to practice with’. For each of the two examples:

Step 1: ‘Highlight what you think is formulaic and give a score (1-5)’, e.g. it’s absolutely up to you, score 4. ‘Click Next when you finish step 1’.
Step 2: ‘Now listen to example 1 (for as many times as you like)’. ‘After listening, do you feel you would like to make changes to your previous judgement?’ If yes: ‘switch to the pencil now’; ‘you may remove / add bits to items already highlighted, change confidence scores or use the pencil this time to highlight new formulaic expressions’; ‘write notes to clarify if necessary’. Sample note: Exclude ‘it’s absolutely’; only ‘up to you’, with confidence score 5. If no: ‘Move on to example 2’ (after example 1) or ‘Move on to the next stage’ (after example 2).

End of practice: ‘Now please let Phoebe know that you’ve finished the practice session’.

Main task: ‘Now we’ll start the real task. You’re going to do exactly the same thing as you did in the practice session but on a 5-minute extract from a lecture.’ ‘Ok. Let’s start’.

Extracts 1 to 21: each extract has two slides, ‘Highlight and give a score (1-5)’ followed by ‘Listen, make changes, and write notes if necessary.’, with ‘Done’ buttons leading to the next extract.

Closing slide: ‘Now Phoebe would like to interview you just very quickly’ and ‘Finish!’

Appendix 4: The Task Booklet for the Native Speaker Judgement Process

Formulaic expressions, fixed phrases and chunks in English

It is believed that native speakers of English use a lot of formulaic expressions, fixed phrases and chunks in everyday talk. These include:

● phrases which are associated with a specific situation, for example don’t worry about, it’s up to you and tell me about it
● phrases which are routinely employed for a specific act, for example we’ll move on to, I was wondering if and it seems to me
● common sayings, for example bits and pieces, that sort of thing and moment of truth
● idiomatic language, for example at the end of the day, finding my feet and off the top of my head etc.


Your task

You will be presented with twenty-four extracts taken from a lecture on strategic planning in business management. For each extract,

Step 1: Mark by highlighting the formulaic expressions, phrases or chunks and indicate on a scale of 1–5 how confident you are that each of these items is formulaic (5 means you are most confident)

Step 2: Play the audio file and decide whether you would like to revise anything upon listening to the audio file and then make changes with the blue pen. Write notes to clarify if necessary.

Please note

● This task is about what you feel are formulaic or fixed. So there really are no ‘right’ or ‘wrong’ answers
● Please mark as many formulaic or fixed units as possible, even if you may not be very sure. Remember you can make use of the confidence scale to reflect your uncertainty
● This exercise is not to test your English. I am interested in gathering your opinion about formulaic expressions and the process of your judgement
● In cases when you think there are optional or changeable elements within an expression or a chunk, you might put them in brackets, for example it’s (absolutely) up to you

Tips: using the audio player

Play/pause: Spacebar
Go to the previous or next section: UP or DOWN arrow keys
Rewind within the same section: CTRL + ALT + LEFT arrow key
Fast forward within the same section: CTRL + ALT + RIGHT arrow key

A quick practice before we start…

Here is a text similar to what you will see later. Try and mark the formulaic expressions, fixed phrases or chunks following the steps outlined above. Ask anything you are not sure about now.

Eg1. well today’s session’s slightly unusual in that we’re giving a revision session rather than introducing any more texts and contexts

Eg2. and what I want to do in this session is to really to clarify what exactly you’re going to be facing in that take-away exam and perhaps talk to you a bit about what you might prepare over the next week er to get you ready for that exam


In case the participant has difficulty finding examples of formulaic sequences in the test round, I will suggest to them that the following sequences are what I think are formulaic. This box will not appear in the task given out to the participants.

well today’s session’s slightly unusual in that we’re giving a revision session rather than introducing any more texts and contexts

and what I want to do in this session is to really to clarify what exactly you’re going to be facing in that take-away exam and perhaps talk to you a bit about what you might prepare over the next week er to get you ready for that exam

1. come on now you’re getting carried away with yourselves down there suppress your excitement we have another instalment of the adventures of organizing and managing work

2. erm so the as you can see this this session is is being filmed so I’m going to have to try and discipline myself to stand here within range of this light so if I start wondering over there I’ll get sent back so that that should be the only difference in the lecture

3. er this is a a film being made to help in the the training of lecturers so you can expect the general quality of lecturing in the world to decline after this is er transmitted more broadly

4. however I’ll I’ll do my best to be as clear as possible erm any any challenges or questions you want to ask during the er procedures please do whether I answer it or not is another question but I shall do my best

5. now what I said to you last week just to make the connection back there was that we were working within the general framework of the program focusing on individuals

6. and I pointed out there how we’re trying to move away from this traditional or systems control thinking to a more processual way of looking at human beings and how they interact

7. in there I mentioned the word strategy to suggest to you that individuals in their lives can be uh they are strategic I said to a greater or lesser extent

8. now I didn’t go a long way in that to explain what I meant by strategy that comes now where we talk about strategy in a more conventional way being related to matters of the or = the organizational level at the corporate level if you like

9. so we’re now talking in a more conventional way about strategy but we’ve got to do the same thing as we’ve done in the other lectures


10. show how there’s been a shift in for some people anyway in systems control thinking about corporate strategy to a more processual er way

11. so let’s start in the normal way this is the convention the way you you’ll you’d normally be taught or you’ll see in the conventional text books on strategy although some of the newer books are moving in the direction that we are in this in this module

12. so what I call the systems control view and that’s a phrase by the way that that I’ve developed it fits with what a lot of other people call orthodox or or systems approaches I’ve given it a slightly different name

13. but in the strategy literature sometimes you’ll see it simply called the rational approach to strategy I think that’s a bit of a misnomer sometimes the planning mode approach or whatever

14. anyway for our purposes we’d say the conventional business studies way of talking about strategy has been as follows

15. strategy is that which is done by top managers often with the help of experts so you you are defined as a strategic manager if you’re in a senior position and it’s less the case now

16. but perhaps ten fifteen years ago many large corporations actually had strategy or strategic planning departments um and such people to help them sounds sensible like most of this does at first sight

17. and indeed we’re not going to question the fact that top managers need to be strategic but the tendency was to leave strategy largely to them in the accounts given

18. the other assumption was and this is a bit like the decision making stuff we looked at in the earlier lecture the assumption was you would gather all the information before you made your strategies

19. and then you’d follow through a clear sequence of steps that you decide where is the business going or where do we want to be in 5 years

20. you gather the information you do a swat analysis or some other kind of porter type analysis all that’ll mean more to you when you do the subsequent module on strategy this year

21. but there’s usually in all those text books the steps that you are said to go through all the time staying neutral being unbiased unprejudiced all the time being logical this is advocated in all those things

22. and again remember I said the M B As that developed across the western world were trying to make managers more rational use more information be more neutral in their thinking that is non-political that blemish applied to this strategy area as well

23. now this is where you can per= perhaps begin because all that sounds sensible this is perhaps where you can begin to see something of a problem in the assumptions behind this

24. the assumption is with the orthodoxy or was if you think it’s declining that once you had made a sensible strategy using all the information that everybody involved would see the logic of it accept the logic of it and implementation could follow

Appendix 5: Instructions for Prosodic Transcribers

The foci of the investigation are on the placement of intonation unit boundaries, stress assignment and rhythmic changes. The transcription conventions on the next page have been specially designed to reflect these three foci. There are a number of principles to guide the transcription. Please read them carefully because it is important that transcribers work under the same rubric to ensure consistency of the transcriptions.

Principles of transcription of intonation units

● An intonation unit is defined as a sequence of words combined under a single, coherent intonation contour. It is determined purely on the basis of the natural flow of the intonation contour.
● An intonation unit may (but not always) be preceded by a pause. But pauses can also occur within intonation units (Chafe, 1987, 1988). For the sake of simplicity, please distinguish between intonation units and pause-defined units.
● The decision on the placement of intonation unit boundaries and the nuclei is purely based on prosodic evidence and should be kept independent of considerations of the syntax and semantics.

Principles of transcription of stress and accent

● Please identify the syllables that carry primary accent (i.e. one which has pitch prominence and involves a change in tone direction) and secondary accent (i.e. one which receives stress/pitch prominence but does not involve a change in tone direction)
● However, there is no need to transcribe tone direction

Principles of transcription of tempo and rhythm

● The study is interested in breaks or changes in rhythm to the phrase level. So please make as fine a record of these breaks and changes as possible
● These breaks or changes in rhythm can be due to perceived pauses, perceived slowing down or speeding up of rhythm, the speaker pronouncing each word as distinct and emphasized or perceived halting speech (see p. 3)
● Please only begin to transcribe the rhythm when you have finished transcribing intonation units and stress/accent. You may want to start on a new sheet as the transcript should have been heavily marked up by then
● The transcription may look like this (you may find this sound file at http://www.nottingham.ac.uk/~aexmsl/textM_1.mp3)

well today’s session’s slightly unusual in that (.) we’re giving a revision session rather than introducing any more texts and contexts

len rall all all

Please note

● Please transcribe the text based on your perception and refrain from using any sound analysis program to assist your transcription
● Please use the margins to write notes to clarify decisions when necessary
● Please feel free to raise any concerns you have with Phoebe (aexmsL@nottingham.ac.uk) about the transcription

TRANSCRIBING INTONATION AND PITCH

INTONATION UNIT BOUNDARIES
major boundary                  ||
minor boundary                  |

PITCH AT THE END OF INTONATION UNITS
rising                          rise
level                           level
falling                         fall

PITCH AT THE BEGINNING OF INTONATION UNITS
upstep                          up
downstep                        down
continuing                      cont

CONSPICUOUS PITCH JUMPS
to higher pitch level           ↑
to lower pitch level            ↓

TRANSCRIBING STRESS AND ACCENT
primary accent (also nucleus)   ACcent
secondary accent                Accent

TRANSCRIBING TEMPO AND RHYTHM
allegro: fast                                   all
lento: slow                                     len
accelerando: continuously faster                acc
rallentando: continuously slower                rall
marcato: each word distinct and emphasized      mar
arrhythmic: halting speech                      arr

Appendix 6: Prosodic Transcription of Text W

Intonation units and stress placement

come ON nOw || you re getting cArried awAY with yoursElves dOwn thEre || supPRESS your exCITEment || we have anOTHer || inSTALment || of the adVENtures || of ORganizing and mAnaging WORK || erm sO the || as you can SEE || this thIs sEssion is is bEing FILMED || SO || I m going to have to try and DIScipline mysElf || to STAND HERE || within rAnge of this LIGHT || so if I start wandering over there I ll get || SENT BACK || so that thAt should be the ONly difference in the lEcture || er thIs is a a fIlm being made to hElp in the the trAIning of LECturers || so you can expEct the gEneral quAlity of lEcturing in the wOrld to deCLINE after this is er || transmItted more BROADly || howEver || I ll I ll do my BEST || to be as clear as POSSible || erm an- any any CHALLenges or quEstions you want to ASK || during the er proCEdures || PLEASE DO || whether I ANswer it or not is another quEstion || but I shall dO my BEST || now what I said to you lAst WEEK || just to make the connection BACK there || WAS || that we were WORKing within the GENeral FRAMEwork of the prOgramme || fOcusing on indiVIDuals || and I pOInted out THERE || hOw we re trYing to mOve aWAY || from this traDITional || or systems conTROL thinking || to a more procEssual way of LOOKing || AT human bEings and || hOw they interACT || IN thEre || I mEntioned the word STRATegy || to sugGEST to you || that indiVIDuals || in their LIVES || cAn be are they ARE straTEgic || I SAID || to a grEAter or lEsser exTENT || now I DIDn t go a lOng way || IN thAt || to explAIn what I mEAnt by STRATegy || that comes NOW || where we TALK about strategy || in a more conVENTional way || being


relAted to mAtters of the or= at the organiSATional level || at the CORporate level || IF you like || so we re now TALKing in a more || conVENtional way about strAtegy || but we ve got to do the SAME thing as we ve done in the other LECTures || SHOW || how there s been a SHIFT || in for SOME people anyway || from sYstems conTROL thinking || about CORporate strategy || to a more proCESSual || er WAY || so LETS || stArt in the NORmal way || this is the conVENtion || the way you you ll you d normally be TAUGHT || or you ll see in the conventional TEXT books on strAtegy || although sOme of the NEWer bOOks || are MOVing || in the diRECtion || that we Are in this in this MODule || so what I CALL || the sYstems conTROL view || and thAt s a phrase bY the way that that I ve deVELoped || it fIts with what a lOt of other people call ORTHodox || or or SYStems apPROACHes || I ve given it a SLIGHTly different nAme || but In the STRATegy literature sometimes you ll see it sImply cAlled || the RATional apprOAch to strAtegy || I think thAt s a bit of a misNOMer || SOMEtimes || the PLANning MODE apPROACH || or whatEVer || ANyway || for OUR purposes we d sAY || the conVENtional || BUSiness studies way of talking about strAtegy || has BEEN || as FOLLows || STRATegy || IS that || WHICH is dOne || by TOP mAnagers || OFten || with the hElp of EXperts || so you yOU are deFINED || as a straTEGic mAnager || IF you re in a senior posItion || AND || it s LESS the cAse NOW || but perHAPS || TEN fifteen yEArs ago || MAny large corporations Actually hAd strAtegy or stratEgic plAnning dePARTments || um and sUch pEOple to HELP them || sOUnds SENsible || like MOST of this does || At FIRST SIGHT || and indEEd we re not going to QUESTion the fAct || that tOp mAnagers nEEd to be straTEGic || but the TENdency wAs || to LEAVE || STRATegy || lArgely to thEm in the accOUnts GIVen || the OTHer assUmption wAs || and THIS is a bit lIke || the deCISion making stuff || we lOOked at in the earlier LECTure || 
the assUmption was you would GATHer || ALL the informAtion || befOre you MADE || yOUr STRATegies || and THEN || you d fOllow through a clEAr sEquence of STEPS || that you deCIDE || WHERE is the business gOing || or whEre do we want to be in fIve yEArs || you gAther the informAtion you do || a SWAT analysis || or some OTHer || KIND of || PORTer


type analysis || All that ll mean MORE to you || when you do the sUbsequent MODule || on STRATegy || this YEAR || but there s USually || in ALL those text books || the STEPS || that you are sAId to gO THROUGH || ALL the tIme || stAYing NEUtral || being UNbIased || UnPREjudiced || ALL the time being lOgical || this is Advocated in ALL thOse thIngs || and again reMEMber I said || the M b As that deVELoped || aCROSS the wEstern wOrld || were trYing to mAke managers mOre RATional || use more inforMATion || bE mOre NEUtral in their thInking || that is NON-polItical || er thAt blemish applIEd to the strategy area as WELL || now thIs is where you can per= perhAps begIn || because ALL that sounds SENsible || thIs is perhaps where you can begIn to SEE || SOMEthing || of a PROBlem || in the asSUMPtions behind this || the asSUMPtion IS with the Orthodoxy || and WAS || if you think it s deCLINing || that ONCE you had MADE a sEnsible STRATegy || USing all the informAtion || then EVerybody invOlved || would SEE the LOGic of it || acCEPT the logic of it || AND || ImplemenTATion || COULD || FOLLow
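For readers who wish to process a transcription of this kind computationally, the following is a minimal sketch in Python. It is an illustration only and was not part of the study. It assumes, following the conventions in Appendix 5, that || closes an intonation unit, that a capitalized stretch containing a consonant (as in supPRESS) marks a primary accent, and that a capitalized stretch made up of vowel letters alone (as in cArried) marks a secondary accent. The function names are hypothetical, and the pronoun I is treated as unaccented because its capital letter is ambiguous in this notation.

```python
import re

# Assumption: vowel letters (including Y) capitalized on their own mark a
# secondary accent; any capitalized consonant signals a primary accent.
VOWELS = set("AEIOUY")


def split_units(transcript):
    """Split a transcription into intonation units on the boundary mark '||'."""
    return [unit.strip() for unit in transcript.split("||") if unit.strip()]


def classify_word(word):
    """Return 'primary', 'secondary' or 'none' for one transcribed word."""
    if word == "I":
        # The pronoun is always capitalized, so it cannot be classified.
        return "none"
    runs = re.findall(r"[A-Z]+", word)
    if any(set(run) - VOWELS for run in runs):
        return "primary"
    if runs:
        return "secondary"
    return "none"


def accents(unit):
    """Pair every word in an intonation unit with its accent label."""
    return [(word, classify_word(word)) for word in unit.split()]


# The opening of Text W as transcribed in this appendix.
sample = ("come ON nOw || you re getting cArried awAY with yoursElves "
          "dOwn thEre || supPRESS your exCITEment ||")

units = split_units(sample)
for unit in units:
    print(accents(unit))
```

A heuristic of this kind cannot recover a primary accent that falls on a syllable written with vowel letters only, so its output would need checking against the original hand-marked transcript.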

Tempo and Rhythm

come on now selves down there (.) suppress your excitement (.) we have another (.) instalment of the adventures (.) of organizing and managing work (.) erm (.) so the (.) see this this session is is being filmed (.) so (.) I m going to have to try and discipline myself to stand here within range of this light (.) (.) sent back (.) so that in the lecture (.) er this is a (.) a film (.) being made to help in the (.) the training of lecturers (.) (.) transmitted more broadly (.) (.) I ll I ll do my best to be as clear as possible (.) erm (.) an- any (.) any (.) challenges or questions you want to ask during the (.) er (.) (.) please do (.) (.) but I shall do my best (.)


now (.) what I said to you last week (.) back there was that we were (.) working framework of the programme (.) focussing on individuals (.) and I pointed out there (.) how (.) we re trying to (.) (.) from this traditional (.) or systems control thinking (.) to a more looking (.) at human beings and how they interact (.) in there (.) I (.) mentioned the word (.) strategy (.) to suggest to you (.) that individuals (.) in their lives (.) can be (.) uh they are strategic (.) I said (.) to a greater or lesser extent go a long way (.) in that (.) to explain what I meant by strategy that comes now where we (.) being related (.) to matters (.) of (.) the or= (.) if you like (.) (.) (.) (.) show how there s been a shift (.) in for about (.) corporate strategy (.) to a more processual (.) er way (.) so let s (.) start in the normal way (.) (.) you d normally be taught or you ll see in the conventional text books on strategy (.) moving in the direction that we are in this (.) in this module (.) so what I call (.) the systems control view and that s a phrase by the way that that I ve developed it fits call or or I ve (.) (.) sometimes you ll see it simply called (.) the to strategy (.) (.) sometimes the planning mode approach> (.) or whatever (.) (.) the conventional (.) business studies way of talking about strategy (.) has been (.) (.) strategy (.) is that (.) which is done> (.) by top managers (.) often (.) (.) so you (.) you are defined (.) as a strategic manager (.) if you re in a senior position (.) and (.) (.) but (.) perhaps (.) ten fifteen years actually had (.) strategy or strategic planning departments (.) um and such people to help them (.) (.) does (.) (.) and indeed we re not going to question the fact that top managers need to be strategic (.) but the (.) largely to them in the accounts given (.) the other assumption was (.) and this is a bit like (.) the decision making stuff we looked at in the earlier lecture (.) the assumption was> you would gather (.) 
all the information (.) before you made (.) your strategies (.) and then (.) you d follow through a clear sequence of steps (.) that (.) you decide (.) where is the business going or where want to be in five years you gather the information you do (.) a analysis or some other (.) porter type analysis all more to you when you do the subsequent module (.) on strategy this year (.) but (.) there s usua all those text books (.) (.) (.) neutral (.) being (.) unprejudiced (.) all the time being all those things (.)

(.) were trying to make managers more rational (.) use more information (.) be more neutral in their thinking (.) that is (.) non-political (.) er that blemish applied to this strategy area as well (.) now this is where you can per= perhaps begin begin to see (.) in the assumptions behind this (.) with the orthodoxy or (.) was if you (.) think it s declining (.) that (.) a sensible strategy using all the information (.) that everybody involved would see the logic of it (.) accept the logic of it (.)

Appendix 7: Formulaic Sequences Assigned the Maximum Confidence Score in Text W

Below are images generated using the qualitative data analysis tool NVivo 8. These images show how Text W was coded by the thirty native speaker judges in study one part B. On the left-hand side of the images, the formulaic sequences which were assigned the maximum confidence score by the judges are highlighted in light brown. Each line contains only one formulaic sequence with the maximum confidence score. Therefore, come on now and you’re getting carried away with yourselves were identified as two separate formulaic sequences. On the right-hand side of the images are the equivalent coding stripes generated automatically by the tool. Each coding stripe represents one judge, and the name of the judge is printed next to each stripe for easy reading. The coding stripes show which judge(s) was/were responsible for assigning the maximum score to the formulaic sequence on the left-hand side. For example, the coding stripes indicate clearly that come on now was assigned the maximum score by fourteen judges, namely BPA, CDD, CME, JPY, LCH, MCO, MFO, MMA, NSA, RDO, RMA, SLY, VWA and VWI. Another word sequence, down there, however, was assigned the maximum score by only one judge (i.e. MFO). According to the data analysis criteria outlined in Section 4.8, a word sequence must have been assigned the maximum score by at least two judges if it was to be used in the prosodic analysis in studies one and two. Therefore, the images below have been a very effective way of indicating which word sequences were popular choices among the judges and which word sequences were idiosyncratic choices of individual judges.
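The two-judge threshold described above can be sketched as a simple filter. The Python below is illustrative only: the study itself relied on NVivo 8 coding stripes, the data structure and function name are hypothetical, and only the two sequences whose judges are named in the text are included.

```python
# Threshold from the data analysis criteria (Section 4.8): at least two judges
# must have assigned the maximum confidence score.
MIN_JUDGES = 2

# Judges who assigned the maximum score, as listed in the paragraph above.
max_score_judges = {
    "come on now": {"BPA", "CDD", "CME", "JPY", "LCH", "MCO", "MFO",
                    "MMA", "NSA", "RDO", "RMA", "SLY", "VWA", "VWI"},
    "down there": {"MFO"},
}


def select_sequences(coding, min_judges=MIN_JUDGES):
    """Keep the sequences given the maximum score by at least min_judges judges."""
    return sorted(seq for seq, judges in coding.items()
                  if len(judges) >= min_judges)


selected = select_sequences(max_score_judges)
print(selected)  # ['come on now']
```

Under this criterion come on now is retained for the prosodic analysis, while down there, chosen by a single judge, is set aside as an idiosyncratic choice.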

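[Images not reproduced: NVivo 8 coding of Text W, with the formulaic sequences assigned the maximum confidence score on the left-hand side and one coding stripe per judge on the right-hand side.]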

Appendix 8: The Duration of the Task for each Native Speaker Judge

Time spent on

Judge   Training   Identification
MMA     11”34      22”01
NSE     08”04”     57”30”
STO     03”04”     19”02”
MCO     11”34”     23”00”
NBO     03”33”     23”32”
JPR     05”53”     30”45”
KHU     05”54”     51”16”
SAT     04”17”     27”25”
DHU     05”35”     32”07”
KHA     05”54”     n/a*
NSA     04”17”     25”25”
ECO     05”47”     16”06”
EAD     03”42”     18”22”
RDO     n/a*       n/a*
MFO     1”27”      10”19”
ADO     06”56”     23”10”
JKI     02”30”     19”02”
RMA     n/a*       n/a*
BGR     06”34”     34”56”
RNE     6”33”      26”03”
WVI     4”12”      22”16”
JPI     4”19”      34”24”
LCH     4”11”      27”22”
BPA     5”08”      24”52”
SLY     5”41”      24”12”
CBR     4”19”      39”49”
VWA     4”46”      35”51”
CME     06”12”     31”38”
CDD     4”45”      25”29”
JCH     6”57”      29”25”

*Due to technical problems with the computers on the day of data collection, these figures cannot be included in the analysis.

Notes

Chapter 2

1 While the identification criteria provided by child language researchers are characterized by their comprehensiveness, those highlighted by adult language researchers are marked by the depth of their exploration, particularly of the formal and semantic aspects of formulaic language. Recent examples include Svensson’s (2008) investigation of semantic non-compositionality, Wulff’s (2008) exploration of semantic non-compositionality and formal inflexibility and Hudson’s (1998) discussion of formal fixedness.

2 In this book, a formulaic sequence is an instantiation of formulaic language.

3 However, it should be pointed out that Sinclair (1991) was not the first to publish corpus-based empirical studies on formulaic language. Before him, Norrick (1985) investigated the use of proverbs and Strässler (1982) investigated the use of idioms (see Moon, 1998, for a review of early corpus-based studies of idioms and idiomatic language).

4 Clusters is the term given to highly recurrent, contiguous and fixed sequences of unlemmatized word forms by Scott (2012) in his automatic tool WordSmith. The definition of clusters is exactly the same as that of Biber, Johansson, Leech, Conrad and Finegan’s (1999) lexical bundles.

5 For further information about the BASE corpus and MICASE, see http://www2.warwick.ac.uk/fac/soc/al/research/collect/base/ and http://micase.elicorpora.info/ respectively.

6 Lin and Adolphs’ (2009) paper is part of study one of this book published as a book chapter.

7 For further discussions on the notion of ‘native speakers’, see Coppieters (1987), Davies (1991, 2003, 2005) and Phillipson (1992).

8 However, from Bahns et al. (1986) to Erman and Warren (2000), Foster (2001), Wood (2006) and Wulff (2008), there has been a tendency, over time, to prefer the use of external native speaker judges to internal judges. The problem with the use of internal judges is that there can potentially be a conflict of interests which could weaken the credibility of the findings of the empirical study: these internal judges’ identification of formulaic sequences can be unconsciously affected by their knowing the purposes and the predicted outcomes of their own studies.

9 As researchers have not provided this piece of information, the durations here are only estimates based on the experience of study two of this book.

Chapter 3

1 Interestingly, this concept that the phonological criterion is more reliable than other criteria for formulaic language is also found in Keller (1981, cited in Aijmer, 1996, p. 15).

2 Here is Peters’ (1983, p. 10) original elaboration of the phonological criterion:

4. Does the utterance cohere phonologically? That is, is it always produced fluently as a unit with an unbroken intonation contour and no hesitations for encoding? Note that whereas for adults hesitation pauses are not reliable indicators of the size and nature of encoding units (Rosenberg, 1977), the dual criteria of presence versus absence of hesitation and double versus single intonation contour have been fairly widely used in child language research to distinguish a succession of two one-word utterances from a single two-word construction. Thus Scollon uses these criteria to separate ‘vertical’ from ‘horizontal’ constructions (1976, pp. 152–3). Branigan and Stokes, too, on the basis of their phonetic analysis of the prosodic integration of early utterances, propose two classes of utterances: those that are temporally fragmented (with pauses) and those that are temporally unified (1982, p. 8). It would seem, therefore, that at least for very young children the absence of pauses together with the presence of smooth intonation contours is a good clue to some kind of preplanned psycholinguistic unit. And a particular sequence of adult morphemes that is always marked by such phonological coherence (i.e. one in which there are never any hesitations in the middle) is a good candidate for a unit in the child’s system. Certain early utterances may even have special intonation contours associated with them. Minh as early as 1;2 had easily recognizable ‘tunes’ for the expressions what’s that? look at that! and uh-oh! (Peters, 1977). These utterances were also marked by phonological integration: They were articulated as units, the most common forms being [‘ʌsæ:], [dukədæ:], and [‘ɔ‘ɔ:].

3 The following is the list of prosodic differences between the idiomatic and literal readings of idioms provided by Van Lancker, Canter and Terbeek (1981):

1. literal utterances have a longer duration than idiomatic utterances (i.e. idiomatic utterances are spoken faster);

2. literal utterances have five times as many pauses as idiomatic utterances;

3. juncture (Lehiste, 1960) occurred almost three times more in literal rather than in idiomatic utterances;

Note: The term juncture is a classical term in phonology and was discussed in Hockett (1958), Lehiste (1960) and Trager and Bloch (1941). Van Lancker, Canter and Terbeek’s (1981) explanation of juncture is that it is a phonetic phenomenon which signals linguistic (lexical or syntactic) boundaries. Open juncture is a perceptual category, indicated by a complex combination of acoustic and linguistic attributes. This definition may be slightly different from Lehiste (1960) and Trager and Bloch’s (1941) definition, which states that ‘the transition from the pause preceding an isolated utterance to the first segmental phoneme, and from the last segmental phoneme to the following pause, we call open juncture. By contrast, the transition from one segmental phoneme to the next within the utterance … we call close juncture.’

4. there were more pitch contours in literal than in idiomatic utterances (0.33–0.5 times); and

5. literal utterances are systematically marked by what Bolinger (1965) defines as Accent A.

Note: According to Bolinger (1965, p. 57), Accent A is ‘a “stress” marked by a sharp drop in pitch after the accented syllable as against gradual movement or level pitch elsewhere’. Accent B, on the other hand, is ‘a “stress” marked by a jump to a higher pitch on the accented syllable and with no sharp drop immediately afterward’.

4 This is the direct quote from Wray (2002, p. 35) concerning the Pawley and Syder (2000) study: ‘Features such as overall fluency, intonation pattern and changes in speed of articulation are all potential pointers to a stretch of prefabricated material’ (Pawley and Syder, 2000, p. 173).

5 That said, it is not impossible to establish the formulaicity of wordstrings on the sole basis of the prosodic qualities of the wordstring in child language, as the Plunkett (1990) study has shown successfully (see also Section 3.3.1). This of course is due to the unique circumstance of child language acquisition/production.

6 What Knowles (1991) refers to as a temporal break is simply a filled or unfilled pause with a measurable duration of greater than 0.25 seconds, but obviously we should also add syllable-lengthening and anacrusis to this category. In Knowles’s (1991) classification, pitch discontinuity means a break in the line of the preceding pitch contour and the commonest example is a rise in pitch following a fall. Segmental discontinuities can be created by either taking away features characteristic of connected speech (e.g. assimilation, elision, r-linking, (j, w) glides after close vowels, gemination of stop/plosive phases and contractions as in John’s here), or by using a pattern which points positively to the existence of a boundary (e.g. releasing a final stop/plosive with a prominent burst).


7 This extrapolation is also supported by Dechert’s (1984) observation that some pause-defined units can consist of smaller formulaic units that are signalled by intonation contours; see Section 3.3.2.

8 Tonality concerns the division of speech into intonation units and the location of intonation unit boundaries, which are what Section 3.5.1 deals with under the heading ‘intonation’. Tonicity is related to which syllable(s) should be given pitch prominence and stress within an intonation unit, which is what Section 3.5.3 deals with under the heading ‘stress placement’. Finally, tones refers to which, among the five tones (i.e. rise, fall, rise–fall, fall–rise and level), is assigned to the syllable(s) and is given pitch prominence within an intonation unit. These terms are not covered within this book (however, see Aijmer, 1996, for a discussion on the types of tone associated with thanking and apologizing formulas; see also Crystal, 2003, for a discussion of the fine distinction of tones and their associated meanings; see Ashby (2006) and Wells (2006) for a discussion on the fixed tones on idiomatic language). In the sense of ‘intonation’ broadly defined, pauses and temporal features represent aspects of the prosodic cues signalling intonation unit boundaries (see Section 4.3.2 for further discussion).

9 It is true that the layperson native speaker judges in the Wulff (2008) study successfully ranked thirty-nine idiomatic constructions according to their perceived degree of idiomaticity, but all these constructions fell under the same grammatical class and had comparable frequency of occurrence in corpus. That is to say, the comparison in the Wulff study is possible probably only because the nature of the comparison is simplified.

10 There is much to be said about the decision of whether to break up the texts according to pauses or intonation breaks because this decision reflects the researchers’ belief about the inter-relationship between pauses and intonation and can make a difference to the way the texts are broken down. Chafe (1988a) believes that having pauses within an intonation unit is possible probably because he thinks a consideration of intonation should be kept independent of pauses. However, the alternative view that Dechert (1983) and Raupach (1984) take is that it is possible for a pause-defined unit to contain more than one intonation unit, probably because pauses are higher in the prosodic discontinuity hierarchy (see Knowles, 1991, for details).

11 In the sentence, it’s raining cats and dogs, the non-compositional part of the idiom, according to Ashby (2006) is ‘cats and dogs’. When broad focus is introduced, dogs will be accented; when narrow focus is introduced, either cats or and is accented. However, Ashby notes that this rule requires an intelligent interpretation: the rule only predicts the location(s) to put pitch prominence if it falls within the idiom. It does not postulate that pitch prominence on these location(s) is obligatory.


12 The comparison here between formulaic sequences and ‘general spoken English’ is considerably simplified. As Section 1.2 pointed out, the reality is that any chunk of language can, potentially, be formulaic. Therefore, the notion of general spoken English here requires an intelligent interpretation.

Chapter 4

1 In the case of very young children, formulaic speech can be easily recognized by its prosodic features because it is markedly more fluent and structurally more complex than the rest of the children’s oral production (Hickey, 1993; Wray and Namba, 2003). Likewise, formulaic speech is markedly more fluent and native-like than the rest of the oral production of dysfluent foreign language learners (Dechert, 1983; Raupach, 1984; Weinert, 1995; Wray, 2004; see also Section 3.5.2). After using a formulaic sequence to produce a fluent stretch of speech, the speaker will return to speaking with a lot of pauses and hesitation phenomena. That is why the switching points between formulaic and non-formulaic speech are expected to be marked by pauses and hesitation phenomena.
2 Study One, Part A has been published in Lin and Adolphs (2009). It is reproduced with the permission of Palgrave Macmillan.
3 NICLEs-CHN is the short name for the Nottingham International Corpus of Learner English (spoken)-Chinese learner sub-corpus (see Dahlmann, 2009, for further information about the corpus).
4 In other words, make a difference and makes a difference are counted as two separate items.
5 This figure is calculated by dividing Case 1 complete clauses (i.e. 3 + 23 = 26) by the sum of Cases 1–4 complete clauses (i.e. 3 + 3 + 23 + 2 + 3 + 1 = 35) under the ‘comment clause’ and ‘disclaimer’ columns in Table 4.4.
6 It should be noted, however, that the formulaic language automatic extraction tool named Wmatrix (Rayson, 2003) has incorporated automatic semantic tagging and automatic part-of-speech tagging facilities which could, potentially, enhance the ability of the extraction tool to discern the function and meaning of word sequences in the extraction process.
7 From my personal experience, it is natural to prioritize meaning over form in the mental processing of language. While the message of utterances made a second ago can often be recalled, it is almost always a struggle to recall the exact words used in those utterances.
8 Therefore, in a sense, there was a trade-off between the difficulty level of the comparison and the conditions of the comparison: if the comparison is more intellectually challenging (e.g. the abstractness of the subject of comparison),


the material requiring comparison needs to be made easier (e.g. making it more focused).
9 It is true that Wulff’s study shows that even laypeople with no specialist linguistic training can readily judge between different levels of idiomaticity. However, as pointed out earlier in this section, Wulff’s judges were judging the idiomaticity of items that come from the same class (i.e. V-NP constructions). Therefore, the scenario for this study is different.
10 NVivo 8 is designed to support qualitative research in social sciences. In the software, ‘coder’ refers to the person who codes, for example, an interview transcript or a journal entry kept by the research participants. To increase the reliability of the research, two or more coders may be asked to code the same data so that in the end the conclusions are not based on the subjective judgement of one person. NVivo provides the inter-rater reliability function so that researchers can report the extent to which the coders agree on their judgements.
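To make concrete what an inter-rater agreement figure of this kind summarizes, the following minimal sketch computes Cohen’s kappa, one widely used chance-corrected agreement statistic, for two coders. It is an illustration only and not a description of NVivo’s own computation; the function name `cohen_kappa`, the labels ‘F’ (formulaic) and ‘N’ (non-formulaic) and the two lists of judgements are all invented for the example.

```python
from collections import Counter


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same items.

    Hypothetical helper for illustration; not NVivo's implementation.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same, non-empty set of items.")
    n = len(coder_a)
    # Observed agreement: proportion of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: sum over labels of the product of each coder's label proportions.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical label throughout; kappa is undefined,
        # so perfect agreement is reported by convention in this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented judgements: 'F' = formulaic, 'N' = non-formulaic.
coder_a = ["F", "F", "N", "N", "F", "N", "F", "N", "N", "F"]
coder_b = ["F", "N", "N", "N", "F", "N", "F", "F", "N", "F"]
kappa = cohen_kappa(coder_a, coder_b)
print(round(kappa, 2))
```

In this invented example the coders agree on 8 of 10 items, and each uses both labels equally often, so chance agreement is 0.5 and kappa is (0.8 − 0.5) / (1 − 0.5) = 0.6. Raw percentage agreement alone would overstate reliability because it ignores agreement expected by chance.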

Chapter 5

1 It should be noted that lengthening the duration of syllables is only one of the ways in which emphasis can be achieved by manipulating prosody. In the prosodic research literature, Cutler and Isard (1980) and Warren (1999) have discussed how speakers may use different phonetic parameters to make particular distinctions. Evidence for this phenomenon comes from an empirical analysis of the speech style of two speakers conducted by Cutler and Isard (1980). What they found is that, when speaking the same sentences, one speaker relied heavily on syllable-lengthening to show emphasis while the other tended to manipulate pitch changes to achieve the same purpose.
2 See Chafe, 2006, for a discussion on the prosodic differences between read-aloud academic speech and spontaneous academic speech.
3 There is a need to extend the applicability of the suggestions of Ashby (2006) and Wells (2006), for instance, because the subject of both studies is semantically non-compositional idioms (i.e. idioms narrowly defined), but the target of this book is semantically transparent formulaic language broadly defined.
4 This quotation has already been cited in Sections 3.2 and 3.5.1, but there is another dimension to this quote which is worth noting in relation to the discussion in this section.


Chapter 6

1 This is Wood’s (2006, p. 21) explanation of his phonological criterion:

Phonological coherence and reduction. In speech production, formulaic sequences may be uttered with phonological coherence (Coulmas, 1979; Wray, 2002), with no internal pausing and a continuous intonation contour. Phonological reduction such as phonological fusion, reduction of syllables or deletion of schwa may be present as well. All are common features of the utterance of the most high-frequency phrases in English, but are much less so in low-frequency or more imaginatively constructed utterances, according to Bybee (2002). Phonological reduction can be taken as evidence that ‘much of the production of fluent speech proceeds by selecting prefabricated sequences of words’ (Bybee, 2002, p. 217). This criterion was important in this research and gave the expert judges a readily identifiable characteristic of the speech samples to attend to.

2 Unfortunately, due to technical problems with the computer on the day of data collection, the time RMA spent on the task could not be recorded accurately by the interactive PowerPoint interface. However, according to my notes about the performance of RMA on that day, he took more than an hour to complete the training and the identification task.

Chapter 7

1 This suggestion, of course, requires an intelligent interpretation. Although the automatic extraction of recurrent word combinations is fairly straightforward, the automatic extraction of formulaic language which has psycholinguistic validity (like the ones identified by native speaker judgement in this book) is not. Therefore, if formulaic language is to be incorporated into intonation unit segmentation algorithms, further research needs to be carried out to enable the effective extraction of formulaic language with psycholinguistic validity.
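The ‘fairly straightforward’ half of this contrast can be illustrated with a minimal sketch of frequency-based extraction of recurrent word combinations (n-grams). It is not the procedure used in this book: the function name `recurrent_ngrams`, the whitespace tokenization, the frequency threshold and the sample sentence are all assumptions made for the example.

```python
from collections import Counter


def recurrent_ngrams(tokens, n=3, min_freq=2):
    """Return every n-word sequence that occurs at least min_freq times.

    Hypothetical helper for illustration only.
    """
    # Slide a window of n tokens across the text and count each sequence.
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    # Keep only sequences that recur often enough.
    return {" ".join(gram): freq for gram, freq in counts.items() if freq >= min_freq}


# Invented sample; a real study would use a tokenized corpus.
sample = "you know what i mean and you know what i said"
result = recurrent_ngrams(sample.split(), n=3, min_freq=2)
print(result)
```

On the invented sample this returns ‘you know what’ and ‘know what i’, each occurring twice. The second item shows the limitation the note describes: a recurrence count picks up fragments that cut across likely formulaic boundaries, so frequency alone does not establish psycholinguistic validity.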

Bibliography

Adolphs, S. (2006). Introducing Electronic Text Analysis. Abingdon and New York, NY: Routledge.
Adolphs, S. (2008). Corpus and Context: Investigating Pragmatic Functions in Spoken Discourse. Amsterdam: John Benjamins.
Aijmer, K. (1996). Conversational Routines in English. London and New York, NY: Longman.
Altenberg, B. (1987). Prosodic Patterns in Spoken English: Studies in the Correlation Between Prosody and Grammar for Text-to-Speech Conversion. Lund: Lund University Press.
Altenberg, B. (1990a). Automatic text segmentation into tone units. In J. Svartvik (Ed.), The London-Lund Corpus of Spoken English: Description and Research (pp. 287–324). Lund: Lund University Press.
Altenberg, B. (1990b). Predicting text segmentation into tone units. In J. Svartvik (Ed.), The London-Lund Corpus of Spoken English: Description and Research (pp. 275–86). Lund: Lund University Press.
Altenberg, B. (1998). On the phraseology of spoken English: The evidence of recurrent word-combinations. In A. P. Cowie (Ed.), Phraseology: Theory, Analysis and Applications (pp. 101–22). Oxford, England: Clarendon Press.
Altenberg, B. and Eeg-Olofsson, M. (1990). Phraseology in spoken English: Presentation of a project. In J. Aarts and W. Meijs (Eds), Theory and Practice in Corpus Linguistics (pp. 1–26). Amsterdam: Rodopi.
Ashby, M. (2006). Prosody and idioms in English. Journal of Pragmatics, 38(10), 1580–97.
Bahns, J., Burmeister, H. and Vogel, T. (1986). The pragmatics of formulas in L2 learner speech: Use and development. Journal of Pragmatics, 10, 693–723.
Baker, M. and McCarthy, M. (1988). Multiword units and things like that. In Mimeograph. Birmingham: University of Birmingham.
Barr, P. (1990). The role of discourse intonation in lecture comprehension. In M. Hewings (Ed.), Papers in Discourse Intonation. Birmingham: English Language Research, University of Birmingham.
Barrón-Cedeño, A. and Rosso, P. (2009). On automatic plagiarism detection based on n-grams comparison. In M. Boughanem, C. Berrut, J. Mothe and C. Soule-Dupuy (Eds), Advances in Information Retrieval: Proceedings of the 31st European Conference on Information Retrieval, ECIR 2009, Toulouse, France, April 2009 (pp. 696–700). Berlin and Heidelberg: Springer-Verlag.


Beckman, M. E. and Elam, G. A. (1997). Guidelines for ToBI Labelling (3rd Version). Columbus, OH: Ohio State University.
Beckman, M. E., Hirschberg, J. and Shattuck-Hufnagel, S. (2005). The original ToBI system and the evolution of the ToBI framework. In S.-A. Jun (Ed.), Prosodic Typology: The Phonology of Intonation and Phrasing. Oxford: Oxford University Press.
Benjamin, L. T., Jr. (2002). Lecturing. In S. F. Davis and W. Buskist (Eds), The Teaching of Psychology: Essays in Honor of Wilbert J. McKeachie and Charles L. Brewer (pp. 57–67). Mahwah, NJ: Lawrence Erlbaum.
Biber, D. (2006). University Language: A Corpus-based Study of Spoken and Written Registers. Amsterdam and Philadelphia, PA: John Benjamins.
Biber, D. and Barbieri, F. (2007). Lexical bundles in university spoken and written registers. English for Specific Purposes, 26(3), 263–86.
Biber, D., Conrad, S. and Cortes, V. (2003). Lexical bundles in speech and writing: An initial taxonomy. In A. Wilson, P. Rayson and T. McEnery (Eds), Corpus Linguistics by the Lune: A Festschrift for Geoffrey Leech (pp. 71–92). Frankfurt am Main: Peter Lang.
Biber, D., Johansson, S., Leech, G., Conrad, S. and Finegan, E. (1999). Longman Grammar of Spoken and Written English. Harlow, Essex: Longman.
Bing, J. M. (1985). Aspects of English Prosody. New York, NY: Garland.
Bloom, L. (1973). One Word at a Time: The Use of Single Word Utterances before Syntax. The Hague: Mouton.
Bock, J. K. and Mazzella, J. R. (1983). Intonational marking of given and new information: Some consequences for comprehension. Memory & Cognition, 11, 64–76.
Boersma, P. (2001). Praat, a system for doing phonetics by computer. Glot International, 5(9/10), 341–5.
Bolinger, D. (1964). Intonation as a universal. In H. G. Lunt (Ed.), Proceedings of the Ninth International Congress of Linguists (pp. 833–48). The Hague: Mouton.
Bolinger, D. (1965). On certain functions of accents A and B. In I. Abe and T. Kanekiyo (Eds), Forms of English (pp. 57–66). Cambridge, MA: Harvard University Press.
Bolinger, D. (1976). Meaning and memory. Forum Linguisticum, 1, 1–14.
Bolinger, D. (1989). Intonation and its Uses: Melody in Grammar and Discourse. London: Edward Arnold.
Boysson-Bardies, B. (1999). How Language Comes to Children: From Birth to Two Years. Cambridge, MA: MIT Press.
Branigan, G. (1979). Some reasons why successive single word utterances are not. Journal of Child Language, 6, 411–21.
Branigan, G. and Stokes, W. (1982). An integrated account of utterance variability in early language development. In C. E. Johnson and C. L. Thew (Eds), Proceedings of the Second International Congress for the Study of Child Language. Washington, DC: University Press of America.
Brazil, D. (1997). The Communicative Value of Intonation in English. Cambridge: Cambridge University Press.


Brown, G. (1978). Understanding spoken language. TESOL Quarterly, 12(3), 271–83.
Brown, G., Currie, K. L. and Kenworthy, J. (1980). Questions of Intonation. London: Croom Helm.
Bush, N. (2001). Frequency effects and word-boundary palatalization in English. In J. Bybee and P. J. Hopper (Eds), Frequency and the Emergence of Linguistic Structure (pp. 255–80). Amsterdam: John Benjamins.
Butler, C. S. (1997). Repeated word combinations in spoken and written text: Some implications for functional grammar. In C. S. Butler, J. H. Connolly, R. A. Gatward and R. M. Vismans (Eds), A Fund of Ideas: Recent Developments in Functional Grammar (pp. 60–77). Amsterdam: IFOTT.
Butterworth, B. (1975). Hesitation and semantic planning in speech. Journal of Psycholinguistic Research, 4(1), 75–87.
Bybee, J. (2001). Frequency effects on French liaison. In J. Bybee and P. J. Hopper (Eds), Frequency and the Emergence of Linguistic Structure (pp. 337–59). Amsterdam: John Benjamins.
Bybee, J. (2002). Phonological evidence for exemplar storage of multiword sequences. Studies in Second Language Acquisition, 24, 215–21.
Bybee, J. (2006). From usage to grammar: The mind’s response to repetition. Language, 82(4), 711–33.
Bybee, J. and Scheibman, J. (1999). The effect of usage on degrees of constituency: The reduction of don’t in American English. Linguistics, 37, 575–96.
Chafe, W. L. (1979). The flow of thought and the flow of language. In T. Givón (Ed.), Discourse and Syntax (pp. 159–81). New York, NY: Academic Press.
Chafe, W. L. (1980a). Some reasons for hesitating. In H. W. Dechert and M. Raupach (Eds), Temporal Variables in Speech (pp. 169–80). The Hague: Mouton.
Chafe, W. L. (Ed.). (1980b). The Pear Stories: Cognitive, Cultural, and Linguistic Aspects of Narrative Production. Norwood, NJ: Ablex.
Chafe, W. L. (1987). Cognitive constraints on information flow. In R. S. Tomlin (Ed.), Coherence and Grounding in Discourse (pp. 21–51). Philadelphia, PA: John Benjamins.
Chafe, W. L. (1988). Punctuation and the prosody of written language. Written Communication, 5, 395–426.
Chafe, W. L. (1988a). Linking intonation units in spoken English. In J. Haiman and S. A. Thompson (Eds), Clause Combining in Grammar and Discourse (pp. 1–27). Amsterdam: John Benjamins.
Chafe, W. L. (1988b). Punctuation and the prosody of written language. Written Communication, 5, 395–426.
Chafe, W. L. (1994). Discourse, Consciousness, and Time: The Flow and Displacement of Conscious Experience in Speaking and Writing. Chicago: University of Chicago Press.
Chafe, W. L. (2006). Reading aloud. In R. Hughes (Ed.), Spoken English, TESOL and Applied Linguistics (pp. 53–71). Basingstoke: Palgrave Macmillan.


Cheng, W., Greaves, C. and Warren, M. (2006). From n-gram to skipgram to concgram. International Journal of Corpus Linguistics, 11(4), 411–33.
Chomsky, N. (1965). Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.
Conklin, K. and Schmitt, N. (2008). Formulaic sequences: Are they processed more quickly than nonformulaic language by native and nonnative speakers? Applied Linguistics, 29(1), 72–89.
Coppieters, R. (1987). Competence differences between native and near-native speakers. Language, 63(3), 544–73.
Cortes, V. and Csomay, E. (2007). Positioning lexical bundles in university lectures. In M. C. Campoy and M. J. Luzón (Eds), Spoken Corpora in Applied Linguistics (pp. 57–76). Bern: Peter Lang.
Coulmas, F. (1979). On the sociolinguistic relevance of routine formulae. Journal of Pragmatics, 3, 239–66.
Coulmas, F. (1981). Introduction: Conversational routine. In F. Coulmas (Ed.), Conversational Routine: Explorations in Standardized Communication Situations and Prepatterned Speech (pp. 1–17). The Hague: Mouton.
Couper-Kuhlen, E. and Selting, M. (Eds) (1996). Prosody in Conversation. Cambridge: Cambridge University Press.
Cowie, A. P. (1988). Stable and creative aspects of vocabulary use. In R. Carter and M. McCarthy (Eds), Vocabulary and Language Teaching (pp. 126–39). London: Longman.
Croft, W. (1995). Intonation units and grammatical structure. Linguistics, 33, 839–82.
Cruttenden, A. (1997). Intonation (2nd ed.). Cambridge: Cambridge University Press.
Crystal, D. (1969). Prosodic Systems and Intonation in English. Cambridge: Cambridge University Press.
Crystal, D. (1975). The English Tone of Voice: Essays on Intonation, Prosody and Paralanguage. London: Edward Arnold.
Crystal, D. (2003). Prosody. In D. Crystal (Ed.), The Cambridge Encyclopedia of the English Language (2nd ed., pp. 248–9). Cambridge: Cambridge University Press.
Cutler, A. (1976). Phoneme-monitoring reaction time as a function of preceding intonation contour. Perception & Psychophysics, 20, 55–60.
Cutler, A. and Isard, S. D. (1980). The production of prosody. In B. Butterworth (Ed.), Language Production (Vol. 1, pp. 245–69). New York, NY: Academic Press.
Cutler, A., Dahan, D. and van Donselaar, W. (1997). Prosody in the comprehension of spoken language: A literature review. Language and Speech, 40(2), 141–201.
Dahlmann, I. (2009). Towards a Multi-Word Unit Inventory of Spoken Discourse. Nottingham: The University of Nottingham.
Davies, A. (1991). The Native Speaker in Applied Linguistics. Edinburgh: Edinburgh University Press.
Davies, A. (2003). The Native Speaker: Myth and Reality. Clevedon: Multilingual Matters.
Davies, A. (2005). The native speaker in applied linguistics. In A. Davies and C. Elder (Eds), The Handbook of Applied Linguistics (pp. 431–50). Malden, MA: Blackwell.


Davis, B. G. (1993). Tools for Teaching. San Francisco, CA: Jossey-Bass Publishers.
De Cock, S. (1998). A recurrent word combination approach to the study of formulae in the speech of native and non-native speakers of English. International Journal of Corpus Linguistics, 3(1), 59–80.
De Cock, S. (2000). Repetitive phrasal chunkiness and advanced EFL speech and writing. In C. Mair and M. Hundt (Eds), Corpus Linguistics and Linguistic Theory (pp. 51–68). Amsterdam: Rodopi.
De Cock, S. (2007). Routinized building blocks in native speaker and learner speech: Clausal sequences in the spotlight. In M. C. Campoy and M. J. Luzón (Eds), Spoken Corpora in Applied Linguistics (pp. 217–33). Bern: Peter Lang.
Dechert, H. W. (1980). Pauses and intonation as indicators of verbal planning in second-language speech productions: Two examples from a case study. In H. W. Dechert and M. Raupach (Eds), Temporal Variables in Speech (pp. 271–85). The Hague: Mouton.
Dechert, H. W. (1983). How a story is done in a second language. In C. Faerch and G. Kasper (Eds), Strategies in Interlanguage Communication (pp. 175–95). London: Longman.
Dehé, N. (2007). The relation between syntactic and prosodic parenthesis. In N. Dehé and Y. Kavalova (Eds), Parentheticals (pp. 263–86). Amsterdam and Philadelphia, PA: John Benjamins.
Ellis, N. C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18(1), 91–126.
Ellis, N. C. (2002). Frequency effects in language processing: A review with implications for theories of implicit and explicit language acquisition. Studies in Second Language Acquisition, 24, 143–88.
Engel, W. V. R. (1973). The development from sound to phoneme in child language. In C. A. Ferguson and D. I. Slobin (Eds), Studies of Child Language Development (pp. 9–12). New York, NY: Holt, Rinehart and Winston.
Erman, B. (2007). Cognitive processes as evidence of the idiom principle. International Journal of Corpus Linguistics, 12(1), 25–53.
Erman, B. and Warren, B. (2000). The idiom principle and the open choice principle. Text, 20(1), 29–62.
Fagyal, Z. (2002). Prosodic boundaries in the vicinity of utterance-medial parentheticals in French. Probus: International Journal of Latin & Romance Linguistics, 14(1), 93–111.
Field, A. (2009). Discovering Statistics using SPSS (3rd ed.). Thousand Oaks, CA: Sage.
Fillmore, C. J. (1979). On fluency. In C. J. Fillmore, D. Kempler and W. S. Y. Wang (Eds), Individual Differences in Language Ability and Language Behaviour (pp. 85–101). New York, NY: Academic Press.
Fillmore, C. J., Kay, P. and O’Connor, M. C. (1998). Regularity & idiomaticity in grammatical constructions: The case of ‘let alone’. Language, 64(3), 501–38.


Firth, J. R. (1957). A synopsis of linguistic theory, 1930–1955. In Studies in Linguistic Analysis (pp. 1–32). Oxford: Blackwell.
Fisher, C. and Tokura, H. (1996). Prosody in speech to infants: Direct and indirect acoustic cues to syntactic structure. In J. L. Morgan and K. Demuth (Eds), Signal to Syntax: Bootstrapping from Speech to Grammar in Early Acquisition (pp. 343–63). Hillsdale, NJ and England: Lawrence Erlbaum.
Foster, P. (2001). Rules and routines: A consideration of their role in the task-based language production of native and non-native speakers. In M. Bygate, P. Skehan and M. Swain (Eds), Researching Pedagogic Tasks: Second Language Learning, Teaching, and Testing (pp. 75–94). Harlow, England and New York, NY: Longman.
Frederick, P. J. and Fry, H. (1986). The lively lecture—8 variations. College Teaching, 34(2), 43–50.
Garside, R. (1996). The robust tagging of unrestricted text: The BNC experience. In J. Thomas and M. Short (Eds), Using Corpora for Language Research: Studies in Honour of Geoffrey Leech (pp. 167–80). London: Longman.
Garside, R. and Smith, N. (1997). A hybrid grammatical tagger: CLAWS4. In R. Garside, G. Leech and T. McEnery (Eds), Corpus Annotation: Linguistic Information from Computer Text Corpora (pp. 102–21). London: Longman.
Garside, R., Leech, G. and Sampson, G. (1987). The Computational Analysis of English: A Corpus-based Approach. London: Longman.
Gerken, L., Jusczyk, P. W. and Mandel, D. R. (1994). When prosody fails to cue syntactic structure: 9-month-olds’ sensitivity to phonological versus syntactic phrases. Cognition, 51(3), 237–65.
Gibbs, R. W., Bogdanovich, J. M., Sykes, J. R. and Barr, D. J. (1997). Metaphor in idiom comprehension. Journal of Memory and Language, 37(2), 141–54.
Goldberg, A. (1995). Constructions. Chicago: University of Chicago Press.
Goldman-Eisler, F. (1968). Psycholinguistics: Experiments in Spontaneous Speech. London: Academic Press.
Greaves, C. (2009). ConcGram: A Phraseological Search Engine (Version 1.0). Amsterdam: John Benjamins.
Gries, S. T. and Stefanowitsch, A. (2004). Extending collostructional analysis: A corpus-based perspective on ‘alternations’. International Journal of Corpus Linguistics, 9, 97–129.
Haggo, D. C. and Kuiper, K. (1985). Stock auction speech in Canada and New Zealand. In R. Berry and J. Acheson (Eds), Regionalism and National Identity: Multidisciplinary Essays on Canada, Australia and New Zealand (pp. 189–97). Christchurch: Association for Canadian Studies in Australia and New Zealand.
Halliday, M. A. K. (1966). Lexis as a linguistic level. In C. E. Bazell, J. Catford, M. A. K. Halliday and R. Robins (Eds), In Memory of J. R. Firth (pp. 148–62). London: Longman.
Halliday, M. A. K. (1967). Intonation and Grammar in British English. The Hague: Mouton.


Hepper, P. G. (1997). Memory in utero? Developmental Medicine & Child Neurology, 39(5), 343–6.
Hickey, T. (1993). Identifying formulas in first language acquisition. Journal of Child Language, 20(1), 27–41.
Hockett, C. F. (1958). A Course in Modern Linguistics. New York, NY: Macmillan.
Horgan, J. (2003). Lecturing for learning. In H. Fry, S. Ketteridge and S. Marshall (Eds), A Handbook for Teaching and Learning in Higher Education (pp. 75–90). London: Kogan Page.
Howarth, P. (1998). Phraseology and second language proficiency. Applied Linguistics, 19(1), 24–44.
Hudson, J. (1998). Perspectives on Fixedness: Applied and Theoretical. Lund: Lund University Press.
Hudson, J. and Wiktorsson, M. (2009). Formulaic language and the relater category – the case of about. In R. Corrigan, E. A. Moravcsik, H. Ouali and K. M. Wheatley (Eds), Formulaic Language Volume 1: Distribution and Historical Change (pp. 77–96). Amsterdam: John Benjamins.
Hughes, R. and Szczepek Reed, B. (2006). Factors affecting turn-taking behaviour: Genre meets prosody. In R. Hughes (Ed.), Spoken English, TESOL and Applied Linguistics: Challenges for Theory and Practice (pp. 126–40). Basingstoke: Palgrave Macmillan.
Jiang, N. and Nekrasova, T. M. (2007). The processing of formulaic sequences by second language speakers. The Modern Language Journal, 91(3), 433–45.
Johnstone, A. H. and Percival, F. (1976). Attention breaks in lectures. Education in Chemistry, 13(2), 49–50.
Jusczyk, P. W., Hirsh-Pasek, K., Kemler Nelson, D. G., Kennedy, L. J., Woodward, A. and Piwoz, J. (1992). Perception of acoustic correlates of major phrasal units by young infants. Cognitive Psychology, 24, 252–93.
Kay, P. and Fillmore, C. J. (1999). Grammatical constructions and linguistic generalisations: The what’s X doing Y? construction. Language, 75(1), 1–33.
Keller, E. (1981). Gambits: Conversational strategy signals. In F. Coulmas (Ed.), Conversational Routine: Explorations in Standardized Communication Situations and Prepatterned Speech (pp. 93–113). The Hague: Mouton.
Kemler Nelson, D. G., Hirsh-Pasek, K., Jusczyk, P. W. and Wright Cassidy, K. (1989). How the prosodic cues in motherese might assist language learning. Journal of Child Language, 16, 55–68.
Kennedy, G. (2008). Phraseology and language pedagogy: Semantic preference associated with English verbs in the British National Corpus. In F. Meunier and S. Granger (Eds), Phraseology in Foreign Language Learning and Teaching (pp. 21–41). Amsterdam: John Benjamins.
Knight, D. (2009). A Multi-Modal Corpus Approach to the Analysis of Backchanneling Behaviour. Nottingham: University of Nottingham.
Knight, D., Evans, D., Carter, R. and Adolphs, S. (2009). HeadTalk, HandTalk and the corpus: Towards a framework for multi-modal, multi-media corpus development. Corpora, 4(1), 1–32.


Knowles, G. (1991). Prosodic labelling: The problem of tone group boundaries. In S. Johansson and A.-B. Stenström (Eds), English Computer Corpora: Selected Papers and Research Guide (pp. 149–63). Berlin: Mouton de Gruyter.
Knowles, G. and Lawrence, L. (1987). Automatic intonation assignment. In R. Garside, G. Leech and G. Sampson (Eds), The Computational Analysis of English: A Corpus-based Approach. London: Longman.
Knowles, G., Williams, B. and Taylor, L. (1996a). A Corpus of Formal British English Speech: The Lancaster/IBM Spoken English Corpus. London and New York, NY: Longman.
Knowles, G., Wichmann, A. and Alderson, P. R. (Eds) (1996b). Working with Speech: Perspectives on Research into the Lancaster/IBM Spoken English Corpus. London: Longman.
Kuiper, K. and Austin, P. (1990). They’re off and racing now: The speech of the New Zealand race caller. In A. Bell and J. Holmes (Eds), New Zealand Ways of Speaking English (pp. 195–220). Clevedon: Multilingual Matters.
Kuiper, K. and Haggo, D. C. (1984). Livestock auctions, oral poetry and ordinary language. Language in Society, 13, 205–34.
Kuiper, K. and Haggo, D. C. (1985). On the nature of ice hockey commentaries. In R. Berry and J. Acheson (Eds), Regionalism and National Identity: Multidisciplinary Essays on Canada, Australia and New Zealand (pp. 167–75). Christchurch: Association for Canadian Studies in Australia and New Zealand.
Landis, J. R. and Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–74.
Langlotz, A. (2006). Idiomatic Creativity: A Cognitive-Linguistic Model of Idiom-Representation and Idiom-Variation in English. Philadelphia, PA: John Benjamins.
Leech, G., Garside, R. and Bryant, M. (1994). CLAWS4: The Tagging of the British National Corpus. Paper presented at the Proceedings of the 15th International Conference on Computational Linguistics (COLING 94), Kyoto, Japan.
Lehiste, I. (1960). An acoustic-phonetic study of internal open juncture. Phonetica, 5 (Supplementum ad), 1–54.
Lin, P. (2009). Review of Wulff (2008): ‘Rethinking idiomaticity: A usage-based approach’. International Journal of Corpus Linguistics, 14(3), 388–93.
Lin, P. (2010). The phonology of formulaic sequences: A review. In D. Wood (Ed.), Perspectives on Formulaic Language in Acquisition and Communication (pp. 174–93). London: Continuum.
Lin, P. (2012). Sound evidence: The missing piece of the jigsaw in formulaic language research. Applied Linguistics, 33(3), 342–7.
Lin, P. (2016). Internet social media as a multimodal corpus for profiling the prosodic patterns of formulaic speech. Paper presented at Joint conference of the English Linguistics Society of Korea and the Korea Society of Language and Information, Seoul, South Korea: Kyung Hee University, 28 May 2016.


Lin, P. (2017). A new tool for concordancing the Web as a multimodal corpus. Poster presented at Corpus Linguistics 2017 conference. UK: University of Birmingham, 24–28 July 2017.
Lin, P. (2018). Formulaic language and speech prosody. In A. Siyanova-Chanturia and A. Pellicer-Sánchez (Eds), Understanding Formulaic Language: A Second Language Acquisition Perspective. London: Routledge.
Lin, P. and Adolphs, S. (2009). Sound evidence: Phraseological units in spoken corpora. In A. Barfield and H. Gyllstad (Eds), Collocating in Another Language: Multiple Interpretations. Basingstoke, England: Palgrave Macmillan.
Lin, P. and Chen, Y. (forthcoming). Multimodality I: Speech prosody and gesture. In S. Adolphs and D. Knight (Eds), Routledge Handbook of English Language and Digital Humanities. London: Routledge.
Lindström, O. (1978). Aspects of English Intonation. Göteborg: Acta Universitatis Gothoburgensis.
Lyon, C., Malcolm, J. and Dickerson, B. (2001). Detecting short passages of similar text in large document collections. In Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing (pp. 118–25). New York, NY: Cornell University.
Lyon, C., Barrett, R. and Malcolm, J. (2004). A theoretical basis to the automated detection of copying between texts, and its practical implementation in the Ferret plagiarism and collusion detector. In Plagiarism: Prevention, Practice and Policies Conference, June 2004.
Mackworth, N. H. (1948). The breakdown of vigilance during prolonged visual search. The Quarterly Journal of Experimental Psychology, 1(1), 6–21.
Malinowski, B. (1989/1923). The problem of meaning in primitive languages. In C. K. Ogden and I. A. Richards (Eds), The Meaning of Meaning: A Study of the Influence of Language upon Thought and of the Science of Symbolism (pp. 296–336). San Diego, CA: Harcourt Brace Jovanovich.
Mampe, B., Friederici, A. D., Christophe, A. and Wermke, K. (2009). Newborns’ cry melody is shaped by their native language. Current Biology, 19, 1–4.
Matsumoto, K. (2000). Intonation units, clauses and preferred argument structure in conversational Japanese. Language Sciences, 22(1), 63–86.
McEnery, T. and Gabrielatos, C. (2006). English corpus linguistics. In B. Aarts and A. McMahon (Eds), The Handbook of English Linguistics (pp. 33–71). Oxford: Blackwell.
McKeachie, W. J. (1986). Teaching Tips: A Guidebook for Beginning College Teachers (8th ed.). Lexington, MA: Heath.
McKeachie, W. J. (1999). Teaching Tips: Strategies, Research, and Theory for College and University Teachers (10th ed.). Lexington, MA: Heath.
McLeish, J. (1968). The Lecture Method. Cambridge: Cambridge Institute of Education.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. The Psychological Review, 63(2), 81–97.



Moon, R. (1997). Vocabulary connections: Multi-word items in English. In N. Schmitt and M. McCarthy (Eds), Vocabulary: Description, Acquisition and Pedagogy (pp. 40–63). Cambridge and New York, NY: Cambridge University Press.
Moon, R. (1998). Frequencies and forms of phrasal lexemes in English. In A. P. Cowie (Ed.), Phraseology: Theory, Analysis and Applications (pp. 70–100). Oxford, England: Clarendon Press.
Morgan, J. L. (1986). From Simple Input to Complex Grammar. Cambridge, MA: MIT Press.
Nakov, P. and Hearst, M. (2005). Search engine statistics beyond the n-gram: Application to noun compound bracketing. In Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL ’05) (pp. 17–24).
Nattinger, J. R. and DeCarrico, J. S. (1992). Lexical Phrases and Language Teaching. Oxford: Oxford University Press.
Nesi, H. and Basturkmen, H. (2001). Lexical bundles and discourse signalling in academic lectures. International Journal of Corpus Linguistics, 11(3), 283–304.
Nespor, M. and Vogel, I. (1986). Prosodic Phonology. Dordrecht and Holland: Foris.
Norrick, N. R. (1985). How Proverbs Mean: Semantic Studies in English Proverbs. Berlin, New York and Amsterdam: Mouton.
Oakes, M. P. (1998). Statistics for Corpus Linguistics. Edinburgh: Edinburgh University Press.
O’Donnell, M. B. and Ellis, N. C. (2009). Measuring Formulaic Language in Corpora from the Perspective of Language as a Complex System. Paper presented at the Corpus Linguistics 2009.
Pawley, A. (1985). Lexicalization. In D. Tannen and J. E. Alatis (Eds), Languages and Linguistics: The Interdependence of Theory, Data and Application (pp. 98–120). Washington, DC: University of Georgetown.
Pawley, A. and Syder, F. H. (1983). Two puzzles for linguistic theory: Nativelike selection and nativelike fluency. In J. C. Richards and R. W. Schmidt (Eds), Language and Communication (pp. 191–226). London: Longman.
Pawley, A. and Syder, F. H. (2000). The one-clause-at-a-time hypothesis. In H. Riggenbach (Ed.), Perspectives on Fluency (pp. 163–99). Ann Arbor, MI: University of Michigan Press.
Peters, A. M. (1977). Language learning strategies: Does the whole equal the sum of the parts? Language, 53(3), 560–73.
Peters, A. M. (1983). The Units of Language Acquisition. Cambridge: Cambridge University Press.
Phillipson, R. (1992). Linguistic Imperialism. Oxford: Oxford University Press.
Pickering, B. (1996). Distributional features of TSMs in the SEC. In G. Knowles, A. Wichmann and P. R. Alderson (Eds), Working with Speech: Perspectives on Research into the Lancaster/IBM Spoken English Corpus (pp. 109–28). London: Longman.
Pierrehumbert, J. (1980). The Phonology and Phonetics of English Intonation. Cambridge, MA: Massachusetts Institute of Technology.


Pierrehumbert, J. and Hirschberg, J. (1990). The meaning of intonational contours in the interpretation of discourse. In P. Cohen, J. Morgan and M. Pollock (Eds), Intentions in Communication (pp. 271–311). Cambridge, MA: MIT Press.
Pike, K. L. (1945). The Intonation of American English. Ann Arbor, MI: University of Michigan Press.
Plunkett, K. (1990). The segmentation problem in early language acquisition. Center for Research in Language Newsletter, 5(1), 1–17.
Price, P. J., Ostendorf, M., Shattuck-Hufnagel, S. and Fong, C. (1991). The use of prosody in syntactic disambiguation. Journal of the Acoustical Society of America, 90(6), 2956–70.
Quirk, R., Duckworth, A. P., Svartvik, J., Rusiecki, J. P. L. and Colin, A. J. T. (1964). Studies in the correspondence of prosodic to grammatical features in English. In H. G. Lunt (Ed.), Proceedings of the Ninth International Congress of Linguists (pp. 679–91). The Hague: Mouton.
Raupach, M. (1984). Formulae in second language speech production. In H. W. Dechert, D. Möhle and M. Raupach (Eds), Second Language Productions (pp. 114–37). Tübingen: Gunter Narr.
Rayson, P. (2003). Matrix: A Statistical Method and Software Tool for Linguistic Analysis Through Corpus Comparison. Lancaster: Lancaster University.
Read, J. and Nation, P. (2004). Measurement of formulaic sequences. In N. Schmitt (Ed.), Formulaic Sequences: Acquisition, Processing and Use (pp. 23–36). Amsterdam and Philadelphia, PA: John Benjamins.
Roelof de Pijper, J. and Sanderman, A. A. (1994). On the perceptual strength of prosodic boundaries and its relation to suprasegmental cues. Journal of the Acoustical Society of America, 96(4), 2037–47.
Rosenberg, S. (1977). Semantic constraints on sentence production: An experimental approach. In S. Rosenberg (Ed.), Sentence Production: Developments in Research and Theory (pp. 195–228). New York, NY: John Wiley.
Sampson, G. (2001). Empirical Linguistics. London and New York, NY: Continuum.
Schmidt, R. W. (1983). Interaction, acculturation, and the acquisition of communicative competence: A case study of an adult. In N. Wolfson and E. Judd (Eds), Sociolinguistics and Language Acquisition (pp. 137–74). Rowley, MA: Newbury House.
Schmitt, N. and Underwood, G. (2004). Exploring the processing of formulaic sequences through a self-paced reading task. In N. Schmitt (Ed.), Formulaic Sequences: Acquisition, Processing and Use (pp. 173–90). Amsterdam and Philadelphia, PA: John Benjamins.
Schmitt, N., Grandage, S. and Adolphs, S. (2004). Are corpus-derived recurrent clusters psycholinguistically valid? In N. Schmitt (Ed.), Formulaic Sequences: Acquisition, Processing and Use (pp. 127–52). Amsterdam and Philadelphia, PA: John Benjamins.
Schuetze-Coburn, S., Shapley, M. and Weber, E. (1991). Units of intonation in discourse: A comparison of acoustic and auditory analyses. Language and Speech, 34(3), 207–34.


Scollon, R. (1976). Conversations with a One Year Old: A Case Study of the Developmental Foundation of Syntax. Honolulu: University of Hawaii Press.
Scott, M. (2012). WordSmith Tools (Version 6). Liverpool: Lexical Analysis Software Ltd.
Selkirk, E. (1984). Phonology and Syntax: The Relation Between Sound and Structure. Cambridge, MA: MIT Press.
Shahidullah, S. and Hepper, P. G. (1994). Frequency discrimination by the fetus. Early Human Development, 36, 13–26.
Simpson, R. (2004). Stylistic features of academic speech: The role of formulaic expressions. In U. Connor and T. A. Upton (Eds), Discourse in the Professions: Perspectives from Corpus Linguistics (pp. 37–64). Amsterdam and Philadelphia, PA: John Benjamins.
Sinclair, J. M. (1966). Beginning the study of lexis. In C. E. Bazell, J. Catford, M. A. K. Halliday and R. Robins (Eds), In Memory of J. R. Firth (pp. 410–31). London: Longman.
Sinclair, J. M. (1991). Corpus, Concordance and Collocation. Oxford: Oxford University Press.
Sinclair, J. M. (2004). Intuition and annotation: The discussion continues. In K. Aijmer and B. Altenberg (Eds), Advances in Corpus Linguistics. Papers from the 23rd International Conference on English Language Research on Computerized Corpora (ICAME 23), Göteborg, May 22–26, 2002 (pp. 39–59). Amsterdam: Rodopi.
Siyanova-Chanturia, A. and Lin, P. (2017). Production of ambiguous idioms in English: A reading aloud study. International Journal of Applied Linguistics, early view, 1–13.
Skehan, P. (1998). A Cognitive Approach to Language Learning. Oxford: Oxford University Press.
Stefanowitsch, A. and Gries, S. T. (2003). Collostructions: Investigating the interaction of words and constructions. International Journal of Corpus Linguistics, 8, 209–43.
Stefanowitsch, A. and Gries, S. T. (2005). Covarying collexemes. Corpus Linguistics and Linguistic Theory, 1(1), 1–43.
Strässler, J. (1982). Idioms in English: A Pragmatic Analysis. Tübingen: Gunter Narr.
Stuart, J. and Rutherford, R. J. D. (1978). Medical student concentration during lectures. The Lancet, 2, 514–16.
Stubbs, M. (1996). Text and Corpus Analysis: Computer-Assisted Studies of Language and Culture. Oxford: Blackwell.
Szczepek Reed, B. (2004). Turn-final intonation in English. In E. Couper-Kuhlen and C. E. Ford (Eds), Sound Patterns in Interaction (pp. 97–119). Amsterdam: John Benjamins.
Tench, P. (1996). The Intonation Systems of English. London: Cassell.
Terken, J. and Noteboom, S. D. (1987). Opposite effects of accentuation and deaccentuation on verification latencies for given and new information. Language and Cognitive Processes, 2, 145–63.
Thompson, S. E. (1997). Presenting Research: A Study of Interaction in Academic Monologue. Liverpool: University of Liverpool.


Thompson, S. E. (2003). Text-structuring metadiscourse, intonation and the signalling of organisation in academic lectures. Journal of English for Academic Purposes, 2(1), 5–20.
Todman, J., Rankin, D. and File, P. (1999). The use of stored text in computer-aided conversation: A single-case experiment. Journal of Language and Social Psychology, 18(3), 287–309.
Trager, G. L. and Bloch, B. (1941). The syllabic phonemes of English. Language, 17, 223–46.
T’sou, B. K., Sin, K. K., Chan, S. W. K., Lai, T. B. Y., Lun, C., Ko, K. T., et al. (2000). Jurilinguistic engineering in Cantonese Chinese: An N-gram-based speech to text transcription system. In Proceedings of the 18th International Conference on Computational Linguistics, Universität des Saarlandes, Saarbrücken, Germany, 31 July–4 August 2000 (Vol. 1, pp. 1121–5).
Underwood, G., Schmitt, N. and Galpin, A. (2004). The eyes have it: An eye-movement study into the processing of formulaic sequences. In N. Schmitt (Ed.), Formulaic Sequences: Acquisition, Processing and Use (pp. 153–72). Amsterdam and Philadelphia, PA: John Benjamins.
van Donselaar, W. and Lentz, J. (1994). The function of sentence accents and given/new information in speech processing: Different strategies for normal-hearing and hearing-impaired listeners? Language and Speech, 37, 375–91.
van Lancker, D. (1987). Nonpropositional speech: Neurolinguistic studies. In A. W. Ellis (Ed.), Progress in the Psychology of Language (Vol. 3, pp. 49–118). London: Lawrence Erlbaum.
van Lancker, D. and Canter, G. J. (1981). Idiomatic versus literal interpretations of ditropically ambiguous sentences. Journal of Speech and Hearing Research, 46(1), 64–9.
van Lancker, D., Canter, G. J. and Terbeek, D. (1981). Disambiguation of ditropic sentences: Acoustic and phonetic cues. Journal of Speech and Hearing Research, 24(3), 330–35.
Vihman, M. M. (1996). Phonological Development: The Origins of Language in the Child. Oxford: Blackwell.
Warren, P. (1996). Prosody and parsing: An introduction. Language and Cognitive Processes, 11(1/2), 1–16.
Warren, P. (1999). Prosody and language processing. In S. Garrod and M. Pickering (Eds), Language Processing (pp. 155–88). Hove: Psychology Press.
Weinert, R. (1995). The role of formulaic language in second language acquisition: A review. Applied Linguistics, 16(2), 180–205.
Wells, J. C. (2006). English Intonation: An Introduction. Cambridge: Cambridge University Press.
Wennerstrom, A. K. (1994). Intonational meaning in English discourse: A study of non-native speakers. Applied Linguistics, 15(4), 399–420.
Wennerstrom, A. K. (1998). Intonation as cohesion in academic discourse: A study of Chinese speakers of English. Studies in Second Language Acquisition, 20(1), 1–25.


Wennerstrom, A. K. (2001). Music of Everyday Speech: Prosody and Discourse Analysis. New York, NY and Oxford: Oxford University Press.
Wennerstrom, A. K. (2006). Intonational meaning starting from talk. In R. Hughes (Ed.), Spoken English, TESOL and Applied Linguistics (pp. 72–98). Basingstoke, England: Palgrave Macmillan.
Wichmann, A. (2000). Intonation in Text and Discourse. Harlow: Longman.
Wichmann, A. (2001). Spoken parentheticals. In K. Aijmer (Ed.), A Wealth of English (pp. 177–93). Göteborg: Acta Universitatis Gothoburgensis.
Wong Fillmore, L. (1976). The Second Time Around: Cognitive and Social Strategies in Second Language Acquisition. Unpublished PhD thesis. Stanford University, CA.
Wood, D. (2001). In search of fluency: What is it and how can we teach it? Canadian Modern Language Review, 57(4), 573–89.
Wood, D. (2002). Formulaic language in acquisition and production: Implications for teaching. TESL Canada Journal, 20(1), 1–15.
Wood, D. (2004). An empirical investigation into the facilitating role of automatized lexical phrases in second language fluency development. Journal of Language and Learning, 2(1), 27–50.
Wood, D. (2006). Uses and functions of formulaic sequences in second language speech: An exploration of the foundations of fluency. The Canadian Modern Language Review, 63(1), 13–33.
Wood, D. (2012). Formulaic Language and Second Language Speech Fluency: Background, Evidence and Classroom Applications. London: Continuum.
Wozniak, R. J., Coelho, C. A., Duffy, R. J. and Liles, B. Z. (1999). Intonation unit analysis of conversational discourse in closed head injury. Brain Injury, 13(3), 191–203.
Wray, A. (2002). Formulaic Language and the Lexicon. Cambridge: Cambridge University Press.
Wray, A. (2004). ‘Here’s one I prepared earlier’: Formulaic language learning on television. In N. Schmitt (Ed.), Formulaic Sequences: Acquisition, Processing and Use (pp. 249–68). Amsterdam and Philadelphia, PA: John Benjamins.
Wray, A. and Namba, K. (2003). Formulaic language in a Japanese-English bilingual child: A practical approach to data analysis. Japan Journal of Multilingualism and Multiculturalism, 9, 24–51.
Wray, A. and Perkins, M. R. (2000). The functions of formulaic language: An integrated model. Language and Communication, 20(1), 1–28.
Wulff, S. (2008). Rethinking Idiomaticity: A Usage-Based Approach. London and New York, NY: Continuum.
Young, M. S., Robinson, S. and Alberts, P. (2009). Students pay attention! Combating the vigilance decrement to improve learning during lectures. Active Learning in Higher Education, 10(1), 41–55.
Yule, G. (1980). Speakers’ topics and major paratones. Lingua, 52, 33–47.
Zgusta, L. (1967). Multiword lexical units. Word, 23, 578–83.

Author Index

Adolphs, S. 8–22, 40, 130, 163
Aijmer, K. 2, 20, 38, 52–66, 81, 110
Alberts, P. 85
Alderson, P. R. 9, 122
Altenberg, B. 1–2, 13–20, 36–81, 110, 115, 169–70
Ashby, M. 3–4, 39–60, 120–30, 166–7
Austin, P. 56
Bahns, J. 23, 28–33, 138
Baker, M. 2, 20, 36–54, 66, 81, 110
Barbieria, F. 1, 15
Barr, D. J. 114
Barr, P. 114
Barrett, R. 169
Barrón-Cedeño, A. 169
Basturkmen, H. 119
Beckman, M. E. 51
Benjamin, L. T. (Jr.) 85
Biber, D. 1, 13–18
Bing, J. M. 77
Bloch, B. 51
Bloom, L. 2, 39–42
Bock, J. K. 167
Boersma, P. 72, 91, 114, 117
Bogdanovich, J. M. 56, 117, 130
Bolinger, D. 2, 78, 114
Boysson-Bardies, B. 41
Branigan, G. 42
Brazil, D. 100
Brown, G. 70, 72, 100
Bryant, M. 123
Burmeister, H. 22, 23, 138
Bush, N. 56, 164
Butler, C. S. 55
Butterworth, B. 55
Bybee, J. 12, 56, 84, 164
Canter, G. J. 38–57, 71, 117–20
Carter, R. 8
Chafe, W. L. 49, 55, 70, 99
Chan, S. W. K. 169
Cheng, W. 20
Christophe, A. 41
Coelho, C. A. 100
Colin, A. J. T. 70
Conklin, K. 12, 56, 117, 130
Conrad, S. 13–14
Coppieters, R. 207
Cortes, V. 13–14, 56, 122
Coulmas, F. 31, 120, 213
Cowie, A. P. 2
Croft, W. 75, 78, 85
Cruttenden, A. 52–79, 100, 118
Crystal, D. 2, 75
Csomay, E. 56, 122
Currie, K. L. 70–2, 100
Cutler, A. 49, 167
Dahan, D. 49
Dahlmann, I. 8, 40–7, 67–8, 129
Davies, A. 207
Davis, B. G. 85
DeCarrico, J. S. 74
Dechert, H. W. 7, 12, 39–55, 116
De Cock, S. 1, 19–36, 75, 133–6
Dehé, N. 77–8
Dickerson, B. 169
Duckworth, A. P. 70
Duffy, R. J. 100
Eeg-Olofsson, M. 1–20, 36–66, 81, 110
Elam, G. A. 51
Ellis, N. C. 12, 61, 84–93, 117
Engel, W. v. R. 39–42
Erman, B. 15–42, 65, 91, 138
Evans, D. 8
Fagyal, Z. 77
Field, A. 2, 24, 87, 106, 128–33, 164
File, P. 169
Fillmore, C. J. 12, 56
Finegan, E. 18, 207
Firth, J. R. 12

Fisher, C. 172
Fong, C. 100
Foster, P. 27–38, 91–109, 137–9
Frederick, P. J. 85
Friederici, A. D. 41
Fry, H. 85
Gabrielatos, C. 24
Galpin, A. 12, 56, 117, 130
Garside, R. 122–3
Gerken, L. 172
Gibbs, R. W. 56, 117, 130
Goldberg, A. 12
Goldman-Eisler, F. 167
Grandage, S. 12, 130
Greaves, C. 18–20
Gries, S. T. 24
Haggo, D. C. 56, 62
Halliday, M. A. K. 2, 11–12, 70
Hearst, M. 169
Hepper, P. G. 41
Hickey, T. 2, 16, 39–40, 110
Hirschberg, J. 51
Hirsh-Pasek, K. 172
Hockett, C. F. 51
Horgan, J. 85
Howarth, P. 95
Hudson, J. 1, 12–13, 93–106, 124
Hughes, R. 160
Isard, S. D. 212
Jiang, N. 12, 130
Johansson, S. 207
Johnstone, A. H. 85
Jusczyk, P. W. 172
Kay, P. 12
Keller, E. 54
Kemler Nelson, D. G. 172
Kennedy, G. 19
Kennedy, L. J. 19
Kenworthy, J. 70–2, 100
Knight, D. 8, 82
Knowles, G. 9, 50–1, 100–22, 169–71
Ko, K. T. 169
Koch, G. G. 144
Kuiper, K. 56, 62
Lai, T. B. Y. 169
Landis, J. R. 144–5
Langlotz, A. 16
Lawrence, L. 115, 169–170
Leech, G. 122, 123, 207
Lehiste, I. 51
Lentz, J. 167
Liles, B. Z. 100
Lin, P. 2, 12, 17, 22–40, 91, 117, 128, 172
Lindström, O. 52
Lun, C. 169
Lyon, C. 169
McCarthy, M. 2, 19, 36–66, 81, 110
McEnery, T. 24
McKeachie, W. J. 85
Mackworth, N. H. 85
McLeish, J. 85
Malcolm, J. 169
Malinowski, B. 120
Mampe, B. 41
Mandel, D. R. 172
Matsumoto, K. 78
Mazzella, J. R. 167
Miller, G. A. 49
Moon, R. 16–20, 38–66, 81, 95, 107, 110, 121
Morgan, J. L. 172
Nakov, P. 169
Namba, K. 3, 16–24, 40, 44
Nation, P. 17, 40
Nattinger, J. R. 31, 74
Nekrasova, T. M. 12, 130
Nesi, H. 119
Nespor, M. 70, 77
Norrick, N. R. 207
Noteboom, S. D. 167
Oakes, M. P. 77
O'Connor, M. C. 12
O'Donnell, M. B. 93
Ostendorf, M. 100
Pawley, A. 1, 12, 38–56, 117
Percival, F. 85
Perkins, M. R. 12
Peters, A. M. 2, 12–16, 38–53, 116
Phillipson, R. 207

Pickering, B. 71
Pierrehumbert, J. 51, 70
Pike, K. L. 114
Piwoz, J. 172
Plunkett, K. 2, 17, 35–40
Price, P. J. 100
Quirk, R. 70
Rankin, D. 169
Raupach, M. 7, 12, 39–66, 109–16
Rayson, P. 18, 20
Read, J. 17, 27, 40, 45, 70–98, 123
Robinson, S. 85
Roelof de Pijper, J. 71, 114
Rosenberg, S. 42
Rosso, P. 169
Rusiecki, J. P. L. 70
Rutherford, R. J. D. 85
Sampson, G. 24, 122
Sanderman, A. A. 71, 114
Scheibman, J. 56, 164
Schmitt, N. 12, 56, 117, 130
Schuetze-Coburn, S. 78, 114
Scollon, R. 42
Scott, M. 1, 18, 67–8
Selkirk, E. 70, 77
Shahidullah, S. 41
Shapley, M. 78, 114
Shattuck-Hufnagel, S. 51
Simpson, R. 1, 19, 36, 133
Sin, K. K. 169
Sinclair, J. M. 11–19, 23–4
Smith, N. 123
Stefanowitsch, A. 24
Stokes, W. 42
Strässler, J. 207
Stuart, J. 85
Stubbs, M. 24
Svartvik, J. 70
Syder, F. H. 1, 12, 38–56, 117
Sykes, J. R. 56, 117, 130
Szczepek Reed, B. 160–161
Taylor, L. 9, 122
Tench, P. 52
Terbeek, D. 38–57, 71, 120
Terken, J. 167
Thompson, S. E. 114
Todman, J. 169
Tokura, H. 172
Trager, G. L. 51
T'sou, B. K. 169
Underwood, G. 12, 56, 117, 130
van Donselaar, W. 49, 167
Van Lancker, D. 13, 38–57, 71, 95, 117, 120, 208–9
Vihman, M. M. 41
Vogel, I. 22, 23, 70, 138
Vogel, T. 77
Warren, B. 21–36, 65, 91, 138
Warren, M. 20
Warren, P. 49
Weber, E. 78, 114
Weinert, R. 2, 38–55, 109, 117
Wells, J. C. 2–4, 52–61, 123, 130, 166–7
Wennerstrom, A. K. 2, 70, 114, 133, 167–8
Wermke, K. 41
Wichmann, A. 9, 71–9, 114, 122
Wiktorsson, M. 1, 12–13, 124
Williams, B. 9, 122
Wood, D. 1, 28–40, 66, 91, 117, 137, 139, 163, 207
Woodward, A. 172
Wozniak, R. J. 100
Wray, A. 1, 2, 12–55, 116–29, 160–8, 209–13
Wright Cassidy, K. 172
Wulff, S. 12, 24–34, 91, 95, 207, 210
Young, M. S. 85
Yule, G. 114
Zgusta, L. 16

Subject Index

accent 57, 58
alignment 8, 20, 47, 67–110, 127, 129, 157–71
approaches to speech prosody
  acoustic 69–79, 117–18
  auditory 41, 69–81
assimilation 56
automatic extraction
  ConcGram 18–19, 20, 27
  Wmatrix 18–20, 27
  WordSmith 1, 18–27, 38, 160
British Academic Spoken English Corpus (BASE) 21, 207
Cambridge and Nottingham Corpus of Discourse in English (CANCODE) 19
child 11–17, 28–48, 63, 65, 110, 156–66
chunk 31, 48, 55, 92
cliché 1
cluster 21, 69
concordance 4, 74, 136, 162
declination 71, 114
dictionary 23, 32
discourse 22, 55, 65, 86–120, 133–7, 159
dysfluency 43–56, 116
English-as-a-Foreign-Language (EFL) learners 7, 10, 26, 67, 68, 81, 109–11, 157, 167–8
English Language Teaching (ELT) 2–3, 121, 128, 166
fixedness 5, 12, 22, 37, 50–67, 111–30
fluency 1, 37, 53–6, 84, 117, 128, 167–8
formula 2, 6, 31, 43–4, 142–54
formulaic language
  criteria 11–53, 85, 100, 134–58
  extraction (see automatic extraction; native speaker judgement)
  validation 48, 62, 156, 158
formulaicity 5–37, 47–77, 82–96, 101–13, 126–71
frequency 19–24, 69–84, 104–20, 143
hesitation 7, 42–65, 83, 129
idiom/idiomaticity 5, 19–61, 95, 120–1, 210
intonation 2–82, 87–115, 122–71
intonation unit/intonation contour 41–9, 55–79, 100–15, 167, 170, 193, 210
London-Lund Corpus (LLC) 83
Michigan Corpus of Academic Spoken English (MICASE) 21, 83–4, 207
multimodality 6, 136, 163
multiword unit 53
native speaker 6–42, 65–8, 82–97, 101–10, 122–9, 133–45, 153–71, 201, 207, 210, 213
native speaker formulaicity judgement
  concentration/fatigue 11–29, 84–96, 109, 126, 137
  inconsistency 11, 19, 85, 88, 134
  inter-rater reliability 134, 135, 138, 142, 149, 154, 165, 212
Natural Language Processing 78, 121, 128, 168–71
Nottingham Multimodal Corpus (NMMC) 82, 83, 84, 92
nucleus 59, 101
open-choice principle 22

pause 4, 13, 42–80, 100–27, 158, 167–8
pause-defined unit 51, 210
phraseology 13–14
pitch reset 51, 71, 79
PRAAT 72, 91, 114, 117
prosodic discontinuity 51, 210
rhythm 6–10, 52, 75, 113–27, 158, 164
salience 43, 172
segment 41, 166
Spoken English Corpus (SEC) 9, 83, 121–7, 158
stress
syllable lengthening 51, 79, 81, 209, 212
Tempo. See rhythm
tonality 52
tone/tone choice 2, 58–9, 71, 79–80, 101, 114, 121, 193, 210
tonicity 4, 52–9
turn-taking 73, 82, 84, 160–1